Scalable Distributed System Design Principles

M Adnan A
26 min read · Nov 3, 2020

Do you ever wonder how software like Uber, Facebook or Google is designed? I am not talking about the initial design; the first versions were probably built with a simple monolithic approach. I mean the way they are today: among the most reliable, scalable and best-performing software around.

In the 90s software wasn't as scalable or reliable. I often saw websites go down at peak hours or for maintenance. Websites like Amazon or PayPal can't afford to go down even for a minute, so there have to be ways to keep them up, even during upgrades or maintenance.

We are all familiar with basic system design principles like caching, replication, sharding, hashing, load balancing and the client-server model, but what matters is which of them you actually use and how well you use them. The good news is that implementing these principles is super easy nowadays, thanks to so many ready-to-use open source tools and libraries. The problem is that the principles themselves are often only vaguely understood, there are no hard and fast rules, and to use them properly you need a proper understanding of them. Most of them involve a trade-off, so you have to decide what matters more for your system and your users. For example, if you can trade more memory for caching you get better performance, but you also risk serving stale data if the cache is not implemented properly.

Client-Server Model

Probably the most famous and most widely used model; every system implements it to some extent. In this model one system (the client) makes requests and another system (the server) serves them. As a simple example, think of when you clicked the link on my profile to land on this article: your browser was the client and the LinkedIn server was the server that served you this article (a click you are probably already regretting). Let's go through what happened at a high level.

Browser (client) → sends a DNS request and the DNS server responds with an IP address, which is like the street address of the server → the browser sends a request to that IP address over HTTP (for now, think of it as the language machines speak on the internet) → HTTP requests normally go to port 80 (HTTPS uses 443). If the IP address is the address of a building with many doors, the port is the door, and 80 is the door HTTP requests normally knock on → a process is always listening at that door and responds according to the request → the HTTP response carries the text and images of this article as HTML/CSS/JS, and the browser renders it.

In this model the client always initiates the request and the server only responds. Almost every application works this way: most applications I have encountered talk to a database server or to web services. Roles can also chain. Your application calls a web services server, which is a server to you, but to answer your request it forwards a query to the database server and so becomes a client of the database server. Every application is connected to some other machine or system, either serving or requesting.

Latency & Throughput

These are widely (mis)understood but pivotal measures of a system's performance. Latency is how long data takes to move from point A to point B (client to server or back). The path could include multiple hops, routers, proxy servers and so on, but what matters in the end is the time between the original source and the final destination. It is very important to know when you are tuning performance, and it is the reason Content Delivery Networks (CDNs) exist: to reduce latency for clients that are geographically far from the servers. If a client in Europe accesses a server in America, the data simply has further to travel, so no matter how well your web service performs, the transfer time will kill the speed. With a CDN you deploy copies of your content in different locations so that each client is served by the nearest available server.

Throughput is about capacity: how much data the system can process in a given amount of time. Suppose 5,000 users are accessing your website and your home page is about 500 KB. Your server then has to serve 5,000 × 500 KB, which is about 2.5 GB of data, and normally more than that once you count images and media. If that is all the capacity you have and the number of users grows to 10,000, you have gone beyond it, so expect timeouts and slower response times.
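The estimate above is just multiplication, but it is worth writing down because it is easy to slip a digit. Below is a minimal sketch of it, assuming 5,000 users, a 500 KB page and decimal units; the function names are mine and only for illustration.

```javascript
// Back-of-the-envelope capacity estimate: users x page size.
// Assumed figures: 5,000 users and a 500 KB home page (decimal units).
const KB = 1000;
const users = 5000;
const pageSizeBytes = 500 * KB;

// Total bytes the server must deliver if every user loads the page once.
function requiredBytes(userCount, bytesPerUser) {
  return userCount * bytesPerUser;
}

function toGB(bytes) {
  return bytes / 1e9;
}

console.log(toGB(requiredBytes(users, pageSizeBytes)));     // 2.5
console.log(toGB(requiredBytes(2 * users, pageSizeBytes))); // 5
```

Doubling the users doubles the data to serve, which is why capacity planned for 5,000 users does not stretch to 10,000.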

The type of hardware also matters a lot for latency and throughput: a high-speed network connection helps, and so do fast disks with low seek times when you are dealing with I/O-heavy applications.

Nevertheless, high throughput and low latency are not all you need for good server performance. They matter a lot, but so do other factors: for example, instead of one high-end server you can run multiple servers, which gives you better reliability as well as better performance. Later in this article I discuss how to divide traffic between multiple servers.


Availability

Availability is a very important factor for any system: how tolerant is the system of database failures, web server failures or the failure of any other dependency? Are there alternative paths if a specific dependency fails? It is very important to design for this, because sooner or later it is going to happen.

It won't matter how fast and good your system is if it is not available to customers at all times. As a system designer you need to decide how your system should react when something fails, which it surely will. For example, are there operations your system can perform offline while the network is disconnected and sync back when the network returns?

Imagine PayPal being down for 5 minutes: how much would businesses like eBay lose, and how much business would their sellers lose? The main thing is trust. It is more or less taken for granted, because no one expects such outages anymore, and there is a reason for that: it has been achieved by designing systems for failure.

Availability is usually measured in NINES.

More NINES means more availability. Even 99% uptime is not enough nowadays, because it allows 3.65 days of downtime per year. Can you imagine Azure, AWS or Google being down for 3 days? It would certainly feel like going back to the stone age. For a cloud service, 99% sounds good but is not enough and can certainly be better.

The image below from Wikipedia shows that the more NINES the merrier. Ideally, I think a system designer should aim for 5 NINES (99.999%) or more, as that is the gold standard and what system designers normally regard as High Availability.
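To get a feel for what each extra NINE buys, the yearly downtime can be worked out directly from the percentage. This is a small sketch assuming a 365-day year; the function name is illustrative.

```javascript
// Yearly downtime allowed by an availability percentage (365-day year assumed).
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

function downtimeMinutesPerYear(availabilityPercent) {
  return MINUTES_PER_YEAR * (1 - availabilityPercent / 100);
}

// Print the allowance for two to five NINES.
for (const nines of [99, 99.9, 99.99, 99.999]) {
  const minutes = downtimeMinutesPerYear(nines);
  console.log(`${nines}% -> ${minutes.toFixed(2)} minutes per year`);
}
```

At 99% the allowance is 5,256 minutes, which is the 3.65 days mentioned above, while five NINES leave only a little over 5 minutes per year.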

These uptimes are defined in the Service Level Agreement (SLA) of service providers like Amazon, Google and Azure. They are a kind of guarantee the providers try to keep, and I guess they mostly do, but they are not playing god and cannot simply will it to be true, so even they have an IF FAIL plan: if they can't meet the SLA, they compensate you, typically with service credits or a refund. It is best to read your SLAs carefully.

Everyone would prefer maximum availability, but it comes at a price: it is neither cheap nor easy to achieve. The short answer to how you achieve it is redundancy. You need multiple servers or instances so that when one fails another can take over and serve the requests. With multiple instances you also need a load balancer or Leader Election to achieve active redundancy, both of which I discuss later. There are different ways to do this and always some trade-off, so you have to identify the critical parts of your system that must be highly available and the parts where you can accept lower availability.


Caching

Caching can be done at any level: to save round trips to other servers or to save CPU processing. But there is a trade-off. The major one is memory, and if cached entries are not expired and refreshed on time, the other is stale data, which can compromise data integrity and hurt the business, so you have to be very careful about it.

Let's say you have a system with several tiers and you want to add caching. Data that is quite static and hardly ever changes, like FAQs or the company's About Us page, can be cached for a long time at the server level and even on the client: load it from the database into a server cache (Redis is very popular nowadays) and from there let the client browser cache it. Data about the company's products is different. Stock availability and pricing can change often, so caching them may not be a good idea, while product images and descriptions change rarely and can be cached for longer.

You can also cache the results of expensive operations, but the most important question is where you cache. You may not want to keep such data on the client side, so server-side caching makes more sense, provided the expiration of that data is well defined and tested.
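To make the expiration point concrete, here is a minimal sketch of a server-side cache with a time-to-live per entry. It is an in-memory stand-in, not Redis, and the names `TtlCache`, `getOrLoad` and `loadFromDb` are hypothetical. The clock is injectable so that expiry can be tested without waiting.

```javascript
// Minimal in-memory cache with per-entry expiry (a stand-in for Redis or similar).
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;            // injectable clock, which makes expiry easy to test
    this.entries = new Map();  // key -> { value, expiresAt }
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // stale: drop it so the caller reloads fresh data
      return undefined;
    }
    return entry.value;
  }
}

// Cache-aside read: use the cached value if present, otherwise load and store it.
// loadFromDb is a hypothetical function standing in for a slow database call.
function getOrLoad(cache, key, ttlMs, loadFromDb) {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const fresh = loadFromDb(key);
  cache.set(key, fresh, ttlMs);
  return fresh;
}
```

Static content such as FAQs could be stored with a long `ttlMs`, while anything close to pricing or stock would get a very short one or skip the cache entirely.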


Proxies

There are two types of proxy server, forward and reverse. The one we need as system designers is the reverse proxy, but I'll briefly describe the forward proxy as well to clarify the difference.

A Forward Proxy is a mechanism for clients to hide their identity from servers. When a client sends a request directly, the server learns details such as the client's IP address and browser type from the request. That can be dangerous if you end up on a malicious website or server, so it is best to hide your identity where possible and needed. A forward proxy also lets you access blocked websites, and where prices differ by region (as with Netflix membership fees or airline tickets) it could save you some money. The downside is the performance overhead, but as I said before, there is always a trade-off.

A Reverse Proxy lets servers hide their identity. When a client asks DNS for the server's IP address, it gets the IP address of the proxy server instead of the actual server's IP or location. At the proxy you can perform a number of pre-operations such as logging, preliminary authentication checks, caching, load balancing and protection against DDoS attacks. Low latency and high throughput between the reverse proxy and the actual servers minimize the performance overhead it may cause, and given the number of advantages it is worth it.

The following Node.js code shows a simple server that listens on port 3000:

const express = require('express');
const app = express();
app.listen(3000, () => console.log('Listening on port 3000.'));
app.get('/hello', (req, res) => {
  // Log the request headers so you can see what the proxy adds.
  console.log(req.headers);
  res.send('Hello Adnan');
});

The Nginx conf below sets up a very basic reverse proxy that listens on 8081 and forwards requests to the server listening on 3000:

events { }
http {
  upstream nodejs-backend { server localhost:3000; }
  server {
    listen 8081;
    location / {
      # Extra header added by the proxy; it shows up in the Node.js log.
      proxy_set_header system-designer-principals true;
      proxy_pass http://nodejs-backend;
    }
  }
}

Load Balancer

This might sound like a job for the network/IT guys, but in the age of Kubernetes and Docker, available in every cloud service, a system designer can take advantage of load balancers and set one up very easily. Every system with multiple instances can use a load balancer and there are plenty of ways to do it. There are hardware load balancers, which are specialized devices built only for this purpose, and software load balancers, which are cheaper and more flexible. Each has trade-offs against the other, so you can't argue that one is better in every respect.

As the name implies, a load balancer simply distributes traffic across different servers or instances. There are several distribution techniques. Probably the most popular is Round Robin, which distributes requests equally (unless weights are defined): with 3 servers, each server gets a request in turn, in order. If one server is more powerful than the others, you can give it a higher weight so that it gets more traffic than the rest.
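Round Robin with weights can be sketched in a few lines. This is a deliberately naive version, in which a server with weight 3 simply appears three times in the rotation; real load balancers usually interleave the picks more smoothly. The function name and addresses are only for illustration.

```javascript
// Naive weighted round-robin: a server with weight 3 appears 3 times in the rotation.
function createRoundRobin(servers) {
  const rotation = [];
  for (const { address, weight = 1 } of servers) {
    for (let i = 0; i < weight; i++) rotation.push(address);
  }
  let next = 0;
  // Each call returns the next address and advances the position, wrapping around.
  return function pick() {
    const address = rotation[next];
    next = (next + 1) % rotation.length;
    return address;
  };
}

const pick = createRoundRobin([
  { address: 'localhost:3000', weight: 3 },
  { address: 'localhost:3001' },
]);

// Count where 8 consecutive requests end up.
const counts = {};
for (let i = 0; i < 8; i++) {
  const address = pick();
  counts[address] = (counts[address] || 0) + 1;
}
console.log(counts); // { 'localhost:3000': 6, 'localhost:3001': 2 }
```

With weights 3 and 1, every block of four requests sends three to the first server and one to the second.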

Another way is to distribute traffic randomly, but one server could then end up with a disproportionate share of the traffic and respond slowly.

Load balancers also check the health and status of each server, so a server that is not performing well gets less traffic, or none, depending on how badly it is doing.

Another very popular way is to hash the client's IP address and route each client to a specific server. This is very useful if you cache results on the servers, because the same client always reaches the server that holds its cached response (if applicable), which cannot be guaranteed when servers are selected randomly or in sequence.

Path-based routing means different servers handle different request paths, or you could say different types of request: certain servers for Orders requests and different servers for Payments, for example. This lets you set up each group of servers for its workload.

You might also need to consider multiple load balancers. The idea is to keep servers from being overwhelmed, but if you have 5 healthy servers behind a load balancer that is not big enough to handle the incoming requests, adding more servers won't help.

Below is a simple server.js that runs a server on the port given in the environment:

const express = require('express');
const app = express();
// The port comes from the environment, e.g. PORT=3000 node server.js
const port = process.env.PORT;
app.listen(port, () => console.log(`Listening on port ${port}.`));
app.get('/hello', (req, res) => {
  res.send(`Hello from port ${port}.\n`);
});

The Nginx conf below load balances by listening on 8081 and forwarding to ports 3000 and 3001. To run the example, start the code above once on each of those ports; you will notice that the server on port 3000 gets 3 times as many requests as the one on port 3001. Voila!!!

events { }
http {
  upstream nodejs-backend {
    server localhost:3000 weight=3;
    server localhost:3001;
  }
  server {
    listen 8081;
    location / {
      proxy_set_header system-design-tutorial true;
      proxy_pass http://nodejs-backend;
    }
  }
}

Load balancers are critically important for large-scale systems and become a must sooner or later.

Load balancers come with their own trade-offs. Suppose server 1 has cached data for a client's first request, but the second request goes to server 2, which doesn't have that data cached. The long processing has to happen again, which defeats the purpose of caching, and each server piles up unused cached data because different servers keep serving different requests from the same client.

To fix this and make sure a client stays associated with a certain server, you can hash the client: generate a token from each client so that the same token is produced every time and maps to a specific server. But what if a server dies? With naive hashing, the mapping is disturbed for every client. Here consistent hashing or rendezvous hashing comes in. When a server dies its cached client data is lost either way, but with these mechanisms the other servers and their clients' data are mostly unaffected.
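Consistent hashing is easier to trust once you see it move only the affected clients. The sketch below places each server at many points on a ring and sends a client to the first point at or after its own hash. The class and function names are mine, the number of virtual nodes is an arbitrary assumption, and the small FNV-1a-style hash is only there to keep the example self-contained; a real system would likely use a stronger hash.

```javascript
// Small 32-bit string hash (FNV-1a style), used only to keep the sketch self-contained.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

class HashRing {
  constructor(servers, virtualNodes = 100) {
    this.virtualNodes = virtualNodes; // assumed value; more points give a more even spread
    this.points = [];                 // sorted list of { position, server }
    for (const server of servers) this.add(server);
  }

  add(server) {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.points.push({ position: fnv1a(`${server}#${i}`), server });
    }
    this.points.sort((a, b) => a.position - b.position);
  }

  remove(server) {
    this.points = this.points.filter((point) => point.server !== server);
  }

  // Walk clockwise to the first point at or after the client's position,
  // wrapping around to the start of the ring if there is none.
  serverFor(clientId) {
    const position = fnv1a(clientId);
    const point = this.points.find((p) => p.position >= position);
    return (point || this.points[0]).server;
  }
}

const ring = new HashRing(['server-1', 'server-2', 'server-3']);
const before = ring.serverFor('203.0.113.7');
ring.remove('server-2');
const after = ring.serverFor('203.0.113.7');
// The two values differ only if this client was on server-2 before the removal.
console.log(before, after);
```

Removing a server deletes only that server's points, so a client whose nearest point belonged to another server keeps its server and its cached data.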

Choosing the Right Database

I have never come across a commercially used application that doesn't have some sort of database, so it's very likely you will need one. Nowadays there are so many options that the difficult part for a system designer is choosing the right database and designing it for optimal use.

At a very high and oversimplified level, databases are classified as SQL or NoSQL, but I'll go beyond that and divide them into 6 paradigms.

Key-Value Database

Popular examples are Redis and Memcached. Data is stored in a structure like a JavaScript object: as the name implies, a set of unique keys, each associated with a value. The big difference is that all data is held in memory instead of on disk, which gives a big speed advantage over disk-based databases, but it costs a lot if you choose to store all your data this way. It might sound crazy, but Twitter and GitHub are already doing it. Even if it is fast and you can afford it, that doesn't mean it can be used universally. It is best for caching, pub/sub and leaderboard or live dashboard data. I wouldn't suggest it as a primary database.

Wide Column Database

Popular examples are Cassandra and HBase. In terms of data structure it is a kind of hybrid of the key-value database and the conventional tabular database. Each key is called a row key and maps to a set of columns, but there is no schema, so the number of columns is not fixed and can differ from row to row. Unlike a key-value database you can query the data (Cassandra uses CQL), but you can't use JOINs. These databases are decentralized, so they are easy to replicate and to scale horizontally. Best for time-series data and workloads with many writes and few reads. I wouldn't suggest it as a primary database.

Document Database

Popular examples are MongoDB, DynamoDB and CouchDB. In terms of data structure these are schemaless documents (JSON) arranged in collections, and documents can link to other documents. You can use them in a relational-ish way, but with no schema and no joins complex queries can be super challenging. Best for content management such as blogs, IoT, mobile games and analytics, but not ideal for heavily relational data like social networks such as LinkedIn or Facebook.

Relational Database

The most popular type of database, probably even now, and the mainstream choice: SQL Server, Oracle, MySQL, PostgreSQL. As the name implies, the data structure is relational. Schema: yes. Joins: yes. Relations: yes. Queryable: yes. In terms of what we normally need from a database all boxes are checked, but it would be too good to be true if there were no trade-off: storage is expensive, the data needs normalization, and it can be slower than other types of database. The biggest advantage of relational databases is that they are ACID compliant for transactional data, which ensures that in case of a failure, glitch or network problem, partial or bad data can't be committed. Imagine you have to add data to multiple tables and half of them have been written when the network drops or an error occurs; without transactions you would be left with lots of orphaned (incomplete) data, and ACID is there to avoid that. These databases are normally difficult to scale, and when you do scale them it comes at a heavy price in cost and complexity. They can be used universally, and they used to be, but ideally they shouldn't be used for unstructured data now that many other options are available.

Graph Database

The popular choice here is Neo4j. It is a very natural fit for relationships in data, especially many-to-many relationships, and the data is queryable. It is best for graphs and recommendation engines, and very useful where the data is highly connected and a lot of joins would be required. A normal relational database supports this too, but more joins mean larger intermediate (Cartesian) products, which is a performance killer, whereas a graph database is much faster when a lot of relationships have to be traversed.

Elastic Database

This is more a search engine than a database. Popular options are Elasticsearch and Solr, both built on Lucene. Imagine you are building a Google-like search engine: you have a huge amount of data, and the user queries with only a few words that match tons of it, yet you have to pick results by relevance and ranking. A search database is the answer. The magic is in the indexing: the way it indexes content is an absolute beauty, so a search only touches the index instead of the whole text, which makes it super fast and brings back the best results in a fraction of a second. Best for search engines and autocomplete.

Replication & Sharding

Databases are the backbone of any application: if the database is unavailable, the system is unusable, and if the database has high latency and low throughput, the system is bound to be slow. So a system designer must ensure the database is always available and handles its operations with low latency and high throughput.

A replica is a copy of the database that can take over when the main database is down, and replication is the mechanism that synchronises the main database to the replica. The replica has to be updated whenever the main database is updated: when records are added to the main database, the same data needs to be added to the replica. If a transaction cannot be completed on the replica because of some problem, the main database shouldn't keep that data either, so that both databases hold the same data. You can of course retry the failed transaction in case it was only a glitch, but the replica always has to be an exact copy of the main database, otherwise it won't take over flawlessly when the main database goes down.

Besides avoiding downtime on the database end, where else can we take advantage of replicas? Suppose you have two database servers, one in the UK and another in Pakistan. UK users see data coming from the UK server and Pakistani users see data from the Pakistani server. You make an update from the UK in the UK database and its replica in Pakistan gets updated, so Pakistani users can access the same data without switching to the UK website and without your application making a round trip to the UK database. Now imagine 1 million users in Pakistan had to fetch that data from the UK server because it was not replicated: that is 1 million round trips. Instead the replica made 1 round trip and saved 1 million.

A replica is a full-fledged copy of the database, which is not always what you need: data that is only relevant locally would be replicated unnecessarily. To avoid that, here comes sharding.

Sharding means splitting the database into partial copies: you make multiple smaller databases out of the main one, depending on need. Imagine you have one database server and your user base is growing. You scale either horizontally or vertically, and your replicas have to be scaled accordingly. By the way, vertical scaling is quite expensive, and keeping every database in sync can be a nightmare, even for the network. Sharding helps by splitting the data across multiple databases, so one part of the data, such as products, goes to one database and events data goes to another. There are different sharding strategies.

Sharding on the basis of rows

Suppose we have a relational database and we split a table's rows across shards. We could store particular clients' data in one shard (partition): clients whose names start with A to F go into shard 1, and so on for the rest. This strategy does split the data, but it may not be the best way. Names starting with X, Y or Z will have hardly any data, while names starting with A, B or C will probably get more data and more traffic, so the split is quite uneven and still doesn't solve the scalability problem.

Sharding based on region

Another way of splitting data is by region. The catch is that some regions generate far more traffic than others: traffic from China or Russia could be way more than traffic from Pakistan, depending on where most of your users are, so the shards can again end up unevenly loaded.

Sharding based on Hashing

Hashing to the rescue. In this strategy you pass a key through a hashing function to determine which shard gets the data, which spreads the data uniformly. As described above for load balancing, where hashing chooses the server, the same sort of problem appears here if a shard goes down, is removed or a new shard is added, and again you can use consistent or rendezvous hashing.
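Rendezvous hashing is the simpler of the two to write down: score every shard for a given key and pick the highest score. Here is a minimal sketch using Node's built-in crypto module; the shard names, the key format and the helper names `hash32` and `shardFor` are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Turn any string into a 32-bit number using the first 4 bytes of a SHA-256 digest.
function hash32(text) {
  return crypto.createHash('sha256').update(text).digest().readUInt32BE(0);
}

// Rendezvous (highest-random-weight) hashing:
// score every shard for the key and keep the shard with the best score.
function shardFor(key, shards) {
  let best = null;
  let bestScore = -1;
  for (const shard of shards) {
    const score = hash32(`${shard}:${key}`);
    if (score > bestScore) {
      best = shard;
      bestScore = score;
    }
  }
  return best;
}

const shards = ['shard-1', 'shard-2', 'shard-3'];
console.log(shardFor('customer:42', shards));
// If a shard is removed, only the keys that lived on it are reassigned.
console.log(shardFor('customer:42', shards.filter((s) => s !== 'shard-2')));
```

Because each shard's score for a key does not depend on the other shards, removing one shard leaves the winner unchanged for every key that was stored elsewhere.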

For an implementation, we will have two shards and one proxy server that routes each request to the right shard and acts as the mediator between the application server and the database shards. As the number of shards grows, you don't want to burden your application server with choosing shards; let the shard (proxy) server handle that.

Leader Election

We have been talking about redundancy: multiple servers and load balancers mean multiple servers handling tasks at once. That is good for scalability but can cause transactional issues, such as the same payment request being sent from different servers. If the tasks are performing individual elements of a complex calculation in parallel, the results need to be aggregated when they all complete. In a cloud-based system that implements horizontal scaling, multiple instances of the same task could be running at the same time with each instance serving a different user. If these instances write to a shared resource, it's necessary to coordinate their actions to prevent each instance from overwriting the changes made by the others.

Solution: A single task instance should be elected to act as the leader, and this instance should coordinate the actions of the other subordinate task instances. If all of the task instances are running the same code, they are each capable of acting as the leader. Therefore, the election process must be managed carefully to prevent two or more instances taking over the leadership role at the same time.

The system must provide a robust mechanism for selecting the leader. This method has to cope with events such as network outages or process failures. In many solutions, the subordinate task instances monitor the leader through some type of heartbeat method, or by polling. If the designated leader terminates unexpectedly, or a network failure makes the leader unavailable to the subordinate task instances, it’s necessary for them to elect a new leader.

Implementation-wise, you can use the singleton pattern, with a dedicated process to choose a leader that performs the tasks.
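One common way to realise the heartbeat idea is a lease: whoever holds an unexpired lease is the leader, and the leader must keep renewing it. The sketch below keeps the lease in memory purely for illustration. In a real deployment the lease would live in a shared store and the acquire step would have to be atomic; `LeaseStore` and `heartbeat` are hypothetical names, and the 5 second lease is an assumed value.

```javascript
// In-memory stand-in for a shared lock store (e.g. a database row or a coordination service).
class LeaseStore {
  constructor() {
    this.holder = null;
    this.expiresAt = 0;
  }

  // Grants or renews the lease if it is free, expired, or already held by this instance.
  tryAcquire(instanceId, nowMs, ttlMs) {
    const free = this.holder === null || nowMs >= this.expiresAt;
    if (free || this.holder === instanceId) {
      this.holder = instanceId;
      this.expiresAt = nowMs + ttlMs;
      return true;
    }
    return false;
  }
}

// Each instance calls this on a timer (its heartbeat); only the lease holder acts as leader.
function heartbeat(store, instanceId, nowMs, ttlMs = 5000) {
  return store.tryAcquire(instanceId, nowMs, ttlMs) ? 'leader' : 'follower';
}

const store = new LeaseStore();
console.log(heartbeat(store, 'A', 0));    // leader
console.log(heartbeat(store, 'B', 1000)); // follower
console.log(heartbeat(store, 'A', 4000)); // leader (lease renewed until 9000)
console.log(heartbeat(store, 'B', 9500)); // leader (A stopped renewing, so the lease expired)
```

Timestamps are passed in explicitly so the takeover can be followed step by step: B only becomes leader once A misses its renewal and the lease runs out.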

Peer to Peer Networks

No, I am not going to talk about downloading pirated software or videos through torrents, but about using the same mechanism inside your company to transfer BIG files. Why would we need that? Imagine we have big backups or database files to sync. Imagine Netflix or YouTube, which have big videos to copy to their replica servers, or big banks with security footage that needs to be transferred to backup servers.

For the sake of example, imagine you have a 5 GB file and a network throughput of 5 GB per second, meaning the file can be transferred to 1 server in 1 second. Pretty fast. Now imagine there are 1,000 servers: transferring the file to all of them one after another takes 1,000 seconds, about 17 minutes. Pretty slow, huh? If you scale out to 10 source servers that each already hold the file and distribute it to the 1,000 machines in parallel, it comes down to about 1.7 minutes, but paying for 10 times the servers to cut the transfer time by a factor of 10 is a nightmare in terms of cost. It's good, but can we have a cheaper and better solution? The answer is a peer-to-peer network.

In this solution we first split the file into chunks and transfer the chunks to other machines called peers. Each peer then transfers its chunks to the next peers, and to manage which peers have which chunks and which peers they can send them to, we have trackers.
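A rough way to see why peers help: if every machine that already has the file passes it on to one new machine per round, the number of holders doubles each round. The sketch below uses that idealised model with the figures from the example (1 second per transfer, 1,000 servers). It ignores chunking, trackers and network contention, so treat the result as an order of magnitude rather than a measurement.

```javascript
// Idealised model: every machine that already has the file sends it
// to one new machine per round, so the number of holders doubles.
function roundsToReachAll(targetMachines) {
  let holders = 1; // the original source
  let rounds = 0;
  while (holders < targetMachines + 1) {
    holders *= 2;
    rounds += 1;
  }
  return rounds;
}

const secondsPerTransfer = 1; // 5 GB file over a 5 GB/s link
const servers = 1000;

// One source sending to each server in turn.
console.log(servers * secondsPerTransfer);                   // 1000
// Peers re-sharing the file as soon as they have it.
console.log(roundsToReachAll(servers) * secondsPerTransfer); // 10
```

Under these assumptions the transfer drops from about 17 minutes to about 10 seconds without buying extra source servers.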

It's a big topic, but the basic idea I wanted to share is that when data needs to be transferred at very high speed, P2P solutions can help. In terms of applications, a very good example of a P2P solution is Kraken, developed by Uber and available on GitHub. Quoting: “Kraken distributes 20K 100MB-1G blobs in under 30 sec.”

Polling and Streaming

Streaming is probably familiar from the YouTube and Netflix videos we watch almost every day. Polling is something we are already half using: a client sends a request and the server responds. The only difference with polling is that the client repeats the request at regular intervals, every few seconds for example.

Polling is useful when you don't need information from the server very frequently. If you want to perform some operation every 15 minutes, for example, polling is a good way for the client and server to communicate.

Streaming is a bit expensive for servers because they have to keep a channel open between client and server. Videos are normally streamed because the data has to arrive without intervals or interruptions. Typical use cases are chat applications, video streaming, betting websites, sports scoring applications and high-end forex applications.

As I mentioned, streaming is expensive because it keeps a channel open, and a server can only keep a limited number of connections open simultaneously. If the server holds connections open that clients no longer need, new connections become sluggish, and to keep up you need to scale: more users means more servers. Imagine what that means for video streaming websites. Polling may sound a bit manual and interruption-based, but it is quite lightweight. It may not work for video streaming, but it works in many other places, such as score or exchange rate updates, where very frequent updates are not vital.


// messaging_api.js: thin client for the chat server (HTTP for polling, WebSocket for streaming).
const axios = require('axios');
const WebSocket = require('ws');
function createMessagingSocket() {
  return new WebSocket('ws://localhost:3001/messages');
}
function getMessages() {
  return axios.get('http://localhost:3001/messages').then(res => res.data);
}
function sendMessage(message) {
  return axios.post('http://localhost:3001/messages', message);
}
module.exports.createMessagingSocket = createMessagingSocket;
module.exports.getMessages = getMessages;
module.exports.sendMessage = sendMessage;


// helpers.js
function getRandomInt(max) {
  return Math.floor(Math.random() * Math.floor(max));
}
module.exports.getRandomInt = getRandomInt;


const express = require('express');
const expressWs = require('express-ws');
const app = express();
expressWs(app); // adds app.ws() for WebSocket routes
app.use(express.json()); // parse JSON request bodies into req.body
const messages = [{id: 0, text: 'Welcome!', username: 'Chat Room'}];
const sockets = [];
app.listen(3001, () => { console.log('Listening on port 3001!'); });
app.get('/messages', (req, res) => { res.json(messages); });
app.post('/messages', (req, res) => {
  const message = req.body;
  messages.push(message);
  // Push the new message to every open socket (the streaming clients).
  for (const socket of sockets) { socket.send(JSON.stringify(message)); }
  res.sendStatus(201);
});
app.ws('/messages', socket => {
  sockets.push(socket);
  socket.on('close', () => { sockets.splice(sockets.indexOf(socket), 1); });
});


const helpers = require('./helpers');
const messagingApi = require('./messaging_api');
const readline = require('readline');
// Ids of messages already printed, so nothing is shown twice.
const displayedMessages = {};
const terminal = readline.createInterface({ input: process.stdin, });
// Every line typed in the terminal is sent as a chat message.
terminal.on('line', text => {
  const username = process.env.NAME;
  const id = helpers.getRandomInt(100000);
  displayedMessages[id] = true;
  const message = {id, text, username};
  messagingApi.sendMessage(message);
});
function displayMessage(message) {
  console.log(`> ${message.username}: ${message.text}`);
  displayedMessages[message.id] = true;
}
async function getAndDisplayMessages() {
  const messages = await messagingApi.getMessages();
  for (const message of messages) {
    const messageAlreadyDisplayed = message.id in displayedMessages;
    if (!messageAlreadyDisplayed) displayMessage(message);
  }
}
// Polling: ask the server for all messages every 3 seconds.
function pollMessages() { setInterval(getAndDisplayMessages, 3000); }
// Streaming: keep a socket open and display messages as the server pushes them.
function streamMessages() {
  const messagingSocket = messagingApi.createMessagingSocket();
  messagingSocket.on('message', data => {
    const message = JSON.parse(data);
    const messageAlreadyDisplayed = message.id in displayedMessages;
    if (!messageAlreadyDisplayed) displayMessage(message);
  });
}
if (process.env.MODE === 'poll') {
  pollMessages();
}
if (process.env.MODE === 'stream') {
  getAndDisplayMessages();
  streamMessages();
}

The code above is a good example of a chat application that supports both polling and streaming of messages. It is only there to help you understand the idea better.
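To make the difference between the two modes easier to see without running a server, here is a minimal sketch with no network involved. `ChatRoom`, `getMessagesSince` and `subscribe` are names I made up for illustration; they are not part of the chat application itself.

```javascript
// In-memory stand-in for the chat server (hypothetical; no network involved).
class ChatRoom {
  constructor() {
    this.messages = [];
    this.listeners = [];
  }
  // Streaming: register a callback that is invoked for every new message.
  subscribe(listener) {
    this.listeners.push(listener);
  }
  // Polling: the client asks for everything after the last index it has seen.
  getMessagesSince(index) {
    return this.messages.slice(index);
  }
  send(message) {
    this.messages.push(message);
    for (const listener of this.listeners) listener(message);
  }
}

const room = new ChatRoom();
const streamed = [];
room.subscribe(message => streamed.push(message.text));

room.send({username: 'a', text: 'hello'});
room.send({username: 'b', text: 'hi'});

// The streaming client received both messages the moment they were sent.
// The polling client only sees them when it next asks.
let seen = 0;
const polled = room.getMessagesSince(seen).map(m => m.text);
seen += polled.length;
```

With streaming the server does the work of pushing; with polling the client decides how often to ask, which trades freshness against the number of requests.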

Rate Limiting

Rate limiting is very relevant if you are developing APIs for use outside your company. Systems like Uber and Lyft expose their APIs publicly, so they need to ensure those APIs are not misused, whether by clients that flood them with requests or by outright DoS or DDoS attacks. The way to do that is to limit the number of requests accepted from the same machine in a given time interval. If some machines are allowed to bombard your servers with requests, then no matter how much scaling you did or how efficient your code is, at a certain stage your servers will be overloaded and actual users won't get served, which will definitely hurt the business, and obviously nobody wants that.

To implement this, I normally prefer a key-value database like Redis and log each request in the Redis cache, as it's fast. Before forwarding a request to the database or application server, set up a proxy server where all the rate-limiting logic is applied. Check whether the user is still under the limit; if not, reply that they should try again later. If they keep hitting the limit anyway, it is best to temporarily ban the IP, as it might be a DoS attack and you may not want to run every such request through your rate-limiting logic.


const database = { 'index.html': '<html>Hello World!</html>' };
// Simulate a slow database: the callback fires after 1 second.
module.exports.get = (key, callback) => {
  setTimeout(() => {
    callback(database[key]);
  }, 1000);
};


const database = require('./database');
const express = require('express');
const app = express();
app.listen(3000, () => console.log('Listening on port 3000.'));
// Keep a hash table of the previous access time for each user.
const accesses = {};
app.get('/index.html', function(req, res) {
  const {user} = req.headers;
  if (user in accesses) {
    const previousAccessTime = accesses[user];
    // Limit to 1 request every 5 seconds.
    if (Date.now() - previousAccessTime < 5000) {
      res.status(429).send('Too many requests.\n');
      return;
    }
  }
  // Serve the page and store this access time.
  database.get('index.html', page => {
    accesses[user] = Date.now();
    res.send(page + '\n');
  });
});

The code above is a bit naive, but I kept it simple so the concept is easy to follow: it limits each user to 1 request every 5 seconds.
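The fuller flow described earlier (count requests per client, reply "try later" when over the limit, temporarily ban clients that keep hitting) could look roughly like the sketch below. It is only a sketch under assumptions: a plain `Map` stands in for Redis, and `createRateLimiter`, its option names and the limits are all made up for illustration.

```javascript
// Fixed-window rate limiter with a temporary ban for repeat offenders.
// A Map stands in for Redis here; all names and limits are illustrative.
function createRateLimiter({limit, windowMs, maxViolations, banMs}) {
  const clients = new Map(); // clientId -> {windowStart, count, violations, bannedUntil}

  return function check(clientId, now = Date.now()) {
    let state = clients.get(clientId);
    if (!state) {
      state = {windowStart: now, count: 0, violations: 0, bannedUntil: 0};
      clients.set(clientId, state);
    }
    // Banned clients are rejected without running the counting logic.
    if (now < state.bannedUntil) return 'banned';

    // Start a new window once the current one has expired.
    if (now - state.windowStart >= windowMs) {
      state.windowStart = now;
      state.count = 0;
    }
    state.count += 1;
    if (state.count <= limit) return 'ok';

    // Over the limit: ask the client to retry later, and ban it if it keeps going.
    state.violations += 1;
    if (state.violations >= maxViolations) {
      state.bannedUntil = now + banMs;
      state.violations = 0;
      return 'banned';
    }
    return 'retry-later';
  };
}

// 2 requests per second allowed; the second violation triggers a 5 second ban.
const check = createRateLimiter({limit: 2, windowMs: 1000, maxViolations: 2, banMs: 5000});
const results = [0, 10, 20, 30, 1500, 6000].map(t => check('1.2.3.4', t));
// results: ['ok', 'ok', 'retry-later', 'banned', 'banned', 'ok']
```

A fixed window is the simplest choice and allows short bursts at window boundaries; sliding windows or token buckets are common alternatives when that matters.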

Logging and Monitoring

I can’t stress enough how important and useful logging and monitoring are. Regardless of the scope of the system, whether it’s highly distributed or small to mid-range, they apply to every type of system. The bigger the system, the more important they become. Logging costs the system a very small price in performance, but the advantage is too big to sacrifice. Imagine you have a system where the user pays for a service or product. The user claims he already paid, but due to some glitch in the network or system he never got the service/product confirmation page. If the system is not maintaining logs, how would you settle the dispute, especially when you have millions of users and millions of transactions happening? It would be like finding a needle in a haystack.

Logging is not only about recording events. You can also log exceptions, errors, processing times, and request receive and response times. If you call microservices or external systems, it is worth noting down the request time to know the latency in peak and off-peak hours. You can also log the number of requests being handled, so you know your peak hours and can scale during those hours only.
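As a rough illustration of logging processing times and errors, and then using those records to find peak hours, here is a small sketch. `createLogger`, `timed` and `requestsPerHour` are hypothetical helpers written for this example, and an array stands in for a real log file or log service.

```javascript
// Minimal structured logger: each record is one plain object.
// `sink` is an array here; a real system would write to a file or log service.
function createLogger(sink, clock = Date.now) {
  return {
    // Wrap an operation, recording its duration and whether it failed.
    timed(event, operation) {
      const startedAt = clock();
      try {
        const result = operation();
        sink.push({event, level: 'info', startedAt, durationMs: clock() - startedAt});
        return result;
      } catch (error) {
        sink.push({event, level: 'error', startedAt, durationMs: clock() - startedAt, error: error.message});
        throw error;
      }
    },
  };
}

// Group records by hour of day (UTC) to see when the system is busiest.
function requestsPerHour(records) {
  const counts = {};
  for (const {startedAt} of records) {
    const hour = new Date(startedAt).getUTCHours();
    counts[hour] = (counts[hour] || 0) + 1;
  }
  return counts;
}

const logs = [];
// A fake clock that advances 5 ms per reading keeps the example deterministic.
let fakeTime = Date.UTC(2020, 10, 3, 9, 0, 0);
const logger = createLogger(logs, () => (fakeTime += 5));

const total = logger.timed('checkout', () => 40 + 2);
let failed = false;
try {
  logger.timed('payment', () => { throw new Error('gateway timeout'); });
} catch (error) {
  failed = true; // the error is logged and still reaches the caller
}
```

Keeping each record as structured data rather than free text is what makes the later aggregation, and monitoring in general, straightforward.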

Monitoring is only possible if you are logging well. There are even systems that make life easier when you want to monitor, like AppDynamics, Azure Application Insights, New Relic and many more, which help you visualize your boring logging records; they actually maintain logs of their own. Monitoring is very important for ensuring system health at all times and, if the system goes down, for finding the root cause so those issues can be avoided in future.

Pub/Sub Pattern

The Publish/Subscribe pattern is very important for distributed systems, or any system with elements whose reliability is pivotal for the business or the whole system. Normally requests go from client to server. Now consider the case where the client gets disconnected before the server can respond, or the server goes down while processing the request, and imagine the request was a payment, some exchange rate information, the issuance of a ticket or a multi-million banking transaction. Such a request can't simply be lost, and that is the problem this pattern addresses.

There are four main entities: Publisher, Subscriber, Topic and Message.

A Publisher can be thought of as a client that sends requests.

A Subscriber can also be thought of as a client, but it acts as a server at the same time: it sends a request to take a message from the Topic and then serves that message.

A Message is simply a request.

A Topic is what makes this pattern possible and brings persistence and reliability to the system. It’s like a tunnel where messages reside between Publisher and Subscriber. The Publisher sends the message to the Topic instead of to the Subscriber. The Subscriber is busy serving messages, so once it has capacity for another one, it goes to the Topic, takes a message from it and serves it accordingly.
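The four entities can be sketched in a few lines. This is a toy, in-memory version under my own assumptions: the `Topic` class and its `publish` and `take` methods are illustrative names, and a real broker would persist the queue rather than hold it in memory.

```javascript
// A Topic holds messages until a Subscriber has capacity to take one.
// In-memory only; a real broker would persist the queue to disk.
class Topic {
  constructor() {
    this.queue = [];
  }
  // Called by the Publisher: the message goes to the Topic, not to a Subscriber.
  publish(message) {
    this.queue.push(message);
  }
  // Called by a Subscriber when it is ready for more work.
  take() {
    return this.queue.shift(); // undefined when the Topic is empty
  }
}

const payments = new Topic();
payments.publish({id: 1, amount: 100});
payments.publish({id: 2, amount: 250});

// The Subscriber can be offline while messages are published and
// still process them later, in the order they arrived.
const handled = [];
let message;
while ((message = payments.take()) !== undefined) {
  handled.push(message.id);
}
```

The point of the indirection is that the Publisher never needs the Subscriber to be available at the moment it sends.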

This solution has its own issues. Imagine a Message is picked up by a Subscriber, but the acknowledgement of receiving it is never sent back to the Topic (the tunnel/queue/bus). In that situation the message stays in the queue and gets picked up again, so you have to ensure this doesn’t cause duplication or redundancy. If the operation increments something, such as a count, a transaction, post likes or a video watch counter, it would be incremented twice. You can use different checks to ensure that doesn’t happen, or use a shared cache of requests to check whether a request has already been executed. Some operations are idempotent: it makes no difference how many times they are executed, like setting a status to complete. Even then, issues can arise if messages are executed in the wrong order. If the first message sets the status to pending and the second to complete, processing them out of order sets the status to complete first and then back to pending.
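One possible way to guard against both problems is sketched below. It assumes every message carries a unique `id` and, for status updates, a `version` number that grows with each change; neither field comes from the code in this article, and a `Set` stands in for the shared cache of executed requests.

```javascript
// Consumer that tolerates redelivery and out-of-order messages.
// Assumes every message carries a unique `id` and a per-order `version` number.
function createConsumer() {
  const processedIds = new Set(); // stands in for a shared cache of handled requests
  const orders = new Map();       // orderId -> {status, version}
  let likeCount = 0;

  function handle(message) {
    // Duplicate delivery: do not repeat the side effect.
    if (processedIds.has(message.id)) return 'duplicate';
    processedIds.add(message.id);

    if (message.type === 'like') {
      likeCount += 1; // not idempotent, so the id check above is essential
      return 'applied';
    }
    if (message.type === 'status') {
      const current = orders.get(message.orderId);
      // Ignore updates older than what is already stored.
      if (current && current.version >= message.version) return 'stale';
      orders.set(message.orderId, {status: message.status, version: message.version});
      return 'applied';
    }
    return 'ignored';
  }

  return {
    handle,
    getLikes: () => likeCount,
    getStatus: orderId => (orders.get(orderId) || {}).status,
  };
}

const consumer = createConsumer();
const outcomes = [
  {id: 'm1', type: 'like'},
  {id: 'm1', type: 'like'}, // redelivered because the acknowledgement was lost
  {id: 'm3', type: 'status', orderId: 7, status: 'complete', version: 2},
  {id: 'm2', type: 'status', orderId: 7, status: 'pending', version: 1}, // arrives late
].map(message => consumer.handle(message));
// outcomes: ['applied', 'duplicate', 'applied', 'stale']
```

The like is counted once despite being delivered twice, and the late "pending" update does not overwrite "complete".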


const axios = require('axios');
const WebSocket = require('ws');
// Publish: POST the message to the topic's URL.
function publish(message, topicId) { return axios.post(`http://localhost:3001/${topicId}`, message); }
// Subscribe: open a WebSocket on the same topic.
function subscribe(topicId) { return new WebSocket(`ws://localhost:3001/${topicId}`); }
module.exports.publish = publish;
module.exports.subscribe = subscribe;


const express = require('express');
const expressWs = require('express-ws');
const app = express();
expressWs(app);
app.use(express.json()); // parse JSON request bodies so req.body is populated
// Map from topic id to the list of sockets subscribed to it.
const sockets = {};
app.listen(3001, () => { console.log('Listening on port 3001!'); });
app.post('/:topicId', (req, res) => {
  const {topicId} = req.params;
  const message = req.body;
  const topicSockets = sockets[topicId] || [];
  for (const socket of topicSockets) {
    socket.send(JSON.stringify(message));
  }
  res.end(); // acknowledge so the HTTP request does not hang
});
app.ws('/:topicId', (socket, req) => {
  const {topicId} = req.params;
  if (!sockets[topicId]) sockets[topicId] = [];
  const topicSockets = sockets[topicId];
  topicSockets.push(socket);
  socket.on('close', () => { topicSockets.splice(topicSockets.indexOf(socket), 1); });
});


const messagingApi = require('./messaging_api');
const readline = require('readline');
const TOPIC_ID = process.env.TOPIC_ID;
const terminal = readline.createInterface({ input: process.stdin });
// Publish every line typed in the terminal to the topic.
terminal.on('line', text => {
  const name = process.env.NAME;
  const message = {name, text};
  messagingApi.publish(message, TOPIC_ID);
});


const messagingApi = require('./messaging_api');
const TOPIC_ID = process.env.TOPIC_ID;
function displayMessage(message) {
  console.log(`> ${message.name}: ${message.text}`);
}
function streamMessages() {
  const messagingSocket = messagingApi.subscribe(TOPIC_ID);
  messagingSocket.on('message', (data) => {
    const message = JSON.parse(data);
    displayMessage(message);
  });
}
streamMessages();

Sorry, I am not explaining the code, as it is here only for reference, and if you are reading a distributed systems article on the internet, I believe you are capable enough of understanding the code yourself.

Nevertheless, the topic is quite large and open-ended. You will mostly find different solutions, and as one size doesn’t fit all, you have to think and find the best solution for your system.


