What every distributed application architects ought to know

10 min readNov 26, 2018

I have been developing applications for over a decade. Back then, I was extremely interested in distributed applications however; it was not as easy as it is today. The practices, tools and technologies associated with developing distributed applications have improved immensely, making distributed application development easier than it was. Back then, the technologies used mainly COM+, RPC, CORBA etc. Nowadays, the communication technology is mainly centered around web services. To me, building an application is like building a structure out of Lego blocks. The application should be split into blocks and shouldn’t be PITA to integrate those blocks. In this article, I will highlight some common myths regarding distributed applications, the best practices and my favorite design patterns to build reliable and scalable distributed applications.

Firstly, I will debunk several myths that people believe when developing an application/service and also provide some insight in how to overcome these issues and what we ought to do as good architects.

Network is (un)reliable

Problem: Nowadays, when communicating with databases; we have to communicate through web services. If the network is down, the application is unable to work. This is an unforeseen issue to some extent but I believe that it is our responsibility to gracefully react in this situation and reach a solution.

Solution: We can combat this issue by implementing the retry pattern. If the network had a timeout or didn’t respond at all; we can retry 2 to 3 times to assure ourselves that it wasn’t a glitch. In most cases, the cloud server machines able to heal themselves.

Nowadays, the queue management systems are considered the ‘norm’ and by implementing these to communicate with different services, can bring reliability and allows the application work offline for a while.

2. Zero latency

Problem: Round trip interval between client and server is an overhead. Personally, if the service ping time is more than 100ms, the positioning of the server should be reconsidered and recommended to be closer to the clients.

Solution: Your client application should not be chatty by sending too many requests. By gathering information and predicting the data needed beforehand, less calls can be made to the server thus decreasing the chance of overhead.

A content delivery network (CDN) is able to solve the latency problem. It is a system that not many people utilise. You are able to choose the cloud provider with a data centre that is nearest to you.

3. Infinite Bandwidth

Problem: Normally, bandwidth is not an issue until the application is streaming videos, performing VoIP calls or dealing with BLOBs. Nowadays, when using simple data driven applications with methods that allow your application to convert data into big lists and putting conditions into it after retrieving whole chunk can be quite expensive task.

Solution: I would suggest to use an aggregate . An aggregate is a pattern in Domain-DrivenDesign according to Martin Fowler. This involves dividing the chunks into domain and query only the particular domain data needed instead of a mixture of all data.

Command Query Responsibility Segregation (CQRS) is another pattern that aids this issue. This pattern suggests that the database should be split into models i.e. the Write model and the Read model. Create, Update and Delete are assigned to the Write model and Retrieve is assigned from the Read model with data being synchronized between the two models.

4. Secure network

Problem: Normally, the network is not the responsibility of the designers or the developers so it is not a main issue for us. Even cloud applications or virtual private networks (VPNs) cannot take network security for granted.

Solution: Networks are usually never secure so the application designer has to always keep that in mind and design accordingly. Requests should be authenticated and keys must be well encrypted. Another issue to keep in mind is when using third party libraries and using these to improve your application can result in making your application more vulnerable to attacks and security issues.

A solution to this is to secure your configuration files using the Cryptographic Message Syntax (CMS) standard.

5. Topology not changed

Problem: Network topology has the ability to change drastically, especially in a cloud based environment. New servers and instances are added all the time to ensure reliability and availability of servers for your clients. Power outages can occur to data centers which leads to the whole cluster being shut down. An advantage in using the cloud environment is that they usually have 99.9% i.e, 26 seconds of downtime in the entire month. If your server/instances are not replicated on other machines, then this level of service cannot be promised.

Solution: I would suggest to never hardcode your IP address. It is common practice to add an IP address into the configuration file or database but it is not good enough. The transparency needed for this task can be done by using the DNS and Service Bus framework. Another suggestion would be to always test your client’s communication success by shutting down instances/server access and see the results to ensure.

Aforementioned points are mostly discussing how we should not take the network for granted and how to prevent failure in application handling should the network fail in any of the scenarios listed above. A good distributed application designer has to always keep in mind that the network can fail at any time and develop their application to withstand these failures whilst still being available and scalable.

In past you must have heard about SOA service oriented architecture.Microservices is not much different from it except some rules it has to follow like split into small & independent units of application.

Microservices:

A microservice architecture consists of a collection of small, autonomous services. Each service is self contained and should implement a single business capability.

Why moving away from Monolith services to Microservices

Scalability & Availability — two primary requirements of every distributed application. Imagine having one service being used twice as much as other services — the server is spending all its time processing that particular service request whilst neglecting all other lightweight services and being sluggish as a result of this. Instead, why not split that service and put it into a different server or container instance. Those services could be added into multiple container instances and load balancing can occur to improve the distribution of the workload. Lightweight servers can be hosted separately.
Different technology stack Almost every company has systems build with different technologies and people with different skill sets. With the microservice architecture, you are not limited to hiring employees with a certain skill set only; instead, you can host different services built using different technologies that are able to seamlessly communicate with each other using this architecture.
New features can be hosted as separate services on separate server with no downtown for existing services.

Microservice Myths:

With the microservice architecture, it is not all about the small and manageable code base. Splitting code into different business entities is not a new approach for microservice architecture. Microservices allow the structure to be divided not only based on Domain/Business logic but also on the logic that it can be split based on traffic — thus allowing more scalability for the applications.

A Failing service doesn’t impact on others, it does, let’s just say your authentication service is down, how would rest of the application users get authentication tokens. But it prevent your application to be completely inaccessible, as your other services are working so your public pages can get data and application can gracefully handle one small downfall of your application.

Twelve Rules Application

The Twelve-Factor App, as originally introduced by Heroku, eventually found a wide following in the industry. The 12 factors outline a set of best practices to use when building modern applications.

These principles, described below, apply to each individual service — not the collection of services that make up an app — when used in the context of modern distributed applications.

One codebase, with a single root repo, does not share code with another service.
Explicitly declare and deploy dependent libraries with the service, that is, do not share deployment of a library with other services. Instead, treat the service as a unit. This allows you to test your code as a unit.
Do not set any configurations in the code. Instead, externalize the configuration and read from there. Typically, this would apply to environment variables. Environment variables are supported by all platforms making them easy to configure and use.
Handle the unresponsive service dependencies robustly and uniformly. Make no distinction between local and third-party services, and assume that they can be swapped in and out with no code change. This is also all about embracing failure, and actually treating failures as a way of life.
Strictly separate build, release, and run stages.

Build. Build a version of the code repo and gather dependencies.
Release. Combine build step results with configuration to create an immutable release. This gives you the ability to reproduce the issue when someone says, “When I was running this service, I hit this bug.”
Run. Use the execution environment to run the service.

A service is one or more stateless processes that shares nothing. It is generally a good practice to separate stateless and stateful services. A twelve-factor app concentrates on stateless services. Managing stateful services is a complex area and will be discussed in later sections.
Expose the service on a port, and do not use other hosts (for example, a web host). Just have your service open a port to listen for requests, without relying on a host.
Use processes for isolation, and use multiple processes for concurrency. Writing multithreaded services is difficult. Instead, write services single-threaded, and use multiple processes concurrently.
Plan the processes so that they shut down gracefully and start fast. Always drain the request queue before shutting down. Try to not have long initialization sequences when services are starting up.
Keep development, staging, and production environments similar. This makes it more likely to encounter a potential issue earlier in the life cycle.
Treat logs as event streams. Don’t be concerned with routing or storing logs, and attempt to write to or manage log files. Write to the easiest place, stdout in an unbuffered way. Routing and archiving in production is the execution environment job, not the service’s. Keep the logging very simple in the application.
Deploy and run the administrative or maintenance tasks as a separate process, and do not bundle with the service executable. Make changes in a predictable way through scripts. Do not change settings manually.

Adhering to these best practices helps us concentrate on making the services:

Simple to code, build, test, deploy, log, start up, and shut down.
Lightweight, with fewer dependencies (for example; OS, language, runtime, or libraries), this allowing them to run fast and use less memory.
Reproducible, giving similar results on the development environment as on the test,
staging, and production environments.

Below are some final key points which i believe depends on application scope and domain but would like to mention as they assure reliability and extend-ability of distributed application.

Messaging communication

Resource efficient: The client does not wait for the service reply, thus there are no blocked threads or long lived locks.
Resilient: The client and service instances can be taken down or move to another server without any effect because the messages will be kept by the messaging service.
Elastic: The queue length can be used to determine the need to scale the service up and down. If the queue length is increasing over time, the orchestrator can decide to increase the instance number. If it is getting smaller over time, the orchestrator can decrease the number of instances.

Deployment

Rolling update in this case you partially update the service/application, so if there are 4 instance of application then you can take down 2 first and replace with new version so other 2 can keep application alive but it could hurt scale-ability and make application slow until you complete update.
Deleted & Update as name suggest, you first take down all instances of services and then in one go replace all instances. In this case down-time can happen and even it doesn’t guarantee if new version will work so if that doesn’t then you have to take that new version down and bring back so its quite risky but less costly.
Blue-green deployment it brings up new server where you can deploy new version and some of the traffic you can send to new server to assure if that new version is reliable and then you can switch off old server its obviously more costly deployment but more reliable and doesn’t hurt scale-ability

Backup and Restore

Document your backup and restore so in case of emergency if somebody has to do backup/restore of your application can do that
Validate your backup after making backup always test your backups as during time of disaster it could fail so you might not have any choice to bring it back

It is impossible to cover everything related with the challenges of developing distributed applications in one article so I ‘ll be writing more on related topics in the future. Aforementioned guidelines can help you in building reliable and scale-able applications of any level; even if its non-distributed.

Happy Architect-ing