How to judge a Distributed System based on its Scalability, Reliability, Availability, Efficiency, and Manageability.


Scalability is the capability of a system, process, or network to grow and manage increased demand. Any distributed system that can continuously evolve to support the growing amount of work is considered scalable.

A system may have to scale for many reasons, such as increased data volume or an increased amount of work, e.g., the number of transactions. A scalable system should achieve this scaling without performance loss.

Generally, the performance of a system, even one designed (or claimed) to be scalable, declines as the system grows because of management or environmental costs. For instance, network speed may become slower because machines…

Let’s design a file hosting service like Dropbox or Google Drive to enable users to store their data on remote servers.

Usually, these servers are maintained by cloud storage providers and made available to users over a network (typically through the Internet). Users pay for their cloud data storage monthly.

Similar Services: OneDrive, Google Drive

Why Cloud Storage?

Cloud file storage services have become very popular recently because they simplify the storage and exchange of digital resources among multiple devices. The shift from single personal computers to multiple devices with different platforms and operating systems, such as smartphones and tablets, and their mobile access from various geographical locations at any time, is believed to be accountable for the massive popularity of cloud…

Let’s design a web service to store plain text and get a randomly generated URL to access it.

Similar Services:,,

What is Pastebin?

Pastebin-like services enable users to store plain text or images over the network (typically the Internet) and generate unique URLs to access the uploaded data. Such services are also used to share data over the web quickly, since users only need to pass the URL to let other users see it.

If you haven’t used Pastebin before, please try creating a new ‘Paste’ there and spend some time going through the different options the service offers. This will help you understand this chapter better.
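The idea of storing text and handing back a randomly generated URL can be sketched in a few lines. This is only an illustrative in-memory sketch, not the design the chapter goes on to develop: the `PasteStore` class, the `paste.example` base URL, and the 8-character key length are all assumptions made up for this example.

```python
import secrets
import string

# Characters allowed in a paste key (an assumption for this sketch).
ALPHABET = string.ascii_letters + string.digits


def new_paste_key(length: int = 8) -> str:
    """Return a random key built from letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class PasteStore:
    """Hypothetical in-memory store mapping random keys to paste text."""

    def __init__(self, base_url: str = "https://paste.example/"):
        self._base_url = base_url
        self._pastes = {}

    def create(self, text: str) -> str:
        """Store the text and return the URL that gives access to it."""
        key = new_paste_key()
        while key in self._pastes:  # retry on the (unlikely) collision
            key = new_paste_key()
        self._pastes[key] = text
        return self._base_url + key

    def read(self, url: str) -> str:
        """Look up the paste text for a URL produced by create()."""
        key = url[len(self._base_url):]
        return self._pastes[key]


store = PasteStore()
url = store.create("hello, world")
print(url)
print(store.read(url))
```

Anyone who holds the URL can read the paste, which is what makes sharing as simple as passing the link along.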

Requirements and Goals of the System

Our Pastebin service should meet the following requirements:

Functional Requirements

Let’s design an instant messaging service like Facebook Messenger, where users can send text messages to each other.

What is Facebook Messenger?

Facebook Messenger is a software application that provides text-based instant messaging services to its users. Messenger users can chat with their Facebook friends from both their cell phones and the Facebook website.

Requirements and Goals of the System

Our Messenger should meet the following requirements:

Functional Requirements:

  1. Messenger should support one-on-one conversations between users.
  2. Messenger should keep track of the online/offline statuses of its users.
  3. Messenger should support the persistent storage of chat history.

Non-functional Requirements:

  1. Users should have a real-time chat experience with minimum latency.
  2. Our system should be highly consistent; users should see the same chat history on all their devices.
  3. Messenger’s high availability is desirable; we can tolerate lower…

There are two main types of solutions in the world of databases: SQL and NoSQL, or relational and non-relational databases. The two differ in how they were built, the kind of information they store, and how they store it.

Relational databases are structured and have predefined schemas, like phone books that store phone numbers and addresses. Non-relational databases are unstructured, distributed, and have a dynamic schema, like file folders that hold everything from a person’s address and phone number to their Facebook ‘likes’ and online shopping preferences.
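The phone book versus file folder contrast can be made concrete with a small sketch. It uses Python's standard `sqlite3` module for the relational side; for the non-relational side, a plain list of dictionaries stands in for a document store (it is not the API of any real NoSQL product). The table name, column names, and sample records are assumptions invented for illustration.

```python
import sqlite3

# Relational side: the schema is declared up front, like a phone book.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (name TEXT NOT NULL, phone TEXT NOT NULL, address TEXT)"
)
conn.execute(
    "INSERT INTO contacts VALUES (?, ?, ?)", ("Ada", "555-0100", "1 Main St")
)

# Writing a field the schema does not know about is rejected.
try:
    conn.execute(
        "INSERT INTO contacts (name, phone, likes) VALUES (?, ?, ?)",
        ("Bob", "555-0101", "hiking"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]

# Non-relational side: each record carries its own shape, like a file folder.
documents = [
    {"name": "Ada", "phone": "555-0100"},
    {"name": "Bob", "phone": "555-0101", "likes": ["hiking"], "shopping": {"size": "M"}},
]

print("extra column rejected by the fixed schema:", schema_rejected)
print("rows stored in the table:", row_count)
print("fields per document:", [sorted(doc) for doc in documents])
```

With the fixed schema, adding a new field means altering the table first; in the document-style list, the second record simply carries extra fields.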


Relational databases store data in rows and columns. Each row…

Why do we even need Consistent Hashing and how does it work?

A Distributed Hash Table (DHT) is one of the fundamental components used in distributed scalable systems.

A hash table needs a key, a value, and a hash function, where the hash function maps the key to the location where the value is stored.

index = hash_function(key)

Suppose we are designing a distributed caching system. Given ’n’ cache servers, an intuitive hash function would be ‘key % n.’ It is simple and commonly used. But it has two significant drawbacks:

  1. It is NOT horizontally scalable. Whenever a new cache host is added to the system, most existing mappings are broken. It will be a…
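The scaling drawback of ‘key % n’ can be measured directly. The sketch below counts how many keys land on a different server when the number of cache hosts grows from 4 to 5. The key names, the host counts, and the choice of MD5 as a stable hash are assumptions for illustration; Python's built-in `hash()` is avoided because it is salted per process for strings.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic integer hash of a string key."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def server_for(key: str, n: int) -> int:
    """The 'key % n' placement rule: index of the cache server for a key."""
    return stable_hash(key) % n


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server changes when the host count changes."""
    moved = sum(1 for k in keys if server_for(k, old_n) != server_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_n=4, new_n=5)
print(f"{fraction:.0%} of keys changed servers when going from 4 to 5 hosts")
```

A key keeps its server only when its hash gives the same remainder modulo 4 and modulo 5, which for a uniform hash happens about one time in five. Roughly four out of five cached entries are therefore looked up on the wrong host after a single server is added.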

Ajax Polling, Long-Polling, WebSockets, and Server-Sent Events are popular ways for clients such as web browsers to communicate with web servers.

First, let’s start with understanding what a standard HTTP web request looks like. The following is the sequence of events for a regular HTTP request:

  1. The client opens a connection and requests data from the server.
  2. The server calculates the response.
  3. The server sends the response back to the client on the opened request.

Ajax Polling

Polling is a standard technique used by the vast majority of AJAX applications. The basic idea is that the client repeatedly polls (or requests) a server for data. The client makes a request and waits for the server to respond with data. …
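The polling loop described above can be sketched without any network code. Here a hypothetical `FakeServer` object stands in for the web server and only has data ready on its third request; `poll`, `interval_s`, and `max_attempts` are names made up for this example, and a real client would issue an HTTP request where `fetch()` is called.

```python
import time
from typing import Callable, Optional


def poll(fetch: Callable[[], Optional[str]], interval_s: float, max_attempts: int) -> Optional[str]:
    """Ask the server for data at a fixed interval until it returns some."""
    for _ in range(max_attempts):
        data = fetch()           # one request/response round trip
        if data is not None:
            return data          # the server had something for us
        time.sleep(interval_s)   # empty response: wait, then ask again
    return None


class FakeServer:
    """Stand-in for a web server that has data only from the Nth request on."""

    def __init__(self, ready_after: int):
        self.requests = 0
        self.ready_after = ready_after

    def handle(self) -> Optional[str]:
        self.requests += 1
        return "new message" if self.requests >= self.ready_after else None


server = FakeServer(ready_after=3)
result = poll(server.handle, interval_s=0.01, max_attempts=10)
print(result, "after", server.requests, "requests")
```

The first two requests come back empty, which shows the main cost of this approach: every poll is a full round trip, whether or not the server has anything new.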

Caching enables you to make vastly better use of the resources you already have

Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again. A cache is like short-term memory: it has a limited amount of space but is typically faster than the original data source and contains the most recently accessed items.

They are used in almost every computing layer: hardware, operating systems, web browsers, web applications, and more. Caches can exist at all levels in architecture but are often found nearest to the front end, where they are implemented to return data quickly without taxing downstream levels.
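A small least-recently-used (LRU) cache shows the "short-term memory" idea in code: limited space, the most recently accessed items kept, and the slower source consulted only on a miss. LRU is just one possible eviction policy, chosen here as an assumption; the `LRUCache` class and the `load` callback standing in for the original data source are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Keeps at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity: int, load: Callable[[str], str]):
        self._capacity = capacity
        self._load = load            # the slower, original data source
        self._items = OrderedDict()  # ordered from oldest to newest access
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._load(key)               # fall back to the data source
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # drop the least recently used
        return value


cache = LRUCache(capacity=2, load=lambda key: key.upper())
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print("hits:", cache.hits, "misses:", cache.misses)
```

In this access pattern only the second request for "a" is served from the cache. By the time "b" is requested again it has already been evicted to make room for "c", so a cache that is too small for the working set gives little benefit.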

Application server cache

Placing a cache directly on a request…

Why do we even need Load Balancers and how do they work?

A load balancer (LB) is a critical component of any distributed system. It helps spread the traffic across a cluster of servers to improve the responsiveness and availability of applications, websites, or databases. The LB also keeps track of the status of all the resources while distributing requests. If a server cannot take new requests, is not responding, or has an elevated error rate, the LB stops sending traffic to it.

Typically, a load balancer sits between the client and the server, accepting incoming network and application traffic and distributing it across multiple backend servers using various algorithms…
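The two behaviors just described, spreading requests and skipping servers that are not healthy, can be sketched as follows. Round robin is used here only as one simple distribution strategy; the `RoundRobinBalancer` class, its method names, and the server names are assumptions for illustration, and the health checks that would call `mark_down` are left out.

```python
class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        """Stop sending traffic to a server (e.g. after failed health checks)."""
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        """Resume sending traffic to a known server."""
        if server in self._servers:
            self._healthy.add(server)

    def pick(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
picks = [lb.pick() for _ in range(4)]
print(picks)
```

While "app-2" is marked down, requests alternate between the two remaining servers; calling `mark_up("app-2")` puts it back into the rotation.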

Let’s design a photo-sharing service like Instagram, where users can upload photos to share them with other users.

Why Instagram?

Instagram is a social networking service that enables users to upload and share their pictures and videos publicly or privately with other users.

We plan to design a simpler version of Instagram for this exercise, where users can share photos and follow other users. The newsfeed for each user will consist of top images from all the people the user follows.

Requirements and Goals of the System

We will focus on the following set of requirements while designing Instagram:

Functional Requirements

  1. Users should be able to upload, download and view photos.
  2. Users can follow other users.
  3. The system should generate and display a user’s newsfeed consisting of top…

Crack Fang

Understand the technical details behind all your favorite products. We help you put your best foot forward so you can get through the FANG door.
