How To Design Software Systems on The Napkin’s Back

Every software engineer, at some point in their career, will be involved in designing systems. This blog post will highlight the steps involved in a general approach to solving design problems.

Step One. Gather Basic Requirements and Clarify Assumptions.

The first step in the design process is to start asking questions. We need to learn about the exact scope of the problem that our software system will try to solve. It’s good to keep in mind that the initial design questions are open-ended. Usually, they don’t have one correct answer. It becomes crucial to clarify requirements early in the design process. Time spent asking questions and explaining answers will save a lot of engineering time later on in the development process.
For example, let’s assume we are designing a service that allows users to exchange ideas and communicate over the Internet. Our initial set of questions should focus on basic information about the system’s goals and requirements.

  • Should the system allow users to post messages and subscribe to messages sent by other users?
  • Should the system display messages in chronological order?
  • Should users communicate using text only, or will they be able to attach photos and video clips to the messages?
  • What about other attachment types, for example Word documents or PDF files?
  • Should the system design cover only the backend with appropriate interfaces, or both the backend and the frontend?
  • Should the system provide search functionality? If yes, on what terms?
  • Should the system provide usage statistics, like, for example, most popular messages or topics?
  • Should the system allow users to subscribe to specific message topics?

The few questions listed above already define an initial scope for the design of the software system.

Step Two. Perform Rough Estimation.

In this step, we need to think about the scale of the software system we are designing. The estimation step is essential as it forms an initial idea of the system requirements, which later drives decisions about load balancing, partitioning, scaling and caching.

Some of the questions we try to answer in this step are:

  • What is the scale of the system?
  • What is the number of messages per second or minute that the system should handle?
  • What is the number of messages per second with attachments?
  • How many users per second will be viewing the messages?
  • How much storage will the system need to handle predicted traffic (messages and messages with attachments)?
  • Will the system require multiple types of storage for the data?
  • What is the expected network bandwidth?
  • How will the system handle the load balance between multiple servers?
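
To make these questions concrete, the arithmetic can be captured in a few lines of Python. This is only a minimal sketch: the class and function names are hypothetical, and every input figure in the usage example (one million daily users, ten messages per user, and so on) is a made-up assumption for our imaginary messaging service that should be replaced with numbers gathered in step one.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class TrafficAssumptions:
    """Hypothetical inputs for a rough estimate; none of them come from real data."""
    daily_active_users: int
    messages_per_user_per_day: float
    attachment_ratio: float      # fraction of messages that carry an attachment
    avg_message_bytes: int
    avg_attachment_bytes: int
    reads_per_write: float       # how many times an average message is viewed


def estimate(t: TrafficAssumptions) -> dict[str, float]:
    """Return average (not peak) traffic and storage figures."""
    messages_per_day = t.daily_active_users * t.messages_per_user_per_day
    writes_per_second = messages_per_day / SECONDS_PER_DAY
    text_bytes_per_day = messages_per_day * t.avg_message_bytes
    attachment_bytes_per_day = (
        messages_per_day * t.attachment_ratio * t.avg_attachment_bytes
    )
    bytes_per_day = text_bytes_per_day + attachment_bytes_per_day
    return {
        "messages_per_day": messages_per_day,
        "writes_per_second": writes_per_second,
        "reads_per_second": writes_per_second * t.reads_per_write,
        "ingress_bytes_per_second": bytes_per_day / SECONDS_PER_DAY,
        "storage_bytes_per_year": bytes_per_day * 365,
    }


if __name__ == "__main__":
    assumed = TrafficAssumptions(
        daily_active_users=1_000_000,
        messages_per_user_per_day=10,
        attachment_ratio=0.1,
        avg_message_bytes=200,
        avg_attachment_bytes=500_000,
        reads_per_write=100,
    )
    for name, value in estimate(assumed).items():
        print(f"{name}: {value:,.2f}")
```

The results are averages. Peak traffic is usually several times higher, so it is worth multiplying by a safety factor agreed with the stakeholders before sizing servers or storage.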

Step Three. Define System Interfaces.

We need to learn and define how the system will exchange data with other systems. We need to determine which API (Application Programming Interface) to design to allow inter-system communication.
The APIs will form contracts expected from the software system.
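
As an illustration, the contract for our imaginary messaging service could be sketched in Python as below. The operation names (`post_message`, `subscribe`, `timeline`), their parameters and the newest-first ordering are all assumptions made for this example. The in-memory class exists only to show that the contract can be exercised before any real backend is chosen.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MessageView:
    """What the API returns to callers (hypothetical shape)."""
    message_id: int
    author_id: int
    content: str
    timestamp: float


class MessagingApi(Protocol):
    """The contract other systems rely on; implementations may vary."""

    def post_message(self, author_id: int, content: str, timestamp: float) -> int: ...
    def subscribe(self, follower_id: int, author_id: int) -> None: ...
    def timeline(self, user_id: int, limit: int = 20) -> list[MessageView]: ...


class InMemoryMessagingApi:
    """Toy implementation used only to try out the contract."""

    def __init__(self) -> None:
        self._messages: list[MessageView] = []
        self._subscriptions: dict[int, set[int]] = {}

    def post_message(self, author_id: int, content: str, timestamp: float) -> int:
        if not content:
            raise ValueError("content must not be empty")
        message_id = len(self._messages) + 1
        self._messages.append(MessageView(message_id, author_id, content, timestamp))
        return message_id

    def subscribe(self, follower_id: int, author_id: int) -> None:
        self._subscriptions.setdefault(follower_id, set()).add(author_id)

    def timeline(self, user_id: int, limit: int = 20) -> list[MessageView]:
        followed = self._subscriptions.get(user_id, set())
        visible = [m for m in self._messages if m.author_id in followed]
        # Assumption: newest messages first.
        visible.sort(key=lambda m: m.timestamp, reverse=True)
        return visible[:limit]


if __name__ == "__main__":
    api: MessagingApi = InMemoryMessagingApi()
    api.post_message(author_id=1, content="hello", timestamp=1.0)
    api.subscribe(follower_id=2, author_id=1)
    print(api.timeline(user_id=2))
```

Writing the contract down this early makes it easier to spot missing operations, for example search or topic subscriptions, while changes are still cheap.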


Step Four. Model Data Flow in the System.

Defining a data model will establish and clarify how data flow between system components should look. Later in the design process, this information will help with data partitioning. At this stage, we need to identify multiple sub-systems and entities in the overall design.
A clear picture of the system’s entities will allow projecting how they interact with each other. Also, we will be able to answer questions related to data management like encryption at rest and in transport, storage approach and data transportation.

A list of potential data entities in the system may involve:

  • Message: ID, Content, TimeStamp, and Location (if the system requires this functionality)
  • User: ID, Name, Email, LastLogin, Age, Nationality, etc.
  • ObservedMessages: UserID, MessageID, TimeStamp, etc.
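
The entities above can be written down as plain Python data classes to check that nothing obvious is missing. The field names follow the list above, but the field types (integer IDs, a latitude and longitude pair for the location) are assumptions made for this sketch, not decisions the design has already taken.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Message:
    id: int
    content: str
    timestamp: datetime
    # Assumption: location is an optional (latitude, longitude) pair.
    location: Optional[tuple[float, float]] = None


@dataclass(frozen=True)
class User:
    id: int
    name: str
    email: str
    last_login: datetime
    age: int
    nationality: str


@dataclass(frozen=True)
class ObservedMessage:
    """Records that a user observed a message at a given time."""
    user_id: int
    message_id: int
    timestamp: datetime


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    user = User(1, "Ada", "ada@example.com", now, 36, "British")
    message = Message(id=10, content="Hello, world", timestamp=now)
    seen = ObservedMessage(user_id=user.id, message_id=message.id, timestamp=now)
    print(user, message, seen, sep="\n")
```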

In this step, we may also form a clear picture of the data management system that we will be using. Will we use PostgreSQL, MySQL or another SQL database, or a NoSQL system like DynamoDB or MongoDB? Another, more likely, option is that we will need different storage types for different system components. This approach is even more relevant when dealing with modern, cloud-based systems where we may leverage object storage like AWS S3.

Step Five. Design with a High-level Approach.

In this part of the design, it is valuable to start drawing a high-level diagram or multiple diagrams. It is crucial to visualise the concepts we discussed in the first four steps. We could draw a diagram of the core components that together solve the initial business problem.
For our imaginary messaging platform, we would need multiple application servers responsible for handling the request-response cycle. In front of them, we would place a load balancer that takes care of network traffic distribution.
Depending on the ratio of read requests to write requests, we may need separate servers for handling writes and for handling reads.
We would need an efficient database capable of handling the estimated traffic and possible spikes in the number of requests. For storing message attachments, we would need either file system storage or object storage like AWS S3.
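
To illustrate what the load balancer in front of the application servers does, here is a minimal round-robin sketch in Python. Round-robin is only one possible distribution strategy, chosen here because it is the simplest to show. The class name and the server names are hypothetical, and a real deployment would normally rely on an existing load balancer rather than hand-written code.

```python
class RoundRobinBalancer:
    """Hands out servers in turn, skipping those marked as down."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._healthy = set(servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def pick(self) -> str:
        # Try each server at most once, starting from the current position.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([balancer.pick() for _ in range(4)])
    balancer.mark_down("app-2")
    print([balancer.pick() for _ in range(4)])
```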

Step Six. Design Details of the System.

At this stage of the design process, we need to dig deeper into each system component. It is crucial to have enough business and technical requirements to reason about each component’s elements and expected characteristics. It is always vital to remember that engineering is about tradeoffs. We may need to produce two or three design approaches, each of which addresses specific technical problems.

Some of the questions that we should consider at the detailed level design:

  • If we store a considerable amount of data, do we need to plan data partitioning?
  • Are we going to use a managed data store like RDS or Aurora, or host the databases ourselves?
  • How will we handle traffic spikes and request rate-limiting?
  • Where should we introduce cache in the system, at which layer?
  • Do any other components require load balancing strategies?
  • How will users be authenticated and authorised to use the system?
  • How will backup data strategies improve the security and reliability of the system?
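
As one example of the level of detail this step calls for, request rate-limiting is often handled with a token bucket. The sketch below is a minimal single-process version in Python. The class name, the parameters and the injectable clock are assumptions made for illustration; a production system with several application servers would need shared state, which this sketch does not cover.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rate_per_second <= 0 or capacity <= 0:
            raise ValueError("rate_per_second and capacity must be positive")
        self._rate = rate_per_second
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_second=5, capacity=10)
    accepted = sum(bucket.allow() for _ in range(20))
    print(f"accepted {accepted} of 20 immediate requests")
```

The capacity controls how large a traffic spike is tolerated, while the rate controls the long-term average, so the two values map directly onto the estimates from step two.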

Step Seven. Resolve Possible Bottlenecks.

We should have a relatively clear picture of how the system should look and behave at this stage. We should also be confident that our design satisfies primary requirements.
Now it would be a good time to reflect on the design and answer some of the following questions:

Does the system have a single point of failure? If yes, how can we eliminate this risk? Again, answers to this question will vary depending on whether we design an on-prem or a cloud-based solution.
At the data storage level, do we have enough replicas of the database? Do we need to split traffic and use read-only database replicas? If yes, how many?
If some of the database replicas are not reachable, are we still able to keep the system up? Does the designed system meet the SLA contracts with our customers?
How are we monitoring events inside our system? Do we have a clear picture of how separate components behave under low and heavy traffic?
Do we have the ability to react quickly if some components go down? How fast will we know that the system is not performing correctly? What solution, if any, could we employ to ensure self-healing is in place?
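
The replica and SLA questions lend themselves to a quick calculation. The Python sketch below assumes that replicas fail independently and that the system stays up as long as at least one replica is reachable. Both are simplifying assumptions that rarely hold exactly in practice, and the function names and availability figures are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(replica_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent replicas is up."""
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("replica_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - replica_availability) ** replicas


def replicas_needed(
    replica_availability: float, target: float, max_replicas: int = 10
) -> int:
    """Smallest replica count whose combined availability meets `target`."""
    for count in range(1, max_replicas + 1):
        if combined_availability(replica_availability, count) >= target:
            return count
    raise ValueError("target not reachable within max_replicas")


def yearly_downtime_minutes(availability: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    single = 0.99   # assumed availability of one replica
    sla = 0.999     # assumed SLA target
    count = replicas_needed(single, sla)
    print(f"replicas needed: {count}")
    print(f"allowed downtime: {yearly_downtime_minutes(sla):.1f} minutes per year")
```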

The seven stages of quick, back-of-the-napkin system design described above can form a backbone for more detailed discussions. This guide covers the most critical aspects of system design.

In future blog posts, we will discuss how to apply this seven-stage system to design various software systems in a more detailed way.
