What is System Design?

System design is the process of defining the scope, architecture, and components of a computer system. The goal of system design is to create a system blueprint that satisfies all stakeholders’ requirements.

Framework for system design

This framework establishes 4 fundamental steps for system design:

  • Understand the problem and establish the scope
  • Propose high-level design and get buy-in
  • Design deep dive
  • Wrap up

Understand the problem

  • Lay down what is important
  • Ask as many questions as possible
  • Clarify the requirements
  • Establish which features are required, in order of priority
  • Identify who the users are

Establish the non-functional requirements

  • Performance
  • Scalability
  • Security

Propose a high-level design

  • Take a top-down approach
  • Define the APIs first: how users interact with the system
  • Use REST for APIs unless otherwise specified

Create a high-level diagram

  • At the top level are the users
  • At the second level are the load balancer, API gateway, etc.
  • At the third level are the backend services
  • At the fourth level are the database services

Don’t go into too much detail or overdesign. Leave potential improvements for later.

Design deep dive

  • Identify areas that can be problematic
  • Come up with alternative solutions
  • Discuss trade-offs

Wrap up

Provide a very quick summary of the design and what makes it unique.

What are three common ways a load balancer distributes requests?

  • Random: Each request is sent to a randomly chosen host
  • Round robin: Requests are sent to the hosts in turn, cycling through them in a fixed order
  • Weighted: Each host receives a share of requests proportional to its weight, which can be derived from metrics such as response time, memory, or CPU usage
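The three strategies can be sketched in a few lines of Python. This is a minimal, single-process illustration only; the class name `LoadBalancer`, the host names, and the weights are hypothetical, and a real load balancer would also track host health and remove failed hosts.

```python
import itertools
import random


class LoadBalancer:
    """Hypothetical toy load balancer illustrating three distribution strategies."""

    def __init__(self, hosts, weights=None, seed=None):
        self.hosts = list(hosts)
        # Every host gets weight 1 (an equal share) when no weights are given.
        self.weights = list(weights) if weights else [1] * len(self.hosts)
        self._cycle = itertools.cycle(self.hosts)
        self._rng = random.Random(seed)

    def random_host(self):
        # Random: every host is equally likely to be picked.
        return self._rng.choice(self.hosts)

    def round_robin_host(self):
        # Round robin: hand out hosts in a fixed, repeating order.
        return next(self._cycle)

    def weighted_host(self):
        # Weighted: pick a host with probability proportional to its weight.
        return self._rng.choices(self.hosts, weights=self.weights, k=1)[0]


lb = LoadBalancer(["a", "b", "c"], weights=[5, 3, 1], seed=42)
print([lb.round_robin_host() for _ in range(5)])  # ['a', 'b', 'c', 'a', 'b']
```

With weights `[5, 3, 1]`, host `a` should receive roughly five ninths of the weighted picks over many requests.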

What are the key differences between SQL and NoSQL databases?

Relational database

  • Structured data
  • Predefined schema
  • Data is in rows and columns

Non-relational database

  • Unstructured or semi-structured data
  • Designed to be distributed across many nodes
  • Dynamic schema
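The schema difference can be seen with Python's built-in `sqlite3` module. A plain list of dictionaries stands in for a document-oriented NoSQL store here; that is an assumption for illustration, not a real database, and the `users` table and field names are hypothetical.

```python
import sqlite3

# Relational: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # Writing to a column the schema does not define is rejected.
    conn.execute(
        "INSERT INTO users (id, name, twitter) VALUES (?, ?, ?)",
        (2, "Grace", "@grace"),
    )
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

# Document style (dynamic schema): each record can carry different fields.
documents = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace", "twitter": "@grace", "tags": ["navy", "cobol"]},
]

print(schema_enforced)       # True
print(sorted(documents[1]))  # ['id', 'name', 'tags', 'twitter']
```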

What does the ACID acronym mean in database design?

It names four properties of database transactions that aim to reduce anomalies and protect the integrity of a database:

  • Atomicity: All the statements in a transaction are treated as a single unit. Either the entire transaction is executed or none of it is.
  • Consistency: Ensures that transactions only change data in predictable ways, moving the database from one valid state to another without violating its constraints
  • Isolation: When multiple users read from and write to the same table concurrently, their transactions are isolated so that they do not interfere with each other
  • Durability: Guarantees that successfully committed changes to data persist even in the event of system failures
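Atomicity is easiest to see with a transfer between two accounts. The sketch below uses Python's built-in `sqlite3` module; the `accounts` table and the `transfer` helper are hypothetical. The `CHECK` constraint also illustrates consistency: a transfer that would leave a negative balance is rejected, and the rollback undoes the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls the whole transaction back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # The CHECK constraint fails here if src would go negative.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 80}

try:
    transfer(conn, "alice", "bob", 500)  # would overdraw alice
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 80}: bob's credit was rolled back
```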

What is the CAP theorem?

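The CAP theorem states that a distributed system can guarantee at most two of the following three properties at the same time. It is often drawn as a triangle: a particular system lies on one side, and the middle, which would satisfy all three, is not achievable. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.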

  • Consistency: All nodes see the same data at all times
  • Availability: Every request receives a non-error response, though it may not reflect the most recent write
  • Partition tolerance: The system continues to work despite message loss or partial failure
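A toy model can make the trade-off concrete. The `ReplicatedRegister` class below is hypothetical: it keeps one value on two replicas and, while the replicas are partitioned, either rejects writes (choosing consistency) or accepts them locally (choosing availability, at the cost of a stale replica). Real systems add reconciliation once the partition heals.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request in order to stay consistent."""


class ReplicatedRegister:
    """Hypothetical toy model: one value stored on two replicas."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [0, 0]  # value held by replica 0 and replica 1
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            # Replicas can communicate: the write reaches both of them.
            self.values = [value, value]
        elif self.mode == "CP":
            # Cannot confirm the write on the other replica: refuse it.
            raise Unavailable("network partition: write rejected")
        else:
            # AP: accept the write locally; the other replica is now stale.
            self.values[replica] = value

    def read(self, replica):
        return self.values[replica]


cp = ReplicatedRegister("CP")
cp.partitioned = True
try:
    cp.write(0, 42)
except Unavailable as err:
    print(err)  # network partition: write rejected

ap = ReplicatedRegister("AP")
ap.partitioned = True
ap.write(0, 42)
print(ap.read(0), ap.read(1))  # 42 0
```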

Components of System Design

  • Load balancers
  • Key-value stores
  • Blob storage & databases
  • Rate limiters
  • Monitoring systems
  • Distributed system messaging queues
  • Distributed unique ID generators
  • Distributed search
  • Distributed logging services
  • Distributed task schedulers

What is a Load Balancer?

A load balancer is a resource that distributes requests across servers or other resources.

What is the difference between layer 4 and layer 7 load balancing?

Layer 4 load balancers distribute the load based on transport-layer information, such as source and destination IP addresses and ports, while layer 7 load balancers can distribute requests based on the content of the requests, such as HTTP methods and URLs.
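The difference in what each layer can see is sketched below. The backend addresses, pool names, and routing table are hypothetical. The layer 4 function hashes the connection's addresses and ports so that every packet of one connection reaches the same backend, while the layer 7 function inspects the HTTP method and path.

```python
import zlib

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical server pool


def layer4_pick(src_ip, src_port, dst_ip, dst_port):
    # Layer 4: only addresses and ports are visible, not the HTTP request.
    # A stable hash keeps each connection on the same backend.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return BACKENDS[zlib.crc32(key) % len(BACKENDS)]


# Layer 7: the request is parsed, so routing can depend on method and path.
ROUTES = [
    ("GET", "/static/", "static-pool"),
    ("POST", "/api/", "write-pool"),
    (None, "/api/", "read-pool"),  # None matches any method
]


def layer7_pick(method, path):
    for route_method, prefix, pool in ROUTES:
        if (route_method is None or route_method == method) and path.startswith(prefix):
            return pool
    return "default-pool"


print(layer7_pick("POST", "/api/orders"))  # write-pool
```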

What is a key-value store?

A key-value store is a type of NoSQL database that stores data as key-value pairs. Key-value stores are well suited to frequently accessed data, because a lookup by key is simple and often faster than a query against a relational database. They are also typically easier to scale and to distribute across nodes.
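The ease of distribution comes from the fact that a key alone determines where its value lives. In this hypothetical sketch, each dictionary stands in for a separate node, and a stable hash of the key picks the shard; a production store would add replication and network access.

```python
import zlib


class ShardedKVStore:
    """Hypothetical in-memory key-value store split across N shards."""

    def __init__(self, num_shards=4):
        # Each dict stands in for one storage node.
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, key):
        # A stable hash of the key decides which shard owns it, so every
        # operation touches exactly one shard.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key, default=None):
        return self._shard(key).get(key, default)

    def delete(self, key):
        self._shard(key).pop(key, None)


store = ShardedKVStore(num_shards=4)
store.put("session:42", {"user": "ada"})
print(store.get("session:42"))  # {'user': 'ada'}
```

Simple modulo hashing moves most keys when the number of shards changes; consistent hashing is a common technique for limiting that reshuffling.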

What is blob storage?

Blob storage, also known as object storage, is a type of storage designed to hold large amounts of unstructured data, such as images, videos, and documents.

Blob storage systems are designed to be highly scalable and handle large numbers of requests.
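A minimal in-memory sketch shows the typical shape of an object store interface: opaque bytes addressed by a bucket and a key, plus a little metadata. The `BlobStore` class and its methods are hypothetical and do not mirror any particular provider's API; the SHA-256 content hash used as a version tag is also an assumption.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlobObject:
    data: bytes
    content_type: str
    etag: str  # hash of the content, useful for detecting changes


@dataclass
class BlobStore:
    """Hypothetical toy object store: buckets holding opaque byte blobs."""

    buckets: dict = field(default_factory=dict)

    def put(self, bucket, key, data, content_type="application/octet-stream"):
        etag = hashlib.sha256(data).hexdigest()
        self.buckets.setdefault(bucket, {})[key] = BlobObject(data, content_type, etag)
        return etag

    def get(self, bucket, key):
        return self.buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        # The namespace is flat: "folders" are just shared key prefixes.
        return sorted(k for k in self.buckets.get(bucket, {}) if k.startswith(prefix))


blobs = BlobStore()
blobs.put("media", "images/logo.png", b"fake-png-bytes", "image/png")
blobs.put("media", "videos/intro.mp4", b"fake-mp4-bytes", "video/mp4")
print(blobs.list_keys("media", prefix="images/"))  # ['images/logo.png']
```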

What are databases?

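Databases store data organized according to a data model, such as the tables of a relational database or the documents of a document store, so that it can be queried and updated efficiently.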

What are rate limiters?

Rate limiters are components designed to protect a system by limiting the rate at which an action can be performed. They can be divided into:

  • Request rate limiters
  • Action rate limiters
  • User rate limiters
  • Token bucket rate limiters
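The token bucket algorithm named in the last item can be sketched as follows. The `TokenBucket` class is hypothetical; keeping one bucket per user or per action would give the user and action limiters from the list. The injectable `clock` makes the behaviour easy to test.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject or delay the request


limiter = TokenBucket(capacity=10, rate=5)  # bursts of 10, then 5 requests/s
print(limiter.allow())  # True
```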

What is a monitoring system?

A monitoring system collects, analyses, and reports various metrics relating to a system or application. Some common types are:

  • Network monitoring systems
  • System monitoring systems
  • Application monitoring systems
  • Infrastructure monitoring systems
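The collect, analyse, and report loop can be sketched with a hypothetical `MetricsCollector` that stores samples per metric, computes a nearest-rank percentile, and produces an alert message when a threshold is breached. Real monitoring systems also aggregate over time windows, persist, and visualize these metrics.

```python
import math
from collections import defaultdict


class MetricsCollector:
    """Hypothetical collector: stores samples per metric and summarizes them."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, metric, value):
        # Collect: append one observation, e.g. a request latency in ms.
        self.samples[metric].append(value)

    def percentile(self, metric, p):
        # Analyse: nearest-rank percentile, the smallest value such that
        # at least p% of the samples are less than or equal to it.
        values = sorted(self.samples[metric])
        rank = max(1, math.ceil(p * len(values) / 100))
        return values[rank - 1]

    def check(self, metric, p, threshold):
        # Report: return an alert message if the percentile breaches the threshold.
        value = self.percentile(metric, p)
        if value > threshold:
            return f"ALERT {metric} p{p}={value} > {threshold}"
        return None


collector = MetricsCollector()
for latency in range(1, 101):
    collector.record("latency_ms", latency)
print(collector.check("latency_ms", 95, 90))  # ALERT latency_ms p95=95 > 90
```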

What is a distributed system messaging queue?

A distributed system messaging queue is a system that enables messages to be exchanged asynchronously between different nodes in a system. They can be:

  • Point-to-point: Messages are delivered to a specific recipient
  • Publish/subscribe (pub/sub): Messages are published to a topic and delivered to all subscribers of that topic
  • Hybrid: Combines both of the above models
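The difference between point-to-point and pub/sub delivery can be shown with an in-process toy broker. The `Broker` class is hypothetical and has no network, persistence, or acknowledgements, all of which a real distributed queue would need. Producers only append messages and never wait for consumers, which is what makes the exchange asynchronous.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical in-process broker with point-to-point queues and pub/sub topics."""

    def __init__(self):
        self.queues = defaultdict(deque)      # queue name -> pending messages
        self.subscribers = defaultdict(list)  # topic -> subscriber inboxes

    # Point-to-point: each message is consumed by exactly one receiver.
    def send(self, queue, message):
        self.queues[queue].append(message)

    def receive(self, queue):
        q = self.queues[queue]
        return q.popleft() if q else None

    # Pub/sub: each message is copied to every subscriber of the topic.
    def subscribe(self, topic):
        inbox = deque()
        self.subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        for inbox in self.subscribers[topic]:
            inbox.append(message)


broker = Broker()
billing = broker.subscribe("order-created")
shipping = broker.subscribe("order-created")
broker.publish("order-created", {"order_id": 1})
print(list(billing), list(shipping))  # [{'order_id': 1}] [{'order_id': 1}]
```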

What is a distributed unique ID generator?

A distributed unique ID generator is a system that generates unique identifiers that can be used for entities in a distributed system.
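One widely used approach is the layout popularized by Twitter's Snowflake: a 64-bit ID built from a millisecond timestamp, a worker ID, and a per-millisecond sequence number. The sketch below assumes a 41/10/12-bit layout and a hypothetical custom epoch. Because each worker has a distinct ID, workers can generate IDs without coordinating, and IDs from one worker increase over time.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (September 2020)


class SnowflakeGenerator:
    """64-bit IDs: 41-bit timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the sequence; if it wraps, wait for the next ms.
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


gen = SnowflakeGenerator(worker_id=3)
print(gen.next_id() < gen.next_id())  # True
```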

What is distributed search?

Distributed search is the practice of distributing the search across several nodes or hosts to allow for parallel processing and improve performance and scalability.
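A common way to implement distributed search is scatter-gather: the query is sent to every shard in parallel, each shard scores its own documents, and a coordinator merges the partial results into a global top-k. The shards, documents, and term-frequency scoring below are hypothetical simplifications; threads stand in for separate nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each node indexes a disjoint subset of documents.
SHARDS = [
    {"d1": "load balancer distributes requests", "d2": "rate limiter token bucket"},
    {"d3": "distributed search across shards", "d4": "load shedding under heavy load"},
    {"d5": "key value store", "d6": "search index for logs"},
]


def search_shard(shard, term):
    # Local step on one node: score each document by term frequency.
    hits = []
    for doc_id, text in shard.items():
        score = text.split().count(term)
        if score:
            hits.append((score, doc_id))
    return hits


def distributed_search(term, k=3):
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        # Scatter: query every shard in parallel.
        partials = list(pool.map(lambda shard: search_shard(shard, term), SHARDS))
    # Gather: merge the partial hit lists and keep the global top-k by score.
    all_hits = [hit for hits in partials for hit in hits]
    return [doc_id for _, doc_id in heapq.nlargest(k, all_hits)]


print(distributed_search("load"))  # ['d4', 'd1']
```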

What are distributed logging systems?

Distributed logging is the practice of collecting, storing, and analyzing log data from the nodes of a distributed system, in order to track the health and performance of the system and to troubleshoot issues.
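The collection and analysis steps can be sketched with structured JSON log lines from two hypothetical nodes. The sketch merges the per-node streams by timestamp, as a central collector might, and then follows one request across nodes by a trace ID. Ordering by timestamp assumes that the nodes' clocks are reasonably synchronized; the field names are assumptions.

```python
import heapq
import json

# Hypothetical log streams from two nodes, each already sorted by timestamp.
node_a = [
    '{"ts": 1.0, "node": "a", "level": "INFO", "msg": "request received", "trace": "t1"}',
    '{"ts": 3.0, "node": "a", "level": "ERROR", "msg": "db timeout", "trace": "t1"}',
]
node_b = [
    '{"ts": 2.0, "node": "b", "level": "INFO", "msg": "cache miss", "trace": "t1"}',
    '{"ts": 4.0, "node": "b", "level": "INFO", "msg": "request received", "trace": "t2"}',
]


def aggregate(*streams):
    # Parse the structured lines and merge the sorted per-node streams
    # into one time-ordered stream.
    parsed = [[json.loads(line) for line in stream] for stream in streams]
    return list(heapq.merge(*parsed, key=lambda entry: entry["ts"]))


def trace(entries, trace_id):
    # Follow one request across nodes by its trace ID.
    return [f'{e["node"]}: {e["msg"]}' for e in entries if e["trace"] == trace_id]


merged = aggregate(node_a, node_b)
print(trace(merged, "t1"))
# ['a: request received', 'b: cache miss', 'a: db timeout']
```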

What is a distributed task scheduler?

A distributed task scheduler is a system that schedules and executes tasks on a distributed system. It can execute tasks at specific intervals, on a schedule or in response to events.
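At the core of many schedulers is a priority queue ordered by each task's next run time. The hypothetical `TaskScheduler` below shows that core on a single node, with one-off and repeating tasks; a distributed scheduler would add pieces such as partitioning tasks across workers and failover, which this sketch leaves out. An event-triggered task can be modelled by scheduling it at the current time.

```python
import heapq
import itertools


class TaskScheduler:
    """Hypothetical single-node scheduling core: runs due tasks in time order."""

    def __init__(self):
        self.queue = []  # min-heap of (run_at, tie_breaker, name, interval, fn)
        self.counter = itertools.count()  # keeps ordering stable for equal times

    def schedule(self, name, fn, run_at, interval=None):
        # interval=None means a one-off task; otherwise it repeats every `interval`.
        heapq.heappush(self.queue, (run_at, next(self.counter), name, interval, fn))

    def run_until(self, now):
        """Run every task due at or before `now`; return names in execution order."""
        ran = []
        while self.queue and self.queue[0][0] <= now:
            run_at, _, name, interval, fn = heapq.heappop(self.queue)
            fn()
            ran.append(name)
            if interval is not None:
                # Re-queue a repeating task at its next run time.
                self.schedule(name, fn, run_at + interval, interval)
        return ran


scheduler = TaskScheduler()
scheduler.schedule("report", lambda: None, run_at=10)
scheduler.schedule("heartbeat", lambda: None, run_at=0, interval=5)
print(scheduler.run_until(12))  # ['heartbeat', 'heartbeat', 'report', 'heartbeat']
```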
