πŸ§‘πŸΌβ€πŸŽ¨ System Design

Created: 18.11.2020

System design matters for any IT specialist. Software engineers are not the only ones who are frequently asked about it in interviews at top IT companies; security engineers face these kinds of questions as well. Here are my notes and ideas about system design. I had no prior knowledge whatsoever when I started learning this, and I’m not even sure what kind of problems to expect for a security position. That’s why I’ll work through some courses to get the idea, then try to come up with my own system design interview questions and how to answer them.

  1. Design a social network (from the course)
  2. Design an Amazon-like e-store (from the course) and scale it to a million users.
  3. Explain how DNS works (my presumption)
  4. Design a web application (front-end, back-end, load balancer, WAF) (my presumption)
  5. Design a system that uses IDS, IPS and firewalls. Which would you use and where? (my presumption)
  6. Hubs, switches and stuff (my presumption)
  7. Security policies and prevention systems to react to an incident (my presumption)
  8. Online shopping website backend using Microservices
  9. Recommendation engine like Netflix (AI, BD)
  10. How will you build a scalable auto suggest feature?
  11. How will you build a simple LRU Cache?
  12. Health monitoring system that gives an overall score for a user?
  13. Design an image classification system? (AI, BD)

💡 I think a good idea is to design some building blocks which can later be used to design a complete system: a demilitarised zone (what’s in there and why), IDS, IPS, back end + front end + WAF and so on. Since it’s a security position, the most important one is a building block for security policies and controls (the ones needed for incident response). It’s also good to come up with my own ideas on the matter. Divide and conquer 🙂 👍. I should also make some templates that require only minor adjustments. There are already some best practices.

8 Steps to Design a system

Below are the steps. For each step I choose different problems and try to learn each step separately.

Step 1. Read and Understand the Problem

What building blocks will you need? A database, a website? Cloud (private or public)? Ask clarifying questions (does it need to scale, how many users, etc.).

Example 1

❓ URL shortening service

  • How many users are we going to have?
  • How many URLs per user?
  • Is this system to be scalable?
  • How long should the links live?
  • Do we need some analytics like what are the most visited links?

Step 2. Defining the Scope

Meaning, defining some additional features.

❓ Design Twitter

  • Would only users be posting, or are there some automatic posts as well?
  • Follow another user
  • User timelines
  • How many users?

❓ URL shortening service

  • Any additional features, like choosing a custom pattern for links?
  • Should it be accessible as an API for other services?
  • Do we need accounts, or can everything be managed with API keys?

Step 3. Back of Envelope Calculations*

It’s an optional stage, so ask whether it’s needed. Estimate the resources the system will require. Possible resources:

  1. Scale of the system, active users per day
  2. DB size. Advise limiting uploads (otherwise a DoS-type attack is possible)
  3. Network bandwidth
  4. Memory usage, cache

💡 DoS protection

💡 For how long to keep the data

❓ Question: Design YouTube

Calculation example:

description | value
Number of users | 1 billion
Daily active users | 500 million
Average video views per person per day | 5
Number of videos uploaded | 1 per 300 videos watched
Average video size | 100 MB
Upload volume | 500 000 000 * 5 * 1/300 * 100 MB ≈ 833 TB per day; 833 TB / (24 * 60 * 60) ≈ 9.6 GB/sec
Views per sec | (500 000 000 * 5) / (24 * 60 * 60) ≈ 29k views/sec
Uploads per sec | 29k / 300 ≈ 97 uploads/sec
Storage growth | 97 * 100 MB = 9 700 MB/sec ≈ 9.7 GB/sec, roughly 300 PB per year (315 PB if rounded up to 10 GB/sec)
Upload bandwidth | 97 uploads/sec * 100 MB ≈ 9.7 GB/sec (the same figure as storage growth)
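Back-of-envelope numbers are easy to get wrong by a factor of ten, so it can help to script them. The sketch below recomputes the YouTube estimates from the assumptions stated in this example (500 million daily active users, 5 views each, 1 upload per 300 views, 100 MB per video). The variable names are mine, and decimal units (1 PB = 10^9 MB) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumptions taken from the example above
daily_active_users = 500_000_000
views_per_user_per_day = 5
uploads_per_view = 1 / 300
avg_video_size_mb = 100

views_per_sec = daily_active_users * views_per_user_per_day / SECONDS_PER_DAY
uploads_per_sec = views_per_sec * uploads_per_view
# Every uploaded video must be both received and stored, so the upload
# bandwidth and the storage growth rate are the same number.
upload_mb_per_sec = uploads_per_sec * avg_video_size_mb
storage_pb_per_year = upload_mb_per_sec * SECONDS_PER_DAY * 365 / 1_000_000_000

print(f"views/sec:       {views_per_sec:,.0f}")
print(f"uploads/sec:     {uploads_per_sec:,.1f}")
print(f"upload GB/sec:   {upload_mb_per_sec / 1000:,.2f}")
print(f"storage PB/year: {storage_pb_per_year:,.0f}")
```

Without intermediate rounding this gives about 29k views/sec, about 96 uploads/sec, about 9.6 GB/sec and roughly 300 PB of new video per year.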

❓ Design Yelp

❓ Design Yelp love messages chat

❓ URL shortening

This system reads more than it writes, so it’s useful to implement indexing in the DB.

description | value
Users | 1 billion
Reads per day | 5 000 000
Writes per day | reads * 1/100 = 50 000
Writes per sec | 50 000 / 86 400 sec ≈ 0.58 writes/sec
Reads per sec | 0.58 writes/sec * 100 ≈ 58 reads/sec
URL size | 500 bytes
Read bandwidth | 58 reads/sec * 500 bytes ≈ 29 KB/sec
Write bandwidth | 0.58 writes/sec * 500 bytes ≈ 290 bytes/sec
Lifetime of each URL | 5 years
DB size | 5 years * 365 days * 50 000 writes/day * 500 bytes ≈ 46 GB. With some additional space just in case: 50 GB
Cache size | 46 GB * 0.2 ≈ 9 GB

If we follow the 80-20 rule, meaning 20% of URLs generate 80% of traffic, we would like to cache these 20% hot URLs.
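The URL shortening estimates can be scripted the same way. This is a rough sketch using the inputs from this example (5 000 000 reads per day, a 100:1 read/write ratio, 500 bytes per URL, 5 years of retention). It assumes the cache holds 20% of all stored URLs, which is my reading of the 80-20 rule here; the variable names are mine.

```python
SECONDS_PER_DAY = 86_400

# Assumptions taken from the example above
reads_per_day = 5_000_000
read_write_ratio = 100
url_size_bytes = 500
retention_years = 5
hot_fraction = 0.2  # 80-20 rule: cache the hottest 20% of URLs

writes_per_day = reads_per_day // read_write_ratio
reads_per_sec = reads_per_day / SECONDS_PER_DAY
writes_per_sec = writes_per_day / SECONDS_PER_DAY
read_bandwidth = reads_per_sec * url_size_bytes    # bytes/sec
write_bandwidth = writes_per_sec * url_size_bytes  # bytes/sec
db_size_bytes = retention_years * 365 * writes_per_day * url_size_bytes
cache_size_bytes = db_size_bytes * hot_fraction

print(f"reads/sec:  {reads_per_sec:.1f}, writes/sec: {writes_per_sec:.2f}")
print(f"read bandwidth:  {read_bandwidth / 1000:.1f} KB/sec")
print(f"write bandwidth: {write_bandwidth:.0f} bytes/sec")
print(f"DB size:    {db_size_bytes / 1e9:.1f} GB")
print(f"cache size: {cache_size_bytes / 1e9:.1f} GB")
```

The storage estimate depends on multiplying by all five years of retention: one year of writes is only about 9 GB, five years is about 46 GB.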

Step 4. Service API Design

  1. REST or SOAP API. Determine which methods you’ll have.

  2. Security and logging, authentication

  3. Segregate

  4. Code

    💡 3DS

Example 1.

❓ Ticket booking app.

I need such methods as buyTickets and searchMovies, for example:

def buyTickets(apiKey, user, movie, cinemaHall, seat):
    # check apiKey, then book the seat for the user (details to be designed)
    raise NotImplementedError

def searchMovies(apiKey, srchQry):
    # check apiKey and available movies
    # connect to DB securely
    # get a list of movies and convert into JSON for example
    # call page generator (some ViewController) and pass JSON to it.
    raise NotImplementedError

🗒 Difference between RESTful/RESTless and SOAP

Example 2

❓ URL shortening

What APIs do I need? Read and write, obviously. Then probably update and delete as well. For other services, perhaps, analytics.

def get_redirect(url) -> str: ...
def add_url(api_key, orig_url, ttl) -> str: ...
def update_url(api_key, orig_url, info) -> bool: ...
def delete_url(api_key, orig_url): ...
def get_reads_per_day(api_key, short_url) -> int: ...
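To make the read and write paths concrete, here is a minimal in-memory sketch of add_url and get_redirect. It is only an illustration under my own assumptions: short codes come from a base62-encoded counter, API keys are a plain set, and ttl is in seconds. The class UrlShortener and the helper encode_base62 are hypothetical names, not part of any course material.

```python
import string
import time

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def encode_base62(number: int) -> str:
    """Encode a non-negative integer using digits and letters."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class UrlShortener:
    """Hypothetical in-memory store; a real service would use a DB and a cache."""

    def __init__(self, valid_api_keys):
        self._valid_api_keys = set(valid_api_keys)
        self._next_id = 1
        self._urls = {}  # short code -> (original URL, expiry timestamp)

    def add_url(self, api_key: str, orig_url: str, ttl: float) -> str:
        if api_key not in self._valid_api_keys:
            raise PermissionError("unknown API key")
        code = encode_base62(self._next_id)
        self._next_id += 1
        self._urls[code] = (orig_url, time.time() + ttl)
        return code

    def get_redirect(self, short_url: str) -> str:
        orig_url, expires_at = self._urls[short_url]  # KeyError if unknown
        if time.time() > expires_at:
            del self._urls[short_url]  # expired links are dropped lazily
            raise KeyError(short_url)
        return orig_url


shortener = UrlShortener({"demo-key"})
code = shortener.add_url("demo-key", "https://example.com/a/very/long/path", ttl=3600)
print(code, "->", shortener.get_redirect(code))
```

A counter keeps codes short and collision-free, but it makes them guessable. Hashing the URL or drawing random codes are common alternatives with their own trade-offs.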

Step 5. Database Design

Relational DB or NoSQL? Draw an ER diagram. Don’t go into too much detail.

🗒 ER diagrams. Learn to draw them and to map relations to tables.
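As a small exercise in turning an ER diagram into tables, here is one possible relational schema for the URL shortening example, using Python’s built-in sqlite3 module. The entities (users, urls) and all column names are my own assumptions. The one-to-many relation "a user owns many URLs" becomes a foreign key, and the primary key on short_code provides the index that the read-heavy lookup needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE users (
    id      INTEGER PRIMARY KEY,
    api_key TEXT NOT NULL UNIQUE
);
CREATE TABLE urls (
    short_code TEXT PRIMARY KEY,   -- indexed: used by every redirect
    orig_url   TEXT NOT NULL,
    expires_at INTEGER NOT NULL,   -- Unix timestamp
    user_id    INTEGER NOT NULL REFERENCES users(id)
);
""")

conn.execute("INSERT INTO users (id, api_key) VALUES (1, 'demo-key')")
conn.execute(
    "INSERT INTO urls VALUES (?, ?, ?, ?)",
    ("abc", "https://example.com", 1_900_000_000, 1),
)
row = conn.execute(
    "SELECT orig_url FROM urls WHERE short_code = ?", ("abc",)
).fetchone()
print(row[0])
```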

Step 6. High-level Design

A few boxes: servers, caches, DBs. Explain the role of each component.

❓ Design Twitter

[Figure: high-level design]

Step 7. Low-level Design

Which feature to design in detail?

Example 1

❓ LinkedIn. Design the LinkedIn feed feature as part of low-level design.

  • Requirements. Acquired before
  • Exceptional scenarios: a celebrity who’s posting too much 😄
  • Build. Controllers, loaders, what goes in the feed, feed object.
  • Performance. Precomputing, storing in cache
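One way to combine the points above is sketched below: feeds are precomputed when a post is published (pushed into each follower’s cached feed), except for celebrities, whose posts are pulled in when a feed is read, so that one post does not trigger a huge number of cache writes. This is my own simplified illustration, not how LinkedIn actually does it. FeedService, CELEBRITY_THRESHOLD and FEED_SIZE are hypothetical names and values.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3  # hypothetical; in reality far larger
FEED_SIZE = 50           # how many posts a cached feed keeps


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)         # author -> users following them
        self.following = defaultdict(set)         # user -> authors they follow
        self.posts_by_author = defaultdict(list)  # author -> [(seq, author, text)]
        self.feed_cache = defaultdict(lambda: deque(maxlen=FEED_SIZE))
        self._seq = 0                             # global ordering of posts

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author, text):
        self._seq += 1
        post = (self._seq, author, text)
        self.posts_by_author[author].append(post)
        if not self.is_celebrity(author):
            # Precompute: push the post into every follower's cached feed.
            for user in self.followers[author]:
                self.feed_cache[user].append(post)

    def get_feed(self, user):
        # Start from the precomputed feed, then pull celebrity posts on read.
        merged = {post[0]: post for post in self.feed_cache[user]}
        for author in self.following[user]:
            if self.is_celebrity(author):
                for post in self.posts_by_author[author][-FEED_SIZE:]:
                    merged[post[0]] = post
        newest_first = sorted(merged, reverse=True)[:FEED_SIZE]
        return [merged[seq] for seq in newest_first]


svc = FeedService()
svc.follow("bob", "alice")
for fan in ("bob", "carol", "dave"):
    svc.follow(fan, "star")
svc.publish("alice", "hi")
svc.publish("star", "hello")
print(svc.get_feed("bob"))
```

The cached feed makes reads cheap for ordinary authors, while the pull path bounds the write cost of a single celebrity post.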

Step 8. Non Functional Requirements

  1. Availability and scalability. 🔑
  2. Security
  3. Performance (caching, load balancing, preloading)
  4. Testing
  5. Bottlenecks in the system (possible points of failure, how to improve)

Pieces of System Design

Choose Database

ACID principle. Relational (SQL) DBs are ACID compliant by design. NoSQL DBs often relax some of these guarantees in exchange for scalability, though some of them (wide-column stores, according to my sources) are also claimed to be ACID compliant.

  1. Atomicity. An operation is either performed to the end or cancelled completely. For example, a payment must not be added to the DB before it has completed successfully. If something goes wrong, all the mini-operations that make up the operation are cancelled.
  2. Consistency. All items comply with certain rules and are guaranteed to have certain properties. For example, in a DB of usernames, each record is guaranteed to have a name.
  3. Isolation. Operations cannot interfere with each other.
  4. Durability. Everything should be saved even in case of a system crash.
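Atomicity is easiest to see in code. The sketch below uses Python’s built-in sqlite3 module and a made-up accounts table: a transfer consists of two updates, and if the second one violates the CHECK constraint (a consistency rule), the first one is rolled back as well. The table, the names and the amounts are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 150), balances(conn))  # overdraft is rejected
print(transfer(conn, "alice", "bob", 30), balances(conn))
```

In the failed transfer bob is credited first, yet his balance is unchanged afterwards, because the whole transaction was cancelled.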

SQL is better for applications such as banking, where the data is structured and its schema rarely changes. For rapid growth and high traffic, NoSQL is presumably the better fit, as in cases where big data is involved.

🗒 Determine which is better for social media and messaging apps. It sounds like NoSQL is better for social media and SQL is better for messaging. Social media is less structured: there are lots of things that users might not fill in, while a message always has the same structure (text, from, to, time, delivered, read etc).

Types of NoSQL:

  • Key-Value. Like an XML file (plist)

  • Document. A file system (the way you organize folders and files within them) is something like that.

  • Wide Column. For large data sets. Good for:

    • Sensor Logs [Internet of Things (IOT)]
    • User preferences
    • Geographic information
    • Reporting systems
    • Time Series Data
    • Logging and other write heavy applications
    NoSQL columnar | Relational (SQL)
    keyspace | schema
    column family | table
    row –> columns

Column family consists of:

  • Row Key. Each row has a unique key, which is a unique identifier for that row.
  • Column. Each column contains a name, a value, and a timestamp.
  • Name. This is the name of the name/value pair.
  • Value. This is the value of the name/value pair.
  • Timestamp. This provides the date and time that the data was inserted. This can be used to determine the most recent version of data.
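The structure above can be modelled in a few lines of Python. This is only a mental model under my own assumptions, not how Cassandra or any other wide-column store is implemented: a column family maps a row key to a set of columns, each column carries a name, a value and a timestamp, and the timestamp decides which write is the most recent. Column and ColumnFamily are hypothetical names.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    value: str
    timestamp: float


@dataclass
class ColumnFamily:
    # row key -> {column name -> Column}; rows need not share the same columns
    rows: dict = field(default_factory=dict)

    def put(self, row_key, name, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        row = self.rows.setdefault(row_key, {})
        current = row.get(name)
        # Keep only the most recent version of each column.
        if current is None or ts >= current.timestamp:
            row[name] = Column(name, value, ts)

    def get_row(self, row_key):
        """Fetch everything stored under one row key in a single lookup."""
        return {name: col.value for name, col in self.rows.get(row_key, {}).items()}


users = ColumnFamily()
users.put("bob", "email", "bob@example.com", timestamp=2)
users.put("bob", "email", "old@example.com", timestamp=1)  # older write, ignored
users.put("bob", "country", "NZ", timestamp=3)
users.put("alice", "email", "alice@example.com", timestamp=4)
print(users.get_row("bob"))
print(users.get_row("alice"))
```

Note that the two rows hold different sets of columns, which is what distinguishes a column family from a relational table with a fixed schema.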

Cassandra allows nesting columns. As I see it, Bob in the above image could have a pet Kiki that has its own set of attributes. It’s probably like a class having an attribute that is an instance of another class in programming: given classes Person and Dog, Person can have an attribute defined as Dog dog = new Dog(). Examples of wide-column stores: Bigtable, Cassandra, Druid, HBase, Accumulo, Hypertable.

  • Graph. Good for speeding up a search. See here for more details.

SQL databases are normalized databases where the data is broken down into various logical tables to avoid data redundancy and data duplication. In this scenario, SQL databases are faster than their NoSQL counterparts for joins, queries, updates, etc.

On the other hand, NoSQL databases are specifically designed for unstructured data which can be document-oriented, column-oriented, graph-based, etc. In this case, a particular data entity is stored together and not partitioned. So performing read or write operations on a single data entity is faster for NoSQL databases as compared to SQL databases.

The great thing about column stores is that you can retrieve all related information using a single record ID, rather than using the complex Structured Query Language (SQL) join as in an RDBMS. Doing so does require a little upfront modeling and data analysis, though.

In the example shown, you can retrieve all order information by selecting a single column store row. The developer doesn’t need to know the exact syntax of a complex join, as they would when querying an RDBMS.

CAP

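  • Availability. Every request gets a response, even when some nodes are down. Usually means having some replicas of data in case of a crash.
  • Partition tolerance. If the network between nodes fails, the system keeps on working.
  • Consistency. Every read sees the most recent write, so all nodes return the same data. (Not the same thing as the C in ACID.)

It’s impossible to provide all three at once; at most two. Availability + partition tolerance: Cassandra, CouchDB. Partition tolerance + consistency: BigTable, MongoDB, HBase.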

Choosing the protocol

Ajax polling. The client asks the server for updates every N seconds. ➖ HTTP overhead, since many responses are empty.

Long polling. Also known as hanging GET. The client sends the request, but the server doesn’t respond immediately. It waits for an update and sends the response only when it has something to send. Each connection has a timeout anyway.

WebSockets. The TCP connection is persistent. Initiated with a WebSocket handshake. This is two-way communication. The best choice for messaging.

Server-sent events. Something like WebSockets, but one-way: the server pushes data over a persistent connection and the client cannot send requests over it. The best choice for notifications and push notifications.
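The difference between regular polling and long polling can be simulated without any network code. In this sketch a thread-safe queue stands in for the server’s pending updates. short_poll answers immediately, even with nothing to send, while long_poll holds the "request" until an update arrives or the timeout expires. The function names and timings are my own, purely for illustration.

```python
import queue
import threading

updates = queue.Queue()  # stands in for updates waiting on the server


def short_poll():
    """Ajax polling: answer immediately, even if there is nothing to send."""
    try:
        return updates.get_nowait()
    except queue.Empty:
        return None  # an empty response: pure HTTP overhead


def long_poll(timeout):
    """Long polling: hold the request until an update arrives or time runs out."""
    try:
        return updates.get(timeout=timeout)
    except queue.Empty:
        return None  # timed out; the client would simply reconnect


print(short_poll())  # nothing yet, so the response is empty

# An update shows up 0.1 sec after the client started waiting.
threading.Timer(0.1, updates.put, args=("new message",)).start()
print(long_poll(timeout=2))  # returns as soon as the update arrives
```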

Todo

AWS read and watch, might be very useful. IA is also something to keep in mind.

Another course by Rajat Mehta about system design interviews in depth.

Useful Tips

  1. Don’t go into detail prematurely
  2. Don’t talk all the time. Wait for feedback along the way. Make pauses.
  3. Don’t have a set architecture in mind
  4. KISS (Keep it Simple Stupid)
  5. Don’t make points without justification
  6. Be aware of current technologies

💡 Regular security audit

References

https://www.udemy.com/course/intro-system-design-interviews/learn/lecture/17567018?start=0#overview

https://www.youtube.com/watch?v=CtmBGH8MkX4

Thanks to the guy from the previous video, I’ve learned about this course - https://www.educative.io/courses/grokking-the-system-design-interview/B8nMkqBWONo.

https://www.forbes.com/sites/metabrown/2018/03/31/get-the-basics-on-nosql-databases-wide-column-store-databases/?sh=6c34a826e508

https://database.guide/what-is-a-column-store-database/

NoSQL vs SQL - https://www.geeksforgeeks.org/sql-vs-nosql-which-one-is-better-to-use/

Why NoSQL is better for big data - https://www.dezyre.com/article/nosql-vs-sql-4-reasons-why-nosql-is-better-for-big-data-applications/86

Columnar NoSQL vs SQL - https://www.dummies.com/programming/big-data/columnar-data-in-nosql/