System design matters for any IT specialist. Software engineers are not the only ones who are frequently asked about it in interviews at top IT companies; security engineers face these questions as well. Here are the notes and ideas I’ve put together about system design. I had no prior knowledge whatsoever when I started learning this, and I’m not even sure what kind of problems to expect for a security position. That’s why I’ll work through some courses to get the idea, then try to come up with my own system design interview questions and ways to answer them.
- Design a social network (from the course)
- Design an Amazon-like e-store (from the course) and scale it to a million users.
- Explain how DNS works (my presumption)
- Design a web application (front-end, back-end, load balancer, WAF) (my presumption)
- Design a system that uses IDS, IPS and firewalls. Which would you use and where? (my presumption)
- Hubs, switches and stuff (my presumption)
- Security policies and prevention systems to react to an incident (my presumption)
- Online shopping website backend using Microservices
- Recommendation engine like Netflix (AI, BD)
- How will you build a scalable auto suggest feature?
- How will you build a simple LRU Cache?
- Health monitoring system that gives an overall score for a user?
- Design an image classification system? (AI, BD)
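For the LRU cache question in the list above, here is a minimal sketch of one common approach, assuming an in-memory cache built on Python's `OrderedDict`. The class name `LRUCache` and its `get`/`put` methods are my own choices for illustration, not something taken from the courses.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

Both operations are O(1), which is usually the point an interviewer wants to hear; the same idea can be built by hand from a hash map plus a doubly linked list.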
💡 I think a good idea is to design some building blocks which can later be used to design a complete system: a demilitarised zone (what’s in there and why), IDS, IPS, back end + front end + WAF and the like. Of course, the most important thing, since it’s a security position, is to make a building block for security policies and controls (those needed for incident response). It’s also good to come up with my own ideas on the matter. Divide and conquer. Also make some templates that need only minor adjustments. There are already some best practices.
8 Steps to Design a System
Below are the steps. For each step I pick different problems and try to learn that step separately.
Step 1. Read and Understand the Problem
What building blocks will you need? Database, website? Cloud (private or public)? Ask clarifying questions (does it need to scale, how many users, etc.).
Example 1
❓ URL shortening service
- How many users are we going to have?
- How many URLs per user?
- Does the system need to scale?
- How long should the links live?
- Do we need analytics, such as the most visited links?
Step 2. Defining the Scope
That is, deciding which additional features are in scope.
❓ Design Twitter
- Will only users be posting, or are there automated posts as well?
- Follow another user
- User timelines
- How many users?
❓ URL shortening service
- Any additional features, such as choosing a custom pattern for links?
- Should it be accessible as an API for other services?
- Do we need accounts, or can everything be managed with API keys?
Step 3. Back of Envelope Calculations*
It’s an optional stage. Ask whether it’s needed. Estimate the resources required. Possible resources:
- Scale of the system, active users per day
- DB size. Advise limiting uploads (otherwise a DoS-type attack is possible)
- Network bandwidth
- Memory usage, cache
💡 DoS protection
💡 How long to keep the data
❓ Question: Design YouTube
Calculation example:
description | value |
---|---|
Number of users | 1 billion |
Daily active users | 500 million |
Average video views per person per day | 5 |
Number of videos uploaded | 1 per 300 watched |
Average video size | 100 MB |
Upload bandwidth | 500 000 000 * 5 * 1/300 * 100 MB per day = 500 000 000 * 5 * 1/300 * 100 MB * 1/(24 * 60 * 60) ≈ 9 645 MB/sec ≈ 9.6 GB/sec |
Views per sec | (500 000 000 * 5) / (24 * 60 * 60) ≈ 29k views/sec |
Uploads per sec | 29k / 300 ≈ 97 uploads/sec |
Storage growth | 97 * 100 MB = 9 700 MB/sec ≈ 10 GB/sec. Roughly 300 petabytes per year |
Bandwidth per sec | Incoming ≈ 9.7 GB/sec (the upload rate above). Outgoing ≈ 29k views/sec * 100 MB ≈ 2.9 TB/sec if every view streams the whole video |
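A quick way to sanity-check such a table is to recompute it from its inputs. The sketch below uses only the inputs listed above (500 million daily active users, 5 views each, 1 upload per 300 views, 100 MB per video); the variable names are mine, and 1 PB is taken as 10^9 MB.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Inputs taken from the table above.
daily_active_users = 500_000_000
views_per_user_per_day = 5
uploads_per_view = 1 / 300
avg_video_size_mb = 100

views_per_sec = daily_active_users * views_per_user_per_day / SECONDS_PER_DAY
uploads_per_sec = views_per_sec * uploads_per_view
upload_mb_per_sec = uploads_per_sec * avg_video_size_mb

# Storage grows at the upload rate; 1 PB is treated as 1e9 MB here.
storage_pb_per_year = upload_mb_per_sec * SECONDS_PER_DAY * 365 / 1e9

print(f"views/sec:         {views_per_sec:,.0f}")
print(f"uploads/sec:       {uploads_per_sec:,.1f}")
print(f"upload MB/sec:     {upload_mb_per_sec:,.0f}")
print(f"storage PB/year:   {storage_pb_per_year:,.0f}")
```

With these inputs the upload rate comes out near 9.6 GB/sec and storage growth near 300 PB per year, so any figure in the range of a few MB/sec for upload bandwidth is a sign that a factor was dropped somewhere.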
❓ Design Yelp
❓ Design Yelp love messages chat
❓ URL shortening
This system reads more than it writes, so it’s useful to add indexes in the DB.
description | value |
---|---|
users | 1 billion |
reads per day | 5 000 000 |
writes per day | reads * 1/100 = 50 000 |
writes per sec | 50 000 / 86 400 sec ≈ 0.58 writes/sec |
reads per sec | 0.58 writes/sec * 100 ≈ 58 reads/sec |
URL size | 500 bytes |
read bandwidth | 58 reads/sec * 500 bytes ≈ 29 KB/sec |
write bandwidth | 0.58 writes/sec * 500 bytes ≈ 290 bytes/sec |
each URL lives for | 5 years |
DB size | 5 years * 365 days * 50 000 writes/day * 500 bytes ≈ 45.6 GB. With some additional space just in case, about 50 GB |
Cache size | 45.6 GB * 0.2 ≈ 9 GB |
If we follow the 80-20 rule, meaning 20% of URLs generate 80% of traffic, we would like to cache these 20% hot URLs.
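The same estimate can be written as a short script, which makes it easy to change an assumption and see the effect. The inputs are the ones from the table (5 million reads per day, a 100:1 read/write ratio, 500 bytes per URL, a 5-year lifetime, 20% hot URLs); the variable names are mine. Note that the DB size multiplies the yearly volume by the full five-year lifetime.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the table above.
reads_per_day = 5_000_000
read_write_ratio = 100
url_size_bytes = 500
retention_years = 5
hot_fraction = 0.2  # 80-20 rule: cache the hottest 20% of URLs

writes_per_day = reads_per_day // read_write_ratio
writes_per_sec = writes_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY

read_bandwidth_bytes = reads_per_sec * url_size_bytes
write_bandwidth_bytes = writes_per_sec * url_size_bytes

# Everything written during the retention period has to be stored.
db_size_gb = writes_per_day * 365 * retention_years * url_size_bytes / 1e9
cache_size_gb = db_size_gb * hot_fraction

print(f"writes/sec: {writes_per_sec:.2f}, reads/sec: {reads_per_sec:.0f}")
print(f"DB size: {db_size_gb:.1f} GB, cache: {cache_size_gb:.1f} GB")
```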
Step 4. Service API Design
- REST or SOAP API. Determine which methods you’ll have.
- Security and logging, authentication
- Segregate
- Code
💡 3DS
Example 1.
❓ Ticket booking app.
I need methods such as buyTickets and searchMovies, for example:
def buyTickets(apiKey, user, movie, cinemaHall, seat):
    # not worked out yet
    raise NotImplementedError

def searchMovies(apiKey, srchQry):
    # check apiKey and available movies
    # connect to DB securely
    # get a list of movies and convert it into JSON, for example
    # call the page generator (some ViewController) and pass the JSON to it
    raise NotImplementedError
To do: difference between RESTful/RESTless and SOAP
Example 2
❓ URL shortening
Which APIs do I need? Read and write, obviously. Then probably update and delete as well. For other services, perhaps analytics.
def get_redirect(url) -> str: ...
def add_url(api_key, orig_url, ttl) -> str: ...
def update_url(api_key, orig_url, info) -> bool: ...
def delete_url(api_key, orig_url) -> None: ...
def get_reads_per_day(api_key, short_url) -> int: ...
Step 5. Database Design
Relational DB or NoSQL? ER diagram. Don’t go into too much detail.
To do: ER diagrams. Learn to draw them and how the relations map to tables.
Step 6. High-level Design
A few boxes: servers, caches, DBs. Explain the role of each component.
❓ Design Twitter
Step 7. Low-level Design
Which feature to design in detail?
Example 1
❓ LinkedIn. Design the LinkedIn feed feature as part of low-level design.
- Requirements. Gathered in the earlier steps.
- Exceptional scenarios: a celebrity who’s posting too much
- Build. Controllers, loaders, what goes in the feed, the feed object.
- Performance. Precomputing, storing in cache
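The precomputing idea from the list above can be sketched as follows: each post is pushed into the followers' cached feeds at publish time, so reading a feed is just a lookup. The per-author cap is my own assumption about one way to handle an author who posts too much; `FEED_SIZE`, `MAX_PER_AUTHOR` and all function names are hypothetical.

```python
from collections import defaultdict, deque

FEED_SIZE = 50      # assumed number of posts kept per cached feed
MAX_PER_AUTHOR = 3  # assumed cap so one prolific author cannot flood a feed

followers = defaultdict(set)                          # author -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_SIZE))  # user -> (author, post), newest first


def follow(user, author):
    followers[author].add(user)


def publish(author, post):
    """Precompute: push the post into every follower's cached feed."""
    for user in followers[author]:
        feed = feeds[user]
        positions = [i for i, (a, _) in enumerate(feed) if a == author]
        if len(positions) >= MAX_PER_AUTHOR:
            # Feeds are newest first, so the last match is this author's oldest post.
            del feed[positions[-1]]
        feed.appendleft((author, post))


def get_feed(user):
    """Reading is a cache lookup; nothing is computed per request."""
    return list(feeds[user])


follow("me", "friend")
follow("me", "celebrity")
publish("friend", "hello")
for n in range(10):
    publish("celebrity", f"post {n}")
my_feed = get_feed("me")
```

The trade-off is more work at write time in exchange for cheap reads, which suits a feed that is read far more often than it is written.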
Step 8. Non Functional Requirements
- Availability and scalability.
- Security
- Performance (caching, load balancing, preloading)
- Testing
- Bottlenecks in the system (possible points of failure, how to improve)
Pieces of System Design
Choose Database
ACID principle. Relational (SQL) databases are typically ACID compliant. Some NoSQL databases, wide-column stores among them, also claim ACID compliance, often in a more limited form.
- Atomicity. An operation either runs to completion or is cancelled completely. For example, a payment can’t be recorded in the DB before it has completed successfully. If something goes wrong, all the sub-operations that make up that operation are rolled back.
- Consistency. All items comply with certain rules and are guaranteed to have certain properties. For example, in a DB of users, each one is guaranteed to have a name.
- Isolation. Concurrent operations cannot interfere with each other.
- Durability. Once an operation is committed, it stays saved even in case of a system crash.
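Atomicity is the easiest of the four to see in code. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the names and the amounts are made up for illustration. A transfer consists of two updates, and when the second one violates a constraint, the first one is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates are committed or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


overdraft_ok = transfer(conn, "alice", "bob", 150)  # would make alice's balance negative
after_failed = balances(conn)
normal_ok = transfer(conn, "alice", "bob", 40)
after_success = balances(conn)
```

After the failed transfer, bob's balance is still 0 even though his account was credited first inside the transaction; that is the all-or-nothing behaviour the Atomicity bullet describes.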
SQL is the better fit for applications such as banking, where the data is structured and its schema rarely changes. For rapid growth and high traffic, NoSQL. Presumably also for cases where Big Data is involved.
To do: determine which is better for social media and messaging apps. It sounds like NoSQL is better for social media and SQL is better for messaging. Social media is less structured: there are lots of things that users might not fill in, while a message always has the same structure (text, from, to, time, delivered, read etc.).
Types of NoSQL:
- Key-Value. Like an XML file (plist).
- Document. A file system (the way you organize folders and files within them) is something like that.
- Wide Column. For large data sets. Good for:
- Sensor Logs [Internet of Things (IOT)]
- User preferences
- Geographic information
- Reporting systems
- Time Series Data
- Logging and other write heavy applications
NoSQL columnar | Relational (SQL) |
---|---|
keyspace | schema |
column family | table |

row -> columns
Column family consists of:
- Row Key. Each row has a unique key, which is a unique identifier for that row.
- Column. Each column contains a name, a value, and timestamp.
- Name. This is the name of the name/value pair.
- Value. This is the value of the name/value pair.
- Timestamp. This provides the date and time that the data was inserted. This can be used to determine the most recent version of data.
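The structure described above can be modelled with nested dictionaries. This is only a toy model to fix the terms (row key, column name, value, timestamp), not how any real wide-column store is implemented; the `users` family, the `put` and `get_row` helpers and the sample data are all hypothetical.

```python
import time


def make_column(name, value, timestamp=None):
    """A column is a name/value pair plus the time it was written."""
    if timestamp is None:
        timestamp = time.time()
    return {"name": name, "value": value, "timestamp": timestamp}


# Column family "users": row key -> {column name -> column}
users = {}


def put(family, row_key, name, value, timestamp=None):
    column = make_column(name, value, timestamp)
    row = family.setdefault(row_key, {})
    current = row.get(name)
    # The timestamp decides which version of the data is the most recent.
    if current is None or column["timestamp"] >= current["timestamp"]:
        row[name] = column


def get_row(family, row_key):
    """Fetch everything about one entity with a single row key, no joins."""
    return {name: col["value"] for name, col in family.get(row_key, {}).items()}


put(users, "bob", "email", "bob@example.com", timestamp=1)
put(users, "bob", "email", "bob@new.example.com", timestamp=2)
put(users, "bob", "email", "stale@example.com", timestamp=0)  # older write is ignored
put(users, "bob", "country", "SE", timestamp=1)
put(users, "alice", "email", "alice@example.com", timestamp=1)  # rows need not share columns

bob = get_row(users, "bob")
alice = get_row(users, "alice")
```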
Cassandra allows nesting columns. As I see it, Bob in the above image could have a pet Kiki that had its own set of attributes. Probably like a class having an attribute that is an instance of another class in programming. Say there are classes Person and Dog, and class Person can have an attribute defined as: Dog dog = new Dog().
Examples: Bigtable, Cassandra, Druid, HBase, Accumulo, Hypertable.
- Graph. Good for speeding up a search. See here for more details.
SQL databases are normalized databases where the data is broken down into various logical tables to avoid data redundancy and data duplication. In this scenario, SQL databases are faster than their NoSQL counterparts for joins, queries, updates, etc.
On the other hand, NoSQL databases are specifically designed for unstructured data which can be document-oriented, column-oriented, graph-based, etc. In this case, a particular data entity is stored together and not partitioned. So performing read or write operations on a single data entity is faster for NoSQL databases as compared to SQL databases.
The great thing about column stores is that you can retrieve all related information using a single record ID, rather than using the complex Structured Query Language (SQL) join as in an RDBMS. Doing so does require a little upfront modeling and data analysis, though.
In the example shown, you can retrieve all order information by selecting a single column store row. The developer doesn’t need to know any complex join syntax to query a column store, unlike in an RDBMS, where they would have to write complex SQL joins.
CAP
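- Availability. Every request gets a response, even if some nodes are down. Usually means having some replicas of data in case of a crash.
- Partition tolerance. The system keeps working even when the network between its nodes fails or drops messages.
- Consistency. Every read returns the most recent write, so all nodes show the same data. This is not the same as the C in ACID, which is about rules the data must comply with.
It’s impossible to provide all three at once; at most two. 1 + 2 (availability and partition tolerance): Cassandra, CouchDB; 2 + 3 (partition tolerance and consistency): BigTable, MongoDB, HBase.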
Choosing the protocol
Ajax polling. The client asks the server for updates every N seconds. Downside: HTTP overhead, since many responses are empty.
Long polling. Also known as hanging GET. The client sends a request, but the server doesn’t respond immediately. It waits for an update and sends the response only when it has something to send. Each connection has a timeout anyway, after which the client simply sends a new request.
WebSockets. The TCP connection is persistent and is initiated with a WebSocket handshake. Communication is two-way. The best choice for messaging.
Server-sent events. Something like WebSockets, but one-way: the server pushes data over a persistent connection, and the client cannot send data back over it. The best choice for notifications or push notifications.
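The long-polling behaviour described above (hold the request until there is data or a timeout) can be simulated without any network code. The sketch below uses `asyncio` from the standard library; the `Channel` class and its methods are hypothetical and stand in for the server side of a hanging GET.

```python
import asyncio


class Channel:
    """Holds pending messages and wakes up a waiting long-poll request."""

    def __init__(self) -> None:
        self._messages = []
        self._has_data = asyncio.Event()

    def publish(self, message: str) -> None:
        self._messages.append(message)
        self._has_data.set()

    async def long_poll(self, timeout: float) -> list:
        """Return new messages, or an empty list once the timeout expires."""
        try:
            await asyncio.wait_for(self._has_data.wait(), timeout)
        except asyncio.TimeoutError:
            return []  # nothing happened; the client would now poll again
        messages, self._messages = self._messages, []
        self._has_data.clear()
        return messages


async def demo():
    channel = Channel()
    # A message arrives 0.05 s after the client starts waiting.
    asyncio.get_running_loop().call_later(0.05, channel.publish, "hello")
    first = await channel.long_poll(timeout=1.0)    # answered as soon as data arrives
    second = await channel.long_poll(timeout=0.05)  # no new data: times out empty
    return first, second


first, second = asyncio.run(demo())
```

Compared with Ajax polling, the empty response happens at most once per timeout period instead of once every N seconds, which is where the saving in HTTP overhead comes from.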
Todo
Read and watch material on AWS, it might be very useful. IA is also something to keep in mind.
Another course by Rajat Mehta about system design interviews in depth.
Useful Tips
- Don’t go into detail prematurely
- Don’t talk all the time. Wait for feedback along the way. Make pauses.
- Don’t have a set architecture in mind
- KISS (Keep it Simple Stupid)
- Don’t make points without justification
- Be aware of current technologies
💡 Regular security audit
References
https://www.udemy.com/course/intro-system-design-interviews/learn/lecture/17567018?start=0#overview
https://www.youtube.com/watch?v=CtmBGH8MkX4
Thanks to the guy from the previous video, I’ve learned about this course - https://www.educative.io/courses/grokking-the-system-design-interview/B8nMkqBWONo.
https://database.guide/what-is-a-column-store-database/
NoSQL vs SQL - https://www.geeksforgeeks.org/sql-vs-nosql-which-one-is-better-to-use/
Why NoSQL is better for Big Data - https://www.dezyre.com/article/nosql-vs-sql-4-reasons-why-nosql-is-better-for-big-data-applications/86
Columnar NoSQL vs SQL - https://www.dummies.com/programming/big-data/columnar-data-in-nosql/