- System design interview
- CUSTOMER PERSPECTIVE
- Problem
- Definition.
- Who is the customer?
- Pain points.
- Use cases
- Scenarios that will not be covered
- Functional requirements
- Entities and verbs.
- High-level contract (API)
- Make several iterations if possible
- Non-functional requirements
- Performance
- P99 latency for read/write queries?
- Write-to-read data delay?
- Scalability
- Usage patterns, e.g. reads vs writes.
- How many users?
- How many read queries per second?
- How much data is queried per request?
- How many video views are processed per second?
- Can there be spikes in traffic?
- Cost
- Minimize cost of {development, time-to-market, maintenance}
- Availability vs Consistency
- Durability
- ESTIMATIONS [5 min]
- Throughput (QPS for read and write queries)
- Latency expected from the system (for read and write queries)
- Read/Write ratio
- Traffic estimates
- Write (QPS, Volume of data)
- Read (QPS, Volume of data)
- Storage estimates
- Memory estimates
- If we are using a cache, what kind of data do we want to store in it?
- How much RAM and how many machines do we need to achieve this?
- Amount of data you want to store in disk/ssd
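The estimation steps above reduce to a few lines of arithmetic. Below is a minimal sketch, assuming made-up inputs: the function name `estimate`, its parameters, and every number in the usage note are hypothetical and only illustrate how QPS, storage, and cache size relate to each other.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(daily_writes, read_write_ratio, bytes_per_record,
             retention_days, cache_percent):
    """Back-of-envelope traffic, storage, and memory estimates.

    All inputs are assumptions you state out loud in the interview.
    """
    write_qps = daily_writes / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio

    # Disk: everything written over the retention period.
    storage_bytes = daily_writes * bytes_per_record * retention_days

    # Memory: cache a fixed percentage of one day's read volume.
    daily_read_bytes = daily_writes * read_write_ratio * bytes_per_record
    cache_bytes = daily_read_bytes * cache_percent // 100

    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "storage_bytes": storage_bytes,
        "cache_bytes": cache_bytes,
    }


result = estimate(
    daily_writes=8_640_000,   # hypothetical: 100 writes per second on average
    read_write_ratio=10,      # hypothetical: 10 reads per write
    bytes_per_record=1_000,   # hypothetical record size
    retention_days=365,
    cache_percent=20,         # hypothetical: cache 20% of daily read volume
)
```

With these inputs the sketch gives 100 write QPS, 1,000 read QPS, about 3.15 TB of storage per year, and about 17 GB of cache. These are averages; peak traffic needs a separate multiplier.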
- HIGH LEVEL DESIGN [5-10 min]
- APIs for Read/Write scenarios for crucial components
- Database schema
- Basic algorithm
- High level design for Read/Write scenario
- DEEP DIVE [15-20 min]
- Scaling the algorithm
- Scaling individual components
- Availability, Consistency and Scale story for each component
- Consistency and availability patterns
- Think about the following components: how each would fit in and how it would help
- DNS
- CDN [Push vs Pull]
- Load Balancers [Active-Passive, Active-Active, Layer 4, Layer 7]
- Reverse Proxy
- Application layer scaling [Microservices, Service Discovery]
- DB [RDBMS, NoSQL]
- RDBMS
- Master-slave, Master-master, Federation, Sharding, Denormalization, SQL Tuning, Indexing
- NoSQL (in general - Denormalized data + no-joins)
- Key-Value, Wide-Column, Graph, Document
- Fast-lookups:
- RAM [Bounded size] => Redis, Memcached
- Availability [Unbounded size] => Cassandra, RIAK, Voldemort
- Consistency [Unbounded size] => HBase, MongoDB, Couchbase, DynamoDB
- Caches
- Client caching, CDN caching, Webserver caching, Database caching, Application caching, Cache @Query level, Cache @Object level
- Eviction policies:
- LRU, LFU, FIFO
- Caching patterns:
- Cache aside
- Write through
- Write behind
- Refresh ahead
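To make two of the cache items above concrete, here is a minimal sketch that combines LRU eviction with the cache-aside pattern. It assumes a plain dictionary stands in for the database; the names `LRUCache` and `cache_aside_read` are hypothetical and not taken from any library.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


def cache_aside_read(cache, db, key):
    """Cache aside: the application checks the cache, then falls back to the DB.

    Assumes None is never a valid stored value, so None means "cache miss".
    """
    value = cache.get(key)
    if value is None:
        value = db[key]        # miss: load from the source of truth
        cache.put(key, value)  # populate the cache for later reads
    return value


db = {"a": 1, "b": 2, "c": 3}  # stand-in for a real database
cache = LRUCache(capacity=2)
for k in ["a", "b", "a", "c"]:
    cache_aside_read(cache, db, k)
```

After this read sequence the cache holds "a" and "c"; "b" was evicted because it was the least recently used entry when "c" arrived. Write through and write behind differ in that the cache layer, not the application, is responsible for updating the database.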
- Asynchronism
- Message queues
- Task queues
- Back pressure - Resistance or force opposing the desired flow of data through software ("pipes") - buffering vs. dropping
- Communication
- TCP
- UDP
- REST, RPC
- Binary protocols - e.g. Protocol Buffers, Thrift, Apache Avro
- Security
- Encryption: during transfer/at rest
- Government compliance (EU/China/US)
- Authentication/authorization
- Firewalls
- Payment data storage/handling/compliance
- High level threat modeling (obvious ones)
- Telemetry/monitoring/logs aggregation/Dashboards
- Host level metrics: CPU, Memory, Threads, Disk I/O, Garbage Collection runs
- Fleet - Avg. time to first byte, Surge queue on LB, VIP Spillover, Database pressure, cache tier
- Alarms/setting up thresholds/canaries
- Actions feed/Key business metrics: daily active users, retention, revenue, etc. - Business Intelligence
- Options for handling back pressure:
- Control the producer (slow down/speed up is decided by consumer)
- Buffer (accumulate incoming data spikes temporarily)
- Drop (sample a percentage of the incoming data)
- Technically there is a fourth option: ignore the back pressure. To be honest, that is not a bad idea if the back pressure isn't causing critical issues, since introducing more complexity comes at a cost too.
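The buffer and drop options above can be sketched together as a bounded queue. This is a minimal single-threaded illustration, not a production queue; the class `BoundedBuffer` and its method names are hypothetical.

```python
from collections import deque


class BoundedBuffer:
    """Buffers up to `capacity` items and drops anything beyond that."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = deque()
        self.dropped = 0  # count of rejected items, useful as a metric

    def offer(self, item):
        """Try to enqueue. Returns False when the buffer is full.

        A False result is the back pressure signal: the producer can
        slow down, retry later, or accept that the item was dropped.
        """
        if len(self._queue) >= self.capacity:
            self.dropped += 1
            return False
        self._queue.append(item)
        return True

    def poll(self):
        """Consumer side: take the oldest item, or None if empty."""
        return self._queue.popleft() if self._queue else None


buf = BoundedBuffer(capacity=3)
accepted = [buf.offer(i) for i in range(5)]  # a spike of 5 items
```

Here the first three items are buffered and the last two are rejected. Returning a status to the caller, rather than dropping silently, is what lets the consumer side "control the producer".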
- Costs/optimizations. When using cloud services, it’s important to keep a lid on your costs.
- Autoscaling/adding "elasticity" (discuss traffic patterns: regional/seasonal), failovers
- SSD vs. HDD
- Commodity hardware vs. specialized ("optimized" for Memory, Disk I/O, CPU, GPU)
- Open-source vs Paid vs Built in-house
- Experimentation capability sooner or later comes into large scale products
- Testing capability/testing tools and hooks/Gremlins/Hogs/Gameday-Outages exercise - introducing chaos into the system
- Deployments/rollbacks/canaries/soak-times/etc.
- Pluggable instrumentation
- JUSTIFY [5 min]
- Throughput of each layer
- Latency caused between each layer
- Overall latency justification
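One rough way to carry out the justification step is to add up per-layer latency and find the layer with the lowest throughput. The sketch below assumes every request passes through each layer in series (for example, a cache miss path). The layer names and all numbers are hypothetical placeholders for your own estimates.

```python
def justify(layers):
    """layers: list of (name, latency_ms, max_qps) tuples, called in series.

    Returns total latency and the layer that caps overall throughput.
    """
    total_latency_ms = sum(latency for _, latency, _ in layers)
    name, _, max_qps = min(layers, key=lambda layer: layer[2])
    return total_latency_ms, name, max_qps


layers = [
    ("load balancer", 1, 50_000),   # hypothetical figures throughout
    ("app server", 20, 5_000),
    ("cache", 2, 100_000),
    ("database", 30, 2_000),
]
total_ms, bottleneck, capacity_qps = justify(layers)
```

With these placeholder numbers the request takes 53 ms end to end and the database caps the system at 2,000 QPS. Summing works for averages only; percentiles such as P99 do not add up this way, so treat the total as a sanity check against the stated requirements rather than a guarantee.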