HLD + LLD · API DESIGN · PATTERNS · AI AGENTS

System Design in 30 Days

From CAP theorem to designing WhatsApp — with API design, HLD/LLD frameworks, 12 universal patterns, and AI agent systems. Every day: one concept, one real system, real trade-offs. Interview-ready by Day 30.

36 Days Total · ~2h Per Day · 14 System Designs · HLD + LLD Both Covered · AI Agents in Week 5

Prerequisites & Frameworks

READ BEFORE WEEK 1

The 5-Layer HLD Architecture

LAYER 1 — CLIENT     Mobile / Web / 3rd Party
        │ HTTPS
LAYER 2 — EDGE       CDN · GeoDNS · Load Balancer · API Gateway (auth · rate limit · SSL termination)
        │ internal
LAYER 3 — SERVICES   Stateless Microservices / Monolith (each service owns its domain — independently scalable)
        │ async
LAYER 4 — MESSAGING  Message Queue · Kafka · SQS (event bus · async decoupling · backpressure)
        │ read/write
LAYER 5 — DATA       Primary DB (SQL / NoSQL) · Cache (Redis / Memcached) · Search (Elasticsearch) · Object Store (S3 / GCS) · CDN (Edge Cache)

Latency per hop: DNS ~30ms · CDN hit ~5ms · Redis ~1ms · DB ~5–20ms
p50 end-to-end: 50–100ms · p99 bottleneck: almost always DB or missing cache

HLD Drawing Rules

1. APIs before boxes

Define 2–3 key API endpoints before drawing any component. What does the outside world call? API-first thinking reveals the data flow.

2. One level of abstraction

HLD shows components and connections — NOT class fields or method signatures. Each box = independently deployable unit.

3. Label every arrow

HTTP/REST, gRPC, Kafka topic name, SQL. Unlabelled arrows = undefined contracts. Show direction: sync (→) vs async (⇢ dashed).

4. Annotate with numbers

"10K RPS", "50ms p99", "3 replicas". Numbers make diagrams credible and show you've thought about scale.

5. Separate read and write paths

Draw write path first (the harder one), then read path. Different paths often use different storage or caching strategies.

6. Box what scales independently

Each component box should be able to scale independently. If two things always scale together, they might be one service.

Full Request Flow — Memorize This

CLIENT (browser/app)
  → DNS
  → CDN (~5ms) ── hit → done
       │ miss
  → LB (~1ms)
  → API GATEWAY (auth · rate-limit, ~5ms)
  → SERVICE (business logic)
  → CACHE (Redis, ~1ms) ── hit → done
       │ miss
  → DATABASE (~5–20ms)

p50: 50–100ms total · p99 bottleneck = DB miss or missing cache layer · CDN hit = ~5ms total

LLD = Entities + Relationships + State + Concurrency

// Example LLD: Ride Booking (Uber)

[ ENTITIES ]
┌─────────────────┐        ┌────────────────────┐
│ Rider           │        │ Driver             │
├─────────────────┤        ├────────────────────┤
│ id: UUID (PK)   │        │ id: UUID (PK)      │
│ name: String    │        │ name: String       │
│ phone: String   │        │ location: GeoPoint │
│ rating: Float   │        │ rating: Float      │
│ created_at: TS  │        │ vehicle: FK        │
└────────┬────────┘        └─────────┬──────────┘
         │ creates                   │ accepts
         ▼                           ▼
┌─────────────────────────────────┐
│ Ride                            │
├─────────────────────────────────┤
│ id: UUID (PK)                   │
│ rider_id: UUID (FK→Rider)       │
│ driver_id: UUID (FK→Driver)     │
│ status: ENUM  ←── state machine │
│ pickup: GeoPoint                │
│ dropoff: GeoPoint               │
│ fare: Decimal                   │
│ version: Int  ←── optimistic    │
│ created_at: Timestamp           │
└─────────────────────────────────┘

[ STATE MACHINE — Ride.status ]
REQUESTED ──accept──▶ MATCHED ──pickup──▶ IN_PROGRESS
    │                                          │
  cancel                                   complete
    │                                          │
    ▼                                          ▼
CANCELLED                                  COMPLETED

[ DESIGN PATTERNS USED ]
Strategy: PricingStrategy { standardFare(), surgeFare() }
Factory:  VehicleFactory.create(type) → Car | Bike | Truck
Observer: RideStatusBoard.onStatusChange(ride)
State:    Ride itself implements state transitions

LLD Drawing Rules

1. Entities first (nouns)

Identify nouns in the problem: User, Order, Product, Ride. Each noun → a class + a DB table. List fields with types and PKs.

2. Relationships with cardinality

Label every relationship: 1:1, 1:N, M:N. M:N needs a junction table. Show foreign keys.

3. State machines for status fields

Any entity with a status field needs a state diagram. Show valid transitions and which events trigger them.

4. Indexes — which queries does this serve?

For each query pattern, show the index: rides_by_rider (rider_id, created_at DESC). Missing index = table scan at scale.

5. Name the design patterns

Call out Strategy, Observer, Factory, State. Names show architectural literacy and prevent over-explanation.

6. Address concurrency explicitly

Where can race conditions occur? Show the fix: optimistic locking (version field + CAS), pessimistic locking (SELECT FOR UPDATE), or distributed lock (Redis SETNX). This is the senior/staff differentiator in LLD rounds.
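A minimal sketch of the optimistic-locking fix (version field + compare-and-set), using SQLite in place of a production DB. Table and function names are illustrative, not from any specific system:

```python
import sqlite3

# Optimistic locking sketch: the UPDATE only succeeds if the version we read
# is still current. Table/column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("INSERT INTO rides VALUES (1, 'REQUESTED', 0)")

def accept_ride(ride_id: int) -> bool:
    """Transition REQUESTED -> MATCHED only if no concurrent writer got there first."""
    status, version = conn.execute(
        "SELECT status, version FROM rides WHERE id = ?", (ride_id,)
    ).fetchone()
    if status != "REQUESTED":
        return False
    # CAS: WHERE version = ? makes the update conditional on our read still being fresh
    cur = conn.execute(
        "UPDATE rides SET status = 'MATCHED', version = version + 1 "
        "WHERE id = ? AND version = ?",
        (ride_id, version),
    )
    return cur.rowcount == 1  # 0 rows updated -> a concurrent writer won the race

print(accept_ride(1))  # True  — first driver wins
print(accept_ride(1))  # False — ride is no longer REQUESTED
```

On conflict the loser retries (re-read, re-check, re-CAS) instead of blocking, which is why this scales better than SELECT FOR UPDATE under low contention.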

Common LLD Design Patterns

Pattern    | Use When                                         | Example
-----------|--------------------------------------------------|-------------------------------------
Strategy   | Multiple interchangeable algorithms              | PricingStrategy, SortStrategy
Observer   | State change must notify multiple subscribers    | OrderStatus → notify email/SMS/push
Factory    | Create objects without specifying concrete class | NotificationFactory.create(type)
Singleton  | Exactly one instance needed globally             | DB connection pool, config
Decorator  | Add behaviour without subclassing                | LoggingCoffee wraps SimpleCoffee
Repository | Abstract data access layer                       | UserRepository.findById()
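The Strategy row in the table above can be sketched in a few lines; the class and method names mirror the PricingStrategy example but are illustrative:

```python
from abc import ABC, abstractmethod

# Strategy pattern: interchangeable pricing algorithms behind one interface.
class PricingStrategy(ABC):
    @abstractmethod
    def fare(self, distance_km: float) -> float: ...

class StandardFare(PricingStrategy):
    def fare(self, distance_km: float) -> float:
        return 2.0 + 1.5 * distance_km  # base fee + per-km rate (made-up numbers)

class SurgeFare(PricingStrategy):
    def __init__(self, multiplier: float):
        self.multiplier = multiplier
    def fare(self, distance_km: float) -> float:
        return StandardFare().fare(distance_km) * self.multiplier

def quote(strategy: PricingStrategy, distance_km: float) -> float:
    # Caller is decoupled from which algorithm runs — swap strategies freely.
    return round(strategy.fare(distance_km), 2)

print(quote(StandardFare(), 10))  # 17.0
print(quote(SurgeFare(1.8), 10))  # 30.6
```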

Every system design question maps to one or more of these 12 patterns. Identify the pattern first — then fill in the specifics. Knowing the pattern unlocks the right trade-offs automatically.

PATTERN 01 · READ HEAVY
Cache-Aside + Read Replicas
App checks Redis first → miss → read DB → populate cache. Add read replicas to fanout query load. Key question: what's the invalidation strategy?
→ Social feeds · Product pages · YouTube view counts
PATTERN 02 · WRITE HEAVY
Write-Ahead Queue + Async Workers
Writes go to Kafka/SQS first, workers process async. Decouples ingestion from processing. Backpressure = queue depth. Key: exactly-once semantics?
→ Analytics events · IoT telemetry · Email queue · Logs
PATTERN 03 · REAL-TIME
WebSocket + Pub-Sub Backend
Clients hold persistent WebSocket connections. Backend pub-sub (Redis/Kafka) fans out to all connection servers. Key: reconnect + replay from offset.
→ Chat · Live scores · Collaborative editing · Dashboards
PATTERN 04 · GLOBAL SCALE
CDN + GeoDNS + Edge Caching
Static assets from CDN edge. GeoDNS routes to nearest PoP. Origin shield collapses cache misses. Key: cache invalidation at edge, data residency laws.
→ Netflix · YouTube delivery · Cloudflare · Global APIs
PATTERN 05 · DISTRIBUTED TX
Saga Pattern (Choreography / Orchestration)
Break multi-service transaction into steps. Each step publishes event. On failure: compensating transactions roll back in reverse. Key: idempotent compensations.
→ E-commerce checkout · Flight booking · Bank transfer
PATTERN 06 · DUAL WRITE
Outbox Pattern + CDC (Debezium)
Write to DB + Outbox table in same transaction. Debezium reads WAL → publishes to Kafka. Atomicity in DB, eventual delivery to bus. Eliminates dual-write failure.
→ Any service writing to DB AND publishing events
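A toy sketch of the outbox half of Pattern 06, with SQLite standing in for the primary DB and a polling function standing in for Debezium/CDC. All table and topic names are illustrative:

```python
import sqlite3, json

# Outbox pattern: the business write and the event land in ONE transaction,
# so they cannot diverge. A relay (Debezium in production; a poller here)
# ships unpublished outbox rows to the bus.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id: int, total: float) -> None:
    with conn:  # single atomic transaction: both rows commit, or neither does
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders.created", json.dumps({"order_id": order_id, "total": total})),
        )

def relay_once(publish) -> int:
    """Stand-in for CDC: publish pending events, then mark them done."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

place_order(42, 99.5)
print(relay_once(lambda topic, event: print(topic, event)))  # publishes 1 event
```

Real CDC tails the WAL instead of polling, but the guarantee is the same: if the order committed, the event will eventually be delivered.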
PATTERN 07 · SEARCH
Inverted Index + ES Sync via CDC
Primary DB = source of truth. ES = read-optimized search replica synced via Kafka CDC. Hybrid search: ANN (semantic) + BM25 (keyword) + RRF fusion.
→ Airbnb search · LinkedIn · Product catalog · Legal docs
PATTERN 08 · GEO PROXIMITY
Geohash Grid + Redis GEO
Encode lat/lng as geohash string. Query neighboring cells. Redis GEOSEARCH for radius. Update moving objects via WebSocket stream. Key: precision vs cell size.
→ Uber driver matching · Yelp nearby · Tinder radius
PATTERN 09 · RATE LIMITING
Token Bucket / Sliding Window at Gateway
API Gateway checks Redis before forwarding. Token bucket for burst-friendly limiting. Sliding window counter for precise limits. Return 429 + Retry-After.
→ API platform · Payment gateway · SMS OTP service
PATTERN 10 · UNIQUE IDS
Snowflake ID / UUID v7
Snowflake: 64-bit = 41-bit timestamp + 10-bit machine + 12-bit seq. Sortable, ~4096 IDs/ms/node, no coordination. UUID v7: time-ordered + universally unique.
→ Tweet IDs · Instagram media IDs · Order IDs · Message IDs
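The bit layout in Pattern 10 is easy to verify in code. A minimal single-node sketch (epoch and machine_id are illustrative choices):

```python
import threading, time

# Snowflake-style ID sketch: 41-bit ms timestamp | 10-bit machine | 12-bit sequence.
EPOCH_MS = 1288834974657  # any fixed past instant works; this is Twitter's epoch

class Snowflake:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024       # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF   # 12-bit sequence: 4096 IDs/ms
                if self.seq == 0:                    # exhausted: spin to next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.seq

gen = Snowflake(machine_id=7)
a, b = gen.next_id(), gen.next_id()
print(a < b)  # True — IDs sort by creation time with no coordination between nodes
```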
PATTERN 11 · CQRS
Command Query Responsibility Segregation
Separate write model (normalized DB) from read model (denormalized read store). Commands produce events → update multiple read models optimized for each query.
→ E-commerce (orders write, catalog read) · Banking ledger
PATTERN 12 · RESILIENCE
Circuit Breaker + Bulkhead + Retry
Circuit breaker opens on failure threshold → fail fast. Bulkhead: isolate thread pools per dependency. Retry with exponential backoff + jitter. Timeout on every call.
→ Any service with downstream dependencies. Always add this.
Pattern Recognition Decision Tree
Read:write > 5:1 → Pattern 01 (cache)
Write-heavy, >50K/s → Pattern 02 (queue)
Server must push to client → Pattern 03 (WS)
Global <100ms latency → Pattern 04 (CDN)
Spans multiple services → Pattern 05 (saga)
DB write + publish event → Pattern 06 (outbox)
Full-text search needed → Pattern 07 (ES)
Location / proximity → Pattern 08 (geo)
API abuse prevention → Pattern 09 (rate limit)
Unique IDs at scale → Pattern 10 (snowflake)
Different read/write models → Pattern 11 (CQRS)
Has downstream deps → Pattern 12 (circuit)

The Universal 45-Minute Structure

MIN 0–5
Clarify Requirements — You Drive This

Ask before drawing. "Who are the users? Scale — DAU, req/sec? Read vs write ratio? Latency SLA — p50 or p99? Consistency requirements? Global or single region? Mobile or web?" Never design before you know this. Interviewers reward engineers who discover requirements.

MIN 5–10
Capacity Estimation + Bottleneck ID

Back-of-envelope math shows engineering judgment. "100M users, 10% DAU = 10M/day. 1 post each = 115 writes/sec. 20× read:write = 2300 reads/sec." Then ask: is this CPU-bound, memory-bound, I/O-bound, or network-bound? That determines the architecture.
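The arithmetic from that example, written out so you can sanity-check your own estimates the same way:

```python
# Back-of-envelope: 100M users, 10% DAU, 1 post/user/day, 20x read:write.
users = 100_000_000
dau = users * 10 // 100             # 10,000,000 active users per day
writes_per_sec = dau / 86_400       # seconds in a day
reads_per_sec = writes_per_sec * 20

print(int(writes_per_sec))   # 115 writes/sec
print(int(reads_per_sec))    # 2314 reads/sec (~2300 in the estimate above)
```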

MIN 10–20
High-Level Design — APIs + Core Flow

Define 2–3 key APIs first. Draw the 5-layer diagram. Identify which of the 12 patterns this maps to. Stay at one abstraction level — no class fields yet. Explicitly state what you're skipping: "I'll skip auth for now and focus on the write path."

MIN 20–35
Deep Dive — The Hardest Component

Ask "Where should I go deep?" or pick it yourself. Show internals: why Cassandra for messages (LSM-tree, write-optimized), why Redis ZSET for feeds (sorted by score, O(log n) insert), why Kafka for fan-out (durable, replay). This is where you differentiate.

MIN 35–45
Trade-offs + Bottlenecks + Wrap-Up

Proactively surface limitations before being asked. "One bottleneck here is celebrity fan-out — I'd handle it with hybrid push/pull." State trade-offs clearly. End: "What aspect would you like me to go deeper on?" — shows you're comfortable driving.

Pattern Recognition Questions

Is it read-heavy? → Cache-aside, read replicas, CDN
Is it write-heavy? → Queue + async, LSM-tree DB, sharding
Does it need real-time push? → WebSocket + pub-sub
Does it span multiple services? → Saga + Outbox
Does it need full-text search? → ES sync + inverted index
Does it need location queries? → Geohash + Redis GEO
Does it need global low latency? → CDN + GeoDNS + multi-region
Separate read/write models needed? → CQRS
Downstream dependency risk? → Circuit breaker + bulkhead
Unique IDs at massive scale? → Snowflake ID or UUID v7

Numbers Every Engineer Memorizes

Operation              | Latency
-----------------------|----------------
L1 cache hit           | ~0.5 ns
L2 cache hit           | ~5 ns
RAM read               | ~100 ns
SSD read               | ~16 μs
Redis GET              | ~0.5 ms
DB query (indexed)     | ~5–20 ms
CDN cache hit          | ~5 ms
Network (same DC)      | ~500 μs
Network (cross-region) | ~80–150 ms
1B req/day             | ~11,500 req/sec

REST — Core Rules

  • Resource URLs, not verbs: /users/42/orders not /getUserOrders
  • HTTP verbs: GET (read), POST (create), PUT (replace), PATCH (partial), DELETE (remove)
  • Idempotent: GET, PUT, DELETE are idempotent. POST is not.
  • Status codes: 200 OK, 201 Created, 204 No Content, 400 Bad Request, 401 Unauth, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable, 429 Rate Limited, 500 Server Error
  • Versioning: /api/v1/users — URL versioning is most visible and easiest to route
  • HATEOAS: include links to related actions in responses — reduces client coupling

Pagination Strategies

  • Offset: ?page=2&limit=20 — simple but inconsistent when records are inserted/deleted during pagination
  • Cursor: ?cursor=eyJpZCI6NDJ9&limit=20 — stable, works with real-time data. Cursor = opaque encoded value (base64 of last-seen ID)
  • Keyset: ?after_id=42&limit=20 — fastest for indexed columns, no OFFSET skip cost
  • Use cursor/keyset for any feed or list that receives new items while paginating
  • Always include: has_more: bool, next_cursor, total_count (optional — expensive)
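A runnable sketch of cursor pagination as described above. The data set, field names, and cursor format (base64 of the last-seen ID) are illustrative:

```python
import base64, json
from typing import Optional

# Cursor pagination: the cursor is an opaque base64 blob of the last-seen ID,
# and the page is selected by a keyset filter (id > after) — no OFFSET skip cost.
ITEMS = [{"id": i, "text": f"post {i}"} for i in range(1, 101)]  # toy data

def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]

def list_items(cursor: Optional[str] = None, limit: int = 20) -> dict:
    after = decode_cursor(cursor) if cursor else 0
    page = [it for it in ITEMS if it["id"] > after][:limit]
    has_more = bool(page) and page[-1]["id"] < ITEMS[-1]["id"]
    return {
        "items": page,
        "has_more": has_more,
        "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
    }

page1 = list_items(limit=20)
page2 = list_items(cursor=page1["next_cursor"], limit=20)
print(page2["items"][0]["id"])  # 21 — stable even if new items were inserted meanwhile
```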

Idempotency for Payments

  • Client sends unique key with every POST: Idempotency-Key: uuid-v4
  • Server: check Redis for key → if exists, return cached response → else process + cache
  • Cache response in Redis with 24h TTL keyed by idempotency key
  • On retry: return identical cached response — no duplicate charge
  • Used by: Stripe, PayPal, Braintree for all payment mutations

Auth Patterns

  • API Key: server-side secret, no expiry mechanism — best for server-to-server
  • JWT Bearer: stateless, self-contained claims, verify without DB. Short expiry (15min) + refresh token (7 days).
  • OAuth 2.0: delegated authorization — user grants app access to their data on another service
  • mTLS: both client and server present certs — zero-trust internal service mesh
  • HMAC: sign request body with shared secret — webhooks, S3 presigned URLs
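A minimal HMAC request-signing sketch using the standard library. The secret and the header name a sender would use (e.g. X-Signature) are illustrative; real providers document their own formats:

```python
import hmac, hashlib

SECRET = b"shared-secret"  # illustrative — in practice a per-webhook secret

def sign(body: bytes) -> str:
    """Sender computes this and puts it in a signature header."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    """Receiver recomputes and compares. compare_digest is timing-attack safe."""
    return hmac.compare_digest(sign(body), signature)

body = b'{"event": "payment.succeeded"}'
sig = sign(body)
print(verify(body, sig))                 # True  — untampered payload
print(verify(b'{"tampered": 1}', sig))   # False — body changed, signature fails
```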

REST vs GraphQL vs gRPC vs WebSocket

Criterion       | REST              | GraphQL                   | gRPC                    | WebSocket
----------------|-------------------|---------------------------|-------------------------|------------------------
Protocol        | HTTP/1.1–2        | HTTP/1.1–2                | HTTP/2                  | TCP (WS upgrade)
Format          | JSON/XML          | JSON                      | Protobuf (binary)       | Any (JSON/binary)
Performance     | Good              | Good                      | Excellent (5–10× REST)  | Excellent (persistent)
Caching         | Native HTTP cache | Complex (POST)            | No HTTP cache           | No
Type safety     | OpenAPI/Swagger   | Schema enforced           | Protobuf — compile-time | None by default
Streaming       | SSE only          | Subscriptions             | Native bidirectional    | Native full-duplex
Browser support | Universal         | Universal                 | Needs grpc-web proxy    | Universal
Best for        | Public APIs, CRUD | Mobile, multi-client apps | Internal microservices  | Real-time bidirectional
// gRPC Service Definition (Protobuf)
syntax = "proto3";

service UserService {
  rpc GetUser(GetUserRequest) returns (User);              // unary
  rpc ListUsers(ListRequest) returns (stream User);        // server streaming
  rpc BulkCreate(stream CreateReq) returns (BatchResult);  // client streaming
  rpc Chat(stream ChatMsg) returns (stream ChatMsg);       // bidirectional
}

message User {
  string id = 1;   // field numbers must NEVER change (wire format)
  string name = 2;
  string email = 3;
  google.protobuf.Timestamp created_at = 4;
}

// Auto-generates typed client + server code in Go, Java, Python, Rust, Node...
// 5–10× faster than REST/JSON. Use for internal microservices.
// REST for public APIs — universal client support + HTTP caching.

Each resource below is matched to a specific topic in this roadmap. Watch in this order for the fastest ramp from framework → core primitives → real designs.

ByteByteGo · Alex Xu
System Design Interview — Step By Step Guide
The exact framework used in real interviews: requirements → estimation → HLD → deep dive → wrap-up. Watch this before any specific design.
START HERE · 12 min ↗
Gaurav Sen · ex-Google
Consistent Hashing | System Design
The best consistent hashing explanation on YouTube. Ring, virtual nodes, why it matters for sharding and CDN routing. Essential concept.
MUST WATCH · 12 min ↗
Gaurav Sen · ex-Google
Design WhatsApp — System Design
Full HLD walkthrough: WebSockets, message queue, Cassandra for messages, presence system. See how trade-offs are articulated verbally.
FULL DESIGN · 20 min ↗
Gaurav Sen · ex-Google
Design Twitter — Timelines at Scale
Fan-out on write vs fan-out on read. The hybrid approach for celebrities vs regular users. The definitive fan-out explanation on YouTube.
FAN-OUT · 22 min ↗
Gaurav Sen · ex-Google
Design Tinder — Geospatial Matching
A strong geospatial systems walkthrough. Useful for location indexing, candidate generation, ranking constraints, and realtime updates.
GEO + MATCHING · 18 min ↗
codeKarle
Design Uber — System Design Interview
Realtime location updates, dispatch, geohash-style proximity search, trip state machine, and pricing. Good for both HLD and verbal communication style.
REAL-TIME SYSTEMS · 35 min ↗
Gaurav Sen · ex-Google
Design YouTube — Video Platform at Scale
Use this for the storage + upload + CDN mental model: ingestion path, transcoding, chunking, and edge delivery. Pair it with Day 23 in this roadmap.
MEDIA PIPELINE · 20 min ↗
Gaurav Sen · ex-Google
Design Netflix / Hotstar — Video Streaming
A better companion for CDN-heavy streaming systems: edge caching, segment delivery, bandwidth adaptation, and global read-heavy architecture.
STREAMING + CDN · 20 min ↗
HelloInterview · Free Written Course
System Design in a Hurry — Full Guide
Built by FAANG hiring managers. Covers every topic in this 30-day roadmap with interview rubric context. The best written companion to the videos.
FREE GUIDE — BOOKMARK ↗

How To Study This Like A Senior Engineer

DON'T JUST MEMORIZE BOXES

Daily Study Loop

  • Pass 1: understand the user journey and write path before any component names.
  • Pass 2: estimate scale: QPS, peak traffic, data growth, latency target, read:write ratio.
  • Pass 3: draw the minimum viable architecture that works for day-1 scale.
  • Pass 4: harden it with cache, queue, partitioning, retry, idempotency, and observability.
  • Pass 5: say what fails first and what you would scale next.

What Interviewers Actually Score

  • Requirement clarity: did you separate must-have from nice-to-have?
  • Prioritization: did you solve the hardest path first instead of drawing everything equally?
  • Trade-offs: did you explain why a queue, cache, shard key, or datastore is justified?
  • Failure awareness: did you mention hot keys, retries, duplicate events, partial writes, and cache inconsistency?
  • Communication: can a teammate implement your design without guessing your intent?

Staff-Level Signal

  • Start with what can go wrong, not only what works in the happy path.
  • Name the operational knobs: TTL, partition count, batch size, retry budget, timeout budget.
  • Call out migration strategy: how you would move from monolith to queue, single DB to sharded DB, sync to async.
  • Tie design to business constraints: cost, compliance, latency, regional isolation, consistency tolerance.
  • End with observability: metrics, logs, traces, dashboards, and alerts for the critical path.

Capacity Estimation Shortcuts

  • RPS: daily requests / 86400, then multiply by 3-5 for peak.
  • Storage: writes per day × object size × retention period.
  • Bandwidth: RPS × response size. This usually exposes CDN and cache needs quickly.
  • Memory for cache: hot working set × object size × overhead factor.
  • Partitions: choose based on write throughput, ordering boundaries, and rebalancing blast radius.

Answer Template For Full Designs

1. Clarify scope and non-functionals. 2. Estimate scale and traffic shape. 3. Define APIs and core entities. 4. Draw the write path. 5. Draw the read path. 6. Deep dive into the bottleneck component. 7. Cover failure modes, data consistency, and scaling plan. 8. Wrap with trade-offs and what you would ship first.

If you practice every day using this sequence, your answers stop sounding memorized and start sounding like engineering judgment.

Fast Understanding Hub

LEARN FLOWS, NOT BOXES

The 7 Flows To Master

  • Read flow: client → edge → cache → DB → response
  • Write flow: client → validation → primary DB → cache invalidation → async fan-out
  • Realtime flow: client connects → state tracked → events fan out → replay on reconnect
  • Search flow: primary write → outbox/CDC → search index sync → query + rank
  • Media flow: upload → object store → async processing → CDN distribution
  • Analytics flow: event ingest → queue/log → stream/batch aggregation → OLAP store
  • Failure flow: timeout → retry/backoff → circuit breaker → fallback/degrade gracefully

How Good Engineers Read A Design

  • Start at the user action: what exact request kicked off this system path?
  • Follow state changes: where is truth stored, cached, replicated, and replayed?
  • Ask where ordering matters: globally, per user, per chat, per partition?
  • Ask where duplication happens: retries, queues, consumers, webhooks, payments.
  • Ask where backpressure appears: queues, DB pool, worker pool, cache miss storms.

What To Memorize vs Understand

  • Memorize: common patterns, latency orders of magnitude, core datastore trade-offs.
  • Understand: why each component exists and what breaks if removed.
  • Memorize: standard request/read/write/replay flows.
  • Understand: consistency, scaling, partitioning, and operational failure modes.
  • Goal: explain any diagram left-to-right in plain English without staring at notes.

Universal Read / Write / Async Model

READ PATH
Client → CDN/Edge → LB/Gateway → Service → Redis
                                     ↓ miss
                                 Primary DB
                                     ↓ populate cache → respond

WRITE PATH
Client → Gateway → Service → Primary DB commit
                                 ↓ invalidate/update cache
                                 ↓ publish event to queue

ASYNC PATH
Queue/Log → Consumer → side effects:
    → search index · notifications · analytics · downstream services

Almost every design in this course is some variation of these three paths. Once this clicks, system design stops feeling like 100 unrelated problems.

The 10 Questions To Ask For Any Design

  • What is the write path?
  • What is the read path?
  • What is the source of truth?
  • What can be stale, and for how long?
  • Where do we cache, and how do we invalidate?
  • What must be synchronous vs asynchronous?
  • How do we partition data and traffic?
  • Where can duplicates happen, and how do we make operations idempotent?
  • What fails first at 10x traffic?
  • What metric would wake you up at 3 AM?

System Families

  • CRUD systems: user/account/order management
  • Feed systems: timelines, notifications, recommendations
  • Realtime systems: chat, presence, dispatch, collaboration
  • Search systems: query + retrieval + ranking
  • Pipeline systems: upload, transcode, ETL, ML inference

Primary Bottlenecks

  • Read-heavy: cache misses, DB fan-out, replication lag
  • Write-heavy: WAL/compaction pressure, lock contention, queue depth
  • Realtime: connection fan-out, state tracking, replay on reconnect
  • Global: cross-region latency, consistency, data residency
  • Burst traffic: thundering herd, hot partitions, token starvation

Fast-Learning Sequence

  • Study the flow first
  • Name the pattern second
  • Choose storage third
  • Add failure handling fourth
  • Add observability last

This order makes designs easier to remember because it follows causality, not tool categories.

WEEK 01 Foundations — Scaling, Caching, API Design, CAP FIRST PRINCIPLES
DAY 01 Scaling Fundamentals — Vertical, Horizontal, Scale Cube Concept

Core Concepts

  • Vertical scaling: add CPU/RAM to one machine. Simple, no code change. Hard ceiling ~96 cores. SPOF.
  • Horizontal scaling: add more machines. Requires stateless design, service discovery, load balancing.
  • Stateless services: no local session — store state in Redis/DB so any instance can serve any request.
  • Scale Cube (AKF): X=clone (horizontal), Y=functional decompose (microservices), Z=data partition (sharding)
  • Bottleneck first: before scaling, profile — CPU, memory, I/O, or network? Scale the actual bottleneck.

When to Scale What

  • CPU bound → vertical first, then horizontal with load balancer
  • Memory bound → vertical or offload to Redis/external cache
  • I/O bound → read replicas, caching, async processing
  • Network bound → CDN, edge caching, compression
  • DB bound → connection pooling, read replicas, sharding
  • Stateful service → externalize state first, then scale horizontally

Key Trade-offs

Vertical: operationally simple, no distributed system complexity, but hard ceiling and SPOF. Horizontal: theoretically unlimited, fault tolerant, but requires distributed consensus, session externalization, consistent hashing for data locality. Most production systems do both: vertical for the database, horizontal for stateless API tiers.

Interview Approach

When asked "how do you scale X?": first identify the bottleneck. Don't say "add more servers" — say "I'd first profile to identify whether we're CPU, memory, or I/O bound. For the stateless API tier I'd scale horizontally behind an ALB. For the database I'd start with read replicas and connection pooling, escalating to sharding only when reads on replicas saturate."

DAY 02 Load Balancing — Algorithms, Layers, Health Checks Infra

L4 vs L7 Load Balancing

  • L4 (Transport): routes by IP + TCP port, no packet inspection — fastest, no TLS termination
  • L7 (Application): routes by HTTP headers, path, cookie — enables path-based routing, A/B testing, auth
  • AWS NLB = L4. AWS ALB = L7. HAProxy does both. Nginx = L7 (can do L4 with stream module).
  • TLS termination at LB: offloads crypto from app servers, enables header inspection

Algorithms

  • Round Robin: simple, equal distribution. Bad if requests vary in cost.
  • Least Connections: send to server with fewest active connections. Best for variable-cost requests.
  • IP Hash: same client → same server (sticky). Breaks with proxies/NAT.
  • Consistent Hashing: minimize reshuffling when servers added/removed. Best for cache locality.
  • Weighted: canary deployments — send 5% to new version.
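The consistent-hashing bullet deserves a concrete look, since it is the one most candidates hand-wave. A minimal ring with virtual nodes (all names illustrative); the point is how few keys move when a node is added:

```python
import bisect, hashlib

# Consistent-hash ring with virtual nodes. With modulo hashing, adding a server
# remaps almost every key; on a ring, only the keys between the new node's
# positions and their predecessors move.
class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str, vnodes: int = 100) -> None:
        for i in range(vnodes):  # virtual nodes smooth out the distribution
            bisect.insort(self.ring, (self._h(f"{node}#{i}"), node))

    def get(self, key: str) -> str:
        # first vnode clockwise from the key's hash (wrap around at the end)
        i = bisect.bisect(self.ring, (self._h(key),)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.get(k) for k in (f"user:{n}" for n in range(1000))}
ring.add("cache-d")
moved = sum(before[k] != ring.get(k) for k in before)
print(moved < 500)  # True — roughly 1/4 of keys remap, not all 1000
```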

Health Checks + Connection Draining

Active health checks: LB polls /health endpoint every 5s. Passive: detect failures from response codes. Connection draining: when removing a server, LB stops sending new connections but lets in-flight requests finish (30–60s grace period). Critical for zero-downtime deployments.

Common Interview Gotcha

Sticky sessions (session affinity) break horizontal scaling — if a server dies, all its sessions are lost. The correct answer: externalize sessions to Redis, make the service stateless, and use round-robin or least-connections. Only use sticky sessions when you can't externalize state (legacy systems).

DAY 03 Caching — Strategies, Invalidation, Redis Internals Caching

Caching Strategies

  • Cache-aside (Lazy): app checks cache first → miss → read DB → populate cache. Most common.
  • Read-through: cache fetches from DB on miss automatically. App only talks to cache.
  • Write-through: write to cache AND DB synchronously. Consistent, but doubles write latency.
  • Write-behind (Write-back): write to cache only, flush to DB async. Fast writes, risk of data loss on crash.
  • Refresh-ahead: proactively refresh cache before TTL expires on popular keys.

Redis Data Structures

  • String: counters, session tokens, simple cache values
  • Hash: user profile fields — partial updates without fetching all
  • List: message queues, activity feeds (LPUSH/RPOP)
  • Set: unique visitors, tags, friend lists (union/intersection)
  • Sorted Set (ZSET): leaderboards, rate limiting, priority queues
  • HyperLogLog: approximate unique count — 12KB for billions of items
  • Streams: durable event log — lighter-weight Kafka alternative

Cache Invalidation — "Two Hard Problems"

TTL-based: simple, eventual consistency acceptable. Event-driven: write → Kafka event → cache invalidation consumers. More consistent but complex. Versioned keys: user:42:v3 — change version on update, old key orphans and expires. Cache stampede (thundering herd): many requests miss simultaneously and hammer DB. Solutions: mutex lock (only one request rebuilds), probabilistic early expiry (PER algorithm), request coalescing.
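The mutex-lock fix for a stampede is easy to demonstrate in-process. A local threading.Lock stands in for what would be a Redis SETNX lock in a distributed setup; all names are illustrative:

```python
import threading, time

# Stampede protection: only ONE caller rebuilds the hot key on a miss,
# the rest wait on the lock and then hit the freshly populated cache.
CACHE = {}
rebuild_count = 0
_lock = threading.Lock()

def slow_db_read(key: str) -> str:
    global rebuild_count
    rebuild_count += 1
    time.sleep(0.05)  # simulate an expensive query
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in CACHE:
        return CACHE[key]
    with _lock:                # serialize rebuilders
        if key in CACHE:       # double-check: a waiter may find it already filled
            return CACHE[key]
        CACHE[key] = slow_db_read(key)
        return CACHE[key]

threads = [threading.Thread(target=get, args=("hot-key",)) for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print(rebuild_count)  # 1 — twenty concurrent misses, a single DB rebuild
```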

Multilevel Caching

L1 = in-process memory (microseconds, limited size, per-instance). L2 = Redis cluster (milliseconds, shared, bounded). L3 = DB read replica. Mention hot keys: a single extremely popular key (celebrity post) can overwhelm one Redis shard — solutions: local replica per app server, key sharding by appending suffix.

DAY 04 CAP Theorem + Consistency Models + PACELC Theory

CAP Theorem

  • Consistency: every read sees the most recent write
  • Availability: every request gets a (non-error) response
  • Partition Tolerance: system continues despite network splits
  • Reality: P always occurs in distributed systems → choose C or A during partition
  • CP systems: HBase, ZooKeeper, etcd, MongoDB (strong mode)
  • AP systems: Cassandra, DynamoDB, CouchDB, Redis (default)

Consistency Models (Spectrum)

  • Linearizability: single-node illusion — reads always see latest write
  • Sequential: all ops appear in some consistent order
  • Causal: causally related operations appear in order
  • Read-your-writes: you always see your own writes (session)
  • Monotonic reads: once you read a value, you won't see older values
  • Eventual: all replicas converge if no new writes — lowest consistency

PACELC — More Practical Than CAP

PACELC extends CAP: "If Partition (P), choose between Availability (A) and Consistency (C). Else (E) — even without partition — choose between Latency (L) and Consistency (C)." Most distributed systems trade off latency vs consistency ALL the time, not just during partitions. DynamoDB: PA/EL (available, low latency). Spanner: PC/EC (consistent, higher latency). Understanding PACELC shows deeper distributed systems thinking.

Interview Signal

Never just say "I'd use CAP theorem." Say: "For a payment system, I need CP — a failed transaction is better than a duplicate credit. I'd accept reduced availability during network partitions and use circuit breakers to fail fast. For a social media feed, AP is fine — showing a slightly stale post is acceptable, and I'd use eventual consistency with short TTLs."

DAY 05 Rate Limiting — Algorithms, Distributed Implementation Resilience

Algorithms Compared

  • Fixed Window: count resets at interval boundary. Simple. Burst attack possible at boundary edges.
  • Sliding Window Log: log timestamps of each request in Redis sorted set. Precise. Memory = O(requests in window).
  • Sliding Window Counter: interpolate between current and previous window counts. Memory efficient, ~0.003% error rate.
  • Token Bucket: bucket refills at rate R, max burst = bucket size. Allows controlled bursting. AWS API Gateway uses this.
  • Leaky Bucket: requests queued, processed at constant rate. Smooths traffic. Queue fills under burst — requests dropped.
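The token bucket from the list above, as a minimal single-process sketch. Lazy refill on a monotonic clock keeps it O(1) per request; the rate and capacity are illustrative:

```python
import time

# Token bucket: refill at `rate_per_sec`, allow bursts up to `capacity`.
class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity           # start full: permits an initial burst
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # lazy refill: credit tokens for elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                     # caller returns 429 + Retry-After

bucket = TokenBucket(rate_per_sec=5, capacity=10)
burst = [bucket.allow() for _ in range(12)]
print(burst.count(True))  # 10 — burst up to capacity, then requests are throttled
```

The distributed version keeps `tokens` and `updated` in Redis and does the same arithmetic in a Lua script so the read-modify-write is atomic.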

Distributed Rate Limiting

  • Single Redis node: atomic Lua script — INCR + EXPIRE in one round-trip
  • Redis Cluster: consistent hash user_id to shard — same user always hits same node
  • Sliding window with ZSET: ZADD + ZREMRANGEBYSCORE + ZCARD atomically
  • Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After
  • Return HTTP 429 Too Many Requests with Retry-After header
  • Different limits by tier: free (100/min), pro (1000/min), enterprise (unlimited)
Distributed Rate Limiter Architecture:

Client → [API Gateway] → [Rate Limiter Middleware]
                               ↓
         [Redis Cluster — keyed by user_id:endpoint:window]

MULTI
  INCR   rate:user123:POST/api/v1/posts:1706745600
  EXPIRE rate:user123:POST/api/v1/posts:1706745600 60
EXEC

if count > limit → return 429 + Retry-After: 47
else → forward to service
DAY 06 CDN, DNS, Proxies — Full Request Flow Networking

CDN Internals

  • Edge PoPs: globally distributed cache nodes — serve content from nearest geography
  • Cache-Control headers: public, max-age=86400 tells CDN to cache for 1 day
  • Pull CDN: first request goes to origin, CDN caches response. Most common.
  • Push CDN: upload assets proactively to CDN edge nodes. Better for static files that don't change.
  • Origin Shield: intermediate layer between CDN edges and origin — collapses cache misses from multiple PoPs into one origin request. Reduces origin load ~90%.
  • Cache invalidation: URL versioning (app.v3.js) or API purge (CDN purge API on deploy).

DNS + Proxies

  • DNS resolution chain: Browser cache → OS cache → Recursive resolver → Root NS → TLD NS → Authoritative NS
  • TTL strategy: low TTL (60s) before blue-green switch, high TTL (3600s) normal operation
  • GeoDNS: return different IP based on client region — Route53 latency-based routing
  • Forward proxy: client-side (corp proxy, VPN) — client knows about it
  • Reverse proxy: server-side (Nginx, Cloudflare) — client doesn't know
  • API Gateway: reverse proxy + auth + rate limiting + routing + observability

Full Request Flow — Memorize This

Client → DNS lookup (~30ms first time, cached after) → CDN edge (cache HIT → ~5ms, done). If MISS → Load Balancer (~1ms) → API Gateway (auth, rate limit, ~5ms) → Service → Cache check (Redis HIT ~1ms, done). If MISS → Database query (~5–20ms) → populate cache → respond. Total p50: 50–100ms. p99 bottleneck is almost always the DB or a missing cache layer.

DAY 07 API Design — REST, GraphQL, gRPC, WebSocket, Best Practices API Design

REST — Principles & Best Practices

  • Resource-based URLs: nouns not verbs — /users/42/orders not /getUserOrders
  • HTTP verbs: GET (read), POST (create), PUT (replace), PATCH (partial update), DELETE (remove)
  • Status codes: 200 OK, 201 Created, 204 No Content, 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable Entity, 429 Too Many Requests, 500 Internal Server Error
  • Idempotency: GET/PUT/DELETE are idempotent. POST is not. Use idempotency keys for POST payments.
  • HATEOAS: responses include links to related actions — reduces client coupling

GraphQL vs REST vs gRPC

  • REST: simple, cacheable, universal client support. Overfetch/underfetch problem.
  • GraphQL: client specifies exact fields needed — eliminates over/under-fetch. Single endpoint. Complex queries, no HTTP caching by default. Best for: frontend-heavy apps, mobile (save bandwidth), multiple clients with different data needs.
  • gRPC: binary protocol (Protobuf), strongly typed, bidirectional streaming, HTTP/2. 5-10x faster than REST/JSON. Best for: internal microservice-to-microservice, streaming data, polyglot services.
  • WebSocket: persistent full-duplex. Use when server must push to client in real time.
REST API Design — URL Structure

  GET    /api/v1/users               → list users (paginated)
  POST   /api/v1/users               → create user
  GET    /api/v1/users/{id}          → get user by ID
  PATCH  /api/v1/users/{id}          → partial update
  DELETE /api/v1/users/{id}          → delete user
  GET    /api/v1/users/{id}/orders   → get user's orders (nested resource)
  POST   /api/v1/orders              → create order
  GET    /api/v1/orders?status=pending&limit=20&cursor=abc123 → filtered + paginated

  Pagination strategies:
    Offset: ?page=2&limit=20       → simple but inconsistent on inserts/deletes
    Cursor: ?cursor=eyJpZCI6NDJ9   → stable, works with real-time data changes
    Keyset: ?after_id=42&limit=20  → fast for indexed columns, no skip

API Versioning Strategies

  • URL versioning: /api/v1/users — most visible, easy to route. Industry default.
  • Header versioning: Accept: application/vnd.api+json;version=2 — cleaner URLs, harder to test in browser
  • Query param: /api/users?version=2 — simple but pollutes query space
  • Never break existing clients — add fields, never remove/rename
  • Deprecation: return Sunset header with deprecation date
  • Maintain N-1 versions minimum. Remove after migration period.

Idempotency & Safety

  • Idempotency key: client sends unique key with POST request (UUID)
  • Server caches response keyed by idempotency key for 24h
  • On retry: return cached response — no duplicate charge/order
  • Stripe, PayPal, Braintree all use this pattern for payments
  • Key header: Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
  • Store key → response in Redis with TTL. Check before processing.
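The check-before-process flow in miniature (assumptions: a plain dict stands in for Redis-with-TTL, and `charge_card` is a made-up side-effecting handler):

```python
import uuid

response_cache = {}   # stands in for Redis keyed by idempotency key, 24h TTL

def charge_card(amount_cents: int) -> dict:
    # Hypothetical side effect we must never repeat on retry.
    return {"status": "charged", "amount": amount_cents, "charge_id": str(uuid.uuid4())}

def handle_payment(idempotency_key: str, amount_cents: int) -> dict:
    # Check BEFORE processing: a retry with the same key replays the stored response.
    if idempotency_key in response_cache:
        return response_cache[idempotency_key]
    response = charge_card(amount_cents)
    response_cache[idempotency_key] = response
    return response
```

A retried request returns the identical response — same charge_id, no duplicate charge.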

API Authentication & Authorization

  • API Key: simple, server-side secret. No expiry mechanism. Best for server-to-server.
  • JWT (Bearer token): stateless, self-contained claims. Verify signature without DB call. Short expiry (15min) + refresh token (7 days).
  • OAuth 2.0: delegated authorization. User grants app access to their data on another service. Authorization Code flow for web, PKCE for mobile/SPA.
  • mTLS: mutual TLS — both client and server present certificates. For internal service-to-service in zero-trust networks.
  • HMAC signatures: sign request body with shared secret — webhooks, S3 presigned URLs.
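HMAC request signing is all stdlib — a sketch of the general webhook pattern (assumption: header names and secret distribution vary by provider):

```python
import hmac
import hashlib

SECRET = b"shared-webhook-secret"   # hypothetical shared secret

def sign(body: bytes) -> str:
    """Sender computes an HMAC-SHA256 over the raw request body."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    """Receiver recomputes and compares in constant time (avoids timing attacks)."""
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

`compare_digest` matters: a naive `==` leaks where the first mismatching byte is via response timing.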

API Error Design

  • Return structured errors — not just HTTP status code
  • Include: error code (machine-readable), message (human), details array, request_id for tracing
  • Never expose stack traces or internal DB errors to clients
  • Use RFC 7807 Problem Details standard for REST APIs
  • Consistent envelope: {"error":{"code":"RATE_LIMITED","message":"...","retry_after":47}}
  • Distinguish client errors (4xx) from server errors (5xx) — client errors are caller's fault
gRPC Service Definition (Protobuf):

  syntax = "proto3";

  import "google/protobuf/timestamp.proto";

  service UserService {
    rpc GetUser(GetUserRequest) returns (User);                      // unary
    rpc ListUsers(ListUsersRequest) returns (stream User);           // server streaming
    rpc CreateUsers(stream CreateUserRequest) returns (BatchResult); // client streaming
    rpc Chat(stream ChatMessage) returns (stream ChatMessage);       // bidirectional
  }

  message User {
    string id = 1;
    string name = 2;
    string email = 3;
    google.protobuf.Timestamp created_at = 4;
  }

  // gRPC: HTTP/2 + Protobuf = 5-10x faster than REST/JSON
  // Auto-generates client/server code in Go, Java, Python, Rust, etc.
  // Use for: internal microservices. REST for: public APIs.

API Design Decision Framework

Public API or 3rd party developers? → REST. Universally understood, easy to test with curl, HTTP caching works naturally.
Mobile app with bandwidth concerns? → GraphQL. Client specifies exactly what fields it needs — no over-fetching.
Internal microservices, high throughput? → gRPC. Binary protocol, streaming support, strongly typed contracts catch breaking changes at compile time.
Real-time bidirectional? → WebSocket (full-duplex persistent) or SSE (server-push only, simpler).
Multiple frontend clients (web, mobile, IoT) with different data needs? → GraphQL with a BFF (Backend for Frontend) layer.

API Design Interview Tips

Always define the API before drawing the architecture. Walk through: (1) What are the key resources? (2) What operations does each resource support? (3) What are the access patterns — who calls what, how often? (4) What are the consistency requirements for each endpoint? Showing API-first thinking is a strong signal — it means you think about contracts and client usability, not just internal implementation.

🎯 Week 1 Checkpoint

WEEK 02 Storage Deep Dives — SQL, NoSQL, Sharding, Replication DATA LAYER
DAY 08 SQL vs NoSQL — Decision Framework Storage

SQL Strengths

  • ACID transactions — critical for financial systems, inventory, reservations
  • Complex joins and aggregations across normalized data
  • Schema enforcement = data integrity at DB level
  • Mature tooling: EXPLAIN plans, index advisors, migrations
  • Strong consistency by default
  • Best for: payments, e-commerce orders, user accounts, ERP, anything relational

NoSQL Categories

  • Key-Value (Redis, DynamoDB): O(1) get/set, simple access patterns, no joins. Best for: sessions, cache, counters.
  • Document (MongoDB, Firestore): flexible schema, nested JSON. Best for: catalogs, CMS, user profiles.
  • Wide-Column (Cassandra, HBase): write-optimized LSM tree, massive scale, tunable consistency. Best for: time-series, IoT, audit logs.
  • Graph (Neo4j, Amazon Neptune): relationship traversal. Best for: social networks, fraud detection, recommendations.
  • Search (Elasticsearch, OpenSearch): inverted index, full-text + faceted search.

Decision Framework — Never "I'd use MongoDB because it's flexible"

Ask: (1) Do I need multi-entity ACID transactions? → PostgreSQL. (2) Do I need complex multi-table joins? → SQL. (3) Is write throughput > 50K/s with simple access patterns? → Cassandra/DynamoDB. (4) Flexible schema, document-centric reads, no complex joins? → MongoDB. (5) Cache or session? → Redis. (6) Full-text search? → Elasticsearch alongside primary DB. (7) Graph traversal? → Neo4j. The key signal: which queries will this DB serve, and can it serve them efficiently?

DAY 09 Database Internals — B-Tree, LSM Tree, MVCC, WAL Internals

B-Tree (PostgreSQL, MySQL)

  • Self-balancing tree, O(log n) reads and writes
  • Pages: typically 8KB, tuned for disk block size
  • Good for mixed read/write workloads
  • Random writes cause write amplification (update internal nodes)
  • In-place updates — good for read-heavy workloads

LSM Tree (Cassandra, RocksDB, LevelDB)

  • Write to in-memory MemTable first — O(1) sequential write
  • MemTable flushed to immutable SSTable on disk when full
  • Compaction merges SSTables — removes tombstones, sorts keys
  • Bloom filter per SSTable: skip SSTables that don't contain key
  • Amortized O(1) writes (sequential appends), O(log n) reads that may touch multiple SSTables. Better for write-heavy workloads.
  • Write amplification during compaction. Read amplification without bloom filter.
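The per-SSTable bloom filter can be sketched with stdlib hashing (assumptions: toy sizes; real engines derive the bit-array size m and hash count k from a target false-positive rate):

```python
import hashlib

class BloomFilter:
    """m-bit array + k hash functions. No false negatives; small false-positive rate."""
    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = 0                      # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive k independent positions by salting one hash function.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key: str):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        # All k bits set → "maybe present". Any bit clear → definitely absent,
        # which is what lets an LSM read skip whole SSTables.
        return all(self.bits >> p & 1 for p in self._positions(key))
```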

MVCC + WAL — How PostgreSQL Works

MVCC (Multi-Version Concurrency Control): each write creates a new row version with a transaction timestamp. Readers see a snapshot at their transaction start time — no blocking. Old versions cleaned up by VACUUM. Enables non-blocking reads alongside concurrent writes.

WAL (Write-Ahead Log): every change is written to the WAL log before it's applied to data pages. On crash: replay WAL to recover. Also enables streaming replication — ship WAL segments to replicas. PostgreSQL logical replication: replays WAL as SQL changes, enabling cross-version replication and CDC (Debezium).

DAY 10 Sharding — Strategies, Hotspot Prevention, Consistent Hashing Scalability

Sharding Strategies

  • Hash sharding: hash(key) % N → shard. Uniform distribution, no range queries across shards.
  • Range sharding: shard by key range (A-M, N-Z). Range queries efficient. Hotspot risk if data is skewed.
  • Directory sharding: lookup table maps key → shard. Most flexible, single point of failure.
  • Geo sharding: shard by region/country. Data locality, GDPR compliance.
  • Consistent hashing: hash ring — adding/removing nodes minimizes key movement. Used by DynamoDB, Cassandra.

Consistent Hashing Deep Dive

  • Place nodes on hash ring (0 to 2³²)
  • Key → hash → find nearest node clockwise on ring
  • Add node: only neighbors' keys move (≈ k/n keys)
  • Remove node: only that node's keys move to next node
  • Virtual nodes (vnodes): each physical node = 100-200 points on ring → smoother, more even key distribution, even with heterogeneous hardware
  • Used by: DynamoDB, Cassandra, Redis Cluster, CDN cache routing
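The ring itself fits in ~25 lines — a sketch with vnodes (assumptions: MD5 for brevity; production rings use faster hashes and add per-key replication):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []                       # sorted list of (hash, node)
        for n in nodes:
            self.add_node(n)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    def add_node(self, node: str):
        # Each physical node appears at `vnodes` points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # First vnode clockwise from the key's hash, wrapping past the top.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

The defining property: after adding a node, every key either keeps its old owner or moves to the new node — nothing else reshuffles.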

Shard Key Selection — The Most Critical Decision

Bad shard key = hotspots = one shard does all work while others are idle. For social network: user_id hash works for regular users but celebrities cause hot shards. Solution: celebrity detection + separate handling (fan-out on read for celebrities). For time-series: time-based shard key causes write hotspot on current time shard. Solution: composite key (time + device_id). Rule: choose a key with high cardinality AND even distribution across your access patterns. Avoid sequential keys (auto-increment IDs) — they create monotonic hotspots.

DAY 11 Replication — Sync vs Async, Leader Election, Quorums Availability

Replication Models

  • Single-leader: all writes to primary, replicated to read replicas. Simple. Primary is write bottleneck.
  • Multi-leader: writes accepted at multiple nodes. Conflict resolution required. Useful for multi-datacenter active-active.
  • Leaderless (Dynamo-style): any node accepts writes. Quorum-based consistency.
  • Sync replication: write confirmed only after replica ACKs. Durable but adds latency.
  • Async replication: write confirmed immediately, replica catches up. Low latency but possible data loss on primary crash.
  • Semi-sync: at least one replica must ACK. Balance between durability and latency. Available in MySQL via the semi-sync replication plugin (plain async is the default).

Quorum Reads/Writes

  • n = total replicas, w = write quorum, r = read quorum
  • Strong consistency: w + r > n (overlap guarantees you read the latest write)
  • Typical: n=3, w=2, r=2 — tolerates 1 failure, strongly consistent
  • Write-heavy: w=1, r=3 (fast writes, consistent reads)
  • Read-heavy: w=3, r=1 (durable writes, fast reads)
  • Availability-first: w=1, r=1 (AP system — eventual consistency)
  • Cassandra: tunable per query — LOCAL_QUORUM for multi-DC consistency
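The overlap rule is plain arithmetic — a small helper makes the trade-offs above mechanical (assumption: replicas fail independently):

```python
def quorum_properties(n: int, w: int, r: int) -> dict:
    """w + r > n guarantees the read set and write set overlap in >= 1 replica,
    so a quorum read always sees the latest quorum write."""
    return {
        "strongly_consistent": w + r > n,
        "write_fault_tolerance": n - w,   # replicas that can be down, writes still succeed
        "read_fault_tolerance": n - r,
    }
```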

Leader Election — How etcd/ZooKeeper Handle It

Raft consensus: nodes time out waiting for heartbeat → become candidate → request votes → win majority → become leader. Election timeout is randomized (150–300ms) to prevent split votes. Fencing tokens prevent split-brain: new leader issues higher token; storage layer rejects writes with lower token. PostgreSQL failover (Patroni): uses etcd/Consul for leader election, switches DNS/VIP to new primary. Zero-downtime failover in ~30s.

DAY 12–13 Indexing Deep Dive + Query Optimization Performance

Index Types

  • B-Tree (default): equality + range queries. Works with LIKE 'prefix%'.
  • Hash: equality only — slightly faster for exact match, no range support
  • Composite: (col_a, col_b, col_c). Leftmost-prefix rule: must include leading columns in query to use index.
  • Covering index: query fully served from index without table fetch (index includes all selected columns)
  • Partial index: index on subset of rows. WHERE active = true — smaller, faster for frequent filtered queries.
  • Full-text: inverted index with TF-IDF ranking. Use Elasticsearch for serious full-text search.
  • GIN / GiST: PostgreSQL indexes for arrays, JSONB, full-text, geometric types.

Query Optimization Signals

  • EXPLAIN ANALYZE: look for Seq Scan on large tables → missing index
  • Index selectivity: high cardinality columns make better indexes (email > gender)
  • N+1 query problem: 1 query for list + N queries for each item. Fix with JOIN or batch load.
  • Connection pooling: PgBouncer in transaction mode — 10K app connections → 100 real DB connections
  • Read replicas for analytics/reporting — separate from OLTP primary
  • Avoid SELECT * — select only needed columns (reduces I/O, enables covering indexes)
  • Pagination: use keyset pagination (WHERE id > last_id) not OFFSET for large tables
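Keyset pagination in miniature with sqlite3 (assumptions: the `items` table is made up; the SQL shape is the same in Postgres/MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 101)])

def page(last_id: int, limit: int = 20):
    # Keyset: seek directly via the primary-key index — no OFFSET scan,
    # so page 1000 costs the same as page 1.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()

first = page(0)                 # ids 1..20
second = page(first[-1][0])     # client passes back the last id it saw
```

Contrast with `OFFSET 980 LIMIT 20`, which still reads and discards 980 rows.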

🎯 Week 2 Checkpoint

WEEK 03 Messaging, Event-Driven Architecture, Real-Time Systems EVENT-DRIVEN
DAY 15 Kafka Internals — Partitions, Offsets, Consumer Groups, EOS Messaging

Kafka Architecture

  • Topic → Partitions (ordered, immutable append-only log)
  • Offset: per-partition monotonic counter. Consumer tracks its position.
  • Consumer Group: each partition consumed by exactly one consumer in the group
  • Replication factor: ISR (In-Sync Replicas) — leader + N replicas
  • Leader election via KRaft (Kafka Raft — replaces ZooKeeper since Kafka 3.3)
  • Retention: time-based (7 days default) or size-based
  • Log compaction: keep only latest value per key — good for state stores

Performance Tuning Knobs

  • Partition count: parallelism ceiling for consumers — can't have more consumers than partitions
  • acks=0: fire and forget (fastest, data loss). acks=1: leader ack. acks=all: all ISR ack (slowest, most durable).
  • linger.ms + batch.size: batch small messages — increases throughput, adds latency
  • compression.type: lz4 (speed) or zstd (ratio). Apply at producer level.
  • max.poll.interval.ms: if consumer doesn't poll in time → rebalance triggered
  • Partition key: determines which partition a message lands on — use user_id for ordering per user

Exactly-Once Semantics (EOS)

Kafka guarantees: at-least-once by default (consumer may process duplicate on crash). For exactly-once: (1) Idempotent producer: enable.idempotence=true — sequence numbers deduplicate retries at broker. (2) Transactional API: atomic produce to multiple topics + consumer offset commit in one transaction. In practice: idempotent consumers (dedup on DB side using unique constraint on idempotency key) are simpler and more resilient than Kafka EOS transactions.
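The DB-side dedup mentioned above, sketched with sqlite3 (assumptions: table and column names are illustrative; the unique constraint is what makes redelivery safe):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE processed_events (
    event_id TEXT PRIMARY KEY,   -- idempotency key: the unique constraint does the dedup
    payload  TEXT
)""")

def handle(event_id: str, payload: str) -> bool:
    """Returns True if processed, False if this delivery was a duplicate."""
    try:
        with conn:   # one transaction: dedup marker + business side effects commit together
            conn.execute("INSERT INTO processed_events VALUES (?, ?)", (event_id, payload))
        return True
    except sqlite3.IntegrityError:
        return False   # already seen — safe to ACK the message and move on
```

At-least-once delivery + this consumer = effectively exactly-once processing, without Kafka transactions.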

Ordering Nuance

Kafka guarantees ordering per partition only — not across partitions. For strict ordering of all events for a user: partition by user_id → all events for that user land in same partition in order. Cross-user ordering is not guaranteed. For globally ordered events (rare): use a single partition (loses parallelism) or use application-level sequence numbers.

DAY 16 Distributed Transactions — 2PC, Saga, Outbox Pattern Consistency

2-Phase Commit (2PC)

  • Coordinator: send PREPARE → all participants vote YES/NO
  • If all YES → send COMMIT to all. If any NO → send ABORT.
  • Blocking protocol: if coordinator crashes after PREPARE, participants are stuck (holding locks)
  • Used in distributed SQL (Spanner uses 2PC + Paxos). Avoid in microservices — too much coupling.
  • 3-Phase Commit (3PC) adds pre-commit phase to reduce blocking — rarely used in practice

Saga Pattern

  • Break distributed transaction into sequence of local transactions
  • Each step publishes an event that triggers the next service
  • On failure: compensating transactions undo previous steps in reverse order
  • Choreography: services react to events autonomously — no central coordinator. Simple, but harder to track overall state.
  • Orchestration: central saga orchestrator sends commands to each service. Easier to monitor and debug. Coupling to orchestrator.

Outbox Pattern — The Production-Grade Dual-Write Fix

Problem: write to DB and publish to Kafka — if Kafka publish fails after DB commit, event is lost. If you roll back on Kafka failure, you lose the DB write. This is the dual-write problem.
Solution — Outbox pattern: write to DB + Outbox table in same transaction (atomic). Debezium watches the WAL for Outbox table inserts and publishes to Kafka. Atomicity guaranteed by DB transaction. Eventual delivery guaranteed by CDC. This is the correct production approach for any service that writes to DB AND publishes events.
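The atomic half of the pattern in miniature (assumptions: sqlite3 stands in for the service DB, a polling relay stands in for Debezium tailing the WAL; names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT, payload TEXT, published INTEGER DEFAULT 0)""")

def create_order(order_id: int):
    # Business row + outbox event commit in ONE transaction — no dual-write gap.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'CREATED')", (order_id,))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order-events", json.dumps({"order_id": order_id, "type": "created"})))

def relay_once(publish):
    # CDC/relay side: read unpublished events, publish, then mark published.
    rows = conn.execute("SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)             # at-least-once: mark only after success
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()
```

If the relay crashes between publish and mark, the event is re-published — which is why consumers must dedup (at-least-once end to end).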

DAY 17 Real-Time Systems — WebSockets, SSE, Long Polling at Scale Real-Time

Protocol Comparison

  • Long Polling: client sends request → server holds until event or timeout → client immediately re-requests. Near real-time delivery while the request is held, but constant reconnect churn and high per-connection overhead. Use only for legacy clients.
  • SSE (Server-Sent Events): persistent HTTP connection, server streams events to client. One-way (server → client only). HTTP/1.1 compatible. Good for: live feeds, notifications, progress bars.
  • WebSocket: full-duplex persistent TCP connection. Both sides can send at any time. Best for: chat, collaborative editing, gaming, live dashboards. Uses HTTP Upgrade handshake.
  • WebTransport (QUIC): emerging standard. Datagrams + streams over HTTP/3. Better for real-time gaming, video conferencing.

WebSocket at Scale

  • Go handles 100K+ concurrent WS connections per server with goroutines
  • Connection server is stateful — can't naively load balance (sticky sessions needed at WS tier)
  • Pub-sub backend: connection servers subscribe to Redis Pub-Sub or Kafka topics
  • Fan-out: message → Redis Pub-Sub → all connection servers for recipient's room/channel → push to client
  • Presence tracking: heartbeat every 15s → Redis key with 30s TTL. Key expiry = user offline.
  • Reconnect: client tracks last received sequence number → reconnect with offset → server replays missed messages
WebSocket Fan-Out Architecture:

  Client A ←→ [WS Server 1] ←→ [Redis Pub-Sub: room:123]
  Client B ←→ [WS Server 2] ←→ [Redis Pub-Sub: room:123]
  Client C ←→ [WS Server 1] ←→ [Redis Pub-Sub: room:123]

  Message from A → WS Server 1 → publish to Redis topic "room:123"
  Redis Pub-Sub → fan out to all WS servers subscribed to "room:123"
  WS Server 1 → push to Client C. WS Server 2 → push to Client B.
DAY 18–19 Observability — Metrics, Logs, Traces (The Three Pillars) Production

Metrics (Prometheus + Grafana)

  • RED method: Rate (requests/sec), Errors (error rate %), Duration (latency p50/p99)
  • USE method: Utilization, Saturation, Errors — for infrastructure resources
  • Histogram: p50, p90, p99 latency. Never use averages — they hide outliers.
  • Cardinality trap: avoid high-cardinality labels (user_id, request_id) — explodes Prometheus memory
  • SLI/SLO/SLA: SLI=measure (p99 latency), SLO=target (p99 < 200ms 99.9% of time), SLA=contractual guarantee

Distributed Tracing (Jaeger, Tempo)

  • Trace: tree of spans across services for one request
  • Span: single operation with start/end time, service name, tags
  • Propagate: W3C TraceContext header (traceparent) across service calls
  • Sampling: head-based (random %) vs tail-based (on error/slow). Tail-based catches all slow/error requests.
  • OpenTelemetry: vendor-neutral instrumentation standard. Collect once, export to Jaeger/Tempo/Datadog.

Structured Logging Best Practices

Always log JSON in production. Include: timestamp, level, service, trace_id, span_id, user_id (when applicable), message, and relevant context fields. Never log PII (emails, passwords, payment cards). Use log levels correctly: DEBUG (dev only), INFO (normal ops), WARN (unexpected but handled), ERROR (needs attention), FATAL (crash). Centralize with ELK (Elasticsearch + Logstash + Kibana) or Loki + Grafana. Correlate logs with traces via trace_id for end-to-end debugging.
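A structured JSON formatter on the stdlib logging module (assumptions: the field set and the "checkout" service name are illustrative; real services pull trace_id from the tracing context rather than `extra`):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": "checkout",                         # hypothetical service name
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None), # correlates logs with traces
        }
        return json.dumps(entry)

logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"trace_id": "abc123"})
```

Every line is machine-parseable, so Loki/Elasticsearch can index `level` and `trace_id` directly.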

🎯 Week 3 Checkpoint

Before Week 4 — How To Read Any Full System Design

WEEK 04 Real System Designs — HLD + LLD Full Walkthroughs FULL DESIGNS
DAY 22 Design WhatsApp — Messaging at Scale Full HLD+LLD
HLD:

  Client → [WebSocket Server Cluster] → [Message Service]
                                              ↓
                                [Kafka: msg-events topic]
                                   ↓                 ↓
                        [Delivery Service]   [Persistence Service]
                                   ↓                 ↓
                        [Recipient WS conn]  [Cassandra: messages by chat_id + timestamp]

  Presence:   Redis SET user:{id}:online → 30s TTL, heartbeat every 15s
  Push Notif: if recipient offline → FCM/APNs via [Notification Service]
  Media:      Client uploads to S3 directly → CDN URL stored in message

Key Design Decisions

  • Cassandra: partition by chat_id, cluster by timestamp DESC — efficient time-ordered reads per chat
  • Message ID: Snowflake (timestamp + server + seq) — globally unique, sortable, no coordination
  • E2E encryption: Signal Protocol — keys exchanged out-of-band, server stores only ciphertext
  • WebSocket server → stateful, use consistent hashing to route user's connections to same server
  • Message state machine: SENT → DELIVERED (ACK from recipient device) → READ (read receipt)
  • Offline delivery: store in Cassandra + push notification. On reconnect, fetch unread since last_seen_timestamp.
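A Snowflake-style generator sketch (assumptions: the common 41/10/12 bit split; clock handling is simplified — real generators wait out sequence overflow and guard against clock moving backwards):

```python
import time

class Snowflake:
    """64-bit ID: 41 bits ms-timestamp | 10 bits worker | 12 bits sequence."""
    EPOCH = 1288834974657          # Twitter's custom epoch (2010-11-04)

    def __init__(self, worker_id: int, clock=lambda: int(time.time() * 1000)):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        ms = self.clock()
        if ms == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF    # up to 4096 IDs per ms per worker
        else:
            self.seq = 0
            self.last_ms = ms
        return ((ms - self.EPOCH) << 22) | (self.worker_id << 12) | self.seq
```

Because the timestamp occupies the high bits, sorting by ID is sorting by creation time — no coordination service needed.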

Group Messages — Fan-Out Strategy

  • Small groups (<100 members): fan-out on write — write message to each member's inbox
  • Large groups: fan-out on read — members pull from shared group inbox on open
  • Bounded fan-out: limit group size (WhatsApp: 1024) to cap write amplification
  • Group membership: cached in Redis, backed by PostgreSQL
  • Delivery receipt for groups: message delivered when ALL members receive (complex)
DAY 23 Design YouTube — Video Storage, Transcoding, CDN Full HLD+LLD
Upload Flow:

  Client → [Upload Service] → S3 (raw video)
                ↓
  [Kafka: video-uploaded topic]
                ↓
  [Transcoding Job Queue]
                ↓
  [Worker Farm: FFmpeg containers (auto-scale)]
                ↓
  [S3: segments in HLS format — 360p, 720p, 1080p, 4K]
                ↓
  [CDN: distribute segments globally]

View Flow:

  Client → CDN (HLS manifest .m3u8) → Adaptive Bitrate: pick quality based on bandwidth

Scale Numbers to Reason With

  • 500 hours of video uploaded per minute
  • 1 hour of video → ~5GB raw → ~1.5GB after compression across qualities
  • Transcoding: parallelize per video segment (5s chunks) — 1 hour video → 720 chunks → 720 parallel jobs
  • CDN: 95%+ of views served from edge — origin only serves cache misses
  • View count: approximate with Redis INCR per shard, aggregate periodically. Unique viewers: HyperLogLog — approximate distinct counting (~0.8% error) in a few KB per counter.

Recommendation System (Offline ML)

  • Watch history + clicks → feature store → ML training pipeline (Spark)
  • Pre-compute recommendations per user nightly → store in Redis/DynamoDB
  • Real-time signals: last 10 watch actions → re-score with lightweight model
  • Cold start: new users get popular videos by category until enough signal
  • Candidate generation → ranking → filtering (already watched, policy) → serve
DAY 24 Design Uber — Real-Time Location, Matching, Surge Full HLD+LLD

Location Service

  • Driver sends location every 4 seconds via WebSocket
  • Geohash: encode lat/lng as string prefix (shorter string = larger cell)
  • Redis: GEOADD key lng lat driver_id. GEOSEARCH for radius queries.
  • Or: geohash prefix → Redis SET of driver IDs in that cell. Query neighboring cells.
  • Location history: write to Cassandra for trip replay and billing

Matching Service

  • Rider requests → query nearby available drivers (geosearch)
  • Score drivers: distance + rating + vehicle type + surge zone
  • Dispatch to top N drivers simultaneously (first to accept wins)
  • Distributed lock on driver during dispatch window (Redis SET NX PX 30000)
  • ETA: precomputed road graph (OSRM or Google Maps Platform)
  • Surge pricing: supply/demand ratio per geohash cell, multiplier applied at pricing service
Ride Request Flow:

  Rider app → POST /rides → [Ride Service]
                                  ↓
  [Location Service: find nearby drivers via Redis GEO]
                                  ↓
  [Matching Service: score + dispatch]
                                  ↓  (send offer to top 3 drivers)
  [Driver apps via WebSocket push]
                                  ↓  (first driver accepts)
  [Ride Service: create ride, lock driver, notify rider]
                                  ↓
  [Kafka: ride-accepted → ETA service, billing service]
DAY 25 Design URL Shortener + Rate-Limited API Service HLD

URL Shortener

  • Short code generation: base62(auto-increment ID) — simple, unique, sortable
  • Alternative: MD5(url)[0:7] — collision check required, no guarantee of uniqueness
  • Redirect: 301 (permanent — browser caches, saves future requests) vs 302 (temporary — allows analytics tracking)
  • Storage: DynamoDB (short_code → long_url + metadata). 100M URLs/day → Cassandra for write throughput.
  • Cache hot redirects in Redis: top 10% of links → 90%+ of traffic
  • Analytics: log redirect events to Kafka → Spark/Flink pipeline → ClickHouse for querying
  • Custom domains: DNS CNAME to CDN → CDN routes to redirect service
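Base62 encoding of the auto-increment ID is a short loop — a sketch (assumption: alphabet order is a convention; any fixed 62-char alphabet works as long as encode and decode agree):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Auto-increment row ID → short code. 62^7 ≈ 3.5 trillion codes at length 7."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)       # peel off base-62 digits, least significant first
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(code: str) -> int:
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Redirect lookup then becomes: `decode(short_code)` → primary key → long URL (or skip decoding entirely and key the store by the code itself).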

Redis Cluster Architecture

  • Hash slots: 16384 slots distributed across nodes. CRC16(key) % 16384 → slot → node.
  • Each master has N replicas. Failure: replica auto-promotes (Sentinel or Cluster mode).
  • Hot key problem: single key too popular → one slot hot. Solution: replicate hot key across shards with suffix (user:42:0, user:42:1).
  • CLUSTER KEYSLOT cmd: check which slot a key maps to
  • Cross-slot transactions: not supported — redesign if needed (use Lua scripts within same slot)
DAY 26 Design Twitter/X Feed — Fan-Out Strategies Full HLD+LLD

Fan-Out on Write (Push)

  • On tweet: push tweet_id to all followers' feed caches (Redis ZSET, score=timestamp)
  • O(followers) work at write time. Read is O(1) — just fetch pre-built feed.
  • Celebrity problem: user with 50M followers = 50M Redis writes per tweet
  • Good for: regular users (<1M followers) — fast feed reads

Fan-Out on Read (Pull)

  • On feed load: fetch recent tweets from all followed users, merge + sort
  • Heavy reads but no write amplification for celebrities
  • N followed users → N DB queries → merge sort → paginate. Slow for users following many accounts.
  • Good for: celebrities — read cost is shared across followers, not pushed to write

Hybrid Approach — The Actual Twitter Answer

Classify users by follower count. Regular users (<1M followers): fan-out on write to Redis ZSET (score=timestamp). Celebrity users (>1M): on read, merge their recent tweets with the user's pre-built feed from Redis. This is what Twitter actually did. When you mention this unprompted, it signals you understand real production trade-offs, not just textbook patterns. Also discuss: feed truncation (keep only last 800 tweets in cache), backfill on first load, and how promoted tweets are injected.
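The read-path merge can be sketched with heapq (assumptions: plain lists stand in for the Redis ZSET feed and the celebrity tweet store; both inputs arrive newest-first):

```python
import heapq

def build_feed(precomputed, celebrity_recent, limit=5):
    """Merge the pre-built (fan-out-on-write) feed with celebrity tweets pulled
    at read time. Inputs are (timestamp, tweet_id) pairs, newest first."""
    # key=-timestamp turns descending-by-time inputs into ascending-by-key,
    # which is what heapq.merge requires.
    merged = heapq.merge(precomputed, celebrity_recent, key=lambda t: -t[0])
    return [tweet_id for _, tweet_id in list(merged)[:limit]]
```

This is O(total tweets fetched), and the celebrity list is tiny per author — the expensive 50M-follower fan-out never happens.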

DAY 27–28 LLD Deep Dives — Design Patterns + Concurrency LLD

LLD: Parking Lot System

  • Entities: ParkingLot, Level, Slot, Ticket, Vehicle (abstract), Car/Bike/Truck (concrete)
  • Strategy pattern: PricingStrategy interface — HourlyPricing, FlatRatePricing. Swap without changing Ticket.
  • Factory pattern: VehicleFactory.create(type) — returns correct Vehicle subclass
  • Observer pattern: SlotAvailabilityBoard subscribes to slot state changes
  • Concurrency: booking a slot → check availability + assign must be atomic. Use optimistic locking: slot has version field, CAS update fails if version changed.
  • State machine: AVAILABLE → RESERVED → OCCUPIED → AVAILABLE
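The optimistic-locking CAS for slot booking, sketched in-memory (assumption: against a real DB this is `UPDATE slot SET state='RESERVED', version=version+1 WHERE id=? AND version=?` with rows-affected checked):

```python
from dataclasses import dataclass

@dataclass
class Slot:
    id: int
    state: str = "AVAILABLE"
    version: int = 0

def reserve(slot: Slot, expected_version: int) -> bool:
    """Compare-and-swap: succeeds only if nobody changed the slot since we read it."""
    if slot.version != expected_version or slot.state != "AVAILABLE":
        return False   # conflict → caller must re-read and retry
    slot.state = "RESERVED"
    slot.version += 1
    return True

# Two clients read the same slot (both see version 0), then both try to book:
slot = Slot(id=7)
seen_by_a = slot.version
seen_by_b = slot.version
a_won = reserve(slot, seen_by_a)   # A's CAS succeeds
b_won = reserve(slot, seen_by_b)   # B's CAS fails — version already moved to 1
```

This closes exactly the TOCTOU race described below: the check and the write become one atomic version-guarded step.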

LLD: Notification System

  • Entities: Notification, NotificationChannel (abstract), EmailChannel, SMSChannel, PushChannel
  • Strategy pattern: each channel implements send(notification). Plug in new channels without changing core.
  • Template method: BaseNotification.send() calls format() + deliver() — subclasses override format()
  • Queue-based delivery: write to Kafka topic per channel → consumers retry with exponential backoff
  • Idempotency: idempotency_key per notification → dedup on delivery side
  • Rate limiting per user per channel: prevent spam — max 10 emails/day per user

LLD Interview Signal

The gap between senior and mid-level in LLD is concurrency awareness. Don't just draw classes — say "this slot booking has a TOCTOU (time-of-check-time-of-use) race condition between the availability check and the reservation write. I'd handle this with optimistic locking — add a version field to the Slot entity. The update only succeeds if version matches what we read. On conflict, retry with fresh read." Proactively naming race conditions and their solutions is the signal interviewers are looking for.

🏆 Day 29–30: Full Mock System Design Sessions

WEEK 05 · OPTIONAL AI Agent System Design — The 2025 Frontier AI SYSTEMS

This week is bonus / no time pressure — every concept here builds on your existing system design foundation. AI agent design is now appearing at companies building AI products. All prior weeks remain fully intact.

DAY 31 AI Agent Patterns — ReAct, Plan-Execute, Reflection Architecture

ReAct Pattern (Reason + Act)

  • LLM alternates: Thought → Action (tool call) → Observation → repeat until done
  • Handles ambiguity well — can discover information dynamically
  • Risk: error compounds over long chains. Add max_steps hard limit.
  • Best for: research tasks, open-ended exploration, web navigation
  • Frameworks: LangGraph, LlamaIndex Agents
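The loop itself is small — a skeleton with the LLM replaced by a scripted stub (assumptions: real implementations parse the model's Thought/Action text and handle malformed output; all names here are illustrative):

```python
def react_loop(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    """Thought → Action (tool call) → Observation, until a final answer or the step cap."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):                 # hard limit: errors compound on long chains
        step = llm("\n".join(transcript))      # stub returns {'thought', 'action', 'input'}
        transcript.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":
            return step["input"]
        observation = tools[step["action"]](step["input"])   # execute the tool
        transcript.append(f"Observation: {observation}")     # feed result back
    return "gave up: step limit reached"

# Scripted stub standing in for the model:
script = iter([
    {"thought": "Need the population.", "action": "search", "input": "population of France"},
    {"thought": "I have the answer.",   "action": "finish", "input": "about 68 million"},
])
tools = {"search": lambda q: "France: ~68M people (2024 est.)"}
answer = react_loop(lambda prompt: next(script), tools, "How many people live in France?")
```

Note how the observation is appended to the transcript before the next model call — that feedback loop is the whole pattern.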

Plan-and-Execute Pattern

  • Planner LLM: generates full step-by-step task list upfront
  • Executor: smaller/cheaper model runs each step one at a time
  • Dramatically reduces cost — expensive Planner called once
  • More auditable — plan is a visible, loggable artifact
  • Best for: financial workflows, structured data pipelines, known task shapes
  • Add re-planning step if a step fails unexpectedly

ReAct vs Plan-Execute Trade-off

ReAct: flexible, handles unknown task shapes, discovers information dynamically. But error compounds — step 3 failure based on wrong step 1 reasoning can cascade. Plan-Execute: auditable, cost-efficient (Executor = small/cheap model), predictable — but brittle if plan is wrong at step 1. Production guideline (Anthropic): start with the simplest approach. Most production agent failures come from over-engineering, not under-engineering. Add complexity only when needed.

DAY 32 Agent Components — Memory, Tools, Context Engineering Deep Dive

Memory Architecture

  • In-context (working): current conversation + tool results in prompt window — fast but bounded (~128K tokens)
  • External long-term: vector DB (pgvector, Pinecone) — embed past interactions, retrieve relevant on each turn
  • Episodic: summarize older conversation chunks → write summary back to context. Google ADK compaction approach.
  • Semantic cache: embed user query → cosine match recent Q&A pairs → return cached if similarity > threshold. Saves 30–60% LLM costs.
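The semantic cache lookup, sketched (assumptions: a bag-of-words counter stands in for a real embedding model, and a linear scan stands in for Redis/pgvector ANN search):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())     # toy stand-in for an embedding model

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries = []                    # list of (vector, cached answer)

    def get(self, query: str):
        q = embed(query)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer                # hit — skip the LLM call entirely
        return None

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))
```

The threshold is the cost/correctness dial: lower it and more queries hit the cache, but near-miss questions start getting someone else's answer.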

Tool Design for Agents

  • Every tool = JSON schema: name, description, parameters, return type
  • Description quality matters enormously — LLM uses it to decide when and how to call
  • Tools must be idempotent where possible — agents retry on failure
  • MCP (Model Context Protocol): Anthropic's standard for plug-and-play tool integrations
  • Always validate + sanitize tool outputs before feeding back to LLM — prevent prompt injection from tool results
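A sketch of a tool definition in the JSON-schema shape most tool-calling APIs expect, plus a minimal pre-dispatch check; `get_weather` and `validate_args` are made-up illustrations, not any specific vendor's API:

```python
# The description is what the model reads to decide when and how to call the tool,
# which is why its quality matters so much.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city. Use when the user asks "
                   "about temperature or conditions in a specific location.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def validate_args(tool: dict, args: dict) -> dict:
    """Minimal gate: check required parameters before dispatching the call."""
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    if missing:
        raise ValueError(f"missing required args: {missing}")
    return args
```

A real gate would validate types and enums too (e.g. with `jsonschema` or Pydantic), and would apply the same scrutiny to tool outputs before they re-enter the prompt.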

Context Engineering — The Real Production Bottleneck

Long-running agents exhaust their context window. Naive solution: bigger context window. Real solution: treat context as a compiled artifact. Separate durable state (DB) from per-call working context (prompt). Apply compaction: summarize old events, keep recent N turns verbatim, inject only relevant memory chunks retrieved from vector store. This is the difference between an agent that works for 3 turns and one that works for 300 turns in production.
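The compaction recipe can be sketched as a context compiler; `summarize` and `retrieve` stand in for an LLM summarizer and a vector-store query, and the function names are illustrative:

```python
def build_context(system_prompt, history, query, retrieve, summarize,
                  keep_recent=4):
    """Compile the per-call prompt: a summary of old turns, only the memory
    chunks relevant to this query, and the last N turns verbatim."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    parts = [system_prompt]
    if old:
        parts.append("Earlier conversation (summary): " + summarize(old))
    parts += ["Relevant memory: " + chunk for chunk in retrieve(query)]
    parts += recent                       # recent turns stay verbatim
    parts.append("User: " + query)
    return "\n".join(parts)
```

The point of the shape: `history` is the durable state in the DB; what goes to the model is recompiled fresh on every call, so the prompt stays bounded no matter how long the agent runs.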

DAY 33 Multi-Agent Systems — Supervisor, Specialist Agents, Orchestration Architecture
Supervisor Multi-Agent Pattern:

    User Request
         ↓
    [Supervisor Agent: classify + route task]
         ↓                  ↓                ↓
    [Research Agent]   [Code Agent]     [QA Agent]
    (web search tools) (shell tools)    (test tools)
         ↓                  ↓                ↓
    [Results aggregated by Supervisor]
         ↓
    [Final Response to User]

Key: Each specialist runs its own ReAct loop internally. Supervisor only sees the final output of each agent — not intermediate steps. Each agent has bounded tool access (least privilege).

Single vs Multi-Agent — When to Escalate

Start single agent + many tools. Add multi-agent only when: (1) tasks are genuinely parallel and independent, (2) specialist isolation prevents error contamination between domains, (3) different agents need different tool permissions (security boundary). Don't add multi-agent for complexity's sake. Production lesson from Anthropic: "The most successful implementations consistently used simple, composable patterns rather than complex frameworks." Debugging multi-agent race conditions and message passing failures is brutal.

DAY 34 Design a Production LLM Platform — AI Gateway + RAG at Scale Full HLD
LLM Platform HLD:

    Client → [AI Gateway: auth, TPM rate-limit, cost tracking, model routing]
                   ↓                         ↓
          [Semantic Cache]            [Model Router]
          (Redis + pgvector)          ↓       ↓        ↓
          similarity > 0.95 → hit  [GPT-4o] [Claude] [Llama-70B on GPU]
                   ↓
          [Response Logger + Cost Aggregator]

RAG Pipeline:

    Docs  → [Chunker (semantic boundaries)] → [Embedder: text-embedding-3] → [pgvector / Pinecone]
    Query → [Embed query] → [ANN top-20] → [Cross-encoder reranker: top-3]
          → [Budget-aware prompt builder] → [LLM] → [Streamed response]

AI Gateway Key Decisions

  • Rate limit in tokens (TPM), not requests: one request can cost 100 or 100K tokens
  • Model routing: classify query complexity → route simple queries to a small/cheap model, complex ones to GPT-4o. 60–80% cost reduction.
  • Semantic cache: embed prompt → cosine search → return cached if similarity > 0.95. Massive savings on repeated queries.
  • Fallback chain: primary model timeout → retry secondary → hardcoded fallback response
  • Streaming: SSE for token-by-token output — reduces perceived latency dramatically
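The first bullet can be made concrete with a token-bucket limiter denominated in tokens rather than requests. This is a sketch, not any gateway's real API; the injectable clock exists only to make it testable:

```python
import time

class TokenRateLimiter:
    """TPM limiter: budget is spent in tokens, so one 100K-token request
    costs 1000x a 100-token one instead of counting as 'one request'."""
    def __init__(self, tpm: int, clock=time.monotonic):
        self.capacity = tpm
        self.tokens = float(tpm)
        self.refill_per_sec = tpm / 60.0
        self.clock = clock
        self.last = clock()

    def allow(self, token_cost: int) -> bool:
        now = self.clock()
        # refill proportionally to elapsed time, capped at the TPM budget
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if token_cost <= self.tokens:
            self.tokens -= token_cost
            return True
        return False
```

In a real gateway the bucket lives in Redis (shared across gateway instances), and the cost is estimated before the call and reconciled against actual usage after.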

RAG Production Trade-offs

  • Chunking: fixed-size (fast) vs semantic (better recall). Parent-child: retrieve small chunk, return larger parent for context.
  • Hybrid search: ANN (semantic) + BM25 (keyword) combined with RRF scoring. Best precision.
  • Reranker: cross-encoder re-scores top-20 → return top-3. Expensive but dramatically improves answer quality.
  • Context budget: dynamically allocate tokens across system prompt + examples + retrieved chunks + answer reserve. Hard cap on chunk injection.
  • Eval: RAGAS metrics (faithfulness, answer relevance, context recall). Run on a golden dataset every deploy.
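The RRF scoring mentioned in the hybrid-search bullet is only a few lines; this sketch fuses any number of ranked lists (e.g. one from ANN search, one from BM25) with the standard 1/(k + rank) score sum:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank).
    Documents ranked well by both retrievers float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

k=60 is the value from the original RRF paper and works well in practice; the appeal of RRF is that it needs no score normalization between the semantic and keyword retrievers.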

DAY 35 Design an AI Coding Assistant (Copilot / Claude Code Style) Full HLD+LLD

AI Coding Assistant — Full System:

    IDE Plugin → [LSP Server] → [Context Assembler]
         ↓
    [Repo Indexer: AST-chunked embeddings → pgvector (incremental, file-watcher)]
    [Current file + open tabs + cursor position + git diff + test results]
         ↓
    [Retrieval: top-k relevant code chunks by semantic + BM25]
         ↓
    [Prompt builder: system + repo context + user intent — 8K token budget]
         ↓
    [LLM Gateway → streaming response]
         ↓
    [Output Validator: AST parse, lint, run existing tests]
         ↓
    [Streamed diff shown in IDE — user approves before apply]

The Three Hard Problems

1. Context assembly: what code do you put in the prompt and how do you rank relevance? AST-based chunking preserves function/class boundaries. Retrieve by semantic similarity + recency + call-graph proximity. Budget: 8K tokens for context, 4K for completion.
2. Latency: inline completions need <200ms p50. Use smaller/cached model for inline autocomplete; larger for chat and complex refactoring. Pre-warm model for active files.
3. Safety: LLM-generated code runs on your machine. Sandbox every tool call (file write, shell exec) in a container. Never auto-apply diffs — always require user approval. Cost cap per session: abort if token spend exceeds budget.
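The validator's first gate can be as simple as an AST parse. This sketch covers Python completions only; a real assistant would use a per-language parser (e.g. tree-sitter) before linting and running tests:

```python
import ast

def validate_completion(code: str) -> tuple[bool, str]:
    """Reject generated code that does not even parse, before any diff
    is shown. Later gates (lint, test run) only see code that passes."""
    try:
        ast.parse(code)
        return True, "ok"
    except SyntaxError as e:
        return False, f"syntax error at line {e.lineno}: {e.msg}"
```

Cheap structural checks like this catch a surprising share of bad completions before the expensive steps, and a parse failure can trigger a retry with the error fed back to the model.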

DAY 36 Agent Observability, Safety & Production Checklist Production

Observability for Agents

  • Trace every tool call: span ID, tool name, input, output, latency, token cost
  • Agent run trace: full Thought → Action → Observation tree per run_id — stored in persistent log
  • Token cost per step: attribute cost to each LLM call — find expensive steps to optimize
  • Eval pipeline: RAGAS / custom golden dataset — run on every deploy, catch regressions
  • Tools: LangSmith, Langfuse, Helicone, Arize Phoenix — purpose-built for LLM tracing
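Tool-call tracing can start as a decorator; `TRACE_LOG` is an in-memory stand-in for the persistent store (or LangSmith/Langfuse exporter) a real system would use:

```python
import functools
import time
import uuid

TRACE_LOG = []   # stand-in: production writes spans to a persistent backend

def traced_tool(fn):
    """Record span id, tool name, input, output/error, and latency per call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"span_id": uuid.uuid4().hex, "tool": fn.__name__,
                "input": {"args": args, "kwargs": kwargs}}
        start = time.monotonic()
        try:
            span["output"] = fn(*args, **kwargs)
            return span["output"]
        except Exception as e:
            span["error"] = repr(e)          # failures are traced too
            raise
        finally:
            span["latency_ms"] = (time.monotonic() - start) * 1000
            TRACE_LOG.append(span)
    return wrapper
```

A full setup adds a run_id linking all spans of one agent run into a tree, plus token cost per LLM call attributed to the enclosing span.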

Safety Layers

  • Input validation: prompt injection detection, PII scrubbing before LLM
  • Output validation: structured output schemas (JSON Schema/Pydantic) — reject malformed
  • Tool permission scoping: each agent gets only tools it needs — least privilege
  • Sandboxed execution: code-running agents in network-isolated containers with CPU/mem limits
  • Human-in-the-loop gates: for irreversible actions (send email, deploy, delete data) — explicit approval required
  • Max iterations + cost cap: hard stop prevents runaway loops

Real Production Failure Modes

1. Compounding hallucination: wrong tool call → wrong observation → wrong next step → cascades. Fix: validate tool output schemas; add a reflection step on unexpected results.
2. Context poisoning / prompt injection: malicious content in a RAG retrieval injects instructions. Fix: structurally separate retrieved content from instruction sections.
3. Cost explosion: agent loops without making progress. Fix: max_steps + a hard cost cap + loop detection (hash of recent states).
4. No audit trail: impossible to debug what the agent did. Fix: persistent trace log from day one — non-negotiable.
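The loop-detection fix (hashing recent states) can be sketched like this; `state_hash` and the window size are illustrative choices:

```python
import hashlib

def state_hash(action: str, observation: str) -> str:
    """Fingerprint one agent step as a stable hash of (action, observation)."""
    return hashlib.sha256(f"{action}|{observation}".encode()).hexdigest()

def loop_detected(recent_hashes: list[str], window: int = 6) -> bool:
    """Flag a spinning agent: the same state repeating inside the recent
    window means the loop is making no progress and should be aborted."""
    tail = recent_hashes[-window:]
    return len(tail) != len(set(tail))
```

This sits next to max_steps and the cost cap as a third, earlier tripwire: it fires as soon as the agent revisits a state, rather than waiting for the budget to run out.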

🤖 Week 5 Checkpoint — AI Agent Design