01 Complexity & Amortized Analysis
Big-O Reference
- O(1) – array index, hash map get/set
- O(log n) – binary search, balanced BST insert/lookup
- O(n) – linear scan, single pass
- O(n log n) – merge sort, heap sort, TreeMap ops
- O(n²) – nested loops, bubble sort
- O(2ⁿ) – all subsets; naive recursive Fibonacci is also exponential (~O(1.6ⁿ))
- O(n!) – all permutations
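To make O(log n) concrete, here is a minimal binary search sketch that counts comparisons; on a sorted array of one million elements it never needs more than ~20:

```python
def binary_search(arr, target):
    """Return (index, comparisons) for target in sorted arr, or (-1, comparisons)."""
    lo, hi, comparisons = 0, len(arr) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if arr[mid] == target:
            return mid, comparisons
        if arr[mid] < target:
            lo = mid + 1          # discard the lower half
        else:
            hi = mid - 1          # discard the upper half
    return -1, comparisons

# Each iteration halves the search range: ~log2(n) iterations total.
idx, comps = binary_search(list(range(1_000_000)), 999_999)
```

Since log2(1,000,000) ≈ 19.9, even the worst case stays at about 20 comparisons.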
Amortized Analysis
- Average cost per operation over a sequence – not worst-case single op
- ArrayList/slice append: O(1) amortized (doubling strategy)
- Doubling: total copies = 1+2+4+...+n < 2n → O(1) per append
- HashMap resize: O(n) per resize, O(1) amortized over n inserts
- Splay tree: O(log n) amortized, O(n) worst single op
- Key insight: occasional expensive ops are "paid for" by cheap ones
Space Complexity
- O(1) – iterative algorithm with fixed variables
- O(log n) – recursive binary search (call stack)
- O(n) – storing all input elements (hash map, array copy)
- O(n²) – 2D matrix, adjacency matrix for dense graph
- In-place algorithms: O(1) extra space (quicksort partition)
- Auxiliary space vs total space – distinguish in interviews
NP-Complete Problems (Know These)
- Travelling Salesman Problem (TSP)
- 0-1 Knapsack (NP-hard, DP gives pseudo-polynomial)
- Graph coloring (k ≥ 3)
- Boolean satisfiability (SAT)
- Subset sum
- These have no known polynomial-time solution – use approximations or DP for small n
Amortized Analysis – Dynamic Array Append
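A minimal sketch of a doubling dynamic array that counts element copies, illustrating why the total copy work over n appends stays below 2n:

```python
class DynamicArray:
    """Append-only array with capacity doubling; tracks total element copies."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None]
        self.total_copies = 0

    def append(self, x):
        if self.size == self.capacity:       # full: double capacity and copy
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
                self.total_copies += 1
            self.data = new_data
        self.data[self.size] = x
        self.size += 1

arr = DynamicArray()
for i in range(1024):
    arr.append(i)
# copies = 1 + 2 + 4 + ... + 512 = 1023 < 2n → O(1) amortized per append
```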
Quiz – Complexity
Q1. ArrayList.add() in Java has O(1) amortized time. What is its worst-case single-operation time?
Q2. Binary search on a sorted array of 1 million elements – how many comparisons worst case?
02 Distributed Consensus – Raft & Paxos
The CAP Theorem & PACELC
To study distributed systems without external resources, you must understand the fundamental constraints:
- CAP Theorem: In the presence of a Partition (network failure), you must choose between Consistency (all nodes see same data) or Availability (every request gets a response).
- PACELC: an extension of CAP. If a Partition occurs, trade Availability vs Consistency; Else, trade Latency vs Consistency.
Linearizability: This is the "Gold Standard" of consistency. It makes the distributed system appear as if there is only one copy of the data, and all operations are atomic.
Raft Consensus Algorithm
Raft is designed for understandability. It decomposes consensus into: leader election, log replication, and safety. Used by etcd (Kubernetes state store), CockroachDB, Consul, and TiKV.
Key properties: Election safety (at most one leader per term). Log matching (same index + term → same entry). Leader completeness (committed entries survive leadership changes).
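A minimal sketch of the leader's commit rule implied by these properties: an entry is committed once it is replicated on a majority AND was written in the leader's current term. The matchIndex values below are hypothetical:

```python
def leader_commit_index(match_index, current_term, log, commit_index):
    """Advance commitIndex to the highest index N replicated on a majority
    of servers, where log[N-1] was written in the leader's current term."""
    n_servers = len(match_index)                 # includes the leader itself
    for n in range(len(log), commit_index, -1):  # try the highest index first
        replicated = sum(1 for m in match_index if m >= n)
        if replicated * 2 > n_servers and log[n - 1]["term"] == current_term:
            return n
    return commit_index

# 5 servers; log has 3 entries; hypothetical per-server matchIndex view
log = [{"term": 1}, {"term": 2}, {"term": 2}]
match = [3, 3, 2, 1, 3]
new_commit = leader_commit_index(match, current_term=2, log=log, commit_index=0)
```

Note that with current_term=3 the same call would commit nothing: a leader never commits an entry from a previous term by counting replicas alone.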
Raft vs Paxos
- Both achieve same safety guarantees under same failure model
- Paxos: foundational theory, flexible, hard to implement correctly
- Raft: leader-based, sequential log, much easier to build and reason about
- Multi-Paxos: optimized Paxos for a continuous log – closer to Raft
- Paxos lineage in practice: ZooKeeper (ZAB, a Paxos-like atomic broadcast protocol), Google Chubby (Multi-Paxos)
- Raft used: etcd, CockroachDB, Consul, TiKV, RethinkDB
Byzantine Fault Tolerance (BFT)
- Raft/Paxos assume crash failures – nodes stop working
- BFT: nodes can behave maliciously (send wrong data)
- PBFT: needs 3f+1 nodes to tolerate f Byzantine failures
- Blockchain (Bitcoin/Ethereum): Nakamoto consensus – probabilistic BFT via PoW/PoS
- Hyperledger Fabric: PBFT variant for permissioned chains
- BFT is much more expensive – only use when adversarial actors are possible
Consensus in Practice
- Leader bottleneck: all writes go through the leader → partition the keyspace to scale
- Network partition: minority partition becomes unavailable (CP choice)
- Pre-vote optimization: prevents unnecessary elections (etcd)
- Joint consensus: safe cluster membership changes (add/remove nodes)
- Read scalability: followers can serve reads with linearizability via read index
- Lease reads: leader serves reads without log round-trip (clock-based)
FLP Impossibility
- Fischer-Lynch-Paterson (1985): no deterministic algorithm can guarantee consensus in a fully asynchronous system with even one crash failure
- Why Raft works in practice: it assumes partial synchrony (timeouts eventually hold), stepping outside FLP's model
- Practical insight: consensus requires either synchrony assumptions or randomization
- Raft's randomized election timeouts fix split-vote livelock – a liveness optimization, not itself the escape from FLP
Quiz – Consensus
Q1. In Raft, when does a leader commit a log entry?
Q2. Why do Raft election timeouts use randomization?
Q3. How many nodes does PBFT need to tolerate 1 Byzantine failure?
03 Networking – TCP, TLS, HTTP
TCP Deep Dive
To study networking without external resources, you must master congestion control – it is why internet speed fluctuates.
- Slow Start: Start with a small window. Double size every RTT. (Exponential growth).
- Congestion Avoidance: After reaching a threshold, increase window by 1 per RTT. (Linear growth).
- Multiplicative Decrease: On packet loss, cut the window by half immediately.
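The three phases above can be sketched as a per-RTT window simulation. This is a simplification (Reno-style halving, no fast retransmit details); the event sequence is hypothetical:

```python
def aimd_window(events, ssthresh=16):
    """Simulate the TCP congestion window (in segments), one event per RTT.
    'ack' = all segments acked this RTT; 'loss' = loss detected."""
    cwnd = 1.0
    trace = []
    for ev in events:
        if ev == "loss":
            ssthresh = max(cwnd / 2, 1)   # multiplicative decrease
            cwnd = ssthresh               # Reno-style: resume at half
        elif cwnd < ssthresh:
            cwnd *= 2                     # slow start: double every RTT
        else:
            cwnd += 1                     # congestion avoidance: +1 per RTT
        trace.append(cwnd)
    return trace

trace = aimd_window(["ack"] * 6 + ["loss"] + ["ack"] * 3)
# exponential growth, then linear growth, then halving on loss
```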
TLS 1.3 Handshake
TLS 1.3 completes the handshake in a single round trip: the ClientHello already carries a key share, the server replies with its own key share plus certificate, and everything after the ServerHello is encrypted. Forward secrecy is mandatory because only ephemeral (EC)DHE key exchange remains – static RSA key exchange was removed. 0-RTT resumption lets a returning client send data immediately, at the cost of replay risk.
Certificate validation: OCSP stapling (server includes signed proof the cert is not revoked), Certificate Transparency (CT) logs detect rogue CA issuance. mTLS: both client and server present certificates – used in service meshes (Istio), gRPC, Hyperledger Fabric.
HTTP/1.1 vs HTTP/2 vs HTTP/3
- HTTP/1.1: one req/response per TCP connection (pipelining limited). Keep-alive reuses connection.
- HTTP/2: multiplexing – multiple streams over one TCP connection. HPACK header compression. Server push. But: a single TCP stream means HOL blocking on packet loss.
- HTTP/3: runs on QUIC (UDP). Per-stream loss recovery – losing a packet only blocks that stream, not all streams. 0-RTT connection setup.
- QUIC: connection migration (IP change doesn't break session), built-in TLS 1.3, designed for mobile networks.
DNS Internals
- Recursive resolver: asks on your behalf (ISP DNS, 8.8.8.8)
- Authoritative server: knows the definitive answer for a zone
- Resolution chain: Root → TLD (.com) → Authoritative → Answer
- TTL: how long to cache. Low TTL = slow (more DNS queries). High TTL = fast but slow propagation.
- GeoDNS: return different IPs based on client location (Route53 latency-based)
- DNSSEC: cryptographic signatures prevent DNS spoofing/cache poisoning
Load Balancer Layers
- L4 (Transport): routes by IP + TCP port, no packet inspection. Faster.
- L7 (Application): routes by HTTP headers, path, cookie. More flexible.
- L7 enables: path-based routing, A/B testing, auth at edge, sticky sessions by cookie
- AWS ALB: L7. NLB: L4. HAProxy: both modes.
- Health checks: L4 (TCP connect) or L7 (HTTP /health endpoint)
- Connection draining: allow in-flight requests to finish before removing instance
WebSockets vs SSE vs Polling
- Long Polling: client waits, server holds until event. Simple, higher latency.
- SSE: server → client only. HTTP/1.1 compatible. Good for feeds, notifications.
- WebSocket: full-duplex over persistent TCP. Chat, gaming, real-time dashboards.
- WebTransport: QUIC-based, future. Datagrams + streams.
- WebSocket uses HTTP Upgrade header to switch protocol.
- Go handles 10K+ concurrent WS connections well via goroutines.
Quiz – Networking
Q1. HTTP/2's main limitation compared to HTTP/3 is:
Q2. TLS 1.3 achieves forward secrecy by using:
Q3. Why is 0-RTT TLS resumption dangerous for POST requests?
04 Probabilistic Data Structures
Bloom Filter – Probabilistic Set Membership
A Bloom filter answers "is this element in the set?" with no false negatives (if it says NO, definitely not in the set) and a tunable false positive rate (if it says YES, probably in the set).
Where it's used: Cassandra (skip SSTable disk reads for non-existent keys), Google Chrome (safe browsing URL check), databases (avoid unnecessary disk lookups), CDNs (cache negative results), Bitcoin (SPV wallet transaction lookup).
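A minimal Bloom filter sketch: k bit positions are derived per item by salting a hash, set on add, and all checked on query. The parameters (m=1024 bits, k=4) are arbitrary illustration values:

```python
import hashlib

class BloomFilter:
    """Bit-array Bloom filter; k positions derived from salted sha256."""
    def __init__(self, m_bits=1024, k=4):
        self.m, self.k = m_bits, k
        self.bits = 0                       # Python int used as a bit array

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p             # set k bits

    def might_contain(self, item):
        # If ANY bit is 0 → definitely absent (no false negatives).
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
for word in ["cassandra", "redis", "etcd"]:
    bf.add(word)
```

Query an added key and it always answers yes; query an absent key and you get no (or, rarely, a false positive as the filter fills up).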
HyperLogLog – Approximate Cardinality
Count distinct elements (like COUNT DISTINCT in SQL) using a tiny fixed amount of memory – regardless of how many unique items you've seen.
Where it's used: Redis PFADD/PFCOUNT, counting unique visitors at scale, distinct queries in analytics (Presto, BigQuery), network traffic analysis, streaming distinct count.
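A simplified HLL sketch, assuming 2^8 = 256 registers and a 64-bit hash: each register remembers the longest run of leading zeros it has seen, and the harmonic mean of the registers yields the estimate. Production implementations add more corrections:

```python
import hashlib, math

class HyperLogLog:
    """Simplified HyperLogLog: m registers track max leading-zero rank."""
    def __init__(self, p=8):                 # m = 2^p = 256 registers
        self.p, self.m = p, 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)             # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        if raw <= 2.5 * self.m:              # small-range (linear counting) fix
            zeros = self.registers.count(0)
            if zeros:
                return self.m * math.log(self.m / zeros)
        return raw

hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"user-{i}")
```

With m=256 the standard error is about 1.04/sqrt(m) ≈ 6.5%, using only 256 small registers no matter how many items are added.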
Skip List
- Probabilistic alternative to balanced BST
- Multiple levels of sorted linked lists
- Each element promoted to next level with probability p (usually 0.5)
- O(log n) average search, insert, delete
- Used by: Redis sorted sets (ZSETs), LevelDB, MemTable in Cassandra
- Advantage over BST: simpler to implement, lock-free concurrent versions exist
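A compact skip list sketch (insert and search only); MAX_LEVEL=16 and p=0.5 are conventional illustration choices:

```python
import random

class SkipListNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)   # next pointer per level

class SkipList:
    """Sorted structure; each node is promoted one level with probability 0.5."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = SkipListNode(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):   # top-down search, record the path
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipListNode(key, lvl)
        for i in range(lvl + 1):              # splice the node into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        for i in range(self.level, -1, -1):   # drop a level when we overshoot
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

sl = SkipList()
for k in [30, 10, 50, 20, 40]:
    sl.insert(k)
```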
Count-Min Sketch
- Approximate frequency count for items in a stream
- 2D array: d rows of hash functions × w counters per row
- Update: increment d cells (one per row per hash function)
- Query: take the minimum of the d cells → an upper bound on the true count
- Never underestimates, may slightly overestimate
- Used for: top-k heavy hitters, rate limiting by user, network flow analysis
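A minimal Count-Min Sketch following the update/query rules above; d=4 and w=256 are arbitrary illustration sizes:

```python
import hashlib

class CountMinSketch:
    """d rows × w counters; estimate() returns an upper bound on the count."""
    def __init__(self, d=4, w=256):
        self.d, self.w = d, w
        self.table = [[0] * w for _ in range(d)]

    def _cols(self, item):
        for row in range(self.d):            # one salted hash per row
            h = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.w

    def add(self, item, count=1):
        for row, col in enumerate(self._cols(item)):
            self.table[row][col] += count    # increment one cell per row

    def estimate(self, item):
        # min over rows: collisions only inflate counts, never deflate them
        return min(self.table[row][col]
                   for row, col in enumerate(self._cols(item)))

cms = CountMinSketch()
for _ in range(100):
    cms.add("alice")
cms.add("bob", 7)
```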
Consistent Hashing
- Map keys and nodes onto a ring (0 to 2³²)
- Key → hash → find the nearest node clockwise on the ring
- Add/remove node: only adjacent keys move (not all keys)
- Virtual nodes (vnodes): each physical node = 100-200 points on the ring → better distribution
- Adding 1 node to N-node cluster: only 1/N keys move (not all)
- Used by: DynamoDB, Cassandra, Redis Cluster, CDN cache routing
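A small hash ring sketch with vnodes, using the first 32 bits of md5 to match the 0..2³² ring above; node names and vnode count are illustrative:

```python
import bisect, hashlib

class HashRing:
    """Consistent hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []                        # sorted list of (point, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    def add_node(self, node):
        for v in range(self.vnodes):          # scatter vnodes around the ring
            self.ring.append((self._hash(f"{node}#{v}"), node))
        self.ring.sort()

    def get_node(self, key):
        point = self._hash(key)
        i = bisect.bisect(self.ring, (point,))  # first vnode clockwise
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["a", "b", "c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("d")
moved = sum(1 for k in keys if ring.get_node(k) != before[k])
```

Going from 3 to 4 nodes, roughly 1/4 of the keys move to the new node; the rest stay put.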
Merkle Trees
- Binary tree where each leaf = hash(data), each node = hash(left+right)
- Root hash = fingerprint of entire dataset
- Verify any subset of data without downloading all data
- O(log n) proof size for single element membership
- Used by: Git (tree objects), Bitcoin (transaction Merkle root in block header), Cassandra (anti-entropy repair), IPFS
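A sketch of root construction plus an O(log n) membership proof, duplicating the last node on odd levels (one common convention; Bitcoin does the same):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash leaves, then repeatedly pair-and-hash up to a single root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to recompute the root for one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1                      # the paired node at this level
        proof.append((level[sib], sib % 2))  # side 1 = sibling is on the right
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    acc = h(leaf)
    for sibling, side in proof:              # rebuild the path to the root
        acc = h(sibling + acc) if side == 0 else h(acc + sibling)
    return acc == root

data = [b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(data)
proof = merkle_proof(data, 2)                # prove membership of b"tx3"
```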
Quiz – Probabilistic Data Structures
Q1. A Bloom filter says an element IS in the set. What can you conclude?
Q2. HyperLogLog is designed to solve which problem efficiently?
Q3. When consistent hashing adds 1 node to an N-node cluster, approximately what fraction of keys need to move?
05 Encoding & Serialization Formats
JSON vs Protobuf vs Avro vs MessagePack
- JSON: text, human-readable, schemaless, verbose – universal interop
- Protobuf: binary, schema-first (.proto files), compact and fast – gRPC default
- Avro: binary, schema stored with the data (or in a registry) – strongest schema evolution story, Kafka ecosystem standard
- MessagePack: binary drop-in for JSON, schemaless, smaller and faster than JSON
Protobuf Schema Evolution Rules
- ✓ Add new optional fields (old code ignores unknown fields)
- ✓ Remove fields (mark deprecated, don't reuse the field number)
- ✓ Rename fields (wire format uses field numbers, not names)
- ✗ Change a field's type (int32 → string breaks decoding)
- ✗ Change a field number (changes the binary wire format)
- ✗ Change a repeated field to optional or vice versa
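These rules follow from the wire format: each field is encoded as a varint tag of (field_number << 3) | wire_type, so the field name never appears in the bytes. A minimal hand-rolled encoder shows the classic "field 1, varint 150 → 08 96 01" example from the Protobuf docs:

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 data bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def encode_field(field_number: int, wire_type: int, payload: bytes) -> bytes:
    """Tag = (field_number << 3) | wire_type -- the field NAME never appears."""
    return encode_varint((field_number << 3) | wire_type) + payload

# field 1, wire type 0 (varint), value 150
msg = encode_field(1, 0, encode_varint(150))   # -> bytes 08 96 01
```

Renaming field 1 changes nothing here; renumbering it to 2 would change the first byte and break every old reader.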
Data Compression
- Snappy: fast compression/decompression, moderate ratio (~2x). Google default for Hadoop, Kafka.
- LZ4: fastest decompression, lower ratio. Great for real-time streaming.
- Zstd: best ratio for text/JSON (~3-4x), still fast. Modern default.
- Gzip: slower, good ratio. Standard for HTTP responses.
- Kafka: set compression.type=zstd for best throughput/storage balance
Base64 & Encoding Schemes
- Base64: binary → printable ASCII (3 bytes → 4 chars, ~33% overhead)
- Used: email attachments, JWT tokens, image data in JSON/HTML
- URL-safe Base64: replace +/ with -_ (no URL encoding needed)
- Base62: URL shortener encoding – uses 0-9, a-z, A-Z, no special chars
- Hex encoding: 1 byte → 2 hex chars (100% overhead). Used for hashes, UUIDs.
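A short demo of standard vs URL-safe Base64 on bytes that hit the +/ characters, plus a toy Base62 integer encoder (the alphabet ordering here is just one common convention):

```python
import base64

raw = b"\xfb\xef\xff"                      # 3 bytes -> 4 Base64 chars
std = base64.b64encode(raw)                # uses + and /
url = base64.urlsafe_b64encode(raw)        # same data, - and _ instead

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    """Encode a non-negative int in base62 (URL-shortener style)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))
```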
Character Encoding
- ASCII: 128 chars, 7-bit, English only
- UTF-8: variable width (1-4 bytes), backward compatible with ASCII, universal
- UTF-16: 2-4 bytes, used in Java/JavaScript strings internally
- UTF-32: fixed 4 bytes, simple but wasteful
- Always use UTF-8 for network protocols and storage
- Emoji: 4 bytes in UTF-8 – breaks naive string length counting
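The width differences are easy to see by encoding one string three ways; a 7-code-point string costs very different byte counts per encoding:

```python
s = "h\u00e9llo \U0001F600"        # ASCII chars + 'é' + an emoji = 7 code points
utf8 = s.encode("utf-8")           # 5x1 byte + 2 (é) + 4 (emoji) = 11 bytes
utf16 = s.encode("utf-16-le")      # 6x2 bytes + 4 (surrogate pair) = 16 bytes
utf32 = s.encode("utf-32-le")      # 7x4 bytes = 28 bytes
```

This is exactly why naive "length" differs by language: Python counts code points (7), JavaScript counts UTF-16 code units (8), and the wire sees UTF-8 bytes (11).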
Quiz – Encoding
Q1. In Protobuf, you need to rename a field. What must you NOT change?
Q2. For Kafka event streaming that needs schema evolution guarantees, which format is most appropriate?
06 Distributed Clocks & Ordering
Lamport Timestamps & Vector Clocks
In a distributed system, there is no global clock. We need logical ordering to reason about causality.
Physical Clocks vs Logical Clocks
- Physical clocks (NTP): drift up to ms. Can go backwards after sync. Can't order events in same millisecond.
- Logical clocks: no drift, purely causal ordering, no real-time information
- Hybrid Logical Clocks (HLC): combines physical + logical. Tracks causality, close to wall clock. Used by CockroachDB.
- TrueTime (Google Spanner): GPS + atomic clocks. Bounded uncertainty interval – waits out the uncertainty for strict serializability.
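The causal ordering that logical clocks provide can be sketched with a vector-clock comparison: event A happened-before B iff A's clock is ≤ B's in every component and < in at least one; if each is ahead somewhere, the events are concurrent:

```python
def vc_compare(a, b):
    """Compare two vector clocks (dicts of node -> counter)."""
    keys = set(a) | set(b)
    less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    greater = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    if less and greater:
        return "concurrent"        # neither happened-before the other
    if less:
        return "before"
    if greater:
        return "after"
    return "equal"

# Nodes A and B incremented independently -> a genuine write conflict
verdict = vc_compare({"A": 2, "B": 0}, {"A": 1, "B": 1})
```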
CRDTs – Conflict-Free Replicated Data Types
- Data structures that can be merged without conflicts – replicas always converge
- G-Counter: only increment (no decrement). Merge = take max per node.
- PN-Counter: two G-Counters (positive + negative). Net = P - N.
- G-Set: add-only set. Merge = union.
- LWW-Register: last-write-wins by timestamp.
- Used in: Redis (some types), Riak, collaborative editing (Google Docs)
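A G-Counter sketch showing the merge-by-max rule: each node only increments its own slot, so taking the per-slot maximum is commutative, associative, and idempotent:

```python
class GCounter:
    """Grow-only CRDT counter: one slot per node, merge = per-slot max."""
    def __init__(self, node_id, counts=None):
        self.node_id = node_id
        self.counts = dict(counts or {})

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Per-slot max: commutative, associative, idempotent -> convergence."""
        merged = dict(self.counts)
        for node, c in other.counts.items():
            merged[node] = max(merged.get(node, 0), c)
        return GCounter(self.node_id, merged)

a, b = GCounter("a"), GCounter("b")
a.increment(3)      # node a saw 3 events
b.increment(2)      # node b saw 2 events, concurrently
merged = a.merge(b)
```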
Distributed Transactions – Ordering
- Serializability: transactions appear to execute in some serial order
- Linearizability: real-time consistent reads (stronger than serializability)
- Snapshot isolation: each transaction sees a consistent snapshot – but allows write skew
- Snapshot isolation + write skew: doctors on-call example (both check, both go off)
- Spanner: external consistency (linearizability) across datacenters via TrueTime
Fencing Tokens
- Problem: distributed lock holder pauses (GC), lock expires, another client acquires lock, original resumes with stale lock
- Solution: fencing token – a monotonically increasing number issued with each lock grant
- Storage service rejects writes whose token is ≤ the last token it has seen
- Even if old client resumes, its old token is rejected
- Concept from Martin Kleppmann's DDIA – essential for safe distributed locking
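A sketch of the storage side of fencing. This variant rejects any token strictly lower than the highest seen, so the current holder can issue multiple writes; the token values are illustrative:

```python
class FencedStorage:
    """Storage service that rejects writes carrying a stale fencing token."""
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:   # stale holder resumed -> reject
            return False
        self.highest_token = token
        self.data[key] = value
        return True

store = FencedStorage()
ok1 = store.write(33, "file", "v1")   # holder of token 33 writes
ok2 = store.write(34, "file", "v2")   # lock expired; new holder got token 34
ok3 = store.write(33, "file", "old")  # old holder resumes after a GC pause
```

The old client's write fails even though it still believes it holds the lock.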
Quiz – Distributed Clocks
Q1. Vector clocks detect concurrent events by:
Q2. A CRDT G-Counter merge operation takes:
07 Best Resources
The bible for distributed systems. Chapters 5-9 are essential: replication, partitioning, transactions, consistency. Read this cover to cover.
Interactive animation of Raft leader election and log replication. 15 minutes here is worth more than any blog post.
University lecture series. Lectures 5-10 cover vector clocks, consensus, consistency, transactions. Dense and rigorous.
MIT course lecture on Raft. Best academic explanation of the algorithm and its safety proofs.
Visual + mathematical explanation. Build one from scratch – 30 minutes, permanently understand it.
Best practical QUIC/HTTP3 explanation with real production numbers. Understand WHY QUIC fixes HTTP/2's problems.
How Meta counts distinct users at scale. Production accuracy numbers and memory savings.