ArchDesign Podcast
Alex & Sam break down complex systems in under 7 minutes — Netflix, Kafka, Kubernetes, GPT, and more.
Netflix Content Delivery
Alex and Sam unpack how Netflix streams to 260M subscribers worldwide without buffering — covering Open Connect CDN, adaptive bitrate streaming, and multi-region failover.
Twitter Timeline Fanout
How does a single tweet from an account with 100M followers show up in every timeline almost instantly? Alex and Sam dig into Twitter's hybrid push/pull fanout architecture and Redis timeline caches.
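To make the hybrid model concrete, here's a toy Python sketch of push-vs-pull fanout. In-memory dicts stand in for Redis, and the names and `FANOUT_LIMIT` threshold are illustrative, not Twitter's actual values:

```python
# Toy hybrid fanout: tweets from "normal" users are pushed to every
# follower's timeline at write time; tweets from high-follower accounts
# are stored once and pulled/merged at read time instead.
FANOUT_LIMIT = 2  # illustrative threshold; real systems use far larger ones

followers = {"alice": ["bob", "carol"], "celeb": ["bob", "carol", "dave"]}
timelines = {}          # user -> list of (seq, author, text), push side
celebrity_tweets = {}   # author -> list of (seq, author, text), pull side
seq = 0

def post(author, text):
    global seq
    seq += 1
    tweet = (seq, author, text)
    if len(followers[author]) > FANOUT_LIMIT:
        # Too many followers: write once, merge at read time.
        celebrity_tweets.setdefault(author, []).append(tweet)
    else:
        # Few followers: fan out to each follower's timeline now.
        for f in followers[author]:
            timelines.setdefault(f, []).append(tweet)

def read_timeline(user):
    merged = list(timelines.get(user, []))
    for author, tweets in celebrity_tweets.items():
        if user in followers[author]:
            merged.extend(tweets)
    return [t[2] for t in sorted(merged, reverse=True)]  # newest first

post("alice", "hello")
post("celeb", "big news")
print(read_timeline("bob"))  # → ['big news', 'hello']
```

The trade-off the episode explores falls out of this sketch: pushing keeps reads cheap but makes celebrity writes explosive, so high-follower accounts switch to the pull path.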
Uber Geospatial Architecture
Matching millions of riders and drivers in real time requires some clever geo-indexing. Alex and Sam explore H3 hexagonal grids, geohashing, and Uber's dispatch system.
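Geohashing is the simplest of the techniques mentioned: interleave longitude and latitude bits and base32-encode them, so nearby points tend to share a common prefix. A minimal encoder of the standard geohash scheme (not Uber's H3):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat, lon, precision=9):
    """Encode a point by binary-subdividing the lat/lon ranges,
    alternating longitude and latitude bits, 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    out, ch, bits, even = [], 0, 0, True  # even bit -> longitude
    while len(out) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch = (ch << 1) | 1
                lon_lo = mid
            else:
                ch <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:
            out.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)

print(geohash(57.64911, 10.40744, 11))  # → u4pruydqqvj
```

The shared-prefix property is what makes geohashes useful for proximity queries: truncating the hash widens the cell, so a prefix match is a coarse "nearby" filter.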
WhatsApp Messaging at Scale
WhatsApp handles 100B messages per day with just hundreds of engineers. Sam and Alex break down Erlang's actor model, message queuing, and end-to-end encryption at scale.
Google Web Search Architecture
Returning relevant results from 100B+ web pages in under 200ms is no small feat. Alex and Sam explore Googlebot, inverted indexes, PageRank, and the Bigtable serving layer.
Amazon DynamoDB Internals
DynamoDB promises single-digit millisecond latency at any scale. Alex and Sam cover consistent hashing, virtual nodes, quorum reads/writes, and the original Dynamo paper.
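The core trick behind Dynamo-style partitioning fits in a few lines: hash each node to many points on a ring ("virtual nodes") and route each key to the first node clockwise from its hash. A toy version (node names and vnode count are illustrative):

```python
import bisect
import hashlib

def _hash(key):
    # Stable 64-bit hash of a string key.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each physical node owns many points on a hash ring; a key is
    served by the first node clockwise from the key's hash. Adding or
    removing a node only moves the keys adjacent to its points."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

The minimal-movement property is the point: dropping node-c from the ring leaves every key that wasn't on node-c exactly where it was, which is why membership changes don't trigger a full reshuffle.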
YouTube Video Processing Pipeline
Every minute, 500 hours of video are uploaded to YouTube. Sam and Alex trace a video from upload through transcoding, thumbnail generation, CDN distribution, and recommendation signals.
Airbnb Search & Availability
Searching millions of listings with real-time availability across time zones is tricky. Alex and Sam explore Airbnb's Elasticsearch setup, calendar locking, and search ranking.
Stripe Payment Processing
Processing billions in payments requires bulletproof reliability. Sam and Alex unpack Stripe's idempotency keys, two-phase commits, and how they approximate exactly-once payment semantics.
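The idempotency-key idea is easy to sketch: the first request with a given key performs the charge, and any retry with the same key replays the stored response instead of charging again. A toy in-memory version (real systems persist the key and result transactionally with the side effect, and expire keys after a TTL; all names here are illustrative):

```python
import uuid

class PaymentProcessor:
    """Toy idempotent charge endpoint."""
    def __init__(self):
        self._results = {}   # idempotency_key -> stored response
        self.charges = []    # side effects actually performed

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            # Retry of a request we already handled: replay, don't re-charge.
            return self._results[idempotency_key]
        charge_id = str(uuid.uuid4())
        self.charges.append((charge_id, amount_cents))  # the real side effect
        response = {"id": charge_id, "amount": amount_cents, "status": "succeeded"}
        self._results[idempotency_key] = response
        return response

p = PaymentProcessor()
first = p.charge("key-123", 5000)
retry = p.charge("key-123", 5000)   # e.g. a client retry after a timeout
print(first == retry, len(p.charges))  # → True 1
```

This is what turns unreliable networks into safe retries: the client can resend freely because duplicates collapse onto the stored result.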
Discord Real-Time Messaging
Discord serves 19M concurrent users with sub-100ms message delivery. Alex and Sam explore WebSockets, Cassandra message storage, and the Snowflake message-ID scheme.
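A Snowflake-style ID packs a timestamp, worker ID, and per-millisecond sequence into 64 bits, so IDs sort by creation time without any coordination between machines. A toy generator (bit widths follow the original Twitter layout; the custom epoch is illustrative):

```python
import threading
import time

class Snowflake:
    """Toy 64-bit Snowflake-style IDs: 41 bits of milliseconds since a
    custom epoch, 10 bits of worker ID, 12 bits of per-ms sequence."""
    EPOCH = 1_420_070_400_000  # illustrative custom epoch (2015-01-01), in ms

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # up to 4096 IDs per ms
                if self.seq == 0:
                    while now <= self.last_ms:     # sequence exhausted:
                        now = int(time.time() * 1000)  # wait for next ms
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.seq

gen = Snowflake(worker_id=7)
a, b = gen.next_id(), gen.next_id()
print(a < b)  # → True: later IDs are always larger
```

Because the timestamp occupies the high bits, sorting messages by ID sorts them by time — which is exactly why Cassandra range scans over message IDs work so well.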
Spotify Music Recommendations
How does Discover Weekly feel like it reads your mind? Sam and Alex break down collaborative filtering, audio embeddings, multi-armed bandits for exploration, and Spotify's ML platform.
GitHub CI/CD Pipeline
GitHub Actions runs millions of workflows daily. Alex and Sam dig into ephemeral runners, artifact caching, secrets management, and how GitHub keeps build queues fair.
LinkedIn Feed Ranking
LinkedIn's feed must balance virality, relevance, and professional tone across 900M members. Sam and Alex explore their two-tower model, feature stores, and online A/B testing.
Dropbox Block Sync Architecture
Syncing files across devices without uploading the whole file every time requires smart chunking. Alex and Sam explain content-defined chunking, deduplication, and delta sync.
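Content-defined chunking can be sketched with a Gear-style rolling hash: cut a chunk boundary wherever the low bits of the hash are zero, so boundaries follow content rather than byte offsets, and an insertion only disturbs the chunks around it. A toy version (mask and size limits are illustrative, and this is a simplification of schemes like FastCDC):

```python
def chunk(data, mask=0x3F, min_size=32, max_size=256):
    """Split bytes into content-defined chunks. The 32-bit rolling hash
    shifts left each byte, so old bytes age out of it; a boundary is cut
    when the hash's low bits are all zero (or max_size is reached)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(bytes(data[start:i + 1]))
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(bytes(data[start:]))
    return chunks

data = bytes((i * 37) % 256 for i in range(2000))
pieces = chunk(data)
print(len(pieces) > 1, b"".join(pieces) == data)  # → True True
```

Deduplication then falls out naturally: store chunks by content hash, and two files (or two versions of one file) that share runs of bytes share the stored chunks.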
Facebook Social Graph & TAO
Facebook's TAO (The Associations and Objects) system powers the social graph for 3B users. Sam and Alex walk through its graph data model, tiered caching, and eventual consistency.
Redis Cluster Architecture
Redis is the Swiss Army knife of infrastructure. Alex and Sam explore Redis's single-threaded event loop, cluster sharding with hash slots, replication, and persistence options.
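Cluster sharding is simple to sketch: a key's slot is CRC16 of the key modulo 16384, and a "hash tag" in braces lets related keys share a slot so multi-key operations stay on one node. A minimal slot computation following the documented Redis Cluster scheme:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, init 0, no reflection —
    the variant Redis Cluster uses for key-to-slot mapping."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16384 hash slots. If the key contains a
    non-empty {...} hash tag, only the tag is hashed."""
    if "{" in key:
        start = key.index("{")
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Same hash tag -> same slot, so these keys always live on the same node.
print(key_slot("{user1}.following") == key_slot("{user1}.followers"))  # → True
```

Each cluster node owns a subset of the 16384 slots; resharding moves whole slots between nodes, never individual keys.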
Apache Kafka Architecture
Kafka is the backbone of modern data infrastructure. Sam and Alex break down topics, partitions, consumer groups, the log-based storage model, and why Kafka's ordering guarantees matter.
Kubernetes Orchestration
How does Kubernetes decide where to run your containers? Alex and Sam cover the control plane, etcd, the scheduler, kubelet, and how self-healing keeps services alive.
PostgreSQL MVCC & WAL
Postgres handles thousands of concurrent transactions without readers and writers blocking each other. Sam and Alex explain Multi-Version Concurrency Control, the Write-Ahead Log, and VACUUM.
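The heart of MVCC is a visibility rule: each row version records the transaction that created it (xmin) and, if deleted, the one that deleted it (xmax), and a snapshot sees only versions from committed transactions that precede it. A heavily simplified sketch (real Postgres also consults in-progress transaction lists, hint bits, and wraparound-safe XID arithmetic):

```python
def visible(row, snapshot_xid, committed):
    """Toy MVCC visibility check, loosely modeled on Postgres: a row
    version is visible if it was created by a committed transaction
    older than the snapshot, and not deleted by such a transaction."""
    created = row["xmin"] in committed and row["xmin"] < snapshot_xid
    deleted = (row["xmax"] is not None
               and row["xmax"] in committed
               and row["xmax"] < snapshot_xid)
    return created and not deleted

# Transaction 100 inserted the row; transaction 101 later deleted it.
committed = {100, 101}
row = {"xmin": 100, "xmax": 101}
print(visible(row, snapshot_xid=101, committed=committed))  # → True
print(visible(row, snapshot_xid=105, committed=committed))  # → False
```

Old versions that no snapshot can see anymore are exactly what VACUUM reclaims — the garbage MVCC leaves behind in exchange for lock-free reads.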
Cloudflare Edge Workers
Running code at 300 PoPs worldwide with sub-millisecond cold starts sounds impossible. Alex and Sam explore V8 isolates, the edge execution model, and Cloudflare's global network.
GPT Inference Architecture
Serving GPT-4 to millions of users concurrently requires a novel approach to batching and memory. Sam and Alex cover KV caching, continuous batching, tensor parallelism, and FlashAttention.
RAG Pipeline Architecture
Retrieval-Augmented Generation lets LLMs answer questions about your private data without retraining. Alex and Sam walk through chunking, embedding, vector search, and context stuffing.
Vector Database Internals
Vector databases are the storage layer of the AI era. Sam and Alex go deep on HNSW approximate nearest neighbor search, quantization, and why Pinecone chose a serverless architecture.
LLM API Gateway
An LLM gateway sits between your app and multiple model providers, handling rate limiting, cost routing, and fallbacks. Alex and Sam design one from scratch.
Multi-Agent Orchestration
When a single LLM call isn't enough, you need multiple agents collaborating. Sam and Alex explore orchestrator-worker patterns, tool use, memory, and common failure modes.
LLM Fine-Tuning Pipeline
Fine-tuning an LLM on your own data can dramatically improve domain performance. Alex and Sam cover LoRA, QLoRA, SFT, RLHF, and how to build a practical fine-tuning pipeline.
Prompt Caching & KV Cache
Caching prompt prefixes can cut LLM costs by 90%. Sam and Alex explain the KV cache, prompt caching in Anthropic's Claude API, and how to architect applications to benefit.
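The application-level idea is a cache keyed by the shared prefix: the expensive prefill over a long system prompt runs once and is reused across requests. A toy model (the cache stores a placeholder string where a real server would store KV tensors; the class and method names are illustrative, not Anthropic's API):

```python
import hashlib

class PrefixCache:
    """Toy prompt-prefix cache: the prefill over a shared prefix
    (e.g. a long system prompt) is computed once and reused for
    every request that starts with the same prefix."""
    def __init__(self):
        self.cache = {}
        self.prefill_calls = 0

    def _prefill(self, tokens):
        self.prefill_calls += 1  # stands in for real KV-cache computation
        return f"kv-state-for-{len(tokens)}-tokens"

    def run(self, prefix_tokens, suffix_tokens):
        key = hashlib.sha256(" ".join(prefix_tokens).encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = self._prefill(prefix_tokens)  # cache miss
        # Only the suffix needs fresh prefill; decoding resumes from kv.
        return self.cache[key], suffix_tokens

system = ["You", "are", "a", "helpful", "assistant"] * 100  # long shared prefix
c = PrefixCache()
c.run(system, ["What", "is", "Kafka?"])
c.run(system, ["What", "is", "Redis?"])
print(c.prefill_calls)  # → 1: the shared prefix was prefilled once
```

The architectural consequence discussed in the episode follows directly: put the stable content (instructions, documents, few-shot examples) first and the variable content last, so requests actually share a cacheable prefix.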
Hybrid Search for LLMs
Pure vector search misses keyword matches; pure BM25 misses semantic meaning. Alex and Sam explore hybrid search, reciprocal rank fusion, and reranking for better RAG retrieval.
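Reciprocal rank fusion itself fits in a few lines: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the commonly used constant. A sketch with made-up document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: a document scores 1/(k + rank) in
    each list it appears in, and the fused order sorts by total score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc-kafka", "doc-redis", "doc-postgres"]   # keyword results
vector_hits = ["doc-kafka", "doc-vllm", "doc-redis"]     # semantic results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # → ['doc-kafka', 'doc-redis', 'doc-vllm', 'doc-postgres']
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales — no score normalization needed before fusing.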
AI Safety Guardrails
Deploying LLMs in production requires guardrails to prevent harmful outputs. Sam and Alex cover input/output filtering, constitutional AI, red-teaming, and practical safety architectures.
LLM Serving Infrastructure
Serving LLMs at scale requires purpose-built infrastructure. Alex and Sam discuss vLLM, PagedAttention, speculative decoding, and how cloud providers think about GPU cluster scheduling.