ArchDesign Podcast
Alex & Sam break down complex systems in under 7 minutes — Netflix, Kafka, Kubernetes, GPT, and more.
Netflix Content Delivery
Alex and Sam unpack how Netflix streams to 260M subscribers worldwide without buffering — covering Open Connect CDN, adaptive bitrate streaming, and multi-region failover.
Twitter Timeline Fanout
How does a single tweet from an account with 100M followers show up in every timeline almost instantly? Alex and Sam dig into Twitter's hybrid push/pull fanout architecture and Redis timeline caches.
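To make the hybrid model concrete, here's a toy Python sketch of push-vs-pull fanout. In-memory dicts stand in for Redis, and the names and `FANOUT_LIMIT` threshold are illustrative, not Twitter's actual values:

```python
# Toy hybrid fanout: tweets from "normal" users are pushed to every
# follower's timeline at write time; tweets from high-follower accounts
# are stored once and pulled/merged at read time instead.
FANOUT_LIMIT = 2  # illustrative threshold; real systems use far larger ones

followers = {"alice": ["bob", "carol"], "celeb": ["bob", "carol", "dave"]}
timelines = {}          # user -> list of (seq, author, text), push side
celebrity_tweets = {}   # author -> list of (seq, author, text), pull side
seq = 0

def post(author, text):
    global seq
    seq += 1
    tweet = (seq, author, text)
    if len(followers[author]) > FANOUT_LIMIT:
        # Too many followers: write once, merge at read time.
        celebrity_tweets.setdefault(author, []).append(tweet)
    else:
        # Few followers: fan out to each follower's timeline now.
        for f in followers[author]:
            timelines.setdefault(f, []).append(tweet)

def read_timeline(user):
    merged = list(timelines.get(user, []))
    for author, tweets in celebrity_tweets.items():
        if user in followers[author]:
            merged.extend(tweets)
    return [t[2] for t in sorted(merged, reverse=True)]  # newest first

post("alice", "hello")
post("celeb", "big news")
print(read_timeline("bob"))  # → ['big news', 'hello']
```

The trade-off the episode explores falls out of this sketch: pushing keeps reads cheap but makes celebrity writes explosive, so high-follower accounts switch to the pull path.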
Uber Geospatial Architecture
Matching millions of riders and drivers in real time requires some clever geo-indexing. Alex and Sam explore H3 hexagonal grids, geohashing, and Uber's dispatch system.
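Geohashing is the simplest of the techniques mentioned: interleave longitude and latitude bits and base32-encode them, so nearby points tend to share a common prefix. A minimal encoder of the standard geohash scheme (not Uber's H3):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat, lon, precision=9):
    """Encode a point by binary-subdividing the lat/lon ranges,
    alternating longitude and latitude bits, 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    out, ch, bits, even = [], 0, 0, True  # even bit -> longitude
    while len(out) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch = (ch << 1) | 1
                lon_lo = mid
            else:
                ch <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:
            out.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)

print(geohash(57.64911, 10.40744, 11))  # → u4pruydqqvj
```

The shared-prefix property is what makes geohashes useful for proximity queries: truncating the hash widens the cell, so a prefix match is a coarse "nearby" filter.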
WhatsApp Messaging at Scale
WhatsApp handles 100B messages per day with just hundreds of engineers. Sam and Alex break down Erlang's actor model, message queuing, and end-to-end encryption at scale.
Google Web Search Architecture
Returning relevant results from 100B+ web pages in under 200ms is no small feat. Alex and Sam explore Googlebot, inverted indexes, PageRank, and the Bigtable serving layer.
Amazon DynamoDB Internals
DynamoDB promises single-digit millisecond latency at any scale. Alex and Sam cover consistent hashing, virtual nodes, quorum reads/writes, and the original Dynamo paper.
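The core trick behind Dynamo-style partitioning fits in a few lines: hash each node to many points on a ring ("virtual nodes") and route each key to the first node clockwise from its hash. A toy version (node names and vnode count are illustrative):

```python
import bisect
import hashlib

def _hash(key):
    # Stable 64-bit hash of a string key.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each physical node owns many points on a hash ring; a key is
    served by the first node clockwise from the key's hash. Adding or
    removing a node only moves the keys adjacent to its points."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

The minimal-movement property is the point: dropping node-c from the ring leaves every key that wasn't on node-c exactly where it was, which is why membership changes don't trigger a full reshuffle.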
YouTube Video Processing Pipeline
Every minute, 500 hours of video are uploaded to YouTube. Sam and Alex trace a video from upload through transcoding, thumbnail generation, CDN distribution, and recommendation signals.
Airbnb Search & Availability
Searching millions of listings with real-time availability across time zones is tricky. Alex and Sam explore Airbnb's Elasticsearch setup, calendar locking, and search ranking.
Stripe Payment Processing
Processing billions in payments requires bulletproof reliability. Sam and Alex unpack Stripe's idempotency keys, two-phase commits, and how they approximate exactly-once payment semantics.
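The idempotency-key idea is easy to sketch: the first request with a given key performs the charge, and any retry with the same key replays the stored response instead of charging again. A toy in-memory version (real systems persist the key and result transactionally with the side effect, and expire keys after a TTL; all names here are illustrative):

```python
import uuid

class PaymentProcessor:
    """Toy idempotent charge endpoint."""
    def __init__(self):
        self._results = {}   # idempotency_key -> stored response
        self.charges = []    # side effects actually performed

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            # Retry of a request we already handled: replay, don't re-charge.
            return self._results[idempotency_key]
        charge_id = str(uuid.uuid4())
        self.charges.append((charge_id, amount_cents))  # the real side effect
        response = {"id": charge_id, "amount": amount_cents, "status": "succeeded"}
        self._results[idempotency_key] = response
        return response

p = PaymentProcessor()
first = p.charge("key-123", 5000)
retry = p.charge("key-123", 5000)   # e.g. a client retry after a timeout
print(first == retry, len(p.charges))  # → True 1
```

This is what turns unreliable networks into safe retries: the client can resend freely because duplicates collapse onto the stored result.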
Discord Real-Time Messaging
Discord serves 19M concurrent users with sub-100ms message delivery. Alex and Sam explore WebSockets, Cassandra message storage, and the Snowflake message-ID scheme.
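A Snowflake-style ID packs a timestamp, worker ID, and per-millisecond sequence into 64 bits, so IDs sort by creation time without any coordination between machines. A toy generator (bit widths follow the original Twitter layout; the custom epoch is illustrative):

```python
import threading
import time

class Snowflake:
    """Toy 64-bit Snowflake-style IDs: 41 bits of milliseconds since a
    custom epoch, 10 bits of worker ID, 12 bits of per-ms sequence."""
    EPOCH = 1_420_070_400_000  # illustrative custom epoch (2015-01-01), in ms

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # up to 4096 IDs per ms
                if self.seq == 0:
                    while now <= self.last_ms:     # sequence exhausted:
                        now = int(time.time() * 1000)  # wait for next ms
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.seq

gen = Snowflake(worker_id=7)
a, b = gen.next_id(), gen.next_id()
print(a < b)  # → True: later IDs are always larger
```

Because the timestamp occupies the high bits, sorting messages by ID sorts them by time — which is exactly why Cassandra range scans over message IDs work so well.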
Spotify Music Recommendations
How does Discover Weekly feel like it reads your mind? Sam and Alex break down collaborative filtering, audio embeddings, multi-armed bandits for exploration, and Spotify's ML platform.
GitHub CI/CD Pipeline
GitHub Actions runs millions of workflows daily. Alex and Sam dig into ephemeral runners, artifact caching, secrets management, and how GitHub keeps build queues fair.
LinkedIn Feed Ranking
LinkedIn's feed must balance virality, relevance, and professional tone across 900M members. Sam and Alex explore their two-tower model, feature stores, and online A/B testing.
Dropbox Block Sync Architecture
Syncing files across devices without uploading the whole file every time requires smart chunking. Alex and Sam explain content-defined chunking, deduplication, and delta sync.
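Content-defined chunking can be sketched with a Gear-style rolling hash: cut a chunk boundary wherever the low bits of the hash are zero, so boundaries follow content rather than byte offsets, and an insertion only disturbs the chunks around it. A toy version (mask and size limits are illustrative, and this is a simplification of schemes like FastCDC):

```python
def chunk(data, mask=0x3F, min_size=32, max_size=256):
    """Split bytes into content-defined chunks. The 32-bit rolling hash
    shifts left each byte, so old bytes age out of it; a boundary is cut
    when the hash's low bits are all zero (or max_size is reached)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(bytes(data[start:i + 1]))
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(bytes(data[start:]))
    return chunks

data = bytes((i * 37) % 256 for i in range(2000))
pieces = chunk(data)
print(len(pieces) > 1, b"".join(pieces) == data)  # → True True
```

Deduplication then falls out naturally: store chunks by content hash, and two files (or two versions of one file) that share runs of bytes share the stored chunks.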
Facebook Social Graph & TAO
Facebook's TAO (The Associations and Objects) system powers the social graph for 3B users. Sam and Alex walk through its graph data model, tiered caching, and eventual consistency.
Redis Cluster Architecture
Redis is the Swiss Army knife of infrastructure. Alex and Sam explore Redis's single-threaded event loop, cluster sharding with hash slots, replication, and persistence options.
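Cluster sharding is simple to sketch: a key's slot is CRC16 of the key modulo 16384, and a "hash tag" in braces lets related keys share a slot so multi-key operations stay on one node. A minimal slot computation following the documented Redis Cluster scheme:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, init 0, no reflection —
    the variant Redis Cluster uses for key-to-slot mapping."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16384 hash slots. If the key contains a
    non-empty {...} hash tag, only the tag is hashed."""
    if "{" in key:
        start = key.index("{")
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Same hash tag -> same slot, so these keys always live on the same node.
print(key_slot("{user1}.following") == key_slot("{user1}.followers"))  # → True
```

Each cluster node owns a subset of the 16384 slots; resharding moves whole slots between nodes, never individual keys.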
Apache Kafka Architecture
Kafka is the backbone of modern data infrastructure. Sam and Alex break down topics, partitions, consumer groups, the log-based storage model, and why Kafka's ordering guarantees matter.
Kubernetes Orchestration
How does Kubernetes decide where to run your containers? Alex and Sam cover the control plane, etcd, the scheduler, kubelet, and how self-healing keeps services alive.
PostgreSQL MVCC & WAL
Postgres handles thousands of concurrent transactions without readers and writers blocking each other. Sam and Alex explain Multi-Version Concurrency Control, the Write-Ahead Log, and VACUUM.
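The heart of MVCC is a visibility rule: each row version records the transaction that created it (xmin) and, if deleted, the one that deleted it (xmax), and a snapshot sees only versions from committed transactions that precede it. A heavily simplified sketch (real Postgres also consults in-progress transaction lists, hint bits, and wraparound-safe XID arithmetic):

```python
def visible(row, snapshot_xid, committed):
    """Toy MVCC visibility check, loosely modeled on Postgres: a row
    version is visible if it was created by a committed transaction
    older than the snapshot, and not deleted by such a transaction."""
    created = row["xmin"] in committed and row["xmin"] < snapshot_xid
    deleted = (row["xmax"] is not None
               and row["xmax"] in committed
               and row["xmax"] < snapshot_xid)
    return created and not deleted

# Transaction 100 inserted the row; transaction 101 later deleted it.
committed = {100, 101}
row = {"xmin": 100, "xmax": 101}
print(visible(row, snapshot_xid=101, committed=committed))  # → True
print(visible(row, snapshot_xid=105, committed=committed))  # → False
```

Old versions that no snapshot can see anymore are exactly what VACUUM reclaims — the garbage MVCC leaves behind in exchange for lock-free reads.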
Cloudflare Edge Workers
Running code at 300 PoPs worldwide with sub-millisecond cold starts sounds impossible. Alex and Sam explore V8 isolates, the edge execution model, and Cloudflare's global network.
GPT Inference Architecture
Serving GPT-4 to millions of users concurrently requires a novel approach to batching and memory. Sam and Alex cover KV caching, continuous batching, tensor parallelism, and FlashAttention.
RAG Pipeline Architecture
Retrieval-Augmented Generation lets LLMs answer questions about your private data without retraining. Alex and Sam walk through chunking, embedding, vector search, and context stuffing.
Vector Database Internals
Vector databases are the storage layer of the AI era. Sam and Alex go deep on HNSW approximate nearest neighbor search, quantization, and why Pinecone chose a serverless architecture.
LLM API Gateway
An LLM gateway sits between your app and multiple model providers, handling rate limiting, cost routing, and fallbacks. Alex and Sam design one from scratch.
Multi-Agent Orchestration
When a single LLM call isn't enough, you need multiple agents collaborating. Sam and Alex explore orchestrator-worker patterns, tool use, memory, and common failure modes.
LLM Fine-Tuning Pipeline
Fine-tuning an LLM on your own data can dramatically improve domain performance. Alex and Sam cover LoRA, QLoRA, SFT, RLHF, and how to build a practical fine-tuning pipeline.
Prompt Caching & KV Cache
Caching prompt prefixes can cut LLM costs by 90%. Sam and Alex explain the KV cache, prompt caching in Anthropic's Claude API, and how to architect applications to benefit.
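The application-level idea is a cache keyed by the shared prefix: the expensive prefill over a long system prompt runs once and is reused across requests. A toy model (the cache stores a placeholder string where a real server would store KV tensors; the class and method names are illustrative, not Anthropic's API):

```python
import hashlib

class PrefixCache:
    """Toy prompt-prefix cache: the prefill over a shared prefix
    (e.g. a long system prompt) is computed once and reused for
    every request that starts with the same prefix."""
    def __init__(self):
        self.cache = {}
        self.prefill_calls = 0

    def _prefill(self, tokens):
        self.prefill_calls += 1  # stands in for real KV-cache computation
        return f"kv-state-for-{len(tokens)}-tokens"

    def run(self, prefix_tokens, suffix_tokens):
        key = hashlib.sha256(" ".join(prefix_tokens).encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = self._prefill(prefix_tokens)  # cache miss
        # Only the suffix needs fresh prefill; decoding resumes from kv.
        return self.cache[key], suffix_tokens

system = ["You", "are", "a", "helpful", "assistant"] * 100  # long shared prefix
c = PrefixCache()
c.run(system, ["What", "is", "Kafka?"])
c.run(system, ["What", "is", "Redis?"])
print(c.prefill_calls)  # → 1: the shared prefix was prefilled once
```

The architectural consequence discussed in the episode follows directly: put the stable content (instructions, documents, few-shot examples) first and the variable content last, so requests actually share a cacheable prefix.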
Hybrid Search for LLMs
Pure vector search misses keyword matches; pure BM25 misses semantic meaning. Alex and Sam explore hybrid search, reciprocal rank fusion, and reranking for better RAG retrieval.
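Reciprocal rank fusion itself fits in a few lines: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the commonly used constant. A sketch with made-up document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: a document scores 1/(k + rank) in
    each list it appears in, and the fused order sorts by total score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc-kafka", "doc-redis", "doc-postgres"]   # keyword results
vector_hits = ["doc-kafka", "doc-vllm", "doc-redis"]     # semantic results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # → ['doc-kafka', 'doc-redis', 'doc-vllm', 'doc-postgres']
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales — no score normalization needed before fusing.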
AI Safety Guardrails
Deploying LLMs in production requires guardrails to prevent harmful outputs. Sam and Alex cover input/output filtering, constitutional AI, red-teaming, and practical safety architectures.
LLM Serving Infrastructure
Serving LLMs at scale requires purpose-built infrastructure. Alex and Sam discuss vLLM, PagedAttention, speculative decoding, and how cloud providers think about GPU cluster scheduling.