⚡ Distributed Systems · Advanced · Week 1

Twitter Fan-Out & Timeline Architecture

The push vs pull dilemma at 500M tweets/day

X (Twitter) · Instagram · LinkedIn

Key Insight

The celebrity problem: uniform push fan-out breaks for accounts with extreme follower counts. Hybrid models win.


How It Works

1. User posts tweet
2. Write to Tweets table and emit a Kafka event
3. Fan-out service reads the follower list
4. Normal users: push tweet ID to Redis timeline
5. Celebrities: skip push, pull on read
6. Client reads timeline from Redis sorted set
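The steps above can be sketched end to end. Everything here (the threshold value, the data-structure shapes, the function names) is illustrative, not Twitter's actual code:

```python
CELEBRITY_THRESHOLD = 500_000  # hypothetical cutoff; the real value is not public

def post_tweet(author_id, text, db, event_log, graph, timelines, next_id):
    """Illustrative write path for the flow above."""
    tweet_id = next_id()                           # time-sortable Snowflake ID
    db[tweet_id] = (author_id, text)               # write to the Tweets table
    event_log.append(("tweet_created", tweet_id))  # emit a Kafka-style event
    followers = graph.get(author_id, set())        # read the follower list
    if len(followers) < CELEBRITY_THRESHOLD:       # normal user: fan out on write
        for follower in followers:
            timelines.setdefault(follower, []).append(tweet_id)
    # celebrity: skip the push; their tweets are merged in at read time
    return tweet_id
```

The client-side read (step 6) then only has to fetch the follower's precomputed list.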

⚠ The Problem

When a user opens their home timeline, they expect a real-time feed of tweets from everyone they follow, sorted chronologically. The naive approach — querying all followed users' tweets at read time — requires joining millions of rows per request, which at 300K+ timeline reads per second brings any database to its knees. The challenge is compounded by celebrity accounts like Katy Perry with 100M+ followers: a single tweet must appear in 100M timelines within seconds.

✓ The Solution

Twitter uses a hybrid fan-out model. For regular users (fewer than ~500K followers), tweets are pushed at write time into each follower's precomputed timeline stored in Redis sorted sets. For celebrity accounts, tweets are fetched at read time and merged with the precomputed timeline on the fly. This hybrid approach bounds worst-case write amplification while keeping reads consistently fast at sub-5ms.

📊 Scale at a Glance

  • Tweets/day: 500M+
  • Timeline reads/sec: 300K+
  • Fan-out writes/sec: ~5M
  • Timeline cache size: ~800 tweets/user

🔬 Deep Dive

1. Fan-Out on Write — The Push Model

When a regular user tweets, a fan-out service takes the tweet ID and pushes it into each follower's home timeline — a Redis sorted set keyed by user ID, scored by Snowflake timestamp. For a user with 10K followers, this means 10K Redis ZADD operations per tweet. Redis sorted sets keep the timeline naturally ordered, so reads are a simple ZREVRANGE call returning the latest N tweet IDs. This approach trades significant write amplification for constant-time, sub-5ms reads on the hot path.
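The push path can be modeled with a minimal in-memory stand-in for Redis sorted sets (dict-based; the equivalent Redis commands are noted in comments, and the key layout is an assumption):

```python
from collections import defaultdict

# user_id -> {tweet_id: score}; a stand-in for one Redis sorted set per user
timelines = defaultdict(dict)

def fan_out_on_write(tweet_id, snowflake_score, follower_ids):
    """One ZADD per follower: add the tweet ID scored by its Snowflake timestamp."""
    for follower in follower_ids:
        timelines[follower][tweet_id] = snowflake_score  # ZADD home:{follower} ...

def latest(user_id, n=20):
    """ZREVRANGE equivalent: newest n tweet IDs by score."""
    tl = timelines[user_id]
    return sorted(tl, key=tl.get, reverse=True)[:n]
```

A user with 10K followers triggers 10K iterations of that loop, which is exactly the write amplification described above.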

2. Fan-Out on Read — The Celebrity Problem

When Katy Perry tweets to 100M followers, pushing to 100M Redis sorted sets would take minutes and overwhelm the entire cache cluster. Instead, tweets from accounts exceeding a follower threshold (~500K) are excluded from write-time fan-out. At read time, the user's precomputed timeline is merged with fresh tweets from any followed celebrity accounts fetched on demand. This mixed approach keeps write latency bounded at the cost of slightly more complex read-path logic and marginally higher tail latency.
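Because Snowflake IDs sort by time, the read-path merge reduces to merging descending-sorted ID lists. A sketch (the list shapes are assumptions):

```python
import heapq

def merged_timeline(precomputed_ids, celebrity_feeds, n=20):
    """Merge the user's precomputed timeline with fresh celebrity tweets.

    All inputs are lists of Snowflake tweet IDs, newest first; since the IDs
    are time-sortable, ID order is chronological order.
    """
    merged = heapq.merge(precomputed_ids, *celebrity_feeds, reverse=True)
    out = []
    for tweet_id in merged:
        out.append(tweet_id)
        if len(out) == n:
            break
    return out
```

The merge is lazy and stops after n items, so the extra read-path cost stays proportional to the page size plus the number of followed celebrities.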

3. Snowflake IDs — Time-Sortable Distributed Identifiers

Twitter's Snowflake generates 64-bit unique IDs composed of 41 bits for timestamp, 10 bits for machine ID, and 12 bits for sequence number. These IDs are sortable by creation time without any database lookup, which means Redis sorted sets can use the raw ID as the score for chronological ordering. Each Snowflake worker generates up to 4,096 IDs per millisecond. This eliminates the need for a centralized auto-increment counter — a critical single point of failure at Twitter's write volume.
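A single-process sketch of that bit layout (the epoch constant is Twitter's published Snowflake epoch; clock handling is simplified and a production worker would also guard against clock rollback):

```python
import time

# 64-bit layout: 41 bits timestamp | 10 bits machine ID | 12 bits sequence
MACHINE_BITS, SEQUENCE_BITS = 10, 12
TWITTER_EPOCH_MS = 1288834974657  # Twitter's published Snowflake epoch (Nov 2010)

class Snowflake:
    def __init__(self, machine_id):
        assert 0 <= machine_id < (1 << MACHINE_BITS)
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        now_ms = int(time.time() * 1000)
        if now_ms == self.last_ms:
            self.sequence = (self.sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
            if self.sequence == 0:  # 4,096 IDs this millisecond: spin to the next
                while now_ms <= self.last_ms:
                    now_ms = int(time.time() * 1000)
        else:
            self.sequence = 0
        self.last_ms = now_ms
        return (((now_ms - TWITTER_EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence)
```

Because the timestamp occupies the high bits, comparing two IDs numerically compares their creation times, which is what lets Redis use the raw ID as the sort score.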

4. Redis as the Timeline Store

Each user's home timeline is a Redis sorted set capped at roughly 800 tweet IDs. At read time, the client fetches the latest 20–50 tweet IDs via ZREVRANGE, then batch-fetches actual tweet content from a separate tweet object cache in a single multi-get. Keeping timelines in-memory means the common-case read latency is under 5ms. The trade-off is memory: 800 IDs × hundreds of millions of active users requires a massive Redis fleet — Twitter operated one of the largest Redis deployments in the world.
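The two-step read (IDs first, then one batched content fetch) can be sketched as follows, with plain dicts standing in for the Redis timeline store and the tweet object cache:

```python
def read_home_timeline(user_id, timeline_store, tweet_cache, page_size=20):
    """ZREVRANGE for the newest IDs, then a single multi-get for tweet content."""
    tl = timeline_store.get(user_id, {})                    # {tweet_id: snowflake_score}
    ids = sorted(tl, key=tl.get, reverse=True)[:page_size]  # ZREVRANGE 0 page_size-1
    tweets = [tweet_cache.get(i) for i in ids]              # MGET equivalent
    return [t for t in tweets if t is not None]             # deleted tweets drop out
```

Note the last line: filtering cache misses at read time is one pragmatic answer to deletions, since it avoids rewriting millions of timelines.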

5. FlockDB — The Social Graph Store

Twitter built FlockDB, a distributed graph database optimized for adjacency-list queries like 'who follows this user.' It stores edges (follower → followee) sharded by source node ID across MySQL backends, with a graph-aware query layer supporting set operations like intersection (mutual followers) and difference. When a tweet triggers fan-out, the fan-out service queries FlockDB to retrieve the full follower list. FlockDB is optimized for high read throughput on large adjacency lists — critical when a single user can have millions of followers.
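A toy adjacency-list store in the spirit of FlockDB's edge queries (the class and method names are invented for illustration; real FlockDB stores edges in sharded MySQL):

```python
class GraphStore:
    def __init__(self):
        # followee -> set of follower IDs; a stand-in for a sharded adjacency list
        self._followers = {}

    def follow(self, follower_id, followee_id):
        self._followers.setdefault(followee_id, set()).add(follower_id)

    def followers_of(self, user_id):
        """The adjacency-list read the fan-out service issues per tweet."""
        return self._followers.get(user_id, set())

    def mutual_followers(self, a, b):
        """FlockDB-style set intersection."""
        return self.followers_of(a) & self.followers_of(b)
```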

⬡ Architecture Diagram

Twitter Fan-Out & Timeline Architecture — simplified architecture overview

✦ Core Concepts

  • Fan-out on Write
  • Fan-out on Read
  • Redis Sorted Sets
  • Finagle RPC
  • FlockDB
  • Snowflake IDs

⚖ Tradeoffs & Design Decisions

Every architectural decision is a tradeoff. Here's what you gain and what you give up.

✓ Strengths

  • Sub-5ms timeline reads for all users via precomputed Redis sorted sets
  • Snowflake IDs eliminate the centralized ID generation bottleneck entirely
  • Hybrid fan-out model bounds worst-case write amplification for celebrity tweets
  • Redis sorted sets provide natural chronological ordering without additional sorting

✗ Weaknesses

  • Write amplification: a tweet from a user with 100K followers generates 100K Redis writes
  • Celebrity tweets have higher read latency due to the on-demand merge at read time
  • Redis memory cost is enormous — 800 tweet IDs × hundreds of millions of users
  • Cache invalidation for deleted or protected tweets must propagate across millions of timelines

🎯 FAANG Interview Questions

💡 These questions appear in real system design rounds at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Focus on tradeoffs, not just what the system does, and study the architecture above before attempting.

  1. Design a news feed system. When would you choose fan-out on write vs fan-out on read?

  2. A user with 50M followers posts a tweet. Walk through exactly what happens in the system end to end.

  3. How would you handle tweet deletions in a fan-out-on-write architecture where the tweet ID exists in millions of timelines?

  4. Twitter's timeline uses Redis sorted sets. Why sorted sets instead of lists? What are the complexity trade-offs?

  5. Design Snowflake: a distributed ID generator that produces time-sortable, globally unique 64-bit IDs without coordination between nodes.

Research Papers & Further Reading

  • Scaling Twitter's Ad Targeting Platform — Twitter Engineering, 2018
