Discord Real-Time Messaging Architecture
How Discord scaled from 5M to 500M users with Elixir and Rust
Key Insight
Erlang/BEAM's actor model makes it possible to hold millions of WebSocket connections with microsecond message passing.
Request Journey
How It Works
ā Client opens persistent WebSocket to Elixir gateway
ā” User sends message to channel
⢠Gateway publishes to channel topic
⣠Subscribers receive message via WebSocket push
⤠Message persisted to Cassandra by channel and snowflake ID
ā„ Offline members get push notification
ā The Problem
Discord must deliver messages to millionsof concurrent users across hundreds of thousands of servers (guilds) in real-time ā median latency under 50ms. A single popular server can have 500K+ simultaneous members in a voice or text channel. The original Python/MongoDB stack collapsed under load, and Cassandra's garbage collection pauses caused multi-second latency spikes that made real-time communication impossible.
āThe Solution
Discord re-architected in stages: migrated the WebSocket gateway to Elixir (leveraging the BEAM VM's actor model for millions of lightweight concurrent processes), replaced Cassandra with ScyllaDB (a C++ rewrite achieving 10Ć lower tail latency), and built a custom presence system tracking millions of online users. Each guild maps to Elixir GenServer processes that fan out messages to connected members with microsecond message passing.
šScale at a Glance
10M+
Concurrent Users
4B+
Messages/Day
Millions
WebSocket Connections
4B+
Voice Minutes/Day
š¬Deep Dive
Elixir Gateway ā Millions of WebSocket Connections
Discord's gateway servers run Elixir on the BEAM VM, the same platform behind WhatsApp's scale. Each WebSocket connection is an Elixir process consuming ~2KB of memory with its own isolated heap and preemptive scheduling. A single gateway server holds hundreds of thousands of concurrent connections. When a message is sent in a channel, the gateway process for that guild fans out the message to all connected member processes. The BEAM's actor model with message passing makes this fan-out natural ā no shared state, no locks, no thread coordination.
Cassandra to ScyllaDB ā Taming Tail Latency
Discord initially used Cassandra for message storage but suffered from JVM garbage collection pauses causing multi-second p99 latency spikes ā unacceptable for real-time chat. They migrated to ScyllaDB, a C++ reimplementation of Cassandra's architecture that eliminates GC pauses entirely. ScyllaDB uses a shard-per-core architecture: each CPU core owns a specific set of data and runs a single event loop, eliminating cross-core synchronization. The migration reduced p99 read latency from seconds to single-digit milliseconds while handling billions of messages per day.
Message Storage ā Bucketed by Channel and Time
Messages are stored in ScyllaDB with a partition key of (channel_id, bucket), where buckets are time windows. This ensures that recent messages in a channel ā the most common read pattern ā are co-located on disk for fast retrieval. Each partition is capped to prevent unbounded growth (hot partitions are a known ScyllaDB/Cassandra antipattern). When a user opens a channel, the client fetches the latest bucket; scrolling up triggers fetches of progressively older buckets. Deleted messages are tombstoned rather than physically removed, with compaction reclaiming space asynchronously.
Presence System ā Who's Online Right Now
Discord's presence system tracks the online/offline/idle/DND status of millions of users in real-time. Each user's presence is maintained by their gateway process. When a user's status changes, the update is published to a fan-out system that notifies all guilds the user belongs to, which in turn notify all online members of those guilds. This creates a cascade: a single user going offline can trigger millions of presence updates. Discord batches and rate-limits these updates, prioritizing smaller guilds where presence is more meaningful and sampling updates in mega-servers.
Voice Architecture ā WebRTC at Scale
Discord voice channels use WebRTC with Selective Forwarding Units (SFUs) rather than peer-to-peer mesh. Each voice server is an SFU that receives audio/video streams from each participant and selectively forwards them to others ā avoiding the N² bandwidth explosion of peer-to-peer in large calls. The SFU decides which streams to forward based on who's speaking (dominant speaker detection via audio energy levels). Voice servers are regionally deployed to minimize latency, and users are routed to the nearest server. Opus codec at 64kbps provides high-quality audio with minimal bandwidth.
⬔Architecture Diagram
Discord Real-Time Messaging Architecture ā simplified architecture overview
ā¦Core Concepts
WebSockets
Elixir/Phoenix
Actor Model
Cassandra ScyllaDB Migration
Presence System
Voice (WebRTC)
āTradeoffs & Design Decisions
Every architectural decision is a tradeoff. Here's what you gain and what you give up.
ā Strengths
- āElixir/BEAM actor model handles millions of concurrent WebSocket connections efficiently
- āScyllaDB eliminated GC-induced latency spikes, achieving single-digit ms p99 reads
- āSFU-based voice architecture scales to large calls without N² bandwidth explosion
- āPer-channel time-bucketed storage optimizes the dominant access pattern of reading recent messages
ā Weaknesses
- āElixir/BEAM ecosystem is smaller than JVM or Node.js, limiting library availability
- āPresence fan-out for users in large guilds creates cascading update storms requiring rate limiting
- āScyllaDB migration required careful data modeling to avoid hot partition antipatterns
- āSFU voice servers are regionally deployed, adding infrastructure complexity for global low-latency voice
šÆFAANG Interview Questions
Interview Prepš” These questions appear in FAANG system design rounds. Focus on tradeoffs, not just what the system does.
These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Study the architecture above before attempting.
- Q1
Design a real-time group messaging system like Discord. How would you handle a channel with 500K simultaneous viewers?
- Q2
Discord migrated from Cassandra to ScyllaDB. What problems does JVM garbage collection cause for real-time systems?
- Q3
Design a presence system that tracks online status for millions of users. How do you handle the fan-out when a user in 100 guilds goes offline?
- Q4
Explain the difference between SFU-based and mesh-based voice/video architectures. When would you choose each?
- Q5
How would you partition chat messages in a database to optimize for the 'load recent messages in channel' access pattern?
Research Papers & Further Reading
How Discord Stores Billions of Messages
Discord Engineering
Listen to the Podcast Episode
Alex & Sam break it down
Listen to a conversational deep-dive on this architecture ā real trade-offs, production context, and student-friendly explanations. Free, no login required.
Listen to EpisodeFree Ā· No account required Ā· Listen in browser
More Distributed Systems
View allNetflix Content Delivery Architecture
How Netflix streams to 260M users without a single datacenter
Netflix Ā· Disney+ Ā· Hulu
Twitter Fan-Out & Timeline Architecture
The push vs pull dilemma at 500M tweets/day
X (Twitter) Ā· Instagram Ā· LinkedIn
Uber Surge Pricing & Geospatial Architecture
H3 hexagonal indexing, real-time dispatch, and dynamic pricing
Uber Ā· Lyft Ā· DoorDash
Listen to more architecture deep-dives
30 free podcast episodes ā Alex & Sam break down every architecture in this library. Listen in your browser, no account needed.
All architecture articles are free Ā· No account needed