⚡ Distributed Systems · Advanced · Week 5

Discord Real-Time Messaging Architecture

How Discord scaled from 5M to 500M users with Elixir and Rust

Discord · Slack · Teams

Key Insight

Erlang/BEAM's actor model makes it possible to hold millions of WebSocket connections with microsecond message passing.

How It Works

① Client opens persistent WebSocket to Elixir gateway

② User sends message to channel

③ Gateway publishes to channel topic

④ Subscribers receive message via WebSocket push

⑤ Message persisted to Cassandra by channel and snowflake ID

⑥ Offline members get push notification
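
The six steps above can be sketched as a toy in-memory hub. This is only an illustration of the flow, not Discord's implementation: the `Hub` class, its method names, and the use of plain lists as stand-ins for WebSockets, the message table, and the push service are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Toy stand-in for the gateway, the message store, and the push service."""
    subscribers: dict = field(default_factory=lambda: defaultdict(set))  # channel -> member ids
    online: dict = field(default_factory=dict)   # member id -> list acting as a WebSocket
    store: list = field(default_factory=list)    # stand-in for the message table
    pushes: list = field(default_factory=list)   # stand-in for push notifications

    def connect(self, member, channel):
        # Step 1: the client opens a persistent connection and subscribes.
        self.subscribers[channel].add(member)
        self.online[member] = []

    def disconnect(self, member):
        # The member stays subscribed but no longer has a live socket.
        self.online.pop(member, None)

    def send(self, channel, message_id, author, text):
        # Steps 2-3: the gateway receives the message and publishes it to the topic.
        event = (channel, message_id, author, text)
        members = sorted(self.subscribers[channel])
        for member in members:
            if member in self.online:
                self.online[member].append(event)          # Step 4: WebSocket push
        self.store.append(event)                           # Step 5: persist
        for member in members:
            if member not in self.online:
                self.pushes.append((member, message_id))   # Step 6: offline push
```

For example, if `alice` and `bob` both connect to `general`, `bob` disconnects, and `alice` sends a message, then `alice` receives the event on her socket, one row is stored, and one push notification is queued for `bob`.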

⚠The Problem

Discord must deliver messages to millions of concurrent users across hundreds of thousands of servers (guilds) in real time, with median latency under 50 ms. A single popular server can have 500K+ simultaneous members in a voice or text channel. The original Python/MongoDB stack collapsed under load, and JVM garbage collection pauses in Cassandra caused multi-second latency spikes that made real-time communication impossible.

✓The Solution

Discord re-architected in stages: migrated the WebSocket gateway to Elixir (leveraging the BEAM VM's actor model for millions of lightweight concurrent processes), replaced Cassandra with ScyllaDB (a C++ rewrite achieving 10× lower tail latency), and built a custom presence system tracking millions of online users. Each guild maps to Elixir GenServer processes that fan out messages to connected members with microsecond message passing.

📊Scale at a Glance

10M+

Concurrent Users

4B+

Messages/Day

Millions

WebSocket Connections

4B+

Voice Minutes/Day

🔬Deep Dive

Elixir Gateway — Millions of WebSocket Connections

Discord's gateway servers run Elixir on the BEAM VM, the same platform behind WhatsApp's scale. Each WebSocket connection is an Elixir process consuming ~2KB of memory with its own isolated heap and preemptive scheduling. A single gateway server holds hundreds of thousands of concurrent connections. When a message is sent in a channel, the gateway process for that guild fans out the message to all connected member processes. The BEAM's actor model with message passing makes this fan-out natural — no shared state, no locks, no thread coordination.
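
The fan-out pattern described above can be approximated in Python with `asyncio`, using one task and one queue (mailbox) per connection. This is a rough analogy only: `asyncio` tasks are cooperatively scheduled and share one heap, whereas BEAM processes are preemptively scheduled with isolated heaps. The function names `connection`, `guild`, `demo`, and `run_demo` are hypothetical.

```python
import asyncio


async def connection(mailbox, delivered):
    """One task per connection: it only ever reads its own mailbox."""
    while True:
        message = await mailbox.get()
        if message is None:            # sentinel: connection closed
            return
        delivered.append(message)      # stand-in for writing a WebSocket frame


async def guild(inbox, member_mailboxes):
    """One task per guild: receives channel messages and fans them out."""
    while True:
        message = await inbox.get()
        for mailbox in member_mailboxes:
            mailbox.put_nowait(message)   # message passing, no locks
        if message is None:               # forward the sentinel, then stop
            return


async def demo(member_count=3):
    inbox = asyncio.Queue()
    mailboxes = [asyncio.Queue() for _ in range(member_count)]
    received = [[] for _ in range(member_count)]
    tasks = [asyncio.create_task(connection(m, r))
             for m, r in zip(mailboxes, received)]
    tasks.append(asyncio.create_task(guild(inbox, mailboxes)))
    for message in ("hello", "world", None):
        inbox.put_nowait(message)
    await asyncio.gather(*tasks)
    return received


def run_demo(member_count=3):
    return asyncio.run(demo(member_count))
```

Calling `run_demo(3)` returns one list per connection, each containing the two messages in order. On the memory side, at roughly 2 KB per process, 500,000 idle connection processes (an illustrative count) would need on the order of 1 GB before socket buffers and application state are counted.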

Cassandra to ScyllaDB — Taming Tail Latency

Discord initially used Cassandra for message storage but suffered from JVM garbage collection pauses that caused unpredictable latency spikes, which is unacceptable for real-time chat. They migrated to ScyllaDB, a C++ reimplementation of Cassandra's architecture that has no garbage collector and therefore no GC pauses. ScyllaDB uses a shard-per-core architecture: each CPU core owns a specific set of data and runs a single event loop, eliminating cross-core synchronization. In Discord's published figures, p99 latency for fetching historical messages fell from 40 to 125 ms on Cassandra to about 15 ms on ScyllaDB, and p99 insert latency fell from 5 to 70 ms to a steady 5 ms, while the cluster handled billions of messages per day.
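
A minimal sketch of the shard-per-core idea follows. It assumes a simple hash-modulo mapping from key to core; the `Shard` and `Node` classes are hypothetical, and CRC32 is only a placeholder for whatever partitioner the database actually uses.

```python
import zlib


class Shard:
    """One shard per core: it owns its rows outright, so it needs no locks."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class Node:
    """Routes every request to the single shard that owns the key."""

    def __init__(self, cores):
        self.shards = [Shard() for _ in range(cores)]

    def shard_for(self, key):
        # Deterministic hash, so the same key always lands on the same core.
        # CRC32 is a placeholder, not the hash ScyllaDB uses.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def write(self, key, value):
        self.shards[self.shard_for(key)].write(key, value)

    def read(self, key):
        return self.shards[self.shard_for(key)].read(key)
```

Because each key has exactly one owning shard, two cores never touch the same row, which is what removes the need for cross-core synchronization in this model.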

Message Storage — Bucketed by Channel and Time

Messages are stored in ScyllaDB with a partition key of (channel_id, bucket), where each bucket is a fixed time window. Because reading recent messages in a channel is the most common access pattern, this layout keeps them co-located on disk for fast retrieval. Bounding each partition to one time window prevents unbounded partition growth (oversized partitions are a known ScyllaDB/Cassandra antipattern). When a user opens a channel, the client fetches the latest bucket; scrolling up triggers fetches of progressively older buckets. Deleted messages are tombstoned rather than physically removed, with compaction reclaiming space asynchronously.
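
As a sketch of how a bucket could be derived, the example below reads the timestamp out of a snowflake ID (the upper bits hold milliseconds since Discord's epoch of 2015-01-01, per Discord's API documentation) and divides it by a window size. The 10-day window and all function names are assumptions made for illustration; the text above does not state the actual window.

```python
DISCORD_EPOCH_MS = 1_420_070_400_000      # 2015-01-01T00:00:00Z
BUCKET_MS = 10 * 24 * 60 * 60 * 1000      # assumed 10-day window


def snowflake_from_ms(unix_ms):
    """Smallest snowflake for a Unix timestamp in ms (low 22 bits zeroed)."""
    return (unix_ms - DISCORD_EPOCH_MS) << 22


def bucket_of(snowflake):
    """Time bucket for a snowflake: ms since the Discord epoch // window size."""
    return (snowflake >> 22) // BUCKET_MS


def partition_key(channel_id, snowflake):
    """Partition key as described above: (channel_id, bucket)."""
    return (channel_id, bucket_of(snowflake))


def buckets_newest_first(newest_snowflake, oldest_snowflake):
    """Buckets a reader would walk when scrolling from newest to oldest."""
    newest = bucket_of(newest_snowflake)
    oldest = bucket_of(oldest_snowflake)
    return list(range(newest, oldest - 1, -1))
```

Under these assumptions, a message sent 25 days after the epoch falls in bucket 2, and scrolling back to the first message walks buckets 2, 1, and 0 in that order.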

Presence System — Who's Online Right Now

Discord's presence system tracks the online/offline/idle/DND status of millions of users in real-time. Each user's presence is maintained by their gateway process. When a user's status changes, the update is published to a fan-out system that notifies all guilds the user belongs to, which in turn notify all online members of those guilds. This creates a cascade: a single user going offline can trigger millions of presence updates. Discord batches and rate-limits these updates, prioritizing smaller guilds where presence is more meaningful and sampling updates in mega-servers.
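
To see the scale of the cascade: a user in 100 guilds with 10,000 online members each (illustrative numbers) produces 100 × 10,000 = 1,000,000 deliveries for a single status change. The sketch below shows one way batching and sampling could work, by coalescing updates per user and capping how many are sent to large guilds per flush. The `PresenceBatcher` class, its thresholds, and the truncation used in place of real sampling are all hypothetical.

```python
class PresenceBatcher:
    """Coalesces presence changes per guild and flushes them in batches."""

    def __init__(self, large_guild_size=1000, max_updates_large=2):
        self.large_guild_size = large_guild_size      # hypothetical threshold
        self.max_updates_large = max_updates_large    # hypothetical per-flush cap
        self.pending = {}                             # guild id -> {user id: status}

    def record(self, user_id, status, guild_ids):
        # A later status for the same user overwrites the earlier one, so
        # online -> idle -> offline costs one update per guild, not three.
        for guild_id in guild_ids:
            self.pending.setdefault(guild_id, {})[user_id] = status

    def flush(self, guild_sizes):
        """Return {guild id: [(user, status), ...]} to send, then reset."""
        batches = {}
        for guild_id, updates in self.pending.items():
            items = sorted(updates.items())
            if guild_sizes.get(guild_id, 0) >= self.large_guild_size:
                # Crude stand-in for sampling: send only a capped subset.
                items = items[: self.max_updates_large]
            batches[guild_id] = items
        self.pending = {}
        return batches
```

A caller would invoke `record` on every status change and `flush` on a timer, which turns a burst of per-event deliveries into one bounded batch per guild.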

Voice Architecture — WebRTC at Scale

Discord voice channels use WebRTC with Selective Forwarding Units (SFUs) rather than peer-to-peer mesh. Each voice server is an SFU that receives audio/video streams from each participant and selectively forwards them to others — avoiding the N² bandwidth explosion of peer-to-peer in large calls. The SFU decides which streams to forward based on who's speaking (dominant speaker detection via audio energy levels). Voice servers are regionally deployed to minimize latency, and users are routed to the nearest server. Opus codec at 64kbps provides high-quality audio with minimal bandwidth.
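
The bandwidth argument can be made concrete with a little arithmetic. The sketch below counts streams for a mesh versus an SFU and picks dominant speakers by audio energy. The function names, the silence floor, and the choice of how many streams to forward are assumptions for illustration; the 64 kbps default comes from the Opus bitrate mentioned above.

```python
def mesh_streams(n):
    """One-way streams in a full mesh: every peer sends to every other peer."""
    return n * (n - 1)


def sfu_streams(n, forwarded):
    """n uplinks to the SFU plus, per participant, at most `forwarded` downlinks."""
    return n + n * min(forwarded, n - 1)


def uplink_kbps(n, bitrate_kbps=64, sfu=True):
    """Per-participant upload: one stream to the SFU, or n - 1 copies in a mesh."""
    return bitrate_kbps if sfu else bitrate_kbps * (n - 1)


def dominant_speakers(energy_by_user, k, floor=0.0):
    """Up to k users with the highest audio energy above a silence floor."""
    active = [(energy, user) for user, energy in energy_by_user.items()
              if energy > floor]
    # Highest energy first; ties broken by user id for determinism.
    active.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in active[:k]]
```

For a 25-person call, the mesh needs 25 × 24 = 600 streams and each participant uploads 24 × 64 = 1,536 kbps, while an SFU forwarding the top 3 speakers needs 25 + 75 = 100 streams and each participant uploads 64 kbps.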

⬡Architecture Diagram

Discord Real-Time Messaging Architecture — simplified architecture overview

✦Core Concepts

⚙️ WebSockets

⚙️ Elixir/Phoenix

🧠 Actor Model

🗄️ Cassandra to ScyllaDB Migration

⚙️ Presence System

⚙️ Voice (WebRTC)

⚖Tradeoffs & Design Decisions

Every architectural decision is a tradeoff. Here's what you gain and what you give up.

✓ Strengths

✓Elixir/BEAM actor model handles millions of concurrent WebSocket connections efficiently
✓ScyllaDB eliminated GC-induced latency spikes, cutting p99 latency for historical reads to about 15 ms in Discord's published figures
✓SFU-based voice architecture scales to large calls without N² bandwidth explosion
✓Per-channel time-bucketed storage optimizes the dominant access pattern of reading recent messages

✗ Weaknesses

✗Elixir/BEAM ecosystem is smaller than JVM or Node.js, limiting library availability
✗Presence fan-out for users in large guilds creates cascading update storms requiring rate limiting
✗ScyllaDB migration required careful data modeling to avoid hot partition antipatterns
✗SFU voice servers are regionally deployed, adding infrastructure complexity for global low-latency voice

🎯FAANG Interview Questions

Interview Prep

💡 These questions appear in FAANG system design rounds. Focus on tradeoffs, not just what the system does.

These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Study the architecture above before attempting.

Q1
Design a real-time group messaging system like Discord. How would you handle a channel with 500K simultaneous viewers?
Q2
Discord migrated from Cassandra to ScyllaDB. What problems does JVM garbage collection cause for real-time systems?
Q3
Design a presence system that tracks online status for millions of users. How do you handle the fan-out when a user in 100 guilds goes offline?
Q4
Explain the difference between SFU-based and mesh-based voice/video architectures. When would you choose each?
Q5
How would you partition chat messages in a database to optimize for the 'load recent messages in channel' access pattern?

Research Papers & Further Reading

2023

How Discord Stores Billions of Messages

Discord Engineering
