Redis Cluster Architecture
Hash slots, gossip protocol, and replication without a coordinator
Key Insight
16,384 slots (not 65,536) was chosen so that the cluster state (slot ownership bitmap) fits in a single Redis message.
Request Journey
How It Works
1. The client sends a SET/GET command through a cluster-aware ("smart") client library.
2. The smart client computes CRC16(key) mod 16384 to determine the target hash slot.
3. The client's cached slot table maps the slot to the correct master node.
4. If the request reaches the wrong node (stale table), that node replies with a MOVED redirect containing the correct node's address.
5. The master writes to its in-memory data structures and asynchronously streams the change to replicas via the replication backlog.
6. The gossip protocol exchanges PING/PONG heartbeats every second, propagating slot ownership and node health across the cluster.
7. If a master stops responding, peers mark it PFAIL; majority agreement escalates it to FAIL.
8. The replica with the latest replication offset increments the config epoch and requests votes from the masters (RAFT-like election).
9. The winning replica promotes itself to master, claims the failed node's hash slots, and broadcasts the new configuration.
10. Persistence layer: RDB snapshots fork the process for point-in-time backups; AOF logs every write command for crash recovery.
The Problem
A single Redis instance is limited by the memory of one machine (typically 64–256 GB) and the throughput of one CPU core (Redis is single-threaded for data operations). As datasets grow beyond a single machine's memory and throughput requirements exceed 100K+ operations per second, you need horizontal scaling. Traditional approaches use external coordinators like ZooKeeper, but these introduce a single point of failure and operational complexity that contradicts Redis's philosophy of simplicity.
The Solution
Redis Cluster partitions the keyspace into 16,384 hash slots distributed across master nodes using CRC16 hashing. There is no central coordinator: nodes maintain cluster state via a gossip protocol, exchanging heartbeats and slot ownership maps. Each master can have one or more replicas for high availability. When a master fails, its replicas automatically trigger a RAFT-like election to promote a new master. Clients cache the slot-to-node mapping and redirect automatically via MOVED/ASK responses.
Scale at a Glance
- Hash slots: 16,384
- Max cluster size: ~1,000 nodes
- Throughput: millions of operations/sec (cluster-wide)
- Failover time: 1–2 seconds
Deep Dive
Hash Slots: Why 16,384?
Redis Cluster divides the entire keyspace into exactly 16,384 hash slots. Each key is mapped to a slot via CRC16(key) mod 16384. The choice of 16,384 (not 65,536 or more) was deliberate: the slot ownership bitmap must be exchanged in gossip messages between every node pair. At 16,384 slots, the bitmap is only 2 KB, small enough to fit in a single network packet even for large clusters. With 65,536 slots, the bitmap would be 8 KB, significantly increasing gossip protocol overhead. Slots are assigned to nodes during cluster creation and can be migrated between nodes for rebalancing without downtime.
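The slot computation described above can be sketched in a few lines. Redis uses the CRC16-CCITT (XMODEM) variant, polynomial 0x1021, for which the standard check value over "123456789" is 0x31C3; the bitmap arithmetic (16,384 bits = 2,048 bytes) is also shown in a comment.

```python
# Sketch of Redis Cluster's key-to-slot mapping using CRC16-CCITT (XMODEM).
# Note: 16384 slots / 8 bits per byte = 2048 bytes, the 2 KB bitmap above.
def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    return crc16(key) % 16384

print(hex(crc16(b"123456789")))  # 0x31c3, the CRC16/XMODEM check value
print(key_slot(b"foo"))          # a slot in [0, 16383]
```

A production client would use a table-driven CRC16 for speed, but the mapping itself is exactly this simple.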
Gossip Protocol: Decentralized Cluster State
Redis Cluster nodes maintain cluster state (slot ownership, node health, configuration epoch) via a gossip protocol. Every second, each node picks a random peer and sends a PING containing its view of the cluster state. The receiving node responds with a PONG containing its own view. Nodes exchange information about other nodes they know about, and state converges across the cluster in O(log N) communication rounds. Failure detection uses a combination of PING/PONG timeouts and quorum voting: a node is marked as PFAIL (possibly failed) by any node that can't reach it, and as FAIL (confirmed failed) when a majority of masters agree.
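The PFAIL-to-FAIL escalation is essentially a quorum count over independent reports. A minimal sketch of that decision rule (names and structure are illustrative, not Redis internals):

```python
# Hypothetical sketch of two-phase failure detection: each node marks an
# unresponsive peer PFAIL on its own; once PFAIL reports from a majority
# of masters accumulate, the peer is escalated to FAIL cluster-wide.
def failure_state(pfail_reports: set, num_masters: int) -> str:
    majority = num_masters // 2 + 1
    if len(pfail_reports) >= majority:
        return "FAIL"
    return "PFAIL" if pfail_reports else "OK"

# Example: in a 5-master cluster, 3 independent reports reach quorum.
print(failure_state({"node-A", "node-B", "node-C"}, 5))  # FAIL
print(failure_state({"node-A"}, 5))                      # PFAIL
```

The two-phase design keeps a single partitioned observer from triggering a cluster-wide failover on its own.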
Automatic Failover: RAFT-Like Election
When a master node is marked as FAIL by a majority of the cluster, its replicas initiate an automatic failover. The replica with the most up-to-date replication offset starts a RAFT-inspired election: it increments the cluster configuration epoch and requests votes from all master nodes. A master grants its vote if the requesting replica's master is indeed marked as FAIL and no vote has been granted for that epoch. Once a replica receives votes from a majority of masters, it promotes itself to master status, takes ownership of the failed master's hash slots, and broadcasts the new configuration. The entire failover completes in 1–2 seconds.
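The vote-granting rule a master applies can be sketched as a small state check. This is a hedged illustration of the logic described above, not Redis's actual implementation; the parameter names are invented:

```python
# Illustrative vote-granting rule for the RAFT-like replica election:
# a master votes at most once per configuration epoch, and only if the
# requesting replica's master is marked FAIL.
def grant_vote(requested_epoch: int, last_voted_epoch: int,
               master_is_fail: bool):
    if master_is_fail and requested_epoch > last_voted_epoch:
        return True, requested_epoch   # grant, and record the epoch
    return False, last_voted_epoch     # refuse: stale epoch or master OK

granted, epoch = grant_vote(7, last_voted_epoch=6, master_is_fail=True)
print(granted, epoch)  # True 7
print(grant_vote(7, last_voted_epoch=7, master_is_fail=True))  # (False, 7)
```

The one-vote-per-epoch constraint is what guarantees at most one replica can collect a majority in any given epoch.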
Client-Side Routing and Redirection
Redis Cluster clients maintain a local cache of the slot-to-node mapping (the "slot table"). When a command is sent to the wrong node (due to a stale slot table), the node returns a MOVED response indicating the correct node for that slot. The client updates its slot table and retries. During slot migration (live resharding), an ASK response indicates a temporary redirect: the slot is being moved and the client should try the target node for this specific command without updating its cached mapping. Smart clients (like redis-py-cluster, Jedis) handle MOVED/ASK transparently, making the clustering invisible to application code.
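The asymmetry between the two redirects can be sketched with a toy handler. The error strings follow the wire format ("MOVED <slot> <host:port>", "ASK <slot> <host:port>"); the handler itself is a simplified illustration, not a real client:

```python
# Illustrative redirect handling in a smart client: MOVED updates the
# cached slot table permanently; ASK is a one-shot redirect during slot
# migration and must NOT update the cache.
slot_table = {}  # slot number -> "host:port"

def handle_redirect(error: str) -> str:
    kind, slot, addr = error.split()
    if kind == "MOVED":
        slot_table[int(slot)] = addr  # remember the slot's new owner
    # For ASK, retry once at addr (preceded by an ASKING command),
    # leaving the cached mapping untouched.
    return addr

addr = handle_redirect("MOVED 3999 127.0.0.1:6381")
print(addr, slot_table[3999])  # 127.0.0.1:6381 127.0.0.1:6381
handle_redirect("ASK 42 10.0.0.2:7000")
print(42 in slot_table)        # False: ASK never updates the table
```

Treating ASK like MOVED is a classic client bug: it pins the slot to the migration target before the migration has actually completed.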
Lua Scripting and Hash Tags for Data Locality
Redis Cluster executes commands atomically only when all keys in the command map to the same hash slot. Multi-key operations (MGET, transactions, Lua scripts) across different slots are not supported. To force related keys onto the same slot, Redis supports hash tags: only the substring within curly braces is hashed, so {user:1000}.name and {user:1000}.email both map to the same slot because CRC16('user:1000') is used for both. Lua scripts execute atomically on a single node and can operate on any number of keys, provided they all share the same hash slot. This constraint is the fundamental trade-off of Redis Cluster: horizontal scalability at the cost of limited cross-key atomicity.
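The hash-tag extraction rule is precise: the tag applies only when the key contains a "{", a later "}", and at least one character between them; otherwise the whole key is hashed. A small sketch of that rule:

```python
# Sketch of the hash-tag rule: hash only the substring inside the first
# non-empty {...} pair, so related keys land on the same slot.
def hash_tag(key: str) -> str:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:      # tag must be non-empty
            return key[start + 1:end]
    return key                   # no usable tag: hash the whole key

print(hash_tag("{user:1000}.name"))   # user:1000
print(hash_tag("{user:1000}.email"))  # user:1000 -> same slot
print(hash_tag("{}.ignored"))         # {}.ignored (empty tag, whole key)
```

Both {user:1000} keys hash the same substring and therefore share a slot, which is what makes an MGET or Lua script over them legal in cluster mode.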
Architecture Diagram
Redis Cluster Architecture: simplified architecture overview
Core Concepts
Hash Slots
CRC16 Hashing
Gossip Protocol
Primary-Replica Replication
Automatic Failover
Lua Scripting
Tradeoffs & Design Decisions
Every architectural decision is a tradeoff. Here's what you gain and what you give up.
Strengths
- No central coordinator eliminates a single point of failure for cluster management
- 16,384 hash slots keep gossip message overhead minimal even in large clusters
- Automatic failover with a RAFT-like election achieves 1–2 second recovery without human intervention
- Client-side routing via MOVED/ASK makes clustering transparent to most application code
Weaknesses
- Multi-key operations across different hash slots are not supported, which requires careful key design
- The gossip protocol has convergence delay: cluster state changes take O(log N) rounds to fully propagate
- Asynchronous replication means acknowledged writes can be lost if a master fails before replicating them
- Hash slot rebalancing during resharding causes temporary ASK redirects, adding latency for affected keys
FAANG Interview Questions
These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Focus on tradeoffs, not just on what the system does, and study the architecture above before attempting them.
- Q1. Explain how Redis Cluster partitions data. Why 16,384 hash slots instead of a larger or smaller number?
- Q2. A Redis Cluster master fails. Walk through the automatic failover process step by step. What data can be lost?
- Q3. How does the gossip protocol work in Redis Cluster? What are the trade-offs vs a centralized coordinator like ZooKeeper?
- Q4. You need to perform a transaction on keys that live on different Redis Cluster nodes. How would you handle this?
- Q5. Design a distributed in-memory cache with automatic sharding and failover. What are your key design decisions?