๐Ÿ—„๏ธ Data & InfrastructureAdvancedWeek 8

Redis Cluster Architecture

Hash slots, gossip protocol, and replication without a coordinator

Redis · Azure Cache · AWS ElastiCache

Key Insight

16,384 slots (not 65,536) were chosen so that the cluster state (the slot-ownership bitmap) fits in a single Redis gossip message.


How It Works

1. Client sends a SET/GET command to the smart client library
2. The smart client computes CRC16(key) mod 16384 to determine the target hash slot
3. The cached slot table maps the slot to the correct master node
4. If the command is routed to the wrong node (stale table), the master replies with a MOVED redirect containing the correct node address
5. The master writes to its in-memory data structure and asynchronously streams the write to its replica via the replication backlog
6. The gossip protocol exchanges PING/PONG heartbeats every second, propagating slot ownership and node health across the cluster
7. If a master stops responding, peers mark it PFAIL; majority agreement escalates to FAIL
8. The replica with the latest replication offset increments the config epoch and requests votes from all masters (Raft-like election)
9. The winning replica promotes itself to master, claims the failed node's hash slots, and broadcasts the new configuration
10. Persistence layer: RDB snapshots fork the process for point-in-time backups; AOF logs every write command for crash recovery

⚠ The Problem

A single Redis instance is limited by the memory of one machine (typically 64–256GB) and the throughput of one CPU core (Redis is single-threaded for data operations). As datasets grow beyond a single machine's memory and throughput requirements exceed roughly 100K operations per second, you need horizontal scaling. Traditional approaches use external coordinators like ZooKeeper, but these introduce a single point of failure and operational complexity that contradicts Redis's philosophy of simplicity.

✓ The Solution

Redis Cluster partitions the keyspace into 16,384 hash slots distributed across master nodes using CRC16 hashing. There is no central coordinator: nodes maintain cluster state via a gossip protocol, exchanging heartbeats and slot ownership maps. Each master can have one or more replicas for high availability. When a master fails, its replicas automatically trigger a Raft-like election to promote a new master. Clients cache the slot-to-node mapping and redirect automatically via MOVED/ASK responses.

📊 Scale at a Glance

- Hash slots: 16,384
- Max cluster size: ~1,000 nodes
- Operations/sec (cluster-wide): millions
- Failover time: 1–2 seconds

🔬 Deep Dive

1. Hash Slots: Why 16,384?

Redis Cluster divides the entire keyspace into exactly 16,384 hash slots. Each key is mapped to a slot via CRC16(key) mod 16384. The choice of 16,384 (not 65,536 or more) was deliberate: the slot ownership bitmap must be exchanged in gossip messages between every node pair. At 16,384 slots, the bitmap is only 2KB, small enough to fit in a single network packet even for large clusters. With 65,536 slots, the bitmap would be 8KB, significantly increasing gossip protocol overhead. Slots are assigned to nodes during cluster creation and can be migrated between nodes for rebalancing without downtime.
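The mapping above can be sketched in a few lines of Python. Redis uses the CRC-16/XMODEM variant (polynomial 0x1021), and the hash-tag rule (covered in the Lua scripting section below) hashes only the first non-empty `{...}` substring:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021), the CRC variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring hash tags like {user:1000}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # only a non-empty tag is used
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Hash tags force related keys onto the same slot:
assert key_hash_slot("{user:1000}.name") == key_hash_slot("{user:1000}.email")
```

This is an illustrative reimplementation, not the Redis source; in production the slot computation lives inside the server (`CLUSTER KEYSLOT`) and inside smart client libraries.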

2. Gossip Protocol: Decentralized Cluster State

Redis Cluster nodes maintain cluster state (slot ownership, node health, configuration epoch) via a gossip protocol. Every second, each node picks a random peer and sends a PING containing its view of the cluster state. The receiving node responds with a PONG containing its own view. Nodes exchange information about other nodes they know about, and state converges across the cluster in O(log N) communication rounds. Failure detection uses a combination of PING/PONG timeouts and quorum voting: a node is marked as PFAIL (possibly failed) by any node that can't reach it, and as FAIL (confirmed failed) when a majority of masters agree.
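The PFAIL-to-FAIL escalation can be sketched as a simple report counter. This is an illustrative model (the class and method names are invented for this sketch, not Redis source code), but it captures the quorum rule: independent PFAIL reports accumulate until a majority of masters agree.

```python
from dataclasses import dataclass, field

@dataclass
class FailureDetector:
    """Illustrative sketch of Redis Cluster's PFAIL -> FAIL escalation.

    Each master independently marks an unreachable peer PFAIL; once PFAIL
    reports from a majority of masters accumulate, the peer is marked FAIL.
    """
    total_masters: int
    pfail_reports: dict = field(default_factory=dict)  # node_id -> set of reporter ids

    def report_pfail(self, node_id: str, reporter_id: str) -> str:
        reporters = self.pfail_reports.setdefault(node_id, set())
        reporters.add(reporter_id)
        # Majority of masters agree => confirmed failure
        if len(reporters) > self.total_masters // 2:
            return "FAIL"
        return "PFAIL"

detector = FailureDetector(total_masters=5)
detector.report_pfail("node-a", "m1")          # -> "PFAIL" (1 of 5 reporters)
detector.report_pfail("node-a", "m2")          # -> "PFAIL" (2 of 5)
state = detector.report_pfail("node-a", "m3")  # -> "FAIL"  (3 of 5 is a majority)
```

Real Redis also expires stale PFAIL reports after a time window and broadcasts the FAIL flag to the whole cluster, which this sketch omits.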

3. Automatic Failover: Raft-Like Election

When a master node is marked as FAIL by a majority of the cluster, its replicas initiate an automatic failover. The replica with the most up-to-date replication offset starts a Raft-inspired election: it increments the cluster configuration epoch and requests votes from all master nodes. A master grants its vote if the requesting replica's master is indeed marked as FAIL and no vote has been granted for that epoch. Once a replica receives votes from a majority of masters, it promotes itself to master status, takes ownership of the failed master's hash slots, and broadcasts the new configuration. The entire failover completes in 1–2 seconds.
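The vote-granting rule a master applies can be sketched as below. This is an illustrative model under the two conditions named in the paragraph (one vote per epoch, and only for replicas of a FAIL-marked master); the class and field names are invented for the sketch.

```python
class MasterVoter:
    """Illustrative sketch of a master's vote-granting rule during a
    Redis Cluster failover election (not actual Redis source code)."""

    def __init__(self):
        self.last_vote_epoch = 0
        self.failed_masters = set()  # node ids this master considers FAIL

    def request_vote(self, replica_of: str, epoch: int) -> bool:
        # Grant at most one vote per configuration epoch...
        if epoch <= self.last_vote_epoch:
            return False
        # ...and only to replicas whose master we agree has failed.
        if replica_of not in self.failed_masters:
            return False
        self.last_vote_epoch = epoch
        return True

voter = MasterVoter()
voter.failed_masters.add("master-3")
voter.request_vote("master-3", epoch=7)  # -> True: first vote in epoch 7
voter.request_vote("master-3", epoch=7)  # -> False: already voted in this epoch
```

The epoch check is what prevents two replicas from both winning: a majority of masters can grant at most one vote each per epoch, so at most one replica can collect a majority.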

4. Client-Side Routing and Redirection

Redis Cluster clients maintain a local cache of the slot-to-node mapping (the "slot table"). When a command is sent to the wrong node (due to a stale slot table), the node returns a MOVED response indicating the correct node for that slot. The client updates its slot table and retries. During slot migration (live resharding), an ASK response indicates a temporary redirect: the slot is being moved, and the client should try the target node for this specific command without updating its cached mapping. Smart clients (like redis-py-cluster, Jedis) handle MOVED/ASK transparently, making the clustering invisible to application code.
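The redirect loop inside a smart client can be sketched as follows. The `send` callable here is a hypothetical transport stand-in (real clients like redis-py-cluster and Jedis wrap this logic around their own connections); the error strings follow the `MOVED <slot> <host:port>` / `ASK <slot> <host:port>` format from the Redis Cluster spec.

```python
def run_command(slot_table, key_slot, send, command, max_redirects=5):
    """Illustrative sketch of smart-client MOVED/ASK handling.

    send(node, command, asking) is a hypothetical transport callable
    returning the reply string from that node.
    """
    node = slot_table[key_slot]
    asking = False
    for _ in range(max_redirects):
        reply = send(node, command, asking)
        asking = False
        if reply.startswith("MOVED"):
            # Permanent: slot ownership changed. Update the cached table, retry.
            _, slot, addr = reply.split()
            slot_table[int(slot)] = addr
            node = addr
        elif reply.startswith("ASK"):
            # Temporary: slot mid-migration. Retry once on the target,
            # prefixed with ASKING, WITHOUT updating the cached table.
            _, _, addr = reply.split()
            node = addr
            asking = True
        else:
            return reply
    raise RuntimeError("too many redirects")

# Demo with a fake transport: n1 no longer owns slot 100, n2 does.
def fake_send(node, command, asking):
    return "MOVED 100 n2" if node == "n1" else "OK"

table = {100: "n1"}
run_command(table, 100, fake_send, "GET k")  # -> "OK", and table[100] is now "n2"
```

Note the asymmetry: MOVED updates the cached slot table, ASK does not, because during migration the slot's official owner has not changed yet.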

5. Lua Scripting and Hash Tags for Data Locality

Redis Cluster executes commands atomically only when all keys in the command map to the same hash slot. Multi-key operations (MGET, transactions, Lua scripts) across different slots are not supported. To force related keys onto the same slot, Redis supports hash tags: only the substring within curly braces is hashed, so {user:1000}.name and {user:1000}.email both map to the same slot because CRC16('user:1000') is used for both. Lua scripts execute atomically on a single node and can operate on any number of keys, provided they all share the same hash slot. This constraint is the fundamental trade-off of Redis Cluster: horizontal scalability at the cost of limited cross-key atomicity.

⬡ Architecture Diagram

Redis Cluster Architecture (simplified architecture overview)

✦ Core Concepts

- 🔑 Hash Slots
- 🔑 CRC16 Hashing
- ⚙️ Gossip Protocol
- 🔁 Primary-Replica Replication
- 🧠 Automatic Failover
- ⚙️ Lua Scripting

⚖ Tradeoffs & Design Decisions

Every architectural decision is a tradeoff. Here's what you gain and what you give up.

✓ Strengths

  • No central coordinator eliminates the single point of failure for cluster management
  • 16,384 hash slots keep gossip message overhead minimal even in large clusters
  • Automatic failover with a Raft-like election achieves 1–2 second recovery without human intervention
  • Client-side routing via MOVED/ASK makes clustering transparent to most application code

✗ Weaknesses

  • Multi-key operations across different hash slots are not supported; careful key design is required
  • The gossip protocol has convergence delay; cluster state changes take O(log N) rounds to fully propagate
  • Asynchronous replication means acknowledged writes can be lost when a master fails before replicating them
  • Hash slot rebalancing during resharding causes temporary ASK redirects, adding latency for affected keys

🎯 FAANG Interview Questions

Interview Prep

💡 These questions appear in FAANG system design rounds. Focus on tradeoffs, not just what the system does.

These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Study the architecture above before attempting.

  1. Explain how Redis Cluster partitions data. Why 16,384 hash slots instead of a larger or smaller number?
  2. A Redis Cluster master fails. Walk through the automatic failover process step by step. What data can be lost?
  3. How does the gossip protocol work in Redis Cluster? What are the trade-offs vs a centralized coordinator like ZooKeeper?
  4. You need to perform a transaction on keys that live on different Redis Cluster nodes. How would you handle this?
  5. Design a distributed in-memory cache with automatic sharding and failover. What are your key design decisions?
