
🗄️ Data & Infrastructure · Advanced · Week 7

Dropbox Block-Level Sync Architecture

Delta sync, content-addressing, and conflict resolution

Dropbox · Google Drive · OneDrive

Key Insight

Content-addressable storage means identical blocks across different files are stored once, which yields massive deduplication at scale.

How It Works

① File saved on desktop

② Dropbox splits file into 4MB blocks, SHA-256 hash computed per block

③ Client checks which blocks already exist on server

④ Only new/changed blocks uploaded to S3

⑤ Metadata DB updated, other signed-in devices notified via long-poll

⑥ LAN sync transfers blocks peer-to-peer if devices on same network

⚠ The Problem

Dropbox must synchronize files across millions of devices in near-real-time. A user edits a 2GB PowerPoint file on their laptop and expects it to appear on their phone within seconds. Uploading the entire file on every save would be prohibitively slow at typical upload speeds. Conflicts arise when two devices edit the same file offline. The system must work across flaky mobile networks, handle files of any size, and minimize both bandwidth and storage costs across 700M+ registered users.

✓ The Solution

Dropbox splits every file into content-addressed blocks (SHA-256 of each 4MB chunk). Only modified blocks are uploaded — editing one slide in a 2GB presentation transmits only the changed 4MB block. The Block Server stores unique blocks in S3-compatible storage; the Metadata Server tracks which blocks compose each file version. Identical blocks across different users are stored once (global deduplication). A desktop daemon monitors filesystem events (inotify/FSEvents/ReadDirectoryChangesW) to detect changes instantly.
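The OS event APIs named above (inotify, FSEvents, ReadDirectoryChangesW) are platform-specific, so the sketch below swaps in a different, portable technique: taking periodic snapshots of a folder and diffing them. It is not how the Dropbox daemon works, only an illustration of the "what changed since last time" question a sync client has to answer. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os


def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Map each file's path (relative to root) to (size, mtime in ns)."""
    state: dict[str, tuple[int, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(
    before: dict[str, tuple[int, int]],
    after: dict[str, tuple[int, int]],
) -> dict[str, list[str]]:
    """Classify paths as created, modified, or deleted between two snapshots."""
    created = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    # A path present in both snapshots counts as modified if size or mtime differs.
    modified = sorted(
        path for path in before.keys() & after.keys()
        if before[path] != after[path]
    )
    return {"created": created, "modified": modified, "deleted": deleted}
```

Polling like this costs a full directory walk per pass, which is why an event-driven API is preferable when the platform offers one; a snapshot diff remains useful as a fallback and for catching up after the client was not running.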

📊 Scale at a Glance

Registered Users: 700M+
Files Stored: Hundreds of Billions
Block Size: 4MB
Sync Latency (LAN): <1 sec

🔬 Deep Dive

Content-Addressable Block Storage

Dropbox splits files into blocks (typically 4MB) and computes a SHA-256 hash of each block. The hash serves as both the block's identifier and its integrity check. Two identical blocks — even across different files or different users — produce the same hash and are stored only once. This content-addressable storage provides massive deduplication: common files (OS installers, popular PDFs) are stored once globally but appear in millions of accounts. When uploading, the client first sends block hashes to the server; if a hash already exists, the block doesn't need to be uploaded at all.
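The hash-then-check-then-upload flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Dropbox's implementation: `BlockStore` is a hypothetical in-memory stand-in for the Block Server, and `upload_file` and `missing` are made-up names for the client-side steps.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB, the block size quoted in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks; the final block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> str:
    """SHA-256 hex digest: the block's identifier and its integrity check."""
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Hypothetical in-memory stand-in for the Block Server."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        """Return the subset of hashes the store does not hold yet."""
        return {h for h in hashes if h not in self._blocks}

    def put(self, block: bytes) -> str:
        """Store a block under its own hash; a repeated block is a no-op."""
        digest = block_hash(block)
        self._blocks.setdefault(digest, block)
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch a block and verify it still matches its identifier."""
        block = self._blocks[digest]
        if block_hash(block) != digest:
            raise ValueError("stored block failed its integrity check")
        return block

    def __len__(self) -> int:
        return len(self._blocks)


def upload_file(
    store: BlockStore, data: bytes, block_size: int = BLOCK_SIZE
) -> tuple[list[str], int]:
    """Upload only the blocks the store lacks.

    Returns the file's ordered block list and the number of blocks sent.
    """
    blocks = split_into_blocks(data, block_size)
    hashes = [block_hash(b) for b in blocks]
    needed = store.missing(hashes)  # step 1: send hashes, learn what is missing
    sent = 0
    for block, digest in zip(blocks, hashes):
        if digest in needed:        # step 2: send each missing block once
            store.put(block)
            needed.discard(digest)
            sent += 1
    return hashes, sent
```

With a tiny block size for demonstration, `upload_file(store, b"aaaabbbbaaaa", block_size=4)` sends two blocks even though the file has three, because the first and last blocks hash to the same value. The ordered hash list returned alongside is what a metadata service would record as the file's block list.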

Delta Sync — Minimizing Transfer Size

When a user modifies a file, the desktop client re-chunks the file and computes new block hashes. Only blocks whose hashes have changed are uploaded, typically a small fraction of the total file. For a 2GB presentation where one slide was edited, perhaps only one or two 4MB blocks need transmission, provided the edit does not shift the byte offsets of the data that follows; with fixed-size blocks, an insertion near the start of a file moves every later block boundary and changes those hashes too. The Metadata Server is updated with the new block list for that file version. Dropbox also applies streaming compression (LZ4) to blocks before transmission, further reducing bandwidth. For tiny edits within a block, sub-block binary diffing can reduce the transfer to just the changed bytes.
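A small sketch makes the delta computation concrete. It assumes fixed-size chunking and compares block hashes between two versions of a file; `blocks_to_upload` is a hypothetical helper, and the 4-byte block size is chosen only so the example fits on one line. It also shows a known property of fixed-size chunking: an in-place edit touches one block, while an insertion that shifts later bytes changes every block after it.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """SHA-256 hex digest of each fixed-size block, in file order."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(old: bytes, new: bytes, block_size: int) -> list[int]:
    """Indices of blocks in the new version whose hash the old version lacks."""
    known = set(block_hashes(old, block_size))
    return [
        index
        for index, digest in enumerate(block_hashes(new, block_size))
        if digest not in known
    ]


old_version = b"AAAABBBBCCCCDDDD"
edited_in_place = b"AAAABxBBCCCCDDDD"     # one byte overwritten, offsets unchanged
inserted_at_front = b"xAAAABBBBCCCCDDDD"  # one byte inserted, everything shifts

in_place_delta = blocks_to_upload(old_version, edited_in_place, 4)
shifted_delta = blocks_to_upload(old_version, inserted_at_front, 4)
```

Here `in_place_delta` contains only block index 1, while `shifted_delta` contains all five block indices of the new version. The one-or-two-blocks figure therefore describes edits that leave surrounding bytes where they were.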

Metadata Server — The Source of Truth

The Metadata Server maintains the authoritative mapping of file paths to block lists, along with version history, permissions, and sharing metadata. It's backed by a sharded MySQL cluster with write-ahead logging for durability. Every file operation (create, modify, delete, move, rename) generates a journal entry with a monotonically increasing server-side sequence number (cursor). Clients poll for journal entries since their last known cursor to discover changes. This cursor-based sync protocol ensures clients never miss an update and can resume sync after any disconnection by replaying from their last cursor.
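The cursor protocol can be illustrated with an append-only journal and a client that remembers the last cursor it applied. This is a sketch under simplifying assumptions: a single unsharded in-memory journal, only two operation types ("put" and "delete"), and hypothetical class names `Journal` and `Client`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    cursor: int                      # monotonically increasing sequence number
    op: str                          # "put" or "delete" in this sketch
    path: str
    block_hashes: tuple[str, ...] = ()


class Journal:
    """Hypothetical append-only journal standing in for the Metadata Server."""

    def __init__(self) -> None:
        self._entries: list[JournalEntry] = []

    def append(self, op: str, path: str, block_hashes: tuple[str, ...] = ()) -> int:
        """Record an operation and return the cursor assigned to it."""
        cursor = len(self._entries) + 1
        self._entries.append(JournalEntry(cursor, op, path, tuple(block_hashes)))
        return cursor

    def since(self, cursor: int) -> list[JournalEntry]:
        """Entries with a cursor greater than the one given.

        Cursors are 1..n with no gaps, so entry k sits at list index k - 1.
        """
        return self._entries[cursor:]


class Client:
    """Keeps a local view of path -> block list plus the last applied cursor."""

    def __init__(self) -> None:
        self.cursor = 0
        self.files: dict[str, tuple[str, ...]] = {}

    def sync(self, journal: Journal) -> int:
        """Replay everything after our cursor; return how many entries applied."""
        entries = journal.since(self.cursor)
        for entry in entries:
            if entry.op == "delete":
                self.files.pop(entry.path, None)
            else:
                self.files[entry.path] = entry.block_hashes
            self.cursor = entry.cursor
        return len(entries)
```

Because the client advances its cursor only after applying an entry, a client that disconnects mid-sync resumes from the last entry it applied rather than starting over.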

Conflict Resolution — Last-Write-Wins with Safety Net

When two devices edit the same file before syncing, a conflict occurs. Dropbox uses a last-writer-wins strategy based on server-side timestamps — the later modification becomes the canonical version. However, the 'losing' version is not silently discarded: Dropbox creates a 'conflicted copy' file (named with the device and timestamp) so users can manually reconcile differences. For collaborative editing scenarios, Dropbox Sync APIs provide operational transform support. This approach prioritizes data preservation over automatic resolution — no user data is ever silently lost.
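The rule described above (the later edit by server time wins, the other is kept as a conflicted copy) can be sketched as follows. The `Edit` record, the `resolve` function, and the exact conflicted-copy file name format are assumptions made for illustration; the text only says the name includes the device and a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Edit:
    device: str
    path: str
    server_time: datetime            # assigned by the server, not the device clock
    block_hashes: tuple[str, ...]


def conflicted_copy_path(path: str, device: str, when: datetime) -> str:
    """Build a sibling path for the losing version (name format is assumed)."""
    original = PurePosixPath(path)
    name = f"{original.stem} ({device}'s conflicted copy {when:%Y-%m-%d}){original.suffix}"
    return str(original.with_name(name))


def resolve(first: Edit, second: Edit) -> dict[str, tuple[str, ...]]:
    """Last-writer-wins on server time; the loser survives as a conflicted copy.

    Both edits are assumed to target the same path. On a timestamp tie the
    first argument wins, an arbitrary choice for this sketch.
    """
    if first.server_time >= second.server_time:
        winner, loser = first, second
    else:
        winner, loser = second, first
    return {
        winner.path: winner.block_hashes,
        conflicted_copy_path(loser.path, loser.device, loser.server_time): loser.block_hashes,
    }
```

Both versions end up in the result, so nothing is dropped; the cost is that a person has to open the two files and merge them by hand.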

LAN Sync — Peer-to-Peer on Local Networks

If two Dropbox clients are on the same local network, LAN Sync enables direct peer-to-peer block transfer without routing through Dropbox's servers. Clients broadcast their presence on the local network via UDP, discover peers with shared folders, and transfer blocks directly over the LAN at full local network speed (typically 100Mbps–1Gbps vs. potentially slow internet upload). This is transformative in office environments where multiple team members share the same folders — large file syncs complete in seconds instead of minutes. The feature reduces Dropbox's bandwidth costs while dramatically improving user experience.
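A quick back-of-the-envelope calculation shows where the "seconds instead of minutes" difference comes from. The LAN rates are the ones quoted above; the 10 Mbps internet uplink is an assumption picked for illustration, and the formula ignores protocol overhead, latency, and compression.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by link rate."""
    return size_bytes * 8 / link_bits_per_second


FILE_BYTES = 2 * 10**9         # the 2GB file from the text, in decimal gigabytes
BLOCK_BYTES = 4 * 1024 * 1024  # one 4MB block

UPLINK_BPS = 10e6              # assumed home internet upload rate (10 Mbps)
LAN_SLOW_BPS = 100e6           # 100 Mbps LAN
LAN_FAST_BPS = 1e9             # 1 Gbps LAN

if __name__ == "__main__":
    cases = [
        ("whole file over uplink", FILE_BYTES, UPLINK_BPS),
        ("one block over uplink", BLOCK_BYTES, UPLINK_BPS),
        ("whole file over 100 Mbps LAN", FILE_BYTES, LAN_SLOW_BPS),
        ("whole file over 1 Gbps LAN", FILE_BYTES, LAN_FAST_BPS),
    ]
    for label, size, rate in cases:
        print(f"{label}: {transfer_seconds(size, rate):.1f} s")
```

Under these assumptions the whole file needs about 1,600 seconds (roughly 27 minutes) over the uplink, a single block about 3.4 seconds, and the whole file between 16 and 160 seconds over the LAN. Delta sync and LAN Sync attack the same cost from two sides: one shrinks the payload, the other raises the link rate.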

⬡ Architecture Diagram

Dropbox Block-Level Sync Architecture — simplified architecture overview

✦ Core Concepts

📚 Content-Addressable Storage
🔁 Delta Sync
⚙️ Block Chunking
⚙️ Conflict Resolution
🔁 LAN Sync
⚙️ Metadata Server

⚖ Tradeoffs & Design Decisions

Every architectural decision is a tradeoff. Here's what you gain and what you give up.

✓ Strengths

✓ Content-addressable blocks enable global deduplication across hundreds of millions of users
✓ Delta sync transmits only changed blocks, which can cut bandwidth by 90% or more for large file edits
✓ Cursor-based sync protocol is resilient to disconnection and enables reliable resume
✓ LAN Sync provides near-instant transfers on local networks without server round-trips

✗ Weaknesses

✗ 4MB fixed block size is suboptimal for both very small files (overhead) and very large files (granularity)
✗ SHA-256 computation on large files consumes significant client CPU and delays initial sync detection
✗ Conflict resolution via 'conflicted copy' files creates user confusion and a manual reconciliation burden
✗ Metadata Server is a centralized bottleneck: it must be highly available and horizontally sharded

🎯 FAANG Interview Questions

Interview Prep

💡 These questions appear in FAANG system design rounds. Focus on tradeoffs, not just what the system does.

These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Study the architecture above before attempting.

Q1. Design a file synchronization system like Dropbox. How would you minimize bandwidth usage for large file edits?
Q2. Explain content-addressable storage. Two users independently upload the same 500MB file. What happens?
Q3. How would you handle conflict resolution when two devices edit the same file offline? What are the trade-offs of different approaches?
Q4. Design the metadata storage for a file sync system serving 700M users. What database would you choose and how would you shard it?
Q5. Your sync client needs to detect file changes on the local filesystem instantly. How would you implement this on different operating systems?
