Dropbox Block-Level Sync Architecture
Delta sync, content-addressing, and conflict resolution
Key Insight
Content-addressable storage means identical blocks across different files are stored once, which yields massive deduplication at scale.
Request Journey
How It Works
① File is saved on the desktop
② Dropbox splits the file into 4MB blocks and computes a SHA-256 hash per block
③ Client checks which blocks already exist on the server
④ Only new or changed blocks are uploaded to S3
⑤ Metadata DB is updated, and other signed-in devices are notified via long-poll
⑥ LAN sync transfers blocks peer-to-peer if devices are on the same network
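Steps ② through ④ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Dropbox's client code: the function names are hypothetical, the file is held in memory as `bytes`, and the "server" is just a set of hashes it already holds.

```python
import hashlib
from typing import Iterator

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB, the block size quoted in this article


def iter_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size blocks; the last one may be shorter."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Step 2: SHA-256 hex digest of every block, in file order."""
    return [hashlib.sha256(b).hexdigest() for b in iter_blocks(data, block_size)]


def blocks_to_upload(local: list[str], server_has: set[str]) -> list[str]:
    """Steps 3 and 4: keep only hashes the server does not already hold.

    Duplicate hashes within the same file are listed once.
    """
    seen: set[str] = set()
    missing: list[str] = []
    for digest in local:
        if digest not in server_has and digest not in seen:
            seen.add(digest)
            missing.append(digest)
    return missing
```

With a toy block size of 10 bytes, `block_hashes(b"a" * 10 + b"b" * 10 + b"a" * 10, 10)` returns three hashes where the first and third are equal, so at most two blocks would ever be uploaded for that file.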
The Problem
Dropbox must synchronize files across millions of devices in near-real-time. A user edits a 2GB PowerPoint file on their laptop and expects it to appear on their phone within seconds. Uploading the entire file on every save would be prohibitively slow at typical upload speeds. Conflicts arise when two devices edit the same file offline. The system must work across flaky mobile networks, handle files of any size, and minimize both bandwidth and storage costs across 700M+ registered users.
The Solution
Dropbox splits every file into content-addressed blocks (SHA-256 of each 4MB chunk). Only modified blocks are uploaded: editing one slide in a 2GB presentation transmits only the 4MB block or blocks that changed. The Block Server stores unique blocks in S3-compatible storage; the Metadata Server tracks which blocks compose each file version. Identical blocks across different users are stored once (global deduplication). A desktop daemon monitors filesystem events (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows) to detect changes as they happen.
Scale at a Glance
- Registered Users: 700M+
- Files Stored: Hundreds of Billions
- Block Size: 4MB
- Sync Latency (LAN): <1 sec
Deep Dive
Content-Addressable Block Storage
Dropbox splits files into blocks (typically 4MB) and computes a SHA-256 hash of each block. The hash serves as both the block's identifier and its integrity check. Two identical blocks, even across different files or different users, produce the same hash and are stored only once. This content-addressable storage provides massive deduplication: common files (OS installers, popular PDFs) are stored once globally but appear in millions of accounts. When uploading, the client first sends block hashes to the server; if a hash already exists, the block doesn't need to be uploaded at all.
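To make the idea concrete, here is a toy in-memory content-addressed store. It is a sketch only: the `BlockStore` class and its method names are hypothetical, and a Python dict stands in for the real block storage. The point it illustrates is that the key is derived from the content, so writing the same bytes twice cannot create a second copy.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed store (hypothetical, not Dropbox's API)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        """The pre-upload check: does a block with this hash already exist?"""
        return digest in self._blocks

    def put(self, block: bytes) -> str:
        """Store a block under its SHA-256 digest and return that digest."""
        digest = hashlib.sha256(block).hexdigest()
        # Storing identical bytes again is a no-op because the key is the hash.
        self._blocks.setdefault(digest, block)
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch a block and verify it against its own identifier."""
        block = self._blocks[digest]
        if hashlib.sha256(block).hexdigest() != digest:
            raise ValueError("block content does not match its hash")
        return block

    def __len__(self) -> int:
        return len(self._blocks)
```

If two users each call `put` with the same installer bytes, both get the same digest back and `len(store)` stays at 1. Each user's file metadata then simply references that one digest.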
Delta Sync: Minimizing Transfer Size
When a user modifies a file, the desktop client re-chunks the file and computes new block hashes. Only blocks whose hashes have changed are uploaded, typically a small fraction of the total file. For a 2GB presentation where one slide was edited, perhaps only one or two 4MB blocks need transmission. The Metadata Server is updated with the new block list for that file version. Dropbox also applies streaming compression (LZ4) to blocks before transmission, further reducing bandwidth. For tiny edits within a block, sub-block binary diffing can reduce the transfer to just the changed bytes.
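The block-level comparison can be sketched as follows. The helper names are hypothetical, and the example assumes the edit overwrites bytes in place. With fixed-size chunking, an insertion that shifts later bytes would change every subsequent block hash, which this sketch does not try to handle.

```python
import hashlib


def hash_blocks(data: bytes, block_size: int) -> list[str]:
    """SHA-256 hex digest of each fixed-size block, in file order."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_block_indexes(old: list[str], new: list[str]) -> list[int]:
    """Indexes of blocks in the new version that the old version cannot supply."""
    known = set(old)
    return [i for i, digest in enumerate(new) if digest not in known]


# Toy 4-byte blocks so the effect is visible without large files.
old_version = hash_blocks(b"AAAABBBBCCCC", 4)
new_version = hash_blocks(b"AAAAXXXXCCCC", 4)
to_upload = changed_block_indexes(old_version, new_version)  # only block 1
```

For scale, a 2 GiB file split into 4 MiB blocks has 512 blocks, so a single changed block is roughly 0.2% of the file.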
Metadata Server: The Source of Truth
The Metadata Server maintains the authoritative mapping of file paths to block lists, along with version history, permissions, and sharing metadata. It's backed by a sharded MySQL cluster with write-ahead logging for durability. Every file operation (create, modify, delete, move, rename) generates a journal entry with a monotonically increasing server-side sequence number (cursor). Clients poll for journal entries since their last known cursor to discover changes. This cursor-based sync protocol ensures clients never miss an update and can resume sync after any disconnection by replaying from their last cursor.
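A minimal sketch of the cursor idea, assuming an append-only in-memory journal. The `Journal` and `JournalEntry` names and their fields are illustrative and do not reflect Dropbox's schema. A client remembers the last cursor it processed and asks only for entries after it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JournalEntry:
    cursor: int                      # server-assigned sequence number
    op: str                          # "create", "modify", "delete", "move", ...
    path: str
    block_list: tuple[str, ...] = ()


@dataclass
class Journal:
    entries: list[JournalEntry] = field(default_factory=list)

    def append(self, op: str, path: str, block_list: tuple[str, ...] = ()) -> int:
        """Record an operation and return its monotonically increasing cursor."""
        cursor = len(self.entries) + 1
        self.entries.append(JournalEntry(cursor, op, path, block_list))
        return cursor

    def since(self, cursor: int) -> list[JournalEntry]:
        """Entries strictly after `cursor`; 0 means replay from the beginning."""
        # Cursors run 1..n and sit at list indexes 0..n-1, so slicing works.
        return self.entries[cursor:]
```

A client that disconnected after processing cursor 1 calls `since(1)` on reconnect and receives every later entry in order, which is what makes resume after disconnection straightforward.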
Conflict Resolution: Last-Write-Wins with Safety Net
When two devices edit the same file before syncing, a conflict occurs. Dropbox uses a last-writer-wins strategy based on server-side timestamps: the later modification becomes the canonical version. However, the 'losing' version is not silently discarded: Dropbox creates a 'conflicted copy' file (named with the device and timestamp) so users can manually reconcile differences. For collaborative editing scenarios, Dropbox Sync APIs provide operational transform support. This approach prioritizes data preservation over automatic resolution, so no user data is ever silently lost.
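The policy described above can be sketched as a pure function. Everything here is an assumption made for illustration: the `Edit` type, the `resolve` function, and in particular the conflicted-copy file name format, which is hypothetical rather than Dropbox's exact naming scheme.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Edit:
    device: str
    server_time: datetime  # timestamp assigned by the server on receipt
    content: bytes


def conflicted_copy_path(path: str, loser: Edit) -> str:
    """Build a sibling path for the losing version (name format is illustrative)."""
    p = PurePosixPath(path)
    stamp = loser.server_time.strftime("%Y-%m-%d")
    name = f"{p.stem} ({loser.device}'s conflicted copy {stamp}){p.suffix}"
    return str(p.with_name(name))


def resolve(path: str, a: Edit, b: Edit) -> dict[str, bytes]:
    """Last writer wins; the other version is kept as a conflicted copy."""
    winner, loser = (a, b) if a.server_time >= b.server_time else (b, a)
    return {path: winner.content, conflicted_copy_path(path, loser): loser.content}


laptop = Edit("laptop", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), b"A")
phone = Edit("phone", datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc), b"B")
files = resolve("/docs/plan.txt", laptop, phone)
```

Here `files` contains two paths: the original path holding the phone's later edit, and a conflicted copy holding the laptop's edit. Both versions survive, which is the data-preservation property the section emphasizes.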
LAN Sync: Peer-to-Peer on Local Networks
If two Dropbox clients are on the same local network, LAN Sync enables direct peer-to-peer block transfer without routing through Dropbox's servers. Clients broadcast their presence on the local network via UDP, discover peers with shared folders, and transfer blocks directly over the LAN at full local network speed (typically 100Mbps to 1Gbps vs. potentially slow internet upload). This is transformative in office environments where multiple team members share the same folders: large file syncs complete in seconds instead of minutes. The feature reduces Dropbox's bandwidth costs while dramatically improving user experience.
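A back-of-the-envelope calculation shows where the "seconds instead of minutes" claim comes from. The 10 Mbps internet upload rate below is an assumed figure for comparison, and the formula ignores protocol overhead, latency, and compression.

```python
def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Idealized transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


two_gb = 2_000_000_000                          # 2GB, decimal units
over_internet = transfer_seconds(two_gb, 10)    # assumed 10 Mbps upload
over_fast_lan = transfer_seconds(two_gb, 1000)  # 1 Gbps LAN
```

Under these assumptions the full 2GB file takes about 1600 seconds (over 26 minutes) on the internet link and about 16 seconds on a 1 Gbps LAN.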
Architecture Diagram
Dropbox Block-Level Sync Architecture: simplified architecture overview
Core Concepts
Content-Addressable Storage
Delta Sync
Block Chunking
Conflict Resolution
LAN Sync
Metadata Server
Tradeoffs & Design Decisions
Every architectural decision is a tradeoff. Here's what you gain and what you give up.
Strengths
- Content-addressable blocks enable global deduplication across hundreds of millions of users
- Delta sync transmits only changed blocks, reducing bandwidth by 90%+ for large file edits
- Cursor-based sync protocol is resilient to disconnection and enables reliable resume
- LAN Sync provides near-instant transfers on local networks without server round-trips
Weaknesses
- 4MB fixed block size is suboptimal for both very small files (overhead) and very large files (granularity)
- SHA-256 computation on large files consumes significant client CPU and delays initial sync detection
- Conflict resolution via 'conflicted copy' files creates user confusion and manual reconciliation burden
- Metadata Server is a centralized bottleneck that must be highly available and horizontally sharded
FAANG Interview Questions
Interview Prep: These questions appear in FAANG system design rounds. Focus on tradeoffs, not just what the system does.
These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Study the architecture above before attempting.
- Q1: Design a file synchronization system like Dropbox. How would you minimize bandwidth usage for large file edits?
- Q2: Explain content-addressable storage. Two users independently upload the same 500MB file. What happens?
- Q3: How would you handle conflict resolution when two devices edit the same file offline? What are the trade-offs of different approaches?
- Q4: Design the metadata storage for a file sync system serving 700M users. What database would you choose and how would you shard it?
- Q5: Your sync client needs to detect file changes on the local filesystem instantly. How would you implement this on different operating systems?