Spotify Music Recommendation System
Collaborative filtering, Discover Weekly, and the AudioEmbeddings pipeline
Key Insight
The best recommendation signal isn't user ratings; it's implicit feedback (plays, skips, saves) at massive scale.
Request Journey
How It Works
1. User plays a track
2. Stream event flows to Kafka
3. Offline pipeline runs BaRT collaborative filtering daily on 600M user histories
4. Candidate tracks are generated per user
5. Real-time ranking model scores candidates using fresh features (time of day, recent listens)
6. Top tracks are returned to Discover Weekly
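Step 5 above can be sketched as a small re-ranker that scores offline-generated candidates with fresh request-time context. The feature names and weights here are purely hypothetical, a minimal illustration of batch scores combined with real-time features:

```python
# Hypothetical weights: offline CF affinity plus fresh request-time features.
WEIGHTS = {"affinity": 1.0, "evening_boost": 0.3, "recent_artist": 0.5}

def score(candidate, ctx):
    """Score one candidate using the daily batch score plus fresh features."""
    s = WEIGHTS["affinity"] * candidate["cf_affinity"]         # from offline pipeline
    if ctx["hour"] >= 18:                                      # time-of-day feature
        s += WEIGHTS["evening_boost"] * candidate["chill_score"]
    if candidate["artist"] in ctx["recent_artists"]:           # recent-listen feature
        s += WEIGHTS["recent_artist"]
    return s

def rank(candidates, ctx, k=30):
    return sorted(candidates, key=lambda c: score(c, ctx), reverse=True)[:k]

candidates = [
    {"track": "A", "artist": "x", "cf_affinity": 0.8, "chill_score": 0.1},
    {"track": "B", "artist": "y", "cf_affinity": 0.6, "chill_score": 0.9},
]
# In the evening, with artist "y" recently played, track B outranks A
top = rank(candidates, {"hour": 21, "recent_artists": {"y"}})
```

The point of the split is latency: the expensive affinity computation happens offline, while the request path only applies cheap feature lookups and a sort.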
The Problem
Spotify must recommend music to 600M+ users from a catalog of 100M+ tracks, most of which any individual user has never heard. Traditional collaborative filtering struggles with the cold-start problem: new songs and new users have no interaction history. Users expect personalized discovery (Discover Weekly) that feels serendipitous yet relevant, not just a popular-hit echo chamber. The recommendation must blend multiple signals: listening history, playlist curation, social context, and even the raw audio itself.
The Solution
Spotify's recommendation engine combines three complementary approaches: collaborative filtering via matrix factorization (users who listen to X also like Y), NLP models analyzing playlist titles and music blog text to derive semantic track embeddings, and deep audio CNNs that extract features directly from raw audio spectrograms. These signals are combined in a multi-arm ensemble. Discover Weekly is generated via a massive offline pipeline: Spark-based collaborative filtering → neural embedding generation → approximate nearest-neighbor lookup → personalized playlist assembly.
Scale at a Glance
600M+
Monthly Active Users
100M+
Track Catalog
100M+/week
Discover Weekly Listeners
Billions
Daily Listening Events
Deep Dive
Collaborative Filtering via Matrix Factorization
Spotify models the user-track interaction matrix (billions of rows × 100M+ columns) using implicit matrix factorization, factoring the sparse matrix into user and track embedding vectors of ~128 dimensions. Similar users map to nearby points in embedding space, and a user's predicted affinity for an unheard track is the dot product of their embeddings. The model is trained on implicit feedback signals (plays, skips, saves, and playlist additions) weighted by engagement strength. This runs as a massive Apache Spark job processing billions of interaction events.
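A minimal sketch of implicit-feedback matrix factorization via alternating least squares (in the style of Hu, Koren, and Volinsky), on a toy play-count matrix. The data, dimensions, and hyperparameters are illustrative; production runs the same idea as a distributed Spark job:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy play-count matrix: 4 users x 5 tracks (0 = never played).
# Users 0-1 and users 2-3 form two taste clusters.
plays = np.array([
    [5, 3, 0, 0, 1],
    [4, 0, 0, 1, 0],
    [0, 0, 4, 5, 0],
    [0, 1, 5, 4, 0],
], dtype=float)

alpha, lam, k = 40.0, 0.1, 2               # confidence scale, L2 reg, latent dims
P = (plays > 0).astype(float)              # binary preference: heard or not
C = 1.0 + alpha * plays                    # confidence grows with engagement
U = rng.normal(scale=0.1, size=(4, k))     # user embedding vectors
V = rng.normal(scale=0.1, size=(5, k))     # track embedding vectors

for _ in range(15):                        # alternating least squares
    for u in range(4):                     # solve each user holding tracks fixed
        Cu = np.diag(C[u])
        U[u] = np.linalg.solve(V.T @ Cu @ V + lam * np.eye(k), V.T @ Cu @ P[u])
    for i in range(5):                     # solve each track holding users fixed
        Ci = np.diag(C[:, i])
        V[i] = np.linalg.solve(U.T @ Ci @ U + lam * np.eye(k), U.T @ Ci @ P[:, i])

pred = U @ V.T                             # predicted affinity = dot product
```

Unheard cells are not treated as missing but as low-confidence zeros, which is the key difference from explicit-rating factorization: every play count raises the confidence that the preference is real.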
NLP-Based Track Understanding
Spotify crawls the web for music blogs, reviews, and playlist descriptions, then applies NLP techniques inspired by Word2Vec to derive track embeddings from textual context. If two tracks frequently appear in similar textual contexts ('chill vibes for studying'), their embeddings converge. Playlist titles are especially valuable: a user-created playlist called 'Sad Rainy Day Songs' provides semantic labels that no listening data alone could capture. This approach solves the cold-start problem for new tracks โ even with zero listens, a track mentioned in music blogs gets meaningful embeddings.
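The co-occurrence idea can be shown with a count-based stand-in for Word2Vec: build a track co-occurrence matrix from playlists, weight it with positive PMI, and factor it with SVD to get dense embeddings. Playlists and track names here are invented for illustration:

```python
import numpy as np
from itertools import combinations

# Hypothetical user playlists: tracks sharing playlist context should embed nearby.
playlists = [
    ["rainy_song_a", "rainy_song_b", "sad_piano"],
    ["sad_piano", "rainy_song_a", "rainy_song_b"],
    ["rainy_song_b", "sad_piano", "rainy_song_a"],
    ["gym_anthem", "power_rock", "hype_track"],
    ["hype_track", "gym_anthem", "power_rock"],
]

vocab = sorted({t for p in playlists for t in p})
idx = {t: i for i, t in enumerate(vocab)}
co = np.zeros((len(vocab), len(vocab)))
for p in playlists:                         # count pairwise co-occurrences
    for a, b in combinations(set(p), 2):
        co[idx[a], idx[b]] += 1
        co[idx[b], idx[a]] += 1

# Positive PMI reweighting, then truncated SVD -> 2-d track embeddings.
total = co.sum()
row = co.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((co * total) / (row * row.T))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
u, s, _ = np.linalg.svd(ppmi)
emb = u[:, :2] * s[:2]

def cos(a, b):
    va, vb = emb[idx[a]], emb[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))
```

Tracks from the "rainy" playlists end up with high mutual cosine similarity and near-zero similarity to the "gym" cluster, even though no listening data was used, which is exactly how textual context sidesteps cold start.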
Audio CNN: Understanding the Sound Itself
For tracks with zero interaction data and no web presence (the extreme cold start), Spotify trains convolutional neural networks directly on raw audio spectrograms. The CNN learns to extract features like tempo, key, energy, instrumentalness, and genre from the audio waveform. These audio embeddings complement collaborative filtering: two tracks that sound similar are embedded nearby even if they've never been co-listened. The audio model uses mel-spectrograms as input and is trained to predict collaborative filtering embeddings as the target, aligning the audio and interaction spaces.
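A toy forward pass, assuming invented shapes and untrained weights, shows the data flow: a mel-spectrogram goes through convolution, ReLU, and global pooling, and a linear head maps the pooled features into the collaborative-filtering embedding space. Real models are far deeper and trained in a framework like TensorFlow or PyTorch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "mel-spectrogram": 32 mel bands x 100 time frames (values in [0, 1)).
spectrogram = rng.random((32, 100))

def conv2d(x, kernels):
    """Naive valid 2D convolution: (H, W) with (n, kh, kw) -> (n, H-kh+1, W-kw+1)."""
    n, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((n, H - kh + 1, W - kw + 1))
    for f in range(n):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[f, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[f])
    return out

filters = rng.normal(size=(4, 3, 3)) * 0.1          # untrained conv filters
fmap = np.maximum(conv2d(spectrogram, filters), 0)  # ReLU activation
pooled = fmap.mean(axis=(1, 2))                     # global average pool -> (4,)
W_head = rng.normal(size=(4, 8)) * 0.1              # linear head into "CF space"
audio_emb = pooled @ W_head                         # 8-d audio embedding
# Training (not shown) would minimize ||audio_emb - cf_emb||^2 so that a track
# with zero listens lands near sonically similar tracks in the CF space.
```

Predicting the CF embedding as the regression target is what aligns the two spaces: once trained, a brand-new track can be slotted into the same nearest-neighbor index used for interaction-based candidates.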
Discover Weekly: The Pipeline Behind the Playlist
Every Monday, Spotify generates a personalized 30-track Discover Weekly playlist for 100M+ users. The pipeline runs over the weekend: (1) Compute user and track embeddings via collaborative filtering on the latest interaction data, (2) For each user, find candidate tracks via approximate nearest-neighbor search in the embedding space, (3) Filter out tracks the user has already heard, (4) Apply diversity rules to avoid genre monotony, (5) Rank final candidates using a multi-objective model balancing predicted engagement with exploration. The entire pipeline processes petabytes of data using Spark on Google Cloud.
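Steps 2-4 of the pipeline can be sketched per user: retrieve nearest tracks in embedding space, drop already-heard tracks, and apply a simple genre cap for diversity. Everything here is synthetic, and the exact nearest-neighbor scan stands in for a real ANN index (Spotify open-sourced Annoy for this kind of lookup):

```python
import numpy as np

rng = np.random.default_rng(1)

n_tracks, dim = 500, 16
track_emb = rng.normal(size=(n_tracks, dim))
track_emb /= np.linalg.norm(track_emb, axis=1, keepdims=True)  # unit vectors
genres = rng.integers(0, 10, size=n_tracks)   # hypothetical genre label per track

def discover_weekly(user_emb, heard, per_genre_cap=8, k=30):
    """Retrieve (2), filter heard tracks (3), enforce genre diversity (4)."""
    sims = track_emb @ user_emb               # cosine similarity on unit vectors
    playlist, per_genre = [], {}
    for t in np.argsort(-sims):               # exact scan; production uses ANN
        t = int(t)
        if t in heard:                        # (3) drop already-heard tracks
            continue
        g = int(genres[t])
        if per_genre.get(g, 0) >= per_genre_cap:
            continue                          # (4) avoid genre monotony
        per_genre[g] = per_genre.get(g, 0) + 1
        playlist.append(t)
        if len(playlist) == k:
            break
    return playlist

user = rng.normal(size=dim)
user /= np.linalg.norm(user)
heard = set(range(50))                        # tracks this user already played
pl = discover_weekly(user, heard)
```

Running this per user is embarrassingly parallel, which is why the weekend batch window is feasible: each user's playlist needs only their embedding, their listening history, and the shared track index.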
A/B Testing and Bandit-Based Exploration
Spotify runs thousands of concurrent A/B experiments on recommendation algorithms. The key tension is exploitation vs exploration: recommending familiar-sounding tracks maximizes short-term engagement, but users who discover new favorites have higher long-term retention. Spotify uses multi-armed bandit approaches to balance this โ allocating a fraction of recommendation slots to exploratory tracks from underrepresented genres or new artists. Each experiment tracks both immediate signals (skip rate, save rate) and long-term metrics (30-day retention, subscription conversion) to avoid optimizing for shallow engagement.
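The exploitation-exploration tradeoff is easiest to see with an epsilon-greedy bandit, a deliberately simplified stand-in for Spotify's production approach. The arm names and "true" save rates below are invented:

```python
import random

random.seed(0)

# Hypothetical true save rates per recommendation strategy (unknown to the agent).
true_rates = {"familiar": 0.30, "new_artist": 0.12, "niche_genre": 0.18}
counts = {a: 0 for a in true_rates}
values = {a: 0.0 for a in true_rates}          # running mean observed save rate

def choose(epsilon=0.1):
    if random.random() < epsilon:              # exploration slot: random arm
        return random.choice(list(true_rates))
    return max(values, key=values.get)         # exploitation slot: best estimate

for _ in range(10_000):                        # simulate recommendation slots
    arm = choose()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
```

Most slots converge to the best-performing arm, but the epsilon fraction keeps gathering data on exploratory arms, which is what lets long-term metrics (retention, discovery) correct a purely greedy short-term policy.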
Architecture Diagram
Spotify Music Recommendation System: simplified architecture overview
Core Concepts
Collaborative Filtering
Matrix Factorization
Word2Vec
Audio CNNs
A/B Testing
Apache Spark
Tradeoffs & Design Decisions
Every architectural decision is a tradeoff. Here's what you gain and what you give up.
Strengths
- Three-signal ensemble (collaborative + NLP + audio) handles cold-start for new tracks and users
- Implicit feedback signals (plays, skips) are vastly more abundant than explicit ratings
- Approximate nearest-neighbor search enables real-time candidate retrieval from 100M+ tracks
- Multi-armed bandit exploration prevents filter bubble effects and promotes artist diversity
Weaknesses
- Matrix factorization on billions of interactions requires massive Spark compute infrastructure
- Audio CNN features can recommend sonically similar but contextually inappropriate tracks
- Discover Weekly pipeline must complete for 600M+ users every weekend, a tight operational window
- Popularity bias in collaborative filtering systematically disadvantages new and niche artists
FAANG Interview Questions
Interview Prep: These questions appear in FAANG system design rounds. Focus on tradeoffs, not just what the system does.
These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Study the architecture above before attempting.
- Q1. Design a music recommendation system for 600M users and 100M tracks. What signals would you use and how would you combine them?
- Q2. Explain the cold-start problem. A brand-new artist uploads their first track: how do you recommend it with zero listening data?
- Q3. How does collaborative filtering via matrix factorization work? What are the computational challenges at Spotify's scale?
- Q4. Design Discover Weekly: a personalized 30-track playlist generated weekly for 100M+ users. What's the end-to-end pipeline?
- Q5. How would you balance exploitation (recommending what users like) vs exploration (introducing new music) in a recommendation system?
Research Papers & Further Reading
Deep Learning for Audio-based Music Classification and Tagging
Nam, J. et al.