Spotify Music Recommendation System
Collaborative filtering, Discover Weekly, and the AudioEmbeddings pipeline
Key Insight
The best recommendation signal isn't user ratings; it's implicit feedback (plays, skips, saves) at massive scale.
Request Journey
How It Works
1. User plays a track
2. Stream event flows to Kafka
3. Offline pipeline runs BaRT collaborative filtering daily on 600M user histories
4. Candidate tracks are generated per user
5. Real-time ranking model scores candidates using fresh features (time of day, recent listens)
6. Top tracks are returned to Discover Weekly
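Step 5 above can be sketched as a small re-ranker that scores offline-generated candidates with fresh request-time context. The feature names and weights here are purely hypothetical, a minimal illustration of batch scores combined with real-time features:

```python
# Hypothetical weights: offline CF affinity plus fresh request-time features.
WEIGHTS = {"affinity": 1.0, "evening_boost": 0.3, "recent_artist": 0.5}

def score(candidate, ctx):
    """Score one candidate using the daily batch score plus fresh features."""
    s = WEIGHTS["affinity"] * candidate["cf_affinity"]         # from offline pipeline
    if ctx["hour"] >= 18:                                      # time-of-day feature
        s += WEIGHTS["evening_boost"] * candidate["chill_score"]
    if candidate["artist"] in ctx["recent_artists"]:           # recent-listen feature
        s += WEIGHTS["recent_artist"]
    return s

def rank(candidates, ctx, k=30):
    return sorted(candidates, key=lambda c: score(c, ctx), reverse=True)[:k]

candidates = [
    {"track": "A", "artist": "x", "cf_affinity": 0.8, "chill_score": 0.1},
    {"track": "B", "artist": "y", "cf_affinity": 0.6, "chill_score": 0.9},
]
# In the evening, with artist "y" recently played, track B outranks A
top = rank(candidates, {"hour": 21, "recent_artists": {"y"}})
```

The point of the split is latency: the expensive affinity computation happens offline, while the request path only applies cheap feature lookups and a sort.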
The Problem
Spotify must recommend music to 600M+ users from a catalog of 100M+ tracks, most of which any individual user has never heard. Traditional collaborative filtering struggles with the cold-start problem: new songs and new users have no interaction history. Users expect personalized discovery (Discover Weekly) that feels serendipitous yet relevant, not just a popular-hit echo chamber. The recommendation must blend multiple signals: listening history, playlist curation, social context, and even the raw audio itself.
The Solution
Spotify's recommendation engine combines three complementary approaches: collaborative filtering via matrix factorization (users who listen to X also like Y), NLP models analyzing playlist titles and music blog text to derive semantic track embeddings, and deep audio CNNs that extract features directly from raw audio spectrograms. These signals are combined in a multi-arm ensemble. Discover Weekly is generated via a massive offline pipeline: Spark-based collaborative filtering → neural embedding generation → approximate nearest-neighbor lookup → personalized playlist assembly.
Scale at a Glance
600M+
Monthly Active Users
100M+
Track Catalog
100M+/week
Discover Weekly Listeners
Billions
Daily Listening Events
Deep Dive
Collaborative Filtering via Matrix Factorization
Spotify models the user-track interaction matrix (billions of rows × 100M+ columns) using implicit matrix factorization, factoring the sparse matrix into user and track embedding vectors of ~128 dimensions. Similar users map to nearby points in embedding space, and a user's predicted affinity for an unheard track is the dot product of their embeddings. The model is trained on implicit feedback signals (plays, skips, saves, and playlist additions) weighted by engagement strength. This runs as a massive Apache Spark job processing billions of interaction events.
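A minimal sketch of implicit-feedback matrix factorization via alternating least squares (in the style of Hu, Koren, and Volinsky), on a toy play-count matrix. The data, dimensions, and hyperparameters are illustrative; production runs the same idea as a distributed Spark job:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy play-count matrix: 4 users x 5 tracks (0 = never played).
# Users 0-1 and users 2-3 form two taste clusters.
plays = np.array([
    [5, 3, 0, 0, 1],
    [4, 0, 0, 1, 0],
    [0, 0, 4, 5, 0],
    [0, 1, 5, 4, 0],
], dtype=float)

alpha, lam, k = 40.0, 0.1, 2               # confidence scale, L2 reg, latent dims
P = (plays > 0).astype(float)              # binary preference: heard or not
C = 1.0 + alpha * plays                    # confidence grows with engagement
U = rng.normal(scale=0.1, size=(4, k))     # user embedding vectors
V = rng.normal(scale=0.1, size=(5, k))     # track embedding vectors

for _ in range(15):                        # alternating least squares
    for u in range(4):                     # solve each user holding tracks fixed
        Cu = np.diag(C[u])
        U[u] = np.linalg.solve(V.T @ Cu @ V + lam * np.eye(k), V.T @ Cu @ P[u])
    for i in range(5):                     # solve each track holding users fixed
        Ci = np.diag(C[:, i])
        V[i] = np.linalg.solve(U.T @ Ci @ U + lam * np.eye(k), U.T @ Ci @ P[:, i])

pred = U @ V.T                             # predicted affinity = dot product
```

Unheard cells are not treated as missing but as low-confidence zeros, which is the key difference from explicit-rating factorization: every play count raises the confidence that the preference is real.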
NLP-Based Track Understanding
Spotify crawls the web for music blogs, reviews, and playlist descriptions, then applies NLP techniques inspired by Word2Vec to derive track embeddings from textual context. If two tracks frequently appear in similar textual contexts ('chill vibes for studying'), their embeddings converge. Playlist titles are especially valuable: a user-created playlist called 'Sad Rainy Day Songs' provides semantic labels that no listening data alone could capture. This approach solves the cold-start problem for new tracks โ even with zero listens, a track mentioned in music blogs gets meaningful embeddings.
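The co-occurrence idea can be shown with a count-based stand-in for Word2Vec: build a track co-occurrence matrix from playlists, weight it with positive PMI, and factor it with SVD to get dense embeddings. Playlists and track names here are invented for illustration:

```python
import numpy as np
from itertools import combinations

# Hypothetical user playlists: tracks sharing playlist context should embed nearby.
playlists = [
    ["rainy_song_a", "rainy_song_b", "sad_piano"],
    ["sad_piano", "rainy_song_a", "rainy_song_b"],
    ["rainy_song_b", "sad_piano", "rainy_song_a"],
    ["gym_anthem", "power_rock", "hype_track"],
    ["hype_track", "gym_anthem", "power_rock"],
]

vocab = sorted({t for p in playlists for t in p})
idx = {t: i for i, t in enumerate(vocab)}
co = np.zeros((len(vocab), len(vocab)))
for p in playlists:                         # count pairwise co-occurrences
    for a, b in combinations(set(p), 2):
        co[idx[a], idx[b]] += 1
        co[idx[b], idx[a]] += 1

# Positive PMI reweighting, then truncated SVD -> 2-d track embeddings.
total = co.sum()
row = co.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((co * total) / (row * row.T))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
u, s, _ = np.linalg.svd(ppmi)
emb = u[:, :2] * s[:2]

def cos(a, b):
    va, vb = emb[idx[a]], emb[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))
```

Tracks from the "rainy" playlists end up with high mutual cosine similarity and near-zero similarity to the "gym" cluster, even though no listening data was used, which is exactly how textual context sidesteps cold start.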
Audio CNN: Understanding the Sound Itself
For tracks with zero interaction data and no web presence (the extreme cold start), Spotify trains convolutional neural networks directly on raw audio spectrograms. The CNN learns to extract features like tempo, key, energy, instrumentalness, and genre from the audio waveform. These audio embeddings complement collaborative filtering: two tracks that sound similar are embedded nearby even if they've never been co-listened. The audio model uses mel-spectrograms as input and is trained to predict collaborative filtering embeddings as the target, aligning the audio and interaction spaces.
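A toy forward pass, assuming invented shapes and untrained weights, shows the data flow: a mel-spectrogram goes through convolution, ReLU, and global pooling, and a linear head maps the pooled features into the collaborative-filtering embedding space. Real models are far deeper and trained in a framework like TensorFlow or PyTorch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "mel-spectrogram": 32 mel bands x 100 time frames (values in [0, 1)).
spectrogram = rng.random((32, 100))

def conv2d(x, kernels):
    """Naive valid 2D convolution: (H, W) with (n, kh, kw) -> (n, H-kh+1, W-kw+1)."""
    n, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((n, H - kh + 1, W - kw + 1))
    for f in range(n):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[f, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[f])
    return out

filters = rng.normal(size=(4, 3, 3)) * 0.1          # untrained conv filters
fmap = np.maximum(conv2d(spectrogram, filters), 0)  # ReLU activation
pooled = fmap.mean(axis=(1, 2))                     # global average pool -> (4,)
W_head = rng.normal(size=(4, 8)) * 0.1              # linear head into "CF space"
audio_emb = pooled @ W_head                         # 8-d audio embedding
# Training (not shown) would minimize ||audio_emb - cf_emb||^2 so that a track
# with zero listens lands near sonically similar tracks in the CF space.
```

Predicting the CF embedding as the regression target is what aligns the two spaces: once trained, a brand-new track can be slotted into the same nearest-neighbor index used for interaction-based candidates.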
Discover Weekly: The Pipeline Behind the Playlist
Every Monday, Spotify generates a personalized 30-track Discover Weekly playlist for 100M+ users. The pipeline runs over the weekend: (1) Compute user and track embeddings via collaborative filtering on the latest interaction data, (2) For each user, find candidate tracks via approximate nearest-neighbor search in the embedding space, (3) Filter out tracks the user has already heard, (4) Apply diversity rules to avoid genre monotony, (5) Rank final candidates using a multi-objective model balancing predicted engagement with exploration. The entire pipeline processes petabytes of data using Spark on Google Cloud.
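Steps 2-4 of the pipeline can be sketched per user: retrieve nearest tracks in embedding space, drop already-heard tracks, and apply a simple genre cap for diversity. Everything here is synthetic, and the exact nearest-neighbor scan stands in for a real ANN index (Spotify open-sourced Annoy for this kind of lookup):

```python
import numpy as np

rng = np.random.default_rng(1)

n_tracks, dim = 500, 16
track_emb = rng.normal(size=(n_tracks, dim))
track_emb /= np.linalg.norm(track_emb, axis=1, keepdims=True)  # unit vectors
genres = rng.integers(0, 10, size=n_tracks)   # hypothetical genre label per track

def discover_weekly(user_emb, heard, per_genre_cap=8, k=30):
    """Retrieve (2), filter heard tracks (3), enforce genre diversity (4)."""
    sims = track_emb @ user_emb               # cosine similarity on unit vectors
    playlist, per_genre = [], {}
    for t in np.argsort(-sims):               # exact scan; production uses ANN
        t = int(t)
        if t in heard:                        # (3) drop already-heard tracks
            continue
        g = int(genres[t])
        if per_genre.get(g, 0) >= per_genre_cap:
            continue                          # (4) avoid genre monotony
        per_genre[g] = per_genre.get(g, 0) + 1
        playlist.append(t)
        if len(playlist) == k:
            break
    return playlist

user = rng.normal(size=dim)
user /= np.linalg.norm(user)
heard = set(range(50))                        # tracks this user already played
pl = discover_weekly(user, heard)
```

Running this per user is embarrassingly parallel, which is why the weekend batch window is feasible: each user's playlist needs only their embedding, their listening history, and the shared track index.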
A/B Testing and Bandit-Based Exploration
Spotify runs thousands of concurrent A/B experiments on recommendation algorithms. The key tension is exploitation vs exploration: recommending familiar-sounding tracks maximizes short-term engagement, but users who discover new favorites have higher long-term retention. Spotify uses multi-armed bandit approaches to balance this โ allocating a fraction of recommendation slots to exploratory tracks from underrepresented genres or new artists. Each experiment tracks both immediate signals (skip rate, save rate) and long-term metrics (30-day retention, subscription conversion) to avoid optimizing for shallow engagement.
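The exploitation-exploration tradeoff is easiest to see with an epsilon-greedy bandit, a deliberately simplified stand-in for Spotify's production approach. The arm names and "true" save rates below are invented:

```python
import random

random.seed(0)

# Hypothetical true save rates per recommendation strategy (unknown to the agent).
true_rates = {"familiar": 0.30, "new_artist": 0.12, "niche_genre": 0.18}
counts = {a: 0 for a in true_rates}
values = {a: 0.0 for a in true_rates}          # running mean observed save rate

def choose(epsilon=0.1):
    if random.random() < epsilon:              # exploration slot: random arm
        return random.choice(list(true_rates))
    return max(values, key=values.get)         # exploitation slot: best estimate

for _ in range(10_000):                        # simulate recommendation slots
    arm = choose()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
```

Most slots converge to the best-performing arm, but the epsilon fraction keeps gathering data on exploratory arms, which is what lets long-term metrics (retention, discovery) correct a purely greedy short-term policy.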
Architecture Diagram
Spotify Music Recommendation System: simplified architecture overview
Core Concepts
Collaborative Filtering
Matrix Factorization
Word2Vec
Audio CNNs
A/B Testing
Apache Spark
Tradeoffs & Design Decisions
Every architectural decision is a tradeoff. Here's what you gain and what you give up.
Strengths
- Three-signal ensemble (collaborative + NLP + audio) handles cold-start for new tracks and users
- Implicit feedback signals (plays, skips) are vastly more abundant than explicit ratings
- Approximate nearest-neighbor search enables real-time candidate retrieval from 100M+ tracks
- Multi-armed bandit exploration prevents filter bubble effects and promotes artist diversity
Weaknesses
- Matrix factorization on billions of interactions requires massive Spark compute infrastructure
- Audio CNN features can recommend sonically similar but contextually inappropriate tracks
- Discover Weekly pipeline must complete for 600M+ users every weekend, a tight operational window
- Popularity bias in collaborative filtering systematically disadvantages new and niche artists
FAANG Interview Questions
Interview Prep: These questions appear in FAANG system design rounds. Focus on tradeoffs, not just what the system does.
These are real system design interview questions asked at Google, Meta, Amazon, Apple, Netflix, and Microsoft. Study the architecture above before attempting.
- Q1. Design a music recommendation system for 600M users and 100M tracks. What signals would you use and how would you combine them?
- Q2. Explain the cold-start problem. A brand-new artist uploads their first track: how do you recommend it with zero listening data?
- Q3. How does collaborative filtering via matrix factorization work? What are the computational challenges at Spotify's scale?
- Q4. Design Discover Weekly: a personalized 30-track playlist generated weekly for 100M+ users. What's the end-to-end pipeline?
- Q5. How would you balance exploitation (recommending what users like) vs exploration (introducing new music) in a recommendation system?
Research Papers & Further Reading
Deep Learning for Audio-based Music Classification and Tagging
Nam, J. et al.