Recommender Systems – Content Filtering vs. Collaborative Filtering
Problem Context and Motivation
• The 2006 Netflix Prize framed a now-ubiquitous task: predict which items a user will like based on past behaviour.
• The same task matters on every major e-commerce and content platform (movies, music, news, shopping) and even in policy analysis.
• Graph metaphor: people on one side, items on the other; line thickness = strength of preference.
• Red, missing connections = unknown future preferences to estimate.
• Perfect prediction is impossible because:
We forecast future behaviour from past data.
Human taste is dynamic — it drifts over time.
Matrix Representation of Preferences
• Organise data into a matrix $R$:
• Rows = $m$ users $u$, columns = $n$ items $i$ (movies).
• Cell $r_{ui}$ holds a rating on an agreed scale running from "hate" through "neutral" to "love".
• Reality: $R$ is sparse (most users have rated only a few items).
• Core computational goal: infer the missing ratings $\hat r_{ui}$.
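A minimal Python sketch of the matrix view, assuming a tiny hypothetical rating table (the users, movies and rating values below are invented for illustration): unknown ratings are marked `None`, and the helper `missing_cells` lists exactly the cells a recommender would have to infer.

```python
# Hypothetical toy ratings (values assumed for illustration only):
# rows = users, columns = movies; None marks an unknown rating to infer.
USERS = ["Alice", "Bob", "Carol"]
MOVIES = ["Movie A", "Movie B", "Movie C", "Movie D"]
R = [
    [1.0, None, 0.0, None],  # Alice
    [None, 0.5, None, 1.0],  # Bob
    [0.0, None, None, 0.5],  # Carol
]


def missing_cells(matrix):
    """Return the (user, item) index pairs whose rating is unknown."""
    return [
        (u, i)
        for u, row in enumerate(matrix)
        for i, rating in enumerate(row)
        if rating is None
    ]


if __name__ == "__main__":
    for u, i in missing_cells(R):
        print(f"infer {USERS[u]}'s rating of {MOVIES[i]}")
```

Half of the twelve cells in this toy table are unknown; real rating matrices are far emptier than that.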
Content-Based (Feature) Filtering
• Idea: "Tell me what the item is like and what the user likes; I’ll compute the match".
• Steps
Define explicit features (genres, actors, mood…).
Encode each user as a feature vector (e.g. Alice = (3, 0): comedy 3, action 0).
Encode each item as a feature vector over the same features (e.g. The Matrix).
Compute predicted preference via dot product / matrix multiplication.
• For Alice and The Matrix, the dot product comes out at the low end of the scale → "She’ll hate it".
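The dot-product step can be sketched in a few lines of Python. Alice's vector (comedy 3, action 0) comes from the notes; the feature values for The Matrix and for a contrasting comedy film are hypothetical, since the original numbers are not given.

```python
# Feature order: (comedy, action).
ALICE = (3, 0)        # from the notes: comedy 3, action 0
THE_MATRIX = (0, 2)   # hypothetical: the original item vector is not given
COMEDY_FILM = (2, 1)  # hypothetical item, added for contrast


def dot(user, item):
    """Raw predicted preference: sum over features of user weight x item weight."""
    return sum(u * i for u, i in zip(user, item))


if __name__ == "__main__":
    print("Alice vs The Matrix:", dot(ALICE, THE_MATRIX))
    print("Alice vs comedy film:", dot(ALICE, COMEDY_FILM))
```

With these assumed values Alice scores 0 for The Matrix and 6 for the comedy film, which matches the qualitative conclusion that she would hate The Matrix.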
• Matrix view:
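• User-Feature matrix $A$ (size $m \times f$: users × features).
• Feature-Item matrix $B$ (size $f \times n$: features × items).
• Predicted ratings $\hat R = AB$.
• Optional scale correction: dividing each raw score by 8 and rounding to the nearest 0.5 keeps predictions on the rating scale.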
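The matrix view and the optional scale correction can be sketched the same way, under stated assumptions: only Alice's row is taken from the notes, Bob's row and both item columns are hypothetical, and `round_to_half` is one possible way to round to the nearest 0.5 (Python's built-in `round` would instead send ties to the even neighbour).

```python
import math

# User-Feature matrix: rows = users, columns = (comedy, action).
# Only Alice's row comes from the notes; everything else is hypothetical.
USER_FEATURE = [
    [3, 0],  # Alice
    [1, 2],  # Bob (hypothetical)
]
# Feature-Item matrix: rows = (comedy, action), columns = (The Matrix, a comedy film).
FEATURE_ITEM = [
    [0, 2],  # comedy weight of each item (hypothetical)
    [2, 1],  # action weight of each item (hypothetical)
]


def matmul(a, b):
    """Plain-Python product of an (m x f) and an (f x n) matrix."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def round_to_half(x):
    """Round to the nearest 0.5; ties (x.25, x.75) round upward."""
    return math.floor(x * 2 + 0.5) / 2


def predict(user_feature, feature_item, divisor=8):
    """Raw scores = user_feature x feature_item, then divide and round to 0.5."""
    raw = matmul(user_feature, feature_item)
    return [[round_to_half(score / divisor) for score in row] for row in raw]


if __name__ == "__main__":
    print(matmul(USER_FEATURE, FEATURE_ITEM))   # raw dot products
    print(predict(USER_FEATURE, FEATURE_ITEM))  # scaled and rounded
```

Computing every user-item score at once as a single matrix product is what makes the matrix view convenient: one multiplication fills in the whole predicted rating table.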
• Limitations
• Needs many accurate, human-supplied features → heavy onboarding friction.
• Users often can’t articulate their tastes; some factors are subconscious.
• Model oversimplifies nuanced preference patterns → mediocre accuracy.
Collaborative Filtering (Latent Factor Model)
• Philosophy: "People similar to you liked X; therefore you might like X".
• Reference milestone: 2009 paper by Koren, Bell & Volinsky.
• Key twist: learn the features directly from rating patterns, not from questionnaires.
• Procedure
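Start with the sparse rating matrix $R$.
Factorise $R \approx UV$, where:
• $U$ = User-LatentFeature matrix ($m \times k$).
• $V$ = LatentFeature-Item matrix ($k \times n$).
• $k$ (typically 10–100) ≪ $m, n$.
Use a machine-learning optimiser (e.g. stochastic gradient descent, alternating least squares) to minimise the regularised squared error over the observed ratings:
$$\min_{U,V} \sum_{(u,i)\ \text{observed}} \left(r_{ui} - \mathbf{u}_u \cdot \mathbf{v}_i\right)^2 + \lambda\left(\lVert U \rVert_F^2 + \lVert V \rVert_F^2\right)$$
where $\mathbf{u}_u$ is row $u$ of $U$, $\mathbf{v}_i$ is column $i$ of $V$, and $\lambda$ sets the strength of regularisation.
Resulting vectors capture latent factors (taste dimensions we cannot directly name).
Predict missing entries with $\hat r_{ui} = \mathbf{u}_u \cdot \mathbf{v}_i$ (i.e. $\hat R = UV$), filling in the whole matrix.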
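A minimal sketch of the factorisation procedure, using plain-Python stochastic gradient descent (one of the two optimisers named in the procedure; alternating least squares would be an alternative). The toy ratings, the assumed 0 (hate) to 1 (love) scale, and the hyperparameters `k`, `lr`, `reg` and `epochs` are hypothetical choices for illustration, not values from the notes. Each update nudges one user row and one item column to reduce that rating's squared error, with `reg` acting as the regularisation weight.

```python
import random


def predict(U, V, u, i):
    """Predicted rating: (row u of U) . (column i of V)."""
    return sum(U[u][f] * V[f][i] for f in range(len(V)))


def factorise(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Learn U (n_users x k) and V (k x n_items) by SGD on the regularised
    squared error of the observed (user, item, rating) triples only."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(n_items)] for _ in range(k)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[f][i]
                # Gradient step on (r - u.v)^2 + reg * (|u|^2 + |v|^2).
                U[u][f] += lr * (err * vf - reg * uf)
                V[f][i] += lr * (err * uf - reg * vf)
    return U, V


def rmse(U, V, ratings):
    """Root-mean-square error over the given (user, item, rating) triples."""
    sq = [(r - predict(U, V, u, i)) ** 2 for u, i, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5


# Hypothetical toy data: users 0-1 like items 0-1, users 2-3 like items 2-3.
# Five cells are unobserved: (0,3), (1,1), (2,1), (3,0), (3,3).
RATINGS = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 2, 0.0), (1, 3, 0.0),
    (2, 0, 0.0), (2, 2, 1.0), (2, 3, 1.0),
    (3, 1, 0.0), (3, 2, 1.0),
]

if __name__ == "__main__":
    U, V = factorise(RATINGS, n_users=4, n_items=4)
    print(f"training RMSE: {rmse(U, V, RATINGS):.3f}")
    for u, i in [(0, 3), (1, 1), (2, 1), (3, 0), (3, 3)]:
        print(f"user {u}, item {i}: predicted {predict(U, V, u, i):.2f}")
```

On this toy data the two taste groups should surface in the unknown cells: user 1 is expected to score item 1 high and user 0 to score item 3 low, although the exact numbers depend on the seed and hyperparameters.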
• Interpretation of latent factors
• May correlate with genre, pacing, era, "cult classic" vibe, etc., but labels remain unknown → emergent structure.
• Why accuracy improves
• Factors arise from actual co-rating behaviour, capturing subtle, non-obvious affinities.
• Connection to compression
• The full rating matrix (millions of users × thousands of items) is stored as two much smaller factor matrices (user-factor and factor-item).
• Possible because the rating matrix carries pattern, not random noise.
• If ratings were random, low-rank factorisation would incur massive reconstruction error.
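The storage saving is simple arithmetic: the full matrix needs one number per user-item pair, while the two factors need $k$ numbers per user plus $k$ per item. The sketch below uses hypothetical sizes (one million users, ten thousand items, $k = 50$) chosen only to match the "millions × thousands" scale in the notes.

```python
def dense_size(n_users, n_items):
    """Numbers needed to store every cell of the full rating matrix."""
    return n_users * n_items


def factorised_size(n_users, n_items, k):
    """Numbers needed for U (n_users x k) plus V (k x n_items)."""
    return n_users * k + k * n_items


if __name__ == "__main__":
    # Hypothetical sizes: a million users, ten thousand items, 50 latent factors.
    m, n, k = 1_000_000, 10_000, 50
    full, factors = dense_size(m, n), factorised_size(m, n, k)
    print(f"full matrix: {full:,} numbers")
    print(f"two factors: {factors:,} numbers (~{full / factors:.0f}x smaller)")
```

With these assumed sizes the factors hold 50,500,000 numbers instead of 10,000,000,000, roughly 198 times fewer. The saving is only worthwhile when the low-rank product actually reconstructs the ratings, which is why random data defeats it.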
Broader Applications & Synthetic Control
• Same math used in policy evaluation ("synthetic control"):
• To gauge effect of gun control or minimum-wage hike in City A, blend data from Cities B, C, D with similar latent characteristics → counterfactual outcome.
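A toy sketch of the synthetic-control idea, under heavy simplifying assumptions: every number below is invented, only three donor cities are supported, and the weights come from a coarse grid search rather than the constrained optimisation (often also matching on covariates) used in real synthetic-control studies. The weights are fitted on pre-policy periods only, then applied to the donors' post-policy data to estimate what City A would have looked like without the policy.

```python
# Hypothetical outcome series: 4 periods before and 2 after City A's policy.
A_PRE = [15, 16, 18, 19]
DONORS_PRE = [[10, 12, 14, 16], [20, 20, 22, 22], [5, 9, 3, 7]]  # Cities B, C, D
A_POST = [19, 19]
DONORS_POST = [[18, 20], [24, 24], [8, 6]]


def blend(weights, series):
    """Weighted combination of donor series, period by period."""
    return [sum(w * s[t] for w, s in zip(weights, series)) for t in range(len(series[0]))]


def fit_weights(target_pre, donors_pre, steps=20):
    """Grid-search weights for three donors (non-negative, summing to 1)
    that minimise squared error against the target before the policy."""
    best_err, best_w = None, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            fit = blend(w, donors_pre)
            err = sum((a - b) ** 2 for a, b in zip(target_pre, fit))
            if best_err is None or err < best_err:
                best_err, best_w = err, w
    return best_w


if __name__ == "__main__":
    w = fit_weights(A_PRE, DONORS_PRE)
    counterfactual = blend(w, DONORS_POST)
    effect = [a - c for a, c in zip(A_POST, counterfactual)]
    print("weights (B, C, D):", w)
    print("synthetic City A after policy:", counterfactual)
    print("estimated effect:", effect)
```

Here City A's pre-policy series is exactly an even blend of Cities B and C, so the search recovers weights (0.5, 0.5, 0.0), and the gap between actual and synthetic outcomes after the policy is the estimated effect.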
• Any domain where users interact with items: e-commerce, social feeds, ads, playlists, academic paper suggestions.
Ethical, Philosophical & Practical Considerations
• Burden-of-input trade-off: content filtering asks intrusive questions, collaborative filtering offloads work to the algorithm.
• Taste manipulation vs. reflection: recommender may shape future preferences, blurring prediction and influence.
• Dynamic preference drift: models must retrain to track temporal changes.
• Privacy: latent factors are learned from many individuals → risk of de-anonymisation or sensitive attribute leakage.
Key Numerical & Formula Recap
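• Rating scale: "hate" / "neutral" / "love" (after normalisation, predictions are rounded to the nearest 0.5).
• Dot-product example: Alice (comedy 3, action 0) · The Matrix → "hate".
• Normalisation step for the two-feature demo: divide the raw score by 8, then round to the nearest 0.5.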
• Optimisation objective (regularised MSE) shown above.
Connections to Prior Knowledge
• Matrix factorisation parallels Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) — both uncover low-dimensional structure.
• Compression analogy echoes Fourier or JPEG transforms: leverage redundancy to store information efficiently.
• Links to clustering: latent factors implicitly cluster users/items in feature space.
Summary Bullets
• Recommender problem = predict unknown entries in a user-item rating matrix.
• Content filtering relies on explicit, human-defined features → simple but limited.
• Collaborative filtering discovers latent features via matrix factorisation → higher accuracy, less user burden.
• Latent factors provide both predictive power and a compressed representation.
• Technique generalises from Netflix movies to music, shopping, policy forecasting, etc.
• Success hinges on shared patterns; purely random data defeats the approach.
Problem Context and Motivation
The 2006 Netflix Prize fundamentally shaped the now-widespread challenge of predicting user preferences for items from historical behaviour. The task matters to every major e-commerce and content-streaming platform, across movies, music, news and retail, and extends even to applications such as policy analysis.
This problem can be visualised using a graph metaphor:
Users are on one side, and items (e.g., movies, products) are on the other.
Connections (edges) between users and items represent known preferences (e.g., a rating given by a user to a movie).
The thickness of these lines can denote the strength or intensity of the preference (e.g., a higher rating).
Red or missing connections signify unknown future preferences that the system aims to estimate.
Achieving perfect prediction is inherently impossible for several reasons:
We are forecasting future behaviour from past data alone, which is inherently uncertain.
Human taste is dynamic and fluid; it naturally drifts and evolves over time due to new experiences, cultural shifts, or personal development.
Data sparsity is a significant challenge: users typically interact with and rate only a tiny fraction of the available items.
Matrix Representation of Preferences
User-item interaction data is conventionally organised into a matrix, denoted $R$:
Rows of the matrix represent individual users ($u$).
Columns represent specific items ($i$), such as movies.
Each cell $r_{ui}$ contains the known rating given by user $u$ to item $i$, typically on an agreed-upon scale running from "hate" through "neutral" to "love".
In reality, the matrix is highly sparse, meaning that the vast majority of its cells are empty (most users have rated only a small subset of the available items). This sparsity is a fundamental challenge.
The core computational objective of recommender systems is to infer these missing ratings ($\hat r_{ui}$) in order to provide personalised suggestions.
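Because most cells are empty, practical systems usually store only the observed ratings. A minimal Python sketch, assuming a hypothetical catalogue of 1,000 users and 500 items in which each user has rated 10 items (sizes, item choices and rating values are all invented for illustration), keys ratings by `(user, item)` and measures how sparse the matrix is.

```python
# Hypothetical catalogue sizes, chosen only to illustrate sparsity.
N_USERS, N_ITEMS, PER_USER = 1_000, 500, 10


def build_sparse_ratings(n_users, n_items, per_user):
    """Store only observed ratings, keyed by (user, item); absent keys are unknown."""
    ratings = {}
    for u in range(n_users):
        for j in range(per_user):
            item = (u * 7 + j) % n_items          # deterministic toy choice of items
            ratings[(u, item)] = (u + j) % 3 / 2  # toy rating in {0.0, 0.5, 1.0}
    return ratings


def density(ratings, n_users, n_items):
    """Fraction of user-item cells that hold a known rating."""
    return len(ratings) / (n_users * n_items)


if __name__ == "__main__":
    R = build_sparse_ratings(N_USERS, N_ITEMS, PER_USER)
    print(f"stored ratings: {len(R)} of {N_USERS * N_ITEMS} cells")
    print(f"density: {density(R, N_USERS, N_ITEMS):.1%}")
```

Under these assumptions only 10,000 of 500,000 cells are filled, a density of 2%, and any cell missing from the dictionary is one the system must estimate.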
Content-Based (Feature) Filtering
This approach is founded on the principle: "Tell me what the item is like and what the user likes; I’ll compute the match."