Recommender Systems – Content Filtering vs. Collaborative Filtering

Problem Context and Motivation

• 2006 Netflix Prize framed a now-ubiquitous task: predict which items a user will like based on past behaviour.
• Comparable stakes across all major e-commerce and content platforms (movies, music, news, shopping, policy analysis).
• Graph metaphor: people on one side, items on the other; line thickness = strength of preference.
• Red, missing connections = unknown future preferences to estimate.
• Perfect prediction is impossible because:

1. We forecast future behaviour from past data.
2. Human taste is dynamic; it drifts over time.

Matrix Representation of Preferences

• Organise data into a matrix $R$ :
• Rows = users $u$ , columns = items $i$ (movies).
• Cell $R_{ui}$ holds a rating on an agreed scale (e.g. $0\to4$ where $0$ "hate", $2$ "neutral", $4$ "love").
• Reality: $R$ is sparse (most users have rated only a few items).
• Core computational goal: infer the missing ratings ( $R_{ui}=?$ ).
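As a minimal sketch of this representation, the snippet below stores only the known ratings and treats every other cell as missing. All user names, item titles and rating values are made up for illustration (only Alice and Matrix appear in these notes), and the helper names `to_dense` and `sparsity` are hypothetical.

```python
# Toy rating matrix stored sparsely: only known (user, item) pairs are kept.
# Every name and rating here is invented for illustration.
users = ["Alice", "Bob", "Carol"]
items = ["Matrix", "Amelie", "Alien", "Up"]

ratings = {
    ("Alice", "Amelie"): 4.0,
    ("Alice", "Up"): 3.5,
    ("Bob", "Matrix"): 4.0,
    ("Bob", "Alien"): 3.0,
    ("Carol", "Up"): 2.0,
}


def to_dense(ratings, users, items):
    """Return R as a list of rows, with None marking an unknown rating."""
    return [[ratings.get((u, i)) for i in items] for u in users]


def sparsity(ratings, users, items):
    """Fraction of cells in R that hold no rating."""
    return 1.0 - len(ratings) / (len(users) * len(items))


R = to_dense(ratings, users, items)

# The cells a recommender would have to estimate.
missing = [(u, i) for u in users for i in items if (u, i) not in ratings]
```

Keeping a dictionary of known cells rather than a full grid is one reasonable choice when most of $R$ is empty; the dense view is built only for display.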

Content-Based (Feature) Filtering

• Idea: "Tell me what the item is like and what the user likes; I’ll compute the match".
• Steps

1. Define explicit features (genres, actors, mood…).
2. Encode each user as a feature vector $P_{u*}$ (e.g. Alice = $[3,0]$ for comedy 3, action 0).
3. Encode each item as a feature vector $Q_{*i}$ (e.g. the movie Matrix = $[0,4]$ for comedy 0, action 4).
4. Compute predicted preference via dot product / matrix multiplication.
• For Alice–Matrix: $3\times0 + 0\times4 = 0$ → "She’ll hate it".
• Matrix view:
• User-Feature matrix $P$ (size $U\times K$ ).
• Feature-Item matrix $Q$ (size $K\times I$ ).
• Predicted ratings $\hat R = P\;Q$ .
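• Optional scale correction: assuming two features each scored $0$ to $4$, the raw product is at most $2\times4\times4=32$, so dividing by $8$ and rounding to the nearest $0.5$ keeps $\hat R\in[0,4]$ .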
• Limitations
• Needs many accurate, human-supplied features → heavy onboarding friction.
• Users often can’t articulate their tastes; some factors are subconscious.
• Model oversimplifies nuanced preference patterns → mediocre accuracy.
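The content-based steps above can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: the feature order (comedy, action) and the 0 to 4 feature range are inferred from the Alice and Matrix example, while the user Bob and the item Amelie are hypothetical additions.

```python
# Feature order is (comedy, action); each score is assumed to lie in 0..4.
user_features = {"Alice": [3, 0], "Bob": [1, 4]}      # rows of P (Bob is hypothetical)
item_features = {"Matrix": [0, 4], "Amelie": [4, 0]}  # columns of Q (Amelie is hypothetical)


def raw_score(p_u, q_i):
    """Dot product of one user vector with one item vector."""
    return sum(p * q for p, q in zip(p_u, q_i))


def predict(p_u, q_i, divisor=8):
    """Scale the raw score into 0..4 and round to the nearest 0.5.

    With two features scored 0..4 the raw score is at most 32, hence
    the divisor of 8. Python's round() sends exact ties to the nearest
    even integer, which only matters for scores that land on a tie.
    """
    return round(raw_score(p_u, q_i) / divisor * 2) / 2


# Full prediction table, i.e. the scaled entries of P Q.
R_hat = {
    (u, i): predict(p, q)
    for u, p in user_features.items()
    for i, q in item_features.items()
}
```

For Alice and Matrix the raw score is $3\times0+0\times4=0$, so the scaled prediction is also the lowest possible rating.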

Collaborative Filtering (Latent Factor Model)

• Philosophy: "People similar to you liked X; therefore you might like X".
• Reference milestone: 2009 paper by Koren, Bell & Volinsky.
• Key twist: learn the features directly from rating patterns, not from questionnaires.
• Procedure

1. Start with sparse $R$ .
2. Factorise $R \approx P\,Q$ where:
• $P$ = User-LatentFeature matrix ( $U\times K$ ).
• $Q$ = LatentFeature-Item matrix ( $K\times I$ ).
• $K \ll \min(U,I)$ (typically 10–100).
3. Use a machine-learning optimiser (e.g. stochastic gradient descent, alternating least squares) to minimise the regularised squared error over the known ratings:
$\min_{P,Q} \sum_{(u,i)\,\in\,\text{known}} (R_{ui}-P_u Q_i)^2 + \lambda (\|P_u\|^2+\|Q_i\|^2)$ , where $P_u$ is row $u$ of $P$ and $Q_i$ is column $i$ of $Q$ .
4. Resulting vectors capture latent factors (taste dimensions we cannot directly name).
5. Predict missing entries with $\hat R = P\,Q$ ; fill in the whole matrix.
• Interpretation of latent factors
• May correlate with genre, pacing, era, "cult classic" vibe, etc., but labels remain unknown → emergent structure.
• Why accuracy improves
• Factors arise from actual co-rating behaviour, capturing subtle, non-obvious affinities.
• Connection to compression
• Full $R$ (millions×thousands) is approximated by two much smaller matrices: $K(U+I)$ stored numbers instead of $U\times I$ .
• Possible because $R$ carries pattern, not random noise.
• If ratings were random, low-rank factorisation would incur massive reconstruction error.
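The factorisation procedure in this section can be sketched with plain stochastic gradient descent. Everything below is an assumption made for illustration: the function names, the hyperparameters (`k`, `lr`, `lam`, `epochs`), the random initialisation and the toy ratings are not taken from the notes, and the regulariser is applied once per observed rating, which is one common way to read the objective.

```python
import random


def factorise(ratings, n_users, n_items, k=2, lr=0.02, lam=0.05, epochs=500, seed=0):
    """Fit R ~ P Q by SGD using only the known ratings.

    ratings: list of (u, i, r) triples with integer indices.
    Returns P (n_users x k) and Q (k x n_items) as nested lists.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(n_items)] for _ in range(k)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)
        for u, i, r in samples:
            err = r - sum(P[u][f] * Q[f][i] for f in range(k))
            for f in range(k):
                p, q = P[u][f], Q[f][i]
                # Gradient step on the squared error plus the L2 penalty
                # (the constant factor 2 is absorbed into lr).
                P[u][f] += lr * (err * q - lam * p)
                Q[f][i] += lr * (err * p - lam * q)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: row u of P times column i of Q."""
    return sum(P[u][f] * Q[f][i] for f in range(len(Q)))


def rmse(ratings, P, Q):
    """Root-mean-square error over a list of (u, i, r) triples."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# Toy data (made up): users 0-1 like items 0-1, users 2-3 like items 2-3.
truth = [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]]
held_out = {(0, 1), (3, 0)}  # cells hidden from training
known = [(u, i, truth[u][i]) for u in range(4) for i in range(4) if (u, i) not in held_out]

P0, Q0 = factorise(known, 4, 4, epochs=0)  # untrained starting point
P, Q = factorise(known, 4, 4)              # trained factors
```

Comparing `rmse(known, P0, Q0)` with `rmse(known, P, Q)` shows how far training reduces the error on known cells, and `predict(P, Q, 0, 1)` and `predict(P, Q, 3, 0)` give estimates for the two hidden cells. No feature labels are supplied anywhere; the two factors are learned purely from the rating pattern.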

Broader Applications & Synthetic Control

• Closely related math is used in policy evaluation ("synthetic control"):
• To gauge effect of gun control or minimum-wage hike in City A, blend data from Cities B, C, D with similar latent characteristics → counterfactual outcome.
• Any domain where users interact with items: e-commerce, social feeds, ads, playlists, academic paper suggestions.

Ethical, Philosophical & Practical Considerations

• Burden-of-input trade-off: content filtering asks intrusive questions, collaborative filtering offloads work to the algorithm.
• Taste manipulation vs. reflection: recommender may shape future preferences, blurring prediction and influence.
• Dynamic preference drift: models must retrain to track temporal changes.
• Privacy: latent factors are learned from many individuals → risk of de-anonymisation or sensitive attribute leakage.

Key Numerical & Formula Recap

• Rating scale: $0,0.5,1,1.5,\dots,4$ (after normalisation).
• Dot-product example: $\text{score}_{\text{user},\text{item}}=\sum_k P_{uk}Q_{ki}$ .
• Normalisation step for two-feature demo: divide raw $P\,Q$ by $8$ then round.
• Optimisation objective (regularised squared error) shown above.

Connections to Prior Knowledge

• Matrix factorisation parallels Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) — both uncover low-dimensional structure.
• Compression analogy echoes Fourier or JPEG transforms: leverage redundancy to store information efficiently.
• Links to clustering: latent factors implicitly cluster users/items in feature space.
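The compression point can be checked numerically on a small scale. The sketch below is a deliberately simplified stand-in for SVD: it fits only a rank-1 approximation ($K=1$) by alternating least squares, and compares a matrix with shared structure against one filled with independent noise. The matrix sizes, the value ranges and the names `rank1_fit` and `relative_error` are assumptions chosen for illustration.

```python
import random


def rank1_fit(A, iters=50):
    """Approximate A by an outer product u v^T using alternating least squares."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        vv = sum(x * x for x in v)
        u = [sum(A[r][c] * v[c] for c in range(n)) / vv for r in range(m)]
        uu = sum(x * x for x in u)
        v = [sum(A[r][c] * u[r] for r in range(m)) / uu for c in range(n)]
    return u, v


def relative_error(A, u, v):
    """Frobenius norm of A - u v^T divided by the Frobenius norm of A."""
    num = sum((A[r][c] - u[r] * v[c]) ** 2 for r in range(len(A)) for c in range(len(v)))
    den = sum(x * x for row in A for x in row)
    return (num / den) ** 0.5


rng = random.Random(0)
m, n = 30, 20

# Structured "ratings": each entry is (user enthusiasm) x (item appeal).
a = [rng.uniform(0.5, 2.0) for _ in range(m)]
b = [rng.uniform(0.5, 2.0) for _ in range(n)]
structured = [[a[r] * b[c] for c in range(n)] for r in range(m)]

# Unstructured "ratings": independent noise centred on zero.
noise = [[rng.uniform(-2.0, 2.0) for _ in range(n)] for _ in range(m)]

err_structured = relative_error(structured, *rank1_fit(structured))
err_noise = relative_error(noise, *rank1_fit(noise))

# Numbers stored: full matrix versus the two rank-1 factors.
stored_full, stored_factors = m * n, m + n
```

The structured matrix is reproduced almost exactly from 50 stored numbers instead of 600, while the noise matrix keeps most of its error under the same budget, which is the sense in which factorisation only pays off when the data carries pattern.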

Summary Bullets

• Recommender problem = predict unknown entries in a user-item rating matrix.
• Content filtering relies on explicit, human-defined features → simple but limited.
• Collaborative filtering discovers latent features via matrix factorisation → higher accuracy, less user burden.
• Latent factors provide both predictive power and a compressed representation.
• Technique generalises from Netflix movies to music, shopping, policy forecasting, etc.
• Success hinges on shared patterns; purely random data defeats the approach.

Problem Context and Motivation

The 2006 Netflix Prize fundamentally shaped the now-widespread challenge of predicting user preferences for items based on their historical behavior. This task is crucial for all major e-commerce and content streaming platforms, spanning domains like movies, music, news, retail, and even sophisticated applications such as policy analysis.
This problem can be visualised using a graph metaphor:
- Users are on one side, and items (e.g., movies, products) are on the other.
- Connections (edges) between users and items represent known preferences (e.g., a rating given by a user to a movie).
- The thickness of these lines can denote the strength or intensity of the preference (e.g., a higher rating).
- Red or missing connections signify unknown future preferences that the system aims to estimate.
Achieving perfect prediction is inherently impossible for several reasons:
1. We are forecasting future behavior using only past data, which is an inherently uncertain endeavor.
2. Human taste is dynamic and fluid; it naturally drifts and evolves over time due to new experiences, cultural shifts, or personal development.
3. Data sparsity is a significant challenge: users typically interact with and rate only a tiny fraction of the available items.

Matrix Representation of Preferences

User-item interaction data is conventionally organised into a matrix, denoted as $R$ :
- Rows of the matrix represent individual users ( $u$ ).
- Columns represent specific items ( $i$ ), such as movies.
- Each cell $R_{ui}$ contains a known rating provided by user $u$ for item $i$ , typically on an agreed-upon scale (e.g., $0\to4$ , where $0$ signifies "hate," $2$ "neutral," and $4$ "love").
In reality, the matrix $R$ is highly sparse, meaning that the vast majority of its cells are empty (most users have rated only a small subset of the available items). This sparsity is a fundamental challenge.
The core computational objective of recommender systems is to infer or estimate these missing ratings ( $R_{ui}=?$ ) to provide personalised suggestions.

Content-Based (Feature) Filtering

This approach is founded on the principle: *
