Recommendation Systems - Key Concepts

Value of Recommendation Systems

Recommendation systems increase sales, click-through rates (CTRs), conversions, customer satisfaction, and retention.
YouTube's recommendation algorithm significantly boosts viewership.

Anatomy of the Long Tail

Online services offer more inventory than traditional retailers.
Recommendation systems help manage the abundance of choices.

Scarcity versus Abundance

Recommendation systems help discover new products, increasing diversity.
Some algorithms may create a rich-get-richer effect for popular products.
Collaborative filtering algorithms promote diversity by recommending novel items.

Two Types of Recommendation Systems

Content-based filtering: Recommends items similar to those a user likes using item similarity/clustering.
Collaborative filtering: Uses user similarity to make recommendations based on the links between users and the items they chose.

Many companies employ hybrid systems using both techniques.

Basic Flowchart of a Recommender System

Gather user behavior and item ratings data.
Extract item attributes.
Design recommendation model.
Run scheduled updates of recommendation engine models.
Serve recommendations to users.
Evaluate recommendation performance.

Algorithm 0

Recommend the most popular restaurants based on positive votes minus negative votes.
Ignores individual culinary preferences.
A better approach exploits the "wisdom of like-minded people."
Preferences are not random: users who agreed in the past tend to agree on new items.
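
The popularity baseline (Algorithm 0) described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `rank_by_popularity`, the `(item, vote)` input format, and the restaurant names are all hypothetical.

```python
from collections import Counter

def rank_by_popularity(votes):
    """Rank items by (# positive votes - # negative votes), highest first.

    `votes` is an iterable of (item, vote) pairs, where vote is +1 or -1
    (an assumed input format for this sketch).
    """
    score = Counter()
    for item, vote in votes:
        score[item] += vote
    # Sort by score descending; break ties alphabetically for determinism.
    return sorted(score, key=lambda item: (-score[item], item))

# Hypothetical votes for three made-up restaurants.
votes = [("Thai Hut", +1), ("Thai Hut", +1), ("Burger Bar", +1),
         ("Burger Bar", -1), ("Sushi Go", +1), ("Thai Hut", -1),
         ("Sushi Go", +1), ("Sushi Go", +1)]
ranking = rank_by_popularity(votes)
print(ranking)
```

Every user receives the same ranking, which is exactly the limitation noted above: individual preferences play no role.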

The Utility Matrix

Represents user preferences for items.
Rows correspond to users and columns to items.
Each entry is the degree of preference of a user for an item, such as a rating.
The matrix is generally sparse: most users have rated only a small fraction of the items.
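
One simple way to hold a sparse utility matrix is a dictionary of dictionaries that stores only the known ratings. The sketch below assumes a 1 to 5 rating scale and uses made-up user and item names; `density` is a hypothetical helper that reports how much of the matrix is filled in.

```python
# Hypothetical ratings on a 1-5 scale; a missing key means "not rated".
utility = {
    "alice": {"item1": 5, "item3": 4},
    "bob":   {"item1": 4, "item2": 1},
    "carol": {"item2": 5, "item4": 3},
}

# All items that appear in at least one user's ratings.
items = sorted({item for ratings in utility.values() for item in ratings})

def density(utility, items):
    """Fraction of (user, item) cells that hold a known rating."""
    known = sum(len(ratings) for ratings in utility.values())
    return known / (len(utility) * len(items))

print(items, density(utility, items))
```

Real utility matrices are typically far sparser than this toy example, which is why storing only the known entries is attractive.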

Recommending Documents

Profiles are sets of important words in the document.
- Use Term Frequency (TF) and Inverse Document Frequency (IDF) to pick important words.
- N: total number of documents
- n_i: number of documents in which term i appears
- f_{i,j}: number of times term i appears in document d_j
- TF_{i,j} = f_{i,j} / \max_k f_{k,j}
- IDF_i = \log(N / n_i)
- w_{i,j} = TF_{i,j} \times IDF_i
- Profile of a document d_j is the vector of weights w_{i,j}
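
The document profiles above can be sketched as follows, using the common formulation TF = f_{i,j} / max_k f_{k,j} and IDF = log2(N / n_i). The base-2 logarithm, the whitespace tokenizer, the function name `tfidf_profiles`, and the sample documents are assumptions made for this illustration.

```python
import math
from collections import Counter

def tfidf_profiles(docs):
    """Return one {term: weight} profile per document.

    w_ij = TF_ij * IDF_i, with TF_ij = f_ij / max_k f_kj and
    IDF_i = log2(N / n_i) (log base is an assumption of this sketch).
    """
    N = len(docs)
    # f_ij: term counts per document (naive lowercase whitespace tokenizer).
    counts = [Counter(doc.lower().split()) for doc in docs]
    # n_i: number of documents that contain term i at least once.
    n = Counter(term for c in counts for term in c)
    profiles = []
    for c in counts:
        if not c:  # empty document: empty profile
            profiles.append({})
            continue
        max_f = max(c.values())
        profiles.append({t: (f / max_f) * math.log2(N / n[t])
                         for t, f in c.items()})
    return profiles

# Hypothetical mini-corpus.
docs = ["space rocket launch", "rocket engine test", "cooking pasta recipe pasta"]
profiles = tfidf_profiles(docs)
for profile in profiles:
    print(profile)
```

Terms that occur in several documents (here "rocket") receive a lower weight than terms unique to one document, which is what makes the weights useful for picking out important words.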

Making Predictions

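Estimate the utility of item i for user x as the cosine similarity of their profiles: u(x,i) = cos(θ) = (x \cdot i) / (|x| |i|)
Here x is the user profile vector and i is the item profile vector.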
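
As a rough sketch of this prediction step, the function below computes the cosine between a user profile and an item profile, both stored as sparse dictionaries of feature weights. The feature names and weights are hypothetical, and returning 0.0 for an empty profile is a convention chosen here, not something the notes specify.

```python
import math

def cosine(x, i):
    """Cosine of the angle between two sparse vectors given as dicts."""
    dot = sum(w * i.get(feature, 0.0) for feature, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_i = math.sqrt(sum(w * w for w in i.values()))
    if norm_x == 0 or norm_i == 0:
        return 0.0  # convention chosen here for empty profiles
    return dot / (norm_x * norm_i)

# Hypothetical profiles: weights over word features.
user_profile = {"rocket": 1.0, "space": 1.0}
item_a = {"rocket": 1.0, "space": 1.0, "launch": 1.0}
item_b = {"pasta": 1.0, "recipe": 1.0}

print(cosine(user_profile, item_a))  # shares two features with the user
print(cosine(user_profile, item_b))  # shares none
```

Items can then be ranked for the user by this score, highest first.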

Pros: Content-based Approach

No need for data on other users.
Able to recommend to users with unique tastes.
Able to recommend new and unpopular items.
Explanations for recommended items.

Cons: Content-Based Approach

Finding the appropriate features is not always obvious.
Overspecialization.
Cold-start problem for new users.

Collaborative Filtering

Makes automatic predictions about user interests by collecting preferences from many users.
If person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than the opinion of a randomly chosen person.

Similar Users and Jaccard Similarity

Jaccard Similarity: sim(A,B) = |r_A \cap r_B| / |r_A \cup r_B|
Considers users A and B with rating vectors r_A and r_B, treated as sets of rated items; the rating values themselves are ignored.
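
A minimal sketch of Jaccard similarity between two users, treating each rating vector as the set of items that user has rated. The ratings are made up, and returning 0.0 when neither user has rated anything is a convention chosen for this example.

```python
def jaccard(r_a, r_b):
    """Jaccard similarity of the sets of items two users have rated."""
    items_a, items_b = set(r_a), set(r_b)
    union = items_a | items_b
    if not union:
        return 0.0  # convention chosen here when both users have no ratings
    return len(items_a & items_b) / len(union)

# Hypothetical rating vectors stored as {item: rating}.
r_a = {"m1": 4, "m4": 5, "m5": 1}
r_b = {"m1": 5, "m2": 5, "m3": 4}

print(jaccard(r_a, r_b))  # 1 shared item out of 5 distinct items
```

Only the dictionary keys are used, so a 1-star and a 5-star rating on the same item count equally toward the overlap.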

Cosine Similarity

Uses the angle between the vectors; treats unknown values as zero.
sim(A, B) = cos(r_A, r_B) = (r_A \cdot r_B) / (|r_A| |r_B|)

Centered Cosine Captures User Preferences

Normalize ratings by subtracting each user's mean rating (the row mean).
Handles “tough raters” and “easy raters.”
Centered cosine is also known as the Pearson correlation.
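
To illustrate the difference between plain and centered cosine, the sketch below computes both on three hypothetical users. Users A and C rate two of the same items but with opposite opinions; the helper names `center` and `centered_cosine` and all ratings are assumptions made for this example.

```python
import math

def cosine(r_a, r_b):
    """Cosine similarity of sparse rating dicts; missing ratings count as 0."""
    dot = sum(v * r_b.get(item, 0.0) for item, v in r_a.items())
    norm_a = math.sqrt(sum(v * v for v in r_a.values()))
    norm_b = math.sqrt(sum(v * v for v in r_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def center(r):
    """Subtract the user's mean rating from each of that user's known ratings."""
    if not r:
        return {}
    mean = sum(r.values()) / len(r)
    return {item: v - mean for item, v in r.items()}

def centered_cosine(r_a, r_b):
    """Cosine similarity after mean-centering each user's ratings."""
    return cosine(center(r_a), center(r_b))

# Hypothetical users: A and C disagree on m4 and m5.
r_a = {"m1": 4, "m4": 5, "m5": 1}
r_b = {"m1": 5, "m2": 5, "m3": 4}
r_c = {"m4": 2, "m5": 4, "m6": 5}

print(cosine(r_a, r_b), cosine(r_a, r_c))
print(centered_cosine(r_a, r_b), centered_cosine(r_a, r_c))
```

With plain cosine, A appears almost as similar to C as to B, even though A and C disagree. After centering, the similarity between A and C becomes negative, because low ratings now count as negative evidence and missing ratings sit at the user's average.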

Making Rating Predictions for a User

Let r_x be the vector of user x’s ratings.
Let N be the set of the k users most similar to x who have also rated item i.
Prediction for user x and item i:
- Option 1: r_{xi} = (1/k) \sum_{y \in N} r_{yi} (the average rating that the neighbors in N gave to item i)
- Option 2: r_{xi} = \sum_{y \in N} s_{xy} r_{yi} / \sum_{y \in N} s_{xy}, where s_{xy} = sim(x,y) (a similarity-weighted average)
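
Option 2 can be sketched as follows. This is an illustration under stated assumptions: plain cosine is used as the similarity, only neighbors with positive similarity are kept, and `None` is returned when no usable neighbor exists. The function name `predict_user_user` and the ratings are hypothetical.

```python
import math

def cosine(r_a, r_b):
    """Cosine similarity of sparse rating dicts; missing ratings count as 0."""
    dot = sum(v * r_b.get(item, 0.0) for item, v in r_a.items())
    norm_a = math.sqrt(sum(v * v for v in r_a.values()))
    norm_b = math.sqrt(sum(v * v for v in r_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_user_user(ratings, x, i, k=2, sim=cosine):
    """Similarity-weighted average over the k nearest users who rated item i."""
    # Candidates: every other user who has rated item i.
    candidates = [(sim(ratings[x], r), y)
                  for y, r in ratings.items() if y != x and i in r]
    # N: the k most similar candidates (positive similarity only, an assumption).
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors
    return sum(s * ratings[y][i] for s, y in neighbors) / total

# Hypothetical utility matrix; user "x" has not rated item "c".
ratings = {
    "x":  {"a": 5, "b": 1},
    "y1": {"a": 5, "b": 1, "c": 4},
    "y2": {"a": 1, "b": 5, "c": 2},
    "y3": {"a": 4, "c": 5},
}
print(predict_user_user(ratings, "x", "c", k=1))
print(predict_user_user(ratings, "x", "c", k=2))
```

Passing `sim=` a centered-cosine function instead would be a natural variation; Option 1 corresponds to replacing the weights with a constant.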

Item-Item Collaborative Filtering

For item i, find other items with similar rating patterns, for example by applying cosine similarity to the item (column) vectors of the utility matrix.
Estimate the rating for item i based on the user's ratings for similar items.
Use the same similarity metrics and prediction functions as in the user-user model.
N(i;x) is the neighborhood of items that are rated by user x and most similar to item i.
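
A sketch of the item-item variant, using the weighted-average form r_{xi} = \sum_{j \in N(i;x)} s_{ij} r_{xj} / \sum_{j \in N(i;x)} s_{ij}, which mirrors the user-user prediction. Plain cosine between item columns, the positive-similarity filter, and all names and ratings below are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of sparse dict vectors; missing entries count as 0."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def item_vectors(ratings):
    """Transpose {user: {item: rating}} into {item: {user: rating}}."""
    cols = {}
    for user, r in ratings.items():
        for item, value in r.items():
            cols.setdefault(item, {})[user] = value
    return cols

def predict_item_item(ratings, x, i, k=2):
    """Weighted average of x's ratings on the k items most similar to i."""
    cols = item_vectors(ratings)
    if i not in cols:
        return None  # nobody has rated item i
    # N(i; x): items rated by x, ranked by similarity to item i.
    scored = [(cosine(cols[i], cols[j]), j) for j in ratings[x] if j != i]
    neighbors = sorted((c for c in scored if c[0] > 0), reverse=True)[:k]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None
    return sum(s * ratings[x][j] for s, j in neighbors) / total

# Hypothetical utility matrix; user "x" has not rated item "c".
ratings = {
    "x":  {"a": 5, "b": 1},
    "u1": {"a": 4, "b": 1, "c": 5},
    "u2": {"a": 5, "c": 4},
    "u3": {"b": 5, "c": 1},
}
print(predict_item_item(ratings, "x", "c", k=1))
print(predict_item_item(ratings, "x", "c", k=2))
```

In this toy data item "c" is rated much like item "a", so the prediction for "c" is pulled toward the 5 that user x gave to "a".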

Algorithm Complexity

Expensive step: finding the k most similar users (or items) for a single prediction: O(|U|), where U is the set of users
Naïve pre-computation takes time O(n*|U|)

Problems with Collaborative Filtering

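Cold start: needs enough users in the system to find matches.
Sparsity: the utility matrix is sparse, so it is hard to find users who have rated the same items.
First rater: cannot recommend an item that has not been rated yet.
Popularity bias: tends to recommend popular items and serves users with unique tastes poorly.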

Combining Content and Collaboration

Content-based and collaborative methods have complementary strengths and weaknesses.
Combine methods to obtain the best of both.

Evaluation Metrics for Recommendation Engines

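Recall: the proportion of items the user likes that were recommended, tp / (tp + fn)
- tp = number of recommended items that the user likes
- fn = number of items the user likes that were not recommended
Precision: out of all recommended items, the proportion that the user likes, tp / (tp + fp)
- tp = number of recommended items that the user likes
- tp + fp = total number of items recommended
Root Mean Squared Error (RMSE): measures the error in the predicted ratings; lower is better.
Mean Reciprocal Rank (MRR): the average of 1/rank of the first relevant item in each recommendation list; the larger the MRR, the better the recommendation.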
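
The four metrics can be sketched in plain Python as below. Function names, input formats, and sample data are hypothetical; returning 0.0 for an empty recommendation list or an empty set of liked items, and scoring 0 for a list with no relevant item, are conventions chosen for this example.

```python
import math

def precision_recall(recommended, liked):
    """Precision = tp / (tp + fp), recall = tp / (tp + fn)."""
    recommended, liked = set(recommended), set(liked)
    tp = len(recommended & liked)  # recommended items the user likes
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(liked) if liked else 0.0
    return precision, recall

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item per list (0 if none)."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Hypothetical data.
precision, recall = precision_recall(["a", "b", "c", "d"], {"a", "c", "e"})
error = rmse([4, 3, 5], [5, 3, 3])
mrr = mean_reciprocal_rank([["a", "b", "c"], ["x", "y", "z"]], [{"b"}, {"x"}])
print(precision, recall, error, mrr)
```

Precision and recall judge the recommended set, RMSE judges predicted rating values, and MRR judges how early the first good item appears in a ranked list, so the metrics answer different questions about the same system.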
