Recommendation Systems - Key Concepts
Value of Recommendation Systems
- Recommendation systems increase sales, click-through rates (CTRs), conversions, customer satisfaction, and retention.
- YouTube's recommendation algorithm significantly boosts viewership.
Anatomy of the Long Tail
- Online services offer more inventory than traditional retailers.
- Recommendation systems help manage the abundance of choices.
Scarcity versus Abundance
- Recommendation systems help discover new products, increasing diversity.
- Some algorithms may create a rich-get-richer effect for popular products.
- Collaborative filtering algorithms can promote diversity by recommending novel items that similar users have chosen.
Two Types of Recommendation Systems
- Content-based filtering: Recommends items similar to those a user likes using item similarity/clustering.
- Collaborative filtering: Uses user similarity to make recommendations based on the links between users and the items they chose.
- Many companies employ hybrid systems using both techniques.
Basic Flowchart of a Recommender System
- Gather user behavior and item ratings data.
- Extract item attributes.
- Design recommendation model.
- Run scheduled updates of recommendation engine models.
- Serve recommendations to users.
- Evaluate recommendation performance.
Algorithm 0
- Recommend the most popular restaurants based on positive votes minus negative votes.
- Ignores individual culinary preferences.
- Improvement over this baseline: exploit the "wisdom of like-minded people," since preferences are not random.
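The popularity baseline above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the vote log, the restaurant names, and the function name `most_popular` are all hypothetical.

```python
from collections import Counter

# Hypothetical vote log: (restaurant, +1 for a positive vote, -1 for a negative vote).
votes = [
    ("Luigi's", +1), ("Luigi's", +1), ("Luigi's", -1),
    ("Taco Hut", +1), ("Taco Hut", +1), ("Taco Hut", +1),
    ("Noodle Bar", +1), ("Noodle Bar", -1), ("Noodle Bar", -1),
]

def most_popular(votes, top_n=2):
    """Rank items by net score (positive votes minus negative votes)."""
    score = Counter()
    for restaurant, vote in votes:
        score[restaurant] += vote
    # Every user receives the same list: there is no personalization.
    return [name for name, _ in score.most_common(top_n)]

print(most_popular(votes))  # ['Taco Hut', "Luigi's"]
```

Because the ranking does not depend on who is asking, a user who dislikes tacos still sees the taco place first, which is the limitation the notes point out.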
The Utility Matrix
- Represents user preferences for items.
- Rows correspond to users and columns to items.
- Each entry is the degree of preference of a user for an item (for example, a rating).
- The matrix is generally sparse: most users have rated only a few items.
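One way to hold a sparse utility matrix is a dictionary of dictionaries, where a missing key means "unknown" rather than zero. The sketch below uses made-up ratings on an assumed 1-5 scale; the user and item names are placeholders.

```python
# Toy ratings (hypothetical data): user -> {item: rating on a 1-5 scale}.
# Missing keys are unknown preferences, not zeros.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW3": 3},
}

items = sorted({item for row in ratings.values() for item in row})
known = sum(len(row) for row in ratings.values())
density = known / (len(ratings) * len(items))

# Print the matrix with "." for unknown entries.
print("    " + " ".join(f"{item:>4}" for item in items))
for user, row in ratings.items():
    print(f"{user:>3} " + " ".join(f"{row.get(item, '.'):>4}" for item in items))
print(f"density = {density:.2f}")  # 11 known entries out of 28 cells
```

Even in this tiny example fewer than half of the cells are filled; real systems are typically far sparser.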
Recommending Documents
- Profiles are sets of important words in the document.
- Use Term Frequency (TF) and Inverse Document Frequency (IDF) to pick important words.
- TF_{ij} = f_{ij} / max_k f_{kj}; IDF_i = log(N / n_i)
- N: total number of documents
- n_i: number of documents in which term i appears
- f_{ij}: number of times term i appears in document d_j
- The profile of document d_j is the vector of weights w_{ij} = TF_{ij} × IDF_i
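The document-profile idea can be sketched as follows. The notes do not spell out the exact formulas, so this assumes one common variant: TF normalized by the most frequent term in the document, IDF as log2(N / n_i) with n_i counted as the number of documents containing term i. The corpus, the whitespace tokenizer, and the name `tf_idf_profiles` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical mini-corpus; tokenization is a plain whitespace split.
docs = {
    "d1": "star wars space battle space",
    "d2": "harry potter magic school",
    "d3": "space station science",
}

def tf_idf_profiles(docs, top_k=2):
    """Return {doc: {term: weight}}, keeping the top_k highest-weighted terms."""
    counts = {d: Counter(text.split()) for d, text in docs.items()}
    n_docs = len(docs)  # N
    # n_i: number of documents that contain term i (each doc counted once).
    doc_freq = Counter(term for c in counts.values() for term in c)
    profiles = {}
    for d, c in counts.items():
        max_f = max(c.values())
        weights = {
            term: (f / max_f) * math.log2(n_docs / doc_freq[term])
            for term, f in c.items()
        }
        # Sort by descending weight, breaking ties alphabetically.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        profiles[d] = dict(top)
    return profiles

profiles = tf_idf_profiles(docs)
for d, profile in profiles.items():
    print(d, {t: round(w, 3) for t, w in profile.items()})
```

Note that "space" is the most frequent word in d1 yet does not make its profile, because it also appears in d3 and so carries a lower IDF.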
Making Predictions
- Estimate the utility u(x, i) as the cosine similarity cos(θ) = (x · i) / (|x| |i|)
- x is the user profile and i is the item profile, both vectors over the same features
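A minimal sketch of this prediction step, assuming profiles are sparse feature-weight vectors stored as dictionaries. The feature names and weights are invented for illustration, and returning 0.0 for an empty profile is a convention chosen here, not something the notes specify.

```python
import math

def cosine(x, i):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * i.get(k, 0.0) for k, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_i = math.sqrt(sum(w * w for w in i.values()))
    if norm_x == 0 or norm_i == 0:
        return 0.0  # convention chosen here for empty profiles
    return dot / (norm_x * norm_i)

# Hypothetical feature weights.
user = {"space": 1.0, "battle": 0.5}
items = {
    "space_doc": {"space": 0.8, "station": 0.6},
    "magic_doc": {"magic": 1.0, "school": 0.7},
}

# Recommend items in order of decreasing similarity to the user profile.
ranked = sorted(items, key=lambda name: cosine(user, items[name]), reverse=True)
print(ranked)  # ['space_doc', 'magic_doc']
```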
Pros: Content-based Approach
- No need for data on other users.
- Able to recommend to users with unique tastes.
- Able to recommend new and unpopular items.
- Explanations for recommended items.
Cons: Content-Based Approach
- Finding the appropriate features is not always obvious.
- Overspecialization.
- Cold-start problem for new users.
Collaborative Filtering
- Makes automatic predictions about user interests by collecting preferences from many users.
- If person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than a randomly chosen person would be.
Similar Users and Jaccard Similarity
- Consider users x and y with rating vectors r_x and r_y, treated as the sets of items each has rated.
- Jaccard similarity: sim(x, y) = |r_x \cap r_y| / |r_x \cup r_y|
- Ignores the rating values themselves.
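As a small sketch, Jaccard similarity only needs the sets of rated items. The ratings below are toy data invented for illustration.

```python
def jaccard(r_x, r_y):
    """Jaccard similarity of the sets of items two users have rated."""
    a, b = set(r_x), set(r_y)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy ratings (hypothetical data): user -> {item: rating}.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}

print(jaccard(ratings["A"], ratings["B"]))  # 0.2
print(jaccard(ratings["A"], ratings["C"]))  # 0.5
```

A comes out closer to C than to B even though A and C disagree on both items they share, because the rating values never enter the computation.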
Cosine Similarity
- Uses the angle between the rating vectors; treats unknown ratings as zero, which implicitly reads a missing rating as a low one.
- sim(x, y) = cos(r_x, r_y) = (r_x · r_y) / (|r_x| |r_y|)
Centered Cosine Captures User Preferences
- Normalize ratings by subtracting each user's mean rating (the row mean) before computing cosine similarity.
- Handles “tough raters” and “easy raters.”
- Centered cosine is also known as the Pearson correlation.
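The difference between plain and centered cosine shows up clearly on a toy example. The sketch below uses invented ratings; unknown entries are treated as zero after centering, which is one common convention and an assumption here.

```python
import math

def cosine(r_x, r_y):
    """Cosine similarity of sparse rating vectors; unknown entries count as 0."""
    dot = sum(v * r_y.get(item, 0.0) for item, v in r_x.items())
    norm_x = math.sqrt(sum(v * v for v in r_x.values()))
    norm_y = math.sqrt(sum(v * v for v in r_y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def center(r):
    """Subtract the user's mean rating from each known rating."""
    if not r:
        return {}
    mean = sum(r.values()) / len(r)
    return {item: v - mean for item, v in r.items()}

def centered_cosine(r_x, r_y):
    return cosine(center(r_x), center(r_y))

# Toy ratings (hypothetical data): user -> {item: rating}.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}

for other in ("B", "C"):
    plain = cosine(ratings["A"], ratings[other])
    centered = centered_cosine(ratings["A"], ratings[other])
    print(f"A vs {other}: cosine={plain:.3f} centered={centered:.3f}")
```

With plain cosine, A looks almost as similar to C as to B. After centering, the similarity between A and C becomes negative, which matches the fact that they rated their shared items in opposite directions.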
Making Rating Predictions for a User
- Let r_x be the vector of user x’s ratings.
- Let N be the set of the k users most similar to x who have also rated item i.
- Prediction for user x and item i:
- Option 1: r_{xi} = (1/k) \sum_{y \in N} r_{yi} (plain average of the neighbors' ratings of i)
- Option 2: r_{xi} = \sum_{y \in N} s_{xy} r_{yi} / \sum_{y \in N} s_{xy}, where s_{xy} = sim(x, y) (similarity-weighted average)
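Both prediction options can be sketched in one function. This is an illustration under stated assumptions: plain cosine is used as the similarity, neighbors with non-positive similarity are dropped (a choice made here so the weighted average stays well defined), and the ratings and the name `predict` are hypothetical.

```python
import math

def cosine(r_x, r_y):
    """Cosine similarity of sparse rating vectors; unknown entries count as 0."""
    dot = sum(v * r_y.get(item, 0.0) for item, v in r_x.items())
    norm_x = math.sqrt(sum(v * v for v in r_x.values()))
    norm_y = math.sqrt(sum(v * v for v in r_y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def predict(ratings, x, item, k=2, sim=cosine, weighted=True):
    """Predict r_{xi} from the k users most similar to x who rated `item`."""
    candidates = [
        (sim(ratings[x], r_y), r_y[item])
        for y, r_y in ratings.items()
        if y != x and item in r_y
    ]
    # N: the k most similar raters (non-positive similarities are dropped here).
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbours:
        return None  # nobody comparable has rated the item
    if weighted:  # Option 2: similarity-weighted average
        return sum(s * r for s, r in neighbours) / sum(s for s, _ in neighbours)
    return sum(r for _, r in neighbours) / len(neighbours)  # Option 1

# Toy ratings (hypothetical data); user "x" has not rated "i4".
ratings = {
    "x":  {"i1": 5, "i2": 4, "i3": 1},
    "y1": {"i1": 5, "i2": 5, "i3": 1, "i4": 4},
    "y2": {"i1": 4, "i2": 4, "i3": 2, "i4": 5},
    "y3": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
}

print(predict(ratings, "x", "i4", weighted=False))  # 4.5
print(round(predict(ratings, "x", "i4"), 2))
```

The weighted prediction lands slightly below 4.5 because y1, who gave the lower rating of 4, is the more similar of the two neighbors.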
Item-Item Collaborative Filtering
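- For item i, find other items with similar rating vectors (for example, by cosine similarity on the columns of the utility matrix).
- Estimate the rating for item i based on the user's ratings for those similar items.
- Use the same similarity metrics and prediction functions as in the user-user model: r_{xi} = \sum_{j \in N(i;x)} s_{ij} r_{xj} / \sum_{j \in N(i;x)} s_{ij}
- N(i;x) is the neighborhood of items that are rated by user x and similar to item i; s_{ij} is the similarity of items i and j.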
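The item-item view can be sketched by transposing the utility matrix and reusing the same similarity and weighted average. As before, this is only an illustration: cosine similarity, the cutoff at positive similarities, the toy ratings, and the helper names are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of sparse vectors; unknown entries count as 0."""
    dot = sum(a * v.get(key, 0.0) for key, a in u.items())
    norm_u = math.sqrt(sum(a * a for a in u.values()))
    norm_v = math.sqrt(sum(a * a for a in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def transpose(ratings):
    """Turn user -> {item: r} into item -> {user: r}."""
    by_item = {}
    for user, row in ratings.items():
        for item, r in row.items():
            by_item.setdefault(item, {})[user] = r
    return by_item

def predict_item_item(ratings, x, i, k=2):
    """Predict r_{xi} from the k items rated by x that are most similar to i."""
    by_item = transpose(ratings)
    if i not in by_item:
        return None  # nobody has rated item i, so it has no rating vector
    scored = [
        (cosine(by_item[i], by_item[j]), r_xj)
        for j, r_xj in ratings[x].items()
        if j != i
    ]
    # N(i;x): the k most similar items among those x has rated.
    neighbourhood = sorted((c for c in scored if c[0] > 0), reverse=True)[:k]
    if not neighbourhood:
        return None
    return sum(s * r for s, r in neighbourhood) / sum(s for s, _ in neighbourhood)

# Toy ratings (hypothetical data); user "x" has not rated "i4".
ratings = {
    "x":  {"i1": 5, "i2": 4, "i3": 1},
    "y1": {"i1": 5, "i2": 5, "i3": 1, "i4": 4},
    "y2": {"i1": 4, "i2": 4, "i3": 2, "i4": 5},
    "y3": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
}

print(round(predict_item_item(ratings, "x", "i4"), 2))
```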
Algorithm Complexity
- Expensive step: finding the k most similar users (or items) takes O(|U|) time per query, where U is the set of users (or items) being searched.
- Naïve pre-computation takes time O(n*|U|)
Problems with Collaborative Filtering
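- Cold start: needs enough users in the system to find a match.
- Sparsity: the utility matrix is sparse, so it is hard to find users who have rated the same items.
- First rater: cannot recommend an item that has not been rated yet, such as a new or esoteric item.
- Popularity bias: tends to recommend popular items, so it serves users with unique tastes poorly.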
Combining Content and Collaboration
- Content-based and collaborative methods have complementary strengths and weaknesses.
- Combine methods to obtain the best of both.
Evaluation Metrics for Recommendation Engines
- Recall: the proportion of items a user likes that were recommended, tp / (tp + fn)
- tp = number of recommended items that the user likes
- fn = number of items the user likes that were not recommended
- Precision: the proportion of recommended items that the user likes, tp / (tp + fp)
- fp = number of recommended items that the user does not like
- tp + fp = total number of items recommended
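- Root Mean Squared Error (RMSE): measures the error in predicted ratings, sqrt((1/n) \sum (predicted - actual)^2) over n test ratings; lower is better.
- Mean Reciprocal Rank (MRR): the average over users of 1/rank of the first relevant item in each recommendation list; the larger the MRR, the better the recommendation.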
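The four metrics above can be sketched with the standard library alone. The function names and the sample inputs are hypothetical; in particular, counting a list with no relevant item as contributing 0 to MRR is a common convention assumed here.

```python
import math

def precision_recall(recommended, liked):
    """Return (precision, recall) for one user's recommendation list."""
    rec, rel = set(recommended), set(liked)
    tp = len(rec & rel)   # recommended and liked
    fp = len(rec - rel)   # recommended but not liked
    fn = len(rel - rec)   # liked but not recommended
    precision = tp / (tp + fp) if rec else 0.0
    recall = tp / (tp + fn) if rel else 0.0
    return precision, recall

def rmse(predicted, actual):
    """Root mean squared error over paired predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def mean_reciprocal_rank(ranked_lists, liked_sets):
    """Average of 1/rank of the first liked item; 0 for lists with none."""
    total = 0.0
    for ranked, liked in zip(ranked_lists, liked_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in liked:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

print(precision_recall(["a", "b", "c", "d"], {"a", "c", "e"}))  # precision 0.5, recall 2/3
print(round(rmse([4, 3, 5], [5, 3, 3]), 3))                     # 1.291
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y", "z"]],
                           [{"b"}, {"x"}]))                     # 0.75
```

Precision and recall judge the set of recommended items, RMSE judges predicted rating values, and MRR judges how high the first good item is ranked, so they answer different questions about the same system.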