
Recommendation Systems - Key Concepts

Value of Recommendation Systems

  • Recommendation systems increase sales, CTRs, conversions, customer satisfaction, and retention.
  • YouTube's recommendation algorithm significantly boosts viewership.

Anatomy of the Long Tail

  • Online services offer more inventory than traditional retailers.
  • Recommendation systems help manage the abundance of choices.

Scarcity versus Abundance

  • Recommendation systems help discover new products, increasing diversity.
  • Some algorithms may create a rich-get-richer effect for popular products.
  • Collaborative filtering algorithms promote diversity by recommending novel items.

Two Types of Recommendation Systems

  1. Content-based filtering: Recommends items similar to those a user likes using item similarity/clustering.
  2. Collaborative filtering: Uses similarities between users, inferred from the links between users and the items they chose, to make recommendations.
  • Many companies employ hybrid systems using both techniques.

Basic Flowchart of a Recommender System

  1. Gather user behavior and item ratings data.
  2. Extract item attributes.
  3. Design recommendation model.
  4. Run scheduled updates of recommendation engine models.
  5. Serve recommendations to users.
  6. Evaluate recommendation performance.

Algorithm 0

  • Recommend the most popular restaurants based on positive votes minus negative votes (see the sketch below).
  • Ignores individual culinary preferences.
  • A better approach would exploit the "wisdom of like-minded" people.
  • Preferences are not random: users who agreed in the past tend to agree in the future.
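
A minimal sketch of Algorithm 0 in Python, assuming votes are stored as up/down counts per restaurant; the restaurant names and numbers are made up for illustration.

```python
# Algorithm 0: rank restaurants by upvotes minus downvotes (illustrative data).
votes = {
    "Taco Palace": {"up": 120, "down": 30},
    "Noodle Bar":  {"up": 95,  "down": 10},
    "Burger Hut":  {"up": 200, "down": 150},
}

def popularity_score(counts):
    """Positive votes minus negative votes."""
    return counts["up"] - counts["down"]

# Everyone gets the same ranking, regardless of individual taste.
ranking = sorted(votes, key=lambda r: popularity_score(votes[r]), reverse=True)
print(ranking)  # ['Taco Palace', 'Noodle Bar', 'Burger Hut']
```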

The Utility Matrix

  • Represents user preferences for items.
  • Rows are users, columns are items.
  • Each entry is the degree of preference (e.g., a rating) of a user for an item.
  • The matrix is generally sparse: most entries are unknown (see the sketch below).
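
A small illustrative utility matrix as a dict-of-dicts, assuming 1-5 star ratings; only observed entries are stored, which is the usual way to cope with sparsity.

```python
# Sparse utility matrix: only observed (user, item) ratings are stored.
# Ratings are hypothetical 1-5 stars; missing entries mean "unknown", not zero.
utility = {
    "alice": {"Matrix": 5, "Titanic": 1},
    "bob":   {"Matrix": 4, "Avatar": 2},
    "carol": {"Titanic": 5, "Avatar": 4, "Matrix": 1},
}

def rating(user, item):
    """Return the rating if observed, else None (the matrix is sparse)."""
    return utility.get(user, {}).get(item)

print(rating("alice", "Matrix"))  # 5
print(rating("alice", "Avatar"))  # None -> unknown preference
```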

Recommending Documents

  • Profiles are sets of important words in the document.
    • Use Term Frequency (TF) and Inverse Document Frequency (IDF) to pick important words.
    • f_{ij}: number of times term i appears in document d_j
    • TF_{ij} = f_{ij} / max_k f_{kj} (term count normalized by the most frequent term in d_j)
    • N: total number of documents
    • n_i: number of documents that contain term i
    • IDF_i = log(N / n_i)
    • TF-IDF weight: w_{ij} = TF_{ij} × IDF_i
    • The profile of document d_j is the vector of weights w_{ij} (see the sketch below).
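
A minimal TF-IDF sketch following the definitions above; the toy documents are invented, and the log base is arbitrary (it only rescales the weights).

```python
import math
from collections import Counter

# Toy corpus (hypothetical documents).
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
    "d3": "dogs and cats make good pets".split(),
}

N = len(docs)              # N: number of documents
n = Counter()              # n_i: number of documents containing term i
for words in docs.values():
    n.update(set(words))

def profile(doc_id):
    """TF-IDF weight vector w_ij for document d_j."""
    f = Counter(docs[doc_id])          # f_ij: term counts in d_j
    max_f = max(f.values())
    return {
        term: (count / max_f) * math.log(N / n[term])   # TF_ij * IDF_i
        for term, count in f.items()
    }

# Highest-weighted (most "important") words of d1.
print(sorted(profile("d1").items(), key=lambda kv: -kv[1])[:3])
```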

Making Predictions

  • Estimate the utility of item i for user x: u(x, i) = cos(θ) = (x · i) / (|x| |i|)
  • x: user profile vector; i: item profile vector (see the sketch below)
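
A sketch of scoring items by u(x, i) with sparse weight vectors stored as dicts; the user and item profiles here are hypothetical keyword weights.

```python
import math

def cosine(x, i):
    """u(x, i) = (x . i) / (|x| |i|) for sparse weight vectors stored as dicts."""
    dot = sum(x[t] * i[t] for t in x.keys() & i.keys())
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(sum(v * v for v in i.values()))
    return dot / norm if norm else 0.0

# Hypothetical profiles: a user profile x and two item profiles.
x      = {"sci-fi": 0.9, "space": 0.7, "romance": 0.1}
item_a = {"sci-fi": 0.8, "space": 0.9}
item_b = {"romance": 0.9, "drama": 0.6}

for name, item in [("item_a", item_a), ("item_b", item_b)]:
    print(name, round(cosine(x, item), 3))  # item_a scores higher for this user
```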

Pros: Content-based Approach

  • No need for data on other users.
  • Able to recommend to users with unique tastes.
  • Able to recommend new and unpopular items.
  • Can provide explanations for recommended items.

Cons: Content-Based Approach

  • Finding the appropriate features is not always obvious.
  • Overspecialization.
  • Cold-start problem for new users.

Collaborative Filtering

  • Makes automatic predictions about user interests by collecting preferences from many users.
  • If person A has the same opinion as person B on an issue, A is more likely to have B's opinion on a different issue.

Similar Users and Jaccard Similarity

  • For users x and y with sets of rated items r_x and r_y:
  • Jaccard similarity: sim(x, y) = |r_x \cap r_y| / |r_x \cup r_y|
  • Ignores the rating values; only considers which items were rated.
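
A minimal Jaccard sketch over the sets of items each user rated; the users and items are illustrative.

```python
def jaccard(rated_x, rated_y):
    """sim(x, y) = |r_x ∩ r_y| / |r_x ∪ r_y| over the sets of items each user rated."""
    union = rated_x | rated_y
    return len(rated_x & rated_y) / len(union) if union else 0.0

# Hypothetical sets of items rated by each user (rating values are ignored).
r_x = {"Matrix", "Avatar", "Titanic"}
r_y = {"Matrix", "Titanic", "Up"}
print(jaccard(r_x, r_y))  # 2 shared / 4 total = 0.5
```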

Cosine Similarity

  • Uses the angle between the rating vectors; unknown ratings are treated as zero.
  • sim(x, y) = cos(r_x, r_y) = (r_x · r_y) / (|r_x| |r_y|)
  • Drawback: treating "unrated" as 0 implicitly treats missing items as very low ratings.
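
A sketch of cosine similarity on raw rating vectors, with missing ratings encoded as 0 (an assumption that quietly treats "unrated" like a low rating, which the centered version below corrects).

```python
import numpy as np

def cosine_sim(a, b):
    """cos(r_x, r_y) with unknown ratings already encoded as 0."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical rating vectors over the same 4 items; 0 = not rated.
r_x = np.array([4.0, 0.0, 5.0, 1.0])
r_y = np.array([5.0, 5.0, 4.0, 0.0])
print(round(cosine_sim(r_x, r_y), 3))
```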

Centered Cosine Captures User Preferences

  • Normalize ratings by subtracting each user's row mean (computed over the items they rated).
  • Handles “tough raters” and “easy raters.”
  • Centered cosine is the same as the Pearson correlation (see the sketch below).
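
A sketch of centered cosine: subtract each user's mean over the items they actually rated, then take the cosine of the centered vectors; under these conventions it coincides with the Pearson correlation. The rating vectors are the same illustrative ones as above.

```python
import numpy as np

def center(r):
    """Subtract the user's mean over rated items only; unrated entries stay 0."""
    rated = r != 0                      # assumes 0 marks "not rated"
    centered = np.zeros_like(r)
    centered[rated] = r[rated] - r[rated].mean()
    return centered

def pearson_sim(r_x, r_y):
    a, b = center(r_x), center(r_y)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

r_x = np.array([4.0, 0.0, 5.0, 1.0])   # a "tough rater" pattern
r_y = np.array([5.0, 5.0, 4.0, 0.0])   # an "easy rater" pattern
print(round(pearson_sim(r_x, r_y), 3))
```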

Making Rating Predictions for a User

  • Let r_x be the vector of user x’s ratings
  • Let N be the set of the k users most similar to x who have also rated item i
  • Prediction for user x and item i (see the sketch below):
    • Option 1: r_{xi} = (1/k) \sum_{y \in N} r_{yi} (the plain average rating of the similar users who rated i)
    • Option 2: r_{xi} = \sum_{y \in N} s_{xy} r_{yi} / \sum_{y \in N} s_{xy}, where s_{xy} = sim(x, y) (the similarity-weighted average)
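
A sketch of Option 2, assuming the dict-of-dicts utility matrix from earlier and some user-user similarity function; the toy similarity below is only there to make the example runnable.

```python
def predict(utility, sim, x, item, k=2):
    """r_xi = sum_y s_xy * r_yi / sum_y s_xy over the k most similar users who rated the item."""
    # Candidate neighbors: users other than x who rated the item.
    raters = [y for y in utility if y != x and item in utility[y]]
    # Keep the k most similar users (the set N in the notes).
    neighbors = sorted(raters, key=lambda y: sim(x, y), reverse=True)[:k]
    num = sum(sim(x, y) * utility[y][item] for y in neighbors)
    den = sum(sim(x, y) for y in neighbors)
    return num / den if den else None

# Hypothetical ratings and a toy similarity (Jaccard over rated items) to make it runnable.
utility = {
    "alice": {"Matrix": 5, "Titanic": 1},
    "bob":   {"Matrix": 4, "Avatar": 2},
    "carol": {"Titanic": 2, "Avatar": 4, "Matrix": 1},
}

def toy_sim(x, y):
    rx, ry = set(utility[x]), set(utility[y])
    return len(rx & ry) / len(rx | ry)

print(predict(utility, toy_sim, "alice", "Avatar"))
```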

Item-Item Collaborative Filtering

  • Like content-based filtering, but item similarity (e.g., cosine) is computed from the rating columns of the utility matrix rather than from item attributes.
  • Estimate rating for item i based on ratings for similar items.
  • Use the same similarity metrics and prediction functions as in the user-user model
  • N(i; x): the set of items rated by user x that are most similar to item i
  • r_{xi} = \sum_{j \in N(i;x)} s_{ij} r_{xj} / \sum_{j \in N(i;x)} s_{ij} (see the sketch below)
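
A sketch of the item-item analog; the item-to-item similarities are hypothetical numbers standing in for values that would normally be computed from the rating columns.

```python
def predict_item_item(utility, item_sim, x, i, k=2):
    """r_xi = sum_j s_ij * r_xj / sum_j s_ij over N(i; x): items rated by x most similar to i."""
    rated_by_x = [j for j in utility[x] if j != i]
    neighborhood = sorted(rated_by_x, key=lambda j: item_sim(i, j), reverse=True)[:k]
    num = sum(item_sim(i, j) * utility[x][j] for j in neighborhood)
    den = sum(item_sim(i, j) for j in neighborhood)
    return num / den if den else None

# Hypothetical item-item similarities (normally derived from the utility matrix columns).
sims = {frozenset({"Avatar", "Matrix"}): 0.8, frozenset({"Avatar", "Titanic"}): 0.1}
item_sim = lambda a, b: sims.get(frozenset({a, b}), 0.0)

utility = {"alice": {"Matrix": 5, "Titanic": 1}}
print(predict_item_item(utility, item_sim, "alice", "Avatar"))  # pulled toward the Matrix rating
```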

Algorithm Complexity

  • Expensive step: finding k most similar users (or items): O(|U|)
  • Naïve pre-computation takes time O(n*|U|)

Problems with Collaborative Filtering

  • Cold start: a new user (or a brand-new system) has too few ratings to find similar users.
  • Sparsity: the utility matrix is mostly empty, so users with overlapping ratings are hard to find.
  • First rater: an item cannot be recommended until someone has rated it.
  • Popularity bias: tends to recommend popular items; users with unique tastes are served poorly.

Combining Content and Collaboration

  • Content-based and collaborative methods have complementary strengths and weaknesses.
  • Combine methods to obtain the best of both.

Evaluation Metrics for Recommendation Engines

  • Recall: the proportion of items the user liked that were recommended.
    • tp = recommended items the user actually liked
    • fn = liked items that were not recommended
    • Recall = tp / (tp + fn)
  • Precision: out of all recommended items, how many the user liked.
    • tp = recommended items the user actually liked
    • tp + fp = total number of items recommended
    • Precision = tp / (tp + fp)
  • Root Mean Squared Error (RMSE): measures the error in predicted ratings, RMSE = sqrt( (1/n) \sum (predicted - actual)^2 ).
  • Mean Reciprocal Rank (MRR): the average of 1/rank of the first relevant item across users; the larger the MRR, the better the recommendations (see the sketch after this list).
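
A sketch of the four metrics, assuming we know which items the user actually liked, the predicted vs. true ratings, and the rank of the first relevant item per user; all numbers are illustrative.

```python
import math

def precision_recall(recommended, liked):
    tp = len(set(recommended) & set(liked))     # recommended items the user liked
    fp = len(recommended) - tp                  # recommended but not liked
    fn = len(set(liked) - set(recommended))     # liked but never recommended
    return tp / (tp + fp), tp / (tp + fn)

def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mrr(first_relevant_ranks):
    """Mean of 1/rank of the first relevant recommendation over all users/queries."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

# Illustrative numbers.
print(precision_recall(["A", "B", "C"], ["B", "C", "D", "E"]))  # ~ (0.667, 0.5)
print(round(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0]), 3))
print(round(mrr([1, 3, 2]), 3))
```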