Vocabulary flashcards covering key concepts in recommender systems, matrix factorization, feedback types, distance metrics, and community detection in graphs.
User–Item Rating Matrix
A matrix where each row is a user, each column is an item, and entries contain ratings when available.
Missing Values
Unobserved user–item interactions represented as empty cells in the rating matrix.
Sparsity
The condition where most user–item ratings are missing, leading to a sparse matrix.
Impact of Sparsity
Makes similarity estimates noisy because users share few co-rated items.
Long-Tail Distribution
Popular items form a small 'head,' while many niche items form a large 'tail.'
User–User Collaborative Filtering
Predict ratings using users who have similar rating patterns.
Item–Item Collaborative Filtering
Predict ratings using items that are similar to those a user has rated.
Neighborhood
A set of users or items deemed similar based on similarity metrics.
Cold-Start Problem
Difficulty recommending items or users with insufficient historical data.
Latent Factors
Low-dimensional vectors representing hidden traits of users and items.
Global Mean
Overall average rating across all user–item pairs.
User Bias
Tendency of a user to rate higher or lower than average.
Item Bias
Tendency of an item to receive higher or lower ratings than average.
Latent Space
The learned embedding space where users and items are represented as vectors.
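The cards above (global mean, user bias, item bias, latent factors) combine into the standard biased matrix-factorization prediction, r̂(u, i) = μ + b_u + b_i + p_u · q_i. A minimal sketch with small random factors (all dimensions and values here are hypothetical toy choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): 4 users, 5 items, 2 latent factors.
n_users, n_items, k = 4, 5, 2

mu = 3.5                              # global mean rating
b_u = rng.normal(0, 0.1, n_users)     # user biases
b_i = rng.normal(0, 0.1, n_items)     # item biases
P = rng.normal(0, 0.1, (n_users, k))  # user latent factor vectors
Q = rng.normal(0, 0.1, (n_items, k))  # item latent factor vectors

def predict(u, i):
    """Biased MF prediction: mu + b_u + b_i + p_u . q_i."""
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]

print(predict(0, 2))
```

With small biases and factors, predictions stay near the global mean; training would fit b_u, b_i, P, and Q to the observed ratings.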
RMSE
Root mean squared error: the square root of the average squared deviation between predicted and true ratings.
Precision@k
Fraction of top-k recommended items that are relevant.
Recall@k
Fraction of relevant items that appear in the top-k recommendations.
Hit-Rate
Fraction of users for whom at least one relevant item appears in the top-k recommendations.
NDCG@k
Ranking metric that assigns higher weight to correctly ranked relevant items near the top.
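The ranking metrics above can be sketched directly from their definitions. A minimal implementation (the item lists here are hypothetical examples; NDCG uses the common 1/log2(rank+2) discount for binary relevance):

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for it in recommended[:k] if it in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    return sum(1 for it in recommended[:k] if it in relevant) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """DCG with 1/log2(rank+2) gains, normalized by the ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, it in enumerate(recommended[:k]) if it in relevant)
    ideal = sum(1.0 / math.log2(rank + 2)
                for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

recs = ["a", "b", "c", "d"]   # hypothetical ranked recommendations
rel = {"b", "d", "e"}         # hypothetical relevant set
print(precision_at_k(recs, rel, 4))  # 2 of 4 recommended are relevant -> 0.5
```

Note how NDCG, unlike precision, rewards placing hits nearer the top of the list.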
Rating Prediction Task
Predicting explicit numerical ratings.
Ranking Task
Ordering items by predicted relevance rather than predicting exact rating values.
Explicit Feedback
Direct user-provided ratings or evaluations.
Implicit Feedback
Behavioral signals such as clicks, views, or watch time.
Noisy Feedback
Implicit signals that do not directly reflect true preference strength.
Exposure Bias
Observed behavior depends on what users were shown, not all available items.
Position Bias
Higher-ranked items receive more attention regardless of true relevance.
Missing-Not-Negative
Absence of interaction is not the same as disliking an item.
Similarity Measure
Quantifies how similar two users or items are (e.g., cosine similarity).
Cosine Similarity
Measures angle-based similarity between vectors.
Euclidean Distance
Measures straight-line distance between vectors.
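A minimal sketch contrasting the two measures on hypothetical rating vectors, showing why cosine similarity (angle-based, magnitude-invariant) can disagree with Euclidean distance:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

u = [4, 5, 1]   # hypothetical ratings by one user on three co-rated items
v = [8, 10, 2]  # a second user rating the same items twice as high

# Cosine ignores magnitude: parallel vectors score 1.0.
print(cosine_similarity(u, v))
# Euclidean distance treats the same pair as far apart.
print(euclidean_distance(u, v))
```

This is why cosine similarity is a common default for user–user and item–item collaborative filtering, where overall rating scale differs between users.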
Curse of Dimensionality
Distance metrics become less meaningful in high-dimensional spaces.
Feature Scaling
Adjusting feature magnitudes to ensure equal influence in distance-based models.
k Value (k-NN)
Number of neighbors; low k risks high variance, high k risks high bias.
Linear Separability
Existence of a linear boundary that perfectly separates classes.
Perceptron Convergence
The perceptron algorithm is guaranteed to converge in finitely many updates if and only if the data is linearly separable.
Decision Boundary
A hyperplane that divides classes.
Order Dependence
Perceptron updates depend on the sequence of training examples.
Margin
Distance between the decision boundary and the nearest data points.
Support Vectors
Data points that lie on or near the margin and define the decision boundary.
Soft-Margin SVM
Allows some misclassification to improve generalization.
Kernel Trick
Method for learning non-linear boundaries by computing inner products in a high-dimensional feature space without explicitly mapping the data.
RBF Kernel
A popular kernel whose similarity decays exponentially with the squared Euclidean distance between inputs.
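The RBF kernel is commonly written k(x, y) = exp(−γ‖x − y‖²). A minimal sketch (the points and the γ value are hypothetical):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: similarity decays exponentially with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points have maximal similarity 1.0; distant points approach 0.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # exp(-0.5 * 25), near zero
```

Larger γ makes the similarity fall off faster, giving a more local (and more overfitting-prone) decision boundary.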
Overfitting
Model fits training data too closely and performs poorly on unseen data.
Underfitting
Model is too simple and fails to capture important patterns.
Train/Validation/Test Split
Partitioning data to train, tune, and evaluate a model.
Cross-Validation
Repeated training/testing on multiple splits for more reliable evaluation.
Regularization
Penalizing model complexity to reduce overfitting.
L2 Regularization
Penalizes large parameter values via squared magnitude.
Community
Group of nodes densely connected internally and sparsely connected externally.
Modularity
Metric comparing the density of within-community edges against the density expected in a random graph with the same node degrees.
Modularity Resolution Limit
Modularity may fail to detect small but real communities.
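Modularity for a partition can be computed as Q = Σ_c [e_c/m − (d_c/2m)²], where e_c counts edges inside community c, d_c sums the degrees of its nodes, and m is the total edge count. A minimal sketch on a hypothetical toy graph of two triangles joined by one bridge edge:

```python
from collections import Counter

def modularity(edges, community):
    """Newman modularity: Q = sum_c [ e_c/m - (d_c / 2m)^2 ]."""
    m = len(edges)
    internal = Counter()   # e_c: edges with both endpoints in community c
    degree = Counter()     # d_c: total degree of nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2
               for c in degree)

# Two triangles (0,1,2) and (3,4,5) joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}  # the triangles
bad = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}   # a random split

print(round(modularity(edges, good), 3))  # 0.357
print(round(modularity(edges, bad), 3))   # -0.214
```

The natural two-triangle partition scores much higher than the arbitrary split, which is exactly the signal Girvan–Newman-style methods optimize when choosing where to cut the dendrogram.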
Edge Betweenness
Number of shortest paths that pass through an edge.
Girvan–Newman Algorithm
Detects communities by iteratively removing edges with highest betweenness.
Bridge Edge
An edge whose removal disconnects parts of the network.
Bottleneck
Node or edge that many shortest paths depend on.
Affiliation Graph Model
Model where nodes belong to multiple communities and connect based on shared memberships.
Overlapping Communities
Communities where nodes can have more than one membership.
Overlap Region
Area where nodes with multiple affiliations show higher connectivity.
Connection Probability
The probability that two nodes form an edge; in the affiliation graph model it increases with the number of communities they share.