Bias & Feedback Loops in Music Recommendations
Introduction
Objective: Examine how record labels influence music recommendations and what effects they have on recommendation systems. Research has traditionally focused on signals such as:
Artist name
Track title
Album title
User ID
Listening context (e.g., what was played)
Recently, there has been growing interest in bias factors such as popularity and gender, but record labels deserve attention too.
Data Collection Methodology
Multi-Stage Web Crawling: Collect record label information for albums and map each label to one of the major label groups (Universal, Sony, Warner) or to Independent.
Datasets for Analysis:
Spotify Million Playlist Dataset (MPD): 1 million playlists created by US users.
LFM-2b Dataset: Listening histories of Last.fm users.
Analysis of Record Label Diversity
The augmented dataset makes it possible to identify label-related characteristics and biases in music recommendations.
Feedback Loop Simulation: Explore how recommendations might change the distribution of record labels.
Record Labels and Streaming
Major vs. Independent labels:
Major labels (like Sony, Universal, Warner) heavily influence streaming platforms.
Independent labels have a tougher time competing with major labels.
Recommendations affect which music gets noticed and played more.
Framework for Dataset Augmentation
Stepwise Approach:
Step 1: Preprocessing - Gather basic record label information using the Spotify API.
Step 2: Mapping Trivial Cases - Identify labels that clearly match a major group (e.g., Universal Music Group).
Step 3: Label Crawling from Discogs - Collect structured label metadata through the Discogs API.
Step 4: Label Crawling from Wikipedia - Extract unstructured label information (such as the parent company).
Step 5: Interim Mapping - Use the newly collected information to map additional labels.
Step 6: Copyright Classification - Check copyright data for accuracy.
Step 7: Final Mapping - Classify remaining unknown labels as Independent.
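The stepwise mapping above can be sketched as a single fallback chain that tries the most specific evidence first. This is a minimal illustration, not the authors' pipeline: the keyword table, the `parent_of` lookup (standing in for parent-company data crawled from Discogs and Wikipedia), and the function names are all hypothetical. The album dictionaries only mimic the `label` and `copyrights` fields of a Spotify album object; no network calls are made.

```python
from __future__ import annotations

from typing import Optional

# Illustrative keyword table (hypothetical, far from exhaustive): substrings
# that identify a major label group in a label name, parent company, or
# copyright line.
MAJOR_KEYWORDS: dict[str, tuple[str, ...]] = {
    "Universal": ("universal music",),
    "Sony": ("sony music",),
    "Warner": ("warner music", "warner records"),
}


def match_major(text: str) -> Optional[str]:
    """Return the major group whose keyword occurs in `text`, else None."""
    lowered = text.lower()
    for group, keywords in MAJOR_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return group
    return None


def map_album(album: dict, parent_of: dict[str, str]) -> str:
    """Map one album to Universal, Sony, Warner, or Independent.

    `album` mimics the `label` and `copyrights` fields of a Spotify album
    object; `parent_of` is a hypothetical label -> parent-company lookup.
    """
    label = album.get("label", "")
    # Step 2: the label name itself names a major group.
    group = match_major(label)
    if group:
        return group
    # Step 5: use a crawled parent company, if one is known.
    parent = parent_of.get(label)
    if parent:
        group = match_major(parent)
        if group:
            return group
    # Step 6: scan the album's copyright lines.
    for entry in album.get("copyrights", []):
        group = match_major(entry.get("text", ""))
        if group:
            return group
    # Step 7: anything still unmapped counts as Independent.
    return "Independent"


if __name__ == "__main__":
    parent_of = {"Sub Label": "Universal Music Group"}  # hypothetical crawl result
    print(map_album({"label": "Sub Label"}, parent_of))  # Universal
    print(map_album({"label": "Tiny Label", "copyrights": []}, parent_of))  # Independent
```

Ordering the checks from direct label-name matches to copyright text means weaker evidence is only consulted when stronger evidence is missing, and Independent acts as the catch-all, mirroring Step 7.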
Measuring Diversity with Simpson Index
Calculate the Simpson index to measure the record label diversity of the tracks in each playlist.
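The Simpson index has several variants, and the source does not say which one it uses. A minimal sketch, assuming the Gini-Simpson form 1 - sum(p_i^2), where p_i is the share of a playlist's tracks belonging to label (or label group) i. This value is the probability that two tracks drawn at random with replacement come from different labels, so higher means more diverse. The raw sum of p_i^2 (the dominance form) reads the opposite way. `simpson_diversity` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def simpson_diversity(labels: Iterable[str]) -> float:
    """Gini-Simpson diversity 1 - sum(p_i^2) over one playlist's track labels.

    0.0 means every track shares one label; the value grows as tracks
    spread more evenly over more labels.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("empty playlist")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    print(simpson_diversity(["Sony"] * 4))  # 0.0
    print(simpson_diversity(["Sony", "Warner", "Universal", "Independent"]))  # 0.75
```

With only four groups (Universal, Sony, Warner, Independent) the maximum is 0.75. Computing the index over individual label names instead of groups gives finer resolution.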
Findings:
Most playlists show high diversity.
The trends indicate that lower diversity goes hand in hand with greater prominence of major labels.
Feedback Loop Investigations
Simulation of Recommendations:
Use Alternating Least Squares (ALS) for collaborative filtering recommendations.
Simulate users interacting with their top-ranked recommendations over repeated iterations.
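The simulation loop described above can be sketched as follows. This illustrates the idea under stated assumptions and does not reproduce the study's setup. It uses plain regularized ALS on a binary user-item matrix, a simplification of the implicit-feedback ALS variants common in practice. It also assumes every user consumes all top-k recommendations each round and tracks the share of major-label items among recommendations through a hypothetical boolean `is_major` array. `als` and `simulate_feedback_loop` are hypothetical names, and the code requires NumPy.

```python
from __future__ import annotations

import numpy as np


def als(R: np.ndarray, k: int = 8, reg: float = 0.1, iters: int = 10,
        seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Regularized ALS: approximate R (users x items) by U @ V.T."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    for _ in range(iters):
        # Fix V and solve the ridge regression for every user vector at once.
        U = np.linalg.solve(V.T @ V + ridge, V.T @ R.T).T
        # Fix U and solve for every item vector.
        V = np.linalg.solve(U.T @ U + ridge, U.T @ R).T
    return U, V


def simulate_feedback_loop(R: np.ndarray, is_major: np.ndarray,
                           n_rounds: int = 5, top_k: int = 3,
                           **als_kwargs) -> list[float]:
    """Retrain, recommend, and feed recommendations back as interactions.

    Returns the share of major-label items among each round's
    recommendations. Assumes every user consumes all top-k items and
    still has at least top_k unseen items.
    """
    R = R.astype(float)  # astype copies, so the caller's matrix is untouched
    shares = []
    for _ in range(n_rounds):
        U, V = als(R, **als_kwargs)
        scores = U @ V.T
        scores[R > 0] = -np.inf  # never re-recommend consumed items
        top = np.argsort(-scores, axis=1)[:, :top_k]
        shares.append(float(is_major[top].mean()))
        # Simulated consumption: recommended items become interactions.
        R[np.arange(R.shape[0])[:, None], top] = 1.0
    return shares


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    R0 = (rng.random((50, 40)) < 0.1).astype(float)
    is_major = np.arange(40) < 15  # hypothetical: items 0-14 are major-label
    print(simulate_feedback_loop(R0, is_major, n_rounds=5, top_k=3))
```

Retraining from scratch each round keeps the sketch short. The full-acceptance interaction model is its strongest assumption and could be swapped for a probabilistic choice model.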
Results:
MPD: the record label distribution remained stable, with no clear feedback loop.
LFM-2b: major labels became increasingly over-represented over the iterations, even though independent labels were strongly represented initially.
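One way to make "over-representation" concrete is to divide each label group's share in the recommendations by its share in the initial data. A ratio above 1 means the group is recommended more often than its starting share would suggest. The source does not state which measure it uses, so this is an assumed metric, and `group_shares` and `over_representation` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable


def group_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of items belonging to each label group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def over_representation(initial: Iterable[str],
                        recommended: Iterable[str]) -> Dict[str, float]:
    """Ratio of each group's share in recommendations to its initial share.

    Values above 1 mean the group is over-represented relative to the
    starting data; groups absent from `initial` are skipped.
    """
    base = group_shares(initial)
    rec = group_shares(recommended)
    return {group: rec.get(group, 0.0) / share for group, share in base.items()}


if __name__ == "__main__":
    initial = ["Independent"] * 6 + ["Sony"] * 2 + ["Universal"] * 2
    recommended = ["Independent"] * 4 + ["Sony"] * 3 + ["Universal"] * 3
    print(over_representation(initial, recommended))
```

In this toy example, Independent drops to about 0.67 while Sony and Universal rise to about 1.5. Major-group ratios that keep climbing from one iteration to the next would match the LFM-2b pattern described above.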
Discussion and Implications
Initial findings suggest that record labels play a complex role in recommendation biases.
More research is needed to understand how popularity bias influences recommendations and diversity.
Assessing fairness in music recommendations remains an important long-term goal.