Bias & Feedback Loops in Music Recommendations

  1. Introduction

  Objective: Investigate how record labels influence music recommendations and how they affect recommendation systems. Music recommendation datasets have traditionally focused on:

  • Artist name

  • Track title

  • Album title

  • User ID

  • Listening context (such as what was played)

  Recently, interest has grown in factors such as popularity and gender, but record labels deserve attention too.

  2. Data Collection Methodology

  • Multi-Stage Web Crawling: Collect record label info for albums and link each label to one of the three major companies (Universal, Sony, Warner) or classify it as Independent.

  • Datasets for Analysis:

    • Spotify Million Playlist Dataset: 1 million playlists from US users.

    • LFM-2b Dataset: Listening profiles from Last.fm based on user data.

  3. Analysis of Record Label Diversity

  • The augmented dataset helps reveal characteristics and biases in music recommendations.

  • Feedback Loop Simulation: Explore how recommendations might change the distribution of record labels.

  4. Record Labels and Streaming

  • Major vs. Independent labels:

    • Major labels (like Sony, Universal, Warner) heavily influence streaming platforms.

    • Independent labels have a tougher time competing with major labels.

    • Recommendations affect which music gets noticed and played more.

  5. Framework for Dataset Augmentation

  5.1 Stepwise Approach:

  • Step 1: Preprocessing - Gather basic record label info using the Spotify API.

  • Step 2: Mapping Trivial Cases - Identify clear matches (like Universal Group).

  • Step 3: Label Crawling from Discogs - Collect structured metadata through an API.

  • Step 4: Label Crawling from Wikipedia - Get unstructured label info (like parent company).

  • Step 5: Interim Mapping - Use the crawled info to map more labels.

  • Step 6: Copyright Classification - Check copyright data for accuracy.

  • Step 7: Final Mapping - Classify remaining unknown labels as Independent.
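
  The mapping steps above can be sketched as a cascade that checks progressively weaker evidence and falls back to Independent. This is only an illustrative sketch: the keyword table, the function names (`match_major`, `classify_label`), and the parameters `parent_company` and `copyright_notice` are hypothetical stand-ins for the crawled Discogs/Wikipedia data and the copyright field, not the actual pipeline described in the source.

```python
import re
from typing import Optional

# Hypothetical keyword table (assumption); a real mapping would be curated
# and much larger than these few tokens.
MAJOR_KEYWORDS = {
    "Universal": {"universal", "umg"},
    "Sony": {"sony"},
    "Warner": {"warner", "wmg"},
}


def match_major(text: Optional[str]) -> Optional[str]:
    """Return the major whose keyword occurs as a whole token in text."""
    if not text:
        return None
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    for major, keywords in MAJOR_KEYWORDS.items():
        if tokens & keywords:
            return major
    return None


def classify_label(
    label: str,
    parent_company: Optional[str] = None,
    copyright_notice: Optional[str] = None,
) -> str:
    """Check the label name, then crawled parent info, then copyright text."""
    # Roughly: Step 2 (trivial match), Step 5 (interim mapping with crawled
    # parent company), Step 6 (copyright text), Step 7 (fallback).
    for evidence in (label, parent_company, copyright_notice):
        major = match_major(evidence)
        if major is not None:
            return major
    return "Independent"
```

  For example, `classify_label("Columbia", parent_company="Sony Music Entertainment")` resolves through the parent company, while a label with no matching evidence ends up as Independent. Whole-token matching is used here to avoid accidental substring hits; it will still miss labels whose ownership is not visible in any of the three text fields.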

  6. Measuring Diversity with Simpson Index

  • Calculate the Simpson index to measure track diversity in playlists.

  • Findings:

    • Most playlists show high diversity.

    • Trends indicate that lower diversity correlates with greater prominence of major labels.
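
  As a rough illustration of the measure, the sketch below computes a Simpson-style diversity score for one playlist. Two assumptions are made here that the notes do not confirm: diversity is taken over the record-label category of each track, and the "1 minus sum of squared shares" form (often called Gini-Simpson) is used, so that higher values mean more diversity. The function name `simpson_diversity` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def simpson_diversity(categories: Iterable[Hashable]) -> float:
    """Return 1 - sum(p_i^2), where p_i is the share of category i."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute diversity of an empty playlist")
    concentration = sum((n / total) ** 2 for n in counts.values())
    return 1.0 - concentration


# One track per category: the score is at its maximum for four categories.
playlist = ["Universal", "Sony", "Warner", "Independent"]
print(simpson_diversity(playlist))  # 0.75
```

  A playlist drawn entirely from one category scores 0, and the score approaches 1 as tracks spread evenly over more categories.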

  7. Feedback Loop Investigations

  • Simulation of Recommendations:

    • Use Alternating Least Squares (ALS) for collaborative filtering recommendations.

    • Simulate top recommendations and user interactions.

  • Results:

    • The Million Playlist Dataset (MPD) showed stable distributions without clear feedback loops.

    • LFM-2b showed major labels becoming over-represented over successive iterations, even when independent labels were strong initially.
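
  The general shape of such a feedback loop simulation can be sketched as follows. Note the substitutions: to stay dependency-free, a simple popularity-based recommender stands in for ALS (it is not the collaborative filtering model named above), and every user is assumed to accept all of their top-k recommendations in each round. All function names and the toy data layout are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

Interactions = Dict[str, Set[str]]  # user -> set of consumed items


def label_shares(interactions: Interactions, item_label: Dict[str, str]) -> Dict[str, float]:
    """Share of each label category among all user-item interactions."""
    counts = Counter(item_label[i] for items in interactions.values() for i in items)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def recommend_popular(interactions: Interactions, user: str, k: int) -> List[str]:
    """Stand-in recommender (assumption): most popular unseen items."""
    popularity = Counter(i for items in interactions.values() for i in items)
    ranked = sorted(popularity, key=lambda i: (-popularity[i], i))
    return [i for i in ranked if i not in interactions[user]][:k]


def simulate(
    interactions: Interactions,
    item_label: Dict[str, str],
    iterations: int = 3,
    k: int = 1,
    recommend: Callable[[Interactions, str, int], List[str]] = recommend_popular,
) -> List[Dict[str, float]]:
    """Record label shares before and after each recommendation round."""
    state = {u: set(items) for u, items in interactions.items()}  # copy input
    history = [label_shares(state, item_label)]
    for _ in range(iterations):
        # Recommend from a frozen snapshot so all users see the same model.
        snapshot = {u: set(items) for u, items in state.items()}
        for user in snapshot:
            # Assumption: the user consumes every recommended item.
            state[user].update(recommend(snapshot, user, k))
        history.append(label_shares(state, item_label))
    return history
```

  Because `recommend` is a parameter, an ALS-based recommender could be plugged in with the same signature; comparing the entries of the returned history then shows whether any label category gains share across iterations.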

  8. Discussion and Implications

  • Initial findings show record labels have a complicated role in recommendation biases.

  • More research is needed to understand how popularity biases influence recommendations and diversity.

  • Assessing fairness in music recommendations remains an important long-term goal.