Marketing Analytics for Big Data
Overview of Big Data Uses in Marketing
Descriptive Analysis:
Purpose: To determine what is currently happening.
Example Research Question: Are shoppers buying more plant-based meat alternatives?
Data Type: Scanner panel data.
Predictive Analysis:
Purpose: To forecast what is likely to happen based on patterns in data.
Example Research Question: Can flu severity be predicted in advance?
Data Type: Google search query data.
Causal Analysis:
Purpose: To investigate the cause-and-effect relationships.
Example Research Question: Does Spotify affect what we listen to?
Data Type: Listening histories from the Last.fm service, collected via API and web scraping.
Research Motivation
Investigate whether consumption is shifting from animal-based to plant-based meat, focusing on the Netherlands and Germany over the past decade.
Question: What types of big data can be used to study this consumption trend?
Scanner Panel Data
Definition: Data that tracks the purchases of a panel of consumers over time as they shop at various retailers.
Quote: "If you want to measure change, don’t change the measure."
Using Scanner Panel Data
Objective: To address our descriptive research question by collecting data from 2012-2021 in the Netherlands and Germany.
Steps:
Aggregate the volume of animal and plant-based meat purchases in supermarkets.
Analyze results and create applications based on findings.
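The aggregation step above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the lecture: the row layout (household, country, year, product type, units) and all numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical scanner panel rows:
# (household_id, country, year, product_type, units). Values are made up.
purchases = [
    ("h1", "NL", 2012, "animal", 10),
    ("h1", "NL", 2012, "plant", 1),
    ("h2", "NL", 2012, "animal", 8),
    ("h1", "NL", 2021, "animal", 7),
    ("h1", "NL", 2021, "plant", 3),
    ("h3", "DE", 2021, "animal", 9),
    ("h3", "DE", 2021, "plant", 1),
]


def aggregate_units(rows):
    """Sum purchased units per (country, year, product_type)."""
    totals = defaultdict(int)
    for _household, country, year, product_type, units in rows:
        totals[(country, year, product_type)] += units
    return dict(totals)


totals = aggregate_units(purchases)
for key in sorted(totals):
    print(key, totals[key])
```

Keeping the measure fixed (units per country, year, and product type) across all years is what makes the yearly totals comparable.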
Marketing Metrics
Market Share
Definition of Market Share: The percentage of a market category accounted for by a specific product.
Formula:
Market\, Share = \frac{Unit\, Product\, Sales}{Total\, Category\, Unit\, Sales}
Application in context:
Unit Plant Market Share:
Unit\, Plant\, Market\, Share = \frac{Unit\, Plant\, Sales}{Unit\, Meat\, Sales}
Penetration
Definition: The proportion of people who have purchased a specific product at least once over a defined period.
Formula:
Penetration = \frac{Customers\, who\, have\, purchased\, product}{Customers\, who\, purchased\, in\, category}
Buyer Volume Share
Definition: The product's share of category purchases among customers who have bought the product, indicating consumer loyalty.
Formula:
Buyer\, Volume\, Share = \frac{Unit\, Product\, Sales}{Unit\, Category\, Sales\, by\, Product\, Buyers}
Heavy Usage Index
Definition: Measures the intensity of consumption, indicating how heavily a product's buyers purchase within the category relative to the average category buyer.
Formula: Heavy\, Usage\, Index = \frac{Average\, Unit\, Category\, Sales\, per\, Product\, Buyer}{Average\, Unit\, Category\, Sales\, per\, Category\, Buyer}
Interpretation: An index value greater than 1 indicates that the product's buyers consume the category more than the average category buyer.
Relationships Between Metrics
Key Relationship:
Market\, Share = Penetration \times Buyer\, Volume\, Share \times Heavy\, Usage\, Index
Strategies to Increase Market Share:
Convincing more people to try the product (raises penetration).
Encouraging existing customers to choose your product more often (raises buyer volume share).
Increasing the quantity that the product's buyers purchase in the category (raises the heavy usage index).
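The decomposition can be checked numerically. The sketch below uses a made-up four-customer panel and a hypothetical focal product "P". It assumes the heavy usage index is computed from per-buyer averages (average category units per product buyer divided by average category units per category buyer), which is what makes the three factors multiply back to market share.

```python
# Hypothetical panel: units bought per customer within one category.
# "P" is the focal product; "other" is everything else in the category.
panel = {
    "c1": {"P": 4, "other": 6},
    "c2": {"P": 2, "other": 2},
    "c3": {"other": 5},
    "c4": {"other": 1},
}


def decompose_market_share(panel, product):
    """Return market share and its three components for one product."""
    category_units = {c: sum(basket.values()) for c, basket in panel.items()}
    category_buyers = [c for c, units in category_units.items() if units > 0]
    product_buyers = [c for c in category_buyers if panel[c].get(product, 0) > 0]

    product_sales = sum(panel[c][product] for c in product_buyers)
    category_sales = sum(category_units[c] for c in category_buyers)
    category_sales_by_buyers = sum(category_units[c] for c in product_buyers)

    penetration = len(product_buyers) / len(category_buyers)
    buyer_volume_share = product_sales / category_sales_by_buyers
    # Per-buyer averages (assumption): total category units would give a ratio <= 1.
    heavy_usage_index = (category_sales_by_buyers / len(product_buyers)) / (
        category_sales / len(category_buyers)
    )
    return {
        "market_share": product_sales / category_sales,
        "penetration": penetration,
        "buyer_volume_share": buyer_volume_share,
        "heavy_usage_index": heavy_usage_index,
    }


metrics = decompose_market_share(panel, "P")
product_of_parts = (
    metrics["penetration"]
    * metrics["buyer_volume_share"]
    * metrics["heavy_usage_index"]
)
print(metrics)
print("product of components:", product_of_parts)
```

In this toy panel the product's buyers purchase more of the category than the average category buyer, so the heavy usage index is above 1.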
Double Jeopardy Effect
Concept: Brands with low market share often have lower buyer volume share as well.
Implications:
Brands with a low market share have fewer buyers (first jeopardy).
Buyers of low market share brands tend to be less loyal (second jeopardy).
Concentration and Inequality
The 80/20 Rule: Roughly 80% of outcomes come from 20% of inputs (for example, 80% of sales from 20% of customers).
Applications: Sales, profits, complaints, and returns analysis.
Importance: Understanding concentration helps to identify key customers and focus strategies on them.
Lorenz Curve and Gini Coefficient
Sort customers by purchase volume (from largest to smallest).
Compute cumulative proportions for customers (x-axis) and their corresponding purchases (y-axis).
Plot these cumulative purchases against cumulative customers.
Include a 45° line for reference, which indicates equality.
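The steps above can be sketched as follows, with made-up purchase volumes. Two points are assumptions rather than part of the notes: the Gini coefficient is taken as twice the area between the curve and the 45° line (its usual definition), and the area is approximated with the trapezoid rule. Because customers are sorted from largest to smallest, as in the steps above, the curve bows above the 45° line. The conventional Lorenz curve sorts from smallest to largest and bows below it, but the resulting Gini value is the same.

```python
def concentration_curve(purchases):
    """Cumulative customer share (x) and purchase share (y), heaviest buyers first."""
    ordered = sorted(purchases, reverse=True)
    total = sum(ordered)  # assumed to be positive
    n = len(ordered)
    x, y = [0.0], [0.0]
    running = 0
    for i, value in enumerate(ordered, start=1):
        running += value
        x.append(i / n)
        y.append(running / total)
    return x, y


def gini(purchases):
    """Twice the area between the curve and the 45-degree line (trapezoid rule)."""
    x, y = concentration_curve(purchases)
    area_under_curve = sum(
        (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x))
    )
    return 2 * (area_under_curve - 0.5)


# Hypothetical units bought by ten customers.
volumes = [50, 20, 10, 10, 5, 3, 1, 1, 0, 0]
x, y = concentration_curve(volumes)
print("share of purchases from the top 20% of customers:", y[2])
print("Gini coefficient:", gini(volumes))
```

A Gini coefficient of 0 means every customer buys the same amount; values closer to 1 mean purchases are concentrated among a few customers.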
Prediction in Marketing
Concern: Influenza (flu) causes an estimated 290,000 to 650,000 respiratory deaths worldwide each year.
Objectives: Identify early indicators to allocate resources efficiently.
CDC Incidence Reports: Delayed by 2 weeks, which creates a need for real-time data.
Nowcasting Flu Instances
Concept: Use trends in flu-related search activity on Google to estimate current flu activity before official reports arrive.
Research Method: Compare billions of search queries against historical flu data.
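The idea of nowcasting can be illustrated with a toy example: fit a relationship between search activity and reported flu incidence on past weeks, then apply it to the current week's search data, which is available before the official report. The sketch below uses a plain least-squares line as a stand-in; it is not necessarily the model used by Google Flu Trends, and all numbers and names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def nowcast(intercept, slope, current_query_share):
    """Estimate this week's flu incidence from this week's search share."""
    return intercept + slope * current_query_share


# Hypothetical history: share of searches that are flu-related (%),
# and the flu incidence (%) later reported for the same weeks.
query_share = [0.10, 0.15, 0.22, 0.30, 0.26, 0.18]
reported_incidence = [1.3, 1.6, 2.5, 3.1, 2.9, 1.9]

intercept, slope = fit_line(query_share, reported_incidence)
estimate = nowcast(intercept, slope, current_query_share=0.24)
print("estimated incidence for the current week (%):", round(estimate, 2))
```

With only one predictor there is little room to overfit. The risk grows when many candidate search terms are screened against a short flu history.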
Google Flu Trends Analysis
Notable Data Observations:
The percentage of influenza-like illness (ILI) cases correlates with Google search queries about flu symptoms.
Historical discrepancies between GFT estimates and reported ILI led to criticism of its prediction accuracy.
Group Discussion Topics
Algorithm Changes: Impact of modifications on Google Flu Trends (GFT) performance.
Data Type Discussion: Is the data tall (many observations, few variables) or wide (many variables, few observations), and what does this imply for overfitting?
All Data Revolution Concept: Contrast it with the big data revolution and its application in marketing/business analytics.
Causality in Marketing
Causal Relationships: Understanding the impact of treatment (X) on outcome (Y) while controlling for other variables (ceteris paribus).
Valid causal conclusions from observational data require comparable units.
Utilizing Big Data for Approximate Experiments
Methodology: Find similar units among the data that exhibit variation in treatment (T) and observe outcome (Y).
Expectation: If Y differs between otherwise similar units, the difference is likely attributable to T rather than to omitted variables.
Research Example: Spotify's Impact
Research Strategy: Evaluating how Spotify adoption changes users’ listening behaviors.
Treatment (T): Spotify adoption, Outcome (Y): Listening habits.
Data Collection and Analysis Methodology
Utilize web scraping to collect data across multiple platforms.
Continuously monitor music engagement patterns before and after Spotify adoption.
User Group Analysis
Build comparable user profiles using metrics such as play count, genre, and demographics.
Estimate each user's propensity score (the likelihood of adopting Spotify) and use it to compare adopters with similar non-adopters.
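A minimal sketch of the comparison step, under several assumptions not stated in the notes: propensity scores are already estimated (for example, by a logistic regression on the profile metrics), each adopter is paired with the single nearest non-adopter (matching with replacement), and the effect is measured as the difference in before/after changes. The user records and field names are hypothetical.

```python
# Hypothetical users. "propensity" is an assumed, pre-computed probability of
# adopting Spotify; artists_pre/artists_post are weekly unique artists.
users = [
    {"id": "a1", "adopter": True, "propensity": 0.80, "artists_pre": 20, "artists_post": 27},
    {"id": "a2", "adopter": True, "propensity": 0.40, "artists_pre": 10, "artists_post": 13},
    {"id": "n1", "adopter": False, "propensity": 0.75, "artists_pre": 21, "artists_post": 22},
    {"id": "n2", "adopter": False, "propensity": 0.45, "artists_pre": 9, "artists_post": 10},
    {"id": "n3", "adopter": False, "propensity": 0.10, "artists_pre": 5, "artists_post": 5},
]


def change(user):
    """Change in weekly unique artists from the pre to the post period."""
    return user["artists_post"] - user["artists_pre"]


def match_nearest(adopters, non_adopters):
    """Pair each adopter with the non-adopter closest in propensity score."""
    return [
        (a, min(non_adopters, key=lambda n: abs(n["propensity"] - a["propensity"])))
        for a in adopters
    ]


def matched_effect(users):
    """Average difference in pre/post change between adopters and their matches."""
    adopters = [u for u in users if u["adopter"]]
    non_adopters = [u for u in users if not u["adopter"]]
    pairs = match_nearest(adopters, non_adopters)
    diffs = [change(a) - change(n) for a, n in pairs]
    return sum(diffs) / len(diffs), pairs


effect, pairs = matched_effect(users)
for adopter, control in pairs:
    print(adopter["id"], "matched with", control["id"])
print("estimated effect on weekly unique artists:", effect)
```

Comparing changes rather than levels removes stable differences between an adopter and the matched non-adopter, so only the difference in how their listening evolved is attributed to adoption.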
Findings from Spotify Study
Users exhibit:
Increased Quantity: 32% more artists listened to weekly.
Diverse Consumption: 7% fewer superstars; less concentration on personal favorites.
Discoverability: More new content discovered, with varying listening frequencies.
Summary of Research Strategies in Big Data
Descriptive: Track changes over time across different demographics; examine concentration and inequality.
Predictive: Can yield powerful insights, but guard against overfitting; combine big data with traditional data sources (the "all data" approach).
Causal: Leverage big data for comparative analysis akin to experimental designs.
Assignment Notice
Upcoming quiz on Canvas due next week.
Appendix: Methodology Insights (Optional)
Time series development for search queries within the US from 2003 to 2008.
Methods involving correlation estimates between ILI and search queries, measuring prevalence of flu symptoms.
Including lagged CDC ILI values as predictors to improve the time-series model.
Discussion on combining search data with historical lags for refined predictive modeling.