Instructor: Mohammed Alyakoob
Institution: University of Southern California (USC)
Topics Covered:
Data Sources Enabled by Digitization
Big Data and Management
New Data Characteristics: Volume, Velocity, Variety
Assessment of Data-Driven Decision Making
Analytics Types: Descriptive, Predictive, Prescriptive
Challenges in Predictive Analytics
Applications in Business and Sports
New Data Sources:
Online transactions
Social media platforms
Internet of Things (IoT)
Mobile app data
Benefits of Digitization:
Measurable consumer behavior and outcomes
Personalized offerings through data utilization
Quote: "You can’t manage what you don’t measure."
Expansion of big data enhances measurements:
Tracking of customer purchases and promotions
Predictive analytics for consumer buying patterns
Volume:
2.5 exabytes created daily (2012)
More data now cross the internet every second than were stored in the entire internet two decades earlier
Velocity:
Data processed in real-time
Variety:
Diverse sources include social networks and GPS data
Companies utilizing data-driven approaches are:
5% more productive on average
6% more profitable
Airline example: about 10% of flights had at least a 10-minute gap between estimated and actual arrival times
Combining data from multiple sources nearly eliminated the gap
Many rely on "HiPPO" (Highest Paid Person's Opinion)
Importance of data over intuition
Core questions to ask:
What do the data say?
Where do the data come from?
What analyses were conducted?
How confident are we in the results?
Data should drive decision-making processes
Descriptive Analytics:
Review of past data events using simple statistical methods
Generates insights but lacks depth for solution generation
Predictive Analytics:
Use existing data to forecast outcomes of unknown variables
Example: Beverage companies predicting consumption patterns
Prescriptive Analytics:
Identifies factors affecting certain outcomes
Focus on actionable insights and influencing decisions
Examples include fundraising strategies and donation likelihood
Key to recognizing data accuracy and usefulness
Example: eBay's reputation score led to new insights when understanding its process
Burden of proof reliant on controlled experiments
Businesses can leverage their online presence for experimentation
Misleading objectives
Focusing too much on software choices instead of data clarity
Avoiding premature number crunching
Figures like Billy Beane (Moneyball) using integrated analytics
Combining quantitative metrics with qualitative insights
Importance of multidisciplinary teams for success
Need for common language and simplified communication
Example: Kraft Group’s model for season ticket renewals
Emphasis on user-centered data analysis
Advocate for incremental investments leading to substantial impacts
Differences Between Interpretable and Flexible Models in Analytics
Interpretable Models:
Example: Linear Regression
Interpretation: If an independent variable (X) increases by one unit, the dependent variable (Y) is expected to change by β1 units, where β1 (beta one) is the coefficient on X. The change is additive, not multiplicative.
Advantages: Allows for understanding the relationship between variables. Easier to justify predictions and check individual outputs.
Limitations: Imposes a linearity assumption that may not match the process that actually generated the data.
Flexible Models:
Examples include deep learning and neural networks.
Flexibility: They do not impose strict assumptions and can achieve better prediction accuracy for complex datasets.
Disadvantages: Often lack interpretability, making it harder to understand the reasoning behind predictions.
Context Matters:
Targeted Advertising: When identifying potential buyers, flexibility is preferred as the focus is on high predictive power rather than understanding why individuals are likely to respond.
Understanding Relationships: When seeking to understand the relationship between variables (e.g., product price and sales), interpretability is crucial for extracting meaningful insights and understanding mechanisms behind decisions.
Predictive Tasks: For predictions where understanding the mechanism is not the goal (e.g., predicting visitor numbers for an open house), flexibility is more valuable.
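The trade-off above can be sketched in a few lines. This is a minimal illustration, not something from the lecture: the data are synthetic (a curved relationship, y = x^2), and a k-nearest-neighbours average stands in for the "flexible" model because it is much shorter to write than a neural network. The names fit_line, knn_predict, and mse are made up for this example.

```python
# Illustrative only: synthetic data and a k-nearest-neighbours stand-in for a
# "flexible" model; none of this comes from the lecture itself.

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def knn_predict(train_x, train_y, x, k=3):
    """Average the y-values of the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(preds, actual):
    """Mean squared prediction error."""
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)

# Curved relationship: y = x^2 on a grid; test on points between grid points.
train_x = [i / 2 for i in range(-10, 11)]   # -5.0, -4.5, ..., 5.0
train_y = [x ** 2 for x in train_x]
test_x = [x + 0.25 for x in train_x[:-1]]
test_y = [x ** 2 for x in test_x]

b0, b1 = fit_line(train_x, train_y)
linear_mse = mse([b0 + b1 * x for x in test_x], test_y)
knn_mse = mse([knn_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"linear: b0={b0:.2f}, b1={b1:.2f}, test MSE={linear_mse:.2f}")
print(f"k-NN:   no coefficients to read, test MSE={knn_mse:.2f}")
```

The linear model returns two numbers that can be read and justified, but its straight line cannot follow the curve. The k-NN model predicts better on this data yet offers no coefficient to interpret.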
Need to determine the effect of Superhost status on reservations
Optimal experimental design to unearth true relationships
Implementation of a switchback experiment to analyze wait times
Investigating the effectiveness through comparison groups
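A switchback experiment can be sketched as follows, assuming the usual setup in which the whole market receives the same condition during a time window and windows are randomly assigned to treatment or control. The wait-time numbers, the window length, and the function names are hypothetical and only illustrate the mechanics.

```python
import random
from statistics import mean

def assign_windows(n_windows, seed=0):
    """Randomly assign each time window to treatment (True) or control (False).

    In a switchback design the whole market gets the same condition during a
    window, so the unit of randomization is the window, not the user.
    """
    rng = random.Random(seed)
    half = n_windows // 2
    labels = [True] * half + [False] * (n_windows - half)
    rng.shuffle(labels)
    return labels

def switchback_effect(wait_times, labels):
    """Difference in mean wait time: treatment windows minus control windows."""
    treated = [w for w, t in zip(wait_times, labels) if t]
    control = [w for w, t in zip(wait_times, labels) if not t]
    return mean(treated) - mean(control)

# Hypothetical data: average wait (minutes) in each of 8 one-hour windows.
# Treatment is simulated to lower the wait by 2 minutes in every treated window.
labels = assign_windows(8, seed=42)
baseline = [10.0, 12.0, 11.0, 9.0, 10.0, 12.0, 11.0, 9.0]
wait_times = [b - 2.0 if t else b for b, t in zip(baseline, labels)]

# The estimate will generally differ from -2 because baseline demand differs
# across windows and only 8 windows are randomized.
effect = switchback_effect(wait_times, labels)
print(f"estimated effect on wait time: {effect:.2f} minutes")
```

With so few windows, the estimate can land some distance from the true effect. This is one way to see why window-level designs struggle to randomize as cleanly as user-level designs.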
Help define parameter space
Only focuses on a select group of respondents
Hypothetical answers are not always the same as actual choices
Less risky; does not impact users directly
Can more quickly explore more parameter combinations
Depends on historical data
Best at obtaining impact on market equilibrium
Best at reducing contamination between control and treatment
Not good at perfectly randomizing the assignment of units
Highest economic costs
Worst at reducing contamination between control and treatment
Best at being able to perfectly randomize
Best at detecting even small effects
Lowest economic cost
Directly assesses consumer behavior and decision-making
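The notes do not label which method each point belongs to. Assuming the points on perfect randomization and detecting small effects describe user-level A/B tests, the sketch below shows why sample size matters for small effects. It uses a standard two-proportion z-test; the conversion counts are hypothetical.

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for a user-level A/B test.

    Returns (difference in conversion rates, z statistic), using the pooled
    standard error under the null hypothesis of no difference.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return p_b - p_a, (p_b - p_a) / se

# Hypothetical counts: the same 1-percentage-point lift at two sample sizes.
small = ab_test(conv_a=100, n_a=1_000, conv_b=110, n_b=1_000)
large = ab_test(conv_a=10_000, n_a=100_000, conv_b=11_000, n_b=100_000)
print(f"n=1,000 per arm:   lift={small[0]:.3f}, z={small[1]:.2f}")
print(f"n=100,000 per arm: lift={large[0]:.3f}, z={large[1]:.2f}")
```

The lift is identical in both cases, but only the larger sample pushes the z statistic past the conventional 1.96 threshold. Randomizing individual users makes such large samples cheap to obtain.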
Importance of data understanding, quality analysis, proper methodologies, and clear communication in the context of analytics
Definition: Network analytics refers to the process of analyzing social and organizational networks to understand the relationships among different entities.
Application: Used in various fields, including business, social sciences, and information technology, to enhance decision-making processes.
Centralities: A key concept in network analytics that measures the relative importance of a node within a network.
Degree Centrality: The number of direct connections a node has. A high degree centrality indicates a highly connected node.
Closeness Centrality: Measures how quickly a node can access other nodes in the network. Nodes with high closeness centrality can reach others with fewer steps.
Betweenness Centrality: Captures the extent to which a node lies on the shortest path between other nodes. High betweenness centrality indicates a node’s potential to control information flow.
Importance in Data-Driven Decision Making: Understanding centralities in network analytics can help organizations identify key influencers, optimize resource allocation, and enhance strategic decision-making.
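The three centralities can be computed on a toy network as a minimal sketch. The five-node graph and the function names are hypothetical. Closeness is taken here as (n - 1) divided by the sum of shortest-path distances, and betweenness is left unnormalized; other conventions exist.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def degree_centrality(graph):
    """Number of direct connections per node."""
    return {n: len(nbs) for n, nbs in graph.items()}

def closeness_centrality(graph):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    return {n: (len(graph) - 1) / sum(bfs_distances(graph, n).values())
            for n in graph}

def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    score = {n: 0.0 for n in graph}
    for s in graph:
        order, dist = [], {s: 0}
        preds = {n: [] for n in graph}
        sigma = {n: 0 for n in graph}   # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in graph}
        for w in reversed(order):       # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {n: b / 2 for n, b in score.items()}  # each pair was counted twice

# Hypothetical network: a triangle A-B-C with a tail C-D-E.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
         "D": ["C", "E"], "E": ["D"]}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for n in graph:
    print(n, degree[n], round(closeness[n], 3), betweenness[n])
```

In this graph, node C scores highest on all three measures. Node D has the same degree as A and B but a much higher betweenness, because every shortest path to E passes through it.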
Beta Coefficients:
Beta Zero (β0): This is the intercept of the regression model. It is the expected value of the dependent variable when all independent variables are zero. When the independent variables are dummy (0/1) variables, it equals the average outcome for the baseline group in the sample.
Beta One (β1): This indicates how much the dependent variable is expected to change with a one-unit increase in an independent variable. If β1 is statistically significant (distinguishable from zero given the sampling noise), the data support an association between the independent and dependent variables. Significance alone does not show that the effect is large or causal.
Interpreting Beta Coefficients:
β0 tells us the predicted outcome when independent variables are at baseline levels.
β1 shows how much the dependent variable changes with the independent variable, keeping other variables constant.
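The baseline-group reading of β0 is easiest to see with a single 0/1 predictor. The sketch below uses hypothetical spending data and a made-up promotion dummy. With one dummy predictor, the fitted β0 equals the mean of the baseline group, and β1 equals the difference between the two group means.

```python
from statistics import mean

def simple_ols(xs, ys):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data: promo = 1 if the customer received a promotion, else 0.
promo = [0, 0, 0, 1, 1, 1]
spend = [20.0, 22.0, 24.0, 30.0, 31.0, 35.0]

b0, b1 = simple_ols(promo, spend)
baseline_mean = mean(s for p, s in zip(promo, spend) if p == 0)
promo_mean = mean(s for p, s in zip(promo, spend) if p == 1)

print(f"b0 = {b0:.2f}  (baseline group mean = {baseline_mean:.2f})")
print(f"b1 = {b1:.2f}  (difference in group means = {promo_mean - baseline_mean:.2f})")
```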
Interaction Terms in Regression Analysis
Definition: Interaction terms are created by multiplying two or more independent variables to assess how the relationship between one variable and the dependent variable changes based on another variable.
Purpose: They help analyze how the effect of one predictor variable (e.g., Salary) depends on the level of another predictor variable (e.g., whether the customer lives close to or far from competitors).
Example: Salary and Competition:
Scenario: Analyze how Salary affects spending behavior depending on residential location relative to competition (close vs. far).
Variables:
Salary (continuous)
FA (competition: 1 = far, 0 = close)
Interaction Term: FA * Salary
Model Specification:
The regression might look like: AmountSpent = β0 + β1(Salary) + β2(FA) + β3(Salary * FA) + ε
Coefficient Interpretation:
β0 (Intercept): Expected spending when Salary and FA are zero.
β1 (Salary Coefficient): Change in expected spending for each one-unit increase in Salary when FA = 0 (close to competitors).
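β2 (FA Coefficient): Expected difference in spending between customers living far from and close to competitors when Salary = 0. Because of the interaction term, the far-versus-close difference at any other salary level is β2 + β3 × Salary.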
β3 (Interaction Term Coefficient): Indicates how the effect of Salary on spending differs based on residential status (FA). A positive β3 means increased Salary leads to larger spending increases when far from competition.
Conclusion: Use interaction terms to understand complex relationships between variables in your model, analyzing how they affect the dependent variable differently based on the levels of other variables.
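The coefficient readings in this example can be checked numerically. The sketch below uses hypothetical, noise-free data. It relies on the fact that when FA is a 0/1 dummy and the model contains Salary, FA, and Salary * FA, the least-squares fit matches fitting a separate line within each group. The salary units and all numbers are assumptions made for the illustration.

```python
from statistics import mean

def simple_ols(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical, noise-free data (salary in $1,000s).
salary = [30.0, 50.0, 70.0, 90.0]
close_spend = [100 + 2.0 * s for s in salary]   # FA = 0 (close to competitors)
far_spend = [150 + 5.0 * s for s in salary]     # FA = 1 (far from competitors)

# Fit one line per group, then map onto the interaction-model coefficients.
a_close, slope_close = simple_ols(salary, close_spend)
a_far, slope_far = simple_ols(salary, far_spend)

b0 = a_close                  # intercept for the FA = 0 group
b1 = slope_close              # Salary slope when FA = 0
b2 = a_far - a_close          # far-minus-close gap at Salary = 0
b3 = slope_far - slope_close  # extra Salary slope when FA = 1

def predict(s, fa):
    """AmountSpent = b0 + b1*Salary + b2*FA + b3*Salary*FA."""
    return b0 + b1 * s + b2 * fa + b3 * s * fa

gap_at_50 = predict(50.0, 1) - predict(50.0, 0)   # equals b2 + b3 * 50
print(f"b0={b0:.1f}, b1={b1:.1f}, b2={b2:.1f}, b3={b3:.1f}")
print(f"Salary slope when far: {b1 + b3:.1f}; far-close gap at salary 50: {gap_at_50:.1f}")
```

The Salary slope is β1 for close customers and β1 + β3 for far customers. The far-versus-close gap is not a single number: it grows with Salary whenever β3 is positive.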