DSO 510 Final Notes

DSO 510: Business Analytics

  • Instructor: Mohammed Alyakoob

  • Institution: University of Southern California (USC)

Course Overview

  • Topics Covered:

    • Data Sources Enabled by Digitization

    • Big Data and Management

    • New Data Characteristics: Volume, Velocity, Variety

    • Assessment of Data-Driven Decision Making

    • Analytics Types: Descriptive, Predictive, Prescriptive

    • Challenges in Predictive Analytics

    • Applications in Business and Sports

Introduction to Digitization

  • New Data Sources:

    • Online transactions

    • Social media platforms

    • Internet of Things (IoT)

    • Mobile app data

  • Benefits of Digitization:

    • Measurable consumer behavior and outcomes

    • Personalized offerings through data utilization

Big Data Impact

  • Quote: "You can’t manage what you don’t measure."

  • Expansion of big data enhances measurements:

    • Tracking of customer purchases and promotions

    • Predictive analytics for consumer buying patterns

Characteristics of Big Data

  • Volume:

    • 2.5 exabytes created daily (2012)

    • More data crosses the internet every second than was stored in the entire internet 20 years ago

  • Velocity:

    • Data processed in real-time

  • Variety:

    • Diverse sources include social networks and GPS data

Performance of Data-Driven Companies

  • Companies utilizing data-driven approaches are:

    • 5% more productive on average

    • 6% more profitable

Practical Applications

Case Study: Major U.S. Airline

  • Found that roughly 10% of flights had at least a 10-minute gap between estimated and actual arrival times

  • Combined data from multiple sources to nearly eliminate the gap

Culture Shift in Decision Making

  • Many rely on "HiPPO" (Highest Paid Person's Opinion)

  • Importance of data over intuition

Executives' Approach to Data

  • Core questions to ask:

    1. What do the data say?

    2. Where do the data come from?

    3. What analyses were conducted?

    4. How confident are we in the results?

  • Data should drive decision-making processes

Types of Analytics

Descriptive Analytics

  • Review of past data events using simple statistical methods

  • Generates insights but lacks depth for solution generation

Predictive Analytics

  • Uses existing data to forecast unknown or future outcomes

    • Example: Beverage companies predicting consumption patterns

Prescriptive Analytics

  • Identifies factors affecting certain outcomes

  • Focus on actionable insights and influencing decisions

  • Examples include fundraising strategies and donation likelihood

Importance of Understanding Data Sources

  • Key to recognizing data accuracy and usefulness

  • Example: understanding how eBay’s reputation scores are generated led to new insights about the data

Evidence-Based Decision Making

  • The burden of proof rests on controlled experiments

  • Businesses can leverage their online presence for experimentation

Common Pitfalls in Predictive Analytics

  • Misleading objectives

  • Focusing too much on software choices instead of data clarity

  • Premature number crunching before the problem is clearly defined

Lessons from Sports Industry

  • Figures like Billy Beane (Moneyball) integrated analytics into team decision-making

  • Combining quantitative metrics with qualitative insights

Collaborative Analytics

  • Importance of multidisciplinary teams for success

  • Need for common language and simplified communication

Accessible Technology

  • Example: Kraft Group’s model for season ticket renewals

  • Emphasis on user-centered data analysis

Small Wins Strategy

  • Advocate for incremental investments leading to substantial impacts

Prediction vs. Inference in Analytics

  • Differences Between Interpretable and Flexible Models in Analytics (a code sketch comparing the two follows this list)

    • Interpretable Models:

      • Example: Linear Regression

      • Interpretation: If an independent variable (X) increases by one unit, the dependent variable (Y) is expected to change by the amount of the coefficient (β1), holding other variables constant.

      • Advantages: Allows for understanding the relationship between variables. Easier to justify predictions and check individual outputs.

      • Limitations: Imposes the assumption of a linear relationship, which may not reflect the true process that generated the data.

    • Flexible Models:

      • Examples include deep learning and neural networks.

      • Flexibility: They do not impose strict assumptions and can achieve better prediction accuracy for complex datasets.

      • Disadvantages: Often lack interpretability, making it harder to understand the reasoning behind predictions.

    • Context Matters:

      • Targeted Advertising: When identifying potential buyers, flexibility is preferred as the focus is on high predictive power rather than understanding why individuals are likely to respond.

      • Understanding Relationships: When seeking to understand the relationship between variables (e.g., product price and sales), interpretability is crucial for extracting meaningful insights and understanding mechanisms behind decisions.

      • Predictive Tasks: For predictions unrelated to understandability (e.g., predicting visitor numbers for an open house), flexibility is more valuable.
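
To make the trade-off concrete, here is a minimal sketch (not from the course materials) that fits an interpretable linear regression and a more flexible random forest to the same synthetic data. The data, variable names, and the use of scikit-learn are illustrative assumptions.

```python
# Minimal sketch (synthetic data): an interpretable linear model vs. a flexible one.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))                              # three made-up predictors
y = 2.0 * X[:, 0] + 5.0 * np.sin(X[:, 1]) + rng.normal(0, 1, 500)  # nonlinear "truth"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Interpretable: each coefficient is the expected change in y per one-unit change in that X
lin = LinearRegression().fit(X_train, y_train)
print("Linear coefficients:", lin.coef_)
print("Linear test MSE:   ", mean_squared_error(y_test, lin.predict(X_test)))

# Flexible: often better accuracy on complex patterns, but no single coefficient to read off
rf = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print("Forest test MSE:   ", mean_squared_error(y_test, rf.predict(X_test)))
```

Which model is preferable depends on whether the goal is prediction or inference, as the context examples above illustrate.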

Airbnb Example

  • Need to determine the effect of Superhost status on reservations

  • Optimal experimental design to unearth true relationships

Uber Case Study

  • Implementation of a switchback experiment to analyze wait times (see the sketch after this section)

  • Effectiveness investigated by comparing outcomes across treated and untreated time blocks
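
As a rough illustration of the switchback idea (not Uber’s actual implementation), the sketch below alternates a whole market between treatment and control in fixed time blocks and compares average wait times; all numbers, including the half-minute effect, are invented.

```python
# Minimal sketch (synthetic data) of analyzing a switchback experiment.
import numpy as np

rng = np.random.default_rng(1)
n_blocks = 48                                   # e.g., hourly blocks over two days
treated = np.arange(n_blocks) % 2 == 0          # alternate treatment on/off each block

base_wait = rng.normal(6.0, 1.0, n_blocks)      # baseline average wait (minutes) per block
wait = base_wait - 0.5 * treated                # hypothetical effect: -0.5 minutes when treated

estimate = wait[treated].mean() - wait[~treated].mean()
print(f"Estimated effect on wait time: {estimate:.2f} minutes")
```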

Data Collection Techniques

Surveys

  • Help define the parameter space

  • Only capture a select group of respondents

  • Hypothetical answers do not always match actual choices

Simulations

  • Less risky: no direct impact on users

  • Can explore many parameter combinations quickly

  • Dependent on historical data

Synthetic Control

  • Best at capturing effects on market equilibrium

  • Best at reducing contamination between control and treatment

  • Cannot perfectly randomize the assignment of units

  • Highest economic cost

A/B Testing

  • Worst at reducing contamination between control and treatment

  • Best at being able to perfectly randomize

  • Best at detecting even small effects

  • Lowest economic cost (a minimal analysis sketch follows this list)
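
A minimal sketch of how an A/B test might be analyzed, assuming the metric is a conversion rate and using a two-proportion z-test from statsmodels; the counts are invented for illustration.

```python
# Minimal sketch (made-up counts): comparing conversion rates between variants A and B.
from statsmodels.stats.proportion import proportions_ztest

conversions = [480, 530]      # converted users in control (A) and treatment (B)
visitors = [10000, 10000]     # users randomly assigned to each variant

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the observed difference is unlikely to be due to chance alone.
```

Because assignment is randomized at the user level, even small differences can be detected with enough traffic, which is the point made above.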

Experimental Data

  • Directly assesses consumer behavior and decision-making

Conclusions

  • Importance of data understanding, quality analysis, proper methodologies, and clear communication in the context of analytics

Network Analytics Overview

  • Definition: Network analytics refers to the process of analyzing social and organizational networks to understand the relationships among different entities.

  • Application: Used in various fields, including business, social sciences, and information technology, to enhance decision-making processes.

  • Centralities: A key concept in network analytics that measures the relative importance of a node within a network (a computation sketch follows this section).

    • Degree Centrality: The number of direct connections a node has. A high degree centrality indicates a highly connected node.

    • Closeness Centrality: Measures how quickly a node can access other nodes in the network. Nodes with high closeness centrality can reach others with fewer steps.

    • Betweenness Centrality: Captures the extent to which a node lies on the shortest path between other nodes. High betweenness centrality indicates a node’s potential to control information flow.

  • Importance in Data-Driven Decision Making: Understanding centralities in network analytics can help organizations identify key influencers, optimize resource allocation, and enhance strategic decision-making.
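
A minimal sketch computing the three centrality measures on a small hypothetical graph with networkx; the nodes and edges are made up.

```python
# Minimal sketch (toy graph): degree, closeness, and betweenness centrality.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

print("Degree:     ", nx.degree_centrality(G))        # share of direct connections
print("Closeness:  ", nx.closeness_centrality(G))     # how quickly a node reaches all others
print("Betweenness:", nx.betweenness_centrality(G))   # share of shortest paths passing through
```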

General Regression Interpretation

  • Beta Coefficients:

    • Beta Zero (β0): This is the intercept of the regression model. It shows the expected average value of the dependent variable when all independent variables are zero. It reflects the average outcome for the baseline group in a sample.

    • Beta One (β1): This indicates how much the dependent variable is expected to change with a one-unit increase in an independent variable. If β1 is significant (not zero), it shows a meaningful relationship between the independent and dependent variables.

  • Interpreting Beta Coefficients:

    • β0 tells us the predicted outcome when independent variables are at baseline levels.

    • β1 shows how much the dependent variable changes with the independent variable, keeping other variables constant.

  • Interaction Terms in Regression Analysis

    • Definition: Interaction terms are created by multiplying two or more independent variables to assess how the relationship between one variable and the dependent variable changes based on another variable.

    • Purpose: They help analyze how the effect of one predictor variable (e.g., Salary) depends on the level of another predictor variable (e.g., residential competition).

    • Example: Salary and Competition

      • Scenario: Analyze how Salary affects spending behavior depending on residential location relative to competition (close vs. far).

      • Variables:

        • Salary (continuous)

        • FA (competition indicator: 1 = far, 0 = close)

        • Interaction term: FA * Salary

      • Model specification (a fitting sketch in code appears at the end of this section):

        • AmountSpent = β0 + β1(Salary) + β2(FA) + β3(Salary * FA) + ε

  • Coefficient Interpretation:

    • β0 (Intercept): Expected spending when Salary and FA are zero.

    • β1 (Salary Coefficient): Change in expected spending for each one-unit increase in Salary when FA = 0 (close to competitors).

    • β2 (FA Coefficient): Expected difference in spending for customers living far versus close to competitors, holding Salary constant.

    • β3 (Interaction Term Coefficient): Indicates how the effect of Salary on spending differs based on residential status (FA). A positive β3 means increased Salary leads to larger spending increases when far from competition.

  • Conclusion: Use interaction terms to understand complex relationships between variables in your model, analyzing how they affect the dependent variable differently based on the levels of other variables.
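
To tie the coefficient interpretations together, here is a minimal sketch (synthetic data, statsmodels assumed) that simulates the AmountSpent model above and recovers β0 through β3; the true values used in the simulation are arbitrary.

```python
# Minimal sketch (synthetic data): AmountSpent = b0 + b1*Salary + b2*FA + b3*(Salary*FA) + error
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "Salary": rng.uniform(30, 150, n),   # hypothetical salary in $000s
    "FA": rng.integers(0, 2, n),         # 1 = far from competitors, 0 = close
})
# Simulate spending where Salary matters more when far from competition (b3 > 0)
df["AmountSpent"] = (100 + 3 * df["Salary"] + 50 * df["FA"]
                     + 2 * df["Salary"] * df["FA"] + rng.normal(0, 40, n))

fit = smf.ols("AmountSpent ~ Salary * FA", data=df).fit()  # expands to Salary + FA + Salary:FA
print(fit.params)   # Intercept = b0, Salary = b1, FA = b2, Salary:FA = b3
```

A positive Salary:FA coefficient here matches the interpretation above: the effect of Salary on spending is larger for customers who live far from competitors.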