Break-off
The respondent quits halfway through the survey.
Omission
They finish the survey but skip a few specific questions.
You can keep a respondent only if they answered more than __ of the questions and answered the key questions.
80%
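A minimal Python sketch of this screening rule. It assumes each respondent is stored as a dict of answers with `None` for a skipped question, and that every key question must be answered. The question names and the `KEY_QUESTIONS` list are hypothetical.

```python
def keep_respondent(answers, key_questions, threshold=0.80):
    """Keep only if MORE than `threshold` of all questions are answered
    and every key question is answered."""
    answered = sum(1 for value in answers.values() if value is not None)
    share_answered = answered / len(answers)
    keys_answered = all(answers.get(q) is not None for q in key_questions)
    return share_answered > threshold and keys_answered

# Hypothetical survey with 6 questions; None = skipped
respondents = {
    "r1": {"q1": 5, "q2": 4, "q3": 6, "q4": 3, "q5": 7, "q6": 2},        # complete
    "r2": {"q1": 5, "q2": None, "q3": None, "q4": 3, "q5": 7, "q6": 2},  # 4/6 answered
    "r3": {"q1": None, "q2": 4, "q3": 6, "q4": 3, "q5": 7, "q6": 2},     # key question skipped
    "r4": {"q1": 5, "q2": 4, "q3": 6, "q4": 3, "q5": 7, "q6": None},     # 5/6 answered
}
KEY_QUESTIONS = ["q1"]  # hypothetical key question

kept = [rid for rid, ans in respondents.items() if keep_respondent(ans, KEY_QUESTIONS)]
print(kept)  # ['r1', 'r4']
```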
Imputation
The process of replacing those few missing answers with estimated values so you have a complete dataset to analyze.
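The card doesn't name an imputation method. As one simple illustration (an assumption, not necessarily the method used in the course), mean imputation fills each missing answer with the average of the answers that were given for that question:

```python
from statistics import mean

def mean_impute(values):
    """Replace each None with the mean of the observed values in the same column."""
    observed = [v for v in values if v is not None]
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in values]

# Hypothetical answers to one satisfaction question; None = skipped
satisfaction = [5, None, 7, 4, None, 6]
print(mean_impute(satisfaction))  # [5, 5.5, 7, 4, 5.5, 6]
```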
Reliability (Cronbach's Alpha)
Used to check if multiple survey items measure the same underlying construct. Alpha must be >0.70 to merge items into a mean scale.
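A sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the summed score), in Python. The three items and their scores are made-up data:

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of item columns, each a list of scores from the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's summed score
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical 3-item scale answered by 5 respondents
items = [
    [4, 5, 3, 5, 2],
    [4, 4, 3, 5, 3],
    [5, 5, 2, 4, 2],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))  # 0.897 -> above 0.70, so the items can be merged into a mean scale
```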
Recoding
Flipping (reverse-coding) scales so all items point in the same direction, e.g., on a 1-7 scale a 7 becomes a 1
Computing
Creating mean indices (averaging several items into one scale score)
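A small Python sketch of both steps, assuming a 1-7 scale and a hypothetical three-item scale whose third item is negatively worded. Reverse-coding uses new = (min + max) - old:

```python
from statistics import mean

def reverse_code(score, scale_min=1, scale_max=7):
    # On a 1-7 scale: 1 <-> 7, 2 <-> 6, 3 <-> 5, and 4 stays 4
    return scale_min + scale_max - score

def mean_index(*item_scores):
    # Average several items into one scale score
    return mean(item_scores)

# Hypothetical respondent: item 3 is negatively worded, so flip it first
item1, item2, item3_raw = 6, 5, 2
item3 = reverse_code(item3_raw)  # 2 -> 6
satisfaction_index = mean_index(item1, item2, item3)  # (6 + 5 + 6) / 3, about 5.67
```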
Null Hypothesis
There is no difference.
Alternative Hypothesis
What we hope to prove
Descriptive statistics
Summarize and describe the data you collected.
ex: mean, differences, standard deviation
Test statistic
Accounts for sampling error when judging whether an observed result is real
ex: t, F, χ²
P-value
The probability of getting results at least as extreme as the ones observed, if the claim (null hypothesis) is true.
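Decision rule: α (significance level)
How sure you want to be that your effect is not due to random error. α is the risk you accept of rejecting H0 when it is actually true (usually 0.05).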
Steps for hypothesis testing
- Formulate H0 and H1
- Select appropriate analysis and test
- Choose level of significance (usually α = 0.05)
- Calculate test statistic
- Determine p-value associated with test statistic
- Compare p-value with chosen level of significance
- Reject or fail to reject H0
- Draw marketing research conclusion
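The compare-and-decide steps can be sketched as a tiny Python helper. The function name is hypothetical; in practice the p-value comes from SPSS (or a stats library):

```python
def hypothesis_decision(p_value, alpha=0.05):
    """Compare the p-value with alpha and decide about H0."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must be between 0 and 1")
    if p_value < alpha:
        return "Reject H0"  # the effect is statistically significant
    return "Fail to reject H0"  # not enough evidence; this is not proof that H0 is true

print(hypothesis_decision(0.03))              # Reject H0
print(hypothesis_decision(0.03, alpha=0.01))  # Fail to reject H0
```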
Categorical independent variable
Compare groups
Continuous independent variable
Look at associations (0,1,2,3,4,5,...)
Outliers
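Values that stick out from the rest of the data
- Circles on the box plot (more than 1.5 box lengths beyond the edge of the box)
Extreme Values
The asterisks on a box plot (more than 3 box lengths beyond the edge of the box)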
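A sketch of the 1.5× and 3× box-length rule in Python. Assumption: quartiles come from `statistics.quantiles(..., method="inclusive")`, which can differ slightly from the hinges SPSS draws, so borderline points may be classified differently. The data are made up:

```python
from statistics import quantiles

def classify_boxplot_points(data):
    """Flag outliers (1.5-3 box lengths out) and extreme values (> 3 box lengths out)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1  # the interquartile range (IQR)
    flagged = []
    for x in sorted(data):
        distance = max(q1 - x, x - q3, 0)  # how far the point sits outside the box
        if distance > 3 * box_length:
            flagged.append((x, "extreme value (asterisk)"))
        elif distance > 1.5 * box_length:
            flagged.append((x, "outlier (circle)"))
    return flagged

data = [10, 11, 12, 12, 13, 14, 15, 16, 24, 40]
print(classify_boxplot_points(data))
# [(24, 'outlier (circle)'), (40, 'extreme value (asterisk)')]
```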
One-sample T-test
Use this when you are comparing the mean of one single group against a single standard number or benchmark.
ex: Are customers actually happy compared to the neutral midpoint of 4?
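A stdlib-only Python sketch of the one-sample t statistic, t = (mean - benchmark) / (s / √n). The ratings are hypothetical, and the p-value lookup is left to SPSS or a library such as `scipy.stats.ttest_1samp`:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(scores, benchmark):
    """t = (sample mean - benchmark) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)
    t = (mean(scores) - benchmark) / standard_error
    return t, n - 1

# Hypothetical 1-7 satisfaction ratings tested against the neutral midpoint of 4
ratings = [5, 6, 4, 5, 6, 5, 7, 4]
t, df = one_sample_t(ratings, benchmark=4)
print(f"t({df}) = {t:.2f}")  # t(7) = 3.42; a positive t means the mean is above 4
```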
Paired samples T-test
Use this test when you are comparing the exact same people at two time points.
Step 1: See if p < 0.05
Step 2: Which mean is higher? This tells you the direction of change.
ex: Did last month's mobile app push notification promotion significantly increase consumer spending?
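A stdlib-only Python sketch of the paired t statistic, computed on each person's after-minus-before difference. The spending figures are hypothetical; `scipy.stats.ttest_rel` is one library option for the p-value:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-test statistic on the per-person differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("each person needs both a before and an after score")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical weekly spending ($) for the same 4 customers before/after the push promotion
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t, df = paired_t(before, after)
# Step 2: compare the two means to get the direction of change
direction = "increase" if mean(after) > mean(before) else "decrease or no change"
```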
Independent samples T-test
Use this test when you are comparing two distinct, independent groups of people against each other to see if their averages on a continuous variable are different.
Step 1: Levene's test (tells you which row of the output to read)
Step 2: Check whether p < 0.05 in that row
ex: Do drive-thru customers buy as much as walk-in customers?
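A stdlib-only Python sketch of the two t statistics behind the two output rows: the pooled-variance t ("equal variances assumed") and Welch's t ("equal variances not assumed"). The group data are hypothetical; `scipy.stats.ttest_ind` with `equal_var=True/False` is one library option for the p-values:

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b, equal_variances=True):
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = mean(group_a), mean(group_b)
    v1, v2 = variance(group_a), variance(group_b)
    if equal_variances:
        # Top row: pool the two sample variances
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Bottom row: Welch's t-test with Welch-Satterthwaite degrees of freedom
        standard_error = sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / standard_error, df

# Hypothetical order values ($): drive-thru vs walk-in customers
drive_thru = [8.5, 9.0, 7.5, 10.0, 8.0]
walk_in = [11.0, 12.5, 9.5, 13.0, 10.5]
t_equal, df_equal = independent_t(drive_thru, walk_in, equal_variances=True)
t_welch, df_welch = independent_t(drive_thru, walk_in, equal_variances=False)
```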
Homogeneity of Variances (Levene's Test)
Is the spread of the data roughly the same for both groups?
- If p > 0.05, variances are equal (read the top row, "Equal variances assumed")
- If p < 0.05, variances are not equal (read the bottom row, "Equal variances not assumed")
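A stdlib-only Python sketch of the classic mean-centered Levene statistic: a one-way ANOVA F computed on each observation's absolute distance from its own group mean. The p-value (from an F distribution with the returned degrees of freedom) is left to SPSS or a library; note that `scipy.stats.levene` defaults to `center='median'`, so `center='mean'` is the closer match to this sketch:

```python
from statistics import mean

def levene_w(*groups):
    """Mean-centered Levene statistic and its (df1, df2)."""
    # Absolute deviation of every score from its own group's mean
    deviations = [[abs(y - mean(g)) for y in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([z for d in deviations for z in d])
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    w = (between / (k - 1)) / (within / (n_total - k))
    return w, (k - 1, n_total - k)

# Hypothetical groups: the second is twice as spread out as the first
w, dfs = levene_w([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```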
One-way ANOVA and Post-Hoc tests
Use this test when comparing 3 or more distinct groups.
ex: Which of the 3 coffee shops has the absolute best overall customer satisfaction?
Step 1: The F-test (overall ANOVA)
- Is the overall p-value (Sig.) < 0.05?
- A significant F does not tell you which shops differ
Step 2: The post-hoc test (finding the best one)
- Do not do this step if the overall p-value is not significant
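A stdlib-only Python sketch of the overall F-test, F = (between-group variance) / (within-group variance). The coffee-shop ratings are hypothetical. With only two groups, this F equals the square of the pooled independent-samples t:

```python
from statistics import mean

def one_way_anova_f(*groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical satisfaction ratings for 3 coffee shops
shop_a = [7, 8, 6, 7]
shop_b = [5, 6, 5, 4]
shop_c = [8, 9, 9, 8]
f, df1, df2 = one_way_anova_f(shop_a, shop_b, shop_c)
```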
LSD
Use if you decided which groups to compare BEFORE collecting data.
Tukey
Use to compare EVERYTHING (all pairs of groups) when groups have similar sample sizes
Scheffé
Use if your groups have vastly DIFFERENT sample sizes.
Chi-square test of independence & Z-tests
Use this test when you want to see if 2 categorical variables are related to one another, and you specifically need to know which groups within them are driving the differences.
ex: Do different travel styles (solo, couple, family) book significantly different types of places to stay?
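A stdlib-only Python sketch of the chi-square statistic: compare each observed count with the count expected if the two variables were unrelated (row total × column total / N). The travel-style counts are hypothetical, the follow-up Z-tests are not shown, and the p-value is left to SPSS or a library:

```python
def chi_square_independence(table):
    """table: rows = categories of variable 1, columns = categories of variable 2."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count if unrelated
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = solo, couple, family; columns = hotel, rental, hostel
bookings = [
    [20, 10, 30],
    [40, 25, 5],
    [15, 45, 5],
]
chi2, df = chi_square_independence(bookings)  # df = (3 - 1) * (3 - 1) = 4
```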
One-sample Binomial Test
Use this test when you want to compare the proportion of a binary (yes/no, 1/2) variable against a known industry baseline or target percentage.
ex: Do our travelers buy insurance at a significantly different rate than the 20% industry baseline?
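A stdlib-only Python sketch of an exact two-sided binomial test. Assumption: "two-sided" is defined as the total probability of all outcomes no more likely than the observed one (similar to `scipy.stats.binomtest`); other software may use a different two-sided convention, so compare with your own output. The 30-out-of-100 figure is hypothetical:

```python
from math import comb

def binomial_pmf(k, n, p):
    # Probability of exactly k "yes" answers out of n when the true rate is p
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_test_two_sided(k, n, p):
    observed = binomial_pmf(k, n, p)
    # Sum every outcome that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties)
    p_value = sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)

# Hypothetical: 30 of 100 travelers bought insurance vs the 20% industry baseline
p_value = binomial_test_two_sided(30, 100, 0.20)
```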
Bivariate Correlations
Use this test when you want to measure the strength and direction of a linear relationship between two numeric (continuous and/or ordinal) variables.
ex: Does pre-trip stress have a significant relationship with overall trip satisfaction?
Statistical decisions for bivariate correlations
- Step 1: Decide the type of correlation
--- use PEARSON if both variables are continuous/scale
--- use SPEARMAN if one or both variables are ordinal/ranked
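A stdlib-only Python sketch of both coefficients. Spearman's rho is Pearson's r computed on the ranks (ties get the average of their ranks), which is why it suits ordinal data. The example values are made up:

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# A perfectly monotonic but curved relationship: Spearman = 1, Pearson < 1
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(round(spearman_rho(x, y), 3), round(pearson_r(x, y), 3))
```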
The relationships
-Positive: variables move up together
-Negative: one goes up, one goes down
-Closer to 1 or -1: Strong
-Closer to 0: Weak
Variance Explained (r^2)
Square the Pearson r value to get the proportion of variance in one variable explained by the other (multiply by 100 for a percentage). DO NOT do this for Spearman.
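Worked arithmetic with a hypothetical Pearson r of 0.60:

```latex
r = 0.60 \;\Rightarrow\; r^2 = (0.60)^2 = 0.36 \;\Rightarrow\; 36\% \text{ of the variance in one variable is explained by the other}
```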
Regressions
A statistical tool that allows us to play detective. It helps us understand how different factors (Independent Variables) predict a specific outcome (Dependent Variable).
How the shape of your data dictates the regression
The shape of your dependent variable (DV) dictates which type of regression you are legally allowed to use!
Continuous/normal DV (The Bell Curve)
Most users cluster around an average value, with fewer at the extreme high or low ends.
- The rule: perfect for linear regression
Count DV / Heavy Skew (The Long Tail)
Most people have a few, but some power users have a ton, pulling the tail to the right
- The rule: Use a Poisson or negative binomial regression
Ordinal DV (Ranked Outcomes)
(ex: 1 = low, 2 = medium, 3 = high)
- The rule: use ordinal regression
Binary DV (The two-point wall)
0=no, 1=yes
- The rule: You must use logistic regression
Multicategorical DV (unordered buckets of outcomes)
subscription tier (1 = free, 2 = premium, 3 = plus)
- The rule: You cannot use linear regression; use multinomial logistic regression
Bivariate Regression (Simple Linear)
Draws the "line of best fit" to show how one independent variable predicts one continuous dependent variable.
Unstandardized B
The raw impact. "A 1-unit increase in ... yields a $B increase in ..."
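A stdlib-only Python sketch of the least-squares line: B = covariance(x, y) / variance(x) and intercept = mean(y) - B × mean(x). The visits-vs-CLV numbers are hypothetical:

```python
def simple_regression(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx      # unstandardized B (slope)
    a = my - b * mx    # intercept (constant)
    return a, b

# Hypothetical: store visits per month (IV) vs customer lifetime value in $ (DV)
visits = [1, 2, 3, 4]
clv = [110, 130, 150, 170]
intercept, b = simple_regression(visits, clv)
print(f"A 1-unit increase in visits yields a ${b:.0f} increase in CLV")  # $20
```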
Multivariate Regression (Linear)
Tests multiple IVs simultaneously against one DV. This allows us to see which lever is the strongest driver of customer lifetime value (CLV) when they are all competing against each other.
Things to look for in a multivariate model
- Adjusted R-square
- Standardized Beta
- Multicollinearity (VIF)
Adjusted R-square
Look at the adjusted R-square if you have more than one predictor; unlike R-square, it penalizes the model for adding predictors that do not improve the fit.
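A Python sketch of the standard adjustment, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The R² and sample size are hypothetical; the point is that the same R² earns less credit when it takes more predictors to reach it:

```python
def adjusted_r_squared(r_squared, n, k):
    """n = sample size, k = number of predictors (IVs)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Same R-square of 0.40 with 50 respondents, but 2 vs 10 predictors
print(round(adjusted_r_squared(0.40, 50, 2), 3))   # 0.374
print(round(adjusted_r_squared(0.40, 50, 10), 3))  # 0.246
```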
Standardized Beta
Use the standardized beta (the largest absolute value) to identify the strongest predictor; unlike B, it does not depend on the units each IV is measured in.
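A Python sketch of why standardized betas are comparable: beta = B × (SD of the IV / SD of the DV), which removes the units. The B values and standard deviations below are hypothetical:

```python
def standardized_beta(b, sd_x, sd_y):
    # Converts a raw B into "SDs of the DV per 1 SD of the IV"
    return b * sd_x / sd_y

SD_CLV = 40.0  # hypothetical SD of the DV
predictors = {
    # name: (unstandardized B, SD of that IV), all hypothetical
    "price": (-5.0, 2.0),
    "loyalty_points": (0.5, 30.0),
}
betas = {name: standardized_beta(b, sd, SD_CLV) for name, (b, sd) in predictors.items()}
strongest = max(betas, key=lambda name: abs(betas[name]))
print(betas, strongest)  # price -0.25, loyalty_points 0.375 -> loyalty_points
```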
Multicollinearity (VIF)
- if VIF>4, there are issues: the variables capture the same variance, so the estimates become unreliable. Remove one of the offending variables at a time and re-run the model until every VIF falls below 4.
- if VIF<4, there are no issues.
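A Python sketch of the VIF formula, VIF_j = 1 / (1 - R²_j), where R²_j is how well predictor j is explained by the other predictors. With only two predictors, R²_j is just their squared Pearson correlation. The predictor values are made up:

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_from_r_squared(r_squared_j):
    """r_squared_j: R-square from regressing predictor j on all OTHER predictors."""
    return 1 / (1 - r_squared_j)

def vif_two_predictors(x1, x2):
    return vif_from_r_squared(pearson_r(x1, x2) ** 2)

print(vif_from_r_squared(0.75))  # 4.0 -> right at the cutoff
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(round(vif_two_predictors(x1, x2), 2))  # r = 0.8 -> VIF about 2.78
```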
Problems with OLS (Linear Regression)
It strictly assumes your dependent variable is continuous and normally distributed. If your DV is binary (or otherwise non-normal, such as a count), the solution is to use a generalized linear model (GLM).
Binary Logistic Regression
- used for binary dependent variable (predicting a 'yes or no' outcome)
Steps for the Binary Logistic Regression
-Step 1: Look at the omnibus tests of model coefficients table. If the "sig" column is < 0.05, the overall model is statistically significant.
-Step 2: Look at the pseudo R² (the Nagelkerke R Square value) in the model summary table.
-Step 3: Calculate (Exp(B)-1)*100 to find the percentage effect.
An Exp(B) of 1.000 means...
No change in the odds of the outcome for a 1-unit increase in the IV.
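The Step 3 arithmetic, (Exp(B)-1)*100, as a small Python helper. The Exp(B) values are hypothetical; Exp(B) is e raised to the coefficient B, so B = 0 gives Exp(B) = 1.000:

```python
from math import exp

def percent_change_in_odds(exp_b):
    """(Exp(B) - 1) * 100: % change in the odds for a 1-unit increase in the IV."""
    return (exp_b - 1) * 100

for exp_b in (1.25, 1.000, 0.80):
    print(exp_b, f"{percent_change_in_odds(exp_b):+.1f}%")
# 1.25 +25.0%
# 1.0 +0.0%
# 0.8 -20.0%
```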
Multinomial Logistic Regression
- Used when we want to predict a nominal dependent variable with more than two outcomes that do not have a specific ranked order.
- Make sure to set a 'reference category' because the model needs a baseline to compare the categories to (ex: premium vs. free, plus vs. free)
- The Exp(B) tells you how a predictor changes the odds of choosing the specific tier compared to staying on the Free tier.
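A Python sketch of how the reference category works in a multinomial logit: the reference tier's score is fixed at 0, every other tier gets its own intercept and B, and a 1-unit increase in the predictor multiplies the odds of that tier vs. Free by Exp(B). All coefficients and the predictor (imagine weekly usage hours) are hypothetical:

```python
from math import exp

# Hypothetical (intercept, B) for each non-reference tier; "free" is the reference
COEFS = {"premium": (-1.0, 0.4), "plus": (-2.0, 0.7)}

def tier_probabilities(x):
    scores = {"free": 0.0}  # reference category is fixed at 0
    scores.update({tier: b0 + b1 * x for tier, (b0, b1) in COEFS.items()})
    denominator = sum(exp(s) for s in scores.values())
    return {tier: exp(s) / denominator for tier, s in scores.items()}

p3, p4 = tier_probabilities(3), tier_probabilities(4)
# Odds of premium vs. free at x = 4, divided by the same odds at x = 3
odds_ratio = (p4["premium"] / p4["free"]) / (p3["premium"] / p3["free"])
print(round(odds_ratio, 4), round(exp(0.4), 4))  # both about 1.4918, i.e. Exp(B)
```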
Ordinal Regression
Used when we want to predict a ranked ordinal dependent variable with more than two levels
-Step 1: Check the overall model significance (sig. < 0.05) to see if the model works
-Step 2: Look at the estimate (Beta) for the predictors. A positive beta means the predictor increases the likelihood of moving into a higher category (e.g., a higher churn risk category). A negative beta means a higher score decreases the likelihood of a user moving 'up the stairs'.
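A Python sketch of a cumulative-logit (ordinal) model, assuming the parameterization logit P(Y ≤ j) = threshold_j - beta × x, under which a positive beta pushes people toward higher categories, as described above. The thresholds, beta, and the predictor (imagine support tickets filed) are hypothetical:

```python
from math import exp

def logistic(z):
    return 1 / (1 + exp(-z))

def ordinal_probabilities(x, thresholds, beta):
    """Return P(Y = category) for each ordered category; thresholds must increase."""
    cumulative = [logistic(t - beta * x) for t in thresholds] + [1.0]  # P(Y <= j)
    probabilities, previous = [], 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities

# Hypothetical churn risk: low / medium / high
THRESHOLDS = [-0.5, 1.5]
BETA = 0.8  # positive: more tickets -> higher churn risk category
low_x = ordinal_probabilities(0, THRESHOLDS, BETA)
high_x = ordinal_probabilities(3, THRESHOLDS, BETA)
print([round(p, 3) for p in low_x], [round(p, 3) for p in high_x])
```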
Poisson / Negative Binomial Regressions
Used when predicting a count variable (anything that is a "number of..."), which is usually bunched up near zero with a long tail to the right
-Step 1: Look at the deviance row in the goodness of fit table. If the value/df is close to 1, the Poisson model fits well. If the value/df is clearly larger than 1 (≥ 2), you have overdispersion and need to run a negative binomial regression instead.
-Step 2: Look at the Parameter Estimates table. The sig. column tells you if the predictor impacts the count (e.g., matches per week). (Exp(B)-1)*100 gives the percentage change in the expected count (e.g., of matches).
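A stdlib-only Python sketch of the Step 1 check. It computes the Poisson deviance, 2 × sum[y ln(y/mu) - (y - mu)], from observed counts and the model's fitted means, divides by df = n - number of parameters, and applies this card's ≥ 2 rule of thumb. The counts and fitted values are hypothetical; in practice SPSS reports value/df directly:

```python
from math import log

def poisson_deviance(observed, fitted):
    total = 0.0
    for y, mu in zip(observed, fitted):
        term = y * log(y / mu) if y > 0 else 0.0  # y*ln(y/mu) is taken as 0 when y = 0
        total += term - (y - mu)
    return 2 * total

def dispersion_ratio(observed, fitted, n_parameters):
    df = len(observed) - n_parameters
    return poisson_deviance(observed, fitted) / df

def suggest_model(ratio):
    # Rule of thumb from this card: value/df >= 2 signals overdispersion
    return "negative binomial" if ratio >= 2 else "Poisson"

# Hypothetical matches per week and fitted means from a 2-parameter Poisson model
matches = [0, 1, 3, 0, 8, 2, 0, 12]
fitted = [1.2, 1.5, 2.0, 1.0, 2.5, 1.8, 1.1, 3.0]
ratio = dispersion_ratio(matches, fitted, n_parameters=2)
print(round(ratio, 2), suggest_model(ratio))
```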