Predictor variable
A factor used to predict changes in a dependent variable
Outcome variable
The variable whose change researchers aim to study or predict
Prediction residual (error)
actual value - predicted value
Prediction
The value the model predicts Y will take for a given X
Random process
a collection of variables that change over time, but in a way you can’t predict exactly, just probabilistically
Random variable
Outcomes of a random process translated into numerical values
Random event
Something that might happen, but you can’t know for sure until it actually happens
The goal of prediction
To find a rule (model) that makes the smallest possible prediction errors overall
Choosing predictor and outcome
Outcome = Y, Predictor = X
Basic linear regression equation
Y(hat) = a + bX. a = intercept = predicted Y when X equals 0. b = slope = change in predicted Y for each 1-unit increase in X
Least squares
Chooses a and b so that the sum of squared residuals is as small as possible
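In simple regression, least squares has a closed-form answer: the slope is cov(x, y)/var(x) and the intercept follows from the means. A minimal sketch in R, using made-up data:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
b <- cov(x, y) / var(x)      # slope that minimizes the sum of squared residuals
a <- mean(y) - b * mean(x)   # intercept
c(intercept = a, slope = b)
sum((y - (a + b * x))^2)     # the minimized sum of squared residuals
```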
General linear model
A broader modeling framework that, unlike least squares, does not require minimizing the sum of squared residuals
R gives:
Intercept = 2.5
Slope = 0.8
Y(hat) = 2.5 + 0.8x
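A sketch of where numbers like these come from in R's lm(); the data here are simulated around the line 2.5 + 0.8x, so the fitted coefficients land near those values:

```r
set.seed(1)
dat <- data.frame(x = 1:20)
dat$y <- 2.5 + 0.8 * dat$x + rnorm(20)      # true line plus noise
fit <- lm(y ~ x, data = dat)
coef(fit)                                   # intercept and slope, roughly 2.5 and 0.8
predict(fit, newdata = data.frame(x = 10))  # Y(hat) at x = 10, about 2.5 + 0.8(10) = 10.5
```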
Y(hat)
Predicted value
Interpret slope b (numerical X)
For each 1 unit increase in X, predicted Y increases by b units
Interpret intercept a (numerical X)
Predicted value of Y when X = 0
Interpret slope b (categorical X with 2 levels)
If X = 0 (group A) or 1 (group B), then:
b = difference in group means (how much higher or lower group B is compared to group A)
Interpret a (Categorical X)
a = predicted outcome (mean) for the baseline group (the group coded 0)
What is R²
The proportion of variation in Y explained by the model. Example: R² = 0.40 means 40% of the differences in Y are explained by X
Finding R² from r
R² = r², the square of the correlation coefficient
How to find r from R²
r = ±√R², taking the sign of the slope b
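A quick check of R² = r² in R (made-up data); note that the square root alone gives only |r|, so the sign has to come from the slope:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
fit <- lm(y ~ x)
cor(x, y)^2                   # r squared
summary(fit)$r.squared        # R-squared: identical in simple regression
sign(coef(fit)[2]) * sqrt(summary(fit)$r.squared)  # recovers r with its sign
```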
Predicting using a linear model
Plug the X value into Y(hat) = a + bX
Calculating residual
Residual = Y - Y(hat)
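A quick check in R that resid() matches the hand formula, using the built-in cars data:

```r
fit <- lm(dist ~ speed, data = cars)   # built-in dataset
head(resid(fit))                       # residuals Y - Y(hat)
head(cars$dist - fitted(fit))          # identical, computed by hand
```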
4 conditions for least squares regression
Linear relationship. Independent observations. Normal residuals. Constant variance (spread of residuals stays roughly the same across X). Mnemonic: LINC
Normal residual
The model's errors are roughly bell-shaped, centered at zero, and not skewed
Checking for the 4 conditions in a residual plot
Look for linearity (no curved pattern) and constant variance (an even spread around zero)
Checking for the 4 conditions in a histogram/QQ plot of residuals
Symmetry, unimodality, most residuals close to 0, few large residuals (light tails); overall, roughly bell-shaped
Checking study design for the 4 conditions (check independence)
Each data point comes from a different individual or independent event. One person’s measurement doesn’t affect another’s. No pairing, no matching, no repeated measures. Data weren’t collected in a way that clusters people
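One way to check these conditions graphically in R (a sketch using the built-in cars data; independence still has to come from the study design):

```r
fit <- lm(dist ~ speed, data = cars)
plot(fitted(fit), resid(fit)); abline(h = 0)  # linearity + constant variance
hist(resid(fit))                              # roughly bell-shaped?
qqnorm(resid(fit)); qqline(resid(fit))        # points near the line = normal residuals
```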
What is unreasonable extrapolation?
Predicting for X-values far outside the range of observed data, which may give nonsense results
High leverage
Unusual X-value
Influential
A point whose inclusion or removal changes the regression line a lot
Random process
A repeatable process with uncertain outcomes
Outcome
One result of the random process
Event
A collection of outcomes
Disjoint (mutually exclusive) events A and B
P(A and B) = 0
A and B are independent if
P(A|B) = P(A) or P(A and B) = P(A)P(B)
Addition rule (disjoint A, B)
P(A or B) = P(A) + P(B)
Multiplication rule (general)
P(A and B) = P(A|B)P(B)
Multiplication rule (independent A and B)
P(A and B) = P(A)P(B)
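A simulation sketch of the two rules using two fair dice (assumed independent):

```r
set.seed(1)
d1 <- sample(1:6, 1e5, replace = TRUE)
d2 <- sample(1:6, 1e5, replace = TRUE)
mean(d1 == 6 & d2 == 6)   # multiplication rule, independent: about 1/36
mean(d1 == 1 | d1 == 2)   # addition rule, disjoint events: about 1/6 + 1/6 = 1/3
```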
Using tree diagrams (joint probabilities)
Multiply along branches
Using tree diagrams (total probabilities)
Add branches
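A small worked tree in R with made-up numbers (a 1% base rate, 90% true-positive rate, 5% false-positive rate):

```r
p_d      <- 0.01   # P(disease)
p_pos_d  <- 0.90   # P(positive | disease)
p_pos_nd <- 0.05   # P(positive | no disease)
p_d * p_pos_d                          # joint: multiply along one branch
p_d * p_pos_d + (1 - p_d) * p_pos_nd   # total P(positive): add the branches
```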
Using probability tables (marginal probability)
Row/column totals
Using probability tables (conditional)
Cell value divided by its row or column total
Using probability tables (Joint)
Individual cell value
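The three reads from a probability table, sketched in R with a made-up 2x2 table:

```r
tab <- matrix(c(0.20, 0.10, 0.30, 0.40), nrow = 2,
              dimnames = list(A = c("yes", "no"), B = c("yes", "no")))
tab                                       # joint: individual cells
rowSums(tab); colSums(tab)                # marginal: row/column totals
tab["yes", "yes"] / rowSums(tab)["yes"]   # conditional: cell / row total
```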
Probability distribution
A list of all possible values of a random variable and their probabilities
Expected value (mean)
E(X) = Σ x · P(x): multiply each possible value by its probability, then add them up. For a simple win/lose game: (win amount)(P(win)) + (loss amount)(P(loss)), entering the loss as a negative number.
Interpretation: long-run average value of the random variable.
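In R, the formula is one line; here for a made-up game that wins $5 with probability 0.3 and loses $2 otherwise:

```r
x <- c(5, -2)     # possible values (loss entered as a negative number)
p <- c(0.3, 0.7)  # their probabilities (must sum to 1)
sum(x * p)        # E(X) = 5(0.3) + (-2)(0.7) = 0.1
```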
Variance
Step 1. Find the mean. Step 2. Subtract the mean from each value. Step 3. Square each difference. Step 4. Sum the squared differences and divide by n - 1 for a sample, or by n for a whole population.
Standard deviation
Square root of variance
Interpretation: average distance from the mean.
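The four steps written out in R for a small made-up sample, then checked against the built-ins:

```r
vals <- c(2, 4, 4, 6, 9)
m    <- mean(vals)            # step 1: the mean
devs <- vals - m              # step 2: deviations from the mean
sq   <- devs^2                # step 3: square them
sum(sq) / (length(vals) - 1)  # step 4: divide by n - 1 (sample)
var(vals)                     # matches R's built-in sample variance
sd(vals)                      # standard deviation = square root of variance
```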
Linear combinations (aX + b)
Expectation:
E(aX+b) = aE(X)+b
Variance:
Var(aX+b) = a^2 Var(X)
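A simulation sketch checking both rules, with a = 3, b = 2 and a simulated X:

```r
set.seed(1)
x <- rnorm(1e5, mean = 10, sd = 4)
c(mean(3 * x + 2), 3 * mean(x) + 2)   # E(aX + b) = aE(X) + b
c(var(3 * x + 2), 3^2 * var(x))       # Var(aX + b) = a^2 Var(X)
```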
Sums of variables (X + Y) if independent
Mean: add the means
Variance: add the variances
(Never add SDs)
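A simulation sketch of why variances add but standard deviations do not, for independent X and Y:

```r
set.seed(2)
x <- rnorm(1e5, mean = 5, sd = 2)
y <- rnorm(1e5, mean = 3, sd = 1)
c(mean(x + y), mean(x) + mean(y))    # means add: about 8
c(var(x + y), var(x) + var(y))       # variances add: about 2^2 + 1^2 = 5
c(sd(x + y), sqrt(var(x) + var(y)))  # SD is sqrt(5) = 2.24, not 2 + 1 = 3
```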
Var(X) =
E[(X - E(X))^2], or equivalently
Var(X) = E(X^2) - (E(X))^2
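Both formulas give the same number; a check in R for a made-up discrete distribution:

```r
x  <- c(0, 1, 2)
p  <- c(0.5, 0.3, 0.2)
mu <- sum(x * p)       # E(X) = 0.7
sum((x - mu)^2 * p)    # E[(X - E(X))^2] = 0.61
sum(x^2 * p) - mu^2    # E(X^2) - (E(X))^2 = 0.61, the same
```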
E(aX + bY)
aE(X) + bE(Y)
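The same simulation idea checks this rule, here with a = 2, b = 3 and two simulated variables:

```r
set.seed(3)
x <- rnorm(1e5, mean = 5, sd = 2)
y <- rnorm(1e5, mean = 3, sd = 1)
c(mean(2 * x + 3 * y), 2 * mean(x) + 3 * mean(y))   # both about 2(5) + 3(3) = 19
```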