Regression
The straight line that best fits the data; it can be expressed mathematically
What does the regression line let us do with respect to X and Y
We can predict unknown Y scores from given X scores (assuming scores fall in data range)
How is regression calculated
Y’ = ay + byX (sometimes Ŷ is used instead of Y’)
Best fit line
Line that makes predictions about people’s scores that are as close to their true scores as possible
Error
Difference between a person’s predicted score and the person’s actual score on the criterion variable
What is another name for the regression line
The least-squares regression line, because it mathematically minimizes the errors made when predicting Y from X
How do we know if a line minimizes errors
We take the sum of squared errors, ∑(Y − Y’)², which is at a minimum for the best-fitting line
by
Slope; the constant that gives the change in Y’ for each one-unit change in X
How is by calculated
by = r(SDy/SDx)
ay
Y intercept; when predicting Y, it is the point where the regression line crosses the Y-axis
How is ay calculated
ay = Ȳ − by(X̄)
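The formulas for by and ay above can be put together in a short sketch. This is a minimal, self-contained illustration in Python; the data and names (x, y, predict) are made up for the example, not from the deck.

```python
# Sketch: computing the regression coefficients by = r(SDy/SDx) and
# ay = Ȳ - by(X̄) from hypothetical data, then predicting with Y' = ay + byX.

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Population standard deviations (dividing by N, matching the formulas here)
sd_x = (sum((xi - mean_x) ** 2 for xi in x) / n) ** 0.5
sd_y = (sum((yi - mean_y) ** 2 for yi in y) / n) ** 0.5

# Pearson r
r = sum((xi - mean_x) * (yi - mean_y)
        for xi, yi in zip(x, y)) / (n * sd_x * sd_y)

b_y = r * (sd_y / sd_x)       # slope: by = r(SDy/SDx)
a_y = mean_y - b_y * mean_x   # intercept: ay = Ȳ - by(X̄)

def predict(x_score):
    """Y' = ay + by*X (valid only within the range of the data)."""
    return a_y + b_y * x_score
```

Note that `predict(mean_x)` returns `mean_y`, which matches the later card stating the line passes through (X̄, Ȳ).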
T or F: If X̄ = 0, then Y’ = ay
T
What leads to a flatter slope
Higher variability around the line (which weakens the correlation) or heteroscedasticity
What 2 values are used to plot a regression line
Y intercept (0, ay)
The mean of X and mean of Y (X̄, Ȳ)
(Values need to be in range of data)
T or F: The regression line represents the mean for bivariate data
T, the regression lines for Y’ and X’ intersect at the mean of X and Y (X̄, Ȳ)
What three datapoints are needed to plot Y’ and X’
Y’: (0, ay) and (X̄, Ȳ)
X’: (ax, 0) and (X̄, Ȳ)
What happens to the angle of the two regression lines if the correlation is very high
The angle between them will be very small; if r = ±1.00 the two lines overlap, and if r = 0 they form a 90-degree angle (the angle increases as r approaches zero)
What happens to Y’ and X’ respectively if r = 0
Y’ → by = 0, ay = Ȳ and the line is flat (horizontal)
X’ → bx = 0, ax = X̄ and the line is vertical
What are the equations if the relationship is curvilinear (quadratic, cubic or quartic)
Quadratic: Y′ = a + bX + cX²
Cubic: Y′ = a + bX + cX² + dX³
Quartic: Y′ = a + bX + cX² + dX³ + eX⁴
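Curvilinear equations like these can be fit by polynomial least squares. A small sketch using NumPy's `polyfit` on made-up data (nothing here comes from the deck):

```python
# Sketch: fitting a quadratic regression Y' = a + bX + cX² with numpy.polyfit.
# polyfit returns coefficients highest power first, i.e. [c, b, a];
# higher-degree (cubic, quartic) fits just change the degree argument.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 2 - 3 * x + 2          # data generated from a known quadratic

c, b, a = np.polyfit(x, y, 2)   # degree 2 → quadratic fit

# Predicted Y' values from the fitted curve
y_pred = np.polyval([c, b, a], x)
```

Because the data were generated from an exact quadratic, the fit recovers a = 2, b = −3, c = 1 (up to floating-point error).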
What is Y - Y’
A measure of variability: the residual or error, i.e., the difference between an observed Y and the predicted Y’ on the regression line. It measures error of prediction around the regression line
Standard error of estimate
A measure of the average deviation of the errors: the difference between the Y-values predicted by the regression model and the actual Y-values in the sample
How are deviation scores calculated for SD and SDy-y’
SD: (X − X̄)
SDy-y’: (Y − Y’)
How are squared deviations calculated for SD and SDy-y’
SD: (X − X̄)²
SDy-y’: (Y − Y’)²
How are sums of squared deviations calculated for SD and SDy-y’
SS = ∑(X − X̄)²
SSy-y’ = ∑(Y − Y’)²
How are the standard deviations calculated for SD and SDy-y’
SD = √(∑(X − X̄)²/N)
SDy-y’ = √(∑(Y − Y’)²/N)
T or F: there is an alternate way to calculate SDy-y’
T, SDy-y’ = SDy√(1 − r²)
T or F: when r is not equal to 0, SDy-y’ will be smaller than SDy
T, because the prediction draws on information from two variables instead of one, which reduces error
What would happen to SDy-y’ if r = 0
SDy-y’ = SDy (because SDy-y’ = SDy√(1 − r²))
What would happen to SDy-y’ if there is a perfect positive or negative correlation (r = ±1)
SDy-y’ = 0 (because SDy-y’ = SDy√(1 − r²))
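The shortcut formula can be checked numerically: computing the standard error of estimate directly from the residuals gives the same value as SDy√(1 − r²). A sketch with made-up data (names and numbers are illustrative only):

```python
# Numerical check that SD of (Y - Y') equals SDy * sqrt(1 - r²).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(y)

sd_y = y.std()                    # population SD (divides by N)
r = np.corrcoef(x, y)[0, 1]       # Pearson correlation

# Fit the least-squares line and compute residuals Y - Y'
b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)
sd_y_yprime = np.sqrt((residuals ** 2).sum() / n)   # direct calculation

shortcut = sd_y * np.sqrt(1 - r ** 2)               # alternate formula
```

Both `sd_y_yprime` and `shortcut` come out equal, illustrating why SDy-y’ shrinks to 0 as r approaches ±1.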
What do we need to know in order to understand explained variability
Total and unexplained variability
Total variability
Denoted with xt; the deviation Y − Ȳ
Unexplained variability
Denoted with xo; the residual Y − Y’
Explained variability
Denoted with xe; the deviation Y’ − Ȳ
How can one conceptually explain explained variability
Total variability = prediction + residuals
In what 2 ways can total variability be expressed
SST
∑(Y − Ȳ)²
In what 2 ways can explained variability be expressed
SSR
∑(Y’ − Ȳ)²
In what 2 ways can unexplained variability be expressed
SSE
∑(Y − Y’)²
Why must the values be squared to calculate total variability (e.g., why can we not just use ∑(Y − Ȳ) = ∑(Y’ − Ȳ) + ∑(Y − Y’))
Because the sum of unsquared deviations is always 0, the unsquared identity is uninformative; this is why we must use sums of squares (SS) in the regression equation
What is the problem regarding SSR and SSE
We have trouble interpreting SSR and SSE because they are in squared units
What is the solution to the problem regarding SSR and SSE
We calculate the proportion of variability with regard to explained and unexplained variability
How is proportion of variability used to calculate total variability
Conceptual: Total variability = proportion of explained variability + proportion of unexplained variability
Equation: SST/SST = SSR/SST + SSE/SST
What is the important implication regarding proportion of explained variability and proportion of unexplained variability and their relationship with r
Proportion of explained variability = r² and proportion of unexplained variability = 1 − r²
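The decomposition SST = SSR + SSE and the identity SSR/SST = r² can both be verified numerically. A sketch on made-up data (the numbers are illustrative, not from the deck):

```python
# Sketch verifying SST = SSR + SSE and SSR/SST = r² for a fitted line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b, a = np.polyfit(x, y, 1)    # least-squares slope and intercept
y_pred = a + b * x            # Y' for each observation
y_bar = y.mean()

sst = ((y - y_bar) ** 2).sum()        # total variability
ssr = ((y_pred - y_bar) ** 2).sum()   # explained variability
sse = ((y - y_pred) ** 2).sum()       # unexplained variability

r = np.corrcoef(x, y)[0, 1]
# sst == ssr + sse, ssr/sst == r², sse/sst == 1 - r²
```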
What are the 9 attributes of the regression line
The regression line represents bivariate data in a linear relationship and predicts scores based on observed data
Defined by linear equation for a straight line
Two regression lines can represent bivariate data (Y’ and X’)
Does not predict values outside of the range of data
Is the best descriptor of bivariate data
Reflects the method of least squares (i.e., minimizes Σ(Y − Y’)²)
Always has some error of prediction, measured as the standard error of estimate SDy-y’ (unless r = ±1)
Is a traveling normal distribution with a moving mean
Allows separate measures of SST, SSR and SSE
What are the two linear equations for Y’ and X’
Y’ = ay + byX
X’ = ax + bxY
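The two linear equations above can be fit side by side, and doing so confirms the earlier card that both lines pass through (X̄, Ȳ). A sketch on made-up data (all names and numbers are illustrative):

```python
# Sketch: fitting both regression lines, Y' = ay + byX and X' = ax + bxY,
# and checking that each passes through (X̄, Ȳ).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b_y, a_y = np.polyfit(x, y, 1)   # regress Y on X  → Y' line
b_x, a_x = np.polyfit(y, x, 1)   # regress X on Y  → X' line

# Plugging the mean of one variable into either line returns
# the mean of the other, so the lines intersect at (X̄, Ȳ).
y_at_mean_x = a_y + b_y * x.mean()
x_at_mean_y = a_x + b_x * y.mean()
```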