Correlation
Strength and direction of the relationship between two variables, ranging from -1 to +1; e.g., +1 = perfect positive, -1 = perfect negative
What is a scatterplot used for?
A plot of paired values (one variable on each axis) used to visualize correlation
What do different correlation values mean?
0 when the variables are independent, +1 when they are perfectly positively related (e.g., identical), and -1 when they are exactly inverse
What is the Pearson coefficient used for?
Measuring linear correlation between interval/ratio variables, e.g., temperature or height
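A minimal Python sketch (standard library only) of the Pearson coefficient, computed as the covariance of two variables divided by the product of their standard deviations. The function name pearson_r and the sample heights are illustrative assumptions, not part of the original cards.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (numerator of the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (numerators of the two variances)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights = [150, 160, 170, 180]
print(pearson_r(heights, [2 * h for h in heights]))  # 1.0 (perfect positive)
print(pearson_r(heights, [-h for h in heights]))     # -1.0 (exact inverse)
```

The n - 1 denominators of the covariance and of the two variances cancel, which is why the sketch can work directly with sums of squares.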
What does the correlation coefficient show?
The strength (and direction) of a relationship; it does not represent the slope
What does the regression coefficient represent?
The slope of the regression line, not the meaningfulness of the effect
Arithmetic mean
Add up all the values and divide by the number of values
Variance
A measure of how much values differ from the mean: the average squared deviation from the mean
Standard deviation
The square root of the variance, in the same units as the data; e.g., the width of the peak in a distribution
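The mean, variance and standard deviation cards can be sketched in a few lines of Python. The sample flag is an assumption added for illustration: True divides by n - 1 (the usual sample estimate), False divides by n (the population formula). The data set is made up.

```python
import math


def mean(values):
    # Arithmetic mean: sum of all values divided by how many there are
    return sum(values) / len(values)


def variance(values, sample=True):
    # Average squared deviation from the mean
    # sample=True divides by n - 1; sample=False divides by n
    m = mean(values)
    ss = sum((v - m) ** 2 for v in values)
    return ss / (len(values) - 1 if sample else len(values))


def std_dev(values, sample=True):
    # Square root of the variance, so it is in the same units as the data
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))                    # 5.0
print(variance(data, sample=False))  # 4.0
print(std_dev(data, sample=False))   # 2.0
```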
Covariance
How two variables vary together: if X and Y tend to be high together or low together, the covariance is positive (and larger); if one tends to be high when the other is low, the covariance is negative
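A small sketch of sample covariance (n - 1 denominator, an assumed convention) showing the sign behaviour described on the card. The study-hours and score values are made up for illustration.

```python
def covariance(xs, ys):
    # Sample covariance: summed products of paired deviations from each mean, divided by n - 1
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


hours = [1, 2, 3, 4]
print(covariance(hours, [10, 20, 30, 40]))  # high together, low together -> positive
print(covariance(hours, [40, 30, 20, 10]))  # one high while the other is low -> negative
```

Unlike correlation, covariance depends on the units of both variables, so its size is hard to interpret on its own; only its sign is directly meaningful.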
Linear regression: what does it show when plotted?
A straight line, y = ax + b (a = slope, b = intercept), fitted through the data points
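The card does not say how a and b are found; ordinary least squares (choosing the line that minimizes the summed squared residuals) is the standard method and is assumed in this sketch. The function name fit_line and the data are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept: the line passes through (mean x, mean y)
    return a, b


a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```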
What is residual error?
The vertical distance of each data point from the regression line; smaller residual errors mean a tighter correlation and a better regression fit
What is error variance?
The cumulative (summed) squared differences between the empirical (actual) and predicted values
What is regression variance?
The variance of the predicted values, i.e., the part of the variance explained by the model
What is the residual (prediction) error?
The difference between the actual value (Y) and the predicted value
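The residual, error and regression cards above can be tied together in one sketch. It assumes an ordinary least-squares line with an intercept and works with sums of squares; dividing each by its degrees of freedom would turn it into a variance (mean square). The data are made up.

```python
def variance_parts(xs, ys):
    """Fit y = a*x + b by least squares and split the variation in y into explained and residual parts."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b = mean_y - a * mean_x
    predicted = [a * x + b for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]          # actual minus predicted
    ss_error = sum(r ** 2 for r in residuals)                    # summed squared prediction errors
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)    # spread of predictions around the mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)                # total spread of y around its mean
    return residuals, ss_error, ss_regression, ss_total


residuals, ss_error, ss_regression, ss_total = variance_parts([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(residuals)
print(ss_error, ss_regression, ss_total)  # error + regression adds up to total (up to rounding)
```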
Correlation vs Regression
Correlation expresses the strength (reliability) of a relationship; regression allows predicting one variable from the other
When are regression and correlation the same?
When x and y have both been z-standardized: the regression slope then equals the correlation coefficient, so it is informative about relationship strength
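A quick numerical check of this card, using Python's statistics module for the z-scores. The helper names and the data are illustrative assumptions; the point is only that the slope of standardized y on standardized x matches Pearson's r, while the raw slope depends on the units.

```python
import math
import statistics


def z_scores(values):
    # Standardize to mean 0 and (sample) standard deviation 1
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def ols_slope(xs, ys):
    # Least-squares slope of y on x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(ols_slope(x, y))                      # raw slope: depends on the units of x and y
print(ols_slope(z_scores(x), z_scores(y)))  # standardized slope
print(pearson_r(x, y))                      # same value as the standardized slope
```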
Statistical inference in regression: what does the null hypothesis state?
That y cannot be predicted from x, i.e., the population regression slope is zero
Standard error of the slope
The error against which the regression slope is tested (t = slope / standard error)
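A sketch for the one-predictor case only, where the standard error of the slope is sqrt(MSE / Sxx), with MSE = residual sum of squares / (n - 2) and Sxx the summed squared deviations of x. The resulting t is compared with a t distribution on n - 2 degrees of freedom to test the null of a zero slope. Names and data are illustrative.

```python
import math


def slope_t_test(xs, ys):
    """Simple regression: return (slope, standard error of the slope, t statistic, degrees of freedom)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    mse = ss_res / (n - 2)           # residual variance; 2 parameters were estimated
    se_slope = math.sqrt(mse / sxx)  # standard error of the slope
    t = slope / se_slope             # tests H0: population slope = 0
    return slope, se_slope, t, n - 2


print(slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```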
Partial correlation
Assesses the relationship between one pair of variables after accounting for (partialling out) a third variable
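One common way to compute a first-order partial correlation is from the three pairwise Pearson correlations: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below uses that formula; the data are made up, and the helper names are illustrative.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data in which x and y both rise with z
z = [1, 2, 3, 4, 5, 6]
x = [2, 3, 5, 6, 8, 9]
y = [1, 3, 4, 4, 6, 7]
print(pearson_r(x, y))     # raw correlation, includes the shared influence of z
print(partial_r(x, y, z))  # correlation after partialling out z
```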
Multiple regression
A generalization of bivariate regression; describes the relationship between an outcome and multiple predictors
Coefficient of determination (R²)
The proportion of total variance explained by the predictors/model, or equivalently 1 minus the proportion of total variance left in the residuals
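A sketch of the second form of the definition, R² = 1 - SS_residual / SS_total. For a least-squares fit with an intercept this equals SS_regression / SS_total. The observed values are made up, and the predictions come from the least-squares line for those values.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_residual / ss_total


observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # least-squares line y = 0.6x + 2.2 at x = 1..5
print(r_squared(observed, predicted))  # about 0.6
```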
Goodness of fit of a regression model: what are the coefficient of determination, the multiple correlation coefficient, and the F-ratio?
Coefficient of determination: the proportion of variance explained by the regression model. Multiple correlation coefficient: the correlation between predicted and observed values. F-ratio: derived by contrasting the proportion of explained variance with the residual (unexplained) variance.
F-ratio for multiple linear regression
A higher F-ratio indicates a better model: more variance explained relative to residual variance
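A sketch of the overall F-ratio computed from R², assuming k predictors and n observations: F = (R² / k) / ((1 - R²) / (n - k - 1)), with k and n - k - 1 degrees of freedom. The R² values and sample size below are made up for illustration.

```python
def f_ratio(r2, n, k):
    """Overall F for a regression with k predictors and n observations, computed from R^2."""
    df_model, df_residual = k, n - k - 1
    explained = r2 / df_model              # explained variance per model degree of freedom
    unexplained = (1 - r2) / df_residual   # residual variance per residual degree of freedom
    return explained / unexplained


# Made-up values: same sample size and predictors, two different fits
print(f_ratio(0.30, n=40, k=3))
print(f_ratio(0.60, n=40, k=3))  # better fit -> larger F
```

With a single predictor, this F equals the square of the slope's t statistic.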
Testing significance of individual predictors
Only test the significance of individual predictor variables when the entire regression model has been found to be significant.
Multicollinearity
High correlation (similarity) between two or more predictor variables.
Effects of multicollinearity
Adding predictor variables that are correlated with existing predictors changes the model's predictive quality, making it difficult to estimate the predictive value of each individual predictor.
Singularity
An entirely redundant variable that is an exact combination of two or more other variables.
Problems with singularity
Logical: you don't want to measure the same thing twice. Statistical: the regression cannot be solved because the system becomes singular (ill-conditioned).
Example of singularity
The total score of the WAIS intelligence scale is fully determined by its subscales, so it contains no additional independent information.
How to detect multicollinearity
Look for high bivariate correlations between predictors, and look at tolerance: a measure of how unique a predictor variable is relative to the other predictors (a low value indicates a problem)
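Tolerance is usually defined as 1 - R² from regressing one predictor on all the other predictors; its reciprocal is the variance inflation factor (VIF). To stay short, this sketch assumes only two predictors, where that R² is simply the squared Pearson correlation between them. The general case needs a multiple regression per predictor. The data are made up.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tolerance_two_predictors(x1, x2):
    """Tolerance = 1 - R^2 of one predictor regressed on the other(s).

    With only two predictors, that R^2 is simply r^2 between them.
    """
    r = pearson_r(x1, x2)
    return 1 - r ** 2


def vif(tolerance):
    # Variance inflation factor: reciprocal of tolerance (infinite when tolerance is 0)
    return math.inf if tolerance == 0 else 1 / tolerance


# Made-up predictors: x2 is nearly a rescaled copy of x1
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
tol = tolerance_two_predictors(x1, x2)
print(tol, vif(tol))  # low tolerance, high VIF
```

A tolerance of exactly 0 (an infinite VIF) corresponds to singularity: one predictor is a perfect linear copy of another.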
How to detect singularity
Look for high multivariate correlations and low tolerance
Multiple regression approaches: Simultaneous
No a priori model is assumed; all predictor variables are entered and fitted together.
Multiple regression approaches: Stepwise
No a priori model; predictor variables are added or removed one at a time to maximize fit.
Multiple regression approaches: Hierarchical
Based on a priori knowledge of the variables; predictors are entered in a chosen order to assess the additional explanatory power of each new variable.
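In hierarchical regression, the added explanatory power of a new step is commonly tested with an F-change statistic: F = ((R²_full - R²_reduced) / k_added) / ((1 - R²_full) / (n - k_full - 1)), with k_added and n - k_full - 1 degrees of freedom. This is a minimal sketch of that formula; the function name and all numbers are hypothetical.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F-test for the R^2 increase when k_added predictors are entered in a later step.

    n = observations, k_full = total number of predictors in the larger model.
    """
    numerator = (r2_full - r2_reduced) / k_added
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


# Made-up example: step 1 has 2 predictors (R^2 = 0.30); step 2 adds 1 more (R^2 = 0.45)
print(f_change(0.30, 0.45, n=50, k_full=3, k_added=1))
```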
Factors affecting multiple linear regression: outliers
Points that deviate from the others and have a disproportionate effect on the linear regression fit.
What is Cook's distance used for?
Measuring how extreme (influential) an outlier is; a value greater than 1 indicates concern.
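A sketch of Cook's distance for the one-predictor case, using D_i = e_i² / (p · MSE) · h_i / (1 - h_i)², where e_i is the residual, p = 2 fitted parameters, and the leverage is h_i = 1/n + (x_i - mean x)² / Sxx. The data, including the outlier in the last position, are made up.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each point of a simple (one-predictor) least-squares regression."""
    n, p = len(xs), 2  # p = number of fitted parameters (slope and intercept)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for x, e in zip(xs, residuals):
        leverage = 1 / n + (x - mean_x) ** 2 / sxx  # how far x lies from the centre of the predictor
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9, 18.1, 40.0]  # made-up; last point is an outlier
d = cooks_distance(xs, ys)
print([i for i, di in enumerate(d) if di > 1])  # indices of points above the cut-off of 1
```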
What is scedasticity?
The distribution (spread) of the residual error
What is homoscedasticity?
The residuals have relatively constant variance over the range of the predictor variable
What is heteroscedasticity?
The spread of the residuals varies systematically across the range of the predictor variable.
Number of predictors: how many observations are needed?
The number of observations should be high compared with the number of predictor variables; results become meaningless as the number of observations decreases.
What does adjusted R² do?
Corrects R² for the number of predictor variables; this is the value reported in the results section.
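A sketch of the usual adjustment, adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1), for n observations and k predictors. Holding R² fixed while adding predictors shows the penalty growing, which also illustrates why the number of observations should be high relative to the number of predictors. The values are made up.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1) for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up values: the same raw R^2 of 0.6 with n = 20 and 1, 2 or 3 predictors
for k in (1, 2, 3):
    print(k, round(adjusted_r_squared(0.6, n=20, k=k), 3))  # shrinks as k grows
```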
Range of the predictor variable
A small (restricted) range reduces statistical power.
Variable distribution
Should be normal or uniform; in multiple regression, however, only the residuals need to be normally distributed.