Correlation
relationship between two continuous variables; measures the degree to which data points cluster around the regression line
Line of Best Fit
show general pattern of relationship between dependent & independent variable
Correlation Coefficients
measure direction & magnitude of relationship between independent & dependent variable
Magnitude of Association
Absolute value of correlation coefficient; shows strength of association, 0 = no linear association, -/+ 1 = perfect linear association
Direction of Association
Sign of correlation coefficient (positive or negative); shows direction of relationship (pos or neg correlation)
Pearson Correlation
measures degree of relationship between linearly related variables
Assumptions of Pearson Correlation
scale measurements, normal distribution, 2 variables = paired, no outliers, linearity, homoscedasticity
Linearity
straight-line relationship between 2 variables: as x increases, y increases or decreases at a constant rate; check with scatter plot visualization
Homoscedasticity
equal spread of data around line of best fit, data = homoscedastic or heteroscedastic, check with scatter plot
r
correlation coefficient
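A minimal sketch of how r can be computed by hand for paired data, to make magnitude and direction concrete. The function name `pearson_r` and the sample values are illustrative assumptions, not part of the notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(pearson_r(x, y_up))    # 1.0  (perfect positive linear association)
print(pearson_r(x, y_down))  # -1.0 (perfect negative linear association)
```

The sign of the result gives the direction of the association and its absolute value gives the magnitude.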
Spearman's Rank Order Correlation
Non-parametric alternative to Pearson's correlation; ranks data to explore relationship between 2 variables
Assumptions of Spearman's Rank Order Correlation
scale/ordinal data, any distribution, monotonic relationship between variables (need not be linear)
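Spearman's coefficient can be sketched as Pearson's r applied to the ranks of the data rather than the raw values. The helper names (`ranks`, `pearson_r`, `spearman_rho`) and the sample data below are assumptions made for illustration; ties receive the average of their ranks.

```python
from math import sqrt

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(spearman_rho(x, y))  # 1.0, because the rank order agrees perfectly
print(pearson_r(x, y))     # below 1, because the raw relationship is curved
```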
Intraclass Correlation Coefficient (ICC)
used to evaluate inter-rater reliability, test-retest reliability & intra-rater reliability; for data structured as groups (not pairs)
Inter-Rater Reliability
variation between >=2 raters, measuring same event
Test-Retest Reliability
variation in 2 measurements under same conditions
Intra-Rater Reliability
variation within 1 rater across >= 2 trials
Factors Impacting Correlations
Restricting Data Range (can sometimes be good), Heterogeneous Samples, Outliers (alter correlation)
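To illustrate how much a single outlier can alter a correlation, here is a small sketch using made-up data. The function name and all values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Six points lying exactly on a rising straight line (hypothetical)
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [1, 2, 3, 4, 5, 6]

# Same data plus one extreme point far below the line
x_outlier = x_clean + [7]
y_outlier = y_clean + [-10]

r_clean = pearson_r(x_clean, y_clean)
r_outlier = pearson_r(x_outlier, y_outlier)

print(r_clean)    # strong positive correlation
print(r_outlier)  # one outlier is enough to make r negative here
```

This is why a scatter plot is worth checking before a correlation coefficient is interpreted.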
Clinical use of correlation
establishes reliability of clinical assessment tools; impacts medical decision making
Regression
also explores relationship between variables; models how explanatory variable impacts response variable
Response variable
variable you are predicting, outcome/dependent/y variable
explanatory variable
variable you use to predict, x/independent variable
residuals
vertical distance between observed y and the y value predicted by the regression line
epsilon
error term, represent residual
beta 0
y-intercept
beta 1
slope of regression line
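The terms above fit together in the simple linear regression model. Written out, with the subscript i (an added notation, not in the notes) indexing individual observations:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\hat{y}_i = \beta_0 + \beta_1 x_i,
\qquad
\text{residual}_i = y_i - \hat{y}_i
```

Here \hat{y}_i denotes the value predicted by the regression line for observation i.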
Least Squares Method
regression - chooses values of y-intercept & slope that minimize sum of squared residuals
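A minimal sketch of the least squares calculation for one explanatory variable, using the standard closed-form solution. The function names and data are illustrative assumptions.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between observed and predicted y."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_fit(x, y)
print(intercept, slope)  # roughly 0.05 and 1.99

# Any other line gives a larger sum of squared residuals
best = sum_squared_residuals(x, y, intercept, slope)
other = sum_squared_residuals(x, y, intercept + 0.5, slope - 0.1)
print(best < other)  # True
```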
what does a lower sum of squared residuals mean?
smaller differences between data points & line of best fit (better fit)
Regression Assumptions
scale data, residuals of regression line are normally distributed, no outliers, linear relationship between 2 variables, data = homoscedastic
Least Squares Regression Model
Sum of residuals = 0, line of best fit passes through mean of x & mean of y
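Both properties can be checked numerically. The sketch below fits a line to hypothetical data and confirms that the residuals sum to (approximately, given floating point) zero and that the line passes through the point of means.

```python
# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least squares estimates for slope and y-intercept
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

# Residual = observed y minus predicted y
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print(abs(sum(residuals)) < 1e-9)                         # True: residuals sum to 0
print(abs((intercept + slope * mean_x) - mean_y) < 1e-9)  # True: passes through the means
```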
R square
coefficient of determination, proportion of variation in y explained by x
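As a sketch, R square can be computed as 1 minus the ratio of unexplained variation (sum of squared residuals) to total variation in y. With a single explanatory variable it equals the square of Pearson's r. The data and variable names below are hypothetical.

```python
from math import sqrt

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

# Least squares line
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Unexplained variation: sum of squared residuals
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

r_squared = 1 - ss_residual / syy
r = sxy / sqrt(sxx * syy)

print(round(r_squared, 2))  # 0.64, i.e. 64% of the variation in y is explained by x
print(round(r ** 2, 2))     # 0.64, matches r squared for simple linear regression
```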
interpret image coefficients
y-intercept = -77.283, slope = 3.33
Interpret model summary
R = simple correlation between variables; R square = 0.57, so 57% of variation in y is explained by x
Interpret ANOVA results
model significantly predicts y
r squared characteristics
never negative (ranges from 0 to 1); as it approaches 0 = little variation in y is explained by x; max = 1 (all variability in y is explained by x)
Linear Regression
1 explanatory & 1 response variable, scale measurements
Multiple Linear Regressions
1 response variable, >1 explanatory variable, scale measurements
Logistic Regression
>=1 explanatory variable, 1 response variable (dichotomous)
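A rough sketch of how a logistic model turns an explanatory variable into a probability for a dichotomous outcome. The coefficients here are made-up values chosen for illustration, not estimates fitted to data, and the 0.5 cutoff is an assumed convention.

```python
from math import exp

def predicted_probability(x, beta0, beta1):
    """Probability that the dichotomous outcome equals 1 at a given x."""
    return 1.0 / (1.0 + exp(-(beta0 + beta1 * x)))

# Hypothetical coefficients (not fitted to any data)
beta0, beta1 = -4.0, 1.0

for x in [0, 2, 4, 6, 8]:
    p = predicted_probability(x, beta0, beta1)
    outcome = 1 if p >= 0.5 else 0  # assumed classification cutoff
    print(x, round(p, 3), outcome)
```

Unlike a straight regression line, the predicted value always stays between 0 and 1, which is what makes the model suitable for a yes/no response.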
statistical significance
result is unlikely to have occurred by random chance alone if the null hypothesis is true; judged by a p-value below the chosen alpha level
clinical significance
event/difference is meaningful for a clinical reason
biological significance
whether finding has biological relevance
Data reproducibility
ability to reproduce/replicate findings
Replication crisis
widespread difficulty reproducing published scientific results
causes for replication crisis
publication bias, poor study design & low statistical power, questionable research practices
Questionable Research Practices
p-hacking, selective reporting, sampling bias, HARKing (Hypothesizing After the Results are Known)
Solutions to replicability crisis
preregistration of studies, replication studies, open science, alternative statistical approaches, education