Quantitative Research and Statistical Analysis
Introduction

  • The lecture focuses on quantitative research and statistical analysis beyond null hypothesis significance testing.
  • Emphasis on understanding data and relationships between variables for exploratory research.
  • Correlation and regression are highlighted as tools to understand data relationships.

Quiz 5 Review

  • Purpose of post hoc follow-up tests after a one-way ANOVA:
    • To identify specific group means that differ from one another.
  • Interpretation of a 5x2 factorial ANOVA:
    • Indicates one continuous outcome measure and two categorical predictors.
    • One predictor has five groups, and the other has two groups.
    • Total of three variables: two predictors and one outcome.
  • Reporting the main effect of training:
    • Report F-statistic with degrees of freedom, F value, and significance.
  • Identifying main effects and interactions in plots:
    • Main effect: Compare means of groups for each factor.
    • Interaction: Check if lines are parallel; non-parallel lines indicate interaction.
    • Example: Main effect of study group and attending lectures with interaction.
  • Assumptions of ANOVAs and potential violations:
    • Assumptions include normally distributed residuals and equal variances.
    • Unequal variances in subgroups violate ANOVA assumptions.
  • Analyzing main effects and interactions from tables:
    • Main effect: Compare means across levels of a factor.
    • Interaction: Assess if the effect of one factor depends on another.
    • Example: Main effects of exercise frequency and age, no interaction.
  • Understanding F value in ANOVAs:
    • F value indicates the ratio of variance between groups to variance within groups.
    • F = \frac{Variation \ Between \ Groups}{Variation \ Within \ Groups}
  • Directional alternative hypothesis in T-tests:
    • T-tests can test directional hypotheses (one-tailed) due to symmetric sampling distribution.
    • ANOVAs cannot test directional hypotheses (only non-specified differences).
    • F values are always positive, preventing directional testing.
  • When to run Tukey's HSD post-hoc test:
    • Run after ANOVA if at least one effect is significant.
  • Interpreting interactions in pop science reports:
    • Interaction means the effect of one variable depends on another.
    • Example: Health benefits of chocolate interact with mood.

Relationships Between Variables

  • Focus on continuous variables and their relationships.
  • Examples: date vs. mood, stock price vs. date, rainfall vs. date, extroversion vs. birth month.
  • Questions to consider:
    • Is there a relationship?
    • How strong is the relationship?
    • What shape is the relationship?

Correlation and Regression

  • Tools for capturing and quantifying straight-line relationships are correlation and linear regression.
  • Nonlinear regression captures more complex relationships (e.g., cyclical, quadratic).
  • Goodness of fit measures quantify the strength of the relationship.
  • Exploratory data analysis helps describe and quantify patterns in data.

Measuring Variability

  • Review of measuring variability in data sets.

Simple Data Sets

  • Example A: {2, 2, 2, 2, 2, 2} (no variation).
  • Example B: {1, 3, 1, 0, 1, 4, 3} (more variation).
  • The means of the two datasets are nearly the same (2 vs. ≈1.86), but their variability differs greatly.
  • Measures of variability: range, interquartile range, standard deviation.

Standard Deviation Formula

  • Formula for standard deviation:
    • \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N-1}}
    • Where:
      • \sigma = standard deviation
      • x_i = each data point
      • \mu = mean of all data points
      • N = number of data points
  • Steps:
    • Subtract the mean from each data point: (x_i - \mu)
    • Square the result: (x_i - \mu)^2
    • Sum all the squared values: \sum_{i=1}^{N} (x_i - \mu)^2
    • Divide by N-1
    • Take the square root.
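
The five steps above can be sketched directly in code. A minimal Python version (the course works in R) applied to example datasets A and B from the previous section:

```python
import math

def sample_sd(data):
    n = len(data)
    mean = sum(data) / n
    squared_diffs = [(x - mean) ** 2 for x in data]  # (x_i - mean)^2
    variance = sum(squared_diffs) / (n - 1)          # divide by N - 1
    return math.sqrt(variance)                       # back to original units

a = [2, 2, 2, 2, 2, 2]
b = [1, 3, 1, 0, 1, 4, 3]
print(sample_sd(a))  # 0.0 — no variation at all
print(round(sample_sd(b), 3))
```

Dataset A has zero spread, so its standard deviation is exactly 0; dataset B's points are spread out, giving a standard deviation of about 1.46 in the original units.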

Detailed Calculation

  • Example dataset and step-by-step calculation.
  • Subtracting the mean: Measuring how far each data point is from the mean.
  • Squaring: Disregarding whether the data point is above or below the mean.
    • Creates squared values.
  • Sum of Squares: Sum of the squared differences between each data point and the mean.
  • Dividing by N-1: Calculating the variance.
  • Square Root: Returning to original units, resulting in the standard deviation.

Standard Deviation

  • Small standard deviation: Data points are tightly clustered.
  • Large standard deviation: Data points are spread out.
  • Interpretable because it's in the original units.

Covariance and Correlation

  • Covariance: Measures how much two variables change together.
  • Formula for covariance (conceptual, not for calculation).
  • Covariance involves:
    • Measuring how far each data point is from the mean for both X and Y.
    • Multiplying these differences.
    • Summing the results.
  • Positive values: Variables agree with one another (both above or below their means).
  • Negative values: Variables disagree.
  • Covariance depends on the units of the original tests, making it less interpretable.
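
The steps above can be sketched in Python (the paired study-hours/exam-score data are invented for illustration):

```python
def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each pair of deviations from the means, sum, divide by N - 1
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 55, 61, 64, 68]  # hypothetical exam scores
cov = covariance(hours, scores)
print(cov)  # positive: the variables tend to be above/below their means together
```

The result (10.25 here) is in hours-times-points, which is why covariance is hard to interpret on its own.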

Correlation

  • Standardized measure that doesn't depend on original units.
  • Formula (conceptual, not for calculation).
  • Correlation is measured by:
    • r = \frac{covariance(X,Y)}{standard \ deviation(X) * standard \ deviation(Y)}
  • Ranges from -1 to +1.
  • Perfect Relationship: r = 1
  • Near-perfect alignment: r ≈ .97
  • Pearson correlation: Measures the strength of a straight-line relationship.
  • Zero correlation: No linear relationship between the variables.
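
Standardizing the covariance by the two standard deviations gives r. A Python sketch using invented study-hours/exam-score data:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance divided by the product of the two standard deviations
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

hours = [1, 2, 3, 4, 5]        # hypothetical data
scores = [52, 55, 61, 64, 68]
r = pearson_r(hours, scores)
print(round(r, 3))
```

Because the units cancel, r for these data (≈0.994) is directly interpretable: a nearly perfect straight-line relationship.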

Linear Regression

  • Statistical toolkit for finding the best-fitting line to describe the relationship between variables.
  • Requires slope and shift parameters.
  • Statistical model: A shape that describes the relationship between variables.

Equation of a Line

  • y = \beta_0 + \beta_1 x + \epsilon
    • \beta_0 = shift (intercept)
    • \beta_1 = slope
    • \epsilon = error

Best Fitting Line

  • Fitting a line to data: Involves minimizing the differences between the data points and the line.
  • Residual sum of squares: Measure of the difference between data points and the line.
  • Iteratively adjusting the line: Finding the line that has the lowest possible residual sum of squares.
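
The iterative search described above converges to the same answer the closed-form least-squares solution gives. A minimal Python sketch with invented data:

```python
def fit_line(xs, ys):
    # Closed-form least squares: slope = cov(x, y) / var(x);
    # the intercept (shift) then follows from the two means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

xs = [1, 2, 3, 4, 5]           # hypothetical predictor values
ys = [52, 55, 61, 64, 68]      # hypothetical outcome values
b0, b1 = fit_line(xs, ys)
print(b0, b1)                  # beta_0 (shift) and beta_1 (slope)
```

No other line through these points has a smaller residual sum of squares than the one defined by these two coefficients.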

Predicting One Variable from Another

  • Regression allows predicting the value of one variable based on another.
  • Predictor variable: Plotted on the x-axis.
  • Outcome variable: Plotted on the y-axis.
  • Example: Predicting a toddler's vocabulary from their age in months.

Minimizing Residual Errors

  • Linear Model: Finding the best compromise between all data points.
  • Algorithm: Iteratively adjusting the slope and shift. Minimizing error.

Making Predictions

  • Coefficients (slope and shift) are estimated by the fitting algorithm in R (e.g., lm()); plugging a new x value into the fitted equation yields a predicted y.

Goodness of Fit Measures

  • How well the fitted line captures the data.

R-Squared

  • Measure of how well the model predicts the outcome variable.
  • R^2 = 1 - \frac{Residual \ Sum \ of \ Squares}{Total \ Variability \ in \ outcome \ Measure}
  • Proportion of the variance that you've explained.
  • For simple linear regression, R-squared equals the squared Pearson correlation (R^2 = r^2).
  • The correlation r is the square root of this goodness-of-fit measure.
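
The R-squared formula can be sketched in Python. The data are invented; the intercept 47.7 and slope 4.1 are the least-squares coefficients for these particular points:

```python
def r_squared(xs, ys, intercept, slope):
    my = sum(ys) / len(ys)
    # Residual sum of squares: data vs. the fitted line
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variability in the outcome measure
    tss = sum((y - my) ** 2 for y in ys)
    return 1 - rss / tss

xs = [1, 2, 3, 4, 5]           # hypothetical data
ys = [52, 55, 61, 64, 68]
r2 = r_squared(xs, ys, 47.7, 4.1)
print(round(r2, 3))
```

For this simple regression the result (≈0.989) is exactly the squared Pearson correlation of x and y (≈0.994^2), illustrating the R^2 = r^2 identity.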

Correlation vs. Linear Regression

  • Correlation is simpler to calculate.
  • Linear regression allows for prediction of new data points (more complex).

Multiple Regression

  • Extending linear regression to multiple dimensions.
  • Predicting happiness from income, extroversion, IQ, age, etc.
  • Multiple regression in R:
    • Gives an R-squared goodness of fit.
    • Indicates how much of the variability in the outcome these predictors jointly explain.
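
The lecture does this in R; a Python sketch assuming NumPy is available, with invented happiness data and two invented predictors:

```python
import numpy as np

# Hypothetical data: predict happiness from income (k$) and extroversion (1-10).
income = np.array([30, 50, 70, 90, 110], dtype=float)
extro = np.array([3, 7, 4, 8, 6], dtype=float)
happy = np.array([5.0, 7.5, 6.5, 9.0, 8.0])

# Design matrix with a column of ones for the intercept (beta_0)
X = np.column_stack([np.ones_like(income), income, extro])
coefs, residuals, rank, _ = np.linalg.lstsq(X, happy, rcond=None)

# R-squared: 1 - residual sum of squares / total variability in the outcome
predictions = X @ coefs
tss = np.sum((happy - happy.mean()) ** 2)
r2 = 1 - np.sum((happy - predictions) ** 2) / tss
print(coefs, round(r2, 3))
```

The same least-squares idea applies: the algorithm finds the combination of coefficients with the lowest residual sum of squares, now in more than two dimensions.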

Non-Linear Relationships

  • Curvilinear vs. Monotonic Relationships

Spearman Rank Correlation

  • Correlates the ranks of the data (highest value, second highest, etc.) rather than the raw values of X and Y.
  • Captures any monotonic relationship, even one that is not a straight line.
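
A from-scratch sketch in Python: compute Pearson's correlation on the ranks instead of the raw values. The data are invented; y = x^2 is curved but monotonic over this range, so the rank correlation is perfect:

```python
def ranks(values):
    # Rank 1 = smallest; ties get the average of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation applied to the ranks rather than the raw values
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]   # curved, but perfectly monotonic
print(spearman(xs, ys))  # 1.0
```

A Pearson correlation on the raw values would be below 1 here, because the relationship is not a straight line.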

Quadratic vs. Polynomial Regression

  • These models capture quadratic and higher-order polynomial relationships.
  • The curve of best fit is found by minimizing the residual sum of squares, just as in linear regression.
  • R-squared serves as a goodness-of-fit measure for these models as well.
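
A quadratic fit can be sketched with NumPy's polynomial least-squares routine (the data are invented to lie near y = x^2):

```python
import numpy as np

# Hypothetical curvilinear data: y is roughly quadratic in x.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([4.1, 0.9, 0.1, 1.1, 3.9])

coefs = np.polyfit(x, y, deg=2)   # fit y = a*x^2 + b*x + c by least squares
fitted = np.polyval(coefs, x)

# R-squared works exactly as in the linear case
rss = np.sum((y - fitted) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss
print(np.round(coefs, 2), round(r2, 3))
```

A straight line would fit these points poorly, but the quadratic model recovers a coefficient near 1 on the x^2 term and an R-squared close to 1.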

Importance of Visually Showing the Data

  • Ensure your data visualizations match what you are intending to measure.
  • Correlation Does Not Imply Causation.