Quantitative Research and Statistical Analysis
Introduction

  • The lecture focuses on quantitative research and statistical analysis beyond null hypothesis significance testing.
  • Emphasis on understanding data and relationships between variables for exploratory research.
  • Correlation and regression are highlighted as tools to understand data relationships.

Quiz 5 Review

  • Purpose of post hoc follow-up tests after a one-way ANOVA:
    • To identify specific group means that differ from one another.
  • Interpretation of a 5x2 factorial ANOVA:
    • Indicates one continuous outcome measure and two categorical predictors.
    • One predictor has five groups, and the other has two groups.
    • Total of three variables: two predictors and one outcome.
  • Reporting the main effect of training:
    • Report F-statistic with degrees of freedom, F value, and significance.
  • Identifying main effects and interactions in plots:
    • Main effect: Compare means of groups for each factor.
    • Interaction: Check if lines are parallel; non-parallel lines indicate interaction.
    • Example: Main effect of study group and attending lectures with interaction.
  • Assumptions of ANOVAs and potential violations:
    • Assumptions include normally distributed residuals and equal variances.
    • Unequal variances in subgroups violate ANOVA assumptions.
  • Analyzing main effects and interactions from tables:
    • Main effect: Compare means across levels of a factor.
    • Interaction: Assess if the effect of one factor depends on another.
    • Example: Main effects of exercise frequency and age, no interaction.
  • Understanding F value in ANOVAs:
    • F value indicates the ratio of variance between groups to variance within groups.
    • F = \frac{Variation \ Between \ Groups}{Variation \ Within \ Groups}
  • Directional alternative hypothesis in T-tests:
    • T-tests can test directional hypotheses (one-tailed) due to symmetric sampling distribution.
    • ANOVAs cannot test directional hypotheses (only non-specified differences).
    • F values are always positive, preventing directional testing.
  • When to run Tukey's HSD post-hoc test:
    • Run after ANOVA if at least one effect is significant.
  • Interpreting interactions in pop science reports:
    • Interaction means the effect of one variable depends on another.
    • Example: Health benefits of chocolate interact with mood.

Relationships Between Variables

  • Focus on continuous variables and their relationships.
  • Examples: date vs. mood, stock price vs. date, rainfall vs. date, extroversion vs. birth month.
  • Questions to consider:
    • Is there a relationship?
    • How strong is the relationship?
    • What shape is the relationship?

Correlation and Regression

  • Tools for capturing and quantifying straight-line relationships are correlation and linear regression.
  • Nonlinear regression captures more complex relationships (e.g., cyclical, quadratic).
  • Goodness of fit measures quantify the strength of the relationship.
  • Exploratory data analysis helps describe and quantify patterns in data.

Measuring Variability

  • Review of measuring variability in data sets.

Simple Data Sets

  • Example A: {2, 2, 2, 2, 2, 2} (no variation).
  • Example B: {1, 3, 1, 0, 1, 4, 3} (more variation).
  • The means of the two datasets are nearly the same (2 vs. ≈1.86), but their variability differs greatly.
  • Measures of variability: range, interquartile range, standard deviation.

Standard Deviation Formula

  • Formula for standard deviation:
    • \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N-1}}
    • Where:
      • \sigma = standard deviation
      • x_i = each data point
      • \mu = mean of all data points
      • N = number of data points
  • Steps:
    • Subtract the mean from each data point: (x_i - \mu)
    • Square the result: (x_i - \mu)^2
    • Sum all the squared values: \sum_{i=1}^{N} (x_i - \mu)^2
    • Divide by N-1
    • Take the square root.
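
The five steps above can be sketched directly in code. A minimal Python version (the course works in R) applied to example datasets A and B from the previous section:

```python
import math

def sample_sd(data):
    n = len(data)
    mean = sum(data) / n
    squared_diffs = [(x - mean) ** 2 for x in data]  # (x_i - mean)^2
    variance = sum(squared_diffs) / (n - 1)          # divide by N - 1
    return math.sqrt(variance)                       # back to original units

a = [2, 2, 2, 2, 2, 2]
b = [1, 3, 1, 0, 1, 4, 3]
print(sample_sd(a))  # 0.0 — no variation at all
print(round(sample_sd(b), 3))
```

Dataset A has zero spread, so its standard deviation is exactly 0; dataset B's points are spread out, giving a standard deviation of about 1.46 in the original units.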

Detailed Calculation

  • Example dataset and step-by-step calculation.
  • Subtracting the mean: Measuring how far each data point is from the mean.
  • Squaring: Disregarding whether the data point is above or below the mean.
    • Creates squared values.
  • Sum of Squares: Sum of the squared differences between each data point and the mean.
  • Dividing by N-1: Calculating the variance.
  • Square Root: Returning to original units, resulting in the standard deviation.

Standard Deviation

  • Small standard deviation: Data points are tightly clustered.
  • Large standard deviation: Data points are spread out.
  • Interpretable because it's in the original units.

Covariance and Correlation

  • Covariance: Measures how much two variables change together.
  • Formula for covariance (conceptual, not for calculation).
  • Covariance involves:
    • Measuring how far each data point is from the mean for both X and Y.
    • Multiplying these differences.
    • Summing the results.
  • Positive values: Variables agree with one another (both above or below their means).
  • Negative values: Variables disagree.
  • Covariance depends on the units of the original tests, making it less interpretable.
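
The steps above can be sketched in Python (the paired study-hours/exam-score data are invented for illustration):

```python
def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each pair of deviations from the means, sum, divide by N - 1
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 55, 61, 64, 68]  # hypothetical exam scores
cov = covariance(hours, scores)
print(cov)  # positive: the variables tend to be above/below their means together
```

The result (10.25 here) is in hours-times-points, which is why covariance is hard to interpret on its own.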

Correlation

  • Standardized measure that doesn't depend on original units.
  • Formula (conceptual, not for calculation).
  • Correlation is measured by:
    • r = \frac{covariance(X,Y)}{standard \ deviation(X) * standard \ deviation(Y)}
  • Ranges from -1 to +1.
  • Perfect Relationship: r = 1
  • Near-perfect alignment: r ≈ .97
  • Pearson correlation: Measures the strength of a straight-line relationship.
  • Zero correlation: No linear relationship between the variables.
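
Standardizing the covariance by the two standard deviations gives r. A Python sketch using invented study-hours/exam-score data:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance divided by the product of the two standard deviations
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

hours = [1, 2, 3, 4, 5]        # hypothetical data
scores = [52, 55, 61, 64, 68]
r = pearson_r(hours, scores)
print(round(r, 3))
```

Because the units cancel, r for these data (≈0.994) is directly interpretable: a nearly perfect straight-line relationship.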

Linear Regression

  • Statistical toolkit for finding the best-fitting line to describe the relationship between variables.
  • Requires slope and shift parameters.
  • Statistical model: A shape that describes the relationship between variables.

Equation of a Line

  • y = \beta_0 + \beta_1 x + \epsilon
    • \beta_0 = shift (intercept)
    • \beta_1 = slope
    • \epsilon = error

Best Fitting Line

  • Fitting a line to data: Involves minimizing the differences between the data points and the line.
  • Residual sum of squares: Measure of the difference between data points and the line.
  • Iteratively adjusting the line: Finding the line that has the lowest possible residual sum of squares.
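
The iterative search described above converges to the same answer the closed-form least-squares solution gives. A minimal Python sketch with invented data:

```python
def fit_line(xs, ys):
    # Closed-form least squares: slope = cov(x, y) / var(x);
    # the intercept (shift) then follows from the two means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

xs = [1, 2, 3, 4, 5]           # hypothetical predictor values
ys = [52, 55, 61, 64, 68]      # hypothetical outcome values
b0, b1 = fit_line(xs, ys)
print(b0, b1)                  # beta_0 (shift) and beta_1 (slope)
```

No other line through these points has a smaller residual sum of squares than the one defined by these two coefficients.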

Predicting One Variable from Another

  • Regression allows predicting the value of one variable based on another.
  • Predictor variable: Plotted on the x-axis.
  • Outcome variable: Plotted on the y-axis.
  • Example: Predicting a toddler's vocabulary from their age in months.

Minimizing Residual Errors

  • Linear Model: Finding the best compromise between all data points.
  • Algorithm: Iteratively adjusting the slope and shift. Minimizing error.

Making Predictions

  • Coefficients (slope and shift) are estimated by the fitting algorithm in R (e.g., lm()); plugging a new x value into the fitted equation yields a predicted y.

Goodness of Fit Measures

  • How well the fitted line captures the data.

R-Squared

  • Measure of how well the model predicts the outcome variable.
  • R^2 = 1 - \frac{Residual \ Sum \ of \ Squares}{Total \ Variability \ in \ outcome \ Measure}
  • Proportion of the variance that you've explained.
  • For simple linear regression, R-squared equals the squared Pearson correlation (R^2 = r^2).
  • The correlation r is the square root of this goodness-of-fit measure.
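
The R-squared formula can be sketched in Python. The data are invented; the intercept 47.7 and slope 4.1 are the least-squares coefficients for these particular points:

```python
def r_squared(xs, ys, intercept, slope):
    my = sum(ys) / len(ys)
    # Residual sum of squares: data vs. the fitted line
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variability in the outcome measure
    tss = sum((y - my) ** 2 for y in ys)
    return 1 - rss / tss

xs = [1, 2, 3, 4, 5]           # hypothetical data
ys = [52, 55, 61, 64, 68]
r2 = r_squared(xs, ys, 47.7, 4.1)
print(round(r2, 3))
```

For this simple regression the result (≈0.989) is exactly the squared Pearson correlation of x and y (≈0.994^2), illustrating the R^2 = r^2 identity.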

Correlation vs. Linear Regression

  • Correlation is simpler to calculate.
  • Linear regression allows for prediction of new data points (more complex).

Multiple Regression

  • Extending linear regression to multiple dimensions.
  • Predicting happiness from income, extroversion, IQ, age, etc.
  • Multiple regression in R:
    • Gives an R-squared goodness of fit.
    • Indicates how much of the variability in the outcome these predictors jointly explain.
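
The lecture does this in R; a Python sketch assuming NumPy is available, with invented happiness data and two invented predictors:

```python
import numpy as np

# Hypothetical data: predict happiness from income (k$) and extroversion (1-10).
income = np.array([30, 50, 70, 90, 110], dtype=float)
extro = np.array([3, 7, 4, 8, 6], dtype=float)
happy = np.array([5.0, 7.5, 6.5, 9.0, 8.0])

# Design matrix with a column of ones for the intercept (beta_0)
X = np.column_stack([np.ones_like(income), income, extro])
coefs, residuals, rank, _ = np.linalg.lstsq(X, happy, rcond=None)

# R-squared: 1 - residual sum of squares / total variability in the outcome
predictions = X @ coefs
tss = np.sum((happy - happy.mean()) ** 2)
r2 = 1 - np.sum((happy - predictions) ** 2) / tss
print(coefs, round(r2, 3))
```

The same least-squares idea applies: the algorithm finds the combination of coefficients with the lowest residual sum of squares, now in more than two dimensions.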

Non-Linear Relationships

  • Curvilinear vs. Monotonic Relationships

Spearman Rank Correlation

  • Correlates the ranks of the data (highest value, second highest, etc.) rather than the raw values of X and Y.
  • Captures any monotonic relationship, even one that is not a straight line.
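
A from-scratch sketch in Python: compute Pearson's correlation on the ranks instead of the raw values. The data are invented; y = x^2 is curved but monotonic over this range, so the rank correlation is perfect:

```python
def ranks(values):
    # Rank 1 = smallest; ties get the average of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation applied to the ranks rather than the raw values
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]   # curved, but perfectly monotonic
print(spearman(xs, ys))  # 1.0
```

A Pearson correlation on the raw values would be below 1 here, because the relationship is not a straight line.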

Quadratic vs. Polynomial Regression

  • These models capture quadratic and higher-order polynomial relationships.
  • The curve of best fit is found by minimizing the residual sum of squares, just as in linear regression.
  • R-squared serves as a goodness-of-fit measure for these models as well.
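
A quadratic fit can be sketched with NumPy's polynomial least-squares routine (the data are invented to lie near y = x^2):

```python
import numpy as np

# Hypothetical curvilinear data: y is roughly quadratic in x.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([4.1, 0.9, 0.1, 1.1, 3.9])

coefs = np.polyfit(x, y, deg=2)   # fit y = a*x^2 + b*x + c by least squares
fitted = np.polyval(coefs, x)

# R-squared works exactly as in the linear case
rss = np.sum((y - fitted) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss
print(np.round(coefs, 2), round(r2, 3))
```

A straight line would fit these points poorly, but the quadratic model recovers a coefficient near 1 on the x^2 term and an R-squared close to 1.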

Importance of Visually Showing the Data

  • Ensure your data visualizations match what you are intending to measure.
  • Correlation Does Not Imply Causation.