Quantitative Research and Statistical Analysis
Introduction
- The lecture focuses on quantitative research and statistical analysis beyond null hypothesis significance testing.
- Emphasis on understanding data and relationships between variables for exploratory research.
- Correlation and regression are highlighted as tools to understand data relationships.
Quiz 5 Review
- Purpose of post hoc follow-up tests after a one-way ANOVA:
- To identify specific group means that differ from one another.
- Interpretation of a 5x2 factorial ANOVA:
- Indicates one continuous outcome measure and two categorical predictors.
- One predictor has five groups, and the other has two groups.
- Total of three variables: two predictors and one outcome.
- Reporting the main effect of training:
- Report F-statistic with degrees of freedom, F value, and significance.
- Identifying main effects and interactions in plots:
- Main effect: Compare means of groups for each factor.
- Interaction: Check if lines are parallel; non-parallel lines indicate interaction.
- Example: Main effect of study group and attending lectures with interaction.
- Assumptions of ANOVAs and potential violations:
- Assumptions include normally distributed residuals and equal variances.
- Unequal variances in subgroups violate ANOVA assumptions.
- Analyzing main effects and interactions from tables:
- Main effect: Compare means across levels of a factor.
- Interaction: Assess if the effect of one factor depends on another.
- Example: Main effects of exercise frequency and age, no interaction.
- Understanding F value in ANOVAs:
- F value indicates the ratio of variance between groups to variance within groups.
- F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}}
- Directional alternative hypothesis in T-tests:
- T-tests can test directional hypotheses (one-tailed) due to symmetric sampling distribution.
- ANOVAs cannot test directional hypotheses (only non-specified differences).
- F values are always positive, preventing directional testing.
- When to run Tukey's HSD post-hoc test:
- Run after ANOVA if at least one effect is significant.
- Interpreting interactions in pop science reports:
- Interaction means the effect of one variable depends on another.
- Example: Health benefits of chocolate interact with mood.
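The F ratio from the review above can be computed by hand: between-groups variance over within-groups variance. Below is a minimal Python sketch for a one-way ANOVA with three hypothetical groups (in the course itself this would be done in R, e.g., with `aov()`; all data here are made up for illustration):

```python
# Three hypothetical groups of scores
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [4.0, 6.0, 5.0],
]

k = len(groups)                                  # number of groups
n_total = sum(len(g) for g in groups)            # total sample size
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups variance: spread of the group means around the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-groups variance: spread of scores around their own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
ms_within = ss_within / (n_total - k)

# F is the ratio of the two; it is always positive, which is why
# ANOVAs cannot test directional hypotheses
F = ms_between / ms_within
```

With these hypothetical numbers the group means are 5, 8, and 5 around a grand mean of 6, giving F = 9.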
Relationships Between Variables
- Focus on continuous variables and their relationships.
- Examples: date vs. mood, stock price vs. date, rainfall vs. date, extroversion vs. birth month.
- Questions to consider:
- Is there a relationship?
- How strong is the relationship?
- What shape is the relationship?
Correlation and Regression
- Correlation and linear regression are the main tools for capturing and quantifying straight-line relationships.
- Nonlinear regression captures more complex relationships (e.g., cyclical, quadratic).
- Goodness of fit measures quantify the strength of the relationship.
- Exploratory data analysis helps describe and quantify patterns in data.
Measuring Variability
- Review of measuring variability in data sets.
Simple Data Sets
- Example A: {2, 2, 2, 2, 2, 2} (no variation).
- Example B: {1, 3, 1, 0, 1, 4, 3} (more variation).
- The two datasets have similar means but very different variability.
- Measures of variability: range, interquartile range, standard deviation.
- Formula for standard deviation:
- \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N-1}}
- Where:
- \sigma = standard deviation
- x_i = each data point
- \mu = mean of all data points
- N = number of data points
- Steps:
- Subtract the mean from each data point: (x_i - \mu)
- Square the result: (x_i - \mu)^2
- Sum all the squared values: \sum_{i=1}^{N} (x_i - \mu)^2
- Divide by N-1
- Take the square root.
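The steps above can be sketched directly in code. A minimal Python version using dataset B from the examples (the course works in R, where `sd()` does this in one call):

```python
import math

data = [1, 3, 1, 0, 1, 4, 3]  # Example B from above
n = len(data)
mean = sum(data) / n                        # the mean, mu

# Step 1-2: subtract the mean from each point, then square
squared_devs = [(x - mean) ** 2 for x in data]

# Step 3: sum of squares
ss = sum(squared_devs)

# Step 4: divide by N-1 to get the variance
variance = ss / (n - 1)

# Step 5: square root returns to the original units
sd = math.sqrt(variance)
```

For this dataset the variance works out to 15/7 and the standard deviation to about 1.464, in the same units as the data.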
Detailed Calculation
- Example dataset and step-by-step calculation.
- Subtracting the mean: Measuring how far each data point is from the mean.
- Squaring: Disregarding whether the data point is above or below the mean.
- Sum of Squares: Sum of the squared differences between each data point and the mean.
- Dividing by N-1: Calculating the variance.
- Square Root: Returning to original units, resulting in the standard deviation.
Standard Deviation
- Small standard deviation: Data points are tightly clustered.
- Large standard deviation: Data points are spread out.
- Interpretable because it's in the original units.
Covariance and Correlation
- Covariance: Measures how much two variables change together.
- Formula for covariance (conceptual, not for calculation).
- Covariance involves:
- Measuring how far each data point is from the mean for both X and Y.
- Multiplying these differences.
- Summing the results.
- Positive values: Variables agree with one another (both above or below their means).
- Negative values: Variables disagree.
- Covariance depends on the units of the original tests, making it less interpretable.
Correlation
- Standardized measure that doesn't depend on original units.
- Formula (conceptual, not for calculation).
- Correlation is measured by:
- r = \frac{\text{covariance}(X, Y)}{\text{SD}(X) \times \text{SD}(Y)}
- Ranges from -1 to +1.
- Perfect positive relationship: r = +1; perfect negative relationship: r = -1.
- Nearly perfect alignment: e.g., r = .97.
- Pearson correlation: Measures the strength of a straight-line relationship.
- Zero correlation: no linear relationship between the variables.
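The conceptual formulas above translate directly into code. A Python sketch with hypothetical paired data (in R, `cov()` and `cor()` do this directly):

```python
# Hypothetical paired observations
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Covariance: multiply each pair of deviations from the means,
# sum, and divide by N-1 (positive terms = the variables "agree")
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Standard deviation of each variable (same N-1 convention)
sd_x = (sum((x - mean_x) ** 2 for x in xs) / (n - 1)) ** 0.5
sd_y = (sum((y - mean_y) ** 2 for y in ys) / (n - 1)) ** 0.5

# Pearson correlation: covariance standardized by both SDs,
# which removes the original units and bounds r between -1 and +1
r = cov / (sd_x * sd_y)
```

For this near-linear data, r comes out just under +1, whereas the raw covariance (about 4.9) has no such interpretable scale.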
Linear Regression
- Statistical toolkit for finding the best-fitting line to describe the relationship between variables.
- Requires slope and shift parameters.
- Statistical model: A shape that describes the relationship between variables.
Equation of a Line
- y = \beta_0 + \beta_1 x + \epsilon
- \beta_0 = shift (intercept)
- \beta_1 = slope
- \epsilon = error
Best Fitting Line
- Fitting a line to data: Involves minimizing the differences between the data points and the line.
- Residual sum of squares: Measure of the difference between data points and the line.
- Iteratively adjusting the line: Finding the line that has the lowest possible residual sum of squares.
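The iterative search described above converges to the same answer as the closed-form least-squares solution, which can be sketched in a few lines of Python (hypothetical data; in R, `lm()` does this fitting):

```python
# Hypothetical data following a roughly straight line
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares fit (the minimizer of the residual
# sum of squares): slope = cov(x, y) / var(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x   # the "shift", beta_0

# Residual sum of squares at the fitted line: no other straight
# line can make this number smaller
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
```

Here the best-fitting line is y ≈ 0.05 + 1.99x, a compromise that leaves only a small residual sum of squares.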
Predicting One Variable from Another
- Regression allows predicting the value of one variable based on another.
- Predictor variable: Plotted on the x-axis.
- Outcome variable: Plotted on the y-axis.
- Example: Predicting a toddler's vocabulary from their age in months.
Minimizing Residual Errors
- Linear Model: Finding the best compromise between all data points.
- Algorithm: Iteratively adjusting the slope and shift. Minimizing error.
Making Predictions
- The coefficients (slope and shift) are estimated by R's fitting algorithm.
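Once the coefficients are estimated, prediction is just plugging a new x value into the line equation. A Python sketch for the toddler-vocabulary example (both coefficients here are hypothetical, not estimates from real data):

```python
# Hypothetical coefficients, as if estimated from vocabulary-by-age data
beta_0 = -300.0   # shift (intercept); hypothetical value
beta_1 = 25.0     # slope; hypothetical words gained per month

def predict_vocabulary(age_months: float) -> float:
    """Predicted vocabulary size from the fitted line y = beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * age_months

# Predict vocabulary at 24 months (only sensible within the age
# range the line was fit on: a line predicts negative words for
# very young ages)
vocab_24 = predict_vocabulary(24)
```

With these made-up coefficients, the model predicts a 300-word vocabulary at 24 months.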
Goodness of Fit Measures
- How well the fitted line captures the data.
R-Squared
- Measure of how well the model predicts the outcome variable.
- R^2 = 1 - \frac{\text{Residual Sum of Squares}}{\text{Total Variability in Outcome Measure}}
- Proportion of the variance that you've explained.
- For simple linear regression, R-squared equals the squared Pearson correlation (R^2 = r^2).
- Equivalently, r is the square root of this goodness-of-fit measure.
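The R-squared formula, and its relationship to the Pearson correlation, can be checked numerically. A Python sketch on the same kind of hypothetical data as above (in R, `summary(lm(...))` reports R-squared directly):

```python
# Hypothetical near-linear data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# R^2 = 1 - (residual sum of squares / total variability in the outcome)
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
tss = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - rss / tss

# Pearson r computed directly; for simple regression r**2 equals R^2
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
sd_x = (sum((x - mean_x) ** 2 for x in xs) / (n - 1)) ** 0.5
sd_y = (sum((y - mean_y) ** 2 for y in ys) / (n - 1)) ** 0.5
r = cov / (sd_x * sd_y)
```

For this data R-squared is about 0.997: the line explains nearly all of the variance in y, and squaring r gives the same number.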
Correlation vs. Linear Regression
- Correlation is simpler to calculate.
- Linear regression allows for prediction of new data points (more complex).
Multiple Regression
- Extending linear regression to multiple dimensions.
- Predicting happiness from income, extroversion, IQ, age, etc.
- Multiple regression in R:
- Gives an R-squared goodness of fit.
- R-squared tells you how much of the variability in the outcome the predictors jointly explain.
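Multiple regression is the same least-squares idea with more than one predictor column. A Python sketch using NumPy with hypothetical data (in R this corresponds to `lm(happiness ~ income + extroversion)`):

```python
import numpy as np

# Hypothetical data: predict happiness from income and extroversion
income = np.array([30.0, 45.0, 60.0, 75.0, 90.0])   # in $1000s (made up)
extroversion = np.array([2.0, 5.0, 3.0, 8.0, 6.0])  # 1-10 scale (made up)
happiness = np.array([4.0, 6.5, 5.5, 9.0, 8.0])     # outcome (made up)

# Design matrix: a column of 1s for the intercept plus one column per predictor
X = np.column_stack([np.ones_like(income), income, extroversion])

# Least-squares fit across all predictors at once
coefs, *_ = np.linalg.lstsq(X, happiness, rcond=None)

# Same goodness-of-fit measure as before:
# R^2 = proportion of outcome variability the predictors jointly explain
predictions = X @ coefs
rss = np.sum((happiness - predictions) ** 2)
tss = np.sum((happiness - happiness.mean()) ** 2)
r_squared = 1 - rss / tss
```

The returned coefficients are an intercept plus one slope per predictor; each slope describes that predictor's contribution holding the others fixed.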
Non-Linear Relationships
- Curvilinear vs. monotonic relationships: monotonic relationships move consistently in one direction; curvilinear relationships (e.g., quadratic) change direction.
Spearman Rank Correlation
- Works on the ranks of the data (highest value, second highest, ..., lowest) rather than the raw values of X and Y.
- Captures monotonic relationships even when they are not straight lines.
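Spearman's correlation is just the Pearson correlation computed on the ranks. A Python sketch (hypothetical data; tie handling is omitted for simplicity, and in R `cor(x, y, method = "spearman")` handles it properly):

```python
def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties for simplicity."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A perfectly monotonic but non-linear relationship (hypothetical)
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]   # y = x**3

r_pearson = pearson(xs, ys)                  # < 1: the relation is not a line
r_spearman = pearson(ranks(xs), ranks(ys))   # = 1: the ranks agree perfectly
```

Because y always increases with x, Spearman's correlation is exactly 1 even though the straight-line (Pearson) correlation is not.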
Quadratic vs. Polynomial Regression
- These models capture quadratic and higher-order polynomial relationships.
- They use the same least-squares method to find the curve of best fit.
- R-squared applies as a goodness-of-fit measure for these models as well.
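A quadratic fit is still least squares, just with a squared term added. A Python sketch using NumPy's polynomial fitting on hypothetical curvilinear data (in R this is roughly `lm(y ~ x + I(x^2))`):

```python
import numpy as np

# Hypothetical curvilinear data: y is roughly quadratic in x
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([4.1, 1.2, 0.1, 0.9, 4.2, 8.8])

# Fit a degree-2 polynomial by least squares
coefs = np.polyfit(x, y, deg=2)      # [a, b, c] for y = a*x^2 + b*x + c
fitted = np.polyval(coefs, x)

# Goodness of fit works exactly as in the linear case:
# R^2 = 1 - RSS / total variability in y
rss = np.sum((y - fitted) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r_squared = 1 - rss / tss
```

A straight line would fit this U-shaped data poorly, but the quadratic model's R-squared is close to 1.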
Importance of Visually Showing the Data
- Ensure your data visualizations match what you are intending to measure.
- Correlation Does Not Imply Causation.