Correlation + Regression
Overview of Sampling Distribution and Sampling Error
Sampling Error: The variability due to chance in statistical samples.
Definition: Sampling error represents the difference between the sample statistic and the actual population parameter.
Concept: Assessing how likely it is that a calculated mean (or any other statistic) arose from random sampling variation alone.
Population vs. Sample:
Population: The entire group we want to understand or generalize our results to.
Sample: A subset of the population that we actually collect data from because we cannot access the entire population.
Sampling Variation:
Each sample taken will typically yield slightly different results due to inherent variability.
Example: Drawing a handful of 20 Skittles will yield different results in color distributions each time.
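The Skittles example can be simulated in a few lines. This is a minimal sketch: the five colors and their equal shares in the population are assumptions made for illustration, not facts about real Skittles bags, and `draw_handful` is a hypothetical helper name.

```python
import random
from collections import Counter

# Assumed population: five colors in equal proportion (illustrative only).
COLORS = ["red", "orange", "yellow", "green", "purple"]

def draw_handful(rng, size=20):
    """Draw one handful at random and count how many of each color it holds."""
    return Counter(rng.choice(COLORS) for _ in range(size))

rng = random.Random(0)  # fixed seed so the run is repeatable
handfuls = [draw_handful(rng) for _ in range(3)]

# Each handful comes from the same population, yet the counts differ.
for i, counts in enumerate(handfuls, start=1):
    print(f"handful {i}:", {color: counts[color] for color in COLORS})
```

Every handful holds 20 candies from the same population, but the color counts usually differ from draw to draw; that difference is sampling variation.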
Standard Error and Its Importance
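Standard Error: A measure of how far sample means are expected to deviate from the population mean.
It depends on sample size: for the mean, it is estimated as the sample standard deviation divided by the square root of the sample size, so larger samples tend to fall closer to the population mean.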
Implications: Understanding standard error helps determine the reliability of sample statistics.
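A short sketch of the standard error of the mean, computed as the sample standard deviation divided by the square root of the sample size. The scores are made up for illustration and `standard_error` is a hypothetical helper name.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

scores = [12, 15, 11, 14, 13, 16, 10, 13]  # invented scores, mean 13, s = 2
se = standard_error(scores)
print(round(se, 3))  # 0.707
```

With the same spread, quadrupling the sample size would halve the standard error, which is why larger samples give more reliable estimates of the mean.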
Normal Distribution in Sampling
The means of all possible samples of a fixed size from the population form a sampling distribution, which is approximately normal when the sample size is reasonably large.
Most sample means will cluster around the actual population mean.
Extreme values in samples (e.g., all yellow Skittles) are rare but possible.
The standard error estimates the spread of the sampling distribution.
Tail Ends of the Distribution: Predictions about outliers or extreme sample means can guide decisions about the sample's reliability in representing the population.
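The points in this section can be checked with a small simulation. This is a sketch under stated assumptions: the population (10,000 scores with mean near 50 and standard deviation near 10), the sample size of 25, and the 2,000 repetitions are all invented for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Draw many samples of the same size and record each sample mean.
n = 25
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

# Spread of the sample means vs. the spread the standard error predicts.
observed_spread = statistics.stdev(sample_means)
predicted_spread = pop_sd / math.sqrt(n)

print("population mean:     ", round(pop_mean, 2))
print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("observed spread:     ", round(observed_spread, 2))
print("predicted spread:    ", round(predicted_spread, 2))
```

The sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size. Sample means far out in the tails do occur, but rarely.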
Hypothesis Testing Overview
Null Hypothesis ($H_0$): A statement asserting no effect or difference.
Cutoff Values (Alpha Levels):
Common thresholds include alpha of 0.05 or 0.01.
These values define the critical regions: a result that falls inside them is declared statistically significant.
Decision Making: Decide whether to reject or fail to reject the null hypothesis based on sampling data.
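The decision rule can be sketched with a two-sided one-sample z-test. The choice of a z-test (which assumes the population standard deviation is known) is an assumption made here to keep the example short; the notes do not name a specific test. The numbers and the `z_test` helper are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test; returns (z, p_value, reject_null)."""
    se = sigma / math.sqrt(n)                # standard error of the mean
    z = (sample_mean - mu0) / se             # distance from H0 in standard errors
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Invented example: H0 says the population mean is 100 (sigma = 15).
z, p, reject_05 = z_test(sample_mean=103, mu0=100, sigma=15, n=100, alpha=0.05)
_, _, reject_01 = z_test(sample_mean=103, mu0=100, sigma=15, n=100, alpha=0.01)

print("z =", round(z, 2), "p =", round(p, 4))
print("reject at alpha = 0.05:", reject_05)
print("reject at alpha = 0.01:", reject_01)
```

The same data lead to rejecting the null hypothesis at alpha = 0.05 but failing to reject it at alpha = 0.01, which shows why the cutoff has to be chosen before looking at the data.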
Types of Errors in Hypothesis Testing
Type I Error (Alpha): Rejecting the null hypothesis when it is true.
Type II Error (Beta): Failing to reject the null hypothesis when it is false.
Statistical Power: The probability of correctly rejecting a false null hypothesis; it is defined as $1 - \beta$. High power is desirable when conducting research.
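Power can be computed directly for a simple case. This sketch assumes a two-sided one-sample z-test with known population standard deviation; the test choice, the numbers, and the `power_two_sided_z` helper are assumptions for illustration.

```python
import math
from statistics import NormalDist

def power_two_sided_z(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided z-test when the true mean differs by `effect`."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)       # cutoff set by the Type I error rate
    shift = effect / (sigma / math.sqrt(n))  # true effect in standard-error units
    # Probability that the test statistic lands in either critical region.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

power_null = power_two_sided_z(effect=0, sigma=15, n=100)   # H0 true
power_small = power_two_sided_z(effect=3, sigma=15, n=50)
power_large = power_two_sided_z(effect=3, sigma=15, n=200)

print("rejection rate when H0 is true:", round(power_null, 3))
print("power with n = 50: ", round(power_small, 3))
print("power with n = 200:", round(power_large, 3))
```

When the null hypothesis is true, the rejection rate equals alpha (the Type I error rate). When it is false, power rises with sample size, and beta, the Type II error rate, is one minus that power.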
Z Scores and Standardization
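Z Scores: A way to standardize values from different scales so they can be compared meaningfully; each score is expressed as the number of standard deviations it lies above or below its mean.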
Useful in evaluating measures (e.g., anxiety vs. depression) that are recorded on different scales.
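A minimal sketch of standardization. The two scales and their scores are invented, and using the sample standard deviation (n - 1 denominator) is an assumption; some courses use the population standard deviation instead.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption, see lead-in
    return [(v - mean) / sd for v in values]

anxiety = [10, 20, 30, 40, 50]   # hypothetical 0-100 scale
depression = [1, 2, 3, 4, 5]     # hypothetical 1-5 scale

z_anxiety = z_scores(anxiety)
z_depression = z_scores(depression)

print([round(z, 2) for z in z_anxiety])
print([round(z, 2) for z in z_depression])
```

Although the raw scores sit on very different scales, the z scores are identical here, so a person's standing on one measure can be compared directly with their standing on the other.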
Sample Size Considerations
Larger sample sizes typically provide better power and more reliable results.
There is a trade-off: very large samples can make trivial effects statistically significant even when they are not practically significant.
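The trade-off can be illustrated by holding a tiny effect fixed and letting the sample size grow. This is a sketch assuming a two-sided z-test with known standard deviation; the half-point difference on a scale with standard deviation 15 is invented, and `two_sided_p` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def two_sided_p(diff, sigma, n):
    """Two-sided p-value for an observed mean difference `diff` with n cases."""
    z = diff / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

diff, sigma = 0.5, 15.0          # invented: a half-point difference, SD of 15
standardized = diff / sigma      # effect in SD units; does not depend on n

for n in (100, 10_000, 100_000):
    print(f"n = {n:>7}: p = {two_sided_p(diff, sigma, n):.4f}")
print("effect in SD units:", round(standardized, 3))
```

The effect stays the same size in standard deviation units, yet the p-value drops below conventional alpha levels once the sample is large enough. Statistical significance alone therefore does not show that an effect matters in practice.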
Correlation and Regression Review
Correlation: A statistical measure that describes the extent to which variables change together.
Regression: Used to predict the value of one variable based on the value of another.
Scatter Plots and Relationships: Visual representation of data points from two variables can highlight potential correlations.
Outliers: Extreme data points that do not fit the overall pattern.
Interpreting a scatter plot gives insight into the relationship between two variables (e.g., exam anxiety vs. exam performance).
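Prediction with regression can be sketched using the exam anxiety example. The least-squares line used here is one common way to do simple regression and is an assumption, since the notes do not specify a method; the data and the `fit_line` helper are invented.

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: anxiety rating (1-5) and exam performance (percent).
anxiety = [1, 2, 3, 4, 5]
performance = [90, 85, 80, 70, 75]

slope, intercept = fit_line(anxiety, performance)
predicted = intercept + slope * 3.5  # predicted performance at anxiety 3.5

print("slope:", slope, "intercept:", intercept)
print("predicted performance at anxiety 3.5:", predicted)
```

The negative slope says that, in this made-up data, each extra point of anxiety goes with a lower predicted exam score. Predictions are most trustworthy inside the range of anxiety values that were actually observed.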
Visualizing Correlations
Purpose of scatter plots in visualizing correlations:
Plot individual scores of two variables to visually assess relationships.
Identify outliers and trends.
Types of Correlation
Positive Correlation: As one variable increases, the other also increases.
Negative Correlation: As one variable increases, the other decreases.
No Correlation: No discernible trend or association between the variables.
Covariance and its Role
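Covariance: Measures how much two variables vary together. Positive covariance indicates a direct relationship, while negative covariance indicates an inverse relationship. Its size depends on the units of measurement, so covariance alone does not show how strong the relationship is.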
Standardization of Covariance: To derive the correlation coefficient, one should standardize by dividing the covariance by the product of standard deviations of both variables.
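The standardization step can be sketched as follows. The data are invented, the helper names are hypothetical, and the n - 1 denominator (sample covariance and sample standard deviation) is an assumption.

```python
from statistics import fmean, stdev

def covariance(x, y):
    """Sample covariance: average cross-product of deviations (n - 1 denominator)."""
    mean_x, mean_y = fmean(x), fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))

hours = [1, 2, 3, 4, 5]              # invented study time in hours
score = [52, 55, 61, 64, 68]         # invented exam scores
minutes = [h * 60 for h in hours]    # same study time, different unit

cov_hours, cov_minutes = covariance(hours, score), covariance(minutes, score)
r_hours, r_minutes = correlation(hours, score), correlation(minutes, score)

print("covariance (hours):  ", cov_hours)
print("covariance (minutes):", cov_minutes)
print("r (hours):  ", round(r_hours, 4))
print("r (minutes):", round(r_minutes, 4))
```

Changing the unit from hours to minutes multiplies the covariance by 60 but leaves r unchanged. Removing that dependence on units is the purpose of dividing by the standard deviations.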
Calculation of Correlation Coefficient (r)
Correlation Coefficient (r): A standardized measure representing the strength and direction of a linear relationship between two variables.
Ranges from -1 to +1, where:
+1 indicates a perfect positive correlation,
-1 indicates a perfect negative correlation,
0 indicates no linear relationship.
Mathematics of Correlation Coefficient:
The formula for $r$ involves the covariance of the two variables divided by the product of their respective standard deviations.
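Written out, the verbal description above corresponds to the following. The symbols are assumptions introduced here: $x_i$ and $y_i$ are paired scores, $\bar{x}$ and $\bar{y}$ their means, $s_x$ and $s_y$ their sample standard deviations, and $n$ the number of pairs. The $n - 1$ denominator is the sample version; some texts use $n$.

```latex
\[
\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1},
\qquad
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
\]
```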
Importance of Power in Research Studies:
Adequate power ensures that the research project is well-designed and has a higher chance of detecting true effects.
Conclusion and Final Thoughts
Power analysis belongs early in research planning, so that a sample size large enough to yield meaningful results can be secured. Scatter plots and correlation coefficients show how variables relate to each other and prepare the ground for more detailed analyses.