Correlation + Regression

Sampling Error: The variability due to chance in statistical samples.
- Definition: Sampling error represents the difference between the sample statistic and the actual population parameter.
- Concept: Asking how likely it is that a calculated mean (or any other statistic) differs from the population value purely because of random sampling variation.
Population vs. Sample:
- Population: The entire group we want to understand or generalize our results to.
- Sample: A subset of the population that we actually collect data from because we cannot access the entire population.
Sampling Variation:
- Each sample taken will typically yield slightly different results due to inherent variability.
- Example: Drawing a handful of 20 Skittles will yield different results in color distributions each time.
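The Skittles example can be simulated in a few lines. This is a minimal sketch: the five colors and their equal proportions are assumptions made for illustration, not facts about real Skittles bags.

```python
import random
from collections import Counter

# Hypothetical population: five colors in equal proportions (an assumption).
COLORS = ["red", "orange", "yellow", "green", "purple"]

def draw_handful(n=20, rng=None):
    """Draw n Skittles at random and count how many of each color appear."""
    rng = rng or random.Random()
    return Counter(rng.choice(COLORS) for _ in range(n))

rng = random.Random(42)  # fixed seed so the run is repeatable
handfuls = [draw_handful(20, rng) for _ in range(3)]

# Each handful comes from the same population, yet the counts differ.
for i, counts in enumerate(handfuls, start=1):
    print(f"Handful {i}:", dict(counts))
```

Every handful has 20 Skittles, but the color counts change from draw to draw; that difference is sampling variation.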

Standard Error: The standard deviation of the sampling distribution; it indicates how far sample means are expected to deviate from the population mean.
- Depends on sample size: for the mean it is estimated as $s / \sqrt{n}$, so larger samples give smaller standard errors.
- Implications: Understanding standard error helps determine the reliability of sample statistics.
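A common estimate of the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The sketch below applies that estimate to made-up scores; the function name `standard_error` and the data are illustrative only.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))

scores = [12, 15, 11, 14, 13, 16, 10, 13]  # made-up data for illustration
se = standard_error(scores)
print(f"mean = {statistics.fmean(scores):.2f}, SE = {se:.3f}")
```

Here the sample standard deviation is 2 and n is 8, so the standard error is about 0.707. Collecting four times as many scores with the same spread would halve it.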

If we took every possible sample of a fixed size from the population and plotted the sample means, the resulting sampling distribution would be approximately normal (for reasonably large samples).
- Most sample means will cluster around the actual population mean.
- Extreme values in samples (e.g., all yellow Skittles) are rare but possible.
The standard error estimates the spread of the sampling distribution.
Tail Ends of the Distribution: Sample means far out in the tails are unlikely if the sample really comes from the assumed population, so how extreme a sample mean is can guide decisions about whether the sample represents that population.
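The sampling distribution can be approximated by simulation instead of enumerating every possible sample. The sketch below uses a hypothetical population of 10,000 uniformly distributed scores and 2,000 repeated samples of size 25; all of these numbers are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 scores spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Draw many samples of the same size and record each sample mean.
n = 25
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)   # empirical standard error
theoretical_se = pop_sd / n ** 0.5             # sigma / sqrt(n)

print(f"population mean      = {pop_mean:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"SD of sample means   = {sd_of_means:.2f}")
print(f"sigma / sqrt(n)      = {theoretical_se:.2f}")
```

The sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of n, even though the population itself is not normal.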

Null Hypothesis ($H_0$): A statement asserting no effect or difference.
Cutoff Values (Alpha Levels):
- Common thresholds include alpha of 0.05 or 0.01.
- Alpha sets the size of the critical region: results that fall inside it are declared statistically significant.
Decision Making: Decide whether to reject or fail to reject the null hypothesis based on sampling data.
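The decision rule can be sketched with a two-sided one-sample z test. This assumes the population standard deviation is known, which is a simplification; the function name `z_test` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z test; returns (z, p_value, reject_null)."""
    se = sigma / math.sqrt(n)                 # standard error of the mean
    z = (sample_mean - mu0) / se              # distance from H0 in SE units
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # area in both tails beyond |z|
    return z, p, p < alpha

# Hypothetical data: H0 says the mean is 100; we observed 103 in 100 people.
z, p, reject = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at .05: {reject}")

# The same data judged against a stricter cutoff.
_, _, reject_strict = z_test(103, 100, 15, 100, alpha=0.01)
print(f"reject H0 at .01: {reject_strict}")
```

With these numbers z is 2.0 and p is roughly .046, so the null hypothesis is rejected at alpha = .05 but not at alpha = .01. The data are identical; only the cutoff changed.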

Type I Error (Alpha): Rejecting the null hypothesis when it is true.
Type II Error (Beta): Failing to reject the null hypothesis when it is false.
Statistical Power: The probability of correctly rejecting a false null hypothesis; it is defined as $1 - \beta$. A high power is desirable when conducting research.
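Power can be computed directly for a simple case. The sketch below assumes a two-sided one-sample z test with a known population standard deviation; `power_z_test` and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def power_z_test(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test when the true mean is mu1."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)            # critical value under H0
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))   # true effect in SE units
    # Probability that the test statistic lands in either rejection region.
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)

power_100 = power_z_test(mu0=100, mu1=103, sigma=15, n=100)
power_400 = power_z_test(mu0=100, mu1=103, sigma=15, n=400)
beta_100 = 1 - power_100  # Type II error rate for n = 100

print(f"n = 100: power = {power_100:.3f}, beta = {beta_100:.3f}")
print(f"n = 400: power = {power_400:.3f}")
```

When the true mean equals the null value, the function returns alpha: the only way to reject is a Type I error. As the true effect or the sample size grows, power rises and beta falls.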

Z Scores: A way to standardize values from different scales to compare them meaningfully.
- Useful in evaluating measures (e.g., anxiety vs. depression) that are recorded on different scales.
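A z score is the distance of a value from the mean, measured in standard deviations. The sketch below standardizes two made-up measures recorded on different scales; the scale ranges are assumptions for illustration.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

anxiety = [10, 20, 30, 40, 50]    # hypothetical scores on a 0-50 scale
depression = [2, 4, 6, 8, 10]     # hypothetical scores on a 0-10 scale

z_anxiety = z_scores(anxiety)
z_depression = z_scores(depression)

for raw_a, za, raw_d, zd in zip(anxiety, z_anxiety, depression, z_depression):
    print(f"anxiety {raw_a:>2} -> z = {za:+.2f}   depression {raw_d:>2} -> z = {zd:+.2f}")
```

After standardizing, both measures have a mean of 0 and a standard deviation of 1, so a person's standing on one can be compared directly with their standing on the other.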

Larger sample sizes typically provide better power and more reliable results.
There is a trade-off: very large samples can make trivial effects statistically significant even though they are not practically significant.
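The effect of sample size on significance can be seen by testing the same tiny difference with two sample sizes. This sketch assumes a two-sided z test with a known standard deviation; the half-point difference and the sample sizes are invented for illustration.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z test; returns (z, p_value, reject_null)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, p < alpha

# A half-point difference on a scale with SD 15 is a very small effect.
effect_size = (100.5 - 100) / 15   # difference in SD units

z_small, p_small, reject_small = z_test(100.5, 100, 15, n=100)
z_large, p_large, reject_large = z_test(100.5, 100, 15, n=100_000)

print(f"effect size = {effect_size:.3f} SD")
print(f"n = 100:     z = {z_small:.2f}, reject H0: {reject_small}")
print(f"n = 100000:  z = {z_large:.2f}, reject H0: {reject_large}")
```

The effect is the same size in both runs (about 0.03 standard deviations). Only the larger sample flags it as significant, which is why significance should be read alongside the size of the effect.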

Correlation: A statistical measure that describes the extent to which variables change together.
Regression: Used to predict the value of one variable based on the value of another.
Scatter Plots and Relationships: Visual representation of data points from two variables can highlight potential correlations.
- Outliers: Extreme data points that do not fit the overall pattern.
- Interpretation: Reading a scatter plot gives insight into the relationship between two variables (e.g., exam anxiety vs. exam performance).
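Predicting one variable from another can be sketched with a least-squares line. The exam anxiety and performance numbers below are invented, and `fit_line` is a hypothetical helper written for this example.

```python
import statistics

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

anxiety = [2, 4, 5, 7, 9]             # hypothetical exam anxiety scores
performance = [90, 80, 75, 60, 50]    # hypothetical exam marks

slope, intercept = fit_line(anxiety, performance)
predicted = intercept + slope * 6     # predicted mark for an anxiety score of 6

print(f"performance = {intercept:.1f} + ({slope:.2f}) * anxiety")
print(f"predicted performance at anxiety 6: {predicted:.1f}")
```

The negative slope says that, in this made-up data, each extra point of anxiety goes with a drop of about 5.9 marks. The line describes an association; it does not show that anxiety causes the drop.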

Purpose of scatter plots in visualizing correlations:
- Plot individual scores of two variables to visually assess relationships.
- Identify outliers and trends.

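Covariance: Measures how much two variables vary together. Positive covariance indicates a direct relationship (both variables tend to rise together), while negative covariance indicates an inverse relationship. Its size depends on the units of measurement, so covariance alone does not show how strong the relationship is.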
Standardization of Covariance: To derive the correlation coefficient, one should standardize by dividing the covariance by the product of standard deviations of both variables.

Correlation Coefficient (r): A standardized measure representing the strength and direction of a linear relationship between two variables.
- Ranges from -1 to +1, where:
  - +1 indicates a perfect positive correlation,
  - -1 indicates a perfect negative correlation,
  - 0 indicates no linear relationship.
Mathematics of Correlation Coefficient:
- The formula for $r$ divides the covariance of the two variables by the product of their standard deviations: $r = \frac{\text{cov}_{xy}}{s_x s_y}$.
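Dividing the covariance by the two standard deviations can be followed step by step in code. The data are invented, and the helper names `covariance` and `pearson_r` are chosen for this sketch.

```python
import statistics

def covariance(x, y):
    """Sample covariance: summed cross-products of deviations over n - 1."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Correlation coefficient: covariance divided by the product of the SDs."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

anxiety = [2, 4, 5, 7, 9]             # hypothetical exam anxiety scores
performance = [90, 80, 75, 60, 50]    # hypothetical marks out of 100

cov_raw = covariance(anxiety, performance)
r_raw = pearson_r(anxiety, performance)

# Re-express the marks as proportions (0-1 instead of 0-100).
performance_prop = [p / 100 for p in performance]
cov_scaled = covariance(anxiety, performance_prop)
r_scaled = pearson_r(anxiety, performance_prop)

print(f"marks out of 100: cov = {cov_raw:.2f}, r = {r_raw:.3f}")
print(f"marks as 0-1:     cov = {cov_scaled:.2f}, r = {r_scaled:.3f}")
```

Changing the units shrinks the covariance by a factor of 100 but leaves $r$ untouched, which is the point of standardizing: $r$ can be compared across variables measured on different scales.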
Importance of Power in Research Studies:
- Adequate power ensures that the research project is well-designed and has a higher chance of detecting true effects.

Power analysis should be a consideration early in research planning to ensure that adequate sample sizes can be achieved to yield meaningful results. Visualizing relationships through scatter plots and understanding correlations provides critical insights into the dynamics of variables, paving the way for more nuanced analyses moving forward.
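A rough planning calculation shows how power analysis turns into a sample size. The sketch below uses the usual normal-approximation formula for a two-sided one-sample z test and assumes a known standard deviation; `required_n` and the example values are hypothetical, and real studies would typically use dedicated power-analysis software.

```python
import math
from statistics import NormalDist

def required_n(difference, sigma, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect `difference` with the given power."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)   # cutoff for the two-sided test
    z_power = std.inv_cdf(power)           # quantile matching the desired power
    n = ((z_alpha + z_power) * sigma / difference) ** 2
    return math.ceil(n)                    # round up to a whole participant

n_medium = required_n(difference=3, sigma=15)
n_small = required_n(difference=1, sigma=15)

print(f"to detect a 3-point difference: n = {n_medium}")
print(f"to detect a 1-point difference: n = {n_small}")
```

Smaller effects need much larger samples: cutting the difference to a third multiplies the required sample size by about nine.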