Definition: Inferential statistics involves making inferences about a population based on sample data.
Purpose of descriptive statistics: Describes and summarizes data without making inferences.
Example: Average words spoken in conversation.
Male participants: 54 words.
Female participants: 75 words.
Purpose of inferential statistics: Draws conclusions about a population based on sample data.
Example: Inferring that the average number of words spoken by women is greater than the average spoken by men.
The conclusion is uncertain: it indicates support for a claim about population values, not absolute truth.
Key Concept: Inferential statistics help determine whether observed differences are statistically significant rather than due to random chance.
Descriptive statistics provide the numerical differences; inferential statistics assess whether those differences are meaningful.
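A minimal sketch of the contrast, using hypothetical per-participant word counts (the 54 vs. 75 figures above are group means, not raw data): descriptive statistics summarize each sample, while an inferential test (here an independent samples t-test via SciPy) asks whether the difference is statistically significant.

```python
import numpy as np
from scipy import stats

# Hypothetical raw word counts per participant (illustrative only).
men = np.array([48, 60, 52, 55, 50, 59, 54, 53])      # mean ~54
women = np.array([70, 82, 68, 77, 74, 79, 73, 77])    # mean ~75

# Descriptive statistics: summarize the samples, no inference.
print("Mean (men):  ", men.mean(), " SD:", men.std(ddof=1).round(2))
print("Mean (women):", women.mean(), " SD:", women.std(ddof=1).round(2))

# Inferential statistics: do these samples support a difference
# between the *population* means?
t, p = stats.ttest_ind(women, men)
print(f"t = {t:.2f}, p = {p:.4f}")  # small p -> difference unlikely under H0
```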
Control and Random Assignment: Essential for research quality; they isolate the effect of the independent variable (IV) from other influences.
Null Hypothesis (H0): No difference exists between the population means (any observed difference is due to random error).
Research Hypothesis (H1): The population means differ because of the independent variable (IV).
Alpha Level: Typically set at 0.05 (5% chance of Type I error).
Misconceptions about significance:
5% chance of a false positive does not imply a 95% chance the difference is real.
A significant result does not confirm the research hypothesis is true, and statistical significance by itself says nothing about practical significance (effect size and confidence intervals).
A p-value < 0.05 means that, if the null hypothesis were true, a difference at least this large would be observed less than 5% of the time.
Null Hypothesis as "Death by Natural Causes": A low p-value means the observed difference is unlikely to have arisen from chance alone ("natural causes"); it does not by itself prove the independent variable caused the difference.
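A short simulation (synthetic data, pure NumPy/SciPy) illustrating what alpha = 0.05 actually controls: when the null hypothesis is true by construction, roughly 5% of tests still come out "significant".

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, false_positives = 0.05, 10_000, 0

for _ in range(n_sims):
    # Both groups drawn from the SAME population, so H0 is true by construction.
    a = rng.normal(loc=100, scale=15, size=30)
    b = rng.normal(loc=100, scale=15, size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

# Expect roughly 5% of tests to be "significant" even though no real effect exists.
print("Type I error rate:", false_positives / n_sims)
```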
Type I Error: Incorrectly rejecting null hypothesis when no effect exists (false positive).
Analogy: A pregnancy test that reads positive when there is no pregnancy.
Type II Error: Failing to reject null hypothesis when an effect is present (false negative).
The probability of rejecting the null hypothesis when it is false is the study's power; low power (e.g., from a small sample size) increases the risk of a Type II error.
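Conversely, a sketch of the Type II side (all figures hypothetical): with a real but modest effect, small samples miss it most of the time, and power climbs as the sample grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_sims = 0.05, 5_000

def miss_rate(n_per_group):
    """Fraction of simulations that FAIL to detect a true 0.5 SD difference."""
    misses = 0
    for _ in range(n_sims):
        control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        treated = rng.normal(loc=0.5, scale=1.0, size=n_per_group)  # real effect
        if stats.ttest_ind(treated, control).pvalue >= alpha:
            misses += 1   # Type II error: effect exists but went undetected
    return misses / n_sims

for n in (10, 30, 100):
    beta = miss_rate(n)
    print(f"n = {n:>3} per group -> Type II rate ~ {beta:.2f}, power ~ {1 - beta:.2f}")
```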
Many non-significant results go unpublished, even when they provide insights.
Reasons for non-significance (and hence non-publication) may include ceiling effects, floor effects, or poor validity of measures.
Larger sample sizes increase the reliability of estimated effect sizes.
A small sample that shows a large difference can be misleading.
Example: Large samples can yield significant results even when the difference is trivially small, while in small samples noise can mask true differences.
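A hedged sketch of why significance and effect size must be read together (synthetic data): with a very large sample, even a trivial 0.05 SD difference reaches p < .05, so Cohen's d (assumed here as the effect size measure) is reported alongside the p-value.

```python
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Cohen's d using a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(2)
# Tiny true difference (0.05 SD), but a huge sample: p will almost surely be "significant".
big_a = rng.normal(0.00, 1.0, size=50_000)
big_b = rng.normal(0.05, 1.0, size=50_000)

p = stats.ttest_ind(big_b, big_a).pvalue
print(f"p = {p:.4f}, Cohen's d = {cohens_d(big_b, big_a):.3f}")  # significant yet trivial
```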
Effect Size Considerations:
Importance of the findings in real-world context (e.g., IQ changes from investments).
Factors influencing power include size of effect, sample size, and type of test used.
Higher alpha levels increase the likelihood of rejecting the null hypothesis, but the trade-off is a greater risk of a Type I error.
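Power can also be worked out analytically rather than by simulation; a sketch using statsmodels' TTestIndPower, assuming a medium effect (d = 0.5) and the conventional alpha = .05, power = .80 targets.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Required sample size per group for d = 0.5, alpha = .05, power = .80 (two-sided).
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   ratio=1.0, alternative='two-sided')
print(f"Required n per group: {n_per_group:.1f}")   # roughly 64 per group

# Power achieved if only 20 participants per group are available.
power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05,
                             ratio=1.0, alternative='two-sided')
print(f"Power with n = 20 per group: {power:.2f}")  # noticeably below .80
```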
Parametric tests: More commonly used, but they depend on certain assumptions:
Measurement scales being interval or ratio.
Normality of sampling distributions.
Homogeneity of variance.
Central Limit Theorem: With a sufficiently large sample size (typically n > 30), the sampling distribution of the sample mean approximates a normal distribution regardless of the shape of the population distribution.
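A quick simulation of the Central Limit Theorem (synthetic, using a heavily skewed exponential population): means of samples of n = 30 pile up in a roughly normal, much narrower distribution around the population mean.

```python
import numpy as np

rng = np.random.default_rng(3)

# A clearly non-normal (right-skewed) population: exponential with mean 2, SD 2.
# Draw 5,000 samples of n = 30 and look at the distribution of their means.
samples = rng.exponential(scale=2.0, size=(5_000, 30))
sample_means = samples.mean(axis=1)

print("Population mean = 2.0, population SD = 2.0 (exponential)")
print("Mean of sample means:", round(sample_means.mean(), 2))   # ~2.0
print("SD of sample means:  ", round(sample_means.std(), 2),    # theory: sigma/sqrt(n) ~ 0.37
      "(theory: sigma/sqrt(n) ~ 0.37)")
```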
Use z-scores or boxplots to identify outliers (e.g., extreme |z| values, or values that fall outside the whiskers of a boxplot).
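A small sketch of both outlier rules on hypothetical scores: a z-score cutoff (|z| > 2.5 here, since with very small samples |z| cannot reach 3) and the usual 1.5 × IQR boxplot-whisker rule.

```python
import numpy as np
from scipy import stats

scores = np.array([12, 14, 15, 13, 16, 14, 15, 13, 14, 42])  # 42 is a likely outlier

# Rule 1: z-scores (common cutoffs are |z| > 3, or |z| > 2.5 for small samples).
z = stats.zscore(scores, ddof=1)
print("z-score outliers:", scores[np.abs(z) > 2.5])

# Rule 2: boxplot whiskers (values beyond 1.5 * IQR from the quartiles).
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print("boxplot outliers:", scores[(scores < low) | (scores > high)])
```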
Independent Samples t-test: Used to compare means of two unrelated groups.
Paired Samples t-test: Used when samples are connected (e.g., before and after measurements).
Normality of the distributions and independence of observations are key assumptions, and they should be checked before testing.
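A hedged sketch of both t-test variants in SciPy on made-up data; the Shapiro-Wilk test is used as one common (not the only) normality check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Independent samples: two unrelated groups (e.g., treatment vs. control).
group_a = rng.normal(50, 10, size=25)
group_b = rng.normal(55, 10, size=25)
print("independent:", stats.ttest_ind(group_a, group_b))

# Paired samples: the same participants measured before and after.
before = rng.normal(50, 10, size=25)
after = before + rng.normal(3, 5, size=25)     # each person shifts a little
print("paired:     ", stats.ttest_rel(before, after))

# One way to check the normality assumption (here on the paired differences).
print("Shapiro-Wilk on differences:", stats.shapiro(after - before))
```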
F Test: Evaluates differences among three or more groups, assessing systematic versus error variance.
One-Way ANOVA: Assesses means across several groups; post hoc tests can determine specific group differences.
ANOVA assumes normality, equal variances, and independence of observations.
If assumptions are violated, alternative methods (e.g., non-parametric tests) may be needed.
Effect size can indicate the proportion of variance explained by the independent variable.
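A sketch of a one-way ANOVA on three hypothetical groups, with eta squared computed by hand as the proportion of variance explained and the Kruskal-Wallis test shown as a non-parametric fallback.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(10, 3, size=20)
g2 = rng.normal(12, 3, size=20)
g3 = rng.normal(15, 3, size=20)

f_stat, p = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p:.4f}")

# Eta squared = SS_between / SS_total (proportion of variance explained by group).
all_scores = np.concatenate([g1, g2, g3])
grand_mean = all_scores.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (g1, g2, g3))
ss_total = ((all_scores - grand_mean) ** 2).sum()
print(f"eta squared = {ss_between / ss_total:.2f}")

# Non-parametric alternative if ANOVA assumptions look violated.
print("Kruskal-Wallis:", stats.kruskal(g1, g2, g3))
```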
Multiple replications across different settings enhance understanding and establish reliability of findings.
The comprehension of inferential statistics, including hypothesis testing, errors, and effect sizes, is crucial for analyzing data effectively and making appropriate conclusions in research contexts.
One-Tailed Test: Tests for a specific direction of an effect (e.g., whether the mean of one group is greater than the mean of another).
Example: Testing if a new drug leads to an increase in recovery rates compared to a placebo.
Two-Tailed Test: Tests for any difference without specifying a direction (e.g., whether the means of two groups are different).
Example: Testing whether a new educational program leads to a change in test scores, either higher or lower.
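A sketch of how the direction choice is expressed in SciPy's `alternative` parameter (available in recent SciPy versions); the drug/placebo numbers are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
placebo = rng.normal(60, 12, size=30)   # hypothetical recovery scores
drug = rng.normal(66, 12, size=30)

# Two-tailed: is there any difference, in either direction?
print("two-tailed:", stats.ttest_ind(drug, placebo, alternative='two-sided').pvalue)

# One-tailed: is the drug mean specifically GREATER than the placebo mean?
print("one-tailed:", stats.ttest_ind(drug, placebo, alternative='greater').pvalue)
# When the observed difference is in the predicted direction,
# the one-tailed p is half the two-tailed p.
```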
Variance: Measures how far data points spread around the mean (the average squared deviation from the mean). The role of variance in statistical tests is crucial:
t-Tests: Variance is used to estimate the standard error of the mean. High variance can lead to a larger standard error, making it harder to detect a significant difference.
F-Tests/ANOVA: Variance is assessed to determine if systematic variance (variance between group means) is significantly greater than error variance (variance within groups). A significant F-statistic indicates that at least one group mean differs from others.
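To make the role of variance concrete, a sketch computing a Welch t-statistic by hand: the group variances feed directly into the standard error of the mean difference, so higher variance shrinks t.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(100, 15, size=40)
y = rng.normal(108, 15, size=40)

# Standard error of the difference is built from the group variances.
se_diff = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
t_manual = (y.mean() - x.mean()) / se_diff
print("manual Welch t:", round(t_manual, 3))

# Matches SciPy's Welch test (equal_var=False).
print("scipy  Welch t:", round(stats.ttest_ind(y, x, equal_var=False).statistic, 3))
```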
Post Hoc Tests: Conducted after a significant ANOVA to determine which specific group means are different from each other.
Common tests include Tukey's HSD, Bonferroni, and Scheffé's Test. These tests control for Type I error across multiple comparisons, ensuring reliable conclusions about specific differences among groups.
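A sketch of one such post hoc test using scipy.stats.tukey_hsd (available in SciPy 1.8+) on three hypothetical groups; statsmodels' pairwise_tukeyhsd is a common alternative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
g1 = rng.normal(10, 3, size=20)
g2 = rng.normal(12, 3, size=20)
g3 = rng.normal(15, 3, size=20)

# Only run post hoc comparisons after the overall ANOVA is significant.
print("ANOVA:", stats.f_oneway(g1, g2, g3))

# Tukey's HSD: all pairwise comparisons with family-wise Type I error control.
result = stats.tukey_hsd(g1, g2, g3)
print(result)   # pairwise mean differences, p-values, and confidence intervals
```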
Error Bars: Graphical representations of data variability that indicate the uncertainty around sample estimates.
Typically represent the standard error or confidence intervals of the mean.
Helpful for visualizing the overlap of datasets; less overlap suggests greater differences in means across groups.
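A minimal matplotlib sketch (hypothetical group data) drawing error bars as ±1 standard error of the mean; swapping in 95% confidence intervals only changes the yerr values.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
groups = {"Control": rng.normal(50, 10, 30),
          "Treatment A": rng.normal(55, 10, 30),
          "Treatment B": rng.normal(62, 10, 30)}

labels = list(groups)
means = [g.mean() for g in groups.values()]
sems = [g.std(ddof=1) / np.sqrt(len(g)) for g in groups.values()]  # standard error of the mean

plt.bar(labels, means, yerr=sems, capsize=6)
plt.ylabel("Mean score (± 1 SEM)")
plt.title("Group means with error bars")
plt.show()
```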