Chapter 13 Notes on Inferential Statistics

Inferential Statistics

  • Definition: Involves making inferences about a population based on sample data.

Descriptive Statistics
  • Purpose: Describes and summarizes data without making inferences.

  • Example: Average words spoken in conversation.

    • Male participants: 54 words.

    • Female participants: 75 words.

Inferential Statistics
  • Purpose: Draws conclusions about a population based on sample data.

  • Example: Inferring that the average words spoken by women is greater than that by men.

    • Conclusion is uncertain and indicates support but not absolute truth about population values.

Real vs. Chance Differences
  • Key Concept: Inferential stats help determine whether an observed difference is statistically significant, that is, unlikely to have arisen from random chance alone.

    • Descriptive statistics provide numerical differences, but inferential statistics appraise their meaning.
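
  One way to see the "real vs. chance" question in action is a permutation test: shuffle the group labels many times and count how often chance alone produces a difference as large as the observed one. The sketch below is illustrative only. The word counts are hypothetical values chosen so the group means match the 54 and 75 in the example above, and the function name is an assumption, not part of the chapter.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical word counts (made up for illustration; means are 54 and 75)
men = [50, 58, 49, 61, 52, 54]
women = [72, 80, 69, 77, 74, 78]
p_value = permutation_p_value(men, women)
print(f"Share of shuffles with a difference this large: {p_value:.4f}")
```

  A small share suggests the observed difference would be unusual if group membership made no difference; it does not prove what caused it.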

Equivalent Groups Importance
  • Control and Random Assignment: Essential for quality in research to isolate effects of the independent variable (IV) alone.

Hypotheses

  • Null Hypothesis (H0): The population means do not differ (any observed difference is due to random error).

  • Research Hypothesis (H1): The population means differ because of the independent variable (IV).

Statistical Significance

  • Alpha Level: Typically set at 0.05 (a 5% chance of a Type I error when the null hypothesis is true).

  • Misconceptions about significance:

    • 5% chance of a false positive does not imply a 95% chance the difference is real.

    • A significant result does not confirm that the research hypothesis is true, and it says nothing about practical significance (assess that with effect size and confidence intervals).

Understanding p-values
  • A p-value < 0.05 means that, if the null hypothesis were true, a difference at least as large as the one observed would occur less than 5% of the time.

Jury Analogy

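  • Null Hypothesis as Death by Natural Causes: The null is presumed true until the evidence makes it implausible. A low p-value means the observed difference would be unlikely if only "natural causes" (chance) were at work; it does not by itself prove that the independent variable caused the difference.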

Types of Errors

  • Type I Error: Incorrectly rejecting null hypothesis when no effect exists (false positive).

    • Analogy: A false pregnancy test.

  • Type II Error: Failing to reject null hypothesis when an effect is present (false negative).

    • Ability to reject null when it is false reflects the study's power; small sample sizes may lead to this error.
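
  The two error types can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions: both populations are normal with a known SD of 1, and a two-sided z-test is used. The function name and the chosen effect size of 0.5 are hypothetical.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, n_per_group, alpha=0.05, n_sims=2000, seed=1):
    """Share of simulated studies in which a two-sided z-test rejects H0.

    Assumes normal populations with a known SD of 1 (a simplification).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the mean difference
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        z = (sum(b) / n_per_group - sum(a) / n_per_group) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

# No true effect: every rejection is a Type I error (should be near alpha).
type_i_rate = rejection_rate(true_diff=0.0, n_per_group=20)

# True effect present: rejections are correct, misses are Type II errors.
power_small_n = rejection_rate(true_diff=0.5, n_per_group=20)
power_large_n = rejection_rate(true_diff=0.5, n_per_group=100)
type_ii_small_n = 1 - power_small_n

print(type_i_rate, power_small_n, power_large_n, type_ii_small_n)
```

  With the smaller groups, most simulated studies miss the true effect, which is the Type II error the bullet above describes.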

Publishing Issues

  • Many non-significant results go unpublished, even when they provide insights.

  • A non-significant result may reflect ceiling effects, floor effects, or poor validity of measures rather than a true absence of an effect.

Sample Size Considerations

  • Larger sample sizes increase the reliability of effect size estimates.

    • A large difference observed in a small sample can be misleading.

  • Example: In large samples even small differences can reach significance, while in small samples noise can mask true differences.
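
  The interaction between sample size and significance can be sketched with a normal approximation. The numbers below (an SD of 15, differences of 1 and 10 points, group sizes of 5000 and 5) are hypothetical, and the function assumes equal group sizes and a known common SD.

```python
from statistics import NormalDist

def two_sided_p(mean_diff, sd, n_per_group):
    """Approximate two-sided p-value for a difference between two group means.

    Assumes equal group sizes, a known common SD, and a normal
    sampling distribution (a simplification of a real t-test).
    """
    se = sd * (2 / n_per_group) ** 0.5
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Tiny difference, huge sample: statistically significant.
small_effect_large_n = two_sided_p(mean_diff=1.0, sd=15.0, n_per_group=5000)

# Large difference, tiny sample: not significant, noise dominates.
large_effect_small_n = two_sided_p(mean_diff=10.0, sd=15.0, n_per_group=5)

print(f"{small_effect_large_n:.4f} {large_effect_small_n:.4f}")
```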

Practical Importance

  • Effect Size Considerations:

    • Importance of the findings in real-world context (e.g., IQ changes from investments).
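
  A common standardized effect size for two groups is Cohen's d, the mean difference divided by the pooled standard deviation. The chapter does not name a specific measure here, so treat the choice of Cohen's d as an assumption; the scores are hypothetical.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: (mean_b - mean_a) divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / pooled_var ** 0.5

# Hypothetical IQ scores (made up for illustration)
control = [98, 102, 100, 97, 103]
treated = [101, 104, 99, 103, 105]
print(f"Cohen's d = {cohens_d(control, treated):.2f}")
```

  Unlike a p-value, d does not grow just because the sample gets larger, which is why it is useful for judging practical importance.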

Power Analysis

  • Factors influencing power include size of effect, sample size, and type of test used.

  • A higher alpha level makes it easier to reject the null hypothesis (more power), but at the cost of a higher Type I error rate.
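
  How effect size, sample size, and alpha combine can be sketched with a normal approximation to the power of a two-sided, two-sample test. This is an approximation under stated assumptions (equal group sizes, standardized effect size d, z-test rather than t-test), not an exact power calculation; the function name is hypothetical.

```python
from statistics import NormalDist

def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z-statistic when the true standardized difference is d
    noncentrality = effect_size_d * (n_per_group / 2) ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - noncentrality)
    lower_tail = std_normal.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail

for n in (20, 64, 200):
    print(n, round(approx_power(0.5, n), 3))

# Raising alpha raises power, at the cost of more Type I errors.
print(round(approx_power(0.5, 30, alpha=0.05), 3),
      round(approx_power(0.5, 30, alpha=0.10), 3))
```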

Parametric Tests

  • More commonly utilized but depend on certain assumptions:

    • Measurement scales being interval or ratio.

    • Normality of sampling distributions.

    • Homogeneity of variance.

Central Limit Theorem
  • The theorem states that with a sufficiently large sample size (n ≥ 30 is a common rule of thumb), the sampling distribution of the sample mean is approximately normal regardless of the shape of the population distribution, provided the population variance is finite.
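
  The theorem can be checked informally by simulation. The sketch below draws repeated samples from a strongly skewed population (an exponential distribution, chosen here as an assumption for illustration) and measures how skewed the distribution of sample means is for small and larger n.

```python
import random
from statistics import mean, pstdev, stdev

def sample_means(n, n_samples=2000, seed=2):
    """Means of repeated samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(n_samples)
    ]

def skewness(values):
    """Population skewness: 0 for a symmetric (e.g., normal) distribution."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

means_n2 = sample_means(2)
means_n50 = sample_means(50)
skew_n2 = skewness(means_n2)
skew_n50 = skewness(means_n50)

print(f"skew of sample means, n=2:  {skew_n2:.2f}")
print(f"skew of sample means, n=50: {skew_n50:.2f}")
print(f"SD of sample means, n=50:   {stdev(means_n50):.3f}")
```

  The skew shrinks toward zero as n grows, and the spread of the sample means shrinks roughly in proportion to 1/sqrt(n).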

Identifying Outliers

  • Use Z-scores (e.g., |z| > 3 is a common cutoff) or boxplots (values that fall outside the whiskers) to identify outliers.
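
  Both approaches can be sketched in a few lines. The boxplot rule below uses the conventional 1.5 × IQR fences, which is an assumption about how the whiskers are drawn; the data are hypothetical.

```python
from statistics import mean, quantiles, stdev

def z_score_outliers(values, threshold=3.0):
    """Values whose absolute z-score exceeds the threshold."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / s) > threshold]

def iqr_outliers(values, k=1.5):
    """Values beyond k * IQR from the quartiles (the usual boxplot whiskers)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical scores with one suspicious value
scores = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 45]

# In a sample this small, one extreme value inflates the SD, so a cutoff
# of 3.0 can miss it; a lower cutoff of 2.5 is used here for illustration.
print(z_score_outliers(scores, threshold=2.5))
print(iqr_outliers(scores))
```

  The boxplot rule is less affected by the outlier itself because quartiles, unlike the mean and SD, are resistant to extreme values.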

t-tests

Types of t-tests
  • Independent Samples t-test: Used to compare means of two unrelated groups.

  • Paired Samples t-test: Used when samples are connected (e.g., before and after measurements).
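
  The two tests differ in what goes into the standard error. A minimal sketch of both t statistics follows, assuming equal population variances for the independent samples version (Student's pooled t). It returns only t and the degrees of freedom; looking up the p-value would need a t table or a statistics library. Function names are hypothetical.

```python
from statistics import mean, stdev, variance

def independent_t(group_a, group_b):
    """Student's t for two unrelated groups (pooled variance). Returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    se = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se, n_a + n_b - 2

def paired_t(before, after):
    """t for paired measurements, computed on the difference scores. Returns (t, df)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / n ** 0.5
    return mean(diffs) / se, n - 1

# Hypothetical data (made up for illustration)
print(independent_t([1, 2, 3], [3, 4, 5]))
print(paired_t(before=[10, 12, 14], after=[11, 14, 17]))
```

  The paired version works on each participant's change score, so stable differences between participants do not inflate the standard error.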

Common Assumptions
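  • Key assumptions are approximately normal data (or normal difference scores for the paired test) and independent observations (independent groups for the independent samples test); these should be checked before interpreting results.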

F Tests and ANOVA

  • F Test: Evaluates differences among group means (typically three or more groups) by comparing systematic variance to error variance.

  • One-Way ANOVA: Assesses means across several groups; post hoc tests can determine specific group differences.

Model Assumptions
  • ANOVA assumes normality, equal variances, and independence of observations.

  • If assumptions are violated, alternative methods (e.g., non-parametric tests) may be needed.

Effect Size in ANOVA
  • Effect size (e.g., eta squared) indicates the proportion of variance explained by the independent variable.
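
  One common "proportion of variance explained" measure is eta squared, the between-groups sum of squares divided by the total sum of squares. The chapter does not specify which measure it has in mind, so eta squared is an assumption here; the group data are hypothetical.

```python
from statistics import mean

def eta_squared(groups):
    """Eta squared: SS_between / SS_total for a one-way design."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    return ss_between / ss_total

# Hypothetical scores for three conditions
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f"eta squared = {eta_squared(groups):.4f}")
```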

Replication Importance

  • Multiple replications across different settings enhance understanding and establish reliability of findings.

Summary

  • Understanding inferential statistics, including hypothesis testing, error types, and effect sizes, is crucial for analyzing data and drawing appropriate conclusions in research.

One-Tailed vs. Two-Tailed Tests
  • One-Tailed Test: Tests for a specific direction of an effect (e.g., whether the mean of one group is greater than the mean of another).

    • Example: Testing if a new drug leads to an increase in recovery rates compared to a placebo.

  • Two-Tailed Test: Tests for any difference without specifying a direction (e.g., whether the means of two groups are different).

    • Example: Testing whether a new educational program leads to a change in test scores, either higher or lower.
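
  The practical consequence of the choice shows up in the p-value. The sketch below converts a test statistic to one-tailed and two-tailed p-values using the standard normal distribution, which is an assumption made for simplicity (a t distribution would be used with small samples). The z value of 1.8 is hypothetical.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (one-tailed p for a predicted increase, two-tailed p)."""
    std_normal = NormalDist()
    one_tailed_upper = 1 - std_normal.cdf(z)       # H1: effect is positive
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))  # H1: effect in either direction
    return one_tailed_upper, two_tailed

one_tailed, two_tailed = p_values_from_z(1.8)
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

  The same result can be significant at alpha = 0.05 under a one-tailed test and non-significant under a two-tailed test, which is why the direction should be chosen before seeing the data.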

Role of Variance on t and F Tests
  • Variance: Measures how data points differ from the mean. The role of variance in statistical tests is crucial:

    • t-Tests: Variance is used to estimate the standard error of the mean. High variance can lead to a larger standard error, making it harder to detect a significant difference.

    • F-Tests/ANOVA: Variance is assessed to determine if systematic variance (variance between group means) is significantly greater than error variance (variance within groups). A significant F-statistic indicates that at least one group mean differs from others.
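
  The ratio of systematic to error variance can be computed by hand for a one-way design. This is a minimal sketch with hypothetical data; it returns the F statistic and its degrees of freedom, and leaves the p-value lookup to an F table or a statistics library.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Systematic variance: how far group means sit from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error variance: how far scores sit from their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three conditions
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```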

Post Hoc Tests
  • Post Hoc Tests: Conducted after a significant ANOVA to determine which specific group means are different from each other.

    • Common tests include Tukey's HSD, Bonferroni, and Scheffé's Test. These tests control for Type I error across multiple comparisons, ensuring reliable conclusions about specific differences among groups.
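
  Of the tests listed, the Bonferroni correction is the simplest to show: each p-value is multiplied by the number of comparisons (capped at 1). The sketch also shows why a correction is needed, using the familywise error rate for independent tests. The p-values are hypothetical.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_comparisons

def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, significant?) for each comparison."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]

# Three pairwise comparisons after a significant ANOVA (hypothetical p-values)
raw_p = [0.01, 0.02, 0.04]
print(f"Uncorrected familywise error rate: {familywise_error_rate(0.05, 3):.4f}")
for p_adj, significant in bonferroni(raw_p):
    print(f"adjusted p = {p_adj:.2f}, significant: {significant}")
```

  All three raw p-values are below 0.05, but only the first survives the correction.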

Error Bars
  • Error Bars: Graphical representations of variability that indicate the uncertainty around sample estimates.

    • Typically represent the standard error or confidence intervals of the mean.

    • Helpful for visualizing overlap between groups; less overlap suggests a larger difference in means, though overlap alone is not a formal significance test.
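
  The two quantities most often drawn as error bars can be computed as follows. This sketch uses a normal critical value for the confidence interval, which is an assumption that makes the interval slightly too narrow for small samples (a t critical value would be wider). The data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_with_error_bars(values, confidence=0.95):
    """Return (mean, standard error, (ci_low, ci_high))."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m, se, (m - z * se, m + z * se)

# Hypothetical scores
m, se, (ci_low, ci_high) = mean_with_error_bars([1, 2, 3, 4, 5])
print(f"mean = {m}, SE = {se:.3f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

  A 95% confidence interval bar is roughly twice as long as a standard error bar, so a figure should always state which one it shows.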