Study Notes on Evaluating Statistical Validity in Experiments
Introduction
- How the statistical validity of an experiment is assessed in research methods, particularly within psychology.
Claims and Study Types
- Types of Claims:
- Frequency: Describes the rate or level of a single variable (how often or how much something occurs).
- Association: Relates two variables, but does not imply causation.
- Causal: Claims that one variable directly affects (causes a change in) another.
- Types of Studies:
- Observational Study: Researcher observes subjects without intervention to determine relationships.
- Poll: Gathers opinions from a sample of participants.
- Experiment: Involves manipulation of one variable to determine its effect on another.
- Quasi-Experiment: Similar to an experiment, but lacks random assignment to groups.
- Correlational Study: Investigates the relationship between two or more variables but does not imply causation.
Evaluating Validities
- Four Validities to Evaluate:
- Internal Validity: The degree to which a study rules out alternative explanations and so supports a causal claim.
- Construct Validity: Whether the measures used actually measure the concepts they purport to measure.
- External Validity: The extent to which results generalize to other people, settings, and times.
- Statistical Validity: The appropriateness of the statistical conclusions drawn from a study.
Assignment Example: Inferential Statistics
- Study Findings:
- Recruitment Perception:
- Listening to pitches resulted in higher intellect ratings (Mean = 5.63, SD = 1.61) compared to reading pitches (Mean = 3.65, SD = 1.91).
- Statistical test results: t(37) = 3.53, p < .01, Confidence Interval (CI) = [0.85, 3.13], effect size (d) = 1.16.
- Positive Impressions:
- Candidates appeared more likable when pitches were listened to (Mean = 5.97, SD = 1.92) versus read (Mean = 4.07, SD = 2.23).
- Statistical test results: t(37) = 2.85, p < .01, CI = [0.55, 3.24], d = 0.94.
- Analysis Exercise:
- Q1: Comment on both the direction and size of the effect in the population based on the data provided.
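The reported statistics can be roughly reproduced from the summary values alone. A minimal sketch in Python, assuming group sizes of 20 (listening) and 19 (reading) — the write-up reports only df = 37, so the exact split is an assumption — and using approximately 2.026 as the two-tailed critical t for df = 37:

```python
from math import sqrt

# Reported summary statistics for intellect ratings
m1, sd1, n1 = 5.63, 1.61, 20  # listening group (n assumed)
m2, sd2, n2 = 3.65, 1.91, 19  # reading group (n assumed)

diff = m1 - m2  # point estimate of the effect

# Pooled standard deviation across the two groups
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
pooled_sd = sqrt(pooled_var)

d = diff / pooled_sd  # Cohen's d (standardized effect size)

# 95% confidence interval for the mean difference
se = pooled_sd * sqrt(1 / n1 + 1 / n2)  # standard error of the difference
t_crit = 2.026                          # approx. two-tailed critical t, df = 37
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"d = {d:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

These values land close to the reported d = 1.16 and CI = [0.85, 3.13]; small discrepancies reflect the assumed group sizes and rounding in the summary statistics.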
Confidence Intervals
- Definition: A range of plausible values for the population parameter; its width conveys how precise the estimate is.
- Interpretation of CI Values:
- A wider CI implies less precision, while a narrower CI suggests greater precision in estimating where the true population parameter lies.
- Example from the study above: the 95% CI for the intellect difference (listening vs. reading) is [0.85, 3.13].
- Because this interval excludes zero, the data point to a real difference in the population; a CI that overlapped zero would leave the existence and direction of the effect uncertain.
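The link between sample size and CI width can be made concrete: the half-width of an interval shrinks with the square root of n. A small sketch, using the large-sample critical value 1.96 as an approximation (the SD of 1.61 comes from the study above; the sample sizes 39 and 390 are illustrative):

```python
from math import sqrt

def ci_half_width(sd, n, crit=1.96):
    """Approximate half-width of a 95% CI for a mean (normal approximation)."""
    return crit * sd / sqrt(n)

wide = ci_half_width(sd=1.61, n=39)      # smaller sample -> wider interval
narrow = ci_half_width(sd=1.61, n=390)   # 10x the sample -> sqrt(10)x narrower
print(f"n=39: \u00b1{wide:.2f}   n=390: \u00b1{narrow:.2f}")
```

Tenfold more participants shrinks the interval by a factor of sqrt(10), about 3.2 — precision improves, but with diminishing returns per added participant.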
Evaluating Statistical Validity in Experiments
- Key Components:
- Point Estimate: A single-number estimate of the effect in the sample (e.g., the mean difference or d).
- Descriptive Statistics: Includes metrics such as effect size, central tendency, and variability.
- Precision: How tightly the estimate brackets the true population effect size, reflected in the width of the confidence interval.
- Inferential Statistics: Includes confidence intervals; precision depends on sample size and variability.
- Replication Importance:
- Ensures reliability of the findings by repeating studies.
- Contributes to meta-analysis for a broader evaluation of validity.
Statistical Significance
- Understanding p-values:
- A p-value indicates the probability of observing a result at least as extreme as the one obtained, assuming no real effect exists in the population.
- Interpretation Examples:
- Given a p-value of 0.11:
- Option A: Incorrect. The probability that there IS an effect.
- Option B: Incorrect. The probability that there is NOT an effect.
- Option C: Correct. Assuming no real effect, there is an 11% chance of observing a difference this large or larger.
- Option D: Incorrect. Probability about replicating findings.
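The correct reading of the p-value can also be demonstrated by simulation: generate many datasets in which the null hypothesis is true (both groups drawn from the same distribution) and count how often a mean difference at least as large as the observed one arises by chance. A sketch — the group size, SD, and observed difference below are illustrative values chosen so the simulated p comes out near the .11 of the example, not numbers from the article:

```python
import random

random.seed(1)

def simulated_p(observed_diff, sd, n, reps=10_000):
    """Two-tailed p-value by simulation: the fraction of null-true datasets
    whose mean difference is at least as extreme as the observed one."""
    hits = 0
    for _ in range(reps):
        # Both groups come from the SAME distribution: the null is true here.
        g1 = [random.gauss(0, sd) for _ in range(n)]
        g2 = [random.gauss(0, sd) for _ in range(n)]
        diff = sum(g1) / n - sum(g2) / n
        if abs(diff) >= observed_diff:
            hits += 1
    return hits / reps

p = simulated_p(observed_diff=0.9, sd=1.8, n=20)
print(f"simulated p = {p:.3f}")
```

Nothing in the simulation says whether a real effect exists; the p-value only describes how surprising the observed difference would be if no effect existed, which is exactly Option C.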
Misconceptions About p-values
- Common misconceptions include treating the p-value as the probability that a real difference exists, or as the chance that the result would replicate.
- Refuted Misunderstandings (Goodman, 2008):
- p-values do not represent the probability that an actual difference exists, nor the probability that the same result would appear in repeated samples.
Implications of p-values vs Confidence Intervals
- Statistical vs Real-world Significance:
- A significant p-value does not imply practical significance. A small effect can appear significant with large samples; context and effect size matter.
- A significance test yields a binary verdict (significant or not), whereas a confidence interval conveys the whole range of plausible effect sizes.
- After obtaining results:
- Always consider the potential for replication and implications for theory strength, research questions, and validity reassessments.
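The point that large samples can make a tiny effect statistically significant can be checked numerically. A sketch using the normal approximation to the two-sample t statistic (the effect size d = 0.05 and the group sizes are illustrative values, not from the study above):

```python
from math import sqrt
from statistics import NormalDist

def approx_p(d, n_per_group):
    """Two-tailed p-value for a standardized effect size d, using the
    normal approximation to the two-sample t statistic."""
    t = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(t))

# The same tiny effect (d = 0.05): non-significant with 50 per group,
# highly significant with 10,000 per group.
print(f"n=50:    p = {approx_p(0.05, 50):.3f}")
print(f"n=10000: p = {approx_p(0.05, 10000):.5f}")
```

The effect size is identical in both cases; only the sample size changes the verdict. This is why effect size and context, not the p-value alone, determine real-world significance.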