Chapter 1: Introduction to Data
Course material modified from slides developed by Mine Çetinkaya-Rundel of OpenIntro.
Slides can be copied, edited, and shared under the CC BY-SA license.
Certain images are included under fair use for educational purposes.
Case Study
Treating Chronic Fatigue Syndrome
Objective: Evaluate the effectiveness of cognitive behavior therapy for chronic fatigue syndrome.
Participant Pool: 142 patients recruited from referrals by primary care and consultants.
Actual Participants: Only 60 patients entered the study; exclusions included not meeting diagnostic criteria, other health issues, and refusals.
Reference: Deale et al. Cognitive behavior therapy for chronic fatigue syndrome: A randomized controlled trial. The American Journal of Psychiatry 154.3 (1997).
Study Design
Patients randomly assigned to:
Treatment Group: Cognitive behavior therapy focusing on collaboration, education, and behavior change to safely increase activity.
Control Group: Relaxation techniques without advice to increase activity (e.g., muscle relaxation, visualization).
Results
Follow-up Outcomes: 7 patients dropped out (3 from treatment, 4 from control).
Good Outcome Distribution:
Treatment: 19 Yes, 8 No (Total 27)
Control: 5 Yes, 21 No (Total 26)
Proportions with Good Outcomes:
Treatment: 19/27 ≈ 70%
Control: 5/26 ≈ 19%
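As a quick check of the arithmetic, the proportions can be recomputed from the counts in the results above. This is a minimal sketch; the dictionary layout and the function name good_outcome_proportion are illustrative choices, not part of the original study.

```python
# Counts from the results above: (good outcome, no good outcome).
counts = {
    "treatment": (19, 8),
    "control": (5, 21),
}

def good_outcome_proportion(yes, no):
    """Return the fraction of patients in a group with a good outcome."""
    return yes / (yes + no)

proportions = {
    group: good_outcome_proportion(yes, no)
    for group, (yes, no) in counts.items()
}

for group, p in proportions.items():
    print(f"{group}: {p:.0%}")
```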
Understanding the Results
Real Difference Evaluation:
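Coin-flip analogy: even a fair coin rarely lands heads exactly 50% of the time, so some variation between groups is expected by chance alone.
The difference between groups (70% - 19% = 51 percentage points) may be real or may be due to natural variation.
Statistical tools are needed to determine whether the difference is larger than chance alone would explain.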
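One way to explore whether chance alone could produce a gap this large is a shuffling simulation. The sketch below is an illustration only, not necessarily the method used in the study or in the course: it assumes the treatment makes no difference, reshuffles the 53 outcomes between the two groups many times, and counts how often the simulated difference is at least as large as the observed one. The function name, the number of simulations, and the seed are arbitrary choices.

```python
import random

# Observed counts from the study (after dropouts).
n_treatment, n_control = 27, 26
good_treatment, good_control = 19, 5
observed_diff = good_treatment / n_treatment - good_control / n_control

# If the treatment made no difference, group labels would be arbitrary:
# pool all 53 outcomes (1 = good outcome, 0 = not) and reshuffle them.
n_good = good_treatment + good_control
outcomes = [1] * n_good + [0] * (n_treatment + n_control - n_good)

def simulate_diffs(outcomes, n_treatment, n_sims, seed=0):
    """Shuffle outcomes n_sims times; return simulated differences in proportions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(outcomes)
    n_control = len(pool) - n_treatment
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(pool)
        t = sum(pool[:n_treatment])   # good outcomes landing in "treatment"
        c = sum(pool[n_treatment:])   # good outcomes landing in "control"
        diffs.append(t / n_treatment - c / n_control)
    return diffs

diffs = simulate_diffs(outcomes, n_treatment, n_sims=10_000)

# Share of shuffles with a difference at least as large as the observed one.
share = sum(d >= observed_diff for d in diffs) / len(diffs)
print(f"observed difference: {observed_diff:.2f}")
print(f"share of shuffles at least that large: {share:.4f}")
```

A very small share suggests that a difference of this size would be unusual if only chance were at work.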
Generalizing the Results
Generalizability Concern:
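Participants had specific characteristics (they were referred patients who met the diagnostic criteria and agreed to take part), so they may not be representative of all patients with chronic fatigue syndrome.
The results cannot be generalized to all patients, but they are promising for patients similar to those in the study.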
Data Basics
Classroom Survey
A survey on statistics students included questions on:
Gender
Introversion/Extraversion
Average sleep hours
Bedtime
Number of countries visited
Dread level (on a 1-5 scale)
Data Matrix
Example data collected from students on various demographic and psychological variables.
Types of Variables
Numerical: Continuous (e.g., hours of sleep) and discrete (e.g., number of countries visited).
Categorical: Regular (e.g., gender) and ordinal (e.g., dread scale).
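The variable type determines which summaries make sense. The sketch below illustrates this with a small data matrix in the spirit of the classroom survey. The rows are hypothetical values made up for illustration, and the names survey, VARIABLE_TYPES, and summarize are assumptions rather than part of the course material.

```python
import statistics
from collections import Counter

# Hypothetical survey rows; the values are made up for illustration
# and do not come from the original data matrix.
survey = [
    {"gender": "female", "sleep_hours": 7.5, "countries": 3, "dread": 2},
    {"gender": "male", "sleep_hours": 6.0, "countries": 0, "dread": 4},
    {"gender": "female", "sleep_hours": 8.0, "countries": 7, "dread": 2},
    {"gender": "male", "sleep_hours": 6.5, "countries": 2, "dread": 5},
]

# Each column (variable) is tagged with its type.
VARIABLE_TYPES = {
    "gender": "categorical (regular)",
    "sleep_hours": "numerical (continuous)",
    "countries": "numerical (discrete)",
    "dread": "categorical (ordinal)",
}

def summarize(rows, variable):
    """Mean for numerical variables, category counts for categorical ones."""
    values = [row[variable] for row in rows]
    if VARIABLE_TYPES[variable].startswith("numerical"):
        return statistics.mean(values)
    return Counter(values)

for name, kind in VARIABLE_TYPES.items():
    print(f"{name} [{kind}]: {summarize(survey, name)}")
```

Note that the dread scale is stored as numbers but is summarized by counts here, because it is treated as ordinal categorical: being coded with digits does not by itself make a variable numerical.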
Practice Question
Example of variable categorization: Is a telephone area code numerical or categorical?
Relationships Among Variables
Correlation & Data Points:
Examines the relationship between GPA and study hours.
Notably, one data point with a GPA > 4.0 stands out as an anomaly.
Explanatory and Response Variables
Variable Classification:
Identify which variable is suspected of affecting the other: the explanatory variable is thought to affect the response variable.
Caution Against Assumptions:
Correlation does not imply causation.
Types of Data Collection
Observational Studies:
Data are collected without interfering with how they arise.
These studies cannot establish causality but can identify associations.
Experiments:
Random assignment of subjects to treatments to establish causal relations.
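Random assignment can be sketched in a few lines. The example below is an illustration under stated assumptions, not the procedure used in any particular study: the function name randomly_assign, the subject IDs, and the seed are hypothetical. It shuffles the subjects and deals them out to the groups in turn, so group sizes differ by at most one.

```python
import random

def randomly_assign(subject_ids, groups=("treatment", "control"), seed=None):
    """Shuffle subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

# Hypothetical IDs for 60 subjects (the number who entered the case study).
subjects = [f"S{i:02d}" for i in range(1, 61)]
assignment = randomly_assign(subjects, seed=1)

for group, members in assignment.items():
    print(group, len(members), members[:5])
```

Because chance alone decides who lands in which group, other characteristics of the subjects tend to balance out across groups, which is what supports a causal interpretation.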
Association vs. Causation
Two variables that show a relationship are called associated (dependent); variables with no relationship are independent. An association alone does not establish causation.
Experimental Design Principles
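Control: Compare the treatment of interest to a control group and hold other variables constant to mitigate their effects.
Randomization: Randomly assign subjects to treatments and, where possible, randomly sample from the population.
Replication: Collect a sufficiently large sample, or replicate the entire study.
Blocking: Group subjects by variables known or suspected to affect the response, then randomly assign treatments within each block.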
Experimental Design Example
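Testing the effect of energy gels on runners: block by pro/amateur status, then randomly assign treatments within each block, so that each treatment group contains a similar mix of pros and amateurs.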
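The energy gel example can be sketched as a blocked random assignment. This is a minimal illustration under assumptions: the runner list, the group labels, and the function name blocked_assignment are hypothetical. Subjects are first split into blocks by status, and the random assignment then happens separately inside each block.

```python
import random
from collections import Counter, defaultdict

def blocked_assignment(subjects, block_of, groups, seed=None):
    """Group subjects into blocks, then randomly assign treatments within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize only within the block
        for i, subject in enumerate(members):
            assignment[subject] = groups[i % len(groups)]
    return assignment

# Hypothetical runners as (name, status): 4 pros and 8 amateurs.
runners = [(f"runner{i}", "pro" if i < 4 else "amateur") for i in range(12)]
groups = ("energy gel", "no energy gel")
result = blocked_assignment(runners, block_of=lambda r: r[1], groups=groups, seed=0)

# Tally how many runners of each status ended up in each group.
tally = Counter((status, group) for (_, status), group in result.items())
for key, count in sorted(tally.items()):
    print(key, count)
```

With this design, pros and amateurs are split evenly between the two groups, so a difference in status cannot be mistaken for an effect of the gel.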
Additional Experimental Terms
Placebo: Fake treatment for control groups.
Placebo Effect: Improvement due to belief in treatment.
Blinding: Participants unaware of their group assignments (double-blinding includes researchers).
Practice Question: Observational vs. Experimental Studies
The key distinction: experiments use random assignment of subjects to treatments; observational studies do not.
Conclusion
Each section highlights crucial aspects of data analysis, variable classification, experiment design, and the importance of well-structured studies in drawing valid conclusions.
What is the process of statistical investigation? Identify a question or problem, collect relevant data, analyze the data, and form a conclusion.