Chapter 1: Introduction to Data
OpenIntro Statistics, 4th Edition
Developed by Mine Çetinkaya-Rundel
Licensed under CC BY-SA for copying and sharing
Educational images included under fair use guidelines
Case Study
Treating Chronic Fatigue Syndrome
Objective: Evaluate cognitive-behavior therapy (CBT) effectiveness for chronic fatigue syndrome (CFS).
Participant Pool: 142 patients recruited through referrals from primary care physicians and consultants to a hospital clinic specializing in CFS.
Actual Participants: Only 60 of the 142 referred patients entered the study.
The rest were excluded because they did not meet the diagnostic criteria, had other health issues, or refused to take part.
Reference: Deale et al. (1997) "Cognitive behavior therapy for chronic fatigue syndrome: A randomized controlled trial."
Study Design
Patients were randomly assigned to treatment and control groups of 30 participants each:
Treatment Group: Received CBT, which emphasized collaboration and education and aimed to avoid exacerbating symptoms.
Control Group: Received relaxation training, including progressive muscle relaxation and visualization, with no suggestions for increasing activity levels.
Results
Participant Dropout: 7 of the 60 participants dropped out (3 from treatment, 4 from control), leaving 53 with recorded outcomes.
Good Outcomes at 6-month follow-up:
Treatment Group: 19 yes | 8 no = 27 total
Control Group: 5 yes | 21 no = 26 total
Overall: 24 good outcomes | 29 poor outcomes = 53 total
Proportions:
Treatment Group: 19/27 ≈ 70% had good outcomes.
Control Group: 5/26 ≈ 19% had good outcomes.
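The proportions above can be reproduced directly from the counts in the results table. The following is a minimal sketch; the dictionary layout and the function name `good_outcome_proportion` are illustrative choices, not part of the original study.

```python
# Counts copied from the results table above.
outcomes = {
    "treatment": {"good": 19, "poor": 8},
    "control": {"good": 5, "poor": 21},
}


def good_outcome_proportion(group):
    """Return the share of participants in a group with a good outcome."""
    total = group["good"] + group["poor"]
    return group["good"] / total


p_treatment = good_outcome_proportion(outcomes["treatment"])
p_control = good_outcome_proportion(outcomes["control"])
difference = p_treatment - p_control

print(f"treatment:  {p_treatment:.2f}")
print(f"control:    {p_control:.2f}")
print(f"difference: {difference:.2f}")
```

The difference in proportions, about 0.51, is the quantity whose size has to be judged against chance variation.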
Understanding the Results
Question: Do the data show a "real" difference between the groups?
Example: Flipping a fair coin 100 times rarely gives exactly 50 heads. This kind of natural variation is part of almost any data collection process.
The observed difference (70% vs 19%) may be real or due to chance.
Because the difference is so large, a real effect is more believable than chance alone.
Statistical tools are needed to decide whether a difference is large enough that chance is an unlikely explanation.
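One way to get a feel for "due to chance" is to simulate it. The sketch below assumes a simple randomization approach: it keeps the 24 good and 29 poor outcomes fixed, reshuffles them between groups of 27 and 26, and counts how often a difference at least as large as the observed one appears. This is an illustration under that assumption, not necessarily the analysis used in the study; the function name, the number of simulations, and the seed are hypothetical choices.

```python
import random


def simulate_chance_differences(n_sims=10_000, seed=0):
    """Estimate how often shuffled group labels give a difference in
    good-outcome proportions at least as large as the observed one."""
    rng = random.Random(seed)
    # 1 = good outcome, 0 = poor outcome (24 good, 29 poor, 53 total).
    outcomes = [1] * 24 + [0] * 29
    observed = 19 / 27 - 5 / 26

    at_least_as_large = 0
    for _ in range(n_sims):
        rng.shuffle(outcomes)
        treatment = outcomes[:27]  # first 27 act as the treatment group
        control = outcomes[27:]    # remaining 26 act as the control group
        diff = sum(treatment) / 27 - sum(control) / 26
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_large += 1

    return observed, at_least_as_large / n_sims


observed_diff, chance_fraction = simulate_chance_differences()
print(f"observed difference: {observed_diff:.3f}")
print(f"fraction of shuffles at least as large: {chance_fraction:.4f}")
```

If the fraction is very small, chance alone is an unconvincing explanation for a difference of this size.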
Generalizing the Results
Question: Are the findings generalizable to all CFS patients?
The study observed one group of patients, who may not share the characteristics of all CFS patients.
The results show the treatment worked for this subset, but whether it applies more broadly is uncertain.
Still, the study is a promising first step for patients similar to those enrolled.
Data Basics
Classroom Survey
A survey of students in an introductory statistics course. Key variables:
Gender: What is your gender?
Intro/Extra: Do you consider yourself introverted or extraverted?
Sleep: Average hours of sleep per night?
Bedtime: What time do you usually go to bed?
Countries: Number of countries visited?
Dread: How much do you dread being in this class, on a scale of 1 to 5?
Data Matrix
The student responses are organized in a data matrix: each row is one student (an observation) and each column is one variable, such as gender or sleep.
Types of Variables
Categorical: Values are categories (labels) rather than quantities
Gender: Categorical
Numerical: Values are numbers, either continuous (any value in a range) or discrete (counts)
Sleep: Continuous numerical
Countries: Discrete numerical
Dread: Ordinal categorical, though it could also be treated as numerical.
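The survey variables and their types can be sketched as a small data matrix in code. In the example below, each `SurveyResponse` is one row and each field is one column. The two rows are hypothetical values invented purely for illustration (they are not real survey data), and the class and field names are assumptions. Only the four variables classified above appear in `VARIABLE_TYPES`.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    """One row of the data matrix: a single student's answers."""
    gender: str        # categorical
    intro_extra: str   # introverted or extraverted
    sleep: float       # average hours per night, continuous numerical
    bedtime: str       # usual bedtime, stored here as text
    countries: int     # number of countries visited, discrete numerical
    dread: int         # 1-5 scale, ordinal categorical


# Hypothetical rows, made up for illustration only.
data_matrix = [
    SurveyResponse("female", "extraverted", 7.5, "23:00", 3, 2),
    SurveyResponse("male", "introverted", 6.0, "01:30", 0, 4),
]

# Variable types as classified in the list above.
VARIABLE_TYPES = {
    "gender": "categorical",
    "sleep": "numerical (continuous)",
    "countries": "numerical (discrete)",
    "dread": "categorical (ordinal)",
}

for name, kind in VARIABLE_TYPES.items():
    column = [getattr(row, name) for row in data_matrix]
    print(f"{name}: {kind}, values = {column}")
```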
Relationships Among Variables
Example: How does a student's GPA relate to study hours? Key observations:
A scatterplot of the two variables shows their association visually.
Anomalous data points, such as a GPA greater than 4.0, suggest data errors.
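A simple range check can flag values like these before any analysis. The sketch below assumes GPA is recorded on a 0.0 to 4.0 scale; the records are hypothetical values invented for illustration, and `flag_implausible_gpa` is an illustrative helper name.

```python
def flag_implausible_gpa(records, max_gpa=4.0):
    """Return the records whose GPA falls outside the range 0.0 to max_gpa."""
    return [r for r in records if not (0.0 <= r["gpa"] <= max_gpa)]


# Hypothetical records, made up for illustration only.
records = [
    {"study_hours": 10, "gpa": 3.2},
    {"study_hours": 25, "gpa": 4.6},  # above 4.0, likely a data error
    {"study_hours": 5, "gpa": 2.8},
]

flagged = flag_implausible_gpa(records)
print(flagged)
```

Flagged records are candidates for review, not automatic deletion. Whether a value is truly an error depends on how the data were collected.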
Explanatory and Response Variables
Identifying: The explanatory variable is the one believed to affect the other; the variable it affects is the response.
Causation vs. Correlation: An association between two variables does not by itself imply that one causes the other.
Types of Data Collection
Observational Studies: Researchers collect data without interfering with how the data arise. These studies can establish associations.
Experiments: Researchers randomly assign subjects to treatments, which makes it possible to establish causal connections between variables.
Causation vs. Correlation
Two variables can be associated without either one causing the other.
Sampling Principles and Strategies
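Population vs. Sample: The population is the entire group of interest; a sample is the subset that is actually observed and used to draw conclusions about the population.
Census: Attempts to include the entire population. Challenges include hard-to-reach individuals and populations that change over time.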
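As an alternative to a census, a sample can be drawn at random from the population. The sketch below assumes simple random sampling, in which every member has the same chance of selection. The population of 1,000 numeric IDs and the sample size of 50 are hypothetical, chosen only for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members from the population, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: ID numbers 1 through 1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=1)

print(f"sample size: {len(sample)}")
print(f"first five sampled IDs: {sample[:5]}")
```

`random.sample` draws without replacement, so no member appears twice in the sample.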
Sampling Bias
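Non-response (selected people do not answer) and voluntary response (people choose for themselves whether to take part) can make a sample unrepresentative of the population.
Surveys should be designed to limit these biases so that conclusions remain trustworthy.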
Observational vs. Experimental Design
Observational studies collect data without interfering with how the data arise, whereas in experiments researchers assign the treatments.
Random assignment in experiments is what allows valid causal conclusions.
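Random assignment can be sketched in a few lines. The example below assumes 60 participants split evenly into two groups, mirroring the group sizes in the CFS case study. The participant labels, the seed, and the function name `randomly_assign` are hypothetical, and this is not a description of the procedure the study actually used.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the input list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant labels P01 through P60.
participants = [f"P{i:02d}" for i in range(1, 61)]
treatment, control = randomly_assign(participants, seed=42)

print(f"treatment group size: {len(treatment)}")
print(f"control group size:   {len(control)}")
```

Because chance alone decides who lands in which group, other characteristics of the participants tend to balance out between the groups, which is what supports a causal reading of the results.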