Chapter 1: Introduction to Data

  • OpenIntro Statistics, 4th Edition

  • Developed by Mine Çetinkaya-Rundel

  • Licensed under CC BY-SA for copying and sharing

  • Educational images included under fair use guidelines

Case Study

Treating Chronic Fatigue Syndrome

  • Objective: Evaluate cognitive-behavior therapy (CBT) effectiveness for chronic fatigue syndrome (CFS).

  • Participant Pool: 142 patients recruited via primary care referrals and consultants to a specialized CFS hospital clinic.

  • Actual Participants: Only 60 of the 142 individuals met criteria to enter the study:

    • The rest were excluded because they did not meet the diagnostic criteria, had other health issues, or refused to participate.

    • Reference: Deale et al. (1997) "Cognitive behavior therapy for chronic fatigue syndrome: A randomized controlled trial."

Study Design

  • Patients divided into treatment/control groups, each with 30 participants:

    • Treatment Group: Received cognitive behavior therapy (CBT), a collaborative and educative approach aimed at increasing activity without exacerbating symptoms.

    • Control Group: Received relaxation techniques including progressive muscle relaxation and visualization, without suggestions for increasing activity levels.

Results

  • Participant Dropout: 7 participants dropped out (3 from treatment, 4 from control), leaving 53 with recorded outcomes:

    • Good Outcomes at 6-month follow-up:

      • Treatment Group: 19 yes | 8 no = 27 total

      • Control Group: 5 yes | 21 no = 26 total

      • Overall: 24 good outcomes | 29 poor outcomes = 53 total

  • Proportions:

    • Treatment Group: 19/27 ≈ 70% had good outcomes.

    • Control Group: 5/26 ≈ 19% had good outcomes.
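
The proportions above follow directly from the counts in the results table. As a minimal sketch (the dictionary layout and the function name `good_outcome_proportion` are illustrative choices, not part of the study), the arithmetic can be reproduced like this:

```python
# Counts taken from the results table above (6-month follow-up).
outcomes = {
    "treatment": {"good": 19, "poor": 8},
    "control": {"good": 5, "poor": 21},
}


def good_outcome_proportion(group):
    """Return the share of participants in `group` with a good outcome."""
    counts = outcomes[group]
    return counts["good"] / (counts["good"] + counts["poor"])


p_treatment = good_outcome_proportion("treatment")
p_control = good_outcome_proportion("control")
difference = p_treatment - p_control

print(f"treatment:  {p_treatment:.0%}")
print(f"control:    {p_control:.0%}")
print(f"difference: {difference:.0%}")
```

The gap between the two groups is roughly 51 percentage points, which is the quantity the next section asks about.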

Understanding the Results

  • Question: Do the data show a "real" difference between the treatment and control groups?

  • Example: A fair coin flipped 100 times will rarely land heads exactly 50 times; almost any data-generating process shows this kind of natural variation.

    • The observed difference (70% vs 19%) may be real or due to chance.

    • The larger the difference, the less plausible it is that chance alone produced it, and the more believable a real effect becomes.

    • Statistical tools are needed to judge whether a difference is too large to be reasonably explained by chance.
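
One way to get a feel for how much variation chance alone can produce is a shuffling simulation. The sketch below is an illustration under stated assumptions, not necessarily the tool these notes have in mind: it assumes the 24 good and 29 poor outcomes would have occurred regardless of group, reassigns the 53 participants to groups of 27 and 26 at random many times, and records the difference in proportions each time. The function name, number of simulations, and seed are arbitrary choices.

```python
import random


def simulate_differences(n_sims=10_000, seed=0):
    """Shuffle the 53 outcomes and return the simulated differences in proportions."""
    rng = random.Random(seed)
    # 24 good (1) and 29 poor (0) outcomes across all 53 participants.
    outcomes = [1] * 24 + [0] * 29
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(outcomes)  # shuffles in place
        treatment, control = outcomes[:27], outcomes[27:]
        diffs.append(sum(treatment) / 27 - sum(control) / 26)
    return diffs


observed = 19 / 27 - 5 / 26
diffs = simulate_differences()

# Share of shuffles whose difference is at least as large as the observed one.
share_as_extreme = sum(d >= observed for d in diffs) / len(diffs)

print(f"observed difference: {observed:.3f}")
print(f"share of shuffles at least as large: {share_as_extreme:.4f}")
```

If only a tiny share of the shuffles reaches the observed difference, chance alone is a poor explanation for it.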

Generalizing the Results

  • Question: Are the findings generalizable to all CFS patients?

    • The study observed one particular group of patients, who may not be representative of all CFS patients.

    • The results indicate that CBT worked for this group, but whether they apply more broadly is uncertain.

    • As a first study, it is encouraging for patients similar to those who took part.

Data Basics

Classroom Survey

  • Survey given to students in an introductory statistics course. Key variables:

    • Gender: What is your gender?

    • Intro/Extra: Do you consider yourself introverted or extraverted?

    • Sleep: Average hours of sleep per night?

    • Bedtime: What time do you usually go to bed?

    • Countries: Number of countries visited?

    • Dread: On a scale of 1-5, how much do you dread being in this class?

Data Matrix

  • Student responses are organized as a data matrix: each row is one student (an observation) and each column is one variable, such as gender or sleep.
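
As a rough sketch of what such a data matrix looks like in code, each row can be stored as one record and each key as one variable. The three responses below are invented for illustration and are not real survey data; the key names are also assumptions.

```python
# Hypothetical responses: values invented for illustration only.
data_matrix = [
    {"gender": "female", "intro_extra": "extravert", "sleep": 8.0,
     "bedtime": "23:00", "countries": 13, "dread": 3},
    {"gender": "male", "intro_extra": "introvert", "sleep": 6.5,
     "bedtime": "01:30", "countries": 2, "dread": 4},
    {"gender": "female", "intro_extra": "introvert", "sleep": 7.0,
     "bedtime": "00:00", "countries": 5, "dread": 2},
]

# Each dictionary is one observation (row); each key is one variable (column).
variables = list(data_matrix[0].keys())
n_observations = len(data_matrix)

# Pull out one column, e.g. hours of sleep for every student.
sleep_column = [row["sleep"] for row in data_matrix]

print(variables)
print(n_observations, sleep_column)
```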

Types of Variables

  • Categorical: Variables whose values are categories rather than numbers

    • Gender: Categorical

  • Numerical: Variables with numeric values, either continuous or discrete

    • Sleep: Continuous numerical

    • Countries: Discrete numerical

    • Dread: Ordinal categorical, could also function as numerical.
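
The dual nature of a variable like dread can be made concrete with a small sketch. The responses below are hypothetical, invented only to show the two treatments side by side.

```python
from collections import Counter
from statistics import mean

# Hypothetical dread responses on the 1-5 scale (invented for illustration).
dread = [3, 2, 4, 2, 5, 1, 3, 2]

# Treated as ordinal categorical: count how many responses fall in each level.
level_counts = Counter(dread)

# Treated as numerical: average the scores, which implicitly assumes the
# levels are equally spaced.
average_dread = mean(dread)

for level in sorted(level_counts):
    print(f"level {level}: {level_counts[level]} response(s)")
print(f"average dread: {average_dread}")
```

Counting by level makes no assumption about spacing, while the average is only meaningful if moving from 1 to 2 means the same as moving from 4 to 5.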

Relationships Among Variables

  • Example: the relationship between students' GPA and study hours. Key observations:

    • A scatterplot shows visually whether and how the two variables are associated.

    • Anomalous data points, such as a GPA greater than 4.0, suggest data errors.
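
A simple range check can surface such suspicious values before any analysis. The sketch below uses invented (study hours, GPA) pairs and assumes a 0 to 4.0 GPA scale; the function name `flag_suspect` is a hypothetical helper.

```python
# Hypothetical (study_hours, gpa) pairs, invented for illustration.
records = [(10, 3.2), (25, 3.8), (15, 4.3), (30, 3.9), (5, 2.7)]

MAX_GPA = 4.0  # assumption: GPA is reported on a 0-4.0 scale


def flag_suspect(records, max_gpa=MAX_GPA):
    """Return the records whose GPA falls outside the valid range."""
    return [(hours, gpa) for hours, gpa in records if gpa < 0 or gpa > max_gpa]


suspect = flag_suspect(records)
print(suspect)
```

Flagged rows are candidates for review, not automatic deletion: the value may be a typo, or the scale may differ from the assumed one.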

Explanatory and Response Variables

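  • Identifying: The explanatory variable is the one believed to affect the other; the variable it affects is the response variable.

  • Causation vs. Correlation: Labeling variables as explanatory and response does not guarantee a causal relationship; association does not imply causation.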

Types of Data Collection

  • Observational Studies: Researchers collect data without interfering in how the data arise; such studies can establish associations only.

  • Experiments: Researchers randomly assign subjects to treatments, which makes it possible to establish causal connections between variables.

Causation vs. Correlation

  • Two variables can be associated without one causing the other.

Sampling Principles and Strategies

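  1. Population vs. Sample: The population is the entire group of interest; a sample is the subset that is actually observed and used to draw conclusions about the population.

  2. Census: Attempts to include the entire population. Challenges include hard-to-reach individuals and populations that change over time.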

Sampling Bias

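  • Non-response (few of the people sampled reply) and voluntary response (people choose for themselves whether to take part) can make a sample unrepresentative.

  • Surveys must be designed to limit these biases so that the sample reflects the population.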
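
The effect of voluntary response can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: a hypothetical population in which about 30% of people are dissatisfied, and response rates of 60% for dissatisfied people versus 10% for everyone else.

```python
import random


def share(values):
    """Proportion of True values in a list of booleans."""
    return sum(values) / len(values)


def simulate_survey(pop_size=10_000, sample_size=500, seed=1):
    rng = random.Random(seed)
    # Hypothetical population: True means the person is dissatisfied.
    population = [rng.random() < 0.30 for _ in range(pop_size)]
    # Simple random sample: every person is equally likely to be chosen.
    srs = rng.sample(population, sample_size)
    # Voluntary response: dissatisfied people are assumed to reply more often.
    volunteers = [p for p in population if rng.random() < (0.60 if p else 0.10)]
    return share(population), share(srs), share(volunteers)


true_share, srs_share, volunteer_share = simulate_survey()
print(f"population:         {true_share:.2f}")
print(f"random sample:      {srs_share:.2f}")
print(f"voluntary response: {volunteer_share:.2f}")
```

Under these assumptions the random sample lands close to the population share, while the voluntary-response sample overstates dissatisfaction considerably.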

Observational vs. Experimental Design

  • Distinction in approach: observational studies collect data without interference, whereas in experiments researchers assign treatments to subjects.

  • Random assignment in experiments is what supports valid causal conclusions.
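
Random assignment itself is simple to carry out. As a minimal sketch (the participant IDs, the function name `randomly_assign`, and the seed are hypothetical), 60 participants can be split into two groups of 30, mirroring the group sizes in the case study:

```python
import random


def randomly_assign(participant_ids, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical IDs 1..60; the seed only makes the example reproducible.
treatment, control = randomly_assign(range(1, 61), seed=42)
print(len(treatment), len(control))
```

Because chance alone decides who lands in which group, characteristics of the participants tend to balance out across the groups, which is what allows a difference in outcomes to be attributed to the treatment.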