Study Notes for QBIO 305 Statistics for the Life Sciences

Course Information

  • Instructor: Professor Zhengye Zhou
  • Course Title: QBIO 305 Statistics for the Life Sciences
  • Syllabus and Resources:
    • Available on Brightspace
    • Contains Announcements, Assignments, and Lecture Slides

Instructor Office Hours

  • Location: KAP 470 A
  • Time: Thursdays, 2:30 PM – 3:30 PM, or by appointment on Zoom

Required Textbook

  • Title: Statistics for the Life Sciences
  • Authors: Samuels, Witmer, and Schaffner
  • Edition: 5th
  • Availability: Accessible on Brightspace

Course Objectives

  • Statistical Software: Introduction to R, a free statistical software package for data visualization and analysis.

Grading Breakdown

  • Homework (10%):
    • Assigned from the textbook; due every Thursday by 11:59 PM
    • The two lowest homework scores are dropped
    • Late homework: 50% credit if late by a few minutes or hours; no credit after one week
  • R Projects (10%):
    • Group work (4-5 students)
    • Submit on Gradescope as a group
  • Exams:
    • Two Midterm Exams (20% each)
    • Cumulative Final Exam (40%)
    • All exams must be taken in person unless a serious documented excuse is provided
    • A calculator is required; a cell phone may not be used as one

Introduction to Statistics

  • Definition: Statistics involves collecting, organizing, analyzing, interpreting, and presenting data.
  • Purpose in Life Sciences: Understand variability in data from diverse research settings (clinic, lab, field).
    • A central task is distinguishing “signal” (important information) from “noise” (random variation).
Key Concepts
  • Variability: Describes how spread out or clustered a set of data is; there is no variability when every observation gives the same result.
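To make the idea of spread concrete, here is a minimal Python sketch that computes the sample standard deviation for three small data sets with the same mean. The measurements are invented for illustration and do not come from the course materials.

```python
import statistics

# Hypothetical measurements (e.g. plant heights in cm), invented for illustration.
identical = [12.0, 12.0, 12.0, 12.0]    # every observation gives the same result
clustered = [11.8, 12.0, 12.1, 12.1]    # values close together
spread_out = [6.0, 10.0, 14.0, 18.0]    # values far apart

for name, values in [("identical", identical),
                     ("clustered", clustered),
                     ("spread out", spread_out)]:
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    print(f"{name}: mean = {mean:.1f}, SD = {sd:.2f}")
```

All three sets share the same mean, so the mean alone says nothing about variability; the standard deviation is zero only for the identical values and grows as the data spread out.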

Learning Outcomes

  • Evaluate data for strong evidence and trustworthiness.
  • Determine the sample size needed to detect patterns reliably.
  • Apply statistical techniques for data evaluation and proper interpretation.
  • Design effective experiments and analyze data thoroughly while avoiding overinterpretation.

Types of Evidence in Research

  • Anecdote: A single personal story, which may not be typical. Example: “My uncle Roy smoked and died of lung cancer at 48.”
  • Observational Study: Data are collected by observing subjects without assigning or manipulating any treatment.

Statistics Examples

Tobacco Use Trends
  • Figure: Current cigarette smoking among adults aged 25+, categorized by education level from 2009-2019.
    • Notable decreases in smoking rates associated with higher education levels.
Ice Cream and Murder Rate Correlation
  • Observations show that higher ice cream sales are correlated with higher murder rates in cities.
  • This does not mean that one causes the other: correlation does not imply causation.

Confounding Variables

  • Definition: Variables related to both the independent and the dependent variable, which can create or obscure an apparent relationship between them.
  • Example: Weather affecting both ice cream sales and crime rates.
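A small simulation can show how a confounder produces a correlation between two variables that do not affect each other. The sketch below is written in Python and assumes a made-up model in which temperature drives both ice cream sales and crime; all coefficients, the seed, and the helper `pearson_r` are hypothetical choices for illustration.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(305)  # fixed seed so the sketch is reproducible

# Assumed model: temperature influences both outcomes;
# neither outcome influences the other.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
crime = [5 + 0.5 * t + rng.gauss(0, 3) for t in temperature]

overall_r = pearson_r(ice_cream, crime)

# Hold the confounder roughly constant: keep only days between 20 and 22 degrees.
band = [(i, c) for t, i, c in zip(temperature, ice_cream, crime) if 20 <= t < 22]
within_r = pearson_r([i for i, _ in band], [c for _, c in band])

print(f"all days:             r = {overall_r:.2f}")
print(f"20-22 degree days:    r = {within_r:.2f}")
```

Across all days the two outcomes are strongly correlated, but once temperature is held nearly fixed the correlation largely disappears, which is what one would expect if weather is the common cause.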

Case Studies in Treatment Effectiveness

Kidney Stone Treatments
  • Comparison between Treatment A and Treatment B for small and large stone sizes, presented with success rates.
  • Illustrates Simpson’s Paradox: a pattern seen within each subgroup (here, stone size) can reverse when the subgroups are combined, so ignoring a confounding variable can give a misleading picture of treatment efficacy.
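The reversal can be checked with simple arithmetic. The Python sketch below uses the success counts commonly reported for this kidney stone example; they are an assumption here, since the notes do not list the numbers, and may differ from the figures shown in lecture.

```python
# (successes, patients) by treatment and stone size.
# Counts as commonly reported for this example; assumed, not taken from the notes.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    """Success rate as a proportion."""
    return successes / patients

def overall(treatment):
    """Success rate for a treatment with both stone sizes pooled."""
    successes = sum(s for s, _ in results[treatment].values())
    patients = sum(n for _, n in results[treatment].values())
    return rate(successes, patients)

for size in ("small", "large"):
    a = rate(*results["A"][size])
    b = rate(*results["B"][size])
    print(f"{size} stones: A = {a:.0%}, B = {b:.0%}")

print(f"all stones:   A = {overall('A'):.0%}, B = {overall('B'):.0%}")
```

With these counts, Treatment A has the higher success rate for small stones and for large stones, yet Treatment B looks better when the groups are pooled. The explanation is that A was used mostly on large stones, which are harder to treat, so stone size confounds the pooled comparison.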

Types of Study Designs

Case-Control Studies
  • Definition: Observational studies comparing subjects with the outcome of interest (cases) to those without (controls).
  • Key Characteristics: No randomization and no assigned treatments, so these studies cannot by themselves establish cause and effect.
Historical Controls
  • Definition: Comparison between current treatment outcomes and past patient data (not concurrent controls).
  • Characteristics:
    • Risks include changes in population characteristics or medical standards over time.
Randomized Controlled Studies (RCT)
  • Definition: Experiments in which participants are randomly assigned to treatment or control groups, which makes it possible to establish cause-and-effect relationships.
  • Example: Testing a blood pressure medication with random assignment.
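Random assignment itself is easy to sketch. The Python example below shuffles a list of participant IDs and splits it in half; the IDs, the seed, and the function name `randomize` are hypothetical, and real trials often use more elaborate schemes (for example, blocking).

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs for a blood pressure trial.
participant_ids = [f"P{n:02d}" for n in range(1, 21)]
groups = randomize(participant_ids, seed=305)

print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Because chance alone decides who gets the medication, known and unknown confounders tend to balance out between the groups, which is the reason randomization supports causal conclusions.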
Longitudinal (Cohort) Studies
  • Definition: Studies that make repeated observations of the same subjects over long periods, which allows incidence to be estimated.
  • Famous studies: Nurses’ Health Study and Framingham Heart Study.

Sampling Techniques

Populations and Samples
  • Population: Entire group of interest.
  • Sample: Subset of the population used for data collection.
  • Samples are used to infer characteristics of the larger population.
Sample Representativeness
  • Representative Sample: Reflects population characteristics accurately.
  • Biased Sample: Systematically overestimates or underestimates population features.
Random Sampling Techniques
  • Methodology for obtaining a Simple Random Sample (SRS):
    1. Create a sampling frame with unique ID numbers.
    2. Generate random numbers and select corresponding population members.
  • In an SRS every member of the population has the same chance of being selected, which minimizes selection bias.
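The two steps for drawing an SRS can be sketched in Python as follows. The population of 500 members, the sample size of 25, and the seed are assumptions made for the example.

```python
import random

# Step 1: sampling frame with unique ID numbers.
# Hypothetical population of 500 members with IDs 1..500.
sampling_frame = {member_id: f"member_{member_id}" for member_id in range(1, 501)}

# Step 2: generate random ID numbers (without replacement) and
# select the corresponding population members.
rng = random.Random(305)  # fixed seed so the draw is reproducible
chosen_ids = rng.sample(sorted(sampling_frame), k=25)
srs = [sampling_frame[member_id] for member_id in chosen_ids]

print(sorted(chosen_ids))
```

Sampling without replacement guarantees that no member is selected twice, and every set of 25 IDs is equally likely to be drawn.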
Alternative Sampling Methods
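  • Stratified Random Sampling: Divide the population into subgroups (strata) and draw a random sample from each one.
  • Random Cluster Sampling: Randomly select whole groups (clusters) and sample the members of the chosen clusters.
  • For some populations these methods are more practical or more accurate than an SRS.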
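The difference between the two alternative methods is easiest to see side by side. In the Python sketch below, the population (12 clinics of 30 patients each, with an age group per patient) and the function names are hypothetical.

```python
import random

rng = random.Random(305)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 clinics (clusters) with 30 patients each;
# every patient also belongs to an age group (stratum).
population = [
    {"id": f"c{clinic}-p{patient}",
     "clinic": clinic,
     "age_group": rng.choice(["under_40", "40_plus"])}
    for clinic in range(12)
    for patient in range(30)
]

def stratified_sample(units, key, per_stratum):
    """Draw a random sample of fixed size from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    return [u for members in strata.values()
            for u in rng.sample(members, per_stratum)]

def cluster_sample(units, key, n_clusters):
    """Randomly choose whole clusters and keep all of their members."""
    cluster_ids = sorted({unit[key] for unit in units})
    chosen = set(rng.sample(cluster_ids, n_clusters))
    return [u for u in units if u[key] in chosen]

stratified = stratified_sample(population, "age_group", per_stratum=20)
clustered = cluster_sample(population, "clinic", n_clusters=3)

print(len(stratified), "patients from every age group combined")
print(len(clustered), "patients from",
      len({u["clinic"] for u in clustered}), "clinics")
```

Stratified sampling guarantees that each age group is represented, while cluster sampling only requires visiting a few clinics, which is often the cheaper option in field work.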

Sampling Errors

Sampling Error
  • Definition: The chance discrepancy between a sample and the population it was drawn from; it shows up as variability between different samples from the same population and is expected in random sampling.
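Sampling error can be observed directly by drawing many random samples from one population and comparing their means. The Python sketch below assumes a made-up population of 10,000 blood pressure readings; the distribution, sample sizes, and number of repeats are illustrative choices.

```python
import random
import statistics

rng = random.Random(305)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 systolic blood pressure readings (mmHg).
population = [rng.gauss(120, 15) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sample_means(n, repeats=200):
    """Means of `repeats` simple random samples, each of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

print(f"population mean = {population_mean:.1f}")
for n in (10, 100):
    means = sample_means(n)
    print(f"n = {n:3d}: sample means range from {min(means):.1f} "
          f"to {max(means):.1f}, SD = {statistics.stdev(means):.2f}")
```

No sample mean matches the population mean exactly, but the sample means cluster more tightly around it as the sample size grows. This is why sample size matters when judging whether a pattern is reliable.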
Non-sampling Error
  • Definition: Errors not caused by the chance involved in sampling, such as systematic biases in the data collection process.
    • Examples: Selection bias, question wording effects, nonresponse bias.

Challenges in Sampling Hard-to-Reach Populations

  • Example challenges include measuring illegal activities or transient populations.

Conclusion

  • Understanding how a sample was collected and how a study was designed is essential to any statistical analysis.
  • Observational studies are vulnerable to confounding variables.
  • A clear grasp of statistics relies on accurate counts and definitions, continuous assessment of methodologies, and transparency in sampling strategies.