Study Notes for QBIO 305 Statistics for the Life Sciences

Course Information

Instructor: Professor Zhengye Zhou
Course Title: QBIO 305 Statistics for the Life Sciences
Syllabus and Resources:
- Available on Brightspace
- Contains Announcements, Assignments, and Lecture Slides

Instructor Office Hours

Location: KAP 470 A
Time: Thursdays, 2:30 PM – 3:30 PM, or by appointment on Zoom

Required Textbook

Title: Statistics for the Life Sciences
Authors: Samuels, Witmer, and Schaffner
Edition: 5th
Availability: Accessible on Brightspace

Course Objectives

Statistical Software: Introduction to R, a free software environment for statistical analysis and data visualization.

Grading Breakdown

Homework: (10%)
- Assigned from the textbook, due every Thursday by 11:59 PM
- The lowest two homework scores are dropped
- Late homework: 50% credit if submitted a few minutes or hours late; no credit after one week
R Projects: (10%)
- Group work (4-5 students)
- Submit on Gradescope as a group
Exams:
- Two Midterm Exams: (20% each)
- Cumulative Final Exam: (40%)
- All exams must be taken in person unless a serious documented excuse is provided
- Calculator required (not a cell phone).

Introduction to Statistics

Definition: Statistics involves collecting, organizing, analyzing, interpreting, and presenting data.
Purpose in Life Sciences: Understand variability in data from diverse research settings (clinic, lab, field).
- Importance of distinguishing between “signal” (important information) and “noise” (random variation).

Key Concepts

Variability: Describes how spread out or clustered a set of data is; there is no variability if every observation gives the same result.

Learning Outcomes

Evaluate data for strong evidence and trustworthiness.
Determine sample size for reliable patterns.
Apply statistical techniques for data evaluation and proper interpretation.
Design effective experiments and analyze data exhaustively while avoiding overinterpretation.

Types of Evidence in Research

Anecdote: A small, personal story. Example: “My uncle Roy smoked and died of lung cancer at 48.”
Observational Study: Collecting data by observing subjects without assigning or manipulating any treatment.

Statistics Examples

Tobacco Use Trends

Figure: Current cigarette smoking among adults aged 25+, categorized by education level from 2009-2019.
- Notable decreases in smoking rates associated with higher education levels.

Ice Cream and Murder Rate Correlation

Observations show that higher ice cream sales correlate with higher murder rates in cities.
This does not mean that one causes the other; the example illustrates that correlation does not equal causation.

Confounding Variables

Definition: Factors related to both the independent and dependent variables that obscure the true relationship between them.
Example: Weather affecting both ice cream sales and crime rates.
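
The ice cream example can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: temperature drives both ice cream sales and crime through invented linear relationships, and neither outcome affects the other. All names and numbers (temperature, ice_cream, crime, the coefficients) are hypothetical, and Python is used here only for illustration even though the course software is R.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(305)  # fixed seed so the run is reproducible

# Hypothetical model: temperature (the confounder) drives both outcomes.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [10 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
crime = [5 + 0.5 * t + rng.gauss(0, 3) for t in temperature]

r_all = pearson(ice_cream, crime)

# Hold the confounder roughly constant: keep only days between 20 and 22 degrees.
similar_days = [(i, c) for t, i, c in zip(temperature, ice_cream, crime) if 20 <= t <= 22]
r_within = pearson([i for i, _ in similar_days], [c for _, c in similar_days])

print(f"all days: r = {r_all:.2f}")
print(f"days with similar temperature: r = {r_within:.2f}")
```

Across all days the two outcomes are strongly correlated, but among days with nearly the same temperature the correlation should be close to zero, which is what a relationship driven by a confounder looks like.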

Case Studies in Treatment Effectiveness

Kidney Stone Treatments

Comparison between Treatment A and Treatment B for small and large stone sizes, presented with success rates.
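Discussion of Simpson’s Paradox: a pattern seen within each subgroup (here, each stone size) can reverse when the subgroups are combined, showing how confounding variables can lead to misleading interpretations of treatment efficacy.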
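
The reversal can be checked with simple arithmetic. The sketch below uses the counts from the widely cited kidney stone data set (Charig et al., 1986); these notes do not record the numbers shown in lecture, so the slide figures may differ. The helper names (rate, overall) are illustrative.

```python
# (successes, patients) for each treatment and stone size,
# taken from the widely cited kidney stone data (Charig et al., 1986).
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    """Success rate as a proportion."""
    return successes / patients

def overall(treatment):
    """Success rate for a treatment with both stone sizes pooled."""
    total_successes = sum(s for s, _ in counts[treatment].values())
    total_patients = sum(n for _, n in counts[treatment].values())
    return rate(total_successes, total_patients)

for size in ("small", "large"):
    a = rate(*counts["A"][size])
    b = rate(*counts["B"][size])
    print(f"{size} stones: A = {a:.1%}, B = {b:.1%}")

print(f"all stones:   A = {overall('A'):.1%}, B = {overall('B'):.1%}")
```

Treatment A has the higher success rate for small stones and for large stones, yet Treatment B looks better when the sizes are pooled. In these data Treatment A was used far more often on large stones (263 of 350 patients), which are harder to treat, so stone size acts as the confounder.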

Types of Study Designs

Case-Control Studies

Definition: Observational studies comparing subjects with the outcome of interest (cases) to those without (controls).
Key Characteristics: No randomization or assigned treatments, so cause and effect cannot be established.

Historical Controls

Definition: Comparison between current treatment outcomes and past patient data (not concurrent controls).
Characteristics:
- Risks include changes in population characteristics or medical standards over time.

Randomized Controlled Studies (RCT)

Definition: Experiments in which subjects are randomly assigned to treatment or control groups, which allows cause-and-effect relationships to be established.
Example: Testing a blood pressure medication with random assignment.
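
Random assignment itself is easy to sketch. The example below assumes a hypothetical study of 20 patients identified by number; the function name randomize and the seed are illustrative choices, not part of the course material.

```python
import random

def randomize(subject_ids, seed=None):
    """Randomly split subjects into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": sorted(shuffled[:half]), "control": sorted(shuffled[half:])}

# Hypothetical study with 20 enrolled patients, identified by number.
groups = randomize(range(1, 21), seed=305)
print(groups)
```

Because chance alone decides who receives the medication, known and unknown confounders tend to balance out between the two groups.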

Longitudinal (Cohort) Studies

Definition: Studies that make repeated observations of the same variables over long periods, which allows incidence (the rate of new cases) to be estimated.
Famous studies: Nurses’ Health Study and Framingham Heart Study.

Sampling Techniques

Populations and Samples

Population: Entire group of interest.
Sample: Subset of the population used for data collection.
Importance of using samples to infer characteristics about the larger population.

Sample Representativeness

Representative Sample: Reflects population characteristics accurately.
Biased Sample: Systematically overestimates or underestimates population features.

Random Sampling Techniques

Methodology for obtaining a Simple Random Sample (SRS):
1. Create a sampling frame with unique ID numbers.
2. Generate random numbers and select corresponding population members.
Importance of SRS for minimizing bias in research.
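
The two SRS steps can be sketched in a few lines. This example assumes a hypothetical population of 500 students and a sample size of 25; the names (population, frame, srs) are illustrative, and Python stands in for the course software R.

```python
import random

# Step 1: sampling frame, one unique ID per population member (hypothetical names).
population = [f"student_{i:03d}" for i in range(1, 501)]
frame = dict(enumerate(population, start=1))  # ID -> member

# Step 2: draw random IDs without replacement and look up the members.
rng = random.Random(305)  # fixed seed so the draw is reproducible
chosen_ids = rng.sample(sorted(frame), k=25)
srs = [frame[i] for i in chosen_ids]
print(srs[:5])
```

Sampling without replacement gives every set of 25 members the same chance of being chosen, which is the defining property of a simple random sample.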

Alternative Sampling Methods

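Random Cluster Sampling: Randomly select whole groups (clusters) and sample the members of the chosen groups.
Stratified Random Sampling: Divide the population into subgroups (strata) and draw a random sample within each.
Both methods can improve sampling accuracy for specific populations.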
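
Stratified random sampling amounts to drawing a separate simple random sample within each subgroup (stratum). The sketch below assumes a hypothetical patient population split into three age bands and a 5% sampling fraction; stratified_sample and all the data are illustrative.

```python
import random

def stratified_sample(members_by_stratum, fraction, seed=None):
    """Draw a simple random sample of the given fraction within each stratum."""
    rng = random.Random(seed)
    sample = {}
    for stratum, members in members_by_stratum.items():
        k = max(1, round(len(members) * fraction))  # at least one member per stratum
        sample[stratum] = rng.sample(members, k)
    return sample

# Hypothetical population: patients grouped by age band.
population = {
    "18-39": [f"young_{i}" for i in range(600)],
    "40-64": [f"middle_{i}" for i in range(300)],
    "65+": [f"older_{i}" for i in range(100)],
}
sample = stratified_sample(population, fraction=0.05, seed=305)
for stratum, members in sample.items():
    print(stratum, len(members))
```

Sampling each stratum in proportion to its size guarantees that small groups, such as the oldest patients here, are represented in the sample.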

Sampling Errors

Sampling Error

Definition: The chance difference between a sample result and the true population value, seen as variability between different samples drawn from the same population; expected in random sampling.
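
Sampling error can be seen by drawing many samples from one population and comparing their means. This is a minimal sketch with an invented population of 10,000 measurements (for example, heights in cm); the sample sizes of 10 and 100 and the number of repeats are arbitrary choices.

```python
import random
import statistics

rng = random.Random(305)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 measurements (for example, heights in cm).
population = [rng.gauss(170, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

def sample_means(n, repeats=500):
    """Means of `repeats` simple random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

spread = {}
for n in (10, 100):
    means = sample_means(n)
    spread[n] = statistics.stdev(means)
    print(f"n = {n:3d}: sample means vary around {true_mean:.1f} with SD {spread[n]:.2f}")
```

The sample means differ from sample to sample even though nothing went wrong in the sampling, and the spread shrinks as the sample size grows. This is why larger samples give more reliable patterns.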

Non-sampling Error

Definition: Mistakes unrelated to randomness, such as systematic biases in the collection process.
- Examples: Selection bias, question wording effects, nonresponse bias.
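
Unlike sampling error, a systematic bias does not average out. The sketch below illustrates nonresponse bias under an assumed response model: in a hypothetical survey about weekly exercise hours, people who exercise more are more likely to answer. The population, the response rule, and every number are invented for illustration.

```python
import random
import statistics

rng = random.Random(305)  # fixed seed so the run is reproducible

# Hypothetical population: weekly exercise hours for 20,000 people.
population = [max(0.0, rng.gauss(4, 2)) for _ in range(20_000)]
true_mean = statistics.mean(population)

# Invite a simple random sample of 2,000 people to a survey.
invited = rng.sample(population, 2_000)

def responds(hours):
    """Assumed response model: the chance of answering rises with exercise hours."""
    return rng.random() < min(1.0, 0.1 + 0.1 * hours)

respondents = [h for h in invited if responds(h)]

print(f"population mean:     {true_mean:.2f}")
print(f"mean of all invited: {statistics.mean(invited):.2f}")
print(f"mean of respondents: {statistics.mean(respondents):.2f}")
```

The invited group is a fair random sample, so its mean sits close to the population mean. The respondents' mean is pushed upward because the people who answer differ systematically from those who do not, and inviting more people would not remove that bias.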

Challenges in Sampling Hard-to-Reach Populations

Example challenges include measuring illegal activities or transient populations.

Conclusion

Importance of understanding sample collection and design in statistical analyses.
Recognizing how confounding variables can distort the conclusions of observational studies.
A clear grasp of statistics relies on accurate counts and definitions, continuous assessment of methodologies, and transparency in sampling strategies.