Study Notes for QBIO 305 Statistics for the Life Sciences
QBIO 305 Statistics for the Life Sciences
- Instructor: Professor Zhengye Zhou
- Course Title: QBIO 305 Statistics for the Life Sciences
- Syllabus and Resources:
  - Available on Brightspace
  - Contains Announcements, Assignments, and Lecture Slides
Instructor Office Hours
- Location: KAP 470 A
- Time: Thursdays, 2:30 PM – 3:30 PM, or by appointment on Zoom
Required Textbook
- Title: Statistics for the Life Sciences
- Authors: Samuels, Witmer, and Schaffner
- Edition: 5th
- Availability: Accessible on Brightspace
Course Objectives
- Statistical Software: Introduction to R, a free statistical software package for data visualization and analysis.
Grading Breakdown
- Homework (10%)
  - Assigned from the textbook, due every Thursday by 11:59 PM
  - The two lowest homework scores are dropped
  - Late homework: 50% credit if late by a few minutes or hours; no credit after one week
- R Projects (10%)
  - Group work (4-5 students)
  - Submit on Gradescope as a group
- Exams
  - Two Midterm Exams (20% each)
  - Cumulative Final Exam (40%)
  - All exams must be taken in person unless a serious documented excuse is provided
  - Calculator required (a cell phone does not count)
Introduction to Statistics
- Definition: Statistics involves collecting, organizing, analyzing, interpreting, and presenting data.
- Purpose in Life Sciences: Understand variability in data from diverse research settings (clinic, lab, field).
- Importance of distinguishing between “signal” (important information) and “noise” (random variation).
Key Concepts
- Variability: Describes how spread out or clustered a set of data is; there is no variability when every observation gives the same result.
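One common way to put a number on variability is the sample standard deviation. The sketch below uses made-up measurements (the three lists are hypothetical, not course data) that share the same mean but differ in spread, and it relies only on Python's standard `statistics` module.

```python
import statistics

# Hypothetical measurements (made-up numbers); all three sets have mean 10.
clustered = [9.8, 10.1, 10.0, 9.9, 10.2]
spread_out = [4.0, 16.0, 7.5, 12.5, 10.0]
identical = [10.0, 10.0, 10.0, 10.0, 10.0]

datasets = [("clustered", clustered), ("spread out", spread_out), ("identical", identical)]
for name, data in datasets:
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    print(f"{name:>10}: mean = {mean:.2f}, sd = {sd:.2f}")
```

The means match, so the mean alone cannot tell the three data sets apart; the standard deviation is what separates the tightly clustered values from the spread-out ones, and it is zero when all values are identical.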
Learning Outcomes
- Evaluate data for strong evidence and trustworthiness.
- Determine the sample size needed to detect patterns reliably.
- Apply statistical techniques to evaluate data and interpret results properly.
- Design effective experiments and analyze data thoroughly while avoiding overinterpretation.
Types of Evidence in Research
- Anecdote: A small, personal story. Example: “My uncle Roy smoked and died of lung cancer at 48.”
- Observational Study: Collects data by observing subjects without assigning or manipulating treatments.
Statistics Examples
Tobacco Use Trends
- Figure: Current cigarette smoking among adults aged 25+, categorized by education level from 2009-2019.
- Smoking rates decrease notably as education level increases.
Ice Cream and Murder Rate Correlation
- Observations show that higher ice cream sales correlate with higher murder rates in cities.
- This does not mean that ice cream sales cause murders; the example illustrates that correlation does not imply causation.
Confounding Variables
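- Definition: Factors associated with both the independent (explanatory) and dependent (response) variables that distort the apparent relationship between them.
- Example: Weather (for example, hot days) can affect both ice cream sales and crime rates, producing an association between the two even though neither causes the other.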
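A small simulation can make the weather example concrete. The sketch below is an illustration under stated assumptions, not real data: the variable names, the coefficients, and the noise levels are all invented, and ice cream sales are given no direct effect on crime. It compares the raw correlation with the correlation that remains after the straight-line effect of temperature is removed from both variables.

```python
import random

rng = random.Random(305)  # fixed seed so the run is repeatable

# Simulated (not real) daily data: temperature drives both variables,
# and ice cream sales have no direct effect on crime.
n_days = 1000
temperature = [rng.uniform(0, 35) for _ in range(n_days)]
ice_cream = [10 * t + rng.gauss(0, 20) for t in temperature]
crime = [2 * t + rng.gauss(0, 5) for t in temperature]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


raw_r = pearson_r(ice_cream, crime)
adjusted_r = pearson_r(residuals(ice_cream, temperature), residuals(crime, temperature))
print(f"raw correlation:            {raw_r:.2f}")
print(f"after removing temperature: {adjusted_r:.2f}")
```

In this setup the raw correlation is strong, while the adjusted one is close to zero, which is what one would expect when a third variable drives both measurements.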
Case Studies in Treatment Effectiveness
Kidney Stone Treatments
- Compares the success rates of Treatment A and Treatment B separately for small and large kidney stones.
- Simpson's Paradox: a trend that holds within each subgroup (here, stone size) can reverse when the subgroups are combined, so ignoring a confounding variable can give a misleading picture of treatment efficacy.
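The reversal can be checked with simple arithmetic. The counts below are the figures widely quoted for this example from Charig et al. (1986); they are assumed, not confirmed, to match the table shown in lecture. The dictionary layout and helper names are illustrative choices.

```python
# (successes, patients) per treatment and stone size, as widely quoted from
# Charig et al. (1986); assumed, not confirmed, to match the lecture's table.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    return successes / patients


def overall(treatment):
    """Success rate with both stone sizes pooled together."""
    groups = results[treatment].values()
    return rate(sum(s for s, _ in groups), sum(n for _, n in groups))


for size in ("small", "large"):
    a = rate(*results["A"][size])
    b = rate(*results["B"][size])
    print(f"{size} stones: A = {a:.1%}, B = {b:.1%}")

print(f"all stones:   A = {overall('A'):.1%}, B = {overall('B'):.1%}")
```

With these counts, Treatment A has the higher success rate within each stone size, yet Treatment B looks better once the sizes are pooled. Treatment A was used mostly on large stones, which are harder to treat, so stone size acts as the confounding variable.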
Types of Study Designs
Case-Control Studies
- Definition: Observational studies comparing subjects with the outcome of interest (cases) to those without (controls).
- Key Characteristics: There is no randomization and no assigned treatment, so these studies cannot establish cause and effect.
Historical Controls
- Definition: Compares outcomes of patients receiving the current treatment with data from past patients rather than with a concurrent control group.
- Risk: Population characteristics or medical standards may change over time, so a difference in outcomes may not be due to the treatment.
Randomized Controlled Studies (RCT)
- Definition: Experiments in which subjects are randomly assigned to treatment or control groups in order to establish cause-and-effect relationships.
- Example: Testing a blood pressure medication with random assignment.
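Random assignment can be sketched in a few lines. The example below is a minimal illustration, assuming 20 hypothetical volunteers with made-up IDs and an even split between the two groups; a real trial would follow its own protocol.

```python
import random

rng = random.Random(2024)  # fixed seed so the assignment can be reproduced

# 20 hypothetical volunteers, identified only by made-up IDs.
subjects = [f"S{i:02d}" for i in range(1, 21)]

# Shuffle a copy of the list, then split it in half.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment_group = sorted(shuffled[:half])  # would receive the medication
control_group = sorted(shuffled[half:])    # would receive the control

print("treatment:", treatment_group)
print("control:  ", control_group)
```

Because chance alone decides who lands in which group, known and unknown subject characteristics tend to balance out between the groups, which is what lets an RCT support cause-and-effect conclusions.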
Longitudinal (Cohort) Studies
- Definition: Studies that observe the same subjects repeatedly over long periods, which makes it possible to estimate incidence.
- Famous studies: Nurses’ Health Study and Framingham Heart Study.
Sampling Techniques
Populations and Samples
- Population: Entire group of interest.
- Sample: Subset of the population used for data collection.
- Importance of using samples to infer characteristics about the larger population.
Sample Representativeness
- Representative Sample: Reflects population characteristics accurately.
- Biased Sample: Systematically overestimates or underestimates population features.
Random Sampling Techniques
- Methodology for obtaining a Simple Random Sample (SRS):
  - Create a sampling frame that lists every population member with a unique ID number.
  - Generate random numbers and select the population members whose IDs match.
- An SRS gives every member the same chance of selection, which minimizes selection bias in research.
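The SRS steps listed above can be sketched as follows. The population of 500 members, the member labels, and the sample size of 25 are hypothetical choices made for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Step 1: build a sampling frame that maps unique ID numbers to members.
population = [f"member_{i}" for i in range(1, 501)]  # hypothetical population
sampling_frame = {unit_id: member for unit_id, member in enumerate(population, start=1)}

# Step 2: generate random ID numbers without replacement and look them up.
chosen_ids = rng.sample(range(1, len(sampling_frame) + 1), k=25)
srs = [sampling_frame[unit_id] for unit_id in chosen_ids]

print(sorted(chosen_ids))
```

`random.sample` draws without replacement, so no member can be selected twice, and every set of 25 IDs is equally likely to be chosen.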
Alternative Sampling Methods
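- Stratified Random Sampling: Divide the population into subgroups (strata) and take a random sample from each one, so every stratum is represented.
- Random Cluster Sampling: Randomly select whole groups (clusters) and sample the members of the chosen clusters, which is practical when no list of individuals is available.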
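The two alternatives to a simple random sample can be contrasted in a short sketch. Everything here is hypothetical: the population of 6 clinics with 50 patients each, the two age strata, the 10% sampling fraction, and the helper names `stratified_sample` and `cluster_sample`.

```python
import random

rng = random.Random(7)  # fixed seed so the draw is repeatable

# Hypothetical population: 6 clinics (clusters) with 50 patients each;
# every patient also belongs to one of two age strata.
population = [
    {"id": clinic * 100 + k, "clinic": clinic,
     "stratum": "under_40" if k < 30 else "40_plus"}
    for clinic in range(1, 7)
    for k in range(50)
]


def stratified_sample(units, fraction):
    """Draw a simple random sample of the given fraction from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit["stratum"], []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def cluster_sample(units, n_clusters):
    """Randomly choose whole clinics and keep every patient in them."""
    clinics = sorted({unit["clinic"] for unit in units})
    chosen = set(rng.sample(clinics, n_clusters))
    return [unit for unit in units if unit["clinic"] in chosen]


strat = stratified_sample(population, fraction=0.10)
clust = cluster_sample(population, n_clusters=2)
print(len(strat), "patients in the stratified sample")
print(len(clust), "patients in the cluster sample")
```

The stratified sample is guaranteed to include both age groups in proportion to their size, while the cluster sample only requires a list of clinics rather than a list of every patient.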
Sampling Errors
Sampling Error
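- Definition: The chance difference between a sample result and the true population value; results vary from sample to sample even when all samples are drawn from the same population, and this variation is expected in random sampling.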
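Sampling error is easy to see by drawing many random samples from one population and comparing their means. The sketch below uses a simulated population (10,000 made-up values centered near 120), a sample size of 25, and 200 repeated samples; all of these numbers are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated population of 10,000 hypothetical measurements.
population = [rng.gauss(120, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw 200 simple random samples of size 25 and record each sample mean.
sample_means = [statistics.mean(rng.sample(population, 25)) for _ in range(200)]

print(f"population mean:        {population_mean:.1f}")
print(f"smallest sample mean:   {min(sample_means):.1f}")
print(f"largest sample mean:    {max(sample_means):.1f}")
print(f"average of the means:   {statistics.mean(sample_means):.1f}")
```

Individual sample means miss the population mean by varying amounts, but they scatter around it rather than leaning to one side, which is the sense in which sampling error is random rather than systematic.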
Non-sampling Error
- Definition: Errors that do not come from the randomness of sampling, such as systematic biases in the data collection process.
- Examples: Selection bias, question wording effects, nonresponse bias.
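Nonresponse bias can be illustrated with a simulation. The sketch below is built on assumptions made purely for illustration: a simulated population of 10,000 scores centered near 50, and a response rule under which people with scores above 50 respond 80% of the time and everyone else 20% of the time.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the run is repeatable

# Simulated population of 10,000 hypothetical scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]


def responds(value):
    # Assumed response rule: high scorers are far more likely to answer.
    return rng.random() < (0.8 if value > 50 else 0.2)


respondents = [value for value in population if responds(value)]
srs = rng.sample(population, 500)  # simple random sample with full response

print(f"population mean:  {statistics.mean(population):.1f}")
print(f"respondents mean: {statistics.mean(respondents):.1f} (n = {len(respondents)})")
print(f"SRS mean:         {statistics.mean(srs):.1f} (n = {len(srs)})")
```

In this setup the respondent group is much larger than the simple random sample, yet its mean is pushed upward, while the smaller random sample stays close to the population mean. A larger sample does not repair a systematic bias.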
Challenges in Sampling Hard-to-Reach Populations
- Examples: People engaged in illegal activities and transient populations are difficult to list in a sampling frame, so they are difficult to sample and measure.
Conclusion
- Importance of understanding sample collection and design in statistical analyses.
- Observational studies are vulnerable to confounding variables, which must be recognized when interpreting results.
- A clear grasp of statistics relies on accurate counts and definitions, continuous assessment of methodologies, and transparency in sampling strategies.