PH102: Biostatistics and Hypothesis Testing - Introduction and Foundations

PH102 Welcome & Introductions
Instructor: Brinda K. Rana, Ph.D.
Miscellaneous Slide Content (Page 3):
- Revelle College at UC San Diego with mottos TRUTH PURPOSE VISION.
- An AccuWeather forecast showing extreme heat for Wednesday afternoon in various cities, with temperatures ranging from $100$ to $115$ degrees Fahrenheit ( $^\circ\text{F}$ ).
- References to the University of Michigan, University of Illinois, and The University of Texas MD Anderson Cancer Center.
Instructor's Research Focus:
- The interactions of genes and the environment in age-related disease.
- Affiliated with Moores Cancer Center and Stein Institute for Research in Aging.
Instructional Team:
- Graduate TAs (PhD Students in Biostats): Beiqin Ye & Amanda Li, Ishani Shah, Pruthva Mania.
- A humorous cartoon suggests a past concern about not having TAs.
PH102: Introduction to Biostatistics and Hypothesis Testing
What is Biostatistics? (Chapter 1):
- The field dedicated to the study design, organization of data collection, treatment of data, and the analysis and interpretation of data derived from biological, biomedical, and health-related studies.
Why Biostatistics is Essential:
- Understanding Research Studies: It enables us to interpret research results.
 - Qualitatively: E.g., Is Intervention A better than Intervention B?
 - Quantitatively: E.g., What is the exact effect of Amazon's acquisition of Whole Foods on the US food retail industry? What is the exact effect of legalizing marijuana on marijuana use and the risk of certain diseases?
- Beyond Research: Even if not pursuing a research career, biostatistics is crucial because public health and medicine are evidence-driven, and treatment evidence comes from research studies.
- Real-world Applications:
 - Public Health Officials & Hospital Administrators: To evaluate and compare treatment success rates between hospitals to improve health outcomes for the community.
 - Example Scenario: Hospital A has a $50\%$ success rate ( $50$ successes out of $100$ total), while Hospital B has a $68\%$ success rate ( $68$ successes out of $100$ total). Initially, Hospital B appears better.
 - Selection Bias / Simpson's Paradox: This occurs when the overall data shows one trend, but subsets of the data show opposite or no trends.
 - Illustrative Example with Disease Severity: When categorized by disease severity:
 - Less Severe Cases:
 Hospital A: $18$ successes, $2$ failures out of $20$ total (a $90\%$ success rate).
 Hospital B: $64$ successes, $16$ failures out of $80$ total (an $80\%$ success rate).
 - More Severe Cases:
 Hospital A: $32$ successes, $48$ failures out of $80$ total (a $40\%$ success rate).
 Hospital B: $4$ successes, $16$ failures out of $20$ total (a $20\%$ success rate).
 - In both less severe and more severe categories, Hospital A has a higher success rate, reversing the initial conclusion. This highlights the importance of appropriate data collection to test hypotheses accurately.
 - Clinicians Explaining Test Results: Understanding reference ranges for diagnostic tests, such as lipid panels:
 - Example Lipid Panel Results:
 - Cholesterol, Total: $204$ mg/dL (High, reference <199)
 - Triglycerides: $100$ mg/dL ( $\le 149$ )
 - HDL cholesterol: $59$ mg/dL ( $\ge 39$ , Normal range $40-59$ , Major risk <40, Negative risk >60)
 - VLDL Cholesterol Cal: $12$ mg/dL
 - LDL cholesterol Calc: $120$ mg/dL (Near optimal, Optimal <100, Borderline high $130-159$ , High $160-189$ , Very high >190)
 - Osteoporosis Screening: Explaining diagnoses based on T-scores and Z-scores.
 - Genetic Counseling: Advising couples on the likelihood of their future child inheriting a disease found in both families.
 - BSPH Capstone Poster Showcase: Students will use PH102 concepts to:
 - Propose a testable hypothesis.
 - Develop a statistical approach for data collection and hypothesis testing.
 - Determine sample size (number of participants required).
 - Collect and analyze data using learned approaches.
 - Interpret results and present a poster.
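The reversal in the hospital example above (Simpson's paradox) can be checked by recomputing the rates from the counts. The following is a minimal sketch that uses only the numbers given in the example; the names `counts`, `success_rate`, and `overall_rate` are illustrative and not part of the course material.

```python
from fractions import Fraction

# (successes, total) per hospital and severity stratum, copied from the example.
counts = {
    "A": {"less severe": (18, 20), "more severe": (32, 80)},
    "B": {"less severe": (64, 80), "more severe": (4, 20)},
}


def success_rate(successes, total):
    """Exact success rate as a fraction."""
    return Fraction(successes, total)


def overall_rate(strata):
    """Pool all strata for one hospital, then compute the success rate."""
    successes = sum(s for s, _ in strata.values())
    total = sum(t for _, t in strata.values())
    return success_rate(successes, total)


overall = {h: overall_rate(strata) for h, strata in counts.items()}
by_stratum = {
    h: {name: success_rate(s, t) for name, (s, t) in strata.items()}
    for h, strata in counts.items()
}

for h in counts:
    print(f"Hospital {h}: overall {float(overall[h]):.0%}")
    for name, rate in by_stratum[h].items():
        print(f"  {name}: {float(rate):.0%}")
```

Hospital B leads in the pooled comparison only because most of its patients ($80$ of $100$) were less severe cases, while most of Hospital A's patients were more severe cases.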
The Scientific Method (as an Ongoing Process):
1. Make Observations: What is seen in nature (personal experience, thoughts, reading).
2. Think of Interesting Questions: Why does that pattern occur?
3. Formulate Hypotheses: What are the general causes of the phenomenon?
4. Develop Testable Predictions: If the hypothesis is correct, then specific outcomes ( $a, b, c$ …) are expected.
5. Gather Data to Test Predictions: From literature, new observations, or formal experiments. Requires replication for verification.
6. Refine, Alter, Expand, or Reject Hypotheses.
7. Develop General Theories: Must be consistent with available data and other current theories.
What is a Hypothesis?
- A statement predicting the outcome of a study.
Population vs. Sample:
- Population: The entire group about which conclusions are to be drawn.
- Sample: A subset obtained from the population to test hypotheses, as measuring the entire population is generally impossible. Inferences about the population are made based on the sample.
Random Sample (Chapter 2.1):
- Each population member has an equal probability of being selected into a sample.
- The selection of one individual does not influence the likelihood of another being selected.
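As a rough illustration of the definition above, the sketch below draws a simple random sample with Python's standard `random` module. The population of ID numbers, the sample size, and the function name `draw_simple_random_sample` are hypothetical. Sampling here is without replacement, so nobody is chosen twice; when the population is much larger than the sample, this is a close approximation to the independence condition in the second bullet.

```python
import random


def draw_simple_random_sample(population, n, seed=None):
    """Return n distinct members; each member has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: ID numbers 1 through 1000.
population_ids = list(range(1, 1001))
sample_ids = draw_simple_random_sample(population_ids, 50, seed=102)
print(sorted(sample_ids)[:10])
```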
Types of Studies (Chapter 2):
- Surveys: Collect information from an entire population (e.g., the U.S. Census) or, more often, from a sample that is then used to infer population characteristics.
- Comparative Studies:
  - Observational: No intervention, only measurement.
  - Experimental: Evaluate the effect of an intervention.
Hypothesis Test:
- A statistical inference about a population based on a sample.
- If measurements could be made in the entire population, a sample and hypothesis test would be unnecessary.
Null ( $H_0$ ) vs. Alternative ( $H_A$ ) Hypotheses:
- Scientific Hypothesis ( $H_A$ ): The researcher's prediction.
- Null Hypothesis ( $H_0$ ): The "opposite statement" of the scientific hypothesis; this is what is tested statistically.
- Can a null hypothesis be absolutely proven true? No. We can only make an inference based on sample observations, and there's always a chance of error (Type I and II errors).
Null Hypothesis Example 1: Happiness and Income:
- Scientific Hypothesis ( $H_A$ ): Happiness is related to income.
- Null Hypothesis ( $H_0$ ): There is no relationship between happiness and income.
- Scientific Method Outline:
  - Make observations about happiness and money.
  - Ask interesting questions.
  - Develop testable predictions.
  - Gather happiness and income data on study participants.
  - Conduct hypothesis test.
  - Infer about the population based on the observed relationship and develop theories.
- Mathematical Hypotheses (using correlation, $\rho$ ):
  - $H_0: \rho = 0$
  - $H_A: \rho \neq 0$
- Statistical Test Question: "If the null hypothesis were true, how likely would we be to observe, purely by chance, a correlation between income and happiness at least as far from $0$ as the one in our sample?" The answer helps decide whether the scientific hypothesis is plausible.
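One way to make the statistical test question concrete is a permutation test, sketched below. This technique is not named in the notes and is swapped in here only as an illustration: shuffling the happiness scores breaks any link with income, which mimics a world where $H_0: \rho = 0$ holds. The data values, the function names, and the number of shuffles are all hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Adding 1 to both counts includes the observed arrangement itself.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data for 10 participants (made up for illustration only).
income = [22, 35, 41, 48, 55, 63, 70, 82, 95, 110]  # thousands of dollars
happiness = [4, 5, 5, 6, 5, 7, 6, 8, 7, 9]          # 1-10 self-rating

r = pearson_r(income, happiness)
p = permutation_p_value(income, happiness)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

A small p-value means that a correlation this far from $0$ would rarely arise by chance under $H_0$, which is evidence against the null hypothesis rather than proof of the scientific hypothesis.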
Null Hypothesis Example 2: COVID Stay-at-Home Order and Obesity:
- Groups:
 - Group 1: Individuals confined at home from March 2020-March 2021.
 - Group 2: Individuals not confined at home during the same period.
- Scientific Hypothesis: The COVID-related stay-at-home order was a risk factor for obesity.
- Null Hypothesis ( $H_0$ ) (comparing mean change in BMI, $\mu$ ): Mean change in BMI for Group 1 = mean change in BMI for Group 2 ( $H_0: \mu_1 = \mu_2$ ).
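The notes do not say which test would be used for this comparison. As one common option, and purely as an assumption, the sketch below computes Welch's two-sample t statistic, which measures how far apart the two sample means are relative to their combined standard error. The BMI changes and the function name `welch_t` are hypothetical.

```python
import math
import statistics


def welch_t(group1, group2):
    """Welch's t statistic: difference in sample means over its standard error."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    standard_error = math.sqrt(v1 / len(group1) + v2 / len(group2))
    return (m1 - m2) / standard_error


# Hypothetical changes in BMI (made up for illustration only).
confined = [1.2, 0.8, 1.5, 0.3, 0.9, 1.1]        # Group 1
not_confined = [0.2, 0.5, -0.1, 0.4, 0.0, 0.3]   # Group 2

t_stat = welch_t(confined, not_confined)
print(f"Welch t statistic = {t_stat:.2f}")
```

Under $H_0: \mu_1 = \mu_2$ the statistic should be close to $0$. Converting it to a probability requires the t distribution, which is left to statistical software.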
Timing of Biostatistics in Research:
- Biostatistical planning should begin while the hypotheses are being formulated, or immediately afterward, because it guides their refinement and the study design.
- Starting after the data are collected is too late, because biostatistics guides how the data should be collected (what questions to ask, sample size, study type).
- BSPH Capstone Project Advice: Plan statistical tests BEFORE data collection, have clear null and alternative hypotheses, and calculate sample size beforehand.
Summary of Conducting a Hypothesis-Based Research Project:
1. STEP ONE: Establish a Null Hypothesis ( $H_0$ ) about a population characteristic (mean, correlation, proportion).
2. STEP TWO: Measure the characteristic in a random sample. The type of sampling is determined by the statistical approaches to be used.
3. STEP THREE: Conduct the Hypothesis Test. The result indicates how likely the observed sample result would be if the Null Hypothesis were true, and this supports an inference about the population.
Understanding Probability for Likelihood Measurement:
- Probability ranges from $0$ (impossible) to $1$ (certain).
- Warm-up Exercise: Drawing a card from a standard $52$ -card deck:
  - Probability of drawing a club ( $\clubsuit$ ) card: $13/52 = 1/4 = 0.25 = 25\%$ .
  - Probability of drawing a King (K) card: $4/52 = 1/13 \approx 0.077 = 7.7\%$ .
  - Probability of drawing a K or a $2$ card (disjoint events): $4/52 + 4/52 = 8/52 = 2/13 \approx 0.154 = 15.4\%$ .
  - Probability of NOT drawing a K or a $2$ card (complement): $1 - (8/52) = 1 - 2/13 = 11/13 \approx 0.846 = 84.6\%$ .
  - Probability of drawing a K or a club (overlapping events, K $\clubsuit$ is counted twice): $4/52 + 13/52 - 1/52 = 16/52 = 4/13 \approx 0.308 = 30.8\%$ .
  - Probability of NOT drawing a K or a club (complement): $1 - (16/52) = 36/52 = 9/13 \approx 0.692 = 69.2\%$ .
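The warm-up answers above can be reproduced by listing all $52$ cards and counting. This is a minimal sketch that assumes every card is equally likely to be drawn; the names `DECK`, `pr`, `is_club`, `is_king`, and `is_two` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def pr(event):
    """Probability of an event for one draw when all cards are equally likely."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))


def is_club(card):
    return card[1] == "clubs"


def is_king(card):
    return card[0] == "K"


def is_two(card):
    return card[0] == "2"


results = {
    "club": pr(is_club),
    "king": pr(is_king),
    "king or 2": pr(lambda c: is_king(c) or is_two(c)),
    "not (king or 2)": 1 - pr(lambda c: is_king(c) or is_two(c)),
    "king or club": pr(lambda c: is_king(c) or is_club(c)),
    "not (king or club)": 1 - pr(lambda c: is_king(c) or is_club(c)),
}

for name, p in results.items():
    print(f"{name}: {p} = {float(p):.3f}")
```

Because each card is counted once, the king of clubs is not double-counted in "king or club", which matches the subtraction of $1/52$ in the hand calculation.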
Rules of Probability (Formalized Intuition):
1. For any event A, $0 \le \text{Pr(A)} \le 1$ .
2. Null and Full Events:
  - If A always occurs, $\text{Pr(A)} = 1$ .
  - If A never occurs, $\text{Pr(A)} = 0$ .
3. Complement: For any event A, $\text{Pr(not A)} = \text{Pr(A}^c\text{)} = 1 - \text{Pr(A)}$ .
  - Example: Probability of NOT drawing a K card is $1 - 4/52 = 48/52 = 12/13$ .
4. Disjoint Events A and B: (Cannot occur simultaneously)
  - $\text{Pr(A and B)} = 0$
  - $\text{Pr(A or B)} = \text{Pr(A)} + \text{Pr(B)}$
  - Example: Probability of drawing a K or a $2$ card is $4/52 + 4/52 = 8/52$ .
5. For any two events A and B (may overlap):
  - $\text{Pr(A or B)} = \text{Pr(A)} + \text{Pr(B)} - \text{Pr(A and B)}$
  - Example: Probability of drawing a K or a club is $4/52 + 13/52 - 1/52 = 16/52$ .
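Rule 5 follows from Rule 4 once the events are split into disjoint pieces. The short derivation below is a sketch in the notation used above. It writes "B and not A" for the part of B that lies outside A, a label that does not appear elsewhere in these notes.

$$
\begin{aligned}
\text{Pr(A or B)} &= \text{Pr(A)} + \text{Pr(B and not A)} && \text{(A and "B and not A" are disjoint)} \\
\text{Pr(B)} &= \text{Pr(A and B)} + \text{Pr(B and not A)} && \text{("A and B" and "B and not A" are disjoint)} \\
\text{Pr(A or B)} &= \text{Pr(A)} + \text{Pr(B)} - \text{Pr(A and B)} && \text{(substitute the second line into the first)}
\end{aligned}
$$

In the card example, "B and not A" is the $12$ clubs that are not kings, so $\text{Pr(K or club)} = 4/52 + 12/52 = 16/52$, the same answer as before.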
To Do Before the Next Lecture:
- Install SPSS on your computer using Canvas instructions.
- Open "Hypertension.sav" SPSS data file and familiarize yourself with the data.
- Read Chapters 1-4 in the Textbook.
- Sign up for a homework partner on Canvas.
- Review Probability Warm-Up exercises.
Summarizing Categorical Variables (Context: UC San Diego Student-Run Free Clinic Project):
- Hypertension at SDSRFC Study:
  - $N = 409$ subjects diagnosed and treated for hypertension.
  - Goals: Evaluate treatment success for hypertension, determine factors associated with success.
  - Outcome: Treatment success measured by reduction in systolic blood pressure (SBP) and maintaining SBP in normal range.
  - Variables: Patient ID (patid), age, gender, etc.
Introduction to SPSS:
- Install SPSS, access datasets from Canvas, open "Hypertension.sav".
- Follow along with the pre-recorded video in Module Week One > Lecture ONE.
Breakout Room Question: Describe key features of a normal distribution (without looking up the answer).