Statistical Reasoning Lecture 1

Introduction

  • Presenter: John McGready, Ph.D.

  • Institution: Johns Hopkins University, Bloomberg School of Public Health

Need for Biostatistics

  • Importance of biostatistics highlighted by several prominent figures:

    • Hal Varian, Chief Economist at Google, stated in 2009:
      > “I keep saying that the sexy job in the next 10 years will be statisticians.”

    • Harvard Business Review (2012) dubbed Data Scientist as the Sexiest Job of the 21st Century.

    • New York Times (2009) emphasized statistics as a crucial skill for graduates.

Employment Trends in Statistics

  • Forbes (2019) reported that Glassdoor ranked Data Scientist as the leading job in America.

  • Money magazine listed the top 100 jobs in 2021, emphasizing strong demand for statistics-related roles.

Steps in a Research Project

  • Major steps include:

    • Planning/Design of Study

    • Data Collection

    • Data Analysis

    • Presentation

    • Interpretation

  • Statistics plays a role in multiple steps, though its use is often most concentrated in the data analysis phase.

The Ubiquity of Data

  • Data sources:

    • Elmo and apple study: Children chose apples about twice as often when the apples carried Elmo stickers (Cornell University study on children’s preferences).

    • STD testing in DC high schools: About 13% of roughly 3,000 tested students were positive for an STD, mostly gonorrhea or chlamydia.

    • Web-based counseling reduces blood pressure: Participants in a web-based lifestyle counseling group had a larger reduction in systolic blood pressure (10 mmHg) compared to control (6 mmHg).

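    • Vaccine efficacy: 95% efficacy does not mean that 5% of vaccinated people get the disease. It means the risk of disease among vaccinated people was 95% lower than among unvaccinated people, so the figure needs careful statistical interpretation.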
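
  The vaccine efficacy point can be made concrete with a little arithmetic. The sketch below assumes the usual definition of efficacy as one minus the ratio of disease risk in the vaccinated group to risk in the placebo group. The trial counts and the function name `vaccine_efficacy` are hypothetical, invented for illustration and not taken from any study in these notes.

```python
def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_placebo, n_placebo):
    """Efficacy = 1 - (risk in vaccinated / risk in placebo)."""
    risk_vaccinated = cases_vaccinated / n_vaccinated
    risk_placebo = cases_placebo / n_placebo
    return 1 - risk_vaccinated / risk_placebo


# Hypothetical trial counts, invented for illustration only.
cases_vaccinated, n_vaccinated = 10, 20_000
cases_placebo, n_placebo = 200, 20_000

efficacy = vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_placebo, n_placebo)
risk_vaccinated = cases_vaccinated / n_vaccinated

print(f"Efficacy: {efficacy:.0%}")                        # relative reduction in risk
print(f"Risk among vaccinated: {risk_vaccinated:.2%}")    # absolute risk, far below 5%
```

  With these made-up counts the efficacy is 95%, yet only 0.05% of vaccinated subjects became cases, which is why efficacy should not be read as a failure rate.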

Role of Statistics in Research

  • Components of the research process include:

    • Planning/Design:

      • Identify primary questions:

        • Is it about quantifying a single group?

        • Is it comparing multiple groups?

      • Determine sample size:

        • Total subjects?

        • Distribution across groups?

      • Selecting participants:

        • Random selection versus convenience sampling.

        • How subjects are assigned to groups for comparison.

    • Data Collection and Analysis:

      • Summarization of raw data.

      • Addressing variability that obscures patterns.

      • Inference: Using study information to make statements about the population.

    • Presentation and Interpretation:

      • Which measures convey the main messages effectively.

      • Clarifying uncertainty and deriving practical meaning from results.

Course Goals

  • Overview of Skills:

    • Term 1 Goals:

      • Summarization

      • Measurement of Associations

      • Interval Estimation and Statistical Inference

      • Sample Size Considerations in Study Design

    • Term 2 Goals:

      • Adjustment Techniques

      • Assessing Effect Modification (Statistical Interactions)

      • Understanding various regression techniques: linear, logistic, and time-to-event.

Universal Goals in Statistics

  • Focus on:

    • Correct interpretation of statistical results.

    • Summarizing published study results clearly.

    • Evaluating strengths and weaknesses in published research regarding:

      • Study design clarity

      • Research questions

      • Appropriateness of statistical methods

      • Clarity of results reported

      • Overall scientific conclusions.

Defining Populations and Samples

  • Population: Entire group for which data is sought.

    • Example: All 18-year-old male college students in the U.S.

  • Sample: Subset of the population used for data collection.

    • Example: 25 male college students, age 18, in the U.S.

  • Characteristics of a random sample should ideally reflect the overall population, although this alignment is not always achievable.

Random Sampling

  • The best way to obtain a representative sample, though not always feasible.

  • Defined as a method where each possible subset of a given size (n) has an equal chance of selection.
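
  The definition above can be illustrated with a tiny population. This is a minimal sketch using only the Python standard library; the five-member population and the sample size of 2 are hypothetical values chosen so that every possible sample can be listed.

```python
import random
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]  # hypothetical 5-member population
n = 2                                   # sample size

# Enumerate every possible subset of size n. Under simple random sampling,
# each of these subsets has the same chance of being the selected sample.
all_subsets = list(combinations(population, n))
assert len(all_subsets) == comb(len(population), n)

rng = random.Random(0)               # fixed seed so the sketch is reproducible
sample = rng.sample(population, n)   # draws n distinct members without replacement

print(f"{len(all_subsets)} possible samples, each with probability {1 / len(all_subsets):.2f}")
print("Drawn sample:", sample)
```

  For real populations the subsets are far too numerous to list, but the principle is the same: the selection mechanism, not the researcher, decides who is included.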

Comparative Analysis: Population Versus Sample

  • Research focuses on estimating population truths using imperfect sample data.

Examples of Sample vs. Population

  • Pulmonary health research example: 113 men sampled and blood pressure measured.

  • Maternal HIV transmission study: In a sample of 183 births to HIV+ women, the observed transmission rate was 22%.

  • Geographic lung cancer study: Used data from a single year for a selected U.S. state.
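
  To show how a sample estimate relates to a population truth, the maternal HIV transmission figures above (183 births, 22% transmission) can be turned into an approximate interval estimate. This is a sketch that assumes the 183 births behave like a random sample and uses the standard large-sample formula for a proportion; the notes themselves do not report an interval.

```python
from math import sqrt

n_births = 183   # sample size from the notes
p_hat = 0.22     # observed transmission proportion from the notes

# Standard error of a sample proportion (large-sample approximation).
standard_error = sqrt(p_hat * (1 - p_hat) / n_births)

# Approximate 95% interval: estimate plus or minus 1.96 standard errors.
lower = p_hat - 1.96 * standard_error
upper = p_hat + 1.96 * standard_error

print(f"Estimate: {p_hat:.2f}, approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

  Under these assumptions the interval runs from roughly 0.16 to 0.28, which expresses how imprecise a single sample of 183 births is as a guide to the population transmission rate.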

Non-Random Sample Types

  • Non-random samples may be biased because some groups are under-represented or missing. Examples of populations that are hard to sample randomly:

    • Potential voters (including those not registered), when studying voting behavior.

    • Specific risk groups, such as intravenous drug users.

    • Homeless populations.

Implications of Non-Random Sampling

  • Such sampling may not accurately represent population characteristics, potentially skewing findings and interpretations.

Comparison of Study Designs

  • Learning objectives: describe and distinguish randomized cohort, observational cohort, and case-control designs.

  • Understand the analytical challenges of non-randomized comparisons.

Common Study Design Types

  1. Randomized Cohort Studies (Randomized Controlled Trials):

    • Subjects are randomly assigned to an exposure group (for example, treatment or control) and followed to compare outcomes.

  2. Observational Cohort Studies:

    • Subjects are classified by the exposure they already have, rather than one assigned by the investigator, and followed to compare outcomes.

  3. Case-Control Studies:

    • Subjects are selected based on outcome status, and their prior exposure is then assessed.

Importance of Randomization in Experiments

  • Randomization makes the groups similar on average in everything except the exposure, which minimizes systematic differences between groups and reduces bias.

    • Landmark study: The Salk polio vaccine trial involved over 200,000 subjects; results were adjusted for accuracy.
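
  A minimal sketch of random assignment, assuming a simple two-group design with equal group sizes. The subjects, their ages, and the seed are all hypothetical. The point is only that shuffling before splitting tends to balance baseline characteristics between groups.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical subjects with one baseline characteristic (age in years).
subjects = [{"id": i, "age": rng.randint(20, 70)} for i in range(1000)]

# Randomize: shuffle a copy of the subject list, then split it in half.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment, control = shuffled[:half], shuffled[half:]

mean_age_treatment = mean(s["age"] for s in treatment)
mean_age_control = mean(s["age"] for s in control)

print(f"Mean age, treatment: {mean_age_treatment:.1f}")
print(f"Mean age, control:   {mean_age_control:.1f}")
```

  The two mean ages will usually be close but not identical. Randomization balances groups on average, including on characteristics nobody measured, but it does not make any single pair of groups exactly alike.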

Randomization Limitations

  • Randomization is not always possible or ethical, particularly when the exposure carries health risks (for example, researchers cannot assign people to smoke).

Analyzing Observational Studies

  • Subjects self-select their exposures, which introduces bias and makes clear conclusions harder to draw.

    • Example: Smoking and alcohol use are correlated, so an apparent effect of one on an outcome may partly reflect the other.
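
  The smoking and alcohol example can be sketched numerically. The counts below are hypothetical and deliberately constructed so that a disease outcome depends only on smoking, while drinkers are more likely to be smokers. Comparing drinkers with non-drinkers overall then suggests an alcohol effect that disappears once smokers and non-smokers are examined separately.

```python
# Hypothetical counts: (cases, total) by smoking status and alcohol use.
counts = {
    "smoker":     {"drinker": (160, 800), "non_drinker": (40, 200)},
    "non_smoker": {"drinker": (10, 200),  "non_drinker": (40, 800)},
}


def risk(cases, total):
    """Proportion of subjects who developed the outcome."""
    return cases / total


def risk_ratio(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(*exposed) / risk(*unexposed)


# Crude comparison: pool smokers and non-smokers together.
crude_drinker = tuple(sum(counts[s]["drinker"][i] for s in counts) for i in range(2))
crude_non_drinker = tuple(sum(counts[s]["non_drinker"][i] for s in counts) for i in range(2))
crude_rr = risk_ratio(crude_drinker, crude_non_drinker)

# Stratified comparison: drinkers versus non-drinkers within each smoking group.
stratified_rr = {s: risk_ratio(g["drinker"], g["non_drinker"]) for s, g in counts.items()}

print(f"Crude risk ratio for alcohol: {crude_rr:.2f}")
for stratum, rr in stratified_rr.items():
    print(f"Risk ratio among {stratum}s: {rr:.2f}")
```

  Here the crude risk ratio is above 2, while the ratio within each smoking stratum is 1. Real data are rarely this clean, but the pattern shows why unadjusted comparisons in observational studies can mislead.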

Example of Observational Cohort Study: Needle Exchange Programs

  • Estimates the relative risk of HIV infection associated with program participation, adjusting for demographic differences.

Example of HPV Vaccination Study

  • Studied vaccination outcomes by gender, with findings adjusted for health-seeking behaviors and demographics.

Case-Control Studies

  • An efficient alternative to cohort studies for analyzing rare outcomes, such as the association between an exposure and lung cancer.

Challenges with Case-Control Studies

  • Confounding and recall bias can reduce reliability, so analyses should control for potential sources of distortion.
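
  Because a case-control study fixes the number of subjects with and without the outcome, it cannot estimate disease risk directly. A common summary, not named in these notes, is the odds ratio, which compares the odds of prior exposure among cases with the odds among controls. The 2x2 counts below are hypothetical.

```python
# Hypothetical 2x2 table for a case-control study (counts invented for illustration).
cases_exposed, cases_unexposed = 60, 40         # subjects with the outcome
controls_exposed, controls_unexposed = 30, 70   # subjects without the outcome

# Odds of prior exposure within each outcome group.
odds_exposure_cases = cases_exposed / cases_unexposed
odds_exposure_controls = controls_exposed / controls_unexposed

# Odds ratio: how much higher the odds of exposure are among cases.
odds_ratio = odds_exposure_cases / odds_exposure_controls

print(f"Odds ratio: {odds_ratio:.2f}")
```

  With these counts the odds of exposure are 3.5 times higher among cases than among controls. This crude figure ignores confounding and recall bias, which would need to be handled in a real analysis.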

Summary of Study Type Differences

  • Non-randomized studies of both types (observational cohort and case-control) raise issues of bias and confounding that must be addressed, with implications for public health research.

Types of Data in Research

  • Learning objectives focus on categorizing data types effectively.

    • Continuous Data: Measurements that can take any value within a range (e.g., blood pressure).

    • Binary Data: Takes two values (e.g., yes/no).

    • Categorical Data: Extends binary to more than two values, further split into nominal (unordered) and ordinal (ordered) categories.

    • Time-to-Event Data: Captures both whether an event occurs and the time until it occurs (e.g., time to relapse).

Data Analysis Considerations

  • Use tools appropriate to each data type so that statistical assessments are sound.

  • Various comparison techniques are employed based on data format:

    • Continuous: Utilize mean differences and statistical tests like t-tests.

    • Binary/Categorical: Comparison using proportion differences and chi-squared tests.

    • Time-to-Event: Incidence rate ratios and Kaplan-Meier curves for survival analysis.
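
  The comparison measures listed above can be sketched with the standard library alone. All values are hypothetical. The blood pressure reductions were chosen so that the group means match the 10 mmHg and 6 mmHg figures quoted earlier in these notes. The hypothesis tests themselves (t-test, chi-squared) and Kaplan-Meier curves would normally come from a statistics package and are not shown.

```python
from statistics import mean

# Continuous: difference in mean systolic blood pressure reduction (mmHg).
# Hypothetical individual values with group means of 10 and 6.
reduction_web = [12, 8, 11, 9, 10]
reduction_control = [7, 5, 6, 4, 8]
mean_difference = mean(reduction_web) - mean(reduction_control)

# Binary: difference in proportions between two hypothetical groups.
events_a, n_a = 30, 100
events_b, n_b = 18, 100
proportion_difference = events_a / n_a - events_b / n_b

# Time-to-event: incidence rate = events / person-time at risk;
# the incidence rate ratio compares the two groups' rates.
events_exposed, person_years_exposed = 20, 400.0
events_unexposed, person_years_unexposed = 10, 500.0
incidence_rate_ratio = (events_exposed / person_years_exposed) / (
    events_unexposed / person_years_unexposed
)

print(f"Mean difference: {mean_difference} mmHg")
print(f"Difference in proportions: {proportion_difference:.2f}")
print(f"Incidence rate ratio: {incidence_rate_ratio:.2f}")
```

  Each measure matches its data type: means for continuous values, proportions for binary outcomes, and rates per unit of follow-up time for time-to-event data.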

Summary of Analyzed Data Types

  • Three key data types covered:

    • Continuous, Binary/Categorical, Time-to-Event

    • Each type calls for different methods of summarization and analysis.