Lectures 4-10

Lecture Notes - CHS 780 Biostatistics in Public Health

Page 1: Introduction

  • Lecture 4: Distribution

  • Minggen Lu, PhD

  • Date: September 19, 2024


Page 2: Random Variable

  • Definition: A numeric variable that assumes a value based on the outcome of a random experiment.

  • Types of Random Variables:

    • Discrete Random Variable: Takes specific numeric values (often integers).

    • Continuous Random Variable: Can assume any value over an interval or continuum.

  • Examples:

    • Discrete: X = number of heads from 3 coin tosses (Sample space S = {0, 1, 2, 3}).

    • Continuous: X = high temperature in Reno on a summer day (Sample space S = {50 ≤ X ≤ 110}).

  • Notation:

    • Capital letters (X, Y, Z) denote random variables.

    • Lower-case (x, y, z) denote observed values.


Page 3: Probability Distribution

  • Discrete Random Variable X: A function p(x) assigns probabilities for each possible value of X, expressed as p(x) = Pr(X = x).

  • Example: Tossing an unbiased coin three times, with a sample space S = {0, 1, 2, 3}.

    • Probability Distribution:

      • x: 0, p(x): 1/8 (0.125)

      • x: 1, p(x): 3/8 (0.375)

      • x: 2, p(x): 3/8 (0.375)

      • x: 3, p(x): 1/8 (0.125)
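The table above can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming a fair coin and independent tosses; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads.
counts = Counter(seq.count("H") for seq in outcomes)

# p(x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p, float(p))
```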


Page 4: Application of p(x)

  • Events: Let A = event of obtaining 2 heads, B = event of obtaining at least 2 heads.

    • Compute:

      • Pr(A) = p(2) = 0.375

      • Pr(B) = p(2) + p(3) = 0.5

      • Conditional Probability:

        • Pr(A|B) = Pr(A ∩ B) / Pr(B) = Pr(A) / Pr(B) = 0.375 / 0.5 = 0.75
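The calculation above can be checked by treating events as sets of values of X. This is a small sketch that reuses the coin-toss distribution from Page 3; the helper name prob is hypothetical.

```python
from fractions import Fraction

# p(x) for the number of heads in three fair coin tosses (from Page 3).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

A = {2}      # exactly 2 heads
B = {2, 3}   # at least 2 heads


def prob(event):
    """Probability of an event given as a set of values of X."""
    return sum(pmf[x] for x in event)


pr_a = prob(A)
pr_b = prob(B)
# A is a subset of B, so A ∩ B = A and Pr(A ∩ B) = Pr(A).
pr_a_given_b = prob(A & B) / pr_b

print(float(pr_a), float(pr_b), float(pr_a_given_b))
```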


Page 5: Continuous Probability Distribution

  • Continuous probability distribution of random variable X is represented by an unbroken curve (Density Curve).

  • Characteristics:

    • The area under the curve over an interval represents the probability of X assuming a value in that interval.

    • The total area under the curve equals 1.

    • The probability of X assuming any specific value is zero.


Page 6: Example of Continuous Random Variable

  • Example: Let X be the time spent studying each week by a college student.

  • Sample space for X: S = {0 ≤ X ≤ 50}.

  • The density curve is drawn such that the total area equals 1.


Page 7: Mean and Variance

  • When a random variable is repeatedly measured, it leads to observed values.

  • Mean (µ): The long-run average of a large set of measurements on X.

  • Variance (σ²): The long-run variance of a large set of measurements on X; it measures spread around µ.

  • Standard Deviation (σ): The square root of variance.
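For a discrete random variable, these long-run quantities can be computed directly from p(x) using the standard formulas µ = Σ x p(x) and σ² = Σ (x - µ)² p(x), which are not stated on the slide. The sketch below compares them with a simulated large set of measurements for the three-coin-toss example; the seed and the number of draws are arbitrary choices.

```python
import random
import statistics

# p(x) for the number of heads in three fair coin tosses (from Page 3).
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# Theoretical mean, variance, and standard deviation from the distribution.
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = var ** 0.5

# Simulate a large set of measurements on X (seed chosen arbitrarily).
rng = random.Random(0)
draws = [sum(rng.random() < 0.5 for _ in range(3)) for _ in range(100_000)]
sim_mean = statistics.fmean(draws)
sim_var = statistics.pvariance(draws)

print(mu, var, sigma)
print(sim_mean, sim_var)
```

The simulated values should land close to the theoretical ones, and closer still as the number of draws grows.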


Page 8: Bernoulli Distribution

  • Factorials: n! = n(n-1)(n-2)...1, with 0! = 1.

    • Examples:

      • 3! = 6, 6! = 720, 10! = 3,628,800.

  • Selection:

    • Ordering n objects: n!

    • Choosing x from n (0 ≤ x ≤ n): Binomial Coefficient

      • n choose x: C(n, x) = n!/(x!(n-x)!)
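The factorial and binomial coefficient formulas above map directly onto Python's math module. This is a brief sketch; the helper name choose is hypothetical and exists only to mirror the formula.

```python
import math

# Factorials from the examples above.
print(math.factorial(3), math.factorial(6), math.factorial(10))


def choose(n, x):
    """C(n, x) = n! / (x! (n - x)!), written out as in the formula."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))


# math.comb computes the same quantity directly (Python 3.8+).
print(choose(10, 4), math.comb(10, 4))
```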


Page 9: Bernoulli Trials

  • Criteria:

    1. Each trial has 2 possible outcomes (success/failure).

    2. Trials are independent.

    3. Probability of success (p) remains constant.

  • Examples:

    1. Tossing a coin 25 times.

    2. Rolling a die 12 times (even/odd outcomes).

    3. Testing 1000 blood samples for HIV status.


Page 10: Binomial Random Variable

  • Let X count the number of successes in n Bernoulli trials.

  • X is a Binomial Random Variable.

  • Probability Distribution:

    • Binomial Distribution denoted as Bin(n, p).

    • Formula: p(x) = Pr(X = x) = C(n, x) p^x (1-p)^(n-x), where x = 0, 1,.., n.
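The binomial formula can be written as a short function. As a sanity check, Bin(3, 0.5) should reproduce the coin-toss distribution from Page 3. The function name binom_pmf is an illustrative choice.

```python
import math


def binom_pmf(x, n, p):
    """Pr(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


# Number of heads in three fair coin tosses is Bin(3, 0.5).
probs = [binom_pmf(x, 3, 0.5) for x in range(4)]
print(probs)
```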


Page 11: Properties of Binomial Random Variable

  • Mean: µ = np.

  • Variance: σ² = np(1-p).


Page 12: Blood Type Example

  • Probability assignments: Blood Types O, A, B, AB:

    • O: 0.45, A: 0.40, B: 0.11, AB: 0.04.

  • Consider a sample of 10 Americans - let X = number with blood type A:

    • Calculate the mean, standard deviation, and probabilities for specific counts (e.g., exactly 4 with blood type A).
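A sketch of the blood type calculation, assuming the 10 people are sampled independently so that X ~ Bin(10, 0.40):

```python
import math

n, p = 10, 0.40  # sample size and Pr(blood type A)


def binom_pmf(x, n, p):
    """Pr(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


mean = n * p                        # µ = np
sd = math.sqrt(n * p * (1 - p))     # σ = sqrt(np(1 - p))
p_exactly_4 = binom_pmf(4, n, p)    # Pr(X = 4)

print(round(mean, 3), round(sd, 3), round(p_exactly_4, 4))
```

The mean is 4 people, the standard deviation is about 1.55, and the chance of exactly 4 people with type A is about 0.25.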


Page 13: Psychiatric Disorder Example

  • Prevalence: an adult has a 20% chance of suffering from a psychiatric disorder.

  • In a sample of 12 adults, let X = number with a disorder; assuming independence, X follows Bin(12, 0.2).

  • Example Questions:

    a. Probability that 3 to 6 adults have a disorder.

    b. Probability that more than 3 but fewer than 6 have a disorder.

    c. Probability that at least one has a disorder.
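The three questions can be answered by summing binomial probabilities. This sketch assumes X ~ Bin(12, 0.2) and reads "3-6" as inclusive, that is 3 ≤ X ≤ 6; both are interpretations rather than statements from the slide.

```python
import math

n, p = 12, 0.20


def binom_pmf(x, n, p):
    """Pr(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


# a. Pr(3 <= X <= 6)
p_a = sum(binom_pmf(x, n, p) for x in range(3, 7))

# b. Pr(3 < X < 6) = Pr(X = 4) + Pr(X = 5)
p_b = binom_pmf(4, n, p) + binom_pmf(5, n, p)

# c. Pr(X >= 1) = 1 - Pr(X = 0), using the complement rule
p_c = 1 - binom_pmf(0, n, p)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Question c shows why the complement rule is convenient: one term replaces a sum of twelve.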


Page 14: Eye Color Example

  • A couple plans to have 6 children; each child has blue eyes with probability 0.25 and brown eyes with probability 0.75.

  • Calculate the mean and standard deviation of the number of blue-eyed children.

  • Calculate the probability that exactly one child has blue eyes, and the probability that at least one child has blue eyes (or, likewise, at least one has brown eyes).
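A sketch of the eye color calculations, assuming eye colors of the 6 children are independent so that X = number of blue-eyed children follows Bin(6, 0.25). Treating the last question as "at least one blue" and "at least one brown" separately is an interpretation.

```python
import math

n, p_blue = 6, 0.25


def binom_pmf(x, n, p):
    """Pr(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


mean = n * p_blue
sd = math.sqrt(n * p_blue * (1 - p_blue))

p_exactly_one_blue = binom_pmf(1, n, p_blue)
p_at_least_one_blue = 1 - binom_pmf(0, n, p_blue)   # 1 - Pr(no blue)
p_at_least_one_brown = 1 - binom_pmf(n, n, p_blue)  # 1 - Pr(all blue)

print(mean, round(sd, 3))
print(round(p_exactly_one_blue, 4),
      round(p_at_least_one_blue, 4),
      round(p_at_least_one_brown, 4))
```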


Page 15: Poisson Distribution

  • Used for counting occurrences of events over an interval of time or space.

  • Let X count occurrences—X is a Poisson Random Variable.

  • Mean number of occurrences: λ (lambda).

  • Distribution Formula: p(x) = Pr(X = x) = (e^(-λ) λ^x) / x!, where x = 0, 1, 2, ...
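The Poisson formula translates into a one-line function. The sketch below also checks that the probabilities sum to approximately 1; the value λ = 2 is an arbitrary illustration.

```python
import math


def poisson_pmf(x, lam):
    """Pr(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 2.0  # arbitrary example value
total = sum(poisson_pmf(x, lam) for x in range(50))

print(poisson_pmf(0, lam), poisson_pmf(1, lam), total)
```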


Page 16: Characteristics of Poisson Distribution

  • The probability mass function rises to a peak near λ and then decreases as the number of occurrences increases; when λ < 1 it decreases from x = 0 onward.

  • Used to approximate probabilities for binomial distributions when n is large and p is small.
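To see how close the approximation is, the sketch below compares Bin(n, p) with a Poisson distribution whose mean is λ = np. The values n = 1000 and p = 0.003 are hypothetical and chosen only to illustrate "large n, small p".

```python
import math


def binom_pmf(x, n, p):
    """Pr(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Pr(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


n, p = 1000, 0.003   # hypothetical values
lam = n * p          # matching Poisson mean

for x in range(7):
    print(x, round(binom_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))

# Largest pointwise difference over x = 0..10
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
print(max_gap)
```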


Page 17: Properties of a Poisson Random Variable

  • Mean & Variance: Both are equal to λ.

  • With λ = np, Poisson probabilities approximate Bin(n, p) probabilities when n is large and p is small.


Page 18: Example - Emergency Treatment Center

  • Define number of patients arriving at ETC: mean = 4.5 per day.

  • Probability queries for no, at least one, or 4 or 5 patients arriving.
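A sketch of the three queries, assuming daily arrivals follow a Poisson distribution with λ = 4.5:

```python
import math

lam = 4.5  # mean number of patients per day


def poisson_pmf(x, lam):
    """Pr(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


p_none = poisson_pmf(0, lam)                          # no patients
p_at_least_one = 1 - p_none                           # complement rule
p_four_or_five = poisson_pmf(4, lam) + poisson_pmf(5, lam)

print(round(p_none, 4), round(p_at_least_one, 4), round(p_four_or_five, 4))
```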


Page 19: Normal Distribution

  • Normal distribution: most common for continuous variables.

  • Characteristics:

    • Examples include physiological measures (blood pressure, cholesterol).

  • Density Curve Formula: f(x) = (1/(σ√(2π))) e^(-(x-µ)²/(2σ²)), where µ = mean, σ = standard deviation.
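The density formula can be coded directly. In this sketch the values µ = 100 and σ = 15 are hypothetical; the result is cross-checked against statistics.NormalDist from the standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density curve at x."""
    coef = 1 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma**2))


mu, sigma = 100, 15  # hypothetical parameters

# The curve is symmetric about mu and peaks there.
print(normal_pdf(mu, mu, sigma))
print(normal_pdf(mu - 10, mu, sigma), normal_pdf(mu + 10, mu, sigma))

# Cross-check against the standard library implementation.
print(NormalDist(mu, sigma).pdf(110))
```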


Page 20: Properties of Normal Distribution

  • Denoted as N(µ, σ²).

  • Standard Normal Distribution: µ = 0 and σ = 1.

  • Visual representation: bell-shaped curve centered on µ.


Page 21: The Empirical Rule

  • About 68% of area lies within one standard deviation (µ ± σ).

  • About 95% of area lies within two standard deviations (µ ± 2σ).

  • About 99.7% lies within three standard deviations (µ ± 3σ).
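The three percentages can be verified numerically. For a normal distribution, the area within k standard deviations of the mean equals erf(k/√2), a standard identity that is not stated on the slide.

```python
import math

# Area under any normal curve within k standard deviations of the mean.
areas = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")
```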


Page 22: Computing Standard Normal Probabilities

  • Definition of a standard normal variable: if X follows N(µ, σ²), then Z = (X - µ) / σ follows N(0, 1).

  • Area under the standard normal curve between points gives probability.
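A sketch of the standardization workflow: convert the endpoints to Z scores, then take the difference of standard normal cumulative areas. The parameters µ = 120 and σ = 15 and the interval (110, 140) are hypothetical, and the helper name phi is illustrative.

```python
import math
from statistics import NormalDist


def phi(z):
    """Standard normal cumulative area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 120, 15      # hypothetical N(mu, sigma^2)
low, high = 110, 140     # hypothetical interval

z_low = (low - mu) / sigma
z_high = (high - mu) / sigma

# Pr(low < X < high) = Pr(z_low < Z < z_high)
prob = phi(z_high) - phi(z_low)
print(round(z_low, 3), round(z_high, 3), round(prob, 4))

# Cross-check without standardizing, using the standard library.
dist = NormalDist(mu, sigma)
check = dist.cdf(high) - dist.cdf(low)
```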


Page 23: Properties of Standard Normal Distribution

  • Basic probability relationships based on standard normal distribution.

  • General formulas for comparisons of Z scores and their implications in hypothesis testing.


Page 24-31: Example and Applications

  • Various practical examples demonstrating how to compute probabilities, perform hypothesis testing, and analyze confidence intervals using normal and other distributions.


Page 32: Statistical Inference

  • Distinction between estimation and hypothesis testing.

  • Estimation methods including point and interval estimates.


Page 33-34: Key Definitions

  • Key statistical terms including populations, samples, statistics, and parameters outlined.

  • Importance of random sampling emphasized.


Page 35-40: Sampling Distribution and Mean

  • Walking through the concepts of sampling distributions, expected values, and standard errors.
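The idea of a sampling distribution can be illustrated by simulation. The sketch below draws many samples of size n from a hypothetical normal population and compares the spread of the sample means with the standard error σ/√n. The population parameters, sample size, seed, and number of repetitions are all arbitrary.

```python
import math
import random
import statistics

mu, sigma = 50, 10    # hypothetical population mean and standard deviation
n = 25                # sample size
reps = 5000           # number of repeated samples

rng = random.Random(1)
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

theoretical_se = sigma / math.sqrt(n)           # standard error of the mean
observed_se = statistics.stdev(sample_means)    # spread of simulated means
mean_of_means = statistics.fmean(sample_means)  # should be close to mu

print(theoretical_se, observed_se, mean_of_means)
```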


Page 41-58: Hypothesis Testing and Power

  • Steps and methods involved in performing hypothesis tests, including Type I and Type II errors.

  • Power analysis and its significance discussed.


Page 59-82: Examples and Applications

  • Illustrative examples reinforcing concepts of hypothesis testing, including confidence interval calculations.


Page 83-137: Two-Sample and Chi-Square Tests

  • Comparing means through paired and independent samples.

  • Chi-square test fundamentals and applications in testing.


Page 138-156: McNemar’s Test and Practical Examples

  • The application of nonparametric tests in various scenarios, including details on ear infection tests and vaccine evaluation.

Chi-Square Tests

Purpose

  • Assess relationships between categorical variables by comparing observed frequencies to expected frequencies.

Types

  1. Chi-Square Test of Independence

    • Purpose: Determine if there is a significant association between two categorical variables in a contingency table.

    • Hypothesis:

      • Null (H0): There is no association between the variables.

      • Alternative (H1): There is an association.

    • Assumptions:

      • Random sampling

      • Expected frequency should not be less than 5 in more than 20% of cells.

  2. Chi-Square Goodness of Fit Test

    • Purpose: Determine if the sample data matches a population with a specified distribution (e.g., expected proportions of categories).

    • Hypothesis:

      • Null (H0): The observed frequencies match the expected frequencies.

      • Alternative (H1): The observed frequencies do not match the expected frequencies.

Applications

  • Useful in survey data analysis, experimental research, and quality control.
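A sketch of the test of independence on a 2x2 table, using the usual statistic Σ (O - E)² / E with E = (row total × column total) / n. The counts are hypothetical and no continuity correction is applied. The p-value line relies on an identity that holds only for 1 degree of freedom.

```python
import math

# Hypothetical 2x2 contingency table of observed counts.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: Pr(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(chi2, df, round(p_value, 4))
```

Every expected count here is 25, so the expected-frequency assumption is met.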


McNemar's Test

Purpose

  • Determine if there are differences on a dichotomous outcome between two related groups (e.g., before-and-after measurements on the same subjects).

Hypothesis

  • Null (H0): The proportions are the same (no change).

  • Alternative (H1): The proportions are different.

Assumptions

  • Data must be paired (repeated measures) and dichotomous (two possible outcomes).

Applications

  • Commonly used in matched case-control studies and pre-post intervention studies to examine changes in a binary outcome.
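McNemar's test uses only the discordant pairs, that is, subjects whose outcome changed. The sketch below uses the standard statistic (b - c)² / (b + c) without continuity correction, compared with a chi-square distribution with 1 degree of freedom. The formula is not given in these notes, and the counts are hypothetical.

```python
import math

# Hypothetical paired results (before, after):
b = 15   # pairs that changed from "yes" to "no"
c = 5    # pairs that changed from "no" to "yes"
# Concordant pairs (yes/yes and no/no) do not enter the statistic.

statistic = (b - c) ** 2 / (b + c)

# Chi-square with 1 df: Pr(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print(statistic, round(p_value, 4))
```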


Wilcoxon Tests

Purpose

  • Non-parametric alternatives to the T-test for comparing two related or independent samples when normal distribution cannot be assumed.

Types

  1. Wilcoxon Signed-Rank Test

    • Purpose: Compare two related samples or matched observations.

    • Hypothesis:

      • Null (H0): The median difference between pairs is zero.

      • Alternative (H1): The median difference is not zero.

    • Applications: Used for before-and-after scenarios.

  2. Wilcoxon Rank-Sum Test (Mann-Whitney U Test)

    • Purpose: Compare two independent samples.

    • Hypothesis:

      • Null (H0): The distributions of the two populations are equal.

      • Alternative (H1): The distributions of the populations are not equal.

    • Applications: Used when comparing two different groups or populations, especially when sample sizes are small.

Key Considerations

  • Both tests rely on ranking data rather than using raw scores and do not require normality of data distribution.
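To show what "ranking data" means in practice, the sketch below computes the rank-sum statistic W and the equivalent Mann-Whitney U for two small hypothetical samples. Tied values receive the average of their ranks. It stops at the statistic and does not compute a p-value; the helper name midranks is illustrative.

```python
def midranks(values):
    """Return 1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


group1 = [12, 15, 11, 18]        # hypothetical sample 1
group2 = [14, 20, 22, 19, 25]    # hypothetical sample 2

# Rank the pooled data, then sum the ranks belonging to group 1.
ranks = midranks(group1 + group2)
n1 = len(group1)
w1 = sum(ranks[:n1])             # Wilcoxon rank-sum statistic
u1 = w1 - n1 * (n1 + 1) / 2      # equivalent Mann-Whitney U

print(w1, u1)
```

A small U for group 1 means its values mostly rank below those of group 2.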

Two-Sample Tests

  • Purpose: Compare means from two independent samples to determine if they come from populations with the same mean.

  • Types:

    • Independent Samples T-Test: Used when the sample sizes are small (n < 30) or when the population standard deviations are unknown. Assumes normal distribution.

      • Hypothesis: Null (H0): μ1 = μ2 vs. Alternative (H1): μ1 ≠ μ2.

    • Mann-Whitney U Test: Non-parametric alternative when data do not meet the assumptions of normality. Compares the distributions of two groups using ranks; it is often interpreted as a comparison of medians.
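A sketch of the independent samples t statistic using the pooled (equal-variance) form, t = (x̄1 - x̄2) / sqrt(sp² (1/n1 + 1/n2)) with n1 + n2 - 2 degrees of freedom. The formula is the standard one and is not given in these notes; the data are hypothetical. The standard library has no t distribution, so the sketch stops at the statistic, which would then be compared with a t table.

```python
import math
import statistics

group1 = [5.1, 4.9, 5.6, 5.8, 5.0]   # hypothetical sample 1
group2 = [4.2, 4.8, 4.5, 4.9, 4.1]   # hypothetical sample 2

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

t_stat = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(round(t_stat, 3), df)
```

For a two-sided test at the 0.05 level with 8 degrees of freedom, the critical value from a t table is 2.306.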

Key Considerations

  • For two-sample tests, ensure random sampling and check for equal variances when using T-tests.

  • For Chi-Square tests, no more than 20% of the cells in a contingency table should have an expected frequency below 5; otherwise the test may not be valid.
