Lecture 4: Distribution
Minggen Lu, PhD
Date: September 19, 2024
Random Variable
Definition: A numeric variable whose value is determined by the outcome of a random experiment.
Types of Random Variables:
Discrete Random Variable: Takes specific numeric values (often integers).
Continuous Random Variable: Can assume any value over an interval or continuum.
Examples:
Discrete: X = number of heads from 3 coin tosses (Sample space S = {0, 1, 2, 3}).
Continuous: X = high temperature in Reno on a summer day (Sample space S = {50 ≤ X ≤ 110}).
Notation:
Capital letters (X, Y, Z) denote random variables.
Lower-case (x, y, z) denote observed values.
Probability distribution of a discrete random variable X: a function p(x), called the probability mass function, assigns a probability to each possible value of X, expressed as p(x) = Pr(X = x).
Example: Toss an unbiased coin three times and let X = number of heads, so the possible values of X are S = {0, 1, 2, 3}.
Probability Distribution:
x: 0, p(x): 1/8 (0.125)
x: 1, p(x): 3/8 (0.375)
x: 2, p(x): 3/8 (0.375)
x: 3, p(x): 1/8 (0.125)
Events: Let A = event of obtaining 2 heads, B = event of obtaining at least 2 heads.
Compute:
Pr(A) = p(2) = 0.375
Pr(B) = p(2) + p(3) = 0.5
Conditional Probability:
Pr(A|B) = Pr(A ∩ B) / Pr(B); since A is a subset of B, A ∩ B = A, so Pr(A|B) = Pr(A) / Pr(B) = 0.375 / 0.5 = 0.75
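As a quick check, here is a minimal Python sketch (variable names are illustrative) that rebuilds this probability distribution with math.comb and reproduces Pr(A), Pr(B), and Pr(A|B):

```python
from math import comb

# p(x) = Pr(X = x) for X = number of heads in 3 tosses of a fair coin
p = {x: comb(3, x) * 0.5 ** 3 for x in range(4)}  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

pr_A = p[2]                 # A: exactly 2 heads
pr_B = p[2] + p[3]          # B: at least 2 heads
pr_A_given_B = pr_A / pr_B  # A is a subset of B, so Pr(A and B) = Pr(A)

print(pr_A, pr_B, pr_A_given_B)  # 0.375 0.5 0.75
```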
The probability distribution of a continuous random variable X is represented by an unbroken curve, called the density curve.
Characteristics:
The area under the curve over an interval represents the probability of X assuming a value in that interval.
The total area under the curve equals 1.
The probability of X assuming any specific value is zero.
Example: Let X be the time spent studying each week by a college student.
Sample space for X: S = {0 ≤ X ≤ 50}.
The density curve is drawn such that the total area equals 1.
When a random variable is repeatedly measured, the long-run behavior of the observed values is summarized by its mean and variance.
Mean (µ): the average of the values of X over a large number of measurements; for a discrete X, µ = E(X) = Σ x · p(x).
Variance (σ²): the average squared deviation of X from its mean, σ² = E[(X − µ)²].
Standard Deviation (σ): The square root of the variance.
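For a discrete random variable these quantities can be computed directly from p(x); a minimal sketch using the coin-toss distribution above:

```python
# Mean, variance, and standard deviation of X = number of heads in 3 fair coin tosses,
# computed directly from the probability mass function p(x).
p = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mu = sum(x * px for x, px in p.items())               # E(X) = 1.5
var = sum((x - mu) ** 2 * px for x, px in p.items())  # E[(X - mu)^2] = 0.75
sd = var ** 0.5                                       # about 0.866

print(mu, var, sd)
```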
Factorials: n! = n(n-1)(n-2)...1, with 0! = 1.
Examples:
3! = 6, 6! = 720, 10! = 3,628,800.
Selection (counting rules):
Number of ways to order n distinct objects: n!
Number of ways to choose x objects from n (0 ≤ x ≤ n), ignoring order: the binomial coefficient
n choose x: C(n, x) = n!/(x!(n−x)!)
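A minimal Python sketch of these counting facts, using only the standard-library math module:

```python
from math import comb, factorial

print(factorial(3), factorial(6), factorial(10))  # 6 720 3628800

# Number of ways to choose x = 2 objects out of n = 4, ignoring order:
# C(4, 2) = 4! / (2! * 2!) = 6
print(comb(4, 2))                                     # 6
print(factorial(4) // (factorial(2) * factorial(2)))  # same value, from the formula
```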
Bernoulli Trials
Criteria:
Each trial has 2 possible outcomes (success/failure).
Trials are independent.
Probability of success (p) remains constant.
Examples:
Tossing a coin 25 times.
Rolling a die 12 times (even/odd outcomes).
Testing 1000 blood samples for HIV status.
Let X count the number of successes in n Bernoulli trials.
X is a Binomial Random Variable.
Probability Distribution:
Binomial Distribution denoted as Bin(n, p).
Formula: p(x) = Pr(X = x) = C(n, x) p^x (1−p)^(n−x), where x = 0, 1, ..., n.
Mean: µ = np.
Variance: σ² = np(1-p).
Probability assignments: Blood Types O, A, B, AB:
O: 0.45, A: 0.40, B: 0.10, AB: 0.04.
Consider a random sample of 10 Americans and let X = number with blood type A, so X ~ Bin(10, 0.40).
Calculate the mean and standard deviation of X, and probabilities for specific counts (e.g., exactly 4 with blood type A).
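A hedged sketch of this calculation with scipy.stats.binom (assuming SciPy is available); the values n = 10 and p = 0.40 come from the example above:

```python
from scipy.stats import binom

n, p = 10, 0.40          # sample of 10 Americans, Pr(blood type A) = 0.40

mean = binom.mean(n, p)  # np = 4.0
sd = binom.std(n, p)     # sqrt(n p (1 - p)), about 1.55

pr_exactly_4 = binom.pmf(4, n, p)  # Pr(X = 4), about 0.25

print(mean, sd, pr_exactly_4)
```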
Prevalence: suppose there is a 20% chance that a randomly selected adult suffers from a psychiatric disorder.
Consider a sample of 12 adults and let X = number with the disorder, so X ~ Bin(12, 0.20).
Example questions: a. Probability that between 3 and 6 (inclusive) have a disorder. b. Probability that more than 3 but fewer than 6 have a disorder. c. Probability that at least one has a disorder.
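The sketch below answers (a) through (c) with scipy.stats.binom (SciPy assumed available), reading "between 3 and 6" as inclusive:

```python
from scipy.stats import binom

n, p = 12, 0.20  # 12 adults, Pr(disorder) = 0.20

pr_a = binom.cdf(6, n, p) - binom.cdf(2, n, p)  # (a) Pr(3 <= X <= 6)
pr_b = binom.pmf(4, n, p) + binom.pmf(5, n, p)  # (b) Pr(3 < X < 6) = Pr(X = 4) + Pr(X = 5)
pr_c = 1 - binom.pmf(0, n, p)                   # (c) Pr(X >= 1) = 1 - Pr(X = 0)

print(pr_a, pr_b, pr_c)
```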
A couple plans to have 6 children; for each child, the probability of blue eyes is 0.25 and of brown eyes is 0.75.
Let X = number of blue-eyed children, so X ~ Bin(6, 0.25); calculate the mean and standard deviation of X and probabilities for various outcomes.
For example, the probability of exactly one child with blue eyes, and of at least one child with blue eyes or brown eyes.
Poisson Distribution: used for counting occurrences of events over an interval of time or space.
Let X count occurrences—X is a Poisson Random Variable.
Mean number of occurrences: λ (lambda).
Distribution Formula: p(x) = Pr(X = x) = (e^(-λ) λ^x) / x!, where x = 0, 1, 2, ...
The probability mass function rises to a peak near x = λ and then decreases as the number of occurrences increases.
Used to approximate probabilities for binomial distributions when n is large and p is small (take λ = np).
Mean & Variance: Both are equal to λ.
Example: let X = number of patients arriving at the ETC per day, with mean λ = 4.5 patients per day.
Compute the probability that no patients, at least one patient, and exactly 4 or 5 patients arrive in a day.
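A minimal sketch of these Poisson probabilities with scipy.stats.poisson (SciPy assumed available), using λ = 4.5 from the example:

```python
from scipy.stats import poisson

lam = 4.5  # mean number of patients arriving per day

pr_none = poisson.pmf(0, lam)                          # Pr(X = 0) = e^(-4.5), about 0.011
pr_at_least_one = 1 - pr_none                          # Pr(X >= 1), about 0.989
pr_4_or_5 = poisson.pmf(4, lam) + poisson.pmf(5, lam)  # Pr(X = 4 or X = 5)

print(pr_none, pr_at_least_one, pr_4_or_5)
```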
Normal distribution: most common for continuous variables.
Characteristics:
Many physiological measures (e.g., blood pressure, serum cholesterol) are approximately normally distributed.
Density Curve Formula: f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)), where µ = mean and σ = standard deviation.
Denoted as N(µ, σ²).
Standard Normal Distribution: µ = 0 and σ = 1.
Visual representation: bell-shaped curve centered on µ.
About 68% of area lies within one standard deviation (µ ± σ).
About 95% of area lies within two standard deviations (µ ± 2σ).
About 99.7% lies within three standard deviations (µ ± 3σ).
Definition of a standard normal variable: if X ~ N(µ, σ²), then Z = (X − µ)/σ follows the standard normal distribution N(0, 1).
Area under the standard normal curve between points gives probability.
Basic probability relationships: for X ~ N(µ, σ²), Pr(a ≤ X ≤ b) = Pr((a − µ)/σ ≤ Z ≤ (b − µ)/σ), so standard normal areas give probabilities for any normal variable.
These Z-score calculations underlie the comparisons used later in hypothesis testing.
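A short sketch with scipy.stats.norm that verifies the 68-95-99.7 rule and illustrates standardization; the blood-pressure mean and standard deviation (120 and 10) are hypothetical values chosen only for illustration:

```python
from scipy.stats import norm

# Empirical rule: area within 1, 2, and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))  # about 0.683, 0.954, 0.997

# Standardization with hypothetical numbers: suppose X ~ N(120, 10^2) (systolic blood pressure)
mu, sigma = 120, 10
x = 135
z = (x - mu) / sigma                     # Z = (X - mu) / sigma = 1.5
print(norm.cdf(z))                       # Pr(X <= 135) = Pr(Z <= 1.5), about 0.933
print(norm.cdf(x, loc=mu, scale=sigma))  # same probability without standardizing by hand
```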
Various practical examples demonstrating how to compute probabilities, perform hypothesis testing, and analyze confidence intervals using normal and other distributions.
Distinction between estimation and hypothesis testing.
Estimation methods including point and interval estimates.
Key statistical terms including populations, samples, statistics, and parameters outlined.
Importance of random sampling emphasized.
Walking through the concepts of sampling distributions, expected values, and standard errors.
Steps and methods involved in performing hypothesis tests, including Type I and Type II errors.
Power analysis and its significance discussed.
Illustrative examples reinforcing concepts of hypothesis testing, including confidence interval calculations.
Comparing means through paired and independent samples.
Chi-square test fundamentals and applications in testing.
The application of nonparametric tests in various scenarios, including details on ear infection tests and vaccine evaluation.
Chi-Square Tests
Purpose: Assess relationships between categorical variables by comparing observed frequencies to expected frequencies.
Types:
Chi-Square Test of Independence
Purpose: Determine if there is a significant association between two categorical variables in a contingency table.
Hypothesis:
Null (H0): There is no association between the variables.
Alternative (H1): There is an association.
Assumptions:
Random sampling
Expected frequency should not be less than 5 in more than 20% of cells.
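A minimal sketch of the test of independence using scipy.stats.chi2_contingency; the 2x2 table counts are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: exposure group (rows) by outcome (columns)
observed = np.array([[30, 20],
                     [15, 35]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value, dof)
print(expected)  # check that expected counts satisfy the "not less than 5" rule
```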
Chi-Square Goodness of Fit Test
Purpose: Determine if the sample data matches a population with a specified distribution (e.g., expected proportions of categories).
Hypothesis:
Null (H0): The observed frequencies match the expected frequencies.
Alternative (H1): The observed frequencies do not match the expected frequencies.
Useful in survey data analysis, experimental research, and quality control.
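A minimal goodness-of-fit sketch with scipy.stats.chisquare, using a hypothetical set of 120 die rolls (the die example mentioned earlier) tested against a fair-die distribution:

```python
from scipy.stats import chisquare

# Hypothetical goodness-of-fit check: are 120 rolls consistent with a fair die?
observed = [25, 18, 22, 17, 20, 18]  # observed counts for faces 1 through 6
expected = [120 / 6] * 6             # 20 expected per face under H0

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)
```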
McNemar's Test
Purpose: Determine if there are differences on a dichotomous outcome between two related groups (e.g., before-and-after measurements on the same subjects).
Hypothesis:
Null (H0): The proportions are the same (no change).
Alternative (H1): The proportions are different.
Assumptions: Data must be paired (repeated measures) and dichotomous (two possible outcomes).
Applications: Commonly used in case-control studies and pre-post intervention studies to examine changes in a binary outcome.
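A sketch of this paired-proportions comparison using statsmodels (assumed installed), with hypothetical before/after counts:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired (before/after) binary outcomes for 100 subjects:
# rows = outcome before, columns = outcome after
table = np.array([[40, 10],   # "yes" before: 40 still "yes", 10 switch to "no"
                  [25, 25]])  # "no" before: 25 switch to "yes", 25 still "no"

result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(result.statistic, result.pvalue)
```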
Wilcoxon Tests
Purpose: Non-parametric alternatives to the T-test for comparing two related or independent samples when a normal distribution cannot be assumed.
Types:
Wilcoxon Signed-Rank Test
Purpose: Compare two related samples or matched observations.
Hypothesis:
Null (H0): The median difference between pairs is zero.
Alternative (H1): The median difference is not zero.
Applications: Used for before-and-after scenarios.
Wilcoxon Rank-Sum Test (Mann-Whitney U Test)
Purpose: Compare two independent samples.
Hypothesis:
Null (H0): The distributions of the two populations are equal.
Alternative (H1): The distributions of the populations are not equal.
Applications: Used when comparing two different groups or populations, especially when sample sizes are small.
Both tests rely on ranking data rather than using raw scores and do not require normality of data distribution.
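A sketch of both tests with scipy.stats; the before/after measurements and the two independent groups are hypothetical data used only for illustration:

```python
from scipy.stats import mannwhitneyu, wilcoxon

# Wilcoxon signed-rank test: hypothetical before/after measurements on the same 8 subjects
before = [125, 130, 118, 140, 135, 128, 122, 132]
after = [120, 126, 119, 133, 130, 127, 118, 125]
stat_sr, p_sr = wilcoxon(before, after)
print(stat_sr, p_sr)

# Wilcoxon rank-sum (Mann-Whitney U) test: hypothetical data from two independent groups
group1 = [12, 15, 14, 10, 13, 16]
group2 = [18, 17, 20, 15, 19, 21]
stat_u, p_u = mannwhitneyu(group1, group2)
print(stat_u, p_u)
```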
Two-Sample Tests
Purpose: Compare means from two independent samples to determine if they come from populations with the same mean.
Types:
Independent Samples T-Test: Used when the sample sizes are small (n < 30) or when the population standard deviations are unknown. Assumes normal distribution.
Hypothesis: Null (H0): μ1 = μ2 vs. Alternative (H1): μ1 ≠ μ2.
Mann-Whitney U Test: Non-parametric alternative when data do not meet the assumptions of normality. Compares the medians of two groups.
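A minimal sketch of the independent samples T-test with scipy.stats.ttest_ind and hypothetical measurements; the non-parametric alternative (Mann-Whitney U) is sketched in the Wilcoxon section above:

```python
from scipy.stats import ttest_ind

# Hypothetical cholesterol measurements from two independent groups
group1 = [200, 215, 190, 205, 220, 198, 210]
group2 = [185, 192, 180, 200, 178, 188, 195]

# H0: mu1 = mu2 vs. H1: mu1 != mu2, assuming equal population variances
t_stat, p_value = ttest_ind(group1, group2, equal_var=True)
print(t_stat, p_value)
```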
For two-sample tests, ensure random sampling and check for equal variances when using T-tests.
For Chi-Square tests, ensure that expected frequencies are not less than 5 in more than 20% of the cells of the contingency table, to maintain validity.