Hypothesis Testing
Inference from Samples to Populations
- The lecture focuses on statistical inference, which involves drawing conclusions about a population based on data from a sample.
Hypothesis Testing
- Definition: Hypothesis testing is a method used to make decisions or inferences about a population based on sample data.
- Null Hypothesis: A statement of no effect or no difference. It's the hypothesis that researchers try to disprove.
- Alternative Hypothesis: A statement that contradicts the null hypothesis, suggesting there is a difference or effect.
- Type-I Error: Rejecting the null hypothesis when it is actually true (false positive). Denoted by α.
- Type-II Error: Failing to reject the null hypothesis when it is actually false (false negative). Denoted by β.
- P-values: The probability of obtaining results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true. A small p-value suggests evidence against the null hypothesis.
- Confidence Intervals: A range of values within which the true population parameter is likely to fall.
- Single Mean Test: A statistical test used to determine whether a sample mean is significantly different from a known or hypothesized population mean.
- Single Proportion Test: A statistical test used to determine whether a sample proportion is significantly different from a known or hypothesized population proportion.
STATA Demonstration
- Demonstration of STATA commands for data analysis, including:
- Load the “HospAdmNeu.dta” data into STATA.
- summarize Ordinary1213 (summary measures of ordinary admissions)
- summarize Ordinary1213, detail (detailed summary of ordinary admissions)
- graph box Ordinary1213 (boxplot to check for outliers)
- scatter Ordinary1213 Daycase1213 (scatter plot between two variables)
- hist Ordinary1213 (histogram of ordinary admissions)
- hist Ordinary1213, bin(20) (change the number of bins for a better representation)
- hist Ordinary1213, bin(20) normal (overlay a normal curve to check normality)
- Homework: Repeat all commands for day-case hospital admissions.
Statistical Inference: The Big Picture
- Real World: Data is collected from the real world.
- Theoretical World: Scientific and statistical models are used to represent the data.
- Sample: A subset of the population from which data is collected.
- Population: The entire group of individuals or objects of interest.
- Conclusion: Inferences are made from the sample to the population.
Populations and Samples
- A population is a collection of objects, people, or events of interest.
- Data collection on the entire population is often impractical.
- A sample is a subset of the population used to infer information about the population.
- Samples are not of interest in their own right but for what they reveal about the population.
Example: Babies Birth Weight and Mum’s Smoking
- Study in Newham, London, investigating causes of low birth weight in babies born in 2016.
- Data collected from 1000 babies in Newham hospital database.
- Dataset includes variables like baby ID, birth weight (bwt), gestational age (gest), mother’s age (mat_age), and number of cigarettes smoked per week (cigs).
Statistical Inference
- Summary statistics (mean, percentiles, SD) are calculated from the sample.
- These statistics are used to infer characteristics about the total population.
- The goal is to understand what sample statistics tell us about the theoretical distributions of the population.
Random Samples
- Research Question: Average weight of babies and reasons for underweight babies.
- Theoretical Population: Defined before data collection to understand the generalizability of inferences.
- Examples: all babies (current and future), babies born in the UK between 1990 and 2000.
- Average Baby Weight: Around 7 pounds (3.2 kg) for females and 7 pounds 5 ounces (3.3 kg) for males.
- Reasons for Underweight Babies: Premature birth, Intrauterine Growth Restriction (IUGR), infections during pregnancy, inadequate weight gain, smoking, alcohol or drug use, and maternal age.
Random Samples (contd.)
- The sample is a subset of the population and needs to be representative.
- Random Sampling: Each individual in the population has an equal chance of being included, and the inclusion of one individual does not affect the inclusion of another.
- Opportunistic Sampling: Also known as convenience sampling, involves selecting participants based on availability (e.g., recruiting participants from a local support group for a rare neurological disorder like Stiff Person Syndrome).
Stratified Random Sampling
- Example: Global Adult Tobacco Survey (GATS) in Bangladesh.
- Overall prevalence of tobacco smoking:
- 2009: 23.00% (95% CI 22.98 to 23.00)
- 2017: 16.44% (95% CI 16.43 to 16.45)
- Methodology: Two-stage stratified sampling.
- First stage: Eight administrative divisions were created, stratified by urban and rural Enumeration Areas (EAs).
- Second stage: 30 households were systematically selected from each sampled PSU (EA).
- One participant was randomly picked from all eligible men and women in a participating household.
Statistical Inference: From Samples to Populations
- Assessing the accuracy of sample statistics in estimating population parameters.
- Repeatedly choosing samples from the same population results in different values for a statistic (e.g., the mean).
- Uncertainty associated with the estimate needs to be assessed.
Sampling Variability and Standard Errors
- The standard deviation of the sampling distribution of the mean, σ/√n, measures the typical error between the sample mean and the population mean.
- σ/√n quantifies the accuracy of the sample mean as an estimate of the population mean and is known as the standard error (SE) of the mean.
- Since the population standard deviation (σ) is unknown, the sample standard deviation (SD) is used in its place, giving SE = SD/√n.
Standard Deviation (SD) vs Standard Error (SE)
- Standard Deviation (SD): Measures how a typical observation in the sample differs from the sample mean.
- Standard Error (SE): Quantifies the typical error between the mean measured in a sample and the theoretical mean in the population.
- SD measures variability in the population or sample.
- SE (= SD/√n) measures variability in the sample means.
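The distinction can be illustrated with a short calculation. The sketch below uses a small, made-up sample of birth weights (illustrative values only, not from the Newham dataset):

```python
import math

# Hypothetical sample of birth weights in grams (illustrative values)
weights = [3100, 3400, 2900, 3600, 3250, 3050, 3500, 3300]

n = len(weights)
mean = sum(weights) / n
# Sample standard deviation (divides by n - 1): typical spread of observations
sd = math.sqrt(sum((w - mean) ** 2 for w in weights) / (n - 1))
# Standard error of the mean: SE = SD / sqrt(n) — typical error of the
# sample mean as an estimate of the population mean
se = sd / math.sqrt(n)

print(round(mean, 1), round(sd, 1), round(se, 1))
```

Note that SE shrinks as n grows (by a factor of √n), while SD does not: more data gives a more precise mean, not less variable individuals.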
Importance of Normal Distribution in Medical Research
- Central Tendency and Variability: Helps in understanding the mean and standard deviation of medical data.
- Statistical Inference: Many tests and confidence intervals assume normality (e.g., t-tests and ANOVA).
- Predictive Modeling: Used in risk assessment and predictive modeling.
- Quality Control and Standardization: Used to monitor and maintain consistency of medical tests and procedures.
Sampling Distributions
- The frequency distribution of the sample means is called the sampling distribution of the mean.
- If the population distribution is Normal, the distribution of the sample mean over repeated samples is also Normal.
- The variation in the sample means depends on the variance of the population (σ²) and the sample size n.
- For large samples (n>30), the distribution of the sample mean is approximately normal, regardless of the population distribution.
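This large-sample behaviour (the central limit theorem) can be checked by simulation. The sketch below draws repeated samples from a clearly non-normal (exponential) population; the distribution and sample sizes are chosen purely for illustration:

```python
import random
import statistics

random.seed(1)

# Draw repeated samples of size n from an exponential population
# (mean 1.0, sd 1.0) and record each sample mean
n, reps = 50, 2000
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# Even though the population is skewed, the sample means cluster around
# the population mean (1.0) with spread close to sigma/sqrt(n) = 1/sqrt(50) ~ 0.14
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

A histogram of `sample_means` would look approximately normal, which is what justifies using normal-theory tests for large samples regardless of the population's shape.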
Hypothesis Testing
- Null and alternative hypotheses depend on the type of investigation.
- To see if there is a difference between two procedures:
- Null Hypothesis: No difference.
- Alternative Hypothesis: There is a difference.
- To find out if a bold claim is true:
- Null Hypothesis: There is no difference.
- Alternative Hypothesis: The claim is true (drug A is better/worse than drug B).
Task-1: State Null Hypothesis
- In a criminal court, the accused is assumed innocent unless proven guilty.
- Null Hypothesis: The accused is innocent.
Assumptions for Hypothesis Testing
- My sample(s):
- Is representative
- Is independent
- Has homogeneous variance
- Is normal
- Assumptions 1 and 2 are usually considered automatically met.
- Assumptions 3 and 4 need to be tested using appropriate tools & techniques.
Types of Data
- Quantitative:
- Continuous (e.g., blood pressure, age).
- Discrete (e.g., number of children, number of cigarettes per day).
- Categorical:
- Ordinal (ordered categories, e.g., grade of breast cancer, disease severity).
- Nominal (unordered categories, e.g., sex, ethnicity).
Statistical Tests for Continuous Data
- If sample size ≥ 30 and assumptions are met, use the normal (z) distribution.
- If sample size < 30 and assumptions are met, use the t-distribution.
- If assumptions are not met, transform the variable and repeat step 1 or 2.
- If none of the assumptions can be met, use non-parametric tests.
- Parametric tests are generally more powerful.
Appropriate Tests for Continuous Data
- One-Sample:
- t-test
- Sign test
- Paired (2 groups):
- Paired t-test
- Wilcoxon signed rank test
- Sign test
- Independent (2 groups):
- Unpaired t-test
- Wilcoxon rank sum test
- Independent (>2 groups):
- One-way ANOVA
- Kruskal-Wallis ANOVA
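As a sketch of the simplest entry in the table, the one-sample t statistic can be computed directly from SE. The readings and hypothesized mean below are made up for illustration:

```python
import math
import statistics

# Hypothetical diastolic BP readings; test H0: mu = 80 (assumed value)
sample = [78, 85, 82, 88, 76, 90, 84, 79, 86, 81]
mu0 = 80

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)       # sample SD (n - 1 denominator)
se = sd / math.sqrt(n)              # standard error of the mean

# One-sample t statistic: how many standard errors the sample mean
# lies from the hypothesized mean; compare with t tables on n - 1 df
t = (mean - mu0) / se
print(round(t, 2))
```

The resulting t is then compared with the t-distribution on n − 1 degrees of freedom to obtain a p-value.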
Categorical Data
- Categorical covariate data are often called factors
- Categorical data that take on only two distinct values are said to be dichotomous or binary
- Categorical data are often coded using numerical values (e.g. 0 = No, 1 = Yes); statistical packages usually treat numeric data as quantitative unless you explicitly declare it to be categorical.
- The limiting factor for any continuous observation is the accuracy of the measurement instrument.
Appropriate Tests for Categorical Data
- 1 group:
- z test for a proportion
- Sign test
- Paired (2 categories):
- McNemar's test
- Independent (2 groups):
- Chi-squared test
- Fisher's exact test
- Independent (>2 groups):
- Chi-squared test
- Chi-squared trend test
- >2 Categories:
- Chi-squared test
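The chi-squared statistic behind most of these tests can be computed by hand for a 2×2 table. The counts below are invented for illustration:

```python
# Hypothetical 2x2 table: exposure (rows) vs outcome (columns)
observed = [[30, 70],
            [20, 80]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

# Chi-squared statistic: sum of (O - E)^2 / E over all cells,
# where each expected count E = row total * column total / grand total
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(2) for j in range(2)
)
print(round(chi2, 2))
```

For a 2×2 table the statistic has (2 − 1)(2 − 1) = 1 degree of freedom, so it is compared with the 5% critical value 3.84; here 2.67 < 3.84, so there is no evidence against H0 at the 5% level.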
Type I and Type II Errors
| Decision | H0 is true | H0 is false |
|---|---|---|
| Do not reject H0 | Correct decision | Type II error (β) |
| Reject H0 | Type I error (α) | Correct Decision (1-β) |
P-value and Confidence Interval (CI)
- P-value: Measures the strength of evidence against the null hypothesis.
- P > 0.10: No evidence against null hypothesis.
- 0.05 < P < 0.10: Weak evidence against null hypothesis.
- 0.01 < P < 0.05: Moderate evidence against null hypothesis.
- 0.001 < P < 0.01: Strong evidence against null hypothesis.
- P < 0.001: Very strong evidence against null hypothesis.
- Confidence Interval (CI): Range of values within which the true population value is likely to be found.
- If the 95% CI for the difference in means excludes 0, there is evidence against the null hypothesis at the 5% level.
Confidence Interval for a Single Population Mean
- A range of likely values for the parameter.
- To construct a confidence interval for the population mean, make use of the following pieces of information:
- Sample mean
- Standard Error of the mean
- 95% of sample means lie within 1.96 SE above or below the population mean
Confidence Interval (contd.)
- The multiplier 1.96 is often approximated as 2, giving the rule of thumb: 95% CI ≈ sample mean ± 2 × SE.
- There is a 95% probability that an interval constructed this way contains the unknown but true value of the population mean.
Example: Finding Single Mean and 95% CI
- Baby weight study with 1000 babies in Newham hospital.
- Random sample of 1000 babies chosen, 997 considered after deleting unusual weights.
- Mean weight (x̄) = 3305 gm; SD = 505 gm.
- SE (mean) = SD/√n = 505/√997 ≈ 16 gm.
- 95% CI for mean = x̄ − 2 × SE to x̄ + 2 × SE
- = 3305 − 2 × 16 to 3305 + 2 × 16
- = 3273 to 3337 gm
- We are 95% confident that the true mean birth weight is between 3273 and 3337 gm.
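The worked example above can be reproduced in a few lines (using the slide's approximation of 1.96 ≈ 2):

```python
import math

# Numbers from the worked example: n = 997 babies, mean 3305 g, SD 505 g
n, mean, sd = 997, 3305, 505

se = sd / math.sqrt(n)          # standard error of the mean = SD / sqrt(n)
lower = mean - 2 * se           # 1.96 approximated as 2, as on the slide
upper = mean + 2 * se
print(round(se), round(lower), round(upper))
```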
Single Proportion
- If, out of a total sample of size n, only d individuals experience some event, the proportion is estimated as p̂ = d/n, with standard error SE(p̂) = √(p̂(1 − p̂)/n).
Example (vaccination):
- In a trial of a new vaccine, 20 out of 1000 children vaccinated showed signs of an adverse reaction:
- p̂ = 20/1000 = 0.02, thus advising parents that the vaccine is associated with an estimated 2% risk of adverse reaction.
Single Proportion and its 95% CI – Vaccination Example
- Step 1: Calculate the proportion: p̂ = 20/1000 = 0.02
- Step 2: Calculate the standard error: SE(p̂) = √(0.02 × 0.98/1000) ≈ 0.0044
- Step 3: Calculate the 95% confidence interval: p̂ − 1.96 × SE to p̂ + 1.96 × SE
- = 0.02 − 1.96 × 0.0044 to 0.02 + 1.96 × 0.0044
- = 0.011 to 0.029 = 1.1% to 2.9%
Practice Example: Single Proportion and its 95% CI
- Smoking habits survey in Birmingham, UK, among 1000 teenagers aged 15-16 in 2001.
- 123 reported being current smokers.
- Find the proportion of teenagers who smoked and the 95% confidence interval.
Practice Example: Single Proportion and its 95% CI – Birmingham Teenage Smoking (Solution)
- Step 1: Calculate the proportion: p̂ = 123/1000 = 0.123
- Step 2: Calculate the standard error of the proportion: SE(p̂) = √(0.123 × 0.877/1000) ≈ 0.0104
- Step 3: Calculate the 95% CI using p̂ ± 1.96 × SE:
- This gives 0.123 − 1.96 × 0.0104 to 0.123 + 1.96 × 0.0104 = 0.103 to 0.143 = 10.3% to 14.3%
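The Birmingham calculation can be checked with a short script using the same formulas:

```python
import math

# Birmingham survey: d = 123 current smokers out of n = 1000 teenagers
d, n = 123, 1000

p = d / n                              # estimated proportion
se = math.sqrt(p * (1 - p) / n)        # SE of a proportion = sqrt(p(1-p)/n)
lower, upper = p - 1.96 * se, p + 1.96 * se
print(round(p, 3), round(lower, 3), round(upper, 3))
```

This reproduces the interval 0.103 to 0.143, i.e. 10.3% to 14.3%.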
Homework
- Load the babies data from Moodle using STATA.
- Using STATA commands, work out i) the mean and SD of baby weight.
- Using STATA commands, work out ii) the 95% CI for the mean baby weight, and comment on your findings.
Recommended Reading
- Practical Statistics for Medical Research by Douglas Altman: Chapter 10, pages 232–234.
- Medical Statistics by B. Kirkwood & J. Sterne: Chapter 6.
- Statistics Notes: The normal distribution. BMJ 1995; 310. doi: https://doi.org/10.1136/bmj.310.6975.298
- Statistics Notes: Standard deviations and standard errors. BMJ 2005; 331(7521): 903. doi: 10.1136/bmj.331.7521.903
- Comparison between the t and normal distributions – separate file uploaded on Moodle (3 slides only).