Hypothesis Testing
Inference from Samples to Populations
- The lecture focuses on statistical inference, which involves drawing conclusions about a population based on data from a sample.
Hypothesis Testing
- Definition: Hypothesis testing is a method used to make decisions or inferences about a population based on sample data.
- Null Hypothesis: A statement of no effect or no difference. It's the hypothesis that researchers try to disprove.
- Alternative Hypothesis: A statement that contradicts the null hypothesis, suggesting there is a difference or effect.
- Type-I Error: Rejecting the null hypothesis when it is actually true (false positive). Denoted by α.
- Type-II Error: Failing to reject the null hypothesis when it is actually false (false negative). Denoted by β.
- P-values: The probability of obtaining results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true. A small p-value suggests evidence against the null hypothesis.
- Confidence Intervals: A range of values within which the true population parameter is likely to fall.
- Single Mean Test: A statistical test used to determine whether a sample mean is significantly different from a known or hypothesized population mean.
- Single Proportion Test: A statistical test used to determine whether a sample proportion is significantly different from a known or hypothesized population proportion.
STATA Demonstration
- Demonstration of STATA commands for data analysis, including:
- Load the “HospAdmNeu.dta” data into STATA.
- summarize Ordinary1213 (summary measures of ordinary admissions)
- summarize Ordinary1213, detail (detailed summary of ordinary admissions)
- graph box Ordinary1213 (boxplot to check for outliers)
- scatter Ordinary1213 Daycase1213 (scatter plot between two variables)
- hist Ordinary1213 (histogram of ordinary admissions)
- hist Ordinary1213, bin(20) (change the number of bins for a better representation)
- hist Ordinary1213, bin(20) normal (overlay a normal curve to check normality)
- Homework: Repeat all commands for day-case hospital admissions.
Statistical Inference: The Big Picture
- Real World: Data is collected from the real world.
- Theoretical World: Scientific and statistical models are used to represent the data.
- Sample: A subset of the population from which data is collected.
- Population: The entire group of individuals or objects of interest.
- Conclusion: Inferences are made from the sample to the population.
Populations and Samples
- A population is a collection of objects, people, or events of interest.
- Data collection on the entire population is often impractical.
- A sample is a subset of the population used to infer information about the population.
- Samples are not of interest in their own right but for what they reveal about the population.
Example: Babies Birth Weight and Mum’s Smoking
- Study in Newham, London, investigating causes of low birth weight in babies born in 2016.
- Data collected from 1000 babies in Newham hospital database.
- Dataset includes variables like baby ID, birth weight (bwt), gestational age (gest), mother’s age (mat_age), and number of cigarettes smoked per week (cigs).
Statistical Inference
- Summary statistics (mean, percentiles, SD) are calculated from the sample.
- These statistics are used to infer characteristics about the total population.
- The goal is to understand what sample statistics tell us about the theoretical distributions of the population.
Random Samples
- Research Question: Average weight of babies and reasons for underweight babies.
- Theoretical Population: Defined before data collection to understand the generalizability of inferences.
- Examples: all babies (current and future), babies born in the UK between 1990 and 2000.
- Average Baby Weight: Around 7 pounds (3.2 kg) for females and 7 pounds 5 ounces (3.3 kg) for males.
- Reasons for Underweight Babies: Premature birth, Intrauterine Growth Restriction (IUGR), infections during pregnancy, inadequate weight gain, smoking, alcohol or drug use, and maternal age.
Random Samples (contd.)
- The sample is a subset of the population and needs to be representative.
- Random Sampling: Each individual in the population has an equal chance of being included, and the inclusion of one individual does not affect the inclusion of another.
- Opportunistic Sampling: Also known as convenience sampling, involves selecting participants based on availability (e.g., recruiting participants from a local support group for a rare neurological disorder like Stiff Person Syndrome).
Stratified Random Sampling
- Example: Global Adult Tobacco Survey (GATS) in Bangladesh.
- Overall prevalence of tobacco smoking:
- 2009: 23.00% (95% CI 22.98 to 23.00)
- 2017: 16.44% (95% CI 16.43 to 16.45)
- Methodology: Two-stage stratified sampling.
- First stage: Eight administrative divisions were created, stratified by urban and rural Enumeration Areas (EAs).
- Second stage: 30 households were systematically selected from each sampled PSU (EA).
- One participant was randomly picked from all eligible men and women in a participating household.
Statistical Inference: From Samples to Populations
- Assessing the accuracy of sample statistics in estimating population parameters.
- Repeatedly choosing samples from the same population results in different values for a statistic (e.g., the mean).
- Uncertainty associated with the estimate needs to be assessed.
Sampling Variability and Standard Errors
- The standard deviation of the sampling distribution of the mean, σ/√n, measures the typical error between the sample mean and the population mean.
- σ/√n quantifies the accuracy of the sample mean as an estimate of the population mean and is known as the standard error (SE) of the mean.
- Since the population standard deviation (σ) is unknown, the sample standard deviation (SD) is used in its place, giving SE = SD/√n.
Standard Deviation (SD) vs Standard Error (SE)
- Standard Deviation (SD): Measures how a typical observation in the sample differs from the sample mean.
- Standard Error (SE): Quantifies the typical error between the mean measured in a sample and the theoretical mean in the population.
- SD measures variability in the population or sample.
- SE (= SD/√n) measures variability in the sample means.
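The distinction can be illustrated with a short calculation. The sketch below uses a small, made-up sample of birth weights (illustrative values only, not from the Newham dataset):

```python
import math

# Hypothetical sample of birth weights in grams (illustrative values)
weights = [3100, 3400, 2900, 3600, 3250, 3050, 3500, 3300]

n = len(weights)
mean = sum(weights) / n
# Sample standard deviation (divides by n - 1): typical spread of observations
sd = math.sqrt(sum((w - mean) ** 2 for w in weights) / (n - 1))
# Standard error of the mean: SE = SD / sqrt(n) — typical error of the
# sample mean as an estimate of the population mean
se = sd / math.sqrt(n)

print(round(mean, 1), round(sd, 1), round(se, 1))
```

Note that SE shrinks as n grows (by a factor of √n), while SD does not: more data gives a more precise mean, not less variable individuals.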
Importance of Normal Distribution in Medical Research
- Central Tendency and Variability: Helps in understanding the mean and standard deviation of medical data.
- Statistical Inference: Many tests and confidence intervals assume normality (e.g., t-tests and ANOVA).
- Predictive Modeling: Used in risk assessment and predictive modeling.
- Quality Control and Standardization: Used to monitor and maintain consistency of medical tests and procedures.
Sampling Distributions
- The frequency distribution of the sample means is called the sampling distribution of the mean.
- If the population distribution is Normal, the distribution of the sample mean over repeated samples is also Normal.
- The variation in the sample means depends on the variance of the population (σ²) and the sample size n.
- For large samples (n>30), the distribution of the sample mean is approximately normal, regardless of the population distribution.
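This large-sample behaviour (the central limit theorem) can be checked by simulation. The sketch below draws repeated samples from a clearly non-normal (exponential) population; the distribution and sample sizes are chosen purely for illustration:

```python
import random
import statistics

random.seed(1)

# Draw repeated samples of size n from an exponential population
# (mean 1.0, sd 1.0) and record each sample mean
n, reps = 50, 2000
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# Even though the population is skewed, the sample means cluster around
# the population mean (1.0) with spread close to sigma/sqrt(n) = 1/sqrt(50) ~ 0.14
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

A histogram of `sample_means` would look approximately normal, which is what justifies using normal-theory tests for large samples regardless of the population's shape.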
Hypothesis Testing
- Null and alternative hypotheses depend on the type of investigation.
- To see if there is a difference between two procedures:
- Null Hypothesis: No difference.
- Alternative Hypothesis: There is a difference.
- To find out if a bold claim is true:
- Null Hypothesis: There is no difference.
- Alternative Hypothesis: The claim is true (drug A is better/worse than drug B).
Task-1: State Null Hypothesis
- In a criminal court, the accused is assumed innocent unless proven guilty.
- Null Hypothesis: The accused is innocent.
Assumptions for Hypothesis Testing
- My sample(s):
- Is representative
- Is independent
- Has homogeneous variance
- Is normal
- Assumptions 1 and 2 are usually considered automatically met.
- Assumptions 3 and 4 need to be tested using appropriate tools & techniques.
Types of Data
- Quantitative:
- Continuous (e.g., blood pressure, age).
- Discrete (e.g., number of children, number of cigarettes per day).
- Categorical:
- Ordinal (ordered categories, e.g., grade of breast cancer, disease severity).
- Nominal (unordered categories, e.g., sex, ethnicity).
Statistical Tests for Continuous Data
- If sample size ≥ 30 and assumptions are met, use the normal (z) distribution.
- If sample size < 30 and assumptions are met, use the t-distribution.
- If assumptions are not met, transform the variable and repeat step 1 or 2.
- If none of the assumptions can be met, use non-parametric tests.
- Parametric tests are generally more powerful.
Appropriate Tests for Continuous Data
- One-Sample:
- t-test
- Sign test
- Paired (2 groups):
- Paired t-test
- Wilcoxon signed rank test
- Sign test
- Independent (2 groups):
- Unpaired t-test
- Wilcoxon rank sum test
- Independent (>2 groups):
- One-way ANOVA
- Kruskal-Wallis ANOVA
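As a sketch of the simplest entry in the table, the one-sample t statistic can be computed directly from SE. The readings and hypothesized mean below are made up for illustration:

```python
import math
import statistics

# Hypothetical diastolic BP readings; test H0: mu = 80 (assumed value)
sample = [78, 85, 82, 88, 76, 90, 84, 79, 86, 81]
mu0 = 80

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)       # sample SD (n - 1 denominator)
se = sd / math.sqrt(n)              # standard error of the mean

# One-sample t statistic: how many standard errors the sample mean
# lies from the hypothesized mean; compare with t tables on n - 1 df
t = (mean - mu0) / se
print(round(t, 2))
```

The resulting t is then compared with the t-distribution on n − 1 degrees of freedom to obtain a p-value.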
Categorical Data
- Categorical covariate data are often called factors
- Categorical data that take on only two distinct values are said to be dichotomous or binary
- Categorical data are often coded using numerical values (e.g. 0 = No, 1 = Yes); statistical packages usually treat numeric data as quantitative unless you explicitly declare it to be categorical.
- The limiting factor for any continuous observation is the accuracy of the measurement instrument.
Appropriate Tests for Categorical Data
- 1 group:
- z test for a proportion
- Sign test
- Paired (2 categories):
- McNemar's test
- Independent (2 groups):
- Chi-squared test
- Fisher's exact test
- Independent (>2 groups):
- Chi-squared test
- Chi-squared trend test
- >2 Categories:
- Chi-squared test
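The chi-squared statistic behind most of these tests can be computed by hand for a 2×2 table. The counts below are invented for illustration:

```python
# Hypothetical 2x2 table: exposure (rows) vs outcome (columns)
observed = [[30, 70],
            [20, 80]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

# Chi-squared statistic: sum of (O - E)^2 / E over all cells,
# where each expected count E = row total * column total / grand total
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(2) for j in range(2)
)
print(round(chi2, 2))
```

For a 2×2 table the statistic has (2 − 1)(2 − 1) = 1 degree of freedom, so it is compared with the 5% critical value 3.84; here 2.67 < 3.84, so there is no evidence against H0 at the 5% level.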
Type I and Type II Errors
| Decision | H0 is true | H0 is false |
|---|---|---|
| Do not reject H0 | Correct decision | Type II error (β) |
| Reject H0 | Type I error (α) | Correct Decision (1-β) |
P-value and Confidence Interval (CI)
- P-value: Measures the strength of evidence against the null hypothesis.
- P > 0.10: No evidence against null hypothesis.
- 0.05 < P < 0.10: Weak evidence against null hypothesis.
- 0.01 < P < 0.05: Moderate evidence against null hypothesis.
- 0.001 < P < 0.01: Strong evidence against null hypothesis.
- P < 0.001: Very strong evidence against null hypothesis.
- Confidence Interval (CI): Range of values within which the true population value is likely to be found.
- If the 95% CI for the difference in means excludes 0, there is evidence against the null hypothesis at the 5% level.
Confidence Interval for a Single Population Mean
- A range of likely values for the parameter.
- To construct a confidence interval for the population mean, make use of the following pieces of information:
- Sample mean
- Standard Error of the mean
- 95% of sample means lie within 1.96 SE above or below the population mean
Confidence Interval (contd.)
- The multiplier 1.96 is often approximated as 2, giving the rule of thumb: 95% CI ≈ sample mean ± 2 × SE.
- There is a 95% probability that an interval constructed this way contains the unknown but true value of the population mean.
Example: Finding Single Mean and 95% CI
- Baby weight study with 1000 babies in Newham hospital.
- Random sample of 1000 babies chosen, 997 considered after deleting unusual weights.
- Mean weight (x̄) = 3305 gm; SD = 505 gm.
- SE (mean) = SD/√n = 505/√997 ≈ 16 gm.
- 95% CI for mean = x̄ − 2 × SE to x̄ + 2 × SE
- = 3305 − 2 × 16 to 3305 + 2 × 16
- = 3273 to 3337 gm
- We are 95% confident that the true mean birth weight is between 3273 and 3337 gm.
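The worked example above can be reproduced in a few lines (using the slide's approximation of 1.96 ≈ 2):

```python
import math

# Numbers from the worked example: n = 997 babies, mean 3305 g, SD 505 g
n, mean, sd = 997, 3305, 505

se = sd / math.sqrt(n)          # standard error of the mean = SD / sqrt(n)
lower = mean - 2 * se           # 1.96 approximated as 2, as on the slide
upper = mean + 2 * se
print(round(se), round(lower), round(upper))
```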
Single Proportion
- If, out of a total sample of size n, only d individuals experience some event, the proportion is estimated as p̂ = d/n, with standard error SE(p̂) = √(p̂(1 − p̂)/n).
Example (vaccination):
- In a trial of a new vaccine, 20 out of 1000 children vaccinated showed signs of an adverse reaction:
- p̂ = 20/1000 = 0.02, thus advising parents that the vaccine is associated with an estimated 2% risk of adverse reaction.
Single Proportion and its 95% CI – Vaccination Example
- Step 1: Calculate the proportion: p̂ = 20/1000 = 0.02
- Step 2: Calculate the standard error: SE(p̂) = √(0.02 × 0.98/1000) ≈ 0.0044
- Step 3: Calculate the 95% confidence interval: p̂ − 1.96 × SE to p̂ + 1.96 × SE
- = 0.02 − 1.96 × 0.0044 to 0.02 + 1.96 × 0.0044
- = 0.011 to 0.029 = 1.1% to 2.9%
Practice Example: Single Proportion and its 95% CI
- Smoking habits survey in Birmingham, UK, among 1000 teenagers aged 15-16 in 2001.
- 123 reported being current smokers.
- Find the proportion of teenagers who smoked and the 95% confidence interval.
Practice Example: Single Proportion and its 95% CI – Birmingham Teenage Smoking (Solution)
- Step 1: Calculate the proportion: p̂ = 123/1000 = 0.123
- Step 2: Calculate the standard error of the proportion: SE(p̂) = √(0.123 × 0.877/1000) ≈ 0.0104
- Step 3: Calculate the 95% CI using p̂ ± 1.96 × SE:
- This gives 0.123 − 1.96 × 0.0104 to 0.123 + 1.96 × 0.0104 = 0.103 to 0.143 = 10.3% to 14.3%
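The Birmingham calculation can be checked with a short script using the same formulas:

```python
import math

# Birmingham survey: d = 123 current smokers out of n = 1000 teenagers
d, n = 123, 1000

p = d / n                              # estimated proportion
se = math.sqrt(p * (1 - p) / n)        # SE of a proportion = sqrt(p(1-p)/n)
lower, upper = p - 1.96 * se, p + 1.96 * se
print(round(p, 3), round(lower, 3), round(upper, 3))
```

This reproduces the interval 0.103 to 0.143, i.e. 10.3% to 14.3%.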
Homework
- Load the babies data from Moodle using STATA.
- Using STATA commands, work out i) the mean and SD of baby weight.
- Using STATA commands, work out ii) the 95% CI for the mean baby weight, and comment on your findings.
Recommended Reading
- Practical Statistics for Medical Research by Douglas Altman: Chapter 10, pages 232–234.
- Medical Statistics by B. Kirkwood & J. Sterne: Chapter 6.
- Statistics Notes: The normal distribution. BMJ 1995; 310. doi: https://doi.org/10.1136/bmj.310.6975.298
- Statistics Notes: Standard deviations and standard errors. BMJ 2005; 331(7521): 903. doi: 10.1136/bmj.331.7521.903
- Comparison between the t and normal distributions – separate file uploaded on Moodle (3 slides only).