Intro to Epidemiology, Week 8: Case-Control Study Design and Analysis
Case-Control Studies
Case-control studies examine disease or outcome etiology by looking back to see if a particular exposure is associated with an outcome that has already occurred.
This design is particularly useful for studying rare outcomes.
The design starts by identifying people who have the outcome of interest (cases) and people who do not (controls), then looks back in time to see whether the two groups differ in some prior exposure.
Example: Schizophrenia
If schizophrenia occurs in about one percent of the general population, a cohort study assessing an exposure associated with schizophrenia development would require a very large sample.
To obtain a target sample of 500 individuals with schizophrenia, one would need to recruit approximately 50,000 individuals and follow them longitudinally.
A case-control study allows researchers to recruit 500 individuals with diagnosed schizophrenia and a group of individuals without schizophrenia and then search for past exposures through interviews or data records.
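The cohort arithmetic in this example can be checked directly. The sketch below uses a hypothetical helper, `cohort_size_needed` (not from these notes). It assumes every participant is followed long enough for the outcome to appear and that no one is lost to follow-up, so a real cohort would likely need to enroll more people.

```python
import math


def cohort_size_needed(target_cases: int, outcome_proportion: float) -> int:
    """Number of people to enroll so that, on average, target_cases develop the outcome.

    Simplifying assumptions: complete follow-up, no loss to follow-up, and
    outcome_proportion is the share who develop the outcome during follow-up.
    """
    return math.ceil(target_cases / outcome_proportion)


# Schizophrenia example: 500 cases when about 1% of people develop the outcome.
print(cohort_size_needed(500, 0.01))  # 50000
```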
Challenges in Sampling for Case-Control Studies
Recruiting cases can be challenging, particularly with rare outcomes or outcomes that take a long time to develop.
Controls should resemble the cases in as many ways as possible other than the outcome, while also reflecting the general population.
Cases of rare outcomes are often found through hospitals.
Drawing cases from multiple hospitals in multiple locations makes it more likely that the hospitalized cases are representative, which improves generalizability.
Using only one hospital may lead to findings that are specific to that hospital, so it is better to pull cases from multiple hospitals when possible.
Incident cases are individuals recruited as they become new cases.
Recruiting enough incident cases may take a long time, especially if the outcome is very rare.
Prevalent cases are people with existing diagnoses, identified through existing hospital records.
With prevalent cases, survival can be a problem: only those who have survived with the disease are available to contact, so people who died soon after diagnosis are missed.
Recruiting Controls
Controls should be similar to cases and represent a disease-free population.
Common options for recruiting controls include hospitals, neighborhoods, friends, or family.
Hospital Controls
Hospital controls are similar to cases in that they also have a condition that led to their hospitalization.
For example, in a study on brain cancer, controls might be selected from those hospitalized for other cancers such as bladder cancer or thyroid cancer.
Hospital controls are available, willing to participate, and economical.
Hospitalized populations may differ from the general population, limiting generalizability.
Neighborhood Controls
Neighborhood controls are selected individually for each case from among people who live in the same neighborhood.
For example, if a case lives in Bothell, the control for that case is also selected from Bothell.
Cases and controls may share social factors such as social status or culture.
Individuals living in the same neighborhood might be too similar to each other, making cases and controls too alike with regard to exposure.
People may not answer their doorbell and might not be willing to participate.
Best Friend Controls
Each case is asked to name a good friend who might be willing to participate.
Friends might share traits such as social status, culture, or age.
Recruiting a best friend might be easier than recruiting a neighborhood control, as friends are often willing to participate due to their friendship.
Friends might be too similar with regards to exposure.
Family Controls
Each case is asked for the name of a sibling or spouse who might be willing to participate.
Family controls are often willing to participate and often resemble the cases in social status and culture; siblings may even share the cases' genes and environment.
Cases and controls might wind up looking too similar to each other with regards to exposure.
Number of Controls per Case
Statistical power increases with sample size, and adding controls per case is one way to increase it.
Gains in power become small beyond about four controls per case; therefore, researchers often recruit up to four controls per case.
Recruiting multiple controls also allows more than one type of control (for example, a hospital control and a neighborhood control) for each case.
When control types are mixed, researchers must decide which group is most likely to be the "gold standard" comparison.
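One commonly cited rule of thumb behind the four-controls guideline is that, with k controls per case, statistical efficiency relative to an unlimited number of controls is roughly k/(k + 1). This formula is an approximation brought in for illustration, not something stated in these notes; the sketch prints how little each additional control adds.

```python
from fractions import Fraction


def relative_efficiency(controls_per_case: int) -> Fraction:
    """Approximate efficiency of k controls per case vs. unlimited controls: k / (k + 1)."""
    k = controls_per_case
    return Fraction(k, k + 1)


for k in range(1, 7):
    eff = relative_efficiency(k)
    gain = eff - relative_efficiency(k - 1)  # improvement from adding the k-th control
    print(f"k = {k}: efficiency {float(eff):.3f}, gain {float(gain):.3f}")
```

Under this approximation, going from four to five controls raises efficiency only from 0.80 to about 0.83, which is why the returns diminish.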
Matching
Matching is used to ensure that cases and controls are similar on important characteristics other than the target exposure.
Demographic features such as sex, age, race, occupation, marital status, or social status are often matching factors.
Matching can be done at the group level or through individual matching.
With individual matching, the researchers try to find, for each case, a control with similar characteristics: essentially, each case's doppelganger.
The more characteristics you try to match, the harder it is to find a good control.
Once a characteristic is used for matching, its association with the outcome can no longer be studied; the more characteristics you match on, the fewer you can study.
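Individual matching can be sketched as a greedy search: for each case, pick an unused control with the same sex and an age within a tolerance. The `Person` record, the two-year age tolerance, and the greedy strategy are hypothetical choices for illustration only. The example shows how stricter criteria can leave a case with no eligible control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    sex: str
    age: int


def match_controls(cases, pool, age_tolerance=2):
    """Greedy 1:1 individual matching on sex (exact) and age (within age_tolerance years).

    Returns {case pid: control pid or None}; each control is used at most once.
    """
    used = set()
    matches = {}
    for case in cases:
        candidates = [
            c for c in pool
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= age_tolerance
        ]
        # Prefer the eligible candidate closest in age.
        best = min(candidates, key=lambda c: abs(c.age - case.age), default=None)
        if best is not None:
            used.add(best.pid)
        matches[case.pid] = best.pid if best is not None else None
    return matches


# Hypothetical participants.
cases = [Person("case1", "F", 34), Person("case2", "M", 61), Person("case3", "F", 35)]
pool = [Person("ctl1", "F", 36), Person("ctl2", "M", 58),
        Person("ctl3", "M", 62), Person("ctl4", "F", 40)]
print(match_controls(cases, pool))
# {'case1': 'ctl1', 'case2': 'ctl3', 'case3': None}
```

Loosening the age tolerance to 10 years lets `case3` match `ctl4`, at the cost of a less similar pair.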
Example of Overmatching
A study examining the link between teen smoking and stress in adulthood recruits 200 high-stress cases and 400 low-stress controls individually matched for sex, age, race, marital status, socioeconomic status, military status, and occupation.
The data cannot then be used to assess the association between military status and stress.
Matching has forced the same proportion of veterans into the cases and the controls.
If 45 of the 200 cases were veterans and there are two controls per case, then 90 of the 400 controls would also be veterans (22.5% of each group).
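The overmatching example can be checked numerically: once matching forces the same share of veterans into both groups, the odds ratio for military status is exactly 1 by construction. The `odds_ratio` helper and the table layout (rows = exposed/unexposed, columns = cases/controls) are assumptions made for this sketch.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio A*D / (B*C) for a 2x2 table laid out as
    rows = exposed / unexposed, columns = cases / controls."""
    return (a * d) / (b * c)


n_cases, controls_per_case = 200, 2
n_controls = n_cases * controls_per_case               # 400
veteran_cases = 45
veteran_controls = veteran_cases * controls_per_case   # 90: forced by individual matching

print("Veteran proportion, cases:", veteran_cases / n_cases)           # 0.225
print("Veteran proportion, controls:", veteran_controls / n_controls)  # 0.225

# Treat veteran status as the "exposure": A = veteran cases, B = veteran controls,
# C = non-veteran cases, D = non-veteran controls.
or_military = odds_ratio(veteran_cases, veteran_controls,
                         n_cases - veteran_cases, n_controls - veteran_controls)
print("Odds ratio for military status:", or_military)  # 1.0
```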
Case-Control vs. Cohort Study Design
Case-control studies recruit participants based on outcome status (cases or controls).
Cohort studies recruit participants who do not have the outcome, based on membership in a general population or exposure status.
Case-control studies always look retrospectively for exposure, whereas cohort studies can be prospective, retrospective, or a combination.
Incidence can only be assessed using a cohort study.
Relative risk can be assessed with a cohort study.
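Case-control studies should use the odds ratio, because the investigator sets the numbers of cases and controls, so the risk of the outcome in each exposure group cannot be calculated.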
Advantages and Disadvantages
Cohort Study
Advantages: assess temporality, calculate incidence, less recall bias.
Disadvantages: expensive, time-consuming.
Case-Control Study
Advantages: less expensive, needs less time.
Disadvantages: cannot calculate incidence or guarantee temporal order, difficult to select good controls, more chance of recall bias.
Case-Control Study Within a Defined Cohort Study
Nested Case-Control Study: Controls for each case are selected at the time that case is identified, from cohort members who have not yet developed the outcome.
Case-Cohort Study: Controls are randomly chosen from the defined population at the end of the study.
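Risk-set sampling, one common way to carry out the nested design, can be sketched as follows: at each case's event time, controls are drawn at random from cohort members who are still under follow-up and outcome-free at that moment. The cohort data, the `risk_set_sample` name, and the fixed random seed are hypothetical, and ties in event times are ignored for simplicity.

```python
import random


def risk_set_sample(cohort, controls_per_case=2, seed=0):
    """Nested case-control sampling (risk-set sampling), a minimal sketch.

    cohort maps person id -> (time of event or end of follow-up, had_event).
    For each case, controls are drawn from people still at risk at the case's
    event time, i.e. whose own time is later. Ties are not handled.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = {}
    # Process cases in order of event time.
    for pid, (t_case, had_event) in sorted(cohort.items(), key=lambda item: item[1][0]):
        if not had_event:
            continue
        # Strictly later times exclude the case itself and anyone already gone.
        risk_set = [other for other, (t_other, _) in cohort.items() if t_other > t_case]
        k = min(controls_per_case, len(risk_set))
        sampled[pid] = rng.sample(risk_set, k)
    return sampled


# Hypothetical cohort: follow-up time in years and whether the outcome occurred.
cohort = {
    "p1": (2.0, True), "p2": (5.0, False), "p3": (3.5, True),
    "p4": (8.0, False), "p5": (1.0, False), "p6": (6.0, True),
}
for case, controls in risk_set_sample(cohort).items():
    print(case, "->", controls)
```

In this scheme a person who later becomes a case (such as `p3` or `p6`) can still serve as a control for an earlier case, because they were outcome-free at that earlier time.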
Case Crossover Design
Used when the outcome occurs shortly after the exposure.
Each individual case serves as his or her own control.
Exposure during the period directly before the outcome (the case period) is compared with exposure during an earlier control period.
Example: Ebola
Ebola develops within 21 days of exposure.
In a study of nurses in Sierra Leone during the Ebola outbreak, the exposure of interest was the number of Band-Aids changed in the last month.
The month leading up to the Ebola diagnosis would be the case period, and the number of Band-Aids changed during it would be the case-period exposure.
The month prior to that would be the control month.
The level of exposure during the control period would be compared to the level of exposure during the case time period.
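One simple way to make the case-period versus control-period comparison, assuming exposure is dichotomized, is to treat each nurse's two periods as a matched pair and compare the discordant pairs. The counts below are made up for illustration, and the cutoff of 10 Band-Aids for "high exposure" is a hypothetical choice, not part of the study described.

```python
# Hypothetical data: (Band-Aids changed in case month, Band-Aids changed in control month)
nurses = [(14, 6), (12, 11), (3, 9), (15, 4), (8, 7), (11, 2), (5, 13), (16, 12)]
HIGH_EXPOSURE = 10  # hypothetical cutoff for "high exposure"


def discordant_periods(periods, cutoff):
    """Count people highly exposed only in the case period, and only in the control period."""
    case_only = sum(1 for case_p, ctrl_p in periods if case_p >= cutoff > ctrl_p)
    control_only = sum(1 for case_p, ctrl_p in periods if ctrl_p >= cutoff > case_p)
    return case_only, control_only


case_only, control_only = discordant_periods(nurses, HIGH_EXPOSURE)
# Matched-pair odds ratio: exposed only in case period / exposed only in control period.
print(case_only, control_only, case_only / control_only)  # 3 1 3.0
```

Nurses who were highly exposed in both months, or in neither, do not contribute to this comparison.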
Odds Ratio
The measure used to express the association between an independent variable (exposure) and a dependent variable (outcome) in a case-control study.
Odds are the chance that something will occur divided by the chance that something will not occur.
Odds = \frac{Probability \ of \ occurrence}{Probability \ of \ non-occurrence} = \frac{p}{1-p}
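Converting between probability and odds, using exact fractions so the arithmetic is easy to follow. The function names are hypothetical helpers, not from these notes.

```python
from fractions import Fraction


def odds_from_probability(p: Fraction) -> Fraction:
    """Odds = p / (1 - p); undefined when p == 1."""
    return p / (1 - p)


def probability_from_odds(odds: Fraction) -> Fraction:
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)


print(odds_from_probability(Fraction(1, 5)))  # 1/4
print(odds_from_probability(Fraction(1, 2)))  # 1
print(probability_from_odds(Fraction(1, 4)))  # 1/5
```

A probability of 1 in 5 corresponds to odds of 1 to 4, and a probability of one half corresponds to even odds.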
Odds Ratio Calculation
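Using a 2x2 table with exposure in the rows and outcome in the columns, where A = exposed with the outcome, B = exposed without the outcome, C = unexposed with the outcome, and D = unexposed without the outcome: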
Odds \ Ratio = \frac{A/B}{C/D}
Odds \ Ratio = \frac{A \times D}{B \times C}
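The odds ratio is calculated the same way in cohort and case-control studies: the odds ratio of exposure, (A/C)/(B/D), equals the odds ratio of the outcome, (A/B)/(C/D), because both reduce to (A × D)/(B × C).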
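The two forms of the formula, and the exposure-odds form used in case-control studies, give the same answer. This sketch assumes exposure is in the rows and the outcome in the columns, so that A and C are the outcome-positive cells (the reading used for cells A and C later in these notes). The counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical 2x2 counts:
#               outcome   no outcome
# exposed        A = 40     B = 60
# unexposed      C = 20     D = 80
A, B, C, D = 40, 60, 20, 80

ratio_of_odds = Fraction(A, B) / Fraction(C, D)  # odds of outcome, exposed vs unexposed
cross_product = Fraction(A * D, B * C)           # A*D / (B*C)
exposure_odds = Fraction(A, C) / Fraction(B, D)  # odds of exposure, cases vs non-cases

print(ratio_of_odds, cross_product, exposure_odds)  # 8/3 8/3 8/3
```

All three come to 8/3, about 2.67, which is why the same cross-product works whether the study sampled on exposure or on outcome.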
Approximating Relative Risk with Odds Ratio
Cases must be representative of all people with the disease in relation to their exposure.
Controls must be representative of all people without the disease with regards to exposure.
The disease has to be rare in the study population.
Outcome-positive individuals (cells A and C) must be few relative to outcome-negative individuals (cells B and D), so that A/(A+B) ≈ A/B and C/(C+D) ≈ C/D.
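A numeric illustration of the rare-disease condition, using two hypothetical cohorts of 1,000 exposed and 1,000 unexposed people, both with a true risk ratio of 2. The helper names and counts are assumptions for this sketch; cells follow the A, B, C, D layout with A and C as the outcome-positive cells.

```python
from fractions import Fraction


def risk_ratio(a, b, c, d):
    """Risk in exposed, A/(A+B), divided by risk in unexposed, C/(C+D)."""
    return Fraction(a, a + b) / Fraction(c, c + d)


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio A*D / (B*C)."""
    return Fraction(a * d, b * c)


# Hypothetical cohorts (A, B, C, D).
rare = (20, 980, 10, 990)      # 2% vs 1% develop the outcome
common = (400, 600, 200, 800)  # 40% vs 20% develop the outcome

for label, cells in [("rare", rare), ("common", common)]:
    print(label, float(risk_ratio(*cells)), round(float(odds_ratio(*cells)), 3))
```

The risk ratio is 2 in both cohorts, but the odds ratio is about 2.02 when the outcome is rare and about 2.67 when it is common, so the approximation only holds in the rare case.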
Matched Pairs Odds Ratio
Uses a 2x2 table of matched pairs, with the case's exposure status on one axis and the matched control's exposure status on the other.
The discordant cells (one exposed, the other not) are used in the calculation.
Odds \ Ratio = \frac{B}{C}
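(B and C are the discordant cells: B = pairs in which the case is exposed and the control is not; C = pairs in which the control is exposed and the case is not.)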
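A sketch of the matched-pairs calculation from a list of pairs, assuming B counts pairs where only the case is exposed and C counts pairs where only the control is exposed. The pair counts are hypothetical.

```python
from collections import Counter

# Hypothetical matched pairs: (case exposed?, matched control exposed?)
pairs = ([(True, True)] * 10 + [(True, False)] * 30
         + [(False, True)] * 12 + [(False, False)] * 48)


def matched_pairs_odds_ratio(pairs):
    """Matched-pairs odds ratio = B / C, using only discordant pairs.

    B: case exposed, control not exposed.
    C: control exposed, case not exposed.
    Concordant pairs (both or neither exposed) do not enter the calculation.
    """
    counts = Counter(pairs)
    b = counts[(True, False)]
    c = counts[(False, True)]
    return b / c


print(matched_pairs_odds_ratio(pairs))  # 2.5
```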
Interpretation of Odds Ratio
Odds Ratio ≈ 1: No association.
Odds Ratio > 1: Odds of exposure in cases are greater than odds of exposure in controls.
Odds Ratio < 1: Odds of exposure in cases are less than odds of exposure in controls.
Methodological Considerations
Threats to external validity involve how individuals are brought into the sample (the selection of cases and controls).
Threats to internal validity involve problems with the actual study design or data collection.
Problems of recall may threaten internal validity.
Limitations might occur if past records of exposure are difficult to obtain or incomplete.
Recall bias may occur if individuals with a particular outcome are more likely than those without it to remember particular exposures.
Cross-Sectional Studies
Recruit from a defined population and gather information at one specific time point.
Collect data on both exposure and outcome of interest.
Cannot measure incidence because the development of the outcome is not observed over time after a given exposure.
Use the odds ratio for analysis.
A notable limitation is that individuals might have problems with recall, especially when recalling distant events.
The odds ratio calculation is framed as it would be in a cohort study, comparing the odds of the outcome in an exposed group with the odds in an unexposed group.
Ecological Studies
Utilize data at the group or population level.
Can be useful for describing a population.
The ecological fallacy occurs when data from an ecological study are used to make inferences about individuals, which cannot be done without individual-level data.