Sampling Methods, Correlation, Regression, and Causality
Sampling Methods
- Sampling is selecting a portion of a larger population to make inferences about the entire population.
Types of Sampling Methods
- Simple Random
- Systematic
- Stratified
- Cluster
- Etc.
Simple Random Sampling
- Definition: Each member of the population has an equal chance of being selected.
- Process: Entirely random with no set pattern.
- Example: Choosing 25 employees out of 250 by drawing names from a hat.
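The hat-drawing example can be sketched in Python with the standard library. The employee names and the seed below are placeholders invented for illustration; only the 25-out-of-250 figures come from the example above.

```python
import random

# Hypothetical roster of 250 employees; the names are placeholders.
employees = [f"employee_{i:03d}" for i in range(1, 251)]

# A fixed seed is used only so the draw is repeatable in this sketch.
rng = random.Random(42)

# sample() draws without replacement, so no employee is picked twice
# and every employee has the same chance of being selected.
chosen = rng.sample(employees, k=25)

print(len(chosen), "employees selected")
```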
Systematic Sampling
- Definition: Selecting every nth member from a population after a random start.
- Process: Organized and patterned selection.
- Example: A supermarket surveying every 10th customer (or every 15th, depending on the chosen interval) to study buying habits.
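A minimal sketch of systematic selection, assuming the customers are available as an ordered list. The function name `systematic_sample`, the customer numbering, and the seed are illustrative choices, not part of the notes.

```python
import random

def systematic_sample(population, step, rng):
    """Return every `step`-th item after a random start in [0, step)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    start = rng.randrange(step)  # random starting position
    return population[start::step]

# Hypothetical arrival numbers for 200 customers.
customers = list(range(1, 201))

surveyed = systematic_sample(customers, step=10, rng=random.Random(0))
print(surveyed[:3], "...", len(surveyed), "customers surveyed")
```

Because the interval is fixed, only the starting point is random; if the list has a repeating pattern that lines up with the interval, the sample can be biased.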
Stratified Sampling
- Definition: Dividing the population into subgroups (strata) and taking a random sample from each.
- Criteria: Subgroups are based on characteristics like age, gender, or income.
- Example: Surveying 100 people out of a city of 50,000, ensuring the sample reflects the city's demographics (e.g., if 20% of the city is Asian, 20% of the sample should be Asian).
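The proportional idea in the example above can be sketched as follows. The three groups, their 20% / 30% / 50% shares, and the helper name `stratified_sample` are assumptions made up for illustration; only the 100-out-of-50,000 figures and the 20% share come from the example.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, sample_size, rng):
    """Allocate sample_size proportionally across strata, then sample randomly within each."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ
        # slightly from sample_size when the shares are not exact.
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical city of 50,000 residents in three made-up groups.
city = (
    [(i, "group_a") for i in range(10_000)]             # 20%
    + [(i, "group_b") for i in range(10_000, 25_000)]   # 30%
    + [(i, "group_c") for i in range(25_000, 50_000)]   # 50%
)

sample = stratified_sample(
    city, stratum_of=lambda person: person[1], sample_size=100, rng=random.Random(1)
)
counts = Counter(group for _, group in sample)
print(counts)
```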
Cluster Sampling
- Definition: Dividing the population into clusters, randomly selecting clusters, and sampling all members within those clusters.
- Process: Every member of a selected cluster is included; members of unselected clusters are not sampled.
- Example: Studying soda consumption by dividing a city into areas (clusters), randomly selecting some of the areas, and surveying everyone in the selected areas.
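A rough sketch of cluster sampling as defined above: pick whole clusters at random, then keep every member of the chosen clusters. The area names, the 10 areas of 50 residents each, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose n_clusters cluster names and return all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

# Hypothetical city: 10 areas with 50 residents each.
areas = {
    f"area_{a}": [f"area_{a}_resident_{r}" for r in range(50)]
    for a in range(10)
}

chosen_areas, respondents = cluster_sample(areas, n_clusters=3, rng=random.Random(7))
print(chosen_areas, len(respondents))
```

The contrast with stratified sampling is that here randomness decides which groups are included, not which individuals within each group.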
Correlation
- Definition: A statistical measure that evaluates the strength and direction of the relationship between two or more variables.
Importance of Correlation in Research
- Helps understand the strength and direction of relationships between variables.
- Aids in predicting and explaining phenomena.
- Guides decision-making and policy development.
Types of Correlation
- Positive Correlation:
  - Definition: An increase in one variable corresponds to an increase in another.
  - Example: More time spent running on a treadmill is associated with more calories burned.
- Negative Correlation:
  - Definition: An increase in one variable corresponds to a decrease in another.
  - Example: As the temperature rises, people wear fewer layers of clothing.
- Zero Correlation:
  - Definition: No linear relationship between the variables.
  - Example: Amount of tea drunk and level of intelligence.
Correlation Coefficient
- Definition: Statistical metric (Pearson's r) that quantifies the strength and direction of a linear relationship between two continuous variables.
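- Scale: Ranges from -1 to 1, where -1 is a perfect negative linear relationship, 0 is no linear relationship, and 1 is a perfect positive linear relationship.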
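As a sketch of how Pearson's r can be computed from its definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations), the function below uses only the standard library. The treadmill and temperature figures are invented for illustration and are not measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data: minutes on a treadmill vs. calories burned.
minutes = [10, 20, 30, 40, 50]
calories = [95, 210, 290, 405, 500]
r_pos = pearson_r(minutes, calories)   # close to +1

# Made-up data: temperature (degrees C) vs. layers of clothing worn.
temperature = [30, 25, 20, 10, 0]
layers = [1, 1, 2, 3, 4]
r_neg = pearson_r(temperature, layers)  # close to -1

print(round(r_pos, 3), round(r_neg, 3))
```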
Regression Analysis
- Definition: A statistical method used to model and examine the relationship between a dependent variable (target) and one or more independent variables (predictors).
- Purpose: Helps predict and explain how changes in the predictor(s) affect the target variable.
Regression Equation
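- Represents the relationship between the dependent variable ($Y$) and the independent variables ($X_1$, $X_2$, etc.).
- Formula: $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$, where $b_0$ is the intercept and $b_1, \dots, b_n$ are the coefficients of the predictors.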
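For the single-predictor case (Y = b0 + b1X1), the coefficients can be estimated by ordinary least squares. The sketch below is one common way to do this, not necessarily the method intended by these notes; the function name `fit_simple_regression` and the data points are invented so that the fitted line is easy to check by hand.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for one predictor; returns (b0, b1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Made-up data lying exactly on the line Y = 1 + 2 * X1.
x1 = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b0, b1 = fit_simple_regression(x1, y)
predicted = b0 + b1 * 6  # predicted Y for a new value X1 = 6

print(b0, b1, predicted)
```

With several predictors the same idea applies, but the coefficients are usually found with matrix methods from a numerical library.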
Causality
- Definition: The relationship between a cause and an effect.
- Explanation: Describes how one event or variable (the cause) influences another (the effect).
- Example: A company implementing a one-to-one marketing strategy and observing a measurable increase in monthly subscriptions.
Hill's Criteria
- Nine viewpoints for evaluating epidemiologic evidence to judge whether an observed association is likely to be causal.
- Strength
  - Definition: A strong association is more likely to be causal.
- Consistency
  - Definition: An association is more likely to be causal when it is observed in different population groups.
- Specificity
  - Definition: When an exposure is associated with a specific outcome only, then it is more likely to be causal.
- Temporality
  - Definition: A cause should precede the outcome, and the timing of the exposure should be compatible with the latency period.
- Biological Gradient
  - Definition: The frequency or intensity of the outcome increases when an exposure is more intense or lasts longer.
- Plausibility
  - Definition: An association is more likely to be causal when it is biologically plausible.
- Coherence
  - Definition: A cause-and-effect interpretation of an association should not conflict with what is known about the natural history and biology of the disease.
- Experimental Evidence
  - Definition: If experimental evidence exists, then the association is more likely to be causal.
- Analogy
  - Definition: The existence of an analogy could strengthen the belief that an association is causal.
Example of Hill's Criteria
- Hill believed that causal relationships were more likely than non-causal ones to show strong associations.
- Smoking and lung cancer show a strong association: risk ratios, rate ratios, and odds ratios range from 20 to 40 when smokers are compared with non-smokers.
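To make the strength measures concrete, the sketch below computes a risk ratio and an odds ratio from a 2x2 table. The counts are entirely hypothetical, chosen only so that the risk ratio comes out near 20; they are not real smoking data, and the function names are illustrative.

```python
def risk_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed

def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome in the exposed group divided by odds in the unexposed group."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Hypothetical cohort: 10,000 exposed and 10,000 unexposed people.
table = dict(
    exposed_cases=200, exposed_noncases=9_800,
    unexposed_cases=10, unexposed_noncases=9_990,
)

rr = risk_ratio(**table)        # about 20
odds = odds_ratio(**table)      # slightly above 20

print(round(rr, 2), round(odds, 2))
```

When the outcome is rare in both groups, as in this made-up table, the odds ratio and the risk ratio are close to each other.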