Sampling Methods, Correlation, Regression, and Causality

Sampling Methods

  • Sampling is selecting a portion of a larger population to make inferences about the entire population.

Types of Sampling Methods

  • Simple Random
  • Systematic
  • Stratified
  • Cluster
  • Etc.

Simple Random Sampling

  • Definition: Each member of the population has an equal chance of being selected.
  • Process: Entirely random with no set pattern.
  • Example: Choosing 25 employees out of 250 by drawing names from a hat.
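The hat-drawing example can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the roster, the function name simple_random_sample, and the seed are hypothetical and exist only to make the run repeatable.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member is equally likely to be picked."""
    rng = random.Random(seed)  # a seeded generator keeps the example repeatable
    return rng.sample(population, k)


# Hypothetical roster of 250 employees, mirroring the example above.
employees = [f"employee_{i}" for i in range(1, 251)]
chosen = simple_random_sample(employees, 25, seed=42)
print(len(chosen), len(set(chosen)))  # 25 names, all distinct
```

Sampling without replacement matches drawing names from a hat: a name that has been drawn cannot be drawn again.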

Systematic Sampling

  • Definition: Selecting every nth member from a population after a random start.
  • Process: Organized and patterned selection.
  • Example: A supermarket surveying every 10th or 15th customer to study buying habits.
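A rough sketch of the "random start, then every nth member" idea, assuming the population is already listed in a fixed order (for instance, arrival order). The customer list, the step of 10, and the function name systematic_sample are assumptions made for illustration.

```python
import random


def systematic_sample(population, step, seed=None):
    """Pick a random start within the first `step` members, then every step-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random offset in [0, step)
    return population[start::step]


# Hypothetical: 100 customers numbered in arrival order.
customers = list(range(1, 101))
surveyed = systematic_sample(customers, step=10, seed=1)
print(len(surveyed))  # 10 customers, spaced 10 apart
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the step.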

Stratified Sampling

  • Definition: Dividing the population into subgroups (strata) and taking a random sample from each.
  • Criteria: Subgroups are based on characteristics like age, gender, or income.
  • Example: Surveying 100 people out of a city of 50,000, ensuring the sample reflects the city's demographics (e.g., if 20% of the city is Asian, 20% of the sample should be Asian).
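The proportional allocation described in the example can be sketched as follows. This assumes each person carries a single stratum label; the synthetic city, the labels, and the function name stratified_sample are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, stratum_of, total, seed=None):
    """Sample each stratum at random, in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can shift the overall total slightly.
        k = round(total * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical city of 50,000 in which 20% belong to one stratum.
city = [("asian", i) for i in range(10_000)] + [("other", i) for i in range(40_000)]
survey = stratified_sample(city, stratum_of=lambda person: person[0], total=100, seed=7)
counts = Counter(group for group, _ in survey)
print(counts["asian"], counts["other"])
```

With these inputs the sample contains 20 people from the first stratum and 80 from the second, matching the 20% share in the population.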

Cluster Sampling

  • Definition: Dividing the population into clusters, randomly selecting clusters, and sampling all members within those clusters.
  • Process: Every member of each selected cluster is included in the sample.
  • Example: Studying soda consumption by dividing a city into areas (clusters), randomly selecting some of those areas, and surveying everyone in the selected areas.
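A small sketch of one-stage cluster sampling as defined above: whole clusters are chosen at random and all of their members are kept. The area names, household identifiers, and the function name cluster_sample are invented for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical city split into 5 areas with 4 households each.
areas = {
    f"area_{a}": [f"area_{a}_household_{h}" for h in range(1, 5)]
    for a in "ABCDE"
}
chosen_areas, households = cluster_sample(areas, n_clusters=2, seed=3)
print(chosen_areas, len(households))
```

Unlike stratified sampling, randomness enters only at the cluster level here; nothing is sampled within a chosen area.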

Correlation

  • Definition: A statistical measure that evaluates the strength and direction of the relationship between two or more variables.

Importance of Correlation in Research

  • Helps understand the strength and direction of relationships between variables.
  • Aids in predicting and explaining phenomena.
  • Guides decision-making and policy development.

Types of Correlation

  • Positive Correlation:
    • Definition: Increase in one variable corresponds to an increase in another.
    • Example: More time spent running on a treadmill is associated with more calories burned.
  • Negative Correlation:
    • Definition: Increase in one variable corresponds to a decrease in another.
    • Example: As the temperature rises, people tend to wear fewer layers of clothing.
  • Zero Correlation:
    • Definition: No linear relationship between variables.
    • Example: Amount of tea drunk and level of intelligence.

Correlation Coefficient

  • Definition: Statistical metric (Pearson's r) that quantifies the strength and direction of a linear relationship between two continuous variables.
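  • Scale: Ranges from -1 to 1. A value of -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship.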
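To make the coefficient concrete, here is a minimal sketch that computes Pearson's r directly from its definition (the sum of products of deviations divided by the product of the root sums of squared deviations). The data values and the function name pearson_r are made up for illustration, and the function assumes neither variable is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


# Hypothetical data: treadmill minutes vs. calories, temperature vs. clothing layers.
r_positive = pearson_r([10, 20, 30, 40], [100, 200, 300, 400])
r_negative = pearson_r([0, 10, 20, 30], [4, 3, 2, 1])
print(round(r_positive, 6), round(r_negative, 6))
```

Both toy data sets are perfectly linear, so the results sit at the two ends of the scale (approximately 1 and -1); real data usually falls somewhere in between.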

Regression Analysis

  • Definition: A statistical method used to model and examine the relationship between a dependent variable (target) and one or more independent variables (predictors).
  • Purpose: Helps predict and explain how changes in the predictor(s) relate to changes in the target variable.

Regression Equation

  • Represents the relationship between the dependent variable (Y) and independent variables (X1, X2, etc.).
  • Formula: Y = b0 + b1X1 + b2X2 + … + bnXn
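As an illustration of how the coefficients can be estimated, the sketch below fits the single-predictor special case, Y = b0 + b1X1, by ordinary least squares. Least squares is one common fitting method and is assumed here; the notes do not name one. The data points and the function name fit_simple_regression are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (b0, b1) for the line Y = b0 + b1 * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx            # slope: change in Y per unit change in X
    b0 = mean_y - b1 * mean_x  # intercept: predicted Y when X is 0
    return b0, b1


# Hypothetical data lying exactly on Y = 1 + 2X.
b0, b1 = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
predicted = b0 + b1 * 5  # prediction for a new X value
print(b0, b1, predicted)
```

Here b0 plays the role of the intercept and b1 the coefficient on the single predictor; with several predictors the same idea applies, but the estimates are usually obtained with matrix methods or a statistics library.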

Causality

  • Definition: The relationship between a cause and an effect.
  • Explanation: Describes how one event or variable (the cause) influences another (the effect).
  • Example: A company implementing a one-to-one marketing strategy and observing a measurable increase in monthly subscriptions.

Hill's Criteria

  • Nine viewpoints, proposed by Austin Bradford Hill, for evaluating epidemiologic evidence to judge whether an observed association is likely to be causal.

  • Strength
    • Definition: A strong association is more likely to be causal.
  • Consistency
    • Definition: An association is more likely to be causal when it is observed in different population groups.
  • Specificity
    • Definition: When an exposure is associated with a specific outcome only, then it is more likely to be causal.
  • Temporality
    • Definition: A cause should precede the outcome, and the timing of the exposure should be compatible with the latency period.
  • Biological Gradient
    • Definition: The frequency or intensity of the outcome increases when an exposure is more intense or lasts longer.
  • Plausibility
    • Definition: An association is more likely to be causal when it is biologically plausible.
  • Coherence
    • Definition: A cause-and-effect interpretation of an association should not conflict with what is known about the natural history and biology of the disease.
  • Experimental Evidence
    • Definition: If experimental evidence exists, then the association is more likely to be causal.
  • Analogy
    • Definition: The existence of an analogy could strengthen the belief that an association is causal.

Example of Hill's Criteria

  • Hill argued that causal relationships are more likely than non-causal ones to show strong associations.
  • Smoking and lung cancer show a strong association: reported risk ratios, rate ratios, and odds ratios range from 20 to 40 when smokers are compared with non-smokers.
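For readers unfamiliar with the ratio measures mentioned above, the following sketch shows how a risk ratio and an odds ratio are computed from a two-group cohort table. The counts are entirely hypothetical and are not real smoking data; the function names are likewise made up for illustration.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    exposed_noncases = exposed_total - exposed_cases
    unexposed_noncases = unexposed_total - unexposed_cases
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical cohort (NOT real data): 30 cases among 1,000 exposed people,
# 3 cases among 1,000 unexposed people.
rr = risk_ratio(30, 1000, 3, 1000)
odds = odds_ratio(30, 1000, 3, 1000)
print(round(rr, 2), round(odds, 2))
```

In this made-up cohort the exposed group has about ten times the risk of the unexposed group. Under the strength criterion, larger ratios of this kind are treated as stronger support for a causal interpretation.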