Chapter 1 Part 2 Data collection

Key Concepts in Statistics

Mean of Sampling Distribution
- The mean of the sampling distribution of the sample mean equals the population mean (μ), regardless of sample size.
- Central Limit Theorem: As the sample size increases, the distribution of the sample mean approaches a normal distribution, regardless of the shape of the population's distribution (provided the population has a finite variance).
Shape of Sampling Distribution
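- The sampling distribution of the sample mean is approximately normal when the sample size is large enough (a common rule of thumb is n ≥ 30); if the population itself is normal, it is normal for any n.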
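
Both properties above can be checked by simulation. The following is a minimal sketch, not part of the course material: the exponential population (mean 2.0), the sample sizes, the number of repetitions, and the helper name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from a right-skewed (exponential)
    population with mean 2.0 and return the list of sample means."""
    return [
        statistics.fmean(rng.expovariate(0.5) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable

for n in (2, 30, 200):
    means = sample_means(n, reps=5000, rng=rng)
    center = statistics.fmean(means)  # should stay near the population mean, 2.0
    spread = statistics.stdev(means)  # should shrink roughly like 2.0 / sqrt(n)
    print(f"n={n:>3}  mean of sample means={center:.3f}  std dev={spread:.3f}")
```

The center of the sample means should stay close to 2.0 for every n, while the spread shrinks as n grows. Plotting a histogram of `means` for each n would also show the skew fading as n increases.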

Page 2: Agenda

Data Collection Methods
Observational vs. Experimental Studies
Random Sampling

Page 3: Observational Studies vs. Designed Experiments

Learning Objectives
- Distinguish between observational studies and experiments.
- Explain various types of observational studies.

Page 4: Observational Studies and Experiments

Observational Study
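- Researchers observe and measure individuals without attempting to influence either the response or the explanatory variable.
Designed Experiment
- Researchers assign individuals to groups, deliberately change the value of an explanatory variable, and record the response of each group.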

Page 5: Example - Cellular Phones and Brain Tumors

Context: Study of mobile phone use and brain tumors with 791,710 women over 7 years.
Key Finding: No significant difference in tumor incidence between phone users and non-users (Source: Benson et al., 2013).

Page 6: National Toxicology Program Study

Investigated radio-frequency radiation (RFR) and brain tumors using rats in controlled environments:
- Three groups: control (no RFR), GSM-modulated RFR, CDMA-modulated RFR.
- Findings: Low tumor incidence in exposed rats; results not statistically significant.

Page 8: Research Variables

Response Variable: Brain cancer occurrence.
Explanatory Variable: Level of cell phone usage.
The aim is to determine how changes in the explanatory variable affect the response variable.

Page 9: Observational Study Definition

An observational study measures the response variable without attempting to influence the value of either the response or the explanatory variable; behavior is simply observed.

Page 11: Flu Shots Example

Longitudinal study of 36,000 seniors regarding flu shot effectiveness.
- Findings: Flu shots associated with reduced hospitalization and mortality from pneumonia/influenza (Source: Nichol et al., 2007).

Page 13: Confounding in Studies

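Definition: Confounding occurs when the effects of two or more explanatory variables are not separated, so an observed relation may be due to a variable not accounted for in the study rather than to the explanatory variable of interest.
Lurking Variables: Explanatory variables that were not considered in the study but that affect the value of the response variable.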
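
A small simulation can make the danger concrete. The sketch below is purely hypothetical: the variable names, the noise levels, and the idea of treating the lurking variable as something like overall health are assumptions for illustration, not data from any study. A lurking variable drives both the explanatory and the response variable, so the two appear associated even though neither affects the other.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
n = 5000

# Hypothetical lurking variable (say, overall health), never recorded by the study.
lurking = [rng.gauss(0, 1) for _ in range(n)]

# Both variables depend on the lurking variable plus independent noise.
# By construction, x has NO direct effect on y.
x = [z + rng.gauss(0, 1) for z in lurking]
y = [z + rng.gauss(0, 1) for z in lurking]

r_observed = pearson_r(x, y)

# Subtracting the lurking variable's contribution leaves only independent noise.
x_adj = [xi - z for xi, z in zip(x, lurking)]
y_adj = [yi - z for yi, z in zip(y, lurking)]
r_adjusted = pearson_r(x_adj, y_adj)

print(f"correlation ignoring the lurking variable:   {r_observed:.2f}")
print(f"correlation after removing its contribution: {r_adjusted:.2f}")
```

The first correlation should be clearly positive and the second close to zero. In a real observational study the lurking variable is usually unmeasured, so this adjustment is not available.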

Page 15: Causation vs. Association

Observational studies reveal association, not causation.

Page 18: Types of Observational Studies

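Cross-sectional Studies: Information is collected about individuals at a specific point in time or over a very short period.
Case-control Studies: Retrospective; individuals who have a certain characteristic are compared with individuals who do not, by looking back at records or recollections.
Cohort Studies: Prospective; a group of individuals (the cohort) is identified and followed over time, with data on their characteristics recorded along the way.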

Page 20: Census

A census is a list of all individuals in a population along with certain characteristics of each individual.

Page 21: Web Scraping

The automated extraction of data from websites. It makes use of publicly available data but raises ethical considerations about how that data is collected and used.

Page 24: Simple Random Sampling Definition

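Definition: A sample of size n from a population of size N is a simple random sample if every possible sample of size n has an equal chance of being selected. Each individual then has an equal chance of inclusion, but that property alone is not enough to make a sample a simple random sample.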

Page 25: Sample Size Consideration

The sample size (n) must be less than the population size (N), because a sample is a subset of the population.

Page 27: Simple Random Sampling Example

Scenario: Selecting three friends from six for a concert.
- Listing every possible group of three (20 in total, each equally likely under simple random sampling) shows the chance of any particular sample being chosen.
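
The count for this scenario can be reproduced with a few lines of Python. This is a minimal sketch; the friend labels A through F are placeholders, not names from the original example.

```python
from itertools import combinations
from math import comb

friends = ["A", "B", "C", "D", "E", "F"]  # hypothetical labels for the six friends

# Enumerate every possible sample of size 3 (order does not matter).
samples = list(combinations(friends, 3))

print(len(samples))   # 20 possible samples
print(comb(6, 3))     # 20, the same count from the formula 6! / (3! * 3!)

# Under simple random sampling each sample is equally likely.
p_each = 1 / len(samples)
print(p_each)         # 0.05

# Any one friend appears in 10 of the 20 samples, so has a 1/2 chance of going.
count_A = sum("A" in s for s in samples)
print(count_A)        # 10
```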

Page 31: Sampling Techniques

Without Replacement: Once an individual is selected, that individual is removed from the population and cannot be chosen again.
With Replacement: A selected individual is placed back into the population and can be chosen again.
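
The two techniques map directly onto two functions in Python's standard `random` module. The sketch below uses a made-up population of ten ID numbers; the seed and sizes are arbitrary assumptions.

```python
import random

rng = random.Random(7)           # fixed seed so the run is repeatable
population = list(range(1, 11))  # hypothetical ID numbers 1 through 10

# Without replacement: random.sample never returns the same individual twice.
without = rng.sample(population, k=5)

# With replacement: random.choices may return the same individual more than once.
with_repl = rng.choices(population, k=5)

print("without replacement:", without)
print("with replacement:   ", with_repl)

# With replacement, the number of draws may even exceed the population size.
many = rng.choices(population, k=15)

# Without replacement, it cannot: asking for more than N raises ValueError.
try:
    rng.sample(population, k=15)
    oversample_failed = False
except ValueError:
    oversample_failed = True
print("sampling 15 of 10 without replacement fails:", oversample_failed)
```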

Page 41: Cluster Sampling

A cluster sample is obtained by randomly selecting entire clusters (groups of individuals) and surveying all members within each selected cluster.
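
As a rough sketch of the procedure, assume a hypothetical population of eight city blocks with five households each. The block and household names, and the helper `cluster_sample`, are invented for illustration.

```python
import random

# Hypothetical population: 8 city blocks (clusters), each with 5 households.
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 6)]
    for b in range(1, 9)
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick n_clusters whole clusters, then include every member of each."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    surveyed = [member for name in chosen for member in clusters[name]]
    return chosen, surveyed

rng = random.Random(3)
chosen, surveyed = cluster_sample(clusters, n_clusters=3, rng=rng)
print("clusters chosen:", chosen)
print("households surveyed:", len(surveyed))  # 3 clusters x 5 households = 15
```

The randomness enters only at the cluster level: once a block is chosen, no further sampling happens inside it.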

Page 47: Types of Sampling

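Comparison of stratified, systematic, and cluster sampling:
- Stratified: Divide the population into nonoverlapping groups (strata) of similar individuals, then take a simple random sample from each stratum.
- Systematic: Select every kth individual from the population, with the first individual chosen at random.
- Cluster: Randomly select entire groups (clusters) and survey every individual in them.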
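
To contrast with cluster sampling, here is a minimal sketch of the other two techniques: stratified sampling (a simple random sample from each stratum) and systematic sampling (every kth individual after a random start). The strata, the sampling frame, and the helper names `stratified_sample` and `systematic_sample` are assumptions made for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum from every stratum."""
    return {
        name: rng.sample(members, k=n_per_stratum)
        for name, members in strata.items()
    }

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k individuals, then take every kth one."""
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(11)

# Hypothetical strata: students grouped by class year.
strata = {
    "first_year": [f"FY{i}" for i in range(40)],
    "second_year": [f"SY{i}" for i in range(30)],
}
print(stratified_sample(strata, n_per_stratum=3, rng=rng))

# Hypothetical sampling frame: 100 customers in arrival order, every 10th selected.
frame = list(range(1, 101))
print(systematic_sample(frame, k=10, rng=rng))
```

Stratified sampling guarantees that every stratum is represented, while systematic sampling needs only one random draw (the starting point) and does not require a complete list in advance.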

Page 48: Bias in Sampling

Sources of Bias
- Sampling Bias: The technique used to obtain the sample tends to favor one part of the population over another.
- Nonresponse Bias: Individuals selected for the sample who do not respond have different opinions from those who do.
- Response Bias: Answers on a survey do not reflect the respondent's true feelings.

Page 60: Addressing Response Bias

Suggested considerations include:
- Interviewer Error: A skilled, trained interviewer is needed to elicit accurate responses.
- Misrepresented Answers: Respondents may not always answer truthfully.
- Wording and Order of Questions: Biased phrasing, leading questions, or the order in which questions are asked can skew results.