Week 1 Notes: Data Collection & Sampling (1.1–1.3)

1.1 Introduction to Data Collection

Course Context (MATH 1401, Week 1): Focuses on Orientation, Practice Assignment, and Academic Dishonesty Quiz in iCollege.
Goals: Identify individuals and variables, classify variables (categorical/quantitative), identify population/sample, distinguish observational study/experiment.
What is statistics? It's a scientific discipline focused on working with data: collection, organization, analysis, interpretation, and presentation.
- Purpose: Make informed decisions based on data.
Statistical Problem-Solving Process:
1. Ask Questions
2. Collect/Consider Data
3. Analyze Data
4. Interpret Results
Core Concepts:
- Individuals: Entities described in a dataset (e.g., a student, a tree, a car).
- Variable: An attribute that takes different values for different individuals (e.g., height, major, color).
- Types of Variables:
- Categorical: Labels or groups (e.g., a student's major: "Math," "Biology"; eye color: "blue," "brown").
- Quantitative: Numeric values (quantities/measurements) (e.g., height: $65$ inches; test score: $85$ points).
- Data Structure: Rows are individuals, columns are variables.
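The rows-and-columns idea can be sketched in a few lines of Python. The dataset below is hypothetical (names and values are made up for illustration), and the classification rule is a rough assumption: a variable is treated as quantitative when all of its values are numbers. Numeric codes such as ZIP codes would fool this rule, since they are labels rather than measurements.

```python
# Hypothetical dataset: each row (dict) is one individual, each key is a variable.
students = [
    {"name": "Ana", "major": "Math", "eye_color": "brown", "height_in": 65, "test_score": 85},
    {"name": "Ben", "major": "Biology", "eye_color": "blue", "height_in": 70, "test_score": 78},
    {"name": "Cam", "major": "Math", "eye_color": "brown", "height_in": 62, "test_score": 91},
]

def classify_variable(rows, variable):
    """Label a variable quantitative if every value is a number, else categorical."""
    values = [row[variable] for row in rows]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "categorical"

# "name" only identifies the individual, so it is skipped here.
variables = [key for key in students[0] if key != "name"]
variable_types = {var: classify_variable(students, var) for var in variables}

for var, kind in variable_types.items():
    print(f"{var}: {kind}")
```

Here `major` and `eye_color` come out categorical, while `height_in` and `test_score` come out quantitative.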

1.2 Populations, Samples, and Study Designs

Goal: Make inferences about a population.
Population: The entire group of interest (e.g., all students at a university).
Census: Data from every individual in the population.
Sample: A subset of individuals from the population (e.g., $100$ students interviewed from the university).
Purpose of Sampling: To infer about the population when a census is impractical (due to cost, time).
Observational Studies vs. Experiments:
- Observational Study: Observe and measure variables without influencing responses.
- Examples: Surveying opinions on a new policy; tracking medical histories of patients.
- Experiment: Deliberately impose treatments to measure a response.
- Examples: Applying different fertilizers to plants to measure growth; testing different drug dosages.
- Important: Observational studies describe, but typically can't establish causation. Experiments aim to determine causal effects.

1.2 Sampling: Good and Bad (Bias, Randomness, and Sampling Methods)

Goals: Describe how convenience and voluntary response sampling lead to bias, explain how random sampling avoids bias, describe other bias sources.
Bias: A study is biased when its design systematically overestimates or underestimates a quantity of interest.
Sampling Methods:
- Voluntary Response (Volunteer) Sampling: Individuals self-select.
- Example: An online poll where people choose to vote on a housing preference. This leads to bias because people with strong opinions are more likely to respond.
- Convenience Sampling: Individuals chosen because they are easy to reach.
- Example: Asking students passing by a dining hall about housing preference. Students who happen to be near the dining hall may not represent all students.
Avoiding Bias: Use random sampling.
- Random Sample: Individuals selected by a chance process (e.g., drawing names from a hat).
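A small simulation can make the contrast between convenience and random sampling concrete. Everything below is a hypothetical setup, not data from the course: a population of 10,000 students in which 30% live on campus, with on-campus residents assumed to favor a new dorm more often than commuters. The convenience sample takes only on-campus residents (standing in for the dining-hall crowd), while the random sample draws from everyone.

```python
import random

rng = random.Random(1401)  # fixed seed so the sketch is repeatable

# Hypothetical population: 30% live on campus; on-campus residents are
# assumed to favor a new dorm with probability 0.80, commuters with 0.30.
population = []
for _ in range(10_000):
    lives_on_campus = rng.random() < 0.30
    favors_dorm = rng.random() < (0.80 if lives_on_campus else 0.30)
    population.append({"on_campus": lives_on_campus, "favors_dorm": favors_dorm})

def proportion_in_favor(group):
    """Fraction of the group that favors the new dorm."""
    return sum(person["favors_dorm"] for person in group) / len(group)

true_proportion = proportion_in_favor(population)

# Convenience sample: the first 200 on-campus residents, standing in for
# students who happen to walk past the dining hall.
dining_hall_crowd = [p for p in population if p["on_campus"]]
convenience_sample = dining_hall_crowd[:200]

# Random sample: 200 students chosen by a chance process from everyone.
random_sample = rng.sample(population, 200)

print(f"population:         {true_proportion:.2f}")
print(f"convenience sample: {proportion_in_favor(convenience_sample):.2f}")
print(f"random sample:      {proportion_in_favor(random_sample):.2f}")
```

Under these assumptions the convenience sample systematically overestimates support, while the random sample lands near the population value, differing only by chance.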
Types of Bias:
- Undercoverage: Some population members are less likely to be chosen or are excluded entirely (e.g., a phone survey that only calls landlines, ignoring cell-only users).
- Nonresponse: Chosen individuals cannot be contacted or refuse to participate (e.g., mail-in surveys with low return rates).
- Response Bias: Inaccurate responses due to survey design or other factors (e.g., poorly worded questions, social desirability bias in sensitive topics).

1.3 Simple Random Sampling (SRS) and Sampling Variability

Simple Random Sample (SRS): Each sample of size $n$ has an equal chance of being selected from the population. Sampling is typically done without replacement.
- Example 1: Drawing $50$ names randomly from a hat containing all population names.
- Example 2: Numbering everyone $1$ to $N$, then randomly selecting $n$ numbers.
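The numbering approach in Example 2 might look like this in Python. This is a minimal sketch: the function name `simple_random_sample` and the values $N = 500$, $n = 50$ are illustrative assumptions. The standard-library call `random.sample` draws without replacement, so no number can appear twice.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number individuals 1..N and pick n distinct numbers by chance."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# Illustrative values: N = 500 individuals, n = 50 selected.
chosen = simple_random_sample(population_size=500, sample_size=50, seed=7)
print(sorted(chosen))
```

Each selected number is then matched back to the individual who was assigned that number.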
Sampling Variability (Sampling Error): Different random samples of the same size yield different estimates due to chance.
- Relationship with Sample Size ($n$): As $n$ increases, sampling variability decreases, so estimates become more precise and tend to fall closer to the population parameter.
- Intuition: Larger samples give more consistent results.
- Mathematical Intuition (for mean): Standard Error ($SE(\bar{X})$) decreases as $n$ increases; often expressed as $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
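One way to see sampling variability shrink is to simulate it. The sketch below assumes a hypothetical population of 20,000 heights (roughly normal, mean 66 inches, standard deviation 4). It draws many simple random samples at several sizes and compares the spread of the sample means to $\sigma / \sqrt{n}$. Because the samples are drawn without replacement from a finite population, the match is approximate rather than exact.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20,000 heights in inches.
population = [rng.gauss(66, 4) for _ in range(20_000)]
sigma = statistics.pstdev(population)  # population standard deviation

def spread_of_sample_means(n, repetitions=1000):
    """Draw many SRSs of size n and return the SD of their sample means."""
    means = [
        statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)
    ]
    return statistics.stdev(means)

results = {}
for n in (10, 40, 160):
    results[n] = spread_of_sample_means(n)
    predicted = sigma / n ** 0.5
    print(f"n={n:>3}  simulated SD of means={results[n]:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

Quadrupling $n$ roughly halves the spread of the sample means, which is the pattern the square root in the formula predicts.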

1.4 Quick Connections to Practice and Real-World Relevance

Summary: Statistics helps identify who to study (population), how to collect data (sampling design), and how to interpret results (inference).
Causality: Observational studies describe but usually can't establish causation. Well-designed experiments can establish causation because the researcher deliberately imposes the treatments.
Real-World Impact: Concepts of sampling, bias, and sample size are vital in polls, surveys, market research, and clinical trials.
Key Formulas/Notations:
- Population size: $N$
- Sample size: $n$
- SRS Principle: Each subset of size $n$ from the population has equal probability.
- Sampling without replacement: Each individual can be chosen at most once.
- Conceptual SE for simple cases: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation and $n$ is the (effective) sample size.

A dot plot is a simple type of data display that shows data points as dots above a number line. It's used to visualize the distribution of a small dataset, especially for quantitative variables. Each dot represents a single observation from the dataset.
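A text-only version of a dot plot can be sketched as follows. The helper `text_dot_plot` and the quiz scores are hypothetical, and the sketch assumes integer-valued data so that each value gets its own position on the number line.

```python
from collections import Counter

def text_dot_plot(values):
    """Return a text dot plot: one '*' per observation, stacked above its value."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    width = max(len(str(v)) for v in axis) + 1  # column width per axis value
    lines = []
    # Build rows from the tallest stack down to the first level of dots.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            ("*" if counts[v] >= level else " ").rjust(width) for v in axis
        )
        lines.append(row.rstrip())
    # Number line underneath the dots.
    lines.append("".join(str(v).rjust(width) for v in axis))
    return "\n".join(lines)

# Hypothetical quiz scores for a small class.
quiz_scores = [6, 7, 7, 8, 8, 8, 9, 9, 10]
plot = text_dot_plot(quiz_scores)
print(plot)
```

Each column of stars shows how many observations share that value, so the tallest stack marks the most common score.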

In statistics, a bimodal distribution is a distribution with two distinct peaks, indicating two different modes or common values in a dataset.