Week 1 Notes: Data Collection & Sampling (1.1–1.3)
1.1 Introduction to Data Collection
Course Context (MATH 1401, Week 1): Focuses on the Orientation, the Practice Assignment, and the Academic Dishonesty Quiz in iCollege.
Goals: Identify individuals and variables, classify variables as categorical or quantitative, identify the population and the sample, and distinguish an observational study from an experiment.
What is statistics? It's a scientific discipline focused on working with data: collection, organization, analysis, interpretation, and presentation.
Purpose: Make informed decisions based on data.
Statistical Problem-Solving Process:
Ask Questions
Collect/Consider Data
Analyze Data
Interpret Results
Core Concepts:
Individuals: Entities described in a dataset (e.g., a student, a tree, a car).
Variable: An attribute that takes different values for different individuals (e.g., height, major, color).
Types of Variables:
Categorical: Labels or groups (e.g., a student's major: "Math," "Biology"; eye color: "blue," "brown").
Quantitative: Numeric values that are quantities or measurements (e.g., height in inches; test score in points).
Data Structure: Rows are individuals, columns are variables.
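The row/column layout and the two variable types can be sketched in Python. This is a minimal illustration, assuming a made-up dataset of three students; the helper `classify_variable` is hypothetical and uses a rough rule (all-numeric means quantitative) that does not hold in every case.

```python
# Hypothetical dataset: each row is one individual (a student),
# each column (dictionary key) is one variable. Values are made up.
students = [
    {"name": "Ana", "major": "Math", "eye_color": "brown", "height_in": 64, "test_score": 88},
    {"name": "Ben", "major": "Biology", "eye_color": "blue", "height_in": 70, "test_score": 75},
    {"name": "Chloe", "major": "Math", "eye_color": "green", "height_in": 66, "test_score": 92},
]


def classify_variable(values):
    """Return 'quantitative' if every value is a number, else 'categorical'.

    Rough heuristic for this sketch only: numeric codes such as ZIP codes
    are numbers but should still be treated as categorical.
    """
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"
    return "categorical"


# 'name' only identifies each individual, so it is not classified here.
variables = [key for key in students[0] if key != "name"]
types = {var: classify_variable([row[var] for row in students]) for var in variables}

for var, kind in types.items():
    print(f"{var}: {kind}")
```

Here `major` and `eye_color` come out categorical, while `height_in` and `test_score` come out quantitative.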
1.2 Populations, Samples, and Study Designs
Goal: Make inferences about a population.
Population: The entire group of interest (e.g., all students at a university).
Census: Data from every individual in the population.
Sample: A subset of individuals from the population (e.g., the students interviewed at the university).
Purpose of Sampling: To draw conclusions about the population when a census is impractical (e.g., too costly or too time-consuming).
Observational Studies vs. Experiments:
Observational Study: Observe and measure variables without influencing responses.
Examples: Surveying opinions on a new policy; tracking medical histories of patients.
Experiment: Deliberately impose treatments to measure a response.
Examples: Applying different fertilizers to plants to measure growth; testing different drug dosages.
Important: Observational studies can describe associations but typically cannot establish causation. Experiments aim to determine causal effects.
1.2 Sampling: Good and Bad (Bias, Randomness, and Sampling Methods)
Goals: Describe how convenience and voluntary response sampling lead to bias, explain how random sampling avoids bias, and describe other sources of bias.
Bias: A study design is biased if it systematically overestimates or underestimates the quantity of interest.
Sampling Methods:
Voluntary Response Sampling: Individuals select themselves (self-select) to participate.
Example: An online poll where people choose whether to vote on a housing preference. This leads to bias because people with strong opinions are the most likely to respond.
Convenience Sampling: Individuals chosen because they are easy to reach.
Example: Asking students walking past a dining hall about their housing preference. Students who pass that spot may differ systematically from the student body as a whole.
Avoiding Bias: Use random sampling.
Random Sample: Individuals selected by a chance process (e.g., drawing names from a hat).
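A small simulation can make the contrast between voluntary response and random sampling concrete. This is a sketch under hypothetical assumptions: a population of 10,000 students in which exactly 30% prefer on-campus housing, and made-up response rates for the online poll (50% for students who prefer on-campus housing, 10% for everyone else).

```python
import random

random.seed(1)  # fixed seed so reruns give the same numbers

# Hypothetical population: 10,000 students, exactly 30% prefer on-campus housing.
# 1 = prefers on-campus housing, 0 = does not.
population = [1] * 3_000 + [0] * 7_000
true_share = sum(population) / len(population)  # the population parameter: 0.30

# Voluntary response with ASSUMED response rates: students who prefer on-campus
# housing answer the online poll 50% of the time, everyone else only 10%.
RESPONSE_RATE = {1: 0.50, 0: 0.10}
volunteers = [p for p in population if random.random() < RESPONSE_RATE[p]]
volunteer_share = sum(volunteers) / len(volunteers)

# Simple random sample of the same size, chosen by chance alone.
srs = random.sample(population, k=len(volunteers))
srs_share = sum(srs) / len(srs)

print(f"True share:         {true_share:.3f}")
print(f"Voluntary response: {volunteer_share:.3f} (n = {len(volunteers)})")
print(f"Random sample:      {srs_share:.3f} (n = {len(srs)})")
```

With these assumed rates, the voluntary responses badly overstate the 30% share (the expected share is about 1,500 / 2,200, roughly 0.68), while the random sample of the same size lands near 0.30. Making the voluntary sample larger would not help: bias comes from how individuals are selected, not from how many are selected.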
Types of Bias:
Undercoverage: Some members of the population are less likely to be chosen or are left out of the sampling process entirely (e.g., a phone survey that calls only landlines, missing cell-only users).
Nonresponse: Chosen individuals cannot be contacted or refuse to participate (e.g., mail-in surveys with low return rates).
Response Bias: Inaccurate responses caused by the survey design or other factors (e.g., poorly worded questions, or social desirability bias on sensitive topics).
1.3 Simple Random Sampling (SRS) and Sampling Variability
Simple Random Sample (SRS): Every possible sample of size $n$ has an equal chance of being selected from the population. Sampling is typically done without replacement.
Example 1: Drawing names randomly from a hat containing all population names.
Example 2: Numbering everyone from 1 to $N$, then randomly selecting $n$ distinct numbers.
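Numbering-and-drawing maps directly onto Python's standard library: `random.sample` draws without replacement, so every set of $n$ distinct numbers is equally likely. The eight-student roster below is hypothetical.

```python
import random

# Hypothetical population of N = 8 students.
roster = ["Ana", "Ben", "Chloe", "Dev", "Eli", "Fay", "Gus", "Hana"]
N = len(roster)
n = 3

# Number everyone from 1 to N, then draw n distinct numbers by chance.
# random.sample selects without replacement, so no number can repeat.
chosen_numbers = random.sample(range(1, N + 1), k=n)
srs = [roster[i - 1] for i in chosen_numbers]  # map numbers back to names

print(sorted(chosen_numbers), srs)
```

Each run prints a different set of three names; no student can appear twice in one sample.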
Sampling Variability (Sampling Error): Different random samples of the same size yield different estimates due to chance.
Relationship with Sample Size ($n$): As $n$ increases, sampling variability decreases, so estimates become more precise and tend to fall closer to the population parameter.
Intuition: Larger samples give more consistent results.
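Mathematical Intuition (for the sample mean): The standard error ($SE$) decreases as $n$ increases; it is often expressed as $SE = \sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation.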
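The shrinking spread can be checked by simulation. This sketch assumes a hypothetical population of 5,000 test scores drawn from a normal distribution (mean 75, SD 10), takes 2,000 simple random samples at each of several sizes, and compares the standard deviation of the sample means with $\sigma / \sqrt{n}$.

```python
import random
import statistics

random.seed(2)

# Hypothetical population: 5,000 test scores, roughly normal (mean 75, SD 10).
population = [random.gauss(75, 10) for _ in range(5_000)]
sigma = statistics.pstdev(population)  # population standard deviation


def spread_of_sample_means(n, reps=2_000):
    """Take `reps` simple random samples of size n; return the SD of their means."""
    means = [statistics.fmean(random.sample(population, k=n)) for _ in range(reps)]
    return statistics.stdev(means)


results = {}
for n in (10, 40, 160):
    results[n] = spread_of_sample_means(n)
    print(f"n = {n:3d}: SD of sample means = {results[n]:.2f}, "
          f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}")
```

Quadrupling $n$ should roughly halve the spread. Because the samples are drawn without replacement from a finite population, the simulated values run very slightly below $\sigma / \sqrt{n}$; the difference is negligible when $n$ is small relative to $N$.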
1.4 Quick Connections to Practice and Real-World Relevance
Summary: Statistics helps identify who to study (population), how to collect data (sampling design), and how to interpret results (inference).
Causality: Observational studies describe but usually can't establish causation. Well-designed experiments, in which the researcher assigns the treatments (ideally at random), can establish causation.
Real-World Impact: Concepts of sampling, bias, and sample size are vital in polls, surveys, market research, and clinical trials.
Key Formulas/Notations:
Population size: $N$
Sample size: $n$
SRS Principle: Every subset of size $n$ from the population has the same probability of being selected, $1 / \binom{N}{n}$.
Sampling without replacement: Each individual can be chosen at most once.
Conceptual SE for simple cases: $SE = \sigma / \sqrt{n}$ (standard error of the sample mean)
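For a small population, the SRS principle can be checked directly. This sketch assumes a hypothetical population of $N = 5$ labels and samples of size $n = 2$: `math.comb` counts the $\binom{5}{2} = 10$ possible samples, and repeated calls to `random.sample` should produce each of them about 10% of the time.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical small population (N = 5) and sample size n = 2.
population = ["A", "B", "C", "D", "E"]
N, n = len(population), 2

num_samples = math.comb(N, n)  # number of distinct samples of size n
print(num_samples, 1 / num_samples)  # → 10 0.1

# Simulation check: draw many SRSs and count how often each subset appears.
random.seed(3)
reps = 50_000
counts = Counter(frozenset(random.sample(population, k=n)) for _ in range(reps))
all_subsets = {frozenset(c) for c in itertools.combinations(population, n)}

for subset in sorted(all_subsets, key=sorted):
    print("".join(sorted(subset)), round(counts[subset] / reps, 3))
```

Every one of the 10 possible pairs shows up, each with a relative frequency close to 0.1.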
A dot plot is a simple type of data display that shows data points as dots above a number line. It's used to visualize the distribution of a small dataset, especially for quantitative variables. Each dot represents a single observation from the dataset.
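A rough text version of a dot plot can be drawn without plotting libraries. This is a minimal sketch, assuming a small hypothetical dataset of whole-number values (number of siblings for 12 students); the helper `text_dot_plot` is illustrative, not a standard function.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 12 students.
data = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2, 4, 1]


def text_dot_plot(values):
    """Return the lines of a text dot plot: one '•' per observation, stacked above its value."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    lines = []
    # Build rows from the top of the tallest stack down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" •  " if counts[x] >= level else "    " for x in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * (4 * (hi - lo + 1)))                      # the number line
    lines.append("".join(f"{x:^4}" for x in range(lo, hi + 1)))  # value labels under it
    return lines


print("\n".join(text_dot_plot(data)))
```

Each column of dots sits above its value, so the tallest stacks (at 1 and 2 siblings) show where the data cluster.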
In statistics, a bimodal distribution is a distribution with two distinct peaks, indicating two different modes or common values in a dataset.
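One simple way to spot the peaks of a distribution is to look for values whose count is higher than the counts on either side. The sketch below assumes made-up 1–10 ratings for a polarizing movie, and the helper `find_peaks` is hypothetical; with real, noisy data, small bumps can also qualify, so the result should be checked against a plot.

```python
from collections import Counter

# Hypothetical 1-10 ratings of a polarizing movie: many low and many high scores.
ratings = [1] * 5 + [2] * 3 + [3] + [5] + [6] + [7] * 2 + [8] * 3 + [9] * 6 + [10] * 4


def find_peaks(values, lo, hi):
    """Return values in [lo, hi] whose count is strictly greater than both neighbors' counts.

    Values just outside the range count as 0. The strict comparison is a
    simplifying choice for this sketch: a flat-topped peak would be missed.
    """
    counts = Counter(values)
    return [x for x in range(lo, hi + 1)
            if counts[x] > counts[x - 1] and counts[x] > counts[x + 1]]


peaks = find_peaks(ratings, 1, 10)
print(peaks)  # → [1, 9]
```

Two separate peaks (at 1 and 9) are the signature of a bimodal distribution; in this made-up example they reflect two groups of viewers with opposite reactions.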