Week 1 Notes: Data Collection & Sampling (1.1–1.3)

1.1 Introduction to Data Collection

  • Course Context (MATH 1401, Week 1): Focuses on Orientation, Practice Assignment, and Academic Dishonesty Quiz in iCollege.

  • Goals: Identify individuals and variables, classify variables (categorical/quantitative), identify population/sample, distinguish observational study/experiment.

  • What is statistics? It's a scientific discipline focused on working with data: collection, organization, analysis, interpretation, and presentation.

    • Purpose: Make informed decisions based on data.

  • Statistical Problem-Solving Process:

    1. Ask Questions

    2. Collect/Consider Data

    3. Analyze Data

    4. Interpret Results

  • Core Concepts:

    • Individuals: Entities described in a dataset (e.g., a student, a tree, a car).

    • Variable: An attribute that takes different values for different individuals (e.g., height, major, color).

    • Types of Variables:

    • Categorical: Labels or groups (e.g., a student's major: "Math," "Biology"; eye color: "blue," "brown").

    • Quantitative: Numeric values (quantities/measurements) (e.g., height: 65 inches; test score: 85 points).

    • Data Structure: Rows are individuals, columns are variables.
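To make the rows-and-columns idea concrete, here is a minimal sketch in Python. The dataset, the names in it, and the `classify_variable` helper are all made up for illustration. The helper uses a rough rule of thumb (numeric values are treated as quantitative), which can mislead for numeric codes such as ZIP codes that are really categorical labels.

```python
# Hypothetical toy dataset: each dict is one individual (a row),
# and each key is a variable (a column).
students = [
    {"name": "Ana", "major": "Math", "height_in": 65, "test_score": 85},
    {"name": "Ben", "major": "Biology", "height_in": 70, "test_score": 78},
    {"name": "Cy", "major": "Math", "height_in": 62, "test_score": 91},
]


def classify_variable(values):
    """Rule of thumb only: all-numeric values -> quantitative, else categorical."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "categorical"


# Classify every column by looking at its values across all individuals.
variable_types = {
    var: classify_variable([row[var] for row in students])
    for var in students[0]
}

for var, kind in variable_types.items():
    print(f"{var}: {kind}")
```

Here the three dicts are the individuals, and `major`, `height_in`, and `test_score` are variables measured on each of them.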

1.2 Populations, Samples, and Study Designs

  • Goal: Make inferences about a population.

  • Population: The entire group of interest (e.g., all students at a university).

  • Census: Data from every individual in the population.

  • Sample: A subset of individuals from the population (e.g., 100 students interviewed from the university).

  • Purpose of Sampling: To infer about the population when a census is impractical (due to cost, time).

  • Observational Studies vs. Experiments:

    • Observational Study: Observe and measure variables without influencing responses.

    • Examples: Surveying opinions on a new policy; tracking medical histories of patients.

    • Experiment: Deliberately impose treatments to measure a response.

    • Examples: Applying different fertilizers to plants to measure growth; testing different drug dosages.

    • Important: Observational studies describe, but typically can't establish causation. Experiments aim to determine causal effects.
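The difference between a census and a sample can be sketched with a small simulation. Everything below is assumed for illustration: the population of 5,000 student heights is randomly generated, not real data, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (in inches) of all 5,000 students.
population = [random.gauss(67, 4) for _ in range(5000)]

# Census: measure every individual in the population.
census_mean = statistics.mean(population)

# Sample: measure only 100 individuals, chosen by chance.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

The sample mean is used to infer the population mean. It will usually be close to the census value but not identical to it.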

1.2 Sampling: Good and Bad (Bias, Randomness, and Sampling Methods)

  • Goals: Describe how convenience and voluntary response sampling lead to bias, explain how random sampling avoids bias, describe other bias sources.

  • Bias: Occurs when the design of a study systematically overestimates or underestimates a quantity of interest.

  • Sampling Methods:

    • Voluntary Response Sampling: Individuals self-select into the sample.

    • Example: An online poll where people choose to vote on a housing preference. Leads to bias because mostly people with strong opinions respond.

    • Convenience Sampling: Individuals chosen because they are easy to reach.

    • Example: Asking students passing by a dining hall about housing preference. May not represent all students.

  • Avoiding Bias: Use random sampling.

    • Random Sample: Individuals selected by a chance process (e.g., drawing names from a hat).

  • Types of Bias:

    • Undercoverage: Some population members are less likely to be chosen, or are excluded entirely (e.g., a phone survey that only calls landlines, ignoring cell-only users).

    • Nonresponse: Chosen individuals cannot be contacted or refuse to participate (e.g., mail-in surveys with low return rates).

    • Response Bias: Inaccurate responses due to survey design or other factors (e.g., poorly worded questions, social desirability bias on sensitive topics).
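The contrast between convenience sampling and random sampling can be illustrated with a rough simulation of the housing-preference example. All numbers here are assumptions chosen for illustration: 30% of students live on campus, residents prefer on-campus housing 80% of the time, commuters 30% of the time, and the dining-hall crowd is modeled as on-campus residents only.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 students (all rates below are assumed).
population = []
for _ in range(10_000):
    on_campus = random.random() < 0.30
    prefers = random.random() < (0.80 if on_campus else 0.30)
    population.append({"on_campus": on_campus, "prefers_on_campus": prefers})


def share_preferring(group):
    """Proportion of a group that prefers on-campus housing."""
    return sum(s["prefers_on_campus"] for s in group) / len(group)


true_share = share_preferring(population)

# Convenience sample: only students near the dining hall (assumed residents).
dining_hall_crowd = [s for s in population if s["on_campus"]]
convenience_estimates = [
    share_preferring(random.sample(dining_hall_crowd, 100)) for _ in range(200)
]

# Random sample: every student has the same chance of being chosen.
random_estimates = [
    share_preferring(random.sample(population, 100)) for _ in range(200)
]

convenience_avg = statistics.mean(convenience_estimates)
random_avg = statistics.mean(random_estimates)

print(f"true share:             {true_share:.2f}")
print(f"convenience sample avg: {convenience_avg:.2f}")
print(f"random sample avg:      {random_avg:.2f}")
```

Under these assumptions the convenience estimates are too high on average (a systematic error, which is bias), while the random-sample estimates scatter around the true share.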

1.3 Simple Random Sampling (SRS) and Sampling Variability

  • Simple Random Sample (SRS): Each sample of size n has an equal chance of being selected from the population. Sampling is typically done without replacement.

    • Example 1: Drawing 50 names randomly from a hat containing all population names.

    • Example 2: Numbering everyone 1 to N, then randomly selecting n numbers.

  • Sampling Variability (Sampling Error): Different random samples of the same size yield different estimates due to chance.

    • Relationship with Sample Size (n): As n increases, sampling variability decreases, leading to more precise and consistent estimates closer to the population parameter.

    • Intuition: Larger samples give more consistent results.

    • Mathematical Intuition (for the mean): The standard error $SE(\bar{X})$ decreases as n increases; it is often expressed as $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
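The ideas in this section can be checked with a short simulation: draw many simple random samples (without replacement) at several sample sizes and compare the spread of the sample means with σ/√n. The population of 20,000 test scores is simulated, and the sample sizes and repeat count are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 20,000 test scores.
population = [random.gauss(75, 10) for _ in range(20_000)]
sigma = statistics.pstdev(population)  # population standard deviation


def spread_of_sample_means(n, repeats=500):
    """Draw `repeats` SRSs of size n (without replacement) and return
    the standard deviation of their sample means."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)


results = {}
for n in (25, 100, 400):
    observed = spread_of_sample_means(n)
    predicted = sigma / math.sqrt(n)
    results[n] = (observed, predicted)
    print(f"n={n:>3}  observed spread={observed:.2f}  sigma/sqrt(n)={predicted:.2f}")
```

`random.sample` picks without replacement, so no individual appears twice in a sample. The observed spread should shrink as n grows and track σ/√n closely, since each n here is small relative to N.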

1.4 Quick Connections to Practice and Real-World Relevance

  • Summary: Statistics helps identify who to study (population), how to collect data (sampling design), and how to interpret results (inference).

  • Causality: Observational studies describe but usually can't establish causation. Experiments can establish causation by applying treatments.

  • Real-World Impact: Concepts of sampling, bias, and sample size are vital in polls, surveys, market research, and clinical trials.

  • Key Formulas/Notations:

    • Population size: N

    • Sample size: n

    • SRS Principle: Each subset of size n from the population has equal probability of being selected.

    • Sampling without replacement: Each individual can be chosen at most once.

    • Conceptual SE for simple cases: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$, where σ is the population standard deviation and n is the (effective) sample size.

A dot plot is a simple type of data display that shows data points as dots above a number line. It's used to visualize the distribution of a small dataset, especially for quantitative variables. Each dot represents a single observation from the dataset.
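As a rough illustration, a dot plot can be drawn in plain text by stacking one mark per observation above each value on the number line. The quiz scores and the `dot_plot` helper below are made up for this sketch, which assumes small integer-valued data.

```python
from collections import Counter

# Hypothetical small dataset: quiz scores (quantitative) for 12 students.
scores = [7, 8, 8, 9, 6, 8, 10, 7, 9, 8, 7, 9]


def dot_plot(values):
    """Return a text dot plot: one '*' per observation, stacked above
    its value on a number line running from min to max."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top level of dots down to the bottom level.
    for level in range(tallest, 0, -1):
        row = " ".join(" * " if counts[v] >= level else "   " for v in axis)
        rows.append(row.rstrip())
    # The last row is the number line itself.
    rows.append(" ".join(f"{v:^3}" for v in axis).rstrip())
    return "\n".join(rows)


print(dot_plot(scores))
```

Each `*` is a single observation, so the height of a column shows how often that value occurs.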

In statistics, a bimodal distribution is a distribution with two distinct peaks, indicating two different modes or common values in a dataset.