Introduction to Data Collection and Sampling Methods

Overview of Data Collection and Sampling

  • This week's focus is on techniques for collecting good data and the methods involved.

  • Presenter refers to a diagram as a foundational overview of the course.

  • Previous topics discussed included critiquing research and constructing one’s own research based on frameworks from the course, particularly the data sleuthing framework.

Importance of Sampling

  • Population Parameters:

    • These cannot be computed exactly unless a complete census is conducted.

    • If a census isn’t feasible, a sample must be taken instead.

  • The goal is to ensure that the sample is as representative of the population as possible.

Data Collection Methods

  • The session focuses on methods for collecting good data, especially the sampling methods in Component Three of the data sleuthing framework, which concerns the individuals surveyed and how they were selected.

Census Examples from New Zealand

  • The New Zealand government conducts a census every five years, although unforeseen circumstances such as natural disasters (e.g., the Christchurch earthquakes, Cyclone Gabrielle in 2023) have caused delays.

  • The presenter encourages reviewing the results of the census via a link to Stats New Zealand, which includes demographic and household income statistics.

Changes to Census Methodology

  • Discussion of the 2018 New Zealand census, which was conducted primarily online and was criticized for its poor response rates.

  • Current statistics minister Shane Reti has proposed that future population estimates rely solely on administrative data instead of traditional census methods. The presenter criticizes this approach because it would leave no solid baseline for accurate data analysis.

Challenges and Alternatives to Census

  • The presenter acknowledges that some communities, especially Māori, are reluctant to fill out census forms because they distrust the government.

  • Mention of successful localized census efforts by Māori communities, which resulted in significantly higher response rates (from under 50% to around 85-90%).

Sampling and Sampling Size Questions

  • Example of a hypothetical study involving 1,600 participants, where 24% claim to watch a particular YouTube channel.

  • Students are prompted to consider how close this percentage is to the true population percentage, leading to discussions about confidence levels in sampling methodologies.

Polling Organizations and Selection Methods

  • Several professional polling organizations, like Gallup and Ipsos, are cited.

  • The presenter emphasizes that how individuals are selected for these surveys is critical to the validity and reliability of the findings.

Types of Studies Overview

  • The upcoming lectures will cover various study types, including:

    • Randomized experiments

    • Observational studies

    • Case-control studies

    • Meta-analyses

    • Surveys

Importance of Representative Data

  • The presenter emphasizes throughout that representative data is essential in every type of study for producing valid and useful information.

Terminology in Statistics

  • Unit: A single individual or object to be measured; can also be referred to as a subject, though some communities prefer 'individual'.

  • Population: The entire collection of units. Populations can consist of people, objects, events, etc.; the corresponding population of measurements is the set of values that would be obtained if every unit were measured.

  • Sample: The subset of the population that is actually measured; it is used to estimate characteristics of the population.

  • Sampling Frame: A list of all units from which the sample is chosen; it should ideally include the entire population.

    • Historical context given where telephone books served as an effective sampling frame from the 1960s to 1990s.

  • Sample Survey: A method that involves collecting measurements on a subset of the population.

  • Census: A comprehensive survey where every unit in the population is measured.

Quiz on Survey Terminology

  • Participants identify the units, population, sampling frame, etc., using the example of the Bureau of Labor Statistics, which visits approximately 60,000 households to determine employment status.

Quiz Breakdown

  • Units: Adults in the labor force.

  • Population of Units: All adults in the labor force.

  • Population of Measurements: The employment status (employed or unemployed) of every adult in the labor force.

  • Sampling Frame: List of all known households.

  • Sample of Units: Adults from the 60,000 sampled households.

The Beauty of Good Sampling

  • The presenter discusses the mathematical elegance of proper sampling: a well-chosen sample estimates a population trait to within a known margin of error, and that margin depends on the sample size rather than on the size of the population.
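
The claim that population size barely matters can be checked with a small simulation. The sketch below is illustrative only: the true proportion (24%), the two population sizes, the number of trials, and the random seed are all assumptions chosen for the demonstration, and the margin used is the 1/sqrt(n) rule from the Margin of Error Concept section.

```python
import math
import random


def coverage(population_size, true_proportion, n, trials, rng):
    """Share of simulated samples whose estimate lands within 1/sqrt(n) of the truth."""
    # Hypothetical population: units 0 .. yes_count-1 have the trait, the rest do not.
    yes_count = round(population_size * true_proportion)
    margin = 1 / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw n distinct units at random (sampling without replacement).
        sample = rng.sample(range(population_size), n)
        estimate = sum(1 for unit in sample if unit < yes_count) / n
        if abs(estimate - true_proportion) <= margin:
            hits += 1
    return hits / trials


rng = random.Random(2024)  # fixed seed so the run is repeatable
results = {
    size: coverage(size, 0.24, 1600, 200, rng)
    for size in (10_000, 1_000_000)  # assumed population sizes
}
for size, share in results.items():
    print(f"population {size:>9,}: {share:.1%} of samples within the margin")
```

In a run like this, the large majority of samples should fall within the margin for both populations, even though one population is 100 times larger than the other.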

Margin of Error Concept

  • Defined: The margin of error quantifies the potential discrepancy between a sample result and the true population parameter.

  • Formula: $\text{Margin of Error} = \frac{1}{\sqrt{n}}$, where $n$ is the sample size.

  • Emphasized that larger sample sizes lead to smaller margins of error due to more reliable representation of the population.
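
To make the formula concrete, here is a minimal Python sketch that evaluates 1/sqrt(n) for a few sample sizes. The function name `margin_of_error` and the sample sizes other than 1,600 are illustrative choices, not taken from the lecture.

```python
import math


def margin_of_error(n: int) -> float:
    """Conservative margin of error, 1/sqrt(n), expressed as a proportion."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)


# Illustrative sample sizes; each one is four times the previous.
for n in (100, 400, 1600, 6400):
    print(f"n = {n:>4}: margin of error = {margin_of_error(n):.2%}")
```

Because the sample size sits under a square root, quadrupling the sample only halves the margin of error, which is why very large samples bring diminishing returns.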

Practical Examples

  • With a sample of 1,600, results are typically accurate to within a margin of error of 2.5 percentage points, since 1/√1,600 = 1/40 = 0.025.

    • Example: in a survey where 55% of respondents support a proposal, the population percentage in support is estimated to lie between 52.5% and 57.5% (55% ± 2.5%), illustrating how sample statistics inform population parameters.
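
The interval in this example can be reproduced with a few lines of Python. This is a sketch under the lecture's 1/sqrt(n) rule, which is usually read as an approximate 95% interval; the function name `approximate_interval` is hypothetical.

```python
import math


def approximate_interval(sample_proportion: float, n: int) -> tuple[float, float]:
    """Sample proportion plus or minus 1/sqrt(n), clipped to the range [0, 1]."""
    margin = 1 / math.sqrt(n)
    lower = max(0.0, sample_proportion - margin)
    upper = min(1.0, sample_proportion + margin)
    return lower, upper


# The lecture's example: 55% support in a sample of 1,600 respondents.
low, high = approximate_interval(0.55, 1600)
print(f"Estimated population support: {low:.1%} to {high:.1%}")
```

The same function applies to any sample percentage and sample size, for instance the 24% and 1,600 participants in the YouTube example earlier in these notes.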

Real-World Applications

  • The margin of error is frequently encountered in media reports of polls during election seasons.

  • An example about measuring teen drug use illustrates how sample size and the associated margin of error work in practice.

Key Considerations in Data Collection

  • There are good reasons to choose a sample over a census, including:

    • Destructive testing, where measuring a unit destroys or irreparably alters it.

    • Efficiency in time and cost when dealing with large populations.

Upcoming Focus on Sampling Methods

  • The next lecture will cover specific sampling methods, starting with probability-based sampling, in which every unit has a known chance of being included in the sample. The focus will be on the simple random sample, in which every unit has an equal chance of selection; it requires a sampling frame and a method of generating random numbers.
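
As a preview, a simple random sample can be sketched in Python using the standard library's random number generator. The sampling frame below (1,000 made-up household IDs), the sample size, and the seed are all hypothetical values used only for illustration.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct units so that every unit in the frame is equally likely to be chosen."""
    if n > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(sampling_frame, n)


# Hypothetical sampling frame: a list of 1,000 household identifiers.
frame = [f"household-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

The quality of the sample still depends on the frame: units missing from the list can never be selected, no matter how the random numbers are generated.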