Populations vs Samples

GY1421 Working with Geographic Information

Week 4: Overview

  • Topics covered in Week 4:

    • Populations versus Samples

    • What to do with Quantitative Data

    • Confidence Intervals Introduction


What to Do with Quantitative Data

  • Steps in managing quantitative data:

    1. Collect and Pre-process Data: Gather data and prepare it for analysis.

    2. Descriptive Statistics: Summarize and describe the main features of the data.

    3. Test for Differences: Analyze data to identify any significant differences.

    4. Look for Relationships: Explore correlations and relationships among variables.


Population versus Sample

Key Definitions

  • Population: The entire group of individuals or items about which we wish to draw conclusions.

  • Sample: A subset of the population from which we actually collect data to infer characteristics about the whole population.

  • Sampling Design: The method used to select a sample from the population.

Importance of Sampling

  • It is often impractical to collect data from every individual in a population due to time, cost, or feasibility (e.g., global temperature data, specific health demographics).

  • A well-chosen sample can allow for inferences about the broader population.


Key Characteristics

Population vs. Sample

  • Population:

    • The measurable characteristic is called a parameter.

    • Includes all members of the specified group.

    • Reports reflect the whole group directly, with no sampling error.

    • Complete set of data.

  • Sample:

    • The measurable characteristic is called a statistic.

    • A subset that represents the population.

    • Reports have a margin of error and a confidence interval.

    • Must be representative of the entire population.

Statistical Measures

  • Population Mean (μ): The average of the entire population.

  • Sample Mean (X̄): The average of a sample selected from the population.

  • Population Variance (σ²) and Sample Variance (s²): Measures of data variability in the population and sample, respectively.
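
A minimal sketch of the parameter/statistic distinction, using Python's standard library. The population here is made-up data (an assumption for illustration only). Note that the population variance divides by N, while the sample variance divides by n - 1.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 made-up measurements
population = [random.gauss(50, 10) for _ in range(1000)]

# Parameters: computed from every member of the population
mu = statistics.mean(population)
sigma_sq = statistics.pvariance(population)  # divides by N

# Statistics: computed from a random sample of 30 members
sample = random.sample(population, 30)
x_bar = statistics.mean(sample)
s_sq = statistics.variance(sample)  # divides by n - 1

print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
print(f"population variance = {sigma_sq:.2f}, sample variance = {s_sq:.2f}")
```

The sample values will usually be close to, but not equal to, the population values; that gap is what confidence intervals are meant to express.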


Efficient Sampling

  • Sampling can be an efficient strategy, but the sample must be representative of the population (unbiased) for the conclusions to be valid.


Sampling Design

Methods of Sampling

  • Random Sampling:

    • Use a random number generator (e.g., in Excel or R) to select units, so that every member of the population has an equal chance of being chosen.

  • Stratified Random Sampling:

    • Classifying the population into strata (groups) and then randomly selecting from these strata to ensure representation.
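
The two methods above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed workflow: the population of survey sites, the stratum names, and both function names are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 survey sites, each tagged with a stratum
population = (
    [("urban", i) for i in range(20)]
    + [("rural", i) for i in range(50)]
    + [("coastal", i) for i in range(30)]
)

def simple_random_sample(units, n):
    """Draw n units without replacement; every unit is equally likely."""
    return random.sample(units, n)

def stratified_random_sample(units, fraction):
    """Group units by stratum, then randomly sample the same fraction of each."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[0], []).append(unit)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        chosen.extend(random.sample(members, k))
    return chosen

srs = simple_random_sample(population, 10)
strat = stratified_random_sample(population, 0.1)

print("simple random:", sorted(srs))
print("stratified:   ", sorted(strat))
```

A simple random sample of 10 may, by chance, contain few or no urban sites. The stratified version guarantees each stratum appears in proportion to its size.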

Considerations in Sampling

  • Use of random coordinates can help achieve randomness in sampling.

  • Be aware that purely random samples can cluster by chance, leaving parts of the study area under-sampled; techniques such as systematic sampling (selecting at regular intervals) may help mitigate this.
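
A rough sketch contrasting random coordinates with a systematic grid. The 100 m x 100 m study area, the grid spacing, and the function names are all assumptions chosen for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical study area: a 100 m x 100 m square
X_MIN, X_MAX, Y_MIN, Y_MAX = 0.0, 100.0, 0.0, 100.0

def random_points(n):
    """n sampling locations with independently random x and y coordinates."""
    return [
        (random.uniform(X_MIN, X_MAX), random.uniform(Y_MIN, Y_MAX))
        for _ in range(n)
    ]

def systematic_points(spacing):
    """Locations on a regular grid, shifted by a random start offset."""
    x0 = random.uniform(0, spacing)
    y0 = random.uniform(0, spacing)
    points = []
    x = X_MIN + x0
    while x < X_MAX:
        y = Y_MIN + y0
        while y < Y_MAX:
            points.append((x, y))
            y += spacing
        x += spacing
    return points

scattered = random_points(25)   # may cluster by chance
grid = systematic_points(20.0)  # evenly spread over the area

print(len(scattered), "random points;", len(grid), "grid points")
```

Plotting both sets of points would show the difference: the random points tend to leave gaps and clumps, while the grid covers the area evenly.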


Sample Selection Guidelines

Do’s and Don’ts

Guidelines
  • Every element in a sample must be part of the defined population.

  • The sample should accurately represent the population.

  • Samples from the same population should be independent.

Violations
  • Including individuals who fall outside the defined population (e.g., people outside the target age range).

  • Non-representative sampling methods leading to skewed results.
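
The first guideline can be checked mechanically before analysis. The sketch below assumes a hypothetical population definition (UK residents aged 18 to 65) and made-up respondents; the field names are illustrative only.

```python
# Hypothetical population definition: UK residents aged 18 to 65
MIN_AGE, MAX_AGE = 18, 65

def in_population(person):
    """Return True if the person meets the (assumed) population definition."""
    return person["country"] == "UK" and MIN_AGE <= person["age"] <= MAX_AGE

# Made-up respondents, for illustration only
respondents = [
    {"id": 1, "country": "UK", "age": 34},
    {"id": 2, "country": "UK", "age": 17},  # too young
    {"id": 3, "country": "FR", "age": 40},  # outside the UK
    {"id": 4, "country": "UK", "age": 65},
]

valid = [p for p in respondents if in_population(p)]
excluded = [p for p in respondents if not in_population(p)]

print("kept ids:", [p["id"] for p in valid])
print("excluded ids:", [p["id"] for p in excluded])
```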


Confidence Intervals Introduction

  • The sample mean is an estimate of the population mean, so it may not match the true value exactly.

  • To express uncertainty, confidence intervals provide a range of values instead of a single estimate.

Reporting Confidence Intervals

  • Typically reported as 95% confidence intervals to indicate uncertainty.

    • Example: The mean height of cis men in the UK is 175 cm ± 6.2 cm, giving a 95% confidence interval from 168.8 cm to 181.2 cm.
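
As a sketch of how such an interval can be computed from a sample: take the sample mean plus or minus a multiplier times the standard error (s / √n). The heights below are made-up values, and the multiplier 1.96 assumes a normal approximation; for a sample this small, a t-distribution multiplier would usually be more appropriate.

```python
import math
import statistics

# Hypothetical sample of heights in cm (made-up values, for illustration only)
heights = [172.1, 168.4, 181.0, 175.3, 169.8, 178.6, 174.2, 171.5, 180.3, 176.9]

n = len(heights)
x_bar = statistics.mean(heights)   # sample mean
s = statistics.stdev(heights)      # sample standard deviation (n - 1)
standard_error = s / math.sqrt(n)

# 1.96 is the normal-distribution multiplier for 95% confidence (an assumption;
# a t multiplier is the usual choice for small samples)
z = 1.96
margin = z * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.1f} cm, margin = ±{margin:.1f} cm")
print(f"95% CI: {lower:.1f} cm to {upper:.1f} cm")
```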


Narrowing the Confidence Interval

  • To achieve narrower intervals, consider:

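    1. Collecting a larger sample: the standard error shrinks as the sample size grows, so the interval tightens.

    2. Lowering the stated confidence level (e.g., 90% instead of 95%): the range is tighter, but we are less confident that it contains the population mean.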
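
Both effects can be seen in a short sketch. It assumes a normal approximation and a hypothetical sample standard deviation of 10; `margin_of_error` is an illustrative helper, not a library function.

```python
import math
from statistics import NormalDist

def margin_of_error(s, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided multiplier
    return z * s / math.sqrt(n)

s = 10.0  # hypothetical sample standard deviation

# Effect of sample size at 95% confidence
for n in (25, 100, 400):
    print(f"n = {n:3d}: ±{margin_of_error(s, n):.2f}")

# Effect of confidence level at n = 100
for level in (0.99, 0.95, 0.90):
    print(f"{level:.0%} confidence: ±{margin_of_error(s, 100, level):.2f}")
```

Because the margin depends on √n, quadrupling the sample size only halves the interval width, so gains from collecting more data diminish.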