Populations vs Samples
GY1421 Working with Geographic Information
Week 4: Overview
Topics covered in Week 4:
Populations versus Samples
What to Do with Quantitative Data
Confidence Intervals Introduction
What to Do with Quantitative Data
Steps in managing quantitative data:
Collect and Pre-process Data: Gather data and prepare it for analysis.
Descriptive Statistics: Summarize and describe the main features of the data.
Test for Differences: Analyze data to identify any significant differences.
Look for Relationships: Explore correlations and relationships among variables.
Population versus Sample
Key Definitions
Population: The entire group of individuals or items about which we wish to draw conclusions.
Sample: A subset of the population from which we actually collect data to infer characteristics about the whole population.
Sampling Design: The method used to select a sample from the population.
Importance of Sampling
It is often impractical to collect data from every individual in a population due to time, cost, or feasibility (e.g., global temperature data, specific health demographics).
A well-chosen sample allows us to make inferences about the broader population.
Key Characteristics
Population vs. Sample
| Population | Sample |
|---|---|
| The measurable characteristic is called a parameter. | The measurable characteristic is called a statistic. |
| Includes all members of the specified group. | A subset that represents the population. |
| Reports give a true representation, with no sampling error. | Reports have a margin of error and a confidence interval. |
| A complete set of data. | Must be representative of the entire population. |
Statistical Measures
Population Mean (μ): The average of the entire population.
Sample Mean (X̄): The average of a sample selected from the population.
Population Variance (σ²) and Sample Variance (s²): Measures of data variability in the population and sample, respectively.
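To make the parameter/statistic distinction concrete, here is a minimal sketch using Python's standard `statistics` module. The rainfall values are hypothetical, invented only for illustration. Note that `pvariance` divides by N (population variance, σ²), while `variance` divides by n − 1 (sample variance, s²), which corrects for a sample tending to underestimate the population's spread.

```python
import statistics

# Hypothetical data: annual rainfall (mm) at every weather station in a small
# study region, treated here as the entire population
population = [812, 790, 845, 760, 905, 830, 778, 869, 801, 850]

# A sample of four of those stations
sample = [812, 760, 905, 801]

mu = statistics.mean(population)           # population mean, μ (a parameter)
sigma2 = statistics.pvariance(population)  # population variance, σ² (divides by N)

x_bar = statistics.mean(sample)            # sample mean, x̄ (a statistic)
s2 = statistics.variance(sample)           # sample variance, s² (divides by n - 1)

print(f"mu = {mu:.1f}, sigma^2 = {sigma2:.1f}")
print(f"x_bar = {x_bar:.1f}, s^2 = {s2:.1f}")
```

Here the sample mean (819.5 mm) differs from the population mean (824.0 mm): the statistic only estimates the parameter.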
Efficient Sampling
Sampling can be an efficient strategy, but the sample must be representative (unbiased) for its results to be valid.
Sampling Design
Methods of Sampling
Random Sampling:
Using a random number generator (e.g. in Excel or R) so that every member of the population has an equal chance of being selected, giving an unbiased sample.
Stratified Random Sampling:
Dividing the population into strata (groups) and then selecting randomly within each stratum, so that every group is represented.
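The two methods can be sketched in Python with the standard `random` module (the course mentions Excel and R; Python is used here only for illustration). The site names, land-cover strata and sample size are all hypothetical. The stratified version uses proportional allocation, one common choice among several.

```python
import random

# Hypothetical population: 100 sites, each tagged with a land-cover stratum
sites = [(f"site_{i:03d}", "urban" if i < 20 else "farmland" if i < 70 else "woodland")
         for i in range(100)]

rng = random.Random(42)  # fixed seed so the draw can be reproduced

# Simple random sampling: every site has the same chance of being chosen
simple_sample = rng.sample(sites, k=10)

def stratified_sample(population, k, rng):
    """Stratified random sampling with proportional allocation.

    Groups sites by stratum, then draws randomly within each stratum in
    proportion to its share of the population. Rounding means the total can
    drift slightly from k for other population or sample sizes.
    """
    strata = {}
    for site in population:
        strata.setdefault(site[1], []).append(site)
    sample = []
    for members in strata.values():
        n_stratum = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, n_stratum))
    return sample

stratified = stratified_sample(sites, k=10, rng=rng)
```

With these strata (20 urban, 50 farmland, 30 woodland), the stratified sample always contains 2, 5 and 3 sites respectively, whereas a simple random sample of 10 could, by chance, miss the urban sites entirely.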
Considerations in Sampling
Generating random coordinates within the study area is one way to choose random sampling locations.
Random samples can, by chance, cluster in parts of the study area; systematic sampling (e.g. locations on a regular grid) can help ensure even coverage.
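A minimal sketch contrasting random coordinates with a systematic grid, assuming a hypothetical rectangular study area in projected coordinates (metres). Real study areas are rarely rectangles; points falling outside an irregular boundary would need to be rejected, which is not shown here.

```python
import random

# Hypothetical rectangular study area, in metres (projected coordinates)
X_MIN, X_MAX = 0.0, 1000.0
Y_MIN, Y_MAX = 0.0, 500.0

def random_points(n, rng):
    """Random sampling: each point's x and y are drawn uniformly within the area."""
    return [(rng.uniform(X_MIN, X_MAX), rng.uniform(Y_MIN, Y_MAX)) for _ in range(n)]

def systematic_points(spacing):
    """Systematic sampling: points on a regular grid, offset half a spacing from the edge."""
    points = []
    y = Y_MIN + spacing / 2
    while y < Y_MAX:
        x = X_MIN + spacing / 2
        while x < X_MAX:
            points.append((x, y))
            x += spacing
        y += spacing
    return points

rng = random.Random(1)
random_pts = random_points(50, rng)
grid_pts = systematic_points(spacing=100.0)  # 10 columns x 5 rows = 50 points
```

Both sets contain 50 points, but only the grid guarantees that no part of the area is left unsampled; the random set may show gaps and clusters.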
Sample Selection Guidelines
Do’s and Don’ts
Guidelines
Every element in a sample must be part of the defined population.
The sample should accurately represent the population.
Samples from the same population should be independent.
Violations
Including individuals outside the defined population (e.g. people outside the specified age range).
Non-representative sampling methods leading to skewed results.
Confidence Intervals Introduction
A sample mean is only an estimate of the population mean and is unlikely to equal it exactly.
To express uncertainty, confidence intervals provide a range of values instead of a single estimate.
Reporting Confidence Intervals
Confidence intervals are typically reported at the 95% level.
Example: The mean height of cis men in the UK is 175 cm ± 6.2 cm, giving a 95% confidence interval from 168.8 cm to 181.2 cm.
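The interval in the example is simply mean ± margin. The notes do not say how the ±6.2 cm was obtained; one common way to get a margin from raw data is the normal approximation x̄ ± z·s/√n, with z ≈ 1.96 for 95%. The sketch below assumes that approximation (for small samples a t critical value is usually preferred) and uses hypothetical height data.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(data, level=0.95):
    """Normal-approximation CI for the mean: x̄ ± z * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    standard_error = stdev(data) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ≈ 1.96 for a 95% interval
    margin = z * standard_error
    return x_bar - margin, x_bar + margin

# Reproducing the interval from the reported example: 175 cm ± 6.2 cm
mean_height, margin = 175.0, 6.2
lower, upper = mean_height - margin, mean_height + margin
print(f"95% CI: {lower:.1f} cm to {upper:.1f} cm")  # 95% CI: 168.8 cm to 181.2 cm

# Hypothetical sample of heights (cm), used only to exercise the function
heights = [172, 180, 168, 177, 175, 181, 170, 176]
ci = confidence_interval(heights)
```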
Narrowing the Confidence Interval
To achieve narrower intervals, consider:
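Increasing the sample size (n): the standard error, s/√n, shrinks as n grows.
Lowering the confidence level (e.g. 90% instead of 95%): the interval is narrower, but we are less confident that it contains the true population mean.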
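Both levers can be seen in the half-width of a normal-approximation interval, z·s/√n. The sketch below assumes that approximation and a hypothetical sample standard deviation of 7 cm, and prints the margin for several sample sizes and confidence levels.

```python
import math
from statistics import NormalDist

def margin_of_error(s, n, level):
    """Half-width of a normal-approximation CI: z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * s / math.sqrt(n)

s = 7.0  # hypothetical sample standard deviation, in cm
for n in (25, 100, 400):
    for level in (0.90, 0.95, 0.99):
        print(f"n={n:4d}  level={level:.0%}  ±{margin_of_error(s, n, level):.2f} cm")
```

Because n sits under a square root, quadrupling the sample size only halves the margin, so each further gain in precision costs progressively more data.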