Sampling
The process of collecting data from a subset of a population and using it to draw statistical inferences about the whole population.
Population
All elements the investigator is interested in. It is divided into sampling units (individual elements or groups of elements).
Target population
The group we want to make inferences about.
Sampled population:
The group from which the sample is drawn.
Simple random sampling
A method of selecting a subset of a larger population where each member has an equal, non-zero chance of being chosen. Units are individual elements. Other methods may use groups of elements.
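A minimal sketch of drawing a simple random sample from a sampling frame, using Python's standard `random.sample`. The frame contents and the helper name `draw_simple_random_sample` are hypothetical and only serve as illustration.

```python
import random

def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every unit is equally likely."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame of 20 individual elements.
frame = [f"unit_{i}" for i in range(1, 21)]
sample = draw_simple_random_sample(frame, n=5, seed=42)
print(sample)
```

Sampling is without replacement, so no unit can appear twice in the sample.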
Sampling frame
The list of all sampling units available for selection.
Random Sampling:
Important for valid statistical inference, e.g. confidence intervals and hypothesis testing. It gives each element a fair chance of being selected (a probabilistic approach).
Probabilistic approach
Uses probability to model uncertainty, dependencies, and relationships. It assigns probabilities to outcomes to represent the likelihood of events.
Randomization
- Each element in the population should have an equal chance of being selected.
There are two types of errors that may happen when sampling:
- Sampling error
- Non-sampling error
- Statistical calculations, e.g. hypothesis tests, account for sampling error through probability but assume that no non-sampling error occurred.
Sampling error:
- May occur because we do not investigate every element in the population, which creates a risk that the sample is unrepresentative.
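Sampling error can be illustrated with a small simulation. The sketch below assumes a made-up population of 1000 values and a hypothetical helper `sampling_errors`; it compares how far sample means fall from the population mean for small and large samples.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population of 1000 values (mean near 50, spread near 10).
population = [rng.gauss(50, 10) for _ in range(1000)]

def sampling_errors(population, n, repeats, rng):
    """Return (sample mean - population mean) for repeated simple random samples."""
    mu = statistics.mean(population)
    return [statistics.mean(rng.sample(population, n)) - mu
            for _ in range(repeats)]

errors_small = sampling_errors(population, 10, 200, rng)
errors_large = sampling_errors(population, 250, 200, rng)

# Spread of the errors: larger samples tend to land closer to the true mean.
print(statistics.pstdev(errors_small), statistics.pstdev(errors_large))
```

If every element is investigated (n equal to the population size), the sampling error is zero.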
Non-sampling error
- May occur due to measurement errors, i.e. because of how the data is collected rather than because of the sample size. Examples: unclear questions, non-response (e.g. to sensitive questions), or wrongly recorded answers.
Simple random sample:
A simple random sample of size 𝑛 from a population of size 𝑁 means every possible sample of size 𝑛 has the same probability of being chosen.
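As a small illustration of this definition, the sketch below lists every possible sample for a hypothetical population of N = 5 labelled elements and n = 2. There are C(N, n) possible samples, so each has probability 1 / C(N, n).

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]  # hypothetical population, N = 5
n = 2

all_samples = list(combinations(population, n))  # every possible sample of size n
num_samples = comb(len(population), n)           # C(N, n)
prob_each = 1 / num_samples                      # same probability for each sample

# Share of samples that contain element "A"; this equals n / N.
prob_a_included = sum("A" in s for s in all_samples) / num_samples

print(num_samples, prob_each, prob_a_included)
```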
A simple random sample can be used to estimate:
- Population mean (average)
- Population total
- Population proportion
- Confidence intervals:
o If the population variance is unknown, use the t-distribution with n - 1 degrees of freedom.
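A rough sketch of a 95% confidence interval for the population mean based on the t-distribution. The sample values are invented, the helper name is hypothetical, and the critical value 2.262 (9 degrees of freedom, 95% level) is taken from a standard t-table. No finite population correction is applied here.

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """Confidence interval for the mean: mean +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical sample of n = 10 observations, so n - 1 = 9 degrees of freedom.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
T_CRIT_95_DF9 = 2.262  # assumed table value for 95% confidence, 9 d.f.

low, high = t_confidence_interval(sample, T_CRIT_95_DF9)
print(round(low, 3), round(high, 3))
```

For other sample sizes or confidence levels, the critical value has to be looked up again for n - 1 degrees of freedom.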
Stratified random sampling:
The population is divided into H groups (strata) based on a characteristic (e.g., age, income, region). Within each stratum h, a simple random sample of size n_h is taken. Results from all strata are combined to estimate population parameters (mean, total, proportion).
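Stratified random sampling example
- Example: Employees divided into 3 strata by age: under 30, age 30-49 and age 50+.
- In short: take a sample within each category, then combine the results. This gives more precise estimates than a simple random sample of the same size when the elements within each stratum are similar.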
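The employee example can be sketched as follows, assuming invented values per employee and hypothetical stratum sample sizes. Each stratum mean is weighted by the stratum's share of the population, N_h / N.

```python
import random
import statistics

def stratified_mean_estimate(strata, sample_sizes, seed=None):
    """Estimate the population mean from a simple random sample in each stratum."""
    rng = random.Random(seed)
    population_size = sum(len(values) for values in strata.values())
    estimate = 0.0
    for name, values in strata.items():
        stratum_sample = rng.sample(values, sample_sizes[name])
        weight = len(values) / population_size  # stratum share N_h / N
        estimate += weight * statistics.mean(stratum_sample)
    return estimate

# Hypothetical data: one numeric value per employee, grouped by age stratum.
strata = {
    "under 30": [28, 31, 30, 29, 32, 30],
    "30-49": [40, 42, 39, 41, 43, 40, 38, 41],
    "50+": [50, 52, 49, 51],
}
sample_sizes = {"under 30": 3, "30-49": 4, "50+": 2}  # assumed n_h per stratum

estimate = stratified_mean_estimate(strata, sample_sizes, seed=1)
print(round(estimate, 2))
```

Because values within each stratum are close together, the estimate changes little from one draw to the next.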
Cluster sampling
Requires that the population be divided into N groups of elements called clusters. The frame is then the list of N clusters, from which we select a simple random sample of n clusters. In the simplest form of cluster sampling, we then collect data for all elements in each of the n selected clusters.
Use of cluster sampling
- Instead of sampling individuals directly, you sample groups (clusters) and then include everyone in those groups.
- Pick whole groups at random and study everyone within those groups e.g. studying schools, neighborhoods or companies.
- Common use: area sampling. Clusters could be cities, postcode areas, or local authority regions. You randomly select a few of these areas, then collect data from all individuals in the chosen areas.
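A minimal sketch of the simplest (one-stage) form of cluster sampling described above. The school names, student labels, and the helper `one_stage_cluster_sample` are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters whole clusters at random and keep every element in them."""
    rng = random.Random(seed)
    frame = sorted(clusters)  # the frame is the list of cluster names
    chosen = rng.sample(frame, n_clusters)
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements

# Hypothetical population: students grouped by school (the clusters).
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3"],
}

chosen, elements = one_stage_cluster_sample(clusters, n_clusters=2, seed=7)
print(chosen, elements)
```

Randomness only enters when choosing the clusters; inside a chosen cluster nobody is left out.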
Difference between stratified and cluster sampling:
- Stratified sampling: divide population into strata, sample some individuals from each stratum.
- For example, take a sample from each of strata 1, 2 and 3 and combine them into an estimate for the population. This reduces sampling error.
- Cluster sampling: divide population into clusters, sample entire clusters and include all individuals inside them.
Highlight
Applying the finite population correction (the factor involving N - n) gives a narrower confidence interval. When the population is small relative to the sample, this correction matters a lot.
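A small sketch of how this correction shrinks the standard error, and with it the confidence interval. It assumes the form sqrt((N - n) / N) for the correction factor; some texts use (N - n) / (N - 1) instead. The numbers and the helper name are invented for illustration.

```python
import math

def standard_error_of_mean(s, n, N=None):
    """Standard error of the sample mean, optionally with the correction for finite N."""
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / N)  # assumed form of the finite population correction
    return se

s, n = 10.0, 50  # hypothetical sample standard deviation and sample size

se_uncorrected = standard_error_of_mean(s, n)
se_small_pop = standard_error_of_mean(s, n, N=100)      # sample is half the population
se_large_pop = standard_error_of_mean(s, n, N=100_000)  # sample is a tiny fraction

print(round(se_uncorrected, 4), round(se_small_pop, 4), round(se_large_pop, 4))
```

With N = 100 the standard error drops noticeably, while with N = 100,000 the correction is almost invisible.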