Sampling and Empirical Distribution

Population

Sample

Random Sample

A sample for which the chance that any given subset of elements enters the sample can be calculated before the sample is drawn.
Not all possible samples have to be equally likely.
Simple Random Sample = a random sample in which all possible samples of a particular size are equally likely.
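A minimal sketch of a simple random sample, assuming a made-up four-element population and Python's standard `random` module (the helper name `simple_random_sample` is hypothetical). It draws many samples of size 2 and checks empirically that each of the 6 possible subsets turns up about equally often.

```python
import random
from collections import Counter
from itertools import combinations


def simple_random_sample(population, k, rng):
    """Draw k distinct elements; every size-k subset is equally likely."""
    return rng.sample(population, k)


rng = random.Random(0)                # fixed seed so the run is repeatable
population = ["a", "b", "c", "d"]     # illustrative population, not real data
n_draws = 60_000

# Count how often each unordered subset of size 2 is drawn.
counts = Counter(
    frozenset(simple_random_sample(population, 2, rng)) for _ in range(n_draws)
)

all_subsets = [frozenset(c) for c in combinations(population, 2)]
proportions = {tuple(sorted(s)): counts[s] / n_draws for s in all_subsets}

for subset, p in sorted(proportions.items()):
    print(subset, round(p, 3))        # each should be close to 1/6
```

A random sample that is not simple would show unequal proportions here, yet those proportions would still be known in advance.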

Random samples are important because they tend to be representative of the population, and because the chance of each sample is known in advance.

Convenience samples

Sampling may be used to select the units of the target population on which data will be collected.

Sampling may also be used to divide a large sample into training and test sets.
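One possible way to split a sample into training and test sets by random sampling, sketched with the standard library only. The function name `train_test_split`, the 20% test fraction, and the integer "rows" are assumptions made for illustration.

```python
import random


def train_test_split(rows, test_fraction, seed=0):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)             # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))               # stand-in for 100 observations
train, test = train_test_split(rows, 0.2)
print(len(train), len(test))
```

Because the shuffle is random, every row has the same chance of landing in the test set, so the test set is itself a simple random sample of the rows.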

Probability distributions can be determined analytically

For complex distributions, simulation is often easier
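A small sketch comparing the two approaches on a case where both are feasible: the chance of at least one six in four rolls of a fair die. The event and the helper name `simulate_at_least_one_six` are chosen only for illustration. The analytic answer is 1 - (5/6)^4; the simulation estimates the same number by repeating the experiment.

```python
import random


def simulate_at_least_one_six(n_rolls, n_trials, rng):
    """Estimate P(at least one six in n_rolls rolls) by repeated simulation."""
    hits = 0
    for _ in range(n_trials):
        if any(rng.randint(1, 6) == 6 for _ in range(n_rolls)):
            hits += 1
    return hits / n_trials


rng = random.Random(42)
exact = 1 - (5 / 6) ** 4                       # analytic probability
estimate = simulate_at_least_one_six(4, 100_000, rng)
print(f"analytic: {exact:.4f}  simulated: {estimate:.4f}")
```

For more complicated events the analytic formula may be hard to derive, while the simulation loop changes very little.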

Empirical Distribution

Empirical = based on observation

Observations can be from repetitions of an experiment

Large Random Sampling

If an experiment is repeated many times, independently and under identical conditions, the proportion of times that an event occurs tends to get closer to the theoretical probability of the event.
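A quick illustration of this behavior, assuming a fair coin simulated with Python's `random` module (the function name `proportion_of_heads` is hypothetical). The proportion of heads is computed for increasingly many tosses.

```python
import random


def proportion_of_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(7)
results = {n: proportion_of_heads(n, rng) for n in (100, 10_000, 200_000)}

for n, p in results.items():
    print(f"{n:>7} tosses: proportion of heads = {p:.4f}")
```

The proportions for small numbers of tosses can wander noticeably from 0.5; the ones for large numbers of tosses should sit very close to it.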

To make inferences, you need a measure of reliability. One way to get it is bootstrapping (generating “new” samples by resampling, with replacement, from the original sample).
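A minimal sketch of the bootstrap idea, under stated assumptions: the original sample is made-up data generated from a normal distribution, the statistic is the sample mean, and the interval is a rough percentile interval. The names `bootstrap_means`, `lower`, and `upper` are illustrative only.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples, rng):
    """Resample with replacement (same size as sample) and collect the means."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(200)]   # made-up original sample

means = sorted(bootstrap_means(sample, 2000, rng))

# Middle 95% of the bootstrap means, used as an approximate interval.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"approx. 95% bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

The spread of the resampled means gives a sense of how much the sample mean would vary from sample to sample, using only the one sample in hand.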

Theoretical Sampling Distribution

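The distribution of the sample mean will be approximately normal for large random samples, whatever the shape of the population distribution (Central Limit Theorem).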
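A simulation sketch of this claim, assuming a deliberately skewed population (exponential with mean 1 and SD 1) and samples of size 100. If the sample mean is roughly normal, about 95% of sample means should fall within 1.96 standard errors of the population mean. The population choice and all names are assumptions for illustration.

```python
import math
import random
import statistics


def sample_means(n, n_samples, rng):
    """Draw n_samples samples of size n from an exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


rng = random.Random(3)
n = 100
means = sample_means(n, 5000, rng)

pop_mean, pop_sd = 1.0, 1.0                  # known for exponential(1)
se = pop_sd / math.sqrt(n)                   # standard error of the sample mean

# Fraction of sample means within 1.96 standard errors of the population mean.
fraction_within = sum(abs(m - pop_mean) <= 1.96 * se for m in means) / len(means)
print(f"fraction within 1.96 SE: {fraction_within:.3f}")
```

The population here is far from bell-shaped, so a fraction near 0.95 is evidence that the averaging, not the population, produces the normal shape.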