Definition (statistical inference): Generalizing the results from a study of a small number of individuals (the sample) to the larger group (the population).
Definition (representative sample): A subset of the population that accurately reflects the characteristics of the larger population.
Significance: Meaningful inferences can be made only if the sample is representative; any known differences between sample and population should be acknowledged.
Reasons for Sampling vs. Census:
Time, cost, and logistics.
Populations may be infinite or not yet exist.
Sampling may be destructive (e.g., testing battery life).
Population may not be practically available.
Data can become outdated quickly (e.g., political polls).
A sample makes it possible to collect comprehensive data about a few individuals instead of minimal data about many.
Sampling Error: Uncertainty from using a sample instead of the entire population.
Data Acquisition Errors: Errors from acquiring, recording, and editing statistical data.
Bias: When sample selection favors individuals with certain population characteristics.
Validity: Degree of correspondence between the concept addressed and the measuring variable.
Accuracy: Absence of error or the agreement between measured and true values.
Systematic Error: Instrument consistently measures too high/low (bias).
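Random Error: Unpredictable fluctuations that scatter measurements around the true value, too high as often as too low.
Precision: Level of exactness in the measurement process, i.e., how closely repeated measurements agree with one another.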
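The difference between systematic and random error can be illustrated with a small simulation. This is only a sketch: the true value, the bias of +3, and the noise levels are made-up numbers, and simulate_readings is a hypothetical helper, not part of any library.

```python
import random
import statistics

def simulate_readings(true_value, bias, noise_sd, n, seed=0):
    """Return n simulated readings: true value + constant bias + random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]

TRUE_VALUE = 100.0  # hypothetical true quantity being measured

# Instrument A: no systematic error, large random error (accurate on average, imprecise)
a = simulate_readings(TRUE_VALUE, bias=0.0, noise_sd=5.0, n=1000, seed=1)
# Instrument B: systematic error of +3, small random error (precise but biased)
b = simulate_readings(TRUE_VALUE, bias=3.0, noise_sd=0.5, n=1000, seed=2)

for name, readings in (("A", a), ("B", b)):
    offset = statistics.mean(readings) - TRUE_VALUE  # estimates the systematic error
    spread = statistics.stdev(readings)              # estimates the random error
    print(f"Instrument {name}: mean offset = {offset:+.2f}, spread = {spread:.2f}")
```

Averaging many readings shrinks the effect of random error but leaves the systematic offset of instrument B untouched, which is why bias has to be found and corrected rather than averaged away.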
Metadata: Information about the data creation, cleaning, and processes, crucial for identifying potential errors or biases.
Sampling Process Steps:
Define the population.
Construct a sampling frame (list of all members).
Select a sampling design.
Specify information to collect.
Collect data.
Probability Sampling:
Simple Random: Each possible sample of a given size has an equal chance of selection, so every individual is equally likely to be chosen.
Systematic: Every kth element is chosen starting from a randomly selected point.
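Stratified: Population divided into strata (subgroups sharing a characteristic); a random sample is taken from each stratum.
Cluster: Population divided into groups (clusters) defined for convenience, often geographically; a random selection of clusters is chosen for detailed study.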
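The four probability designs above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sampling frame is a simple list of member IDs; the function names, the strata, and the cluster labels are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be drawn."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(7)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical frame of 100 member IDs

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = {f"block_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}
clustered = cluster_sample(clusters, 2, rng)

print("simple random:", sorted(srs))
print("systematic:   ", systematic)
print("stratified:   ", stratified)
print("cluster:      ", clustered)
```

Here the stratified design takes the same number from each stratum; sampling in proportion to stratum size is a common alternative that this sketch does not cover.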
Non-Probability Sampling:
Convenience: Individuals easily accessible.
Snowball: Initial subjects lead to more subjects in their networks.
Judgmental: Personal judgment determines sample inclusion.
Volunteer: Self-selected individuals, may not be representative.
Quota: Select individuals to fit quotas, may introduce bias.
Sample Size: Larger samples generally yield more precise estimates and, when selection is unbiased, better representation of the population.
Key Factors:
Time and cost.
Non-response rates affect representativeness.
Heterogeneity of population may require larger samples.
Complexity of analysis may require larger samples.
Point Estimation: Estimate a single value (e.g., population mean).
Interval Estimation: Estimate a range likely to contain the parameter (e.g., μ lies between 24 and 26).
Observed vs. Theoretical Distribution:
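Notation: μ (population mean) and σ (population standard deviation) describe the theoretical distribution; x̄ (sample mean) and s (sample standard deviation) describe the observed one.
With a sufficiently large sample size, the sample mean approximates the population mean, regardless of the shape of the population distribution.
Central Limit Theorem: As sample size increases, the distribution of sample means approaches a normal distribution with these properties: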
Asymptotically normal distribution.
Mean equal to population mean.
Standard deviation (standard error) equal to the population standard deviation divided by the square root of the sample size n, i.e., σ/√n.
The mean of means is more likely to be accurate than any single mean.
Standard Error of the Mean: Theoretical standard deviation of sample means.
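A quick simulation can make the behaviour of sample means concrete. The sketch below assumes a made-up, strongly skewed population (exponential with mean 1) and compares the observed spread of sample means with the theoretical standard error σ/√n; the population, the seed, and the helper sample_means are illustrative choices only.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1
population = [rng.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sample_means(n, repeats=2000):
    """Draw `repeats` random samples of size n and return the mean of each."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]

results = {}
for n in (5, 30, 200):
    means = sample_means(n)
    mean_of_means = statistics.fmean(means)
    observed_se = statistics.stdev(means)   # spread of the simulated sample means
    theoretical_se = sigma / n ** 0.5       # sigma divided by the square root of n
    results[n] = (mean_of_means, observed_se, theoretical_se)
    print(f"n={n:>3}: mean of means={mean_of_means:.3f} (mu={mu:.3f}), "
          f"observed SE={observed_se:.3f}, theoretical SE={theoretical_se:.3f}")
```

Even though the population is far from normal, the mean of the sample means should sit close to μ and their spread should track σ/√n, shrinking as n grows.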
Confidence Interval: Estimated range likely to contain the true mean, e.g., 25 ± 5 means the true mean is estimated to lie between 20 and 30.
Confidence Level: Proportion of intervals, constructed this way over repeated sampling, that would contain the true mean (e.g., a 95% confidence level implies a 5% risk that the interval misses the true mean).
Factors Affecting Interval Width:
Sample size (larger sample = smaller standard error, narrower interval).
Standard deviation (more variable data = larger standard error, wider interval).
Desired confidence level (higher confidence = wider interval).
For 95% CI:
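Use normal distribution properties: approximately 95% of values lie within ±1.96 standard deviations of the mean.
Confidence intervals are centered on the sample mean x̄ (the population mean μ is the unknown being estimated); when σ is unknown and n is large, s is used in its place:
Lower limit: x̄ - 1.96(σ/√n)
Upper limit: x̄ + 1.96(σ/√n)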
Worked example (n = 50, x̄ = 171.2, s = 10, with s used as an estimate of σ):
95% CI: 171.2 ± 1.96(10/√50) = 171.2 ± 2.8 → 168.4 to 174.0.
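The arithmetic of this example can be checked with a short script. It is a minimal sketch using only the standard library; confidence_interval is a hypothetical helper name, and it assumes the normal approximation with s standing in for σ.

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) limits of a z-based confidence interval for the mean."""
    standard_error = sd / math.sqrt(n)  # standard deviation over the square root of n
    margin = z * standard_error         # half-width of the interval
    return mean - margin, mean + margin

lower, upper = confidence_interval(mean=171.2, sd=10, n=50)
print(f"95% CI: {lower:.1f} to {upper:.1f}")  # 95% CI: 168.4 to 174.0
```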
Adjusting confidence levels changes α (1 minus the confidence level) and therefore the Z-score that cuts off α/2 in each tail: a 90% CI uses α = 0.10 and Z ≈ 1.645, and a 99% CI uses α = 0.01 and Z ≈ 2.576.
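The Z-score for any confidence level can be looked up from the standard normal distribution rather than a printed table. The sketch below uses NormalDist from Python's statistics module (Python 3.8 or later); z_critical is a hypothetical helper, and the values n = 50 and s = 10 are reused purely for illustration.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the Z-score leaving alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

n, s = 50, 10  # illustrative sample size and standard deviation
for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    margin = z * s / math.sqrt(n)  # half-width of the interval at this level
    print(f"{level:.0%} confidence: z = {z:.3f}, margin of error = {margin:.2f}")
```

Raising the confidence level increases z and so widens the interval, while the sample size and standard deviation stay the same.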
Confidence intervals estimate population parameters from samples and quantify the uncertainty attached to those estimates. Understanding sampling techniques, with their respective strengths and weaknesses, is crucial for accurate and valid statistical analyses.