L5 Statistical inference

Statistical inference allows statisticians to draw informed conclusions about a population based on a sample.
Essential concepts include the normal distribution and p-values.

Random Samples: Preferable because they tend to represent the population accurately.
- Every individual in the population has an equal chance of being selected.
- Facilitates generalization of research results to the broader population.
Non-Random Samples: May lead to biased results and less reliable conclusions.

Statistical inference makes the leap from known sample results to unknown population parameters.
The normal curve plays a crucial role in this inference process.

In a normal distribution, the mean, median, and mode are equal, and the curve is symmetric about them.
Roughly 68.27% of observations fall within ±1 standard deviation of the mean.
Approximately 95.45% fall within ±2 standard deviations, and about 99.73% within ±3 standard deviations.
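
The shares under the normal curve can be checked numerically. The sketch below uses the identity P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal variable; the helper name `share_within` is made up for this illustration. Printed tables in older textbooks may differ in the second decimal because of rounding.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    # For a standard normal variable Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.2%}")
```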

Z-scores express values in terms of standard deviations from the mean: z = (x − μ) / σ.
A z-score of 0 corresponds to the mean, while a z-score of +1 or −1 indicates a value one standard deviation above or below it.
Z-scores are useful for comparing scores from different distributions.
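
As a small illustration of comparing scores across distributions, the sketch below converts two raw scores to z-scores. The test scales, means, and standard deviations are hypothetical numbers chosen for the example, and `z_score` is an illustrative helper, not a library function.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


# Hypothetical scores on two tests that use different scales.
z_a = z_score(82, mean=70, sd=8)       # test A
z_b = z_score(620, mean=500, sd=100)   # test B

print(f"test A: z = {z_a:.2f}")
print(f"test B: z = {z_b:.2f}")
```

Although the raw score on test B is much larger, the score on test A sits further above its own mean once both are put on the same standard-deviation scale.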

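The Central Limit Theorem states that sample means will be approximately normally distributed regardless of the population's shape, provided the sample size is sufficiently large.
This enables estimation of population parameters from sample means.
Standard error formula: SE = σ / √n, where σ is the population standard deviation and n is the sample size; SE is the standard deviation of the sampling distribution of the mean.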
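
A quick simulation can make the standard error formula concrete. The sketch below assumes a skewed population (exponential with mean 1 and standard deviation 1, chosen only for illustration), draws many samples of size n, and compares the spread of the sample means with σ / √n. The sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50              # size of each sample
num_samples = 5000  # number of samples drawn

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = 1.0 / math.sqrt(n)           # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {statistics.fmean(sample_means):.3f}")
print(f"observed SE:          {observed_se:.4f}")
print(f"sigma / sqrt(n):      {theoretical_se:.4f}")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to σ / √n.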

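The p-value is the probability of obtaining a result at least as extreme as the one observed in the sample if there were no effect or relationship in the population (the null hypothesis).
Standard cut-off: p < 0.05 indicates statistical significance.
A small p-value means the sample result would be unlikely under the null hypothesis; it is not the probability that the null hypothesis is true.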

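Example: p = 0.01 means that, if the null hypothesis were true, a result at least this extreme would be expected in about 1% of samples.
Higher p-values indicate weaker evidence against the null hypothesis, and so less support for generalizing the result.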
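
One way to see where a p-value comes from is a one-sample z-test, sketched below. It assumes the population standard deviation is known and that the sample mean is approximately normal; the numbers (null mean 100, σ = 15, n = 36, sample mean 106) are hypothetical, and `two_sided_p_value` is an illustrative helper.

```python
import math
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """Return (z, p) for a two-sided one-sample z-test with known sigma."""
    se = sigma / math.sqrt(n)            # standard error of the mean
    z = (sample_mean - null_mean) / se   # distance from the null mean, in SE units
    # Probability, computed assuming the null hypothesis is true,
    # of a sample mean at least this far from the null mean in either direction.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = two_sided_p_value(sample_mean=106, null_mean=100, sigma=15, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```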

Confidence intervals are used to estimate the range of a population parameter.
A 95% confidence interval comes from a procedure that, over repeated sampling, produces intervals containing the true population value about 95% of the time.
Example: If a favorability rating is 63% with a margin of error of ±3%, the interval estimate for the real favorability is 60% to 66%.
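
The margin of error in a poll like this can be sketched with the normal-approximation (Wald) interval for a proportion. The sample size of 1,000 respondents is an assumption made for the illustration, since the notes do not state it, and `proportion_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat: float, n: int, confidence: float = 0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - margin, p_hat + margin, margin


# Assumed poll: 63% favorable among 1,000 respondents (n is hypothetical).
low, high, margin = proportion_ci(0.63, n=1000)
print(f"margin of error: ±{margin:.1%}")
print(f"95% CI: {low:.1%} to {high:.1%}")
```

With these assumed inputs the margin comes out close to three percentage points, which matches the interval of roughly 60% to 66%.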

Type I error: Incorrectly rejecting a true null hypothesis (false positive).
- Its probability (α) is the chance of finding a statistically significant result in your sample when, in fact, there is no relationship in the population.
Type II error: Failing to reject a false null hypothesis (false negative).
- Its probability (β) is the chance of not finding a statistically significant result when, in fact, there is a relationship in the population.
Both types of errors have many causes, e.g., sample size, non-random samples, and measurement error.
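
The Type I error rate can be illustrated by simulation. The sketch below assumes a population in which the null hypothesis is true (normal with mean 0 and standard deviation 1), repeatedly draws random samples, and counts how often a two-sided z-test at the 0.05 level still reports significance. The sample size, number of trials, and seed are arbitrary choices for the illustration.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

alpha = 0.05
n = 30         # observations per sample
trials = 4000  # number of simulated studies

critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

false_positives = 0
for _ in range(trials):
    # The null hypothesis is true here: the population mean really is 0.
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) / (1.0 / math.sqrt(n))  # z-test with known sigma = 1
    if abs(z) > critical_z:
        false_positives += 1  # significant result despite no real effect

type_i_rate = false_positives / trials
print(f"share of false positives: {type_i_rate:.3f} (alpha = {alpha})")
```

The share of false positives should land near the chosen significance level, which is what the cut-off controls.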

Statistical inference helps draw conclusions about broader populations from limited data.
It is crucial for future predictions and informed decision-making in business contexts.