Nominal Scale
Categories without any order (e.g., gender, eye color).
Ordinal Scale
Ordered categories with no consistent differences between values (e.g., rankings).
Interval Scale
Ordered categories with consistent intervals but no true zero (e.g., temperature in Celsius).
Ratio Scale
Like the interval scale, but with a true zero point, so ratios between values are meaningful (e.g., weight, height).
Skew
Measures the asymmetry of the distribution; skew pulls the mean away from the median, toward the longer tail.
Kurtosis
Refers to the 'tailedness' of the distribution, influencing the variability in extreme values.
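Example (skew and kurtosis): a minimal sketch of both measures as standardized moments. The function names and the small datasets are illustrative assumptions, and the population (divide-by-N) formulas are used; statistical packages often apply small-sample corrections instead.

```python
import statistics

def skewness(data):
    """Third standardized moment (population formula, dividing by N)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / len(data)

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 4 for x in data) / len(data) - 3

symmetric = [1, 2, 3, 4, 5]          # made-up data, no skew
right_skewed = [1, 1, 2, 2, 3, 20]   # made-up data, long right tail

print(skewness(symmetric))      # essentially zero
print(skewness(right_skewed))   # positive: the tail is on the right
print(statistics.fmean(right_skewed), statistics.median(right_skewed))
print(excess_kurtosis(symmetric))  # negative: lighter tails than a normal
```

In the right-skewed data the mean sits above the median, which is the shift the Skew card describes.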
Variance
The average of the squared deviations from the mean.
Standard Deviation
Square root of variance, indicating data spread.
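Example (variance and standard deviation): a short sketch on a made-up dataset, using the population formulas (dividing by N). The data values are an assumption chosen so the arithmetic comes out cleanly.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
mean = statistics.fmean(data)

# Variance: average of the squared deviations from the mean.
pop_var = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance, in the data's own units.
pop_sd = math.sqrt(pop_var)

print(mean, pop_var, pop_sd)
```

The hand-computed variance matches `statistics.pvariance(data)`, the standard-library equivalent.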
Standard Error
The standard deviation of the sampling distribution of the sample mean (SD / √N); indicates how far a sample mean typically falls from the population mean.
Sampling Distribution
The distribution of sample means over repeated sampling from the population.
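Example (sampling distribution and standard error): a simulation sketch, assuming a normal population with mean 100 and SD 15 and samples of size 25. All of these numbers, and the seed, are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)
MU, SIGMA, N, REPS = 100, 15, 25, 5000  # assumed population and design

# Draw many samples and keep each sample's mean.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

# The spread of those means is the standard error.
empirical_se = statistics.stdev(sample_means)
theoretical_se = SIGMA / N ** 0.5  # sigma / sqrt(N) = 3.0

print(empirical_se, theoretical_se)
```

The simulated spread of the sample means should land close to the theoretical value of 3.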
Why does sample variance underestimate true population variance?
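When computed with N in the denominator, deviations are measured from the sample mean, which sits closer to the sample's own values than the population mean does; the result underestimates variability, especially in small samples.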
What is Bessel's correction?
Using N-1 as the denominator for sample variance to correct for bias.
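Example (Bessel's correction): a simulation sketch, assuming a normal population with variance 4 and very small samples (N = 5). The seed and sizes are arbitrary; the point is only to compare the two denominators on average.

```python
import random
import statistics

random.seed(1)
TRUE_VAR = 4.0        # assumed population: normal with SD 2
N, REPS = 5, 20000

biased, unbiased = [], []
for _ in range(REPS):
    sample = [random.gauss(0, 2) for _ in range(N)]
    biased.append(statistics.pvariance(sample))   # divides by N
    unbiased.append(statistics.variance(sample))  # divides by N - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)
print(mean_biased, mean_unbiased, TRUE_VAR)
```

On average the N version should come out near (N-1)/N × 4 = 3.2, while the N-1 version should come out near the true value of 4.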
Efficient Estimator
Has the smallest variance among unbiased estimators.
Unbiased Estimator
Expected value equals the true population parameter.
Sufficient Estimator
Uses all the information in the sample that is relevant to the parameter.
Resistant Estimator
Not strongly influenced by outliers (e.g., the median).
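Example (resistant vs. non-resistant): a small sketch comparing the mean and the median when one value is replaced by an outlier. The data are made up for illustration.

```python
import statistics

clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 1000]  # last value replaced by an outlier

print(statistics.fmean(clean), statistics.median(clean))
print(statistics.fmean(with_outlier), statistics.median(with_outlier))
```

The median stays where it was, while the mean is dragged far toward the outlier.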
How does sample size affect efficiency?
Larger sample sizes increase efficiency and reduce the standard error of the mean.
Central Limit Theorem
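States that the sampling distribution of the mean approaches a normal distribution as sample size grows, whatever the shape of the population; this is what allows the normal distribution to be used for binomial data when sample size is large.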
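Example (normal approximation to the binomial): a sketch comparing an exact binomial probability with its normal approximation. The helper names and the choice of n = 100, p = 0.5, k = 55 are assumptions for illustration; a continuity correction of 0.5 is applied.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 100, 0.5, 55
exact = binom_cdf(k, n, p)

mu = n * p                          # binomial mean
sigma = math.sqrt(n * p * (1 - p))  # binomial standard deviation
approx = normal_cdf(k + 0.5, mu, sigma)  # +0.5 is the continuity correction

print(exact, approx)
```

With a sample this large the two probabilities should agree to about two decimal places.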
Null Hypothesis
States there is no effect or difference.
P-value
Probability of observing data at least as extreme as the data obtained, assuming the null hypothesis is true.
Effect Size
Measures the magnitude of a difference.
How do sample sizes affect p-values?
For a given effect size, larger samples produce smaller p-values; the effect size itself does not depend on sample size.
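Example (same effect size, different p-values): a sketch using a one-sample z-test with a known population SD. The null mean of 100, SD of 15, and observed mean of 103 are assumed values, and the helper names are hypothetical.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a z statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_test_p(sample_mean, mu0, sigma, n):
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return two_tailed_p(z)

MU0, SIGMA, SAMPLE_MEAN = 100, 15, 103   # assumed values

cohens_d = (SAMPLE_MEAN - MU0) / SIGMA   # 0.2, whatever n is
p_small = z_test_p(SAMPLE_MEAN, MU0, SIGMA, n=25)
p_large = z_test_p(SAMPLE_MEAN, MU0, SIGMA, n=400)

print(cohens_d, p_small, p_large)
```

The standardized effect is 0.2 in both cases, yet only the larger sample yields a p-value below conventional alpha levels.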
Standardizing a Distribution
Allows comparisons across different distributions.
What happens when you standardize a distribution?
The mean becomes 0, and the standard deviation becomes 1.
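Example (standardizing): a sketch converting made-up scores to z-scores with the population SD. The scores are illustrative only.

```python
import statistics

scores = [55, 60, 65, 70, 75, 80, 85]  # illustrative raw scores
mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)

# z = (x - mean) / sd for every score
z_scores = [(x - mean) / sd for x in scores]

print(z_scores)
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

After the transformation the mean is 0 and the standard deviation is 1, while the shape of the distribution is unchanged.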
Why do data need to be normally distributed for z-tests or t-tests?
Both tests assume normality; their p-values are accurate, and the tests valid, only when that assumption holds at least approximately.
Z-Distribution vs T-Distribution
T-distribution has heavier tails and is used for smaller samples.
When are z and t distributions the same?
As degrees of freedom approach infinity (in practice, when sample size is large), the t-distribution converges to the z-distribution.
Critical Values for T-test
Change with sample size (degrees of freedom); they are larger in small samples to reflect the increased uncertainty.
Difference between One-sample T-test and Z-test
T-test used when population standard deviation is unknown; Z-test when it is known.
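Example (z statistic vs. t statistic): a sketch computing both test statistics for the same made-up sample. The null mean of 100 and the "known" population SD of 5 are assumptions; in the t case the SD is estimated from the sample with N-1.

```python
import math
import statistics

sample = [102, 98, 110, 105, 95, 108, 101, 99]  # illustrative data
MU0 = 100                                        # assumed null-hypothesis mean
n = len(sample)
mean = statistics.fmean(sample)

# Z-test: population SD is known (assumed to be 5 here).
SIGMA = 5
z_stat = (mean - MU0) / (SIGMA / math.sqrt(n))

# T-test: population SD is unknown, so estimate it from the sample.
s = statistics.stdev(sample)  # uses N - 1
t_stat = (mean - MU0) / (s / math.sqrt(n))
df = n - 1                    # degrees of freedom for the t-distribution

print(z_stat, t_stat, df)
```

The formulas differ only in which standard deviation goes in the denominator; the t statistic is then compared against a t-distribution with N-1 degrees of freedom rather than the standard normal.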
What to choose when both T-test and Z-test are possible?
Choose the T-test if population variance is unknown.
Assumptions for One-sample T-test
Normality and independence are necessary for accurate p-values.
Type I Error (α)
Rejecting a true null hypothesis.
Type II Error (β)
Failing to reject a false null hypothesis.
Confidence Level
1 - α.
Power
1 - β: the probability of detecting an effect that really exists.
Effect of not adjusting critical values for t-test
Increases the likelihood of Type I errors, because unadjusted critical values are too small for small samples.
Purpose of Power Analysis
Determines the required sample size to detect an effect.
Ways to increase power
Increase sample size, effect size, or alpha level.
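Example (what raises power): a sketch of power for a two-tailed one-sample z-test, assuming a known population SD so that the normal distribution applies. The function name and the baseline values (d = 0.5, n = 20, alpha = 0.05) are hypothetical choices; a t-test would need the noncentral t-distribution instead.

```python
import math
from statistics import NormalDist

def z_test_power(d, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test for standardized effect d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n)  # how far the true effect moves the z statistic
    # Probability the statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

base = z_test_power(d=0.5, n=20)
more_n = z_test_power(d=0.5, n=40)
bigger_effect = z_test_power(d=0.8, n=20)
higher_alpha = z_test_power(d=0.5, n=20, alpha=0.10)

print(base, more_n, bigger_effect, higher_alpha)
```

Each of the three changes raises power above the baseline. A power analysis runs the same calculation in reverse, searching for the smallest n that reaches a target such as 0.80.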
Effect of small sample sizes on t-test
Increases variability and reduces efficiency.
Effect of outliers on t-test
Can disproportionately affect results in small samples.
Why is N-1 used instead of N?
To correct bias in estimating population variance from a sample.
What does a high kurtosis indicate?
Heavier tails and a higher probability of extreme values.
What does a low skew indicate?
A distribution that is approximately symmetric.
What is a critical value?
A point on the scale of the test statistic beyond which we reject the null hypothesis.
What does it mean if the p-value is less than α?
It indicates sufficient evidence to reject the null hypothesis.
How does statistical significance differ from practical significance?
Statistical significance means the observed result would be unlikely if the null hypothesis were true, while practical significance considers the real-world importance of the effect.
What is effect size used for?
To quantify the magnitude of a difference or relationship.
How do z-scores relate to probability?
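Z-scores indicate how many standard deviations an element is from the mean; for a normal distribution, each z-score corresponds to a known probability of falling below (or beyond) that value.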
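Example (z-score to probability): a sketch using the standard library's `statistics.NormalDist` (Python 3.8+) to look up probabilities for z = 1.96, assuming the data follow a normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

z = 1.96
p_below = std_normal.cdf(z)              # proportion of values below z
p_two_tail = 2 * (1 - std_normal.cdf(z)) # proportion beyond +/- z

print(p_below, p_two_tail)
```

About 97.5% of a normal distribution lies below z = 1.96, leaving about 5% in the two tails combined.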
What happens to the sampling distribution as sample size increases?
It becomes narrower and approaches a normal distribution.
What is a confidence interval?
A range of values, derived from sample statistics, that is believed to cover the true population parameter.
What does a confidence level of 95% mean?
If we were to take many samples, approximately 95% of the calculated confidence intervals would contain the true population parameter.
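Example (what "95%" means): a coverage simulation sketch, assuming a normal population with mean 50 and a known SD of 10, so that a z-based interval applies. The seed, sample size, and number of repetitions are arbitrary.

```python
import random
from statistics import NormalDist, fmean

random.seed(2)
MU, SIGMA, N, REPS = 50, 10, 30, 4000  # assumed population and design

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
half_width = z_crit * SIGMA / N ** 0.5

covered = 0
for _ in range(REPS):
    sample_mean = fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    # Does this sample's interval contain the true mean?
    if sample_mean - half_width <= MU <= sample_mean + half_width:
        covered += 1

coverage = covered / REPS
print(coverage)
```

The proportion of intervals that contain the true mean should come out close to 0.95. Any single interval either contains the parameter or does not; the 95% describes the procedure over repeated sampling.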
What is the relationship between variance and standard deviation?
Standard deviation is the square root of variance.
What is a two-tailed test?
A hypothesis test that checks for an effect in either direction (greater than or less than the null value).