Normal distribution - Hypothesis testing - Confidence interval
Normal distribution
The normal distribution is bell-shaped: it has a single central peak at the mean, where observations are most likely, and the probability density falls off symmetrically on both sides of the mean along the X-axis.
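As a small illustration of the central peak and the symmetric fall-off, the sketch below uses Python's standard-library NormalDist. The choice of a standard normal (mean 0, standard deviation 1) and the helper name prob_within are assumptions made for this example only.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1); chosen for illustration.
dist = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability of falling within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

# The density is highest at the mean and equal at the same distance on either side.
peak = dist.pdf(0.0)
left, right = dist.pdf(-1.5), dist.pdf(1.5)
print(f"density at mean: {peak:.4f}, at -1.5: {left:.4f}, at +1.5: {right:.4f}")

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The three probabilities come out at roughly 68%, 95% and 99.7%, which is why most observations sit near the centre.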
Skewness
Skewness measures the asymmetry of a distribution.
The normal distribution is symmetric and has a skewness of zero.
If the skewness is less than zero, the left tail of the distribution is longer (negative skew). If it is greater than zero, the right tail is longer (positive skew).
Interpreting skewness (rules of thumb)
Approximately symmetric: the skewness is between -0.5 and 0.5.
Moderately skewed: the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed).
Highly skewed: the skewness is less than -1 or greater than 1, that is, |skewness| > 1.
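The thresholds above can be sketched in code. This is a minimal example, assuming the moment-based (population) definition of skewness, the third central moment divided by the cubed standard deviation. The function names and the two small data sets are made up for illustration, and statistical libraries may apply a bias correction that gives slightly different values.

```python
def moment_skewness(data):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

def classify_skewness(skew):
    """Apply the rule-of-thumb thresholds (0.5 and 1) to the absolute skewness."""
    magnitude = abs(skew)
    if magnitude <= 0.5:
        return "approximately symmetric"
    if magnitude <= 1:
        return "moderately skewed"
    return "highly skewed"

symmetric = [1, 2, 3, 4, 5]          # hypothetical data, mirror image around 3
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical data with one large value

for name, data in (("symmetric", symmetric), ("right_tailed", right_tailed)):
    skew = moment_skewness(data)
    print(f"{name}: skewness = {skew:.3f} -> {classify_skewness(skew)}")
```

The sign of the result gives the direction of the longer tail, and the classification uses only its magnitude.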
Kurtosis
Kurtosis describes the tail behavior of a distribution, that is, how prone the distribution is to producing outliers.
It is often also described in terms of the peakedness or flatness of the distribution.
Types of kurtosis
Mesokurtic:
Tails are similar to a normal distribution.
Kurtosis is equal to 3.
Leptokurtic:
Kurtosis is greater than 3.
Tails are heavier than a normal distribution, indicating more outliers.
The distribution has a thin, tall peak.
Platykurtic:
Kurtosis is less than 3.
Tails are lighter (fewer outliers) compared to a normal distribution.
The distribution is broad with a lower peak.
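The three categories can be sketched as follows, assuming the moment-based kurtosis (fourth central moment divided by the squared variance), for which the normal distribution gives 3. The tolerance parameter and the sample data are hypothetical. Note that some libraries report excess kurtosis (kurtosis minus 3) by default, in which case the reference value is 0 rather than 3.

```python
def moment_kurtosis(data):
    """Moment-based kurtosis: fourth central moment / (second central moment ** 2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2

def classify_kurtosis(kurt, tolerance=0.1):
    """Compare against 3; 'tolerance' is an arbitrary band chosen for this sketch."""
    if abs(kurt - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if kurt > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5]                          # evenly spread, light tails
heavy_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # mostly central, two extreme values

for name, data in (("flat", flat), ("heavy_tailed", heavy_tailed)):
    kurt = moment_kurtosis(data)
    print(f"{name}: kurtosis = {kurt:.2f} -> {classify_kurtosis(kurt)}")
```

Because a sample kurtosis is almost never exactly 3, some tolerance band is needed in practice; the width used here is only a placeholder.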
Box-plot
When the median is in the middle of the box, and the whiskers are about the same on both sides, the distribution is symmetric.
The distribution is positively skewed when the median is closer to the bottom of the box and the lower whisker is shorter than the upper whisker.
The distribution is negatively skewed when the median is closer to the top of the box and the upper whisker is shorter than the lower whisker.
Box plots are useful as they show the dispersion of a data set.
In statistics, dispersion is the extent to which a distribution is stretched or squeezed.
Box plots are useful as they show outliers within a data set.
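The quantities a box plot draws can be computed directly. The sketch below assumes the common 1.5 x IQR convention for whiskers and outliers, which these notes do not state, and uses the "inclusive" quartile method of Python's statistics.quantiles; other quartile conventions give slightly different values. The function name and data are made up for illustration.

```python
from statistics import quantiles

def box_plot_summary(data, whisker_factor=1.5):
    """Quartiles, IQR, whisker ends, and outliers under the 1.5 x IQR convention."""
    ordered = sorted(data)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box, a measure of dispersion
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": inside[0],   # smallest value that is not an outlier
        "upper_whisker": inside[-1],  # largest value that is not an outlier
        "outliers": outliers,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
for key, value in summary.items():
    print(f"{key}: {value}")
```

Comparing the distance from the median to each edge of the box, and the lengths of the two whiskers, gives the symmetry reading described above.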
Q-Q plot
A Q-Q (quantile-quantile) plot compares two probability distributions graphically by plotting their quantiles against each other. If the two distributions are identical, the points lie exactly on the straight line y = x. To check normality, the sample quantiles are plotted against the quantiles of a normal distribution; points that fall close to a straight line suggest that the data are approximately normal.
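The points of a normal Q-Q plot can be computed without any plotting library. This is a minimal sketch, assuming the plotting positions (i + 0.5) / n for i = 0, ..., n - 1, which is one of several conventions in use; the function name and sample values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Pair each ordered sample value with the matching standard normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n keep the probabilities strictly between 0 and 1.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.5, 5.6]  # hypothetical measurements
for theoretical_q, sample_q in normal_qq_points(sample):
    print(f"theoretical {theoretical_q:+.3f}  sample {sample_q:.1f}")
```

Plotting the first value of each pair on the X-axis and the second on the Y-axis gives the Q-Q plot; for normally distributed data the points fall near a straight line whose slope and intercept reflect the standard deviation and mean of the sample.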
Testing of hypothesis
The primary objective of inferential statistical analysis is to use data from a sample to make inferences about the population from which the sample was drawn.
A hypothesis test (or test of significance) is a standard procedure for evaluating a claim about a population's property.
Null and alternative hypothesis
The null hypothesis is a statement that the value of a population parameter is equal to some claimed value.
The alternative hypothesis is the statement that the parameter has a value that somehow differs from the null hypothesis.
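Two types:
One-sided alternative hypothesis: the parameter is greater than the claimed value, or the parameter is less than the claimed value.
Two-sided alternative hypothesis: the parameter is not equal to the claimed value.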
The test statistic
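The test statistic is a value computed from the sample data that measures how far the sample is from what the null hypothesis predicts.
The P-value (probability value) is the probability of getting a test statistic value that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true.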
If p-value < α, reject the null hypothesis.
If p-value ≥ α, fail to reject the null hypothesis (this does not prove that the null hypothesis is true).
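The decision rule can be illustrated with a one-sample z-test for a mean. This is only a sketch under stated assumptions: the population standard deviation is treated as known, the test statistic is z = (sample mean - claimed mean) / (sigma / sqrt(n)), and all numbers, the function name and the alternative labels are invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return the z statistic and its p-value for a one-sample z-test."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":      # one-sided: parameter > claimed value
        p = 1.0 - std_normal.cdf(z)
    elif alternative == "less":       # one-sided: parameter < claimed value
        p = std_normal.cdf(z)
    elif alternative == "two-sided":  # parameter differs in either direction
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p

alpha = 0.05  # significance level chosen before looking at the data
z, p = z_test_p_value(sample_mean=52.0, mu0=50.0, sigma=5.0, n=25)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p:.4f} -> {decision}")
```

With the same data, a one-sided alternative in the direction of the observed difference gives half the two-sided p-value, which is why the form of the alternative hypothesis has to be fixed in advance.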
The significance level
Significance level (α, alpha): the probability of a Type I error that we are willing to accept (common values: 0.05, 0.01, 0.10).
Type I Error: rejecting the null hypothesis when it is true.
Type II Error: failing to reject the null hypothesis when it is false.
Key trade-off: for a fixed sample size, lowering α reduces the risk of a Type I error but generally increases the risk of a Type II error.
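The trade-off can be made concrete with a small calculation. The sketch below assumes an upper-tailed one-sample z-test with known standard deviation and a specific true shift in the mean; under those assumptions the Type II error probability is Φ(z_crit - shift / (sigma / sqrt(n))), where Φ is the standard normal CDF and z_crit is the critical value for α. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, true_shift, sigma, n):
    """Probability of failing to reject H0 when the true mean exceeds mu0 by true_shift."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1.0 - alpha)   # rejection threshold for the z statistic
    shift_in_se = true_shift / (sigma / sqrt(n))   # true shift measured in standard errors
    return std_normal.cdf(critical_z - shift_in_se)

# Same data-generating situation, three different significance levels.
betas = {alpha: type_ii_error(alpha, true_shift=0.5, sigma=1.0, n=16)
         for alpha in (0.10, 0.05, 0.01)}

for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f} -> Type II error probability = {beta:.3f}")
```

In this setting the Type II error probability grows as α shrinks; increasing the sample size is the usual way to reduce both risks at once.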