Graphical models called %%density curves%% can be helpful to describe the location of individuals within a distribution. Such models are especially helpful when data falls in a bell-shaped pattern called a %%normal distribution%%.
One way to describe a data point’s location in a distribution is to state the percent of observations that are less than it, known as its %%percentile%%.
IMPORTANT: Some people define the pth percentile as the value with p percent of observations less than or equal to it.
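As a rough illustration of the two definitions, the sketch below computes a percentile both ways. The function name percentile_of, the inclusive flag, and the scores are invented for this example and do not come from any real data set.

```python
def percentile_of(value, data, inclusive=False):
    """Percent of observations below `value` (or at or below it if inclusive)."""
    if inclusive:
        count = sum(1 for x in data if x <= value)
    else:
        count = sum(1 for x in data if x < value)
    return 100 * count / len(data)


# Hypothetical test scores, invented for illustration.
scores = [62, 70, 74, 78, 80, 83, 85, 88, 91, 95]

print(percentile_of(85, scores))                  # 6 of 10 scores are below 85 -> 60.0
print(percentile_of(85, scores, inclusive=True))  # 7 of 10 are at or below 85 -> 70.0
```

The same score lands at the 60th or the 70th percentile depending on the definition, so it is worth stating which one is in use.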
%%Cumulative relative frequency%% is the sum of the counts for the current class and all classes with smaller values of the variable, divided by n (the total number of observations) and multiplied by 100 to express it as a percent.
To make a %%cumulative relative frequency graph%%, we plot a point corresponding to the cumulative relative frequency in each class at the smallest value of the next class.
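The calculation and the plotting rule can be sketched as follows. The classes and counts are hypothetical, and starting the graph at 0% at the smallest value of the first class is a common convention assumed here rather than something stated above.

```python
from itertools import accumulate

# Hypothetical classes (lower bound included, upper bound excluded) and their counts.
classes = [(40, 45), (45, 50), (50, 55), (55, 60)]
counts = [2, 7, 8, 3]

n = sum(counts)
cumulative_counts = list(accumulate(counts))
cumulative_percents = [100 * c / n for c in cumulative_counts]

# Each class's cumulative percent is plotted at its upper bound,
# which is the smallest value of the next class.
points = [(classes[0][0], 0.0)]  # assumed starting point: 0% at the first lower bound
points += [(upper, pct) for (_, upper), pct in zip(classes, cumulative_percents)]

print(points)  # [(40, 0.0), (45, 10.0), (50, 45.0), (55, 85.0), (60, 100.0)]
```

Connecting these points with line segments gives the graph, and the last point always sits at 100%.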
Converting observations from original values to standard deviation units is known as %%standardizing%%. To standardize a value, subtract the mean of the distribution and then divide the difference by the standard deviation.
If x is an observation from a distribution that has a known mean and standard deviation, the %%standardized score (z-score)%% for x is
z = (x - mean) / standard deviation
We often standardize observations to express them on a common scale.
Ex: comparing the heights of two children of different ages
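A minimal sketch of that comparison is shown below. The heights, means, and standard deviations are made-up numbers chosen to keep the arithmetic easy; they are not real growth-chart values.

```python
def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Hypothetical figures (in cm) for two age groups, invented for illustration.
z_younger = z_score(110, mean=105, sd=5)   # (110 - 105) / 5  ->  1.0
z_older = z_score(150, mean=155, sd=10)    # (150 - 155) / 10 -> -0.5

print(z_younger, z_older)
```

Although the older child is taller in centimeters, the younger child is taller relative to others of the same age: 1 standard deviation above the mean versus half a standard deviation below it.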
To find the standardized score (z-score) for an individual observation, the data is transformed by subtracting the mean and dividing the difference by the standard deviation. Transforming converts the observation from the original units of measurement to a standardized scale.
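Adding the same positive number a to (or subtracting a from) each observation adds a to (or subtracts a from) measures of center and location (mean, median, quartiles, percentiles), but does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).
Multiplying (or dividing) each observation by the same positive number b multiplies (or divides) measures of center and location by b, multiplies (or divides) measures of spread by b, and does not change the shape of the distribution.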
Standardizing combines both kinds of transformation: subtracting the mean, then dividing by the standard deviation. The shape of the distribution stays the same, but the center and spread change. For a distribution of z-scores, the mean is always 0 and the standard deviation is always 1.
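The effect of these transformations on center and spread can be checked on a small data set. The values below are hypothetical, and the population standard deviation (pstdev) is used as an assumption to keep the numbers round; the same pattern holds with the sample standard deviation as long as it is used consistently.

```python
from statistics import mean, pstdev

# Hypothetical data chosen so that the mean is 5 and the standard deviation is 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 10 for x in data]   # add a = 10 to every observation
scaled = [x * 3 for x in data]     # multiply every observation by b = 3

mu, sigma = mean(data), pstdev(data)
z_scores = [(x - mu) / sigma for x in data]  # standardize every observation

for label, values in [("original", data), ("add 10", shifted),
                      ("times 3", scaled), ("z-scores", z_scores)]:
    print(f"{label:>9}: mean = {mean(values):.2f}, SD = {pstdev(values):.2f}")
```

Adding 10 moves the mean from 5 to 15 and leaves the standard deviation at 2, multiplying by 3 changes both (to 15 and 6), and the z-scores come out with mean 0 and standard deviation 1.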
A %%density curve%% is a curve that is always on or above the horizontal axis and has an area of exactly 1 underneath it.
Density curves describe the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.
A density curve is often a good description of the overall pattern of a distribution, but it does not describe outliers, which are departures from that pattern.
IMPORTANT: No set of real data is exactly described by a density curve. The curve is an approximation that is easy to use and accurate enough for practical use.
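To make the "area equals proportion" idea concrete, here is a small numerical sketch. The density f(x) = x / 2 on the interval from 0 to 2 is a hypothetical curve chosen only because its areas are easy to verify by hand, and area_under is an illustrative helper using a simple midpoint rule.

```python
def density(x):
    """Hypothetical density curve: f(x) = x / 2 on [0, 2], and 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0


def area_under(curve, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under `curve` between a and b."""
    width = (b - a) / steps
    return sum(curve(a + (i + 0.5) * width) for i in range(steps)) * width


total_area = area_under(density, 0, 2)  # about 1: the whole distribution
below_one = area_under(density, 0, 1)   # about 0.25: proportion of observations between 0 and 1

print(round(total_area, 6), round(below_one, 6))
```

The total area is 1, as it must be for any density curve, and the area above the interval from 0 to 1 says that about 25% of observations fall there.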
Measures of center and spread apply to density curves in addition to the actual sets of data.
The %%median of a density curve%% is the “equal-areas point”, the point where half the area under the curve is to the left and the other half is to the right. Since density curves are idealized patterns, a symmetric density curve is exactly symmetric. Therefore, its median is exactly at the center. When the curve is skewed, the median is harder to locate by eye and must be found mathematically.
The %%mean of a density curve%% is the point at which the curve would balance if made of solid material.
The median and mean are the same for a symmetric density curve. They’re at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the tail.
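The sketch below locates both points numerically for a skewed curve. It assumes a hypothetical left-skewed density, f(x) = x / 2 on the interval from 0 to 2 (its tail runs toward 0); the helper names and the bisection search are illustrative choices, not a prescribed method.

```python
def density(x):
    """Hypothetical left-skewed density: f(x) = x / 2 on [0, 2], and 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0


def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width


# Mean: the balance point, computed as the integral of x * f(x).
mean = integrate(lambda x: x * density(x), 0, 2)

# Median: the equal-areas point, found by bisection on the area to its left.
lo, hi = 0.0, 2.0
for _ in range(50):
    mid = (lo + hi) / 2
    if integrate(density, 0, mid) < 0.5:
        lo = mid   # less than half the area lies to the left, so move right
    else:
        hi = mid
median = (lo + hi) / 2

print(round(mean, 4), round(median, 4))  # about 1.3333 and 1.4142
```

The mean (4/3) is smaller than the median (the square root of 2), so it has been pulled toward the long left tail.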
Because a density curve is an idealized description of a distribution, its mean and standard deviation use different notation from the mean and standard deviation computed from actual data.
The mean of a density curve is written with the Greek letter mu, and its standard deviation with the Greek letter sigma.
One particularly important class of density curves is the normal curves. The distributions they describe are %%normal distributions%%.
Normal distributions are important in statistics because
Also known as the “empirical rule”, %%the 68-95-99.7 rule%% is followed by all normal distributions.
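In a normal distribution with mean mu and standard deviation sigma:
Approximately 68% of the observations fall within 1 sigma of mu.
Approximately 95% of the observations fall within 2 sigma of mu.
Approximately 99.7% of the observations fall within 3 sigma of mu.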
IMPORTANT: The 68-95-99.7 rule applies only to normal distributions.
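The rule can be checked numerically with Python's statistics.NormalDist. The mean of 100 and standard deviation of 15 are hypothetical; any normal distribution gives the same proportions.

```python
from statistics import NormalDist

# Hypothetical normal distribution; the choice of mu and sigma does not affect the result.
dist = NormalDist(mu=100, sigma=15)

proportions = {}
for k in (1, 2, 3):
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    # Area under the normal curve between mu - k*sigma and mu + k*sigma.
    proportions[k] = dist.cdf(upper) - dist.cdf(lower)
    print(f"within {k} sigma of mu: {proportions[k]:.4f}")
```

The three areas come out to about 0.6827, 0.9545, and 0.9973, which round to the 68, 95, and 99.7 in the rule's name.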
Changing to standardized units z uses the formula
z = (x - mu) / sigma
If the variable we standardize has a normal distribution, then so does the new variable z. This new distribution, which has mean 0 and standard deviation 1, is called the %%standard normal distribution%%.
Because all normal distributions are the same once we standardize, we can find the area under any normal curve from a single table: Table A, the %%standard normal table%%.
We can answer a question about areas in any normal distribution by standardizing and using Table A or by using technology.
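As one example of the technology route, the sketch below standardizes a value and looks up the area to its left, with statistics.NormalDist standing in for Table A. The mean, standard deviation, and observed value are hypothetical numbers chosen so that z comes out to exactly 1.

```python
from statistics import NormalDist

# Hypothetical normal distribution of heights (inches), invented for illustration.
mu, sigma = 64.5, 2.5
x = 67

# Step 1: standardize.
z = (x - mu) / sigma  # (67 - 64.5) / 2.5 = 1.0

# Step 2: area to the left of z under the standard normal curve
# (the quantity a standard normal table reports).
area_left = NormalDist().cdf(z)

# The same area obtained directly from the original distribution, without standardizing.
area_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(area_left, 4), round(area_direct, 4))  # 1.0 0.8413 0.8413
```

Both routes give the same answer: about 84% of observations fall below 67. The area to the right is 1 minus this value, and the area between two values is the difference of their left-hand areas.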
Just because a distribution looks normal doesn’t mean it is. A %%normal probability plot%% provides a good assessment of whether a data set follows a normal distribution. When you examine a normal probability plot, look for shapes that show clear departures from normality.
If the points on a normal probability plot lie close to a straight line, the data are approximately normally distributed. Systematic deviations from the straight line show that the data are not normally distributed. Outliers appear as points that are far away from the overall pattern of the plot.
In a right-skewed distribution, the largest observations fall distinctly to the right of a line drawn through the main body of points. Similarly, left skewness is evident when the smallest observations fall to the left of the line.
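The coordinates of a normal probability plot can be sketched as follows: each sorted observation is paired with the z-score expected at its position in a normal distribution. The function name normal_plot_points is illustrative, the plotting position (i - 0.5) / n is one common convention assumed here (software packages differ), and the data are made up.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each sorted observation with its expected standard normal z-score."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()
    # Assumed plotting position: the i-th smallest value sits at percentile (i - 0.5) / n.
    return [(x, standard_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]


# Hypothetical right-skewed data: most values are small, a few are much larger.
skewed = [1, 2, 2, 3, 3, 4, 5, 7, 12, 25]

for x, z in normal_plot_points(skewed):
    print(f"{x:>3}  {z:+.2f}")
```

Plotting the data values on the horizontal axis against the expected z-scores on the vertical axis gives the normal probability plot. For these skewed data, the largest values (12 and 25) would sit well to the right of a line through the main body of points.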