central tendency
the extent to which the values of a numerical variable group around a typical or central value.
variation
the amount of dispersion or scattering away from a central value that the values of a numerical variable show.
shape
the pattern of the distribution of values from the lowest value to the highest value.
Mean
sum of values divided by the number of values
The most common measure of central tendency.
Affected by extreme values (outliers).
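A minimal Python sketch using the standard-library `statistics` module. The values are made up for illustration. Replacing one value with an extreme one drags the mean upward:

```python
from statistics import mean

values = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 36]  # same data, largest value replaced by an outlier

# Mean = sum of values / number of values
print(mean(values))        # 4
print(mean(with_outlier))  # 10: one extreme value moved the mean a lot
```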
median
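the “middle” value of the ranked data (50% of the values at or below it, 50% at or above it); with an even number of values, it is the mean of the two middle values.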
less sensitive to extreme values
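A minimal Python sketch with made-up data, using `statistics.median`. The outlier that moved the mean leaves the median unchanged, and an even number of values gives the average of the two middle ones:

```python
from statistics import median

values = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 36]
even_count = [1, 3, 5, 7]

print(median(values))        # 4
print(median(with_outlier))  # 4: the outlier does not move the middle value
print(median(even_count))    # 4.0: average of the two middle values, 3 and 5
```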
mode
Value that occurs most often.
Not affected by extreme values.
Used for either numerical or categorical data.
There may be no mode.
There may be several modes.
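A minimal Python sketch using `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count. The data are illustrative. When every value occurs only once, all values are returned, which in practice means there is no meaningful mode:

```python
from statistics import multimode

print(multimode([1, 2, 2, 3, 9]))         # [2]
print(multimode(["red", "blue", "red"]))  # ['red']: works for categorical data
print(multimode([1, 1, 3, 3, 5]))         # [1, 3]: several modes
print(multimode([1, 2, 3, 4]))            # [1, 2, 3, 4]: every value occurs once, so no real mode
```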
measures of variation:
give information on the spread, variability, or dispersion of the data values.
range
Simplest measure of variation.
Difference between the largest and the smallest values: Range = Xlargest − Xsmallest.
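A one-line computation in Python on made-up data. The variable is named `data_range` only to avoid shadowing Python's built-in `range`:

```python
values = [7, 2, 9, 4, 12, 5]

# Range = Xlargest - Xsmallest
data_range = max(values) - min(values)
print(data_range)  # 10
```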
sample variance
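Average (approximately) of squared deviations of values from the mean: S² = Σ(Xi − X̄)² / (n − 1). Dividing by n − 1 rather than n is why it is only approximately an average.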
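A minimal Python sketch that spells out the n − 1 formula and checks it against the standard library's `statistics.variance`. The helper name `sample_variance` and the data are illustrative assumptions:

```python
from statistics import mean, variance

def sample_variance(xs):
    """S^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

values = [1, 3, 5, 7, 9]  # mean 5, squared deviations 16 + 4 + 0 + 4 + 16 = 40
print(sample_variance(values))  # 40 / 4, i.e. 10
print(variance(values))         # the standard library gives the same value
```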
sample standard deviation
Most commonly used measure of variation.
Shows variation about the mean.
Is the square root of the variance.
Has the same units as the original data.
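A short Python check, on illustrative data assumed to be in centimetres, that `statistics.stdev` is the square root of the sample variance and so is expressed in the data's own units:

```python
import math
from statistics import stdev, variance

values = [1, 3, 5, 7, 9]  # hypothetical lengths in cm

s = stdev(values)
print(s)                                             # about 3.162 (cm, like the data)
print(math.isclose(s, math.sqrt(variance(values))))  # True: S is the square root of S^2 (cm^2)
```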
coefficient of variation
Measures relative variation.
Always expressed as a percentage (%): CV = (S / X̄) × 100%.
Shows variation relative to mean.
Can be used to compare the variability of two or more sets of data measured in different units.
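A minimal Python sketch comparing two hypothetical data sets recorded in different units (grams and dollars). The helper name `coefficient_of_variation` is illustrative, and it assumes the sample standard deviation S and a positive mean:

```python
from statistics import mean, stdev

def coefficient_of_variation(xs):
    """CV = (S / X-bar) * 100%, using the sample standard deviation S."""
    return stdev(xs) / mean(xs) * 100

# Hypothetical data: item weights in grams and item prices in dollars
weights_g = [200, 210, 190, 205, 195]
prices_usd = [2.0, 3.0, 2.5, 1.5, 3.5]

print(round(coefficient_of_variation(weights_g), 1))   # about 4.0 (%)
print(round(coefficient_of_variation(prices_usd), 1))  # about 31.6 (%): prices vary far more, relative to their mean
```

Because CV is unit-free, the two percentages can be compared directly even though the standard deviations themselves (grams versus dollars) cannot.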
shape of a distribution
Describes how data are distributed.
two useful measures of shape:
skewness
kurtosis
skewness
Measures the extent to which data values are not symmetrical.
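A minimal Python sketch of one common definition, the moment coefficient of skewness g1 = m3 / m2^1.5, where mk is the average of (Xi − X̄)^k. The function name and data are illustrative; other textbooks and software may apply a small-sample adjustment, so their numbers can differ slightly. The sign is what matters here: zero for symmetric data, positive for a long right tail, negative for a long left tail:

```python
from statistics import fmean

def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    x_bar = fmean(xs)
    n = len(xs)
    m2 = sum((x - x_bar) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - x_bar) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))       # 0.0: symmetric
print(skewness([1, 1, 2, 2, 3, 10]))   # positive: long right tail (right-skewed)
print(skewness([-10, 3, 3, 4, 4, 5]))  # negative: long left tail (left-skewed)
```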
kurtosis
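Kurtosis measures the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the distribution. In practice, the moment-based measure is driven mostly by values far from the mean, so it is often read as an indicator of how heavy the tails are compared with a normal distribution.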
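A minimal Python sketch of one common definition, the moment-based excess kurtosis g2 = m4 / m2² − 3, where mk is the average of (Xi − X̄)^k. The function name and data are illustrative assumptions, and many textbooks and spreadsheet functions use a small-sample-adjusted version, so their numbers can differ. A normal distribution gives 0; evenly spread data give a negative value, while data with a sharp center plus a few far-out values give a positive one:

```python
from statistics import fmean

def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (a normal distribution gives 0)."""
    x_bar = fmean(xs)
    n = len(xs)
    m2 = sum((x - x_bar) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - x_bar) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # evenly spread, no tails
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]  # most values at the center, two far out

print(round(excess_kurtosis(flat), 2))    # -1.23
print(round(excess_kurtosis(peaked), 2))  # 1.5
```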
quartiles
split the ranked data into 4 segments with an equal number of values per segment.
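Several conventions exist for locating quartiles. A minimal Python sketch using `statistics.quantiles` (Python 3.8+) with `method="exclusive"`, which places Qk at ranked position k(n + 1)/4 and interpolates between neighbouring values when that position is fractional. Other conventions, such as `method="inclusive"` or textbook rounding rules, can give slightly different values. The data are illustrative:

```python
from statistics import quantiles

ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # n = 9

# n=4 returns the three cut points Q1, Q2 (median), Q3.
# Positions: 1(9 + 1)/4 = 2.5, 5, and 7.5 in the ranked data.
q1, q2, q3 = quantiles(ranked, n=4, method="exclusive")
print(q1, q2, q3)  # 12.5 16.0 19.5
```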
five number summary
The five numbers that help describe the center, spread, and shape of data:
Xsmallest
First Quartile (Q1)
Median (Q2)
Third Quartile (Q3)
Xlargest
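A minimal Python sketch that assembles the five numbers. The helper name `five_number_summary` is illustrative, and the quartiles follow the "exclusive" (n + 1) convention of `statistics.quantiles`, so another quartile rule could shift Q1 and Q3 slightly:

```python
from statistics import quantiles

def five_number_summary(xs):
    """Return (Xsmallest, Q1, median, Q3, Xlargest) using 'exclusive' quartiles."""
    q1, q2, q3 = quantiles(xs, n=4, method="exclusive")
    return min(xs), q1, q2, q3, max(xs)

print(five_number_summary([11, 12, 13, 16, 16, 17, 18, 21, 22]))
# (11, 12.5, 16.0, 19.5, 22)
```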
Interquartile range (midspread)
measures the spread in the middle 50% of the data: IQR = Q3 − Q1.
a measure of variability that is not influenced by outliers or extreme values.
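A minimal Python sketch, on illustrative data, showing that making the largest value extreme leaves the IQR unchanged (the range, by contrast, grows from 11 to 209). The helper name `iqr` is an assumption, and it uses the "exclusive" quartile convention:

```python
from statistics import quantiles

def iqr(xs):
    """Interquartile range Q3 - Q1, with 'exclusive' quartiles."""
    q1, _, q3 = quantiles(xs, n=4, method="exclusive")
    return q3 - q1

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
with_outlier = [11, 12, 13, 16, 16, 17, 18, 21, 220]  # largest value made extreme

print(iqr(data))          # 7.0
print(iqr(with_outlier))  # 7.0: only Xlargest changed, so Q1 and Q3 did not
```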
boxplot
A graphical display of the data based on the five-number summary.
If the data are symmetric around the median, the median line lies in the center of the box and the box lies in the center between the endpoints.
population parameters:
population mean
variance
standard deviation
population mean
the sum of the values in the population divided by the population size, N.
population variance
Average of squared deviations of values from the population mean: σ² = Σ(Xi − μ)² / N.
population standard deviation
Most commonly used measure of variation.
Shows variation about the mean.
Is the square root of the population variance.
Has the same units as the original data.
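A minimal Python sketch that treats five illustrative values as an entire population (N = 5). `statistics.pvariance` and `statistics.pstdev` divide by N, whereas the sample versions divide by n − 1:

```python
from statistics import mean, pstdev, pvariance, variance

population = [1, 3, 5, 7, 9]  # assumed to be the whole population, N = 5

mu = mean(population)             # population mean: sum / N
sigma_sq = pvariance(population)  # sum of squared deviations from mu (40), divided by N
sigma = pstdev(population)        # square root of the population variance

print(mu, sigma_sq, sigma)   # 5, 8, and about 2.828
print(variance(population))  # 10: the sample formula divides by n - 1 instead
```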
empirical rule
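approximates the variation of data in a symmetric mound-shaped distribution: about 68% of the values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.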
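The rule's percentages come from the normal distribution, the usual model for a symmetric mound-shaped distribution. This sketch (assuming Python 3.8+ for `statistics.NormalDist`) computes the exact normal-curve shares that the rule rounds:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.1%}")
# within 1 SD of the mean: 68.3%
# within 2 SD of the mean: 95.4%
# within 3 SD of the mean: 99.7%
```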
Chebyshev’s Rule
Regardless of how the data are distributed, at least (1 − 1/k²) × 100% of the values will fall within k standard deviations of the mean (for k > 1).
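A minimal Python sketch comparing the Chebyshev lower bound with the share actually observed in a hypothetical, strongly right-skewed data set. The list is treated as a whole population, so σ comes from `statistics.pstdev`; the helper names are illustrative:

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is stated for k > 1")
    return 1 - 1 / k ** 2

def share_within(xs, k):
    """Observed share of values within k standard deviations of the mean."""
    mu, sigma = mean(xs), pstdev(xs)
    return sum(abs(x - mu) <= k * sigma for x in xs) / len(xs)

skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9, 30]  # hypothetical, strongly right-skewed

for k in (1.5, 2, 3):
    # The observed share is never below the bound, whatever the shape of the data
    print(k, chebyshev_bound(k), share_within(skewed, k))
```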
covariance
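measures the direction of the linear relationship between two numerical variables (X & Y): cov(X, Y) = Σ(Xi − X̄)(Yi − Ȳ) / (n − 1). A positive value means Y tends to increase as X increases, and a negative value means Y tends to decrease; because its size depends on the units of X and Y, covariance alone does not show how strong the relationship is.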
coefficient of correlation
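Measures the relative strength of the linear relationship between two numerical variables: r = cov(X, Y) / (SX · SY). It has no units and ranges from −1 to +1; the closer r is to −1 or +1, the stronger the linear relationship, and r = 0 means no linear relationship.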
Data analysis is objective:
Should report the summary measures that best describe and communicate the important aspects of the data set.
Data interpretation is subjective:
Should be done in a fair, neutral, and clear manner.
Numerical descriptive measures:
Should document both good and bad results.
Should be presented in a fair, objective and neutral manner.
Should not use inappropriate summary measures to distort facts.