Measures of Central Tendency, Dispersion and Location
MEASURES OF CENTRAL TENDENCY- A single value that attempts to describe a set of data by identifying the central position within that set of data. The measures of central tendency are sometimes called measures of central location. The common measures are the Mean, Median and Mode. These are all valid measures of central tendency, but under different conditions some become more appropriate to use than others.
MEAN (AVERAGE)- It is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data. An important property of the mean is that it includes every value in your data set as part of the calculation.
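· The formula for the mean can be expressed in two ways, depending on whether the data describe a population or a sample:
o Population mean: $\mu = \frac{\sum X}{N}$, where N is the number of values in the population
o Sample mean: $\bar{X} = \frac{\sum X}{n}$, where n is the number of values in the sample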
EXAMPLE:
· Find the sample mean: Scores in BIOE211 Quizzes from Quiz 1 – 5.
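A minimal Python sketch of the mean calculation. The quiz scores below are hypothetical placeholders, not the actual BIOE211 scores, and the helper name `mean` is our own. The same arithmetic gives the population mean µ (divide by N) and the sample mean X̄ (divide by n); only the notation changes.

```python
def mean(values):
    """Arithmetic mean: the sum of all values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# Hypothetical scores for Quiz 1-5 (illustration only, not the BIOE211 data)
quiz_scores = [8, 10, 7, 9, 6]
print(mean(quiz_scores))  # 8.0
```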
When should the mean not be used?
· The mean has one main disadvantage: it is particularly susceptible to the influence of outliers, which are values that are unusual compared with the rest of the data set because they are especially small or large in numerical value. For example, consider the wages of staff at a factory below:
· The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to reflect the typical salary of a worker, as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large salaries, so in this situation we would like a better measure of central tendency. As we will find out later, the median would be a better measure of central tendency in this situation.
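To see the effect of outliers, here is a small sketch using Python's standard `statistics` module. The salaries are hypothetical (not the factory data above): eight typical salaries plus two very large ones.

```python
from statistics import fmean, median

# Hypothetical salaries in $k (illustration only): eight typical values
# plus two very large salaries acting as outliers.
salaries = [12, 13, 14, 15, 15, 16, 17, 18, 60, 90]

print(fmean(salaries))   # 27.0  (pulled upward by the two outliers)
print(median(salaries))  # 15.5  (stays within the typical range)
```

The two large salaries drag the mean well above what most workers earn, while the median stays inside the typical range.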
MEDIAN
· The middle score for a set of data that has been arranged in order of magnitude.
· It is less affected by outliers and skewed data than the mean.
EXAMPLE:
Step 1 – REARRANGE the data in order of magnitude.
Step 2 – Identify the median mark: 56
o It is the middle mark because there are 5 scores before it and 5 scores after it. This works directly only for an ODD NUMBER of scores. For an EVEN NUMBER of scores, take the two middle scores and average them.
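The two steps above, including the odd/even rule, can be sketched in Python as follows. The function name and the marks are hypothetical, chosen only for illustration.

```python
def median(values):
    """Median: sort the data, then take the middle score (odd n)
    or the average of the two middle scores (even n)."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)   # Step 1: arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # odd number of scores: one middle score
        return ordered[mid]
    # even number of scores: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical marks (illustration only)
print(median([65, 54, 56, 48, 79]))      # 56
print(median([65, 54, 56, 48, 79, 61]))  # 58.5
```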
MODE
· The mode is the most frequent score in our data set.
· The mode is often used for categorical data, where we wish to know which is the most common category.
· On a bar chart or histogram, the mode is represented by the highest bar.
Problems when using the mode
· When two or more values share the highest frequency, there is no single mode to report.
· When the most common mark is far away from the rest of the data in the data set, the mode does not describe the center of the data well.
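A short sketch of finding the mode with Python's `statistics.multimode` (available from Python 3.8), which returns every value that shares the highest frequency. All data sets here are hypothetical and chosen to mirror the cases described above.

```python
from statistics import multimode

# Hypothetical data sets (illustration only)
colours = ["blue", "red", "blue", "green"]   # categorical data
two_modes = [1, 1, 2, 3, 3, 4]               # 1 and 3 share the highest frequency
far_mode = [1, 1, 1, 7, 8, 9, 10, 11]        # most common value sits far from the rest

print(multimode(colours))    # ['blue']
print(multimode(two_modes))  # [1, 3]  -> no single mode
print(multimode(far_mode))   # [1]     -> not representative of the center
```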
MEASURES OF DISPERSION
MEASURES OF SPREAD
· Also called measures of dispersion.
· It is used to describe the variability in a sample or population.
· It is used in conjunction with a measure of central tendency.
· A measure of spread gives us an idea of how well the mean, for example, represents the data.
STANDARD DEVIATION
· It is a measure of the spread of scores within a set of data. The standard deviation measures how concentrated the data are around the mean. It is used in conjunction with the mean to summarize continuous data. It is appropriate only when the continuous data are not significantly skewed and do not contain outliers.
VARIANCE
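· Another measure of how far a group of scores deviates from the mean: it is the average of the squared deviations from the mean. The standard deviation is the square root of the variance.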
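Formula for standard deviation and variance:
o Population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ and $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
o Sample: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ and $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
· (The mean is represented by µ for a population or X̄ for a sample.)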
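A minimal sketch of the variance and standard deviation in Python, assuming the usual convention that the population version divides the sum of squared deviations by N and the sample version divides by n - 1. The helper names and the data are hypothetical.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean.
    sample=True divides by n - 1 (sample variance, s^2);
    sample=False divides by N (population variance, sigma^2)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values for this variance")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, sample))

# Hypothetical data (illustration only); the mean is 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False))  # 4.0
print(std_dev(data, sample=False))   # 2.0
print(variance(data))                # about 4.571 (32 / 7)
```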
RANGE
· The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, XL - XS.
· What is the range of the following data?
4 8 1 6 6 2 9 3 6 9
· The largest score (XL) is 9; the smallest score (XS) is 1; the range is XL - XS = 9 - 1 = 8.
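The range calculation for the data above in Python. The helper is named `data_range` (our own name) to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range: largest score (XL) minus smallest score (XS)."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)

scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]
print(data_range(scores))  # 8
```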
When to Use the Range
· The range is used when:
o you have ordinal data or
o you are presenting your results to people with little or no knowledge of statistics
· The range is rarely used in scientific work because it tells us little about how the rest of the data are spread:
o It depends on only two scores in the set of data, XL and XS
o Two very different sets of data can have the same range:
Example:
1 1 1 1 9 vs 1 3 5 7 9 (both have a range of 9 - 1 = 8)
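A quick check of the two example data sets: the range is identical, but the sample standard deviation (computed with Python's `statistics.stdev`) shows that the scores are spread differently. The helper `summarize` is a hypothetical name for illustration.

```python
import statistics

def summarize(values):
    """Return (range, sample standard deviation) for a data set."""
    return max(values) - min(values), statistics.stdev(values)

a = [1, 1, 1, 1, 9]
b = [1, 3, 5, 7, 9]

for name, values in (("a", a), ("b", b)):
    rng, sd = summarize(values)
    print(name, rng, f"{sd:.2f}")
# a 8 3.58
# b 8 3.16
```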
COEFFICIENT OF VARIATION
· A measure of relative variation, always expressed as a percentage (%). It shows the variation relative to the mean and is used to compare the variability of two or more groups.
Formula (for Sample):
$CV = \frac{s}{\bar{X}} \times 100\%$
where s is the sample standard deviation and X̄ is the sample mean.
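A minimal sketch of the sample coefficient of variation (sample standard deviation divided by the sample mean, times 100) using Python's `statistics` module. The function name and both groups are hypothetical; they show how a group with a larger standard deviation can still have less variation relative to its mean.

```python
import statistics

def coefficient_of_variation(values):
    """Sample coefficient of variation, in percent: (s / mean) * 100."""
    m = statistics.fmean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / m * 100

# Hypothetical groups (illustration only)
group_a = [10, 12, 14]     # mean 12, s = 2
group_b = [100, 110, 120]  # mean 110, s = 10

print(f"{coefficient_of_variation(group_a):.2f}%")  # 16.67%
print(f"{coefficient_of_variation(group_b):.2f}%")  # 9.09%
```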
· Now that you have learned the different summary measures, let’s practice with the given example:
· Compute the standard deviation, variance and coefficient of variation of the data set:
ANSWERS:
o SD: 2.30
o Variance: 5.3
o Coefficient of Variation: 14.74%
MEASURES OF LOCATION OR POSITION
PERCENTILES
· Numerical measures that give the position of a data value relative to the entire data set.
· Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
· The kth percentile, denoted as Pk, is the data value in the data set that separates the bottom k% of the data from the top (100-k)%.
DECILES
· Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dk.
· The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile and so on…
QUARTILES
· Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qk.
· The 1st quartile is the 25th percentile, the 2nd quartile is the 50th percentile (also the median), and the 3rd quartile is the 75th percentile.
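A sketch of percentiles, deciles and quartiles in Python, assuming the nearest-rank method, which returns an actual value from the data set as in the definition of Pk above. Textbooks and software use several other conventions (many interpolate between values), so results can differ slightly; with nearest rank, Q2 can also differ from the averaged median when n is even. The function names are hypothetical, and the scores reuse the data from the range example.

```python
def percentile_nearest_rank(values, k):
    """k-th percentile (integer k, 1..100) by the nearest-rank method:
    the smallest data value with at least k% of the data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    if not (isinstance(k, int) and 0 < k <= 100):
        raise ValueError("k must be an integer in 1..100")
    ordered = sorted(values)          # arrange the data as an array
    n = len(ordered)
    rank = (k * n + 99) // 100        # ceil(k * n / 100) using integer arithmetic
    return ordered[rank - 1]          # rank is 1-based

def decile(values, k):
    """D_k is the (10k)th percentile."""
    return percentile_nearest_rank(values, 10 * k)

def quartile(values, k):
    """Q_k is the (25k)th percentile."""
    return percentile_nearest_rank(values, 25 * k)

scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]   # sorted: 1 2 3 4 6 6 6 8 9 9
print(quartile(scores, 1), quartile(scores, 2), quartile(scores, 3))  # 3 6 8
print(decile(scores, 1), decile(scores, 9))                           # 1 9
```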
TERMINOLOGIES
· Histogram- A display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size.
· Outlier- A data point that differs significantly from other observations.