Discrete (or Ungrouped) Frequency Distribution:
Each frequency refers to a single discrete value; every class is distinct and separate from the others.
There is no continuity from one class to the next.
Examples:
Example Data:
Example Frequency Distribution Table:
No. of children | Tally marks | Frequency |
---|---|---|
0 | /// | 3 |
1 | ///// // | 7 |
2 | ///// /// | 8 |
3 | ///// // | 7 |
4 | //// | 4 |
5 | // | 2 |
6 | // | 2 |
Total | | 40 |
Continuous (or Grouped) Frequency Distribution:
There are three methods of classifying data according to class intervals:
Exclusive (Continuous) Method:
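A type of class interval in which the upper limit of one class is also the lower limit of the next, so the stated limits overlap; a value equal to the upper limit is excluded from that class and counted in the next one.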
Example Expenditure and Number of Families:
Expenditure (\) | Number of families |
---|---|
0-5000 | 60 |
5000-10000 | 95 |
10000-15000 | 122 |
15000-20000 | 83 |
20000-25000 | 40 |
Total | 400 |
The first class interval covers all values from 0 up to, but not including, 5000 (for example, 4999.99); a value of exactly 5000 belongs to the second class, and so on.
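The exclusive rule can be illustrated with a short Python sketch. The function name `exclusive_class` and the constant `BOUNDARIES` are illustrative assumptions; the limits are those of the expenditure table.

```python
from bisect import bisect_right

# Class limits taken from the expenditure table: 0-5000, 5000-10000, ...
BOUNDARIES = [0, 5000, 10000, 15000, 20000, 25000]

def exclusive_class(value, boundaries=BOUNDARIES):
    """Return the (lower, upper) class containing value; upper is excluded."""
    if not boundaries[0] <= value < boundaries[-1]:
        raise ValueError(f"{value} lies outside the tabulated classes")
    # bisect_right places a value equal to a limit in the class that starts there.
    i = bisect_right(boundaries, value) - 1
    return boundaries[i], boundaries[i + 1]

print(exclusive_class(4999.99))  # (0, 5000)
print(exclusive_class(5000))     # (5000, 10000)
```

A value that sits exactly on a shared limit is always assigned to the higher class, which is what makes the overlapping labels unambiguous.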
Inclusive (Discrete) Method:
In this method, the overlapping of the class intervals is avoided.
Both the lower and upper limits are included in the class interval.
This method is used for grouped frequency distributions of discrete variables, such as the number of members in a family or the number of workers in a factory, where the variable can take only integral values. It cannot be used for variables that take fractional values, such as age, height, or weight.
Example Distribution:
Class Interval (C.I) | Frequency |
---|---|
5-9 | 5 |
10-14 | 7 |
15-19 | 12 |
20-24 | 21 |
25-29 | 10 |
30-34 | 5 |
Total | 70 |
To decide whether to use the inclusive or exclusive method, it is important to determine whether the variable under observation is continuous or discrete.
In the case of continuous variables, the exclusive method must be used, and the inclusive method should be used in the case of discrete variables.
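For comparison, here is a minimal sketch of the inclusive rule, assuming integer data and the classes 5-9, 10-14, ..., 30-34 from the example distribution. The function name `inclusive_class` and its parameters are hypothetical.

```python
def inclusive_class(value, first_lower=5, width=5, n_classes=6):
    """Return the inclusive (lower, upper) class containing an integer value."""
    if not isinstance(value, int):
        raise TypeError("the inclusive method assumes integral values")
    index = (value - first_lower) // width
    if not 0 <= index < n_classes:
        raise ValueError(f"{value} lies outside the tabulated classes")
    lower = first_lower + index * width
    # Both limits belong to the class, so the upper limit is lower + width - 1.
    return lower, lower + width - 1

print(inclusive_class(9))   # (5, 9)
print(inclusive_class(10))  # (10, 14)
```

A fractional value such as 9.5 would fall between the classes 5-9 and 10-14, which is why the sketch rejects non-integers.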
Open-End Classes:
A class limit is missing at the lower end of the first class interval, at the upper end of the last class interval, or at both, so those classes are not fully specified.
Open-end classes are needed in practical situations, particularly with economic and medical data, when there are a few very high or very low values that lie far from the majority of observations.
Example:
Salary Range | Number of workers |
---|---|
Below 2000 | 7 |
2000-4000 | 5 |
4000-6000 | 6 |
6000-8000 | 4 |
8000 and above | 3 |
Total | 25 |
Example: Given the numbers of tools produced by workers in a factory:
38, 25, 13, 14, 27, 41, 47, 17, 32, 25, 43, 18, 25, 18, 39, 44, 19, 20, 20, 26, 40, 45, 34, 31, 32, 27, 33, 37, 25, 26, 33, 28, 31, 34, 35, 46, 29, 34, 31, 34, 35, 24, 30, 41, 32, 29, 28, 30, 31, 30, 34, 35, 36, 29, 26, 32, 36, 35, 36, 37, 23, 32, 23, 22, 29, 33, 37, 33, 27, 24, 36, 42, 29, 37, 29, 23, 44, 41, 45, 39, 21, 42, 22, 28, 22, 15, 16, 17, 21, 22, 29, 35, 31, 27, 40, 23, 32, 40, 37
Use Sturges' rule to determine the number of class intervals and prepare a frequency distribution table.
Solution:
Number of class intervals: K = 1 + 3.322 \log_{10} N = 1 + 3.322 \log_{10} 100 = 1 + 3.322(2) = 7.644 \approx 7.6
Class interval size: C = \frac{R}{K} = \frac{47 - 13}{7.6} = 4.47 \approx 5, where R is the range (largest value minus smallest value).
Taking C = 5, we have the inclusive classes 13-17, 18-22, ..., 43-47.
Frequency Table:
C.I | Tally | Frequency |
---|---|---|
13-17 | ///// / | 6 |
18-22 | ///// ///// // | 12 |
23-27 | ///// ///// ///// // | 17 |
28-32 | ///// ///// ///// ///// //// | 24 |
33-37 | ///// ///// ///// ///// /// | 23 |
38-42 | ///// ///// / | 11 |
43-47 | ///// // | 7 |
Total | | 100 |
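The Sturges calculation and the resulting class limits can be reproduced with a few lines of Python. This is a sketch only: the function names `sturges_k` and `inclusive_classes` are illustrative, and the smallest and largest values (13 and 47) are read off the data listed in the example.

```python
import math

def sturges_k(n):
    """Number of classes suggested by Sturges' rule: K = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)

def inclusive_classes(lowest, highest, width):
    """List inclusive (lower, upper) class limits covering lowest..highest."""
    classes = []
    lower = lowest
    while lower <= highest:
        classes.append((lower, lower + width - 1))
        lower += width
    return classes

k = sturges_k(100)
classes = inclusive_classes(13, 47, 5)
print(round(k, 1))  # 7.6
print(classes)
```

With a width of 5 the data are covered by seven classes, which is in line with the value of K suggested by the rule.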
A frequency distribution can be represented in the form of graphs and charts.
A histogram is also called a block frequency diagram.
A histogram represents a continuous distribution. If the class intervals are discrete (inclusive), they must be made continuous before the histogram is drawn by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit.
The histogram is constructed by plotting the class frequencies against the class boundaries.
Example: The scores of thirty students in a statistics examination were given as follows:
126, 145, 137, 145, 140, 146, 131, 143, 127, 133, 134, 144, 136, 135, 128, 130, 137, 142, 141, 139, 147, 149, 150, 148, 146, 150, 148, 151, 153, 155
Use the above information to obtain the histogram of the distribution.
Class interval (C.I) | Class boundaries (C.B) | Upper class boundary (U.C.B) | Frequency (F) |
---|---|---|---|
0-125 | 0-125.5 | 125.5 | 0 |
126-130 | 125.5-130.5 | 130.5 | 4 |
131-135 | 130.5-135.5 | 135.5 | 4 |
136-140 | 135.5-140.5 | 140.5 | 5 |
141-145 | 140.5-145.5 | 145.5 | 6 |
146-150 | 145.5-150.5 | 150.5 | 8 |
151-155 | 150.5-155.5 | 155.5 | 3 |
156-160 | 155.5-160.5 | 160.5 | 0 |
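The boundary adjustment and the frequency counts in the table can be checked with a short Python sketch. The function name `class_boundaries` and the variable names are illustrative assumptions, and the sketch assumes integer scores, so neighbouring class limits are 1 apart.

```python
# Examination scores from the example.
scores = [126, 145, 137, 145, 140, 146, 131, 143, 127, 133,
          134, 144, 136, 135, 128, 130, 137, 142, 141, 139,
          147, 149, 150, 148, 146, 150, 148, 151, 153, 155]

# Inclusive class limits of the non-empty classes in the table.
classes = [(126, 130), (131, 135), (136, 140),
           (141, 145), (146, 150), (151, 155)]

def class_boundaries(lower, upper, gap=1):
    """Convert inclusive class limits to continuous class boundaries.

    gap is the distance between one class's upper limit and the next
    class's lower limit (1 for integer data), so half of it is 0.5.
    """
    half = gap / 2
    return lower - half, upper + half

rows = []
for lower, upper in classes:
    frequency = sum(lower <= s <= upper for s in scores)
    rows.append((class_boundaries(lower, upper), frequency))

for (low_b, up_b), frequency in rows:
    print(f"{low_b}-{up_b}: {frequency}")
```

Each printed row gives the width (from the boundaries) and height (the frequency) of one rectangle of the histogram.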
A frequency polygon is obtained by plotting the midpoint of each class interval against the corresponding class frequency and joining the points with straight lines.
It can also be obtained by joining the midpoints of the tops of the rectangles of the histogram and extending the line at both ends to meet the x-axis.
The polygon will have the same area as the corresponding histogram if the class intervals are of equal width.
Using the examination scores above, plot the frequency polygon of the distribution.
Class interval (C.I) | Class boundaries (C.B) | Upper class boundary (U.C.B) | Frequency (F) | Mid-value |
---|---|---|---|---|
0-125 | 0-125.5 | 125.5 | 0 | 62.75 |
126-130 | 125.5-130.5 | 130.5 | 4 | 128.00 |
131-135 | 130.5-135.5 | 135.5 | 4 | 133.00 |
136-140 | 135.5-140.5 | 140.5 | 5 | 138.00 |
141-145 | 140.5-145.5 | 145.5 | 6 | 143.00 |
146-150 | 145.5-150.5 | 150.5 | 8 | 148.00 |
151-155 | 150.5-155.5 | 155.5 | 3 | 153.00 |
156-160 | 155.5-160.5 | 160.5 | 0 | 158.00 |
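The mid-values in the table are simply the averages of the class limits. A small Python sketch, with the hypothetical helper `mid_value`, builds the points to be plotted for the non-empty classes and the closing class 156-160:

```python
classes = [(126, 130), (131, 135), (136, 140),
           (141, 145), (146, 150), (151, 155)]
frequencies = [4, 4, 5, 6, 8, 3]

def mid_value(lower, upper):
    """Midpoint of a class: the average of its two limits."""
    return (lower + upper) / 2

# One (mid-value, frequency) point per class.
points = [(mid_value(lo, hi), f) for (lo, hi), f in zip(classes, frequencies)]
# The empty class 156-160 brings the polygon back to the x-axis.
points.append((mid_value(156, 160), 0))

print(points)
```

Joining these points in order with straight lines gives the right-hand part of the polygon; the left-hand end is closed in the same way with the empty class below 126.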
The cumulative frequency curve (ogive) is obtained by plotting cumulative frequency against the upper class boundary. It can be used to estimate the median, quartiles, deciles, percentiles, and interquartile range.
The graph is usually S-shaped.
Using the examination scores above, plot the cumulative frequency curve (ogive) of the distribution.
Class interval (C.I) | Class boundaries (C.B) | Upper class boundary (U.C.B) | Frequency (F) | Cumulative frequency (C.F) |
---|---|---|---|---|
0-125 | 0-125.5 | 125.5 | 0 | 0 |
126-130 | 125.5-130.5 | 130.5 | 4 | 4 |
131-135 | 130.5-135.5 | 135.5 | 4 | 8 |
136-140 | 135.5-140.5 | 140.5 | 5 | 13 |
141-145 | 140.5-145.5 | 145.5 | 6 | 19 |
146-150 | 145.5-150.5 | 150.5 | 8 | 27 |
151-155 | 150.5-155.5 | 155.5 | 3 | 30 |
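The cumulative frequency column is a running total of the frequency column. A minimal Python sketch using the standard-library `itertools.accumulate` (the variable names are illustrative):

```python
from itertools import accumulate

upper_boundaries = [130.5, 135.5, 140.5, 145.5, 150.5, 155.5]
frequencies = [4, 4, 5, 6, 8, 3]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points of the ogive, starting from zero at the boundary 125.5.
ogive_points = [(125.5, 0)] + list(zip(upper_boundaries, cumulative))

print(cumulative)  # [4, 8, 13, 19, 27, 30]
```

The last cumulative frequency equals the total number of students, which is a quick check that no class has been missed.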
A statistical error is the difference between the actual magnitude of the quantity in question and the estimate of that quantity given by the enumerator or researcher.
For example, an investigator estimated that 4,832 people in an area use a particular toothpaste, but the actual number of people who use it is 5,241.
Then statistical error = actual number of people - estimated number of people:
5241 - 4832 = 409
A statistical error is not the same as a mistake made in the process of calculating the estimate.
Absolute Error:
This is obtained by subtracting the estimated value from the actual value.
\text{Absolute error} = \text{Actual value} - \text{Estimated value}
Example: The estimated number of employees who will resign after 10 years of service is 15. The actual number who resign is 17.
A.E = 17 - 15 = 2
Relative Error (or percentage error):
This is the absolute error divided by the estimated value. When this proportion is multiplied by 100, it becomes the percentage error.
R.E = \frac{\text{Absolute error}}{\text{Estimated value}}
P.E = \frac{A.E}{E.V} \times 100\%
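The three error measures can be computed directly from their definitions. The sketch below follows the convention used here of dividing by the estimated value; the function names are illustrative.

```python
def absolute_error(actual, estimated):
    """Actual value minus estimated value."""
    return actual - estimated

def relative_error(actual, estimated):
    """Absolute error as a proportion of the estimated value."""
    return absolute_error(actual, estimated) / estimated

def percentage_error(actual, estimated):
    """Relative error expressed as a percentage."""
    return relative_error(actual, estimated) * 100

# Toothpaste example: estimate 4832, actual 5241.
print(absolute_error(5241, 4832))               # 409
print(round(percentage_error(5241, 4832), 2))   # 8.46

# Resignation example: estimate 15, actual 17.
print(absolute_error(17, 15))                   # 2
print(round(percentage_error(17, 15), 2))       # 13.33
```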
Measures of central tendency, or measures of location, are often simply called averages and are widely used statistical measures.
Such a measure locates a central value around which the other values in the distribution tend to cluster.
The measure is important because, once determined, this value can be taken as representative of the whole group.
The five measures of central tendency are:
The arithmetic mean is defined as the sum of the observations divided by the number of observations.
If the variable X assumes the values X_1, X_2, X_3, \dots, X_n, then the mean \bar{X} is given by:
\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}
This formula is for ungrouped or raw data.
Example: Calculate the mean for 2, 4, 6, 8, 10
\bar{X} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6
The mean for grouped data is obtained from the following formula:
\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}
Where:
x_i = midpoint of the i-th class (or the value itself when each class is a single value)
f_i = frequency of the i-th class
\sum f_i = the sum of the frequencies, or total frequency
Example: Given the following frequency distribution, calculate the arithmetic mean:
Mark | No. of students |
---|---|
64 | 8 |
63 | 18 |
62 | 12 |
61 | 9 |
60 | 7 |
59 | 6 |
\bar{X} = \frac{\sum fx}{N} = \frac{3713}{60} = 61.88
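The total \sum fx comes from multiplying each mark by its frequency and adding the products. Written out in full, the working is:

```latex
\begin{aligned}
\sum fx &= 64(8) + 63(18) + 62(12) + 61(9) + 60(7) + 59(6) \\
        &= 512 + 1134 + 744 + 549 + 420 + 354 = 3713 \\
N       &= 8 + 18 + 12 + 9 + 7 + 6 = 60 \\
\bar{X} &= \frac{3713}{60} \approx 61.88
\end{aligned}
```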
Example: Calculate the arithmetic mean of the marks from the following table:
Marks | Number of Students |
---|---|
0-10 | 12 |
10-20 | 18 |
20-30 | 27 |
30-40 | 20 |
40-50 | 17 |
\bar{X} = \frac{\sum fx}{N} = \frac{2470}{94} = 26.28
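For class intervals, each x is the class midpoint (5, 15, 25, 35, 45 here). The following Python sketch reproduces the calculation; the function name `grouped_mean` is an illustrative assumption.

```python
def grouped_mean(classes, frequencies):
    """Mean of grouped data, using class midpoints as the x values."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    total_f = sum(frequencies)
    return total_fx / total_f

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [12, 18, 27, 20, 17]

mean = grouped_mean(classes, frequencies)
print(round(mean, 2))  # 26.28
```

Because the midpoints stand in for the individual marks, the result is an estimate of the mean rather than its exact value.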
To find the median of ungrouped data, arrange the given values in increasing or decreasing order. If the number of values is odd, the median is the middle value. If the number of values is even, the median is the average (mean) of the two middle values.
Example: Find the median of 2, 1, 4, 3, 6, 5
Solution: 1, 2, 3, 4, 5, 6
Median = \frac{3 + 4}{2} = 3.5
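The same rule can be written as a short Python function. This is a sketch with an illustrative name; Python's standard library also offers `statistics.median`, which gives the same results.

```python
def median(values):
    """Median of raw data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([2, 1, 4, 3, 6, 5]))  # 3.5
print(median([2, 4, 6, 8, 10]))    # 6
```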
In the case of a discrete frequency distribution, the median is obtained from the cumulative frequencies. The steps for calculating the median are given below:
Find \frac{N}{2}, where N = \sum f.
Find the cumulative frequency (C.F) just greater than \frac{N}{2}.
The corresponding value of x is the median.
Example: Obtain the median for the following frequency distribution:
| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| :- | - | - | - | -- | -- | -- | -- | -- | -- | -- | -- |
| f | 3 | 4 | 8 | 10 | 16 | 20 | 25 | 15 | 9 | 6 | 4 |
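Solution: N = 120, \frac{N}{2} = 60
The cumulative frequencies are 3, 7, 15, 25, 41, 61, 86, 101, 110, 116, 120. The cumulative frequency just greater than 60 is 61, which corresponds to x = 6, so the median is 6.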
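The three steps for a discrete frequency distribution can be sketched in Python as follows. The function name `discrete_median` is hypothetical, and `itertools.accumulate` supplies the cumulative frequencies.

```python
from itertools import accumulate

def discrete_median(values, frequencies):
    """Return the x whose cumulative frequency first exceeds N/2."""
    half = sum(frequencies) / 2
    for x, cf in zip(values, accumulate(frequencies)):
        if cf > half:
            return x
    raise ValueError("frequencies must sum to a positive number")

x = list(range(1, 12))
f = [3, 4, 8, 10, 16, 20, 25, 15, 9, 6, 4]
print(discrete_median(x, f))  # 6
```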
For grouped data, the median is defined as:
Median = L_m + \left(\frac{\frac{n}{2} - cf_{bm}}{f_m}\right) \times c
Where,
L_m = lower class boundary of the median class
cf_{bm} = Cumulative frequency before the median class
f_m = Frequency of the median class
c = class size or width
\frac{n}{2} = The median position (to help identify the median class)
Example: The errors (in millimetres) discovered in the lengths of rods produced in a factory are given below:
Errors in length (mm) | 19-21 | 22-24 | 25-27 | 28-30 | 31-33 | 34-36 | 37-39 |
---|---|---|---|---|---|---|---|
Number of rods | 9 | 12 | 18 | 23 | 19 | 13 | 6 |
Estimate the median error in the length of rods.
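Solution: n = 100, so \frac{n}{2} = 50. The cumulative frequencies are 9, 21, 39, 62, 81, 94, 100, so the median class is 28-30, with L_m = 27.5, cf_{bm} = 39, f_m = 23 and c = 3.
Median = L_m + \left(\frac{\frac{n}{2} - cf_{bm}}{f_m}\right) \times c = 27.5 + \left(\frac{50 - 39}{23}\right) \times 3 = 27.5 + 1.43 = 28.93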
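The grouped-median formula can be checked with a small Python sketch. It assumes inclusive integer classes (so boundaries lie 0.5 beyond the limits) and takes the median class to be the first class whose cumulative frequency reaches n/2; the function name `grouped_median` and the `gap` parameter are illustrative.

```python
from itertools import accumulate

def grouped_median(classes, frequencies, gap=1):
    """Estimate the median of grouped data with inclusive class limits."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequencies must sum to a positive number")
    half = n / 2
    cumulative = list(accumulate(frequencies))
    for i, cf in enumerate(cumulative):
        if cf >= half:  # first class that reaches the median position
            lower_limit, upper_limit = classes[i]
            lower_boundary = lower_limit - gap / 2      # L_m
            width = (upper_limit - lower_limit) + gap   # c
            cf_before = cumulative[i - 1] if i > 0 else 0
            return lower_boundary + (half - cf_before) / frequencies[i] * width

classes = [(19, 21), (22, 24), (25, 27), (28, 30), (31, 33), (34, 36), (37, 39)]
frequencies = [9, 12, 18, 23, 19, 13, 6]
print(round(grouped_median(classes, frequencies), 2))  # 28.93
```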
For ungrouped data, the mode is the value that occurs most frequently.
Although it is easy to understand, it may not be unique or clearly defined, as some distributions have more than one mode.
A distribution with one mode is called a unimodal distribution; a distribution with two modes is called a bimodal distribution; and a distribution with more than two modes is referred to as a multi-modal distribution.
Example:
Find the mode of the following distribution:
14, 19, 16, 21, 18, 24, 15 and 19
Solution: 19 occurs twice and every other value occurs once, so the mode is 19.
Mode from Frequency Data: In frequency data, the mode is the value with the highest frequency.
Example:
Find the mode of the distribution x: 1, 2, 3, 4, 5, 6, 7, 8, 9 with frequencies f: 4, 9, 16, 25, 12, 15, 7, 3, 1
Solution: The highest frequency is 25, which corresponds to x = 4, so the mode is 4.
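Both ways of finding the mode can be sketched in Python. The function names are illustrative; each function returns a list so that bimodal and multi-modal distributions are reported in full.

```python
from collections import Counter

def modes(values):
    """All values of raw data that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

def modes_from_frequencies(values, frequencies):
    """All x values of a frequency table that share the highest frequency."""
    highest = max(frequencies)
    return [x for x, f in zip(values, frequencies) if f == highest]

print(modes([14, 19, 16, 21, 18, 24, 15, 19]))  # [19]
print(modes_from_frequencies(range(1, 10),
                             [4, 9, 16, 25, 12, 15, 7, 3, 1]))  # [4]
```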
When data are grouped, the mode can be obtained using the following formula:
Mode = L_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times c
Where:
L_m = lower class boundary of the modal class
\Delta_1 = the difference between the frequency of the modal class and the frequency of the class immediately before it