Data Representation and Central Tendency
Diagrammatic Representations for Data
Bar diagrams and pie charts are important diagrammatic representations.
Bar Diagram
Usually needs x and y axes.
X-axis shows categories (e.g., programs like PGDM, PGEM, PGCM).
Y-axis shows the number of students.
A scale must be defined for the y-axis (e.g., 1 cm = 40 students).
Example:
PGDM: 200 students
PGEM: 140 students
PGCM: 30 students
Bars are created with heights corresponding to the number of students in each category.
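The scale conversion above can be sketched in a few lines of Python. This is only an illustration: the names STUDENTS_PER_CM and bar_heights_cm are made up for this sketch, and the scale is the 1 cm = 40 students example from the notes.

```python
# Scale from the notes: 1 cm on the y-axis represents 40 students.
STUDENTS_PER_CM = 40

students = {"PGDM": 200, "PGEM": 140, "PGCM": 30}

# Bar height in cm = number of students / students represented by 1 cm.
bar_heights_cm = {
    program: count / STUDENTS_PER_CM for program, count in students.items()
}

for program, height in bar_heights_cm.items():
    print(f"{program}: {height} cm")
```

With this scale the PGDM bar is 5 cm tall, the PGEM bar 3.5 cm and the PGCM bar 0.75 cm.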
Diagrams are effective for quickly conveying information.
Group Frequency
Group frequency refers to data organized in class intervals.
Example: 10-20, 20-30, 30-40, 40-50, 50-60.
Class intervals are used for grouping data.
Frequency Data Categories:
Variable and corresponding frequency (e.g., marks and number of students).
Example: 5 marks - 3 students, 10 marks - 4 students, 15 marks - 7 students, 18 marks - 3 students.
Class intervals (grouped frequency).
Example: 0-5, 5-10, 10-15, 15-20.
Determining Class Interval:
Requires a range (R) and the number of classes (k).
R = \text{highest value} - \text{smallest value}
k is typically between 5 and 20.
Sturges' formula, k = 1 + 3.322 \log_{10} n (where n is the number of observations), can also be used to determine k.
Width of the class interval = R / k; if the result is a decimal, round it up to the next integer.
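A minimal sketch of the class-interval calculation, assuming Python. The function name class_width is hypothetical, and rounding the Sturges value of k up to a whole number is an assumption of this sketch (some texts round to the nearest integer instead). The data are the marks from the frequency example above, written out one value per student.

```python
import math


def class_width(values, k=None):
    """Return (R, k, width) for a list of numeric observations."""
    r = max(values) - min(values)  # range R = highest value - smallest value
    if k is None:
        # Sturges' rule: k = 1 + 3.322 * log10(n).
        # Assumption: round k up to a whole number of classes.
        k = math.ceil(1 + 3.322 * math.log10(len(values)))
    width = math.ceil(r / k)  # round decimal widths up to the next integer
    return r, k, width


# 5 marks x 3 students, 10 x 4, 15 x 7, 18 x 3 (17 students in total)
marks = [5] * 3 + [10] * 4 + [15] * 7 + [18] * 3

print(class_width(marks, k=5))  # k chosen by hand
print(class_width(marks))       # k from Sturges' rule
```

Here R = 18 - 5 = 13, so with k = 5 the raw width 13 / 5 = 2.6 is rounded up to 3.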
Pie Chart
Circular representation of data, where portions of the circle represent different categories.
To determine the size of each portion, calculate the angle for each category.
Total angle of a circle is 360 degrees.
Calculation:
\text{Angle for a category} = (\frac{\text{Category value}}{\text{Total value}}) \times 360
Example: If total students are 370:
PGDM (200 students): \frac{200}{370} \times 360 \approx 194.6 degrees.
PGEM (140 students): \frac{140}{370} \times 360 \approx 136.2 degrees.
PGCM (30 students): \frac{30}{370} \times 360 \approx 29.2 degrees.
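The angle formula can be checked with a short Python sketch. The variable names are illustrative only; the student counts are the ones used in the bar diagram example (200 + 140 + 30 = 370).

```python
students = {"PGDM": 200, "PGEM": 140, "PGCM": 30}
total = sum(students.values())  # 370 students in all

# Angle for a category = (category value / total value) * 360
angles = {program: count / total * 360 for program, count in students.items()}

for program, angle in angles.items():
    print(f"{program}: {angle:.1f} degrees")
```

A useful sanity check is that the angles of all categories add up to 360 degrees.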
Pareto Chart
A type of bar diagram used as a quality control tool.
Identifies major causes of variability in companies.
Example: Analyzing defective products to find common causes.
Raw material issues in 32 out of 50 defective products.
Faulty machine in 10 out of 50 defective products.
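Pareto charts conventionally draw the bars from the most frequent cause to the least frequent and often add a cumulative percentage. A rough Python sketch of that table follows. Two things here are assumptions, not part of the notes: each defective product is attributed to exactly one cause, and the 8 products not covered by the two listed causes are lumped into an "Other causes" bar.

```python
TOTAL_DEFECTIVE = 50

causes = {"Raw material issues": 32, "Faulty machine": 10}
# Assumption: the remaining defective products form one "Other causes" bar.
causes["Other causes"] = TOTAL_DEFECTIVE - sum(causes.values())

# Order the bars from the most frequent cause to the least frequent.
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

pareto_table = []
running = 0
for cause, count in ordered:
    running += count
    cumulative_percent = 100 * running / TOTAL_DEFECTIVE
    pareto_table.append((cause, count, cumulative_percent))

for cause, count, cumulative_percent in pareto_table:
    print(f"{cause}: {count} defects, cumulative {cumulative_percent:.0f}%")
```

Under these assumptions, raw material issues alone account for 64% of the defective products, which is the kind of major cause the chart is meant to expose.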
Graphical Representations for Data
Histogram
Graphical representation used for interval or ratio scale data (quantitative data).
Needs grouped frequency data.
X-axis: variable; Y-axis: frequency.
Scale defined on both axes.
Example:
Internal exam out of 25 marks:
0-5 marks: 6 students
5-10 marks: 12 students
10-15 marks: 24 students
15-20 marks: 15 students
20-25 marks: 4 students
Bars are adjacent with no gaps between them, unlike in a bar diagram.
Histograms are used to locate the mode.
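The highest bar marks the modal class.
Draw two diagonals: join the top-right corner of the highest bar to the top-right corner of the preceding bar, and the top-left corner of the highest bar to the top-left corner of the following bar.
Drop a perpendicular from the point where the diagonals intersect to the x-axis; the point where it meets the axis is the mode.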
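The graphical construction has an algebraic equivalent: solving for the x-coordinate where the two diagonals cross gives lower limit + width x (f_modal - f_prev) / (2 f_modal - f_prev - f_next). The Python sketch below applies it to the internal exam example; the function and parameter names are invented for illustration, and equal class widths are assumed.

```python
def mode_from_histogram(lower, width, f_modal, f_prev, f_next):
    """x-coordinate where the two diagonals of the modal bar intersect.

    lower   : lower limit of the modal class
    width   : class width (assumed equal for all classes)
    f_modal : frequency of the modal class
    f_prev  : frequency of the class before it
    f_next  : frequency of the class after it
    """
    return lower + width * (f_modal - f_prev) / (2 * f_modal - f_prev - f_next)


# Modal class 10-15 (24 students); neighbours 5-10 (12) and 15-20 (15).
mode = mode_from_histogram(lower=10, width=5, f_modal=24, f_prev=12, f_next=15)
print(round(mode, 2))
```

For the exam data this gives 10 + 5 x 12 / 21, a mode of about 12.86 marks.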
Frequency Polygon/Curve
Based on histograms.
Frequency Polygon: Line diagram created by marking the midpoint of the top of each bar in the histogram and joining these points with straight lines drawn with a ruler.
Frequency Curve: Similar to the frequency polygon, but the midpoints are joined with a freehand curve.
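The points that get joined are simply (class midpoint, frequency) pairs. A small Python sketch using the internal exam data from the histogram example (variable names are illustrative):

```python
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
frequencies = [6, 12, 24, 15, 4]

# The midpoint of the top of each bar sits above the class midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Joining these points with straight lines gives the frequency polygon;
# joining them with a smooth freehand curve gives the frequency curve.
polygon_points = list(zip(midpoints, frequencies))
print(polygon_points)
```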
Ogives (Cumulative Frequency Curves)
Require group frequency data.
Two types of cumulative frequencies:
Less than type: Related to upper limits of class intervals.
Example: Number of students having marks less than 5, less than 10, etc.
More than type: Related to lower limits of class intervals.
Example: Number of students having marks equal to or more than 0, equal to or more than 5, etc.
Constructing Cumulative Frequency Tables
Less than Type: Start with the frequency of the lowest class interval, then successively add the frequency of each following class interval.
More than Type: Start with the total number of observations in the dataset, then successively subtract the frequency of each preceding class interval.
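Both tables can be built mechanically. The Python sketch below does so for the internal exam data from the histogram example; the variable names are chosen for this illustration only.

```python
from itertools import accumulate

classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
frequencies = [6, 12, 24, 15, 4]
total = sum(frequencies)  # 61 students

# Less than type: running total of frequencies, read against upper limits.
less_than = list(accumulate(frequencies))

# More than type: start from the total and subtract the frequencies of the
# classes that come before each lower limit.
more_than = [total - below for below in [0] + less_than[:-1]]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {upper}: {lt:2d}    {lower} or more: {mt:2d}")
```

The less than column ends at the total (61) and the more than column starts at it, which is a quick check that the table is consistent.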
Plotting
Less than type ogive: Plot cumulative frequencies against upper limits.
More than type ogive: Plot cumulative frequencies against lower limits.
Ogives are used to locate the median.
The intersection point of the less than and more than type ogives indicates the median when a perpendicular is dropped to the x-axis.
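The two ogives cross where the less than cumulative frequency equals half the total, so the graphical reading can be reproduced by linear interpolation inside the class where that happens. The sketch below assumes the ogive points are joined by straight lines and that the total frequency is positive; the function name is hypothetical.

```python
def median_from_ogives(classes, frequencies):
    """x-coordinate where the less than and more than ogives intersect."""
    half = sum(frequencies) / 2
    cumulative = 0  # less than cumulative frequency at the current lower limit
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= half:
            # Interpolate along the straight ogive segment of this class.
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f


classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
frequencies = [6, 12, 24, 15, 4]

median = median_from_ogives(classes, frequencies)
print(round(median, 2))
```

For the exam data, half the total is 30.5, which falls in the 10-15 class, giving 10 + (30.5 - 18) / 24 x 5, a median of about 12.6 marks.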
Class Intervals
Exclusive Type: Upper limit of one class interval is the same as the lower limit of the next (e.g., 0-5, 5-10, 10-15).
The upper limit is excluded from the class interval (e.g., a value of 5 goes into the 5-10 interval).
Inclusive Type: Both limits belong to the class, and the upper limit of one class interval differs from the lower limit of the next (e.g., 1-5, 6-10, 11-15).
Exclusive type is used for continuous variables.
Inclusive type is used for discrete variables.
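The difference between the two types shows up when deciding which class a boundary value belongs to. A minimal Python sketch, with a hypothetical helper find_class:

```python
def find_class(value, classes, inclusive=False):
    """Return the (lower, upper) class that contains value, or None."""
    for lower, upper in classes:
        if inclusive:
            # Inclusive type: both limits belong to the class.
            if lower <= value <= upper:
                return (lower, upper)
        elif lower <= value < upper:
            # Exclusive type: the upper limit is excluded.
            return (lower, upper)
    return None


exclusive_classes = [(0, 5), (5, 10), (10, 15)]
inclusive_classes = [(1, 5), (6, 10), (11, 15)]

print(find_class(5, exclusive_classes))                  # falls in 5-10
print(find_class(5, inclusive_classes, inclusive=True))  # falls in 1-5
```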
Measures of Central Tendency
Central Tendency
The property of data to cluster around a particular value.
Measures
Arithmetic mean.
Median.
Mode.
Arithmetic Mean
Sum of all values divided by the number of values.
Symbol: \bar{x} for sample mean, \mu for population mean.
Calculation for Raw Data
\bar{x} = \frac{\sum x_i}{n}
Calculation for Frequency Data
\bar{x} = \frac{\sum f_i x_i}{\sum f_i}
Where f_i are the frequencies and x_i are the variable values.
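As an illustration, the frequency-data formula can be applied to the marks example given earlier (5 marks - 3 students, 10 marks - 4 students, 15 marks - 7 students, 18 marks - 3 students). The function name mean_from_frequencies is made up for this sketch.

```python
def mean_from_frequencies(values, frequencies):
    """Arithmetic mean of frequency data: sum(f_i * x_i) / sum(f_i)."""
    weighted_sum = sum(f * x for x, f in zip(values, frequencies))
    return weighted_sum / sum(frequencies)


marks = [5, 10, 15, 18]
students = [3, 4, 7, 3]

mean = mean_from_frequencies(marks, students)
print(round(mean, 2))
```

Here the weighted sum is 15 + 40 + 105 + 54 = 214 over 17 students, so the mean is about 12.59 marks.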
Calculation for Grouped Frequency Data
\bar{x} = \frac{\sum f_i x_i}{\sum f_i}
x_i are the midpoints of the class intervals.
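As a worked illustration, take the internal exam data from the histogram example. The class midpoints are 2.5, 7.5, 12.5, 17.5 and 22.5, with frequencies 6, 12, 24, 15 and 4:

```latex
\begin{align*}
\sum f_i &= 6 + 12 + 24 + 15 + 4 = 61 \\
\sum f_i x_i &= 6(2.5) + 12(7.5) + 24(12.5) + 15(17.5) + 4(22.5) \\
             &= 15 + 90 + 300 + 262.5 + 90 = 757.5 \\
\bar{x} &= \frac{757.5}{61} \approx 12.42
\end{align*}
```

Because every value in a class is treated as if it sat at the class midpoint, this mean is an approximation of the mean of the underlying raw marks.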