Data Representation and Central Tendency

Diagrammatic Representations for Data

  • Bar diagrams and pie charts are important diagrammatic representations.

Bar Diagram

  • Usually needs x and y axes.

  • X-axis shows categories (e.g., programs like PGDM, PGEM, PGCM).

  • Y-axis shows the number of students.

  • A scale must be defined for the y-axis (e.g., 1 cm = 40 students).

  • Example:

    • PGDM: 200 students

    • PGEM: 140 students

    • PGCM: 30 students

  • Bars are created with heights corresponding to the number of students in each category.

  • Diagrams are effective for quickly conveying information.
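
The bar heights for the example above follow directly from the chosen scale. A minimal sketch, assuming the 1 cm = 40 students scale from the notes; the function and variable names are illustrative only:

```python
# Enrolment figures from the example above.
students = {"PGDM": 200, "PGEM": 140, "PGCM": 30}

# Scale from the notes: 1 cm on the y-axis represents 40 students.
STUDENTS_PER_CM = 40

def bar_heights_cm(counts, students_per_cm):
    """Return the bar height in cm for each category."""
    return {name: n / students_per_cm for name, n in counts.items()}

heights = bar_heights_cm(students, STUDENTS_PER_CM)
for name, h in heights.items():
    print(f"{name}: {h:.2f} cm")
```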

Grouped Frequency

  • Grouped frequency data are data organized into class intervals.

  • Example: 10-20, 20-30, 30-40, 40-50, 50-60.

  • Class intervals are used for grouping data.

  • Frequency Data Categories:

    • Variable and corresponding frequency (e.g., marks and number of students).

      • Example: 5 marks - 3 students, 10 marks - 4 students, 15 marks - 7 students, 18 marks - 3 students.

    • Class intervals (grouped frequency).

      • Example: 0-5, 5-10, 10-15, 15-20.

  • Determining Class Interval:

    • Requires a range (R) and the number of classes (k).

    • R = \text{highest value} - \text{smallest value}

    • k is typically between 5 and 20.

    • Sturges' formula, k = 1 + 3.322 \log_{10} n for n observations, can also be used to determine k.

    • Width of the class interval = R / k; round a fractional result up to the next integer.
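
The steps for choosing a class width can be sketched as follows. This is an illustration, not a fixed procedure: Sturges' formula is written in its equivalent base-2 form k = 1 + log2(n), rounding k up to a whole number is an assumption, and the marks list is invented for the example.

```python
import math

def class_width(values, k=None):
    """Return (R, k, width) for a list of raw values.

    If k is not given, it is estimated with Sturges' formula,
    k = 1 + log2(n), rounded up to a whole number (an assumption).
    """
    r = max(values) - min(values)        # range R = highest - smallest
    if k is None:
        k = math.ceil(1 + math.log2(len(values)))
    width = math.ceil(r / k)             # round fractional widths up
    return r, k, width

# Hypothetical marks, invented for illustration only.
marks = [3, 7, 8, 12, 14, 15, 15, 17, 18, 19,
         21, 22, 24, 9, 11, 13, 16, 20, 5, 10]

print(class_width(marks))        # k estimated by Sturges' formula
print(class_width(marks, k=5))   # k chosen by hand
```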

Pie Chart

  • Circular representation of data, where portions of the circle represent different categories.

  • To determine the size of each portion, calculate the angle for each category.

  • Total angle of a circle is 360 degrees.

  • Calculation:

    • \text{Angle for a category} = (\frac{\text{Category value}}{\text{Total value}}) \times 360

    • Example: If total students are 370:

      • PGDM (200 students): \frac{200}{370} \times 360 \approx 194.6 degrees.

      • PGEM (140 students): \frac{140}{370} \times 360 \approx 136.2 degrees.

      • PGCM (30 students): \frac{30}{370} \times 360 \approx 29.2 degrees.
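
The angle calculation can be checked with a few lines of code. A minimal sketch using the enrolment figures from the bar diagram example (200 + 140 + 30 = 370 students); the function name is illustrative:

```python
# Enrolment figures from the bar diagram example; they total 370.
students = {"PGDM": 200, "PGEM": 140, "PGCM": 30}

def pie_angles(counts):
    """Return the sector angle in degrees for each category."""
    total = sum(counts.values())
    return {name: value / total * 360 for name, value in counts.items()}

angles = pie_angles(students)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles should always add up to 360 degrees, which is a quick check on the arithmetic.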

Pareto Chart

  • A type of bar diagram used as a quality control tool.

  • Identifies major causes of variability in companies.

  • Example: Analyzing defective products to find common causes.

    • Raw material issues in 32 out of 50 defective products.

    • Faulty machine in 10 out of 50 defective products.
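
A Pareto chart is conventionally drawn with the bars sorted from the most frequent cause to the least, often with a cumulative percentage alongside. The sketch below builds that table for the example. Note the assumptions: the notes only give two causes, so an "Other causes" row of 8 is added here to account for the remaining defective products, and each product is treated as having a single cause.

```python
# Counts from the example. "Other causes" (8) is an ASSUMPTION that fills
# the remainder of the 50 defective products; it is not given in the notes.
defects = {
    "Faulty machine": 10,
    "Raw material": 32,
    "Other causes (assumed)": 8,
}

def pareto_table(counts):
    """Sort causes by count (largest first) and add cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((cause, n, 100 * n / total, 100 * running / total))
    return rows

table = pareto_table(defects)
for cause, n, pct, cum in table:
    print(f"{cause:<24} {n:>3} {pct:5.1f}% {cum:6.1f}%")
```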

Graphical Representations for Data

Histogram

  • Graphical representation used for interval or ratio scale data (quantitative data); it requires grouped frequency data.

  • X-axis: variable, Y-axis: frequency.

  • Scale defined on both axes.

  • Example:

    • Internal exam out of 25 marks:

      • 0-5 marks: 6 students

      • 5-10 marks: 12 students

      • 10-15 marks: 24 students

      • 15-20 marks: 15 students

      • 20-25 marks: 4 students

  • Bars are adjacent, with no gaps between them, unlike in a bar diagram.

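  • Histograms are used to locate the mode.

    • The highest bar marks the modal class.

    • Draw two diagonals: join the upper-left corner of the highest bar to the upper-left corner of the bar on its right, and the upper-right corner of the highest bar to the upper-right corner of the bar on its left.

    • Drop a perpendicular from the point where the diagonals intersect to the x-axis; the point where it meets the axis is the mode.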
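
The diagonal construction is equivalent to an interpolation inside the modal class: mode = l + (f1 - f0) / (2 f1 - f0 - f2) × h, where l is the lower limit of the modal class, h its width, f1 its frequency, and f0 and f2 the frequencies of the bars to its left and right. These symbol names are introduced here for illustration. A minimal sketch for the internal exam data, assuming equal class widths and a single highest bar:

```python
# Grouped data from the internal exam example: (lower, upper, frequency).
classes = [(0, 5, 6), (5, 10, 12), (10, 15, 24), (15, 20, 15), (20, 25, 4)]

def histogram_mode(classes):
    """Estimate the mode as the x-coordinate where the two diagonals cross.

    Assumes equal class widths and a single highest bar.
    """
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                      # position of the highest bar
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0                # bar to the left
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # bar to the right
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

mode = histogram_mode(classes)
print(f"Estimated mode: {mode:.2f} marks")
```

For this data the modal class is 10-15, so the estimate falls inside that interval, pulled toward the taller of the two neighbouring bars.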

Frequency Polygon/Curve

  • Based on histograms.

  • Frequency Polygon: Line diagram created by marking the midpoint of the top of each bar in the histogram and joining these points with straight lines drawn with a ruler.

  • Frequency Curve: Similar to the frequency polygon, but the midpoints are joined with a freehand curve.
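
The points joined by a frequency polygon are simply (class midpoint, frequency) pairs. A short sketch for the internal exam data; the function name is illustrative:

```python
# Grouped data from the internal exam example: (lower, upper, frequency).
classes = [(0, 5, 6), (5, 10, 12), (10, 15, 24), (15, 20, 15), (20, 25, 4)]

def polygon_points(classes):
    """Return (midpoint, frequency) for each class interval."""
    return [((lower + upper) / 2, f) for lower, upper, f in classes]

points = polygon_points(classes)
for x, y in points:
    print(f"midpoint {x:>4}: frequency {y}")
```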

Ogives (Cumulative Frequency Curves)

  • Require grouped frequency data.

  • Two types of cumulative frequencies:

    • Less than type: Related to upper limits of class intervals.

      • Example: Number of students having marks less than 5, less than 10, etc.

    • More than type: Related to lower limits of class intervals.

      • Example: Number of students having marks of 0 or more, 5 or more, etc.

Constructing Cumulative Frequency Tables
  • Less than Type: Start with the frequency of the lowest class interval, then successively add the frequency of each following class interval.

  • More than Type: Start with the total number of observations in the dataset, then successively subtract the frequency of each preceding class interval.
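
Both cumulative frequency tables can be built from the same list of class frequencies. A minimal sketch using the internal exam data from the histogram example; variable names are illustrative:

```python
from itertools import accumulate

# Grouped data from the internal exam example: (lower, upper, frequency).
classes = [(0, 5, 6), (5, 10, 12), (10, 15, 24), (15, 20, 15), (20, 25, 4)]
freqs = [f for _, _, f in classes]
total = sum(freqs)

# Less than type: running total, paired with each class's upper limit.
less_than = list(accumulate(freqs))

# More than type: start from the total and subtract the frequencies of all
# preceding classes; paired with each class's lower limit.
more_than = [total - before for before in [0] + less_than[:-1]]

for (lower, upper, _), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {upper:>2}: {lt:>2}   {lower:>2} or more: {mt:>2}")
```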

Plotting
  • Less than type ogive: Plot cumulative frequencies against upper limits.

  • More than type ogive: Plot cumulative frequencies against lower limits.

  • Ogives are used to locate the median.

    • The intersection point of the less than and more than type ogives indicates the median when a perpendicular is dropped to the x-axis.
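
At any point on the x-axis the two ogives add up to the total frequency N, so they cross where both equal N/2. Reading the median off the graph is therefore the same as interpolating the less than type ogive at N/2. A sketch under that reasoning, assuming straight-line segments between the plotted points and using the internal exam data:

```python
# Grouped data from the internal exam example: (lower, upper, frequency).
classes = [(0, 5, 6), (5, 10, 12), (10, 15, 24), (15, 20, 15), (20, 25, 4)]

def ogive_median(classes):
    """Interpolate the x-value at which the less-than ogive reaches N/2."""
    total = sum(f for _, _, f in classes)
    half = total / 2
    cumulative = 0                       # frequency below the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            # Linear interpolation inside the median class.
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no class with positive frequency")

median = ogive_median(classes)
print(f"Estimated median: {median:.2f} marks")
```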

Class Intervals
  • Exclusive Type: Upper limit of one class interval is the same as the lower limit of the next (e.g., 0-5, 5-10, 10-15).

    • The upper limit is excluded from the class interval (e.g., a value of 5 goes into the 5-10 interval).

  • Inclusive Type: Both limits belong to the class, so the upper limit of one class interval differs from the lower limit of the next (e.g., 1-5, 6-10, 11-15).

  • Exclusive type is used for continuous variables.

  • Inclusive type is used for discrete variables.
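
The difference between the two types shows up when a value falls exactly on a class limit. A small sketch with hypothetical helper functions: the exclusive version tests lower <= value < upper, and the inclusive version tests lower <= value <= upper.

```python
def exclusive_class(value, limits):
    """Find the class for value when classes share limits (0-5, 5-10, ...).

    The upper limit is excluded: lower <= value < upper.
    """
    for lower, upper in zip(limits, limits[1:]):
        if lower <= value < upper:
            return (lower, upper)
    return None  # value lies outside all classes

def inclusive_class(value, classes):
    """Find the class for value when both limits belong to the class."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None  # value lies outside all classes

print(exclusive_class(5, [0, 5, 10, 15]))               # 5 goes into 5-10
print(inclusive_class(5, [(1, 5), (6, 10), (11, 15)]))  # 5 goes into 1-5
```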

Measures of Central Tendency

Central Tendency

  • The property of data to cluster around a particular value.

Measures

  • Arithmetic mean.

  • Median.

  • Mode.

Arithmetic Mean
  • Sum of all values divided by the number of values.

  • Symbol: \bar{x} for sample mean, \mu for population mean.

Calculation for Raw Data
  • \bar{x} = \frac{\sum x_i}{n}

Calculation for Frequency Data
  • \bar{x} = \frac{\sum f_i x_i}{\sum f_i}

  • Where f_i are the frequencies and x_i are the variable values.

Calculation for Grouped Frequency Data
  • \bar{x} = \frac{\sum f_i x_i}{\sum f_i}

  • x_i are the midpoints of the class intervals.
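
The three calculations can be compared side by side. A minimal sketch using the marks-and-students example and the internal exam data from earlier in these notes; the function names are illustrative. The raw list is produced by expanding the frequency table, so the first two results should agree.

```python
def mean_raw(values):
    """Arithmetic mean of raw data: sum of x_i divided by n."""
    return sum(values) / len(values)

def mean_frequency(pairs):
    """Mean of frequency data given as (x_i, f_i) pairs."""
    return sum(x * f for x, f in pairs) / sum(f for _, f in pairs)

def mean_grouped(classes):
    """Mean of grouped data given as (lower, upper, f_i); x_i is the midpoint."""
    return mean_frequency([((lo + hi) / 2, f) for lo, hi, f in classes])

# Frequency data from the notes: (marks, number of students).
marks_freq = [(5, 3), (10, 4), (15, 7), (18, 3)]
# The same data written out as raw values.
raw = [x for x, f in marks_freq for _ in range(f)]
# Grouped data from the internal exam example: (lower, upper, frequency).
exam = [(0, 5, 6), (5, 10, 12), (10, 15, 24), (15, 20, 15), (20, 25, 4)]

print(f"raw mean:       {mean_raw(raw):.3f}")
print(f"frequency mean: {mean_frequency(marks_freq):.3f}")
print(f"grouped mean:   {mean_grouped(exam):.3f}")
```

The grouped mean is an approximation, since every observation in a class is treated as if it sat at the class midpoint.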