BM 3

Exploring and Making Sense of Data: Presenting Information

Lecture 3: Class Discussion Notes

BM & EBL Year 1

Instructor: Kelebogile Kenalemang

Introduction

  • Discussion on organizing and presenting data to derive relevant information.

  • Importance of applying ideas and techniques appropriately for useful and timely information management.

  • Assumption: Required data has been collected from an appropriate population or sample.

  • Focus on processing and organizing the data to extract required information.

Frequency Distributions

  • Definition: A frequency distribution organizes data into a format showing the number of observations within intervals.

  • In larger datasets, repeated values are common.

  • Frequency: The number of times a data value appears in a dataset.

  • Example: In a dataset of student exam scores, the frequency of a score of 80 indicates the number of students who scored 80.

Frequency Table

  • A frequency table can display:

    • Categorical variables (qualitative)

    • Quantitative variables (numeric)

  • Categorical variables represent categories (e.g., eye color), whereas quantitative variables are numbers.

  • A frequency table summarizes the frequencies of a variable's values; this summary is also called a frequency distribution.

Example: Categorical Variable

  • Data on family planning methods used by teens in Kweneng West, Botswana:

    • Left Column: Method Used (categorical variable)

    • Right Column: Frequency (number of teens using each method).

Example: Quantitative (Numerical) Variables

  • Dataset of 20 statistics students’ marks:

    • Scores: 97, 92, 88, 75, 83, 67, 89, 55, 72, 78, 81, 91, 57, 63, 67, 74, 87, 84, 98, 46

  • Frequency table construction based on specific classes (e.g., 90-99, 80-89):

    • Class | Frequency (f)

    • 90-99 | 4

    • 80-89 | 6

    • 70-79 | 4

    • 60-69 | 3

    • 50-59 | 2

    • 40-49 | 1

  • A frequency table lists data intervals (data classes) with corresponding frequencies.
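
A minimal Python sketch of how the table above can be tallied from the raw marks. The names `marks`, `classes`, and `class_frequencies` are illustrative choices and are not part of the lecture material; the class limits are treated as inclusive at both ends.

```python
# Raw marks of the 20 statistics students, as listed above.
marks = [97, 92, 88, 75, 83, 67, 89, 55, 72, 78,
         81, 91, 57, 63, 67, 74, 87, 84, 98, 46]

# Class limits (lower, upper), both inclusive, from highest to lowest.
classes = [(90, 99), (80, 89), (70, 79), (60, 69), (50, 59), (40, 49)]


def class_frequencies(values, class_limits):
    """Count how many values fall inside each inclusive class."""
    return [sum(1 for v in values if lower <= v <= upper)
            for lower, upper in class_limits]


frequencies = class_frequencies(marks, classes)

print("Class | Frequency (f)")
for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower}-{upper} | {f}")
```

The frequencies should add up to the number of students (20), which is a quick check that no mark was missed or counted twice.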

Types of Quantitative Frequency Tables

  • Ungrouped Frequency Tables:

    • Lists each individual data value with its frequency.

    • Suitable for discrete data (distinct, separate values).

    • Example: For exam scores like 65, 70, 75, 80, each score and its frequency are shown.

  • Grouped Frequency Tables:

    • Data values grouped into intervals or classes; frequencies recorded for each interval.

    • Used for continuous data (values within ranges).

    • Example: Grouping heights into classes of equal width (e.g., 150-159 cm, 160-169 cm) and recording the frequency of each class.

Discrete vs Continuous Data

  • Continuous Data:

    • Measured on a scale (e.g., weight, height).

    • Can take an infinite number of values (includes fractions/decimals).

    • Examples: Weight, temperature, and money (usually treated as continuous in practice).

  • Discrete Data:

    • Takes on whole values and is counted; cannot have fractions or decimals.

    • Examples: Number of defective items, number of students.

Constructing an Ungrouped Frequency Table

  • Survey of 50 households recording the number of occupants in each household.

  • Data entered (only 39 of the values are listed in these notes): 4, 7, 4, 1, 4, 2, 3, 6, 3, 5, 6, 3, 4, 9, 12, 1, 3, 4, 2, 2, 1, 1, 3, 8, 1, 1, 4, 2, 3, 4, 4, 4, 1, 4, 2, 3, 5, 4, 4.

  • Task: Construct a frequency distribution for these data.

    • Discrete variable: Number of occupants in the household.

    • Frequency of each value determined using a tally system.
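
The tally system can be sketched in Python with `collections.Counter`, which counts how often each distinct value occurs. This is only an illustration: it uses the values as listed in these notes (39 of them), and the variable names `occupants` and `tally` are assumptions made for the example.

```python
from collections import Counter

# The occupancy values as listed in these notes (39 values).
occupants = [4, 7, 4, 1, 4, 2, 3, 6, 3, 5, 6, 3, 4, 9, 12, 1, 3, 4, 2, 2,
             1, 1, 3, 8, 1, 1, 4, 2, 3, 4, 4, 4, 1, 4, 2, 3, 5, 4, 4]

# Counter plays the role of the tally: one count per distinct value.
tally = Counter(occupants)

print("Occupants | Tally        | Frequency")
for value in sorted(tally):
    marks = "/" * tally[value]  # one stroke per household
    print(f"{value:>9} | {marks:<12} | {tally[value]}")
```

Each row of the printed table is one distinct number of occupants, which is what makes the table ungrouped.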

Constructing a Grouped Frequency Table

  • Group data into classes when dealing with large datasets (e.g., weights in a program).

  • Avoid cumbersome long lists of data.

  • Steps:

    1. Decide on the number of classes (typically between 5 and 20).

    2. Calculate the range (max - min) and divide it by the number of classes to get the class width.

    3. Round the class width up to the next convenient number.

    4. Set the class limits, using the minimum data entry as the first lower limit.

    5. Count the entries in each class and record the frequencies.

Example: Grouped Frequency Table

  • An example dataset of IQ scores for a gifted classroom: 118, 123, 124, 125, 127, 128, 129, 130, 130, 133, 136, 138, 141, 142, 149, 150, 154.

  • Process:

    • Pick 5 classes.

    • Calculate range (max - min = 154 - 118 = 36).

    • Class width = 36 / 5 = 7.2, rounded up to 8.

    • Lower class limits start at the minimum score and increase by the class width: 118, 126, 134, 142, 150.

    • Each upper class limit is the lower limit plus the class width minus 1 (e.g., 118 + 8 - 1 = 125).

    • Resulting classes: 118-125, 126-133, 134-141, 142-149, 150-157.
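
The class-limit calculation above can be reproduced with a short Python sketch. It assumes whole-number data and takes "next convenient number" to mean the next whole number; the names `num_classes`, `class_width`, and `class_limits` are illustrative only.

```python
import math

iq_scores = [118, 123, 124, 125, 127, 128, 129, 130, 130,
             133, 136, 138, 141, 142, 149, 150, 154]
num_classes = 5

# Range divided by the number of classes gives the raw class width.
data_range = max(iq_scores) - min(iq_scores)   # 154 - 118 = 36
raw_width = data_range / num_classes           # 36 / 5 = 7.2

# Round up to the next whole number (assumed meaning of "convenient").
class_width = math.ceil(raw_width)             # 8

# Lower limits start at the minimum; upper limit = lower + width - 1.
lower_limits = [min(iq_scores) + i * class_width for i in range(num_classes)]
class_limits = [(lower, lower + class_width - 1) for lower in lower_limits]

for lower, upper in class_limits:
    print(f"{lower}-{upper}")
```

One caution with this sketch: if the range divides exactly by the number of classes, `math.ceil` leaves the width unchanged and the maximum value would fall just outside the last class, so the width would need to be increased by 1 in that case.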

Finishing Up the Table

  • Adding a frequency column to the IQ scores frequency table:

    • Class | Frequency

    • 118-125 | 4

    • 126-133 | 6

    • 134-141 | 3

    • 142-149 | 2

    • 150-157 | 2
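
As a check on the frequency column, the following Python sketch assigns each score to its class in a single pass. It relies on the classes having equal width, so the class index is `(score - minimum) // class_width`; this indexing shortcut is an illustrative choice, not a method prescribed in the lecture.

```python
iq_scores = [118, 123, 124, 125, 127, 128, 129, 130, 130,
             133, 136, 138, 141, 142, 149, 150, 154]

minimum = 118       # first lower class limit
class_width = 8     # width found in the previous step
num_classes = 5

# One counter per class, filled in a single pass over the data.
frequencies = [0] * num_classes
for score in iq_scores:
    index = (score - minimum) // class_width   # 0 for 118-125, 1 for 126-133, ...
    frequencies[index] += 1

print("Class | Frequency")
for i, f in enumerate(frequencies):
    lower = minimum + i * class_width
    upper = lower + class_width - 1
    print(f"{lower}-{upper} | {f}")
```

The frequencies should sum to 17, the number of scores in the dataset.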

Class Exercise

  • Construct a frequency table with 6 data classes from this gas purchase dataset:

    • Amounts: 7, 4, 18, 4, 9, 8, 8, 7, 6, 2, 9, 5, 9, 12, 4, 14, 15, 7, 10, 2, 3, 11, 4, 4, 9, 12, 5, 3

Cumulative Frequency Distribution

  • Definition: The cumulative frequency of a class is its own frequency plus the frequencies of all classes that come before it in the frequency distribution (a running total).

  • Table format can be used for summarization.

  • Example:

    • Class | Frequency (f) | Cumulative Frequency

    • 90-99 | 4 | 4

    • 80-89 | 6 | 10

    • 70-79 | 4 | 14

    • 60-69 | 3 | 17

    • 50-59 | 2 | 19

    • 40-49 | 1 | 20
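
The cumulative frequency column is a running total of the frequency column, taken in the order in which the classes are listed. A minimal Python sketch using `itertools.accumulate` from the standard library (variable names are illustrative):

```python
from itertools import accumulate

# Classes and frequencies in the same order as the table above.
classes = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
frequencies = [4, 6, 4, 3, 2, 1]

# accumulate() yields the running total after each class.
cumulative = list(accumulate(frequencies))

print("Class | Frequency (f) | Cumulative Frequency")
for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label} | {f} | {cf}")
```

The last cumulative frequency always equals the total number of observations, here 20.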

Class Exercise

  • Add a cumulative frequency column to the table based on gas purchases.

Relative Frequency

  • Definition: The proportion of data elements that fall in a class.

  • Calculated as:

    • Relative frequency = class frequency / total number of observations = f / n

    • Percentage frequency = relative frequency × 100

Example

  • Frequency distribution example with relative frequencies:

    • Class | Frequency (f) | Cumulative Frequency | Relative Frequency (f / n)

    • 90-99 | 4 | 4 | 0.20

    • 80-89 | 6 | 10 | 0.30

    • 70-79 | 4 | 14 | 0.20

    • 60-69 | 3 | 17 | 0.15

    • 50-59 | 2 | 19 | 0.10

    • 40-49 | 1 | 20 | 0.05
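
The relative and percentage frequencies in this table can be recomputed with a few lines of Python. This is a sketch only; `n` is the total number of observations (the sum of the frequencies) and the other names are illustrative.

```python
classes = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
frequencies = [4, 6, 4, 3, 2, 1]

n = sum(frequencies)                        # total number of observations

relative = [f / n for f in frequencies]     # relative frequency = f / n
percentage = [100 * r for r in relative]    # percentage = relative x 100

print("Class | f | Relative (f / n) | Percentage")
for label, f, r, p in zip(classes, frequencies, relative, percentage):
    print(f"{label} | {f} | {r:.2f} | {p:.0f}%")
```

Relative frequencies should sum to 1 (and percentages to 100), apart from small rounding differences.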

Class Exercise

  • Add relative frequency and percentage frequency columns to the gas purchases example.

Describing Data Sets

  • After organizing data in frequency distributions, six ways to describe data are explored:

    1. Bar Graphs

    2. Pie Charts

    3. Histograms

    4. Frequency Polygons / Line Charts

    5. Stem and Leaf Plots (self-reading)

    6. Box Plots (self-reading)

Bar Charts/Graphs

  • Types: Simple, compound, and component bar charts.

  • Description: Information represented via rectangles (bars).

  • Bars can be horizontal or vertical; the height of a vertical bar (or the length of a horizontal bar) corresponds to frequency.

  • Suitable for categorical data (e.g., gender, age groups).

  • The scale on the frequency axis must always include zero.

Pie Charts

  • A circular diagram divided into portions representing various categories.

  • The entire pie represents all categories; each portion shows a category's size proportionally.

  • Best utilized for a single variable with up to 6 categories.
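
Because each portion is proportional to its category's share of the total, the angle of a slice is its relative frequency multiplied by 360 degrees. The Python sketch below illustrates the arithmetic; it reuses the six mark classes from the earlier frequency table as categories purely for illustration, and the variable names are assumptions.

```python
# Six categories and their frequencies (mark classes reused for illustration).
categories = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
frequencies = [4, 6, 4, 3, 2, 1]

total = sum(frequencies)

# Slice angle in degrees = (frequency / total) * 360.
angles = [360 * f / total for f in frequencies]

for label, f, angle in zip(categories, frequencies, angles):
    print(f"{label}: frequency {f}, slice angle {angle:.0f} degrees")
```

The angles should add up to 360 degrees, since the whole pie represents all categories.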

Diagrams for Non-Categorical Data

  • Use histograms for frequency distribution data and cumulative frequency polygons (ogives) for cumulative distributions.

The Histogram

  • Definition: A graphical representation of a frequency table in which adjacent bars touch, with no gaps between classes.

  • Horizontal axis: variable measured; vertical axis: class frequency.

  • Each data class represented by a vertical bar with height as class frequency.
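
A rough text-only sketch of a histogram in Python, using the marks frequency table from earlier in these notes. To print in a terminal the picture is rotated, so each class is one row and the bar length (rather than height) equals the class frequency. The function name `text_histogram` is hypothetical, and a plotting library would normally be used for a proper chart.

```python
# Classes listed from lowest to highest, as they would appear along the axis.
classes = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [1, 2, 3, 4, 6, 4]


def text_histogram(labels, counts, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    return [f"{label} | {symbol * count} ({count})"
            for label, count in zip(labels, counts)]


for line in text_histogram(classes, frequencies):
    print(line)
```

The rows are printed on consecutive lines with no gaps, mirroring the touching bars of a histogram.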

Frequency Polygon

  • Definition: A line graph of the information in a frequency table; an alternative to the histogram.

  • The plotted points are joined by straight lines, and the figure is closed by bringing the line down to zero frequency at both ends.

  • Vertical axis: frequency; horizontal axis: measured variable.
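
The points of a frequency polygon can be computed as in the Python sketch below. It assumes the common convention that each point is plotted at the class midpoint, and that the figure is closed with zero-frequency points one class width beyond the first and last classes; the data are the marks classes from earlier in these notes, and the variable names are illustrative.

```python
# Class limits (lower, upper) from lowest to highest, with their frequencies.
class_limits = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [1, 2, 3, 4, 6, 4]

# Distance between consecutive lower limits.
class_width = class_limits[1][0] - class_limits[0][0]   # 10

# Each point sits at the midpoint of its class.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Close the polygon with zero frequency one class width beyond each end.
x_points = [midpoints[0] - class_width] + midpoints + [midpoints[-1] + class_width]
y_points = [0] + frequencies + [0]
polygon = list(zip(x_points, y_points))

for x, y in polygon:
    print(f"({x}, {y})")
```

Joining these points in order with straight lines gives the polygon; the first and last points lie on the horizontal axis.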

Interpretation of Charts and Diagrams

  • Importance of accurately representing data to avoid misleading illustrations.

  • Critical examination of diagrams produced by others is advised to avoid misinterpretation.

Reading Assignment

  • Study advantages and disadvantages of each chart type.

  • Identify at least four advantages and disadvantages for each chart type.