Chapter 2: Descriptive Statistics: Data Organization and Visualization
Introduction to Descriptive Statistics
Objective of Descriptive Statistical Methods: The primary aim is to present information and data in a manner that is clear, concise, and accurate.
The Problem of Data Volume: Analyzing large data sets is difficult because there is often too much information for the human mind to assimilate or understand.
Task of Descriptive Methods: These methods summarize information and extract main features without distorting the overall picture.
Organizing and Graphing Data
Samples and Populations: All data sets used in this context are regarded as samples drawn from a specific population.
Purpose of Samples: A sample is studied mainly to obtain information about the larger population.
Focus of Study: The main goal involves summarizing and describing specific features of the data set.
Raw Data (Ungrouped Data):
Definition: Information obtained from each member of a population or sample and recorded in the sequence in which it becomes available.
Characteristics: Collected at random; not organized or ranked.
Frequency Distributions
Definition: A frequency distribution is a table where data are grouped into classes, and the number of values (frequencies) falling in each class is recorded.
Purpose: To show how the frequencies are spread across the classes; the term "distribution" refers to this pattern.
Example 1: Survey of Urban Neighborhood Families
Context: A survey of families recorded the number of children per family.
Sample Size ():
Raw Data:
Frequency Distribution Table Results:
children: Frequency () = ; Relative Frequency () =
child: Frequency () = ; Relative Frequency () =
children: Frequency () = ; Relative Frequency () =
children: Frequency () = ; Relative Frequency () =
children: Frequency () = ; Relative Frequency () =
children: Frequency () = ; Relative Frequency () =
children: Frequency () = ; Relative Frequency () =
Total Frequency:
Relative Frequency Calculations
Definition: The proportion (or percentage) of data falling into a specific class.
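Formula: \text{Relative Frequency} = \frac{f}{n}, where f is the class frequency and n is the sample size.
Note: The sum of the frequencies always equals the sample size (n), so the relative frequencies sum to 1.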
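The counting and the relative frequency calculation can be sketched in a few lines of Python. This is a minimal illustration only: the list `children` is hypothetical data, not the survey data from Example 1, and the function name `frequency_table` is an assumption introduced here.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency), sorted by value."""
    n = len(values)
    counts = Counter(values)
    return [(value, f, f / n) for value, f in sorted(counts.items())]

# Hypothetical raw data: number of children in 10 families (not the survey data).
children = [0, 2, 1, 3, 2, 2, 1, 0, 4, 2]

for value, f, rf in frequency_table(children):
    print(f"{value} children: frequency = {f}, relative frequency = {rf:.2f}")
```

The frequencies in the result add up to the sample size and the relative frequencies add up to 1, which is a quick check that no observation was dropped.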
Example 2: Smartphone Ownership (Qualitative Data)
Context: Type of smartphone owned by the youngest family member.
Categories: Android, iPhone, Windows phone.
Raw Data:
Frequency Distribution Results:
Android: Frequency = ; ; Cumulative Frequency () =
iPhone: Frequency = ; ; Cumulative Frequency () =
Windows phone: Frequency = ; ; Cumulative Frequency () =
Constructing Frequency Distributions for Grouped Data
Case Study: Staff Lunch Spending at DUT
Data Set: Amounts spent in Rands by lecturers.
Values:
Statistics: Highest value = , Lowest value = , .
Major Steps for Construction:
Calculate Number of Classes ():
Formula:
Calculation:
Calculate Class Width ():
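Formula: \text{Class Width} = \frac{\text{Range}}{\text{Number of Classes}}
Range: \text{Range} = \text{Highest value} - \text{Lowest value}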
Calculation:
Choose Starting Point:
Rule: Use any convenient number equal to or less than the smallest value in the data set as the lower limit of the first class (20 was chosen for this example).
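The three steps above can be sketched in Python. The formula for the number of classes is not legible in these notes, so the sketch assumes Sturges' rule (K = 1 + 3.3 log10 n), a common textbook choice that may differ from the one intended here. The figures for `n`, `highest` and `lowest` are hypothetical and are not the case-study values.

```python
import math

def number_of_classes(n):
    """Assumed rule (Sturges): K = 1 + 3.3 * log10(n), rounded up."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_width(highest, lowest, k):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((highest - lowest) / k)

# Hypothetical figures, not the lunch spending data.
n, highest, lowest = 30, 78, 21

k = number_of_classes(n)
width = class_width(highest, lowest, k)
# Starting point: the largest multiple of the width that does not exceed the lowest value.
start = (lowest // width) * width

print(f"classes = {k}, width = {width}, first lower limit = {start}")
```

Rounding the width up rather than down is a design choice: it guarantees that K classes of that width cover the full range.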
Grouped Frequency Distribution Table:
Classes:
20 - < 30: Midpoint = 25
30 - < 40: Midpoint = 35
40 - < 50: Midpoint = 45
50 - < 60: Midpoint = 55
60 - < 70: Midpoint = 65
70 - < 80: Midpoint = 75
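Percentage Frequency: Calculated as \frac{f}{n} \times 100, where f is the class frequency and n is the sample size.
Cumulative Frequency: The frequency of a class plus the frequencies of all preceding classes. For the first class, the cumulative frequency always equals the frequency of that class.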
Midpoint (Class Mark): The average of the two class limits.
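Formula: \text{Midpoint} = \frac{\text{Lower limit} + \text{Upper limit}}{2}
Example: For the class 20 - < 30, \text{Midpoint} = \frac{20 + 30}{2} = 25.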
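A grouped table with all four derived columns can be built as follows. This is a sketch under stated assumptions: the list `spending` is hypothetical (the original values are not available here), and the function name `grouped_table` and its dictionary keys are introduced only for illustration. The class layout (start 20, width 10, six classes) follows the table above.

```python
def grouped_table(values, start, width, k):
    """Build rows with frequency, percentage, cumulative frequency and midpoint."""
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(k):
        lower = start + i * width
        upper = lower + width
        # A value belongs to the class if lower <= value < upper.
        f = sum(1 for v in values if lower <= v < upper)
        cumulative += f
        rows.append({
            "class": f"{lower} - < {upper}",
            "f": f,
            "percent": 100 * f / n,
            "cumulative": cumulative,
            "midpoint": (lower + upper) / 2,
        })
    return rows

# Hypothetical amounts in Rands (not the case-study data).
spending = [22, 27, 31, 35, 38, 41, 44, 45, 47, 52, 55, 58, 63, 66, 71]

rows = grouped_table(spending, start=20, width=10, k=6)
for row in rows:
    print(row)
```

The last cumulative frequency should equal the sample size, and the percentages should add up to 100.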
Exercises for Practice
Exercise 1: Call Center Performance
Metric: Service level (time in seconds to answer calls).
Data ():
Task: Construct a frequency and cumulative distribution.
Exercise 2: Public Transport Spending
Data: Amount spent per day by DUT staff members.
Statistics: Highest = , Lowest = .
Data Set:
Task: Construct a frequency and cumulative distribution.
Graphing Grouped Data
Histogram
Definition: Graphical representation of a frequency distribution.
Construction:
Horizontal axis: Class boundaries (true class limits ensuring continuity).
Vertical axis: Frequencies, relative frequencies, or percentages.
Representation: Rectangular bars with class boundaries as the base and frequency as the height.
Gaps: There are no gaps between bars because they are drawn over class boundaries.
Insight: From the DUT lunch spending histogram, it is visible that most staff members spent between and Rand.
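A rough text version of a histogram is enough to see where the values cluster. The sketch below draws the bars sideways (one row per class) rather than upright, and it uses hypothetical frequencies, not the lunch spending counts. The function name `text_histogram` is an assumption made for this example.

```python
def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one line per class: the class interval, then a bar whose length is f."""
    lines = []
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{lower:>3} - <{upper:>3} | {symbol * f} ({f})")
    return lines

# Class boundaries follow the 20 to 80 layout; the frequencies are hypothetical.
boundaries = [20, 30, 40, 50, 60, 70, 80]
frequencies = [2, 3, 4, 3, 2, 1]

lines = text_histogram(boundaries, frequencies)
print("\n".join(lines))

# The class with the longest bar is where most observations fall.
modal_index = frequencies.index(max(frequencies))
print("Modal class:", boundaries[modal_index], "to", boundaries[modal_index + 1])
```

Because consecutive classes share a boundary, the rows follow one another without gaps, which mirrors the rule that histogram bars touch.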
Shape of a Distribution
Purpose: Histograms describe the clustering pattern of data values.
Normal Distribution: Bell-shaped and symmetric; \text{Mean} = \text{Median} = \text{Mode}.
Right Skewed: Tail extends to the right; \text{Mode} < \text{Median} < \text{Mean}.
Left Skewed: Tail extends to the left; \text{Mean} < \text{Median} < \text{Mode}.
Central Location Rules:
If data is normally distributed, use the Mean.
If data is not normally distributed, use the Median.
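The relationships between mean and median can serve as a rough check on shape and on which measure of central location to report. The sketch below is only a heuristic: a mean close to the median suggests symmetry but does not prove normality, and the `tolerance` threshold is an arbitrary assumption, not a rule from these notes.

```python
import statistics

def describe_shape(values, tolerance=0.05):
    """Return (shape, recommended measure) by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.pstdev(values)
    # Treat the data as symmetric when mean and median differ by only a
    # small fraction of the standard deviation (assumed threshold).
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "approximately symmetric", "mean"
    if mean > median:
        return "right skewed", "median"
    return "left skewed", "median"

# Hypothetical data sets.
print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 2, 2, 3, 3, 3, 20]))
print(describe_shape([1, 18, 19, 19, 20, 20, 20]))
```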
Frequency Polygon
Definition: A line graph emphasizing continuous change in frequencies.
Construction:
Plot class midpoints against frequencies.
Add two additional classes (one at each end) with zero frequency to anchor the graph to the horizontal axis.
Join adjacent points with straight lines.
Equivalence: The histogram and frequency polygon are equivalent; with equal class widths, the area under each is the same and is proportional to the total number of observations (n).
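The construction steps and the area claim can be checked numerically. The sketch assumes equal class widths and uses hypothetical frequencies; the function names `polygon_points` and `polygon_area` are introduced here for illustration.

```python
def polygon_points(start, width, frequencies):
    """Midpoint-frequency pairs, with a zero-frequency class added at each end."""
    midpoints = [start + width * (i + 0.5) for i in range(len(frequencies))]
    first = (midpoints[0] - width, 0)
    last = (midpoints[-1] + width, 0)
    return [first] + list(zip(midpoints, frequencies)) + [last]

def polygon_area(points):
    """Area under the joined line segments (trapezoid rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical frequencies for six classes of width 10 starting at 20.
frequencies = [2, 3, 4, 3, 2, 1]
points = polygon_points(start=20, width=10, frequencies=frequencies)

histogram_area = 10 * sum(frequencies)  # width times total frequency
print(points)
print(polygon_area(points), histogram_area)
```

Both areas come out as the class width times n, which is why the two graphs carry the same information when the classes are equally wide.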
Cumulative Frequency Graph (Ogive)
Definition: A graph of cumulative frequencies versus upper-class boundaries.
Construction:
Horizontal axis: Upper class boundary.
Vertical axis: Cumulative frequency.
Starting Point: Plot a cumulative frequency of 0 against the lower class boundary of the first class.
Example coordinates: With a first class of 20 - < 30, the ogive starts at (20, 0).
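The ogive coordinates follow directly from the class boundaries and a running total of the frequencies. In this sketch the frequencies are hypothetical and `ogive_points` is a name chosen for the example.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """Pair each class boundary with the cumulative frequency reached at that boundary."""
    # Start at 0 for the lower boundary of the first class.
    cumulative = [0] + list(accumulate(frequencies))
    return list(zip(boundaries, cumulative))

# Class boundaries follow the 20 to 80 layout; the frequencies are hypothetical.
boundaries = [20, 30, 40, 50, 60, 70, 80]
frequencies = [2, 3, 4, 3, 2, 1]

ogive = ogive_points(boundaries, frequencies)
print(ogive)
```

The first point lies on the horizontal axis and the last point reaches the sample size, so the curve never decreases.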
Graphing Qualitative Data Sets
Bar Charts
Definition: Categorical data represented by rectangular bars.
Features: Bars can be vertical or horizontal. Height/length is proportional to the size of the category. Only totals are represented.
Example (Facebook Users, US 2011):
Age Group: users (
Age Group: users (
Age Group: users (
Total users:
Example (Commerce Student Distinctions):
As: students; A: students; As: students; As: students; As: students.
Pie Chart
Definition: A circle divided into slices proportional to the frequency of subgroups.
Calculations:
Angle: \text{Angle} = \frac{f}{n} \times 360^\circ
Percentage: \text{Percentage} = \frac{f}{n} \times 100
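Both calculations scale each frequency by the total, once to 360 degrees and once to 100 percent. The sketch below uses hypothetical category counts, not the Commerce or Facebook figures, and `pie_slices` is a name introduced for the example.

```python
def pie_slices(frequencies):
    """Return the angle (degrees) and percentage of each category's slice."""
    total = sum(frequencies.values())
    return {
        category: {"angle": 360 * f / total, "percent": 100 * f / total}
        for category, f in frequencies.items()
    }

# Hypothetical counts for three categories.
counts = {"Android": 12, "iPhone": 6, "Windows phone": 2}

slices = pie_slices(counts)
for category, slice_info in slices.items():
    print(f"{category}: {slice_info['angle']:.0f} degrees, {slice_info['percent']:.0f}%")
```

The angles should add up to 360 and the percentages to 100; small deviations after rounding are normal.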
Results for Commerce Students (Total ):
As: (
A: (
As: (
As: (
As: (
Results for Facebook Age Groups:
:
:
: