STAT-2210 - Introductory Statistics Chapter 1: Picturing Distributions with Graphs
Instructor: Dr. Wijekularathna, D.K.
University: Troy University
1.1 Introduction
Statistics: The science of data.
Each data set contains a collection of information (variables) about a group of individuals.
Data Set: Information collected about the group of individuals in the sample, including details about each individual.
Data can originate from a variety of entities (humans, animals, objects, etc.).
1.2 Individuals and Variables
1.2.1: Individuals
Definition: The objects described by a set of data.
Examples: Individuals can be people, animals, or inanimate objects.
1.2.2: Variable
Definition: Any characteristic of an individual; can take different values for different individuals.
1.2.3: Data
Definition: The values of a variable.
1.2.4: Observation
Definition: Each individual piece of data.
1.2.5: Dataset
Definition: Collection of all observations for a particular variable.
Classification of Variables
Variables can be broadly classified into two types:
Qualitative (Categorical):
Definition: Categories or labels that place individuals into groups.
Subtypes:
Nominal: No natural ordering; e.g., gender, eye color.
Ordinal: Natural order exists; e.g., rankings (A, B, C) and Likert-type scales.
Quantitative (Numeric):
Definition: Numerical values representing measurements.
Subtypes:
Discrete: Countable number of values; e.g., number of prescriptions.
Continuous: Can take any value within a range; e.g., height (inches), weight (pounds).
1.3 Exploratory Data Analysis
Definition: The process of using statistical tools to examine data to describe their main features.
Principles of Exploring Data:
Examine each variable individually, then study relationships among variables.
Begin with graphical representations, followed by numerical summaries.
1.3.1 Distribution of a Variable
Goal: Graphically display the distribution of each variable.
Indicates what values the variable takes and how often.
Graphical Tools for Distribution:
Categorical Variables:
Bar chart
Pie chart
Quantitative Variables:
Histogram
Stem-and-leaf plot
Dot plot
Objective: Illustrate the value range and occurrences.
Time Plot: For variables that change over time.
1.4 Categorical Variables: Pie Charts and Bar Graphs
The aim is to examine the data to describe main features (Exploratory Data Analysis).
1.4.1 Pie Charts
Definition: A pie chart is a disk divided into wedges proportional to the frequencies of categorical data.
Steps to Construct a Pie Chart:
Construct the relative frequency distribution.
Relative Frequency Formula: relative frequency = frequency of the category / total number of observations.
Divide the disk into wedges according to relative frequencies.
Label each slice with its category and its relative frequency.
Example: Survey results on preferred movie genres (Comedy, Action, Romance, Drama, SciFi).
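The pie chart steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's worked example: the survey responses are invented, and the helper name relative_frequencies is hypothetical. It computes each category's relative frequency and the central angle of its wedge (relative frequency times 360 degrees).

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration
# (not the data from the course example).
responses = ["Comedy", "Action", "Comedy", "Drama", "SciFi",
             "Romance", "Action", "Comedy", "Drama", "Comedy"]

def relative_frequencies(values):
    """Return {category: frequency / total} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

rel_freq = relative_frequencies(responses)

# Each wedge's central angle is its relative frequency times 360 degrees.
for category, rf in rel_freq.items():
    print(f"{category:<8} rel. freq = {rf:.2f}  angle = {rf * 360:.0f} degrees")
```

The relative frequencies always sum to 1, so the wedge angles sum to 360 degrees and fill the whole disk.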
1.4.2 Bar Graphs
Definition: A bar graph displays a vertical bar for each category, with the bar's height equal to the category's frequency or relative frequency.
Steps to Construct a Bar Graph:
Create relative frequency distribution.
Draw axes for bars (horizontal for categories, vertical for frequencies).
Construct bars with heights proportional to relative frequencies.
Label axes and bars.
Example: Bar graph for movie preference using data from Example 1.3.
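As a rough sketch of the bar graph steps, the following Python draws a text version with one bar per category. The responses are invented for illustration, and the function name text_bar_graph and its width parameter are hypothetical. The bars are drawn sideways so they fit in plain text; the bar length plays the role of the height.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
responses = ["Comedy", "Action", "Comedy", "Drama", "SciFi",
             "Romance", "Action", "Comedy", "Drama", "Comedy"]

def text_bar_graph(values, width=40):
    """Return one text line per category; bar length is proportional
    to the category's relative frequency."""
    counts = Counter(values)
    total = len(values)
    lines = []
    for category, count in sorted(counts.items()):
        rel = count / total
        bar = "#" * round(rel * width)
        lines.append(f"{category:<8} | {bar} {rel:.0%}")
    return lines

for line in text_bar_graph(responses):
    print(line)
```

Unlike a pie chart, a bar graph does not require the categories to make up a whole, so the same approach works when only some categories are shown.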
1.5 Quantitative Variables
Quantitative variables often take many values.
1.5.1 Histogram
Definition: A graph using bars to represent frequencies or relative frequencies of quantitative variable outcomes.
Steps to Construct a Histogram:
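Draw axes (horizontal for the values of the variable, vertical for frequencies or relative frequencies).
For discrete variables with few distinct values, draw a separate bar for each value.
For variables with many values or a wide range, divide the values into intervals of equal length and draw one bar per interval.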
Example 7:
Data on the heights of 16 adults in centimeters. (e.g., 162, 168, etc.)
Divide data into intervals for histogram construction.
Example 8:
Data on weights of 52 adults presented as intervals (e.g., 100 - <120 lbs).
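The interval step can be sketched as follows. The 16 heights below are invented for illustration and are not the data from Example 7. The function name bin_counts and the choice of intervals (starting at 150 cm, 10 cm wide) are assumptions. Each interval includes its lower endpoint and excludes its upper endpoint, matching the "100 - <120" style used in Example 8.

```python
# Hypothetical heights in centimeters, invented for illustration
# (not the data from Example 7).
heights = [152, 158, 161, 162, 164, 165, 167, 168,
           168, 170, 171, 173, 175, 178, 181, 186]

def bin_counts(data, start, width, n_bins):
    """Count observations in n_bins equal-length intervals [lo, lo + width)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:  # values outside the intervals are ignored
            counts[index] += 1
    return counts

start, width, n_bins = 150, 10, 4
counts = bin_counts(heights, start, width, n_bins)

# Print a sideways histogram: one row of stars per interval.
for i, count in enumerate(counts):
    lo = start + i * width
    print(f"{lo} - <{lo + width}: {'*' * count}")
```

Choosing the starting point and interval width is a judgment call: too few intervals hide the shape of the distribution, and too many leave most bars with only one or two observations.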
1.5.1.1 Interpreting Histograms
Interpretation Aspects:
Overall patterns: Shape, center, spread (e.g., symmetric, skewed).
Identify outliers.
1.5.2 Stem-and-Leaf Plots
Definition: A display in which each observation is split into a stem (all digits except the last) and a leaf (the last digit).
Steps:
Sort data in increasing order.
Separate observations into stems and leaves.
Write the stems in a vertical column, with each stem's leaves in a row beside it.
Include a key for interpretation.
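The four steps above might look like this in Python. This is a sketch under stated assumptions: the data are invented, the function name stem_and_leaf is hypothetical, and the values are assumed to be non-negative whole numbers, so the stem is everything except the last digit and the leaf is the last digit.

```python
from collections import defaultdict

# Hypothetical two-digit observations, invented for illustration.
data = [23, 45, 31, 27, 38, 45, 52, 29, 34, 41]

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit).
    Assumes non-negative whole numbers."""
    plot = defaultdict(list)
    for v in sorted(values):          # step 1: sort in increasing order
        stem, leaf = divmod(v, 10)    # step 2: separate stem and leaf
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)

# Step 3: stems in a vertical column, leaves beside them.
# Stems with no leaves are still printed so gaps in the data stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")

# Step 4: a key for interpretation.
print("Key: 2 | 3 means 23")
```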
1.5.3 Dot Plots
Definition: A visual that displays a dot for each observation along a number line.
Steps:
Draw horizontal axis for values.
Record each observation with a dot.
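A text version of a dot plot can be sketched as below. The data are invented, the function name dot_plot is hypothetical, and the layout assumes small whole-number values so that each value fits in a three-character column. Repeated observations are stacked vertically above the same position on the axis.

```python
from collections import Counter

# Hypothetical small whole-number observations, invented for illustration.
data = [2, 3, 3, 4, 4, 4, 5, 5, 7]

def dot_plot(values):
    """Return text rows: stacked '*' marks above a horizontal axis of values.
    Assumes whole numbers that each fit in a three-character column."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    tallest = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(tallest, 0, -1):
        row = "".join(" * " if counts[v] >= level else "   "
                      for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    axis = "".join(f"{v:^3}" for v in range(lo, hi + 1))
    rows.append(axis.rstrip())
    return rows

for row in dot_plot(data):
    print(row)
```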
1.6 Describing Distributions
Symmetric Distribution: Left and right sides mirror each other (e.g., IQ scores).
Skewed to the Right: The right tail is longer (a few large values pull the mean upward; example: income).
Skewed to the Left: The left tail is longer (a few small values pull the mean downward; example: GPA).
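One rough way to see the effect of skewness on the mean is to compare the mean with the median. The sketch below uses invented data, and the function describe_shape and its tolerance parameter are hypothetical. The comparison is only a rule of thumb, not a formal test, and a graph of the data remains the better guide to shape.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.05):
    """Rough shape hint from comparing the mean to the median.
    A heuristic only: the difference is judged relative to the data's range."""
    m, med = mean(values), median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "roughly symmetric"
    return "skewed to the right" if m > med else "skewed to the left"

# Hypothetical data, invented for illustration.
incomes = [28, 31, 33, 35, 36, 38, 40, 42, 45, 150]          # thousands of dollars
gpas = [1.9, 2.8, 3.2, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0]

print("incomes:", describe_shape(incomes))
print("GPAs:   ", describe_shape(gpas))
```

In the income data a single large value raises the mean well above the median, while in the GPA data one low value pulls the mean below the median.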
1.7 Time Plots
Definition: A graph that plots each observation of a variable against the time at which it was measured.
Characteristics:
Time on horizontal axis and measured variable on vertical axis.
Look for trends and seasonal variations.
End Chapter Problem Set
Classify the crime study variables as categorical or quantitative: age, gender, race, education, annual income.
Construct bar graph for BMI weight statuses.
Draw pie chart and bar graph for the 2002 Winter Olympics medal counts.
Graph color popularity distribution of cars and light trucks.
Create stemplots for football team weights and for precipitation data for U.S. cities.
Analyze sugar consumption data with stemplot/histogram.
Analyze histogram of television sets per household.
Interpret blood pressure readings in a relative-frequency histogram.