1.2: Histograms, Box Plots, Outliers, and Standard Deviation

**Histogram**: a bar graph for quantitative data

The horizontal axis is divided into classes

Each class needs to cover the same range of values

Generally, 5-7 classes is a good minimum

The more classes, the more detail/nuance shown

The vertical axis measures how much data is in each class

The bars must be touching

If a data point falls on the boundary between two classes (on a tick mark on the x-axis), it is counted in the bar to its right, so each class includes its lower boundary but not its upper one
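Sorting data into classes can be sketched in Python. This is a minimal illustration, assuming equal-width classes where each class includes its lower boundary and excludes its upper one. The helper name `class_frequencies`, the starting value of 0, and the class width of 5 are illustrative choices, not part of the notes. The data is the set used in the outlier example.

```python
def class_frequencies(data, start, width, num_classes):
    """Count values in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        # Floor division picks the class; a value exactly on a boundary
        # lands in the class to its right (the lower boundary is included).
        index = int((x - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} lies outside the chosen classes")
        counts[index] += 1
    return counts


data = [18, 19, 13, 2, 15, 19, 15, 31, 17, 16, 29]
counts = class_frequencies(data, start=0, width=5, num_classes=7)
for k, count in enumerate(counts):
    low = k * 5
    print(f"[{low}, {low + 5}): {count}")
```

Here the two 15s sit on a tick mark and are counted in the [15, 20) bar, not the [10, 15) bar.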

**Frequency histogram**: a histogram showing the number of data points in each class

**Relative frequency histogram**: a histogram showing the percent of the data in each class

A relative frequency histogram can be made by dividing the frequency in each class by the total number of data points
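Converting frequencies to relative frequencies is a single division per class. A minimal sketch, using hypothetical class counts (20 data points in total) and an illustrative helper name:

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total number of data points."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical frequencies for a 7-class histogram (20 data points in total)
counts = [1, 3, 5, 6, 3, 1, 1]
for c, rel in zip(counts, relative_frequencies(counts)):
    print(f"{c:>2} -> {rel:.0%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the arithmetic.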

The center is generally found by estimation, especially if *only* a graph is given

A histogram displays how many pieces of data are in each class. A good histogram should have:

Consistent scales on both axes

Labels for both axes

A break on the x-axis if it does not start at 0

The y-axis starting at 0

Data points are considered outliers if they lie…

Eg. data set: 18, 19, 13, 2, 15, 19, 15, 31, 17, 16, 29

Q1 = 15

Q3 = 19

IQR = Q3 - Q1 = 4

Q1 - 1.5(IQR) = 15 - 1.5(4) = 9 → anything below 9 is an outlier

Q3 + 1.5(IQR) = 19 + 1.5(4) = 25 → anything above 25 is an outlier

So, 2, 29, and 31 are outliers
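The fence calculation above can be sketched in Python. This assumes the quartile method the example appears to use: sort the data, split it at the median (leaving the median out of both halves when n is odd), and take the median of each half. Other quartile methods, including the defaults in some software, can give slightly different values. The helper names are illustrative.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves of the sorted data.

    Assumes the convention that leaves the median out of both halves when n is odd.
    Needs at least 2 values.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[len(values) - half:]
    return median(lower), median(upper)


def iqr_outliers(data):
    """Apply the 1.5 * IQR rule and return the two fences and the outliers."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return low_fence, high_fence, outliers


data = [18, 19, 13, 2, 15, 19, 15, 31, 17, 16, 29]
print(quartiles(data))     # (15, 19)
print(iqr_outliers(data))  # (9.0, 25.0, [2, 29, 31])
```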

The mean is the numerical average of a data set: the sum of all the values divided by the number of values
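Written as a formula, in standard notation, the mean of n values x₁, …, xₙ is:

$$
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

For the outlier example data set, the values sum to 194, so x̄ = 194/11 ≈ 17.6.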

Standard deviation measures the spread of data about the mean

Standard deviation uses the same units as the original data

Skew and outliers influence both mean and standard deviation

**Skew**: the extent to which a distribution is pulled to one side rather than centered around the middle

If skew or outliers are present in a data set, the mean and standard deviation should not be used to describe its center and spread

The mean and standard deviation work well when data is approximately symmetrical with no outliers

Resistant measures, such as the median and IQR, are not strongly affected by outliers

These measurements work well when data *is* skewed and/or contains outliers
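To see what "resistant" means in practice, here is a sketch comparing the mean and standard deviation with the median and IQR on the outlier example data, with and without its three outliers (2, 29, and 31). Quartiles use the median-of-halves convention, which is an assumption; the helper name `summarize` is illustrative.

```python
from statistics import mean, median, stdev


def summarize(data):
    """Return mean, sample SD, median, and IQR (median-of-halves quartiles)."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    q3 = median(values[len(values) - half:])
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": q3 - q1,
    }


with_outliers = [18, 19, 13, 2, 15, 19, 15, 31, 17, 16, 29]
without_outliers = [x for x in with_outliers if x not in (2, 29, 31)]

for label, data in [("with outliers", with_outliers), ("without outliers", without_outliers)]:
    s = summarize(data)
    print(f"{label}: mean={s['mean']:.2f}, sd={s['sd']:.2f}, "
          f"median={s['median']}, IQR={s['iqr']}")
```

Removing the three outliers drops the standard deviation from roughly 7.7 to roughly 2.1, while the IQR only moves from 4 to 3.5 and the median from 17 to 16.5.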

Range = maximum - minimum

IQR = Q3 - Q1
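Both measures of spread are short calculations. A minimal sketch on the outlier example data, with the same median-of-halves quartile convention assumed; the function names are illustrative:

```python
from statistics import median


def data_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)


def iqr(data):
    """IQR = Q3 - Q1, using medians of the lower and upper halves.

    Assumes the median is left out of both halves when n is odd.
    """
    values = sorted(data)
    half = len(values) // 2
    return median(values[len(values) - half:]) - median(values[:half])


data = [18, 19, 13, 2, 15, 19, 15, 31, 17, 16, 29]
print(data_range(data), iqr(data))  # 29 4
```

The range depends only on the two most extreme values, so a single outlier can change it a lot; the IQR ignores the top and bottom quarters of the data.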

Standard Deviation

**x̄** = mean

Standard deviation measures the rough average distance between each data point and the mean
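The sample standard deviation is commonly written as follows. This assumes the sample version (dividing by n - 1), which is the usual choice when the mean is the sample mean x̄:

$$
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
$$

Each term (xᵢ - x̄) is one point's distance from the mean. Squaring, averaging, and then taking the square root brings the result back to the original units of the data.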

Larger standard deviations indicate that there is more data further from the mean

Moderate standard deviations indicate that data is moderately spread around the mean

Smaller standard deviations indicate that there is more data clumped closer to the mean
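A sketch of the sample standard deviation formula in Python, checked against the standard library's `statistics.stdev`. The two data sets are hypothetical: both have a mean of 10, but one is clumped near the mean and the other is spread out.

```python
import math
from statistics import stdev


def sample_sd(data):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))


# Two hypothetical data sets, both with mean 10
clumped = [9, 10, 10, 10, 11]
spread_out = [2, 6, 10, 14, 18]

for label, data in [("clumped", clumped), ("spread out", spread_out)]:
    print(f"{label}: s = {sample_sd(data):.2f} (statistics.stdev: {stdev(data):.2f})")
```

The clumped set has s ≈ 0.71 and the spread-out set has s ≈ 6.32, matching the idea that a smaller standard deviation means data clustered closer to the mean.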

Variance

Variance is equal to the square of the standard deviation (equivalently, the standard deviation is the square root of the variance)
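The relationship between the two can be checked with Python's standard `statistics` module, whose `stdev` and `variance` functions both use the sample (n - 1) divisor. The data is the outlier example set.

```python
import math
from statistics import stdev, variance

data = [18, 19, 13, 2, 15, 19, 15, 31, 17, 16, 29]
s = stdev(data)      # sample standard deviation
s2 = variance(data)  # sample variance (same n - 1 divisor)

# Variance is the square of the standard deviation...
print(math.isclose(s ** 2, s2))        # True
# ...so the standard deviation is the square root of the variance.
print(math.isclose(math.sqrt(s2), s))  # True
```

Because it squares the deviations, variance is measured in squared units, which is one reason the standard deviation is usually reported when describing spread.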

Always remember to plot the data: measures of center and spread capture only specific facts about a data set, while graphs give the best overall picture of a distribution