A data visualization tool that compares data across different categories or groups. It uses bars, drawn horizontally or vertically, to show a numeric value for each category.
Pie Chart
A pie chart is a circular statistical graphic which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents.
Two-way Table
A table that summarizes data on the relationship between two categorical variables for some group of individuals.
Marginal Relative Frequency
The percent or proportion of individuals that have a specific value for one categorical variable.
Joint Relative Frequency
The percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable.
Conditional Relative Frequency
The percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition).
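The three relative frequencies defined above differ only in what is counted and what it is divided by. A minimal sketch, assuming a made-up two-way table (the grade levels, snack categories, and counts are hypothetical):

```python
# Hypothetical two-way table: rows are grade level, columns are preferred snack.
table = {
    "Junior": {"Chips": 30, "Fruit": 20},
    "Senior": {"Chips": 10, "Fruit": 40},
}

# Total number of individuals in the table (100 here).
total = sum(sum(row.values()) for row in table.values())
juniors = sum(table["Junior"].values())

# Marginal relative frequency: one variable only (juniors out of everyone).
marginal_junior = juniors / total

# Joint relative frequency: both values at once (junior AND chips out of everyone).
joint_junior_chips = table["Junior"]["Chips"] / total

# Conditional relative frequency: chips among juniors only (the condition).
cond_chips_given_junior = table["Junior"]["Chips"] / juniors

print(marginal_junior, joint_junior_chips, cond_chips_given_junior)
```

Here the marginal value is 50/100, the joint value is 30/100, and the conditional value is 30/50, so only the conditional frequency changes its denominator to the group named in the condition.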
Conditional Distribution
Side-by-side bar graph
Segmented bar graph
Mosaic plot
Association
Knowing the value of one variable helps us predict the value of the other.
No association
Knowing the value of one variable does not help us predict the value of the other.
TERMS
DEFINITIONS
Dotplot
Symmetric
Skewed right
Right side of the graph is much longer than the left side.
Skewed left
Left side of the graph is much longer than the right side.
Unimodal
Single peak
Bimodal
Two distinct clusters and peaks
Approximately symmetric
One peak, and the left and right sides are about the same (think of a hill).
Uniform
Frequencies are about the same for all values.
Stemplot
Histogram
Shows each interval of values as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval.
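The bar heights of a histogram can be worked out by counting how many values fall in each interval. A rough sketch, assuming made-up data and an interval width of 10 (both are illustrative choices, not part of the definition):

```python
from collections import Counter

# Hypothetical data and interval width, chosen only for illustration.
data = [3, 7, 12, 15, 18, 21, 24, 28, 33]
width = 10

# Map each value to the lower edge of its interval, e.g. 12 -> 10 for [10, 20).
frequencies = Counter((x // width) * width for x in data)

# Relative frequency: the count in each interval divided by the number of values.
relative_frequencies = {edge: count / len(data) for edge, count in frequencies.items()}

for edge in sorted(frequencies):
    print(f"[{edge}, {edge + width}): {frequencies[edge]}")
```

Either the counts or the relative frequencies can be used as the bar heights; the shape of the histogram is the same.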
Unit 1.3
TERMS
DEFINITIONS
Mean
The average of all the individual values: their sum divided by the number of values.
Statistic
A number that describes some characteristic of a sample.
Parameter
A number that describes some characteristic of a population.
Resistant
A statistical measure is resistant when it isn't strongly affected by extreme values.
Median
The middle value of the distribution when the values are in order (the average of the two middle values if the count is even).
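A quick way to see what resistance means is to add one extreme value to a small data set and compare how the mean and the median respond. A minimal sketch with made-up numbers:

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme value added.
values = [2, 3, 4, 5, 6]
with_extreme = values + [100]

# Without the extreme value: mean 4, median 4.
print(mean(values), median(values))

# With the extreme value: mean 20, median 4.5.
print(mean(with_extreme), median(with_extreme))
```

The mean jumps from 4 to 20 while the median barely moves, which is why the median is usually described as resistant and the mean is not.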
Range
Maximum - minimum
Standard deviation
The typical distance of the values in a distribution from the mean.
Sample variance
Sample Standard Deviation
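The sample variance and sample standard deviation can be illustrated with a short calculation. This is a minimal sketch, assuming the usual n - 1 divisor for a sample; the data values are made up:

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n  # sample mean: 40 / 8 = 5.0

# Squared distance of each value from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_sd = sqrt(sample_variance)

# The standard library uses the same n - 1 convention.
print(sample_variance, variance(data))
print(sample_sd, stdev(data))
```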
Quartiles
The values that divide the ordered distribution into four groups of roughly equal size.
1st Quartile
The median of the values that are to the left of the median in the ordered data.
3rd Quartile
The median of the values that are to the right of the median in the ordered data.
Interquartile Range (IQR)
Q3 - Q1 = IQR
Outlier
Formulas:
Low outliers: values less than Q1 - 1.5 x (Q3 - Q1)
High outliers: values greater than Q3 + 1.5 x (Q3 - Q1)
5 number summary
Minimum, Q1, Median, Q3, Maximum
Boxplot
Percentile (pth)
The value with p% of observations less than or equal to it.
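Going the other direction, the percentile of a given value is the percent of observations less than or equal to it. A minimal sketch using that definition; the function name and the scores are hypothetical:

```python
def percentile_of(value, data):
    """Percent of observations in data that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# 6 of the 10 scores are less than or equal to 80, so 80 is at the 60th percentile.
print(percentile_of(80, scores))
```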
Cumulative Relative Frequency Graph
A graph that plots a point corresponding to the percentile of a given value in a distribution of quantitative data. Consecutive points are then connected with line segments to form the graph.
Standardized score (z-score)
IMPORTANT TO KNOW
z = (value - mean) / (standard deviation), or z = (x - μ) / σ
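A standardized score is the value minus the mean, divided by the standard deviation, so it tells how many standard deviations a value lies above or below the mean. A small sketch with made-up numbers (the function name and the test-score scenario are hypothetical):

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Hypothetical scenario: a test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5 standard deviations above the mean
print(z_score(60, 70, 10))  # 1 standard deviation below the mean
```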
TERMS
DEFINITIONS
Density Curve
Mean of a Density Curve
The point at which the curve would balance if made of solid material.
Median of a Density Curve
The equal-areas point, or the point that divides the area under the curve in half.
Skewed right: mean > median; the mean is pulled toward the long right tail. Skewed left: mean < median; the mean is pulled toward the long left tail.
Normal distribution
A distribution described by a symmetric, single-peaked, bell-shaped density curve.
Normal Curve
Same thing as Normal distribution
Empirical Rule
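(68-95-99.7) In a Normal distribution:
About 68% of values fall within 1 standard deviation of the mean.
About 95% of values fall within 2 standard deviations of the mean.
About 99.7% of values fall within 3 standard deviations of the mean.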
Standard Normal Distribution
Normal distribution with mean 0 and standard deviation 1.
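The 68-95-99.7 percentages of the Empirical Rule can be checked against the standard Normal distribution. A brief sketch using Python's built-in `statistics.NormalDist`; the helper name `area_within` is hypothetical:

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The areas come out to roughly 0.68, 0.95, and 0.997, which is where the rule's name comes from. Because z-scores convert any Normal distribution to the standard one, the same percentages apply whatever the mean and standard deviation are.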