Basic terms
individuals
objects described by a set of data; may be people, animals, or things
variable
characteristic of an individual, can take different values for different individuals
categorical variable
places individuals into one of several groups or categories
quantitative variable
takes numerical values for which it makes sense to find an average
distribution
tells what values a variable takes and how often it takes these values
inference
conclusion drawn that goes beyond the data at hand
frequency table
shows the frequency of data in each category
relative frequency table
shows the percent of data in each category
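A minimal sketch of building a frequency table and a relative frequency table in Python. The `pets` data is made up for illustration and is not from these notes.

```python
from collections import Counter

# Hypothetical categorical data: favorite pet reported by 8 individuals
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat"]

# Frequency table: count of individuals in each category
freq = Counter(pets)

# Relative frequency table: percent of individuals in each category
total = sum(freq.values())
rel_freq = {category: 100 * count / total for category, count in freq.items()}

for category in freq:
    print(f"{category:5s} {freq[category]:3d} {rel_freq[category]:6.1f}%")
```

The percents in a relative frequency table should add up to 100 (apart from rounding).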
pie chart
shows the distribution of categorical data as slices of a circle; emphasizes each category’s relation to the whole
bar graph
shows categories of data in bars, compares quantities measured in the same units
dotplot
data values are shown by dots above their location on a number line
shape
major peaks, clusters or gaps of values, potential outliers, symmetry, skewness
center
a value, usually the mean or median, that describes the middle of the data; the median splits the ordered data roughly in half
spread
describes variability in the data; can be given as the range, or as the smallest and largest values
outliers
values that differ from the overall data pattern
symmetric
right and left sides of graph are approximately mirrors of each other
skewed to the right
right tail of the graph is notably longer than the left tail; also called positively skewed
skewed to the left
left tail of the graph is notably longer than the right tail; also called negatively skewed
unimodal
one peak in the graph
bimodal
two peaks in the graph
stemplot
(stem-and-leaf plot) separates each value into a stem (all but the final digit) and a leaf (the final digit); stems and leaves are ordered least to greatest
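A rough sketch of how a stemplot could be built for non-negative whole numbers, using the tens digit(s) as the stem and the ones digit as the leaf. The function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting keeps leaves least to greatest
        stems[v // 10].append(v % 10)  # stem = leading digit(s), leaf = last digit
    rows = []
    # Include every stem between the smallest and largest, even if it has no leaves
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

# Hypothetical data
data = [12, 15, 21, 23, 23, 30, 34, 8]
for row in stemplot(data):
    print(row)
```

Keeping stems with no leaves in the plot lets gaps in the distribution stay visible.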
histogram
shows the frequency of data values that fall within each interval (bin) of equal width
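A small sketch of the counting behind a histogram: each value is assigned to an equal-width interval and the values in each interval are tallied. The data and the bin width of 10 are assumptions chosen for illustration.

```python
# Hypothetical quantitative data
data = [2, 5, 11, 12, 14, 18, 21, 29, 35]
bin_width = 10  # assumed interval width

# Tally how many values fall in each interval [lower, lower + bin_width)
counts = {}
for x in data:
    lower = (x // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# Print a text histogram, one row per interval
for lower in sorted(counts):
    print(f"[{lower}, {lower + bin_width}): {'*' * counts[lower]}")
```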
mean
arithmetic average: the sum of all the values divided by the number of values; not resistant to outliers
median
middle value of the ordered data, or the average of the two middle values; resistant to skew and outliers
range
highest value minus lowest value
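A quick illustration of mean, median, and range using Python's standard `statistics` module. The data values are made up; the second list adds one extreme value to show which measures move.

```python
import statistics

# Hypothetical data, then the same data with one extreme value added
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]

for values in (data, with_outlier):
    mean = statistics.mean(values)
    median = statistics.median(values)    # averages the two middle values when n is even
    value_range = max(values) - min(values)
    print(f"mean={mean}, median={median}, range={value_range}")
```

With the extreme value added, the mean jumps from 4 to 20 and the range from 4 to 98, while the median only moves from 4 to 4.5.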
quartiles
values that divide the ordered data into four equal parts: 25% of the data lie below Q1, 50% lie below Q2 (the median), and 75% lie below Q3
interquartile range (IQR)
Q3 minus Q1; the range of the middle 50% of the data
five-number summary
summary of data showing minimum, Q1, median, Q3, and maximum
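A sketch of one common textbook way to get the five-number summary: Q1 is the median of the values below the median's position and Q3 is the median of the values above it. This convention is an assumption; calculators and software may use slightly different quartile rules. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]            # values below the median's position
    upper = s[half + n % 2:]    # values above it (middle value skipped when n is odd)
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])

# Hypothetical data (needs at least 2 values)
data = [1, 3, 4, 6, 7, 9, 10, 12, 15]
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)
```

For this data the summary is 1, 3.5, 7, 11, 15, so the IQR is 11 - 3.5 = 7.5.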
boxplot
box drawn from Q1 to Q3 with a line at the median; whiskers extend to the min. and max. values that are not outliers, and any outliers are plotted as individual points
standard deviation
typical distance of the values from the mean; square root of the variance
variance
average of the squared distances from the mean (for a sample, the sum is divided by n - 1)
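A worked sketch of variance and standard deviation computed step by step. The data values are made up. Dividing by n treats the data as a whole population; dividing by n - 1 is the usual choice for a sample.

```python
import math

# Hypothetical data
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Squared distance of each value from the mean
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / n           # population variance (divide by n)
sample_var = sum(squared_devs) / (n - 1)  # sample variance (divide by n - 1)

pop_sd = math.sqrt(pop_var)               # standard deviation = square root of variance
sample_sd = math.sqrt(sample_var)

print(mean, pop_var, pop_sd, round(sample_var, 3), round(sample_sd, 3))
```

Here the mean is 5, the squared distances add up to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2.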
resistant
describes a measure that is not strongly affected by outliers or extreme values (e.g., median, IQR)
1.5 x IQR rule
multiply the IQR by 1.5; subtract this from Q1 to find the lower cutoff and add it to Q3 to find the upper cutoff; any value below the lower cutoff or above the upper cutoff is an outlier
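A short sketch of the 1.5 x IQR rule in Python. The function name `outlier_cutoffs` and the data are hypothetical, and the quartiles are assumed to come from the median-of-halves convention (lower half 1, 3, 4, 6 gives Q1 = 3.5; upper half 9, 10, 12, 40 gives Q3 = 11).

```python
def outlier_cutoffs(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with assumed quartiles (see the note above the code)
data = [1, 3, 4, 6, 7, 9, 10, 12, 40]
q1, q3 = 3.5, 11

low, high = outlier_cutoffs(q1, q3)
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

The IQR is 7.5, so 1.5 x IQR = 11.25; the cutoffs are 3.5 - 11.25 = -7.75 and 11 + 11.25 = 22.25, which flags 40 as an outlier.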
what is used with symmetrical data that has no outliers
mean and standard deviation
what is used with skewed data or data that has outliers
median and IQR
quantitative graphs
histograms, dotplots, stemplots, boxplots, frequency charts
qualitative graphs
pie charts, bar graphs