places an individual into one of several groups or categories.
quantitative variables
takes numerical values for which it makes sense to find an average.
distribution
tells us what values the variable takes and how often it takes these values; pattern of variation.
data table
lists the individuals and their values for each variable (usually one row per individual, one column per variable).
frequency table
summarizes a distribution with counts.
relative frequency table
summarizes a distribution with percents (or proportions).
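A minimal Python sketch of both tables, using made-up category data (the `colors` list and the function names are hypothetical, for illustration only):

```python
from collections import Counter

def frequency_table(values):
    """Count how many individuals fall in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Convert each count to a percent of all individuals."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical survey responses (illustrative data, not from any real study).
colors = ["red", "blue", "blue", "green", "blue", "red", "blue", "green"]
print(frequency_table(colors))           # {'red': 2, 'blue': 4, 'green': 2}
print(relative_frequency_table(colors))  # {'red': 25.0, 'blue': 50.0, 'green': 25.0}
```

Each relative frequency is just the count divided by the total number of individuals, times 100.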
two-way table
a table used to describe two categorical variables.
marginal distribution
the distribution of values of a categorical variable among all individuals described by the table.
conditional distribution
describes the values of one variable among only those individuals who have a specific value of another variable; there is a different conditional distribution for each value of the other variable.
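A minimal sketch of marginal and conditional distributions, assuming a small hypothetical two-way table stored as nested dictionaries (the counts, category names, and function names are invented for illustration). Each conditional distribution is what one bar of a segmented bar graph shows.

```python
# Hypothetical two-way table: rows = age group, columns = snack choice (counts).
table = {
    "teen": {"chips": 10, "fruit": 30},
    "adult": {"chips": 25, "fruit": 15},
}

def marginal_distribution(table):
    """Percent of ALL individuals in each column category."""
    grand_total = sum(sum(row.values()) for row in table.values())
    column_totals = {}
    for row in table.values():
        for column, count in row.items():
            column_totals[column] = column_totals.get(column, 0) + count
    return {c: 100 * t / grand_total for c, t in column_totals.items()}

def conditional_distribution(table, row_value):
    """Percent in each column category among ONLY the individuals in one row."""
    row = table[row_value]
    row_total = sum(row.values())
    return {c: 100 * n / row_total for c, n in row.items()}

print(marginal_distribution(table))              # {'chips': 43.75, 'fruit': 56.25}
print(conditional_distribution(table, "teen"))   # {'chips': 25.0, 'fruit': 75.0}
print(conditional_distribution(table, "adult"))  # {'chips': 62.5, 'fruit': 37.5}
```

The two conditional distributions differ noticeably, which in this made-up table would suggest an association between age group and snack choice.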
segmented bar graph
a "stacked" bar graph in which each bar represents a whole (100%) and its segments show the parts; because it uses percents, groups of different sizes are easy to compare.
association
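two variables are associated if knowing the value of one helps predict the value of the other, e.g., high (or low) values of V1 tend to occur with high (or low) values of V2.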
characteristics to address when describing the distribution of a quantitative variable
shape, outliers, center, spread
shape
skewness, symmetry
center
mean, median
spread
range, IQR, standard deviation
histogram
label both axes; use classes (bins) of equal width
decide whether a value that falls exactly on a class boundary goes on the upper bar or the lower bar, and use the same rule for every class
make a dot plot first
use at least five bins (bars)
relative frequency histogram
makes it easier to compare two distributions, especially when the numbers of individuals in the two groups are very different.
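A sketch of equal-width binning with counts and relative frequencies, assuming that a value landing exactly on a class boundary goes on the upper bar (left endpoint included, right endpoint excluded). That boundary rule, the data, and the function names are assumptions for illustration:

```python
import math

def histogram_counts(values, start, width):
    """Count values in equal-width classes [start + i*width, start + (i+1)*width).

    Assumed boundary rule: a value exactly on a boundary goes on the upper bar.
    Assumes start <= min(values).
    """
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins  # empty classes still get a bar of height 0
    for x in values:
        counts[math.floor((x - start) / width)] += 1
    return counts

def relative_frequencies(counts):
    """Divide each count by the total so groups of different sizes can be compared."""
    total = sum(counts)
    return [c / total for c in counts]

scores = [3, 5, 7, 10, 12, 14, 15, 18, 21, 24]  # hypothetical data
counts = histogram_counts(scores, start=0, width=5)
print(counts)                        # [1, 2, 3, 2, 2]
print(relative_frequencies(counts))  # [0.1, 0.2, 0.3, 0.2, 0.2]
```

Here the boundary values 5, 10, and 15 each land on the higher bar, and the five classes meet the five-bin minimum.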
x̄ (x-bar)
mean of a sample
μ
mean of a population
resistant measures of center
median: YES; it depends only on the middle value(s) of the ordered data, so making an extreme value even more extreme does not change it
mean: NO; the mean is pulled toward outliers and in the direction of the skew
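A quick check of resistance with hypothetical data, using Python's `statistics` module: replacing the largest value with an outlier drags the mean up but leaves the median where it was.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]            # hypothetical values
with_outlier = [2, 3, 4, 5, 60]   # largest value replaced by an outlier

print(mean(data), median(data))                  # 4 4
print(mean(with_outlier), median(with_outlier))  # 14.8 4
```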
how does the shape of a distribution affect the relationship between the mean and the median?
skewed right: mean > median
skewed left: mean < median
roughly symmetric: mean ≈ median
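A sketch with made-up data showing the three cases (values chosen for illustration only; the left-skewed list is simply the mirror image of the right-skewed one):

```python
from statistics import mean, median

right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 8, 15]  # long tail to the right
left_skewed = [-x for x in right_skewed]        # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, mean(data), median(data))
```

The right-skewed data have mean 4.1 and median 2.5: the few large values in the tail pull the mean toward them.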
range
max - min; not a resistant measure of spread
quartiles
Q1 is the median of the observations to the left of the overall median in the ordered data; Q3 is the median of the observations to the right of it
IQR
Q3 - Q1; a resistant measure of spread
outliers
lower boundary = Q1 - 1.5(IQR)
upper boundary = Q3 + 1.5(IQR)
an observation below the lower boundary or above the upper boundary is an outlier (1.5 × IQR rule)
five-number summary
minimum, Q1, median, Q3, maximum; displayed graphically as a boxplot
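A minimal sketch of the five-number summary and the 1.5 × IQR outlier rule, assuming quartiles are found as the medians of the lower and upper halves, with the overall median left out of both halves when n is odd. Calculators and libraries may use other quartile methods and give slightly different Q1 and Q3; the data and function names are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 values.

    Assumed convention: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the overall median when n is odd.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Flag values outside Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

scores = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(five_number_summary(scores))     # (1, 3.5, 6, 8.5, 30)
print(outliers(scores))                # [30]
```

With IQR = 8.5 - 3.5 = 5, the boundaries are -4.0 and 16.0, so only 30 is flagged.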
standard deviation
the typical distance of the values in the data set from the mean; a measure of spread (dispersion, variation).
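A sketch of how Sx is computed, assuming the usual sample formula Sx = √(Σ(x - x̄)² / (n - 1)); the data are hypothetical, and `statistics.stdev` from the standard library serves as a cross-check:

```python
import math
import statistics

def sample_sd(values):
    """Sx: square root of (sum of squared deviations from x-bar) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

heights_cm = [160, 165, 170, 175, 180]  # hypothetical data
print(sample_sd(heights_cm))            # about 7.91
print(statistics.stdev(heights_cm))     # standard-library sample standard deviation
```

Here x̄ = 170, the squared deviations sum to 250, and 250 / 4 = 62.5, so Sx = √62.5.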
similarities between range, IQR, standard deviation
all measure spread
differences between range, IQR, standard deviation
range is the least resistant to outliers
standard deviation is not resistant either, though an outlier usually changes it less than it changes the range
IQR is the most resistant
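A sketch comparing how much each measure changes when one value becomes an outlier, using hypothetical data and the median-of-halves quartile convention (an assumption; other quartile methods exist). The function name is made up for illustration.

```python
import statistics

def spread_measures(values):
    """Range, IQR (median-of-halves quartiles), and Sx for one data set."""
    data = sorted(values)
    n = len(data)
    q1 = statistics.median(data[: n // 2])
    q3 = statistics.median(data[(n + 1) // 2 :])
    return {"range": data[-1] - data[0], "IQR": q3 - q1, "Sx": statistics.stdev(data)}

before = [10, 12, 13, 14, 15, 16, 18]  # hypothetical data
after = [10, 12, 13, 14, 15, 16, 80]   # largest value replaced by an outlier

print(spread_measures(before))
print(spread_measures(after))
```

Here the range jumps from 8 to 70, Sx grows from about 2.6 to about 25.3, and the IQR stays at 4.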
properties of standard deviation
measures spread about the mean; use it only when the mean is chosen as the measure of center
Sx (the sample standard deviation) is always greater than or equal to 0, and equals 0 only when all observations are identical
Sx has the same units as the original observations
Sx is NOT resistant to outliers
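A sketch checking two of these properties with hypothetical data: identical observations give Sx = 0, and converting centimeters to meters divides Sx by 100, so Sx stays in the data's own units.

```python
import math
import statistics

constant = [5, 5, 5, 5]                       # hypothetical: no variation at all
lengths_cm = [150, 160, 170, 180]             # hypothetical lengths in centimeters
lengths_m = [x / 100 for x in lengths_cm]     # same lengths in meters

print(statistics.stdev(constant))             # zero spread
print(statistics.stdev(lengths_cm))           # in centimeters
print(statistics.stdev(lengths_m))            # in meters: 100 times smaller
```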
factors to consider when choosing summary statistics
choose one measure of center and one of spread, based on the distribution's shape and outliers:
- skewed or has outliers: median and IQR (resistant)
- roughly symmetric with no outliers: mean and standard deviation