categorical variable
assigns labels that place each individual into a particular group, called a category
cannot take the average
distribution description is one-one
quantitative variable
takes number values that are counts or measurements
can take the average
distribution description is SOCS
two-way table
table of counts that summarizes data on the relationship between two categorical variables for some group of individuals
try to include the total counts of the rows and columns
marginal relative frequency
gives the percent/proportion of individuals that have a specific value for one categorical variable
only describes ONE of the variables in the two-way table
row or column total / grand total
joint relative frequency
gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for ANOTHER categorical variable
answers information about BOTH variables in the two-way table
AND!
one box / grand total
conditional relative frequency
gives the proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable
proportion of individuals in a certain category, counted only within one specific category of the other variable
one box/row or column total
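The three relative frequencies above differ only in what goes in the numerator and the denominator. Here is a minimal sketch that may help keep them apart. The table, its counts, and the function names are all made up for illustration.

```python
# Hypothetical two-way table of counts (made-up numbers for illustration).
# Rows: grade level. Columns: preferred snack.
table = {
    "freshman": {"chips": 20, "fruit": 30},
    "senior": {"chips": 35, "fruit": 15},
}

# Grand total: every individual in the table.
grand_total = sum(sum(row.values()) for row in table.values())

def marginal_row(row):
    """Row total / grand total: describes ONE variable."""
    return sum(table[row].values()) / grand_total

def joint(row, col):
    """One box / grand total: describes BOTH variables (AND)."""
    return table[row][col] / grand_total

def conditional_given_row(col, row):
    """One box / row total: proportion with `col` among individuals in `row`."""
    return table[row][col] / sum(table[row].values())

print(marginal_row("freshman"))                    # 50 / 100
print(joint("freshman", "fruit"))                  # 30 / 100
print(conditional_given_row("fruit", "freshman"))  # 30 / 50
```

Note that the joint and conditional frequencies share the same numerator (30) and only the denominator changes.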
side-by-side bar graph
displays the distribution of a categorical variable for each value of another categorical variable
bars are grouped by the values of one categorical variable and placed side by side
several bars in one category
segmented/ribbon bar graph
displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category
one rectangle divided up per category
separate bars
mosaic plot
a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category
one rectangle divided up per category
CONNECTED
width of each rectangle is proportional to the number of individuals in that category
categorical graphs
side-by-side bar graphs
mosaic plots
segmented/ribbon bar graphs
pie charts
association
when knowing the value of one variable can help predict the value of the other
DOES NOT MEAN CAUSATION
no association
when knowing the value of one variable does NOT help predict the value of the other
dotplot
shows each data value as a dot above its location on a number line
full points for graph
title
correct graph
labeled axis
if necessary: subheadings and key
stemplot
shows each data value separated into two parts
stem: all digits except the final digit (ex: 10s or 100s)
leaf: the final digit (ex: 1s or 0.1s)
kind of like a dot plot on its side + grouped into 10s or 5s
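As a rough illustration of the stem and leaf split, the sketch below builds a stemplot for non-negative whole numbers, using the ones digit as the leaf. The function name and the data are hypothetical, and a real stemplot still needs a title and a key (here, 1 | 2 means 12).

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative whole numbers (leaf = ones digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)  # stem = all digits but the last
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Made-up data for illustration.
for line in stemplot([12, 15, 21, 21, 28, 40, 43]):
    print(line)
# 1 | 25
# 2 | 118
# 3 |
# 4 | 03
```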
splitting stems
stemplot used when the data are very clustered: each stem is split in two (leaves 0-4 and leaves 5-9)
back-to-back stemplot
combining two distributions of the same quantitative variable
requires two keys and subheadings
describe the distribution
SOCS in CONTEXT
use variable name, not just units
write in complete sentences
start with capital and end with period
NO bullets
shape
unimodal/bimodal/etc
skewed left/skewed right/symmetric
use QUALIFIERS
skew towards the TAIL
outlier
can do approximate, but if possible do EXACT
High-End Outlier
any value greater than Q3 + 1.5 x IQR
Low-End Outlier
any value less than Q1 - 1.5 x IQR
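The 1.5 x IQR rule can be sketched as a pair of small functions. This assumes Q1 and Q3 have already been found. The function names and the sample numbers are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (low fence, high fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Made-up example: Q1 = 10 and Q3 = 20, so IQR = 10.
print(outlier_fences(10, 20))                             # (-5.0, 35.0)
print(find_outliers([8, 10, 15, 20, 22, 40], 10, 20))     # [40]
```

A value exactly on a fence is not flagged, since the rule uses strictly greater than or less than.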
unimodal
one peak
bimodal
two distinct peaks
uniform
frequency about the same for all values
mean
the average of all the individual data values
better for more symmetric data
NOT resistant
resistant
is not sensitive to extreme values
median
midpoint of a distribution
resistant
symmetric
left and right sides of the graph are roughly mirror images of each other
mean = median (approximately)
skewed left
tail end is at the left
mean < median
skewed right
tail end is at the right
mean > median
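The mean and median comparisons for symmetric, skewed left, and skewed right distributions can be checked on small made-up data sets. This is only a sketch with hypothetical numbers. It also shows why the mean is not resistant: one extreme value drags the mean toward the tail while the median barely moves.

```python
from statistics import mean, median

# Made-up data sets for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]    # long tail on the right
left_skewed = [1, 47, 48, 49, 50]  # long tail on the left

for name, data in [
    ("symmetric", symmetric),
    ("skewed right", right_skewed),
    ("skewed left", left_skewed),
]:
    print(name, "mean:", mean(data), "median:", median(data))

# symmetric:    mean 3,  median 3   (mean = median)
# skewed right: mean 12, median 3   (mean > median)
# skewed left:  mean 39, median 48  (mean < median)
```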
range
distance between minimum and maximum values
NOT resistant measure of variability
standard deviation
the typical distance of the values in a distribution from the mean
NOT resistant
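A minimal sketch of the standard deviation calculation, assuming the sample version that divides by n - 1 (the card above does not say which version is meant). The function name and the data are hypothetical.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: divides by n - 1 (assumed convention)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))

# Made-up data with mean 5; the squared deviations add up to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))           # sqrt(32 / 7), about 2.14

# Adding one extreme value inflates the result: not resistant.
print(sample_sd(data + [40]))
```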
quartiles
values that divide the ordered distribution into four groups of roughly equal size
first quartile (Q1)
median of the data values to the LEFT of (LESS THAN) the median in the ordered list
third quartile (Q3)
median of the data values to the RIGHT of (GREATER THAN) the median in the ordered list
interquartile range (IQR)
distance between first and third quartiles
Q3 - Q1
resistant (unlike the range and standard deviation)
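The quartile definitions above (median of the lower half, median of the upper half) can be sketched as follows. The function name and data are made up, and software packages may use other quartile conventions that give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Needs at least two values. With an odd count, the overall median
    is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

# Made-up data for illustration.
q1, med, q3 = quartiles([1, 3, 4, 6, 7, 9, 12])
print(q1, med, q3)     # 3 6 9
print("IQR:", q3 - q1)  # IQR: 6
```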
mode
the data value that occurs the most
histogram
groups values into intervals (bins) and draws one bar per interval
bin width = size of interval
heights = frequency/relative frequency
use relative frequency when comparing distributions of different total counts
good for large data sets
individual values are not shown
how many bins
count data points
take the square root of the number of data points
round up
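The bin-count steps above amount to a rounded-up square root. Here is a small sketch of that rule of thumb; the function name is hypothetical, and the result is only a starting point for choosing bins.

```python
from math import isqrt

def suggested_bin_count(n):
    """Square root of the number of data points, rounded up."""
    root = isqrt(n)  # integer square root, rounded down
    return root if root * root == n else root + 1

print(suggested_bin_count(30))   # sqrt(30) is about 5.48, so 6
print(suggested_bin_count(100))  # sqrt(100) is exactly 10, so 10
```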
quantitative graphs
dotplots
histograms
stemplots