Categorical data
_ cannot be measured. Instead, data is counted and placed into a specific group or category
quantitative data
is measurement data
frequency
looks at actual counts
relative frequency
looks at data as percentages or proportions
Cumulative frequency
the sum of the current count and all previous counts (a running total)
Cumulative relative frequency
is the sum of the current percentage and all previous percentages (a running total of the percentages or proportions)
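A small sketch of the four frequency ideas above, using a made-up list of colors (the data and variable names are assumptions for illustration only):

```python
from collections import Counter
from itertools import accumulate

# Hypothetical categorical data (made up for illustration).
colors = ["red", "blue", "red", "green", "blue",
          "red", "red", "green", "blue", "red"]

counts = Counter(colors)
categories = sorted(counts)                  # fix an order for the running totals
freq = [counts[c] for c in categories]       # frequency: actual counts
n = sum(freq)
rel_freq = [f / n for f in freq]             # relative frequency: proportions
cum_freq = list(accumulate(freq))            # cumulative frequency: running total of counts
cum_rel_freq = [c / n for c in cum_freq]     # cumulative relative frequency

for row in zip(categories, freq, rel_freq, cum_freq, cum_rel_freq):
    print(row)
```

The last cumulative frequency equals the total count, and the last cumulative relative frequency equals 1 (100%).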
a contingency table
is also called a two-way table
dependent
If variables are ___ then there is an association/relationship between the variables
independent
If variables are ___ then there is not an association/relationship between the variables
pie charts and segmented bar charts
should show a full distribution (100%)
a mosaic plot
is a type of segmented bar chart
mosaic plot
allows you to compare relative frequencies (percentages or proportions) of two or more groups
a mosaic plot
allows you to compare the actual quantities of two or more groups. Larger areas represent larger quantities than smaller areas
pie charts and bar charts
are two ways to visually display categorical data
Histograms, stem-plots, and dot plots
are three ways to visually display quantitative data
look at or compare one-variable statistics
Pie charts, bar charts, histograms, stem-plots, and dot plots
modes
The humps on a histogram are called
gaps
The spaces on a histogram are called
must
Stem-plots ___ have a key
least to greatest
The leaves of stem-plots are listed from ___, moving outward from the stems
back-to-back
When comparing two different groups (e.g., males vs. females), you can create a ____ stem-plot
0-4 and 5-9
When creating a stem-plot and you have a lot of data, it is good to split the leaves from
shape, center, and spread
When discussing quantitative data, you must discuss
roughly symmetric
Never say symmetric. Instead say
when discussing shape
you should discuss three things: 1) unimodal, bimodal, multimodal, or uniform; 2) roughly symmetric or skewed (left/negative or right/positive); 3) gaps and outliers (if any exist)
mean
is the average
median
is the middle number when numbers are arranged from least to greatest
mode
is the most occurring value
(n+1)/2
to find the location of the median, use the equation
odd
When the sample size is ___, the median will always be the middle term
even
When the sample size is ___, the median will not be a middle term, but rather the average of the two terms closest to the center
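A minimal sketch of the (n+1)/2 location rule for odd and even sample sizes. The function name `median_by_location` is hypothetical, chosen only for this illustration:

```python
def median_by_location(values):
    """Find the median using the (n + 1) / 2 location rule."""
    data = sorted(values)            # arrange from least to greatest
    n = len(data)
    location = (n + 1) / 2           # 1-based position of the median
    if n % 2 == 1:
        # Odd n: the location is a whole number, so take that middle term.
        return data[int(location) - 1]
    # Even n: the location ends in .5, so average the two terms around it.
    lower = data[n // 2 - 1]
    upper = data[n // 2]
    return (lower + upper) / 2

print(median_by_location([3, 1, 2]))      # odd sample size
print(median_by_location([4, 1, 3, 2]))   # even sample size
```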
roughly symmetric
The mean and the median are about the same when data is
skewed to the right
The mean is larger than the median when data is
skewed to the left
The mean is smaller than the median when data is
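The mean/median relationship under skew can be checked with two made-up data sets (both lists are assumptions for illustration, not real data):

```python
from statistics import mean, median

# Hypothetical data: one large value pulls the tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical data: one small value pulls the tail to the left.
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]

# The mean is pulled toward the tail; the median resists it.
print(mean(right_skewed), median(right_skewed))   # mean is larger than median
print(mean(left_skewed), median(left_skewed))     # mean is smaller than median
```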
range is the
maximum value minus the minimum value
the lower quartile
is the median of the bottom half of the data
the upper quartile
is the median of the top half of the data
Q3-Q1
The interquartile range (IQR) is
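A sketch of the range, quartiles, and IQR as defined above. It assumes the common classroom convention that, for an odd sample size, the overall median is left out of both halves; software packages may use other quartile rules. The data are made up for illustration:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the bottom and top halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Assumption: for odd n, skip the middle value when forming the top half.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return median(lower_half), median(upper_half)

data = [2, 4, 4, 5, 7, 8, 9, 10, 30]   # hypothetical data
q1, q3 = quartiles(data)
iqr = q3 - q1                          # IQR = Q3 - Q1
data_range = max(data) - min(data)     # range = maximum - minimum

print(q1, q3, iqr, data_range)
```

The single large value (30) inflates the range but barely affects the IQR.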
weakest
the range is the ___ form of variability
use mean and standard deviation
when data is roughly symmetric with no potential outliers
use the median and IQR
when data is skewed or outliers exist
median and IQR
are resistant to outliers and skewed data
mean and standard deviation
are not resistant to outliers and skewed data
smaller
Standard deviation is __ when data is tightly clustered around the mean
larger
standard deviation is ___ when data is more spread out
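To see the standard deviation respond to spread, compare two made-up data sets that share the same mean (both lists are assumptions for illustration):

```python
from statistics import mean, stdev

tight = [9, 10, 10, 11]     # hypothetical data, clustered around 10
spread = [0, 5, 15, 20]     # hypothetical data, same mean but more spread out

print(mean(tight), stdev(tight))     # small sample standard deviation
print(mean(spread), stdev(spread))   # larger sample standard deviation
```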
min, Q1, median, Q3, max
five number summary is
lower and upper fences
outliers fall beyond the ___ on box plots
modified box plots
include outliers
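The cards do not state how the fences are computed. The sketch below assumes the usual 1.5 x IQR rule (lower fence = Q1 - 1.5 x IQR, upper fence = Q3 + 1.5 x IQR). The data and the quartile values are made up for illustration:

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence); k = 1.5 is the assumed multiplier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Values strictly beyond either fence are flagged as outliers."""
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 5, 7, 8, 9, 10, 30]   # hypothetical data
q1, q3 = 4, 9.5                        # quartiles assumed for this data set
low, high = fences(q1, q3)
outliers = find_outliers(data, q1, q3)

print(low, high, outliers)
```

A modified box plot would draw the flagged values as separate points and end the whiskers at the most extreme values inside the fences.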
modified
always create ___ box plots
it is skewed left
if the box plot has a longer tail (whisker) on the left side
it is skewed right
if the box plot has a longer tail (whisker) on the right side
time plots
are visual displays that look at data over a period of time
are shifted
when adding or subtracting a value to every number in a data set, all measures of position
stay the same
when adding or subtracting a value to every number in a data set, all measures of spread
are rescaled
when multiplying or dividing every number in a data set by a value, all measures of position and spread (with the exception of variance)
when rescaling the variance
you must first square the value you are rescaling by and then multiply/divide the variance by that squared value
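The shifting and rescaling rules can be checked numerically. This sketch uses a made-up data set, a shift of 5, and a positive scale factor of 3 (all assumptions for illustration):

```python
from statistics import mean, stdev, variance

data = [2, 4, 6, 8, 10]              # hypothetical data
shifted = [x + 5 for x in data]      # add 5 to every value
scaled = [3 * x for x in data]       # multiply every value by 3

# Shifting: position moves by 5, spread stays the same.
print(mean(data), mean(shifted))
print(stdev(data), stdev(shifted))

# Rescaling: position and standard deviation are multiplied by 3,
# while the variance is multiplied by 3 squared (9).
print(mean(scaled), stdev(scaled), variance(scaled))
```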
mu (μ)
population mean is
x bar
sample mean is
sigma
population standard deviation is
s
sample standard deviation is
sigma squared
population variance is
s squared
sample variance is
below
Percentile rank is the percentage of data that lies ___ an observation
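A minimal sketch of percentile rank as defined on this card (the percentage of data strictly below an observation). The function name and the scores are made up for illustration:

```python
def percentile_rank(values, observation):
    """Percentage of the data that lies below the given observation."""
    below = sum(1 for v in values if v < observation)
    return 100 * below / len(values)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]   # hypothetical scores
print(percentile_rank(scores, 85))   # 6 of the 10 scores are below 85
```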
raw data
is listing out all of the actual data
summary statistics
are statistics that summarize the data (mean, standard deviation, etc.)
n
sample size is denoted by
sample size
is the number of observations in the sample
approximately normal
never say data is normal, instead say it is
normal models
are used when histograms are unimodal and roughly symmetric
normal models are used when
normal probability plots are fairly linear from the lower left to the upper right
mean of the data
The center of every normal model is the
normal models
are standardized when the data is converted into z-scores
zero
When normal models are standardized into z-scores, the mean becomes
units
z-scores have no
above or below
z-scores tell us how many standard deviations a value is ___ the mean
data is below the mean
when z scores are negative
data is above the mean
when z scores are positive
(y - mu)/sigma
the formula for z score is
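The z-score formula as a short sketch. The mean of 100 and standard deviation of 15 are made-up values for illustration:

```python
def z_score(y, mu, sigma):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mu) / sigma

print(z_score(130, 100, 15))   # positive: the value is above the mean
print(z_score(85, 100, 15))    # negative: the value is below the mean
```

The units cancel in the division, which is why z-scores have no units.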
on or above
density curves always lie ___ the x-axis
one
the area of every density curve is always
the normal distribution
an example of a density curve
normalcdf
When we have normal models and we have z-scores, we can find the probability using
(lower bound, upper bound, mu, sigma)
normalcdf formula is
invNorm (inverse normal)
When you have percentages, percentile ranks, or probabilities and you want to find a z-score or critical z-statistic, then use
(area below, mu, sigma)
formula for invNorm
subtract the percentage from 100 and then use invNorm
Your calculator cannot find a z-score when given the top percentage of something (e.g., top 5%, top 10%, etc.). You must first
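For checking calculator work, the two calculator commands can be imitated with Python's standard-library `statistics.NormalDist`. The wrapper names `normalcdf` and `inv_norm` below are hypothetical helpers that mirror the calculator argument order. They are not the calculator's own implementation:

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_below, mu=0.0, sigma=1.0):
    """Value that has the given area (as a proportion) below it."""
    return NormalDist(mu, sigma).inv_cdf(area_below)

# Probability of falling within one standard deviation of the mean.
p_within_one = normalcdf(-1, 1)

# Cutoff z-score for the top 5%: convert to the area below (1 - 0.05) first.
z_top5 = inv_norm(1 - 0.05)

print(round(p_within_one, 4), round(z_top5, 3))
```

The "top 5%" step works with proportions here, so 1 - 0.05 plays the role of subtracting the percentage from 100.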
a trial
is a sequence of events that we want to investigate that leads to an outcome
the law of large numbers
looks at long-term behavior and says that as the number of trials increases, the relative frequency of an event across the repeated trials gets closer and closer to the actual probability of the event
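The law of large numbers can be illustrated with simulated coin flips. This sketch assumes a fair coin (p = 0.5) and a fixed seed so the run is repeatable. The function name is hypothetical:

```python
import random

def running_proportion(n_trials, p=0.5, seed=0):
    """Proportion of successes after each trial, for a repeatable random run."""
    rng = random.Random(seed)
    successes = 0
    proportions = []
    for trial in range(1, n_trials + 1):
        if rng.random() < p:          # a "success" (e.g., heads)
            successes += 1
        proportions.append(successes / trial)
    return proportions

proportions = running_proportion(10_000)
for checkpoint in (10, 100, 1_000, 10_000):
    print(checkpoint, proportions[checkpoint - 1])
```

Early proportions can wander far from 0.5. The later ones tend to settle near it.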
a simulation
can be used to imitate behavior and is often used to model long term behavior
simulation steps
1. Start at the beginning of a random number table and move left to right
2. Look at digits 1 at a time, 2 at a time, 3 at a time, etc. depending on the context of the question
3. Discuss how the numbers are assigned
4. Discuss if any numbers should be ignored
5. Discuss whether repeated numbers in a given trial should be ignored
6. Explain when to stop
7. Explain what to do now that the values have been selected
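The steps above can be sketched for a made-up question: "If 20% of cereal boxes contain a prize, how many boxes do you open until the first prize?" Everything here is an assumption for illustration: the digit row, the assignment (digits 0-1 mean prize, 2-9 mean no prize), and the stopping rule.

```python
# Hypothetical line from a random number table (made up for illustration).
digit_row = "73518 42096 65013 88427 31950"

def run_trial(digits, start):
    """Read one digit at a time from `start`, left to right.

    Digits 0-1 represent a prize (20%); digits 2-9 represent no prize.
    No digits are ignored and repeated digits are allowed.
    Stop at the first prize. Return (boxes opened, next start position),
    or None if the digits run out before a prize appears.
    """
    boxes = 0
    for i in range(start, len(digits)):
        boxes += 1
        if digits[i] in "01":
            return boxes, i + 1
    return None

digits = digit_row.replace(" ", "")
results = []
position = 0                      # start at the beginning of the table
while (trial := run_trial(digits, position)) is not None:
    boxes, position = trial
    results.append(boxes)

# Final step: summarize the trials, here with the average boxes opened.
print(results, sum(results) / len(results))
```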
sample space
is the collection of all possible outcomes and is denoted with the letter S
outcome/event
The result or value of the trial is called an
same probability of happening
For equally likely outcomes, every outcome in the sample space has the
zero to one
Probability is always a number from
one
If you add all of the probabilities in a sample space, your sum will always be
complement of A
The set of outcomes that are not in A is called the
1-P(A)
complement of A formula
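A short sketch of a sample space with equally likely outcomes and the complement rule, using one roll of a fair six-sided die as a made-up example (the event A and the helper `prob` are assumptions for illustration):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}     # sample space for one roll of a fair die

def prob(event, space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & space), len(space))

A = {5, 6}                 # hypothetical event: roll a 5 or a 6
p_A = prob(A, S)
p_not_A = 1 - p_A          # complement rule: P(not A) = 1 - P(A)

# The probabilities of all outcomes in the sample space add up to one.
total = sum(prob({outcome}, S) for outcome in S)

print(p_A, p_not_A, total)
```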
disjoint events
are also called Mutually Exclusive events
disjoint/mutually exclusive events
can only be one thing or the other, there can be no overlap, they cannot occur at the same time