individuals
objects described by a set of data (e.g. people, animals, things)
variable
any characteristic of an individual; can take different values for different individuals
categorical/qualitative variable
records which of several categories/groups an individual belongs to; arithmetic is not meaningful
quantitative variable
takes numerical values for which it makes sense to do arithmetic (e.g. adding, averaging)
distribution
pattern of variation of a variable; records values the variable takes and how often it takes them; presentation of data
range
spread; high value - low value; gives interval of scores
spread
describes how much the data vary (how spread out the values are) in a distribution; measured by range, IQR, standard deviation, variance, and/or M.A.D.
frequency
how many times the value of a variable occurs
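A minimal sketch of counting frequencies, assuming a made-up list of categorical observations (the `colors` data are hypothetical, not from the deck). The relative frequency is each count divided by the total number of observations.

```python
from collections import Counter

# Hypothetical observations of one categorical variable.
colors = ["red", "blue", "red", "green", "red", "blue"]

# Frequency: how many times each value occurs.
freq = Counter(colors)

# Relative frequency: count divided by the total number of observations.
relative = {value: count / len(colors) for value, count in freq.items()}

print(freq["red"])      # 3
print(relative["red"])  # 0.5
```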
outlier
an individual observation that falls outside the overall pattern of the distribution; identified by eye or with the 1.5 × IQR rule: an observation is an outlier if it is less than Q₁ - 1.5 × IQR or greater than Q₃ + 1.5 × IQR
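The 1.5 × IQR rule can be sketched as below. The function names and the example quartiles (Q₁ = 10, Q₃ = 20) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x lies strictly outside the fences."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high


# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10
# and the fences are 10 - 15 = -5 and 20 + 15 = 35.
print(outlier_fences(10, 20))  # (-5.0, 35.0)
print(is_outlier(36, 10, 20))  # True
print(is_outlier(35, 10, 20))  # False (exactly on the fence)
```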
center
where the distribution is centered; measured by mean, median, and/or mode
shape
the shape observations form in the distribution; described as skewed left/right or symmetric
skewed left
the left side (lower half of the distribution) extends much farther out than the right; the left side is the “tail”
skewed right
the right side (upper half of the distribution) extends much farther out than the left; the right side is the “tail”
symmetric
the right and left sides of the distribution are approximately mirror images of each other
dot plot
graph of data set using dots for each observation
histogram
graph with bars showing frequency of different values of one variable (not categories!); most common for quantitative variables, can use to group nearby values if too many values for a dot plot
stemplot
graph for a small data set that keeps the actual data values; stems are all but the rightmost digit of each observation, leaves are the final digits, written in increasing order out from the stem (remember to include a key!)
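A rough sketch of building a stemplot as text, assuming non-negative whole numbers whose leaf is the units digit. The function name `stemplot` and the sample data are hypothetical. Leaves are sorted in increasing order out from the stem, and stems with no leaves are still printed so gaps in the data stay visible.

```python
from collections import defaultdict


def stemplot(values):
    """Return a text stemplot for non-negative integers (leaf = units digit)."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)    # e.g. 42 -> stem 4, leaf 2
        groups[stem].append(leaf)

    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    lines.append("Key: 4 | 2 means 42")
    return "\n".join(lines)


print(stemplot([42, 45, 51, 63, 67, 67]))
# 4 | 2 5
# 5 | 1
# 6 | 3 7 7
# Key: 4 | 2 means 42
```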
split stems
each stem appears twice; do if all leaves would fall on just a few stems
back-to-back stemplot
stemplot with leaves on the right and left; use to compare two distributions (don’t forget a key with both distributions!)
time plot
graph plotting each observation against the time at which it was measured; use to show change over time
mean
most common measure of center; (∑x)/n; x̄ for sample mean and μ for population mean
∑
sigma; symbol meaning “sum of”
x̄ (x bar)
sample mean equal to (∑x)/n
nonresistant
sensitive to the influence of extreme observations; the mean and standard deviation are nonresistant: the mean is pulled towards the tail, and the standard deviation is inflated by extreme values
median
the middle value of the ordered observations; M = med = x͂; with an odd number of observations it is the ((n+1)/2)th value, and with an even number of observations it is the mean of the middle two values (the (n/2)th and (n/2 + 1)th)
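A minimal sketch of finding the median by position, handling the odd and even cases separately. The function name and data are hypothetical; Python's built-in `statistics.median` gives the same result.

```python
def median(values):
    """Middle value of the sorted data (mean of the middle two if n is even)."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: index mid (0-based) is position (n + 1) / 2 counting from 1.
        return xs[mid]
    # Even n: average the two middle values.
    return (xs[mid - 1] + xs[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([1, 2, 3, 4]))  # 2.5
```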
resistant
not sensitive to the influence of extreme observations (e.g. median)
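To illustrate resistant versus nonresistant measures, the sketch below adds one extreme value to a small made-up data set (the numbers are hypothetical). The mean jumps towards the extreme value while the median barely moves.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme observation added.
typical = [30, 32, 35, 38, 40]
with_extreme = typical + [400]

print(mean(typical), median(typical))    # 35 35
print(median(with_extreme))              # 36.5
print(mean(with_extreme) > 95)           # True: the mean is pulled far upward
```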
quartiles
spread; the quartiles Q₁ and Q₃ mark the boundaries of the middle half of the data
Q₁
median of the observations below M in the ordered list; about ¼ of the observations fall below it (25th percentile)
Q₃
median of the observations above M in the ordered list; about ¾ of the observations fall below it (75th percentile)
IQR
IQR = interquartile range = Q₃ - Q₁ ; spread of the middle half of the data and used to test outliers
five-number summary
minimum, Q₁, median, Q₃, and maximum; used to describe center and spread of data and to construct box plots
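A sketch of the five-number summary using the quartile definition from this deck (Q₁ and Q₃ are the medians of the observations below and above M, with M itself left out when n is odd). The function name and data are hypothetical, and at least two observations are assumed. Software packages often use other quartile conventions, so their results can differ slightly.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # observations below M
    upper = xs[half + 1:] if n % 2 else xs[half:]   # observations above M
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


summary = five_number_summary([1, 3, 4, 6, 7, 8, 10])
print(summary)                   # (1, 3, 6, 8, 10)
print(summary[3] - summary[1])   # IQR = Q3 - Q1 = 5
```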
minimum
smallest observation (may or may not include outliers)
maximum
largest observation (may or may not include outliers)
boxplot
graph of the five number summary; box with lines marking the quartiles and median with “whiskers” extending from the quartiles to the min and max; used for side-by-side distribution comparison
modified boxplot
same as a normal boxplot, but outliers are marked as separate points and the whiskers extend to the most extreme observations that are not outliers
statistic
numerical value summarizing data for the SAMPLE
parameter
numerical value summarizing data for the entire POPULATION
standard deviation
spread; describes roughly the typical distance of observations from their mean; s for sample and σ for population; s = √variance
variance
mean of squared deviations; s² = [∑(x-x̄)²f] / (n-1) for a sample, σ² = [∑(x-μ)²f] / n for a population; f is the frequency of each value x
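A minimal sketch of variance and standard deviation, assuming each observation is listed individually (so every frequency f is 1). The function names, the `sample` flag, and the data are hypothetical. The sum of squared deviations is divided by n-1 for a sample and by n for a population.

```python
from math import sqrt


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n-1 (sample) or n."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)


def std_dev(values, sample=True):
    """Standard deviation is the square root of the variance."""
    return sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]        # mean 5, squared deviations sum to 32
print(variance(data, sample=False))    # 4.0  (32 / 8)
print(std_dev(data, sample=False))     # 2.0
print(variance(data))                  # 32 / 7, about 4.57
```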
percentile
position; kth percentile = Pₖ = the value at which at most k% of observations fall below it; percentile of a given score = (# of scores at or below the given score)/(total # of scores) × 100%; vertical axis of an ogive graph
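The "at or below" formula for a score's percentile can be sketched as follows. The function name `percentile_rank` and the scores are hypothetical. Some textbooks count only scores strictly below the given score, which gives a slightly smaller answer.

```python
def percentile_rank(scores, score):
    """Percent of scores that are at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical test scores.
scores = [55, 60, 70, 70, 80, 85, 90, 95, 98, 100]
print(percentile_rank(scores, 80))   # 50.0  (5 of 10 scores are at or below 80)
print(percentile_rank(scores, 100))  # 100.0
```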
ogive
graph of scores against cumulative percentile (cumulative relative frequency); make a cumulative histogram, then draw a line from left to right, starting at the lower left corner of the first bar and connecting the upper right corners of the bars
experiment
planned activity with an imposed treatment whose results yield a data set (without an imposed treatment, it's an observational study)
data
value of variable associated with one element of population or sample
exploratory data analysis
statistical tools and ideas used to examine data in order to describe their main features
mean absolute deviation (M.A.D.)
(∑|x-x̄|f) / n OR n-1 ; use n for population and n-1 for samples; gives average distance from mean but without direction (like standard deviation but using abs. value to get rid of direction instead of square)
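A sketch of the mean absolute deviation, assuming each observation is listed individually (f = 1). The function name, the `sample` flag, and the data are hypothetical. The divisor follows this card's convention of n for a population and n-1 for a sample.

```python
def mean_absolute_deviation(values, sample=False):
    """Average absolute distance from the mean; divisor n, or n-1 if sample."""
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(x - mean) for x in values)
    return total / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]        # mean 5, absolute deviations sum to 12
print(mean_absolute_deviation(data))   # 1.5  (12 / 8)
```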
degrees of freedom
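n-1; the number of deviations that are free to vary: because the deviations from the mean always sum to 0, the last (nth) deviation is determined by the other n-1; used to explain why we divide by n-1 instead of n for samples (dividing by the smaller number gives a slightly larger estimate of spread)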