trimmed mean
deleting percentage of smallest and largest values from data set and computing mean of the rest
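A minimal sketch of the trimmed mean, assuming the percentage is removed from each end of the sorted data and rounded down to a whole number of values. The function name and sample values are made up for illustration.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each end of the sorted data."""
    data = sorted(values)
    k = int(len(data) * percent / 100)  # number of values removed from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Illustrative data: the extreme values 1 and 100 are dropped by a 10% trim.
sample = [1, 20, 22, 23, 25, 26, 27, 28, 30, 100]
print(trimmed_mean(sample, 10))
```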
bimodal
data set with 2 modes
multimodal
data set with more than 2 modes
weighted mean
(sum of [weight of obs i × value of obs i]) / (sum of the weights)
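A short sketch of the weighted mean as a sum of weight × value divided by the sum of the weights. The scores and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical example: a score of 90 counts three times as much as a score of 80.
scores = [80, 90]
weights = [1, 3]
print(weighted_mean(scores, weights))
```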
geometric mean definition
used in analyzing growth rates in financial data; applied anytime want to determine mean rate of change over successive periods
geometric mean formula
xg = nth root of [(x1)(x2)…(xn)]
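A small sketch of the geometric mean applied to growth factors (1 + rate). The three yearly factors are invented for illustration.

```python
import math

def geometric_mean(factors):
    """nth root of the product of n positive values."""
    return math.prod(factors) ** (1 / len(factors))

# Hypothetical growth factors: +10%, +20%, -10% over three periods.
growth_factors = [1.10, 1.20, 0.90]
gm = geometric_mean(growth_factors)
print(gm)  # mean growth factor per period; subtract 1 for the mean rate
```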
percentile definition
at least p% of items take on this value or less, at least (100-p)% of items take on this value or more
location of pth percentile
Lp = (p/100)(n+1)
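A sketch of locating a percentile, assuming the common textbook rule Lp = (p/100)(n+1) with linear interpolation between neighboring sorted values. Other software uses different location rules, so results can differ slightly. The function name and data are illustrative.

```python
def percentile(values, p):
    """pth percentile using the 1-based location Lp = (p/100)(n+1)."""
    data = sorted(values)
    n = len(data)
    loc = (p / 100) * (n + 1)
    loc = min(max(loc, 1), n)      # keep the location inside the data
    lower = int(loc)               # whole-number part of the location
    frac = loc - lower             # fractional part used for interpolation
    if lower >= n:
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

data = [10, 20, 30, 40, 50, 60, 70]  # made-up values
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```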
1st quartile
25th percentile
2nd quartile
50th percentile
3rd quartile
75th percentile
measures of variability
range, interquartile range, variance, standard deviation
interquartile range formula
IQR = Q3 - Q1
variance definition
measure of variability based on the squared deviations of each xi from the mean
sample variance formula
s² = (sum of [xi - x̄]²) / (n-1)
population variance formula
σ² = (sum of [xi - μ]²) / N
standard deviation definition
measures the dispersion of a dataset relative to its mean
standard deviation formula
square root of the variance (s for sample, σ for population)
coefficient of variation definition
indicates how large the standard deviation is in relation to the mean
coefficient of variation formula
CV = (standard deviation / mean) × 100%
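A quick sketch of variance, standard deviation, and coefficient of variation using Python's standard `statistics` module. The data values are made up, and the coefficient of variation is expressed as a percentage here.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

sample_var = statistics.variance(data)   # divides by n - 1
pop_var = statistics.pvariance(data)     # divides by N
sample_sd = statistics.stdev(data)       # square root of the sample variance
pop_sd = statistics.pstdev(data)         # square root of the population variance
cv_percent = pop_sd / statistics.mean(data) * 100

print(sample_var, pop_var, sample_sd, pop_sd, cv_percent)
```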
left skewness
skewness is negative, mean < median, tail to the left
right skewness
skewness is positive, mean > median, tail is to the right
highly skewed right
skewness is often above 1.0
z-scores definition
number of standard deviations a data value is from the mean
z-score formula
zi = (xi - x̄) / s
negative z-score
data value < sample mean
positive z-score
data value > sample mean
Chebyshev’s Theorem
at least 1 - (1/z²) of items will be within z standard deviations of the mean, for any z > 1
z=2
at least 75% data points
z=3
at least 89% data points
z=4
at least 94% data points
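The three values above follow directly from 1 - (1/z²). A minimal sketch, with a hypothetical function name:

```python
def chebyshev_bound(z):
    """Minimum proportion of items within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("the bound is only informative for z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    print(z, chebyshev_bound(z))
```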
empirical rule definition
when data approximates a bell-shaped distribution, can determine % of values within specified number of standard deviations
empirical rule ± 1 standard deviation
68%
empirical rule ± 2 standard deviations
95%
empirical rule ± 3 standard deviations
99.7%
outlier characteristics
z-score below -3 or above +3; may be a value incorrectly recorded, a value incorrectly included in the data set, or just an unusual but valid value
boxplot requirements
smallest value, first quartile, median, third quartile, largest value
lower limit formula
lower limit = Q1 - 1.5(IQR)
upper limit
upper limit = Q3 + 1.5(IQR)
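A sketch of flagging outliers with the boxplot limits, taking lower limit = Q1 - 1.5(IQR) and upper limit = Q3 + 1.5(IQR). It assumes quartiles from `statistics.quantiles` with its default "exclusive" method; other quartile conventions can give slightly different limits. The data are invented.

```python
import statistics

def outlier_limits(values):
    """Return (lower limit, upper limit) based on 1.5 * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [10, 20, 30, 40, 50, 60, 70, 200]  # made-up values
lower, upper = outlier_limits(data)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```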
covariance definition
linear association between two variables
sample covariance formula
sxy = (sum of [(xi - x̄)(yi - ȳ)]) / (n-1)
population covariance formula
σxy = (sum of [(xi - μx)(yi - μy)]) / N
correlation coefficient definition
linear association, not necessarily causation, can take on values between -1 and 1
sample correlation coefficient formula
rxy = (sxy) / [(sx)(sy)]
population correlation coefficient formula
ρxy = (σxy) / [(σx)(σy)]
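A minimal sketch of the sample covariance and sample correlation coefficient, written out by hand so each step is visible. The function names and data are illustrative.

```python
import math

def sample_covariance(xs, ys):
    """Sum of (xi - x_bar)(yi - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # perfectly linear in xs, so the correlation is close to 1
print(sample_covariance(xs, ys), sample_correlation(xs, ys))
```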
strong negative correlation
values close to -1
strong positive correlation
values close to +1
data dashboard characteristics
graphical and numerical
drilling down (data dashboards)
functionality in interactive dashboards that allows the user to access information at a more detailed level
probability definition
likelihood an event will occur, scale from 0-1
stats experiment
generates a well-defined outcome
sample space
set of all experimental outcomes
sample point
experimental outcome
coin toss experiment
the actual toss
coin toss sample space
heads, tails
counting rule for multiple step experiment definition
sequence of k steps in which there are n1 possible results for first step, n2 for second step…
total number of experimental outcomes formula
(n1)(n2)…(nk)
combinations formula
C(N,n) = N! / [n!(N-n)!]
permutations formula
P(N,n) = n! × C(N,n) = N! / (N-n)!
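The counting rules can be checked with Python's `math` module (3.8 or later). The two-step experiment below, a coin toss followed by a die roll, is a made-up example.

```python
import math

# Multiple-step experiment: 2 results for the coin, 6 for the die.
total_outcomes = math.prod([2, 6])

# Choosing n = 2 objects from N = 5.
combinations = math.comb(5, 2)   # order does not matter: N! / [n!(N-n)!]
permutations = math.perm(5, 2)   # order matters: N! / (N-n)!

print(total_outcomes, combinations, permutations)
```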
event
collection of sample points
probability
sum of probabilities of sample points in the event
complement of event definition
event consisting of all sample points NOT in A; P(A) = 1 - P(Ac)
Union of two events definition
event containing all sample points IN A OR B OR BOTH; A U B
intersection of two events definition
set of all sample points in BOTH A AND B
mutually exclusive events definition
events have no sample points in common, when one event occurs the other can’t occur
conditional probability definition
probability of event given that another event has occurred
(independent) P(A and B)=
P(A)P(B)
(dependent) P(A and B)=
P(A)P(B|A)
independent events definition
if probability of A is not changed by the occurrence of B; P(A|B) = P(A)
if one mutually exclusive event is known to occur,
the other event’s probability is 0
2 events that are not mutually exclusive,
might or might not be independent
Bayes’ Theorem
prior probability, likelihood of the evidence, and posterior probability
Bayes’ Theorem formula
P(A|B) = [P(B|A)P(A)] / P(B)
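A small sketch of the formula, assuming P(B) is not given directly and is computed from the law of total probability, P(B) = P(B|A)P(A) + P(B|Ac)P(Ac). The probabilities are hypothetical.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) from the prior P(A) and the two conditional probabilities of B."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)  # total probability of B
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.90, P(B|Ac) = 0.05.
print(posterior(0.01, 0.90, 0.05))
```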
probability distribution definition
describes how probabilities are distributed over values of random variables
discrete uniform probability distribution
f(x) = 1/n
discrete uniform expected value
E(x) = μ = sum of [x f(x)]
discrete uniform variance
Var(x) = σ² = sum of [(x-μ)² f(x)]
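A sketch of the expected value and variance as sums over all values of x, using a fair six-sided die (an illustrative choice) as the discrete uniform distribution with f(x) = 1/6.

```python
def expected_value(dist):
    """Sum of x * f(x) over all values x."""
    return sum(x * f for x, f in dist.items())

def variance(dist):
    """Sum of (x - mu)^2 * f(x) over all values x."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * f for x, f in dist.items())

die = {x: 1 / 6 for x in range(1, 7)}  # f(x) = 1/n with n = 6
print(expected_value(die), variance(die))
```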
bivariate distribution definition
2 random variables being looked at, interested in relationship between them
binomial probability distribution characteristics
experiment with n identical trials, 2 outcomes success and failure on each, prob of success is p, does not change from trial to trial, trials are independent
binomial prob dist formula
f(x) = (n!) / [x!(n-x)!] p^x (1-p)^(n-x)
binomial prob dist variance formula
var(x) = np(1-p)
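A minimal sketch of the binomial probability function and variance. The function names are illustrative, and the example uses n = 3 trials with p = 0.5.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_variance(n, p):
    return n * p * (1 - p)

print(binomial_pmf(1, 3, 0.5), binomial_variance(3, 0.5))
```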
poisson probability distribution characteristics
prob of occurrence is same for any 2 intervals of equal length, occurrence or non in interval is independent of occurrence in any other interval, number of occurrences per interval
poisson formula
f(x) = [(μ^x)(e^-μ)] / x!
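A short sketch of the Poisson probability function, where `mu` stands for the mean number of occurrences per interval. The value mu = 2 is made up.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences in an interval with mean mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

print(poisson_pmf(0, 2), poisson_pmf(2, 2))
```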
hypergeometric prob dist
trials not independent, prob of success does change from trial to trial
uniform prob dist expected value
E(x) = (a+b)/2
uniform prob dist variance
Var(x) = (b-a)² / 12
normal prob dist characteristics
skewness is 0, mean median mode same
z-score formula
z = (x - μ) / σ
exponential prob dist characteristics
time takes to complete a task, mean and standard dev are same, length of interval between occurrences
exponential prob formula
P(x ≤ x0) = 1 - e^(-x0/μ)
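A sketch of the exponential cumulative probability in the form P(x ≤ x0) = 1 - e^(-x0/μ), where `mu` is the mean. The 15-minute mean is a hypothetical value.

```python
import math

def exponential_cdf(x0, mu):
    """Probability that the exponential random variable is at most x0."""
    return 1 - math.exp(-x0 / mu)

# Hypothetical task with a mean completion time of 15 minutes.
print(exponential_cdf(15, 15))  # chance of finishing within the mean time
```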