guess who prioritized another AP class and forgot to study for stats lol gl to you all too!
variable
characteristics about each case
case
individual we collect data from
statistic
value calculated from a sample
parameter
value calculated from a population
variance
a measure of spread, standard deviation squared
when data is unimodal symmetric
use mean and standard deviation
when data is skewed
use median and iqr
outlier rule
data values below q1 - 1.5(iqr) and above q3 + 1.5(iqr) are outliers
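A quick Python sketch of the 1.5·IQR fences (the data values here are made up for illustration; `statistics.quantiles` uses one common quartile convention, so a calculator may give slightly different fences):

```python
import statistics

def outlier_fences(data):
    # Quartiles via linear interpolation; IQR = Q3 - Q1.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Fences: below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is an outlier.
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 3, 4, 5, 5, 6, 7, 8, 30]
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]  # only 30 is flagged
```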
cumulative frequency graph
y-axis is always 0 - 100%, slope is never negative
z-score
[x - (x-bar)] / s
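A minimal sketch of that z-score formula in Python, with made-up data:

```python
import statistics

data = [10, 12, 14, 16, 18]
xbar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation
z = (16 - xbar) / s            # how many SDs the value 16 sits above the mean
```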
shifting data
(adding or subtracting) affects measures of center only
rescaling data
(multiplying or dividing) affects measures of center and spread
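The shift/rescale facts are easy to check numerically; this sketch (with made-up data) shows that adding a constant moves the mean but not the SD, while multiplying changes both:

```python
import statistics

data = [4, 8, 6, 2, 10]
m, s = statistics.mean(data), statistics.stdev(data)

shifted = [x + 5 for x in data]   # shift: center moves by +5, spread unchanged
scaled = [x * 3 for x in data]    # rescale: center and spread both multiply by 3
```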
describing distributions
the distribution of [context] is [shape], centered at a mean/median of [center] and spread out with a standard deviation/iqr of [spread]. [outliers]
explanatory
x, independent
values being changed in an experiment (factors)
response
y, dependent
values being measured in an experiment
describing associations
there is a [direction], [strength], [form] association between [variable 1] and [variable 2]
correlation constant (r)
affected by outliers, unaffected by shifting and rescaling
measures strength of the linear association
slope interpretation
for each additional [explanatory], the predicted [response] increases/decreases by [slope]
intercept interpretation
the model predicts that a(n) [explanatory] of 0 [x-units] will have a [response] of [y-intercept]
this is/is not significant…
R^2 interpretation
[R^2 %] of the variability in [response] is accounted for by the linear model using [explanatory]
residual
y - (y-hat)
“actual minus predicted”
leverage
a point far away from the other points horizontally that has great influence on the linear model and r
influential point
changes the slope significantly
bowl of soup
the size of the sample doesn't matter compared to the quality of the sample (bigger does not equal better)
census
sampling the whole population
sampling frame
the list you draw your sample from
simple random sample (srs)
every possible group of n from a population has an equal chance of being sampled
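A one-line sketch of an SRS in Python (population and sample size are made up): `random.sample` draws without replacement, so every group of n individuals is equally likely.

```python
import random

population = list(range(1, 101))   # 100 individuals, numbered 1-100
srs = random.sample(population, 10)  # an SRS of size n = 10
```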
stratified sample
divide the population into groups of individuals, called strata, that are similar in some way and do an srs in each to form a full sample
cluster sample
divide the population into clusters, randomly select some clusters, and sample all individuals within the selected ones
systematic sample
sample every nth person, randomly determining where to start
multistage sample
a combination of two or more sampling methods
voluntary response bias
people choose whether or not to respond, typically those with stronger opinions
response bias
leads people to respond a certain way
undercoverage
some groups of the population are left out of the sample
nonresponse bias
an individual chosen for the sample cannot be contacted or does not cooperate
convenience sample
sampling method where you choose the individuals who are easiest to reach; not random, so it introduces bias
retrospective observational study
past data
prospective observational study
looking at data as it happens
treatment
made up of a combination of factors, assigned randomly
statistically significant
when the observed difference is so great that it is unlikely to have occurred by chance/randomization alone
blocking
reduces variability so that differences we see can be attributed to the treatments imposed
control group
provides a baseline (“basis for comparison”)
placebo effect
subjects can respond to something that doesn't exist
blinding
single: subjects don't know
double: subjects and researchers don’t know
confounding variable
another variable other than the factors that affects the response variable
lurking variables
affect both the explanatory and response variables
matched pairs design
one subject gets both treatments or use naturally paired subjects
replication
there needs to be more than one subject in each treatment group; if not, replicate the experiment again
designing an experiment: introduction
the factors/response variable/treatments are…
[treatment 1] is the control group, which provides a baseline for comparison so we can see if there is an actual difference in [response variable] for subjects who [explanatory]
designing an experiment: randomization
we will randomly assign the n subjects to the treatments
assign them a number, #1-n and use a random number generator
the first [# of subjects] unique numbers are group 1
the next [# of subjects] unique numbers are group 2/3/4..
the remaining [# of subjects] are the last group
repeat this process for the [other] block
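The random-assignment recipe above can be sketched in Python (n = 30 subjects and 3 treatment groups are made-up numbers; shuffling the numbered list is equivalent to drawing unique random numbers):

```python
import random

n, n_groups = 30, 3
subjects = list(range(1, n + 1))   # number the subjects #1-n
random.shuffle(subjects)           # random order stands in for a random number generator
size = n // n_groups               # equal group sizes
groups = [subjects[i * size:(i + 1) * size] for i in range(n_groups)]
# groups[0] is treatment group 1, groups[1] is group 2, and so on
```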
designing an experiment: blocking
we block this experiment by [blocking variable] because we want to reduce variability so that we can attribute the differences we see to the treatments being imposed
designing an experiment: blinding
we will (double) blind this experiment
the subjects won’t know what treatment they get [context] {and/or} the researcher won’t know who got what treatment
we blind the subjects to nullify the placebo effect, because if the subjects knew what treatment they were getting, the result could be from knowing instead of [explanatory] {and/or} we blind the researchers to keep them honest
sample space
set of all possible outcomes (S = {1, 2, 3…})
mutually exclusive/disjoint
events don’t share any outcomes in common
addition rule
for mutually exclusive: P(AuB) = P(A) + P(B)
in general: P(AuB) = P(A) + P(B) - P(AnB)
union (u)
either event A, B, or both occurring
intersection (n)
event A and B happening
multiplication rule
for independent events, multiply their probabilities: P(AnB) = P(A) * P(B); in general, P(AnB) = P(A) * P(B|A)
independent events
events A and B are independent if P(B) = [P(B|A)]
conditional probability
P(A|B) = P(AnB) / P(B)
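A sketch of the conditional probability formula by counting outcomes of one die roll (the events A and B are made up for illustration):

```python
# Sample space for one die; A = "roll is even", B = "roll > 3".
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}
B = {x for x in S if x > 3}

p_b = len(B) / len(S)            # P(B) = 3/6
p_a_and_b = len(A & B) / len(S)  # A n B = {4, 6}, so 2/6
p_a_given_b = p_a_and_b / p_b    # P(A|B) = (2/6) / (3/6) = 2/3
```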
at least one probability
P(at least one) = 1 - P(none)
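A worked sketch of the complement shortcut, with a made-up example (at least one six in 4 independent die rolls):

```python
# P(no sixes in 4 rolls) = (5/6)^4, so
# P(at least one six) = 1 - (5/6)^4, roughly 0.518.
p_none = (5 / 6) ** 4
p_at_least_one = 1 - p_none
```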
law of averages
is a scam, do not believe in it or its lies
probabilities do not change the more times one event occurs, nor do they do so to balance things out :c
law of large numbers
as we repeat a random process over and over again, the true probability will emerge
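A quick simulation sketch of the law of large numbers (fair-coin probability 0.5 and the flip count are made up): the running proportion of heads settles near the true probability.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible
flips = [random.random() < 0.5 for _ in range(100_000)]
prop_heads = sum(flips) / len(flips)  # close to 0.5 after many flips
```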
random variable (X)
a number based on a random event
expected value of a random variable
E(X) = Σ(x * P(x))
sum of value of that outcome times the probability it will occur for every outcome
standard deviation of a random variable
SD(X) = sqrt {Σ[(x - E(X))^2 * P(x)]}
the square root of the sum of [the value of the outcome minus E(X)] squared times the probability of that event occurring, for every outcome
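A sketch of both formulas on a made-up distribution (X = winnings with P(0) = 0.5, P(1) = 0.3, P(5) = 0.2):

```python
import math

dist = {0: 0.5, 1: 0.3, 5: 0.2}  # value -> probability

# E(X) = sum of x * P(x)
ev = sum(x * p for x, p in dist.items())  # 1.3

# SD(X) = sqrt of sum of (x - E(X))^2 * P(x)
sd = math.sqrt(sum((x - ev) ** 2 * p for x, p in dist.items()))  # 1.9
```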
adding two random variables
E(X + Y) = E(X) + E(Y)
SD(X + Y) = sqrt [SD(X)^2 + SD(Y)^2]
subtracting two random variables
E(X - Y) = E(X) - E(Y)
SD(X - Y) = sqrt [SD(X)^2 + SD(Y)^2]
summing a series
E(X1 + X2 + … + Xn) = n * E(X)
SD(X1 + X2 + … + Xn) = sqrt [n * SD(X)^2] = SD(X) * sqrt(n)
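The key point above is that variances add (even when subtracting), so SDs combine through squares. A sketch with made-up SDs:

```python
import math

sd_x, sd_y, n = 3.0, 4.0, 9

# SD(X + Y) and SD(X - Y) are both sqrt(SD(X)^2 + SD(Y)^2), for independent X, Y.
sd_combined = math.sqrt(sd_x ** 2 + sd_y ** 2)  # 5.0 either way

# Sum of n independent copies: SD = sqrt(n * SD(X)^2) = SD(X) * sqrt(n).
sd_series = math.sqrt(n) * sd_x                 # 9.0
```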
Bernoulli trials
only two possible outcomes, probability of success (p) is the same for every trial, trials are independent of each other
geometric probability model
about getting to the first success
P(X=x) = (1-p)^(x-1) * p
probability that the first success is on the nth trial
P(X=n) = (1-p)^(n-1) * p
geometric pdf (p, n)
probability that the first success occurs within the first n trials
P(X=1) + P(X=2) + … + P(X=n)
geometric cdf (p, n)
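A sketch of geometpdf/geometcdf computed by hand (p = 0.25 and n = 3 are made up): the cdf is the sum of the pdf terms, which also equals 1 - (1-p)^n.

```python
p, n = 0.25, 3

# geometpdf: P(first success exactly on trial n) = (1-p)^(n-1) * p
pdf = (1 - p) ** (n - 1) * p

# geometcdf: P(first success within n trials) = sum of the pdf terms
cdf = sum((1 - p) ** (k - 1) * p for k in range(1, n + 1))
```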
binomial probability model
fixed number of trials
P(X=x) = (n choose x) * p^x * (1 - p)^(n-x)
n = number of trials
probability of exactly x successes in n trials
P(X=x) = (n choose x) * p^x * (1 - p)^(n-x)
binom pdf (n, p, x)
probability of fewer than x successes in n trials
binom cdf (n, p, [x-1])
x is not included
probability of more than x successes in n trials
1 - binom cdf (n, p, x)
x is not included
probability of at least x successes in n trials
1 - binom cdf (n, p, [x-1])
x is included
probability of at most x successes in n trials
binom cdf (n, p, x)
x is included
probability of between x1 and x2 successes (inclusive) in n trials
binom cdf (n, p, x2) - binom cdf (n, p, [x1-1])
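The binompdf/binomcdf recipes above can be sketched in plain Python with `math.comb` (n = 10 and p = 0.3 are made-up numbers):

```python
import math

def binom_pdf(n, p, x):
    # P(X = x) = (n choose x) * p^x * (1-p)^(n-x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    # P(X <= x): "at most x" successes
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

n, p = 10, 0.3
exactly_3 = binom_pdf(n, p, 3)
at_most_3 = binom_cdf(n, p, 3)                       # x included
at_least_3 = 1 - binom_cdf(n, p, 2)                  # x included
between_2_and_4 = binom_cdf(n, p, 4) - binom_cdf(n, p, 1)  # inclusive
```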