p
population proportion; a fixed constant (parameter), usually unknown
p̂ (p hat)
sample proportion (quantitative), random variable
descriptive statistics
organizing and summarizing data
inferential statistics
formal methods for drawing conclusions and making generalizations from good data
case
individual or object for which we obtain information
variable
characteristic recorded for each case
stacked data format
rows represent cases and columns represent variables
unstacked data format
columns represent variables but rows don't represent cases
ordinal categorical
categories can be ordered or ranked
nominal categorical
categories can't be ordered or ranked
continuous quantitative
value to be measured
discrete quantitative
value to be counted
frequency table
frequency or count for each category of a categorical variable
relative frequency
percentage or proportion for each category of variable
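A minimal sketch of a frequency table and relative frequencies, assuming a small made-up list of category labels (the data and the variable names are illustrative, not from the course):

```python
from collections import Counter

# Hypothetical categorical data: one recorded value per case.
colors = ["red", "blue", "red", "green", "blue", "red"]

# Frequency table: the count for each category.
freq = Counter(colors)

# Relative frequency: each count divided by the total number of cases.
n = len(colors)
rel_freq = {category: count / n for category, count in freq.items()}

print(freq)
print(rel_freq)
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the table.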
two-way table
used to show relationship between 2 categorical variables
levels
possible values of an explanatory variable
associated
if values of 1 variable tend to be related to values of other variable
causally associated
if changing the value of explanatory variable influences response variable value
lurking variable
variable with important effect on relationship among study variables, but isn't 1 of the explanatory variables studied
observational studies
study in which the researcher doesn't actively control the value of any variable and only observes values as they exist (can't control for lurking variables or be used to establish causation)
experimental study
study in which the researcher actively controls value of one or more explanatory variables
randomized experimental study
study in which explanatory variable for each unit is determined randomly (can be used to establish causation)
single-blind
subjects don't know which group they're in
experimenter bias
distortion that can arise on the part of the experimenter due to how the subjects are assigned to groups
double-blind
neither subjects nor researchers know which group each participant has been assigned to
distribution
pattern of variability, provides possible values variable can take on and how often these possible values occur
histogram
visualization of distribution of quantitative variable in which data are binned into discrete groups
skewed right
data piled on left and tail extends to the right
skewed left
data piled on right and tail extends to the left
bar chart
1 bar for each category of a categorical variable (bars don't touch)
categorical variability
diversity of data values (many categories = high variability, few categories = low variability)
measures of center
mean, median, and mode
order statistics
ranked order
resistant measure
measure relatively unaffected by outliers (e.g., median and mode; the mean is not resistant)
measures of variability
numerical values that describe how spread out data are (variance, standard deviation, range, and IQR); the deviations from the mean always sum to 0, so they are squared or taken in absolute value before averaging
mean absolute deviation
average of the absolute values of the deviations of the data values from the mean
z-score
measures how many standard deviations a value lies above/below the mean
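A short sketch of deviations, mean absolute deviation, and a z-score on a made-up sample. The data and the helper name `z_score` are assumptions for illustration, and the sample standard deviation (dividing by n - 1) is used here:

```python
import statistics

data = [2, 4, 4, 6, 9]          # hypothetical sample
mean = statistics.mean(data)    # 5

# Deviations from the mean always sum to 0.
deviations = [x - mean for x in data]

# Mean absolute deviation: average of the absolute deviations.
mad = sum(abs(d) for d in deviations) / len(data)

# Sample standard deviation (divides by n - 1).
s = statistics.stdev(data)

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(9, mean, s)
print(sum(deviations), mad, round(z, 2))
```

A positive z-score means the value is above the mean; a negative one means it is below.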
percentile
measure indicating the value below which a given percentage of observations in a dataset falls
5 number summary
minimum, Q1, Q2/median, Q3, maximum
range
difference between maximum and minimum
IQR
difference between Q3 and Q1 (Q3 - Q1)
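A sketch of the 5 number summary, range, and IQR for a made-up dataset. Quartile conventions differ between textbooks and software, so the quartiles from Python's `statistics.quantiles` (default "exclusive" method) may not match a hand calculation done with a different rule:

```python
import statistics

data = sorted([7, 1, 13, 5, 9, 3, 11])   # hypothetical data, in ranked order

# Quartiles: cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"

five_number_summary = (min(data), q1, q2, q3, max(data))
data_range = max(data) - min(data)   # maximum - minimum
iqr = q3 - q1                        # spread of the middle 50% of the data

print(five_number_summary, data_range, iqr)
```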
residual
difference between observed response (y) and predicted response (y hat)
least squares regression line/line of best fit
ŷ = a + bx
extrapolation
using a regression line to predict y for a value of x that is outside the range of data used to determine the regression line
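A minimal sketch of fitting ŷ = a + bx by least squares, computing residuals, and flagging extrapolation. The data and the helper names `predict` and `is_extrapolation` are made up for illustration; the slope and intercept use the standard least squares formulas:

```python
x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical observed responses

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope b and intercept a.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

def predict(x_new):
    """Predicted response y hat for a given x."""
    return a + b * x_new

# Residual = observed y - predicted y hat.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

def is_extrapolation(x_new):
    """True if x_new lies outside the range of x used to fit the line."""
    return x_new < min(x) or x_new > max(x)

print(a, b, residuals, is_extrapolation(10))
```

For a least squares line with an intercept, the residuals sum to 0 (up to rounding).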
theoretical probability
relative frequency of the event if the process were repeated infinitely many times
empirical probability
relative frequency of the event based on an experiment or real-life process (or simulation in some cases)
sample space
collection/set of all possible outcomes of an experiment
event
specific collection/set of outcomes, subset of sample space
mutually exclusive/disjoint
A ∩ B = ∅ (the events share no outcomes), so P(A ∩ B) = 0
random variable
numerical quantity that changes trial to trial in a random process
parameter
numerical value that describes a population
statistic
numerical value that describes a sample
standard error
standard deviation of sampling distribution, measures how much statistic varies between samples
CLT
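Central Limit Theorem: for large enough samples, the sampling distribution of a sample mean or proportion is approximately normal, which gives a good approximation of the sampling distribution without needing simulations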
confidence interval (parameter)
captures parameter for specified proportion of all samples
null hypothesis
claim that there's no effect/difference
alternative hypothesis
claim for which we seek evidence
reject H0
p-value < α
fail to reject H0
p-value >= α
What are the steps of a hypothesis test?
State hypotheses and define parameters, state significance level, check assumptions, calculate sample statistic, calculate standardized test statistic, find p-value, make formal decision, and write conclusion.
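The steps above can be sketched for one specific case. This assumes a two-sided one-proportion z-test with made-up numbers (60 successes in 100 trials, H0: p = 0.5); the "at least 10 expected successes and failures" check is one common rule of thumb, and other courses use a different cutoff:

```python
from math import sqrt
from statistics import NormalDist

# Steps 1-2: hypotheses, parameter, and significance level (all hypothetical).
p0 = 0.5        # H0: p = 0.5, HA: p != 0.5, where p is the population proportion
alpha = 0.05

# Step 3: check the large-sample condition for the normal approximation.
n, successes = 100, 60
assumptions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Step 4: sample statistic.
p_hat = successes / n

# Step 5: standardized test statistic, using the standard error under H0.
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Step 6: two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 7: formal decision.
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# Step 8: the written conclusion should restate the decision in context.
print(assumptions_ok, round(z, 2), round(p_value, 4), decision)
```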
Population (N)
The entire group of individuals or instances about whom we hope to learn.
Sample (n)
A subset of a population, examined in hope of learning about the population.
Sample mean (x̄)
The arithmetic average of the values in a sample.
Population mean (μ)
The arithmetic average of all values in a population.
Sample variance (s²)
A measure of how spread out the data in a sample is from the sample mean.
Population variance (σ²)
A measure of how spread out the data in a population is from the population mean.
Sample standard deviation (s)
The square root of the sample variance, representing roughly the typical distance of sample data points from the sample mean.
Population standard deviation (σ)
The square root of the population variance, representing roughly the typical distance of population data points from the population mean.
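A brief sketch of the difference between the sample and population versions, using a made-up dataset. The sample formulas divide the sum of squared deviations by n - 1, while the population formulas divide by N; Python's `statistics` module provides both:

```python
import statistics

data = [2, 4, 4, 6, 9]   # hypothetical values

# Treating the data as a sample: divide by n - 1.
s2 = statistics.variance(data)   # sample variance
s = statistics.stdev(data)       # sample standard deviation

# Treating the data as the whole population: divide by N.
sigma2 = statistics.pvariance(data)   # population variance
sigma = statistics.pstdev(data)       # population standard deviation

print(s2, s, sigma2, sigma)
```

For the same data, the sample variance is always at least as large as the population variance because of the smaller divisor.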
What is the regression equation ŷ = a + bx?
A linear model where ŷ is the predicted response, a is the y-intercept, b is the slope, and x is the observed explanatory variable.
Additive Rule of Probability
P(A ∪ B) = P(A) + P(B) - P(A ∩ B), used to find the probability of the union of two events.
Complement Rule
P(Aᶜ) = 1 - P(A), used to find the probability of an event not occurring.
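A small check of the additive and complement rules, assuming one roll of a fair die with equally likely outcomes (the events A, B, C and the helper `prob` are made up for illustration):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # event: roll is even
B = {5, 6}                          # event: roll is greater than 4
C = {1}                             # event: roll is 1

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Additive rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
union_direct = prob(A | B)
union_by_rule = prob(A) + prob(B) - prob(A & B)

# Complement rule: P(not A) = 1 - P(A).
complement = 1 - prob(A)

# A and C are mutually exclusive: they share no outcomes.
disjoint = (A & C) == set()

print(union_direct, union_by_rule, complement, disjoint)
```

When two events are mutually exclusive, P(A ∩ B) = 0 and the additive rule reduces to P(A) + P(B).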
Null Hypothesis (H0)
The statement of no effect or no difference, which is assumed to be true until evidence suggests otherwise.
Alternative Hypothesis (HA)
The statement that there is an effect, a difference, or a relationship in the population.
What is the Interquartile Range (IQR)?
The difference between the third quartile (Q3) and the first quartile (Q1), representing the middle 50% of the data.
What is a p-value?
The probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true.
What is a confidence interval?
A range of values calculated from sample data that is likely to contain the true population parameter with a specified level of confidence.
What is the Margin of Error?
The amount added to and subtracted from the sample statistic to form a confidence interval; it accounts for sampling variability.
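A hedged sketch of a confidence interval and its margin of error, assuming a 95% normal-approximation interval for a proportion with made-up numbers (p̂ = 0.60 from n = 100). Other interval methods exist; this is only one common form, statistic ± margin of error:

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.60, 100   # hypothetical sample proportion and sample size
confidence = 0.95

# Critical value z* for the chosen confidence level (about 1.96 for 95%).
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of the sample proportion.
se = sqrt(p_hat * (1 - p_hat) / n)

# Margin of error and the resulting interval.
margin_of_error = z_star * se
interval = (p_hat - margin_of_error, p_hat + margin_of_error)

print(round(margin_of_error, 3), interval)
```

A larger sample size shrinks the standard error and therefore the margin of error.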
What is the Binomial Coefficient?
A formula, denoted nCk or (n choose k), used to calculate the number of ways to choose which k of n trials are successes: n! / (k!(n - k)!).
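A quick illustration with made-up values n = 5 and k = 2, comparing Python's built-in `math.comb` with the factorial formula n! / (k!(n - k)!):

```python
from math import comb, factorial

n, k = 5, 2   # hypothetical: 2 successes in 5 trials

ways = comb(n, k)                                               # built-in
by_formula = factorial(n) // (factorial(k) * factorial(n - k))  # n! / (k!(n-k)!)

print(ways, by_formula)   # both are 10
```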
What is the Central Limit Theorem (CLT) for proportions?
A theorem stating that the sampling distribution of a sample proportion will be approximately normal if the sample size is large enough.
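A simulation sketch of this idea, assuming made-up values p = 0.3 and n = 100 and a fixed random seed. It draws many samples, records each sample proportion, and compares the spread of those proportions with the standard error formula sqrt(p(1 - p) / n):

```python
import random
import statistics
from math import sqrt

random.seed(0)                       # fixed seed so the run is repeatable
p, n, num_samples = 0.3, 100, 2000   # hypothetical settings

def sample_proportion(p, n):
    """Proportion of successes in one random sample of size n."""
    return sum(random.random() < p for _ in range(n)) / n

# Simulated sampling distribution of the sample proportion.
p_hats = [sample_proportion(p, n) for _ in range(num_samples)]

simulated_mean = statistics.mean(p_hats)   # should be close to p
simulated_se = statistics.stdev(p_hats)    # should be close to the formula
theoretical_se = sqrt(p * (1 - p) / n)

print(simulated_mean, simulated_se, theoretical_se)
```

A histogram of `p_hats` would look roughly bell-shaped and centered near p, which is what the theorem predicts for a large enough sample size.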