Post-midterm, then pre-midterm
normal distribution
values cluster around the average; the curve is bell-shaped and symmetrical
standard normal distribution
a special, simplified version of the normal distribution
has a mean of 0 and a standard deviation of 1
any normal distribution can be converted into this standard form, which makes it easy to compare and look up probabilities using tables
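A minimal Python sketch of that conversion, with made-up numbers (SciPy's `norm` plays the role of the lookup table):

```python
from scipy.stats import norm

x, mu, sigma = 82.0, 75.0, 5.0    # hypothetical score, population mean and SD
z = (x - mu) / sigma              # convert to the standard normal scale
print(z)                          # 1.4 SDs above the mean
print(norm.cdf(z))                # P(Z <= 1.4), i.e. the table lookup
```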
normal distribution of sample means
the concept that if you take many, many samples of the same sample size from a population and calculate the mean for each sample, plotting all those sample means produces a normal distribution, even if the original population was not normal
the sampling distribution of the mean will be normally distributed, provided the sample size is large enough
central limit theorem (CLT)
if you take large enough random samples from any population, the distribution of the sample means will be approximately normal
lets statisticians use the powerful tools of the normal distribution to make inferences about any population
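A small simulation (made-up parameters, assuming NumPy) illustrating the two cards above: sample means drawn from a clearly non-normal population still pile up in a bell shape:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # skewed, not normal

# Draw 5,000 samples of size n=50 and record each sample mean.
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

# A histogram of sample_means is approximately bell-shaped and centered
# near the population mean (2.0), despite the skewed population.
print(np.mean(sample_means), np.std(sample_means))
```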
binomial distribution
a specific type of discrete probability distribution that models the probability of observing a certain number of “successes” in a fixed number of trials
discrete: the graph consists of separate bars, not a continuous curve
its shape depends on the number of trials and the probability of success
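For instance (a made-up coin-flip setup, assuming SciPy):

```python
from scipy.stats import binom

# P(exactly 7 successes in 10 trials, success probability 0.5)
print(binom.pmf(7, 10, 0.5))
# P(at most 7 successes)
print(binom.cdf(7, 10, 0.5))
```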
normal approximation of the binomial distribution
the binomial distribution is used for “yes/no” or “success/failure” data. When you have a large number of trials, the graph of the binomial distribution starts to look almost exactly like a smooth normal distribution. This allows you to use the simpler normal distribution calculations to estimate binomial probabilities
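A quick check of the approximation with made-up numbers (assuming SciPy); the extra 0.5 is the standard continuity correction bridging the discrete bars and the smooth curve:

```python
from scipy.stats import binom, norm

n, p = 100, 0.5
mu = n * p                         # binomial mean
sigma = (n * p * (1 - p)) ** 0.5   # binomial SD

print(binom.cdf(55, n, p))                   # exact: P(X <= 55)
print(norm.cdf(55.5, loc=mu, scale=sigma))   # normal approximation
```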
t-distribution
a distribution that looks very much like the normal distribution (bell-shaped), but its tails are a little fatter and its center is a little flatter. It is used instead of the normal distribution when your sample size is small or when you do not know the population's standard deviation. As the sample size gets larger, the t-distribution becomes nearly identical to the normal distribution
confidence interval for the mean
the range of values that is likely to contain the true average (mean) of the population. Instead of guessing a single number, a confidence interval gives a range along with a confidence level (e.g., 95%) stating how sure you are that the range captures the true mean
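A minimal sketch with made-up measurements (assuming SciPy); the t-distribution is used because n is small and the population SD is unknown:

```python
import numpy as np
from scipy import stats

data = np.array([4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2])  # made-up values
mean = data.mean()
sem = stats.sem(data)   # standard error of the mean
lo, hi = stats.t.interval(0.95, len(data) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({lo:.3f}, {hi:.3f})")
```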
standard deviation
a measure of how spread out the data is from the average. A small SD means data points are clustered tightly. A large SD means they are very scattered
variance
simply the standard deviation squared (SD²).
both SD and variance are ways to quantify the variability in the data
one-sample t-test
used to determine if the average (mean) of a single group is significantly different from a known or hypothesized value (e.g., is the average score of this year's students different from the target average of 80?)
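A sketch of that example with made-up scores (assuming SciPy):

```python
from scipy import stats

scores = [78, 85, 82, 74, 90, 81, 79, 84]   # hypothetical student scores
t_stat, p_value = stats.ttest_1samp(scores, popmean=80)
print(t_stat, p_value)   # small p => mean differs from the target of 80
```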
paired t-test
a test used to compare the means of two groups when the data points are naturally linked or matched (e.g., the same people measured before and after a treatment). It focuses on the average difference between the pairs (e.g., did the average weight of people change after a 6-week diet program?)
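A sketch with made-up before/after weights (assuming SciPy):

```python
from scipy import stats

before = [82.1, 90.4, 77.8, 95.0, 88.2]   # same people, before the program
after  = [80.3, 88.9, 76.5, 92.8, 87.0]   # and after 6 weeks
t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)   # tests whether the mean paired difference is zero
```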
two-sample t-test
a test used to compare the means of two completely separate and independent groups to see if they are significantly different from each other (e.g., is the average salary of men different from the average salary of women?)
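A sketch with two made-up independent groups (assuming SciPy); `equal_var=False` gives Welch's version, which does not assume equal variances:

```python
from scipy import stats

group_a = [51, 48, 55, 60, 52, 49]   # made-up salaries (thousands)
group_b = [45, 50, 47, 44, 49, 46]   # an independent group
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_stat, p_value)
```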
comparing variances
This procedure determines if the spread of two populations is significantly different from each other
often a preliminary step before conducting a two-sample t-test, as the t-test uses different formulas depending on whether the variances are equal or not
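One common way to run this check is Levene's test (a representative choice; the course may use the F-ratio instead). Made-up data, assuming SciPy:

```python
from scipy import stats

group_a = [51, 48, 55, 60, 52, 49]
group_b = [45, 50, 47, 44, 49, 46]
stat, p_value = stats.levene(group_a, group_b)
print(stat, p_value)   # large p => no evidence the spreads differ,
                       # so the equal-variance t-test formula is defensible
```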
transformations
mathematical adjustments applied to data points to make the data better fit the assumptions required for a statistical test, often to make the distribution look more normal
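For example, a log transform (made-up right-skewed data, assuming NumPy):

```python
import numpy as np

# Right-skewed reaction times; the log pulls in the long right tail,
# making the distribution look more normal.
times = np.array([0.9, 1.1, 1.0, 1.3, 2.8, 5.6, 1.2, 9.4])
log_times = np.log(times)
print(times.mean(), log_times.mean())
```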
nonparametric tests
Tests that do not assume the data follows a specific distribution. They are used when assumptions like normality are severely violated, especially with small sample sizes. They often work by comparing the ranks of the data instead of the actual values
also called distribution-free tests
sign test
a simple nonparametric test that determines if two related groups are different by just counting how many data pairs have a positive difference versus a negative difference (i.e., it compares only the sign, not the magnitude, of the change)
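A sketch of the counting logic with made-up pairs (assuming a recent SciPy with `binomtest`): under the null hypothesis of no change, the number of positive differences follows Binomial(n, 0.5):

```python
import numpy as np
from scipy.stats import binomtest

before = np.array([82, 90, 77, 95, 88, 84, 91])   # made-up paired data
after  = np.array([80, 88, 76, 96, 87, 81, 89])
diffs = after - before
n_pos = int((diffs > 0).sum())    # pairs that increased
n = int((diffs != 0).sum())       # zero differences are dropped
print(binomtest(n_pos, n, p=0.5).pvalue)
```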
wilcoxon signed-rank test
a nonparametric test used to compare two related samples (like a paired t-test) or to compare a single sample against a median. It uses the ranks of the differences to see if one group's values are consistently higher than the other's
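The same made-up pairs, assuming SciPy:

```python
from scipy.stats import wilcoxon

before = [82, 90, 77, 95, 88, 84, 91]
after  = [80, 88, 76, 96, 87, 81, 89]
stat, p_value = wilcoxon(before, after)   # ranks the paired differences
print(stat, p_value)
```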
mann-whitney u-test (wilcoxon rank-sum test)
a nonparametric test used to compare two independent groups (like a two-sample t-test). It checks if the values in one group tend to be larger than the values in the other group by comparing the ranks of all the data combined
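A sketch with two made-up independent groups (assuming SciPy):

```python
from scipy.stats import mannwhitneyu

group_a = [51, 48, 55, 60, 52, 49]
group_b = [45, 50, 47, 44, 49, 46]
stat, p_value = mannwhitneyu(group_a, group_b)   # compares pooled ranks
print(stat, p_value)
```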
ANOVA
(analysis of variance) a statistical test used to determine if the means of two or more independent groups are significantly different from each other. It works by comparing the variability between the groups to the variability within the groups
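A sketch with three made-up dose groups (assuming SciPy):

```python
from scipy.stats import f_oneway

dose_10 = [4.1, 3.8, 4.5, 4.0]
dose_20 = [5.0, 5.4, 4.9, 5.2]
dose_30 = [6.1, 5.8, 6.3, 6.0]
f_stat, p_value = f_oneway(dose_10, dose_20, dose_30)
print(f_stat, p_value)   # small p => at least one group mean differs
```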
planned comparisons (a priori test)
statistical comparisons between specific groups that were decided upon before the data was collected or analyzed, based on specific research questions or theory. These are more powerful than unplanned comparisons
unplanned comparisons (post hoc tests)
statistical comparisons between groups that are performed after a significant result is found in an ANOVA test. Their purpose is to figure out which specific pairs of groups are different
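One common post hoc procedure is Tukey's HSD (a representative choice, not necessarily the course's; assumes SciPy >= 1.8). Using the same made-up dose groups:

```python
from scipy.stats import tukey_hsd

dose_10 = [4.1, 3.8, 4.5, 4.0]
dose_20 = [5.0, 5.4, 4.9, 5.2]
dose_30 = [6.1, 5.8, 6.3, 6.0]
result = tukey_hsd(dose_10, dose_20, dose_30)   # tests every pair of groups
print(result)
```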
fixed effects
in an experiment, the effects of specific, limited categories chosen by the researcher. The results only apply to those specific categories (e.g., comparing exactly three specific drug doses: 10mg, 20mg, 30mg)
random effects
in an experiment, the effects of categories that are randomly sampled from a larger population of possible categories (e.g., choosing 10 different schools to study, where the goal is to generalize to all schools)
correlation coefficient (r)
a single number between -1 and +1 that measures the strength and direction of the linear relationship between two variables
+1 is a perfect positive relationship (as one goes up, the other goes up)
-1 is a perfect negative relationship (as one goes up, the other goes down)
0 means no linear relationship
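A sketch with made-up study-hours data (assuming SciPy):

```python
from scipy.stats import pearsonr

hours  = [1, 2, 3, 4, 5, 6, 7, 8]           # made-up study hours
scores = [52, 55, 61, 60, 68, 70, 75, 79]   # corresponding exam scores
r, p_value = pearsonr(hours, scores)
print(r, p_value)   # r near +1 => strong positive linear relationship
```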
spearman's rank correlation (ρ)
a nonparametric version of correlation. Instead of using the actual data values, it uses the rank of the data to measure the strength of the relationship. It is useful when the relationship is not strictly linear or when the data has outliers
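The same made-up data with one extreme outlier added; the ranks blunt its influence (assuming SciPy):

```python
from scipy.stats import spearmanr

hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 70, 75, 200]   # one extreme outlier
rho, p_value = spearmanr(hours, scores)
print(rho, p_value)   # the outlier barely moves the rank correlation
```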
linear regression
a statistical method used to model the relationship between two variables by fitting a straight line to the data. It allows you to predict values of one variable based on the values of the other
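A sketch using the made-up study-hours data again (assuming SciPy):

```python
from scipy.stats import linregress

hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 70, 75, 79]
fit = linregress(hours, scores)
print(fit.slope, fit.intercept)         # the fitted line y = mx + b
print(fit.slope * 10 + fit.intercept)   # predicted score for 10 hours
```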
confidence intervals (in regression)
a range of values around the regression line (or around a specific predicted value) that you are highly confident contains the true population relationship or prediction
regression toward the mean
the statistical phenomenon where an extremely high or low score on one measurement is likely to be followed by a score that is closer to the average (mean) on a later measurement
nonlinear regression
a statistical method used when the relationship between variables is curved, not a straight line. It uses complex equations (not just y = mx + b) to model the relationship
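A sketch fitting a hypothetical exponential model to made-up curved data (assuming SciPy):

```python
import numpy as np
from scipy.optimize import curve_fit

def growth(x, a, b):
    # hypothetical curved model: y = a * e^(b*x)
    return a * np.exp(b * x)

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.0, 2.7, 3.9, 5.3, 7.6, 10.8])   # made-up curved data
params, _ = curve_fit(growth, x, y, p0=[2.0, 0.3])
print(params)   # fitted a and b
```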
simulation
using a computer to create artificial data many times based on certain assumptions to study the behavior of a statistical method or to estimate probabilities that are hard to calculate analytically
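For instance, estimating a dice probability by brute force (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
# Estimate P(sum of two dice >= 10) from 100,000 simulated rolls.
rolls = rng.integers(1, 7, size=(100_000, 2)).sum(axis=1)
print((rolls >= 10).mean())   # should be close to 6/36 ≈ 0.167
```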
randomization
the core process of randomly assigning participants to different groups or randomly drawing samples from a population to ensure fairness and eliminate bias
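A minimal sketch of random assignment (hypothetical participant IDs, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
participants = np.arange(20)               # 20 hypothetical participant IDs
shuffled = rng.permutation(participants)   # unbiased random order
treatment, control = shuffled[:10], shuffled[10:]
print(treatment, control)
```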
bootstrapping
a computer-intensive method where a single sample is treated as a population, and the computer repeatedly draws sub-samples with replacement from it (typically thousands of times). This creates a simulated sampling distribution to estimate statistics like the standard error or confidence intervals when traditional formulas are difficult to use
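A sketch of a percentile bootstrap CI for the mean (made-up sample, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([12.1, 9.8, 14.3, 11.0, 10.5, 13.2, 12.8, 9.9])

# Resample WITH replacement from the one sample, thousands of times.
boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(10_000)]

print(np.percentile(boot_means, [2.5, 97.5]))   # percentile 95% CI
```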
maximum likelihood estimation
a sophisticated method for estimating the parameters (like the mean or variance) of a statistical model. It selects the parameter values that make the observed data the most probable (i.e., they maximize the likelihood of getting the data you actually got)
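A sketch: fit a normal model by minimizing the negative log-likelihood (simulated data, assuming NumPy/SciPy):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated observations

def neg_log_likelihood(params):
    mu, sigma = params
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Choose mu and sigma that make the observed data most probable.
result = minimize(neg_log_likelihood, x0=[0.0, 1.0],
                  bounds=[(None, None), (1e-6, None)])
print(result.x)   # close to the true (5.0, 2.0)
```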
log-likelihood ratio test (LR test)
a hypothesis test used to compare two competing statistical models. It calculates the ratio of the likelihoods (often using the natural log) to see if the more complex model (the one with more parameters) provides a significantly better fit to the data than the simpler model
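A simplified sketch (simulated data, assuming NumPy/SciPy): compare a model with the mean fixed at 0 against one with the mean estimated from the data. For brevity the SD is plugged in rather than re-estimated under each model, which a full treatment would do:

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(2)
data = rng.normal(loc=0.8, scale=1.0, size=100)

ll_null = np.sum(norm.logpdf(data, loc=0.0, scale=data.std()))          # simple
ll_full = np.sum(norm.logpdf(data, loc=data.mean(), scale=data.std()))  # complex

lr_stat = 2 * (ll_full - ll_null)    # likelihood-ratio statistic
p_value = chi2.sf(lr_stat, df=1)     # one extra parameter in the full model
print(lr_stat, p_value)
```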