Population
The entire group you want to learn about
Sample
The subset you actually observe
Parameter
A number describing the population (usually unknown)
Statistic
A number calculated from the sample (used to estimate parameters)
Population distribution
Distribution of individual values in the population
Sample distribution
Distribution of individual values in your sample
Sampling distribution
Distribution of a statistic across many possible samples
Central Limit Theorem (CLT)
For sufficiently large samples, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution
Standard error
Standard deviation of the sampling distribution
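A minimal simulation sketch (assuming NumPy) illustrating both ideas: sample means drawn from a skewed population pile up in a roughly normal shape, and their spread matches σ/√n. The exponential population and the sample size of 50 are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 50, 10_000

# Skewed population: exponential with mean 1 and standard deviation 1
population_sd = 1.0
sample_means = rng.exponential(scale=1.0, size=(n_samples, n)).mean(axis=1)

# CLT: the sampling distribution of the mean is roughly normal, and its
# standard deviation (the standard error) should be close to sigma / sqrt(n)
print(sample_means.std(ddof=1))    # empirical standard error
print(population_sd / np.sqrt(n))  # theoretical standard error, ~0.1414
```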
Normal distribution
Bell-shaped and symmetric; mean = median = mode; completely determined by two parameters: μ (center) and σ (spread); notation: X ~ N(μ, σ²)
Point estimate
A single value computed from the sample and used to estimate a population parameter (e.g., the sample mean as the center of a confidence interval)
Confidence interval
A range of values calculated from sample data that are likely to contain the true, unknown population parameter with a specific level of confidence
Confidence level
The long-run percentage of intervals, constructed the same way from repeated samples, that would contain the true population parameter
Margin of error
The maximum expected difference between the sample estimate and the true population value at a given confidence level, reported as ± some amount (critical value × standard error)
Critical value
The cutoff point from a probability distribution (like the z- or t-distribution) that determines how far the sample statistic can deviate from the population parameter while still being consistent with a specified confidence level. The multiplier.
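A hedged sketch tying these pieces together as point estimate ± (critical value × standard error), here a 95% t-interval. It assumes SciPy is available; the data are made up.

```python
import numpy as np
from scipy import stats

data = np.array([4.1, 5.2, 6.0, 4.8, 5.5, 5.9, 4.4, 5.1])  # toy sample
n = len(data)

point_estimate = data.mean()                   # sample mean
se = data.std(ddof=1) / np.sqrt(n)             # standard error
t_crit = stats.t.ppf(0.975, df=n - 1)          # critical value for 95% confidence
margin = t_crit * se                           # margin of error

print(f"{point_estimate:.2f} ± {margin:.2f}")  # the confidence interval
```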
Null hypothesis
The skeptic’s position; what we’re trying to disprove
Alternative hypothesis
The research claim; what we want to show
Test statistic
Measures how many standard errors our estimate is from the null value
P-value
The probability of observing data at least this extreme if H₀ is true
Significance level
The threshold for determining if a result is statistically significant in a hypothesis test
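A small one-sample t-test sketch connecting test statistic, p-value, and significance level (SciPy assumed; the null value of 5.0 and the data are invented).

```python
import numpy as np
from scipy import stats

data = np.array([5.3, 6.1, 5.8, 6.4, 5.9, 6.2, 5.7, 6.0])
alpha = 0.05  # significance level

# Test statistic: how many standard errors the sample mean is from the null value
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)

print(t_stat, p_value)
print("reject H0" if p_value < alpha else "fail to reject H0")
```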
Type 1 error
False positive (rejecting a true H₀)
Type 2 error
False negative (failing to reject a false H₀)
One-sided test
Checks for a difference in a specific direction. The unknown true population mean is either specifically higher or lower than the null hypothesis mean.
Two-sided test
Checks for any difference. The unknown true population mean is not the same as the null hypothesis mean, but the direction is not specified.
Statistical significance
The result’s p-value is less than the chosen significance level α (commonly 0.05).
Practical significance
Is the effect of statistical significance large enough to matter outside of the study?
Correlation
Measures the strength and direction of the linear relationship between two continuous variables. Ranges between −1 and 1.
Causation
One variable directly causes change in another.
Counterfactual
What would have happened under the alternative condition; it is unobservable
Average Treatment Effect (ATE)
The average causal effect of a treatment, calculated as the average outcome for those who received the treatment minus the average outcome for those who did not, assuming treatment assignment is independent of potential outcomes.
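A minimal sketch estimating the ATE as a difference in means in a simulated randomized experiment (NumPy assumed; the true effect of 2.0 is an invented parameter).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Random assignment makes treatment independent of potential outcomes
treated = rng.integers(0, 2, size=n).astype(bool)
outcome = rng.normal(10, 3, size=n) + 2.0 * treated  # true effect = 2.0

ate_hat = outcome[treated].mean() - outcome[~treated].mean()
print(ate_hat)  # should be close to 2.0
```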
Confounding variable
A confounding variable is a third variable that is related to both the independent variable (IV) and the dependent variable (DV) and can distort the observed relationship between the IV and DV.
Direct causal relationship
X → Y
Spurious relationship
Z → X and Z → Y
Chain/mediation relationship
X → Z → Y
Internal validity
Checks if the study established causation. Did the treatment cause the effect?
External validity
Checks if the results of the study can be generalized to the real world
Selection bias
Sample selection (who volunteers), treatment selection (who seeks treatment), attrition (who drops out)
Measurement validity
If your study actually captures the real-world concept. Lab vs. reality, self-report vs. behavior, proxy measures
Generalizability
Whether the finding will apply across different populations, different settings, and different times
Demand effects/reactivity
Whether people behave differently because they’re being studied. Hawthorne effects, social desirability bias, experimenter demand
Reverse causality
Does X cause Y or does Y cause X (or both?)
Correlation coefficient (r)
Gives us a single number that summarizes linear relationship between two variables
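A quick sketch computing r with NumPy on invented data; note that r only captures the linear part of the relationship.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)  # roughly linear, positive relationship

r = np.corrcoef(x, y)[0, 1]  # correlation coefficient, between -1 and 1
print(r)
```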
Intercept (a)
Tells us the value the model predicts for Y when the independent variable (X) is 0
Slope (b)
Tells us the predicted change in Y for each one-unit increase in X
Residual
The difference between an actual observed data point and the value predicted by a model
Least squares/OLS
A method for estimating a regression line by choosing the coefficients that minimize the sum of squared residuals (the squared differences between observed and predicted values)
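A compact sketch of intercept, slope, residuals, and the OLS criterion using NumPy's least-squares fit; the data and true coefficients (2.0 and 0.5) are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)  # true intercept 2.0, slope 0.5

# OLS: choose a (intercept) and b (slope) to minimize the sum of squared residuals
b, a = np.polyfit(x, y, deg=1)  # polyfit returns highest degree first
predicted = a + b * x
residuals = y - predicted       # observed minus predicted

print(a, b)                    # estimates near 2.0 and 0.5
print((residuals ** 2).sum())  # the quantity OLS minimizes
```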
R²
R² is the proportion of the variation in the dependent variable (Y) that is explained by the regression model
Adjusted R²
The proportion of variation in Y explained by the model, adjusted downward for the number of predictors; unlike R², it only increases when a new variable improves the model more than chance alone would predict
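A sketch computing both by hand, repeating the OLS fit above so the block stands alone (k is the number of predictors, here 1).

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)
b, a = np.polyfit(x, y, deg=1)
residuals = y - (a + b * x)

n, k = len(y), 1                      # k = number of predictors
ss_res = (residuals ** 2).sum()       # unexplained variation
ss_tot = ((y - y.mean()) ** 2).sum()  # total variation in Y

r2 = 1 - ss_res / ss_tot                       # proportion explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
print(r2, adj_r2)
```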
Dummy variable
A binary variable used to represent categories in a regression model
Reference category
The omitted group when using dummy variables. All dummy variable coefficients are interpreted relative to this group
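A sketch with pandas (assumed available; the category names are invented) showing how dummies are created and one category omitted as the reference.

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "west", "south", "north"]})

# drop_first=True omits one category (here "north"), which becomes the
# reference group; each remaining dummy coefficient is interpreted
# relative to it
dummies = pd.get_dummies(df["region"], drop_first=True)
print(dummies)
```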
Control variable
a variable included in a regression to account for other factors that may affect the outcome, helping isolate the relationship between the main independent variable and the dependent variable
Fixed Effects
Control for all unobserved, time-invariant characteristics of units by comparing each unit to itself over time
Interaction term
Allows the effect of one independent variable on the dependent variable to depend on the value of another variable
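A minimal design-matrix sketch with NumPy: the x1 * x2 column lets the slope on x1 vary with x2. The data and true coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# The effect of x1 on y depends on x2 through the 1.5 * x1 * x2 term
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)

# Design matrix: intercept, x1, x2, and the interaction x1 * x2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)  # estimates near [1.0, 2.0, -0.5, 1.5]
```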
Multicollinearity
When two or more independent variables are highly correlated, making coefficient estimates unstable and standard errors large
Heteroscedasticity
Occurs when the variance of the regression errors is not constant across values of the independent variables, which leads to incorrect standard errors
Nonlinearity
When the relationship between the independent variable and the dependent variable is not linear, meaning a straight line is not an appropriate fit