These flashcards cover the basics of research methodology and statistical theory, including descriptive and inferential statistics, NHST, ANOVA, and linear regression.
Mean
The average calculated by adding all observed values and dividing by the number of values, denoted by the formula: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
Median
The 'middle observation' where half of the observations are larger and half are smaller than this value.
Mode
The value or values observed most frequently in the data.
Range
The distance between the maximum and minimum values in a dataset; it is very sensitive to outliers.
Standard Deviation
A measure of spread describing how far, on average, the data points lie from the mean (the centre of mass), denoted by the formula: $S = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})^2}$
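The descriptive statistics on the cards above can be computed in a few lines. This is a minimal Python sketch on a small hypothetical dataset (not from the source); note that the card's standard deviation formula divides by N, whereas R's `sd()` and Python's `statistics.stdev()` divide by N − 1, so both are shown.

```python
import statistics

# Hypothetical example data (not from the source)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)          # (1/N) * sum of X_i
median = statistics.median(data)      # middle value (average of the two middle values when N is even)
modes = statistics.multimode(data)    # most frequent value(s)
value_range = max(data) - min(data)   # max minus min; very sensitive to outliers

# Standard deviation with the 1/N divisor used in the card's formula
sd_population = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
# Sample standard deviation with the N - 1 divisor (what R's sd() returns)
sd_sample = statistics.stdev(data)

print(mean, median, modes, value_range, sd_population)
# → 5.0 4.5 [4] 7 2.0
```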
Operationalisation
The process of relating unobservable theoretical constructs to concrete measures (e.g., using the Beck Depression Inventory to measure depression).
Nominal Scale
A measurement scale where values have no particular relationship or meaningful numbering scheme (e.g., eye colour).
Ordinal Scale
A measurement scale with a natural ordering, but where the differences between values are not meaningful (e.g. ranking data).
Interval Scale
A measurement scale with a natural ordering where differences between numbers are meaningful, but ratios are not (e.g., year).
Ratio Scale
A measurement scale with a natural ordering where 'zero means zero' and both differences and ratios between numbers are meaningful (e.g., age).
Discrete variable
Values that come in specific categories, with no values existing in between (e.g. year, party voted for).
Continuous variable
A value that varies smoothly; there’s always something ‘in between’ (e.g. response time).
Predictor
A variable used to explain other variables, also known as an independent variable or treatment.
Outcome
A variable to be explained in terms of other variables, also known as a dependent variable or response.
Test-retest Reliability
A measure of consistency obtained by conducting the same measurement at two different times to see if results match.
Inter-rater reliability
A measure of consistency obtained by different people conducting the same measurements to see if results match.
Internal consistency reliability
A measure of consistency obtained by checking whether different parts of a single measure that are theoretically equivalent produce matching results (e.g. "What's your favourite colour?" v. "What colour would you choose?").
Frequentist Probability
The view that probability is objective and represents the long-run frequency of repeatable events.
Bayesian Probability
A subjective view of probability represented as a 'degree of belief' held by an idealised, rational agent.
Binomial Distribution
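A discrete distribution describing count data: the number of times one of two possible outcomes occurs across a fixed number of independent trials with the same probability (e.g. the number of heads in 20 coin flips).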
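dbinom()/dnorm()
Probability of a specific outcome for the binomial distribution; for the normal distribution it returns a probability density instead, because any exact value of a continuous variable has probability zero.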
pbinom()/pnorm()
Cumulative probability: the chance that the outcome is less than or equal to a threshold.
qbinom()/qnorm()
Compute a quantile of the distribution (essentially the inverse of pbinom()/pnorm()).
rbinom()/rnorm()
Sample a random number from a distribution.
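The d/p/q/r naming pattern above is R's. As a rough illustration of what each prefix computes, here is a Python sketch: the binomial helpers are hypothetical reimplementations that merely borrow R's names, and the normal versions use the standard library's `statistics.NormalDist`. The IQ-style parameters (mean 100, SD 15) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

# Hypothetical binomial helpers mirroring R's d/p/q/r prefixes
def dbinom(k, size, prob):
    """P(X = k): probability of exactly k successes in `size` trials."""
    return math.comb(size, k) * prob**k * (1 - prob) ** (size - k)

def pbinom(k, size, prob):
    """P(X <= k): cumulative probability up to threshold k."""
    return sum(dbinom(i, size, prob) for i in range(k + 1))

def qbinom(p, size, prob):
    """Smallest k with P(X <= k) >= p, i.e. the p-th quantile."""
    for k in range(size + 1):
        if pbinom(k, size, prob) >= p:
            return k
    return size

def rbinom(n, size, prob, rng=random):
    """n random draws, each counting successes in `size` Bernoulli trials."""
    return [sum(rng.random() < prob for _ in range(size)) for _ in range(n)]

# Normal analogues using the standard library
iq = NormalDist(mu=100, sigma=15)
density = iq.pdf(100)          # like dnorm(100, 100, 15): a density, not a probability
below_130 = iq.cdf(130)        # like pnorm(130, 100, 15)
q975 = iq.inv_cdf(0.975)       # like qnorm(.975, 100, 15)
draws = iq.samples(5, seed=1)  # like rnorm(5, 100, 15)
```

The `qbinom` sketch compares floating-point sums directly, so a requested probability that lands exactly on a cumulative value may round either way; R's own implementation guards against this more carefully.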
Normal Distribution
A continuous distribution described by the mean (μ) and standard deviation (σ), where mean, median, and mode are identical.
Central Limit Theorem
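The theorem stating that, as sample size increases, the sampling distribution of the mean approaches a normal distribution centred on the true population mean (μ), with standard deviation σ/√N, regardless of the shape of the population distribution.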
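A quick simulation makes the theorem concrete. This sketch draws repeated samples from a deliberately non-normal population (uniform on [0, 1], an arbitrary choice for illustration) and checks that the sample means centre on μ = 0.5, have spread close to σ/√N, and increasingly behave like a normal variable (about 95% of them within 1.96 standard errors).

```python
import random
import statistics

rng = random.Random(42)  # seeded so the simulation is reproducible

# Hypothetical, non-normal population: uniform on [0, 1]
MU = 0.5
SIGMA = (1 / 12) ** 0.5

def sample_means(sample_size, n_reps=5000):
    """Mean of each of n_reps random samples of the given size."""
    return [statistics.fmean(rng.random() for _ in range(sample_size))
            for _ in range(n_reps)]

for n in (1, 4, 25):
    means = sample_means(n)
    sem = SIGMA / n ** 0.5  # theoretical SD of the sample mean
    coverage = sum(abs(m - MU) < 1.96 * sem for m in means) / len(means)
    print(f"N={n:>2}  mean of means={statistics.fmean(means):.3f}  "
          f"SD of means={statistics.stdev(means):.3f}  sigma/sqrt(N)={sem:.3f}  "
          f"within 1.96 SE={coverage:.3f}")
```

With N = 1 every value falls within 1.96 SE (the uniform has no tails), while by N = 25 the proportion should sit near the normal-theory 0.95.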
Standard Error of the Mean (SEM)
A measure reflecting the uncertainty about the sample mean, calculated as: $\mathrm{SEM} = \frac{\sigma}{\sqrt{N}}$ (as sample size increases, the standard error goes down)
Confidence Interval (CI)
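The range bounded by ±1.96 SEMs around the sample mean; if the study were repeated many times, 95% of intervals constructed this way would cover the true population mean: $\mathrm{CI}_{95} = \bar{X} \pm 1.96 \frac{\hat{\sigma}}{\sqrt{N}}$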
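The SEM and CI formulas above translate directly into code. This is a sketch using the 1.96 normal approximation from the card, with σ̂ estimated by the sample standard deviation; for small samples a t quantile (e.g. R's `qt(.975, df = N - 1)`) would usually replace 1.96. The data are hypothetical.

```python
import math
import statistics

def mean_ci95(data):
    """Normal-approximation 95% CI for the mean: mean ± 1.96 * SEM."""
    n = len(data)
    mean = statistics.fmean(data)
    sigma_hat = statistics.stdev(data)  # sample estimate of sigma (N - 1 divisor)
    sem = sigma_hat / math.sqrt(n)      # standard error of the mean
    return mean, sem, (mean - 1.96 * sem, mean + 1.96 * sem)

mean, sem, (lo, hi) = mean_ci95([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
print(f"mean={mean:.2f}, SEM={sem:.3f}, 95% CI=({lo:.2f}, {hi:.2f})")
```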
Null Hypothesis (H0)
The hypothesis being tested, which states there is no effect; all NHST statistical claims are specifically about this hypothesis.
Fisher (NHST)
States that hypothesis testing is about trying to falsify a single hypothesis (the null, H0) and that the p-value reflects the probability, if the null were true, of observing a test statistic at least as extreme as the one that was actually found.
Neyman (NHST)
States that hypothesis testing is about choosing between two rival hypotheses (HA or HB) and that Type I error describes a rate you must be willing to tolerate if you want to reject the null.
Type I Error
A false positive; rejecting the null hypothesis when it is actually true, typically controlled at α=0.05.
Type II Error
A false negative; retaining (failing to reject) the null hypothesis when it is actually false. Its rate depends on sample size, effect size and α.
Type I and Type II Error Trade-off
Lowering α (being stricter about false positives) raises β, all else being equal
Increasing sample size, all else equal, increases power (1-β) and decreases the Type II error rate
Type II Error (β)
A false negative; retaining (failing to reject) the null hypothesis when it is actually false.
Power (1−β)
The probability of correctly rejecting a false null hypothesis, which increases with larger sample sizes and larger effect sizes.
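The trade-offs on these cards can be seen numerically. This sketch computes approximate power for a two-sided one-sample z-test (a simplification chosen here because it only needs the normal distribution; t-test power is similar for moderate N). The effect size `d` in standard-deviation units is a hypothetical input.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def z_test_power(d, n, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided one-sample z-test.

    d: true effect size in SD units; n: sample size.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5  # expected value of the z statistic when H1 is true
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

for n in (10, 30, 100):
    print(n, round(z_test_power(0.3, n), 3), round(z_test_power(0.3, n, alpha=0.01), 3))
```

Each row shows power rising with N, and the α = .01 column is always lower than the α = .05 column: a stricter α buys fewer false positives at the cost of a higher β. When d = 0 the "power" collapses to α itself.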
Chi-squared Statistic
$\chi^2$: calculated by summing the squared differences between observed and expected frequencies of categorical data, each divided by the expected frequency (the larger the value, the worse the fit to the data).
Goodness of Fit Test
A chi-squared test that compares the observed frequencies of one variable against a hypothesis about the true probabilities of that variable, calculated as: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ are the observed frequencies and $E_i$ are the expected frequencies.
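The goodness-of-fit formula above can be sketched in plain Python. The counts below are hypothetical (four categories assumed equally likely, N = 200), and the critical value is hard-coded from R's `qchisq(.95, df = 3)` rather than computed, since Python's standard library has no chi-squared quantile function.

```python
def chisq_gof(observed, expected_probs):
    """Pearson goodness-of-fit statistic and its degrees of freedom (k - 1)."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]  # E_i = N * hypothesised probability
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical counts for four equally likely categories
stat, df = chisq_gof([35, 51, 64, 50], [0.25] * 4)
critical = 7.815  # qchisq(.95, df = 3) in R, rounded
reject_null = stat > critical
print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {reject_null}")
```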
Test of Independence
A chi-squared test that tests whether two nominal-scale variables are related to each other, calculated as: $\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $O_{ij}$ are the observed frequencies and $E_{ij}$ are the expected frequencies.
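For the test of independence, each expected frequency comes from the row and column totals: $E_{ij} = R_i C_j / N$. The sketch below follows the formula above without any continuity correction; note that R's `chisq.test()` applies Yates' correction to 2 × 2 tables by default, so its statistic will differ for those. The tables used are hypothetical.

```python
def chisq_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chisq_independence([[20, 30], [30, 20]])  # hypothetical 2 x 2 table
print(stat, df)
```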
Critical Region for Chi-squared
The critical value is the 95% quantile of the chi-squared distribution with the appropriate degrees of freedom (in R: qchisq(.95, df = …)); the result is significant at α = .05 if $\chi^2$ exceeds it.
Chi-squared Standard Residuals
Indicate how many 'standard deviations' away each cell is from the expected frequency, with values beyond ±1.96 suggesting significance.
Cramer’s V
A measure of effect size for chi-squared tests calculated as: $V = \sqrt{\frac{\chi^2}{N(k-1)}}$, where $k$ is the smaller of the number of rows and the number of columns.
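Cramér's V only needs the chi-squared statistic, the sample size and the table dimensions. This sketch also labels the result using the thresholds on the following cards; treating each boundary (0.1, 0.3, 0.5) as belonging to the higher category is an assumption, since the cards' ranges overlap at their endpoints.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """V = sqrt(chi2 / (N * (k - 1))), where k is the smaller table dimension."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

def describe_v(v):
    """Verbal label using the thresholds from these flashcards."""
    if v < 0.1:
        return "negligible"
    if v < 0.3:
        return "weak"
    if v < 0.5:
        return "moderate"
    return "high"

v = cramers_v(4.0, 100, 2, 2)  # e.g. chi2 = 4.0 from a hypothetical 2 x 2 table, N = 100
print(round(v, 3), describe_v(v))
```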
Cramer’s V (0 to 0.1)
Negligible association
Cramer’s V (0.1 to 0.3)
Weak association
Cramer’s V (0.3 to 0.5)
Moderate association
Cramer’s V (0.5 to 1)
High association
Chi-squared Assumptions
Large Expected Frequencies: The sampling distribution is valid only if the expected frequencies in each category are sufficiently large (typically at least 5), as it breaks down for too few observations.
Independence of Data: The observations must be independent; there should be no special relationship among them, ensuring that the sampling methods do not introduce bias.
Large Expected Frequencies Violated
Use Fisher's exact test: works by calculating the exact probability of obtaining a particular contingency table, but assumes the row and column totals are fixed
Independence of Data Violated
Use the McNemar test: for when there are multiple observations for each person, e.g. pre-test and post-test
Z-score
A standardized score with a mean of 0 and a standard deviation of 1: $Z = \frac{X - \mu}{\sigma}$ (conceptually equivalent to chi-squared adjusted residuals)
T-test statistic
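Calculated on the assumption that the population is normally distributed, using an estimate of the population standard deviation ($\hat{\sigma}$) from the sample: $t = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{N}}$. Because $\hat{\sigma}$ is itself uncertain, the t distribution effectively averages over plausible values of the population standard deviation, giving it heavier tails than the normal; it approaches a normal distribution as the sample size increases.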
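The one-sample t statistic above is a z-score with the estimated standard error in the denominator. This sketch computes it and its degrees of freedom (N − 1) on hypothetical data; it stops short of a p-value, since that needs the t distribution's CDF (R: `2 * pt(-abs(t), df)`), which Python's standard library lacks. A paired t-test is the same calculation applied to the within-pair differences with μ = 0, which the second helper shows.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """t = (mean - mu0) / (sigma_hat / sqrt(N)), with df = N - 1."""
    n = len(data)
    sem = statistics.stdev(data) / math.sqrt(n)  # estimated standard error
    t = (statistics.fmean(data) - mu0) / sem
    return t, n - 1

def paired_t(before, after):
    """Paired t-test as a one-sample t-test on the differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

t, df = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)  # hypothetical data
print(f"t({df}) = {t:.3f}")
```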
T-statistic
t: symmetric about zero when the null hypothesis is true; large deviations from zero (in either direction) are evidence against the null hypothesis
Cohen’s d
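A simple measure of effect size for t-tests: $d = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}}$, where $\hat{\sigma}$ is an estimate of the standard deviation (e.g. the pooled standard deviation of the two groups)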
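One common version of Cohen's d divides the mean difference by the pooled standard deviation of the two groups. The sketch below uses that choice; other denominators (e.g. the control group's SD) are also in use, so this is one variant rather than the only definition. The groups are hypothetical.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / pooled_sd

d = cohens_d([5, 6, 7], [3, 4, 5])  # hypothetical groups
print(d)  # well beyond the 0.8 "large" benchmark
```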
Cohen’s d (0.2)
Small effect size
Cohen’s d (0.5)
Medium effect size
Cohen’s d (0.8)
Large effect size
One Sample T-test
A statistical test used to determine if the mean of a single sample differs significantly from a known population mean.
Independent Sample T-test
A statistical test used to compare the means of two independent samples to determine if they differ significantly from each other.
Paired T-test
A statistical test used to compare the means of two related groups to determine if they differ significantly from each other.
T-test Assumptions
Population distributions are normal
Observations are independently sampled
Homogeneity of Variance (groups have the same standard deviation)
T-test Normality Violated
Use a QQ-plot to compare the quantiles of the data against the theoretical quantiles of the normal distribution. If the points do not form a straight line, check formally with the Shapiro-Wilk test and, if normality is violated, use the Wilcoxon test instead.
Shapiro-Wilk Test
A statistical test used to determine whether a sample comes from a normally distributed population. It assesses the normality of data by comparing the observed distribution to an expected normal distribution. Its statistic W equals 1 for perfectly normal data; values less than 1 together with a significant p-value imply deviations from normality.
Wilcoxon
A non-parametric statistical test used to evaluate whether there is a significant difference between the distributions of two related samples or matched observations (the signed-rank version; the rank-sum version compares two independent samples). It is used when the assumptions of the t-test are violated, but it can have less power, leading to a higher Type II error rate.
Wilcoxon (0.1 to 0.3)
Small effect size
Wilcoxon (0.3 to 0.5)
Medium effect size
Wilcoxon (>0.5)
Large effect size
QQ-plot
A scatterplot of actual quantiles of the observed data against theoretical quantiles of the normal distribution to assess the normality of data. If the points deviate significantly from the diagonal line, the normality assumption is considered violated.
Student T-test
A statistical test that assumes that both groups have equal variance.
Welch T-test
An adaptation of the t-test used when the assumption of equal variance between groups is violated.
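The difference between the Student and Welch versions lies in how the standard error and degrees of freedom are computed. This sketch implements both for two independent samples: Student pools the variances (df = n1 + n2 − 2), while Welch keeps them separate and uses the Welch-Satterthwaite approximation for df. With equal sample sizes and equal variances the two coincide. The data are hypothetical.

```python
import math
import statistics

def student_t(group1, group2):
    """Student's t with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    diff = statistics.fmean(group1) - statistics.fmean(group2)
    return diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2)), n1 + n2 - 2

def welch_t(group1, group2):
    """Welch's t with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    se1 = statistics.variance(group1) / n1  # squared standard error, group 1
    se2 = statistics.variance(group2) / n2  # squared standard error, group 2
    diff = statistics.fmean(group1) - statistics.fmean(group2)
    t = diff / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

print(student_t([5, 6, 7], [3, 4, 5]), welch_t([5, 6, 7], [3, 4, 5]))
```

Welch's df is never larger than Student's, which is why the Welch test is slightly more conservative when the equal-variance assumption actually holds.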
One-way ANOVA
A statistical test used to determine if the population means for multiple groups are identical by comparing variability between groups (SSb) and within groups (SSw). If the between groups variability is significantly greater than within groups variability, it suggests that at least one group mean is different.
Between Groups Variability (SSb)
The variability in scores that is attributed to the differences among group means in an ANOVA.
Within Groups Variability (SSw)
The variability in scores that is attributed to differences between individual scores within the same group in an ANOVA.
F-statistic
The ratio of mean square between groups to mean square within groups: $F = \frac{MS_b}{MS_w}$ (larger values indicate bigger differences among the group means; F tends to be close to 1 when the null is true)
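The one-way ANOVA pieces above (SS_b, SS_w, the mean squares and F) fit in one short function. This sketch also returns η² = SS_b / SS_tot, using the fact that SS_tot = SS_b + SS_w in a one-way design. The p-value is left out because it needs the F distribution's CDF (R: `pf(F, df_b, df_w, lower.tail = FALSE)`). The groups are hypothetical.

```python
import statistics

def one_way_anova(groups):
    """Return SS_b, SS_w, F and eta squared for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    # Between groups: group size times squared distance of group mean from grand mean
    ss_b = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: squared distance of each score from its own group mean
    ss_w = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    f = (ss_b / df_b) / (ss_w / df_w)  # MS_b / MS_w
    eta_sq = ss_b / (ss_b + ss_w)      # proportion of total variance explained
    return ss_b, ss_w, f, eta_sq

ss_b, ss_w, f, eta_sq = one_way_anova([[5, 6, 7], [3, 4, 5], [8, 9, 10]])
print(ss_b, ss_w, f, round(eta_sq, 3))
```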
Two-way ANOVA
An extension of ANOVA that evaluates the effect of two independent variables on a dependent variable, allowing for interaction effects between the variables. Results differ from running two separate one-way ANOVAs because the residual sum of squares (SSR) differs: in a two-way ANOVA each effect is tested against the variation left unexplained by both factors.
Residual Sum of Squares (SSR)
The total variation in the dependent variable that is not explained by the independent variables in an ANOVA analysis. It reflects the variability within the groups after accounting for the effects of the independent variables.
Interaction Sum of Squares (SSA:B)
The portion of total variation in an ANOVA that is attributed to the interaction between two independent variables. It assesses how the interaction influences the dependent variable beyond the individual effects of the variables.
Eta Squared (η²)
A measure of effect size in ANOVA representing the proportion of total variance attributable to a factor: $\eta^2 = \frac{SS_b}{SS_{tot}}$.
Holm Correction
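A method to control the family-wise Type I error rate by sorting p-values from smallest to largest and adjusting them sequentially: the j-th smallest of m p-values is multiplied by (m − j + 1). It is more powerful than the Bonferroni correction.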
Bonferroni Correction
A method to control the family-wise Type I error rate by multiplying all original p-values by the number of tests (tends to lose a lot of power).
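Both corrections can be written as functions that return adjusted p-values, to be compared against the usual α. The Holm sketch is intended to follow the same rule as R's `p.adjust(method = "holm")`: multiply the j-th smallest p-value by (m − j + 1), enforce that adjusted values never decrease along the sorted order, and cap at 1. The p-values in the example are hypothetical.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # rank 0 is the smallest p-value, multiplied by m; the next by m - 1; ...
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

p = [0.01, 0.04, 0.03]  # hypothetical raw p-values
print(bonferroni(p), holm(p))
```

Here Bonferroni leaves only the first test below .05, while Holm's smaller multipliers for the later tests keep its adjusted values lower, which is where its extra power comes from.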
Post Hoc Tests
Statistical tests applied after ANOVA to determine which specific group means are significantly different from each other, typically used when there are no specific prior hypotheses about which groups differ.
Linear Regression Model
A mathematical relationship expressed as: $Y_i = b_1 X_i + b_0 + \epsilon_i$, where $b_1$ is the slope, $b_0$ is the intercept, and $\epsilon_i$ is the residual.
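For a single predictor, the least-squares estimates of the model above have a closed form: $b_1$ is the covariance of X and Y divided by the variance of X, and $b_0 = \bar{Y} - b_1 \bar{X}$. This sketch computes both and the residuals $\epsilon_i$ on hypothetical data; in R the same fit would come from `lm(y ~ x)`.

```python
import statistics

def fit_simple_regression(x, y):
    """Least-squares slope b1, intercept b0 and residuals for Y = b1*X + b0 + e."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]
    return b1, b0, residuals

b1, b0, res = fit_simple_regression([1, 2, 3, 4], [2, 5, 6, 9])  # hypothetical data
print(f"slope={b1:.2f}, intercept={b0:.2f}, residuals={[round(r, 2) for r in res]}")
```

Least-squares residuals always sum to zero (up to rounding) when an intercept is included, which is a handy sanity check.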
ANOVA Assumptions
Residuals are normally distributed (i.e. the within-groups deviations of scores from their group mean)
Homogeneity of variance across all groups
Independence
Residual Normality Violated (ANOVA)
Use Shapiro-Wilk Test on residuals
Akaike Information Criterion (AIC)
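A measure for model selection that penalizes model complexity: $\mathrm{AIC} = \frac{SS_{res}}{\hat{\sigma}^2} + 2K$, where $K$ is the number of predictors in the model (lower AIC indicates a better model)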
Cook’s Distance
A metric quantifying the influence of a data point by combining its 'outlier-ness' and its leverage.
Variance Inflation Factor (VIF)
A measure used to quantify the extent of collinearity among predictors in a regression model.
Measures of central tendency
Mean, median, mode
Measures of spread
Range, interquartile range, standard deviation
Interquartile range
A measure of spread that describes the difference between the first and third quartiles in a dataset, indicating the range of the middle 50% of values
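The interquartile range can be computed with the standard library's `statistics.quantiles`. Using `method="inclusive"` interpolates between order statistics in the way R's default `quantile()` (type 7) does, to my understanding; other quantile conventions give slightly different quartiles. The hypothetical data include one outlier to show how much less the IQR reacts to it than the range does.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 30]  # hypothetical data with an outlier (30)

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                          # spread of the middle 50% of values
value_range = max(data) - min(data)    # dominated by the outlier
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}, range={value_range}")
```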
Theoretical constructs
Concepts or models used in statistical analysis to represent phenomena, often not directly observable (e.g. attitudes, beliefs, information processing speeds)
Measure
Tool for getting people to produce data that is informative about the construct (e.g. survey items, reaction times)