Measure the strength of patterns in our data
One of two main goals of multivariate analysis.
Determine if this pattern is strong enough to be believed
The second of two main goals of multivariate analysis.
Data
Information collected to gain knowledge about a field or to answer a question of interest.
Design (in statistics)
Choosing subjects.
Description (in statistics)
Summarizing data.
Inference (in statistics)
Making predictions about a population based on a sample.
Population (statistical definition)
Total set of subjects of interest.
Sample (statistical definition)
Subset of the population on which a study collects its data.
Parameter
Numerical summary of a population. Example: percentage of all adult Americans.
Statistic
Numerical summary of a sample. Example: percentage of adult Americans in a specific location.
Nominal level of measurement
Data that consist of names, labels, or categories only; qualitative and cannot be ranked or ordered.
Ordinal level of measurement
Qualitative data that can be arranged in some order (low to high); no quantitatively fixed space between items.
Interval level of measurement
Quantitative data where intervals are meaningful but ratios are not; has an arbitrary zero point.
Mean
Average value; sum of all values divided by total number of values.
Advantage of mean
Easy to interpret; foundational statistic usable in many procedures; sample mean is best estimate of population mean.
Disadvantage of mean
Requires interval or ratio level data; can be highly influenced by outliers, especially in small sample sizes.
Median
Exact middle value in a sorted data set.
Data requirement for median
Requires ordinal data.
Advantage of median
True measure of central tendency because it always sits in the middle of the sorted data; not strongly influenced by outliers.
Mode
The most common value in a data set.
Standard deviation
A measure of variability for interval variables equal to the square root of the variance.
Best summary for nominal data
Mode.
Best summary for ordinal data
Mode, Median, and Range.
Best summary for interval-ratio data
Mean, Median, Mode, Range, and Standard Deviation.
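The summaries listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the data values are hypothetical (made up for illustration, e.g. years of education) and include one outlier to show how the mean is pulled away from the median.

```python
import statistics

# Hypothetical interval-ratio data; the value 40 is a deliberate outlier.
values = [12, 12, 14, 16, 16, 16, 18, 40]

mean = statistics.mean(values)          # sum of values / number of values
median = statistics.median(values)      # middle of the sorted data
mode = statistics.mode(values)          # most common value
value_range = max(values) - min(values)
sd = statistics.stdev(values)           # sample standard deviation (n - 1)

print(mean, median, mode, value_range, round(sd, 2))
```

Here the single outlier raises the mean above the median, while the median and mode stay at the typical value.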
Normal distribution characteristics
Symmetrical/even distribution.
Central limit theorem
Even when the population is not normal (e.g., bimodal), the sampling distribution of the mean approaches a normal distribution as sample size grows; a larger sample also looks more like the population.
Z-score
A way to identify one's position and location in a distribution of data; changes a raw score to a standardized or normalized score.
Z-score formula
z = (Xi - x̅) / sx: the individual's value (Xi) minus the sample mean (x̅), divided by the sample's standard deviation (sx).
Purpose of z-scores
Crucial for making comparisons across variables measured in different units (income, education years, number of children).
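As a rough sketch of the z-score formula, the helper below (a hypothetical function name, not from the course) standardizes two variables measured in different units. The income and education values are invented for illustration.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z-scores: (Xi - sample mean) / sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in values]

# Hypothetical data in very different units.
incomes = [30000, 45000, 60000, 75000, 90000]   # dollars
education_years = [10, 12, 14, 16, 18]          # years

income_z = z_scores(incomes)
education_z = z_scores(education_years)

print([round(z, 2) for z in income_z])
print([round(z, 2) for z in education_z])
```

Because both lists are evenly spaced around their means, the two sets of z-scores come out the same, which shows that the original units no longer matter once scores are standardized.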
Inferential statistics definition
We rarely observe population parameters, only sample statistics; to draw inferences, samples must be representative.
Sampling error
Almost every sample will miss the true population parameter by a little.
Sampling distribution
Theoretical distribution of a statistic for all possible samples of a given size (N); does not exist empirically.
Sampling with replacement
Choosing a subject randomly as first member, then putting them back before choosing the next member.
Mean of sampling distribution
Equal to the true population mean (μ).
Standard error
Standard deviation of the sampling distribution; equals population standard deviation divided by square root of N (σ/√N).
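A small sketch of the standard error calculation follows. The population standard deviation of 15 and the sample sizes are assumed values chosen only to make the arithmetic easy to follow.

```python
import math

def standard_error(population_sd, n):
    """Standard error of the mean: population SD divided by sqrt(N)."""
    return population_sd / math.sqrt(n)

# Assumed population SD of 15 with two different sample sizes.
se_small = standard_error(15.0, 25)    # 15 / 5
se_large = standard_error(15.0, 100)   # 15 / 10

print(se_small, se_large)
```

Quadrupling N halves the standard error, since N enters through its square root.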
Population distribution
A real distribution representing characteristics of all members of the population of interest.
Sampling distribution (summary)
A theoretical probability distribution representing results of all possible samples drawn from the population.
Sample distribution
A real distribution describing characteristics of a sample.
Central Limit Theorem summary
Most random samples with relatively large N are close to true population mean; larger N means smaller standard error and closer clustering to true parameter.
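The clustering described above can be illustrated with a simulation. This is only a sketch under stated assumptions: the skewed "population" is generated from an exponential distribution, the seed and sample sizes are arbitrary, and samples are drawn with replacement.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed skewed (non-normal) population, generated for illustration.
population = [random.expovariate(1.0) for _ in range(20000)]
pop_mean = statistics.mean(population)

def sample_means(n, draws=1000):
    """Draw many samples of size n (with replacement) and return their means."""
    return [statistics.mean(random.choices(population, k=n)) for _ in range(draws)]

means_small = sample_means(10)
means_large = sample_means(100)

# The spread of the sample means approximates the standard error.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(round(pop_mean, 3), round(spread_small, 3), round(spread_large, 3))
```

The means from samples of size 100 should cluster more tightly around the population mean than the means from samples of size 10.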
Point estimate
Sample statistic used to estimate the exact value of a population parameter (mean, proportion, etc.).
Confidence interval
Range built around a sample statistic within which the population parameter is likely to fall.
Confidence interval width principle
Confidence in a range goes up the wider it is because it can account for more possible values.
Default confidence interval used in class
95 percent confidence interval.
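One way to build such an interval is sketched below, assuming a large-sample normal approximation (a class may instead use a t critical value, especially for small samples). The function name and the sample values are hypothetical.

```python
import math
import statistics

def confidence_interval(sample, level=0.95):
    """Return (low, high) around the sample mean using a normal approximation."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    # Two-sided critical value, about 1.96 for a 95 percent interval.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

# Hypothetical sample values.
sample = [52, 48, 55, 60, 47, 51, 58, 49, 53, 57, 50, 54]

low95, high95 = confidence_interval(sample, 0.95)
low99, high99 = confidence_interval(sample, 0.99)

print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

The 99 percent interval comes out wider than the 95 percent interval, matching the width principle: more confidence requires a wider range.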
When to reject null hypothesis
When p < 0.05: a difference as large as the one observed would be unlikely to arise by random chance alone if the null hypothesis were true.
Two sample t-test
Compare differences between two sample statistics from two mutually exclusive, independently and randomly selected groups.
Example of two sample t-test
Gender wage gap between men and women.
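A sketch of the test statistic for a comparison like this is shown below. It uses Welch's unequal-variance form of the t statistic, which is one common choice and not necessarily the version used in class. The wage figures are invented, and the p-value step (which needs the t distribution from a statistics library) is left out.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """t statistic: difference in means divided by its standard error."""
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    se_diff = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff

# Hypothetical hourly wages for two independent groups.
men = [22, 25, 27, 30, 31]
women = [18, 21, 22, 24, 25]

t_stat = welch_t(men, women)
print(round(t_stat, 3))
```

A larger absolute t value means the gap between the group means is large relative to the sampling variability expected from chance.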
ANOVA (Analysis of Variance)
Compares differences across more than two groups; essentially a 3+ group t-test.
ANOVA logic
Uses a sampling distribution of variation in means; decomposes variance to compare between-group differences to within-group differences.
ANOVA example age and capital punishment
Between groups: different age groups should have different support levels. Within groups: members of a particular age group should have similar support levels.
Three steps of quantitative research
1. Have a research idea and formulate hypotheses; 2. Find appropriate data/variables; 3. Choose statistical analysis and interpret findings.
Three bivariate associations
X1→Y (direct cause); X1→X2 (cause to mediator); X2→Y (mediator to outcome).
Mediating variable example
Does racism (X1) cause educational barriers (X2), which then cause Y?