Flashcards covering key statistical terms, concepts, and methodologies based on the provided lecture notes.
Statistic
A numerical summary of a sample.
Non-response bias
Occurs when a substantial share of the people chosen for the sample do not respond and the non-respondents differ systematically from those who do, distorting the survey results.
Undercoverage bias
Occurs when some members of the population are inadequately represented in the sample.
Cluster sample
A sampling method where the population is divided into groups (clusters), some clusters are randomly selected, and all individuals within the chosen clusters are surveyed.
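A minimal sketch of cluster sampling, assuming a made-up population of classrooms; the names, the cluster count, and the fixed seed are illustrative only.

```python
import random

# Hypothetical population: classrooms (clusters) and their students.
clusters = {
    "room_a": ["Ana", "Ben", "Cal"],
    "room_b": ["Dee", "Eli"],
    "room_c": ["Fay", "Gus", "Hal"],
    "room_d": ["Ivy", "Jon"],
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Step 1: randomly select some clusters (here, 2 of the 4).
chosen_clusters = rng.sample(sorted(clusters), k=2)

# Step 2: survey every individual in each selected cluster.
cluster_sample = [person for name in chosen_clusters for person in clusters[name]]
```

Note that randomness enters only at the cluster level; nobody inside a chosen cluster is left out.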
Residual
The difference between an observed value and the value predicted by a regression model (Observed Y - Predicted Y).
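As a small worked example, assuming a hypothetical fitted line predicted_y = 2 + 3x (the coefficients are invented for illustration), a residual could be computed like this:

```python
def predict(x):
    # Hypothetical regression line, not taken from the lecture notes.
    return 2 + 3 * x

def residual(x, observed_y):
    # Residual = Observed Y - Predicted Y
    return observed_y - predict(x)

above_line = residual(4, 15)  # observed value is above the prediction
below_line = residual(1, 4)   # observed value is below the prediction
```

A positive residual means the model under-predicted the observation; a negative residual means it over-predicted.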
Outlier
A data point that deviates significantly from other observations.
Influential point
A data point whose removal causes a substantial change in the regression model (e.g., slope or intercept).
Q3 (Third Quartile)
The value below which 75% of the data falls; the 75th percentile.
Treatment (in experiment)
A specific condition applied to the experimental units being studied.
Correlation coefficient
A measure of the strength and direction of a linear relationship between two quantitative variables, denoted by 'r'.
Coefficient of determination (R-squared)
The proportion of the variance in the response (dependent) variable that is explained by the explanatory (independent) variable(s) in a regression model. In simple linear regression, it equals the square of the correlation coefficient (r^2).
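A sketch of how r and r^2 might be computed by hand for one explanatory variable; the function name `pearson_r` and the data are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Correlation coefficient from sums of squared and cross deviations.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # share of the variance in y explained by the linear fit on x
```

For this data r is about 0.77 and r^2 is 0.6, so the linear fit accounts for roughly 60% of the variation in y.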
Slope (in regression)
In a regression equation, it represents the estimated average change in the response variable for every one-unit increase in the explanatory variable.
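A minimal least-squares sketch, assuming invented data on hours studied and exam score; `least_squares_line` is a hypothetical helper name.

```python
def least_squares_line(xs, ys):
    # Slope = sum of cross deviations / sum of squared x deviations.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

hours = [1, 2, 3, 4]        # explanatory variable (invented)
scores = [52, 60, 61, 71]   # response variable (invented)

slope, intercept = least_squares_line(hours, scores)
```

Here the slope is 5.8, read as: each additional hour of study is associated with an estimated average increase of 5.8 points in the score.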
Randomization (in experiment)
The process of assigning subjects to different treatment groups purely by chance, to reduce bias and ensure groups are comparable.
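One simple way to carry out such an assignment is to shuffle the subjects and split the list; the subject labels and the seed below are placeholders.

```python
import random

subjects = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]  # hypothetical subjects

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
shuffled = subjects[:]  # copy so the original list is left unchanged
rng.shuffle(shuffled)   # chance alone decides the order

half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]
```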
Right-skewed distribution
A distribution where the tail on the right side is longer than the left side, often resulting in the mean being greater than the median.
Block design
An experimental design where subjects are divided into groups (blocks) based on a shared characteristic, and treatments are randomly assigned within each block.
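A sketch of randomizing within blocks, assuming two hypothetical age blocks and two conditions; the block names and subject labels are invented.

```python
import random

# Hypothetical blocks formed on a shared characteristic (age group).
blocks = {
    "younger": ["y1", "y2", "y3", "y4"],
    "older": ["o1", "o2", "o3", "o4"],
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
assignment = {}

for block_name, members in blocks.items():
    shuffled = members[:]
    rng.shuffle(shuffled)  # randomize separately inside each block
    half = len(shuffled) // 2
    for subject in shuffled[:half]:
        assignment[subject] = "treatment"
    for subject in shuffled[half:]:
        assignment[subject] = "control"
```

Because the shuffle happens inside each block, every block contributes equally to both conditions, so the blocking characteristic cannot differ between the treatment and control groups.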
IQR (Interquartile Range)
A measure of statistical dispersion, calculated as the difference between the third quartile (Q3) and the first quartile (Q1); it is the range spanned by the middle 50% of the data.
Double-blind study
An experiment where neither the participants nor the researchers administering the treatments know who is receiving the actual treatment and who is receiving a placebo.
Categorical data
Data that can be divided into groups or categories, rather than measured numerically (e.g., color, gender).
First quartile (Q1)
The value below which 25% of the data falls; the 25th percentile.
Sensitive question
A survey question that may elicit dishonest responses due to personal or social reasons, leading to response bias.
Z-score
A measure of how many standard deviations an observation lies above (positive) or below (negative) the mean, calculated as (value - mean) / standard deviation.
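A short worked example, assuming a hypothetical test with mean 70 and standard deviation 10:

```python
def z_score(value, mean, sd):
    # Number of standard deviations the value lies from the mean.
    return (value - mean) / sd

z_high = z_score(85, 70, 10)  # 1.5 standard deviations above the mean
z_low = z_score(60, 70, 10)   # 1 standard deviation below the mean
```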
Residual plot
A scatterplot of the residuals against the explanatory variable, used to check the appropriateness of a linear model.
High leverage point
A data point that has an unusual x-value (explanatory variable value) compared to the rest of the data.
Symmetric distribution
A distribution whose left and right sides are approximately mirror images around the center, so the mean and median are approximately equal.
Range
The difference between the maximum and minimum values in a dataset.
Left-skewed distribution
A distribution where the tail on the left side is longer than the right side, often resulting in the mean being smaller than the median.
Stratified sample
A sampling method where the population is divided into homogeneous subgroups (strata), and then a simple random sample is drawn from each subgroup.
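A minimal sketch of stratified sampling, assuming invented class-year strata and an equal sample size of 2 per stratum (proportional allocation is another common choice).

```python
import random

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": ["f1", "f2", "f3", "f4"],
    "sophomore": ["s1", "s2", "s3"],
    "junior": ["j1", "j2", "j3", "j4", "j5"],
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
per_stratum = 2

# Draw a simple random sample from every stratum.
stratified_sample = {
    name: rng.sample(members, per_stratum) for name, members in strata.items()
}
```

Unlike a cluster sample, every subgroup is represented, and randomness is applied to individuals within each subgroup.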
Observational study
A study where researchers observe and measure characteristics of subjects without attempting to influence or manipulate any variables.
Experiment
A study where researchers actively impose some treatment on subjects in order to observe their responses.
Statistically significant
A result that would be unlikely to occur by chance alone if there were no real effect or relationship, suggesting that the effect or relationship is real.
Explanatory variable
A variable that is thought to explain or cause changes in another variable (the response variable); also known as the independent variable.
Confounding variable
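A variable, often unmeasured, that is associated with both the explanatory and response variables, so its effect on the response cannot be separated from that of the explanatory variable and may create a spurious association.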
Response variable
The variable that measures an outcome of interest; also known as the dependent variable.
Predictive power
How well a model or variable can forecast future outcomes, often assessed by R-squared or correlation.
Five-number summary
A set of five values that describe the distribution of data: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
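A sketch using Python's standard `statistics` module on invented data. Textbooks and software use slightly different quartile conventions, so the `"inclusive"` method chosen here is an assumption and other methods may give slightly different Q1 and Q3.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 10, 12, 15]  # invented data

# quantiles with n=4 returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, median, q3, max(data))
```

For this data the summary is minimum 2, Q1 4, median 7, Q3 10, maximum 15.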
Lower fence for outliers
A threshold used to identify potential outliers, typically calculated as Q1 - (1.5 * IQR).
Upper fence for outliers
A threshold used to identify potential outliers, typically calculated as Q3 + (1.5 * IQR).
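The two fences can be sketched together as follows, assuming invented data and the `"inclusive"` quartile method of Python's `statistics` module (other quartile conventions shift the fences slightly).

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # invented data with one extreme value

q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
potential_outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here Q1 is 3.25 and Q3 is 7.75, giving fences of -3.5 and 14.5, so only the value 100 is flagged.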
Experimental unit
The smallest unit to which a treatment is applied in an experiment.
Blinding (in experiment)
The practice of keeping subjects and/or researchers unaware of the treatment assignments, to prevent bias from expectations.