Flashcards for vocabulary review of descriptive statistics concepts.
Descriptive Statistics
Techniques that summarise and represent data in a way that humans can easily interpret. Includes measures of central tendency, dispersion, and association.
Measures of Central Tendency
Mean, Median, Mode
Mean
Arithmetic average: sum of values divided by number of observations. Used with interval/ratio data. Affected by outliers.
Median
Middle value when data is ordered. Best used when data is skewed or has outliers. Appropriate for ordinal and continuous data.
Mode
Most frequently occurring value. Only option for nominal data. Can be used when grouping continuous data into categories.
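A minimal sketch of the three measures using Python's standard `statistics` module. The sample values and colour labels are hypothetical, chosen so that one extreme value shows how the mean is pulled by outliers while the median is not.

```python
import statistics

# Hypothetical sample with one extreme value (90) to show its pull on the mean.
values = [2, 3, 3, 4, 5, 6, 90]

mean = statistics.mean(values)      # sum of values / number of observations
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value

print(f"mean={mean:.2f} median={median} mode={mode}")

# The mode also works on nominal (category) data, where mean and median do not.
colours = ["red", "blue", "red", "green"]
colour_mode = statistics.mode(colours)
print(colour_mode)
```

Here the mean (about 16.14) sits far above the median (4) because of the single value 90.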
Statistical Power of the Mean
It includes all data points in the calculation and is used in inferential tests.
Measures of Dispersion
Range, Interquartile Range (IQR), Variance, Standard Deviation (SD), Coefficient of Variation (CV)
Range
Difference between highest and lowest values. Simple but sensitive to outliers.
Interquartile Range (IQR)
Spread of the middle 50% of data. Calculated as Q3 - Q1. Used with box plots.
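As a rough illustration, the range and IQR can be computed with the standard library. The data below is hypothetical, and `statistics.quantiles` uses its default "exclusive" method here; other quartile conventions can give slightly different Q1 and Q3 values.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]  # hypothetical; 100 is an outlier

# Range: highest value minus lowest value.
data_range = max(values) - min(values)

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(f"range={data_range} Q1={q1} Q3={q3} IQR={iqr}")
```

The single outlier stretches the range to 99, while the IQR stays at 6 because it only looks at the middle 50% of the data.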
Variance
Average of squared deviations from the mean. Not directly interpretable (unit is squared).
Standard Deviation (SD)
Square root of variance. Indicates the typical deviation from the mean, in the same units as the data. Used in inferential tests.
Coefficient of Variation (CV)
(SD ÷ mean) × 100, expressed as a percentage. Compares relative spread between datasets. Useful for comparing different variables or time periods.
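A short sketch of variance, SD and CV on hypothetical data. It assumes the population formulas (dividing by n), which match the "average of squared deviations" wording; the sample versions `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical ratio-level data

mean = statistics.mean(values)
variance = statistics.pvariance(values)  # mean of squared deviations (squared units)
sd = statistics.pstdev(values)           # square root of the variance (original units)
cv = sd / mean * 100                     # relative spread, as a percentage

print(f"mean={mean} variance={variance} sd={sd} cv={cv:.1f}%")
```

For this data the mean is 5, the variance is 4, the SD is 2 and the CV is 40%.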
Why Square Deviations in Variance?
To avoid positive and negative values cancelling each other out.
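The cancelling effect can be checked directly. This sketch uses hypothetical values whose mean is 5 and compares the sum of raw deviations with the sum of squared deviations.

```python
values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
mean = sum(values) / len(values)

deviations = [v - mean for v in values]
sum_raw = sum(deviations)                      # positives and negatives cancel
sum_squared = sum(d ** 2 for d in deviations)  # squaring keeps every term non-negative

print(sum_raw, sum_squared)
```

The raw deviations sum to zero for any dataset, so they say nothing about spread; the squared deviations sum to 32 here, which divided by 8 observations gives a variance of 4.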
Interpreting SD
Smaller SD = data is tightly clustered around mean. Larger SD = data is more spread out.
Chi-Squared Test
Tests for an association between two categorical variables by comparing observed with expected frequencies. Uses degrees of freedom (df) and critical value tables.
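The observed-versus-expected comparison can be sketched by hand as follows. The function name `chi_squared` and the 2x2 table of counts are hypothetical, and no continuity correction is applied; a statistics package would normally be used in practice.

```python
def chi_squared(observed):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected

    # df = (number of rows - 1) x (number of columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical 2x2 table of observed counts.
table = [[30, 10],
         [20, 40]]
statistic, df = chi_squared(table)
print(f"chi-squared={statistic:.2f} df={df}")
```

With df = 1, the tabulated critical value at the 0.05 level is 3.841. The statistic here (about 16.67) exceeds it, so this made-up table would suggest an association.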
Pearson’s Correlation Coefficient (r)
Measures strength and direction of linear relationship between two continuous variables. Ranges from -1 (perfect negative) to +1 (perfect positive). 0 = no linear relationship.
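A minimal hand-rolled sketch of r, assuming two equal-length lists of numbers. The helper name `pearson_r` and the data are hypothetical. It divides the sum of products of deviations by the square root of the product of the two sums of squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

hours = [1, 2, 3, 4, 5]    # hypothetical x values
scores = [2, 4, 5, 4, 5]   # hypothetical y values

r = pearson_r(hours, scores)                                # positive, but not perfect
r_perfect = pearson_r(hours, [2 * h + 1 for h in hours])    # exact increasing line
r_negative = pearson_r(hours, [-h for h in hours])          # exact decreasing line

print(round(r, 3), r_perfect, r_negative)
```

The first pair gives r of roughly 0.77, while the two exact straight lines give +1 and -1.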
Correlation vs. Causation
Correlation shows that two variables are related. Causation means one variable causes the other, which correlation alone does not prove.
Skewness
Indicates asymmetry in a distribution. Positive skew: tail to the right (typically mean > median). Negative skew: tail to the left (typically mean < median).
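One way to see the direction of skew is to compare the mean with the median and compute a skewness coefficient. The sketch below uses the third standardised moment, which is one common definition and an assumption here; the helper name `moment_skewness` and both datasets are hypothetical.

```python
import statistics

def moment_skewness(values):
    """Third standardised moment: positive for a right tail, negative for a left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # hypothetical; long right tail
left_skewed = [21 - v for v in right_skewed]   # mirror image; long left tail

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, statistics.mean(data), statistics.median(data),
          round(moment_skewness(data), 2))
```

In the right-skewed sample the mean (4.75) is above the median (3); in the mirrored sample the mean (16.25) is below the median (18).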
Normal Distribution
Bell-shaped, symmetrical. Mean = Median = Mode. Follows the empirical rule (68%-95%-99.7% within 1, 2, 3 SDs).
Empirical Rule
For normally distributed data, about 68% of values fall within ±1 SD of the mean, 95% within ±2 SD, and 99.7% within ±3 SD.
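The three percentages can be checked against the standard normal distribution with `statistics.NormalDist` from the standard library. This is a quick numerical check, not a derivation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Probability mass between -k and +k standard deviations of the mean.
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k} SD: {shares[k]:.1%}")
```

The computed shares are roughly 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.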
Importance of Normality
Many inferential tests assume normally distributed data.
Box Plot Usefulness
Visualises the median and IQR and helps identify outliers. Also helps assess symmetry/skew.
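The numbers a box plot displays can be sketched without plotting. The outlier rule used below (points more than 1.5 × IQR beyond Q1 or Q3) is a common convention and an assumption here, not something the flashcards state; the helper name `box_plot_summary` and the data are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles, IQR and points outside the 1.5 x IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # assumed convention, not from the flashcards
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

values = [12, 15, 16, 18, 19, 20, 22, 23, 25, 27, 60]  # hypothetical data
summary = box_plot_summary(values)
print(summary)
```

For this data the box spans 16 to 25 with the median at 20, and 60 is flagged because it lies above the upper fence of 38.5.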
When to Use Median Instead of Mean
When the data is skewed or contains outliers.
Best Data for CV
Ratio-level data with a meaningful zero.
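A small sketch of why the zero point matters, using hypothetical temperatures. Celsius has an arbitrary zero (interval scale) while Kelvin has a true zero (ratio scale), so the same readings give very different CVs even though their SD is identical. The helper name `cv_percent` is hypothetical.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: (SD / mean) x 100, using the population SD."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

celsius = [10.0, 20.0, 30.0]             # hypothetical temperatures
kelvin = [c + 273.15 for c in celsius]   # same temperatures on a true-zero scale

cv_celsius = cv_percent(celsius)
cv_kelvin = cv_percent(kelvin)
print(f"CV in Celsius: {cv_celsius:.1f}%  CV in Kelvin: {cv_kelvin:.1f}%")
```

The Celsius figure (about 40.8%) depends on where that scale happens to put its zero, so only the Kelvin figure (about 2.8%) is a meaningful relative spread.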