Statistics
The science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Variable
A characteristic or attribute that can assume different values.
Data
Facts or information collected for reference or analysis.
Descriptive Statistics
Consists of the collection, organization, summarization, and presentation of data.
Inferential Statistics
Consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
Population
Consists of all subjects (human or otherwise) that are being studied.
Sample
A group of subjects selected from a population.
Statistic
A characteristic or measure obtained by using the data values from a sample.
Parameter
A characteristic or measure obtained by using all the data values from a specific population.
Numerical Data (Quantitative Data)
Data whose values are numbers or quantities (e.g. 3.5 years, 1.2 kg, 4 ms).
Categorical Data (Qualitative Data)
Data whose values are not numerical in nature but relate to categories (e.g. sex, eye colour, type of policy).
Discrete Data
Numerical data that can only take particular distinct numerical values.
Continuous Data
Numerical data that can take any numerical value in a specified range, e.g. (0,1) or (-∞, ∞).
Attribute (Dichotomous) Data
Categorical data values that have only two categories, e.g. claim/no claim, dead/alive, or male/female.
Nominal Data
Categorical data values that cannot be ordered in any natural way, e.g. type of insurance policy or nature of claim.
Ordinal Data
Categorical data values that can be ordered in a natural way, e.g. exam grades (A, B, C,…) or level of agreement.
Frequency Distribution
A table showing the number of times that each value in a data set has been observed; suitable for categorical or discrete numerical data.
Cumulative Frequency
The sum of all the frequencies up to and including the current point.
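As a minimal sketch of the two cards above, the snippet below builds a frequency distribution and its cumulative frequencies. The `claims` data is hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: number of claims made on each of 10 policies.
claims = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]

counts = Counter(claims)
values = sorted(counts)                      # distinct observed values, ascending
frequencies = [counts[v] for v in values]    # how often each value occurs
cumulative = list(accumulate(frequencies))   # running total of the frequencies

for v, f, cf in zip(values, frequencies, cumulative):
    print(f"value={v}  frequency={f}  cumulative={cf}")
```

The last cumulative frequency always equals the number of observations, which is a quick check on the table.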
Bar Chart
A diagram used for discrete numerical or categorical data where a bar is drawn for each value to show its frequency.
Histogram
A diagram used for continuous data with no spaces between bars; the vertical axis shows frequency density, not frequency.
Frequency Density
Frequency divided by class width; used as the height of bars in a histogram so that bar area equals frequency.
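The following sketch shows how frequency density could be computed for classes of unequal width. The `classes` table is hypothetical; each entry is (lower bound, upper bound, frequency).

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency) per class.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

# Frequency density = frequency / class width.
densities = [freq / (upper - lower) for lower, upper, freq in classes]

# Bar area = density x width, which recovers the class frequency.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]

for (lower, upper, freq), d in zip(classes, densities):
    print(f"{lower}-{upper}: frequency={freq}, density={d}")
```

The widest class (20 to 40) has a lower bar than its frequency alone would suggest, which is the reason a histogram plots density rather than frequency.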
Cumulative Frequency Curve (Ogive)
A graph constructed by plotting cumulative frequencies against the upper limit of each class and joining the points with a smooth curve.
Skewness
A measure of the asymmetry of a data set; the greater the asymmetry, the greater the magnitude of the skewness.
Positively Skewed Distribution
A distribution with a longer tail to the right; typically mode < median < mean.
Negatively Skewed Distribution
A distribution with a longer tail to the left; typically mode > median > mean.
Symmetric Distribution
A distribution whose two halves are mirror images about its centre; mean = median, and for a single-peaked distribution mode = median = mean.
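A small illustration of the skewness cards above, using a hypothetical right-skewed sample. The third central moment is used here as one possible skewness measure; that choice is an assumption, since these cards do not give a skewness formula.

```python
import statistics as st

# Hypothetical right-skewed sample: one large value stretches the right tail.
data = [1, 2, 2, 3, 4, 12]

mode = st.mode(data)       # 2
median = st.median(data)   # 2.5
mean = st.mean(data)       # 4

# Third central moment (an assumed skewness measure, not defined in these cards):
# positive for a longer right tail, negative for a longer left tail.
third_moment = sum((x - mean) ** 3 for x in data) / len(data)

print(mode, median, mean, third_moment)
```

Here mode < median < mean and the third central moment is positive, matching the positively skewed pattern.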
Sample Mode
The value in a data set with the highest frequency; the value that occurs most often.
Modal Group
For grouped data, the class interval with the highest frequency.
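A brief sketch of finding the mode and the modal group, on hypothetical data. `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, which shows that the mode need not be unique.

```python
import statistics as st

# Hypothetical raw data with two values tied for the highest frequency.
data = [1, 2, 2, 3, 3, 4]
modes = st.multimode(data)   # all values with the highest frequency

# Hypothetical grouped data: (lower bound, upper bound, frequency) per class.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]
modal_group = max(classes, key=lambda c: c[2])   # class with the highest frequency

print(modes, modal_group)
```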
Sample Mean Formula
x̄ = (x₁ + x₂ + … + xₙ) / n = (1/n) × Σxᵢ; obtained by summing all observations and dividing by the number of observations.
Mean from Frequency Distribution
x̄ = Σ(xᵢfᵢ) / Σfᵢ; used when data is presented as a frequency table. For grouped data, take xᵢ as the midpoint of each class, which gives an estimate of the mean.
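As a worked sketch of the frequency-table formula, the snippet below estimates the mean of hypothetical grouped data by using class midpoints as the xᵢ values.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency) per class.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]   # 5, 15, 30
freqs = [freq for _, _, freq in classes]

n = sum(freqs)                                                # total frequency
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n       # sum(x*f) / sum(f)

print(n, mean)
```

With these numbers the sum of xᵢfᵢ is 25 + 180 + 240 = 445, so the estimated mean is 445 / 25 = 17.8.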
Sample Median
The value that splits the data set into two equal halves when observations are ordered from smallest to largest.
Position of the Median
For n ordered observations, the median is at position (n+1)/2; if n is even, average the two middle values.
Lower Quartile (Q1)
The point one quarter of the way through the ordered data set; position = (n+1)/4.
Upper Quartile (Q3)
The point three quarters of the way through the ordered data set; position = 3(n+1)/4.
Interquartile Range (IQR)
A measure of spread equal to Q3 − Q1; not affected by extreme values.
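The position rules for the median and quartiles can be sketched as below. The helper `value_at_position` is hypothetical, and handling a fractional position by linear interpolation between the neighbouring observations is an assumed convention; other conventions exist and give slightly different quartiles.

```python
import math

def value_at_position(sorted_data, position):
    """Return the value at a 1-based, possibly fractional, position."""
    n = len(sorted_data)
    position = min(max(position, 1), n)   # keep the position inside the data
    lower = math.floor(position)
    upper = min(lower + 1, n)
    fraction = position - lower
    low_value = sorted_data[lower - 1]
    high_value = sorted_data[upper - 1]
    # Assumed convention: interpolate linearly between neighbouring values.
    return low_value + fraction * (high_value - low_value)

data = sorted([13, 2, 8, 4, 15, 5, 10, 7])       # hypothetical sample, n = 8
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)        # position 2.25
median = value_at_position(data, (n + 1) / 2)    # position 4.5
q3 = value_at_position(data, 3 * (n + 1) / 4)    # position 6.75
iqr = q3 - q1

print(q1, median, q3, iqr)
```

For an even n, a position ending in .5 makes the interpolation identical to averaging the two middle values.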
Range
The difference between the largest and smallest values in a data set; easy to calculate but affected by extreme values.
Sample Variance Formula
s² = (1/(n−1)) × Σ(xᵢ − x̄)²; measures approximately the average squared deviation from the mean (the divisor is n−1 rather than n).
Alternative Sample Variance Formula
s² = (1/(n−1)) × (Σxᵢ² − nx̄²); computationally more convenient than the deviation formula.
Sample Standard Deviation
s = √[(1/(n−1)) × Σ(xᵢ − x̄)²]; the positive square root of the sample variance, so it has the same units as the data values.
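A quick check, on hypothetical data, that the deviation formula and the alternative formula give the same sample variance, with the standard deviation taken as its square root.

```python
import math

data = [2, 4, 4, 5, 7, 8]   # hypothetical sample
n = len(data)
mean = sum(data) / n        # 30 / 6 = 5.0

# Deviation formula: sum of squared deviations divided by n - 1.
var_deviation = sum((x - mean) ** 2 for x in data) / (n - 1)

# Alternative formula: (sum of squares - n * mean^2) divided by n - 1.
var_alternative = (sum(x * x for x in data) - n * mean ** 2) / (n - 1)

sd = math.sqrt(var_deviation)

print(var_deviation, var_alternative, sd)
```

Both routes give 24 / 5 = 4.8 here. The alternative form needs only Σxᵢ and Σxᵢ², though in floating-point work it can lose precision when the mean is large relative to the spread.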
Variance from Frequency Distribution
s² = (1/(n−1)) × (Σxᵢ²fᵢ − nx̄²), where n = Σfᵢ; used when data is in a frequency table.
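The frequency-table version can be sketched the same way, taking n as the total frequency. The table below is hypothetical.

```python
# Hypothetical frequency table: each value and how often it was observed.
values = [0, 1, 2, 3]
freqs = [4, 3, 2, 1]

n = sum(freqs)                                             # n is the total frequency
mean = sum(x * f for x, f in zip(values, freqs)) / n       # 10 / 10 = 1.0
sum_sq = sum(x * x * f for x, f in zip(values, freqs))     # 0 + 3 + 8 + 9 = 20

variance = (sum_sq - n * mean ** 2) / (n - 1)              # (20 - 10) / 9

print(n, mean, variance)
```

Expanding the table back into the 10 raw observations and applying the ordinary sample variance formula gives the same result.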
Effect of Adding a Constant on Location
If each value is increased by a constant a, the mode, mean, and median are each increased by a; spread measures are unchanged.
Effect of Multiplying by a Constant on Location
If each value is multiplied by b, the mode, mean, and median are each multiplied by b.
Effect of Multiplying by a Constant on Spread
If each value is multiplied by b: range, IQR, and standard deviation are multiplied by |b|; variance is multiplied by b²; skewness, when measured as the third central moment, by b³.
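The three transformation cards above can be checked numerically. This is a sketch on hypothetical data with assumed constants a = 10 and b = 3 (b is taken as positive to keep the comparison simple); `summary` is a helper introduced only for this example.

```python
import statistics as st

x = [2, 4, 4, 5, 7, 8]   # hypothetical sample
a, b = 10, 3             # assumed constants, with b > 0

shifted = [xi + a for xi in x]   # add a to every value
scaled = [xi * b for xi in x]    # multiply every value by b

def summary(data):
    """Collect location and spread measures for one data set."""
    return {
        "mode": st.mode(data),
        "median": st.median(data),
        "mean": st.mean(data),
        "range": max(data) - min(data),
        "sd": st.stdev(data),
        "variance": st.variance(data),
    }

base, plus_a, times_b = summary(x), summary(shifted), summary(scaled)

for name in base:
    print(name, base[name], plus_a[name], times_b[name])
```

Adding a moves the mode, median, and mean by a and leaves the range, standard deviation, and variance alone; multiplying by b scales everything by b except the variance, which scales by b².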
Advantage of the Mode
Easy to calculate and not affected by extreme values.
Disadvantage of the Mode
May not be unique, may not exist, focuses on few values, and has no simple algebraic formula for further use.
Advantage of the Mean
Uses all data values and has mathematical properties useful in further calculations.
Disadvantage of the Mean
Can be distorted by extreme values (outliers).
Advantage of the Median
Not affected by extreme values.
Disadvantage of the Median
Does not use all data values and has no simple algebraic formula for further calculations.
Estimating the Median from Grouped Data
Identify the interval where cumulative frequency reaches n/2, then use linear interpolation to estimate the median value.
Estimating Quartiles from Grouped Data
For Q1, find the interval where cumulative frequency reaches n/4; for Q3, where it reaches 3n/4; then use linear interpolation.
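The interpolation steps for grouped data can be sketched as below. The function `grouped_quantile` and the `classes` table are hypothetical; the sketch assumes observations are spread evenly within each class, which is what linear interpolation implies.

```python
def grouped_quantile(classes, target):
    """Estimate the value at which cumulative frequency reaches `target`.

    `classes` is a list of (lower bound, upper bound, frequency) tuples
    in ascending order.
    """
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # Fraction of the way through this class needed to reach the target.
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("target exceeds the total frequency")

# Hypothetical grouped data: cumulative frequencies are 5, 17, 25.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]
n = sum(freq for _, _, freq in classes)

q1 = grouped_quantile(classes, n / 4)          # target 6.25, falls in 10-20
median = grouped_quantile(classes, n / 2)      # target 12.5, falls in 10-20
q3 = grouped_quantile(classes, 3 * n / 4)      # target 18.75, falls in 20-40

print(q1, median, q3)
```

For the median, the target 12.5 lies 7.5 observations into the 10 to 20 class of 12, so the estimate is 10 + (7.5 / 12) × 10 = 16.25.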