A comprehensive set of vocabulary flashcards covering major concepts, graphical tools, and numerical measures introduced in the lecture on descriptive statistics and exploratory data analysis.
Descriptive Statistics
Methods that summarize or describe the important characteristics of a known data set.
Inferential Statistics
Techniques that use sample data to make generalizations or inferences about a population.
Center (of a data set)
A representative value, such as the mean or median, indicating where the middle of the data lies.
Variation
A measure of how much the data values differ from one another.
Distribution (shape)
The overall pattern or form of how data values are spread, e.g., bell-shaped, uniform, or skewed.
Outlier
A data value that lies far from the majority of other observations.
Frequency Table
A table that lists classes (categories) of values with the count of observations in each class.
Lower Class Limit
The smallest value that can belong to a particular class in a frequency table.
Upper Class Limit
The largest value that can belong to a particular class in a frequency table.
Class Boundary
The value that separates adjacent classes; halfway between upper limit of one class and lower limit of the next.
Class Midpoint
The average of the upper and lower class limits; used as a representative value for the class.
Class Width
The difference between two consecutive lower class limits (or between two consecutive lower class boundaries).
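The class definitions above can be checked with a few lines of arithmetic. The following is a minimal sketch; the three classes (10-19, 20-29, 30-39) are hypothetical and chosen only for illustration.

```python
# Hypothetical classes given as (lower class limit, upper class limit).
classes = [(10, 19), (20, 29), (30, 39)]

# Class boundary: halfway between one class's upper limit and the next class's lower limit.
boundaries = [
    (classes[i][1] + classes[i + 1][0]) / 2
    for i in range(len(classes) - 1)
]

# Class midpoint: average of the lower and upper class limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Class width: difference between two consecutive lower class limits.
class_width = classes[1][0] - classes[0][0]

print(boundaries)   # [19.5, 29.5]
print(midpoints)    # [14.5, 24.5, 34.5]
print(class_width)  # 10
```

Note that the width is 10, not 19 − 10 = 9: it is measured between consecutive lower limits, not within a single class.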
Relative Frequency
Class frequency divided by the sum of all frequencies, expressed as a proportion or a percentage.
Cumulative Frequency
Running total of frequencies for classes up to a given point.
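As an illustration of how frequency, relative frequency, and cumulative frequency fit together, here is a small sketch. The data values, the lower class limits, and the helper name `build_frequency_table` are assumptions made for this example, and it assumes integer data so that each upper class limit is one less than the next lower limit.

```python
from itertools import accumulate

# Hypothetical integer data and lower class limits (illustration only).
data = [12, 15, 21, 24, 27, 30, 33, 35, 38, 44]
lower_limits = [10, 20, 30, 40]
class_width = lower_limits[1] - lower_limits[0]

def build_frequency_table(values, lower_limits, width):
    """Return one row per class with frequency, relative and cumulative frequency."""
    # Count the values falling in each class [lower, lower + width).
    freqs = [
        sum(1 for v in values if lower <= v < lower + width)
        for lower in lower_limits
    ]
    total = sum(freqs)
    cumulative = list(accumulate(freqs))  # running totals
    return [
        {
            "class": (lower, lower + width - 1),  # integer data assumed
            "frequency": f,
            "relative": f / total,
            "cumulative": c,
        }
        for lower, f, c in zip(lower_limits, freqs, cumulative)
    ]

table = build_frequency_table(data, lower_limits, class_width)
for row in table:
    print(row)
```

The last cumulative frequency always equals the total number of observations, which is a quick sanity check on any frequency table.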
Histogram
A bar graph of class frequencies where bars touch, depicting the distribution of quantitative data.
Relative Frequency Histogram
A histogram whose vertical axis shows relative frequencies instead of raw counts.
Dot Plot
A simple graph placing a dot for each data value above a number line.
Stem-and-Leaf Plot
A display that separates each data value into a stem (leading digits) and leaf (trailing digit).
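A stem-and-leaf plot is easy to build by hand, and the same splitting can be sketched in code. This example assumes non-negative integers whose leaf is the ones digit; the data and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's trailing digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 27 -> stem 2, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data, for illustration only.
data = [44, 12, 35, 21, 30, 27, 15, 38, 24, 33]
plot = stem_and_leaf(data)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Unlike a histogram, this display keeps every original data value recoverable from the plot.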
Scatter Diagram
A graph of paired (x, y) data points used to assess relationships between two variables.
Mean (Arithmetic Mean)
The sum of all data values divided by the number of values; the average.
Population Mean (µ)
The mean of all values in an entire population, denoted by the Greek letter mu (µ).
Sample Mean (x̄)
The mean of values in a sample, denoted by x-bar (x̄).
Σ (Sigma)
Mathematical symbol indicating the summation of a set of numbers.
n
The number of data values in a sample.
N
The number of data values in a population.
Median
The middle value when data are ordered (or the mean of the two middle values when the number of values is even); the 50th percentile, denoted by x̃ (x-tilde).
Mode
The data value that occurs most frequently in a set.
Bimodal
A distribution with exactly two modes.
Multimodal
A distribution with more than two modes.
Midrange
The value halfway between the highest and lowest data values; (max + min)/2.
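The four measures of center defined above can be compared on one small data set. This is a minimal sketch using Python's standard `statistics` module (`multimode` needs Python 3.8 or later); the data values are hypothetical.

```python
import statistics

# Hypothetical data set, for illustration only.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)               # sum of values / number of values
median = statistics.median(data)           # average of the two middle values here
modes = statistics.multimode(data)         # list of all most-frequent values
midrange = (max(data) + min(data)) / 2     # (max + min) / 2

print(mean, median, modes, midrange)  # 5 4.0 [3] 6.0
```

The four values differ (5, 4, 3, and 6) because the data are stretched toward the high end, which pulls the mean and midrange above the median.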
Symmetric Distribution
A distribution whose left half mirrors its right half.
Skewed Distribution
A distribution that stretches further on one side; can be skewed left (negative) or right (positive).
Range
The difference between the highest and lowest data values.
Standard Deviation
A measure of how far data values typically deviate from the mean.
Sample Standard Deviation (s)
Standard deviation of sample data; denominator uses (n − 1).
Population Standard Deviation (σ)
Standard deviation of all population data; denominator uses N.
Variance
The square of the standard deviation (s² for a sample, σ² for a population).
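The only difference between the sample and population formulas is the denominator. The sketch below computes both by hand on a hypothetical data set and can be cross-checked against the standard library's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

# Hypothetical data set, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)

# Treating the data as an entire population: divide by N.
population_variance = squared_deviations / n
population_sd = math.sqrt(population_variance)

# Treating the data as a sample: divide by n - 1.
sample_variance = squared_deviations / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)  # 4.0 2.0
print(sample_variance, sample_sd)

# Cross-check against the standard library.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

Dividing by n − 1 makes the sample value slightly larger than the population value computed from the same numbers.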
Range Rule of Thumb
Approximation that range ≈ 4 × standard deviation for many data sets.
Empirical Rule
For bell-shaped data: about 68% of values lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
Chebyshev’s Theorem
In any data set, the proportion of values lying within K standard deviations of the mean is at least 1 − 1/K², for any K > 1.
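To see how Chebyshev's guarantee compares with what a data set actually does, the following sketch computes the lower bound 1 − 1/K² and the observed proportion within K standard deviations. The data and the helper names `chebyshev_lower_bound` and `proportion_within` are hypothetical, and the data are treated as a full population.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

def proportion_within(values, k):
    """Observed proportion of values within k population standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for x in values if abs(x - mu) <= k * sigma)
    return inside / len(values)

# Hypothetical data set (mean 5, population standard deviation 2).
data = [2, 4, 4, 4, 5, 5, 7, 9]

bound = chebyshev_lower_bound(2)      # 0.75
observed = proportion_within(data, 2)  # 1.0: every value lies in [1, 9]
print(bound, observed)
```

The observed proportion is usually well above the bound, because Chebyshev's theorem must hold for every possible distribution, while the Empirical Rule gives sharper figures only for bell-shaped data.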
z-Score
Number of standard deviations a value x lies above or below the mean; z = (x − x̄)/s for a sample, or z = (x − µ)/σ for a population.
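A short sketch of the sample form of the z-score, using the standard library's `mean` and `stdev`. The sample values and the function name `z_score` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

def z_score(x, values):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    return (x - mean(values)) / stdev(values)

z_max = z_score(9, sample)   # positive: 9 is above the mean
z_min = z_score(2, sample)   # negative: 2 is below the mean
z_mean = z_score(5, sample)  # zero: 5 equals the mean
print(z_max, z_min, z_mean)
```

The sign tells which side of the mean the value falls on, and the magnitude tells how many standard deviations away it is.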
Quartiles (Q1, Q2, Q3)
Values that divide ordered data into four equal parts; Q2 is the median.
Deciles
Cut points that divide ordered data into ten equal parts (D1–D9).
Percentile
A value below which a given percentage of the data values fall; e.g., P25 = Q1.
Five-Number Summary
Minimum, Q1, Median (Q2), Q3, Maximum.
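A five-number summary can be sketched with the standard library's `statistics.quantiles`. Textbooks and software use several different quartile conventions, so the choice of `method="inclusive"` here is an assumption and may not match the procedure taught in the lecture; the data are hypothetical.

```python
from statistics import quantiles

# Hypothetical ordered data set, for illustration only.
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # (1, 5.0, 9.0, 13.0, 17)
```

With a different convention (for example `method="exclusive"`), Q1 and Q3 can shift slightly, while the minimum, median, and maximum stay the same for this data set.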
Boxplot (Box-and-Whisker)
Graph of the five-number summary that shows center and spread and helps reveal outliers.
Exploratory Data Analysis (EDA)
Use of graphical and numerical tools to understand data characteristics before formal inference.