Flashcards covering key vocabulary related to data summarization, types of variables, measures of center, measures of variability, and graphical summaries in statistics.
Descriptive statistics
The field of statistics that summarizes and describes data, often the first step in statistical analysis.
Data summarization
The process of providing a concise description of a dataset, making it easier to understand and discover information.
Raw data
Data as it is collected before any interpretation or summarization.
Qualitative variables (Categorical variables)
Variables whose outcomes are descriptive and non-numerical.
Quantitative variables (Numerical variables)
Variables whose outcomes are numerical, such as height and arm span.
Dichotomous variables
A type of qualitative variable with only two possible outcomes (e.g., gender, allergies).
Nominal variables
A type of qualitative variable whose response options have no natural order or ranking (e.g., marital status).
Ordinal variables
A type of qualitative variable whose response options have an inherent order or ranking (e.g., blood pressure categories such as low, normal, and high).
Frequency distribution table
A table commonly used to summarize qualitative data numerically, usually containing class, frequency, and relative frequency.
Class (in FDT)
One of the categories into which qualitative data can be classified in a frequency distribution table.
Frequency (in FDT)
The number of times a specific category appears in the data set.
Relative frequency (Percentage)
The proportion of times a category appears in the data set, calculated as frequency divided by the total number of observations; multiplying by 100 expresses it as a percentage.
Cumulative frequency
An additional column in frequency distribution tables for ordinal data, showing the sum of frequencies up to a certain class.
Cumulative relative frequency
An additional column in frequency distribution tables for ordinal data, showing the sum of relative frequencies up to a certain class.
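As a rough illustration of how the frequency, relative frequency, and cumulative columns fit together, here is a minimal Python sketch. The responses, the category order, and the function name `frequency_table` are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical ordinal categories, listed from lowest to highest.
order = ["low", "normal", "high"]
# Hypothetical raw data: one response per observation.
data = ["normal", "high", "low", "normal", "normal", "high", "low", "normal"]

def frequency_table(values, classes):
    """Build one row per class with frequency, relative frequency,
    and the two cumulative columns used for ordinal data."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cum_freq = 0
    for cls in classes:
        freq = counts[cls]          # Counter returns 0 for unseen classes
        cum_freq += freq            # running total up to this class
        rows.append({
            "class": cls,
            "frequency": freq,
            "relative_frequency": freq / n,
            "cumulative_frequency": cum_freq,
            "cumulative_relative_frequency": cum_freq / n,
        })
    return rows

table = frequency_table(data, order)
for row in table:
    print(row)
```

The cumulative columns only make sense here because the classes are passed in their natural order; for nominal data there is no meaningful "up to a certain class".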
Bar graph
A chart where the horizontal axis lists categories and the vertical axis corresponds to frequencies or percentages, displayed by heights of vertical bars.
Pareto chart
A type of bar graph that displays categories in descending order of frequency, helping to understand their relative importance.
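The ordering step behind a Pareto chart can be sketched in a few lines of Python. The defect categories below are made up for illustration, and the sketch only produces the sorted categories and counts, not the drawing itself.

```python
from collections import Counter

# Hypothetical qualitative data: one defect type per inspected item.
defects = ["scratch", "dent", "scratch", "crack", "scratch",
           "dent", "scratch", "dent", "scratch"]

# most_common() returns (category, frequency) pairs sorted by
# descending frequency, which is the bar order of a Pareto chart.
pareto_order = Counter(defects).most_common()

for category, freq in pareto_order:
    print(f"{category:8s} {freq}")
```

The resulting list can be handed to any plotting tool to draw the bars from left to right.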
Pie graph (Pie chart)
A circle divided into sections or wedges according to the percentage of each category in qualitative data.
Mean
The average value of the data, found by adding up all values and dividing by the number of observations.
Median
The middle observation of a dataset once arranged in order (the average of the two middle observations when the number of observations is even), dividing the observations such that 50% fall below and 50% fall above it.
Mode
The most frequent value in a dataset.
Resistant measure
A statistical measure that is not easily affected by the presence of outliers (e.g., the median).
Outliers
Extremely high or extremely low data values that fall well outside the overall pattern of the data.
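To see why the median is described as resistant while the mean is not, the following sketch computes the three measures of center with Python's standard `statistics` module, then adds a single extreme value. The data values are hypothetical.

```python
import statistics

# Hypothetical small dataset.
data = [2, 3, 3, 5, 7]

mean_before = statistics.mean(data)      # sum of values / number of values
median_before = statistics.median(data)  # middle value of the sorted data
mode_before = statistics.mode(data)      # most frequent value

# Add one extremely high value (an outlier) and recompute.
with_outlier = data + [100]
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before)
```

In this example the single outlier moves the mean from 4 to 20, while the median only shifts from 3 to 4.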
Variability
The spread or dispersion in a dataset, indicating how much data values differ from each other.
Range
The difference between the largest (maximum) and the smallest (minimum) values in a data set.
Sample variance (s^2)
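The sum of the squared distances of each data value from the sample mean, divided by n - 1 (one less than the number of observations); roughly the average squared distance from the mean.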
Sample standard deviation (s)
The square root of the sample variance, expressed in the same units as the data; the most widely used measure of variability for a continuous variable.
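As a small worked example, the sketch below computes the sample variance by hand, using the n - 1 divisor that is standard for a sample, and checks it against `statistics.variance` and `statistics.stdev` from the Python standard library. The data and the helper name `sample_variance` are hypothetical.

```python
import math
import statistics

# Hypothetical sample of a continuous variable.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

s2 = sample_variance(data)   # sample variance s^2
s = math.sqrt(s2)            # sample standard deviation s

print("by hand:  ", s2, s)
print("statistics:", statistics.variance(data), statistics.stdev(data))
```

Here the mean is 5 and the squared deviations sum to 32, so s^2 = 32 / 7.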
Interquartile range (IQR)
The difference between the third and first quartiles (Q3 - Q1), covering the middle 50% of the data; useful for measuring variability when extreme values are present.
First quartile (Q1)
The value that holds 25% of the data values below it; it is the median of the lower half of the data.
Third quartile (Q3)
The value that holds 25% of the data values above it; it is the median of the upper half of the data.
Lower fence (for outliers)
A boundary calculated as Q1 - 1.5 × IQR; data values below it are flagged as outliers.
Upper fence (for outliers)
A boundary calculated as Q3 + 1.5 × IQR; data values above it are flagged as outliers.
Box plots (Box-whisker plots)
Graphs commonly used for summarizing continuous data, based on a five-number summary.
Five-number summary
A set of five descriptive statistics for a dataset: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.
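A five-number summary can be assembled from the same ingredients. The sketch below again assumes the median-of-halves rule for quartiles (with the middle value excluded from both halves when the count is odd); the data and the function name `five_number_summary` are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).
    Assumes at least two observations and the median-of-halves
    quartile rule; other quartile conventions give other results."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

# Hypothetical data.
data = [7, 15, 36, 39, 40, 41]
summary = five_number_summary(data)
print(summary)
```

These five values are what a box plot draws: the box spans Q1 to Q3 with a line at the median.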