Data Summarization and Descriptive Statistics

Description

Flashcards covering key vocabulary related to data summarization, types of variables, measures of center, measures of variability, and graphical summaries in statistics.

33 Terms

1
Descriptive statistics

The field of statistics that summarizes and describes data, often the first step in statistical analysis.

2
Data summarization

The process of providing a concise description of a dataset, making it easier to understand and discover information.

3
Raw data

Data as it is collected before any interpretation or summarization.

4
Qualitative variables (Categorical variables)

Variables whose outcomes are descriptive categories or labels rather than numerical values.

5
Quantitative variables (Numerical variables)

Variables whose outcomes are numerical, such as height and arm span.

6
Dichotomous variables

A type of qualitative variable with only two possible outcomes (e.g., gender, or whether a person has allergies: yes/no).

7
Nominal variables

A type of qualitative variable whose response options have no natural order or ranking (e.g., marital status).

8
Ordinal variables

A type of qualitative variable whose response options have a natural order or ranking (e.g., blood pressure level recorded as low, normal, or high).

9
Frequency distribution table

A table commonly used to summarize qualitative data numerically, usually containing class, frequency, and relative frequency.

10
Class (in FDT)

One of the categories into which qualitative data can be classified in a frequency distribution table.

11
Frequency (in FDT)

The number of times a specific category appears in the data set.

12
Relative frequency (Percentage)

The proportion of times a category appears in the data set, calculated as the frequency divided by the total number of observations; multiplying this proportion by 100 expresses it as a percentage.
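A frequency distribution table with class, frequency, and relative frequency columns can be sketched with Python's `collections.Counter`. This is a minimal illustration, not a prescribed method: the function name `frequency_table` and the marital-status responses are hypothetical, and rows appear in the order each class is first seen.

```python
from collections import Counter

def frequency_table(data):
    """Return (class, frequency, relative frequency in %) rows for qualitative data.

    Hypothetical helper for illustration only.
    """
    counts = Counter(data)   # frequency: how many times each category occurs
    n = len(data)            # total number of observations
    return [(cls, freq, 100 * freq / n) for cls, freq in counts.items()]

# Hypothetical marital-status responses, invented for this example.
statuses = ["married", "single", "married", "divorced", "single", "married"]
for cls, freq, pct in frequency_table(statuses):
    print(f"{cls:<10} {freq:>3} {pct:6.1f}%")
```

The relative frequencies always add up to 100% (up to rounding), which is a quick check that the table covers every observation.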

13
Cumulative frequency

An additional column in frequency distribution tables for ordinal data, showing the running total of frequencies up to and including each class.

14
Cumulative relative frequency

An additional column in frequency distribution tables for ordinal data, showing the running total of relative frequencies up to and including each class; the last class always reaches 100%.
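The two cumulative columns can be sketched as running totals over the classes in their natural order. This is a minimal example under stated assumptions: the helper `cumulative_table`, the category labels, and the readings are all hypothetical, and the caller must supply the class order because ordinal data cannot be ordered alphabetically in general.

```python
from collections import Counter
from itertools import accumulate

def cumulative_table(data, order):
    """Rows of (class, frequency, cumulative frequency, cumulative relative frequency %).

    `order` lists the classes from lowest to highest rank (hypothetical helper).
    """
    counts = Counter(data)
    n = len(data)
    freqs = [counts[c] for c in order]      # 0 for a class that never occurs
    cum_freqs = list(accumulate(freqs))     # running total of frequencies
    return [(c, f, cf, 100 * cf / n) for c, f, cf in zip(order, freqs, cum_freqs)]

# Hypothetical ordinal data, invented for this example.
levels = ["low", "normal", "high"]
readings = ["normal", "high", "normal", "low", "normal", "high", "low", "normal"]
for row in cumulative_table(readings, levels):
    print(row)
```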

15
Bar Graph

A chart in which the horizontal axis lists the categories and the vertical axis shows frequencies or percentages, each represented by the height of a vertical bar.

16
Pareto chart

A type of bar graph that displays categories in descending order of frequency, helping to understand their relative importance.
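The ordering step behind a Pareto chart can be sketched with `Counter.most_common()`, which returns categories from most to least frequent. The helper name `pareto_order` and the complaint data are hypothetical; drawing the bars themselves is left to whatever plotting tool is in use.

```python
from collections import Counter

def pareto_order(data):
    """(category, frequency) pairs sorted from most to least frequent.

    Ties keep the order in which categories were first seen (hypothetical helper).
    """
    return Counter(data).most_common()

# Hypothetical customer-complaint categories, invented for this example.
complaints = ["late delivery", "damaged", "late delivery",
              "wrong item", "late delivery", "damaged"]
print(pareto_order(complaints))
```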

17
Pie graph (Pie chart)

A circle divided into wedges whose sizes are proportional to the percentage of each category in qualitative data.
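Since the full circle is 360 degrees, each wedge's angle is the category's share of the total times 360. The sketch below computes those angles; the function name `pie_angles` and the counts are hypothetical.

```python
def pie_angles(freqs):
    """Map each category to its wedge angle in degrees (hypothetical helper)."""
    total = sum(freqs.values())
    # Each wedge gets the same share of 360 degrees as the category's share of the data.
    return {cat: 360 * f / total for cat, f in freqs.items()}

# Hypothetical category counts, invented for this example.
print(pie_angles({"married": 3, "single": 2, "divorced": 1}))
```

For these counts the wedges are 180, 120, and 60 degrees, matching 50%, 33.3%, and 16.7% of the circle.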

18
Mean

The average value of the data, found by adding up all values and dividing by the number of observations.
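The definition translates directly into code: sum the values, then divide by the count. The function name `mean` and the height values are hypothetical; Python's standard library also offers `statistics.mean`.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of observations."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

heights_cm = [160, 172, 168, 181, 175]  # hypothetical sample
print(mean(heights_cm))  # 856 / 5 = 171.2
```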

19
Median

The middle observation of a dataset once arranged in order, dividing the observations so that 50% fall below it and 50% fall above it. With an even number of observations, the median is the average of the two middle values.
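A minimal sketch of the median: sort, then take the middle value, averaging the two middle values when the count is even. The function name `median` is hypothetical; `statistics.median` in Python's standard library follows the same convention.

```python
def median(values):
    """Middle value after sorting; for an even count, the average of the two middle values."""
    if not values:
        raise ValueError("median requires at least one value")
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))      # odd count: sorted [1, 5, 7] -> 5
print(median([7, 1, 5, 3]))   # even count: (3 + 5) / 2 -> 4.0
```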

20
Mode

The most frequent value in a dataset.
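A dataset can have more than one most frequent value, so the sketch below returns every value tied for the highest count. The name `modes` is hypothetical; Python's `statistics.multimode` behaves the same way. Because counting works on any values, the mode also applies to qualitative data.

```python
from collections import Counter

def modes(values):
    """All values that occur most often, in order of first appearance (hypothetical helper)."""
    if not values:
        raise ValueError("mode requires at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 5, 7]))   # two modes: [3, 5]
print(modes(["A", "B", "B"]))      # qualitative data works too: ['B']
```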

21
Resistant measure

A statistical measure that is not easily affected by the presence of outliers (e.g., the median).
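Resistance can be seen by adding a single extreme value to a small dataset and comparing how far the mean and the median move. The income figures below are hypothetical; the comparison uses Python's standard `statistics` module.

```python
from statistics import mean, median

incomes = [32, 35, 38, 40, 41]     # hypothetical incomes, in thousands
with_outlier = incomes + [400]     # the same data plus one extreme value

print("without outlier:", mean(incomes), median(incomes))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

The mean jumps from 37.2 to about 97.7, while the median moves only from 38 to 39, which is why the median is called a resistant measure.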

22
Outliers

Extremely high or extremely low data values that fall well outside the overall pattern of the data.

23
Variability

The spread or dispersion in a dataset, indicating how much data values differ from each other.

24
Range

The difference between the largest (maximum) and the smallest (minimum) values in a data set.
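The range is a one-line computation. The function name `data_range` and the arm-span values are hypothetical.

```python
def data_range(values):
    """Range = maximum - minimum (hypothetical helper)."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)

arm_spans_cm = [152, 168, 171, 159, 180]  # hypothetical sample
print(data_range(arm_spans_cm))  # 180 - 152 = 28
```

Because it depends only on the two most extreme values, a single outlier can change the range drastically.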

25
Sample variance (s^2)

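The sum of the squared distances of each data value from the mean, divided by n - 1 (one less than the number of observations): s^2 = Σ(x_i - x̄)^2 / (n - 1). It is roughly the average squared distance from the mean.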
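The formula can be sketched step by step: compute the mean, square each deviation from it, sum, and divide by n - 1. The function name `sample_variance` and the data are hypothetical; `statistics.variance` in Python's standard library also uses the n - 1 divisor.

```python
def sample_variance(values):
    """s^2 = sum((x - mean)^2) / (n - 1) (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n                           # sample mean
    squared_devs = [(x - xbar) ** 2 for x in values] # squared distances from the mean
    return sum(squared_devs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample; mean is 5
print(sample_variance(data))     # squared deviations sum to 32, so 32 / 7
```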

26
Sample standard deviation (s)

The square root of the sample variance. It is expressed in the same units as the data and is the most widely used measure of variability for a continuous variable.
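Taking the square root of the sample variance brings the measure back to the original units. The sketch repeats the variance step so it stands alone; the name `sample_std` and the data are hypothetical, and Python's `statistics.stdev` computes the same n - 1 version.

```python
import math

def sample_std(values):
    """s = sqrt(sum((x - mean)^2) / (n - 1)) (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    xbar = sum(values) / n
    s2 = sum((x - xbar) ** 2 for x in values) / (n - 1)  # sample variance
    return math.sqrt(s2)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(round(sample_std(data), 3))
```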

27
Interquartile range (IQR)

The difference between the third and first quartiles (Q3 - Q1). Because it covers only the middle 50% of the data, it is not affected by extreme values and is useful for measuring variability when outliers are present.

28
First quartile (Q1)

The value below which about 25% of the data values fall; it is the median of the lower half of the data.

29
Third quartile (Q3)

The value above which about 25% of the data values fall (so about 75% fall below it); it is the median of the upper half of the data.
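Q1, Q3, and the IQR can be sketched with the "median of each half" rule described above. One assumption is needed that the cards do not state: when n is odd, this sketch leaves the overall median out of both halves. Software packages use several different quartile conventions, so their results can differ slightly. The function names and data are hypothetical.

```python
def _median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """(Q1, Q3) as medians of the lower and upper halves.

    Assumption: for odd n, the overall median is excluded from both halves.
    """
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("quartiles need at least two values")
    half = n // 2
    lower = s[:half]
    upper = s[half + n % 2:]   # skip the middle value when n is odd
    return _median(lower), _median(upper)

data = [12, 7, 3, 15, 8, 10, 1, 6, 4]  # hypothetical sample, n = 9
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)  # Q1, Q3, and IQR
```

Here the sorted data are 1, 3, 4, 6, 7, 8, 10, 12, 15; the lower half 1, 3, 4, 6 gives Q1 = 3.5, the upper half 8, 10, 12, 15 gives Q3 = 11.0, and the IQR is 7.5.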

30
Lower fence (for outliers)

A boundary calculated as Q1 - 1.5 × IQR (the lower quartile minus 1.5 times the IQR); data values below it are identified as outliers.

31
Upper fence (for outliers)

A boundary calculated as Q3 + 1.5 × IQR (the upper quartile plus 1.5 times the IQR); data values above it are identified as outliers.
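Given Q1 and Q3, the two fences and the outlier check take only a few lines. The helper names and the data are hypothetical. For this data, Q1 = 3.5 and Q3 = 11.0 were found by hand with the median-of-halves rule (lower half 1, 3, 4, 6; upper half 8, 10, 12, 40, leaving out the middle value 7).

```python
def outlier_fences(q1, q3):
    """(lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR) (hypothetical helper)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [1, 3, 4, 6, 7, 8, 10, 12, 40]  # hypothetical sample
print(outlier_fences(3.5, 11.0))       # IQR = 7.5, so 3.5 - 11.25 and 11.0 + 11.25
print(flag_outliers(data, 3.5, 11.0))
```

With fences at -7.75 and 22.25, only the value 40 is flagged.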

32
Box plots (Box-whisker plots)

Graphs commonly used for summarizing continuous data, based on the five-number summary.

33
Five-number summary

A set of five descriptive statistics for a dataset: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.
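The five-number summary, the basis of a box plot, can be sketched as follows. The function names and data are hypothetical, and the quartiles use the median-of-halves rule with the assumption that for odd n the overall median is left out of both halves; other quartile conventions may give slightly different Q1 and Q3.

```python
def _median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) (hypothetical helper)."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = _median(s[:half])             # median of the lower half
    q3 = _median(s[half + n % 2:])     # median of the upper half
    return s[0], q1, _median(s), q3, s[-1]

print(five_number_summary([12, 7, 3, 15, 8, 10, 1, 6, 4]))
```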