The Nature of Data

Description and Tags

Flashcards covering fundamental concepts in data types, exploratory data analysis, population vs. sample, and measures of central tendency and spread based on lecture notes from STSTA 198CNL / Duke University / Fall 2025.

35 Terms

1

Exploratory data analysis (EDA)

An initial data analysis that summarizes main characteristics, often done through visual means or basic summary statistics.

2

Nominal data

Named categories with no inherent order or numeric meaning; when there are only two categories, the data are often called binary or dichotomous.

3

Binary data

A type of nominal data with only two categories.

4

Dichotomous data

Another term for binary data, referring to data with only two categories.

5

Ordinal data

Ordered categories where the size of the difference between levels is not easily measured, but the relative ordering of the levels is meaningful.

6

Categorical data

A broad type of data that includes nominal and ordinal data, consisting of categories.

7

Count data

Non-negative whole-number data recording how many times something occurs (e.g., number of alcoholic drinks consumed).

8

Rank data

Data representing a position in a sequence, derived from ordering a set of items by some characteristic.
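
As a small illustration of the definition above, ranks can be derived by sorting items on a characteristic and recording each item's position. The names and times below are hypothetical, and this sketch assumes no ties.

```python
# Hypothetical race times in seconds, for illustration only.
times = {"Ana": 12.4, "Ben": 11.9, "Cy": 13.1}

# Order the runners from fastest to slowest.
ordered = sorted(times, key=times.get)

# A runner's rank is their position in that ordering, starting at 1.
ranks = {name: position for position, name in enumerate(ordered, start=1)}
```

The ranks keep the ordering but discard how far apart the underlying times are.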

9

Continuous data

Measurable quantities where the difference between possible values can be arbitrarily small, and data might lie within a range or be unbounded.

10

Numeric data

A broad type of data that includes count/rank data and continuous data, consisting of numerical values.

11

Population

The entire group of individuals or items that the research question is interested in.

12

Sample

A subset of the population from which data is collected for analysis.

13

Parameters

Attributes of the population of interest, not computable directly (unless the entire population is perfectly measured), usually written in Greek letters.

14

Statistics

Attributes of a sample, computed as functions of the observed values at hand; usually written in Roman letters.

15

Sample mean

The arithmetic average of values in a sample, calculated as the sum of all values divided by the sample size.
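
The calculation described above can be sketched in a few lines. The sample values and the function name `sample_mean` are made up for illustration.

```python
# Hypothetical sample values, for illustration only.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]

def sample_mean(values):
    """Return the sum of all values divided by the sample size."""
    return sum(values) / len(values)

x_bar = sample_mean(sample)  # 25.0 / 5 = 5.0
```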

16

Population mean

The arithmetic average of values in an entire population.

17

Point estimate

A single value used to estimate an unknown population parameter, such as the sample mean estimating the population mean.
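
A rough simulation can make the relationship between a parameter and its point estimate concrete. Everything here is an assumption chosen for illustration: the simulated population, its size, the sample size of 100, and the random seed.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# A simulated "population" of 10,000 values (hypothetical, not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]

# The parameter: computable here only because the whole population is known.
population_mean = sum(population) / len(population)

# In practice only a sample is observed.
sample = random.sample(population, 100)

# The statistic, used as a point estimate of the population mean.
sample_mean = sum(sample) / len(sample)

estimation_error = sample_mean - population_mean
```

The point estimate will generally be close to, but not exactly equal to, the parameter, and a different sample would give a different estimate.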

18

Sample median

The 50th percentile of a sample; the middle value when observations are ranked numerically, so that about half of the values lie below it and half above it.

19

Percentile

The numeric value below which a specified percentage of the observations fall.

20

Robust to extreme values

Describes a statistic (like the median) that is less affected by outliers or extreme values in a dataset compared to others (like the mean).
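
To see this robustness in action, one can compare how the mean and the median move when a single extreme value is added. The numbers below are hypothetical.

```python
import statistics

# Hypothetical values, for illustration only.
values = [10, 12, 13, 15, 16]
with_outlier = values + [500]  # add one extreme observation

mean_before = statistics.mean(values)
median_before = statistics.median(values)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

mean_shift = mean_after - mean_before        # large: the mean is pulled toward 500
median_shift = median_after - median_before  # small: the median barely moves
```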

21

Sample mode

The most frequent value in a dataset, corresponding to 'peaks' in distributions.

22

Multimodal distribution

A distribution that has multiple peaks, indicating several frequent values.
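
For discrete data, a quick way to check for more than one most-frequent value is Python's `statistics.multimode`. The data are hypothetical, and for continuous data the peaks would normally be judged from a histogram or density plot instead.

```python
import statistics

# Hypothetical discrete values, for illustration only.
data = [1, 2, 2, 3, 5, 5, 6]

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(data)

# More than one mode suggests a multimodal sample.
is_multimodal = len(modes) > 1
```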

23

Sample minimum

The smallest observation in a dataset.

24

Sample maximum

The largest observation in a dataset.

25

Sample range

The difference between the sample maximum and minimum.

26

Quantiles

Cutpoints that divide data into equal-sized groups (e.g., tertiles, quartiles, quintiles, percentiles).

27

Interquartile range (IQR)

The width of the middle 50% of the data; the difference between the third and first quartiles.

28

Five-number summary

A set of five descriptive statistics for a dataset: the sample minimum, first quartile (Q1), median (Q2), third quartile (Q3), and sample maximum.
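
A minimal sketch of the five-number summary, together with the sample range and IQR, using Python's standard library. The data are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption; other software may use a different convention and report slightly different quartiles.

```python
import statistics

# Hypothetical sample values, for illustration only.
data = [1, 3, 5, 7, 9, 11, 13]

# With n=4, statistics.quantiles returns the three quartile cutpoints.
# method="inclusive" is one of several conventions for computing them.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
sample_range = max(data) - min(data)  # maximum minus minimum
iqr = q3 - q1                         # width of the middle 50% of the data
```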

29

Outliers

Observations numerically distant from others in a dataset, which should be noted and handled carefully.

30

Sample variance

Approximately the average squared deviation from the sample mean: the sum of squared deviations divided by n - 1 rather than n. Used to estimate the population variance.

31

Population variance

The average squared deviation from the mean for an entire population.

32

Sample standard deviation (SD)

The square root of the sample variance, providing a measure of spread in the same units as the original dataset.
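
The variance and standard deviation cards can be tied together with a short calculation. This sketch assumes the usual convention of dividing by n - 1 for a sample and by n for a population; the data are hypothetical.

```python
import math

# Hypothetical sample values, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1.
sample_variance = sum_sq_dev / (n - 1)

# Treating the same values as a full population would divide by n instead.
population_variance = sum_sq_dev / n

# The standard deviation is in the same units as the original data.
sample_sd = math.sqrt(sample_variance)
```

Python's `statistics.variance`, `statistics.pvariance`, and `statistics.stdev` compute the same three quantities.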

33

Skewed distribution

A distribution that is not symmetric, characterized by a 'tail' on either the right (right-skewed) or left (left-skewed) side.

34

Right-skewed distribution

A distribution with a tail extending to the right, meaning the majority of data points are concentrated on the lower end.

35

Chebyshev's inequality

A theorem stating that for any distribution with a finite mean and standard deviation, and any k > 1, the proportion of values within k standard deviations of the mean is at least 1 - 1/k^2.
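
The bound can be checked empirically on a deliberately skewed set of values. This sketch treats the values as a complete population (so it uses the population standard deviation), and both the data and the helper name `proportion_within` are made up for illustration.

```python
import statistics

def proportion_within(values, k):
    """Proportion of values within k standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population SD of these values
    inside = sum(1 for x in values if abs(x - mu) <= k * sigma)
    return inside / len(values)

# Hypothetical right-skewed values, for illustration only.
values = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]

# Compare the observed proportion with the Chebyshev lower bound.
results = {}
for k in (1.5, 2, 3):
    bound = 1 - 1 / k**2
    results[k] = (proportion_within(values, k), bound)
```

For each k, the observed proportion should be at least as large as the bound. The bound is often far from tight, since it has to hold for every distribution.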