Lecture 3: Data Exploration and Discretization


Description and Tags

Flashcards covering data exploration concepts (summary statistics, visualization) and discretization methods from Lecture 3.


18 Terms

1

Summary statistics

A set of measures that summarize data, e.g., frequency and mean.

2

Frequency

The percentage of instances in the data set that have a given attribute value.

3

Mode

The most frequent attribute value.
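
A minimal sketch of frequency and mode, using only the Python standard library. The `colors` list is a hypothetical sample made up for illustration; it is not data from the lecture.

```python
from collections import Counter

# Hypothetical categorical attribute values (illustrative only).
colors = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(colors)
n = len(colors)

# Frequency: share of instances holding each value, as a percentage.
frequency = {value: 100 * count / n for value, count in counts.items()}

# Mode: the value with the highest count.
mode, mode_count = counts.most_common(1)[0]

print(frequency)
print(mode, mode_count)
```

If several values tie for the highest count, `most_common` returns only one of them, so a data set can have more than one mode even though this sketch reports a single value.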

4

Mean

Arithmetic average; a location measure that is sensitive to outliers.

5

Median

The middle value of the sorted data; a measure of central tendency that is less sensitive to outliers than the mean and is often used as an alternative to it.
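
The difference between mean and median shows up when one extreme value is added. The sketch below uses the standard `statistics` module; the numbers are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean, median

# Hypothetical numeric attribute values (illustrative only).
values = [2, 3, 4, 5, 6]
with_outlier = values + [100]  # same data plus one extreme value

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

The single outlier moves the mean from 4 to 20, while the median only shifts from 4 to 4.5.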

6

Range

Difference between the maximum and minimum values.

7

Variance

The average squared deviation from the mean; a common measure of the spread (dispersion) of a data set.
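
Range and variance can be computed as in the sketch below. The card does not say whether the population or the sample variance is meant, so both are shown as an assumption; the data is hypothetical.

```python
from statistics import pvariance, variance

# Hypothetical numeric attribute values (illustrative only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the largest and smallest value.
value_range = max(values) - min(values)

# Population variance divides the sum of squared deviations by n.
population_var = pvariance(values)

# Sample variance divides by n - 1 instead.
sample_var = variance(values)

print(value_range, population_var, sample_var)
```

Here the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.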

8

Visualization

Conversion of data into visual representations to reveal patterns, relationships, and outliers.

9

Scatter plot

A two-dimensional plot showing relationships between two numeric attributes; can use size, shape, and color to encode extra attributes.

10

Histogram

A chart showing the distribution of a single variable by binning values into intervals.

11

Discretization

Turning a numeric (continuous) attribute into a categorical attribute by dividing its range into sub-ranges (bins).

12

Bin (bucket)

A sub-range of values used in discretization.

13

Equal-width discretization

Divides the value range into N equal-sized subranges; bin width = (max – min) / N.
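
The bin-width formula can be sketched in plain Python as follows. The function name `equal_width_bins` and the sample `ages` are hypothetical, and placing the maximum value in the last bin is an assumption of this sketch rather than something the card specifies.

```python
def equal_width_bins(values, n_bins):
    """Return a bin index (0 .. n_bins - 1) for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # bin width = (max - min) / N
    if width == 0:
        # All values are identical, so everything goes in one bin.
        return [0] * len(values)
    labels = []
    for v in values:
        index = int((v - lo) / width)
        # Assumption: the maximum value is clamped into the last bin.
        labels.append(min(index, n_bins - 1))
    return labels


# Hypothetical numeric attribute values (illustrative only).
ages = [0, 4, 12, 16, 16, 18, 24, 26, 28]
labels = equal_width_bins(ages, 3)
print(labels)
```

With these values the width is 28 / 3, so the bins are uneven in size: two values land in the first bin, four in the second, and three in the third.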

14

Equal-frequency discretization

Divides the range into N bins so each bin holds roughly the same number of instances.
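
One simple way to sketch equal-frequency binning is to rank the values and cut the ranking into N equal parts. The function name `equal_frequency_bins` and the data are hypothetical; this version may split tied values across neighbouring bins, which is a simplification.

```python
def equal_frequency_bins(values, n_bins):
    """Return a bin index for each value so bins hold roughly equal counts."""
    n = len(values)
    # Positions of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # Map rank 0 .. n-1 onto bin 0 .. n_bins-1 in equal-sized chunks.
        labels[i] = rank * n_bins // n
    return labels


# Hypothetical skewed attribute values, in arbitrary order (illustrative only).
values = [7, 1, 100, 3, 5, 2, 8, 4, 6]
labels = equal_frequency_bins(values, 3)
print(labels)
```

Each of the three bins receives three instances even though the value 100 lies far from the rest; an equal-width split of the same data would put every value except 100 into the first bin.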

15

Unsupervised discretization

Discretization methods that do not use class values when creating bins (e.g., equal-width, equal-frequency).

16

Supervised discretization

Discretization methods that consider class values to choose bin boundaries.

17

Entropy-based discretization

A supervised method using information entropy to select bin boundaries for better class separation.
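
As a rough sketch of the idea, the code below searches for the single boundary that minimizes the weighted class entropy of the two resulting bins. The names `entropy` and `best_split` and the data are hypothetical, and the lecture's exact procedure is not given in the card. Practical methods typically repeat the split on each side until some stopping rule is met, which this sketch omits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(values, classes):
    """Return (threshold, weighted_entropy) of the best single boundary."""
    pairs = sorted(zip(values, classes))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can separate identical values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        # Entropy of each side, weighted by the share of instances in it.
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            # Place the boundary midway between the two neighbouring values.
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold, best_score


# Hypothetical attribute values and class labels (illustrative only).
values = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
classes = ["a", "a", "a", "b", "b", "b"]
threshold, score = best_split(values, classes)
print(threshold, score)
```

For this toy data the chosen boundary lies between 2.0 and 6.0, where both bins contain a single class and the weighted entropy is zero.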

18

Iris dataset

A classic data set with three flower classes (Setosa, Versicolor, Virginica) and four numeric attributes (sepal length, sepal width, petal length, petal width).