Lecture 3: Data Exploration and Discretization

Description and Tags

Flashcards covering data exploration concepts (summary statistics, visualization) and discretization methods from Lecture 3.

18 Terms

Summary statistics

Numbers that summarize properties of the data, e.g., frequency, location (mean), and spread (variance).

Frequency

The fraction of instances in the data set that take a given attribute value, usually expressed as a percentage.

Mode

The most frequent attribute value.
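As a minimal sketch of the frequency and mode cards above, the following uses Python's standard `collections.Counter`. The helper names `frequencies` and `mode` and the `colors` list are made up for illustration and are not from the lecture.

```python
from collections import Counter

def frequencies(values):
    """Return {value: fraction of instances that take that value}."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def mode(values):
    """Return the most frequent value (on a tie, the one seen first)."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical categorical attribute
colors = ["red", "blue", "red", "green", "red", "blue"]
freq = frequencies(colors)   # "red" occurs in 3 of 6 instances
most_common = mode(colors)   # "red"
```

Frequency and mode are mostly useful for categorical attributes; for a continuous attribute, nearly every value may be unique.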

Mean

Arithmetic average; a location measure that is sensitive to outliers.

Median

The middle value of the sorted data (the average of the two middle values when the count is even); a measure of central tendency that is less sensitive to outliers than the mean.
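The contrast between the mean and median cards can be illustrated with a small sketch using Python's standard `statistics` module. The `salaries` values are invented for the example.

```python
import statistics

# Hypothetical values; the last one added below is an extreme outlier
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]

mean_before = statistics.mean(salaries)        # 35
median_before = statistics.median(salaries)    # 35

mean_after = statistics.mean(with_outlier)     # 112.5, pulled up by the outlier
median_after = statistics.median(with_outlier) # 36.5, barely moves
```

A single extreme value more than triples the mean here, while the median shifts by only 1.5.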

Range

Difference between the maximum and minimum values.

Variance

A measure of the spread of a data set: the average squared deviation of the values from the mean. Its square root is the standard deviation.
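A short sketch of the range and variance cards, again with the standard `statistics` module. The data are made up, and whether the lecture divides by n (population variance) or n - 1 (sample variance) is not stated in the cards, so both are shown.

```python
import statistics

# Hypothetical numeric attribute; its mean is 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)            # 9 - 2 = 7

# Sum of squared deviations from the mean is 32
population_var = statistics.pvariance(data)    # 32 / 8 = 4
sample_var = statistics.variance(data)         # 32 / 7
```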

Visualization

Conversion of data into visual representations to reveal patterns, relationships, and outliers.

Scatter plot

A two-dimensional plot showing relationships between two numeric attributes; can use size, shape, and color to encode extra attributes.

Histogram

A chart showing the distribution of a single variable by binning values into intervals.

Discretization

Turning a numeric (continuous) attribute into a categorical attribute by dividing its range into sub-ranges (bins).

Bin (bucket)

A sub-range of values used in discretization.

Equal-width discretization

Divides the value range into N equal-sized subranges; bin width = (max – min) / N.
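The bin-width formula above can be sketched as follows. The function name `equal_width_bins` and the sample values are hypothetical; bins are numbered from 0, and the maximum value is placed in the last bin so that it does not fall outside the range.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins          # bin width = (max - min) / N
    if width == 0:                      # all values identical: one bin
        return [0] * len(values)
    labels = []
    for v in values:
        index = int((v - lo) / width)
        labels.append(min(index, n_bins - 1))  # clamp the maximum into the last bin
    return labels

# Hypothetical data: min = 5, max = 215, so width = 70 for N = 3
values = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
width_labels = equal_width_bins(values, 3)
# Bins are [5, 75), [75, 145), [145, 215]
```

With skewed data like this, nine of the twelve values land in the first bin, which is the usual weakness of equal-width binning.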

Equal-frequency discretization

Divides the range into N bins so each bin holds roughly the same number of instances.
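One simple way to sketch equal-frequency binning is to rank the values and cut the ranking into N equal parts. The function name `equal_frequency_bins` and the data are hypothetical, and this version is a simplification: tied values can end up in different bins, which a production implementation would usually avoid.

```python
def equal_frequency_bins(values, n_bins):
    """Assign bin indices so each bin gets roughly len(values) / n_bins items."""
    n = len(values)
    # Positions of the values in ascending order of value
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // n   # rank 0..n-1 mapped onto 0..n_bins-1
    return labels

# Hypothetical data: 12 values, so N = 3 gives 4 values per bin
values = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
freq_labels = equal_frequency_bins(values, 3)
```

Unlike equal-width binning, the bin boundaries adapt to where the data are dense, so no bin ends up nearly empty.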

Unsupervised discretization

Discretization methods that do not use class values when creating bins (e.g., equal-width, equal-frequency).

Supervised discretization

Discretization methods that consider class values to choose bin boundaries.

Entropy-based discretization

A supervised method using information entropy to select bin boundaries for better class separation.
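The card does not specify the exact procedure, so the following is only a sketch of one common building block: choosing a single cut point that minimizes the weighted class entropy of the two resulting bins. Full methods typically apply such a split recursively with a stopping rule. The names `entropy` and `best_split`, the values, and the class labels "A" and "B" are all made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_split(values, classes):
    """Return (threshold, weighted entropy) of the best binary cut point."""
    pairs = sorted(zip(values, classes))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # cannot cut between equal values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if score < best_score:
            best_score = score
            # Candidate threshold: midpoint between neighbouring values
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold, best_score

# Hypothetical attribute values with a class label for each instance
values = [1.4, 1.5, 1.3, 4.7, 4.5, 5.1]
classes = ["A", "A", "A", "B", "B", "B"]
threshold, score = best_split(values, classes)
```

Here the cut falls midway between 1.5 and 4.5, where both resulting bins contain a single class and the weighted entropy is zero.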

Iris dataset

A classic data set with three flower classes (Setosa, Virginica, Versicolor) and four attributes (sepal/petal length/width).
