What is a Dataset?
A collection of objects and their attributes used for analysis
What is an Attribute in data mining?
A property or characteristic of an object. Also known as variable, field, characteristic, dimension, or feature
What is an Object in data mining?
A collection of attributes. Also known as record, point, case, sample, entity, or instance
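A minimal sketch of how objects and attributes relate, assuming a small hypothetical table. Each dict is one object (record) and each key is one attribute. The field names and values are illustrative only, not taken from any real dataset.

```python
# Hypothetical toy dataset: each dict is one object (record),
# each key is one attribute (feature).
dataset = [
    {"id": 1, "eye_color": "brown", "height_cm": 172.5, "grade": "B"},
    {"id": 2, "eye_color": "blue", "height_cm": 160.0, "grade": "A"},
    {"id": 3, "eye_color": "green", "height_cm": 181.2, "grade": "C"},
]

attributes = list(dataset[0].keys())  # the attribute names
num_objects = len(dataset)            # one object per record

print(num_objects, "objects")         # 3 objects
print(len(attributes), "attributes")  # 4 attributes
print(attributes)
```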
What are the 5 important characteristics of datasets?
Size, Dimensionality, Sparsity, Distribution, Resolution
Why is Size an important dataset characteristic?
The type of analysis often depends on the size of the data
Why is Dimensionality an important dataset characteristic?
High-dimensional data poses particular challenges for analysis (often called the curse of dimensionality)
Why is Sparsity an important dataset characteristic?
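In sparse data most values are zero, so what matters is which values are present (non-zero) rather than which are absent; storing and computing only on the non-zero values also saves space and time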
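A minimal sketch of measuring sparsity, assuming a hypothetical customer-by-product purchase matrix (1 = bought, 0 = not bought). The helper `sparsity` and the variable names are illustrative. It also keeps only the present entries, which is the usual motivation for sparse storage.

```python
# Hypothetical customer x product purchase matrix (1 = bought, 0 = not bought).
purchases = [
    [1, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
]

def sparsity(matrix):
    """Fraction of entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / total

# Keep only the present (non-zero) entries, keyed by (row, column).
nonzero = {(i, j): v
           for i, row in enumerate(purchases)
           for j, v in enumerate(row)
           if v != 0}

print(f"sparsity = {sparsity(purchases):.2f}")  # sparsity = 0.83
print(nonzero)  # {(0, 0): 1, (0, 7): 1, (1, 2): 1, (2, 5): 1}
```

Here 20 of the 24 entries are zero, yet the four stored entries carry all of the purchase information.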
What are the 4 main types of datasets?
Record Data, Graphs and Networks, Ordered (Sequence) Data, Spatial Data
What is Record Data?
A collection of records, each with a fixed set of attributes; includes relational records, data matrices, and transaction data
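A minimal sketch of two record-data forms, assuming hypothetical grocery transactions. Each transaction is turned into a row of a data matrix with one binary attribute per distinct item, so every record ends up with the same fixed set of attributes.

```python
# Hypothetical transaction data: each record is a set of purchased items.
transactions = {
    "T1": {"bread", "milk"},
    "T2": {"beer", "bread", "diapers"},
    "T3": {"milk", "diapers"},
}

# Fixed attribute set: one binary attribute per distinct item, sorted for stable order.
items = sorted(set().union(*transactions.values()))

# Data matrix: one row per record, one column per attribute (1 = item present).
matrix = {tid: [1 if item in basket else 0 for item in items]
          for tid, basket in transactions.items()}

print(items)  # ['beer', 'bread', 'diapers', 'milk']
for tid, row in matrix.items():
    print(tid, row)
# T1 [0, 1, 0, 1]
# T2 [1, 1, 1, 0]
# T3 [0, 0, 1, 1]
```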
Give 3 examples of Graphs and Networks datasets
Transportation network, Social or information networks, Molecular Structures
Give 3 examples of Ordered (Sequence) Data
Video (sequence of images), Genetic Sequence Data, Temporal sequence
Give 2 examples of Spatial Data
RGB Images, Satellite images
What are the 4 types of attributes?
Nominal, Ordinal, Interval, Ratio
What is a Nominal attribute?
Unordered categories (e.g., gender, eye color, types of fruit like apple, orange)
What is an Ordinal attribute?
Ordered categories (e.g., grades A/B/C, height tall/medium/short, swimming level beginner to advanced)
What is an Interval attribute?
Numerical with equal intervals but no true zero (e.g., calendar dates, temperatures in Celsius or Fahrenheit)
What is a Ratio attribute?
Numerical with equal intervals and a true zero (e.g., temperature in Kelvin, length, counts, elapsed time)
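A short sketch of the interval/ratio distinction using the temperature examples above. The helper `celsius_to_kelvin` is illustrative. Differences agree on both scales, but only the Kelvin ratio is physically meaningful, because only Kelvin has a true zero.

```python
def celsius_to_kelvin(c):
    """Shift from the Celsius (interval) scale to the Kelvin (ratio) scale."""
    return c + 273.15

cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences behave the same on both scales (an interval property).
print(f"{warm_c - cold_c:.2f}", f"{warm_k - cold_k:.2f}")  # 10.00 10.00

# Ratios only make sense with a true zero.
ratio_c = warm_c / cold_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = warm_k / cold_k  # about 1.035: the physically meaningful ratio
print(ratio_c, round(ratio_k, 3))  # 2.0 1.035
```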
What operations can be performed on Nominal attributes?
Distinctness only (=, ≠)
What operations can be performed on Ordinal attributes?
Distinctness (=, ≠) and Order (<, >)
What operations can be performed on Interval attributes?
Distinctness (=, ≠), Order (<, >), and Addition (+, −)
What operations can be performed on Ratio attributes?
Distinctness (=, ≠), Order (<, >), Addition (+, −), and Multiplication (×, ÷)
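A minimal sketch of the operation hierarchy across attribute types, assuming the standard scheme in which each type keeps the previous type's operations and adds one group: order for ordinal, addition/subtraction for interval, multiplication/division for ratio. The names `ALLOWED`, `OPS`, `apply_op`, and `LEVEL` are hypothetical. Ordinal values are encoded as order-preserving integers so that `<` and `>` work.

```python
import operator

# Meaningful operations per attribute type; each level extends the one before it.
ALLOWED = {
    "nominal":  {"==", "!="},
    "ordinal":  {"==", "!=", "<", ">"},
    "interval": {"==", "!=", "<", ">", "+", "-"},
    "ratio":    {"==", "!=", "<", ">", "+", "-", "*", "/"},
}

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt,
       "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def apply_op(attr_type, op, a, b):
    """Apply op only if it is meaningful for the given attribute type."""
    if op not in ALLOWED[attr_type]:
        raise TypeError(f"'{op}' is not meaningful for {attr_type} attributes")
    return OPS[op](a, b)

# Hypothetical order-preserving encoding of an ordinal attribute.
LEVEL = {"beginner": 0, "intermediate": 1, "advanced": 2}

print(apply_op("ordinal", "<", LEVEL["beginner"], LEVEL["advanced"]))  # True
print(apply_op("ratio", "/", 20.0, 10.0))                              # 2.0
try:
    apply_op("interval", "/", 20.0, 10.0)
except TypeError as err:
    print(err)  # '/' is not meaningful for interval attributes
```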
What is a Discrete Attribute?
An attribute that takes values from a finite or countable set (e.g., gender, eye color, swimming level). Typically represented as integers
What is a Continuous Attribute?
An attribute that takes values within a continuous range (e.g., height, length, temperature). Typically represented as floating-point variables
What are Binary attributes?
A special case of discrete attributes with only two possible values
What are Asymmetric Attributes?
Attributes where only the presence (non-zero value) matters, not the absence
Give 2 examples of asymmetric attributes
Words present in documents, Items present in customer transactions
Why do we focus on presence in asymmetric attributes?
In real scenarios (e.g., grocery shopping), we don't call two customers' purchases similar just because neither of them bought most of the same products. We focus on what was bought.
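A sketch of why presence matters, comparing two standard binary similarity measures: the simple matching coefficient, which counts shared 0s as agreement, and the Jaccard coefficient, which ignores them. The shoppers and their purchases are hypothetical, and returning 0.0 when both vectors are all zeros is a convention assumed here.

```python
def simple_matching(x, y):
    """Counts shared 1s and shared 0s alike as matches."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

def jaccard(x, y):
    """Ignores shared 0s: only items present in at least one vector count."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0  # all-zero case: assumed convention

# Two hypothetical shoppers over 10 products, with nothing bought in common.
alice = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
bob   = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(simple_matching(alice, bob))  # 0.8
print(jaccard(alice, bob))          # 0.0
```

Simple matching rates the two shoppers as 80% similar only because of the eight products neither bought. Jaccard, which looks only at items actually purchased, correctly reports no overlap.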