Vocabulary flashcards covering key concepts from the 'BUSINESS INTELLIGENCE' lecture notes, including data cubes, OLAP operations, types of databases, data mining patterns and techniques, and data cleaning methods.
Data Cube
A multi-dimensional data structure used in business intelligence to represent data along dimensions like Time, Location, and Product.
OLAP Operations
A set of analytical operations used on data cubes, including Roll-Up, Drill-Down, Slice and Dice, and Pivot.
Roll-Up
An OLAP operation that aggregates a data cube by climbing up to a higher, more summarized level of a dimension (e.g., days into months, or cities into countries).
Drill-Down
An OLAP operation that starts with high-level information and subdivides it into more detailed levels (e.g., a year into months, weeks, or days); the reverse of Roll-Up.
Slice and Dice
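An OLAP operation that performs analysis on a small, specific part of a given data cube: a slice fixes a single value on one dimension, while a dice selects values on two or more dimensions.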
Pivot (Rotate)
An OLAP operation used for visualization that rotates the data axes to provide an alternative view on the data.
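The four OLAP operations above can be sketched on a tiny in-memory cube. This is only an illustration: the `sales` data, the dimension order (year, quarter, city, product), and the helper names `roll_up`, `slice_cube`, and `pivot` are assumptions made up for this example, not part of the lecture notes or of any OLAP product.

```python
from collections import defaultdict

# Hypothetical cube: (year, quarter, city, product) -> units sold.
sales = {
    (2023, "Q1", "Oslo", "Laptop"): 10,
    (2023, "Q2", "Oslo", "Laptop"): 12,
    (2023, "Q1", "Bergen", "Phone"): 7,
    (2024, "Q1", "Oslo", "Phone"): 5,
}

def roll_up(cube, keep):
    """Aggregate the cube, keeping only the dimension positions in `keep`."""
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[tuple(key[i] for i in keep)] += units
    return dict(totals)

def slice_cube(cube, position, value):
    """Keep only the cells whose dimension at `position` equals `value`."""
    return {k: v for k, v in cube.items() if k[position] == value}

def pivot(cube, order):
    """Reorder the dimensions of every cell key (rotate the axes)."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}

by_year = roll_up(sales, keep=(0,))             # roll-up to the year level
by_year_quarter = roll_up(sales, keep=(0, 1))   # drill-down: year into quarters
oslo_only = slice_cube(sales, 2, "Oslo")        # slice on city = "Oslo"
quarter_by_year = pivot(by_year_quarter, (1, 0))  # swap the two axes
```

Drill-down is shown here simply as aggregating at a finer level than the roll-up; a real OLAP engine would navigate a stored dimension hierarchy instead.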
Traditional Databases (Relational Databases)
Databases managed by a DBMS that define the database structure and provide data storage, concurrent access, and consistent information storage; they typically use SQL and store data in normalized tables.
DBMS
Acronym for Database Management System: software for defining database structure and for storing and accessing data; often used loosely as a synonym for relational database systems.
Object-Oriented Databases
Databases based on object-oriented programming, where each entity is an object containing variables, messages for communication, and methods to return values.
Object-Relational Databases
An extension of the relational model with object-oriented features, providing rich data types to handle complex objects, class hierarchies, and object inheritance.
Spatial Databases
Databases that contain spatial-related information, such as geographical data, chip designs, medical equipment, or satellite images, with capabilities to display and analyze this data.
Temporal & Time-Series Databases
Databases that store time-related or time-evolving attributes, such as stock trades, used to calculate trends or track how objects evolve over time.
Text & Multimedia Databases
Databases capable of storing word descriptions, reports, notes (can be unstructured/semi-structured), and multimedia assets like images, music, and video.
Heterogeneous Databases
Databases that contain elements from different types of database systems.
Legacy Databases
Databases that support legacy data, often having a long lifespan due to factors like government regulations.
Descriptive Patterns
Patterns that characterize general properties of data, such as totals, often used in reports.
Predictive Patterns
Patterns obtained by performing inference on current data, used to make predictions about new data (e.g., classifying it) based on what has already been learned.
Classification and Prediction
The process of finding models or functions that describe and distinguish data classes, built from a training dataset and often expressed as decision trees, so that new data can be classified.
Decision Tree
A flow-chart-like structure with "If-Then" logic, where each internal node tests an attribute, each branch represents an outcome of that test, and each leaf represents a class or class description.
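The "If-Then" structure of a decision tree maps directly onto nested conditionals. The sketch below is a hand-written toy tree; the attributes (`age`, `is_student`, `credit_rating`), the thresholds, and the class labels are invented for illustration and were not learned from data.

```python
def classify_customer(age, is_student, credit_rating):
    """Hypothetical tree predicting whether a customer buys a product."""
    if age < 30:                       # node: test the age attribute
        if is_student:                 # node: test the student attribute
            return "buys"              # leaf: class label
        return "does not buy"          # leaf
    if age <= 40:                      # branch: middle age range
        return "buys"                  # leaf
    if credit_rating == "excellent":   # node: test the credit rating
        return "buys"                  # leaf
    return "does not buy"              # leaf
```

For example, `classify_customer(25, True, "fair")` follows the first branch twice and ends at the leaf `"buys"`.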
Characterization (Concept/Class Description)
A summary of data being studied, describing general properties (e.g., "Over the course of last year we have sold 20 software titles").
Discrimination (Concept/Class Description)
A comparison of one set of relevant data against another, such as growth or decline between time periods; the data under comparison must be comparable, and there must be a valid business reason for comparing it.
Cluster Analysis
A statistical method that focuses on identifying characteristics that bind data together, such as location or product, to group similar data points.
Outliers
Objects that do not comply with the general behavior or model of data, often detected using statistical tests and used to detect fraud.
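One simple statistical test for outliers is the z-score: flag any value that lies more than some number of standard deviations from the mean. The sketch below assumes a threshold of 2 and made-up transaction amounts; the lecture notes do not say which test they use, so treat this as one possible choice.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []            # all values identical: nothing stands out
    return [v for v in values if abs(v - centre) / spread > threshold]

# Hypothetical transaction amounts with one suspicious entry.
amounts = [10, 12, 11, 13, 12, 95]
suspicious = find_outliers(amounts)
```

With these amounts only 95 is flagged. On small samples a single extreme value inflates the standard deviation, so the threshold often needs tuning.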
Data Warehousing
Architectures and tools for business executives to systematically organize, understand, and use data to make strategic decisions.
Noisy Data
Data that contains errors, outliers, or inconsistencies, often requiring methods like binning or smoothing for cleaning.
Binning
A technique for handling noisy data by partitioning a dataset into smaller groups or "bins."
Mean
The "average" value in a dataset.
Median
The "middle" value in a dataset when ordered.
Mode
The value that occurs most often in a dataset.
Range
The difference between the highest and lowest value in a dataset.
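The four summary statistics above can be computed with Python's standard `statistics` module. The sample values are an assumption chosen so that each result is a whole number.

```python
from statistics import mean, median, mode

# Hypothetical sample, already sorted for readability.
values = [4, 8, 15, 21, 21, 24, 25, 28, 34]

average = mean(values)             # sum 180 over 9 values
middle = median(values)            # fifth of the nine ordered values
most_common = mode(values)         # 21 appears twice
spread = max(values) - min(values) # highest minus lowest

print(average, middle, most_common, spread)  # prints: 20 21 21 30
```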
Partition into Equidepth
A binning method where each bin or bucket has the same number of values.
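Equidepth partitioning can be sketched as sorting the values and cutting the sorted list into bins of equal size. The function name `equidepth_bins` and the sample data are assumptions for this example; when the values do not divide evenly, this sketch gives the extra values to the first bins.

```python
def equidepth_bins(values, n_bins):
    """Split the sorted values into `n_bins` bins of (nearly) equal size."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # spread any remainder
        bins.append(ordered[start:start + size])
        start += size
    return bins

bins = equidepth_bins([21, 4, 34, 8, 25, 15, 28, 21, 24], 3)
print(bins)  # prints: [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
```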
Smoothing by Bin Mean
A data cleaning method where each value in a bin is replaced with the mean (average) of that bin.
Smoothing by Bin Median
A data cleaning method where each value in a bin is replaced with the median (middle value) of that bin.
Smoothing by Bin Mode
A data cleaning method where each value in a bin is replaced with the mode (most frequent value) of that bin.
Smoothing by Bin Range
A data cleaning method where each value in a bin is replaced with the range (difference between highest and lowest value) of that bin.
Smoothing by Bin Boundaries
A data cleaning method where each value in a bin is replaced with whichever bin boundary (the bin's minimum or maximum) is closer to it.
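Three of the smoothing methods above (bin mean, bin median, and bin boundaries) can be sketched on bins that have already been partitioned. The bins and function names are assumptions for illustration; for boundary smoothing, a value exactly halfway between the two boundaries goes to the lower one, which is an arbitrary tie-breaking choice.

```python
from statistics import mean, median

# Hypothetical equidepth bins of sorted values.
bins = [[4, 8, 15], [21, 21, 24], [25, 28, 34]]

def smooth_by_mean(bins):
    """Replace every value with the mean of its bin."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_median(bins):
    """Replace every value with the median of its bin."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's min and max."""
    smoothed = []
    for b in bins:
        low, high = min(b), max(b)
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed

by_mean = smooth_by_mean(bins)
by_median = smooth_by_median(bins)
by_boundaries = smooth_by_boundaries(bins)
```

For the first bin `[4, 8, 15]`, the mean is 9, the median is 8, and boundary smoothing gives `[4, 4, 15]` because 8 is closer to 4 than to 15.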
AND Decision
A logical decision rule where the output is true only if all input conditions are true.
OR Decision
A logical decision rule where the output is true if at least one input condition is true.
XOR Decision
A logical decision rule where the output is true if exactly one input condition is true (exclusive OR).
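The three decision rules above can be written as small Python functions over any number of conditions. The function names are assumptions for this sketch; the XOR version follows the "exactly one input is true" definition given on the card.

```python
def and_decision(*conditions):
    """True only if every condition is true."""
    return all(conditions)

def or_decision(*conditions):
    """True if at least one condition is true."""
    return any(conditions)

def xor_decision(*conditions):
    """True if exactly one condition is true."""
    return sum(bool(c) for c in conditions) == 1

# Truth table for two inputs: (a, b, AND, OR, XOR).
table = [
    (a, b, and_decision(a, b), or_decision(a, b), xor_decision(a, b))
    for a in (False, True)
    for b in (False, True)
]
```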
Information Gain
A measure used in decision tree learning to judge how effectively an attribute classifies data, computed as the reduction in entropy after splitting on that attribute; higher gain indicates a more effective attribute.
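Information gain is commonly computed as the entropy of the class labels minus the weighted entropy that remains after splitting on an attribute. The sketch below assumes that entropy-based definition; the tiny dataset and its column layout (outlook, windy, play) are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute_index, label_index=-1):
    """Entropy of the labels minus the weighted entropy after the split."""
    labels = [row[label_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute_index], []).append(row[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical rows: (outlook, windy, play).
rows = [
    ("sunny", True, "no"),
    ("sunny", False, "no"),
    ("rainy", True, "yes"),
    ("rainy", False, "yes"),
]

gain_outlook = information_gain(rows, 0)  # outlook separates the classes fully
gain_windy = information_gain(rows, 1)    # windy tells us nothing here
```

In this toy data `outlook` has the higher gain, so a tree learner using this measure would test it first.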
Granular Data
Data that is detailed and specific, allowing for more precise analysis and insights.