Roll Up
A search technique that approaches a specific analysis topic in phases from a low summary level to a high summary level.
Generalized Sequential Pattern
What is the meaning of the acronym GSP?
Star Schema
This schema has data duplication, but it is easy to understand.
OLAP
This technique provides various search techniques that allow end users to analyze data from diverse perspectives and summary levels.
Drill Across
A search technique that uses a certain analysis viewpoint on one analysis topic to approach another analysis topic.
Sequence
The possibility of a given transaction occurring in the future is forecast by performing time series analysis on transaction history data.
Pivot
A search technique that changes the axis of the analysis perspective on a specific analysis topic.
Integrated
In a data warehouse, this characteristic provides data consistency and physical unity through company-wide standardization.
Subject-oriented
The data of a specific subject needed for decision-making activities from an enterprise perspective is saved, while other data are not included.
Time Variant
To analyze past and present trends and forecast the future, a data warehouse retains data for a long time in the form of a series of snapshots.
Clustering
An analysis algorithm that groups records with similar attributes, by considering several attributes of given records (customers, products).
Non-volatile
In the operation system, existing data are deleted when a modification occurs, whereas the data warehouse keeps the history of the data at each point in time. Which characteristic is this?
Online Analytical Processing
What is the meaning of the acronym OLAP?
Dimension Table
In data warehouse modeling, this table has multiple attributes, allowing data analysis from diverse perspectives.
Slice
A search technique that creates subsets by selecting specific values for the members at one level or the members above that level.
Data Warehouse Modeling
This refers to a data modeling technique that, from a data analysis perspective, enables users to analyze large-scale data from various viewpoints.
Load
A phase in which converted data are sent to the warehouse for storage and the necessary indexes are generated.
Data Mining
This is a series of processes that identify a systematic statistical rule or pattern among a large amount of data, convert it into meaningful information, and apply it to corporate decision-making.
Classification
An analysis algorithm that creates a tree-type model, which classifies the values (category values) of a specific attribute (category type) by analyzing a dataset when it is given.
ETL
This refers to the entire process by which data are extracted from the source system and stored in the data warehouse after cleansing and conversion.
Fact Table
In data warehouse modeling, this is the core table, composed of a set of highly relevant measures.
Extract, Transform, and Load
What is the meaning of the acronym ETL?
Transform
A phase of ETL, in which extracted data are cleaned and converted into a data format suitable for the data warehouse.
Association
An analysis algorithm that discovers a pattern using a combination of highly relevant data in transaction data, etc.
Drill Down
A search technique that approaches a specific analysis topic in phases from a high summary level to a low summary level.
Extract
In this phase of ETL, data are pulled out from the original file or operating system database and stored in the data warehouse.
Snowflake Schema
In this schema, data are normalized to limit data duplication.
Data Warehouse
This is an integrated system or database that integrates data by subject, so that the user can instantly analyze, from multiple points of view and without separate programming, the internal and external data generated over time by the enterprise's operation systems.
Dice
A search technique that creates subsets by slicing on two or more dimensions.
Online Transaction Processing
What is the meaning of the acronym OLTP?
Customer Relationship Management
CRM meaning
Customer Relationship Management
A CRM database is a resource containing all client information collected, governed, transformed, and shared across an organization. It includes marketing and sales reporting tools, which are useful for leading sales and marketing campaigns and increasing customer engagement.
Supply Chain Management
SCM meaning
Supply Chain Management
The management of the flow of goods, data, and finances related to a product or service, from the procurement of raw materials to the delivery of the product at its final destination.
Data Warehouse
DW meaning
Data Warehouse
- also known as an enterprise data warehouse
- is a system used for reporting and data analysis and is considered a core component of business intelligence.
- is a central repository of integrated data from one or more different sources.
Data Lake
- a centralized repository designed to store, process, and secure large amounts of structured, semistructured, and unstructured data.
- It can store data in its native format and process any variety of it, regardless of size.
Online Analytical Processing
- is a computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view.
- OLAP business intelligence queries often aid in trend analysis, financial reporting, sales forecasting, budgeting, and other planning purposes.
Online Transaction Processing
OLTP meaning
Online Transaction Processing
- is a type of data processing that consists of executing a number of transactions occurring concurrently, for example online banking, shopping, order entry, or sending text messages.
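To make the idea of an OLTP transaction concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `account` table, the `transfer` helper, and the balances are hypothetical; the point is that both updates either commit together or roll back together.

```python
import sqlite3

# In-memory database standing in for an operational (OLTP) system (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft, so neither update was kept.
        return False

ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(ok, failed, balances)  # True False {'alice': 70, 'bob': 80}
```

Many such short transactions running concurrently is the typical OLTP workload, in contrast to the long read-only queries of OLAP.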
Generalized Sequential Pattern
GSP meaning
Generalized Sequential Pattern
- is an algorithm used for sequence mining.
- The algorithms for solving sequence mining problems are mostly based on the apriori (level-wise) algorithm.
- One way to use the level-wise paradigm is to first discover all the frequent items in a level-wise fashion.
Apriori Algorithm
- is an algorithm for frequent item set mining and association rule learning over relational databases.
- It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.
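The level-wise procedure above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: `apriori` is a hypothetical helper name, `min_support` is assumed to be an absolute transaction count, and the baskets are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if len(union) == k and all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count the surviving candidates with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = apriori(baskets, min_support=3)
print(freq[frozenset({"diapers", "beer"})])  # 3
```

The pruning step is what makes the approach practical: a k-itemset is never counted if any of its subsets was already found to be infrequent.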
Concept of Data Warehouse
- an integrated system or database that integrates data by subject, so that the user can instantly analyze, from multiple points of view and without separate programming, the internal and external data generated over time by the enterprise's operation systems.
- Subject-oriented
- Integrated
- Time Variant
- Non-volatile
Characteristics of Data Warehouse
Subject-oriented
Among the multiple types of operation system data managed by business functions, only the data of a specific subject needed for decision-making activities from an enterprise perspective is saved; other data are not included.
Integrated
- The structure of a data warehouse is characterized by data consistency and physical unity through company-wide data standardization
- When obtaining data from the operation system, a series of data conversion tasks are performed to integrate the data.
Time variant
- To analyze past and present trends and forecast the future, a data warehouse retains data for a long time in the form of a series of snapshots.
- Users can understand the process of data change over time using the data history.
Non-volatile
- A data warehouse is a read-only database whose data cannot be deleted or modified once it has been loaded from the operation system database.
- When a modification occurs in the operation system, the existing data there are deleted, whereas the data warehouse stores the history of the data at each point in time.
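The contrast between an operational system that overwrites data and a warehouse that keeps time-variant, non-volatile snapshots can be sketched with plain Python structures. The customer record, grades, and snapshot dates are hypothetical, and `load_snapshot` is an illustrative helper, not a standard API.

```python
from datetime import date

# Operational (OLTP) view: one current row per customer, updated in place.
operational = {"C001": {"grade": "Silver"}}

# Warehouse view: an append-only list of dated snapshots (rows are never updated or deleted).
warehouse = []

def load_snapshot(as_of):
    """Copy the current operational state into the warehouse as a new snapshot."""
    for cust_id, row in operational.items():
        warehouse.append({"snapshot_date": as_of, "customer_id": cust_id, **row})

load_snapshot(date(2023, 1, 31))
operational["C001"]["grade"] = "Gold"  # the old value is overwritten in the source system
load_snapshot(date(2023, 2, 28))

history = [(r["snapshot_date"].isoformat(), r["grade"]) for r in warehouse]
print(history)  # [('2023-01-31', 'Silver'), ('2023-02-28', 'Gold')]
```

The operational dictionary only knows the current grade, while the warehouse list still shows how the value changed over time.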
Data Warehouse Modeling
- Unlike general ER modeling for OLTP systems, data warehouse modeling refers to a data modeling technique that, from a data analysis perspective, enables users to analyze large-scale data from various viewpoints.
- Data are generally organized into fact tables and dimension tables so that end users or analysts can easily analyze information.
- Fact Table
- Dimension Table
Components of Data Warehouse Modeling
Fact Table
- The core table, composed of a set of highly relevant measures.
- Measures are measurement data that represent the goal of information analysis, such as amount, number, or time.
Dimension Table
- A sub-table that provides a perspective for analyzing each fact.
- has multiple attributes, thus allowing data analysis from diverse perspectives
Data Warehouse Modeling Technique
- A data warehouse model organizes data using fact tables and dimension tables to facilitate the analysis of information.
- The technique can be divided into the 'star schema' and the 'snowflake schema' depending on whether the dimension tables are normalized.
- Star Schema
- Snowflake Schema
2 DWM Techniques
Star Schema
- A modeling technique for designing data by separating it into fact tables and dimension tables.
- Data duplication occurs because dimension table data are not normalized.
- The schema is easy to understand and has few joins, thus improving query performance, but data consistency problems may occur.
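A small star schema can be sketched with Python's built-in `sqlite3` module. The coffee-shop tables, column names, and rows below are hypothetical; the sketch shows denormalized dimension tables around one fact table, and a query that needs only one join per dimension.

```python
import sqlite3

# Hypothetical sales mart; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables are denormalized: category and region names repeat on every row.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
-- The fact table holds one foreign key per dimension plus the numeric measures.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product,
    store_id   INTEGER REFERENCES dim_store,
    quantity   INTEGER,
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Latte', 'Beverage'), (2, 'Mocha', 'Beverage'), (3, 'Bagel', 'Bakery');
INSERT INTO dim_store   VALUES (1, 'Seoul', 'Capital'), (2, 'Busan', 'South');
INSERT INTO fact_sales  VALUES (1, 1, 3, 12.0), (2, 1, 1, 4.5), (3, 2, 2, 5.0), (1, 2, 4, 16.0);
""")

# One join per dimension is enough to analyze the measures by category and region.
rows = conn.execute("""
    SELECT p.category, s.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_store   s ON f.store_id   = s.store_id
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
""").fetchall()
print(rows)  # [('Bakery', 'South', 5.0), ('Beverage', 'Capital', 16.5), ('Beverage', 'South', 16.0)]
```

The repeated 'Beverage' and region strings in the dimension rows are the data duplication that the star schema accepts in exchange for simpler, faster queries.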
Snowflake Schema
- A modeling technique for completely normalizing the dimension table of the star schema.
- Data duplication is rare, and less storage space is used owing to the normalization of the dimension tables, but there is some concern about performance degradation due to the greater number of joins compared with the star schema.
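For comparison, here is a hypothetical snowflake version of the product dimension, again using `sqlite3`. The category name is moved into its own normalized table, so each name is stored once, but the same category-level report now needs an extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalized: category names live in their own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES dim_category);
CREATE TABLE fact_sales   (product_id  INTEGER REFERENCES dim_product, amount REAL);
INSERT INTO dim_category VALUES (1, 'Beverage'), (2, 'Bakery');
INSERT INTO dim_product  VALUES (1, 'Latte', 1), (2, 'Mocha', 1), (3, 'Bagel', 2);
INSERT INTO fact_sales   VALUES (1, 12.0), (2, 4.5), (3, 5.0), (1, 16.0);
""")

# 'Beverage' is stored only once, but reaching it takes two joins instead of one.
rows = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON f.product_id  = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(rows)  # [('Bakery', 5.0), ('Beverage', 32.5)]
```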
Concept of ETL
- ETL refers to the entire process by which data are extracted from the source system and stored in the data warehouse after cleansing and conversion.
- It plays the role of maintaining data consistency and integrity among the components of the data warehouse.
- It is also called ETT (Extraction, Transformation, Transportation).
- Extraction
- Transformation
- Loading
3 Phases
Extraction
- A phase in which data are extracted from the original file or operating system database and stored in the data warehouse.
- In the past, data were extracted on a daily or monthly basis, but in some recent cases data are extracted in real time using database logs, according to the business requirements.
Transformation
- A phase in which extracted data are cleaned and converted into a data format suitable for the data warehouse.
- In the event of data quality problems, data are cleansed according to the reference data or business rules.
- The original data format is converted into a data format suitable for the data warehouse.
Loading
- A phase in which converted data are sent to the warehouse for storage and the necessary indexes are generated.
- Full and partial update techniques are available.
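The three phases can be sketched as three small Python functions feeding an in-memory `sqlite3` warehouse. Everything here is an illustrative assumption: the CSV text stands in for an operational export, the cleansing rules (drop rows with a missing amount, standardize names and dates) are hypothetical business rules, and `extract`, `transform`, and `load` are not a standard API.

```python
import csv
import io
import sqlite3

# Extract: a CSV string stands in for an operational export (hypothetical data).
SOURCE = """order_id,customer,amount,order_date
1, alice ,12.50,2023/01/05
2,BOB,,2023/01/06
3,carol,7.00,2023/01/06
"""

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

# Transform: cleanse and convert to the warehouse format using hypothetical business rules.
def transform(rows):
    out = []
    for r in rows:
        if not r["amount"]:  # rule: drop rows with a missing measure
            continue
        out.append((
            int(r["order_id"]),
            r["customer"].strip().title(),      # standardize customer names
            round(float(r["amount"]), 2),
            r["order_date"].replace("/", "-"),  # unify dates to YYYY-MM-DD
        ))
    return out

# Load: store the rows in the warehouse and build the index needed for queries.
def load(conn, rows, full_refresh=False):
    conn.execute("""CREATE TABLE IF NOT EXISTS fact_orders
                    (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)""")
    if full_refresh:
        conn.execute("DELETE FROM fact_orders")  # full update: reload the table from scratch
    # Partial (incremental) update: append only rows whose keys are not loaded yet.
    conn.executemany("INSERT OR IGNORE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_date ON fact_orders(order_date)")
    conn.commit()

dw = sqlite3.connect(":memory:")
load(dw, transform(extract(SOURCE)))
result = dw.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(result)  # [(1, 'Alice', 12.5, '2023-01-05'), (3, 'Carol', 7.0, '2023-01-06')]
```

The `full_refresh` flag is one simple way to switch between the full and partial update techniques mentioned above; real ETL tools expose this choice in their own ways.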
Concept of OLAP
- OLAP refers to the process by which the end user accesses multi-dimensional information without an intermediary or medium, and then analyzes the information interactively and uses it for decision making.
- That is, when the operational data extracted and converted by ETL are stored in the data warehouse or data mart, the end user analyzes them using OLAP
OLAP search technique
- OLAP provides various search techniques that allow end users to analyze data from diverse perspectives and summary levels.
- Drill Down
- Roll Up
- Drill Across
- Pivot
- Slice
- Dice
OLAP search techniques
Drill Down
A search technique that approaches a specific analysis topic in phases from a high summary level to a low (detail) summary level.
Roll Up
- Concept opposite to Drill Down
- A search technique that approaches a specific analysis topic in phases from a low summary level to a high summary level
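Drill down and roll up can be pictured as moving along a dimension hierarchy and re-aggregating the same measure. The sketch below assumes a hypothetical time hierarchy (year, month, day) and made-up sales facts; `summarize`, `drill_down`, and `roll_up` are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical sales facts carrying a time hierarchy: year > month > day.
sales = [
    {"year": 2023, "month": "2023-01", "day": "2023-01-05", "amount": 12.0},
    {"year": 2023, "month": "2023-01", "day": "2023-01-20", "amount": 8.0},
    {"year": 2023, "month": "2023-02", "day": "2023-02-10", "amount": 5.0},
    {"year": 2024, "month": "2024-01", "day": "2024-01-03", "amount": 10.0},
]
HIERARCHY = ["year", "month", "day"]  # high summary level -> low (detail) level

def summarize(facts, level):
    """Aggregate the amount measure at the given hierarchy level."""
    totals = defaultdict(float)
    for f in facts:
        totals[f[level]] += f["amount"]
    return dict(totals)

def drill_down(level):
    """Move one step toward more detail (stays at the lowest level if already there)."""
    return HIERARCHY[min(HIERARCHY.index(level) + 1, len(HIERARCHY) - 1)]

def roll_up(level):
    """Move one step toward more summary (stays at the top level if already there)."""
    return HIERARCHY[max(HIERARCHY.index(level) - 1, 0)]

level = "year"
print(summarize(sales, level))  # {2023: 25.0, 2024: 10.0}
level = drill_down(level)       # year -> month
print(summarize(sales, level))  # {'2023-01': 20.0, '2023-02': 5.0, '2024-01': 10.0}
level = roll_up(level)          # month -> year
print(summarize(sales, level))  # {2023: 25.0, 2024: 10.0}
```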
Drill Across
A search technique that uses a certain analysis viewpoint on one analysis topic to approach another analysis topic.
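As a rough illustration, drill across can be modeled as two analysis topics (fact tables) that share one viewpoint, here a hypothetical product dimension. The analysis starts on sales and then uses the same product viewpoint to move over to inventory. The data and the `total_by` helper are made up for this sketch.

```python
# Two hypothetical analysis topics that share the product dimension.
sales = [("Latte", 12.0), ("Mocha", 4.5), ("Latte", 16.0)]  # (product, amount)
inventory = [("Latte", 40), ("Mocha", 5), ("Bagel", 30)]    # (product, units on hand)

def total_by(facts):
    """Sum the measure (second field) per product (first field)."""
    totals = {}
    for product, value in facts:
        totals[product] = totals.get(product, 0) + value
    return totals

# Start from the sales topic, then reuse the product viewpoint to reach inventory.
sales_by_product = total_by(sales)
inventory_by_product = total_by(inventory)
report = {p: (sales_by_product[p], inventory_by_product.get(p, 0)) for p in sales_by_product}
print(report)  # {'Latte': (28.0, 40), 'Mocha': (4.5, 5)}
```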
Pivot
A search technique that changes the axis of the analysis perspective on a specific analysis topic.
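A pivot can be sketched as building the same cross-tabulation twice with the row and column dimensions swapped. The (region, quarter, amount) facts and the `cross_tab` helper are hypothetical.

```python
# Hypothetical (region, quarter, amount) facts.
facts = [("North", "Q1", 10), ("North", "Q2", 15), ("South", "Q1", 7), ("South", "Q2", 9)]

def cross_tab(rows, row_axis, col_axis):
    """Build a 2-D summary with the chosen dimensions on rows and columns."""
    table = {}
    for rec in rows:
        r, c, amount = rec[row_axis], rec[col_axis], rec[2]
        table.setdefault(r, {})
        table[r][c] = table[r].get(c, 0) + amount
    return table

by_region = cross_tab(facts, row_axis=0, col_axis=1)
pivoted = cross_tab(facts, row_axis=1, col_axis=0)  # swap the axes of the analysis perspective
print(by_region)  # {'North': {'Q1': 10, 'Q2': 15}, 'South': {'Q1': 7, 'Q2': 9}}
print(pivoted)    # {'Q1': {'North': 10, 'South': 7}, 'Q2': {'North': 15, 'South': 9}}
```

The underlying numbers do not change; only the orientation of the view does.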
Slice
A search technique that creates subsets by selecting specific values for the members at one level or the members above that level.
Dice
A search technique that creates subsets by slicing on two or more dimensions.
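Slice and dice can be sketched over a tiny cube stored as a dictionary keyed by (region, product, quarter). The cube contents and the `slice_cube` and `dice_cube` helpers are hypothetical; slice fixes one dimension to a single value, while dice keeps chosen value sets on several dimensions at once.

```python
# Hypothetical cube cells: (region, product, quarter) -> sales amount.
cube = {
    ("North", "Latte", "Q1"): 10, ("North", "Latte", "Q2"): 12,
    ("North", "Bagel", "Q1"): 4,  ("South", "Latte", "Q1"): 7,
    ("South", "Bagel", "Q2"): 3,
}
DIMS = ("region", "product", "quarter")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single member value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, selections):
    """Dice: keep cells whose members fall in the chosen sets on several dimensions."""
    idx = {DIMS.index(d): allowed for d, allowed in selections.items()}
    return {k: v for k, v in cube.items() if all(k[i] in allowed for i, allowed in idx.items())}

q1 = slice_cube(cube, "quarter", "Q1")
sub = dice_cube(cube, {"region": {"North", "South"}, "product": {"Latte"}, "quarter": {"Q1", "Q2"}})
print(sum(q1.values()), sum(sub.values()))  # 21 29
```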
Concept and Algorithm of Data Mining
- Data mining refers to a series of processes that identify a systematic statistical rule or pattern among a large amount of data, convert it into meaningful information, and apply it to corporate decision-making.
- Association
- Sequence
- Classification
- Clustering
4 Data Mining Algorithms
Association
- An analysis algorithm that discovers a pattern using a combination of highly relevant data in transaction data, etc.
- Apriori algorithm (etc.)
- This algorithm is mainly used to place products in offline stores based on purchase analysis, and to recommend related products automatically in online shopping malls, etc.
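Association rules found in such data are usually judged by support, confidence, and lift. The sketch below computes these three standard measures for a single hypothetical rule, {diapers} -> {beer}, over made-up baskets; `support` and `rule_metrics` are illustrative helper names.

```python
# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_metrics(lhs, rhs, transactions):
    """Support, confidence, and lift of the association rule lhs -> rhs."""
    both = support(lhs | rhs, transactions)
    confidence = both / support(lhs, transactions)
    lift = confidence / support(rhs, transactions)
    return both, confidence, lift

s, c, l = rule_metrics({"diapers"}, {"beer"}, baskets)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")  # support=0.60 confidence=0.75 lift=1.25
```

A lift above 1 suggests the two items are bought together more often than independence would predict, which is the kind of pattern used for shelf placement or product recommendations.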
Sequence
- An analysis algorithm that searches the correlation of items over time by adding the concept of time to association analysis.
- The possibility of a given transaction occurring in the future is forecast by performing time series analysis on transaction history data.
- Apriori Algorithm, Generalized Sequential Patterns (GSP), etc.
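A central step in GSP-style sequence mining is counting how many customer sequences contain a candidate sequence in time order. The sketch below shows only that support-counting step, not GSP's full candidate generation; the purchase histories and the `is_subsequence` and `sequence_support` helpers are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if each itemset of pattern is contained, in order, in some later basket of sequence."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def sequence_support(pattern, sequences):
    """Number of customer sequences that contain the pattern."""
    return sum(1 for s in sequences if is_subsequence(pattern, s))

# Hypothetical purchase histories: one time-ordered list of baskets per customer.
histories = [
    [{"phone"}, {"case"}, {"charger"}],
    [{"phone", "case"}, {"charger"}],
    [{"case"}, {"phone"}],
    [{"phone"}, {"charger"}],
]
pattern = [{"phone"}, {"charger"}]  # "phone, then later charger"
print(sequence_support(pattern, histories))  # 3
```

Unlike plain association analysis, the order matters here: the reversed pattern (charger, then phone) has a support of 0 in the same data.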
Classification
- An analysis algorithm that creates a tree-type model, which classifies the values (category values) of a specific attribute (category type) by analyzing a dataset when it is given
- Decision tree algorithm, etc.
- What is a decision tree in data mining? A decision tree is a flowchart-like structure that includes a root node, branches, and leaf nodes. Each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node holds a class label.
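One common way to build such a tree is an ID3-style procedure that splits on the attribute with the highest information gain. The sketch below assumes categorical attributes and hypothetical customer data; the helper names are illustrative, and `classify` raises `KeyError` for attribute values never seen during training.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split yields the largest information gain."""
    base = entropy(labels)
    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or (attribute, {value: subtree}) (internal node)."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority class
    attr = best_attribute(rows, labels, attributes)
    rest = [a for a in attributes if a != attr]
    branches = {}
    for value in set(r[attr] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (attr, branches)

def classify(tree, row):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

rows = [
    {"age": "young", "member": "no"}, {"age": "young", "member": "yes"},
    {"age": "senior", "member": "no"}, {"age": "senior", "member": "yes"},
    {"age": "young", "member": "yes"},
]
labels = ["no", "yes", "no", "yes", "yes"]  # hypothetical "buys" category
tree = build_tree(rows, labels, ["age", "member"])
print(classify(tree, {"age": "senior", "member": "yes"}))  # yes
```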
Clustering
- An analysis algorithm that groups records with similar attributes, by considering several attributes of given records (customers, products).
- K-Means algorithm, EM algorithm, etc.
- k-means is a data clustering technique that can be used for unsupervised machine learning. It classifies unlabeled data into a predetermined number (k) of clusters based on similarity.
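A plain k-means (Lloyd's algorithm) on 2-D points can be sketched in standard-library Python. The points, `k=2`, and the fixed random seed are hypothetical choices; initialization picks k distinct input points, and the loop alternates the assignment and update steps until the centroids stop moving.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k), key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
```

The result depends on the initial centroids in general, so practical implementations often run several initializations and keep the best one.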
- The Expectation-Maximization (EM) algorithm is a general technique for finding maximum likelihood estimates in models with latent variables.
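For clustering, EM is often applied to a Gaussian mixture, where the latent variable is the (unknown) component each point came from. The sketch below fits a 1-D mixture of two Gaussians to hypothetical data; the starting means, fixed iteration count, and variance floor are assumptions of this sketch, not part of any particular library.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, means, iterations=50):
    """EM for a 1-D mixture of two Gaussians; the latent variable is each point's component."""
    weights = [0.5, 0.5]
    variances = [1.0, 1.0]
    means = list(means)
    for _ in range(iterations):
        # E-step: responsibility r[j] = P(component j | x) under the current parameters.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from responsibility-weighted data.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            # Floor the variance so a component cannot collapse onto a single point.
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, 1e-6)
    return weights, means, variances

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(data, means=(0.0, 6.0))
print([round(m, 2) for m in means], [round(w, 2) for w in weights])  # [1.0, 5.0] [0.5, 0.5]
```

Unlike k-means, each point gets a soft (probabilistic) membership in every cluster through its responsibilities.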