UNDERSTANDING DATABASE ANALYTICS

77 Terms

1

Roll Up

A search technique that approaches a specific analysis topic in phases from a low summary level to a high summary level.

2

Generalized Sequential Pattern

What does the acronym GSP stand for in the GSP algorithm?

3

Star Schema

This schema has data duplication but is easy to understand.

4

OLAP

This technique provides various search techniques that allow end users to analyze data from diverse perspectives and summary levels.

5

Drill Across

A search technique that uses a certain analysis viewpoint on one analysis topic to approach another analysis topic.

6

Sequence

The possibility of a given transaction occurring in the future is forecast by performing time series analysis on transaction history data.

7

Pivot

A search technique that changes the axis of the analysis perspective on a specific analysis topic.

8

Integrated

This characteristic of a data warehouse gives it data consistency and physical unity through company-wide standardization.

9

Subject-oriented

The data of a specific subject needed for decision-making activities from an enterprise perspective is saved, while other data are not included.

10

Time Variant

To analyze past and present trends and forecast the future, a data warehouse retains data for a long time in the form of a series of snapshots.

11

Clustering

An analysis algorithm that groups records with similar attributes, by considering several attributes of given records (customers, products).

12

Non-volatile

When a modification occurs in the operational system data, the existing data in the data warehouse are not deleted; the history is kept instead. Which characteristic is this?

13

Online Analytical Processing

What is the meaning of the acronym OLAP?

14

Dimension Table

In data warehouse modeling, this is the table that has multiple attributes.

15

Slice

A search technique that creates subsets by selecting specific values for the members at one level or the members above that level.

16

Data Warehouse Modeling

This refers to a data modeling technique that, from a data analysis perspective, enables users to analyze large-scale data from various viewpoints.

17

Load

A phase in which converted data are sent to the warehouse for storage and the necessary indexes are generated.

18

Data Mining

This is a series of processes that identify a systematic statistical rule or pattern among a large amount of data, convert it into meaningful information, and apply it to corporate decision-making.

19

Classification

An analysis algorithm that, given a dataset, analyzes it to create a tree-type model that classifies the values (category values) of a specific attribute (category type).

20

ETL

This refers to the entire process by which data are extracted from the source system and stored in the data warehouse after cleansing and conversion.

21

Fact Table

In data warehouse modeling, this is the core table, composed of a set of highly relevant measures.

22

Extract, Transform, and Load

What is the meaning of the acronym ETL?

23

Transform

A phase of ETL, in which extracted data are cleaned and converted into a data format suitable for the data warehouse.

24

Association

An analysis algorithm that discovers a pattern using a combination of highly relevant data in transaction data, etc.

25

Drill Down

A search technique that approaches a specific analysis topic in phases from a high summary level to a low summary level.

26

Extract

In this phase of ETL, data are pulled from the original file or operational system database for storage in the data warehouse.

27

Snowflake Schema

In this schema, data are normalized to limit data duplication.

28

Data Warehouse

This is an integrated system or database that, by integrating data by subject, enables users to instantly analyze from multiple points of view the internal and external data generated by an enterprise's operational systems over time, without the need for separate programming.

29

Dice

A search technique that creates subsets by slicing on two or more dimensions.

30

Online Transaction Processing

What is the meaning of the acronym OLTP?

31

Customer Relationship Management

CRM meaning

32

Customer Relationship Management

A CRM database is a resource containing all client information collected, governed, transformed, and shared across an organization. It includes marketing and sales reporting tools, which are useful for leading sales and marketing campaigns and increasing customer engagement.

33

Supply Chain Management

SCM meaning

34

Supply Chain Management

The management of the flow of goods, data, and finances related to a product or service, from the procurement of raw materials to the delivery of the product at its final destination.

35

Data Warehouse

DW meaning

36

Data Warehouse

- Also known as an enterprise data warehouse.

- A system used for reporting and data analysis, considered a core component of business intelligence.

- A central repository of integrated data from one or more different sources.

37

Data Lake

- A centralized repository designed to store, process, and secure large amounts of structured, semistructured, and unstructured data.

- It can store data in its native format and process any variety of data, regardless of size.

38

Online Analytical Processing

- A computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view.

- OLAP business intelligence queries often aid in trend analysis, financial reporting, sales forecasting, budgeting, and other planning purposes.

39

Online Transaction Processing

OLTP meaning

40

Online Transaction Processing

- A type of data processing that consists of executing a number of transactions occurring concurrently, for example online banking, shopping, order entry, or sending text messages.

41

Generalized Sequential Pattern

GSP meaning

42

Generalized Sequential Pattern

- An algorithm used for sequence mining.

- The algorithms for solving sequence mining problems are mostly based on the apriori (level-wise) algorithm.

- One way to use the level-wise paradigm is to first discover all the frequent items in a level-wise fashion.

43

Apriori Algorithm

- An algorithm for frequent item set mining and association rule learning over relational databases.

- It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.
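
The level-wise idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name `apriori`, the absolute-count `min_support` threshold, and the shopping baskets are all assumptions made up for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: keep only candidates that appear often enough.
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=3)
```

With a threshold of 3, the pair {diapers, beer} survives, while {bread, milk, diapers} is generated as a candidate but dropped because it appears in only two baskets.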

44

Concept of Data Warehouse

- An integrated system or database that, by integrating data by subject, enables users to instantly analyze from multiple points of view the internal and external data generated by an enterprise's operational systems over time, without the need for separate programming.

45

- Subject-oriented

- Integrated

- Time Variant

- Non-volatile

Characteristics of Data Warehouse

46

Subject-oriented

Among the multiple types of operational system data managed by the business functions, the data of a specific subject needed for decision-making activities from an enterprise perspective is saved, while other data are not included.

47

Integrated

- The structure of a data warehouse is characterized by data consistency and physical unity through company-wide data standardization.

- When obtaining data from the operational system, a series of data conversion tasks is performed to integrate the data.

48

Time variant

- To analyze past and present trends and forecast the future, a data warehouse retains data for a long time in the form of a series of snapshots.

- Users can understand the process of data change over time using the data history.

49

Non-volatile

- A data warehouse is a read-only database: once data have been loaded from the operational system database, they cannot be deleted or modified.

- When a modification occurs in the operational system data, the existing data in the data warehouse are not deleted. The data warehouse stores the history of the data at each point in time.

50

Data Warehouse Modeling

- Unlike general ER modeling for OLTP systems, ________ refers to a data modeling technique that, from a data analysis perspective, enables users to analyze large-scale data from various viewpoints.

- Data are generally organized into fact tables and dimension tables so that end users or analysts can easily analyze information.

51

- Fact Table

- Dimension Table

Components of Data Warehouse Modeling

52

Fact Table

- The core table, composed of a set of highly relevant measures.

- A measure is measurement data that captures the target of the information analysis, such as an amount, a count, or a time.

53

Dimension Table

- A sub-table that provides a perspective for analyzing each fact.

- Has multiple attributes, thus allowing data analysis from diverse perspectives.

54

Data Warehouse Modeling Technique

- A data warehouse model organizes data using fact tables and dimension tables to facilitate the analysis of information.

- The technique can be divided into the 'star schema' and the 'snowflake schema' depending on the dimension table normalization status.

55

- Star Schema

- Snowflake Schema

2 Data Warehouse Modeling Techniques

56

Star Schema

- A modeling technique for designing data by separating it into fact tables and dimension tables.

- Data duplication occurs because dimension table data are not normalized.

- The schema is easy to understand and has few joins, thus improving query performance, but data consistency problems may occur.
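
As a rough sketch of what a star schema looks like in practice, the snippet below builds one fact table and two denormalized dimension tables in an in-memory SQLite database. The table names, column names, and rows (`fact_sales`, `dim_product`, `dim_date`) are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: not normalized, so 'Electronics' is stored twice.
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER,
    month   INTEGER
);
-- Fact table: measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    amount     REAL
);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Electronics'),
    (2, 'Phone',  'Electronics'),
    (3, 'Desk',   'Furniture');
INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 2, 2000.0),
    (2, 1, 5, 2500.0),
    (3, 2, 1,  300.0),
    (1, 2, 1, 1000.0);
""")

# Sales by category needs only one join from the fact table.
star_rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The category name is repeated on every product row, which is the data duplication the card mentions, and the query reaches it with a single join.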

57

Snowflake Schema

- A modeling technique that completely normalizes the dimension tables of the star schema.

- Data duplication is rare and less storage space is used owing to the normalization of the dimension tables, but there is some concern about performance degradation due to the greater number of joins compared to the star schema.
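
A minimal sketch of the same idea in code, assuming hypothetical table names (`dim_category`, `dim_product`, `fact_sales`) and made-up rows: the product dimension is normalized so that each category name is stored once, at the cost of an extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category is split out of the product dimension (normalized).
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product VALUES
    (1, 'Laptop', 1),
    (2, 'Phone',  1),
    (3, 'Desk',   2);
INSERT INTO fact_sales VALUES
    (1, 2000.0), (2, 2500.0), (3, 300.0), (1, 1000.0);
""")

# Sales by category now needs two joins instead of one.
snowflake_rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON f.product_id  = p.product_id
    JOIN dim_category AS c ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

Each category name appears exactly once in `dim_category`, but the aggregate query has to walk through two joins to reach it.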

58

Concept of ETL

- ETL refers to the entire process by which data are extracted from the source system and stored in the data warehouse after cleansing and conversion.

- It plays the role of maintaining data consistency and integrity among the components of the data warehouse.

- It is also called ETT (Extraction, Transformation, Transportation).

59

- Extraction

- Transformation

- Loading

3 Phases of ETL

60

Extraction

- A phase in which data are extracted from the original file or operational system database for storage in the data warehouse.

- In the past, data were extracted on a daily or monthly basis, but in some recent cases data are extracted in real time using database logs, according to the business requirements.

61

Transformation

- A phase in which extracted data are cleaned and converted into a data format suitable for the data warehouse.

- In the event of data quality problems, data are cleansed according to the reference data or business rules.

- The original data format is converted into a data format suitable for the data warehouse.

62

Loading

- A phase in which converted data are sent to the warehouse for storage and the necessary indexes are generated.

- Full and partial update techniques are available.
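
The three ETL phases can be sketched end to end with the Python standard library. This is only an illustration under assumptions: the CSV text stands in for a source system, the rule of dropping rows with a missing amount is a made-up business rule, and `fact_orders` is a hypothetical warehouse table.

```python
import csv
import io
import sqlite3

# Stand-in for a source file exported from an operational system.
RAW = """order_id,customer,amount,order_date
1, alice ,100.50,2023-01-05
2,BOB,,2023-01-06
3,Carol,75,2023-01-07
"""

def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cleanse and convert rows to the warehouse format."""
    cleaned = []
    for r in rows:
        if not r["amount"].strip():
            continue  # assumed business rule: skip rows with no amount
        cleaned.append((
            int(r["order_id"]),
            r["customer"].strip().title(),  # normalize name formatting
            float(r["amount"]),
            r["order_date"],
        ))
    return cleaned

def load(conn, rows):
    """Load: store the rows and build the index needed for queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_orders_date "
        "ON fact_orders(order_date)"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Here the load is a full load into an empty table; a partial (incremental) update would insert only the rows that are new since the previous run.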

63

Concept of OLAP

- OLAP refers to the process by which the end user accesses multi-dimensional information without an intermediary or medium, and then analyzes the information interactively and uses it for decision making.

- That is, when the operational data extracted and converted by ETL are stored in the data warehouse or data mart, the end user analyzes them using OLAP.

64

OLAP search technique

- OLAP provides various search techniques that allow end users to analyze data from diverse perspectives and summary levels.

65

- Drill Down

- Roll Up

- Drill Across

- Pivot

- Slice

- Dice

OLAP search techniques

66

Drill Down

A search technique that approaches a specific analysis topic in phases from a high summary level to a low (detail) summary level.

67

Roll Up

- Concept opposite to Drill Down

- A search technique that approaches a specific analysis topic in phases from a low summary level to a high summary level.
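
Roll up and drill down can be pictured as aggregating the same facts at different depths of a hierarchy. The sketch below assumes a hypothetical year > quarter > month time hierarchy and made-up sales rows; `summarize` is a helper name invented for the example.

```python
from collections import defaultdict

# Hypothetical facts: (year, quarter, month, amount)
sales = [
    (2023, "Q1", 1, 100),
    (2023, "Q1", 2, 150),
    (2023, "Q2", 4, 200),
    (2024, "Q1", 1, 120),
]

def summarize(rows, depth):
    """Total the amount by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *keys, amount in rows:
        totals[tuple(keys[:depth])] += amount
    return dict(totals)

by_year = summarize(sales, 1)     # high summary level
by_quarter = summarize(sales, 2)  # one level more detailed
by_month = summarize(sales, 3)    # lowest level in this example
```

Going from `by_year` to `by_quarter` to `by_month` is a drill down; reading the same list in the opposite direction is a roll up.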

68

Drill Across

A search technique that uses a certain analysis viewpoint on one analysis topic to approach another analysis topic.

69

Pivot

A search technique that changes the axis of the analysis perspective on a specific analysis topic.
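
A small sketch of pivoting, assuming made-up records with `region`, `product`, and `amount` fields: the same data is summarized twice, swapping which dimension forms the rows and which forms the columns. The `pivot` helper is a name chosen for this example.

```python
def pivot(rows, row_key, col_key, value_key):
    """Build a nested {row: {column: total}} table from flat records."""
    table = {}
    for r in rows:
        cells = table.setdefault(r[row_key], {})
        cells[r[col_key]] = cells.get(r[col_key], 0) + r[value_key]
    return table

# Hypothetical records, invented for illustration.
records = [
    {"region": "East", "product": "Laptop", "amount": 100},
    {"region": "East", "product": "Phone", "amount": 50},
    {"region": "West", "product": "Laptop", "amount": 70},
    {"region": "East", "product": "Laptop", "amount": 30},
]

by_region = pivot(records, "region", "product", "amount")
by_product = pivot(records, "product", "region", "amount")  # axes swapped
```

Both tables hold the same totals; only the axis of the analysis perspective has changed.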

70

Slice

A search technique that creates subsets by selecting specific values for the members at one level or the members above that level.

71

Dice

A search technique that creates subsets by slicing on two or more dimensions.
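
Slice and dice can be contrasted with a short sketch. It assumes a tiny cube stored as a list of records with three made-up dimensions (`region`, `product`, `year`); `slice_cube` and `dice_cube` are hypothetical helper names.

```python
# Hypothetical cube cells, invented for illustration.
cube = [
    {"region": "East", "product": "Laptop", "year": 2023, "amount": 100},
    {"region": "East", "product": "Phone", "year": 2023, "amount": 50},
    {"region": "West", "product": "Laptop", "year": 2023, "amount": 70},
    {"region": "West", "product": "Phone", "year": 2024, "amount": 40},
    {"region": "East", "product": "Laptop", "year": 2024, "amount": 90},
]

def slice_cube(rows, dimension, value):
    """Slice: fix a single value on one dimension."""
    return [r for r in rows if r[dimension] == value]

def dice_cube(rows, criteria):
    """Dice: restrict several dimensions at once.

    `criteria` maps each dimension name to the set of allowed values.
    """
    return [
        r for r in rows
        if all(r[dim] in allowed for dim, allowed in criteria.items())
    ]

only_2023 = slice_cube(cube, "year", 2023)
east_laptops = dice_cube(cube, {"region": {"East"}, "product": {"Laptop"}})
```

The slice keeps every region and product for one year, while the dice narrows two dimensions at the same time and leaves the year open.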

72

Concept and Algorithm of Data Mining

- Data mining refers to a series of processes that identify a systematic statistical rule or pattern among a large amount of data, convert it into meaningful information, and apply it to corporate decision-making.

73

- Association

- Sequence

- Classification

- Clustering

4 Data Mining Algorithms

74

Association

- An analysis algorithm that discovers a pattern using a combination of highly relevant data in transaction data, etc.

- Apriori algorithm (etc.)

- This algorithm is mainly used to decide product placement by analyzing purchases in offline stores, and to recommend related products automatically at online shopping malls, etc.

75

Sequence

- An analysis algorithm that searches the correlation of items over time by adding the concept of time to association analysis.

- The possibility of a given transaction occurring in the future is forecast by performing time series analysis on transaction history data.

- Apriori Algorithm, Generalized Sequential Patterns (GSP), etc.
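
The core counting step of sequence analysis, checking how many customer histories contain a pattern in the right order, can be sketched as follows. This is a deliberate simplification and not the full GSP algorithm: each history is a flat list of single items, gaps between items are allowed, and the purchase histories are made up for the example.

```python
def contains_in_order(sequence, pattern):
    """True if the pattern items occur in `sequence` in the same order.

    Other items may appear in between (gaps are allowed).
    """
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so each pattern item
    # must be found after the position of the previous one.
    return all(item in remaining for item in pattern)

def support(sequences, pattern):
    """Number of sequences that contain the pattern in order."""
    return sum(1 for s in sequences if contains_in_order(s, pattern))

# Hypothetical purchase histories, one list per customer, oldest first.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case", "phone"],
    ["phone", "earbuds", "case"],
]

phone_then_case = support(histories, ["phone", "case"])
case_then_phone = support(histories, ["case", "phone"])
```

Unlike plain association analysis, the order matters here: "phone then case" and "case then phone" are counted separately.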

76

Classification

- An analysis algorithm that, given a dataset, analyzes it to create a tree-type model that classifies the values (category values) of a specific attribute (category type).

- Decision tree algorithm, etc.

- What is a decision tree in data mining? A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label.
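
To make the root/branch/leaf structure concrete, here is a small hand-written tree and a function that walks it. The tree is an assumption invented for illustration (a toy "go outside?" decision); it is not learned from data, which is what a real decision tree algorithm would do.

```python
# Internal nodes are dicts that test one attribute; leaves are class labels.
tree = {
    "attribute": "outlook",              # root node
    "branches": {
        "sunny": {
            "attribute": "humidity",     # internal node
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",               # leaf node
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}

def classify(node, record):
    """Follow the branch matching each test until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

label_a = classify(tree, {"outlook": "sunny", "humidity": "high", "windy": False})
label_b = classify(tree, {"outlook": "overcast", "humidity": "high", "windy": True})
label_c = classify(tree, {"outlook": "rainy", "humidity": "normal", "windy": False})
```

Each call starts at the root, applies one attribute test per internal node, and returns the class label stored at the leaf it ends on.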

77

Clustering

- An analysis algorithm that groups records with similar attributes, by considering several attributes of given records (customers, products).

- K-Means algorithm, EM algorithm, etc.

- k-means is a data clustering technique that may be used for unsupervised machine learning. It groups unlabeled data into a predetermined number of clusters (k) based on similarity.

- The Expectation-Maximization (EM) algorithm is a general technique for finding maximum likelihood estimates in models with latent variables.
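
A minimal k-means sketch in plain Python follows. It assumes 2-D points, squared Euclidean distance, and caller-supplied starting centroids so the run is deterministic; the points and starting centroids are made up, and practical implementations usually choose the starting centroids randomly or with a smarter seeding scheme.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster 2-D points around the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                updated.append(old)  # keep an empty cluster's centroid

        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters

# Hypothetical data: two visually separate groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = kmeans(points, [(1, 1), (8, 8)])
```

The number of starting centroids is the k of k-means; here k = 2, and the loop stops as soon as an update leaves every centroid where it was.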