Data Warehousing and Data Mining – Unit 1 Concepts (Vocabulary)

Vocabulary flashcards covering key concepts, terms, and definitions from the data warehousing lecture notes.

24 Terms

1

Data Warehouse

A central repository that stores current and historical data from multiple sources, organized, cleansed, and standardized for enterprise-wide analysis.

2

Subject-oriented

Data is organized around a theme or topic (e.g., sales, product).

3

Integrated

Consolidates data from disparate sources to create consistency across the enterprise.
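
As a rough illustration of integration, the sketch below merges customer records from two source systems that use different field names and country spellings into one consistent format. The source systems, field names, and mapping table are all hypothetical, made up for this example.

```python
# Hypothetical records from two source systems with inconsistent conventions.
crm_rows = [{"cust_id": 1, "country": "USA"}, {"cust_id": 2, "country": "U.K."}]
billing_rows = [{"customer": 3, "ctry": "us"}, {"customer": 4, "ctry": "gb"}]

# Assumed mapping from each source's country spelling to one standard code.
COUNTRY_CODES = {"usa": "US", "us": "US", "u.k.": "GB", "gb": "GB"}


def standardize(customer_id, country):
    """Return one record in the single format agreed for the warehouse."""
    return {"customer_id": customer_id, "country": COUNTRY_CODES[country.lower()]}


# Consolidate both sources into one consistent list of records.
integrated = (
    [standardize(r["cust_id"], r["country"]) for r in crm_rows]
    + [standardize(r["customer"], r["ctry"]) for r in billing_rows]
)

for row in integrated:
    print(row)
```

After this step every record uses the same keys and the same country codes, whichever system it came from.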

4

Time-variant

Data is tied to a time period (e.g., weekly, monthly, quarterly), so historical values are kept and can be compared over time.

5

Non-volatile

Once loaded into the warehouse, data is not changed or deleted; new data is added alongside the existing records.

6

Summarized

Data is often aggregated for optimized reporting.
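
A minimal sketch of summarization, assuming hypothetical detail-level sales rows: the function rolls individual sales up into one total per month, which is the kind of pre-aggregated figure a report would read.

```python
from collections import defaultdict

# Hypothetical detail-level sales rows: (month, product, amount).
sales = [
    ("2024-01", "widget", 100.0),
    ("2024-01", "widget", 50.0),
    ("2024-01", "gadget", 30.0),
    ("2024-02", "widget", 70.0),
]


def summarize_by_month(rows):
    """Aggregate detail rows into one total amount per month."""
    totals = defaultdict(float)
    for month, _product, amount in rows:
        totals[month] += amount
    return dict(totals)


monthly_totals = summarize_by_month(sales)
print(monthly_totals)
```

A report that only needs monthly totals can read two summary values here instead of scanning all four detail rows.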

7

ETL

Extract, Transform, Load — the process of moving and preparing data from source systems into the data warehouse.

8

Extract

Phase of ETL where data is pulled from operational or transactional systems.

9

Transform

Phase of ETL where data is cleaned, formatted, and converted to fit the warehouse schema.

10

Load

Phase of ETL where transformed data is loaded into the data warehouse for analytics and reporting.
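
The three ETL phases can be sketched as three small functions. This is only an illustration under stated assumptions: the source rows, the cleansing rule, and the `fact_orders` table are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3


def extract():
    """Extract: pull raw rows from a (hypothetical) operational system."""
    return [
        {"order_id": "1", "amount": " 19.99 ", "region": "north"},
        {"order_id": "2", "amount": "5.00", "region": "SOUTH"},
        {"order_id": "3", "amount": "", "region": "south"},  # missing amount
    ]


def transform(rows):
    """Transform: drop incomplete rows, fix types, standardize casing."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows that fail the assumed cleansing rule
        cleaned.append((int(row["order_id"]), float(amount), row["region"].upper()))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(warehouse, transform(extract()))
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```

Only cleaned rows ever reach the warehouse table; the row with the missing amount is rejected during the transform phase, before loading.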

11

OLAP

Online Analytical Processing; a data processing framework that supports complex analytical queries and decision-making, typically using a dimensional model.

12

OLTP

Online Transaction Processing; a data processing framework for day-to-day operations, characterized by real-time processing and simple queries, typically using a relational model.
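
To make the contrast between the two workloads concrete, the sketch below runs both kinds of query against one hypothetical `orders` table. This is a simplification: in practice OLTP and OLAP usually run on separate systems, and the table and its columns are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small writes and single-row lookups.
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 20.0), ("north", 5.0)],
)
one_order = conn.execute("SELECT amount FROM orders WHERE id = ?", (2,)).fetchone()

# OLAP-style work: one query that scans and aggregates many rows.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(totals_by_region)
```

The lookup touches a single row by key, while the aggregate reads the whole table to answer a question about the business as a whole.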

13

Data Marts

Subsets of a data warehouse focused on a specific department or unit and optimized for its needs.
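
One simple way to picture a data mart is as a filtered view over the warehouse. The sketch below assumes a hypothetical `warehouse_facts` table and builds a `sales_mart` view on top of it; real data marts are often separate physical stores, so treat this only as an illustration of the subset idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "revenue", 1200.0),
        ("sales", "units", 40.0),
        ("hr", "headcount", 12.0),
    ],
)

# A hypothetical sales data mart: only the rows the sales team needs.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT metric, value FROM warehouse_facts WHERE department = 'sales'"
)

sales_rows = conn.execute("SELECT * FROM sales_mart ORDER BY metric").fetchall()
print(sales_rows)
```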

14

Data Lakes

Repositories that store raw, untransformed data from operational sources in its original format; often paired with ELT (loading first, transforming later).

15

On-Premises

Traditional deployment of data warehouses inside an organization’s local infrastructure.

16

Cloud

Deployment of data warehouses in the cloud, with upkeep managed by a third party, offering scalability and off-site hosting.

17

Big Data

Massive amounts of data, often raw and unstructured, coming in various formats.

18

NoSQL

Non-relational databases capable of handling structured, semi-structured, and unstructured data at large scale.

19

Data Warehouse vs Database

A data warehouse is designed for analysis and reporting over integrated, time-variant data; an operational database supports day-to-day transactions with simple queries.

20

ELT

Extract, Load, Transform — data is loaded first and then transformed inside the warehouse or data store.

21

ETL vs ELT

ETL transforms data before loading; ELT loads raw data first and then transforms it inside the warehouse.
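
A minimal sketch of the ELT ordering, assuming hypothetical `raw_orders` and `clean_orders` tables and an in-memory SQLite database as a stand-in for the warehouse: the raw rows are loaded untouched, and the cleaning happens afterwards in SQL inside the store. Under ETL, the same cleaning would run before anything was loaded.

```python
import sqlite3

store = sqlite3.connect(":memory:")  # stand-in for a warehouse engine

# Load: raw rows land first, untouched, as text.
store.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, region TEXT)")
store.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", " 19.99 ", "north"), ("2", "5.00", "SOUTH"), ("3", "", "south")],
)

# Transform: done afterwards, inside the store, using SQL.
store.execute(
    "CREATE TABLE clean_orders AS "
    "SELECT CAST(order_id AS INTEGER) AS order_id, "
    "       CAST(TRIM(amount) AS REAL) AS amount, "
    "       UPPER(region) AS region "
    "FROM raw_orders WHERE TRIM(amount) <> ''"
)

raw_count = store.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
clean = store.execute("SELECT * FROM clean_orders ORDER BY order_id").fetchall()
print(raw_count, clean)
```

Because the raw table is kept, the incomplete row is still available for later inspection or a different transformation, which is one practical consequence of loading before transforming.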

22

Data quality and consistency

A benefit of a data warehouse (DW): cleansing and standardizing data across sources improves its quality and consistency.

23

Data ownership concerns

A disadvantage of a data warehouse (DW): integrating data from multiple sources can raise governance and ownership issues.

24

Deployment considerations

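Data warehouse deployments can be traditional on-premises or cloud-based; the choice determines who handles maintenance (the organization itself or a third party) and how much the organization can focus on business goals.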