Chapter 3

66 Terms

1
What is a data warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.
2
What are the characteristics of a data warehouse?

1. Subject oriented - Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support.
2. Integrated - Data warehouses must place data from different sources into a consistent format.
3. Time variant - A warehouse maintains historical data. The data do not necessarily provide current status (except in real-time systems).
4. Non-volatile - After data are entered into a data warehouse, users cannot change or update the data.
3
What is a data mart?
A data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business.
4
What is a dependent data mart?
A subset that is created directly from a data warehouse
5
What is an independent data mart?
A small data warehouse designed for a strategic business unit or a department
6
What is an operational data store (ODS)?
A type of database often used as an interim area for a data warehouse.
Unlike the static contents of a data warehouse, the contents of an ODS are updated throughout the course of business operations.
An ODS is used for short-term decisions involving mission-critical applications rather than for the medium- and long-term decisions that a data warehouse supports.
7
What is an enterprise data warehouse?
An enterprise data warehouse (EDW) is a large-scale data warehouse that is used across the enterprise for decision support; it's used to provide data for many different types of decision support systems
8
What is metadata?
"Data about data."
In a DW, metadata describe the contents (structure and meaning) of the data warehouse.
9
Describe the steps of the data warehouse framework.
Data sources -> ETL process -> EDW & metadata -> data marts -> applications (visualization)
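The flow above can be sketched end to end in a few lines. This is only a minimal illustration, assuming two made-up source systems (`crm_rows`, `pos_rows`), an in-memory SQLite database standing in for the EDW, and a view (`pos_sales_mart`) standing in for a dependent data mart. All table and field names are hypothetical.

```python
import sqlite3

# Hypothetical raw records from two source systems with inconsistent formats.
crm_rows = [{"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"}]
pos_rows = [{"cust_name": "BOB", "total_cents": 9900, "sold_on": "2024-01-06"}]


def extract():
    """Extract: pull raw rows from each source (in-memory stand-ins here)."""
    return crm_rows, pos_rows


def transform(crm, pos):
    """Transform: bring both sources into one consistent format."""
    unified = []
    for r in crm:
        unified.append((r["customer"].strip().title(), float(r["amount"]), r["date"], "crm"))
    for r in pos:
        unified.append((r["cust_name"].strip().title(), r["total_cents"] / 100, r["sold_on"], "pos"))
    return unified


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the EDW
load(conn, transform(*extract()))

# A dependent data mart, sketched as a view over the warehouse table.
conn.execute(
    "CREATE VIEW pos_sales_mart AS "
    "SELECT customer, amount, sale_date FROM sales WHERE source = 'pos'"
)

warehouse_rows = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
mart_rows = conn.execute("SELECT customer, amount FROM pos_sales_mart").fetchall()
```

Applications (reports, dashboards) would then query the mart rather than the source systems.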
10
Describe the 3-tier data warehouse architecture
Tier 1: client workstation
Tier 2: application server
Tier 3: database server
11
Describe the 2-tier DW architecture
client workstation -> application & database server
12
Describe a web-based data warehouse architecture
Web server is central, affecting and being affected by web pages, an application server, a data warehouse, and the client's web browser (the internet)
13
What are oper marts?
Operational data marts, created when operational data needs to be analyzed multidimensionally.
14
What are the different types of metadata?
syntactic, structural, semantic (meaning)
15
What are the three parts to data warehouse architectures?

1. data warehouse itself
2. Back-end software -> data acquisition -> extracts, consolidates, and loads data into the DW
3. Client (front-end) software -> allows users to access and analyze data from the warehouse
16
What is the advantage of 3-tier architectures?
The separation of the functions of a DW, which eliminates resource constraints and makes it possible to easily create data marts.
17
Web-based data warehousing
client workstation -> Internet -> web server
18
What are the alternative DW architectures?
Independent data marts; data mart bus architecture; hub-and-spoke architecture; centralized DW architecture; federated architecture
19
Describe the data mart bus architecture
individual marts linked together via some kind of middleware
20
Describe the hub-and-spoke architecture
focused on building a scalable and maintainable infrastructure; includes a centralized DW and several dependent DMs
21
Describe a centralized DW architecture
similar to hub-and-spoke, except no dependent DMs; a gigantic EDW that serves the needs of all organizational units
22
Describe a federated DW
involves integrating disparate systems; works well to supplement DWs but not replace them!
23
What are the worst DW architectures?
Independent DMs, and federated
24
What are the best DW architectures?
Hub-and-spoke, centralized, and data mart bus; the best choice depends on the situation. Hub-and-spoke is the most expensive, but it is best for enterprise-wide implementations and larger warehouses.
25
Name the main data integration technologies
Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA), Enterprise Information Integration (EII), and Extraction, Transformation, and Load (ETL)
26
Describe Enterprise Application Integration (EAI):
A technology that provides a vehicle to integrate a set of enterprise applications.
27
Enterprise information integration (EII)
An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc.
28
What issues affect whether an organization will purchase a data transformation tool?
tools are expensive, they may have a long learning curve, and it is difficult to measure how the IT org is doing until it has learned to use the tool
29
What are the four categories of ETL technologies?

1. Sophisticated
2. Enabler
3. Simple
4. Rudimentary
30
What are the criteria for selecting an ETL tool?

1. The ability to read from and write to an unlimited number of data source architectures
2. Automatic capturing and delivery of metadata
3. A history of conforming to open standards
4. An easy-to-use interface for the developer and functional user
31
What does it mean when extensive ETL is performed?
this is a sign of poorly managed data and a fundamental lack of a coherent data management strategy
32
What are the benefits of a data warehouse?

1. End users can perform extensive analysis in numerous ways
2. A consolidated view of corporate data is possible
3. Better and more timely info is possible
4. Enhanced system performance can result
5. Data access is simplified
33
What defines a successful DW project?

1. Clearly defining the business objective
2. Gathering project support from management end users
3. Setting reasonable time frames & budgets
4. Managing expectations
34
What are the two competing DW approaches?
Inmon model (EDW approach, top-down)
Kimball model (data mart approach, bottom-up)
35
Describe the Inmon model (EDW Approach)
Scope includes several subject areas; very expensive, difficult, and takes a long time to develop; very large; primary audience is IT professionals
36
Describe the Kimball model (Data mart approach)
bottom-up; starts with data marts focused on specific departments; much smaller and simpler than EDW approach; the primary audience is the end user
37
What are the benefits of a hosted data warehouse?
- Requires minimal investment in infrastructure
- Frees up capacity on in-house systems
- Frees up cash flow
- Makes powerful solutions affordable
- Enables solutions that provide for growth
- Offers better quality equipment and software
- Provides faster connections
- Enables users to access data from remote locations
- Allows a company to focus on core business
- Meets storage needs for large volumes of data
38
What is dimensional modeling?
A retrieval-based system that supports high-volume query access (e.g. star schema and snowflake schema)
39
Describe Star Schema.
• The most commonly used and the simplest style of dimensional modeling
• Contains a fact table surrounded by and connected to several dimension tables
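A small sketch of a star schema may help make the fact/dimension split concrete. It assumes a made-up retail example in an in-memory SQLite database; the table names (`fact_sales`, `dim_product`, `dim_store`) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one denormalized table per dimension.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (10, 'Austin'), (20, 'Boston');
INSERT INTO fact_sales  VALUES (1, 10, 2, 2000.0), (2, 10, 1, 300.0), (1, 20, 1, 1000.0);
""")

# A typical query joins the fact table to a dimension and aggregates a measure.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Each analytical question needs only one join per dimension involved, which is why the star layout suits high-volume query access.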
40
Describe the snowflake schema.
• An extension of the star schema where the diagram resembles a snowflake in shape.
• In the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension being represented by a single table.
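To see what "normalized dimensions" means in practice, here is a minimal sketch, assuming a made-up product dimension whose category attribute is moved into its own table. It uses an in-memory SQLite database, and all table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalized: category lives in its own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue    REAL
);

INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Desk', 2);
INSERT INTO fact_sales   VALUES (1, 2000.0), (2, 300.0), (1, 1000.0);
""")

# Reaching the category now takes two joins instead of one.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The trade-off shown here is less redundancy in the dimension data at the cost of an extra join per normalized level.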
41
OLTP (online transaction processing)
Capturing and storing data from ERP, CRM, POS, ...
The main focus is on efficiency of routine tasks
42
OLAP (online analytical processing)
Converting data into information for decision support
Data cubes, drill-down / roll-up, slice & dice
Requesting ad hoc reports
Conducting statistical and other analyses
Developing multimedia-based applications
43
Describe the skills of a Data Warehouse Administrator
• Have knowledge of high-performance software, hardware, and networking technologies
• Possess solid business knowledge and insight
• Be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure
• Possess excellent communication skills
44
What are the future sources of data?
- Web, social media, and Big Data
- Open source software
- SaaS (software as a service)
- Cloud computing
- Data lakes
45
What are the major differences between Data Warehouses and Data Lakes?
DW: data is structured, processed using SQL, expensive, less agile, mature, well-secured, and used by business professionals
DL: data is raw and in any format, processed using NoSQL, slower to retrieve, low-cost, agile, new, not well-secured, and used by data scientists
46
What is business performance management?
BPM refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance
47
What are the steps within a closed-loop process for optimized business performance?
1. Strategize
2. Plan
3. Monitor/analyze
4. Act/adjust
48
What are the common tasks for the strategic planning process?
1. Conduct a current situation analysis
2. Determine the planning horizon
3. Conduct an environment scan
4. Identify critical success factors
5. Complete a gap analysis
6. Create a strategic vision
7. Develop a business strategy
8. Identify strategic objectives and goals
49
What is a performance measurement system?
A system that assists managers in tracking the implementations of business strategy by comparing actual results against strategic goals and objectives
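The core comparison (actual results against targets) can be sketched as a tiny report. This is only an illustration: the KPI names, the numbers, and the rule that a higher value is always better are assumptions, not part of the definition above.

```python
def variance_report(targets, actuals):
    """Compare actual KPI values against targets.

    Assumes every KPI is "higher is better"; real systems also need
    lower-is-better metrics (e.g. cost or defect counts).
    """
    report = {}
    for kpi, target in targets.items():
        actual = actuals[kpi]
        report[kpi] = {
            "target": target,
            "actual": actual,
            "gap": actual - target,   # negative gap means the target was missed
            "met": actual >= target,
        }
    return report


# Hypothetical figures for illustration only.
targets = {"revenue": 500_000, "sales_leads": 200}
actuals = {"revenue": 450_000, "sales_leads": 240}
report = variance_report(targets, actuals)
```

Here `report["revenue"]["gap"]` is negative, flagging the KPI a manager would need to act on.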
50
What is an example of a lagging indicator or outcome KPI?
Revenues
51
What is an example of a leading indicator or driver KPI?
sales leads
52
What is the balanced scorecard?
A performance measurement and management methodology that helps translate an organization's financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives
53
What are the four perspectives of the balanced scorecard?

1. Financial
2. Customer
3. Internal business process
4. Learning and Growth
54
What does each perspective of the BSC need to have?
goals and metrics
55
What is Six Sigma?
a methodology aimed at reducing the number of defects in a business process to as close to zero defects per million opportunities (DPMO) as possible
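DPMO is usually computed as defects divided by total opportunities, scaled to one million. A short sketch of that arithmetic, with a hypothetical function name and made-up inspection numbers:

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities.

    DPMO = defects * 1,000,000 / (units * opportunities_per_unit)
    """
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("units and opportunities_per_unit must be positive")
    return defects * 1_000_000 / total_opportunities


# Hypothetical example: 17 defects found in 1,000 invoices,
# each invoice having 5 places where an error could occur.
example = dpmo(defects=17, units=1000, opportunities_per_unit=5)
```

For reference, a process operating at the "six sigma" level is conventionally quoted as 3.4 DPMO, so the example process above is far from that goal.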
56
What is the DMAIC performance model?
A closed-loop business improvement model that encompasses the steps of defining, measuring, analyzing, improving, and controlling a process
57
What is the difference between BSC and Six Sigma?
BSC: focuses on strategy, vision, long-term growth
Six Sigma: focuses on aggressive improvement, performance, process management and feedback, profitability
58
Why is data arranged into cubes during OLAP?
to overcome the limitation of databases, since relational databases are not well suited for near instantaneous analysis of large amounts of data (better suited for transactions)
59
What are the most commonly used OLAP operations?
slice, dice, drill-down, roll-up (also called drill-up), pivot
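These operations are easier to remember with a toy cube. The sketch below assumes a made-up three-dimensional cube (year, region, product) stored as plain tuples. The function names and data are hypothetical, and real OLAP servers implement these operations very differently.

```python
from collections import defaultdict

# Each cell: (year, region, product, sales). Made-up data.
cube = [
    (2023, "East", "Laptop", 100),
    (2023, "West", "Laptop", 80),
    (2023, "East", "Desk", 40),
    (2024, "East", "Laptop", 120),
    (2024, "West", "Desk", 60),
]


def slice_year(rows, year):
    """Slice: fix one value on a single dimension (here, year)."""
    return [r for r in rows if r[0] == year]


def dice(rows, years, regions):
    """Dice: select a sub-cube by restricting two or more dimensions."""
    return [r for r in rows if r[0] in years and r[1] in regions]


def roll_up(rows, dim_index):
    """Roll-up: aggregate sales to the level of one dimension."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[dim_index]] += r[3]
    return dict(totals)


def drill_down(rows, year):
    """Drill-down: break one year's total into region-level detail."""
    return roll_up(slice_year(rows, year), 1)


def pivot(rows):
    """Pivot: rotate the view so regions are rows and years are columns."""
    table = defaultdict(dict)
    for year, region, _product, sales in rows:
        table[region][year] = table[region].get(year, 0) + sales
    return dict(table)


by_year = roll_up(cube, 0)
detail_2023 = drill_down(cube, 2023)
east_2023 = dice(cube, {2023}, {"East"})
region_by_year = pivot(cube)
```

Roll-up and drill-down are inverses: `by_year` summarizes the detail that `detail_2023` exposes for one year.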
60
What are the main issues pertaining to data warehouse scalability?
- The amount of data in the DW
- How quickly the DW is expected to grow
- The complexity of user queries
- The number of concurrent users
61
What is meant by "good scalability"?
That the cost of queries and other data access functions will grow (ideally) linearly with the size of the DW.
62
Effective security in a DW should focus on 4 main areas

1. Establishing effective corporate and security policies and procedures
2. Implementing logical security procedures & techniques to restrict access
3. Limiting physical access to the data center environment
4. Establishing an effective internal control review process with an emphasis on security and privacy
63
The future of data warehousing:
Web, social media, & Big Data; open source software; SaaS; cloud computing; Data lake
64
What is SaaS (Software As A Service)?
A provider licenses its application to customers for use as a service on demand (usually over the Internet), e.g. Google Drive.
65
What is cloud computing?
The on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet, e.g. Dropbox, Gmail.
66
What is a data lake?
A centralized repository that allows you to store all your structured and unstructured data at any scale.