Chapter 3

66 Terms

1
What is a data warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.
2
What are the characteristics of a data warehouse?
  1. Subject oriented - Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support.

  2. Integrated - Data warehouses must place data from different sources into a consistent format.

  3. Time variant - A warehouse maintains historical data. The data do not necessarily provide current status (except in real-time systems).

  4. Non-volatile - After data are entered into a data warehouse, users cannot change or update the data.
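
Study aid: a minimal Python sketch of the "integrated", "time variant", and "non-volatile" characteristics. The two source systems, their field names, and the `Warehouse` class are hypothetical and only illustrate the idea; they do not come from any real product.

```python
from datetime import date

# Hypothetical source records: two systems encode the same customer differently.
crm_rows = [{"cust": "A17", "gender": "F", "signup": "2023-01-05"}]
pos_rows = [{"customer_id": "a17", "sex": "female", "date": "05/01/2023"}]  # day/month/year


def from_crm(row):
    """Integrated: map the CRM layout onto one consistent format."""
    return {
        "customer_id": row["cust"].upper(),
        "gender": row["gender"],
        "signup": date.fromisoformat(row["signup"]),
    }


def from_pos(row):
    """Integrated: map the POS layout onto the same consistent format."""
    day, month, year = (int(part) for part in row["date"].split("/"))
    return {
        "customer_id": row["customer_id"].upper(),
        "gender": row["sex"][0].upper(),
        "signup": date(year, month, day),
    }


class Warehouse:
    """Append-only store: rows are stamped with a load date and never updated."""

    def __init__(self):
        self._rows = []

    def load(self, rows, load_date):
        # Time variant: every row keeps the date it was loaded.
        for row in rows:
            self._rows.append({**row, "load_date": load_date})

    def history(self, customer_id):
        # Non-volatile: users read history; there is no update or delete method.
        return [r for r in self._rows if r["customer_id"] == customer_id]


wh = Warehouse()
wh.load([from_crm(r) for r in crm_rows], date(2023, 2, 1))
wh.load([from_pos(r) for r in pos_rows], date(2023, 3, 1))
```

Both loads end up under the same customer key with the same gender code and signup date, and each load remains visible as its own historical row.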

3
What is a data mart?
a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business
4
What is a dependent data mart?
A subset that is created directly from a data warehouse
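
Study aid: one way to picture a dependent data mart is as a view defined directly over a warehouse table. The sketch below uses Python's built-in `sqlite3` module; the table `warehouse_sales`, the view `marketing_mart`, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The warehouse holds data for every department.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("East", "Marketing", 100.0),
        ("West", "Marketing", 250.0),
        ("East", "Finance", 75.0),
    ],
)

# The dependent mart is a subset derived from the warehouse itself,
# so it always stays consistent with the warehouse data.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'Marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
```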
5
What is an independent data mart?
A small data warehouse designed for a strategic business unit or a department
6
What is an operational data store (ODS)?
A type of database often used as an interim area for a data warehouse.

Unlike the static contents of a data warehouse, the contents of an ODS are updated throughout the course of business operations.

An ODS is used for short-term decisions involving mission-critical applications rather than for the medium- and long-term decisions.
7
What is an enterprise data warehouse?
An enterprise data warehouse (EDW) is a large-scale data warehouse that is used across the enterprise for decision support; it's used to provide data for many different types of decision support systems
8
What is metadata?
"Data about data."
In a DW, metadata describe the contents (structure and meaning) of a data warehouse.
9
Describe the steps of the data warehouse framework.
Data sources -> ETL process -> EDW & metadata -> data marts -> applications (visualization)
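
Study aid: the flow above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not a production pipeline: the CSV text, the `fact_orders` table, and the cleaning rules (trim whitespace, uppercase the country code, drop rows with no amount) are all hypothetical.

```python
import csv
import io
import sqlite3

# Data source: a small CSV export with inconsistent formatting.
SOURCE_CSV = "order_id,amount,country\n1, 19.99 ,us\n2,5.00,DE\n3,,us\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean values and put them in one consistent format."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop rows with a missing amount
        clean.append(
            (int(row["order_id"]), float(amount), row["country"].strip().upper())
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# Application layer: a simple report query over the loaded data.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM fact_orders GROUP BY country ORDER BY country"
).fetchall()
```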
10
Describe the 3-tier data warehouse architecture
Tier 1: client workstation
Tier 2: application server
Tier 3: database server
11
Describe the 2-tier DW architecture
client workstation -> application & database server
12
Describe a web-based data warehouse architecture
Web server is central, affecting and being affected by web pages, an application server, a data warehouse, and the client's web browser (the internet)
13
What are oper marts?
created when operations data needs to be analyzed multidimensionally
14
What are the different types of metadata?
syntactic, structural, semantic (meaning)
15
What are the three parts to data warehouse architectures?
  1. data warehouse itself

  2. Back-end software -> data acquisition -> extracts, consolidates, and loads data into the DW

  3. Client (front-end) software -> allows users to access and analyze data from the warehouse

16
What is the advantage of 3-tier architectures?
its separation of the functions of a DW, which eliminates resource constraints and makes it possible to easily create data marts
17
Web-based data warehousing
client workstation -> Internet -> web server
18
What are the alternative DW architectures?
Independent Data Marts; Data Mart Bus Arch; Hub & spoke architecture; centralized DW architecture; federated architecture
19
Describe the data mart bus architecture
individual marts linked together via some kind of middleware
20
Describe the hub-and-spoke architecture
focused on building a scalable and maintainable infrastructure; includes a centralized DW and several dependent DMs
21
Describe a centralized DW architecture
similar to hub-and-spoke, except no dependent DMs; a gigantic EDW that serves the needs of all organizational units
22
Describe a federated DW
involves integrating disparate systems; works well to supplement DWs but not replace them!
23
What are the worst DW architectures?
Independent DMs, and federated
24
What are the best DW architectures?
hub & spoke, centralized, data mart bus; depends on the situation; hub and spoke is the most expensive, but it is best for enterprise-wide implementations and larger warehouses
25
Name the main data integration technologies
Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA), Enterprise Information Integration (EII), and Extract, Transform, Load (ETL)
26
Describe Enterprise Application Integration (EAI):
A technology that provides a vehicle to integrate a set of enterprise applications.
27
Enterprise information integration (EII)
An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc.
28
What issues affect whether an organization will purchase a data transformation tool?
tools are expensive, they may have a long learning curve, and it is difficult to measure how the IT org is doing until it has learned to use the tool
29
What are the four categories of ETL technologies?
  1. Sophisticated

  2. Enabler

  3. Simple

  4. Rudimentary

30
What are the criteria for selecting an ETL tool?
  1. The ability to read from and write to an unlimited number of data source architectures

  2. Automatic capturing and delivery of metadata

  3. A history of conforming to open standards

  4. An easy-to-use interface for the developer and functional user

31
What does it mean when extensive ETL is performed?
this is a sign of poorly managed data and a fundamental lack of a coherent data management strategy
32
What are the benefits of a data warehouse?
  1. End users can perform extensive analysis in numerous ways

  2. A consolidated view of corporate data is possible

  3. Better and more timely info is possible

  4. Enhanced system performance can result

  5. Data access is simplified

33
What defines a successful DW project?
  1. Clearly defining the business objective

  2. Gathering project support from management end users

  3. Setting reasonable time frames & budgets

  4. Managing expectations

34
What are the two competing DW approaches?
Inmon model (EDW approach, top-down)
Kimball model (data mart approach, bottom-up)
35
Describe the Inmon model (EDW Approach)
Scope includes several subject areas; very expensive, difficult, and takes a long time to develop; very large; primary audience is IT professionals
36
Describe the Kimball model (Data mart approach)
bottom-up; starts with data marts focused on specific departments; much smaller and simpler than EDW approach; the primary audience is the end user
37
What are the benefits of a hosted data warehouse?
- Requires minimal investment in infrastructure
- Frees up capacity on in-house systems
- Frees up cash flow
- Makes powerful solutions affordable
- Enables solutions that provide for growth
- Offers better quality equipment and software
- Provides faster connections
- Enables users to access data from remote locations
- Allows a company to focus on core business
- Meets storage needs for large volumes of data
38
What is dimensional modeling?
A retrieval-based system that supports high-volume query access (e.g. star schema and snowflake schema)
39
Describe Star Schema.
• The most commonly used and the simplest style of dimensional modeling
• Contains a fact table surrounded by and connected to several dimension tables
40
Describe snowflake schema.
• An extension of star schema where the diagram resembles a snowflake in shape.
• In the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension being represented by a single table.
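
Study aid: the difference between the two schemas can be seen by modeling the same product dimension both ways. The sketch below uses Python's built-in `sqlite3` module; every table name, column name, and row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star: the product dimension is one denormalized table.
    CREATE TABLE dim_product_star (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Snowflake: the same dimension normalized into two related tables.
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product_snow (
        product_id INTEGER PRIMARY KEY, name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id));

    -- One fact table serves both designs.
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

    INSERT INTO dim_product_star VALUES (1, 'Pen', 'Office'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_category VALUES (10, 'Office'), (20, 'Furniture');
    INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Desk', 20);
    INSERT INTO fact_sales VALUES (1, 2.0), (1, 3.0), (2, 150.0);
    """
)

# Star schema: a single join from the fact table to the dimension.
star = conn.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
    """
).fetchall()

# Snowflake schema: an extra join to reach the normalized category table.
snow = conn.execute(
    """
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category ORDER BY c.category
    """
).fetchall()
```

Both queries return the same totals; the snowflake version stores each category name once but needs one more join to reach it.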
41
OLTP (online transaction processing)
Capturing and storing data from ERP, CRM, POS, ...
The main focus is on efficiency of routine tasks
42
OLAP (online analytical processing)
Converting data into information for decision support
Data cubes, drill-down / roll-up, slice & dice
Requesting ad hoc reports
Conducting statistical and other analyses
Developing multimedia-based applications
43
Describe the skills of a Data Warehouse Administrator
• have knowledge of high-performance software, hardware, and networking technologies
• possess solid business knowledge and insight
• be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure
• possess excellent communication skills
44
What are the future sources of data?
- Web, social media, and Big Data
- Open source software
- SaaS (software as a service)
- Cloud computing
- Data lakes
45
What are the major differences between Data Warehouses and Data Lakes?
DW: data is structured, processed using SQL, expensive, less agile, mature, well-secured, and used by business professionals

DL: data is raw and in any format, processed using NoSQL, slow, low-cost, agile, new, not well-secured, and used by data scientists
46
What is business performance management?
BPM refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance
47
What are the steps within a closed-loop process for optimized business performance?
  1. Strategize

  2. Plan

  3. Monitor/analyze

  4. Act/adjust
48
What are the common tasks for the strategic planning process?
  1. Conduct a current situation analysis

  2. Determine the planning horizon

  3. Conduct an environment scan

  4. Identify critical success factors

  5. Complete a gap analysis

  6. Create a strategic vision

  7. Develop a business strategy

  8. Identify strategic objectives and goals
49
What is a performance measurement system?
A system that assists managers in tracking the implementations of business strategy by comparing actual results against strategic goals and objectives
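
Study aid: the core of such a system is a comparison of actual results against targets. The sketch below is a hypothetical illustration; the KPI names, the figures, and the assumption that a higher value is always better are not from the source.

```python
def variance_report(targets, actuals):
    """Compare actual results with targets, one entry per KPI.

    Assumes a higher value is better for every KPI (a simplification).
    """
    report = {}
    for kpi, target in targets.items():
        actual = actuals[kpi]
        report[kpi] = {
            "target": target,
            "actual": actual,
            "gap": actual - target,   # negative means the target was missed
            "met": actual >= target,
        }
    return report


# Hypothetical figures: one outcome KPI and one driver KPI.
targets = {"revenue": 500_000, "sales_leads": 120}
actuals = {"revenue": 465_000, "sales_leads": 140}
report = variance_report(targets, actuals)
```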
50
What is an example of a lagging indicator or outcome KPI?
Revenues
51
What is an example of a leading indicator or driver KPI?
sales leads
52
What is the balanced scorecard?
A performance measurement and management methodology that helps translate an organization's financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives
53
What are the four perspectives of the balanced scorecard?
  1. Financial

  2. Customer

  3. Internal

  4. Learning and Growth

54
What does each perspective of the BSC need to have?
goals and metrics
55
What is Six Sigma?
a methodology aimed at reducing the number of defects in a business process to as close to zero defects per million opportunities (DPMO) as possible
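
Study aid: DPMO is conventionally computed as defects divided by total opportunities (units times opportunities per unit), scaled to one million. A minimal sketch, with a hypothetical function name and sample figures:

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("units and opportunities_per_unit must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return defects * 1_000_000 / total_opportunities


# Hypothetical process: 17 defects found in 1,000 units,
# each unit having 5 opportunities for a defect.
example = dpmo(defects=17, units=1000, opportunities_per_unit=5)
```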
56
What is the DMAIC performance model?
A closed-loop business improvement model that encompasses the steps of defining, measuring, analyzing, improving, and controlling a process
57
What is the difference between BSC and Six Sigma?
BSC: focuses on strategy, vision, long-term growth

Six Sigma: focuses on aggressive improvement, performance, process management and feedback, profitability
58
Why is data arranged into cubes during OLAP?
to overcome the limitation of databases, since relational databases are not well suited for near instantaneous analysis of large amounts of data (better suited for transactions)
59
What are the most commonly used OLAP operations?
slice, dice, drill-down/up, roll-up, pivot
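
Study aid: the operations can be tried on a tiny in-memory "cube" stored as a list of fact rows. This is a plain-Python sketch; the dimensions, the sample sales figures, and the function names are hypothetical, and real OLAP servers implement these operations very differently.

```python
from collections import defaultdict

# Each row is one cell of the cube: four dimensions plus a sales measure.
facts = [
    {"year": 2023, "quarter": "Q1", "region": "East", "product": "Pen", "sales": 10},
    {"year": 2023, "quarter": "Q2", "region": "East", "product": "Desk", "sales": 40},
    {"year": 2023, "quarter": "Q1", "region": "West", "product": "Pen", "sales": 20},
    {"year": 2024, "quarter": "Q1", "region": "West", "product": "Desk", "sales": 30},
]


def slice_(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, **allowed):
    """Dice: restrict two or more dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def roll_up(rows, *dims):
    """Roll-up: total sales by the given dimensions.

    Passing fewer dimensions rolls up; passing more drills down.
    """
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot: lay total sales out as a row-by-column table."""
    table = defaultdict(dict)
    for r in rows:
        cell = table[r[row_dim]]
        cell[r[col_dim]] = cell.get(r[col_dim], 0) + r["sales"]
    return dict(table)


by_year = roll_up(facts, "year")                     # coarse view
by_year_quarter = roll_up(facts, "year", "quarter")  # drill-down into quarters
east_only = slice_(facts, "region", "East")
west_pens = dice(facts, region={"West"}, product={"Pen"})
region_by_product = pivot(facts, "region", "product")
```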
60
What are the main issues pertaining to data warehouse scalability?
- amount of data in the DW
- how quickly the DW is expected to grow
- the complexity of user queries
- the # of current users
61
What is meant by "good scalability"?
that queries and other data access functions will grow (ideally) linearly with the size of the DW
62
Effective security in a DW should focus on 4 main areas
  1. Establishing effective corporate and security policies and procedures

  2. Implementing logical security procedures & techniques to restrict access

  3. Limiting physical access to the data center environment

  4. Establishing an effective internal control review process with an emphasis on security and privacy

63
The future of data warehousing:
Web, social media, & Big Data; open source software; SaaS; cloud computing; Data lake
64
What is SaaS (Software As A Service)?
A model in which a provider licenses its application to customers for use as a service on demand (usually over the internet), e.g. Google Drive
65
What is cloud computing?
The on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet, e.g. Dropbox, Gmail
66
What is a data lake?
A centralized repository that allows you to store all your structured and unstructured data at any scale.