D426 Database Management Chapter 6: Database Architecture

0.0(0)

Studied by 0 people

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/97

Earn XP

Description and Tags

database management

data management

data

sql

mysql

wgu

d426

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

98 Terms

New cards

architecture

describes the components of a computer system and the relationships between those components

New cards

What are the four layers of MySQL components?

Tools
query processor
storage engine/storage manager
file system

New cards

What information does MySQL Enterprise Monitor collect, and what do DBAs use it for?

CPU, memory, index utilization, and information about queries and results. This can be used to manage and tune large databases with many users.

New cards

What does Audit track, and what kinds of entities does it support?

The time that any change is made to the database and who made the change. This supports sensitive databases such as finance, medical, and defense that might receive audits from government or other businesses.

New cards

What are the five categories of utility programs?

installation
client
administrative
developer
miscellaneous

New cards

Most utility programs are intended for __.

database administrators or programmers

New cards

What is the difference between a Connector and an API in MySQL?

Connectors are newer versions of APIs that are mostly built on top of APIs and developed directly by Oracle as opposed to a third party.

New cards

Name the two main functions of a query processor.

Manage connections between tools and the query processor.
Compile queries and generate execution plans.

New cards

execution plan

a detailed, low-level sequence of steps that specify exactly how to process a query

New cards

query parser

checks each query for syntax errors and converts valid queries to an internal representation

New cards

query optimizer

reads the internal representation of a query, generates alternative execution plans, estimates execution times, and selects the fastest plan

New cards

What does a query optimizer base its estimates of query time on?

Heuristics and statistics about data, like the number of rows in each table and the number of values in each column, which it pulls from the data dictionary.

New cards

cache manager

a component of the query processor layer that stores reusable information in main memory, such as execution plans for common queries or even query results if the data does not change

New cards

Name the two main functions of the storage engine layer.

transaction management
data access

New cards

What does the data access component do?

Communicate with the file system and translate table, column, and index reads into block addresses.

New cards

buffer manager

retains data blocks from the file system for possible reuse in order to reduce data access time

New cards

On the storage engine layer, the buffer manager is the equivalent of the __ on the query processor layer.

cache manager

New cards

How does InnoDB handle the limited amount of memory in the buffer manager?

Using a least recently used (LRU) algorithm, which tracks the time each block was last accessed and, when space is needed, discards ‘stale’ blocks. A block that has been updated gets saved to disk before deletion.

New cards

What three components are included in the transaction manager?

concurrency system
recovery system
lock manager

New cards

Name the three types of data that a file system contains for each database.

user data
log files
data dictionary

New cards

What are the tiers in a multi-tier architecture?

Grouped layers of computers linked by a network and arranged in a hierarchy.

New cards

The top tier of a multi-tier architecture consists of __.

computers interacting directly with end-users

New cards

The bottom tier of a multi-tier architecture consists of __.

servers managing resources like databases and email

New cards

What functions do the middle tiers of a multi-tier architecture execute?

A variety, such as user authorization, business logic, and communication with other computers.

New cards

web architecture

a multi-tier architecture consisting of web browsers and web servers communicating over the internet

New cards

Name the three broad categories of cloud services.

Infrastructure as a service
Platform as a service
Software as a service

New cards

Infrastructure-as-a-Service (IaaS)

provides computer processing, memory, and storage media, as if the customer were renting a computer

ex: Elastic Compute Cloud (EC2) from AWS

New cards

Platform-as-a-service (PaaS)

provides tools and services, such as databases, application development tools, and messaging services.

ex: Microsoft Azure

New cards

Software-as-a-service (SaaS)

provides complete applications, usually through web browsers on customer machines

ex: Salesforce, Google Drive

New cards

virtual machine (VM)

a software layer that emulates a complete, independent computing environment

New cards

parallel computer

consists of multiple processors managed by a single operating system instance and can achieve faster processing speeds by processing multiple instructions concurrently

New cards

Name the 3 categories of parallel computers.

shared memory
shared storage
shared nothing

New cards

What is the benefit of shared memory for data management?

It is optimal for parallel processing against a common data set stored within a single memory space.

New cards

node

a single computer in a group connected be either LAN or WAN

New cards

cluster

a group of nodes connected by LAN, managed by separate operating system instances, and coordinated by specialized cluster management software.

New cards

What is the difference between a cluster and a parallel computer?

Computers in a cluster can share storage, but not memory because LAN connections are too slow to support memory access.

New cards

parallel database

runs on a parallel computer or a cluster

New cards

distributed database

runs on multiple computers connected by a wide area network

New cards

distributed transaction

a transaction that updates data on multiple nodes of a distributed database

New cards

What are the steps of a two-phase commit?

A central transaction coordinator notifies all participating nodes of the required updates.
Participating nodes receive the notification, store the update in a local log, and send a confirmation message to the transaction coordinator.
After receiving confirmation from all participating nodes, the transaction coordinator instructs all nodes to commit.
Participating nodes receive the commit message, commit the update to the database, and notify the transaction coordinator of success.

New cards

local transaction

updates data on a single node of a distributed database

New cards

What is the downside of local transactions?

They can create temporary inconsistency in a distribute database, as nodes are updated at different times.

New cards

network partition

forms when a network error prevents nodes from communicating

New cards

CAP theorem

states that a distributed database cannot simultaneously be Consistent, Available, and Partition-tolerant; it can guarantee two at most of these properties

New cards

available (database)

‘live’ nodes must respond to queries at all times, regardless of the state of other nodes

New cards

consistent (database)

conforms to all rules at all times across all nodes that these rules apply to

New cards

Most distributed databases must always function and are therefore __.

partition-tolerant

New cards

partition-tolerant

when a database can continue to function if a network partition occurs

New cards

replicated database

maintains two or more replicas on separate storage devices

New cards

What are the advantages of replicated databases?

High availability in case of storage device failures.
Fast concurrent reads by decomposing large queries into smaller queries that read separate replicas in parallel.
Local reads for distributed databases.

New cards

What is the major disadvantage of replicated databases?

Slow or inconsistent updates.

New cards

How is security enhanced for a replicated database?

Updates can be restricted to a single replica, accessible only to trusted database users, and then propagated to read-only replicas across the userbase.

New cards

How do replicated databases simplify backups?

One replica can be backed up while transactions execute against other replicas, meaning that the database is more consistently available.

New cards

How do replicated databases complicate server administration?

DBAs have to figure out how to propagate updates across replicas.

New cards

storage arrays

a type of storage device that can manage database replicas internally, without database intervention

New cards

What happens if the primary node fails when updating a database with the primary/secondary technique?

The database automatically designates a new primary node to ensure continued availability.

New cards

primary/secondary technique

one node is designated as a primary node that receives updates first in local transactions, and all other nodes receive the update only after the primary node commits

New cards

group replication technique

applies updates to any node in a group, which then broadcasts transaction information to other nodes so that they can search for conflicts with concurrent transactions before any node commits the update

New cards

What happens if a node in a group replication system detects a conflict between concurrent transactions?

A predetermined algorithm selects which transaction commits and rolls back any others.

New cards

Between group replication, distributed transactions, and primary/secondary technique, which is partition-tolerant, and which is always consistent?

Primary/secondary technique is partition-tolerant because it updates only the primary node independent of secondary nodes and assigns a new primary node if the original is unavailable. Distributed transactions are always consistent because they update data on multiple nodes within one transaction.

New cards

central catalog

the entire catalog of a distributed database resides on a single node

New cards

What are the pros and cons of a central catalog?

Pros: it is easy to manage

Cons:
- remote nodes must access via WAN, which may be slow or unreliable
- there may be a bottleneck at the central catalog

New cards

replicated catalog

a catalog for a distributed database that has a copy on each node

New cards

What are the pros and cons of a replicated catalog?

Pros: most queries are fast and reliable since the catalog data is all available locally

Cons: statements that update the catalog will increase network traffic and may fail in a distributed transaction if any replica is unavailable

New cards

Catalog updates are __ compared to other database queries.

infrequent

New cards

operational data

used to conduct daily business functions

Ex: sales invoices, student test scores, driving violation records

New cards

analytic data

used to understand, manage, and plan the business

Ex: sales totals by region, average student grades over time, driving violation counts by ZIP code

New cards

Analytic data is sometimes called __.

reporting data or decision support data

New cards

What are four ways in which operational and analytic data differ?

Volatility - operational data changes in real time and analytic data is updated at fixed intervals.
Detail - operational data reflects individual transactions and analytic data is summarized by business dimensions like time period or geography
Scope - operational databases supporting different functions are often incompatible, while analytic databases combine this data with standard formats, data types, and keys
History - operational databases have primarily current data, while analytic databases contain both current and historic data in order to track trends over time

New cards

What problems does storing operational and analytic data in the same database create?

The two types of data have different design requirements.
Analytic queries on an operational database can interfere with business operations.
The volatility of operational databases creates analytic queries with uncertain reference times and misleading results.

New cards

data mart

a data warehouse designed for a specific business area, such as sales, HR, or product development

New cards

Name the 5 steps of the ETL process.

Extract data to the staging area.
Cleanse data to eliminate errors, unusual spellings, and incorrect data.
Integrate data into a uniform structure.
Restructure data into a design optimized for analytic queries.
Load data to the data warehouse.

New cards

List 3 examples of ETL tools

PowerCenter from Informatica
SQL Server Integration Services from Microsoft
Oracle Data Integrator

New cards

fact table

contains numeric data used to measure business performance where each row consists of numeric fact columns and foreign keys that reference dimension tables

New cards

dimension tables

contain textual data that describes the fact data

New cards

Why are fact tables easy to maintain?

Because their primary keys are made up of foreign keys that consist of small, meaningless integers that never change.

New cards

dimension hierarchy

a sequence of columns in which each column has a one-many relationship to the next column

New cards

type 2 design for slowly changing dimensions

a method of tracking historical data that adds start and end date foreign keys to a fact table

New cards

What are high-level dimensions of value for a BI program?

Financial value
Productivity value
Trust value
Risk value

New cards

knowledge discovery (data mining)

the process of discovering patterns that lead to actionable knowledge from large data sets through one or more traditional data mining techniques, such as market basket analysis and clusering

New cards

Explain the difference between clustering and classification.

In clustering tasks, the classes are not defined beforehand, but are discovered in the clustering process.

New cards

in-memory database

a database that stores data in main memory instead of, or in addition to, storage media

New cards

Describe SQL Server In-Memory OLTP

An extension to SQL Server by Microsoft that supports in-memory tables with the same transaction and recover options as storage media tables.

New cards

Oracle Database In-Memory

Creates in-memory copies of tables with columnar organization to optimize analytic queries while the source data remains on storage media, grouped by rows in blocks.

New cards

What storage engines allow in-memory databases for MySQL?

MEMORY can create temporary in-memory tables that do not support transactions or recovery. NDB Cluster supports transactions, recovery, and distributed data for in-memory tables.

New cards

embedded/in-process database

a database that is packaged with a programming language and used in single-user applications that require no database administration, such as apps for mobile devices

New cards

SQLite

an open source relational database that supports all major programming languages and is the dominant embedded relational database

New cards

federated database

a collection of two or more participating databases underneath a coordinating software layer, each of which is autonomous and heterogeneous

New cards

autonomous database

operates independently of other participating databases in a coordinating software layer

New cards

heterogeneous database

either run under different database systems or have incompatible schema, such as inconsistent primary and foreign keys or similar columns with different names and data types

New cards

middleware

the coordinating software layer in a federated database

New cards

What components are common to most federated database middleware products?

global catalog
global query processor
database wrapper - converts decomposed queries to the appropriate syntax for each participating database

New cards

SQL/MED

an extension of the SQL standard for federated database that adds constructs such as nicknames for participating database objects and user mapping, which associates a federated database user with a participating database user

New cards

What are the main benefits of a federated database over a distributed database or a data warehouse?

It is easier to build and may be the only practical option to combine data from existing, incompatible databases.

New cards

Name two prominent examples of federated database middleware.

InfoSphere Federated Server from IBM
WebLogic Server from Oracle

New cards

How is a data lake similar to a data warehouse?

It is a separate database designed for analytic queries and consisting of data extracted from multiple sources.

New cards

How is a data lake similar to a federated database?

The data in a lake is not cleansed, integrated or restructured, and it may be loaded continuously rather than at fixed intervals, depending on the source.

New cards

Because formulating and understanding queries is more difficult with a data lake than a data warehouse, lakes are more suitable for __ than business analysts.

data scientists