[MSYS 140] Module 2

44 Terms

1

Data Life Cycle

The data lifecycle is based on the product life cycle

1. Plan

2. Design & Enable

3. Create / Obtain

4. Store / Maintain

5. Dispose of*

6. Use

7. Enhance

2

Data Storage and Operations

The design, implementation, and support of stored data to maximize its value.

● Primary Goals:

  • Manage availability of data throughout the data lifecycle.

  • Ensure the integrity of data assets.

  • Manage performance of data transactions.

● Crucial for the continuity of operations of businesses relying on data.

  • If a system becomes unavailable, company operations may be impaired or stopped completely.

  • A reliable data storage infrastructure for IT operations minimizes the risk of disruption.

3

Database

Any collection of stored data, regardless of structure or content.

4

Instance

An execution of database software controlling access to a certain area of storage.

5

Schema

A subset of the database objects contained within a database or an instance. Used to organize objects into more manageable parts.

6

Node

An individual computer hosting either processing or data as part of a distributed database.

7

Database Abstraction

Means that a common application programming interface (API) is used to call database functions, so that an application can connect to multiple different databases.
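
The idea can be sketched in a few lines of Python. This is only an illustration under assumed names: `KeyValueStore`, `MemoryStore`, `SqliteStore`, and `register_student` are hypothetical and not part of the course material. The application function calls the same two methods regardless of which backend sits behind them.

```python
import sqlite3
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical abstraction layer: the only API the application sees."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class MemoryStore:
    """Backend 1: a plain in-memory dictionary."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class SqliteStore:
    """Backend 2: a SQLite database reached through the same two calls."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )

    def put(self, key: str, value: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def register_student(store: KeyValueStore, student_id: str, name: str) -> Optional[str]:
    """Application code: written once, unaware of which database is used."""
    store.put(student_id, name)
    return store.get(student_id)
```

Swapping `MemoryStore()` for `SqliteStore()` in the call to `register_student` requires no change to the application function, which is the point of the abstraction.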

8

Database Architecture Types

knowt flashcard image
9

Centralized Databases

  • Have all the data in one system in one place.

  • If the centralized system is unavailable, there are no other alternatives for accessing the data.

  • Ideal for data security

  • Not ideal for data accessibility

10

Distributed Databases

  • Provide quick access to data over a large number of nodes.

  • Designed to scale out from single servers to thousands of machines, each offering local computation and storage.

  • Federated or non-federated, depending on the view it provides its users.

  • Blockchain is an example of a distributed federated database commonly used to store financial transactions.

11

Federated vs Non-Federated Database

Describes the view a distributed database gives its users: a cohesive, integrated view of the dispersed data (federated) or a more decentralized view (non-federated).

Blockchain is given as an example of a distributed federated database that is commonly used to store financial transactions.

  • Federated: Like a travel aggregator (Skyscanner, Expedia). You search once, and it fetches results from multiple airlines and hotels without moving their data into one place.

  • Non-Federated: Like a single airline’s booking site. All data is already managed and stored in one system.

12

Database Organization Types

knowt flashcard image
13

Hierarchical Databases

Oldest and most rigid database model, used in early mainframe DBMS.

  • Data is organized into a tree-like structure with mandatory parent/child relationships

  • Each parent can have many children, but each child has only one parent
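
A minimal sketch of the one-parent rule, using a hypothetical `Segment` class (the class and its method names are assumptions for illustration, not a real hierarchical DBMS API):

```python
from __future__ import annotations

from typing import Optional


class Segment:
    """A record in a tree: many children allowed, exactly one parent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional[Segment] = None
        self.children: list[Segment] = []

    def add_child(self, child: Segment) -> Segment:
        # Enforce the hierarchical rule: a child cannot have two parents.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up the single chain of parents to the root."""
        parts = []
        node: Optional[Segment] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))
```

Because every record has one parent, there is exactly one path from the root to any record, which is what makes the model rigid.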

14

Relational Databases

Row-oriented, where tables in the database are sets of relations with identical structure.

  • Predominant choice in storing data that constantly changes

  • Set operations (like union, intersect, and minus) are used to organize and retrieve data from relational databases, in the form of Structured Query Language (SQL).
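
The set operations named above can be tried with Python's built-in `sqlite3` module. The table names and rows are made up for illustration. Note that SQLite spells the "minus" operation `EXCEPT`; `MINUS` is the keyword used by some other database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE enrolled_2023 (student TEXT);
    CREATE TABLE enrolled_2024 (student TEXT);
    INSERT INTO enrolled_2023 VALUES ('Ana'), ('Ben'), ('Cara');
    INSERT INTO enrolled_2024 VALUES ('Ben'), ('Cara'), ('Dan');
    """
)


def names(sql: str) -> list:
    """Run a query and return the first column, sorted for stable output."""
    return sorted(row[0] for row in conn.execute(sql))


# UNION: students enrolled in either year (duplicates removed).
in_either = names(
    "SELECT student FROM enrolled_2023 UNION SELECT student FROM enrolled_2024"
)

# INTERSECT: students enrolled in both years.
in_both = names(
    "SELECT student FROM enrolled_2023 INTERSECT SELECT student FROM enrolled_2024"
)

# EXCEPT (minus): students enrolled in 2023 but not in 2024.
only_2023 = names(
    "SELECT student FROM enrolled_2023 EXCEPT SELECT student FROM enrolled_2024"
)
```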

15

Non-relational Databases

May be row-oriented, but this is not required.

  • Stores data as simple strings or complete files

  • Employs a less constrained consistency model for storage and retrieval of data

  • Commonly called NoSQL (which stands for “Not Only SQL”).

The primary differentiating factor is the storage structure itself: the data structure is no longer bound to a tabular relational design. It could be a tree, a graph, a network, or a key-value pairing.

16

CAP Theorem

The theorem asserts that a distributed system cannot comply with all parts of ACID at all times.

The larger the system, the lower the compliance. A distributed system must instead trade off between three properties: consistency, availability, and partition tolerance.

17

Consistency

The system must operate as designed and expected at all times

18

Availability

The system must be available when requested and must respond to each request.

19

Partition Tolerance

The system must be able to continue operations during occasions of data loss or partial system failure.

20

ACID Databases

Database Processing Types

  • Atomicity: All operations are performed, or none of them is, so that if one part of the transaction fails, then the entire transaction fails.

  • Consistency: The transaction must meet all rules defined by the system at all times and must void half-completed transactions.

  • Isolation: Each transaction is independent unto itself.

  • Durability: Once complete, the transaction cannot be undone.
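
Atomicity and consistency can be observed with Python's built-in `sqlite3` module. The sketch below assumes a made-up `accounts` table whose `CHECK` constraint plays the role of a "rule defined by the system". A transfer that would overdraw an account fails on its second step, and the first step is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money as one transaction: both updates happen, or neither."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit
        # to dst made in the first statement has been undone as well.
        return False


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer("alice", "bob", 500)` returns `False` and leaves both balances unchanged, while `transfer("alice", "bob", 30)` succeeds and updates both rows together.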

21

BASE Databases

  • Basically Available: Guarantees some level of availability to the data even when there are node failures.

  • Soft State: The data is in a constant state of flux; while a response may be given, the data is not guaranteed to be current.

  • Eventual Consistency: The data will eventually be consistent through all nodes and in all databases, but not every transaction will be consistent at every moment.
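
A toy simulation can make soft state and eventual consistency concrete. This is a deliberately simplified sketch: the `Cluster` and `Replica` classes are hypothetical, and real systems propagate updates over a network rather than through an in-process queue.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict = {}


class Cluster:
    def __init__(self, replica_names: list) -> None:
        self.replicas = [Replica(n) for n in replica_names]
        self.pending: deque = deque()  # updates not yet applied everywhere

    def write(self, key: str, value) -> None:
        # The write is acknowledged after reaching one replica only;
        # the others receive it later (soft state).
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, replica_index: int, key: str):
        # Always answers (basically available), but may be stale.
        return self.replicas[replica_index].data.get(key)

    def sync(self) -> None:
        # Background propagation: afterwards all replicas agree.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value
```

Between `write` and `sync`, a read from the second replica returns an outdated answer; after `sync`, every replica returns the same value.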

22

Database Processing Types

knowt flashcard image
23

Database Environments

24

Production Environment

  • ‘Real’ environment from a business perspective

  • It is where all actual business processes occur

  • Mission-critical to the business: if this environment ceases to operate, business processes will stop.

  • Should not be used for development and testing

25

Pre-Production Environment

  • Used to develop and test changes before such changes are introduced to the production environment

  • Issues with changes can be detected and addressed without affecting normal business processes.

  • Must closely resemble the production environment.

  • Common types: development, test, support, and special use environments.

26

Sandboxes or Experimental Environment

  • Used to experiment with development options and test hypotheses about data from production

  • Provides quick validation of ideas and options for changes to the system.

  • Used when performing a proof of concept.

  • Should never write back to the production systems

27

Database Type According to

knowt flashcard image
28

Database Administrators

DBAs are the most established and the most widely adopted data professional role.

  • Provide support for development, testing, Quality Assurance, and special use database environments.

  • Companies with large operations assign DBAs specific roles according to different database environments and use cases.

29

Production DBAs

Data Operations Management

  • Ensures the performance and reliability of the database, through performance tuning, monitoring, error reporting, etc.

  • Implements measures for:

    • Backup and recovery mechanisms

    • Clustering and failover of the database

    • Archiving data

30

Procedural and Development DBA

Procedural: specializes in the development and support of procedural logic controlled and executed by the DBMS (stored procedures, triggers, and user-defined functions).

Development: focuses on data design activities, including creating and managing special-use databases such as ‘sandbox’ or exploration areas.

These two roles are usually combined into one position.

31

Application DBA

Focuses on the databases for one or more specific applications, so that the DBA can provide better service to application developers.

  • Responsible for one or more databases in all environments, all related to the specific application.

32

Network Storage Administrators

Concerned with the hardware and software supporting data storage arrays

33

Database Administrators Positioning

knowt flashcard image
34

Archiving

Process of moving data off immediately accessible storage media and onto media with lower retrieval performance

35

Capacity and Growth Projections

Determining the capacity of the database means deciding on the finite amount of storage that would be utilized for the business.

Growth projections pertain to how quickly the storage must increase to meet the demands of the business.
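
As a worked illustration, a growth projection can be computed under the assumption of a constant monthly growth rate. That assumption, the function names, and the numbers used are all illustrative; real projections should be based on measured usage patterns.

```python
def projected_size_gb(current_gb: float, monthly_growth_rate: float, months: int) -> float:
    """Compound growth: size after `months` at a fixed monthly rate."""
    return current_gb * (1 + monthly_growth_rate) ** months


def months_until_full(current_gb: float, capacity_gb: float, monthly_growth_rate: float) -> int:
    """Count whole months until the projected size reaches the capacity."""
    if monthly_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    months = 0
    size = current_gb
    while size < capacity_gb:
        size *= 1 + monthly_growth_rate
        months += 1
    return months
```

For example, a 100 GB database growing 10% per month is projected at about 121 GB after two months, and `months_until_full(100, 200, 0.10)` reports that a 200 GB allocation would be reached in the eighth month.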

36

Change Data Capture (CDC)

Process of detecting that data has changed and ensuring that information relevant to the change is stored appropriately
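
One simple way to detect changes is to compare two snapshots of a table. The sketch below assumes each snapshot is a dictionary keyed by primary key; the function name is hypothetical. Production CDC tools more commonly read the database's transaction log or rely on timestamps, so this is only the most basic variant.

```python
def capture_changes(previous: dict, current: dict) -> list:
    """Return (operation, key, row) tuples describing what changed."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes
```

The returned list is the "information relevant to the change" that a downstream system, such as a data warehouse load, would then store or apply.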

37

Purging

Process of completely removing data from storage media such that it cannot be recovered

38

Replication

Storing the same data on multiple storage devices to make data highly available.

39

Resiliency and Recovery

Resiliency in databases is the measurement of how tolerant a system is to error conditions.

Recovery is the process of resuming a function after it has crashed.

  • Increasing the resilience of data processing systems means being able to:

    • trap and re-route data causing errors,

    • detect and ignore data causing errors,

    • implement flags in processing for completed steps, so that a restarted process does not repeat them.
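
The techniques listed above might look like this in a small batch job. It is a sketch under assumptions: `process_batch`, the `completed` set, and the choice of which exceptions count as "data causing errors" are all illustrative.

```python
def process_batch(records, transform, completed=None):
    """Process (record_id, payload) pairs without letting bad data stop the run.

    `completed` holds the IDs already processed; passing it back in after a
    crash lets the job resume without repeating finished work.
    """
    completed = set() if completed is None else completed
    results = {}
    rejected = []
    for record_id, payload in records:
        if record_id in completed:
            continue  # flagged as done in an earlier run: skip it
        try:
            results[record_id] = transform(payload)
        except (ValueError, TypeError) as exc:
            # Trap the bad record and re-route it for later inspection
            # instead of aborting the whole batch.
            rejected.append((record_id, payload, str(exc)))
            continue
        completed.add(record_id)  # flag this step as completed
    return results, rejected, completed
```

Running the batch a second time with the returned `completed` set reprocesses only the records that were rejected or never reached.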

40

Retention

Refers to how long data is kept available. Data retention planning should be part of the physical database design.

  • Retention requirements also affect capacity planning.

41

Sharding

Process where small chunks of the database are isolated and can be updated independently of other shards, so replication is merely a file copy
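
A minimal sketch of the idea, assuming hash-based routing (one common way to assign records to shards, though not the only one). `ShardedStore` and its methods are hypothetical names, and each shard is modeled as a plain dictionary.

```python
import zlib


class ShardedStore:
    def __init__(self, shard_count: int) -> None:
        # Each shard is an independent chunk of the data.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key: str, value) -> None:
        # Only one shard is touched; the others are unaffected.
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str):
        return self.shards[self.shard_for(key)].get(key)

    def copy_shard(self, index: int) -> dict:
        # Replicating a shard amounts to copying that one chunk.
        return dict(self.shards[index])
```

Because a key always maps to the same shard, an update never needs to coordinate with the other shards.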

42

Data Storage and Operations Activities

1. Database Technology Support

  • selecting and maintaining the software that stores and manages the data.

2. Database Operations Support

  • specific to the data and processes that the software manages.

43

Database Technology Support

Understand Database Technology Characteristics

  • Understanding how technology works, and how it can provide value

  • DBAs and Database Architects combine their knowledge of available tools with the business requirements in order to suggest the best possible applications of technology to meet organizational needs.

  • Data professionals must first understand the characteristics of a candidate database technology before determining which to recommend as a solution.

Evaluate Database Technology

Manage and Monitor Database Technology

  • DBAs should have working knowledge of application development skills, such as data modeling, use-case analysis, and application data access.

  • The DBA will be responsible for ensuring databases have regular backups and for performing recovery tests.

  • When a business requires new technology, the DBAs will work with business users and application developers to ensure the most effective use of the technology

44

Database Operations Support

Understand Requirements

  • Define Storage Requirements

  • Identify Usage Patterns

  • Define Access Requirements

Plan for Business Continuity

In the event of disaster or adverse event, DBAs must make sure a recovery plan exists for all databases and database servers.

Each database should be evaluated for criticality so that its restoration can be prioritized

  • Make Backups

  • Recover Data

Develop Database Instances

a. Manage the Physical Storage Environment

b. Manage Database Access Controls

c. Create Storage Containers

d. Implement Physical Data Models

e. Load Data

f. Manage Data Replication

Manage Database Performance

a. Set Database Performance Service Levels

b. Manage Database Availability

c. Manage Database Execution

d. Maintain Database Performance Service Levels

e. Maintain Alternate Environments

Manage Test Data Sets

  • Test data is data that has been specifically identified to test a system.

  • It can be generated from production data that was filtered or aggregated to create multiple sample data sets, depending on the need, but with masked identifiers.

  • Test data may be produced by the tester, by a program or function that aids the tester, or by a copy of production data that has been selected and screened for the purpose
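
Deriving a masked sample from production data could be sketched as follows. The field name `customer_id`, the sampling rule, and the salt are assumptions for illustration. A salted hash is used here only to show the idea of masking; it is not a complete anonymization technique.

```python
import hashlib


def mask_identifier(value: str, salt: str = "demo-salt") -> str:
    """Replace a real identifier with a stable, unrecognizable token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "ID-" + digest[:12]


def make_test_rows(production_rows: list, id_field: str = "customer_id", keep_every: int = 2) -> list:
    """Filter production rows down to a sample and mask their identifiers."""
    sample = production_rows[::keep_every]  # simple filter: every Nth row
    return [
        {**row, id_field: mask_identifier(str(row[id_field]))}
        for row in sample
    ]
```

The same input identifier always yields the same token, so relationships between rows in the test set are preserved even though the real identifier is hidden.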

Manage Data Migration

  • Data migration is the process of transferring data between storage types, formats, or computer systems, with as little change as possible.

  • Data migration occurs for a variety of reasons, including server or storage equipment replacements or upgrades, website consolidation, server maintenance, or data center relocation

  • Automated and manual data remediation is commonly performed in migration to improve the quality of data, eliminate redundant or obsolete information, and match the requirements of the new system