DP-900 Reviewer

0.0(0)
studied byStudied by 4 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/139

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

140 Terms

1
New cards

Classifications of data

structured, semi-structured, unstructured

2
New cards

Structured data

  • adheres to a fixed schema; data has the same fields

  • tabular schema; consists of rows and columns, rows for each entity, columns for attributes

3
New cards

Semi-structured data

  • has some structure, but allows for some variation between entities

4
New cards

Common semi-structured data format

JSON (JavaScript Object Notation)

5
New cards

Unstructured data

  • do not have a specific structure; documents, images, audio, video, binary, etc.

6
New cards

How are attributes represented in a JSON format?

name-value pairs

7
New cards

Give three common optimized file formats

Avro, ORC (Optimized Row-Columnar) format, Parquet

8
New cards

What data types are commonly stored as binary?

images, video, audio, application-specific documents, etc.

9
New cards

Describe Avro

  • row-based

  • each record contains a header that describes data structure in the record

  • header is stored as JSON

  • data stored as binary information

10
New cards

When should you use Avro format?

if you need to compress data and minimize storage and network bandwidth requirements

11
New cards

How are headers and data stored in an Avro record?

  • headers: JSON

  • data: binary

12
New cards

Describe ORC (what is ORC?)

  • optimized row-columnar format

  • organizes data into columns

  • contains stripes of data

13
New cards

Components of a stripe (ORC data)

  • data for a column or a set of columns

  • index into the rows

  • data for each row

  • footer that contains summary statistics (count, sum, max, min, etc.) for each column

<ul><li><p>data for a column or a set of columns</p></li><li><p>index into the rows</p></li><li><p>data for each row</p></li><li><p>footer that contains summary statistics (count, sum, max, min, etc.) for each column</p></li></ul><p></p>
14
New cards

Describe Parquet

  • columnar

  • contains row groups for each column

  • each row group contains chunk/s of data

  • includes metadata for easy location of the correct chunk for a given set of rows and for easy retrieval of specific columns for the rows

<ul><li><p>columnar</p></li><li><p>contains row groups for each column</p></li><li><p>each row group contains chunk/s of data</p></li><li><p>includes metadata for easy location of the correct chunk for a given set of rows and for easy retrieval of specific columns for the rows</p></li></ul><p></p>
15
New cards

When to use Parquet

if you need to store and process nested data types efficiently

16
New cards

Compare relational and non-relational databases

  • relational - for storing and querying structured data

  • non-relational - for data without a relational schema

17
New cards

Four common types of non-relational databases

  • key-value (any format; consists of a unique key and an associated value)

  • document (JSON)

  • column family (tabular data; columns are grouped logically into families)

  • graph (stores entities as nodes with links that define relationships between nodes)

18
New cards

What does OLTP stand for?

online transactional processing

19
New cards

What are CRUD operations? Why is it important for OLTP solutions?

  • create, delete, update, and delete

  • ensures data integrity for transactional data workloads

20
New cards

What do OLTP systems enforce to ensure data integrity? (hint: ACID)

  • atomicity - each transaction is a single unit that either succeeds or fails completely; cannot be both

  • consistency - transactions can only take the data in the database from one valid state or the other

  • isolation - concurrent transactions do not interfere with one another

  • durability - when a transaction has been committed, it will remain committed

21
New cards

line of business (LOB) applications

live applications that process business data

22
New cards

T/F: Analytical data processing typically uses read-write systems.

FALSE; read-only systems

23
New cards

Differentiate data warehouses, data lakes, and data lakehouses

  • data warehouses - store cleaned and processed data in a relational scheme optimized for read operations

  • data lakes - can store all types of raw data; storage for large amount of file-based data

  • data lakehouses - combines flexibility and scalability of a data lake with querying semantics of a data warehouse

24
New cards

What is an OLAP model?

online analytical processing - aggregated data storage optimized for analytical workloads

25
New cards

How and why is data aggregated in an OLAP model?

  • how: across dimensions at different levels

  • why: to enable drilling up/down data to view aggregations at multiple hierarchical levels

  • ex. finding total sales by region, city, or for an individual addresses

26
New cards

Three key job roles that deal with data and what they do

  • database administrators - manage databases, assigns permissions to users, administers backups and restores

  • data engineers - manage infrastructure, process data for integration across organizations (applies data cleaning routines, governance rules, and implements pipelines)

  • data analysts - explore and analyze data for decision-making

27
New cards

Services under Azure SQL

  • Azure SQL database - fully-managed PaaS database in Azure

  • Azure SQL managed instance - hosted SQL server instance with automated maintenance; more flexible than Azure SQL DB but with more administrative responsibility for the owner

  • Azure SQL VM - a VM that hosts a SQL server, allowing maximum configurability with full management responsibility

28
New cards

Azure Database for MySQL

open-source DBMS commonly used in Linux, Apache, MySQL, etc.

29
New cards

Azure DB for MariaDB

newer DBMS; optimized to improve performance and offers compatibility with Oracle

30
New cards

Azure DB for PostgreSQL

hybrid relational-object database; can store data in relational tables and in custom data types with non-relational properties

31
New cards

Azure CosmosDB

  • a global-scale non-relational (NoSQL) DB that supports multiple APIs

  • can store and manage JSON documents, key-value pairs, column-families, and graphs

32
New cards

Azure Storage

  • enables data storage in

    • blob containers - for binary files (BLOB - binary large object)

    • file shares - network file shares, typically in corporate networks

    • tables - key-value storage for quick read-write operations

33
New cards

Azure Data Factory

  • Azure service for defining and scheduling data pipelines for ETL solutions

  • enables integration of pipelines with other Azure services or other cloud data stores for data ingestion, processing, and storage

34
New cards

Microsoft Fabric

unified SaaS analytics platform that combines data ingestion pipelines, data warehouses, real-time analytics, business intelligence, and AI-powered insights through a single service centrally stored with Microsoft OneLake

35
New cards

Azure Databricks

  • Azure-integration version of Databricks, which combines Apache Spark with SQL database semantics

  • integrated management interface for large-scale data analytics

36
New cards

Azure Stream Analytics

  • real-time stream processing engine

  • captures data stream from an input, applies queries to extract and manipulate data, and stores results into an output for analysis or further processing

37
New cards

Azure Data Explorer

fully managed, standalone big data analytics platform for high-performance querying of log and telemetry data

38
New cards

Microsoft Purview

for enterprise-wide data governance and discoverability to ensure data integrity for analytical workloads

39
New cards

Why is it necessary to normalize data?

to avoid data duplication and enforce data integrity

40
New cards

Primary key

unique identifiers for each row in a database table

41
New cards

Foreign keys

link data in one table to data in another table

42
New cards

Differentiate primary key and unique key

  • primary key - uniquely identifies each record in a table and cannot store NULL values

  • unique key - prevents duplicate values in a column and can store NULL values

43
New cards

Composite key

key based on a unique combination of multiple columns

44
New cards

Transact-SQL

SQL version used by Microsoft SQL and Azure SQL services

45
New cards

pg-SQL

SQL version implemented in PostgreSQL

46
New cards

PL/SQL

SQL version used by Oracle (Procedural Language/SQL)

47
New cards

SQL statements are grouped into these logical groups:

  • Data Definition Language

  • Data Control Language

  • Data Manipulation Language

48
New cards

Enumerate the four commonly used DDL statements

Data Definition Language

  • CREATE

  • ALTER

  • DROP

  • RENAME

49
New cards

[DDL] CREATE

creates new objects in the database (table, view)

50
New cards

[DDL] ALTER

modifies structure of object (ex. to add a new column)

51
New cards

[DDL] DROP

removes object from db

52
New cards

[DDL] RENAME

renames existing object

53
New cards

T/F: DROP statement removes rows in the table permanently.

TRUE

54
New cards

What should you add in a CREATE statement when creating a mandatory column?

NOT NULL

55
New cards

Enumerate the three most common DCL statements. What are DCL statements for?

Data Control Language - to manage database access by specific users or groups

  • GRANT

  • DENY

  • REVOKE

56
New cards

[DCL] GRANT

grants permissions to perform specific actions

57
New cards

[DCL] DENY

denies permissions to perform specific actions

58
New cards

[DCL] REVOKE

revokes previously granted permission

59
New cards

Enumerate the four main DML statements. What are DML statements for?

Data Manipulation Language - manipulate rows in the table; retrieve, insert, modify, and delete rows

  • SELECT

  • INSERT

  • UPDATE

  • DELETE

60
New cards

T/F: The basic form of an INSERT statement can insert multiple rows simultaneously.

FALSE; one at a time

61
New cards

T/F: SELECT, UPDATE, and DELETE statements are applied to every row in the table.

TRUE

62
New cards

Which SQL statement clause ensures that statements only apply to rows that meet specific criteria?

WHERE

63
New cards

How do you sort data retrieved by a query?

ORDER BY clause

64
New cards

How do you merge data from multiple tables?

JOIN clause; specify how rows in one table are connected with rows in the other

65
New cards

Can you assign aliases to tables?

YES, by using AS clause

ex.

SELECT o.OrderNo, o.OrderDate, c.Address, c.City
FROM Order AS o
JOIN Customer AS c
ON o.Customer = c.ID

66
New cards

Clauses in an INSERT statement

  • INTO - determines which table and columns the new entry will be stored

  • VALUES - values to be stored

ex.

INSERT INTO Product(ID, Name, Price)
VALUES (99, 'Drill', 4.99);

67
New cards

What is a view?

virtual table that is outputted by the SELECT query

68
New cards

What is a stored procedure?

a set of statements that can be executed on command

69
New cards

What is an index?

makes querying more efficient by specifying a column from the table

70
New cards

How are indexes made?

a column from a table is specified, and the index creates a sorted copy of this data, with pointers to the corresponding rows in the table. when a query is run which has a WHERE clause, the DBMS uses the index to fetch the data more quickly

71
New cards

What is a possible downside of using indexes?

it may consume storage space which can slow down insert, delete, and update operations

72
New cards

SQL Server on Azure VMs is under which service type?

IaaS

73
New cards

Azure SQL Managed Instance

a PaaS option that provides abstraction of the hardware and OS for on-premises SQL Server instances

74
New cards

Differentiate Azure SQL DB and Managed Instance

DB

  • fully managed PaaS db service designed for the cloud

  • includes core database-level capabilities of on-prem SQL server

  • You can provision a single database in a dedicated, managed (logical) server; or you can use an elastic pool to share resources across multiple databases and take advantage of on-demand scalability.

  • ideal for new cloud solutions, or to migrate applications that have minimal instance-level dependencies.

Managed Instances

  • Each managed instance can support multiple databases. Additionally, instance pools can be used to share resources efficiently across smaller instances.

  • Use this option for most cloud migration scenarios, particularly when you need minimal changes to existing applications.

75
New cards

Azure SQL Edge

optimized for IoT scenarios; for workloads requiring time-series data streaming

76
New cards

Azure SQL Managed Instance supports which types of logins?

  • SQL Server DB engine logins

  • logins integrated with Microsoft Entra ID

77
New cards

Data Migration Assistant

analyzes your SQL server databases and reports any compatibility issues when migrating to different SQL offerings

78
New cards

Azure SQL Database is available as ____ and ____

  • single database

  • elastic pool

79
New cards

[Azure SQL DB] Single database vs elastic pool

single database

  • for quick setup of a single SQL server database

  • uses pre-allocated resources and charges are made per hour of use

  • also has a serverless configuration; server is provided by MS and is shared with other Azure subscribers

elastic pool

  • multiple databases share the same pool

  • you create the pool and only your databases can use the pool

  • for use cases when you have databases with varying resource requirements

80
New cards

Business benefits of Azure SQL database

  • automatic updates

  • scalable

  • high availability (99.995%)

  • advanced threat protection (vulnerability assessments, detection of suspicious activities, etc.)

  • conducts auditing (db activity, regulatory compliance, anomaly detection, etc.)

  • provides encryption for data at rest and in motion

81
New cards

MySQL

  • open-source DBMS

  • leading RDB for Linux, Apache, MySQL, and PHP (LAMP)

  • available in three editions: community (free), standard, and enterprise

82
New cards

MariaDB

supports temporal data; think version history for tables

83
New cards

PostgreSQL

hybrid relational-object DB

  • can store relational tables

  • can also store custom data types with non-relational properties

  • DBMS is extensible via code modules run by queries

  • also supports storage and manipulation of geometric data

  • uses pgSQL

84
New cards

Azure DB for MySQL is based on which MySQL edition?

community edition

85
New cards

T/F: All features of on-prem PostgreSQL databases are available in Azure DB for PostgreSQL.

FALSE; Azure DB for PostgreSQL does not support extensions and interacting directly with the OS

86
New cards

What is Azure Blob Storage used for? How is data stored in Azure Blob Storage?

massive amounts of unstructured data; data stored as blobs (binary large objects)

87
New cards

Three types of blobs supported by Azure Blob Storage

  • block blobs

  • page blobs

  • append blobs

88
New cards

Characteristics of block blobs

  • blobs are organized into blocks and these blocks are organized into a set

  • a block is the smallest amount of data that can be read/written as an individual unit

89
New cards

Characteristics of page blobs

  • a page blob is organized as a collection of fixed size 512-byte pages

  • optimized to support random read-write operations

  • can hold up to 8TB of data

  • used for implementing virtual disk storage for VMs

90
New cards

Characteristics of append blobs

  • block blobs optimized for append operations

  • can only add blocks to the end of an append blob

    • blocks can be 4MB in size and up to ~195 GB

91
New cards

T/F: It is possible to update or delete existing blocks in an append blob.

FALSE; not supported

92
New cards

Three Blob storage tiers

  • hot - used by default; for blobs that are accessed frequently

  • cool - has lower performance and storage charges than hot; for data accessed infrequently

  • archive - lowest storage cost with highest latency; for historical data that is needed but accessed rarely

93
New cards

T/F: You can migrate a blob from the hot to the cool tier but not vice versa.

FALSE; possible both ways

94
New cards

T/F: Blobs in the archive tier are stored online.

FALSE; offline state, that is why latency is high

95
New cards

What is rehydration in Azure Blob storage?

changing the tier from archive to hot/cool

96
New cards

How do you create an Azure Data Lake Store Gen2 files system?

enable the hierarchical namespace option of an Azure Storage Account

97
New cards

T/F: You can downgrade a storage account to disable hierarchical namespace support for blob storage.

FALSE; upgrade is one-way

98
New cards

What do you need to create an Azure File Storage?

Azure Storage Account

99
New cards

How much data can you store in Azure Files for a single storage account?

100TB of data

100
New cards

Two performance tiers of Azure File Storage

  • standard

  • premium