Which option should be enabled to recover only previously deleted directories in Azure Data Lake Storage Gen2?
Soft delete for blobs
A company has files stored on Azure Data Lake Storage that need to be scanned by Azure Purview with the option of detecting the data types of the data.
Which two file types should be used?
CSV
JSON
An organization loads data from different systems to Azure Blob Storage.
After a team processes the data, the data is rarely accessed and should be moved to a low-cost, offline location.
Which access tier should be used for this offline data?
Archive
An organization needs to design an automatic life cycle management policy for handling files residing on Azure Blob Storage.
What is the least expensive tier for storing data rarely accessed?
Archive
A company requires a service that must incrementally migrate new files between blob containers based on time partitioned file names.
Which Azure solution should be used to implement this partitioning strategy?
Data Factory
What is a requirement to split a partition on a table in a dedicated SQL pool of Azure Synapse Analytics that has a clustered columnstore index?
The partition must be empty
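A hedged sketch of the split, with a hypothetical table, partition number, and boundary value; on a clustered columnstore table the partition being split must first be emptied, for example by switching it out to an identically partitioned staging table:

```sql
-- Assumes dbo.FactSales is partitioned on an int date key and has a
-- clustered columnstore index; names and values are illustrative.

-- 1. Empty the affected partition by switching it to a staging table
--    with an identical structure; SPLIT fails on a non-empty partition.
ALTER TABLE dbo.FactSales SWITCH PARTITION 4
    TO dbo.FactSales_staging PARTITION 4;

-- 2. Add the new boundary; the partition is now empty, so this succeeds.
ALTER TABLE dbo.FactSales SPLIT RANGE (20240101);

-- 3. Reload or switch the staged rows back in afterwards.
```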
An organization maintains a logical data warehouse using a serverless SQL pool in Azure Synapse Analytics. A data engineer creates an external table against CSV files in a data lake. The engineer must ensure that statistics are up to date on all required columns to improve performance.
Which action should be taken to maintain up-to-date statistics?
Enable automatic statistics update
An organization wants to keep the frequently accessed data fields in Azure Data Lake Storage separate.
Which partitioning strategy should be used?
Vertical
What describes a hierarchy in a tabular model?
Metadata that defines relationships between two or more columns
Which solution should be used to design an external metastore on Azure Databricks?
Azure SQL Database
Which compression method should be used to compress data using the GZIP algorithm in an Azure SQL Database?
COMPRESS function
An Azure Synapse Analytics SQL pool stores data in a fact table. An aggregated subset of the data must be exported to an Azure Data Lake Storage.
Which statement will fulfill this requirement?
CREATE EXTERNAL TABLE
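A hedged sketch of CREATE EXTERNAL TABLE AS SELECT (CETAS) exporting an aggregate to the lake; the data source, file format, and table names are illustrative and assume those objects already exist:

```sql
CREATE EXTERNAL TABLE ext.FactSalesAggregated
WITH (
    LOCATION = '/aggregated/sales/',
    DATA_SOURCE = MyDataLake,         -- hypothetical external data source
    FILE_FORMAT = ParquetFileFormat   -- hypothetical file format
)
AS
SELECT RegionKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY RegionKey;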
Which characteristics of the archive tier should be considered when implementing data archiving on Azure Blob Storage?
Choose 2 answers.
High access cost
High latency
Which step is necessary to move data to online tiers in Azure Blob Storage?
Rehydrate
Which type of Slowly Changing Dimension (SCD) overwrites the table data when changes occur?
Type 1
Which Azure Synapse Analytics feature provides a visual way of specifying how to populate a Slowly Changing Dimension (SCD)?
Mapping Data Flow
A data engineer designs a logical data warehouse using a serverless SQL pool in Azure Synapse Analytics. Data must be presented with a predefined schema.
Which statement should be used to define the schema?
CREATE EXTERNAL TABLE
An organization is creating an external table on a serverless SQL pool in Azure Synapse Analytics. The table should point to a delimited text file on Azure Data Lake Storage Gen2. The file format options for the table should be configured as follows:
• The header row of the file should be skipped.
• Missing values in date columns should contain the value "1900-01-01".
Which two configurations should be implemented?
FIRST_ROW = 2
USE_TYPE_DEFAULT = TRUE
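Both options can be sketched in an external file format definition (names illustrative, as used with PolyBase-style external tables); FIRST_ROW = 2 skips the header row, and USE_TYPE_DEFAULT = TRUE replaces missing values with the column type's default, which is 1900-01-01 for dates:

```sql
CREATE EXTERNAL FILE FORMAT CsvWithHeader
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        FIRST_ROW = 2,          -- skip the header row
        USE_TYPE_DEFAULT = TRUE -- missing dates become 1900-01-01
    )
);
```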
Which BULK parameter value for the OPENROWSET function should be used to read all files from folders and subfolders in the CSV/TAXI path when using a serverless SQL pool of Azure Synapse Analytics?
'CSV/TAXI/*'
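As a hedged sketch of the query (the data source name is illustrative and assumed to point at the storage account root):

```sql
SELECT *
FROM OPENROWSET(
    BULK 'CSV/TAXI/*',
    DATA_SOURCE = 'MyDataLake',  -- hypothetical data source
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS taxi_rows;
```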
Which scenario should use Auto Loader for loading data into Delta Lake?
Source that contains millions or more files
An orders fact table contains trillions of records and it is queried by a deterministic column each time. Which table configuration should be used for this table?
Hash
Which solution should be used to remove duplicate data from an Azure SQL Database instance with minimal development?
Azure Data Factory pipeline code snippet
Which table configuration option in a dedicated SQL pool in Azure Synapse Analytics should be used to increase the speed of data loading times?
Heap
An organization has an Azure SQL Database and plans to use the Data Discovery & Classification feature to classify data. Which metadata setting should be used to define the sensitivity level of the data stored in columns?
Labels
A company has an Azure Analysis Services tabular model that contains a dimension table with 100 columns. The table must be organized to meet the following requirements:
• provide a structure that arranges columns into related sets
• allow slicing and dicing data in a predefined order
What should be created to fulfill these requirements?
Hierarchies
A data engineer needs to configure the inline source transformation of the data flow activity in a Spark pool of an Azure Synapse Analytics pipeline. The pipeline reads records in the Common Data Model format from Azure Data Lake Storage Gen2.
What should be used?
Linked Service
An organization uses Machine Learning Studio (classic). The data needs to be divided by testing a column for the presence of a location value.
Which type of split module configuration should be used to divide the data?
Regular Expression Split
An architecture design requires an extract, transform, and load (ETL) process that moves JavaScript Object Notation (JSON) data to a database as tabular formatted data.
Which solution should be used to transform the JSON data?
Use an Azure Data Factory pipeline to insert the JSON data into Apache Spark SQL; use from json to shred the JSON data.
A data engineer performs a data transformation using an Azure Databricks Python notebook.
Which solution should be used to handle errors and end the notebook execution at an exception?
Try/except block with dbutils.notebook.exit command
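A minimal sketch of the pattern; `run_transformation` and the error message are hypothetical, and since the real `dbutils` object is provided by the Databricks runtime, a small stub stands in for it here so the pattern can be shown outside a notebook:

```python
class _NotebookUtils:
    def exit(self, value):
        # dbutils.notebook.exit(value) ends the notebook run and returns
        # `value` to the caller; SystemExit approximates that behavior here.
        raise SystemExit(value)

class _DbUtils:
    notebook = _NotebookUtils()

dbutils = _DbUtils()  # in a Databricks notebook this object already exists

def run_transformation():
    try:
        # placeholder for the real transformation logic
        raise ValueError("bad record")
    except Exception as exc:
        # end the notebook execution at the exception
        dbutils.notebook.exit(f"Failed: {exc}")
```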
Which code snippet should be used to ingest and transform data from a JSON file on Azure Data Lake Storage Gen2 in Azure Databricks using DataFrames?
val df = spark.read.json("abfss://<container>@<account>.dfs.core.windows.net/radiofile.json")
val dfGroup = df.groupBy("name").agg(sum("length"))
A data engineer needs to transfer data from an Apache Spark pool of an Azure Synapse Analytics workspace to an external table of the newly provisioned dedicated SQL pool in the same workspace. The transfer method must invoke PolyBase for data load.
What should be done first to perform the transfer?
Create an external data source
Which SQL operation should be used to develop a Slowly Changing Dimension (SCD) Type 2 operation on a Delta table?
MERGE
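A simplified SCD Type 2 sketch on a Delta table; table and column names are illustrative, and a complete implementation also inserts the new version of each changed row (often by staging the source rows twice):

```sql
MERGE INTO dim_customer AS tgt
USING customer_updates AS src
  ON tgt.customer_id = src.customer_id AND tgt.is_current = true
WHEN MATCHED AND tgt.address <> src.address THEN
  -- close out the current version of the changed row
  UPDATE SET tgt.is_current = false, tgt.end_date = current_date()
WHEN NOT MATCHED THEN
  -- open a current version for brand-new customers
  INSERT (customer_id, address, is_current, start_date, end_date)
  VALUES (src.customer_id, src.address, true, current_date(), null);
```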
An organization is developing a continuous integration (CI) process to shorten application cycles and speed up releases. Azure Databricks is used as a batch engine to process small and frequent files on Azure Blob Storage. A data engineer configures Azure Repos.
Which additional version control system should be used for integration?
GitHub
An analytics system uses Apache Spark on Azure Databricks. The solution reads data from JavaScript Object Notation (JSON) files that may be unavailable in the data set. An error needs to be reported when a JSON file is missing.
Which solution should be used to report missing files?
Set a path for the badRecordsPath data source option
An analytics system stores data in Delta Lake for Azure Databricks.
Which statement is used to upsert data from a source table into a target Delta table?
MERGE
Data from a real-time source should be captured and split for the following scenarios:
• real-time analysis
• big data analysis
Which service should be used to capture the data?
Azure Stream Analytics
Stock market data is processed using Spark Structured Streaming. The following requirements exist:
• Events should be processed using a query every five seconds.
• The query should aggregate the stock value each time it runs.
• If the query runs for more than five seconds, the next run should only start after the prior run is finished.
What should be used to design the solution?
Complete Mode with the Default Trigger Type
A Spark Structured Streaming solution processes data on Azure Databricks. Events that arrive two minutes later than expected should be dropped and not processed.
Which feature should be configured?
Watermarking
An Azure Data Factory pipeline contains a data flow activity with the source and derived column transformations that read and process Parquet files. A data engineer needs to enable the inference of drifted schema within the Parquet files.
What should be done in the Data Factory UI?
In the derived column, configure the derived column's settings
A data pipeline process, which uses structured streaming, runs continuously on Azure Databricks, reading files from Azure Blob Storage.
Which feature should be used to guarantee data consistency and allow the process to resume from where it was in case of a restart?
Checkpointing
A file is copied to Azure Data Lake Storage Gen2 daily. An Azure Data Factory pipeline should transform and load the file into a data warehouse immediately after the file is copied.
Which type of trigger should be used?
Event-based
Which Azure Data Factory solution is used to trigger a single Azure Batch process that did not complete?
Rerun the pipeline that failed (maybe)
An organization maintains an Azure Synapse pipeline that has a data flow configured with 128 cores. The pipeline processes files that are 1 MB to 2 GB in size and higher cost is noticed for processing small files.
Which solution should be used to minimize the cost for data flow execution?
Get the file size using Get Metadata activity and set the number of cores dynamically
An Azure Data Factory pipeline kicks off once daily using a trigger. The pipeline should store the trigger start time for each run in a database.
Which feature should be configured on the pipeline?
System Variables
Which type of account should be used in an Azure SQL Database instance when using Azure Storage as the destination of audit logs?
general-purpose v2 storage account
An organization has a database on Azure SQL Database with a table named "Table1". Columns in "Table1" contain sensitive information that is masked. The organization has the following requirements:
• User1 from DeptA needs to be able to see all the data in cleartext.
• User1 cannot change the masking on "Table1".
• User2 from DeptB needs to be able to see masked data on "Table1".
Which two policies should be used to fulfill these masking requirements?
Choose 2 answers
Grant User1 the SELECT and UNMASK permissions on "Table1"
Grant User2 the SELECT permission on "Table1"
Which row-level security option should be used to protect a table from the addition of new rows?
Block predicate with AFTER INSERT
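A hedged sketch of a block predicate that prevents inserting rows for another sales rep; the schema, function, and table names are hypothetical:

```sql
-- Inline table-valued predicate function: a row is allowed only when
-- its SalesRepUserId matches the current database user.
CREATE FUNCTION Security.fn_rowcheck(@SalesRepUserId AS nvarchar(128))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS result WHERE @SalesRepUserId = USER_NAME();
GO

-- AFTER INSERT blocks rows that would violate the predicate on insert.
CREATE SECURITY POLICY Security.OrdersPolicy
ADD BLOCK PREDICATE Security.fn_rowcheck(SalesRepUserId)
    ON dbo.Orders AFTER INSERT;
```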
An organization uses a dedicated SQL pool in Azure Synapse Analytics to host its data warehouse. The organization needs to implement multiple copies of the data warehouse system to different company branches dispersed in different geographical locations.
There are the following requirements:
• Snapshots must be taken automatically.
• A recovery point objective (RPO) of eight hours must be met.
Which solution should be used?
Automatic Restore Points
Which Azure Databricks Apache Spark event time feature should be used to handle late data?
Watermark
An organization stores data in Azure Data Lake Storage Gen2. The organization plans to grant storage container management functionality to an identity in Azure Active Directory.
Which access control mechanism should be used?
Azure role-based access control (RBAC)
An organization needs to mask a column in an Azure SQL Database table. The data type of the column is varchar(50). All characters except the last four must be masked.
Which dynamic data masking function should be used?
Custom
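The custom string (partial) function can be sketched as follows; the table and column names are illustrative, and partial(prefix, padding, suffix) exposes the given number of leading and trailing characters:

```sql
-- Expose 0 leading and 4 trailing characters; the padding string
-- replaces everything in between.
ALTER TABLE dbo.Customers
ALTER COLUMN AccountNumber
ADD MASKED WITH (FUNCTION = 'partial(0, "XXXXXXXXXX", 4)');
```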
The "Orders" table is configured with row-level security, and a filter predicate is configured on the SalesRepUserId column, which validates the username. SalesRepUserId02 executes the following statements:
UPDATE Orders SET Quantity = 100 WHERE OrderId = 3
INSERT INTO Orders (OrderId, SalesRepUserId, Quantity) VALUES (100, 'SalesRepUserId01', 100)
What is the result of these statements?
Choose 2 answers.
Insert throws an error
The existing record is updated
A department requires information for self-service analysis. The information resides on an Azure Data Lake Storage Gen2 container. The following requirements for the container exist:
• The lead analyst of the department should have full access to the container and its content.
• The lead analyst should be able to grant other analysts access to information.
• Access granted to the lead analyst should follow the principle of least privilege.
Which Azure role-based access control (RBAC) role should be assigned to this lead analyst?
Storage Blob Data Owner
An organization needs to provide direct connectivity over the Microsoft backbone network from Azure virtual machines (VMs) to an Azure Synapse Analytics workspace. The solution must eliminate the possibility of data exfiltration to another Azure Synapse Analytics workspace.
Which Azure service should be used to meet the connectivity requirements?
Private Endpoint (Link)
Which language should be used with batch processing when data is in Azure Data Lake and the pay-per-execution cost model is required?
U-SQL
Which method should be used to send application events without code from Azure Databricks to Azure Monitor?
Log4j
Which log category should be set on a data warehouse in Azure Synapse Analytics to log query execution?
SQLRequests
Which destination resource should be used for sending diagnostic telemetry data to monitor the performance of an Azure SQL Database using Azure SQL Analytics?
Azure Log Analytics
A data engineer monitors tagged queries issued against a dedicated SQL pool in Azure Synapse Analytics. A query that loads data into the warehouse takes hours to complete.
Which query OPTION should be used to identify the query progress?
LABEL
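A hedged sketch of tagging a load with a label and then looking it up in the monitoring DMV; the table names and label text are illustrative:

```sql
-- Tag the load so it can be found while it runs.
INSERT INTO dbo.FactSales
SELECT * FROM stage.FactSales
OPTION (LABEL = 'Load: daily fact sales');

-- Track the tagged request's progress.
SELECT request_id, status, total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Load: daily fact sales';
```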
An organization has an e-commerce system and plans to store invoice data in one partition and inventory data in another.
Which partitioning strategy should be used to fulfill the requirement in Azure Data Lake Storage Gen2?
Functional unless folder based is option
Which solution is used to monitor Azure HDInsight cluster performance information?
Log Analytics
An organization maintains an Azure Data Lake Storage Gen2. Data files are used to perform analytics using Apache Spark configured with Azure HDInsight.
Which two solutions should be used to minimize costs related to reads and writes?
Avoid file sizes below 4 MB and aggregate small files into larger ones
A company needs to optimize Apache Spark jobs in Azure Synapse Analytics. The following observations are made:
• The columns are not bucketed.
• Data skew is happening at the join.
• The data set is small.
Which solution should be used to optimize this solution?
Broadcast
Which method should be used to optimize Spark jobs in a pipeline in Apache Spark in Azure Databricks?
Control shuffle partitions
An organization is building an optimization plan for its Azure Databricks data pipelines to meet the following requirements:
• Costs must be minimized.
• It provides simultaneous access to resources.
• It includes fine-grained resource sharing.
• Jobs cannot tolerate delays or failures during processing.
Which strategy should be used?
High Concurrency mode with autoscaling
Which two file formats support Azure Data Lake Storage query acceleration functionality?
CSV and JSON
Which operation should be implemented to maximize the benefit of dynamic file pruning (DFP) in Azure Databricks?
Z-Ordering
Which partitioning option is used for creating folder hierarchies on Azure Data Lake Storage?
Key or Folder
An organization must design a large fact table in a star schema that receives frequent insert, update, and delete operations in an Azure Synapse Analytics dedicated SQL pool.
Which strategy should be used?
Hash distributed with clustered columnstore index type
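An illustrative table definition (names and types hypothetical): hash distribution spreads rows across the pool's distributions on a high-cardinality key, and the clustered columnstore index suits large fact tables:

```sql
CREATE TABLE dbo.FactSales
(
    SaleKey      bigint         NOT NULL,
    CustomerKey  int            NOT NULL,
    SalesAmount  decimal(18, 2) NOT NULL
)
WITH (
    DISTRIBUTION = HASH(SaleKey),   -- distribute on a high-cardinality key
    CLUSTERED COLUMNSTORE INDEX     -- default, well suited to fact tables
);
```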
An analytics system must incrementally migrate new files between blob containers based on time partitioned file names.
Which solution should be used to configure this migration?
Copy Data Tool
A data analytics system must incrementally migrate new files between Azure Blob Storage containers based on time partitioned file names.
How should the partitioning be configured?
Append variables to the source and destination folder paths
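One way to append time-partition variables to the folder paths, sketched in Data Factory expression syntax for a tumbling-window-triggered pipeline (folder names illustrative):

```
@concat('raw/sales/',
        formatDateTime(trigger().outputs.windowStartTime, 'yyyy/MM/dd'))
```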
Which sharding strategy reduces hot spots in Azure Data Lake Storage Gen2?
Hash
A database stores over 4,000,000 products in the Product table. The table stores the following information about products as columns...
Horizontally partition the table into monthly partitions using a column that tracks the last order date (Maybe)
Which feature should be used to analyze data history on an Azure SQL Database instance?
Temporal Tables
An analytics solution must analyze only changed information in its extract, transform, and load (ETL) process.
Which solution should be used?
Incremental Loading
An organization maintains an Azure SQL Database that has a clustered columnstore table that is rarely accessed by users. The organization requires that the space consumption of the table be optimized, and slow performance on data retrieval is acceptable.
Which compression type should be used?
Archive
An organization maintains Azure Table storage for structured, nonrelational data. A natural composite key consisting of two properties in entities is identified, and the properties are used with most queries.
Which partition strategy should be used for optimizing queries?
Set the slowest-changing property as the partition key and the other as the row key
Which table structure in the dedicated SQL pool of Azure Synapse Analytics should be used for joining small tables while reducing data shuffles?
Replicated
Which table structure in the dedicated SQL pool of Azure Synapse Analytics should be used for workloads that update tables larger than 2 GB?
Hash
Which parameter must be set on table creation to skip the validation process between the system-versioned and history table?
DATA_CONSISTENCY_CHECK = OFF
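A hedged sketch of a system-versioned (temporal) table created with the check disabled; the table and column names are illustrative:

```sql
CREATE TABLE dbo.Products
(
    ProductId  int       NOT NULL PRIMARY KEY,
    Price      money     NOT NULL,
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (
    HISTORY_TABLE = dbo.ProductsHistory,
    DATA_CONSISTENCY_CHECK = OFF  -- skip validation against the history table
));
```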
A business analyst needs to create Power BI reports using data in the Azure Data Lake. Data must be presented as relational tables.
Which approach should be used for making data available with the least effort?
Use Azure Synapse Analytics Serverless SQL Pools to query data directly from Azure Data Lake and present it as relational tables for Power BI reports.
Which index type should be used to improve the performance of batch mode query processing and achieve high compression rates in a fact table?
Clustered columnstore index
An organization uses an Azure SQL Database and plans to use the Data Discovery & Classification feature to classify data.
Which metadata setting should be used to define the granular details of the data stored in columns?
Information types
Which permission allows users to see the metadata of a table in an Azure SQL Database?
SELECT
Which Apache Spark SQL function for JavaScript Object Notation (JSON) creates a new row for each element in a map column?
explode
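A small Spark SQL sketch of the behavior:

```sql
-- Produces one row per key/value pair, with columns `key` and `value`.
SELECT explode(map('a', 1, 'b', 2));
```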
Which Apache Spark SQL function translates a UTF-16 binary expression to string?
decode(expression, 'UTF-16')
The pipeline execution throws an error due to null IDs generated by the lookup transformation for unmatched records.
Which solution should be used to insert only matching records?
Use the isMatch() function to filter the output from the lookup transformation
The data should be transformed into a normalized structure for analysis.
Which Azure Synapse Analytics mapping data flow transformation should be used to transform the data?
Pivot
Which magic command should be used to develop Scala code on Python-based Azure Synapse Analytics notebooks?
%%spark
Which link in the Azure HDInsight Spark UI Stages tab should be used to view the operation flow invoked from an application and drill down to analyze details?
DAG visualization
A structured streaming job on Azure Databricks stopped sinking data to Apache Kafka.
Which output mode should be used for printing output records for debugging?
Console sink
Which resource provides support for double encryption of an Azure Synapse Analytics workspace?
Azure Key Vault
An organization stores data in a column in Azure SQL Database. The column needs to be masked to expose only the first character and the last character of the column, with a padding string in the middle.
Which masking function should be used?
Custom
An organization is building a pipeline to capture data in Azure Cosmos DB. Once analyzed, data needs to be removed from a container to reduce storage costs.
Which feature automatically removes items at the container level?
Time to live
An organization is implementing new policies to comply with the General Data Protection Regulation (GDPR). A data engineer uses Azure Databricks with Delta to store sales and customer information. Data needs to be deleted based on a join between these two tables.
Which two improvements should be used to optimize the delete operation?
Apply Z-Ordering
Vacuum
A data engineer has a Delta table named "Customers". The engineer wants to speed up the reading process of the table by using the customer_id column to remove user data.
Which Apache Spark feature should be executed prior to the deletion?
Time Travel
An Azure SQL Database table named Customer has a column named CreditLimit. The column is masked with the default masking function, and some of the values of CreditLimit are higher than 10,000. A user, who has no UNMASK permission, executes the following query:
SELECT * FROM Customer WHERE CreditLimit > 10000
What is the output of the query?
Qualified Records
A database in Azure SQL Database is managed by User1 and User2. User1 instructs User2 to grant SELECT permission on a table to User3.
The following requirements exist:
• User3 must be able to grant the same permission to others.
• The system must record User1 as the grantor.
Which SQL statement should be used to meet these requirements?
GRANT SELECT ON Table TO User3 WITH GRANT OPTION AS User1