D305 OA FIXED

110 Terms

1
New cards

Which option should be enabled to recover only previously deleted directories in Azure Data Lake Storage Gen2?

Soft delete for blobs

2
New cards

A company has files stored on Azure Data Lake Storage that need to be scanned by Azure Purview so that the data types in the files can be detected.
Which two file types should be used?

CSV
JSON

3
New cards

An organization loads data from different systems to Azure Blob Storage.
After a team processes the data, the data is rarely accessed and should be moved to a low-cost, offline location.
Which access tier should be used for this offline data?

Archive

4
New cards

An organization needs to design an automatic life cycle management policy for handling files residing on Azure Blob Storage.
What is the least expensive tier for storing data rarely accessed?

Archive

5
New cards

A company requires a service that must incrementally migrate new files between blob containers based on time partitioned file names.
Which Azure solution should be used to implement this partitioning strategy?

Data Factory

6
New cards

What is a requirement to split a partition on a table in a dedicated SQL pool of Azure Synapse Analytics that has a clustered columnstore index?

The partition must be empty
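As a quick illustration, a minimal T-SQL sketch of the split, assuming a hypothetical FactSales table partitioned on an OrderDateKey column:

-- With a clustered columnstore index, the partition touched by the split
-- must contain no rows before this statement runs.
ALTER TABLE dbo.FactSales SPLIT RANGE (20240101);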

7
New cards

An organization maintains a logical data warehouse using a serverless SQL pool in Azure Synapse Analytics. A data engineer creates an external table against CSV files in a data lake. The engineer must make sure that statistics are up to date on all required columns to improve performance.
Which action should be taken to maintain up-to-date statistics?

Enable automatic statistics update

8
New cards

An organization wants to keep the frequently accessed data fields in Azure Data Lake Storage separate.
Which partitioning strategy should be used?

Vertical

9
New cards

What describes a hierarchy in a tabular model?

Metadata that defines relationships between two or more columns

10
New cards

Which solution should be used to design an external metastore on Azure Databricks?

Azure SQL Database

11
New cards

Which compression method must be used to compress data using the GZip algorithm format in an Azure SQL Database?

COMPRESS function
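A hedged T-SQL sketch of the function pair (table and column names are hypothetical); COMPRESS returns a GZip-compressed VARBINARY(MAX), and DECOMPRESS reverses it:

-- Compress on the way in.
SELECT COMPRESS(N'raw text to store') AS compressed_value;

-- Decompress (and cast back to the original type) on the way out.
SELECT CAST(DECOMPRESS(compressed_col) AS NVARCHAR(MAX)) AS original_text
FROM dbo.Documents;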

12
New cards

An Azure Synapse Analytics SQL pool stores data in a fact table. An aggregated subset of the data must be exported to an Azure Data Lake Storage.
Which statement will fulfill this requirement?

CREATE EXTERNAL TABLE
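A minimal CETAS sketch for a dedicated SQL pool, assuming a pre-created external data source and file format (all names here are hypothetical). The SELECT runs against the fact table and the aggregated result set is written to the data lake location as the external table's files:

CREATE EXTERNAL TABLE dbo.FactSalesAgg
WITH (
    LOCATION = '/aggregated/sales/',
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey;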

13
New cards

Which characteristics of the archive tier should be considered when implementing data archiving on Azure Blob Storage?
Choose 2 answers.

High access cost
High latency

14
New cards

Which step is necessary to move data to online tiers in Azure Blob Storage?

Rehydrate

15
New cards

Which type of Slowly Changing Dimension (SCD) overwrites the table data when changes occur?

Type 1

16
New cards

Which Azure Synapse Analytics feature provides a visual way of specifying how to populate a Slowly Changing Dimension (SCD)?

Mapping Data Flow

17
New cards

A data engineer designs a logical data warehouse using a serverless SQL pool in Azure Synapse Analytics. Data must be presented with a predefined schema.
Which statement should be used for defining the schema needed?

CREATE EXTERNAL TABLE

18
New cards

An organization is creating an external table on a serverless SQL pool in Azure Synapse Analytics. The table should point to a delimited text file on Azure Data Lake Storage Gen2. The file format options for the table should be configured as follows:
• The header row of the file should be skipped.
• Missing values in date columns should contain the value "1900-01-01".
Which two configurations should be implemented?

FIRST_ROW = 2
USE_TYPE_DEFAULT = TRUE
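A sketch of an external file format carrying both options, following the card's premise that they apply here (the format name is hypothetical): FIRST_ROW = 2 skips the header row, and USE_TYPE_DEFAULT = TRUE replaces missing values with each type's default, which is 1900-01-01 for dates:

CREATE EXTERNAL FILE FORMAT CsvWithHeader
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        FIRST_ROW = 2,
        USE_TYPE_DEFAULT = TRUE
    )
);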

19
New cards

Which BULK parameter value for the OPENROWSET function should be used to read all files from folders and subfolders in the CSV/TAXI path when using a serverless SQL pool of Azure Synapse Analytics?

'CSV/TAXI/*'
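For context, a hedged serverless-pool sketch around that BULK value ('MyDataLake' is a hypothetical external data source name):

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'csv/taxi/*',
    DATA_SOURCE = 'MyDataLake',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS taxi_rows;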

20
New cards

Which scenario should use Auto Loader for loading data into Delta Lake?

Source that contains millions or more files

21
New cards

An orders fact table contains trillions of records and is queried by a deterministic column each time. Which table configuration should be used for this table?

Hash
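A minimal sketch of a hash-distributed table, assuming a hypothetical FactOrders table where CustomerKey is the column used by every query:

CREATE TABLE dbo.FactOrders
(
    OrderId     BIGINT NOT NULL,
    CustomerKey INT NOT NULL,
    Amount      DECIMAL(18, 2)
)
WITH (
    -- Hash distribution on the deterministic query column keeps matching
    -- rows on the same distribution and avoids data movement at query time.
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);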

22
New cards

Which solution should be used to remove duplicate data from an Azure SQL Database instance with minimal development?

Azure Data Factory pipeline code snippet

23
New cards

Which table configuration option in a dedicated SQL pool in Azure Synapse Analytics should be used to increase the speed of data loading times?

Heap

24
New cards

An organization has an Azure SQL Database and plans to use the Data Discovery & Classification feature to classify data. Which metadata setting should be used to define the sensitivity level of the data stored in columns?

Labels

25
New cards

A company has an Azure Analysis Services tabular model that contains a dimension table with 100 columns. The table must be organized to meet the following requirements:
• Provide a structure that arranges columns into related sets.
• Allow slicing and dicing data in a predefined order.
What should be created to fulfill these requirements?

Hierarchies

26
New cards

A data engineer needs to configure the in-line source transformation of the data flow activity in a Spark pool in an Azure Synapse Analytics pipeline. The pipeline reads records in the Common Data Model from Azure Data Lake Storage Gen2.
What should be used?

Linked Service

27
New cards

An organization uses Machine Learning Studio (classic). The data needs to be divided by testing a column for the presence of a location value.
Which type of split module configuration should be used to divide the data?

Regular Expression Split

28
New cards

An architecture design requires an extract, transform, and load (ETL) process that moves JavaScript Object Notation (JSON) data to a database as tabular formatted data.
Which solution should be used to transform the JSON data?

Use an Azure Data Factory pipeline to insert the JSON data into Apache Spark SQL; use from_json to shred the JSON data.

29
New cards

A data engineer performs a data transformation using an Azure Databricks Python notebook.
Which solution should be used to handle errors and end the notebook execution at an exception?

Try/except block with the dbutils.notebook.exit command

30
New cards

Which code snippet should be used to ingest and transform data from a JSON file on Azure Data Lake Storage Gen2 in Azure Databricks using DataFrames?

val df = spark.read.json("abfss://...core.windows.net/radiofile.json");
val dfGroup = df.groupBy("name").agg(sum("length"));

31
New cards

A data engineer needs to transfer data from an Apache Spark pool of an Azure Synapse Analytics workspace to an external table of the newly provisioned dedicated SQL pool in the same workspace. The transfer method must invoke PolyBase for data load.
What should be done first to perform the transfer?

Create an external data source

32
New cards

Which SQL operation should be used to develop a Slowly Changing Dimension (SCD) Type 2 operation on a Delta table?

MERGE
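A simplified SCD Type 2 sketch in Delta SQL (table and column names are hypothetical): the MERGE closes out the current row when a tracked attribute changes and inserts rows for brand-new keys; a full implementation would also stage the new version of each changed row for insertion:

MERGE INTO dim_customer AS tgt
USING customer_updates AS src
ON tgt.customer_id = src.customer_id AND tgt.is_current = true
WHEN MATCHED AND tgt.address <> src.address THEN
  -- Expire the current version of the row.
  UPDATE SET tgt.is_current = false, tgt.end_date = current_date()
WHEN NOT MATCHED THEN
  -- First-time keys get an open-ended current row.
  INSERT (customer_id, address, is_current, start_date, end_date)
  VALUES (src.customer_id, src.address, true, current_date(), null);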

33
New cards

An organization is developing a continuous integration (CI) process to shorten application cycles and speed up releases. Azure Databricks is used as a batch engine to process small and frequent files on Azure Blob Storage. A data engineer configures Azure Repos.
Which additional version control system should be used for integration?

GitHub

34
New cards

An analytics system uses Apache Spark on Azure Databricks. The solution reads data from JavaScript Object Notation (JSON) files that may be unavailable in the data set. An error needs to be reported when a JSON file is missing.
Which solution should be used to report missing files?

Set a path for the badRecordsPath data source option

35
New cards

An analytics system stores data in Delta Lake for Azure Databricks.
Which statement is used to upsert data from a source table into a target Delta table?

MERGE
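A minimal Delta SQL upsert sketch (names are hypothetical): matching rows are updated in place and unmatched source rows are inserted:

MERGE INTO target_table AS t
USING source_table AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;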

36
New cards

Data from a real-time source should be captured and split for the following scenarios:
• Real-time analysis
• Big data analysis
Which service should be used to capture the data?

Azure Stream Analytics

37
New cards

Stock market data is processed using Spark Structured Streaming. The following requirements exist:
• Events should be processed using a query every five seconds.
• The query should aggregate the stock value each time it runs.
• If the query runs for more than five seconds, the next run should only start after the prior run is finished.
What should be used to design the solution?

Complete Mode with the Default Trigger Type

38
New cards

A Spark Structured Streaming solution processes data on Azure Databricks. Events that arrive two minutes later than expected should be dropped and not processed.
Which feature should be configured?

Watermarking

39
New cards

An Azure Data Factory pipeline contains a data flow activity with the source and derived column transformations that read and process Parquet files. A data engineer needs to enable the inference of drifted schema within the Parquet files.
What should be done in the Data Factory UI?

In the derived column, configure the derived column's settings

40
New cards

A data pipeline process, which uses Structured Streaming, runs continuously on Azure Databricks, reading files from Azure Blob Storage.
Which feature should be used to guarantee data consistency and allow the process to resume from where it was in case of a restart?

Checkpointing

41
New cards

A file is copied to Azure Data Lake Storage Gen2 daily. An Azure Data Factory pipeline should transform the file and load it into a data warehouse immediately after the file is copied.
Which type of trigger should be used?

Event-based

42
New cards

Which Azure Data Factory solution is used to trigger a single Azure Batch process that did not complete?

Rerunning the pipeline that failed (maybe)

43
New cards

An organization maintains an Azure Synapse pipeline that has a data flow configured with 128 cores. The pipeline processes files that are 1 MB to 2 GB in size, and a higher cost is noticed for processing small files.
Which solution should be used to minimize the cost of data flow execution?

Get the file size using Get Metadata activity and set the number of cores dynamically

44
New cards

An Azure Data Factory pipeline kicks off once daily using a trigger. The pipeline should store the trigger start time for each run in a database.
Which feature should be configured on the pipeline?

System Variables

45
New cards

Which type of account should be used in an Azure SQL Database instance when using Azure Storage as the destination of audit logs?

general-purpose v2 storage account

46
New cards

An organization has a database on Azure SQL Database with a table named "Table1". Columns in "Table1" contain sensitive information that is masked. The organization has the following requirements:
• User1 from DeptA needs to be able to see all the data in cleartext.
• User1 cannot change the masking on "Table1".
• User2 from DeptB needs to be able to see masked data on "Table1".
Which two policies should be used to fulfill these masking requirements?
Choose 2 answers

Grant User1 the SELECT and UNMASK permissions on "Table1"
Grant User2 the SELECT permission on "Table1"
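Expressed as T-SQL, a sketch of the two grants (the schema-qualified name is hypothetical); neither grant includes ALTER ANY MASK, so User1 still cannot change the masking definition:

-- User1 reads cleartext; User2 sees only masked values.
GRANT SELECT, UNMASK ON dbo.Table1 TO User1;
GRANT SELECT ON dbo.Table1 TO User2;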

47
New cards

Which row-level security option should be used to protect a table from the addition of new rows?

Block predicate with AFTER INSERT
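A hedged security-policy sketch (the predicate function and table are hypothetical): the AFTER INSERT block predicate rejects any new row for which the function returns no result:

CREATE SECURITY POLICY OrdersPolicy
ADD BLOCK PREDICATE dbo.fn_limit_to_own_rows(SalesRepUserId)
ON dbo.Orders AFTER INSERT;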

48
New cards

An organization uses a dedicated SQL pool in Azure Synapse Analytics to host its data warehouse. The organization needs to implement multiple copies of the data warehouse system for company branches dispersed across different geographical locations.
The following requirements exist:
• Snapshots must be taken automatically.
• A recovery point objective (RPO) of eight hours must be met.
Which solution should be used?

Automatic Restore Points

49
New cards

Which Azure Databricks Apache Spark event time feature should be used to handle late data?

Watermark

50
New cards

An organization stores data in Azure Data Lake Storage Gen2. The organization plans to grant storage container management functionality to an identity in Azure Active Directory. Which access control mechanism should be used?

RBAC

51
New cards

An organization needs to mask a column in an Azure SQL Database table. The data type of the column is varchar(50). All characters except the last four must be masked.
Which dynamic data masking function should be used?

Custom
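The custom text rule is expressed through the partial() masking function. A sketch, assuming a hypothetical AccountNumber column: keep zero leading characters, pad with "XXXX", and expose the last four:

ALTER TABLE dbo.Accounts
ALTER COLUMN AccountNumber ADD MASKED WITH (FUNCTION = 'partial(0, "XXXX", 4)');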

52
New cards

The "Orders- table is configured with row-level security, and filter predicate is configured on the SalesReplJserld column, which validates the username. SalesReplJserld02 executes the following statements:
UPDATE orders SET quantity = 100 WHERE orderld = 3
INSERT INTO Orders (orderld, SalesRepUser1d, Quantity) VALUES (100, 'SalesRepUser1d01' , 100)
What is the result of these statements?
Choose 2 answers.

Insert throws an error
The existing record is updated

53
New cards

A department requires information for self-service analysis. The information resides on an Azure Data Lake Storage Gen2 container. The following requirements for the container exist:
• The lead analyst of the department should have full access to the container and its content.
• The lead analyst should be able to grant other analysts access to information.
• Access granted to the lead analyst should follow the principle of least privilege.
Which Azure role-based access control (RBAC) role should be assigned to this lead analyst?

Storage Blob Data Owner

54
New cards

An organization needs to provide direct connectivity over the Microsoft backbone network from Azure virtual machines (VMs) to an Azure Synapse Analytics workspace. The solution must eliminate the possibility of data exfiltration to another Azure Synapse Analytics workspace.
Which Azure service should be used to meet the connectivity requirements?

Private Endpoint (Link)

55
New cards

Which language should be used with batch processing when data is in Azure Data Lake and a pay-per-execution cost model is required?

U-SQL

56
New cards

Which method should be used to send application events from Azure Databricks to Azure Monitor without writing code?

Log4j

57
New cards

Which log category should be set on a data warehouse in Azure Synapse Analytics to log query execution?

SQLRequests

58
New cards

Which destination resource should be used for sending diagnostic telemetry data to monitor the performance of an Azure SQL Database using Azure SQL Analytics?

Azure Log Analytics

59
New cards

A data engineer monitors tagged queries issued against a dedicated SQL pool in Azure Synapse Analytics. A query that loads data into the warehouse takes hours to complete.
Which OPTION type should be used to track the query's progress?

LABEL
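A short sketch of the pattern (label text and table name are hypothetical): tag the load query with a LABEL, then look it up in the requests DMV to follow its progress:

SELECT * FROM dbo.StageSales OPTION (LABEL = 'Load: stage sales');

SELECT request_id, status, total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Load: stage sales';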

60
New cards

An organization has an e-commerce system and plans to store invoice data in one partition and inventory data in another.
Which partitioning strategy should be used to fulfill the requirement in Azure Data Lake Storage Gen2?

Functional (unless folder-based is an option)

61
New cards

Which solution is used to monitor Azure HDInsight cluster performance information?

Log Analytics

62
New cards

An organization maintains an Azure Data Lake Storage Gen2 account. Data files are used to perform analytics using Apache Spark configured with Azure HDInsight.
Which two solutions should be used to minimize the cost of reads and writes?

Maintain file sizes of at least 4 MB and aggregate small files into larger ones

63
New cards

A company needs to optimize Apache Spark jobs in Azure Synapse Analytics. The following observations are made:
• The columns are not bucketed.
• Data skew is happening at the join.
• The data set is small.
Which solution should be used to optimize this solution?

Broadcast

64
New cards

Which method should be used to optimize Spark jobs in a pipeline in Apache Spark in Azure Databricks?

Control shuffle partitions

65
New cards

An organization is building an optimization plan for its Azure Databricks data pipelines to meet the following requirements:
• Costs must be minimized.
• It provides simultaneous access to resources.
• It includes fine-grained resource sharing.
• Jobs cannot tolerate delays or failures during processing.
Which strategy should be used?

High Concurrency mode with autoscaling

66
New cards

Which two file formats support Azure Data Lake Storage query acceleration functionality?

CSV and JSON

67
New cards

Which operation should be implemented to maximize the benefit of dynamic file pruning (DFP) in Azure Databricks?

Z-Ordering

68
New cards

Which partitioning option is used for creating folder hierarchies on Azure Data Lake Storage?

Key or Folder

69
New cards

An organization must design a large fact table in a star schema that receives frequent insert, update, and delete operations on a dedicated SQL pool in Azure Synapse Analytics.
Which strategy should be used?

Hash distributed with clustered columnstore index type

70
New cards

An analytics system must incrementally migrate new files between blob containers based on time partitioned file names.
Which solution should be used to configure this migration?

Copy Data Tool

71
New cards

A data analytics system must incrementally migrate new files between Azure Blob Storage containers based on time partitioned file names.
How should the partitioning be configured?

Append variables to the source and destination folder paths

72
New cards

Which sharding strategy reduces hot spots in Azure Data Lake Storage Gen2?

Hash

73
New cards

A database stores over 4,000,000 products in the Product table. The table stores the following information about products as columns...

Horizontally partition the table into monthly partitions using a column that tracks the last order date (Maybe)

74
New cards

Which feature should be used to analyze data history on an Azure SQL Database instance?

Temporal Tables
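For illustration, a hedged query against a system-versioned (temporal) table, assuming a hypothetical dbo.Product table: FOR SYSTEM_TIME pulls row versions from the history range:

SELECT ProductId, Price, ValidFrom, ValidTo
FROM dbo.Product
FOR SYSTEM_TIME BETWEEN '2024-01-01' AND '2024-06-30'
WHERE ProductId = 42;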

75
New cards

An analytics solution must analyze only changed information in its extract, transform, and load (ETL) process.
Which solution should be used?

Incremental Loading

76
New cards

An organization maintains an Azure SQL Database that has a clustered columnstore table that is rarely accessed by users. The organization requires that the space consumption of the table be optimized, and slow performance on data retrieval is acceptable.
Which compression type should be used?

Archive

77
New cards

An organization maintains Azure Table storage for structured, nonrelational data. A natural composite key consisting of two properties in entities is identified, and these properties are used with most queries.
Which partition strategy should be used for optimizing queries?

Set the slowest-changing property as the partition key and the other as the row key

78
New cards

Which table structure in the dedicated SQL pool of Azure Synapse Analytics should be used for joining small tables while reducing data shuffles?

Replicated

79
New cards

Which table structure in the dedicated SQL pool of Azure Synapse Analytics should be used for workloads that update tables larger than 2 GB?

Hash

80
New cards

Which parameter must be set on table creation to skip the validation process between the system-versioned and history table?

DATA_CONSISTENCY_CHECK
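A sketch of table creation with the check disabled (names are hypothetical); DATA_CONSISTENCY_CHECK = OFF skips the validation scan between the system-versioned table and its linked history table:

CREATE TABLE dbo.Product
(
    ProductId INT PRIMARY KEY,
    Price     DECIMAL(18, 2),
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (
    HISTORY_TABLE = dbo.ProductHistory,
    DATA_CONSISTENCY_CHECK = OFF
));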

81
New cards

Which Azure Synapse Analytics feature provides a visual way of specifying how to populate a Slowly Changing Dimension (SCD)?

Mapping data flow

82
New cards

A business analyst needs to create Power BI reports using data in the Azure Data Lake. Data must be presented as relational tables.
Which approach should be used for making data available with the least effort?

Use Azure Synapse Analytics Serverless SQL Pools to query data directly from Azure Data Lake and present it as relational tables for Power BI reports.

83
New cards

Which index type should be used to improve the performance of batch mode query processing and achieve high compression rates in a fact table?

Clustered columnstore index

84
New cards

An organization uses an Azure SQL Database and Plans to use the Data Discovery & Classification feature to classify data.
Which metadata setting should be used to define the granular details of the data stored in columns?

Information types

85
New cards

Which permission allows users to see the metadata of a table in an Azure SQL Database?

SELECT

86
New cards

Which Apache Spark SQL function for JavaScript Object Notation (JSON) creates a new row for each element in a map column?

Explode
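A Spark SQL sketch (table and column names are hypothetical): explode fans each key/value pair of the map out into its own row:

-- One output row per (item, quantity) entry in the map column.
SELECT order_id, explode(item_quantities) AS (item, quantity)
FROM orders;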

87
New cards

Which Apache Spark SQL function translates a UTF-16 binary expression to a string?

decode(expression, 'UTF-16')
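In Spark SQL terms, a minimal sketch (table and column names are hypothetical):

-- decode() interprets the binary column's bytes using the named charset.
SELECT decode(raw_bytes, 'UTF-16') AS message_text
FROM messages;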

88
New cards

The pipeline execution throws an error due to null IDs generated by the lookup transformation for unmatched records.
Which solution should be used to insert only matching records?

Use the isMatch() function to get the output from the lookup transformation

89
New cards

The data should be transformed into a normalized structure for analysis.
Which Azure Synapse Analytics mapping data flow transformation should be used to transform the data?

Pivot

90
New cards

Which magic command should be used to develop Scala code on Python-based Azure Synapse Analytics notebooks?

%%spark

91
New cards

An analytics system stores data in Delta Lake for Azure Databricks.
Which statement is used to upsert data from a source table into a target Delta table?

MERGE

92
New cards

Which link in the Azure HDInsight Spark UI Stage tab should be used to view the operation flow invoked from an application and drill down to analyze details?

DAG visualization

93
New cards

A structured streaming job on Azure Databricks stopped sinking data to Apache Kafka.
Which output mode should be used for printing output records for debugging?

Console sink

94
New cards

Which resource provides support for double encryption of an Azure Synapse Analytics workspace?

Azure Key Vault

95
New cards

An organization stores data in a column in Azure SQL Database. The column needs to be masked to expose only the first character and the last character of the column, with a padding string in the middle.
Which masking function should be used?

Custom

96
New cards

An organization is building a pipeline to capture data in Azure Cosmos DB. Once analyzed, data needs to be removed from a table to reduce storage.
Which feature automatically removes items at the container level?

Time to live

97
New cards

An organization is implementing new policies to comply with the General Data Protection Regulation (GDPR). A data engineer uses Azure Databricks with Delta to store sales and customer information. Data needs to be deleted based on a join between these two tables.
Which two improvements should be used to optimize the delete operation?

Apply Z-Ordering
Vacuum

98
New cards

A data engineer has a Delta table named "Customers". The engineer wants to speed up the process of reading the table by using the customer_id column to remove user data.
Which Apache Spark feature should be executed prior to the deletion?

Time Travel

99
New cards

An Azure SQL Database table named Customer has a column named CreditLimit. The column is masked with the default masking function, and some of the values of CreditLimit are higher than 10,000. A user who has no UNMASK permission executes the following query:
SELECT * FROM Customer WHERE CreditLimit > 10000
What is the output of the query?

Qualified Records

100
New cards

A database in Azure SQL Database is managed by User1 and User2. User1 instructs User2 to grant SELECT permission on a table to User3.
The following requirements exist:
• User3 must be able to grant the same permission to others.
• The system must record User1 as the grantor.
Which SQL statement should be used to meet these requirements?

GRANT SELECT ON Table TO User3 WITH GRANT OPTION AS User1