Which option should be enabled to recover only previously deleted directories in Azure Data Lake Storage Gen2?
Soft delete for blobs
A company has files stored on Azure Data Lake Storage that need to be scanned by Azure Purview with the option of detecting the data types of the data.
Which two file types should be used?
CSV
JSON
An organization loads data from different systems to Azure Blob Storage.
After a team processes the data, the data is rarely accessed and should be moved to a low-cost, offline location.
Which access tier should be used for this offline data?
Archive
An organization needs to design an automatic life cycle management policy for handling files residing on Azure Blob Storage.
What is the least expensive tier for storing data rarely accessed?
Archive
A company requires a service that must incrementally migrate new files between blob containers based on time partitioned file names.
Which Azure solution should be used to implement this partitioning strategy?
Data Factory
What is a requirement to split a partition on a table in a dedicated SQL pool of Azure Synapse Analytics that has a clustered columnstore index?
The partition must be empty
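A hedged sketch of the split, with a hypothetical table, partition number, and boundary value; on a clustered columnstore table the partition being split must first be emptied, for example by switching it out to an identically partitioned staging table:

```sql
-- Assumes dbo.FactSales is partitioned on an int date key and has a
-- clustered columnstore index; names and values are illustrative.

-- 1. Empty the affected partition by switching it to a staging table
--    with an identical structure; SPLIT fails on a non-empty partition.
ALTER TABLE dbo.FactSales SWITCH PARTITION 4
    TO dbo.FactSales_staging PARTITION 4;

-- 2. Add the new boundary; the partition is now empty, so this succeeds.
ALTER TABLE dbo.FactSales SPLIT RANGE (20240101);

-- 3. Reload or switch the staged rows back in afterwards.
```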
An organization maintains a logical data warehouse using a serverless SQL pool in Azure Synapse Analytics. A data engineer creates an external table against CSV files in a data lake. The engineer must ensure that statistics are up to date on all required columns to improve performance.
Which action should be taken to maintain up-to-date statistics?
Enable automatic statistics update
An organization wants to keep the frequently accessed data fields in Azure Data Lake Storage separate.
Which partitioning strategy should be used?
Vertical
What describes a hierarchy in a tabular model?
Metadata that defines relationships between two or more columns
Which solution should be used to design an external metastore on Azure Databricks?
Azure SQL Database
Which compression method should be used to compress data using the GZIP algorithm in an Azure SQL Database?
COMPRESS function
An Azure Synapse Analytics SQL pool stores data in a fact table. An aggregated subset of the data must be exported to an Azure Data Lake Storage.
Which statement will fulfill this requirement?
CREATE EXTERNAL TABLE
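A hedged sketch of CREATE EXTERNAL TABLE AS SELECT (CETAS) exporting an aggregate to the lake; the data source, file format, and table names are illustrative and assume those objects already exist:

```sql
CREATE EXTERNAL TABLE ext.FactSalesAggregated
WITH (
    LOCATION = '/aggregated/sales/',
    DATA_SOURCE = MyDataLake,         -- hypothetical external data source
    FILE_FORMAT = ParquetFileFormat   -- hypothetical file format
)
AS
SELECT RegionKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY RegionKey;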
Which characteristics of the archive tier should be considered when implementing data archiving on Azure Blob Storage?
Choose 2 answers.
High access cost
High latency
Which step is necessary to move data to online tiers in Azure Blob Storage?
Rehydrate
Which type of Slowly Changing Dimension (SCD) overwrites the table data when changes occur?
Type 1
Which Azure Synapse Analytics feature provides a visual way of specifying how to populate a Slowly Changing Dimension (SCD)?
Mapping Data Flow
A data engineer designs a logical data warehouse using a serverless SQL pool in Azure Synapse Analytics. Data must be presented with a predefined schema.
Which statement should be used to define the schema?
CREATE EXTERNAL TABLE
An organization is creating an external table on a serverless SQL pool in Azure Synapse Analytics. The table should point to a delimited text file on Azure Data Lake Storage Gen2. The file format options for the table should be configured as follows:
• The header row of the file should be skipped.
• Missing values in date columns should contain the value "1900-01-01".
Which two configurations should be implemented?
FIRST_ROW = 2
USE_TYPE_DEFAULT = TRUE
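Both options can be sketched in an external file format definition (names illustrative, as used with PolyBase-style external tables); FIRST_ROW = 2 skips the header row, and USE_TYPE_DEFAULT = TRUE replaces missing values with the column type's default, which is 1900-01-01 for dates:

```sql
CREATE EXTERNAL FILE FORMAT CsvWithHeader
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        FIRST_ROW = 2,          -- skip the header row
        USE_TYPE_DEFAULT = TRUE -- missing dates become 1900-01-01
    )
);
```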
Which BULK parameter value for the OPENROWSET function should be used to read all files from folders and subfolders in the CSV/TAXI path when using a serverless SQL pool of Azure Synapse Analytics?
'CSV/TAXI/*'
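As a hedged sketch of the query (the data source name is illustrative and assumed to point at the storage account root):

```sql
SELECT *
FROM OPENROWSET(
    BULK 'CSV/TAXI/*',
    DATA_SOURCE = 'MyDataLake',  -- hypothetical data source
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS taxi_rows;
```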
Which scenario should use Auto Loader for loading data into Delta Lake?
Source that contains millions or more files
An orders fact table contains trillions of records and it is queried by a deterministic column each time. Which table configuration should be used for this table?
Hash
Which solution should be used to remove duplicate data from an Azure SQL Database instance with minimal development?
Azure Data Factory pipeline code snippet
Which table configuration option in a dedicated SQL pool in Azure Synapse Analytics should be used to increase the speed of data loading times?
Heap
An organization has an Azure SQL Database and plans to use the Data Discovery & Classification feature to classify data. Which metadata setting should be used to define the sensitivity level of the data stored in columns?
Labels
A company has an Azure Analysis Services tabular model that contains a dimension table with 100 columns. The table must be organized to meet the following requirements:
• provide a structure that arranges columns into related sets
• allow slicing and dicing data in a predefined order
What should be created to fulfill these requirements?
Hierarchies
A data engineer needs to configure the inline source transformation of the data flow activity in a Spark pool of an Azure Synapse Analytics pipeline. The pipeline reads records in the Common Data Model format from Azure Data Lake Storage Gen2.
What should be used?
Linked Service
An organization uses Machine Learning Studio (classic). The data needs to be divided by testing a column for the presence of a location value.
Which type of split module configuration should be used to divide the data?
Regular Expression Split
An architecture design requires an extract, transform, and load (ETL) process that moves JavaScript Object Notation (JSON) data to a database as tabular formatted data.
Which solution should be used to transform the JSON data?
Use an Azure Data Factory pipeline to insert the JSON data into Apache Spark SQL; use from json to shred the JSON data.
A data engineer performs a data transformation using an Azure Databricks Python notebook.
Which solution should be used to handle errors and end the notebook execution at an exception?
Try/except block with dbutils.notebook.exit command
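A minimal sketch of the pattern; `run_transformation` and the error message are hypothetical, and since the real `dbutils` object is provided by the Databricks runtime, a small stub stands in for it here so the pattern can be shown outside a notebook:

```python
class _NotebookUtils:
    def exit(self, value):
        # dbutils.notebook.exit(value) ends the notebook run and returns
        # `value` to the caller; SystemExit approximates that behavior here.
        raise SystemExit(value)

class _DbUtils:
    notebook = _NotebookUtils()

dbutils = _DbUtils()  # in a Databricks notebook this object already exists

def run_transformation():
    try:
        # placeholder for the real transformation logic
        raise ValueError("bad record")
    except Exception as exc:
        # end the notebook execution at the exception
        dbutils.notebook.exit(f"Failed: {exc}")
```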
Which code snippet should be used to ingest and transform data from a JSON file on Azure Data Lake Storage Gen2 in Azure Databricks using DataFrames?
val df = spark.read.json("abfss://<container>@<account>.dfs.core.windows.net/radiofile.json")
val dfGroup = df.groupBy("name").agg(sum("length"))
A data engineer needs to transfer data from an Apache Spark pool of an Azure Synapse Analytics workspace to an external table of the newly provisioned dedicated SQL pool in the same workspace. The transfer method must invoke PolyBase for data load.
What should be done first to perform the transfer?
Create an external data source
Which SQL operation should be used to develop a Slowly Changing Dimension (SCD) Type 2 operation on a Delta table?
MERGE
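A simplified SCD Type 2 sketch on a Delta table; table and column names are illustrative, and a complete implementation also inserts the new version of each changed row (often by staging the source rows twice):

```sql
MERGE INTO dim_customer AS tgt
USING customer_updates AS src
  ON tgt.customer_id = src.customer_id AND tgt.is_current = true
WHEN MATCHED AND tgt.address <> src.address THEN
  -- close out the current version of the changed row
  UPDATE SET tgt.is_current = false, tgt.end_date = current_date()
WHEN NOT MATCHED THEN
  -- open a current version for brand-new customers
  INSERT (customer_id, address, is_current, start_date, end_date)
  VALUES (src.customer_id, src.address, true, current_date(), null);
```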
An organization is developing a continuous integration (CI) process to shorten application cycles and speed up releases. Azure Databricks is used as a batch engine to process small and frequent files on Azure Blob Storage. A data engineer configures Azure Repos.
Which additional version control system should be used for integration?
GitHub
An analytics system uses Apache Spark on Azure Databricks. The solution reads data from JavaScript Object Notation (JSON) files that may be unavailable in the data set. An error needs to be reported when a JSON file is missing.
Which solution should be used to report missing files?
Set a path for the badRecordsPath data source option
An analytics system stores data in Delta Lake for Azure Databricks.
Which statement is used to upsert data from a source table into a target Delta table?
MERGE
Data from a real-time source should be captured and split for the following scenarios:
• real-time analysis
• big data analysis
Which service should be used to capture the data?
Azure Stream Analytics
Stock market data is processed using Spark Structured Streaming. The following requirements exist:
• Events should be processed using a query every five seconds.
• The query should aggregate the stock value each time it runs.
• If the query runs for more than five seconds, the next run should only start after the prior run is finished.
What should be used to design the solution?
Complete Mode with the Default Trigger Type
A Spark Structured Streaming solution processes data on Azure Databricks. Events that arrive two minutes later than expected should be dropped and not processed.
Which feature should be configured?
Watermarking
An Azure Data Factory pipeline contains a data flow activity with the source and derived column transformations that read and process Parquet files. A data engineer needs to enable the inference of drifted schema within the Parquet files.
What should be done in the Data Factory UI?
In the derived column, configure the derived column's settings
A data pipeline process, which uses structured streaming, runs continuously on Azure Databricks, reading files from Azure Blob Storage.
Which feature should be used to guarantee data consistency and allow the process to resume from where it was in case of a restart?
Checkpointing
A file is copied to Azure Data Lake Storage Gen2 daily. An Azure Data Factory pipeline should transform and load the file into a data warehouse immediately after the file is copied.
Which type of trigger should be used?
Event-based
Which Azure Data Factory solution is used to trigger a single Azure Batch process that did not complete?
Rerun the pipeline that failed (maybe)
An organization maintains an Azure Synapse pipeline that has a data flow configured with 128 cores. The pipeline processes files that are 1 MB to 2 GB in size and higher cost is noticed for processing small files.
Which solution should be used to minimize the cost for data flow execution?
Get the file size using Get Metadata activity and set the number of cores dynamically
An Azure Data Factory pipeline kicks off once daily using a trigger. The pipeline should store the trigger start time for each run in a database.
Which feature should be configured on the pipeline?
System Variables
Which type of account should be used in an Azure SQL Database instance when using Azure Storage as the destination of audit logs?
general-purpose v2 storage account
An organization has a database on Azure SQL Database with a table named "Table1". Columns in "Table1" contain sensitive information that is masked. The organization has the following requirements:
• User1 from DeptA needs to be able to see all the data in cleartext.
• User1 cannot change the masking on "Table1".
• User2 from DeptB needs to be able to see masked data on "Table1".
Which two policies should be used to fulfill these masking requirements?
Choose 2 answers
Grant User1 the SELECT and UNMASK permissions on "Table1"
Grant User2 the SELECT permission on "Table1"
Which row-level security option should be used to protect a table from the addition of new rows?
Block predicate with AFTER INSERT
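A hedged sketch of a block predicate that prevents inserting rows for another sales rep; the schema, function, and table names are hypothetical:

```sql
-- Inline table-valued predicate function: a row is allowed only when
-- its SalesRepUserId matches the current database user.
CREATE FUNCTION Security.fn_rowcheck(@SalesRepUserId AS nvarchar(128))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS result WHERE @SalesRepUserId = USER_NAME();
GO

-- AFTER INSERT blocks rows that would violate the predicate on insert.
CREATE SECURITY POLICY Security.OrdersPolicy
ADD BLOCK PREDICATE Security.fn_rowcheck(SalesRepUserId)
    ON dbo.Orders AFTER INSERT;
```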
An organization uses a dedicated SQL pool in Azure Synapse Analytics to host its data warehouse. The organization needs to implement multiple copies of the data warehouse system to different company branches dispersed in different geographical locations.
There are the following requirements:
• Snapshots must be taken automatically.
• A recovery point objective (RPO) of eight hours must be met.
Which solution should be used?
Automatic Restore Points
Which Azure Databricks Apache Spark event time feature should be used to handle late data?
Watermark
An organization stores data in Azure Data Lake Storage Gen2. The organization plans to grant storage container management functionality to an identity in Azure Active Directory.
Which access control mechanism should be used?
Azure role-based access control (RBAC)
An organization needs to mask a column in an Azure SQL Database table. The data type of the column is varchar(50). All characters except the last four must be masked.
Which dynamic data masking function should be used?
Custom
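The custom string (partial) function can be sketched as follows; the table and column names are illustrative, and partial(prefix, padding, suffix) exposes the given number of leading and trailing characters:

```sql
-- Expose 0 leading and 4 trailing characters; the padding string
-- replaces everything in between.
ALTER TABLE dbo.Customers
ALTER COLUMN AccountNumber
ADD MASKED WITH (FUNCTION = 'partial(0, "XXXXXXXXXX", 4)');
```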
The "Orders" table is configured with row-level security, and a filter predicate is configured on the SalesRepUserId column, which validates the username. SalesRepUserId02 executes the following statements:
UPDATE Orders SET Quantity = 100 WHERE OrderId = 3
INSERT INTO Orders (OrderId, SalesRepUserId, Quantity) VALUES (100, 'SalesRepUserId01', 100)
What is the result of these statements?
Choose 2 answers.
Insert throws an error
The existing record is updated
A department requires information for self-service analysis. The information resides on an Azure Data Lake Storage Gen2 container. The following requirements for the container exist:
• The lead analyst of the department should have full access to the container and its content.
• The lead analyst should be able to grant other analysts access to information.
• Access granted to the lead analyst should follow the principle of least privilege.
Which Azure role-based access control (RBAC) role should be assigned to this lead analyst?
Storage Blob Data Owner
An organization needs to provide direct connectivity over the Microsoft backbone network from Azure virtual machines (VMs) to an Azure Synapse Analytics workspace. The solution must eliminate the possibility of data exfiltration to another Azure Synapse Analytics workspace.
Which Azure service should be used to meet the connectivity requirements?
Private Endpoint (Link)
Which language should be used with batch processing when data is in Azure Data Lake and the pay-per-execution cost model is required?
U-SQL
Which method should be used to send application events without code from Azure Databricks to Azure Monitor?
Log4j
Which log category should be set on a data warehouse in Azure Synapse Analytics to log query execution?
SQLRequests
Which destination resource should be used for sending diagnostic telemetry data to monitor the performance of an Azure SQL Database using Azure SQL Analytics?
Azure Log Analytics
A data engineer monitors tagged queries issued against a dedicated SQL pool in Azure Synapse Analytics. A query that loads data into the warehouse takes hours to complete.
Which query OPTION should be used to identify the query progress?
LABEL
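A hedged sketch of tagging a load with a label and then looking it up in the monitoring DMV; the table names and label text are illustrative:

```sql
-- Tag the load so it can be found while it runs.
INSERT INTO dbo.FactSales
SELECT * FROM stage.FactSales
OPTION (LABEL = 'Load: daily fact sales');

-- Track the tagged request's progress.
SELECT request_id, status, total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Load: daily fact sales';
```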
An organization has an e-commerce system and plans to store invoice data in one partition and inventory data in another.
Which partitioning strategy should be used to fulfill the requirement in Azure Data Lake Storage Gen2?
Functional unless folder based is option
Which solution is used to monitor Azure HDInsight cluster performance information?
Log Analytics
An organization maintains an Azure Data Lake Storage Gen2. Data files are used to perform analytics using Apache Spark configured with Azure HDInsight.
Which two solutions should be used to minimize costs related to reads and writes?
Avoid file sizes below 4 MB and aggregate small files into larger ones
A company needs to optimize Apache Spark jobs in Azure Synapse Analytics. The following observations are made:
• The columns are not bucketed.
• Data skew is happening at the join.
• The data set is small.
Which solution should be used to optimize this solution?
Broadcast
Which method should be used to optimize Spark jobs in a pipeline in Apache Spark in Azure Databricks?
Control shuffle partitions
An organization is building an optimization plan for its Azure Databricks data pipelines to meet the following requirements:
• Costs must be minimized.
• It provides simultaneous access to resources.
• It includes fine-grained resource sharing.
• Jobs cannot tolerate delays or failures during processing.
Which strategy should be used?
High Concurrency mode with autoscaling
Which two file formats support Azure Data Lake Storage query acceleration functionality?
CSV and JSON
Which operation should be implemented to maximize the benefit of dynamic file pruning (DFP) in Azure Databricks?
Z-Ordering
Which partitioning option is used for creating folder hierarchies on Azure Data Lake Storage?
Key or Folder
An organization must design a large fact table in a star schema that receives frequent insert, update, and delete operations in an Azure Synapse Analytics dedicated SQL pool.
Which strategy should be used?
Hash distributed with clustered columnstore index type
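An illustrative table definition (names and types hypothetical): hash distribution spreads rows across the pool's distributions on a high-cardinality key, and the clustered columnstore index suits large fact tables:

```sql
CREATE TABLE dbo.FactSales
(
    SaleKey      bigint         NOT NULL,
    CustomerKey  int            NOT NULL,
    SalesAmount  decimal(18, 2) NOT NULL
)
WITH (
    DISTRIBUTION = HASH(SaleKey),   -- distribute on a high-cardinality key
    CLUSTERED COLUMNSTORE INDEX     -- default, well suited to fact tables
);
```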
An analytics system must incrementally migrate new files between blob containers based on time partitioned file names.
Which solution should be used to configure this migration?
Copy Data Tool
A data analytics system must incrementally migrate new files between Azure Blob Storage containers based on time partitioned file names.
How should the partitioning be configured?
Append variables to the source and destination folder paths
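One way to append time-partition variables to the folder paths, sketched in Data Factory expression syntax for a tumbling-window-triggered pipeline (folder names illustrative):

```
@concat('raw/sales/',
        formatDateTime(trigger().outputs.windowStartTime, 'yyyy/MM/dd'))
```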
Which sharding strategy reduces hot spots in Azure Data Lake Storage Gen2?
Hash
A database stores over 4,000,000 products in the Product table. The table stores the following information about products as columns...
Horizontally partition the table into monthly partitions using a column that tracks the last order date (Maybe)
Which feature should be used to analyze data history on an Azure SQL Database instance?
Temporal Tables
An analytics solution must analyze only changed information in its extract, transform, and load (ETL) process.
Which solution should be used?
Incremental Loading
An organization maintains an Azure SQL Database that has a clustered columnstore table that is rarely accessed by users. The organization requires that the space consumption of the table be optimized, and slow performance on data retrieval is acceptable.
Which compression type should be used?
Archive
An organization maintains Azure Table storage for structured, nonrelational data. A natural composite key consisting of two properties in entities is identified, and the properties are used with most queries.
Which partition strategy should be used for optimizing queries?
Set the slowest-changing property as the partition key and the other as the row key
Which table structure in the dedicated SQL pool of Azure Synapse Analytics should be used for joining small tables while reducing data shuffles?
Replicated
Which table structure in the dedicated SQL pool of Azure Synapse Analytics should be used for workloads that update tables larger than 2 GB?
Hash
Which parameter must be set on table creation to skip the validation process between the system-versioned and history table?
DATA_CONSISTENCY_CHECK = OFF
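A hedged sketch of a system-versioned (temporal) table created with the check disabled; the table and column names are illustrative:

```sql
CREATE TABLE dbo.Products
(
    ProductId  int       NOT NULL PRIMARY KEY,
    Price      money     NOT NULL,
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (
    HISTORY_TABLE = dbo.ProductsHistory,
    DATA_CONSISTENCY_CHECK = OFF  -- skip validation against the history table
));
```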
A business analyst needs to create Power BI reports using data in the Azure Data Lake. Data must be presented as relational tables.
Which approach should be used for making data available with the least effort?
Use Azure Synapse Analytics Serverless SQL Pools to query data directly from Azure Data Lake and present it as relational tables for Power BI reports.
Which index type should be used to improve the performance of batch mode query processing and achieve high compression rates in a fact table?
Clustered columnstore index
An organization uses an Azure SQL Database and plans to use the Data Discovery & Classification feature to classify data.
Which metadata setting should be used to define the granular details of the data stored in columns?
Information types
Which permission allows users to see the metadata of a table in an Azure SQL Database?
SELECT
Which Apache Spark SQL function for JavaScript Object Notation (JSON) creates a new row for each element in a map column?
explode
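A small Spark SQL sketch of the behavior:

```sql
-- Produces one row per key/value pair, with columns `key` and `value`.
SELECT explode(map('a', 1, 'b', 2));
```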
Which Apache Spark SQL function translates a UTF-16 binary expression to string?
decode(expression, 'UTF-16')
The pipeline execution throws an error due to null IDs generated by the lookup transformation for unmatched records.
Which solution should be used to insert only matching records?
Use the isMatch() function to filter the output from the lookup transformation
The data should be transformed into a normalized structure for analysis.
Which Azure Synapse Analytics mapping data flow transformation should be used to transform the data?
Pivot
Which magic command should be used to develop Scala code on Python-based Azure Synapse Analytics notebooks?
%%spark
Which link in the Azure HDInsight Spark UI Stages tab should be used to view the operation flow invoked from an application and drill down to analyze details?
DAG visualization
A structured streaming job on Azure Databricks stopped sinking data to Apache Kafka.
Which output mode should be used for printing output records for debugging?
Console sink
Which resource provides support for double encryption of an Azure Synapse Analytics workspace?
Azure Key Vault
An organization stores data in a column in Azure SQL Database. The column needs to be masked to expose only the first character and the last character of the column, with a padding string in the middle.
Which masking function should be used?
Custom
An organization is building a pipeline to capture data in Azure Cosmos DB. Once analyzed, data needs to be removed from a container to reduce storage costs.
Which feature automatically removes items at the container level?
Time to live
An organization is implementing new policies to comply with the General Data Protection Regulation (GDPR). A data engineer uses Azure Databricks with Delta to store sales and customer information. Data needs to be deleted based on a join between these two tables.
Which two improvements should be used to optimize the delete operation?
Apply Z-Ordering
Vacuum
A data engineer has a Delta table named "Customers". The engineer wants to speed up the reading process of the table by using the customer_id column to remove user data.
Which Apache Spark feature should be executed prior to the deletion?
Time Travel
An Azure SQL Database table named Customer has a column named CreditLimit. The column is masked with the default masking function, and some of the values of CreditLimit are higher than 10,000. A user, who has no UNMASK permission, executes the following query:
SELECT * FROM Customer WHERE CreditLimit > 10000
What is the output of the query?
Qualified Records
A database in Azure SQL Database is managed by User1 and User2. User1 instructs User2 to grant SELECT permission on a table to User3.
The following requirements exist:
• User3 must be able to grant the same permission to others.
• The system must record User1 as the grantor.
Which SQL statement should be used to meet these requirements?
GRANT SELECT ON Table TO User3 WITH GRANT OPTION AS User1