Which authorization types are supported for Blob Storage in Azure Synapse serverless SQL pools?
User Identity (with SAS token for non-firewalled storage), SAS, Managed Identity.
Which data processing framework will a data engineer use to ingest data onto cloud data platforms in Azure?
Extract, load, and transform (ELT)
The schema of what data type can be defined at query time?
Unstructured data
Duplicating customer content for redundancy and meeting service-level agreements (SLAs) in Azure meets which cloud technical requirement?
High availability
Azure Blob
A scalable object store for text and binary data
Azure Files
Managed file shares for cloud or on-premises deployments
Azure Queue
A messaging store for reliable messaging between application components
Azure Table
A NoSQL store for no-schema storage of structured data
Which data platform technology is a globally distributed, multimodel database that can perform queries in less than a second?
Azure Cosmos DB
Which data store is the least expensive choice when you want to store data but don't need to query it?
Azure Storage
Which Azure service is the best choice to store documentation about a data source?
Azure Purview
Another term for structured data
Relational Data
Another term for semi-structured data
NoSQL
What is the process called for converting data into a format that can be transmitted or stored?
serialization
What are three common serialization languages?
XML, JSON, and YAML
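For illustration, a minimal T-SQL sketch of serialization: the FOR JSON clause turns a rowset into JSON text that can be stored or transmitted (the dbo.Products table and its columns are hypothetical).
-- Serialize query results into JSON text; the table and columns are hypothetical examples.
SELECT ProductID, Name, ListPrice
FROM dbo.Products
FOR JSON PATH, ROOT('products');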
Examples of unstructured data
Media files, like photos, videos, and audio files
Microsoft 365 files, like Word documents
Text files
Log files
What type of data is a JSON file?
Semi-structured
What type of data is a video?
Unstructured
What is a transaction?
A logical group of database operations that execute together.
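A minimal T-SQL sketch of a transaction, assuming a hypothetical dbo.Accounts table: both updates succeed together, or a rollback undoes them both.
BEGIN TRANSACTION;
    -- Transfer 100 between two accounts; either both updates commit or neither does.
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT TRANSACTION;  -- use ROLLBACK TRANSACTION instead if an error occurs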
What does the acronym ACID stand for?
Atomicity, Consistency, Isolation, Durability
What is atomicity?
A transaction must execute exactly once, and it must be atomic. Either all of the work is done or none of it is. Operations within a transaction usually share a common intent and are interdependent.
What is consistency?
Ensures that the data is consistent both before and after the transaction.
What is Isolation?
Ensures that each transaction is unaffected by other transactions.
What is durability?
That changes made as a result of a transaction are permanently saved in the system. The system saves data that's committed so that even in the event of a failure and system restart, the data is available in its correct state.
What does OLAP stand for?
Online analytical processing
What does OLTP stand for?
Online Transaction Processing
What does an OLTP system commonly support?
Many users, quick response times, and large volumes of data.
What does an OLAP system commonly support?
Fewer users, longer response times, lower availability, and typically large or complex transactions.
Which type of transactional database system would work best for product data?
OLAP
What does Azure Blob Storage support?
Supports storing files like photos and videos.
What is the data classification for Business data?
Structured
What are the operations of business data?
Read-only, complex analytical queries across multiple databases
What are the latency and throughput of business data?
Some latency in the results is expected based on the complex nature of the queries.
What is latency?
A performance metric that measures the time gap between requests and responses, for example for disk reads and writes.
What is the best service for business data?
Azure SQL Database
Business data most likely will be queried by business analysts, who are more likely to know SQL than any other query language. You can use Azure SQL Database as a solution by itself, but if you pair it with Azure Analysis Services, data analysts can create a semantic model over the data in Azure SQL Database.
Three primary types of data
Structured, Semi-structured, and Unstructured
Structured data
Comes from table-based source systems such as a relational database or from a flat file such as a comma-separated values (CSV) file. The primary element of a structured file is that the rows and columns are aligned consistently throughout the file.
Semi-Structured data
Data such as JavaScript object notation (JSON) files, which may require flattening prior to loading into your source system. When flattened, this data doesn't have to fit neatly into a table structure.
Unstructured data
Data stored as key-value pairs that don't adhere to standard relational models. Other commonly used types of unstructured data include Portable Document Format (PDF) files, word processor documents, and images.
Data Engineer tasks in Azure (Data Operations)
Data integration, data transformation, and data consolidation
Data integration
Establishing links between operational and analytical services and data sources to enable secure, reliable access to data across multiple systems.
Data transformation
Data usually needs to be transformed into a suitable structure and format for analysis, often as part of an extract, transform, and load (ETL) process. Increasingly, a variation in which you extract, load, and transform (ELT) the data is used to quickly ingest the data into a data lake and then apply "big data" processing techniques to transform it.
Data consolidation
Process of combining data that has been extracted from multiple data sources into a consistent structure - usually to support analytics and reporting.
Operational Data
Transactional data that is generated and stored by applications, often in a relational or non-relational database.
Streaming data
Streaming data refers to perpetual sources of data that generate data values in real-time, often relating to specific events.
Data pipelines
Used to orchestrate activities that transfer and transform data.
Data lakes
A storage repository that holds large amounts of data in native, raw formats.
Data warehouses
Centralized repository of integrated data from one or more disparate sources.
Apache Spark
A parallel processing framework that takes advantage of in-memory processing and distributed file storage
Core Azure Technologies
Azure Synapse Analytics
Azure Data Lake Storage Gen2
Azure Stream Analytics
Azure Data Factory
Azure Databricks
In a data lake, data is stored in?
Files
Data in a relational database table is
Structured
Which of the following Azure services provides capabilities for running data pipelines AND managing analytical data in a data lake or relational data warehouse?
Azure Synapse Analytics
Benefit Azure Data Lake Storage
Data Lake Storage is designed to deal with a wide variety and high volume of data at exabyte scale while securely handling hundreds of gigabytes of throughput.
Azure Blob storage
Store large amounts of unstructured data in a flat namespace within a blob container.
Stages Processing Big Data
Ingest
Store
Prep and train
Model and serve
Stages Processing Big Data: Model and serve
Involves the technologies that will present the data to users.
Technologies of model and serve
Microsoft Power BI
Azure Synapse Analytics
Stages Processing Big Data: Prep and train
Identifies the technologies that are used to perform data preparation and model training and scoring for machine learning solutions.
Technologies of Prep and train
Azure Synapse Analytics
Azure Databricks
Azure HDInsight
Azure Machine Learning
Stages Processing Big Data: Store
Identifies where the ingested data should be placed.
Technologies of Store
Azure Data Lake Storage Gen2
Stages Processing Big Data: Ingest
Identifies the technology and processes that are used to acquire the source data.
Technologies for batch ingest
Azure Synapse Analytics
Azure Data Factory
Technologies for real-time ingest
Apache Kafka for HDInsight
Stream Analytics
Azure Data Lake Storage Gen2 stores data in...
An HDFS-compatible file system hosted in Azure Storage.
What option must you enable to use Azure Data Lake Storage Gen2?
Hierarchical namespace
Descriptive analytics
Answers the question "What is happening in my business?".
Diagnostic analytics
Answers the question "Why is it happening?".
Predictive analytics
Answers the question "What is likely to happen in the future based on previous trends and patterns?".
What is Azure Synapse Analytics?
Azure Synapse Analytics is a centralized service for data storage and processing with an extensible architecture. It integrates commonly used data stores, processing platforms, and visualization tools.
What is a Synapse Analytics workspace?
A Synapse Analytics workspace defines an instance of the Synapse Analytics service in which you manage the services and data resources for your analytics solution. You create it in an Azure subscription interactively using the Azure portal, Azure PowerShell, the Azure command-line interface (CLI), or an Azure Resource Manager or Bicep template.
What is a data lake in the context of Azure Synapse Analytics?
In a Synapse Analytics workspace, a data lake is a core resource where data files can be stored and processed at scale. A workspace typically has a default data lake, implemented as a linked service to an Azure Data Lake Storage Gen2 container.
What role do pipelines play in Azure Synapse Analytics?
Pipelines in Azure Synapse Analytics orchestrate activities necessary to retrieve data from sources, transform the data, and load the transformed data into an analytical store. They are based on the same underlying technology as Azure Data Factory.
How does Azure Synapse Analytics support SQL-based data querying and manipulation?
Azure Synapse Analytics supports SQL-based data querying and manipulation through two kinds of SQL pool: a built-in serverless pool for querying file-based data in a data lake, and custom dedicated SQL pools that host relational data warehouses.
How is Apache Spark used in Azure Synapse Analytics?
In Azure Synapse Analytics, you can create Spark pools and use interactive notebooks for data analytics, machine learning, and data visualization. Spark performs distributed processing of files in a data lake.
What is Azure Synapse Data Explorer?
Azure Synapse Data Explorer is a data processing engine in Azure Synapse Analytics, based on the Azure Data Explorer service. It uses Kusto Query Language (KQL) for high performance, low-latency analysis of batch and streaming data.
How can Azure Synapse Analytics be integrated with other Azure data services?
Azure Synapse Analytics can be integrated with other Azure data services for end-to-end analytics solutions. Integrations include Azure Synapse Link, Microsoft Power BI, Microsoft Purview, and Azure Machine Learning.
When is Azure Synapse Analytics used for large-scale data warehousing?
Azure Synapse Analytics is used for large-scale data warehousing when there's a need to integrate all data, including big data, for analytics and reporting purposes from a descriptive analytics perspective, independent of its location or structure.
How does Azure Synapse Analytics support advanced analytics?
Azure Synapse Analytics enables organizations to perform predictive analytics using both its native features and by integrating with other technologies such as Azure Machine Learning.
How is Azure Synapse Analytics used for data exploration and discovery?
The serverless SQL pool functionality in Azure Synapse Analytics enables Data Analysts, Data Engineers, and Data Scientists to explore data within the data estate. This supports data discovery, diagnostic analytics, and exploratory data analysis.
How does Azure Synapse Analytics support real-time analytics?
Azure Synapse Analytics can capture, store, and analyze data in real-time or near-real time with features like Azure Synapse Link, or through the integration of services like Azure Stream Analytics and Azure Data Explorer.
How does Azure Synapse Analytics facilitate data integration?
Azure Synapse Pipelines in Azure Synapse Analytics enables ingestion, preparation, modeling, and serving of data to be used by downstream systems.
What does integrated analytics mean in the context of Azure Synapse Analytics?
Integrated analytics in Azure Synapse Analytics refers to the ability to perform a variety of analytics on data in a cohesive solution, removing the complexity by integrating the analytics landscape into one service. This allows more focus on working with data to bring business benefit rather than spending time provisioning and maintaining multiple systems.
Which feature of Azure Synapse Analytics enables you to transfer data from one store to another and apply transformations to the data at scheduled intervals?
Pipelines
You want to create a data warehouse in Azure Synapse Analytics in which the data is stored and queried in a relational data store. What kind of pool should you create?
Dedicated SQL Pool
A data analyst wants to analyze data by using Python code combined with text descriptions of the insights gained from the analysis. What should they use to perform the analysis?
A notebook connected to an Apache Spark Pool
What are the two runtime environments offered by Azure Synapse SQL in Azure Synapse Analytics?
The two runtime environments are Serverless SQL pool, used for on-demand SQL query processing primarily with data in a data lake, and Dedicated SQL pool, used to host enterprise-scale relational database instances for data warehouses.
What are some benefits of using Serverless SQL pool in Azure Synapse Analytics?
Serverless SQL pool benefits include familiar Transact-SQL syntax, integrated connectivity from various BI and ad-hoc querying tools, distributed query processing, built-in query execution fault-tolerance, no infrastructure or clusters to maintain, and a pay-per-query model.
When should Serverless SQL pools in Azure Synapse Analytics be used?
Serverless SQL pools are best suited for querying data residing in a data lake, handling unplanned or "bursty" workloads, and when exact costs for each query need to be monitored and attributed. They are not recommended for OLTP workloads or tasks requiring millisecond response times.
What are some common use cases for Serverless SQL pools in Azure Synapse Analytics?
Common use cases include data exploration, where initial insights about the data are gathered, data transformation, which can be performed interactively or as part of an automated data pipeline, and creating a logical data warehouse where data is stored in the data lake but abstracted by a relational schema for use by client applications and analytical tools.
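As a sketch of the logical data warehouse pattern, a view in a serverless SQL pool database can wrap an OPENROWSET query over files in the data lake so client tools see a relational object (the storage URL and object names here are hypothetical).
-- Create a relational abstraction over Parquet files in the data lake.
CREATE VIEW dbo.SalesOrders
AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/files/sales/*.parquet',
    FORMAT = 'parquet'
) AS orders;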
What is a serverless SQL pool used for?
Querying data files in various common file formats, including CSV, JSON, and Parquet.
Which SQL function is used to generate a tabular rowset from data in one or more files?
OPENROWSET
What does the BULK parameter do in an OPENROWSET function?
Specifies the full URL to the location in the data lake containing the data files.
How do you specify the type of data being queried in OPENROWSET?
Using the FORMAT parameter.
How can you include or exclude files in the query using the BULK parameter?
By using wildcards in the BULK parameter.
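A minimal sketch tying these parameters together: BULK gives the full URL (here with a wildcard to include all matching files), and FORMAT identifies the type of data being queried (the storage account, container, and folder are hypothetical).
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/orders/*.parquet',  -- full URL with a wildcard
    FORMAT = 'parquet'                                                      -- type of data being queried
) AS rows;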
How do you query a delimited text file using OPENROWSET?
By using the OPENROWSET function with the csv FORMAT parameter and other parameters as required to handle the specific formatting details.
What does the PARSER_VERSION parameter do?
Determines how the query interprets the text encoding used in the files.
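A sketch of a delimited text query over hypothetical CSV files with a header row; PARSER_VERSION '2.0' and HEADER_ROW are typical options for this case.
SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/products/*.csv',
    FORMAT = 'csv',
    PARSER_VERSION = '2.0',  -- determines how the text encoding in the files is interpreted
    HEADER_ROW = TRUE        -- take column names from the first row
) AS rows;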
How can you specify the rowset schema in OPENROWSET?
By using a WITH clause to override the default column names and inferred data types, providing a schema definition.
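A sketch of a WITH clause that overrides the default column names and inferred data types (the file location and column definitions are hypothetical; the columns map to the file's fields in order).
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/products/*.csv',
    FORMAT = 'csv',
    PARSER_VERSION = '2.0'
)
WITH (
    product_id INT,             -- explicit column names and data types
    product_name VARCHAR(50),   -- replace the defaults inferred from the file
    list_price DECIMAL(10,2)
) AS rows;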
How do you query a JSON file using OPENROWSET?
Use csv format with FIELDTERMINATOR, FIELDQUOTE, and ROWTERMINATOR set to 0x0b, and a schema that includes a single NVARCHAR(MAX) column.
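A sketch of that JSON technique: each document is read into a single NVARCHAR(MAX) column (the 0x0b terminators prevent the text from being split into fields or rows), and JSON_VALUE extracts individual properties (the file path and property names are hypothetical).
SELECT JSON_VALUE(doc, '$.product_name') AS product_name,
       JSON_VALUE(doc, '$.list_price') AS list_price
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/products/*.json',
    FORMAT = 'csv',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b',
    ROWTERMINATOR = '0x0b'
)
WITH (doc NVARCHAR(MAX)) AS rows;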