Data
a collection of facts such as numbers, descriptions, and observations used to record information
Structured Data
data that adheres to a fixed schema with the same fields or properties, often stored in databases using a relational model
Semi-Structured Data
information with some structure but allows variation, commonly represented in formats like JSON
Unstructured Data
data without a specific structure, including documents, images, audio, and video files
File Storage
the ability to store data in files, often done in local systems or shared file storage systems in the cloud
Delimited Text Files
data stored in plain text format with specific field delimiters, common formats include CSV, TSV, and space-delimited
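A minimal sketch of reading delimited text with Python's standard csv module, assuming hypothetical sample records. The same data is shown as CSV and TSV; only the delimiter changes, and the CSV version uses quotes so a field can contain a comma.

```python
import csv
import io

# Hypothetical sample data: the same two records as CSV and as TSV.
csv_text = 'name,city\nAlice,Seattle\n"Smith, J.",London\n'
tsv_text = "name\tcity\nAlice\tSeattle\nSmith, J.\tLondon\n"

def parse_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

csv_rows = parse_delimited(csv_text, ",")
tsv_rows = parse_delimited(tsv_text, "\t")
print(csv_rows)
```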
JSON
a format with a hierarchical document schema used to define data entities, flexible for structured and semi-structured data
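A small sketch, using hypothetical customer documents, of why JSON suits semi-structured data: both entities share some fields, but each nests or omits others, and code reading them has to tolerate the variation.

```python
import json

# Hypothetical customer documents: both share "id" and "name",
# but the nested "contact" object and the optional "tags" field differ.
doc = """
[
  {"id": 1, "name": "Joe", "contact": {"email": "joe@example.com"}},
  {"id": 2, "name": "Sam", "contact": {"phone": "555-0100"}, "tags": ["vip"]}
]
"""

customers = json.loads(doc)

for c in customers:
    # .get() tolerates fields that some documents omit
    email = c["contact"].get("email", "<none>")
    print(c["id"], c["name"], email, c.get("tags", []))
```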
XML
a human-readable data format using tags enclosed in angle brackets to define elements and attributes
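A brief sketch, assuming a hypothetical XML document, that reads elements and attributes with Python's built-in xml.etree.ElementTree. The `id` and `name` values are attributes of each `Customer` element, while `Email` is a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: elements are tags in angle brackets,
# attributes are name="value" pairs inside the opening tag.
xml_text = """
<Customers>
  <Customer id="1" name="Joe"><Email>joe@example.com</Email></Customer>
  <Customer id="2" name="Sam"><Email>sam@example.com</Email></Customer>
</Customers>
"""

root = ET.fromstring(xml_text.strip())
customers = [
    (c.get("id"), c.get("name"), c.findtext("Email"))  # attributes, then child text
    for c in root.findall("Customer")
]
print(customers)
```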
BLOB
binary data stored in formats that must be interpreted by applications, common for images, video, audio, and binary files
OLTP
Online Transaction Processing, systems optimized for read and write operations to support transactional workloads
OLAP
Online Analytical Processing, systems optimized for analytical workloads, aggregating data for reporting and visualization
Data Lakes
used in large-scale data analytical processing scenarios to collect and analyze file-based data
Data Warehouses
store data in a relational schema optimized for read operations to support reporting and data visualization
Data Lakehouses
combine the flexible, scalable storage of a data lake with the relational querying semantics of a data warehouse; the table schema may require some denormalization of data from OLTP sources
Database Administrators
manage databases, ensure availability, performance, security, and backup and recovery plans
Data Engineers
manage data integration, cleansing, transformation, and pipelines across the organization
Data Analysts
explore and analyze data to create visualizations and insights for informed decision-making
Azure SQL
a family of relational database solutions on Microsoft Azure, including Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure Virtual Machines
SQL Server on Azure Virtual Machines
The infrastructure-as-a-service (IaaS) member of the Azure SQL family, running SQL Server on Azure Virtual Machines; suitable for migrations and applications requiring access to operating system features.
Relational Database
Models collections of entities as tables, each row representing an instance of an entity, and each column storing data of a specific datatype.
Normalization
A schema design process that minimizes data duplication, enforces data integrity, and involves separating entities into tables, attributes into columns, and using primary and foreign keys.
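A minimal normalization sketch using SQLite (via Python's sqlite3 module) as a stand-in relational database; the Customer and SalesOrder tables are hypothetical. Each entity gets its own table, the customer's name is stored once, and the foreign key rejects an order for a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each entity gets its own table; its attributes become columns.
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE SalesOrder (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    Amount     REAL NOT NULL
);
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Joe')")
conn.executemany("INSERT INTO SalesOrder VALUES (?, ?, ?)",
                 [(100, 1, 25.0), (101, 1, 40.0)])

# The name is stored once; orders refer to it through the key.
rows = conn.execute("""
    SELECT o.OrderID, c.Name, o.Amount
    FROM SalesOrder AS o JOIN Customer AS c ON o.CustomerID = c.CustomerID
    ORDER BY o.OrderID
""").fetchall()
print(rows)

# An order for a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO SalesOrder VALUES (102, 99, 10.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```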
SQL
Structured Query Language, used to communicate with relational databases. Common statements include SELECT, INSERT, UPDATE, and DELETE; dialects include T-SQL, PL/pgSQL, and PL/SQL.
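A short sketch of the four common statements, run against an in-memory SQLite database through Python's sqlite3 module; the Product table is hypothetical. These basic statements would look much the same in SQL Server, PostgreSQL, or Oracle, though each dialect adds its own extensions.

```python
import sqlite3

# SQLite stands in for a relational database; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL)")

conn.execute("INSERT INTO Product (ID, Name, Price) VALUES (1, 'Widget', 2.99)")
conn.execute("INSERT INTO Product (ID, Name, Price) VALUES (2, 'Gadget', 9.50)")
conn.execute("UPDATE Product SET Price = 3.49 WHERE ID = 1")
conn.execute("DELETE FROM Product WHERE ID = 2")

rows = conn.execute("SELECT ID, Name, Price FROM Product").fetchall()
print(rows)
```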
Views
Virtual tables based on the results of a SELECT query; they can be queried and filtered just like tables.
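A sketch of a view in SQLite, assuming a hypothetical Product table. The view stores only the query, so it can be filtered like a table and reflects rows added to the base table later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
INSERT INTO Product VALUES (1, 'Road Bike', 'Bikes', 1200.0),
                           (2, 'Helmet', 'Accessories', 45.0),
                           (3, 'Mountain Bike', 'Bikes', 950.0);
-- The view stores the query, not a copy of the data.
CREATE VIEW BikeProducts AS
    SELECT ID, Name, Price FROM Product WHERE Category = 'Bikes';
""")

# Query and filter the view as if it were a table.
cheap_bikes = conn.execute(
    "SELECT Name FROM BikeProducts WHERE Price < 1000"
).fetchall()
print(cheap_bikes)

# New rows in the base table show up in the view automatically.
conn.execute("INSERT INTO Product VALUES (4, 'Kids Bike', 'Bikes', 300.0)")
bike_count = conn.execute("SELECT COUNT(*) FROM BikeProducts").fetchone()[0]
```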
Stored Procedures
Define SQL statements that can be run on command, encapsulating programmatic logic that applications can call when working with data.
Indexes
Structures that help you search for data in a table. An index holds a sorted copy of the indexed column values with pointers to the corresponding rows, enabling faster data retrieval.
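A sketch showing how an index changes the way a query finds rows, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so only key words are checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL)")
conn.executemany("INSERT INTO Product (Name, Price) VALUES (?, ?)",
                 [(f"Item {i}", i * 1.5) for i in range(1000)])

query = "SELECT Price FROM Product WHERE Name = ?"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("Item 42",)))

before = plan(query)  # no index on Name: the whole table is scanned
conn.execute("CREATE INDEX idx_product_name ON Product (Name)")
after = plan(query)   # the sorted index is searched, then the matching row is read
print(before)
print(after)
```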
Virtual Machine (VM)
Running SQL Server on an Azure VM lets you develop and test traditional applications with full administrative rights over both the DBMS and the operating system; suitable for organizations with existing IT resources.
Azure SQL Managed Instance
A platform-as-a-service (PaaS) option providing near-100% compatibility with on-premises SQL Server instances, automating software updates, backups, and maintenance tasks.
Azure SQL Database
A fully managed, highly scalable PaaS database service designed for the cloud, available as Single Database or Elastic Pool options.
MySQL
An open-source relational DBMS; Azure Database for MySQL is a PaaS implementation offering high availability, scalability, and automatic backups.
MariaDB
An open-source relational DBMS; Azure Database for MariaDB is a fully managed service offering high availability, predictable performance, and secure data storage.
PostgreSQL
An object-relational database, supported in Azure with the Flexible Server deployment option for high availability and server configuration customizations.
Azure Blob Storage
Enables storing unstructured data as blobs in the cloud, with containers for grouping related blobs and three types of blobs: Block Blobs, Page Blobs, and Append Blobs.
Access Tiers
Hot Tier for frequently accessed blobs, Cool Tier for infrequently accessed data, and Archive Tier for historical data with low storage cost.
Lifecycle Management Policy
Automatically moves blobs between access tiers based on age to optimize storage costs and performance.
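A minimal sketch of the decision an age-based lifecycle rule makes. Real policies are defined as JSON rules in the storage account (with conditions such as `daysAfterModificationGreaterThan`); this Python function only models that logic, and the 30/90-day thresholds and blob names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; a real policy defines its own per rule.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 90

def target_tier(last_modified, now):
    """Return the access tier a simple age-based rule would choose."""
    age_days = (now - last_modified).days
    if age_days > ARCHIVE_AFTER_DAYS:
        return "Archive"
    if age_days > COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
blobs = {  # hypothetical blob names and last-modified times
    "logs/recent.json": now - timedelta(days=2),
    "logs/older.json": now - timedelta(days=60),
    "logs/oldest.json": now - timedelta(days=400),
}
for name, modified in blobs.items():
    print(name, target_tier(modified, now))
```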
Azure Data Lake Store (Gen1)
A service for hierarchical data storage for analytical data lakes, used by big data analytical solutions for structured, semi-structured, and unstructured data
Azure Data Lake Storage Gen2
An integrated service in Azure Storage combining scalability of blob storage, cost-control of storage tiers, hierarchical file system capabilities, and compatibility with major analytics systems
Azure Files
Cloud-based network shares for storing and sharing files in Azure, eliminating hardware costs, providing high availability, and scalable cloud storage
Azure Table Storage
NoSQL storage solution using tables with key/value data items, each represented by a row with columns for data fields, enabling storage of semi-structured data
Partitioning
Mechanism in Azure Table Storage for grouping related rows based on a common property or partition key to improve scalability, performance, and data organization
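A hypothetical in-memory model (not the Azure SDK) of how Table Storage organizes entities: each entity carries a PartitionKey and a RowKey, rows sharing a PartitionKey are grouped together, and other properties can vary from entity to entity.

```python
from collections import defaultdict

class InMemoryTable:
    """Hypothetical stand-in for a table; entities are dicts with PartitionKey and RowKey."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # PartitionKey -> {RowKey: entity}

    def upsert(self, entity):
        self._partitions[entity["PartitionKey"]][entity["RowKey"]] = entity

    def get(self, partition_key, row_key):
        # Point lookup: the two keys together identify exactly one entity.
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key):
        # Reading one partition touches only that group of rows, in RowKey order.
        part = self._partitions.get(partition_key, {})
        return [part[k] for k in sorted(part)]

table = InMemoryTable()
table.upsert({"PartitionKey": "Seattle", "RowKey": "001", "Name": "Joe"})
table.upsert({"PartitionKey": "Seattle", "RowKey": "002", "Name": "Sam", "Phone": "555-0100"})
table.upsert({"PartitionKey": "Oslo", "RowKey": "001", "Name": "Ann"})

print([e["Name"] for e in table.query_partition("Seattle")])
print(table.get("Oslo", "001"))
```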
Azure Cosmos DB
Highly scalable DBMS supporting multiple APIs for relational and non-relational workloads, providing fast read and write performance and enabling multi-region writes
Data Warehousing Architecture
Involves data ingestion and processing, an analytical data store, an analytical data model, and data visualization for large-scale data analytics
Data Ingestion Pipelines
Orchestrate ETL processes for large-scale data ingestion, can be created and run using Azure Data Factory, Azure Synapse Analytics, or Microsoft Fabric
Analytical Data Stores
Two common types: data warehouses (relational databases optimized for data analytics) and file-system-based data lakes for large-scale analytics.
Star Schema
A schema where numeric values from a transactional store are stored in central fact tables related to dimension tables, forming a star-like structure for data aggregation.
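A small star schema sketched in SQLite, with hypothetical table names: a central FactSales table holds the numeric measure and keys to two dimension tables, and the query aggregates revenue by attributes of those dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the entities to aggregate by.
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE DimDate    (DateKey INTEGER PRIMARY KEY, Year INTEGER);
-- Central fact table: a numeric measure plus a key to each dimension.
CREATE TABLE FactSales (
    ProductKey INTEGER REFERENCES DimProduct(ProductKey),
    DateKey    INTEGER REFERENCES DimDate(DateKey),
    Revenue    REAL
);
INSERT INTO DimProduct VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO DimDate    VALUES (20230105, 2023), (20240210, 2024);
INSERT INTO FactSales  VALUES (1, 20230105, 1200.0), (2, 20230105, 45.0),
                              (1, 20240210, 950.0),  (2, 20240210, 30.0);
""")

totals = conn.execute("""
    SELECT d.Year, p.Category, SUM(f.Revenue) AS Revenue
    FROM FactSales AS f
    JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
    JOIN DimDate    AS d ON f.DateKey = d.DateKey
    GROUP BY d.Year, p.Category
    ORDER BY d.Year, p.Category
""").fetchall()
for row in totals:
    print(row)
```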
Snowflake Schema
An extension of a star schema where additional tables are added to represent dimensional hierarchies related to the dimension tables.
Data Lakes
A file store on a distributed file system for high-performance data access, supporting structured, semi-structured, and unstructured data for analysis without strict schema enforcement.
SQL Pools
In Azure Synapse Analytics, SQL pools use PolyBase to define external tables based on files in a data lake, so those files can be queried using SQL.
Batch Processing
Processing method where data records are collected and processed together in a single operation, suitable for handling large datasets efficiently.
Stream Processing
Real-time data processing method where data is processed as individual units as they arrive, ideal for time-critical operations requiring instant responses.
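A sketch contrasting the two approaches on hypothetical sensor readings: the batch function waits for the full set and processes it in one operation, while the streaming generator updates its result as each record arrives, keeping only a running count and total.

```python
from typing import Iterable, Iterator

readings = [21.0, 22.5, 23.0, 19.5]  # hypothetical sensor values

def batch_average(records: list[float]) -> float:
    # Batch: collect all records first, then process them together.
    return sum(records) / len(records)

def stream_averages(records: Iterable[float]) -> Iterator[float]:
    # Stream: emit an updated result as each record arrives.
    count, total = 0, 0.0
    for value in records:
        count += 1
        total += value
        yield total / count

print(batch_average(readings))
print(list(stream_averages(readings)))
```

Both arrive at the same final average; the streaming version simply makes intermediate results available immediately.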
Azure Synapse Analytics
A PaaS service for large-scale data analytics, combining data integrity and reliability of SQL Server with the flexibility of a data lake and Apache Spark.
Azure Databricks
Azure's implementation of the Databricks platform, built on Apache Spark for data analytics and data science with workload-optimized Spark clusters.
Microsoft Fabric
A unified Software-as-a-Service (SaaS) offering with OneLake architecture for scalable analytics, providing a single environment for data collaboration.
Stream Processing Architecture
Involves event data generation, capture, processing, and output, with technologies like Azure Stream Analytics, Spark Structured Streaming, and Azure Data Explorer.
Delta Lake
An open-source storage layer that adds support for transactional consistency, schema enforcement, and other data warehousing features to data lake storage.
Real-time Analytics
Capturing streaming data in Spark-based data lakes or analytical data stores so it can be analyzed immediately.
Stream Processing
Data is processed continually as new data records arrive.
Azure Stream Analytics
Service for processing streaming data; for example, it can continually capture data from an IoT Hub, aggregate it over temporal windows, and store the results in Azure SQL Database.
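Azure Stream Analytics expresses temporal aggregation in its own SQL-like query language (for example, grouping by `TumblingWindow(second, 60)`). The Python below is only a sketch of the tumbling-window idea, using hypothetical IoT events and an assumed 60-second window: each event falls into exactly one fixed, non-overlapping window, and readings are averaged per device per window.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical IoT events: (device id, event time in UTC, temperature).
events = [
    ("dev1", datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), 20.0),
    ("dev1", datetime(2024, 1, 1, 12, 0, 40, tzinfo=timezone.utc), 22.0),
    ("dev2", datetime(2024, 1, 1, 12, 0, 50, tzinfo=timezone.utc), 30.0),
    ("dev1", datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc), 24.0),
]

def tumbling_window_avg(events, window_seconds=60):
    """Average readings per device over fixed, non-overlapping time windows."""
    acc = defaultdict(lambda: [0.0, 0])  # (device, window start) -> [sum, count]
    for device, ts, value in events:
        epoch = int(ts.timestamp())
        # Align the event to the start of the window that contains it.
        start = datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)
        acc[(device, start)][0] += value
        acc[(device, start)][1] += 1
    return {key: total / count for key, (total, count) in acc.items()}

for (device, start), avg in sorted(tumbling_window_avg(events).items()):
    print(device, start.isoformat(), avg)
```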
Power BI Tools
Suite of tools and services for building interactive data visualizations for business users to consume.
Dimension Tables
Represent entities for aggregating numeric measures in data modeling.
Fact Tables
Store numeric measures associated with recorded events in data modeling.
Hierarchies
Enable drill-up or drill-down analysis to find aggregated values at different levels in analytical models.
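A minimal sketch of drilling through a Year > Quarter > Month hierarchy, assuming hypothetical sales rows. Aggregating at a shallower depth is a drill-up; a deeper depth drills down to finer-grained totals.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, month, sales amount).
sales = [
    (2024, "Q1", "Jan", 100.0),
    (2024, "Q1", "Feb", 150.0),
    (2024, "Q2", "Apr", 200.0),
    (2025, "Q1", "Jan", 120.0),
]

def aggregate(rows, depth):
    """Sum sales at a hierarchy level: 1 = Year, 2 = Year > Quarter, 3 = Year > Quarter > Month."""
    totals = defaultdict(float)
    for *keys, amount in rows:
        totals[tuple(keys[:depth])] += amount
    return dict(totals)

print(aggregate(sales, 1))  # yearly totals
print(aggregate(sales, 2))  # drill down one level: quarterly totals
```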
Data Visualization
Common visualization types include tables, bar/column charts, line charts, pie charts, scatter plots, and maps, each suited to communicating data in a different way.
Power BI Desktop
Tool used to import data from multiple sources, create data models, and design interactive reports for visualization.