What are the three main types of cloud service models?
SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service).
What is a Data Lake?
A centralized repository that allows you to store all your structured and unstructured data at any scale.
What is the purpose of a Data Warehouse?
To store and analyze structured data from various sources for business intelligence.
What command protocols do disks use to communicate with controllers?
Disks communicate using protocols based on either ATA (Advanced Technology Attachment) or SCSI (Small Computer System Interface).
What is the main characteristic of HDDs (Hard Disk Drives)?
They use mechanical heads to read/write data on spinning magnetic disks.
What distinguishes SSDs (Solid State Drives) from HDDs?
SSDs use NAND flash memory with no moving parts, making them faster and more durable.
What are SAS Drives used for?
They are commonly used in enterprise/server environments for high reliability and performance.
What is the typical rotational speed of SAS disks?
10,000 or 15,000 RPM.
What are the key advantages of SSDs?
Speed, reliability, durability, energy efficiency, and silent operation.
What limitations do SSDs have compared to HDDs?
Higher cost per gigabyte, smaller capacity, and limited write endurance.
What is the role of a storage controller?
It manages data flow between a server and its storage devices, impacting performance and reliability.
What does a storage controller virtualize?
It virtualizes all physical disks connected to it and presents them as one or more virtual disks, each identified by a Logical Unit Number (LUN).
What are the common command sets for HDDs?
SCSI and ATA (IDE/SATA).
What is the typical cost comparison between SSDs and HDDs?
SSDs are typically more expensive per gigabyte than HDDs.
What is the typical data transfer speed of SSDs compared to HDDs?
SSDs can achieve speeds upwards of 500 MB/s, while HDDs are typically slower at 30-160 MB/s.
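As a rough worked example of what those throughput figures mean in practice, the sketch below estimates how long a bulk copy would take. The 100 GB size is an arbitrary illustration, and the calculation assumes sustained sequential throughput with 1 GB = 1000 MB; real transfers vary with workload and interface.

```python
def transfer_seconds(size_gb, throughput_mb_s):
    """Seconds needed to move size_gb at a sustained throughput (1 GB = 1000 MB)."""
    return size_gb * 1000 / throughput_mb_s

# Hypothetical 100 GB copy at the speeds quoted in the card above.
ssd_time = transfer_seconds(100, 500)       # SSD at 500 MB/s
hdd_fast_time = transfer_seconds(100, 160)  # HDD at the top of its range
hdd_slow_time = transfer_seconds(100, 30)   # HDD at the bottom of its range

print(f"SSD: {ssd_time:.0f} s, HDD: {hdd_fast_time:.0f} s to {hdd_slow_time:.0f} s")
```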
What is the primary function of a disk's command protocol?
To facilitate communication between disks and disk controllers.
What is the significance of wear leveling in SSDs?
It helps manage the limited number of write cycles for flash memory cells, extending the lifespan of the SSD.
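One way to picture wear leveling is a controller that always writes to the least-erased block. The sketch below is a deliberately simplified model, not how any particular SSD firmware works; the `WearLeveler` class and its method names are hypothetical.

```python
class WearLeveler:
    """Toy model: track erase counts and always reuse the least-worn block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks

    def pick_block(self):
        # Choose the block with the fewest erases so wear spreads evenly.
        block = min(range(len(self.erase_counts)),
                    key=self.erase_counts.__getitem__)
        self.erase_counts[block] += 1
        return block


leveler = WearLeveler(num_blocks=4)
for _ in range(8):
    leveler.pick_block()
print(leveler.erase_counts)  # every block ends up with the same wear
```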
What is the typical capacity of HDDs compared to SSDs?
HDDs typically offer larger capacities at a lower cost per gigabyte than SSDs.
What type of drives are USB flash drives classified as?
Removable drives using the USB Mass Storage command set.
What is the main advantage of using SATA disks?
They are low-cost, high-capacity disks ideal for bulk storage applications.
What is the primary use case for optical drives?
To read data from CDs, DVDs, or Blu-ray discs using lasers.
What is the purpose of a controller in storage systems?
The controller splits disks into small pieces called physical extents, from which new virtual disks (LUNs) are created for the operating system.
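The extent idea can be sketched as follows: disks are cut into equal-sized pieces, and a LUN is simply a list of those pieces, possibly drawn from several disks. The function names, the 10 GB extent size, and the first-fit allocation are assumptions made for illustration only.

```python
def carve_extents(disk_sizes_gb, extent_gb):
    """Return a pool of (disk_index, extent_index) pairs, one per physical extent."""
    return [(disk, ext)
            for disk, size in enumerate(disk_sizes_gb)
            for ext in range(size // extent_gb)]


def create_lun(free_extents, size_gb, extent_gb):
    """Take enough free extents from the pool to back a LUN of size_gb."""
    needed = -(-size_gb // extent_gb)  # ceiling division
    if needed > len(free_extents):
        raise ValueError("not enough free extents")
    lun = free_extents[:needed]
    del free_extents[:needed]
    return lun


pool = carve_extents([100, 100], extent_gb=10)   # two hypothetical 100 GB disks
lun = create_lun(pool, size_gb=150, extent_gb=10)
print(len(lun), "extents used,", len(pool), "extents still free")
```

Because the LUN is larger than either disk, its extents span both, which is the point of virtualizing the physical layout.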
What does RAID stand for and what is its primary function?
RAID stands for Redundant Array of Independent Disks, and it combines multiple physical disks into a single logical unit to improve performance, increase storage capacity, and provide data redundancy.
What are the advantages of RAID 0?
RAID 0 offers very high performance through striping but provides no data redundancy.
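Striping can be illustrated by dealing fixed-size chunks out to the disks in round-robin order. This is a minimal in-memory sketch; the `stripe` and `unstripe` names and the 2-byte chunk size are assumptions, and real controllers use much larger stripe units.

```python
def stripe(data, num_disks, chunk_size):
    """Distribute data across disks in round-robin chunks (RAID 0 style)."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        disk = (offset // chunk_size) % num_disks
        disks[disk].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading chunks back in the same order."""
    total_chunks = sum(len(d) for d in disks)
    return b"".join(disks[n % len(disks)][n // len(disks)]
                    for n in range(total_chunks))


disks = stripe(b"ABCDEFGHIJ", num_disks=2, chunk_size=2)
print(disks)
print(unstripe(disks))
```

Every disk holds part of every large file, so reads and writes proceed in parallel, and losing any one disk leaves the data unrecoverable.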
What is the main benefit of RAID 1?
RAID 1 provides high redundancy and data protection by mirroring data across two or more disks.
How does RAID 5 ensure data protection?
RAID 5 uses striping with distributed parity, allowing recovery from a single disk failure.
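Parity recovery is commonly implemented with XOR: the parity block is the XOR of the data blocks, so any single missing block equals the XOR of the survivors. The sketch below shows this for one stripe with two data blocks; the byte values are arbitrary, and the rotation of parity across disks is omitted.

```python
def xor_blocks(*blocks):
    """XOR equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


d0 = b"\x0f\xf0"              # data block on disk 0
d1 = b"\x33\x55"              # data block on disk 1
parity = xor_blocks(d0, d1)   # parity block on disk 2

# Simulate losing disk 1 and rebuilding its block from the other two.
rebuilt = xor_blocks(d0, parity)
print(rebuilt == d1)
```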
What is the key feature of RAID 6?
RAID 6 supports striping with double parity, allowing it to survive two simultaneous disk failures.
What is RAID 10 and its advantages?
RAID 10 combines striping and mirroring, offering high performance and excellent fault tolerance but at a higher cost due to reduced usable storage.
What is Direct Attached Storage (DAS)?
DAS is a storage device directly connected to a single server or computer, providing high data transfer speeds but limited scalability.
What is Network Attached Storage (NAS)?
NAS is a standalone storage system connected to a network, allowing multiple clients to access files using file-sharing protocols.
What distinguishes a Storage Area Network (SAN) from DAS and NAS?
SAN is a high-speed network providing block-level storage access to multiple servers, designed for environments needing high reliability and massive data throughput.
What are the key characteristics of data that drive infrastructure decisions?
Key characteristics include volume, variety, velocity, value, and veracity.
What type of data does structured data typically include?
Structured data includes information from business applications like HRMS, ERP, and CRM, which is organized in a predictable format.
What is an example of unstructured data?
Unstructured data includes documents like PDFs, images, and media files that do not fit into traditional databases.
What is the significance of data lakes in modern data architecture?
Data lakes store vast amounts of raw data in its native format, allowing for flexible data analysis and processing.
What does the term 'data silos' refer to?
Data silos refer to isolated data repositories that are not easily accessible or integrated with other data systems.
What is the difference between OLTP and data warehouses?
OLTP (Online Transaction Processing) systems are designed for transaction-oriented applications, while data warehouses are optimized for analytical queries and reporting.
What is the role of block-level access in SAN?
Block-level access allows servers to interact with SAN storage as if it were local disks, providing high performance and flexibility.
What are the minimum drive requirements for RAID 5?
RAID 5 requires a minimum of 3 drives.
What is the main disadvantage of RAID 0?
RAID 0 has no fault tolerance; the failure of one disk results in total data loss.
What is the primary use case for RAID 6?
RAID 6 is best used for large-capacity arrays needing extra data protection due to its ability to survive two disk failures.
What is the typical setup for RAID 10?
RAID 10 requires a minimum of 4 disks and combines the benefits of both striping and mirroring.
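The capacity trade-offs between RAID levels can be summarized with a small calculator. This is a sketch under stated assumptions: all disks are the same size, RAID 1 mirrors one disk's worth of data across every member, RAID 10 uses two-way mirrors, and the minimum disk counts for RAID 0, 1, and 6 are the commonly cited values rather than figures from these cards.

```python
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity(level, num_disks, disk_size_tb):
    """Usable capacity in TB for an array of equal-sized disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return num_disks * disk_size_tb          # striping only, no overhead
    if level == "1":
        return disk_size_tb                      # every disk holds the same data
    if level == "5":
        return (num_disks - 1) * disk_size_tb    # one disk's worth of parity
    if level == "6":
        return (num_disks - 2) * disk_size_tb    # two disks' worth of parity
    if num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return num_disks // 2 * disk_size_tb         # half the disks are mirrors


for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity(level, 4, 2)} TB usable from 4 x 2 TB")
```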
What is the primary advantage of NAS over DAS?
NAS allows multiple users to access files over a network, providing better scalability and collaboration.
How does RAID 1 handle disk failures?
RAID 1 can survive the loss of one disk by using the mirrored copy to recover data.
What is the impact of data velocity on infrastructure decisions?
Data velocity refers to the speed at which data is created and needs to be ingested, influencing the choice of storage and processing solutions.
What is a key feature of cloud-based data storage?
Cloud-based storage offers scalability and accessibility from anywhere, often with pay-as-you-go pricing models.
What are the limitations of relational databases for analytics?
Relational databases often struggle to scale for analytics and AI/ML workloads because of their rigid schemas and their difficulty handling unstructured or highly varied data.
What types of data do big data systems need to store?
Big data systems need to store huge volumes of unstructured and semi-structured data.
What are the three main types of data storage?
File Storage, Block Storage, and Object Storage.
How does block storage function?
Data is split into fixed-size blocks that are stored separately, each with a unique identifier; it is typically used for hard disk drives and frequently updated data.
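The block model can be sketched with a dictionary that maps block identifiers to fixed-size chunks. The function names and the 4-byte block size are assumptions for illustration; real block devices use sizes such as 512 bytes or 4 KiB.

```python
def split_into_blocks(data, block_size):
    """Map block IDs to fixed-size chunks of data."""
    return {block_id: data[offset:offset + block_size]
            for block_id, offset in enumerate(range(0, len(data), block_size))}


def read_all(blocks):
    """Reassemble the data by reading blocks in ID order."""
    return b"".join(blocks[block_id] for block_id in sorted(blocks))


blocks = split_into_blocks(b"hello world!", block_size=4)
blocks[1] = b"O WO"   # update one block in place; the others are untouched
print(read_all(blocks))
```

Being able to rewrite a single block without touching the rest is why this model suits frequently updated data.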
What are the use cases for file storage?
File archival and file storage servers, where each piece of data is saved as a single file with a specific file extension.
What distinguishes object storage from file storage?
Object storage divides data into self-contained units called objects, stored in a flat environment without folders, and includes extensive metadata.
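A flat object namespace with attached metadata can be sketched as a single dictionary keyed by object name. The `ObjectStore` class below is a hypothetical in-memory stand-in, not any vendor's API; note that a key such as `photos/cat.jpg` is just a string, and the slash does not create a folder.

```python
class ObjectStore:
    """Toy flat object store: one namespace, each object carries metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = {"data": data, "metadata": metadata}

    def get(self, key):
        return self._objects[key]["data"]

    def head(self, key):
        """Return only the metadata, without the object body."""
        return self._objects[key]["metadata"]

    def list_keys(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ObjectStore()
store.put("photos/cat.jpg", b"<jpeg bytes>", content_type="image/jpeg", owner="ana")
store.put("logs/app.log", b"started", content_type="text/plain")
print(store.list_keys("photos/"))
print(store.head("photos/cat.jpg"))
```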
What are the advantages of object storage?
Object storage is more cost-effective, offers virtually unlimited scalability, and is suitable for high volumes of unstructured data.
What factors should be considered when choosing a purpose-built database?
Application workload, data shape, performance requirements, and operations burden.
What is the primary purpose of a data warehouse?
To bridge the gap between operational data and business intelligence by aggregating and cleaning data for analysis.
What are the challenges faced by data warehouses?
Cost challenges, modernization challenges, scaling challenges, and issues with data freshness.
Who are the typical users of a data lake?
Data scientists and data analysts.
What is the difference between a data mart and a data warehouse?
A data mart is a smaller, application-specific data warehouse tied to a specific team or line of business, whereas a data warehouse aggregates data for the whole organization.
What is the typical data type stored in a data warehouse?
Structured, cleaned data with a known schema, aggregated and inserted in batches.
What is the significance of metadata in object storage?
Metadata provides contextual information about the data, enhancing processing and usability.
What are the typical access methods for object storage?
Objects are accessed and managed through an Application Programming Interface (API).
What are the scalability characteristics of block storage?
Block storage has limited scalability compared to object storage.
What is the main advantage of file storage's hierarchical structure?
It makes it easier to find and manage files through organized folders and subfolders.
What is the purpose of ETL in data warehousing?
ETL (Extract, Transform, Load) is used to aggregate and clean operational data before inserting it into the data warehouse.
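The three ETL stages can be sketched as three small functions. Everything here is illustrative: the order records, field names, and cleaning rules are made up, and the "warehouse" is just a dictionary of totals per region.

```python
def extract():
    """Pull raw rows from a hypothetical operational system."""
    return [
        {"order_id": "1", "amount": " 10.50 ", "region": "eu"},
        {"order_id": "2", "amount": "", "region": "US"},       # missing amount
        {"order_id": "3", "amount": "4.50", "region": "EU"},
        {"order_id": "4", "amount": "7.25", "region": "us"},
    ]


def transform(rows):
    """Clean the rows: drop incomplete ones, fix types, normalize region codes."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(amount),
            "region": row["region"].upper(),
        })
    return cleaned


def load(rows, warehouse):
    """Aggregate cleaned rows into per-region totals in the warehouse."""
    for row in rows:
        warehouse[row["region"]] = warehouse.get(row["region"], 0.0) + row["amount"]
    return warehouse


print(load(transform(extract()), {}))
```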
How does data freshness impact data warehouses?
Data warehouses often struggle with keeping data fresh and current, which can hinder timely decision-making.
What is the typical storage cost comparison between object storage and block storage?
Object storage is generally more cost-effective than block storage.
What is a key characteristic of data lakes regarding data structure?
Data lakes are highly configurable and do not restrict data to a set schema.
What is a common use case for block storage?
Databases and email systems, where data is frequently updated.
What is the impact of geographical distance on block storage performance?
Latency may become an issue if the application and block storage are geographically far apart.
What is the role of data protection in storage solutions?
Data protection is essential to safeguard stored data from breaches and cybersecurity threats.
What are the main challenges associated with data lakes?
Governance challenges, total cost of ownership (TCO) challenges, scaling challenges, and agility challenges.
What is the significance of Azure Blob Storage?
It allows unstructured data to be stored and accessed at a massive scale and supports enterprise big data analytics solutions.
What is the difference between a data lake and a data warehouse?
Data lakes store raw data in its native format, while data warehouses store structured data that has been processed for analysis.
What is Azure Files used for?
It offers fully managed cloud file shares accessible from anywhere and is suitable for applications using native file system APIs.
What does Azure Queues provide?
Asynchronous message queueing between application components to decouple them and facilitate communication.
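The decoupling idea can be shown with Python's standard-library `queue.Queue` standing in for a cloud queue. This is not the Azure SDK; the component names and message shape are hypothetical, and the point is only that the producer never calls the consumer directly.

```python
import queue

message_queue = queue.Queue()  # local stand-in for a managed message queue


def web_frontend(order_ids):
    """Producer: enqueue work and return immediately."""
    for order_id in order_ids:
        message_queue.put({"order_id": order_id})


def background_worker():
    """Consumer: drain whatever messages are waiting, at its own pace."""
    handled = []
    while True:
        try:
            message = message_queue.get_nowait()
        except queue.Empty:
            break
        handled.append(message["order_id"])
    return handled


web_frontend([101, 102])
print(background_worker())
```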
What is the purpose of Azure Elastic SAN?
It simplifies deploying, scaling, managing, and configuring a SAN while providing built-in cloud capabilities like high availability.
What is the role of redundancy in Azure Storage?
Redundancy ensures data protection from planned and unplanned events, maintaining availability and durability targets.
What are the two options for data replication in Azure Storage's primary region?
Locally Redundant Storage (LRS) and Zone-Redundant Storage (ZRS).
What is Locally Redundant Storage (LRS)?
LRS synchronously copies data three times within a single physical location in the primary region.
What is Zone-Redundant Storage (ZRS)?
ZRS synchronously copies data across three Azure availability zones in the primary region for high availability.
What is Azure Container Storage used for?
To dynamically provision persistent volumes for stateful applications running on Kubernetes clusters.
What are the key components of Azure Storage architecture?
Blob containers, file shares, tables, and queues managed through a common Azure resource called a storage account.
What does Azure Tables allow you to do?
Store structured NoSQL data in the cloud with a key/attribute store and a schemaless design.
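A key/attribute store with a schemaless design can be pictured as a dictionary whose values are free-form attribute sets. The sketch below is an in-memory stand-in rather than the Azure Tables client; it borrows the partition key and row key concepts, while the function names and sample entities are assumptions.

```python
table = {}  # (partition_key, row_key) -> attribute dictionary


def upsert_entity(partition_key, row_key, **attributes):
    """Insert or replace an entity; no fixed set of columns is enforced."""
    table[(partition_key, row_key)] = dict(attributes)


def get_entity(partition_key, row_key):
    return table[(partition_key, row_key)]


def query_partition(partition_key):
    """Return all entities that share a partition key."""
    return {row: attrs for (part, row), attrs in table.items()
            if part == partition_key}


# Entities in the same table can carry different attributes.
upsert_entity("customers", "c1", name="Ana", email="ana@example.com")
upsert_entity("customers", "c2", name="Ben", loyalty_points=120)
upsert_entity("orders", "o1", total=42)
print(query_partition("customers"))
```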
What is the purpose of Azure Disks?
To allow data to be persistently stored and accessed from an attached virtual hard disk.
What are the well-architected pillars for Blob storage?
Scalability, durability, availability, optimized for data lakes, comprehensive data management, and security.
What is the importance of data governance in data lakes?
It addresses compliance concerns and security issues that arise from managing large volumes of data.
What is the impact of resource utilization on TCO in data lakes?
Unmanaged resource utilization can lead to high costs and inefficiencies in on-premises data lakes.
What does Azure Storage's comprehensive data management involve?
End-to-end lifecycle management, policy-based access control, and immutable storage.
What challenges arise from data silos in data lakes?
They complicate data access and integration, leading to inefficiencies in data analytics.
What is the role of APIs in data lakes?
APIs facilitate data access and integration with various applications and services.
What does the term 'data marts' refer to?
Subsets of data warehouses that focus on specific business areas or departments.
What is the significance of having multiple clouds in data architecture?
It allows for flexibility and scalability but introduces complexity in governance and data management.
What is the concept of 'self-service' in data lakes?
It allows users to access and analyze data without needing extensive IT support.
What is the challenge of data duplication in data lakes?
It leads to inconsistencies and confusion regarding data definitions and usage.
What does LRS stand for in Azure Storage?
LRS stands for Locally Redundant Storage.
What is the primary benefit of using LRS?
LRS is the lowest-cost redundancy option and protects against server rack and drive failures.
What is a major risk associated with LRS?
LRS does not protect against disasters such as fire or flooding within the data center.
What does Microsoft recommend to mitigate the risks of LRS?
Microsoft recommends using Zone-Redundant Storage (ZRS), Geo-Redundant Storage (GRS), or Geo-Zone-Redundant Storage (GZRS).
In what scenario is LRS a good choice?
LRS is suitable for applications storing data that can be easily reconstructed if lost.
How does ZRS ensure data availability during zone outages?
ZRS allows read and write operations even if one zone becomes unavailable.
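The difference in zone-outage behaviour between LRS and ZRS can be modelled by counting which copies survive. This is a simplified sketch: the zone names are hypothetical, and it treats the single physical location used by LRS as sitting inside one zone.

```python
# One entry per copy, naming the zone that holds it (hypothetical names).
LRS_PLACEMENT = ["zone-1", "zone-1", "zone-1"]   # three copies, one location
ZRS_PLACEMENT = ["zone-1", "zone-2", "zone-3"]   # one copy per availability zone


def surviving_copies(placement, failed_zones):
    """Count the copies that sit outside the failed zones."""
    return sum(1 for zone in placement if zone not in failed_zones)


outage = {"zone-1"}
print("LRS copies left:", surviving_copies(LRS_PLACEMENT, outage))
print("ZRS copies left:", surviving_copies(ZRS_PLACEMENT, outage))
```

With the zone outage shown, the ZRS placement still has copies to serve reads and writes, while the LRS placement has none until the zone recovers.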