Domain 1 - Managing and Optimizing Storage
Amazon Elastic Block Store (EBS)
Definition: Amazon Elastic Block Store (EBS) is a high-performance, block-level storage service designed for use with Amazon EC2 instances, suited to both throughput-intensive and transaction-intensive workloads.
Unlike Object Storage (S3), EBS provides raw block device access.
Availability and Durability:
Both the EBS volume and EC2 instance must reside in the same Availability Zone (AZ).
EBS volumes are automatically replicated within their AZ to protect against hardware failure, offering high durability.
Volume Lifecycle Management:
Volume Creation: A volume can be provisioned empty or from an existing snapshot. The volume status transitions to available once it is ready for attachment.
Attachment: A volume in the in-use state is attached to an instance.
Volume Deletion: Deletion is prohibited while the volume is attached.
Root Volumes: By default, the DeleteOnTermination attribute is set to True for root volumes. For data persistence, it must be manually set to False.
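As a minimal sketch of changing the root-volume flag on an existing instance, the helper below calls the EC2 ModifyInstanceAttribute API through a client object supplied by the caller (for example one created with boto3.client("ec2")). The helper name, the instance ID, and the device name are illustrative assumptions; the actual root device name depends on the AMI.

```python
def keep_root_volume(ec2_client, instance_id, device_name):
    """Set DeleteOnTermination to False for one attached volume.

    ec2_client is expected to behave like a boto3 EC2 client;
    instance_id and device_name are supplied by the caller.
    """
    return ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        BlockDeviceMappings=[
            {
                "DeviceName": device_name,
                # False means the volume survives instance termination.
                "Ebs": {"DeleteOnTermination": False},
            }
        ],
    )
```

With boto3 installed and credentials configured, a call might look like keep_root_volume(boto3.client("ec2"), "i-0123456789abcdef0", "/dev/xvda"), where both identifiers are placeholders.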
Storage Metrics and Performance
IOPS (Input/Output Operations Per Second): A measure of the number of reads/writes a volume can perform per second. Crucial for database workloads.
Queue Length: The number of pending I/O requests. If this value is consistently high relative to IOPS, it indicates a performance bottleneck.
Throughput: Measured in MiB/s, representing the total volume of data moved. High throughput is vital for streaming and big data analysis.
Latency: The time taken for a single I/O unit to complete its round trip. High latency typically signifies that the volume has reached its performance limit.
Burst Balance: Applicable to gp2, st1, and sc1. Volumes earn credits (I/O credits for gp2, throughput credits for st1 and sc1) when operating below their baseline performance. When the workload spikes, the volume spends these credits to "burst" above the baseline. If BurstBalance = 0, the volume is throttled to its baseline performance.
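The credit mechanism can be sketched as a simple bucket model. The figures below (a bucket of 5.4 million I/O credits and a 3,000 IOPS burst ceiling for gp2) are assumptions taken from the published gp2 behavior, not from these notes, and the function names are illustrative.

```python
GP2_BUCKET_CREDITS = 5_400_000  # assumed size of a full gp2 credit bucket
GP2_BURST_IOPS = 3_000          # assumed gp2 burst ceiling

def burst_seconds(baseline_iops, demand_iops, credits=GP2_BUCKET_CREDITS):
    """Seconds a bucket lasts while the workload demands demand_iops."""
    effective = min(demand_iops, GP2_BURST_IOPS)
    # Credits are spent at the effective rate and refilled at the baseline rate.
    drain_rate = effective - baseline_iops
    if drain_rate <= 0:
        return float("inf")  # at or below baseline the bucket never empties
    return credits / drain_rate

def burst_balance_percent(credits_left, bucket=GP2_BUCKET_CREDITS):
    """Remaining credits as a percentage, the unit BurstBalance is reported in."""
    return 100.0 * credits_left / bucket
```

Under these assumptions, a volume with a 300 IOPS baseline that is driven at 3,000 IOPS drains a full bucket in 5,400,000 / 2,700 = 2,000 seconds, after which it falls back to 300 IOPS.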
Detailed Volume Types
Solid State Drives (SSD)
General Purpose SSD (gp2/gp3):
gp2: Performance scales with volume size (3 IOPS per GiB) with a minimum of 100 IOPS and a burst up to 3,000 IOPS.
gp3: Decouples performance from storage size. Provides a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size.
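To make the gp2 size-to-performance relationship concrete, here is a small sketch. It assumes the published gp2 figures (3 IOPS per GiB, a 100 IOPS floor, a 16,000 IOPS ceiling) and the gp3 baseline of 3,000 IOPS; the function and constant names are illustrative.

```python
GP3_BASELINE_IOPS = 3_000  # assumed gp3 baseline, independent of size

def gp2_baseline_iops(size_gib):
    """Baseline IOPS of a gp2 volume: 3 per GiB, clamped to 100..16,000."""
    return min(max(3 * size_gib, 100), 16_000)

def gp2_matches_gp3_baseline(size_gib):
    """True once a gp2 volume's baseline reaches the gp3 baseline."""
    return gp2_baseline_iops(size_gib) >= GP3_BASELINE_IOPS
```

Under these assumptions a 20 GiB gp2 volume sits at the 100 IOPS floor, a 500 GiB volume gets 1,500 IOPS, and a gp2 volume needs about 1,000 GiB before its baseline equals what gp3 provides at any size.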
Provisioned IOPS SSD (io1/io2):
Designed for I/O-intensive database workloads.
io2 Block Express: Offers sub-millisecond latency and up to 256,000 IOPS.
Hard Disk Drives (HDD)
Throughput Optimized HDD (st1): Focused on throughput (up to 500 MiB/s per volume) rather than IOPS. Good for log processing.
Cold HDD (sc1): Lowest cost for infrequently accessed workloads.
Magnetic (Standard): Previous generation, rarely used in modern architectures.
RAID Configurations for EBS
RAID 0 (Striping): Used to increase total IOPS/Throughput by spreading data across multiple volumes. However, loss of one volume results in data loss for the whole set.
RAID 1, 5, 6: Generally redundant for EBS because Amazon already replicates each volume within its Availability Zone. RAID 5 and RAID 6 in particular are discouraged because their parity writes consume part of the IOPS available to the member volumes, which degrades performance on network-attached storage.
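The RAID 0 trade-off can be estimated with simple arithmetic. The sketch below assumes identical member volumes and an optional per-instance EBS limit; the function name and the cap parameters are illustrative assumptions, and real limits depend on the instance type.

```python
def raid0_estimate(count, size_gib, iops, mibps,
                   instance_iops_cap=None, instance_mibps_cap=None):
    """Estimate capacity and performance of a RAID 0 set of identical volumes."""
    total_iops = count * iops
    total_mibps = count * mibps
    # The instance's own EBS limits bound what the stripe can deliver.
    if instance_iops_cap is not None:
        total_iops = min(total_iops, instance_iops_cap)
    if instance_mibps_cap is not None:
        total_mibps = min(total_mibps, instance_mibps_cap)
    return {"size_gib": count * size_gib, "iops": total_iops, "mibps": total_mibps}
```

For example, four 500 GiB volumes at 3,000 IOPS and 125 MiB/s each would give roughly 2,000 GiB, 12,000 IOPS, and 500 MiB/s, but a single failed member still loses the whole set.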
Monitoring and Health Checks
Volume Status Checks: AWS runs automated status checks on each volume every 5 minutes. Separately, EBS sends performance metrics to CloudWatch at 1-minute intervals.
ok: Everything is functioning normally.
impaired: The volume is unavailable or I/O is stalled.
Data Consistency: If AWS detects a potential inconsistency, I/O may be disabled. You must enable the Auto-Enabled IO attribute or acknowledge the inconsistency via the CLI to resume service.
OS Level Monitoring: Use iostat -xdmzt 1 on Linux or Perfmon on Windows to identify "Micro-bursting" (latency spikes shorter than the 1-minute CloudWatch polling interval).
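Why micro-bursts are invisible at 1-minute granularity can be shown with a short sketch. The per-second latency samples here are made-up numbers standing in for values such as the await columns reported by iostat, and the 10 ms threshold is an arbitrary assumption.

```python
from statistics import mean

def one_minute_average(per_second_samples):
    """What a 1-minute aggregate would report for these samples."""
    return mean(per_second_samples)

def find_micro_bursts(per_second_samples, threshold):
    """Indices (seconds) at which a sample exceeds the threshold."""
    return [i for i, value in enumerate(per_second_samples) if value > threshold]

# Hypothetical minute: 55 quiet seconds at 1 ms, then a 5-second spike at 40 ms.
samples_ms = [1.0] * 55 + [40.0] * 5
average_ms = one_minute_average(samples_ms)         # averaged view hides the spike
spike_seconds = find_micro_bursts(samples_ms, 10.0)  # per-second view exposes it
```

The averaged figure stays well under the threshold, while the per-second view flags the last five seconds, which is the reason to sample at the OS level.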
Instance Store (Ephemeral Storage)
Characteristics:
Physically attached to the host server, resulting in very low latency and high IOPS.
Data Volatility: Data is lost if the instance is stopped, hibernated, or terminated, or if the underlying hardware fails. Data persists only across instance reboots.
Use Cases: Temporary files, scratch space, distributed file systems (like HDFS), and swap files.
Modifying Volumes (Elastic Volumes)
Modifications: You can increase size, change volume type (e.g., gp2 to io1), or adjust IOPS/throughput on the fly without downtime.
Cooldown Period: Once a modification starts, you must wait at least 6 hours before modifying the same volume again.
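File System Extension: After the EBS volume is resized in AWS, the OS-level file system (e.g., ext4, xfs, or NTFS) must be extended separately. If the volume is partitioned, grow the partition first (for example with growpart), then extend the file system with resize2fs (ext4) or xfs_growfs (XFS). On Windows, NTFS is extended through Disk Management or diskpart.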
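A small sketch of the cooldown check, useful when scripting repeated resizes. It assumes the 6-hour waiting period documented by AWS and measures it from the start of the previous modification; the names are illustrative.

```python
from datetime import datetime, timedelta, timezone

MODIFICATION_COOLDOWN = timedelta(hours=6)  # assumed Elastic Volumes cooldown

def next_modification_time(last_started):
    """Earliest time the same volume can be modified again."""
    return last_started + MODIFICATION_COOLDOWN

def can_modify(last_started, now):
    """True once the cooldown since the previous modification has elapsed."""
    return now >= next_modification_time(last_started)

# Hypothetical timestamps for illustration.
started = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
too_early = can_modify(started, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
allowed = can_modify(started, datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc))
```

The volume must also be in the in-use or available state before a new modification is accepted, which this sketch does not model.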
Multi-Attach
Available for io1 and io2 volumes on Nitro-based instances.
Allows up to 16 instances in the same Availability Zone to attach the same volume simultaneously.
Requires a cluster-aware file system (e.g., GFS2, OCFS2) to manage write-locking and data integrity.
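As a hedged illustration, the helper below builds the keyword arguments for an EC2 CreateVolume request with Multi-Attach turned on. The parameter names follow the boto3 create_volume call; the helper itself, the Availability Zone, and the sizes are illustrative assumptions.

```python
def multi_attach_volume_params(availability_zone, size_gib, iops, volume_type="io2"):
    """Build create_volume keyword arguments for a Multi-Attach volume."""
    if volume_type not in ("io1", "io2"):
        # Multi-Attach is limited to Provisioned IOPS volumes.
        raise ValueError("Multi-Attach requires an io1 or io2 volume")
    return {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "VolumeType": volume_type,
        "Iops": iops,
        "MultiAttachEnabled": True,
    }
```

The result could be passed to a boto3 EC2 client as ec2.create_volume(**params). Attaching the volume to several instances still requires a cluster-aware file system on top.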
EBS Snapshots and Data Lifecycle
Incremental Nature: Only the blocks changed since the last snapshot are stored, reducing storage costs.
Fast Snapshot Restore (FSR): Eliminates the need for pre-warming (reading all blocks once). Volumes created from an FSR-enabled snapshot are fully initialized and deliver their provisioned performance immediately.
Amazon Data Lifecycle Manager (DLM):
Automates snapshot creation and deletion based on tags.
Supports cross-region copy and cross-account sharing policies.
Recycle Bin: Provides a safety net for snapshots and AMIs. Deleted snapshots are retained in the Recycle Bin for a specified retention period (from 1 day to 1 year) before being permanently purged.
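The incremental behavior of snapshots can be illustrated with a toy model in which a volume is a mapping from block index to content. This is a conceptual sketch only, not how EBS stores data internally, and the function names are illustrative.

```python
def take_snapshot(volume_blocks, previous_state):
    """Store only the blocks that differ from the previously captured state."""
    return {
        index: data
        for index, data in volume_blocks.items()
        if previous_state.get(index) != data
    }

def restore(snapshot_chain):
    """Rebuild the volume by applying snapshots from oldest to newest."""
    state = {}
    for snapshot in snapshot_chain:
        state.update(snapshot)
    return state

volume = {0: "a", 1: "b", 2: "c"}
snap1 = take_snapshot(volume, {})                # first snapshot holds every block
volume[1] = "B"                                  # one block changes
snap2 = take_snapshot(volume, restore([snap1]))  # second snapshot holds one block
```

The second snapshot stores a single block, yet restoring the chain reproduces the full volume, which is why later snapshots cost less to keep.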
Amazon EFS (Elastic File System)
Protocol: Managed NFS (Network File System) for Linux-based workloads.
Storage Tiers:
Standard: For active data.
Infrequent Access (IA): Significantly cheaper; data is moved here automatically by Lifecycle Management if not accessed for a set period (for example, 30 days).
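The lifecycle rule amounts to a comparison between a file's last access date and a configured age. The sketch below assumes a 30-day policy as an example value; the function name and the dates are illustrative.

```python
from datetime import date, timedelta

def tier_for(last_access, today, ia_after_days=30):
    """Return the storage tier a file would be in under the lifecycle policy."""
    idle = today - last_access
    if idle >= timedelta(days=ia_after_days):
        return "IA"
    return "Standard"

# Hypothetical files checked on the same day.
cold_tier = tier_for(date(2024, 1, 1), date(2024, 2, 15))   # idle for 45 days
warm_tier = tier_for(date(2024, 2, 10), date(2024, 2, 15))  # idle for 5 days
```

Here the file idle for 45 days lands in IA while the recently read one stays in Standard.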
Performance Modes:
General Purpose: The default mode, with the lowest per-operation latency; suitable for most workloads.
Max I/O: For massive scale; higher latency but higher aggregate throughput.
Amazon FSx (Specialized File Systems)
FSx for Windows File Server: Fully managed native Windows SMB file system (supports NTFS and Active Directory).
FSx for Lustre: Designed for High-Performance Computing (HPC), machine learning, and video processing. Can process data directly from S3.
FSx for NetApp ONTAP: Provides the full capabilities of the NetApp ONTAP file system in the cloud.