Reliability
Ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues
Test recovery procedures
Use automation to simulate different failures or to recreate scenarios that led to failures before
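A minimal sketch of an automated failure drill, assuming a hypothetical `FlakyService` wrapper that injects failures at a configurable rate and a hypothetical fallback-to-cache recovery procedure. None of these names come from AWS; they only illustrate the idea of simulating failures to check that recovery works.

```python
import random


class FlakyService:
    """Hypothetical service stub that fails on a configurable fraction of calls."""

    def __init__(self, failure_rate, seed=0):
        self.failure_rate = failure_rate
        self._rng = random.Random(seed)  # seeded so drills are repeatable

    def call(self):
        # Inject a failure with probability `failure_rate`.
        if self._rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "ok"


def call_with_fallback(service, fallback="cached"):
    """Recovery procedure under test: serve a fallback value on failure."""
    try:
        return service.call()
    except ConnectionError:
        return fallback


def run_failure_drill(failure_rate, attempts=100):
    """Run the drill and count how many calls succeeded vs. fell back."""
    service = FlakyService(failure_rate)
    results = [call_with_fallback(service) for _ in range(attempts)]
    return {"ok": results.count("ok"), "cached": results.count("cached")}
```

Running the drill with `failure_rate=1.0` forces every call through the recovery path, which is the case a test of recovery procedures cares about most.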
Automatically recover from failure
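Monitor key performance indicators (KPIs) and trigger automation when a threshold is breached; with more sophisticated automation, anticipate and remediate failures before they occur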
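One way to picture automatic recovery is a monitor that triggers a remediation action after a metric stays above a threshold for several samples in a row. The sketch below is illustrative only: `watch`, its parameters, and the sample values are assumptions, and a real setup would typically rely on a monitoring service such as Amazon CloudWatch alarms rather than a hand-written loop.

```python
def watch(samples, threshold, consecutive, remediate):
    """Call remediate() each time `consecutive` samples in a row exceed `threshold`.

    Returns the number of times remediation was triggered.
    """
    breaches = 0
    triggered = 0
    for value in samples:
        # Count consecutive breaches; any healthy sample resets the streak.
        breaches = breaches + 1 if value > threshold else 0
        if breaches >= consecutive:
            remediate()
            triggered += 1
            breaches = 0  # start a fresh streak after remediation
    return triggered


# Usage: error-rate samples, remediation after 3 consecutive breaches of 10%.
actions = []
count = watch(
    samples=[0.01, 0.2, 0.3, 0.4, 0.01, 0.5],
    threshold=0.1,
    consecutive=3,
    remediate=lambda: actions.append("restart"),
)
```

Requiring several consecutive breaches is a design choice that avoids reacting to a single transient spike.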
Scale horizontally to increase aggregate system availability
Distribute requests across multiple, smaller resources to ensure that they don't share a common point of failure
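A rough way to see why horizontal scaling raises aggregate availability: if each resource is available a fraction `single` of the time, failures are independent, and any one resource can serve the load, then the system is down only when all resources are down at once. Those assumptions are simplifications made for this sketch and do not come from the source.

```python
def aggregate_availability(single, count):
    """Availability of `count` redundant resources, each available `single` of the time.

    Assumes independent failures and that any one healthy resource can serve requests.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    # The system is unavailable only if every resource is unavailable together.
    return 1 - (1 - single) ** count


two_nodes = aggregate_availability(0.99, 2)
```

Under these assumptions, two resources at 99% each give roughly 99.99% combined. Resources that share a common point of failure break the independence assumption, which is why the card stresses avoiding one.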
Stop guessing capacity
Monitor demand and utilization, and automatically add or remove resources to maintain the optimal level to satisfy demand without over- or under-provisioning
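As a simplified illustration of matching capacity to demand, the sketch below scales the resource count in proportion to how far a utilization metric is from its target. The function name, parameters, and limits are assumptions for this example, and this is not a description of how AWS Auto Scaling computes capacity internally.

```python
import math


def desired_capacity(current, metric, target, minimum=1, maximum=10):
    """Proportional scaling: size the fleet so the metric lands near the target.

    `current` is the resource count now, `metric` the observed average
    utilization, and `target` the utilization to aim for.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    # Round up so the fleet is never sized below what the target requires.
    wanted = math.ceil(current * metric / target)
    # Clamp to the configured bounds to avoid runaway scaling.
    return max(minimum, min(maximum, wanted))


scale_out = desired_capacity(current=4, metric=80.0, target=50.0)
scale_in = desired_capacity(current=4, metric=20.0, target=50.0)
```

Here 4 resources at 80% utilization against a 50% target scale out to 7, while the same fleet at 20% utilization scales in to 2.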
Manage change in automation
Use automation to make changes to infrastructure
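Making changes through automation usually means describing the desired state and letting a tool work out the steps. The sketch below computes such a change plan from two dictionaries. `plan_changes` and the resource names are hypothetical; infrastructure-as-code tools such as AWS CloudFormation do this in a far more complete way.

```python
def plan_changes(current, desired):
    """Compare current and desired resource definitions and list required changes."""
    create = sorted(desired.keys() - current.keys())
    delete = sorted(current.keys() - desired.keys())
    # A resource present in both needs an update only if its definition differs.
    update = sorted(
        name for name in desired.keys() & current.keys()
        if desired[name] != current[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Hypothetical resource definitions, for illustration only.
current_state = {"web": {"size": "small"}, "old-queue": {"size": "small"}}
desired_state = {"web": {"size": "large"}, "cache": {"size": "small"}}
plan = plan_changes(current_state, desired_state)
```

Because the plan is derived from declared state rather than typed by hand, the same change can be reviewed, repeated, and audited.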
Foundations Apps
IAM, Amazon VPC, Service Quotas, AWS Trusted Advisor
Change Management Apps
AWS Auto Scaling, Amazon CloudWatch, AWS CloudTrail, AWS Config
Failure Management Apps
Backups, AWS CloudFormation, Amazon S3, Amazon S3 Glacier, Amazon Route 53