Flashcards for AWS Well-Architected Framework and related concepts.
Cloud Architects
Engage with decision makers to identify business goals
Ensure alignment between technology deliverables and business goals
Work with delivery teams to ensure technology features are appropriate.
Well-architected systems greatly increase the likelihood of business success
Architecture
Art and science of ________ and ________ large structures
Large systems require architects to manage their ________ and ________
Designing, Building
Size, Complexity
What is the AWS Well-Architected Framework?
A guide for designing infrastructures
Provides a set of foundational questions and best practices that can help evaluate and implement your cloud architectures
It was developed after AWS reviewed thousands of customer architectures on AWS
The framework helps you design infrastructures that are:
Secure
High-performing
Resilient
Efficient
What is the approach that the AWS Well-Architected Framework provides?
Consistent approach to evaluating and implementing cloud architectures
Framework provides best practices that were developed through lessons learned by reviewing customer architectures
Six Pillars of the AWS Well-Architected Framework
Operational Excellence
Security
Reliability
Performance Efficiency
Cost-Optimization
The five pillars above have been part of the framework since 2015
Sustainability
Added in 2021 to help organizations learn how to minimize the environmental impacts of running cloud workloads
Pillar Organization
Name the best practice area:
Question Text:
Question Context:
Best Practice Area: Identity and Access Management
Question Text: Sec 1: How do you manage credentials and authentication?
Question Context:
Credential and authentication mechanisms include passwords, tokens, and keys that grant access directly or indirectly in your workload
Protect credentials with appropriate mechanisms to help reduce the risk of accidental or malicious use
Examples of Best Practices
Define requirements for IAM
Secure AWS account root user
Enforce use of MFA
Automate enforcement of access controls
Integrate with centralized federation provider
Enforce password requirements
Rotate credentials regularly
Audit credentials periodically
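The MFA, rotation, and audit practices above can be checked automatically. A minimal sketch, assuming a hypothetical list of user records and an assumed 90-day rotation window (neither comes from AWS or from these cards):

```python
from datetime import date

# Assumed rotation window for illustration; choose a value that fits your policy.
MAX_KEY_AGE_DAYS = 90


def audit_credentials(users, today):
    """Return (user, issue) pairs for records that break the example rules."""
    findings = []
    for user in users:
        if not user["mfa_enabled"]:
            findings.append((user["name"], "MFA not enabled"))
        age = (today - user["key_created"]).days
        if age > MAX_KEY_AGE_DAYS:
            findings.append((user["name"], f"access key is {age} days old"))
    return findings


# Hypothetical sample data, not real account records.
users = [
    {"name": "alice", "mfa_enabled": True, "key_created": date(2024, 1, 10)},
    {"name": "bob", "mfa_enabled": False, "key_created": date(2023, 6, 1)},
]
findings = audit_credentials(users, today=date(2024, 3, 1))
for name, issue in findings:
    print(f"{name}: {issue}")
```

In a real account the records would come from an IAM credential report rather than a hard-coded list.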
The relationship between the pillars and the best practices
Each pillar INCLUDES a set of design principles and best practice areas
Each best practice area aligns to questions a reviewer should ask when designing an architecture
AnyCompany Background
John Doe founded AnyCompany in 2008
Sells high-quality 3D printed cityscapes of neighborhoods that enable you to see individual buildings and trees
Cityscapes are printed in colors, brickwork, roofs, gardens, and even cars in their correct coloration
AnyCompany wants to apply for private investment to fund their growth until IPO
John and the board want YOU to perform an independent review of the tech platform to make sure that it will pass due diligence
Created an account on AWS and created his first EC2 instance (cloud-native)
Team of 5, all of whom use the technology; the AWS account root user credentials are shared with the whole team
Three main departments of AnyCompany
Fly and Snap
Show and Sell
Make and Ship
Fly and Snap - Major Parts
1) Image Acquisition
2) Preprocessing
3) Storage
Other departments request imagery from Fly and Snap
Show and Sell - Major Parts
1) Promoting
2) Selling
3) Working with customers
Sends orders and requests imagery
Make and Ship - Major Parts
a) Manufacturing of Products and Delivery
Tracks orders from show and sell, requests imagery from Fly and Snap
Fly and Snap - The Process
Multiple devices are mounted on aircraft to capture imagery of major cities
Capture Machine has an external storage array
Connected to the flight system and captures navigation data
Ingest Machine creates a compressed archive of the storage array and sends it to an EC2 Instance Preprocessor machine
Preprocessor machine processes and sets up the archive to be written to the tape (backup)
Extracts all the assets and stores them in an Amazon S3 Bucket
Storage array is cleared and ready for the next flight
Notifies the Imagery Service
Uses the flight information to compute a 3D orientation and location for every moment of the flight
Correlates to the imagery file timestamps
Stored in a RDBMS (relational database management system) based in Amazon EC2, includes links to assets in S3
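One way to picture the timestamp correlation step is linear interpolation between navigation samples. A minimal sketch, assuming navigation data arrives as time-sorted `(time, lat, lon, alt)` tuples; the function name and data layout are hypothetical, and orientation is left out for brevity:

```python
from bisect import bisect_left


def interpolate_position(nav_samples, timestamp):
    """Linearly interpolate (lat, lon, alt) at `timestamp`.

    nav_samples: list of (time_seconds, lat, lon, alt), sorted by time.
    """
    times = [sample[0] for sample in nav_samples]
    if not times[0] <= timestamp <= times[-1]:
        raise ValueError("timestamp outside the recorded flight")
    i = bisect_left(times, timestamp)
    if times[i] == timestamp:
        # Exact match: no interpolation needed.
        return tuple(nav_samples[i][1:])
    t0, *p0 = nav_samples[i - 1]
    t1, *p1 = nav_samples[i]
    weight = (timestamp - t0) / (t1 - t0)
    return tuple(a + weight * (b - a) for a, b in zip(p0, p1))


# Hypothetical navigation samples and image timestamp.
nav = [(0.0, 47.0, -122.0, 1000.0), (10.0, 47.1, -122.2, 1100.0)]
position = interpolate_position(nav, 5.0)
print(position)
```

The result for each image would then be stored alongside the link to the asset in S3.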
Show and Sell - The Process
Elastic Load Balancing and Auto Scaling Group of EC2 instances run a content management system
Static assets are stored in an S3 bucket
Customers are able to select location on a map, and see a preview of the cityscape
Can choose the physical size of the map, color scheme, and option to place LED holes in the map
Mapping service correlates the map location input from the website with the Imagery service to confirm whether imagery is available
Order Service pushes the order to production
Recorded in the Show and Sell database
The database is an RDBMS hosted on Amazon EC2
Places a message on the Production Queue, allowing the Render service to indicate when the preview video is available
Reads from Order Status Queue and records status changes in the database
Customers have the ability to track their own order through manufacturing and see when it has dispatched
Make and Ship - The Process
Render Service is a fleet of large instances
Takes orders from Production queue and generates the 3D models (assets) for S3 Bucket
Uses 3D models to create flyby videos so that customers can preview their orders
Order is placed → Print Queue with a link to the 3D model
Status updates are posted to the Order Status Queue
Presented on AnyCompany website
Print Conductor takes orders from the queue and sends them to the next available printer
Sends updates to the Order Status Queue
Sends final update when the order has been completed
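The queue hand-offs described above can be sketched with in-process queues. This is only an illustration: Python's `queue.Queue` stands in for whatever queueing service AnyCompany uses, and the status names and model link are hypothetical.

```python
from queue import Queue

production_queue = Queue()
print_queue = Queue()
order_status_queue = Queue()


def render_service():
    """Take one order from the Production queue and pass it to the Print queue."""
    order = production_queue.get()
    # Hypothetical location of the generated 3D model.
    model_link = f"s3://example-bucket/models/{order['order_id']}.obj"
    order_status_queue.put((order["order_id"], "RENDERED"))
    print_queue.put({"order_id": order["order_id"], "model_link": model_link})


def print_conductor():
    """Take one job from the Print queue and report progress."""
    job = print_queue.get()
    order_status_queue.put((job["order_id"], "PRINTING"))
    order_status_queue.put((job["order_id"], "COMPLETED"))


production_queue.put({"order_id": "A-100"})
render_service()
print_conductor()

# The Order Service would read these updates and record them in its database.
statuses = []
while not order_status_queue.empty():
    statuses.append(order_status_queue.get())
print(statuses)
```

Because each service only talks to a queue, a slow printer or renderer delays orders instead of failing them.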
Operational Excellence Pillar - Focus
Deliver Business value
Run and monitor systems to deliver business value, and continually improve supporting processes and procedures
Operational Excellence Pillar - Key Topics
1) Automating changes
2) Responding to events
3) Defining standards to manage daily operations
Operational Excellence - Principles
1) Perform operations as code
2) Make frequent small reversible changes
3) Refine operations procedures frequently
4) Anticipate Failure
5) Learn from all operational events and failures
What best practice areas are covered by Operational Excellence?
1) Organization
2) Prepare
3) Operate
4) Evolve
Operations Team
What is their goal and what do they do?
Goal:
Teams must understand business and customer needs so they can effectively and efficiently support business outcomes (prepare and organization)
Job:
Create and use procedures to respond to operational events and validate the effectiveness of procedures to support business needs (operate)
Collect metrics and measure achievements
Important to design operations that evolve in response to changes and incorporate lessons learned through their performance (evolve)
Security Pillar - Focus
Protect and monitor systems
Protect information, systems, and assets while delivering business value
Done through risk assessments and mitigation strategies
Security Pillar - Key Topics
Protecting confidential information and the integrity of data
Identifying and managing who can do what
Protecting systems
Establishing controls to detect security breaches
Security Design Principles
1) Implement a strong identity foundation (such as principle of least privilege)
2) Enable traceability
3) Apply security at all layers
4) Automate security best practices
5) Protect data in transit AND at rest
6) Keep people away from data
7) Prepare for security events
What best practice areas are covered by Security?
1) Security foundations
2) IAM
3) Detection
4) Infrastructure protection
5) Data protection
6) Incident Response
Security practices must be put into place…
CONTROL who can do what
Be able to identify security incidents and protect systems and services
Find a way to maintain confidentiality and integrity of data through data protection
Prevent financial loss and comply with regulatory obligations
Reliability Pillar - Focus
Recover from failure and mitigate disruption
Ensure a workload performs its intended function correctly and consistently when it’s expected to
Recovers from failures to meet business and customer demand
Reliability Pillar - Key Topics
Designing Distributed systems
Recovery planning
Handling change
Reliability Principles
1) Automatically recover from failure
2) Test recovery procedures
3) Scale horizontally to increase aggregate workload availability
4) Stop guessing capacity
5) Manage change in automation
What best practice areas are covered by Reliability?
1) Foundations
2) Workload architecture
3) Change management
4) Failure management
What does a system need to achieve Reliability?
System must have both a well-planned foundation and monitoring in place (foundation and workload architecture)
Must have mechanisms for handling changes in demand or requirements (change management)
System should be designed to detect failure and automatically heal itself (failure management)
Performance Efficiency Pillar - Focus
Use resources sparingly
Use IT and computing resources sparingly and efficiently to meet system requirements
Done to MAINTAIN efficiency as demand changes and technologies evolve
Performance Efficiency Pillar - Key Topics
Selecting the right resource types and sizes based on workload requirements
Monitoring performance
Making informed decisions to maintain efficiency as business needs evolve
Performance Efficiency - Principles
1) Democratize advanced technologies
2) Go global in minutes
3) Use serverless architectures
4) Experiment more often
5) Consider mechanical sympathy
What best practice areas are covered by Performance Efficiency?
1) Selection
2) Review
3) Monitoring
4) Trade Offs
How do I create a high-performance architecture for Performance Efficiency?
Gather data on all aspects of the architecture (selection)
Review periodically to ensure that you are taking advantage of AWS services (review)
Monitor the architecture so that you become aware of any issues and can take prompt action to remediate them (monitoring)
Use tradeoffs in your architecture to improve performance (trade offs)
Cost Optimization Pillar - Focus
Focus is to eliminate Unneeded expense
Avoid Unnecessary costs
Cost Optimization Pillar - Key Topics
Understanding and controlling where money is being spent
Selecting the most appropriate and right number of resource types
Analyzing spend over time
Scaling to meet business needs without overspending
Cost Optimization Design - Principles
1) Implement Cloud Financial Management
2) Adopt a consumption model (pay only for what you require)
3) Measure overall efficiency
4) Stop spending money on undifferentiated heavy lifting
5) Analyze and attribute expenditure
What best practice areas are covered by Cost-Optimization?
1) Practice cloud financial management
2) Expenditure and usage awareness
3) Cost-effective resources
4) Manage demand and supply resources
5) Optimize over time
AWS Well-Architected Tool
A service that helps assess workloads against best practices and provides recommendations for improvement
Reviews the state of your workloads and compares them to the latest AWS best practices
Offering step-by-step guidance on building better workloads for the cloud
Provides a consistent approach for you to review and measure your cloud architectures
Werner Vogels
“Everything fails, all the time”
What are the plans for dealing with failure?
Architect the applications and workloads to withstand failure
Cloud architects consider 2 important factors when designing architectures to withstand failure:
1) Reliability
2) Availability
Reliability (in context of failure)
Measure of your system’s ability to provide functionality when desired by the user
Includes all system components: Hardware, Firmware, and Software
Most importantly -
Probability that your entire system will function as intended for a specific period of time
MTBF
Mean Time Between Failure
Total time in service over number of failures
The equation is MTTF + MTTR.
MTTF
Mean Time To Failure
Length of time the application is available
MTTR
Mean Time To Repair
Length of time it takes to repair the application
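The MTBF, MTTF, and MTTR cards above fit together as simple arithmetic. A small sketch with hypothetical figures (the function names are illustrative):

```python
def mtbf(mttf_hours, mttr_hours):
    """MTBF as the sum of mean time to failure and mean time to repair."""
    return mttf_hours + mttr_hours


def mtbf_from_history(total_hours_in_service, failures):
    """MTBF as total time in service divided by the number of failures."""
    return total_hours_in_service / failures


# Hypothetical application: runs 298 hours on average, then takes 2 hours to repair.
print(mtbf(298, 2))
# Hypothetical history: 3 failures over 900 hours in service.
print(mtbf_from_history(900, 3))
```

Both views give the same MTBF here, which is what you would expect when the history is consistent with the averages.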
Availability
Percentage of time that a system is operating normally or correctly performing expected operations
Normal operation time / total time
Value is reduced anytime the application IS NOT operating normally
This includes both scheduled and unscheduled interruptions
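The availability ratio above can be worked through with a short example. A minimal sketch, assuming a hypothetical 30-day month (720 hours) with made-up interruption figures:

```python
def availability(total_hours, scheduled_down_hours, unscheduled_down_hours):
    """Normal operation time divided by total time."""
    normal_hours = total_hours - scheduled_down_hours - unscheduled_down_hours
    return normal_hours / total_hours


# Hypothetical month: 2 hours of planned maintenance, 1.6 hours of outages.
monthly = availability(720, 2.0, 1.6)
print(f"{monthly * 100:.2f}%")
```

Scheduled maintenance lowers the figure just as an outage does, which is why both are counted.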
Percentage of Uptime
Length of time a system is online between failures OVER a period of time (such as one year)
Number of 9s
A shorthand way to refer to availability
Example: Five 9s means 99.999 percent availability
High Availability
Software that can withstand some measure of degradation while still remaining available
Shared resources that cooperate to guarantee essential services
Downtime is minimized as much as possible
This is called rapid restoration (often in less than 1 minute)
As part of this, minimal human intervention is required
99%
Max Disruption (per year): 3 days 15 hours
Examples of Application Category: Batch processing; data extraction, transfer, and load jobs
99.9%
Max Disruption (per year): 8 hours 45 minutes
Examples of Application Category: Internal tools like knowledge management, project tracking
99.95%
Max Disruption (per year): 4 hours 22 minutes
Examples of Application Category: Online commerce, point of sale
99.99%
Max Disruption (per year): 52 minutes
Examples of Application Category: Video delivery, broadcast systems
99.999%
Max Disruption (per year): 5 minutes
Examples of Application Category: ATM transactions, telecommunications systems
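The maximum disruption figures on the cards above can be reproduced from the availability percentage. A minimal sketch, assuming a 365-day year and truncating to whole minutes:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def max_disruption_minutes(availability_percent):
    """Downtime allowed per year, in minutes, for a given availability."""
    return (100 - availability_percent) * MINUTES_PER_YEAR / 100


def describe(minutes):
    """Format minutes as days, hours, and whole minutes."""
    days, remainder = divmod(int(minutes), 24 * 60)
    hours, mins = divmod(remainder, 60)
    return f"{days}d {hours}h {mins}m"


for pct in (99, 99.9, 99.95, 99.99, 99.999):
    print(f"{pct}% -> {describe(max_disruption_minutes(pct))}")
```

Each extra 9 cuts the allowed downtime by a factor of ten, which is why higher tiers cost so much more to achieve.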
What are the 3 factors that influence availability?
1) Fault Tolerance
2) Recoverability
3) Scalability
Fault Tolerance
Built-in redundancy of an application’s components and its ability to remain operational even if some of its components fail
Can rely on specialized hardware to detect a failure and switch to the redundant (backup) component
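The effect of redundancy on availability can be estimated with a standard probability formula. A minimal sketch, assuming identical components that fail independently (a simplification that real systems rarely meet exactly):

```python
def redundant_availability(component_availability, copies):
    """Availability when at least one of `copies` independent components must work."""
    return 1 - (1 - component_availability) ** copies


# Hypothetical component that is available 99% of the time.
for copies in (1, 2, 3):
    print(copies, f"{redundant_availability(0.99, copies):.6f}")
```

Under these assumptions, adding one redundant copy of a 99 percent component already reaches roughly four 9s.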
Recoverability
Process, policies, and procedures that are related to restoring service after a catastrophic event (such as a failure)
Scalability
Ability of an application to accommodate increases in capacity needs while remaining available, without changing its design.
AWS Trusted Advisor - What is it?
Online tool that provides real-time guidance to help you provision your resources following AWS best practices
Looks at your entire AWS environment and gives real-time recommendations in 5 different categories
What are the 5 major categories of the AWS Trusted Advisor?
1) Cost Optimization
Helps save money by eliminating unused and idle resources or by making commitments to reserved capacity
2) Performance
Improves performance of services by checking service limits, ensuring that you take advantage of provisioned throughput
3) Security
Closes gaps, enables various AWS security features
4) Fault Tolerance
Increases availability and redundancy of AWS applications by taking advantage of automatic scaling, health checks, Multi-AZ deployments and backup capabilities
5) Service Limits
Checks for service usage that is more than 80 percent of the service limit
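The 80 percent rule above amounts to a simple threshold comparison. A minimal sketch over hypothetical usage figures; it does not call Trusted Advisor, and the limit names and numbers are made up for illustration:

```python
THRESHOLD = 0.80  # flag usage that is more than 80 percent of the limit


def over_threshold(usage):
    """usage maps a limit name to (current_usage, limit); return flagged names."""
    return sorted(
        name for name, (used, limit) in usage.items() if used / limit > THRESHOLD
    )


# Hypothetical usage data.
usage = {
    "EC2 On-Demand instances": (17, 20),
    "VPCs": (3, 5),
    "Elastic IPs": (4, 5),
}
flagged = over_threshold(usage)
print(flagged)
```

Usage at exactly 80 percent is not flagged here, because the card says "more than" 80 percent.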