Control: AWS > Well-Architected Tool > AWS Well-Architected Framework > Reliability
The Reliability pillar covers the ability of a workload to perform its intended function correctly and consistently when it’s expected to, including the ability to operate and test the workload through its total lifecycle. See the Reliability pillar of the AWS Well-Architected Framework for more information.
Primary Policies
The following policies can be used to configure this control:
- Reliability
- Reliability > REL 01. How do you manage service quotas and constraints?
- Reliability > REL 01. How do you manage service quotas and constraints? > Automate quota management
- Reliability > REL 01. How do you manage service quotas and constraints? > Accommodate fixed service quotas and constraints through architecture
- Reliability > REL 01. How do you manage service quotas and constraints? > Aware of service quotas and constraints
- Reliability > REL 01. How do you manage service quotas and constraints? > Manage service quotas across accounts and regions
- Reliability > REL 01. How do you manage service quotas and constraints? > Monitor and manage quotas
- Reliability > REL 01. How do you manage service quotas and constraints? > Ensure that a sufficient gap exists between the current quotas and the maximum usage to accommodate failover
- Reliability > REL 02. How do you plan your network topology?
- Reliability > REL 02. How do you plan your network topology? > Provision redundant connectivity between private networks in the cloud and on-premises environments
- Reliability > REL 02. How do you plan your network topology? > Use highly available network connectivity for your workload public endpoints
- Reliability > REL 02. How do you plan your network topology? > Ensure IP subnet allocation accounts for expansion and availability
- Reliability > REL 02. How do you plan your network topology? > Enforce non-overlapping private IP address ranges in all private address spaces where they are connected
- Reliability > REL 02. How do you plan your network topology? > Prefer hub-and-spoke topologies over many-to-many mesh
- Reliability > REL 03. How do you design your workload service architecture?
- Reliability > REL 03. How do you design your workload service architecture? > Provide service contracts per API
- Reliability > REL 03. How do you design your workload service architecture? > Build services focused on specific business domains and functionality
- Reliability > REL 03. How do you design your workload service architecture? > Choose how to segment your workload
- Reliability > REL 04. How do you design interactions in a distributed system to prevent failures?
- Reliability > REL 04. How do you design interactions in a distributed system to prevent failures? > Do constant work
- Reliability > REL 04. How do you design interactions in a distributed system to prevent failures? > Make all responses idempotent
- Reliability > REL 04. How do you design interactions in a distributed system to prevent failures? > Identify which kind of distributed system is required
- Reliability > REL 04. How do you design interactions in a distributed system to prevent failures? > Implement loosely coupled dependencies
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures?
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures? > Set client timeouts
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures? > Implement emergency levers
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures? > Fail fast and limit queues
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures? > Make services stateless where possible
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures? > Implement graceful degradation to transform applicable hard dependencies into soft dependencies
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures? > Control and limit retry calls
- Reliability > REL 05. How do you design interactions in a distributed system to mitigate or withstand failures? > Throttle requests
- Reliability > REL 06. How do you monitor workload resources?
- Reliability > REL 06. How do you monitor workload resources? > Automate responses (Real-time processing and alarming)
- Reliability > REL 06. How do you monitor workload resources? > Monitor end-to-end tracing of requests through your system
- Reliability > REL 06. How do you monitor workload resources? > Monitor all components for the workload (Generation)
- Reliability > REL 06. How do you monitor workload resources? > Define and calculate metrics (Aggregation)
- Reliability > REL 06. How do you monitor workload resources? > Send notifications (Real-time processing and alarming)
- Reliability > REL 06. How do you monitor workload resources? > Conduct reviews regularly
- Reliability > REL 06. How do you monitor workload resources? > Storage and Analytics
- Reliability > REL 07. How do you design your workload to adapt to changes in demand?
- Reliability > REL 07. How do you design your workload to adapt to changes in demand? > Use automation when obtaining or scaling resources
- Reliability > REL 07. How do you design your workload to adapt to changes in demand? > Load test your workload
- Reliability > REL 07. How do you design your workload to adapt to changes in demand? > Obtain resources upon detection that more resources are needed for a workload
- Reliability > REL 07. How do you design your workload to adapt to changes in demand? > Obtain resources upon detection of impairment to a workload
- Reliability > REL 08. How do you implement change?
- Reliability > REL 08. How do you implement change? > Deploy changes with automation
- Reliability > REL 08. How do you implement change? > Integrate functional testing as part of your deployment
- Reliability > REL 08. How do you implement change? > Deploy using immutable infrastructure
- Reliability > REL 08. How do you implement change? > Use runbooks for standard activities such as deployment
- Reliability > REL 08. How do you implement change? > Integrate resiliency testing as part of your deployment
- Reliability > REL 09. How do you back up data?
- Reliability > REL 09. How do you back up data? > Perform data backup automatically
- Reliability > REL 09. How do you back up data? > Identify and back up all data that needs to be backed up, or reproduce the data from sources
- Reliability > REL 09. How do you back up data? > Perform periodic recovery of the data to verify backup integrity and processes
- Reliability > REL 09. How do you back up data? > Secure and encrypt backups
- Reliability > REL 10. How do you use fault isolation to protect your workload?
- Reliability > REL 10. How do you use fault isolation to protect your workload? > Deploy the workload to multiple locations
- Reliability > REL 10. How do you use fault isolation to protect your workload? > Automate recovery for components constrained to a single location
- Reliability > REL 10. How do you use fault isolation to protect your workload? > Use bulkhead architectures
- Reliability > REL 11. How do you design your workload to withstand component failures?
- Reliability > REL 11. How do you design your workload to withstand component failures? > Automate healing on all layers
- Reliability > REL 11. How do you design your workload to withstand component failures? > Fail over to healthy resources
- Reliability > REL 11. How do you design your workload to withstand component failures? > Monitor all components of the workload to detect failures
- Reliability > REL 11. How do you design your workload to withstand component failures? > Send notifications when events impact availability
- Reliability > REL 11. How do you design your workload to withstand component failures? > Use static stability to prevent bimodal behavior
- Reliability > REL 12. How do you test reliability?
- Reliability > REL 12. How do you test reliability? > Test resiliency using chaos engineering
- Reliability > REL 12. How do you test reliability? > Conduct game days regularly
- Reliability > REL 12. How do you test reliability? > Use playbooks to investigate failures
- Reliability > REL 12. How do you test reliability? > Perform post-incident analysis
- Reliability > REL 12. How do you test reliability? > Test functional requirements
- Reliability > REL 12. How do you test reliability? > Test scaling and performance requirements
- Reliability > REL 13. How do you plan for disaster recovery (DR)?
- Reliability > REL 13. How do you plan for disaster recovery (DR)? > Automate recovery
- Reliability > REL 13. How do you plan for disaster recovery (DR)? > Manage configuration drift at the DR site or region
- Reliability > REL 13. How do you plan for disaster recovery (DR)? > Use defined recovery strategies to meet the recovery objectives
- Reliability > REL 13. How do you plan for disaster recovery (DR)? > Test disaster recovery implementation to validate the implementation
- Reliability > REL 13. How do you plan for disaster recovery (DR)? > Define recovery objectives for downtime and data loss
Category
In Your Workspace
- Get Controls
Developers
- Control Type URI: tmod:@turbot/aws-wellarchitected-framework#/control/types/rel
- Category URI: tmod:@turbot/turbot#/control/categories/other
- CLI: turbot graphql controls --filter "controlTypeId:tmod:@turbot/aws-wellarchitected-framework#/control/types/rel"
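The CLI command listed on this page filters controls by the control type URI. As a minimal sketch, the same command can be assembled from a variable so the URI is written only once. The variable names CONTROL_TYPE_URI and FILTER are illustrative and not part of the product, and the snippet assumes the turbot CLI is installed and already configured for your workspace before the last line is uncommented.

```shell
# Control type URI copied from the Developers section of this page.
CONTROL_TYPE_URI='tmod:@turbot/aws-wellarchitected-framework#/control/types/rel'

# Build the filter expression used by the CLI command on this page.
# (FILTER is a hypothetical variable name chosen for this sketch.)
FILTER="controlTypeId:${CONTROL_TYPE_URI}"

# Print the full command for review before running it.
echo "turbot graphql controls --filter \"${FILTER}\""

# Uncomment to run against your workspace
# (assumes the turbot CLI is installed and credentials are set up):
# turbot graphql controls --filter "${FILTER}"
```

Printing the command first makes it easy to confirm the quoting around the filter, since the URI contains `#` and `@` characters that need to stay inside the quotes.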