AWS outage: building operational resilience to withstand cloud disruption

Updated as of: 22 October 2025

A recent outage at AWS grounded online platforms globally, from banking and messaging apps to UK HMRC. With cloud dependency rising and regulations tightening, prioritising operational resilience is no longer optional.

Shutterstock.com/amgun

An outage at the world’s largest cloud computing service, Amazon Web Services (AWS), on 20 October 2025 caused widespread disruption across platforms, from banking apps and communication channels such as Signal and Zoom to Amazon’s own suite of digital services.

More than 4 million companies worldwide rely on AWS cloud services, and thousands were directly affected by the outage.

Shortly after 8am British Summer Time (BST), Amazon confirmed “increased error rates and latencies” for AWS services on the US East Coast. The disruption then rippled outwards, hitting services across the globe.

AWS says the outage was caused by an error with its DynamoDB database service, a system for managing database tables and indexes.

The incident underscores a major weakness in the cloud computing space: a lack of diversification. The infrastructure underpinning the global economy, including critical service providers, is dependent on just a handful of dominant companies: AWS, Microsoft and Google.

The fallout from the CrowdStrike incident of July 2024, which affected services hosted by Microsoft, was even more widespread and is estimated to have cost US Fortune 500 companies at least US$5.4 billion.

Lexology PRO considers the increasing urgency for companies to prioritise operational resilience in the event of a cloud service provider incident, and measures for achieving this. 

Operational resilience in the spotlight

Major companies are dependent on third- and fourth-party technology providers, while the expectation that critical business services will be maintained, or swiftly resumed, during a major disruption is mounting.

Companies in highly regulated industries, including financial services, healthcare and energy, face the greatest pressure to prioritise risk management and operational resilience. 

These expectations are backed by regulatory requirements, including the Bank of England’s Critical Third-Party framework, the EU Digital Operational Resilience Act 2022 (DORA 2022) and the Canadian OSFI’s Operational Resilience and Operational Risk Management framework.

The UK Financial Conduct Authority (FCA) outlined key lessons for financial institutions (FIs) to boost operational resilience following the disruption caused by the CrowdStrike outage, requiring companies to implement changes such as mapping third- and nth-party relationships before 31 March 2025.

Banking services, including Lloyds Bank, Bank of Scotland and Halifax, were affected by the latest outage at AWS, demonstrating what can go wrong in financial services when vendors suffer outages.

The incident has also raised questions around security, particularly whether countries outside the US should be so dependent on US tech. “Europe’s dependency on monopoly cloud companies like Amazon is a security vulnerability and an economic threat we can’t ignore,” Cori Crider, executive director of the Future of Technology Institute, told the media.

Tips for building operational resilience

Part of building operational resilience is accepting that disruption is inevitable. Companies, therefore, must plan accordingly, mapping third-party relationships and developing a robust testing programme, among other measures, to minimise disruption when IT outages do occur.

Expert insights

“It’s essential to ensure that business continuity and disaster recovery plans are thoroughly documented and tested, so everyone knows how to respond in a crisis. Key to building operational resilience is avoiding a single point of failure. Make use of other availability zones and regions – the AWS incident occurred in the US-East-1 region, and half the internet went down because everyone uses the same region. Meanwhile, other regions in the US and EU were still up and running.

“Adopting a multi-cloud strategy, while expensive, could save a global business from catastrophe. This should ideally involve stateless architecture patterns, so apps can restart in new environments with minimal overhead,” says security expert Eyitemi Egbejule.
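To illustrate the point about not relying on a single region, the following is a minimal Python sketch of a client that falls back to another region when its preferred one is failing. It is an illustration under stated assumptions, not a production pattern: `call_with_failover`, `fake_request` and the region list are hypothetical, and `fake_request` merely simulates an outage rather than calling any real cloud API.

```python
from typing import Callable, Dict, Sequence, Tuple


class AllRegionsFailed(Exception):
    """Raised when no region in the preference list could serve the request."""


def call_with_failover(
    regions: Sequence[str], request: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each region in order; return (region, response) from the first success."""
    errors: Dict[str, str] = {}
    for region in regions:
        try:
            return region, request(region)
        except ConnectionError as exc:
            # Record the failure and move on to the next region in the list.
            errors[region] = str(exc)
    raise AllRegionsFailed(errors)


def fake_request(region: str) -> str:
    # Hypothetical stand-in for a real service call: one region is "down".
    if region == "us-east-1":
        raise ConnectionError("increased error rates and latencies")
    return f"ok from {region}"


served_by, response = call_with_failover(
    ["us-east-1", "us-west-2", "eu-west-1"], fake_request
)
print(served_by, response)
```

A fallback like this only helps if the application keeps no state tied to the failed region, which is why the stateless patterns mentioned in the quote matter.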

Assess concentration risk

Concentration risk refers to an organisation’s level of dependency on a single third-party provider: the more dependency is concentrated, the greater the risk of adverse impacts if something goes wrong. 

To mitigate this, it’s vital that organisations have a clear understanding of the dependencies between their business processes, ICT platforms, software, and third-party relationships.

Once third-party relationships have been mapped, companies can identify the level of concentration of critical services with a single external provider, ascertaining where to focus when managing concentration risk.

If dependency concentration is such that it presents an intolerable risk, companies should consider feasible alternatives.
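Once relationships are mapped, the concentration check itself can be very simple. The sketch below, in Python, computes each provider’s share of critical services and flags any provider above a chosen threshold. The service map, the 50% threshold and the function names are all hypothetical; a real assessment would also weigh how critical each service is rather than just counting them.

```python
from collections import Counter

# Hypothetical map of critical business services to the provider each depends on.
critical_services = {
    "payments": "AWS",
    "mobile app": "AWS",
    "customer messaging": "AWS",
    "document storage": "Microsoft",
    "analytics": "Google",
}


def concentration_by_provider(service_map):
    """Return each provider's share of the mapped critical services (0.0 to 1.0)."""
    counts = Counter(service_map.values())
    total = len(service_map)
    return {provider: count / total for provider, count in counts.items()}


def providers_over_threshold(service_map, threshold):
    """List providers whose share of critical services exceeds the threshold."""
    shares = concentration_by_provider(service_map)
    return sorted(p for p, share in shares.items() if share > threshold)


shares = concentration_by_provider(critical_services)
flagged = providers_over_threshold(critical_services, threshold=0.5)
print(shares)
print(flagged)
```

In this made-up example, three of the five critical services sit with one provider, so that provider is flagged for review.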

Review impact tolerances

Companies should set impact tolerances for each of their core business services and review these regularly. This means defining the maximum tolerable level of disruption to important business services, which will inform risk and crisis management planning. 

Companies will need to identify and map important services, then gather baseline data and assess how disruption could impact end users of these services to propose appropriate impact tolerances. 

In regulated industries, setting impact tolerances may be mandatory for compliance. 
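As a rough illustration of how an impact tolerance can be recorded and checked, the Python sketch below expresses each tolerance as a maximum outage in minutes and compares it against observed downtime. The services, figures and names (`ImpactTolerance`, `breaches`) are hypothetical, and duration is only one possible measure; tolerances may also be set in terms of customers affected or transactions lost.

```python
from dataclasses import dataclass


@dataclass
class ImpactTolerance:
    service: str
    max_outage_minutes: int  # hypothetical maximum tolerable disruption


def breaches(tolerances, observed_outage_minutes):
    """Return the services whose observed outage exceeded their tolerance."""
    return [
        t.service
        for t in tolerances
        if observed_outage_minutes.get(t.service, 0) > t.max_outage_minutes
    ]


tolerances = [
    ImpactTolerance("payments", 60),
    ImpactTolerance("customer messaging", 240),
]
# Hypothetical downtime recorded during an incident, in minutes.
observed = {"payments": 180, "customer messaging": 180}

breached = breaches(tolerances, observed)
print(breached)
```

Here the same three-hour outage breaches the tolerance for payments but not for customer messaging, which shows why tolerances need to be set per service.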

Establish a comprehensive testing programme

Companies should have a comprehensive testing programme in place to ensure the effectiveness of their operational resilience measures. Testing should also help identify any legacy or emerging resilience gaps that need addressing. 

Testing will involve identifying severe but plausible scenarios across an appropriate range of adverse circumstances, varying in nature, severity, and duration, tailored to the company’s specific risks and vulnerabilities. Companies should consider including vendors in testing to better understand their capability to remain within impact tolerances. 
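Scenario testing of this kind can be prototyped on paper or in a few lines of code before it is run as a live exercise. The Python sketch below sweeps a range of outage durations for a provider and reports which services would fall outside their impact tolerance. Every value and name is an assumption made for illustration: the services, the tolerances, the `failover_min` field (time to switch to an alternative, or `None` if there is none) and the simplification that downtime ends as soon as failover completes.

```python
from itertools import product

# Hypothetical services: provider, impact tolerance and time to fail over (minutes).
services = {
    "payments": {"provider": "AWS", "tolerance_min": 60, "failover_min": 15},
    "reporting": {"provider": "AWS", "tolerance_min": 480, "failover_min": None},
}
providers = ["AWS"]
durations = [30, 240, 1440]  # outage lengths to test, in minutes


def downtime(service, outage_min):
    """Downtime suffered: capped by failover time if a fallback exists."""
    failover = service["failover_min"]
    return outage_min if failover is None else min(outage_min, failover)


def run_scenarios(services, providers, durations):
    """Map each (provider, duration) scenario to the services breaching tolerance."""
    results = {}
    for provider, duration in product(providers, durations):
        results[(provider, duration)] = sorted(
            name
            for name, s in services.items()
            if s["provider"] == provider
            and downtime(s, duration) > s["tolerance_min"]
        )
    return results


results = run_scenarios(services, providers, durations)
for scenario, breached in results.items():
    print(scenario, breached)
```

In this example only the day-long outage pushes a service outside tolerance, and it is the one without a fallback, which is the kind of gap a testing programme is meant to surface.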

Develop a crisis plan

A robust crisis management plan is essential to ensure companies can respond swiftly and effectively following a third-party IT outage. 

Crisis plans should set out contingencies for every aspect of the business and a list of key people and their designated responsibilities.

There is no “one-size-fits-all” approach in a crisis. Companies need to consider the various possible outcomes and build enough flexibility into their plans to be able to adapt as new information and developments arise. 

The plan should also identify the key internal and external stakeholders who need to be informed in the event of a crisis and designate responsibility for communication.

Create backups

Best practice dictates that companies should have backup sites situated in different physical locations, to help ensure IT systems and data can be restored with minimal downtime, disruption and loss.

Backups should be capable of ensuring the continuity of important business functions, comparable to the primary site, and be securely protected from any unauthorised access or other privacy breaches. 

Under certain legislation, including EU DORA 2022, regulated companies must comply with mandatory minimum requirements for backups to ensure a baseline of operational resilience. 
