RISK & RESILIENCY

Uptime Institute data shows outages are common, costly, and preventable
FOCUS | JUNE 2018
Andy Lawrence, executive director of Research, Uptime Institute

New Uptime Institute research data shows that downtime is common and may
even be increasing, despite many advances. Complexity and the extensive use of
third parties have made life more difficult for management.
Critical systems and data centers are immeasurably more reliable than they were two or three
decades ago. In most cases, problems are identified and resolved before users and customers
notice. In the new cloud world, distributed architectures, traffic management, and low-cost
replication mean that IT can re-route many failures.

All of this appears to represent significant progress. But new research by Uptime Institute
suggests that the picture is far from simple, with failures and downtime still common and possibly
even increasing. When problems do occur, recovery times can be lengthy, fault diagnosis can be
complicated by interlocking or interdependent systems, and costs can be significant because of
ever-greater reliance on IT in every area of society and business.

Uptime Institute also found that there has been little research into the causes and costs of
downtime in IT, both at a macro level (largely due to a reluctance by organizations to report failures
externally) and at a micro level. This finding suggests that significant investments may be being
made without real assessments of the risks and that some of these investments are misplaced.
Moreover, many organizations clearly have management problems, even when diligent efforts have
been made. Most failures, the new research finds, could have been prevented.

KEY FINDINGS

• IT service and data center outages around the world are not only common, suggesting that most SLAs are very
often broken, but may actually be increasing.

• The biggest cause of IT service outages is data center power failure, closely followed by network problems
and then by IT system failures.

• Failures at third-party cloud, colocation, and hosting providers, when aggregated, are now the second most
commonly cited reason for IT service failure.

• Power failures accounted for 36% of the biggest, global public service outages tracked by Uptime Institute
since January 2016.
• 41 survey respondents reported an outage that had cost over $1 million. One outage cost over $50 million.

• Around a third of all reported outages cost more than $250,000.

• Most data center managers think that extending the life of a data center increases both risks and operating costs.

• Most IT service outages are preventable. 80% of respondents say that their most recent service outage
could have been prevented.

• Many organizations have little understanding of the likely financial and overall business impact of particular
IT service failures, nor have they carefully assessed the particular risks they face.

Uptime Data
Uptime Institute has three sources of data on the causes of outages or incidents that can potentially
lead to outages. These are:

• The Abnormal Incident Report (AIRs) database. This is a confidential system for Uptime
Institute Network members to report incidents under NDA. No information from this database
is included in this analysis.

• Uptime Institute Research has collected data relating to over 100 major publicly recorded
service outages since January 2016.

• The Uptime Institute 2018 Data Center Survey. The eighth annual Uptime Institute Industry
Data Center Survey provides an overview of the major trends shaping IT infrastructure
delivery and strategy. The survey was conducted via email between February and May 2018,
and includes responses from 1,104 data center operators and IT practitioners globally, from
enterprise and service provider facilities. For the first time, this survey included many detailed
questions about outages.

This is the most comprehensive data set that has ever been collected on outages at an industry-
wide level. Although each of these research methods and the types of information they collect are
different, they combine to create a complex picture that suggests that outages are far from rare
and continue to be a major problem for operators. And as the publicly reported outages show, the
consequences of failure can be more expensive and damaging than ever.

Outages are neither rare nor declining

Uptime Institute’s survey found that almost one third (31%) of those responding (n=664)
had experienced an IT downtime incident or severe degradation of service in the past year.
Moreover, about half (48%) said they had experienced at least one outage in the past three
years either at one of their own sites or that of a service provider.

This is a very high number and is entirely at odds with most publicly announced availability
figures (usually > 99.9%). The result suggests that service level agreements (SLAs) are
commonly broken (although they are usually phrased so that the operator pays only a nominal
penalty). It is also out of line with the number of outages reported by Uptime Institute Network
members, who very often report sustained periods without any site outages at all.
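To put "> 99.9%" in concrete terms, an availability target converts directly into a downtime budget. The sketch below assumes a 365-day year (8,760 hours) and defines availability simply as uptime divided by total time in the period; real SLAs may define measurement windows, exclusions, and planned maintenance differently. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8,760 hours


def downtime_budget_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (hours) allowed by an availability target.

    availability_pct is a percentage such as 99.9; the budget is the
    unavailable fraction of the measurement period.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return (100.0 - availability_pct) / 100.0 * period_hours


for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_budget_hours(target)
    print(f"{target}% -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

A 99.9% target allows roughly 8.76 hours of downtime per year, so a single outage lasting most of a working day can consume the entire annual budget on its own.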

Perhaps more remarkably, the proportion of operators suffering an outage appears to be
increasing. In the 2017 survey, “only” 25% had experienced an outage during the previous 12
months (among members of the Uptime Institute Network, this level was halved).

Explanations for this high outage rate are not clear. Possibly, increased complexity and
interdependencies of different systems and different data centers using ever more complicated
management systems may be increasing the number and impact of failures (remember that
most data centers handle more work each year). Uptime Institute has also found evidence that
failures tend to occur during periods of technology change and investment but also at sites
where there is under-investment and legacy assets are not upgraded. Some survey evidence
suggests that many sites fall into one of these two categories.

Power still the big challenge

What is the main cause of IT service downtime? Uptime Institute’s survey data show that the
biggest single cause of failure is the one that the industry invests so much to prevent: a loss of
on-site power, which was cited by 93 of the 285 respondents who suffered at least one outage
(33%). This was closely followed by network failure (30%) and an IT/software error (28%).

But there is a new factor reducing IT service availability. The failures experienced at third-
party service providers (colocation, hosting, or cloud) account for 87 incidents (31% of those
reporting an incident), which is only slightly fewer than on-site power failures at enterprise data
centers. Third-party failures have become a critical issue; in hybrid environments, CIOs need to
be as mindful of their data center suppliers as they are of their data center operations.

Causes of outages cited by respondents who had experienced an outage
(more than one cause could be cited, so the percentages sum to more than 100%)

Cause                                                 Share of respondents   Responses
On-premise data center power failure                  32.63%                 93
Network failure                                       29.82%                 85
Software, IT systems error                            27.72%                 79
On-premise data center failure (not power related)    12.28%                 35
Managed hosted services provider downtime incident    9.82%                  28
Colocation provider power failure                     9.47%                  27
Public cloud or SaaS downtime incident                7.72%                  22
Security-related                                      6.32%                  18
Unknown                                               5.26%                  15
Colocation provider failure (not power related)       3.51%                  10

Total respondents: 285

Source: Uptime Institute Global Survey of Data Center Operators and Managers, 2018
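Because respondents could cite several causes, the table's counts are mentions rather than unique respondents. The sketch below, using the counts copied from the table, recomputes each share against the 285 respondents and shows that summing the four provider-related categories reproduces the 87 third-party incidents (about 31%) discussed above. Which categories count as "third party" is an editorial grouping assumed here, and the aggregate counts mentions, so a respondent citing two provider failures is counted twice.

```python
# Responses per cause, copied from the 2018 survey table (n = 285 respondents
# who reported at least one outage; more than one cause could be cited).
RESPONSES = {
    "On-premise data center power failure": 93,
    "Network failure": 85,
    "Software, IT systems error": 79,
    "On-premise data center failure (not power related)": 35,
    "Managed hosted services provider downtime incident": 28,
    "Colocation provider power failure": 27,
    "Public cloud or SaaS downtime incident": 22,
    "Security-related": 18,
    "Unknown": 15,
    "Colocation provider failure (not power related)": 10,
}
TOTAL_RESPONDENTS = 285

# Assumed grouping: the categories that involve an external colocation,
# hosting, or cloud provider.
THIRD_PARTY = (
    "Managed hosted services provider downtime incident",
    "Colocation provider power failure",
    "Public cloud or SaaS downtime incident",
    "Colocation provider failure (not power related)",
)


def share(count: int, total: int = TOTAL_RESPONDENTS) -> float:
    """Percentage of respondents citing a cause."""
    return 100.0 * count / total


for cause, count in RESPONSES.items():
    print(f"{cause:<52} {share(count):6.2f}%  {count:3d}")

mentions = sum(RESPONSES.values())
print(f"Total mentions: {mentions} from {TOTAL_RESPONDENTS} respondents "
      f"(shares sum to {share(mentions):.1f}%, so answers overlap)")

third_party = sum(RESPONSES[c] for c in THIRD_PARTY)
print(f"Third-party providers combined: {third_party} ({share(third_party):.1f}%)")
```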

Uptime Institute’s survey data aligns closely with the causes of the big, public failures it has recorded
since early 2016. Power outages accounted for 36% of those failures, followed by network issues (25%)
and IT/software issues (22%).

Figure: Causes of 100 major public outages (Jan 2016 - Jun 2018). Power 36%, network 25%, IT systems 22%; the remainder is split among security issues, not revealed/not determined, fire, fire suppression, and cooling or mechanical causes.
Source: Uptime Institute, June 2018

These results support our view that power and facilities issues are the most likely to cause outages,
but IT and network problems tend to cause the most difficulty, because of interdependencies,
systems complexity, relatively difficult fault analysis, and longer recovery times. However, the
initial cause of a problem is rarely contained, so a power problem may soon become an
IT systems recovery issue, especially where multiple, interdependent databases are affected,
as is usually the case. Older, single-site transactional databases can usually be recovered more
quickly, for reasons of both design and relative simplicity.
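How a power problem spreads into an IT recovery problem can be illustrated with a dependency graph: when one component fails, every component that depends on it, directly or transitively, is affected. The sketch below walks the reverse dependencies with a breadth-first search. The topology and all component names are hypothetical and are not drawn from the report.

```python
from collections import deque

# Hypothetical topology for illustration only: each key depends on the
# components listed in its value.
DEPENDS_ON = {
    "ups_a": ["utility_power"],
    "storage_array": ["ups_a"],
    "orders_db": ["storage_array"],
    "inventory_db": ["storage_array"],
    "core_switch": ["ups_a"],
    "api_gateway": ["core_switch"],
    "checkout_app": ["orders_db", "inventory_db", "api_gateway"],
    "reporting_app": ["orders_db"],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the graph so we can walk from a failed component to its dependents.
    dependents: dict[str, list[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in impacted:  # visit each dependent only once
                impacted.add(child)
                queue.append(child)
    return impacted


print(sorted(impacted_by("ups_a", DEPENDS_ON)))
```

In this toy topology a single UPS failure reaches the storage array, both databases, the network path, and every application, which is the kind of cascade the paragraph above describes.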

Causes of the 10 most serious outages

During the period from January 2016 to June 2018, Uptime Institute recorded 10 outages
that were “extremely serious,” meaning that they caused a very serious loss of revenue and
brand and potentially an existential threat to the data center operator or its clients (for a formal
definition and deeper discussion, see upcoming reports by Uptime Institute).

Primary cause of the 10 big “category outages”

IT Systems 3

Network 3

Power 3

Cooling 2

Preventable outages
Uptime Institute frequently sees percentages of failures ascribed to human error. In our
experience, all major failures that happen in normal peacetime can be attributed to human error.
Uptime Institute has not visited all the sites concerned in this report, but the hands of the error-
prone operator, manager, and supplier are clearly evident.

A majority (80%) of the respondents to the Uptime Institute survey believe their biggest/most
recent outage (for those who had suffered one) was “preventable.” This result suggests, as Uptime
Institute always advises, that the most common cause of problems lies in processes and practice,
rather than architecture or equipment. But the survey also supports the view that cautious and
careful design at the outset does reduce outages (see 2N vs. N+1 below, as an example).

Costs of downtime
Historically, Uptime Institute has been wary of developing an average cost of downtime,
largely because it would vary so widely according to the service and type of data center. When
applications are interdependent and spread across multiple data centers, such estimates
become even less reliable or meaningful.

Even so, in the 2018 survey, we did ask respondents to estimate the cost of downtime for individual
incidents they had suffered. The first and perhaps most surprising finding is that, in cases where a
significant incident did occur, 43% of respondents (n=271) did not actually calculate the cost at all.
This is clearly not best practice, if only for assessing investment decisions.
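Even a rough estimate makes investment decisions comparable. The sketch below assumes a deliberately simple, hypothetical model: an incident's cost is lost revenue per hour of downtime plus fixed recovery and penalty costs, and the expected annual loss is that cost multiplied by an assumed annual probability of the incident. The class, function names, and every figure in the example are illustrative, not survey data.

```python
from dataclasses import dataclass


@dataclass
class IncidentCostModel:
    """Deliberately simple, hypothetical cost model for one outage."""

    revenue_loss_per_hour: float  # revenue lost while the service is down
    recovery_cost: float          # diagnosis, repair, overtime, and similar fixed costs
    penalty_cost: float = 0.0     # SLA credits or other contractual penalties

    def incident_cost(self, duration_hours: float) -> float:
        return (self.revenue_loss_per_hour * duration_hours
                + self.recovery_cost + self.penalty_cost)


def expected_annual_loss(annual_probability: float, incident_cost: float) -> float:
    """Probability-weighted cost of one incident type over a year."""
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    return annual_probability * incident_cost


# Illustrative figures only.
model = IncidentCostModel(revenue_loss_per_hour=50_000, recovery_cost=20_000)
cost = model.incident_cost(duration_hours=4)   # 220,000
before = expected_annual_loss(0.30, cost)      # without the proposed upgrade
after = expected_annual_loss(0.20, cost)       # assumed effect of the upgrade
annual_upgrade_cost = 15_000
print(f"Incident cost: ${cost:,.0f}")
print(f"Expected annual loss avoided: ${before - after:,.0f} "
      f"vs. upgrade cost ${annual_upgrade_cost:,.0f}/year")
```

The point is not the specific numbers but that, without any cost estimate, the comparison on the last line cannot be made at all.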

The costs that were calculated and reported should worry any CIO. While half of the reported
incidents cost under $100,000, there were 39 outages that cost more than $1 million (15%).
Outages, it seems, are not only common but also very expensive. Around a third of outages reported by
respondents cost over $250,000.

Cost per incident ($ million)   % of incidents   Number of incidents
<1                              50.55            137
1-2                             6.27             17
2-5                             4.43             12
5-10                            0.74             2
10-20                           1.85             5
20-50                           0.74             2
>50                             0.37             1
Did not calculate               35.05            95
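The table can be checked directly from its counts. The sketch below recomputes the percentage column from the incident counts (n = 271; one value may differ in the last decimal place because of rounding) and shows that the share of incidents costing more than $1 million depends on the denominator: all 271 responses, or only the 176 incidents for which a cost was calculated. The counts are copied from the table; the code itself is illustrative.

```python
# Incident counts per cost band ($ million), copied from the table above (n = 271).
COST_BANDS = {
    "<1": 137,
    "1-2": 17,
    "2-5": 12,
    "5-10": 2,
    "10-20": 5,
    "20-50": 2,
    ">50": 1,
    "Did not calculate": 95,
}

total = sum(COST_BANDS.values())
calculated = total - COST_BANDS["Did not calculate"]
over_1m = total - COST_BANDS["<1"] - COST_BANDS["Did not calculate"]

for band, count in COST_BANDS.items():
    print(f"{band:<18} {100 * count / total:6.2f}%  {count:3d}")

print(f"Incidents over $1 million: {over_1m}")
print(f"  share of all responses:       {100 * over_1m / total:.1f}%")
print(f"  share of costed incidents:    {100 * over_1m / calculated:.1f}%")
```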


Overall, the cost breakdown looks like this:

Figure: Cost of IT service outage ($), by band: under 100,000 (about half of reported incidents); 100,000-250,000; 250,000-500,000; 500,000-1 million; 1-5 million; 5-10 million; 10-20 million; over 20 million.

Reducing or increasing risks

In conducting this research, Uptime Institute wanted to know if certain strategies or
architectures were likely to lead to more or fewer outages. We considered three: distributed
resiliency, N+1 vs. 2N in power and cooling, and extending the life of (legacy) assets.

• Distributed Resiliency. This is often cited as the ultimate, and certainly the long-term,
solution to outages. Cloud and hybrid architectures enable risks to be spread across many
data centers. We found that 40% of respondents use high-availability, public cloud services,
and that a similar number, 41%, say they replicate data across two or more sites. This, it
seems, gives most a high degree of confidence in their solution. 60% say this has made them
more resilient. However, roughly 10% believe it has not, and roughly 30% say they don’t know.

• Are 2N architectures more resilient? By separately analyzing respondents with 2N architectures
for cooling and power (as opposed to N+1 architectures), we were also able to see whether N+1
architectures, sometimes viewed as riskier, are any less reliable.

The results were clear: Of those with a 2N architecture, 22% had experienced an outage
in the past year, rising to 35% in the past three years (remember that this is for all failures,
not just on-site facility failures). But those with an N+1 architecture did not fare so well: 33%
said they had an outage in the past year, rising to 51% in the past three years – higher than
the overall average.

This result is perhaps to be expected, given that 2N redundancy costs more and is designed
to reduce failures: according to the data, 2N architectures are more effective at preventing
outages than N+1 solutions. The data may, however, change in the years ahead, as N+1 systems,
aided by management software, become more effective.

• Extending legacy asset life. First, we asked if respondents were operating data centers
beyond their expected life cycle. Of those responding (n=503), 34% said yes. Of this group,
approximately three quarters (in each case) believed this practice increases the likelihood of
an outage, increases operating costs, and reduces agility/introduces constraints.
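The replication across two or more sites described in the first bullet above can be reasoned about with the standard parallel-redundancy formula: if a service stays up while any one of n sites is up, and sites fail independently, availability is 1 - (1 - a)^n. Independence is a strong and usually optimistic assumption; the interdependencies and shared third parties discussed throughout this report are exactly what break it. The second function is a hypothetical simplification that adds a single shared dependency able to take every site down at once. All figures are illustrative.

```python
def parallel_availability(site_availability: float, sites: int) -> float:
    """Availability of a service that stays up while at least one site is up.

    Assumes sites fail independently, which real deployments rarely achieve.
    """
    if not 0.0 <= site_availability <= 1.0 or sites < 1:
        raise ValueError("need 0 <= site_availability <= 1 and sites >= 1")
    return 1.0 - (1.0 - site_availability) ** sites


def with_shared_dependency(independent_availability: float, shared_unavailability: float) -> float:
    """Hypothetical adjustment: one shared dependency (e.g. a single provider
    or control plane) takes every site down at once with the given
    unavailability, independently of the per-site failures."""
    return (1.0 - shared_unavailability) * independent_availability


one_site = 0.999
two_sites = parallel_availability(one_site, 2)           # about 0.999999 if independent
with_shared = with_shared_dependency(two_sites, 0.0005)  # the shared term dominates
for label, a in (("1 site", one_site), ("2 sites, independent", two_sites),
                 ("2 sites, shared dependency", with_shared)):
    print(f"{label:<28} {a:.6f}  ({(1 - a) * 8760:.2f} h/year down)")
```

Under these assumptions a second site cuts expected downtime from hours to seconds per year, but a modest shared dependency puts most of that downtime back, which is one reason roughly 40% of respondents using replication were either unsure of, or doubtful about, its benefit.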

Who’s in charge?
Given the growing complexity of issues, especially where multiple service providers are involved,
we were curious to know how organizations deal with risk/resiliency. Who is responsible for
managing and assessing risk?

We found that in most cases it is the CIO/CTO. But in a few cases (7%), it is a chief risk or
resiliency officer. This position may be more common in the future.

Figure: Who is responsible for managing and assessing risk, by share of respondents (CIO/CTO; data center team; applications owners; risk or resiliency officer).


Summary
There is a widespread belief that advances in IT systems and software have made IT services far
more resilient, especially when coupled with highly engineered data center operations and well-drilled,
process-oriented facilities staff. But this research proves that the levels of complexity and sensitivity in
modern data center and IT operations, coupled with the high level of interdependence, may be working
against that trend. Operators, this survey suggests, are struggling to keep pace. The result is that
expensive and damaging failures keep occurring, and when they do, diagnosis and quick recovery can
be challenging. Over time, experience and new technologies will no doubt lead to improvements, but
diligence, investment, and planning are clearly required; it is, above all, a management issue.

SOME UPTIME RECOMMENDATIONS:

• Conduct regular system-wide resiliency analysis that spans data centers, power, cooling, connectivity, third-
party services, planning, and management.

• Review failures at other organizations. Almost all failures reported or researched by Uptime Institute have
happened before and are often well documented.

• Consider independent resiliency/risk analysis in the same way that your organization would carry out external
security analysis.

• Understand and model the costs of downtime, whether complete or partial, which will help inform investments
in resilient infrastructure.

• Ask service providers (cloud companies, applications service providers, hosting companies, colocation
providers, and carriers) to provide detailed risk/resiliency reports.

• Remember that downtime is not necessarily unplanned. Planned downtime can also cause serious
problems if it is not properly managed with appropriate processes in place.

• Almost all downtime results from planning and investment decisions, coupled with poor processes or a failure
to follow processes. They may therefore be termed management failures.

• When negotiating SLAs with colocation providers, include measures and multiple avenues of action that go
beyond ‘standard’ contracts. These could include arbitration clauses and a limited number of unplanned events
over a set period of time (rather than a limit of their cumulative duration, for example).
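The last recommendation contrasts two ways of writing an SLA limit. A minimal sketch, assuming hypothetical thresholds (a cumulative-duration cap equal to 99.9% availability over a 8,760-hour year, and a cap of three unplanned events per year), shows how a run of short outages can satisfy a duration-based clause while breaching an event-count clause. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SlaTerms:
    """Hypothetical SLA limits for one measurement period."""

    max_total_downtime_min: float  # cumulative-duration clause
    max_unplanned_events: int      # event-count clause


def evaluate(outage_minutes: list[float], terms: SlaTerms) -> dict[str, bool]:
    """Report whether each clause is met for the unplanned outages in one period."""
    return {
        "duration_clause_met": sum(outage_minutes) <= terms.max_total_downtime_min,
        "event_clause_met": len(outage_minutes) <= terms.max_unplanned_events,
    }


# 99.9% of 8,760 hours leaves 525.6 minutes; the cap of 3 events is an assumption.
terms = SlaTerms(max_total_downtime_min=525.6, max_unplanned_events=3)
six_short_outages = [20.0] * 6  # 120 minutes in total
print(evaluate(six_short_outages, terms))
# {'duration_clause_met': True, 'event_clause_met': False}
```

Six 20-minute outages use less than a quarter of the duration allowance, yet each one may disrupt users and signal an underlying problem; an event-count clause captures that pattern where a cumulative-duration clause does not.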

Note: Uptime Institute’s 2018 Data Center Survey was conducted between February and May 2018, with a total of nearly 1500 respondents. Not all
the data collected in that extensive survey is in this report. Please contact Uptime Institute if you are interested in a more detailed breakdown.

Uptime Institute’s M&O Stamp of Approval for operating data centers has been shown to reduce failures (according to Uptime 2017 research). Please
contact Uptime Institute for more details. More information: contact alawrence@[Link].




• Around a third of all reported outages cost more than $250,000.

Outages are neither rare nor declining

Perhaps more remarkably, the portion of operators suffering an outage appears to be increasing.

Power still the big challenge

ANSWER CHOICES                                        % OF RESPONDENTS   RESPONSES
On-premise data center power failure                  32.63%             93
Network failure                                       29.82%             85
Software, IT systems error                            27.72%             79
On-premise data center failure (not power related)    12.28%             35
Managed hosted services provider downtime incident     9.82%             28
Colocation provider power failure                      9.47%             27
Public cloud or SaaS downtime incident                 7.72%             22
Colocation provider failure (not power related)        3.51%             10
Total Respondents: 285
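Each percentage in the survey table is simply the response count divided by the 285 respondents. The minimal Python sketch that follows recomputes those shares as a consistency check; the names `table` and `share` are illustrative, not from the source. It also makes one point explicit: the counts add up to 379 and the shares to roughly 133%, so respondents could evidently choose more than one answer. The categories therefore overlap and should not be summed, for example to produce a single "all power failures" figure.

```python
# Survey table as listed: answer -> (number of responses, % of respondents shown).
TOTAL_RESPONDENTS = 285

table = {
    "On-premise data center power failure": (93, 32.63),
    "Network failure": (85, 29.82),
    "Software, IT systems error": (79, 27.72),
    "On-premise data center failure (not power related)": (35, 12.28),
    "Managed hosted services provider downtime incident": (28, 9.82),
    "Colocation provider power failure": (27, 9.47),
    "Public cloud or SaaS downtime incident": (22, 7.72),
    "Colocation provider failure (not power related)": (10, 3.51),
}


def share(count: int, total: int = TOTAL_RESPONDENTS) -> float:
    """Percentage of respondents who chose an answer, rounded to 2 decimals."""
    return round(100 * count / total, 2)


# Each published percentage should equal count / 285, rounded to 2 decimals.
for cause, (count, shown) in table.items():
    assert abs(share(count) - shown) < 0.005, cause

# Total answers exceed the number of respondents (and shares exceed 100%),
# which only happens if a respondent could pick several causes.
total_mentions = sum(count for count, _ in table.values())
total_share = round(sum(shown for _, shown in table.values()), 2)
print(f"{total_mentions} answers from {TOTAL_RESPONDENTS} respondents; "
      f"shares sum to {total_share}%")
```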

CAUSES OF 100 MAJOR PUBLIC OUTAGES

Causes of 10 most serious outages

Primary cause of ten big “category outages”

Cost ($ million)      %        Number of incidents
<1                    50.55    137
…
>50                   0.37     1
Did not calculate     35.05    95
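The surviving rows of the outage cost table can be sanity-checked with a little arithmetic. This Python sketch assumes, because the table does not say so, that every percentage shares one denominator: the total number of incidents. Under that assumption, 137 incidents at 50.55% implies about 271 incidents in all, the other two rows agree with that total to within rounding, and roughly 38 incidents (about 14%) would fall in intermediate cost bands that are not listed here. The variable names are illustrative only.

```python
# Surviving rows of the outage cost table: (cost band in $ million, % shown, incidents).
rows = [
    ("<1", 50.55, 137),
    (">50", 0.37, 1),
    ("Did not calculate", 35.05, 95),
]

# Assumption (not stated in the table): all percentages share one denominator,
# the total number of incidents. Estimate that total from the largest row.
_, top_pct, top_count = rows[0]
implied_total = round(top_count / (top_pct / 100))

# Every surviving row should match the implied total to within rounding
# (0.01 percentage points).
for band, pct, count in rows:
    recomputed = 100 * count / implied_total
    assert abs(recomputed - pct) < 0.01, f"row {band!r} is inconsistent"

# Incidents that, under the assumption, sit in the unlisted cost bands.
missing = implied_total - sum(count for _, _, count in rows)
print(f"Implied total incidents: {implied_total}")
print(f"Incidents in unlisted cost bands: {missing} "
      f"({100 * missing / implied_total:.1f}%)")
```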

Overall, the cost breakdown looks like this:

COST OF IT SERVICE OUTAGE ($)

Reducing or increasing risks

• Are 2N architectures more resilient? By analyzing just those respondents with 2N

[Chart: Data Center Team, 0% to 90% scale]

SOME UPTIME RECOMMENDATIONS:

