Uptime Institute today announced the key findings of its ninth annual Global Data Center Survey, the largest and most comprehensive study of its kind in the data center sector. Findings show that the sector has become exceptionally complex as it continues to adapt to change. Demanding business requirements strain the ability of traditional mechanical and electrical systems to keep pace. Meanwhile, new challenges have arisen, including those associated with orchestrating hybrid architectures, staffing and skills alignment, workload placement decision-making, and the desire to leverage innovative technologies without increasing risk.
Annually, Uptime Institute conducts its comprehensive global survey of the data center industry. Respondents are separated into two groups to ensure perspectives are not mixed. The first group includes IT managers, owners and operators of data centers, and focuses on their business platform choices. The second group includes suppliers, designers, consultants and others that service the industry. This year's survey was conducted between March and April 2019 and includes responses from nearly 1,100 end users in more than 50 countries.
“This year’s survey shows that the data center sector is finding it challenging to manage complexity. Most organizations have hybrid infrastructure, with a computing platform that spans multiple cloud, colocation and enterprise environments. This, in turn, increases application and data access complexity,” said Andy Lawrence, executive director of research, Uptime Institute. “It’s an approach that has the potential to be very liberating – it offers greater agility and, when deployed effectively, increased resiliency. But it also carries a higher risk of business service performance issues due to highly leveraged network and orchestration requirements. In a hybrid infrastructure, any of these failures can cause service degradation or complete service outages, depending on how the hybrid architecture is designed. The survey reveals that the transition to these more diversified, dynamic architectures raises many issues around resiliency and business service delivery, and that we need more management oversight, transparency and accountability at the business level.”
This year’s survey shows that in corporate IT, there is still a strong dependence on the privately owned or operated enterprise data center, which currently accounts for more than half of all IT workloads. Capacity demand in the enterprise space is still growing, along with cloud and colocation data center buildouts. IT workloads are being spread across a range of platforms, with a third predicted to be contracted to external suppliers (such as cloud and SaaS providers) by 2021. And while enterprise data center capacity is growing in absolute terms, it is shrinking as a percentage of total capacity needed.
Traditionally, most IT and data center managers have maintained availability at their primary data center through rigorous attention to power, infrastructure, connectivity and on-site IT replication. Today, distributed resiliency using active-active data centers is becoming more common; 40% of those surveyed said they use availability zones for resiliency, a strategy that requires at least two active data centers replicating data to each other.
According to respondents, modern architectures have made it easier to spread work more reliably and cost effectively across multiple sites, and with proper operational processes this can reduce overall risk and improve resiliency. Business processes and applications often span multiple data centers; even temporary unavailability can cause problems. A full 60% of respondents said their most recent significant data center outage could have been prevented with better management/processes or configuration discipline.
Lack of Visibility into Public Cloud
Public cloud operators are the most committed users of distributed resiliency and availability zones, which promise high levels of performance and availability. Despite this, almost three-quarters of respondents indicated that they are not placing their core mission-critical applications in a public cloud, for various reasons. Findings show that a lack of visibility, transparency and accountability in public cloud services is a major issue for enterprises with mission-critical applications. Over one-fifth of all managers said they would be more likely to place workloads in a public cloud if they had more visibility into the provider’s operational resiliency. Among those using a public cloud for mission-critical applications, almost half said they do not believe they have adequate visibility, which increases their risk profile. The vetting process for third-party services, such as cloud, should go beyond a contractual service level agreement, and should include greater dynamic visibility, direct support during service degradations, more accountability and a better cost-recovery plan when problems occur.
Outages Still Common
Outages this year map closely to 2018 findings, with just over a third of respondents reporting business impact from some type of infrastructure outage or serious degradation of service in the past year. These outages increasingly span multiple data centers, and best practices dictate comprehensive, ongoing resiliency reviews of all company-owned and third-party digital infrastructure. Over 10% of all respondents said their most recent reportable outage impacted their organization in a very tangible fashion, citing more than $1 million in direct and indirect costs.
The sector is facing a staffing crisis, with 61% of all respondents saying they have significant difficulty recruiting or retaining staff. Diversity issues are having a growing business impact as skilled workers become harder to recruit. A full 25% of managers surveyed had no women among their design, build or operations staff, and only 5% of respondents said women represented 50% or more of their staff.
While power loss continues to be the single biggest cause of service outages, accounting for one-third of reported outages, failures higher in the IT stack (including the network) together accounted for twice that percentage of service failures.