Staggering explosion in unstructured data causing a lot of challenges for organisations: Amy Fowler, VP and General Manager, FlashBlade, Pure Storage
By Aaratrika Talukdar, Senior Correspondent, Express Computer
The rise of mobile technologies, cloud computing, machine learning, and IoT, together with emerging technologies such as AI and VR-based applications, is resulting in exponential growth of unstructured data. According to Gartner, unstructured data growth rates have hit 30 percent per year, which means total unstructured data volumes will almost quadruple by 2027. Businesses that can successfully leverage unstructured data will be able to innovate and adapt to future conditions with more agility and fewer wasted resources.
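The arithmetic behind that projection can be checked with a short sketch. It assumes the 30 percent rate compounds annually over a five-year window; the article does not state the base year, so the window length and the function name `growth_multiple` are illustrative assumptions.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Return the factor by which a quantity grows under annual compounding."""
    return (1.0 + annual_rate) ** years


# Assumption: 30% annual growth (the Gartner figure quoted above) applied
# over a hypothetical five-year window ending in 2027.
multiple = growth_multiple(0.30, 5)
print(f"{multiple:.2f}x")  # prints "3.71x"
```

Under these assumptions the volume grows by a factor of roughly 3.7, which is consistent with "almost quadruple".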
The demands of unstructured data, the complexity of management, and the rigidity of traditional siloed platforms have necessitated a fresh approach to how one stores, manages, and extracts value from this data.
Amy Fowler, VP and General Manager, FlashBlade, Pure Storage, speaks to Express Computer in an exclusive interview about the implications and challenges that the staggering explosion in unstructured data poses for organisations, and about leveraging unstructured data for continuous innovation and successful digital transformation, especially in the healthcare, financial, and retail sectors.
She also highlights why the ability not just to back up data fast but to restore it rapidly and at scale during a ransomware attack is critical, and which sustainability considerations in storage strategy lead to reduced energy consumption.
Please tell us more about Pure Storage.
Pure Storage has a portfolio of storage-related platforms and solutions intended to help solve a broad set of customers’ business challenges in applications and use cases ranging from their database environments to their analytics environments, advanced technical compute and AI environments, and data protection and rapid restore requirements. One of our platforms is called FlashBlade. FlashBlade is a scale-out file and object platform really aimed at addressing the requirements associated with customers’ unstructured data. Pure Storage delivers not just software or standalone hardware but innovates on both so that our customers are able to get the best out of the storage platform. We do our design and development across three major sites: at our headquarters in California as well as our significant sites in both Prague in the Czech Republic and here in Bangalore, which is our newest and very fast-growing site that was started just about a year ago.
What are the implications and challenges for organisations with respect to the staggering explosion in unstructured data?
The explosion is causing a lot of challenges for organisations, and these fit into three categories: the environment, people, and getting the most out of the data itself.
So, let’s start with the environment. According to leading analysts like Gartner, unstructured data is going to grow threefold in just the next three years. That translates into petabytes that need to be stored somewhere. The physical implications are really significant, and there is tremendous pressure on organisations to minimise their energy usage and the associated space and cooling that come with the major pieces of infrastructure needed to store petabytes of data.
Because of our hardware and software co-design, we were able to develop a very differentiated offering for our customers. It is significantly better than a disk-based architecture, which not only takes up a huge amount of space but also has severe implications for energy consumption, as spinning media gets hot very quickly.
But even in comparison to other modern flash vendors, our FlashBlade//S is still at least 60 percent more efficient in direct energy use, because we not only architect this hardware but also create our own direct flash modules inside it. These are like our own flash drives, so we don’t have to depend on the density increases that the flash vendors provide off the shelf to everyone else.
We are currently shipping 48-terabyte direct flash modules, and a single FlashBlade//S chassis supports about two petabytes of data storage capacity. So, density is the number one problem that we solve.
Problem number two for customers with this explosion of unstructured data is the manageability of the associated infrastructure. This data comes in different forms. Sometimes it’s file data, sometimes it’s object data. Sometimes it is billions of very small objects, and sometimes it is a big sequential set of large files. So, there’s some diversity in that unstructured data. If customers had to deploy a platform for every single type of unstructured data, they’d end up with silos, and they would need people who know how to manage a whole bunch of different silos. Traditionally, unstructured data platforms have not been known for their ease of use. They’ve tended to be very complex, to require a huge amount of expertise, and to take a lot of time to set up. So we are focused on ensuring that organisations can handle the growth of their data on a simple, unified platform that can serve both fast files and fast objects at significant scale.
The third objective that customers have is: I’m storing all this data, can I get some insight out of it? Can I get some value from it? Our software architecture is optimised for speed to insight, so that customers can go across many petabytes of data and billions of objects, find those needles in the haystack, and quickly learn what they need to know to make smarter, actionable business decisions. These are the three major areas that Pure Storage is addressing for our customers when it comes to the challenges associated with the explosion of unstructured data.
How are organisations leveraging unstructured data for continuous innovation and successful digital transformation? What sectors are using unstructured data the most?
Let me share two examples from different sectors. One would be unstructured data in a log or search analytics-type environment. If you’re a retailer that wants to be effective digitally, then you need a rich, effective, and fast search environment. To have that, you need a huge number of data sets for correlations and training, so that when somebody is on your retail site looking for something, they get a good match quickly. As consumers, we don’t have a lot of patience when it comes to search results.
An elastic search environment would be one example, with a huge amount of unstructured data in it, and the retailer would have to ensure that as that environment grows and scales, they can continue to deliver the performance and insights their end-user application requires.
Another great example of unstructured data is medical images in healthcare. If you have the throughput performance, then instead of just storing everyone’s medical images, you can go back through them, correlate images of a certain diagnosis, and use them to train a system to identify and triage the more likely cases of that diagnosis. That speeds up the time to act for the physicians. There’s no replacement for the physicians themselves, but this is a tool to help them get a better sense of the patient’s problem. This is even more relevant in something called digital pathology, as opposed to file types such as X-ray images, because in digital pathology the same consistent set of data is reviewed and compared, rather than relying on the human eye alone to make a match against a particular diagnosis.
Why is the ability not just to back up data fast but to restore it rapidly and at scale during a ransomware attack critical?
We are seeing a huge shift in dialogue in organisations to how fast we can back up and restore data at scale. When we talked to a customer 5 to 10 years ago, they would say: “Why would there be a situation where we might need to restore hundreds of terabytes or petabytes at once?” They really struggled to identify a likely scenario.
But with the rise of ransomware attacks over the last couple of years, organisations are facing situations where their entire environment is compromised and the attack has propagated to the second site. So, they’ve got to go back to a clean point in time, with hundreds of terabytes or even petabytes that they need to be able to get back super-fast. This requires a tremendous amount of throughput, which is best delivered by a scale-out system. Backups tend to be, in effect, big unstructured data objects, and we are seeing a significant increase in the number of data protection software vendors shifting to object as a target protocol. A few years ago, Pure Storage added one more layer of protection for our customers called SafeMode snapshots. With SafeMode, not even an insider can meddle with the snapshot. That’s step one.
Step two is that Pure FlashBlade delivers extremely fast restore capabilities which allow the company to restore business operations in a matter of hours in the event of a ransomware attack.
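Why throughput dominates restore time at this scale can be seen with a rough calculation. The sketch below is illustrative only: the function name `restore_hours`, the one-petabyte data set, and the sample throughput rates are all hypothetical assumptions, not Pure Storage figures, and it ignores real-world factors such as network limits and data reduction.

```python
def restore_hours(data_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to restore data_tb terabytes at a sustained throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    # Decimal units are assumed: 1 TB = 1000 GB.
    seconds = (data_tb * 1000.0) / throughput_gb_per_s
    return seconds / 3600.0


# Hypothetical scenario: restoring 1 PB (1000 TB) at three assumed rates.
for rate in (1.0, 10.0, 50.0):
    print(f"{rate:>5.1f} GB/s -> {restore_hours(1000.0, rate):7.1f} hours")
```

At these assumed rates, a petabyte takes roughly 278 hours at 1 GB/s, about 28 hours at 10 GB/s, and under six hours at 50 GB/s, which is why restoring "in a matter of hours" depends on sustained aggregate throughput.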
Tell us about Pure’s latest innovation in Unified Fast File and Object (UFFO) – FlashBlade//S and how it addresses the demands of unstructured data and modern application growth.
In June 2022, we introduced FlashBlade//S, which is the next-generation platform for unified fast files and objects for all of this unstructured data. It represents a tremendous leap forward in density, efficiency, and scale, supporting up to 10 petabytes in a single global namespace. Inside those 10 petabytes, we can support up to hundreds of billions of files and/or objects, depending on the application and use case. We generally see this being most applicable in unstructured data environments associated with analytics, AI, or other forms of commercial HPC or technical computing. Some examples include the computational fluid dynamics used by a Formula One team, market simulations in fintech, and simulations for chip design and manufacturing in an electronic design automation environment. There are a lot of performance-intensive simulation environments, but on the back end, the data profile and the requirements of the architecture are somewhat similar, and the same platform can also address customers’ rapid-restore requirements for ransomware recovery. On top of the new FlashBlade architecture, we will continue to innovate on the feature set and on the additional enhancements we can deliver to our customers, in protocols and data services in particular.
What are the sustainability considerations in storage strategy leading to reduced energy consumption?
Since our founding, Pure has invested in building sustainable and highly efficient products and services that not only contribute to the cause but differentiate us in the market. We issued our inaugural ESG report in 2021, gaining insights along the way that we believe can help enterprises decrease their carbon footprint while advancing digital transformation.
We also know that mitigating the environmental impact of data infrastructure is critical as data workloads increase. Organisations need to be building sustainable models for the future with data storage engineered to require significantly lower power, lower cooling, and far less waste, as well as having the potential to make a significant and immediate impact on reducing global data-centre carbon emissions. This is exactly what we’re doing at Pure.
The environmental benefits that Pure delivers through our products and services result from a combination of technology, design philosophy, and a ruthless focus on driving the best outcomes for customers. Our core technologies integrate software and hardware architecture not just to deliver unmatched density, longevity, and efficiency, but to continually improve and drive further efficiencies over time. Pure’s products can reduce energy usage by up to 80 percent.
Please share with us Pure Storage’s India growth journey.
Pure was founded with innovation at its core and a customer-first mindset. With about 20 percent of revenues reinvested in R&D activity, we are able to deliver industry-first solutions that ultimately enable customers to do more with data. Our India R&D site is designed to fuel Pure’s innovation engine and further accelerate time to market. It plays a key role in Pure’s global innovation strategy. Our teams in India aren’t just extended arms of the teams in the US but have responsibility for whole product lines. We are investing significantly in India to continue to develop FlashBlade, and we hope to be able to announce some significant breakthroughs from the team here soon.