By Sriram NS, Senior Director, Technology, Tavant
The evolution of data and data platforms: Where did it all start? Since the advent of data computing in the 1960s, data has been the lifeline of business. Whether demographic records, sales numbers, industry figures, or customer statistics, data has always been a key input to corporate decision-making. Over the last couple of decades, however, we have witnessed slow but remarkable progress from simple data-in-storage to multi-dimensional data-in-use. The challenge has shifted from finding and storing data to using it effectively and efficiently. And what has changed even faster than the nature, accessibility, or use of data is the technology behind it – what we call data platforms.
A data platform is an integrated set of technologies that collectively meets an organization’s end-to-end data needs, enabling the acquisition, storage, preparation, management, delivery, and governance of data. It also acts as a security layer for users and applications.
In the early 2000s, enterprise data warehouses essentially served as data platforms. They were responsible for the collection, categorization and storage of structured data and were populated using ETL tools like Informatica and Talend. Data marts were built on top of these data warehouse models and fed the BI tools used for dashboarding. In short, data from multiple transactional systems was centralized to solve specific analytics use cases.
Example: Financial reporting for a manufacturer to show weekly sales with the ability to drill down to the category, product and customer. The sales details could come from an accounting information system (AIS), while the product information could come from an Enterprise Resource Planning system (ERP) or inventory management system (IMS), and customer information could come from a customer relationship management system (CRM).
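To make the drill-down concrete, here is a minimal Python sketch of that report. It is only an illustration under stated assumptions: the three in-memory extracts, their field names, and the `weekly_sales` helper are all hypothetical, and a real data mart would typically express this as SQL over warehouse tables rather than Python dictionaries.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracts from the three source systems (all values are made up).
ais_sales = [  # accounting information system: one row per sale
    {"sale_date": date(2023, 1, 2), "product_id": "P1", "customer_id": "C1", "amount": 100.0},
    {"sale_date": date(2023, 1, 4), "product_id": "P2", "customer_id": "C1", "amount": 50.0},
    {"sale_date": date(2023, 1, 5), "product_id": "P1", "customer_id": "C2", "amount": 25.0},
    {"sale_date": date(2023, 1, 10), "product_id": "P2", "customer_id": "C2", "amount": 75.0},
]
erp_products = {  # ERP or inventory system: product master data
    "P1": {"name": "Drill", "category": "Tools"},
    "P2": {"name": "Gloves", "category": "Safety"},
}
crm_customers = {  # CRM: customer master data
    "C1": {"name": "Acme"},
    "C2": {"name": "Globex"},
}


def weekly_sales(sales, products, customers, level="category"):
    """Total sales per ISO week at the requested drill-down level.

    level is one of "category", "product" or "customer"; each level
    extends the grouping key of the previous one.
    """
    depth = {"category": 1, "product": 2, "customer": 3}[level]
    totals = defaultdict(float)
    for row in sales:
        iso = row["sale_date"].isocalendar()  # (ISO year, ISO week, weekday)
        week = f"{iso[0]}-W{iso[1]:02d}"
        product = products[row["product_id"]]
        customer = customers[row["customer_id"]]
        path = (product["category"], product["name"], customer["name"])
        totals[(week,) + path[:depth]] += row["amount"]
    return dict(totals)
```

Calling `weekly_sales(ais_sales, erp_products, crm_customers, "category")` gives the top-level weekly view, while passing `"product"` or `"customer"` adds one more grouping column each, which is essentially what a BI tool does when a user drills down.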
However, with the onset of the data deluge, the existing solutions soon became incapable of scaling with the volume, velocity or variety of data. The shift of businesses toward data-driven, result-oriented and customer-focused operations, together with the need for digital transformation, further compounded the challenges of these traditional enterprise data warehouses. These concerns, along with the dawn of the Internet of Things (IoT) and Machine Learning (ML), quickly paved the way for the world of big data.
We saw the introduction of first-generation data platforms like Hadoop and distributed computing frameworks like MapReduce. But Hadoop-based platforms had their own issues: a scarcity of specialized skill sets, a lack of standard tooling, many failed projects, colossal upfront investments, substantial maintenance overheads, and delayed ROI. Companies also started facing growing compliance requirements such as GDPR and HIPAA.
These concerns pushed companies to look at solutions beyond Hadoop. Cloud data warehouses started to dominate the market, helping companies discover the value of their data. They enabled enterprises to derive actionable insights and business intelligence (BI) to make better data-driven decisions in near real-time, regardless of data volume, variety, or velocity. They were about more than just aggregating data; they were also about having easy access to the right data for the right purpose at the right time. Some popular options are Amazon Redshift, Google BigQuery, Snowflake and Azure Synapse Analytics. These battle-tested platforms offered balanced infrastructure costs, elasticity and flexibility, were readily available, could be easily integrated with source systems, were easy to maintain, and came with out-of-the-box security and governance features. Standard patterns and suites of tools, like the Modern Data Stack (MDS), for building data platforms centered around cloud data warehouses started to gain adoption.
But the challenges do not end here. In fact, they have evolved along with data and data platforms.
Data professionals – One difficulty that every company faces is the scarcity of data professionals. This is frequently because data handling tools have evolved rapidly while the skills of most professionals have not kept pace. Hence, companies need to invest in hiring people with the right skill sets. Organizations must also provide training programs so that current employees can make the most of these tools.
Data quality and observability – The key to success is getting reliable data to users, who should be confident in using it to drive key business decisions. As companies build complex data pipelines, it becomes critical to understand data lineage and to have clear visibility into what is happening to the data and why. This improves trust in the data.
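As a rough illustration of what quality checks and lineage tracking can look like inside a pipeline, consider the Python sketch below. The function names `check_quality` and `run_step`, the record layout, and the null-rate threshold are all assumptions made for this example; dedicated data observability tools cover far more ground than this.

```python
def check_quality(rows, required_fields, max_null_rate=0.0):
    """Report the row count and the share of missing values per required field.

    The check passes only if there is at least one row and no field
    exceeds max_null_rate.
    """
    total = len(rows)
    null_rates = {}
    for field_name in required_fields:
        missing = sum(1 for r in rows if r.get(field_name) is None)
        null_rates[field_name] = missing / total if total else 1.0
    passed = total > 0 and all(rate <= max_null_rate for rate in null_rates.values())
    return {"row_count": total, "null_rates": null_rates, "passed": passed}


def run_step(output_name, input_names, transform, rows, lineage):
    """Apply a transform and append a lineage entry describing the step."""
    out = transform(rows)
    lineage.append({
        "output": output_name,       # dataset produced by this step
        "inputs": list(input_names),  # datasets it was derived from
        "rows_in": len(rows),
        "rows_out": len(out),
    })
    return out
```

A pipeline built this way can answer two questions after every run: which upstream datasets a given output came from, and whether row counts or null rates changed unexpectedly along the way.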
Data Governance – As companies acquire and create more data assets, it becomes important to gain visibility into and better control over them. Adopting a data governance framework is not just about adopting a toolset; it requires a combination of people, processes and tools.
Data Architecture – An enterprise data warehouse can become a giant monolith, and as complexity increases, so does the challenge of maintaining it. Organizations need to adopt architectural patterns like data mesh, in which data-producing teams own their datasets and publish them through well-defined interfaces. Such patterns help keep maintenance low and efficiency high.
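One way to picture a dataset published through a well-defined interface is the small Python sketch below. The `DataProduct` class, its fields, and its schema check are hypothetical and not part of any data mesh standard or tool; the point is only that the owning team declares a contract and consumers read through it.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset owned by one team and exposed through a declared schema."""
    name: str
    owner: str    # the data-producing team accountable for this dataset
    schema: dict  # field name -> expected Python type
    _records: list = field(default_factory=list, repr=False)

    def publish(self, records):
        """Validate every record against the schema, then replace the dataset."""
        for rec in records:
            if set(rec) != set(self.schema):
                raise ValueError(f"{self.name}: fields {sorted(rec)} do not match the schema")
            for key, expected in self.schema.items():
                if not isinstance(rec[key], expected):
                    raise TypeError(f"{self.name}: field {key!r} must be {expected.__name__}")
        # Only reached when all records are valid, so a failed publish changes nothing.
        self._records = [dict(r) for r in records]

    def read(self):
        """Return copies so that consumers cannot mutate the published data."""
        return [dict(r) for r in self._records]
```

Because validation runs before anything is stored, a bad batch from the producing team never reaches consumers, and consumers depend only on `read()` and the declared schema rather than on the producer's internal tables.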
Today, what we see is the Modern Data Platform. It is not only a natural evolution of the Enterprise Data Warehouse but also a wider set of agile and future-ready capabilities, born out of the necessity to sustain a business. In short, what is necessary is to keep up with this progression and adapt to it with speed and accuracy.