Data engineering – State of the union & emerging trends

By Radhakrishnan Rajagopalan, Head of Data Engineering & Technology Practices, Tiger Analytics

Future-proofing your data infrastructure: Data engineering trends to watch

While Data Engineering has always been the nerve centre of any enterprise, the major headwinds brought about by Generative AI (GenAI) over the past year or so pose interesting challenges to the data infrastructure that enterprise teams have put in place. We’ve seen a significant increase in data volumes, especially multi-structured data, and data engineers have been grappling with supporting AI/ML models from design and development through training to production deployment. All in all, data processing and data quality continue to underpin the success of any AI model.

As per Gartner’s 2023 Hype Cycle for Emerging Technologies, GenAI and AI/ML will continue to influence enterprises across their data value chain, playing an inextricable role in all aspects of business. The sheer volume and availability of real-time and near real-time data, and the interest in gathering insights as events unfold, signal the need for a GenAI-integrated approach to Data Engineering. Based on our conversations with various clients and observations across the industry, here’s our take on the Data Engineering trends to watch:

1. Generative AI will accelerate data platform modernisation.

We’ve seen businesses migrate from one technology to another in search of differentiated platform capabilities and cost optimisation. Migrating legacy code to a new Cloud platform is likely to remain a major focus area, but with well-trained GenAI LLMs, models can now generate accurate code at unprecedented speed, outpacing every other tool in the industry so far. Several of our clients have opted to move from on-prem legacy platforms to cloud-native data processing and management solutions, and are leveraging GenAI capabilities along the way.

2. Data Observability will gain traction with anomalies detected from real-time and near real-time data

Data Observability is an emerging space, with clients looking to observe data, pipelines, platforms and more, far more closely than before. Data can be observed with quality and monitoring solutions that continuously probe for data pattern changes in real time and propose actions. Processes and pipelines can also be observed by tapping into changes as they occur and gathering insights. At Tiger, we developed a GenAI solution for a global financial services giant to solve data and platform observability problems. The solution uses GenAI models that learn data patterns on the fly and help detect anomalies in real-time data.
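As a rough illustration of the idea (not Tiger’s actual GenAI-based solution), a rolling z-score over a metric stream is one minimal way to flag values that break from recently learned patterns; the window size, threshold and sample readings below are illustrative assumptions:

```python
from collections import deque
import math

def detect_anomalies(stream, window=20, threshold=3.0):
    """Flag values deviating more than `threshold` standard
    deviations from the rolling mean of the last `window` points."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((x - mean) ** 2 for x in recent) / window
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > threshold:
                anomalies.append((i, value))
        recent.append(value)  # the "learned pattern" updates continuously
    return anomalies

# A steady metric with one spike: only the spike is flagged.
readings = [10.0 + (i % 3) * 0.1 for i in range(50)]
readings[40] = 55.0
print(detect_anomalies(readings))  # [(40, 55.0)]
```

A production observability solution would replace the z-score with a learned model and attach alerting, but the detect-as-data-arrives loop is the same.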

3. FinOps for better cost optimisation

Cloud cost management is a very important focus area for many organisations, which are looking for ways to manage cloud spend by enabling better controls across the data ecosystem. With the variety of services available in the Cloud, cost-related challenges are likely to persist. GenAI models are well suited to analysing data, understanding patterns and making the right recommendations. We’ve developed a solution that feeds cloud consumption data, along with other logs, into a GenAI model to generate cost-saving insights, and the solution has generated substantial cost savings.
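To sketch one signal such a solution might surface, the snippet below aggregates consumption records by service and month and flags services whose spend jumped sharply, the kind of finding a GenAI model could then explain or act on. The billing records, service names and 30% threshold are hypothetical:

```python
from collections import defaultdict

def cost_jump_report(records, pct_threshold=30.0):
    """Given (service, month, cost) records, report services whose
    latest-month spend grew more than `pct_threshold` percent over
    the prior month -- candidates for a FinOps review."""
    by_service = defaultdict(dict)
    for service, month, cost in records:
        by_service[service][month] = by_service[service].get(month, 0.0) + cost
    report = []
    for service, monthly in by_service.items():
        months = sorted(monthly)
        if len(months) < 2:
            continue  # need two months to compare
        prev, last = monthly[months[-2]], monthly[months[-1]]
        if prev > 0:
            pct = (last - prev) / prev * 100
            if pct > pct_threshold:
                report.append((service, round(pct, 1)))
    return sorted(report)

billing = [
    ("storage", "2024-01", 100.0), ("storage", "2024-02", 110.0),
    ("compute", "2024-01", 200.0), ("compute", "2024-02", 320.0),
]
print(cost_jump_report(billing))  # [('compute', 60.0)]
```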

4. Data governance and management through Generative AI

Data cataloguing solutions capture a wide variety of metadata, helping users explore and discover data and uncover new insights. While this is an excellent capability, the process of building, managing and democratising metadata is complex and requires manual intervention. Business users look for business definitions of attributes, while databases and data extraction and transformation tools typically gather only technical metadata. Other use cases, such as metadata search and enriching metadata with ownership and sensitive-data information, demand similar manual effort. Tiger has leveraged GenAI to solve some of these complex problems, using data models to link and enrich metadata in a robust way.
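As a hedged sketch of one piece of this workflow, the snippet below composes a prompt asking a language model to draft a business definition from technical metadata. The table and column names are hypothetical, and the actual model call is omitted since it depends on the provider’s API:

```python
def build_enrichment_prompt(table, column, dtype, sample_values):
    """Compose a prompt asking a language model to draft a
    business-friendly definition for a technical column."""
    return (
        f"Table: {table}\nColumn: {column} ({dtype})\n"
        f"Sample values: {', '.join(map(str, sample_values))}\n"
        "Write a one-sentence business definition of this column "
        "suitable for a data catalogue."
    )

# Hypothetical column from an orders table; the prompt would be sent
# to whichever LLM the catalogue integrates with.
prompt = build_enrichment_prompt(
    "orders", "ord_dt", "DATE", ["2024-01-03", "2024-01-05"])
print(prompt)
```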

5. Data democratisation through chatbots and more

In most enterprises, data is consumed and analysed through a wide variety of channels – SQL queries, BI tools, API-based applications etc. While these meet most needs, other business users look for easier and more robust solutions. Data Democratisation measures are likely to get a major boost with GenAI, which can address user needs through intelligent chatbots that help users interact with a wide variety of data. GenAI models can be trained to learn the data models and nuances of the business, identify business context and also identify gaps. When a user engages with such a solution, the underlying model interprets the request, generates the equivalent query, executes it and returns the appropriate results. Businesses across the spectrum (Financial Services, Insurance, Hi-Tech etc.) can benefit from such solutions.
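A minimal sketch of that question-to-query loop, with a deterministic stand-in where a real deployment would prompt a GenAI model with the schema and the user’s question; the sales table and the question mapping are illustrative assumptions:

```python
import sqlite3

SCHEMA = "CREATE TABLE sales (region TEXT, amount REAL)"

def fake_model_generate_sql(question):
    """Stand-in for a GenAI model: a real system would send the schema
    and question to an LLM. This hypothetical mapping keeps the demo
    deterministic."""
    if "total sales by region" in question.lower():
        return ("SELECT region, SUM(amount) FROM sales "
                "GROUP BY region ORDER BY region")
    raise ValueError("question not understood")

def answer(question, conn):
    sql = fake_model_generate_sql(question)  # model drafts the query
    return conn.execute(sql).fetchall()      # query runs against the data

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100.0), ("West", 50.0), ("East", 25.0)])
print(answer("What are total sales by region?", conn))
# [('East', 125.0), ('West', 50.0)]
```

In production the generated SQL would also be validated and sandboxed before execution, since model output cannot be trusted blindly.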

6. Better knowledge management and collaboration through enterprise knowledge bots

Technical and business communities need help extracting and leveraging the knowledge locked in enterprise architecture documents, business information documents, technical flows and the like. This need for collaboration and knowledge exchange will continue to lead clients towards user-friendly knowledge management (KM) solutions. At Tiger, we’ve seen clients prototype knowledge bots for their communities that serve up information from unstructured content in a meaningful way. A conversational interface helps non-technical team members easily interact with enterprise content via chat. With the growth of GenAI, we are likely to see many enterprise clients build knowledge bots to generate insights from these untapped data sources.
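One building block of such a knowledge bot is retrieval: finding the chunk of enterprise content most relevant to a question before a GenAI model composes the answer. The sketch below uses simple bag-of-words cosine similarity as a stand-in for the embeddings a production system would use; the documents are hypothetical:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, documents, top_k=1):
    """Return the `top_k` chunks most similar to the question; a full
    knowledge bot would pass these to a GenAI model to answer."""
    q = Counter(question.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in documents]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:top_k]]

docs = [
    "The nightly batch loads customer orders into the warehouse.",
    "Access requests for the BI tool go through the platform team.",
]
print(retrieve("who approves access requests for the bi tool", docs))
```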

7. The evolution of the full-stack data engineer

With the advent of GenAI and other converging technologies, demand for Data Engineers will increase, but the role itself is likely to evolve. While big data, cloud and modern BI specialisation will remain the core requirement for organisations, we’re likely to see Data Engineers take on full-stack roles. End-to-end data pipeline expertise across a heterogeneous tech stack, a holistic view of data architecture and experience with varied technologies will give them a definite edge over their peers.

As per Forrester’s 2023 advisory, Augmented AI, MLOps and DataOps are key to successfully operationalising AI. At Tiger Analytics, here are a few of our observations:

The rise of augmented and metadata-driven data management

The use of metadata is becoming increasingly important in managing large data assets effectively. Augmented data management uses AI and machine learning to improve the metadata that drives data management processes, while augmented analytics, which leverages AI and ML to automate data preparation and enable deeper insights, is a growing trend in its own right. We’re seeing a lot of interest among our clients in adopting such solutions and augmenting their enterprise data management with them. Since augmented analytics can help uncover hidden patterns and provide actionable insights without the need for expert data scientists, we’re likely to see more enthusiastic adoption over the next few years.

DataOps and MLOps as levers for scalability

DataOps and MLOps are operational frameworks that apply the principles of DevOps to data analytics and machine learning, respectively. They emphasise automation, continuous integration/continuous deployment (CI/CD), and a collaborative approach to managing data workflows and machine learning models, leading to faster and more reliable outcomes. Given the need for businesses to scale up and streamline their processes, we see interest among our clients in cross-leveraging their knowledge, experience and investments in release planning and management into these areas.
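As a small concrete example of the DataOps emphasis on automation and CI/CD, a quality gate like the one below can run in a pipeline’s CI stage and fail the build before bad data is promoted; the field names and rules are illustrative assumptions:

```python
def validate_batch(rows, required=("id", "amount"), min_rows=1):
    """A DataOps-style quality gate: returns a list of errors, so CI
    can fail fast on missing fields or empty loads before promotion."""
    errors = []
    if len(rows) < min_rows:
        errors.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) in (None, ""):
                errors.append(f"row {i}: missing {field}")
    return errors

# Hypothetical batch with one bad record; in CI this result would
# be asserted empty before the pipeline deploys.
batch = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": None}]
print(validate_batch(batch))  # ['row 1: missing amount']
```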
