Cloud computing services are growing rapidly in enterprises and play a dominant role in digital transformation strategies. Most new investment in software applications is happening in the cloud, driven by the benefits of cost reduction, elasticity, agility, faster time-to-value, disaster recovery and reduced DevOps overhead. However, the relative ease with which SaaS applications can be deployed means that enterprises are taking a best-of-breed application stack approach, and this is creating more data silos. Hybrid IT landscapes have become commonplace, where on-premises operational applications have to co-exist with multiple cloud solutions. Integrating these IT systems, which cross network boundaries and have varied data, application, security and service models, to provide a unified view of the business remains an Achilles' heel for many.
Data management teams are realizing that they need to reboot legacy data platforms and generally accepted architectural practices to meet the continually evolving requirements of exploratory analysis, reporting, visualization and predictive analytics. Organizations should appraise several trends and choices before upgrading their existing data integration infrastructure to move data seamlessly between in-house and externally hosted applications.
iPaaS: iPaaS (Integration Platform-as-a-Service) suites are being widely adopted for data and application integration, offering the benefits of managed services and elasticity. Ease of use, simpler pricing models and the availability of integration marketplaces continue to be a huge draw for those flocking to iPaaS solutions. Many iPaaS platforms have been built from the ground up to solve the cloud's unique challenges, and the general verdict on their efficacy for lightweight to medium-complexity data integration scenarios remains favorable.
Self-service: All areas of analytics are transitioning to a self-service model, and data integration is no different. Data integration projects have traditionally been notorious for long lead times, and IT has often been unable to meet the data needs of business users in a timely manner. IT managers are therefore happy to crowdsource some of the integration work to the business. A new role, the citizen integrator, is emerging: users with minimal technical skills use intuitive, browser-based drag-and-drop interfaces to integrate disparate systems themselves and quickly obtain the data they need for decision making, or to feed training and test datasets for data science tasks.
API Centric: On-premises applications traditionally expose SQL interfaces, so legacy ETL pipelines are geared towards handling large tabular structures. In contrast, SaaS applications expose their data through REST and SOAP APIs, so data integration tools must understand web service protocols. Out-of-the-box connectors to popular SaaS applications speed up development and ease ongoing maintenance.
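To illustrate the shift, here is a minimal sketch of the API-centric pattern: paging through a REST endpoint and flattening the nested JSON into the tabular shape a downstream pipeline expects. The endpoint URL, the data/next response envelope and the credential are all hypothetical placeholders, not any particular vendor's API.

```python
import requests
import pandas as pd

BASE_URL = "https://api.example-saas.com/v1/accounts"  # hypothetical SaaS endpoint
HEADERS = {"Authorization": "Bearer <token>"}          # placeholder credential

def fetch_all(url, headers):
    """Page through a REST API, collecting every record."""
    records = []
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload["data"])   # assumed response envelope
        url = payload.get("next")         # assumed pagination link
    return records

# Flatten the nested JSON records into the tabular form legacy pipelines expect.
rows = pd.json_normalize(fetch_all(BASE_URL, HEADERS))
rows.to_csv("accounts.csv", index=False)
```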
Cloud Storage and Databases: Compared to the traditional data warehouse approach, data lakes are gaining more traction. In this approach, data is ingested from multiple sources in its native form into a data lake; to serve particular analytical use cases, it is then prepped and pushed into a dimensionally modelled warehouse for end-user consumption. The data lake itself may be hosted on cloud storage, which is typically cheaper than on-premises storage, and the integration platform should support connectivity to the popular cloud data warehouses.
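A minimal sketch of that lake-then-warehouse flow follows, assuming a hypothetical S3 bucket (acme-data-lake) and a Snowflake-style warehouse stage; the bucket, paths, date partition and table names are illustrative only.

```python
import boto3

s3 = boto3.client("s3")

# 1. Land the extract in its native form in the "raw" zone of the data lake
#    (hypothetical bucket and partition path).
s3.upload_file(
    "accounts.csv",
    "acme-data-lake",
    "raw/crm/accounts/2024-01-15/accounts.csv",
)

# 2. A downstream prep job then shapes the raw data for a specific use case
#    and bulk-loads it into a dimensionally modelled warehouse table,
#    e.g. with a Snowflake-style statement:
#
#    COPY INTO analytics.dim_account
#      FROM @data_lake_stage/raw/crm/accounts/2024-01-15/
#      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```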
Multi-volume, multi-velocity and multi-latency: The data integration platform should be able to handle large volumes of data and support compute frameworks such as Apache Hadoop and Spark. It should accommodate data velocities ranging from batch to streaming, and orchestrate varied integration use cases: high-volume bulk workloads, low-volume micro-batch feeds, point-to-point and data-hub workflows, asynchronous messaging, microservices, file transfers, data synchronization, replication and complex data transformation patterns.
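As a sketch of how one engine can span those velocities, the fragment below uses PySpark to bulk-load a historical extract in batch and then applies the same DataFrame API to a micro-batched Kafka feed; the storage paths, broker address and topic name are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("hybrid-ingest").getOrCreate()

# Batch: bulk-load a high-volume historical extract from the lake.
history = spark.read.parquet("s3a://acme-data-lake/raw/orders/")
(history.filter(col("status") == "COMPLETE")
        .write.mode("overwrite")
        .parquet("s3a://acme-data-lake/curated/orders/"))

# Streaming: a low-latency Kafka feed processed in micro-batches,
# handled with the same DataFrame API as the batch job above.
orders = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
          .option("subscribe", "orders")                     # assumed topic
          .load())

query = (orders.selectExpr("CAST(value AS STRING) AS payload")
         .writeStream
         .format("parquet")
         .option("path", "s3a://acme-data-lake/streaming/orders/")
         .option("checkpointLocation", "s3a://acme-data-lake/_checkpoints/orders/")
         .start())
```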
The growing trend is to support data governance with certified workflows while also empowering data-savvy business users to bypass the IT queue and catalyze innovation by mining datasets drawn from a multitude of on-premises and cloud applications. Extending data integration infrastructure to natively support enterprise, internet and cloud-based data sources and targets is vital. Being able to implement self-healing processes that cover cloud-to-cloud, cloud-to-ground and ground-to-cloud integrations across a hybrid data environment in an agile and scalable way is the key. Modernizing integration platforms also becomes crucial to benefit from machine learning algorithms that reduce the onerous tasks of mapping source and target fields, performing data quality checks, generating data-masking recommendations and recognizing patterns in data in motion. Information workers would agree that the growth of cloud computing offers both challenges and solutions for data integration tasks.
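As a toy illustration of that machine-assisted field mapping, the sketch below uses simple string similarity, a stand-in for the learned models commercial platforms apply; the source and target schemas are invented for the example.

```python
from difflib import SequenceMatcher

# Invented source and target schemas, for illustration only.
source_fields = ["cust_nm", "addr_line_1", "acct_open_dt"]
target_fields = ["customer_name", "address_line_1", "account_opened_date"]

def suggest_mapping(src, candidates):
    """Rank candidate target fields by name similarity to the source field."""
    return max((SequenceMatcher(None, src, c).ratio(), c) for c in candidates)

for src in source_fields:
    score, tgt = suggest_mapping(src, target_fields)
    print(f"{src} -> {tgt}  (confidence {score:.2f})")
```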
Authored by Shidhartha Sankar Pati, Data Professional and Development Manager, CDK Global