Data Turns Big: What Next?
The intersection of big data with advanced analytics will not only help enterprises derive actionable insights from voluminous data but also contribute to creating the information architecture of the future, says Ashish Pachory
Consider this: the world creates approximately 2.5 quintillion bytes of data every day, and 90% of all data ever generated has been created in the last two years! This data originates from a multitude of sources: posts to social media sites, application downloads, digital pictures and video uploads, mobile GPS signals and so on. The result is a huge explosion in data volume and variety, leading to what is gaining prominence as big data.
With this tremendous data growth, and the variety and complexity surrounding it, it is imperative that traditional storage approaches be re-evaluated. The cost of storage and improvements in processing power, storage optimization and networking have all come together to fulfill a human and business need that was always there but could not be met. That is now changing.
How big is big?
Big data is fairly omnipresent, but one can’t necessarily put a number to it. It’s not as if you cross a certain number of bytes of stored data and find yourself in BIG territory. The thresholds for big data depend not only on size, but also on the structure and even the usefulness of the data. So what defines BIG? The fact is that data was not really considered an undisputed business asset until a few years ago, and hence no standard ‘measures’ are available. Now, however, with the advancement of technology, there is a real possibility of identifying the nuggets in the vast amounts of data that are routinely generated.
It is predicted that data will grow 800% over the next five years and that 80% of it will be unstructured. This colossal growth (volume) makes it hard to define a threshold for big data, because the volumes involved are a constantly climbing target. In practice, the threshold depends on the needs and capabilities of the organization. While in some cases a few hundred gigabytes may trigger a need to reconsider data management options, in others it may take hundreds of terabytes before data size becomes a significant consideration.
In addition to this, what matters is the speed (velocity) at which data can be processed to provide real-time insights. Whether it is analyzing geological information for a timely earthquake warning or spotting market trends to get ahead of the competition, speed is of the essence. Big data is only useful if it can yield insights in a timely fashion. Greater data volume yields more granular information, but increases the processing time. Also, the expanding ocean of data cannot be expected to conform to a single format: it can be structured, unstructured, somewhat structured, or a mix of these. The remarkable fact here is that the multiplicity of data formats, sources, sensing and gathering mechanisms, computing platforms and so on (variety) does not prevent big data from being processed in a consistent way, so that analytical tools can make sense of it and present useful insights.
The combination of these three, volume, velocity and variety, makes it almost impossible for conventional analytics tools and methods to meaningfully manage and analyze big data to its full extent. We need new ways of organizing data, and new computing techniques and algorithms to bring computational power in line with data growth. In other words, big data is only as useful as the analytics deployed to exploit it. These analytics must overcome any size, speed and structure limitations. Realizing big data’s full potential requires advanced analytics to cull and leverage data from inside and outside the organization.
This also requires a shift away from the concept of a single enterprise data warehouse that used to contain all the information needed for decisions. Multiple systems are the key, with specialists involved at each stage to speed up processes and manage data faster, resulting in quicker, more efficient and better business decisions.
Analytics – the key differentiator?
Big data, as discussed above, does not lend itself to relational databases and desktop visualization tools. Also, its size, structure and speed are a variable mix, depending on the needs and capabilities of the organization managing it.
So what is THE thing that makes big data so desirable? It is its intersection with advanced analytics. The world of Big Data Analytics (BDA) is quite different from our familiar world of data processing, management and analysis. Apart from its innate ability to juggle different data types, structures and I/O speeds, BDA has to work with completely new processing and programming models. It is an ecosystem that needs to be carefully planned and implemented; a combination of processing technologies all working in parallel on distributed servers. You cannot just buy an application to make big data analytics happen. It is an evolutionary process.
What the future holds
The fact is that there is already an increasing sense of urgency around big data, and as businesses establish faster and stronger connections with their customers, the case for big data becomes stronger. Its barriers are the same as with any new transformational concept: ownership, RoI, skill sets, capabilities and so on. But these are merely temporary hiccups and will go away as the technology matures and gains stakeholder confidence.
Big data enables you to dive deeper into more varied and voluminous records to yield actionable insights that could not be accessed earlier. As it is emerging concurrently with a host of complementary trends, such as cloud computing, social media and enterprise mobility, we may see a very new kind of convergence, one that brings all of these trends together to create THE enterprise information architecture of the future, quite different from today’s landscape of disjointed applications and databases somehow connected together.
Big data is here to stay and grow. It is going to be a key driver in enabling not just the growth of the enterprise, but its very sustenance. And when this happens, where would you rather be?
Ashish Pachory is CIO of Tata Teleservices Limited.