At the recent EMC Forum held in Mumbai, the company discussed how data has grown, and continues to grow, enormously, and argued that it is not just the need to store this data but the need to analyse it that will drive its market in future.
Each day, the world generates huge amounts of data, not just at the enterprise level but also at the consumer level, with every digitally connected individual contributing to it. Data volumes are already tipping the scales at petabytes and exabytes.
Quoting an industry study, Rajesh Janey, President, EMC India & SAARC, says, “A recent study in India showed that data will grow from 40,000 petabytes in 2010 to 2.3 million petabytes by 2020. The focus will be on how to analyse this vast volume of data to generate insight, inference and knowledge in a way that ‘what if’ scenarios can be defined, rather than just taking decisions based on analysis of historical data.”
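To put the quoted figures in perspective, the short calculation below works out the overall growth multiple and the compound annual growth rate (CAGR) they imply. It uses only the two numbers from the study as quoted; the function names are illustrative, and the CAGR formula assumes smooth compounding over the ten years, which real data growth need not follow.

```python
# Figures from the study quoted above, in petabytes.
DATA_2010_PB = 40_000
DATA_2020_PB = 2_300_000
YEARS = 2020 - 2010


def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    multiple = growth_multiple(DATA_2010_PB, DATA_2020_PB)
    rate = cagr(DATA_2010_PB, DATA_2020_PB, YEARS)
    print(f"Growth multiple: {multiple:.1f}x")
    print(f"Implied CAGR: {rate:.1%}")
```

On these numbers, data grows about 57.5-fold over the decade, which works out to roughly 50% a year, compounded.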
The company defines Big Data as data sets too large for traditional IT infrastructure to handle. Because each system would have a lot of data to process, enterprises would need scale-out systems, which spread the workload across additional nodes; scaling up individual systems as resource requirements increase would not help much.
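The scale-out idea can be illustrated with a minimal sketch. This is not EMC's or Greenplum's implementation: records are hash-partitioned across a hypothetical number of nodes, each node aggregates only its own shard, and a coordinator merges the partial results. Adding nodes makes each shard smaller instead of demanding a bigger single machine. The function names and the counting workload are assumptions chosen for illustration.

```python
import zlib
from collections import Counter
from typing import Iterable


def partition(records: Iterable[str], num_nodes: int) -> list[list[str]]:
    """Hash-partition records so each one lands on exactly one node.

    crc32 is used instead of Python's built-in hash() because it gives the
    same result in every process, as separate machines would require.
    """
    shards: list[list[str]] = [[] for _ in range(num_nodes)]
    for record in records:
        shards[zlib.crc32(record.encode("utf-8")) % num_nodes].append(record)
    return shards


def local_count(shard: list[str]) -> Counter:
    """Work done independently on one node: count records in its shard."""
    return Counter(shard)


def scale_out_count(records: Iterable[str], num_nodes: int) -> Counter:
    """Partition, aggregate per node, then merge the partial counts."""
    shards = partition(records, num_nodes)
    # On a real cluster these calls would run in parallel on separate nodes.
    partials = [local_count(shard) for shard in shards]
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    events = ["login", "purchase", "login", "search", "login"]
    print(scale_out_count(events, num_nodes=3))
```

Because every occurrence of a given key hashes to the same node, the merged totals are identical whether the data is spread across one node or many.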
Based on this scale-out approach, EMC is offering its Greenplum data analytics solution, which uses a Hadoop-based file system to handle Big Data. The company already has several Big Data customers in India: one each in the BFSI, retail and government sectors, and two in telecom.
Describing India as one of its key markets for Big Data, Janey says, “Our biggest customer for Greenplum is from India, and we are in the process of signing up another one.”
While India offers plenty of opportunities as a source of Big Data customers, EMC believes that the country will also prove to be a key source of skilled resources who can be trained to work in Big Data environments. Although Greenplum provides the database and the analytics platform, the company believes that Big Data will create a need for data scientists who can understand business analytics and the data generated by solutions such as Greenplum, and who can help build applications that address an enterprise's specific analytical requirements.
Their core job would be to examine the data being churned out and correlate it, so that the platform allows the enterprise to make decisions predictively. However, the country currently lacks training courses that provide the specific skill sets a data scientist needs, and this is where EMC is looking to leverage its long-established training and certification practice in India.
EMC launched the ‘EMC Academic Alliance’ in 2005 with a course on Information Storage Management. Designed as a product-independent, industry-standard course, it aimed to give engineering students the technology skills that could help them take up jobs in storage.
The company signed up several universities for this. According to Janey, this was around the time when an IDC study, for instance, predicted a shortage of 100,000 storage professionals.
Over the last seven years or so, more than 200 colleges have signed up for the alliance and more than 100,000 students have completed the course. These students are absorbed across the entire ecosystem, which includes EMC itself, its partners and customers, and sometimes even its competitors.
Says Janey, “A recent Nasscom study put the Big Data market in India at close to a billion dollars. However, the same organisation also recently observed that though we have engineers in the country, almost 75% of them are not employable.”
“This is precisely why we are bringing a Cloud and data scientist curriculum to India. We launched the new course at the EMC Forum, and for the launch we had three vice chancellors from different universities, with a total of six universities represented,” he adds.
The curriculum is currently designed for engineering students with a background in mathematics and statistics, since a data scientist's skill set overlaps with those of a statistician, a programmer and a visualiser.
Given the growth in digital information, much of it driven by social media and mobile devices, the industry believes that Big Data will slowly but steadily become an important approach. Beyond human interactions, machines today also generate data logs and other digital information which, when put into context, can be used to predict patterns and find business applications. Janey believes that besides data generated through business and consumer interactions, machine data will be a big contributor to Big Data analytics.
For instance, he says, traffic police can use cell-phone movement data to track traffic jams. Mobile operators can follow handsets as they move between cells and can therefore track their approximate location. During peak hours, a cluster of stationary cell-phone signals on a road can indicate that many users are stuck in traffic. This data can be collected from various locations and analysed to provide real-time traffic analysis to both the traffic police and the general public.
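The traffic example can be sketched as a simple rule: flag a cell as congested when enough phones have stayed in it, without handing over to another cell, for longer than some dwell time. This is a hypothetical illustration, not an operator's actual system. It assumes pings carrying a phone ID, a cell ID and a timestamp in minutes, that each cell maps to a single road segment, and that the thresholds (10 minutes, 3 phones) are placeholders to be tuned. It also simplifies: a phone moving slowly inside a large cell would look stationary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    phone_id: str
    cell_id: str  # hypothetical: each cell is assumed to cover one road segment
    minute: int   # minutes since the start of the observation window


def stationary_phones(pings, min_dwell):
    """Return {cell_id: set of phone_ids} for phones that stayed in one cell
    for at least min_dwell minutes before moving to another cell."""
    by_phone = defaultdict(list)
    for p in pings:
        by_phone[p.phone_id].append(p)

    stuck = defaultdict(set)
    for phone_id, seq in by_phone.items():
        seq.sort(key=lambda p: p.minute)
        run_start = 0
        for i in range(1, len(seq) + 1):
            # A run of same-cell pings ends at the last ping or on a cell change.
            if i == len(seq) or seq[i].cell_id != seq[run_start].cell_id:
                dwell = seq[i - 1].minute - seq[run_start].minute
                if dwell >= min_dwell:
                    stuck[seq[run_start].cell_id].add(phone_id)
                run_start = i
    return stuck


def congested_cells(pings, min_dwell=10, min_phones=3):
    """Cells holding at least min_phones stationary phones, sorted by ID."""
    stuck = stationary_phones(pings, min_dwell)
    return sorted(cell for cell, phones in stuck.items() if len(phones) >= min_phones)


if __name__ == "__main__":
    sample = [Ping(f"p{n}", "A", m) for n in range(3) for m in (0, 15)]
    sample += [Ping("p9", "A", 0), Ping("p9", "B", 5), Ping("p9", "C", 10)]
    print(congested_cells(sample))
```

In the sample, three phones sit in cell A for 15 minutes while a fourth moves through A, B and C, so only cell A is flagged.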
Furthermore, a telecom value-added services (VAS) provider handling micro-payments for media downloads can cross-sell or up-sell its services. For instance, if a user frequently changes his ringtone, and a pattern can be drawn to determine the genre of music he prefers, the service provider can push targeted promotional offers and discounts for ringtones in that genre.
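The ringtone example boils down to finding a dominant genre per user and turning it into an offer. The sketch below uses a plain frequency count as a stand-in for whatever pattern analysis a provider would actually run. The log format (one `(user_id, genre)` pair per ringtone change), the thresholds and the 20% discount are all hypothetical.

```python
from collections import Counter, defaultdict


def preferred_genres(downloads, min_downloads=3, min_share=0.5):
    """Map each user to a dominant ringtone genre, if one clearly stands out.

    downloads: iterable of (user_id, genre) pairs, one per ringtone change.
    A genre counts as preferred when the user has made at least
    min_downloads changes and that genre accounts for more than
    min_share of them.
    """
    per_user = defaultdict(Counter)
    for user_id, genre in downloads:
        per_user[user_id][genre] += 1

    prefs = {}
    for user_id, counts in per_user.items():
        total = sum(counts.values())
        genre, top = counts.most_common(1)[0]
        if total >= min_downloads and top / total > min_share:
            prefs[user_id] = genre
    return prefs


def targeted_offers(prefs, discount_pct=20):
    """Build a hypothetical promotional message for each user's genre."""
    return {user: f"{discount_pct}% off {genre} ringtones" for user, genre in prefs.items()}


if __name__ == "__main__":
    log = [("u1", "bollywood"), ("u1", "bollywood"), ("u1", "rock"),
           ("u2", "rock"), ("u2", "pop"), ("u3", "jazz")]
    print(targeted_offers(preferred_genres(log)))
```

Requiring a minimum number of changes and a clear majority keeps the provider from targeting users whose history is too short or too mixed to indicate a real preference.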
Big Data can have a social impact too. For instance, data collected from schools in rural areas can be used to analyse why they are seeing high dropout rates. Janey cites the example of a village school in India that saw a high dropout rate among girl students because it did not have adequate toilet facilities for them. This factor did not show up in regular surveys, but emerged when seemingly unrelated data points were considered together and analysed using Big Data analytics.