Working for Big Data
It is high time aspirants readied themselves for the multifarious jobs coming up in the domain of big data, says Sukhvinder Pal Singh
Big data is creating a buzz in the world today. Google Trends shows that interest in the term “big data” has been rising consistently over the last three years and regularly reaching new peaks. A search for “big data” on Google throws up more than 30 million results and over 100,000 news items.
Digital companies like Google, Yahoo, Facebook and Amazon are already leveraging the potential of big data and other enterprises are moving toward it. According to Gartner, by 2015 nearly 4.4 million new jobs will be created globally by big data demand and only one-third of them will be filled. There is already a demand-supply gap in the market and employers are willing to pay higher salaries to qualified professionals. All this is making big data an attractive career option.
A search for the keywords “big data” on a major US job portal shows over 8,000 jobs in the USA (with hundreds of different titles) from companies such as Intuit, Microsoft, Amazon and eBay. A closer analysis shows that the majority of these jobs belong to three primary categories: data scientists, big data architects, and big data engineers.
Data scientists
The data scientist's job has been termed the sexiest job of the 21st century. According to the McKinsey Global Institute, data scientists are so much in demand that there won’t be enough of them to fill every position by 2018. The data scientist role is not exclusively tied to big data, but demand for data scientists has grown exponentially with the growth of big data.
Data scientists are professionals who typically hold a degree in a STEM (Science, Technology, Engineering, and Mathematics) field, preferably with an advanced degree in statistics, mathematics, analytics or machine learning. Those with a PhD have a further advantage. Data scientists have deep analytical skills and are capable of analysing large volumes of data to derive business insights. To work with big data, they should also have an overview of the various big data technologies (such as Hadoop, NoSQL, etc.).
The data scientist role can also be seen as an evolution of roles such as data analyst or business analyst. While a data analyst will typically collect data, look at it from one dimension and publish reports, a data scientist will look at enormous sets of data from all dimensions and publish insights which would otherwise have remained hidden. Data scientists should have strong business acumen to act as a bridge between business and IT. Good data scientists will not just find the right solution to a business problem; they will also find the right problem to be addressed.
Typically, a data scientist will analyse the data to spot business opportunities, create a mathematical model and then convert that mathematical model into a pseudo algorithm for computerization.
A variety of training options is available for those interested in making a career in data science, ranging from short-term courses in big data, data science or predictive analytics to M.S. degrees. In India, ISB provides a one-year part-time course in predictive analytics. Short- to medium-term courses are also available from the IITs and IIMs. Universities and institutes in the U.S. provide options for online and on-campus M.S. degrees in predictive analytics.
Big data architects
Big data architects are solution architects specializing in big data technologies. They convert the vision of a data scientist into a technical blueprint. They are responsible for implementation of complete big data solutions, including platform selection, solution architecture, data acquisition, storage, transformation, and analysis. They should have a solid understanding of infrastructure planning, scaling, and administration considerations that are unique to big data products.
A big data architect generally comes with a minimum of 10 years of experience in conventional solutions architecture, followed by training and experience in a variety of big data technologies and solutions such as Hadoop, MapReduce, Oozie, Mahout, ZooKeeper, Hive, HBase, MongoDB, Pig, Ambari, Chukwa and NoSQL. In addition, a big data architect should have experience in designing large data warehouse solutions and an in-depth understanding of programming and scripting languages like Java, PHP, Ruby, Python and R. BI architects with experience in distributed RDBMS and in ETL and BI tools such as Informatica, MicroStrategy and Pentaho have an advantage. They should also be (at least) aware of cloud computing technologies.
Good solution and technical architects can switch over to big data with training in big data technologies. Big data engineers can also become big data architects with training in technical and solution architecture frameworks such as TOGAF. There are institutes and universities offering technical courses in big data. One such source, BigDataUniversity.com, can be a good starting point.
Big data engineers
Big data engineers develop computer programs based on the ‘solution blueprint’ prepared by big data architects. They implement, test and maintain the big data solution, and optimize it to provide the best performance. They may also assist architects during the solution architecture and design phase. They should have experience in the big data technologies being used by the organization, e.g., Hadoop, MongoDB and NoSQL, as well as in the implementation of data warehouse solutions.
Software engineers with 4-5 years of experience in object-oriented design and coding can move to big data with training in the relevant technologies. Those with a background in Java, PHP, Python, C++, SQL, NoSQL, ETL tools and data warehousing may have an edge. An engineering degree in computer science provides an advantage.
Aspiring big data engineers can also pursue the same kinds of training described for big data architects above. Organizations such as Hortonworks and Cloudera specialize in providing training on big data technologies.
Besides these three categories, roles such as product managers and data visualizers are also in demand. The field of big data is wide and open; if you want to play the game, it is time to pull your socks up.
Sukhvinder Pal Singh is Director of Competencies at Fujitsu Consulting India.