By Prof. Smitha Rao, School of Computational and Data Science, Vidyashilp University
The knowledge of anything, since all things have causes, is not acquired or complete unless it is known by its causes. – Ibn Sina
Data Science is the science of extracting information and value from data to enable data-driven decision-making. In essence, it’s all about using data to solve real-world problems. As an interdisciplinary science, Data Science borrows its theories and practices from Statistics, Mathematics, Computer Science, and various other application domains such as Economics, Social Sciences, Psychology, Health Care, Business, and Finance, among others. Technologies that fuel the advancement of Data Science are Artificial Intelligence, Machine Learning, Deep Learning, Reinforcement Learning, and Natural Language Processing, to name a few. Data Scientists aim to draw reliable predictions and conclusions through the thorough analysis of massive quantities of current and historical, multivariate data.
Recent advances in data availability, emerging technologies, computational power, and the maturity of AI algorithms have made Data Science applicable to nearly every domain. Beyond unlocking value in business applications, it is now considered an indispensable component of fields such as Social Sciences, Drug Discovery, Life Sciences, and Molecular Biology. Artificial Intelligence, and Machine Learning in particular, has made valuable contributions to a range of scientific research and development initiatives. This article explores two use cases in which Data Science is employed for the betterment of life and society: Data Science in Development Economics, and Data Science in Proteomics and Drug Discovery.
Data Science in Development Economics
Conducting national censuses is an expensive and elaborate task. Many developing and under-developed countries have conducted only a few poverty surveys, leaving policymakers and researchers without the reliable data they need to design robust solutions. This gap has paved the way for machine learning algorithms that predict such parameters from alternative, correlated data. For example, poverty levels can be estimated from mobile phone records and from high-resolution satellite imagery, in which features such as metal roofs, paved roads, and night-time lights help identify higher-income areas and predict wealth. A host of AI algorithms are also used to detect, manage, and predict various other outcomes, such as agricultural yields, weed infestation, and literacy levels.
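To make the idea of predicting wealth from a correlated proxy concrete, here is a minimal sketch in Python. It assumes a single proxy feature (night-time light intensity) and a simple least-squares line; the numbers, the units, and the function names `fit_line` and `predict` are invented for illustration and are not taken from any real survey or from the studies alluded to above, which use far richer features and models.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical, made-up data: night-time light intensity (arbitrary units)
# for villages that WERE surveyed, and their survey-based wealth index.
surveyed_light = [1.0, 2.0, 3.0, 4.0, 5.0]
surveyed_wealth = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(surveyed_light, surveyed_wealth)

# Estimate the wealth index of an unsurveyed village from its light intensity.
estimate = predict(model, 6.0)
print(f"Estimated wealth index: {estimate:.1f}")
```

The point of the sketch is the workflow rather than the model: fit on the few places where survey data exists, then use cheap proxy data to estimate the places where it does not.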
Data Science in Proteomics and Drug Discovery
Proteomics is the large-scale study of proteins and their structure. Proteins, present in every cell of the human body, are the fundamental building blocks of life. The folded three-dimensional structure of a protein determines its function. Predicting the 3D structure of a protein from its sequence has been a long-standing and intensely researched problem, owing to the large number of proteins and the computationally intensive nature of the task. Data in this domain is massive and highly complex. Today, deep learning systems such as AlphaFold have addressed this challenge by predicting the structure of proteins, given their sequence, with reasonably high accuracy. This has paved the way for drug discovery and further research into diseases associated with aberrant protein structures, such as Alzheimer’s and Parkinson’s disease. Machine Learning also contributes to clinical research by improving the pre-trial phase, data analysis, participant selection, and trial management.
Data Science generally relies on credible and unbiased data, which is not always available, and AI predictions are not 100% accurate. Given this uncertainty, reliable systems are those that support active interaction between humans and machines, an approach termed “Human-in-the-Loop” (HITL). In a HITL system, continuous feedback from humans is used to adjust the machine’s behaviour, which improves the learning process and makes the system more reliable. We believe that the augmented power of AI will enable us to find tangible solutions to most of the problems plaguing the world today.
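As a rough illustration of the Human-in-the-Loop idea, the Python sketch below routes low-confidence predictions to a human reviewer and collects the corrected labels for later retraining. The confidence threshold, the function names `route_predictions` and `apply_feedback`, and the example data are all hypothetical; real HITL systems vary widely in how they gather and use feedback.

```python
def route_predictions(predictions, threshold=0.8):
    """Split (item, label, confidence) tuples into auto-accepted results
    and a queue of uncertain results that a human should review."""
    accepted, review_queue = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item, label))
        else:
            review_queue.append((item, label))
    return accepted, review_queue


def apply_feedback(review_queue, human_labels):
    """Replace the machine label with the human label where one was given.
    The result can be added to the training data for the next model update."""
    return [(item, human_labels.get(item, label)) for item, label in review_queue]


# Hypothetical model output: (item, predicted label, confidence).
predictions = [
    ("village_a", "high_income", 0.95),
    ("village_b", "low_income", 0.55),
    ("village_c", "high_income", 0.62),
]

accepted, review_queue = route_predictions(predictions)

# A human reviewer confirms village_b (no change) and corrects village_c.
human_labels = {"village_c": "low_income"}
corrected = apply_feedback(review_queue, human_labels)

print("Accepted automatically:", accepted)
print("Corrected by human:", corrected)
```

Only the uncertain cases consume human attention, and each correction becomes a new training example, which is one simple way the feedback loop described above can be closed.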