By Tathagato Rai Dastidar, Co-founder & CEO, SigTuple
The field of artificial intelligence in general, and machine learning in particular, has undergone a sea change in the last few years. The way machine learning works can be briefly summarized as follows:
– A parameter-driven classification algorithm is formulated. This is used to derive “decisions” from “input data”. Some popular choices of classification algorithms include classification trees, support vector machines (SVM), and artificial neural networks (ANN).
– The classification algorithm is “trained” with retrospective “input data” where the “decisions” (either made by humans or made as part of a process) are already known. The algorithm “learns” which type of data leads to which decision. The “training” process involves tuning the parameter values of the algorithm.
– Given hitherto unseen input data, the trained model can infer what “decision” it leads to. The “accuracy” of a model can be roughly defined as the proportion of cases where the model “gets it right”. (A minimal code sketch of this workflow follows below.)
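To make the workflow above concrete, here is a minimal sketch in Python using the open-source scikit-learn library. The dataset (handwritten digits) and the classifier (a classification tree) are illustrative choices of mine, not anything specific to the discussion above; any labeled data and any parameter-driven classifier would follow the same pattern.

```python
# A minimal sketch of the train -> infer -> measure-accuracy loop described above.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# "Input data": images of handwritten digits, flattened into feature vectors.
# "Decisions": the digit (0-9) each image represents, already labeled.
X, y = load_digits(return_X_y=True)

# Hold out a portion of the data to play the role of "hitherto unseen" inputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A parameter-driven classification algorithm (here, a classification tree).
model = DecisionTreeClassifier(random_state=0)

# "Training": the algorithm tunes its parameters on retrospective, labeled data.
model.fit(X_train, y_train)

# Inference on unseen data; "accuracy" is the proportion of cases it gets right.
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

Swapping the classification tree for a support vector machine or a neural network changes only the model line; the train-and-infer pattern stays the same.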
The “input data” in the above discussion can be rows of entries from a transaction database, images, audio clips, moves of a chess game – in short, practically anything.
While the basic underlying techniques, e.g. artificial neural networks (ANNs), have remained more or less the same since the 1980s, three main factors have contributed to the exponential increase in machine-learned model accuracy in the recent past.
#1 The advent of some nifty variations on top of the basic classification models (for example, deep convolutional network architectures) that lead to better training, especially for unstructured data like images and audio.
#2 The widespread use of online collaboration platforms has made it easier to create bigger and bigger labeled training datasets, especially for computer vision related tasks. For example, the ImageNet dataset has 1.2 million training images, a number virtually unthinkable even ten years ago.
#3 The use of graphics cards, instead of CPUs, for handling big matrix operations in parallel. With this technique, it is now possible to train models with tens of millions of parameters within a matter of days – again, an unthinkable feat a few years ago.
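As a rough illustration of point #3, the sketch below times one large matrix multiplication – the core operation of neural network training – on a graphics card. It assumes the PyTorch library is installed; the matrix sizes are arbitrary, and the code simply falls back to the CPU if no GPU is available.

```python
# Illustrative only: time one big matrix multiplication, the kind of operation
# that dominates neural network training, on a GPU if one is available.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Matrices of roughly the scale seen when training models with millions of parameters.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.time()
c = a @ b  # parallelized across thousands of GPU cores when device == "cuda"
if device == "cuda":
    torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
print(f"4096x4096 matrix multiply on {device}: {time.time() - start:.3f} s")
```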
Armed with these new techniques, researchers are now breaking records on computer vision tasks practically every day, surpassing even human performance. IBM’s Deep Blue beat the legendary Kasparov at chess two decades ago, but Deep Blue (and most subsequent chess engines) relied on a “brute force” method – evaluating enormous numbers of candidate moves to choose the best one. Recently, a machine-learned chess engine taught itself to play at international master level in a mere 72 hours of training, using far less computational power than conventional “brute force” engines. IBM Watson, too, has defeated the human champions of the Jeopardy! quiz show.
The question that naturally arises here is: can we potentially automate every task that requires objective decision making? The answer is, theoretically, yes. The practical difficulty lies in creating enough training data for each task. In some cases, like building the brains behind a driverless car, the training data can be created relatively simply: a human driver drives a car fitted with cameras and sensors through typical and abnormal traffic situations for a long enough time.
Both the input data (from the sensors and cameras) and the decisions (the actions taken by the driver) get recorded. In some other cases, it may be difficult to “create” training data and one may have to rely solely on retrospective data. Retrospective data can be difficult to obtain in certain situations, either because it was never captured properly or due to privacy and security issues. However, with the proliferation of big data technologies, more and more data are being captured today in different systems, with the hope that they might prove useful in the future.
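To illustrate what recording both the input data and the decisions might look like in the driving example, here is a hypothetical sketch. The field names, sampling scheme and file format are my own assumptions, purely for illustration, not a description of any real system.

```python
# Hypothetical sketch: logging (input data, decision) pairs while a human drives.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class DrivingSample:
    timestamp: float
    camera_frame_id: str       # reference to a stored camera image ("input data")
    speed_kmph: float          # sensor reading ("input data")
    steering_angle_deg: float  # the human driver's action (the "decision")
    brake_pressure: float      # the human driver's action (the "decision")

def log_sample(sample: DrivingSample, log_file) -> None:
    """Append one (input, decision) pair to a newline-delimited JSON log."""
    log_file.write(json.dumps(asdict(sample)) + "\n")

# Usage: snapshot the sensors and the driver's controls at a regular interval.
with open("driving_log.jsonl", "w") as f:
    log_sample(DrivingSample(time.time(), "frame_000001", 42.5, -3.0, 0.0), f)
```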
It is only recently that these artificial intelligence techniques have started being applied to medical diagnosis. This can help in many ways – by making diagnosis faster, by making decision pointers available to doctors at diagnosis time, and by making quality healthcare available to a much larger population at a lower cost. Some of the obvious target fields are radiology image analysis and pathology slide analysis, both of which can be treated as computer vision tasks. Another area is analyzing the entire diagnosis and treatment process itself to identify common patterns, which could later be used while treating a new patient. IBM Watson Health is targeting the latter area.
The main hurdle here is the generation of the training data. A vast wealth of medical data has not been digitized. Digitized data is sometimes hard to access due to privacy concerns. The data and the corresponding decisions are often not captured systematically. And while for a typical computer vision task the labeling (i.e. annotating the decision or class for a given image) can be done by laypersons, for medical data it requires trained medical practitioners. This can make data generation prohibitively expensive and time consuming.
However, a lot of medical institutions are realizing the vast latent potential here, and are working with technology startups to create innovative solutions in this field.
Artificial intelligence also has its limits. Grand claims have been made, like “artificial neural networks mimic the functioning of the brain” and “consciousness is what goes on in a neural network of sufficiently high complexity”. I personally do not subscribe to these views, at least not with any level of certainty or conviction. We have not even begun to understand what “consciousness” really means, let alone replicate it through a machine. We are not even sure whether the functioning of the brain follows a known mathematical process. Given all these uncertainties, it is too early to claim that artificial intelligence can parallel human intelligence for all tasks in the near future. Some of the main areas where humans excel are learning from relatively few data points, making (mostly accurate) decisions in previously unseen situations (intuition), and transferring what is learned in one task to another. These abilities are, at least at present, difficult for machines to replicate. Only time will tell what the machines of the future will be like.