Reinforcement learning: An exciting paradigm of AI

By Dr Kavita M Kelkar, Associate Professor, Department of Computer Engineering, K J Somaiya College of Engineering, Somaiya Vidyavihar University, Mumbai

The latest exciting paradigm of Artificial Intelligence (AI) is Reinforcement Learning! Imagine an artificial agent that learns in much the same way a human being does. Remember the good old childhood days when the child in us explored the world around and learnt from it. The child makes some mistakes while attempting a task and gets penalized, but when it achieves the set goal it gets rewarded. The reward teaches the child which actions lead to the goal, and the punishment teaches it which actions to avoid. This is the most natural form of learning. Reinforcement learning (RL) is the branch of AI that emulates this behaviour in AI-based systems. In effect, it is a science of decision making.

Consider the example of a chess-playing AI agent that plays against a human player. With no supervisor present, the agent will lose its initial games against an expert human chess player. Through the self-learning RL algorithm, the agent is penalized for the wrong choices of moves that led to losing a game, and rewarded for good moves. After a certain number of games against the human player, the RL chess-playing agent learns the actions that result in winning. Afterwards it would be difficult to crack this RL chess-playing agent! It is really stimulating that RL offers such powerful algorithms for self-learning and improvement.

So how does all this learning happen? RL algorithms describe the goal as the maximisation of cumulative reward, and the RL agent strives to achieve that goal. The typical elements of an RL system, put together in the sketch after this list, are:
a) the RL agent
b) the environment with which the agent interacts
c) the state of the agent
d) a policy which decides the actions of the agent
e) a reward function
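To make these elements concrete, here is a minimal sketch in Python. The toy environment (an agent walking along a line of five cells towards a goal) and all names and numbers are invented for illustration and are not from any particular RL library.

```python
import random

class Environment:
    """Toy environment: the agent must reach position 4 on a line of 5 cells."""
    def __init__(self):
        self.state = 0            # c) state: the agent's current position

    def step(self, action):
        # b) environment dynamics: action -1 moves left, +1 moves right
        self.state = max(0, min(4, self.state + action))
        reward = 1 if self.state == 4 else 0   # e) reward function
        done = self.state == 4
        return self.state, reward, done

def policy(state):
    # d) policy: a trivial random policy here; learning would improve it
    return random.choice([-1, 1])

# a) the RL agent interacting with the environment over one episode
env = Environment()
state, done = env.state, False
while not done:
    action = policy(state)
    state, reward, done = env.step(action)
    print(f"state={state} reward={reward}")
```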

For the chess-playing RL agent, the environment is the chessboard, the various pieces on it, and the rules governing their movement. The state of the agent is a representation of the current position of the play. The policy maps each state to an action, telling the agent which move to make in a given position. The reward function is based on the goal of the chess game. A description of the game in terms of all these elements is a model-based representation, known as a Markov Decision Process (MDP). The RL agent learns the dynamics of the chess environment through the outcomes (rewards and penalties) of its games.
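As a rough illustration of what a model-based (MDP) representation looks like, the sketch below writes down a tiny, invented two-state MDP (the states, transition probabilities and rewards are all made up; a real chess MDP would be astronomically larger) and solves it with value iteration using the Bellman optimality update.

```python
# A tiny illustrative MDP: states, actions, transition probabilities
# P[s][a] -> list of (next_state, probability), and rewards R[s][a].
states = ["s0", "s1"]
actions = ["stay", "move"]
P = {
    "s0": {"stay": [("s0", 1.0)], "move": [("s1", 0.9), ("s0", 0.1)]},
    "s1": {"stay": [("s1", 1.0)], "move": [("s0", 1.0)]},
}
R = {
    "s0": {"stay": 0.0, "move": 1.0},
    "s1": {"stay": 2.0, "move": 0.0},
}
gamma = 0.9  # discount factor for future rewards

# Value iteration: repeatedly apply the Bellman optimality update
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in actions
        )
        for s in states
    }
print(V)  # long-run value of each state under the best policy
```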

In the RL paradigm, certain problems cannot be represented with an explicit model of the environment. This is called a model-free representation. In this setting the agent tries to build a policy that captures the dynamics of the problem without modelling the environment itself. The approach is particularly useful when the environment is huge, random, or has too many parameters. Model-free algorithms can be value-based, where the value of a state is estimated so that the optimal action can be taken, or policy-based, where learnable weights are associated with the actions of the policy. The agent learns optimal actions after a long cycle of training. Popular model-free approaches include Q-learning, SARSA and temporal-difference (TD) learning; a small Q-learning sketch follows below. Consider autonomous driving, where an RL agent drives the vehicle. Vehicles are driven on roads with many unpredictable situations, so a model of such an environment can be very complex. The model-free approach is especially useful for such problems.
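Here is a minimal tabular Q-learning sketch on the same kind of toy line-world used earlier. The environment is known to the agent only through sampled transitions, which is the essence of the model-free setting; the learning rate, discount factor and exploration rate are arbitrary illustrative values.

```python
import random
from collections import defaultdict

# Q-learning: learn action values Q(s, a) from sampled transitions,
# without any model of the environment's dynamics.
alpha, gamma, epsilon = 0.1, 0.9, 0.1
actions = [-1, +1]                      # move left / right on a line of 5 cells
Q = defaultdict(float)

def step(state, action):
    # Environment seen only through sampled transitions (model-free view)
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action in the next state
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

print({k: round(v, 2) for k, v in Q.items()})
```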

As an emerging and promising branch of AI, RL has a lot of potential for solving futuristic problems. The paradigm looks at a problem as a whole and applies the principle of optimisation, which is important for decision-making problems. RL techniques do not need huge labelled datasets, and they are useful when the environment of the problem is stochastic, unpredictable or unknown. Of course, issues such as handling delayed rewards and extensive training time have to be dealt with. The technology has advanced to deep reinforcement learning, which uses deep neural networks to tackle even more complex problems; a rough sketch of the idea is given below. Reinforcement learning is definitely an extremely thrilling and challenging area for all technology enthusiasts to watch out for.
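As a rough sketch of the deep RL idea, assuming PyTorch is available: the Q-table is replaced by a neural network that maps a state vector to one Q-value per action. The architecture and dimensions below are invented for illustration only.

```python
import torch
import torch.nn as nn

# Deep RL replaces the Q-table with a neural network that maps a state
# vector to one Q-value per action.
class QNetwork(nn.Module):
    def __init__(self, state_dim=4, num_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
state = torch.rand(1, 4)                 # a dummy state vector
q_values = q_net(state)                  # one Q-value per action
action = q_values.argmax(dim=1).item()   # greedy action
print(q_values, action)
```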
