Pranav Kharde

Reinforcement Learning

By now you have probably heard about Machine Learning, Artificial Intelligence and Deep Learning. But have you heard of Reinforcement Learning? No? Well, you are in luck today, because that is exactly what we are going to cover.

If you google the term “Reinforcement Learning”, you will come across this standard Wikipedia definition:

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.

Confused? Let’s get rid of the technical jargon and break it down into simpler terms.

Reinforcement Learning, or RL, is a branch of Machine Learning, alongside Supervised and Unsupervised Learning. An agent (in RL, models are referred to as agents) has to complete a certain task in a given environment while maximizing its reward.

Imagine you are trying to learn a song on a piano with no help whatsoever (no book, no trainer, no app). Every time you hit the right key you get a point, and every time you press the wrong key a point is deducted. Eventually, with enough time and trial and error, you would be able to play the song. This is exactly what happens in Reinforcement Learning, except a lot faster.
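The piano analogy can be sketched in a few lines of Python. This is only a toy illustration, not a real RL library: the song, the key names and the function `learn_song` are all made up for this example. The learner keeps a running score for every key at every position in the song, usually presses the key with the best score so far, and occasionally tries a random key.

```python
import random

KEYS = ["C", "D", "E", "F", "G", "A", "B"]   # hypothetical keyboard
SONG = ["E", "D", "C", "D", "E", "E", "E"]   # hypothetical target song


def learn_song(song, keys, attempts=200, explore=0.2, seed=0):
    """Learn a song by trial and error using +1 / -1 points (toy example)."""
    rng = random.Random(seed)
    # points[i][key] = running total of points for pressing `key` at note i
    points = [{key: 0 for key in keys} for _ in song]

    for _ in range(attempts):
        for i, correct_key in enumerate(song):
            if rng.random() < explore:
                key = rng.choice(keys)                       # try something new
            else:
                key = max(keys, key=lambda k: points[i][k])  # best guess so far
            # +1 point for the right key, -1 point for a wrong one
            points[i][key] += 1 if key == correct_key else -1

    # What the learner would play now if it stopped experimenting
    return [max(keys, key=lambda k: points[i][k]) for i in range(len(song))]


learned = learn_song(SONG, KEYS)
print(learned == SONG)
```

Nobody ever tells the learner which key is correct. It only sees the points, and that is enough for it to settle on the right keys over time.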

At first glance Supervised Learning and Reinforcement Learning look quite similar, but that is not the case. In Supervised Learning we give the model feedback in the form of the correct answer for each example, so it learns directly what it should have done. In Reinforcement Learning the agent is never told the correct action; instead we use a reward and punishment system that scores every positive or negative behaviour.

When we compare Reinforcement Learning with Unsupervised Learning, the term “goal” is used in quite a different way. The goal in Unsupervised Learning is to find similarities or differences in the data, whereas in Reinforcement Learning the goal is to find the best possible actions to gain the maximum reward.

Reinforcement Learning Workflow

A simple workflow of RL looks something like this:

Here the agent is introduced to an environment and interacts with it; each such interaction is called an action. Because of this action the agent’s state changes, and depending on this change of state we reward the agent positively or negatively.
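This loop can be written down as a short Python sketch. The `LineWorld` and `RandomAgent` classes and their method names (`reset`, `step`, `act`) are hypothetical and chosen only for illustration; they loosely follow the conventions used by common RL toolkits but are not taken from any of them. The agent here does not learn yet; it acts at random so that the loop itself stays in focus.

```python
import random


class LineWorld:
    """Hypothetical environment: the agent walks along positions 0..4."""

    def __init__(self, size=5):
        self.goal = size - 1
        self.state = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        old_state = self.state
        self.state = min(max(self.state + action, 0), self.goal)
        # Positive reward if the state moved towards the goal, negative otherwise
        reward = 1 if self.state > old_state else -1
        done = self.state == self.goal
        return self.state, reward, done


class RandomAgent:
    """Hypothetical agent that has not learned anything: it acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([-1, 1])


env = LineWorld()
agent = RandomAgent()

state = env.reset()
total_reward = 0
done = False
for _ in range(1000):                         # cap the episode length
    action = agent.act(state)                 # the agent acts on the environment
    state, reward, done = env.step(action)    # the action changes the state
    total_reward += reward                    # the state change earns a reward
    if done:
        break

print("final state:", state, "total reward:", total_reward)
```

A learning agent would use the same loop, but it would also use each reward to adjust how it chooses the next action.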

Building Blocks

These are the basic components that you will find in any RL system. The majority of your time will be spent dealing with them.

Agent: A learner or a decision maker.

Environment: The physical or virtual world in which the agent has to achieve a particular goal and decide what actions to take with respect to that goal.

Action: The set of actions an agent can perform.

State: The current situation the agent is in. After performing an action or a set of actions, the state of the agent changes.

Policy: It is responsible for how the agent behaves in a given environment. Basically, it is the agent’s strategy for mapping situations to actions.

Reward: The signal that tells the agent how well it is doing, guiding it towards the best possible result in the shortest amount of time. If an action moves the agent towards the end goal it is rewarded positively, otherwise negatively.

Value Function: It helps the agent understand whether it is a good idea to be in a particular state by estimating the future rewards it can expect from that state.

Environment Model: It simulates the dynamics of the environment the agent is placed in, that is, how the environment reacts to the different actions taken by the agent.
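To see how these building blocks fit together, here is a minimal sketch of tabular Q-learning, one common RL algorithm, on a made-up five-cell corridor. Nothing above prescribes this particular algorithm; the corridor, the reward values and the settings `alpha`, `gamma` and `epsilon` are assumptions picked for illustration. Each building block is marked in the comments. Note that the value function used here scores state-action pairs rather than states alone.

```python
import random

N_STATES = 5            # State: cells 0..4 of a corridor; cell 4 is the goal
ACTIONS = [-1, +1]      # Action: step left or step right


def model(state, action):
    """Environment dynamics: next state and reward for a state and action.

    A model-based agent would plan with a function like this; Q-learning
    only samples from it, one step at a time.
    """
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 10 if next_state == N_STATES - 1 else -1   # Reward
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: learns a table of action values by trial and error."""
    rng = random.Random(seed)
    # Value function: q[state][a] estimates the future reward of taking
    # ACTIONS[a] in `state` and acting greedily afterwards
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):                      # step cap per episode
            # Policy: mostly greedy, sometimes random (epsilon-greedy)
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward = model(state, ACTIONS[a])
            done = next_state == N_STATES - 1
            # Q-learning update: move the estimate towards the reward plus
            # the discounted value of the best action in the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy read off the learned values for every non-goal cell
policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
print(policy)
```

The learned policy should point right (+1) in every cell, since that is the shortest route to the goal and every extra step costs a point.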