Reinforcement Learning – The Art of Interactive Learning

Want a treat? Just read on for a few minutes!

I intend to give a holistic yet fun view of Reinforcement Learning. It is the branch of machine learning that most of us can relate to most easily. Our everyday actions, our ways of thinking and even how we play a game of ping pong can be understood from the ground up using the amazing theory of Reinforcement Learning.

“Don’t believe me yet! Convince yourself only after exploring it in this short series of articles that I am planning to take you through!”

The roadmap starts with an introduction and some motivation, and then covers each important module of reinforcement learning, relating it to an application. “Cuz who likes just theory!” Also, let’s abbreviate Reinforcement Learning to RL. When relating to applications, the main focus will be on how RL can help solve problems in the financial world of e-commerce and “Of course we cannot leave out the Atari Games!”

So Let’s Dive Right In!

Introduction

In the simplest of words, RL is learning how to map states to actions. By “how” we mean finding an optimal mapping, one that yields the maximum reward. Three new words here: states, actions, and reward. Let’s get to them one by one using the example of a game of ping-pong.

 

Imagine you are playing a game of ping-pong. When the ball arrives in your court, a snapshot of the scene at that moment can be treated as a state. Such a snapshot includes:

  1. Position of the ball in the air
  2. Its velocity and spin motion
  3. The dynamics of your bat
  4. Your body-posture
  5. The wind blowing over the table
  6. The whereabouts of your opponent and his bat
  7. Your desire to win, your adrenaline levels and whatnot

All of these together make up what we call a STATE.

Often it is really difficult to quantify the components that together make up the state. In that case, we sample a set of features from the environment that we claim are good enough to describe the state. In other cases, the state space is relatively simple to quantify.
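To make the idea of "sampling a set of features" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name PingPongState, the chosen fields, the units and the numbers are hypothetical, and a real system might pick very different features.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PingPongState:
    """Hypothetical snapshot of a rally; the fields are illustrative only."""
    ball_position: Tuple[float, float, float]  # x, y, z in metres (assumed units)
    ball_velocity: Tuple[float, float, float]  # metres per second (assumed units)
    ball_spin: float                           # revolutions per second (assumed units)
    opponent_x: float                          # opponent's sideways position in metres


def to_features(state: PingPongState) -> List[float]:
    """Flatten the snapshot into a plain list of numbers."""
    return [*state.ball_position, *state.ball_velocity,
            state.ball_spin, state.opponent_x]


# Made-up values, only to show the shape of the data.
snapshot = PingPongState(
    ball_position=(0.4, 1.2, 0.3),
    ball_velocity=(-2.0, 0.5, 1.0),
    ball_spin=30.0,
    opponent_x=-0.2,
)
features = to_features(snapshot)
print(len(features))  # number of features we chose to describe the state
```

Notice what got left out: the wind, your posture and your adrenaline levels. That is exactly the claim we make when we pick features, namely that the ones we kept are good enough.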

From “State” to “Action”

Ok, we get what a state is! How do we define “action”? Given the state, if you decide to slice and spin the ball, then that is called an action. If you decide to smash the ball, then that too is an action, and so is gently tapping the ball across the net. So we realize that both the state space and the action space can be quite complex. Each can be either continuous or discrete, depending on the problem setting.
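The difference between a discrete and a continuous action space can be sketched in a few lines of Python. The names Shot, Swing and clip_swing, as well as the angle and speed ranges, are assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Shot(Enum):
    """A discrete action space: a small, fixed menu of shots."""
    SLICE = "slice"
    SMASH = "smash"
    TAP = "tap"


@dataclass(frozen=True)
class Swing:
    """A continuous action: any bat angle and speed within some limits."""
    bat_angle_deg: float  # assumed valid range: -90 to 90
    bat_speed_mps: float  # assumed valid range: 0 to 30


def clip_swing(swing: Swing) -> Swing:
    """Keep a continuous action inside the assumed valid ranges."""
    angle = max(-90.0, min(90.0, swing.bat_angle_deg))
    speed = max(0.0, min(30.0, swing.bat_speed_mps))
    return Swing(angle, speed)


discrete_actions = list(Shot)  # exactly three choices
continuous_action = clip_swing(Swing(bat_angle_deg=120.0, bat_speed_mps=12.5))
print(discrete_actions)
print(continuous_action)
```

With the discrete version you pick one of three shots. With the continuous version there are infinitely many possible swings, which is one reason the action space can get complex quickly.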

The Reward

The best term, reward, was saved for last. Now imagine that the ball from your opponent was a lollipop, or in other words he had just tossed the ball up in the air across the net. If you choose to smash, there is a high probability that you will win the point. If you too toss it up like a rookie, the probability of winning the point is much lower. The reward here is the point that you get in that rally, and the goal is to maximise the sum total of these rewards until the end of the match, or let’s call it the end of the episode.
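Here is a small Python sketch of "sum total of rewards over an episode". The win probabilities (0.8 for a smash, 0.3 for a toss) are invented numbers, and always facing a lollipop ball is a big simplification. The helper names are hypothetical too.

```python
import random
from typing import List

# Assumed chances of winning the point against a "lollipop" ball.
WIN_PROBABILITY = {"smash": 0.8, "toss": 0.3}


def play_rally(action: str, rng: random.Random) -> int:
    """Return a reward of 1 if the point is won, otherwise 0."""
    return 1 if rng.random() < WIN_PROBABILITY[action] else 0


def episode_return(rewards: List[int]) -> int:
    """The quantity to maximise: the sum of rewards over the episode."""
    return sum(rewards)


def play_episode(action: str, rallies: int, seed: int) -> int:
    """Play a fixed number of rallies, always choosing the same action."""
    rng = random.Random(seed)
    return episode_return([play_rally(action, rng) for _ in range(rallies)])


smash_total = play_episode("smash", rallies=1000, seed=0)
toss_total = play_episode("toss", rallies=1000, seed=0)
print(smash_total, toss_total)
```

Under these assumed probabilities, smashing collects far more reward over the episode than tossing the ball back, which matches the intuition from the rally above.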

So, Reinforcement Learning is finding the optimal mapping of states to actions so that we can squeeze out as much reward as possible.
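A mapping from states to actions can be pictured as a simple lookup table. The sketch below assumes we already know the expected reward of each action in each state, which is a strong assumption: in RL these numbers are normally not given and have to be learned by interacting with the environment. The state names, the values and the helper greedy_policy are hypothetical.

```python
from typing import Dict

# Assumed expected reward for each (state, action) pair.
EXPECTED_REWARD: Dict[str, Dict[str, float]] = {
    "high_lollipop": {"smash": 0.8, "slice": 0.5, "tap": 0.3},
    "fast_low_ball": {"smash": 0.2, "slice": 0.6, "tap": 0.4},
}


def greedy_policy(table: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each state to the action with the highest expected reward."""
    return {state: max(actions, key=actions.get)
            for state, actions in table.items()}


policy = greedy_policy(EXPECTED_REWARD)
print(policy)  # {'high_lollipop': 'smash', 'fast_low_ball': 'slice'}
```

This version only looks at the reward of the very next point. It is meant as a picture of what "mapping states to actions" means, not as a full RL algorithm.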


In the next article, we will differentiate between the 3 forms of learning: Supervised, Unsupervised and Reinforced.
Stick around to learn about the art of learning! Till next time! Cheers!

Next articles in the series:

2. Supervised vs Unsupervised vs Reinforced

3. Multi-Armed Bandit – Reinforcement Learning

4. Class of Algorithms for Solving Multi-Armed Bandit

 

Bruce Lee played ping pong really well by the way!

Contributor

Prateek Singhi, Decision Scientist at Flipkart

Opinions expressed by contributors are their own.