Decision Scientist at Flipkart
Welcome back! This is the second article in our fun ride through RL. If you have not read the first article, the only thing you need to know so far is that RL is the art of mapping states to actions in order to maximise the total expected reward. For more details and a fun example, the link to the first article is here.
Speeding up! In this article we will explore the striking differences between the three types of learning, namely supervised, unsupervised, and reinforcement learning. The names themselves suggest a great deal, but let’s dive deeper!
Supervised learning is learning from a training set of labelled examples provided by a knowledgeable external supervisor. Each example is a description (a feature set) of a situation together with a specification: the target label. The object of this kind of learning is for the system to extrapolate, or generalise, its responses so that it acts correctly in situations not present in the training set. This is an important kind of learning, but on its own it is not adequate for learning from interaction.
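To make this concrete, here is a minimal sketch of supervised learning: a 1-nearest-neighbour classifier trained on labelled (feature, label) examples, then asked to generalise to points it has never seen. The toy data and the `predict` helper are invented for illustration.

```python
def predict(train, x):
    """Return the label of the training example whose features are closest to x."""
    closest = min(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)),
    )
    return closest[1]

# Labelled training set: features -> label (the "supervisor's" answers).
train = [
    ((1.0, 1.0), "small"),
    ((1.5, 1.8), "small"),
    ((5.0, 8.0), "large"),
    ((6.0, 9.0), "large"),
]

# Generalising to situations not present in the training set.
print(predict(train, (1.2, 1.1)))  # near the "small" examples
print(predict(train, (5.5, 8.6)))  # near the "large" examples
```

The key point: every training example comes with the correct answer already attached, and learning amounts to generalising those answers to unseen inputs.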
Some standard examples that apply supervised learning are:
- Image classification
- Spam detection
- House price regression
But how do we learn from our interactions with the environment in which we act, and make smarter choices as we go? “Aha! RL comes to the rescue!”
RL approaches are designed to model such complex interactions, which are hard to mould into a supervised setting. Hence, RL does not rely on already available datasets but instead focuses on improving from what the agent has seen so far.
In interactive problems, it is often impractical to obtain examples of the desired behaviour that are both correct and representative of all the situations in which the agent has to act. RL approaches therefore build an agent that must be able to learn from its own experience. RL is a widely applicable branch of machine learning and finds uses across multiple domains.
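The idea of learning purely from one’s own experience can be sketched with the simplest RL setting, a two-armed bandit: there is no labelled dataset, only rewards produced by the agent’s own actions. The reward probabilities and the epsilon value below are invented for illustration.

```python
import random

random.seed(0)
TRUE_REWARD_PROB = [0.3, 0.8]  # hidden from the agent
EPSILON = 0.1                  # fraction of steps spent exploring

estimates = [0.0, 0.0]  # the agent's running estimate of each arm's value
counts = [0, 0]

for step in range(2000):
    # Explore occasionally; otherwise exploit the current best estimate.
    if random.random() < EPSILON:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < TRUE_REWARD_PROB[action] else 0.0
    counts[action] += 1
    # Incremental average: shift the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # the estimate for the better arm should end up near 0.8
```

No supervisor ever tells the agent which arm is correct; it discovers the better action only by acting and observing rewards.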
A very nice summary of RL algorithms is here:
Courtesy: David Silver’s course on RL
Reinforcement learning is also different from what machine learning researchers call unsupervised learning, which is typically about finding structure hidden in collections of unlabelled data. The terms supervised learning and unsupervised learning would seem to classify machine learning paradigms exhaustively, “But they do not!” Although one might be tempted to think of RL as a kind of unsupervised learning because it does not rely on examples of correct behaviour, RL is trying to maximise a reward signal rather than to find hidden structure. Discovering structure in an agent’s experience can certainly be useful in RL, but by itself it does not address the reinforcement learning problem of maximising a reward signal. We therefore consider reinforcement learning to be a third machine learning paradigm, alongside supervised learning and unsupervised learning.
Some standard examples that apply unsupervised learning are:
- Hidden Markov Models
- Hierarchical Clustering
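To see the contrast with RL, here is a minimal sketch of unsupervised learning: k-means clustering finds structure (two groups) in unlabelled data, with no supervisor providing answers and no reward signal being maximised. The toy points and starting centres are invented for illustration.

```python
def kmeans(points, centres, iters=10):
    """Cluster 2-D points around the given starting centres."""
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            i = min(
                range(len(centres)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])),
            )
            groups[i].append(p)
        # Update step: move each centre to the mean of its group.
        centres = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres

# Unlabelled data with two obvious clumps.
points = [(1, 1), (1.2, 0.9), (0.8, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
print(kmeans(points, centres=[(0, 0), (10, 10)]))  # centres move toward the two clumps
```

The algorithm discovers the two groups on its own, but nothing in it says which group is “good”; that is exactly the piece RL adds with its reward signal.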
So these are the reasons why we study RL as a separate branch of machine learning. “Next up! The fun ride gets into action in the next article. Tighten your seatbelts cuz we are going to start the countdown to launch into the world of RL and all the amazing theories it has to offer!” Cheers!
Other articles in the series: