Deep Reinforcement Learning Agent Learns to Play Atari Pong
Getting a computer program to learn how to play Atari games may seem like a very difficult task. Well, I'll tell you what ... it is. But believe me, if I can do it, then so can you. This page assumes you have read through and at least somewhat understand the theory of deep RL. But beware, theory is not enough to get an agent to learn Atari! With reinforcement learning, everything is in the implementation, and the devil is in the details! So, the rest of the post will focus on implementing the code line by line to get our agent working.
For implementation, we will be using the OpenAI Gym environment. If you don't have OpenAI Gym, you can easily pip install it. For the agent's neural network, I will be building a CNN using Keras. We will first tackle Pong, then in a separate article we will get the agent to play Breakout (it takes a lot longer to train). Really take your time and read through my code to understand what is going on.
Also, I highly recommend you do this project in Google Colab. If you don't know what Google Colab is, google it and check it out! It provides GPUs and CPUs for you to run code on for free. I upgraded to Colab Pro for 10 bucks a month, but you don't have to. Alright, enough promoting Google's products, let's build this agent!
A Quick OpenAI Gym Tutorial
OpenAI Gym is a library full of Atari games (among many other environments). This library lets us test out our agent without having to build the environments ourselves. After you import gym, there are only 4 functions we will be using from it: gym.make(env), env.reset(), env.step(a), and env.render().
- gym.make(env): This simply gets our environment from OpenAI Gym. We will be calling env = gym.make('PongDeterministic-v4'), which is saying that our env is Pong.
- env.reset(): This resets the environment back to its first state and returns the initial frame.
- env.step(a): This takes a step in the environment by performing action a. It returns the next frame, the reward, a done flag, and info. If the done flag is True, the game is over. Info will tell us things like how many lives we have left.
- env.render(): This shows the agent playing the game. We are only going to use env.render() when checking the performance of our agent. I don't think this works in Google Colab or any other notebook, though I may be wrong.
Here's a little example of each of these functions.