Reinforcement learning is a type of machine learning in which an agent learns to take actions that maximize reward in a given situation. In supervised learning, we are given target labels that act as ground truth for the model, so we can train the model to predict labels for unseen examples. In reinforcement learning, by contrast, there are no target labels: the agent decides what to do in a given situation and learns from its own experience.
According to Wikipedia, ‘reinforcement learning’ is an area of machine learning inspired by behavioural psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
One application of reinforcement learning in computational finance is automated trading of stocks and shares. Here the agent is the trading software, the environment is the other traders, the state is the price history, the possible actions are buy, sell, or hold, and the reward is the profit or loss.
Another application, from operations research, is exemplified by the vehicle-routing challenge faced by a company like Uber. Here the agent is the vehicle-routing software, the environment is the stochastic demand, the state is the vehicle locations, capacities, and depot requests, the action is the particular route taken by a vehicle, and the reward is the (negative) travel cost.
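To make the agent / environment / state / action / reward framing concrete, here is a minimal sketch of the interaction loop. The `ToyEnvironment` below is entirely hypothetical (a one-dimensional world, not the trading or routing examples above), and the policy is random rather than learned:

```python
import random

class ToyEnvironment:
    """A hypothetical one-dimensional world: the agent starts at position 0
    and receives a reward of 1.0 for reaching position 3."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is +1 (move right) or -1 (move left); position is clamped to [0, 3]
        self.state = max(0, min(3, self.state + action))
        reward = 1.0 if self.state == 3 else 0.0
        done = self.state == 3
        return self.state, reward, done

env = ToyEnvironment()
total_reward = 0.0
for _ in range(10_000):               # cap the episode length for safety
    action = random.choice([-1, 1])   # a random policy; a learning agent improves on this
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

A real agent replaces `random.choice` with a policy that it updates from the rewards it observes.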
In this article, I will use ZenML to build a model that solves Atari games with reinforcement learning, applying the Deep Q-Learning algorithm to the Atari 2600 game environment. I used this GitHub repo, Building a Powerful DQN in TensorFlow 2.0, as a starting point for the solution.
I will be using OpenAI Gym, a toolkit that provides a wide variety of simulated environments (Atari games, board games, 2D and 3D physical simulations, and so on) so you can train agents and compare them. Specifically, I will be using the BreakoutDeterministic-v4 environment.
In the real world, building reinforcement learning applications can be challenging, so I will be using ZenML, an MLOps framework that allows models to be deployed and used across an organization. ZenML is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and offers interfaces and abstractions catered to ML workflows. ZenML pipelines execute ML-specific workflows from sourcing data to splitting, preprocessing, and training, all the way to evaluating results and even serving.
If you prefer consuming your content in video form, then this video covers the same material as this blogpost.
I suggest you create and work out of a virtual environment. You can create a virtual environment using conda by following these steps, but of course you can also use whatever you're familiar with:

```shell
conda create -n envname python=x.x anaconda
conda activate envname
```
Before running this project, you must install some Python packages in your environment, which you can do with the following steps:
```shell
git clone https://github.com/zenml-io/zenfiles.git
cd atari-game-play
pip install -r requirements.txt
```
We're ready to go now. The training_pipeline.py script is the main file that runs the training pipeline. In brief, the training pipeline consists of several steps:
- game_wrap, which wraps the game environment that you want to train on
- build_dqn, which builds a Keras model
- replay_buffer, which stores the past experiences of the agent
- agent, which implements the DQN agent that plays the game
- get_information_meta, which restores the model from a given checkpoint
- train, which trains the DQN agent
The steps are connected so that the output of one step is passed as input to the next. The following is the code for the training pipeline:
```python
from zenml.pipelines import pipeline


@pipeline
def train_pipeline(game_wrap, build_dqn, replay_buffer, agent, get_information_meta, train):
    """Trains the agent.

    :param game_wrap: A function that returns a GameWrapper object. The
        GameWrapper wraps over the game that you want to train on and has
        functions to get the available actions, the current state, etc.
    :param build_dqn: A function that returns a DQN, built from the
        game wrapper object that we created earlier.
    :param replay_buffer: The replay buffer is a class that holds all the
        experiences a DQN has seen and samples from it randomly to train the DQN.
    :param agent: The agent that will be used to play the game.
    :param get_information_meta: A function that returns the frame number,
        rewards, and loss list.
    :param train: The function that trains the agent.
    """
    game_wrapper = game_wrap()
    main_dqn, target_dqn = build_dqn(game_wrapper)
    replay_buffer = replay_buffer()
    agent = agent(game_wrapper, replay_buffer, main_dqn, target_dqn)
    frame_number, rewards, loss_list = get_information_meta(agent)
    train(game_wrapper, loss_list, rewards, frame_number, agent)
```
We can see that several steps make up this pipeline, so let's break it down and talk about each step in detail. All the steps can be found in the steps folder.
game_wrap: This is a function that returns a GameWrapper object. The GameWrapper class wraps the OpenAI Gym environment for the game you want to train on and provides useful functions such as resetting the environment and keeping track of statistics such as lives left.
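The actual GameWrapper lives in the repo; as a rough sketch of the idea, here is a simplified version with a stub standing in for the Gym environment (the `ale.lives` info key is what Atari Gym environments expose for the lives counter; everything else here is illustrative):

```python
class _StubEnv:
    """Stands in for the real Gym Atari environment in this sketch."""
    def reset(self):
        return "initial-frame"

    def step(self, action):
        # Gym's step returns (observation, reward, done, info)
        return "frame", 1.0, False, {"ale.lives": 4}

class GameWrapper:
    """Sketch of a wrapper that resets the environment and tracks lives."""
    def __init__(self, env):
        self.env = env
        self.last_lives = 0

    def reset(self):
        self.last_lives = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Losing a life is often treated as a terminal signal during
        # training, even though the episode itself continues.
        life_lost = info["ale.lives"] < self.last_lives
        self.last_lives = info["ale.lives"]
        return obs, reward, done, life_lost

wrapper = GameWrapper(_StubEnv())
wrapper.reset()
obs, reward, done, life_lost = wrapper.step(0)
```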
build_dqn: It builds the DQN model in Keras.
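The repo's build_dqn uses Keras; as a framework-free illustration of the key idea behind the dueling architecture it builds, the Q-values are assembled from a state value V(s) and per-action advantages A(s, a) as Q(s, a) = V(s) + (A(s, a) − mean_a A(s, a)). A minimal sketch of that aggregation:

```python
def dueling_q_values(value, advantages):
    """Combine the value and advantage streams (the dueling-DQN aggregation)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]

# With state value 2.0 and advantages for four actions, the mean advantage
# is 0.0, so the Q-values are [3.0, 1.0, 2.0, 2.0].
q = dueling_q_values(2.0, [1.0, -1.0, 0.0, 0.0])
```

Subtracting the mean advantage keeps the decomposition identifiable: only relative advantages between actions matter for action selection.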
replay_buffer: The replay buffer is a class that holds all the experiences a DQN has seen and samples from it randomly to train the DQN. It takes care of managing the stored experiences and sampling them on demand.
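The repo's ReplayBuffer handles stacked Atari frames and more, but the core mechanism can be sketched in a few lines (a hypothetical minimal version with uniform sampling):

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch of a fixed-size experience buffer with uniform random sampling."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between
        # consecutive frames, which stabilizes DQN training.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=3)
for i in range(5):          # adding 5 experiences to a capacity-3 buffer...
    buf.add(i, 0, 0.0, i + 1, False)
# ...leaves only the 3 most recent experiences in the buffer
```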
agent: Implements a standard Dueling Double Deep Q-Network (DDDQN) agent; you can learn more about it here.
get_information_meta: If we're loading from a checkpoint, this step loads the information from the checkpoint; otherwise, we start from scratch. It returns the frame number (how many frames have been played so far), the rewards (the list of rewards accumulated so far), and the loss list (the list of losses accumulated so far).
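The resume-or-start-fresh logic can be sketched like this (the `meta.json` file name and its fields are illustrative, not the repo's actual checkpoint format):

```python
import json
import os
import tempfile

def get_information_meta(checkpoint_dir):
    """Sketch: resume training metadata from a checkpoint if one exists,
    otherwise start from scratch. File name and format are hypothetical."""
    meta_path = os.path.join(checkpoint_dir, "meta.json")
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            meta = json.load(f)
        return meta["frame_number"], meta["rewards"], meta["loss_list"]
    return 0, [], []  # no checkpoint: fresh start

with tempfile.TemporaryDirectory() as d:
    fresh = get_information_meta(d)       # (0, [], [])
    with open(os.path.join(d, "meta.json"), "w") as f:
        json.dump({"frame_number": 500, "rewards": [1.0], "loss_list": [0.2]}, f)
    resumed = get_information_meta(d)     # (500, [1.0], [0.2])
```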
train: We initialize the agent, the game environment, and the TensorBoard writer. Then, we train the agent until the game is over.
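One detail the training loop has to handle is the exploration/exploitation trade-off. A common DQN schedule (the exact constants in the repo may differ; these are typical values) linearly anneals the exploration rate epsilon over the first million frames:

```python
def epsilon_for_frame(frame_number, eps_start=1.0, eps_end=0.1, anneal_frames=1_000_000):
    """Linearly anneal the exploration rate from eps_start down to eps_end."""
    if frame_number >= anneal_frames:
        return eps_end
    slope = (eps_end - eps_start) / anneal_frames
    return eps_start + slope * frame_number

epsilon_for_frame(0)           # 1.0: act fully at random at the start
epsilon_for_frame(500_000)     # 0.55: halfway through annealing
epsilon_for_frame(2_000_000)   # 0.1: stays at the floor after annealing
```

With probability epsilon the agent takes a random action; otherwise it takes the action with the highest predicted Q-value.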
Now that you're familiar with the individual steps of the pipeline, let's take a look at how we run it with the run_training function. We import every step from the steps folder; the pipeline itself is a function that takes the various step functions as arguments.
```python
from steps.game_wrap import game_wrap
from steps.build_dqn import build_dqn
from steps.replay_buffer import replay_buffer
from steps.agent import agent
from steps.get_information_meta import get_information_meta
from steps.train import train
from pipelines.training_pipeline import train_pipeline
import argparse
from materializer.dqn_custom_materializer import dqn_materializer


def run_training():
    training = train_pipeline(
        game_wrap().with_return_materializers(dqn_materializer),
        build_dqn(),
        replay_buffer().with_return_materializers(dqn_materializer),
        agent().with_return_materializers(dqn_materializer),
        get_information_meta(),
        train(),
    )
    training.run()
```
You’ll probably have noticed that some of the steps in this pipeline require custom materializers to be used, so let’s take a closer look at those.
The precise way that data passes between the steps is dictated by materializers. The data that flows through steps are stored as artifacts and artifacts are stored in artifact stores. The logic that governs the reading and writing of data to and from the artifact stores lives in the materializers. You can learn more about custom materializers in the Materializer docs.
```python
DEFAULT_FILENAME = "PyEnvironment"


class dqn_materializer(BaseMaterializer):
    ASSOCIATED_TYPES = [Agent, GameWrapper, ReplayBuffer]

    def handle_input(
        self, data_type: Type[Any]
    ) -> Union[Agent, GameWrapper, ReplayBuffer]:
        """Reads one of the associated objects from a pickle file."""
        super().handle_input(data_type)
        filepath = os.path.join(self.artifact.uri, DEFAULT_FILENAME)
        with fileio.open(filepath, "rb") as fid:
            obj = pickle.load(fid)
        return obj

    def handle_return(
        self,
        obj: Union[Agent, GameWrapper, ReplayBuffer],
    ) -> None:
        """Writes the object to a pickle file in the artifact store.

        Args:
            obj: An Agent, GameWrapper, or ReplayBuffer instance.
        """
        super().handle_return(obj)
        filepath = os.path.join(self.artifact.uri, DEFAULT_FILENAME)
        with fileio.open(filepath, "wb") as fid:
            pickle.dump(obj, fid)
```
The handle_input and handle_return methods define how the materializer does its job:

- handle_input is responsible for reading the artifact from the artifact store.
- handle_return is responsible for writing the artifact to the artifact store.
You can tune the configuration for model training in the config.py file. I urge you to increase the number of epochs and to experiment with the learning_rate to get better training results. You can also fine-tune several other parameters in the config.py file.
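As a rough illustration of the kind of values such a config file holds (these names and defaults are hypothetical apart from the epochs and learning_rate mentioned above; check the actual config.py in the repo):

```python
# Hypothetical sketch of config.py values; the real file in the repo
# defines the actual parameter names and defaults.
epochs = 100              # increase for better training results
learning_rate = 0.00001   # optimizer step size; worth experimenting with
batch_size = 32           # experiences sampled per gradient update
max_episode_length = 18_000  # safety cap on frames per episode
```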
Deep Q Networks are not the newest or most efficient algorithm for playing games. Nevertheless, they are still very effective for games like the Atari titles described in this blogpost, and they laid the foundation for much of modern deep reinforcement learning. In this post we have seen how to build a DQN and train it to play Atari games, and we used ZenML to build production-grade pipelines that are reproducible and scalable.