Tutorial

This chapter is a tutorial for AINE-DRL. It guides you through training a REINFORCE agent in the OpenAI Gym CartPole-v1 environment.

CartPole-v1

OpenAI Gym CartPole-v1 is a classic control problem. A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The agent must learn to move the cart left or right to keep the pole from falling over.

Configuration

First, make a configuration file config/cartpole_v1_reinforce.yaml. The following is a YAML configuration for the REINFORCE agent:

CartPole-v1_REINFORCE:
  Env:
    type: Gym
    Config:
      id: CartPole-v1
  Train:
    Config:
      time_steps: 20000
      summary_freq: 1000
  Agent:
    gamma: 0.99

CartPole-v1_REINFORCE is the name of the configuration.

The configuration file has three sections: Env, Train, and Agent. The Env section is for environment configuration; the values under the Env/Config key are the general arguments of the gym.make() or gym.vector.make() functions. The Train section is for training configuration. The Agent section is for agent configuration. The REINFORCE agent has two settings: gamma and entropy_coef. If a setting has a default value, you can skip it. See the configuration details in the Agent docs.
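For example, entropy_coef is omitted above because it has a default value (0.001, as shown in the training info banner later in this tutorial). Writing both settings out explicitly would look like this:

Agent:
  gamma: 0.99
  entropy_coef: 0.001 # optional; 0.001 is the default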

You can see the entire configuration in config/samples/cartpole_v1_reinforce.yaml.

Training Script

Next, make a training script train.py.

Neural Network

AINE-DRL is based on the PyTorch 1.11.0 (CUDA 11.3) library.

To make a REINFORCE network, you need to implement the REINFORCENetwork interface. The following is the network part of the training script:

import torch.nn as nn
import torch.optim as optim

import aine_drl
import aine_drl.agent as agent
from aine_drl.factory import AgentFactory, AINEInferenceFactory, AINETrainFactory
from aine_drl.policy import CategoricalPolicy


class CartPoleREINFORCENet(nn.Module, agent.REINFORCENetwork):    
    def __init__(self, obs_features, num_actions) -> None:
        super().__init__()
        
        # policy layer
        self.policy_net = nn.Sequential(
            nn.Linear(obs_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            CategoricalPolicy(64, num_actions)
        )
        
    def model(self) -> nn.Module:
        return self.policy_net
    
    def forward(self, obs: aine_drl.Observation) -> aine_drl.PolicyDist:
        return self.policy_net(obs.items[0])

Observation stores a tuple of observation tensors in its items field. An environment may provide multiple observations (e.g., an image and a vector). CartPole-v1 provides only a vector observation, so items[0] is the vector observation.

CategoricalPolicy is a linear layer whose output is a categorical policy distribution.
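Conceptually, such a policy head is just a linear layer whose action logits parameterize a categorical distribution. A minimal plain-PyTorch sketch of the idea (not AINE-DRL's actual CategoricalPolicy implementation) looks like this:

import torch
import torch.nn as nn
from torch.distributions import Categorical

class SimpleCategoricalHead(nn.Module):
    """Illustration only: a linear layer producing a categorical action distribution."""
    def __init__(self, in_features: int, num_actions: int) -> None:
        super().__init__()
        self.logits_layer = nn.Linear(in_features, num_actions)

    def forward(self, x: torch.Tensor) -> Categorical:
        # map features to action logits, then wrap them in a categorical distribution
        return Categorical(logits=self.logits_layer(x))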

Agent Factory

To make a REINFORCE agent, you need to implement the AgentFactory interface. The following continues the training script:

class REINFORCEFactory(AgentFactory):
    def make(self, env: aine_drl.Env, config_dict: dict) -> agent.Agent:
        config = agent.REINFORCEConfig(**config_dict)
        
        network = CartPoleREINFORCENet(
            obs_features=env.obs_spaces[0][0],
            num_actions=env.action_space.discrete[0]
        )
        
        trainer = aine_drl.Trainer(optim.Adam(
            network.parameters(),
            lr=0.001
        )).enable_grad_clip(network.parameters(), max_norm=5.0)
        
        return agent.REINFORCE(
            config,
            network,
            trainer,
        )

config_dict is the dictionary of the Agent section in the configuration file.
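With the configuration file above, the dictionary passed to make() is simply:

config_dict = {"gamma": 0.99}
config = agent.REINFORCEConfig(**config_dict) # entropy_coef keeps its default value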

aine_drl.Env is an AINE-DRL environment wrapper class. env.obs_spaces is a tuple of ObservationSpace instances; an ObservationSpace represents the shape of an observation space, for example (4,) for a vector observation or (600, 400, 3) for an image observation. env.action_space is the action space: env.action_space.discrete is a tuple whose elements are the number of actions in each discrete action branch, and env.action_space.continuous is the number of continuous actions. So env.action_space.discrete[0] is the number of discrete actions in the first discrete action branch.
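For CartPole-v1, which has a 4-dimensional vector observation and 2 discrete actions, these values work out roughly as follows:

# CartPole-v1 spaces, for reference:
#   env.obs_spaces[0]            -> (4,) : cart position, cart velocity, pole angle, pole angular velocity
#   env.obs_spaces[0][0]         -> 4    : number of observation features
#   env.action_space.discrete    -> (2,) : a single discrete action branch
#   env.action_space.discrete[0] -> 2    : push the cart left or right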

Trainer is an optimizer wrapper class. Its enable_grad_clip() method enables gradient clipping, which can prevent the gradient explosion that causes training instability.
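Under the hood, gradient clipping amounts to rescaling the gradients before the optimizer step. A conceptual plain-PyTorch sketch (not the actual Trainer implementation) is:

import torch
import torch.nn as nn

def clipped_update(optimizer: torch.optim.Optimizer, network: nn.Module, max_norm: float = 5.0) -> None:
    # rescale gradients so their global norm does not exceed max_norm, then apply the update
    torch.nn.utils.clip_grad_norm_(network.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad()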

Main Code

Now write the main code to start training:

if __name__ == "__main__": 
    config_path = "config/cartpole_v1_reinforce.yaml"
    
    AINETrainFactory.from_yaml(config_path) \
        .make_env() \
        .make_agent(REINFORCEFactory()) \
        .ready() \
        .train() \
        .close()

AINETrainFactory is a factory class that makes a Train instance. You can chain its methods. The AINETrainFactory.from_yaml() method loads a configuration file. The AINETrainFactory.make_env() method makes a training environment, in this case OpenAI Gym CartPole-v1. The AINETrainFactory.make_agent() method makes an agent (here, the REINFORCE agent). The AINETrainFactory.ready() method makes a Train instance to prepare the training. The Train.train() method starts training. The Train.close() method closes the training safely.

You can see the entire training code in samples/cartpole_v1_reinforce.py.

Start Training!

Once you have followed the above steps, run the training script with the following command:

python train.py

then you can see the training information in your shell:

+----------------------------------------------------+
| AINE-DRL Training Start!                           |
|====================================================|
| ID: CartPole-v1_REINFORCE                          |
| Output Path: results/CartPole-v1_REINFORCE         |
|----------------------------------------------------|
| Training INFO:                                     |
|     number of environments: 1                      |
|     total time steps: 20000                        |
|     summary frequency: 1000                        |
|     agent save frequency: 10000                    |
|----------------------------------------------------|
| REINFORCE Agent:                                   |
|     gamma: 0.99                                    |
|     entropy_coef: 0.001                            |
|     device: cpu                                    |
+----------------------------------------------------+

[AINE-DRL] training time: 0.47, time steps: 1000, cumulated reward: 22.23
[AINE-DRL] training time: 0.99, time steps: 2000, cumulated reward: 33.30
[AINE-DRL] training time: 1.47, time steps: 3000, cumulated reward: 48.10

When the training is finished, you can see the training result files (TensorBoard logs, log messages, agent save file) in the results/CartPole-v1_REINFORCE directory.

If you want to see the training results graphically, use:

tensorboard --logdir=results

Inference

Now, let's run inference with the trained agent!

First, add an Inference section under CartPole-v1_REINFORCE in the configuration file config/cartpole_v1_reinforce.yaml:

Inference:
  Config:
    episodes: 3

episodes is the number of episodes to run during inference.

Then, add the inference code to train.py:

AINEInferenceFactory.from_yaml(config_path) \
    .make_env() \
    .make_agent(REINFORCEFactory()) \
    .ready() \
    .inference() \
    .close()

It's similar to the AINETrainFactory code. The AINEInferenceFactory.ready() method makes an Inference instance, the Inference.inference() method starts inference, and the Inference.close() method closes the inference safely.

Now, run the script again by entering the following command:

python train.py

then you can see the inference information in your shell:

[AINE-DRL] Training is already finished.
[AINE-DRL] Agent is successfully loaded from: results/CartPole-v1_REINFORCE/agent.pt
[AINE-DRL] inference - episode: 0, cumulative reward: 302.00
[AINE-DRL] inference - episode: 1, cumulative reward: 295.00
[AINE-DRL] inference - episode: 2, cumulative reward: 328.00
[AINE-DRL] Inference is finished.

You can see the real-time rendering of the cart moving!

You may not be satisfied with the inference result, because the cumulative reward is not very high. The reason is that the agent has not been trained for long enough. You can increase the number of training time steps by changing the Train/Config/time_steps setting in the configuration file.
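For example, raising the budget to 100,000 time steps (the exact value is up to you) would look like this:

Train:
  Config:
    time_steps: 100000
    summary_freq: 1000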

AINE-DRL provides several inference options. You can export the inference result as GIF files or pictures (video is not currently supported). Change the Inference/Config section in the configuration file:

Inference:
  Config:
    episodes: 3
    export: gif # default: render_only

then you can see the GIF files in the results/CartPole-v1_REINFORCE/exports/gifs directory.