Training a model

After creating the environment, the user can simply import the file and initialize the environment with the Gym make function. Once the environment is initialized, the user can train any kind of model with any RL algorithm using the Gym env. In the Python script the user also needs to initialize the ROS node and can add any other commands as needed. An example script is shown below:

from kobuki_maze_rl.task_env import kobuki_maze
from frobs_rl.common import ros_gazebo, ros_node
import gym
import rospy

if __name__ == '__main__':
    # Kill all processes related to previous runs
    ros_node.ros_kill_all_processes()

    # Launch Gazebo
    ros_gazebo.launch_Gazebo(paused=True, gui=False)

    # Start node
    rospy.logwarn("Start")
    rospy.init_node('kobuki_maze_train')

    # Launch the task environment
    env = gym.make('KobukiMazeEnv-v0')

Note that the string passed to the Gym make function is the ID of the environment used when registering it.
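For reference, this ID is typically defined when the task environment module registers itself with Gym. A minimal sketch of such a registration is shown below; the entry point and step limit are hypothetical and depend on how kobuki_maze actually registers its environment:

from gym.envs.registration import register

# Hypothetical registration: the real entry point and step limit are defined
# inside the kobuki_maze task environment module.
register(
    id='KobukiMazeEnv-v0',
    entry_point='kobuki_maze_rl.task_env.kobuki_maze:KobukiMazeEnv',
    max_episode_steps=10000,
)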

FRobs_RL Env Wrappers

To change some properties of the environments without modifying the environment class code, the user can use Gym wrappers. These wrappers change properties such as the observation or action space without the need to edit the class directly. This is useful when the user wants to normalize the observation or action space (to change the speed of learning) or to set a limit on the number of steps per episode. FRobs_RL already includes these wrappers under the following names:

  • NormalizeActionWrapper

  • NormalizeObservWrapper

  • TimeLimitWrapper

To use them, the user only needs to import them from the library and pass in the initialized environment. An example using all three wrappers is shown below:

from kobuki_maze_rl.task_env import kobuki_maze
from frobs_rl.common import ros_gazebo, ros_node
import gym
import rospy

from frobs_rl.wrappers.NormalizeActionWrapper import NormalizeActionWrapper
from frobs_rl.wrappers.TimeLimitWrapper import TimeLimitWrapper
from frobs_rl.wrappers.NormalizeObservWrapper import NormalizeObservWrapper

if __name__ == '__main__':
    # Kill all processes related to previous runs
    ros_node.ros_kill_all_processes()

    # Launch Gazebo
    ros_gazebo.launch_Gazebo(paused=True, gui=False)

    # Start node
    rospy.logwarn("Start")
    rospy.init_node('kobuki_maze_train')

    # Launch the task environment
    env = gym.make('KobukiMazeEnv-v0')

    #--- Normalize action space
    env = NormalizeActionWrapper(env)

    #--- Normalize observation space
    env = NormalizeObservWrapper(env)

    #--- Set max steps
    env = TimeLimitWrapper(env, max_steps=15000)
    env.reset()
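With the wrapped environment ready, any Gym-compatible algorithm can be trained on it. The snippet below is a minimal sketch that trains a PPO agent directly with stable-baselines3 (the FRobs_RL model wrappers are covered in the next section; the policy and timestep values here are only illustrative):

from stable_baselines3 import PPO

# Train a PPO agent on the wrapped environment (illustrative hyperparameters).
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)

# Save the trained model for later use.
model.save("kobuki_maze_ppo")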

Included RL models

The next section describes the RL models from stable-baselines3 that are included in FRobs_RL and how to use them.

Environment Wrappers

class NormalizeActionWrapper.NormalizeActionWrapper(env)

Wrapper to normalize the action space.

Parameters

env – (gym.Env) Gym environment that will be wrapped

rescale_action(scaled_action)

Rescale the action from [-1, 1] to [low, high]

Parameters

scaled_action (np.ndarray) – The action to rescale.

Returns

The rescaled action.

Return type

np.ndarray
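A linear rescaling of this kind maps the normalized action back to the environment's action bounds. The function below is an illustrative sketch of the mapping, not the exact library code:

def rescale_action(scaled_action, low, high):
    # Map an action in [-1, 1] linearly to [low, high] (elementwise for arrays).
    return low + 0.5 * (scaled_action + 1.0) * (high - low)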

reset()

Reset the environment

step(action)
Parameters

action (float or int) – Action taken by the agent

Returns

observation, reward, whether the episode is over, additional information

Return type

(np.ndarray, float, bool, dict)

class NormalizeObservWrapper.NormalizeObservWrapper(env)

Wrapper to normalize the observation space.

Parameters

env – (gym.Env) Gym environment that will be wrapped

reset()

Reset the environment

scale_observation(observation)

Scale the observation from [low, high] to [-1, 1].

Parameters

observation (np.ndarray) – Observation to scale

Returns

scaled observation

Return type

np.ndarray
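The observation scaling is the inverse mapping, from the original bounds to [-1, 1]. An illustrative sketch (again, not the exact library code):

def scale_observation(observation, low, high):
    # Map an observation in [low, high] linearly to [-1, 1] (elementwise for arrays).
    return 2.0 * (observation - low) / (high - low) - 1.0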

step(action)
Parameters

action (float or int) – Action taken by the agent

Returns

observation, reward, whether the episode is over, additional information

Return type

(np.ndarray, float, bool, dict)

class TimeLimitWrapper.TimeLimitWrapper(env, max_steps=100)

Wrapper to limit the number of steps per episode.

Parameters
  • env – (gym.Env) Gym environment that will be wrapped

  • max_steps – (int) Max number of steps per episode

reset()

Reset the environment

step(action)
Parameters

action ([float] or int) – Action taken by the agent

Returns

observation, reward, whether the episode is over, additional information

Return type

(np.ndarray, float, bool, dict)