Training a model

After creating the environment, the user can simply import the file and initialize the environment with the Gym make function. Once the environment is initialized, the user can train any kind of model with any RL algorithm using the Gym env. In the Python script the user also needs to initialize the ROS node and can add any other commands as needed. An example script is shown below:

from kobuki_maze_rl.task_env import kobuki_maze
from frobs_rl.common import ros_gazebo, ros_node
import gym
import rospy

if __name__ == '__main__':
    # Kill all processes related to previous runs
    ros_node.ros_kill_all_processes()

    # Launch Gazebo
    ros_gazebo.launch_Gazebo(paused=True, gui=False)

    # Start node
    rospy.logwarn("Start")
    rospy.init_node('kobuki_maze_train')

    # Launch the task environment
    env = gym.make('KobukiMazeEnv-v0')

Note that the string passed to the Gym make function is the ID of the environment used when registering it.
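For reference, this ID is typically defined when the task environment module registers itself with Gym. A minimal sketch of such a registration is shown below; the entry point and step limit are hypothetical and depend on how kobuki_maze actually registers its environment:

from gym.envs.registration import register

# Hypothetical registration: the real entry point and step limit are defined
# inside the kobuki_maze task environment module.
register(
    id='KobukiMazeEnv-v0',
    entry_point='kobuki_maze_rl.task_env.kobuki_maze:KobukiMazeEnv',
    max_episode_steps=10000,
)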

FRobs_RL Env Wrappers

To change some properties of the environments without modifying the environment class code, the user can use Gym wrappers. These wrappers change properties such as the observation or action space without the need to edit the class directly. This is useful when the user wants to normalize the observation or action space (to change the speed of learning) or to set a limit on the number of steps per episode. FRobs_RL already includes these wrappers under the following names:

  • NormalizeActionWrapper

  • NormalizeObservWrapper

  • TimeLimitWrapper

To use them, the user only needs to import them from the library and pass in the initialized environment. An example using all three wrappers is shown below:

from kobuki_maze_rl.task_env import kobuki_maze
from frobs_rl.common import ros_gazebo, ros_node
import gym
import rospy

from frobs_rl.wrappers.NormalizeActionWrapper import NormalizeActionWrapper
from frobs_rl.wrappers.TimeLimitWrapper import TimeLimitWrapper
from frobs_rl.wrappers.NormalizeObservWrapper import NormalizeObservWrapper

if __name__ == '__main__':
    # Kill all processes related to previous runs
    ros_node.ros_kill_all_processes()

    # Launch Gazebo
    ros_gazebo.launch_Gazebo(paused=True, gui=False)

    # Start node
    rospy.logwarn("Start")
    rospy.init_node('kobuki_maze_train')

    # Launch the task environment
    env = gym.make('KobukiMazeEnv-v0')

    #--- Normalize action space
    env = NormalizeActionWrapper(env)

    #--- Normalize observation space
    env = NormalizeObservWrapper(env)

    #--- Set max steps
    env = TimeLimitWrapper(env, max_steps=15000)
    env.reset()
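With the wrapped environment ready, any Gym-compatible algorithm can be trained on it. The snippet below is a minimal sketch that trains a PPO agent directly with stable-baselines3 (the FRobs_RL model wrappers are covered in the next section; the policy and timestep values here are only illustrative):

from stable_baselines3 import PPO

# Train a PPO agent on the wrapped environment (illustrative hyperparameters).
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)

# Save the trained model for later use.
model.save("kobuki_maze_ppo")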

Included RL models

The next section describes the RL models from stable-baselines3 that are included in FRobs_RL and how to use them.

Environment Wrappers

class NormalizeActionWrapper.NormalizeActionWrapper(env)

Wrapper to normalize the action space.

Parameters

env – (gym.Env) Gym environment that will be wrapped

rescale_action(scaled_action)

Rescale the action from [-1, 1] to [low, high]

Parameters

scaled_action (np.ndarray) – The action to rescale.

Returns

The rescaled action.

Return type

np.ndarray
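A linear rescaling of this kind maps the normalized action back to the environment's action bounds. The function below is an illustrative sketch of the mapping, not the exact library code:

def rescale_action(scaled_action, low, high):
    # Map an action in [-1, 1] linearly to [low, high] (elementwise for arrays).
    return low + 0.5 * (scaled_action + 1.0) * (high - low)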

reset()

Reset the environment

step(action)
Parameters

action (float or int) – Action taken by the agent

Returns

observation, reward, whether the episode is over, additional information

Return type

(np.ndarray, float, bool, dict)

class NormalizeObservWrapper.NormalizeObservWrapper(env)

Wrapper to normalize the observation space.

Parameters

env – (gym.Env) Gym environment that will be wrapped

reset()

Reset the environment

scale_observation(observation)

Scale the observation from [low, high] to [-1, 1].

Parameters

observation (np.ndarray) – Observation to scale

Returns

scaled observation

Return type

np.ndarray
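The observation scaling is the inverse mapping, from the original bounds to [-1, 1]. An illustrative sketch (again, not the exact library code):

def scale_observation(observation, low, high):
    # Map an observation in [low, high] linearly to [-1, 1] (elementwise for arrays).
    return 2.0 * (observation - low) / (high - low) - 1.0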

step(action)
Parameters

action (float or int) – Action taken by the agent

Returns

observation, reward, whether the episode is over, additional information

Return type

(np.ndarray, float, bool, dict)

class TimeLimitWrapper.TimeLimitWrapper(env, max_steps=100)

Wrapper to limit the number of steps per episode.

Parameters
  • env – (gym.Env) Gym environment that will be wrapped

  • max_steps – (int) Max number of steps per episode

reset()

Reset the environment

step(action)
Parameters

action ([float] or int) – Action taken by the agent

Returns

observation, reward, whether the episode is over, additional information

Return type

(np.ndarray, float, bool, dict)