Pipeline for training MLAgents environments using Kaggle Notebooks

Why this combination?

The main advantage of the Kaggle core is the ability to work in the background for quite a long time for free (compared to Colab). This allows you not to use your own computer for rather lengthy calculations. By "Commit & Run" your notebook, you can get up to 12 hours of computing time in the background, with the output saved and available for download at any time.

Unity MLAgents makes it easy to create your own reinforcement learning environments and has connections to Python and Gym to standardize the environment.

Stable Baselines provides state-of-the-art, robust reinforcement learning methods, as well as easy tracking of experiments in Tensorboard.

What you should pay attention to

MLAgents, Gym and Stable Baselines may conflict with each other due to different versions of the packages themselves, as well as additionally installed dependencies (such as numpy). This is made worse by the fact that we cannot control the version of python in environments such as Kaggle Notebook or Colab. Therefore, when installing, you should ignore the python version, and install the additional shimmy package for stable baselines:

!pip install --ignore-requires-python mlagents==1.0.0
!pip install stable-baselines3
!pip install shimmy>=0.2.1

In addition, you need to remember that you do not have a display to display the Unity environment, so the visualization option must be disabled:

unity_env = UnityEnvironment(env_path, no_graphics=True)

Converting to a Unity model

To convert the resulting SAC model into .onnx format for use in Unity, you can run the script onnx_convert.py after first changing the CONTINUOUS_ACTIONS_SIZE to your own and placing saс_model.zip in the same folder.

Enviroment and results

My own environment was used, created in Unity using Ariticulation Body (allows you to achieve realism in joints and actuators for robotics). The environment is a spider with 8 degrees of freedom. A reward is given for each step in proportion to the speed in the desired direction with a shift (negative reward for standing still). More details about the environment will be written in another repository. The Soft Aсtor-Critic algorithm from Stable Baselines 3 taught a spider to walk in 100,000 steps.

Tensorboard Visualization

To visualize the results, you can use Google Colab by loading the resulting tensorboard archive into the content folder and calling the console commands:

!unzip ./sac_spyder_tensorboard.zip
%load_ext tensorboard
%tensorboard --logdir /content/kaggle/working/sac_spyder_tensorboard

denisgriaznov / ml_agents_custom_enviroment_kaggle_notebook Goto Github PK

ml_agents_custom_enviroment_kaggle_notebook's Introduction

Pipeline for training MLAgents environments using Kaggle Notebooks

Why this combination?

What you should pay attention to

Converting to a Unity model

Enviroment and results

Tensorboard Visualization

ml_agents_custom_enviroment_kaggle_notebook's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent