Learning Task-Agnostic Action Spaces for Movement Optimization
This repository contains the source code for the algorithm described in this paper.
Abstract
We propose a novel method for exploring the dynamics of physically based animated characters, and learning a task-agnostic action space that makes movement optimization easier. Like several previous papers, we parameterize actions as target states, and learn a short-horizon goal-conditioned low-level control policy that drives the agent's state towards the targets. Our novel contribution is that with our exploration data, we are able to learn the low-level policy in a generic manner and without any reference movement data. Trained once for each agent or simulation environment, the policy improves the efficiency of optimizing both trajectories and high-level policies across multiple tasks and optimization algorithms. We also contribute novel visualizations that show how using target states as actions makes optimized trajectories more robust to disturbances; this manifests as wider optima that are easy to find. Due to its simplicity and generality, our proposed approach should provide a building block that can improve a large variety of movement optimization methods and applications.
Prerequisites
- Python 3.5 or above
- cma
- glfw
- gym
- Keras
- mujoco-py
- numpy
- opencv-python
- pandas
- Pillow
- stable-baselines
- tensorflow
More detailed requirements are specified in requirements.txt.
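Assuming the pinned versions in requirements.txt, the dependencies can be installed with pip; a minimal setup sketch (the exact commands may vary by platform and Python version):

```shell
# Create an isolated environment and install the pinned dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```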
Code Structure
Primary scripts
- NaiveExplorer.py: The script for generating the exploration data using naive exploration
- ContactExplorer.py: The script for generating the exploration data using the proposed contact-based exploration algorithm
- produce_llcs.py: The script for training the LLCs using the exploration data
- offline_trajectory_optimization.py: The script for offline trajectory optimization using CMA-ES
- online_trajectory_optimization.py: The script for online trajectory optimization using a simplified version of Fixed-Depth Informed MCTS (FDI-MCTS)
- RL_Trainer.py: The script for reinforcement learning using PPO or SAC
- RL_Renderer.py: The script for rendering policies trained using PPO or SAC
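A typical end-to-end run follows the order of the scripts above: generate exploration data, train the LLCs, then optimize or learn policies on top of them. The invocations below are only a sketch assuming each script runs with its default settings; any actual command-line arguments are defined in the scripts themselves:

```shell
# 1. Generate exploration data with the proposed contact-based method
#    (or use NaiveExplorer.py for the naive baseline)
python ContactExplorer.py

# 2. Train the low-level controllers (LLCs) on the exploration data
python produce_llcs.py

# 3. Optimize using the trained LLCs: offline trajectories (CMA-ES),
#    online trajectories (FDI-MCTS), or reinforcement learning (PPO/SAC)
python offline_trajectory_optimization.py
python RL_Trainer.py

# 4. Render a policy trained with PPO or SAC
python RL_Renderer.py
```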
Secondary scripts
- LLC.py: The script for implementing and training state-reaching LLCs
- MLP.py: Neural network helper class
- logger.py: The logger script, taken from the OpenAI Baselines repository
- RenderTimer.py: Helper script for realtime rendering
Data and models (used in the paper)
- ExplorationData: The folder containing the exploration data generated using the naive and contact-based exploration methods
- Models: The folder containing all the LLCs for the two exploration methods, four agents, and five horizon values (both in multi-target and single-target mode)