
motion-policy-networks's Introduction

Motion Policy Networks

This repo contains the expert data generation infrastructure and the PyTorch implementation of Motion Policy Networks (MPiNets).


Installation

The easiest way to install the code here is to build our included Docker container, which contains all of the dependencies for data generation, model training, inference, and a ROS-based interactive demo. Due to the many components, this container requires ~30 GB when built. Despite the sizeable storage requirement, we use Docker for several reasons:

  1. For the data generation pipeline, some of our dependencies cannot be installed easily via PyPI
  2. Docker makes it easy to manage the entire system environment, e.g. the CUDA runtime, the various upstream dependencies, ROS, etc.

If you have a strong need to build this repo on your host machine, you can follow the same steps as outlined in the Dockerfile. For example, if you are limited in disk space, have your own expert data generation pipeline, or simply intend to use our publicly available datasets, you only need to install the learning dependencies (the different areas are well documented in the Dockerfile). Likewise, if you do not intend to use the interactive demo, you do not need to install the ROS dependencies.

If you intend to run the data generation pipeline, you must use Isaac Sim to get access to Geometric Fabrics. Again, you can install this on your host machine, but we recommend using our included Dockerfile, which streamlines the installation process.

In order to build the Docker container included in this repo, you will need access to the Isaac Sim docker container (our container is built on top of it). You can find detailed instructions on how to do this here, but we provide a summary as well.

First, you will need to join the NVIDIA Developer Program. After joining, you can generate your NGC API key as described here.

With your NGC API key, you must log in to NGC from your host computer. As your username, use the string $oauthtoken. As your password, use your NGC API key. To log in, use the command

sudo docker login nvcr.io
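
If the login succeeds, the prompt exchange should look roughly like this (the placeholder stands in for your NGC API key):

Username: $oauthtoken
Password: <YOUR NGC API KEY>
Login Succeeded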

Next, clone this repo using:

git clone https://github.com/NVlabs/motion-policy-networks.git

Navigate inside the repo (e.g. cd motion-policy-networks) and build the docker with

docker build --tag mpinets --network=host --file docker/Dockerfile .

After this is built, you should be able to launch the docker using this command (be sure to use the correct paths on your system for the /PATH/TO/THE/REPO arg)

docker run --interactive --tty --rm --gpus all --network host --privileged --env DISPLAY=unix$DISPLAY --env XAUTHORITY --env NVIDIA_DRIVER_CAPABILITIES=all --env "ACCEPT_EULA=Y" --volume /PATH/TO/THE/REPO:/root/mpinets mpinets /bin/bash -c 'export PYTHONPATH=/root/mpinets:$PYTHONPATH; git config --global --add safe.directory /root/mpinets; /bin/bash'

In order to run any GUI-based code in the docker, be sure to add the correct user to xhost on the host machine. You can do this by running xhost +si:localuser:root in another terminal on the host machine.

Our suggested development setup would be to have two terminals open, one running the docker (use this one for running the code) and another editing code on the host machine. The docker run command above will mount your checkout of this repo into the docker, allowing you to edit the files from either inside the docker or on the host machine.

Usage

All usage described below must happen inside the Docker container; assume that every command that follows is run inside it.

In order to easily import mpinets as a module in our Python scripts, we first install it in development (editable) mode as follows:

pip install -e .
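
To confirm the editable install worked, a quick import check suffices:

python3 -c "import mpinets"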

Inference With Motion Policy Networks

In order to run inference, you will need a pretrained model file. This will be generated by the training script if you wish to retrain the model, but you can also download a pretrained set of weights. You can download those weights here.

We provide an inference script, run_inference.py, that runs the policy and visualizes the results. This script requires a model checkpoint and a set of test problems. These problems must use the following format.

# NOTE: these imports are our best reconstruction of the upstream types: SE3,
# Cuboid, and Cylinder come from the geometrout package used by this repo, and
# Obstacles is assumed to alias a list of such primitives.
from dataclasses import dataclass, field
from typing import List, Optional, Union

import numpy as np
from geometrout.primitive import Cuboid, Cylinder
from geometrout.transform import SE3

Obstacles = List[Union[Cuboid, Cylinder]]


@dataclass
class PlanningProblem:
    """
    Defines a common interface to describe planning problems
    """

    # The problem target in the `right_gripper` frame
    target: SE3

    # A trajectory is only successful if the final pose is within this region
    target_volume: Union[Cuboid, Cylinder]

    # The starting configuration
    q0: np.ndarray

    # The obstacles in the scene
    obstacles: Optional[Obstacles] = None

    # An alternate representation of the obstacles
    obstacle_point_cloud: Optional[np.ndarray] = None

    # A trajectory is not successful if its final pose is inside any of these regions
    target_negative_volumes: Obstacles = field(default_factory=list)
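
For concreteness, a minimal problem instance might be constructed as follows. Every value is made up, and the SE3 and Cuboid constructor calls are assumptions about the geometrout API rather than verified signatures:

# Hypothetical example; all poses, dimensions, and joint values are illustrative
problem = PlanningProblem(
    # Target pose of the right_gripper frame: position plus wxyz quaternion (assumed signature)
    target=SE3(np.array([0.55, 0.0, 0.35]), np.array([0.0, 1.0, 0.0, 0.0])),
    # Success region around the target (assumed Cuboid signature)
    target_volume=Cuboid(center=[0.55, 0.0, 0.35], dims=[0.1, 0.1, 0.1], quaternion=[1.0, 0.0, 0.0, 0.0]),
    # A starting configuration for the Franka's seven joints
    q0=np.array([0.0, -0.78, 0.0, -2.36, 0.0, 1.57, 0.78]),
    # A single tabletop-like slab as the obstacle set
    obstacles=[Cuboid(center=[0.55, 0.0, 0.0], dims=[1.0, 1.0, 0.02], quaternion=[1.0, 0.0, 0.0, 0.0])],
)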

We have three sets of test problems that you can download: those solvable by the Global Planner, those solvable by the Hybrid Planner, and those solvable by both planners.

The inference script uses MPiNets to solve problems in this format and reports metrics on their performance. By default, it will also simulate the rollout in PyBullet and visualize the perception with Meshcat. When you run the script, it will print a URL that you can visit in a browser on your host machine to view the point cloud of the scene.

This script allows you to choose a scene type and a class of problem, or to evaluate all of them. For example, to see visuals of the policy in a tabletop setting moving from a neutral pose to a task-oriented pose, where the problems are solvable by the hybrid planner, run the following. Be sure to use the correct paths to the checkpoint and test problems.

python3 mpinets/mpinets/run_inference.py /PATH/TO/corl_2022_hybrid_expert_checkpoint.ckpt /PATH/TO/hybrid_solvable_problems.pkl tabletop neutral-start

To see a set of dresser problems solvable by the global expert, starting from a neutral pose:

python3 mpinets/mpinets/run_inference.py /PATH/TO/corl_2022_hybrid_expert_checkpoint.ckpt /PATH/TO/global_solvable_problems.pkl dresser neutral-start

To evaluate the problems and quickly see the metrics, you can also skip the visualization. For example, to run all scene types and all problem types without visuals for problems solvable by both planners, run the following.

python3 mpinets/mpinets/run_inference.py /PATH/TO/corl_2022_hybrid_expert_checkpoint.ckpt /PATH/TO/both_solvable_problems.pkl all all --skip-visuals

To see all of the options available, run

python3 mpinets/mpinets/run_inference.py --help

Interactive Demo Using ROS

In addition to the inference scripts described above, we also provide a way to run the network interactively in order to play around with it using real-world data. This demo requires ROS, which is included in the Dockerfile.

In order to run this demo, first download the real-world data, which comes in a different format from the inference problems above. You can download this data here. Just as in the inference examples, you will also need a model checkpoint. You can train your own or use our pretrained checkpoint (see the section on inference for the link). After downloading these files, place them somewhere accessible to the docker.

Upon launching the docker, you will first need to create a catkin workspace and place the relevant packages in the src directory. You can move the directories into the src folder or, even better, you can create symbolic links. The advantage of symbolic links is that you can still maintain the same developer workflow suggested above (edit on host machine, run in docker) because you haven't actually moved the directories out of the git repo. Finally, you must build and source the repo. To do so, run the following in the docker

mkdir -p /root/catkin_ws/src
cd /root/catkin_ws
ln -s /root/mpinets/interactive_demo/mpinets_ros src/mpinets_ros
ln -s /root/mpinets/interactive_demo/mpinets_msgs src/mpinets_msgs
catkin build && source devel/setup.bash

Next, you will need to run the launch file for the demo.

roslaunch mpinets_ros visualize.launch mdl_path:=/PATH/TO/corl_2022_hybrid_expert_checkpoint.ckpt point_cloud_path:=/PATH/TO/real_point_cloud_data.npy

This will bring up Rviz and an environment with a tabletop, a robot, a floating gripper, and some blocks. The floating gripper allows you to set the target for the policy. The blocks are buttons. You will have to wait until the script says 'System Ready' before planning. Click on the red box to reset the demo, the yellow box to plan a path to the currently set target, and the green button to execute. After planning, you will see a semi-transparent robot looping over the plan to show you what it looks like. The opaque robot is the "real" robot and will move only when you click execute.

Training Motion Policy Networks

Using our pregenerated data

We provide the dataset of Hybrid Expert trajectories that we used to train our model in Motion Policy Networks. You can download this data here.

If you are using the Docker container, you will need to mount this data after downloading it. If the data is downloaded to /PATH/TO/THE/DATA, you can run the Docker container with a command similar to the one above:

docker run --interactive --tty --rm --gpus all --network host --env DISPLAY=unix$DISPLAY --env XAUTHORITY --env NVIDIA_DRIVER_CAPABILITIES=all --env "ACCEPT_EULA=Y" --volume /PATH/TO/THE/REPO:/root/mpinets --volume /PATH/TO/THE/DATA:/data mpinets /bin/bash -c 'export PYTHONPATH=/root/mpinets:$PYTHONPATH; /bin/bash'

Training the model

Once you have the data, either by generating the dataset or by downloading ours, you can train the model using run_training.py. This script expects a configuration file and we provide a sample at jobconfig.yaml. Look through this configuration file and modify the necessary paths before training.

Then, to run training, use:

python3 mpinets/mpinets/run_training.py mpinets/jobconfig.yaml

We use Weights and Biases for logging training jobs, but you can disable this logger using:

python3 mpinets/mpinets/run_training.py mpinets/jobconfig.yaml --no-logging

There are a few other run-time flags, namely a test mode and a way to disable model checkpoints. Run python3 mpinets/mpinets/run_training.py --help to learn more.

Data Generation

If you would like to generate the data yourself, we provide scripts we used to generate our dataset. These scripts are designed to work in a cloud-based system, but generating a large dataset will require some data management. To generate the data in Motion Policy Networks, we used a cluster of 80 server nodes running these scripts in parallel.

The scripts we provide are designed to preserve an even distribution of data in the dataset. Some environments are easier than others, so data for them can be produced more quickly. To ensure evenness, you will first generate data for each individual problem type, downsize the individual sets to match, and then merge them together (sketched below).
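
To make the downsizing step concrete, here is a minimal sketch of balancing per-problem-type sets to the size of the smallest one before merging. It is illustrative only and assumes a hypothetical in-memory structure; the real logic operates on the generated files via process_data.py:

import random

def balance_and_merge(datasets):
    # `datasets` maps problem type -> list of trajectories (hypothetical structure)
    smallest = min(len(trajectories) for trajectories in datasets.values())
    # Randomly downsample every set to the size of the smallest one
    balanced = {name: random.sample(t, smallest) for name, t in datasets.items()}
    # Merge and shuffle so no problem type dominates any slice of the data
    merged = [traj for trajectories in balanced.values() for traj in trajectories]
    random.shuffle(merged)
    return merged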

Generating Training Data for Individual Scene Types

You can use gen_data.py to generate data for a single type of environment, either between two task-oriented poses or between a neutral pose and a task-oriented one. The script has a self-explanatory help string, which you can access with python3 mpinets/mpinets/data_pipeline/gen_data.py --help.

Some examples of data generation:

To visualize some examples generated from a set of dressers where the robot reaches between drawers, you can do

cd /root/mpinets/mpinets/data_pipeline
python3 gen_data.py dresser test-environment

To run a similar visualization test where the trajectories all start or end with a collision-free neutral pose, you can use

cd /root/mpinets/mpinets/data_pipeline
python3 gen_data.py dresser --neutral test-environment

To test the data pipeline (i.e. including how the data is saved on disk) and generate a small dataset within randomized tabletop environments using only task-oriented poses, you can run

mkdir -p /root/data/tabletop/task_oriented/1
python3 /root/mpinets/data_pipeline/gen_data.py tabletop test-pipeline /root/data/tabletop/task_oriented/1/

To generate a large dataset of trajectories in 2x2 cubby environments where each trajectory begins or ends with a neutral configuration, run

mkdir -p /root/data/cubby/neutral/1
python3 /root/mpinets/data_pipeline/gen_data.py cubby --neutral full-pipeline /root/data/cubby/neutral/1/

Likewise, to generate merged cubby scenes, where the necessary cubbies in each scene are merged to allow for unobstructed paths between the start configurations and goal poses, replace cubby with merged-cubby in the command above.

mkdir -p /root/data/merged-cubby/neutral/1
python3 /root/mpinets/data_pipeline/gen_data.py merged-cubby --neutral full-pipeline /root/data/merged-cubby/neutral/1/

Data Cleaning

After generating the data, you will need to clean it and merge it into a single dataset. In the Motion Policy Networks dataset, we use the following proportions:

  • Cubby Neutral: 1 / 12
  • Cubby Task Oriented: 1 / 12
  • Merged Cubby Neutral: 1 / 12
  • Merged Cubby Task Oriented: 1 / 12
  • Dresser Neutral: 1 / 6
  • Dresser Task Oriented: 1 / 6
  • Tabletop Neutral: 1 / 6
  • Tabletop Task Oriented: 1 / 6

We provide a script, process_data.py, that takes the output of gen_data.py and cleans it up. After running the full-pipeline mode of gen_data.py, you will have a file in the specified directory called all_data.hdf5. This holds all the data from that multi-process run: data for a single environment class in which either every trajectory starts or ends with a neutral pose, or none does. The next steps are to clean this data (e.g. keep only the problems with a hybrid solution), downsize the dataset to matching sizes across problem types, split it into train, validation, and test groups, and merge these individual datasets across the various scene types.

These steps are all documented in the output of process_data.py --help.

After running the various stages of this script, you will have a dataset that looks like

FINAL_DATA_PATH/
  train/
    train.hdf5
  val/
    val.hdf5
  test/
    test.hdf5
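
If you want to sanity-check what ended up in the splits, h5py can list their contents. This snippet assumes nothing about the dataset keys, which are whatever process_data.py wrote; it only inspects:

import h5py

# Print each top-level entry in the training split along with its shape
with h5py.File("FINAL_DATA_PATH/train/train.hdf5", "r") as f:
    for key in f.keys():
        print(key, getattr(f[key], "shape", "(group)"))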

Generating Inference Data for Individual Scene Types

Inference data differs from training data because it only ever has a single problem per environment. These problems are always solvable by one or both expert pipelines. To generate this data for a single scene type, use the same gen_data.py script with a different argument set. For example, to generate 300 task-oriented problems in dresser scenes solvable by the global pipeline, use the following commands. Note that the output file path is a .pkl file.

mkdir -p /root/data/inference/global
python3 /root/mpinets/data_pipeline/gen_data.py dresser for-inference global 300 /root/data/inference/global/dresser_task.pkl

When generating "neutral" problems, the number requested must be even because the inference command will generate an equal number of problems to and from neutral poses; if you request 300 problems, it will generate 150 in each direction. For example, to generate 300 problems solvable by the hybrid expert in tabletop settings going to or from neutral poses, use this command. Note that the final problems stored will already have had hindsight goal revision applied.

mkdir -p /root/data/inference/hybrid/
python3 /root/mpinets/data_pipeline/gen_data.py tabletop --neutral for-inference hybrid 300 /root/data/inference/hybrid/tabletop_neutral.pkl

When generating inference problems solvable by both, the script will first solve the Hybrid problem, then use hindsight goal revision, and then solve the revised problem with the global planner. For example, to generate 150 task-oriented problems in cubby scenes with merged cubbies, use the following commands:

mkdir -p /root/data/inference/both
python3 /root/mpinets/data_pipeline/gen_data.py merged-cubby for-inference both 150 /root/data/inference/both/merged_cubby_task.pkl

Finally, you can merge these into a single test set (such as the one we provide above) by simply merging the appropriate pickle files. For example, to merge all problems solvable by the global pipeline, run the following in a python3 interpreter.

from pathlib import Path
import pickle

# Gather every per-scene-type problem file generated above
paths = list(Path('/root/data/inference/global/').glob('*.pkl'))
global_data = {}
for p in paths:
    with open(p, 'rb') as f:
        # Merge dictionaries; later files override earlier ones on duplicate keys
        global_data = {**global_data, **pickle.load(f)}
with open('/root/data/inference/global/all_global.pkl', 'wb') as g:
    pickle.dump(global_data, g)

License

This work is released under the MIT License.

Citation

If you find this work useful in your research, please cite:

@inproceedings{fishman2022mpinets,
  title={Motion Policy Networks},
  author={Fishman, Adam and Murali, Adithyavairavan and Eppner, Clemens and Peele, Bryan and Boots, Byron and Fox, Dieter},
  booktitle={Proceedings of the 6th Conference on Robot Learning (CoRL)},
  year={2022}
}


motion-policy-networks's Issues

LULA Collision Checking Library - False Negatives on Collision Detection during MPINets Evaluation

During our recent evaluation of the LULA collision checking library, specifically lines 271-292 of the metrics.py file, we identified a critical problem. The library is producing false negatives, incorrectly indicating no collision when a custom trajectory (a trajectory from our own framework) actually collides with obstacles in one of the MPiNets environments (rack). We have conducted a thorough review of the implementation and verified that the trajectories we pass to the library do indeed collide with objects present in the environment.

To better illustrate this issue, we have attached a video showcasing a trajectory that collides with the MPiNets environment (timestamps 0:08 to 0:11 in the video), while the LULA library reports no collision. This discrepancy is causing difficulties in benchmarking our framework against MPiNets. How can we resolve this issue with LULA?

Please find the video here: https://drive.google.com/file/d/1P9_XTqY-EQYflpyEGDE2cTPP2wZ8n5EG/view?usp=sharing

provide a docker image link

Hi, thanks for your work on the motion planning policy. Could you provide a download link for a prebuilt docker image? It would be much appreciated!

Problems when training the Network.

Hi,
Thanks for open-sourcing such wonderful work. I am trying to reimplement the code these days. However, I found that the training time is too long. I used 8 Tesla V100s to train the network with only 300,000 (not 3 million) trajectories, but it takes me 50 hours to train a single epoch! I did not modify the code except for some necessary parts (e.g. data_dir). It seems that most of the time is spent on processing the raw data into point clouds (which runs on the CPU), so I cannot improve the speed by using more GPUs. Could you please tell me how many epochs you trained for and how long each epoch took? Is there anything else we need to change when training our own network?

Problem when running run_inference

When running run_inference, where can corl_2022_hybrid_expert_checkpoint.ckpt be found? When using a weight file, the following error is displayed at runtime:

line 469, in <module>
    problems = pickle.load(f)
ModuleNotFoundError: No module named 'mpinets'

Request for Training Setup Details

Hello, I'm interested in replicating your neural network training for the whole model in the paper. Could you please provide some details on your training setup? Specifically, I'm curious to know:

What GPU did you use for training?
How many GPUs did you use?
How long did the training process take?
Do you have any suggestions for accelerating training?

I'm asking because I'm currently unable to train the model due to a lack of available GPU resources in my lab. I'll need to rent GPU resources online, so it would be helpful to have an estimate of the resources required and the potential cost. Thank you in advance for your help!

Render franka with colors

Hi!

Why not render the Franka arm with the black-and-white colors from the original files in PyBullet, as in Figure 1 of your paper?

This is just a minor suggestion to enhance the visualization.

Would it be possible to consider changing the relevant files in robofin?

Thanks!

A problem when reading

Sorry to bother you again, I have a question. In the paper, I saw that you did experiments on MPNet, but I found that their dataset was not available. How did you implement it at that time? Do you have the relevant data?

Exception has occurred: RuntimeError: Points must be a float tensor

Hi,

While running the training script with the provided checkpoint and hybrid dataset, I am encountering the following error:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
points.scalar_type() == at::ScalarType::Float INTERNAL ASSERT FAILED at "PATH/TO/Pointnet2_PyTorch/pointnet2_ops_lib/pointnet2_ops/_ext-src/src/group_points.cpp":15, please report a bug to PyTorch. Points must be a float tensor

The fishbotics pointnet2_ops package is not installing (it gets stuck during installation), so I tried Erik Wijmans' Pointnet2_PyTorch/pointnet2_ops_lib, which installed fine. There was no issue running/debugging run_inference.py, but run_training.py gives the above error. The problem seems to be with this C++ function call (pointnet2_ops/_ext-src/src/group_points.cpp):

        return _ext.group_points(features, idx) # line 214, pointnet2_utils.py

The features tensor is of type float16 when this error is thrown; when it is float32, there is no error. Why is this happening? Any ideas?

NOTE: with a features.float() conversion, I do not get this error. But I am not sure why this happens in the first place.
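
For reference, the workaround described in the note is a one-line cast at the call site in pointnet2_utils.py (the reporter's local patch, not an upstream fix):

        return _ext.group_points(features.float(), idx)  # cast float16 -> float32 before the CUDA op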

TypeError: __init__() missing 1 required positional argument: 'target_volume'

When I try to run python3 /root/mpinets/data_pipeline/gen_data.py dresser for-inference global 300 /root/data/inference/global/dresser_task.pkl for data generation, it gives the following errors:
0%| | 0/300 [11:37<?, ?it/s]
Traceback (most recent call last):
  File "/root/mpinets/data_pipeline/gen_data.py", line 1076, in <module>
    generate_inference_data(args.expert, args.how_many, args.save_path)
  File "/root/mpinets/data_pipeline/gen_data.py", line 948, in generate_inference_data
    generate_task_oriented_inference_data(expert_pipeline, how_many, save_path)
  File "/root/mpinets/data_pipeline/gen_data.py", line 848, in generate_task_oriented_inference_data
    obstacles=result.cuboids + result.cylinders,
TypeError: __init__() missing 1 required positional argument: 'target_volume'

I just followed the instructions in the Readme, so I cannot figure out why it gives such an error.

ERROR: failed to solve: nvcr.io/nvidia/isaac-sim:2022.1.0

Hi!

Following the description and recommendation in the Readme, I decided to do the installation with docker, and it is a good chance for me to learn something about docker.

I have finished sudo docker login nvcr.io and logged in successfully, but encountered this error when executing docker build.

Here is the error report:

[+] Building 4.7s (3/3) FINISHED                                                                                                          docker:default
 => [internal] load build definition from Dockerfile                                                                                                0.0s
 => => transferring dockerfile: 7.94kB                                                                                                              0.0s
 => [internal] load .dockerignore                                                                                                                   0.0s
 => => transferring context: 2B                                                                                                                     0.0s
 => ERROR [internal] load metadata for nvcr.io/nvidia/isaac-sim:2022.1.0                                                                            4.5s
------
 > [internal] load metadata for nvcr.io/nvidia/isaac-sim:2022.1.0:
------
Dockerfile:24
--------------------
  22 |     
  23 |     
  24 | >>> FROM nvcr.io/nvidia/isaac-sim:2022.1.0 
  25 |     
  26 |     WORKDIR /root
--------------------
ERROR: failed to solve: nvcr.io/nvidia/isaac-sim:2022.1.0: pulling from host nvcr.io failed with status code [manifests 2022.1.0]: 401 Unauthorized

I have little experience with docker and must be missing something.

Any help would be appreciated! Thanks!

A problem about ros

Sorry to bother you. I currently want to generate trajectories in ROS. You have generated position information through MPiNets in the article, but in RViz it seems that information such as velocity and acceleration is required for execution. If I simply publish position information, can it be displayed in MoveIt? Secondly, I saw the message method you posted in the code, but when I use it, the trajectory built from the position information I publish seems to be missing in RViz. Can you help me with this? Thank you so much!

what do hdf5 file numbers represent

I appreciate your work. What are all those numbers for a specific shape in the hdf5 file, and what do they represent?

Is there any pre-trained model for this network?

ERROR:import lula

I manage the environment in conda. When I used gen_data, I found that I could not import lula, so I went to download Isaac Sim. I planned to reuse the part of your Docker setup that installs Isaac Sim, but found that it is not downloaded inside the Dockerfile itself (maybe it's my lack of knowledge). The official download method seems very difficult. How should I install Isaac Sim so that the import lula problem is solved?

git clone URL does not work

The URL provided for cloning the repo in the Installation section seems to be wrong.
git clone https://gitlab-master.nvidia.com/srl/motion-policy-networks

Please provide the updated URL.
