
tuplan_garage's Introduction

tuPlan Garage

A Framework for Vehicle Motion Planning Research


Parting with Misconceptions about Learning-based Vehicle Motion Planning
Daniel Dauner1,2, Marcel Hallgarten1,3, Andreas Geiger1,2, and Kashyap Chitta1,2
1 University of Tübingen, 2 Tübingen AI Center, 3 Robert Bosch GmbH

Conference on Robot Learning (CoRL), 2023
Winner, 2023 nuPlan Challenge

This repo is intended to serve as a starting point for vehicle motion planning research on nuPlan. We provide a publicly accessible configuration for validation, a comprehensive set of baselines, and pre-trained planning models.



News

  • 30 Aug, 2023: Our paper was accepted at CoRL 2023!
  • 20 Aug, 2023: We renamed our repository to "tuPlan Garage" due to trademark conflicts.
  • 01 Aug, 2023: We released the code for GC-PGP!
  • 26 Jun, 2023: We released our supplementary material and the code for PDM-Closed.
  • 14 Jun, 2023: We released our paper on arXiv.
  • 2 Jun, 2023: Our approach won the 2023 nuPlan Challenge!

Overview

  • The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements.

  • Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently.

  • We further assess the current state of closed-loop planning in the field, revealing the limitations of learning-based methods in complex real-world scenarios and the value of simple rule-based priors such as centerline selection through lane graph search algorithms.

  • More surprisingly, for the open-loop sub-task, we observe that the best results are achieved when using only this centerline as scene context (i.e., ignoring all information regarding the map and other agents).

  • Combining these insights, we propose an extremely simple and efficient planner which outperforms an extensive set of competitors, winning the nuPlan planning challenge 2023.


Videos

Here are four videos for talks and visualizations of our method:


Contributing

If you are considering contributing to tuPlan Garage, please check out our Contribution Guidelines.

Method

We decompose the process of determining a safe and comfortable trajectory into two sub-tasks: (1) planning the short-term motion, and (2) accurately forecasting the long-term ego trajectory. While the former primarily impacts closed-loop performance, the latter is essential for the open-loop task. Our method employs a rule-based predictive planner to generate a trajectory proposal, and a learned ego-forecasting module that refines the trajectory with a particular emphasis on long-term forecasting.
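The decomposition can be illustrated with a toy sketch. It assumes, as a simplification that this README does not spell out, that the rule-based proposal is kept unchanged up to a short correction horizon and that learned offsets are only added to later waypoints. The function name `refine_trajectory`, the parameter `correction_horizon_s`, and all numbers are hypothetical; in practice the offsets would come from a trained network rather than hand-written values.

```python
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) position in metres


def refine_trajectory(
    proposal: List[Waypoint],
    offsets: List[Waypoint],
    dt_s: float,
    correction_horizon_s: float,
) -> List[Waypoint]:
    """Add offsets only to waypoints that lie beyond the correction horizon."""
    if len(proposal) != len(offsets):
        raise ValueError("proposal and offsets must have the same length")
    refined = []
    for i, ((x, y), (dx, dy)) in enumerate(zip(proposal, offsets)):
        t = i * dt_s  # time of waypoint i, with the first waypoint at t = 0
        if t <= correction_horizon_s:
            refined.append((x, y))  # short-term part: keep the rule-based plan
        else:
            refined.append((x + dx, y + dy))  # long-term part: apply the correction
    return refined


# Hypothetical inputs: a straight-line proposal with 1 s spacing and constant offsets.
proposal = [(float(i), 0.0) for i in range(5)]
offsets = [(0.0, 0.5)] * 5
refined = refine_trajectory(proposal, offsets, dt_s=1.0, correction_horizon_s=2.0)
print(refined)
```

With this split, the waypoints that matter most for closed-loop control are left untouched, while the later waypoints are free to move toward a better long-horizon forecast.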

Results

Planning results on the proposed Val14 benchmark. Please refer to the paper for more details.

Method          Representation   CLS-R ↑   CLS-NR ↑   OLS ↑   Time (ms) ↓
Urban Driver*   Polygon          50        53         82      64
GC-PGP          Graph            55        59         83      100
PlanCNN         Raster           72        73         64      43
IDM             Centerline       77        76         38      27
PDM-Open        Centerline       54        50         86      7
PDM-Closed      Centerline       92        93         42      91
PDM-Hybrid      Centerline       92        93         84      96
Log Replay      GT               80        94         100     -

*Open-loop reimplementation of Urban Driver

To Do

  • Additional baselines
  • Visualization scripts
  • Contribution guide
  • ML planners code & checkpoints
  • Supplementary material, video, slides
  • Val14 benchmark
  • Installation tutorial
  • PDM-Closed release
  • Initial repo & main paper

Getting started

1. Installation

To install tuPlan Garage, please follow these steps:

  • set up the nuPlan dataset (described here) and install the nuPlan devkit (see here)
  • download tuPlan Garage and move into the folder
    git clone https://github.com/autonomousvision/tuplan_garage.git && cd tuplan_garage
  • make sure the environment you created when installing the nuplan-devkit is activated
    conda activate nuplan
  • install the local tuplan_garage as a pip package
    pip install -e .
  • add the following environment variable to your ~/.bashrc
    NUPLAN_DEVKIT_ROOT="$HOME/nuplan-devkit/"

2. Training

When running training, you have to add the tuplan_garage config paths to hydra.searchpath. Note: since hydra does not yet support appending to lists (see here), you also have to repeat the original nuPlan searchpaths in the override. Training can be launched with the scripts found in /scripts/training/. Before training from an already existing cache, please check this issue. You can find our trained models here.
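As a rough illustration of the searchpath note, the override can be assembled programmatically so that the original nuPlan entries are never dropped. This is only a sketch: the helper name `build_searchpath_override` is hypothetical, and the package paths are copied from the training and evaluation commands that appear on this page.

```python
from typing import List

# Search paths of nuplan-devkit that must be repeated in every override.
NUPLAN_SEARCHPATHS = [
    "pkg://nuplan.planning.script.config.common",
    "pkg://nuplan.planning.script.experiments",
]


def build_searchpath_override(extra_paths: List[str]) -> str:
    """Return a hydra.searchpath override: extra paths first, then the nuPlan ones."""
    paths = list(extra_paths) + NUPLAN_SEARCHPATHS
    return "hydra.searchpath=[" + ", ".join(paths) + "]"


training_override = build_searchpath_override([
    "pkg://tuplan_garage.planning.script.config.common",
    "pkg://tuplan_garage.planning.script.config.training",
    "pkg://tuplan_garage.planning.script.experiments",
])
print(training_override)
```

The resulting string can be passed as the last argument of the training command. On a shell command line it needs to be quoted because it contains spaces and brackets.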

3. Evaluation

As with training, you have to add the tuplan_garage config paths to hydra.searchpath when running an evaluation. The example below evaluates the pdm_closed_planner on the val14_split, both of which are part of tuplan_garage:

python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_simulation.py \
+simulation=closed_loop_nonreactive_agents \
planner=pdm_closed_planner \
scenario_filter=val14_split \
scenario_builder=nuplan \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.simulation, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

You can find example shell scripts in /scripts/simulation/.
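To evaluate one planner under several simulation modes, the command above can be generated in a loop. The sketch below only builds and prints the commands; launching them is left commented out. The helper name `build_command` and the fallback devkit location are assumptions, and the simulation names are the ones listed in the nuPlan tutorial config quoted further down this page.

```python
import os
import shlex
from typing import List

SIMULATION_TYPES = [
    "open_loop_boxes",
    "closed_loop_nonreactive_agents",
    "closed_loop_reactive_agents",
]

SEARCHPATH = (
    "hydra.searchpath=["
    "pkg://tuplan_garage.planning.script.config.common, "
    "pkg://tuplan_garage.planning.script.config.simulation, "
    "pkg://nuplan.planning.script.config.common, "
    "pkg://nuplan.planning.script.experiments]"
)


def build_command(simulation: str, planner: str = "pdm_closed_planner") -> List[str]:
    """Build the argument list for one run_simulation.py call."""
    # Assumed fallback if NUPLAN_DEVKIT_ROOT is not set in the environment.
    devkit_root = os.environ.get("NUPLAN_DEVKIT_ROOT", os.path.expanduser("~/nuplan-devkit"))
    script = os.path.join(devkit_root, "nuplan/planning/script/run_simulation.py")
    return [
        "python",
        script,
        f"+simulation={simulation}",
        f"planner={planner}",
        "scenario_filter=val14_split",
        "scenario_builder=nuplan",
        SEARCHPATH,
    ]


commands = [build_command(simulation) for simulation in SIMULATION_TYPES]
for command in commands:
    print(shlex.join(command))  # shell-quoted form, safe to copy into a terminal
    # import subprocess; subprocess.run(command, check=True)  # uncomment to launch
```

Passing the arguments as a list avoids shell quoting problems with the bracketed searchpath value.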

Contact

If you have any questions or suggestions, please feel free to open an issue or contact us ([email protected]).

Citation

If you find tuPlan Garage useful, please consider giving us a star 🌟 and citing our paper with the following BibTeX entry.

@InProceedings{Dauner2023CORL,
  title={Parting with Misconceptions about Learning-based Vehicle Motion Planning},
  author={Dauner, Daniel and Hallgarten, Marcel and Geiger, Andreas and Chitta, Kashyap},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2023}
}

Disclaimer

tuPlan Garage includes code from Motional's nuplan-devkit. We are not affiliated with Motional, and the repository is not published, maintained or otherwise related to Motional.



tuplan_garage's Issues

How to accelerate simulation process

Dear Author:
I tried to run the simulation process and found it to be very slow, with low CPU usage. I estimate it will take 30 days to run val14, so I'm asking if there's a way to speed it up.

Device Hardware

  • i7-12700K
  • RTX3090 24G
  • 64G RAM

Ray error before the start of training

Problem

Hello. I have set up the nuplan environment, installed tuplan_garage as a package, and followed every preparation step in the readme.md. However, when I try to train the model, I encounter a fatal Ray error. Every time 'Ray objects' finishes, Ray soon fails to start the dashboard, and the program runs 'Ray objects' again. Because the Ray instance fails to initialize, there is no log recording the error. I found a similar issue here, but it was of little help. Thank you for your assistance.

Reproduce

Run the script below with bash:

TRAIN_EPOCHS=100
TRAIN_LR=1e-4
TRAIN_LR_MILESTONES=[50,75]
TRAIN_LR_DECAY=0.1
BATCH_SIZE=64
SEED=0

JOB_NAME=training_pdm_open_model
CACHE_PATH=/mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/cache
USE_CACHE_WITHOUT_DATASET=False

source ~/.bashrc
conda activate nuplan
python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_training.py \
seed=$SEED \
py_func=train \
+training=training_pdm_open_model \
job_name=$JOB_NAME \
scenario_builder=nuplan \
cache.cache_path=$CACHE_PATH \
cache.use_cache_without_dataset=$USE_CACHE_WITHOUT_DATASET \
lightning.trainer.params.max_epochs=$TRAIN_EPOCHS \
data_loader.params.batch_size=$BATCH_SIZE \
optimizer.lr=$TRAIN_LR \
lr_scheduler=multistep_lr \
lr_scheduler.milestones=$TRAIN_LR_MILESTONES \
lr_scheduler.gamma=$TRAIN_LR_DECAY \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.training, pkg://tuplan_garage.planning.script.experiments, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

Output

Global seed set to 0
2023-09-16 11:08:48,865 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:08:48,868 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.08.46
2023-09-16 11:08:48,868 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:08:48,870 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:08:52,865 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:08:58,481 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101}  Worker: RayDistributed
2023-09-16 11:08:58,482 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102}  Number of nodes: 1
Number of CPUs per node: 96
Number of GPUs per node: 8
Number of threads across all nodes: 96
2023-09-16 11:08:58,482 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27}  Building WorkerPool...DONE!
2023-09-16 11:08:58,482 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/training/experiments/training.py:41}  Building training engine...
2023-09-16 11:08:58,483 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18}  Building TorchModuleWrapper...
2023-09-16 11:08:59,487 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21}  Building TorchModuleWrapper...DONE!
2023-09-16 11:08:59,488 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18}  Building Splitter...
2023-09-16 11:09:00,464 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21}  Building Splitter...DONE!
2023-09-16 11:09:00,465 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18}  Building AbstractScenarioBuilder...
2023-09-16 11:09:00,988 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21}  Building AbstractScenarioBuilder...DONE!
2023-09-16 11:09:00,988 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35}  Building ScenarioFilter...
2023-09-16 11:09:00,989 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44}  Building ScenarioFilter...DONE!
Ray objects: 100%|██████████████████████████████| 96/96 [13:16<00:00,  8.29s/it]
2023-09-16 11:22:25,347 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_builder.py:171}  Extracted 177435 scenarios for training
2023-09-16 11:22:25,347 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258}  WORLD_SIZE was not set.
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266}  PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277}  Number of gpus found to be in use: 8
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:114}  World size: 8
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:115}  Learning rate before: 0.0001
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:119}  Scaling method: Equal Variance
2023-09-16 11:22:25,349 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:141}  Betas after scaling: [0.7422979694372631, 0.9971741579476155]
2023-09-16 11:22:25,349 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:143}  Learning rate after scaling: 0.000282842712474619
2023-09-16 11:22:25,478 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:172}  Updating Learning Rate Scheduler Config...
2023-09-16 11:22:25,478 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258}  WORLD_SIZE was not set.
2023-09-16 11:22:25,478 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266}  PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-09-16 11:22:25,479 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277}  Number of gpus found to be in use: 8
2023-09-16 11:22:25,479 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:199}  Updating torch.optim.lr_scheduler.MultiStepLR in ddp setting is not yet supported. Learning rate scheduler config will not be updated.
2023-09-16 11:22:25,479 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:245}  Optimizer and LR Scheduler configs updated according to ddp strategy.
2023-09-16 11:22:25,503 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py:19}  Building callbacks...
2023-09-16 11:22:25,538 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py:37}  Building callbacks...DONE!
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
2023-09-16 11:22:25,539 INFO {/home/linqing.zhao/nuplan-devkit//nuplan/planning/script/run_training.py:62}  Starting training...
Global seed set to 0
2023-09-16 11:22:39,118 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:39,121 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.38
2023-09-16 11:22:39,121 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:39,123 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
Global seed set to 0
2023-09-16 11:22:41,279 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:41,281 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.40
2023-09-16 11:22:41,281 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:41,283 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:42,819 INFO worker.py:1621 -- Started a local Ray instance.
Global seed set to 0
2023-09-16 11:22:45,132 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:45,138 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.44
2023-09-16 11:22:45,138 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:45,140 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:45,659 INFO worker.py:1621 -- Started a local Ray instance.
Global seed set to 0
2023-09-16 11:22:49,560 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101}  Worker: RayDistributed
2023-09-16 11:22:49,560 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102}  Number of nodes: 1
Number of CPUs per node: 96
Number of GPUs per node: 8
Number of threads across all nodes: 96
2023-09-16 11:22:49,561 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27}  Building WorkerPool...DONE!
2023-09-16 11:22:49,561 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/training/experiments/training.py:41}  Building training engine...
2023-09-16 11:22:49,561 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18}  Building TorchModuleWrapper...
2023-09-16 11:22:49,782 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:49,784 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.49
2023-09-16 11:22:49,785 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:49,787 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:50,106 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21}  Building TorchModuleWrapper...DONE!
2023-09-16 11:22:50,106 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18}  Building Splitter...
Global seed set to 0
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/8
2023-09-16 11:22:51,378 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21}  Building Splitter...DONE!
2023-09-16 11:22:51,379 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18}  Building AbstractScenarioBuilder...
2023-09-16 11:22:51,571 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21}  Building AbstractScenarioBuilder...DONE!
2023-09-16 11:22:51,571 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35}  Building ScenarioFilter...
2023-09-16 11:22:51,573 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44}  Building ScenarioFilter...DONE!
Ray objects:   0%|                                                                      | 0/96 [00:00<?, ?it/s]2023-09-16 11:22:54,601 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101}  Worker: RayDistributed
2023-09-16 11:22:54,601 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102}  Number of nodes: 1
Number of CPUs per node: 96
Number of GPUs per node: 8
Number of threads across all nodes: 96
2023-09-16 11:22:54,602 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27}  Building WorkerPool...DONE!
2023-09-16 11:22:54,602 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/training/experiments/training.py:41}  Building training engine...
2023-09-16 11:22:54,602 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18}  Building TorchModuleWrapper...
Global seed set to 0
2023-09-16 11:22:55,184 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21}  Building TorchModuleWrapper...DONE!
2023-09-16 11:22:55,184 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18}  Building Splitter...
2023-09-16 11:22:55,567 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:22:55,599 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:55,607 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.55
2023-09-16 11:22:55,608 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:55,610 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:55,752 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:22:56,570 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21}  Building Splitter...DONE!
2023-09-16 11:22:56,571 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18}  Building AbstractScenarioBuilder...
2023-09-16 11:22:56,780 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21}  Building AbstractScenarioBuilder...DONE!
2023-09-16 11:22:56,780 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35}  Building ScenarioFilter...
2023-09-16 11:22:56,782 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44}  Building ScenarioFilter...DONE!
Ray objects:   0%|                                                                      | 0/96 [00:00<?, ?it/s]Global seed set to 0
2023-09-16 11:23:03,159 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:23:03,167 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.23.02
2023-09-16 11:23:03,168 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:23:03,170 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
Global seed set to 0
2023-09-16 11:23:12,023 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:23:12,030 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.23.11
2023-09-16 11:23:12,031 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:23:12,034 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:23:17,381 ERROR services.py:1207 -- Failed to start the dashboard 
2023-09-16 11:23:17,382 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-09-16 11:23:17,382 ERROR services.py:1242 -- Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: '/tmp/ray/session_2023-09-16_11-22-55_720350_89715/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.
2023-09-16 11:23:17,382 ERROR services.py:1276 -- Failed to read dashboard.err file: cannot mmap an empty file. It is unexpected. Please report an issue to Ray github. https://github.com/ray-project/ray/issues
2023-09-16 11:23:17,582 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:23:25,116 ERROR services.py:1207 -- Failed to start the dashboard 
2023-09-16 11:23:25,116 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-09-16 11:23:25,116 ERROR services.py:1242 -- Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: '/tmp/ray/session_2023-09-16_11-23-03_304985_90490/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.
2023-09-16 11:23:25,116 ERROR services.py:1276 -- Failed to read dashboard.err file: cannot mmap an empty file. It is unexpected. Please report an issue to Ray github. https://github.com/ray-project/ray/issues
2023-09-16 11:23:25,233 INFO worker.py:1621 -- Started a local Ray instance.
[2023-09-16 11:23:26,416 E 89301 89301] core_worker.cc:201: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

Why does the PDM planner return trajectories whose first state doesn't match the current ego state?

Hi,

I noticed that inside the planner (get_closed_loop_trajectory) we have two types of trajectory:

  1. proposals_array, which don't start at the current ego state and are later extended from 4s to 8s (41 poses -> 81 poses, as 10Hz is used as the frequency)


    trajectory = self._generator.generate_trajectory(np.argmax(proposal_scores))

  2. simulated_proposals_array, which are the same trajectories as proposals_array but start at the current ego pose and are simulated using the bicycle model and LQR tracker; these are used for scoring (and also have 81 poses)

    simulated_proposals_array = self._simulator.simulate_proposals(

What I am wondering is why the planner returns a trajectory that does not start at the current ego pose (the extension of the best trajectory in proposals_array) instead of one that starts at the ego position, as in simulated_proposals. Is that done on purpose as part of a trick for the simulation?

Thanks in advance :)
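For reference, the pose counts quoted in this question follow directly from the horizon and the sampling rate, assuming the pose at t = 0 is counted as well. The helper `num_poses` below is purely illustrative and is not part of tuplan_garage.

```python
def num_poses(horizon_s: float, frequency_hz: float) -> int:
    """Number of poses for a horizon sampled at a fixed rate, including the pose at t = 0."""
    return int(round(horizon_s * frequency_hz)) + 1


print(num_poses(4.0, 10.0))  # 4 s proposal at 10 Hz
print(num_poses(8.0, 10.0))  # 8 s extended trajectory at 10 Hz
```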

Error when trying to run pgp model with nuplan config: Metric target: "multimodal_trajectories" is not in model computed targets!

Hello, I'm trying to get your urban_driver and GC-PGP pretrained models working and am running into an error. There are two separate problems. First, it seems that the urban_driver model was not updated with the rest when you retrained GC-PGP, so it still has the 'cannot find module nuplan_garage' error. Second, when running the GC-PGP model with the type of config used in the nuplan tutorials, I get a different error regarding multimodal_trajectories.

The config I am using and the resulting error are below. This was run in a jupyter notebook sitting in the nuplan-devkit repo.

# Location of path with all simulation configs
CONFIG_PATH = '../nuplan/planning/script/config/simulation'
CONFIG_NAME = 'default_simulation'

CHECKPOINT_PATH='run_sim_closed_loop/pretrained_checkpoints/gc_pgp_checkpoint.ckpt'

# Select the planner and simulation challenge
PLANNER = 'ml_planner'  # [simple_planner, ml_planner]
CHALLENGE = 'closed_loop_reactive_agents'  # [open_loop_boxes, closed_loop_nonreactive_agents, closed_loop_reactive_agents]
DATASET_PARAMS = [
    'scenario_builder=nuplan_mini',  # use nuplan mini database
    'scenario_filter=all_scenarios',  # initially select all scenarios in the database
    'scenario_filter.scenario_types=[near_multiple_vehicles, on_pickup_dropoff, starting_unprotected_cross_turn, high_magnitude_jerk]',  # select scenario types
    'scenario_filter.num_scenarios_per_type=5',  # use 5 scenarios per scenario type
]

# Name of the experiment
EXPERIMENT = 'simulation_simple_experiment'

# Initialize configuration management system
hydra.core.global_hydra.GlobalHydra.instance().clear()  # reinitialize hydra if already initialized
hydra.initialize(config_path=CONFIG_PATH)

# Compose the configuration
cfg = hydra.compose(config_name=CONFIG_NAME, overrides=[
    f'experiment_name={EXPERIMENT}',
    f'planner={PLANNER}',
    f'model=raster_model',
    'planner.ml_planner.model_config=${model}',  # hydra notation to select model config
    f'planner.ml_planner.checkpoint_path={CHECKPOINT_PATH}',  # this path can be replaced by the checkpoint of the model trained in the previous section
    f'group={SAVE_DIR}',
    f'+simulation={CHALLENGE}',
    *DATASET_PARAMS,
    'hydra.searchpath=[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.simulation, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]'
])
from nuplan.planning.script.run_simulation import main as main_simulation

# Run the simulation loop (real-time visualization not yet supported, see next section for visualization)
main_simulation(cfg)

# Simple simulation folder for visualization in nuBoard
simple_simulation_folder = cfg.output_dir

AssertionError                            Traceback (most recent call last)
Cell In[10], line 4
      1 from nuplan.planning.script.run_simulation import main as main_simulation
      3 # Run the simulation loop (real-time visualization not yet supported, see next section for visualization)
----> 4 main_simulation(cfg)
      6 # Simple simulation folder for visualization in nuBoard
      7 simple_simulation_folder = cfg.output_dir

File /opt/conda/lib/python3.9/site-packages/hydra/main.py:44, in main.<locals>.main_decorator.<locals>.decorated_main(cfg_passthrough)
     41 @functools.wraps(task_function)
     42 def decorated_main(cfg_passthrough: Optional[DictConfig] = None) -> Any:
     43     if cfg_passthrough is not None:
---> 44         return task_function(cfg_passthrough)
     45     else:
     46         args = get_args_parser()

File ~/nuplan-devkit/nuplan/planning/script/run_simulation.py:110, in main(cfg)
    107 assert cfg.simulation_log_main_path is None, 'Simulation_log_main_path must not be set when running simulation.'
    109 # Execute simulation with preconfigured planner(s).
--> 110 run_simulation(cfg=cfg)
    112 if is_s3_path(Path(cfg.output_dir)):
    113     clean_up_s3_artifacts()

File ~/nuplan-devkit/nuplan/planning/script/run_simulation.py:66, in run_simulation(cfg, planners)
     63 if isinstance(planners, AbstractPlanner):
     64     planners = [planners]
---> 66 runners = build_simulations(
     67     cfg=cfg,
     68     callbacks=callbacks,
     69     worker=common_builder.worker,
     70     pre_built_planners=planners,
     71     callbacks_worker=callbacks_worker_pool,
     72 )
     74 if common_builder.profiler:
     75     # Stop simulation construction profiling
     76     common_builder.profiler.save_profiler(profiler_name)

File ~/nuplan-devkit/nuplan/planning/script/builders/simulation_builder.py:90, in build_simulations(cfg, worker, callbacks, callbacks_worker, pre_built_planners)
     87     if 'planner' not in cfg.keys():
     88         raise KeyError('Planner not specified in config. Please specify a planner using "planner" field.')
---> 90     planners = build_planners(cfg.planner, scenario)
     91 else:
     92     planners = pre_built_planners

File ~/nuplan-devkit/nuplan/planning/script/builders/planner_builder.py:58, in build_planners(planner_cfg, scenario)
     51 def build_planners(planner_cfg: DictConfig, scenario: Optional[AbstractScenario]) -> List[AbstractPlanner]:
     52     """
     53     Instantiate multiple planners by calling build_planner
     54     :param planners_cfg: planners config
     55     :param scenario: scenario
     56     :return planners: List of AbstractPlanners
     57     """
---> 58     return [_build_planner(planner, scenario) for planner in planner_cfg.values()]

File ~/nuplan-devkit/nuplan/planning/script/builders/planner_builder.py:58, in <listcomp>(.0)
     51 def build_planners(planner_cfg: DictConfig, scenario: Optional[AbstractScenario]) -> List[AbstractPlanner]:
     52     """
     53     Instantiate multiple planners by calling build_planner
     54     :param planners_cfg: planners config
     55     :param scenario: scenario
     56     :return planners: List of AbstractPlanners
     57     """
---> 58     return [_build_planner(planner, scenario) for planner in planner_cfg.values()]

File ~/nuplan-devkit/nuplan/planning/script/builders/planner_builder.py:26, in _build_planner(planner_cfg, scenario)
     23 if is_target_type(planner_cfg, MLPlanner):
     24     # Build model and feature builders needed to run an ML model in simulation
     25     torch_module_wrapper = build_torch_module_wrapper(planner_cfg.model_config)
---> 26     model = LightningModuleWrapper.load_from_checkpoint(
     27         planner_cfg.checkpoint_path, model=torch_module_wrapper
     28     ).model
     30     # Remove config elements that are redundant to MLPlanner
     31     OmegaConf.set_struct(config, False)

File /opt/conda/lib/python3.9/site-packages/pytorch_lightning/core/saving.py:157, in ModelIO.load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
    154 # override the hparams with values that were passed in
    155 checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
--> 157 model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
    158 return model

File /opt/conda/lib/python3.9/site-packages/pytorch_lightning/core/saving.py:199, in ModelIO._load_model_state(cls, checkpoint, strict, **cls_kwargs_new)
    195 if not cls_spec.varkw:
    196     # filter kwargs according to class init unless it allows any argument via kwargs
    197     _cls_kwargs = {k: v for k, v in _cls_kwargs.items() if k in cls_init_args_name}
--> 199 model = cls(**_cls_kwargs)
    201 # give model a chance to load something
    202 model.on_load_checkpoint(checkpoint)

File ~/nuplan-devkit/nuplan/planning/training/modeling/lightning_module_wrapper.py:67, in LightningModuleWrapper.__init__(self, model, objectives, metrics, batch_size, optimizer, lr_scheduler, warm_up_lr_scheduler, objective_aggregate_mode)
     65 for metric in self.metrics:
     66     for feature in metric.get_list_of_required_target_types():
---> 67         assert feature in model_targets, f"Metric target: \"{feature}\" is not in model computed targets!"

AssertionError: Metric target: "multimodal_trajectories" is not in model computed targets!
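
Note on the assertion above: the last frame of the traceback shows `LightningModuleWrapper.__init__` comparing every target type required by the configured metrics with the targets the model computes. One plausible reading, not confirmed anywhere in this thread, is that the metrics restored with the checkpoint belong to a different model than `raster_model`. The sketch below reproduces only the shape of that check. `ToyMetric`, `check_metric_targets`, and the target name `trajectory` are hypothetical stand-ins, not nuplan-devkit code.

```python
from typing import Iterable, List


class ToyMetric:
    """Hypothetical stand-in for a training metric that declares its required targets."""

    def __init__(self, required_targets: List[str]) -> None:
        self._required_targets = required_targets

    def get_list_of_required_target_types(self) -> List[str]:
        return self._required_targets


def check_metric_targets(metrics: Iterable[ToyMetric], model_targets: Iterable[str]) -> List[str]:
    """Return the metric targets the model does not compute (empty list means compatible)."""
    available = set(model_targets)
    missing: List[str] = []
    for metric in metrics:
        for feature in metric.get_list_of_required_target_types():
            if feature not in available and feature not in missing:
                missing.append(feature)
    return missing


# Assumed setup: the model computes only "trajectory", while the metric
# requires "multimodal_trajectories", as in the error message above.
missing_targets = check_metric_targets(
    metrics=[ToyMetric(["multimodal_trajectories"])],
    model_targets=["trajectory"],
)
print(missing_targets)
```

Running a check like this against the model config and the metrics used to train the checkpoint would show whether the two were produced by the same training setup.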

AssertionError in evaluation

Hi,

When I evaluate pdm_closed_planner and pdm_open_planner in the closed-loop, non-reactive agents setting, both planners run without error if I use a reduced scenario filter (with 3 or 5 scenarios randomly picked from the validation dataset). However, with val14_split, when running

python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_simulation.py +simulation=closed_loop_nonreactive_agents planner=pdm_closed_planner scenario_filter=val14_split scenario_builder=nuplan worker=single_machine_thread_pool scenario_builder.data_root=/fs/scratch/projects/proj-ai-planning/archive/nuScenes/nuplan/dataset/nuplan-v1.1/splits/val hydra.searchpath="[pkg://nuplan_garage.planning.script.config.common, pkg://nuplan_garage.planning.script.config.simulation, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

I get the following error when the simulations are being executed:

Traceback (most recent call last):
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/metric_engine.py", line 112, in compute_metric_results
    metric_results[metric.name] = metric.compute(history, scenario=scenario)
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/evaluation_metrics/common/speed_limit_compliance.py", line 218, in compute
    time_series = TimeSeries(
  File "<string>", line 7, in __init__
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/metric_result.py", line 127, in __post_init__
    assert len(self.time_stamps) == len(self.values)
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/simulation/runner/executor.py", line 27, in run_simulation
    return sim_runner.run()
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/simulation/runner/simulations_runner.py", line 128, in run
    self.simulation.callback.on_simulation_end(self.simulation.setup, self.planner, self.simulation.history)
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/simulation/callback/multi_callback.py", line 68, in on_simulation_end
    callback.on_simulation_end(setup, planner, history)
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/simulation/callback/metric_callback.py", line 102, in on_simulation_end
    run_metric_engine(
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/simulation/callback/metric_callback.py", line 24, in run_metric_engine
    metric_files = metric_engine.compute(history, scenario=scenario, planner_name=planner_name)
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/metric_engine.py", line 133, in compute
    all_metrics_results = self.compute_metric_results(history=history, scenario=scenario)
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/metric_engine.py", line 119, in compute_metric_results
    raise RuntimeError(f"Metric Engine failed with: {e}")
RuntimeError: Metric Engine failed with: 
Traceback (most recent call last):
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/metric_engine.py", line 112, in compute_metric_results
    metric_results[metric.name] = metric.compute(history, scenario=scenario)
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/evaluation_metrics/common/speed_limit_compliance.py", line 218, in compute
    time_series = TimeSeries(
  File "<string>", line 7, in __init__
  File "/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/metrics/metric_result.py", line 127, in __post_init__
    assert len(self.time_stamps) == len(self.values)
AssertionError
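
Note on the traceback above: the check that fails is the length assertion in `TimeSeries.__post_init__`, reached from the speed limit compliance metric. The sketch below repeats only that check in isolation, so it shows what kind of input triggers it. `ToyTimeSeries` and `try_build` are hypothetical names, and the sketch makes no claim about why the metric ends up with mismatched lengths for some val14 scenarios.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyTimeSeries:
    """Hypothetical stand-in that repeats the length check from the traceback."""

    time_stamps: List[int]
    values: List[float]

    def __post_init__(self) -> None:
        # Same condition as the failing assertion: one value per time stamp.
        assert len(self.time_stamps) == len(self.values)


def try_build(time_stamps: List[int], values: List[float]) -> bool:
    """Return True if the series can be built, False if the length check fails."""
    try:
        ToyTimeSeries(time_stamps=time_stamps, values=values)
    except AssertionError:
        return False
    return True


print(try_build([0, 100_000, 200_000], [0.0, 0.5, 1.0]))  # lengths match
print(try_build([0, 100_000, 200_000], [0.0, 0.5]))       # one value is missing
```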

PDM-Hybrid that combined PDM-Closed with GC-PGP

Dear Author
Thank you for your work.
I have built my own prediction models based on PGP and tried to combine them with PDM-Closed, but it did not work well.

  • CLS-R 61
  • CLS-NR 60
  • OLS 37

GC-PGP's features are different from PDM's,
so I want to ask how to combine GC-PGP with PDM-Closed.
Uploading the code would be great!
Thank you for all the help you've given me before!
Best wishes,
Yifan

Inconsistency between centerline and PDM-Closed trajectory

Hi, running the simulation I noticed that the centerline and the trajectory generated by PDM-Closed planner are not consistent in more than one scenario, as you can see in the pictures below (the red line is the centerline, the blue line is the PDM-Closed trajectory):
plot
plot1

My question is the following:

For the PDM-Open model, you take just the centerline as input (together with the ego history). For the hybrid model, instead, you fuse the PDM-Closed trajectory with the PDM-Open one (trained on the centerline).
Since, as shown in the plots above, PDM-Closed trajectory and centerline are not consistent, is it possible that this leads to a distorted hybrid trajectory?

Screenshot from 2024-01-17 15-47-16

Indeed, as shown above, the hybrid trajectory is distorted in the point where you fuse together PDM-Closed and PDM-Open.

Do you think this could be the cause of the distorted trajectory?
If that is the case, is there a reason why you trained the PDM-Open MLP with just the centerline as input?
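
To make the question concrete, here is a toy sketch of how a lateral mismatch between two trajectories shows up as a kink when they are joined with a hard switch. It is only an illustration on assumed data: `fuse_hard_switch`, `step_lengths`, the 0.5 m offset, and the switch index are all hypothetical and do not reproduce how the repository builds the hybrid trajectory.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fuse_hard_switch(short_term: List[Point], long_term: List[Point], switch_index: int) -> List[Point]:
    """Take waypoints before switch_index from short_term and the rest from long_term."""
    assert len(short_term) == len(long_term)
    return short_term[:switch_index] + long_term[switch_index:]


def step_lengths(trajectory: List[Point]) -> List[float]:
    """Euclidean distance between consecutive waypoints."""
    return [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]


# Assumed toy data: both trajectories advance 1 m per waypoint in x,
# but the long-term one is laterally offset by 0.5 m.
closed_like = [(float(i), 0.0) for i in range(8)]
open_like = [(float(i), 0.5) for i in range(8)]

fused = fuse_hard_switch(closed_like, open_like, switch_index=4)
steps = step_lengths(fused)
seam_step = steps[3]  # the step that crosses the seam between index 3 and 4
print(steps)
print(seam_step)
```

In this toy case every step is 1 m long except the one across the seam, which is longer because it also has to cover the lateral offset.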

Thanks a lot in advance!

ROS bridge for real vehicle implementation

Hi, I was wondering if you have tried to implement this learning-based planner on a real vehicle, or if you have tried to build a bridge to connect the simulator with a ROS environment.
If that's the case, could you please provide more information about it?
I think it would be very interesting to test this model on a real vehicle! Thanks

Visualization scripts release

Thanks for great coding and work!

I'm wondering if you still plan to release a visualization script to generate videos like the teaser.mp4 file in the README. Was this file made from a closed-loop simulation?

Thanks beforehand for taking the time to consider my question!

The plan for open source code?

Great work! It helped me a lot!
Could you tell me what the plan is for open-sourcing the code?
Thank you very much!

some questions about nuplan dataset

Hello, I ran into some issues that confused me, and I would appreciate it if you could help.

First, what is the difference between open-loop and closed-loop (NR) simulation? I have read the nuPlan paper, and I understand that both are non-reactive, so the other agents are simulated by directly replaying the logs. I think the difference is that in closed-loop (NR) the planner sees the new state at each step, so it can correct its trajectory later, whereas open-loop does not take that into account. Do I understand this correctly? Can you tell me more about the differences between these two modes?
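
A toy sketch may help frame the question. It assumes that in open-loop the ego state is replayed from the log and the plan is only scored, while in closed-loop the ego state is propagated with the planner's own output. Everything here is hypothetical (the 1-D log, `biased_planner`, the error measure), and the deviation from the log is used only to make the difference visible; it is not how nuPlan scores closed-loop runs.

```python
from typing import Callable, List

# Assumed toy setup: 1-D positions of the logged (expert) ego at fixed time steps.
LOGGED_POSITIONS: List[float] = [0.0, 1.0, 2.0, 3.0, 4.0]


def biased_planner(current_position: float) -> float:
    """Hypothetical planner: proposes the next position, always 0.1 m too far."""
    return current_position + 1.1


def run_open_loop(planner: Callable[[float], float], log: List[float]) -> List[float]:
    """Ego state is replayed from the log; the plan is only scored, never executed."""
    errors = []
    for step in range(len(log) - 1):
        planned_next = planner(log[step])  # the planner always sees the logged state
        errors.append(abs(planned_next - log[step + 1]))
    return errors


def run_closed_loop(planner: Callable[[float], float], log: List[float]) -> List[float]:
    """Ego state is propagated with the planner's own output at every step."""
    errors = []
    ego_position = log[0]
    for step in range(len(log) - 1):
        ego_position = planner(ego_position)  # the plan is executed
        errors.append(abs(ego_position - log[step + 1]))
    return errors


open_loop_errors = run_open_loop(biased_planner, LOGGED_POSITIONS)
closed_loop_errors = run_closed_loop(biased_planner, LOGGED_POSITIONS)
print(open_loop_errors)
print(closed_loop_errors)
```

In this toy case the open-loop error stays constant because the state is reset to the log at every step, while the closed-loop error accumulates because the planner has to live with its own previous decisions.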

Second, I am going to try to control the background agents with my own RL-based controller, but I don't know how. Have you tried this before? And could you tell me how the background IDM agents are implemented in the nuplan-devkit code?
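
For context on IDM: below is a minimal sketch of the textbook Intelligent Driver Model acceleration law for a follower behind a lead vehicle. It is not the nuplan-devkit implementation; the function name and all default parameter values are assumptions chosen for illustration.

```python
import math


def idm_acceleration(
    speed: float,
    lead_speed: float,
    gap: float,
    desired_speed: float = 10.0,            # assumed value [m/s]
    min_gap: float = 2.0,                   # assumed value [m]
    time_headway: float = 1.5,              # assumed value [s]
    max_acceleration: float = 1.0,          # assumed value [m/s^2]
    comfortable_deceleration: float = 2.0,  # assumed value [m/s^2]
    exponent: float = 4.0,
) -> float:
    """Textbook IDM acceleration [m/s^2]; gap is the bumper-to-bumper distance and must be > 0."""
    closing_speed = speed - lead_speed
    dynamic_gap = speed * time_headway + speed * closing_speed / (
        2.0 * math.sqrt(max_acceleration * comfortable_deceleration)
    )
    desired_gap = min_gap + max(0.0, dynamic_gap)
    free_road_term = (speed / desired_speed) ** exponent
    interaction_term = (desired_gap / gap) ** 2
    return max_acceleration * (1.0 - free_road_term - interaction_term)


# Standing start with a distant lead vehicle: close to full acceleration.
print(idm_acceleration(speed=0.0, lead_speed=0.0, gap=1000.0))
# Approaching a stopped vehicle at close range: strong braking.
print(idm_acceleration(speed=10.0, lead_speed=0.0, gap=5.0))
```

An RL-based agent controller would replace this closed-form law with a learned policy that maps the same kind of inputs (own speed, lead speed, gap) to an acceleration.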

Val14 Dataset

Hi,

Are there any plans to release the Val14 dataset to the public or the logic used to split the dataset? Thanks in advance.

pdm_closed_planner trajectory's states have no dynamic information

Hello,
I was inspecting the trajectory generated by the pdm_closed_planner and I realized that it contains just static information and not dynamic states (I think it would be the same for the other planners as well):

InterpolatedTrajectory with 81 states
(wrapped_fn pid=1814060) EgoState(time=1633419573.0001209), Position=(365882.03552731016, 143116.11716887038, -1.80036579096818), Velocity=(0.0, 0.0), Acceleration=(0.0, 0.0), Steering_Angle=0.0)

This is the output (I showed just one state) I get when I print the states of the trajectory with this code (in ego_state.py):

def __str__(self):
    return (
        f"EgoState(time={self.time_point.time_s}), "
        f"Position=({self.rear_axle.x}, {self.rear_axle.y}, {self.rear_axle.heading}), "
        f"Velocity=({self.dynamic_car_state.rear_axle_velocity_2d.x}, {self.dynamic_car_state.rear_axle_velocity_2d.y}), "
        f"Acceleration=({self.dynamic_car_state.rear_axle_acceleration_2d.x}, {self.dynamic_car_state.rear_axle_acceleration_2d.y}), "
        f"Steering_Angle={self.tire_steering_angle})"
    )

and this (in interpolated_trajectory.py):

def __str__(self):
    return f"InterpolatedTrajectory with {len(self._trajectory)} states"

def print_states(self):
    for state in self._trajectory:
        print(state)

and finally this (in pdm_closed_planner.py):

def compute_planner_trajectory():
     ......................................
     ......................................
      print(trajectory)
      trajectory.print_states()         
        
      return trajectory 

So, my question is: do you also generate a velocity profile for every state of the generated trajectory? If yes, how do you do it and how can you have access to it?
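
As a workaround while the stored dynamic states are zero, a rough speed profile can be recovered from the positions and time stamps of consecutive states. The sketch below is a generic finite-difference estimate, not the way the planner computes velocities; `finite_difference_speeds` and the sample data are hypothetical.

```python
import math
from typing import List, Tuple


def finite_difference_speeds(
    times_s: List[float], positions: List[Tuple[float, float]]
) -> List[float]:
    """Approximate speed [m/s] at each state from neighbouring positions."""
    assert len(times_s) == len(positions) and len(times_s) >= 2
    speeds = []
    last = len(times_s) - 1
    for i in range(len(times_s)):
        # Central difference in the interior, one-sided difference at both ends.
        lo, hi = max(i - 1, 0), min(i + 1, last)
        distance = math.dist(positions[lo], positions[hi])
        speeds.append(distance / (times_s[hi] - times_s[lo]))
    return speeds


# Assumed sample: straight motion at 5 m/s, sampled every 0.1 s.
sample_times = [0.0, 0.1, 0.2, 0.3]
sample_positions = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]
sample_speeds = finite_difference_speeds(sample_times, sample_positions)
print(sample_speeds)
```

With the trajectory printed above, the inputs would come from `time_point.time_s`, `rear_axle.x`, and `rear_axle.y` of each state.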

Also, do you take into account the left and right bounds of the road when computing the trajectory? If yes, how do you extract/generate them and how can you have access to it?

Thank you very much in advance, very appreciated!

accelerate simulation by cuda

Hello,
I run the simulation on the CPU, but it is slow.
I see this code in pdm_open_planner.py:

        self._device = "cpu"
        self._model = LightningModuleWrapper.load_from_checkpoint(
            checkpoint_path,
            model=model,
            map_location=self._device,
        ).model

I changed the device from cpu to cuda:0,
but I get an error about tensors being on two different devices, cpu and cuda:0.
So I want to ask whether you use CUDA to accelerate the simulation or run it on the CPU,
and how to change the config to use cuda:0.
Thanks for your help!

Questions about NuPlan to Use

Hi Dear Authors,

Thank you for the excellent work! I am new to the field of planning and would really like to follow the path of your paper. However, the resources in my lab are quite limited, and the >1 TB of nuPlan data is very large.

Therefore, I am curious whether this data contains only trajectory data, or whether you have other suggestions for running your algorithms with a condensed set of trajectory data?

image

Best,

Ziqi

Ray problem before the start of training

Problem

Hello. I have set up the nuPlan environment, installed tuplan_garage as a package, and followed every preparation step in the README. However, when I try to train the model, I hit a fatal Ray error. Every time the 'Ray objects' stage finishes, Ray fails to start the dashboard shortly afterwards, and the program runs 'Ray objects' again. Because the Ray instance fails to initialize, no log file records the error. I found a similar issue here, but it was of little help. Thank you for your assistance.

Reproduce

Run the following script with bash:

TRAIN_EPOCHS=100
TRAIN_LR=1e-4
TRAIN_LR_MILESTONES=[50,75]
TRAIN_LR_DECAY=0.1
BATCH_SIZE=64
SEED=0

JOB_NAME=training_pdm_open_model
CACHE_PATH=/mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/cache
USE_CACHE_WITHOUT_DATASET=False

source ~/.bashrc
conda activate nuplan
python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_training.py \
seed=$SEED \
py_func=train \
+training=training_pdm_open_model \
job_name=$JOB_NAME \
scenario_builder=nuplan \
cache.cache_path=$CACHE_PATH \
cache.use_cache_without_dataset=$USE_CACHE_WITHOUT_DATASET \
lightning.trainer.params.max_epochs=$TRAIN_EPOCHS \
data_loader.params.batch_size=$BATCH_SIZE \
optimizer.lr=$TRAIN_LR \
lr_scheduler=multistep_lr \
lr_scheduler.milestones=$TRAIN_LR_MILESTONES \
lr_scheduler.gamma=$TRAIN_LR_DECAY \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.training, pkg://tuplan_garage.planning.script.experiments, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

Output

Global seed set to 0
2023-09-16 11:08:48,865 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:08:48,868 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.08.46
2023-09-16 11:08:48,868 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:08:48,870 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:08:52,865 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:08:58,481 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101}  Worker: RayDistributed
2023-09-16 11:08:58,482 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102}  Number of nodes: 1
Number of CPUs per node: 96
Number of GPUs per node: 8
Number of threads across all nodes: 96
2023-09-16 11:08:58,482 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27}  Building WorkerPool...DONE!
2023-09-16 11:08:58,482 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/training/experiments/training.py:41}  Building training engine...
2023-09-16 11:08:58,483 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18}  Building TorchModuleWrapper...
2023-09-16 11:08:59,487 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21}  Building TorchModuleWrapper...DONE!
2023-09-16 11:08:59,488 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18}  Building Splitter...
2023-09-16 11:09:00,464 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21}  Building Splitter...DONE!
2023-09-16 11:09:00,465 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18}  Building AbstractScenarioBuilder...
2023-09-16 11:09:00,988 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21}  Building AbstractScenarioBuilder...DONE!
2023-09-16 11:09:00,988 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35}  Building ScenarioFilter...
2023-09-16 11:09:00,989 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44}  Building ScenarioFilter...DONE!
Ray objects: 100%|██████████| 96/96 [13:16<00:00,  8.29s/it]
2023-09-16 11:22:25,347 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_builder.py:171}  Extracted 177435 scenarios for training
2023-09-16 11:22:25,347 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258}  WORLD_SIZE was not set.
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266}  PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277}  Number of gpus found to be in use: 8
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:114}  World size: 8
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:115}  Learning rate before: 0.0001
2023-09-16 11:22:25,348 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:119}  Scaling method: Equal Variance
2023-09-16 11:22:25,349 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:141}  Betas after scaling: [0.7422979694372631, 0.9971741579476155]
2023-09-16 11:22:25,349 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:143}  Learning rate after scaling: 0.000282842712474619
2023-09-16 11:22:25,478 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:172}  Updating Learning Rate Scheduler Config...
2023-09-16 11:22:25,478 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258}  WORLD_SIZE was not set.
2023-09-16 11:22:25,478 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266}  PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-09-16 11:22:25,479 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277}  Number of gpus found to be in use: 8
2023-09-16 11:22:25,479 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:199}  Updating torch.optim.lr_scheduler.MultiStepLR in ddp setting is not yet supported. Learning rate scheduler config will not be updated.
2023-09-16 11:22:25,479 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:245}  Optimizer and LR Scheduler configs updated according to ddp strategy.
2023-09-16 11:22:25,503 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py:19}  Building callbacks...
2023-09-16 11:22:25,538 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py:37}  Building callbacks...DONE!
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
2023-09-16 11:22:25,539 INFO {/home/linqing.zhao/nuplan-devkit//nuplan/planning/script/run_training.py:62}  Starting training...
Global seed set to 0
2023-09-16 11:22:39,118 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:39,121 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.38
2023-09-16 11:22:39,121 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:39,123 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
Global seed set to 0
2023-09-16 11:22:41,279 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:41,281 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.40
2023-09-16 11:22:41,281 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:41,283 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:42,819 INFO worker.py:1621 -- Started a local Ray instance.
Global seed set to 0
2023-09-16 11:22:45,132 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:45,138 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.44
2023-09-16 11:22:45,138 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:45,140 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:45,659 INFO worker.py:1621 -- Started a local Ray instance.
Global seed set to 0
2023-09-16 11:22:49,560 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101}  Worker: RayDistributed
2023-09-16 11:22:49,560 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102}  Number of nodes: 1
Number of CPUs per node: 96
Number of GPUs per node: 8
Number of threads across all nodes: 96
2023-09-16 11:22:49,561 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27}  Building WorkerPool...DONE!
2023-09-16 11:22:49,561 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/training/experiments/training.py:41}  Building training engine...
2023-09-16 11:22:49,561 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18}  Building TorchModuleWrapper...
2023-09-16 11:22:49,782 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:49,784 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.49
2023-09-16 11:22:49,785 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:49,787 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:50,106 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21}  Building TorchModuleWrapper...DONE!
2023-09-16 11:22:50,106 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18}  Building Splitter...
Global seed set to 0
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/8
2023-09-16 11:22:51,378 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21}  Building Splitter...DONE!
2023-09-16 11:22:51,379 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18}  Building AbstractScenarioBuilder...
2023-09-16 11:22:51,571 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21}  Building AbstractScenarioBuilder...DONE!
2023-09-16 11:22:51,571 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35}  Building ScenarioFilter...
2023-09-16 11:22:51,573 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44}  Building ScenarioFilter...DONE!
Ray objects:   0%|                                                                      | 0/96 [00:00<?, ?it/s]2023-09-16 11:22:54,601 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101}  Worker: RayDistributed
2023-09-16 11:22:54,601 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102}  Number of nodes: 1
Number of CPUs per node: 96
Number of GPUs per node: 8
Number of threads across all nodes: 96
2023-09-16 11:22:54,602 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27}  Building WorkerPool...DONE!
2023-09-16 11:22:54,602 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/training/experiments/training.py:41}  Building training engine...
2023-09-16 11:22:54,602 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18}  Building TorchModuleWrapper...
Global seed set to 0
2023-09-16 11:22:55,184 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21}  Building TorchModuleWrapper...DONE!
2023-09-16 11:22:55,184 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18}  Building Splitter...
2023-09-16 11:22:55,567 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:22:55,599 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:22:55,607 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.22.55
2023-09-16 11:22:55,608 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:22:55,610 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:22:55,752 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:22:56,570 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21}  Building Splitter...DONE!
2023-09-16 11:22:56,571 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18}  Building AbstractScenarioBuilder...
2023-09-16 11:22:56,780 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21}  Building AbstractScenarioBuilder...DONE!
2023-09-16 11:22:56,780 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35}  Building ScenarioFilter...
2023-09-16 11:22:56,782 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44}  Building ScenarioFilter...DONE!
Ray objects:   0%|                                                                      | 0/96 [00:00<?, ?it/s]Global seed set to 0
2023-09-16 11:23:03,159 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:23:03,167 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.23.02
2023-09-16 11:23:03,168 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:23:03,170 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
Global seed set to 0
2023-09-16 11:23:12,023 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20}  Building experiment folders...
2023-09-16 11:23:12,030 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22}  Experimental folder: /mnt/cfs/algorithm/linqing.zhao/haozhe/Tuplan_garage/exp/exp/training_pdm_open_model/training_pdm_open_model/2023.09.16.11.23.11
2023-09-16 11:23:12,031 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-09-16 11:23:12,034 INFO {/home/linqing.zhao/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-09-16 11:23:17,381 ERROR services.py:1207 -- Failed to start the dashboard 
2023-09-16 11:23:17,382 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-09-16 11:23:17,382 ERROR services.py:1242 -- Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: '/tmp/ray/session_2023-09-16_11-22-55_720350_89715/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.
2023-09-16 11:23:17,382 ERROR services.py:1276 -- Failed to read dashboard.err file: cannot mmap an empty file. It is unexpected. Please report an issue to Ray github. https://github.com/ray-project/ray/issues
2023-09-16 11:23:17,582 INFO worker.py:1621 -- Started a local Ray instance.
2023-09-16 11:23:25,116 ERROR services.py:1207 -- Failed to start the dashboard 
2023-09-16 11:23:25,116 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-09-16 11:23:25,116 ERROR services.py:1242 -- Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: '/tmp/ray/session_2023-09-16_11-23-03_304985_90490/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.
2023-09-16 11:23:25,116 ERROR services.py:1276 -- Failed to read dashboard.err file: cannot mmap an empty file. It is unexpected. Please report an issue to Ray github. https://github.com/ray-project/ray/issues
2023-09-16 11:23:25,233 INFO worker.py:1621 -- Started a local Ray instance.
[2023-09-16 11:23:26,416 E 89301 89301] core_worker.cc:201: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

time for training

Hello,
Great work!
How much time is needed for training on a single 3090 GPU?
Thanks a lot!

AssertionError: Class to be of type <class 'pytorch_lightning.callbacks.base.Callback'>, but is <class 'omegaconf.dictconfig.DictConfig'>!

I encountered the following error when running train_gc_pgp.sh. I did not modify the repo apart from the essential path setup. Here is the error output:

Starting Pre-Training with gt traversals as input for decoder
Global seed set to 0
2023-12-19 12:59:32,159 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20} Building experiment folders...
2023-12-19 12:59:32,159 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22} Experimental folder: /home/mh/code/nuplan/exp/exp/training_gc_pgp_model/training_gc_pgp_model/2023.12.19.12.59.31
2023-12-19 12:59:32,159 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19} Building WorkerPool...
2023-12-19 12:59:32,160 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78} Starting ray local!
2023-12-19 12:59:33,686 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at 35.3.215.205:8265
2023-12-19 12:59:34,153 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101} Worker: RayDistributed
2023-12-19 12:59:34,155 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102} Number of nodes: 1
Number of CPUs per node: 32
Number of GPUs per node: 1
Number of threads across all nodes: 32
2023-12-19 12:59:34,155 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27} Building WorkerPool...DONE!
2023-12-19 12:59:34,155 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/training/experiments/training.py:41} Building training engine...
2023-12-19 12:59:34,155 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18} Building TorchModuleWrapper...
2023-12-19 12:59:34,299 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21} Building TorchModuleWrapper...DONE!
2023-12-19 12:59:34,299 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18} Building Splitter...
2023-12-19 12:59:34,675 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21} Building Splitter...DONE!
2023-12-19 12:59:34,675 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/data_augmentation_builder.py:19} Building augmentors...
2023-12-19 12:59:34,685 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/data_augmentation_builder.py:28} Building augmentors...DONE!
2023-12-19 12:59:34,686 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18} Building AbstractScenarioBuilder...
2023-12-19 12:59:34,737 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21} Building AbstractScenarioBuilder...DONE!
2023-12-19 12:59:34,737 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35} Building ScenarioFilter...
2023-12-19 12:59:34,738 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44} Building ScenarioFilter...DONE!
Ray objects: 100%|██████████████████████████████| 32/32 [02:11<00:00, 4.12s/it]
2023-12-19 13:01:49,405 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_builder.py:171} Extracted 177435 scenarios for training
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258} WORLD_SIZE was not set.
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266} PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277} Number of gpus found to be in use: 1
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:114} World size: 1
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:115} Learning rate before: 0.0001
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:119} Scaling method: Equal Variance
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:141} Betas after scaling: [0.9, 0.999]
2023-12-19 13:01:49,408 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:143} Learning rate after scaling: 0.0001
2023-12-19 13:01:49,487 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:172} Updating Learning Rate Scheduler Config...
2023-12-19 13:01:49,487 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258} WORLD_SIZE was not set.
2023-12-19 13:01:49,487 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266} PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-12-19 13:01:49,487 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277} Number of gpus found to be in use: 1
2023-12-19 13:01:49,487 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:199} Updating torch.optim.lr_scheduler.MultiStepLR in ddp setting is not yet supported. Learning rate scheduler config will not be updated.
2023-12-19 13:01:49,487 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:245} Optimizer and LR Scheduler configs updated according to ddp strategy.
2023-12-19 13:01:49,494 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py:19} Building callbacks...
Error executing job with overrides: ['seed=0', 'py_func=train', '+training=training_gc_pgp_model', 'job_name=training_gc_pgp_model', 'scenario_builder=nuplan', 'scenario_filter.num_scenarios_per_type=4000', 'cache.cache_path=/home/mh/code/nuplan/exp/mh/cache', 'cache.use_cache_without_dataset=False', 'callbacks.visualization_callback.pixel_size=0.25', '+callbacks.multimodal_visualization_callback.pixel_size=0.25', 'lightning.trainer.params.max_epochs=20', 'lightning.trainer.params.max_time=null', 'data_loader.params.batch_size=32', 'optimizer.lr=1e-4', 'lr_scheduler=multistep_lr', 'lr_scheduler.milestones=[40,50,55]', 'lr_scheduler.gamma=0.5', 'model.encoder.use_red_light_feature=TRUE', 'model.aggregator.use_route_mask=FALSE', 'model.aggregator.hard_masking=FALSE', 'model.aggregator.pre_train=true']
Traceback (most recent call last):
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/run_training.py", line 89, in <module>
main()
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/main.py", line 49, in decorated_main
_run_hydra(
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 367, in _run_hydra
run_and_report(
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 368, in <lambda>
lambda: hydra.run(
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 110, in run
_ = ret.return_value
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/run_training.py", line 59, in main
engine = build_training_engine(cfg, worker)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/training/experiments/training.py", line 60, in build_training_engine
trainer = build_trainer(cfg)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/training_builder.py", line 109, in build_trainer
callbacks = build_callbacks(cfg)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py", line 25, in build_callbacks
validate_type(callback, pl.Callback)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_type.py", line 32, in validate_type
assert isinstance(
AssertionError: Class to be of type <class 'pytorch_lightning.callbacks.base.Callback'>, but is <class 'omegaconf.dictconfig.DictConfig'>!
Starting Training with aggregator traversals as input for decoder
Global seed set to 0
2023-12-19 13:01:56,346 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:20} Building experiment folders...
2023-12-19 13:01:56,346 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/folder_builder.py:22} Experimental folder: /home/mh/code/nuplan/exp/exp/training_gc_pgp_model/training_gc_pgp_model/2023.12.19.13.01.55
2023-12-19 13:01:56,347 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19} Building WorkerPool...
2023-12-19 13:01:56,347 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78} Starting ray local!
2023-12-19 13:01:57,778 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at 35.3.215.205:8265
2023-12-19 13:01:58,236 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:101} Worker: RayDistributed
2023-12-19 13:01:58,237 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/utils/multithreading/worker_pool.py:102} Number of nodes: 1
Number of CPUs per node: 32
Number of GPUs per node: 1
Number of threads across all nodes: 32
2023-12-19 13:01:58,237 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:27} Building WorkerPool...DONE!
2023-12-19 13:01:58,237 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/training/experiments/training.py:41} Building training engine...
2023-12-19 13:01:58,237 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:18} Building TorchModuleWrapper...
2023-12-19 13:01:58,376 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/model_builder.py:21} Building TorchModuleWrapper...DONE!
2023-12-19 13:01:58,376 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:18} Building Splitter...
2023-12-19 13:01:58,760 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/splitter_builder.py:21} Building Splitter...DONE!
2023-12-19 13:01:58,760 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/data_augmentation_builder.py:19} Building augmentors...
2023-12-19 13:01:58,770 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/data_augmentation_builder.py:28} Building augmentors...DONE!
2023-12-19 13:01:58,770 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:18} Building AbstractScenarioBuilder...
2023-12-19 13:01:58,821 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_building_builder.py:21} Building AbstractScenarioBuilder...DONE!
2023-12-19 13:01:58,821 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:35} Building ScenarioFilter...
2023-12-19 13:01:58,821 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_filter_builder.py:44} Building ScenarioFilter...DONE!
Ray objects: 100%|██████████████████████████████| 32/32 [02:10<00:00, 4.09s/it]
2023-12-19 13:04:12,689 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/scenario_builder.py:171} Extracted 177435 scenarios for training
2023-12-19 13:04:12,692 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258} WORLD_SIZE was not set.
2023-12-19 13:04:12,692 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266} PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-12-19 13:04:12,692 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277} Number of gpus found to be in use: 1
2023-12-19 13:04:12,692 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:114} World size: 1
2023-12-19 13:04:12,692 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:115} Learning rate before: 0.0001
2023-12-19 13:04:12,692 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:119} Scaling method: Equal Variance
2023-12-19 13:04:12,693 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:141} Betas after scaling: [0.9, 0.999]
2023-12-19 13:04:12,693 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:143} Learning rate after scaling: 0.0001
2023-12-19 13:04:12,770 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:172} Updating Learning Rate Scheduler Config...
2023-12-19 13:04:12,770 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:258} WORLD_SIZE was not set.
2023-12-19 13:04:12,770 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:266} PytorchLightning Trainer gpus was set to -1, finding number of GPUs used from torch.cuda.device_count().
2023-12-19 13:04:12,770 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:277} Number of gpus found to be in use: 1
2023-12-19 13:04:12,770 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:199} Updating torch.optim.lr_scheduler.MultiStepLR in ddp setting is not yet supported. Learning rate scheduler config will not be updated.
2023-12-19 13:04:12,771 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_config.py:245} Optimizer and LR Scheduler configs updated according to ddp strategy.
2023-12-19 13:04:12,777 INFO {/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py:19} Building callbacks...
Error executing job with overrides: ['seed=0', 'py_func=train', '+training=training_gc_pgp_model', 'job_name=training_gc_pgp_model', 'scenario_builder=nuplan', 'scenario_filter.num_scenarios_per_type=4000', 'cache.cache_path=/home/mh/code/nuplan/exp/mh/cache', 'cache.use_cache_without_dataset=False', 'callbacks.visualization_callback.pixel_size=0.25', '+callbacks.multimodal_visualization_callback.pixel_size=0.25', 'lightning.trainer.params.max_epochs=90', 'lightning.trainer.params.max_time=null', 'lightning.trainer.checkpoint.resume_training=true', 'data_loader.params.batch_size=32', 'optimizer.lr=1e-4', 'lr_scheduler=multistep_lr', 'model.encoder.use_red_light_feature=TRUE', 'model.aggregator.use_route_mask=FALSE', 'model.aggregator.hard_masking=FALSE', 'model.aggregator.pre_train=false']
Traceback (most recent call last):
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/run_training.py", line 89, in <module>
main()
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/main.py", line 49, in decorated_main
_run_hydra(
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 367, in _run_hydra
run_and_report(
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/utils.py", line 368, in <lambda>
lambda: hydra.run(
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 110, in run
_ = ret.return_value
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/mh/miniconda3/envs/nuplan/lib/python3.9/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/run_training.py", line 59, in main
engine = build_training_engine(cfg, worker)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/training/experiments/training.py", line 60, in build_training_engine
trainer = build_trainer(cfg)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/training_builder.py", line 109, in build_trainer
callbacks = build_callbacks(cfg)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/training_callback_builder.py", line 25, in build_callbacks
validate_type(callback, pl.Callback)
File "/home/mh/code/nuplan/nuplan-devkit/nuplan/planning/script/builders/utils/utils_type.py", line 32, in validate_type
assert isinstance(
AssertionError: Class to be of type <class 'pytorch_lightning.callbacks.base.Callback'>, but is <class 'omegaconf.dictconfig.DictConfig'>!

I appreciate any suggestions!
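
One possible reading of this traceback, offered as an assumption and not a confirmed diagnosis: the override `+callbacks.multimodal_visualization_callback.pixel_size=0.25` appends a new entry that contains only `pixel_size`. If the callback's own config is not picked up from the search path, that entry has no `_target_`, so it stays a `DictConfig` and fails the `pl.Callback` type check. The sketch below uses plain dictionaries as a stand-in for the Hydra config; the helper name `entries_without_target`, the `_target_` path, and the sample values are hypothetical.

```python
from typing import Any, Dict, List, Mapping


def entries_without_target(callbacks: Mapping[str, Any]) -> List[str]:
    """Return the names of callback entries that lack a `_target_` key.

    An entry needs `_target_` to name the class to build. An entry made only
    of parameters (e.g. created by a `+key=value` override) does not have it.
    """
    missing = []
    for name, entry in callbacks.items():
        if not isinstance(entry, Mapping) or "_target_" not in entry:
            missing.append(name)
    return missing


# Hypothetical stand-in for cfg.callbacks after the overrides were applied.
example_callbacks: Dict[str, Any] = {
    "visualization_callback": {
        "_target_": "some_package.callbacks.VisualizationCallback",  # hypothetical path
        "pixel_size": 0.25,
    },
    # Entry created solely by the `+callbacks.multimodal_visualization_callback...` override.
    "multimodal_visualization_callback": {"pixel_size": 0.25},
}

print(entries_without_target(example_callbacks))
```

With the real config, the same check could be run on `OmegaConf.to_container(cfg.callbacks)` just before the callbacks are built.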

Question about the result of PDM-Open with only the centerline input

Hi, I am trying to reproduce the result in Table 1 of the CoRL 2023 paper with only the centerline as input. In the paper, PDM-Open achieves an OLS of 85, while my reproduction only reaches 21 on reduced_val14. I would like to know whether this is due to differences in my hyperparameter settings or whether there are other issues.

My changes to pdm_open_model.py:

# Planner head: input width reduced from hidden_dim * 2 to hidden_dim.
self.planner_head = nn.Sequential(
    # nn.Linear(self.hidden_dim * 2, self.hidden_dim),
    nn.Linear(self.hidden_dim * 1, self.hidden_dim),
    nn.Dropout(0.1),
    nn.ReLU(),
    nn.Linear(self.hidden_dim, self.hidden_dim),
    nn.ReLU(),
    nn.Linear(self.hidden_dim, trajectory_sampling.num_poses * len(SE2Index)),
)

# Features passed to the head: centerline encoding only, state encoding dropped.
# planner_features = torch.cat([state_encodings, centerline_encodings], dim=-1)
planner_features = centerline_encodings

pdm_open_only_with_centerline_input

calculate inference time

Hello,
I see that the evaluation table has a time column.
I guess it is taken from exp/simulation/runner_report.parquet,
but I don't know which field it comes from (duration, compute_trajectory_runtimes, and so on),
so I would like to ask how the inference time in the README.md table is calculated.
Thanks a lot!
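
Which column the README table is based on is not stated in this thread. As a rough sketch only, assuming that each row of runner_report.parquet describes one scenario and that compute_trajectory_runtimes holds that scenario's per-step planner runtimes in seconds (both are assumptions), an average per-step inference time could be computed as follows. The function name and the sample numbers are made up for illustration.

```python
from statistics import mean
from typing import Iterable, Sequence


def mean_runtime_ms(per_scenario_runtimes: Iterable[Sequence[float]]) -> float:
    """Average per-step runtime in milliseconds over all scenarios.

    Every planner step counts equally, so long scenarios weigh more than
    short ones. Input values are assumed to be in seconds.
    """
    all_steps = [t for runtimes in per_scenario_runtimes for t in runtimes]
    if not all_steps:
        raise ValueError("no runtimes found")
    return 1000.0 * mean(all_steps)


# Made-up runtimes for two scenarios, in seconds.
sample = [[0.10, 0.12], [0.08]]
print(f"{mean_runtime_ms(sample):.1f} ms")
```

With pandas installed, the per-scenario lists could be taken from `pandas.read_parquet("exp/simulation/runner_report.parquet")["compute_trajectory_runtimes"]`, again assuming that column name.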

Some questions about "Val14" evaluation

Hi

Thank you for your work.
However, I have some questions about the evaluation results of the Urban Driver simulation on val14_split.

When running the bash file located at ../tuplan_garage/scripts/simulation/sim_urban_driver.sh, I encountered an IOError from [RayletClient], so I set the worker parameter to worker=single_machine_thread_pool in the bash file.
The full bash file is here:

SPLIT=val14_split
CHALLENGE=closed_loop_reactive_agents # open_loop_boxes, closed_loop_nonreactive_agents, closed_loop_reactive_agents
CHECKPOINT_PATH="/mnt/sdd/jyYun/planning/tuplan_garage/trained_weights/urban_driver.ckpt"

python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_simulation.py \
+simulation=$CHALLENGE \
planner=ml_planner \
worker=single_machine_thread_pool \
scenario_filter=$SPLIT \
scenario_builder=nuplan \
planner.ml_planner.model_config='\${model}' \
planner.ml_planner.checkpoint_path=$CHECKPOINT_PATH \
model=urban_driver_open_loop_model \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.simulation, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

It works, but I have noticed that some scenarios fail during evaluation.
When the evaluation ended, it reported that 38 out of 1118 scenarios had failed.
Despite the failed scenarios, we have confirmed that the results are still saved successfully in the 'exp' folder as configured in nuPlan.

Under these circumstances, I have some questions:

  1. Could you provide information about the meaning of scenario failure and the reasons behind its occurrence?
    The scenario failed with this terminal log
Traceback (most recent call last):

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/simulation/runner/executor.py", line 28, in run_simulation

    run_results = sim_runner.run()

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/simulation/runner/simulations_runner.py", line 128, in run

    self.simulation.callback.on_simulation_end(self.simulation.setup, self.planner, self.simulation.history)

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/simulation/callback/multi_callback.py", line 68, in on_simulation_end

    callback.on_simulation_end(setup, planner, history)

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/simulation/callback/metric_callback.py", line 102, in on_simulation_end

    run_metric_engine(

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/simulation/callback/metric_callback.py", line 24, in run_metric_engine

    metric_files = metric_engine.compute(history, scenario=scenario, planner_name=planner_name)

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/metrics/metric_engine.py", line 133, in compute

    all_metrics_results = self.compute_metric_results(history=history, scenario=scenario)

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/metrics/metric_engine.py", line 119, in compute_metric_results

    raise RuntimeError(f"Metric Engine failed with: {e}")

RuntimeError: Metric Engine failed with: 

'

2023-12-09 09:45:38,768 WARNING {/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/simulation/runner/executor.py:125}  Failed Simulation.

 'Traceback (most recent call last):

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/metrics/metric_engine.py", line 112, in compute_metric_results

    metric_results[metric.name] = metric.compute(history, scenario=scenario)

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/metrics/evaluation_metrics/common/speed_limit_compliance.py", line 218, in compute

    time_series = TimeSeries(

  File "<string>", line 7, in __init__

  File "/mnt/sdd/jyYun/planning/nuplan-devkit/nuplan-devkit/nuplan/planning/metrics/metric_result.py", line 127, in __post_init__

    assert len(self.time_stamps) == len(self.values)

AssertionError
  2. Is the failure of scenarios related to the worker parameter being modified in the bash file?

  3. Is the higher evaluation performance compared to the values reported in the paper related to the scenario failures?
    It appears that the CLS-R score is 0.5986 in the nuBoard report.
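
How nuBoard aggregates scenarios whose simulation failed is not confirmed in this thread. As a back-of-the-envelope check, assuming the reported 0.5986 is the mean over the 1118 - 38 = 1080 successful scenarios and that each failed scenario would instead count as 0, the score over all 1118 scenarios can be estimated as sketched below. The function name is hypothetical.

```python
def score_with_failures_as_zero(mean_score: float, total: int, failed: int) -> float:
    """Rescale a mean over successful scenarios to a mean over all scenarios,
    counting every failed scenario as a score of 0."""
    if not 0 <= failed < total:
        raise ValueError("failed must be in [0, total)")
    succeeded = total - failed
    return mean_score * succeeded / total


# Numbers from the post above: CLS-R 0.5986, 38 of 1118 scenarios failed.
adjusted = score_with_failures_as_zero(0.5986, total=1118, failed=38)
print(round(adjusted, 4))
```

Under these assumptions the score would be roughly 0.578, so excluding 38 scenarios can move the result by about two points. Whether that explains the gap to the paper depends on how the failed scenarios would actually have scored.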

No module named 'nuplan_garage' when loading "pdm_offset_checkpoint.ckpt"

I want to run simulations using your provided models from https://drive.google.com/drive/folders/1LLdunqyvQQuBuknzmf7KMIJiA2grLYB2. However, there is an error "No module named 'nuplan_garage'" when I load "pdm_offset_checkpoint.ckpt" and "gc_pgp_checkpoint.ckpt". This error does not happen when I load "pdm_open_checkpoint.ckpt".

Maybe the reason is that the checkpoints are not regenerated after you changed the repository name due to the trademark conflicts.

Thanks in advance!
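
If the checkpoints were indeed pickled while the package was still called nuplan_garage, one workaround that avoids regenerating them is to redirect the old module path during unpickling. This is a sketch under that assumption: `remap_module`, `RenamingUnpickler`, and `loads_renamed` are hypothetical names, and whether the checkpoints actually store such references has not been verified here.

```python
import io
import pickle

OLD_PREFIX = "nuplan_garage"  # package name assumed to be stored in the old checkpoints
NEW_PREFIX = "tuplan_garage"  # package name after the repository rename


def remap_module(module: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> str:
    """Rewrite `old` or `old.sub.module` to the same path under `new`."""
    if module == old or module.startswith(old + "."):
        return new + module[len(old):]
    return module


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that resolves classes from the renamed package."""

    def find_class(self, module: str, name: str):
        # Look the class up under the new package name instead of the old one.
        return super().find_class(remap_module(module), name)


def loads_renamed(data: bytes):
    """Unpickle `data`, redirecting old module paths to the new package."""
    return RenamingUnpickler(io.BytesIO(data)).load()


print(remap_module("nuplan_garage.planning.training"))
```

`torch.load` has a `pickle_module` argument, so an object exposing such an unpickler might be passed there. That route is untested here, and re-saving the checkpoint under the new package name would be the cleaner long-term fix.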

Some errors feedback

Dear Author

I want to make a suggestion. When running the bash file located at

/home/wjl/jyf/tuplan_garage/scripts/simulation/sim_gc_pgp.sh

I encountered the error: mismatched input '=' expecting <EOF>


After many attempts, I realized that the problem is the checkpoint path:

"/home/wjl/jyf/nuplandata/exp/exp/training_gc_pgp_model/training_gc_pgp_model/2023.12.12.09.53.38/checkpoints/epoch=1.ckpt"

When I delete the '=' and rename the checkpoint file, the bash file runs correctly. The code initially names checkpoints epoch=<int>, so I suggest naming checkpoint files like epoch-<int>.

APPENDIX

error shell

SPLIT=val14_split
CHALLENGE=closed_loop_reactive_agents # open_loop_boxes, closed_loop_nonreactive_agents, closed_loop_reactive_agents
CHECKPOINT_PATH="/home/wjl/jyf/nuplandata/exp/exp/training_gc_pgp_model/training_gc_pgp_model/2023.12.12.09.53.38/checkpoints/epoch=1.ckpt" 

python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_simulation.py \
+simulation=$CHALLENGE \
planner=ml_planner \
scenario_filter=$SPLIT \
scenario_builder=nuplan \
planner.ml_planner.model_config='\${model}' \
planner.ml_planner.checkpoint_path=$CHECKPOINT_PATH \
model=gc_pgp_model \
model.aggregator.pre_train=false \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.simulation, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

correct shell

SPLIT=val14_split
CHALLENGE=closed_loop_reactive_agents # open_loop_boxes, closed_loop_nonreactive_agents, closed_loop_reactive_agents
CHECKPOINT_PATH="/home/wjl/jyf/nuplandata/exp/exp/training_gc_pgp_model/training_gc_pgp_model/2023.12.12.09.53.38/checkpoints/epoch1.ckpt"  #delete '='

python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_simulation.py \
+simulation=$CHALLENGE \
planner=ml_planner \
scenario_filter=$SPLIT \
scenario_builder=nuplan \
planner.ml_planner.model_config='\${model}' \
planner.ml_planner.checkpoint_path=$CHECKPOINT_PATH \
model=gc_pgp_model \
model.aggregator.pre_train=false \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.simulation, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

Best wishes
Yifan
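
Renaming the file works. An alternative that keeps the `epoch=<int>` name is to quote the value, since Hydra's override grammar accepts quoted strings and then no longer treats the inner '=' as syntax. The helper below is a sketch: the name `quoted_override` is hypothetical, the path is shortened for illustration, and it assumes the path contains no backslashes.

```python
def quoted_override(key: str, value: str) -> str:
    """Build a Hydra override `key='value'` with the value single-quoted.

    Single quotes inside the value are backslash-escaped. Values containing
    backslashes are rejected (a limitation of this sketch).
    """
    if "\\" in value:
        raise ValueError("backslashes are not supported by this sketch")
    return f"{key}='" + value.replace("'", "\\'") + "'"


# Shortened, hypothetical path; the real one is in the post above.
checkpoint = "/path/to/checkpoints/epoch=1.ckpt"
override = quoted_override("planner.ml_planner.checkpoint_path", checkpoint)
print(override)
# planner.ml_planner.checkpoint_path='/path/to/checkpoints/epoch=1.ckpt'
```

When the overrides are passed as a list to `subprocess.run`, no shell is involved and the quotes reach Hydra intact. In a bash script the same effect needs an extra layer of quoting, for example `"planner.ml_planner.checkpoint_path='$CHECKPOINT_PATH'"`.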

Callbacks' module problem before the start of training GC-PGP

Dear Authors,
Thank you for the excellent work! I have set up the nuPlan environment and installed tuplan_garage as a package, but I ran into a problem before GC-PGP training starts. The key callbacks.multimodal_visualization_callback.pixel_size cannot be found, so I cannot train the GC-PGP model. I then looked in /home/wjl/jyf/nuplan-devkit/nuplan/planning/training/callbacks, but the file multimodal_visualization_callback does not exist there. Do you have any suggestions for training the GC-PGP model?

The bash script is below:

BATCH_SIZE=32
SEED=0
NUPLAN_DEVKIT_ROOT=/home/wjl/jyf/nuplan-devkit # add the nuplan-devkit path

PRETRAIN_EPOCHS=20
PRETRAIN_LR=1e-4
TRAIN_EPOCHS=90
TRAIN_LR=1e-4
TRAIN_LR_MILESTONES=[40,50,55]
TRAIN_LR_DECAY=0.5

JOB_NAME=training_gc_pgp_model
CACHE_PATH=/home/wjl/jyf/tuplan_garage/nuplandata/exp/jyf/cache # cache path
USE_CACHE_WITHOUT_DATASET=False

ROUTE_FEATURE=FALSE
ROUTE_MASK=FALSE
HARD_MASK=FALSE
TRAFFIC_LIGHT=TRUE


echo "Starting Pre-Training with gt traversals as input for decoder"
python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_training.py \
seed=$SEED \
py_func=train \
+training=training_gc_pgp_model \
job_name=$JOB_NAME \
scenario_builder=nuplan \
scenario_filter.num_scenarios_per_type=4000 \
cache.cache_path=$CACHE_PATH \
cache.use_cache_without_dataset=$USE_CACHE_WITHOUT_DATASET \
callbacks.visualization_callback.pixel_size=0.25 \
callbacks.multimodal_visualization_callback.pixel_size=0.25 \
lightning.trainer.params.max_epochs=$PRETRAIN_EPOCHS \
lightning.trainer.params.max_time=null \
data_loader.params.batch_size=$BATCH_SIZE \
optimizer.lr=$PRETRAIN_LR \
lr_scheduler=multistep_lr \
lr_scheduler.milestones=$TRAIN_LR_MILESTONES \
lr_scheduler.gamma=$TRAIN_LR_DECAY \
model.encoder.use_red_light_feature=$TRAFFIC_LIGHT \
model.aggregator.use_route_mask=$ROUTE_MASK \
model.aggregator.hard_masking=$HARD_MASK \
model.aggregator.pre_train=true \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.training, pkg://tuplan_garage.planning.script.experiments, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"

echo "Starting Training with aggregator traversals as input for decoder"
python $NUPLAN_DEVKIT_ROOT/nuplan/planning/script/run_training.py \
seed=$SEED \
py_func=train \
+training=training_gc_pgp_model \
job_name=$JOB_NAME \
scenario_builder=nuplan \
scenario_filter.num_scenarios_per_type=4000 \
cache.cache_path=$CACHE_PATH \
cache.use_cache_without_dataset=$USE_CACHE_WITHOUT_DATASET \
callbacks.visualization_callback.pixel_size=0.25 \
callbacks.multimodal_visualization_callback.pixel_size=0.25 \
lightning.trainer.params.max_epochs=$TRAIN_EPOCHS \
lightning.trainer.params.max_time=null \
lightning.trainer.checkpoint.resume_training=true \
data_loader.params.batch_size=$BATCH_SIZE \
optimizer.lr=$TRAIN_LR \
lr_scheduler=multistep_lr \
model.encoder.use_red_light_feature=$TRAFFIC_LIGHT \
model.aggregator.use_route_mask=$ROUTE_MASK \
model.aggregator.hard_masking=$HARD_MASK \
model.aggregator.pre_train=false \
hydra.searchpath="[pkg://tuplan_garage.planning.script.config.common, pkg://tuplan_garage.planning.script.config.training, pkg://tuplan_garage.planning.script.experiments, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.experiments]"
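As background on the `lr_scheduler=multistep_lr` option used in the script above, here is a minimal sketch of a multi-step decay rule. It assumes the scheduler behaves like PyTorch's `MultiStepLR`, where the learning rate is multiplied by `gamma` each time a milestone epoch is reached. The base learning rate and milestones below are hypothetical placeholders, not the values from this script; only the decay factor of 0.5 matches `TRAIN_LR_DECAY`.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at `epoch` under multi-step decay."""
    # Count the milestones that are <= epoch; decay once per milestone reached.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# Hypothetical base LR and milestones, for illustration only.
schedule = [multistep_lr(1e-3, [2, 4], 0.5, epoch) for epoch in range(6)]
print(schedule)
```

With these placeholder values the rate is halved at epoch 2 and again at epoch 4.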

Output:

Starting Pre-Training with gt traversals as input for decoder
Could not override 'callbacks.multimodal_visualization_callback.pixel_size'.
To append to your config use +callbacks.multimodal_visualization_callback.pixel_size=0.25
Key 'multimodal_visualization_callback' is not in struct
    full_key: callbacks.multimodal_visualization_callback
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Starting Training with aggregator traversals as input for decoder
Could not override 'callbacks.multimodal_visualization_callback.pixel_size'.
To append to your config use +callbacks.multimodal_visualization_callback.pixel_size=0.25
Key 'multimodal_visualization_callback' is not in struct
    full_key: callbacks.multimodal_visualization_callback
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Best wishes,
Yifan
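The Hydra message in the log above says that `multimodal_visualization_callback` is not a key in the composed config, which is why a plain `key=value` override is rejected. The sketch below shows one way to check which overrides target existing keys. The `config` dictionary is a hypothetical, simplified stand-in for the composed training config, and `split_overrides` is an illustrative helper, not part of Hydra or nuPlan.

```python
def split_overrides(config, overrides):
    """Separate 'key.path=value' overrides by whether the key path exists in config."""
    present, missing = [], []
    for override in overrides:
        # Drop the value and any leading '+' to get the dotted key path.
        key_path = override.split("=", 1)[0].lstrip("+")
        node, found = config, True
        for part in key_path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                found = False
                break
        (present if found else missing).append(override)
    return present, missing


# Hypothetical, simplified stand-in for the composed config.
config = {"callbacks": {"visualization_callback": {"pixel_size": 0.1}}}
overrides = [
    "callbacks.visualization_callback.pixel_size=0.25",
    "callbacks.multimodal_visualization_callback.pixel_size=0.25",
]
present, missing = split_overrides(config, overrides)
print("can be overridden:", present)
print("not in config:", missing)
```

Whether the right fix is to drop the missing override or to append it with the `+` prefix that Hydra suggests depends on whether that callback is meant to be part of the training config, which is not verified here.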

Error in evaluation

Hi,

Thank you for the code! However, when I run the evaluation command from the README, I encounter this error:

INFO:nuplan.planning.script.builders.main_callback_builder:Building MultiMainCallback...
INFO:nuplan.planning.script.builders.main_callback_builder:Building MultiMainCallback: 4...DONE!
2023-07-19 21:44:16,393 INFO {/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/script/builders/worker_pool_builder.py:19}  Building WorkerPool...
2023-07-19 21:44:16,436 INFO {/home/lis2syv/nuPlan/nuplan-devkit/nuplan/planning/utils/multithreading/worker_ray.py:78}  Starting ray local!
2023-07-19 21:44:20,657 ERROR services.py:1207 -- Failed to start the dashboard , return code -11
2023-07-19 21:44:20,657 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-07-19 21:44:20,658 ERROR services.py:1276 --
The last 20 lines of /tmp/ray/session_2023-07-19_21-44-16_486474_454342/logs/dashboard.log (it contains the error message from the dashboard):
2023-07-19 21:44:18,280 INFO head.py:242 -- Starting dashboard metrics server on port 44227

2023-07-19 21:44:20,971 INFO worker.py:1636 -- Started a local Ray instance.
[2023-07-19 21:44:22,949 E 454342 454342] core_worker.cc:193: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

Do you have any hints on what went wrong?

GPUs seem not to be running during training

I ran into issues when running the tuplan-garage project on my computer. Training on the nuPlan dataset was slow, and GPU memory usage was only at 20%. I also tried an A100 cluster, and GPU usage was quite low there too.

I have been using the default training settings. Did I overlook any important settings?

GPU_no_runing.txt

[Image: GPU usage info]

training gc_pgp model

Hi,

Regarding your released gc_pgp code, I noticed that you pre-train the aggregator. Do you load the pre-trained aggregator when training gc_pgp? I couldn't find any model-loading code. Also, could you share the training configuration for Urban Driver? I trained Urban Driver on the train150k set with 4000 scenarios per type, but its performance is much lower than in the table in your README, with an open-loop score of only 55.

Best,

Centerline and States encoding for PDM_Open and PDM_Offset models

Hello,
I have a question about how the centerline and the ego states are encoded:

1) Centerline: as I understand it, you first extract the centerline using Dijkstra, from the starting position to the goal position.
Afterwards, you keep only 120 m of it, sampled at 1 m intervals, and pass it to the neural network.
My question is: how do you select the "local" 120 m centerline? Does it start at the ego vehicle's position in each frame, or do you use different logic?

Moreover, each centerline state only includes x, y, and theta, but in which reference frame?

2) Ego states: ego_position, ego_velocity, and ego_acceleration each have 11 elements (the last is the current one and the first 10 are past ones), each with 3 states (x, y, and theta). Which reference frame is used here?

I assume it's the local one, but I'm asking to make sure.

Thank you very much in advance!
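To make the question concrete, here is a minimal sketch of the encoding as the post describes it: a polyline resampled every 1 m over 120 m, with each (x, y, theta) pose expressed in the ego frame. It assumes the crop starts at the ego pose and includes both endpoints (121 samples); both points are assumptions, since they are exactly what the question asks. The function names are hypothetical and do not come from the repository.

```python
import math


def resample_polyline(points, spacing=1.0, length=120.0):
    """Interpolate (x, y) points every `spacing` m, up to `length` m of arc length."""
    samples = []
    target = 0.0     # arc length of the next sample
    travelled = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue  # skip duplicate points
        heading = math.atan2(y1 - y0, x1 - x0)
        while target <= travelled + seg and target <= length:
            t = (target - travelled) / seg
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0), heading))
            target += spacing
        travelled += seg
    return samples


def to_local(pose, origin):
    """Express a global (x, y, theta) pose in the frame of `origin`."""
    dx, dy = pose[0] - origin[0], pose[1] - origin[1]
    cos_o, sin_o = math.cos(origin[2]), math.sin(origin[2])
    x = cos_o * dx + sin_o * dy   # distance ahead of the origin pose
    y = -sin_o * dx + cos_o * dy  # distance to the left of the origin pose
    theta = (pose[2] - origin[2] + math.pi) % (2 * math.pi) - math.pi
    return (x, y, theta)


# Hypothetical ego pose heading north, on a straight 200 m centerline.
ego = (10.0, 5.0, math.pi / 2)
centerline = resample_polyline([(10.0, 5.0), (10.0, 205.0)])
local_centerline = [to_local(pose, ego) for pose in centerline]
print(len(local_centerline), local_centerline[0], local_centerline[-1])
```

In the ego frame, the first sample sits at the origin and the last one lies 120 m straight ahead with zero relative heading.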

about the validation set in the training process

Hi,

I am a little confused about the val14 data split. Did you sample the 178k samples of the train150 set from both the full training set and the full validation set, using the samples from the val set for validation, and then use the val14 set, with 1118 samples from the val set, to test performance? If so, might the validation set and the val14 test set overlap?
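One way to check such an overlap empirically is to intersect the scenario identifiers of the two splits. The sketch below assumes each scenario has a unique token string; the tokens shown are hypothetical, and how to obtain the real token lists from the nuPlan scenario filters is not shown here.

```python
def split_overlap(split_a, split_b):
    """Return the scenario tokens that appear in both splits."""
    return sorted(set(split_a) & set(split_b))


# Hypothetical tokens, for illustration only.
val_tokens = ["a1", "b2", "c3", "d4"]
val14_tokens = ["c3", "d4", "e5"]
shared = split_overlap(val_tokens, val14_tokens)
print(f"{len(shared)} shared scenarios: {shared}")
```

An empty result would indicate that the two splits share no scenarios.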

Training settings for Val14 benchmark

Thanks for open sourcing the great work!

I want to ask about the training details of PDM-Open and PDM-Offset.

  • Hardware:
    • GPU type and number
    • Number of CPU cores
    • Maximum memory usage
  • Training time
    • How many hours per epoch?
    • How many epochs did you use?
  • Cache
    • Do you save the features you computed, such as centerlines?
      • If yes, how large is the cache, and how do you save and load it?

Thank you in advance.
