
MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving

CICAI 2023 Best Paper Runner-up Award

For business inquiries, please contact us at [email protected].

We have just finished refactoring our codebase. You can now install MARS with pip and start using it instantly! Please don't hesitate to contact us if you run into any issues with the latest version. Thanks!

Our related (dependent) project, CarStudio, has been accepted by RA-L, and its code is available on GitHub. Congratulations to the authors!

1. Installation: Setup the environment

Prerequisites

You must have an NVIDIA GPU with CUDA installed on the system. This library has been tested with CUDA 11.7. You can find more information about installing CUDA here.

Create environment

Nerfstudio requires Python >= 3.7. We recommend using conda to manage dependencies. Make sure to install Conda before proceeding.

conda create --name mars -y python=3.9
conda activate mars

Installation

This section will walk you through the installation process. Our system depends on the tiny-cuda-nn project.

pip install mars-nerfstudio
cd /path/to/tiny-cuda-nn/bindings/torch
python setup.py install
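
To verify the environment, a quick sanity check can help; this is an optional sketch (the file name sanity_check.py is hypothetical, and it assumes the tiny-cuda-nn torch bindings are importable under their usual module name tinycudann):

# sanity_check.py -- optional helper, not part of the repo.
import torch
import tinycudann as tcnn  # torch bindings built from tiny-cuda-nn/bindings/torch above

print("CUDA available:", torch.cuda.is_available())       # should print True
print("tiny-cuda-nn bindings loaded from:", tcnn.__file__) # confirms the extension imports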

2. Training from Scratch

The following will train a MARS model.

Our repository provides dataparsers for the KITTI and vKITTI2 datasets. For your own data, you can either write a custom dataparser or convert your dataset to the format of the provided datasets.

From Datasets

Data Preparation

The data used in our experiments should contain both the camera pose parameters and the object tracklets. The camera parameters include the intrinsics and the extrinsics; the object tracklets include the bounding box poses, types, ids, etc. For more information, refer to the KITTI-MOT and vKITTI2 sections below.
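
As a rough illustration, the information a dataparser ultimately needs per frame is sketched below. The field names are hypothetical, not the on-disk format: KITTI-MOT stores this information in the calib/label_02/oxts files and vKITTI2 in the *.txt files shown in the trees below.

# Conceptual per-frame record assembled by a dataparser; names are illustrative only.
frame = {
    "intrinsics": "3x3 camera matrix K (fx, fy, cx, cy)",
    "extrinsics": "4x4 camera-to-world pose",
    "tracklets": [
        {
            "track_id": 3,                                   # persistent object id across frames
            "type": "Car",                                   # object category
            "box_pose": "3D box center, size (h, w, l), and heading",
        },
    ],
}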

KITTI

The KITTI-MOT dataset should look like this:

.(KITTI_MOT_ROOT)
├── panoptic_maps                    # (Optional) panoptic segmentation from KITTI-STEP dataset.
│   ├── colors
│   │   └── sequence_id.txt
│   └── train
│       └── sequence_id
│           └── frame_id.png
└── training
    ├── calib
    │   └── sequence_id.txt
    ├── completion_02                # (Optional) depth completion
    │   └── sequence_id
    │       └── frame_id.png
    ├── completion_03
    │   └── sequence_id
    │       └── frame_id.png
    ├── image_02
    │   └── sequence_id
    │       └── frame_id.png
    ├── image_03
    │   └── sequence_id
    │       └── frame_id.png
    ├── label_02
    │   └── sequence_id.txt
    └── oxts
        └── sequence_id.txt

We use a monocular depth estimation model to generate the depth maps for the KITTI-MOT dataset. Here are the estimated depth maps for sequence 0006 of KITTI-MOT. You can download them and put them in the KITTI-MOT/training directory.

We downloaded the KITTI-STEP annotations and generated the panoptic segmentation maps for the KITTI-MOT dataset. You can download the demo panoptic maps here and put them in the KITTI-MOT directory, or visit the official KITTI-STEP website for more information.

To train a reconstruction model, you can use the following command:

ns-train mars-kitti-car-depth-recon --data /data/kitti-MOT/training/image_02/0006

or if you want to use the Python script (please refer to the launch.json file in the .vscode directory):

python nerfstudio/nerfstudio/scripts/train.py mars-kitti-car-depth-recon --data /data/kitti-MOT/training/image_02/0006

vKITTI2

The vKITTI2 dataset should look like this:

.(vKITTI2_ROOT)
└── sequence_id
    └── scene_name
        ├── bbox.txt
        ├── colors.txt
        ├── extrinsic.txt
        ├── info.txt
        ├── intrinsic.txt
        ├── pose.txt
        └── frames
            ├── depth
            │   ├── Camera_0
            │   │   └── frame_id.png
            │   └── Camera_1
            │       └── frame_id.png
            ├── instanceSegmentation
            │   ├── Camera_0
            │   │   └── frame_id.png
            │   └── Camera_1
            │       └── frame_id.png
            ├── classSegmentation
            │   ├── Camera_0
            │   │   └── frame_id.png
            │   └── Camera_1
            │       └── frame_id.png
            └── rgb
                ├── Camera_0
                │   └── frame_id.png
                └── Camera_1
                    └── frame_id.png

To train a reconstruction model, you can use the following command:

ns-train mars-vkitti-car-depth-recon --data /data/vkitti/Scene06/clone

or if you want to use the Python script:

python nerfstudio/nerfstudio/scripts/train.py mars-vkitti-car-depth-recon --data /data/vkitti/Scene06/clone

Your Own Data

For your own data, you can refer to the data structures above and write your own dataparser, or convert your dataset to one of the formats shown above; a minimal sketch follows.
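
If you go the custom-dataparser route, the sketch below shows the general shape, assuming nerfstudio's standard dataparser interface. The class names, paths, and intrinsics are placeholders; see the KITTI/vKITTI2 dataparsers shipped with MARS for the exact metadata fields (object tracklets, depth and panoptic paths) that the scene-graph model expects.

# custom_dataparser.py -- a hypothetical sketch, not part of the MARS codebase.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Type

import torch
from nerfstudio.cameras.cameras import Cameras, CameraType
from nerfstudio.data.dataparsers.base_dataparser import DataParser, DataParserConfig, DataparserOutputs
from nerfstudio.data.scene_box import SceneBox


@dataclass
class MyDrivingDataParserConfig(DataParserConfig):
    """Config for a hypothetical custom driving dataset."""
    _target: Type = field(default_factory=lambda: MyDrivingDataParser)
    data: Path = Path("data/my-sequence")


@dataclass
class MyDrivingDataParser(DataParser):
    config: MyDrivingDataParserConfig

    def _generate_dataparser_outputs(self, split: str = "train") -> DataparserOutputs:
        # 1. Collect images and per-frame camera poses/intrinsics from your own format.
        image_filenames = sorted((self.config.data / "rgb").glob("*.png"))
        num = len(image_filenames)
        camera_to_worlds = torch.eye(4)[None, :3, :].repeat(num, 1, 1)  # placeholder extrinsics
        cameras = Cameras(
            fx=720.0, fy=720.0, cx=620.0, cy=190.0,  # placeholder intrinsics
            camera_to_worlds=camera_to_worlds,
            camera_type=CameraType.PERSPECTIVE,
        )
        # 2. Expose object tracklets (box poses, types, track ids) plus optional depth /
        #    panoptic paths through `metadata`, mirroring the provided dataparsers.
        return DataparserOutputs(
            image_filenames=image_filenames,
            cameras=cameras,
            scene_box=SceneBox(aabb=torch.tensor([[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]])),
            metadata={"obj_info": None},  # fill with your tracklet tensors
        )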

From Pre-Trained Model

Our model uses nerfstudio as the training framework. We provide checkpoints for both the reconstruction and novel view synthesis tasks.

Our pre-trained models are uploaded to Google Drive; refer to the table below to download them.

| Dataset | Scene | Setting | Start-End | Steps | PSNR | SSIM | Download | Wandb |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KITTI-MOT | 0006 | Reconstruction | 65-120 | 400k | 27.96 | 0.900 | model | report |
| KITTI-MOT | 0006 | Novel View Synthesis 75% | 65-120 | 200k | 27.32 | 0.890 | model | report |
| KITTI-MOT | 0006 | Novel View Synthesis 50% | 65-120 | 200k | 26.80 | 0.883 | model | report |
| KITTI-MOT | 0006 | Novel View Synthesis 25% | 65-120 | 200k | 25.87 | 0.866 | model | report |
| Virtual KITTI-2 | Scene06 | Novel View Synthesis 75% | 0-237 | 600k | 32.32 | 0.940 | model | report |
| Virtual KITTI-2 | Scene06 | Novel View Synthesis 50% | 0-237 | 600k | 32.16 | 0.938 | model | report |
| Virtual KITTI-2 | Scene06 | Novel View Synthesis 25% | 0-237 | 600k | 30.87 | 0.935 | model | report |

You can use the following command to continue training from a pre-trained model:

ns-train mars-kitti-car-depth-recon --data /data/kitti-MOT/training/image_02/0006 --load-dir outputs/experiment_name/method_name/timestamp/nerfstudio

Model Configs

Our modular framework supports combining different architectures for each node by modifying model configurations. Here's an example of using Nerfacto for background and our category-level object model:

model=SceneGraphModelConfig(
    background_model=NerfactoModelConfig(),
    object_model_template=CarNeRFModelConfig(_target=CarNeRF),
    object_representation="class-wise",
    object_ray_sample_strategy="remove-bg",
)
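
As a further sketch of the same modularity, the scene-graph config also exposes a sky node and a depth-supervision weight; enabling them would look roughly like this. The field names use_sky_model and depth_loss_mult are taken from the SceneGraphModelConfig printed in the issue logs further down this page; check mars/cicai_configs.py for the exact field set in your version.

model=SceneGraphModelConfig(
    background_model=NerfactoModelConfig(),
    object_model_template=CarNeRFModelConfig(_target=CarNeRF),
    object_representation="class-wise",
    object_ray_sample_strategy="remove-bg",
    use_sky_model=True,            # adds a dedicated sky node (SkyModelConfig)
    depth_loss_mult=0.01,          # weight of the optional depth supervision
)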

If you choose to use the category-level object model, please make sure that use_car_latents=True and that the latent codes exist. We provide latent codes for some sequences of the KITTI-MOT and vKITTI2 datasets here.
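
For reference, the latent-code paths are ordinary dataparser fields. The snippet below mirrors the NSGkittiDataParserConfig fields shown in the issue logs further down this page; the refactored pip package may name the config class differently, and the paths are placeholders.

dataparser=NSGkittiDataParserConfig(
    use_car_latents=True,
    car_object_latents_path=Path("/path/to/latents/KITTI-MOT/car-object-latents/latent_codes06.pt"),
    car_nerf_state_dict_path=Path("/path/to/latents/KITTI-MOT/car-nerf-state-dict/epoch_670.ckpt"),
)

These fields can also be overridden from the command line, e.g. via --pipeline.datamanager.dataparser.car_object_latents_path, as shown in one of the issues below.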

For more information, please refer to the configurations provided in mars/cicai_configs.py. We use wandb for logging by default; you can also specify other viewers (tensorboard and the nerfstudio viewer are supported) with the --vis option. Please refer to the nerfstudio documentation for details.

Render

If you want to render with our pre-trained models, visit here to download our checkpoints and configs. To run the render script, make sure your config matches the config.yml that you load.

You can use the following command to render. You can modify the output format and directory by specifying --output-format and --output-path:

python scripts/cicai_render.py --load-config outputs/kitti-recon-65-120/mars-kitti-car-depth-recon/2023-10-30_212654/config.yml --output-format video

or

python scripts/cicai_render.py --load-config outputs/kitti-recon-65-120/mars-kitti-car-depth-recon/2023-10-30_212654/config.yml --output-format images --output-path /path/to/your/output/directory

Citation

You can find our paper here. If you use this library or find the repo useful for your research, please consider citing:

@article{wu2023mars,
  author    = {Wu, Zirui and Liu, Tianyu and Luo, Liyi and Zhong, Zhide and Chen, Jianteng and Xiao, Hongmin and Hou, Chao and Lou, Haozhe and Chen, Yuantao and Yang, Runyi and Huang, Yuxin and Ye, Xiaoyu and Yan, Zike and Shi, Yongliang and Liao, Yiyi and Zhao, Hao},
  title     = {MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving},
  journal   = {CICAI},
  year      = {2023},
}

Acknowledgement

Part of our code is borrowed from Nerfstudio. This project is sponsored by the Tsinghua-Toyota Joint Research Fund (20223930097) and Baidu Inc. through the Apollo-AIR Joint Research Center.

Notice

This open-source version will be actively maintained and regularly updated. For more features, please contact us about a commercial version.


mars's Issues

How to adapt to roadside dataset?

Hi, thanks for your great work. I notice that you have shown results on the DAIR-V2X dataset, but you have not released the corresponding data parser and training config. Do you plan to release them? If that is not convenient, could you give me some tips on how to adapt to this kind of dataset? For example, how do I obtain the car latents for this dataset? How do I handle the camera extrinsics/intrinsics and convert them to the vKITTI format? Also, the camera poses are static and there is only a monocular camera; does this affect the depth estimation of the roadside background? Thanks a lot in advance.

The interlevel loss is increasing

Hi, I'm trying to reproduce the results on KITTI, but I find that the interlevel loss increases during training. Is this normal? Could you explain a bit about the purpose of this loss function? Can I just set its weight to zero? Thanks.

KITTI 0002 BUG

When I run the script ns-train nsg-kitti-car-depth-recon --data /*/kitti/training/image_02/0006, it works fine.

I have generated the depth completion and //mars/data/kitti/panoptic_maps/train files, but I don't have //mars/data/kitti/panoptic_maps/colors/000*.txt.
However, when I run the script ns-train nsg-kitti-car-depth-recon --data /*/kitti/training/image_02/0002, the error below occurs. I need your help, thanks.

Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 2.1515
NSGPipeline.get_train_loss_dict: 2.1508
Traceback (most recent call last):
  File "/home/*/anaconda3/envs/SUDS/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 248, in main
    launch(
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 187, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 102, in train_loop
    trainer.train()
  File "/home/*/mars/nerfstudio/nerfstudio/engine/trainer.py", line 242, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/*/mars/nerfstudio/nerfstudio/utils/profiler.py", line 91, in inner
    out = func(*args, **kwargs)
  File "/home/*/mars/nerfstudio/nerfstudio/engine/trainer.py", line 446, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/home/*/mars/nerfstudio/nerfstudio/utils/profiler.py", line 91, in inner
    out = func(*args, **kwargs)
  File "/home/*/mars/nsg/nsg_pipeline.py", line 134, in get_train_loss_dict
    model_outputs = self.model(ray_bundle)
  File "/home/*/anaconda3/envs/SUDS/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/*/mars/nerfstudio/nerfstudio/models/base_model.py", line 140, in forward
    return self.get_outputs(ray_bundle)
  File "/home/*/mars/nsg/models/scene_graph.py", line 443, in get_outputs
    result = model.inference_without_render(ray_obj)
  File "/home/*/mars/nsg/models/car_nerf.py", line 228, in inference_without_render
    self.car_latents[int(obj_id)].view(1, -1).to(obj_id.device),
KeyError: 3

bug on kitti 0002

Hi, thanks for sharing your great work.
There is only one file under the directory of car-nerf-state-dict.
mars/latents/KITTI-MOT/car-nerf-state-dict/epoch_670.ckpt
Are 0002 and 0001 used differently? If not, can you provide the required documentation?
I tested the method you provided, but there are still bugs. The details are as follows:

ns-train nsg-kitti-car-depth-recon --data /home//mars/data/kitti/training/image_02/0002
──────────────────────────────────────────────────────── Config ────────────────────────────────────────────────────────
TrainerConfig(
_target=<class 'nerfstudio.engine.trainer.Trainer'>,
output_dir=PosixPath('outputs'),
method_name='nsg-kitti-car-depth-recon',
experiment_name=None,
project_name='nerfstudio-project',
timestamp='2023-08-03_104959',
machine=MachineConfig(seed=42, num_gpus=1, num_machines=1, machine_rank=0, dist_url='auto'),
logging=LoggingConfig(
relative_log_dir=PosixPath('.'),
steps_per_log=10,
max_buffer_size=20,
local_writer=LocalWriterConfig(
_target=<class 'nerfstudio.utils.writer.LocalWriter'>,
enable=True,
stats_to_track=(
<EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>,
<EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>,
<EventName.CURR_TEST_PSNR: 'Test PSNR'>,
<EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>,
<EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>,
<EventName.ETA: 'ETA (time)'>
),
max_log_size=10
),
profiler='basic'
),
viewer=ViewerConfig(
relative_log_filename='viewer_log_filename.txt',
websocket_port=None,
websocket_port_default=7007,
websocket_host='0.0.0.0',
num_rays_per_chunk=32768,
max_num_display_images=512,
quit_on_train_completion=False,
image_format='jpeg',
jpeg_quality=90
),
pipeline=NSGPipelineConfig(
_target=<class 'nsg.nsg_pipeline.NSGPipeline'>,
datamanager=NSGkittiDataManagerConfig(
_target=<class 'nsg.data.nsg_datamanager.NSGkittiDataManager'>,
data=PosixPath('/home/
/mars/data/kitti/training/image_02/0002'),
camera_optimizer=CameraOptimizerConfig(
_target=<class 'nerfstudio.cameras.camera_optimizers.CameraOptimizer'>,
mode='off',
position_noise_std=0.0,
orientation_noise_std=0.0,
optimizer=AdamOptimizerConfig(
_target=<class 'torch.optim.adam.Adam'>,
lr=0.0006,
eps=1e-15,
max_norm=None,
weight_decay=0
),
scheduler=ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=None,
warmup_steps=0,
max_steps=10000,
ramp='cosine'
),
param_group='camera_opt'
),
dataparser=NSGkittiDataParserConfig(
_target=<class 'nsg.data.nsg_dataparser.NSGkitti'>,
data=PosixPath('data/kitti/training/image_02/0002'),
scale_factor=1,
scene_scale=1.0,
alpha_color='white',
first_frame=140,
last_frame=224,
use_object_properties=True,
object_setting=0,
obj_opaque=True,
box_scale=1.5,
novel_view='left',
use_obj=True,
render_only=False,
bckg_only=False,
near_plane=0.5,
far_plane=150.0,
dataset_type='kitti',
obj_only=False,
netchunk=65536,
chunk=32768,
max_input_objects=-1,
add_input_rows=-1,
use_car_latents=True,
car_object_latents_path=PosixPath('/home/*/mars/latents/KITTI-MOT/car-object-latents/latent_codes02.pt'),
car_nerf_state_dict_path=PosixPath('/home/*/mars/latents/KITTI-MOT/car-nerf-state-dict/epoch_670.ckpt'),
use_depth=True,
split_setting='reconstruction',
use_semantic=False,
semantic_path=PosixPath('.'),
semantic_mask_classes=[]
),
train_num_rays_per_batch=4096,
train_num_images_to_sample_from=-1,
train_num_times_to_repeat_images=-1,
eval_num_rays_per_batch=4096,
eval_num_images_to_sample_from=-1,
eval_num_times_to_repeat_images=-1,
eval_image_indices=(0,),
camera_res_scale_factor=1.0,
patch_size=1
),
model=SceneGraphModelConfig(
_target=<class 'nsg.models.scene_graph.SceneGraphModel'>,
enable_collider=True,
collider_params={'near_plane': 2.0, 'far_plane': 6.0},
loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
eval_num_rays_per_chunk=4096,
background_model=NerfactoModelConfig(
_target=<class 'nsg.models.nerfacto.NerfactoModel'>,
enable_collider=True,
collider_params={'near_plane': 2.0, 'far_plane': 6.0},
loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
eval_num_rays_per_chunk=4096,
near_plane=0.05,
far_plane=150.0,
background_color='black',
hidden_dim=64,
hidden_dim_color=64,
hidden_dim_transient=64,
num_levels=16,
max_res=2048,
log2_hashmap_size=19,
num_coarse_samples=24,
distortion_loss_mult=0.002,
orientation_loss_mult=0.0001,
pred_normal_loss_mult=0.001,
use_average_appearance_embedding=True,
predict_normals=False,
obj_feat_dim=0,
disable_scene_contraction=False,
sampler='proposal',
num_proposal_samples_per_ray=(256, 128),
num_nerf_samples_per_ray=97,
proposal_update_every=5,
proposal_warmup=5000,
num_proposal_iterations=2,
use_same_proposal_network=False,
proposal_net_args_list=[
{'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 128, 'use_linear': False},
{'hidden_dim': 16, 'log2_hashmap_size': 17, 'num_levels': 5, 'max_res': 256, 'use_linear': False}
],
use_single_jitter=True,
use_proposal_weight_anneal=True,
proposal_weights_anneal_slope=10.0,
proposal_weights_anneal_max_num_iters=1000,
use_gradient_scaling=False
),
object_model_template=CarNeRFModelConfig(
_target=<class 'nsg.models.car_nerf.CarNeRF'>,
enable_collider=True,
collider_params={'near_plane': 2.0, 'far_plane': 6.0},
loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
eval_num_rays_per_chunk=4096,
num_coarse_samples=32,
num_fine_samples=97,
background_color='black',
optimize_latents=False
),
max_num_obj=-1,
ray_add_input_rows=-1,
near_plane=0.05,
far_plane=1000.0,
background_color='black',
latent_size=256,
orientation_loss_mult=0.0001,
pred_normal_loss_mult=0.001,
predict_normals=False,
object_representation='class-wise',
object_ray_sample_strategy='remove-bg',
object_warmup_steps=1000,
depth_loss_mult=0.01,
semantic_loss_mult=1.0,
mono_depth_loss_mult=0.0,
is_euclidean_depth=False,
depth_sigma=0.05,
should_decay_sigma=False,
starting_depth_sigma=4.0,
sigma_decay_rate=0.9998,
depth_loss_type=<DepthLossType.DS_NERF: 1>,
use_interlevel_loss=True,
interlevel_loss_mult=1.0,
debug_object_pose=False,
use_sky_model=False,
sky_model=SkyModelConfig(
_target=<class 'nsg.models.sky_model.SkyModel'>,
enable_collider=True,
collider_params={'near_plane': 2.0, 'far_plane': 6.0},
loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
eval_num_rays_per_chunk=4096,
hidden_dim=128,
num_layers=5
)
)
),
optimizers={
'background_model': {
'optimizer': RAdamOptimizerConfig(
_target=<class 'torch.optim.radam.RAdam'>,
lr=0.001,
eps=1e-15,
max_norm=None,
weight_decay=0
),
'scheduler': ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=1e-05,
warmup_steps=0,
max_steps=200000,
ramp='cosine'
)
},
'learnable_global': {
'optimizer': RAdamOptimizerConfig(
_target=<class 'torch.optim.radam.RAdam'>,
lr=0.001,
eps=1e-15,
max_norm=None,
weight_decay=0
),
'scheduler': ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=1e-05,
warmup_steps=0,
max_steps=200000,
ramp='cosine'
)
},
'object_model': {
'optimizer': RAdamOptimizerConfig(
_target=<class 'torch.optim.radam.RAdam'>,
lr=0.005,
eps=1e-15,
max_norm=None,
weight_decay=0
),
'scheduler': ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=1e-05,
warmup_steps=0,
max_steps=200000,
ramp='cosine'
)
}
},
vis='wandb',
data=PosixPath('/home//mars/data/kitti/training/image_02/0002'),
relative_model_dir=PosixPath('nerfstudio_models'),
steps_per_save=2000,
steps_per_eval_batch=500,
steps_per_eval_image=500,
steps_per_eval_all_images=5000,
max_num_iterations=600000,
mixed_precision=False,
use_grad_scaler=True,
save_only_latest_checkpoint=False,
load_dir=None,
load_step=None,
load_config=None,
load_checkpoint=None,
log_gradients=True
)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Saving config to: experiment_config.py:129
outputs/0002/nsg-kitti-car-depth-recon/2023-08-03_104959/config.yml
Saving checkpoints to: trainer.py:138
outputs/0002/nsg-kitti-car-depth-recon/2023-08-03_104959/nerfstudio_models
[10:50:00] [array([3.]), array([6.]), array([7.]), array([8.]), array([14.]), array([15.]), nsg_dataparser.py:843
array([16.]), array([17.]), array([18.]), array([19.]), array([9.])] in this scene.
finished data parsing
[10:50:09] [array([3.]), array([6.]), array([7.]), array([8.]), array([14.]), array([15.]), nsg_dataparser.py:843
array([16.]), array([17.]), array([18.]), array([19.]), array([9.])] in this scene.
finished data parsing
Setting up training dataset...
Caching all 170 images.
Setting up evaluation dataset...
Caching all 42 images.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
No Nerfstudio checkpoint to load, so training from scratch.
wandb: Tracking run with wandb version 0.15.4
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
logging events to: outputs/0002/nsg-kitti-car-depth-recon/2023-08-03_104959
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 4.7364
NSGPipeline.get_train_loss_dict: 4.7356
Traceback (most recent call last):
  File "/home/*/anaconda3/envs/*/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 248, in main
    launch(
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 187, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/home/*/mars/nerfstudio/nerfstudio/scripts/train.py", line 102, in train_loop
    trainer.train()
  File "/home/*/mars/nerfstudio/nerfstudio/engine/trainer.py", line 242, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/*/mars/nerfstudio/nerfstudio/utils/profiler.py", line 91, in inner
    out = func(*args, **kwargs)
  File "/home/*/mars/nerfstudio/nerfstudio/engine/trainer.py", line 446, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/home/*/mars/nerfstudio/nerfstudio/utils/profiler.py", line 91, in inner
    out = func(*args, **kwargs)
  File "/home/*/mars/nsg/nsg_pipeline.py", line 134, in get_train_loss_dict
    model_outputs = self.model(ray_bundle)
  File "/home/*/anaconda3/envs/SUDS/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/*/mars/nerfstudio/nerfstudio/models/base_model.py", line 140, in forward
    return self.get_outputs(ray_bundle)
  File "/home/*/mars/nsg/models/scene_graph.py", line 443, in get_outputs
    result = model.inference_without_render(ray_obj)
  File "/home/*/mars/nsg/models/car_nerf.py", line 228, in inference_without_render
    self.car_latents[int(obj_id)].view(1, -1).to(obj_id.device),
KeyError: 3

How to edit a scene?

Hi, thanks for sharing your great work.
How do you perform the editing function of the scene, and do you have the corresponding code script?

Why the dataparser execute twice?

I notice that the dataparser executes twice with the same results. Why does the function execute twice? It confuses me a lot.

def _generate_dataparser_outputs(self, split="train"):

Error: latents not exist

I met the error below.

           Saving config to: outputs/0006/ablation-no-depth-kitti/2023-07-31_015043/config.yml  experiment_config.py:128
[01:50:44] Saving checkpoints to:                                                                         trainer.py:138
           outputs/0006/ablation-no-depth-kitti/2023-07-31_015043/nerfstudio_models                                     
[01:50:45] [array([0.]), array([1.]), array([2.]), array([7.]), array([8.]), array([3.]),          nsg_dataparser.py:843
           array([4.]), array([9.]), array([5.]), array([12.]), array([6.]), array([10.]),                              
           array([11.]), array([13.]), array([14.])] in this scene.                                                     
Error: latents not exist

The running script is below.

python nerfstudio/nerfstudio/scripts/train.py ablation-no-depth-kitti --data /rockywin.wang/NeRF/mars/data/kitti_tracking/training/image_02/0006

The reproduction result on vkitti2/Scene18/morning is poor

the train cmd is

CUDA_VISIBLE_DEVICES="0,1" python nerfstudio/nerfstudio/scripts/train.py  nsg-vkitti-car-depth-recon --data vkitti2/Scene18/morning --pipeline.datamanager.dataparser.car_object_latents_pa
th  pretrain/VituralKITTI2/car-object-latents/latent_codes18.pt --pipeline.datamanager.dataparser.car_nerf_state_dict_path pretrain/VituralKITTI2/car-nerf-state-dict/epoch_805.ckpt --experiment_name reduce_eval --machine.num-gpus
 2

As seen in the video, the sporty car is very blurry
https://github.com/OPEN-AIR-SUN/mars/assets/18495380/0940157d-a307-4516-a732-4be841f5a805

The wandb log is here; the PSNR seems low. (screenshot)

Could you give some suggestions about how to solve the problem?
Many thanks!

how to solve this case when running render scripts

Hi,
When I download your pretrained model and run python scripts/cicai_render.py --load-config outputs/nvs75fullseq/nsg-vkitti-car-depth-nvs/2023-06-21_135412/config.yml --output-path renders/ as in the README, I encounter this error:

FileNotFoundError: [Errno 2] No such file or directory: '/data22/luoly/dataset/demo/vkitti/Scene06/clone/extrinsic.txt'

I searched for keywords such as luoly but found nothing in my render code. What should I do to get correct results?

A question about VRAM and training time

I want to know if it's normal to only have around 3GB of VRAM usage when training on the KITTI dataset scene 0005. It's taking around 5 days on a 3090 GPU. Are there any parameters that can be configured to speed up the training? Thanks.

Config question about novel_view and split_setting

The config has two settings:
split_setting: str = "reconstruction"
novel_view: str = "left"
For split_setting, I notice that there are four choices: reconstruction, nvs-75, nvs-50, nvs-25.
For novel_view, there are four choices: left, mid, shift, right. Does changing this value render from a new view, and is split_setting connected to it?

How to set scale factor for depth?

self.depth_unit_scale_factor = 0.01 # VKITTI provide depth maps in centimeters
As mentioned in #9, depth maps for KITTI are generated with a monocular depth estimation model, which leads to an unknown scale.

  1. How should the depth_unit_scale_factor be set? Should it be set to 1.0?
  2. If a monocular depth model is used for depth estimation, should the depth loss be set as sensor depth loss or mono depth loss?
  3. During the experiment, when I switched to a different monocular depth model for estimating depth on the KITTI dataset, I observed a significant decrease in the PSNR metric. What should I pay attention to in this situation?

Cannot run on vKITTI during argument parsing

When I download the latest code and run it on the vKITTI dataset with the following command:
ns-train nsg-vkitti-car-depth-recon --data /path/to/vkitti_2.0.3/Scene01/sunset/

I followed the guide in the README, but the following error occurs:
train.py nsg-vkitti-car-depth-recon: error: argument [{pipeline.model.sky-model:sky-model-config,pipeline.model.sky-model:None}]: invalid choice: '0' (choose from 'pipeline.model.sky-model:sky-model-config', 'pipeline.model.sky-model:None')

Can you give me some tips on what I should do?
Any answer is helpful to me.

A huge tensor appeared while training on the KITTI-MOT dataset.

I ran into a problem when running ns-train on the KITTI dataset: a memory-exploding tensor appears in /mars/nsg/data/nsg_dataparser.py. This single tensor takes about 40 GB of memory. I found that it is built in this code block:

input_size = 0

obj_nodes_tensor = torch.from_numpy(obj_nodes)
# if self.config.fast_loading:
#     obj_nodes_tensor = obj_nodes_tensor.cuda()
print(obj_nodes_tensor.shape)
obj_nodes_tensor = obj_nodes_tensor[:, :, None, ...].repeat_interleave(image_width, dim=2)
print(obj_nodes_tensor.shape)
print(image_width, '*', image_height)
obj_nodes_tensor = obj_nodes_tensor[:, :, None, ...].repeat_interleave(image_height, dim=2)

obj_size = self.max_input_objects * add_input_rows
input_size += obj_size

The console log (I added some prints of the obj_nodes_tensor shape): (screenshot)

I calculated it as follows (I'm not entirely sure):
512 * 14 * 1242 * 3 * 375 * 4 (bytes per element) = 40,061,952,000 bytes (~40 GB)
So, did I do anything wrong? If not, how can I deal with this costly tensor, given that I don't have enough CPU or GPU memory to hold it? I look forward to hearing from you soon.

Source of depth maps

Hi, thank you for sharing your great work!
Can you share the depth completion script?

    dataparser_outputs = self._generate_dataparser_outputs(split)
  File "/rockywin.wang/NeRF/mars/nsg/data/nsg_dataparser.py", line 1040, in _generate_dataparser_outputs
    image_filenames, depth_name, semantic_name = get_scene_images_tracking(
  File "/rockywin.wang/NeRF/mars/nsg/data/nsg_dataparser.py", line 622, in get_scene_images_tracking
    for frame_no in range(len(os.listdir(left_depth_path))):
FileNotFoundError: [Errno 2] No such file or directory: '/rockywin.wang/NeRF/mars/data/kitti_tracking/training/completion_02/0006'

How can we get the car-object-latents and car-nerf-state-dict at the first place?

Hi MARS team, the project is a great work in the NERF simulator area!

I noticed in issue #5 that you provided the car-object-latents and car-nerf-state-dict for some scenes in KITTI-MOT and vKITTI.

My question is: how can we get the car-object-latents and car-nerf-state-dict from scratch, for example if we want to use your default CarNeRF foreground model on our own dataset?

Thanks!

How to edit a car in the scene

For example, you can edit objects in a scene by changing the pose and rotation of a car. Concretely, the last dimension of rotation represents object_id, and the last two dimensions of pose represent object_id and axis.

We now don't support directly editing the car type, but it can be implemented via manually modifying the car latents. But you can change the appearance of a car to an existing one in the scene. To edit car trajectory, you can generate your own car pose and assign it to pose and rotation across the scene.

@xBeho1der
Thanks for your detailed reply. Your algorithm models the scene and dynamic objects separately; how do I change the style of a moving car, remove a car during rendering, etc.?
May I ask how to change the appearance of a car to that of an existing one in the scene?
Does object_id represent the currently rendered dynamic object, and if I change the id, can I change the type of the dynamic object?
How should I change the following code to achieve scene editing? Can you give some comments or example code?

pose = batch_obj_dyn[..., :3]        # per-object box translations; last two dims index (object_id, axis)
rotation = batch_obj_dyn[..., 3]     # per-object headings; last dim indexes object_id
pose[:, :, 0, 2] = pose[:, :, 0, 2]          # placeholder: edit object 0's translation along axis 2 here
rotation[:, :, 0] = rotation[:, :, 0]        # placeholder: edit object 0's rotation here
batch_obj_dyn[..., :3] = pose        # write the edited poses back
batch_obj_dyn[..., 3] = rotation     # write the edited rotations back
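
A minimal illustrative tweak of the snippet above, under the indexing convention described earlier (object index 0, axis index 2); the offsets are arbitrary example values, not an official editing API:

pose = batch_obj_dyn[..., :3]
rotation = batch_obj_dyn[..., 3]
pose[:, :, 0, 2] += 2.0        # translate object 0 by +2 scene units along axis 2
rotation[:, :, 0] += 0.1       # rotate object 0 by +0.1 rad
batch_obj_dyn[..., :3] = pose
batch_obj_dyn[..., 3] = rotation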

The render process is killed.

I use the following command to render, but the process is killed.
CUDA_VISIBLE_DEVICES=0 python scripts/cicai_render.py --load-config outputs/0006/ablation-no-depth-kitti/2023-08-07_142610/config.yml --output-path renders/kitti/06/output.mp4
The error is

(mars) liaoxin@amax:~/mars$ CUDA_VISIBLE_DEVICES=0 python scripts/cicai_render.py --load-config outputs/0006/ablation-no-depth-kitti/2023-08-07_142610/config.yml --output-path renders/kitti/06/output.mp4
[18:56:42] [array([0.]), array([1.]), array([2.]), array([7.]), array([8.]), array([3.]), nsg_dataparser.py:843
array([4.]), array([9.]), array([5.]), array([12.]), array([6.]), array([10.]),
array([11.]), array([13.]), array([14.])] in this scene.
finished data parsing
[18:56:49] [array([0.]), array([1.]), array([2.]), array([7.]), array([8.]), array([3.]), nsg_dataparser.py:843
array([4.]), array([9.]), array([5.]), array([12.]), array([6.]), array([10.]),
array([11.]), array([13.]), array([14.])] in this scene.
Killed

I think it may be out of memory. I wonder how much memory is needed to render a KITTI model.
The memory situation is as follows.

liaoxin@amax:~$ free
              total        used        free      shared  buff/cache   available
Mem:      263727240   127613456   132761136       59212     3352648   134206488
Swap:       8000508     8000508           0

RuntimeError: Error writing 'renders': [NULL @ 0x5632d3584140] Unable to find a suitable output format for 'renders'

I encountered some errors in the rendering process.
🎥 Rendering 🎥 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ? --:--
Traceback (most recent call last):
File "scripts/cicai_render.py", line 210, in _render_trajectory_video
writer.add_image(render_image)
File "/home/liaoxin/.conda/envs/mars/lib/python3.8/site-packages/mediapy/init.py", line 1645, in add_image
if self._proc.stdin.write(data) != len(data):
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "scripts/cicai_render.py", line 392, in
entrypoint()
File "scripts/cicai_render.py", line 388, in entrypoint
tyro.cli(RenderTrajectory).main()
File "scripts/cicai_render.py", line 370, in main
_render_trajectory_video(
File "scripts/cicai_render.py", line 210, in _render_trajectory_video
writer.add_image(render_image)
File "/home/liaoxin/.conda/envs/mars/lib/python3.8/contextlib.py", line 525, in exit
raise exc_details[1]
File "/home/liaoxin/.conda/envs/mars/lib/python3.8/contextlib.py", line 510, in exit
if cb(*exc_details):
File "/home/liaoxin/.conda/envs/mars/lib/python3.8/site-packages/mediapy/init.py", line 1607, in exit
self.close()
File "/home/liaoxin/.conda/envs/mars/lib/python3.8/site-packages/mediapy/init.py", line 1658, in close
raise RuntimeError(f"Error writing '{self.path}': {s}")
RuntimeError: Error writing 'renders': [NULL @ 0x5632d3584140] Unable to find a suitable output format for 'renders'
renders: Invalid argument

Someone said the ffmpeg version is too old, so I updated ffmpeg to 6.0.0, but it still doesn't solve the problem.
My command is as follows.
CUDA_VISIBLE_DEVICES=0 python scripts/cicai_render.py --load-config outputs/clone/nsg-vkitti-car-depth-recon/2023-08-04_153117/config.yml --output-path renders/

Poor edit performance

I tried to edit the location of a car and found that the result is not very good. I made a small change to the x coordinate of the first car. I wonder if there are some tricks for editing.
00002

if camera_idx % 237 in range(0, 8):
    pose[:, :, 0, 0] *= 1.5

rgb_00002

Question about calculating depth loss.

Hello, your work is great.
but it seems that you multiply depth_gt by directions_norm twice. Does that make sense?
The default value of self.config.is_euclidean_depth is False at line 112 of nsg/models/scene_graph.py:

    is_euclidean_depth: bool = False
    """Whether input depth maps are Euclidean distances (or z-distances)."""

At lines 695-696 of nsg/models/scene_graph.py, you multiply depth_gt by directions_norm the first time:

            if not self.config.is_euclidean_depth:
                depth_gt = depth_gt * outputs["directions_norm"]

At lines 715-720 of nsg/models/scene_graph.py, in the function monosdf_depth_loss, you multiply depth_gt by directions_norm the second time:

            mono_depth_loss = monosdf_depth_loss(
                termination_depth=depth_gt,
                predicted_depth=predicted_depth,
                is_euclidean=self.config.is_euclidean_depth,
                directions_norm=outputs["directions_norm"],
            )

The function in nsg/model_components/loss.py:

def monosdf_depth_loss(
    termination_depth: Float[Tensor, "*batch 1"],
    predicted_depth: Float[Tensor, "*batch 1"],
    directions_norm: Float[Tensor, "*batch 1"],
    is_euclidean: bool,
):
    """MonoSDF depth loss"""
    if not is_euclidean:
        termination_depth = termination_depth * directions_norm
    sift_depth_loss = ScaleAndShiftInvariantLoss(alpha=0.5, scales=1)
    mask = torch.ones_like(termination_depth).reshape(1, 32, -1).bool()
    return sift_depth_loss(predicted_depth.reshape(1, 32, -1), (termination_depth * 50 + 0.5).reshape(1, 32, -1), mask)

Is this proper?

Why does rendered video show a black screen when playing?

With the command

python /root/autodl-fs/tiny-cuda-nn/mars/scripts/cicai_render.py --load-config /root/autodl-fs/outputs/clone/nsg-vkitti-car-depth-recon/2023-08-19_174218/config.yml --output-path /root/autodl-fs/renders/output.mp4

I get the rendered video:
https://github.com/OPEN-AIR-SUN/mars/assets/140368447/bcc18bd8-79a3-4865-8c07-ec784371eb97

Why does it only play properly on iOS devices but show a black screen when playing on other devices?

some doubts about kitti data in training step

Thank you for your work.

I am trying to reproduce the KITTI results, and I notice that you use car_object_latents named latent_codes_car_van_truck.pt. Could you share these car latents? I tried training with car-object-latents/latent_codes06.pt, but the effect is not so good: the cars are not clear and all the vehicles look the same.

Besides, another question is how to use the panoptic_maps you provided in the training process. Could you share your cicai_config for training with panoptic maps?

Many thanks.

How to get the result of depth completion?

Hi, thanks for sharing your great work.
May I ask how you obtained the depth completion results?
Can you provide a link to download the panoptic maps for the KITTI dataset? The official website is a bit difficult to access.
Thanks

Reproduce the results on KITTI datasets

Hi, thanks for sharing your great work.
I would like to reproduce the results on the KITTI dataset. Can you provide the latents and colors files for sequences 0001, 0002, and 0006?
Can you tell me how to generate the data under mars/data/kitti/panoptic_maps/colors?

How to obtain intuitive 3D results.

Dear author,
Hello. I encountered an issue when using the command ns-render to render my model results. Even though I changed the file name of the output file to .mp4 extension, the result is still a bunch of rendered images in my folder. Then, when I opened the Nerf Studio webpage using the command ns-viewer, I got an unordered green screen and a stack of overlapping photos instead of a 3D scene generation. This prevents me from confirming the 3D scene generated by my model from any perspective.

I used approximately 30 photos (lastframe - firstframe = 30), trained for about 60,000 iterations, and the PSNR converged to 24.25 (the rendered pictures look very good). I would like to know whether this issue is related to the small size of my training set or whether there may be some other problem.

Furthermore, I would like to ask if there are any other good methods to obtain visualized results of 3D scenes apart from ns-viewer. I look forward to your reply.

Questions about the results of the experiments?

Hi, thanks for sharing your great work.
I have a few questions for you.

  1. Experimental results using your proposed monocular depth estimation algorithm on the KITTI dataset are lower than the performance without depth maps. What combination of modules was used to obtain the experimental results you provided on the KITTI dataset?
  2. When I run the render script I get the following error:
    python scripts/cicai_render.py --load-config outputs/0006/ablation-no-depth-kitti/2023-08-02_111759/config.yml --output-path renders/
    Creating trajectory video
    🎥 Rendering 🎥 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ? --:--
    Traceback (most recent call last):
      File "/home/*/mars/scripts/cicai_render.py", line 161, in _render_trajectory_video
        outputs = pipeline.model.get_outputs_for_camera_ray_bundle_render(camera_ray_bundle)
      File "/home/*/anaconda3/envs/SUDS/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/*/mars/nsg/models/scene_graph.py", line 844, in get_outputs_for_camera_ray_bundle_render
        outputs = self.forward(ray_bundle=ray_bundle)
      File "/home/*/mars/nerfstudio/nerfstudio/models/base_model.py", line 140, in forward
        return self.get_outputs(ray_bundle)
      File "/home/*/mars/nsg/models/scene_graph.py", line 330, in get_outputs
        obj_pose = self.batchify_object_pose(ray_bundle).to(self.device)
      File "/home/*/mars/nsg/models/scene_graph.py", line 643, in batchify_object_pose
        batch_obj_metadata = torch.index_select(obj_meta_tensor, 0, obj_idx.reshape(-1)).reshape(
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)

Could you give some suggestions about how to solve the problems?
Many thanks!

Why are there no vehicles in the rendering results?

Hi, @xBeho1der

Great work.

Following your rendering script,

python scripts/cicai_render.py --load-config outputs/nvs75fullseq/nsg-vkitti-car-depth-nvs/2023-06-21_135412/config.yml --output-path renders/

I get the result:

output.mp4

Why are there no vehicles in the rendering results?

How to adapt to DAIR V2X dataset?

Hi, I tried to adapt MARS to the DAIR-V2X roadside dataset, but I ran into some problems. In the training phase, I set use_depth=False and use_car_latents=False to avoid overwriting the weights of the CarNeRF model. In the rendering phase, I set use_car_latents=True to render the cars. I saved the output, "objects_rgb", as below; is my process wrong?
(screenshot)

Multi-GPU training question

I want to know whether MARS supports multi-GPU training or rendering.
I notice that there is a TODO:
# TODO: To run cicai_render.py, add to(self.device) in the following line
Does this mean the renderer doesn't support multi-GPU rendering?

How to infer the shadow of cars?

Hi, thanks for sharing your great work.

I noticed that the shadow looks realistic even when you move objects. I'm quite curious about it. Did you train the shadow with cars, or use an independent shadow model?

Pretrained model shape is different from config shape

input cmd

python scripts/cicai_render.py --load-config outputs/clone/nsg-vkitti-car-depth-recon/2023-08-02_085533/config.yml --output_path renders/output-pretrain.mp4 --seconds 13
I place the pretrained vkitti2 model below in 2023-08-02_085533
(screenshot)

output

Loading poses from: vkitti2/Scene06/clone/pose.txt
Loading bbox from: vkitti2/Scene06/clone/bbox.txt
Loading info from: vkitti2/Scene06/clone/info.txt
[array([0.]), array([1.]), array([2.]), array([7.]), array([8.]), array([3.]),
array([9.]), array([4.]), array([5.]), array([12.]), array([6.]), array([10.]),
array([11.]), array([13.])] in this scene
finished data parsing
Loading poses from: vkitti2/Scene06/clone/pose.txt
Loading bbox from: vkitti2/Scene06/clone/bbox.txt
Loading info from: vkitti2/Scene06/clone/info.txt
[array([0.]), array([1.]), array([2.]), array([7.]), array([8.]), array([3.]),
array([9.]), array([4.]), array([5.]), array([12.]), array([6.]), array([10.]),
array([11.]), array([13.])] in this scene
finished data parsing
/home/rongbo.ma/anaconda3/envs/mars/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:61: FutureWarning: Importing PeakSignalNoiseRatio from torchmetrics was deprecated and will be removed in 2.0. Import PeakSignalNoiseRatio from torchmetrics.image instead.
_future_warning(
Loading latest checkpoint from load_dir
Traceback (most recent call last):
File "scripts/cicai_render.py", line 391, in
entrypoint()
File "scripts/cicai_render.py", line 387, in entrypoint
tyro.cli(RenderTrajectory).main()
File "scripts/cicai_render.py", line 311, in main
_, pipeline, _, _ = eval_setup(
File "/home/rongbo.ma/nerf/mars/nerfstudio/nerfstudio/utils/eval_utils.py", line 104, in eval_setup
checkpoint_path, step = eval_load_checkpoint(config, pipeline)
File "/home/rongbo.ma/nerf/mars/nerfstudio/nerfstudio/utils/eval_utils.py", line 61, in eval_load_checkpoint
pipeline.load_pipeline(loaded_state["pipeline"], loaded_state["step"])
File "/home/rongbo.ma/nerf/mars/nsg/nsg_pipeline.py", line 252, in load_pipeline
self.load_state_dict(state, strict=True)
File "/home/rongbo.ma/nerf/mars/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 125, in load_state_dict
self.model.load_state_dict(model_state, strict=False)
File "/home/rongbo.ma/anaconda3/envs/mars/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SceneGraphModel:
size mismatch for background_model.field.embedding_appearance.embedding.weight: copying a param with shape torch.Size([358, 32]) from checkpoint, the shape in current model is torch.Size([476, 32]).

How to generate the panoptic_maps?

Hi, thank you for sharing the great work!
I downloaded the KITTI-STEP data, but I don't know how to generate the panoptic_maps.
This is the KITTI-STEP data: (screenshot)
These are the panoptic maps: (screenshot)

About Editing the Scene

Excellent work!!!
Can you specifically describe how to

  1. perform vehicle addition and removal,
  2. change the camera angle, and
  3. find the vehicle ids in the scene

in cicai_render.py? Thank you very much.

Problem using custom dataset: camera poses and visible objects poses cannot be aligned properly

Hi all!

MARS is a great project which works well on kitti and vkitti dataset. But I would like to implement it on other datasets, like nuScenes, so multi cam images can be used in the model.

However, after I tried to line up the nuScenes data with the "poses" and "visible_objects_" variables L1044-1046, the cam poses and the object bounding boxes seemed not to match.

I concluded the mismatch from the following observation:

During training, several eval images are saved, so we can see the rough bounding box positions in the "objects_depth_xxxxx.png" image (at least in the early iterations, or maybe I am wrong...). Here are some samples of the eval images I got when training with the nuScenes dataset (scene 0106).

iter 500:

img_500_647ef572ef50e7677631
objects_depth_500_bd587bbad1524bd8e658

iter 2000:

img_2000_26137c500c4b22c6b896
objects_depth_2000_7293bd957da773b57787

iter 3000:

img_3000_8e64a025b822d2ecb28a
objects_depth_3000_1588276b10c99438f7b2

It seems to me that there is a certain translation or rotation "gap" between my object bbox poses and the correct ones.

Is there any way that we can easily debug the camera poses and the visible objects poses? Or is there any special treatment on the kitti dataset we must pay extra attention on?

Thanks a lot!

depth map of completion_02&completion_03

Hello, I have read the issue and notice the used depth is acquired by a monocular depth estimation method. Could you just give us the generated depth map of the 0006 sequence so that we can directly reproduce your result? Besides, how about the performance of using the raw lidar depth provided by KITTI?

How to edit the scene

Hi! Thank you for the great work! I have a few questions about editing the scene on kitti-0006.
I read the code as you suggested (mars/scripts/cicai_render.py), but did not find any code related to the editing functionality. Could you give me some advice on how to edit the scene, e.g., change the type of a car in the scene, edit its motion trajectory, etc.?

Thanks a lot in advance.
