pjlab-adg / 3dtrans
An open-source codebase for exploring autonomous driving pre-training
Home Page: https://bobrown.github.io/boZhang.github.io
License: Apache License 2.0
Hello!
I am trying to run the pre-training script listed in the codebase documentation.
I am getting the following error message when trying to run the script:
Script:
sh scripts/PRETRAIN/dist_train_pointcontrast.sh 2 \
    --cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml \
    --batch_size 2 \
    --epochs 30
Error:
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 1 (pid: 506209) of binary: /cluster/home/martiiv/deeplearningproject/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
I am using:
Has anyone else encountered this error?
Full message:
`+ NGPUS=2
os.environ('LOCAL_RANK')
instead.
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/cluster/home/martiiv/deeplearningproject/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=38966
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_0/1/error.json
program started
program started
2023-11-13 12:58:15,729 train_pointcontrast.py main 91 INFO Start logging
2023-11-13 12:58:15,730 train_pointcontrast.py main 93 INFO CUDA_VISIBLE_DEVICES=0,1
2023-11-13 12:58:15,730 train_pointcontrast.py main 96 INFO total_batch_size: 2
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO batch_size 1
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO epochs 15
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO workers 8
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO extra_tag default
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO ckpt None
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO pretrained_model None
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO launcher pytorch
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO tcp_port 18888
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO sync_bn False
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO fix_random_seed False
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO ckpt_save_interval 1
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO local_rank 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO max_ckpt_save_num 30
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO merge_all_iters_to_one_epoch False
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO set_cfgs None
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO max_waiting_mins 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO start_epoch 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO num_epochs_to_eval 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO save_to_file False
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.ROOT_DIR: /cluster/home/martiiv/DeepLearningProject/3DTrans
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.LOCAL_RANK: 0
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.CLASS_NAMES: ['Vehicle', 'Pedestrian', 'Cyclist']
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.USE_PRETRAIN_MODEL: False
2023-11-13 12:58:15,731 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG = edict()
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATASET: ONCEDataset
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_PATH: ../data/once
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.LABELED_RATIO: 0
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_CLOUD_RANGE: [-75.2, -75.2, -5.0, 75.2, 75.2, 3.0]
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.VOXEL_SIZE: [0.1, 0.1, 0.2]
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.UNLABELED_DATA_FOR: ['teacher', 'student']
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.INFO_PATH = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.train: ['once_infos_train_vehicle.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.val: ['once_infos_val_vehicle.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.test: ['once_infos_test.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.raw_small: ['once_infos_raw_small.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.raw_medium: ['once_infos_raw_medium.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.raw_large: ['once_infos_raw_large.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.DATA_SPLIT = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_SPLIT.train: train
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_SPLIT.test: val
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_SPLIT.raw: raw_small
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.POINT_FEATURE_ENCODING = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.encoding_type: absolute_coordinates_encoding
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.used_feature_list: ['x', 'y', 'z', 'intensity']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.src_feature_list: ['x', 'y', 'z', 'intensity']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_PROCESSOR: [{'NAME': 'mask_points_and_boxes_outside_range', 'REMOVE_OUTSIDE_BOXES': True}, {'NAME': 'shuffle_points', 'SHUFFLE_ENABLED': {'train': True, 'test': False}}, {'NAME': 'transform_points_to_voxels', 'VOXEL_SIZE': [0.1, 0.1, 0.2], 'MAX_POINTS_PER_VOXEL': 5, 'MAX_NUMBER_OF_VOXELS': {'train': 60000, 'test': 60000}}]
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.DATA_AUGMENTOR = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_AUGMENTOR.DISABLE_AUG_LIST: ['placeholder']
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'gt_sampling', 'USE_ROAD_PLANE': False, 'DB_INFO_PATH': ['once_dbinfos_train_vehicle.pkl'], 'PREPARE': {'filter_by_min_points': ['Car:5', 'Bus:5', 'Truck:5', 'Pedestrian:5', 'Cyclist:5']}, 'SAMPLE_GROUPS': ['Car:1', 'Bus:4', 'Truck:3', 'Pedestrian:2', 'Cyclist:2'], 'NUM_POINT_FEATURES': 4, 'REMOVE_EXTRA_WIDTH': [0.0, 0.0, 0.0], 'LIMIT_WHOLE_SCENE': True}, {'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x', 'y']}, {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]}, {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]
2023-11-13 12:58:15,733 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.TEACHER_AUGMENTOR = edict()
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.TEACHER_AUGMENTOR.DISABLE_AUG_LIST: ['random_world_scaling']
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.TEACHER_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]
2023-11-13 12:58:15,733 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.STUDENT_AUGMENTOR = edict()
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.STUDENT_AUGMENTOR.DISABLE_AUG_LIST: ['placeholder']
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.STUDENT_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x', 'y']}, {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]}, {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.BASE_CONFIG: cfgs/dataset_configs/once/PRETRAIN/unsupervised_once_dataset.yaml
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.USE_PAIR_PROCESSOR: True
2023-11-13 12:58:15,733 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION = edict()
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.NUM_EPOCHS: 15
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.OPTIMIZER: adam_onecycle
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR: 0.001
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.WEIGHT_DECAY: 0.01
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.MOMENTUM: 0.9
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.MOMS: [0.95, 0.85]
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.PCT_START: 0.4
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.DIV_FACTOR: 10
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.DECAY_STEP_LIST: [35, 45]
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR_DECAY: 0.1
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR_CLIP: 1e-07
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR_WARMUP: False
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.WARMUP_EPOCH: -1
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.GRAD_NORM_CLIP: 10
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.POS_THRESH: 0.1
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.NEG_THRESH: 1.4
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3 = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3.DOWNSAMPLE_FACTOR: 4
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3.POOL_RADIUS: [1.2]
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3.NSAMPLE: [16]
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4 = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4.DOWNSAMPLE_FACTOR: 8
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4.POOL_RADIUS: [2.4]
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4.NSAMPLE: [16]
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.FEATURES_SOURCE: ['bev']
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.POINT_SOURCE: raw_points
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.NUM_KEYPOINTS: 2048
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.NUM_NEGATIVE_KEYPOINTS: 1024
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.TEST = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.TEST.BATCH_SIZE_PER_GPU: 4
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.MODEL = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.MODEL.NAME: PVRCNN_PLUS_BACKBONE
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.MODEL.VFE = edict()
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.VFE.NAME: MeanVFE
2023-11-13 12:58:15,735 config.py log_config_to_file 10 INFO
cfg.MODEL.BACKBONE_3D = edict()
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.BACKBONE_3D.NAME: VoxelResBackBone8x
2023-11-13 12:58:15,735 config.py log_config_to_file 10 INFO
cfg.MODEL.MAP_TO_BEV = edict()
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.MAP_TO_BEV.NAME: HeightCompression
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.MAP_TO_BEV.NUM_BEV_FEATURES: 256
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.TAG: pointcontrast_pvrcnn_res_plus_backbone
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.EXP_GROUP_PATH: cfgs/once_models/unsupervised_model
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 1 (pid: 506209) of binary: /cluster/home/martiiv/deeplearningproject/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=38966
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_1/1/error.json
program started
program started
`
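As a note on reading the failure above: torchelastic reports a negative exit code when a worker process is killed by a signal, and -9 corresponds to SIGKILL, which on Linux clusters is most often the kernel OOM killer or the job scheduler enforcing a memory limit. A quick stdlib check:

```python
import signal

# torch.distributed.elastic reports exitcode -9 when a worker was killed by
# signal 9; mapping it back to a signal name makes the failure mode explicit.
exitcode = -9
sig = signal.Signals(-exitcode)
print(sig.name)  # SIGKILL
```

A SIGKILL cannot come from a Python exception, so the next step is usually checking host RAM usage (e.g. dataloader workers) rather than the training code itself.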
Hello, thanks for the great work!
During Waymo dataset evaluation, I encountered 0 mAP and 0 GT objects.
I searched the OpenPCDet issues; the author says that Waymo 1.0.0 is incompatible with the evaluation and that they use the Waymo 1.2.0 dataset by default, referring to open-mmlab/OpenPCDet#1102.
However, in your AD-PT paper, you mentioned that you used Waymo 1.0.0 in your experiment setting. As your codebase is based on OpenPCDet, may I ask whether you also encountered this 0 AP error during Waymo evaluation? Is Waymo 1.0.0 okay for evaluation?
Hello, thanks for your ADA codebase.
I am trying to train PV-RCNN with Bi3D, using KITTI as the source domain and a custom dataset in KITTI format (smaller than KITTI) as the target domain.
A CUDA out-of-memory error occurred during Stage 2. I use 6 RTX 2080 Ti GPUs (each with 10 GB of memory) and set BATCH_SIZE_PER_GPU to 1.
The discriminator training and active evaluation both completed successfully, but the CUDA out-of-memory error occurred after these steps.
Is there a bug in the memory management in this code, or do I need more memory to train?
Looking forward to your response!
Hello!
I am currently working on reproducing the results from the AD-PT paper.
To save some time, I am trying to include a ckpt file when running the PointContrast pre-training:
sh scripts/PRETRAIN/dist_train_pointcontrast.sh 2 \
--cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml \
--batch_size 4 \
--epochs 4 \
--ckpt once_1M_ckpt.pth
However, the script fails when trying to load the model state, giving me the following error:
Traceback (most recent call last):
  File "train_pointcontrast.py", line 212, in <module>
    main()
  File "train_pointcontrast.py", line 140, in main
    it, start_epoch = model.load_params_with_optimizer(args.ckpt, to_cpu=dist_train, optimizer=optimizer, logger=logger)
  File "../pcdet/models/detectors/detector3d_template.py", line 403, in load_params_with_optimizer
    self._load_state_dict(checkpoint['model_state'], strict=True)
KeyError: 'model_state'
Do I need to save my model state before applying the checkpoint?
Do you have a solution for this problem @BOBrown
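One possible cause, sketched below in plain Python: the provided weight file may store a bare state_dict, while load_params_with_optimizer expects a wrapper dict containing a 'model_state' key. Re-wrapping the checkpoint is a hypothetical workaround; 'model_state' comes from the traceback, but the 'epoch' and 'it' keys here are assumptions based on the call site.

```python
# Hypothetical workaround: wrap a bare state_dict in the dict layout that
# load_params_with_optimizer expects. 'model_state' is from the traceback;
# 'epoch'/'it' are assumed from the call site and may differ in the real code.
def ensure_wrapped(checkpoint):
    if isinstance(checkpoint, dict) and "model_state" in checkpoint:
        return checkpoint
    return {"model_state": checkpoint, "epoch": 0, "it": 0}

bare = {"backbone_3d.conv1.weight": [0.1, 0.2]}  # stands in for a state_dict
wrapped = ensure_wrapped(bare)
print("model_state" in wrapped)  # True
```

With a real file this would be `torch.load(...)` / `torch.save(...)` around the same re-wrapping step.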
First of all, thank you for your outstanding contribution to 3D object detection.
As can be seen from the paper, you don't directly perform dataset-level merging. I'm currently working on cross-domain multi-dataset training in 2D and would like to learn from the ideas in Uni3D. Could you tell me how to run Uni3D to train on multiple datasets (which command to run in the terminal, or which specific .py file to run), and how you feed the multiple datasets separately into the network during training?
Hi!
First of all, thank you for applying transfer learning and active learning to the detection task of point cloud data. This will be a very good approach and strategy. However, the projects you have showcased are only tested on a few publicly available benchmark datasets, yielding test results. Can you tell me how to use 3DTrans to train and test my own dataset? How can I utilize this project to further improve the detection capabilities of existing models such as CenterPoint and PV-RCNN++? Can you provide me with some practical methods, detailed steps, and suggestions that can be implemented?
Thank you!
I tried to train Voxel-RCNN using Uni-3D as instructed in readme files, but encountered the following error:
Exception|implicit_gemm]feat=torch.Size([531550, 32]),w=torch.Size([32, 3, 3, 3, 32]),pair=torch.Size([27, 784744]),act=784744,issubm=True,istrain=True
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
epochs: 0%| | 0/30 [00:08<?, ?it/s]
Traceback (most recent call last):
File "/home/lipw/3DTrans-master/tools/train_multi_db.py", line 261, in <module>
main()
File "/home/lipw/3DTrans-master/tools/train_multi_db.py", line 210, in main
train_func(
File "/home/lipw/3DTrans-master/tools/train_utils/train_multi_db_utils.py", line 174, in train_model
accumulated_iter = train_one_epoch(
File "/home/lipw/3DTrans-master/tools/train_utils/train_multi_db_utils.py", line 59, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/home/lipw/3DTrans-master/tools/../pcdet/models/__init__.py", line 63, in model_func
ret_dict, tb_dict, disp_dict = model(batch_dict, **forward_args)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lipw/3DTrans-master/tools/../pcdet/models/detectors/voxel_rcnn.py", line 61, in forward
batch_dict = cur_module(batch_dict)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lipw/3DTrans-master/tools/../pcdet/models/backbones_3d/spconv_backbone_unibn.py", line 211, in forward
t_conv2_2 = self.conv2_2(t_conv2_1)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/modules.py", line 138, in forward
input = module(input)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/modules.py", line 138, in forward
input = module(input)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 755, in forward
return self._conv_forward(self.training,
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 456, in _conv_forward
out_features = Fsp.implicit_gemm(
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/functional.py", line 224, in forward
raise e
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/functional.py", line 210, in forward
out, mask_out, mask_width = ops.implicit_gemm(
File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/ops.py", line 1513, in implicit_gemm
mask_width, tune_res_cpp = ConvGemmOps.implicit_gemm(
RuntimeError: /io/build/temp.linux-x86_64-cpython-310/spconv/build/core_cc/src/cumm/conv/main/ConvMainUnitTest/ConvMainUnitTest_matmul_split_Simt_f32f32f32_0.cu:1047
cuda execution failed with error 700 an illegal memory access was encountered
Simt_f32f32f32f32f32tnt_m32n128k16m32n32k8A1_200_C301LLL_SK error with params [531550, 32] [32, 27, 32] [784744, 32] [27, 784744] [784744, 1] [784744] [] -1
Training PV-RCNN+ using Uni3D or training vanilla Voxel R-CNN works fine though.
In Table 8 of Uni3D, generalization study is conducted by evaluating zero-shot detection accuracy on KITTI. Since two separate detection heads are trained and a dual-BN is leveraged for nuscenes and waymo during pre-training, what are the implementation details for conducting zero-shot detection on KITTI and which detection head is used?
Hi, Thank you for this repository. I really like the Quick Sequence Demo. I would like to use the same for KITTI /Waymo. Is it to be used the same way?
If not, could you give me some pointers on going about it? I would like to contribute by creating it then.
Thanks!
I would like to ask a question about the AD-PT paper. In the paper, when using only 20% of the KITTI dataset, AD-PT showed a significant performance improvement on the SECOND model. However, when I used the SECOND-IoU model with the provided ONCE 1M backbone pre-trained model, AD-PT showed a -1.7 mAP change in performance. Could you release the ImageSets split used for the 20% KITTI setting?
Nice job!
How can I use weakly-supervised pre-training on the Waymo dataset and then fine-tune on KITTI?
When I run the code using the following script, I get the error ModuleNotFoundError: No module named 'torch_scatter':
sh ./scripts/UDA/dist_train_uda.sh 4 --cfg_file ./cfgs/DA/waymo_kitti/voxel_rcnn_pre_SN_feat_3.yaml
I have installed the code following INSTALL.md. Could you tell me how to fix this error?
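A quick stdlib check to confirm which environment is missing the package (the module name follows the error above; the correct install command for torch_scatter depends on your torch/CUDA build, so this only tests importability):

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` is importable in the current environment."""
    return importlib.util.find_spec(name) is not None

print(has_module("json"))           # True (stdlib sanity check)
print(has_module("torch_scatter"))  # False when the package is missing
```

Running this inside the same interpreter that the dist_train script launches helps rule out a mismatch between the conda environment that was installed into and the one actually used.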
When I run the following command,
python /tools/train_multi_db_merge_loss.py --cfg_file ./cfgs/MDF/nusc_kitti/nusc_kitti_voxel_rcnn_feat_3_uni3d.yaml
I encounter the following problem. Can you help me solve it? Thank you very much.
Traceback (most recent call last):
  File "/home/hyh/westdigital_dataset/3DTrans/tools/train_multi_db_merge_loss.py", line 268, in <module>
    main()
  File "/home/hyh/westdigital_dataset/3DTrans/tools/train_multi_db_merge_loss.py", line 135, in main
    model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES), dataset=source_set)
  File "/home/hyh/westdigital_dataset/3DTrans/tools/../pcdet/models/__init__.py", line 16, in build_network
    model = build_detector(
  File "/home/hyh/westdigital_dataset/3DTrans/tools/../pcdet/models/detectors/__init__.py", line 73, in build_detector
    model = __all__[model_cfg.NAME](
TypeError: __init__() missing 3 required positional arguments: 'num_class_s2', 'dataset_s2', and 'source_one_name'
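For what it's worth, the failure mode can be reproduced in isolation: the detector class selected by this MDF config takes extra constructor arguments (names taken from the error message), while the build path invoked here passes only the single-dataset ones. A minimal, illustrative sketch, not the actual pcdet class:

```python
# Minimal reproduction of the failure mode: a multi-dataset detector whose
# constructor requires second-dataset arguments (argument names mirror the
# error message) that a single-dataset build path never supplies.
class MultiDbDetector:
    def __init__(self, model_cfg, num_class, dataset,
                 num_class_s2, dataset_s2, source_one_name):
        self.source_one_name = source_one_name

try:
    MultiDbDetector(model_cfg={}, num_class=3, dataset=None)  # missing 3 args
except TypeError as err:
    print(type(err).__name__)  # TypeError
```

This suggests the config expects the two-dataset build path (i.e. the second dataset and its class names must also be passed in), though the intended entry point is a question for the authors.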
Hi, thanks for your work with this repo.
For PV-RCNN when we only use xyz features, I'm aware that the xyz_features become None. Some solutions I've seen remove the 'raw_points' from FEATURES_SOURCE below so that it can work with xyz data only.
You wrote that cover feat uses the z points as the 4th feature, essentially making the point cloud [x,y,z,z]. What's the idea behind this, and do you know if it works better than excluding 'raw_points'?
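The described padding can be sketched as follows (plain Python; the function name is made up for illustration, and in the codebase this would happen in the point feature encoding step):

```python
# Sketch of the described trick: duplicate z as a 4th channel so a model that
# expects 4 point features (e.g. x, y, z, intensity) can consume xyz-only data.
# The function name is illustrative, not from the codebase.
def pad_with_z(points):
    """points: list of [x, y, z] -> list of [x, y, z, z]."""
    return [[x, y, z, z] for x, y, z in points]

print(pad_with_z([[1.0, 2.0, 0.5]]))  # [[1.0, 2.0, 0.5, 0.5]]
```

Compared with removing 'raw_points' from FEATURES_SOURCE, this keeps the input channel count unchanged, at the cost of feeding the network a redundant copy of z.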
Q1 How to perform the model training and inference using our own Dataset (for DA Waymo->our Dataset)?
Q2 How to write a new dataloader to load our private Dataset?
I'm currently working on pretraining the AD-PT model using the NuScenes dataset, but I've hit a few roadblocks and could really use some help. Here's where I'm at:
Following the guide, I have acquired the required files.
(However, due to an error, these files had to be moved from 3DTrans/data/nuscenes/v1.0-trainval/ to 3DTrans/data/nuscenes/.)
When running the script:
sh scripts/PRETRAIN/dist_train_pointcontrast.sh 2 --cfg_file cfgs/nuscenes_models/cbgs_dyn_pp_centerpoint.yaml --batch_size 4 --epochs 30
I received this error:
  File "../pcdet/datasets/nuscenes/nuscenes_semi_dataset.py", line 114, in split_nuscenes_semi_data
    raw_split = data_splits['raw']
KeyError: 'raw'
As far as I can see, the data_splits are: {'train': 'train', 'test': 'test'}
After figuring this out, I modified the code (file: nuscenes_semi_dataset.py) so that this block only runs if data_splits contains 'raw':
raw_split = data_splits.get('raw')
if raw_split:
    for info_path in info_paths[raw_split]:
        if oss_path is None:
            info_path = root_path / info_path
            with open(info_path, 'rb') as f:
                infos = pickle.load(f)
            nuscenes_unlabeled_infos.extend(copy.deepcopy(infos))
        else:
            info_path = os.path.join(oss_path, info_path)
            pkl_bytes = client.get(info_path, update_cache=True)
            infos = pickle.load(io.BytesIO(pkl_bytes))
            nuscenes_unlabeled_infos.extend(copy.deepcopy(infos))
Doing this removed the error. However, I then received this error:
Traceback (most recent call last):
  File "train_pointcontrast.py", line 206, in <module>
    main()
  File "train_pointcontrast.py", line 112, in main
    datasets, dataloaders, samplers = build_unsupervised_dataloader(
  File "../pcdet/datasets/__init__.py", line 301, in build_unsupervised_dataloader
    unlabeled_dataset = _semi_dataset_dict[dataset_cfg.DATASET]['UNLABELED_PAIR'](
KeyError: 'UNLABELED_PAIR'
Looking into this error, I saw that this key is not present in the NuScenes entry; _semi_dataset_dict looks like this:
_semi_dataset_dict = {
    'ONCEDataset': {
        'PARTITION_FUNC': split_once_semi_data,
        'PRETRAIN': ONCEPretrainDataset,
        'LABELED': ONCELabeledDataset,
        'UNLABELED': ONCEUnlabeledDataset,
        'UNLABELED_PAIR': ONCEUnlabeledPairDataset,
        'TEST': ONCETestDataset
    },
    'NuScenesDataset': {
        'PARTITION_FUNC': split_nuscenes_semi_data,
        'PRETRAIN': NuScenesPretrainDataset,
        'LABELED': NuScenesLabeledDataset,
        'UNLABELED': NuScenesUnlabeledDataset,
        'TEST': NuScenesTestDataset
    },
    'KittiDataset': {
        'PARTITION_FUNC': split_kitti_semi_data,
        'PRETRAIN': KittiPretrainDataset,
        'LABELED': KittiLabeledDataset,
        'UNLABELED': KittiUnlabeledDataset,
        'TEST': KittiTestDataset
    }
}
I then added a condition in pcdet/datasets/__init__.py so that this code only runs if 'UNLABELED_PAIR' exists for the dataset:
if 'UNLABELED_PAIR' in _semi_dataset_dict[dataset_cfg.DATASET]:
    unlabeled_dataset = _semi_dataset_dict[dataset_cfg.DATASET]['UNLABELED_PAIR'](
        dataset_cfg=dataset_cfg,
        class_names=class_names,
        infos=unlabeled_infos,
        root_path=root_path,
        logger=logger,
    )
Then this happened:
Traceback (most recent call last):
File "train_pointcontrast.py", line 206, in <module>
2023-11-10 13:42:20,443 nuscenes_semi_dataset.py split_nuscenes_semi_data 130 INFO Total samples for nuscenes testing dataset: 0
2023-11-10 13:42:20,443 nuscenes_semi_dataset.py split_nuscenes_semi_data 131 INFO Total samples for nuscenes labeled dataset: 0
2023-11-10 13:42:20,443 nuscenes_semi_dataset.py split_nuscenes_semi_data 132 INFO Total samples for nuscenes unlabeled dataset: 0
Traceback (most recent call last):
File "train_pointcontrast.py", line 206, in <module>
main()
File "train_pointcontrast.py", line 112, in main
main()
File "train_pointcontrast.py", line 112, in main
datasets, dataloaders, samplers = build_unsupervised_dataloader(
File "../pcdet/datasets/__init__.py", line 312, in build_unsupervised_dataloader
datasets, dataloaders, samplers = build_unsupervised_dataloader(
File "../pcdet/datasets/__init__.py", line 312, in build_unsupervised_dataloader
unlabeled_sampler = torch.utils.data.distributed.DistributedSampler(unlabeled_dataset)
UnboundLocalError: local variable 'unlabeled_dataset' referenced before assignment
unlabeled_sampler = torch.utils.data.distributed.DistributedSampler(unlabeled_dataset)
UnboundLocalError: local variable 'unlabeled_dataset' referenced before assignment
Any ideas on how to tackle these errors?
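On the UnboundLocalError specifically: guarding only the construction leaves `unlabeled_dataset` unbound when the sampler is built a few lines later, so the sampler (and anything else using the variable) needs the same guard. A minimal simulated sketch of the pattern, not the actual pcdet code:

```python
# Simulated version of the guarded build: the dataset and its sampler are
# skipped together when the 'UNLABELED_PAIR' entry is absent, which avoids the
# UnboundLocalError seen above. Names mirror the snippet; logic is illustrative.
def build_unlabeled_parts(dataset_entry, make_sampler):
    unlabeled_dataset = None
    factory = dataset_entry.get('UNLABELED_PAIR')
    if factory is not None:
        unlabeled_dataset = factory()
    sampler = make_sampler(unlabeled_dataset) if unlabeled_dataset is not None else None
    return unlabeled_dataset, sampler

make_sampler = lambda ds: ('sampler', ds)
print(build_unlabeled_parts({}, make_sampler))  # (None, None)
print(build_unlabeled_parts({'UNLABELED_PAIR': lambda: 'ds'}, make_sampler))
```

Note that the "Total samples ... 0" lines in the log hint at a separate issue: the nuScenes splits produced no data at all, so even with the guard the dataloader would be empty.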
Excuse me, how should I install gcc 5.4.0? I didn't find an installation command.
Should it be installed system-wide or in the conda environment?
Thank you!
We are very interested in your work on ReSimAD, but we have some questions we would like to consult you about. We downloaded the KITTI-like dataset, but using KITTI's projection method we found that, according to the label files you provided, the 3D boxes do not enclose the target objects well. It looks like this picture:
We cannot figure out what the problem is in this process. Hoping for your reply.
Excuse me, when I run the command line in Bi3D Adaptation stage 1 (active source domain data), something occurred.
The command line is as below:
bash scripts/ADA/dist_train_active_source.sh 2 --cfg_file ./cfgs/ADA/nuscenes-kitti/voxelrcnn/active_source.yaml --pretrained_model ***3DTrans/tools/cfgs/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi/default/ckpt/checkpoint_epoch_30.pth
When I train from the beginning, everything goes well. However, when I resume the training, an error occurs:
Traceback (most recent call last):
  File "train_active_source.py", line 272, in <module>
    main()
  File "train_active_source.py", line 193, in main
    lr_scheduler_discriminator, lr_warmup_scheduler_discriminator = build_scheduler(
  File "/home/hyh/Projects/3DTrans/tools/train_utils/optimization/__init__.py", line 55, in build_scheduler
    lr_scheduler = lr_sched.LambdaLR(optimizer, lr_lbmd, last_epoch=last_epoch)
  File "/home/hyh/anaconda3/envs/3Dtrans/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 203, in __init__
    super(LambdaLR, self).__init__(optimizer, last_epoch, verbose)
  File "/home/hyh/anaconda3/envs/3Dtrans/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 39, in __init__
    raise KeyError("param 'initial_lr' is not specified "
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
Does anyone else know how to solve it?
Thank you!
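The KeyError comes from PyTorch's LR scheduler: when it is constructed with `last_epoch >= 0` (the resume path), every optimizer param group must already contain an `initial_lr` key, and an optimizer rebuilt from scratch on resume does not have it. A hedged sketch of the usual workaround, with plain dicts standing in for `optimizer.param_groups` (the helper name is ours):

```python
def prepare_param_groups_for_resume(param_groups, last_epoch):
    # torch.optim.lr_scheduler raises the KeyError shown above when
    # last_epoch >= 0 and a param group lacks 'initial_lr'; seed it from 'lr'.
    if last_epoch >= 0:
        for group in param_groups:
            group.setdefault("initial_lr", group["lr"])
    return param_groups
```

In `build_scheduler` this would be applied to the discriminator optimizer's `param_groups` before `LambdaLR` is constructed; alternatively, loading the optimizer state dict from the checkpoint before building the scheduler also restores the key.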
Meaningful work focusing on multi-dataset training and pre-training!
I am quite interested in the statistical alignment module mentioned in your paper Uni3D, which gives a big improvement in model performance. Where can I find the code for it?
Hello, thanks for your great work!
Have you considered training only one detection head for the different datasets, since detection in these datasets is basically based on three classes: car, pedestrian, and cyclist? Although there are some differences between objects of the same category across datasets, I am curious whether sharing the detection head between datasets would yield roughly the same or even better performance.
Hello, I met a problem when running your program. I am using the KITTI and nuScenes-mini datasets, but when I run the source-only command `sh scripts/dist_train.sh 8 --cfg_file ./cfgs/DA/nusc_kitti/source_only/pvrcnn_old_anchor_sn_kitti.yaml`,
my 8 NVIDIA RTX A5000 GPUs reach 100% utilization while less than half of the memory is used; there are no error messages and nothing happens. I set BATCH_SIZE_PER_GPU=1, but it still does not work.
Hello @BOBrown !
I am trying to create the required Waymo PKL files for fine tuning point contrast on the Waymo dataset.
I have followed the instructions to download the Waymo-open-dataset and I am trying to run the following command:
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo/OD/waymo_dataset.yaml
However, when running this, the program prints
---------------The waymo sample interval is 1, total sequecnes is 14-----------------
and then TensorFlow runs but it runs out of RAM.
I have tried running the script with the following configuration:
Two NVIDIA A100 GPUs with 80 GB of memory each
TensorFlow 2.4.0
CUDA 11.1
cuDNN 8.0.4.3
Torch 1.8.1+cu111
waymo-open-dataset-tf-2-4-0
The failure occurs in the waymo_utils file, around the line that prints:
---------------Start to generate data infos---------------
0%| | 0/14 [00:00<?, ?it/s]
100%|██████████| 14/14 [00:00<00:00, 1197.61it/s]
2023-11-29 13:24:38,993 waymo_dataset.py include_waymo_data 105 INFO Total skipped info 14
2023-11-29 13:24:38,993 waymo_dataset.py include_waymo_data 106 INFO Total samples for Waymo dataset: 0
2023-11-29 13:24:39.125843: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
---------------The waymo sample interval is 1, total sequecnes is 14-----------------
After this it starts using the GPUs and I run out of RAM.
What could be the reason for this?
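If the out-of-memory is actually on the GPU side, one hedged workaround is to hide the GPUs from TensorFlow before it initializes, since the info-generation step itself does not need them:

```python
import os

# Must be set before tensorflow is imported anywhere in the process;
# an empty string makes TF see no CUDA devices and stay on the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

Equivalently, prefix the command with `CUDA_VISIBLE_DEVICES="" python -m pcdet.datasets.waymo.waymo_dataset ...` in the shell. If host RAM is the bottleneck instead, processing fewer sequences per run is the usual lever.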
Hello!
I wonder, if I want to reproduce the experimental results in Table 7 of Uni3D, how can I adjust the number of samples in the nuScenes dataset, and where is the script for training on a single dataset only?
Hi,
GETTING_STARTED_ADA says "Active Domain Adaptation (ADA) task is to pick up a subset..":
1. To perform the manual annotation process, how do I export the subset?
2. After the manual annotation process, how do I put the labeled data back into the pipeline?
Thank you
Hello, I want to debug the code to better understand the paper Uni3D. However, I cannot find the key code for the Coordinate-origin Alignment mentioned in Uni3D. I found that the batch_info of the two datasets is directly concatenated together as batch1 and batch2, without Coordinate-origin Alignment (maybe I missed the code). Could you please tell me where the key code for Coordinate-origin Alignment is? Thanks a lot!
Hello!
I am trying to pretrain using AD-PT and I have encountered a problem.
I am aware that the ONCE dataset is configured using this command:
python -m pcdet.datasets.once.once_dataset --func create_once_infos --cfg_file tools/cfgs/dataset_configs/once/OD/once_dataset.yaml
I realized later that you can't pre-train using the produced files, so I needed to merge their labels using merge_labels.py, located in the tools_utils folder.
When converting the files, I get a KeyError: 'Vehicle' when trying to run the pre-training script.
You need to merge the labels in the once_infos_train.pkl, once_dbinfos_train.pkl and once_dbinfos_val.pkl files to produce the once_infos_train_vehicle.pkl files. However, when trying to merge the labels I end up with a file identical to the input.
I am using the following command
python -m tools.tools_utils.merge_labels --raw_data_pkl once_dbinfos_train.pkl --save_path once_dbinfos_vehicle.pkl
I think I need to provide something via the --vehicle_pkl argument, since it reads from a vehicle.pkl file, but I don't know what kind of file to provide or where to get it!
Has anyone encountered this problem before?
Dear author,
Thank you for your great work. I am trying to reproduce your results (MDF) with the Waymo and nuScenes datasets, but when merging two batches from Waymo and nuScenes, an error occurred:
    batch = common_utils.merge_two_batch_dict(batch_1, batch_2)
  File "../pcdet/utils/common_utils.py", line 670, in merge_two_batch_dict
    batch_merge_dict[key] = np.concatenate(tar_list_merge, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 2, the array at index 0 has size 10 and the array at index 1 has size 8
It seems the gt boxes from nuScenes are mismatched with those from Waymo.
Could you tell me how you avoid this issue?
B.R
Thank you
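The size-10 vs. size-8 mismatch above is consistent with the two datasets using different gt_boxes encodings (e.g. one including extra fields such as velocity). A hedged sketch of zero-padding the narrower encoding before concatenation, which is one way to make the merge succeed (not the repo's actual merge_two_batch_dict logic):

```python
import numpy as np

def pad_and_concat(a, b):
    # Pad the smaller last dimension with zeros so batches from datasets
    # with different box encodings (e.g. with/without velocity) can be merged.
    d = max(a.shape[-1], b.shape[-1])
    def pad(x):
        if x.shape[-1] == d:
            return x
        width = [(0, 0)] * (x.ndim - 1) + [(0, d - x.shape[-1])]
        return np.pad(x, width)  # constant zero padding
    return np.concatenate([pad(a), pad(b)], axis=0)
```

Whether zero is a safe filler depends on what the extra fields mean downstream; dropping the extra fields instead (slicing both to the common width) is the other common choice.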
Thanks for your great work. I found some issues with the evaluation of the source-pretrained model in GETTING_STARTED_ADA.md. I noticed that some of the yaml files don't include DATA_CONFIG_TAR. For example, I was testing with the following command:
bash scripts/dist_test.sh 4 --cfg_file cfgs/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi.yaml --ckpt ../output/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi/default/ckpt/checkpoint_epoch_30.pth
However, since there is no DATA_CONFIG_TAR in cfgs/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi.yaml, this command tests performance on the source dataset instead of the target dataset, which is not my intention.
Why is SHIFT_COOR set to [0.0, 0.0, 1.6] for KITTI (and [0.0, 0.0, 1.8] for nuScenes), and what is the rationale behind it? How should I set it for my own dataset?
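SHIFT_COOR is typically a per-dataset translation added to every point (and box center) so that clouds recorded with different sensor mounting heights share a roughly common origin and ground-plane height; the z-value is on the order of the LiDAR's height above ground, which is why KITTI and nuScenes use different constants. A minimal sketch of how such a shift is usually applied (an assumption about, not a copy of, this repo's implementation):

```python
import numpy as np

def apply_shift_coor(points, shift_coor):
    # points: (N, 3+) array of [x, y, z, ...]; the shift is added to xyz only,
    # moving the point-cloud origin (e.g. so the ground sits near z = 0).
    points = points.copy()
    points[:, :3] += np.asarray(shift_coor)
    return points
```

For a custom dataset, a reasonable starting point is a z-shift roughly equal to your LiDAR's height above the ground, then verify by checking where ground points land along z after the shift.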
Our team aims to broaden the boundaries of Autonomous Driving (AD) perception models, trying to find unified representations that can be generalized across different AD domains and scenarios. If you are interested in unified representation learning for AD perception models, please do not hesitate to contact us.
Hello. I'm running Bi3D Adaptation stage 1. After training the first epoch, the algorithm begins the Active Evaluate step. However, after the evaluation finishes, the GPU memory usage is abnormal: my hardware is an RTX 4090 with 24 GB, yet when the evaluation is done the program reports CUDA error: out of memory. Have you ever met this problem, and how did you solve it?
What's more,
Thanks for sharing your work! I'm curious about the effect of the lidar intensity input.
In most of your settings, only [x, y, z] is taken as input to the model, but for AD-PT the intensity is also used. I wonder if there is a reason behind that. And since intensity values also depend on the sensor type, have you run any experiments investigating this effect? Looking forward to your reply, and thanks in advance.
Can you provide the code of Point-to-Beam Playback Re-sampling used in AD-PT?