yifanlu0227 / coalign
[ICRA2023] CoAlign: Robust Collaborative 3D Object Detection in Presence of Pose Errors
License: Other
(cobevflow) aitest7@833f376856e4:~/wynne/CoBEVFlow$ python opencood/tools/train.py --hypes_yaml opencood/hypes_yaml/dair-v2x/npj/dair_disconet.yaml
ASync dataset with 5 time delay initialized! 4650 samples totally!
ASync dataset with 5 time delay initialized! 1717 samples totally!
/public/home/aitest7/anaconda3/envs/cobevflow/lib/python3.7/site-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
=== Time consumed: 0.0 minutes. ===
device: cuda
full path is: /public/home/aitest7/wynne/CoBEVFlow/logs/dair_npj_disconet_w_2023_11_07_17_20_37
=== Time consumed: 0.1 minutes. ===
Traceback (most recent call last):
File "opencood/tools/train.py", line 327, in <module>
main()
File "opencood/tools/train.py", line 185, in main
teacher_model.load_state_dict(torch.load(teacher_checkpoint_path), strict=False)
File "/public/home/aitest7/anaconda3/envs/cobevflow/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PointPillarDiscoNetTeacher:
size mismatch for cls_head.weight: copying a param with shape torch.Size([2, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 384, 1, 1]).
size mismatch for reg_head.weight: copying a param with shape torch.Size([14, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([14, 384, 1, 1]).
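The two messages above say that the checkpoint's heads expect 256 input channels while the freshly built PointPillarDiscoNetTeacher expects 384. Note that strict=False only tolerates missing or unexpected keys; a shape mismatch on a shared key still raises. Below is a minimal sketch for listing such mismatches before loading. It assumes the shapes have already been collected into plain dicts, for example with {k: tuple(v.shape) for k, v in state_dict.items()}. The function name is hypothetical and the bias entry is illustrative only.

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (ckpt_shape, model_shape)} for shared keys whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Weight shapes are copied from the error above; the bias entry is illustrative.
ckpt = {
    "cls_head.weight": (2, 256, 1, 1),
    "reg_head.weight": (14, 256, 1, 1),
    "cls_head.bias": (2,),
}
model = {
    "cls_head.weight": (2, 384, 1, 1),
    "reg_head.weight": (14, 384, 1, 1),
    "cls_head.bias": (2,),
}

for name, (ckpt_shape, model_shape) in find_shape_mismatches(ckpt, model).items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

If only the head input width differs, as here, the teacher config and the checkpoint were probably built with different backbone settings, though that is a guess from the shapes alone.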
Hi, thank you for your work. I am trying to evaluate CoAlign on DAIR-V2X with PointPillars as the backbone, but the only example provided is SECOND on OPV2V. I tried changing the configs in *.yaml, but something went wrong in the data loader. Could you provide an example of evaluating PointPillars on DAIR-V2X, or do you know how to fix this? Thanks in advance.
python opencood/tools/pose_graph_pre_calc.py -y opencood/hypes_yaml/dairv2x/lidar_only_with_noise/coalign/precalc.yaml
Noise Added: 0/0/0/0.
Dataset Building
Traceback (most recent call last):
File "opencood/tools/pose_graph_pre_calc.py", line 187, in <module>
main()
File "opencood/tools/pose_graph_pre_calc.py", line 145, in main
for i, batch_data in enumerate(eval(f"{split}_loader")):
File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 53, in fetch
return self.collate_fn(data)
File "/data/mengh3/CoAlign/opencood/data_utils/datasets/late_fusion_dataset.py", line 415, in collate_batch_test
transformation_matrix = cav_content['transformation_matrix']
KeyError: 'transformation_matrix'
I tried the command python setup.py install
and got a CMake error:
CMake Error at /data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCompilerId.cmake:751 (message):
Compiling the CUDA compiler identification source file
"CMakeCUDACompilerId.cu" failed.
Compiler: /usr/bin/nvcc
Build flags:
Id flags: --keep;--keep-dir;tmp -v
The output was:
255
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/usr/lib/nvidia-cuda-toolkit/bin
#$ _THERE_=/usr/lib/nvidia-cuda-toolkit/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ NVVMIR_LIBRARY_DIR=/usr/lib/nvidia-cuda-toolkit/libdevice
#$
PATH=/usr/lib/nvidia-cuda-toolkit/bin:/data/public/CUDA11/cuda-11.2/bin:/data/mengh3/myconda/envs/coalign/bin:/data/mengh3/myconda/envs/coalign/bin:/data/mengh3/myconda/envs/coalign/bin:/data/public/CUDA11/cuda-11.0/bin:/usr/local/cuda/bin:/data/public/CUDA11/cuda-11.0/bin:/data/public/CUDA11/cuda-11.0/bin:/usr/local/cuda/bin:/data/public/CUDA11/cuda-11.0/bin:/usr/local/cuda/bin:/home/mengh3/bin:/home/mengh3/.local/bin:/usr/local/cuda/bin:/data/mengh3/myconda/envs/coalign/bin:/opt/anaconda3/condabin:/opt/anaconda3/bin:/usr/local/cuda-9.0/bin:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
#$ LIBRARIES= -L/usr/lib/x86_64-linux-gnu/stubs
#$ rm tmp/a_dlink.reg.c
#$ gcc -std=c++14 -D__CUDA_ARCH__=300 -E -x c++
-DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__
-D"__CUDACC_VER_BUILD__=85" -D"__CUDACC_VER_MINOR__=1"
-D"__CUDACC_VER_MAJOR__=9" -include "cuda_runtime.h" -m64
"CMakeCUDACompilerId.cu" > "tmp/CMakeCUDACompilerId.cpp1.ii"
#$ cicc --c++14 --gnu_version=70500 --allow_managed -arch compute_30 -m64
-ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name
"CMakeCUDACompilerId.fatbin.c" -tused -nvvmir-library
"/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc"
--gen_module_id_file --module_id_file_name
"tmp/CMakeCUDACompilerId.module_id" --orig_src_file_name
"CMakeCUDACompilerId.cu" --gen_c_file_name
"tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name
"tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name
"tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o
"tmp/CMakeCUDACompilerId.ptx"
#$ ptxas -arch=sm_30 -m64 "tmp/CMakeCUDACompilerId.ptx" -o
"tmp/CMakeCUDACompilerId.sm_30.cubin"
ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'
# --error 0xff --
Call Stack (most recent call first):
/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCUDACompiler.cmake:309 (CMAKE_DETERMINE_COMPILER_ID)
CMakeLists.txt:14 (enable_language)
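One possible reading of this log, which the thread does not confirm: /usr/bin/nvcc identifies itself as CUDA 9.1 (__CUDACC_VER_MAJOR__=9, __CUDACC_VER_MINOR__=1) and targets sm_30, while the ptxas that rejects sm_30 may come from one of the CUDA 11 directories further along PATH. The helper below is a hypothetical diagnostic, not part of CoAlign. It lists the CUDA-looking PATH entries in lookup order so that a mix of toolkits is easy to spot.

```python
def cuda_dirs_in_path(path_value, sep=":"):
    """Return PATH entries that mention 'cuda', in lookup order, without repeats."""
    seen = set()
    ordered = []
    for entry in path_value.split(sep):
        if "cuda" in entry.lower() and entry not in seen:
            seen.add(entry)
            ordered.append(entry)
    return ordered


# Shortened copy of the PATH printed in the log above.
log_path = (
    "/usr/lib/nvidia-cuda-toolkit/bin:/data/public/CUDA11/cuda-11.2/bin:"
    "/data/mengh3/myconda/envs/coalign/bin:/data/public/CUDA11/cuda-11.0/bin:"
    "/usr/local/cuda/bin:/data/public/CUDA11/cuda-11.0/bin:"
    "/usr/local/cuda-9.0/bin:/usr/bin"
)

for position, entry in enumerate(cuda_dirs_in_path(log_path)):
    print(position, entry)
```

If several toolkits show up, putting a single toolkit's bin directory first and pointing CMake at its nvcc (CMake reads the CUDACXX environment variable) may be worth trying.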
Hi~
I've recently tested the intermediate fusion method (V2X-ViT) and the late fusion method, both based on Lift-Splat-Shoot. However, V2X-ViT got poor results, 0.47/0.37/0.22 at AP30/50/70, while the late fusion baseline got 0.83/0.71/0.43 at AP30/50/70. The no-fusion baseline got 0.48/0.37/0.18 at AP30/50/70, which is similar to V2X-ViT.
name: opv2v_lss_single_efficientnet
root_dir: "/data2/wsh/data/OPV2V/train"
validate_dir: "/data2/wsh/data/OPV2V/validate"
test_dir: "/data2/wsh/data/OPV2V/test"
yaml_parser: "load_lift_splat_shoot_params"
train_params:
batch_size: &batch_size 2
epoches: 50
eval_freq: 2
save_freq: 2
max_cav: 7
input_source: ['camera']
label_type: 'camera'
comm_range: 70
only_vis_ego: true
add_data_extension: ['bev_visibility.png']
fusion:
core_method: 'late'
dataset: 'opv2v'
args:
proj_first: false # useless
grid_conf: &grid_conf
xbound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
ybound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
zbound: [-10, 10, 20.0] # Does not need to be consistent with preprocess.
ddiscr: [2, 50, 48]
mode: 'LID' # or 'UD'
data_aug_conf: &data_aug_conf
resize_lim: [0.8, 0.85]
final_dim: [480, 640]
rot_lim: [-3.6, 3.6]
H: 600
W: 800
rand_flip: False
bot_pct_lim: [0.0, 0.05]
cams: ['camera0', 'camera1', 'camera2', 'camera3']
Ncams: 4
preprocess:
core_method: 'SpVoxelPreprocessor'
args:
voxel_size: &voxel_size [0.4, 0.4, 4]
max_points_per_voxel: 32
max_voxel_train: 32000
max_voxel_test: 70000
cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]
data_augment: # useless
NAME: random_world_flip
ALONG_AXIS_LIST: [ 'x' ]
NAME: random_world_rotation
WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]
NAME: random_world_scaling
WORLD_SCALE_RANGE: [ 0.95, 1.05 ]
postprocess:
core_method: 'VoxelPostprocessor' # That's ok
gt_range: *cav_lidar
anchor_args:
cav_lidar_range: *cav_lidar
l: 3.9
w: 1.6
h: 1.56
feature_stride: 2
r: &anchor_yaw [0, 90]
num: &achor_num 2
target_args:
pos_threshold: 0.6
neg_threshold: 0.45
score_threshold: 0.25
order: 'hwl' # hwl or lwh
max_num: 100 # maximum number of objects in a single frame. use this number to make sure different frames has the same dimension in the same batch
nms_thresh: 0.15
dir_args: &dir_args
dir_offset: 0.7853
num_bins: 2
anchor_yaw: *anchor_yaw
model:
core_method: lift_splat_shoot
args:
anchor_number: *achor_num
grid_conf: *grid_conf
data_aug_conf: *data_aug_conf
dir_args: *dir_args
img_downsample: 8
img_features: 128
use_depth_gt: false
depth_supervision: false
bevout_feature: 128
shrink_header:
kernal_size: [ 3 ]
stride: [ 2 ]
padding: [ 1 ]
dim: [ 128 ]
input_dim: 128
camera_encoder: EfficientNet
loss:
core_method: point_pillar_loss
args:
pos_cls_weight: 2.0
cls:
type: 'SigmoidFocalLoss'
alpha: 0.25
gamma: 2.0
weight: 1.0
reg:
type: 'WeightedSmoothL1Loss'
sigma: 3.0
codewise: true
weight: 2.0
dir:
type: 'WeightedSoftmaxClassificationLoss'
weight: 0.2
args: *dir_args
optimizer:
core_method: Adam
lr: 0.0015
args:
eps: 1e-10
weight_decay: 1e-4
lr_scheduler:
core_method: multistep #step, multistep and Exponential support
gamma: 0.1
step_size: [25, 40]
name: opv2v_lss_v2xvit
root_dir: "/data1/wsh/data/OPV2V/train"
validate_dir: "/data1/wsh/data/OPV2V/validate"
test_dir: "/data1/wsh/data/OPV2V/test"
yaml_parser: "load_lift_splat_shoot_params" # we need specific loading functions for different backbones.
train_params: # the common training parameters
batch_size: &batch_size 2
epoches: 50
eval_freq: 2
save_freq: 2
max_cav: 7
input_source: ['camera'] # 'lidar', 'camera', 'depth'
label_type: 'camera' # 'lidar' or 'camera'
comm_range: 70
only_vis_ego: true
add_data_extension: ['bev_visibility.png']
fusion:
core_method: 'intermediate'
dataset: 'opv2v'
args:
proj_first: false # useless
grid_conf: &grid_conf
xbound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
ybound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
zbound: [-10, 10, 20.0] # Does not need to be consistent with preprocess.
ddiscr: [2, 50, 48] # depth_min, depth_max, num_bins, make grid in image plane
mode: 'LID'
data_aug_conf: &data_aug_conf
resize_lim: [0.8, 0.85]
final_dim: [480, 640]
rot_lim: [-3.6, 3.6]
H: 600
W: 800
rand_flip: False
bot_pct_lim: [0.0, 0.05]
cams: ['camera0', 'camera1', 'camera2', 'camera3']
Ncams: 4
preprocess:
core_method: 'SpVoxelPreprocessor'
args:
voxel_size: &voxel_size [0.4, 0.4, 4] # the voxel resolution
max_points_per_voxel: 32 # maximum points allowed in each voxel
max_voxel_train: 32000 # the maximum voxel number during training
max_voxel_test: 70000 # the maximum voxel number during testing
cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]
data_augment: # useless
NAME: random_world_flip
ALONG_AXIS_LIST: [ 'x' ]
NAME: random_world_rotation
WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]
NAME: random_world_scaling
WORLD_SCALE_RANGE: [ 0.95, 1.05 ]
postprocess:
core_method: 'VoxelPostprocessor' # VoxelPostprocessor, BevPostprocessor supported
gt_range: *cav_lidar
anchor_args: # anchor generator parameters
cav_lidar_range: *cav_lidar # the range is consistent with the lidar cropping range to generate the correct anchors
l: 3.9 # the default length of the anchor
w: 1.6 # the default width
h: 1.56 # the default height
feature_stride: 2 # the feature map is downsampled by a factor of 2 relative to the input voxel tensor
r: &anchor_yaw [0, 90] # the yaw angles. 0, 90 meaning for each voxel, two anchors will be generated with 0 and 90 degree yaw angle
num: &achor_num 2 # for each location in the feature map, 2 anchors will be generated
target_args: # used to generate positive and negative samples for object detection
pos_threshold: 0.6
neg_threshold: 0.45
score_threshold: 0.25
order: 'hwl' # hwl or lwh
max_num: 100 # maximum number of objects in a single frame. use this number to make sure different frames has the same dimension in the same batch
nms_thresh: 0.15
dir_args: &dir_args
dir_offset: 0.7853 # pi / 4
num_bins: 2
anchor_yaw: *anchor_yaw
model:
core_method: lift_splat_shoot_intermediate # trainer will load the corresponding model python file with the same name
args: # detailed parameters of this model
anchor_number: *achor_num
grid_conf: *grid_conf
data_aug_conf: *data_aug_conf
dir_args: *dir_args
img_downsample: 8
img_features: &img_feature 128
use_depth_gt: false
depth_supervision: false
supervise_single: true
bevout_feature: 128
camera_encoder: EfficientNet
fusion_args:
core_method: v2xvit
args:
voxel_size: *voxel_size
in_channels: *img_feature
v2xvit:
transformer:
encoder: &encoder
# number of fusion blocks per encoder layer, V2XFusionBlock
num_blocks: 1
# number of encoder layers, BaseWindowAttention
depth: 3
use_roi_mask: true
use_RTE: &use_RTE false
RTE_ratio: &RTE_ratio 0 # 2 means the dt has 100ms interval while 1 means 50 ms interval
# agent-wise attention, HMSA(Heterogeneous multi-agent self-attention)
cav_att_config: &cav_att_config
dim: 256
use_hetero: true
use_RTE: *use_RTE
RTE_ratio: *RTE_ratio
heads: 8
dim_head: 32
dropout: 0.3
# spatial-wise attention, BaseWindowAttention
pwindow_att_config: &pwindow_att_config
dim: 256
heads: [16, 8, 4]
dim_head: [16, 32, 64]
dropout: 0.3
window_size: [4, 8, 16]
relative_pos_embedding: true
fusion_method: 'split_attn'
# feedforward condition
feed_forward: &feed_forward
mlp_dim: 256
dropout: 0.3
sttf: &sttf
voxel_size: *voxel_size
downsample_rate: 2
loss: # loss function
core_method: point_pillar_loss # trainer will load the loss function with the same name
args:
pos_cls_weight: 2.0
cls:
type: 'SigmoidFocalLoss'
alpha: 0.25
gamma: 2.0
weight: 1.0 # classification weights
reg:
type: 'WeightedSmoothL1Loss'
sigma: 3.0
codewise: true
weight: 2.0 # regression weights
optimizer: # optimizer setup
core_method: Adam # the name has to exist in Pytorch optimizer library
lr: 0.0015
args:
eps: 1e-10
weight_decay: 1e-4
lr_scheduler: # learning rate scheduler
core_method: multistep # step, multistep and Exponential support
gamma: 0.1
step_size: [25, 40]
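The comments on xbound and ybound in the configs above say these values must agree with the preprocess settings. Here is a minimal sketch of that check. It assumes each bound is [min, max, step] and that cav_lidar_range is ordered [x_min, y_min, z_min, x_max, y_max, z_max]. Both function names are hypothetical and not part of OpenCOOD or CoAlign.

```python
def grid_cells(bound):
    """Number of cells spanned by a [min, max, step] bound."""
    lo, hi, step = bound
    return round((hi - lo) / step)


def bev_matches_preprocess(xbound, ybound, voxel_size, lidar_range, tol=1e-6):
    """True if the x/y extents and steps agree with the preprocess settings."""
    x_min, y_min, _z_min, x_max, y_max, _z_max = lidar_range
    pairs = [
        (xbound[0], x_min), (xbound[1], x_max), (xbound[2], voxel_size[0]),
        (ybound[0], y_min), (ybound[1], y_max), (ybound[2], voxel_size[1]),
    ]
    return all(abs(a - b) < tol for a, b in pairs)


# Values copied from the OPV2V configs above.
xbound = [-51.2, 51.2, 0.4]
ybound = [-51.2, 51.2, 0.4]
voxel_size = [0.4, 0.4, 4]
cav_lidar_range = [-51.2, -51.2, -3, 51.2, 51.2, 1]

print(grid_cells(xbound), grid_cells(ybound))
print(bev_matches_preprocess(xbound, ybound, voxel_size, cav_lidar_range))
```

With these values each axis has 256 cells and the check passes, so the grid settings themselves are consistent in both configs.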
Again, I tried CUDA_VISIBLE_DEVICES=1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=5 --use_env opencood/tools/train_ddp.py -y opencood/hypes_yaml/opv2v/lidar_only_with_noise/coalign/pointpillar_coalign_woba.yaml
and the following error appeared:
Traceback (most recent call last):
File "opencood/tools/train_ddp.py", line 250, in <module>
main()
File "opencood/tools/train_ddp.py", line 148, in main
ouput_dict = model(batch_data['ego'])
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home//Projects/CoAlign/opencood/models/point_pillar_baseline_multiscale.py", line 103, in forward
batch_dict = self.pillar_vfe(batch_dict)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home//Projects/CoAlign/opencood/models/sub_modules/pillar_vfe.py", line 151, in forward
features = pfn(features)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home//Projects/CoAlign/opencood/models/sub_modules/pillar_vfe.py", line 37, in forward
for num_part in range(num_parts + 1)]
File "/home//Projects/CoAlign/opencood/models/sub_modules/pillar_vfe.py", line 37, in <listcomp>
for num_part in range(num_parts + 1)]
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
What could be the issue here? Are there any recommended solutions? This seems to be a bug in train_ddp.py.
Hi, thanks for your valuable work.
As I understand it, the premise for using the agent-object pose graph is that the same object can always be seen from both perspectives, but this premise may not always hold, especially in the DAIR-V2X dataset. I'm curious how this problem was solved.
Thank you for any response.
Hi, I've recently tested the intermediate fusion method based on Lift-Splat-Shoot.
The following is my training config:
`
name: dairv2x_lss_intermediate_e30
data_dir: "dataset/cooperative-vehicle-infrastructure"
root_dir: "dataset/cooperative-vehicle-infrastructure/train.json"
validate_dir: "dataset/cooperative-vehicle-infrastructure/val.json"
test_dir: "dataset/cooperative-vehicle-infrastructure/val.json"
class_names: ['Car']
yaml_parser: "load_lift_splat_shoot_params"
train_params:
batch_size: &batch_size 4 #4
epoches: 30 #50
eval_freq: 15 #2
save_freq: 5 #2
max_cav: 5
input_source: ['camera']
label_type: 'camera'
comm_range: 100
only_vis_ego: false
fusion:
core_method: 'intermediate'
dataset: 'dairv2x'
args:
proj_first: false # useless
grid_conf: &grid_conf
xbound: [-102.4, 102.4, 0.4] # Limit the range of the x direction and divide the grids
ybound: [-51.2, 51.2, 0.4] # Limit the range of the y direction and divide the grids
zbound: [-10, 10, 20.0] # Limit the range of the z direction and divide the grids
ddiscr: [2, 100, 98]
mode: 'LID' # or 'UD'
data_aug_conf: &data_aug_conf
resize_lim: [0.27, 0.28]
final_dim: [288, 512]
rot_lim: [0, 0]
H: 1080
W: 1920
rand_flip: False
bot_pct_lim: [0.0, 0.05]
cams: ['camera0', 'camera1', 'camera2', 'camera3']
Ncams: 4 # placeholder. no use
preprocess:
core_method: 'SpVoxelPreprocessor'
args:
voxel_size: &voxel_size [0.4, 0.4, 5] # useful
max_points_per_voxel: 32 # useless
max_voxel_train: 32000 # useless
max_voxel_test: 70000 # useless
cav_lidar_range: &cav_lidar [-102.4, -51.2, -3.5, 102.4, 51.2, 1.5]
data_augment:
NAME: random_world_flip
ALONG_AXIS_LIST: [ 'x' ]
NAME: random_world_rotation
WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]
NAME: random_world_scaling
WORLD_SCALE_RANGE: [ 0.95, 1.05 ]
postprocess:
core_method: 'VoxelPostprocessor'
gt_range: *cav_lidar
anchor_args:
cav_lidar_range: *cav_lidar
l: 4.5
w: 2
h: 1.56
feature_stride: 2
r: &anchor_yaw [0, 90]
num: &achor_num 2
target_args:
pos_threshold: 0.6
neg_threshold: 0.45
score_threshold: 0.20
order: 'hwl' # hwl or lwh
max_num: 100 # maximum number of objects in a single frame. use this number to make sure different frames has the same dimension in the same batch
nms_thresh: 0.15
dir_args: &dir_args
dir_offset: 0.7853
num_bins: 2
anchor_yaw: *anchor_yaw
model:
core_method: lift_splat_shoot_intermediate
args:
anchor_number: *achor_num
grid_conf: *grid_conf
data_aug_conf: *data_aug_conf
dir_args: *dir_args
img_downsample: 8
img_features: &img_feature 128
use_depth_gt: false
depth_supervision: false
supervise_single: true
bevout_feature: 128
camera_encoder: EfficientNet
fusion_args:
core_method: att #att #max #att # supports v2vnet, fcooper, v2xvit, self-att. Refer to the LiDAR yamls to change the args.
args:
voxel_size: *voxel_size
in_channels: *img_feature
loss:
core_method: point_pillar_loss
args:
pos_cls_weight: 2.0
cls:
type: 'SigmoidFocalLoss'
alpha: 0.25
gamma: 2.0
weight: 1.0
reg:
type: 'WeightedSmoothL1Loss'
sigma: 3.0
codewise: true
weight: 2.0
dir:
type: 'WeightedSoftmaxClassificationLoss'
weight: 0.2
args: *dir_args
optimizer:
core_method: Adam
lr: 0.0015
args:
eps: 1e-10
weight_decay: 1e-4
lr_scheduler:
core_method: multistep #step, multistep and Exponential support
gamma: 0.1
step_size: [15, 25] #[25, 40]
`
However, I got poor results, 6.62/2.01/0.295 at AP30/50/70, while the late fusion baseline got 40.45/18.39/4.19 at AP30/50/70. Intermediate fusion should outperform late fusion, but the results show the opposite.
Looking forward to your reply, thanks.
I also encountered a similar problem, and the error was
So I deleted the '_backup' in 'label_world_backup' and it ran successfully, but I don't know if this is correct.
data[0]['params']['vehicles_front'] = read_json(os.path.join(self.root_dir,frame_info['cooperative_label_path'].replace("label_world", "label_world_backup")))
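Whether dropping '_backup' is correct depends on which label folder the local copy of the dataset actually contains, and the thread does not settle that. One cautious option is to try the backup folder first and fall back to the original one. The helper below is a hypothetical sketch, not code from the repository.

```python
import os


def resolve_label_path(root_dir, relative_path,
                       candidates=("label_world_backup", "label_world")):
    """Return the first existing path after swapping 'label_world' for each candidate."""
    tried = []
    for folder in candidates:
        path = os.path.join(root_dir, relative_path.replace("label_world", folder))
        if os.path.exists(path):
            return path
        tried.append(path)
    raise FileNotFoundError("none of these label files exist: " + ", ".join(tried))
```

In the line above, the result of resolve_label_path(self.root_dir, frame_info['cooperative_label_path']) would take the place of the os.path.join(...) argument passed to read_json.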
Originally posted by @HuangZhe885 in #11 (comment)
Hi Yifan,
Congrats on your paper being accepted by ICRA2023! Besides open-sourcing it in this repo, I'd also like to invite you to open a PR in the official OpenCOOD repo to integrate your code. That way I can put your results in the table on the OpenCOOD homepage together with the pre-trained model, which can help you gain more attention and perhaps citations.
Have you tested camera collaboration on the V2XSet dataset? It seems that this dataset has no additional data, such as bev_visibility.png
Hi, I have run FPVRCNN, V2VNet and V2VNet_robust on DAIR-V2X and OPV2V respectively, following the provided code, but in every case inference fails to produce a result.
First of all thank you for contributing such valuable work.
Regarding CoAlign's AP on DAIR-V2X, we found that v3 on arXiv shows a significant increase over v2. At the no-noise level the AP rose from 0.598 (v2) to 0.746 (v3), and other methods such as V2X-ViT also increased, from 0.571 (v2) to 0.704 (v3). It is not clear which version is correct. In addition, in your previous paper Where2Comm, V2X-ViT is reported at 0.543, which is close to the v2 number.
Hi, thank you for your work. I tried to use SECOND_uncertainty.yaml to train the single detector. The bounding box I generated was not very accurate. Could you please share the best checkpoints under these three datasets? I want to check where exactly I went wrong.
frame_info['cooperative_label_path']
KeyError: 'cooperative_label_path'
Hello,
I've been going through your project and am looking for the camera yaml file in hypes_yaml/dairv2x. Would you be able to provide this file? It would be very helpful for my research. Thank you very much!
If not, https://github.com/MediaBrain-SJTU/CoCa3D is another solution, is that right?
Hi, I tried using the train_ddp.py to leverage multiple GPUs for faster training. Everything started fine, but when I was finishing the last epoch of the training process, an error showed and the training process was terminated.
This is the error message:
[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1800936 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1801044 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1801058 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2003339 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 2003341) of binary: /home//.conda/envs/coalign/bin/python
Traceback (most recent call last):
File "/home//.conda/envs/coalign/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home//.conda/envs/coalign/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
========================================================
opencood/tools/train_ddp.py FAILED
--------------------------------------------------------
Failures:
[1]:
time : 2023-10-25_20:08:24
host : lthpc-AS-4124GS-TNR
rank : 3 (local_rank: 3)
exitcode : -6 (pid: 2003342)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 2003342
[2]:
time : 2023-10-25_20:08:24
host : lthpc-AS-4124GS-TNR
rank : 4 (local_rank: 4)
exitcode : -6 (pid: 2003343)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 2003343
--------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-10-25_20:08:24
host : lthpc-AS-4124GS-TNR
rank : 2 (local_rank: 2)
exitcode : -6 (pid: 2003341)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 2003341
========================================================
What could possibly be the issue here? Thanks!
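The watchdog lines show ranks 2, 3 and 4 each waiting about 1800000 ms (30 minutes) on a BROADCAST before aborting. A small sketch for pulling those numbers out of a log follows; the regular expression is written against the exact wording above and is an assumption about the log format, not something PyTorch provides.

```python
import re

# Matches the watchdog lines quoted above; the pattern is specific to that wording.
WATCHDOG = re.compile(
    r"\[Rank (\d+)\] Watchdog caught collective operation timeout: "
    r"WorkNCCL\(OpType=(\w+), Timeout\(ms\)=(\d+)\) ran for (\d+) milliseconds"
)


def parse_timeouts(log_text):
    """Return (rank, op_type, limit_minutes, elapsed_minutes) per watchdog line."""
    rows = []
    for rank, op_type, limit_ms, ran_ms in WATCHDOG.findall(log_text):
        rows.append((int(rank), op_type, int(limit_ms) / 60000, int(ran_ms) / 60000))
    return rows


log = (
    "[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: "
    "WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1800936 milliseconds before timing out.\n"
    "[E ProcessGroupNCCL.cpp:587] [Rank 4] Watchdog caught collective operation timeout: "
    "WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1801044 milliseconds before timing out.\n"
)

for rank, op_type, limit_min, elapsed_min in parse_timeouts(log):
    print(f"rank {rank}: {op_type} limit {limit_min:.0f} min, waited {elapsed_min:.2f} min")
```

A pattern like this often means that some ranks are waiting at a collective while another rank is busy with something long-running, but that is a guess for this run. torch.distributed.init_process_group does accept a timeout argument; whether raising it is the right fix here is untested.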
CoAlign/opencood/utils/common_utils.py", line 217, in <listcomp>
iou = [box.intersection(b).area / (box.union(b).area) for b in boxes]
ZeroDivisionError: float division by zero
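The division fails when box.union(b).area is zero, which can happen if both boxes are degenerate (zero area). Below is a minimal sketch of a guarded IoU. To stay self-contained it uses axis-aligned rectangles (x1, y1, x2, y2) rather than the polygon objects in common_utils.py, so it is a simplification and not the repository's implementation. The function names are hypothetical.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (x1, y1, x2, y2); 0.0 if it is empty."""
    x1, y1, x2, y2 = rect
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def safe_iou(a, b, eps=1e-9):
    """IoU of two axis-aligned rectangles; returns 0.0 when the union has no area."""
    inter = rect_area((max(a[0], b[0]), max(a[1], b[1]),
                       min(a[2], b[2]), min(a[3], b[3])))
    union = rect_area(a) + rect_area(b) - inter
    if union <= eps:
        return 0.0  # guard: both boxes are degenerate
    return inter / union


print(safe_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(safe_iou((0, 0, 0, 0), (0, 0, 0, 0)))
```

The same guard can be applied to the list comprehension above by checking the union area before dividing. It is also worth finding out why zero-area boxes reach this function in the first place.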
Hi~ When using visual intermediate fusion methods, do voxel_size_x and voxel_size_y need to be the same?
First of all thank you for contributing such valuable work.
May I ask whether the detection accuracy (AP) reported in your paper is computed from a 3D perspective or a BEV perspective?
Thank you again.
Thank you for your excellent work. I'm particularly interested in the derivation of the loss function using KL divergence in your research. I attempted to derive it myself, but due to my limited expertise, I wasn't able to reproduce the formulas presented in your paper. I would greatly appreciate it if you could kindly explain the derivation process.
Thank you again!
Hi, thanks for your project.
I noticed that CoAlign supports V2X-Sim 2.0, but I don't have enough space to store V2X-Sim 2.0.
Could you please provide the train/test pickle files for V2XSim1.0?
Thank you!
May I ask if you have parameter files for models that have already been trained on this dataset? I cannot train these models well on this dataset, and the metrics look a bit strange.
Hi~
I noticed that in CoAlign/opencood/models/point_pillar_deform_transformer.py line 11:
from opencood.models.sub_modules.deformable_transformer_backbone import DeformableTransformerBackbone
But there is no deformable_transformer_backbone.py in CoAlign/opencood/models/sub_modules.
May I ask where I can find deformable_transformer_backbone.py?
Thank you, looking forward to your reply!
The CoPerception UAVs dataset is a little hard to deal with.
Hi, after spending an afternoon installing unfriendly dependencies from the source code, the following problem occurred. Can you help me see what the reason is? Thanks.
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data/xxx/miniconda3/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/data/xxx/miniconda3/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/xxx/miniconda3/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/xxx/data/opencood/CoAlign-main/opencood/data_utils/datasets/intermediate_fusion_dataset.py", line 312, in __getitem__
cur_agent_in_all_agent = [all_agent_id_list.index(cur_agent) for cur_agent in cur_agent_id_list] # indexing current agent in all_agent_id_list
File "/data/xxx/data/opencood/CoAlign-main/opencood/data_utils/datasets/intermediate_fusion_dataset.py", line 312, in <listcomp>
cur_agent_in_all_agent = [all_agent_id_list.index(cur_agent) for cur_agent in cur_agent_id_list] # indexing current agent in all_agent_id_list
ValueError: '6853' is not in list
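list.index raises ValueError as soon as one id in cur_agent_id_list is absent from all_agent_id_list, and it reports only the first such id. A minimal sketch that reports every missing id in one pass is shown below. The helper name is hypothetical, and every id in the example other than '6853' is made up for illustration. Why the two lists disagree is not known from this traceback.

```python
def index_agents(all_agent_id_list, cur_agent_id_list):
    """Map current agents to their positions in all_agent_id_list; collect absent ids."""
    position = {}
    for i, agent_id in enumerate(all_agent_id_list):
        position.setdefault(agent_id, i)  # keep the first occurrence, like list.index
    found, missing = [], []
    for agent_id in cur_agent_id_list:
        if agent_id in position:
            found.append(position[agent_id])
        else:
            missing.append(agent_id)
    return found, missing


# Illustrative ids; only '6853' appears in the traceback above.
found, missing = index_agents(["6844", "6862"], ["6844", "6853"])
print("indices:", found)
print("missing:", missing)
```

Logging the missing ids together with the sample index may help tell a dataset layout problem apart from a bug in how the two lists are built.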
Thanks for your project.
I noticed that the code for generating the pkl files for V2X-Sim 2.0 is missing.
I used the file from #2 (comment) without success:
AssertionError: Error: there are 47638 label files but 47300 lidarseg records.
The split file I used is associated with the coperception project. I was able to successfully build training and test sets for this project.
The structure of V2XSim2.0 is:
|-- v2.0
|-- sweeps
|-- maps
|-- lidarseg
|-- imu
|-- gnss
Could you please provide the train/test pickle files for V2XSim2.0?
Thank you!