yifanlu0227 / coalign Goto Github PK


119.0 6.0 7.0 29.36 MB

[ICRA2023] CoAlign: Robust Collaborative 3D Object Detection in Presence of Pose Errors

License: Other

Python 88.42% C++ 4.21% Cuda 6.77% C 0.27% Cython 0.33%

collaboration collaborative-perception multiagent object-detection perception robust

coalign's People

arslan-z nican2018 griffinclark10 jinlong17 x1a-jk guspan-tanadi mrbryant23

coalign's Issues

DiscoNet does not match early_fusion

(cobevflow) aitest7@833f376856e4:~/wynne/CoBEVFlow$ python opencood/tools/train.py --hypes_yaml opencood/hypes_yaml/dair-v2x/npj/dair_disconet.yaml

Dataset Building ...

ASync dataset with 5 time delay initialized! 4650 samples totally!
ASync dataset with 5 time delay initialized! 1717 samples totally!
/public/home/aitest7/anaconda3/envs/cobevflow/lib/python3.7/site-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
=== Time consumed: 0.0 minutes. ===

Creating Model ...

device: cuda
full path is: /public/home/aitest7/wynne/CoBEVFlow/logs/dair_npj_disconet_w_2023_11_07_17_20_37
=== Time consumed: 0.1 minutes. ===

Training start!

Traceback (most recent call last):
  File "opencood/tools/train.py", line 327, in <module>
    main()
  File "opencood/tools/train.py", line 185, in main
    teacher_model.load_state_dict(torch.load(teacher_checkpoint_path), strict=False)
  File "/public/home/aitest7/anaconda3/envs/cobevflow/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PointPillarDiscoNetTeacher:
    size mismatch for cls_head.weight: copying a param with shape torch.Size([2, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 384, 1, 1]).
    size mismatch for reg_head.weight: copying a param with shape torch.Size([14, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([14, 384, 1, 1]).
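A minimal sketch for locating this kind of mismatch before calling load_state_dict: compare parameter shapes between the checkpoint and the model. Here the checkpoint's cls_head and reg_head take 256 input channels while the PointPillarDiscoNetTeacher built from dair_disconet.yaml expects 384, which suggests (an assumption, not confirmed) that the teacher checkpoint was trained with a different backbone or head width than the current YAML specifies. Note that strict=False only tolerates missing or unexpected keys; it does not skip shape mismatches, which is why the call above still raises. The helper name find_shape_mismatches is hypothetical, and it works on plain {name: shape} dicts so it runs without PyTorch.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    ckpt_shapes: Dict[str, Shape], model_shapes: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every shared key whose shapes differ."""
    mismatches = []
    # Only keys present in both dicts can cause a "size mismatch" error.
    for name in sorted(ckpt_shapes.keys() & model_shapes.keys()):
        if ckpt_shapes[name] != model_shapes[name]:
            mismatches.append((name, ckpt_shapes[name], model_shapes[name]))
    return mismatches


if __name__ == "__main__":
    # Shapes copied from the traceback above.
    ckpt = {"cls_head.weight": (2, 256, 1, 1), "reg_head.weight": (14, 256, 1, 1)}
    model = {"cls_head.weight": (2, 384, 1, 1), "reg_head.weight": (14, 384, 1, 1)}
    for name, c, m in find_shape_mismatches(ckpt, model):
        print(f"{name}: checkpoint {c} vs model {m}")
```

With PyTorch, the inputs can be built as {k: tuple(v.shape) for k, v in torch.load(path).items()} and {k: tuple(v.shape) for k, v in model.state_dict().items()}.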

Problems running PointPillars on DAIR-V2X

Hi, thank you for your work. I am trying to evaluate CoAlign on DAIR-V2X with PointPillars as the backbone, but the only example provided is SECOND on OPV2V. I tried changing the configs in the *.yaml files, but something seems to go wrong in the data loader. Could you provide an example of evaluating PointPillars on DAIR-V2X, or do you know how to fix this? Thanks in advance.

python opencood/tools/pose_graph_pre_calc.py -y opencood/hypes_yaml/dairv2x/lidar_only_with_noise/coalign/precalc.yaml
Noise Added: 0/0/0/0.
Dataset Building
Traceback (most recent call last):
  File "opencood/tools/pose_graph_pre_calc.py", line 187, in <module>
    main()
  File "opencood/tools/pose_graph_pre_calc.py", line 145, in main
    for i, batch_data in enumerate(eval(f"{split}_loader")):
  File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 53, in fetch
    return self.collate_fn(data)
  File "/data/mengh3/CoAlign/opencood/data_utils/datasets/late_fusion_dataset.py", line 415, in collate_batch_test
    transformation_matrix = cav_content['transformation_matrix']
KeyError: 'transformation_matrix'

Question about the YAMLs for Where2comm and When2com

Dear Lu, sorry to bother you again! Are there any YAML files available for Where2comm and When2com on OPV2V?

CMake error when trying to install d3d

I ran python setup.py install and got the following CMake error:

CMake Error at /data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCompilerId.cmake:751 (message):
  Compiling the CUDA compiler identification source file
  "CMakeCUDACompilerId.cu" failed.

  Compiler: /usr/bin/nvcc

  Build flags:

  Id flags: --keep;--keep-dir;tmp -v



  The output was:

  255

  #$ _SPACE_=

  #$ _CUDART_=cudart

  #$ _HERE_=/usr/lib/nvidia-cuda-toolkit/bin

  #$ _THERE_=/usr/lib/nvidia-cuda-toolkit/bin

  #$ _TARGET_SIZE_=

  #$ _TARGET_DIR_=

  #$ _TARGET_SIZE_=64

  #$ NVVMIR_LIBRARY_DIR=/usr/lib/nvidia-cuda-toolkit/libdevice

  #$
  PATH=/usr/lib/nvidia-cuda-toolkit/bin:/data/public/CUDA11/cuda-11.2/bin:/data/mengh3/myconda/envs/coalign/bin:/data/mengh3/myconda/envs/coalign/bin:/data/mengh3/myconda/envs/coalign/bin:/data/public/CUDA11/cuda-11.0/bin:/usr/local/cuda/bin:/data/public/CUDA11/cuda-11.0/bin:/data/public/CUDA11/cuda-11.0/bin:/usr/local/cuda/bin:/data/public/CUDA11/cuda-11.0/bin:/usr/local/cuda/bin:/home/mengh3/bin:/home/mengh3/.local/bin:/usr/local/cuda/bin:/data/mengh3/myconda/envs/coalign/bin:/opt/anaconda3/condabin:/opt/anaconda3/bin:/usr/local/cuda-9.0/bin:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin


  #$ LIBRARIES= -L/usr/lib/x86_64-linux-gnu/stubs

  #$ rm tmp/a_dlink.reg.c

  #$ gcc -std=c++14 -D__CUDA_ARCH__=300 -E -x c++
  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__
  -D"__CUDACC_VER_BUILD__=85" -D"__CUDACC_VER_MINOR__=1"
  -D"__CUDACC_VER_MAJOR__=9" -include "cuda_runtime.h" -m64
  "CMakeCUDACompilerId.cu" > "tmp/CMakeCUDACompilerId.cpp1.ii"

  #$ cicc --c++14 --gnu_version=70500 --allow_managed -arch compute_30 -m64
  -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name
  "CMakeCUDACompilerId.fatbin.c" -tused -nvvmir-library
  "/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc"
  --gen_module_id_file --module_id_file_name
  "tmp/CMakeCUDACompilerId.module_id" --orig_src_file_name
  "CMakeCUDACompilerId.cu" --gen_c_file_name
  "tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name
  "tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name
  "tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o
  "tmp/CMakeCUDACompilerId.ptx"

  #$ ptxas -arch=sm_30 -m64 "tmp/CMakeCUDACompilerId.ptx" -o
  "tmp/CMakeCUDACompilerId.sm_30.cubin"

  ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'

  # --error 0xff --





Call Stack (most recent call first):
  /data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  /data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
  /data/mengh3/myconda/envs/coalign/lib/python3.7/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCUDACompiler.cmake:309 (CMAKE_DETERMINE_COMPILER_ID)
  CMakeLists.txt:14 (enable_language)
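One plausible reading of this log (an assumption, not a confirmed diagnosis): the nvcc that CMake picked up from the Ubuntu package identifies itself as CUDA 9.1 (__CUDACC_VER_MAJOR__=9, __CUDACC_VER_MINOR__=1) and targets sm_30 by default, while the ptxas that actually runs rejects sm_30. CUDA 11 removed sm_30 support, and PATH also contains /data/public/CUDA11/cuda-11.2/bin and cuda-11.0 entries, so two toolkits may be mixed. A quick check is to list every nvcc and ptxas visible on PATH in lookup order; find_all_on_path is a hypothetical helper using only the standard library.

```python
import os
from typing import Dict, List, Optional


def find_all_on_path(names: List[str], path: Optional[str] = None) -> Dict[str, List[str]]:
    """Map each executable name to every matching file on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found: Dict[str, List[str]] = {name: [] for name in names}
    seen_dirs = set()
    for directory in path.split(os.pathsep):
        if not directory or directory in seen_dirs:
            continue  # skip empty and duplicate PATH entries
        seen_dirs.add(directory)
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                found[name].append(candidate)
    return found


if __name__ == "__main__":
    for tool, hits in find_all_on_path(["nvcc", "ptxas"]).items():
        print(tool, "->", hits if hits else "not found")
```

If more than one toolkit shows up, a reasonable first thing to try is pointing CMake at a single one, for example by setting the CUDACXX environment variable (which CMake reads to choose the CUDA compiler) and putting that toolkit's bin directory first on PATH.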

The intermediate fusion method based on lift-splat-shoot got poor results on opv2v dataset

Hi~
I recently tested an intermediate fusion method (V2X-ViT) and a late fusion method, both based on lift-splat-shoot. However, V2X-ViT got poor results of 0.47/0.37/0.22 at AP30/50/70, while the late fusion baseline got 0.83/0.71/0.43 at AP30/50/70. In addition, the no-fusion baseline got 0.48/0.37/0.18 at AP30/50/70, which is similar to V2X-ViT. My configs (late fusion first, then V2X-ViT) are:

name: opv2v_lss_single_efficientnet
root_dir: "/data2/wsh/data/OPV2V/train"
validate_dir: "/data2/wsh/data/OPV2V/validate"
test_dir: "/data2/wsh/data/OPV2V/test"

yaml_parser: "load_lift_splat_shoot_params"
train_params:
  batch_size: &batch_size 2
  epoches: 50
  eval_freq: 2
  save_freq: 2
  max_cav: 7

input_source: ['camera']
label_type: 'camera'

comm_range: 70
only_vis_ego: true

add_data_extension: ['bev_visibility.png']

fusion:
  core_method: 'late'
  dataset: 'opv2v'
  args:
    proj_first: false # useless
    grid_conf: &grid_conf
      xbound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
      ybound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
      zbound: [-10, 10, 20.0] # No need to be consistent with preprocess.
      ddiscr: [2, 50, 48]
      mode: 'LID' # or 'UD'
    data_aug_conf: &data_aug_conf
      resize_lim: [0.8, 0.85]
      final_dim: [480, 640]
      rot_lim: [-3.6, 3.6]
      H: 600
      W: 800
      rand_flip: False
      bot_pct_lim: [0.0, 0.05]
      cams: ['camera0', 'camera1', 'camera2', 'camera3']
      Ncams: 4

# preprocess-related
preprocess:
  # options: BasePreprocessor, VoxelPreprocessor, BevPreprocessor
  core_method: 'SpVoxelPreprocessor'
  args:
    voxel_size: &voxel_size [0.4, 0.4, 4]
    max_points_per_voxel: 32
    max_voxel_train: 32000
    max_voxel_test: 70000
  # detection range for each individual cav.
  cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]

data_augment: # useless
  - NAME: random_world_flip
    ALONG_AXIS_LIST: [ 'x' ]
  - NAME: random_world_rotation
    WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]
  - NAME: random_world_scaling
    WORLD_SCALE_RANGE: [ 0.95, 1.05 ]

# anchor box related
postprocess:
  core_method: 'VoxelPostprocessor' # That's ok
  gt_range: *cav_lidar
  anchor_args:
    cav_lidar_range: *cav_lidar
    l: 3.9
    w: 1.6
    h: 1.56
    feature_stride: 2
    r: &anchor_yaw [0, 90]
    num: &achor_num 2
  target_args:
    pos_threshold: 0.6
    neg_threshold: 0.45
    score_threshold: 0.25
  order: 'hwl' # hwl or lwh
  max_num: 100 # maximum number of objects in a single frame; keeps every frame in a batch the same dimension
  nms_thresh: 0.15
  dir_args: &dir_args
    dir_offset: 0.7853
    num_bins: 2
    anchor_yaw: *anchor_yaw

# model related
model:
  core_method: lift_splat_shoot
  args:
    anchor_number: *achor_num
    grid_conf: *grid_conf
    data_aug_conf: *data_aug_conf
    dir_args: *dir_args
    img_downsample: 8
    img_features: 128
    use_depth_gt: false
    depth_supervision: false
    bevout_feature: 128

    shrink_header:
      kernal_size: [ 3 ]
      stride: [ 2 ]
      padding: [ 1 ]
      dim: [ 128 ]
      input_dim: 128
    camera_encoder: EfficientNet

loss:
  core_method: point_pillar_loss
  args:
    pos_cls_weight: 2.0
    cls:
      type: 'SigmoidFocalLoss'
      alpha: 0.25
      gamma: 2.0
      weight: 1.0
    reg:
      type: 'WeightedSmoothL1Loss'
      sigma: 3.0
      codewise: true
      weight: 2.0
    dir:
      type: 'WeightedSoftmaxClassificationLoss'
      weight: 0.2
      args: *dir_args

optimizer:
  core_method: Adam
  lr: 0.0015
  args:
    eps: 1e-10
    weight_decay: 1e-4

lr_scheduler:
  core_method: multistep # step, multistep and Exponential supported
  gamma: 0.1
  step_size: [25, 40]

name: opv2v_lss_v2xvit
root_dir: "/data1/wsh/data/OPV2V/train"
validate_dir: "/data1/wsh/data/OPV2V/validate"
test_dir: "/data1/wsh/data/OPV2V/test"

yaml_parser: "load_lift_splat_shoot_params" # we need specific loading functions for different backbones.
train_params: # the common training parameters
  batch_size: &batch_size 2
  epoches: 50
  eval_freq: 2
  save_freq: 2
  max_cav: 7

input_source: ['camera'] # 'lidar', 'camera', 'depth'
label_type: 'camera' # 'lidar' or 'camera'

comm_range: 70
only_vis_ego: true

add_data_extension: ['bev_visibility.png']

fusion:
  core_method: 'intermediate'
  dataset: 'opv2v'
  args:
    proj_first: false # useless
    grid_conf: &grid_conf
      xbound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
      ybound: [-51.2, 51.2, 0.4] # Needs to be consistent with preprocess.
      zbound: [-10, 10, 20.0] # No need to be consistent with preprocess.
      ddiscr: [2, 50, 48] # depth_min, depth_max, num_bins, make grid in image plane
      mode: 'LID'
    data_aug_conf: &data_aug_conf
      resize_lim: [0.8, 0.85]
      final_dim: [480, 640]
      rot_lim: [-3.6, 3.6]
      H: 600
      W: 800
      rand_flip: False
      bot_pct_lim: [0.0, 0.05]
      cams: ['camera0', 'camera1', 'camera2', 'camera3']
      Ncams: 4

# preprocess-related
preprocess:
  # options: BasePreprocessor, VoxelPreprocessor, BevPreprocessor
  core_method: 'SpVoxelPreprocessor'
  args:
    voxel_size: &voxel_size [0.4, 0.4, 4] # the voxel resolution
    max_points_per_voxel: 32 # maximum points allowed in each voxel
    max_voxel_train: 32000 # the maximum voxel number during training
    max_voxel_test: 70000 # the maximum voxel number during testing
  # detection range for each individual cav.
  cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]

data_augment: # useless
  - NAME: random_world_flip
    ALONG_AXIS_LIST: [ 'x' ]
  - NAME: random_world_rotation
    WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]
  - NAME: random_world_scaling
    WORLD_SCALE_RANGE: [ 0.95, 1.05 ]

# anchor box related
postprocess:
  core_method: 'VoxelPostprocessor' # VoxelPostprocessor, BevPostprocessor supported
  gt_range: *cav_lidar
  anchor_args: # anchor generator parameters
    cav_lidar_range: *cav_lidar # must match the lidar cropping range so that the anchors are generated correctly
    l: 3.9 # the default length of the anchor
    w: 1.6 # the default width
    h: 1.56 # the default height
    feature_stride: 2 # the feature map is downsampled by 2 relative to the input voxel tensor
    r: &anchor_yaw [0, 90] # the yaw angles: for each voxel, two anchors are generated with 0 and 90 degree yaw
    num: &achor_num 2 # for each location in the feature map, 2 anchors are generated
  target_args: # used to generate positive and negative samples for object detection
    pos_threshold: 0.6
    neg_threshold: 0.45
    score_threshold: 0.25
  order: 'hwl' # hwl or lwh
  max_num: 100 # maximum number of objects in a single frame; keeps every frame in a batch the same dimension
  nms_thresh: 0.15
  dir_args: &dir_args
    dir_offset: 0.7853 # pi / 4
    num_bins: 2
    anchor_yaw: *anchor_yaw

# model related
model:
  core_method: lift_splat_shoot_intermediate # trainer will load the corresponding model python file with the same name
  args: # detailed parameters of this model
    anchor_number: *achor_num
    grid_conf: *grid_conf
    data_aug_conf: *data_aug_conf
    dir_args: *dir_args
    img_downsample: 8
    img_features: &img_feature 128
    use_depth_gt: false
    depth_supervision: false
    supervise_single: true
    bevout_feature: 128
    camera_encoder: EfficientNet

    fusion_args:
      core_method: v2xvit
      args:
        voxel_size: *voxel_size
        in_channels: *img_feature
        v2xvit:
          transformer:
            encoder: &encoder
              # number of fusion blocks per encoder layer, V2XFusionBlock
              num_blocks: 1
              # number of encoder layers, BaseWindowAttention
              depth: 3
              use_roi_mask: true
              use_RTE: &use_RTE false
              RTE_ratio: &RTE_ratio 0 # 2 means the dt has 100 ms interval while 1 means 50 ms interval
              # agent-wise attention, HMSA (Heterogeneous multi-agent self-attention)
              cav_att_config: &cav_att_config
                dim: 256
                use_hetero: true
                use_RTE: *use_RTE
                RTE_ratio: *RTE_ratio
                heads: 8
                dim_head: 32
                dropout: 0.3
              # spatial-wise attention, BaseWindowAttention
              pwindow_att_config: &pwindow_att_config
                dim: 256
                heads: [16, 8, 4]
                dim_head: [16, 32, 64]
                dropout: 0.3
                window_size: [4, 8, 16]
                relative_pos_embedding: true
                fusion_method: 'split_attn'
              # feedforward condition
              feed_forward: &feed_forward
                mlp_dim: 256
                dropout: 0.3
              sttf: &sttf
                voxel_size: *voxel_size
                downsample_rate: 2

loss: # loss function
  core_method: point_pillar_loss # trainer will load the loss function with the same name
  args:
    pos_cls_weight: 2.0
    cls:
      type: 'SigmoidFocalLoss'
      alpha: 0.25
      gamma: 2.0
      weight: 1.0 # classification weights
    reg:
      type: 'WeightedSmoothL1Loss'
      sigma: 3.0
      codewise: true
      weight: 2.0 # regression weights

optimizer: # optimizer setup
  core_method: Adam # the name has to exist in the PyTorch optimizer library
  lr: 0.0015
  args:
    eps: 1e-10
    weight_decay: 1e-4

lr_scheduler: # learning rate scheduler
  core_method: multistep # step, multistep and Exponential supported
  gamma: 0.1
  step_size: [25, 40]

Bugs with multi-GPU training of LiDAR PointPillars using train_ddp.py

I tried again with CUDA_VISIBLE_DEVICES=1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=5 --use_env opencood/tools/train_ddp.py -y opencood/hypes_yaml/opv2v/lidar_only_with_noise/coalign/pointpillar_coalign_woba.yaml
and the following error appeared:

Traceback (most recent call last):
  File "opencood/tools/train_ddp.py", line 250, in <module>
    main()
  File "opencood/tools/train_ddp.py", line 148, in main
    ouput_dict = model(batch_data['ego'])
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//Projects/CoAlign/opencood/models/point_pillar_baseline_multiscale.py", line 103, in forward
    batch_dict = self.pillar_vfe(batch_dict)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//Projects/CoAlign/opencood/models/sub_modules/pillar_vfe.py", line 151, in forward
    features = pfn(features)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//Projects/CoAlign/opencood/models/sub_modules/pillar_vfe.py", line 37, in forward
    for num_part in range(num_parts + 1)]
  File "/home//Projects/CoAlign/opencood/models/sub_modules/pillar_vfe.py", line 37, in <listcomp>
    for num_part in range(num_parts + 1)]
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

What could be the issue here, and is there a recommended fix? This looks like a bug in train_ddp.py.
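The traceback shows that the chunked path in pillar_vfe.py was taken (line 37, range(num_parts + 1)), meaning the input had more rows than the chunk size. If this code follows the common PFN pattern inputs[i * part:(i + 1) * part] for i in range(num_parts + 1) with num_parts = n // part (an assumption; only the range(num_parts + 1) part is visible above), then whenever n is an exact multiple of part the last chunk is empty, and an empty GEMM is one possible, unconfirmed trigger for cuBLAS errors such as this one. The sketch below compares that indexing with a variant that never yields empty chunks; the function names are hypothetical and it is plain Python so it runs anywhere.

```python
from typing import List, Tuple


def chunks_original(n: int, part: int) -> List[Tuple[int, int]]:
    """Slice bounds produced by range(n // part + 1) chunking; can end with an empty chunk."""
    num_parts = n // part
    # min() mirrors how Python slicing clips an out-of-range end index.
    return [(i * part, min((i + 1) * part, n)) for i in range(num_parts + 1)]


def chunks_safe(n: int, part: int) -> List[Tuple[int, int]]:
    """Slice bounds that cover [0, n) with no empty chunks."""
    return [(start, min(start + part, n)) for start in range(0, n, part)]


if __name__ == "__main__":
    n, part = 100000, 50000  # hypothetical pillar count that is an exact multiple of part
    print(chunks_original(n, part))  # last entry is the empty slice (100000, 100000)
    print(chunks_safe(n, part))
```

In PyTorch, the same effect can be had with torch.cat([self.linear(c) for c in torch.split(inputs, self.part)], dim=0), since torch.split with an integer size does not produce an empty trailing chunk.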

About the Agent-Object Pose Graph

Hi, thanks for your valuable work.

In my understanding, using the agent-object pose graph assumes that the same objects are always visible from both agents' viewpoints, but this assumption may not always hold, especially on the DAIR-V2X dataset. I'm curious how this problem was handled.

Thank you for any response.

The intermediate fusion method based on lift-splat-shoot got poor results on the DAIR-V2X-C dataset.

Hi, I've tested the intermediate fusion method based on lift-splat-shoot recently.
The following is my training config:

```yaml
name: dairv2x_lss_intermediate_e30
data_dir: "dataset/cooperative-vehicle-infrastructure"
root_dir: "dataset/cooperative-vehicle-infrastructure/train.json"
validate_dir: "dataset/cooperative-vehicle-infrastructure/val.json"
test_dir: "dataset/cooperative-vehicle-infrastructure/val.json"

class_names: ['Car']

yaml_parser: "load_lift_splat_shoot_params"
train_params:
  batch_size: &batch_size 4 #4
  epoches: 30 #50
  eval_freq: 15 #2
  save_freq: 5 #2
  max_cav: 5

input_source: ['camera']
label_type: 'camera'

comm_range: 100
only_vis_ego: false

fusion:
  core_method: 'intermediate'
  dataset: 'dairv2x'
  args:
    proj_first: false # useless
    grid_conf: &grid_conf
      xbound: [-102.4, 102.4, 0.4] # Limit the range of the x direction and divide the grids
      ybound: [-51.2, 51.2, 0.4] # Limit the range of the y direction and divide the grids
      zbound: [-10, 10, 20.0] # Limit the range of the z direction and divide the grids
      ddiscr: [2, 100, 98]
      mode: 'LID' # or 'UD'
    data_aug_conf: &data_aug_conf
      resize_lim: [0.27, 0.28]
      final_dim: [288, 512]
      rot_lim: [0, 0]
      H: 1080
      W: 1920
      rand_flip: False
      bot_pct_lim: [0.0, 0.05]
      cams: ['camera0', 'camera1', 'camera2', 'camera3']
      Ncams: 4 # placeholder. no use

preprocess:
  core_method: 'SpVoxelPreprocessor'
  args:
    voxel_size: &voxel_size [0.4, 0.4, 5] # useful
    max_points_per_voxel: 32 # useless
    max_voxel_train: 32000 # useless
    max_voxel_test: 70000 # useless
  cav_lidar_range: &cav_lidar [-102.4, -51.2, -3.5, 102.4, 51.2, 1.5]

data_augment:
  - NAME: random_world_flip
    ALONG_AXIS_LIST: [ 'x' ]
  - NAME: random_world_rotation
    WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]
  - NAME: random_world_scaling
    WORLD_SCALE_RANGE: [ 0.95, 1.05 ]

postprocess:
  core_method: 'VoxelPostprocessor'
  gt_range: *cav_lidar
  anchor_args:
    cav_lidar_range: *cav_lidar
    l: 4.5
    w: 2
    h: 1.56
    feature_stride: 2
    r: &anchor_yaw [0, 90]
    num: &achor_num 2
  target_args:
    pos_threshold: 0.6
    neg_threshold: 0.45
    score_threshold: 0.20
  order: 'hwl' # hwl or lwh
  max_num: 100 # maximum number of objects in a single frame; keeps every frame in a batch the same dimension
  nms_thresh: 0.15
  dir_args: &dir_args
    dir_offset: 0.7853
    num_bins: 2
    anchor_yaw: *anchor_yaw

model:
  core_method: lift_splat_shoot_intermediate
  args:
    anchor_number: *achor_num
    grid_conf: *grid_conf
    data_aug_conf: *data_aug_conf
    dir_args: *dir_args
    img_downsample: 8
    img_features: &img_feature 128
    use_depth_gt: false
    depth_supervision: false
    supervise_single: true
    bevout_feature: 128
    camera_encoder: EfficientNet
    fusion_args:
      core_method: att # att or max; also supports v2vnet, fcooper, v2xvit, self-att. Refer to the LiDAR yamls to change the args.
      args:
        voxel_size: *voxel_size
        in_channels: *img_feature

optimizer:
  core_method: Adam
  lr: 0.0015
  args:
    eps: 1e-10
    weight_decay: 1e-4

lr_scheduler:
  core_method: multistep # step, multistep and Exponential supported
  gamma: 0.1
  step_size: [15, 25] #[25, 40]
```

However, I got poor results of 6.62/2.01/0.295 at AP30/50/70, while the late fusion baseline got 40.45/18.39/4.19 at AP30/50/70. Intermediate fusion should outperform late fusion, but the results show the opposite.
Looking forward to your reply, thanks.

Why are the AP values the same when the noise levels are different?

I also encountered a similar problem, and the error was

So I deleted '_backup' from 'label_world_backup' and it ran successfully, but I don't know whether this is correct.

    data[0]['params']['vehicles_front'] = read_json(os.path.join(self.root_dir,frame_info['cooperative_label_path'].replace("label_world", "label_world_backup")))

Originally posted by @HuangZhe885 in #11 (comment)
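The line above rewrites cooperative_label_path to point into a label_world_backup directory, which this report suggests does not exist in every copy of DAIR-V2X-C (an inference, not confirmed). Instead of editing the string by hand, a hypothetical helper could prefer the backup directory when it exists and otherwise fall back to the original label_world path. resolve_label_path is an illustrative name, not part of CoAlign, and whether label_world and label_world_backup contain equivalent labels is not established here, so results obtained with the fallback may differ from the paper's.

```python
import os


def resolve_label_path(root_dir: str, label_path: str) -> str:
    """Prefer the 'label_world_backup' copy of a label file if it exists, else keep the original.

    Hypothetical helper: the directory names mirror the code line above.
    """
    original = os.path.join(root_dir, label_path)
    if "label_world_backup" in label_path:
        return original  # already points at the backup directory
    backup = os.path.join(root_dir, label_path.replace("label_world", "label_world_backup"))
    return backup if os.path.exists(backup) else original
```

Usage would look like read_json(resolve_label_path(self.root_dir, frame_info['cooperative_label_path'])).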

Welcome to submit a PR to OpenCOOD

Hi Yifan,

Congrats on your paper being accepted to ICRA2023! Besides open-sourcing the code in this repo, I'd also like to invite you to make a PR to the official OpenCOOD repo to integrate your code. That way, I can list your results with a pre-trained model in the table on the OpenCOOD homepage, which can help your work gain more attention and perhaps citations.

camera collaboration on the V2XSet dataset

Have you tested camera collaboration on the V2XSet dataset? This dataset does not seem to include additional data such as bev_visibility.png.

Experimental results on V2VNet and FPVRCNN

Hi, I ran FPVRCNN, V2VNet, and V2VNet_robust on DAIR-V2X and OPV2V using the provided code, but inference always fails to produce results.

Confusion about the DAIR-V2X metric

First of all, thank you for contributing such valuable work.
Regarding CoAlign's AP on DAIR-V2X, we found that the values in v3 of the arXiv paper are significantly higher than in v2. Under the no-noise setting, CoAlign's AP rose from 0.598 (v2) to 0.746 (v3), and other methods such as V2X-ViT also rose, from 0.571 (v2) to 0.704 (v3). It is not clear which version is correct. In addition, your previous paper Where2comm reports 0.543 for V2X-ViT, which is close to the v2 value.

V2 of CoAlign:

V3 of CoAlign:

Question regarding the results of a single detector

Hi, thank you for your work. I tried to train the single detector with SECOND_uncertainty.yaml, but the bounding boxes it produced were not very accurate. Could you please share the best checkpoints for these three datasets? I want to check where exactly I went wrong.

Bug when training on DAIR-V2X

frame_info['cooperative_label_path']
KeyError: 'cooperative_label_path'

Request for hypes_yaml/dairv2x Camera YAML

Hello,
I've been going through your project and am looking for the camera yaml file in hypes_yaml/dairv2x. Would you be able to provide this file? It would be greatly helpful for my research. Thank you very much!

Excuse me, does this CoAlign code support RGB (camera) V2X-Sim?

If not, is https://github.com/MediaBrain-SJTU/CoCa3D an alternative solution?

Bugs with multiple GPUs training with DDP

Hi, I tried using train_ddp.py to train on multiple GPUs for faster training. Everything started fine, but while the last epoch was finishing, an error occurred and the training process was terminated.
This is the error message:

[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1800936 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1801044 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1801058 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2003339 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 2003341) of binary: /home//.conda/envs/coalign/bin/python
Traceback (most recent call last):
  File "/home//.conda/envs/coalign/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home//.conda/envs/coalign/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home//.conda/envs/coalign/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
========================================================
opencood/tools/train_ddp.py FAILED
--------------------------------------------------------
Failures:
[1]:
  time      : 2023-10-25_20:08:24
  host      : lthpc-AS-4124GS-TNR
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 2003342)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 2003342
[2]:
  time      : 2023-10-25_20:08:24
  host      : lthpc-AS-4124GS-TNR
  rank      : 4 (local_rank: 4)
  exitcode  : -6 (pid: 2003343)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 2003343
--------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-25_20:08:24
  host      : lthpc-AS-4124GS-TNR
  rank      : 2 (local_rank: 2)
  exitcode  : -6 (pid: 2003341)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 2003341
========================================================

What could possibly be the issue here? Thanks!

Bug during inference

CoAlign/opencood/utils/common_utils.py", line 217, in <listcomp>
    iou = [box.intersection(b).area / (box.union(b).area) for b in boxes]
ZeroDivisionError: float division by zero
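The division in common_utils.py fails when box.union(b).area is zero, which happens when both polygons have zero area, for example predicted boxes with zero length or width (a likely but unconfirmed cause here). A minimal guarded variant, assuming shapely-style polygons as in the original list comprehension; compute_iou is a hypothetical name, and returning 0.0 for degenerate pairs is a design choice rather than CoAlign's behavior. Filtering out zero-size boxes before this step would be an alternative.

```python
from typing import Iterable, List


def compute_iou(box, boxes: Iterable, eps: float = 1e-9) -> List[float]:
    """IoU of `box` against each of `boxes`, returning 0.0 when the union is (near) zero.

    `box` and `boxes` are expected to be shapely-like geometries exposing
    `.intersection(other).area` and `.union(other).area`.
    """
    ious = []
    for b in boxes:
        union_area = box.union(b).area
        if union_area < eps:
            # Both geometries are degenerate (zero area): IoU is undefined, treat as no overlap.
            ious.append(0.0)
        else:
            ious.append(box.intersection(b).area / union_area)
    return ious
```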

Do voxel_size_x and voxel_size_y need to be the same?

Hi~ When using camera-based intermediate fusion methods, do voxel_size_x and voxel_size_y need to be the same?
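Whether CoAlign's camera intermediate fusion modules require equal x/y voxel sizes is not stated here. What can be said is that the two resolutions independently set the BEV grid size, so a non-square cell changes the feature-map shape, and any code that maps metric coordinates to feature pixels must then scale x and y separately. A small sketch of that arithmetic, using the DAIR-V2X grid_conf posted earlier on this page (xbound: [-102.4, 102.4, 0.4], ybound: [-51.2, 51.2, 0.4]); bev_grid_shape is a hypothetical helper.

```python
from typing import Sequence, Tuple


def bev_grid_shape(xbound: Sequence[float], ybound: Sequence[float]) -> Tuple[int, int]:
    """Return (H, W) of a BEV grid from [min, max, step] bounds for x and y."""
    x_min, x_max, dx = xbound
    y_min, y_max, dy = ybound
    width = round((x_max - x_min) / dx)   # number of cells along x
    height = round((y_max - y_min) / dy)  # number of cells along y
    return height, width


if __name__ == "__main__":
    print(bev_grid_shape([-102.4, 102.4, 0.4], [-51.2, 51.2, 0.4]))  # (256, 512)
```

round() is used because ranges such as 204.8 / 0.4 are not exact in floating point.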

Question about AP

First of all, thank you for contributing such valuable work.
May I ask whether the AP reported in your paper is computed on 3D boxes or in BEV?
Thank you again.

Questions about loss function

Thank you for your excellent work. I'm particularly interested in the derivation of the loss function using KL divergence in your research. I attempted to derive it myself, but due to my limited expertise, I wasn't able to reproduce the formulas presented in your paper. I would greatly appreciate it if you could kindly explain the derivation process.

Thank you again!
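For reference, here is the standard derivation from the KL-loss formulation of He et al. (CVPR 2019, "Bounding Box Regression with Uncertainty for Accurate Object Detection"), assuming a Gaussian predicted distribution with mean $x_e$ and variance $\sigma^2$ and a Dirac-delta target at the ground truth $x_g$. Whether CoAlign's loss uses exactly this form (for example, the same distribution for the yaw angle) should be checked against the paper.

```latex
\begin{aligned}
P_\Theta(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x - x_e)^2}{2\sigma^2}\right),
  \qquad P_D(x) = \delta(x - x_g), \\
D_{KL}(P_D \,\|\, P_\Theta)
  &= \int P_D(x)\log P_D(x)\,dx - \int P_D(x)\log P_\Theta(x)\,dx \\
  &= -H(P_D) - \log P_\Theta(x_g) \\
  &= \frac{(x_g - x_e)^2}{2\sigma^2} + \frac{1}{2}\log\sigma^2 + \frac{1}{2}\log 2\pi - H(P_D), \\
L_{reg} &= \frac{(x_g - x_e)^2}{2\sigma^2} + \frac{1}{2}\log\sigma^2
  = \frac{1}{2}e^{-\alpha}(x_g - x_e)^2 + \frac{1}{2}\alpha, \qquad \alpha = \log\sigma^2 .
\end{aligned}
```

The last two terms of the KL expression do not depend on the network's outputs, so they are dropped. Predicting $\alpha = \log\sigma^2$ instead of $\sigma$ avoids dividing by zero during training.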

The train/test pickle files in V2XSim 1.0

Hi, thanks for your project.

I noticed that CoAlign supports V2X-Sim 2.0, but I don't have enough disk space to store V2X-Sim 2.0.

Could you please provide the train/test pickle files in V2XSim1.0?

Thank you!

Are checkpoints available for models already trained on V2X-Sim 2.0?

Do you have checkpoint files for any models already trained on this dataset? I cannot train these models well on it, and the metrics look a bit strange.

About the absence of DeformableTransformer_backbone.py

Hi~
I noticed that line 11 of CoAlign/opencood/models/point_pillar_deform_transformer.py is:
from opencood.models.sub_modules.deformable_transformer_backbone import DeformableTransformerBackbone
But there is no deformable_transformer_backbone.py in CoAlign/opencood/models/sub_modules.
Where can I find deformable_transformer_backbone.py?
Thank you, and I look forward to your reply!

Question about t_matrix

Is the purpose of t_matrix to rotate the CAVs' features into the ego's coordinate frame?
Thank you!
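Assuming t_matrix is a pairwise homogeneous transform from a CAV's coordinate frame to the ego frame (an assumption; its exact normalization in CoAlign's code is not shown here), it does more than rotate: it applies a rotation and a translation, and the same mapping is what gets used to warp BEV features. A minimal 2D sketch in pure Python; se2_matrix, transform_point, and the (tx, ty, yaw) pose format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def se2_matrix(tx: float, ty: float, yaw: float) -> Matrix:
    """3x3 homogeneous matrix: rotate by yaw (radians), then translate by (tx, ty)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def transform_point(matrix: Matrix, point: Sequence[float]) -> Tuple[float, float]:
    """Map a 2D point through a 3x3 homogeneous matrix."""
    x, y = point
    px = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    py = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return px, py


if __name__ == "__main__":
    # Hypothetical CAV located 10 m ahead of the ego and rotated 90 degrees relative to it.
    t_cav_to_ego = se2_matrix(10.0, 0.0, math.pi / 2)
    print(transform_point(t_cav_to_ego, (1.0, 0.0)))  # approximately (10.0, 1.0)
```

Any error in either agent's pose shows up directly as an error in this matrix, which is the kind of misalignment CoAlign is designed to be robust to.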

Can the source code support the CoPerception-UAVs dataset?

The CoPerception-UAVs dataset is a bit hard to work with.

ValueError: '6853' is not in list

Hi, after spending an afternoon installing unfriendly dependencies from source, I ran into the following problem. Could you help me find the cause? Thanks.

ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data/xxx/miniconda3/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/data/xxx/miniconda3/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/xxx/miniconda3/envs/coalign/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/xxx/data/opencood/CoAlign-main/opencood/data_utils/datasets/intermediate_fusion_dataset.py", line 312, in __getitem__
    cur_agent_in_all_agent = [all_agent_id_list.index(cur_agent) for cur_agent in cur_agent_id_list] # indexing current agent in all_agent_id_list
  File "/data/xxx/data/opencood/CoAlign-main/opencood/data_utils/datasets/intermediate_fusion_dataset.py", line 312, in <listcomp>
    cur_agent_in_all_agent = [all_agent_id_list.index(cur_agent) for cur_agent in cur_agent_id_list] # indexing current agent in all_agent_id_list
ValueError: '6853' is not in list
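The error message quotes the ID as '6853', i.e. a string. One possible cause (an assumption, not verified against the dataset) is that cur_agent_id_list holds string IDs while all_agent_id_list holds integers, or that the two lists come from different sources, so list.index never matches. A hypothetical drop-in for the list comprehension on line 312 that compares IDs as strings and reports every missing ID at once; index_agents is an illustrative name, not part of CoAlign.

```python
from typing import Hashable, List, Sequence


def index_agents(cur_agent_id_list: Sequence[Hashable], all_agent_id_list: Sequence[Hashable]) -> List[int]:
    """Return the position of each current agent in all_agent_id_list, comparing IDs as strings."""
    position = {}
    for i, agent_id in enumerate(all_agent_id_list):
        position.setdefault(str(agent_id), i)  # keep the first occurrence, like list.index
    missing = [a for a in cur_agent_id_list if str(a) not in position]
    if missing:
        raise ValueError(
            f"agent ids {missing} not found in all_agent_id_list "
            f"(types seen: {sorted({type(a).__name__ for a in all_agent_id_list})})"
        )
    return [position[str(a)] for a in cur_agent_id_list]
```

If the error persists with this change, the ID is genuinely absent from all_agent_id_list, which would point at the data or its preprocessing rather than a type mismatch.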

The train/test pickle files in V2XSim 2.0

Thanks for your project.

I noticed that the code for generating the pkl files for V2X-Sim 2.0 is missing.

I used the file from #2 (comment), but it failed with:
AssertionError: Error: there are 47638 label files but 47300 lidarseg records.

The split file I used comes from the coperception project, with which I was able to successfully build the training and test sets.

The structure of my V2X-Sim 2.0 directory is:

            |-- v2.0
            |-- sweeps
            |-- maps
            |-- lidarseg
            |-- imu
            |-- gnss

Could you please provide the train/test pickle files in V2XSim2.0?

Thank you!
