upt's Introduction

Hi, it's Fred (张真) here 👋

I'm currently a postdoc at the Australian Institute for Machine Learning. Refer to my homepage for more details.

Connect with me:

FredericZhang on Google Scholar | YouTube | Twitter | LinkedIn | Instagram

upt's People

Contributors

fredzzhang, nikanor97

upt's Issues

HICO-DET training accuracy problem

With the training settings batch=4 and world_size=4, I get the following on the HICO-DET dataset: The mAP is 0.3098, rare: 0.2528, none-rare: 0.3268.
When the batch size is slightly greater than 16: The mAP is 0.3119, rare: 0.2539, none-rare: 0.3293.
When I test the UPT (resnet50) weights you released: The mAP is 0.3156, rare: 0.2560, none-rare: 0.3334.
I would like to fully reproduce your training performance. Could you give me some advice? Thank you!
Did I miss some settings?
nohup: ignoring input
Namespace(alpha=0.5, aux_loss=True, backbone='resnet50', batch_size=4, bbox_loss_coef=5, box_score_thresh=0.2, cache=False, clip_max_norm=0.1, data_root='./hicodet', dataset='hicodet', dec_layers=6, device='cuda', dilation=False, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=20, eval=False, fg_iou_thresh=0.5, gamma=0.2, giou_loss_coef=2, hidden_dim=256, lr_drop=10, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='checkpoints/upt-r50-hicodet', partitions=['train2015', 'test2015'], port='1234', position_embedding='sine', pre_norm=False, pretrained='checkpoints/detr-r50-hicodet.pth', print_interval=500, repr_dim=512, resume='', sanity=False, seed=66, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, weight_decay=0.0001, world_size=4)
Load weights for the object detector from checkpoints/detr-r50-hicodet.pth
=> Rank 0: start from a randomly initialised model
=> Rank 1: start from a randomly initialised model
=> Rank 3: start from a randomly initialised model
=> Rank 2: start from a randomly initialised model

suppress overconfident objects

Hi,

We found that "suppress overconfident objects" works on HICO-DET but hardly works on V-COCO.
Our work on HICO-DET (mAP):

backbone     λ=1.0    λ=1.9    λ=2.8
resnet50     32.10    33.44    33.63
resnet101    32.48    33.63    33.79

UPT on V-COCO (mAP):

backbone     λ=1.0    λ=2.8
resnet50     58.9     59.0
resnet101    60.7     60.7

Our current view is that this strategy works on HICO-DET and narrows the gap between resnet50 and resnet101.
Is there a unified explanation for what we observe on both HICO-DET and V-COCO?

UPT predicts scenario 1 of V-COCO

Hi,

Based on your tips, I solved the above problem.
I would like to know how UPT handles scenario 1 and scenario 2. In scenario 1, an occluded object needs to be predicted as [0,0,0,0]; in scenario 2, the object prediction is ignored. Can UPT predict [0,0,0,0] for an occluded object?

Thank you so much!
yaoyaosanqi.

environment

ModuleNotFoundError: No module named 'pocket.data'; 'pocket' is not a package
I have installed pocket using pip3 install pocket
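
The message "'pocket' is not a package" generally means that Python resolved the name to a single-file module, so nothing like pocket.data can live beneath it. Below is a minimal standard-library check of what a name resolves to in the active environment. The helper name describe_module is made up for this sketch, and whether the PyPI distribution called pocket is the library this repository expects is not confirmed anywhere in this thread.

```python
import importlib.util


def describe_module(name):
    """Report whether `name` is missing, a single-file module, or a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not installed"
    # Only packages carry submodule search locations; plain modules have None.
    if spec.submodule_search_locations is None:
        return f"single-file module at {spec.origin}"
    return f"package at {list(spec.submodule_search_locations)}"


# Run this in the same environment that raises the error.
print(describe_module("pocket"))
```

If the result is a single-file module, the installed distribution cannot provide pocket.data, and the repository's installation instructions are the place to look for the intended source.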

Some questions about the convergence

Hi,

Thanks for your great work, it inspired me a lot. I noticed that in your paper, the model converges to a good result within 20 epochs. Have you tried training the models for more epochs (e.g. 100 epochs or more) to get better results? To be honest, I really want to know the upper bound (the best results) of the model. For example, could the model reach a better mAP by training for more epochs or with a better-designed training strategy?

A question about v-coco dataset

Hi, thank you for your code. I have a question about the V-COCO dataset you implemented.

V-COCO is a subset of the COCO dataset and has 10,396 images (5,400 for training and 4,964 for testing), as stated in the paper.
However, when executing your code for the V-COCO dataset, I found len(trainset)==4969 and len(testset)==4532. The reported number of training images does not match the actual one.
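
One possible explanation, which this thread does not confirm, is that the dataset class counts only images with at least one annotated human-object pair. An annotation entry with empty boxes_h and boxes_o lists is quoted in the "VCOCO Scenario 1 and Scenario 2" issue on this page. The sketch below shows how such a filter would shrink the count. The second entry and the helper name count_with_pairs are invented for illustration.

```python
def count_with_pairs(annotations):
    """Count annotation entries that contain at least one human box."""
    return sum(1 for entry in annotations if len(entry["boxes_h"]) > 0)


annotations = [
    # Entry quoted elsewhere on this page: no boxes at all.
    {"boxes_h": [], "boxes_o": [], "actions": [], "objects": [],
     "file_name": "COCO_train2014_000000565694.jpg"},
    # Hypothetical entry with one human-object pair (values are made up).
    {"boxes_h": [[10, 20, 110, 220]], "boxes_o": [[50, 60, 90, 120]],
     "actions": [0], "objects": [2], "file_name": "hypothetical.jpg"},
]

print(len(annotations), "entries,", count_with_pairs(annotations), "with pairs")
```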

single video inference code?

Dear author,
Thanks for sharing this insightful work; it looks great. Before diving deeply into it, many researchers like me would first like to play with your models. I think an easy-to-use Python script for video inference would make it much more efficient to get to know your work. Thank you.

Validation set division on HICO-DET

Hi @fredzzhang,

We recently wanted to carve a validation set out of HICO-DET. We use the following code:
[image: code screenshot]

But we ran into a lot of obstacles:
(1) Run directly

Namespace(alpha=0.5, aux_loss=True, backbone='resnet50', batch_size=16, bbox_loss_coef=5, box_score_thresh=0.2, cache=False, clip_max_norm=0.1, data_root='./hicodet', dataset='hicodet', dec_layers=6, device='cuda', dilation=False, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=20, eval=True, fg_iou_thresh=0.5, gamma=0.2, giou_loss_coef=2, hidden_dim=256, lr_drop=10, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='checkpoints', partitions=['train2015', 'test2015'], port='1234', position_embedding='sine', pre_norm=False, pretrained='', print_interval=500, repr_dim=512, resume='/home/quan107552101247/upt/checkpoints/jokex2/ckpt_47040_20.pt', sanity=False, seed=66, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, weight_decay=0.0001, world_size=1)
=> Rank 0: continue from saved checkpoint /home/quan107552101247/upt/checkpoints/jokex2/ckpt_47040_20.pt
  0%|                                                  | 0/7527 [00:00<?, ?it/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/quan107552101247/upt/pocket/data/base.py", line 167, in __getattr__
    if hasattr(self.dataset, key):
  File "/home/quan107552101247/upt/pocket/data/base.py", line 167, in __getattr__
    if hasattr(self.dataset, key):
  File "/home/quan107552101247/upt/pocket/data/base.py", line 167, in __getattr__
    if hasattr(self.dataset, key):
  [Previous line repeated 993 more times]
RecursionError: maximum recursion depth exceeded
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/upt/pocket/data/base.py", line 167, in __getattr__
    if hasattr(self.dataset, key):
  File "/home/upt/pocket/data/base.py", line 167, in __getattr__
    if hasattr(self.dataset, key):
  File "/home/upt/pocket/data/base.py", line 167, in __getattr__
    if hasattr(self.dataset, key):
  [Previous line repeated 993 more times]
RecursionError: maximum recursion depth exceeded
  0%|                                                  | 0/7527 [00:05<?, ?it/s]
Traceback (most recent call last):
 ......
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/queue.py", line 179, in get
    self.not_empty.wait(remaining)
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 3874908) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
......
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
    idx, data = self._get_data()
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1024, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 885, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 3874908, 3875008) exited unexpectedly

(2) With num_workers=0

Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/upt/main.py", line 107, in main
    ap = engine.test_hico(test_loader)
  File "/usr/local/anaconda3/envs/pocket/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/upt/utils.py", line 170, in test_hico
    inputs = pocket.ops.relocate_to_cuda(batch[0])
  File "/home/upt/pocket/ops/relocate.py", line 63, in relocate_to_cuda
    return [relocate_to_cuda(item, ignore, device, **kwargs) for item in x]
  File "/home/upt/pocket/ops/relocate.py", line 63, in <listcomp>
    return [relocate_to_cuda(item, ignore, device, **kwargs) for item in x]
  File "/home/upt/pocket/ops/relocate.py", line 71, in relocate_to_cuda
    raise TypeError('Unsupported type of data {}'.format(type(x)))
TypeError: Unsupported type of data <class 'PIL.Image.Image'>

How do we solve it?

Thank you so much!
yaoyaosanqi.
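
On the RecursionError in obstacle (1): the traceback loops on `if hasattr(self.dataset, key)` inside `__getattr__` while a worker process unpickles the dataset. A common cause of this pattern is that the instance's attributes have not been restored yet, so looking up self.dataset triggers `__getattr__` again. The sketch below shows a guarded version. DatasetWrapper is a stand-in written for this example, not the class from pocket/data/base.py, and copy.copy is used because it rebuilds objects through the same `__reduce_ex__` and `__setstate__` lookups that pickle relies on.

```python
import copy


class DatasetWrapper:
    """Stand-in wrapper that forwards unknown attributes to an inner dataset."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getattr__(self, key):
        # Called only when normal lookup fails. While the object is being
        # rebuilt its __dict__ is still empty, so touching self.dataset here
        # would re-enter __getattr__ forever. Read __dict__ directly instead.
        if key.startswith("__") or "dataset" not in self.__dict__:
            raise AttributeError(key)
        return getattr(self.__dict__["dataset"], key)


wrapped = DatasetWrapper([10, 20, 30])
clone = copy.copy(wrapped)          # rebuilt without infinite recursion
print(len(clone), clone.count(10))  # count() is forwarded to the inner list
```

The TypeError in obstacle (2) is a separate problem and is not addressed by this sketch.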

KO mode?

Excuse me, I found that your paper reports the KO mode result for HICO-DET, but the code does not provide it. Could you give some pointers on how to evaluate it? Thank you.

list index out of range

Hi,

I am getting an error when running inference.py with the command given on the GitHub page.

[image: error screenshot]

torch.multiprocessing.spawn.ProcessRaisedException

Hi, @fredzzhang

I suddenly ran into the error below. With the same settings, it ran without problems before. Do you know why?

nohup: ignoring input
Namespace(alpha=0.5, aux_loss=True, backbone='resnet50', batch_size=8, bbox_loss_coef=5, box_score_thresh=0.2, cache=False, clip_max_norm=0.1, data_root='./hicodet', dataset='hicodet', dec_layers=6, device='cuda', dilation=False, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=30, eval=False, fg_iou_thresh=0.5, gamma=0.2, giou_loss_coef=2, hidden_dim=256, lr_drop=10, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='checkpoints/upt-r50-hicodet2244', partitions=['train2015', 'test2015'], port='3714', position_embedding='sine', pre_norm=False, pretrained='checkpoints/detr-r50-hicodet.pth', print_interval=500, repr_dim=512, resume='', sanity=False, seed=66, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, weight_decay=0.0001, world_size=2)
Load weights for the object detector from checkpoints/detr-r50-hicodet.pth
=> Rank 0: start from a randomly initialised model
=> Rank 1: start from a randomly initialised model
Traceback (most recent call last):
  File "main.py", line 220, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(args,))
  File "/home1/quan107552101247/.conda/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home1/quan107552101247/.conda/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home1/quan107552101247/.conda/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home1/quan107552101247/.conda/envs/pocket/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home1/quan107552101247/upt/main.py", line 137, in main
    engine(args.epochs)
  File "/home1/quan107552101247/spatially-conditioned-graphs/pocket/pocket/core/distributed.py", line 139, in __call__
    self._on_each_iteration()
  File "/home1/quan107552101247/upt/utils.py", line 139, in _on_each_iteration
    if loss_dict['interaction_loss'].isnan():
TypeError: list indices must be integers or slices, not str

yaoyaosanqi.

Reported results for SCG

Hi, Thank you for the great work!

In this paper, the reported result for SCG on HICO-DET is 29.26 mAP (full), while the result reported in the SCG paper was 31.33 mAP (full). Is the difference caused by the different backbones used (i.e. ResNet50-FPN vs. ResNet101)? Thank you!

Here is problem

Traceback (most recent call last):
  File "inference.py", line 225, in <module>
    main(args)
  File "C:\Users\User\anaconda3\envs\colab\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "inference.py", line 150, in main
    upt = build_detector(args, conversion)
  File "C:\Users\User\PycharmProjects\hoi\UPT\upt.py", line 268, in build_detector
    detr, _, postprocessors = build_model(args)
  File "C:\Users\User\PycharmProjects\hoi\UPT\detr\models\__init__.py", line 6, in build_model
    return build(args)
  File "C:\Users\User\PycharmProjects\hoi\UPT\detr\models\detr.py", line 313, in build
    num_classes = 20 if args.dataset_file != 'coco' else 91
AttributeError: 'Namespace' object has no attribute 'dataset_file'

The HOI loss is NaN for rank 0

Dear sir,
I followed the README to set up this UPT network, but when I run the command
python main.py --world-size 1 --dataset vcoco --data-root ./v-coco --partitions trainval test --pretrained ../detr-r50-vcoco.pth --output-dir ./upt-r50-vcoco.pt

I get an error:

Traceback (most recent call last):
  File "main.py", line 208, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(args,))
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/upload/main.py", line 125, in main
    engine(args.epochs)
  File "/root/pocket/pocket/pocket/core/distributed.py", line 139, in __call__
    self._on_each_iteration()
  File "/root/autodl-tmp/upload/utils.py", line 138, in _on_each_iteration
    raise ValueError(f"The HOI loss is NaN for rank {self._rank}")
ValueError: The HOI loss is NaN for rank 0

I tried training without the pretrained model and got the same error. I tried printing the loss, but it showed an empty tensor. As a beginner, I have no idea what happened. I would appreciate any help.
I look forward to your reply. Thank you very much.

confused about the vcoco dataset

The VCOCO dataset you implemented has some cool properties:
"object_to_action" gives the list of actions for each object, e.g. {1: [0, 3, 11, 15], 2: [0, 1, 2, 3, 11], ......}
"objects" returns the list of objects, e.g. ['background', 'person', 'bicycle', .......]
"actions" returns the list of actions, e.g. ['hold obj', 'sit instr', 'ride instr', .......]

However, I'm confused about the relationships among them:

  1. Which object does the key 1 in "1: [0, 3, 11, 15]", the first item of object_to_action, represent?
  2. Which actions do the values [0, 3, 11, 15] in "1: [0, 3, 11, 15]" represent?

According to the lists of actions and objects, actions 0, 3, 11 and 15 are hold obj, look obj, carry obj and cut obj respectively, while object 1 is person, which seems odd.
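
The reading used in the question can be written out as a small lookup. This sketch assumes, as the question does, that keys of object_to_action index into objects and values index into actions. The thread does not confirm that assumption. Only the entries quoted above are included, and the helper name decode is invented for this example.

```python
def decode(object_to_action, objects, actions):
    """Translate index-based object/action pairs into readable names."""
    return {
        objects[obj_idx]: [actions[a] for a in action_idxs]
        for obj_idx, action_idxs in object_to_action.items()
    }


# Only the entries quoted in this issue; the real lists are longer.
objects = ["background", "person", "bicycle"]
actions = {0: "hold obj", 1: "sit instr", 2: "ride instr",
           3: "look obj", 11: "carry obj", 15: "cut obj"}
object_to_action = {1: [0, 3, 11, 15], 2: [0, 1, 2, 3, 11]}

decoded = decode(object_to_action, objects, actions)
for name, verbs in decoded.items():
    print(name, "->", verbs)
```

Printing the decoded mapping next to the raw indices makes it easier to tell whether an offset (for example the 'background' entry at index 0) explains the odd-looking person row.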

About the evaluation code for HICO-DET

Thanks for your work.

Does the current evaluation code only support the Default setting on HICO-DET?

It seems that main.py has no option for choosing between the Default setting and the Known Objects setting.

Thanks!

Real-time measurement

Hi, thanks for your amazing work. I noticed that UPT reaches near real-time performance on a single GPU, with 24 FPS reported in the paper. I would like to know how this 24 FPS was measured. It would be very helpful for me. Thank you.
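
For reference, a generic way to time throughput is sketched below. It is not the procedure used for the 24 FPS figure, which this thread does not describe. The function name measure_fps and its parameters are made up for this example. The sync hook is there because GPU work is asynchronous; with PyTorch one would typically pass torch.cuda.synchronize.

```python
import time


def measure_fps(step, n_warmup=10, n_iters=50, sync=None):
    """Time `step()` over n_iters calls after n_warmup untimed calls."""
    for _ in range(n_warmup):
        step()              # warm-up calls are excluded from the timing
    if sync is not None:
        sync()              # wait for pending asynchronous work
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()              # make sure the timed work has finished
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


# Dummy workload standing in for one forward pass on one image.
fps = measure_fps(lambda: time.sleep(0.001), n_warmup=2, n_iters=20)
print(f"{fps:.1f} iterations per second")
```

Whether data loading and post-processing are inside step changes the number considerably, so it helps to state which stages were timed.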

list out of range and checkpoint's state_dict mismatch

Hello,
1) When I try the randomly initialised model, it runs into "list index out of range" at this line: target_cls_idx = [self.object_class_to_target_class[obj.item()]

2) Then I tried to use the pre-trained UPT checkpoint, but the state_dict does not match.

My args are configured correctly; the default value of args.dataset is hicodet.

So I wonder whether this is a bug or a problem on my side?

For 1)

python main.py --eval --backbone resnet101 --dilation --resume /path/to/model --data-root /storage/gaokaifeng

Namespace(alpha=0.5, aux_loss=True, backbone='resnet101', batch_size=2, bbox_loss_coef=5, box_score_thresh=0.2, cache=False, clip_max_norm=0.1, data_root='/storage/gaokaifeng', dataset='hicodet', dec_layers=6, device='cuda', dilation=True, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=20, eval=True, fg_iou_thresh=0.5, gamma=0.2, giou_loss_coef=2, hidden_dim=256, lr_backbone=1e-05, lr_drop=10, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='checkpoints', partitions=['train2015', 'test2015'], port='1234', position_embedding='sine', pre_norm=False, pretrained='', print_interval=500, repr_dim=512, resume='/path/to/model', sanity=False, seed=66, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, weight_decay=0.0001, world_size=1)
=> Rank 0: start from a randomly initialised model
  8%|████████████                                                                                                                                           | 766/9546 [01:29<17:09,  8.53it/s]
Traceback (most recent call last):
  File "main.py", line 210, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(args,))
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/gaokaifeng/project/upt/main.py", line 99, in main
    ap = engine.test_hico(test_loader)
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/gaokaifeng/project/upt/utils.py", line 169, in test_hico
    output = net(inputs)
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gaokaifeng/project/upt/upt.py", line 252, in forward
    logits, prior, bh, bo, objects, attn_maps = self.interaction_head(
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gaokaifeng/project/upt/interaction_head.py", line 366, in forward
    prior_collated.append(self.compute_prior_scores(
  File "/home/gaokaifeng/project/upt/interaction_head.py", line 260, in compute_prior_scores
    target_cls_idx = [self.object_class_to_target_class[obj.item()]
  File "/home/gaokaifeng/project/upt/interaction_head.py", line 260, in <listcomp>
    target_cls_idx = [self.object_class_to_target_class[obj.item()]
IndexError: list index out of range

For 2):

(torch111) gaokaifeng@server1:~/project/upt$ python main.py \
>         --data-root /storage/gaokaifeng \
>         --eval \
>         --backbone resnet101 \
>         --dilation \
>         --resume checkpoints/upt-r101-dc5-hicodet.pt
Namespace(alpha=0.5, aux_loss=True, backbone='resnet101', batch_size=2, bbox_loss_coef=5, box_score_thresh=0.2, cache=False, clip_max_norm=0.1, data_root='/storage/gaokaifeng', dataset='hicodet', dec_layers=6, device='cuda', dilation=True, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=20, eval=True, fg_iou_thresh=0.5, gamma=0.2, giou_loss_coef=2, hidden_dim=256, lr_backbone=1e-05, lr_drop=10, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='checkpoints', partitions=['train2015', 'test2015'], port='1234', position_embedding='sine', pre_norm=False, pretrained='', print_interval=500, repr_dim=512, resume='checkpoints/upt-r101-dc5-hicodet.pt', sanity=False, seed=66, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, weight_decay=0.0001, world_size=1)
=> Rank 0: continue from saved checkpoint checkpoints/upt-r101-dc5-hicodet.pt
Traceback (most recent call last):
  File "main.py", line 210, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(args,))
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/gaokaifeng/project/upt/main.py", line 76, in main
    upt.load_state_dict(checkpoint['model_state_dict'])
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UPT:
        size mismatch for detector.class_embed.weight: copying a param with shape torch.Size([81, 256]) from checkpoint, the shape in current model is torch.Size([92, 256]).
        size mismatch for detector.class_embed.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([92]).
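
The two mismatched sizes, 81 and 92, would fit a classification head with one extra output beyond the class count, i.e. 80 classes in the checkpoint and 91 in the freshly built model. That reading is an interpretation, not something stated in this thread. The sketch below lists such mismatches from plain name-to-shape mappings before load_state_dict is called. The helper name shape_mismatches is invented. With PyTorch, the mappings could be built as {k: tuple(v.shape) for k, v in state_dict.items()}.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for differing shapes."""
    return {
        name: (tuple(shape), tuple(model_shapes[name]))
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and tuple(shape) != tuple(model_shapes[name])
    }


# Shapes taken from the error message above.
checkpoint_shapes = {
    "detector.class_embed.weight": (81, 256),
    "detector.class_embed.bias": (81,),
}
model_shapes = {
    "detector.class_embed.weight": (92, 256),
    "detector.class_embed.bias": (92,),
}

for name, (ckpt, model) in shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```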

AttributeError: 'Namespace' object has no attribute 'lr_backbone'

Hello,
when I run python main.py --eval --backbone resnet101 --dilation --resume /path/to/model

it raises AttributeError: 'Namespace' object has no attribute 'lr_backbone'

It seems that the args needed to build DETR are not covered by the args in main.py.

Can you provide some instructions on how to combine the two (building DETR and running UPT)?

details:

(torch111) gaokaifeng@server1:~/project/upt$ python main.py --eval --backbone resnet101 --dilation --resume /path/to/model
Namespace(alpha=0.5, aux_loss=True, backbone='resnet101', batch_size=2, bbox_loss_coef=5, box_score_thresh=0.2, cache=False, clip_max_norm=0.1, data_root='./hicodet', dataset='hicodet', dec_layers=6, device='cuda', dilation=True, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=20, eval=True, fg_iou_thresh=0.5, gamma=0.2, giou_loss_coef=2, hidden_dim=256, lr_drop=10, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='checkpoints', partitions=['train2015', 'test2015'], port='1234', position_embedding='sine', pre_norm=False, pretrained='', print_interval=500, repr_dim=512, resume='/path/to/model', sanity=False, seed=66, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, weight_decay=0.0001, world_size=1)
Traceback (most recent call last):
  File "main.py", line 208, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(args,))
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/gaokaifeng/anaconda3/envs/torch111/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/gaokaifeng/project/upt/main.py", line 71, in main
    upt = build_detector(args, object_to_target)
  File "/home/gaokaifeng/project/upt/upt.py", line 268, in build_detector
    detr, _, postprocessors = build_model(args)
  File "/home/gaokaifeng/project/upt/detr/models/__init__.py", line 6, in build_model
    return build(args)
  File "/home/gaokaifeng/project/upt/detr/models/detr.py", line 320, in build
    backbone = build_backbone(args)
  File "/home/gaokaifeng/project/upt/detr/models/backbone.py", line 114, in build_backbone
    train_backbone = args.lr_backbone > 0
AttributeError: 'Namespace' object has no attribute 'lr_backbone'
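
As a stopgap, missing attributes can be filled in on the parsed arguments before they reach the model builder. This is a workaround sketch, not the repository's fix. The helper name with_defaults is invented, and the value 1e-05 is simply the one printed for lr_backbone in other Namespace dumps on this page.

```python
import argparse


def with_defaults(args, **defaults):
    """Set each default on `args` only if the attribute is missing."""
    for name, value in defaults.items():
        if not hasattr(args, name):
            setattr(args, name, value)
    return args


# Stand-in for the arguments parsed by main.py (only two fields shown).
args = argparse.Namespace(backbone="resnet101", dilation=True)
args = with_defaults(args, lr_backbone=1e-5, backbone="resnet50")

print(args.lr_backbone)  # newly added attribute
print(args.backbone)     # existing value is left untouched
```

A cleaner long-term option is to declare the option in the argument parser itself, so that it shows up in the printed Namespace and can be overridden from the command line.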

GFLOPs and params

Hi, Dr. Zhang,
We found that the paper "Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection" reports the following:
[image: table screenshot]
We are puzzled by the MACs (G) figures: UPT contains both DETR and an interaction head, yet the MACs (G) are very close.

We compared the number of parameters and GFLOPs for QPIC and UPT. The image size is [3,887,1055] and the test results are as follows:

QPIC: Parameters: 41.462M GFLOPs: 91.82
UPT: Parameters: 54.763M (interaction head 13.241M) GFLOPs: 91.91
(test with GeForce GTX 1080)

This looks similar to the table. Can you tell me what causes this?

Multiple loss training code

Hi @fredzzhang,

I want to try training with multiple losses. I found the relevant code and added a loss; it runs and no error is reported.

But how do I train with multiple losses properly and set a weight (hyperparameter) for each loss?

if self.training:
    interaction_loss = self.compute_interaction_loss(boxes, bh, bo, logits, prior, targets, pairwise_tokens_x_collated)
    interaction_x_loss = self.compute_interaction_x_loss(boxes, bh, bo, logits, prior, targets, pairwise_tokens_x_collated)
    loss_dict = dict(
        interaction_loss=interaction_loss,
        interaction_x_loss=interaction_x_loss
    )
    return loss_dict

def _on_each_iteration(self):
    loss_dict = self._state.net(
        *self._state.inputs, targets=self._state.targets)
    if loss_dict['interaction_loss'].isnan():
        raise ValueError(f"The HOI loss is NaN for rank {self._rank}")

    self._state.loss = sum(loss for loss in loss_dict.values())
    self._state.optimizer.zero_grad(set_to_none=True)
    self._state.loss.backward()
    if self.max_norm > 0:
        torch.nn.utils.clip_grad_norm_(self._state.net.parameters(), self.max_norm)
    self._state.optimizer.step()

yaoyaosanqi.
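
One common way to weight several losses is a weighted sum over the loss dictionary, with the weights as hyperparameters. The sketch below is one option under that assumption, not the repository's own mechanism. The helper name combine_losses and the weight 0.5 are made up. Plain floats are used so the example runs without PyTorch, but the same arithmetic works on scalar tensors.

```python
import math


def combine_losses(loss_dict, weights):
    """Weighted sum of named losses; unlisted losses get weight 1.0."""
    for name, value in loss_dict.items():
        # Check every loss term, not only 'interaction_loss'.
        if math.isnan(float(value)):
            raise ValueError(f"Loss '{name}' is NaN")
    return sum(weights.get(name, 1.0) * value for name, value in loss_dict.items())


loss_dict = {"interaction_loss": 2.0, "interaction_x_loss": 4.0}
weights = {"interaction_x_loss": 0.5}   # hypothetical weight for the new term

total = combine_losses(loss_dict, weights)
print(total)
```

In the training step quoted above, the result would take the place of the unweighted sum(loss for loss in loss_dict.values()), and the weights could be exposed as command-line arguments.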

error when test vcoco

I use python main.py --cache --dataset vcoco --data-root vcoco/ --partitions trainval test --output-dir vcoco-r50 --resume checkpoints/upt-r50-vcoco.pt to generate cache.pkl.
But an error is reported when I evaluate it.

The eval code is:

from vsrl_eval import VCOCOeval

vsrl_annot_file = 'data/vcoco/vcoco_val.json'
coco_file = 'data/instances_vcoco_all_2014.json'
split_file = 'data/splits/vcoco_val.ids'

vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)

det_file = '/media/ming-t/Deng/relation_mppe/HOI-UPT/vcoco-r50/cache.pkl'
vcocoeval._do_eval(det_file, ovr_thresh=0.5)

The error is:

loading annotations into memory...
Done (t=0.74s)
creating index...
index created!
loading vcoco annotations...
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)
  File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 194, in _do_eval
    self._do_agent_eval(vcocodb, detections_file, ovr_thresh=ovr_thresh)
  File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 417, in _do_agent_eval
    assert(np.amax(rec) <= 1)
  File "<__array_function__ internals>", line 180, in amax
  File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2793, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

How can I solve this?

Detection results, mAP calculation

Hi @fredzzhang ,

Suppose an image yields 15 detected human-object pairs with 114 valid verbs in total across those pairs. Even if all the correct interactions are predicted, a large number of the predicted triplets will be treated as negative samples. Does this affect the mAP? (QPIC outputs only 100 predictions.)

If all 114 predictions are taken as the final detection result, there is no way to tell exactly which interaction is taking place. It seems the actions have to be specified explicitly, as in the action visualisation in your inference code.

These questions came up while I was reading the code. I hope you can give me some guidance.

yaoyaosanqi.
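
A minimal sketch of the ranking argument behind the question above, assuming a generic all-point interpolated average precision (the repository's own evaluator may use a different AP variant, and the function name here is hypothetical). Under this definition, extra predictions only lower AP when they are ranked above a true positive; low-scored false positives that come after every true positive leave it unchanged.

```python
def average_precision(scores, labels, num_gt):
    """All-point interpolated AP for a single interaction class.

    scores: confidence of each predicted triplet
    labels: 1 if the prediction is matched to a ground truth, else 0
    num_gt: number of ground-truth instances of this class
    (Illustrative helper, not taken from the UPT code base.)
    """
    if num_gt == 0:
        raise ValueError("AP is undefined for a class without ground truth")
    # Rank predictions from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    precision, recall = [], []
    for rank, i in enumerate(order, start=1):
        tp += labels[i]
        precision.append(tp / rank)
        recall.append(tp / num_gt)
    # Interpolate: make precision non-increasing from left to right.
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    # Accumulate precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


# Two correct triplets only.
clean = average_precision([0.9, 0.8], [1, 1], num_gt=2)
# Same two, plus 100 low-scored false positives ranked after them.
padded = average_precision([0.9, 0.8] + [0.01] * 100, [1, 1] + [0] * 100, num_gt=2)
# One false positive ranked between the two true positives.
interleaved = average_precision([0.9, 0.85, 0.8], [1, 0, 1], num_gt=2)
```

In this toy setting, what matters is how the scores order the triplets, not how many triplets are emitted.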

VCOCO Scenario 1 and Scenario 2

Hi, @fredzzhang

V-COCO has two evaluation strategies, scenario 1 and scenario 2. I checked the V-COCO annotation file and found entries such as:
#{"boxes_h": [], "boxes_o": [], "actions": [], "objects": [], "file_name": "COCO_train2014_000000565694.jpg"}

Since UPT must detect human-object pairs, I would like to know how the scenario 1 and scenario 2 results are produced for V-COCO under supervised training. Could you point me to the relevant code?

Thank you so much!
yaoyaosanqi.

Generate the results on the friends.gif

Hello! Thank you for this amazing work! I am curious how you produced the inference results in demo_friends.gif, which show the names of the objects and the activities. Could you please explain how you achieved that? Thanks in advance.

NaN HOI loss during training

python main.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/upt-r50-hicodet

raise ValueError(f"The HOI loss is NaN for rank {self._rank}")
ValueError: The HOI loss is NaN for rank 0
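
A small diagnostic sketch for narrowing down an error like this one, assuming the loss terms have already been converted to plain Python floats (for example with .item() on each tensor). The helper name is hypothetical, and it only reports which terms are NaN or infinite; it does not explain why.

```python
import math


def find_nonfinite_terms(loss_dict):
    """Return the names of loss terms that are NaN or infinite.

    loss_dict maps a loss name to a number. Illustrative helper only,
    not part of the UPT code base.
    """
    return [
        name
        for name, value in loss_dict.items()
        if not math.isfinite(float(value))
    ]


# Example with made-up values.
bad_terms = find_nonfinite_terms(
    {"interaction_loss": float("nan"), "other_loss": 0.37}
)
```

Logging the offending term together with the iteration number makes it easier to tell whether the loss diverges gradually or fails on a specific batch.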

VCOCO evaluation

For V-COCO, you use the utilities provided by Gupta et al., but the test dataset in your code has 4532 images, so the resulting cache.pkl contains only 4532 outputs. However, v-coco/data/splits/vcoco_test.ids in those utilities lists 4946 images, which does not match 4532. Will this affect the final test results? Thanks.
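
One way to see exactly where the two counts differ is to compare the image ids in the split file with the ids that have an entry in the detection results. The sketch below assumes both have already been loaded into plain lists of integer ids; how they are stored in vcoco_test.ids and cache.pkl is not shown here, and the function name is hypothetical.

```python
def compare_image_ids(split_ids, detected_ids):
    """Report ids present in one collection but not the other.

    split_ids: image ids listed in the evaluation split
    detected_ids: image ids that have an entry in the detection results
    (Illustrative helper; loading the two files is left to the caller.)
    """
    split, detected = set(split_ids), set(detected_ids)
    return {
        "missing_from_detections": sorted(split - detected),
        "not_in_split": sorted(detected - split),
    }


# Example with made-up ids.
report = compare_image_ids(split_ids=[1, 2, 3, 4], detected_ids=[2, 3, 5])
```

Inspecting a few of the ids under missing_from_detections, for instance checking whether those images have any annotated human-object pairs, would show whether the gap is expected.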

There is still a problem.

Traceback (most recent call last):
  File "inference.py", line 225, in <module>
    main(args)
  File "C:\Users\User\anaconda3\envs\colab\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "inference.py", line 150, in main
    upt = build_detector(args, conversion)
  File "C:\Users\User\PycharmProjects\hoi\UPT\upt.py", line 276, in build_detector
    detr.backbone[0].num_channels,
  File "C:\Users\User\anaconda3\envs\colab\lib\site-packages\torch\nn\modules\module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DETRsegm' object has no attribute 'backbone'


I changed the torch version and also tried the Colab environment, but the problem still occurs in the same place.

If possible, could you share the full list of installed libraries, for example by running "pip freeze > requirements.txt"?

If it is not possible to disclose it externally, I would appreciate it if you could send it to [email protected].
