sandipan211 / zsd-sc-resolver

Stars: 21 · Watchers: 2 · Forks: 4 · Size: 78.82 MB

Resolving semantic confusions for improved zero-shot detection (BMVC 2022)

License: MIT License

Python 59.87% Shell 0.18% Cython 5.44% C 7.54% C++ 0.61% Jupyter Notebook 25.26% Dockerfile 0.01% Makefile 0.01% Batchfile 0.01% Cuda 1.08%
computer-vision conditional-gan deep-learning faster-rcnn multi-modal-learning object-detection pytorch-implementation triplet-loss zero-shot-learning zero-shot-object-detection

zsd-sc-resolver's People

Contributors

sandipan211

zsd-sc-resolver's Issues

low accuracy on unseen classes

Dear author,

I trained the model and got good results on COCO. With split = unseen, test accuracy for both seen and unseen classes can be obtained with GZSD. However, with the Pascal VOC config, do we need to train separately to get test accuracy on seen and unseen classes? When I trained with split = seen, testing reported accuracy only for the seen classes, and when I tried split = unseen, it reported accuracy only for the unseen classes (in both ZSD and GZSD).
Moreover, on custom data, mAP for the seen classes is very high, in the range of 60-70. While training the GAN (step 4), validation accuracy is higher (around 25.00), but test accuracy is very low (around 5.00).
Could you please suggest a way to fix the high accuracy on seen classes and the low accuracy on unseen classes at test time (even though validation accuracy is somewhat higher for the unseen classes)?

Thank you for your time and consideration.

class embedding vector for custom data

Dear @sandipan211,

I have a question about class embeddings. In the current repository, for MSCOCO, MSCOCO/fasttext.npy uses an 81 × 300 embedding, and for Pascal VOC, VOC/fasttext_synonym.npy uses a 21 × 300 embedding. This may be because the 81st entry in COCO and the 21st entry in VOC represent the background class.
I want to create a similar class embedding for custom data with 50 classes. In that case, is the class embedding (for instance, VOC/fasttext_synonym.npy) simply a numerical representation of the class names? Is a word-embedding model such as word2vec used only to map each class name (a string) to a 300-dimensional numerical vector?

The class embedding weights (such as fasttext.npy) are required to train the regressor, especially in step 3.
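Embedding files like these typically hold one word vector per class name. A minimal NumPy sketch of building such a matrix for a custom dataset, assuming you already have a word-to-vector lookup (e.g. loaded from fastText or word2vec). `build_class_embedding`, the averaging of multi-word names, and the choice of background vector are hypothetical illustrations, not the repository's method; compare the shape and row order with the shipped MSCOCO/fasttext.npy (via `np.load(...).shape`) before substituting your own file.

```python
import numpy as np


def build_class_embedding(class_names, word_vectors, dim=300, background_last=True):
    """Return a (len(class_names) + 1, dim) array: one row per class plus background.

    Hypothetical helper. `word_vectors` maps a lowercase word to a `dim`-d
    vector (e.g. loaded from fastText or word2vec). Multi-word names such as
    "traffic_light" are embedded as the mean of their word vectors.
    """
    rows = []
    for name in class_names:
        words = name.lower().replace("-", "_").split("_")
        vecs = [np.asarray(word_vectors[w], dtype=np.float32) for w in words]
        rows.append(np.mean(vecs, axis=0))
    emb = np.stack(rows)  # shape (C, dim)
    if emb.shape[1] != dim:
        raise ValueError(f"expected {dim}-d vectors, got {emb.shape[1]}-d")
    # Assumption: background is represented by the mean of the class vectors.
    background = emb.mean(axis=0, keepdims=True)
    parts = [emb, background] if background_last else [background, emb]
    return np.concatenate(parts, axis=0)


if __name__ == "__main__":
    # Toy 2-d vectors for illustration only
    toy = {"cat": [1.0, 0.0], "hot": [0.0, 2.0], "dog": [2.0, 2.0]}
    emb = build_class_embedding(["cat", "hot_dog"], toy, dim=2)
    print(emb.shape)  # (3, 2)
```

For a 50-class dataset with real 300-d vectors this yields a 51 × 300 array, which can then be written with `np.save` to a path of your choosing.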

Thank you for your time and consideration.

custom dataset tune

Hi @sandipan211, thanks for sharing the work and for replying to issues promptly! We are currently using the code to train on a custom dataset with about 10 seen classes and 5 unseen classes. Could you please give some suggestions on tuning the hyperparameters in this code? There are a lot of them, and we don't know how best to tune them for a new dataset.

Thanks!

Pascal VOC data split

Hi @sandipan211, thanks for sharing this work! I was recently trying to reproduce your results on PASCAL VOC but couldn't achieve results as good as yours. I would like to know how you divided the PASCAL VOC 2007 and 2012 datasets, to make sure I didn't make a mistake in this step. Thanks for your time and consideration.

num_classes for first step

What should num_classes be for the first step when training on seen data?

num_classes=81,

or,

num_classes=66,
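For reference, the two candidate values follow from how mmdetection counts background. In mmdetection 1.x, which this repository appears to build on (another issue on this page notes that its background label is 0, unlike mmdet 2.x), `num_classes` includes background: 81 for all 80 COCO classes, and 66 for a head covering only the 65 seen classes of the 65/15 split. A small sketch of that arithmetic; `mmdet_num_classes` is a hypothetical helper, and which value the step-1 config actually expects is not confirmed here.

```python
def mmdet_num_classes(num_object_classes: int, mmdet_major_version: int) -> int:
    """Value of `num_classes` for an mmdetection bbox head.

    mmdetection 1.x counts background as an extra class (label 0), so
    num_classes = object classes + 1. mmdetection 2.x excludes background
    (its label equals num_classes), so num_classes = object classes.
    """
    if mmdet_major_version == 1:
        return num_object_classes + 1
    if mmdet_major_version >= 2:
        return num_object_classes
    raise ValueError(f"unsupported mmdetection version: {mmdet_major_version}")


SEEN, UNSEEN = 65, 15  # COCO 65/15 split
print(mmdet_num_classes(SEEN, 1))           # head for seen classes only, mmdet 1.x
print(mmdet_num_classes(SEEN + UNSEEN, 1))  # head for all 80 classes, mmdet 1.x
```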

Do you have a minimal environment YAML file?

I am unable to recreate your environment from your zsd_environment.yml file; I get various dependency errors.

The environment file looks like an export of your full environment, so it probably lists many packages (e.g. Anaconda default installs) that aren't needed; cleaning up the environment file might help resolve the issue.

In other words, do you have an environment file that lists just the absolute minimum set of packages that are needed?

Custom Data

What steps should I take if I want to train with custom data? Thanks in advance.

inference to single image

Hi, thank you for your work.

Would it be possible to add inference code for a single custom image?
I am interested in conducting a qualitative evaluation of your work.

Some hesitations related to background class label index

Hi, thanks for your work. I ran into the same environment problem that nbkn865 described in #7, but I haven't tried your follow-up suggestion yet. Now I'm trying to port this work to the newer mmdet 2.x. However, mmdet 2.x differs in some respects; one is that the background label is not 0 but num_classes.

Does your method still work after the background label is changed?

Also, do these semantic vectors start with "background"? That is, is the index of the "background" vector (attribute) 0?

In mmdetection/tools/test.py, I found the following code:

 model.bbox_head.seen_bg_weight = torch.from_numpy(seen_bg_weight).cuda()
 model.bbox_head.seen_bg_bias = torch.from_numpy(seen_bg_bias).cuda()

but I think the model has no attributes named seen_bg_weight and seen_bg_bias. Do they refer to index 0 (the last index in the new mmdet) of the fc layer weights, or did you modify something?
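If the semantic vectors do put background at index 0 (the open question above, not confirmed here), porting to mmdet 2.x amounts to shifting object labels down by one and moving the background row to the end. A minimal NumPy sketch under that assumption; `bg_first_to_bg_last` is a hypothetical helper, and any classifier weights indexed by label (such as the fc-layer rows asked about above) would need the same reordering.

```python
import numpy as np


def bg_first_to_bg_last(labels, embeddings):
    """Convert labels and class embeddings from background-first to background-last.

    Hypothetical helper. Assumes `embeddings` holds the background vector in
    row 0 followed by the C object classes in label order 1..C (mmdet 1.x style).
    After conversion, objects are labelled 0..C-1 and background is C (mmdet 2.x style).
    """
    labels = np.asarray(labels)
    num_objects = embeddings.shape[0] - 1
    new_labels = np.where(labels == 0, num_objects, labels - 1)
    # Shift every row up by one so that row 0 (background) ends up last
    new_embeddings = np.roll(embeddings, -1, axis=0)
    return new_labels, new_embeddings


if __name__ == "__main__":
    emb = np.arange(8).reshape(4, 2)  # row 0 = background, rows 1-3 = objects
    labels, emb2 = bg_first_to_bg_last([0, 1, 3], emb)
    print(labels)  # [3 0 2]
```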

Looking forward to your answer, thanks!

Setting batch size

Is there an option to set batch size? I'm attempting to run an example on a single GPU and encountering memory issues.
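In mmdetection 1.x configs, the per-GPU batch size is set by the `imgs_per_gpu` key of the `data` dict (for example `data = dict(imgs_per_gpu=2, workers_per_gpu=2, ...)`), so lowering it is the usual way to reduce memory use; whether this repository's configs expose it the same way is an assumption. mmdetection's documentation also recommends scaling the learning rate linearly with the total batch size, relative to its default of 8 GPUs × 2 images. A small sketch of that arithmetic; `scaled_lr` is a hypothetical helper:

```python
def scaled_lr(base_lr, num_gpus, imgs_per_gpu, base_batch_size=16):
    """Linear scaling rule: scale the learning rate with the total batch size.

    mmdetection's default learning rates assume 8 GPUs x 2 images per GPU,
    i.e. a total batch size of 16.
    """
    return base_lr * (num_gpus * imgs_per_gpu) / base_batch_size


# Hypothetical single-GPU run with 1 image per GPU, starting from config lr 0.02
print(scaled_lr(0.02, num_gpus=1, imgs_per_gpu=1))
```

For example, a config lr of 0.02 (the value shown in the training log in another issue on this page) becomes 0.00125 for one GPU with one image per GPU.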

Do you still have the weight file obtained by executing the fifth step

Hello sir, do you still have the weight file produced by the fifth step, ./script/train_coco_generator_65_15.sh? I have been unable to reproduce your results even though I did not change any settings or parameters: in the GZSD setting my mAP differs from yours by 14%, and the other metrics are also quite different from yours. Could you share more details or the result file?
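+-------------+-------------+-------------+
| setting     | recall      | mAP         |
+-------------+-------------+-------------+
| ZSD         | 64.3 (65.1) | 18.5 (20.1) |
| GZSD seen   | 63.1 (58.6) | 24.1 (37.4) |
| GZSD unseen | 30.2 (64.0) | 15.6 (20.1) |
+-------------+-------------+-------------+
The results reported in your paper are in brackets.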

How to generate class embedding files?

Hi, thanks for your great work.
I am confused about how you generated the class embedding files (fastText, GloVe).
How does the index in the class embedding files match the class id?
Could you provide a little more detail about generating the class embedding files?
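One piece of the index question can be illustrated from mmdetection itself: COCO category ids are not contiguous (they run from 1 to 90 with gaps), and mmdetection 1.x's `CocoDataset` maps them to contiguous labels with `cat2label = {cat_id: i + 1 for i, cat_id in enumerate(self.cat_ids)}`, reserving label 0 for background. A minimal sketch of turning that mapping into embedding-row indices; `cat_id_to_embedding_row`, the ascending id order, and the background position are assumptions, and this repository may remap seen and unseen subsets differently.

```python
def cat_id_to_embedding_row(cat_ids, background_first):
    """Map COCO category ids to row indices of a class-embedding matrix.

    Mirrors mmdetection 1.x's cat2label (contiguous labels 1..C, label 0 =
    background), assuming the ids are taken in ascending order. Whether the
    embedding file puts background first or last is an assumption to check.
    """
    cat2label = {cat_id: i + 1 for i, cat_id in enumerate(sorted(cat_ids))}
    offset = 0 if background_first else -1
    return {cat_id: label + offset for cat_id, label in cat2label.items()}


print(cat_id_to_embedding_row([1, 2, 3, 5], background_first=False))
```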
Thanks!

The test result of detector trained only on seen classes

Hi, I ran a test with the epoch_12 checkpoint you provided in the README.
The result was as follows:
num_classes ----------- 80
+----------------+-------+--------+--------+-----------+----------+
| class | gts | dets | recall | precision | ap |
+----------------+-------+--------+--------+-----------+----------+
| person | 15697 | 198482 | 0.895 | 0.071 | 0.660139 |
| bicycle | 290 | 4338 | 0.628 | 0.042 | 0.294971 |
| car | 2392 | 47918 | 0.788 | 0.039 | 0.192860 |
| motorcycle | 118 | 2004 | 0.720 | 0.042 | 0.510870 |
| bus | 208 | 7563 | 0.707 | 0.019 | 0.196079 |
| truck | 779 | 11144 | 0.416 | 0.029 | 0.039854 |
| boat | 211 | 17642 | 0.616 | 0.007 | 0.174057 |
| traffic_light | 563 | 18209 | 0.650 | 0.020 | 0.173152 |
| fire_hydrant | 52 | 936 | 0.827 | 0.046 | 0.546072 |
| stop_sign | 47 | 1365 | 0.745 | 0.026 | 0.521156 |
| bench | 569 | 37449 | 0.489 | 0.007 | 0.049016 |
| bird | 235 | 31663 | 0.523 | 0.004 | 0.124819 |
| dog | 437 | 8283 | 0.787 | 0.042 | 0.178770 |
| horse | 35 | 621 | 0.629 | 0.035 | 0.453032 |
| sheep | 29 | 8115 | 0.310 | 0.001 | 0.022340 |
| cow | 37 | 1310 | 0.595 | 0.017 | 0.140441 |
| elephant | 6 | 288 | 0.500 | 0.010 | 0.500000 |
| giraffe | 7 | 377 | 0.857 | 0.016 | 0.563532 |
| backpack | 955 | 8777 | 0.466 | 0.051 | 0.054695 |
| umbrella | 320 | 6690 | 0.672 | 0.032 | 0.225762 |
| handbag | 1112 | 13397 | 0.424 | 0.035 | 0.082742 |
| tie | 160 | 7823 | 0.613 | 0.013 | 0.102578 |
| skis | 468 | 20539 | 0.560 | 0.013 | 0.044583 |
| sports_ball | 77 | 12894 | 0.545 | 0.003 | 0.035616 |
| kite | 31 | 11299 | 0.742 | 0.002 | 0.248010 |
| baseball_bat | 7 | 1684 | 0.857 | 0.004 | 0.139171 |
| baseball_glove | 3 | 1530 | 0.000 | 0.000 | 0.000000 |
| skateboard | 24 | 3004 | 0.833 | 0.007 | 0.605755 |
| surfboard | 14 | 8140 | 0.357 | 0.001 | 0.011808 |
| tennis_racket | 11 | 1705 | 0.455 | 0.003 | 0.200000 |
| bottle | 2375 | 21796 | 0.742 | 0.081 | 0.392130 |
| wine_glass | 820 | 3996 | 0.687 | 0.141 | 0.467318 |
| cup | 2730 | 14335 | 0.690 | 0.131 | 0.401259 |
| knife | 1180 | 11138 | 0.402 | 0.043 | 0.052995 |
| spoon | 999 | 16973 | 0.416 | 0.025 | 0.049059 |
| bowl | 1643 | 10953 | 0.733 | 0.110 | 0.374713 |
| banana | 183 | 7767 | 0.634 | 0.015 | 0.101670 |
| apple | 168 | 2932 | 0.429 | 0.025 | 0.085470 |
| orange | 171 | 2765 | 0.702 | 0.043 | 0.123477 |
| broccoli | 446 | 18614 | 0.749 | 0.018 | 0.117123 |
| carrot | 727 | 20541 | 0.707 | 0.025 | 0.060945 |
| pizza | 599 | 5304 | 0.755 | 0.085 | 0.421441 |
| donut | 197 | 7644 | 0.594 | 0.015 | 0.143520 |
| cake | 590 | 5982 | 0.629 | 0.062 | 0.243563 |
| chair | 2788 | 70764 | 0.655 | 0.026 | 0.197088 |
| couch | 340 | 2819 | 0.494 | 0.060 | 0.211933 |
| potted_plant | 609 | 25395 | 0.800 | 0.019 | 0.261423 |
| bed | 311 | 5294 | 0.707 | 0.042 | 0.216452 |
| dining_table | 1849 | 99306 | 0.787 | 0.015 | 0.192627 |
| tv | 934 | 10099 | 0.760 | 0.070 | 0.426263 |
| laptop | 775 | 3485 | 0.796 | 0.177 | 0.631668 |
| remote | 237 | 8653 | 0.608 | 0.017 | 0.129868 |
| keyboard | 784 | 6538 | 0.778 | 0.093 | 0.297669 |
| cell_phone | 448 | 7005 | 0.592 | 0.038 | 0.163251 |
| microwave | 83 | 624 | 0.687 | 0.091 | 0.593158 |
| oven | 144 | 6232 | 0.688 | 0.016 | 0.286438 |
| sink | 856 | 33101 | 0.822 | 0.021 | 0.255440 |
| refrigerator | 97 | 15373 | 0.732 | 0.005 | 0.300810 |
| book | 2632 | 51639 | 0.697 | 0.036 | 0.083469 |
| clock | 172 | 8461 | 0.680 | 0.014 | 0.196000 |
| vase | 249 | 3609 | 0.546 | 0.038 | 0.185588 |
| scissors | 52 | 2071 | 0.404 | 0.010 | 0.115084 |
| teddy_bear | 130 | 2989 | 0.838 | 0.036 | 0.415706 |
| toothbrush | 137 | 0 | 0.000 | 0.000 | 0.000000 |
+----------------+-------+--------+--------+-----------+----------+
| mean | | | 0.627 | | 0.238852 |
+----------------+-------+--------+--------+-----------+----------+
+---------------+------+------+--------+-----------+----------+
| class | gts | dets | recall | precision | ap |
+---------------+------+------+--------+-----------+----------+
| airplane | 1444 | 0 | 0.000 | 0.000 | 0.000000 |
| train | 1602 | 0 | 0.000 | 0.000 | 0.000000 |
| parking_meter | 510 | 0 | 0.000 | 0.000 | 0.000000 |
| cat | 1669 | 0 | 0.000 | 0.000 | 0.000000 |
| bear | 462 | 0 | 0.000 | 0.000 | 0.000000 |
| suitcase | 2219 | 0 | 0.000 | 0.000 | 0.000000 |
| frisbee | 935 | 0 | 0.000 | 0.000 | 0.000000 |
| snowboard | 793 | 0 | 0.000 | 0.000 | 0.000000 |
| fork | 1775 | 0 | 0.000 | 0.000 | 0.000000 |
| sandwich | 1457 | 0 | 0.000 | 0.000 | 0.000000 |
| hot_dog | 1009 | 0 | 0.000 | 0.000 | 0.000000 |
| toilet | 1462 | 0 | 0.000 | 0.000 | 0.000000 |
| mouse | 850 | 0 | 0.000 | 0.000 | 0.000000 |
| toaster | 78 | 0 | 0.000 | 0.000 | 0.000000 |
| hair_drier | 74 | 0 | 0.000 | 0.000 | 0.000000 |
+---------------+------+------+--------+-----------+----------+
| mean | | | 0.000 | | 0.000000 |
+---------------+------+------+--------+-----------+----------+
+------+--+--+-------+--+----------+
| mean | | | 0.627 | | 0.238852 |
+------+--+--+-------+--+----------+
| mean | | | 0.000 | | 0.000000 |
+------+--+--+-------+--+----------+
mAP is : 0.19349995255470276

The number of detected bounding boxes for "toothbrush" is zero, which doesn't look right. I think I might have done something wrong.
Is this result the same as yours?

Training on custom data

Dear @sandipan211,

I have one more question. I got good results on MSCOCO, and now I want to train on custom data. In the training steps, MSCOCO/fasttext.npy is required for MSCOCO, and /workspace/arijit_ug/sushil/zsd/VOC/fasttext_synonym.npy is required for Pascal VOC, in steps 4 and 5.

Is this file created during training in steps 1-3? If so, similar .npy class-embedding files could be created for custom data, and I could then run steps 4 and 5 to complete training on custom data.
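Whatever produces a custom embedding file, a quick check of its shape and contents before steps 4 and 5 can catch mismatches early. A minimal NumPy sketch; `check_class_embedding` is hypothetical, and the (classes + 1, 300) layout with one background row is an assumption to verify against the shipped MSCOCO/fasttext.npy.

```python
import numpy as np


def check_class_embedding(emb, num_object_classes, dim=300):
    """Sanity-check a class-embedding array before it is used in training.

    Hypothetical helper. Assumes one row per object class plus one background
    row, i.e. shape (num_object_classes + 1, dim).
    """
    emb = np.asarray(emb)
    expected = (num_object_classes + 1, dim)
    if emb.shape != expected:
        raise ValueError(f"shape {emb.shape} != expected {expected}")
    if not np.all(np.isfinite(emb)):
        raise ValueError("embedding contains NaN or inf values")
    # Two classes with identical vectors cannot be told apart semantically
    if np.unique(emb, axis=0).shape[0] != emb.shape[0]:
        raise ValueError("at least two rows are identical")
    return True


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(check_class_embedding(rng.normal(size=(51, 300)), num_object_classes=50))
```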

Thank you for your time and consideration.

NaN metrics during Epoch 1 on coco2014

Is this normal behavior during epoch 1 when training the backbone on the coco2014 dataset?

2022-12-20 14:32:29,144 - INFO - Epoch [1][25500/61598] lr: 0.02000, eta: 5 days, 12:43:05, time: 0.658, data_time: 0.009, memory: 3162, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 0.0000, loss_bbox: nan, loss: nan
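To find when the losses first became NaN, the run's text log can be scanned for the first iteration whose total loss is not finite. A small stdlib-only sketch, assuming every log line follows the format of the line above; `first_nan_iteration` is a hypothetical helper, not part of the repository, and can be fed an open log file directly since it iterates over lines.

```python
import math
import re

# Patterns for an mmdetection text-log line like the one above (assumed format)
EPOCH_RE = re.compile(r"Epoch \[(\d+)\]\[(\d+)/\d+\]")
TOTAL_LOSS_RE = re.compile(r"\bloss: ([^,\s]+)")  # total loss only, not loss_rpn_cls etc.


def first_nan_iteration(lines):
    """Return (epoch, iteration) of the first line whose total loss is NaN or inf."""
    for line in lines:
        epoch = EPOCH_RE.search(line)
        loss = TOTAL_LOSS_RE.search(line)
        if epoch and loss and not math.isfinite(float(loss.group(1))):
            return int(epoch.group(1)), int(epoch.group(2))
    return None


# Shortened version of the log line above
sample = ("2022-12-20 14:32:29,144 - INFO - Epoch [1][25500/61598] lr: 0.02000, "
          "loss_rpn_cls: nan, acc: 0.0000, loss_bbox: nan, loss: nan")
print(first_nan_iteration([sample]))  # (1, 25500)
```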

Confusion about validation metrics for zero-shot detection.

Hi, I am now a bit confused about validation metrics for zero-shot detection.

My understanding is that ZSD loads only the unseen classes at test time, so setting classes_to_load to unseen is enough. With that, I reproduced the results you reported on the COCO and VOC datasets.

How do I test the GZSD setting? Does GZSD mean loading all classes at test time (classes_to_load = all)? In mmdetection/tools/test.py, the following code may set up the GZSD test on COCO:

 if cfg.test_cfg.rcnn.gzsd and hasattr(dataset, 'cat_ids'):
     dataset.cat_to_load = dataset.cat_ids

When I test GZSD (with --gzsd) on VOC, I can't get a GZSD result. When I test GZSD on the COCO dataset, I get a worse result that differs from your report.
So, for GZSD, how should I load the correct test data for both the VOC and COCO datasets?
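In the usual GZSD protocol, the detector predicts over all classes on a test set containing both seen and unseen objects, and the per-class APs are then averaged separately over seen and unseen classes (the two "mean" rows in the evaluation output earlier on this page have this form). A minimal sketch of that final averaging step with toy values; `split_map` is hypothetical, is not the repository's evaluation code, and does not settle how the repository's loader should be configured.

```python
def split_map(ap_per_class, unseen_classes):
    """Average per-class AP separately over seen and unseen classes.

    `ap_per_class` maps class name -> AP from a single evaluation over all
    classes; `unseen_classes` is the set of unseen class names.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    seen = [ap for cls, ap in ap_per_class.items() if cls not in unseen_classes]
    unseen = [ap for cls, ap in ap_per_class.items() if cls in unseen_classes]
    return mean(seen), mean(unseen)


# Toy values for illustration only
print(split_map({"a": 0.5, "b": 0.3, "c": 0.1}, unseen_classes={"c"}))
```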

I hope you can clear up my confusion. Thank you very much!
