kdexd / virtex
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
Home Page: http://kdexd.xyz/virtex
License: MIT License
Hi, I've got the following error:
File "scripts/eval_captioning.py", line 114, in <module>
    main(_A)
File "scripts/eval_captioning.py", line 76, in main
    val_batch[key] = val_batch[key].to(device)
AttributeError: 'list' object has no attribute 'to'
I've made the following change in the code to run the script without the error:
for val_iteration, val_batch in enumerate(val_dataloader, start=1):
    # for key in val_batch:
    #     print(val_batch[key])
    val_batch["image"] = val_batch["image"].to(device)
Hi, thank you so much for sharing this code. It is very helpful.
However, I am confused about the data preprocessing configuration. The config files specify Caffe-style image mean and std, but it seems they are not used in the code. Instead, the code seems to hard-code torchvision-style mean and std (here). Can you confirm that both pretraining and fine-tuning use the latter?
Furthermore, I am not sure whether the images are in the 0-255 range or 0-1. For Caffe-style mean and std it should be 0-255, but with your hard-coded mean and std it should be 0-1. However, I noticed you use OpenCV to load images, which loads them in 0-255, and I did not find anywhere in the code where they are scaled to 0-1, except in supervised pretraining (here).
Could you please comment on these issues? It is especially important that the configuration is identical across pretraining and all downstream settings. Since you fine-tune all layers and don't freeze the stem, such inconsistencies are hard to notice, because fine-tuning would compensate for them to some extent.
Thank you so much.
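For context on the two conventions under discussion, here is a minimal illustration (the ImageNet statistics below are the standard values, an assumption about what the configs intend, not a quote from the repo):

import numpy as np

bgr_image = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)  # as cv2 loads it

# Caffe-style: BGR channel order, 0-255 range, per-channel mean subtraction.
caffe_mean = np.array([103.53, 116.28, 123.675])
caffe_norm = bgr_image - caffe_mean

# torchvision-style: RGB channel order, scaled to 0-1, then normalized per channel.
rgb_image = bgr_image[..., ::-1] / 255.0
tv_mean = np.array([0.485, 0.456, 0.406])
tv_std = np.array([0.229, 0.224, 0.225])
tv_norm = (rgb_image - tv_mean) / tv_std

Mixing the two (e.g. torchvision-style mean/std applied to 0-255 BGR inputs) produces activations on a very different scale, which is exactly the inconsistency being asked about.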
Great project in the right direction, i.e. getting about the same results with less compute.
In your paper you mention that you discard the text head and use only the visual backbone, and that future research could leverage the text head. Do you think it could develop performance comparable to BERT-like models trained on text corpora?
Also, I'm very interested in using VirTex for downstream problems on datasets such as Conceptual Captions. Did you try, or would you estimate, performance improvements from VirTex's visual backbone + BERT, or VirTex's visual and textual heads together (not sure the latter would work), over ViLBERT or VisualBERT?
Hi,
I would like to evaluate your work on a single image for image captioning. Can you tell me the steps to follow for a single input? For instance, given a folder of images, how would I use your model for inference only on that folder?
Looking at the captioning task in your description, I am not sure how to go about using my own dataset to evaluate the model.
Thanks
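(For reference: the command pattern quoted later in this thread for scripts/eval_captioning.py points --data-root at an arbitrary image directory, which should cover this use case; the paths below are placeholders. A folder containing a single image should work the same way.)

python scripts/eval_captioning.py \
    --config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
    --checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
    --data-root /path/to/images_dir \
    --output /path/to/save/predictions.json \
    --num-gpus-per-machine 1 \
    --cpu-workers 4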
"😵 Uh oh! This model can't be run on Replicate because it was built with a version of Cog that is no longer supported."
https://replicate.com/kdexd/virtex-image-captioning
When I was doing the pretraining, this error showed up:
lmdb.Error: datasets/coco/serialized_train.lmdb: No such file or directory
I couldn't find any instruction that generates this file. Has anyone else had this problem?
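For anyone stuck here: the file is presumably produced by a preprocessing step before pretraining. As a rough illustration only (hypothetical; not the repository's actual script or its exact key format), serializing images into an LMDB looks like this:

import glob
import lmdb

env = lmdb.open("datasets/coco/serialized_train.lmdb", map_size=1 << 40)
with env.begin(write=True) as txn:
    for idx, path in enumerate(sorted(glob.glob("datasets/coco/train2017/*.jpg"))):
        with open(path, "rb") as f:
            # Store raw JPEG bytes under a sequential key.
            txn.put(str(idx).encode("ascii"), f.read())
env.close()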
import virtex.model_zoo as mz
model = mz.get("width_ablations/bicaptioning_R_50_L1_H2048.yaml", pretrained=True)
When I run this script I get the following HTTP Error:
HTTPError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 import virtex
      2 import virtex.model_zoo as mz
----> 3 model = mz.get("width_ablations/bicaptioning_R_50_L1_H2048.yaml", pretrained=True)

/content/virtex/virtex/model_zoo/model_zoo.py in get(config_path, pretrained)
     98     checkpoint_url,
     99     dir=os.path.expanduser("~/.torch/virtex_cache"),
--> 100     filename=os.path.basename(config_path).replace(".yaml", ".pth")
    101 )
    102 CheckpointManager(model=model).load(checkpoint_path)

/usr/local/lib/python3.6/dist-packages/fvcore/common/download.py in download(url, dir, filename, progress)
     56     unit="B", unit_scale=True, miniters=1, desc=filename, leave=True
     57 ) as t:
---> 58     tmp, _ = request.urlretrieve(url, filename=tmp, reporthook=hook(t))

[... /usr/lib/python3.6/urllib/request.py redirect-handling frames (urlopen, open, http_response, error, _call_chain, http_error_302) repeat several times ...]

/usr/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    648 class HTTPDefaultErrorHandler(BaseHandler):
    649     def http_error_default(self, req, fp, code, msg, hdrs):
--> 650         raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 404: Not Found
I think it's because the URL is not being generated correctly; there is an extra '/':
Failed to download https://umich.box.com/shared/static//fm1nq819q74vr0kqcd3gkivlzf06xvko.pth
Hi,
I saw that in SentencePieceTrainer, as below, you disable the default BOS/EOS IDs and declare [SOS], [EOS], and [MASK] as control symbols:
" --bos_id=-1 --eos_id=-1" " --control_symbols=[SOS],[EOS],[MASK]"
However, during captioning you define
sos_index: int = 1, eos_index: int = 2,
I am wondering whether these two setups are consistent, and whether the mismatch has any effect?
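My understanding (an assumption, not confirmed by the authors): with --bos_id=-1 --eos_id=-1, SentencePiece disables its built-in BOS/EOS pieces and assigns the control symbols the IDs immediately after <unk> (ID 0), so [SOS]=1, [EOS]=2, [MASK]=3, which would be consistent with sos_index=1 and eos_index=2. A quick check against a trained vocab (the path assumes the setup instructions in this thread):

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("datasets/vocab/coco_10k.model")
for piece in ["<unk>", "[SOS]", "[EOS]", "[MASK]"]:
    # Expected (if my reading is right): 0, 1, 2, 3
    print(piece, sp.piece_to_id(piece))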
Sorry to bother you, but I ran into this problem and cannot find a way to fix it.
It happens when I train the base VirTex model.
I have updated cuDNN to version 8.0.3 (the previous version was 7.6.5); both versions give this error.
Dear authors,
Thank you very much for providing the code and instructions for using this library! I tried to follow the instructions from http://kdexd.xyz/virtex/virtex/usage/setup_dependencies.html:
cd virtex
conda create -n virtex python=3.6
conda activate virtex
pip install -r requirements.txt
I ran into a problem: numpy was not found when I ran pip install -r requirements.txt. I fixed it by running conda install numpy before that. Maybe you could add numpy to requirements.txt so others can use the same version you used.
Many thanks,
George Batchkala
Is it possible to use this model to caption a single image in code, for example in Google Colab, without running the script?
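A rough sketch of what that might look like (an assumption pieced together from mz.get and the model(val_batch) call quoted elsewhere in this thread; the resizing and missing normalization are placeholders, not the repo's exact preprocessing pipeline):

import cv2
import torch
import virtex.model_zoo as mz

model = mz.get("width_ablations/bicaptioning_R_50_L1_H2048.yaml", pretrained=True)
model.eval()

image = cv2.imread("example.jpg")          # BGR, uint8, HWC
image = cv2.resize(image, (224, 224))      # size is a guess, not the configured value
tensor = torch.from_numpy(image).permute(2, 0, 1).float().unsqueeze(0)

with torch.no_grad():
    output_dict = model({"image": tensor})  # the eval script calls model(val_batch)
# output_dict should contain predicted token IDs; decode them with the
# SentencePiece vocab (datasets/vocab/coco_10k.model), as the eval script does.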
Hi,
I want to reproduce your pre-training result. An accident interrupted my training, and I resumed it with the flag "--resume-from", but it behaves strangely: the training and validation loss jumped dramatically at the beginning and then decreased, which suggests a problem with the restoring. Could you help me with this?
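One common cause of this symptom (an assumption, not a confirmed diagnosis of this repo) is restoring model weights without the optimizer and LR-scheduler state. A self-contained sketch of a complete save/restore cycle in plain PyTorch, with toy stand-ins for the real objects:

import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

# Save everything needed to resume smoothly:
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
    "iteration": 100,
}, "checkpoint.pth")

# Restoring only "model" resets optimizer momentum and the LR schedule,
# which can make the loss spike right after resuming:
ckpt = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
start_iteration = ckpt["iteration"]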
Hi there,
I am aware that VirTex uses image captioning as a pretraining task rather than as the final goal, but I was wondering whether one could fine-tune the pretrained model (e.g. bicaptioning_R_50_L1_H2048) on additional COCO-Captions-like data to get an improved captioning model.
Has anyone tried that, or does anyone have suggestions on how to do it? Can any of the scripts in the repository be used or adapted for fine-tuning existing models?
Thanks a lot! :)
Hi, this is really great work and a clear codebase.
However, I hit a problem when evaluating image captioning on COCO Captions val2017. It said "Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer".
I have downloaded CoreNLP and SPICE with your script download_spice.sh. Besides, I also set the path in my bashrc file, that is:
export CLASSPATH="$CLASSPATH:/path/to/virtex/virtex/utils/assets/SPICE-1.0/lib/stanford-corenlp-3.6.0.jar:/path/to/virtex/virtex/utils/assets/SPICE-1.0/lib/stanford-corenlp-3.6.0-models.jar"
for file in `find /path/to/virtex/virtex/utils/assets/SPICE-1.0/lib -name "*.jar"`; do export CLASSPATH="$CLASSPATH:`realpath $file`"; done
I could run
java -mx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -outputFormat json -file input.txt
successfully with sentences in input.txt.
In addition, if I don't calculate metrics, i.e. run without --calc-metrics, it works fine.
Do you have any idea about this problem?
Hi, thanks for the awesome code base!
I'm looking to produce visualizations of decoder attention weights similar to those shown in the paper, but I don't think this feature is implemented in the published code (although I may have overlooked it!).
As best I can tell, this would be done with a new TransformerDecoderLayer that returns the multi-head attention's attn_output_weights from its forward method; the visualized attention weights when predicting a single token would then be the average of these weights across all heads. The problem I am finding is that the visualized weights mostly concentrate in the center of the image during captioning on the COCO dataset, whereas the results in the paper show reasonable variation in these weights as tokens are predicted.
Is this the method that you used to create the visualization? Any insight into how this was previously done would be appreciated!
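For reference, a minimal sketch of the approach described above (my reading, not the authors' confirmed method): PyTorch's nn.MultiheadAttention already returns head-averaged attention weights when need_weights=True. The shapes below are illustrative, assuming a 7x7 feature grid from the backbone:

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8)

tgt = torch.randn(10, 1, 512)   # (caption_len, batch, dim): token features
mem = torch.randn(49, 1, 512)   # (H*W, batch, dim): image features, 7x7 grid

out, weights = attn(tgt, mem, mem, need_weights=True)
print(weights.shape)            # (batch, caption_len, 49): head-averaged weights
# Reshape the 49 weights of a given token to 7x7 and upsample over the input
# image to get a per-token heatmap like the ones in the paper.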
I tried running this in Colab environment.
Got the below error:
KeyError                                  Traceback (most recent call last)
<ipython-input-5-e8ec27705300> in <module>()
      1 import torch
      2 # model = torch.hub.load('pytorch/vision:v0.9.0', 'alexnet', pretrained=True)
----> 3 model = torch.hub.load("kdexd/virtex", "resnet50", pretrained=True)
      4 model.eval()

/root/.cache/torch/hub/kdexd_virtex_master/hubconf.py in resnet50(pretrained, **kwargs)
     31     "https://umich.box.com/shared/static/gsjqm4i4fm1wpzi947h27wweljd8gcpy.pth",
     32     progress=False,
---> 33     )["model"]
     34 )
     35 return model

KeyError: 'model'
Can you let me know the fix ?
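A quick diagnostic sketch (assuming the URL in the traceback is still reachable): download the checkpoint directly and inspect its top-level keys, to see whether a "model" entry is present at all or the weights have moved under a different key:

import torch

url = "https://umich.box.com/shared/static/gsjqm4i4fm1wpzi947h27wweljd8gcpy.pth"
ckpt = torch.hub.load_state_dict_from_url(url, progress=False, map_location="cpu")
print(list(ckpt.keys()))  # hubconf.py expects a "model" key here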
Hi,
Thanks for the nice paper and clear code!
I found that the models are set with .train() in clf_linear.py. Thus the running averages (i.e., the states) of the BatchNorm layers are updated while training on the ImageNet dataset (via calls to the forward function), so the backbone does not seem to be fully frozen. Is this a deliberate design choice for this fine-tuning task?
Best,
Hao
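To illustrate the point being asked about, here is a sketch of a fully frozen backbone for linear probing (plain torchvision ResNet-50 as a stand-in, not the repo's code):

import torch
import torchvision

backbone = torchvision.models.resnet50()
backbone.fc = torch.nn.Identity()
head = torch.nn.Linear(2048, 1000)

for p in backbone.parameters():
    p.requires_grad_(False)
backbone.eval()   # keeps BatchNorm running stats fixed during forward passes

# Only head is trained. Calling backbone.train() instead would still update
# the BN running averages even with gradients disabled, which is the
# partial-freezing behavior described above.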
Hello
I'm trying to perform inference on new images.
I've followed the setup instructions here.
This is the command I'm running:
python scripts/eval_captioning.py \
--config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
--checkpoint-path /tmp/.torch/virtex_cache/bicaptioning_R_50_L1_H2048.pth \
--data-root /path/to/images_dir \
--output /path/to/save/predictions.json \
--num-gpus-per-machine 1 \
--cpu-workers 4
This is my stacktrace:
** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath
2020-12-16 16:34:21.685 | INFO | virtex.utils.checkpointing:load:156 - Rank 0: Loading checkpoint from /tmp/.torch/virtex_cache/bicaptioning_R_50_L1_H2048.pth
2020-12-16 16:34:22.134 | INFO | virtex.utils.checkpointing:load:166 - Rank 0: Loading model from /tmp/.torch/virtex_cache/bicaptioning_R_50_L1_H2048.pth
Traceback (most recent call last):
File "scripts/eval_captioning.py", line 114, in <module>
main(_A)
File "scripts/eval_captioning.py", line 78, in main
output_dict = model(val_batch)
File "/yonatan/virtex/virtex_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/yonatan/virtex/virtex/models/captioning.py", line 176, in forward
start_predictions, beam_search_step
File "/yonatan/virtex/virtex/utils/beam_search.py", line 250, in search
predictions[timestep].gather(1, cur_backpointers).unsqueeze(2)
RuntimeError: gather_out_cuda(): Expected dtype int64 for index
This is my system (Linux):
(virtex_env) (base) [p virtex]$ python
Python 3.6.10 |Anaconda, Inc.| (default, Jan 7 2020, 21:14:29)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0'
This is my datasets structure:
(virtex_env) (base) [p virtex]$ ls datasets/
coco vocab
(virtex_env) (base) [p virtex]$ ls datasets/vocab/
coco_10k.model coco_10k.vocab
(virtex_env) (base) [p virtex]$ ls datasets/coco/
annotations serialized_train.lmdb serialized_train.lmdb-lock serialized_val.lmdb serialized_val.lmdb-lock train2017 val2017
What am I missing?
Thanks
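Not an authoritative fix, but the error says the index tensor passed to gather() is not int64, so casting cur_backpointers with .long() in virtex/utils/beam_search.py might be a workaround on PyTorch 1.7.0. A minimal self-contained reproduction of the dtype requirement:

import torch

predictions = torch.randn(4, 5)
backpointers = torch.zeros(4, 1, dtype=torch.int32)

# predictions.gather(1, backpointers)             # RuntimeError: Expected dtype int64
out = predictions.gather(1, backpointers.long())  # casting the index works
print(out.shape)                                  # torch.Size([4, 1])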
Hi,
I am trying to download the pretrained model for image captioning, but the download link has been removed. Could you please update the download link?
python scripts/eval_captioning.py \
    --config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
    --checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
    --data-root /path/to/images_dir \
    --output /path/to/save/predictions.json \
    --num-gpus-per-machine 1 \
    --cpu-workers 4
Hi, I am wondering where I can download pretrain_config.yaml and the checkpoint file. I cannot find the link in the GitHub README. Is retraining the whole model the only way to get the pre-trained model?
Cheers,
Hi, nice work - thanks for sharing the code.
I'm trying to reproduce the CIDEr & SPICE results as they appear in Figure 4 of your paper.
I simply load the pre-trained models (specifically, those corresponding to H=1024 and H=2048 in the width ablation) and run eval_captioning.py after building the vocabulary.
The values I get are much lower than those in Fig. 4, which suggests some inconsistency in the pre- or post-processing.
Should I expect the same values in this experiment? If so, is there any change I should perform?
Hello, your work is very attractive to me, but when I tried to reproduce your excellent results I found that the weight files at http://kdexd.xyz/virtex/virtex/usage/model_zoo.html have been taken down. I hope you can provide working links to the weight files so that your excellent work can be reproduced.
Hi,
Thank you for making this code public!
I want to pre-train a captioning model on another dataset (the ARCH dataset). I went through your codebase and realized that I first need to create a Dataset class for my dataset, similar to your Dataset class in virtex/data/datasets/coco_captions.py. Next, I will need to make a modified version of virtex/data/datasets/captioning.py.
Somehow the files in virtex/data/datasets/ are all ignored by git and I can't make any of them visible. Can you please help me with this? I would also appreciate any suggestions on how to modify the code at this stage with the least disruption to the functions and classes that rely on the Dataset classes.
Many thanks,
George Batchkala
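In the meantime, a minimal sketch of what such a Dataset class could look like (names and the (image_path, caption) structure are illustrative assumptions, not the ARCH format or the repo's exact interface):

import cv2
import torch
from torch.utils.data import Dataset

class MyCaptionsDataset(Dataset):
    def __init__(self, pairs):
        # pairs: list of (image_path, caption) tuples for the new corpus
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        path, caption = self.pairs[idx]
        image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        image = torch.from_numpy(image).permute(2, 0, 1).float()
        # Tokenization/transforms would mirror coco_captions.py in the repo.
        return {"image": image, "caption": caption}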
I am trying to pretrain using the token classification method. I copied this repo and was just trying to reproduce the results from the study, but I am experiencing problems when pretraining with token classification: it seems as though valid loss values are not present in the output_dict variable.
When I use pretrain_virtex.py and log every 20 iterations, I get the following output.
2021-11-16T12:20:04.960052+0000: Iter 20 | Time: 0.764 sec | ETA: 54h 39m [Loss nan] [GPU 8774 MB]
Do you have any idea what could be wrong in the code?
I've been adapting the example scripts to my own training task, and I've noticed that the scripts do not handle different random seeds as expected. I've found this problem in two places, but there might be more:
The two places are: lines 104 to 109 (at commit 2baba8a), and virtex/scripts/pretrain_virtex.py, line 68 (at commit 2baba8a).
The problem is that DistributedSampler (from PyTorch 1.9.0) requires the kwarg seed in order to shuffle differently when shuffle=True. I believe the correct use of DistributedSampler for training with different random seeds would be to add the kwarg seed=_DOWNC.RANDOM_SEED when DistributedSampler is initialized in these two places. As for reshuffling on additional epochs, DistributedSampler adds the seed to the epoch number, so nothing needs to change during epoch-setting for the sampler.
Please let me know your thoughts, or if I may have missed something.
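A sketch of the proposed change (train_dataset stands in for the script's dataset object, and _DOWNC.RANDOM_SEED is the config value mentioned above):

from torch.utils.data.distributed import DistributedSampler

sampler = DistributedSampler(
    train_dataset,                # the script's existing dataset
    shuffle=True,
    seed=_DOWNC.RANDOM_SEED,      # without this, every seed shuffles identically
)
# Per-epoch reshuffling is unaffected: DistributedSampler combines this seed
# with the epoch passed via sampler.set_epoch(epoch).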
Can you share the approximate time for pretraining?