
virtex's People

Contributors

arjunmajum · dependabot[bot] · kdexd


virtex's Issues

eval_captioning.py script has incorrect attributes

Hi, I've got the following error:

File "scripts/eval_captioning.py", line 114, in <module>
    main(_A)
File "scripts/eval_captioning.py", line 76, in main
    val_batch[key] = val_batch[key].to(device)
AttributeError: 'list' object has no attribute 'to'

I've made the following change in the code to run the script without the error:

for val_iteration, val_batch in enumerate(val_dataloader, start=1):
    # for key in val_batch:
    #     print(val_batch[key])
    val_batch["image"] = val_batch["image"].to(device)

Possible inconsistency in data preprocessing

Hi, thank you so much for sharing this code. It is very helpful.

However, I am confused about the data preprocessing configuration. The config files specify a Caffe-style image mean and std, but it seems they are not used in the code. Instead, the code appears to hard-code the torchvision-style mean and std (here). Can you confirm that both pretraining and fine-tuning use the latter?

Furthermore, I am not sure whether the images are in the 0-255 range or 0-1. For a Caffe-style mean and std they should be 0-255, but with your hard-coded mean and std they should be 0-1. However, I noticed you use OpenCV to load images, which loads them in 0-255, and I did not find anywhere in the code where they are scaled to 0-1, except in supervised pretraining (here).

Could you please comment on the aforementioned issues? In particular, it is important to make sure the configuration is identical across pretraining and all downstream settings. Since you fine-tune all layers and don't freeze the stem, such inconsistencies are hard to notice, because fine-tuning would fix them to some extent.

Thank you so much.
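
For concreteness, the two conventions under discussion look like this (values are the standard ImageNet statistics; which convention virtex actually applies is exactly the question above):

import numpy as np

# torchvision-style: RGB images first scaled to [0, 1]
TORCHVISION_MEAN = np.array([0.485, 0.456, 0.406])
TORCHVISION_STD = np.array([0.229, 0.224, 0.225])

# Caffe-style: BGR images kept in [0, 255] (the Detectron pixel means)
CAFFE_MEAN = np.array([103.530, 116.280, 123.675])

def normalize_torchvision(img_uint8_rgb):
    img = img_uint8_rgb.astype(np.float32) / 255.0
    return (img - TORCHVISION_MEAN) / TORCHVISION_STD

def normalize_caffe(img_uint8_bgr):
    # no rescaling to [0, 1] and no std division
    return img_uint8_bgr.astype(np.float32) - CAFFE_MEAN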

VirTex for CC-like classification problems

Great project in the right direction, i.e. getting about the same results with less compute.

In your paper, you mention that you discard the text head and only use the visual backbone, and that future research could leverage the text head. Do you think it could develop performance comparable to BERT-like models trained on text corpora?

Also, I'm very interested in using VirTex for classification problems such as Conceptual Captions. Did you try, or would you estimate, performance improvements from VirTex's visual backbone + BERT, or from VirTex's visual & textual heads together (not sure the latter would work), over ViLBERT or VisualBERT?

run on single input image

Hi,

I would like to evaluate your work on a single image for image captioning. Can you tell me the steps I should follow for a single input? For instance, given a folder of images, how would I use your model for inference only on the folder of images?

Looking at the captioning task in your description, I am not sure how to go about using my own dataset for evaluating the model.

Thanks
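
For reference, a later issue on this page shows the repository's evaluation script being run on an arbitrary folder of images; the invocation looks like this (all paths are placeholders):

python scripts/eval_captioning.py \
    --config /path/to/pretrain_config.yaml \
    --checkpoint-path /path/to/checkpoint.pth \
    --data-root /path/to/images_dir \
    --output /path/to/save/predictions.json \
    --num-gpus-per-machine 1 \
    --cpu-workers 4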

Cannot find this file: serialized_train.lmdb

When I was doing the pretraining, this error showed up:

lmdb.Error: datasets/coco/serialized_train.lmdb: No such file or directory

I couldn't find any instruction that generates this file. Has anyone else had this problem?
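
For context, a serialized LMDB of this kind can be built along these lines (a minimal sketch only; virtex ships its own preprocessing script, and the key/value format below is illustrative rather than the repo's actual one):

import glob
import os
import pickle

import lmdb

def serialize_split(image_dir, lmdb_path):
    env = lmdb.open(lmdb_path, map_size=1 << 40)
    with env.begin(write=True) as txn:
        for idx, path in enumerate(sorted(glob.glob(os.path.join(image_dir, "*.jpg")))):
            with open(path, "rb") as f:
                # store raw JPEG bytes keyed by an integer index
                txn.put(str(idx).encode(), pickle.dumps((os.path.basename(path), f.read())))
    env.close()

# serialize_split("datasets/coco/train2017", "datasets/coco/serialized_train.lmdb")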

Wrong URL for downloading weights

import virtex.model_zoo as mz
model = mz.get("width_ablations/bicaptioning_R_50_L1_H2048.yaml", pretrained=True)

When I run this script I get the following HTTP Error:
HTTPError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 import virtex
      2 import virtex.model_zoo as mz
----> 3 model = mz.get("width_ablations/bicaptioning_R_50_L1_H2048.yaml", pretrained=True)

/content/virtex/virtex/model_zoo/model_zoo.py in get(config_path, pretrained)
     98     checkpoint_url,
     99     dir=os.path.expanduser("~/.torch/virtex_cache"),
--> 100     filename=os.path.basename(config_path).replace(".yaml", ".pth")
    101 )
    102 CheckpointManager(model=model).load(checkpoint_path)

/usr/local/lib/python3.6/dist-packages/fvcore/common/download.py in download(url, dir, filename, progress)
---> 58     tmp, _ = request.urlretrieve(url, filename=tmp, reporthook=hook(t))

[... remaining /usr/lib/python3.6/urllib/request.py frames, repeated while following HTTP redirects, elided ...]

/usr/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
--> 650     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 404: Not Found

I think it's because the URL is not being generated correctly; there is an extra '/':
Failed to download https://umich.box.com/shared/static//fm1nq819q74vr0kqcd3gkivlzf06xvko.pth
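
A minimal illustration of the suspected bug (whether model_zoo.py actually joins URLs this way is an assumption):

base = "https://umich.box.com/shared/static/"
name = "/fm1nq819q74vr0kqcd3gkivlzf06xvko.pth"
print(base + name)                                # ".../static//fm1nq..." -> 404
print(base.rstrip("/") + "/" + name.lstrip("/"))  # single slash, correct URL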

Question about SentencePiece [SOS] and [EOS] ID.

Hi,
I saw that in SentencePieceTrainer you disable the default BOS/EOS IDs and register the special tokens as control symbols:
"--bos_id=-1 --eos_id=-1" "--control_symbols=[SOS],[EOS],[MASK]"
However, during captioning you define
sos_index: int = 1, eos_index: int = 2,
I am wondering whether these setups are consistent, or have any effect?
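
For reference, those trainer flags disable SentencePiece's built-in BOS/EOS, and control symbols are then assigned IDs immediately after the special tokens, which appears consistent with the captioning defaults (a sketch; every argument other than the quoted flags is illustrative):

import sentencepiece as spm

spm.SentencePieceTrainer.train(
    "--input=captions.txt --model_prefix=coco_10k --vocab_size=10000"
    " --bos_id=-1 --eos_id=-1"
    " --control_symbols=[SOS],[EOS],[MASK]"
)
# With the default unk_id=0, the control symbols receive the next IDs:
# [SOS]=1, [EOS]=2, [MASK]=3, matching sos_index=1 and eos_index=2.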

unable to find a valid cuDNN algorithm to run convolution

Sorry to bother you, but I've run into this problem and cannot find a way to fix it.
It happens when I train the base VirTex model.
I have updated cuDNN to version 8.0.3 (the former version was 7.6.5); both versions give this error.
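
Some generic first checks for this error (general PyTorch advice, not a repo-specific fix; it is frequently an out-of-memory condition in disguise, or a CUDA/cuDNN version mismatch):

import torch

# verify the PyTorch build matches the installed CUDA/cuDNN
print(torch.version.cuda, torch.backends.cudnn.version())
torch.backends.cudnn.benchmark = False  # skip the cuDNN algorithm search
# lowering the batch size in the config often resolves it as well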

Update Requirements: `numpy` missing

Dear authors,

Thank you very much for providing the code and instructions for using this library! I tried to follow the setup instructions from http://kdexd.xyz/virtex/virtex/usage/setup_dependencies.html:

cd virtex
conda create -n virtex python=3.6
conda activate virtex
pip install -r requirements.txt

I ran into a problem: numpy was not found when I ran pip install -r requirements.txt.
I fixed it by running conda install numpy beforehand.

Maybe you could add it to the requirements.txt so others can use the same version you used.

Many thanks,
George Batchkala
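
A line like the following near the top of requirements.txt would cover it (the version bound is illustrative):

numpy>=1.17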

Training loss acts strangely after resuming

Hi,

I want to reproduce your pre-training results. An accident interrupted my training; I resumed it with the "--resume-from" flag, and it now acts weirdly: the training and validation losses jumped dramatically at the beginning and then decreased, which suggests a problem with the restoring. Could you help me with this?
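
A common cause of such a spike (a general observation, not a confirmed diagnosis of virtex's checkpointing) is restoring only the model weights while the optimizer and LR-scheduler state restart from scratch. A complete round-trip looks like:

import torch

def save_checkpoint(path, model, optimizer, scheduler, iteration):
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "iteration": iteration,
    }, path)

def load_checkpoint(path, model, optimizer, scheduler):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["iteration"]  # resume the iteration counter as well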

Fine tuning Virtex for image captioning

Hi there,
I am aware that VirTex uses image captioning as a pretraining task and not as the "final goal", but I was wondering whether one could keep fine-tuning the pretrained model (e.g. bicaptioning_R_50_L1_H2048) with additional COCO-Captions-like data in order to get an improved captioning model.
Has anyone tried that, or does anyone have suggestions on how to do it? Can any of the scripts in the repository be used or adapted for fine-tuning existing models?
Thanks a lot! :)

Error in evaluating image captioning: could not find or load main class edu.stanford.nlp.process.PTBTokenizer

Hi, this is really great work and a clear codebase.

However, I'm facing a problem when evaluating image captioning on COCO Captions val2017. It says "Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer".

I have downloaded CoreNLP and SPICE with your script download_spice.sh, and I also set the classpath in my bashrc file:

export CLASSPATH="$CLASSPATH:/path/to/virtex/virtex/utils/assets/SPICE-1.0/lib/stanford-corenlp-3.6.0.jar:/path/to/virtex/virtex/utils/assets/SPICE-1.0/lib/stanford-corenlp-3.6.0-models.jar"
for file in $(find /path/to/virtex/virtex/utils/assets/SPICE-1.0/lib -name "*.jar"); do
    export CLASSPATH="$CLASSPATH:$(realpath $file)"
done

I could run
java -mx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -outputFormat json -file input.txt
successfully with sentences in input.txt.

In addition, if I don't calculate metrics (i.e. run without --calc-metrics), it works fine.

Do you have any idea about this problem?
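
One way to verify, independently of virtex, that the class is reachable on the classpath (same path assumptions as in the export above):

java -cp "$CLASSPATH" edu.stanford.nlp.process.PTBTokenizer input.txt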

Decoder Attention Weight Visualization

Hi, thanks for the awesome code base!

I'm looking to produce visualizations of decoder attention weights similar to those shown in the paper, but I don't think that you have implemented this feature in the published code (although I may have overlooked it!)

As best I can tell, the way to do this would be a new TransformerDecoderLayer that returns the multi-head attention's attn_output_weights from its forward method. The visualized attention weights when predicting a single token would then be the average of these weights across all heads. The problem I am finding is that the visualized weights mostly land in the center of the image during captioning on the COCO dataset, whereas the results in the paper show reasonable variation in these weights as tokens are predicted.

Is this the method that you used to create the visualization? Any insight into how this was previously done would be appreciated!
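
A minimal sketch of capturing the cross-attention weights with a forward hook, assuming PyTorch's nn.TransformerDecoderLayer (virtex's decoder classes may differ, and newer PyTorch versions call the attention with need_weights=False, in which case output[1] is None and forward() must be overridden instead):

import torch.nn as nn

layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
captured = {}

def save_attn(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_output_weights);
    # the weights are averaged over heads by default
    captured["weights"] = output[1]

# multihead_attn is the decoder layer's cross-attention over image features
layer.multihead_attn.register_forward_hook(save_attn)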

torch.hub.load("kdexd/virtex", "resnet50", pretrained=True) not working

I tried running this in a Colab environment and got the error below:

KeyError                                  Traceback (most recent call last)

<ipython-input-5-e8ec27705300> in <module>()
      1 import torch
      2 # model = torch.hub.load('pytorch/vision:v0.9.0', 'alexnet', pretrained=True)
----> 3 model = torch.hub.load("kdexd/virtex", "resnet50", pretrained=True)
      4 model.eval()

2 frames

/root/.cache/torch/hub/kdexd_virtex_master/hubconf.py in resnet50(pretrained, **kwargs)
     31                 "https://umich.box.com/shared/static/gsjqm4i4fm1wpzi947h27wweljd8gcpy.pth",
     32                 progress=False,
---> 33             )["model"]
     34         )
     35     return model

KeyError: 'model'

Can you let me know the fix?
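
A hedged workaround while waiting for a fix: inspect the checkpoint before indexing into it, in case the hosted file is already a bare state dict with no "model" key (an assumption based on the KeyError above):

import torch

ckpt = torch.hub.load_state_dict_from_url(
    "https://umich.box.com/shared/static/gsjqm4i4fm1wpzi947h27wweljd8gcpy.pth",
    progress=False,
)
state_dict = ckpt["model"] if "model" in ckpt else ckpt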

BatchNormalization's Running Stats are Accumulated in ImageNet Linear Evaluation

Hi,

Thanks for the nice paper and clear code!

I found that the models are set to .train() in clf_linear.py. Thus the running averages (i.e., the running stats) of the BatchNorm layers are updated while training on the ImageNet dataset (via the forward calls), so the backbone does not seem to be fully frozen. Is this a deliberate design choice for this fine-tuning task?

Best,
Hao
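
For comparison, the usual way to keep BatchNorm statistics frozen during linear evaluation looks like this (standard PyTorch practice; whether the accumulation in clf_linear.py is intentional is exactly the question above):

import torch.nn as nn

def freeze_batchnorm(model: nn.Module):
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()  # stop updating running mean/var
            for p in m.parameters():
                p.requires_grad = False

# note: this must be re-applied after every model.train() call,
# since .train() re-enables the running-stat updates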

eval_captioning.py - RuntimeError: gather_out_cuda(): Expected dtype int64 for index

Hello

I'm trying to perform inference on new images.

I've followed the setup instructions here

This is the command I'm running:

python scripts/eval_captioning.py \
    --config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
    --checkpoint-path /tmp/.torch/virtex_cache/bicaptioning_R_50_L1_H2048.pth \
    --data-root /path/to/images_dir \
    --output /path/to/save/predictions.json \
    --num-gpus-per-machine 1 \
    --cpu-workers 4

This is my stacktrace:

** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath 

2020-12-16 16:34:21.685 | INFO     | virtex.utils.checkpointing:load:156 - Rank 0: Loading checkpoint from /tmp/.torch/virtex_cache/bicaptioning_R_50_L1_H2048.pth
2020-12-16 16:34:22.134 | INFO     | virtex.utils.checkpointing:load:166 - Rank 0: Loading model from /tmp/.torch/virtex_cache/bicaptioning_R_50_L1_H2048.pth
Traceback (most recent call last):
  File "scripts/eval_captioning.py", line 114, in <module>
    main(_A)
  File "scripts/eval_captioning.py", line 78, in main
    output_dict = model(val_batch)
  File "/yonatan/virtex/virtex_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/yonatan/virtex/virtex/models/captioning.py", line 176, in forward
    start_predictions, beam_search_step
  File "/yonatan/virtex/virtex/utils/beam_search.py", line 250, in search
    predictions[timestep].gather(1, cur_backpointers).unsqueeze(2)
RuntimeError: gather_out_cuda(): Expected dtype int64 for index

This is my system (Linux):

(virtex_env) (base) [p virtex]$ python
Python 3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 21:14:29) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.0'

This is my datasets structure:

(virtex_env) (base) [p virtex]$ ls datasets/
coco  vocab
(virtex_env) (base) [p virtex]$ ls datasets/vocab/
coco_10k.model  coco_10k.vocab
(virtex_env) (base) [p virtex]$ ls datasets/coco/
annotations  serialized_train.lmdb  serialized_train.lmdb-lock  serialized_val.lmdb  serialized_val.lmdb-lock  train2017  val2017

What am I missing?

Thanks
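
The usual fix for this class of error (an assumption based on the message, not a confirmed patch) is to cast the index tensor to int64 before the gather in virtex/utils/beam_search.py:

predictions[timestep].gather(1, cur_backpointers.long()).unsqueeze(2)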

Removed link for pretrained model

Hi,

I am trying to download the pretrained model for image captioning, but the download link has been removed. Could you please update it?

Running Image Captioning Inference on Arbitrary Images: FileNotFound

python scripts/eval_captioning.py \
    --config /tmp/bicaptioning_R_50_L1_H2048/pretrain_config.yaml \
    --checkpoint-path /tmp/bicaptioning_R_50_L1_H2048/checkpoint_500000.pth \
    --data-root /path/to/images_dir \
    --output /path/to/save/predictions.json \
    --num-gpus-per-machine 1 \
    --cpu-workers 4

Hi, I am wondering where I can download pretrain_config.yaml and the checkpoint file. I cannot find the link in the GitHub README. Is retraining the whole model the only way to get the pre-trained model?

Cheers,
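
For reference, the model zoo interface shown earlier on this page fetches both the config and the weights, caching them under ~/.torch/virtex_cache (assuming the hosted links are still live):

import virtex.model_zoo as mz

model = mz.get("width_ablations/bicaptioning_R_50_L1_H2048.yaml", pretrained=True)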

Reproduction of SPICE results (eval_captioning.py)

Hi, nice work - thanks for sharing the code.

I'm trying to reproduce the CIDEr & SPICE results that appear in Figure 4 of your paper.
I simply load the pre-trained models (specifically, those corresponding to H=1024 and H=2048 in the width ablation) and run eval_captioning.py after building the vocabulary.
The values I get are much lower than those in Fig. 4, which looks like some inconsistency in the pre/post-processing.
Should I expect the same values in this experiment? If so, is there any change I should make?

scripts/eval_captioning.py error

(Duplicate of "eval_captioning.py script has incorrect attributes" above; same error and workaround.)

Pre-training on another dataset

Hi,

Thank you for making this code public!

I want to pre-train a captioning model on another dataset (the ARCH dataset). I went through your codebase and realized that I first need to create a Dataset class for my dataset, similar to your class in virtex/data/datasets/coco_captions.py. Next, I will need a modified version of virtex/data/datasets/captioning.py.

Somehow the files in virtex/data/datasets/ are all ignored by git and I can't make any of them visible. Can you please help me with this? I would also appreciate any suggestions on how to modify the code at this stage so as to cause the least disruption to the functions and classes that rely on the Dataset classes.

Many thanks,
George Batchkala
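
A hypothetical skeleton for such a class, mirroring the structure described above (the annotation format and returned field names are illustrative, not virtex's actual interface; coco_captions.py remains the authoritative reference):

import json

from torch.utils.data import Dataset

class ArchCaptionsDataset(Dataset):
    def __init__(self, annotations_json: str):
        with open(annotations_json) as f:
            # expected format: [{"image": "path.jpg", "caption": "..."}, ...]
            self.instances = json.load(f)

    def __len__(self):
        return len(self.instances)

    def __getitem__(self, idx):
        record = self.instances[idx]
        return {"image_id": idx, "image": record["image"], "captions": [record["caption"]]}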

No loss when pretraining on token classification

I am trying to pretrain using the token classification method. I copied this repo and was trying to reproduce the results from the study, but I am experiencing problems when pretraining with token classification: it seems the loss values are not in the output_dict variable.

When I use pretrain_virtex.py and log every 20 iterations, I get the following output.
2021-11-16T12:20:04.960052+0000: Iter 20 | Time: 0.764 sec | ETA: 54h 39m [Loss nan] [GPU 8774 MB]

Do you have any idea what could be wrong in the code?
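
A generic aid for locating the NaN (not a virtex-specific fix) is to check every tensor in output_dict right after the model's forward call in the training loop:

import torch

def check_finite(output_dict):
    for name, value in output_dict.items():
        if torch.is_tensor(value) and not torch.isfinite(value).all():
            print(f"non-finite values in '{name}'")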

Training with new Random Seed does not shuffle data

I've been adapting the example scripts to my own training task, and I've noticed that the scripts do not handle different random seeds as expected. I've found this problem in two places, but there might be more:

sampler=DistributedSampler(
    train_dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
),

DistributedSampler(train_dataset, shuffle=True) # type: ignore

The problem is that DistributedSampler (from PyTorch 1.9.0) requires the kwarg "seed" to shuffle differently when shuffle=True. I believe the correct use of DistributedSampler for training with different random seeds would be to add the kwarg seed=_DOWNC.RANDOM_SEED when DistributedSampler is initialized in these two places. As for reshuffling on additional epochs, DistributedSampler adds the seed to the epoch number, so nothing needs to change when setting the epoch on the sampler.
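
The proposed fix, as a sketch (assuming _DOWNC.RANDOM_SEED is the config value the scripts use elsewhere):

sampler = DistributedSampler(
    train_dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
    seed=_DOWNC.RANDOM_SEED,  # without this, shuffling ignores the chosen seed
)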

https://github.com/pytorch/pytorch/blob/d69c22dd61a2f006dcfe1e3ea8468a3ecaf931aa/torch/utils/data/distributed.py#L100

Please let me know your thoughts, or if I may have missed something.
