
visdial-challenge-starter-pytorch's People

Contributors

abhshkdz, dependabot[bot], hsm207, shubhamagarwal92, yashkant


visdial-challenge-starter-pytorch's Issues

extract image features

Hi, thanks for sharing the visual dialog challenge code. If I extract image features by myself, where can I get "config_faster_rcnn_x101.yaml" and "model_faster_rcnn_x101.pkl"?

missing file 'data/visdial_1.0_train.json' when running train.py

Thanks for posting the visual dialog challenge code. I could follow the readme up to the step where we invoke training. When running
python train.py --config-yml configs/lf_disc_faster_rcnn_x101.yml --gpu-ids 4 5 6 7
I get the following error; I cannot seem to find 'data/visdial_1.0_train.json'.

(visdialch) beymer@alm00:~/VisualDialog/visdial-challenge-starter-pytorch$ python train.py --config-yml configs/lf_disc_faster_rcnn_x101.yml --gpu-ids 4 5 6 7
dataset:
  concat_history: true
  image_features_test_h5: data/features_faster_rcnn_x101_test.h5
  image_features_train_h5: data/features_faster_rcnn_x101_train.h5
  image_features_val_h5: data/features_faster_rcnn_x101_val.h5
  img_norm: 1
  max_sequence_length: 20
  vocab_min_count: 5
  word_counts_json: data/visdial_1.0_word_counts_train.json
model:
  decoder: disc
  dropout: 0.5
  encoder: lf
  img_feature_size: 2048
  lstm_hidden_size: 512
  lstm_num_layers: 2
  word_embedding_size: 300
solver:
  batch_size: 128
  initial_lr: 0.01
  lr_gamma: 0.1
  lr_milestones:
    - 4
    - 7
    - 10
  num_epochs: 20
  training_splits: train
  warmup_epochs: 1
  warmup_factor: 0.2

config_yml : configs/lf_disc_faster_rcnn_x101.yml
train_json : data/visdial_1.0_train.json
val_json : data/visdial_1.0_val.json
val_dense_json : data/visdial_1.0_val_dense_annotations.json
gpu_ids : [4, 5, 6, 7]
cpu_workers : 4
overfit : False
validate : False
in_memory : False
save_dirpath : checkpoints/
load_pthpath :
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    config["dataset"], args.train_json, overfit=args.overfit, in_memory=args.in_memory
  File "/home/beymer/VisualDialog/visdial-challenge-starter-pytorch/visdialch/data/dataset.py", line 26, in __init__
    self.dialogs_reader = DialogsReader(dialogs_jsonpath)
  File "/home/beymer/VisualDialog/visdial-challenge-starter-pytorch/visdialch/data/readers.py", line 35, in __init__
    with open(dialogs_jsonpath, "r") as visdial_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data/visdial_1.0_train.json'

generative decoder

Thank you for your code. Has the author tried using a generative decoder?

Softmax dimension is wrong.

I think there is a critical mistake in the decoder code.

The dimension used for the softmax is wrong.

Since the score tensor's size is [batch_size x answer_options], the dimension should be changed
(dim 0 -> dim 1).
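
A minimal sketch of the reported fix (an illustration, not the repo's exact code):

import torch
import torch.nn.functional as F

scores = torch.randn(128, 100)    # (batch_size, answer_options)
# Softmax must normalize across the answer options (dim=1),
# not across the batch (dim=0).
probs = F.softmax(scores, dim=1)
assert torch.allclose(probs.sum(dim=1), torch.ones(128))  # each row sums to 1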

Bounding box coordinates

Hi @kdexd ,

Is it possible to release the bounding box information (coordinates/labels) of the Detectron features, to actually map these features to the original images?

Thanks.

Extracting Actual Images

Could you elaborate on the relationship between the image_ids in the new dataset and the COCO image_ids? We're trying to visualize some of the images using a script hooked into the COCO API, but there seems to be no correlation between the image_ids used here and the ones in COCO.
Is there something we're missing?

About image features

Hello! Thank you for providing the image features. However, I didn't find the bounding box information for these features. Can you provide the image features together with the box information? Thanks a lot.

VisDial v0.9

Hello.
I want to use VisDial v0.9.
So I ran prepro.py -version 0.9 and got visdial_data.h5 and visdial_params.json.
But when I run train.py, I get this error.
What can I do to solve this problem?

Traceback (most recent call last):
  File "train.py", line 146, in <module>
    for i, batch in enumerate(dataloader):
  File "/home/ailab/anaconda2/envs/visdial-chal/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ailab/anaconda2/envs/visdial-chal/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ailab/visdial-challenge-starter-pytorch/dataloader.py", line 164, in __getitem__
    item['num_rounds'] = self.data[dtype + '_num_rounds'][idx]
IndexError: index 87666 is out of range for dimension 0 (of size 82783)

Tokenizing is slow

The tokenization process is too slow, especially for debugging. A debug option, or an option to load a pre-processed file, would be appreciated.

Training step is too slow

Hi,
Thank you for your code.
As I went deeper into this code, I found the training step particularly slow. The problem (I guess) is the dataset construction, where too many functions (e.g., padding sequences, getting history) are implemented in __getitem__.
I wonder, have you tried wrapping these functions in the __init__ function? This might consume more memory but would absolutely accelerate training; see the sketch below.
Thanks.
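
A hypothetical sketch of that idea (class and argument names are illustrative, not from the repo): do the padding once in __init__ and make __getitem__ a plain tensor lookup.

import torch
from torch.utils.data import Dataset

class PrecomputedDialogDataset(Dataset):
    def __init__(self, tokenized_questions, max_len=20, pad_index=0):
        # Do the expensive padding work once, up front.
        self.questions = torch.full(
            (len(tokenized_questions), max_len), pad_index, dtype=torch.long
        )
        for i, seq in enumerate(tokenized_questions):
            seq = seq[:max_len]
            self.questions[i, : len(seq)] = torch.tensor(seq)

    def __len__(self):
        return len(self.questions)

    def __getitem__(self, idx):
        # Now just an index into a preallocated tensor -- very cheap.
        return self.questions[idx]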

RuntimeError: DataLoader worker (pid 22114) is killed by signal: Killed.

If I set cpu-workers to 4, then after hundreds of iterations I get the error "RuntimeError: DataLoader worker (pid 22114) is killed by signal: Killed."

I searched related topics; some suggested cpu-workers=0. So I set it to 0, but after hundreds of iterations the process was still killed. This time only "Killed" was printed, with no other hints.

Meanwhile, with cpu-workers=0, training is too slow, about 1.1~2 s/it.

Finally, I want to know how long it took you to train this model.

Why max_sequence_length - 1 in dataset.py?

def _pad_sequences(self, sequences: List[List[int]]):
    """Given tokenized sequences (either questions, answers or answer
    options, tokenized in ``__getitem__``), padding them to maximum
    specified sequence length. Return as a tensor of size
    ``(*, max_sequence_length)``.

    This method is only called in ``__getitem__``, chunked out separately
    for readability.

    Parameters
    ----------
    sequences : List[List[int]]
        List of tokenized sequences, each sequence is typically a
        List[int].

    Returns
    -------
    torch.Tensor, torch.Tensor
        Tensor of sequences padded to max length, and length of sequences
        before padding.
    """
    for i in range(len(sequences)):
        sequences[i] = sequences[i][
            : self.config["max_sequence_length"] - 1
        ]
    sequence_lengths = [len(sequence) for sequence in sequences]

    # Pad all sequences to max_sequence_length.
    maxpadded_sequences = torch.full(
        (len(sequences), self.config["max_sequence_length"]),
        fill_value=self.vocabulary.PAD_INDEX,
    )
    padded_sequences = pad_sequence(
        [torch.tensor(sequence) for sequence in sequences],
        batch_first=True,
        padding_value=self.vocabulary.PAD_INDEX,
    )
    maxpadded_sequences[:, : padded_sequences.size(1)] = padded_sequences
    return maxpadded_sequences, sequence_lengths
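
A likely reason, though not confirmed in this thread: truncating to max_sequence_length - 1 leaves room for one boundary token (e.g. <S> or </S>) appended elsewhere, so the final padded sequence never exceeds max_sequence_length.

max_sequence_length = 20
tokens = list(range(30))                       # some over-long tokenized sequence
truncated = tokens[: max_sequence_length - 1]  # 19 tokens
with_eos = truncated + [2]                     # e.g. append an EOS index
assert len(with_eos) <= max_sequence_length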

How to get multi-gpu to work?

I created a Compute Engine instance on Google Cloud with 4 K80 GPUs, followed the instructions in the repo to set up the Anaconda environment, and downloaded the data. I ran the training with:

python train.py --gpu-ids 0 1 2 3

The batch_size is 128 and cpu_workers is 4.

During training, nvidia-smi shows that all 4 GPUs are utilized (but rarely at 100%). Furthermore, the seconds per iteration are much worse than with a single GPU (8 vs 2).

What other configs should I adjust to get a speedup from using multiple GPUs?
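
One plausible factor (an assumption about the setup, not a confirmed diagnosis): nn.DataParallel splits each batch across devices, so batch_size 128 leaves each of the 4 K80s with only 32 samples per step, and the scatter/gather overhead dominates. A minimal sketch:

import torch
import torch.nn as nn

# A toy stand-in model; the real one is the repo's encoder-decoder.
model = nn.Linear(2048, 512)
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across the listed devices, so with
    # 4 GPUs a batch_size of 128 gives each GPU only 32 samples per step.
    model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))
# Scaling batch_size with the GPU count (e.g. 128 -> 512) keeps per-GPU work
# comparable to the single-GPU run and amortizes the scatter/gather overhead.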

About concat_history in dataset.py

Hi, the concat_history flag in dataset.py is quite confusing.

I think if self.config.get("concat_history", True): would be correct,
not if self.config.get("concat_history", False):.

Because of the code above, the dataloader returns the concatenated history when concat_history == False.
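
For context, a minimal illustration of the dict.get semantics involved (this shows only how the default interacts with the yml value, not the repo's history logic):

config = {"concat_history": True}
print(config.get("concat_history", False))  # True: the yml value wins when set
print({}.get("concat_history", False))      # False: the default applies only when the key is absent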

Torch=1.0.0 is not found

ERROR: Could not find a version that satisfies the requirement torch==1.0.0 (from -r requirements.txt (line 10)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch==1.0.0 (from -r requirements.txt (line 10))
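
This pip output typically means no torch==1.0.0 wheel exists for the active Python version or platform (an assumption; the environment isn't shown). A common workaround, assuming the Anaconda setup from the README:

conda install pytorch=1.0.0 -c pytorch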

The 'answer' would be 0 if the answer is one word

[dialog_round["answer"][:-1] for dialog_round in dialog]

Hi, the code here confuses me. Since dialog_round["answer"][:-1] and dialog_round["answer"][1:] ignore the last and the first word respectively, if the answer is one word, then answers_in and answers_out would both be 0 (all padding). In this situation, the model would not learn anything from this sample.
I'm not sure if I am understanding this right; looking forward to your reply.
Thank you.
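
For context, a sketch of the usual teacher-forcing shift (assuming answers carry <S>/</S> boundary tokens; if they don't, a one-word answer would indeed reduce to padding as reported):

answer = ["<S>", "yes", "</S>"]   # tokenized one-word answer with boundary tokens
answers_in = answer[:-1]          # ["<S>", "yes"]  -> decoder input
answers_out = answer[1:]          # ["yes", "</S>"] -> prediction target
print(answers_in, answers_out)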

ffi.lua:56 expected align(#) on line 579

When running the command
th prepro_img_vgg16.lua -imageRoot ../image_root -gpuid 0

there are errors:

/home/denniswu/torch/install/bin/lua: .../denniswu/torch/install/share/lua/5.1/trepl/init.lua:389: .../denniswu/torch/install/share/lua/5.1/trepl/init.lua:389: ...me/denniswu/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
  [C]: in function 'error'
  .../denniswu/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
  prepro_img_vgg16.lua:3: in main chunk
  [C]: in function 'dofile'
  .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk

Has anyone met this problem?

thanks in advance.

Shared memory issues with parallelization

Hi @kdexd

I am running into all kinds of shared memory errors after this commit 9c1ee36

pytorch/pytorch#8976
pytorch/pytorch#973

I guess this parallelization is not stable; sometimes it runs and sometimes it breaks, even after trying possible solutions such as:

# Workaround 1: switch the multiprocessing sharing strategy.
torch.multiprocessing.set_sharing_strategy('file_system')

# Workaround 2: raise the open-file limit.
# https://github.com/pytorch/pytorch/issues/973
import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048*4, rlimit[1]))

Is there a leak somewhere? Might be best to have a look.

Need suggestion about embeddings

I am trying to use ELMo embeddings from allennlp and need some suggestions.

In the for loop of __getitem__, before converting to indices, I also save the raw question:

dialog[i]["raw_question"] = dialog[i]["question"]  # Tokenized

which can then be converted to char_ids and an ELMo embedding:

ques_char_ids = batch_to_ids(
    [dialog_round["raw_question"] for dialog_round in dialog]
)
ques_elmo_emb = self._elmo_wrapper(ques_char_ids)

def _elmo_wrapper(self, char_ids, max_sequence_length=None):
    # Refer: https://github.com/allenai/allennlp/issues/2659
    """
    Parameters
    ----------
    char_ids : torch.Tensor
        char ids of the raw sequences

    Returns
    -------
    torch.Tensor
        Tensor of sequences padded to max length
    """
    if not max_sequence_length:
        max_sequence_length = self.config["max_sequence_length"]
    # Tried these variants too:
    # with torch.no_grad():
    #     elmo_seq = self.elmo(char_ids)['elmo_representations'][0]
    # elmo_seq = self.elmo(char_ids)['elmo_representations'][0].requires_grad_(False)
    elmo_seq = self.elmo(char_ids)['elmo_representations'][0].detach()
    batch_size, timesteps, emb_dim = elmo_seq.size()
    if timesteps > max_sequence_length:
        elmo_emb = elmo_seq[:, :max_sequence_length, :]
    else:
        # Pad zeros up to max_sequence_length.
        zeros_size = max_sequence_length - elmo_seq.size(1)
        zeros = torch.zeros(batch_size, zeros_size, emb_dim).type_as(elmo_seq)
        elmo_emb = torch.cat([elmo_seq, zeros], 1)
    return elmo_emb

However, training becomes too slow. Do you have any experience with ELMo that would suggest why this is happening?

I think one possible workaround is to extract and save the embeddings as a pre-processing step; could you share your data-generation scripts, please?
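
For reference, a sketch of that pre-extraction idea, assuming the allennlp 0.x Elmo API; all file paths and dataset names below are illustrative placeholders, not the repo's:

import h5py
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "elmo_options.json"   # placeholder path to ELMo options
weight_file = "elmo_weights.hdf5"    # placeholder path to ELMo weights
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

questions = [["what", "color", "is", "the", "cat"], ["is", "it", "raining"]]
with torch.no_grad():
    char_ids = batch_to_ids(questions)                        # (N, T, 50) char ids
    embeddings = elmo(char_ids)["elmo_representations"][0]    # (N, T, 1024)

# Cache once to HDF5 so __getitem__ only does a cheap h5 lookup at train time.
with h5py.File("elmo_questions.h5", "w") as f:  # placeholder output file
    f.create_dataset("questions", data=embeddings.numpy())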
