
visdial-challenge-starter-pytorch's People

Contributors

abhshkdz, dependabot[bot], hsm207, shubhamagarwal92, yashkant


visdial-challenge-starter-pytorch's Issues

extract image features

Hi, thanks for sharing the visual dialog challenge code. If I extract image features by myself, where can I get "config_faster_rcnn_x101.yaml" and "model_faster_rcnn_x101.pkl"?

missing file 'data/visdial_1.0_train.json' when running train.py

Thanks for posting the visual dialog challenge code. I could follow the readme up to the step where we invoke training. When running
python train.py --config-yml configs/lf_disc_faster_rcnn_x101.yml --gpu-ids 4 5 6 7
I get the following error; I cannot seem to find 'data/visdial_1.0_train.json'.

(visdialch) beymer@alm00:~/VisualDialog/visdial-challenge-starter-pytorch$ python train.py --config-yml configs/lf_disc_faster_rcnn_x101.yml --gpu-ids 4 5 6 7
dataset:
  concat_history: true
  image_features_test_h5: data/features_faster_rcnn_x101_test.h5
  image_features_train_h5: data/features_faster_rcnn_x101_train.h5
  image_features_val_h5: data/features_faster_rcnn_x101_val.h5
  img_norm: 1
  max_sequence_length: 20
  vocab_min_count: 5
  word_counts_json: data/visdial_1.0_word_counts_train.json
model:
  decoder: disc
  dropout: 0.5
  encoder: lf
  img_feature_size: 2048
  lstm_hidden_size: 512
  lstm_num_layers: 2
  word_embedding_size: 300
solver:
  batch_size: 128
  initial_lr: 0.01
  lr_gamma: 0.1
  lr_milestones:
    - 4
    - 7
    - 10
  num_epochs: 20
  training_splits: train
  warmup_epochs: 1
  warmup_factor: 0.2

config_yml : configs/lf_disc_faster_rcnn_x101.yml
train_json : data/visdial_1.0_train.json
val_json : data/visdial_1.0_val.json
val_dense_json : data/visdial_1.0_val_dense_annotations.json
gpu_ids : [4, 5, 6, 7]
cpu_workers : 4
overfit : False
validate : False
in_memory : False
save_dirpath : checkpoints/
load_pthpath :
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    config["dataset"], args.train_json, overfit=args.overfit, in_memory=args.in_memory
  File "/home/beymer/VisualDialog/visdial-challenge-starter-pytorch/visdialch/data/dataset.py", line 26, in __init__
    self.dialogs_reader = DialogsReader(dialogs_jsonpath)
  File "/home/beymer/VisualDialog/visdial-challenge-starter-pytorch/visdialch/data/readers.py", line 35, in __init__
    with open(dialogs_jsonpath, "r") as visdial_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data/visdial_1.0_train.json'

generative decoder

Thank you for your code. Has the author tried using a generative decoder?

Softmax dimension is wrong.

I think there is a critical mistake in the decoder code.

The dimension used for the softmax is wrong.

Since the score tensor's size is [batch_size x answer_options], the dimension should be changed
(dim 0 -> dim 1).
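
A minimal sketch of the reported fix (an illustration, not the repo's exact code):

import torch
import torch.nn.functional as F

scores = torch.randn(128, 100)    # (batch_size, answer_options)
# Softmax must normalize across the answer options (dim=1),
# not across the batch (dim=0).
probs = F.softmax(scores, dim=1)
assert torch.allclose(probs.sum(dim=1), torch.ones(128))  # each row sums to 1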

Bounding box coordinates

Hi @kdexd ,

Is it possible to release the bounding box information (coordinates/labels) of the Detectron features, to actually map these features to the original images?

Thanks.

Extracting Actual Images

Could you elaborate on the relationship between the image_ids in the new dataset and the COCO image_ids? We're trying to visualize some of the images using a script hooked into the COCO API, but there seems to be no correlation between the image_ids used here and the ones in COCO.
Is there something we're missing?

About image features

Hello! Thank you for providing the image features. However, I didn't find the bounding box information for these features. Can you provide the image features together with the box information? Thanks a lot.

VisDial v0.9

Hello.
I want to use VisDial v0.9.
So I ran prepro.py -version 0.9 and got visdial_data.h5 and visdial_params.json.
But when I run train.py, I get this error.
What can I do to solve this problem?

Traceback (most recent call last):
  File "train.py", line 146, in <module>
    for i, batch in enumerate(dataloader):
  File "/home/ailab/anaconda2/envs/visdial-chal/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ailab/anaconda2/envs/visdial-chal/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ailab/visdial-challenge-starter-pytorch/dataloader.py", line 164, in __getitem__
    item['num_rounds'] = self.data[dtype + '_num_rounds'][idx]
IndexError: index 87666 is out of range for dimension 0 (of size 82783)

Tokenizing is slow

The tokenization process is too slow, especially for debugging. A debug option, or an option to load a pre-processed file, would be appreciated.

Training step is too slow

Hi,
Thank you for your code.
As I went deeper into this code, I found the training step particularly slow. The problem (I guess) is the dataset construction, where too many functions (e.g., padding sequences, getting history) are implemented in __getitem__.
I wonder, have you tried wrapping these functions in the __init__ function? This might consume more memory but would absolutely accelerate training; see the sketch below.
Thanks.
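
A hypothetical sketch of that idea (class and argument names are illustrative, not from the repo): do the padding once in __init__ and make __getitem__ a plain tensor lookup.

import torch
from torch.utils.data import Dataset

class PrecomputedDialogDataset(Dataset):
    def __init__(self, tokenized_questions, max_len=20, pad_index=0):
        # Do the expensive padding work once, up front.
        self.questions = torch.full(
            (len(tokenized_questions), max_len), pad_index, dtype=torch.long
        )
        for i, seq in enumerate(tokenized_questions):
            seq = seq[:max_len]
            self.questions[i, : len(seq)] = torch.tensor(seq)

    def __len__(self):
        return len(self.questions)

    def __getitem__(self, idx):
        # Now just an index into a preallocated tensor -- very cheap.
        return self.questions[idx]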

RuntimeError: DataLoader worker (pid 22114) is killed by signal: Killed.

If I set cpu-workers to 4, then after hundreds of iterations I get the error "RuntimeError: DataLoader worker (pid 22114) is killed by signal: Killed."

I searched related topics; some suggested cpu-workers=0. So I set it to 0, but after hundreds of iterations the process was still killed. This time only "Killed" was printed, with no other hints.

Meanwhile, with cpu-workers=0, training is too slow, about 1.1~2 s/it.

Finally, I want to know how long it took you to train this model.

Why max_sequence_length - 1 in dataset.py?

def _pad_sequences(self, sequences: List[List[int]]):
    """Given tokenized sequences (either questions, answers or answer
    options, tokenized in ``__getitem__``), padding them to maximum
    specified sequence length. Return as a tensor of size
    ``(*, max_sequence_length)``.

    This method is only called in ``__getitem__``, chunked out separately
    for readability.

    Parameters
    ----------
    sequences : List[List[int]]
        List of tokenized sequences, each sequence is typically a
        List[int].

    Returns
    -------
    torch.Tensor, torch.Tensor
        Tensor of sequences padded to max length, and length of sequences
        before padding.
    """
    for i in range(len(sequences)):
        sequences[i] = sequences[i][
            : self.config["max_sequence_length"] - 1
        ]
    sequence_lengths = [len(sequence) for sequence in sequences]

    # Pad all sequences to max_sequence_length.
    maxpadded_sequences = torch.full(
        (len(sequences), self.config["max_sequence_length"]),
        fill_value=self.vocabulary.PAD_INDEX,
    )
    padded_sequences = pad_sequence(
        [torch.tensor(sequence) for sequence in sequences],
        batch_first=True,
        padding_value=self.vocabulary.PAD_INDEX,
    )
    maxpadded_sequences[:, : padded_sequences.size(1)] = padded_sequences
    return maxpadded_sequences, sequence_lengths
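
A likely reason, though not confirmed in this thread: truncating to max_sequence_length - 1 leaves room for one boundary token (e.g. <S> or </S>) appended elsewhere, so the final padded sequence never exceeds max_sequence_length.

max_sequence_length = 20
tokens = list(range(30))                       # some over-long tokenized sequence
truncated = tokens[: max_sequence_length - 1]  # 19 tokens
with_eos = truncated + [2]                     # e.g. append an EOS index
assert len(with_eos) <= max_sequence_length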

How to get multi-gpu to work?

I created a Compute Engine instance on Google Cloud with 4 K80 GPUs, followed the instructions in the repo to set up the Anaconda environment, and downloaded the data. I ran the training with:

python train.py --gpu-ids 0 1 2 3

The batch_size is 128 and cpu_workers is 4.

During training, nvidia-smi shows that all 4 GPUs are utilized (but rarely at 100%). Furthermore, the seconds per iteration are much worse than with a single GPU (8 vs 2).

What other configs should I adjust to get a speedup from using multiple GPUs?
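
One plausible factor (an assumption about the setup, not a confirmed diagnosis): nn.DataParallel splits each batch across devices, so batch_size 128 leaves each of the 4 K80s with only 32 samples per step, and the scatter/gather overhead dominates. A minimal sketch:

import torch
import torch.nn as nn

# A toy stand-in model; the real one is the repo's encoder-decoder.
model = nn.Linear(2048, 512)
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across the listed devices, so with
    # 4 GPUs a batch_size of 128 gives each GPU only 32 samples per step.
    model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))
# Scaling batch_size with the GPU count (e.g. 128 -> 512) keeps per-GPU work
# comparable to the single-GPU run and amortizes the scatter/gather overhead.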

About concat_history in dataset.py

Hi, the concat_history flag in dataset.py is quite confusing.

I think if self.config.get("concat_history", True): would be correct,
not if self.config.get("concat_history", False):.

Because of the code above, the dataloader returns the concatenated history when concat_history == False.
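
For context, a minimal illustration of the dict.get semantics involved (this shows only how the default interacts with the yml value, not the repo's history logic):

config = {"concat_history": True}
print(config.get("concat_history", False))  # True: the yml value wins when set
print({}.get("concat_history", False))      # False: the default applies only when the key is absent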

Torch=1.0.0 is not found

ERROR: Could not find a version that satisfies the requirement torch==1.0.0 (from -r requirements.txt (line 10)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch==1.0.0 (from -r requirements.txt (line 10))
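
This pip output typically means no torch==1.0.0 wheel exists for the active Python version or platform (an assumption; the environment isn't shown). A common workaround, assuming the Anaconda setup from the README:

conda install pytorch=1.0.0 -c pytorch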

The 'answer' would be 0 if the answer is one word

[dialog_round["answer"][:-1] for dialog_round in dialog]

Hi, the code here confuses me. Since dialog_round["answer"][:-1] and dialog_round["answer"][1:] ignore the last and the first word respectively, if the answer is one word, then answers_in and answers_out would both be 0 (all padding). In this situation, the model would not learn anything from this sample.
I'm not sure if I am understanding this right; looking forward to your reply.
Thank you.
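
For context, a sketch of the usual teacher-forcing shift (assuming answers carry <S>/</S> boundary tokens; if they don't, a one-word answer would indeed reduce to padding as reported):

answer = ["<S>", "yes", "</S>"]   # tokenized one-word answer with boundary tokens
answers_in = answer[:-1]          # ["<S>", "yes"]  -> decoder input
answers_out = answer[1:]          # ["yes", "</S>"] -> prediction target
print(answers_in, answers_out)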

ffi.lua:56 expected align(#) on line 579

When running the command
th prepro_img_vgg16.lua -imageRoot ../image_root -gpuid 0

there are errors:

/home/denniswu/torch/install/bin/lua: .../denniswu/torch/install/share/lua/5.1/trepl/init.lua:389: .../denniswu/torch/install/share/lua/5.1/trepl/init.lua:389: ...me/denniswu/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
  [C]: in function 'error'
  .../denniswu/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
  prepro_img_vgg16.lua:3: in main chunk
  [C]: in function 'dofile'
  .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk

Has anyone met this problem?

thanks in advance.

Shared memory issues with parallelization

Hi @kdexd

I am running into all kinds of shared memory errors after this commit 9c1ee36

pytorch/pytorch#8976
pytorch/pytorch#973

I guess this parallelization is not stable; sometimes it runs and sometimes it breaks, even after trying possible solutions such as:

# Workaround 1: switch the multiprocessing sharing strategy.
torch.multiprocessing.set_sharing_strategy('file_system')

# Workaround 2: raise the open-file limit.
# https://github.com/pytorch/pytorch/issues/973
import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048*4, rlimit[1]))

Is there a leak somewhere? Might be best to have a look.

Need suggestion about embeddings

I am trying to use ELMo embeddings from allennlp and need some suggestions.

In the for loop of __getitem__, before converting to indices, I also save the raw question:

dialog[i]["raw_question"] = dialog[i]["question"]  # Tokenized

which can then be converted to char_ids and an ELMo embedding:

ques_char_ids = batch_to_ids(
    [dialog_round["raw_question"] for dialog_round in dialog]
)
ques_elmo_emb = self._elmo_wrapper(ques_char_ids)

def _elmo_wrapper(self, char_ids, max_sequence_length=None):
    # Refer: https://github.com/allenai/allennlp/issues/2659
    """
    Parameters
    ----------
    char_ids : torch.Tensor
        char ids of the raw sequences

    Returns
    -------
    torch.Tensor
        Tensor of sequences padded to max length
    """
    if not max_sequence_length:
        max_sequence_length = self.config["max_sequence_length"]
    # Tried these variants too:
    # with torch.no_grad():
    #     elmo_seq = self.elmo(char_ids)['elmo_representations'][0]
    # elmo_seq = self.elmo(char_ids)['elmo_representations'][0].requires_grad_(False)
    elmo_seq = self.elmo(char_ids)['elmo_representations'][0].detach()
    batch_size, timesteps, emb_dim = elmo_seq.size()
    if timesteps > max_sequence_length:
        elmo_emb = elmo_seq[:, :max_sequence_length, :]
    else:
        # Pad zeros up to max_sequence_length.
        zeros_size = max_sequence_length - elmo_seq.size(1)
        zeros = torch.zeros(batch_size, zeros_size, emb_dim).type_as(elmo_seq)
        elmo_emb = torch.cat([elmo_seq, zeros], 1)
    return elmo_emb

However, training becomes too slow. Do you have any experience with ELMo that would suggest why this is happening?

I think one possible workaround is to extract and save the embeddings as a pre-processing step; could you share your data-generation scripts, please?
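
For reference, a sketch of that pre-extraction idea, assuming the allennlp 0.x Elmo API; all file paths and dataset names below are illustrative placeholders, not the repo's:

import h5py
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "elmo_options.json"   # placeholder path to ELMo options
weight_file = "elmo_weights.hdf5"    # placeholder path to ELMo weights
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

questions = [["what", "color", "is", "the", "cat"], ["is", "it", "raining"]]
with torch.no_grad():
    char_ids = batch_to_ids(questions)                        # (N, T, 50) char ids
    embeddings = elmo(char_ids)["elmo_representations"][0]    # (N, T, 1024)

# Cache once to HDF5 so __getitem__ only does a cheap h5 lookup at train time.
with h5py.File("elmo_questions.h5", "w") as f:  # placeholder output file
    f.create_dataset("questions", data=embeddings.numpy())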
