Comments (7)
Here is my file after 16 epochs,
events.out.tfevents.1592425758.04a1b10b6b08.125.zip
from mdvc.
Hi. I am glad you liked the paper and the source code 🙂.
I inspected your tb (thanks for it btw) and found it to be quite different from mine. I am afraid something went wrong on the way. In my case, it starts by repeating itself (sometimes like yours), but after a couple of epochs, it makes the captions more appealing. Your curves look promising, though!
I would try to inspect what your decoder (and generator) are doing. Also, check your loss design, as your model seems to receive a weird response to its predictions. Or maybe even the attention spans (if you are using them, ofc). Another shot in the dark would be to check whether your special tokens are encoded into the same integers (pad=1, start=2, end=3), as you may be masking out something other than the padding--you mentioned the unk token.
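A quick way to sanity-check the special-token indices and the padding mask (a minimal sketch; `stoi` here is a hypothetical stand-in for whatever your field's vocab exposes, e.g. `vocab.stoi` in torchtext):

```python
import torch

# Hypothetical token-to-index mapping; replace with your field's vocab.
stoi = {'<unk>': 0, '<pad>': 1, '<s>': 2, '</s>': 3}

pad_idx = stoi['<pad>']
assert pad_idx == 1, 'padding index differs from what the mask assumes'

# A toy batch of caption indices padded to the same length.
caption_idx = torch.tensor([[2, 4, 19, 3, 1, 1],
                            [2, 7,  8, 5, 9, 3]])

# The mask should be True on real tokens and False on padding; make sure
# it is built from pad_idx, not from the unk or end token.
mask = caption_idx != pad_idx
print(mask)
```

If the mask is accidentally built from the unk index instead, real tokens get masked out while padding leaks into the attention and the loss.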
Here is the tb generated during training the best model:
events.out.tfevents.1573036460.3x2080-12432.38798.0.zip
Tiny hint: in case you are wondering how to display the text summary for each epoch, try adding --samples_per_plugin=text=200 when starting tensorboard.
from mdvc.
Thanks for the inspection. After seeing the file you provided, I believe I have messed something up; I need to debug it. If I understood the statement 'check out your loss design as your model seems to receive a weird response for its predictions' correctly, do you mean how the graphs look and how the loss behaves weirdly at certain steps?
from mdvc.
My pleasure!
The problem is likely that you are writing several tb summaries into one file. Hence, your lines break and restart at the 0th epoch while still being connected to the curves from the previous experiment.
What I meant by that statement is that your model converges to a state where predicting nothing is better than predicting anything at all, while your loss is still decreasing. It seems the loss might receive a different ground truth than expected (always the end-token index, for example).
Also, check the variables in next_word = preds[:, -1].max(dim=-1)[1].unsqueeze(1) in greedy_decoder to see what the softmax (log_softmax) returns.
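For reference, a minimal sketch of what that line does, assuming preds holds (batch, seq_len, vocab) log-probabilities (the shapes here are made up):

```python
import torch

torch.manual_seed(0)

# Hypothetical decoder output: (batch, seq_len, vocab_size) log-probs.
B, T, V = 2, 5, 10
preds = torch.log_softmax(torch.randn(B, T, V), dim=-1)

# Take the distribution over the last position, then the argmax token id.
last_step = preds[:, -1]              # (B, V)
next_word = last_step.max(dim=-1)[1]  # (B,) -- [1] selects the indices
next_word = next_word.unsqueeze(1)    # (B, 1), ready to append to the input

print(next_word.shape)  # torch.Size([2, 1])
```

If the argmax here is stuck on the end or pad token from the first step, that points at the loss or the masking rather than the decoding.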
from mdvc.
This is how ReversibleField for Caption returns data as input to the decoder.
Shouldn't the end_token (3) be the last element in the tensor?
Because in your code when you do,
caption_idx, caption_idx_y = caption_idx[:, :-1], caption_idx[:, 1:]
You are trying to remove the end_token from the caption to create the input tokens for the decoder, and to remove the start_token to prepare the caption for calculating the loss.
Note: my caption field
self.CAPTION_FIELD = data.ReversibleField(
    tokenize='spacy', init_token=self.start_token,
    eos_token=self.end_token, pad_token=self.pad_token,
    lower=True, batch_first=True, is_target=True,
    unk_token=UNKNOWN_TOKEN)
from mdvc.
Shouldn't the end_token (3) be the last element in tensor?
Well, it should. However, it is common to use padding (1) up to the largest length in the batch (the 7th row in your case, if you are printing caption_idx[:, :-1]--please verify) and to mask it out in the attention and the loss.
You are trying to remove end_token from caption to create input token for the decoder and removing start_token to prepare caption for calculating loss.
Ok, let me clarify this point a bit. Let's consider a sequence of tokens in the batch:
Ground Truth Sequence: 2 4 19 559 12 4 131 3 1 1 1
Then we need to construct the input sequence of previous caption words (caption_idx) and the ground-truth sequence from which the decoder will try to predict the next word (caption_idx_y):
Ground Truth Sequence: 2 4 19 559 12 4 131 3 1 1 1
caption_idx: 2 4 19 559 12 4 131 3 1 1
caption_idx_y: 4 19 559 12 4 131 3 1 1 1 (caption_idx shifted left)
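The shift above can be reproduced with the token ids from the example (a minimal sketch):

```python
import torch

# Ground-truth sequence from the example: start=2, end=3, pad=1.
gt = torch.tensor([[2, 4, 19, 559, 12, 4, 131, 3, 1, 1, 1]])

caption_idx   = gt[:, :-1]  # decoder input: drop the last token
caption_idx_y = gt[:, 1:]   # target: drop the start token (left shift)

print(caption_idx.tolist())    # [[2, 4, 19, 559, 12, 4, 131, 3, 1, 1]]
print(caption_idx_y.tolist())  # [[4, 19, 559, 12, 4, 131, 3, 1, 1, 1]]
```

Note that both slices still contain padding; it is removed from the loss by masking, not by the slicing.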
Then, given caption_idx, the decoder will generate a distribution for the next word (p*):
pred: p1 p2 p3 p4 p5 p6 ...
Therefore, the cross-entropy will compare the predicted distributions (p*) with the one-hot encoding (OHE) of each token from caption_idx_y.
# j-th caption in the batch
loss_1,j(p1, OHE(4)); loss_2,j(p2, OHE(19)); loss_3,j(p3, OHE(559)); ...
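A minimal sketch of that comparison, assuming a plain NLL loss with the padding masked via ignore_index (the actual model may use label smoothing instead; V and preds are made up here):

```python
import torch
import torch.nn.functional as F

pad_idx = 1
V = 600  # hypothetical vocabulary size

gt = torch.tensor([[2, 4, 19, 559, 12, 4, 131, 3, 1, 1, 1]])
caption_idx, caption_idx_y = gt[:, :-1], gt[:, 1:]

# Hypothetical decoder output: one log-prob distribution p_t per position.
preds = torch.log_softmax(torch.randn(1, caption_idx.size(1), V), dim=-1)

# Compare p_t with the t-th token of caption_idx_y, skipping positions
# where the target is padding.
loss = F.nll_loss(preds.view(-1, V), caption_idx_y.reshape(-1),
                  ignore_index=pad_idx)
print(loss.item())
```

If ignore_index (or the equivalent mask) points at the wrong token, the loss rewards predicting padding or the end token everywhere, which matches the "predicting nothing is better" symptom above.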
my caption field
Yep, looks the same except for the UNK argument. Hopefully, not much has changed between torchtext versions.
If you are ok with the provided tensorboard log, please close the issue and create a separate issue if you have other questions.
from mdvc.
Thank you so much for the detailed clarification.
from mdvc.