
Comments (7)

VP-0822 commented on July 17, 2024

Here is my file after 16 epochs,
events.out.tfevents.1592425758.04a1b10b6b08.125.zip

from mdvc.

v-iashin commented on July 17, 2024

Hi. I am glad you liked the paper and the source code 🙂.

I inspected your tb log (thanks for it, btw) and found it to be quite different from mine. I am afraid something went wrong along the way. In my case, the model also starts by repeating itself (sometimes like yours), but after a couple of epochs the captions become more appealing. Your curves look promising, though!

I would try to inspect what your decoder (and generator) are doing. Also, check your loss design, as your model seems to receive a weird response for its predictions. Or maybe even the attention spans (if you are using attention, of course). Another shot in the dark would be to check that your special tokens are encoded into the same integers (pad=1, start=2, end=3), as you may be masking out something other than the padding; you mentioned the unk token.
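A quick way to rule out the token-id mismatch is to print the ids your vocab assigns to the special tokens and compare them with the ids the masking code assumes. The sketch below uses a plain dict as a stand-in for a torchtext-style `vocab.stoi`; substitute your actual `CAPTION_FIELD.vocab.stoi` when debugging (the token strings here are assumptions, not taken from the repo):

```python
# Hypothetical stand-in for CAPTION_FIELD.vocab.stoi from torchtext;
# replace with your real vocab object when debugging.
stoi = {'<unk>': 0, '<pad>': 1, '<s>': 2, '</s>': 3}

# Ids the masking / loss code is assumed to expect (pad=1, start=2, end=3).
expected = {'<pad>': 1, '<s>': 2, '</s>': 3}

for token, expected_id in expected.items():
    actual_id = stoi.get(token)
    status = 'OK' if actual_id == expected_id else 'MISMATCH'
    print(f'{token}: expected {expected_id}, got {actual_id} -> {status}')
```

If any line prints `MISMATCH`, the loss and attention masks will be ignoring the wrong token, which would explain the behavior described above.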

Here is the tb generated during training the best model:
events.out.tfevents.1573036460.3x2080-12432.38798.0.zip

Tiny hint: in case you are wondering how to display the text summary for each epoch, try passing --samples_per_plugin=text=200 when starting tensorboard.


VP-0822 commented on July 17, 2024

Thanks for the inspection. After seeing the file you provided, I believe I have messed something up and need to debug it. If I understood correctly, does the statement 'check out your loss design as your model seems to receive a weird response for its predictions' refer to how the graphs look and how the loss behaves weirdly for certain steps?
[image: screenshot of the training curves]


v-iashin commented on July 17, 2024

My pleasure!

The problem is likely that you are writing several tb summaries into one file. Hence, you see your lines break and restart at the 0th epoch while still being connected to the curves from the previous experiment.

What I meant by that statement is that your model converges to a state where predicting nothing is better than predicting anything at all, while your loss is still decreasing. It seems the loss might be receiving a different ground truth than expected (always the end-token index, for example).

Also, check the variables in next_word = preds[:, -1].max(dim=-1)[1].unsqueeze(1) in greedy_decoder to see what the softmax (log_softmax) returns.
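That line takes the log-probabilities at the last decoding step and picks the highest-scoring vocabulary index for each batch element. A numpy sketch of the same indexing (illustrative only, not the repo's code; random numbers stand in for real predictions):

```python
import numpy as np

# preds: (batch, seq_len, vocab_size) scores, random here for illustration.
rng = np.random.default_rng(0)
preds = rng.standard_normal((2, 5, 10))

# numpy analogue of preds[:, -1].max(dim=-1)[1].unsqueeze(1) in PyTorch:
last_step = preds[:, -1]                        # (batch, vocab_size)
next_word = last_step.argmax(axis=-1)[:, None]  # (batch, 1) token indices

print(next_word.shape)  # (2, 1)
```

If `next_word` keeps coming out as the pad or end-token index at every step, that points back at the loss/masking issue described above.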


VP-0822 commented on July 17, 2024

This is how the ReversibleField for the caption returns the data that is input to the decoder:
[image: screenshot of the caption tensor]

Shouldn't the end_token (3) be the last element in the tensor?
Because in your code, when you do
caption_idx, caption_idx_y = caption_idx[:, :-1], caption_idx[:, 1:]
you are removing the end_token from the caption to create the input tokens for the decoder, and removing the start_token to prepare the caption for the loss calculation.

Note: my caption field
self.CAPTION_FIELD = data.ReversibleField(
    tokenize='spacy', init_token=self.start_token, eos_token=self.end_token,
    pad_token=self.pad_token, lower=True, batch_first=True, is_target=True,
    unk_token=UNKNOWN_TOKEN)


v-iashin commented on July 17, 2024

Shouldn't the end_token (3) be the last element in the tensor?

Well, it should. However, it is common to pad with the pad token (1) up to the largest length in the batch (the 7th row in your case, if you are printing caption_idx[:, :-1]; please verify) and to mask it out in the attention and the loss.
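That padding-and-masking convention can be sketched in plain numpy (with pad=1 as in this thread; illustrative, not the repo's actual masking code):

```python
import numpy as np

PAD = 1
# Two captions of different lengths, padded to the batch maximum with PAD.
caption_idx = np.array([
    [2, 4, 19, 559, 12, 4, 131, 3, 1, 1],  # real tokens end at </s> (3)
    [2, 7, 42, 3, 1, 1, 1, 1, 1, 1],
])

# Mask that is True on real tokens and False on padding; both the attention
# and the loss use such a mask so the pad positions contribute nothing.
mask = caption_idx != PAD
print(mask.sum(axis=1))  # non-pad tokens per caption: [8 4]
```

If the mask were built against the wrong token id (e.g. the unk id), real tokens would be masked out while pads still contributed to the loss, matching the symptom above.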

You are trying to remove the end_token from the caption to create the input tokens for the decoder and removing the start_token to prepare the caption for calculating the loss.

Ok, let me clarify this point a bit. Let's consider a sequence of tokens in the batch:

Ground Truth Sequence:   2   4  19 559  12   4 131   3   1   1   1

Then we need to construct the input sequence of previous caption words (caption_idx) and the ground-truth sequence from which the decoder will try to predict the next word (caption_idx_y):

Ground Truth Sequence:   2   4  19 559  12   4 131   3   1   1   1
caption_idx:             2   4  19 559  12   4 131   3   1   1
caption_idx_y:           4  19 559  12   4 131   3   1   1   1 (caption_idx shifted left)

Then, given caption_idx, the decoder will generate a distribution over the next word (p*) at each position:

pred:                   p1  p2  p3  p4  p5  p6 ...

Therefore, cross-entropy compares each predicted distribution (p*) with the one-hot encoding (OHE) of the corresponding token from caption_idx_y:

# j-th captions
loss_1,j(p1, OHE(4)); loss_2,j(p2, OHE(19)); loss_3,j(p3, OHE(559)); ...
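The construction above is just two slices of the same tensor; a minimal numpy sketch of it (illustrative, not the MDVC code itself):

```python
import numpy as np

# One caption from the batch, as in the example above.
gt = np.array([2, 4, 19, 559, 12, 4, 131, 3, 1, 1, 1])

caption_idx   = gt[:-1]  # decoder input: everything but the last token
caption_idx_y = gt[1:]   # prediction target: the sequence shifted one step left

# At each position t, the decoder that has seen gt[:t+1] must predict gt[t+1].
for t in range(3):
    print(f'input ends with {caption_idx[t]} -> target {caption_idx_y[t]}')
```

Both slices have the same length, so position t of the predictions lines up directly with position t of the targets in the cross-entropy loss.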

my caption field

Yep, it looks the same except for the unk_token argument. Hopefully, not much has changed between torchtext versions.


If you are ok with the provided tensorboard log, please close the issue and create a separate issue if you have other questions.


VP-0822 commented on July 17, 2024

Thank you so much for the detailed clarification.

