Comments (7)
Here is my file after 16 epochs,
events.out.tfevents.1592425758.04a1b10b6b08.125.zip
from mdvc.
Hi. I am glad you liked the paper and the source code 🙂.
I inspected your tb (thanks for it btw) and found it to be quite different from mine. I am afraid something went wrong on the way. In my case, it starts by repeating itself (sometimes like yours), but after a couple of epochs, it makes the captions more appealing. Your curves look promising, though!
I would try to inspect what your decoder (and generator) are doing. Also, check your loss design, as your model seems to receive a weird response to its predictions. Or maybe even the attention spans (if you are using them, ofc). Another shot in the dark would be to check whether your special tokens are encoded into the same integers (pad=1, start=2, end=3), as you may be masking out something other than the padding--you mentioned the unk token.
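A quick way to sanity-check the special-token indices and the padding mask (a minimal sketch; `stoi` here is a hypothetical stand-in for whatever your field's vocab exposes, e.g. `vocab.stoi` in torchtext):

```python
import torch

# Hypothetical token-to-index mapping; replace with your field's vocab.
stoi = {'<unk>': 0, '<pad>': 1, '<s>': 2, '</s>': 3}

pad_idx = stoi['<pad>']
assert pad_idx == 1, 'padding index differs from what the mask assumes'

# A toy batch of caption indices padded to the same length.
caption_idx = torch.tensor([[2, 4, 19, 3, 1, 1],
                            [2, 7,  8, 5, 9, 3]])

# The mask should be True on real tokens and False on padding; make sure
# it is built from pad_idx, not from the unk or end token.
mask = caption_idx != pad_idx
print(mask)
```

If the mask is accidentally built from the unk index instead, real tokens get masked out while padding leaks into the attention and the loss.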
Here is the tb generated during training the best model:
events.out.tfevents.1573036460.3x2080-12432.38798.0.zip
Tiny hint: in case you are wondering how to display the text summary for each epoch, try adding --samples_per_plugin=text=200 when starting tensorboard.
from mdvc.
Thanks for the inspection. After seeing the file you provided, I believe I have messed something up; I need to debug it. If I understood the statement 'check out your loss design as your model seems to receive a weird response for its predictions' correctly, do you mean how the graphs look and how the loss behaves weirdly at certain steps?
from mdvc.
My pleasure!
The problem is likely that you are writing several tb summaries into one file. Hence, your lines break and restart at the 0th epoch while still being connected to the curves from the previous experiment.
What I meant by that statement is that your model converges to a state where predicting nothing is better than predicting anything at all, while your loss is still decreasing. It seems the loss might receive a different ground truth than expected (always the end-token index, for example).
Also, check the variables in next_word = preds[:, -1].max(dim=-1)[1].unsqueeze(1) in greedy_decoder to see what the softmax (log_softmax) returns.
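For reference, a minimal sketch of what that line does, assuming preds holds (batch, seq_len, vocab) log-probabilities (the shapes here are made up):

```python
import torch

torch.manual_seed(0)

# Hypothetical decoder output: (batch, seq_len, vocab_size) log-probs.
B, T, V = 2, 5, 10
preds = torch.log_softmax(torch.randn(B, T, V), dim=-1)

# Take the distribution over the last position, then the argmax token id.
last_step = preds[:, -1]              # (B, V)
next_word = last_step.max(dim=-1)[1]  # (B,) -- [1] selects the indices
next_word = next_word.unsqueeze(1)    # (B, 1), ready to append to the input

print(next_word.shape)  # torch.Size([2, 1])
```

If the argmax here is stuck on the end or pad token from the first step, that points at the loss or the masking rather than the decoding.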
from mdvc.
This is how ReversibleField for Caption returns data as input to the decoder.
Shouldn't the end_token (3) be the last element in the tensor?
Because in your code when you do,
caption_idx, caption_idx_y = caption_idx[:, :-1], caption_idx[:, 1:]
You are trying to remove the end_token from the caption to create the input tokens for the decoder, and to remove the start_token to prepare the caption for calculating the loss.
Note: my caption field
self.CAPTION_FIELD = data.ReversibleField(
    tokenize='spacy', init_token=self.start_token,
    eos_token=self.end_token, pad_token=self.pad_token,
    lower=True, batch_first=True, is_target=True,
    unk_token=UNKNOWN_TOKEN)
from mdvc.
Shouldn't the end_token (3) be the last element in tensor?
Well, it should. However, it is common to use padding (1) up to the largest length in the batch (the 7th row in your case, if you are printing caption_idx[:, :-1]--please verify) and to mask it out in the attention and the loss.
You are trying to remove end_token from caption to create input token for the decoder and removing start_token to prepare caption for calculating loss.
Ok, let me clarify this point a bit. Let's consider a sequence of tokens in the batch:
Ground Truth Sequence: 2 4 19 559 12 4 131 3 1 1 1
Then we need to construct the input sequence of previous caption words (caption_idx) and the ground-truth sequence from which the decoder will try to predict the next word (caption_idx_y):
Ground Truth Sequence: 2 4 19 559 12 4 131 3 1 1 1
caption_idx: 2 4 19 559 12 4 131 3 1 1
caption_idx_y: 4 19 559 12 4 131 3 1 1 1 (caption_idx shifted left)
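The shift above can be reproduced with the token ids from the example (a minimal sketch):

```python
import torch

# Ground-truth sequence from the example: start=2, end=3, pad=1.
gt = torch.tensor([[2, 4, 19, 559, 12, 4, 131, 3, 1, 1, 1]])

caption_idx   = gt[:, :-1]  # decoder input: drop the last token
caption_idx_y = gt[:, 1:]   # target: drop the start token (left shift)

print(caption_idx.tolist())    # [[2, 4, 19, 559, 12, 4, 131, 3, 1, 1]]
print(caption_idx_y.tolist())  # [[4, 19, 559, 12, 4, 131, 3, 1, 1, 1]]
```

Note that both slices still contain padding; it is removed from the loss by masking, not by the slicing.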
Then, given caption_idx, the decoder will generate a distribution for the next word (p*):
pred: p1 p2 p3 p4 p5 p6 ...
Therefore, the cross-entropy will compare the predicted distributions (p*) with the one-hot encoding (OHE) of each token from caption_idx_y.
# j-th caption in the batch
loss_1,j(p1, OHE(4)); loss_2,j(p2, OHE(19)); loss_3,j(p3, OHE(559)); ...
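A minimal sketch of that comparison, assuming a plain NLL loss with the padding masked via ignore_index (the actual model may use label smoothing instead; V and preds are made up here):

```python
import torch
import torch.nn.functional as F

pad_idx = 1
V = 600  # hypothetical vocabulary size

gt = torch.tensor([[2, 4, 19, 559, 12, 4, 131, 3, 1, 1, 1]])
caption_idx, caption_idx_y = gt[:, :-1], gt[:, 1:]

# Hypothetical decoder output: one log-prob distribution p_t per position.
preds = torch.log_softmax(torch.randn(1, caption_idx.size(1), V), dim=-1)

# Compare p_t with the t-th token of caption_idx_y, skipping positions
# where the target is padding.
loss = F.nll_loss(preds.view(-1, V), caption_idx_y.reshape(-1),
                  ignore_index=pad_idx)
print(loss.item())
```

If ignore_index (or the equivalent mask) points at the wrong token, the loss rewards predicting padding or the end token everywhere, which matches the "predicting nothing is better" symptom above.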
my caption field
Yep, looks the same except for the UNK argument. Hopefully, not much has changed between torchtext versions.
If you are ok with the provided tensorboard log, please close the issue and create a separate issue if you have other questions.
from mdvc.
Thank you so much for the detailed clarification.
from mdvc.