Comments (40)
@andfoy I have replaced the backbone with ResNet50. As you mentioned, I used the pretrained ResNet50 weights but kept num-class at 1000 to match the pretrained parameters. I made some changes to resnet.py to extract 5 layers from ResNet, and updated vis_size to 2048. It works well and is much faster. Thanks
from dms.
Hi, thanks for your question. Actually, you could omit the --optim-snapshot
argument and the training script should start fine-tuning.
Thanks for your timely reply; I will try it.
@13331112522 Any followup on this one?
We are working on processing our new dataset and I am going to report the result on this issue. Thanks for your consideration.
Hi, I have actually completed my model training following your instructions with my own tracking dataset. It performs well on my dataset but poorly on ReferIt and UNC. I think it adjusted its weights to the new task and struggles to generalize across multiple tasks or datasets. I used 2 epochs at low resolution and 10 epochs at high resolution during training, so I also suspect it might be overfitting. By the way, are there any updates to the DMN structure, since recent networks for referring expressions, such as MAttNet, perform much better?
By the way, are there any updates to the DMN structure, since recent networks for referring expressions, such as MAttNet, perform much better?
MAttNet and DMN have different natures. Ours is global and agnostic, in the sense that we give the model an image and a referring expression and produce a probability map over the whole image; MAttNet relies on MRCNN region features, where the objective is to classify the regions rather than to produce a segmentation.
I used 2 epochs at low resolution and 10 epochs at high resolution during training, so I also suspect it might be overfitting
I agree with you on this point. From our experience, too much training time at high resolution induces overfitting.
Do you mean that if I train more at low resolution and less at high resolution, I would get a model with better generalization ability?
In addition, the latest models like BERT or GPT seem to have powerful feature representations for NLP; I was wondering whether DMN could take advantage of some of them.
Do you mean that if I train more at low resolution and less at high resolution, I would get a model with better generalization ability?
It is possible that this may happen
In addition, the latest models like BERT or GPT seem to have powerful feature representations for NLP; I was wondering whether DMN could take advantage of some of them.
From my experience using Transformers and BERT, in general they are not able to surpass classical RNNs on this problem, i.e., they provide almost the same performance.
I have tried different ways to train the model with my tracking datasets these days. The model has good learning and fitting performance: it converges quickly and achieves a very high score (over 0.96) on the training sets, but performance on other datasets is very poor, dropping to 0.36. The best weights came from mixed-dataset training at high resolution. 10 low-res epochs and 5 high-res epochs do not seem to perform better than more high-res training.
@13331112522 Have you tried reducing the total number of parameters by modifying the hidden state size, the embedding size, and the number of filters?
Not yet so far.
@13331112522 I am also trying to train on my custom dataset. Can you tell me how you were able to train at high resolution? Due to the model's non-parallelizability and large GPU requirements, I couldn't train above 128x128 resolution.
@Shivanshmundra Try tuning the --workers and --num-workers parameters down to 1. Note that training at high resolution needs to start from the low-resolution weights, as the author mentioned.
@andfoy Is it possible to speed up DMN by replacing the LSTM with a CNN?
@13331112522, we didn't try the inclusion of language-level CNNs, as we used recurrent modules for both language and multimodality. Feel free to try them, but always keep in mind that there are two RNNs that would need to be modified.
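For anyone who wants to experiment along these lines, here is a hypothetical sketch of a 1-D convolutional sentence encoder that could stand in for one of the recurrent modules; all names and sizes are assumptions, not part of DMN:

```python
import torch
import torch.nn as nn


class ConvLanguageEncoder(nn.Module):
    """Hypothetical 1-D convolutional replacement for a language RNN:
    convolve over the embedded word sequence, then max-pool over time."""

    def __init__(self, emb_dim=300, hidden=1000, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)

    def forward(self, embedded):           # (batch, seq_len, emb_dim)
        x = embedded.transpose(1, 2)       # Conv1d expects (batch, channels, time)
        x = torch.relu(self.conv(x))
        return x.max(dim=2).values         # (batch, hidden) sentence vector
```

Unlike an RNN, this encoder processes all time steps in parallel, which is where any speedup would come from.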
@andfoy I was not able to reproduce the results on the ReferIt dataset mentioned in the paper. The maximum IoU observed was around 32% with the pre-trained ReferIt weights. The optimizer snapshot wasn't available, so we skipped that part and looked at the initial results. Can you suggest some pointers on where we might be going wrong? We are already using the SRU mentioned in the README.
Thanks
Edit - On training images, the results are pretty good, close to perfect in some cases.
The maximum IoU observed was around 32% with the pre-trained ReferIt weights
Hi @Shivanshmundra, on which dataset are you trying to reproduce the results? Also, which resolution are you using?
@andfoy Sorry for the late reply. I was trying on the ReferIt dataset at 256x256 resolution.
Also, @andfoy, is there anything I can do to make this code parallelizable? Like some changes to the architecture or the pipeline in general that won't harm results much?
@andfoy Sorry for the late reply. I was trying on the ReferIt dataset at 256x256 resolution.
To replicate the ReferIt results, you should first train on UNC and then fine-tune the weights on ReferIt.
Note: resolution is important here, so when the model is trained at a resolution lower than 512, one would expect a decrease in the model's performance.
Also, @andfoy, is there anything I can do to make this code parallelizable? Like some changes to the architecture or the pipeline in general that won't harm results much?
One of the main issues is related to the Batch Norm in the feature extractor (DPN-92) and to the dynamic filter computation, which would require a batched multi-filter convolution that is not available in PyTorch. Also, sentence length variability is a major factor that prevents DMN from being parallelized. If you find a way to overcome these issues, feel free to share it here.
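On the batched multi-filter convolution point: one possible workaround, untested against DMN itself, is to fold the batch into the channel dimension and use grouped convolution, which PyTorch does support:

```python
import torch
import torch.nn.functional as F


def batched_dynamic_conv(feats, filters):
    """Apply a different filter bank to each batch element in one call,
    via grouped convolution (one group per example).
    feats:   (B, C, H, W) visual features
    filters: (B, F, C, k, k) per-example dynamic filters from language
    """
    B, C, H, W = feats.shape
    _, Fn, _, k, _ = filters.shape
    x = feats.reshape(1, B * C, H, W)        # fold batch into channels
    w = filters.reshape(B * Fn, C, k, k)     # group g sees example g's channels
    out = F.conv2d(x, w, padding=k // 2, groups=B)
    return out.reshape(B, Fn, H, W)          # per-example response maps
```

This is equivalent to looping over the batch and convolving each example with its own filters, but in a single kernel launch.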
@andfoy I found the inference process to be slow, especially for higher-resolution images. I am wondering whether there is a way to speed it up?
@andfoy I found the inference process to be slow, especially for higher-resolution images. I am wondering whether there is a way to speed it up?
Maybe replace DPN-92 with a newer, more efficient feature extractor?
@andfoy I have two questions. 1. What is the point of the low-resolution output? I visualized it using the low-res training weights, and it seems to have nothing to do with the ground-truth mask. 2. I tried to replace DPN-92 with ResNet, but got this when loading the state dict:
size mismatch for fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1, 2048]).
size mismatch for fc.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([1]).
Thanks!
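The size mismatch above comes from the fc layer, which the model never uses. A small sketch of one way to skip it when loading (the function name is made up):

```python
import torch


def load_backbone_weights(model, checkpoint_path):
    """Load ImageNet weights into a modified ResNet, dropping the fc.*
    entries whose shapes no longer match (the model never uses them)."""
    state = torch.load(checkpoint_path, map_location="cpu")
    state = {k: v for k, v in state.items() if not k.startswith("fc.")}
    # strict=False tolerates keys that exist in the model but not in `state`
    model.load_state_dict(state, strict=False)
    return model
```

Alternatively, deleting the fc layer from the modified ResNet altogether (as suggested below in the thread) avoids the mismatch in the first place.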
Trying to replace DPN with ResNet50, I found many points (dimensions) to adjust; are there any instructions for that?
@13331112522 The drawback of using another feature extractor is that the pretrained weights are no longer compatible. To enable it, you should modify ResNet to return the full pyramid of feature representations and also update vis_size from 2688 to 256 channels.
Also, remember to remove the classification layer from ResNet, as the model does not use it at all.
What is the point of the low-resolution output? I visualized it using the low-res training weights, and it seems to have nothing to do with the ground-truth mask.
The low-resolution training phase is done in order to accelerate computation and to constrain the representation space, which should be easier to upsample during the high-res phase.
I was wondering whether we could jointly train at low and high resolution simultaneously, combining the two losses. I tried, but it does not seem to work well; it easily runs out of memory.
I was wondering whether we could jointly train at low and high resolution simultaneously, combining the two losses. I tried, but it does not seem to work well; it easily runs out of memory.
Maybe you could reduce num_filters or joint_size, at the expense of comparability with the reported results.
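As a rough illustration of the joint-training idea: the combined objective could be a weighted sum of the two resolution losses. The weights and the BCE choice here are assumptions; DMN's actual loss may differ:

```python
import torch
import torch.nn.functional as F


def joint_loss(low_pred, low_mask, high_pred, high_mask,
               w_low=0.5, w_high=1.0):
    """Hypothetical joint objective: weighted sum of the low- and
    high-resolution segmentation losses (weights are tunable assumptions)."""
    loss_low = F.binary_cross_entropy_with_logits(low_pred, low_mask)
    loss_high = F.binary_cross_entropy_with_logits(high_pred, high_mask)
    return w_low * loss_low + w_high * loss_high
```

Both forward passes must be kept in memory for this to backpropagate, which is consistent with the out-of-memory behavior reported above.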
1. Is num_filters equal to the max length of the language query? If I reduce it, does the query length have to be reduced?
2. Any ideas or instructions for adapting another language instead of English? I want to try changing it to Chinese.
Is num_filters equal to the max length of the language query? If I reduce it, does the query length have to be reduced?
No, num_filters is an arbitrary model parameter that sets the number of filters created from language, so you can change it without affecting the length of the input sentences.
Any ideas or instructions for adapting another language instead of English? I want to try changing it to Chinese.
If you are able to map words to Chinese embeddings, then those can be given as input to the model.
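A sketch of that embedding-mapping step for a new language; the dict-like source of pretrained vectors (e.g. fastText-style) is an assumption:

```python
import numpy as np


def build_embedding_matrix(vocab, pretrained, dim=300):
    """Map each token of a (e.g. Chinese) vocabulary to a pretrained
    vector; unseen words get a small random vector. `pretrained` is any
    dict-like {word: np.ndarray} source of word vectors."""
    rng = np.random.default_rng(0)
    mat = np.zeros((len(vocab), dim), dtype=np.float32)
    for idx, word in enumerate(vocab):
        vec = pretrained.get(word)
        mat[idx] = vec if vec is not None else rng.normal(0, 0.1, dim)
    return mat  # use this to initialize the model's embedding layer
```

The rest of the pipeline only sees embedding vectors, so nothing downstream needs to know which language the tokens came from.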
Thanks a lot, @andfoy.
What is the purpose of setting batch_size to 1? When I replaced the backbone, some layers needed BatchNorm, which requires more than one sample per batch to compute the mean. In the end, I had to remove the BN to get around the issue.
What is the purpose of setting batch_size to 1? When I replaced the backbone, some layers needed BatchNorm, which requires more than one sample per batch to compute the mean. In the end, I had to remove the BN to get around the issue.
Batch size was originally set to 1 so that we could fit the model into memory during training.
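Instead of removing the normalization, one batch-size-independent option is GroupNorm. A sketch of swapping it into an existing backbone (this helper is not part of the original code):

```python
import torch.nn as nn


def bn_to_gn(module, groups=32):
    """Recursively replace BatchNorm2d with GroupNorm, which normalizes
    per-sample and therefore works with batch_size = 1 (an alternative
    to removing the normalization altogether)."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # fall back to 1 group (LayerNorm-like) if channels don't divide
            g = groups if child.num_features % groups == 0 else 1
            setattr(module, name, nn.GroupNorm(g, child.num_features))
        else:
            bn_to_gn(child, groups)
    return module
```

Note that GroupNorm weights are not interchangeable with BatchNorm's pretrained statistics, so the affected layers would need retraining.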
I am curious about the mIoU metric in the paper: is it max IoU or mean IoU? According to the source code, the evaluation outputs the maximum IoU, but the popular approach is to compute the overall IoU or the mean IoU.
In addition, would you please provide results with ResNet-50 as the backbone? I want to do a comparison study. I would really appreciate your help. @andfoy
I am curious about the mIoU metric in the paper: is it max IoU or mean IoU? According to the source code, the evaluation outputs the maximum IoU, but the popular approach is to compute the overall IoU or the mean IoU.
@13331112522, sorry for the late reply! By mIoU we refer to the sum of the intersections over the sum of the unions, which is different from the mean IoU, which corresponds to the mean of the per-object intersection-over-union ratios. You should find that while mean IoU penalizes each object equally, mIoU is biased towards large objects.
In addition, would you please provide results with ResNet-50 as the backbone? I want to do a comparison study. I would really appreciate your help. @andfoy
Sadly, we don't have ResNet-50 weights available; the only way to obtain them is to retrain the model from scratch.
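To make the distinction between the two metrics concrete, a small pure-Python comparison:

```python
def cumulative_iou(pairs):
    """mIoU in the paper's sense: sum of intersections over sum of unions.
    pairs: list of (intersection, union) pixel counts, one per object."""
    inter = sum(i for i, _ in pairs)
    union = sum(u for _, u in pairs)
    return inter / union


def mean_iou(pairs):
    """Mean IoU: average of the per-object intersection-over-union ratios."""
    return sum(i / u for i, u in pairs) / len(pairs)


# One large object (IoU 0.9) and one small object (IoU 0.1):
pairs = [(90, 100), (1, 10)]
print(cumulative_iou(pairs))  # 91/110 ~ 0.83, pulled toward the large object
print(mean_iou(pairs))        # (0.9 + 0.1)/2 = 0.5, each object weighted equally
```

The large object dominates the cumulative score, which is exactly the bias described above.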