loewex / greedy_infomax
Code for the paper: Putting An End to End-to-End: Gradient-Isolated Learning of Representations
Home Page: https://arxiv.org/abs/1905.11786
License: MIT License
With the Pre-activation ResNet Encoders that are used, my understanding of the layer numbers doesn't align with how you've labelled them.
Greedy_InfoMax/GreedyInfoMax/vision/models/FullModel.py
Lines 27 to 36 in 8f91dc2
The total number of blocks in block_dims is 37 and there is also the initial conv1. When using PreActBlockNoBN, 2 layers per block, does this not result in a ResNet75? When using PreActBottleneckNoBN, 3 layers per block, does this not result in a ResNet112?
Please let me know if I've misunderstood something.
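The arithmetic behind the question can be checked with a small sketch. It assumes depth is counted as the initial conv1 plus a fixed number of layers per block, and it ignores shortcut convolutions and any final linear layer; the helper name count_conv_layers is hypothetical and not part of the repository.

```python
# Block total (37) and layers per block (2 or 3) are taken from the question above.
NUM_BLOCKS = 37


def count_conv_layers(num_blocks: int, layers_per_block: int, stem_layers: int = 1) -> int:
    """Count layers as stem + blocks * layers-per-block (assumed counting rule)."""
    return stem_layers + num_blocks * layers_per_block


basic_depth = count_conv_layers(NUM_BLOCKS, 2)       # PreActBlockNoBN: 2 layers per block
bottleneck_depth = count_conv_layers(NUM_BLOCKS, 3)  # PreActBottleneckNoBN: 3 layers per block

print(basic_depth, bottleneck_depth)  # 75 112
```

Under that counting rule the two totals match the 75 and 112 given in the question; a different convention for what counts as a layer would give different numbers.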
The problem: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 6144]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Update:
I managed to modify this loss, but when I train the model the loss reaches 1.8xxx and the accuracy is only 27% to 30%. Is this normal?
Running bash download_audio_data.sh fails with the error "You don't have permission to access this resource."
I would like to know why. Thanks.
What should be the training time (as in time per step, per epoch, or time to convergence) for the audio experiment? I'm running it on a Tesla K80 and it seems to be taking ~5.7 seconds per step, which I'm assuming is much slower than expected.
In your paper the following is stated:
For each patch x_{i,j} in row i and column j of this grid, we predict up to K patches x_{i+K,j} in the rows underneath, skipping the first overlapping patch x_{i+1,j}
However, in the implementation of InfoNCE_Loss, skip_step is applied for all k predictions. This means that x_{i+k,j} is skipped for all k instead of only when k is 1, so nearby non-overlapping patches are also skipped when k > 1.
Greedy_InfoMax/GreedyInfoMax/vision/models/InfoNCE_Loss.py
Lines 52 to 57 in 8f91dc2
By simply changing skip_step to 0 after the first iteration I have seen an improvement, although I haven't run it for long enough to compare this improvement to the results stated in your paper.
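To make the two variants concrete, here is a minimal sketch of which row offsets each one targets. It rests on an assumption that is not confirmed by the text above: that the target row for prediction step k is i + k + skip_step. The helper target_offsets and the value K = 5 are hypothetical and only serve the illustration.

```python
def target_offsets(k_predictions, skip_for):
    """Row offset of the target patch for each step k.

    Assumes offset = k + skip, where skip_for(k) is the skip applied at step k.
    """
    return [k + skip_for(k) for k in range(1, k_predictions + 1)]


K = 5  # illustrative number of prediction steps

# skip_step = 1 applied at every step, as the issue describes the current code
constant_skip = target_offsets(K, lambda k: 1)

# skip_step = 1 only at k = 1, then 0, as in the change described above
first_only = target_offsets(K, lambda k: 1 if k == 1 else 0)

print(constant_skip)  # [2, 3, 4, 5, 6]
print(first_only)     # [2, 2, 3, 4, 5]
```

Under this assumed indexing, the second variant maps k = 1 and k = 2 to the same offset, which may be worth checking against the actual slicing in InfoNCE_Loss.py before drawing conclusions from the observed improvement.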
Hi, GIM looks super cool and thanks for the code!
For the vision and audio tasks, I would like to ask how long it takes for the network to converge
and how much memory I would need for training (I only have one RTX 3080).
A response would be much appreciated! :)
Hello,
In the 3rd sampling strategy (sampling from the same sequence) for the audio subtask, I noticed that the permutation of negative samples is the same for all audio sequences in the batch. This is not necessarily incorrect, but it can introduce some sort of bias based on the locations of the negative samples.
I think it would be better to have independent random permutations for each audio sample, and the fix is easy :)
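The difference between the two sampling schemes can be sketched in plain Python. This is an illustration only, not the repository's code: the function names, batch size and sequence length are hypothetical, and the actual implementation operates on tensors.

```python
import random


def shared_permutation(batch_size, seq_len, rng):
    """One permutation of time steps, reused for every sequence in the batch."""
    perm = list(range(seq_len))
    rng.shuffle(perm)
    return [perm[:] for _ in range(batch_size)]


def per_sequence_permutations(batch_size, seq_len, rng):
    """An independently drawn permutation of time steps for each sequence."""
    perms = []
    for _ in range(batch_size):
        perm = list(range(seq_len))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


rng = random.Random(0)
shared = shared_permutation(4, 8, rng)
independent = per_sequence_permutations(4, 8, rng)

for row in independent:
    print(row)
```

With the shared scheme every sequence draws its negatives from the same positions; with the per-sequence scheme the positions vary across the batch, which is the change proposed above.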
Reference :
Hello there. I'd like to ask if there's any plan on the release of the code for vision experiment mentioned in the paper? Thanks!
Hi,
I have found your paper and code extremely interesting!
I am trying to run the vision training, but am coming across an error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 256, 1, 1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
When I do torch.autograd.set_detect_anomaly(True) the issue is traced to line 57 in InfoNCE_Loss:
Greedy_InfoMax/GreedyInfoMax/vision/models/InfoNCE_Loss.py
Lines 55 to 57 in 8f91dc2
Any idea why this is happening?
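For readers unfamiliar with this error, the sketch below reproduces the same class of RuntimeError on the CPU in a few lines. It shows one common way the error arises (a tensor saved for the backward pass is updated in place before backward() runs, which is also what an optimizer step does to parameters); whether this is the cause in the report above is not established here.

```python
import torch

w = torch.nn.Parameter(torch.ones(3))
loss = (w * w).sum()  # autograd saves w so it can compute the gradient later

with torch.no_grad():
    w.add_(1.0)  # in-place update of w, similar in effect to an optimizer step

try:
    loss.backward()  # the saved w no longer matches the version autograd recorded
    raised = False
except RuntimeError as err:
    raised = "inplace operation" in str(err)

print(raised)
```

In this toy case, calling loss.backward() before the in-place update avoids the error.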
Hi! Is there a pre-trained vision model available somewhere? It would be really helpful instead of having to re-train the model from scratch.
Thanks!
Nikhil
Thanks for making the code available.
When running the command python -m GreedyInfoMax.vision.main_vision --download_dataset --save_dir vision_experiment
I had problems with the following error:
CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
I traced the problem to the forward pass in Resnet_Encoder:
out = F.adaptive_avg_pool2d(z, 1)
out = out.reshape(-1, n_patches_x, n_patches_y, out.shape[1])
out = out.permute(0, 3, 1, 2)
The permute() operation seems to make the variable out non-contiguous (see https://stackoverflow.com/questions/48915810/pytorch-contiguous for discussion).
The issue is solved by making it contiguous again:
out = F.adaptive_avg_pool2d(z, 1)
out = out.reshape(-1, n_patches_x, n_patches_y, out.shape[1])
out = out.permute(0, 3, 1, 2).contiguous()
I hope this helps if someone else runs into a similar problem.
System specifics:
Ubuntu 16.04.6 LTS
Python 3.6
CUDA 10.0
cudnn 7.6.4
Pytorch 1.4.0
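The effect of permute() on memory layout described above can be checked with a small CPU-only sketch. The tensor sizes are made up for illustration (7 x 7 patches, 16 channels), and the sketch does not attempt to reproduce the cuDNN error itself.

```python
import torch
import torch.nn.functional as F

n_patches_x, n_patches_y = 7, 7  # illustrative values, not taken from the repository
z = torch.randn(2 * n_patches_x * n_patches_y, 16, 4, 4)

out = F.adaptive_avg_pool2d(z, 1)                               # shape (98, 16, 1, 1)
out = out.reshape(-1, n_patches_x, n_patches_y, out.shape[1])   # shape (2, 7, 7, 16)
permuted = out.permute(0, 3, 1, 2)                              # shape (2, 16, 7, 7), a strided view
fixed = permuted.contiguous()                                   # same values, copied into dense memory

print(permuted.is_contiguous(), fixed.is_contiguous())  # False True
print(torch.equal(permuted, fixed))                     # True
```

The call to contiguous() changes only the memory layout, not the values, so it should be safe to add wherever a downstream operation requires contiguous input.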
Nice work! I wonder how to speed up training and reduce memory usage. As far as I can see, the released code uses .detach() to prevent backpropagation, but I didn't find that it speeds up training or reduces memory usage. Are there any other operations involved? Looking forward to your reply.
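As a reference point for this question, the following sketch shows what .detach() does on its own: it cuts the autograd graph between two modules so that the second module's loss produces no gradients for the first. The two nn.Linear modules and their sizes are hypothetical stand-ins, not the repository's encoders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
module1 = nn.Linear(4, 4)  # stand-in for an earlier module
module2 = nn.Linear(4, 1)  # stand-in for a later module

x = torch.randn(8, 4)
h = module1(x)
out = module2(h.detach())  # detach() blocks gradients from flowing back into module1
loss2 = out.pow(2).mean()
loss2.backward()

print(module1.weight.grad is None)      # True
print(module2.weight.grad is not None)  # True
```

In this sketch both modules still run their forward pass in the same process, so detach() by itself only isolates the gradients; any speed or memory benefit would depend on how the per-module training is organized around it.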
Hi, your work is very interesting, thanks for sharing the code!
As I understand it, the losses for the different sub-networks are calculated in a for loop in https://github.com/loeweX/Greedy_InfoMax/blob/master/GreedyInfoMax/vision/models/FullModel.py#L102 and therefore it is "asynchronous".
However, I have one question on parallel training:
This great blog says: "This reduces the amount of communication needed between modules tremendously and allows us to train modules on separate devices."
Does that mean that you put the three submodules on three GPUs and do the gradient updates simultaneously?
Besides, is it possible to train different modules on one GPU in parallel?
Thanks very much for your time!