greedy_infomax's Issues

ResNet Encoder Layer Numbers

With the pre-activation ResNet encoders that are used, my understanding of the layer counts doesn't align with how you've labelled them.

block_dims = [3, 4, 6, 6, 6, 6, 6]
num_channels = [64, 128, 256, 256, 256, 256, 256]
full_model = nn.ModuleList([])
encoder = nn.ModuleList([])
if opt.resnet == 34:
    self.block = Resnet_Encoder.PreActBlockNoBN
elif opt.resnet == 50:
    self.block = Resnet_Encoder.PreActBottleneckNoBN

The total number of blocks in block_dims is 37, and there is also the initial conv1. When using PreActBlockNoBN (2 layers per block), does this not result in a ResNet-75? When using PreActBottleneckNoBN (3 layers per block), does this not result in a ResNet-112?
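
For reference, the arithmetic behind those counts (a quick standalone check using the block_dims above):

block_dims = [3, 4, 6, 6, 6, 6, 6]

total_blocks = sum(block_dims)            # 37 residual blocks in total
layers_basic = total_blocks * 2 + 1       # PreActBlockNoBN: 2 conv layers per block, +1 for conv1 -> 75
layers_bottleneck = total_blocks * 3 + 1  # PreActBottleneckNoBN: 3 conv layers per block, +1 for conv1 -> 112

print(layers_basic, layers_bottleneck)    # 75 112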

Please let me know if I've misunderstood something.

The following problem occurred when I was building the model following the README.md. Is there a problem with the InfoNCE_Loss function?

The problem: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 6144]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
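
As the hint says, anomaly detection will point at the forward operation that produced the failing tensor. A minimal way to enable it (the model and batch below are stand-ins, not the repository's code):

import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # global switch; slow, debugging only

model = nn.Linear(8, 1)    # stand-in for the actual model
batch = torch.randn(4, 8)  # stand-in for a data batch

# With anomaly mode on, the RuntimeError is augmented with a traceback
# of the forward op whose gradient computation failed.
loss = model(batch).mean()
loss.backward()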

Problem downloading the audio data

bash download_audio_data.sh fails with the error "You don't have permission to access this resource."

I would like to know why. Thanks!

Audio training time

What should be the training time (as in time per step, per epoch, or time to convergence) for the audio experiment? I'm running it on a Tesla K80 and it seems to be taking ~5.7 seconds per step, which I'm assuming is much slower than expected.

InfoNCE_Loss skipping

In your paper the following is stated:

For each patch x_{i,j} in row i and column j of this grid, we predict up to K patches x_{i+K,j} in the rows underneath, skipping the first overlapping patch x_{i+1,j}

However, in the implementation of InfoNCE_Loss, skip_step is applied for all k predictions. This means that x_{i+k,j} is being skipped for all k instead of only when k = 1, so nearby non-overlapping patches are also being skipped when k > 1.

for k in range(1, self.k_predictions + 1):
    ### compute log f(c_t, x_{t+k}) = z^T_{t+k} W_k c_t
    # compute z^T_{t+k} W_k:
    ztwk = (
        self.W_k[k - 1]
        .forward(z[:, :, (k + skip_step):, :])  # B, C, H, W

By simply changing skip_step to 0 after the first iteration (as sketched below), I have seen an improvement. I haven't run it for long enough to compare this improvement to the results stated in your paper.
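
A sketch of that change, reusing the names from the excerpt above (not the repository's exact code; it assumes skip_step is 1 on entry):

for k in range(1, self.k_predictions + 1):
    # Apply the overlap skip only on the first prediction step (k = 1),
    # as described in the paper; later steps use no extra offset.
    cur_skip = skip_step if k == 1 else 0
    ztwk = (
        self.W_k[k - 1]
        .forward(z[:, :, (k + cur_skip):, :])  # B, C, H, W
    )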

Training time and memory usage

Hi, GIM looks super cool, and thanks for the code!
For the vision and audio tasks, I would like to ask how long it takes for the network to converge,
and how much GPU memory I would need for training (I only have one RTX 3080).
I would appreciate a response! :)

Same permutation for all audio samples?

Hello,

In the 3rd sampling strategy (sampling from the same sequence) for the audio subtask, I noticed that the permutation of negative samples is the same for all audio sequences in the batch. This is not necessarily incorrect, but it can introduce some sort of bias based on the locations of the negative samples.

I think it would be better to have random permutations for all audio samples, and the fix is easy (see the sketch below) :)

Reference:

elif self.opt.sampling_method == 2:
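
For illustration, drawing an independent permutation per sample could look something like this (a standalone sketch with made-up shapes, not the repository's code):

import torch

B, L, C = 8, 128, 512     # example batch size, sequence length, channels
z = torch.randn(B, L, C)  # stand-in for the batch of encodings

# One random permutation of the time axis per sample, instead of a
# single permutation shared across the whole batch:
perm = torch.argsort(torch.rand(B, L), dim=1)                     # (B, L)
z_neg = torch.gather(z, 1, perm.unsqueeze(-1).expand(-1, -1, C))  # (B, L, C)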

Code for Vision Experiment

Hello there. I'd like to ask if there is any plan to release the code for the vision experiment mentioned in the paper. Thanks!

Failure to compute gradient

Hi,

I have found your paper and code extremely interesting!

I am trying to run the vision training, but am coming across an error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 256, 1, 1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

When I do torch.autograd.set_detect_anomaly(True), the issue is traced to line 57 in InfoNCE_Loss:

ztwk = (
    self.W_k[k - 1]
    .forward(z[:, :, (k + skip_step):, :])  # B, C, H, W

Any idea why this is happening?

Pre-trained vision model?

Hi! Is there a pre-trained vision model available somewhere? It would be really helpful to avoid having to re-train the model from scratch.

Thanks!
Nikhil

CUDNN_STATUS_NOT_SUPPORTED error caused by non-contiguous variable

Thanks for making the code available.

When running the command python -m GreedyInfoMax.vision.main_vision --download_dataset --save_dir vision_experiment, I ran into the following error:

CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

I traced the problem to the forward pass in Resnet_Encoder:

out = F.adaptive_avg_pool2d(z, 1)
out = out.reshape(-1, n_patches_x, n_patches_y, out.shape[1])
out = out.permute(0, 3, 1, 2)

The permute() operation seems to make the variable out non-contiguous (see https://stackoverflow.com/questions/48915810/pytorch-contiguous for discussion).

The issue is solved by making it contiguous again:

out = F.adaptive_avg_pool2d(z, 1)
out = out.reshape(-1, n_patches_x, n_patches_y, out.shape[1])
out = out.permute(0, 3, 1, 2).contiguous()
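
A quick standalone way to see the effect (not repository code):

import torch

x = torch.randn(2, 4, 3, 3)
y = x.permute(0, 3, 1, 2)

print(y.is_contiguous())               # False: permute only changes strides
print(y.contiguous().is_contiguous())  # True: contiguous() copies to a dense layout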

I hope this helps if someone else runs into a similar problem.

System specifics:
Ubuntu 16.04.6 LTS
Python 3.6
CUDA 10.0
cuDNN 7.6.4
PyTorch 1.4.0

How to speed up training

Nice work! I wonder how to speed up training and reduce memory usage. As far as I can see, the released code uses .detach() to prevent backpropagation between modules, but I didn't find that this speeds up training or reduces memory usage. Are there any other operations? Looking forward to your reply.
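
For context, the module-wise pattern in question looks roughly like this (a simplified sketch of greedy training with detached inputs, using placeholder modules and losses rather than the repository's code):

import torch
import torch.nn as nn

# Hypothetical stand-ins for the greedily trained modules and their objectives.
modules = nn.ModuleList([nn.Linear(64, 64) for _ in range(3)])
optimizers = [torch.optim.Adam(m.parameters()) for m in modules]

x = torch.randn(32, 64)  # dummy input batch

for module, opt in zip(modules, optimizers):
    out = module(x)
    loss = out.pow(2).mean()  # placeholder for the module's InfoNCE loss
    opt.zero_grad()
    loss.backward()           # gradients never reach earlier modules
    opt.step()
    x = out.detach()          # cut the graph before feeding the next module

The detach keeps each module's computation graph local; on a single GPU in one process the speed and memory gains are limited, and the main payoff seems to be that it makes training modules on separate devices possible.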

Question on parallel training

Hi, your work is very interesting, thanks for sharing the code!

As I understand it, the losses for the different sub-networks are calculated in a for loop in https://github.com/loeweX/Greedy_InfoMax/blob/master/GreedyInfoMax/vision/models/FullModel.py#L102,
therefore it is "asynchronous".

However, I have one question on parallel training:
This great blog says: "This reduces the amount of communication needed between modules tremendously and allows us to train modules on separate devices."
Does that mean you put the three submodules on three GPUs and perform the gradient updates simultaneously?

Besides, is it possible to train different modules on one GPU in parallel?
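
For concreteness, the kind of placement asked about above might look roughly like this (a hypothetical sketch assuming three GPUs; true simultaneity would additionally require pipelining activations across batches):

import torch
import torch.nn as nn

devices = ["cuda:0", "cuda:1", "cuda:2"]
# Hypothetical stand-ins for the three sub-networks, one per GPU:
modules = [nn.Linear(64, 64).to(d) for d in devices]

x = torch.randn(32, 64)
for module, device in zip(modules, devices):
    x = x.to(device)  # only detached activations cross the device boundary
    out = module(x)
    # ... compute this module's loss and step its optimizer here ...
    x = out.detach()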

Thanks very much for your time!
