rdevon / dim
Deep InfoMax (DIM), or "Learning Deep Representations by Mutual Information Estimation and Maximization"
License: BSD 3-Clause "New" or "Revised" License
Hi,
I get a TypeError when I run main.py. My working directory is E:\pythonProjectTorch/DIM, and the message is:
Traceback (most recent call last):
File "E:/pythonProjectTorch/DIM/scripts/main.py", line 70, in <module>
controller = Controller(inputs=dict(inputs='data.images'), **models)
File "E:\pythonProjectTorch\DIM\cortex_DIM\models\controller.py", line 126, in __init__
super().__init__(inputs=inputs)
TypeError: __init__() got an unexpected keyword argument 'inputs'
Hi, thank you for your interesting work and for releasing the code! Would you kindly release the code for training a generator by matching to a prior implicitly?
Thanks a lot!
Hi and thanks for your paper,
I see this error when I run
pip install .
Could not find a version that satisfies the requirement cortex==0.13a0 (from cortex-DIM==0.13) (from versions: 0.1a0, 0.11)
No matching distribution found for cortex==0.13a0 (from cortex-DIM==0.13)
What's wrong?
Thanks for your DIM code, but I find it puzzling. Could you provide a version written in PyTorch? Thanks.
@rdevon Can you add a small tutorial on how to use a custom dataset? Thank you in advance!
Is there any way to use your code to find the mutual information between two images?
Hi.
As can be seen in the paper, prior matching is a bi-level optimization problem: we should maximize the objective with respect to the encoder parameters and minimize it with respect to the discriminator parameters. However, the prior loss is simply added to the total loss, which means both optimization problems share the same objective (i.e. maximization). Does this contradict Eq. 7 in the paper?
Dear Professor R Devon Hjelm,
Is the mutual information between inputs and global representations constant, since the network is deterministic? If so, how should the mutual information used in the loss function be understood?
Thanks
Traceback (most recent call last):
File "scripts/main.py", line 66, in <module>
run(controller)
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/main.py", line 38, in run
data.setup(**exp.ARGS['data'])
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/_lib/data/__init__.py", line 69, in setup
plugin.handle(source, **data_args)
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/built_ins/datasets/torchvision_datasets.py", line 85, in handle
torchvision_path = self.get_path('torchvision')
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/plugins.py", line 104, in get_path
'{} not found in {} data_paths'.format(source, _config_name))
KeyError: 'torchvision not found in .cortex.yml data_paths'
Can you tell me how to fix this? Thanks.
Do you think the ResNet encoder works on PyTorch 1.2.0?
Could you please test it?
python scripts/main.py local classifier --d.source ImageFolder --encoder_config resnet19_32x32 --local.mode nce --local.mi_units 1024 -n DIM_Resnet --t.epochs 1000
Which version of PyTorch do you recommend, or which have you tested?
While training is running, I can check the training curves via visdom.
However, I accidentally closed the visdom server, and all training logs are gone.
Some models are still training, so the newly opened visdom server shows their training curves.
However, the training curves of models that had already finished are lost.
Is there a way to display the training curves of finished models on my visdom server?
First of all, thank you very much for providing the source code for your work.
I was going through the code to get a better understanding of the paper, and I have some questions.
For CIFAR-10 with local DIM, the networks are as follows:
(encoder): Convnet(
  (layers): Sequential(
    (layer0): Sequential(
      (conv): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (ReLU): ReLU(inplace=True)
    [.........]
    (layer5): Linear(in_features=1024, out_features=64, bias=True))
(local_global_net): MI1x1ConvNet(
  (block_nonlinear): Sequential(
    (0): Conv2d(64, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(2048, 2048, kernel_size=(1, 1), stride=(1, 1))
    [.........]
  (linear_shortcut): Conv2d(64, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False))
With the local and global features, the next step is to compute the loss by taking the product of the two feature sets, in the form of [64, 64, 64, 1], and using the diagonal entries as positives and the rest as negatives.
My question is: which part of the network constitutes the discriminator, given that the fake and real samples are computed as follows:
I can't see where the discriminator Dw appears in the implementation.
Thank you.
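One possible reading of the question above, offered as a sketch rather than a statement about the repository: if the score is just the dot product of the embedded local and global features, then no separate discriminator module would show up in the printed model, and the scoring function is the pair of embedding networks plus the product. The toy example below uses plain Python lists, one local feature per image, and made-up values; `score_matrix` is a hypothetical helper, not a function from the DIM code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def score_matrix(local_feats, global_feats):
    """scores[i][j] = <local feature of image i, global feature of image j>."""
    return [[dot(l, g) for g in global_feats] for l in local_feats]

# Toy batch of 3 images with 2-dimensional embeddings (illustrative values only).
local_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feats = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

scores = score_matrix(local_feats, global_feats)
n = len(scores)

# Diagonal entries pair features from the same image (positives);
# off-diagonal entries pair features from different images (negatives).
positives = [scores[i][i] for i in range(n)]
negatives = [scores[i][j] for i in range(n) for j in range(n) if i != j]
```

In this picture a batch of N images yields N positive scores and N(N-1) negative scores from a single product, with no extra network evaluated on concatenated pairs.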
Hi Dr,
I've cloned your repo, and although I cloned the cortex-dev branch I again get
Could not find a version that satisfies the requirement cortex==0.13a0 (from cortex-DIM==0.13) (from versions: 0.1a0, 0.11)
No matching distribution found for cortex==0.13a0 (from cortex-DIM==0.13)
What do you suggest?
Originally posted by @masoudpz in #14 (comment)
Hello, thanks for your great paper and codes!
How can I select the "Concat-and-convolve architecture" for maximizing local MI?
Hi,
I am getting this error when trying to run the code.
File "scripts/main.py", line 65, in <module>
controller = Controller(inputs=dict(inputs='data.images'), **models)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\cortex_DIM\models\controller.py", line 126, in __init__
super().__init__(inputs=inputs)
TypeError: __init__() got an unexpected keyword argument 'inputs'
Best
Hello,
What is the difference in performance between the Concat-Convolve and Encoder-Dot architectures?
Thanks!
Hello!
Thanks for your great repo.
I followed all the instructions and created a new conda environment specifically for this code.
I'm not sure why, but it seems to fail when loading modules.
I'm working in the ./DIM directory and ran the command python scripts/main.py --help
Thanks in advance for your support.
Hi, thank you for your interesting DIM model and the open-source code.
However, I am confused about the implementation of the prior matching part:
DIM/cortex_DIM/models/discriminator.py
Line 62 in bac4765
Hi again.
How can I feed an image into the network and get the output vector (mutual information) and the loss value for my input?
I want to use my trained network. Is that possible?
Hi,
I was looking through your code, namely here: https://github.com/rdevon/DIM/blob/bac4765a8126746675f517c7bfa1b04b88044d51/cortex_DIM/functions/dim_losses.py#L109 and noticed that in the docstring it states that the losses take the feature maps l and m directly. However, in the paper, it is stated that the losses take the score that the discriminator gives to the feature maps. In the implementation where does the forward pass to the discriminator networks occur?
I tried the command below, but it runs the training loop on the CPU. Can you provide some examples?
CUDA_VISIBLE_DEVICES=3,4,5,6 python scripts/main.py local classifier --d.source CIFAR10 --encoder_config resnet19_32x32 --local.mode nce --t.epochs 1000 -d 3 -n gpu_test
Hi,
When I load the .t7 file, I can see 4 nets in a dict:
'Controller.encoder'
'conv2.classifier'
'fc4.classifier'
'glob-1.classifier'
What is the role of each net in the code? Which one can I use to calculate the mutual information between 2 images?
Best regards
Also, the output of Controller.encoder is a vector of size 64. Is this vector the encoding of the input with maximal mutual information?
In Table 6 on page 12 of the paper, you show f(u) = (u-1)^2 for Pearson chi2 and state that the domain of f* is R.
But the domain of f* is actually (2, +infty), because the paper imposes the constraint u > 0.
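A worked check of the conjugate may help frame this question. It is a sketch under one assumed convention, namely that f is extended by +infinity outside u >= 0 so that the supremum runs over u >= 0 only; the convention is an assumption here, not something taken from the paper.

```latex
f^*(t) = \sup_{u \ge 0} \left\{ ut - (u-1)^2 \right\}

% Stationary point: t - 2(u-1) = 0 \;\Rightarrow\; u^\star = 1 + t/2,
% which is feasible (u^\star \ge 0) if and only if t \ge -2.

f^*(t) =
\begin{cases}
  \dfrac{t^2}{4} + t, & t \ge -2, \\[6pt]
  -1, & t < -2 \quad \text{(supremum attained at } u = 0\text{)}.
\end{cases}
```

Under this convention f* is finite for every real t and the two branches agree at t = -2, but the quadratic expression t^2/4 + t is only valid for t >= -2, which is where the constraint on u starts to matter.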
Hi, thank you very much for your paper and code.
I now have a small number of grayscale images (about 800) and I want to know if I can apply the DIM model to encode these images and then apply them to classification tasks. In the demo, I found that the CIFAR-10 dataset contained 60,000 color images, so I was worried that my dataset was too small. In addition, how should I use your code, is it enough to just change the dataset (our image is single-channel)?
Since I am a novice at deep learning, any suggestions will be very helpful. Thank you again
Hi Dr.
Thanks for the paper and code!
I have two questions about positive and negative samples for the JSD estimator in dim_losses.py:
# Compute the positive and negative score. Average the spatial locations.
E_pos = get_positive_expectation(u, measure, average=False).mean(2).mean(2)
E_neg = get_negative_expectation(u, measure, average=False).mean(2).mean(2)
E_pos = (E_pos * mask).sum() / mask.sum()
E_neg = (E_neg * n_mask).sum() / n_mask.sum()
The code comment says "Since we have a big tensor with both positive and negative samples, we need to mask", but I'm still confused about the mask and the purpose of E_neg * n_mask. How are the negative samples generated?
Hi,
I've read your paper, which is quite inspiring!
But I have some questions regarding Eq. 4, which states "x' is an input sampled from \tilde{P} x P". Does it mean that we pick (x, y) independently at random from X and Y (where Y is the distribution of representations generated from X)? If so, which part of the code implements this sampling?
Thanks a lot!
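The two questions above, about the masks and about where samples from the product of marginals come from, can be illustrated together. The sketch below assumes a square score table in which entry (i, j) scores features of image i against features of image j, and it borrows the per-entry JSD terms quoted in another issue on this page (`log_2 - softplus(-p)` and `softplus(-q) + q - log_2`). `masked_jsd_terms` is a hypothetical name and the code is plain Python, not the repository's implementation.

```python
import math

LOG_2 = math.log(2.0)

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def masked_jsd_terms(scores):
    """Average the positive term over the diagonal and the negative term off it."""
    n = len(scores)
    pos_sum = 0.0
    neg_sum = 0.0
    for i in range(n):
        for j in range(n):
            u = scores[i][j]
            if i == j:
                # mask == 1 here: both features come from the same image.
                pos_sum += LOG_2 - softplus(-u)
            else:
                # n_mask == 1 here: the features come from different images,
                # which plays the role of a draw from the product of marginals.
                neg_sum += softplus(-u) + u - LOG_2
    e_pos = pos_sum / n              # corresponds to dividing by mask.sum()
    e_neg = neg_sum / (n * (n - 1))  # corresponds to dividing by n_mask.sum()
    return e_pos, e_neg

# Toy scores: high on the diagonal, low elsewhere (illustrative values only).
scores = [[10.0, -10.0], [-10.0, 10.0]]
e_pos, e_neg = masked_jsd_terms(scores)
```

Read this way, negatives are not sampled in a separate step: every mismatched pair already present in the batch is used, and the masks only select which entries contribute to which average.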
Thank you for sharing your code. I'm happy to try this seemingly attractive representation learning idea on a different backbone. e.g. ResNet50.
Happily, you mention ResNet50 in Table 3 of your paper, so I'm looking into it, and I have several questions about it.
Table3. CIFAR10 (no data augmentation) DIM(L) single global
Using the FoldedResNet code at https://github.com/rdevon/DIM/blob/master/cortex_DIM/configs/resnets.py#L129
I was able to reproduce a result (conv: 80, Y(64): 71) similar to the one in the paper (80.95) with the following command:
CUDA_VISIBLE_DEVICES=0 python scripts/main.py local classifier --d.source CIFAR10 --d.batch_size 128 --o.learning_rate 0.0002 --encoder_config foldresnet34_32x32 --local.mode nce --local.mi_units 1024 -n CIFAR10_DIM_L_NCE_foldresnet34 --t.epochs 500
CIFAR10_DIM_L_NCE_foldresnet34_final.t7_global_test_acc
I know you mention setting mi_units to 2048 in Appendix 2, but this change does not seem to explain a 9% performance difference, so I looked into the backbone.
While trying to close the performance gap, I found that the ResNet used in this code is somewhat different from the one I usually use.
According to the image above (from the original ResNet paper), the 34-layer and 50-layer structures differ only in the use of bottleneck blocks instead of basic blocks.
In your code, even though the name says resnet37, using bottleneck blocks makes it a ResNet50-like encoder.
I also took into account that using ResNet for CIFAR10 requires small modifications to the ImageNet 224x224 version (e.g. the first conv kernel becomes 3x3 instead of 7x7, with no max pooling after it).
However, there are a few more changes that I cannot explain. Could you clarify the thinking behind these small changes?
I tried various modified ResNet structures based on your code, and most of the time the classification accuracy rises and then falls again while the loss keeps decreasing (perhaps heading to a degenerate or trivial solution). Have you experienced this phenomenon in your work? If the above changes didn't matter much to the final result I wouldn't ask. It seems DIM training depends on specific design choices, and I'm not sure what they are.
Thanks a lot for your amazing work! However, when I tried to run the code, it raised this error:
File "deep_infomax.py", line 107, in build
self.classifier_c.build(dim_in=conv_units * conv_x * conv_y)
File "XXX/.local/lib/python3.6/site-packages/cortex/_lib/models.py", line 186, in wrapped
return fn(*args, **kwargs)
File "XXX/.local/lib/python3.6/site-packages/cortex/built_ins/models/classifier.py", line 35, in build
classifier = FullyConnectedNet(dim_in, dim_out=dim_l, **classifier_args)
TypeError: type object argument after ** must be a mapping, not NoneType
I didn't change anything in either cortex or DIM. Since I couldn't find the __init__ in class SimpleClassifier, I have no idea how to fix the bug. Would you mind checking it?
I would like to see some random samples from your new generative models, especially those trained on the CelebA or LSUN datasets. The random samples from the model trained on Tiny-ImageNet do not actually look good.
Hello,
thank you for publishing your code - outstanding work :)
However I have a question regarding JSD/GAN based estimators and differences between implementation and formulation in your paper:
Eq. 4:
Eq. 7:
At the same time, in your code:
(for the JSD estimator)
if mode == 'fd':
    loss = fenchel_dual_loss(l_enc, m_enc, measure=measure)
[...]
E_pos = get_positive_expectation(u, measure, average=False).mean(2).mean(2)
E_neg = get_negative_expectation(u, measure, average=False).mean(2).mean(2)
[...]
Ep = log_2 - F.softplus(-p_samples)  # Note JSD will be shifted
Eq = F.softplus(-q_samples) + q_samples - log_2  # Note JSD will be shifted
While I know where the log_2 comes from [Nowozin et al., 2016], the addition of q_samples in Eq is a bit more mysterious :D
And then for the prior matching:
if not loss_type or loss_type == 'minimax':
    return get_negative_expectation(q_samples, measure)
elif loss_type == 'non-saturating':
    return -get_positive_expectation(q_samples, measure)
It seems that you are using just half of Eq. 7 to obtain the loss value.
Could you clarify those differences (maybe I am missing something in the code)? I have been trying to merge DIM with my existing code (a slightly different setup, but the two should work together properly) and cannot get it to work well.
Thanks in advance!
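On the extra `q_samples` term quoted above, one observation that may help: softplus(-q) + q equals softplus(q), because log(1 + e^{-q}) + q = log(e^{q} + 1). The quoted `Eq` line is therefore the same as -log(1 - sigmoid(q)) - log 2, the usual term for samples from the product of marginals, shifted by log 2. The snippet below only checks this identity numerically with the standard library; the function names are made up for illustration.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def e_q_as_quoted(q):
    # Form quoted in the issue: softplus(-q) + q - log 2.
    return softplus(-q) + q - math.log(2.0)

def e_q_log_form(q):
    # Equivalent form: -log(1 - sigmoid(q)) - log 2.
    return -math.log(1.0 - sigmoid(q)) - math.log(2.0)

# The two forms agree across a range of scores.
for q in (-3.0, -0.5, 0.0, 1.25, 4.0):
    assert math.isclose(e_q_as_quoted(q), e_q_log_form(q), rel_tol=1e-9, abs_tol=1e-12)
```

The same manipulation applied to the quoted `Ep` line gives log(sigmoid(p)) + log 2, so Ep - Eq is the familiar log(sigmoid(p)) + log(1 - sigmoid(q)) objective plus the constant 2 log 2.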
Hi! When I try to retrain DIM on CIFAR10, I hit this error:
Traceback (most recent call last):
File "scripts/deep_infomax.py", line 227, in <module>
run(DIM())
File "/usr/local/lib/python3.5/dist-packages/cortex/main.py", line 38, in run
data.setup(**exp.ARGS['data'])
File "/usr/local/lib/python3.5/dist-packages/cortex/_lib/data/__init__.py", line 69, in setup
plugin.handle(source, **data_args_)
File "/usr/local/lib/python3.5/dist-packages/cortex/built_ins/datasets/torchvision_datasets.py", line 142, in handle
uniques = sorted(np.unique(labels).tolist())
UnboundLocalError: local variable 'labels' referenced before assignment
I installed cortex as described in your readme.txt, and the version is cortex-0.13. When I check the source code around that line in torchvision_datasets.py, it seems that the CIFAR10 dataset does not have a labels attribute.
train_set, test_set = handler(Dataset, data_path, transform=train_transform, test_transform=test_transform,
                              labeled_only=labeled_only)
if train_samples is not None:
    train_set.train_data = train_set.train_data[:train_samples]
    train_set.train_labels = train_set.train_labels[:train_samples]
if test_samples is not None:
    test_set.test_data = test_set.test_data[:test_samples]
    test_set.test_labels = test_set.test_labels[:test_samples]
dim_images = train_set[0][0].size()
if hasattr(train_set, 'labels'):
    labels = train_set.labels
elif hasattr(train_set, 'targets'):
    labels = train_set.targets
uniques = sorted(np.unique(labels).tolist())
if -1 in uniques:
    uniques = uniques[1:]
dim_l = len(uniques)
dims = dict(images=dim_images, targets=dim_l)
input_names = ['images', 'targets', 'index']
self.add_dataset(
    source,
    data=dict(train=train_set, test=test_set),
    input_names=input_names,
    dims=dims,
    scale=scale
)
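In the excerpt above, `labels` is only assigned when the dataset has a `labels` or a `targets` attribute, so a dataset object with neither would reach `np.unique(labels)` with the variable unset, which matches the UnboundLocalError. A possible workaround, sketched here under the assumption that such a dataset exposes `train_labels` instead (as the slicing lines in the same excerpt suggest), is to try several attribute names and fail with a clear message otherwise. `find_labels` and `OldStyleDataset` are hypothetical names used only for illustration.

```python
def find_labels(dataset, candidates=("labels", "targets", "train_labels")):
    """Return the first label attribute found on `dataset`, or raise clearly."""
    for name in candidates:
        if hasattr(dataset, name):
            return getattr(dataset, name)
    raise AttributeError(
        "dataset exposes none of: " + ", ".join(candidates)
    )

class OldStyleDataset:
    # Stand-in for a dataset object that only provides `train_labels`.
    train_labels = [0, 1, 1, 2]

labels = find_labels(OldStyleDataset())
uniques = sorted(set(labels))
```

Raising explicitly when no candidate matches makes the failure point at the missing attribute rather than at a later use of an undefined variable.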
Hi @rdevon, I have a question about how the discriminator weights are updated in prior matching. Updating the discriminator weights to maximize the prior loss from the paper is done here, with Z_Q detached so that the encoder weights are not updated.
But when you update the encoder weights to minimize the loss from the paper, you do that here, and Q_samples are computed using the discriminator in the self.score function. So I can't see what prevents the discriminator weights from also being updated to minimize the loss in this case (which would be wrong, since the discriminator should maximize it).
When I load your .t7 file in PyTorch I get this error:
"unknown object type / typeidx: {}".format(typeidx))
torchfile.T7ReaderException: unknown object type / typeidx: 176816768
I installed cortex from the dev branch, but there is no main.py in the scripts folder. Could you help me? Thanks!
Hello,
Thanks for a great paper. I am not sure, but I think Eq. 8 in the paper doesn't contain the correct term for local DIM.
Also, in the MINE paper, the product-of-marginals term is computed by shuffling z along the batch dimension (this is mentioned in a footnote). Is there any reason for not doing this here?
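For readers unfamiliar with the shuffling trick mentioned above, here is a minimal sketch of it using only the standard library. Pairing each x with a z taken from a shuffled copy of the batch gives pairs that approximate draws from the product of marginals. `marginal_pairs` and the string placeholders are illustrative assumptions, not code from either paper.

```python
import random

def marginal_pairs(xs, zs, seed=0):
    """Pair each x with a z drawn from a shuffled copy of the batch."""
    rng = random.Random(seed)
    shuffled = list(zs)
    rng.shuffle(shuffled)
    return list(zip(xs, shuffled))

xs = ["x0", "x1", "x2", "x3"]
zs = ["z0", "z1", "z2", "z3"]

joint = list(zip(xs, zs))          # matched pairs: samples from the joint
marginal = marginal_pairs(xs, zs)  # shuffled pairs: approximate product of marginals
```

One difference worth noting: a shuffle yields N pairs per batch and can leave some z in its original position, whereas the mask-based code quoted in another issue on this page uses all N(N-1) mismatched pairs in the batch.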
Thank you very much for your interesting work and releasing the code, much appreciated !!
I am implementing several manifold learning methods (from 64x64 images to 3D) that include joint optimization of mutual information (MI) with MINE tricks (DIM, InfoMax VAE, ...).
Since in those methods we are interested in maximizing MI (not in obtaining its precise value), I understand the use of a more stable (but less tight) lower bound on MI, such as the Jensen-Shannon divergence or InfoNCE.
However, as you also did in your DIM paper (?), I now want to use the mutual information between the input space and the latent representation as a quantitative metric to evaluate the latent code, and to be able to compare with state-of-the-art techniques (UMAP, t-SNE), for instance.
As we want a precise estimate of MI, do you agree that:
I tried several implementations of this and get an overall coherent MI behavior, but it is very unstable (no clear asymptote at all), so it would be difficult to extract a single MI estimate from the output. I therefore failed to use MINE as a metric to compare different dimensionality reduction techniques.
It would be very helpful if you could share your implementation of MINE for that purpose, or just some insight into the architecture, the optimizer, and the lower bound on MI you used.
Any advice is welcome. Thank you so much in advance !
First of all, I really appreciate your git page.
Second, how can I use the ResNet architecture but not its weights?
In contrast to the accuracy reported by other works (such as the one linked below), I suspect the ResNet is loaded with pretrained weights, because I get 70% accuracy in the first epoch. How can I use the ResNet architecture without the pretrained weights?
https://github.com/xu-ji/IIC
Please describe how to load and use trained ".t7" models.
Thanks for your code. I have now reproduced it in TensorFlow, but I am confused by the "Training a generator by matching to a prior implicitly" section. Can you give some example code or a reference?
Thanks again.
When I run python scripts/main.py local classifier --d.source CIFAR10 -n DIM_CIFAR10 --t.epochs 1000 I get the following error. Can you help me?
Traceback (most recent call last):
File "scripts/main.py", line 66, in <module>
run(controller)
File "/root/anaconda3/lib/python3.6/site-packages/cortex/main.py", line 38, in run
data.setup(**exp.ARGS['data'])
File "/root/anaconda3/lib/python3.6/site-packages/cortex/_lib/data/__init__.py", line 69, in setup
plugin.handle(source, **data_args)
File "/root/anaconda3/lib/python3.6/site-packages/cortex/built_ins/datasets/torchvision_datasets.py", line 85, in handle
torchvision_path = self.get_path('torchvision')
File "/root/anaconda3/lib/python3.6/site-packages/cortex/plugins.py", line 104, in get_path
'{} not found in {} data_paths'.format(source, _config_name))
KeyError: 'torchvision not found in .cortex.yml data_paths'
Hi,
It seems that a file is missing.
hi! would you mind providing the network details(arch, feature map layer, etc) of DIM on tiny-imagenet? the paper only mentioned AlexNet arch, but we don't know which feature map to adopt and we are having trouble reproducing its performance. Sincerely thanks!
Thank you for your good research :)
Hi Rdevon, I read your DIM paper; good job. However, I have a question about the model. In the paper you use the JS divergence in place of the KL divergence, but this may give a bad result: if the product of marginals p(z)p(x) becomes greater than the joint p(z|x)p(x), the probability of a specific representation given a specific sample is reduced, which would lead to a poor result. What do you think?
Hello, I don't have a pure math background, and I've tried to make my own naive implementation of DIM where the "discriminator" is just a standard binary cross-entropy sigmoid classifier: there is a 50% chance the input is a global paired with a positive (+) example and a 50% chance it is a global paired with a negative (-) sample. Is this vanilla classification scheme a correct implementation of the project?
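One way to relate the scheme described above to the JSD-style terms quoted in another issue on this page: for a raw score u, sigmoid binary cross-entropy on a positive pair is softplus(-u) and on a negative pair is softplus(u), so the summed cross-entropy is the negated difference of the quoted positive and negative terms, up to the constant 2 log 2. The check below is a plain-Python sketch of that relationship under those assumptions; it says nothing about the NCE mode or about other details of the repository, and the function names are made up.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(score, label):
    """Binary cross-entropy of a raw score passed through a sigmoid."""
    p = sigmoid(score)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def jsd_difference(pos_score, neg_score):
    """Positive term minus negative term, in the shifted form quoted on this page."""
    e_pos = math.log(2.0) - softplus(-pos_score)
    e_neg = softplus(-neg_score) + neg_score - math.log(2.0)
    return e_pos - e_neg

pos_score, neg_score = 1.5, -0.7  # illustrative scores
bce_total = bce(pos_score, 1) + bce(neg_score, 0)

# Minimizing the summed cross-entropy maximizes the JSD-style difference.
assert math.isclose(jsd_difference(pos_score, neg_score),
                    2.0 * math.log(2.0) - bce_total, rel_tol=1e-9)
```

Sampling positives and negatives with equal probability corresponds to weighting the two expectations equally, which is what the sum above does for one pair of each kind.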