rdevon / dim Goto Github PK


Deep InfoMax (DIM), or "Learning Deep Representations by Mutual Information Estimation and Maximization"

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%


dim's Issues

A TypeError in running

Hi,
I get a TypeError when I run main.py. My working directory is E:\pythonProjectTorch/DIM, and the message is:
Traceback (most recent call last):
File "E:/pythonProjectTorch/DIM/scripts/main.py", line 70, in <module>
controller = Controller(inputs=dict(inputs='data.images'), **models)
File "E:\pythonProjectTorch\DIM\cortex_DIM\models\controller.py", line 126, in __init__
super().__init__(inputs=inputs)
TypeError: __init__() got an unexpected keyword argument 'inputs'

code for training a generator

Hi, thank you for your interesting work and for releasing the code! Would you kindly release the code for "training a generator by matching to a prior implicitly"?
Thanks a lot!

error on installing cortex-DIM

Hi, and thanks for your paper.
I see this error when I run
pip install .

Could not find a version that satisfies the requirement cortex==0.13a0 (from cortex-DIM==0.13) (from versions: 0.1a0, 0.11)
No matching distribution found for cortex==0.13a0 (from cortex-DIM==0.13)

What's wrong?

codes

Thanks for your DIM code, but I find it puzzling. Could you provide a PyTorch version? Thanks.

Using new dataset

@rdevon Can you add a small tutorial on how to use a custom dataset? Thank you in advance!

how to compute mutual information between two images?

Is there any way to use your code to find the mutual information between two images?

questions about bi-level optimization in prior matching loss

Hi.

As can be seen in the paper, prior matching is a bi-level optimization problem. For the encoder parameters we should maximize the objective, while for the discriminator parameters (in prior matching) we should minimize it. However, the prior loss is simply added to the total loss, which means the two optimization problems share the same objective (i.e. maximization). Does this contradict Eq. 7 in the paper?

How to understand mutual information in deterministic deep networks?

Dear Professor R Devon Hjelm,

Is the mutual information between inputs and global representations constant, since the network is deterministic? If so, how should we understand the mutual information used in the loss function?

Thanks

KeyError: '`torchvision` not found in .cortex.yml data_paths'

Traceback (most recent call last):
File "scripts/main.py", line 66, in <module>
run(controller)
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/main.py", line 38, in run
data.setup(**exp.ARGS['data'])
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/_lib/data/__init__.py", line 69, in setup
plugin.handle(source, **data_args)
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/built_ins/datasets/torchvision_datasets.py", line 85, in handle
torchvision_path = self.get_path('torchvision')
File "/home/py/miniconda3/envs/DIM/lib/python3.6/site-packages/cortex/plugins.py", line 104, in get_path
'{} not found in {} data_paths'.format(source, _config_name))
KeyError: 'torchvision not found in .cortex.yml data_paths'

Can you tell me how to fix this? Thanks.
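The error message itself says that the `data_paths` section of `.cortex.yml` has no `torchvision` entry. A minimal sketch of that check, assuming the config file has already been parsed into a dict; the helper name `missing_data_paths`, the `local` key, and the example paths are hypothetical and not part of cortex:

```python
def missing_data_paths(config, required=("torchvision",)):
    """Return the required data_paths keys that are absent from a parsed config."""
    data_paths = config.get("data_paths") or {}
    return [name for name in required if name not in data_paths]


# Hypothetical parsed contents of .cortex.yml before and after adding the entry.
broken = {"data_paths": {"local": "/data"}}
fixed = {"data_paths": {"local": "/data", "torchvision": "/data/torchvision"}}

print(missing_data_paths(broken))  # ['torchvision']
print(missing_data_paths(fixed))   # []
```

If the first call reports `torchvision`, adding a `torchvision` path under `data_paths` in `.cortex.yml` is the likely remedy.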

torch version

Do you think the ResNet configuration works with PyTorch 1.2.0?
Could you please test it?
python scripts/main.py local classifier --d.source ImageFolder --encoder_config resnet19_32x32 --local.mode nce --local.mi_units 1024 -n DIM_Resnet --t.epochs 1000

Which version of PyTorch do you recommend, or which have you tested?

Load training curve after training finished

While training is running, I can check the training curves via visdom.
However, I accidentally closed the visdom server, and all training logs are gone.
Some models are still training, so the newly opened visdom server shows their training curves.
However, the training curves of models that had already finished are lost.

Is there a way to display the training curves of finished models on my visdom server?

A question about the discriminator in CIFAR10 on a DCGAN architecture - local.

First of all, thank you very much for providing the source code for your work.

I was going through the code to get a better understanding of the paper, and I have some questions.
For CIFAR10 in the case of local DIM, the networks are as follows:

The encoder

(encoder): Convnet(
      (layers): Sequential(
        (layer0): Sequential(
          (conv): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (ReLU): ReLU(inplace=True)
         [.........]
        (layer5): Linear(in_features=1024, out_features=64, bias=True))

Global network, taking as input the output of the encoder and outputting the global feature vector of shape [64, 2048, 1, 1]

    (local_global_net): MI1x1ConvNet(
      (block_nonlinear): Sequential(
        (0): Conv2d(64, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
        (3): Conv2d(2048, 2048, kernel_size=(1, 1), stride=(1, 1))
         [.........]
      (linear_shortcut): Conv2d(64, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False))

And the local network, with the same architecture as the global net, takes as input the output of layer 1 of the encoder (shape [64, 128, 8, 8]) and outputs the local features of shape [64, 2048, 8, 8], reshaped to [64, 2048, 64].

With the local and global features, the next step is to compute the loss by taking the product of the two features, giving a tensor of shape [64, 64, 64, 1], and using the diagonal entries as positives and the rest as negatives.

My question is: which part of the network constitutes the discriminator, given that the fake and real samples are computed as follows:

I can't see where the discriminator Dw appears in the implementation.
Thank you.
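The dot-product scoring described in this question can be sketched with NumPy. This is an illustration under assumptions, not the repository's code: the function name `pairwise_scores` and the toy shapes are made up, and the [64, 2048, 64] local features and [64, 2048] global features from the question would be plugged in the same way.

```python
import numpy as np


def pairwise_scores(local_feats, global_feats):
    """Dot every global vector with every local vector of every image.

    local_feats:  [N, C, L]  (L spatial locations per image)
    global_feats: [N, C]
    returns u:    [N, N, L], where u[i, j, k] = <global_i, local_j[:, k]>
    """
    return np.einsum("ic,jck->ijk", global_feats, local_feats)


rng = np.random.default_rng(0)
N, C, L = 4, 8, 9  # toy sizes standing in for 64, 2048, 64
local_feats = rng.normal(size=(N, C, L))
global_feats = rng.normal(size=(N, C))

u = pairwise_scores(local_feats, global_feats)
positives = u[np.arange(N), np.arange(N)]  # [N, L]: same-image pairs (the diagonal)
print(u.shape, positives.shape)
```

Under this reading, the score function is the dot product itself, and any learned parameters sit in the two 1x1-conv networks that produce the features.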

Hi Masoudpz,

Hi Dr,
I've cloned your repo, and although I cloned the cortex-dev branch I still get

Could not find a version that satisfies the requirement cortex==0.13a0 (from cortex-DIM==0.13) (from versions: 0.1a0, 0.11)
No matching distribution found for cortex==0.13a0 (from cortex-DIM==0.13)

Any idea what is wrong?

Originally posted by @masoudpz in #14 (comment)

Concat-and-convolve architecture

Hello, thanks for your great paper and code!
How can I select the "Concat-and-convolve architecture" for maximizing local MI?

TypeError: __init__() got an unexpected keyword argument 'inputs'

Hi,

I am getting this error when trying to run the code.

File "scripts/main.py", line 65, in <module>
controller = Controller(inputs=dict(inputs='data.images'), **models)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\cortex_DIM\models\controller.py", line 126, in __init__
super().__init__(inputs=inputs)
TypeError: __init__() got an unexpected keyword argument 'inputs'

Best

Concat and convolve / Encoder and dot

Hello,
What is the difference in performance between the Concat-and-Convolve and Encode-and-Dot architectures?

Thanks!

I'm having No module named 'cortex_DIM' error

Hello!
Thanks for your great repo.
I followed all the instructions and created a new conda environment specifically for this code.
I'm not sure why, but it fails when loading modules.

I'm working in the ./DIM directory and ran the command python scripts/main.py --help

Thanks in advance for your support.

Questions about prior matching part

Hi, thank you for your interesting DIM model and the open-source code.

However, I am confused about the implementation of the prior matching part:

DIM/cortex_DIM/models/discriminator.py

Line 62 in bac4765

self.add_losses(discriminator=-difference + gp_loss)

It seems that a GAN is used to force the global encoding to match the prior distribution. The discriminator loss consists of the difference between E_pos and E_neg (which corresponds to the original GAN formulation, i.e. with the log term) plus a gradient penalty term, which was introduced in WGAN to satisfy the Lipschitz constraint. Is it reasonable to combine the original GAN objective with a gradient penalty?

work with network after training

Hi again.
How can I feed an image into the network and get the output vector (mutual information) and the loss value for my input?
I want to use my trained network. Is this possible?

problem in installing DIM

Hi Dr,
I've cloned your repo, and although I cloned the cortex-dev branch I still get

Could not find a version that satisfies the requirement cortex==0.13a0 (from cortex-DIM==0.13) (from versions: 0.1a0, 0.11)
No matching distribution found for cortex==0.13a0 (from cortex-DIM==0.13)

Any idea what is wrong?

Where is the discriminator network parametrized by omega?

Hi,

I was looking through your code, namely here: https://github.com/rdevon/DIM/blob/bac4765a8126746675f517c7bfa1b04b88044d51/cortex_DIM/functions/dim_losses.py#L109 and noticed that in the docstring it states that the losses take the feature maps l and m directly. However, in the paper, it is stated that the losses take the score that the discriminator gives to the feature maps. In the implementation where does the forward pass to the discriminator networks occur?

how can I train on GPU?

I tried the command below, but the training loop runs on the CPU. Can you provide some examples?

CUDA_VISIBLE_DEVICES=3,4,5,6 python scripts/main.py local classifier --d.source CIFAR10 --encoder_config resnet19_32x32 --local.mode nce --t.epochs 1000 -d 3 -n gpu_test

4 saved nets

Hi,
When I load the .t7 file, I see four nets stored as a dict:
'Controller.encoder'
'conv2.classifier'
'fc4.classifier'
'glob-1.classifier'
What is the role of each net in the code? Which one can I use to calculate the mutual information between two images?
Best regards

Also, the output of Controller.encoder is a vector of size 64. Is this vector the encoding of the input with maximal mutual information?

a bug in paper?

In the paper, page 12, Table 6:

For Pearson chi2 you give f(u) = (u-1)^2 and state that the domain of f* is R.
But shouldn't the domain of f* be (2, +infty), because of the constraint u > 0 in your paper?
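For reference, the conjugate can be worked out directly. This is a sketch of the standard calculation, assuming $f(u) = (u-1)^2$ restricted to $u \ge 0$; it is not taken from the paper.

```latex
\[
f^*(t) \;=\; \sup_{u \ge 0} \bigl\{\, ut - (u-1)^2 \,\bigr\}.
\]
% The unconstrained maximizer is u^* = 1 + t/2, which is feasible iff t >= -2.
\[
f^*(t) \;=\;
\begin{cases}
  \dfrac{t^2}{4} + t, & t \ge -2, \\[6pt]
  -1, & t < -2 \quad (\text{supremum at the boundary } u = 0).
\end{cases}
\]
```

Under this calculation $f^*$ is finite for every real $t$, so the constraint changes the form of $f^*$ for $t < -2$ but does not appear to shrink its domain.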

Can I use your code on a small sample dataset?

Hi, thank you very much for your paper and code.
I now have a small number of grayscale images (about 800) and I want to know if I can apply the DIM model to encode these images and then apply them to classification tasks. In the demo, I found that the CIFAR-10 dataset contained 60,000 color images, so I was worried that my dataset was too small. In addition, how should I use your code, is it enough to just change the dataset (our image is single-channel)?
Since I am a novice at deep learning, any suggestions will be very helpful. Thank you again

How to generate 'fake' samples?

Hi Dr.
Thanks for the paper and code!
I have two questions about positive and negative samples for the JSD estimator in dim_losses.py.

First:

# Compute the positive and negative score. Average the spatial locations.
E_pos = get_positive_expectation(u, measure, average=False).mean(2).mean(2)
E_neg = get_negative_expectation(u, measure, average=False).mean(2).mean(2)

When E_pos and E_neg are calculated, both calls take the same input, u. I would have expected the positive and negative samples to be separated before this point, so why is the same u used for both?

Second:

E_pos = (E_pos * mask).sum() / mask.sum()
E_neg = (E_neg * n_mask).sum() / n_mask.sum()

Why is this done? The code comment says "Since we have a big tensor with both positive and negative samples, we need to mask.", but I'm still confused about the mask and the purpose of E_neg * n_mask. How are the negative samples generated?
Thanks!
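One way to read the masking step: `u` holds a score for every (global, local) image pair in the batch, so positives and negatives are not separate tensors but separate entries of the same tensor. A minimal NumPy sketch, assuming per-pair values of shape [N, N] with spatial locations already averaged; the names `e_pos_all` and `e_neg_all` are illustrative, not from the repository:

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)
# Stand-ins for the per-pair terms after averaging over spatial locations.
# Entry [i, j] pairs the global feature of image i with local features of image j.
e_pos_all = rng.normal(size=(N, N))
e_neg_all = rng.normal(size=(N, N))

mask = np.eye(N)     # 1 where i == j: same image, a "real" pair
n_mask = 1.0 - mask  # 1 where i != j: different images, a "fake" pair

E_pos = (e_pos_all * mask).sum() / mask.sum()
E_neg = (e_neg_all * n_mask).sum() / n_mask.sum()

# The same quantities written with explicit indexing.
assert np.isclose(E_pos, np.diag(e_pos_all).mean())
assert np.isclose(E_neg, e_neg_all[~np.eye(N, dtype=bool)].mean())
```

Read this way, no separate sampling of negatives is needed: the other images in the batch play that role.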

Question about sampling the x' (negative samples) & code.

Hi,

I've read your paper, which is quite inspiring!
But I have some questions regarding Eq. 4, which states "x' is an input sampled from \tilde{P} x P". Does it mean that we pick (x, y) independently at random from X and Y (where Y is the distribution of representations generated from X)? If so, which part of the code implements this sampling method?

Thanks a lot!

Table 3. CPC comparision, ResNet 50 design choice

Thank you for sharing your code. I'm happy to try this attractive representation learning idea on a different backbone, e.g. ResNet50.

You mention ResNet50 in Table 3 of your paper, so I looked into it and have several questions.

Table 3. CIFAR10 (no data augmentation) DIM(L) single global

Using the FoldedResNet code at https://github.com/rdevon/DIM/blob/master/cortex_DIM/configs/resnets.py#L129
I was able to reproduce a result (conv: 80, Y(64): 71) similar to the one in the paper (80.95) with the following command

CUDA_VISIBLE_DEVICES=0 python scripts/main.py local classifier --d.source CIFAR10 --d.batch_size 128 --o.learning_rate 0.0002 --encoder_config foldresnet34_32x32 --local.mode nce --local.mi_units 1024 -n CIFAR10_DIM_L_NCE_foldresnet34 --t.epochs 500

CIFAR10_DIM_L_NCE_foldresnet34_final.t7_global_test_acc

I know you mention setting mi_units to 2048 in Appendix 2, but this change does not seem to explain a 9% performance difference, so I looked into the backbone.

While trying to close the performance gap, I found that the ResNet used in this code is somewhat different from the one I usually use.

From the above image (from the original ResNet paper), the 34-layer and 50-layer structures differ only in the use of a bottleneck block instead of a basic block.

In your code, even though the name says resnet37, using the bottleneck block makes it a ResNet50-like encoder.

I also took into account that using ResNet on CIFAR10 requires small modifications from the ImageNet 224x224 input version (e.g. the first conv kernel size becomes 3x3 instead of 7x7, with no max pooling after it).

However, there are a few more changes that I cannot explain. Could you clarify the thinking behind them?

Why did you change all conv_2,3,4x layers to have stride=1?
The only downsampling occurs at the first conv, but you could achieve the same by setting the stride at the following conv layers. Is there any advantage to downsampling at the lowest-level feature?
Why did you remove the conv_5x layer completely from your _foldresnet37_32x32? I understand that keeping all conv_xx layers makes the feature map too small (say, 1x1), but you could set the stride to 1 as you did in the previous layers. Does inserting conv_5x affect the final results?
The repeating blocks normally follow a [3, 4, 6, 3] pattern. Since you removed the final block, [3, 4, 6] would seem reasonable, but your code uses [2, 5, 2]. Is there a specific reason?

I tried various modified ResNet structures based on your code, and most of the time the classification accuracy rises and then falls again while the loss keeps decreasing (perhaps it converges to a degenerate or trivial solution). Have you seen this during your work? If the above changes didn't matter much to the final result I wouldn't ask. It seems that DIM training depends on specific design choices, and I'm not sure what they are.

a bug in the code?

Thanks a lot for your amazing work! However, when I tried to run the code, it raised this error:

 File "deep_infomax.py", line 107, in build
    self.classifier_c.build(dim_in=conv_units * conv_x * conv_y)
  File "XXX/.local/lib/python3.6/site-packages/cortex/_lib/models.py", line 186, in wrapped
    return fn(*args, **kwargs)
File "XXX/.local/lib/python3.6/site-packages/cortex/built_ins/models/classifier.py", line 35, in build
    classifier = FullyConnectedNet(dim_in, dim_out=dim_l, **classifier_args)
TypeError: type object argument after ** must be a mapping, not NoneType

I didn't change anything in either cortex or DIM. Since I couldn't find the __init__ in class SimpleClassifier, I have no idea how to fix the bug. Would you mind checking it?

Samples from "Training a generator by matching to a prior implicitly" ?

I would like to see some random samples from your new generative models, especially ones trained on the CelebA or LSUN datasets. The random samples trained on Tiny-ImageNet do not actually look good.

Code vs paper

Hello,

Thank you for publishing your code. Outstanding work :)
However, I have a question regarding the JSD/GAN-based estimators and the differences between the implementation and the formulation in your paper:
Eq. 4:

Eq. 7:

At the same time, in your code:
(for the JSD estimator)

if mode == 'fd':
    loss = fenchel_dual_loss(l_enc, m_enc, measure=measure)
[...]
E_pos = get_positive_expectation(u, measure, average=False).mean(2).mean(2)
E_neg = get_negative_expectation(u, measure, average=False).mean(2).mean(2)
[...]
Ep = log_2 - F.softplus(-p_samples)  # Note JSD will be shifted
Eq = F.softplus(-q_samples) + q_samples - log_2  # Note JSD will be shifted

While I do know where the log_2 comes from [Nowozin et al., 2016], the addition of q_samples in Eq is a bit more mysterious :D
And then for the prior matching:

    if not loss_type or loss_type == 'minimax':
        return get_negative_expectation(q_samples, measure)
    elif loss_type == 'non-saturating':
        return -get_positive_expectation(q_samples, measure)

It seems that you are using just half of Eq. 7 to obtain the loss value.

Could you clarify these differences (maybe I am missing something in the code)? I have been trying to merge DIM with my existing code (a somewhat different setup, but they should work together) and cannot get it to work well.

Thanks in advance!
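The extra `q_samples` term follows from the identity softplus(-x) + x = softplus(x), so the two quoted lines amount to log 2 + log sigmoid(p) and softplus(q) - log 2. A quick numerical check using only the standard library; the function names are illustrative and not from the repository:

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def jsd_positive(p):
    # Mirrors: Ep = log_2 - F.softplus(-p_samples)
    return math.log(2.0) - softplus(-p)


def jsd_negative(q):
    # Mirrors: Eq = F.softplus(-q_samples) + q_samples - log_2
    return softplus(-q) + q - math.log(2.0)


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # softplus(-x) + x == softplus(x)
    assert math.isclose(softplus(-x) + x, softplus(x), abs_tol=1e-12)
    # The negative term is softplus(q) - log 2, i.e. -log(1 - sigmoid(q)) - log 2.
    assert math.isclose(jsd_negative(x), softplus(x) - math.log(2.0), abs_tol=1e-12)

# At a score of 0 both shifted terms are (numerically) zero.
print(jsd_positive(0.0), jsd_negative(0.0))
```

So the negative term is the usual -log(1 - sigmoid(q)) of the GAN objective, shifted by log 2.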

Error in debugging in pycharm

A training error on CIFAR10?

Hi! When I try to retrain DIM on CIFAR10, I hit this error:

Traceback (most recent call last):
  File "scripts/deep_infomax.py", line 227, in <module>
    run(DIM())
  File "/usr/local/lib/python3.5/dist-packages/cortex/main.py", line 38, in run
    data.setup(**exp.ARGS['data'])
  File "/usr/local/lib/python3.5/dist-packages/cortex/_lib/data/__init__.py", line 69, in setup
    plugin.handle(source, **data_args_)
  File "/usr/local/lib/python3.5/dist-packages/cortex/built_ins/datasets/torchvision_datasets.py", line 142, in handle
    uniques = sorted(np.unique(labels).tolist())
UnboundLocalError: local variable 'labels' referenced before assignment

I installed cortex as described in your readme.txt, and the version is cortex-0.13. When I check the relevant lines in torchvision_datasets.py, it seems that the CIFAR10 dataset object has neither of the label attributes the code looks for.

train_set, test_set = handler(Dataset, data_path, transform=train_transform, test_transform=test_transform,
                                  labeled_only=labeled_only)
    if train_samples is not None:
        train_set.train_data = train_set.train_data[:train_samples]
        train_set.train_labels = train_set.train_labels[:train_samples]
    if test_samples is not None:
        test_set.test_data = test_set.test_data[:test_samples]
        test_set.test_labels = test_set.test_labels[:test_samples]

    dim_images = train_set[0][0].size()

    if hasattr(train_set, 'labels'):
        labels = train_set.labels
    elif hasattr(train_set, 'targets'):
        labels = train_set.targets

    uniques = sorted(np.unique(labels).tolist())

    if -1 in uniques:
        uniques = uniques[1:]

    dim_l = len(uniques)
    dims = dict(images=dim_images, targets=dim_l)
    input_names = ['images', 'targets', 'index']

    self.add_dataset(
        source,
        data=dict(train=train_set, test=test_set),
        input_names=input_names,
        dims=dims,
        scale=scale
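The UnboundLocalError arises because neither `hasattr` branch matches, so `labels` is never assigned. A hedged sketch of a more defensive lookup: the helper `find_labels` is hypothetical, and the extra attribute name `train_labels` is taken from the slicing code quoted above rather than from a verified fix.

```python
def find_labels(dataset, candidates=("labels", "targets", "train_labels")):
    """Return the first label attribute found on the dataset, or raise clearly."""
    for name in candidates:
        if hasattr(dataset, name):
            return getattr(dataset, name)
    raise AttributeError(
        "dataset has none of the label attributes: {}".format(", ".join(candidates))
    )


class OldStyleDataset:
    """Toy stand-in for a dataset that only exposes train_labels."""

    def __init__(self):
        self.train_labels = [0, 1, 1, 2]


labels = find_labels(OldStyleDataset())
uniques = sorted(set(labels))
print(uniques)  # [0, 1, 2]
```

Raising a descriptive error when no attribute matches makes a mismatch between the installed torchvision and the code easier to spot than an unbound local.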

Question about prior matching

Hi @rdevon, I have a question about how the discriminator weights are updated in prior matching. The update of the discriminator weights to maximize the prior loss from the paper is done here, and Z_Q is detached so that the encoder weights are not updated.
But when you update the encoder weights to minimize the loss from the paper, you do that here, and Q_samples are calculated using the discriminator in the self.score function. So I can't see what prevents the discriminator weights from also being updated to minimize the loss in this case (which would be wrong, since the discriminator should maximize it).

Load .t7 data file error

When I load your .t7 file in PyTorch I get this error:

"unknown object type / typeidx: {}".format(typeidx))
torchfile.T7ReaderException: unknown object type / typeidx: 176816768

The version of cortex

Does not have scripts/main.py

I installed cortex from the dev branch, but there is no main.py in the scripts directory. Could you help me? Thanks!

Is Eq. 8 in the paper correct?

Hello,

Thanks for a great paper. I am not sure, but I think Eq. 8 in the paper doesn't contain the correct term for local DIM.

Also, in the MINE paper, the product-of-marginals term is computed by shuffling z along the batch dimension (this is mentioned in a footnote). Is there any reason for not doing this?
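The shuffling trick mentioned for MINE can be sketched as follows: permuting z along the batch dimension breaks the pairing with x while leaving each marginal unchanged. This is an illustrative NumPy sketch, not code from either paper; `shuffle_for_marginals` is a made-up name.

```python
import numpy as np


def shuffle_for_marginals(z, rng):
    """Permute z along the batch axis so (x[i], z_shuffled[i]) approximates p(x)p(z)."""
    perm = rng.permutation(z.shape[0])
    return z[perm], perm


rng = np.random.default_rng(0)
x = np.arange(6)                             # stand-in inputs, one per batch element
z = np.stack([x * 10.0, x * 100.0], axis=1)  # representations tied to x

z_shuffled, perm = shuffle_for_marginals(z, rng)

joint_pairs = list(zip(x, z))              # samples from the joint
marginal_pairs = list(zip(x, z_shuffled))  # samples from the product of marginals
```

A random permutation can leave some elements in place, so a few "negative" pairs may coincide with true pairs; with large batches this is usually a small fraction.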

MINE as an estimation of Mutual Information between input space and latent representation

Thank you very much for your interesting work and releasing the code, much appreciated !!

I am implementing several manifold learning methods (from 64x64 images to 3D) that include joint optimization of mutual information (MI) using MINE-style tricks (DIM, InfoMax VAE, ...).
Since in those methods we are interested in maximizing MI (and not in its precise value), I understand the use of a more stable (but less tight) lower bound on MI, such as the Jensen-Shannon divergence or InfoNCE.

However, as you also did in your DIM paper (?), I now want to use the mutual information between the input space and the latent representation as a quantitative metric to evaluate the latent code, and to compare, for instance, with state-of-the-art techniques (UMAP, t-SNE).

Since we want a precise estimate of MI, do you agree that:

The Donsker-Varadhan (DV) representation of the KL divergence is needed (as it is the tightest bound on MI)
As this bound leads to a biased gradient, a correction with a moving average is needed (as suggested in the MINE paper)

I tried several implementations of this. The overall MI behavior is coherent but very unstable (no clear asymptote at all), so it would be difficult to extract a single MI estimate from the output. I therefore failed to use MINE as a metric to compare different dimensionality reduction techniques.
It would be very helpful if you could share your implementation of MINE for that purpose, or just some insight into the architecture, the optimizer, and the lower bound on MI that you used.

Any advice is welcome. Thank you so much in advance !
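For concreteness, the Donsker-Varadhan bound discussed in this question is I(X;Z) >= E_joint[T] - log E_marginal[exp(T)]. A minimal sketch of the estimate itself, assuming the critic scores have already been computed; it does not include the moving-average gradient correction, and `dv_lower_bound` is a hypothetical name.

```python
import math


def dv_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan estimate: mean(T_joint) - log(mean(exp(T_marginal)))."""
    mean_joint = sum(joint_scores) / len(joint_scores)
    # Log-mean-exp with the max subtracted for numerical stability.
    m = max(marginal_scores)
    log_mean_exp = m + math.log(
        sum(math.exp(s - m) for s in marginal_scores) / len(marginal_scores)
    )
    return mean_joint - log_mean_exp


# A critic that outputs a constant carries no information: the bound is 0.
print(dv_lower_bound([0.5, 0.5, 0.5], [0.5, 0.5]))  # 0.0
# Higher scores on joint samples than on marginal samples give a positive bound.
print(dv_lower_bound([2.0, 2.0], [0.0, 0.0]))  # 2.0
```

Because the log of a batch mean is a biased estimate of the log of an expectation, estimates from small batches can fluctuate, which may contribute to the instability described in the question.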

Random Initialization of Resnet arch

First of all, I really appreciate your GitHub page.

Second, how can I use the ResNet architecture without its weights?

In contrast to the accuracy reported in other work (such as the link below), it feels as if the ResNet is loaded with pretrained weights, because I get 70% accuracy in the first epoch. How can I use the ResNet architecture without the trained ResNet weights?
https://github.com/xu-ji/IIC

Load Trained model

Please describe how to load and use trained ".t7" models.

Training a generator by matching to a prior implicitly

Thanks for your code. I have now reproduced it in TensorFlow, but I am confused by the "Training a generator by matching to a prior implicitly" section. Could you give some example code or a reference?
Thanks again.

KeyError: '`torchvision` not found in .cortex.yml data_paths'

When I run python scripts/main.py local classifier --d.source CIFAR10 -n DIM_CIFAR10 --t.epochs 1000 I get the error below. Can you help me?

Traceback (most recent call last):
File "scripts/main.py", line 66, in <module>
run(controller)
File "/root/anaconda3/lib/python3.6/site-packages/cortex/main.py", line 38, in run
data.setup(**exp.ARGS['data'])
File "/root/anaconda3/lib/python3.6/site-packages/cortex/_lib/data/__init__.py", line 69, in setup
plugin.handle(source, **data_args)
File "/root/anaconda3/lib/python3.6/site-packages/cortex/built_ins/datasets/torchvision_datasets.py", line 85, in handle
torchvision_path = self.get_path('torchvision')
File "/root/anaconda3/lib/python3.6/site-packages/cortex/plugins.py", line 104, in get_path
'{} not found in {} data_paths'.format(source, _config_name))
KeyError: 'torchvision not found in .cortex.yml data_paths'

missing file

Hi,

It seems that a file is missing.

=============================
from mlp import configs as mlp_configs
ModuleNotFoundError: No module named 'mlp'

code for tiny imagenet

Hi! Would you mind providing the network details (architecture, feature map layer, etc.) of DIM on Tiny ImageNet? The paper only mentions the AlexNet architecture, so we don't know which feature map to use, and we are having trouble reproducing its performance. Many thanks!

Something about Eq.4 in your paper that I can't understand

I do not understand why you use the softplus function in the mutual information calculation.
Can you give me a clear explanation?
Thanks.

when will you update your code?

Thank you for your good research :)

A problem in your paper

Hi rdevon, I read your DIM paper; good job. However, I have a question about the model. In the paper you replace the KL divergence with the JS divergence, but this may give a bad result: the product of marginals p(z)p(x) could become greater than p(z|x)p(x), which would reduce the probability of a specific representation given a specific sample. What do you think?

High level question about discriminator

Hello, I don't have a pure math background, and I've tried to write my own naive implementation of DIM in which the "discriminator" is just a standard sigmoid classifier trained with binary cross-entropy, where there is a 50% chance the input is a global feature paired with a positive example and a 50% chance it is a global feature paired with a negative example. Is this vanilla classification scheme a correct implementation of the project?
