
deepemd's Issues

Unable to reproduce previous results.

Although I set the same random seed, the accuracy results of two experiments with the same settings are different.

Code:

import os
import random
import numpy as np
import torch

os.environ['PYTHONHASHSEED'] = str(seed)  # seed is the chosen integer seed
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
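
Even with all of the above, runs can still diverge. As a side note (not from the original issue), two settings that are sometimes also needed are per-worker seeding for the DataLoader and PyTorch's global determinism switch (the latter is available in PyTorch >= 1.8) — a minimal sketch:

    import random
    import numpy as np
    import torch

    def seed_worker(worker_id):
        # re-seed numpy and random inside every DataLoader worker process
        worker_seed = torch.initial_seed() % 2**32
        np.random.seed(worker_seed)
        random.seed(worker_seed)

    # pass seed_worker to DataLoader(..., worker_init_fn=seed_worker)
    # and, on PyTorch >= 1.8, raise an error for any non-deterministic op:
    torch.use_deterministic_algorithms(True)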

Looking forward to your reply!

gradient of the LP problem

Hi! Thank you for the nice work. I have a question about the gradient when using the QPTH solver to solve the LP problem. In my test, the gradients are almost always zero. I appended the following test code to the models/emd_utils.py file.

if __name__ == '__main__':

    batch_size = 50
    num_node = 25

    cosine_distance_matrix = torch.rand(batch_size, num_node, num_node, device='cuda', requires_grad=True)
    weight1 = torch.rand(batch_size, num_node, device='cuda', requires_grad=True)
    weight2 = torch.rand(batch_size, num_node, device='cuda', requires_grad=True)

    _, qpth_flow = emd_inference_qpth(cosine_distance_matrix, weight1, weight2)
    qpth_flow.sum().backward()
    print(weight1.grad)
    print(weight2.grad)
    print(cosine_distance_matrix.grad)

The output gradients w.r.t. all three tensors are very small regardless of whether the L2 or QP form is used, so I am wondering how back-propagation works in this situation. Could you tell me whether I am misunderstanding something?

OpenCV used to meta-train?

Hi,

Thanks for your inspiring work! However, I have a question.

I notice that in your readme file of the project, a command is provided to meta-train the model, after pre-training is done, where opencv is specified as the solver:
#use opencv solver (about 8GB memory)
$ python train_meta.py -deepemd fcn -shot 1 -way 5 -solver opencv -gpu 0,1,2,3
Nevertheless, in your paper, 5.1 Implementation Details, Training paragraph, it's claimed that QPTH is used during meta-training: “At training time, we use the GPU accelerated convex optimization solver QPTH [76] to solve the Linear Programming problem in our network and compute gradients for back-propagation.”

So, my question is:

  1. Is opencv available during meta-training?
  2. If both opencv and QPTH are available during meta-training, does that mean additional code has been added to the project to compute gradients for back-propagation with opencv, like what is done for QPTH? I failed to find the related code; if it exists, could you point me to it?

Best Regards,
Neil

FC100 misdescription

Hello, I found a misdescription: there are 600 samples per class in FC100, not 100 samples.

Pretrain model overfitting

Thanks for your exciting work! When I pre-train on miniImageNet, I notice that the training accuracy goes up to 92% while the validation accuracy is only 62%. Is it overfitting, or should I check my code?

training hyperparameters for CUB and CIFAR

Dear author,

Thanks for open-sourcing your project. I am wondering whether you could publish the training scripts for the CUB and CIFAR datasets that can reproduce the results reported in your paper?

Thanks

why resize to 92 first?

When setname == 'val' or setname == 'test', why resize to 92 first?
transforms.Resize([92, 92]),
transforms.CenterCrop(84),
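
For reference, the full val/test pipeline presumably looks something like the sketch below; the ToTensor/Normalize steps and the normalization statistics are assumptions, not taken from the repo:

    from torchvision import transforms

    eval_transform = transforms.Compose([
        transforms.Resize([92, 92]),    # resize slightly larger than the final crop
        transforms.CenterCrop(84),      # keep the central 84x84 region
        transforms.ToTensor(),
        # placeholder ImageNet statistics, not necessarily what the repo uses
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])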

higher results with respect to the matching networks

Hi @icoz69, thanks for releasing the code! I am quite surprised that the results of some of the baselines are much higher than those reported in the original papers, e.g. Matching Networks: 63.08% in DeepEMD compared to 43% in the NIPS paper. Is this because you changed the backbone to ResNet-12?

Looking forward to your reply.

Experimental results on miniimagenet dataset

Hello, thanks for your great work first.
I downloaded the meta-trained models and tested DeepEMD-FCN on the 1-shot 5-way task on the miniImageNet dataset, but the result differs from the paper. The result of DeepEMD-FCN for 1-shot tasks on miniImageNet is 57.46, noticeably lower than your reported result. I would like to know what caused this.
I am really desperate for the answer.
Thanks in advance for your help.

How to train tieredimagenet?

@icoz69 Thanks for your excellent work! When I train on tieredImageNet, should I change the learning rate, max_epoch, step_size and other parameters, or should I keep the pre-defined defaults unchanged?

Fail to reproduce accuracy for miniimagenet

Hi. I tried to reproduce the results for miniImageNet with DeepEMD-FCN following your instructions, but failed to get the accuracy listed in your paper. The evaluation accuracy is only 55%.

Here are my training scripts:

$ python train_pretrain.py -shot 1 -way 5 -dataset miniimagenet -gpu 
$ python train_meta.py -deepemd fcn -shot 1 -way 5 -solver opencv -dataset miniimagenet -gpu 0
$ python eval.py -shot 1 -way 5 -dataset miniimagenet -gpu

Could you give me some tips?
Looking forward to your reply.

opencv solver for 5-shot evaluation

Hi,

I saw that the default setting (both in eval.py and in the evaluation command in the README) uses the opencv solver. However, for 5-shot evaluation the structured FC weights need gradients to be updated, yet the opencv solver layer does not appear to be differentiable. So how can the SFC weights be updated in this situation?

Thanks!

RuntimeError and Code details

Thanks for your exciting work. May I ask you some questions?
Q1. When I meta-train the model on 4 GPUs with the qpth solver, I get the error pasted at the bottom. But meta-training on 3 GPUs with the qpth solver works, and pre-training or meta-training with the opencv solver works with any number of GPUs.

Q2. What is the opencv solver doing? Specifically:
1) Lines 5 and 8 compute the dot product between the flow and the similarity, but in a transportation problem we compute the dot product between the flow and the cost. Is the (1 - similarity_map[i, j, :, :]) missing in line 5? And why not directly use the cost computed by opencv?
2) What is the meaning of the temperature in line 7? Is the temperature just a scale factor for the logits?

  line0:  for i in range(num_query):
  line1:      for j in range(num_proto):
  line2:          # similarity, weight_1, weight_2 indexed at [i, j]; cost = 1 - similarity
  line3:          _, flow = emd_inference_opencv(1 - similarity_map[i, j, :, :], weight_1[i, j, :], weight_2[j, i, :])
  line4:          # _: cost; flow: transport plan
  line5:          similarity_map[i, j, :, :] = similarity_map[i, j, :, :] * torch.from_numpy(flow).cuda()
  line6:
  line7:  temperature = (self.args.temperature / num_node)
  line8:  logitis = similarity_map.sum(-1).sum(-1) * temperature   # tensor of shape [75, 5]
  line9:  return logitis
  1. Why is cv2.EMD much faster than qpth? Do they use different algorithms? Is it necessary to apply relu and normalization before EMD?

  2. The gradient obviously does not back-propagate through opencv, so when training with the opencv solver do you only update the structured FC weights while keeping the encoder fixed?

        def emd_inference_opencv(cost_matrix, weight1, weight2):
            # shapes: cost_matrix [25, 25], weight1 [25], weight2 [25]
            # cost_matrix is a tensor of shape [N, N]
            cost_matrix = cost_matrix.detach().cpu().numpy()
            # ensure the weights are > 0
            weight1 = F.relu(weight1) + 1e-5
            weight2 = F.relu(weight2) + 1e-5
            # normalize so that each weight vector sums to the number of nodes
            weight1 = (weight1 * (weight1.shape[0] / weight1.sum().item())).view(-1, 1).detach().cpu().numpy()
            weight2 = (weight2 * (weight2.shape[0] / weight2.sum().item())).view(-1, 1).detach().cpu().numpy()
            # cv2.EMD solves the transportation problem with a simplex-style algorithm
            # (exponential complexity in the worst case); it was originally proposed for
            # multi-dimensional histogram comparison in image retrieval
            cost, _, flow = cv2.EMD(weight1, weight2, cv2.DIST_USER, cost_matrix)
            return cost, flow
    

Looking forward to your reply! Many thanks for your help!

loading model from : checkpoint/pre_train/miniimagenet/128-0.1000-30-0.20/max_acc.pth
checkpoint/miniimagenet/fcn/1shot-5way/
epo 1, total loss=0.7245 acc=0.8000: 2%|██▎ | 1/50 [00:48<39:16, 48.09s/it]
Traceback (most recent call last):
File "train_meta.py", line 157, in
logits = model((data_shot.unsqueeze(0).repeat(num_gpu, 1, 1, 1, 1), data_query))
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in replica 3 on device 3.
Original Traceback (most recent call last):
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home1/lichangzhen/experiment/DeepEMD/Models/models/Network.py", line 26, in forward
return self.emd_forward_1shot(support, query)
File "/home1/lichangzhen/experiment/DeepEMD/Models/models/Network.py", line 75, in emd_forward_1shot
logits = self.get_emd_distance(similarity_map, weight_1, weight_2, solver='qpth')
File "/home1/lichangzhen/experiment/DeepEMD/Models/models/Network.py", line 130, in get_emd_distance
_, flows = emd_inference_qpth(1 - similarity_map, weight_1, weight_2,form=self.args.form, l2_strength=self.args.l2_strength)
File "/home1/lichangzhen/experiment/DeepEMD/Models/models/emd_utils.py", line 59, in emd_inference_qpth
flow = QPFunction(verbose=-1)(Q, p, G, h, A, b)
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/qpth/qp.py", line 96, in forward
eps, verbose, notImprovedLim, maxIter)
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/qpth/solvers/pdipm/batch.py", line 62, in forward
factor_kkt(S_LU, R, d)
File "/home/lichangzhen/anaconda3/envs/pytorch1.7/lib/python3.6/site-packages/qpth/solvers/pdipm/batch.py", line 442, in factor_kkt
T[factor_kkt_eye] += (1. / d).squeeze().view(-1)
IndexError: The shape of the mask [95, 675, 675] at index 0 does not match the shape of the indexed tensor [90, 675, 675] at index 0


Reproduction of the results of tiered-imagenet

Hi, thanks for the contribution to few-shot learning community. It is indeed interesting work!

I am just wondering how to get the exact results you report in the paper for tiered-ImageNet, or at least close results. I can get 73% accuracy on 1-shot using the hyperparameters given in the README with 600 randomly sampled episodes, but only 70% with 5,000 episodes. Thanks!

About the miniImageNet dataset

Hello!
In the evaluation stage of the miniImageNet dataset there are 20 classes with 600 samples each, i.e. 12,000 samples in total. With the 5-way 1-shot setting, one task requires 80 images, and an epoch has 5,000 tasks, which means 400,000 images are used. Did you use data augmentation, or are samples reused within one epoch? Thank you for your patience!
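
A quick sanity check of the arithmetic in the question, using only the numbers stated above:

    way, shot, query = 5, 1, 15
    images_per_task = way * (shot + query)            # 5 * 16 = 80 images per task
    tasks_per_epoch = 5000
    total_draws = images_per_task * tasks_per_epoch   # 400,000 image draws
    test_pool = 20 * 600                              # 12,000 distinct test images
    print(images_per_task, total_draws, test_pool)    # 80 400000 12000
    # since 400,000 draws come from a pool of only 12,000 images,
    # each test image is necessarily sampled many times across tasks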

Some questions about the ResNet-12 and the optimizer setting

Hi, thanks for your great work! I have some questions about the structure of ResNet-12 and the optimizer setting.

Firstly, why are the channels of ResNet-12 set to [64, 160, 320, 640]? Such a setting is common in WideResNet-28-10 rather than ResNet-12, which would be [64, 128, 256, 512]. In particular, the Implementation Details section of the paper gives the feature map size as 5*5*512, so this seems to be a change made after the paper was finished. I wonder how much this change affects performance?

Meanwhile, the optimizer adopts second-order optimization, which is also not common in previous works. Although the training setup really depends on each work, I want to try training from scratch without it.
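
As a side note on the 5*5 feature map mentioned above, the spatial size depends only on the input resolution and the downsampling, not on the channel widths; a sketch, assuming the usual ResNet-12 layout of four stages that each halve the resolution:

    size = 84
    for channels in [64, 160, 320, 640]:   # channel widths used in this repo
        size //= 2                          # each stage downsamples by a factor of 2
    print(size)                             # 5, i.e. a 5 x 5 x 640 feature map
    # with the more common [64, 128, 256, 512] widths the map would be 5 x 5 x 512,
    # matching the 5*5*512 figure quoted from the paper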

Why repeat data_shot

Hi, I'm a beginner. I want to know why data_shot is repeated during meta-training.
train_meta.py, line 158:
logits = model((data_shot.unsqueeze(0).repeat(num_gpu, 1, 1, 1, 1), data_query))
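
Not an answer from the authors, but the shape effect of that line can be checked in isolation; nn.DataParallel scatters inputs along dimension 0, so the repeat presumably gives every GPU replica a full copy of the support set rather than a slice of it:

    import torch

    num_gpu = 4
    data_shot = torch.randn(5, 3, 84, 84)                    # e.g. a 5-way 1-shot support set
    repeated = data_shot.unsqueeze(0).repeat(num_gpu, 1, 1, 1, 1)
    print(repeated.shape)                                     # torch.Size([4, 5, 3, 84, 84])
    # after DataParallel splits along dim 0, each replica sees a [5, 3, 84, 84] tensor,
    # i.e. the complete support set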

questions about support sets and query sets

Thanks for your nice work. I am new to FSL, so I am a little confused about support and query sets. In my understanding we can only fine-tune the model on the support set, but I see that you also use a query set (15 per class), so what's the difference between this case and 5-way 20-shot? I am really desperate for the answer. Thanks.

understanding the dataloader procedure

Thanks so much for providing the code of your paper.
I am new to FSL and trying to understand the training and data loading process. I understand the concepts of support and query, as well as shot and way, but I'm still confused about how you build the data loader and would appreciate an explanation. I have also read the paper several times but couldn't work out these details from it. I really appreciate your help in advance.

  1. In the case of 1-shot 5-way, what does query=15 mean? How should we choose the number of queries? Does it need to be a specific number to agree with previous works, or is it arbitrary, more like a hyperparameter?

In the CUB dataset we have 5,892 images, and I see the data loader generates batches of 80x3x84x84 as input, while the output has the form 80x640x5x5:

  1. I don't understand where the 80 comes from in the first place, and then where the 640 in the output comes from.

  2. Later I see we have data_shot and logits; the size of the logits is 75x5. Can you please explain what the 75 represents? I understand that it is 15*5=75, but I could not understand what it represents (see the sketch below).
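
A small sketch of where those numbers come from, using the 1-shot 5-way setting with query=15 described above (640 is the channel width of the ResNet-12 feature maps and 5x5 is their spatial size):

    way, shot, query = 5, 1, 15
    episode_batch = way * (shot + query)    # 80 images per episode: 5 support + 75 query
    num_query_images = way * query          # 75
    print(episode_batch, num_query_images)  # 80 75
    # the encoder maps each 3x84x84 image to a 640x5x5 feature map, hence 80x640x5x5;
    # the model scores every query image against each of the 5 classes, hence logits of 75x5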

Thanks in advance for your help.


About the effect of meta-train

Hi, this is very good work.
After running the code, I found a problem with meta-training.
I use the CUB dataset. As the number of pre-training epochs increases, the validation accuracy on the 5-way 5-shot task also increases, which is normal.
However, during meta-training, where the feature extractor also gets updated, I can't see any accuracy improvement or any decrease in the meta-training loss. The test accuracy after meta-training is similar to the validation accuracy after pre-training. Is that normal?
So my question is: does meta-training actually take effect?
Thanks.

Why can't I change the head layer to a protonet?

I use the script and the pre-trained model provided by the author (ResNet-12) and only get 70.5% on the 1-shot tiered-ImageNet test set, which is at least 4% behind what's reported in the paper. Can other people reproduce the results on tiered-ImageNet?

CUB dataset is not the original one

I notice that the CUB dataset you use is the accurately cropped version, not the same as in "Take a closer look at FSL". Is this somehow unfair?

Question on how to use this for inference

So I'm able to train my own model on a custom dataset, but I'm having a hard time figuring out how to run inference with it. For example, given a single image, how do I get a prediction of which class the image belongs to, along with the confidence score of that prediction?
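
Not specific to this repo, but a generic sketch of the kind of inference loop the question is asking about; encoder and score_fn are placeholders for the trained backbone and the model's similarity function (e.g. the EMD-based matching), and the softmax over class scores gives the confidence:

    import torch
    import torch.nn.functional as F

    def classify_one_image(encoder, score_fn, support_images, support_labels, query_image):
        # support_images: [N*K, 3, H, W] labelled examples, query_image: [3, H, W]
        with torch.no_grad():
            support_feat = encoder(support_images)            # features for the support set
            query_feat = encoder(query_image.unsqueeze(0))    # features for the single query
            logits = score_fn(support_feat, support_labels, query_feat)  # [1, N] class scores
            probs = F.softmax(logits, dim=-1)                 # confidence over the N classes
        conf, pred = probs.max(dim=-1)
        return pred.item(), conf.item()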

about tiered imagenet dataset

I downloaded the tieredImageNet dataset using the Google Drive link you provided, but found that the images are not actually the same as in the original dataset, which I think is usually the one people use for fair comparison.

The original dataset contains images that have been pre-processed to 84*84, but in your dataset the images all seem to be 224*224. Could you please:

  1. release the details about how to do the data pre-processing?
  2. give some reference that any baseline methods are using the same dataset format?

Thanks!

Does the QP module back-propagate gradient?

In file Network.py line 110:

       for i in range(num_query):
           for j in range(num_proto):
               _, flow = emd_inference_opencv(1 - similarity_map[i, j, :, :], weight_1[i, j, :], weight_2[j, i, :])
               similarity_map[i, j, :, :] = similarity_map[i, j, :, :] * torch.from_numpy(flow).cuda()

       temperature = (self.args.temperature / num_node)
       logitis = similarity_map.sum(-1).sum(-1) * temperature

The calculation of the flow, i.e. the best match between features, happens in the forward pass when computing logits and losses, but does back-propagation go through the 'emd_inference_opencv' module? If I understand correctly, this module runs on the CPU with numpy, so no gradient is tracked.
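
That reading can be checked with a tiny, repo-independent example: a tensor created by torch.from_numpy carries no gradient history, so the flow behaves as a constant weight, while the similarity values it multiplies still receive gradients:

    import numpy as np
    import torch

    similarity = torch.rand(5, 5, requires_grad=True)
    flow = torch.from_numpy(np.random.rand(5, 5).astype(np.float32))  # constant, no grad history
    (similarity * flow).sum().backward()
    print(flow.requires_grad)                       # False: nothing flows back into the solver
    print(torch.allclose(similarity.grad, flow))    # True: the flow just re-weights the similarities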

About GPU utilization during meta-training (meta-val)

Hello, first of all, thank you very much for your excellent work!! When I ran meta-training (meta-val), I found a problem I would like to ask you about.
I ran the experiment on 12 GB NVIDIA Titan V GPUs. I found that during meta-training (meta-val), the GPU utilization keeps fluctuating and stays at a relatively low level. I don't know whether you have seen the same problem, or whether it is my fault. If you have seen this phenomenon as well, is it a bug, or is it unavoidable?
This situation does not exist in the pre-training phase and only appears during meta-training, meta-validation, and meta-testing.

Looking forward to your reply. Thank you!!

About the acc on miniImageNet

Hi, I used the parameters you gave to train DeepEMD-FCN on miniImageNet, but acc_test = 60.5072. Did you use another training strategy, or did you pre-train on ImageNet? Looking forward to your reply.

Could you please reveal more details on the training procedure?

I've just read your paper and I have to say that it's really thoughtful and inspiring (especially the discussion of the structured fully connected layer). However, the details of the training procedure aren't discussed in the paper, and I notice that the results of many previous state-of-the-art methods (which I suppose were obtained by your reproduction) are relatively high in the paper compared with what I used to know, which makes me wonder how you managed to get them. Could you please reveal more details of your training/inference procedure? Thanks a lot.

About "relatively high in the paper": For example, in Table 4(a) ProtoNet(with ResNet-12 as its backbone) gets an accuracy of 78.01% on miniImagenet 5-way 5-shot tasks. However, according to my previous experiments, the performance of ProtoNet+ResNet12 is around 71% after 200,000 episodes of training, when a pretrained backbone is adopted and data augmentation is not used. It's also mentioned in CTM that ProtoNet+ResNet18 shows a performance of ~74% on miniImagenet 5-way 5-shot tasks.

pre_train with grid/sampling mode error

Hi, I'm trying to pre-train with grid or sampling mode but I get an error.

This is due to the encode function when using patches: the output features have shape (batch_size, feature_size=640, num_patches), which cannot be fed into the Linear layer. How do you reshape the patch output features so they can be fed into the linear layer?
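
One possible workaround (purely an assumption, not taken from this repo) is to pool the patch dimension away before the classifier, so the Linear layer still receives a [batch_size, 640] input:

    import torch
    import torch.nn as nn

    batch_size, feat_dim, num_patches, num_classes = 128, 640, 9, 64   # illustrative sizes
    fc = nn.Linear(feat_dim, num_classes)

    feat = torch.randn(batch_size, feat_dim, num_patches)   # encoder output in grid/sampling mode
    pooled = feat.mean(dim=-1)                               # hypothetical: average over patches
    logits = fc(pooled)                                      # [batch_size, num_classes]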

The complete error:

Traceback (most recent call last):
  File "train_pretrain.py", line 119, in <module>
    logits = model(data)
  File "/home/ubuntu/yohann/DeepEMD/deepemdenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/yohann/DeepEMD/deepemdenv/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/yohann/DeepEMD/deepemdenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/yohann/DeepEMD/src/models/deep_emd.py", line 29, in forward
    return self.pre_train_forward(input)
  File "/home/ubuntu/yohann/DeepEMD/src/models/deep_emd.py", line 41, in pre_train_forward
    return self.fc(self.encode(input, dense=False).squeeze(-1).squeeze(-1))
  File "/home/ubuntu/yohann/DeepEMD/deepemdenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/yohann/DeepEMD/deepemdenv/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/ubuntu/yohann/DeepEMD/deepemdenv/lib/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

the resolution of input image

Hi, thanks a lot for the great work!

But I am a little bit confused about the resolution of the input images, and I don't see it explained explicitly in the original paper. I took a look at your dataloader code and it seems to take the original-size images (as large as 300*300) and apply RandomResizedCrop (to 84) during training? Please correct me if my understanding is wrong.

If so, I am quite concerned that the comparison is unfair, because to my knowledge most numbers on mini-ImageNet use images pre-resized to 84 as input (except Closer Look at Few-Shot and SimpleShot)?
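
For concreteness, the two input pipelines being contrasted look roughly like this; both blocks are sketches of the question's reading, not the repo's actual transforms, and the augmentation in the second block is just a placeholder:

    from torchvision import transforms

    # reading of this repo's loader: crop and resize 84x84 regions out of full-size images
    train_from_original = transforms.Compose([
        transforms.RandomResizedCrop(84),
        transforms.ToTensor(),
    ])

    # what most reported mini-ImageNet numbers assume: images already pre-resized to 84x84
    train_from_preresized = transforms.Compose([
        transforms.RandomCrop(84, padding=8),   # placeholder augmentation on 84x84 inputs
        transforms.ToTensor(),
    ])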

Thanks in advance and looking forward to ur reply!

How to train 5-shot model?

Hi, thanks for your great work! How should I train the 5-shot model? Should I train directly from scratch or use the 1-shot model as initialization? Do relevant parameters such as the learning rate need to be adjusted accordingly?

About sfc_lr in eval.py

I find that the default value of sfc_lr is 0.1 in train_meta.py, but the default value in eval.py is 100. Is this based on some consideration, or is it just a bug?

Request for the complete arguments for the eval part.

Hello, first of all thanks for your excellent work!
But when I try to reproduce the results you provided, what I get is pretty low.

Here is the complete log for MiniImageNet DeepEMD-FCN 1-Shot 5-Way:

C:\Users\admin\Anaconda3\envs\tensorflow\python.exe D:/Projects/DeepEMD/eval.py
{'data_dir': 'your/default/dataset/dir',  # it's fine because I make some changes to the fcn\mini_imagenet.py and already hardcode the real path.
 'dataset': 'miniimagenet',
 'deepemd': 'fcn',
 'feature_pyramid': None,
 'gpu': '0', 
 'metric': 'cosine',
 'model_dir': 'D:\\Projects\\DeepEMD\\trained\\deepemd_trained_model\\miniimagenet\\fcn\\max_acc.pth',
 'norm': 'center',
 'num_patch': 9,
 'patch_list': [2, 3],
 'patch_ratio': 2,
 'query': 15,
 'seed': 1,
 'set': 'test',
 'sfc_bs': 4,
 'sfc_lr': 100,
 'sfc_update_step': 100,
 'sfc_wd': 0,
 'shot': 1,
 'solver': 'opencv',
 'temperature': 12.5,
 'test_episode': 5000,
 'way': 5}
manual seed: 1
use gpu: [0]
loading model from : D:\Projects\DeepEMD\trained\deepemd_trained_model\miniimagenet\fcn\max_acc.pth
batch 5000: This episode:49.33  average: 57.1491+0.2795: 100%|██████████| 5000/5000 [38:01<00:00,  2.19it/s]
test Acc 57.1491
Test Acc 57.1491 + 0.2795

Process finished with exit code 0

I think publishing the complete arguments would help resolve this problem.

Thx in advance!

Configs for training 5-shot models

Hello, thanks for your great work first.

I want to reproduce some results on 5-shot tasks but found the training process quite slow compared to 1-shot tasks (e.g. 10-20 seconds per iteration). Is this normal? Or could you share the training configs for 5-shot tasks?

By the way, I also wonder how you improved your numbers on the miniImageNet 5-way 1-shot task from the CVPR version (65.91) to the arXiv version? Any suggestion is appreciated.

Thanks
Yang

about CUB results in table 3c

Hi, thanks a lot for your previous detailed response; it's really helpful. However, I am still quite confused about the details in Table 3(c).

In the last post, you mentioned: "For the results in Table3C, part of some is based on our implementation, while for the others , we added the citation behind the method name, which you can refer to."

[screenshot of Table 3(c) attached]

So for the table above, could you tell us in detail:

  1. Which numbers are from your own implementation, and which ones cite other papers' numbers?
  2. For the numbers from your own implementation, do they use the same data input setting as FEAT and DeepEMD, i.e. the input images are pre-cropped with the given bounding boxes?

Thanks a lot!
