vico-uoe / url

Universal Representation Learning from Multiple Domains for Few-shot Classification - ICCV 2021, Cross-domain Few-shot Learning with Task-specific Adapters - CVPR 2022

License: MIT License

Python 97.43% Shell 2.57%

few-shot-learning multi-domain-learning multi-domain-meta-training knowledge-distillation

url's Issues

training time of URL

Hi,

I was wondering how long it took you to train URL. I'm running the script train_resnet18_url.sh and it is taking a very long time, so it would be good to know the expected training time to check whether I did something wrong.

Thanks,
Rui

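About the test results of your released URL model

Dear authors, thank you for open-sourcing the code of your excellent work! When I used your code (latest version) to reproduce Table 4 of the [URL] paper, I found gaps in the first column (NCC) and the last column (Ours). Using your model (URL-pretrained), my NCC result (average accuracy over all test datasets) is 73.1 (reproduced twice), whereas the paper reports 73.7. My result for the last column (Ours) is 75.04 (average accuracy over all test datasets), whereas the paper reports 76.6. For NCC, I used cosine similarity as the metric and tested with test_extractor.py; for Ours, I made no changes and ran your released test script test_resnet18_pa.sh directly.
Could you help me work out where the problem might be? Thank you!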

Error in testing phase

Hi, thanks for your code! When I try to run test_extractor_pa.py after installing meta-dataset, I get the following error in L61.
*** RuntimeError: The Session graph is empty. Add operations to the graph before calling run().

I'm using TensorFlow 2.8.0, PyTorch 1.9.0 and CUDA 11.1. Could you help me with this problem? Thank you!

unable to train url

Hi,

When I was trying to train the Universal Feature Extractor I had this error:

Traceback (most recent call last):
  File "/scratch/work/lir3/FSC/URL/train_net_url.py", line 109, in <module>
    sample = train_loader.get_train_batch(session)
  File "/scratch/work/lir3/FSC/URL/data/meta_dataset_reader.py", line 313, in get_train_batch
    return self._get_batch(self.train_dataset_next_task, session)
  File "/scratch/work/lir3/FSC/URL/data/meta_dataset_reader.py", line 296, in _get_batch
    episode = session.run(next_task)[0]
  File "/scratch/work/lir3/.conda_envs/ml/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/scratch/work/lir3/.conda_envs/ml/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1117, in _run
    raise RuntimeError('The Session graph is empty. Add operations to the '
RuntimeError: The Session graph is empty. Add operations to the graph before calling run().

Do you know how to fix it? Thank you

About the data augmentation

Hi, thank you for your great work on TSA.
While modifying the SDL part, I would like to fetch non-augmented data. Where is the augmentation code for train_loader? Some settings are in meta_dataset_config.gin, but when I set the augmentation values to zero, nothing changes. Do you have any suggestions?
Looking forward to your answer.

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Hello,

I've trained the SDL networks from scratch and then tried to train the URL model from scratch. The program starts fine, but after a while it breaks with this error:

2022-06-07 11:38:17.964494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29879 MB memory: -> device: 0, name: Tesla V100S-PCIE-32GB, pci bus id: 0000:00:05.0, compute capability: 7.0
0%| | 0/240000 [00:00<?, ?it/s]2022-06-07 11:38:19.286408: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-06-07 11:38:47.998087: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:390] Filling up shuffle buffer (this may take a while): 169 of 1000
2022-06-07 11:38:52.983647: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:415] Shuffle buffer filled.
2022-06-07 11:39:26.450546: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:390] Filling up shuffle buffer (this may take a while): 116 of 1000
2022-06-07 11:39:30.279009: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:415] Shuffle buffer filled.
0%|▍ | 1007/240000 [17:43<70:06:39, 1.06s/it]
Traceback (most recent call last):
  File "/home/guests/lbt/URL/train_net_url.py", line 224, in <module>
    train()
  File "/home/guests/lbt/URL/train_net_url.py", line 129, in train
    ft, fs = torch.nn.functional.normalize(stl_features[t_indx], p=2, dim=1, eps=1e-12), torch.nn.functional.normalize(mtl_features[t_indx], p=2, dim=1, eps=1e-12)
  File "/home/guests/lbt/.local/bin/.virtualenvs/few-shot/lib/python3.9/site-packages/torch/nn/functional.py", line 4637, in normalize
    denom = input.norm(p, dim, keepdim=True).clamp_min(eps).expand_as(input)
  File "/home/guests/lbt/.local/bin/.virtualenvs/few-shot/lib/python3.9/site-packages/torch/_tensor.py", line 498, in norm
    return torch.norm(self, p, dim, keepdim, dtype=dtype)
  File "/home/guests/lbt/.local/bin/.virtualenvs/few-shot/lib/python3.9/site-packages/torch/functional.py", line 1590, in norm
    return _VF.norm(input, p, _dim, keepdim=keepdim) # type: ignore[attr-defined]
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I wasn't able to find the problem. Can you help me?

Also, the estimated execution time of the program is between 150 and 200 hours. Is that normal, or is something wrong? I am training on a single Tesla V100S-PCIE-32GB GPU.
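
For reference, the estimate can be cross-checked against the progress bar quoted above. The sketch below is plain arithmetic on the numbers in that log (240000 iterations, 1007 done, 1.06 s/it) and assumes the rate stays constant, which real runs rarely do. The helper names are made up for this illustration.

```python
def remaining_hours(total_iters, done_iters, seconds_per_iter):
    """Remaining wall-clock time in hours, assuming a constant rate."""
    return (total_iters - done_iters) * seconds_per_iter / 3600.0


def seconds_per_iter_for(total_hours, total_iters):
    """Average rate (s/it) that a given total runtime would imply."""
    return total_hours * 3600.0 / total_iters


# Numbers read from the progress bar in the log above.
print(f"{remaining_hours(240_000, 1_007, 1.06):.1f} h remaining")

# Rates implied by a 150 h and a 200 h total estimate.
print(seconds_per_iter_for(150, 240_000), seconds_per_iter_for(200, 240_000))
```

At 1.06 s/it the run needs roughly 70 hours, in line with the bar's own estimate, whereas a 150 to 200 hour total corresponds to about 2.25 to 3.0 s/it. So the two estimates reflect different iteration rates rather than different iteration counts.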

About the pre-trained single domain networks

Hi,

Thank you for sharing the code!

Are the pre-trained single domain networks the same as those provided by SUR?

Code for TSA adapter designs

Thank you for your great work on TSA.

I note that the TSA paper studies different adapter topologies and parameterisations. I'm interested in replicating those additional studies, but the released code seems to default to the best configuration and leave out the implementations of the alternative designs. Is my understanding correct, or did I miss something?

how to compute "Best SDL"

Hi,

I was wondering whether you still have the code for computing the "Best SDL" in Table 1 of the URL paper. Alternatively, could you help with the following problem?

When I tried to assess the performance of a single-domain pretrained network, the labels returned by the test loader seemed strange. If my understanding is correct, then with

test_loader = MetaDatasetEpisodeReader('test', test_set = ['vgg_flower'], test_type=args['test.type'])
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = False

with tf.compat.v1.Session(config=config) as session:
    
    test_sample = test_loader.get_test_task(session)

test_sample['target_gt'] should return the original labels for the target set. To assess a pretrained SDL network, I just need to use it to make predictions and compute the accuracy. But test_sample['target_gt'] doesn't seem to hold the original labels for the target set (in vgg flower there are only 71 classes, but when I print test_sample['target_gt'] the smallest value is larger than 80). Am I missing something here?

Many thanks in advance.
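
One possible reading, not confirmed anywhere in this thread, is that target_gt holds dataset-wide class ids, with the test-split classes numbered after the train and validation classes. If so, they would need to be mapped to contiguous per-episode labels before being compared with predictions. The sketch below shows only that remapping; the function name and the example ids are made up for illustration.

```python
def to_episode_labels(global_ids):
    """Map arbitrary class ids to contiguous labels 0..N-1, ordered by id."""
    classes = sorted(set(global_ids))
    index = {cid: i for i, cid in enumerate(classes)}
    return [index[cid] for cid in global_ids], classes


# Made-up ids standing in for test_sample['target_gt'].
target_gt = [88, 91, 88, 101, 91]
labels, classes = to_episode_labels(target_gt)
print(labels)   # [0, 1, 0, 2, 1]
print(classes)  # [88, 91, 101]
```

The same mapping would have to be applied to the support-set ids, so that both sets share one label space within the episode.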

Code Release

Hi there~

Thanks for your excellent work! When will the code be published?

About using episodic training

Hello! Sorry to bother you again; I hope to get some help or advice. Over the past few days I have been running episodic training with your code and with the code released by the Cnaps authors that you mentioned last time, using protonet as the meta-learner. In both cases, training seems to be very expensive in time. On a single RTX 3090, episodic training on ImageNet alone takes about 30 minutes per 500 iterations. I am using TensorFlow 1.15, and training prints some warnings; a few of them are listed below:
OMP: Info #171: KMP_AFFINITY: OS proc 77 maps to package 1 core 5 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 30 maps to package 1 core 8 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 78 maps to package 1 core 8 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 31 maps to package 1 core 9 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 79 maps to package 1 core 9 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 32 maps to package 1 core 10 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 80 maps to package 1 core 10 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 33 maps to package 1 core 11 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 81 maps to package 1 core 11 thread 1
OMP: Info #250: KMP_AFFINITY: pid 137352 tid 137558 thread 1 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 137352 tid 137561 thread 2 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 137352 tid 137565 thread 3 bound to OS proc set 3
If you have run into something similar, could you give me some advice? Thank you!

No model update on 5-way 1-shot

Hi,
Wei-Hong, thanks for your nice work, it brings me a lot of inspiration.

I wonder whether there is no model (adapter) update in the 5-way 1-shot setting (judging from test_extractor.py).
If so, how can I get the results reported in Table 4?

Thanks!
Jim

Error in testing phase

Respected researchers,
I am having a lot of problems running test_extractor_pa.py. Can you help me out?
I'm running Python 3.9, torch11.8, and tf2.13.
It looks like the model can't be imported or there is some problem with the dataset.

(fsl) YYM@ubuntu:~/NewResearchDirection/URL$ bash ./scripts/test_resnet18_pa.sh
2024-04-27 22:07:47.187378: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-04-27 22:07:47.225910: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-27 22:07:47.784984: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

['ilsvrc_2012', 'omniglot', 'aircraft', 'cu_birds', 'dtd', 'quickdraw', 'fungi', 'vgg_flower'] ['ilsvrc_2012', 'omniglot', 'aircraft', 'cu_birds', 'dtd', 'quickdraw', 'fungi', 'vgg_flower', 'traffic_sign', 'mscoco', 'mnist', 'cifar10', 'cifar100']
2024-04-27 22:07:49.159644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 19140 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:18:00.0, compute capability: 8.9
=> loading best checkpoint './saved_results/url/weights/url/model_best.pth.tar'
2024-04-27 22:08:09.164747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 19140 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:18:00.0, compute capability: 8.9
ilsvrc_2012
0%| | 0/600 [00:00<?, ?it/s]2024-04-27 22:08:09.599796: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2024-04-27 22:08:12.346139: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
2024-04-27 22:08:12.922567: W tensorflow/core/framework/op_kernel.cc:1816] INVALID_ARGUMENT: ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (15,) + inhomogeneous part.
Traceback (most recent call last):

File "/home/YYM/anaconda3/envs/fsl/lib/python3.9/site-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)

File "/home/YYM/anaconda3/envs/fsl/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)

File "/home/YYM/anaconda3/envs/fsl/lib/python3.9/site-packages/tensorflow/python/data/ops/from_generator_op.py", line 198, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))

File "/home/YYM/NewResearchDirection/meta-dataset/meta_dataset/data/reader.py", line 132, in episode_representation_generator
episode_description = sampler.sample_episode_description()

File "/home/YYM/NewResearchDirection/meta-dataset/meta_dataset/data/sampling.py", line 487, in sample_episode_description
class_ids = self.sample_class_ids()

File "/home/YYM/NewResearchDirection/meta-dataset/meta_dataset/data/sampling.py", line 424, in sample_class_ids
episode_classes_rel = self._rng.choice(self.span_leaves_rel)

File "mtrand.pyx", line 920, in numpy.random.mtrand.RandomState.choice

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (15,) + inhomogeneous part.

 [[{{node PyFunc}}]]
 [[IteratorGetNext]]

A small problem, really need your help!

Hi Prof. Lee! @WeiHongLee

I really appreciate this wonderful work and would like to try it. When I train a single-domain model (for example, 'dtd'), I get errors in meta_dataset.data.reader.py at lines 245 and 246:
self.base_path = self.dataset_spec.path
self.class_set = self.dataset_spec.get_classes(self.split)
Since 'dtd' is a str, I wonder how to solve this problem. Thanks a lot!

Accuracy on MSCOCO and CIFAR-10

Hi, thank you for your great work on TSA.
I have a question: when I run the Varying-Way Five-Shot and Five-Way One-Shot scenarios with the URL parameter file you provided, the accuracy on MSCOCO and CIFAR-10 is more than 5% lower than the numbers you report, while the accuracy on the other datasets differs from yours by 1%-2%. Did you perform any additional processing on the MSCOCO and CIFAR-10 datasets?
Looking forward to your answer.

Parameter size of the loaded URL classifier doesn't match with that of model

Hi, thanks for open-sourcing the code!

I am trying to understand the implementation of the work. However, when I load the parameters of f_{\phi} from the checkpoint, the size of the loaded classifier doesn't match the size of the model's classifiers.

E.g., the loaded classifier has size (num_c1+num_c2+...+num_c8 -----> 512), but the model has 8 classifiers: (num_c1 -----> 512), (num_c2 -----> 512), ..., (num_c8 -----> 512).
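
If the checkpoint tensor really is the eight per-domain classifiers stacked along the class dimension in the model's domain order (an assumption this thread does not confirm), it could be cut back into per-domain pieces as sketched below. The sketch uses nested lists as a stand-in for a weight matrix, and split_rows is a hypothetical helper, not part of the repository.

```python
def split_rows(weight, class_counts):
    """Split a row-stacked weight matrix into one chunk per domain."""
    if sum(class_counts) != len(weight):
        raise ValueError("class counts do not add up to the number of rows")
    chunks, start = [], 0
    for n in class_counts:
        chunks.append(weight[start:start + n])
        start += n
    return chunks


# Toy stand-in: 3 domains with 2, 1 and 3 classes, feature size 4 instead of 512.
weight = [[float(r)] * 4 for r in range(6)]
heads = split_rows(weight, [2, 1, 3])
print([len(h) for h in heads])  # [2, 1, 3]
```

For PyTorch tensors, torch.split with a list of section sizes along dim 0 does the same slicing.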

Inquiry about running on multiple GPUs

Hello,

I am currently using your project and am wondering if it is possible to run it on multiple GPUs. Specifically, I am interested in training the model on two GPUs to accelerate the training process.

I have tried to modify the code to support multiple GPUs, but I encountered some errors. Could you please let me know if your project supports multi-GPU training? If so, could you provide some guidance on how to implement it correctly?

Thank you for your help in advance!

Best regards!

Results of TSA on Aircraft and QuickDraw using the single ImageNet-trained Network

Hi, thanks for the valuable work!

I'm trying to reproduce the results in Table 1 (using a multi-domain-trained network) and Table 2 (using a single ImageNet-trained network). Following README.md, the results in Table 1 are easy to reproduce exactly. It is praiseworthy!

However, when I try to obtain the results in Table 2, there are large gaps between my results and the reported ones on two datasets, Aircraft and QuickDraw. More specifically, the reproduced accuracies for TSA on Aircraft and QuickDraw are about 65.08 and 63.06, far below the reported 72.2 and 67.6, respectively.

When I re-ran the code, the hyper-parameters for TSA were unchanged, and the pretrained model (ImageNet-net) was downloaded from: https://drive.google.com/file/d/1MvUcvQ8OQtoOk1MIiJmK6_G8p4h8cbY9/view.

Is there something wrong? Please correct me!
Many thanks.

Question about SDL model

Hi,

Thanks for your excellent work!

I was wondering which model you use in Table 2 of the paper Cross-domain Few-shot Learning with Task-specific Adapters (the experiment using a single-domain feature extractor trained only on ImageNet). Is it the model from Single-domain networks (one for each dataset)?

We have tried to replicate your experiments. Our MDL results match what you have provided (Table 1 & Table 9 in TSA). However, our SDL results are very different from what you have provided.

We believe there may be two reasons for this.

The first is the model. We use the model from Single-domain networks (one for each dataset), under the folder imagenet net. Is it the correct model?

Secondly, we noticed that shuffle_buffer_size has an impact on the results (issue). You have already corrected the results in Table 1. Is the shuffle_buffer_size parameter for Table 2 (the experiment using a single-domain feature extractor trained only on ImageNet) set to 0 or 1000?

Thanks again!

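Two questions

Hello! Thank you for open-sourcing your excellent work! I also want to run some experiments on meta-dataset, and while using your code I have two small questions. Question 1: In your code, train_net.py appears to train an ordinary softmax classifier with a fully connected layer. When training via the script train_resnet18_sdl.sh, there seems to be no notion of an epoch; for example, ImageNet is trained for 480000 iterations in total. How are the batches for these 480000 iterations determined? Are the batches produced by sampling the training set exhaustively, so that the training set is traversed once every certain number of iterations? Or are the samples in each iteration drawn at random from the training set? If so, how do you avoid repeatedly sampling the same examples in adjacent iterations? Question 2: The released code does not seem to include a meta-training procedure. If I want to use your code for meta-training, how should I proceed?
Looking forward to your reply!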
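
This thread does not say how train_net.py forms its batches. The TensorFlow logs quoted in another issue above do mention a shuffle buffer of 1000 elements, so a generic shuffle buffer is sketched below for readers unfamiliar with the idea. It is a pure-Python illustration with a made-up function name, not the repository's or TensorFlow's implementation.

```python
import random


def shuffle_buffer(stream, buffer_size, seed=0):
    """Yield every item of `stream` exactly once, in a locally shuffled order."""
    rng = random.Random(seed)
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) > buffer_size:
            # Emit a random buffered item to make room for the next one.
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            yield buf.pop()
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buf)
    yield from buf


order = list(shuffle_buffer(range(10), buffer_size=4))
print(sorted(order) == list(range(10)))  # True
```

With this scheme each example appears once per pass over the input, so duplicates can only occur across passes. A small buffer mixes examples only locally, which may be one reason the buffer size can affect results.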