mmasana / facil
Framework for Analysis of Class-Incremental Learning with 12 state-of-the-art methods and 3 baselines.
Home Page: https://arxiv.org/pdf/2010.15277.pdf
License: MIT License
Dear author,
Thanks for your work.
Question 1:
I am quite confused about your definition of model_old. I thought model_old should be the model from the previous task, so the number of heads of model_old in task 1 should always be 1. However, inside def eval, the number of heads of model_old in task 1 is sometimes 1 (expected value) and sometimes 2 (unexpected value).
I think this is because def train includes both train_loop and post_train_process.
When you call .train and .eval sequentially, post_train_process has already run by the time .eval is called, so model_old has 2 heads in task 1 inside the .eval function.
For example, in def search_tradeoff (gridsearch.py), you call .train and then .eval,
and in incremental_learning.py, you also call .train and then .eval.
With num_epochs=1, the following results make my confusion clear.
Could you confirm whether my understanding is correct?
What exactly does model_old refer to in each of these cases?
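A toy sketch of the call order described above may help frame the question. It is not FACIL code: ToyNet, ToyApproach and the recorded labels are hypothetical, and the only assumption carried over from the report is that train runs a loop with validation first and then takes the model_old snapshot in a post-training step.

```python
import copy


class ToyNet:
    """Hypothetical stand-in for a multi-head network; it only tracks its heads."""

    def __init__(self):
        self.heads = []

    def add_head(self, num_outputs):
        self.heads.append(num_outputs)


class ToyApproach:
    """Hypothetical approach that mimics the reported order of calls."""

    def __init__(self, model):
        self.model = model
        self.model_old = None
        self.seen = []  # (where, heads in model_old, heads in model)

    def eval(self, where):
        old = len(self.model_old.heads) if self.model_old is not None else 0
        self.seen.append((where, old, len(self.model.heads)))

    def train(self, t):
        # training loop: validation here still sees the previous snapshot
        self.eval(f"task {t}: eval inside train")
        # post-training step: snapshot the model that was just trained
        self.model_old = copy.deepcopy(self.model)


net = ToyNet()
appr = ToyApproach(net)
for t in range(2):
    net.add_head(10)
    appr.train(t)
    appr.eval(f"task {t}: eval after train")

for row in appr.seen:
    print(row)
```

Under these assumptions, an eval call inside train for task 1 sees a model_old with 1 head, while an eval call made after train returns sees 2 heads, because by then model_old is a copy of the current model.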
Question 2:
The following values are the Task-Aware incremental accuracies. Surprisingly, as the number of tasks increases, the performance on the first task also increases (as shown by the bolded numbers). Could you kindly explain the reason?
Task Incremental Acc
81.5% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 81.5%
81.3% 48.5% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 64.9%
81.6% 52.9% 75.1% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 69.9%
83.0% 54.6% 76.5% 68.4% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 70.6%
83.5% 55.9% 77.9% 68.3% 72.4% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 71.6%
84.2% 58.3% 78.4% 70.2% 73.7% 66.2% 0.0% 0.0% 0.0% 0.0% Avg.: 71.8%
84.0% 59.9% 79.3% 70.4% 77.1% 66.4% 71.5% 0.0% 0.0% 0.0% Avg.: 72.7%
83.8% 60.1% 79.8% 70.6% 77.1% 66.3% 69.5% 69.7% 0.0% 0.0% Avg.: 72.1%
83.9% 60.0% 80.1% 70.8% 78.1% 67.1% 70.0% 69.0% 66.0% 0.0% Avg.: 71.7%
83.9% 59.6% 80.3% 71.2% 78.1% 68.3% 71.1% 70.5% 64.6% 64.2% Avg.: 71.2%
Thanks.
Mengya Xu
Task 1
LR GridSearch
| Epoch 1, time= 1.8s | Train: skip eval | Valid: time= 0.4s loss=1.969, TAw acc= 29.6% | *
Current best LR: 0.05
| Epoch 1, time= 1.9s | Train: skip eval | Valid: time= 0.4s loss=2.072, TAw acc= 25.6% | *
Current best LR: 0.05
| Epoch 1, time= 1.8s | Train: skip eval | Valid: time= 0.4s loss=2.115, TAw acc= 23.4% | *
Current best LR: 0.05
Current best acc: 29.6
Trade-off GridSearch
inside def train_epoch
num_model_old_heads 1
num_model_heads 2
| Epoch 1, time= 2.8s | Train: skip eval |
inside def eval
num_model_old_heads 1
num_model_heads 2
(this def eval is called after def train_epoch, specifically, this def eval is called before post_train_process.)
Valid: time= 0.4s loss=22.879, TAw acc= 19.2% | *
| Selected 2000 train exemplars, time= 11.8s
inside def eval
num_model_old_heads 2
num_model_heads 2
(this def eval is called after def train, specifically, this def eval is called after post_train_process.)
Current acc: 0.214 for lamb=4
inside def train_epoch
num_model_old_heads 1
num_model_heads 2
| Epoch 1, time= 2.9s | Train: skip eval |
inside def eval
num_model_old_heads 1
num_model_heads 2
Valid: time= 0.4s loss=12.227, TAw acc= 22.4% | *
| Selected 2000 train exemplars, time= 11.7s
inside def eval
num_model_old_heads 2
num_model_heads 2
Current acc: 0.236 for lamb=2.0
inside def train_epoch
num_model_old_heads 1
num_model_heads 2
| Epoch 1, time= 2.8s | Train: skip eval |
inside def eval
num_model_old_heads 1
num_model_heads 2
Valid: time= 0.4s loss=7.281, TAw acc= 23.8% | *
| Selected 2000 train exemplars, time= 11.7s
Train
inside def train_epoch
num_model_old_heads 1
num_model_heads 2
| Epoch 1, time= 2.9s | Train: skip eval |
inside def eval
num_model_old_heads 1
num_model_heads 2
Test on task 0 : loss=3.747 | TAw acc= 31.0%, forg= -5.6%| TAg acc= 22.5%, forg= 2.9% <<<
inside def eval
num_model_old_heads 2
num_model_heads 2
Test on task 1 : loss=7.030 | TAw acc= 25.8%, forg= 0.0%| TAg acc= 14.1%, forg= 0.0% <<<
Save at results_test/cifar100_icarl_icarl_fixd_5
Task 2
Hello,
As described in the documentation, --save-models allows saving the trained models after each incremental step (task). Models are saved in a model.ckpt format without any additional files (.meta file, .index file).
How can I convert a saved model directly from .ckpt to .pb?
Appreciate your help! Thank you in advance!
Hello !
In Class-incremental learning: survey and performance evaluation on image classification, large domain shift is studied. In this section, the number of classes varies among the tasks (i.e., starting with 102 classes then adding 67 classes then 200, etc.).
Is it possible to specify the number of classes for each task? Because in the code, I only found the possibility of specifying the --nc-first-task (number of classes of the first task) and --num-tasks (total number of tasks).
Thank you in advance !!
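To make the request concrete, here is a minimal sketch of what a per-task class specification could look like. The helper split_classes is hypothetical and is not an existing FACIL option; it only turns a list of class counts (the 102, 67, 200 from the example above) into contiguous class ranges.

```python
def split_classes(classes_per_task):
    """Return one (first_class, last_class_exclusive) range per task."""
    ranges = []
    start = 0
    for num_classes in classes_per_task:
        if num_classes <= 0:
            raise ValueError("each task needs at least one class")
        ranges.append((start, start + num_classes))
        start += num_classes
    return ranges


ranges = split_classes([102, 67, 200])
print(ranges)  # [(0, 102), (102, 169), (169, 369)]
```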
I have added 2 classes at each task for a total of 5 tasks, i.e. 5*2=10 total classes. Below are the results.
I have some confusion in understanding the results:
Dear author,
Thanks for your great work. It really helps me a lot!
I wonder if you have tried using imagenet_100 (imagenet_subset) as the dataset? I tried to use the imagenet_subset dataset but failed. It would really help me if you could provide the code for using the imagenet_subset dataset!
Hello!
I am confused about how to read the Upperbound (Joint) results. I am running the code with the approach set to "joint" in the script to get the Upperbound (Joint) result on CIFAR-100.
There are four types of results for each seed: avg_accs_tag, avg_accs_taw, acc_tag, and acc_taw. It is unclear to me which of these four corresponds to the Upperbound (Joint) result in your paper (Fig 8).
Thank you.
Many thanks for your helpful project.
I want to know whether it is possible to omit the validation set, since it reduces the number of training samples, as shown below,
FACIL/src/datasets/memory_dataset.py
Lines 100 to 110 in f653d6c
I have tried changing the default value of the parameter that controls it, referenced below,
FACIL/src/datasets/data_loader.py
Line 14 in f653d6c
ZeroDivisionError
Hello!
In the paper Class-incremental learning: survey and performance evaluation on image classification, there are several extensive experiments shown in the figures.
When I tried to run the experiments with the code, the results were a bit different from the paper's results.
Could you share the configuration settings used to run these experiments, such as random seeds, learning rates (starting, min and decreasing factors) or weight decays?
For the paper results, I'm mostly referring to Fig. 7 and Fig. 8. The experiments I ran using the code couldn't reach ~80% accuracy on CIFAR-100 after the first task (10 classes), so I'm wondering if I'm missing something.
Thank you in advance !!
Hello, I noticed that the accuracy of the LwF method on CIFAR-100 (10/10) drops to 16% after 10 tasks when no exemplars are used, which is different from the 30.2% accuracy reported in your paper. I was wondering if you could help me understand this discrepancy? Thank you.
Hi, this is not really an issue.
I'm currently developing a classification model which is trained on event-driven samples. The number of labels/classes grows as new events are produced and consumed by the model. I partially read :) your article but could not find a definition of what a task ID is. (Is this a batch number for the training set?) Overall I would like to congratulate you on this work, since it seems to deal extensively with a real issue.
It's not an issue regarding your code,
but I just wanted to let you know that the link for more information about --approach in src/README.md seems to be wrong (approaches -> approach).
By the way, thank you very much for your work!! :))
I want to use Huggingface's ViTForImageClassification. How do I integrate it in FACIL? I want to load the pretrained model 'google/vit-base-patch16-224'. I have read the instructions for adding networks in the networks README. However, I am still not sure how to implement it. How do I set "self.head_var = 'fc'" when the head is replaced by "model.classifier = nn.Linear(768, num_classes)"? How exactly would the class be created in this case?
Hello!
Thank you for your nice work.
I have a question:
LwM (learning without Memorizing) paper uses attention distillation loss. In your code (lwm.py):
# in class GradCAM
def __call__(self, input, class_indices=None, return_outputs=False):
    # pass input & backpropagate for selected class
    if input.dim() == 3:
        input = input.view([1] + list(input.size()))
    self.model.eval()
    model_output = self.model(input)
    logits = torch.cat(model_output, dim=1)
    if class_indices is None:
        class_indices = logits.argmax(dim=1)
    score = logits[:, class_indices].squeeze()
    self.model.zero_grad()
    score.mean().backward(retain_graph=self.retain_graph)
    model_output = [o.detach() for o in model_output]
    # create map based on gradients and activations
    with torch.no_grad():
        weights = F.adaptive_avg_pool2d(self.gradients, 1)
        att_map = (weights * self.activations).sum(dim=1, keepdim=True)
        att_map = F.relu(att_map)
    del self.activations
    del self.gradients
    return (att_map, model_output) if return_outputs else att_map
It seems to me that this code does not produce gradients for the attention maps during backpropagation.
Looking forward to your reply.
Thank you.
Hello! Thank you for your amazing work! But it seems that in BiC, in line 102, when creating the dataset val_old, the code doesn't handle the case where "self.exemplars_dataset.max_num_exemplars_per_class != 0".
In the EEIL paper, distillation loss is applied to classification layers corresponding to previous classes, and the balanced finetuning stage adds temporary distillation loss for classification layers for new classes.
But my question is, in the balanced finetuning stage, I can't see what the temporary distillation loss for classification layer for the new class is. Also, looking at the loss function part of the EEIL code, it appears that during the balanced fine-tuning stage, the fc layer is computed to correspond to task t-1 for the distillation. On the other hand, the distillation is applied to the fc layer up to task-2 before entering the balanced fine-tuning stage. If we calculate like this, doesn't distillation in the unbalanced training stage apply to the classifier corresponding to task t-1?
I'm asking because I get confused even if I look at EEIL paper and code several times.
Thank you.
Is it suitable for stricter settings, such as the Few-shot Class Incremental Learning (FSCIL) scenario?
Hello!
Thank you very much for this library - it's great!
I have a question:
The default value of --gridsearch-tasks in main_incremental.py is -1, and the documentation states that setting it to -1 will perform grid search for all tasks.
But this is not reflected in the code, and grid search is not done when this is executed.
The only lines where gridsearch_tasks is referenced are line 167, line 214 and line 241, and all of these work only when the value is positive.
Can you please check if the code is right?
Thank you.
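The behaviour reported above can be illustrated with a small sketch. It assumes, without having checked the referenced lines, that the check compares the task index against gridsearch_tasks; both function names below are hypothetical.

```python
def runs_gridsearch(t, gridsearch_tasks):
    """Assumed form of the check: search only while t is below the limit."""
    return t < gridsearch_tasks


def runs_gridsearch_with_sentinel(t, gridsearch_tasks):
    """Variant that treats -1 as 'all tasks', as the help text describes."""
    return gridsearch_tasks == -1 or t < gridsearch_tasks


# With the default of -1, the assumed check is never true for any task index.
print([runs_gridsearch(t, -1) for t in range(3)])                # [False, False, False]
print([runs_gridsearch_with_sentinel(t, -1) for t in range(3)])  # [True, True, True]
print([runs_gridsearch_with_sentinel(t, 2) for t in range(3)])   # [True, True, False]
```

If the real check has this shape, a comparison of this kind would explain why only positive values trigger the search.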
Is the accuracy at each stage computed only on the validation set of the current task's classes? Should the accuracy over all training stages be taken as the final result?
Thank you very much for your reply
Traceback (most recent call last):
  File "src/main_incremental.py", line 13, in <module>
    from datasets.data_loader import get_loaders
ModuleNotFoundError: No module named 'datasets.data_loader'
Hello,
does FACIL support multi-head classification for EWC and LwF?
To elaborate, I am trying to set up each dataset (with a different number of classes) as a task and perform continual learning. Do you have any suggestions on this?
Thanks
Hi, when experimenting with LwM in FACIL I noticed that the method behaves the same regardless of the choice of the gamma parameter that controls the attention distillation loss. Upon closer investigation, I noticed that during training the attention maps returned by GradCAM have no grad, as you can check yourself with the debugger at this line:
Line 126 in e9d816c
I think the issue is in class GradCAM, in line 226, where the activations are detached, and later in line 255, which disables gradients when computing attention maps. I think this class should have an option to preserve gradients when computing attention maps, and this option should be enabled for the forward pass of the current net. Then the attention maps for the current net will have requires_grad=True and consequently the attention loss will contribute to weight updates.
Hi !
As mentioned in README.md to add a custom dataset:
If I follow the steps mentioned above to add newdataset, I will get:
Is this correct so far?
In the newdataset folder added in data folder shown in FACIL capture, should I insert all the train and test images as .jpg files? In addition to train.txt and test.txt ?
Sorry for this long question but I'm having a problem when adding a custom dataset. Appreciate your help !! Thank you :)
Hello!
Thank you for your great work, from which I've learned a lot. Recently I wanted to reproduce LUCIR, but I find that the corresponding results are quite different from those reported by Hou in his paper (Hou reported an avg acc of 60.18 and yours is 43.4 under the 50/11 config). Which should I follow? The related results are shown below.
your work----------------------------------
Hou's work (LUCIR)-------------------------
Other works have also reported the avg acc of LUCIR under the same config.
PODNet. ECCV2020-------------------------------
Looking forward to your reply!
best wishes!
# dataset args
parser.add_argument('--datasets', default=['imagenet_256'], type=str, choices=list(dataset_config.keys()),
                    help='Dataset or datasets used (default=%(default)s)', nargs='+', metavar="DATASET")
FileNotFoundError: ../data/ILSVRC12_256/train.txt not found.
Hello!
I am trying to use LwF on VGGFace2, but I always get this error:
line 110, in get_data
    assert data[tt]['ncla'] == cpertask[tt], "something went wrong splitting classes"
Any ideas on how to solve this?
Thanks in advance.
Hi,
I'm not sure if the task-agnostic accuracy is correctly calculated.
I see that in network.py line 44, you are creating a new Linear layer with nn.Linear(self.out_size, num_outputs) for each task.
This indicates that each task will have a separate Linear layer, each with equal capacity.
For task-agnostic accuracy, I see that the outputs of all heads are combined, then softmax is taken.
But in a lot of papers, they use only 1 head for all tasks, to indicate task-agnostic accuracy.
I am aware that when training, they only backpropagate the logits of the classes belonging to the current task, and what is done here is equivalent to that in a way.
May I please know why that is not followed here?
Thank you.
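The two prediction modes described above can be sketched in plain Python. The function predict and the example logits are hypothetical and are not taken from FACIL. Softmax is omitted because it is monotonic and does not change the argmax.

```python
def predict(head_logits, task_id=None):
    """head_logits holds one list of logits per head (task).

    With task_id: task-aware, argmax restricted to that head.
    Without task_id: task-agnostic, argmax over all heads concatenated.
    Returns a global class index in both cases.
    """
    # offset of each head inside the concatenated output
    offsets = [0]
    for logits in head_logits[:-1]:
        offsets.append(offsets[-1] + len(logits))
    if task_id is not None:
        logits = head_logits[task_id]
        local = max(range(len(logits)), key=logits.__getitem__)
        return offsets[task_id] + local
    flat = [value for logits in head_logits for value in logits]
    return max(range(len(flat)), key=flat.__getitem__)


heads = [[0.2, 1.5], [3.0, 0.1]]   # two tasks with two classes each
print(predict(heads, task_id=0))   # 1 (best class within head 0)
print(predict(heads))              # 2 (best class over all heads)
```

Concatenating the outputs of several linear heads that share the same input features gives the same logits as a single linear layer whose weight rows are the stacked rows of those heads, so the task-agnostic argmax here matches what a single-head classifier over all classes would compute.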
Hello,
Is there any argument in the code that lets us limit the number of images per class (training images and test images)? For example, when adding a new dataset with .txt files, we currently need to modify these files each time we want to vary the number of images.
Thank you for your help !!
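One possible workaround is to filter the list file before it is loaded. The sketch below is not a FACIL option: the helper limit_per_class is hypothetical, and it assumes each line of train.txt or test.txt has the form "<image path> <label>".

```python
from collections import defaultdict


def limit_per_class(lines, max_per_class):
    """Keep at most max_per_class lines per label, preserving file order."""
    kept = []
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # assumed layout: the label is the last whitespace-separated field
        path, label = line.rsplit(maxsplit=1)
        if counts[label] < max_per_class:
            counts[label] += 1
            kept.append(f"{path} {label}")
    return kept


sample = ["a/1.jpg 0", "a/2.jpg 0", "a/3.jpg 0", "b/1.jpg 1", "b/2.jpg 1"]
print(limit_per_class(sample, 2))
# ['a/1.jpg 0', 'a/2.jpg 0', 'b/1.jpg 1', 'b/2.jpg 1']
```

The filtered lines could then be written to a new .txt file, so the original list stays untouched.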
When I try to reproduce the results of Figure 8 in the paper,
I run the command below:
python3 -u src/main_incremental.py --approach ewc --num-tasks 10 --exemplar-selection herding --num-exemplars 2000
The results are different from EWC-e in Figure 8. Any tips?
Respected Authors,
I am unable to match the results given in the paper for a-lwf, a-ewc, and a-lfl. The resulting TAg and TAw accuracies are much lower than those reported in the paper. Could you please give me some suggestions?
In the code for LUCIR, we can read the following comment:
LUCIR is expected to be used with exemplars. If needed to be used without exemplars, overwrite here the _get_optimizer function with the one in LwF and update the criterion
How should the criterion function be updated to allow for a memoryless scenario? It is my understanding that the criterion function should already work without any exemplars, in accordance with the original paper, but maybe I'm missing a detail.
Respected Sir,
Could you please tell me how to run the framework on ImageNet, or on any of the other datasets listed in the dataset folder besides CIFAR-100?
I ran the finetuning script for imagenet-subset
python src/main_incremental.py --approach finetuning --nepochs 200 --batch-size 128 --num-workers 4 --datasets imagenet_subset --num-tasks 12 --nc-first-task 25 --lr 0.05 --weight-decay 1e-3 --clipping 1 --network resnet32 --momentum 0.9 --exp-name exp1 --seed 0
But I am getting this syntax error.
Has anybody else run into this issue?
How can I perform the grid search to find the best lambda value in the iCaRL approach for a custom dataset? Currently the grid search is performed only on the learning rate and the lambda value is kept fixed. Please help.
Hello @mmasana ,
I appreciate your amazing and extremely helpful work.
Is it possible to save the trained models after incremental learning? The code does not work when setting save_models to True. Appreciate your help!
!python3 -u src/main_incremental.py --approach bic --num-exemplars 2000 --save-models True
============================================================================================================
Arguments =
approach: bic
batch_size: 64
clipping: 10000
datasets: ['cifar100']
eval_on_train: False
exp_name: None
fix_bn: False
gpu: 0
gridsearch_tasks: -1
keep_existing_head: False
last_layer_analysis: False
log: ['disk']
lr: 0.1
lr_factor: 3
lr_min: 0.0001
lr_patience: 5
momentum: 0.0
multi_softmax: False
nc_first_task: None
nepochs: 200
network: resnet32
no_cudnn_deterministic: False
num_tasks: 4
num_workers: 4
pin_memory: False
pretrained: False
results_path: ../results
save_models: True
seed: 0
stop_at_task: 0
use_valid_only: False
warmup_lr_factor: 1.0
warmup_nepochs: 0
weight_decay: 0.0
============================================================================================================
Approach arguments =
T: 2
lamb: -1
num_bias_epochs: 200
val_exemplar_percentage: 0.1
============================================================================================================
Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 2000
num_exemplars_per_class: 0
============================================================================================================
Traceback (most recent call last):
  File "src/main_incremental.py", line 316, in <module>
    main()
  File "src/main_incremental.py", line 178, in main
    assert len(extra_args) == 0, "Unused args: {}".format(' '.join(extra_args))
AssertionError: Unused args: True
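The leftover "True" in the assertion is what argparse produces when a boolean flag is followed by a value. The sketch below reproduces this with a minimal parser. It assumes --save-models is declared with action='store_true', which the printed save_models: True together with the unused "True" suggests; the parser here is a stand-in and not the one in main_incremental.py.

```python
import argparse

# Stand-in parser; the declarations are assumptions for illustration.
parser = argparse.ArgumentParser()
parser.add_argument('--approach', default='finetuning', type=str)
parser.add_argument('--num-exemplars', default=0, type=int)
parser.add_argument('--save-models', action='store_true')

# A store_true flag takes no value, so the trailing "True" is left over.
args_bad, extra_bad = parser.parse_known_args(
    ['--approach', 'bic', '--num-exemplars', '2000', '--save-models', 'True'])
print(args_bad.save_models, extra_bad)  # True ['True']

# Passing the bare flag sets it and leaves nothing unused.
args_ok, extra_ok = parser.parse_known_args(
    ['--approach', 'bic', '--num-exemplars', '2000', '--save-models'])
print(args_ok.save_models, extra_ok)    # True []
```

Under that assumption, dropping the trailing True from the command (ending it with --save-models) should avoid the assertion.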
Hello, thank you for your contribution.
I am a beginner in incremental learning and I don't understand the meaning of Task-Aware and Task-Agnostic. For example, when I work on class-incremental learning, as in iCaRL, I don't know the task ID in the validation phase. What does the average accuracy over the steps mean? Is it the Task-Agnostic accuracy?
Thanks.
Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 0
num_exemplars_per_class: 0
[(0, 25), (1, 25), (2, 25), (3, 25)]
ACCURACY COMING IS
Test on task 0 : loss=1.428 | TAw acc= 54.5%, forg= 9.8%| TAg acc= 6.6%, forg= 57.8% <<<
Test on task 1 : loss=3.908 | TAw acc= 59.6%, forg= 8.0%| TAg acc= 14.5%, forg= 47.4% <<<
Test on task 2 : loss=4.482 | TAw acc= 59.2%, forg= 5.6%| TAg acc= 21.5%, forg= 35.2% <<<
Test on task 3 : loss=4.317 | TAw acc= 70.9%, forg= 0.0%| TAg acc= 67.8%, forg= 0.0% <<<
Save at ../results/cifar100_lwf
TAw Acc
64.3% 0.0% 0.0% 0.0% Avg.: 64.3%
61.3% 67.7% 0.0% 0.0% Avg.: 64.5%
57.5% 63.8% 64.8% 0.0% Avg.: 62.1%
54.5% 59.6% 59.2% 70.9% Avg.: 61.1%
TAg Acc
64.3% 0.0% 0.0% 0.0% Avg.: 64.3%
39.6% 61.9% 0.0% 0.0% Avg.: 50.8%
21.9% 39.7% 56.7% 0.0% Avg.: 39.4%
6.6% 14.5% 21.5% 67.8% Avg.: 27.6%
TAw Forg
0.0% 0.0% 0.0% 0.0%
3.0% 0.0% 0.0% 0.0% Avg.: 3.0%
6.8% 3.8% 0.0% 0.0% Avg.: 5.3%
9.8% 8.0% 5.6% 0.0% Avg.: 7.8%
TAg Forg
0.0% 0.0% 0.0% 0.0%
24.7% 0.0% 0.0% 0.0% Avg.: 24.7%
42.4% 22.2% 0.0% 0.0% Avg.: 32.3%
57.8% 47.4% 35.2% 0.0% Avg.: 46.8%
PLEASE ALSO TELL ME WHAT TAW FORG AND TAG FORG ARE
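The forgetting columns can be reproduced from the accuracy tables above. The sketch below infers the formula from those numbers rather than from the FACIL source: forgetting of a task at a given step is the best accuracy it reached at any earlier step minus its current accuracy. The function name forgetting is hypothetical.

```python
def forgetting(acc_history):
    """acc_history: accuracy (in %) of one task after each training step.

    Returns the forgetting after each step, i.e. the best earlier accuracy
    minus the current accuracy (0.0 at the step where the task is learned).
    """
    forg = [0.0]
    for step in range(1, len(acc_history)):
        best_before = max(acc_history[:step])
        forg.append(round(best_before - acc_history[step], 1))
    return forg


# First column of the "TAw Acc" table above (task 0 after each of the 4 tasks).
taw_task0 = [64.3, 61.3, 57.5, 54.5]
print(forgetting(taw_task0))  # [0.0, 3.0, 6.8, 9.8]
```

These values match the first column of the "TAw Forg" table. Applied to the "TAg Acc" columns, the same formula reproduces "TAg Forg" up to rounding of the printed accuracies.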
Hi!
Nice work on continual learning.
Recently I reproduced iCaRL, but I cannot achieve the original results. I tried many things: adjusting the learning rate, more training epochs, different weight decay...
And I found your results are quite similar to mine.
Could you please give some insight into the original results?
In addition, in BiC, their results are about 50% (the 10th task top-1 test acc). This is very surprising; am I missing something?
Hope to get your reply. This has bothered me for a long time.
From iCaRL https://arxiv.org/pdf/1611.07725.pdf
From yours https://arxiv.org/pdf/2010.15277.pdf
From BiC https://arxiv.org/pdf/1905.13260.pdf
I didn't change any code. How do I use the BiC approach?
(FACIL) yupeng@ubuntu:~/FACIL$ python3 -u src/main_incremental.py --approach bic
============================================================================================================
Arguments =
approach: bic
batch_size: 64
clipping: 10000
datasets: ['cifar100']
eval_on_train: False
exp_name: None
fix_bn: False
gpu: 0
gridsearch_tasks: -1
keep_existing_head: False
last_layer_analysis: False
log: ['disk']
lr: 0.1
lr_factor: 3
lr_min: 0.0001
lr_patience: 5
momentum: 0.0
multi_softmax: False
nc_first_task: None
nepochs: 200
network: resnet32
no_cudnn_deterministic: False
num_tasks: 4
num_workers: 4
pin_memory: False
pretrained: False
results_path: ../results
save_models: False
seed: 0
stop_at_task: 0
use_valid_only: False
warmup_lr_factor: 1.0
warmup_nepochs: 0
weight_decay: 0.0
============================================================================================================
Approach arguments =
T: 2
lamb: -1
num_bias_epochs: 200
val_exemplar_percentage: 0.1
============================================================================================================
Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 0
num_exemplars_per_class: 0
============================================================================================================
WARNING: ../results/cifar100_bic already exists!
Files already downloaded and verified
Files already downloaded and verified
Traceback (most recent call last):
  File "/home/yupeng/FACIL/src/main_incremental.py", line 316, in <module>
    main()
  File "/home/yupeng/FACIL/src/main_incremental.py", line 211, in main
    appr = Appr(net, device, **appr_kwargs)
  File "/home/yupeng/FACIL/src/approach/bic.py", line 43, in __init__
    assert (have_exemplars > 0), 'Error: BiC needs exemplars.'
AssertionError: Error: BiC needs exemplars.
(FACIL) yupeng@ubuntu:~/FACIL$