mmasana / facil

Framework for Analysis of Class-Incremental Learning with 12 state-of-the-art methods and 3 baselines.

Home Page: https://arxiv.org/pdf/2010.15277.pdf

License: MIT License

Shell 0.88% Python 99.12%

continual-learning deep-learning framework incremental-learning lifelong-learning machine-learning reproducible-research survey

facil's Issues

The meaning of model_old is confusing. In task 1, the number of heads of model_old is sometimes 1 and sometimes 2 during def eval.

Dear author,

Thanks for your work.
Question 1:
I am quite confused about your definition of model_old. I thought model_old should be the model from the previous task, so the number of heads of model_old in task 1 should always be 1. However, during def eval in task 1 it is sometimes 1 (expected) and sometimes 2 (unexpected).

I think this is because def train includes both train_loop and post_train_process.
When .train and .eval are called sequentially, post_train_process has already run by the time .eval is called, so model_old has 2 heads in task 1.

For example, in def search_tradeoff (gridsearch.py) you call .train and then .eval,
and in incremental_learning.py you also call .train and then .eval.

The log at the end of this issue, produced with num_epochs=1, illustrates my confusion.

Can you confirm my understanding?

When def eval is called after post_train_process, model_old = model.
When def eval is called before post_train_process, model_old is from the previous task and model is for the current task.

Can you help me understand this point? What exactly does model_old refer to in each case?
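The ordering described in Question 1 can be reproduced with a toy model. This is only a sketch under assumptions: ToyNet and ToyApproach are hypothetical stand-ins, and the deep copy inside post_train_process is assumed from the behaviour reported in the log, not taken from the FACIL source.

```python
import copy


class ToyNet:
    """Stand-in for a multi-head network; only the head count matters here."""

    def __init__(self):
        self.heads = []

    def add_head(self, num_outputs):
        self.heads.append(num_outputs)


class ToyApproach:
    """Hypothetical trainer that keeps a frozen copy of the previous model."""

    def __init__(self, model):
        self.model = model
        self.model_old = None

    def train_loop(self, t):
        pass  # training would update self.model only; self.model_old is untouched

    def post_train_process(self, t):
        # Assumption: the snapshot used by the next task is taken here.
        self.model_old = copy.deepcopy(self.model)

    def train(self, t):
        self.train_loop(t)
        self.post_train_process(t)

    def head_counts(self):
        old = len(self.model_old.heads) if self.model_old is not None else 0
        return old, len(self.model.heads)


net = ToyNet()
appr = ToyApproach(net)

net.add_head(10)
appr.train(0)                  # first task: snapshot has 1 head afterwards

net.add_head(10)
appr.train_loop(1)
during = appr.head_counts()    # eval called before post_train_process
appr.post_train_process(1)
after = appr.head_counts()     # eval called after post_train_process

print(during, after)
```

Under these assumptions the pair (old heads, current heads) changes from (1, 2) to (2, 2) purely because of where the evaluation happens relative to the snapshot, which matches the pattern in the log.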

Question 2:
The following values are the Task-Aware incremental accuracies. Surprisingly, as the number of tasks increases, the performance on the first task also increases (first column, bolded in the original post). Can you kindly explain the reason?

Task Incremental Acc
81.5% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 81.5%
81.3% 48.5% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 64.9%
81.6% 52.9% 75.1% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 69.9%
83.0% 54.6% 76.5% 68.4% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 70.6%
83.5% 55.9% 77.9% 68.3% 72.4% 0.0% 0.0% 0.0% 0.0% 0.0% Avg.: 71.6%
84.2% 58.3% 78.4% 70.2% 73.7% 66.2% 0.0% 0.0% 0.0% 0.0% Avg.: 71.8%
84.0% 59.9% 79.3% 70.4% 77.1% 66.4% 71.5% 0.0% 0.0% 0.0% Avg.: 72.7%
83.8% 60.1% 79.8% 70.6% 77.1% 66.3% 69.5% 69.7% 0.0% 0.0% Avg.: 72.1%
83.9% 60.0% 80.1% 70.8% 78.1% 67.1% 70.0% 69.0% 66.0% 0.0% Avg.: 71.7%
83.9% 59.6% 80.3% 71.2% 78.1% 68.3% 71.1% 70.5% 64.6% 64.2% Avg.: 71.2%

Thanks.
Mengya Xu

Task 1

inside def train_epoch
num_model_old_heads 1
num_model_heads 2

| Epoch 1, time= 2.8s | Train: skip eval |

inside def eval
num_model_old_heads 1
num_model_heads 2
(this def eval is called after def train_epoch, specifically, this def eval is called before post_train_process.)

Valid: time= 0.4s loss=22.879, TAw acc= 19.2% | *
| Selected 2000 train exemplars, time= 11.8s

inside def eval
num_model_old_heads 2
num_model_heads 2
(this def eval is called after def train, specifically, this def eval is called after post_train_process.)

Current acc: 0.214 for lamb=4

inside def train_epoch
num_model_old_heads 1
num_model_heads 2

| Epoch 1, time= 2.9s | Train: skip eval |

inside def eval
num_model_old_heads 1
num_model_heads 2

Valid: time= 0.4s loss=12.227, TAw acc= 22.4% | *
| Selected 2000 train exemplars, time= 11.7s

inside def eval
num_model_old_heads 2
num_model_heads 2

Current acc: 0.236 for lamb=2.0

inside def train_epoch
num_model_old_heads 1
num_model_heads 2

| Epoch 1, time= 2.8s | Train: skip eval |

inside def eval
num_model_old_heads 1
num_model_heads 2

Valid: time= 0.4s loss=7.281, TAw acc= 23.8% | *
| Selected 2000 train exemplars, time= 11.7s

inside def eval
num_model_old_heads 2
num_model_heads 2
Current acc: 0.274 for lamb=1.0

Train
inside def train_epoch
num_model_old_heads 1
num_model_heads 2

| Epoch 1, time= 2.9s | Train: skip eval |
inside def eval
num_model_old_heads 1
num_model_heads 2

Valid: time= 0.4s loss=7.291, TAw acc= 25.0% | *
| Selected 2000 train exemplars, time= 11.7s

Test on task 0 : loss=3.747 | TAw acc= 31.0%, forg= -5.6%| TAg acc= 22.5%, forg= 2.9% <<<
inside def eval
num_model_old_heads 2
num_model_heads 2

Test on task 1 : loss=7.030 | TAw acc= 25.8%, forg= 0.0%| TAg acc= 14.1%, forg= 0.0% <<<
Save at results_test/cifar100_icarl_icarl_fixd_5

Task 2

Conversion of saved trained models (at checkpoints)

Hello,

As described in the documentation, --save-models saves the trained models after each incremental step (task). Models are saved in a model.ckpt format without any additional files (.meta file, .index file).
How can I convert a saved model directly from .ckpt to .pb?

Appreciate your help! Thank you in advance!

Varying the number of classes among the tasks

Hello !

In Class-incremental learning: survey and performance evaluation on image classification, large domain shift is studied. In this section, the number of classes varies among the tasks (i.e., starting with 102 classes then adding 67 classes then 200, etc.).

Is it possible to specify the number of classes for each task? Because in the code, I only found the possibility of specifying the --nc-first-task (number of classes of the first task) and --num-tasks (total number of tasks).
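To make the question concrete, here is a rough sketch of how classes could be split over tasks from --nc-first-task and --num-tasks, plus a hypothetical variant that takes explicit per-task sizes. Both functions are illustrations written for this thread; the names classes_per_task and explicit_split are assumptions and are not part of FACIL.

```python
def classes_per_task(num_classes, num_tasks, nc_first_task=None):
    """Split num_classes over num_tasks, optionally fixing the first task's size."""
    if nc_first_task is None:
        base, extra = divmod(num_classes, num_tasks)
        # spread any remainder over the first tasks
        return [base + (1 if i < extra else 0) for i in range(num_tasks)]
    remaining = num_classes - nc_first_task
    if num_tasks < 2 or remaining < num_tasks - 1:
        raise ValueError("not enough classes left for the remaining tasks")
    base, extra = divmod(remaining, num_tasks - 1)
    rest = [base + (1 if i < extra else 0) for i in range(num_tasks - 1)]
    return [nc_first_task] + rest


def explicit_split(num_classes, sizes):
    """Hypothetical extension: take the per-task sizes directly from the user."""
    if sum(sizes) != num_classes:
        raise ValueError("per-task sizes must add up to the number of classes")
    return list(sizes)


print(classes_per_task(100, 4))
print(classes_per_task(100, 11, nc_first_task=50))
print(explicit_split(369, [102, 67, 200]))
```

With only the two existing arguments, every task after the first gets (almost) the same number of classes, so a schedule like 102, 67, 200 would need something along the lines of the second function.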

Thank you in advance !!

How to understand the results?

I added 2 classes at each task for a total of 5 tasks, i.e. 5*2=10 classes in total. Below are the results.

I have some confusion in understanding the results:

What is the difference between task-agnostic and task-aware accuracies?
The last row of each of the 4 tables gives the average accuracy after full training. So is this the value that should be plotted for comparison with other methods (e.g. the image below, from LwM)? For example, with 10 tasks over 100 classes, should the last row of the TAw Acc table be plotted?

Imagenet100

Dear author,

Thanks for your great work. It really helps me a lot!

I wonder if you tried using imagenet_100 (imagenet_subset) as the dataset? I tried to use the imagenet_subset dataset but failed. It would really help me if you could provide the code for using the imagenet_subset dataset!

Upperbound results

Hello!

I am confused about how to read the Upperbound (Joint) results. I am running the code in the script with the approach set to "joint" to see the Upperbound (Joint) result on CIFAR-100.

There are four types of results for each seed: avg_accs_tag, avg_accs_taw, acc_tag, and acc_taw. It is unclear to me how these four results map to the Upperbound (Joint) result in your paper (Fig 8).

Thank you.

could I omit the validation dataset?

Many thanks for your helpful project.

I would like to know whether it is possible to omit the validation set, since it reduces the amount of training data, as shown below:

FACIL/src/datasets/memory_dataset.py

Lines 100 to 110 in f653d6c

    if validation > 0.0:
        for tt in data.keys():
            for cc in range(data[tt]['ncla']):
                cls_idx = list(np.where(np.asarray(data[tt]['trn']['y']) == cc)[0])
                rnd_img = random.sample(cls_idx, int(np.round(len(cls_idx) * validation)))
                rnd_img.sort(reverse=True)
                for ii in range(len(rnd_img)):
                    data[tt]['val']['x'].append(data[tt]['trn']['x'][rnd_img[ii]])
                    data[tt]['val']['y'].append(data[tt]['trn']['y'][rnd_img[ii]])
                    data[tt]['trn']['x'].pop(rnd_img[ii])
                    data[tt]['trn']['y'].pop(rnd_img[ii])

I tried changing the default value of the validation parameter shown below,

FACIL/src/datasets/data_loader.py

Line 14 in f653d6c

 def get_loaders(datasets, num_tasks, nc_first_task, batch_size, num_workers, pin_memory, validation=.1): 

but this raised a ZeroDivisionError.
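One plausible cause, sketched below, is that a validation fraction of 0 leaves the validation set empty, so anything that later averages over validation samples divides by zero. Where the error is actually raised in FACIL is not shown in this issue, so this is only an assumption; split_validation and mean_loss are hypothetical helpers that mimic the per-class split quoted above.

```python
import random


def split_validation(labels, validation, seed=0):
    """Return (train_idx, val_idx), moving a fraction of each class to validation."""
    rng = random.Random(seed)
    val_idx = []
    for cc in sorted(set(labels)):
        cls_idx = [i for i, y in enumerate(labels) if y == cc]
        n_val = int(round(len(cls_idx) * validation))
        val_idx.extend(rng.sample(cls_idx, n_val))
    val_set = set(val_idx)
    train_idx = [i for i in range(len(labels)) if i not in val_set]
    return train_idx, sorted(val_idx)


def mean_loss(losses):
    # Averaging over an empty validation set divides by zero.
    return sum(losses) / len(losses)


labels = [0] * 10 + [1] * 10

trn, val = split_validation(labels, validation=0.1)
print(len(trn), len(val))      # one sample per class moved to validation

trn0, val0 = split_validation(labels, validation=0.0)
print(len(trn0), len(val0))    # nothing moved: the validation set is empty
```

If this is indeed the cause, omitting validation would also require another criterion for model selection and learning-rate scheduling, since there would be no validation loss to monitor.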

Trying to reproduce results in the paper

Hello !

In the paper Class-incremental learning: survey and performance evaluation on image classification, there are several extensive experiments shown in the figures.

When I ran the experiments with the code, the results were a bit different from the paper's results.

Is it possible to share the config settings used for these experiments, such as random seeds, learning rates (starting, minimum and decay factors) or weight decays?

For the paper results, I'm mostly referring to Fig. 7 and Fig. 8. The experiments I ran couldn't reach ~80% accuracy on CIFAR-100 after the first task (10 classes), so I'm wondering if I'm missing something.

Thank you in advance !!

LwF CIFAR-100 (10/10) No exemplars accuracy

Hello, I noticed that the accuracy of the LwF method on CIFAR-100 (10/10) drops to 16% after 10 tasks when no exemplars are used, which is different from the 30.2% accuracy reported in your paper. I was wondering if you could help me understand this discrepancy? Thank you.

TaskID

Hi, this is not really an issue.
I'm currently developing a classification model which is trained on event-driven samples. The number of labels/classes grows as new events are produced and consumed by the model. I partially read :) your article but could not find an explanation of what a task ID is. (Is it a batch number for the training set?) Overall I would like to congratulate you on this work, since it seems to deal extensively with a real issue.

URL for `--approach` in your src/README.md

This is not an issue with your code,
but I just wanted to let you know that the link for more information about --approach in src/README.md seems to be wrong.

This leads me to [this page](https://github.com/mmasana/FACIL/blob/master/src/approaches/README.md) , but you must have intended to bring [this page](https://github.com/mmasana/FACIL/blob/master/src/approach/README.md)

(approaches -> approach)

By the way, thank you very much for your work!! :))

How to integrate ViT in Networks?

I want to use Huggingface's ViTForImageClassification. How do I integrate it in FACIL? I want to load the pretrained model
'google/vit-base-patch16-224'. I have read the instructions to add networks in readme of networks. However I am still not sure how to implement it. How do I set "self.head_var = 'fc'" when head is changed by "model.classifier = nn.Linear(768, num_classes)"? How exactly will a class even be created in this case?

The question of whether attention distillation loss in LwM can produce gradient.

Hello!
Thank you for your nice work.
I have a question:
LwM (learning without Memorizing) paper uses attention distillation loss. In your code (lwm.py):

# in class GradCAM
def __call__(self, input, class_indices=None, return_outputs=False):
    # pass input & backpropagate for selected class
    if input.dim() == 3:
        input = input.view([1] + list(input.size()))
    self.model.eval()
    model_output = self.model(input)
    logits = torch.cat(model_output, dim=1)
    if class_indices is None:
        class_indices = logits.argmax(dim=1)
    score = logits[:, class_indices].squeeze()
    self.model.zero_grad()
    score.mean().backward(retain_graph=self.retain_graph)
    model_output = [o.detach() for o in model_output]

    # create map based on gradients and activations
    with torch.no_grad():
        weights = F.adaptive_avg_pool2d(self.gradients, 1)
        att_map = (weights * self.activations).sum(dim=1, keepdim=True)
        att_map = F.relu(att_map)
        del self.activations
        del self.gradients
        return (att_map, model_output) if return_outputs else att_map

I feel that using such a code does not seem to produce gradients when backpropagating.
Looking forward to your reply.
Thank you.

A question about the approach 'BiC'

Hello! Thank you for your amazing work! It seems that in BiC, in line 102, when creating the dataset val_old, the code doesn't handle the case where "self.exemplars_dataset.max_num_exemplars_per_class != 0".

EEIL approach distillation loss

In the EEIL paper, distillation loss is applied to classification layers corresponding to previous classes, and the balanced finetuning stage adds temporary distillation loss for classification layers for new classes.
But in the balanced finetuning stage, I can't see where the temporary distillation loss for the classification layer of the new classes is. Also, looking at the loss function in the EEIL code, it appears that during the balanced fine-tuning stage the distillation is computed on the fc layers up to task t-1. On the other hand, before entering the balanced fine-tuning stage, the distillation is applied to the fc layers up to task t-2. If it is calculated like this, doesn't distillation in the unbalanced training stage apply to the classifier corresponding to task t-1?

I'm asking because I get confused even if I look at EEIL paper and code several times.
Thank you.

The extensibility of the FACIL framework.

Is it suitable for stricter settings, like the Few-shot Class Incremental Learning (FSCIL) scenario?

gridsearch-tasks == -1 in args does not perform grid search

Hello!

Thank you very much for this library - it's great!

I have a question:

The default value of --gridsearch-tasks in main_incremental.py is -1, and it is stated that setting it to -1 performs grid search for all tasks.

But this is not reflected in the code, and no grid search is done when it is run with this value.

The only lines where gridsearch_tasks is referenced are line 167, line 214 and line 241, and all of these take effect only when the value is positive.

Can you please check if the code is right?

Thank you.
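The behaviour described above can be illustrated with a small sketch. The guard `t < gridsearch_tasks` is an assumption about the form of the check (the referenced lines are not quoted in this issue), and both function names are hypothetical.

```python
def runs_gridsearch(t, gridsearch_tasks):
    """Hypothetical guard of the kind described: only positive values ever pass."""
    return t < gridsearch_tasks


def runs_gridsearch_with_sentinel(t, gridsearch_tasks):
    """Variant that treats -1 as 'all tasks'."""
    return gridsearch_tasks == -1 or t < gridsearch_tasks


tasks = range(4)
print([runs_gridsearch(t, -1) for t in tasks])                # never runs
print([runs_gridsearch(t, 2) for t in tasks])                 # runs on tasks 0 and 1
print([runs_gridsearch_with_sentinel(t, -1) for t in tasks])  # runs on every task
```

If the real check has this shape, then a default of -1 would silently disable grid search rather than enable it for all tasks, which is the mismatch reported here.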

accuracy

Is the accuracy at each stage computed only on the validation set of the current task's classes? Should the accuracies of all training stages be used as the final result?
Thank you very much for your reply.

There is a bug when running the code

Traceback (most recent call last):
  File "src/main_incremental.py", line 13, in <module>
    from datasets.data_loader import get_loaders
ModuleNotFoundError: No module named 'datasets.data_loader'

I get an error in the file exemplars_selection: it says there are no samples to choose from for class 240. How can I fix this?

How to adapt EWC and LWF for dynamic number of classes?

Hello,
does FACIL support multi-head classification for EWC and LWF?
To elaborate, I am trying to set up each dataset (with a different number of classes) as a task and perform continual learning. Do you have any suggestions on this?

Thanks

Getting "RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2"

While executing the python3 -u src/main_incremental.py script, the code gives the error quoted in the title.

LwM - no gradient in attention distillation loss

Hi, when experimenting with LwM in FACIL I noticed that the method behaves the same regardless of the choice of gamma parameter that controls attention distillation loss. Upon closer investigation, I noticed that during training attention maps returned by GradCAM have no grad, as you can check yourself with the debugger in this line:

FACIL/src/approach/lwm.py

Line 126 in e9d816c

attmap = gradcam(images) # this use eval() pass

When we later use the attention maps to compute the attention distillation loss, this loss has no gradient and its contribution to the gradient update is ignored. Therefore, LwM in FACIL basically does LwF with extra unused computation.

I think the issue is in class GradCAM in line 226, where the activations are detached, and later in line 255, which disables gradients when computing attention maps. I think this class should have the option to preserve gradients when computing attention maps, and this option should be enabled for the forward pass of the current net. Then the attention maps for the current net will have requires_grad=True and consequently the attention loss will contribute to weight updates.

Adding a new dataset

Hi !

As mentioned in README.md to add a custom dataset:

Create a new entry in dataset_config.py, add the folder with the data in path and any other transformations or class ordering needed.
The option from line 135 in data_loader.py should be enough.
In the same folder as the data add a train.txt and a test.txt files.

If I follow the steps above to add newdataset, I get:

Is this correct so far?

Should I put all the train and test images as .jpg files in the newdataset folder added to the data folder (shown in the FACIL capture), in addition to train.txt and test.txt?

Sorry for this long question but I'm having a problem when adding a custom dataset. Appreciate your help !! Thank you :)

Some questions about the reproduce results of LUCIR

Hello!
Thank you for your great work, from which I've learned a lot. Recently I tried to reproduce LUCIR but found that the corresponding results are quite different from those reported by Hou in his paper (Hou reported an avg acc of 60.18 and yours is 43.4 under the 50/11 configuration). Which should I follow? The related results are shown below.
your work----------------------------------

Hou's work (LUCIR)-------------------------

Some other works have also reported the avg acc of LUCIR under the same configuration.
PODNet. ECCV2020-------------------------------

Looking forward to your reply!
Best wishes!

Error coming while running on Imagenet Dataset

# dataset args
parser.add_argument('--datasets', default=['imagenet_256'], type=str, choices=list(dataset_config.keys()),
                    help='Dataset or datasets used (default=%(default)s)', nargs='+', metavar="DATASET")

FileNotFoundError: ../data/ILSVRC12_256/train.txt not found.

Error while trying to train on VGGFace2

Hello!
I am trying to use LWF on VGGFace2, but I am always getting this error.

line 110, in get_data
    assert data[tt]['ncla'] == cpertask[tt], "something went wrong splitting classes"

Any ideas of how to solve this?
Thanks in advance.

Is task-agnostic accuracy correct?

Hi,

I'm not sure if the task-agnostic accuracy is correctly calculated.

I see that in network.py line 44, you are creating a new Linear layer with nn.Linear(self.out_size, num_outputs) for each task.

This indicates that each task will have a separate Linear layer with each having equal capacity.

For task-agnostic accuracy, I see that the outputs of all heads are combined, then softmax is taken.

But in a lot of papers, they use only 1 head for all tasks, to indicate task-agnostic accuracy.

I am aware that when training, they only backpropagate the logits of the classes belonging to the current task, and what is done here is equivalent to that in a way.

May I please know why that is not followed here?

Thank you.
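On the equivalence mentioned in this issue: for the forward pass, concatenating the outputs of separate linear heads gives the same logits as a single linear layer whose weight rows and biases are the heads stacked together. The sketch below checks this in plain Python; the weights and the feature vector are made-up numbers, and whether training behaves the same still depends on which logits enter the loss.

```python
def linear(weights, bias, x):
    """Plain-Python y = W x + b for one sample."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


# Two hypothetical heads on a 3-dimensional feature vector, 2 classes each.
W1, b1 = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]], [0.1, 0.0]
W2, b2 = [[0.0, 3.0, 1.0], [-2.0, 1.0, 1.0]], [0.0, -0.2]
features = [1.0, 2.0, 3.0]

# Multi-head view: one output list per task, concatenated afterwards.
multi_head = linear(W1, b1, features) + linear(W2, b2, features)

# Single-head view: stack the weight rows and biases into one layer.
single_head = linear(W1 + W2, b1 + b2, features)

assert multi_head == single_head
print(multi_head)
```

So taking the argmax (or a softmax) over the concatenated heads is, at inference time, the same computation as a single head that grows by a few output rows per task.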

Limit the number of images per class

Hello,

Is there any argument in the code that lets us limit the number of images per class (training and test images)? For example, when adding a new dataset with .txt files, we currently need to modify these files each time we want to vary the number of images.

Thank you for your help !!
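Until such an argument exists, one workaround is to filter the entries of train.txt or test.txt before they are loaded. The helper below is a hypothetical sketch, not part of FACIL, and it assumes each line has the form "<image path> <integer label>", which this issue does not confirm.

```python
from collections import defaultdict


def limit_per_class(lines, max_per_class):
    """Keep at most max_per_class entries per label, preserving file order."""
    kept, counts = [], defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed format: "<image path> <integer label>"
        _, label = line.rsplit(maxsplit=1)
        if counts[label] < max_per_class:
            counts[label] += 1
            kept.append(line)
    return kept


entries = ["a.jpg 0", "b.jpg 0", "c.jpg 1", "d.jpg 0", ""]
print(limit_per_class(entries, 2))
```

Applying the same function with different limits avoids hand-editing the .txt files for every experiment.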

can not reproduce the similar result as paper

When I try to reproduce results like Figure 8 in the paper, I run the command below:
python3 -u src/main_incremental.py --approach ewc --num-tasks 10 --exemplar-selection herding --num-exemplars 2000
The results are different from EWC-e in Figure 8. Any tips?

Unable to match the accuracy results present in the ANCL paper using FACIL framework

Respected Authors,
I am unable to match the results given in the paper for a-lwf, a-ewc and a-lfl. The TAg and TAw accuracies I obtain are much lower than those reported in the paper. Please give me some suggestions.

LUCIR in memoryless mode

In the code for LUCIR, we can read the following comment:

LUCIR is expected to be used with exemplars. If needed to be used without exemplars, overwrite here the _get_optimizer function with the one in LwF and update the criterion

How should the criterion function be updated to allow for a memoryless scenario? It is my understanding that the criterion function should already work without any exemplars, in accordance with the original paper, but maybe I'm missing a detail.

The meaning of TAw & TAg

Thank you very much for providing such an amazing continual learning framework!

I am a little confused about the evaluation metric: TAw acc & TAg acc, could you please explain their meaning respectively? Thank you very much in advance!!
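As a rough illustration of the usual distinction: task-aware (TAw) evaluation assumes the task ID of a test sample is known and picks the best class within that task's head, while task-agnostic (TAg) evaluation picks the best class over all heads. The sketch below uses made-up logits and hypothetical function names; it illustrates the idea and is not FACIL's evaluation code.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def predict_task_aware(head_outputs, task_id):
    """Pick the best class inside the head of the known task."""
    offset = sum(len(h) for h in head_outputs[:task_id])
    return offset + argmax(head_outputs[task_id])


def predict_task_agnostic(head_outputs):
    """Pick the best class over all heads concatenated."""
    all_logits = [v for head in head_outputs for v in head]
    return argmax(all_logits)


# Hypothetical logits for one sample of task 0 whose true class is 1;
# there are two heads with 3 classes each.
outputs = [[0.2, 1.5, 0.1], [0.3, 2.4, 0.0]]
true_class, task_id = 1, 0

print(predict_task_aware(outputs, task_id) == true_class)   # correct with the task ID
print(predict_task_agnostic(outputs) == true_class)         # wrong without it
```

In this made-up case the newer head produces a larger logit, so the sample counts as correct for TAw but wrong for TAg, which is why TAg accuracy is typically the lower and harder of the two numbers.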

HOW TO RUN THE FRAMEWORK ON IMAGENET DATASET?

Respected Sir,
Could you please tell me how to run the framework on the ImageNet dataset, or on any dataset other than CIFAR-100 among those mentioned in the dataset folder?

Issue with ImageNet-Subset

I ran the finetuning script for imagenet-subset
python src/main_incremental.py --approach finetuning --nepochs 200 --batch-size 128 --num-workers 4 --datasets imagenet_subset --num-tasks 12 --nc-first-task 25 --lr 0.05 --weight-decay 1e-3 --clipping 1 --network resnet32 --momentum 0.9 --exp-name exp1 --seed 0

But I am getting this syntactical error

Has somebody got such an issue?

GridSearch to find the accurate lambda value

How can I perform a grid search to find the right lambda value for the iCaRL approach on a custom dataset? Currently the grid search is performed only on the learning rate and the lambda value is kept fixed. Please help.

Unable to save models (--save-models: save trained models)

Hello @mmasana ,

I appreciate your amazing and extremely helpful work.

Is it possible to save the trained models after incremental learning? The code is not working when putting save_models as True. Appreciate your help!

!python3 -u src/main_incremental.py --approach bic --num-exemplars 2000 --save-models True

============================================================================================================
Arguments =
	approach: bic
	batch_size: 64
	clipping: 10000
	datasets: ['cifar100']
	eval_on_train: False
	exp_name: None
	fix_bn: False
	gpu: 0
	gridsearch_tasks: -1
	keep_existing_head: False
	last_layer_analysis: False
	log: ['disk']
	lr: 0.1
	lr_factor: 3
	lr_min: 0.0001
	lr_patience: 5
	momentum: 0.0
	multi_softmax: False
	nc_first_task: None
	nepochs: 200
	network: resnet32
	no_cudnn_deterministic: False
	num_tasks: 4
	num_workers: 4
	pin_memory: False
	pretrained: False
	results_path: ../results
	save_models: True
	seed: 0
	stop_at_task: 0
	use_valid_only: False
	warmup_lr_factor: 1.0
	warmup_nepochs: 0
	weight_decay: 0.0
============================================================================================================
Approach arguments =
	T: 2
	lamb: -1
	num_bias_epochs: 200
	val_exemplar_percentage: 0.1
============================================================================================================
Exemplars dataset arguments =
	exemplar_selection: random
	num_exemplars: 2000
	num_exemplars_per_class: 0
============================================================================================================
Traceback (most recent call last):
  File "src/main_incremental.py", line 316, in <module>
    main()
  File "src/main_incremental.py", line 178, in main
    assert len(extra_args) == 0, "Unused args: {}".format(' '.join(extra_args))
AssertionError: Unused args: True
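The "Unused args: True" message is consistent with --save-models being a boolean switch that takes no value, so the trailing True is left over as an unrecognised argument. The sketch below reproduces that with argparse; that the flag is declared with action='store_true' is an assumption, since the parser definition is not shown in this issue.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumption: the flag is a boolean switch that takes no value.
parser.add_argument('--save-models', action='store_true')
parser.add_argument('--approach', default='finetuning', type=str)

# Passing a value after the switch leaves it behind as an unused argument.
args, extra = parser.parse_known_args(['--approach', 'bic', '--save-models', 'True'])
print(args.save_models, extra)

# Passing the switch alone enables it with nothing left over.
args_ok, extra_ok = parser.parse_known_args(['--approach', 'bic', '--save-models'])
print(args_ok.save_models, extra_ok)
```

If that is how the flag is defined, dropping the trailing True from the command should be enough, i.e. ending it with --save-models.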

Some question About Task-Aware and Task-Agnostic metrics

Hello, thank you for your contribution.
I am a beginner in incremental learning and I don't understand the meaning of Task-Aware and Task-Agnostic. For example, when I work on class-incremental learning, as in iCaRL, I don't know the task ID in the validation phase. What does the average accuracy over the steps mean in that case? Is it the Task-Agnostic accuracy?
Thanks.

UNABLE TO MATCH THE ACCURACY RESULTS FOR CIFAR100 DATASET FOR LWF APPROACH

============================================================================================================
Arguments =
approach: lwf
batch_size: 64
clipping: 10000
datasets: ['cifar100']
eval_on_train: False
exp_name: None
fix_bn: False
gpu: 0
gridsearch_tasks: -1
keep_existing_head: False
last_layer_analysis: False
log: ['disk']
lr: 0.1
lr_factor: 3
lr_min: 0.0001
lr_patience: 5
momentum: 0.0
multi_softmax: False
nc_first_task: None
nepochs: 200
network: resnet32
no_cudnn_deterministic: False
num_tasks: 4
num_workers: 4
pin_memory: False
pretrained: False
results_path: ../results
save_models: False
seed: 0
stop_at_task: 0
use_valid_only: False
warmup_lr_factor: 1.0
warmup_nepochs: 0
weight_decay: 0.0

Approach arguments =
T: 2
lamb: 1

Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 0
num_exemplars_per_class: 0

[(0, 25), (1, 25), (2, 25), (3, 25)]
THE RESULTING ACCURACY IS

Test on task 0 : loss=1.428 | TAw acc= 54.5%, forg= 9.8%| TAg acc= 6.6%, forg= 57.8% <<<
Test on task 1 : loss=3.908 | TAw acc= 59.6%, forg= 8.0%| TAg acc= 14.5%, forg= 47.4% <<<
Test on task 2 : loss=4.482 | TAw acc= 59.2%, forg= 5.6%| TAg acc= 21.5%, forg= 35.2% <<<
Test on task 3 : loss=4.317 | TAw acc= 70.9%, forg= 0.0%| TAg acc= 67.8%, forg= 0.0% <<<
Save at ../results/cifar100_lwf

TAw Acc
64.3% 0.0% 0.0% 0.0% Avg.: 64.3%
61.3% 67.7% 0.0% 0.0% Avg.: 64.5%
57.5% 63.8% 64.8% 0.0% Avg.: 62.1%
54.5% 59.6% 59.2% 70.9% Avg.: 61.1%

TAg Acc
64.3% 0.0% 0.0% 0.0% Avg.: 64.3%
39.6% 61.9% 0.0% 0.0% Avg.: 50.8%
21.9% 39.7% 56.7% 0.0% Avg.: 39.4%
6.6% 14.5% 21.5% 67.8% Avg.: 27.6%

TAw Forg
0.0% 0.0% 0.0% 0.0%
3.0% 0.0% 0.0% 0.0% Avg.: 3.0%
6.8% 3.8% 0.0% 0.0% Avg.: 5.3%
9.8% 8.0% 5.6% 0.0% Avg.: 7.8%

TAg Forg
0.0% 0.0% 0.0% 0.0%
24.7% 0.0% 0.0% 0.0% Avg.: 24.7%
42.4% 22.2% 0.0% 0.0% Avg.: 32.3%
57.8% 47.4% 35.2% 0.0% Avg.: 46.8%

PLEASE ALSO TELL ME WHAT TAW FORG AND TAG FORG ARE
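Judging from the tables in this issue, the Forg values look like forgetting: for each earlier task, the best accuracy reached on it so far minus its current accuracy, reported for the task-aware (TAw) and task-agnostic (TAg) settings. The sketch below recomputes this from the TAw Acc table; the definition is inferred from the printed numbers rather than taken from the FACIL source, so treat it as an assumption.

```python
def forgetting(acc):
    """acc[t][u]: accuracy on task u after training task t (lower-triangular)."""
    forg = [[0.0] * len(acc) for _ in acc]
    for t in range(1, len(acc)):
        for u in range(t):
            best_before = max(acc[s][u] for s in range(u, t))
            forg[t][u] = best_before - acc[t][u]
    return forg


# TAw accuracies copied from the table above.
taw_acc = [
    [64.3, 0.0, 0.0, 0.0],
    [61.3, 67.7, 0.0, 0.0],
    [57.5, 63.8, 64.8, 0.0],
    [54.5, 59.6, 59.2, 70.9],
]
taw_forg = forgetting(taw_acc)
for row in taw_forg:
    print(" ".join(f"{v:5.1f}%" for v in row))
```

The recomputed values agree with the TAw Forg table to within 0.1 percentage points; the small differences presumably come from the table being printed from unrounded accuracies. The Avg. column of the Forg tables is consistent with averaging over the earlier tasks only.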

Results of iCaRL

Hi!

Nice work on continual learning.

Recently I reproduced iCaRL, but I cannot achieve the original results. I tried many things: adjusting the learning rate, more training epochs, different weight decay...

And I found your results are quite similar to mine.

Could you please give some insight about the original results?

In addition, in BiC, their results are about 50% (the 10th task top-1 test acc). This is very surprising; am I missing something?

Hope to get your reply. This has bothered me for a long time.

From iCaRL https://arxiv.org/pdf/1611.07725.pdf

From yours https://arxiv.org/pdf/2010.15277.pdf

From BiC https://arxiv.org/pdf/1905.13260.pdf

AssertionError: Error: BiC needs exemplars.

I didn't change any code. So how to use approach BiC?

(FACIL) yupeng@ubuntu:~/FACIL$ python3 -u src/main_incremental.py --approach bic
============================================================================================================
Arguments =
        approach: bic
        batch_size: 64
        clipping: 10000
        datasets: ['cifar100']
        eval_on_train: False
        exp_name: None
        fix_bn: False
        gpu: 0
        gridsearch_tasks: -1
        keep_existing_head: False
        last_layer_analysis: False
        log: ['disk']
        lr: 0.1
        lr_factor: 3
        lr_min: 0.0001
        lr_patience: 5
        momentum: 0.0
        multi_softmax: False
        nc_first_task: None
        nepochs: 200
        network: resnet32
        no_cudnn_deterministic: False
        num_tasks: 4
        num_workers: 4
        pin_memory: False
        pretrained: False
        results_path: ../results
        save_models: False
        seed: 0
        stop_at_task: 0
        use_valid_only: False
        warmup_lr_factor: 1.0
        warmup_nepochs: 0
        weight_decay: 0.0
============================================================================================================
Approach arguments =
        T: 2
        lamb: -1
        num_bias_epochs: 200
        val_exemplar_percentage: 0.1
============================================================================================================
Exemplars dataset arguments =
        exemplar_selection: random
        num_exemplars: 0
        num_exemplars_per_class: 0
============================================================================================================
WARNING: ../results/cifar100_bic already exists!
Files already downloaded and verified
Files already downloaded and verified
Traceback (most recent call last):
  File "/home/yupeng/FACIL/src/main_incremental.py", line 316, in <module>
    main()
  File "/home/yupeng/FACIL/src/main_incremental.py", line 211, in main
    appr = Appr(net, device, **appr_kwargs)
  File "/home/yupeng/FACIL/src/approach/bic.py", line 43, in __init__
    assert (have_exemplars > 0), 'Error: BiC needs exemplars.'
AssertionError: Error: BiC needs exemplars.
(FACIL) yupeng@ubuntu:~/FACIL$
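The traceback points at an assertion that requires a non-zero exemplar budget, and the argument dump above shows num_exemplars: 0 and num_exemplars_per_class: 0. The sketch below mirrors that check; how have_exemplars is computed is an assumption (the sum of the two budgets), and check_bic_exemplars is a hypothetical name.

```python
def check_bic_exemplars(num_exemplars=0, num_exemplars_per_class=0):
    # Assumption: either budget counts as "having exemplars".
    have_exemplars = num_exemplars + num_exemplars_per_class
    assert have_exemplars > 0, 'Error: BiC needs exemplars.'
    return have_exemplars


print(check_bic_exemplars(num_exemplars=2000))   # passes

try:
    check_bic_exemplars()                        # the defaults shown above
except AssertionError as err:
    print(err)
```

Another issue on this page runs BiC with --approach bic --num-exemplars 2000, which gives a non-zero budget and would satisfy a check of this kind.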
