Steven C. Y. Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen, "Compacting, Picking and Growing for Unforgetting Continual Learning," Thirty-third Conference on Neural Information Processing Systems, NeurIPS 2019

License: BSD 3-Clause "New" or "Revised" License

cpg's Introduction

Compacting, Picking and Growing (CPG)

This is the official PyTorch implementation of CPG, a lifelong learning algorithm for object classification. For details about CPG, please refer to the paper Compacting, Picking and Growing for Unforgetting Continual Learning (Slides, Poster).

The code is released for academic research use only. For commercial use, please contact Prof. Chu-Song Chen([email protected]).

Citing Paper

Please cite the following paper if this code helps your research:

@inproceedings{hung2019compacting,
title={Compacting, Picking and Growing for Unforgetting Continual Learning},
author={Hung, Ching-Yi and Tu, Cheng-Hao and Wu, Cheng-En and Chen, Chien-Hung and Chan, Yi-Ming and Chen, Chu-Song},
booktitle={Advances in Neural Information Processing Systems},
pages={13647--13657},
year={2019}
}

Dependencies

Python>=3.6
PyTorch>=1.0
tqdm

Experiment1 (Compact 20 tasks into VGG16 network)

Step 1. Download CIFAR100 and form the 20 tasks based on their super classes with the cifar2png tool. Or you can just download the converted version of our CIFAR100 from here. Unzip the compressed file and place cifar100_org/ in data/.

Step 2. Use the following command to train individual models for each of the 20 tasks so that we can obtain their accuracy goals.

$ bash experiment1/baseline_cifar100.sh

If you would like to use higher accuracy goals, execute experiment1/finetune_cifar100.sh instead. The script randomly selects a model trained on previous tasks and finetunes it to the current one. After this step, we obtain logs/baseline_cifar100_acc.txt that contains accuracy goals for 20 tasks.

Step 3. Run CPG to learn 20 tasks sequentially.

$ bash experiment1/CPG_cifar100_scratch_mul_1.5.sh

If you use different accuracy goals, please set the baseline_cifar100_acc variable in experiment1/CPG_cifar100_scratch_mul_1.5.sh to the path of the file containing your accuracy goals.

Step 4. Run inference on the 20 learned tasks.

$ bash experiment1/inference_CPG_cifar100.sh

CPG-VGG16 Checkpoints on CIFAR-100 Twenty Tasks.

Extract the downloaded .zip file and place all_max_mul_1.5/ in checkpoints/CPG/experiment1/. Modify the SETTING variable in experiment1/inference_CPG_cifar100.sh to all_max_mul_1.5 before inference.

Task 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Acc. 66.6 77.2 78.6 83.2 88.2 85.8 82.4 85.4 87.6 90.8 91.0 84.6 89.2 83.0 56.2 75.4 71.0 73.8 90.6 93.6

Experiment2 (Compact 6 fine-grained image tasks into ResNet50 network)

Step 1. We provide the datasets of 5 tasks, including cubs_cropped, stanford_cars_cropped, flowers, wikiart and sketches (without imagenet), and they can be downloaded here. After downloading, place the extracted directories in data/. If you would like to construct datasets yourself, please refer to piggyback.

Step 2. Similar to Experiment1, we need to construct the accuracy goals for the 6 tasks. With the following command, we finetune the model pretrained on ImageNet to the other 5 tasks and produce accuracy goals stored in logs/baseline_imagenet_acc_resnet50.txt.

$ bash experiment2/baseline_imagenet.sh

Or we can simply use the results of individual ResNet50 networks reported in the piggyback paper as follows.

{"imagenet": "0.7616", "cubs_cropped": "0.8283", "stanford_cars_cropped": "0.9183", "flowers": "0.9656", "wikiart": "0.7560", "sketches": "0.8078"}
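If you take this shortcut, the goals have to end up in the log file that CPG reads. The following is a minimal sketch of one way to do that, assuming the file holds a single JSON object exactly as printed above. The helper names write_goals and read_goals are hypothetical and not part of the repository.

```python
import json
from pathlib import Path

# Accuracy goals copied from the README; the values are stored as strings there.
GOALS = {
    "imagenet": "0.7616",
    "cubs_cropped": "0.8283",
    "stanford_cars_cropped": "0.9183",
    "flowers": "0.9656",
    "wikiart": "0.7560",
    "sketches": "0.8078",
}


def write_goals(path, goals=GOALS):
    """Write the goals as one JSON object (assumed file format)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(goals) + "\n")
    return path


def read_goals(path):
    """Read the goals back and convert the string values to floats."""
    raw = json.loads(Path(path).read_text())
    return {task: float(acc) for task, acc in raw.items()}
```

Calling write_goals("logs/baseline_imagenet_acc_resnet50.txt") from the repository root would create the file named in this step; read_goals is only a quick way to check what was written.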

Step 3. Run CPG and choose the desired pruning ratio for each of the 6 tasks.

$ bash experiment2/CPG_imagenet.sh 

We use the above command to run CPG for the task specified by the TARGET_TASK_ID variable in experiment2/CPG_imagenet.sh.

For example, starting from imagenet (specified by TARGET_TASK_ID=1), we run the above command and then select one of the gradually pruned models to proceed to the next task.

More specifically, we check record.txt in the checkpoint path, such as checkpoints/CPG/experiment2/resnet50/imagenet/gradual_prune/record.txt, which lists the pruning ratios 0.1, 0.2, 0.3, ..., 0.95 with their corresponding accuracies. We select an appropriate pruning ratio whose accuracy is higher than (or at least close to) the imagenet accuracy goal. Suppose that 0.4 is the best pruning ratio; we then copy the corresponding checkpoint to gradual_prune/ as below.

In checkpoints/CPG/experiment2/resnet50/imagenet/gradual_prune/

$ cp 0.4/checkpoint-46.pth.tar ./checkpoint-46.pth.tar
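The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the layout of record.txt is not documented here, so the sketch takes a plain dict that you fill in by hand, and the accuracies in it are made up. Preferring the largest qualifying ratio is also an assumption; the text above only asks for an appropriate ratio that meets (or comes close to) the goal.

```python
def pick_pruning_ratio(accuracies, goal, tolerance=0.0):
    """Return the largest pruning ratio whose accuracy is within
    `tolerance` of `goal`, or None if no ratio qualifies.

    accuracies: dict mapping pruning ratio -> accuracy (hypothetical
    structure, filled in by hand from record.txt).
    """
    qualifying = [ratio for ratio, acc in accuracies.items()
                  if acc >= goal - tolerance]
    return max(qualifying) if qualifying else None


# Made-up accuracies for illustration only, not real results.
example = {0.1: 0.765, 0.2: 0.764, 0.3: 0.763, 0.4: 0.762, 0.5: 0.755}

print(pick_pruning_ratio(example, goal=0.7616))  # prints 0.4
```

Passing a small tolerance (for example tolerance=0.01) corresponds to accepting a ratio whose accuracy is only close to the goal.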

Finally, we change TARGET_TASK_ID to 2 and execute experiment2/CPG_imagenet.sh again so that CPG proceeds to learn the next task.

We repeat Step 3 to sequentially learn the 6 tasks.

Step 4. Run inference on the 6 learned tasks.

$ bash experiment2/inference_CPG_imagenet.sh

CPG-ResNet50 Checkpoints on Fine-grained Dataset.

Extract the downloaded .zip file and place resnet50/ in checkpoints/CPG/experiment2/.

Task  ImageNet  CUBS   Stanford Cars  Flowers  Wikiart  Sketch
Acc.  75.81     83.57  92.81          96.55    76.98    80.33
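To see how these checkpoint accuracies relate to the accuracy goals listed in Step 2, the gap per task can be computed directly. The sketch below simply copies both sets of numbers from this README; mapping the table columns to the dataset keys (CUBS to cubs_cropped, Sketch to sketches, and so on) is an assumption made for illustration.

```python
# Accuracy goals from Step 2 (fractions).
GOALS = {
    "imagenet": 0.7616,
    "cubs_cropped": 0.8283,
    "stanford_cars_cropped": 0.9183,
    "flowers": 0.9656,
    "wikiart": 0.7560,
    "sketches": 0.8078,
}

# Checkpoint accuracies from the table above (percent).
CPG_ACC = {
    "imagenet": 75.81,
    "cubs_cropped": 83.57,
    "stanford_cars_cropped": 92.81,
    "flowers": 96.55,
    "wikiart": 76.98,
    "sketches": 80.33,
}


def gaps_in_points(results_pct, goals_frac):
    """Return result minus goal for each task, in percentage points."""
    return {task: round(results_pct[task] - 100.0 * goals_frac[task], 2)
            for task in results_pct}


for task, gap in gaps_in_points(CPG_ACC, GOALS).items():
    print(f"{task:24s} {gap:+.2f}")
```

A positive gap means the checkpoint exceeds the goal; a small negative gap is the "at least close to" case mentioned in Step 3.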

Experiment3 (Compact 4 facial-informatic tasks into CNN20 network)

Step 1. We provide the datasets of 3 tasks, including emotion, gender and age (without face_verification). For the age task, we adopt the 5-fold scenario and thus have age0, age1, ... , age4 which correspond to the five splits. All face images are aligned using MTCNN with output size of 112 x 112. The converted datasets can be downloaded here.

Step 2. Similarly, we need accuracy goals for the 4 tasks for CPG. We train CNN20 on VGGFace2 for the face verification task and finetune it to the emotion, gender and age tasks. This link provides our pretrained CNN20 weights for face verification (named face_weight.pth), and the following command finetunes the model to the other 3 tasks. To evaluate the face verification task, we also need lfw_pairs.txt, which can be downloaded here. Download face_weight.pth and lfw_pairs.txt using the links and place them in face_data/.

$ bash experiment3/baseline_face.sh 

The finetuning results are used as accuracy goals and stored in logs/baseline_face_acc.txt. You can also simply use the following results, which correspond to the finetuning results reported in our paper.

{"face_verification": "0.9942", "gender": "0.9080", "emotion": "0.6254", "chalearn_gender": "0.9128", "age0": "0.6531", "age1": "0.5381", "age2": "0.5847", "age3": "0.5151", "age4": "0.5727"}
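As a quick sanity check on how the five age folds are summarized, the mean of the age0 to age4 goals above can be computed as follows. The values are copied from the dictionary above; the variable names are only for illustration.

```python
from statistics import mean

# Age-fold accuracy goals copied from the dictionary above.
AGE_GOALS = {
    "age0": 0.6531,
    "age1": 0.5381,
    "age2": 0.5847,
    "age3": 0.5151,
    "age4": 0.5727,
}

mean_age = mean(AGE_GOALS.values())
print(f"mean 5-fold age accuracy: {100 * mean_age:.2f}%")  # prints 57.27%
```

The result, 57.27, agrees with the Age entry of the Finetune row in the facial-informatic benchmarking table.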

Step 3. Similar to Experiment2, we add tasks sequentially by iteratively running the following command and copying the pruned models with appropriate pruning ratios.

$ bash experiment3/FvGeEm_CPG_face.sh 

Note that this script only learns the first 3 tasks (face verification, gender and emotion), selected by modifying the TARGET_TASK_ID variable in it. Because we have 5 folds for the age task, use experiment3/FvGeEmAg0_CPG_face.sh for age0, experiment3/FvGeEmAg1_CPG_face.sh for age1, and so on.

We repeat Step 3 until all 4 tasks, including the 5 folds of the age task, are sequentially learned.

Step 4. Run inference on the 4 learned tasks.

$ bash experiment3/inference_FvGeEmAg.sh ${GPU_ID} ${TARGET_SPARSITY} ${AGEFOLD} ${LOG_PATH}

The inference script takes 4 arguments listed as follows:

  • GPU_ID: Index of GPU used to run the inference
  • TARGET_SPARSITY: The target pruning ratio of the age task model to run inference on (like 0.1, 0.2, ..., 0.9, 0.95)
  • AGEFOLD: The target fold of the 5 age folds to run inference on (like age0, age1, ..., age4)
  • LOG_PATH: The target path to output inference log

For an example of using this script, please see experiment3/inference_checkpoints_FvGeEmAg.sh.
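When all five age folds need to be evaluated, the command lines can be generated rather than typed by hand. The sketch below only builds and prints the commands in the argument order listed above; it does not run them. The helper build_inference_command, the GPU index 0, the sparsity 0.2 and the log file names are all placeholders chosen for illustration, and whether LOG_PATH should be a file or a directory is not stated here.

```python
import shlex

SCRIPT = "experiment3/inference_FvGeEmAg.sh"
AGE_FOLDS = ["age0", "age1", "age2", "age3", "age4"]


def build_inference_command(gpu_id, target_sparsity, age_fold, log_path):
    """Return the argv list for one run of the inference script."""
    if age_fold not in AGE_FOLDS:
        raise ValueError(f"unknown age fold: {age_fold!r}")
    return ["bash", SCRIPT, str(gpu_id), str(target_sparsity),
            age_fold, log_path]


# Print one command per fold; all argument values here are placeholders.
for fold in AGE_FOLDS:
    cmd = build_inference_command(0, 0.2, fold, f"logs/inference_{fold}.log")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each printed line can be pasted into a shell, or the argv list can be passed to subprocess.run if you prefer to launch the runs from Python.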

CPG-CNN20 Checkpoints on Facial-informatic Dataset.

Extract the downloaded .zip file, place spherenet20/ in checkpoints/CPG/experiment3/ and use the following command for inference with the checkpoints.

$ bash experiment3/inference_checkpoints_FvGeEmAg.sh

Task  Face verification  Gender  Expression  Mean of 5-fold Age
Acc.  99.300 +- 0.348    89.66   63.57       57.66

Benchmarking

CIFAR100 20 Tasks (datasets as in Experiment1 above) - VGG16

Methods 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Avg.
PackNet 66.4 80.0 76.2 78.4 80.0 79.8 67.8 61.4 68.8 77.2 79.0 59.4 66.4 57.2 36.0 54.2 51.6 58.8 67.8 83.2 67.5
*PAE 67.2 77.0 78.6 76.0 84.4 81.2 77.6 80.0 80.4 87.8 85.4 77.8 79.4 79.6 51.2 68.4 68.6 68.6 83.2 88.8 77.1
CPG 65.2 76.6 79.8 81.4 86.6 84.8 83.4 85.0 84.2 89.2 90.8 82.4 85.6 85.2 53.2 84.4 70.0 73.4 88.8 94.8 80.9

*PAE is our previous work.

Fine-grained 6 Tasks (datasets as in Experiment2 above) - ResNet50

Methods             ImageNet  CUBS   Stanford Cars  Flowers  Wikiart  Sketch  Model Size (MB)
Train from Scratch  76.16     40.96  61.56          59.73    56.50    75.40   554
Finetune            -         82.83  91.83          96.56    75.60    80.78   551
ProgressiveNet      76.16     78.94  89.21          93.41    74.94    76.35   563
PackNet             75.71     80.41  86.11          93.04    69.40    76.17   115
Piggyback           76.16     84.59  89.62          94.77    71.33    79.91   121
CPG                 75.81     83.59  92.80          96.62    77.15    80.33   121

Facial-informatic 4 Tasks (datasets as in Experiment3 above) - CNN20

Methods             Face             Gender  Expression  Age
Train from Scratch  99.417 +- 0.367  83.70   57.64       46.14
Finetune            -                90.80   62.54       57.27
CPG                 99.300 +- 0.348  89.66   63.57       57.66

Contact

Please feel free to send suggestions or comments to Steven C. Y. Hung ([email protected]), Cheng-Hao Tu ([email protected]), Cheng-En Wu ([email protected]), Chien-Hung Chen ([email protected]), Yi-Ming Chan ([email protected]), or Chu-Song Chen ([email protected]).

cpg's Issues

The setup of Piggyback in Table 6

Dear authors,

Thanks for sharing the code!

I didn't quite get the setup of Piggyback in Table 6 in your paper.
I guess PackNet is the application of Piggyback in continual learning. Then what is the difference between Piggyback and PackNet in Table 6, when being sequentially applied to the 6 image classification tasks?

Thanks!

Why does training slow down after 314 batches, and why is accuracy=nan in Step 3 of Experiment3?

Skipped 6000 image pairs
2022-01-08 18:00:28,652 - root - INFO -

2022-01-08 18:00:28,652 - root - INFO - Before pruning:
2022-01-08 18:00:28,652 - root - INFO - Sparsity range: 0.0 -> 0.1
2022-01-08 18:00:28,652 - root - INFO -

Train Ep. #1: 4%|##3 | 465/12274 [05:52<1:17:12, 2.55it/s, loss=9.35, accuracy=nan, lr=0.0005, sparsity=1, network_width_mpl=1]

Is it normal to have accuracy=nan at the beginning of training?

How do you get the same performance for a previous task?

Hi, thanks for the great work. I am trying to use your code for my own project; however, I find it hard to obtain the same accuracy on a previously trained task. I think the main reason is that the masked parameters do not include the batch norm layers. Do you have any idea about this behaviour?

Facial information class labels

Can I ask about the label name for each class in the facial-informatic tasks?
They seem to be:

Gender: are these correct?

  • 0: Male
  • 1: Female
  • 2: Unknown

Emotion: are these correct?

  • 0: Neutral
  • 1: Happy
  • 2: Sad
  • 3: Surprise
  • 4: Fear
  • 5: Disgust
  • 6: Angry

Age

  • What are the age ranges for each of 8 classes?

Thanks

pruning and piggyback mask training order

Hi, I'm reading the source code and your paper.
In your paper, it seems that for a new task, you first train (a piggyback mask + the space released by gradual pruning for previous tasks), and then you apply gradual pruning for the new task.

However, in your code (https://github.com/ivclab/CPG/blob/cpg_face/CPG_cifar100_main_normal.py), it seems that you first do fine-tuning, then gradual pruning, and finally piggyback retraining.

Why is the order different?

Why weights are not masked-out

Hi, Thanks for the great work. I have a question with regard to your gradual prune implementation. I notice that in your implementation, you commented out the code for masking out the model weights (line 88 in utils/prune.py). Is there a particular reason for doing this?

About number of parameters

Thanks for your work and code.
I'm a bit confused after reading the code. I'd like to know whether the mask and the piggymask are different things: the former is used for pruning and the latter for picking weights for fine-tuning, and they both have to be saved for inference, right?

get predictions for all tasks in single forward pass

Hi!

Is it possible to use the model trained on the 4 facial-informatic tasks in such a way that it produces predictions for all tasks in a single forward pass?

Something like in the following code:

class SphereNetMT(SphereNet):
    """ Multi-task version of SphereNet, returns predictions for all classifiers in single forward pass
    """
    
    def forward(self, x, tasks=('gender', 'emotion', 'age0')):
        x = self.relu1_1(self.conv1_1(x))
        x = x + self.relu1_3(self.conv1_3(self.relu1_2(self.conv1_2(x))))
        x = self.relu2_1(self.conv2_1(x))
        x = x + self.relu2_3(self.conv2_3(self.relu2_2(self.conv2_2(x))))
        x = x + self.relu2_5(self.conv2_5(self.relu2_4(self.conv2_4(x))))
        x = self.relu3_1(self.conv3_1(x))
        x = x + self.relu3_3(self.conv3_3(self.relu3_2(self.conv3_2(x))))
        x = x + self.relu3_5(self.conv3_5(self.relu3_4(self.conv3_4(x))))
        x = x + self.relu3_7(self.conv3_7(self.relu3_6(self.conv3_6(x))))
        x = x + self.relu3_9(self.conv3_9(self.relu3_8(self.conv3_8(x))))
        x = self.relu4_1(self.conv4_1(x))
        x = x + self.relu4_3(self.conv4_3(self.relu4_2(self.conv4_2(x))))
        x = self.flatten(x)
        # Apply the classifier head of each requested task to the shared features.
        predictions = {}
        for task in tasks:
            predictions[task] = self.classifiers[self.datasets.index(task)](x)
        return predictions

Some questions about CPG_cifar100_scratch_mul_1.5.sh

Thanks for this great work.
I have two questions about CPG_cifar100_scratch_mul_1.5.sh.
I ran this script to complete Step 3 of Experiment1 for my adversarial training project.
After running the first task, pruning it, and choosing a ratio, I got this error:
[image]

I can't figure out why it can't pass the network_width_multiplier to the next task on the VGG network.

The second question is about max_allowed_network_width_multiplier

max_allowed_network_width_multiplier=1.5

and network_width_multiplier

network_width_multiplier=1.0

Are they related, or is there anything special about them?

Hope someone can help me solve this problem, thanks.
