
dataset-distillation's Introduction

Dataset Distillation

Project Page | Paper

We provide a PyTorch implementation of Dataset Distillation. We distill the knowledge of tens of thousands of images into a few synthetic training images called distilled images.

(a): On MNIST, 10 distilled images can train a standard LeNet with a fixed initialization to 94% test accuracy (compared to 99% when fully trained). On CIFAR10, 100 distilled images can train a deep network with fixed initialization to 54% test accuracy (compared to 80% when fully trained).

(b): We can distill the domain difference between SVHN and MNIST into 100 distilled images. These images can be used to quickly fine-tune networks trained on SVHN to achieve high accuracy on MNIST.

(c): Our method can be used to create adversarial attack images. If well-optimized networks are retrained on these images for a single gradient step, they catastrophically misclassify a particular targeted class.

Dataset Distillation
Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros.
arXiv, 2018.
Facebook AI Research, MIT CSAIL, UC Berkeley

The code is written by Tongzhou Wang and Jun-Yan Zhu.

Prerequisites

System requirements

  • Python 3
  • CPU or NVIDIA GPU + CUDA

Dependencies

  • torch >= 1.0.0
  • torchvision >= 0.2.1
  • numpy
  • matplotlib
  • pyyaml
  • tqdm

You may install PyTorch (the torch package above) using any method suggested for your environment on the official PyTorch website.

Using this repo

This repo provides the implementation of three different distillation settings described in the paper. Below we describe the basic distillation setting. For other settings and usages, please check out the Advanced Usage.

Getting Started

We aim to encapsulate the knowledge of the entire training dataset, which typically contains thousands to millions of images, into a small number of synthetic training images. To achieve this, we optimize these distilled images such that newly initialized networks achieve high performance on the task after applying only a few gradient steps on them.

The distilled images can be optimized either for a fixed known initialization or for random unknown initializations drawn from a distribution.
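
For intuition, the sketch below shows the core bi-level loop in its simplest form: take one gradient step on the distilled images, evaluate the updated weights on real data, and backpropagate through that step into the distilled images and a learned step size. This is a minimal, self-contained sketch rather than the repository's implementation (which lives in train_distilled_image.py and supports multiple steps, epochs, and distributed training); the helper names (make_net, real_loader) and the use of torch.func.functional_call (PyTorch >= 2.0, newer than the torch >= 1.0 listed above) are assumptions.

    import torch
    import torch.nn.functional as F

    def distill(make_net, real_loader, n_classes=10, img_shape=(1, 28, 28),
                outer_iters=1000, outer_lr=0.01, device="cpu"):
        # Distilled images (one per class here) plus a learnable inner step size.
        distilled_x = torch.randn(n_classes, *img_shape, device=device, requires_grad=True)
        distilled_y = torch.arange(n_classes, device=device)
        distilled_lr = torch.tensor(0.02, device=device, requires_grad=True)
        opt = torch.optim.Adam([distilled_x, distilled_lr], lr=outer_lr)

        # Loop over at most outer_iters batches of real data.
        for _, (x_real, y_real) in zip(range(outer_iters), real_loader):
            net = make_net().to(device)  # fresh random initialization each iteration
            names = [n for n, _ in net.named_parameters()]
            params = list(net.parameters())

            # Inner step: one gradient step on the distilled data. create_graph=True
            # lets us backpropagate through this update into the distilled images.
            inner_loss = F.cross_entropy(net(distilled_x), distilled_y)
            grads = torch.autograd.grad(inner_loss, params, create_graph=True)
            updated = {n: p - distilled_lr * g for n, p, g in zip(names, params, grads)}

            # Outer loss: how well the updated weights do on a batch of real data.
            out = torch.func.functional_call(net, updated, (x_real.to(device),))
            outer_loss = F.cross_entropy(out, y_real.to(device))

            opt.zero_grad()
            outer_loss.backward()
            opt.step()

        return distilled_x.detach(), distilled_lr.detach()

A call such as distill(lambda: LeNet(), DataLoader(mnist_train, batch_size=256, shuffle=True)) would return 10 distilled images and the learned step size, where LeNet and mnist_train are assumed to be defined elsewhere.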

Random unknown initialization

The default options are designed for random initializations. In each training iteration, new initial weights are sampled and trained. Distilled images trained this way generally apply to unseen initial weights, provided that the weights come from the same initialization distribution.

  • MNIST:

    python main.py --mode distill_basic --dataset MNIST --arch LeNet
  • Cifar10:

    python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet \
        --distill_lr 0.001

    AlexCifarNet is an architecture adapted from the cuda-convnet project by Alex Krizhevsky.

Fixed known initialization

Alternatively, the distilled images can be optimized for a particular initialization, allowing for high performance with even fewer images (e.g., 10 images train a fixed-initialization LeNet to 94% test accuracy).

  • MNIST:

    python main.py --mode distill_basic --dataset MNIST --arch LeNet \
        --distill_steps 1 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train
  • Cifar10:

    python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet \
        --distill_lr 0.001 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train

Citation

If you find this useful for your research, please cite the following paper.

@article{wang2018dataset,
  title={Dataset Distillation},
  author={Wang, Tongzhou and Zhu, Jun-Yan and Torralba, Antonio and Efros, Alexei A},
  journal={arXiv preprint arXiv:1811.10959},
  year={2018}
}

Acknowledgements

This work was supported in part by NSF 1524817 on Advancing Visual Recognition with Feature Visualizations, NSF IIS-1633310, and Berkeley Deep Drive.

dataset-distillation's People

Contributors

carmocca, ssnl, swap-10


dataset-distillation's Issues

back-gradient optimization technique

Hello!

I have a question about the back-gradient optimization technique. Your paper mentions this article, but reading the source code train_distilled_image.py, I've noticed that you couldn't use SGD with momentum (because of the influence of previous learning rates), and so had to save the network parameters at each forward step. So what is the advantage of your scheme over usual backpropagation?

A very interesting idea

I have downloaded your paper and read it. I think it is a very interesting idea and could help our current research. However, one question: can dataset distillation achieve accuracy on other datasets as high as on MNIST?

Bug in kmeans baseline function?

After running the function to extract the k-means centroids as baselines, I saved some of these centroids as PNG images and noticed that some of them look like noise, which might suggest the presence of a bug; however, I haven't gone through the code myself.

Here are a few of the centroids generated for MNIST class 3:
[10 centroid images omitted]

Any idea of what might be the cause? If it is an actual bug I guess this would impact the values presented in the paper.

For reference, here is the code I used:

# Extract the k-means centroids used as a baseline and save them as PNGs.
data = dataset_distillation.utils.baselines.kmeans_train(state, p=2)
imgs, labels = data[-1]  # use the last step
for i, img in enumerate(imgs):
    torchvision.utils.save_image(img, f"{i}.png", nrow=1, padding=0)

Thank you.

Question about distilled images

  • Hi, I have run many demos. Now I have a small question: can I use the 10 distilled MNIST images to train a LeNet network directly?

How to adapt to our own databases/architecture?

Hey, I am very interested in this work. Could you make my job easier and indicate which lines I need to customize to distill my own dataset with Xavier initialization (random initialization according to your paper) and a particular architecture not on your list?

'TestRunner' referenced before assignment

To compute the baseline for non-optimized random real images, I'm using the following command:

python main.py --dataset MNIST --arch LeNet \
    --distilled_images_per_class_per_step 10 \
    --phase test \
    --test_nets_type unknown_init \
    --test_distilled_images random_train \
    --test_n_nets 200 \
    --test_n_runs 10

however, I get the following error:

Traceback (most recent call last):
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 359, in main
    test_runner = TestRunner(state)
UnboundLocalError: local variable 'TestRunner' referenced before assignment

After taking a look, this seems to happen because test_optimize_n_runs is not set even though it's supposed to be optional.

My guess is that https://github.com/SsnL/dataset-distillation/blob/master/main.py#L324-L355 should be outside its containing else block. Is that it, or am I missing something?

Thank you.

How to distribute different GPUs for some large models

Hi, I am trying to use VGG to distill the images, but the gradients are too large to run the program: it costs 38 GB of GPU memory to distill 10 images for Cifar10. Note that I use just one model for the distillation, so the method in advanced.md doesn't work in this situation. Could you provide some solutions for that? Many thanks!

Best,
Yugeng

The size of tensor a (64) must match the size of tensor b (32) at non-singleton dimension 0

I noticed this problem when I wanted to test the distilled images (see basics.py):

[screenshot of the error omitted]

The reason is the condition just above.
Indeed, for binary classification the output has 32 rows of 2 values (64 values in total), so (output > 0.5).to(target.dtype).view(-1) returns a 64-value tensor, while the target contains only 32 values, which creates this problem.

So, to solve this problem, just apply output.argmax(-1) even for binary classification.
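
A minimal sketch of that fix, where output and target are hypothetical tensors of shapes [batch, 2] and [batch]:

    # Take the argmax over the class dimension instead of thresholding, so the
    # prediction always has the same shape as the target, even with 2 classes.
    pred = output.argmax(-1).to(target.dtype).view(-1)   # shape: [batch]
    correct = pred.eq(target.view(-1)).sum().item()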

Questions about training after dataset distillation

Hi, thank you for sharing your interesting work.
I'm puzzled about how to train a network using the distilled data. Do I just set --mode to 'train' and keep the other options unchanged? For example:

    python main.py --mode train --dataset MNIST --arch LeNet \
        --distill_steps 1 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train

RuntimeError: CUDA out of memory.

Hi,
I have the following error when using the GPU on my own dataset (2 classes) and my own model:

"RuntimeError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 11.17 GiB total capacity; 10.47 GiB already allocated; 107.25 MiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

Following what you explained in #28, I tried different combinations of distill_steps, distill_epochs, distilled_images_per_class_per_step, and num_distill_classes. I realized that the GPU limit was reached when num_distill_classes * epochs * images_per_class_per_step * distill_steps > 4.

The problem is that with 2 epochs, 2 steps, 1 image and 1 distilled class, the results are not sufficient.

What can I do to improve them?

PS: I use a Tesla K80 (12 GB of dedicated memory) with 56 GB of RAM.

Thank you in advance

Compare to training on randomly selected samples

Did you try to compare the results on distilled data to those obtained by training on randomly selected samples of the dataset?
For example, if I randomly select 10 images from the MNIST dataset (1 per category) and train the network on them, how would the results look? I think it's a fundamental comparison to make.

Very interesting work by the way!

Wrong default arguments

Regarding some of the default arguments set in base_options.py:

https://github.com/SsnL/dataset-distillation/blob/f749262ca2dbd929a07b912cf271c76c0e6e378e/base_options.py#L247-L248

Shouldn't the default value be one of the available options? An error is thrown when using the value charge.

https://github.com/SsnL/dataset-distillation/blob/f749262ca2dbd929a07b912cf271c76c0e6e378e/base_options.py#L249-L250

As indicated by both the paper (section S-1) and the help string, the default value should be 0.02, which is not the case. Were the experiments performed with 0.02 or 0.001?

Broken: yaml.load(input) is removed in PyYAML >=6.0

Calling yaml.load(input) without a Loader was deprecated in PyYAML 5.1 and no longer works in PyYAML 6.0+ because of a CVE about arbitrary code execution.
The code in the base_options.py functions get_dummy_state() and set_state() uses yaml.load() and is therefore broken with PyYAML 6.0+.
Instead, yaml.full_load(input) or yaml.load(input, Loader=yaml.FullLoader) can be used.
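
A minimal sketch of that replacement (a hypothetical snippet, not the repository's exact code):

    import yaml

    with open("opt.yaml") as f:
        # Works on PyYAML 5.x and 6.x; equivalent to yaml.load(f, Loader=yaml.FullLoader).
        # For plain data, yaml.safe_load(f) is the more restrictive option.
        old_yaml = yaml.full_load(f)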

ImportError: cannot import name 'reduce_op'

When I run the code

    python main.py --mode distill_basic --dataset MNIST --arch LeNet

I get this:

Traceback (most recent call last):
  File "F:\dataset_dis\dataset-distillation-master\utils\distributed.py", line 5, in <module>
    from torch.distributed import ReduceOp
ImportError: cannot import name 'ReduceOp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    from base_options import options
  File "F:\dataset_dis\dataset-distillation-master\base_options.py", line 6, in <module>
    import utils
  File "F:\dataset_dis\dataset-distillation-master\utils\__init__.py", line 3, in <module>
    from . import distributed
  File "F:\dataset_dis\dataset-distillation-master\utils\distributed.py", line 7, in <module>
    from torch.distributed import reduce_op
ImportError: cannot import name 'reduce_op'

Is this because of the version of PyTorch, or something else?
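
For what it's worth, a hedged guess at a compatibility shim for utils/distributed.py (an assumption, not the repository's code): newer PyTorch exposes ReduceOp and removed the old reduce_op alias, and some Windows builds ship without torch.distributed at all, so guarding both imports makes the failure mode explicit.

    try:
        from torch.distributed import ReduceOp                    # current PyTorch
    except ImportError:
        try:
            from torch.distributed import reduce_op as ReduceOp   # very old PyTorch
        except ImportError:
            ReduceOp = None  # torch.distributed unavailable (e.g., some Windows builds)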

Distilling labels as well as images

It seems that you fix the label associated with each image when producing a weight update with it.

This, I presume, helps the images you learn be specialised to single classes. E.g. the MNIST digits you produce (for random initialisation) are clearly distinct digits.

However, the labels are also a vital part of the dataset.

Did you consider randomly initialising them and backpropagating the training signal to learn them too?
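
A hedged sketch of that idea (not something this repository implements): treat the labels as free logits over classes and optimize them jointly with the images, using a soft-label cross entropy for the inner update.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_images, n_classes = 10, 10
    distilled_x = torch.randn(n_images, 1, 28, 28, requires_grad=True)
    label_logits = torch.randn(n_images, n_classes, requires_grad=True)  # learnable labels

    def soft_label_loss(net):
        # Cross entropy against the learnable soft labels instead of fixed hard ones.
        soft_targets = label_logits.softmax(dim=-1)
        log_probs = F.log_softmax(net(distilled_x), dim=-1)
        return -(soft_targets * log_probs).sum(dim=-1).mean()

    # Example inner-loss evaluation with a throwaway linear model; in the full
    # method, both distilled_x and label_logits would be updated by the outer loop.
    net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, n_classes))
    soft_label_loss(net).backward()   # gradients flow into distilled_x and label_logits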

cifar10

TypeError: __init__() got an unexpected keyword argument 'padding_mode'.
How should I deal with it?

UniqueNamespace

Hello.
While I was reading your code, I came across an unresolved reference at line 40 of base_options.py:

        self.opt = UniqueNamespace()

So, how can I solve this? Thank you!

some question about dataset distillation

Hello, Dr Wang.
I have some questions about dataset distillation. For an image x^1 in the synthetic distilled training dataset, the loss function L is very small, or even equal to 0. To minimize the objective function, we could obtain a distilled dataset containing only x^1; as a result, the distilled dataset would have only one image.
In other words, how do you control the size of the distilled dataset?
Thank you very much.

retrain distilled images with minibatch-SGD

Hey, I am very interested in this work and have some questions to ask.
I used 20 images per class for MNIST dataset distillation with

    python main.py --mode distill_basic --dataset MNIST --arch LeNet \
        --distill_steps 1 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train

and achieved 96.54% test accuracy.
But when I use these distilled images as training data to retrain the same initial model as used in the distillation step with minibatch SGD, the test accuracy drops to 62% and overfitting occurs. My questions are:
(1) Is this just because of the different way of optimization?
(2) Why does optimizing the network your way avoid overfitting, even when using only 1 sample per class for MNIST dataset distillation?
(3) How can I use the distilled images to retrain a good model in a normal training regime such as minibatch SGD?

Can the training process on distilled data be conducted in main.py?

Thanks for your great work.

I have a question regarding the code in the repo dataset-distillation.
If I understood correctly, after distilling the images, we can train AlexCifarNet with the distilled Cifar10 data and then test the trained model on the original Cifar10 test data.

I have gone over the code; however, I didn't find the snippet that trains on the distilled images. If the training process is actually present in the code, could you please note down the command for training on the distilled Cifar10 data after distilling?

Looking forward to your reply and hope you have an amazing day!
Thank you and best regards,
Dai

Adapt distill on dataset SVHN

  • Hi, I have just trained some LeNet networks on MNIST and run adapt distillation on the USPS dataset. I am wondering how I can adapt distillation to the SVHN dataset.

How to distill dataset of size exceeding gpu memory size limit

Hi!
I wonder, is there any way to distill much more data than fits within the GPU memory limit? For a large-scale dataset or a typical 11 GB/12 GB GPU, that would be really useful. At first, I thought state.distributed in your code was intended for that, by splitting the distilled data across multiple GPUs, but then I found out I was wrong: it seems this code only distills data that fits in a single GPU's memory. So, any advice on this matter?

Thanks a lot!

Structure of results.pth

Hello!

Very interesting paper, thanks!
I have a question about results.pth. It's a file with distilled images (tensors), but what is its structure? For example, for MNIST its length is 30; is that because we have 3 steps? And the labels are just a tensor from 0 to 9, so the order matters, e.g. the first 3 tensors are class 0, right?
And in the testing phase, do we use the pretrained model that we got after distill_basic?

Thanks!
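
In case it helps, a small exploratory snippet for inspecting the file without assuming its exact layout (the per-step (images, labels, lr) structure hinted at in the question is only a guess):

    import torch

    steps = torch.load("results.pth", map_location="cpu")
    print(type(steps), len(steps))
    for i, entry in enumerate(steps):
        # If each entry bundles the distilled images for one step with their
        # labels and step size, this prints their shapes/values.
        if isinstance(entry, (tuple, list)):
            print(i, [getattr(x, "shape", x) for x in entry])
        else:
            print(i, getattr(entry, "shape", entry))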

Good luck with the next submission

I am very sad to learn that your paper was rejected by ICLR. I believe your research is very useful in many areas, especially security and privacy. Good luck with the next submission.

Getting what appears to be noise on Imagenette + XResnet

I am getting outputs that look completely random when I try to run distillation on a subset of ImageNet with an XResNet18 model.

I have only tried one set of command line args and was wondering whether you had any intuition for what I might obviously be doing wrong or had tried this before.

My command is:

python main.py --mode distill_basic --dataset Imagenette --arch DXResNet18 --batch_size 64 \
    --distill_steps 3 --train_nets_type known_init --n_nets 1 \
    --test_nets_type same_as_train

I made my own DXResnet18 class and Imagenette dataloader.

Thanks in advance!

Logger warnings and order of optimizer.step() and lr_scheduler.step()

Hello,

when I run the following command:

python main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train

I get the following warnings:

/home/claudio.greco/dataset-distillation/base_options.py:423: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please rea
d https://msg.pyyaml.org/load for full details.
  old_yaml = yaml.load(f)  # this is a dict
2019-09-12 16:18:31 [WARNING]  ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/opt.yaml already exists, moved t
o ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/old_opts/opt_2019_09_12__16_13_40.yaml
2019-09-12 16:18:31 [INFO ]  train dataset size: 60000
2019-09-12 16:18:31 [INFO ]  test dataset size:  10000
2019-09-12 16:18:31 [INFO ]  datasets built!
2019-09-12 16:18:31 [INFO ]  mode: distill_basic, phase: train  
2019-09-12 16:18:31 [INFO ]  Build 1 LeNet network(s) with [xavier(1.0)] init
^[[A2019-09-12 16:18:37 [INFO ]  Train 1 steps iterated for 3 epochs
/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `
optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in P
yTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
2019-09-12 16:18:37 [INFO ]  Results saved to ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/checkpoints/epoch
0000/results.pth
2019-09-12 16:18:37 [INFO ]
2019-09-12 16:18:37 [INFO ]  Begin of epoch 0 :
Begin of epoch 0 (1 same_as_train nets): 100%|####################################################################################################| 2/2 [00:00<00:00,  3.36it/s]
--- Logging error ---
Traceback (most recent call last):
  File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 15, in emit
    tqdm.tqdm.write(msg)
  File "/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/tqdm/_tqdm.py", line 555, in write
    fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)  
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
    evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
  File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
    logging.info(format_stepwise_results(state, steps, result_title, res))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1902, in info
    root.info(msg, *args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 20, in emit
    self.handleError(record)
Message: 'Begin of epoch 0  (1 same_as_train nets) test results:\n\t          STEP                   ACCURACY                   LOSS          \n\t            before steps           7.9102 \xb1  nan%            2.4235 \xb1  nan\n\t     step  3 (lr=0.0200)           6.7383 \xb1  nan%            2.3925 \xb1  nan'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)  
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
    evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
  File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
    logging.info(format_stepwise_results(state, steps, result_title, res))
Message: 'Begin of epoch 0  (1 same_as_train nets) test results:\n\t          STEP                   ACCURACY                   LOSS          \n\t            before steps           7.9102 \xb1  nan%            2.4235 \xb1  nan\n\t     step  3 (lr=0.0200)           6.7383 \xb1  nan%            2.3925 \xb1  nan'
Arguments: ()
2019-09-12 16:18:38 [INFO ]
2019-09-12 16:18:38 [INFO ]  Epoch:    0 [      0/  60000 ( 0%)] Loss: 2.3755 Data Time: 0.44s Train Time: 0.07s
2019-09-12 16:18:40 [INFO ]  Epoch:    1 [      0/  60000 ( 0%)] Loss: 2.2400 Data Time: 0.12s Train Time: 0.03s
2019-09-12 16:18:41 [INFO ]  Epoch:    2 [      0/  60000 ( 0%)] Loss: 1.7438 Data Time: 0.13s Train Time: 0.03s

The logging error makes it impossible for me to use this script, because I cannot see the accuracy, loss, etc. I also don't know whether that error is related to the warning about the order of calling `optimizer.step()` and `lr_scheduler.step()`. (Maybe NaN values are generated which cannot be properly encoded by the logger?)

Could you please help me to solve this issue? Could it be related to the versions of Python and PyTorch I am using? I am using Python 3.6.8 and PyTorch 1.2.0. What versions did you use exactly?

Thank you very much in advance.

Best,
Claudio

BatchNorm2d of ResNet model warning

Hello, when I use ResNet18 with the pretrained model from PyTorch, it shows the following warning:
[WARNING] BatchNorm2d contains buffer running_var. The buffer will be treated as a constant and assumed not to change during gradient steps. If this assumption is violated (e.g., BatchNorm*d's running_mean/var), the computation will be incorrect.
I am not sure if it will influence the results.

Many thanks

Getting distilled images and testing on them

Hello!

I'm wondering what the correct way is to get distilled images and test performance on them, as well as to check performance on the normal dataset after training on the distilled images. I'm confused since there are many parameters, and I've already read the advanced docs. So, to get distilled data, for example on Cifar10, I need to run

python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet  --distill_lr 0.001

Then the distilled images are in the file results.pth.

So, to train the network on the usual full dataset I need to set --mode train, and if I want to test network performance after training on distilled data I need to set --mode train --phase test?

Or, in other words, how do I get results like in your paper, which says you get 80% when fully trained versus 54% with distilled data on CIFAR10?

Looking forward to your response! Thanks!

Questions about implementation of optimizing distilled data

Hello!

I am reading the source code, specifically the class Trainer in train_distilled_image.py. I have two questions regarding your implementation of optimizing the distilled data:

  • When computing the gradient of the final L w.r.t. w in params, you claim in the paper and in a code comment that you use w (PRE-GD). But you are actually using the weights after GD in lines 156-160 of train_distilled_image.py, since params stores the original model weights and the model weights after GD at every step. In the loop (line 143), w corresponds to the model weights POST-GD, not PRE-GD. To verify my guess, I checked that len(params)=31 and len(gws)=30 while running
    python main.py --mode distill_basic --dataset MNIST --arch LeNet. That means that in the loop, the updated model weights from the final step are retrieved first.
    I guess simply discarding the model weights after GD in the final step will do the job.

  • In line 172, you use dw.add_(hvp_grad[0]) to update dw, which is weird because the gradient through different steps does not accumulate by adding. If dw denotes the gradient of the final L w.r.t. the updated w at each step, I wonder whether dw = hvp_grad[0] is the correct update, because in my understanding the un-updated model weights at this step are the updated model weights from the last step, which makes hvp_grad[0] itself the gradient of the final L w.r.t. the updated w at each step.

Anyway, many thanks for your interesting work!

Question about applicability

Hi, I came across your paper a few weeks ago.
I have a dataset that constantly grows, like every couple of weeks. The growth is both in terms of more examples of a set of known classes as well as new classes being added.
Is it possible to use this method to keep a reduced dataset of the old images?
For example: I have 10k images that I want to distill into 100. Then I get a new batch of 200 images.
How would I retrain a model "from scratch" using this combination of distilled and raw images?

I'm a grad student focussing on HPC, so I'm sorry if these questions are silly. But I would greatly appreciate any feedback, thank you!

Loss becomes NaN

When I run the unknown initialization experiments on MNIST:
python main.py --mode distill_basic --dataset MNIST --arch LeNet
I get the following after a few dozen epochs:
Traceback (most recent call last):
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)
  File "/home/isucholu/original/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/isucholu/original/dataset-distillation/train_distilled_image.py", line 283, in train
    raise RuntimeError('loss became NaN')
RuntimeError: loss became NaN
Was the gradient fairly stable when you ran this for the paper? Do I just need to make some more attempts?

Replicating results

I think your paper is fascinating so I have been experimenting with it for a few weeks now.
I was wondering what hyperparams you used to get 10 images that achieve almost 94% accuracy on MNIST after 1 GD step and 3 epochs. I can't seem to hit this when I run the suggested code for 200 epochs. At most I managed to get around 91%.

python3 main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train

Question about commands

[screenshot of the command output omitted]

  • Hi, I understand that 200 LeNet networks are trained for distilling the images, but what do those 20 test networks do?

Max number of classes

Hello!

Am I right that the maximum number of classes on which you tested the distillation algorithm is 200 (CUB200)? Which GPU did you use?
I'm trying to run the code for more than 10 classes, and my GPU runs out of memory even for 15 classes. But it's a Tesla V100, and I can't reproduce the results for CUB200. Or did you parallelize the algorithm somehow?

Must the distillation model and the test model be the same?

Sorry to bother you, I have some questions from reading the paper.
I ran main.py and it works well; my questions are:
The first question:
Must the distillation model and the test model be the same?
For example, if I got the distilled images using LeNet, can I train on these images with AlexNet?
The second question:
Must the number of distilled images be equal to the number of classes?
For example, if I want to distill the MNIST dataset, can I distill it into 20 images?
(I saw that the distillation labels equal num_class.)

cifar10: RuntimeError: Mismatch in shape

Thank you so much for sharing the code. I got this error when I ran:

    python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet \
        --distill_lr 0.001

torch version: 1.4.0
torchvision: 0.5.0

I am wondering whether you have any hints about what's wrong here? Thank you so much in advance.

2020-03-25 14:01:27 [ERROR]  Fatal error:                                                                                                                                                                                                                                
2020-03-25 14:01:27 [ERROR]  Traceback (most recent call last):
2020-03-25 14:01:27 [ERROR]    File "main.py", line 402, in <module>
2020-03-25 14:01:27 [ERROR]      main(options.get_state())
2020-03-25 14:01:27 [ERROR]    File "main.py", line 131, in main
2020-03-25 14:01:27 [ERROR]      steps = train_distilled_image.distill(state, state.models)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/train_distilled_image.py", line 290, in distill
2020-03-25 14:01:27 [ERROR]      return Trainer(state, models).train()
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/train_distilled_image.py", line 221, in train
2020-03-25 14:01:27 [ERROR]      evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 288, in evaluate_steps
2020-03-25 14:01:27 [ERROR]      res = _evaluate_steps(test_nets_desc, reset=(state.test_nets_type == 'unknown_init'))
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 276, in _evaluate_steps
2020-03-25 14:01:27 [ERROR]      params = train_steps_inplace(state, models, steps, params, callback=test_callback)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 75, in train_steps_inplace
2020-03-25 14:01:27 [ERROR]      loss.backward(lr)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
2020-03-25 14:01:27 [ERROR]      torch.autograd.backward(self, gradient, retain_graph, create_graph)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/autograd/__init__.py", line 93, in backward
2020-03-25 14:01:27 [ERROR]      grad_tensors = _make_grads(tensors, grad_tensors)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/autograd/__init__.py", line 25, in _make_grads
2020-03-25 14:01:27 [ERROR]      raise RuntimeError("Mismatch in shape: grad_output["
2020-03-25 14:01:27 [ERROR]  RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).
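
A hedged guess at a workaround (an assumption, not a confirmed fix): the gradient passed to backward() has shape [1] while the loss is a 0-dim scalar, which recent PyTorch versions no longer accept, so squeezing the step-size tensor at the failing call in basics.py should make the shapes match.

    # Hypothetical one-line change in basics.py, train_steps_inplace:
    loss.backward(lr.squeeze())   # instead of loss.backward(lr)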

Running in Jupyter

Hi SsnL

I'm trying to run this in Jupyter, but it returns this error:
Unexpected args: ['-f', '/home/user/.local/share/jupyter/runtime/kernel-d7f01d0e-54cb-461d-8d44-0d43cb505a17.json']

I searched for it; it seems Jupyter can't pass arguments correctly to base_options.py.
Do you have any idea how I can fix it?

Question about pulling data from a .gz file

Hello!

We are trying to use dataset distillation with a .gz file (similar to those that can be downloaded from the MNIST dataset). We've been looking through your dataset distillation code but we've been unable to find out where we could edit the code to pull data from our .gz file instead of from the MNIST dataset.

Could you please let me know in which file/line we could edit your dataset distillation code to pull data from the .gz file?

Thank you in advance!
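
As a starting point, here is a hedged sketch (not part of the repository) of a torch Dataset that reads MNIST-style idx .gz files; the 16-byte image header, 8-byte label header, and 28x28 size are the standard MNIST layout and would need adjusting for a different format. You would then register it wherever the repository constructs its datasets.

    import gzip
    import numpy as np
    import torch
    from torch.utils.data import Dataset

    class GzIdxDataset(Dataset):
        def __init__(self, images_gz, labels_gz):
            with gzip.open(images_gz, "rb") as f:
                # Skip the 16-byte idx header, then reshape into 28x28 images.
                self.images = np.frombuffer(f.read(), dtype=np.uint8, offset=16).reshape(-1, 28, 28)
            with gzip.open(labels_gz, "rb") as f:
                # Skip the 8-byte idx header.
                self.labels = np.frombuffer(f.read(), dtype=np.uint8, offset=8)

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, idx):
            # Copy because frombuffer returns a read-only array.
            img = torch.from_numpy(self.images[idx].copy()).float().unsqueeze(0) / 255.0
            return img, int(self.labels[idx])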

How do you keep buffers fixed during gradient steps

Hello!
I've noticed your warning

logging.warn(('{} contains buffer {}. The buffer will be treated as '
                        'a constant and assumed not to change during gradient '
                        'steps. If this assumption is violated (e.g., '
                        'BatchNorm*d\'s running_mean/var), the computation will '
                        'be incorrect.').format(m.__class__.__name__, n))

May I ask how you keep buffers fixed during gradient steps (e.g., running mean and running var in batch norm)? In this code there are only LeNet and AlexNet, so this isn't a problem, but I wonder whether you have run experiments on networks with batch norm.

Thanks a lot!
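
A hedged sketch of one common way to keep those buffers constant during the unrolled gradient steps (an assumption, not what this repository does): put the BatchNorm layers in eval mode so they use, and never update, their stored running statistics.

    import torch.nn as nn

    def freeze_batchnorm(model: nn.Module):
        for m in model.modules():
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                m.eval()          # use running_mean/var as constants in the forward pass
                m.momentum = 0.0  # and make sure they are never updated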
