
dataset-distillation's Introduction

Dataset Distillation

Project Page | Paper

We provide a PyTorch implementation of Dataset Distillation. We distill the knowledge of tens of thousands of images into a few synthetic training images called distilled images.

(a): On MNIST, 10 distilled images can train a standard LeNet with a fixed initialization to 94% test accuracy (compared to 99% when fully trained). On CIFAR10, 100 distilled images can train a deep network with fixed initialization to 54% test accuracy (compared to 80% when fully trained).

(b): We can distill the domain difference between SVHN and MNIST into 100 distilled images. These images can be used to quickly fine-tune networks trained on SVHN to achieve high accuracy on MNIST.

(c): Our method can be used to create adversarial attack images. If well-optimized networks are retrained on these images for a single gradient step, they catastrophically misclassify a particular targeted class.

Dataset Distillation
Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros.
arXiv, 2018.
Facebook AI Research, MIT CSAIL, UC Berkeley

The code is written by Tongzhou Wang and Jun-Yan Zhu.

Prerequisites

System requirements

  • Python 3
  • CPU or NVIDIA GPU + CUDA

Dependencies

  • torch >= 1.0.0
  • torchvision >= 0.2.1
  • numpy
  • matplotlib
  • pyyaml
  • tqdm

You may install PyTorch (the torch package above) using any method suggested for your environment on the official PyTorch website.

Using this repo

This repo provides the implementation of three different distillation settings described in the paper. Below we describe the basic distillation setting. For other settings and usages, please check out the Advanced Usage.

Getting Started

We aim to encapsulate the knowledge of the entire training dataset, which typically contains thousands to millions of images, into a small number of synthetic training images. To achieve this, we optimize these distilled images such that newly initialized networks achieve high performance on the task after applying only a few gradient steps on them.

The distilled images can be optimized either for a fixed known initialization or for random unknown initializations drawn from a distribution.
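
For intuition, the sketch below shows the core bi-level loop in its simplest form: take one gradient step on the distilled images, evaluate the updated weights on real data, and backpropagate through that step into the distilled images and a learned step size. This is a minimal, self-contained sketch rather than the repository's implementation (which lives in train_distilled_image.py and supports multiple steps, epochs, and distributed training); the helper names (make_net, real_loader) and the use of torch.func.functional_call (PyTorch >= 2.0, newer than the torch >= 1.0 listed above) are assumptions.

    import torch
    import torch.nn.functional as F

    def distill(make_net, real_loader, n_classes=10, img_shape=(1, 28, 28),
                outer_iters=1000, outer_lr=0.01, device="cpu"):
        # Distilled images (one per class here) plus a learnable inner step size.
        distilled_x = torch.randn(n_classes, *img_shape, device=device, requires_grad=True)
        distilled_y = torch.arange(n_classes, device=device)
        distilled_lr = torch.tensor(0.02, device=device, requires_grad=True)
        opt = torch.optim.Adam([distilled_x, distilled_lr], lr=outer_lr)

        # Loop over at most outer_iters batches of real data.
        for _, (x_real, y_real) in zip(range(outer_iters), real_loader):
            net = make_net().to(device)  # fresh random initialization each iteration
            names = [n for n, _ in net.named_parameters()]
            params = list(net.parameters())

            # Inner step: one gradient step on the distilled data. create_graph=True
            # lets us backpropagate through this update into the distilled images.
            inner_loss = F.cross_entropy(net(distilled_x), distilled_y)
            grads = torch.autograd.grad(inner_loss, params, create_graph=True)
            updated = {n: p - distilled_lr * g for n, p, g in zip(names, params, grads)}

            # Outer loss: how well the updated weights do on a batch of real data.
            out = torch.func.functional_call(net, updated, (x_real.to(device),))
            outer_loss = F.cross_entropy(out, y_real.to(device))

            opt.zero_grad()
            outer_loss.backward()
            opt.step()

        return distilled_x.detach(), distilled_lr.detach()

A call such as distill(lambda: LeNet(), DataLoader(mnist_train, batch_size=256, shuffle=True)) would return 10 distilled images and the learned step size, where LeNet and mnist_train are assumed to be defined elsewhere.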

Random unknown initialization

The default options are designed for random initializations. In each training iteration, new initial weights are sampled and trained. Distilled images trained this way generally apply to unseen initial weights, provided that the weights come from the same initialization distribution.

  • MNIST:

    python main.py --mode distill_basic --dataset MNIST --arch LeNet
  • Cifar10:

    python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet \
        --distill_lr 0.001

    AlexCifarNet is an architecture adapted from the cuda-convnet project by Alex Krizhevsky.

Fixed known initialization

Alternatively, the distilled images can be optimized for a particular initialization, allowing for high performance with even fewer images (e.g., 10 images train a fixed-initialization LeNet to 94% test accuracy).

  • MNIST:

    python main.py --mode distill_basic --dataset MNIST --arch LeNet \
        --distill_steps 1 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train
  • Cifar10:

    python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet \
        --distill_lr 0.001 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train

Citation

If you find this useful for your research, please cite the following paper.

@article{wang2018dataset,
  title={Dataset Distillation},
  author={Wang, Tongzhou and Zhu, Jun-Yan and Torralba, Antonio and Efros, Alexei A},
  journal={arXiv preprint arXiv:1811.10959},
  year={2018}
}

Acknowledgements

This work was supported in part by NSF 1524817 on Advancing Visual Recognition with Feature Visualizations, NSF IIS-1633310, and Berkeley Deep Drive.

dataset-distillation's People

Contributors

carmocca, ssnl, swap-10


dataset-distillation's Issues

back-gradient optimization technique

Hello!

I have a question about the back-gradient optimization technique. Your paper mentions this article, but reading the source code train_distilled_image.py, I've noticed that you couldn't use SGD with momentum (because of the influence of previous learning rates), and so had to save the network parameters at each forward step. So what is the advantage of your scheme over usual backpropagation?

A very interesting idea

I have downloaded your paper and read it. I think it is a very interesting idea and could help our current research. However, one question: can dataset distillation achieve accuracy on other datasets as high as on MNIST?

Bug in kmeans baseline function?

After running the function to extract the k-means centroids as baselines, I saved some of these centroids as PNG images and noticed that some of them look like noise, which might suggest the presence of a bug; however, I haven't gone through the code myself.

Here are a few of the centroids generated for MNIST class 3:
[10 centroid images omitted]

Any idea of what might be the cause? If it is an actual bug I guess this would impact the values presented in the paper.

For reference, here is the code I used:

# Extract the k-means centroids used as a baseline and save them as PNGs.
data = dataset_distillation.utils.baselines.kmeans_train(state, p=2)
imgs, labels = data[-1]  # use the last step
for i, img in enumerate(imgs):
    torchvision.utils.save_image(img, f"{i}.png", nrow=1, padding=0)

Thank you.

Question about distilled images

  • Hi, I have run many demos. Now I have a small question: can I use the 10 distilled MNIST images to train a LeNet network directly?

How to adapt to our own databases/architecture?

Hey, I am very interested in this work. Could you make my job easier and indicate which lines I need to customize to distill my own dataset with Xavier initialization (random initialization according to your paper) and a particular architecture not on your list?

'TestRunner' referenced before assignment

To compute the baseline for non-optimized random real images, I'm using the following command:

python main.py --dataset MNIST --arch LeNet \
    --distilled_images_per_class_per_step 10 \
    --phase test \
    --test_nets_type unknown_init \
    --test_distilled_images random_train \
    --test_n_nets 200 \
    --test_n_runs 10

however, I get the following error:

Traceback (most recent call last):
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 359, in main
    test_runner = TestRunner(state)
UnboundLocalError: local variable 'TestRunner' referenced before assignment

After taking a look, this seems to happen because test_optimize_n_runs is not set even though it's supposed to be optional.

My guess is that https://github.com/SsnL/dataset-distillation/blob/master/main.py#L324-L355 should be outside its containing else block. Is that it, or am I missing something?

Thank you.

How to distribute different GPUs for some large models

Hi, I am trying to use VGG to distill the images, but the gradients are too large to run the program: it costs 38 GB of GPU memory to distill 10 images for Cifar10. Note that I use just one model for the distillation, so the method in advanced.md doesn't work in this situation. Could you provide some solutions for that? Many thanks!

Best,
Yugeng

The size of tensor a (64) must match the size of tensor b (32) at non-singleton dimension 0

I noticed this problem when I wanted to test the distilled images (see basics.py):

[screenshot of the error omitted]

The reason is the condition just above.
Indeed, for binary classification the output has 32 rows of 2 values (64 values in total), so (output > 0.5).to(target.dtype).view(-1) returns a 64-value tensor, while the target contains only 32 values, which creates this problem.

So, to solve this problem, just apply output.argmax(-1) even for binary classification.
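
A minimal sketch of that fix, where output and target are hypothetical tensors of shapes [batch, 2] and [batch]:

    # Take the argmax over the class dimension instead of thresholding, so the
    # prediction always has the same shape as the target, even with 2 classes.
    pred = output.argmax(-1).to(target.dtype).view(-1)   # shape: [batch]
    correct = pred.eq(target.view(-1)).sum().item()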

Questions about training after dataset distillation

Hi, thank you for sharing your interesting work.
I'm puzzled about how to train a network using the distilled data. Do I just set --mode to 'train' and keep the other options unchanged? For example:

    python main.py --mode train --dataset MNIST --arch LeNet \
        --distill_steps 1 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train

RuntimeError: CUDA out of memory.

Hi,
I have the following error when using the GPU on my own dataset (2 classes) and my own model:

"RuntimeError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 11.17 GiB total capacity; 10.47 GiB already allocated; 107.25 MiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

Following what you explained in #28, I tried different combinations of distill_steps, distill_epochs, distilled_images_per_class_per_step, and num_distill_classes. I realized that the GPU limit was reached when num_distill_classes * epochs * images_per_class_per_step * distill_steps > 4.

The problem is that with 2 epochs, 2 steps, 1 image and 1 distilled class, the results are not sufficient.

What can I do to improve them?

PS: I use a Tesla K80 (12 GB of dedicated memory) with 56 GB of RAM.

Thank you in advance

Compare to training on randomly selected samples

Did you try to compare the results on distilled data to those obtained by training on randomly selected samples of the dataset?
For example, if I randomly select 10 images from the MNIST dataset (1 per category) and train the network on them, how would the results look? I think it's a fundamental comparison to make.

Very interesting work by the way!

Wrong default arguments

Regarding some of the default arguments set in base_options.py:

https://github.com/SsnL/dataset-distillation/blob/f749262ca2dbd929a07b912cf271c76c0e6e378e/base_options.py#L247-L248

Shouldn't the default value be one of the available options? An error is thrown when using the value charge.

https://github.com/SsnL/dataset-distillation/blob/f749262ca2dbd929a07b912cf271c76c0e6e378e/base_options.py#L249-L250

As indicated by both the paper (section S-1) and the help string, the default value should be 0.02, which is not the case. Were the experiments performed with 0.02 or 0.001?

Broken: yaml.load(input) is removed in PyYAML >=6.0

Calling yaml.load(input) without a Loader was deprecated in PyYAML 5.1 and no longer works in PyYAML 6.0+ because of a CVE about arbitrary code execution.
The code in the base_options.py functions get_dummy_state() and set_state() uses yaml.load() and is therefore broken with PyYAML 6.0+.
Instead, yaml.full_load(input) or yaml.load(input, Loader=yaml.FullLoader) can be used.
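
A minimal sketch of that replacement (a hypothetical snippet, not the repository's exact code):

    import yaml

    with open("opt.yaml") as f:
        # Works on PyYAML 5.x and 6.x; equivalent to yaml.load(f, Loader=yaml.FullLoader).
        # For plain data, yaml.safe_load(f) is the more restrictive option.
        old_yaml = yaml.full_load(f)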

ImportError: cannot import name 'reduce_op'

When I run the code

    python main.py --mode distill_basic --dataset MNIST --arch LeNet

I get this:

Traceback (most recent call last):
  File "F:\dataset_dis\dataset-distillation-master\utils\distributed.py", line 5, in <module>
    from torch.distributed import ReduceOp
ImportError: cannot import name 'ReduceOp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    from base_options import options
  File "F:\dataset_dis\dataset-distillation-master\base_options.py", line 6, in <module>
    import utils
  File "F:\dataset_dis\dataset-distillation-master\utils\__init__.py", line 3, in <module>
    from . import distributed
  File "F:\dataset_dis\dataset-distillation-master\utils\distributed.py", line 7, in <module>
    from torch.distributed import reduce_op
ImportError: cannot import name 'reduce_op'

Is this because of the version of PyTorch, or something else?
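
For what it's worth, a hedged guess at a compatibility shim for utils/distributed.py (an assumption, not the repository's code): newer PyTorch exposes ReduceOp and removed the old reduce_op alias, and some Windows builds ship without torch.distributed at all, so guarding both imports makes the failure mode explicit.

    try:
        from torch.distributed import ReduceOp                    # current PyTorch
    except ImportError:
        try:
            from torch.distributed import reduce_op as ReduceOp   # very old PyTorch
        except ImportError:
            ReduceOp = None  # torch.distributed unavailable (e.g., some Windows builds)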

Distilling labels as well as images

It seems that you fix the label associated with each image when producing a weight update with it.

This, I presume, helps the images you learn be specialised to single classes. E.g. the MNIST digits you produce (for random initialisation) are clearly distinct digits.

However, the labels are also a vital part of the dataset.

Did you consider randomly initialising them and backpropagating the training signal to learn them too?
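
A hedged sketch of that idea (not something this repository implements): treat the labels as free logits over classes and optimize them jointly with the images, using a soft-label cross entropy for the inner update.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_images, n_classes = 10, 10
    distilled_x = torch.randn(n_images, 1, 28, 28, requires_grad=True)
    label_logits = torch.randn(n_images, n_classes, requires_grad=True)  # learnable labels

    def soft_label_loss(net):
        # Cross entropy against the learnable soft labels instead of fixed hard ones.
        soft_targets = label_logits.softmax(dim=-1)
        log_probs = F.log_softmax(net(distilled_x), dim=-1)
        return -(soft_targets * log_probs).sum(dim=-1).mean()

    # Example inner-loss evaluation with a throwaway linear model; in the full
    # method, both distilled_x and label_logits would be updated by the outer loop.
    net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, n_classes))
    soft_label_loss(net).backward()   # gradients flow into distilled_x and label_logits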

cifar10

TypeError: __init__() got an unexpected keyword argument 'padding_mode'.
How should I deal with it?

UniqueNamespace

Hello.
While I was reading your code, I came across an unresolved reference at line 40 of base_options.py:

        self.opt = UniqueNamespace()

So, how can I solve this? Thank you!

some question about dataset distillation

Hello, Dr Wang.
I have some questions about dataset distillation. For an image x^1 in the synthetic distilled training dataset, the loss function L is very small, or even equal to 0. To minimize the objective function, we could obtain a distilled dataset containing only x^1; as a result, the distilled dataset would have only one image.
In other words, how do you control the size of the distilled dataset?
Thank you very much.

retrain distilled images with minibatch-SGD

Hey, I am very interested in this work and have some questions to ask.
I used 20 images per class for MNIST dataset distillation with

    python main.py --mode distill_basic --dataset MNIST --arch LeNet \
        --distill_steps 1 --train_nets_type known_init --n_nets 1 \
        --test_nets_type same_as_train

and achieved 96.54% test accuracy.
But when I use these distilled images as training data to retrain the same initial model as used in the distillation step with minibatch SGD, the test accuracy drops to 62% and overfitting occurs. My questions are:
(1) Is this just because of the different way of optimization?
(2) Why does optimizing the network your way avoid overfitting, even when using only 1 sample per class for MNIST dataset distillation?
(3) How can I use the distilled images to retrain a good model in a normal training regime such as minibatch SGD?

Can the training process on distilled data be conducted in main.py?

Thanks for your great work.

I have a question regarding the code in the repo dataset-distillation.
If I understood correctly, after distilling the images, we can train AlexCifarNet with the distilled Cifar10 data and then test the trained model on the original Cifar10 test data.

I have gone over the code; however, I didn't find the snippet that trains on the distilled images. If the training process is actually present in the code, could you please note down the command for training on the distilled Cifar10 data after distilling?

Looking forward to your reply and hope you have an amazing day!
Thank you and best regards,
Dai

Adapt distill on dataset SVHN

  • Hi, I have just trained some LeNet networks on MNIST and run adapt distillation on the USPS dataset. I am wondering how I can adapt distillation to the SVHN dataset.

How to distill dataset of size exceeding gpu memory size limit

Hi!
I wonder, is there any way to distill much more data than fits within the GPU memory limit? For a large-scale dataset or a typical 11 GB/12 GB GPU, that would be really useful. At first, I thought state.distributed in your code was intended for that, by splitting the distilled data across multiple GPUs, but then I found out I was wrong: it seems this code only distills data that fits in a single GPU's memory. So, any advice on this matter?

Thanks a lot!

Structure of results.pth

Hello!

Very interesting paper, thanks!
I have a question about results.pth. It's a file with distilled images (tensors), but what is its structure? For example, for MNIST its length is 30; is that because we have 3 steps? And the labels are just a tensor from 0 to 9, so the order matters, e.g. the first 3 tensors are class 0, right?
And in the testing phase, do we use the pretrained model that we got after distill_basic?

Thanks!
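
In case it helps, a small exploratory snippet for inspecting the file without assuming its exact layout (the per-step (images, labels, lr) structure hinted at in the question is only a guess):

    import torch

    steps = torch.load("results.pth", map_location="cpu")
    print(type(steps), len(steps))
    for i, entry in enumerate(steps):
        # If each entry bundles the distilled images for one step with their
        # labels and step size, this prints their shapes/values.
        if isinstance(entry, (tuple, list)):
            print(i, [getattr(x, "shape", x) for x in entry])
        else:
            print(i, getattr(entry, "shape", entry))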

Good luck with the next submission

I am very sad to learn that your paper was rejected by ICLR. I believe your research is very useful in many areas, especially security and privacy. Good luck with the next submission.

Getting what appears to be noise on Imagenette + XResnet

I am getting outputs that look completely random when I try to run distillation on a subset of ImageNet with an XResNet18 model.

I have only tried one set of command line args and was wondering whether you had any intuition for what I might obviously be doing wrong or had tried this before.

My command is:

python main.py --mode distill_basic --dataset Imagenette --arch DXResNet18 --batch_size 64 \
    --distill_steps 3 --train_nets_type known_init --n_nets 1 \
    --test_nets_type same_as_train

I made my own DXResnet18 class and Imagenette dataloader.

Thanks in advance!

Logger warnings and order of optimizer.step() and lr_scheduler.step()

Hello,

when I run the following command:

python main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train

I get the following warnings:

/home/claudio.greco/dataset-distillation/base_options.py:423: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please rea
d https://msg.pyyaml.org/load for full details.
  old_yaml = yaml.load(f)  # this is a dict
2019-09-12 16:18:31 [WARNING]  ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/opt.yaml already exists, moved t
o ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/old_opts/opt_2019_09_12__16_13_40.yaml
2019-09-12 16:18:31 [INFO ]  train dataset size: 60000
2019-09-12 16:18:31 [INFO ]  test dataset size:  10000
2019-09-12 16:18:31 [INFO ]  datasets built!
2019-09-12 16:18:31 [INFO ]  mode: distill_basic, phase: train  
2019-09-12 16:18:31 [INFO ]  Build 1 LeNet network(s) with [xavier(1.0)] init
^[[A2019-09-12 16:18:37 [INFO ]  Train 1 steps iterated for 3 epochs
/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `
optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in P
yTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
2019-09-12 16:18:37 [INFO ]  Results saved to ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/checkpoints/epoch
0000/results.pth
2019-09-12 16:18:37 [INFO ]
2019-09-12 16:18:37 [INFO ]  Begin of epoch 0 :
Begin of epoch 0 (1 same_as_train nets): 100%|####################################################################################################| 2/2 [00:00<00:00,  3.36it/s]
--- Logging error ---
Traceback (most recent call last):
  File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 15, in emit
    tqdm.tqdm.write(msg)
  File "/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/tqdm/_tqdm.py", line 555, in write
    fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)  
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
    evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
  File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
    logging.info(format_stepwise_results(state, steps, result_title, res))
  File "/usr/lib64/python3.6/logging/__init__.py", line 1902, in info
    root.info(msg, *args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
    self.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
    self.callHandlers(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
    self.emit(record)
  File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 20, in emit
    self.handleError(record)
Message: 'Begin of epoch 0  (1 same_as_train nets) test results:\n\t          STEP                   ACCURACY                   LOSS          \n\t            before steps           7.9102 \xb1  nan%            2.4235 \xb1  nan\n\t     step  3 (lr=0.0200)           6.7383 \xb1  nan%            2.3925 \xb1  nan'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)  
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
    evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
  File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
    logging.info(format_stepwise_results(state, steps, result_title, res))
Message: 'Begin of epoch 0  (1 same_as_train nets) test results:\n\t          STEP                   ACCURACY                   LOSS          \n\t            before steps           7.9102 \xb1  nan%            2.4235 \xb1  nan\n\t     step  3 (lr=0.0200)           6.7383 \xb1  nan%            2.3925 \xb1  nan'
Arguments: ()
2019-09-12 16:18:38 [INFO ]
2019-09-12 16:18:38 [INFO ]  Epoch:    0 [      0/  60000 ( 0%)] Loss: 2.3755 Data Time: 0.44s Train Time: 0.07s
2019-09-12 16:18:40 [INFO ]  Epoch:    1 [      0/  60000 ( 0%)] Loss: 2.2400 Data Time: 0.12s Train Time: 0.03s
2019-09-12 16:18:41 [INFO ]  Epoch:    2 [      0/  60000 ( 0%)] Loss: 1.7438 Data Time: 0.13s Train Time: 0.03s

The logging error makes it impossible for me to use this script, because I cannot see the accuracy, loss, etc. I also don't know whether that error is related to the warning about the order of calling `optimizer.step()` and `lr_scheduler.step()`. (Maybe NaN values are generated which cannot be properly encoded by the logger?)

Could you please help me to solve this issue? Could it be related to the versions of Python and PyTorch I am using? I am using Python 3.6.8 and PyTorch 1.2.0. What versions did you use exactly?

Thank you very much in advance.

Best,
Claudio

BatchNorm2d of ResNet model warning

Hello, when I use ResNet18 with the pretrained model from PyTorch, it shows the following warning:
[WARNING] BatchNorm2d contains buffer running_var. The buffer will be treated as a constant and assumed not to change during gradient steps. If this assumption is violated (e.g., BatchNorm*d's running_mean/var), the computation will be incorrect.
I am not sure if it will influence the results.

Many thanks

Getting distilled images and testing on them

Hello!

I'm wondering what the correct way is to get distilled images and test performance on them, as well as to check performance on the normal dataset after training on the distilled images. I'm confused since there are many parameters, and I've already read the advanced docs. So, to get distilled data, for example on Cifar10, I need to run

python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet  --distill_lr 0.001

Then the distilled images are in the file results.pth.

So, to train the network on the usual full dataset I need to set --mode train, and if I want to test network performance after training on distilled data I need to set --mode train --phase test?

Or, in other words, how do I get results like in your paper, which says you get 80% when fully trained versus 54% with distilled data on CIFAR10?

Looking forward to your response! Thanks!

Questions about implementation of optimizing distilled data

Hello!

I am reading the source code, specifically the class Trainer in train_distilled_image.py. I have two questions regarding your implementation of optimizing the distilled data:

  • When computing the gradient of the final L w.r.t. w in params, you claim in the paper and in a code comment that you use w (PRE-GD). But you are actually using the weights after GD in lines 156-160 of train_distilled_image.py, since params stores the original model weights and the model weights after GD at every step. In the loop (line 143), w corresponds to the model weights POST-GD, not PRE-GD. To verify my guess, I checked that len(params)=31 and len(gws)=30 while running
    python main.py --mode distill_basic --dataset MNIST --arch LeNet. That means that in the loop, the updated model weights from the final step are retrieved first.
    I guess simply discarding the model weights after GD in the final step will do the job.

  • In line 172, you use dw.add_(hvp_grad[0]) to update dw, which is weird because the gradient through different steps does not accumulate by adding. If dw denotes the gradient of the final L w.r.t. the updated w at each step, I wonder whether dw = hvp_grad[0] is the correct update, because in my understanding the un-updated model weights at this step are the updated model weights from the last step, which makes hvp_grad[0] itself the gradient of the final L w.r.t. the updated w at each step.

Anyway, many thanks for your interesting work!

Question about applicability

Hi, I came across your paper a few weeks ago.
I have a dataset that constantly grows, like every couple of weeks. The growth is both in terms of more examples of a set of known classes as well as new classes being added.
Is it possible to use this method to keep a reduced dataset of the old images?
For example: I have 10k images that I want to distill into 100. Then I get a new batch of 200 images.
How would I retrain a model "from scratch" using this combination of distilled and raw images?

I'm a grad student focussing on HPC, so I'm sorry if these questions are silly. But I would greatly appreciate any feedback, thank you!

Loss becomes NaN

When I run the unknown initialization experiments on MNIST:
python main.py --mode distill_basic --dataset MNIST --arch LeNet
I get the following after a few dozen epochs:
Traceback (most recent call last):
  File "main.py", line 402, in <module>
    main(options.get_state())
  File "main.py", line 130, in main
    steps = train_distilled_image.distill(state, state.models)
  File "/home/isucholu/original/dataset-distillation/train_distilled_image.py", line 296, in distill
    return Trainer(state, models).train()
  File "/home/isucholu/original/dataset-distillation/train_distilled_image.py", line 283, in train
    raise RuntimeError('loss became NaN')
RuntimeError: loss became NaN
Was the gradient fairly stable when you ran this for the paper? Do I just need to make some more attempts?

Replicating results

I think your paper is fascinating so I have been experimenting with it for a few weeks now.
I was wondering what hyperparams you used to get 10 images that achieve almost 94% accuracy on MNIST after 1 GD step and 3 epochs. I can't seem to hit this when I run the suggested code for 200 epochs. At most I managed to get around 91%.

python3 main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train

Question about commands

[screenshot of the command output omitted]

  • Hi, I understand that 200 LeNet networks are trained for distilling the images, but what do those 20 test networks do?

Max number of classes

Hello!

Am I right that the maximum number of classes on which you tested the distillation algorithm is 200 (CUB200)? Which GPU did you use?
I'm trying to run the code for more than 10 classes, and my GPU runs out of memory even for 15 classes. But it's a Tesla V100, and I can't reproduce the results for CUB200. Or did you parallelize the algorithm somehow?

Must the distillation model and the test model be the same?

Sorry to bother you, I have some questions from reading the paper.
I ran main.py and it works well; my questions are:
The first question:
Must the distillation model and the test model be the same?
For example, if I got the distilled images using LeNet, can I train on these images with AlexNet?
The second question:
Must the number of distilled images be equal to the number of classes?
For example, if I want to distill the MNIST dataset, can I distill it into 20 images?
(I saw that the distillation labels equal num_class.)

cifar10: RuntimeError: Mismatch in shape

Thank you so much for sharing the code. I got this error when I ran:

    python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet \
        --distill_lr 0.001

torch version: 1.4.0
torchvision: 0.5.0

I am wondering whether you have any hints about what's wrong here? Thank you so much in advance.

2020-03-25 14:01:27 [ERROR]  Fatal error:                                                                                                                                                                                                                                
2020-03-25 14:01:27 [ERROR]  Traceback (most recent call last):
2020-03-25 14:01:27 [ERROR]    File "main.py", line 402, in <module>
2020-03-25 14:01:27 [ERROR]      main(options.get_state())
2020-03-25 14:01:27 [ERROR]    File "main.py", line 131, in main
2020-03-25 14:01:27 [ERROR]      steps = train_distilled_image.distill(state, state.models)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/train_distilled_image.py", line 290, in distill
2020-03-25 14:01:27 [ERROR]      return Trainer(state, models).train()
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/train_distilled_image.py", line 221, in train
2020-03-25 14:01:27 [ERROR]      evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 288, in evaluate_steps
2020-03-25 14:01:27 [ERROR]      res = _evaluate_steps(test_nets_desc, reset=(state.test_nets_type == 'unknown_init'))
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 276, in _evaluate_steps
2020-03-25 14:01:27 [ERROR]      params = train_steps_inplace(state, models, steps, params, callback=test_callback)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 75, in train_steps_inplace
2020-03-25 14:01:27 [ERROR]      loss.backward(lr)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
2020-03-25 14:01:27 [ERROR]      torch.autograd.backward(self, gradient, retain_graph, create_graph)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/autograd/__init__.py", line 93, in backward
2020-03-25 14:01:27 [ERROR]      grad_tensors = _make_grads(tensors, grad_tensors)
2020-03-25 14:01:27 [ERROR]    File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/autograd/__init__.py", line 25, in _make_grads
2020-03-25 14:01:27 [ERROR]      raise RuntimeError("Mismatch in shape: grad_output["
2020-03-25 14:01:27 [ERROR]  RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).
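
A hedged guess at a workaround (an assumption, not a confirmed fix): the gradient passed to backward() has shape [1] while the loss is a 0-dim scalar, which recent PyTorch versions no longer accept, so squeezing the step-size tensor at the failing call in basics.py should make the shapes match.

    # Hypothetical one-line change in basics.py, train_steps_inplace:
    loss.backward(lr.squeeze())   # instead of loss.backward(lr)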

Running in Jupyter

Hi SsnL

I'm trying to run this in Jupyter, but it returns this error:
Unexpected args: ['-f', '/home/user/.local/share/jupyter/runtime/kernel-d7f01d0e-54cb-461d-8d44-0d43cb505a17.json']

I searched for it; it seems Jupyter can't pass arguments correctly to base_options.py.
Do you have any idea how I can fix it?

Question about pulling data from a .gz file

Hello!

We are trying to use dataset distillation with a .gz file (similar to those that can be downloaded from the MNIST dataset). We've been looking through your dataset distillation code but we've been unable to find out where we could edit the code to pull data from our .gz file instead of from the MNIST dataset.

Could you please let me know in which file/line we could edit your dataset distillation code to pull data from the .gz file?

Thank you in advance!
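
As a starting point, here is a hedged sketch (not part of the repository) of a torch Dataset that reads MNIST-style idx .gz files; the 16-byte image header, 8-byte label header, and 28x28 size are the standard MNIST layout and would need adjusting for a different format. You would then register it wherever the repository constructs its datasets.

    import gzip
    import numpy as np
    import torch
    from torch.utils.data import Dataset

    class GzIdxDataset(Dataset):
        def __init__(self, images_gz, labels_gz):
            with gzip.open(images_gz, "rb") as f:
                # Skip the 16-byte idx header, then reshape into 28x28 images.
                self.images = np.frombuffer(f.read(), dtype=np.uint8, offset=16).reshape(-1, 28, 28)
            with gzip.open(labels_gz, "rb") as f:
                # Skip the 8-byte idx header.
                self.labels = np.frombuffer(f.read(), dtype=np.uint8, offset=8)

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, idx):
            # Copy because frombuffer returns a read-only array.
            img = torch.from_numpy(self.images[idx].copy()).float().unsqueeze(0) / 255.0
            return img, int(self.labels[idx])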

How do you keep buffers fixed during gradient steps

Hello!
I've noticed your warning

logging.warn(('{} contains buffer {}. The buffer will be treated as '
                        'a constant and assumed not to change during gradient '
                        'steps. If this assumption is violated (e.g., '
                        'BatchNorm*d\'s running_mean/var), the computation will '
                        'be incorrect.').format(m.__class__.__name__, n))

May I ask how you keep buffers fixed during gradient steps (e.g., running mean and running var in batch norm)? In this code there are only LeNet and AlexNet, so this isn't a problem, but I wonder whether you have run experiments on networks with batch norm.

Thanks a lot!
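
A hedged sketch of one common way to keep those buffers constant during the unrolled gradient steps (an assumption, not what this repository does): put the BatchNorm layers in eval mode so they use, and never update, their stored running statistics.

    import torch.nn as nn

    def freeze_batchnorm(model: nn.Module):
        for m in model.modules():
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                m.eval()          # use running_mean/var as constants in the forward pass
                m.momentum = 0.0  # and make sure they are never updated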
