
supervised-reptile's Introduction

Status: Archive (code is provided as-is, no updates expected)

supervised-reptile

Reptile training code for Omniglot and Mini-ImageNet.

Reptile is a meta-learning algorithm that finds a good initialization. It works by sampling a task, training on the sampled task, and then updating the initialization towards the new weights for the task.
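In code, one meta-iteration looks roughly like the following minimal sketch (the helpers sample_task and compute_gradient are hypothetical placeholders, not functions from this repo):

def reptile_step(weights, sample_task, compute_gradient,
                 inner_iters=5, inner_lr=0.001, meta_step_size=1.0):
    """One Reptile meta-iteration: train on a sampled task with ordinary SGD,
    then move the initialization part of the way towards the adapted weights.
    `weights` is assumed to be a list of numpy arrays."""
    task = sample_task()                      # e.g. a 5-way classification episode
    new_weights = [w.copy() for w in weights]
    for _ in range(inner_iters):              # inner loop: plain SGD on the task
        grads = compute_gradient(new_weights, task)
        new_weights = [w - inner_lr * g for w, g in zip(new_weights, grads)]
    # Outer update: move the initialization towards the task-adapted weights.
    return [w + meta_step_size * (nw - w) for w, nw in zip(weights, new_weights)]

With --meta-batch larger than 1, the (new_weights - weights) directions from several tasks are averaged before the outer update, and --meta-step-final anneals the meta step size towards that final value over training.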

Getting the data

The fetch_data.sh script creates a data/ directory and downloads Omniglot and Mini-ImageNet into it. The data is on the order of 5GB, so the download takes 10-20 minutes on a reasonably fast internet connection.

$ ./fetch_data.sh
Fetching omniglot/images_background ...
Extracting omniglot/images_background ...
Fetching omniglot/images_evaluation ...
Extracting omniglot/images_evaluation ...
Fetching Mini-ImageNet train set ...
Fetching wnid: n01532829
Fetching wnid: n01558993
Fetching wnid: n01704323
Fetching wnid: n01749939
...

If you want to download Omniglot but not Mini-ImageNet, you can simply kill the script after it starts downloading Mini-ImageNet. The script automatically deletes partially-downloaded data when it is killed early.

Reproducing training runs

You can train models with the run_omniglot.py and run_miniimagenet.py scripts. Hyper-parameters are specified as flags (see --help for a detailed list). Here are the commands used for the paper:

# transductive 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 10 --inner-iters 5 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o15t --transductive

# transductive 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_m15t --transductive

# 5-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 15 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_m55

# 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_m15

# 5-shot 5-way Omniglot.
python -u run_omniglot.py --train-shots 10 --inner-batch 10 --inner-iters 5 --learning-rate 0.001 --meta-step 1 --meta-step-final 0 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --checkpoint ckpt_o55

# 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 10 --inner-iters 5 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o15

# 1-shot 20-way Omniglot.
python -u run_omniglot.py --shots 1 --classes 20 --inner-batch 20 --inner-iters 10 --meta-step 1 --meta-batch 5 --meta-iters 200000 --eval-batch 10 --eval-iters 50 --learning-rate 0.0005 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o120

# 5-shot 20-way Omniglot.
python -u run_omniglot.py --classes 20 --inner-batch 20 --inner-iters 10 --meta-step 1 --meta-batch 5 --meta-iters 200000 --eval-batch 10 --eval-iters 50 --learning-rate 0.0005 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o520

Training creates checkpoints. Currently, you cannot resume training from a checkpoint, but you can re-run evaluation from a checkpoint by passing --pretrained. You can use TensorBoard on the checkpoint directories to see approximate learning curves during training and testing.

To evaluate with transduction, pass the --transductive flag. In this implementation, transductive evaluation is faster than non-transductive evaluation since it makes better use of batches.
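A minimal sketch of why that is, assuming a hypothetical run_model helper that maps a batch of inputs to predictions (the repo's actual logic is the _test_predictions method quoted in the issues below):

def predict_test_set(run_model, train_set, test_set, transductive):
    """Sketch of the two evaluation modes after fine-tuning on train_set."""
    if transductive:
        # One forward pass over the whole test batch; batch statistics are
        # shared between test samples.
        return run_model([x for x, _ in test_set])
    predictions = []
    for test_x, _ in test_set:
        # One forward pass per test sample, padded with the (already seen)
        # training samples so test samples never share a batch.
        batch = [x for x, _ in train_set] + [test_x]
        predictions.append(run_model(batch)[-1])  # keep only the test sample's output
    return predictions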

Comparing different inner-loop gradient combinations

Here are the commands for comparing different gradient combinations. The --foml flag indicates that only the final gradient should be used; a short sketch contrasting the two update directions follows the commands below.

# Shared hyper-parameters for all experiments.
shared="--sgd --seed 0 --inner-batch 25 --learning-rate 0.003 --meta-step-final 0 --meta-iters 40000 --eval-batch 25 --eval-iters 5 --eval-interval 1"

python run_omniglot.py --inner-iters 1 --train-shots 5 --meta-step 0.25 --checkpoint g1_ckpt $shared | tee g1.txt

python run_omniglot.py --inner-iters 2 --train-shots 10 --meta-step 0.25 --checkpoint g1_g2_ckpt $shared | tee g1_g2.txt
python run_omniglot.py --inner-iters 2 --train-shots 10 --meta-step 0.125 --checkpoint half_g1_g2_ckpt $shared | tee half_g1_g2.txt
python run_omniglot.py --foml --inner-iters 2 --train-shots 10 --meta-step 0.25 --checkpoint g2_ckpt $shared | tee g2.txt

python run_omniglot.py --inner-iters 3 --train-shots 15 --meta-step 0.25 --checkpoint g1_g2_g3_ckpt $shared | tee g1_g2_g3.txt
python run_omniglot.py --inner-iters 3 --train-shots 15 --meta-step 0.08325 --checkpoint third_g1_g2_g3_ckpt $shared | tee third_g1_g2_g3.txt
python run_omniglot.py --foml --inner-iters 3 --train-shots 15 --meta-step 0.25 --checkpoint g3_ckpt $shared | tee g3.txt

python run_omniglot.py --foml --inner-iters 4 --train-shots 20 --meta-step 0.25 --checkpoint g4_ckpt $shared | tee g4.txt
python run_omniglot.py --inner-iters 4 --train-shots 20 --meta-step 0.25 --checkpoint g1_g2_g3_g4_ckpt $shared | tee g1_g2_g3_g4.txt
python run_omniglot.py --inner-iters 4 --train-shots 20 --meta-step 0.0625 --checkpoint fourth_g1_g2_g3_g4_ckpt $shared | tee fourth_g1_g2_g3_g4.txt
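Under a plain-SGD inner loop, what these runs compare is which inner-loop gradients enter the meta-update. A minimal sketch (the inner_gradients argument, a list of per-iteration gradient lists, is a hypothetical placeholder, not the repo's code):

def reptile_direction(inner_gradients, inner_lr):
    """Full Reptile with a plain-SGD inner loop: the meta-update direction
    (adapted_weights - init_weights) equals -inner_lr * (g1 + g2 + ... + gk)."""
    return [-inner_lr * sum(per_param) for per_param in zip(*inner_gradients)]

def foml_direction(inner_gradients, inner_lr):
    """First-order MAML (--foml): only the final inner-loop gradient gk,
    evaluated at the last inner-loop iterate, is used."""
    return [-inner_lr * per_param[-1] for per_param in zip(*inner_gradients)]

The half_/third_/fourth_ runs shrink --meta-step in proportion to the number of summed gradients, presumably to keep the update magnitudes comparable across variants.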


supervised-reptile's Issues

About batchnorm

Why did you implement it so that batch norm stays in training mode the whole time?

out = tf.layers.batch_normalization(out, training=True)

Shouldn't it be turned off during meta-test evaluation?

Question regarding the meta gradient computation.

Line 83 of meta.py:

# Compute the meta gradient and return it; this gradient is from one episode.
# In the meta-learner, the losses from different episodes are merged and summed.
loss, pred = self.net_pi(query_x, query_y)
grads_pi = autograd.grad(loss, self.net_pi.parameters(), create_graph=True)

Shouldn't the meta gradient be (grad_of_net_pi - grad_of_net)?

question about dataset

In the paper, I found this sentence,
Unlike FOMAML, Reptile doesn’t need a training-test split for each task, which
may make it a more natural choice in certain settings
(https://arxiv.org/pdf/1803.02999.pdf)

Does this mean I don't need to split the data into a support set and a query set for each episode in the training set? Am I right?

I ran the model, but it doesn't converge. Is there anything I need to pay attention to?

I ran the following command:

# transductive 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive

The output is "batch XXX: train=0.000000 test=0.000000".
Is there anything wrong?

demo code for reinforcement learning?

Is there any difference for a reinforcement learning implementation? MAML has two repos, one for supervised learning and one for RL; I wonder why Reptile has no RL version.

Error running script

When I run the command below,
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive

I get this error:
Traceback (most recent call last):
  File "run_omniglot.py", line 9, in <module>
    from supervised_reptile.args import argument_parser, model_kwargs, train_kwargs, evaluate_kwargs
  File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/args.py", line 10, in <module>
    from .reptile import Reptile, FOML
  File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/reptile.py", line 149
    def __init__(self, *args, tail_shots=None, **kwargs):
                              ^
SyntaxError: invalid syntax

I am using Python 2.7.13 and TensorFlow 1.6.0.
Can you please tell me how I can solve this error?

TIA

Training hyperparameters

Hello,

I'm hoping to confirm that the hyperparameters specified in your paper are correct. Specifically, for Mini-ImageNet, were 100k meta-steps taken during training? I ask because some of the default values in the code seem to be different.

Difference between shots and train_shots

I believe num_shots in the code is the number of examples per class. In the train function it is initialized as "num_shots = train_shots or num_shots".

Now, for the given 1-shot 5-way example,
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive

For the above case, train-shots=15 and shots=1, so num_shots in the code becomes 15, but shouldn't it be 1, since there is one example per class (1-shot, 5-way)? Maybe I am missing something; can you please clarify?

Cannot reproduce the results for 1-shot 5-way Mini-ImageNet

I used the command below to perform the experiment (environment: Python 3.6 / TensorFlow 1.10):

# 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 15 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 5 --checkpoint ckpt_m15t

The last several lines of output:

batch 99950: train=0.200000 test=0.200000         
batch 99960: train=0.000000 test=0.000000       
batch 99970: train=0.200000 test=0.000000         
batch 99980: train=0.200000 test=0.200000        
batch 99990: train=0.200000 test=0.400000         
Evaluating...                                       
Train accuracy: 0.28016                            
Validation accuracy: 0.26512                    
Test accuracy: 0.24262

TensorBoard output: [screenshot: 1-shot 5-way reptile]

However, in the paper, the test accuracy is around 47%.

Parallel version

Any plans to release a multi-GPU version of this? It looks like we should be able to run the meta_batch_size iterations of the outer loop in reptile.train_step in parallel on separate GPUs.

(I may have a shot at implementing it if there are no plans ATM, and if you think it'd give a nontrivial speedup.)

~ Ben

About miniimagenet model

Hi, I noticed that the model used for Mini-ImageNet is slightly different from the model in MAML. In MAML, each conv layer is followed by a batch norm layer and then a max-pool layer, whereas in this code each conv layer is followed by a max-pool layer and then a batch norm layer. Is there any reason why you swapped the order of batch norm and max pooling?

Also, does the Mini-ImageNet model in Reptile tend to overfit when the number of kernels in each conv layer is larger than 32?

About the role of training set in the process of prediction

def _test_predictions(self, train_set, test_set, input_ph, predictions):
    if self._transductive:
        # Transductive: classify the whole test batch in one forward pass.
        inputs, _ = zip(*test_set)
        return self.session.run(predictions, feed_dict={input_ph: inputs})
    res = []
    for test_sample in test_set:
        # Non-transductive: pad the batch with the training samples and keep
        # only the prediction for the appended test sample.
        inputs, _ = zip(*train_set)
        inputs += (test_sample[0],)
        res.append(self.session.run(predictions, feed_dict={input_ph: inputs})[-1])
    return res

Why did you add train_set to the _test_predictions function, since it has already been trained on in

self.session.run(minimize_op, feed_dict={input_ph: inputs, label_ph: labels})

which is line 119 of reptile.py?

1-shot 5-way Mini-ImageNet setting

Why is --eval-batch set to 15 in this setting?
It doesn't seem to make much sense since you just repeat the same samples 3 times in that case.

Model Issue

Hi,
I have a query about the CNN-based model you used. Do you have any results on larger, deeper models (overparameterized nets)? I would generally expect these to do better than the 4-layer networks you train with now. I understand that you wanted to compare with MAML, so you kept the architectures similar, but is there any info you can give me on that?
Thanks!

About hyper-parameter

Hi,
Nice paper and thanks for sharing the code.

I noticed that the commands given in this repo are different from Appendix A in the paper. In the paper, hyper-parameters are shared across experiments and set to the same values in Tables 3 and 4.

Is there much performance difference between the different hyper-parameters?
What should I do if I want to run your algorithm on a new dataset?

Thanks so much!

Question regarding the evaluation

How many tasks did you use to calculate the mean accuracy and confidence interval? From the code, it looks like 10k samples are used in the evaluation stage for either dataset. Could you please elaborate a bit? Is the number of samples equivalent to the number of tasks in the eval stage?

Bump into some accuracy problem

I ran the 5-shot 5-way Mini-ImageNet experiment using the command in the README, but when it finished, I got this:

[results screenshot]

Are the hyperparameters optimal? Or is there anything else I should pay attention to? Thanks.

fetch_data might not be working correctly for mini-Imagenet?

Hi,
I followed the instructions, but the code for mini-ImageNet keeps exiting with:
OSError: cannot identify image file <_io.BufferedReader name='data/miniimagenet/train/n04515003/n04515003_15351.JPEG'>
Most of this folder (n04515003) consists of empty image files, yet fetch_data seemed to exit normally without any error messages.
I checked how many empty files there were under Mini-ImageNet's subfolders and it seems to be 53186 of 59981 files. Omniglot seems fine, however.
Any idea what could be wrong with the script, or what I could be doing wrong?

moving average in AdamOptimizer when conducting evaluation

It seems like training and evaluation share the same model.minimize_op. If so, evaluation will use the optimizer's moving averages when fine-tuning. In the original MAML code, the evaluation code restores only the trainable parameters, excluding the other statistics kept by the optimizer. Will this make any difference?
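For reference, a minimal TF1 sketch of resetting the optimizer's slot variables while keeping the trained weights, in the spirit of what MAML's evaluation code does (this is not something this repo does):

import tensorflow as tf

def reset_optimizer_state(sess):
    """Re-initialize every non-trainable global variable (Adam's moment
    estimates, beta power accumulators, etc.) while leaving the model's
    trainable weights untouched."""
    trainable = set(tf.trainable_variables())
    optimizer_vars = [v for v in tf.global_variables() if v not in trainable]
    sess.run(tf.variables_initializer(optimizer_vars))

Note that this also re-initializes batch norm moving statistics, which should be harmless here because the model runs batch norm with training=True throughout (see the batch norm issue above).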

Reptile seems to produce the same gradients as vanilla SGD

I ran the sine-wave code and printed out the outer-loop updated weights (without multiplying by the outer step size) alongside the weights from normal SGD, and they are the same, even when I set innerepochs larger than 1.
From the Reptile algorithm we can see that the update is

phi <- phi + epsilon * (W - phi), where W is the result of SGD on the sampled task,

so the only difference is that the outer loop multiplies by epsilon; it is only a learning-rate difference from SGD.
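A tiny numeric check of that observation for the single-inner-step case (a standalone sketch with a made-up quadratic loss, unrelated to the repo's code):

# One-dimensional quadratic loss L(w) = 0.5 * (w - 3)**2, so grad(w) = w - 3.
grad = lambda w: w - 3.0

w0 = 0.0
inner_lr, outer_step = 0.02, 0.1

# Reptile with a single inner SGD step:
w_inner = w0 - inner_lr * grad(w0)
w_reptile = w0 + outer_step * (w_inner - w0)

# Plain SGD with learning rate inner_lr * outer_step:
w_sgd = w0 - (inner_lr * outer_step) * grad(w0)

print(w_reptile, w_sgd)  # identical: with one inner step Reptile is rescaled SGD

With more than one inner step the two updates generally differ, because the later inner-loop gradients are evaluated at the already-updated weights.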

some problems about the dataset

Hi, I have recently been following this work, and I am writing some demos to run some experiments. However, I have run into some problems with the dataset: because of network restrictions in China, some images download with a size of 0 KB. Could anyone share the downloaded dataset with me?

When using the pre-trained model for retraining, the accuracy declines. What is the reason and is it normal?

# Imports and DATA_DIR below are assumed to match run_miniimagenet.py from this repo.
import random

import tensorflow as tf

from supervised_reptile.args import argument_parser, model_kwargs, train_kwargs, evaluate_kwargs
from supervised_reptile.eval import evaluate
from supervised_reptile.models import MiniImageNetModel
from supervised_reptile.miniimagenet import read_dataset
from supervised_reptile.train import train

DATA_DIR = 'data/miniimagenet'

args = argument_parser().parse_args()
random.seed(args.seed)

train_set, val_set, test_set = read_dataset(DATA_DIR)

model = MiniImageNetModel(args.classes, **model_kwargs(args))

eval_kwargs = evaluate_kwargs(args)

with tf.Session() as sess:
    if not args.pretrained:
        print('Training...')
        sess.run(tf.global_variables_initializer())
        # Restore the trainable variables from the existing Reptile checkpoint
        # before continuing training.
        saver_1 = tf.train.Saver(tf.trainable_variables())
        print(args.checkpoint)
        saver_1.restore(sess, tf.train.latest_checkpoint(args.checkpoint))
        print('Test accuracy: ' + str(evaluate(sess, model, test_set, **eval_kwargs)))
        graph = tf.get_default_graph()
        tf.summary.FileWriter('./logs', graph)
        train(sess, model.minimize_op, model, train_set, test_set, args.checkpoint, **train_kwargs(args))

When retraining from the model produced by Reptile training, I tested the saved model after the first round of training and the accuracy had dropped by about 20 points, to roughly 0.24. (Note: an evaluation run before retraining verified that the model had been loaded correctly; in fact, I only added a restore step on top of the original Reptile training code, and the training code and dataset were not changed.) The accuracy rose to 0.36 by round 200 and 0.42 by round 3000. The accuracy of the original model is about 0.46. The training method and dataset are exactly the same as those used to train the original model, and the model is loaded the same way. I don't know whether this is normal or what causes it.
