
supervised-reptile's Introduction

Status: Archive (code is provided as-is, no updates expected)

supervised-reptile

Reptile training code for Omniglot and Mini-ImageNet.

Reptile is a meta-learning algorithm that finds a good initialization. It works by sampling a task, training on the sampled task, and then updating the initialization towards the new weights for the task.
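In code, one meta-iteration looks roughly like the following minimal sketch (the helpers sample_task and compute_gradient are hypothetical placeholders, not functions from this repo):

def reptile_step(weights, sample_task, compute_gradient,
                 inner_iters=5, inner_lr=0.001, meta_step_size=1.0):
    """One Reptile meta-iteration: train on a sampled task with ordinary SGD,
    then move the initialization part of the way towards the adapted weights.
    `weights` is assumed to be a list of numpy arrays."""
    task = sample_task()                      # e.g. a 5-way classification episode
    new_weights = [w.copy() for w in weights]
    for _ in range(inner_iters):              # inner loop: plain SGD on the task
        grads = compute_gradient(new_weights, task)
        new_weights = [w - inner_lr * g for w, g in zip(new_weights, grads)]
    # Outer update: move the initialization towards the task-adapted weights.
    return [w + meta_step_size * (nw - w) for w, nw in zip(weights, new_weights)]

With --meta-batch larger than 1, the (new_weights - weights) directions from several tasks are averaged before the outer update, and --meta-step-final anneals the meta step size towards that final value over training.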

Getting the data

The fetch_data.sh script creates a data/ directory and downloads Omniglot and Mini-ImageNet into it. The data is on the order of 5GB, so the download takes 10-20 minutes on a reasonably fast internet connection.

$ ./fetch_data.sh
Fetching omniglot/images_background ...
Extracting omniglot/images_background ...
Fetching omniglot/images_evaluation ...
Extracting omniglot/images_evaluation ...
Fetching Mini-ImageNet train set ...
Fetching wnid: n01532829
Fetching wnid: n01558993
Fetching wnid: n01704323
Fetching wnid: n01749939
...

If you want to download Omniglot but not Mini-ImageNet, you can simply kill the script after it starts downloading Mini-ImageNet. The script automatically deletes partially-downloaded data when it is killed early.

Reproducing training runs

You can train models with the run_omniglot.py and run_miniimagenet.py scripts. Hyper-parameters are specified as flags (see --help for a detailed list). Here are the commands used for the paper:

# transductive 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 10 --inner-iters 5 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o15t --transductive

# transductive 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_m15t --transductive

# 5-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 15 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_m55

# 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_m15

# 5-shot 5-way Omniglot.
python -u run_omniglot.py --train-shots 10 --inner-batch 10 --inner-iters 5 --learning-rate 0.001 --meta-step 1 --meta-step-final 0 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --checkpoint ckpt_o55

# 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 10 --inner-iters 5 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 5 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o15

# 1-shot 20-way Omniglot.
python -u run_omniglot.py --shots 1 --classes 20 --inner-batch 20 --inner-iters 10 --meta-step 1 --meta-batch 5 --meta-iters 200000 --eval-batch 10 --eval-iters 50 --learning-rate 0.0005 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o120

# 5-shot 20-way Omniglot.
python -u run_omniglot.py --classes 20 --inner-batch 20 --inner-iters 10 --meta-step 1 --meta-batch 5 --meta-iters 200000 --eval-batch 10 --eval-iters 50 --learning-rate 0.0005 --meta-step-final 0 --train-shots 10 --checkpoint ckpt_o520

Training creates checkpoints. Currently, you cannot resume training from a checkpoint, but you can re-run evaluation from a checkpoint by passing --pretrained. You can use TensorBoard on the checkpoint directories to see approximate learning curves during training and testing.

To evaluate with transduction, pass the --transductive flag. In this implementation, transductive evaluation is faster than non-transductive evaluation since it makes better use of batches.
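A minimal sketch of why that is, assuming a hypothetical run_model helper that maps a batch of inputs to predictions (the repo's actual logic is the _test_predictions method quoted in the issues below):

def predict_test_set(run_model, train_set, test_set, transductive):
    """Sketch of the two evaluation modes after fine-tuning on train_set."""
    if transductive:
        # One forward pass over the whole test batch; batch statistics are
        # shared between test samples.
        return run_model([x for x, _ in test_set])
    predictions = []
    for test_x, _ in test_set:
        # One forward pass per test sample, padded with the (already seen)
        # training samples so test samples never share a batch.
        batch = [x for x, _ in train_set] + [test_x]
        predictions.append(run_model(batch)[-1])  # keep only the test sample's output
    return predictions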

Comparing different inner-loop gradient combinations

Here are the commands for comparing different gradient combinations. The --foml flag indicates that only the final gradient should be used; a short sketch contrasting the two update directions follows the commands below.

# Shared hyper-parameters for all experiments.
shared="--sgd --seed 0 --inner-batch 25 --learning-rate 0.003 --meta-step-final 0 --meta-iters 40000 --eval-batch 25 --eval-iters 5 --eval-interval 1"

python run_omniglot.py --inner-iters 1 --train-shots 5 --meta-step 0.25 --checkpoint g1_ckpt $shared | tee g1.txt

python run_omniglot.py --inner-iters 2 --train-shots 10 --meta-step 0.25 --checkpoint g1_g2_ckpt $shared | tee g1_g2.txt
python run_omniglot.py --inner-iters 2 --train-shots 10 --meta-step 0.125 --checkpoint half_g1_g2_ckpt $shared | tee half_g1_g2.txt
python run_omniglot.py --foml --inner-iters 2 --train-shots 10 --meta-step 0.25 --checkpoint g2_ckpt $shared | tee g2.txt

python run_omniglot.py --inner-iters 3 --train-shots 15 --meta-step 0.25 --checkpoint g1_g2_g3_ckpt $shared | tee g1_g2_g3.txt
python run_omniglot.py --inner-iters 3 --train-shots 15 --meta-step 0.08325 --checkpoint third_g1_g2_g3_ckpt $shared | tee third_g1_g2_g3.txt
python run_omniglot.py --foml --inner-iters 3 --train-shots 15 --meta-step 0.25 --checkpoint g3_ckpt $shared | tee g3.txt

python run_omniglot.py --foml --inner-iters 4 --train-shots 20 --meta-step 0.25 --checkpoint g4_ckpt $shared | tee g4.txt
python run_omniglot.py --inner-iters 4 --train-shots 20 --meta-step 0.25 --checkpoint g1_g2_g3_g4_ckpt $shared | tee g1_g2_g3_g4.txt
python run_omniglot.py --inner-iters 4 --train-shots 20 --meta-step 0.0625 --checkpoint fourth_g1_g2_g3_g4_ckpt $shared | tee fourth_g1_g2_g3_g4.txt
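Under a plain-SGD inner loop, what these runs compare is which inner-loop gradients enter the meta-update. A minimal sketch (the inner_gradients argument, a list of per-iteration gradient lists, is a hypothetical placeholder, not the repo's code):

def reptile_direction(inner_gradients, inner_lr):
    """Full Reptile with a plain-SGD inner loop: the meta-update direction
    (adapted_weights - init_weights) equals -inner_lr * (g1 + g2 + ... + gk)."""
    return [-inner_lr * sum(per_param) for per_param in zip(*inner_gradients)]

def foml_direction(inner_gradients, inner_lr):
    """First-order MAML (--foml): only the final inner-loop gradient gk,
    evaluated at the last inner-loop iterate, is used."""
    return [-inner_lr * per_param[-1] for per_param in zip(*inner_gradients)]

The half_/third_/fourth_ runs shrink --meta-step in proportion to the number of summed gradients, presumably to keep the update magnitudes comparable across variants.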


supervised-reptile's Issues

About batchnorm

Why did you implement it so that batch norm stays in training mode the whole time?

out = tf.layers.batch_normalization(out, training=True)

Shouldn't it be turned off during meta-test evaluation?

Question regarding the meta gradient computation.

Line 83 of meta.py:

# Compute the meta gradient and return it; this gradient is from one episode.
# In the meta-learner, the losses from different episodes are merged and summed.
loss, pred = self.net_pi(query_x, query_y)
grads_pi = autograd.grad(loss, self.net_pi.parameters(), create_graph=True)

Shouldn't the meta gradient be (grad_of_net_pi - grad_of_net)?

question about dataset

In the paper, I found this sentence,
Unlike FOMAML, Reptile doesn’t need a training-test split for each task, which
may make it a more natural choice in certain settings
(https://arxiv.org/pdf/1803.02999.pdf)

Does this mean I don't need to split the data into a support set and a query set for each episode in the training set? Am I right?

I ran the model, but it doesn't converge. Is there anything I need to pay attention to?

I ran the following command:

# transductive 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive

The output is "batch XXX: train=0.000000 test=0.000000".
Is there anything wrong?

demo code for reinforcement learning?

Is there any difference for a reinforcement learning implementation? MAML has two repos, one for supervised learning and one for RL; I wonder why Reptile has no RL version.

Error running script

When I run the command below,
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive

I get this error:
Traceback (most recent call last):
  File "run_omniglot.py", line 9, in <module>
    from supervised_reptile.args import argument_parser, model_kwargs, train_kwargs, evaluate_kwargs
  File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/args.py", line 10, in <module>
    from .reptile import Reptile, FOML
  File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/reptile.py", line 149
    def __init__(self, *args, tail_shots=None, **kwargs):
                              ^
SyntaxError: invalid syntax

I am using Python 2.7.13 and TensorFlow 1.6.0.
Can you please tell me how I can solve this error?

TIA

Training hyperparameters

Hello,

I'm hoping to confirm that the hyperparameters specified in your paper are correct. Specifically, for Mini-ImageNet, were 100k meta-steps taken during training? I ask because some of the default values in the code seem to be different.

Difference between shots and train_shots

I believe num_shots in the code is the number of examples per class. In the train function it is initialized as "num_shots = train_shots or num_shots".

Now, for the given 1-shot 5-way example,
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive

For the above case, train-shots=15 and shots=1, so num_shots in the code becomes 15, but shouldn't it be 1, since there is one example per class (1-shot, 5-way)? Maybe I am missing something; can you please clarify?

Cannot reproduce the results for 1-shot 5-way Mini-ImageNet

I used the command below to perform the experiment (environment: Python 3.6 / TensorFlow 1.10):

# 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 15 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 5 --checkpoint ckpt_m15t

The last several lines of output:

batch 99950: train=0.200000 test=0.200000         
batch 99960: train=0.000000 test=0.000000       
batch 99970: train=0.200000 test=0.000000         
batch 99980: train=0.200000 test=0.200000        
batch 99990: train=0.200000 test=0.400000         
Evaluating...                                       
Train accuracy: 0.28016                            
Validation accuracy: 0.26512                    
Test accuracy: 0.24262

TensorBoard output: [screenshot: 1-shot 5-way reptile]

However, in the paper, the test accuracy is around 47%.

Parallel version

Any plans to release a multi-GPU version of this? It looks like we should be able to run the meta_batch_size iterations of the outer loop in reptile.train_step in parallel on separate GPUs.

(I may have a shot at implementing it if there are no plans ATM, and if you think it'd give a nontrivial speedup.)

~ Ben

About miniimagenet model

Hi, I noticed that the model used for Mini-ImageNet is slightly different from the model in MAML. In MAML, each conv layer is followed by a batch norm layer and then a max-pool layer, whereas in this code each conv layer is followed by a max-pool layer and then a batch norm layer. Is there any reason why you swapped the order of batch norm and max pooling?

Also, does the Mini-ImageNet model in Reptile tend to overfit when the number of kernels in each conv layer is larger than 32?

About the role of training set in the process of prediction

def _test_predictions(self, train_set, test_set, input_ph, predictions):
    if self._transductive:
        # Transductive: classify the whole test batch in one forward pass.
        inputs, _ = zip(*test_set)
        return self.session.run(predictions, feed_dict={input_ph: inputs})
    res = []
    for test_sample in test_set:
        # Non-transductive: pad the batch with the training samples and keep
        # only the prediction for the appended test sample.
        inputs, _ = zip(*train_set)
        inputs += (test_sample[0],)
        res.append(self.session.run(predictions, feed_dict={input_ph: inputs})[-1])
    return res

Why did you add train_set to the _test_predictions function, since it has already been trained on in

self.session.run(minimize_op, feed_dict={input_ph: inputs, label_ph: labels})

which is line 119 of reptile.py?

1-shot 5-way Mini-ImageNet setting

Why is --eval-batch set to 15 in this setting?
It doesn't seem to make much sense since you just repeat the same samples 3 times in that case.

Model Issue

Hi,
I have a query about the CNN-based model you used. Do you have any results on larger, deeper models (overparameterized nets)? I would generally expect these to do better than the 4-layer networks you train with now. I understand that you wanted to compare with MAML, so you kept the architectures similar, but is there any info you can give me on that?
Thanks!

About hyper-parameter

Hi,
Nice paper and thanks for sharing the code.

I noticed that the commands given in this repo are different from Appendix A in the paper. In the paper, hyper-parameters are shared across experiments and set to the same values in Tables 3 and 4.

Is there much performance difference between the different hyper-parameters?
What should I do if I want to run your algorithm on a new dataset?

Thanks so much!

Question regarding the evaluation

How many tasks did you use to calculate the mean accuracy and confidence interval? From the code, it looks like 10k samples are used in the evaluation stage for either dataset. Could you please elaborate a bit? Is the number of samples equivalent to the number of tasks in the eval stage?

Bump into some accuracy problem

I ran the 5-shot 5-way Mini-ImageNet experiment using the command in the README, but when it finished, I got this:

[results screenshot]

Are the hyperparameters optimal? Or is there anything else I should pay attention to? Thanks.

fetch_data might not be working correctly for mini-Imagenet?

Hi,
I followed the instructions, but the code for mini-ImageNet keeps exiting with:
OSError: cannot identify image file <_io.BufferedReader name='data/miniimagenet/train/n04515003/n04515003_15351.JPEG'>
Most of this folder (n04515003) consists of empty image files, yet fetch_data seemed to exit normally without any error messages.
I checked how many empty files there were under Mini-ImageNet's subfolders and it seems to be 53186 of 59981 files. Omniglot seems fine, however.
Any idea what could be wrong with the script, or what I could be doing wrong?

moving average in AdamOptimizer when conducting evaluation

It seems like training and evaluation share the same model.minimize_op. If so, evaluation will use the optimizer's moving averages when fine-tuning. In the original MAML code, the evaluation code restores only the trainable parameters, excluding the other statistics kept by the optimizer. Will this make any difference?
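For reference, a minimal TF1 sketch of resetting the optimizer's slot variables while keeping the trained weights, in the spirit of what MAML's evaluation code does (this is not something this repo does):

import tensorflow as tf

def reset_optimizer_state(sess):
    """Re-initialize every non-trainable global variable (Adam's moment
    estimates, beta power accumulators, etc.) while leaving the model's
    trainable weights untouched."""
    trainable = set(tf.trainable_variables())
    optimizer_vars = [v for v in tf.global_variables() if v not in trainable]
    sess.run(tf.variables_initializer(optimizer_vars))

Note that this also re-initializes batch norm moving statistics, which should be harmless here because the model runs batch norm with training=True throughout (see the batch norm issue above).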

Reptile seems to produce the same gradients as vanilla SGD

I ran the sine-wave code and printed out the outer-loop updated weights (without multiplying by the outer step size) alongside the weights from normal SGD, and they are the same, even when I set innerepochs larger than 1.
From the Reptile algorithm we can see that the update is

phi <- phi + epsilon * (W - phi), where W is the result of SGD on the sampled task,

so the only difference is that the outer loop multiplies by epsilon; it is only a learning-rate difference from SGD.
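A tiny numeric check of that observation for the single-inner-step case (a standalone sketch with a made-up quadratic loss, unrelated to the repo's code):

# One-dimensional quadratic loss L(w) = 0.5 * (w - 3)**2, so grad(w) = w - 3.
grad = lambda w: w - 3.0

w0 = 0.0
inner_lr, outer_step = 0.02, 0.1

# Reptile with a single inner SGD step:
w_inner = w0 - inner_lr * grad(w0)
w_reptile = w0 + outer_step * (w_inner - w0)

# Plain SGD with learning rate inner_lr * outer_step:
w_sgd = w0 - (inner_lr * outer_step) * grad(w0)

print(w_reptile, w_sgd)  # identical: with one inner step Reptile is rescaled SGD

With more than one inner step the two updates generally differ, because the later inner-loop gradients are evaluated at the already-updated weights.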

some problems about the dataset

Hi, I have recently been following this work, and I am writing some demos to run some experiments. However, I have run into some problems with the dataset: because of network restrictions in China, some images download with a size of 0 KB. Could anyone share the downloaded dataset with me?

When using the pre-trained model for retraining, the accuracy declines. What is the reason and is it normal?

# Imports and DATA_DIR below are assumed to match run_miniimagenet.py from this repo.
import random

import tensorflow as tf

from supervised_reptile.args import argument_parser, model_kwargs, train_kwargs, evaluate_kwargs
from supervised_reptile.eval import evaluate
from supervised_reptile.models import MiniImageNetModel
from supervised_reptile.miniimagenet import read_dataset
from supervised_reptile.train import train

DATA_DIR = 'data/miniimagenet'

args = argument_parser().parse_args()
random.seed(args.seed)

train_set, val_set, test_set = read_dataset(DATA_DIR)

model = MiniImageNetModel(args.classes, **model_kwargs(args))

eval_kwargs = evaluate_kwargs(args)

with tf.Session() as sess:
    if not args.pretrained:
        print('Training...')
        sess.run(tf.global_variables_initializer())
        # Restore the trainable variables from the existing Reptile checkpoint
        # before continuing training.
        saver_1 = tf.train.Saver(tf.trainable_variables())
        print(args.checkpoint)
        saver_1.restore(sess, tf.train.latest_checkpoint(args.checkpoint))
        print('Test accuracy: ' + str(evaluate(sess, model, test_set, **eval_kwargs)))
        graph = tf.get_default_graph()
        tf.summary.FileWriter('./logs', graph)
        train(sess, model.minimize_op, model, train_set, test_set, args.checkpoint, **train_kwargs(args))

When retraining from the model produced by Reptile training, I tested the saved model after the first round of training and the accuracy had dropped by about 20 points, to roughly 0.24. (Note: an evaluation run before retraining verified that the model had been loaded correctly; in fact, I only added a restore step on top of the original Reptile training code, and the training code and dataset were not changed.) The accuracy rose to 0.36 by round 200 and 0.42 by round 3000. The accuracy of the original model is about 0.46. The training method and dataset are exactly the same as those used to train the original model, and the model is loaded the same way. I don't know whether this is normal or what causes it.
