openai / supervised-reptile
Code for the paper "On First-Order Meta-Learning Algorithms"
Home Page: https://arxiv.org/abs/1803.02999
License: MIT License
Hi, I have recently been following this work and am writing some demos to run a few experiments. However, I have run into a problem with the dataset: because of network restrictions in China, some of the downloaded images end up with a size of 0 KB. Could anyone share a downloaded copy of the dataset with me?
Line 83 of meta.py:
# Compute the meta-gradient and return it; the gradient is from one episode.
# The meta-learner will merge the losses from different episodes and sum over them.
loss, pred = self.net_pi(query_x, query_y)
grads_pi = autograd.grad(loss, self.net_pi.parameters(), create_graph=True)
Shouldn't the meta-gradient be (grad_of_net_pi - grad_of_net)?
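For reference, my reading of Section 5.1 of the paper (not an official answer) is that the two first-order meta-gradients are defined differently: FOMAML uses the gradient of the loss at the adapted parameters (which is what the snippet above computes on the query set), while Reptile uses the displacement of the weights over the inner loop. With theta the initial weights and phi the weights after the k inner SGD steps of step size alpha:

g_{\mathrm{FOMAML}} = \nabla_{\phi} L(\phi), \qquad
g_{\mathrm{Reptile}} = \frac{1}{\alpha}\left(\theta - \phi\right)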
I used the command below to perform the experiment (environment: Python 3.6 / TensorFlow 1.10):
# 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 15 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 5 --checkpoint ckpt_m15t
The last several lines of output:
batch 99950: train=0.200000 test=0.200000
batch 99960: train=0.000000 test=0.000000
batch 99970: train=0.200000 test=0.000000
batch 99980: train=0.200000 test=0.200000
batch 99990: train=0.200000 test=0.400000
Evaluating...
Train accuracy: 0.28016
Validation accuracy: 0.26512
Test accuracy: 0.24262
However, in the paper, the test accuracy is around 47%.
When I run the command below,
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive
I get this error:
Traceback (most recent call last):
File "run_omniglot.py", line 9, in
from supervised_reptile.args import argument_parser, model_kwargs, train_kwargs, evaluate_kwargs
File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/args.py", line 10, in
from .reptile import Reptile, FOML
File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/reptile.py", line 149
def __init__(self, *args, tail_shots=None, **kwargs):
^
SyntaxError: invalid syntax
The Python version I am using is 2.7.13, with TensorFlow 1.6.0.
Can you please tell me how I can solve this error?
TIA
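For context (just my reading of the traceback, not an official answer): the line it points at uses Python 3 keyword-only argument syntax, which Python 2.7 rejects as a SyntaxError, so this file at least requires Python 3:

# Keyword-only arguments after *args exist only in Python 3;
# Python 2.7 raises SyntaxError on this line.
def __init__(self, *args, tail_shots=None, **kwargs):
    pass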
def _test_predictions(self, train_set, test_set, input_ph, predictions):
    if self._transductive:
        # Transductive: classify all test inputs together in a single batch.
        inputs, _ = zip(*test_set)
        return self.session.run(predictions, feed_dict={input_ph: inputs})
    res = []
    for test_sample in test_set:
        # Non-transductive: feed the training inputs plus one test input at a time
        # and keep only the prediction for that last input.
        inputs, _ = zip(*train_set)
        inputs += (test_sample[0],)
        res.append(self.session.run(predictions, feed_dict={input_ph: inputs})[-1])
    return res
Why do you include train_set in the _test_predictions function, given that those samples have already been learned in
self.session.run(minimize_op, feed_dict={input_ph: inputs, label_ph: labels})
which is line 119 of reptile.py?
Hi, I noticed that the model used for Mini-ImageNet is slightly different from the model in MAML. In MAML, each conv layer is followed by a batch-norm layer and then a max-pool layer, whereas in this code each conv layer is followed by a max-pool layer and then a batch-norm layer. Is there any reason why you swapped the order of batch norm and max pooling?
Also, does the Mini-ImageNet model in Reptile tend to overfit when the number of kernels in each conv layer is larger than 32?
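For anyone comparing the two orderings, here is a minimal sketch of one convolutional block in each style using the TF 1.x layers API (the filter count, kernel size, and activation placement are illustrative assumptions, not copied from either codebase):

import tensorflow as tf

def block_this_repo_order(out):
    # conv -> maxpool -> batchnorm, the ordering described above for this code
    out = tf.layers.conv2d(out, 32, 3, padding='same', activation=tf.nn.relu)
    out = tf.layers.max_pooling2d(out, 2, 2, padding='same')
    return tf.layers.batch_normalization(out, training=True)

def block_maml_order(out):
    # conv -> batchnorm -> maxpool, the ordering used in MAML
    out = tf.layers.conv2d(out, 32, 3, padding='same', activation=tf.nn.relu)
    out = tf.layers.batch_normalization(out, training=True)
    return tf.layers.max_pooling2d(out, 2, 2, padding='same')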
I ran the sine code and printed out the outer-loop updated weights (without multiplying by the outer step size) together with the normal SGD weights, and they are the same, even when I set innerepochs to be larger than 1.
From the Reptile algorithm we can see that the outer update is θ ← θ + ε(φ − θ), where φ is the result of the inner SGD, so the only difference is that the outer loop multiplies (φ − θ) by an epsilon; it seems to differ from SGD only by a learning rate.
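To check this concretely, here is a minimal toy sketch (my own, not from the repo) on a 1-D quadratic loss. With a single inner step the Reptile direction (phi - theta) equals the plain SGD step, so only the outer step size epsilon distinguishes them; with more than one inner step the directions generally differ:

def grad(theta, x, y):
    # gradient of 0.5 * (theta * x - y)**2 with respect to theta
    return (theta * x - y) * x

def reptile_direction(theta, x, y, inner_lr, k):
    # run k inner SGD steps and return the displacement phi - theta
    phi = theta
    for _ in range(k):
        phi = phi - inner_lr * grad(phi, x, y)
    return phi - theta

theta, x, y, inner_lr = 0.0, 2.0, 1.0, 0.1
sgd_step = -inner_lr * grad(theta, x, y)
print('SGD step:              ', sgd_step)                                     # 0.2
print('Reptile direction, k=1:', reptile_direction(theta, x, y, inner_lr, 1))  # 0.2, identical
print('Reptile direction, k=3:', reptile_direction(theta, x, y, inner_lr, 3))  # differs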
I ran the following command:
# transductive 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive
The output is "batch XXX: train=0.000000 test=0.000000".
Is something wrong?
Hi Guys,
I had one query about the CNN-based model that you used. I was trying to see if you have any results on larger, deeper models (overparameterized nets?); I would generally expect these to do better than the 4-layer networks that you train with now. I understand that you wanted to compare results with MAML, so you tried to keep the architectures similar, but is there any info you can give me on that?
Thanks!
Hi,
I followed the instructions but the code for mini-imagenet keeps exiting because:
OSError: cannot identify image file <_io.BufferedReader name='data/miniimagenet/train/n04515003/n04515003_15351.JPEG'>
Most of the files in this folder (n04515003) are empty images; however, fetch_data seemed to exit normally without any error messages.
I checked how many empty files there were under the Mini-ImageNet subfolders, and it seems to be 53186 out of 59981 files.
Omniglot seems fine, however.
Any idea what could be wrong with the script or what I could be doing wrong?
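In case it helps others hitting the same download problem, here is the quick check I ran to count zero-byte files (my own snippet, assuming the data/miniimagenet layout shown in the error above):

import os

root = 'data/miniimagenet'
empty, total = 0, 0
for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        total += 1
        if os.path.getsize(os.path.join(dirpath, name)) == 0:
            empty += 1
print('%d of %d files are empty' % (empty, total))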
Hi,
Nice paper and thanks for sharing the code.
I noticed that the commands given in this repo are different from Appendix A of the paper.
In the paper, hyperparameters are shared between experiments and set to the same values in Tables 3 and 4.
Is there much performance difference between the different hyperparameters?
What should I do if I want to run your algorithm on a new dataset?
Thanks so much!
args = argument_parser().parse_args()
random.seed(args.seed)
train_set, val_set, test_set = read_dataset(DATA_DIR)
model = MiniImageNetModel(args.classes, **model_kwargs(args))
eval_kwargs = evaluate_kwargs(args)
with tf.Session() as sess:
    if not args.pretrained:
        print('Training...')
        sess.run(tf.global_variables_initializer())
        saver_1 = tf.train.Saver(tf.trainable_variables())
        print(args.checkpoint)
        saver_1.restore(sess, tf.train.latest_checkpoint(args.checkpoint))
        print('Test accuracy: ' + str(evaluate(sess, model, test_set, **eval_kwargs)))
        graph = tf.get_default_graph()
        tf.summary.FileWriter('./logs', graph)
        train(sess, model.minimize_op, model, train_set, test_set, args.checkpoint, **train_kwargs(args))
When using the model produced by Reptile training: after restoring the saved model and running the first round of retraining, I evaluated the saved model and the accuracy had dropped by about 20 points, to roughly 0.24. (Note: an evaluation run before retraining verified that the model had been loaded correctly. In fact, I only added a restore step before training on top of the original Reptile code; the training code and the dataset were not changed.) The accuracy rose to 0.36 at iteration 200 and to 0.42 at iteration 3000. The accuracy of the original model is about 0.46. The training method and dataset are exactly the same as those used to train the original model, and the model loading is the same. I don't know whether this is normal or what causes it.
Is there any difference in the reinforcement-learning implementation? MAML has two repos, one for supervised learning and one for RL; I wonder why Reptile has no RL version.
How many tasks did you use to calculate the mean accuracy and the confidence interval? From the code, it looks like the number of samples used in the evaluation stage is 10k for either dataset. Could you please elaborate a bit? Is the number of samples equivalent to the number of tasks in the eval stage?
Why is --eval-batch set to 15 in this setting?
It doesn't seem to make much sense since you just repeat the same samples 3 times in that case.
What part changes if I set transductive=True? I can't figure it out.
Hello,
I'm hoping to confirm that the hyperparameters specified in your paper are correct. Specifically, for Mini-ImageNet, were 100k meta steps taken during training? I ask because some of the default values in the code seem to be different.
Any plans to release a multi-GPU version of this? It looks like we should be able to run the meta_batch_size iterations of the outer loop in reptile.train_step in parallel on separate GPUs.
(I may have a shot at implementing it if there are no plans ATM, and if you think it'd give a nontrivial speedup.)
~ Ben
I believe num_shots in the code specifies the number of examples for each class. In the train function the initialization is given as "num_shots = train_shots or num_shots".
Now for 1-shot 5-way example (given),
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive
For the above case, train-shots = 15 and shots = 1, so num_shots in the code would be 15, but shouldn't it be 1, since there is one example per class (1-shot, 5-way)? Maybe I am missing something; can you please clarify?
In the paper, I found this sentence,
Unlike FOMAML, Reptile doesn’t need a training-test split for each task, which
may make it a more natural choice in certain settings
(https://arxiv.org/pdf/1803.02999.pdf)
Does that mean I don't need to split the data (into a support set and a query set for each episode in the training set)?
Am I right?
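My understanding, as a rough paraphrase of the paper (not the repo's actual code): FOMAML takes its outer gradient on held-out samples at the adapted parameters, so each task needs a support/query split, whereas Reptile's outer update only needs the adapted weights themselves, so all of a task's samples can go into the inner SGD:

def reptile_task_update(theta, task_samples, sgd):
    # Reptile: adapt on all of the task's samples, then move toward the result.
    phi = sgd(theta, task_samples)
    return phi - theta              # outer direction, scaled by epsilon outside

def fomaml_task_update(theta, support, query, sgd, grad):
    # FOMAML: adapt on the support samples, take the outer gradient on the query samples.
    phi = sgd(theta, support)
    return -grad(phi, query)        # outer direction, scaled by the meta step size outside

Here sgd and grad are stand-ins for the inner-loop optimizer and the loss gradient.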
In https://arxiv.org/abs/1803.02999 you show results for 5-shot 5-way Reptile + Transduction. There is no such configuration in the README, though. Could you share the hyperparameters used for transductive 5-shot 5-way?
Why did you implement it so that batch norm is in training mode all the time?
out = tf.layers.batch_normalization(out, training=True)
Shouldn't it be turned off during meta-test evaluation?
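A minimal sketch (my own, not from this repo) of how the mode could instead be exposed through a placeholder, so batch statistics are used during training and the stored moving averages at evaluation time; the input shape is an arbitrary example:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=(None, 28, 28, 64))
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')

# training=True  -> normalize with the current batch statistics (the repo's behavior)
# training=False -> normalize with the stored moving averages
out = tf.layers.batch_normalization(inputs, training=is_training)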
Hello,
It would be nice to have a summary of the model and a visual representation of it in the README file.
Hello,
Great work indeed!! I hadn't realized that the Omniglot dataset path is the following: https://github.com/brendenlake/omniglot/tree/master/python while inside fetch_data.sh it is as follows:
OMNIGLOT_URL=https://raw.githubusercontent.com/brendenlake/omniglot/master/python
Any guidance on using Reptile for time-series data?
It seems like training and evaluation share the same model.minimize_op. If so, the evaluation will use the moving averages in the optimizer when doing the fine-tuning. In the original MAML code, the evaluation code restores only the trainable parameters, excluding the other statistics in the optimizer. Does this make any difference?
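For what it's worth, here is a small generic TF 1.x sketch (not this repo's code) of the distinction being asked about: a saver built from tf.trainable_variables() restores only the weights, leaving the optimizer's statistics (e.g. Adam's moment estimates) at their initial values, while a saver over tf.global_variables() restores them as well:

import tensorflow as tf

w = tf.Variable(tf.zeros([5]), name='w')
loss = tf.reduce_sum(tf.square(w))
minimize_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # creates Adam slot variables

weights_only_saver = tf.train.Saver(tf.trainable_variables())  # weights, no optimizer slots
full_saver = tf.train.Saver(tf.global_variables())             # weights + optimizer slots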
Hi, I think there is a typo on the second-to-last line of Algorithm 2 on page 3 of the paper https://arxiv.org/abs/1803.02999 (March 8 version). Should it be 1 over n instead of 1 over k, since we're averaging over tasks rather than over SGD steps?
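For reference, the batched update in question, written the way the comment above suggests it should read (with n sampled tasks per outer iteration and \widetilde{W}_i the weights adapted on task i):

\theta \leftarrow \theta + \epsilon \, \frac{1}{n} \sum_{i=1}^{n} \left( \widetilde{W}_i - \theta \right)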