openai / supervised-reptile
Code for the paper "On First-Order Meta-Learning Algorithms"
Home Page: https://arxiv.org/abs/1803.02999
License: MIT License
Hi, I have recently been following this work and am writing some demos to run a few experiments. However, I have run into a problem with the dataset: because of network restrictions in China, some of the downloaded images end up with a size of 0 KB. Could anyone share a downloaded copy of the dataset with me?
Line 83 of meta.py:
# Compute the meta-gradient and return it; the gradient is from one episode.
# The meta-learner will merge the losses from different episodes and sum over them.
loss, pred = self.net_pi(query_x, query_y)
grads_pi = autograd.grad(loss, self.net_pi.parameters(), create_graph=True)
Shouldn't the meta-gradient be (grad_of_net_pi - grad_of_net)?
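For reference, my reading of Section 5.1 of the paper (not an official answer) is that the two first-order meta-gradients are defined differently: FOMAML uses the gradient of the loss at the adapted parameters (which is what the snippet above computes on the query set), while Reptile uses the displacement of the weights over the inner loop. With theta the initial weights and phi the weights after the k inner SGD steps of step size alpha:

g_{\mathrm{FOMAML}} = \nabla_{\phi} L(\phi), \qquad
g_{\mathrm{Reptile}} = \frac{1}{\alpha}\left(\theta - \phi\right)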
I used the command below to perform the experiment (environment: Python 3.6 / TensorFlow 1.10):
# 1-shot 5-way Mini-ImageNet.
python -u run_miniimagenet.py --shots 1 --inner-batch 10 --inner-iters 8 --meta-step 1 --meta-batch 5 --meta-iters 100000 --eval-batch 15 --eval-iters 50 --learning-rate 0.001 --meta-step-final 0 --train-shots 5 --checkpoint ckpt_m15t
The last several lines of output:
batch 99950: train=0.200000 test=0.200000
batch 99960: train=0.000000 test=0.000000
batch 99970: train=0.200000 test=0.000000
batch 99980: train=0.200000 test=0.200000
batch 99990: train=0.200000 test=0.400000
Evaluating...
Train accuracy: 0.28016
Validation accuracy: 0.26512
Test accuracy: 0.24262
However, in the paper, the test accuracy is around 47%.
When I run the command below,
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive
I get this error:
Traceback (most recent call last):
File "run_omniglot.py", line 9, in
from supervised_reptile.args import argument_parser, model_kwargs, train_kwargs, evaluate_kwargs
File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/args.py", line 10, in
from .reptile import Reptile, FOML
File "/Users/nazu/meta_learning/supervised-reptile/supervised_reptile/reptile.py", line 149
def __init__(self, *args, tail_shots=None, **kwargs):
^
SyntaxError: invalid syntax
The Python version I am using is 2.7.13, with TensorFlow 1.6.0.
Can you please tell me how I can solve this error?
TIA
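For context (just my reading of the traceback, not an official answer): the line it points at uses Python 3 keyword-only argument syntax, which Python 2.7 rejects as a SyntaxError, so this file at least requires Python 3:

# Keyword-only arguments after *args exist only in Python 3;
# Python 2.7 raises SyntaxError on this line.
def __init__(self, *args, tail_shots=None, **kwargs):
    pass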
def _test_predictions(self, train_set, test_set, input_ph, predictions):
    if self._transductive:
        # Transductive: classify all test inputs together in a single batch.
        inputs, _ = zip(*test_set)
        return self.session.run(predictions, feed_dict={input_ph: inputs})
    res = []
    for test_sample in test_set:
        # Non-transductive: feed the training inputs plus one test input at a time
        # and keep only the prediction for that last input.
        inputs, _ = zip(*train_set)
        inputs += (test_sample[0],)
        res.append(self.session.run(predictions, feed_dict={input_ph: inputs})[-1])
    return res
Why do you include train_set in the _test_predictions function, given that those samples have already been learned in
self.session.run(minimize_op, feed_dict={input_ph: inputs, label_ph: labels})
which is line 119 of reptile.py?
Hi, I noticed that the model used for Mini-ImageNet is slightly different from the model in MAML. In MAML, each conv layer is followed by a batch-norm layer and then a max-pool layer, whereas in this code each conv layer is followed by a max-pool layer and then a batch-norm layer. Is there any reason why you swapped the order of batch norm and max pooling?
Also, does the Mini-ImageNet model in Reptile tend to overfit when the number of kernels in each conv layer is larger than 32?
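For anyone comparing the two orderings, here is a minimal sketch of one convolutional block in each style using the TF 1.x layers API (the filter count, kernel size, and activation placement are illustrative assumptions, not copied from either codebase):

import tensorflow as tf

def block_this_repo_order(out):
    # conv -> maxpool -> batchnorm, the ordering described above for this code
    out = tf.layers.conv2d(out, 32, 3, padding='same', activation=tf.nn.relu)
    out = tf.layers.max_pooling2d(out, 2, 2, padding='same')
    return tf.layers.batch_normalization(out, training=True)

def block_maml_order(out):
    # conv -> batchnorm -> maxpool, the ordering used in MAML
    out = tf.layers.conv2d(out, 32, 3, padding='same', activation=tf.nn.relu)
    out = tf.layers.batch_normalization(out, training=True)
    return tf.layers.max_pooling2d(out, 2, 2, padding='same')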
I ran the sine code and printed out the outer-loop updated weights (without multiplying by the outer step size) together with the normal SGD weights, and they are the same, even when I set innerepochs to be larger than 1.
From the Reptile algorithm we can see that the outer update is θ ← θ + ε(φ − θ), where φ is the result of the inner SGD, so the only difference is that the outer loop multiplies (φ − θ) by an epsilon; it seems to differ from SGD only by a learning rate.
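To check this concretely, here is a minimal toy sketch (my own, not from the repo) on a 1-D quadratic loss. With a single inner step the Reptile direction (phi - theta) equals the plain SGD step, so only the outer step size epsilon distinguishes them; with more than one inner step the directions generally differ:

def grad(theta, x, y):
    # gradient of 0.5 * (theta * x - y)**2 with respect to theta
    return (theta * x - y) * x

def reptile_direction(theta, x, y, inner_lr, k):
    # run k inner SGD steps and return the displacement phi - theta
    phi = theta
    for _ in range(k):
        phi = phi - inner_lr * grad(phi, x, y)
    return phi - theta

theta, x, y, inner_lr = 0.0, 2.0, 1.0, 0.1
sgd_step = -inner_lr * grad(theta, x, y)
print('SGD step:              ', sgd_step)                                     # 0.2
print('Reptile direction, k=1:', reptile_direction(theta, x, y, inner_lr, 1))  # 0.2, identical
print('Reptile direction, k=3:', reptile_direction(theta, x, y, inner_lr, 3))  # differs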
I ran the following command:
# transductive 1-shot 5-way Omniglot.
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive
The output is "batch XXX: train=0.000000 test=0.000000".
Is something wrong?
Hi Guys,
I had one query about the CNN-based model that you used. I was trying to see if you have any results on larger, deeper models (overparameterized nets?); I would generally expect these to do better than the 4-layer networks that you train with now. I understand that you wanted to compare results with MAML, so you tried to keep the architectures similar, but is there any info you can give me on that?
Thanks!
Hi,
I followed the instructions but the code for mini-imagenet keeps exiting because:
OSError: cannot identify image file <_io.BufferedReader name='data/miniimagenet/train/n04515003/n04515003_15351.JPEG'>
Most of the files in this folder (n04515003) are empty images; however, fetch_data seemed to exit normally without any error messages.
I checked how many empty files there were under the Mini-ImageNet subfolders, and it seems to be 53186 out of 59981 files.
Omniglot seems fine, however.
Any idea what could be wrong with the script or what I could be doing wrong?
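In case it helps others hitting the same download problem, here is the quick check I ran to count zero-byte files (my own snippet, assuming the data/miniimagenet layout shown in the error above):

import os

root = 'data/miniimagenet'
empty, total = 0, 0
for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        total += 1
        if os.path.getsize(os.path.join(dirpath, name)) == 0:
            empty += 1
print('%d of %d files are empty' % (empty, total))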
Hi,
Nice paper and thanks for sharing the code.
I noticed that the commands given in this repo are different from Appendix A of the paper.
In the paper, hyperparameters are shared between experiments and set to the same values in Tables 3 and 4.
Is there much performance difference between the different hyperparameters?
What should I do if I want to run your algorithm on a new dataset?
Thanks so much!
args = argument_parser().parse_args()
random.seed(args.seed)
train_set, val_set, test_set = read_dataset(DATA_DIR)
model = MiniImageNetModel(args.classes, **model_kwargs(args))
eval_kwargs = evaluate_kwargs(args)
with tf.Session() as sess:
    if not args.pretrained:
        print('Training...')
        sess.run(tf.global_variables_initializer())
        saver_1 = tf.train.Saver(tf.trainable_variables())
        print(args.checkpoint)
        saver_1.restore(sess, tf.train.latest_checkpoint(args.checkpoint))
        print('Test accuracy: ' + str(evaluate(sess, model, test_set, **eval_kwargs)))
        graph = tf.get_default_graph()
        tf.summary.FileWriter('./logs', graph)
        train(sess, model.minimize_op, model, train_set, test_set, args.checkpoint, **train_kwargs(args))
When using the model produced by Reptile training: after restoring the saved model and running the first round of retraining, I evaluated the saved model and the accuracy had dropped by about 20 points, to roughly 0.24. (Note: an evaluation run before retraining verified that the model had been loaded correctly. In fact, I only added a restore step before training on top of the original Reptile code; the training code and the dataset were not changed.) The accuracy rose to 0.36 at iteration 200 and to 0.42 at iteration 3000. The accuracy of the original model is about 0.46. The training method and dataset are exactly the same as those used to train the original model, and the model loading is the same. I don't know whether this is normal or what causes it.
Is there any difference in the reinforcement-learning implementation? MAML has two repos, one for supervised learning and one for RL; I wonder why Reptile has no RL version.
How many tasks did you use to calculate the mean accuracy and the confidence interval? From the code, it looks like the number of samples used in the evaluation stage is 10k for either dataset. Could you please elaborate a bit? Is the number of samples equivalent to the number of tasks in the eval stage?
Why is --eval-batch set to 15 in this setting?
It doesn't seem to make much sense since you just repeat the same samples 3 times in that case.
What part changes if I set transductive=True? I can't figure it out.
Hello,
I'm hoping to confirm that the hyperparameters specified in your paper are correct. Specifically, for Mini-ImageNet, were 100k meta steps taken during training? I ask because some of the default values in the code seem to be different.
Any plans to release a multi-GPU version of this? It looks like we should be able to run the meta_batch_size iterations of the outer loop in reptile.train_step in parallel on separate GPUs.
(I may have a shot at implementing it if there are no plans ATM, and if you think it'd give a nontrivial speedup.)
~ Ben
I believe num_shots in the code specifies the number of examples for each class. In the train function the initialization is given as "num_shots = train_shots or num_shots".
Now for 1-shot 5-way example (given),
python -u run_omniglot.py --shots 1 --inner-batch 25 --inner-iters 3 --meta-step 1 --meta-batch 10 --meta-iters 100000 --eval-batch 25 --eval-iters 5 --learning-rate 0.001 --meta-step-final 0 --train-shots 15 --checkpoint ckpt_o15t --transductive
For the above case, train-shots = 15 and shots = 1, so num_shots in the code would be 15, but shouldn't it be 1, since there is one example per class (1-shot, 5-way)? Maybe I am missing something; can you please clarify?
In the paper, I found this sentence,
Unlike FOMAML, Reptile doesn’t need a training-test split for each task, which
may make it a more natural choice in certain settings
(https://arxiv.org/pdf/1803.02999.pdf)
Does that mean I don't need to split the data (into a support set and a query set for each episode in the training set)?
Am I right?
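My understanding, as a rough paraphrase of the paper (not the repo's actual code): FOMAML takes its outer gradient on held-out samples at the adapted parameters, so each task needs a support/query split, whereas Reptile's outer update only needs the adapted weights themselves, so all of a task's samples can go into the inner SGD:

def reptile_task_update(theta, task_samples, sgd):
    # Reptile: adapt on all of the task's samples, then move toward the result.
    phi = sgd(theta, task_samples)
    return phi - theta              # outer direction, scaled by epsilon outside

def fomaml_task_update(theta, support, query, sgd, grad):
    # FOMAML: adapt on the support samples, take the outer gradient on the query samples.
    phi = sgd(theta, support)
    return -grad(phi, query)        # outer direction, scaled by the meta step size outside

Here sgd and grad are stand-ins for the inner-loop optimizer and the loss gradient.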
In https://arxiv.org/abs/1803.02999 you show results for 5-shot 5-way Reptile + Transduction. There is no such configuration in the README, though. Could you share the hyperparameters used for transductive 5-shot 5-way?
Why did you implement it so that batch norm is in training mode all the time?
out = tf.layers.batch_normalization(out, training=True)
Shouldn't it be turned off during meta-test evaluation?
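A minimal sketch (my own, not from this repo) of how the mode could instead be exposed through a placeholder, so batch statistics are used during training and the stored moving averages at evaluation time; the input shape is an arbitrary example:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=(None, 28, 28, 64))
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')

# training=True  -> normalize with the current batch statistics (the repo's behavior)
# training=False -> normalize with the stored moving averages
out = tf.layers.batch_normalization(inputs, training=is_training)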
Hello,
It would be nice to have a summary of the model and a visual representation of it in the README file.
Hello,
Great work indeed!! I hadn't realized that the Omniglot dataset path is the following: https://github.com/brendenlake/omniglot/tree/master/python while inside fetch_data.sh it is as follows:
OMNIGLOT_URL=https://raw.githubusercontent.com/brendenlake/omniglot/master/python
Any guidance on using Reptile for time-series data?
It seems like training and evaluation share the same model.minimize_op. If so, the evaluation will use the moving averages in the optimizer when doing the fine-tuning. In the original MAML code, the evaluation code restores only the trainable parameters, excluding the other statistics in the optimizer. Does this make any difference?
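For what it's worth, here is a small generic TF 1.x sketch (not this repo's code) of the distinction being asked about: a saver built from tf.trainable_variables() restores only the weights, leaving the optimizer's statistics (e.g. Adam's moment estimates) at their initial values, while a saver over tf.global_variables() restores them as well:

import tensorflow as tf

w = tf.Variable(tf.zeros([5]), name='w')
loss = tf.reduce_sum(tf.square(w))
minimize_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # creates Adam slot variables

weights_only_saver = tf.train.Saver(tf.trainable_variables())  # weights, no optimizer slots
full_saver = tf.train.Saver(tf.global_variables())             # weights + optimizer slots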
Hi, I think there is a typo on the second-to-last line of Algorithm 2 on page 3 of the paper https://arxiv.org/abs/1803.02999 (March 8 version). Should it be 1 over n instead of 1 over k, since we're averaging over tasks rather than over SGD steps?
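For reference, the batched update in question, written the way the comment above suggests it should read (with n sampled tasks per outer iteration and \widetilde{W}_i the weights adapted on task i):

\theta \leftarrow \theta + \epsilon \, \frac{1}{n} \sum_{i=1}^{n} \left( \widetilde{W}_i - \theta \right)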