Currently I see two ways of using the `Trainer.test_task`

Some proposals about the `Trainer` logic (evojax, closed)

danielgafni commented on August 23, 2024

Some proposals about the `Trainer` logic

Comments (7)

lerrytang commented on August 23, 2024

Hi, thanks for raising this issue.
Before you start to write code, let us have a discussion to see if the PR is necessary.
The following are my thoughts for discussion:

1. Your argument makes sense. How about this solution? In training, we pass the training task and the validation task to the trainer. In tests, we pass the test task as both the train and the test tasks and also set `demo_mode=True` (ref). This tells the trainer to only evaluate the policy on the test task, with no training done.
2. If my proposed solution in 1 seems reasonable to you, then in tests the best model is equivalent to the last model and the problem is solved.
3. I personally don't like the idea of early stopping. The trainer already saves the model snapshot with the best validation score, and the mapping between this model and the training iteration can be traced in the training log.
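
The control flow described in the three points above can be sketched in plain Python. This is a toy stand-in and not evojax's `Trainer`: `ToyTrainer`, `evaluate`, the list-based "tasks", and the single-float "model" are all hypothetical names chosen for illustration. It only shows the idea of keeping the snapshot with the best validation score during training and of an evaluation-only run when `demo_mode=True`.

```python
from typing import Optional, Sequence


def evaluate(param: float, task: Sequence[float]) -> float:
    """Hypothetical score: negative mean squared error against the task targets."""
    return -sum((param - t) ** 2 for t in task) / len(task)


class ToyTrainer:
    """Toy illustration only; this is NOT the evojax Trainer API."""

    def __init__(self, train_task: Sequence[float], test_task: Sequence[float],
                 max_iter: int = 30, step: float = 0.1) -> None:
        self.train_task = train_task
        self.test_task = test_task
        self.max_iter = max_iter
        self.step = step
        self.param = 0.0                      # current "model"
        self.best_param = 0.0                 # snapshot with best test_task score
        self.best_score = float("-inf")
        self.best_iter: Optional[int] = None

    def run(self, demo_mode: bool = False) -> float:
        if demo_mode:
            # Evaluation only: score the stored snapshot, no training.
            return evaluate(self.best_param, self.test_task)
        for i in range(self.max_iter):
            # Simple hill climbing on the training task.
            candidates = [self.param - self.step, self.param, self.param + self.step]
            self.param = max(candidates, key=lambda p: evaluate(p, self.train_task))
            # The snapshot is chosen by the score on test_task (validation here).
            score = evaluate(self.param, self.test_task)
            if score > self.best_score:
                self.best_score, self.best_param, self.best_iter = score, self.param, i
        return self.best_score


train_data, val_data, test_data = [1.0, 1.2], [0.9, 1.1], [0.95, 1.05]

# Training: train task plus validation task.
trainer = ToyTrainer(train_task=train_data, test_task=val_data)
trainer.run()

# Testing: the test task is passed as both tasks and only evaluated.
tester = ToyTrainer(train_task=test_data, test_task=test_data)
tester.best_param = trainer.best_param  # stands in for loading the saved snapshot
test_score = tester.run(demo_mode=True)
print(trainer.best_iter, round(trainer.best_param, 3), round(test_score, 4))
```

In this sketch the iteration that produced the best snapshot is kept in `best_iter`, which is one way to make the "mapping between this model and the training iteration" available without reading the log.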

danielgafni commented on August 23, 2024

Hey! Thanks for the quick response.

I agree with your points; it's definitely possible to keep the current interface. This usage pattern should be documented somewhere, though.

Re: early stopping - it can be necessary in some scenarios. The model is indeed being saved after every iteration, but the `trainer.run` method doesn't return information about the best model. You can find it in the logs, sure, but there has to be a programmatic way to do it. Otherwise it's impossible to load the best model automatically. Maybe a solution here is to log the model based on `val_score`, not on `train_score`. Then the `models/best.npz` model would mean "model with best validation score". What do you think about it?

lerrytang commented on August 23, 2024

> Maybe a solution here is to log the model based on val_score, not on train_score. Then the models/best.npz model would mean "model with best validation score". What do you think about it?

I made sure the best model (and its log) is based on the validation score (related source code). Can you double-check this part?

danielgafni commented on August 23, 2024

Oh, you are right, sorry for the confusion.

danielgafni commented on August 23, 2024

@lerrytang so what about early stopping? Can we add early stopping `patience` and `threshold` parameters to the trainer? E.g. stop the training loop if the test score doesn't improve by `threshold` over the last `patience` iterations.
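
A patience/threshold rule of the kind proposed here could look roughly like the sketch below. `EarlyStopper` and its `update` method are hypothetical names, not part of evojax, and the score list is made up purely to exercise the rule.

```python
class EarlyStopper:
    """Hypothetical helper: signal a stop once the score has failed to beat
    the best score by more than `threshold` for `patience` checks in a row."""

    def __init__(self, patience: int, threshold: float = 0.0) -> None:
        self.patience = patience
        self.threshold = threshold
        self.best = float("-inf")
        self.stale = 0  # consecutive checks without a sufficient improvement

    def update(self, score: float) -> bool:
        """Record a new test score; return True if training should stop."""
        if score > self.best + self.threshold:
            self.best = score
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# Made-up test scores, one per evaluation.
scores = [0.10, 0.30, 0.31, 0.30, 0.32, 0.31, 0.50]
stopper = EarlyStopper(patience=3, threshold=0.05)
stopped_at = None
for i, score in enumerate(scores):
    if stopper.update(score):
        stopped_at = i
        break
print(stopped_at, stopper.best)
```

With these values the loop stops at index 4: after 0.30, none of the next three scores improves on the best by more than 0.05, so the later jump to 0.50 is never reached.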

lerrytang commented on August 23, 2024

While it is common to use early stopping in supervised learning problems, early stopping is misleading in solving tasks with neuroevolution (from my experience). For example, one often sees the learning curve (test scores) dip for quite a long time before rising up when training a locomotion controller. You may think we can put a knob on how many iterations we should tolerate before we see any progress, but I don't think this extra hyper-parameter is worth the trouble.
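
The sensitivity to that knob can be illustrated with a small sketch. The curve below is synthetic, written by hand to dip and then rise; it is not measured data from any experiment. `first_stop` is a hypothetical helper that reports where a patience-only rule would halt.

```python
from typing import Optional, Sequence


def first_stop(scores: Sequence[float], patience: int) -> Optional[int]:
    """Return the index at which a patience-only rule would stop, or None
    if the best score keeps being renewed often enough."""
    best = float("-inf")
    stale = 0
    for i, score in enumerate(scores):
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Synthetic learning curve: rises, dips for a while, then rises well past the old peak.
curve = [1.0, 2.0, 3.0, 2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 8.0]

short_patience = first_stop(curve, patience=4)
long_patience = first_stop(curve, patience=8)
print(short_patience, long_patience)
```

On this made-up curve a patience of 4 halts at index 6 with a best score of 3.0, while a patience of 8 never triggers and the run reaches 8.0. The right value depends on how long the dip lasts, which is generally not known in advance.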

danielgafni commented on August 23, 2024

So basically you are saying it's always better to run the ES for a lot of iterations? I'll do some reward/iter plotting for my problem, which involves time series (which means my train and validation data might come from different distributions). It's probably very data-dependent. Will tell you the results once we add the custom `log_fn` :)
