Comments (21)

AranKomat avatar AranKomat commented on July 23, 2024 2

@gooooloo Well, that makes sense. But when I said 150 stones in an average Go game, I didn't take the symmetries into account, so for a fair comparison I didn't consider the symmetries of Reversi either, which has the same set of symmetries as Go. Sorry for not being explicit. Since what we're concerned with is the ratio between our training/self-play ratio (5.3 after symmetries) and AZ's training/self-play ratio (about 0.44, or 0.44/8 = 0.055 after symmetries), there's still roughly a 100x difference, which is reasonable given the number of GPUs we're using.

AranKomat avatar AranKomat commented on July 23, 2024 1

Thanks for your answer. In the case of Go with AlphaZero, 700k minibatches (2048 positions each) were trained and 21 million self-play games were generated. Assuming that each game ended with 150 stones (positions) placed, 700k x 2048 / (21M x 150) ≈ 0.44 [trained position]/[self-play-generated position], which is much less than 68. So, I guess you can improve your performance with more self-plays per update. Maybe the performance gain by increasing the sims/move from 100 to 800 was because you had a small self-play/training ratio, that is, you had too little exploration. Since having more games generated means more diverse data than having more sims/move, spending more time on self-play may be more beneficial than more sims/move. But in practice, since your algorithm doesn't allow multi-processing (of multiple games) as done by Akababa, my suggestion may not be useful. But this may be useful for @gooooloo.
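
For reference, the arithmetic behind that estimate, laid out as a minimal sketch (the 700k / 2048 / 21M counts are those quoted above; 150 positions per game is the same assumption):

```python
# Rough trained/generated position ratio for AlphaZero, from the figures above.
minibatches = 700_000           # gradient updates
batch_size = 2048               # positions per minibatch
selfplay_games = 21_000_000     # self-play games
positions_per_game = 150        # assumed average Go game length

ratio = (minibatches * batch_size) / (selfplay_games * positions_per_game)
print(f"{ratio:.2f}")           # 0.46 -- quoted as "about 0.44" in this thread
```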

mokemokechicken avatar mokemokechicken commented on July 23, 2024 1

I am testing on feature/multiprocess_selfplay.

With 16 parallel self-play workers:

  • 580 seconds per 1 self-play game (16 in parallel) -> 36 seconds per self-play game
  • 400 positions per 1 self-play game
  • 225 seconds per 200 steps (bs=256) -> 225 seconds per 200*256 positions

so

  • Training: 228 positions / second (=200*256/225)
  • SelfPlay: 11 positions / second (=400/36)
  • Training/SelfPlay Ratio: 21 (=228/11)
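
The same arithmetic can be wrapped in a small helper; a sketch for checking these numbers (the function name and signature are illustrative, not from the repo):

```python
def training_selfplay_ratio(train_steps, batch_size, train_seconds,
                            positions_per_game, seconds_per_game):
    """Positions consumed by training vs. positions generated by self-play,
    both per wall-clock second."""
    training_pos_per_sec = train_steps * batch_size / train_seconds
    selfplay_pos_per_sec = positions_per_game / seconds_per_game
    return training_pos_per_sec / selfplay_pos_per_sec

# Measurement above: 200 steps of batch 256 in 225 s, and one 400-position
# game every 36 s with 16 parallel self-play workers.
print(training_selfplay_ratio(200, 256, 225, 400, 36))  # ~20.5 -> "about 21"
```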

mokemokechicken avatar mokemokechicken commented on July 23, 2024 1

I also added a wait to the optimizer to change the ratio.

Now,

  • 164 self-play games per hour -> 22 (=3600/164) seconds per self-play game
  • 400 positions per 1 self-play game
  • 225 * 2 seconds per 200 steps (bs=256) -> 450 seconds per 200*256 positions

so

  • Training: 113 positions / second (=200*256/450)
  • SelfPlay: 18 positions / second (=400/22)
  • Training/SelfPlay Ratio: 6.2 (=113/18)
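
The "wait" added to the optimizer can be pictured as a sleep between training steps; a minimal sketch under that assumption (the sleep length and the train_one_step callable are hypothetical, not the repo's actual code):

```python
import time

SLEEP_PER_STEP = 1.0  # seconds; tune to reach the desired training/self-play ratio

def throttled_optimizer_loop(train_one_step, total_steps):
    """Run training with an artificial wait so that self-play generates data
    faster relative to the optimizer."""
    for _ in range(total_steps):
        train_one_step()            # one minibatch update (hypothetical callable)
        time.sleep(SLEEP_PER_STEP)  # slow training down
```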

gooooloo avatar gooooloo commented on July 23, 2024 1

@AranKomat

Mine is:

  • 30 processes for self-play, about 150 seconds per game per process, which gives 5 seconds per game on average.
  • about 12 minutes per 100 training steps with batch size = 3072, which gives 426 positions per second (=3072*100/(12*60))

I actually don't understand the number below that @mokemokechicken mentioned:

400 positions per 1 self-play game

But if I just use this number, then my self-play speed is 80 positions per second (=400/5).
Then the Training/SelfPlay Ratio is 5.3 (=426/80).
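
Plugging these figures into the same arithmetic as a quick consistency check (not code from the repo):

```python
# gooooloo's figures from above.
seconds_per_game = 150 / 30                      # 30 processes -> 5 s per game
train_pos_per_sec = 3072 * 100 / (12 * 60)       # batch 3072, 100 steps in 12 min
selfplay_pos_per_sec = 400 / seconds_per_game    # assuming 400 positions per game
print(train_pos_per_sec / selfplay_pos_per_sec)  # ~5.3
```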

gooooloo avatar gooooloo commented on July 23, 2024 1

I used (nb_game_in_file, max_file_num)=(5, 300), so the number of total games in training data was 1500 (games).
My training dataset size was about 600k (positions).
So, 600k / 1500 = 400 (position/game).

But a Reversi game has at most 60 moves, doesn't it? Even with up to 5 "PASS" moves, that is 65. Then even with game-state flips and rotations (x4), it is at most 260.

UPDATE:
Oh, my fault: "flip and rotation" gives a x8 multiplication, not x4. Then it makes sense: 400/8 = 50, so you are playing 50 moves per game, given that you have a resignation mechanism.
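
For concreteness, the x8 "flip and rotation" augmentation can be generated as below; a minimal numpy sketch assuming the board and policy are plain 8x8 arrays (not the repo's actual augmentation code):

```python
import numpy as np

def eight_symmetries(board, policy):
    """Return the 8 rotations/reflections of an 8x8 board and its policy plane."""
    boards, policies = [], []
    for k in range(4):                      # four 90-degree rotations
        b, p = np.rot90(board, k), np.rot90(policy, k)
        boards += [b, np.fliplr(b)]         # each rotation plus its mirror
        policies += [p, np.fliplr(p)]
    return boards, policies
```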

AranKomat avatar AranKomat commented on July 23, 2024 1

The ratio of 0.44 was obtained from AlphaZero, where symmetry wasn't exploited. Also, Shogi and Chess cannot exploit symmetries, so they set AlphaZero's self-play vs. training ratio based on the assumption that self-play data isn't necessarily as plentiful as in symmetric games. Without symmetry the ratio is 0.44, which is closer to 1, and the ratio for Shogi and Chess may be even closer to 1. Also, in symmetric games without symmetric data augmentation, the NN quickly learns the symmetry, which was demonstrated by AZ being superior to AGZ in Go. Considering that symmetric data augmentation is eventually meaningless, @gooooloo's net ratio becomes 5.3*8 = 42.4. So he needs at least 42 times more GPUs for self-play to bring it down to 1.

gooooloo avatar gooooloo commented on July 23, 2024 1

@AranKomat @mokemokechicken I double-checked my pipeline's performance; it should be 25 processes with about 180 seconds per game per process, which gives about 7 seconds per game on average. Then my ratio should be about 7.5 (=426/(400/7)), not 5.3.

mokemokechicken avatar mokemokechicken commented on July 23, 2024

@apollo-time

I think that possibility exists,
and if we want to improve the model more and more, we need a larger sim_per_move and a larger self-play dataset.

I have a simple hypothesis:

  • The upper bound of the model's performance (=strength) is decided by sim_per_move.
  • The speed of change (≈ improvement) is decided by the speed of generating self-play data and the size of the self-play dataset (a smaller dataset changes faster).
  • The generalization performance of the model is decided by the size of the self-play dataset (a larger dataset is more general).

So I feel that gradually increasing sim_per_move and the dataset size is effective.
(I think humans also do that to become professionals.)

apollo-time avatar apollo-time commented on July 23, 2024

I think a larger sim_per_move and self-play dataset can't resolve the problem of positions that are no longer visited, because unusual positions can't be selected by self-play MCTS.
So I try selecting a fully random action sometimes in self-play, and I ignore the previous history of the random action.
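
A rough sketch of that idea (the mcts/state interfaces and the probability are assumptions, not the actual implementation):

```python
import random

RANDOM_MOVE_PROB = 0.02  # illustrative exploration probability

def select_move(mcts, state):
    """Occasionally play a uniformly random legal move instead of the MCTS
    choice, and reset the tree so the random move's history is ignored."""
    if random.random() < RANDOM_MOVE_PROB:
        mcts.reset()                               # drop previous search history
        return random.choice(state.legal_moves())  # fully random legal action
    return mcts.search(state)
```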

AranKomat avatar AranKomat commented on July 23, 2024

@mokemokechicken I asked @gooooloo a similar question in another thread, but what is the default ratio of the number of games per gradient update in your algorithm? I guess the ratio is important for performance, since it behaves like sims/move, which is undoubtedly important.

mokemokechicken avatar mokemokechicken commented on July 23, 2024

@AranKomat

what is the default ratio of the number of games per gradient update in your algorithm?

I do not know exactly which number to answer with, but the resulting speeds are as follows.

setting

  • batch size: 256
  • sim per move: 400
  • (nb_game_in_file, max_file_num): (5, 300)

speed

  • 80 seconds per 1 self-play game
  • 400 positions per 1 self-play game
  • 150 seconds per 200 steps (bs=256) -> 150 seconds per 200*256 positions

so

  • Training: 341 positions / second (=200*256/150)
  • SelfPlay: 5 positions / second (=400/80)
  • Training/SelfPlay Ratio: 68 (=341/5)

Maybe it means that 1 position is learned 68 times on average, regardless of (nb_game_in_file, max_file_num).

mokemokechicken avatar mokemokechicken commented on July 23, 2024

@AranKomat

I guess you can improve your performance with more self-plays per update.

I think so too.
In my environment, although GPU usage is already at 100% (from self-play and training),
implementing multiprocess self-play will increase the number of self-play games per training step.

So I am planning to implement multiprocess self-play;
however, I am still considering whether it will really work with the present method.
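
A minimal sketch of what multiprocess self-play could look like using the standard library (the play_one_game function is a placeholder, not the repo's implementation):

```python
from multiprocessing import Pool

def play_one_game(worker_id):
    """Placeholder: play one full self-play game and return its
    (state, policy, value) records."""
    return []

if __name__ == "__main__":
    # Each process plays games independently; the collected records are then
    # fed to the optimizer, raising self-play throughput per training step.
    with Pool(processes=16) as pool:
        records = pool.map(play_one_game, range(16))
```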

AranKomat avatar AranKomat commented on July 23, 2024

Cool. So, multi-processing successfully decreased the ratio and achieved 36 s per game at 400 sims/move. Now it remains to elucidate the trade-off between the training/self-play ratio and sims/move. I'm excited for your subsequent announcements!

gooooloo avatar gooooloo commented on July 23, 2024

Thanks for your answer. In the case of Go with AlphaZero, 700k minibatches (2048 positions each) were trained and 21 million self-play games were generated. Assuming that each game ended with 150 stones (positions) placed, 700k x 2048 / (21M x 150) ≈ 0.44 [trained position]/[self-play-generated position], which is much less than 68. So, I guess you can improve your performance with more self-plays per update. Maybe the performance gain by increasing the sims/move from 100 to 800 was because you had a small self-play/training ratio, that is, you had too little exploration. Since having more games generated means more diverse data than having more sims/move, spending more time on self-play may be more beneficial than more sims/move. But in practice, since your algorithm doesn't allow multi-processing (of multiple games) as done by Akababa, my suggestion may not be useful. But this may be useful for @gooooloo.

Thanks @AranKomat. I didn't see this post until just now...

I guess you can improve your performance with more self-plays per update

Yes, I also think so. DeepMind uses 2000+ or 4000+ TPUs for self-play (as Aja Huang said in a post; I just can't remember the link). We can see that self-play performance is important.

Maybe the performance gain by increasing the sims/move from 100 to 800 was because you had a small self-play/training ratio, that is, you had too little exploration.

Actually, I was getting a smaller self-play/training ratio when increasing sims/move from 100 to 800. Although I also introduced a multi-process implementation at that time, the overall self-play game speed was a little slower than before. Yet I observed an improvement in the AI's strength.

AranKomat avatar AranKomat commented on July 23, 2024

@gooooloo In AlphaZero, a staggering 5000 TPUs were used, so I totally agree. It's weird but nice that increased sims/move resulted in a smaller ratio. Hopefully, @mokemokechicken and others will observe a similar phenomenon.

mokemokechicken avatar mokemokechicken commented on July 23, 2024

400 positions per 1 self-play game

Note:
I used (nb_game_in_file, max_file_num)=(5, 300), so the number of total games in training data was 1500 (games).
My training dataset size was about 600k (positions).
So, 600k / 1500 = 400 (position/game).

gooooloo avatar gooooloo commented on July 23, 2024

... had a small self-play/training ratio

It's weird ... that increased sims/move resulted in a smaller ratio

The ratio is (# of self-play moves) / (# of trained moves). I increased the # of sims per move, then self-play got slower, so the # of self-play moves got smaller. But the training module didn't change. So the total ratio got smaller, didn't it?

AranKomat avatar AranKomat commented on July 23, 2024

@gooooloo Sorry, I thought you were talking about the training/self-play ratio, but it was the opposite. My mistake. I also agree with you about the number of positions per game.

gooooloo avatar gooooloo commented on July 23, 2024

@AranKomat I made a mistake in the calculation. Please see that post again; I have modified it.

mokemokechicken avatar mokemokechicken commented on July 23, 2024

It is strange for the training/self-play ratio to be under 1; it would mean there are positions not used in training.
So I think the ratio was actually almost 1.
