
Comments (39)

apollo-time avatar apollo-time commented on June 25, 2024 3

My model (black) beats GRhino lv5 with open book variation "Low" and randomness 0 now.
I made a web player HTML page, but I don't have any server to run the TensorFlow model.

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024 2

@apollo-time

can you beat windows online reversi game level 5?

I just played it, using the same model and simulations_per_move (which is 800) as with the Lv99 game, and I won against the online reversi game level 5 (2:0) and lost to level 6 (1:3).

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024 2

Hi everyone, my code for getting the model is here: https://github.com/gooooloo/reversi-alpha-zero, if you are interested.

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024 1

@apollo-time

I use GRhino by docker on mac.
FYI: https://github.com/mokemokechicken/grhino-docker

from reversi-alpha-zero.

apollo-time avatar apollo-time commented on June 25, 2024 1

I see "Online Reversi" on the Microsoft Store is very excellent.
My model beats level 2 hardly. (2018/01/10)
My model beats level 3 hardly now. (2018/01/11)

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024 1

Hi everyone, I found that http://www.orbanova.com/nboard/ is very strong. It also supports many levels to play against. It would be a good baseline for comparison.

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024 1

@mokemokechicken it is half because of your great implementation. So thank you :)

After all, in order to be strong, it may be necessary to use large "simulations per move" in self-play, isn't it?

I also think so. At first, I was using 100 sims per move, because I wanted fast self-play. After about 100k steps (batch_size = 3072), it seemed to get stuck and stop improving. Then I changed to 800 sims. At about 200k steps it had become quite strong. My final model, the one beating Lv99, is at 300k+ steps.

I think it is also worth mentioning that, although I changed to 800 sims, I didn't make the overall self-play too much slower. I did this by separating MCTS and the Neural Network into different processes. They communicate via named pipes. Then I can run several MCTS processes and only 1 Neural Network process at the same time. This idea is borrowed from this repo (Thanks @Akababa ). By doing this, I make full use of the GPU and CPU. Although a single game gets slower due to 800 sims, multi-game parallelization wins a lot of that back. ---- I am mentioning this because I think that in the AlphaGoZero method, self-play speed does matter.
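Roughly, the setup looks like the sketch below (illustrative only, not my exact code; the process layout, names and shapes are simplified): several MCTS worker processes send positions over pipes to a single process that owns the network and answers them in batches.

import multiprocessing as mp
from multiprocessing.connection import wait
import numpy as np

def nn_server(conns):
    """Single process owning the model; answers prediction requests in batches."""
    # model = load_model(...)                       # the Keras/TF model lives only here
    while True:
        ready = wait(conns, timeout=0.1)            # which workers have sent a position?
        batch, senders = [], []
        for conn in ready:
            try:
                batch.append(conn.recv())           # a (5, 8, 8) input tensor
            except EOFError:                        # a worker finished and closed its pipe
                conns.remove(conn)
                continue
            senders.append(conn)
        if not batch:
            continue
        x = np.stack(batch)
        # policy, value = model.predict(x)          # one batched GPU call serves all workers
        policy = np.full((len(batch), 65), 1 / 65)  # placeholder outputs (64 squares + pass)
        value = np.zeros(len(batch))
        for i, conn in enumerate(senders):
            conn.send((policy[i], value[i]))

def mcts_worker(conn, n_evals=800):
    """One of several self-play processes; runs the tree search on the CPU."""
    for _ in range(n_evals):
        leaf = np.zeros((5, 8, 8), dtype=np.float32)  # the leaf position to evaluate
        conn.send(leaf)
        policy, value = conn.recv()                   # blocks until the NN process replies
        # ... expand the tree and back up (policy, value) here ...

if __name__ == "__main__":
    pipes = [mp.Pipe() for _ in range(4)]             # 4 parallel self-play workers
    server = mp.Process(target=nn_server, args=([s for s, _ in pipes],), daemon=True)
    server.start()
    workers = [mp.Process(target=mcts_worker, args=(w,)) for _, w in pipes]
    for p in workers:
        p.start()
    for p in workers:
        p.join()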

2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8

Why do you use history?

Because I happened to see this reddit post from David Silver @ DeepMind. This is the quote:

it is useful to have some history to have an idea of where the opponent played recently - these can act as a kind of attention mechanism (i.e. focus on where my opponent thinks is important)

I have used this input shape from the beginning and didn't test the 3 * 8 * 8 shape, so I don't have the experience to compare. But I believe it gives the network a chance at "attention" (by subtracting the previous board from the current one). Maybe it helps.
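As an illustration, a 5 * 8 * 8 input with 2 historical boards can be built roughly like this (the exact plane layout here is only an example, not necessarily the encoding used in my code): own and opponent stone planes for the current and previous board, plus one colour-to-move plane, with all-zero planes standing in for missing history (the "t < 0" case quoted further down).

import numpy as np

def encode_input(history, black_to_move, n_history=2):
    """history: list of 8x8 arrays (newest last); +1 = black, -1 = white, 0 = empty."""
    me, opp = (1, -1) if black_to_move else (-1, 1)
    planes = []
    for t in range(n_history):
        idx = len(history) - 1 - t
        if idx >= 0:
            board = np.asarray(history[idx])
            planes.append((board == me).astype(np.float32))   # my stones at time-step t
            planes.append((board == opp).astype(np.float32))  # opponent stones at time-step t
        else:
            planes.append(np.zeros((8, 8), np.float32))       # t < 0: all-zero planes
            planes.append(np.zeros((8, 8), np.float32))
    planes.append(np.full((8, 8), float(black_to_move), np.float32))  # colour to move
    return np.stack(planes)                                   # shape (5, 8, 8)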

Lastly, I am using 6 GPUs: 5 Tesla P40 (1 for optimization, 4 for self-play) + 1 Tesla M40 (for the evaluator). Maybe it is mostly because of the computation power...

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024 1

@apollo-time I got another new-generation model the day before yesterday, but haven't gotten any better model in the past two days. Let's wait some more days and see.

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024 1

@AranKomat

... due to their large architecture ...

my config (the network architecture is the same as @mokemokechicken 's original implementation):

class ModelConfig:
    cnn_filter_num = 256
    cnn_filter_size = 3
    res_layer_num = 10
    l2_reg = 1e-4
    value_fc_size = 256
    input_size = (5,8,8) 
    policy_size = 8*8+1

... and large buffer

mine:

class PlayDataConfig:
    def __init__(self):
        self.nb_game_in_file = 50
        self.max_file_num = 1000

class TrainerConfig:
    def __init__(self):
        self.batch_size = 3072
        self.epoch_to_checkpoint = 1
        self.epoch_steps = 100
        self.save_model_steps = 800
        self.lr_schedule = (
            (0.2,    1500),  # means being 0.2 until 1500 steps.
            (0.02,   20000),
            (0.002,  100000),
            (0.0002, 9999999999)
        )
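
For reference, a schedule in this (lr, until_step) form can be applied with something like the sketch below (illustrative; the actual update_learning_rate may differ):

import keras.backend as K

def update_learning_rate(model, total_steps, lr_schedule):
    # pick the first entry whose step bound has not been reached yet
    for lr, until_step in lr_schedule:
        if total_steps < until_step:
            K.set_value(model.optimizer.lr, lr)
            break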

I also changed the sampling method. I did this because I found that, in my case (much more play data), @mokemokechicken 's original implementation waits too long for all loaded data to be trained on at least once before new play data is loaded and before a new candidate model is generated.

    def generate_train_data(self, batch_size):
        # sample positions uniformly at random (with replacement) instead of
        # sweeping every loaded sample once per epoch
        while True:
            x = []

            for _ in range(batch_size):
                n = randint(0, data_size - 1)   # data_size = number of loaded samples
                # sample the nth data and append to x

            yield x

    def train_epoch(self, epochs):
        tc = self.config.trainer
        self.model.model.fit_generator(generator=self.generate_train_data(tc.batch_size),
                                       steps_per_epoch=tc.epoch_steps,
                                       epochs=epochs)
        return tc.epoch_steps * epochs


    def training(self):
        last_save_step = self.total_steps
        while True:
            self.update_learning_rate()
            steps = self.train_epoch(self.config.trainer.epoch_to_checkpoint)
            self.total_steps += steps

            if last_save_step + self.config.trainer.save_model_steps <= self.total_steps:
                self.save_current_model_as_to_eval()
                last_save_step = self.total_steps

            self.load_play_data()

So basically, I am using the "normal" config, but I changed a lot of things.
Other configs are listed as below if you are interested:

class PlayConfig:
    def __init__(self):
        self.simulation_num_per_move = 800
        self.c_puct = 5
        self.noise_eps = 0.25
        self.dirichlet_alpha = 0.4
        self.change_tau_turn = 10
        self.virtual_loss = 3
        self.prediction_queue_size = 8
        self.parallel_search_num = 8
        self.v_resign_check_min_n = 100
        self.v_resign_init = -0.9
        self.v_resign_delta = 0.01
        self.v_resign_disable_prop = 0.1
        self.v_resign_false_positive_fraction_t_max = 0.05
        self.v_resign_false_positive_fraction_t_min = 0.04

from reversi-alpha-zero.

evalon32 avatar evalon32 commented on June 25, 2024

In the README, it says the "App" is this: https://itunes.apple.com/ca/app/id574915961. I'm not familiar with it and don't have an iOS device, but I'm guessing it's not that strong.
For what it's worth, I've also been testing the networks against grhino, with similar results. I've had RAZ beat grhino L2 once, but only because it got lucky. That said, I think it's a good sign that RAZ can now tell that its position gradually deteriorates (the evaluation goes relatively smoothly from 0 to -1 over the course of the game). Earlier, it often had no idea. It also used to lose consistently to grhino L1; now it usually wins (sadly, it's usually because grhino L1 blunders in a won position).

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

Hi @vincentlooi

Is there a baseline for comparing the learned model e.g. a benchmark software to evaluate against?

I use iOS app of https://itunes.apple.com/ca/app/id574915961 as the benchmark.
The app has 1 ~ 99 levels.

For example, what do you mean by "Won the App LV x?" Does it mean that if the model beat the app even once, it counts as a win even if it loses the other times?

Yes.
"Won the App LV x?" means the model won the level at least once (regardless of the number of losses).

I downloaded your "best model" and "newest model", and played both networks against grhino AI (level 2). Sadly, both networks got destroyed by grhino on multiple tries. If you have a benchmark of levels to beat before grhino, that would be really helpful

I didn't know grhino.
And I confirmed that the newest model loses to grhino Lv2...

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

Hi @evalon32

In the README, it says the "App" is this: https://itunes.apple.com/ca/app/id574915961. I'm not familiar with it and don't have an iOS device, but I'm guessing it's not that strong.

The app has levels 1~99.
Maybe Lv29 is not so strong.

For what it's worth, I've also been testing the networks against grhino, with similar results. I've had RAZ beat grhino L2 once, but only because it got lucky.

I would like you to tell me: what is RAZ? (I couldn't find it on Google ...)

That said, I think it's a good sign that RAZ can now tell that its position gradually deteriorates (the evaluation goes relatively smoothly from 0 to -1 over the course of the game)

I also think that is a good feature.
In my newest model, the evaluation often plummets.

from reversi-alpha-zero.

evalon32 avatar evalon32 commented on June 25, 2024

I would like you to tell me: what is RAZ? (I couldn't find it on Google ...)

Oh sorry, RAZ = reversi-alpha-zero :)

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

RAZ = reversi-alpha-zero :)

Oh, I see! (^^

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

FYI:

  • the App LV29 vs grhino Lv2: LV29 won 2 times and lost 0 times.
  • the App LV29 vs grhino Lv3: LV29 won 0 times and lost 1 time.

from reversi-alpha-zero.

evalon32 avatar evalon32 commented on June 25, 2024

I just had the newest model play a match of 10 games vs grhino L2 (took forever, since I don't have a GPU).
It won 2 out of 5 as black and 2 out of 5 as white. Getting exciting!

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

That's good!

took forever, since I don't have a GPU

FYI:
I am also evaluating on a Mac (which doesn't have a GPU);
an optimized TensorFlow (1.4) build is about 3~5 times faster than the normal pip CPU version.
https://www.tensorflow.org/install/install_sources

from reversi-alpha-zero.

mrlooi avatar mrlooi commented on June 25, 2024

I managed to make some progress in training the model. I played the model against grhino lv2 5 times: 4 wins, 1 loss. Still lost vs grhino lv3 though. I also played the model against the newest/best model in your download script, and had a win rate of ~85% over roughly 25 games.

I managed to train this model over the course of a week from scratch (on a 1080 GPU), by constantly and manually removing old data (older than 1-2 days) from the data/play_data folder while the model keeps self-playing.

The current training method in your script trains on all data in the folder regardless of when the data was created, which means training per epoch iteration will always take longer as self-play generates more and more data. I'm not sure this is necessary, since old data reflects an older policy and not necessarily the newest one, and hence could be redundant at the cost of more training steps and potential overfitting. Perhaps it would be a good idea to weight the data based on how recently it was played, i.e. how much it reflects the latest policy, or to turn the data into a fixed-size buffer (perhaps 250k-300k samples) that discards old data as new samples are generated.
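A minimal sketch of the fixed-size buffer idea (illustrative only; 250k is just the sample cap suggested above):

from collections import deque

# keep only the most recent ~250k (state, policy, value) samples;
# the oldest samples are discarded automatically as new ones arrive
replay_buffer = deque(maxlen=250_000)

def add_game(samples):
    replay_buffer.extend(samples)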

EDIT:
Just beat grhino lv3! The model now beats grhino lv2 almost every time, getting exciting

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

@vincentlooi

Thank you for sharing exciting information!

EDIT: Just beat grhino lv3! The model now beats grhino lv2 almost every time, getting exciting

That's great!!

I managed to train this model over the course of a week from scratch (on a 1080 GPU), by constantly and manually removing old data (older than 1-2 days) from the data/play_data folder while the model keeps self-playing.

Nice try!
I also think it is one of the important hyperparameters.
The maximum number of training samples can be changed via PlayDataConfig#{nb_game_in_file,max_file_num} (used here).

I will change the parameter in my training.
In my environment, the number of training data files generated by self-play is about 100/day (500 games/day).
So, it seems better to set max_file_num around 300 (currently 2000).
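
Rough arithmetic behind that choice (assuming 5 games per file, as implied by 100 files ≈ 500 games per day, and roughly 60 positions per reversi game):

files_per_day = 100
games_per_file = 5
positions_per_game = 60

max_file_num = 300
days_of_data_kept = max_file_num / files_per_day                    # about 3 days of self-play
samples_kept = max_file_num * games_per_file * positions_per_game   # about 90,000 positions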

from reversi-alpha-zero.

apollo-time avatar apollo-time commented on June 25, 2024

What is the best reversi game to benchmark against?
I don't have an iPhone, but I have a Mac.
My model beats all of the Android Reversi and Windows App Reversi games.

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024

@mokemokechicken @vincentlooi @evalon32 When playing with GRhino, besides the "level" setting, what is your "open book variation" setting? I am playing Ubuntu GRhino against my model, and want to do an (indirect) comparison with yours. Thanks.

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

@gooooloo My open book variation is "Low".

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024

@mokemokechicken gotcha. Thanks.

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

@gooooloo it's great! Thank you very much!

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

I implemented NBoard Protocol.

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024

@mokemokechicken Just a report: my model beats Lv99 using the 800 simulations per move setting. See https://play.lobi.co/video/17f52b6e921be174057239d39d239b6061d3c1c9. The AlphaGoZero method works. I am also using 800 simulations per move in self-play. I keep the evaluator alive, with the best-model replacement condition being an Elo rating gain of >= 150 over 400 games (with Elo rating we can also count draw games). I am using 2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8.

Besides, when playing against the App, I found that the 40 or 100 simulations per move setting is already quite strong. The 100 sims setting beats Lv98 easily. But Lv99 is more difficult than Lv98; I tested 40/100/400 sims and all of them lose, until I changed to 800 sims.
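For reference, one standard way to turn a match score (with draws counted as half a win) into an Elo difference is the logistic formula below; this is an illustration of the idea, and the exact calculation used for the replacement condition may differ.

import math

def elo_diff(wins, draws, losses):
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games      # draws count as half a win
    return -400 * math.log10(1 / score - 1)

# e.g. 270 wins, 28 draws, 102 losses over 400 games -> about +156 Elo
print(round(elo_diff(270, 28, 102)))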

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

@gooooloo

Great! Congratulations!!

I am surprised to hear this report!

800 simulations per move

After all, in order to be strong, it may be necessary to use large "simulations per move" in self-play, isn't it?
I am feeling that "simulations per move" determines the upper bound of the model's strength.

2 historical boards as Neural Network input, which means a shape of 5 * 8 * 8

It is very interesting.
Why do you use history?
Do you think it brought good effects?

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

@gooooloo

Thank you for your reply.

I think what is also worthy mentioning is that, although I changed to 800 sims, I didn't make the overall selfplay too much slower. I did this by separating MCTS and Neural Network to different processes. They communicate via named pipes.

Great. I think it is the best implementation.

I believe it gives the network a chance at "attention" (by subtracting the previous board from the current one). Maybe it helps.

I see.
I hadn't thought of that possibility. It is very interesting.

Lastly, I am using 6 GPUs: 5 Tesla P40 (1 for optimization, 4 for self-play) + 1 Tesla M40 (for the evaluator). Maybe it is mostly because of the computation power...

That's very powerful!! :)

from reversi-alpha-zero.

apollo-time avatar apollo-time commented on June 25, 2024

@gooooloo Um... Is history really useful?
When you use history, the player cannot make a move given just a single board state.
I have seen that some games, like chess, must be playable from a board state that is not the initial state.

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024

@apollo-time do you mean the first move of the game? As the AlphaGoZero paper mentions, all-zero boards are used if there are not enough history boards.

8 feature planes X_t consist of binary values indicating the presence of the current player's stones (X_t^i = 1 if intersection i contains a stone of the player's colour at time-step t; 0 if the intersection is empty, contains an opponent stone, or if t < 0)

"t < 0" is the case here.

from reversi-alpha-zero.

apollo-time avatar apollo-time commented on June 25, 2024

@gooooloo No, I mean that some games can be played from a board state that is not the initial state, for example chess puzzles.

from reversi-alpha-zero.

apollo-time avatar apollo-time commented on June 25, 2024

@gooooloo can you beat windows online reversi game level 5?

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024

@apollo-time

No, I mean that some games can be played from a board state that is not the initial state, for example chess puzzles.

I see. I didn't consider that case.

can you beat windows online reversi game level 5?

I don't have a Windows system (I will try to find one). But I can't beat NBoard's Novello at level 20 (I can beat level 10 with 1600 sims per move), nor NTest at level 30.

from reversi-alpha-zero.

apollo-time avatar apollo-time commented on June 25, 2024

@gooooloo thanks. My question is the same as Cassandra120's.

from reversi-alpha-zero.

apollo-time avatar apollo-time commented on June 25, 2024

@gooooloo My model (simulations_per_move=800) beats the online reversi game level 4 now, and my model doesn't use history.
But do you feel your model is improving continuously?

from reversi-alpha-zero.

AranKomat avatar AranKomat commented on June 25, 2024

@gooooloo

After about 100k steps (batch_size = 3072), it seemed to get stuck and stop improving.

That's also the case in AlphaZero. The performance more or less stagnated after that point. But they achieved already-strong performance (with a different board game) at 100k iters not only due to using 800 sims/move but also due to their large architecture and large buffer. Also, they did one update iteration for every 30 or so games (3M games after 100k iters), which may not be the case in the implementations of @mokemokechicken, Zeta36 and Akababa.

How about your case? Did you use the "normal" config setting instead of "mini"?

from reversi-alpha-zero.

AranKomat avatar AranKomat commented on June 25, 2024

@gooooloo Thanks so much for the detailed information. It looks like you don't have self.search_threads for multi-threading. Did you find multiprocessing alone to be sufficient? It's impressive that your sampling method enabled you to finish 200k iters with your large architecture. It looks like Akababa's multiprocessing is very powerful. But I couldn't see how many self-play games you had finished up to 100~200k iters. Have you tracked the number of games?

from reversi-alpha-zero.

mokemokechicken avatar mokemokechicken commented on June 25, 2024

@gooooloo @apollo-time @evalon32 @vincentlooi @AranKomat

I created Performance Reports for sharing our achievements, and linked it from the top of the README.
I would be grateful if you would post to it.

from reversi-alpha-zero.

gooooloo avatar gooooloo commented on June 25, 2024

@AranKomat

Have you tracked the number of games?

No I have not. I wish I had.

from reversi-alpha-zero.
