
reversi-alpha-zero's Issues

ChessAlpha Zero development

Hello, @mokemokechicken and @Yuhang.

As I promised, I've done (in just one day, I had no more time) an adaptation of @mokemokechicken's reversi-zero project into a chess version: https://github.com/Zeta36/chess-alpha-zero

The project is already functional (in the sense that it doesn't fail and the three workers do their job), but unfortunately I have no GPU (just an Intel i5 CPU) and no money to spend on an AWS server or similar.
So I could only check self-play with the toy config "--type mini". Moreover, I had to lower self.simulation_num_per_move = 2 and self.parallel_search_num = 2.

In this way I was able to generate the 100 games needed for the optimization worker to start. The optimization process seemed to work perfectly, and the model was able to reach a loss of ~0.6 after 1000 steps. So I guess the model was able to overfit the 100 games from the earlier self-play.

Then I executed the evaluation process and it worked fine. The overfitted model was able to defeat the original random model from the beginning 100% of the time (coincidence??).
Finally I checked the ASCII way to play against the best model. It worked as expected. To indicate our moves we have to use UCI notation: a1a2, b3b8, etc. More info here: https://chessprogramming.wikispaces.com/Algebraic+Chess+Notation

By the way, the model output is now of size 8128 (instead of 64 for reversi and 362 for Go), and it corresponds to all possible legal UCI moves in a chess game. I generate these new labels in the config.py file.
I should also note that the board state (and the player turn) is tracked via FEN chess notation: https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation

Here is, for example, the FEN for the starting position: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 (w means white to move).
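If you want to poke at the move/state representation, here is a minimal sketch using the python-chess library (my assumption for illustration; this is not the project's actual code):

import chess

board = chess.Board()        # starting position
print(board.fen())           # rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

board.push_uci("e2e4")       # play a move given in UCI notation
legal_ucis = [m.uci() for m in board.legal_moves]
print(len(legal_ucis), legal_ucis[:5])   # Black's legal replies as UCI strings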

I also changed the resign function a little. Chess is not like Go or Reversi, where you always finish the game in more or less the same number of moves. In chess the game can end in many ways (checkmate, stalemate, etc.), and a self-play game could take more than 200 moves before reaching an ending position (normally a draw). So I decided to cut off play once one player has more than 13 points of advantage (this score is computed as usual from the value of the pieces: the queen is worth 10, rooks 5.5, etc.).
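To illustrate the cut-off idea, here is a hedged sketch using python-chess (not the project's code; only the queen = 10 and rook = 5.5 values come from the description above, the remaining values are placeholders):

import chess

# Illustrative piece values; only queen = 10 and rook = 5.5 are stated above.
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5.5, chess.QUEEN: 10}

def material_advantage(board: chess.Board) -> float:
    # Positive means White is ahead, negative means Black is ahead.
    score = 0.0
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, chess.WHITE))
        score -= value * len(board.pieces(piece_type, chess.BLACK))
    return score

board = chess.Board()
if abs(material_advantage(board)) > 13:   # the 13-point cut-off described above
    print("adjudicate: the side with more material wins")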

As you can imagine, with my poor machine I could not fully test the project beyond these tiny functionality tests. So I'd really appreciate it if you could spend some free time on your GPUs testing this implementation more seriously. Both of you can of course be collaborators on the project if you wish.

Also, I don't know whether I introduced some theoretical bugs in this adaptation to chess, and I'd appreciate any comments from you in this sense as well.

Best regards!!

About the optimizer?

  1. I found that the optimizer only loads data at the beginning; will it reload new play data during training?
  2. I hope more logging can be made available, such as the loss at each step.

Is 0.55 too high for replace_rate, given that Reversi can have draw results?

I know the DeepMind paper says a replace_rate of 0.55. But under the Go rules used there, there is no draw result, so 0.55 is reasonable. However, in Reversi draws do happen, so is it too high for the replace rate to still be 0.55?

With 0.55, the next generation has to beat the best model in most games, and a draw doesn't count. That seems difficult. The best model is therefore replaced less often, which means the self-play policy improves less, and then the training data improves less too.

Or to put the question another way: in your practice, how often does a draw happen when evaluating? In my local run it happens at a rate of about 1/8 during evaluation. I am still at an early stage of training, and I rewrote the self-play part as well, so I don't know whether this 1/8 rate is reasonable or not. Just curious what draw rate you got.
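A quick back-of-the-envelope check, under the premise above that a draw does not count as a win for the challenger:

replace_rate = 0.55
draw_rate = 1 / 8   # the draw frequency I observe locally during evaluation

# If draws count as non-wins, the challenger must win 55% of all games,
# i.e. this fraction of the decisive (non-draw) games:
required_win_rate_among_decisive = replace_rate / (1 - draw_rate)
print(required_win_rate_among_decisive)   # ~0.63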

Thanks.

GPU ResourceExhaustedError after loading the Keras model many times during self-play

In challenge 2, the AlphaZero method, self-play always uses the newest next_generation model. When running both self and opt workers, the self worker will always load the newest next_generation model saved by the opt worker when starting a new game.

Over a long period of time (say, 1-2 days), the self worker will load a new model so many times that it will cause ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[3,3,256,256].

I found that this is because Keras doesn't free the GPU memory occupied by the old model weights when loading new model weights. The following simple modification of src/reversi_zero/worker/self_play.py will quickly reproduce this error when running python src/reversi_zero/run.py self:

...
import keras.backend as K
...
def start(config: Config):
    tf_util.set_session_config(per_process_gpu_memory_fraction=0.05)  # make the error occur faster
    return SelfPlayWorker(config, env=ReversiEnv()).start()
...
class SelfPlayWorker:
    ...
    def start(self):
        if self.model is None:
            self.model = self.load_model()

        self.buffer = []
        idx = 1

        while True:
            start_time = time()
            # env = self.start_game(idx)
            end_time = time()
            logger.debug(f"play game {idx} time={end_time - start_time} sec, ")
                         # f"turn={env.turn}:{env.board.number_of_black_and_white}")
            if True or (idx % self.config.play_data.nb_game_in_file) == 0:
                # K.clear_session()
                load_best_model_weights(self.model)  # repeatedly loading even the same weight will produce this error
                idx += 1
                continue

I've run with different per_process_gpu_memory_fraction values and found that the error occurs after exactly the corresponding number of model loads. For example, per_process_gpu_memory_fraction=0.05 on my GTX 960 with 4037MB of GPU memory crashes after exactly floor(4037 x 0.05 / 46) = 4 model loads (since the model weight h5 file is 46MB).

There's a simple way to fix this in Keras: just run keras.backend.clear_session() before loading new weights. In the example above, uncommenting K.clear_session() resolves the error.
I've opened a pull request that applies this fix in lib/model_helper.py.
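For reference, the pattern looks roughly like this (a sketch using plain Keras APIs, not the exact code of the pull request):

import keras.backend as K
from keras.models import model_from_json

def reload_model(model_json_path, weight_path):
    # Free the graph/GPU memory held by previously loaded models;
    # without this, every reload leaks memory until ResourceExhaustedError.
    K.clear_session()
    with open(model_json_path) as f:
        model = model_from_json(f.read())
    model.load_weights(weight_path)
    return model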

Great job!!

Wonderful job, friend.

Can you please tell us what performance you got with this approach? Do you have some statistics or something?

Regards!

What is action_by_value?

I see the ReversiPlayer.action function tries to select action_by_value when turn > change_tau_turn.

action = int(np.random.choice(range(64), p=policy))  # sample a move from the search policy
action_by_value = int(np.argmax(self.var_q[key] + (self.var_n[key] > 0)*100))  # visited move with the highest Q
if action == action_by_value or env.turn < self.play_config.change_tau_turn or env.turn <= 1:
    break

I think action_by_value picks, among the moves with non-zero N, the one with the highest Q.
Why did you select the action like this?

Drop wxPython?

Installing wxPython is a nightmare on many platforms. Users usually cannot get an out-of-the-box installation without a lot of searching. A web-based UI would be a friendlier replacement.

Baseline Comparison?

Is there a baseline for comparing the learned model, e.g. a benchmark program to evaluate against? It would be useful for us to know how effective the learning algorithm actually is.

For example, what do you mean by "Won the App LV x"? Does it mean that if the model beats the app even once, it counts as a win even if it loses the other times?

I downloaded your "best model" and "newest model" and played both networks against the grhino AI (level 2). Sadly, both networks got destroyed by grhino on multiple tries. If you have a benchmark of levels to beat before grhino, that would be really helpful.

AlphaZero Approach

Hi,

Great work with your repository, impressive stuff. I'm just interested to know: when you run the software with self-play and optimisation at the same time, how many self-play games do you aim to complete between each new model released by the optimiser? I ask because I would have thought that if not enough games are completed, the model would over-fit.

Thanks, Jack

About MCTS

virtual_loss = self.config.play.virtual_loss
self.var_n[key][action_t] += virtual_loss   # pretend this edge received extra visits ...
self.var_w[key][action_t] -= virtual_loss   # ... that all lost, so other searchers avoid it
leaf_v = await self.search_my_move(env)  # next move

I see that N and W are updated with the virtual loss when a node is selected, in order to discourage other threads from simultaneously exploring the identical variation (as in the paper).

  1. Why isn't Q updated with W/N at this point (see the sketch below)?
  2. Shouldn't it be W = W + virtual_loss when the player is white?
  3. Why isn't the search tree shared between the two players?
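For context, here is the usual virtual-loss bookkeeping as I understand it from the paper (a sketch with made-up names, not this repo's code):

VIRTUAL_LOSS = 3  # illustrative value

def on_select(n, w, action):
    # Pretend this edge was visited and lost, so parallel searchers prefer siblings.
    n[action] += VIRTUAL_LOSS
    w[action] -= VIRTUAL_LOSS

def on_backup(n, w, action, leaf_value):
    # Undo the virtual loss and record the real simulation result.
    n[action] += 1 - VIRTUAL_LOSS
    w[action] += leaf_value + VIRTUAL_LOSS
    # Q is typically derived as W/N at the next selection step,
    # which may be why it is not updated explicitly here.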

About the time of self-play

@mokemokechicken It seems that I can only finish one game in about 108 s using the default hyper-parameters. My CPU is an 8-core i7-7700K @ 4.20GHz as well, and my GPU is a GeForce GTX 1080 Ti. Are there any hyper-parameters that were changed without being mentioned in the README? Thanks!

Implement resign

It would be better to implement resignation during self-play, because it is not important to learn moves from one-sided games. It may be a waste of capacity.

404 in the new download script

I tried to run the new download_newest_model_as_best_model.sh script, but both of the URLs it tries to fetch from return 404: Not Found.

Replacing CNN with decoder-only Transformer for possible acceleration?

As I mentioned before, I'm working on applying AlphaZero to text generation using a decoder-only Transformer instead of a CNN. My implementation is nearly finished, but I haven't tested its performance on text generation yet. Besides, a Transformer can be used for board games like reversi, since you can represent each move as a symbol (for example, any reversi move can be represented by a number from 0 to 63). Obviously, this carries no geometric information, but it's interesting to see whether that information is really so important compared with the speed advantage: layer-wise per-move FLOPS become roughly bs x 4 x hidden_dim^2 instead of bs x 8^2 x hidden_dim^2 x 3^2, which is 144x faster. Any questions? If you're interested, I'll notify you as soon as my implementation works, so that you and I can extract the necessary components to apply to your reversi project.
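The arithmetic behind the 144x figure, with hidden_dim as an arbitrary placeholder (the ratio does not depend on it):

hidden_dim = 256   # placeholder value
bs = 1

cnn_flops_per_layer = bs * 8 * 8 * hidden_dim ** 2 * 3 * 3   # 8x8 board, 3x3 convolution
transformer_flops_per_move = bs * 4 * hidden_dim ** 2        # rough per-token cost
print(cnn_flops_per_layer / transformer_flops_per_move)      # 144.0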

Cannot use multiple GPUs in self-play

@mokemokechicken @gooooloo I added one more GPU. However, the added GPU has a 0% usage rate, and doubling prediction_queue_size, parallel_search_num and multi_process_num doesn't make a difference. Also, has asynchronous training (one GPU doing gradient updates while the other GPUs do self-play, with weights synced after each gradient update) been implemented in @gooooloo's algorithm?

Gobang version

I made an AlphaZero implementation of Gobang based on TensorFlow.

Another resign condition?

@mokemokechicken

if self.play_config.resign_threshold is not None and \
        np.max(self.var_q[key] - (self.var_n[key] == 0) * 10) <= self.play_config.resign_threshold:

Do you think it is reasonable to resign if min(var_q, (var_n == 0) * 10) > 0.8 (or 0.9)?

As you said, the < -0.8 resignation helps a lot because it skips one-sided game states; I also verified the effect in my local run. By symmetry, > 0.8 also indicates a one-sided game, and I guess MCTS is strong enough to guide the AI to a win in this situation? Then the NN would become more accurate thanks to the smaller sample space. I know this is not mentioned in the DeepMind paper; maybe that is because of the large state space of Go. But maybe it's worth trying on Reversi? What do you think?
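To make sure I'm reading the proposal correctly, a sketch of my interpretation (using illustrative stand-ins for the variables in the snippet above; the +10 term keeps unvisited moves from dragging the minimum down):

import numpy as np

var_q = np.zeros(64)         # stand-ins for self.var_q[key] / self.var_n[key]
var_n = np.zeros(64)
resign_threshold = -0.8
win_threshold = 0.8          # or 0.9

# Existing condition: every visited move looks lost, so resign.
should_resign = np.max(var_q - (var_n == 0) * 10) <= resign_threshold

# Proposed counterpart (my reading): even the worst visited move looks clearly won,
# so the game could be adjudicated or cut off early.
clearly_winning = np.min(var_q + (var_n == 0) * 10) > win_threshold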

Failed running GUI

Installed everything.

Is the failure happening after the TF warnings?

If I want to get rid of the TF warnings (they are just warnings, right?), what should I do?

Please help,
thanks.

> python src/reversi_zero/run.py play_gui
2017-12-10 16:02:48,794@reversi_zero.manager INFO # config type: normal
Using TensorFlow backend.
2017-12-10 16:03:02,034@reversi_zero.agent.model DEBUG # loading model from /Users/john/dev/igo/reversi0/reversi-az/data/model/model_best_config.json
2017-12-10 16:03:04.151972: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-10 16:03:04.152015: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-10 16:03:04,714@reversi_zero.agent.model DEBUG # loaded model digest = ae1dd819bdaf71fcc6e95e8b64bb53db6ca5fa63398fdff2582ab14ed9c87109
This program needs access to the screen. Please run with a
Framework build of python, and only when you are logged in
on the main display of your Mac.

Random flip and rotation when evaluating

I see you apply a random flip and rotation in the Player's expand_and_evaluate function.
I think this is needed when adding data for training, but it isn't necessary when evaluating to select an action.
What do you think?
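For reference, the kind of symmetry trick I mean looks roughly like this (a sketch, not the repo's code): pick a random rotation/flip for the board, run the network on the transformed board, then map the policy back to the original orientation.

import numpy as np

def random_symmetry(board):
    # board: an 8x8 numpy array
    k = np.random.randint(4)          # number of 90-degree rotations
    flip = bool(np.random.randint(2))

    transformed = np.rot90(board, k)
    if flip:
        transformed = np.fliplr(transformed)

    def undo_policy(policy64):
        # Map a 64-long policy predicted on the transformed board back to the
        # original orientation: invert the flip first, then the rotation.
        p = policy64.reshape(8, 8)
        if flip:
            p = np.fliplr(p)
        p = np.rot90(p, -k)
        return p.reshape(64)

    return transformed, undo_policy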

Performance Reports

Please share your reversi model achievements!
Models that are not from reversi-alpha-zero are also welcome.
Battle records, configuration, repository URL, comments and so on.

About using different players for training game generation

So I have a question related to another, similar project for Go: https://github.com/gcp/leela-zero
In that project, self-play games are generated by the same player playing against itself, so black and white have the same random seed and share a search tree through tree reuse.
If I'm reading the code right, in reversi-alpha-zero two independent players are used to generate self-play games, each with its own separate search tree and a different random seed.
I am very curious about the effects of these two approaches. What have been your results?

Child seeds being identical to the parent seed may nullify the effect of multi-processing/threading

This is a finding from a toy version of my customization of Akababa's implementation, but I'm certain it is relevant to your implementation and at least worth asking here, since your seeding code isn't essentially different from his. I've noticed that, because each process/thread shares the same seed, each process/thread generates the same results (e.g. state transitions during simulation) regardless of the underlying probability distribution. Some processes/threads are faster than others, so you may not immediately notice this deterministic behavior, as different processes/threads can yield different outputs at the same instant despite generating the identical pseudo-random sequence. Did you find this problematic? I don't think this has been mentioned yet.

Relevant sources:
best-seed-for-parallel-process
Random seed is replication across child processes #9650
seeding-random-number-generators-in-parallel-programs
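One common remedy (a minimal sketch, not tied to any particular repo) is to reseed each child process explicitly, for example in a multiprocessing Pool initializer:

import os
import numpy as np
from multiprocessing import Pool

def init_worker():
    # Give each child its own seed (here derived from the PID) so it does not
    # replay the parent's pseudo-random sequence after fork().
    np.random.seed(os.getpid() % (2 ** 32 - 1))

def simulate(_):
    return int(np.random.randint(64))   # stand-in for a stochastic simulation step

if __name__ == "__main__":
    with Pool(processes=4, initializer=init_worker) as pool:
        print(pool.map(simulate, range(8)))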

maybe a bug here

if is_root_node and self.play_config.noise_eps > 0:  # Is it correct?? -> (1-e)p + e*Dir(alpha)
    if self.play_config.dirichlet_noise_only_for_legal_moves:
        noise = dirichlet_noise_of_mask(legal_moves, self.play_config.dirichlet_alpha)
    else:
        noise = np.random.dirichlet([self.play_config.dirichlet_alpha] * 64)
    p_ = (1 - self.play_config.noise_eps) * p_ + self.play_config.noise_eps * noise

# re-normalize in legal moves
p_ = p_ * bit_to_array(legal_moves, 64)

p_ = self.normalize(p_, temperature)

Maybe a bug here: p_ is NOT a probability distribution over legal moves until you do the normalization in the code afterwards. But in the dirichlet_noise_only_for_legal_moves == True case, the Dirichlet noise already is a probability distribution over legal moves. That is, you are adding Dirichlet noise to a non-probability-distribution, which I believe is not consistent with the AlphaGoZero paper.

I happened to find that my implementation had this bug too, and after I fixed it, my AI's strength improved significantly.
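The kind of fix I mean looks roughly like this (a sketch with illustrative names, not this repo's code): restrict the prior to legal moves and normalize it first, so that both terms of the mixture are probability distributions over legal moves.

import numpy as np

def mix_root_noise(p, legal_mask, alpha, eps):
    # Restrict the prior to legal moves and normalize it BEFORE mixing,
    # so both terms are probability distributions over legal moves.
    p = p * legal_mask
    p = p / p.sum()

    noise = np.zeros_like(p, dtype=float)
    legal_idx = np.flatnonzero(legal_mask)
    noise[legal_idx] = np.random.dirichlet([alpha] * len(legal_idx))

    return (1 - eps) * p + eps * noise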

how much does share_mtcs_info_in_self_play contribute to strength?

@mokemokechicken now you play to a draw with NTest:13. Good job!

I noticed the design of share_mtcs_info_in_self_play. It shares MCTS info among different games played by the same model. This is different from the AlphaGoZero/AlphaZero papers, but I imagine it would improve self-play quality a lot. How is it in real practice?

And how much memory usage does it bring?

Automatically running ntest

@mokemokechicken in the README you said:

NBoard cannot play with two different engines (maybe).

I feel it can, in another way. From the source code of NBoard, it seems it just runs an ntest executable and then communicates with it via the NBoard protocol. Since you understand that protocol well, you could also launch ntest and communicate with it from your code. Then you could play your model against ntest without manual intervention. If you want to check the game details, just save the moves and replay them.
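Roughly what I have in mind (a generic sketch; the actual NBoard protocol commands are not shown here, and the executable path is just an example):

import subprocess

# Launch an external engine and talk to it over stdin/stdout.
engine = subprocess.Popen(
    ["./ntest"],                                  # example path to the engine executable
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

engine.stdin.write("<protocol command here>\n")   # placeholder, not a real NBoard command
engine.stdin.flush()
reply = engine.stdout.readline()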

Is it running multiple searches at the same time?

coroutine_list = []
for it in range(self.play_config.simulation_num_per_move):
    cor = self.start_search_my_move(own, enemy)
    coroutine_list.append(cor)
coroutine_list.append(self.prediction_worker())
loop.run_until_complete(asyncio.gather(*coroutine_list))

At first I thought this code was searching in simulation_num_per_move threads at the same time.
But I see the async functions are not called in multiple threads.
How does this work, and how can I search the tree in multiple threads?

a question about reloading the model

self.try_reload_model()

When reloading, some self-play games may still be in progress. Will this reloading be OK? Some games will use the old model in the first half of the game and the new model in the second half. Some more discussion can be seen here. In another fork there is a fix for this, but it causes some idle CPU time.

Or are you aware of this issue and think it is OK? As we can see, your model is making good progress as well...
