rl-popit

Welcome! You can play the game on my website. Give it a try!

https://crema.evalieben.cn/game/

[Screenshot of the game, 2023-01-03]

The game is currently only available in Chinese.

Neural Network Architecture

The model structure is based on DeepMind's work on AlphaGo Zero:

Silver, D., Schrittwieser, J., Simonyan, K. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017). https://doi.org/10.1038/nature24270

The best model, resnet3-64, consists of:

  • 1 input convolutional layer
  • 3 residual blocks (2 convolutional layers + 1 skip connection for each block)
  • 1 policy head (1 convolutional layer + 1 fully connected layer)
  • 1 value head (1 convolutional layer + 2 fully connected layers)

Each convolutional layer has 64 features (the input layer takes 2 input features). The network structure resembles AlphaGo Zero's but has far fewer features and residual blocks (my game is much simpler, after all).
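To put the size difference in numbers, here is a small sketch that counts the parameters of the convolutional trunk (the input layer plus the three residual blocks). It assumes 3x3 kernels with one bias per output channel and reads "2 features" as 2 input planes. None of these details is stated in this README. Batch normalization and the two heads are left out because their sizes depend on the board size and normalization choices.

```python
# Sketch: parameter count of the resnet3-64 convolutional trunk.
# Assumptions (not from the README): 3x3 kernels, one bias per output
# channel, and 2 input planes. Batch norm and the policy/value heads
# are ignored.


def conv_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Weights plus biases of a single 2D convolution."""
    return in_ch * out_ch * kernel * kernel + out_ch


def trunk_params(in_planes: int = 2, width: int = 64, blocks: int = 3) -> int:
    """Input convolution followed by `blocks` residual blocks.

    Each residual block has two width->width convolutions; the skip
    connection is an identity addition and adds no parameters.
    """
    total = conv_params(in_planes, width)             # input convolutional layer
    total += blocks * 2 * conv_params(width, width)   # 2 convolutions per block
    return total


if __name__ == "__main__":
    print(trunk_params())  # 222784
```

Under the same counting assumptions, `trunk_params(17, 256, 19)` (17 input planes, 256 filters and 19 residual blocks, as in the 20-block AlphaGo Zero network) returns 22,462,464, roughly 100 times the resnet3-64 trunk.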

Method

The neural network is trained with Proximal Policy Optimization (PPO). I also tweaked the original PPO implementation following this article:

The 37 Implementation Details of Proximal Policy Optimization https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
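The core of PPO is the clipped surrogate objective from the original PPO paper: the probability ratio between the new and old policy is clipped so that one update cannot move the policy too far. The plain-Python sketch below computes that loss for a batch of log-probabilities. The clip range `clip_eps=0.2` is the PPO paper's default and only an assumption here, since the README does not give the value used; a real training loop would compute this on tensors with automatic differentiation.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss (negated, so it is minimized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    clip_eps: assumed clip range (PPO paper default), not from the README.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)          # pi_new(a|s) / pi_old(a|s)
        unclipped = ratio * adv
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * adv
        total += min(unclipped, clipped)           # pessimistic bound
    return -total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms means that a ratio of 2 with a positive advantage only earns credit up to 1.2, while the same ratio with a negative advantage is penalized in full.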

Several of these details work very well for my model, including entropy maximization, advantage normalization and global gradient clipping. Each training cycle begins with 128 vectorized environments sampling game states, followed by backpropagation with a minibatch size of 2048. The reward scheme is simple: if the agent wins a game, every action it took in that game receives a reward of 1; otherwise, every action receives -1.
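The reward rule and three of the tweaks mentioned above can be sketched in plain Python as follows. This is an illustration, not the project's code: the function names are hypothetical, the normalization uses the population standard deviation with a small `eps`, and the entropy coefficient and gradient-norm limit the project uses are not given in the README, so none appear here.

```python
from __future__ import annotations

import math


def outcome_rewards(num_actions: int, agent_won: bool) -> list[float]:
    """Reward rule from the README: every action in a won game gets +1,
    every action in a lost game gets -1."""
    return [1.0 if agent_won else -1.0] * num_actions


def normalize(advantages: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage normalization: shift to zero mean and scale to unit std."""
    n = len(advantages)
    mean = sum(advantages) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in advantages) / n)
    return [(a - mean) / (std + eps) for a in advantages]


def entropy(probs: list[float]) -> float:
    """Entropy of a categorical policy. Entropy maximization subtracts
    (coefficient * entropy) from the loss to keep the policy exploratory."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def clip_by_global_norm(grads: list[list[float]], max_norm: float) -> list[list[float]]:
    """Global gradient clipping: if the L2 norm over ALL gradients exceeds
    max_norm, scale every gradient by the same factor."""
    total = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total <= max_norm:
        return grads
    scale = max_norm / total
    return [[g * scale for g in tensor] for tensor in grads]
```

Clipping by the global norm, rather than layer by layer, shrinks the whole update uniformly, so the direction of the gradient step is preserved.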

Training Curve

The horizontal axis shows the number of epochs and the vertical axis shows the win rate (%). Each curve corresponds to one opponent that plays with an older snapshot of the model. Once the win rate reaches 90%, the old opponent model is dropped and the latest model is saved for the opponent to use. Training becomes much harder after 1,000 epochs, and the model finally converges at 15,000 epochs.
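The opponent-update rule described above can be sketched as a small helper. `OpponentPool` is a hypothetical name, and how the win rate is measured is an assumption: here it is taken over the most recent `window` games (100 by default), which the README does not specify. The 90% threshold comes from the README.

```python
from __future__ import annotations

import copy


class OpponentPool:
    """Hypothetical helper: holds one frozen opponent snapshot and replaces it
    with the latest learner once the learner's win rate reaches `threshold`."""

    def __init__(self, model, threshold: float = 0.9, window: int = 100):
        self.opponent = copy.deepcopy(model)   # frozen copy the learner plays against
        self.threshold = threshold             # 90% per the README
        self.window = window                   # assumed: games per win-rate estimate
        self.results: list[bool] = []
        self.generation = 0                    # number of opponent replacements so far

    def record(self, learner_won: bool, model) -> bool:
        """Log one game result; return True if the opponent was replaced."""
        self.results.append(learner_won)
        self.results = self.results[-self.window:]   # keep only the recent games
        win_rate = sum(self.results) / len(self.results)
        if len(self.results) == self.window and win_rate >= self.threshold:
            self.opponent = copy.deepcopy(model)     # drop old model, save latest
            self.results = []
            self.generation += 1
            return True
        return False
```

Each generation of `self.opponent` would correspond to one curve in the plots, with a new curve starting whenever `record` returns True.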

Win Rate During Epochs 0 to 1.4k

[Figure: win rate (%) vs. epoch]

Win Rate During Epochs 1k to 15k

[Figure: win rate (%) vs. epoch]

Contributors

crema-lida