
minigo's Introduction

Minigo: A minimalist Go engine modeled after AlphaGo Zero, built on MuGo

This is an implementation of a neural-network based Go AI, using TensorFlow. While inspired by DeepMind's AlphaGo algorithm, this project is not a DeepMind project nor is it affiliated with the official AlphaGo project.

This is NOT an official version of AlphaGo

Repeat, this is not the official AlphaGo program by DeepMind. This is an independent effort by Go enthusiasts to replicate the results of the AlphaGo Zero paper ("Mastering the Game of Go without Human Knowledge," Nature), with some resources generously made available by Google.

Minigo is based off of Brian Lee's "MuGo" -- a pure Python implementation of the first AlphaGo paper "Mastering the Game of Go with Deep Neural Networks and Tree Search" published in Nature. This implementation adds features and architecture changes present in the more recent AlphaGo Zero paper, "Mastering the Game of Go without Human Knowledge". More recently, this architecture was extended for Chess and Shogi in "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". These papers will often be abridged in Minigo documentation as AG (for AlphaGo), AGZ (for AlphaGo Zero), and AZ (for AlphaZero) respectively.

Goals of the Project

  1. Provide a clear set of learning examples using Tensorflow, Kubernetes, and Google Cloud Platform for establishing Reinforcement Learning pipelines on various hardware accelerators.

  2. Reproduce the methods of the original DeepMind AlphaGo papers as faithfully as possible, through an open-source implementation and open-source pipeline tools.

  3. Provide our data, results, and discoveries in the open to benefit the Go, machine learning, and Kubernetes communities.

An explicit non-goal of the project is to produce a competitive Go program that establishes itself as the top Go AI. Instead, we strive for a readable, understandable implementation that can benefit the community, even if that means our implementation is not as fast or efficient as possible.

While this product might produce such a strong model, we hope to focus on the process. Remember, getting there is half the fun. :)

We hope this project is an accessible way for interested developers to get access to a strong Go model, with an easy-to-understand platform of Python code available for extension, adaptation, etc.

If you'd like to read about our experiences training models, see RESULTS.md.

To see our guidelines for contributing, see CONTRIBUTING.md.

Getting Started

This project assumes you have Python 3 with virtualenv/virtualenvwrapper available; the cloud-based examples additionally use the Google Cloud SDK (gcloud and gsutil).

The Hitchhiker's Guide to Python has a good intro to Python development and virtualenv usage. The instructions after this point haven't been tested outside of a virtualenv.

pip3 install virtualenv
pip3 install virtualenvwrapper

Install Bazel

BAZEL_VERSION=0.24.1
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
chmod 755 bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo ./bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh

Install TensorFlow

First, set up and enter your virtualenv, then install the shared requirements:

pip3 install -r requirements.txt

Then, you'll need to choose whether to install the GPU or CPU TensorFlow package:

  • GPU: pip3 install "tensorflow-gpu==1.15.0"
    • Note: TensorFlow 1.13.0+ requires CUDA 10.0.
  • CPU: pip3 install "tensorflow==1.15.0"

Setting up the Environment

You may want to use a cloud project for resources. If so, set:

PROJECT=foo-project

Then, running

source cluster/common.sh

will set up defaults for the other environment variables.

Running unit tests

./test.sh

To run individual modules

BOARD_SIZE=9 python3 tests/run_tests.py test_go
BOARD_SIZE=19 python3 tests/run_tests.py test_mcts

Automated Tests

Test Dashboard

To automatically test PRs, Minigo uses Prow, a test framework created by the Kubernetes team for testing changes in a hermetic environment. We use Prow for running unit tests, linting our code, and launching our test Minigo Kubernetes clusters.

You can see the status of our automated tests by looking at the Prow and Testgrid UIs.

Basics

All commands are compatible with either Google Cloud Storage as a remote file system, or your local file system. The examples here use GCS, but local file paths will work just as well.

To use GCS, set the BUCKET_NAME variable and authenticate via gcloud auth application-default login. Otherwise, all commands fetching files from GCS will hang.

For instance, this would set a bucket, authenticate, and then look for the most recent model.

# When you first start we recommend using our minigo-pub bucket.
# Later you can setup your own bucket and store data there.
export BUCKET_NAME=minigo-pub/v9-19x19
gcloud auth application-default login
gsutil ls gs://$BUCKET_NAME/models | tail -4

Which might look like:

gs://$BUCKET_NAME/models/000737-fury.data-00000-of-00001
gs://$BUCKET_NAME/models/000737-fury.index
gs://$BUCKET_NAME/models/000737-fury.meta
gs://$BUCKET_NAME/models/000737-fury.pb

These four files comprise the model. Commands that take a model as an argument usually need the path to the model basename, e.g. gs://$BUCKET_NAME/models/000737-fury

You'll need to copy them to your local disk. This fragment copies the files associated with $MODEL_NAME to the directory specified by MINIGO_MODELS:

MODEL_NAME=000737-fury
MINIGO_MODELS=$HOME/minigo-models
mkdir -p $MINIGO_MODELS/models
gsutil ls gs://$BUCKET_NAME/models/$MODEL_NAME.* | \
       gsutil cp -I $MINIGO_MODELS/models

Selfplay

To watch Minigo play a game, you need to specify a model. Here's an example that plays using the model downloaded above:

python3 selfplay.py \
  --verbose=2 \
  --num_readouts=400 \
  --load_file=$MINIGO_MODELS/models/$MODEL_NAME

where --num_readouts sets how many searches to make per move. Timing information and statistics will be printed at each move. Setting verbosity to 3 or higher will print a board at each move.

Playing Against Minigo

Minigo uses the GTP protocol, and you can use any GTP-compliant program with it.

# Latest model should look like: /path/to/models/000123-something
LATEST_MODEL=$(ls -d $MINIGO_MODELS/* | tail -1 | cut -f 1 -d '.')
python3 gtp.py --load_file=$LATEST_MODEL --num_readouts=$READOUTS --verbose=3

After some loading messages, it will display GTP engine ready, at which point it can receive commands. GTP cheatsheet:

genmove [color]             # Asks the engine to generate a move for a side
play [color] [coordinate]   # Tells the engine that a move should be played for `color` at `coordinate`
showboard                   # Asks the engine to print the board.
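
You can also drive the engine over GTP from a script. The following is a minimal sketch (not part of Minigo itself), assuming the engine is started with the flags shown above, answers each command with a response terminated by a blank line (standard GTP), and sends its debug output to stderr; the gtp_command helper and the model path are illustrative.

# Minimal sketch of scripting the Minigo GTP engine from Python.
# Assumptions: gtp.py accepts the flags shown above; debug output goes to
# stderr; `gtp_command` is a hypothetical helper, not a Minigo API.
import subprocess

engine = subprocess.Popen(
    ["python3", "gtp.py", "--load_file", "outputs/models/000000-bootstrap",
     "--num_readouts", "50"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def gtp_command(cmd):
    """Send one GTP command and return the response line(s)."""
    engine.stdin.write(cmd + "\n")
    engine.stdin.flush()
    lines = []
    while True:
        line = engine.stdout.readline()
        if line.strip() == "":        # GTP responses end with a blank line
            break
        lines.append(line.strip())
    return "\n".join(lines)

print(gtp_command("play black Q16"))
print(gtp_command("genmove white"))
print(gtp_command("showboard"))
gtp_command("quit")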

One way to play via GTP is to use gogui-display (which implements a UI that speaks GTP). You can download the gogui set of tools at http://gogui.sourceforge.net/. See also documentation on interesting ways to use GTP.

gogui-twogtp -black 'python3 gtp.py --load_file=$LATEST_MODEL' -white 'gogui-display' -size 19 -komi 7.5 -verbose -auto

Another way to play via GTP is to watch it play against GnuGo, while spectating the games:

BLACK="gnugo --mode gtp"
WHITE="python3 gtp.py --load_file=$LATEST_MODEL"
TWOGTP="gogui-twogtp -black \"$BLACK\" -white \"$WHITE\" -games 10 \
  -size 19 -alternate -sgffile gnugo"
gogui -size 19 -program "$TWOGTP" -computer-both -auto

Training Minigo

Overview

The following sequence of commands will allow you to do one iteration of reinforcement learning on 9x9. These are the basic commands used to produce the models and games referenced above.

The commands are

  • bootstrap: initializes a random model
  • selfplay: plays games with the latest model, producing data used for training
  • train: trains a new model with the selfplay results from the most recent N generations.

Training works via tf.Estimator; a working directory manages checkpoints and training logs, and the latest checkpoint is periodically exported to GCS, where it gets picked up by selfplay workers.

Configuration for things like "where do debug SGFs get written?", "where does training data get written?", and "where do the latest models get published?" is managed by the helper scripts in the rl_loop directory. Those helper scripts execute the same commands as demonstrated below. Configuration for things like "what size network is being used?" or "how many readouts during selfplay?" can be passed in as flags. The mask_flags.py utility helps ensure all parts of the pipeline use the same network configuration.

All local paths in the examples can be replaced with gs:// GCS paths, and the Kubernetes-orchestrated version of the reinforcement learning loop uses GCS.

Bootstrap

This command initializes your working directory for the trainer and creates a random model. This random model is also exported to --export_path so that selfplay can immediately start playing with it.

If these directories don't exist, bootstrap will create them for you.

export MODEL_NAME=000000-bootstrap
python3 bootstrap.py \
  --work_dir=estimator_working_dir \
  --export_path=outputs/models/$MODEL_NAME

Self-play

This command starts self-play, writing its raw game data as tf.Examples as well as SGF files into the specified directories.

python3 selfplay.py \
  --load_file=outputs/models/$MODEL_NAME \
  --num_readouts 10 \
  --verbose 3 \
  --selfplay_dir=outputs/data/selfplay \
  --holdout_dir=outputs/data/holdout \
  --sgf_dir=outputs/sgf

Training

This command takes a directory of tf.Example files from selfplay and trains a new model, starting from the latest model weights in the estimator working directory (--work_dir).

Run the training job:

python3 train.py \
  outputs/data/selfplay/* \
  --work_dir=estimator_working_dir \
  --export_path=outputs/models/000001-first_generation

At the end of training, the latest checkpoint will be exported to --export_path. Additionally, you can follow along with the training progress in TensorBoard: if you point TensorBoard at the estimator working directory, it will find the training log files and display them.

tensorboard --logdir=estimator_working_dir
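
Putting the three steps together, one full iteration (bootstrap, selfplay, train) can also be scripted directly. The sketch below simply shells out to the same commands shown above; it is not the rl_loop orchestration itself, and the paths and generation names are illustrative.

# Illustrative sketch of one reinforcement learning iteration, shelling out
# to the commands described above. Paths and generation names are examples.
import glob
import subprocess

WORK_DIR = "estimator_working_dir"
MODELS = "outputs/models"

def run(*args):
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# 1. bootstrap: create a random model (only needed once, at the very start).
run("python3", "bootstrap.py",
    "--work_dir=" + WORK_DIR,
    "--export_path=" + MODELS + "/000000-bootstrap")

# 2. selfplay: generate training data with the current model.
run("python3", "selfplay.py",
    "--load_file=" + MODELS + "/000000-bootstrap",
    "--num_readouts", "10",
    "--selfplay_dir=outputs/data/selfplay",
    "--holdout_dir=outputs/data/holdout",
    "--sgf_dir=outputs/sgf")

# 3. train: fit the next generation on the selfplay data (the glob stands in
# for the shell wildcard used in the train.py example above).
run("python3", "train.py",
    *glob.glob("outputs/data/selfplay/*"),
    "--work_dir=" + WORK_DIR,
    "--export_path=" + MODELS + "/000001-first_generation")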

Validation

It can be useful to set aside some games as a 'validation set' for tracking whether the model is overfitting. One way to do this is with the validate command.

Validating on holdout data

By default, Minigo will hold out 5% of selfplay games for validation. This can be changed by adjusting the holdout_pct flag on the selfplay command.

With this setup, rl_loop/train_and_validate.py will validate on the same window of games that were used to train, writing TensorBoard logs to the estimator working directory.

Validating on a different set of data

This might be useful if you have some known set of 'good data' to test your network against, e.g., a set of pro games. Assuming you've got a set of SGFs with the proper komi and board size, you'll want to preprocess them into .tfrecord files by running something similar to the following:

import glob
import preprocessing

# Collect the SGF files to convert (the directory here is just an example).
filenames = glob.glob("pro_games/**/*.sgf", recursive=True)
for f in filenames:
    try:
        preprocessing.make_dataset_from_sgf(f, f.replace(".sgf", ".tfrecord.zz"))
    except Exception as e:
        # Report files that fail to convert instead of silently swallowing errors.
        print(f, e)

Once you've collected all the files in a directory, running validation is as easy as:

python3 validate.py \
  validation_files/ \
  --work_dir=estimator_working_dir \
  --validation_name=pro_dataset

validate.py will glob all the .tfrecord.zz files under the directories given as positional arguments and compute the validation error for the positions in those files.

Retraining a model

The training data for most of Minigo's models up to v13 is publicly available in the minigo-pub Cloud storage bucket, e.g.:

gsutil ls gs://minigo-pub/v13-19x19/data/golden_chunks/

For models v14 and onwards, we started using Cloud BigTable and are still working on making that data public.

Here's how to retrain your own model from this source data using a Cloud TPU:

# I wrote these notes using our existing TPU-enabled project, so they're missing
# a few preliminary steps, like setting up a Cloud account, creating a project,
# etc. New users will also need to enable Cloud TPU on their project using the
# TPUs panel.

###############################################################################

# Note that you will be billed for any storage you use and also while you have
# VMs running. Remember to shut down your VMs when you're not using them!

# To use a Cloud TPU on GCE, you need to create a special TPU-enabled VM using
# the `ctpu` tool. First, set up some environment variables:
#   GCE_PROJECT=<your project name>
#   GCE_VM_NAME=<your VM's name>
#   GCE_ZONE=<the zone in which you want to bring up your VM, e.g. us-central1-f>

# In this example, we will use the following values:
GCE_PROJECT=example-project
GCE_VM_NAME=minigo-etpu-test
GCE_ZONE=us-central1-f

# Create the Cloud TPU enabled VM.
ctpu up \
  --project="${GCE_PROJECT}" \
  --zone="${GCE_ZONE}" \
  --name="${GCE_VM_NAME}" \
  --tf-version=1.13

# This will take a few minutes and you should see output similar to the
# following:
#   ctpu will use the following configuration values:
#         Name:                 minigo-etpu-test
#         Zone:                 us-central1-f
#         GCP Project:          example-project
#         TensorFlow Version:   1.13
#  OK to create your Cloud TPU resources with the above configuration? [Yn]: y
#  2019/04/09 10:50:04 Creating GCE VM minigo-etpu-test (this may take a minute)...
#  2019/04/09 10:50:04 Creating TPU minigo-etpu-test (this may take a few minutes)...
#  2019/04/09 10:50:11 GCE operation still running...
#  2019/04/09 10:50:12 TPU operation still running...

# Once the Cloud TPU is created, `ctpu` will have SSHed you into the machine.

# Remember to set the same environment variables on your VM.
GCE_PROJECT=example-project
GCE_VM_NAME=minigo-etpu-test
GCE_ZONE=us-central1-f

# Clone the Minigo Github repository:
git clone --depth 1 https://github.com/tensorflow/minigo
cd minigo

# Install virtualenv.
pip3 install virtualenv virtualenvwrapper

# Create a virtual environment
virtualenv -p /usr/bin/python3 --system-site-packages "${HOME}/.venvs/minigo"

# Activate the virtual environment.
source "${HOME}/.venvs/minigo/bin/activate"

# Install Minigo dependencies (TensorFlow for Cloud TPU is already installed as
# part of the VM image).
pip install -r requirements.txt

# When training on a Cloud TPU, the training work directory must be on Google Cloud Storage.
# You'll need to choose your own globally unique bucket name.
# The bucket location should be close to your VM.
GCS_BUCKET_NAME=minigo_test_bucket
GCE_BUCKET_LOCATION=us-central1
gsutil mb -p "${GCE_PROJECT}" -l "${GCE_BUCKET_LOCATION}" "gs://${GCS_BUCKET_NAME}"

# Run the training script and note the location of the training work_dir
# it reports, e.g.
#    Writing to gs://minigo_test_bucket/train/2019-04-25-18
./oneoffs/train.sh "${GCS_BUCKET_NAME}"

# Launch tensorboard, pointing it at the work_dir reported by the train.sh script.
tensorboard --logdir=gs://minigo_test_bucket/train/2019-04-25-18

# After a few minutes, TensorBoard should start updating.
# Interesting graphs to look at are value_cost_normalized, policy_cost and policy_entropy.

Running Minigo on a Kubernetes Cluster

See more at cluster/README.md

minigo's People

Contributors

amj, artasparks, briandersn, brilee, chsigg, delock, dependabot[bot], gitosaurus, godmoves, gr3mlin, jmgilmer, josephch405, killerducky, mihaimaruseac, nealwu, rajpratik71, schneiderl, sethtroisi, sligocki, tommadams, w-hat, walmsley, wxs


minigo's Issues

Autoformat the Python

We should use auto-formatters where possible. For python, there are a couple options. I'll probably go with autopep8, since that's pretty standard.

Upload SGFs without debug comments

This is a discussion issue. Currently our SGFs have a huge amount of debug junk. Ultimately, this means that most of our GCS usage is really big SGFs. For our 9x9 run, our GCS bucket contained 130 GB of data, and 100 GB of that was SGFs. Keeping just the moves would probably reduce the size by a factor of 10 to 100.
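
As a rough sketch of the potential savings (not something in the pipeline today), stripping the C[...] comment properties already removes most of the debug payload; the file name below is hypothetical, and a real version would need to handle ] characters escaped inside comments.

# Rough sketch: drop SGF comment properties, keeping only moves and metadata.
# Caveat: this naive regex does not handle ']' escaped as '\]' inside a comment.
import re

def strip_debug_comments(sgf_text):
    return re.sub(r"C\[[^\]]*\]", "", sgf_text)

with open("some-selfplay-game.sgf") as f:   # hypothetical file name
    print(len(strip_debug_comments(f.read())))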

softmax_cross_entropy_with_logits deprecated

Noticed while running tests

WARNING:tensorflow:From /Users/jhoak/inprogress/minigo/dual_net.py:274: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Code of Conduct

As is standard with all OSS projects, we should add a Code of Conduct. Since we're part of TensorFlow, we'll want to use the TF CoC.

Where to see who wins when playing with gnu go

Hi, I know this is a very naive question. I am new to Go. I played a game using my model against GNU Go, following the twogtp command in the README of this repository. Can anyone tell me how to check who wins?

The gnugo.sgf is below:

(;FF[4]CA[UTF-8]AP[gogui-twogtp:1.4.9]SZ[9]
KM[6.5]PB[GNU Go]PW[Somebot-000001-model]DT[2018-02-15]
C[Black command: gnugo --mode gtp
White command: python3 main.py gtp -l models/000001-model
Black version: 3.8
White version: 0.1
Result[Black]: B+78.5
Result[White]: ?
Host: xliu-linux3 (Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz)
Date: February 15, 2018 4:09:07 PM PST]
;B[ee];W[dg];B[cf];W[di];B[cd];W[fa];B[gb];W[ig];B[gg];W[ei]
;B[ff];W[gi];B[hh];W[fc];B[fb];W[eh];B[cg];W[if];B[he];W[cc]
;B[bc];W[ie];B[hd];W[da];B[dc];W[ih];B[ch];W[ga];B[ha];W[ad]
;B[ed];W[be];B[bd];W[de];B[df];W[fg];B[hi];W[eb];B[hb];W[gd]
;B[db];W[ef];B[gh];W[bh];B[ci];W[ca];B[cb];W[ac];B[ab];W[af]
;B[ce];W[id];B[ic];W[ag];B[ba];W[gf];B[fe];W[ah];B[fh];W[ec]
;B[hf];W[ii];B[hg];W[ae];B[hc];W[ig];B[dd];W[if];B[];W[ge]
;B[];W[ie];B[];W[id];B[];W[ib];B[];W[])

And there is a gnugo.dat file:

Black: GNU Go

BlackCommand: gnugo --mode gtp

BlackLabel: GNU Go

BlackVersion: 3.8

Date: February 15, 2018 4:08:51 PM PST

Host: xliu-linux3 (Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz)

Komi: 6.5

Referee: -

Size: 9

White: Somebot-000001-model

WhiteCommand: python3 main.py gtp -l models/000001-model

WhiteLabel: Somebot-000001-model

WhiteVersion: 0.1

Xml: 0

#GAME RES_B RES_W RES_R ALT DUP LEN TIME_B TIME_W CPU_B CPU_W ERR ERR_MSG
0 B+78.5 ? ? 0 - 78 10.1 4.8 10.2 0 0

main.evaluate needs updating for vlosses, currently does not run.

Hi,

I used the bootstrap model to test the main.evaluate() function. The black and white players are the same model. However, while running, this error occurred:

File "./try.py", line 13, in
main.evaluate(black_model = black_model, white_model = white_model, output_dir = out_dir, readouts = 10, games = 1, verbose = 3)
File "/home/xliu/minigo/main.py", line 113, in evaluate
black_net, white_net, games, readouts, verbose)
File "/home/xliu/minigo/evaluation.py", line 66, in play_match
prob, val, up_to=pair[num_moves % 2].root)
File "/home/xliu/minigo/mcts.py", line 197, in incorporate_results
assert not self.position.is_game_over()
AssertionError

I cannot figure out the reason. Can anyone help me? Before this error, there was another error at the end of evaluation.py saying that the MCTS object does not have an is_done() attribute. I fixed it by changing p[0].is_done() and p[1].is_done() to p[0].root.is_done() and p[1].root.is_done().

Thank you for your help.

Feature: nice gogui/GTP extensions

Our GTP code has a nice place to put GTP extensions: gtp_extensions.py. It should be pretty easy to hook up some code to dump things in the format gogui/Sabaki like to see for variations, heatmaps, or whatever.

It'd be a good starter project for someone interested in getting acquainted with the codebase.

Test for "symmetry divergence" during training

One of the symptoms of value net overfitting that we found was that the 8 board symmetries would yield wildly different results when put through the value net. So instead of the value net returning roughly consistent estimates (stdev = 0.1 or so), it would disagree completely, with 7 symmetries saying B+ 0.99 and the 8th symmetry saying W+ 0.99.

It'd be nice to log the average stdev of the 8 symmetries (or some other characterization of how divergent the symmetries are.)
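
A minimal sketch of that logging, assuming a value_fn callable that maps an (N, N, C) feature stack to a scalar value estimate (in the real pipeline this would come from dual_net; the helper names below are ours, not Minigo APIs):

# Sketch: measure how much the value estimate varies across the 8 board
# symmetries. `value_fn` is a placeholder for the network's value head.
import numpy as np

def eight_symmetries(planes):
    """Yield the 8 dihedral symmetries of an (N, N, C) feature stack."""
    for k in range(4):
        rot = np.rot90(planes, k, axes=(0, 1))
        yield rot
        yield np.flip(rot, axis=1)

def symmetry_divergence(value_fn, planes):
    values = np.array([value_fn(sym) for sym in eight_symmetries(planes)])
    return values.std(), values

# Example: log the average stdev over a batch of positions (both names are
# placeholders for whatever the selfplay worker has in hand).
# avg = np.mean([symmetry_divergence(value_fn, p)[0] for p in positions])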

A better solution for hyperparams

We have multiple hyperparams. Here's a short list:

  • go.N = 9, 19. Needs to stay constant through the entire training run. Currently being set via env variables
  • MCTS playout parameters: resign threshold, number of readouts, virtual loss parallelism, game depth limit, move threshold for softpick, probability to disable resign, tree reuse. Can be changed with each selfplay worker. Currently being set by hardcoded constants.
  • NN parameters: depth, conv layer width, fully connected layer width. Needs to stay constant through the entire training run. Currently being set via a makeshift hyperparams dict.
  • Training parameters: l2 loss, learning rate, momentum, training window size, shuffle buffer size, number of concurrent files, example select rate. Can be changed with each run of train() to get next gen. Set in a variety of places
  • Rate of training vs rate of game generation (aka how many self-play workers). This one is slightly linked to number of readouts; since training on an example is roughly 2x as slow as eval'ing an example, then to balance position generation rate, 1600 readouts would necessitate ~800 selfplay workers. Or if we wanted to generate twice as many positions as we would ever use, then we would double the number of selfplay workers.

SGF Analysis tools

It's easy to grab a bunch of SGFs with gsutil cp ... After that, we're computing statistics largely by grepping, which is getting slower as we add more stuff to our SGF comments.

It'd be nice to:
a. See opening move stats per generation, with reflection/symmetry taken into account. This could be made pretty fast by only reading the head of the file until the first move, etc. (see the sketch after this list).
b. Figure out what moves the network is moving towards/away. This involves looking at how the visit counts after search match with the priors -- if a move (on the whole) is getting more visits than the priors suggested, after d-noise, then we can expect that move to be more represented in the next generation, & vice versa.
c. It'd be nice to look for the 'most surprising moves' in a game -- e.g., greatest difference from policy prior and final exploration rate.
d. Similarly, there are some nice questions about the times in a game when it has many options vs when it only has one or two; some nice ways to explore these would be good.
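
For (a), something like the following is enough to pull the first move out of each SGF while reading only the head of the file; the directory is an assumption, and symmetry canonicalization and per-generation grouping are left out of this sketch.

# Sketch for (a): extract the first move from each SGF by reading only the
# head of the file. Canonicalizing for reflection/symmetry is not shown.
import collections
import glob
import re

FIRST_MOVE = re.compile(r";[BW]\[([a-s]{0,2})\]")

def first_move(path, head_bytes=512):
    with open(path, "r", errors="ignore") as f:
        match = FIRST_MOVE.search(f.read(head_bytes))
    return match.group(1) if match else None

counts = collections.Counter(
    first_move(p) for p in glob.glob("sgf/**/*.sgf", recursive=True))
print(counts.most_common(10))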

Decide on learning rate decay schedule.

We've used tf.train.exponential_decay(1e-2, global_step, 10 ** 7, 0.1). This isn't quite right, and it's possible it's the source of some of our overfitting woes. Let's check our math to get an equivalent to the AGZ paper.
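
For comparison, a piecewise-constant schedule is probably closer to what AGZ describes (the paper drops the learning rate from 1e-2 to 1e-3 and then 1e-4 at fixed step counts). The boundaries below are placeholders to check against the paper, not a verified match.

# Sketch of a piecewise-constant schedule (TF 1.x style). The boundary step
# counts are placeholders and should be checked against the AGZ paper.
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
boundaries = [400000, 600000]            # placeholder step counts
values = [1e-2, 1e-3, 1e-4]
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)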

Need better story for shared images/data

Right now, we're relying on a single project for images, GKE clusters, GCS buckets, etc. There are lots of problems with collisions, quotas, etc.

It should be easy to spin up a new project (get quota, which isn't terribly easy), and then run the system.

Current plan is to use the cloud project minigo-pub, but there's still some setup that's needed.

Create evaluation server

We should have a zoo where self-play variants can play against each other to establish ratings. This could just mean figuring out the infrastructure to automatically hook new training checkpoints into CGOS, ideally without overwhelming CGOS.

Better readout allocation / time handling

(from brilee)
Currently, our self-play and tournament mode both allocate a fixed number of seconds and/or reads.

Both modes suffer from the same problem, which is that they spend their readouts inefficiently. If a response is obvious and has 800 reads compared to its siblings, which have 10 reads each, then there's no point in doing another 600 reads to reach our set point of 1600 reads per move. On the other hand, if the current leading candidate has 800 reads but the second-place candidate discovered a new response and is rapidly catching up, we don't give the new candidate enough time to overtake the first-place candidate. Ideally, we should truncate search when all moves look roughly equally uninteresting to invest more time into (i.e., they have very similar action scores).

I don't care so much about tournament mode, but self-play could potentially achieve stronger play using fewer reads, which should speed up the RL feedback loop. That our tournament mode would get stronger is just a nice bonus.
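
One simple version of the first half of that truncation rule, as a sketch rather than anything Minigo implements today: stop searching once the runner-up can no longer catch the current best move with the readouts that remain.

# Sketch: terminate search early when the second-most-visited child can no
# longer overtake the most-visited one within the remaining readout budget.
def should_stop_early(child_visit_counts, readouts_done, readout_budget=1600):
    remaining = readout_budget - readouts_done
    counts = sorted(child_visit_counts, reverse=True)
    best, runner_up = counts[0], counts[1]
    return best - runner_up > remaining

# e.g. with visit counts [800, 10, 10, ...] and ~600 readouts left, the gap of
# 790 exceeds the remaining budget, so the obvious move is played immediately.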

Comment or code problem.

I was reading the code in mcts.py, here:

minigo/mcts.py

Line 239 in e7e686d

slightly larger than unity to encourage diversity in early play and

I found that the comment does not fit the code, and I was wondering which one is wrong.
The 0.95 would be OK if the value were a probability < 1, but at this point it is equal to child_N, which can also be > 1, so I guess the "squash" operation should come after the probability calculation. Or am I missing something?

Improve worker/training balance.

How should we handle self-play balance with training?

AGZ 20 block
5M games, 700 checkpoints = ~7k games/checkpoint

AGZ 40 block
29M games, 3.1M batch = 3100 checkpoints. = ~10k games/batch???

but also:
"In each iteration, αθ∗ plays 25,000 games of self-play" (AGZ)
So perhaps: the evaluator passes approximately 1/3-1/4 of all checkpoints.

AZ: 700k steps at double the batch size (4096)
21M games. no 'checkpoints'. Would be equivalent to 15k games/checkpoint in AGZ reckoning

These seem to suggest that having more games per generation is preferable, and suggests there's not as much concern about the number of generations represented in the training window.

So, I'd like this issue to resolve:

  • how we should target our worker/playout balance
  • what changes we'll need to make to achieve it. In particular, if our training/worker jobs are subject to preemption or are otherwise, uh, not robust, I'd like to brainstorm what we can do to keep the two jobs in balance.

CGOS play improvements

  • upgrade to the python client
  • integrate opponent name & game end results (new extensions?)

Jitter chunk sizes

Apparently shuffle queues will round-robin over the files they're reading. So if we're reading 64 parallel files that are exactly 10,000 examples each, we'll finish reading all 64 at exactly the same time, which could cause issues. To avoid that, jitter the chunk sizes to be +/-25% of the desired size.
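
A sketch of the jitter itself, assuming the chunk writer takes a target number of examples per chunk (the writer call in the comment is hypothetical):

# Sketch: jitter the target chunk size by +/-25% so parallel readers do not
# exhaust all 64 files at the same time.
import random

def jittered_chunk_size(target=10000, jitter=0.25):
    return int(target * random.uniform(1 - jitter, 1 + jitter))

# e.g. write_chunk(examples[:jittered_chunk_size()])   # hypothetical writer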

How to automatically adjust the resign threshold?

  • We've got a script that calculates our current false positive rate for a given threshold (see the sketch below).
  • We can parametrize the threshold for our workers by editing player_wrapper.sh.
  • We don't have a way to automatically calculate the correct false positive rate, edit the value in player_wrapper.sh, and then rebuild the image.

Alternate idea: We could also automatically calculate the correct false positive rate, stick it in a file, and have the workers read the file.
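
A sketch of the calculation behind the first bullet, assuming that for each holdout game played to the end with resignation disabled we have recorded the lowest value estimate the eventual winner's side reached (the names and data layout are illustrative, not the existing script's):

# Sketch: estimate the resign false-positive rate for a candidate threshold.
# A "false positive" is a game the eventual winner would have resigned.
def false_positive_rate(winner_min_values, threshold):
    """winner_min_values: for each no-resign holdout game, the lowest value
    estimate the eventual winner's side saw during the game."""
    games = list(winner_min_values)
    bad = sum(1 for v in games if v < -threshold)
    return bad / len(games) if games else 0.0

# e.g. false_positive_rate([-0.97, -0.2, 0.1, -0.99], 0.95) -> 0.5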

Create validation set (self-play and professional)

For some small percent of self-play games, we should set them aside, never train on them, and check the value net loss on these games. This should help us figure out if we're overfitting.

A variant on this is to create a validation set of pro games. Self-play validation sets are nice because there's not a very large library of pro 9x9 games. But for 19x19, a pro validation set is doable.

Create a Google Group for Discussions

Currently, our repo is where we track discussions (via Issues). We could create a Google Group (i.e., a mailing list) or use some other discussion forum (Slack). We don't want to fragment discussion, but it depends on what the community wants and how much participation we have (we don't want empty rooms).

This is a discussion issue -- feel free to chime in. Currently we don't have any plans here, but that could change based on community feedback.

Integrate with Automatic/Continuous Testing

We should use a test-runner that tests PRs and maybe even does continuous testing. TensorFlow tends to use Jenkins. However, K8S has produced an awesome runner called Prow. Since this is a K8S + TF project, I think it would be a great fit.

I'll take a look at doing that integration.

Pass around saved/exported models everywhere

Instead of having our selfplay workers constantly look for the .data-00000-of-00001, .index, and .meta files, we should properly export our graph. Not only would this make a lot of our scripts cleaner, it'd be a good way to run some optimizers, etc. prior to farming it all out.

Implement symmetries in training

We use random symmetries for MCTS selfplay, but we don't use random symmetries during training. We should add that to help stabilize training.
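
A sketch of what applying a random symmetry to a training example could look like, for an (N, N, C) feature stack and a policy target of length N*N + 1 with the trailing entry being pass; this is illustrative only, not the pipeline's actual code, and it assumes the policy vector is row-major over the same board layout as the feature planes.

# Sketch: apply one random dihedral symmetry to a training example.
# features: (N, N, C) planes; pi: policy target of length N*N + 1 (last = pass).
import random
import numpy as np

def random_symmetry(features, pi):
    n = features.shape[0]
    k = random.randrange(4)              # number of 90-degree rotations
    flip = random.random() < 0.5
    feats = np.rot90(features, k, axes=(0, 1))
    board_pi = np.rot90(pi[:-1].reshape(n, n), k)
    if flip:
        feats = np.flip(feats, axis=1)
        board_pi = np.flip(board_pi, axis=1)
    return feats, np.concatenate([board_pi.ravel(), pi[-1:]])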

cos-nvidia-driver-installer needs updating to k8s 1.9, TF 1.5

  • doesn't work for TF 1.5
  • might not be correct to begin with :)

GPUs on Kubernetes are changing how they are used and consumed in 1.8 (https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).

Our current daemonset job also installs the drivers AND exposes them for scheduling -- this seems to conflict with or otherwise trample on the drivers in the base NVIDIA image, and means that the CUDA 9 images don't actually work.

So the work here is to:

  • figure out what is needed to get a working device-plugin for k8s 1.8 that exposes our GPUs
  • get that device-plugin sorted with its NVIDIA drivers
  • get TF 1.5 working
  • update the docs to reflect the final solution

TensorFlow: DeprecationWarnings

When running the tests, I've noticed some DeprecationWarnings. These look like they're in the core TensorFlow libraries. I wonder if we need to update or if the TF team needs to update their code (probably the latter).

/Users/jhoak/.virtualenvs/mgz/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py:539: DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead
  return np.fromstring(tensor.tensor_content, dtype=dtype).reshape(shape)
/Users/jhoak/.virtualenvs/mgz/lib/python3.6/site-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
  if d.decorator_argspec is not None), _inspect.getargspec(target))

Using too much GPU memory

Running BOARD_SIZE=19 python3 /home/aolsen/projects/minigo/minigo/main.py gtp -l /home/aolsen/minigo-models/000199-lightning -r 100 -v 3, nvidia-smi shows it uses 2558MiB of GPU memory. After running genmove b it goes to 2662MiB. This is preventing me from running two minigo instances to test changes since my GPU only has 4GB.

LZ only uses 483MB for a larger 20x256 network (note LZ computes policy and value heads on the CPU).

@amj suggested this link https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory
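
The suggestion from that link boils down to enabling memory growth (or capping the per-process fraction) when the session is created; a TF 1.x sketch, shown here only as an illustration of the workaround:

# Sketch (TF 1.x): stop TensorFlow from grabbing all GPU memory up front.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# or cap it explicitly:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4
sess = tf.Session(config=config)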

Add Komi Feature

Right now, we train the AI to play with one komi setting (usually 7.5 or 6.5). Then the AI must play with that komi forever or it'll get confused about the score. That's a bit unfortunate, because it means that even if the AI gets or gives a handicap, komi still needs to be set at 7.5 for White.

I don't see this as a huge problem, but I think it's worth tracking this feature. It also might be fun to play around with this on a 9x9-trained AI.

selfplay function only plays 1 game, can we make it play many games in one call

Hi,
Right now the selfplay function only plays one game per call. I changed it to the following to make it play many games in one call. Can we just load the model once and let it play many games like this? It's a lot faster than loading the model every time we play a game.

def selfplay(
        load_file: "The path to the network model files",
        output_dir: "Where to write the games"="data/selfplay",
        holdout_dir: "Where to write the games"="data/holdout",
        output_sgf: "Where to write the sgfs"="sgf/",
        readouts: 'How many simulations to run per move'=100,
        verbose: '>=2 will print debug info, >=3 will print boards' = 1,
        resign_threshold: 'absolute value of threshold to resign at' = 0.95,
        holdout_pct: 'how many games to hold out for evaluation' = 0.05):
    _ensure_dir_exists(output_sgf)
    _ensure_dir_exists(output_dir)

    # Only load the model once.
    with timer("Loading weights from %s ... " % load_file):
        network = dual_net.DualNetwork(load_file)
        network.name = os.path.basename(load_file)

    # Let the model play many games.
    for i in range(5000):
        with timer("Playing game"):
            player = selfplay_mcts.play(
                network, readouts, resign_threshold, verbose)

        output_name = '{}-{}'.format(int(time.time()), socket.gethostname())
        game_data = player.extract_data()
        with gfile.GFile(os.path.join(output_sgf, '{}.sgf'.format(output_name)), 'w') as f:
            f.write(player.to_sgf())

        tf_examples = preprocessing.make_dataset_from_selfplay(game_data)

        # Hold out 5% of games for evaluation.
        if random.random() < holdout_pct:
            fname = os.path.join(holdout_dir, "{}.tfrecord.zz".format(output_name))
        else:
            fname = os.path.join(output_dir, "{}.tfrecord.zz".format(output_name))

        preprocessing.write_tf_examples(fname, tf_examples)

Support more GTP commands

In particular:

  • final_score
  • final_status_list

Maybe also some of the time commands, to make the CGOS player a little more robust.

Move python code to subdirectory

This is a discussion issue to track a discussion we were having. Should the Python code at the top level live at the top level or in a subdirectory? I'm of the opinion it would be cleaner and more standard for it to live in something like model/ or src/, which would keep the top level cleaner. Thoughts @brilee @amj ?

Reduce history length requirement

Nothing actionable here, just recording some thoughts:

Training from the last ~50 generations seems like an awfully long window. Early on in the training process, this seems like it would keep crappy games around for too long; later on in the training, as the RL process starts to asymptote, it seems like it might matter less. In both cases, it seems like the long history window is to prevent the network from forgetting how to play simpler moves.

All of this seems somewhat related to the "catastrophic forgetting" problem. Any solutions to that problem may be applicable somehow to the RL process.

Investigate/Monitor preemption rate

We're using preemptible GPUs. And that means that sometimes, we get preempted!
We should have better insights / monitoring around the preemption rate.

Rewrite self-play in more performant language

I've chatted in passing with @amj about this and I wanted to discuss it more formally. Self-play is, as I understand it, the most important part to speed up since more-games-faster means you can train the go-bot faster.

Here are a couple of options that I would consider:

  • C++
  • Go
  • Rust

Some factors:

Currently, we're leaning towards a rewrite in Go (maybe), since we believe that would be simple / easy to follow. Also, I've at least used Go quite a bit on Kubernetes, so there's an experience factor too.
