pragtob / rubykon Goto Github PK

View Code? Open in Web Editor NEW

35.0 6.0 2.0 244 KB

A Go-Bot written in Ruby. Work not so much in progress anymore but also not quite done 😅

License: Other

Ruby 96.67% Shell 3.33%

rubykon's Introduction

Rubykon

A Go-Engine being built in Ruby.

Status?

mostly not updated any more, if there's new work then it's focussed on bencharmks

There is a CLI with which you can play, it does a full UCT MCTS. Still work to do on making move generation and scoring faster. Also there is no AMAF/RAVE implementation yet (which would make it a lot stronger) and it also does not use any expert knowledge right now. So still a lot to do, but it works.

Sub gems

Right now the mcts and benchmark/avg gem that I wrote for this are still embedded in here. They are bound to be broken out and released as separate gems to play with. If you want to use them now, just use rubykon and you can require mcts or benchmark/avg :)

Why would you build a Go-Bot in Ruby?

Cause it's fun.

Setting up

It should work with any standard ruby implementation. bundle install and you're ready to go.

Benchmarking

If you're here for the benchmarking, then there are a couple of useful scripts. Assuming you have asdf with both the ruby and java plugins you can run setup_all.sh which installs all rubies and JVMs for the benchmark.

You can then:

cd benchmark/
benchmark.sh mcts_avg.rb

This runs the mcts_avg.rb (adjust timings as necessary) benchmark with all the configured ruby installations. This can take a long while so you might want to comment out certain rubies/JVMs or entire sections.

Contributing

While not actively developped contributions are welcome.

Especially performance related contributions should ideally come with before/after benchmark results. Not for all ruby versions mentioned in benchmark.sh but a fair subset of them. At best you'd run mcts_avg - feel free to adjust the warmup times to be massively smaller for implementations that don't need it (see Warmup section for indications).

Ideally it'd have benchmarks for:

recent CRuby (plus points: also with --jit)
recent JRuby with invokedynamic
recent truffleruby (ideally both native and jvm but one is enough)

If that's too much ruby setup (I understand) feel free to PR and let me run the benchmarks for the missing implementations. Might take me a while though ;)

The goal of this is to make sure two things:

a.) We don't accidentally make performance worse (I had ideas for great optimizations that actually made it worse...) b.) We don't implement optimizations that benefit one implementation while making the others worse

Blog Posts

These days this is mostly used for writing performance blog posts. You can find all of them at https://pragtob.wordpress.com/tag/rubykon/

rubykon's People

Contributors

Stargazers

Watchers

Forkers

philihp lingkodsoft

rubykon's Issues

Handle wrong moves in the CLI games gracefully

not just fail them ;)

Implement heuristics to optionally use

Parallelize MCTS playout

playouts can be parallelized pretty easily while the tree is still held in one spot.

Thinking using celluloid/concurrent-ruby actors to do it.

Implement more efficient liberty counting

It doesn't support exact liberty counting, but for very lightweight playouts it's cool not to support that as it should be much faster. However, make it configurable to use the current data structure or this one (as rubykone might want to do capture races in the future and the code works and is there).

Described in this email to the computer go mailing list (couldn't find the paper so far), it is also described in this video:

Track the liberties for a group/chain using a simplified data structure that looks like struct ChainLibertyInfo { int liberty_count; int liberty_sum; int liberty_sum_squares; }

The interesting thing about this data structure is that it's a Set which can answer the isInAtari() query and the getAtariPoint() query, without having to track a lot of data. But it doesn't support enumerating all the liberties.

This data structure supports adding and removing liberties, and crucially, it supports adding the same liberty multiple times, and only counting it once for atari purposes. So, when you add a stone to a chain, you just examine the 4 neighbors of the stone, and for each one, if it's empty, add it to the liberties for that chain. Double or triple-counting the same point will still work.

Address each point on the board as an integer. When you add a liberty, that looks like: void add(int pt) { ++liberty_count; liberty_sum += pt; liberty_sum_squares += pt*pt; }

Removing liberties and merging chains is pretty straightforward.

If you add point 3 to the chain four times, then you have { liberty_count=4; liberty_sum=12; liberty_sum_squares=36; }

If you then remove point 3 four times, you get back to empty { liberty_count=0; liberty_sum=0; liberty_sum_squares=0; }

isInAtari() looks like: (liberty_count > 0) && (liberty_count * liberty_sum_squares) == (liberty_sum * liberty_sum)

(Note that this answers 'true' for the above case of adding point 3 to the chain 4 times... This chain is in atari because it only has one liberty, even though it was multiply-added)

If the chain is in atari, then the atari point is (liberty_sum / liberty_count);

This allows you to track information for life/death of groups with very little overhead. But if your heavy playouts need to know actual liberty count for capture race etc, then you'll still need something else.

Implement wdyt command and remove the current output

The wdyt command should print out what rubykon thought when playing the last move. E.g. the most popular moves it chose.

At the same time also remove the current output.

Implement GTP to play via a gui and on servers

Reuse objects

We create tons and tons of new objects for playouts, which in turn increases GC and object allocation time. Some of these objects we need, but specifically for the playouts instead of duping we could have one object that we then set to the desired state before playout.

I.e. if we have actors, this could be one game state per actore. Interesting to see the performance impact of that.

Implement parameters for CLI itnerface

Some of the questions that are asked right now (board size, number of playouts) should also be able to be passed through the CLI - e.g. through -b and -p and then of course the questions should not be asked anymore.

Implement more efficient random move generation

Instead of initiating a search again and again on the board from a random point for a valid moves keep a set of all valid moves and only remove/add to that depending on what happens on the board.

Should be much faster, especially for showing all valid moves. It's also much trickier as some moves lead to other moves being suicide moves, or captures lead to new valid moves, or a move makes a previous suicide move valid, because it is now capturing...

It's rather hard and needs insights/hooks into the group tracking, basically on the other end of the code base. Which is challenging, slowly I start to understand why most go code bases seem to be a giant mash of things with few OOP concepts :)

benchmark/avg: Implement option to remove outliers

Even with pretty big warm up times and run times (110s and 120s) truffle/graal still shows a high standard deviation:

[43.949, 18.839, 18.992, 22.223, 12.722, 11.132]
9x9 10_000 iterations         2.82 i/min  21.31 s (avg)  (± 50.77%)
[8.564, 5.856, 4.989, 5.906, 5.035, 5.461, 4.767, 4.839, 5.528, 4.724, 4.96, 4.91, 5.767, 4.65, 5.082, 4.518, 5.308, 5.398, 5.017, 4.579, 4.862, 4.715, 4.482, 5.186]
13x13 2_000 iterations        11.51 i/min  5.21 s (avg)  (± 15.47%)
[6.648, 5.634, 5.659, 7.776, 6.143, 6.006, 6.175, 6.36, 7.261, 5.867, 6.998, 6.226, 6.395, 5.742, 5.902, 6.533, 7.221, 6.662, 6.841]
19x19 1_000 iterations        9.34 i/min  6.42 s (avg)  (± 9.04%)

For now, tuning the the warmup times :)

Port to crystal - for performance and profit and benchmark comparisons!

Of course, ruby is certainly not the most efficient language to implement this in, it's a toy thing.