Comments (8)
Which temperature parameter do you mean?
- In AZ, the temperature is fixed at 1 for move selection in self-play: the engine picks each move with probability proportional to its visit count. I have no idea why you think the temperature should decrease with the length of the game. If the only goal is to ensure divergence (rather than to increase exploration), that would be reasonable, and it would match AlphaGo Zero, which used τ = 1 for the first 30 moves only. But exploration is good!
- There is also cfg_softmax_temp, which acts as an operator on the network outputs. Its main use is to allow some further tuning after the best network has been established. It also interacts with the UCT parameter.
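A minimal sketch of what a softmax temperature on the policy outputs typically looks like, assuming the operator divides the raw policy logits by the temperature before the softmax. The function name is hypothetical and this is not the engine's actual code. Values above 1 flatten the priors (which tends to spread the UCT search over more moves); values below 1 sharpen them.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw policy logits into probabilities: softmax(logits / T).

    T > 1 flattens the distribution, T < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 1.0))  # most mass on the first move
print(softmax_with_temperature(logits, 2.0))  # flatter: mass moves to the others
```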
from leela-chess.
Thank you, I mixed up the two parameters.
Now, referring to the first one: "For the first 30 moves of each game, the temperature is set to τ = 1; this selects moves proportionally to their visit count in MCTS, and ensures a diverse set of positions are encountered. For the remainder of the game, an infinitesimal temperature is used, τ→0". From this, I understood that deep in the search, the temperature should decay.
Sorry for being a beginner, but what do you mean when you say that it ensures divergence? And why would it be reasonable?
"I understood that deep in the search, the temperature should decay."
This parameter has nothing to do with the search or the search depth. It is applied to the final search output (the visit counts at the root). And in AZ it is constant at τ = 1, instead of varying during the game as in AlphaGo Zero.
"What do you mean when you say that it ensures divergence? And why would it be reasonable?"
The idea is that generating more self-play games only helps if the games differ from one another. In AlphaGo Zero, there was additional randomness from randomly rotating or reflecting the board, which is not available in chess. If you only want the games to be different (rather than also exploring moves the current network considers weaker), it is reasonable to randomize only early on: after a certain point, the game will already have diverged.
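The AlphaGo Zero-style schedule discussed in this thread can be sketched as follows: sample in proportion to the root visit counts (τ = 1) for the first 30 moves, then always play the most-visited move. The 30-move cutoff comes from the paper quoted earlier in the thread; the function and parameter names are hypothetical, and the visit counts are assumed to be the final root counts from MCTS.

```python
import random


def select_move(visit_counts, move_number, rng, sample_moves=30):
    """Return the index of the move to play from the root visit counts.

    Early moves (move_number < sample_moves) are sampled with probability
    proportional to their visit counts (tau = 1), which makes games diverge.
    Later moves are chosen greedily (tau -> 0): the most-visited move wins.
    """
    indices = range(len(visit_counts))
    if move_number < sample_moves:
        return rng.choices(indices, weights=visit_counts, k=1)[0]
    return max(indices, key=lambda i: visit_counts[i])


rng = random.Random(0)
visits = [5, 60, 35]
opening_choices = [select_move(visits, 0, rng) for _ in range(10)]  # a mix of moves
late_choice = select_move(visits, 40, rng)  # always index 1, the most-visited move
```

Past the cutoff this sketch adds no further randomness, so any divergence between games has to come from the sampled opening moves (or from other noise in the search).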
Thanks a lot!
The infinitesimal temperature τ→0 refers to a formula in the AlphaGo Zero paper, which sets the move probability (before normalization) to N^(1/τ), where N is the move's visit count. For τ = 1, the move probability is proportional to the visit count; for τ→0, it means greedy selection, i.e. the move with the highest visit count is always selected. Writing τ→0 rather than τ = 0 is a mathematical convention, since you cannot divide by zero.
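A small sketch of that formula, π(a) ∝ N(a)^(1/τ) with N(a) the root visit count of move a. The function name is hypothetical. The τ = 0 case is handled as the limit, which puts all probability on the most-visited move (split evenly if several moves tie). Dividing every count by the largest one first only prevents overflow for small τ; it does not change the normalized result.

```python
def visit_count_policy(visit_counts, tau):
    """Move probabilities proportional to N(a) ** (1 / tau).

    tau = 1 gives probabilities proportional to the visit counts; tau = 0 is
    treated as the limit tau -> 0, i.e. greedy selection of the most-visited
    move (tied moves share the probability equally).
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    peak = max(visit_counts)
    if peak <= 0:
        raise ValueError("at least one move must have been visited")
    if tau == 0:
        best = [i for i, n in enumerate(visit_counts) if n == peak]
        return [1.0 / len(best) if i in best else 0.0
                for i in range(len(visit_counts))]
    # Scaling by the peak keeps every term in [0, 1], so the power cannot
    # overflow for small tau; the ratios, and hence the result, are unchanged.
    powered = [(n / peak) ** (1.0 / tau) for n in visit_counts]
    total = sum(powered)
    return [p / total for p in powered]


print(visit_count_policy([10, 30, 60], 1.0))  # proportional to the visits
print(visit_count_policy([10, 30, 60], 0.5))  # sharper towards the best move
print(visit_count_policy([10, 30, 60], 0))    # [0.0, 0.0, 1.0]
```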
In the current self-play implementation, every chosen move is the best one, right? How do we ensure divergence then? Is Dirichlet noise enough to keep us from producing the same game over and over? Maybe there is another random part in the search, but I don't see it.
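For reference, a sketch of root Dirichlet noise as described in the AlphaGo Zero and AlphaZero papers: the root priors p are replaced by (1 - ε)·p + ε·η with η ~ Dir(α). The default ε = 0.25 follows AlphaGo Zero and α = 0.3 is the value AlphaZero reports for chess; the function name is hypothetical. The Dirichlet sample is drawn by normalizing independent Gamma(α, 1) variates. Because the noise shifts the priors, and therefore the visit counts, even greedy selection can pick different moves in different games, although the perturbation only applies at the root.

```python
import random


def add_dirichlet_noise(priors, rng, alpha=0.3, epsilon=0.25):
    """Mix root priors with Dirichlet noise: (1 - epsilon) * p + epsilon * eta.

    eta ~ Dir(alpha, ..., alpha) is sampled by drawing independent
    Gamma(alpha, 1) variates and normalizing them to sum to 1.
    """
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0:  # vanishingly unlikely underflow; fall back to no noise
        return list(priors)
    noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


rng = random.Random(7)
priors = [0.7, 0.2, 0.1]
noisy = add_dirichlet_noise(priors, rng)  # differs from game to game (per seed)
```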
In Leela Zero, additional randomisation comes from applying a random symmetry (rotation/reflection) to the board before network evaluation. That is harder to do in chess, but it may be possible; see #25. If not, a temperature larger than 0 will provide some degree of randomness. AlphaZero actually uses τ = 1 for self-play, so there is plenty of divergence there.
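A rough sketch of the random-symmetry idea on a square Go-style board: pick one of the 8 rotations/reflections at random, evaluate the transformed position, and map the per-point policy back to the original orientation (a scalar value output needs no mapping). The board representation (a list of rows), the net callable and all function names are hypothetical; this is not Leela Zero's code. In chess, by contrast, pawn direction rules out rotations and top-bottom flips, and castling breaks the left-right mirror.

```python
import random


def map_point(x, y, sym, size):
    """Image of point (x, y) under symmetry `sym` (0..7) of a size x size board.

    Bit 2 reflects across the main diagonal; bits 0 and 1 then mirror
    left-right and top-bottom. Together these give all 8 symmetries.
    """
    if sym & 4:
        x, y = y, x
    if sym & 1:
        x = size - 1 - x
    if sym & 2:
        y = size - 1 - y
    return x, y


def apply_symmetry(plane, sym):
    """Return a copy of `plane` with each point moved to its image under `sym`."""
    size = len(plane)
    out = [[None] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            nx, ny = map_point(x, y, sym, size)
            out[ny][nx] = plane[y][x]
    return out


def evaluate_with_random_symmetry(plane, net, rng):
    """Run `net` on a randomly transformed plane; return the policy in original orientation.

    `net` is a hypothetical callable mapping a plane to a same-shaped grid
    of per-point move probabilities.
    """
    size = len(plane)
    sym = rng.randrange(8)
    policy_t = net(apply_symmetry(plane, sym))
    policy = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            # Original point (x, y) sat at map_point(x, y) when the network
            # saw it, so its probability is read back from that position.
            nx, ny = map_point(x, y, sym, size)
            policy[y][x] = policy_t[ny][nx]
    return policy
```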
Thanks, and #28 answers my question too.