lichess-org / database
Public exports of all rated games, puzzles, and computer evaluations.
Home Page: https://database.lichess.org
License: GNU Affero General Public License v3.0
pbzip2 - parallel bzip2 file compressor, v1.1.6
https://linux.die.net/man/1/pbzip2
not so much to improve compression speed, but to speed up decompression:
Files that are compressed with pbzip2 are broken up into pieces and each individual piece is compressed. This is how pbzip2 runs faster on multiple CPUs since the pieces can be compressed simultaneously. The final .bz2 file may be slightly larger than if it was compressed with the regular bzip2 program due to this file splitting (usually less than 0.2% larger). Files that are compressed with pbzip2 will also gain considerable speedup when decompressed using pbzip2.
Files that were compressed using bzip2 will not see speedup since bzip2 packages the data into a single chunk that cannot be split between processors.
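A minimal sketch of how a consumer might read such an archive, assuming Python's standard `bz2` module. It reads multi-stream files (the kind pbzip2 produces) transparently, but on a single core, so it does not show the parallel speedup itself. The function name `count_games` is hypothetical.

```python
import bz2

def count_games(path):
    """Count games in a .pgn.bz2 file by streaming it, without writing the PGN to disk."""
    games = 0
    # bz2.open handles archives made of several concatenated bzip2 streams.
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Every game in the export starts with an [Event "..."] tag.
            if line.startswith("[Event "):
                games += 1
    return games
```

To use several cores, the file would still have to be decompressed with pbzip2 itself, for example by piping its output into the reader.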
Here is an example from lichess_db_standard_rated_2015-03.pgn.
One of the PGNs I am getting from this file looks as follows:
[Event "Rated Classical game"]
[Site "https://lichess.org/cclZlRwO"]
[Date "????.??.??"]
[Round "?"]
[White "gunsti"]
[Black "azuaga"]
[Result "1/2-1/2"]
[UTCDate "2015.03.05"]
[UTCTime "23:09:12"]
[WhiteElo "1572"]
[BlackElo "1631"]
[WhiteRatingDiff "+2"]
[BlackRatingDiff "-2"]
[ECO "B30"]
[Opening "Sicilian Defense: Nyezhmetdinov-Rossolimo Attack"]
[TimeControl "360+10"]
[Termination "Normal"]
1. e4 { [%eval 0.17] } 1... c5 { [%eval 0.27] } 2. Nf3 { [%eval 0.27] } 2... Nc6 { [%eval 0.32] } 3. Bb5 { [%eval 0.45] } 3... e6 { [%eval 0.39] } 4. Nc3 { [%eval 0.26] } 4... a6 { [%eval 0.48] } 5. Bxc6 { [%eval 0.42] } 5... bxc6 { [%eval 0.41] } 6. e5 { [%eval 0.41] } 6... g6 $4 { [%eval #3] } 7. d4 $4 { [%eval 0.55] } 7... cxd4 { [%eval 0.74] } 8. Qxd4 { [%eval 0.7] } 8... Bg7 { [%eval 0.84] } 9. O-O { [%eval 0.63] } 9... Qc7 $2 { [%eval 2.18] } 10. Bf4 $2 { [%eval 0.62] } 10... Ne7 $6 { [%eval 1.38] } 11. Rad1 $6 { [%eval 0.48] } 11... Nd5 $6 { [%eval 1.32] } 12. Nxd5 $2 { [%eval 0.15] } 12... cxd5 { [%eval 0.26] } 13. c3 { [%eval 0.2] } 13... Rb8 { [%eval 0.24] } 14. b4 { [%eval 0.09] } 14... a5 { [%eval 0.32] } 15. a3 { [%eval 0.0] } 15... O-O { [%eval 0.33] } 16. Rfe1 { [%eval 0.0] } 16... Ba6 { [%eval 0.22] } 17. Qe3 { [%eval 0.0] } 17... Rbc8 { [%eval 0.13] } 18. Rc1 { [%eval 0.2] } 18... axb4 { [%eval 0.57] } 19. axb4 $6 { [%eval 0.03] } 19... Qc4 { [%eval 0.15] } 20. Bh6 { [%eval 0.08] } 20... Qd3 { [%eval 0.36] } 21. Qg5 $2 { [%eval -0.69] } 21... Qf5 $6 { [%eval 0.29] } 22. Qh4 { [%eval 0.36] } 22... Rc4 $6 { [%eval 1.15] } 23. Nd4 { [%eval 1.12] } 23... Qh5 { [%eval 1.08] } 24. Qxh5 { [%eval 1.18] } 24... gxh5 { [%eval 1.17] } 25. Bxg7 { [%eval 1.06] } 25... Kxg7 { [%eval 1.12] } 26. Re3 $6 { [%eval 0.17] } 26... Kh8 $2 { [%eval 1.73] } 27. Rh3 $6 { [%eval 0.93] } 27... Rfc8 { [%eval 0.74] } 28. Ne2 $2 { [%eval -2.1] } 28... Re4 { [%eval -1.77] } 29. Ng3 { [%eval -1.83] } 29... Rxe5 $2 { [%eval -0.25] } 30. Nxh5 { [%eval -0.26] } 30... Re4 $2 { [%eval 1.91] } 31. Nf6 { [%eval 1.91] } 31... Kg7 { [%eval 1.98] } 32. Nxe4 { [%eval 1.96] } 32... dxe4 { [%eval 1.86] } 33. Re3 $6 { [%eval 1.32] } 33... d5 { [%eval 1.43] } 34. Ra1 $2 { [%eval 0.0] } 34... Bc4 $2 { [%eval 1.34] } 35. h4 { [%eval 1.11] } 35... Bd3 { [%eval 1.18] } 36. Ra3 { [%eval 1.15] } 36... Kg6 { [%eval 1.25] } 37. g3 { [%eval 1.06] } 37... Kh5 { [%eval 1.5] } 38. 
Kh2 { [%eval 1.06] } 38... Kg6 { [%eval 1.13] } 39. Re1 { [%eval 1.36] } 39... Kf6 { [%eval 1.5] } 40. Kh3 { [%eval 1.39] } 40... Rg8 { [%eval 1.48] } 41. Rb3 { [%eval 1.41] } 41... Bb5 { [%eval 1.51] } 42. Rg1 { [%eval 1.08] } 42... h5 { [%eval 1.34] } 43. g4 $6 { [%eval 0.81] } 43... hxg4+ { [%eval 0.87] } 44. Rxg4 { [%eval 0.8] } 44... Rh8 { [%eval 1.07] } 45. Kg3 { [%eval 0.93] } 45... Ke5 { [%eval 1.18] } 46. Rg5+ { [%eval 1.09] } 46... f5 { [%eval 1.29] } 47. h5 { [%eval 1.36] } 47... Be2 { [%eval 1.6] } 48. Kh4 { [%eval 1.43] } 48... Bb5 { [%eval 1.63] } 49. Rg6 { [%eval 1.27] } 49... Kf4 { [%eval 1.25] } 50. Rxe6 $6 { [%eval 0.47] } 50... Be2 $6 { [%eval 1.07] } 51. Kh3 $4 { [%eval #-3] } 51... Rxh5+ { [%eval #-2] } 52. Kg2 { [%eval #-2] } 52... Rg5+ $6 { [%eval #-4] } 53. Kh2 { [%eval #-4] } 53... Rh5+ { [%eval #-3] } 54. Kg2 { [%eval #-2] } 54... Rg5+ $4 { [%eval -6.61] } 55. Kh2 $4 { [%eval #-6] } 55... Rh5+ $4 { [%eval 0.0] } 56. Kg2 { [%eval #-2] } 1/2-1/2
And the corresponding game from lichess: https://lichess.org/cclZlRwO
Notice that the PGN claims that the evaluation of move 6 is mate in 3: 6... g6 $4 { [%eval #3] }. On the other hand, the data on lichess looks just fine. There are other, less noticeable differences. For example, the first move is evaluated as 0.17 in the PGN and as 0 on the site (almost every move has differences that cannot be due to rounding errors).
Any idea why this is happening?
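One way to look for similar anomalies in other games is to pull the [%eval ...] comments out of the movetext and flag mate scores that directly follow a near-equal evaluation. This is only a rough sketch: the function names and the 1.0 pawn limit are assumptions, and it says nothing about why the export differs from the site.

```python
import re

# Matches scores such as 0.17, -0.69, #3 and #-3 inside [%eval ...] comments.
EVAL = re.compile(r"\[%eval (#?-?\d+(?:\.\d+)?)\]")

def parse_evals(movetext):
    """Return one ('pawns', float) or ('mate', int) entry per eval comment."""
    evals = []
    for raw in EVAL.findall(movetext):
        if raw.startswith("#"):
            evals.append(("mate", int(raw[1:])))
        else:
            evals.append(("pawns", float(raw)))
    return evals

def sudden_mates(evals, limit=1.0):
    """Indices of mate scores whose preceding eval was within +/- limit pawns."""
    hits = []
    for i in range(1, len(evals)):
        kind, _ = evals[i]
        prev_kind, prev_value = evals[i - 1]
        if kind == "mate" and prev_kind == "pawns" and abs(prev_value) < limit:
            hits.append(i)
    return hits
```

Applied to the game above, this would flag the eval attached to 6... g6, since it jumps from 0.41 to #3.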
The lichess puzzle db at
https://database.lichess.org/#puzzles
does not follow the specified format. Instead of encoding puzzles with the move number as in the example:
00sHx,q3k1nr/1pp1nQpp/3p4/1P2p3/4P3/B1PP1b2/B5PP/5K2 b k - 0 17,e8d7 a2e6 d7d8 f7f8,1760,80,83,72,mate mateIn2 middlegame short,https://lichess.org/yyznGmXs/black#34,Italian_Game Italian_Game_Classical_Variation
00sJ9,r3r1k1/p4ppp/2p2n2/1p6/3P1qb1/2NQR3/PPB2PP1/R1B3K1 w - - 5 18,e3g3 e8e1 g1h2 e1c1 a1c1 f4h6 h2g1 h6c1,2671,105,87,325,advantage attraction fork middlegame sacrifice veryLong,https://lichess.org/gyFeQsOE#35,French_Defense French_Defense_Exchange_Variation
The move number is written as
00008,r6k/pp2r2p/4Rp1Q/3p4/8/1N1P2R1/PqP2bPP/7K b - - 0 24,f2g3 e6e7 b2b1 b3c1 b1c1 h6c1,1951,77,94,5235,crushing hangingPiece long middlegame,https://lichess.org/787zsVup/black#Some(47),
0000D,5rk1/1p3ppp/pq3b2/8/8/1P1Q1N2/P4PPP/3R2K1 w - - 2 27,d3d6 f8d8 d6d8 f6d8,1470,75,96,25000,advantage endgame short,https://lichess.org/F8M8OS71#Some(52),
0008Q,8/4R3/1p2P3/p4r2/P6p/1P3Pk1/4K3/8 w - - 1 64,e7f7 f5e5 e2f1 e5e6,1277,75,90,488,advantage endgame rookEndgame short,https://lichess.org/MQSyb3KW#Some(126),
Note the "#Some(47)" etc.
I believe this is an error caused by the fact that this line (database/src/main/scala/Puzzles.scala, line 80 in 690c473) calls readPly, which returns an Option[Ply] (https://github.com/lichess-org/scalachess/blob/6cffb30ecf55b9859263bb150cafb7812e6180ef/src/main/scala/format/FenReader.scala#L99), and so serializing it results in Some in the URL.
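Until the export is corrected, anyone reading the affected CSV could normalize the URLs on their side. A minimal sketch, with the hypothetical helper `fix_game_url`. This is a consumer-side workaround, not the fix in the Scala exporter.

```python
import re

# Matches a trailing fragment such as "#Some(47)" and captures the ply number.
SOME_FRAGMENT = re.compile(r"#Some\((\d+)\)$")

def fix_game_url(url):
    """Rewrite '...#Some(47)' to '...#47'; other URLs are returned unchanged."""
    return SOME_FRAGMENT.sub(r"#\1", url)
```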
Two fields that could provide useful information to the puzzles would be ECO and opening.
Although my interest is in a sense selfish (it would save me the time of downloading the games), I think it might be of interest to more people.
I got the idea from: Scott blog: Lichess puzzles, by ECO
in games without engine evaluations only.
is:
1. e4 { [%clk 0:02:00] } e5 { [%clk 0:02:00] }
should be:
1. e4 { [%clk 0:02:00] } 1... e5 { [%clk 0:02:00] }
originally reported in lichess-org/lila#7811
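For readers who need the expected form in the meantime, the missing "N..." indicators can be added after the fact. A rough sketch under a few assumptions: the movetext has no variations, comments do not contain a closing brace, and whitespace may be collapsed to single spaces. The function name `number_black_moves` is hypothetical.

```python
import re

TOKEN = re.compile(r"\{[^}]*\}|\S+")      # a whole comment, or one bare token
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def number_black_moves(movetext):
    """Insert 'N...' before a Black move that follows a comment or NAG."""
    out = []
    move_no = None       # number taken from the last "N." token
    white_done = False   # True once White's move for move_no was seen
    interrupted = False  # True if a comment or NAG came after the last move
    for tok in TOKEN.findall(movetext):
        if re.fullmatch(r"\d+\.", tok):
            move_no, white_done, interrupted = int(tok[:-1]), False, False
        elif re.fullmatch(r"\d+\.\.\.", tok):
            interrupted = False  # Black's move is already numbered
        elif tok.startswith("{") or tok.startswith("$"):
            interrupted = True
        elif tok not in RESULTS:
            # A move in SAN: it is Black's if White has already moved.
            if white_done and interrupted and move_no is not None:
                out.append(f"{move_no}...")
            white_done, interrupted = True, False
        out.append(tok)
    return " ".join(out)
```

Movetext that already carries the indicators passes through unchanged, so the function can be applied to both kinds of games.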
[Event "Rated Chess960 tournament https://lichess.org/tournament/PgAJTxsQ"]
[Site "https://lichess.org/aVjT1AD3"]
[White "boxplayer"]
[Black "carl2000"]
[Result "0-1"]
[UTCDate "2017.09.30"]
[UTCTime "22:00:40"]
[WhiteElo "1726"]
[BlackElo "2091"]
[WhiteRatingDiff "-2"]
[BlackRatingDiff "+3"]
[BlackTitle "GM"]
[ECO "?"]
[Opening "?"]
[TimeControl "300+0"]
[Termination "Normal"]
[FEN "?"] <--
[SetUp "1"]
[Variant "Chess960"] <--
Sometimes the "opening" tag in the database PGN specifies a different opening than what the opening explorer shows. In these cases, the opening explorer has been correct in the examples I have checked.
For example, consider this game:
https://lichess.org/kOmHdNie
In the opening explorer, it is correctly classified as "Bongcloud Attack" from move 2 onward. In the database file lichess_db_standard_rated_2020-09.pgn.bz2, though, the PGN appears as follows, with the tag [Opening "King's Pawn Game"]:
[Event "Rated Blitz game"]
[Site "https://lichess.org/kOmHdNie"]
[Date "2020.09.01"]
[Round "-"]
[White "Vexation"]
[Black "Rica_Miorelli"]
[Result "1-0"]
[UTCDate "2020.09.01"]
[UTCTime "01:06:50"]
[WhiteElo "1286"]
[BlackElo "1260"]
[WhiteRatingDiff "+5"]
[BlackRatingDiff "-5"]
[ECO "C20"]
[Opening "King's Pawn Game"]
[TimeControl "180+0"]
[Termination "Time forfeit"]
1. e4 { [%clk 0:03:00] } e5 { [%clk 0:03:00] } 2. Ke2 { [%clk 0:02:59] } Nf6 { [%clk 0:02:58] } 3. f3 { [%clk 0:02:58] } d6 { [%clk 0:02:56] } 4. d3 { [%clk 0:02:57] } Be7 { [%clk 0:02:55] } 5. Nh3 { [%clk 0:02:53] } O-O { [%clk 0:02:53] } 6. c3 { [%clk 0:02:50] } Nc6 { [%clk 0:02:52] } 7. Nd2 { [%clk 0:02:47] } d5 { [%clk 0:02:45] } 8. b3 { [%clk 0:02:43] } dxe4 { [%clk 0:02:43] } 9. dxe4 { [%clk 0:02:40] } a6 { [%clk 0:02:37] } 10. Nc4 { [%clk 0:02:37] } b5 { [%clk 0:02:34] } 11. Qxd8 { [%clk 0:02:34] } Rxd8 { [%clk 0:02:32] } 12. Ne3 { [%clk 0:02:29] } b4 { [%clk 0:02:26] } 13. c4 { [%clk 0:02:26] } a5 { [%clk 0:02:21] } 14. Nf5 { [%clk 0:02:13] } Nd4+ { [%clk 0:02:18] } 15. Nxd4 { [%clk 0:02:11] } exd4 { [%clk 0:02:15] } 16. Kd3 { [%clk 0:02:08] } c5 { [%clk 0:02:11] } 17. Bg5 { [%clk 0:02:03] } Bxh3 { [%clk 0:02:09] } 18. gxh3 { [%clk 0:02:02] } h6 { [%clk 0:02:07] } 19. Bxf6 { [%clk 0:01:59] } Bxf6 { [%clk 0:02:06] } 20. h4 { [%clk 0:01:58] } a4 { [%clk 0:01:56] } 21. bxa4 { [%clk 0:01:54] } Rxa4 { [%clk 0:01:54] } 22. Rg1 { [%clk 0:01:51] } b3 { [%clk 0:01:53] } 23. a3 { [%clk 0:01:48] } b2 { [%clk 0:01:50] } 24. Rb1 { [%clk 0:01:44] } Rb8 { [%clk 0:01:47] } 25. Kc2 { [%clk 0:01:41] } Rxc4+ { [%clk 0:01:39] } 26. Kd3 { [%clk 0:01:37] } Ra4 { [%clk 0:01:37] } 27. Bh3 { [%clk 0:01:33] } Rxa3+ { [%clk 0:01:34] } 28. Kc4 { [%clk 0:01:31] } Rc3+ { [%clk 0:01:27] } 29. Kd5 { [%clk 0:01:27] } Rd8+ { [%clk 0:01:21] } 30. Kc6 { [%clk 0:01:24] } Be7 { [%clk 0:01:14] } 31. Bd7 { [%clk 0:01:18] } Rxf3 { [%clk 0:01:01] } 32. Rxb2 { [%clk 0:01:16] } Rf6+ { [%clk 0:00:59] } 33. Kc7 { [%clk 0:01:14] } Ra6 { [%clk 0:00:52] } 34. Rb8 { [%clk 0:01:09] } Rxb8 { [%clk 0:00:48] } 35. Kxb8 { [%clk 0:01:07] } Bd6+ { [%clk 0:00:46] } 36. Kb7 { [%clk 0:01:06] } Ra4 { [%clk 0:00:39] } 37. Bxa4 { [%clk 0:01:05] } d3 { [%clk 0:00:34] } 38. Rd1 { [%clk 0:01:02] } c4 { [%clk 0:00:33] } 39. Kc6 { [%clk 0:01:00] } Bxh2 { [%clk 0:00:31] } 40. Bb5 { [%clk 0:00:57] } c3 { [%clk 0:00:23] } 41. 
Bxd3 { [%clk 0:00:55] } Bf4 { [%clk 0:00:22] } 42. Rf1 { [%clk 0:00:51] } 1-0
This game ought to be classified as "Bongcloud Attack" in the database.
In fact, "Bongcloud Attack" does not appear as an opening even once in the file. There seem to be other openings affected by this too. For example, "Fried Fox Defense" seems consistently misclassified as "Barnes Defense" in the database.
(migrated from lichess-org/lila#9720)
Regarding the databases on https://database.lichess.org/, most files for different months were generated long after the months were over, which meant that abandoned games had long been removed from the server.
Starting from the recent July 2021 database however, the PGN actually contains games which were cancelled/abandoned near the end of the month.
One such example game in the July 2021 database is given below, which was played/abandoned on July 30th. Note that the corresponding link https://lichess.org/zPDH02kW is now long gone, but presumably when the static export was generated the game still existed on the server.
[Event "Rated Correspondence game"]
[Site "https://lichess.org/zPDH02kW"]
[Date "2021.07.30"]
[Round "-"]
[White "Minty0209"]
[Black "Amini_alireza"]
[Result "*"]
[UTCDate "2021.07.30"]
[UTCTime "02:18:45"]
[WhiteElo "1500"]
[BlackElo "1500"]
[WhiteTitle "WFM"]
[ECO "?"]
[Opening "?"]
[TimeControl "-"]
[Termination "Abandoned"]
*
To fix the database and exclude such games (as was also done for all months prior to July 2021), maybe one could generate the static database later, when the aborted games have been removed from the server, or the export could manually filter out these aborted games.
See also the corresponding Zulip discussion.
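The manual filtering option could look roughly like the following on the consumer side. It is a sketch only: it assumes every game starts with an [Event ...] tag and treats a game as unfinished when it has Result "*" or Termination "Abandoned", as in the example above. The helper names are hypothetical and this is not the exporter's own logic.

```python
import re

HEADER = re.compile(r'^\[(\w+) "(.*)"\]$')

def split_games(pgn_text):
    """Split PGN text into games; a new game starts at each [Event ...] line."""
    games, current = [], []
    for line in pgn_text.strip().splitlines():
        if line.startswith("[Event ") and current:
            games.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        games.append("\n".join(current).strip())
    return games

def headers(game):
    """Return the tag pairs of one game as a dict."""
    matches = (HEADER.match(line) for line in game.splitlines())
    return dict(m.groups() for m in matches if m)

def is_finished(game):
    """False for games with no result or with an 'Abandoned' termination."""
    tags = headers(game)
    return tags.get("Result") != "*" and tags.get("Termination") != "Abandoned"
```

A list comprehension such as `[g for g in split_games(text) if is_finished(g)]` then keeps only the completed games.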
There are a few strangely evaluated games where each position is mate in 5. All such games happened on Dec 9 2016. Probably there are more games like these. Why has this happened?
[Event "Rated Antichess game"]
[Site "https://lichess.org/ndpqRuoj"]
[White "RaviIndra"]
[Black "elilly"]
[Result "0-1"]
[UTCDate "2017.09.30"]
[UTCTime "22:00:03"]
[WhiteElo "1888"]
[BlackElo "2258"]
[WhiteRatingDiff "-3"]
[BlackRatingDiff "+4"]
[ECO "?"] <--
[Opening "?"] <--
[TimeControl "30+0"]
[Termination "Normal"]
[Variant "Antichess"]
Hi guys!
I am building a quick puzzle trainer for fun. github.com/chesspecker
How do Lichess puzzles handle multiple answers, for example multiple mates in one?
Here is an example:
{ PuzzleId: "GKhUo", FEN: "r5k1/p4pp1/7p/2p5/NpP1PB2/1B3pq1/PP4bb/R4Q1K w - - 6 28", Moves: "f1g2 f3g2", Rating: 1860, RatingDeviation: 257, Popularity: -100, NbPlays: 7, Themes: Array, GameUrl : "https://lichess.org/xuPt7fwP#55", }
Example:
[Event "Rated Racing Kings game"]
[Site "https://lichess.org/zjd7TXD6"]
[Date "2021.09.01"]
[Round "-"]
[White "Nvincable"]
[Black "Fabi-Car"]
[Result "1-0"]
[UTCDate "2021.09.01"]
[UTCTime "00:00:11"]
[WhiteElo "1578"]
[BlackElo "1650"]
[WhiteRatingDiff "+7"]
[BlackRatingDiff "-14"]
[TimeControl "120+0"]
[Termination "Normal"]
[FEN "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"]
[SetUp "1"]
[Variant "Racing Kings"]
1. Kg3 { [%clk 0:02:00] } 1... Nxf2 { [%clk 0:02:00] } 2. Kf4 { [%clk 0:02:00] } 2... Nxh1 { [%clk 0:01:58] } 3. Ke5 { [%clk 0:01:56] } 3... Ka3 { [%clk 0:01:38] } 4. Kf6 { [%clk 0:01:56] } 4... Qa2 { [%clk 0:01:32] } 5. Kg7 { [%clk 0:01:45] } 5... Rb8 { [%clk 0:01:29] } 6. Rxh1 { [%clk 0:01:42] } 6... Rf8 { [%clk 0:01:21] } 7. Kxf8# { [%clk 0:01:34] } 1-0
(migrated from lichess-org/lila#9721)
Regarding the databases on https://database.lichess.org/, most files for different months were generated long after the months were over, which meant that correspondence games started in that month had long finished.
However, with new databases now being generated shortly after the end of the month, the PGN databases now actually contain correspondence games which were still in progress and therefore had many moves missing.
An example of such a game from the July 2021 database: a half-way finished correspondence game which started some time in July, but finished some time in August/September after 60 moves (119 plies), as can be seen at https://lichess.org/TY9oxOqR :
[Event "Rated Correspondence game"]
[Site "https://lichess.org/TY9oxOqR"]
[Date "2021.07.17"]
[Round "-"]
[White "mahatma09"]
[Black "bishopdaniel"]
[Result "*"]
[UTCDate "2021.07.17"]
[UTCTime "15:30:24"]
[WhiteElo "1752"]
[BlackElo "1752"]
[ECO "D00"]
[Opening "Queen's Pawn Game: Chigorin Variation"]
[TimeControl "-"]
[Termination "Unterminated"]
1. d4 d5 2. Nc3 Nc6 3. Nf3 Nf6 4. Bg5 Bg4 5. e3 e6 6. a3 Be7 7. Bb5 O-O 8. Bxc6 bxc6 9. h3 Bxf3 10. Qxf3 h6 11. Bh4 Rb8 12. Rb1 c5 13. O-O cxd4 14. exd4 Qd6 15. Rfe1 Rbe8 16. Bg3 Qd7 17. a4 Bb4 18. Re2 Bxc3 19. Qxc3 Qxa4 20. Qxc7 Rc8 21. Qf4 Rxc2 22. Rxc2 Qxc2 23. Qc1 Rc8 24. Qxc2 Rxc2 25. b4 Ne4 26. Bb8 a6 27. f3 Nc3 28. Ra1 Ra2 29. Rxa2 *
For correspondence games, it probably makes sense to make separate databases for them, and export them in batches based on the dates the games finished rather than when they started. Otherwise there will always be lots of these unfinished games in these databases, and the full games will not appear in any subsequent databases either. (Or one would have to wait several months before generating the export, as correspondence games might still be in progress.)
So, maybe the nicest solution: separate correspondence games from the main "standard chess" database, and batch that separate database according to the dates the games finished, rather than started.
See also the corresponding Zulip discussion.
With a test description
Allow downloading all games of a certain user as a PGN database.
There are exactly 7 such games, ranging from 2014 to 2016:
https://lichess.org/0ouSssGU
https://lichess.org/wBhVZYgI
https://lichess.org/g2HCpy8B
https://lichess.org/gI2EuG6T
https://lichess.org/w2WvLGHQ
https://lichess.org/XDQeUk6j
https://lichess.org/G7KuUEum
All illegal moves were castling moves.
It might be worth documenting this, because people using the files may not notice it at first.