
database's People

Contributors

amjohnson36, benediktwerner, dependabot[bot], fitztrev, hugoklok12, kraktus, lukhas, marcusbuffett, mark-dev, mentix02, niklasf, ornicar, queensgambit, ramon-deniz, thomas-daniels


database's Issues

switch to pbzip2

pbzip2 - parallel bzip2 file compressor, v1.1.6
https://linux.die.net/man/1/pbzip2

The goal is not so much to improve compression speed as to speed up decompression. From the man page:

Files that are compressed with pbzip2 are broken up into pieces and each individual piece is compressed. This is how pbzip2 runs faster on multiple CPUs since the pieces can be compressed simultaneously. The final .bz2 file may be slightly larger than if it was compressed with the regular bzip2 program due to this file splitting (usually less than 0.2% larger). Files that are compressed with pbzip2 will also gain considerable speedup when decompressed using pbzip2.

Files that were compressed using bzip2 will not see speedup since bzip2 packages the data into a single chunk that cannot be split between processors.
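To make the "broken up into pieces" idea concrete, here is a minimal sketch using only Python's standard `bz2` module. It compresses each piece as an independent bzip2 stream and concatenates the results. This is an illustration of the principle under that assumption, not pbzip2's actual implementation; `compress_in_pieces` and `piece_size` are names made up for this example.

```python
import bz2


def compress_in_pieces(data: bytes, piece_size: int = 500_000) -> bytes:
    """Compress each piece as its own bzip2 stream and concatenate them.

    Because the pieces are independent, each bz2.compress call could be
    handed to a separate worker; here they run one after another.
    """
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    return b"".join(bz2.compress(piece) for piece in pieces)


data = b"1. e4 e5 2. Ke2 " * 100_000
packed = compress_in_pieces(data)

# bz2.decompress accepts a concatenation of several streams,
# so the result is still readable as one .bz2 payload.
restored = bz2.decompress(packed)
```

The concatenated output stays readable by an ordinary bzip2 decompressor, while a tool that knows about the stream boundaries can decompress the pieces in parallel.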

Inconsistency between PGN data and the data on the site.

Here is an example from lichess_db_standard_rated_2015-03.pgn.

One of the games I get from this file looks like this:

[Event "Rated Classical game"]
[Site "https://lichess.org/cclZlRwO"]
[Date "????.??.??"]
[Round "?"]
[White "gunsti"]
[Black "azuaga"]
[Result "1/2-1/2"]
[UTCDate "2015.03.05"]
[UTCTime "23:09:12"]
[WhiteElo "1572"]
[BlackElo "1631"]
[WhiteRatingDiff "+2"]
[BlackRatingDiff "-2"]
[ECO "B30"]
[Opening "Sicilian Defense: Nyezhmetdinov-Rossolimo Attack"]
[TimeControl "360+10"]
[Termination "Normal"]

1. e4 { [%eval 0.17] } 1... c5 { [%eval 0.27] } 2. Nf3 { [%eval 0.27] } 2... Nc6 { [%eval 0.32] } 3. Bb5 { [%eval 0.45] } 3... e6 { [%eval 0.39] } 4. Nc3 { [%eval 0.26] } 4... a6 { [%eval 0.48] } 5. Bxc6 { [%eval 0.42] } 5... bxc6 { [%eval 0.41] } 6. e5 { [%eval 0.41] } 6... g6 $4 { [%eval #3] } 7. d4 $4 { [%eval 0.55] } 7... cxd4 { [%eval 0.74] } 8. Qxd4 { [%eval 0.7] } 8... Bg7 { [%eval 0.84] } 9. O-O { [%eval 0.63] } 9... Qc7 $2 { [%eval 2.18] } 10. Bf4 $2 { [%eval 0.62] } 10... Ne7 $6 { [%eval 1.38] } 11. Rad1 $6 { [%eval 0.48] } 11... Nd5 $6 { [%eval 1.32] } 12. Nxd5 $2 { [%eval 0.15] } 12... cxd5 { [%eval 0.26] } 13. c3 { [%eval 0.2] } 13... Rb8 { [%eval 0.24] } 14. b4 { [%eval 0.09] } 14... a5 { [%eval 0.32] } 15. a3 { [%eval 0.0] } 15... O-O { [%eval 0.33] } 16. Rfe1 { [%eval 0.0] } 16... Ba6 { [%eval 0.22] } 17. Qe3 { [%eval 0.0] } 17... Rbc8 { [%eval 0.13] } 18. Rc1 { [%eval 0.2] } 18... axb4 { [%eval 0.57] } 19. axb4 $6 { [%eval 0.03] } 19... Qc4 { [%eval 0.15] } 20. Bh6 { [%eval 0.08] } 20... Qd3 { [%eval 0.36] } 21. Qg5 $2 { [%eval -0.69] } 21... Qf5 $6 { [%eval 0.29] } 22. Qh4 { [%eval 0.36] } 22... Rc4 $6 { [%eval 1.15] } 23. Nd4 { [%eval 1.12] } 23... Qh5 { [%eval 1.08] } 24. Qxh5 { [%eval 1.18] } 24... gxh5 { [%eval 1.17] } 25. Bxg7 { [%eval 1.06] } 25... Kxg7 { [%eval 1.12] } 26. Re3 $6 { [%eval 0.17] } 26... Kh8 $2 { [%eval 1.73] } 27. Rh3 $6 { [%eval 0.93] } 27... Rfc8 { [%eval 0.74] } 28. Ne2 $2 { [%eval -2.1] } 28... Re4 { [%eval -1.77] } 29. Ng3 { [%eval -1.83] } 29... Rxe5 $2 { [%eval -0.25] } 30. Nxh5 { [%eval -0.26] } 30... Re4 $2 { [%eval 1.91] } 31. Nf6 { [%eval 1.91] } 31... Kg7 { [%eval 1.98] } 32. Nxe4 { [%eval 1.96] } 32... dxe4 { [%eval 1.86] } 33. Re3 $6 { [%eval 1.32] } 33... d5 { [%eval 1.43] } 34. Ra1 $2 { [%eval 0.0] } 34... Bc4 $2 { [%eval 1.34] } 35. h4 { [%eval 1.11] } 35... Bd3 { [%eval 1.18] } 36. Ra3 { [%eval 1.15] } 36... Kg6 { [%eval 1.25] } 37. g3 { [%eval 1.06] } 37... Kh5 { [%eval 1.5] } 38. 
Kh2 { [%eval 1.06] } 38... Kg6 { [%eval 1.13] } 39. Re1 { [%eval 1.36] } 39... Kf6 { [%eval 1.5] } 40. Kh3 { [%eval 1.39] } 40... Rg8 { [%eval 1.48] } 41. Rb3 { [%eval 1.41] } 41... Bb5 { [%eval 1.51] } 42. Rg1 { [%eval 1.08] } 42... h5 { [%eval 1.34] } 43. g4 $6 { [%eval 0.81] } 43... hxg4+ { [%eval 0.87] } 44. Rxg4 { [%eval 0.8] } 44... Rh8 { [%eval 1.07] } 45. Kg3 { [%eval 0.93] } 45... Ke5 { [%eval 1.18] } 46. Rg5+ { [%eval 1.09] } 46... f5 { [%eval 1.29] } 47. h5 { [%eval 1.36] } 47... Be2 { [%eval 1.6] } 48. Kh4 { [%eval 1.43] } 48... Bb5 { [%eval 1.63] } 49. Rg6 { [%eval 1.27] } 49... Kf4 { [%eval 1.25] } 50. Rxe6 $6 { [%eval 0.47] } 50... Be2 $6 { [%eval 1.07] } 51. Kh3 $4 { [%eval #-3] } 51... Rxh5+ { [%eval #-2] } 52. Kg2 { [%eval #-2] } 52... Rg5+ $6 { [%eval #-4] } 53. Kh2 { [%eval #-4] } 53... Rh5+ { [%eval #-3] } 54. Kg2 { [%eval #-2] } 54... Rg5+ $4 { [%eval -6.61] } 55. Kh2 $4 { [%eval #-6] } 55... Rh5+ $4 { [%eval 0.0] } 56. Kg2 { [%eval #-2] } 1/2-1/2

And the corresponding game from lichess: https://lichess.org/cclZlRwO

Notice that the PGN claims that the evaluation after move 6 is mate in 3: 6... g6 $4 { [%eval #3] }. On the other hand, the data on lichess looks just fine. There are other, less noticeable differences: for example, the first move is evaluated as 0.17 in the PGN and as 0 on the site. Almost every move shows differences that cannot be explained by rounding errors.

Any idea why this is happening?
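For anyone who wants to look for more games like this, a rough sketch of how the `[%eval ...]` comments could be scanned for isolated mate scores. The helper names and the one-pawn `threshold` are arbitrary choices for illustration, not part of any lichess tooling.

```python
import re

# Matches "[%eval 0.17]", "[%eval -0.69]", "[%eval #3]" and "[%eval #-3]".
EVAL_RE = re.compile(r"\[%eval\s+(#?-?\d+(?:\.\d+)?)\]")


def parse_evals(movetext):
    """Return one (is_mate, value) tuple per [%eval ...] comment."""
    evals = []
    for token in EVAL_RE.findall(movetext):
        if token.startswith("#"):
            evals.append((True, int(token[1:])))
        else:
            evals.append((False, float(token)))
    return evals


def suspicious_mates(evals, threshold=1.0):
    """Indexes of mate scores whose two neighbours are both near equality."""
    hits = []
    for i in range(1, len(evals) - 1):
        is_mate, _ = evals[i]
        prev_mate, prev_val = evals[i - 1]
        next_mate, next_val = evals[i + 1]
        if (is_mate and not prev_mate and not next_mate
                and abs(prev_val) <= threshold and abs(next_val) <= threshold):
            hits.append(i)
    return hits


sample = ("5... bxc6 { [%eval 0.41] } 6. e5 { [%eval 0.41] } "
          "6... g6 $4 { [%eval #3] } 7. d4 $4 { [%eval 0.55] }")
flagged = suspicious_mates(parse_evals(sample))
```

On the fragment above this flags the third evaluation, the `#3` after 6... g6. A genuine missed mate would also be flagged, so the hits are only candidates for manual comparison with the site.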

Chess puzzles write move number as #Some(23)

The lichess puzzle db at
https://database.lichess.org/#puzzles
does not follow the specified format. Instead of encoding puzzles with the move number as in the example:

00sHx,q3k1nr/1pp1nQpp/3p4/1P2p3/4P3/B1PP1b2/B5PP/5K2 b k - 0 17,e8d7 a2e6 d7d8 f7f8,1760,80,83,72,mate mateIn2 middlegame short,https://lichess.org/yyznGmXs/black#34,Italian_Game Italian_Game_Classical_Variation
00sJ9,r3r1k1/p4ppp/2p2n2/1p6/3P1qb1/2NQR3/PPB2PP1/R1B3K1 w - - 5 18,e3g3 e8e1 g1h2 e1c1 a1c1 f4h6 h2g1 h6c1,2671,105,87,325,advantage attraction fork middlegame sacrifice veryLong,https://lichess.org/gyFeQsOE#35,French_Defense French_Defense_Exchange_Variation

The move number is written as

00008,r6k/pp2r2p/4Rp1Q/3p4/8/1N1P2R1/PqP2bPP/7K b - - 0 24,f2g3 e6e7 b2b1 b3c1 b1c1 h6c1,1951,77,94,5235,crushing hangingPiece long middlegame,https://lichess.org/787zsVup/black#Some(47),
0000D,5rk1/1p3ppp/pq3b2/8/8/1P1Q1N2/P4PPP/3R2K1 w - - 2 27,d3d6 f8d8 d6d8 f6d8,1470,75,96,25000,advantage endgame short,https://lichess.org/F8M8OS71#Some(52),
0008Q,8/4R3/1p2P3/p4r2/P6p/1P3Pk1/4K3/8 w - - 1 64,e7f7 f5e5 e2f1 e5e6,1277,75,90,488,advantage endgame rookEndgame short,https://lichess.org/MQSyb3KW#Some(126),

Note the "#Some(47)" etc.

I believe this is an error caused by the fact that this line

val ply = Fen.readPly(fen)

calls readPly which returns an Option[Ply] (https://github.com/lichess-org/scalachess/blob/6cffb30ecf55b9859263bb150cafb7812e6180ef/src/main/scala/format/FenReader.scala#L99) and so serializing it results in Some in the URL.
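Until the export is fixed, consumers of the CSV can work around this on their side. A minimal sketch, assuming the game URL is the ninth column (index 8) as in the rows above; `normalise_game_url` and `read_puzzles` are hypothetical helper names.

```python
import csv
import io
import re

# "#Some(47)" at the end of the URL becomes "#47".
SOME_RE = re.compile(r"#Some\((\d+)\)$")

GAME_URL_COLUMN = 8  # assumption: position of the game URL in each row


def normalise_game_url(url):
    """Strip the stray Some(...) wrapper; well-formed URLs pass through unchanged."""
    return SOME_RE.sub(r"#\1", url)


def read_puzzles(text):
    """Yield CSV rows with the game URL normalised."""
    for row in csv.reader(io.StringIO(text)):
        row[GAME_URL_COLUMN] = normalise_game_url(row[GAME_URL_COLUMN])
        yield row


broken_row = ("00008,r6k/pp2r2p/4Rp1Q/3p4/8/1N1P2R1/PqP2bPP/7K b - - 0 24,"
              "f2g3 e6e7 b2b1 b3c1 b1c1 h6c1,1951,77,94,5235,"
              "crushing hangingPiece long middlegame,"
              "https://lichess.org/787zsVup/black#Some(47),")
fixed = next(read_puzzles(broken_row))
```

This only repairs the text of the anchor. Whether the number inside `Some(...)` is the same ply the documented format would have used is a separate question that this sketch does not address.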

ECO and opening in puzzles

Two fields that would add useful information to the puzzles are ECO and opening.
Although my interest is somewhat selfish (it would save me the time of downloading the games), I think it might be of interest to more people.
I got the idea from Scott's blog post "Lichess puzzles, by ECO".

chess960 games are missing fen header

[Event "Rated Chess960 tournament https://lichess.org/tournament/PgAJTxsQ"]    
[Site "https://lichess.org/aVjT1AD3"]       
[White "boxplayer"]       
[Black "carl2000"]       
[Result "0-1"]       
[UTCDate "2017.09.30"]       
[UTCTime "22:00:40"]       
[WhiteElo "1726"]       
[BlackElo "2091"]       
[WhiteRatingDiff "-2"]       
[BlackRatingDiff "+3"]       
[BlackTitle "GM"]       
[ECO "?"]       
[Opening "?"]       
[TimeControl "300+0"]       
[Termination "Normal"]       
[FEN "?"] <--
[SetUp "1"]       
[Variant "Chess960"] <--
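A small sketch of how affected games could be detected when reading the dump: it parses the tag pairs and reports Chess960 games whose FEN tag is absent or is the "?" placeholder. The function names are made up for this example, and treating an absent tag the same as "?" is my own choice.

```python
import re

# One PGN tag pair per line, e.g. [Variant "Chess960"]
HEADER_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]')


def parse_headers(pgn_text):
    """Collect the tag pairs of a single game into a dict."""
    headers = {}
    for line in pgn_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            headers[match.group(1)] = match.group(2)
    return headers


def lacks_start_position(headers):
    """True for Chess960 games without a usable FEN tag."""
    return headers.get("Variant") == "Chess960" and headers.get("FEN", "?") == "?"


sample = '''[Event "Rated Chess960 tournament https://lichess.org/tournament/PgAJTxsQ"]
[Site "https://lichess.org/aVjT1AD3"]
[FEN "?"]
[SetUp "1"]
[Variant "Chess960"]
'''
affected = lacks_start_position(parse_headers(sample))
```

Without the starting position the moves of such a game cannot be replayed reliably, so a reader would probably want to skip these games or fetch them again from the site.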

"Opening" tag sometimes inconsistent with opening explorer

Sometimes the "opening" tag in the database PGN specifies a different opening than what the opening explorer shows. In these cases, the opening explorer has been correct in the examples I have checked.

For example, consider this game:
https://lichess.org/kOmHdNie

In the opening explorer, it is correctly classified as "Bongcloud Attack" from move 2 onward. In the database file lichess_db_standard_rated_2020-09.pgn.bz2, though, the PGN appears as follows, with the tag [Opening "King's Pawn Game"]:

[Event "Rated Blitz game"]
[Site "https://lichess.org/kOmHdNie"]
[Date "2020.09.01"]
[Round "-"]
[White "Vexation"]
[Black "Rica_Miorelli"]
[Result "1-0"]
[UTCDate "2020.09.01"]
[UTCTime "01:06:50"]
[WhiteElo "1286"]
[BlackElo "1260"]
[WhiteRatingDiff "+5"]
[BlackRatingDiff "-5"]
[ECO "C20"]
[Opening "King's Pawn Game"]
[TimeControl "180+0"]
[Termination "Time forfeit"]

1. e4 { [%clk 0:03:00] } e5 { [%clk 0:03:00] } 2. Ke2 { [%clk 0:02:59] } Nf6 { [%clk 0:02:58] } 3. f3 { [%clk 0:02:58] } d6 { [%clk 0:02:56] } 4. d3 { [%clk 0:02:57] } Be7 { [%clk 0:02:55] } 5. Nh3 { [%clk 0:02:53] } O-O { [%clk 0:02:53] } 6. c3 { [%clk 0:02:50] } Nc6 { [%clk 0:02:52] } 7. Nd2 { [%clk 0:02:47] } d5 { [%clk 0:02:45] } 8. b3 { [%clk 0:02:43] } dxe4 { [%clk 0:02:43] } 9. dxe4 { [%clk 0:02:40] } a6 { [%clk 0:02:37] } 10. Nc4 { [%clk 0:02:37] } b5 { [%clk 0:02:34] } 11. Qxd8 { [%clk 0:02:34] } Rxd8 { [%clk 0:02:32] } 12. Ne3 { [%clk 0:02:29] } b4 { [%clk 0:02:26] } 13. c4 { [%clk 0:02:26] } a5 { [%clk 0:02:21] } 14. Nf5 { [%clk 0:02:13] } Nd4+ { [%clk 0:02:18] } 15. Nxd4 { [%clk 0:02:11] } exd4 { [%clk 0:02:15] } 16. Kd3 { [%clk 0:02:08] } c5 { [%clk 0:02:11] } 17. Bg5 { [%clk 0:02:03] } Bxh3 { [%clk 0:02:09] } 18. gxh3 { [%clk 0:02:02] } h6 { [%clk 0:02:07] } 19. Bxf6 { [%clk 0:01:59] } Bxf6 { [%clk 0:02:06] } 20. h4 { [%clk 0:01:58] } a4 { [%clk 0:01:56] } 21. bxa4 { [%clk 0:01:54] } Rxa4 { [%clk 0:01:54] } 22. Rg1 { [%clk 0:01:51] } b3 { [%clk 0:01:53] } 23. a3 { [%clk 0:01:48] } b2 { [%clk 0:01:50] } 24. Rb1 { [%clk 0:01:44] } Rb8 { [%clk 0:01:47] } 25. Kc2 { [%clk 0:01:41] } Rxc4+ { [%clk 0:01:39] } 26. Kd3 { [%clk 0:01:37] } Ra4 { [%clk 0:01:37] } 27. Bh3 { [%clk 0:01:33] } Rxa3+ { [%clk 0:01:34] } 28. Kc4 { [%clk 0:01:31] } Rc3+ { [%clk 0:01:27] } 29. Kd5 { [%clk 0:01:27] } Rd8+ { [%clk 0:01:21] } 30. Kc6 { [%clk 0:01:24] } Be7 { [%clk 0:01:14] } 31. Bd7 { [%clk 0:01:18] } Rxf3 { [%clk 0:01:01] } 32. Rxb2 { [%clk 0:01:16] } Rf6+ { [%clk 0:00:59] } 33. Kc7 { [%clk 0:01:14] } Ra6 { [%clk 0:00:52] } 34. Rb8 { [%clk 0:01:09] } Rxb8 { [%clk 0:00:48] } 35. Kxb8 { [%clk 0:01:07] } Bd6+ { [%clk 0:00:46] } 36. Kb7 { [%clk 0:01:06] } Ra4 { [%clk 0:00:39] } 37. Bxa4 { [%clk 0:01:05] } d3 { [%clk 0:00:34] } 38. Rd1 { [%clk 0:01:02] } c4 { [%clk 0:00:33] } 39. Kc6 { [%clk 0:01:00] } Bxh2 { [%clk 0:00:31] } 40. Bb5 { [%clk 0:00:57] } c3 { [%clk 0:00:23] } 41. 
Bxd3 { [%clk 0:00:55] } Bf4 { [%clk 0:00:22] } 42. Rf1 { [%clk 0:00:51] } 1-0

This game ought to be classified as "Bongcloud Attack" in the database.

In fact, "Bongcloud Attack" does not appear as an opening even once in the file. There seem to be other openings affected by this too. For example, "Fried Fox Defense" seems consistently misclassified as "Barnes Defense" in the database.
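The claim that an opening name never appears can be checked by tallying the Opening tags while streaming the compressed file. A rough sketch with the standard library only; `count_openings` and `count_openings_in_dump` are illustrative names, and the path passed to the second function would be wherever the monthly file was downloaded.

```python
import bz2
import re
from collections import Counter

OPENING_RE = re.compile(r'^\[Opening "(.*)"\]')


def count_openings(lines):
    """Tally the value of every [Opening "..."] tag in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = OPENING_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_openings_in_dump(path):
    """Stream a .pgn.bz2 file line by line without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return count_openings(handle)


sample_lines = [
    '[ECO "C20"]\n',
    '[Opening "King\'s Pawn Game"]\n',
    '[Opening "Barnes Defense"]\n',
    '[Opening "King\'s Pawn Game"]\n',
]
counts = count_openings(sample_lines)
```

A `Counter` returns 0 for names it has never seen, so `counts["Bongcloud Attack"]` can be queried directly after a full pass over the file.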

Abandoned games at the end of the month

(migrated from lichess-org/lila#9720)

Regarding the databases on https://database.lichess.org/, most files for different months were generated long after the months were over, which meant that abandoned games had long been removed from the server.

Starting from the recent July 2021 database however, the PGN actually contains games which were cancelled/abandoned near the end of the month.

One such example game in the July 2021 database is given below, which was played/abandoned on July 30th. Note that the corresponding link https://lichess.org/zPDH02kW is now long gone, but presumably when the static export was generated the game still existed on the server.

[Event "Rated Correspondence game"]
[Site "https://lichess.org/zPDH02kW"]
[Date "2021.07.30"]
[Round "-"]
[White "Minty0209"]
[Black "Amini_alireza"]
[Result "*"]
[UTCDate "2021.07.30"]
[UTCTime "02:18:45"]
[WhiteElo "1500"]
[BlackElo "1500"]
[WhiteTitle "WFM"]
[ECO "?"]
[Opening "?"]
[TimeControl "-"]
[Termination "Abandoned"]

*

To fix the database and exclude such games (as was also done for all months prior to July 2021), maybe one could generate the static database later, when the aborted games have been removed from the server, or the export could manually filter out these aborted games.
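As an illustration of the filtering option, here is a minimal sketch that drops such games from a PGN stream. It assumes every game starts with an [Event ...] tag, as in the examples in this issue; the function names are hypothetical and this is not the actual export code.

```python
def split_games(lines):
    """Yield each game as a list of lines; a new game starts at an [Event tag."""
    game = []
    for line in lines:
        if line.startswith("[Event ") and game:
            yield game
            game = []
        game.append(line)
    if game:
        yield game


def is_abandoned(game):
    """True if the game carries the tag [Termination "Abandoned"]."""
    return any(line.strip() == '[Termination "Abandoned"]' for line in game)


def drop_abandoned(lines):
    """Yield the lines of all games except the abandoned ones."""
    for game in split_games(lines):
        if not is_abandoned(game):
            yield from game


sample = [
    '[Event "Rated Correspondence game"]\n',
    '[Result "*"]\n',
    '[Termination "Abandoned"]\n',
    '\n',
    '*\n',
    '\n',
    '[Event "Rated Blitz game"]\n',
    '[Result "1-0"]\n',
    '[Termination "Time forfeit"]\n',
    '\n',
    '1. e4 e5 2. Ke2 1-0\n',
]
kept = list(drop_abandoned(sample))
```

The same approach works as a downstream workaround for anyone consuming the July 2021 file as it is.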

See also the corresponding Zulip discussion.

variant games do not need ECO header

[Event "Rated Antichess game"]         
[Site "https://lichess.org/ndpqRuoj"]        
[White "RaviIndra"]        
[Black "elilly"]        
[Result "0-1"]        
[UTCDate "2017.09.30"]        
[UTCTime "22:00:03"]        
[WhiteElo "1888"]        
[BlackElo "2258"]        
[WhiteRatingDiff "-3"]        
[BlackRatingDiff "+4"]        
[ECO "?"] <--
[Opening "?"] <--
[TimeControl "30+0"]        
[Termination "Normal"]        
[Variant "Antichess"]

Handling multiple answers

Hi guys!

I am building a quick puzzle trainer for fun. github.com/chesspecker

How do Lichess puzzles handle multiple answers, for example multiple mates in one?

Here is an example:

{
  PuzzleId: "GKhUo",
  FEN: "r5k1/p4pp1/7p/2p5/NpP1PB2/1B3pq1/PP4bb/R4Q1K w - - 6 28",
  Moves: "f1g2 f3g2",
  Rating: 1860,
  RatingDeviation: 257,
  Popularity: -100,
  NbPlays: 7,
  Themes: Array,
  GameUrl: "https://lichess.org/xuPt7fwP#55",
}

https://lichess.org/training/GKhUo

Racing kings games starting from June 2021 have invalid FEN tag

Example:

[Event "Rated Racing Kings game"]
[Site "https://lichess.org/zjd7TXD6"]
[Date "2021.09.01"]
[Round "-"]
[White "Nvincable"]
[Black "Fabi-Car"]
[Result "1-0"]
[UTCDate "2021.09.01"]
[UTCTime "00:00:11"]
[WhiteElo "1578"]
[BlackElo "1650"]
[WhiteRatingDiff "+7"]
[BlackRatingDiff "-14"]
[TimeControl "120+0"]
[Termination "Normal"]
[FEN "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"]
[SetUp "1"]
[Variant "Racing Kings"]

1. Kg3 { [%clk 0:02:00] } 1... Nxf2 { [%clk 0:02:00] } 2. Kf4 { [%clk 0:02:00] } 2... Nxh1 { [%clk 0:01:58] } 3. Ke5 { [%clk 0:01:56] } 3... Ka3 { [%clk 0:01:38] } 4. Kf6 { [%clk 0:01:56] } 4... Qa2 { [%clk 0:01:32] } 5. Kg7 { [%clk 0:01:45] } 5... Rb8 { [%clk 0:01:29] } 6. Rxh1 { [%clk 0:01:42] } 6... Rf8 { [%clk 0:01:21] } 7. Kxf8# { [%clk 0:01:34] } 1-0

Unfinished correspondence games

(migrated from lichess-org/lila#9721)

Regarding the databases on https://database.lichess.org/, most files for different months were generated long after the months were over, which meant that correspondence games started in that month had long finished.

However, with new databases now being generated shortly after the end of the month, the PGN databases now actually contain correspondence games which were still in progress and therefore had many moves missing.

An example of such a game from the July 2021 database: a half-way finished correspondence game which started some time in July, but finished some time in August/September after 60 moves (119 plies), as can be seen at https://lichess.org/TY9oxOqR :

[Event "Rated Correspondence game"]
[Site "https://lichess.org/TY9oxOqR"]
[Date "2021.07.17"]
[Round "-"]
[White "mahatma09"]
[Black "bishopdaniel"]
[Result "*"]
[UTCDate "2021.07.17"]
[UTCTime "15:30:24"]
[WhiteElo "1752"]
[BlackElo "1752"]
[ECO "D00"]
[Opening "Queen's Pawn Game: Chigorin Variation"]
[TimeControl "-"]
[Termination "Unterminated"]

1. d4 d5 2. Nc3 Nc6 3. Nf3 Nf6 4. Bg5 Bg4 5. e3 e6 6. a3 Be7 7. Bb5 O-O 8. Bxc6 bxc6 9. h3 Bxf3 10. Qxf3 h6 11. Bh4 Rb8 12. Rb1 c5 13. O-O cxd4 14. exd4 Qd6 15. Rfe1 Rbe8 16. Bg3 Qd7 17. a4 Bb4 18. Re2 Bxc3 19. Qxc3 Qxa4 20. Qxc7 Rc8 21. Qf4 Rxc2 22. Rxc2 Qxc2 23. Qc1 Rc8 24. Qxc2 Rxc2 25. b4 Ne4 26. Bb8 a6 27. f3 Nc3 28. Ra1 Ra2 29. Rxa2 *
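To quantify how much of a game is missing, the plies in the exported movetext can be counted and compared with the length shown on the site. A rough sketch that handles only the constructs appearing in these dumps (brace comments, `$n` annotations, move numbers, result markers); `count_plies` is a name made up for this example.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def count_plies(movetext):
    """Count half-moves in PGN movetext, ignoring comments and annotations."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # drop { ... } comments
    text = re.sub(r"\$\d+", " ", text)          # drop $n annotation glyphs
    plies = 0
    for token in text.split():
        if token in RESULTS:
            continue
        if re.fullmatch(r"\d+\.+", token):      # move numbers: "12." or "12..."
            continue
        plies += 1
    return plies


exported = ("1. d4 d5 2. Nc3 Nc6 3. Nf3 Nf6 4. Bg5 Bg4 5. e3 e6 6. a3 Be7 "
            "7. Bb5 O-O 8. Bxc6 bxc6 9. h3 Bxf3 10. Qxf3 h6 11. Bh4 Rb8 "
            "12. Rb1 c5 13. O-O cxd4 14. exd4 Qd6 15. Rfe1 Rbe8 16. Bg3 Qd7 "
            "17. a4 Bb4 18. Re2 Bxc3 19. Qxc3 Qxa4 20. Qxc7 Rc8 21. Qf4 Rxc2 "
            "22. Rxc2 Qxc2 23. Qc1 Rc8 24. Qxc2 Rxc2 25. b4 Ne4 26. Bb8 a6 "
            "27. f3 Nc3 28. Ra1 Ra2 29. Rxa2 *")
exported_plies = count_plies(exported)
```

For the game above this gives 57 plies in the export, against the 119 plies of the finished game on the site.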

For correspondence games, it probably makes sense to make separate databases for them, and export them in batches based on the dates the games finished rather than when they started. Otherwise there will always be lots of these unfinished games in these databases, and the full games will not appear in any subsequent databases either. (Or one would have to wait several months before generating the export, as correspondence games might still be in progress.)

So, maybe the nicest solution: separate correspondence games from the main "standard chess" database, and batch that separate database according to the dates the games finished, rather than started.

See also the corresponding Zulip discussion.
