Comments (11)
I think the first time it failed on add-errata because the word in question was deleted from JMdict (due to my comment, in fact...). I'll try to make it work with the latest data in the coming weeks.
debugger invoked on a CL-POSTGRES-ERROR:FOREIGN-KEY-VIOLATION in thread
#<THREAD "main thread" RUNNING {1001870103}>:
Database error 23503: insert or update on table "kana_text" violates foreign key constraint "kana_text_entry_seq_foreign"
DETAIL: Key (seq)=(2209300) is not present in table "entry".
QUERY: INSERT INTO kana_text (best_kanji, nokanji, conjugate_p, common_tags, common, ord, text, seq) VALUES (NULL, false, true, E'', NULL, 0, E'たへる', 2209300) RETURNING id
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ABORT] Exit debugger, returning to top level.
(CL-POSTGRES::GET-ERROR #<SB-SYS:FD-STREAM for "socket 127.0.0.1:54562, peer: 127.0.0.1:5432" {1001B65323}>)
source: (ERROR (CL-POSTGRES-ERROR::GET-ERROR-TYPE CODE) :CODE CODE :MESSAGE
(GET-FIELD #\M) :DETAIL (GET-FIELD #\D) :HINT (GET-FIELD #\H)
:CONTEXT (GET-FIELD #\W) ...)
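The foreign key violation in the log above can be reproduced in miniature. This is a hedged sketch only: it uses Python's built-in SQLite as a stand-in for PostgreSQL, the two-column schema is a simplified assumption (the real ichiran tables have more columns), and `insert_kana_text` is a hypothetical helper, not part of ichiran.

```python
import sqlite3


def insert_kana_text(conn, seq, text):
    """Try to insert a reading; return the error message on failure, else None."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO kana_text (seq, text) VALUES (?, ?)", (seq, text)
            )
    except sqlite3.IntegrityError as err:
        return str(err)
    return None


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.execute("CREATE TABLE entry (seq INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE kana_text ("
    " id INTEGER PRIMARY KEY,"
    " seq INTEGER NOT NULL REFERENCES entry(seq),"
    " text TEXT)"
)
conn.execute("INSERT INTO entry (seq) VALUES (1000280)")
conn.commit()

# seq 1000280 exists in entry, so the insert succeeds.
print(insert_kana_text(conn, 1000280, "あげつらう"))
# seq 2209300 has no row in entry (as if it was deleted upstream), so it fails.
print(insert_kana_text(conn, 2209300, "たへる"))
```

The second insert fails for the same reason as the one in the log: the errata code refers to a seq that no longer has a parent row in the entry table.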
from ichiran.
Hi, unfortunately, because JMdict data always changes, it's impossible for the segmentation tests to always pass unless the tests are modified and the code is manually calibrated. For that reason, only the latest release is guaranteed to actually pass all the tests.
For example
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("猫" "は" "しっぽ" "を" "ぴんと" "立てて" "歩いた")
| but saw ("猫" "は" "しっぽ" "を" "ぴんと立てて" "歩いた")
This test failure is caused by the word ぴんと立つ being added to the JMdict database on 2022-07-19. Since the latest release of ichiran was in January 2022, the test's expected output doesn't account for this word.
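When reading failures like this one, it can help to see exactly where the two segmentations disagree. The following is a minimal sketch in Python (not part of ichiran, whose tests are written in Common Lisp); `boundaries` and `boundary_diff` are hypothetical helper names. It compares the character offsets at which each segmentation ends a token.

```python
from itertools import accumulate


def boundaries(tokens):
    """Set of character offsets at which a token ends."""
    return set(accumulate(len(t) for t in tokens))


def boundary_diff(expected, seen):
    """Return (missing, extra) token boundaries of `seen` relative to `expected`.

    missing: offsets where `expected` splits but `seen` does not (tokens merged)
    extra:   offsets where `seen` splits but `expected` does not (token split)
    """
    if "".join(expected) != "".join(seen):
        raise ValueError("segmentations cover different text")
    exp, got = boundaries(expected), boundaries(seen)
    return sorted(exp - got), sorted(got - exp)


expected = ["猫", "は", "しっぽ", "を", "ぴんと", "立てて", "歩いた"]
seen = ["猫", "は", "しっぽ", "を", "ぴんと立てて", "歩いた"]

print(boundary_diff(expected, seen))  # → ([9], [])
```

Here the only difference is the missing boundary after character 9, that is, between ぴんと and 立てて, which matches the new JMdict entry being picked up as a single word.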
As for this,
Database error 42P01: relation "kanji" does not exist
check that you have downloaded the kanjidic2.xml file and specified its path in the settings. Try manually running the following functions:
(ichiran/mnt:load-kanjidic)
(ichiran/mnt:load-kanji-stats)
from ichiran.
I understand; so the actual answer is better than the expected one, given the current state of JMdict, and the expected value needs to be adjusted.
As for kanjidic2.xml, I have it and the path is correct.
- (ichiran/mnt:load-kanjidic)
500 entries loaded
1000 entries loaded
1500 entries loaded
2000 entries loaded
2500 entries loaded
3000 entries loaded
3500 entries loaded
4000 entries loaded
4500 entries loaded
5000 entries loaded
5500 entries loaded
6000 entries loaded
6500 entries loaded
7000 entries loaded
7500 entries loaded
8000 entries loaded
8500 entries loaded
9000 entries loaded
9500 entries loaded
10000 entries loaded
10500 entries loaded
11000 entries loaded
11500 entries loaded
12000 entries loaded
12500 entries loaded
13000 entries loaded
13109 entries total
NIL
- (ichiran/mnt:load-kanji-stats)
100 kanji processed
200 kanji processed
300 kanji processed
400 kanji processed
500 kanji processed
600 kanji processed
700 kanji processed
800 kanji processed
900 kanji processed
1000 kanji processed
1100 kanji processed
1200 kanji processed
1300 kanji processed
1400 kanji processed
1500 kanji processed
1600 kanji processed
1700 kanji processed
1800 kanji processed
1900 kanji processed
2000 kanji processed
2100 kanji processed
2136 kanji total
NIL
I did that right now, but it should have executed earlier as well, as part of full-init, so I have to assume these were already loaded and calculated when I ran the tests previously. I can't run the tests again at the moment to confirm that the error is still there, though: in the meantime I added some logging to better understand how it works, and the side effect seems to be that the tests lock up midway. I think it's possible that some other change to JMdict or KanjiDic is causing the earlier error, though.
from ichiran.
I repeated the procedure on a fresh database, and the 'kanji' error didn't show up. So indeed, most likely the kanjidic2 database hadn't been loaded despite full-init having finished execution, and the kanjidic2 path being already provided to it before it started.
A mystery, but apparently no longer reproducible.
It's still failing the same 31 tests, but that's expected. Closing.
from ichiran.
I reinitialized the entire database, and indeed, it turned out that there had been lingering side effects of that crash (notably, n-kanji and n-kana in many conjugations were left at 0, which wasn't causing crashes but was causing trouble with scoring).
After the reinitialisation, it only fails on 13 tests:
Unit Test Summary
| 748 assertions total
| 735 passed
| 13 failed
| 0 execution errors
| 0 missing tests
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("だしといて") but saw ("だし" "といて")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("猫" "は" "しっぽ" "を" "ぴんと" "立てて" "歩いた")
| but saw ("猫" "は" "しっぽ" "を" "ぴんと立てて" "歩いた")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("おとめ" "に" "ふさわしい" "振る舞い") but saw ("お" "とめ" "に" "ふさわしい" "振る舞い")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("バラしちゃってる") but saw ("バラ" "しちゃってる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("ガス" "が" "ついている") but saw ("ガス" "が" "ついて" "いる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("工夫" "が" "される") but saw ("工夫" "がされる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("共感" "性") but saw ("共感性")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("しない" "かい") but saw ("し" "ないかい")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("てか" "最近" "ファン" "層" "は" "円盤" "すら" "買わない" "から" "そいつら" "から" "金" "とる"
"ってのは" "無謀")
| but saw ("てか" "最近" "ファン層" "は" "円盤" "すら" "買わない" "から" "そいつら" "から" "金" "とる" "ってのは"
"無謀")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("奴" "が" "まとも" "に" "見られない") but saw ("奴" "が" "まともに" "見られない")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("体" "に" "悪い" "と" "知り" "ながら" "タバコをやめる" "こと" "は" "できない")
| but saw ("体に悪い" "と" "知り" "ながら" "タバコをやめる" "こと" "は" "できない")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("雨" "が" "降りそう" "な" "気がします") but saw ("雨が降りそう" "な" "気がします")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("そういう" "お" "隣" "どうし") but saw ("そういう" "お" "隣どうし")
|
SEGMENTATION-TEST: 497 assertions passed, 13 failed.
#<TEST-RESULTS-DB Total(748) Passed(735) Failed(13) Errors(0)>
from ichiran.
The dec23 branch contains code that should pass all tests on recent JMdict dumps (make sure to run (add-errata) after updating). I'll make a new release soon unless there are some terrible issues with it (I haven't tested this version much yet).
from ichiran.
I just got around to doing it, and full-init seems to be failing very early on:
* (ichiran/maintenance:full-init)
Initializing ichiran/dict...
debugger invoked on a CL-POSTGRES-ERROR:UNIQUE-VIOLATION in thread
#<THREAD "main thread" RUNNING {10010C0093}>:
Database error 23505: duplicate key value violates unique constraint "entry_pkey"
DETAIL: Key (seq)=(1000280) already exists.
QUERY: INSERT INTO entry (primary_nokanji, n_kana, n_kanji, root_p, content, seq) VALUES (false, 0, 0, true, E'<?xml version="1.0" encoding="UTF-8"?>
<entry>
<ent_seq>1000280</ent_seq>
<k_ele>
<keb>論う</keb>
</k_ele>
<r_ele>
<reb>あげつらう</reb>
</r_ele>
<sense>
<pos>v5u</pos>
<pos>vt</pos>
<misc>uk</misc>
<gloss xml:lang="eng">to discuss</gloss>
</sense>
<sense>
<pos>v5u</pos>
<pos>vt</pos>
<gloss xml:lang="eng">to find fault with</gloss>
<gloss xml:lang="eng">to criticize</gloss>
<gloss xml:lang="eng">to criticise</gloss>
</sense>
</entry>', 1000280)
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ABORT] Exit debugger, returning to top level.
(CL-POSTGRES::GET-ERROR #<SB-SYS:FD-STREAM for "socket 127.0.0.1:55237, peer: 127.0.0.1:5432" {100EF410F3}>)
source: (ERROR (CL-POSTGRES-ERROR::GET-ERROR-TYPE CODE) :CODE CODE :MESSAGE
(GET-FIELD #\M) :DETAIL (GET-FIELD #\D) :HINT (GET-FIELD #\H)
:CONTEXT (GET-FIELD #\W) ...)
0] 0
*
Previous master worked correctly with the same JMDict file from around the middle of December, so I think some code change must have caused this...
To be sure, I downloaded the newest JMdict_e (today's) and tried with it, but that didn't fix anything: same crash.
Very strange, it's supposed to be dropping the tables at the beginning of full-init, and it seems impossible for the xml file to have a duplicated entry...
Maybe I should have tried just add-errata first, but I wanted to be sure it's all reset. Now I also can't try add-errata anymore, since full-init deleted the tables.
from ichiran.
Actually, never mind that. This is related to a change I made to load-entry to auto-conjugate words from data/extra.xml.
EDIT: just pushed a fix to the branch.
from ichiran.
Your last fix seems to have fixed that one. full-init now gets as far as "Loading custom data..." before crashing:
Loading custom data...
debugger invoked on a CXML:WELL-FORMEDNESS-VIOLATION in thread
#<THREAD "main thread" RUNNING {10010E8093}>:
Document not well-formed: Bad attribute value delimiter #\\, must be either #\" or #\'.
Location:
Line 44, column 24 in NIL
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [ABORT] Exit debugger, returning to top level.
(CXML::%ERROR CXML:WELL-FORMEDNESS-VIOLATION #<RUNES:XSTREAM [main document :MAIN NIL]> "Document not well-formed: Bad attribute value delimiter #\\\\, must be either #\\\" or #\\'.")
source: (ERROR CLASS :FORMAT-CONTROL "~A" :FORMAT-ARGUMENTS
(LIST (GET-OUTPUT-STREAM-STRING S)))
0]
EDIT: I am going to assume the problem is that the quotes around "eng" in the last two seqs in extra.xml are backslash-escaped, unlike those in the older content there. I'll edit that and restart full-init.
from ichiran.
Yeah, the XML file was corrupted. I fixed it and added a test for it.
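For anyone who wants to check a local copy of a data file for this kind of corruption, here is a minimal sketch of a well-formedness check. It is written in Python with the standard library, not the project's Common Lisp, and it is not the test that was added to the repository; the function name and both sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical samples: the second one has backslash-escaped attribute quotes,
# which is the kind of damage described in this thread.
good = '<entry><sense><gloss xml:lang="eng">to discuss</gloss></sense></entry>'
bad = '<entry><sense><gloss xml:lang=\\"eng\\">to discuss</gloss></sense></entry>'

print(well_formedness_error(good))  # None: the document parses
print(well_formedness_error(bad))   # a parser message with line and column
```

To check a file on disk, the same function can be given the file's contents, for example `well_formedness_error(open(path, encoding="utf-8").read())`, where `path` is wherever your copy of the file lives.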
from ichiran.