Comments (11)

tshatrov commented on May 23, 2024

I think the first time it failed on add-errata because the word in question was deleted from JMdict (due to my comment, in fact...). I'll try to make it work with the latest data in the coming weeks.

debugger invoked on a CL-POSTGRES-ERROR:FOREIGN-KEY-VIOLATION in thread
#<THREAD "main thread" RUNNING {1001870103}>:
  Database error 23503: insert or update on table "kana_text" violates foreign key constraint "kana_text_entry_seq_foreign"
DETAIL: Key (seq)=(2209300) is not present in table "entry".
QUERY: INSERT INTO kana_text (best_kanji, nokanji, conjugate_p, common_tags, common, ord, text, seq)  VALUES (NULL, false, true, E'', NULL, 0, E'たへる', 2209300) RETURNING id

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(CL-POSTGRES::GET-ERROR #<SB-SYS:FD-STREAM for "socket 127.0.0.1:54562, peer: 127.0.0.1:5432" {1001B65323}>)
   source: (ERROR (CL-POSTGRES-ERROR::GET-ERROR-TYPE CODE) :CODE CODE :MESSAGE
                  (GET-FIELD #\M) :DETAIL (GET-FIELD #\D) :HINT (GET-FIELD #\H)
                  :CONTEXT (GET-FIELD #\W) ...)
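
The error above is an ordinary foreign-key violation: a reading is inserted for seq 2209300 after that entry is no longer in the entry table. Below is a minimal sketch of the mechanism, assuming SQLite as a stand-in for PostgreSQL and a heavily reduced schema. The helper insert_kana is hypothetical and is not how ichiran handles errata; the seq 1000280 row is just sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.execute("CREATE TABLE entry (seq INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE kana_text ("
    " id INTEGER PRIMARY KEY,"
    " seq INTEGER NOT NULL REFERENCES entry(seq),"
    " text TEXT)"
)
conn.execute("INSERT INTO entry (seq) VALUES (1000280)")  # sample parent row


def insert_kana(conn, seq, text):
    """Hypothetical guard: insert a reading only if its parent entry exists."""
    parent = conn.execute("SELECT 1 FROM entry WHERE seq = ?", (seq,)).fetchone()
    if parent is None:
        return False  # entry was removed upstream; skip instead of crashing
    conn.execute("INSERT INTO kana_text (seq, text) VALUES (?, ?)", (seq, text))
    return True


# Unguarded insert for a seq that is missing from entry: the database rejects it.
try:
    conn.execute(
        "INSERT INTO kana_text (seq, text) VALUES (?, ?)", (2209300, "たへる")
    )
    violated = False
except sqlite3.IntegrityError:
    violated = True

print("foreign key violated:", violated)
print("guarded insert, missing entry:", insert_kana(conn, 2209300, "たへる"))
print("guarded insert, present entry:", insert_kana(conn, 1000280, "あげつらう"))
```

Whether skipping such rows is acceptable depends on what the errata step is meant to do; the sketch only shows why the insert fails once the parent entry is gone.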

from ichiran.

tshatrov commented on May 23, 2024

Hi, unfortunately, because JMdict data is always changing, it's impossible for the segmentation tests to always pass unless they're modified and the code has been manually calibrated. For that reason, only the latest release is guaranteed to actually pass all the tests.

For example

| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("猫" "は" "しっぽ" "を" "ぴんと" "立てて" "歩いた")
| but saw ("猫" "は" "しっぽ" "を" "ぴんと立てて" "歩いた")

This test failure is caused by the word ぴんと立つ being added to the JMdict database on 2022-07-19. Since the latest release of Ichiran was in January 2022, the expected result in the test doesn't account for this word.
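
To illustrate why a dictionary update alone can flip a segmentation test, here is a toy greedy longest-match segmenter. It is not Ichiran's algorithm (which scores candidate segmentations), and the toy dictionary stores the surface form ぴんと立てて directly, whereas a real system would derive it from the dictionary entry. The function name and both dictionaries are assumptions made for the example.

```python
def segment(text, dictionary):
    """Greedy longest-match split; unknown characters become one-char tokens."""
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


sentence = "猫はしっぽをぴんと立てて歩いた"

# Dictionary as it might have looked before the compound was added.
old_dict = {"猫", "は", "しっぽ", "を", "ぴんと", "立てて", "歩いた"}
# Same dictionary after the compound form is added.
new_dict = old_dict | {"ぴんと立てて"}

print(segment(sentence, old_dict))
print(segment(sentence, new_dict))
```

With the old dictionary the sentence splits into seven tokens; with the compound added, ぴんと and 立てて merge into one token, which is the same kind of difference the failed test reports.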

As for this,

Database error 42P01: relation "kanji" does not exist

check that you have downloaded the file kanjidic2.xml and specified the path to it in the settings. Try manually running the following functions:

(ichiran/mnt:load-kanjidic)
(ichiran/mnt:load-kanji-stats)

from ichiran.

vpltd-kgalaj commented on May 23, 2024

I understand; so the answer in the example is actually better than the expected one, given the current state of JMdict, and the expected value needs to be adjusted.

As for kanjidic2.xml, I have it and the path is correct.

* (ichiran/mnt:load-kanjidic)
    500 entries loaded
    1000 entries loaded
    1500 entries loaded
    2000 entries loaded
    2500 entries loaded
    3000 entries loaded
    3500 entries loaded
    4000 entries loaded
    4500 entries loaded
    5000 entries loaded
    5500 entries loaded
    6000 entries loaded
    6500 entries loaded
    7000 entries loaded
    7500 entries loaded
    8000 entries loaded
    8500 entries loaded
    9000 entries loaded
    9500 entries loaded
    10000 entries loaded
    10500 entries loaded
    11000 entries loaded
    11500 entries loaded
    12000 entries loaded
    12500 entries loaded
    13000 entries loaded
    13109 entries total
    NIL
* (ichiran/mnt:load-kanji-stats)
    100 kanji processed
    200 kanji processed
    300 kanji processed
    400 kanji processed
    500 kanji processed
    600 kanji processed
    700 kanji processed
    800 kanji processed
    900 kanji processed
    1000 kanji processed
    1100 kanji processed
    1200 kanji processed
    1300 kanji processed
    1400 kanji processed
    1500 kanji processed
    1600 kanji processed
    1700 kanji processed
    1800 kanji processed
    1900 kanji processed
    2000 kanji processed
    2100 kanji processed
    2136 kanji total
    NIL

I ran them just now, but they should also have been executed earlier as part of full-init, so I have to assume the data was already loaded and calculated when I ran the tests previously. I can't rerun the tests at the moment to confirm that the error is still there, though: in the meantime I added some logging to better understand how it works, and as a side effect the tests seem to lock up midway. I think it's possible that some other change to JMdict or KanjiDic is causing the earlier error, though.

from ichiran.

vpltd-kgalaj commented on May 23, 2024

I repeated the procedure on a fresh database, and the 'kanji' error didn't show up. So indeed, most likely the kanjidic2 data hadn't been loaded, even though full-init had finished executing and the kanjidic2 path had been provided before it started.

A mystery, but apparently no longer reproducible.

It's still failing the same 31 tests, but that's expected. Closing.

from ichiran.

vpltd-kgalaj commented on May 23, 2024

I reinitialized the entire database, and indeed, it turned out that there had been lingering side effects of that crash (notably, n-kanji and n-kana in many conjugations were left at 0, which wasn't causing crashes but was causing trouble with scoring).

After the reinitialisation, it only fails on 13 tests:

Unit Test Summary
| 748 assertions total
| 735 passed
| 13 failed
| 0 execution errors
| 0 missing tests
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("だしといて") but saw ("だし" "といて")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("猫" "は" "しっぽ" "を" "ぴんと" "立てて" "歩いた")
| but saw ("猫" "は" "しっぽ" "を" "ぴんと立てて" "歩いた")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("おとめ" "に" "ふさわしい" "振る舞い") but saw ("お" "とめ" "に" "ふさわしい" "振る舞い")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("バラしちゃってる") but saw ("バラ" "しちゃってる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("ガス" "が" "ついている") but saw ("ガス" "が" "ついて" "いる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("工夫" "が" "される") but saw ("工夫" "がされる")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("共感" "性") but saw ("共感性")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("しない" "かい") but saw ("し" "ないかい")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("てか" "最近" "ファン" "層" "は" "円盤" "すら" "買わない" "から" "そいつら" "から" "金" "とる"
"ってのは" "無謀")
| but saw ("てか" "最近" "ファン層" "は" "円盤" "すら" "買わない" "から" "そいつら" "から" "金" "とる" "ってのは"
"無謀")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("奴" "が" "まとも" "に" "見られない") but saw ("奴" "が" "まともに" "見られない")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("体" "に" "悪い" "と" "知り" "ながら" "タバコをやめる" "こと" "は" "できない")
| but saw ("体に悪い" "と" "知り" "ながら" "タバコをやめる" "こと" "は" "できない")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("雨" "が" "降りそう" "な" "気がします") but saw ("雨が降りそう" "な" "気がします")
|
| Failed Form: ICHIRAN/TEST::RESULT
| Expected ("そういう" "お" "隣" "どうし") but saw ("そういう" "お" "隣どうし")
|
SEGMENTATION-TEST: 497 assertions passed, 13 failed.
#<TEST-RESULTS-DB Total(748) Passed(735) Failed(13) Errors(0)>

from ichiran.

tshatrov commented on May 23, 2024

The dec23 branch contains code that should pass all tests on recent JMdict dumps (make sure to run (add-errata) after updating). I'll make a new release soon unless there are some terrible issues with it (I haven't tested this version much yet).

from ichiran.

vpltd-kgalaj commented on May 23, 2024

I just got around to doing it, and full-init seems to be failing very early on:

* (ichiran/maintenance:full-init)
Initializing ichiran/dict...

debugger invoked on a CL-POSTGRES-ERROR:UNIQUE-VIOLATION in thread
#<THREAD "main thread" RUNNING {10010C0093}>:
  Database error 23505: duplicate key value violates unique constraint "entry_pkey"
DETAIL: Key (seq)=(1000280) already exists.
QUERY: INSERT INTO entry (primary_nokanji, n_kana, n_kanji, root_p, content, seq)  VALUES (false, 0, 0, true, E'<?xml version="1.0" encoding="UTF-8"?>
<entry>
<ent_seq>1000280</ent_seq>
<k_ele>
<keb>論う</keb>
</k_ele>
<r_ele>
<reb>あげつらう</reb>
</r_ele>
<sense>
<pos>v5u</pos>
<pos>vt</pos>
<misc>uk</misc>
<gloss xml:lang="eng">to discuss</gloss>
</sense>
<sense>
<pos>v5u</pos>
<pos>vt</pos>
<gloss xml:lang="eng">to find fault with</gloss>
<gloss xml:lang="eng">to criticize</gloss>
<gloss xml:lang="eng">to criticise</gloss>
</sense>
</entry>', 1000280)

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(CL-POSTGRES::GET-ERROR #<SB-SYS:FD-STREAM for "socket 127.0.0.1:55237, peer: 127.0.0.1:5432" {100EF410F3}>)
   source: (ERROR (CL-POSTGRES-ERROR::GET-ERROR-TYPE CODE) :CODE CODE :MESSAGE
                  (GET-FIELD #\M) :DETAIL (GET-FIELD #\D) :HINT (GET-FIELD #\H)
                  :CONTEXT (GET-FIELD #\W) ...)
0] 0
* 

Previous master worked correctly with the same JMDict file from around the middle of December, so I think some code change must have caused this...

To be sure, I downloaded the newest JMdict_e (today's) and tried with it, but that didn't fix anything: same crash.

Very strange: it's supposed to be dropping the tables at the beginning of full-init, and it seems impossible for the XML file to have a duplicated entry...
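
One way to rule out a duplicated entry in the XML input itself is to count ent_seq values directly. The sketch below is a standalone check, not part of ichiran; it uses a regular expression rather than a full XML parser so that entity declarations in the dictionary file don't get in the way. The function name and the sample strings are made up for the example.

```python
import re
from collections import Counter

ENT_SEQ = re.compile(r"<ent_seq>(\d+)</ent_seq>")


def duplicate_seqs(*xml_texts):
    """Return the sorted ent_seq values that occur more than once overall."""
    counts = Counter()
    for text in xml_texts:
        counts.update(int(seq) for seq in ENT_SEQ.findall(text))
    return sorted(seq for seq, n in counts.items() if n > 1)


# Made-up sample inputs; real files could be read with
# pathlib.Path(path).read_text(encoding="utf-8").
sample_a = (
    "<JMdict>"
    "<entry><ent_seq>1000280</ent_seq></entry>"
    "<entry><ent_seq>1000290</ent_seq></entry>"
    "</JMdict>"
)
sample_b = "<entries><entry><ent_seq>1000280</ent_seq></entry></entries>"

print(duplicate_seqs(sample_a))            # [] : no duplicates in one file
print(duplicate_seqs(sample_a, sample_b))  # [1000280] : same seq in both inputs
```

An empty result for the input files would point away from the data and towards the loading code inserting the same seq twice.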

Maybe I should have tried just add-errata first, but I wanted to be sure it's all reset. Now I also can't try add-errata anymore, since full-init deleted the tables.

from ichiran.

tshatrov commented on May 23, 2024

from ichiran.

tshatrov commented on May 23, 2024

Actually, never mind that. This is related to a change I made to load-entry to auto-conjugate words from data/extra.xml.

EDIT: just pushed a fix to the branch

from ichiran.

vpltd-kgalaj commented on May 23, 2024

Your last fix seems to have fixed that one. full-init now gets as far as "Loading custom data..." before crashing:

Loading custom data...

debugger invoked on a CXML:WELL-FORMEDNESS-VIOLATION in thread
#<THREAD "main thread" RUNNING {10010E8093}>:
  Document not well-formed: Bad attribute value delimiter #\\, must be either #\" or #\'.
Location:
  Line 44, column 24 in NIL


Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(CXML::%ERROR CXML:WELL-FORMEDNESS-VIOLATION #<RUNES:XSTREAM [main document :MAIN NIL]> "Document not well-formed: Bad attribute value delimiter #\\\\, must be either #\\\" or #\\'.")
   source: (ERROR CLASS :FORMAT-CONTROL "~A" :FORMAT-ARGUMENTS
                  (LIST (GET-OUTPUT-STREAM-STRING S)))
0] 

EDIT: I am going to assume the problem is that the quotes around "eng" in the last two seqs in extra.xml are backslash-escaped, unlike those around "eng" in the older content there. I'll edit that and restart full-init.
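
A backslash in front of an attribute's quote is enough to make an XML document not well-formed, which matches the "Bad attribute value delimiter" message above. Here is a minimal standalone sketch of checking this with Python's standard library parser. It is not the test that exists in ichiran, the helper name is made up, and the two sample strings only imitate the suspected difference; they are not copied from extra.xml.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Plain quotes around the attribute value: well-formed.
good = '<entry><gloss xml:lang="eng">to discuss</gloss></entry>'
# Backslash-escaped quotes (raw string keeps the backslashes): not well-formed.
bad = r'<entry><gloss xml:lang=\"eng\">to discuss</gloss></entry>'

print("good:", well_formedness_error(good))
print("bad:", well_formedness_error(bad))
```

Running the same check over the whole file (read as text and passed to the helper) would report the line and column of the first offending attribute.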

from ichiran.

tshatrov commented on May 23, 2024

Yeah, the XML file was corrupted; I fixed it and added a test for it.

from ichiran.
