delph-in / jaen
Japanese↔English transfer grammar for machine translation
License: Other
In shared/mtr.tdl I see that the n_adj+n_mtr type was commented out in a commit: f094327#diff-7e0c510906e928b7e15dfa5cc54ba881R2049
The type name is duplicated in jaen/mtr.tdl with an identical definition, so I think JaEn grammars were able to compile with that version. But if it's meant to be commented out, I think the whole definition should be commented out. If not, why is it duplicated in jaen/mtr.tdl?
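One rough way to spot type names defined more than once across grammar files is to scan for definition heads (name :=) that are not commented out. The sketch below assumes each definition head starts on its own line and that only ";" line comments are used; TDL block comments (#| ... |#) are not handled, and a definition whose head line is still active but whose body is partly commented out would still be counted. find_duplicate_types is a hypothetical helper, not part of ACE or the JaEn repository, and the file contents in the demo are toy strings.

```python
import re
from collections import defaultdict

# A definition head looks like "n_adj+n_mtr := ..." at the start of a line.
# Type names may contain "+", "-" and "_", so accept any run of characters
# other than whitespace, ";" and ":" before ":=".
DEF_HEAD = re.compile(r"^\s*([^\s;:]+)\s*:=")


def find_duplicate_types(files):
    """Return {type_name: [(filename, line_no), ...]} for every type name
    whose definition head appears more than once across the given files.

    files maps a filename to that file's text. Lines starting with ';'
    (TDL line comments) are skipped; #| ... |# block comments are NOT handled.
    """
    heads = defaultdict(list)
    for fname, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if line.lstrip().startswith(";"):
                continue  # commented-out line, not a live definition
            match = DEF_HEAD.match(line)
            if match:
                heads[match.group(1)].append((fname, line_no))
    return {name: locs for name, locs in heads.items() if len(locs) > 1}


if __name__ == "__main__":
    # Toy file contents (hypothetical, not the real grammar files).
    toy = {
        "shared/mtr.tdl": "; n_adj+n_mtr := old.\nfoo_mtr := bar.\n",
        "jaen/mtr.tdl": "n_adj+n_mtr := new.\nfoo_mtr := bar.\n",
    }
    print(find_duplicate_types(toy))
    # {'foo_mtr': [('shared/mtr.tdl', 2), ('jaen/mtr.tdl', 2)]}
```

In the toy input, n_adj+n_mtr is not reported because its head is commented out in one of the two files, while foo_mtr is live in both.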
From the ACE generation (ERG) log files in a translation pipeline:
25841 EP 'ja:_koto_n_nom' is not covered
25008 EP 'ja:neg_x' is not covered
24090 EP 'ja:coord_c' is not covered
21305 EP 'ja:_te_p_adjunct' is not covered
14923 EP 'ja:unspec_adj' is not covered
14923 EP 'ja:degree' is not covered
12529 EP 'ja:_you_n' is not covered
11217 EP 'ja:adversative' is not covered
7539 EP 'ja:_ni_p' is not covered
7482 EP 'ja:udef_q' is not covered
7340 EP 'ja:vv' is not covered
7122 EP 'ja:_suru_v_soc' is not covered
6587 EP 'ja:_kudasaru_v_aux' is not covered
5926 EP 'ja:_no_p' is not covered
4323 EP 'ja:_comma_d' is not covered
4054 EP 'ja:unknown_v' is not covered
3286 EP 'ja:_tokoro_n_2' is not covered
3164 EP 'ja:_ga_d' is not covered
3121 EP 'ja:_hou_n_7' is not covered
3115 EP 'ja:_はやる_v_unk' is not covered
3084 EP 'ja:_sha_a_4' is not covered
3076 EP 'ja:discourse_x' is not covered
3075 EP 'ja:_mo_d' is not covered
2934 EP 'ja:_chuu_n' is not covered
2779 EP 'ja:plus' is not covered
2309 EP 'ja:_made_p' is not covered
2267 EP 'ja:_mato_n' is not covered
2199 EP 'ja:_tame_n_5' is not covered
2190 EP 'ja:dofw' is not covered
This is a partial list; the number on the left of each line is its occurrence count. It's not surprising that Jacy predicates are not covered by the ERG, but when they are this frequent, it suggests that JaEn should perhaps have a hand-built rule to catch the cases where the automatically extracted rules fail to transfer them. In some cases there is such a rule, but it has become outdated. For instance, neg_x is not covered because JaEn's rule still targets neg_v. Similarly, JaEn targets coord instead of coord_c.
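The counts above amount to a frequency tally of the "is not covered" messages. A minimal Python sketch of building such a tally from log lines, assuming each message appears verbatim in the form shown above (the rest of the log layout is an assumption); tally_uncovered is a hypothetical helper name and the demo log is a toy.

```python
import re
from collections import Counter

# Coverage gaps appear in the log as lines containing, e.g.:
#   EP 'ja:neg_x' is not covered
UNCOVERED = re.compile(r"EP '([^']+)' is not covered")


def tally_uncovered(lines):
    """Count how many times each predicate is reported as not covered."""
    counts = Counter()
    for line in lines:
        counts.update(UNCOVERED.findall(line))
    return counts


if __name__ == "__main__":
    toy_log = [
        "EP 'ja:neg_x' is not covered",
        "EP 'ja:neg_x' is not covered",
        "EP 'def_q' is not covered",
    ]
    # Print in the same "count EP 'pred' is not covered" shape as above,
    # most frequent first.
    for pred, n in tally_uncovered(toy_log).most_common():
        print(f"{n:>7} EP '{pred}' is not covered")
```

Predicates with the ja: prefix are the Jacy-side ones; the rest are ERG-side.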
And here are some of those that aren't covered on the ERG side:
30754 EP 'def_q' is not covered
16051 EP 'implicit_q' is not covered
5386 EP '_good_a_at-for' is not covered
4053 EP 'of_rel_noun_mark' is not covered
3168 EP '_house_n_1' is not covered
2879 EP '_so_c' is not covered
2266 EP 'time_n' is not covered
1540 EP 'place_n' is not covered
1269 EP 'abstr_deg' is not covered
889 EP 'def_implicit_q' is not covered
848 EP '_soon_p' is not covered
794 EP '_home_p' is not covered
654 EP '_late_p' is not covered
555 EP '_here_a_1' is not covered
537 EP 'manner' is not covered
517 EP '_yesterday_a_1' is not covered
502 EP '_tomorrow_a_1' is not covered
435 EP '_bear_v_2' is not covered
383 EP '_there_a_1' is not covered
354 EP 'thing' is not covered
300 EP '_as_p_comp' is not covered
297 EP '_grandmother_n_1' is not covered
264 EP '_of_x_subord' is not covered
259 EP '_i_n_num' is not covered
240 EP 'numbered_hour' is not covered
188 EP 'pron' is not covered
There are some other reasons for these, but generally it's also because the hand-built JaEn rules are out of date. The def_q and implicit_q ones are because the modified SEM-I for the ERG missed them.
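Outdated hand-built rules of the kind described here (a rule still matching neg_v when the grammar now produces neg_x) could be flagged by comparing the predicates the rules mention against the predicates a grammar currently uses, for instance as listed in its SEM-I. The sketch below assumes both sets have already been extracted (how to extract them is outside its scope); audit_rules is a hypothetical helper, and the rule and inventory sets in the demo are toy values, with only the counts copied from the log excerpt above.

```python
from collections import Counter


def audit_rules(rule_preds, inventory, uncovered):
    """Compare predicates used by hand-built transfer rules against the
    predicates a grammar currently uses, and against uncovered-EP counts.

    rule_preds: predicates referenced by the hand-built rules
    inventory:  predicates the grammar currently uses (e.g. from its SEM-I)
    uncovered:  Counter mapping predicate -> times reported as not covered

    Returns (stale, unhandled):
      stale     - rule predicates missing from the inventory, i.e. likely
                  outdated targets such as a rule still matching neg_v
      unhandled - uncovered predicates no rule mentions, most frequent first
    """
    rule_set = set(rule_preds)
    stale = sorted(rule_set - set(inventory))
    unhandled = [(p, n) for p, n in uncovered.most_common() if p not in rule_set]
    return stale, unhandled


if __name__ == "__main__":
    # Toy rule set and inventory (hypothetical); counts from the log above.
    rules = {"ja:neg_v", "ja:_ni_p"}
    inventory = {"ja:neg_x", "ja:coord_c", "ja:_ni_p"}
    uncovered = Counter({"ja:neg_x": 25008, "ja:coord_c": 24090, "ja:_ni_p": 7539})
    stale, unhandled = audit_rules(rules, inventory, uncovered)
    print(stale)      # ['ja:neg_v']
    print(unhandled)  # [('ja:neg_x', 25008), ('ja:coord_c', 24090)]
```

Note that ja:_ni_p is absent from both results even though it is still frequently uncovered: a rule mentioning it exists, so the failure has some other cause that this simple set comparison cannot detect.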
ACE warns of duplicated type names, and often the resolution is obvious, but see the following:
prepositions.mtr:
;;;
;;; This is to prevent prepositions from occurring before words like `here'.
;;; (23-sep-10; ph)
ni_p_rel-loc_nonsp_rel := preposition_mtr &
[ CONTEXT.RELS < [ PRED place_n_rel, ARG0 #x ] >,
INPUT.RELS < [ PRED "ja:_ni_p_rel", ARG2 #x ] >,
OUTPUT.RELS < [ PRED loc_nonsp_rel ] > ].
lex-exp.mtr:
;;; 彼女 は どこ に 行っ た の >> Where did she go?
ni_p_rel-loc_nonsp_rel := arg12_v_mtr &
[ CONTEXT.RELS < [ PRED "ja:place_rel", ARG2 #x ] >,
JA.RELS < [ PRED "ja:_ni_p_rel", ARG0 #x ] >,
EN.RELS < [ PRED loc_nonsp_rel, ARG0 #x ] > ].
Note how the second has ja: prefixed to the context predicate but the first one doesn't. Which one will match?
Of course, this is not the only difference between the two.
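To make the mismatch concrete, the two CONTEXT predicates can be normalized in the usual DELPH-IN manner (strip quotes, lowercase, drop a trailing _rel, similar to what PyDelphin's delphin.predicate module does) and then compared. This is only a sketch: normalize_pred and split_namespace are hypothetical helpers, and whether a given MTR's CONTEXT actually matches is decided by the transfer engine, not by this string comparison.

```python
def normalize_pred(pred):
    """Normalize a predicate string: strip surrounding double quotes,
    lowercase, and drop a trailing '_rel'."""
    p = pred.strip()
    if len(p) >= 2 and p.startswith('"') and p.endswith('"'):
        p = p[1:-1]
    p = p.lower()
    if p.endswith("_rel"):
        p = p[: -len("_rel")]
    return p


def split_namespace(pred):
    """Split 'ja:place' into ('ja', 'place'); unprefixed -> (None, pred)."""
    ns, sep, name = pred.partition(":")
    return (ns, name) if sep else (None, ns)


if __name__ == "__main__":
    # CONTEXT predicates of the two ni_p_rel-loc_nonsp_rel definitions.
    first = normalize_pred("place_n_rel")       # prepositions.mtr
    second = normalize_pred('"ja:place_rel"')   # lex-exp.mtr
    print(split_namespace(first))   # (None, 'place_n')
    print(split_namespace(second))  # ('ja', 'place')
    print(first == second)          # False
```

Even after normalization the two predicates differ in both namespace and name, so the two same-named types are not interchangeable.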