jaen's People

Contributors

fcbond, goodmami

Forkers

bikenshrestha

jaen's Issues

n_adj+n_mtr accidentally commented out?

In shared/mtr.tdl I see that the n_adj+n_mtr type was commented out in a commit: f094327#diff-7e0c510906e928b7e15dfa5cc54ba881R2049

The type name is duplicated in jaen/mtr.tdl with an identical definition, so I think JaEn grammars were able to compile with that version, but if it's meant to be commented out I think the whole definition should be commented out. If not, why is it duplicated in jaen/mtr.tdl?

Jacy and ERG divergence

From the ACE generation (ERG) log files in a translation pipeline:

  25841 EP 'ja:_koto_n_nom' is not covered
  25008 EP 'ja:neg_x' is not covered
  24090 EP 'ja:coord_c' is not covered
  21305 EP 'ja:_te_p_adjunct' is not covered
  14923 EP 'ja:unspec_adj' is not covered
  14923 EP 'ja:degree' is not covered
  12529 EP 'ja:_you_n' is not covered
  11217 EP 'ja:adversative' is not covered
   7539 EP 'ja:_ni_p' is not covered
   7482 EP 'ja:udef_q' is not covered
   7340 EP 'ja:vv' is not covered
   7122 EP 'ja:_suru_v_soc' is not covered
   6587 EP 'ja:_kudasaru_v_aux' is not covered
   5926 EP 'ja:_no_p' is not covered
   4323 EP 'ja:_comma_d' is not covered
   4054 EP 'ja:unknown_v' is not covered
   3286 EP 'ja:_tokoro_n_2' is not covered
   3164 EP 'ja:_ga_d' is not covered
   3121 EP 'ja:_hou_n_7' is not covered
   3115 EP 'ja:_はやる_v_unk' is not covered
   3084 EP 'ja:_sha_a_4' is not covered
   3076 EP 'ja:discourse_x' is not covered
   3075 EP 'ja:_mo_d' is not covered
   2934 EP 'ja:_chuu_n' is not covered
   2779 EP 'ja:plus' is not covered
   2309 EP 'ja:_made_p' is not covered
   2267 EP 'ja:_mato_n' is not covered
   2199 EP 'ja:_tame_n_5' is not covered
   2190 EP 'ja:dofw' is not covered

This is a partial list. On the left are the occurrence counts. It's not surprising that Jacy predicates are not covered by the ERG, but when they are very frequent it means that JaEn should perhaps have a hand-built rule to catch the cases when the automatically extracted rules fail to transfer something. In some cases, there is such a rule, but it has become outdated. For instance, neg_x is not covered because JaEn's rule still targets neg_v. Similarly, JaEn targets coord instead of coord_c.
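A tally like the one above can be reproduced with a short script. The following is a minimal sketch, assuming each relevant log line contains the phrase `EP '<predicate>' is not covered`, as in the excerpt; the function names `count_uncovered` and `split_by_side` are hypothetical, and the sample lines are made up for illustration rather than copied from a real ACE log.

```python
import re
from collections import Counter

# Assumption: uncovered predicates are reported as  EP '<pred>' is not covered
UNCOVERED = re.compile(r"EP '([^']+)' is not covered")


def count_uncovered(lines):
    """Count how often each predicate is reported as not covered."""
    counts = Counter()
    for line in lines:
        match = UNCOVERED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def split_by_side(counts, prefix="ja:"):
    """Separate Jacy-side predicates (with the ja: prefix) from the rest."""
    ja = Counter({p: n for p, n in counts.items() if p.startswith(prefix)})
    en = Counter({p: n for p, n in counts.items() if not p.startswith(prefix)})
    return ja, en


if __name__ == "__main__":
    # Made-up sample lines; a real run would iterate over the log files instead.
    sample = [
        "EP 'ja:neg_x' is not covered",
        "EP 'def_q' is not covered",
        "EP 'ja:neg_x' is not covered",
        "some unrelated log line",
    ]
    ja, en = split_by_side(count_uncovered(sample))
    for pred, n in ja.most_common():
        print(f"{n:7d} {pred}")
    for pred, n in en.most_common():
        print(f"{n:7d} {pred}")
```

Sorting by frequency puts the predicates that most need a hand-built rule at the top, and splitting on the `ja:` prefix separates untransferred Jacy predicates from those that are simply unknown on the ERG side.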

And here are some of those that aren't covered on the ERG side:

  30754 EP 'def_q' is not covered
  16051 EP 'implicit_q' is not covered
   5386 EP '_good_a_at-for' is not covered
   4053 EP 'of_rel_noun_mark' is not covered
   3168 EP '_house_n_1' is not covered
   2879 EP '_so_c' is not covered
   2266 EP 'time_n' is not covered
   1540 EP 'place_n' is not covered
   1269 EP 'abstr_deg' is not covered
    889 EP 'def_implicit_q' is not covered
    848 EP '_soon_p' is not covered
    794 EP '_home_p' is not covered
    654 EP '_late_p' is not covered
    555 EP '_here_a_1' is not covered
    537 EP 'manner' is not covered
    517 EP '_yesterday_a_1' is not covered
    502 EP '_tomorrow_a_1' is not covered
    435 EP '_bear_v_2' is not covered
    383 EP '_there_a_1' is not covered
    354 EP 'thing' is not covered
    300 EP '_as_p_comp' is not covered
    297 EP '_grandmother_n_1' is not covered
    264 EP '_of_x_subord' is not covered
    259 EP '_i_n_num' is not covered
    240 EP 'numbered_hour' is not covered
    188 EP 'pron' is not covered

There are some other reasons for these, but generally it's also because the hand-built JaEn rules are out of date. The def_q and implicit_q ones are not covered because the modified SEM-I for the ERG missed them.

ja: prefix in context or not?

ACE warns of duplicated type names, and often the resolution is obvious, but see the following:

prepositions.mtr:

;;;
;;; This is to prevent prepositions from occurring before words like `here'.
;;;                                                             (23-sep-10; ph)
ni_p_rel-loc_nonsp_rel := preposition_mtr &
[ CONTEXT.RELS < [ PRED place_n_rel, ARG0 #x ] >,
  INPUT.RELS < [ PRED "ja:_ni_p_rel", ARG2 #x ] >,
  OUTPUT.RELS < [ PRED loc_nonsp_rel ] > ].

lex-exp.mtr:

;;; 彼女 は どこ に 行っ た の >> Where did she go?
ni_p_rel-loc_nonsp_rel := arg12_v_mtr &
[ CONTEXT.RELS < [ PRED "ja:place_rel", ARG2 #x ] >,
  JA.RELS < [ PRED "ja:_ni_p_rel", ARG0 #x ] >,
  EN.RELS < [ PRED loc_nonsp_rel, ARG0 #x ] > ].

Note how the second has ja: prefixed to the context predicate but the first one doesn't. Which will match?

Of course, this is not the only difference between the two.
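Duplicates like this can also be listed before compiling. The sketch below is a rough approximation, not a TDL parser: it assumes a definition starts at the beginning of a line as `name :=`, skips lines that begin with `;`, and does not handle block comments. The function name `find_duplicates` and the sample file contents are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a definition starts at the beginning of a line as "name :=".
# Lines beginning with ";" are comments and therefore never match.
TYPE_DEF = re.compile(r"^([^\s;:]+)\s*:=", re.MULTILINE)


def find_duplicates(sources):
    """Map each name defined more than once to the files that define it.

    `sources` maps a file name to the text of that file.
    """
    seen = defaultdict(list)
    for filename, text in sources.items():
        for match in TYPE_DEF.finditer(text):
            seen[match.group(1)].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Abbreviated stand-ins for the two files quoted above.
    sources = {
        "prepositions.mtr": "ni_p_rel-loc_nonsp_rel := preposition_mtr &\n[ ].\n",
        "lex-exp.mtr": ";;; a comment\nni_p_rel-loc_nonsp_rel := arg12_v_mtr &\n[ ].\n",
    }
    for name, files in find_duplicates(sources).items():
        print(name, "defined in:", ", ".join(files))
```

For real use, `sources` could be filled by reading each `.mtr` and `.tdl` file in the grammar directory. The script only reports that a name is defined twice; deciding which definition should survive still needs a manual comparison.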
