gf-tokipona

Describe Toki Pona using Grammatical Framework.

All the features described in the official Toki Pona book (ISBN 978-0-9782923-0-0) are implemented.

Fast start

$ gf Tokipona.gf

         *  *  *
      *           *
    *               *
   *
   *
   *        * * * * * *
   *        *         *
    *       * * * *  *
      *     *      *
         *  *  *
...


Test> linearize UseCl (Clause Sina_Pron (PredNP (AdjNP (UseN Mije_W) (UseA Sona_W))))
sina mije sona .


Test> parse "sina mije sona ."
UseCl (Clause Sina_Pron (PredNP (AdjNP (UseN Mije_W) (UseA Sona_W))))
UseCl (ClausePred (PredNP (AdjNP (AdjNP Sina_Pron (UseA Mije_W)) (UseA Sona_W))))
UseCl (ClausePred (PredNP (AdjNP (AdjNP (UseN Sina_W) (UseA Mije_W)) (UseA Sona_W))))

Explanation

Example

Consider the following parse tree:

"sina mije sona ." ->

UseCl (Clause
          Sina_Pron
          (PredNP
              (AdjNP (UseN Mije_W) (UseA Sona_W))))

A Clause consists of a subject and a predicate. The subject is the pronoun sina. The predicate, built with PredNP, is the noun phrase mije sona: the noun mije modified by the adjective sona.

The constructor UseCl cl makes a sentence by adding a dot at the end of the clause.

All the constructors are commented in the file grammar/GrammarBase.gf.
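
To inspect the nesting of such a tree outside of GF, the tree string can be read into a nested structure. The following is a minimal sketch, not part of the repository: the helper names parse_tree and show are hypothetical, and it assumes trees use only constructor names and parentheses, as in the examples above.

```python
import re

def parse_tree(text):
    """Parse a GF tree string into nested (name, [children]) tuples."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def parse_atom():
        # An atom is a bare constructor or a parenthesized application.
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            node = parse_app()
            assert tokens[pos] == ")", "unbalanced parentheses"
            pos += 1
            return node
        name = tokens[pos]
        pos += 1
        return (name, [])

    def parse_app():
        # An application is a constructor followed by zero or more arguments.
        nonlocal pos
        name = tokens[pos]
        pos += 1
        args = []
        while pos < len(tokens) and tokens[pos] != ")":
            args.append(parse_atom())
        return (name, args)

    tree = parse_app()
    assert pos == len(tokens), "trailing tokens"
    return tree

def show(node, depth=0):
    """Return the tree as a list of indented lines, one constructor per line."""
    name, args = node
    lines = ["  " * depth + name]
    for arg in args:
        lines.extend(show(arg, depth + 1))
    return lines

tree = parse_tree(
    "UseCl (Clause Sina_Pron (PredNP (AdjNP (UseN Mije_W) (UseA Sona_W))))"
)
print("\n".join(show(tree)))
```

The printed outline puts UseCl at the top, Clause below it, and the subject Sina_Pron and the predicate PredNP as the two children of Clause.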

Read the treebank

Toki Pona examples and corresponding parse trees can be found in treebank.json. If a feature is explained in chapter NN, then the corresponding entries are under ID tplangLNN*.

A sample entry from the treebank:

  {
    "en": "This is a person.",
    "gf": "UseCl (Clause (UseN Ni_W) (PredNP (UseN Jan_W)))",
    "id": "tplangL02E01",
    "tp": "ni li jan ."
  },

Each entry has:

  • unique id,
  • en: English phrase,
  • tp: Toki Pona equivalent of the English phrase,
  • gf: preferred parse tree for the Toki Pona phrase.
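
As an illustration of how these fields can be used, here is a minimal sketch that groups entries by chapter. It makes two assumptions that the text above does not confirm: that treebank.json is a JSON array of such entries, and that every ID follows the tplangLNNENN shape of the sample entry. To stay self-contained it reads an inline copy of the sample entry instead of the file; group_by_chapter is a hypothetical helper name.

```python
import json
import re
from collections import defaultdict

# Inline copy of the sample entry; the real data lives in treebank.json.
SAMPLE = """
[
  {
    "en": "This is a person.",
    "gf": "UseCl (Clause (UseN Ni_W) (PredNP (UseN Jan_W)))",
    "id": "tplangL02E01",
    "tp": "ni li jan ."
  }
]
"""

# Assumed ID shape: "tplangL" + chapter number + "E" + example number.
ID_PATTERN = re.compile(r"^tplangL(\d+)E(\d+)$")

def group_by_chapter(entries):
    """Map chapter number -> list of entries; skip IDs of another shape."""
    chapters = defaultdict(list)
    for entry in entries:
        match = ID_PATTERN.match(entry["id"])
        if match is None:
            continue
        chapters[int(match.group(1))].append(entry)
    return dict(chapters)

entries = json.loads(SAMPLE)
by_chapter = group_by_chapter(entries)
for chapter, items in sorted(by_chapter.items()):
    for item in items:
        print(chapter, item["tp"], "=>", item["gf"])
```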

PU treebank

The file ./external/pu_phrases/treebank.txt contains all the phrases from the official Toki Pona book.

Test

The file treebank.json is a test fixture. To check that parsing and linearization reproduce it:

$ make test

or individually:

$ make test-parse
$ make test-linearize # or just test-lin
$ make test-pu-parse

Use a regular expression in --grep to check a specific example:

$ python3 utils/check_parse.py --grep "L02"
$ python3 utils/check_linearize.py --grep "L02"
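
The effect of such a pattern can be sketched as follows. This is not the code of the check scripts; it is an illustration that assumes the pattern is matched against the entry ID with a regular-expression search. The function select_entries is hypothetical, and the second and third IDs are made-up placeholders.

```python
import re

def select_entries(entries, pattern):
    """Keep the entries whose ID contains a match for the regular expression."""
    regex = re.compile(pattern)
    return [entry for entry in entries if regex.search(entry["id"])]

# The first ID is the sample entry from the treebank; the others are placeholders.
entries = [
    {"id": "tplangL02E01"},
    {"id": "tplangL02E02"},
    {"id": "tplangL03E01"},
]

print([e["id"] for e in select_entries(entries, "L02")])
print([e["id"] for e in select_entries(entries, r"L0[23]E01$")])
```

Under this assumption, "L02" selects every example of chapter 2, while a more specific expression such as "L02E01" narrows the run to a single entry.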

License

See LICENSE.md

Author

Oleg Parashchenko, olpa@uucode.com.

gf-tokipona home page: https://github.com/olpa/gf-tokipona.

gf-tokipona's Issues

Most probable parse first

All the parses for "sina mije sona .":

UseCl (Greeting (AdjNP Sina_Pron (AdjAP (UseA Mije_A) (UseA Sona_A))))
UseCl (Greeting (AdjNP (AdjNP Sina_Pron (UseA Mije_A)) (UseA Sona_A)))
UseCl (Greeting (AdjNP (AdjnpNP Sina_Pron (UseN Mije_N)) (UseA Sona_A)))
UseCl (Greeting (AdjnpNP Sina_Pron (AdjNP (UseN Mije_N) (UseA Sona_A))))
UseCl (Greeting (AdjnpNP Sina_Pron (AdjnpNP (UseN Mije_N) (UseN Sona_N))))
UseCl (Greeting (AdjnpNP (AdjNP Sina_Pron (UseA Mije_A)) (UseN Sona_N)))
UseCl (Greeting (AdjnpNP (AdjnpNP Sina_Pron (UseN Mije_N)) (UseN Sona_N)))
UseCl (PredAP Sina_Pron (AdjAP (UseA Mije_A) (UseA Sona_A)))
UseCl (PredNP Sina_Pron (AdjNP (UseN Mije_N) (UseA Sona_A)))
UseCl (PredNP Sina_Pron (AdjnpNP (UseN Mije_N) (UseN Sona_N)))
UseCl (PredVP Sina_Pron (AdjVP (UseV Mije_V) (UseA Sona_A)))

All are valid interpretations, but try to make this one come first:

UseCl (PredNP Sina_Pron (AdjNP (UseN Mije_N) (UseA Sona_A)))
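
One conceivable way to get this ordering is to rank the parses after parsing. The sketch below is only an illustration of that idea, not a proposal for the grammar itself: the cost table is entirely hypothetical and was picked so that the preferred tree wins for this sentence.

```python
import re

# Hypothetical costs per constructor; lower totals rank first.
# Constructors not listed here cost nothing.
COSTS = {
    "Greeting": 5,
    "PredVP": 3,
    "PredAP": 2,
    "PredNP": 1,
    "AdjnpNP": 2,
    "AdjNP": 1,
    "AdjAP": 1,
}

def cost(tree):
    """Sum the costs of all constructor names that occur in the tree string."""
    return sum(COSTS.get(name, 0) for name in re.findall(r"[A-Za-z_]+", tree))

# A subset of the parses listed above.
parses = [
    "UseCl (Greeting (AdjNP Sina_Pron (AdjAP (UseA Mije_A) (UseA Sona_A))))",
    "UseCl (PredAP Sina_Pron (AdjAP (UseA Mije_A) (UseA Sona_A)))",
    "UseCl (PredNP Sina_Pron (AdjNP (UseN Mije_N) (UseA Sona_A)))",
    "UseCl (PredNP Sina_Pron (AdjnpNP (UseN Mije_N) (UseN Sona_N)))",
    "UseCl (PredVP Sina_Pron (AdjVP (UseV Mije_V) (UseA Sona_A)))",
]

# sorted() is stable, so parses with equal cost keep their original order.
ranked = sorted(parses, key=cost)
for tree in ranked:
    print(cost(tree), tree)
```

Whether such weights generalize beyond this sentence would have to be checked against the treebank.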

Generalize `anu`

So far, only anu seme is supported, and only at the sentence level. A standalone anu should also work between nouns:

mi kute e mije anu meli .

Adjectives: probably not. This means that "mi wile telo lete anu seli" parses as "mi wile (telo lete) anu (seli)" and translates as "I want (cold water) or (fire)". To use anu with adjectives, repeat the noun: "mi wile telo lete anu telo seli" (I want cold water or warm water).

For "en" ("and"), the official book says: between multiple subjects. I think the same should hold for "anu". TODO: fix the grammar, remove AndAP.

Verbs: we have "and" for verbs, so it makes sense to allow "or" for verbs. Something like:

mije li sona anu li toki

(The man knows or talks).

PredVP with "li" without "e"

Among the parses for "lipu soweli li pona ." are:

UseCl (PredVP (AdjNP (UseN Lipu_N) (UseA Soweli_A)) (UseV Pona_V))
UseCl (PredVP (AdjnpNP (UseN Lipu_N) (UseN Soweli_N)) (UseV Pona_V))

This looks wrong to me.

The rule comes from chapter 5: you can omit the object of a verb or use "ijo" as a filler object: "mije li sona e ijo" or "mije li sona".

I think a verb without an object can be treated as a noun.

Rework grammar

  1. Change predicates from "linguistic semantics" to "traditional grammar". Consider a parse of "mi pona":

UseCl (PredAP Mi_Pron (UseA Pona_A))

In the reworked grammar, it should be something like:

UseCl Mi_Pron (PredAP (UseA Pona_A))

  2. A preposition (e.g. "tawa mi") can be a predicate on its own.

  3. A preposition should be attached to a predicate, not to a sentence.

  4. The only types of predicate are: NP, VP, Pred and And.

  5. Forbid AP as a predicate. VP can be a predicate only if it has an object.

  6. Update the "and" definition (both for verbs and nouns) to produce unique parses.

Drop A, N and V differences

After commit 3098ebd, the lists of nouns and adjectives (and, partially, verbs) are very similar. Consider whether it still makes sense to differentiate them.

parse "lipu soweli li pona ." gives:

UseCl (PredAP (AdjNP (UseN Lipu_N) (UseA Soweli_A)) (UseA Pona_A))
UseCl (PredAP (AdjnpNP (UseN Lipu_N) (UseN Soweli_N)) (UseA Pona_A))
UseCl (PredNP (AdjNP (UseN Lipu_N) (UseA Soweli_A)) (UseN Pona_N))
UseCl (PredNP (AdjnpNP (UseN Lipu_N) (UseN Soweli_N)) (UseN Pona_N))
...

For many examples, only the third parse is required.

UseCl (PredNP (AdjNP (UseN Lipu_N) (UseA Soweli_A)) (UseN Pona_N))
...

Drop A/N/V difference of words

The lists of nouns, adjectives and verbs are 99% the same. There is no need to multiply the definitions with the _A, _N or _V suffix.
