kowey / geni

Surface realiser (part of a Natural Language Generation system) using Tree Adjoining Grammar

License: GNU General Public License v2.0

Makefile 1.07% Haskell 91.73% Shell 1.46% Perl 2.24% XSLT 2.75% CSS 0.15% C 0.60%

geni's People

gabriella439 silky

geni's Issues

document basic flags in manual

geni accepts malformed rootfeat

The square brackets should either be optional, or mandatory but enforced. Right now they are silently accepted and ignored.
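A minimal sketch of the "optional but enforced" variant. It assumes a root feature is a whitespace-separated list of attribute:value pairs (as in --rootfeat='cat:p'). The names parseRootFeat, stripBrackets and parsePair are hypothetical and are not GenI's actual parser.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Hypothetical rootfeat parser: square brackets are optional, but an
-- unbalanced bracket is an error instead of being silently ignored.
parseRootFeat :: String -> Either String [(String, String)]
parseRootFeat raw = do
  body <- stripBrackets (trim raw)
  mapM parsePair (words body)
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Remove one matching pair of outer brackets, if present.
stripBrackets :: String -> Either String String
stripBrackets s
  | opens && closes = Right (init (drop 1 s))
  | opens || closes = Left ("unbalanced brackets in rootfeat: " ++ s)
  | otherwise       = Right s
  where
    opens  = take 1 s == "["
    closes = take 1 (reverse s) == "]"

-- | Parse a single attribute:value pair, rejecting stray brackets.
parsePair :: String -> Either String (String, String)
parsePair tok = case break (== ':') tok of
  (attr, ':' : val)
    | not (null attr) && not (null val) && all (`notElem` "[]:") (attr ++ val)
        -> Right (attr, val)
  _ -> Left ("malformed feature in rootfeat: " ++ tok)
```

With this, "cat:p" and "[cat:p]" parse the same way, while "[cat:p" is reported as an error rather than quietly accepted.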

hide GenIVal constructor

The point is to make it impossible to ever build GConst [].
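One way to get this is the usual Haskell smart-constructor pattern: export the type abstractly and expose only a checked constructor. This is a sketch. The GeniVal definition below is a simplified stand-in (GenI's real type may differ), and mkGConst, mkGConstNE and mkGVar are hypothetical names.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Simplified stand-in for GenI's value type (assumed, not the real one).
-- In a real module, the export list would name the type but not its
-- constructors, e.g.  module GeniVal (GeniVal, mkGConst, mkGVar) where
data GeniVal
  = GConst [String] -- atomic disjunction of constants; never empty
  | GVar String
  deriving (Eq, Show)

-- | Checked constructor: returns Nothing instead of building GConst [].
mkGConst :: [String] -> Maybe GeniVal
mkGConst [] = Nothing
mkGConst xs = Just (GConst xs)

-- | Alternative that rules out the empty case in the type itself.
mkGConstNE :: NonEmpty String -> GeniVal
mkGConstNE = GConst . NE.toList

mkGVar :: String -> GeniVal
mkGVar = GVar
```

The NonEmpty variant moves the check to compile time, at the cost of callers having to build a NonEmpty first.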

jsonify stupidmorph and tomorph

output to 3rd party morphological generator should include variables

This is just a TODO. Check the source code to see whether GenI outputs variables in feature structures to the third-party morphological realiser.

Right now literate GenI only shows constants in the example.

scrollbar broken in results tab (summary) on MacOS X

Probably a wxHaskell issue :-(

no conditional compilation in geni library!

Right now you have to run cabal clean if you switch back and forth between the GUI and text-mode geni.
That's stupid. The library should be independent of the compilation mode.

outsource to graphviz library

Note that graphviz is licensed under the EPL (sigh!), so we would have to add an exception to our GPL to allow people to distribute the two as one program.

mismatch between order of items in realisations summary tab and realisations debugger tab

We need to replace our sorting strategy with something that works with proper data structures instead of strings.
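A sketch of what sorting on proper data structures could look like: both tabs call one shared comparison defined on a structured record, instead of each sorting its own rendered strings. The Realisation record and its fields are assumptions for illustration, not GenI's types.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- | Simplified, hypothetical realisation record.
data Realisation = Realisation
  { surfaceWords :: [String] -- realised words, before any joining
  , derivationId :: Int      -- which derivation produced it
  } deriving (Eq, Show)

-- | One shared ordering for every view: compare the word lists first,
-- then break ties on the derivation id, so duplicate strings coming
-- from different derivations always appear in the same order.
realisationOrder :: Realisation -> Realisation -> Ordering
realisationOrder = comparing (\r -> (surfaceWords r, derivationId r))

sortRealisations :: [Realisation] -> [Realisation]
sortRealisations = sortBy realisationOrder
```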

Eva would also like the morph-realised outputs to be displayed in the debugger, or some way to click on a final output and get its tree.

graphviz gets confused by multi-word lexical items

geniserver should be able to output morph features

Right now you can only do morphological realisation on the server side.

It would be nice if geniserver could somehow return feature structures so that clients can do morphological realisation themselves.

RND: what should a language for geniserver and client look like?

Right now we have our own custom format. Could an XML-based format be useful? Actually, I have a prejudice against XML, but if we could use something lighter-weight like JSON or YAML, maybe...

polarity precompilation [old tracker]

What on earth did I mean by this?

empty strings should be quoted when producing geni output

Should be (""), was ()

This affects geniconvert on the surface, but it's really a problem with the core printing code.
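A sketch of the printing rule, assuming a token needs quotes when it is empty or contains whitespace, parentheses, brackets, double quotes or backslashes. renderToken and renderArgs are hypothetical helpers, and the exact set of special characters in GenI's format is an assumption.

```haskell
-- | Hypothetical token printer: quote a token when it is empty or
-- contains characters that would otherwise be misread on re-parsing.
renderToken :: String -> String
renderToken s
  | needsQuotes = "\"" ++ concatMap escape s ++ "\""
  | otherwise   = s
  where
    needsQuotes = null s || any (`elem` " \t\n()[]\"\\") s
    escape c
      | c `elem` "\"\\" = ['\\', c]
      | otherwise       = [c]

-- | Print an argument list, e.g. ["foo", ""] becomes (foo "").
renderArgs :: [String] -> String
renderArgs xs = "(" ++ unwords (map renderToken xs) ++ ")"
```

Doing this in the shared printing code, rather than in geniconvert, means every tool that prints values gets ("") instead of ().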

lexical selection factorisation [old todo list]

I don't understand what this is about :-(

support subsumption with unification variables in relations

Not just the arguments, but the relations themselves.

percolate features during morphological realisation

The built-in morphological realiser is dumb in that it unifies each pre-terminal node of the derived tree independently with the morphological lexicon.

This is not good, because it does not allow for mutually exclusive realisations:
he hold_s_ the apple vs you hold the apple

Right now, the workaround is to supply the necessary features via the input semantics (morphinfo file), but ideally you should be able to just make it work automatically.

This should reduce generation time somewhat. It occurred to me that there's actually a very simple way to index the generation chart: just key items on the semantic index or the category of their root node, or even on a tuple of the two. For atomic disjunctions and variables, just dump the item into a catch-all variable slot that we always have to look up.

Note that substitution would have to be changed so that items with open substitution sites go back to the end of the agenda instead of onto the chart.

Also note that substitution sites with disjunctive or variable indices would just have to look at all chart items.
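A minimal sketch of the indexing idea, keyed on the semantic index only. The Item and Index types are simplified assumptions, not GenI's chart items; a variable or atomic disjunction index is collapsed into a single Wild slot, as described above.

```haskell
import qualified Data.Map.Strict as Map

-- | Semantic index of an item's root node (simplified). Wild stands for
-- a variable or an atomic disjunction.
data Index = Const String | Wild
  deriving (Eq, Ord, Show)

data Item = Item { itemName :: String, itemIndex :: Index }
  deriving (Eq, Show)

newtype Chart = Chart (Map.Map Index [Item])

emptyChart :: Chart
emptyChart = Chart Map.empty

addItem :: Item -> Chart -> Chart
addItem it (Chart m) = Chart (Map.insertWith (++) (itemIndex it) [it] m)

-- | Candidates for a substitution site with the given index. A constant
-- index only needs its own bucket plus the Wild slot; a Wild site has
-- to look at every chart item.
candidates :: Index -> Chart -> [Item]
candidates Wild (Chart m) = concat (Map.elems m)
candidates k (Chart m) =
  Map.findWithDefault [] k m ++ Map.findWithDefault [] Wild m
```

Keying on category, or on an (index, category) tuple, would only change the Index type; the lookup logic stays the same.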

clarify use of the internal morphological generator

Fixed by kicking it out and making it a standalone module.

break geni morphological realiser out of the geni executable

Make it a truly separate program.

don't use root feature filtering if it's not specified

feeding non-existent test case to GenI causes it to say bug in GenI

disjunctions of paraphrase selectors

Luciana Benotti asked:

We know that in Geni you can select tree properties in the input
semantics by using square brackets next to a literal, for example,
runs(e,j)[Active]. Is there a way of indicating a disjunction of the
tree properties? Such as runs(e,j)[Active|Passive] in order to obtain
the active and passive realizations of runs(e,j).
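If this were supported, one possible semantics is sketched below: the bracketed selector is a whitespace-separated conjunction, and each conjunct may be a |-separated disjunction of tree properties. This syntax and the names parseSelector, splitOn and selects are assumptions, not existing GenI behaviour.

```haskell
-- | A selector is a conjunction of disjunctions, e.g. "Active|Passive"
-- parses to [["Active","Passive"]].
type Selector = [[String]]

parseSelector :: String -> Selector
parseSelector = map (splitOn '|') . words

splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | A tree is selected if, for every conjunct, it has at least one of
-- the alternative properties.
selects :: Selector -> [String] -> Bool
selects sel props = all (any (`elem` props)) sel
```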

remove IAF code

The index accessibility filtering code is not being maintained.
It should just be removed to reduce cruft.

possible polarity filtering bug

{{{
dist/build/geni/geni -m examples/chatnoir/macros -l examples/chatnoir/lexicon -s examples/chatnoir/suite --verbose --testcase="le_mechant_chat_noir_chasser_le_souris" --opts='pol' --rootfeat='cat:p'
Loading test suite examples/chatnoir/suite... 4 entries
Loading trees examples/chatnoir/macros... 11 entries
Loading lexicon examples/chatnoir/lexicon... 15 entries
Loading test suite examples/chatnoir/suite... 4 entries
Lexical items selected:
noir
chat
chasser
le
le
mechant
souris

Trees anchored (family) :
noir:adj_post
chat:nC
chasser:vArity2:rel1vn0
chasser:vArity2:rel0vn1
chasser:vArity2:qu1vn0
chasser:vArity2:qu0vn1
chasser:vArity2:n0vn1
chasser:vArity2:vinfn1
le:Det
le:Det
mechant:adj_pre
souris:nC

geni: [polarities] No instances of cat in [].
}}}

alternative syntax for anchor (or other) features? generalisation of morphinfo

The morphinfo file creates a maintenance problem -- too many files to keep track of and update. A better solution would be something that lets you supply similar information within the input semantics (test suite).

So something like
{{{
feature(m,person:2)
}}}
in the input semantics.
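A sketch of how such literals could be split off from the rest of the input semantics and grouped by the index they apply to. Literals are treated as plain strings for simplicity, and parseFeatureLit, splitOnce, dropClosingParen and splitFeatureLits are hypothetical names.

```haskell
import Data.List (stripPrefix)
import qualified Data.Map.Strict as Map

-- | Hypothetical parser for a literal such as "feature(m,person:2)":
-- returns the semantic index and the attribute/value pair.
parseFeatureLit :: String -> Maybe (String, (String, String))
parseFeatureLit lit = do
  rest <- stripPrefix "feature(" lit
  inner <- dropClosingParen rest
  (idx, av) <- splitOnce ',' inner
  (attr, val) <- splitOnce ':' av
  if any null [idx, attr, val] then Nothing else Just (idx, (attr, val))

dropClosingParen :: String -> Maybe String
dropClosingParen r
  | not (null r) && last r == ')' = Just (init r)
  | otherwise                     = Nothing

-- | Split at the first occurrence of a separator, if there is one.
splitOnce :: Char -> String -> Maybe (String, String)
splitOnce sep s = case break (== sep) s of
  (before, _ : after) -> Just (before, after)
  _                   -> Nothing

-- | Separate feature literals from ordinary semantic literals, grouping
-- the features by the index they should be attached to.
splitFeatureLits :: [String] -> (Map.Map String [(String, String)], [String])
splitFeatureLits = foldr step (Map.empty, [])
  where
    step lit (feats, others) = case parseFeatureLit lit of
      Just (idx, av) -> (Map.insertWith (++) idx [av] feats, others)
      Nothing        -> (feats, lit : others)
```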

support and require UTF-8 input/output

We don't have the resources to deal with multiple encodings, so let's just standardise on a Unicode-friendly one and hope for the best.

This may induce some pain, though, because we have legacy resources that are in ISO-8859-1.

results out of order when pre-terminals have more than one terminal

(foo (x y)) will give the linear order y x...
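For reference, a sketch of a correct left-to-right yield next to one common way a reversal creeps in (consing onto an accumulator). The DTree type is an assumption, and we don't know that the accumulator version is what GenI actually does.

```haskell
-- | Minimal derived-tree type (assumed for illustration).
data DTree = Node String [DTree] | Leaf String
  deriving Show

-- | Left-to-right yield: each child's words are appended in order,
-- so (foo (x y)) yields ["x","y"].
yield :: DTree -> [String]
yield (Leaf w)      = [w]
yield (Node _ kids) = concatMap yield kids

-- | Buggy accumulator version, for contrast: it visits children left to
-- right but conses each word onto the front, so (foo (x y)) yields
-- ["y","x"].
yieldReversed :: DTree -> [String]
yieldReversed t = go t []
  where
    go (Leaf w) acc      = w : acc
    go (Node _ kids) acc = foldl (flip go) acc kids
```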

replace automaton code with HaLeX?

See: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/HaLeX

This may affect the performance of the polarity automaton, but (a) probably not by much and (b) it would only be overhead. The really important thing is to be fanatical about keeping GenI minimalistic.

display number of results found in GUI and console mode...

Both unique and overall.

clicking 'load' in config gui has no effect

If you run GenI and just naively use the GUI to try and load a grammar or lexicon, nothing happens.

This was because we were using the old version of the state on reload.

The bigger problem is that my GUI code is a big pile of spaghetti :-(

consider BSD license for GenI

I'll have to see what Carlos thinks about it.

unification bug

The code
{{{
-- unify [?1,?2,?3] with [?2,?3,X], in both orders
main =
  do print $ head $ unify left right
     print $ head $ unify right left
  where
    left  = map (GVar . show) [1..3]
    right = drop 1 left ++ [GConst ["X"]]
}}}

The output
{{{
([?1,?1,X],fromList [("1",X),("2",X),("3",X)])
([?2,?3,X],fromList [("1",X),("2",X),("3",X)])
}}}

I think the output should be:
{{{
([X,X,X],fromList [("1",X),("2",X),("3",X)])
([X,X,X],fromList [("1",X),("2",X),("3",X)])
}}}

The fix is probably simple: just run replace on the result after unify.
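A self-contained model of the suggested fix: unify pairwise while accumulating a substitution, then apply the final substitution to the result (and to the substitution itself). This is a simplified stand-in, not GenI's unify: constants are only compared for equality (atomic disjunction is ignored), and walk, unifyVal and unifyAll are hypothetical names.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- Simplified stand-in for GenI's values (assumed).
data GeniVal = GVar String | GConst [String]
  deriving (Eq, Show)

type Subst = Map.Map String GeniVal

-- | Follow variable bindings until reaching a constant or a free variable.
walk :: Subst -> GeniVal -> GeniVal
walk s v@(GVar x) = maybe v (walk s) (Map.lookup x s)
walk _ v          = v

unifyVal :: Subst -> GeniVal -> GeniVal -> Maybe Subst
unifyVal s a b = case (walk s a, walk s b) of
  (GVar x, GVar y) | x == y -> Just s
  (GVar x, v)               -> Just (Map.insert x v s)
  (v, GVar y)               -> Just (Map.insert y v s)
  (c1, c2)
    | c1 == c2  -> Just s
    | otherwise -> Nothing

-- | Unify two lists pairwise, then apply the final substitution to the
-- result and to the substitution itself ("replace after unify").
unifyAll :: [GeniVal] -> [GeniVal] -> Maybe ([GeniVal], Subst)
unifyAll xs ys
  | length xs /= length ys = Nothing
  | otherwise = do
      s <- foldM (\acc (a, b) -> unifyVal acc a b) Map.empty (zip xs ys)
      return (map (walk s) xs, Map.map (walk s) s)
```

In both argument orders, the test case from this issue comes out as [X,X,X] with every variable bound to X, because the result is resolved against the complete substitution rather than the partial one in effect when each position was unified.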

listing of items seems incorrect in debugger

I could have sworn I kept seeing something like this happen.

See geni -m dist/build/grammar/valuation-sem.geni -l dist/build/lexicon/lemmas.glex -s suite/verbs --testcase=t110 and skip 140 steps

include GenI version in manual

Now that we have geni --version, this should be easy enough...

scrollbar not working in realisations summary

MacOSX wxWidgets 2.8.10, wxHaskell 0.11.x

geni: Prelude.foldr1: empty list

No idea why this happens.

JSON format where possible

The lexicon and test suite have no business using a custom ad-hoc format. Now that I know a perfectly acceptable lightweight standard language exists for this kind of thing (JSON), I should just use it to make life simpler for everybody.

Backward compatibility would be nice. I guess we'll have to support both, maybe even extend geniconvert?

The macros file will probably stay unchanged; there's no real gain from JSON-ifying it.
Feature structures may be tricky. We'll have to think about this a bit.

bracketed output (new command line argument)

The bracketed output is a compromise between a full parse tree and a plain string.

Parse tree:
{{{
S(NP(somebody),VP(VP(V(saw),NP(something)),PP(somewhere)))
}}}

Bracketed output:
{{{
somebody ((saw something) somewhere)
}}}

Notice that we try very hard to avoid excess parentheses. The point is to make it easier for grammar hackers to understand, for example, why we get different instances of the same output, so we want to keep things as readable as possible.
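One rendering rule that reproduces the example above: collapse unary chains to their single word, and parenthesise only constituents with two or more children, never the root itself. This rule is inferred from the single example in this issue, and the Tree type and bracketed function are assumptions for illustration.

```haskell
-- | Minimal parse-tree type (assumed for illustration).
data Tree = Node String [Tree] | Leaf String

-- | Bracketed output: unary chains collapse, and only constituents with
-- two or more children get parentheses. The root is never wrapped.
bracketed :: Tree -> String
bracketed (Node _ [k]) = bracketed k
bracketed (Node _ ks)  = unwords (map inner ks)
bracketed (Leaf w)     = w

inner :: Tree -> String
inner (Leaf w)     = w
inner (Node _ [k]) = inner k
inner (Node _ ks)  = "(" ++ unwords (map inner ks) ++ ")"

-- | S(NP(somebody),VP(VP(V(saw),NP(something)),PP(somewhere)))
exampleTree :: Tree
exampleTree =
  Node "S"
    [ Node "NP" [Leaf "somebody"]
    , Node "VP"
        [ Node "VP" [Node "V" [Leaf "saw"], Node "NP" [Leaf "something"]]
        , Node "PP" [Leaf "somewhere"]
        ]
    ]
```

Here bracketed exampleTree gives somebody ((saw something) somewhere).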

prependToSubst: Eric broke unification. Prepending K-t-73 twice

Yikes!

geni: Bug in GenI!
prependToSubst: Eric broke unification. Prepending K-t-73 twice

geni -m dist/build/grammar/valuation-sem.geni -l dist/build/lexicon/lemmas.glex -s suite/verbs --testcase=t110 --batchdir=/tmp/b

elementary tree semantics should be unification of tree schema and lemma semantics

command line arguments in MacOS X

Right now in Leopard, I have to run geni with open geni.app, which means no command line arguments, ARGH!

At least some sort of workaround would be good

separate trash pile for last operation performed

This should make it easier to figure out what GenI is doing and why.

RND: plugging GenI into a systemic grammar based generator [old tracker]

The idea is for the SFG-based system to generate inputs to GenI.

remove now defunct f-root optimisation flag

The "optimisation" is now baked in (and probably always was).

root feature in main window

And not just in configuration window...

My idea for how this should work is that it be another field in the input semantics area. I think this would take some refactoring, some kind of function that goes from config to the input semantics area and back.

unification environment

OK, it doesn't have to be a monad. But I want to have some sort of abstraction that guarantees that when I do unification on something, the results from previous unification will be automatically propagated to that thing. Seems like it should be fairly straightforward. You could just model this as a state monad for example, and have the unification function get/put the substitutions state.

What may be annoying is having to write a monad transformer and slip it into our current MT stack.

The goal is to have something that makes our code easier to write, and less error-prone, while also staying cheap (we shouldn't be doing any needless traversals).
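A minimal sketch of the state-monad version mentioned above, using State from the transformers package. The Val type and the names resolve, unify and demo are simplified assumptions, not GenI's code. Bindings are resolved lazily on lookup, so applying earlier results costs no extra traversal of the structures being unified.

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, get, put)
import qualified Data.Map.Strict as Map

data Val = Var String | Const String
  deriving (Eq, Show)

type Subst = Map.Map String Val

-- | The environment: unification reads and extends the substitution
-- held in the state, so later calls automatically see earlier bindings.
type Unify a = State Subst a

-- | Follow bindings in the current substitution.
resolve :: Val -> Unify Val
resolve v@(Var x) = do
  s <- get
  case Map.lookup x s of
    Just v' -> resolve v'
    Nothing -> return v
resolve v = return v

-- | Unify two values, recording any new binding in the state.
-- On failure the state is left unchanged and False is returned.
unify :: Val -> Val -> Unify Bool
unify a b = do
  a' <- resolve a
  b' <- resolve b
  s <- get
  case (a', b') of
    _ | a' == b' -> return True
    (Var x, v)   -> put (Map.insert x v s) >> return True
    (v, Var y)   -> put (Map.insert y v s) >> return True
    _            -> return False

-- | Two separate unifications: the second sees the binding made by the
-- first without any manual substitution-passing.
demo :: (Bool, Bool, Val)
demo = evalState go Map.empty
  where
    go = do
      ok1 <- unify (Var "x") (Const "cat")
      ok2 <- unify (Var "y") (Var "x")
      v <- resolve (Var "y")
      return (ok1, ok2, v)
```

Slipping this into an existing transformer stack would mean using StateT Subst over the current monad instead of plain State.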

keep track of sets of adjunctions into the same node [old tracker]

This could be a form of packing where we try to store adjunctions into the "same" node as a set (that can happen in any order).

This would only really be useful if multi-adjunction happens a lot.

move NLP.GenI.Console regression testing code to test module

Hmm, now I have a better idea about the argument for separating test logic from business logic.