kowey / geni
Surface realiser (part of a Natural Language Generation system) using Tree Adjoining Grammar
License: GNU General Public License v2.0
The square brackets should either be optional, or mandatory but enforced. Right now they are silently accepted and ignored.
The point is to make it so that you can never build GConst []
This is just a TODO. Check the source code to see if GenI will output variables in feature structures to the third party morphological realiser.
Right now literate GenI only shows constants in the example.
Probably a wxHaskell issue :-(
Right now you have to cabal clean if you switch back and forth between the GUI and text geni.
That's stupid. The library should be independent of the compilation mode
Note the graphviz is licensed under the EPL (sigh!) so we would have to create an exception in our GPL to allow people to distribute the two as one program.
We need to replace our sorting strategy with something that works with proper data structures instead of strings.
Eva would also like to display the morph-realised outputs in the debugger or to somehow have a situation where you get the final output, you click on it and you get the tree.
Right now you can only do morphological realisation on the server side.
It would be nice if geniserver could somehow return feature structures so that clients can do morphological realisation themselves.
Right now we have our own custom format. Could making the format XML-based be useful? Actually, I have a prejudice against XML, but if we could use something more lightweight like JSON or YAML, maybe...
What on earth did I mean by this?
Should be (""), was ()
This affects geniconvert on the surface, but it's really a problem with the core printing code.
I don't understand what this is about :-(
Not just the arguments, but the relations themselves.
The morphological realisation (built-in) is dumb in that it unifies each pre-terminal node of the derived tree independently with the morphological lexicon.
This is not good, because it does not allow for mutually exclusive realisations:
he hold_s_ the apple vs you hold the apple
Right now, the workaround is to supply the necessary features via the input semantics (morphinfo file), but ideally you should be able to just make it work automatically.
Make it a required flag/input instead
This should reduce generation time somewhat. It occurred to me that there's actually a very simple way to index the generation chart: just use the semantic index or the category or even a tuple of the root node. For atomic disjunction and variables, just dump into a variable slot that we always have to look up.
Note that substitution would have to be changed so that items with open substitution sites go back at the end of the agenda instead of on the chart.
Also note that substitution sites with disjunctive or variable indices would just have to look at all chart items.
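A minimal sketch of the indexing idea, assuming chart items can be keyed by a single string taken from the root node (a semantic index or a category). The names `Key`, `Chart`, `insertItem` and `candidates` are hypothetical and not part of GenI; the agenda change for items with open substitution sites is not modelled here.

```haskell
import qualified Data.Map as Map

-- | Hypothetical key for a chart item's root node: either a known constant,
-- or something we cannot index on (a variable or an atomic disjunction).
data Key = Known String | Open
  deriving (Eq, Ord, Show)

-- | A chart bucketed by key. Items with an 'Open' key go into a catch-all
-- slot that every lookup has to consult.
data Chart a = Chart
  { buckets  :: Map.Map String [a]
  , catchAll :: [a]
  }

emptyChart :: Chart a
emptyChart = Chart Map.empty []

insertItem :: Key -> a -> Chart a -> Chart a
insertItem (Known k) x c = c { buckets = Map.insertWith (++) k [x] (buckets c) }
insertItem Open      x c = c { catchAll = x : catchAll c }

-- | Candidate items for a substitution site with the given index.
-- A site with a variable or disjunctive index has to look at all chart items.
candidates :: Key -> Chart a -> [a]
candidates (Known k) c = Map.findWithDefault [] k (buckets c) ++ catchAll c
candidates Open      c = concat (Map.elems (buckets c)) ++ catchAll c
```

With this layout a lookup for a known index touches only one bucket plus the catch-all slot, so the gain depends on how few items end up in the catch-all.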
Fixed by kicking it out and making it a standalone module.
Make it really a separate program
Luciana Benotti asked:
We know that in Geni you can select tree properties in the input
semantics by using square brackets next to a literal, for example,
runs(e,j)[Active]. Is there a way of indicating a disjunction of the
tree properties? Such as runs(e,j)[Active|Passive] in order to obtain
the active and passive realizations of runs(e,j).
The index accessibility filtering code is not being maintained.
It should just be removed to reduce cruft.
{{{
dist/build/geni/geni -m examples/chatnoir/macros -l examples/chatnoir/lexicon -s examples/chatnoir/suite --verbose --testcase="le_mechant_chat_noir_chasser_le_souris" --opts='pol' --rootfeat='cat:p'
Loading test suite examples/chatnoir/suite... 4 entries
Loading trees examples/chatnoir/macros... 11 entries
Loading lexicon examples/chatnoir/lexicon... 15 entries
Loading test suite examples/chatnoir/suite... 4 entries
Lexical items selected:
noir
chat
chasser
le
le
mechant
souris
Trees anchored (family) :
noir:adj_post
chat:nC
chasser:vArity2:rel1vn0
chasser:vArity2:rel0vn1
chasser:vArity2:qu1vn0
chasser:vArity2:qu0vn1
chasser:vArity2:n0vn1
chasser:vArity2:vinfn1
le:Det
le:Det
mechant:adj_pre
souris:nC
geni: [polarities] No instances of cat in [].
}}}
The morphinfo file creates a maintenance problem -- too many files to keep track of and update. A better solution would be something that lets you supply similar information within the input semantics (test suite).
So something like
feature(m,person:2)
in the input semantics
We don't have the resources to deal with multiple encodings, so let's just standardise on a Unicode-friendly one and hope for the best.
This may induce some pain though, because we have legacy resources that are in ISO-8859-1
(foo (x y)) will give the linear order y x...
See: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/HaLeX
This may affect the polarity automaton performance, but (a) probably not by much and (b) it would only be overhead. The really important thing is to be fanatical about keeping GenI minimalistic
Both unique and overall.
If you run GenI and just naively use the GUI to try and load a grammar or lexicon, nothing happens.
This was because we were using the old version of the state on reload.
The bigger problem is that my GUI code is a big pile of spaghetti :-(
I'll have to see what Carlos thinks about it.
The code
{{{
main =
 do print $ head $ unify left right
    print $ head $ unify right left
 where
  left  = map (GVar . show) [1..3]
  right = drop 1 left ++ [GConst ["X"]]
}}}
The output
{{{
([?1,?1,X],fromList [("1",X),("2",X),("3",X)])
([?2,?3,X],fromList [("1",X),("2",X),("3",X)])
}}}
I think the output should be
{{{
([X,X,X],fromList [("1",X),("2",X),("3",X)])
([X,X,X],fromList [("1",X),("2",X),("3",X)])
}}}
The fix is probably simple: just apply the resulting substitution to the unified list after unify
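A rough sketch of that post-processing step, using toy stand-ins rather than GenI's real types. `Val`, `Subst` and `replaceAll` are hypothetical names, and the sketch assumes the substitution is already fully resolved (every variable maps straight to its final value), as it is in the output shown above.

```haskell
import qualified Data.Map as Map

-- Toy stand-ins for GenI's values; the real type is richer
-- (for example, constants there carry a list of alternatives).
data Val = Var String | Const String
  deriving (Eq)

instance Show Val where
  show (Var v)   = '?' : v
  show (Const c) = c

type Subst = Map.Map String Val

-- | Replace a variable by its binding, if it has one.
-- This is a single lookup: it assumes bindings are already fully resolved.
replaceOne :: Subst -> Val -> Val
replaceOne s v@(Var x) = Map.findWithDefault v x s
replaceOne _ c         = c

-- | Apply a substitution to every value of a unification result.
replaceAll :: Subst -> [Val] -> [Val]
replaceAll s = map (replaceOne s)
```

Applied to the two results reported above, `replaceAll` with the returned substitution turns both `[?1,?1,X]` and `[?2,?3,X]` into `[X,X,X]`.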
I could have sworn I kept seeing something like this happen.
See geni -m dist/build/grammar/valuation-sem.geni -l dist/build/lexicon/lemmas.glex -s suite/verbs --testcase=t110 and skip 140 steps
Now that we have geni --version, this should be easy enough...
MacOSX wxWidgets 2.8.10, wxHaskell 0.11.x
No idea why this happens.
The lexicon and test suite have no business using a custom ad-hoc format. Now that I know a perfectly acceptable lightweight standard language exists for this sort of thing (JSON), I should just use it to make life simpler for everybody.
Backward compatibility would be nice. I guess we'll have to support both, maybe even extend geniconvert?
Macros file probably unchanged. No real gain from JSON-ifying that.
Feature structures may be tricky. We'll have to think about this a bit
The bracketed output is a compromise between a full parse tree and a plain string.
Parse tree:
{{{
S(NP(somebody),VP(VP(V(saw),NP(something)),PP(somewhere)))
}}}
Bracketed output:
{{{
somebody ((saw something) somewhere)
}}}
Notice that we try very hard to avoid excess parentheses. The point is to make it easier for grammar hackers to understand, for example, why we get different instances of the same output, so we want to keep things as readable as possible.
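One possible bracketing rule that reproduces the example above, as a sketch only: the rule is inferred from that single example and is not taken from GenI's code. The `Tree` type and the function names are hypothetical. Leaves print as words, unary nodes add no brackets, branching nodes are wrapped in parentheses, and the outermost pair is dropped.

```haskell
-- | A derived tree: a label plus children. A node without children is a word.
data Tree = Node String [Tree]

-- | Render a subtree, bracketing every branching node.
inner :: Tree -> String
inner (Node w [])  = w                  -- a word
inner (Node _ [t]) = inner t            -- unary node: no new brackets
inner (Node _ ts)  = "(" ++ unwords (map inner ts) ++ ")"

-- | Render a whole tree: same as 'inner', but without the outermost brackets.
bracketed :: Tree -> String
bracketed (Node w [])  = w
bracketed (Node _ [t]) = bracketed t
bracketed (Node _ ts)  = unwords (map inner ts)

-- | The parse tree from the example above.
example :: Tree
example =
  Node "S"
    [ Node "NP" [Node "somebody" []]
    , Node "VP"
        [ Node "VP" [ Node "V" [Node "saw" []], Node "NP" [Node "something" []] ]
        , Node "PP" [Node "somewhere" []]
        ]
    ]
```

Here `bracketed example` gives `somebody ((saw something) somewhere)`.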
Yikes!
geni: Bug in GenI!
prependToSubst: Eric broke unification. Prepending K-t-73 twice
geni -m dist/build/grammar/valuation-sem.geni -l dist/build/lexicon/lemmas.glex -s suite/verbs --testcase=t110 --batchdir=/tmp/b
See also #43
Right now in Leopard, I have to run geni with open geni.app, which means no command line arguments, ARGH!
At least some sort of workaround would be good
This should make it easier to figure out what GenI is doing and why.
The idea is for the SFG based system to generate inputs to GenI
The "optimisation" is now baked in (and probably always was)
And not just in configuration window...
My idea for how this should work is that it be another field in the input semantics area. I think this would take some refactoring, some kind of function that goes from config to the input semantics area and back.
OK, it doesn't have to be a monad. But I want to have some sort of abstraction that guarantees that when I do unification on something, the results from previous unification will be automatically propagated to that thing. Seems like it should be fairly straightforward. You could just model this as a state monad for example, and have the unification function get/put the substitutions state.
What may be annoying is having to write a monad transformer and slip it into our current MT stack.
The goal is to have something that makes our code easier to write, and less error-prone, while also staying cheap (we shouldn't be doing any needless traversals).
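A small sketch of what such an abstraction might look like, with a hand-rolled state-plus-failure monad so that it depends only on base and containers. All names here (`Val`, `Unify`, `walk`, `unifyVal`, `unifyList`) are hypothetical; the real code would work over GenI's own value type and would probably be a transformer in the existing stack.

```haskell
import qualified Data.Map as Map

-- Toy values: variables and atomic constants only.
data Val = Var String | Const String
  deriving (Eq, Show)

type Subst = Map.Map String Val

-- | State monad over the substitution, with failure.
newtype Unify a = Unify { runUnify :: Subst -> Maybe (a, Subst) }

instance Functor Unify where
  fmap f (Unify g) = Unify $ \s -> fmap (\(a, s') -> (f a, s')) (g s)

instance Applicative Unify where
  pure a = Unify $ \s -> Just (a, s)
  Unify f <*> Unify g = Unify $ \s -> case f s of
    Nothing      -> Nothing
    Just (h, s') -> fmap (\(a, s'') -> (h a, s'')) (g s')

instance Monad Unify where
  Unify g >>= k = Unify $ \s -> case g s of
    Nothing      -> Nothing
    Just (a, s') -> runUnify (k a) s'

-- | Resolve a value under the bindings accumulated so far.
walk :: Subst -> Val -> Val
walk s v@(Var x) = maybe v (walk s) (Map.lookup x s)
walk _ c         = c

-- | Unify two values. Both sides are resolved against the current state
-- first, so earlier results are propagated without the caller doing anything.
unifyVal :: Val -> Val -> Unify Val
unifyVal a b = Unify $ \s ->
  case (walk s a, walk s b) of
    (x, y) | x == y -> Just (x, s)
    (Var v, y)      -> Just (y, Map.insert v y s)
    (x, Var v)      -> Just (x, Map.insert v x s)
    _               -> Nothing          -- two different constants

-- | Unify two lists pairwise (extra elements of the longer list are ignored),
-- then resolve the results once against the final substitution.
unifyList :: [Val] -> [Val] -> Maybe ([Val], Subst)
unifyList xs ys = do
  (rs, s) <- runUnify (mapM (uncurry unifyVal) (zip xs ys)) Map.empty
  pure (map (walk s) rs, s)
```

The only traversal beyond the unification itself is the final `map (walk s)`, which is one pass over the result list; whether that counts as cheap enough is a judgment call for the real implementation.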
This could be a form of packing where we try to store adjunctions into the "same" node as a set (that can happen in any order).
Would only be really useful if multi-adjunction happens a lot
Hmm, now I have a better idea about the argument for separating test logic from business logic.