antske / coref_draft
License: Apache License 2.0
In get_mentions, all mentions detected using get_relevant_head_ids are put in the OrderedDict first, and only then are the mentions derived from named entities added. As a result, the mentions are not in order of appearance.
See apply_proper_head_word_match.
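One possible fix is to sort once after both sources have been collected. The sketch below assumes that every mention exposes the offset of its first token; the Mention class and its begin_offset field are hypothetical stand-ins, not the project's actual API.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Mention:
    # Hypothetical stand-in for the project's mention object.
    id: str
    begin_offset: int


def order_by_appearance(mentions):
    """Return a new OrderedDict whose entries are sorted by begin_offset."""
    return OrderedDict(
        sorted(mentions.items(), key=lambda item: item[1].begin_offset)
    )


# Mentions found via head ids are added first, named entities afterwards.
mentions = OrderedDict()
mentions["m1"] = Mention("m1", begin_offset=40)
mentions["m2"] = Mention("m2", begin_offset=95)
mentions["ne1"] = Mention("ne1", begin_offset=10)

ordered = order_by_appearance(mentions)
print(list(ordered))  # ['ne1', 'm1', 'm2']
```

Sorting once at the end keeps the two collection steps unchanged, at the cost of one extra pass over the mentions.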
The intended algorithm for speaker_identification is quoted below from Lee et al. (2013), with check marks indicating whether the rules are actually implemented:
- <I>s assigned to the same speaker are coreferent.
- <you>s with the same speaker are coreferent.
- The speaker and <I>s in her text are coreferent.
(...)
- The speaker and a mention which is not <I> in the speaker's utterance cannot be coreferent.
- Two <I>s (or two <you>s, or two <we>s) assigned to different speakers cannot be coreferent.
- Two different person pronouns by the same speaker cannot be coreferent.
- Nominal mentions cannot be coreferent with <I>, <you>, or <we> in the same turn or quotation.
- In conversations, <you> can corefer only with the previous speaker.
(...)
We define <I> as I, my, me, or mine, <we> as first person
plural pronouns, and <you> as second person pronouns.
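As an illustration of how such rules can be encoded, here is a minimal sketch that covers only the two <I> rules (same speaker: coreferent; different speakers: cannot be coreferent). The Mention class, its speaker field and the speaker_verdict function are hypothetical names and do not reflect the existing speaker_identification code.

```python
from dataclasses import dataclass
from typing import Optional

# <I> as defined by Lee et al.: I, my, me, or mine.
FIRST_PERSON_SINGULAR = {"i", "my", "me", "mine"}


@dataclass
class Mention:
    # Hypothetical stand-in for the project's mention object.
    text: str
    speaker: Optional[str]  # id of the speaker of the enclosing utterance


def is_I(mention):
    return mention.text.lower() in FIRST_PERSON_SINGULAR


def speaker_verdict(a, b):
    """Return True (corefer), False (cannot corefer) or None (no decision)."""
    if is_I(a) and is_I(b) and a.speaker is not None and b.speaker is not None:
        # Positive rule: <I>s assigned to the same speaker are coreferent.
        # Negative rule: two <I>s assigned to different speakers cannot be.
        return a.speaker == b.speaker
    return None


print(speaker_verdict(Mention("I", "s1"), Mention("me", "s1")))          # True
print(speaker_verdict(Mention("I", "s1"), Mention("my", "s2")))          # False
print(speaker_verdict(Mention("the report", "s1"), Mention("it", "s1")))  # None
```

Returning None for undecided pairs leaves room for the remaining rules, or for later sieves, to make the decision.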
Apart from this, the current implementation makes every 3rd person pronoun coreferent with the "topic" of the quote, which can't be right if the pronouns have different markers, like gender.
The code uses globals in a number of places, making it unsafe to use in a threaded environment.
Best solution is probably to refactor the globals into instance variables on either a new class or on the naf object, and maybe they can even be moved to KafNafParserPy as they look pretty generic (?).
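A minimal sketch of that refactoring, assuming the shared data is something like a mapping from token ids to surface strings: the CorefState class and its id2string attribute are hypothetical and only illustrate the idea that each document gets its own state object.

```python
from threading import Thread


class CorefState:
    """Hypothetical per-document container for data that is now global."""

    def __init__(self, doc_id):
        self.doc_id = doc_id
        self.id2string = {}  # assumed example of formerly global data

    def load(self, tokens):
        for token_id, text in tokens:
            self.id2string[token_id] = text

    def surface(self, token_ids):
        return " ".join(self.id2string[t] for t in token_ids)


results = {}


def run(doc_id, tokens):
    state = CorefState(doc_id)  # each thread owns its own state
    state.load(tokens)
    results[doc_id] = state.surface([token_id for token_id, _ in tokens])


docs = {
    "a": [("w1", "Anna"), ("w2", "sleeps")],
    "b": [("w1", "Bob"), ("w2", "runs")],
}
threads = [Thread(target=run, args=item) for item in docs.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results["a"], "|", results["b"])  # Anna sleeps | Bob runs
```

Both documents reuse the token ids w1 and w2, which would collide in a shared global dictionary but cannot interfere here.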
Simple test program:
    import sys
    import random
    from threading import Thread

    from multisieve_coreference import process_coreference
    from KafNafParserPy import KafNafParser

    def run():
        while True:  # loop forever so that the threads keep overlapping
            f = random.choice(fns)
            nafin = KafNafParser(open(f))
            nafin = process_coreference(nafin)

    fns = sys.argv[1:]
    for i in range(10):
        Thread(target=run).start()
Results in:
    $ env/bin/python test.py /tmp/test*.naf
    Exception in thread Thread-9:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
        self.run()
      File "/usr/lib/python3.5/threading.py", line 862, in run
        self._target(*self._args, **self._kwargs)
      File "test.py", line 12, in run
        nafin = process_coreference(nafin)
      File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 701, in process_coreference
        coref_classes, mentions = resolve_coreference(nafin)
      File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 658, in resolve_coreference
        match_full_name_overlap(mentions, coref_classes)
      File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 37, in match_full_name_overlap
        mention_string = get_string_from_ids(mention.get_span())
      File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 20, in get_string_from_ids
        surface_string += token_string + ' '
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
ConstituencyTrees is actually a collection of dependency trees.
Nothing checks whether the created ID is already in use.
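The issue does not say which IDs are meant, so the following is only a generic sketch of such a check. The fresh_id helper and the "co" prefix are assumptions made for illustration.

```python
from itertools import count


def fresh_id(used_ids, prefix="co"):
    """Return the first prefix+number id not in used_ids and record it."""
    for n in count(1):
        candidate = f"{prefix}{n}"
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate


used = {"co1", "co2", "co4"}
print(fresh_id(used))  # co3
print(fresh_id(used))  # co5
```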
Coreferent chains with contradicting semantic values ('he' and 'she' references) are merged based on less significant evidence.
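One way to guard against this is to check attribute compatibility before any merge. The sketch below assumes each chain can be summarised as a dictionary from attribute name to the set of values seen in its mentions; this representation and the function names are hypothetical.

```python
def compatible(values_a, values_b):
    """Two value sets clash only if both are known and share nothing."""
    return not values_a or not values_b or bool(values_a & values_b)


def can_merge(chain_a, chain_b, attributes=("gender", "number")):
    """Check every attribute; a chain maps an attribute to a set of values."""
    return all(
        compatible(chain_a.get(attr, set()), chain_b.get(attr, set()))
        for attr in attributes
    )


he_chain = {"gender": {"male"}, "number": {"singular"}}
she_chain = {"gender": {"female"}, "number": {"singular"}}
unknown_chain = {"number": {"singular"}}

print(can_merge(he_chain, she_chain))      # False
print(can_merge(he_chain, unknown_chain))  # True
```

Treating a missing attribute as compatible with everything keeps the check conservative: it blocks only merges with an explicit contradiction.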
main_modifiers is used only in sieves 5, 6 and 7, for the "Compatible modifiers only" constraint. Lee et al.:
Compatible modifiers only - the mention's modifiers are all included in the modifiers of the antecedent candidate.
(...)
For this feature we only use modifiers that are nouns or adjectives.
This seems to imply that main_modifiers should contain only those modifiers that are nouns or adjectives. Currently, however, main_modifiers contains everything in the Mention.span that is a noun or adjective, which is far more than just the modifiers.
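The reading of Lee et al. described above could be sketched as follows. The Token class, its modifies_head flag (which would have to come from the parse) and the coarse POS labels are assumptions for illustration, not the project's data model.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token: lemma, coarse POS tag, and whether it modifies the head.
    lemma: str
    pos: str
    modifies_head: bool = False


def main_modifiers(tokens):
    """Lemmas of the head's modifiers that are nouns or adjectives."""
    return {
        t.lemma for t in tokens
        if t.modifies_head and t.pos in {"noun", "adj"}
    }


def compatible_modifiers(mention, antecedent):
    """True if all the mention's modifiers are among the antecedent's."""
    return main_modifiers(mention) <= main_modifiers(antecedent)


antecedent = [Token("the", "det", True), Token("red", "adj", True),
              Token("wooden", "adj", True), Token("chair", "noun")]
red_chair = [Token("the", "det", True), Token("red", "adj", True),
             Token("chair", "noun")]
blue_chair = [Token("the", "det", True), Token("blue", "adj", True),
              Token("chair", "noun")]

print(compatible_modifiers(red_chair, antecedent))   # True
print(compatible_modifiers(blue_chair, antecedent))  # False
```

The head noun "chair" is excluded because it does not modify anything, and the determiner is excluded by the POS filter.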
Possessive pronouns are linked to each other, but not to their coreferent
Merge two entities containing mentions for which all of the following hold:
But this algorithm is wrong for Dutch (thinks Martin):