The coref_draft from antske

coref_draft's Issues

All mentions derived from named entities are in the wrong position

In get_mentions, all mentions detected using get_relevant_head_ids are put in the OrderedDict first, and only then are the mentions derived from named entities added. As a result, the mentions are not in order of appearance.
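One possible fix is to collect both kinds of mentions first and sort them by position before filling the OrderedDict. The sketch below is only an illustration: `order_mentions`, the `(mention_id, begin_offset)` pairs and the sample IDs are assumptions, not the project's actual data structures.

```python
from collections import OrderedDict


def order_mentions(head_mentions, entity_mentions):
    """Merge two lists of (mention_id, begin_offset) pairs by position.

    Both arguments are hypothetical stand-ins for the mentions found via
    get_relevant_head_ids and those derived from named entities.
    """
    combined = list(head_mentions) + list(entity_mentions)
    # list.sort() is stable, so mentions with equal offsets keep input order.
    combined.sort(key=lambda pair: pair[1])
    return OrderedDict(combined)


heads = [("m1", 0), ("m3", 42)]
entities = [("m2", 17)]
ordered = order_mentions(heads, entities)
print(list(ordered))  # ['m1', 'm2', 'm3']
```

Sorting once at the end keeps the two detection steps independent of each other, at the cost of needing a comparable position for every mention.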

Pass 8 - Proper Head Word Match does not implement "No location mismatches"

See apply_proper_head_word_match

`speaker_identification` does not follow Lee et al. (2013)

The intended algorithm for speaker_identification is quoted below from Lee et al. (2013),
with check marks indicating whether the rules are actually implemented:

<I>s assigned to the same speaker are coreferent.

<you>s with the same speaker are coreferent.

The speaker and <I>s in her text are coreferent.

(...)

The speaker and a mention which is not in the speaker's
utterance cannot be coreferent.

Two <I>s (or two <you>s, or two <we>s) assigned to different
speakers cannot be coreferent.

Two different person pronouns by the same speaker cannot be
coreferent.

Nominal mentions cannot be coreferent with <I>, <you>, or <we> in
the same turn or quotation.

In conversations, <you> can corefer only with the previous
speaker.

(...)
We define <I> as I, my, me, or mine, <we> as first person
plural pronouns, and <you> as second person pronouns.

Apart from this, the current implementation makes every third-person pronoun coreferent with the "topic" of the quote, which cannot be right when the pronouns differ in features such as gender.
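A feature check before linking a pronoun to the quote's topic could look roughly like the sketch below. Everything in it is an assumption for illustration: the `compatible` helper, the feature names `gender` and `number`, and the dictionary representation are not taken from the code base.

```python
def compatible(a, b):
    """Return True if two feature dicts do not contradict each other.

    A feature that is missing or None counts as unknown, and an unknown
    value is treated as compatible with anything.
    """
    for feature in ("gender", "number"):
        value_a, value_b = a.get(feature), b.get(feature)
        if value_a is not None and value_b is not None and value_a != value_b:
            return False
    return True


# Hypothetical feature bundles for three pronouns.
he = {"gender": "male", "number": "singular"}
she = {"gender": "female", "number": "singular"}
they = {"gender": None, "number": "plural"}

print(compatible(he, he))    # True
print(compatible(he, she))   # False: gender differs
print(compatible(he, they))  # False: number differs
```

Treating unknown values as compatible errs on the side of recall; a stricter variant would refuse to link when a feature is missing.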

Code is not thread safe

The code uses globals in a number of places, making it unsafe to use in a threaded environment.

Best solution is probably to refactor the globals into instance variables on either a new class or on the naf object, and maybe they can even be moved to KafNafParserPy as they look pretty generic (?).
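A minimal sketch of that refactoring, assuming the globals hold per-document lookup tables: the `TokenIndex` class and its constructor argument are hypothetical, and its `get_string_from_ids` method only mirrors the name of the existing function, not its implementation.

```python
from threading import Thread


class TokenIndex:
    """Hypothetical per-document state replacing module-level globals."""

    def __init__(self, tokens):
        # tokens: mapping from token ID to surface string for ONE document.
        self._id2string = dict(tokens)

    def get_string_from_ids(self, ids):
        return " ".join(self._id2string[i] for i in ids)


def worker(tokens, ids, results, slot):
    index = TokenIndex(tokens)  # each thread owns its own instance
    results[slot] = index.get_string_from_ids(ids)


# Two toy documents that reuse the same token IDs.
docs = [({"t1": "Jan", "t2": "slaapt"}, ["t1", "t2"]),
        ({"t1": "Marie", "t2": "leest"}, ["t1", "t2"])]
results = [None] * len(docs)
threads = [Thread(target=worker, args=(tokens, ids, results, slot))
           for slot, (tokens, ids) in enumerate(docs)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # ['Jan slaapt', 'Marie leest']
```

Because no state is shared between instances, documents that reuse the same token IDs cannot overwrite each other's lookups.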

Simple test program:

import sys
import random
from threading import Thread
from multisieve_coreference import process_coreference
from KafNafParserPy import KafNafParser

def run():
    while True:
        f = random.choice(fns)
        nafin = KafNafParser(open(f))
        nafin = process_coreference(nafin)

fns = sys.argv[1:]
for i in range(10):
    Thread(target=run).start()

Results in:

$ env/bin/python test.py /tmp/test*.naf

Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "test.py", line 12, in run
    nafin = process_coreference(nafin)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 701, in process_coreference
    coref_classes, mentions = resolve_coreference(nafin)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 658, in resolve_coreference
    match_full_name_overlap(mentions, coref_classes)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 37, in match_full_name_overlap
    mention_string = get_string_from_ids(mention.get_span())
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 20, in get_string_from_ids
    surface_string += token_string + ' '
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Role appositive missing from sieve 4 (precise constructs)

Constituents vs. dependents

What is called ConstituencyTrees is actually a collection of dependency trees.
The code uses dependency trees everywhere, but the output of Alpino also seems to contain constituency trees. Why aren't they used?

Demonym missing from sieve 4 (precise constructs)

`dump.py` possibly adds duplicate corefclass id to NAF

Nothing checks whether the created ID is already in use.
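Such a check could be as small as the following sketch. The `next_free_id` helper and the `co` prefix are assumptions for illustration; how the IDs already present in the NAF file are collected is left out.

```python
import itertools


def next_free_id(used_ids, prefix="co"):
    """Return the first ID of the form prefix + number not in used_ids."""
    used = set(used_ids)
    for number in itertools.count(1):
        candidate = "{}{}".format(prefix, number)
        if candidate not in used:
            return candidate


existing = ["co1", "co2", "co4"]  # hypothetical IDs already in the document
print(next_free_id(existing))  # co3
```

When several classes are added in a row, each new ID has to be added to the set of used IDs before the next call.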

Make `get_numbers` a method of `Mention`

Prohibitions between links are not working (not implemented yet)

Coreferent chains with contradicting semantic values ('he' and 'she' references) are merged based on less significant evidence.

`Mention.main_modifiers` is not a subset of `Mention.modifiers`

main_modifiers is used only in sieves 5, 6 and 7, for the "Compatible modifiers only" constraint. Lee et al.:

Compatible modifiers only - the mention's modifiers are all included in the modifiers of the antecedent candidate.
(...)
For this feature we only use modifiers that are nouns or adjectives.

This seems to imply that main_modifiers should be exactly those modifiers that are nouns or adjectives. Currently, however, main_modifiers contains everything in Mention.span that is a noun or adjective, which is far more than just the modifiers.
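Under that reading, main_modifiers would be a filter over the modifiers rather than over the span. The sketch below assumes that modifiers are token IDs, that a `pos_of` mapping gives each token's part of speech, and that the labels are `noun` and `adj`; the two function names and all of these details are hypothetical.

```python
def main_modifiers(modifiers, pos_of):
    """Keep only the modifiers whose part of speech is noun or adjective."""
    return [m for m in modifiers if pos_of.get(m) in {"noun", "adj"}]


def compatible_modifiers(mention_mods, antecedent_mods):
    """True if every modifier of the mention also modifies the antecedent."""
    return set(mention_mods) <= set(antecedent_mods)


# Hypothetical mention: span t1..t4, of which only t2 and t3 are modifiers.
pos_of = {"t1": "noun", "t2": "adj", "t3": "det", "t4": "noun"}
modifiers = ["t2", "t3"]

print(main_modifiers(modifiers, pos_of))  # ['t2']
```

By construction the result is always a subset of the modifiers, so the nouns t1 and t4 in the span no longer leak into it.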

possessive pronouns are not linked to their coreferent

Possessive pronouns are linked to each other, but not to their coreferent.

resolve_reflexive_pronoun_structures not scientifically grounded

Merge two entities containing mentions for which all of the following hold:

they are in the same sentence
they aren't contained in each other
other is before mention

But this algorithm is wrong for Dutch (thinks Martin):

It's far too eager: it does not check whether the antecedent is the subject.
It's too strict: "[zich] wassen deed [hij] elke dag" is a counter-example for the last rule.
