coref_draft's People

Contributors

antske, mpvharmelen, vanatteveldt

coref_draft's Issues

`speaker_identification` does not follow Lee et al. (2013)

The intended algorithm for `speaker_identification`, as described by Lee et al. (2013), is quoted below:

  • <I>s assigned to the same speaker are coreferent.
  • <you>s with the same speaker are coreferent.
  • The speaker and <I>s in her text are coreferent.

(...)

  • The speaker and a mention which is not <I> in the speaker's
    utterance cannot be coreferent.
  • Two <I>s (or two <you>s, or two <we>s) assigned to different
    speakers cannot be coreferent.
  • Two different person pronouns by the same speaker cannot be
    coreferent.
  • Nominal mentions cannot be coreferent with <I>, <you>, or <we> in
    the same turn or quotation.
  • In conversations, <you> can corefer only with the previous
    speaker.

(...)
We define <I> as I, my, me, or mine, <we> as first person
plural pronouns, and <you> as second person pronouns.
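For reference, the quoted constraints can be written as a pair of must-link / cannot-link predicates. This is a minimal sketch, not the project's code: the `Mention` record and its fields (`quote_id`, `speaker`, `names`, `nominal`) are hypothetical, English pronoun forms are used only because the quoted definitions are in English, and the conversation rule ("<you> can corefer only with the previous speaker") is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Pronoun classes as defined in the quote; English forms for illustration only
# (a Dutch pipeline would need e.g. ik/mij/mijn instead).
PERSON_FORMS = {
    "I": {"i", "my", "me", "mine"},
    "we": {"we", "us", "our", "ours"},
    "you": {"you", "your", "yours"},
}


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record, not the project's Mention class."""
    text: str
    quote_id: Optional[int] = None  # quotation the mention occurs in, if any
    speaker: Optional[str] = None   # speaker that quotation is attributed to
    names: Optional[str] = None     # set if this mention *is* a speaker mention
    nominal: bool = False           # True for nominal (non-pronoun) mentions


def person_class(m: Mention) -> Optional[str]:
    word = m.text.lower()
    for cls, forms in PERSON_FORMS.items():
        if word in forms:
            return cls
    return None


def must_link(a: Mention, b: Mention) -> bool:
    ca, cb = person_class(a), person_class(b)
    same_speaker = a.speaker is not None and a.speaker == b.speaker
    # <I>s (or <you>s) assigned to the same speaker are coreferent
    if same_speaker and ca == cb and ca in ("I", "you"):
        return True
    # The speaker and <I>s in her text are coreferent
    for spk, other in ((a, b), (b, a)):
        if spk.names is not None and other.speaker == spk.names and person_class(other) == "I":
            return True
    return False


def cannot_link(a: Mention, b: Mention) -> bool:
    ca, cb = person_class(a), person_class(b)
    # The speaker and a non-<I> mention in her utterance cannot corefer
    for spk, other in ((a, b), (b, a)):
        if spk.names is not None and other.speaker == spk.names and person_class(other) != "I":
            return True
    if ca is not None and cb is not None and a.speaker and b.speaker:
        # Two <I>s / <you>s / <we>s assigned to different speakers cannot corefer
        if ca == cb and a.speaker != b.speaker:
            return True
        # Two different person pronouns by the same speaker cannot corefer
        if ca != cb and a.speaker == b.speaker:
            return True
    # Nominal mentions cannot corefer with <I>, <you> or <we> in the same quotation
    if a.quote_id is not None and a.quote_id == b.quote_id:
        if (a.nominal and cb is not None) or (b.nominal and ca is not None):
            return True
    return False
```

For example, with `martin = Mention("Martin", names="Martin", nominal=True)` and `i1 = Mention("I", quote_id=1, speaker="Martin")`, `must_link(martin, i1)` holds, while a `Mention("you", quote_id=1, speaker="Martin")` is blocked from `martin` by `cannot_link`.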

Apart from this, the current implementation makes every third-person pronoun coreferent with the "topic" of the quote, which cannot be right when the pronouns differ in features such as gender.
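One way to avoid this would be to check agreement before linking a third-person pronoun to the topic. The sketch below assumes a hypothetical feature dictionary per mention (`gender`, `number`, with `None` for "unknown"); it does not reflect how the project stores these features.

```python
from typing import Dict, List, Optional

# Hypothetical feature bundle for a mention; None means "unknown".
Features = Dict[str, Optional[str]]


def agree(a: Features, b: Features) -> bool:
    """True unless a and b carry conflicting values for gender or number."""
    for key in ("gender", "number"):
        va, vb = a.get(key), b.get(key)
        if va is not None and vb is not None and va != vb:
            return False
    return True


def pronouns_for_topic(topic: Features, pronouns: List[Features]) -> List[Features]:
    """Keep only the third-person pronouns that agree with the quote's topic."""
    return [p for p in pronouns if agree(topic, p)]


topic = {"form": "Marie", "gender": "fem", "number": "sg"}
pronouns = [
    {"form": "hij", "gender": "masc", "number": "sg"},  # 'he'
    {"form": "haar", "gender": "fem", "number": "sg"},  # 'her'
    {"form": "hun", "gender": None, "number": "pl"},    # 'their'
]
print([p["form"] for p in pronouns_for_topic(topic, pronouns)])  # ['haar']
```

Unknown values are treated as compatible, so the check only blocks links where both sides explicitly disagree.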

Code is not thread-safe

The code uses globals in a number of places, making it unsafe to use in a threaded environment.

The best solution is probably to refactor the globals into instance variables, either on a new class or on the NAF object. Possibly they could even be moved into KafNafParserPy, as they look fairly generic.
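A minimal sketch of the instance-variable approach, assuming a token-id-to-string table is one of the globals involved. `CoreferenceContext` is a hypothetical name, and `get_string_from_ids` here only mirrors the name seen in the traceback below; it is not the project's implementation (which appends a trailing space per token).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable


class CoreferenceContext:
    """Hypothetical per-document state replacing module-level globals.

    Every document gets its own instance, so threads processing different
    documents never read or overwrite each other's lookup tables.
    """

    def __init__(self, token_strings: Dict[str, str]):
        # e.g. token id -> surface string, built from a single NAF document
        self.token_strings = dict(token_strings)

    def get_string_from_ids(self, ids: Iterable[str]) -> str:
        # A method instead of a module-level function reading a global table
        return " ".join(self.token_strings[i] for i in ids)


def process(token_strings: Dict[str, str]) -> str:
    ctx = CoreferenceContext(token_strings)
    return ctx.get_string_from_ids(sorted(token_strings))


if __name__ == "__main__":
    docs = [{"t1": "Jan", "t2": "slaapt"}, {"t1": "Marie", "t2": "lacht"}] * 50
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(process, docs))
    assert results == ["Jan slaapt", "Marie lacht"] * 50
```

The sieves would then receive the context object (or the NAF object carrying it) as an argument instead of reading module state.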

Simple test program:

import sys
import random
from threading import Thread
from multisieve_coreference import process_coreference
from KafNafParserPy import KafNafParser

def run():
    while True:
        f = random.choice(fns)
        nafin = KafNafParser(open(f))
        nafin = process_coreference(nafin)

fns = sys.argv[1:]
for i in range(10):
    Thread(target=run).start()

Results in:

$ env/bin/python test.py /tmp/test*.naf

Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "test.py", line 12, in run
    nafin = process_coreference(nafin)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 701, in process_coreference
    coref_classes, mentions = resolve_coreference(nafin)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 658, in resolve_coreference
    match_full_name_overlap(mentions, coref_classes)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 37, in match_full_name_overlap
    mention_string = get_string_from_ids(mention.get_span())
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 20, in get_string_from_ids
    surface_string += token_string + ' '
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
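The error suggests that a token lookup returned `None` because another thread had replaced the shared table with a different document's. If a full refactor is not feasible right away, a less invasive stop-gap is to keep module-level state but make it per-thread with the standard `threading.local`. The names below (`set_document`, `_state`) are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable

# Stand-in for a module-level global: attributes set on this object are
# visible only to the thread that set them.
_state = threading.local()


def set_document(token_strings: Dict[str, str]) -> None:
    # Hypothetical setter, called once per document before any sieve runs
    _state.token_strings = dict(token_strings)


def get_string_from_ids(ids: Iterable[str]) -> str:
    return " ".join(_state.token_strings[i] for i in ids)


def process(token_strings: Dict[str, str]) -> str:
    set_document(token_strings)
    return get_string_from_ids(sorted(token_strings))
```

Note that thread-local state survives between documents processed on the same thread, so it must be reset at the start of every document; the instance-variable refactor avoids this pitfall.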

Constituents vs. dependents

  • What is called ConstituencyTrees is actually a collection of dependency trees.
  • The code uses dependency trees everywhere, but the output of Alpino also seems to contain constituency trees. Why aren't they used?

`Mention.main_modifiers` is not a subset of `Mention.modifiers`

main_modifiers is used only in sieves 5, 6 and 7, for the "Compatible modifiers only" constraint. Lee et al.:

Compatible modifiers only - the mention's modifiers are all included in the modifiers of the antecedent candidate.
(...)
For this feature we only use modifiers that are nouns or adjectives.

This seems to imply that `main_modifiers` should contain only those modifiers that are nouns or adjectives. Currently, however, `main_modifiers` contains every noun or adjective in `Mention.span`, and therefore much more than just the modifiers.
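A sketch of the reading suggested above, where `main_modifiers` filters the modifiers rather than the whole span. The `Token` and `Mention` classes, the coarse POS labels `"noun"`/`"adj"`, and comparing by lemma are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set

MAIN_MODIFIER_POS = {"noun", "adj"}  # assumed coarse POS labels


@dataclass
class Token:
    id: str
    lemma: str
    pos: str


@dataclass
class Mention:
    """Hypothetical mention: `span` holds all tokens, `modifiers` only the
    ids of tokens that modify the head (e.g. taken from the dependency parse)."""
    span: List[Token]
    modifiers: List[str] = field(default_factory=list)

    @property
    def main_modifiers(self) -> Set[str]:
        # Filter the modifiers (not the whole span) down to nouns and
        # adjectives, so only lemmas of actual modifiers can end up here.
        by_id = {t.id: t for t in self.span}
        return {by_id[i].lemma for i in self.modifiers
                if i in by_id and by_id[i].pos in MAIN_MODIFIER_POS}


def compatible_modifiers_only(mention: Mention, antecedent: Mention) -> bool:
    """All of the mention's main modifiers must also modify the antecedent."""
    return mention.main_modifiers <= antecedent.main_modifiers
```

For "de oude man" ('the old man') with modifiers "de" and "oude", this yields `{"oud"}`; a span-based version would also include the head noun "man", which is not a modifier at all.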

`resolve_reflexive_pronoun_structures` is not scientifically grounded

Currently, the function merges two entities if they contain mentions `mention` and `other` for which all of the following hold:

  • they are in the same sentence
  • neither is contained in the other
  • `other` precedes `mention`

However, this algorithm is wrong for Dutch (according to Martin):

  • It is far too eager:
    it does not check whether the antecedent is the subject.
  • It is too strict:
    "[zich] wassen deed [hij] elke dag" (roughly "washing [himself], [he] did every day")
    is a counter-example to the last rule, since the antecedent follows the reflexive.
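A sketch of a rule addressing both points: require the antecedent to be a subject, and drop the word-order check. The `Mention` fields are hypothetical, and the same-clause requirement is an extra assumption (in the spirit of binding theory), not something stated in the issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; fields are illustrative only."""
    text: str
    sentence: int
    clause: int        # assumed clause id, e.g. derived from the dependency parse
    start: int         # token offsets, end exclusive
    end: int
    is_reflexive: bool = False
    is_subject: bool = False


def contains(a: Mention, b: Mention) -> bool:
    return a.start <= b.start and b.end <= a.end


def reflexive_antecedent_ok(reflexive: Mention, candidate: Mention) -> bool:
    return (
        reflexive.is_reflexive
        and candidate.is_subject                   # antecedent must be the subject
        and reflexive.sentence == candidate.sentence
        and reflexive.clause == candidate.clause   # assumption: same clause
        and not contains(reflexive, candidate)
        and not contains(candidate, reflexive)
        # no word-order check: the antecedent may follow the reflexive
    )
```

With this rule, "hij" is accepted for "zich" in "zich wassen deed hij elke dag", and in "Jan zegt dat Piet zich wast" ('Jan says that Piet washes himself') only "Piet" is accepted.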
