Comments (7)

ryancasburn-KAI commented on June 2, 2024

In my opinion, "john's closed tasks" is ambiguous because "john has closed tasks" is a perfectly reasonable phrase, as is the possessive form. The confusion here comes from the word "closed", which can be an adjective or a past tense verb. I'm not an expert on plugins in this package yet, but I'd guess there would be a way for a plugin to lean to and prioritize the possessive tag.

However, I think that there is a bug with john’s neat documents about georgia

It's being parsed as "john is neat documents about georgia" which is almost certainly going to be an incorrect tagging in all cases. It also would be unlikely for it to be "john has neat documents about georgia" as (from what I've come to understand as a non-linguist) the has contraction generally is only used before a past participle verb. So, this one at least should be getting recognized as a possessive. I'd think by:

let twoTerm = terms[i + 2]
if (twoTerm && twoTerm.tags.has('Noun') && !twoTerm.tags.has('Pronoun')) {
  return true
}
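A self-contained sketch of that look-ahead, for illustration only. The term objects here are hand-built stand-ins (a `text` string plus a `tags` Set), not compromise's real term structures, and `term` and `looksPossessive` are hypothetical names.

```javascript
// Hypothetical stand-in for tagged terms: each has a text and a Set of tags.
const term = (text, ...tags) => ({ text, tags: new Set(tags) })

// Returns true when the word two positions after an apostrophe-s term is a
// non-pronoun noun, e.g. "john's [neat] documents".
function looksPossessive(terms, i) {
  const twoTerm = terms[i + 2]
  return Boolean(twoTerm && twoTerm.tags.has('Noun') && !twoTerm.tags.has('Pronoun'))
}

const phrase = [
  term("john's", 'Person'),
  term('neat', 'Adjective'),
  term('documents', 'Noun', 'Plural'),
]
const tooShort = [term("john's", 'Person'), term('neat', 'Adjective')]

console.log(looksPossessive(phrase, 0)) // true
console.log(looksPossessive(tooShort, 0)) // false
```

The check only helps if "documents" still carries its noun tag when it runs, so where it sits relative to the rest of the tagger matters.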

This one ultimately comes down to switches and is related to #1070 that I opened a couple of days ago. The tagger initially thinks that "documents" is a plural noun (which is correct), but changes it to a present tense verb (which is incorrect) because the next word is about, which locks in the word documents to be a verb (like "John documents Georgia wildlife"). I'll keep this in mind as I continue thinking through an improvement to the switches. Happy for your insight and Spencer's too.

An example which may be key in improving this:

John talks about Georgia

Talks should be a verb. This is currently handled because of the "about" lock.

John's talks about Georgia

"talks" here should be a plural noun. We know this because "John is talks about Georgia" doesn't make sense and "John has talks about Georgia" is an improper use of the contraction, so "John's" must be possessive. Are those all good assumptions? Also, a possessive needs to be followed by a noun chunk (i.e. a noun alone, or adjective + noun, etc.).

One more to make it more confusing:

John's nuts about Georgia

Should be "John is nuts about Georgia." How does this fit in with everything else? What makes "talks" and "nuts" different?
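To make the question concrete, here is a toy version of the rules described above, written against hand-labelled words rather than the real tagger. The function name `guessApostropheS` and the labels (`PastParticiple`, `Adjective`, `Noun`, `Preposition`) are assumptions for illustration and are not necessarily compromise's tag names.

```javascript
// Toy is/has/possessive guess based only on the labels of the following words.
// Labels are supplied by hand; nothing here calls the real tagger.
function guessApostropheS(following) {
  if (following.length === 0) return 'is'
  // "John's closed the task": a past participle suggests "has"
  if (following[0].label === 'PastParticiple') return 'has'
  // Skip leading adjectives, then look for a noun: that is a noun chunk
  let i = 0
  while (i < following.length && following[i].label === 'Adjective') i += 1
  if (i < following.length && following[i].label === 'Noun') return 'possessive'
  return 'is'
}

const talks = [{ text: 'talks', label: 'Noun' }, { text: 'about', label: 'Preposition' }]
const nuts = [{ text: 'nuts', label: 'Noun' }, { text: 'about', label: 'Preposition' }]

console.log(guessApostropheS(talks)) // possessive
console.log(guessApostropheS(nuts)) // possessive, although "is" is the intended reading
```

With labels alone, "talks" and "nuts" look identical, so any rule that separates them needs word-level information on top of the tags.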

from compromise.

spencermountain commented on June 2, 2024

Yep, Ryan you've got it dead-on. Well done.
There's an is/has classifier, and it does an okay job, but it runs really early in the tagger.
These ambiguous cases like john's talks about are (often) shaken-out further downstream.

I'm open to suggestions about how to improve this, as it produces pretty-bad outcomes when it's wrong.

I've always wanted to keep things one-pass. Changing the contraction back, after the tagger had made various decisions, seems like a difficult solution.

The good news is that many of these problem words like 'talks' are flagged, and we can add careful rules about them to is/has classifier. We could add some extra look-arounds there, to mitigate this.

Happy to help plug away at this. Thank you Caleb and Ryan.

spencermountain commented on June 2, 2024

added some tests to dev, for the is-has and did-would contractions. Bout 30 of 200 failing. Seems like a fun one.

I think the john’s neat documents example is tripping-up on the unicode apostrophe, which I will take a look at next week.
cheers
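Until that is looked at, one possible workaround on the caller's side is to normalize typographic apostrophes before handing text to the parser. This is only a sketch of a pre-processing step, not the library's fix, and `normalizeApostrophes` is a hypothetical helper name.

```javascript
// Replace typographic single quotes (U+2018, U+2019) with an ASCII apostrophe.
function normalizeApostrophes(text) {
  return text.replace(/[\u2018\u2019]/g, "'")
}

console.log(normalizeApostrophes('john\u2019s neat documents about georgia'))
// john's neat documents about georgia
```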

calebmer commented on June 2, 2024

Thank y’all for looking into this! @ryancasburn-KAI you mentioned:

…but I'd guess there would be a way for a plugin to lean to and prioritize the possessive tag…

From poking around the source code, I can’t see an obvious way to use a plugin to prioritize the possessive tag. Is there some plugin capability I’m missing? Can I write a plugin that pre-emptively tags an 's contraction as possessive? (Unclear if that would be respected from a quick read of the source code.)

ryancasburn-KAI commented on June 2, 2024

@calebmer - you can try this

const plugin = {
  compute: {
    custPossessives: doc => doc.match("(#Person &&/'s$/)").tag('Possessive'),
  },
}
nlp.plugin(plugin)
nlp._world.hooks.splice(7, 0, 'custPossessives')


console.log(nlp("john's closed tasks").json()[0].terms)

The only thing this seems to get wrong is that "closed" is still labeled as a verb, even though "John's" is a possessive.

This comes down to

// rough sort, so 'Noun' is after ProperNoun, etc
let tags = Array.from(term.tags).sort((a, b) => {
  let numA = tagSet[a] ? tagSet[a].parents.length : 0
  let numB = tagSet[b] ? tagSet[b].parents.length : 0
  return numA > numB ? -1 : 1
})

This puts the sort order as:

  1. MaleName
  2. FirstName
  3. Person
  4. Singular
  5. ProperNoun
  6. Possessive
  7. Noun

an Adj|Past with a ProperNoun before is classified as a verb (John documented things.)
an Adj|Past with a Possessive before is classified as an adjective (John's documented things)

I don't know if this is fixable via a plugin. @spencermountain thoughts on a smarter sorter? Maybe Possessive is handled specially, since it is an add-on tag (i.e. it can go on any noun, but if it applies, its rules should be considered first)? For example, changing:

// rough sort, so 'Noun' is after ProperNoun, etc
let tags = Array.from(term.tags).sort((a, b) => {
  let numA = tagSet[a] ? tagSet[a].parents.length : 0
  let numB = tagSet[b] ? tagSet[b].parents.length : 0
  return numA > numB ? -1 : 1
})

to:

  let tags = Array.from(term.tags).sort((a, b) => {
    let numA = tagSet[a] ? tagSet[a].parents.length : 0
    let numB = tagSet[b] ? tagSet[b].parents.length : 0
    if (a == 'Possessive') {
      return -1
    }
    if (b == 'Possessive') {
      return 1
    }
    return numA > numB ? -1 : 1
  })
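A runnable sketch of the two orderings, so the effect can be checked outside the library. The `tagSet` here is a mock: its parent counts are invented so that the baseline matches the order in the list above, and they do not reflect compromise's real tag tree. The comparators are also written to return 0 on ties, which the snippets above do not do.

```javascript
// Mock tag tree: only the number of parents matters for the sort.
// These counts are made up for illustration.
const withParents = n => ({ parents: new Array(n).fill('x') })
const tagSet = {
  MaleName: withParents(6),
  FirstName: withParents(5),
  Person: withParents(4),
  Singular: withParents(3),
  ProperNoun: withParents(2),
  Possessive: withParents(1),
  Noun: withParents(0),
}
const depth = tag => (tagSet[tag] ? tagSet[tag].parents.length : 0)

// Baseline: most parents first.
const byDepth = (a, b) => depth(b) - depth(a)

// Proposed: Possessive always first, then most parents first.
const possessiveFirst = (a, b) => {
  if (a === b) return 0
  if (a === 'Possessive') return -1
  if (b === 'Possessive') return 1
  return byDepth(a, b)
}

const tags = new Set(['Noun', 'Possessive', 'Person', 'MaleName', 'ProperNoun', 'Singular', 'FirstName'])
const baseline = Array.from(tags).sort(byDepth)
const proposed = Array.from(tags).sort(possessiveFirst)

console.log(baseline.join(', '))
// MaleName, FirstName, Person, Singular, ProperNoun, Possessive, Noun
console.log(proposed.join(', '))
// Possessive, MaleName, FirstName, Person, Singular, ProperNoun, Noun
```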

spencermountain commented on June 2, 2024

thanks Ryan, hope to have a fix for this in the next day or two.
cheers

spencermountain commented on June 2, 2024

released as 14.11.1 - please check it out, and see if you can find more examples where it is misunderstanding an apostrophe s, either as Possessive, or through is/has.
thanks!
