Comments (7)
In my opinion, "john's closed tasks" is ambiguous because "john has closed tasks" is a perfectly reasonable phrase, as is the possessive form. The confusion here comes from the word "closed", which can be an adjective or a past-tense verb. I'm not an expert on plugins in this package yet, but I'd guess there would be a way for a plugin to lean to and prioritize the possessive tag.
However, I think that there is a bug with "john’s neat documents about georgia". It's being parsed as "john is neat documents about georgia", which is almost certainly an incorrect tagging in all cases. It would also be unlikely to be "john has neat documents about georgia" since (from what I've come to understand as a non-linguist) the "has" contraction is generally only used before a past participle. So this one, at least, should be recognized as a possessive. I'd think by:
compromise/src/2-two/contraction-two/compute/isPossessive.js
Lines 79 to 82 in 4ef66b3
This one ultimately comes down to switches and is related to #1070, which I opened a couple of days ago. The tagger initially thinks that "documents" is a plural noun (which is correct), but changes it to a present-tense verb (which is incorrect) because the next word is "about", which locks in "documents" as a verb (like "John documents Georgia wildlife"). I'll keep this in mind as I continue thinking through an improvement to the switches. Happy for your insight and Spencer's too.
An example which may be key in improving this:
John talks about Georgia
Talks should be a verb. This is currently handled because of the "about" lock.
John's talks about Georgia
"talks" here should be a plural noun. We know this because "John is talks about Georgia" doesn't make sense and "John has talks about Georgia" is an improper use of the contraction, so "John's" must be possessive. Are those all good assumptions? And a possessive needs to be followed by a noun chunk (i.e. a noun alone, or adjective + noun, etc.).
One more to make it more confusing:
John's nuts about Georgia
Should be "John is nuts about Georgia." How does this fit in with everything else? What makes "talks" and "nuts" different?
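The two assumptions above (the "has" contraction only appears before a past participle, and a possessive must be followed by a noun chunk) can be written down as a toy decision rule. This is only a sketch: guessApostropheS and its coarse labels ('PastParticiple', 'Adjective', 'Noun', 'Preposition') are made up for illustration and are supplied by hand, not produced by compromise's tagger or its is/has classifier.

```javascript
// Hypothetical helper, not part of compromise.
// `following` holds hand-written coarse labels for the words after the 's.
function guessApostropheS(following) {
  // Assumption 1: the "has" contraction only appears before a past participle.
  if (following[0] === 'PastParticiple') return 'has'
  // Assumption 2: a possessive is followed by a noun chunk,
  // i.e. zero or more adjectives and then a noun.
  let i = 0
  while (following[i] === 'Adjective') i += 1
  if (following[i] === 'Noun') return 'possessive'
  // Otherwise fall back to the "is" contraction.
  return 'is'
}

// "John's talks about Georgia", reading "talks" as a noun
console.log(guessApostropheS(['Noun', 'Preposition', 'Noun'])) // possessive
// "John's nuts about Georgia", reading "nuts" as an adjective
console.log(guessApostropheS(['Adjective', 'Preposition', 'Noun'])) // is
// "john's closed tasks", reading "closed" as a past participle
console.log(guessApostropheS(['PastParticiple', 'Noun'])) // has
// "john's closed tasks", reading "closed" as an adjective
console.log(guessApostropheS(['Adjective', 'Noun'])) // possessive
```

The rule only moves the ambiguity into the labels: "nuts" read as a noun would come out as possessive, so the answer still depends on how the following word is tagged.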
from compromise.
Yep, Ryan you've got it dead-on. Well done.
There's an is/has classifier, and it does an okay job, but it runs really early in the tagger.
These ambiguous cases, like "john's talks about", are (often) shaken out further downstream.
I'm open to suggestions about how to improve this, as it produces pretty-bad outcomes when it's wrong.
I've always wanted to keep things one-pass. Changing the contraction back after the tagger has made various decisions seems like a difficult solution.
The good news is that many of these problem words, like 'talks', are flagged, and we can add careful rules about them to the is/has classifier. We could add some extra look-arounds there to mitigate this.
Happy to help plug away at this. Thank you Caleb and Ryan.
from compromise.
added some tests to dev, for the is-has and did-would contractions. 'Bout 30 of 200 failing. Seems like a fun one.
I think the "john’s neat documents" example is tripping up on the unicode apostrophe, which I will take a look at next week.
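A small sketch of how a straight-apostrophe pattern can miss the curly one. It assumes the check is a regex along the lines of /'s$/; endsWithApostropheS is a hypothetical helper, and this is not necessarily how compromise handles it internally.

```javascript
// Straight apostrophe only: misses the typographic one (U+2019)
const straightOnly = /'s$/
// Hypothetical helper: accept either ' (U+0027) or ’ (U+2019)
const endsWithApostropheS = word => /['\u2019]s$/.test(word)

console.log(straightOnly.test("john's")) // true
console.log(straightOnly.test('john\u2019s')) // false
console.log(endsWithApostropheS("john's")) // true
console.log(endsWithApostropheS('john\u2019s')) // true
```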
cheers
from compromise.
Thank y’all for looking into this! @ryancasburn-KAI you mentioned:
…but I'd guess there would be a way for a plugin to lean to and prioritize the possessive tag…
From poking around the source code, I can’t see an obvious way to use a plugin to prioritize the possessive tag. Is there some plugin capability I’m missing? Can I write a plugin that pre-emptively tags a 's contraction as possessive? (It's unclear from a quick read of the source code whether that would be respected.)
from compromise.
@calebmer - you can try this
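import nlp from 'compromise'

const plugin = {
  compute: {
    // tag any #Person term ending in 's as Possessive
    custPossessives: doc => doc.match("(#Person &&/'s$/)").tag('Possessive'),
  },
}
nlp.plugin(plugin)
// insert 'custPossessives' at index 7 of nlp._world.hooks
nlp._world.hooks.splice(7, 0, 'custPossessives')
console.log(nlp("john's closed tasks").json()[0].terms)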
The only thing this seems to get wrong is that "closed" is still labeled as a verb, even though "John's" is a possessive.
This comes down to
compromise/src/2-two/preTagger/compute/tagger/3rd-pass/06-switches.js
Lines 25 to 30 in 4ef66b3
This puts the sort order as:
- MaleName
- FirstName
- Person
- Singular
- ProperNoun
- Possessive
- Noun
An Adj|Past with a ProperNoun before it is classified as a verb ("John documented things").
An Adj|Past with a Possessive before it is classified as an adjective ("John's documented things").
I don't know if this is fixable via a plugin. @spencermountain, thoughts on a smarter sorter? Maybe Possessive could be handled specially, since it is an add-on tag (i.e. it can go on any noun, but if it applies, its rules should be considered first)?
Perhaps by changing
compromise/src/2-two/preTagger/compute/tagger/3rd-pass/06-switches.js
Lines 25 to 30 in 4ef66b3
to:
let tags = Array.from(term.tags).sort((a, b) => {
  let numA = tagSet[a] ? tagSet[a].parents.length : 0
  let numB = tagSet[b] ? tagSet[b].parents.length : 0
  // Possessive always sorts first
  if (a === 'Possessive') {
    return -1
  }
  if (b === 'Possessive') {
    return 1
  }
  // otherwise, tags with more parents sort first
  return numA > numB ? -1 : 1
})
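To see the effect of that comparator in isolation, here is a self-contained sketch. The tagSet below is a made-up stand-in with invented parents lists (the real one lives inside compromise), and sortTags is a hypothetical wrapper around the same comparison.

```javascript
// Made-up stand-in for compromise's tag set; the parents lists are invented.
const tagSet = {
  Noun: { parents: [] },
  Singular: { parents: ['Noun'] },
  Possessive: { parents: ['Noun'] },
  Person: { parents: ['Singular', 'Noun'] },
  FirstName: { parents: ['Person', 'Singular', 'Noun'] },
  MaleName: { parents: ['FirstName', 'Person', 'Singular', 'Noun'] },
}

// Hypothetical wrapper: Possessive first, then most parents first.
function sortTags(tags) {
  return Array.from(tags).sort((a, b) => {
    if (a === 'Possessive') return -1
    if (b === 'Possessive') return 1
    const numA = tagSet[a] ? tagSet[a].parents.length : 0
    const numB = tagSet[b] ? tagSet[b].parents.length : 0
    return numA > numB ? -1 : 1
  })
}

const tags = new Set(['Noun', 'Person', 'Possessive', 'MaleName', 'Singular', 'FirstName'])
console.log(sortTags(tags))
// [ 'Possessive', 'MaleName', 'FirstName', 'Person', 'Singular', 'Noun' ]
```

Without the two Possessive checks, Possessive and Singular would tie on parent count in this toy set, so their relative order would not be guaranteed.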
from compromise.
thanks Ryan, hope to have a fix for this in the next day or two.
cheers
from compromise.
released as 14.11.1 - please check it out, and see if you can find more examples where it is misunderstanding an apostrophe-s, either as Possessive, or through is/has.
thanks!
from compromise.