We are using nlp_compromise to parse requests for data pulls. In many cases, a product

Double quotes to specify pos about compromise HOT 7 CLOSED

spencermountain commented on May 18, 2024

Double quotes to specify pos

from compromise.

Comments (7)

slneufeld commented on May 18, 2024

Yes, ampersand is a good one, as is 'n or 'N. Dashes are also a challenge.
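
For illustration, here is a minimal sketch of a logic-based rule for these connectors. It is not something nlp_compromise provides: when a connector such as `&`, `'n`, `'N` or a lone `N` or `-` sits between two capitalized words, the three tokens collapse into a single noun candidate. The connector list, the capitalization test and the `lumpBrandNames` name are all assumptions. Because the input is split on whitespace, a hyphenated token like "Wal-Mart" already stays whole.

```typescript
// Connector tokens that often glue two halves of a brand name together.
// This list is an illustrative assumption, not an exhaustive lexicon.
const CONNECTORS: ReadonlySet<string> = new Set(["&", "'n", "'N", "N", "n'", "-"]);

// Assumed heuristic: each part of a brand name starts with an uppercase letter.
function isCapitalized(word: string): boolean {
  return /^[A-Z]/.test(word);
}

// Hypothetical helper: collapse "Stop & Shop" or "Cap N Crunch" into one token.
function lumpBrandNames(sentence: string): string[] {
  const words = sentence.split(/\s+/).filter((w) => w.length > 0);
  const out: string[] = [];
  let i = 0;
  while (i < words.length) {
    let phrase = words[i];
    // Keep extending while the pattern is: capitalized word, connector, capitalized word.
    while (
      i + 2 < words.length &&
      CONNECTORS.has(words[i + 1]) &&
      isCapitalized(phrase) &&
      isCapitalized(words[i + 2])
    ) {
      phrase = `${phrase} ${words[i + 1]} ${words[i + 2]}`;
      i += 2;
    }
    out.push(phrase);
    i += 1;
  }
  return out;
}

const brands = lumpBrandNames("show me Stop & Shop and Cap N Crunch");
// ["show", "me", "Stop & Shop", "and", "Cap N Crunch"]
```

A rule like this cannot tell a single name such as "Stop & Shop" from two brands joined by "&" (say, "Coke & Pepsi"), so some ambiguity is left for a lookup table or a parser to resolve.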

Regarding silentrob's comments, agreed: I'm not looking to bastardize a general-purpose tool for my own selfish needs.

My use case is very specific to noun discovery, which can be very challenging for brand names, trademarks and, to a large degree, retailer names. Our database alone has tens of thousands of such entities, many of which combine different parts of speech (Big Red, Cap N Crunch) or are just difficult to group together logically, like Shop Rite, Stop & Shop, Wal*Mart and so on. And while all I really need are INs, CCs and NNs, I need to identify them really, really well for my solution to be successful, and I'd prefer to keep this on the client side and use logic rather than a brute-force, dictionary-style approach.

On top of nlp_compromise, I've implemented two hacks: one for the double-quote capability and, more importantly, a hacky converter that looks at a token's tag together with the previous and/or next tag and, based on those three, converts the token I'm operating on into an IN, a CC, an NN, or basically an ignore. The sentences in my solution essentially follow the pattern "show me x1, x2 and x3 for y1, y2 and y3 during z1, z2 and z3", where the x's are one group of like objects, the y's another, the z's another, and so on. The complexity lies in letting the user be more descriptive, e.g. "show me both x1 and x2 in y1, y2 and y3 during the periods z1 and z2", and in being able to add flexibility for more complex statements in the future. So I'm not inclined to write a "dumb parser" that looks for break words, commas and the like, because I think identifying parts of speech will ultimately give me the flexibility and future-proofing I need.
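
The converter and the "show me x ... for y ... during z" grouping can be sketched roughly as follows. This is a hypothetical reconstruction, not slneufeld's actual code. It assumes tokens arrive already tagged with Penn Treebank-style tags (the IN/CC/NN tags used in this thread) and does not call nlp_compromise itself. The specific neighbour rules inside `coarseTag` are illustrative guesses. `groupItems` starts a new group at each preposition and treats conjunctions and commas as item separators.

```typescript
// A token with a Penn Treebank-style tag, assumed to come from an upstream tagger.
interface TaggedToken {
  text: string;
  tag: string;
}

type Coarse = "IN" | "CC" | "NN" | "IGNORE";

// Decide a token's coarse class from its own tag plus its neighbours' tags.
// These rules are illustrative guesses, not a tested rule set.
function coarseTag(prev: string | null, cur: string, next: string | null): Coarse {
  if (cur === "IN") return "IN";
  if (cur === "CC" || cur === ",") return "CC";
  if (cur.startsWith("NN")) return "NN";
  // A modifier right before a noun is kept as part of the name ("Big Red").
  if ((cur === "JJ" || cur === "CD") && next !== null && next.startsWith("NN")) return "NN";
  // A verb or adjective right after a list separator is likely a mis-tagged list item.
  if ((prev === "CC" || prev === ",") && (cur.startsWith("VB") || cur === "JJ")) return "NN";
  return "IGNORE";
}

// Group "show me x1, x2 and x3 for y1 ... during z1 ..." into [[x...], [y...], [z...]].
// A preposition (IN) starts a new group; CC and commas close the current item;
// consecutive NN tokens are joined into a single item.
function groupItems(tokens: TaggedToken[]): string[][] {
  const groups: string[][] = [[]];
  let current: string[] = [];
  const flush = (): void => {
    if (current.length > 0) {
      groups[groups.length - 1].push(current.join(" "));
      current = [];
    }
  };
  tokens.forEach((tok, i) => {
    const prev = i > 0 ? tokens[i - 1].tag : null;
    const next = i + 1 < tokens.length ? tokens[i + 1].tag : null;
    switch (coarseTag(prev, tok.tag, next)) {
      case "NN":
        current.push(tok.text);
        break;
      case "IN":
        flush();
        groups.push([]);
        break;
      default: // "CC" or "IGNORE": close the current item
        flush();
    }
  });
  flush();
  return groups.filter((g) => g.length > 0);
}

const tagged: TaggedToken[] = [
  { text: "show", tag: "VB" }, { text: "me", tag: "PRP" },
  { text: "Big", tag: "JJ" }, { text: "Red", tag: "NNP" },
  { text: "and", tag: "CC" }, { text: "milk", tag: "NN" },
  { text: "for", tag: "IN" }, { text: "Shop", tag: "NNP" }, { text: "Rite", tag: "NNP" },
  { text: "during", tag: "IN" }, { text: "May", tag: "NNP" },
];
const grouped = groupItems(tagged); // [["Big Red", "milk"], ["Shop Rite"], ["May"]]
```

Keeping the neighbour rules in one small function makes it easy to add cases (for example, a double-quote rule) without touching the grouping logic.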

spencermountain commented on May 18, 2024

Hi! Of course. That's a great idea. I agree that a quote with only a few words is a strong noun signal.
Similarly, the ampersand seems like a good noun signal too. Do you have others?
I'm in the latter stages of a big v2 rewrite, and I'd be happy to include these rules there.
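
Here is a minimal sketch of the quote rule. It assumes plain string input and a hypothetical cut-off of four words (`MAX_QUOTED_WORDS`). Each short double-quoted span becomes a single token pre-tagged `NN`. A longer quote falls back to ordinary words with an empty tag, which the regular tagger would fill in later. The name `tokenizeWithQuotes` is illustrative and is not part of nlp_compromise.

```typescript
interface TaggedToken {
  text: string;
  tag: string; // "" means "not tagged yet"
}

// Hypothetical cut-off: quotes with more words than this are treated as ordinary text.
const MAX_QUOTED_WORDS = 4;

// Tokenize on whitespace, but turn each short "double-quoted" span into a
// single token pre-tagged as a noun (NN). Everything else is left untagged.
function tokenizeWithQuotes(sentence: string): TaggedToken[] {
  const tokens: TaggedToken[] = [];
  const pattern = /"([^"]+)"|(\S+)/g; // a quoted span, or any run of non-spaces
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(sentence)) !== null) {
    const quoted: string | undefined = m[1];
    if (quoted === undefined) {
      tokens.push({ text: m[0], tag: "" });
      continue;
    }
    const words = quoted.split(/\s+/).filter((w) => w.length > 0);
    if (words.length === 0) continue; // skip whitespace-only quotes
    if (words.length <= MAX_QUOTED_WORDS) {
      tokens.push({ text: words.join(" "), tag: "NN" });
    } else {
      for (const w of words) tokens.push({ text: w, tag: "" });
    }
  }
  return tokens;
}

const quotedTokens = tokenizeWithQuotes('show me "Big Red" for "Stop & Shop"');
// texts: ["show", "me", "Big Red", "for", "Stop & Shop"]; only the two quoted spans are tagged NN
```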

silentrob commented on May 18, 2024

That seems tricky. I could see "Stop & Shop" being translated to "Stop and Shop", which should get tagged as "Stop/NN and/CC Shop/NN". I suspect you want the phrase to be parsed as an NP, but that is done by a parser, not a tagger.

silentrob commented on May 18, 2024

@spencermountain I have explored pulling out common bigrams and trigrams using a lookup table to aid in entity recognition. For example, "fish and chips", "french fries" and "united states" each mean more as a group than the sum of their parts.
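
One common way to apply such a lookup table is a greedy longest-match pass. At each position, it tries the longest n-gram first (trigrams, then bigrams) and lumps the words into one token if the n-gram appears in the table. The phrase set and the `lumpPhrases` name are hypothetical, and case-insensitive matching is an assumption. This shows the general technique, not necessarily silentrob's implementation.

```typescript
// Hypothetical phrase table; a real one would be built from data such as a brand list.
const PHRASES: ReadonlySet<string> = new Set([
  "fish and chips",
  "french fries",
  "united states",
  "stop and shop",
]);
const MAX_N = 3; // longest phrase length to look for (trigrams)

// Greedy longest match: at each position try a trigram, then a bigram,
// and fall back to the single word if nothing in the table matches.
function lumpPhrases(words: string[]): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < words.length) {
    let span = 1;
    for (let n = Math.min(MAX_N, words.length - i); n >= 2; n--) {
      const candidate = words.slice(i, i + n).join(" ").toLowerCase();
      if (PHRASES.has(candidate)) {
        span = n;
        break;
      }
    }
    out.push(words.slice(i, i + span).join(" "));
    i += span;
  }
  return out;
}

const lumped = lumpPhrases("I want fish and chips in the United States".split(" "));
// ["I", "want", "fish and chips", "in", "the", "United States"]
```

Checking longer n-grams first means a trigram entry wins over any bigram that starts at the same word.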

spencermountain commented on May 18, 2024

Oh, very neat stuff.
Can you help contribute to v2?

This is how I've been looking at that problem here.
You can see there's lots of work to do.

V1 has a tonne of lumper-splitter rules, like you mentioned. Ideally, we can think of a better way to articulate them. SilentRob's done loads of this already.
It's a great time to be rethinking this, and I welcome any ideas.

slneufeld commented on May 18, 2024

Happy to help contribute, if I can get away from the day job :)

Yes, that's an interesting approach to applying a "coarse filter" to POS tags. Better than what I was considering, I think.

spencermountain commented on May 18, 2024

Hey, I think the problems you've mentioned have been addressed by the much smarter lumping scheme. Let me know if you find any other doozies.
cheers
