Comments (12)

koaning commented on May 29, 2024

I'm also thinking of a postprocessing trick now. If a token is detected as an entity, but it is part of a noun-chunk, we may also attempt to highlight the entire noun-chunk.

This would be for a separate tutorial, but I'm curious what you think of the idea.
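
A minimal sketch of that postprocessing idea, not code from concise-concepts. It assumes entities and noun chunks are available as (start, end) token offsets; in spaCy those could be read from doc.ents and doc.noun_chunks. The helper name expand_to_noun_chunks and the example sentence are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def expand_to_noun_chunks(entities: List[Span], noun_chunks: List[Span]) -> List[Span]:
    """Replace each entity span with the noun chunk that fully contains it, if any."""
    expanded: List[Span] = []
    for ent_start, ent_end in entities:
        new_span = (ent_start, ent_end)  # default: keep the entity unchanged
        for chunk_start, chunk_end in noun_chunks:
            if chunk_start <= ent_start and ent_end <= chunk_end:
                new_span = (chunk_start, chunk_end)
                break
        if new_span not in expanded:  # two entities may share one chunk
            expanded.append(new_span)
    return expanded


# Hypothetical example: only "knife" was detected as an entity.
tokens = ["She", "bought", "a", "big", "knife", "yesterday"]
entities = [(4, 5)]             # "knife"
noun_chunks = [(0, 1), (2, 5)]  # "She", "a big knife"

expanded = expand_to_noun_chunks(entities, noun_chunks)
highlighted = [" ".join(tokens[start:end]) for start, end in expanded]
print(highlighted)
```

An entity that falls outside every noun chunk is left as it was, so the step only ever widens a highlight.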

from concise-concepts.

davidberenstein1957 commented on May 29, 2024

They are, but the n-grams do actually need to be present in the embedding model. If not, the algorithm doesn't have any input to expand over.

I can see 2 solutions:

  • I could average the embeddings of the individual tokens in the n-gram. In the case of "big knife", this will likely give much less accurate results.
  • I could include only the token of the n-gram that is most likely to align with the label. In the case of "big knife", "knife" would align with "utensils", so the embedding of "knife" would be used as the fallback.
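
The two fallbacks can be compared with a rough sketch. This is not how concise-concepts implements anything; the two-dimensional vectors are made-up toy values, and the function names average_fallback and best_token_fallback are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average_fallback(ngram: str, vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Option 1: average the vectors of the tokens that the model knows."""
    known = [vectors[tok] for tok in ngram.split() if tok in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def best_token_fallback(ngram: str, label: str, vectors: Dict[str, Vector]) -> Optional[str]:
    """Option 2: keep only the token most similar to the label."""
    candidates = [tok for tok in ngram.split() if tok in vectors]
    if not candidates or label not in vectors:
        return None
    return max(candidates, key=lambda tok: cosine(vectors[tok], vectors[label]))


# Toy embeddings (assumed values, for illustration only).
toy_vectors = {
    "big": [1.0, 0.0],
    "knife": [0.0, 1.0],
    "utensils": [0.1, 0.9],
}

averaged = average_fallback("big knife", toy_vectors)
best = best_token_fallback("big knife", "utensils", toy_vectors)

print(best)
print(cosine(averaged, toy_vectors["utensils"]))
print(cosine(toy_vectors[best], toy_vectors["utensils"]))
```

With these toy values the averaged vector sits further from "utensils" than "knife" alone does, which is the accuracy loss the first option risks.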

davidberenstein1957 commented on May 29, 2024

Additionally, there is an include_compound_words flag, which should allow the model to detect "big knife" based only on an initial similarity result for "knife".

This is also one of the features that isn't properly covered in the documentation.

Besides that, exclude_pos and exclude_dep aren't properly documented either.

davidberenstein1957 commented on May 29, 2024

I generally like to compose the behaviour of the patterns with the help of your rule-based matcher explorer: https://demos.explosion.ai/matcher.

koaning commented on May 29, 2024

Yeah, averaging the embeddings of inputs seems like it'll result in a bad time.

But it was indeed probably the include_compound_words feature that was missing from my initial trial.

There is also a third option, one that (hopefully) will get announced next week on our YouTube channel.

davidberenstein1957 commented on May 29, 2024

Now you've got me curious about the third option.

But cool that you are working on a tutorial. Let me know if there are any hiccups or features you might think of.

davidberenstein1957 commented on May 29, 2024

@koaning I closed this for now. Will review the solution after your blogpost.

koaning commented on May 29, 2024

It will be a two-part thing; the first part will be on YouTube. The thing about the solution, though, is that it is already implemented in another library 😉

davidberenstein1957 commented on May 29, 2024

That library being? 😅 Or are you talking about the doc.noun_chunks part?

koaning commented on May 29, 2024

https://twitter.com/explosion_ai/status/1579840012174204928

davidberenstein1957 commented on May 29, 2024

Cool. I'll do some testing and look into a way to integrate this.

koaning commented on May 29, 2024

There are likely some other integrations inbound, but yeah, s2v is a great trick.
