Comments (12)
I'm also thinking of a postprocessing trick now. If a token is detected as an entity, but it is part of a noun-chunk, we may also attempt to highlight the entire noun-chunk.
This would be for a separate tutorial, but I'm curious what you think of the idea.
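A rough sketch of that postprocessing idea, working on plain token offsets so it runs without a model. The function name `expand_to_noun_chunks` and the span tuples are hypothetical; in spaCy the two inputs would presumably come from `doc.ents` and `doc.noun_chunks`.

```python
from typing import List, Tuple

# (start_token, end_token) with an exclusive end, mirroring spaCy span offsets.
Span = Tuple[int, int]


def expand_to_noun_chunks(entities: List[Span], noun_chunks: List[Span]) -> List[Span]:
    """Widen each entity span to the noun chunk that fully contains it.

    Hypothetical helper: entities outside any noun chunk are left unchanged.
    """
    expanded = []
    for ent_start, ent_end in entities:
        new_span = (ent_start, ent_end)
        for chunk_start, chunk_end in noun_chunks:
            if chunk_start <= ent_start and ent_end <= chunk_end:
                new_span = (chunk_start, chunk_end)
                break
        expanded.append(new_span)
    return expanded


tokens = ["I", "cut", "it", "with", "a", "big", "knife"]
entities = [(6, 7)]                     # only "knife" was detected
noun_chunks = [(0, 1), (2, 3), (4, 7)]  # "I", "it", "a big knife"

expanded = expand_to_noun_chunks(entities, noun_chunks)
print([" ".join(tokens[start:end]) for start, end in expanded])
```

With these toy offsets the single-token entity "knife" is widened to "a big knife". Whether a leading determiner should be stripped is a separate design choice.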
from concise-concepts.
They are, but the n-grams do actually need to be present in the embedding model. If not, the algorithm doesn't have any input to expand over.
I can see 2 solutions:
- I could average the embeddings of the individual tokens in the n-gram. For "big knife" this will likely give much less accurate results.
- I could include only the token of the n-gram that is most likely to align with the label. For "big knife", "knife" aligns with "utensils", so the embedding of "knife" would be used as the fallback.
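The two fallbacks can be compared on toy data. This is a sketch, not how concise-concepts implements it: the 2-dimensional vectors are made up, and `average_fallback` and `best_token_fallback` are hypothetical helper names.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average_fallback(ngram: str, vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Option 1: average the vectors of the n-gram's known tokens."""
    known = [vectors[tok] for tok in ngram.split() if tok in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def best_token_fallback(ngram: str, label: str, vectors: Dict[str, Vector]) -> Optional[str]:
    """Option 2: pick the known token that is most similar to the label."""
    if label not in vectors:
        return None
    candidates = [tok for tok in ngram.split() if tok in vectors]
    if not candidates:
        return None
    return max(candidates, key=lambda tok: cosine(vectors[tok], vectors[label]))


# Made-up vectors for illustration; a real model has hundreds of dimensions.
vectors = {
    "big": [1.0, 0.0],
    "knife": [0.0, 1.0],
    "utensils": [0.1, 0.9],
}

averaged = average_fallback("big knife", vectors)
chosen = best_token_fallback("big knife", "utensils", vectors)
print(averaged, chosen)
```

In this toy setup the averaged vector sits further from "utensils" than the "knife" vector does, which is the accuracy loss the first option risks.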
from concise-concepts.
Additionally, there is a flag, `include_compound_words`, which should allow the model to detect "big knife" based on only having an initial similarity result for "knife".
This is also one of the features that isn't properly documented yet. The same goes for `exclude_pos` and `exclude_dep`.
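A hedged sketch of how these three flags might be set. The flag names are the ones mentioned above; the values, their casing, and the way the dictionary is handed to the pipeline are assumptions that should be checked against the library.

```python
# Illustrative values only; the tag and dependency names are placeholders.
concept_config = {
    # Let a match on "knife" extend to compounds such as "big knife".
    "include_compound_words": True,
    # Part-of-speech tags that should never be labelled (assumed values).
    "exclude_pos": ["VERB", "AUX"],
    # Dependency labels that should never be labelled (assumed values).
    "exclude_dep": ["DOBJ"],
}

# Assumption: these keys would be merged into the `config` argument of
# spaCy's nlp.add_pipe(...) call for the concise-concepts component.
print(concept_config)
```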
from concise-concepts.
I generally like to compose the behaviour of the patterns along with your rule-based matcher explorer https://demos.explosion.ai/matcher.
from concise-concepts.
Yeah, averaging the embeddings of inputs seems like it'll result in a bad time.
But it was indeed probably the `include_compound_words` feature that was missing from my initial trial.
There is also a third option, one that (hopefully) will get announced next week on our YouTube channel.
from concise-concepts.
Now you got me curious about the third option.
But cool that you are working on a tutorial. Let me know if there are any hiccups or features you might think of.
from concise-concepts.
@koaning I closed this for now. Will review the solution after your blogpost.
from concise-concepts.
It will be a two-part thing, the first part will be on YouTube. The thing about the solution though is that it is already implemented in another library 😉
from concise-concepts.
That library being? 😅 Or are you talking about the `doc.noun_chunks` part?
from concise-concepts.
https://twitter.com/explosion_ai/status/1579840012174204928
from concise-concepts.
Cool. I'll do some testing and look into a way to integrate this.
from concise-concepts.
There are likely some other integrations inbound, but yeah, s2v is a great trick.
from concise-concepts.