Comments (13)
from concise-concepts.
I am using Gensim 4.2.0 and still get the error. I tried several different environments and the error is the same.
from concise-concepts.
Sure, here is the code snippet I used:
import pandas as pd
df = pd.read_csv('IMDB Dataset.csv')
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec
sent = [row.split() for row in df['review']]
phrases = Phrases(sent, min_count=30, progress_per=10000)
bigram = Phraser(phrases)
sentences = bigram[sent]
w2v_model = Word2Vec(
    min_count=20,
    window=2,
    vector_size=200,
    sample=6e-5,
    alpha=0.03,
    min_alpha=0.0007,
    negative=20,
)
w2v_model.build_vocab(sentences, progress_per=10000)
w2v_model.train(sentences, total_examples=w2v_model.corpus_count, epochs=10, report_delay=1)
w2v_model.save("film.model")
import spacy
from spacy import displacy
import concise_concepts
nlp = spacy.load('en_core_web_md', disable=["ner"])
data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "fish", "lamb"]
}
model_path = "film.model"
nlp.add_pipe("concise_concepts", config={"data": data, "model_path": model_path})
from concise-concepts.
Hi David, I am facing the same error when passing my custom-trained word2vec model. I have tried every scenario you posted earlier, and I even referred to the word2vec model documentation to train my model as prescribed. I still get the error.
It happens even with this code snippet:
import spacy
from spacy import displacy
import concise_concepts
data = {
"display":["pixel","resolution","touchscreen"],
"performace":['multitask','processor','graphics','ram','hang'],
"storage":["internal","memory","expandable"],
"camera" :["focus","resolution","flash","photos"],
"Battery":["capacity","quick","charging"],
"connectivity":['gps','bluetooth','wifi','sim'],
"sensors":["light","proximity","compass","gyroscope"]
}
text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW.
'''
from gensim.test.utils import common_texts
from gensim.models import Word2Vec
model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")
model_path = "word2vec.model"
nlp = spacy.load("en_core_web_lg", disable=['ner'])
# ent_score for entity confidence scoring
nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})
doc = nlp(text)
Error:
~\anaconda3\lib\site-packages\concise_concepts\conceptualizer\Conceptualizer.py in verify_data(self, verbose)
107 for key, value in self.data.items():
108 verified_values = []
--> 109 if key.replace(" ", "_") not in self.kv:
110 if verbose:
111 logger.warning(f"key {key} not present in word2vec model")
TypeError: argument of type 'Word2Vec' is not iterable
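The traceback points at a membership test (`not in self.kv`) applied to a full Word2Vec object. Python's `in` operator needs `__contains__`, `__iter__`, or `__getitem__` on the right-hand operand, and the error suggests the loaded object provides none of them. In gensim 4 the word vectors live on the model's `.wv` attribute, which does support `in`. The sketch below reproduces the mechanism with stand-in classes so it runs without gensim installed; `FakeKeyedVectors`, `FakeWord2Vec`, and `as_keyed_vectors` are hypothetical names for illustration, not part of gensim or concise_concepts.

```python
class FakeKeyedVectors:
    """Stand-in for gensim's KeyedVectors: supports `word in kv`."""

    def __init__(self, words):
        self.key_to_index = {word: i for i, word in enumerate(words)}

    def __contains__(self, word):
        return word in self.key_to_index


class FakeWord2Vec:
    """Stand-in for a full Word2Vec model: the vectors live on `.wv`."""

    def __init__(self, words):
        self.wv = FakeKeyedVectors(words)


def as_keyed_vectors(model_or_kv):
    """Return the `.wv` attribute if present, else the object itself."""
    return getattr(model_or_kv, "wv", model_or_kv)


model = FakeWord2Vec(["human", "interface", "computer"])

try:
    # The stand-in model defines no __contains__, __iter__, or __getitem__,
    # so the membership test raises TypeError.
    "computer" in model
    membership_failed = False
except TypeError:
    membership_failed = True

# Unwrapping to the keyed vectors makes the same test work.
kv = as_keyed_vectors(model)
found = "computer" in kv
```

With real gensim objects, the same unwrapping idea would mean checking against `model.wv` (or saving and loading only the keyed vectors) rather than the model itself.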
from concise-concepts.
I'm taking a look this week.
from concise-concepts.
@prakhar251998 I also have this problem. Have you solved it somehow?
from concise-concepts.
Not yet, @GenVr. Waiting for @davidberenstein1957's fix for this part.
from concise-concepts.
@davidberenstein1957 Thanks.
First
I don't know if this helps, but I have gensim==4.2.0. I had a quick look at Conceptualizer.py, and it seems that in several functions (verify_data(), expand_concepts(), etc.) the error comes from a membership check like:
if key.replace(" ", "_") not in self.kv
However, self.kv is not the vocab keys here (I don't know whether this code expects self.kv to hold the vocab keys).
I tried replacing this check with:
keys_list = list(self.kv.wv.key_to_index.keys())
...
if key.replace(" ", "_") not in keys_list:
...
This happens multiple times in the library.
There are also other errors, such as:
self.kv.most_similar
which needs to be:
self.kv.wv.most_similar
and others like this.
Even after correcting these errors, everything runs, but the model mismatches my words.
Second
Then, I have a question, if possible.
I'm new to Gensim. I noticed that each key of the given dictionary must be in the Word2Vec vocab.
Example:
data = {
"word A": ["house", "home", ...],
"word B": ['display', 'smartphone', ...],
}
model = Word2Vec(sentences=common_texts, ...)
...
nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True, "model_path": model_path})
So word A and word B need to be in the model vocab; otherwise I get a key-not-found error. I guess the initial training sentences need to contain these keys?
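Judging by the check quoted earlier in the thread (`key.replace(" ", "_") not in self.kv`), keys are looked up with spaces replaced by underscores. A minimal sketch of a pre-check that could be run before adding the pipe, assuming the vocabulary is available as a plain set of strings (with gensim 4 it could be built from `model.wv.key_to_index`). `find_missing_terms` is a hypothetical helper, and applying the same normalization to the values is an assumption, not confirmed library behavior.

```python
def find_missing_terms(data, vocab):
    """Return (missing_keys, missing_values) for a concept dictionary.

    Terms are normalized like the quoted check: spaces become underscores.
    Normalizing the values the same way is an assumption.
    """
    def normalize(term):
        return term.replace(" ", "_")

    missing_keys = [key for key in data if normalize(key) not in vocab]
    missing_values = {
        key: [word for word in words if normalize(word) not in vocab]
        for key, words in data.items()
    }
    # Keep only the concepts that actually have missing words.
    missing_values = {k: v for k, v in missing_values.items() if v}
    return missing_keys, missing_values


# Illustrative vocabulary and data, not taken from a real model.
vocab = {"house", "home", "display", "word_A"}
data = {
    "word A": ["house", "home"],
    "word B": ["display", "smartphone"],
}
missing_keys, missing_values = find_missing_terms(data, vocab)
```

Here "word A" passes because "word_A" is in the illustrative vocabulary, while "word B" and "smartphone" are reported as missing.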
Thanks
from concise-concepts.
I just resolved this. @GenVr @prakhar251998 @akshaydevml thank you for the input!
from concise-concepts.
@davidberenstein1957 thanks. I have tried this code (with your new changes) but still have the error reported at the end.
import spacy
from spacy import displacy
import concise_concepts
from gensim.test.utils import common_texts
from gensim.models import Word2Vec
data = {
"display":["pixel","resolution","touchscreen"],
"performace":['multitask','processor','graphics','ram','hang'],
"storage":["internal","memory","expandable"],
"camera" :["focus","resolution","flash","photos"],
"Battery":["capacity","quick","charging"],
"connectivity":['gps','bluetooth','wifi','sim'],
"sensors":["light","proximity","compass","gyroscope"]
}
text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW.
'''
model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")
model_path = "word2vec.model"
nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})
Error:
WARNING:concise_concepts.conceptualizer.Conceptualizer:key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word pixel from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word resolution from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word touchscreen from key display not present in word2vec model
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-4778ce6d6aae> in <module>
1 nlp = spacy.load("en_core_web_lg", disable=['ner'])
----> 2 nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})
/usr/local/lib/python3.7/dist-packages/concise_concepts/conceptualizer/Conceptualizer.py in verify_data(self, verbose)
182 verified_values
183 ), f"None of the entries for key {key} are present in the word2vec model"
--> 184 self.data = deepcopy(verified_data)
185 self.original_data = deepcopy(self.data)
186
AssertionError: None of the entries for key display are present in the word2vec model
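The assertion is consistent with the training data: the model here is trained on gensim's `common_texts`, a nine-sentence toy corpus, so its vocabulary is tiny and contains none of the phone-related words. The sketch below approximates how a vocabulary is derived from tokenized sentences with a `min_count` threshold and checks the overlap with a concept dictionary. The corpus literal is assumed to mirror `common_texts`, and `build_vocab` is a hypothetical helper, not gensim's implementation.

```python
from collections import Counter

# Toy corpus; assumed to mirror gensim.test.utils.common_texts.
toy_corpus = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["eps", "user", "interface", "system"],
    ["system", "human", "system", "eps"],
    ["user", "response", "time"],
    ["trees"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["graph", "minors", "survey"],
]


def build_vocab(sentences, min_count=1):
    """Keep words whose total frequency is at least min_count."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


data = {
    "display": ["pixel", "resolution", "touchscreen"],
    "camera": ["focus", "resolution", "flash", "photos"],
}

vocab = build_vocab(toy_corpus, min_count=1)
# For each concept, keep only the words the vocabulary knows.
usable = {key: [w for w in words if w in vocab] for key, words in data.items()}
# Concepts with no usable words would trip an "at least one entry" check.
empty_concepts = [key for key, words in usable.items() if not words]
```

Training on a corpus that actually contains the concept words (and choosing `min_count` low enough that they survive pruning) is what such a check would point toward.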
from concise-concepts.