Still unable to pass in a custom Gensim model · concise-concepts issue #10 (closed)

akshaydevml commented on May 30, 2024

Still unable to pass in a custom Gensim model

from concise-concepts.

Comments (13)

davidberenstein1957 commented on May 30, 2024

Hello, I believe this was resolved by installing the dependencies required by the package (Gensim >= 4). Regards, David

On 5 Jun 2022, at 15:24, akshaydevml wrote:
Raised an issue earlier regarding the same problem, and @davidberenstein1957 committed a fix and posted this code block as a solution:

import spacy
from spacy import displacy
import concise_concepts

data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato", "garlic", "onion", "beans"],
    "meat": ["beef", "pork", "fish", "lamb", "bacon", "ham", "meatball"],
    "dairy": ["milk", "butter", "eggs", "cheese", "cheddar", "yoghurt", "egg"],
    "herbs": ["rosemary", "salt", "sage", "basil", "cilantro"],
    "carbs": ["bread", "rice", "toast", "tortilla", "noodles", "bagel", "croissant"],
}

text = """
Heat the oil in a large pan and add the Onion, celery and carrots. Then, cook over a medium–low heat for 10 minutes, or until softened. Add the courgette, garlic, red peppers and oregano and cook for 2–3 minutes. Later, add some oranges and chickens.
"""

model_path = "word2vec.model"

nlp = spacy.load("en_core_web_md", disable=["ner"])
nlp.add_pipe(
    "concise_concepts",
    config={
        "data": data,
        "model_path": model_path,
        "ent_score": True,
    },
)
doc = nlp(text)

options = {
    "colors": {
        "fruit": "darkorange",
        "vegetable": "limegreen",
        "meat": "salmon",
        "dairy": "lightblue",
        "herbs": "darkgreen",
        "carbs": "lightbrown",
    },
    "ents": ["fruit", "vegetable", "meat", "dairy", "herbs", "carbs"],
}

ents = doc.ents
for ent in ents:
    new_label = f"{ent.label_} ({float(ent.ent_score):.0%})"
    # ent.label is an integer hash; the string label_ is needed for .lower()
    options["colors"][new_label] = options["colors"].get(ent.label_.lower(), None)
    options["ents"].append(new_label)
    ent.label_ = new_label
doc.ents = ents

displacy.render(doc, style="ent", options=options)

However, I am still getting the 'Word2Vec object is not iterable' error. Could you please look into it?
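A small, spaCy-independent sketch of the relabelling step in the snippet above can help isolate the color lookup. It assumes scores are floats in [0, 1]; `scored_options` is a hypothetical helper, not part of concise_concepts or spaCy. It looks colors up by the string label, which in spaCy is `ent.label_` (`ent.label` is an integer hash and has no `.lower()`).

```python
def scored_options(scored_ents, base_colors):
    """Build displacy "ent" options whose labels carry a confidence score.

    scored_ents: iterable of (label, score) pairs, with score in [0, 1].
    base_colors: {lowercase label: color}, e.g. {"fruit": "darkorange"}.
    Returns (options, new_labels), where a new label looks like "fruit (75%)".
    """
    # Copy the inputs so the caller's color mapping is not mutated.
    options = {"colors": dict(base_colors), "ents": list(base_colors)}
    new_labels = []
    for label, score in scored_ents:
        new_label = f"{label} ({score:.0%})"
        # Reuse the plain label's color; labels without a color map to None.
        options["colors"][new_label] = base_colors.get(label.lower())
        if new_label not in options["ents"]:
            options["ents"].append(new_label)
        new_labels.append(new_label)
    return options, new_labels


colors = {"fruit": "darkorange", "meat": "salmon"}
options, labels = scored_options([("fruit", 0.75), ("meat", 0.5)], colors)
print(labels)
```

With a parsed `doc`, the pairs could be built as `[(ent.label_, float(ent.ent_score)) for ent in doc.ents]`, following the score attribute used in the quoted snippet.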

akshaydevml commented on May 30, 2024

I am using Gensim 4.2.0 and am still getting the error. I tried multiple environments and got the same error each time.

davidberenstein1957 commented on May 30, 2024

Could you send me some reproducible code and files you are using?

On 5 Jun 2022, at 15:57, akshaydevml wrote: I am using Gensim 4.2.0 and still getting the error, tried in multiple different environments, still the same error

akshaydevml commented on May 30, 2024

Sure, here is the code snippet I used:

import pandas as pd
import spacy
from spacy import displacy
import concise_concepts
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser

df = pd.read_csv('IMDB Dataset.csv')

sent = [row.split() for row in df['review']]
phrases = Phrases(sent, min_count=30, progress_per=10000)
bigram = Phraser(phrases)
sentences = bigram[sent]

w2v_model = Word2Vec(
    min_count=20,
    window=2,
    vector_size=200,
    sample=6e-5,
    alpha=0.03,
    min_alpha=0.0007,
    negative=20,
)
w2v_model.build_vocab(sentences, progress_per=10000)
w2v_model.train(sentences, total_examples=w2v_model.corpus_count, epochs=10, report_delay=1)
w2v_model.save("film.model")

nlp = spacy.load('en_core_web_md', disable=["ner"])
data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "fish", "lamb"],
}

model_path = "film.model"

nlp.add_pipe("concise_concepts", config={"data": data, "model_path": model_path})
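Before calling `nlp.add_pipe`, it can help to check which labels and words actually made it into the trained vocabulary, since `min_count=20` drops rare tokens and the bigram step can merge pairs of words into single tokens. A minimal sketch, assuming the vocabulary is passed as any container supporting `in`, such as `set(w2v_model.wv.key_to_index)` in Gensim 4; `missing_terms` is a hypothetical helper. Multi-word terms are looked up with spaces replaced by underscores, mirroring the `key.replace(" ", "_")` check in concise_concepts' `verify_data` and the default `_` delimiter that Gensim's `Phrases` uses to join bigrams.

```python
def missing_terms(data, vocab):
    """Return {label: [missing terms]} for a concise_concepts-style data dict.

    data:  {label: [word, ...]}
    vocab: any container supporting `in`, e.g. set(w2v_model.wv.key_to_index).
    Each label is checked together with its words; spaces become underscores,
    which is how Phrases-joined bigrams are stored.
    """
    report = {}
    for label, words in data.items():
        missing = [term for term in [label, *words]
                   if term.replace(" ", "_") not in vocab]
        if missing:
            report[label] = missing
    return report


data = {"fruit": ["apple", "pear", "orange"], "dessert": ["ice cream"]}
vocab = {"fruit", "apple", "pear", "ice_cream"}
print(missing_terms(data, vocab))
```

An empty result means every label and word has a vector; any label listed with all of its words missing is likely to make the pipeline fail at load time.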

prakhar251998 commented on May 30, 2024

Hi David, I am facing the same error while trying to pass my custom-trained word2vec model. I have tried every scenario you posted earlier, and I even followed the Word2Vec documentation to train my model as prescribed. I still get the error.
I get it even with this code snippet:

import spacy
from spacy import displacy
import concise_concepts
data = {
"display":["pixel","resolution","touchscreen"],
"performace":['multitask','processor','graphics','ram','hang'],
"storage":["internal","memory","expandable"],
"camera" :["focus","resolution","flash","photos"],
"Battery":["capacity","quick","charging"],
"connectivity":['gps','bluetooth','wifi','sim'],
"sensors":["light","proximity","compass","gyroscope"]
}

text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW.
'''

from gensim.test.utils import common_texts
from gensim.models import Word2Vec
model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")

model_path = "word2vec.model"  # must match the file name saved above
nlp = spacy.load("en_core_web_lg", disable=['ner'])

# ent_score for entity confidence scoring

nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})
doc = nlp(text)

Error:

~\anaconda3\lib\site-packages\concise_concepts\conceptualizer\Conceptualizer.py in verify_data(self, verbose)
107 for key, value in self.data.items():
108 verified_values = []
--> 109 if key.replace(" ", "_") not in self.kv:
110 if verbose:
111 logger.warning(f"key {key} not present in word2vec model")

TypeError: argument of type 'Word2Vec' is not iterable
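The `TypeError` comes from running `in` against the full `Word2Vec` object, which (as this traceback shows under Gensim 4) does not support membership tests. In Gensim 4 the vocabulary and lookups such as `most_similar` live on the model's `KeyedVectors`, exposed as `model.wv`. One way a caller or library could accept either object is a small duck-typed adapter; `as_keyed_vectors` and `has_key` are hypothetical helpers, not the fix that was eventually committed to concise_concepts.

```python
def as_keyed_vectors(model):
    """Return the KeyedVectors behind a gensim model (hypothetical helper).

    A full Word2Vec model keeps its vectors and vocabulary on `.wv`; an
    object without `.wv` (e.g. a KeyedVectors instance) is returned as is.
    """
    return getattr(model, "wv", model)


def has_key(model, key):
    """Membership test that works for a Word2Vec model or KeyedVectors.

    Spaces are replaced by underscores, as in concise_concepts' own check.
    """
    return key.replace(" ", "_") in as_keyed_vectors(model)
```

For example, `kv = as_keyed_vectors(Word2Vec.load("word2vec.model"))` gives an object on which `"human" in kv` and `kv.most_similar("human")` both work, while an already-loaded `KeyedVectors` is passed through unchanged.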

davidberenstein1957 commented on May 30, 2024

I'm taking a look this week.

GenVr commented on May 30, 2024

@prakhar251998 I also have this problem. Have you solved it somehow?

prakhar251998 commented on May 30, 2024

Not yet, @GenVr. Waiting for @davidberenstein1957 to push a fix for this part.

davidberenstein1957 commented on May 30, 2024

Hello, I made some initial progress last week and should be able to wrap it up in the coming week. Regards, David

On 20 Sep 2022, at 07:41, prakhar251998 wrote: Not yet @GenVr. Waiting for @davidberenstein1957 update fix on this part

GenVr commented on May 30, 2024

@davidberenstein1957 Thanks.

First

I don't know if this helps: I have gensim==4.2.0, and from a quick look at Conceptualizer.py it seems that in several places (in functions such as verify_data(), expand_concepts(), etc.) the error comes from a membership check like:

if key.replace(" ", "_") not in self.kv:

Here, however, self.kv is the Word2Vec model itself, not its vocabulary keys (I don't know whether the code expects self.kv to hold the vocabulary keys).

I tried replacing this check with:

keys_list = list(self.kv.wv.key_to_index.keys())
...
if key.replace(" ", "_") not in keys_list:
    ...

This happens multiple times in the library.

There are also other errors, such as:
self.kv.most_similar

which needs to be:

self.kv.wv.most_similar

and others like it.

Even after correcting these errors, everything runs, but the model mismatches my words.

Second

I also have a question, if I may.
I'm new to Gensim, and I noticed that the keys of the given dictionary must be in the Word2Vec vocabulary.

Example:

data = {
    "word A": ["house", "home", ...],
    "word B": ['display', 'smartphone', ...],
}


model = Word2Vec(sentences=common_texts, ...)

...

nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True, "model_path": model_path})

So "word A" and "word B" need to be in the model vocabulary; otherwise I get a key-not-found error. I guess the initial training sentences need to contain these keys?

Thanks
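On the question above: Word2Vec only learns vectors for tokens that appear in its training sentences at least `min_count` times. `gensim.test.utils.common_texts` is a toy corpus of nine short tokenized sentences (words like "human", "computer", "system", "graph"), so labels such as "display" or "pixel" cannot be in a model trained on it. The sketch below copies that corpus and reproduces only the `min_count` frequency filter with `collections.Counter`; it is a simplification that ignores Gensim's other vocabulary options such as `max_vocab_size` and `trim_rule`.

```python
from collections import Counter

# Contents of gensim.test.utils.common_texts (9 tokenized toy sentences).
COMMON_TEXTS = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["eps", "user", "interface", "system"],
    ["system", "human", "system", "eps"],
    ["user", "response", "time"],
    ["trees"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["graph", "minors", "survey"],
]


def vocabulary(sentences, min_count=1):
    """Tokens kept by a min_count frequency filter (simplified Word2Vec vocab)."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}


vocab = vocabulary(COMMON_TEXTS, min_count=1)
print(sorted(vocab))
print("display" in vocab, "computer" in vocab)
```

So the training sentences do need to contain the dictionary keys and their words, and often enough to pass `min_count`.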

davidberenstein1957 commented on May 30, 2024

I just resolved this. @GenVr @prakhar251998 @akshaydevml thank you for the input!

GenVr commented on May 30, 2024

@davidberenstein1957 thanks. I have tried this code (with your new changes) but still get the error reported at the end.

import spacy
from spacy import displacy
import concise_concepts
from gensim.test.utils import common_texts
from gensim.models import Word2Vec

data = {
"display":["pixel","resolution","touchscreen"],
"performace":['multitask','processor','graphics','ram','hang'],
"storage":["internal","memory","expandable"],
"camera" :["focus","resolution","flash","photos"],
"Battery":["capacity","quick","charging"],
"connectivity":['gps','bluetooth','wifi','sim'],
"sensors":["light","proximity","compass","gyroscope"]
}

text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW.
'''

model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")
model_path = "word2vec.model"

nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})

Error:


WARNING:concise_concepts.conceptualizer.Conceptualizer:key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word pixel from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word resolution from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word touchscreen from key display not present in word2vec model

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-4778ce6d6aae> in <module>
      1 nlp = spacy.load("en_core_web_lg", disable=['ner'])
----> 2 nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})


/usr/local/lib/python3.7/dist-packages/concise_concepts/conceptualizer/Conceptualizer.py in verify_data(self, verbose)
    182                 verified_values
    183             ), f"None of the entries for key {key} are present in the word2vec model"
--> 184         self.data = deepcopy(verified_data)
    185         self.original_data = deepcopy(self.data)
    186 

AssertionError: None of the entries for key display are present in the word2vec model

davidberenstein1957 commented on May 30, 2024

Hello, this is actually expected behaviour, since you are trying to match a label and words that are not present in the trained word2vec model. You first get warnings about the missing keys and words, but since none of the data is available in the model, it then raises an error. It did lead me to find another small implementation error in the n-gram support, so keep the feedback coming! Regards, David

On 26 Sept 2022, at 14:11, GennaroV wrote: @davidberenstein1957 thanks. I have tried this code (with your new changes) but still have the error reported at the end.
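The behaviour described above (a warning for each missing key or word, then a failure when a key has no usable words left) can be sketched in plain Python. This is a simplified illustration, not concise_concepts' actual `verify_data`: `filter_concepts` and the logger name are hypothetical, it raises `ValueError` where the library's traceback shows an `AssertionError`, and it keeps a key whose label is missing as long as some of its words are present, which may differ from the library.

```python
import logging

logger = logging.getLogger("concept_check")  # hypothetical logger name


def filter_concepts(data, vocab):
    """Keep only words found in `vocab`; fail if a key has none left.

    data:  {label: [word, ...]}
    vocab: any container supporting `in` (e.g. a set of word2vec keys).
    """
    verified = {}
    for key, words in data.items():
        if key.replace(" ", "_") not in vocab:
            logger.warning("key %s not present in word2vec model", key)
        kept = []
        for word in words:
            if word.replace(" ", "_") in vocab:
                kept.append(word)
            else:
                logger.warning("word %s from key %s not present in word2vec model", word, key)
        if not kept:
            # Nothing to expand this concept from, so stop early.
            raise ValueError(f"None of the entries for key {key} are present in the word2vec model")
        verified[key] = kept
    return verified
```

With a vocabulary built from `common_texts`, a key like "display" with words "pixel" and "resolution" ends in the error, which matches what the thread reports.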
