
Comments (13)

davidberenstein1957 avatar davidberenstein1957 commented on May 30, 2024

from concise-concepts.

akshaydevml avatar akshaydevml commented on May 30, 2024

I am using Gensim 4.2.0 and am still getting the error. I tried several different environments and got the same error each time.

davidberenstein1957 avatar davidberenstein1957 commented on May 30, 2024

akshaydevml avatar akshaydevml commented on May 30, 2024

Sure, here is the code snippet I used:

import pandas as pd
df = pd.read_csv('IMDB Dataset.csv')

from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec
sent = [row.split() for row in df['review']]
phrases = Phrases(sent, min_count=30, progress_per=10000)
bigram = Phraser(phrases)
sentences = bigram[sent]

w2v_model = Word2Vec(
    min_count=20,
    window=2,
    vector_size=200,
    sample=6e-5,
    alpha=0.03,
    min_alpha=0.0007,
    negative=20,
)
w2v_model.build_vocab(sentences, progress_per=10000)
w2v_model.train(sentences, total_examples=w2v_model.corpus_count, epochs=10, report_delay=1)
w2v_model.save("film.model")

import spacy
from spacy import displacy
import concise_concepts
nlp = spacy.load('en_core_web_md', disable=["ner"])
data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "fish", "lamb"],
}

model_path = "film.model"

nlp.add_pipe("concise_concepts", config={"data": data, "model_path": model_path})

prakhar251998 avatar prakhar251998 commented on May 30, 2024

Hi David, I am facing the same error when passing my custom-trained word2vec model. I have tried every scenario you posted earlier, and I even referred to the word2vec model documentation to train my model as prescribed, but I still get the error.
It happens even with this code snippet:

import spacy
from spacy import displacy
import concise_concepts
data = {
    "display": ["pixel", "resolution", "touchscreen"],
    "performace": ['multitask', 'processor', 'graphics', 'ram', 'hang'],
    "storage": ["internal", "memory", "expandable"],
    "camera": ["focus", "resolution", "flash", "photos"],
    "Battery": ["capacity", "quick", "charging"],
    "connectivity": ['gps', 'bluetooth', 'wifi', 'sim'],
    "sensors": ["light", "proximity", "compass", "gyroscope"],
}

text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW.
'''

from gensim.test.utils import common_texts
from gensim.models import Word2Vec
model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")

model_path = "word2vec.model"
nlp = spacy.load("en_core_web_lg", disable=['ner'])

# ent_score for entity confidence scoring

nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})
doc = nlp(text)

Error:

~\anaconda3\lib\site-packages\concise_concepts\conceptualizer\Conceptualizer.py in verify_data(self, verbose)
107 for key, value in self.data.items():
108 verified_values = []
--> 109 if key.replace(" ", "_") not in self.kv:
110 if verbose:
111 logger.warning(f"key {key} not present in word2vec model")

TypeError: argument of type 'Word2Vec' is not iterable
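
The traceback above points at a membership test (`key ... not in self.kv`) applied to a full `Word2Vec` object. As a rough illustration of why that raises, here is a minimal sketch using stand-in classes rather than gensim itself. `Word2VecStandIn`, `KeyedVectorsStandIn`, and `unwrap_keyed_vectors` are hypothetical names invented for this example. The assumption is that the real model object supports neither `__contains__` nor iteration, while the object it exposes as `.wv` does support `in`.

```python
class KeyedVectorsStandIn:
    """Hypothetical stand-in for the object gensim exposes as `model.wv`."""

    def __init__(self, words):
        # Maps each word to its row index, similar in spirit to `key_to_index`.
        self.key_to_index = {word: i for i, word in enumerate(words)}

    def __contains__(self, word):
        return word in self.key_to_index


class Word2VecStandIn:
    """Hypothetical stand-in for a full model: no __contains__, no __iter__."""

    def __init__(self, words):
        self.wv = KeyedVectorsStandIn(words)


def unwrap_keyed_vectors(model_or_kv):
    """Return `.wv` when the object has one, otherwise the object itself."""
    return getattr(model_or_kv, "wv", model_or_kv)


model = Word2VecStandIn(["human", "computer", "system"])

try:
    "computer" in model  # same shape as `key not in self.kv` in the traceback
    raised = False
except TypeError:
    raised = True

kv = unwrap_keyed_vectors(model)
found = "computer" in kv
```

With the stand-ins, the membership test on the model raises `TypeError`, while the same test on the unwrapped vectors succeeds. Whether the library should unwrap `.wv` itself or expect a vectors-only file is a design decision for the maintainer.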

davidberenstein1957 avatar davidberenstein1957 commented on May 30, 2024

I'm taking a look this week.

GenVr avatar GenVr commented on May 30, 2024

@prakhar251998 I also have this problem. Have you solved it somehow?

prakhar251998 avatar prakhar251998 commented on May 30, 2024

Not yet, @GenVr. I'm waiting for @davidberenstein1957's fix for this part.

davidberenstein1957 avatar davidberenstein1957 commented on May 30, 2024

GenVr avatar GenVr commented on May 30, 2024

@davidberenstein1957 Thanks.

First

I don't know if this helps, but I have gensim==4.2.0. I took a quick look at Conceptualizer.py, and in several functions (verify_data(), expand_concepts(), etc.) the error seems to come from a membership check like:

if key.replace(" ", "_") not in self.kv

However, self.kv here is not the vocab keys (I don't know whether this code expects self.kv to hold the vocab keys).

I tried replacing this check with:

keys_list = list(self.kv.wv.key_to_index.keys())
...
if key.replace(" ", "_") not in keys_list:
    ...

This happens multiple times in the library.

There are also other errors, such as:
self.kv.most_similar

that need to be:

self.kv.wv.most_similar

and others like this.

Even after correcting these errors, everything runs, but the model mismatches my words.
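
A small note on the workaround above: in gensim 4.x, `key_to_index` is a plain dict, so membership can be tested against it directly without first copying the keys into a list. The sketch below uses a hand-written mapping in place of a real `model.wv.key_to_index`. `normalise_term` and `in_vocab` are hypothetical helper names, and the space-to-underscore step simply mirrors the `key.replace(" ", "_")` call visible in the traceback.

```python
def normalise_term(term):
    """Join multi-word terms with underscores, as the traceback's check does."""
    return term.replace(" ", "_")


def in_vocab(term, key_to_index):
    """True if the normalised term is a key of the vocabulary mapping."""
    return normalise_term(term) in key_to_index


# Illustrative mapping (an assumption); with gensim 4.x this would be
# `model.wv.key_to_index`.
key_to_index = {"human": 0, "computer": 1, "user_interface": 2}

checks = {
    term: in_vocab(term, key_to_index)
    for term in ["computer", "user interface", "pixel"]
}
```

Dict lookups are constant time on average, whereas `in` on a list scans every element, which can matter when the check runs once per concept word against a large vocabulary.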

Second

I also have a question, if I may.
I'm new to Gensim, and I noticed that the keys of the given dictionary must be in the Word2Vec vocab.

Example:

data = {
    "word A": ["house", "home", ...],
    "word B": ['display', 'smartphone', ...],
}


model = Word2Vec(sentences=common_texts, ...)

...

nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True, "model_path": model_path})

So word A and word B need to be in the model vocab; otherwise I get a key-not-found error. I guess the initial training sentences need to contain these keys?

Thanks

davidberenstein1957 avatar davidberenstein1957 commented on May 30, 2024

I just resolved this. @GenVr @prakhar251998 @akshaydevml thank you for the input!

GenVr avatar GenVr commented on May 30, 2024

@davidberenstein1957 thanks. I have tried this code (with your new changes) but still get the error shown at the end.

import spacy
from spacy import displacy
import concise_concepts
from gensim.test.utils import common_texts
from gensim.models import Word2Vec

data = {
    "display": ["pixel", "resolution", "touchscreen"],
    "performace": ['multitask', 'processor', 'graphics', 'ram', 'hang'],
    "storage": ["internal", "memory", "expandable"],
    "camera": ["focus", "resolution", "flash", "photos"],
    "Battery": ["capacity", "quick", "charging"],
    "connectivity": ['gps', 'bluetooth', 'wifi', 'sim'],
    "sensors": ["light", "proximity", "compass", "gyroscope"],
}

text = '''believe me, it's the slowest mobile I saw. Don't go on screen and Battery, it is an extremely slow mobile phone and takes ages to open and navigate. Forget about heavy use, it can't handle normal regular use. I made a huge mistake but pls don't buy this mobile. It's only a few months and I am thinking to change it. Its dam SLOW SLOW SLOW.
'''

model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")
model_path = "word2vec.model"

nlp = spacy.load("en_core_web_lg", disable=['ner'])
nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})

Error:


WARNING:concise_concepts.conceptualizer.Conceptualizer:key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word pixel from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word resolution from key display not present in word2vec model
WARNING:concise_concepts.conceptualizer.Conceptualizer:word touchscreen from key display not present in word2vec model

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-4778ce6d6aae> in <module>
      1 nlp = spacy.load("en_core_web_lg", disable=['ner'])
----> 2 nlp.add_pipe("concise_concepts", config={"data": data,"model_path": model_path})


/usr/local/lib/python3.7/dist-packages/concise_concepts/conceptualizer/Conceptualizer.py in verify_data(self, verbose)
    182                 verified_values
    183             ), f"None of the entries for key {key} are present in the word2vec model"
--> 184         self.data = deepcopy(verified_data)
    185         self.original_data = deepcopy(self.data)
    186 

AssertionError: None of the entries for key display are present in the word2vec model
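
The warnings and the assertion above suggest that none of the words listed under "display" exist in the vocabulary of a model trained on a very small corpus. One way to see this before calling `nlp.add_pipe` is to compare the concept dictionary against the vocabulary yourself. The following is only a sketch: `partition_concepts` and `empty_concepts` are hypothetical helpers, and the `vocab` set is a hand-written toy vocabulary standing in for `model.wv.key_to_index`.

```python
def partition_concepts(data, vocab):
    """Split each concept's entries into those present in and missing from vocab."""
    present, missing = {}, {}
    for concept, words in data.items():
        present[concept] = [w for w in words if w.replace(" ", "_") in vocab]
        missing[concept] = [w for w in words if w.replace(" ", "_") not in vocab]
    return present, missing


def empty_concepts(present):
    """Concepts with no surviving entries, i.e. the case the assertion reports."""
    return [concept for concept, words in present.items() if not words]


# Toy vocabulary (an assumption for illustration); with gensim 4.x you would
# pass `model.wv.key_to_index` instead.
vocab = {"human", "interface", "computer", "survey", "user", "system",
         "response", "time", "eps", "trees", "graph", "minors"}

data = {
    "display": ["pixel", "resolution", "touchscreen"],
    "system": ["computer", "interface", "gpu"],
}

present, missing = partition_concepts(data, vocab)
unusable = empty_concepts(present)
```

Here "display" ends up with no usable entries, which matches the situation in the error message. Training on a corpus that actually contains the concept words, or loading vectors that do, would be the usual way out.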

davidberenstein1957 avatar davidberenstein1957 commented on May 30, 2024
