dborrelli / chat-intents

Clustering sentence embeddings to extract message intent

License: MIT License

Jupyter Notebook 97.79% Python 2.21%

nlp sentence-embeddings clustering unsupervised-learning document-embeddings

chat-intents's Issues

How can I use all CPUs when tuning hyperparams

@dborrelli When I specify a value for the "random_state" parameter in "bayesian_search", I receive the following warning: "UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism."

Hyperparameter tuning is taking a significant amount of time. I want to use the "random_state" parameter to ensure reproducibility while also setting "n_jobs" to -1 to enable parallel processing. What's the best way to achieve this?
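
The warning text says that fixing "random_state" forces the seeded fit to run on a single thread. One possible workaround, sketched below under stated assumptions, is to leave each seeded fit single-threaded and instead evaluate several hyperparameter candidates concurrently. This is a plain parallel search over a fixed candidate list, not Bayesian search, because Bayesian trials depend on earlier results. The function evaluate_candidate is a hypothetical stand-in for one seeded embed, cluster and score run; it is not part of chat-intents.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def evaluate_candidate(params, seed):
    """Hypothetical stand-in for one seeded, single-threaded fit and its cost."""
    rng = random.Random(seed)  # per-trial seed keeps each result reproducible
    noise = rng.random() * 0.01
    return abs(params["n_neighbors"] - 15) / 100 + noise


def search(candidates, seed, executor):
    """Score all candidates concurrently; return (best_params, best_cost)."""
    # Derive one fixed seed per candidate so results do not depend on scheduling.
    futures = [
        executor.submit(evaluate_candidate, params, seed + i)
        for i, params in enumerate(candidates)
    ]
    costs = [f.result() for f in futures]  # collected in submission order
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best], costs[best]


candidates = [{"n_neighbors": n} for n in (5, 10, 15, 30, 50)]

# Two independent runs with the same base seed should agree exactly.
with ThreadPoolExecutor(max_workers=4) as pool:
    first = search(candidates, seed=42, executor=pool)
with ThreadPoolExecutor(max_workers=4) as pool:
    second = search(candidates, seed=42, executor=pool)
```

Threads keep the sketch short; for CPU-bound fits a process pool from the same concurrent.futures module may be the better choice, since it offers the same submit interface.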

Label extraction only for English.

Hi,
I am using chat-intents and the clustering works very well.
However, I am working with French data and the label extraction gives poor results. I assume this is because the method always uses a spaCy model specialized for English.
Could the name of the spaCy model to load, or at least the language, be passed as a parameter, for example to apply_and_summarize_labels?
That way, the results could be much better for languages other than English.

Install not working

!pip install chatintents
leads to

ERROR: Could not find a version that satisfies the requirement chatintents (from versions: none)
ERROR: No matching distribution found for chatintents

I'm using Google Colab.

How to cap the maximum number of topics generated?

Hi, how do I put a limit on the maximum number of topics to be generated? And on the minimum? Is there a way to do this within the hyperparameter optimization?

Thanks,
Ari
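
One common way to steer a search toward a desired number of topics is a soft limit: add a penalty to the cost whenever the cluster count falls outside a chosen range. The sketch below illustrates that idea only. The names count_clusters and penalized_cost, the penalty value, and the base cost are assumptions for illustration, not chat-intents API. The label -1 is treated as noise, following the HDBSCAN convention.

```python
def count_clusters(labels):
    """Number of distinct clusters, ignoring the noise label -1."""
    return len(set(labels) - {-1})


def penalized_cost(labels, base_cost, min_clusters, max_clusters, penalty=0.15):
    """Return base_cost, plus a fixed penalty if the cluster count is out of range."""
    n_clusters = count_clusters(labels)
    if n_clusters < min_clusters or n_clusters > max_clusters:
        return base_cost + penalty
    return base_cost


labels_in_range = [0, 0, 1, 2, -1, 1]    # 3 clusters plus one noise point
labels_too_many = list(range(12))        # 12 clusters, above the cap of 10

cost_in_range = penalized_cost(labels_in_range, 0.20, min_clusters=2, max_clusters=10)
cost_too_many = penalized_cost(labels_too_many, 0.20, min_clusters=2, max_clusters=10)
```

Because the penalty only raises the cost, an optimizer that minimizes this value is discouraged from, but not strictly prevented from, choosing settings outside the range.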

AttributeError: 'numpy.ndarray' object has no attribute 'unique'

Hi, calling apply_and_summarize_labels raises the error below. Please help.

df_summary, labeled_docs = model.apply_and_summarize_labels(data_sample.sentence)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_21756/2555802596.py in <module>
----> 1 df_summary, labeled_docs = model.apply_and_summarize_labels(data_sample.sentence)

/opt/conda/lib/python3.9/site-packages/chatintents/ChatIntents.py in apply_and_summarize_labels(self, df_data)
    418         df_clustered[category_col] = self.best_clusters.labels_
    419 
--> 420         numerical_labels = df_clustered[category_col].unique()
    421 
    422         # create dictionary mapping the numerical category to the generated

AttributeError: 'numpy.ndarray' object has no attribute 'unique'
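
In the traceback the parameter is named df_data, while the call passes data_sample.sentence, which is a pandas Series rather than a DataFrame. One plausible cause, not confirmed by the source, is that the method expects a DataFrame. The sketch below shows only the pandas side of that difference; the column name "label" and the label values are made up for illustration.

```python
import numpy as np
import pandas as pd

data_sample = pd.DataFrame({"sentence": ["hello", "hi there", "cancel my order"]})
labels = np.array([0, 0, 1])  # stand-in for cluster labels, one per row

as_series = data_sample.sentence       # attribute access returns a Series
as_frame = data_sample[["sentence"]]   # double brackets keep a DataFrame

df_clustered = as_frame.copy()
df_clustered["label"] = labels         # adds a column with one label per row
numerical_labels = df_clustered["label"].unique()  # column is a Series, so unique() exists
```

If that diagnosis applies, passing data_sample[["sentence"]] instead of data_sample.sentence would be the first thing to try.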
