Comments (4)
Hi @dmresearch15. Thanks for reporting this issue.
It looks like there is a bug: looking up synonyms for an unseen word raises an error instead of returning a result.
We definitely need to fix it.
from h2o-3.
I reproduced the error with this code:
library(h2o)
h2o.init()
job_titles <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv", col.names = c("category", "jobtitle"), col.types = c("String", "String"), header = TRUE)
words <- h2o.tokenize(job_titles, " ")
vec <- h2o.word2vec(training_frame = words)
# pass: "teacher" is in the model vocabulary
syn <- h2o.findSynonyms(vec, "teacher", count = 20)
print(syn)
# fail: "Tteacher" is an unseen word
syn2 <- h2o.findSynonyms(vec, "Tteacher", count = 20)
print(syn2)
I'm currently incorporating this into my project, so it would be helpful to have a timeframe for resolving this issue.
Hi @dmresearch15, I fixed the bug in the R API here: #16280. Hopefully, this change will be released in the fix release at the end of the week.
Previously, if the model couldn't find any synonyms, the call failed with the error you shared. The open question is why your model can find synonyms for "national" but not for "National". You may need to tune your model a little.
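One possible way to act on the "national" vs. "National" observation, as a minimal sketch rather than the change made in #16280: lowercase the tokens before training, lowercase the query, and check the vocabulary before calling h2o.findSynonyms. The helper name find_synonyms_safe is hypothetical, and the sketch assumes that the first column of the frame returned by h2o.toFrame holds the vocabulary words.

```r
library(h2o)
h2o.init()

# Same public data set as in the reproduction earlier in this thread.
job_titles <- h2o.importFile(
  "https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv",
  col.names = c("category", "jobtitle"),
  col.types = c("String", "String"),
  header = TRUE
)

# Lowercase the tokens so "National" and "national" share one vocabulary entry.
words <- h2o.tolower(h2o.tokenize(job_titles, " "))
vec <- h2o.word2vec(training_frame = words)

# Assumption: the first column of h2o.toFrame() holds the vocabulary words.
vocab <- as.character(as.data.frame(h2o.toFrame(vec))[, 1])

# Hypothetical helper: normalise the query and return NULL for unseen words
# instead of passing them on to h2o.findSynonyms().
find_synonyms_safe <- function(model, word, count = 20) {
  query <- tolower(word)
  if (!(query %in% vocab)) {
    message("'", query, "' is not in the model vocabulary")
    return(NULL)
  }
  h2o.findSynonyms(model, query, count = count)
}

print(find_synonyms_safe(vec, "Teacher"))   # looked up as "teacher"
print(find_synonyms_safe(vec, "Tteacher"))  # unseen word, handled by the guard
```

Words that occur too rarely in the training data may also be missing from the vocabulary, so the guard can be useful even once the tokens are lowercased.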