Comments (7)
Hi, yes, you can. Word2Vec is a language model and is not tied to any particular language. As long as you have a tokenized corpus, you can train it.
from word2vec.
Is training done the same way as for Chinese?
Also, do you know of any English corpora I could use?
from word2vec.
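Hi, the training method is the same as for Chinese: you train on a tokenized corpus.
For English, I suggest using Google's pre-trained model directly: GoogleNews-vectors-negative300.bin.gz. See the word2vec project page for details.
For an English corpus, I suggest Wikipedia; see the "Where to obtain the training data" section of the word2vec project page.
For English tokenization, you can use the Stanford NLP toolkit.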
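As a rough illustration of what a "tokenized corpus" looks like for English, the sketch below turns raw text into lowercase, space-separated tokens. It is a simple regex-based stand-in, not the Stanford NLP tokenizer. The names SimpleEnglishTokenizer, tokenize and toCorpusLine are made up for this example, and whether this library expects exactly this layout is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: a minimal English tokenizer for preparing training text.
final class SimpleEnglishTokenizer {

    /** Lowercases a line and splits it on anything that is not a letter, digit or apostrophe. */
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!t.isEmpty()) {   // split() can yield an empty leading element
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Joins the tokens with single spaces so each input line becomes one corpus line. */
    static String toCorpusLine(String line) {
        return String.join(" ", tokenize(line));
    }

    public static void main(String[] args) {
        // prints: hello world word2vec doesn't care about the language
        System.out.println(toCorpusLine("Hello, world! Word2Vec doesn't care about the language."));
    }
}
```

A real tokenizer handles abbreviations, hyphens and Unicode far better than this regex. Note also that lowercasing is a choice made here for simplicity; the GoogleNews vectors are case-sensitive.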
from word2vec.
OK, thank you!
from word2vec.
One more question:
Following the Chinese example, I tried to use Google's pre-trained model:
    Word2Vec vec = new Word2Vec();
    try {
        vec.loadGoogleModel("D:\\WorkSpace\\EclipseJee\\GoogleNews-vectors-negative300.bin");
    } catch (IOException e) {
        e.printStackTrace();
    }
    Set<WordEntry> similarWords = vec.getSimilarWords("hello", 10);
    for (WordEntry word : similarWords) {
        System.out.println(word.name + " : " + word.score);
    }
Running it fails with this error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded. From what I found on Google, this happens when the JVM has used up all of its available memory.
Do you know of any way to fix this?
from word2vec.
This is probably because you do not have enough memory. When I loaded the model, it used about 3 GB of memory.
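The 3 GB figure is roughly what the arithmetic predicts: the GoogleNews model holds about 3 million entries with 300 dimensions each. The sketch below estimates the raw size of the vectors and compares it with the heap the JVM is allowed to use. It assumes the vectors are stored as 4-byte floats (an assumption about this library), and HeapCheck, estimateVectorBytes and fitsInHeap are hypothetical names used only for illustration.

```java
// Hypothetical helper: compare the estimated model size with the JVM heap limit.
final class HeapCheck {

    /** Raw size of the vector table: one 4-byte float per dimension per word (assumed storage). */
    static long estimateVectorBytes(long vocabSize, int dimensions) {
        return vocabSize * dimensions * Float.BYTES;
    }

    /** True if the maximum heap is at least as large as the estimated requirement. */
    static boolean fitsInHeap(long requiredBytes, long maxHeapBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // GoogleNews-vectors-negative300: about 3 million entries, 300 dimensions each.
        long required = estimateVectorBytes(3_000_000L, 300);
        long maxHeap = Runtime.getRuntime().maxMemory();
        double gib = 1024.0 * 1024 * 1024;
        System.out.printf("vectors need at least %.2f GiB, max heap is %.2f GiB%n",
                required / gib, maxHeap / gib);
        if (!fitsInHeap(required, maxHeap)) {
            System.out.println("Heap too small; restart the JVM with a larger -Xmx, e.g. -Xmx6g");
        }
    }
}
```

The estimate is a lower bound, since the word strings and the map that holds them add overhead on top. If the default heap is too small, raising it with the -Xmx option (in Eclipse, under Run Configurations > Arguments > VM arguments) is the usual fix; a heap of this size needs a 64-bit JVM and enough physical RAM.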
from word2vec.
How much memory does your computer have?
from word2vec.