Comments (7)

jsksxs360 commented on June 12, 2024

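Hello, yes, you can. Word2Vec is a language model and is not tied to any particular language. As long as you have a tokenized (word-segmented) corpus, you can train it.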

from word2vec.

PetePro commented on June 12, 2024

Is training done the same way as for Chinese?
Also, do you know of any English corpora I could use?

jsksxs360 commented on June 12, 2024

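Hello, training works the same way as for Chinese: you train on a tokenized corpus.

For English, I recommend using the model pre-trained by Google directly: GoogleNews-vectors-negative300.bin.gz. See the word2vec project page for details.

For an English corpus, I recommend Wikipedia; see "Where to obtain the training data" on the word2vec project page.

For English tokenization, you can use the Stanford NLP toolkit.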
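
To illustrate what "a tokenized corpus" means for English, here is a minimal sketch in plain Java. It is not the Stanford NLP toolkit mentioned above; it swaps in a simple regular expression instead, and the class and method names (`SimpleEnglishTokenizer`, `tokenize`, `toCorpusLine`) are hypothetical. It also assumes that the trainer accepts one sentence per line with tokens separated by spaces, and that lowercasing is acceptable for your use case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of this project or of Stanford NLP.
class SimpleEnglishTokenizer {
    // A token is a run of ASCII letters/digits, optionally with internal
    // apostrophes, so "Don't" stays one token.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z0-9]+(?:'[A-Za-z]+)*");

    /** Splits text into lowercase tokens and drops punctuation. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Joins tokens with single spaces (assumed corpus line format). */
    static String toCorpusLine(String text) {
        return String.join(" ", tokenize(text));
    }

    public static void main(String[] args) {
        // prints: hello world don't panic
        System.out.println(toCorpusLine("Hello, world! Don't panic."));
    }
}
```

A real tokenizer handles abbreviations, hyphenation, and non-ASCII text far better than this. Note also that the pre-trained GoogleNews vectors are case-sensitive, so lowercasing only makes sense for a model you train yourself.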

PetePro commented on June 12, 2024

OK, thank you!

PetePro commented on June 12, 2024

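One more question:
I followed the Chinese example to use the model pre-trained by Google:

    Word2Vec vec = new Word2Vec();
    try {
        // Load the pre-trained Google model from disk.
        vec.loadGoogleModel("D:\\WorkSpace\\EclipseJee\\GoogleNews-vectors-negative300.bin");
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Print the 10 words most similar to "hello".
    Set<WordEntry> similarWords = vec.getSimilarWords("hello", 10);
    for (WordEntry word : similarWords) {
        System.out.println(word.name + " : " + word.score);
    }

Running it fails with this error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded. I searched on Google, and it says this happens because all available memory has been used up.
Do you know of any way to fix this?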

cvenwu commented on June 12, 2024

You have probably run out of memory. When I loaded the model, it used about 3 GB of memory.
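
As a rough sanity check on that figure, the sketch below estimates the size of the raw vectors and compares it with the heap the JVM is allowed to use. It assumes the published size of the GoogleNews model (3 million entries, 300 dimensions) and 4-byte floats; the class name `HeapCheck` and the method `rawVectorBytes` are hypothetical. How much extra memory this library needs for strings and map entries is not known here, so treat the estimate as a lower bound.

```java
// Hypothetical helper for estimating whether the model can fit in the heap.
class HeapCheck {
    // Published size of GoogleNews-vectors-negative300 (assumed here).
    static final long VOCAB_SIZE = 3_000_000L;
    static final int DIMENSIONS = 300;

    /** Bytes needed for the raw float vectors alone (4 bytes per float). */
    static long rawVectorBytes(long vocabSize, int dimensions) {
        return vocabSize * dimensions * Float.BYTES;
    }

    public static void main(String[] args) {
        long needed = rawVectorBytes(VOCAB_SIZE, DIMENSIONS);
        // Upper limit of the heap for this JVM, as set by -Xmx or the default.
        long maxHeap = Runtime.getRuntime().maxMemory();
        double gib = 1024.0 * 1024 * 1024;

        System.out.printf("Raw vectors:  %.2f GiB%n", needed / gib);
        System.out.printf("JVM max heap: %.2f GiB%n", maxHeap / gib);
        if (maxHeap < needed) {
            System.out.println("Heap is smaller than the raw vectors; "
                    + "try a larger -Xmx value.");
        }
    }
}
```

The raw vectors come to 3,600,000,000 bytes, about 3.35 GiB, which is in line with the roughly 3 GB reported above. If the default heap is smaller than that, starting the program with a larger limit such as `java -Xmx6g ...` (or adding `-Xmx6g` to the VM arguments of the Eclipse run configuration) may help, provided the machine has enough physical memory and runs a 64-bit JVM.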

PetePro commented on June 12, 2024

How much memory does your computer have?
