word2vec's People

Contributors

jsksxs360

word2vec's Issues

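Supporting other languages

How would I go about supporting most other languages?

Do I need to find a corpus myself and train on it? And does the word segmentation method differ from language to language?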

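How to train a Java-format model

How do I train a Java-format model (is word segmentation still required), and how large a corpus is needed?
I want to build a question answering system for a restricted domain. Since my corpus is fairly small, can I use this model for that? (Thanks for any answer.)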

Source of the sentence similarity methods

Hello! Which papers are the two sentence similarity methods, fastSentenceSimilarity() and sentenceSimilarity(), based on?

Exception in thread "main" java.lang.NoClassDefFoundError: org/ansj/recognition/Recognition at Hello.hello.main(hello.java:34)

Hello, I am very interested in your code, but when I run the following code:

package Hello;
import java.io.IOException;
import java.util.List;
import java.util.Set;
import me.xiaosheng.util.Segment;
import me.xiaosheng.word2vec.*;
public class hello {
	public static void main(String[] args) throws Exception
	{
		Word2Vec vec = new Word2Vec();
		try {
			vec.loadGoogleModel("/home/ztgong/work/language/datasets/wiki_chinese_word2vec(Google).model");
		} catch (IOException e) {
			e.printStackTrace();
		}	
		String s1 = "苏州有多条公路正在施工,造成局部地区汽车行驶非常缓慢。";
		String s2 = "苏州最近有多条公路在施工,导致部分地区交通拥堵,汽车难以通行。";
		String s3 = "苏州是一座美丽的城市,四季分明,雨量充沛。";

		// Segment each sentence into a list of words
		List<String> wordList1 = Segment.getWords(s1);
		List<String> wordList2 = Segment.getWords(s2);
		List<String> wordList3 = Segment.getWords(s3);

		// Sentence similarity (all word weights set to 1)
		System.out.println("s1|s1: " + vec.sentenceSimilarity(wordList1, wordList1));
		System.out.println("s1|s2: " + vec.sentenceSimilarity(wordList1, wordList2));
		System.out.println("s1|s3: " + vec.sentenceSimilarity(wordList1, wordList3));

		// Sentence similarity (nouns and verbs weighted 1, all other words 0.8)
		float[] weightArray1 = Segment.getPOSWeightArray(Segment.getPOS(s1));
		float[] weightArray2 = Segment.getPOSWeightArray(Segment.getPOS(s2));
		float[] weightArray3 = Segment.getPOSWeightArray(Segment.getPOS(s3));
		System.out.println("s1|s1: " + vec.sentenceSimilarity(wordList1, wordList1, weightArray1, weightArray1));
		System.out.println("s1|s2: " + vec.sentenceSimilarity(wordList1, wordList2, weightArray1, weightArray2));
		System.out.println("s1|s3: " + vec.sentenceSimilarity(wordList1, wordList3, weightArray1, weightArray3));
	}
}

I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/ansj/recognition/Recognition
	at Hello.hello.main(hello.java:34)
Caused by: java.lang.ClassNotFoundException: org.ansj.recognition.Recognition
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 1 more

How can I fix this?
Thanks!
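
In Java, a NoClassDefFoundError caused by a ClassNotFoundException generally means the named class could not be found on the runtime classpath, even if the program compiled. A minimal sketch for checking this, assuming nothing about the project beyond the class name shown in the stack trace (the helper class `ClasspathCheck` is hypothetical and not part of the library):

```java
public class ClasspathCheck {
    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above
        String name = "org.ansj.recognition.Recognition";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

If this prints false when run with the same classpath as the failing program, the jar providing that class is probably missing at runtime, or the jar present is a version that does not contain it. The package name suggests the ansj segmentation library, but that is an inference from the stack trace, not something confirmed in this thread.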

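About the loadModel variable

The loadModel variable in Word2Vec is set to true in public void loadGoogleModel(String modelPath), while in public void loadJavaModel(String modelPath) it is set to true. After training a Java-format model and then loading it with public void loadJavaModel(String modelPath), I get null. Could the author check whether the cause is this variable being set to false?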

How to train a model

Word2Vec.trainJavaModel("data/train.txt", "data/test.model");

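Hello, could you provide sample files for data/train.txt and data/test.model?

For example, if I have 10 sentences, what should train.txt look like after word segmentation?
Should related words be put on the same line, separated by spaces? Or should the 10 sentences go one per line, with the words separated by spaces?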
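
For illustration, the second layout in the question (one segmented sentence per line, words separated by single spaces) is a common convention for word2vec-style training corpora. The sketch below produces a file in that layout. It is only an assumption that trainJavaModel expects this format, since the thread does not confirm it, and the class `CorpusWriter`, the sample tokens, and the output path are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CorpusWriter {
    /** Joins each tokenized sentence with single spaces, producing one line per sentence. */
    public static List<String> toLines(List<List<String>> sentences) {
        List<String> lines = new ArrayList<>();
        for (List<String> tokens : sentences) {
            lines.add(String.join(" ", tokens));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical, already-segmented sentences (the segmentation step is not shown)
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("苏州", "有", "多条", "公路", "正在", "施工"),
            Arrays.asList("苏州", "是", "一座", "美丽", "的", "城市"));
        Path out = Paths.get("train.txt"); // assumed output path
        // Writes each line followed by the platform line separator, encoded as UTF-8
        Files.write(out, toLines(sentences), StandardCharsets.UTF_8);
    }
}
```

With ten sentences, the resulting file would contain ten lines, each holding the words of one sentence in their original order.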
