
word2vec's Issues

Source of the sentence similarity method

Hello! Which papers are the two sentence similarity methods, fastSentenceSimilarity() and sentenceSimilarity(), based on?

How to train the Java version of the model

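How do I train the Java version of the model (does the input still need word segmentation), and how large a corpus is required?
I want to build a question answering system for a restricted domain. Since my corpus is fairly small, can I use this model for that? (Thanks for your answer.)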

About the loadModel variable

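The loadModel variable in Word2Vec is set to true in public void loadGoogleModel(String modelPath), but in public void loadJavaModel(String modelPath) it is set to true. After training a Java model and then loading it with public void loadJavaModel(String modelPath), I get null. Could the author check whether this is caused by the variable being set to false?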

Adapting to other languages

If I want to support most other languages, what should I do?

Do I need to find a corpus myself and then train on it? And does the word segmentation method differ from language to language?

Exception in thread "main" java.lang.NoClassDefFoundError: org/ansj/recognition/Recognition at Hello.hello.main(hello.java:34)

Hello, I am very interested in your code, but when I run the following code:

package Hello;
import java.io.IOException;
import java.util.List;
import java.util.Set;
import me.xiaosheng.util.Segment;
import me.xiaosheng.word2vec.*;
public class hello {
	public static void main(String[] args) throws Exception
	{
		Word2Vec vec = new Word2Vec();
		try {
			vec.loadGoogleModel("/home/ztgong/work/language/datasets/wiki_chinese_word2vec(Google).model");
		} catch (IOException e) {
			e.printStackTrace();
		}	
		String s1 = "苏州有多条公路正在施工,造成局部地区汽车行驶非常缓慢。";
		String s2 = "苏州最近有多条公路在施工,导致部分地区交通拥堵,汽车难以通行。";
		String s3 = "苏州是一座美丽的城市,四季分明,雨量充沛。";

		// Segment each sentence into a list of words
		List<String> wordList1 = Segment.getWords(s1);
		List<String> wordList2 = Segment.getWords(s2);
		List<String> wordList3 = Segment.getWords(s3);

		// Sentence similarity (all word weights set to 1)
		System.out.println("s1|s1: " + vec.sentenceSimilarity(wordList1, wordList1));
		System.out.println("s1|s2: " + vec.sentenceSimilarity(wordList1, wordList2));
		System.out.println("s1|s3: " + vec.sentenceSimilarity(wordList1, wordList3));

		// Sentence similarity (nouns and verbs weighted 1, all other words 0.8)
		float[] weightArray1 = Segment.getPOSWeightArray(Segment.getPOS(s1));
		float[] weightArray2 = Segment.getPOSWeightArray(Segment.getPOS(s2));
		float[] weightArray3 = Segment.getPOSWeightArray(Segment.getPOS(s3));
		System.out.println("s1|s1: " + vec.sentenceSimilarity(wordList1, wordList1, weightArray1, weightArray1));
		System.out.println("s1|s2: " + vec.sentenceSimilarity(wordList1, wordList2, weightArray1, weightArray2));
		System.out.println("s1|s3: " + vec.sentenceSimilarity(wordList1, wordList3, weightArray1, weightArray3));
	}
}

The following error is displayed:

Exception in thread "main" java.lang.NoClassDefFoundError: org/ansj/recognition/Recognition
	at Hello.hello.main(hello.java:34)
Caused by: java.lang.ClassNotFoundException: org.ansj.recognition.Recognition
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 1 more

How can I fix this?
Thanks!
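
A NoClassDefFoundError caused by a ClassNotFoundException means the JVM could not find that class on the runtime classpath. A common cause is a dependency jar that is missing or is a version that does not contain the class. The sketch below is one way to confirm that before changing anything. ClasspathCheck and isOnClasspath are hypothetical names used only for illustration; they are not part of this project.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above
        String name = "org.ansj.recognition.Recognition";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
        // Print the effective classpath to see which jars were actually loaded
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the check prints false, the printed classpath shows which jars the JVM was given, which helps spot a missing or mismatched dependency.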

How to train a model

Word2Vec.trainJavaModel("data/train.txt", "data/test.model");

Hello, could you provide sample files for data/train.txt and data/test.model?

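For example, if I have 10 sentences, what should train.txt look like after word segmentation?
Should related words be separated by spaces and placed on the same line? Or should the 10 sentences go one per line, with words separated by spaces?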
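
For reference, the second layout described above (one sentence per line, words separated by single spaces) is the plain-text form that word2vec-style trainers commonly read. Whether Word2Vec.trainJavaModel expects exactly this layout is an assumption that this thread does not confirm. The following is a minimal sketch of writing such a file. CorpusWriter, toLines, and the sample tokens are hypothetical and stand in for the output of a real segmenter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CorpusWriter {

    /** Joins each tokenized sentence with single spaces, producing one line per sentence. */
    public static List<String> toLines(List<List<String>> sentences) {
        List<String> lines = new ArrayList<>();
        for (List<String> words : sentences) {
            lines.add(String.join(" ", words));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical pre-segmented sentences; real input would come from a tokenizer
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("苏州", "有", "多条", "公路", "正在", "施工"),
                Arrays.asList("苏州", "是", "一座", "美丽", "的", "城市"));

        Path out = Paths.get("data", "train.txt");
        Files.createDirectories(out.getParent());
        // Write as UTF-8 so that non-ASCII tokens survive the round trip
        Files.write(out, toLines(sentences), StandardCharsets.UTF_8);
    }
}
```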
