bnqtoan / clearnlp
Automatically exported from code.google.com/p/clearnlp
License: Other
What steps will reproduce the problem?
1. Checkout the latest Git master from googlecode
2. Download the latest models (e.g.,
https://bitbucket.org/jdchoi77/models/downloads/ontonotes-en-pos-1.3.0.tgz )
3. Parse using, e.g.,
mvn exec:java -Dexec.mainClass=com.googlecode.clearnlp.demo.DemoDEPParser -Dexec.args="model/dictionary-1.2.0.zip model/ontonotes-en-pos-1.3.0.tgz model/ontonotes-en-dep-1.3.0.tgz src/main/resources/sample/iphone5.txt src/main/resources/sample/iphone5.txt.newparsed"
What is the expected output? What do you see instead?
Instead of parse output, we get a null pointer exception
What version of the product are you using? On what operating system?
Git master 6fb797d1ad2a49946fcf907c77045136940936e3 (version 1.3.0)
Please provide any additional information below.
Parsing works fine with the old models. It looks like the new models are out of
sync with the Git version.
Original issue reported on code.google.com by [email protected]
on 25 Jan 2013 at 1:16
What steps will reproduce the problem?
While training the model, you use a set of input files (abbreviations,
compounds, etc.).
Can we use a set of our own dictionary files for training the model?
For example, we have a set of terms from the medical or legal field, and I want each of those terms to be tokenized as a single token, e.g. "law maker".
Can you please suggest the correct process for this?
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 7 Oct 2013 at 1:40
I'm trying to find a good Semantic Role Labeling tool that I can use in my Java
code in NetBeans.
I tried ClearNLP, and testing the installation as described at this link gave the
right output: https://code.google.com/p/clearnlp/wiki/Installation
But when I used the following code:
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package stanfordposcode;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.PrintStream;
import java.util.List;
import com.googlecode.clearnlp.component.AbstractComponent;
import com.googlecode.clearnlp.dependency.DEPTree;
import com.googlecode.clearnlp.engine.EngineGetter;
import com.googlecode.clearnlp.nlp.NLPDecode;
import com.googlecode.clearnlp.nlp.NLPLib;
import com.googlecode.clearnlp.reader.AbstractReader;
import com.googlecode.clearnlp.segmentation.AbstractSegmenter;
import com.googlecode.clearnlp.tokenization.AbstractTokenizer;
import com.googlecode.clearnlp.util.UTInput;
import com.googlecode.clearnlp.util.UTOutput;
// Import log4j classes.
import org.apache.log4j.Logger;
import org.apache.log4j.BasicConfigurator;
public class SRL
{
    final String language = AbstractReader.LANG_EN;
    static Logger logger = Logger.getLogger(SRL.class);
    public SRL(String dictFile, String posModelFile, String depModelFile, String predModelFile, String roleModelFile, String srlModelFile, String inputFile, String outputFile) throws Exception
    {
        AbstractTokenizer tokenizer = EngineGetter.getTokenizer(language, new FileInputStream(dictFile));
        AbstractComponent tagger = EngineGetter.getComponent(new FileInputStream(posModelFile), language, NLPLib.MODE_POS);
        AbstractComponent analyzer = EngineGetter.getComponent(new FileInputStream(dictFile), language, NLPLib.MODE_MORPH);
        AbstractComponent parser = EngineGetter.getComponent(new FileInputStream(depModelFile), language, NLPLib.MODE_DEP);
        AbstractComponent identifier = EngineGetter.getComponent(new FileInputStream(predModelFile), language, NLPLib.MODE_PRED);
        AbstractComponent classifier = EngineGetter.getComponent(new FileInputStream(roleModelFile), language, NLPLib.MODE_ROLE);
        AbstractComponent labeler = EngineGetter.getComponent(new FileInputStream(srlModelFile), language, NLPLib.MODE_SRL);
        AbstractComponent[] components = {tagger, analyzer, parser, identifier, classifier, labeler};
        String sentence = "I'd like to meet Dr. Choi.";
        process(tokenizer, components, sentence);
        process(tokenizer, components, UTInput.createBufferedFileReader(inputFile), UTOutput.createPrintBufferedFileStream(outputFile));
    }
    public void process(AbstractTokenizer tokenizer, AbstractComponent[] components, String sentence)
    {
        DEPTree tree = NLPDecode.toDEPTree(tokenizer.getTokens(sentence));
        for (AbstractComponent component : components)
            component.process(tree);
        System.out.println(tree.toStringSRL()+"\n");
    }
    public void process(AbstractTokenizer tokenizer, AbstractComponent[] components, BufferedReader reader, PrintStream fout)
    {
        AbstractSegmenter segmenter = EngineGetter.getSegmenter(language, tokenizer);
        DEPTree tree;
        for (List<String> tokens : segmenter.getSentences(reader))
        {
            tree = NLPDecode.toDEPTree(tokens);
            for (AbstractComponent component : components)
                component.process(tree);
            fout.println(tree.toStringSRL()+"\n");
        }
        fout.close();
    }
    public static void main(String[] args)
    {
        BasicConfigurator.configure();
        String dictFile = "/Users/ha/clearnlp/dictionary-1.3.1.jar"; // e.g., dictionary.zip
        String posModelFile = "/Users/ha/clearnlp/ontonotes-en-pos-1.3.0.jar"; // e.g., ontonotes-en-pos.tgz
        String depModelFile = "/Users/ha/clearnlp/ontonotes-en-dep-1.3.0.jar"; // e.g., ontonotes-en-dep.tgz
        String predModelFile = "/Users/ha/clearnlp/ontonotes-en-pred-1.3.0.jar"; // e.g., ontonotes-en-pred.tgz
        String roleModelFile = "/Users/ha/clearnlp/ontonotes-en-role-1.3.0.jar"; // e.g., ontonotes-en-role.tgz
        String srlModelFile = "/Users/ha/clearnlp/ontonotes-en-srl-1.3.0.jar"; // e.g., ontonotes-en-srl.tgz
        String inputFile = "/Users/ha/NetBeansProjects/StanfordPOSCode/src/stanfordposcode/input.txt";
        String outputFile = "/Users/ha/NetBeansProjects/StanfordPOSCode/src/stanfordposcode/output.txt";
        try
        {
            new SRL(dictFile, posModelFile, depModelFile, predModelFile, roleModelFile, srlModelFile, inputFile, outputFile);
        }
        catch (Exception e) {e.printStackTrace();}
    }
}
I got the following error:
........
13084 [main] INFO com.googlecode.clearnlp.classification.model.StringModel - Loading model:
java.lang.NullPointerException
at com.googlecode.clearnlp.tokenization.EnglishTokenizer.normalizeNonUTF8(EnglishTokenizer.java:362)
at com.googlecode.clearnlp.tokenization.EnglishTokenizer.getTokenList(EnglishTokenizer.java:111)
at com.googlecode.clearnlp.tokenization.AbstractTokenizer.getTokens(AbstractTokenizer.java:61)
at stanfordposcode.SRL.process(SRL.java:54)
at stanfordposcode.SRL.<init>(SRL.java:48)
at stanfordposcode.SRL.main(SRL.java:95)
BUILD SUCCESSFUL (total time: 18 seconds)
I already added all the jar files:
http://i.stack.imgur.com/cIECT.png
How can I solve this error?
And is there a better SRL tool that I can use?
Thanks in advance.
Original issue reported on code.google.com by [email protected]
on 15 Jan 2015 at 3:58
I was looking at the static factory methods in EngineGetter, and I noticed
several methods like this:
static public AbstractSegmenter getSegmenter(String language, AbstractTokenizer tokenizer)
{
    if (language.equals(AbstractReader.LANG_EN))
        return new EnglishSegmenter(tokenizer);
    return null;
}
It seems that instead of returning null, these methods should throw an
IllegalArgumentException that says "the requested language is not currently
supported".
Original issue reported on code.google.com by lee.becker
on 27 Oct 2012 at 5:33
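The suggestion above could look roughly like the following. This is only a minimal sketch: `Segmenter`, `EnglishSegmenterStub`, and `SegmenterFactory` are hypothetical stand-ins so that the example compiles on its own, and the `"en"` language code is an assumption rather than the confirmed value of `AbstractReader.LANG_EN`.

```java
// Hypothetical stand-ins for the ClearNLP types; these are not the library's real classes.
interface Segmenter {}

final class EnglishSegmenterStub implements Segmenter {}

final class SegmenterFactory {
    // Assumed language code, standing in for AbstractReader.LANG_EN.
    static final String LANG_EN = "en";

    // Fails fast with a descriptive exception instead of returning null
    // when the requested language has no segmenter.
    static Segmenter getSegmenter(String language) {
        if (LANG_EN.equals(language)) {
            return new EnglishSegmenterStub();
        }
        throw new IllegalArgumentException(
            "The requested language is not currently supported: " + language);
    }

    public static void main(String[] args) {
        System.out.println(getSegmenter("en").getClass().getSimpleName());
        try {
            getSegmenter("xx");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Writing `LANG_EN.equals(language)` rather than `language.equals(LANG_EN)` also means a null language argument ends in the same IllegalArgumentException instead of a NullPointerException.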
What steps will reproduce the problem?
1. parse a sentence 'The train leaves at 5pm.' using
EngineProcess.getDEPTree(...)
2. print the resultant DEPTree
3. experience the NPE
What is the expected output? What do you see instead?
1 The the DT _ 2 det _ _
2 train train NN _ 3 nsubj 3:A0 _
3 leaves leave VBZ pb=leave.XX 0 root _ _
4 at at IN _ 3 prep 3:AM-TMP _
5 5 0 CD _ 6 num _ _
6 pm pm NN _ 4 pobj _ _
7 . . . _ 3 punct _ _
Null pointer stacktrace
What version of the product are you using? On what operating system?
1.2.1
Please provide any additional information below.
Add a null check before Collections.sort(...) in the following method:
private String toString(List<DEPArc> heads)
{
    StringBuilder build = new StringBuilder();
    Collections.sort(heads);
    for (DEPArc arc : heads)
    {
        build.append(DEPLib.DELIM_HEADS);
        build.append(arc.toString());
    }
    if (build.length() > 0)
        return build.substring(DEPLib.DELIM_HEADS.length());
    else
        return AbstractColumnReader.BLANK_COLUMN;
}
Original issue reported on code.google.com by [email protected]
on 9 Nov 2012 at 11:08
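One way the null check proposed above might be written is sketched below. It is not the project's actual fix: plain strings stand in for `DEPArc` so the example is self-contained, the delimiter `";"` and blank column `"_"` are assumed placeholder values for `DEPLib.DELIM_HEADS` and `AbstractColumnReader.BLANK_COLUMN`, and unlike the original method it sorts a copy instead of the caller's list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class HeadsFormatter {
    // Assumed placeholder values; the real constants live in DEPLib and AbstractColumnReader.
    static final String DELIM_HEADS = ";";
    static final String BLANK_COLUMN = "_";

    // Strings stand in for DEPArc here so the sketch compiles on its own.
    static String toString(List<String> heads) {
        // Guard against a missing or empty head list before sorting.
        if (heads == null || heads.isEmpty()) {
            return BLANK_COLUMN;
        }
        // Sort a copy so the caller's list is left untouched.
        List<String> sorted = new ArrayList<>(heads);
        Collections.sort(sorted);
        return String.join(DELIM_HEADS, sorted);
    }

    public static void main(String[] args) {
        List<String> none = null;
        System.out.println(toString(none));
        List<String> arcs = new ArrayList<>();
        arcs.add("3:AM-TMP");
        arcs.add("3:A0");
        System.out.println(toString(arcs));
    }
}
```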
_This was originally posted at our forum by Lee Becker_
Would it be possible to add APIs to the factory methods in EngineGetter to
accept InputStreams? Currently they only accept modelFiles or dataFiles as
Strings. It would be useful to accept InputStreams so that the developer can
decide whether it comes from a File, URL, or URI. This will also assist
integration into UIMA-based systems like ClearTK or cTAKES.
For example, these would all be useful interfaces:
static public DEPParser getDEPParser(InputStream modelInputStream)
static public Pair<POSTagger[],Double> getPOSTaggers(InputStream modelInputStream) throws Exception
static public AbstractTokenizer getTokenizer(String language, InputStream dictInputStream)
Thanks,
Lee
Original issue reported on code.google.com by [email protected]
on 29 Oct 2012 at 7:02
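The design requested above can be illustrated with a small sketch in which one core method takes an InputStream and thin overloads adapt a file path or a URL to it. Everything here is hypothetical: `ModelSourceSketch` and its `load` methods are not ClearNLP APIs, and the "loader" merely counts bytes where a real factory would deserialize a model.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

final class ModelSourceSketch {
    // Hypothetical core loader: it only counts bytes, where a real factory
    // would deserialize a model from the stream.
    static long load(InputStream in) {
        try {
            long total = 0;
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience overload: open a file and delegate to the stream-based method.
    static long load(String path) {
        try (InputStream in = new FileInputStream(path)) {
            return load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience overload: open a URL (file:, jar:, http:, ...) and delegate.
    static long load(URL url) {
        try (InputStream in = url.openStream()) {
            return load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a UIMA-style caller that only has a classpath resource could pass `SomeClass.class.getResourceAsStream(...)` straight to the stream-based method without first writing the model to disk.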
Hi,
I am using the ClearNLP APIs for tokenization.
I am following the example code you provide
(https://github.com/clearnlp/clearnlp-demo/blob/master/src/main/java/com/clearnlp/demo/DemoNLPDecode.java),
but I get this error when I try to initialize the tokenizer. Here are the
details:
================
String text = "here goes my text. Let's see how well does it perform";
String language = AbstractReader.LANG_EN;
AbstractTokenizer clearNLPTokenizer = NLPGetter.getTokenizer(language);
String modelType = "general-en";
List<String> tokens = clearNLPTokenizer.getTokens(text);
But I get an error on line 3:
Exception in thread "main" java.lang.UnsupportedClassVersionError:
com/clearnlp/nlp/NLPGetter : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
==================
I have included all the jar files provided
here (http://clearnlp.wikispaces.com/file/detail/clearnlp-lib-2.0.2.tgz), and I
have also included the dictionary jar.
I must be making some mistake in using ClearNLP. Please help me out.
Original issue reported on code.google.com by [email protected]
on 18 Mar 2014 at 12:34
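For the report above: "Unsupported major.minor version 51.0" refers to the class-file format version, and major version 51 is the format produced for Java 7, so the error generally indicates that the class was compiled for Java 7 while the JVM running it is older. The following sketch (the class and method names are made up for illustration) prints what the current JVM supports, using the standard `java.version` and `java.class.version` system properties.

```java
final class ClassVersionCheck {
    // Class-file major version 45 corresponds to Java 1.1 and each later release
    // adds one, so major 50 maps to Java 6, 51 to Java 7, and 52 to Java 8.
    static int javaReleaseFor(int classMajor) {
        return classMajor - 44;
    }

    // A JVM can load a class only if the class-file major version
    // does not exceed the highest version the JVM supports.
    static boolean canLoad(int classMajor, int runtimeMajor) {
        return classMajor <= runtimeMajor;
    }

    public static void main(String[] args) {
        String javaVersion = System.getProperty("java.version");
        // java.class.version has the form "<major>.<minor>".
        int runtimeMajor = (int) Double.parseDouble(System.getProperty("java.class.version"));
        int requiredMajor = 51; // taken from the error message in the report

        System.out.println("Running on Java " + javaVersion
            + " (supports class files up to major version " + runtimeMajor + ")");
        System.out.println("Required: major version " + requiredMajor
            + " (Java " + javaReleaseFor(requiredMajor) + ")");
        System.out.println("Can load: " + canLoad(requiredMajor, runtimeMajor));
    }
}
```

If the last line prints `false`, running the program on a Java 7 or newer JVM (or using a build of the library compiled for the older runtime, if one exists) would be the usual ways out.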
When writing a tokenization unit test for the ClearTK wrappers for ClearNLP, I
found an inconsistency between OpenNLP's tokenization and ClearNLP's.
Consider the string:
String s = "\"John & Mary's dog,\" Jane thought (to herself).\n"
    + "\"What a #$%!\n"
    + "a- ``I like AT&T''.\"";
I was expecting the following tokenization as this is what our unit test for
OpenNLP produces:
", John, &, Mary, 's, dog, ,, ", Jane, thought, (, to, herself, ), ., ", What,
a, #, $, %, !, a, -, ``, I, like, AT&T, '', ., "
ClearNLP's output is slightly different:
", John, &, Mary, 's, dog, ,, ", Jane, thought, (, to, herself, ), ., ", What,
a, #, $, %, !, a, -, `, `, I, like, AT, &, T, ', ', ., "
Specifically, the discrepancies are:
`` vs `,`
AT&T vs AT, &, T
'' vs ', '
Is this just a different style of tokenization or is it incorrect? Does it
make a difference for the parser?
Original issue reported on code.google.com by lee.becker
on 27 Oct 2012 at 6:01
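When comparing two tokenizers in a unit test like the one described above, a small helper that lists the tokens unique to each side can make the discrepancies easier to spot. The sketch below is a generic illustration, not part of ClearTK or ClearNLP; `TokenDiff` and `onlyIn` are hypothetical names, and the two lists are just the tail of the outputs quoted in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TokenDiff {
    // Returns the distinct tokens that occur in 'left' but nowhere in 'right',
    // in the order they were first seen.
    static List<String> onlyIn(List<String> left, List<String> right) {
        Set<String> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // Tail of the two tokenizations quoted in the report above.
        List<String> openNlp = Arrays.asList(
            "a", "-", "``", "I", "like", "AT&T", "''", ".", "\"");
        List<String> clearNlp = Arrays.asList(
            "a", "-", "`", "`", "I", "like", "AT", "&", "T", "'", "'", ".", "\"");

        System.out.println("Only in OpenNLP output:  " + onlyIn(openNlp, clearNlp));
        System.out.println("Only in ClearNLP output: " + onlyIn(clearNlp, openNlp));
    }
}
```

This set-based comparison ignores token order and repetition, so it only flags which token types differ; a position-by-position comparison would be needed to locate where the two sequences diverge.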
For those of us wrapping ClearNLP in another framework, it would be useful
to have lightweight, low-memory models to test how the ClearNLP APIs interface
with our own code.
Original issue reported on code.google.com by lee.becker
on 29 Oct 2012 at 12:16
What steps will reproduce the problem?
1. Run "java com.googlecode.clearnlp.run.MPAnalyze -c input\config_en_morph.xml
-i input\morph-sample.txt" as given in the Wiki
What is the expected output? What do you see instead?
input\morph-sample.txt.morph
java.lang.NullPointerException
at com.googlecode.clearnlp.morphology.EnglishMPAnalyzer.getException(EnglishMPAnalyzer.java:340)
at com.googlecode.clearnlp.morphology.EnglishMPAnalyzer.getLemmaAux(EnglishMPAnalyzer.java:306)
at com.googlecode.clearnlp.morphology.EnglishMPAnalyzer.getLemma(EnglishMPAnalyzer.java:274)
at com.googlecode.clearnlp.morphology.AbstractMPAnalyzer.lemmatize(AbstractMPAnalyzer.java:60)
at com.googlecode.clearnlp.run.MPAnalyze.analyze(MPAnalyze.java:87)
at com.googlecode.clearnlp.run.MPAnalyze.<init>(MPAnalyze.java:73)
at com.googlecode.clearnlp.run.MPAnalyze.main(MPAnalyze.java:96)
What version of the product are you using? On what operating system?
ClearNLP version 1.3.0
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 22 Feb 2013 at 2:20