cogcomp / cogcomp-nlpy Goto Github PK

View Code? Open in Web Editor NEW

116.0 15.0 26.0 339 KB

CogComp's light-weight Python NLP annotators

Home Page: http://nlp.cogcomp.org/

License: Other

Python 100.00%

natural-language-processing text-processing nlp text-mining data-mining

cogcomp-nlpy's Issues

Generate api docs

Generate a standard-looking functional api.

File "/scratch/sjn/anaconda/lib/python3.6/site-packages/ccg_nlpy/core/text_annotation.py", line 78, in _extract_char_offset assert sentence[characterId] == tokens[tokenId][tokenLength], sentence[characterId] + " expected, found " + tokens[tokenId][tokenLength] + " instead in sentence: " + sentence; AssertionError: � expected, found s instead in sentence:

Hello,
How can I fix the following?
Thanks for the help.

@realDonaldTrump @FoxNews @seanhannity @CNN @andersoncooper HE IS TRUMP!!!!!!!!!!!!!!!!! https://t.co/E0JGWvSKFB
NER_CONLL view: this view does not have constituents in your input text. 
Hillary Clinton�s Candidacy Reveals Generational Schism Among Women https://t.co/6u3lmN7nIL
Traceback (most recent call last):
  File "ccg_test_remote.py", line 14, in <module>
    doc = pipeline.doc(df.iloc[i]['Tweet'])
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/ccg_nlpy/pipeline_base.py", line 38, in doc
    return TextAnnotation(response, self)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/ccg_nlpy/core/text_annotation.py", line 34, in __init__
    self.char_offsets = self._extract_char_offset(self.text, self.tokens)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/ccg_nlpy/core/text_annotation.py", line 78, in _extract_char_offset
    assert sentence[characterId] == tokens[tokenId][tokenLength], sentence[characterId] + " expected, found " + tokens[tokenId][tokenLength] + " instead in sentence: " + sentence;
AssertionError: � expected, found s instead in sentence: Hillary Clinton�s Candidacy Reveals Generational Schism Among Women https://t.co/6u3lmN7nIL

Remote server fails when specifying it with a trailing "/"

This fails:

pipeline = remote_pipeline.RemotePipeline(server_api='http://www.fancyUrlName.com:8080/')

but this works:

pipeline = remote_pipeline.RemotePipeline(server_api='http://www.fancyUrlName.com:8080')

Remove the temporary cache folders created by MapDB before running local pipeline

When I am testing the code, sometime it will throw out the following exception JVM exception occurred: Header checksum broken. Store was not closed correctly, or is corrupted when I initialize local pipeline. And it can be solved by removing the annotation-cache file refer to pipeline's documentation

Should I check and remove the file before initializing local pipeline, or there is another way to solve it?
@danyaljj @bhargav

Overriding the `str` function for objects

We have to override the __str__ so that upon printing the view object, it shoes the content of the object properly:
http://stackoverflow.com/a/4912856/1164246

NER Not working properly

SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRISE DEFEAT .
CHINA SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRISE DEFEAT .
AL-AIN , United Arab Emirates 1996-12-06

All three sentences get no named entities. Is it because of mentions being in caps?

@danyaljj

Config not there and setting some options.

If the config is not there and I set some options, it would give errors:

In [1]: from sioux import pipeliner # first load the module
   ...: pipeliner.init(use_server = True)  # this will ensure that I am using the pipeline server remotely
   ...: 
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-d5c0268ed507> in <module>()
----> 1 from sioux import pipeliner # first load the module
      2 pipeliner.init(use_server = True)  # this will ensure that I am using the pipeline server remotely

/usr/local/lib/python3.5/site-packages/sioux/pipeliner.py in <module>()
     23 Constructor of the pipeliner to setup the api address of pipeline server
     24 """
---> 25 config, models_downloaded = pipeline_config.get_current_config()
     26 
     27 # web server info

/usr/local/lib/python3.5/site-packages/sioux/pipeline_config.py in get_current_config()
     28         logger.warn('Models not found, using pipeline web server. To use pipeline locally, please refer the documentation for downloading models.')
     29 
---> 30     with codecs.open(config_file,mode='r',encoding='utf-8') as f:
     31         config.read_string(f.read())
     32     return config, models_downloaded

/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/codecs.py in open(filename, mode, encoding, errors, buffering)
    893         # Force opening of the file in binary mode
    894         mode = mode + 'b'
--> 895     file = builtins.open(filename, mode, buffering)
    896     if encoding is None:
    897         return file

FileNotFoundError: [Errno 2] No such file or directory: '/Users/daniel/.sioux/config.cfg'

Logging

For future, we should use some Python logging instead of print statements. https://logmatic.io/blog/python-logging-with-json-steroids/
Currently, only the configuration file that is packaged is being read. We should make sure that the user is able to provide his own configuration file as an override.

unable to Use Some Annotators

I am Unable to Use Some of Annotators Like Relations And Coref(Stanford)
as these are avaliable in the the package of cogcomp-nlp and also present in the demo version

Can Anyone Please Help

A few syntax suggestion

So I met with @haowu4 and he suggested a few things:

Using Python @property decorator you can get rid of parentheses: http://stackoverflow.com/questions/17330160/how-does-the-property-decorator-work
Simpifying the syntax: How easy/hard is to have this:

pipeline = remote_pipeline.RemotePipeline()
doc = pipeline.doc("Hello, how are you. I am doing fine")
print(doc.get_lemma) # will produce (hello Hello) (, ,) (how how) (be are) (you you) (. .) (i I) (be am) (do doing) (fine fine)
print(doc.get_pos) # will produce (UH Hello) (, ,) (WRB how) (VBP are) (PRP you) (. .) (PRP I) (VBP am) (VBG doing) (JJ fine)

Instead of this:

pipeline = remote_pipeline.RemotePipeline()
doc = pipeline.doc("Hello, how are you. I am doing fine")
print(pipeline.get_lemma(doc)) # will produce (hello Hello) (, ,) (how how) (be are) (you you) (. .) (i I) (be am) (do doing) (fine fine)
print(pipeline.get_pos(doc)) # will produce (UH Hello) (, ,) (WRB how) (VBP are) (PRP you) (. .) (PRP I) (VBP am) (VBG doing) (JJ fine)

Minor suggestion on usage

It seems like Pipeliner isn't necessary to be a class, we can make the pipeliner.py file as a collection of functions, then we can use it with usage similar to numpy:

>>> from sioux import pipeliner as p
>>> ta = p.doc("Hello world! I am Guan.")
>>> pos = p.get_pos(ta)
>>> print(pos)
POS view: (UH Hello) (NN world) (. !) (PRP I) (VBP am) (NNP Guan) (. .)

link doc no found!

Hello, link doc no found!
I would love to see this documentation, please!
http://cogcomp.org/software/doc/ccg_nlpy/pipeliner.m.html

infinite loop?

Inside get_view function we call add_view function, and inside add_view function we call get_view.
There's danger of falling in an infinite recursive loop.

NER Prediction Issue

When I run NER_ONTONOTES on the following text through ccg_nlpy.local_pipeline:

And still ahead on News Night . I carry this little packet my kid gave me . See what my daughter /- Daddy I love you . I miss you . be safe . come home ASAP . Danger on the ground in Iraq and under it . IED Improvised Explosive Devices take a terrible toll . And prison barely dented her popularity . but can Martha Stewart survive her primetime ratings ? Everyday in Iraq US soldiers work very hard to reassure local civilians train the Iraqi army and stay alive . but with the increasing power of Improvised Explosive Devices IEDs staying alive takes a lot of training and time . Eddie Shrahmin is imbedded with American soldiers in northern Bavil province . As the uh informants /- Come here Strovell . listen up please . Like the soldiers he commands Lieutenant Colonel Ross Brown suits up daily trying to rid his area of roadside bombs . Whenever you roll out of the gate and you re out there operating you never know if you re going to hit one of these or not . The first stop today is Route Tampa some of the worst stretch of highway in what is called the Triangle of Death . Salaam Alakum . where these stall owners Brown told are aware of impending attacks . Did you know in advance that the IED was going to go off here ? . Tell him to look me in the eye . . tell me that again . . It is a fine line to walk routing out information without creating new enemies battling an insurgency that kills at will , that turns civilians into accomplices . I think they re scared to death . I think they see us as temporary and they got to live with those people forever . Finding friends locally seems the toughest part of Brown s strategy . but his next task proves just as difficult . Here the Lieutenant Colonel has stopped at one of the firm bases one of the areas that Iraqis are manning their own position . The commander on duty emerges out of uniform . and the Lieutenant colonel struggles to find progress . they did nt do too much work yesterday . They did nt do too much work the day before . They have nt done too much work since they ve been here . . Brown is unsure if this unit can survive an insurgent attack uncertainties shared by the US forces as well each soldier with his own way to cope . Hey Morales you carry anything special with you on missions to help you out ? Well I carry my wedding ring a bracelet my wife sent me /- I carry a Bible Psalms ninety - one . A picture of an angel an archangel . I carry this little packet my kid gave me . See what my daughter /- Daddy I love you . I miss you . be safe . come home ASAP ? Is nt that cool ? All right let s go . Overhead helicopters are responding to an IED attack that moments ago killed Colonel William Wood the highest ranking US officer to die in combat in Iraq a personal friend of Brown s an added personal reason why tomorrow he ll be suiting up again . Eddie Shrahmin CNN northern Bavil Province .

I get the following 'PERSON' mentions:
[u'Stewart survive', u'province', u'suits up', u'are', u'strategy', u'unsure', u'carry', u'ranking US', u'personal']

Is this just because the model is making incorrect predictions, could there be some issue in the Python API, or is it that I am doing something wrong? Right now I am using the result from the pipeline as follows:
res = json.loads(pipeline.call_server(text, 'NER_ONTONOTES'))
, where text is the above text. Then I look at each constituent in res['views'][0]['viewData'][0]['constituents'] and if the label of the constituent is equal to 'PERSON', I added it to the list. I get the text for each constituent basically by:
words = text.split() for constituent in res['views'][0]['viewData'][0]['constituents'].keys(): if res['views'][0]['viewData'][0]['constituents'][constituent]['label'] == 'PERSON': start = res['views'][0]['viewData'][0]['constituents'][constituent]['start'] end = res['views'][0]['viewData'][0]['constituents'][constituent]['end'] print(words[start:end])

@danyaljj

Clarify: when I start with "init()" does it mean it adds all the views.

Update documentation regarding model download

Explain the need to download models and how to change CogComp versions.
Talk about the PyJNIUS way to run a JVM and point to the CogComp/cogcomp-nlp

create a test for pickle + unpickling a sample document

mention maven

Mention that we need maven.

why get_ner_ontonotoes is like 6% better than get_ner_conll?

Hi not an issue but I would like to know why I get an overall of 76.4% accuracy on tweet entities for get_ner_conll and 82.15% using get_ner_ontonotes? Is it something you expect to get better accuracy? What is the rationale behind this difference?

Also, if I am using the ontonotes model is it enough to cite your 'swiss army knife of nlp' paper or do I need to cite any other thing(?) that initially has invented OnToNotes?
http://christos-c.com/papers/khashabi_18_cogcompnlp.pdf

@inproceedings{KhashabiSaZhre18,
  author = {Daniel Khashabi and Mark Sammons and Ben Zhou and Tom Redman and Christos Christodoulopoulos and Vivek Srikumar and Nicholas Rizzolo and Lev Ratinov and Guanheng Luo and Quang Do and Chen-Tse Tsai and Subhro Roy and Stephen Mayhew and Zhilli Feng and John Wieting and Xiaodong Yu and Yangqiu Song and Shashank Gupta and Shyam Upadhyay and Naveen Arivazhagan and Qiang Ning and Shaoshi Ling and Dan Roth},
  title = {CogcompNLP: Your Swiss Army Knife for NLP},
  booktitle = {LREC},
  year = {2018},
  url = {papers/cogcompnlp2018.pdf},
  code = {https://github.com/CogComp/cogcomp-nlp},
  pubtype = conf
}

Also, is this sentence correct or how would you correct it?
For CogComp-NLP NER, we used Ontonotes 5.0 NER model.

Is this correct bibtex for OntoNotes 5.0 ?
@Article{weischedel2013ontonotes,
title={Ontonotes release 5.0 ldc2013t19},
author={Weischedel, Ralph and Palmer, Martha and Marcus, Mitchell and Hovy, Eduard and Pradhan, Sameer and Ramshaw, Lance and Xue, Nianwen and Taylor, Ann and Kaufman, Jeff and Franchini, Michelle and others},
journal={Linguistic Data Consortium, Philadelphia, PA},
year={2013}
}

https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf

Thanks

How to disable logging?

When annotating many small documents the output is overwhelming.
Any way to disable the output logging?

INFO:ccg_nlpy.pipeline_config:Using pipeline web server with API: http://macniece.seas.upenn.edu:400
INFO:ccg_nlpy.remote_pipeline:pipeline has been set up
INFO:ccg_nlpy.pipeline_config:Using pipeline web server with API: http://macniece.seas.upenn.edu:400
INFO:ccg_nlpy.remote_pipeline:pipeline has been set up
INFO:ccg_nlpy.pipeline_config:Using pipeline web server with API: http://macniece.seas.upenn.edu:400

Memory usage.

Also, did you get to do some basic analysis of the time taken and RAM used for simple pipeline-based annotation tasks? It would be good to document this.

Create class specific to each view

For each view (TreeView, TokenLabelView, etc) create a specialized class.

proper error message when the view does not exist.

Currently when the view doesn't exist, we don't show any errors.
We have to print proper error message if the view doesn't exist:

>>> from sioux.pipeliner import Pipeliner
>>> p = Pipeliner()
>>> doc = p.doc("Hello, how are you. I am doing fine")
>>> p.get_view(doc, "WEIRD_VIEW_NAME").getCons()
[u'', u'', u'', u'', u'', u'', u'', u'', u'', u'']
>>>

Tokens not aligned with Ontonotes Tokens

When I create a TextAnnotation for some text, the resulting tokens are not in the Ontonotes format. For instance, if I make a TextAnnotation for "Bin Laden 's", the tokens are ["Bin", "Laden", "'", "s"]. This is problematic, for instance, when I'm trying to compare the NER results that I get from the system with the gold results. Is there a way in which I can specify the list of tokens as input rather than the full text string?

Rename `get_end_pos` so that it doesn't get confused with "POS" view.

Here:
https://github.com/CogComp/sioux/blob/fd4231cc26c4aace01f356f70caeb8145e5cb407/sioux/pipeliner.py#L121-L122

Get sentences

We have sentence boundaries. Create a method for getting the sentence boundaries.

Python packaging and publishing

http://python-packaging.readthedocs.io/en/latest/minimal.html
http://docs.python-guide.org/en/latest/writing/structure/

Presenting view contains other views

I was testing with some views and found out my fix on #10 will throw out error for views like shallow parse and srl nom (which also means the fix is incomplete). And I noticed that:

The responses of those views from web server contain more than one related view (POS, lemma ...), my question is how should we present those views when user asks for them?

One of the possible ways is to have a dictionary for search view and response views:
ex. { SRL_NOM : [LEMMA, NER_CONLL, PARSE_STANFORD, POS, ...] }
and returns a list of information in response views when user want to get something from it:
ex. print(SRL_NOM)
output: LEMMA : content, NER_CONLL : content, ...

Add a minimum model version

The local pipeline hello world code was failing with the error

AttributeError: type object 'edu.illinois.cs.cogcomp.annotation.BasicTextAnnota' has no attribute 'createTextAnnotationFromListofListofTokens'

It turns out it was because the models I had in ~/.ccg_nlpy were outdated and the createTextAnnotationFromListofListofTokens method did not exist in the jar files that I had. After I redownloaded the models, the code ran successfully.

We should add a minimum required model version and check at runtime whether or not the available models are valid. If not, throw an error with instructions to download the new models.

This might have solved #106

No found Local Pipeline.

I would like help with Local Pipeline, I get the following error when I try to play the initial experiment.

INFO:ccg_nlpy.pipeline_config:Using local pipeline
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Traceback (most recent call last):
File "/home/rodriguesfas/Workspace/Text-Mining/Tool_NLP/CogComp/example-02.py", line 5, in
pipeline = local_pipeline.LocalPipeline()
File "/home/rodriguesfas/anaconda3/envs/text_minning/lib/python2.7/site-packages/ccg_nlpy/local_pipeline.py", line 62, in__init__
self.pipeline = self.PipelineFactory.buildPipelineWithAllViews(self.Boolean(True))
File "jnius/jnius_export_class.pxi", line 906, in jnius.JavaMultipleMethod.call
File "jnius/jnius_export_class.pxi", line 637, in jnius.JavaMethod.call
File "jnius/jnius_export_class.pxi", line 803, in jnius.JavaMethod.call_staticmethod
File "jnius/jnius_utils.pxi", line 93, in jnius.check_exception
JavaException: JVM exception occurred: Index: 0, Size: 0

Local_PIpeline: why protobuff?

@GHLgh @bhargav you guys remember why we are writing the output of the remote server into a proobuff file and then read it again? (instead of just returning the json output of the server).
https://github.com/CogComp/cogcomp-nlpy/blob/master/ccg_nlpy/local_pipeline.py#L70-L77

I have one guess, but I am not sure if it is correct: it is possible that we are not able to read the output of the java code directly, so we have to write it on disk, and read it again. Is that correct?

Serialization call fails to write the temporary proto

Thanks a lot for this awesome toolbox! After solving the issue #86 with the provided fix, another problem arises when running the same test example:

$ python test.py
INFO:ccg_nlpy.pipeline_config:Using local pipeline
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.6 sec].
INFO:ccg_nlpy.local_pipeline:pipeline has been set up
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    doc = pipeline.doc(d)
  File "/cluster/home/psarlin/test_cogcomp/env/lib/python3.6/site-packages/ccg_nlpy/pipeline_base.py", line 36, in doc
    response = self.call_server(text, "TOKENS")
  File "/cluster/home/psarlin/test_cogcomp/env/lib/python3.6/site-packages/ccg_nlpy/local_pipeline.py", line 75, in call_server
    with open(path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/cluster/home/psarlin/.ccg_nlpy/temp.temp'

It seems that the serialization call

cogcomp-nlpy/ccg_nlpy/local_pipeline.py

Line 70 in 5c3008d

self.ProtobufSerializer.writeToFile(text_annotation,path)

does not produce any file. Any suggestion ?

$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.4.1708 (Core)
Release:        7.4.1708
Codename:       Core
$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)
$ python --version
Python 3.6.1

Download required jars programmatically

We should be able to download jars from Maven Repositories to run our Java code.

One option is to download/bundle the jar for ivy and use that to resolve dependencies.
Other options is to but all of them on the datastore/minio and download. Resources would have to be bundled according to version of core-utilities.
Or implement our own maven client. [Problem: Recursive discovery of dependencies in all poms]

can't install ccg_nlpy (pyjnius install fails)

using Ubuntu 16.04.4, I get the following error during the pip install ccg_nlpy process:

  running build_ext
  cythoning jnius/jnius.pyx to jnius/jnius.c
  building 'jnius' extension
  creating build/temp.linux-x86_64-3.6
  creating build/temp.linux-x86_64-3.6/jnius
  gcc -pthread -B /home/mssammon/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/software/sun-jdk-1.8.0/latest-x86_64/jre/include -I/software/sun-jdk-1.8.0/latest-x86_64/jre/include/linux -I/home/mssammon/miniconda3/include/python3.6m -c jnius/jnius.c -o build/temp.linux-x86_64-3.6/jnius/jnius.o
  jnius/jnius.c:566:17: fatal error: jni.h: No such file or directory
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for pyjnius
  Running setup.py clean for pyjnius
Failed to build pyjnius
Installing collected packages: pyjnius, ccg-nlpy
  Running setup.py install for pyjnius ... error
    Complete output from command /home/mssammon/miniconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-2n162x1g/pyjnius/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ai9rbwjx-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    copying jnius_config.py -> build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/jnius
    copying jnius/signatures.py -> build/lib.linux-x86_64-3.6/jnius
    copying jnius/__init__.py -> build/lib.linux-x86_64-3.6/jnius
    copying jnius/reflect.py -> build/lib.linux-x86_64-3.6/jnius
    running build_ext
    skipping 'jnius/jnius.c' Cython extension (up-to-date)
    building 'jnius' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/jnius
    gcc -pthread -B /home/mssammon/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/software/sun-jdk-1.8.0/latest-x86_64/jre/include -I/software/sun-jdk-1.8.0/latest-x86_64/jre/include/linux -I/home/mssammon/miniconda3/include/python3.6m -c jnius/jnius.c -o build/temp.linux-x86_64-3.6/jnius/jnius.o
    jnius/jnius.c:566:17: fatal error: jni.h: No such file or directory
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/home/mssammon/miniconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-2n162x1g/pyjnius/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ai9rbwjx-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-2n162x1g/pyjnius/

Reached maximum query limit

Currently the output being shown is:

ValueError: No JSON object could be decoded

A problem about getting 'verb_srl' view with LocalPipeline.

Hi,

I want to get 'verb_srl' view with LocalPipeline.
However, I am meeting with a problem about edu.illinois.cs.cogcomp.sl.

from ccg_nlpy import local_pipeline

text = "Hello,  how are you.\n\n\n I am doing fine."

doc = [
    ["Hello", ",", "how", "are", "you", "."],
    ['I', 'am', 'doing', 'fine', '.']
]

local_p = local_pipeline.LocalPipeline()

local_tokenized_doc = local_p.doc(doc, pretokenized=True)
srl = local_tokenized_doc.get_srl_verb
print(srl)

Some error log about this problem:

Important Error Log:
... SOME LOG ...
Load trained Models.....
java.io.InvalidClassException: edu.illinois.cs.cogcomp.sl.core.SLModel; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = -2449880137698216590
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at edu.illinois.cs.cogcomp.sl.core.SLModel.loadModel(SLModel.java:81)
	at edu.illinois.cs.cogcomp.depparse.DepAnnotator.initialize(DepAnnotator.java:68)
	at edu.illinois.cs.cogcomp.annotation.Annotator.doInitialize(Annotator.java:126)
	at edu.illinois.cs.cogcomp.annotation.Annotator.<init>(Annotator.java:106)
	at edu.illinois.cs.cogcomp.depparse.DepAnnotator.<init>(DepAnnotator.java:55)
	at edu.illinois.cs.cogcomp.depparse.DepAnnotator.<init>(DepAnnotator.java:51)
	at edu.illinois.cs.cogcomp.depparse.DepAnnotator.<init>(DepAnnotator.java:41)
	at edu.illinois.cs.cogcomp.pipeline.main.PipelineFactory.buildAnnotators(PipelineFactory.java:305)
	at edu.illinois.cs.cogcomp.pipeline.main.PipelineFactory.buildPipeline(PipelineFactory.java:183)
	at edu.illinois.cs.cogcomp.pipeline.main.PipelineFactory.buildPipelineWithAllViews(PipelineFactory.java:227)
... SOME LOG ...
ERROR:ccg_nlpy.local_pipeline:Failed to add view SRL_VERB
ERROR:ccg_nlpy.local_pipeline:JVM exception occurred: edu.illinois.cs.cogcomp.sl.util.WeightVector.dotProduct(Ledu/illinois/cs/cogcomp/sl/util/IFeatureVector;I)F
INFO:ccg_nlpy.core.text_annotation:The view is the collection of the following views: ['LEMMA', 'NER_CONLL', 'PARSE_STANFORD', 'POS', 'SENTENCE', 'SHALLOW_PARSE', 'TOKENS']
Traceback (most recent call last):
  File "debug.py", line 16, in <module>
    srl = local_tokenized_doc.get_srl_verb
  File "/home/yaojie/anaconda2/envs/py36/lib/python3.6/site-packages/ccg_nlpy/core/text_annotation.py", line 165, in get_srl_verb
    return self.get_view("SRL_VERB")
  File "/home/yaojie/anaconda2/envs/py36/lib/python3.6/site-packages/ccg_nlpy/core/text_annotation.py", line 264, in get_view
    if not isinstance(self.view_dictionary[view_name], list):
KeyError: 'SRL_VERB'

JNius requirements

Add the JNIus dependency to requirements.txt and setup.py

text_annotation = self.pipeline.createBasicTextAnnotation("", "", text) AttributeError: 'NoneType' object has no attribute 'createBasicTextAnnotation'

What is the minimum working example of a code that I feed in a string and gives me the named entities?

from ccg_nlpy import local_pipeline

pipeline = local_pipeline.LocalPipeline()

d = "RT @HuffingtonPost BREAKING: Hillary Clinton wins #NVCaucus https://t.co/ZVCgIDvrX1"

doc = pipeline.doc(d)

if doc is not None:
    # do sth with it
    ner_view = doc.get_ner_conll

For above code, I get the following error:

$ python ccg_ner.py
WARNING:ccg_nlpy.pipeline_config:Models not found. To use pipeline locally, please refer the documentation for downloading models.
INFO:ccg_nlpy.pipeline_config:Using local pipeline
ERROR:ccg_nlpy.local_pipeline:Fail to load models, please check if your Java version is up to date.
Traceback (most recent call last):
  File "ccg_ner.py", line 7, in <module>
    doc = pipeline.doc(d)
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/ccg_nlpy/pipeline_base.py", line 36, in doc
    response = self.call_server(text, "TOKENS")
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/ccg_nlpy/local_pipeline.py", line 63, in call_server
    text_annotation = self.pipeline.createBasicTextAnnotation("", "", text)
AttributeError: 'NoneType' object has no attribute 'createBasicTextAnnotation'

My server is running on 02:13:57 INFO LabeledChuLiuEdmondsDecoder:72 - Loading cached PoS-to-dep dictionary from deprels.dict
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.4 sec].
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.7 sec].
02:14:01 INFO MainServer:67 - Done with loading the pipeline . . .
02:14:01 INFO MainServer:227 - ##### Used Memory[MB]:1532
02:14:01 INFO MainServer:230 - / Free Memory[MB]:1027
02:14:01 INFO MainServer:233 - / Total Memory[MB]:2560
02:14:01 INFO MainServer:236 - / Max Memory[MB]:31858
02:14:01 INFO log:186 - Logging initialized @120702ms
02:14:01 INFO EmbeddedJettyServer:126 - == Spark has ignited ...
02:14:01 INFO EmbeddedJettyServer:127 - >> Listening on 0.0.0.0:8080
02:14:01 INFO Server:345 - jetty-9.3.6.v20151106
02:14:01 INFO ServerConnector:270 - Started ServerConnector@67e0a155{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
02:14:01 INFO Server:397 - Started @120785ms

Config parser for python

https://docs.python.org/3/library/configparser.html

Not everything should be a property in TextAnnotation

ta.get_pos and ta.get_ner should be vanilla methods instead of being properties.

For properties, I suggest have ta.pos and ta.ner if we want that functionality.

NER_CONLL nor working ....

This view doesn't work:

>>> p.get_view(doc, "NER_CONLL").getCons()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sioux/core/view.py", line 29, in getCons
    for constituent in self.viewJson["viewData"][0]["constituents"]:
KeyError: 'constituents'

Failed to load models, Class not found

I encountered this error when running with java 1.8

ERROR:ccg_nlpy.local_pipeline:Fail to load models, please check if your Java version is u
p to date.
ERROR:ccg_nlpy.local_pipeline:Class not found b'edu/illinois/cs/cogcomp/pipeline/main/Pip
elineFactory'

Code Coverage on Testing

I think we should have code coverage for testing to make sure unit tests cover everything.

This is the library I found and we can see how to incorporate it into our tests.

And related stackoverflow post:
http://stackoverflow.com/questions/34728734/python-unit-test-coverage

NER problem

I am trying to run NER_ONTONOTES on the following text:

But we begin tonight with an ongoing mystery . Who is that lone man discovered last month frozen in California 's Sierra Nevada mountains with just a few tantalizing clues to his identity ? This we do know at this hour . The man died more than sixty years ago . He was in the military apparently a World War Two airman . And while it looks like his plane crashed in the mountains so did twenty - five other planes around the same time . Thelma Gutierrez went inside the forensic laboratory where scientists are trying to solve this mystery . An address book a plastic comb a vintage penny . You 're looking at the last things a young airman put into his pockets on the day he died clues to a World War Two cold case that you 're about to see for the very first time . It is a mystery that begins high in the Sierra Nevada Mountains at the bottom of a glacier . Two weeks ago climbers discovered a frozen man face down in the snow still in his army air force uniform and an unopened silk parachute . After six decades the airman is exhumed from his icy tomb and thawed out . but he 's wearing no military dog tags or ID . Did this World War Two pilot perish when his training flight crashed in the mountains like twenty - five other ill fated flights more than sixty years ago ? The search for clues takes us to Honolulu Hawaii to the joint powmia accounting command or jpac . The mystery of the frozen airman is just one of more than a thousand different unsolved cases that scientists here at jpac are trying to solve . In this laboratory alone I 'm surrounded by the remains of at least twenty different service members who are in the process of being identified so that they too can go home . The investigation begins with a team of forensic specialists who probe and study the airman 's bones teeth and his belongings to piece together who he is . And almost immediately clues begin to surface . Dr. Robert Mann a forensic anthropologist has determined that the airman was Caucasian and had fair hair . Next though I 'm going to have to look at his clavicles . The airman 's collar bones and pelvic bones prove that he was in his twenties and died in an airplane crash . This is a person who likely died on impact versus perhaps freezing to death up in the mountains . I think that uh the injuries were so substantial um and severe that he would n't have felt anything . He would have died immediately . One two /- Another important clue . He has a significant number of uh fillings . Like the bones xrays of his wisdom teeth also tell us something about his age . Root tips are closed . they 're sealed up which is more indicative of someone who would be at least twenty - one years old . And Dr. Andy Henry notices something else . The airmen have straight teeth . so he had a nice smile good teeth . Yes I would have to say that yes . Then there are the material clues the things he had on him when he died that offer a snapshot into who he could have been . If there 's a badge or anything else that /- We know he was wearing a World War Two army air force uniform . Remnants of his sweater undergarments and socks are still intact . And more clues emerge from his tattered uniform a corroded nameplate this pin on his collar and this army aircorps insignia . When I found these insignia I was happy to see them . And our young white - haired airman also carried this black comb and some pocket change forty - five cents worth . Some of these dimes are in ranging from nineteen thirty - six to nineteen forty - two . In his uniform breast pocket Dr. Paul Emanovsky found this vintage Schafer pen and three small leatherbound address books . The pages have been decomposing . but could they contain names of friends and loved ones ? And we 're going to put it in a spectral comparator . At first nothing . then like magic clues begin to emerge . Uh you can see all these letters from the calendar Sunday Monday Tuesday up at the top one nine four and two . Nineteen forty - two . Yeah . After hours of meticulous examination of each address book they yield no personal information clues that could have faded with time . And so while we still do not know who our twenty - something fair - haired airman is enormous progress has been made . Out of the thousands of unidentified World War Two service members Dr. Mann says they 've narrowed it down to just ten . So what was it like to grow up there ? In Pleasant Grove we recently met these three sisters all in their eighties who have high hopes that the frozen airman proves to be their big brother Glen Munn whose plane went missing in the Sierra back in nineteen forty - two . Oh I just wanted you know to know that he was found and that we can have him brought home here for burial and /- We do n't know that though . We do n't . but that 's my wishes . And until they learn otherwise they say they will keep that hope alive . And in the weeks and months ahead scientists are convinced they will identify this airman and return him home to his family wherever they might be . Thelma Gutierrez CNN Honolulu Hawaii .

Here is what I get from running pipeline.call_server(text, "NER_ONTONOTES") and converting the result to a dictionary (where text is the text above):
{u'corpusId': u'',
u'id': u'',
u'sentences': {u'generator': u'UserSpecified',
u'score': 1.0,
u'sentenceEndPositions': [1]},
u'text': u'\xff\xff\xff\xff',
u'tokenOffsets': [{u'endCharOffset': 4,
u'form': u'\xff\xff\xff\xff',
u'startCharOffset': 0}],
u'tokens': [u'\xff\xff\xff\xff'],
u'views': [{u'viewData': [{u'generator': u'NER_ONTONOTES-annotator',
u'score': 1.0,
u'viewName': u'NER_ONTONOTES',
u'viewType': u'edu.illinois.cs.cogcomp.core.datastructures.textannotation.SpanLabelView'}],
u'viewName': u'NER_ONTONOTES'},
{u'viewData': [{u'constituents': [{u'end': 1,
u'label': u'',
u'score': 1.0,
u'start': 0}],
u'generator': u'UserSpecified',
u'score': 1.0,
u'viewName': u'TOKENS',
u'viewType': u'edu.illinois.cs.cogcomp.core.datastructures.textannotation.TokenLabelView'}],
u'viewName': u'TOKENS'}]}

Please note that I am using Python 2.7 in a virtualenv. When I tried using Python 3.4 outside a virtualenv, this worked, but I need to use Python 2.7 for my application. It doesn't work when I try using Python 2.7 outside a virtualenv either. Thank you!
@danyaljj (I'm guessing I should tag you, but please let me know if I shouldn't.)

examples

Find all the named entities with label "PER" followed by POS=V.
Find a bunch of documents mentioning both Obama and Trump (or just trump), analyze the distribution of name-entities,

Show relations in the view.

Here in the view:
https://github.com/CogComp/sioux/blob/fb67e448fcde5010413e2ce7dded54b95461be51/sioux/core/view.py#L7-L17

We only print the constituents; while we should also print the relations too. Maybe like this:

relation-label (source-cons-label, target-cons-label)

For example for parse tree:

det(construction-2, The-1)
nsubj(finished-8, construction-2)
case(library-7, of-3)
det(library-7, the-4)
compound(library-7, John-5)
compound(library-7, Smith-6)
nmod(construction-2, library-7)
root(ROOT-0, finished-8)
case(time-10, on-9)
nmod(finished-8, time-10)

In both cases, the user will need a check (check for key VS. check for empty list) and I not sure which is more standard. IMO, returning an empty list is more intuitive.

What ever you think is correct. @danyaljj

add functionality to get overlapping constituents

Suppose you choose a constituent from dependency view.
How can you get its lemma?
We have to implement similar functionalities like the ones we have in our java library.
https://github.com/CogComp/cogcomp-nlp/blob/35dea894ea9b02d1158dafec53c941c4e40b7547/core-utilities/src/main/java/edu/illinois/cs/cogcomp/core/datastructures/textannotation/View.java#L400

cogcomp / cogcomp-nlpy Goto Github PK

cogcomp-nlpy's Issues

Recommend Projects

Recommend Topics

Recommend Org