deep-coref's People

Contributors

angledluffa, clarkkev, cruncher64

deep-coref's Issues

Memory requirement for training on the conll-2012 corpus

Hi, I am trying to train your model on an AWS p2.xlarge instance (with a 12 GB K80 GPU) on the CoNLL-2012 corpus (2,802 documents in the training set). Training eats all 64 GB of RAM before 30% of the first epoch is done and gets killed before finishing it.

I was wondering what type of machine you trained it on?
Is 64 GB of RAM too small for training on the CoNLL corpus?

Hello clarkkev

I am a master's student at ECNU (China), and my field is coreference resolution. I am reading your papers now, and they are very enlightening to me. Thank you for your work. Last week I translated your papers into my slides, and I will explain your paper to my classmates.
If you don't mind, could you give me your email in private? I hope to get to know you and have a chance to communicate with you. My email is [email protected] or [email protected]

Thank you!

Error: A JNI error has occurred, please check your installation and try again

Hi, I'm trying to train a new model.

My console command:

java -Xmx5g -cp stanford-corenlp.jar:stanford-corenlp-models-current.jar:* edu.stanford.nlp.coref.neural.NeuralCorefDataExporter /home/PC/Desktop/Files/CoreNLP/src/edu/stanford/nlp/coref/properties/neural-turkish-conll.properties .

Here is my properties file:

coref.algorithm = neural
coref.conll = true

coref.data = /home/PC/Desktop/Files/CoreNLP/input.txt.conll
coref.conllOutputPath = /home/PC/Desktop/Files/CoreNLP
coref.scorer = /home/PC/Downloads/cc.tr.300.vec

Error Output:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: javax/json/JsonValue
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: javax.json.JsonValue
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more

What causes the error? How can I solve this?
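
A likely cause, judging from the stack trace and from the working command shown in a later issue on this page (which puts javax.json.jar first on the classpath): the javax.json classes the exporter needs are missing from the classpath. A hedged fix, assuming javax.json.jar sits in the current directory:

java -Xmx5g -cp javax.json.jar:stanford-corenlp.jar:stanford-corenlp-models-current.jar:* edu.stanford.nlp.coref.neural.NeuralCorefDataExporter /home/PC/Desktop/Files/CoreNLP/src/edu/stanford/nlp/coref/properties/neural-turkish-conll.properties .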

example_file.txt file format needed to run an already-trained model

I am trying to run an already-trained model. Could you provide the format of the example_file.txt file?

java -Xmx5g -cp stanford-corenlp-3.7.0.jar:stanford-corenlp-models-3.7.0.jar:* edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,mention,coref -coref.algorithm neural -file example_file.txt
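
For reference (standard CoreNLP behavior, not something specific to this repo): the -file argument expects an ordinary plain-text document; the tokenize and ssplit annotators handle segmentation. A minimal example_file.txt could contain, for instance:

John Smith went to the store. He bought some milk for his coffee.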

Error: Attempted to fetch annotator "parse" but the annotator pool does not store any such type!

Hello, I am studying the paper "Deep Reinforcement Learning for Mention-Ranking Coreference Models" and following the steps to train my own model. But in the third step (running the NeuralCorefDataExporter class in the development version of Stanford's CoreNLP), I run into this error. I cannot find a solution on Google. Thank you!

This is my neural-english-conll.properties:
coref.algorithm = neural
coref.conll = true
coref.data = /home/hengru/deep-coref/conll-2012
coref.conllOutputPath = /home/hengru/deep-coref/data/data_out
coref.scorer = /home/hengru/deep-coref/data/scorer

This is the error:
hengru@dc6:~/CoreNLP/target$ java -Xmx2g -cp javax.json.jar:stanford-corenlp-3.7.0.jar:stanford-corenlp-models-3.7.0.jar:* edu.stanford.nlp.coref.neural.NeuralCorefDataExporter neural-english-conll.properties /home/hengru/deep-coref/data/data_out/
Jul 25, 2017 6:52:56 PM edu.stanford.nlp.coref.docreader.CoNLLDocumentReader
INFO: Reading 1940 CoNLL files from /home/hengru/deep-coref/conll-2012/v4/data/train/data/english/annotations/
Adding annotator lemma
Adding annotator mention
Using mention detector type: rule
Attempted to fetch annotator "parse" but the annotator pool does not store any such type!
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.coref.md.CorefMentionFinder.parse(CorefMentionFinder.java:646)
at edu.stanford.nlp.coref.md.CorefMentionFinder.findSyntacticHead(CorefMentionFinder.java:536)
at edu.stanford.nlp.coref.md.CorefMentionFinder.findHead(CorefMentionFinder.java:456)
at edu.stanford.nlp.coref.md.RuleBasedCorefMentionFinder.findMentions(RuleBasedCorefMentionFinder.java:100)
at edu.stanford.nlp.pipeline.MentionAnnotator.annotate(MentionAnnotator.java:102)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:648)
at edu.stanford.nlp.coref.data.DocumentMaker.nextDoc(DocumentMaker.java:145)
at edu.stanford.nlp.coref.CorefDocumentProcessor.run(CorefDocumentProcessor.java:40)
at edu.stanford.nlp.coref.CorefDocumentProcessor.run(CorefDocumentProcessor.java:25)
at edu.stanford.nlp.coref.neural.NeuralCorefDataExporter.exportData(NeuralCorefDataExporter.java:182)
at edu.stanford.nlp.coref.neural.NeuralCorefDataExporter.main(NeuralCorefDataExporter.java:189)

Need a detailed guide on how to train the model

"Run the NeuralCorefDataExporter class in version Stanford's CoreNLP using the neural-coref-conll properties file. This does mention detection and feature extraction on the CoNLL data and then outputs the results as json"

How do I do this? Which command should I run, and with which parameters?
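
For reference, a later issue on this page shows a working invocation; a hedged sketch of the command (jar names and paths must be adapted to your setup, and <output-dir> is a placeholder):

java -Xmx2g -cp javax.json.jar:stanford-corenlp-3.7.0.jar:stanford-corenlp-models-3.7.0.jar:* edu.stanford.nlp.coref.neural.NeuralCorefDataExporter neural-english-conll.properties <output-dir>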

ssplit.eolonly raises NullPointerException at edu.stanford.nlp.pipeline.NERCombinerAnnotator

So, basically we have an already-tokenised corpus with gold sentence segmentation, which we want to preserve. Eventually, we found these parameters:
tokenize.whitespace = true
ssplit.eolonly = true

They work fine together with tokenize,ssplit,pos,lemma and parse, but if we want to pass all the annotators needed for coreference resolution

annotators = tokenize,ssplit,pos,lemma,ner,parse,coref

we get a NullPointerException, specifically in the NER annotation step.

Processing file /Users/nikahelicopter/Dropbox/data/new_gold/txt/xx00.txt ... writing to /Users/nikahelicopter/Downloads/stanford-corenlp-full-2018-10-05/xx00.txt.xml
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(NERCombinerAnnotator.java:322)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:637)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:647)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1226)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1060)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1326)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1389)

We use stanford-corenlp-full-2018-10-05 (version 3.9.2).
An example file:
xx01.txt

Parameters:
annotators = tokenize,ssplit,pos,lemma,ner,parse,coref
tokenize.whitespace = true
ssplit.eolonly = true
coref.algorithm = neural
file = /Users/nikahelicopter/Dropbox/data/new_gold/txt/xx00.txt
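
For completeness, these properties would be passed to the pipeline with the standard -props flag (the properties filename here is hypothetical; since file is set inside the properties, no separate -file argument is needed):

java -Xmx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props coref.properties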

Calling NeuralCorefDataExporter

Hi there, in the latest CoreNLP, I found this class under edu.stanford.nlp.coref.neural.NeuralCorefDataExporter instead of edu.stanford.nlp.coref.NeuralCorefDataExporter.

Minh

Error: A JNI error has occurred

Can you please help? We are getting the error below.

~/softwares/CoreNLP-master$ java -Xmx2g -cp "stanford-corenlp.jar:stanford-corenlp-models-3.7.0.jar:*" edu.stanford.nlp.coref.neural.NeuralCorefDataExporter coref.properties "/home/development/devendra/deep-coref-master/output"
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: javax/json/JsonValue
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: javax.json.JsonValue
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more

This is the text from the properties file:

coref.algorithm = neural
coref.conll = true

coref.data = /home/devendra/Downloads/Internship+MTP/MTP_Stage_II/deep-coref-master/gold_conll/
coref.conllOutputPath = /home/devendra/Downloads/Internship+MTP/MTP_Stage_II/deep-coref-master/output/

Getting "ImportError: cannot import name Graph" while training the model

I am getting the error below while training the model, following the given steps. It seems this code uses Graph, which is no longer supported by newer versions of Keras. Can you please help me fix the problem?

devendra@krishna:~/deep-coref-master_2.7$ python run_all.py 
Using Theano backend.
Traceback (most recent call last):
  File "run_all.py", line 5, in <module>
    import clustering_learning
  File "/home/development/devendra/deep-coref-master_2.7/clustering_learning.py", line 2, in <module>
    import clustering_models
  File "/home/development/devendra/deep-coref-master_2.7/clustering_models.py", line 2, in <module>
    import pairwise_models
  File "/home/development/devendra/deep-coref-master_2.7/pairwise_models.py", line 9, in <module>
    from keras.models import Graph
ImportError: cannot import name Graph
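
A hedged note: keras.models.Graph belongs to the old pre-1.0 Keras API and was removed in later releases, so this import fails on a modern Keras install. One workaround sketch is to pin Keras (and the Theano backend the traceback mentions) to an old release in a dedicated virtualenv; the exact versions are an assumption here and should be checked against the repo's README:

pip install "keras==0.3.3" theano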

Exception when training my own model

Hello,

I'm trying to reproduce the training and unfortunately I'm running into an exception during the process. I extracted the features in JSON using the NeuralCorefDataExporter, and then ran the Python code with python run_all.py. After a moment I get the following exception:

Loading data
Traceback (most recent call last):
  File "run_all.py", line 93, in <module>
    train_best_model()
  File "run_all.py", line 88, in train_best_model
    train_and_test_pairwise(model_properties.MentionRankingProps(), mode='reward_rescaling')
  File "run_all.py", line 68, in train_and_test_pairwise
    train_pairwise(model_props, mode=mode)
  File "run_all.py", line 59, in train_pairwise
    pretrain(model_props)
  File "run_all.py", line 33, in pretrain
    pairwise_learning.train(model_props, n_epochs=150)
  File "/opt/deep-coref/pairwise_learning.py", line 313, in train
    model_props, with_ids=True)
  File "/opt/deep-coref/datasets.py", line 309, in __init__
    for ana in range(0, me - ms)])
ValueError: need at least one array to concatenate

After checking the JSON from train, dev and test, it appears they contain empty features like:

{"mentions":{},"labels":{},"pair_feature_names":["same-speaker","antecedent-is-mention-speaker","mention-is-antecedent-speaker","relaxed-head-match","exact-string-match","relaxed-string-match"],"pair_features":{},"document_features":{"type":1,"source":"wb"}}

I suppose this is normal: if there is no coreference, there is nothing to extract. So I edited the code in datasets.py to print the content of doc_mentions, and indeed the value me - ms can be 0, as the content looks like:

[[0   117]
 [117   305]
 [305  522]
 ...,
 [71818 71818]
 [71818 71818]
 [71818 71818]]

I certainly did something wrong in my process, but I don't see what. Any help would be appreciated.

Thanks!
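
As a hedged workaround sketch (not the repo's actual fix): the comprehension in datasets.py apparently builds one array per anaphor and then concatenates them, so a document where me == ms contributes an empty list and np.concatenate fails. Guarding against the empty case would look roughly like this; build_row is a hypothetical stand-in for whatever datasets.py constructs per anaphor:

import numpy as np

def concat_anaphor_rows(doc_mentions, build_row):
    # doc_mentions holds (mention_start, mention_end) pairs per document,
    # as in the printout above; rows is empty when every me == ms.
    rows = [build_row(ms, me, ana)
            for ms, me in doc_mentions
            for ana in range(0, me - ms)]
    if not rows:
        # avoids "ValueError: need at least one array to concatenate"
        return np.empty((0,), dtype=np.int64)
    return np.concatenate(rows)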

String matching features?

I'm reading Clark and Manning (2016), and the list of features includes "String Matching Features: Head match, exact string match, and partial string match."

I looked at datasets.py but can't pinpoint where those features are implemented.

Could anybody help me?

Clark, K., & Manning, C. D. (2016). Improving Coreference Resolution by Learning Entity-Level Distributed Representations. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 643–653. http://doi.org/10.18653/v1/P16-1061

cannot find edu/stanford/nlp/models/

I just followed the example and got an error message:

java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL

There is no "models" subdirectory.
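
A hedged note: english-left3words-distsim.tagger ships inside the stanford-corenlp-models jar rather than as a directory on disk, so this error usually means the models jar is missing from the classpath. For example:

java -cp stanford-corenlp-3.7.0.jar:stanford-corenlp-models-3.7.0.jar:* edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos -file example_file.txt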

Availability of pre-trained model

Hello @clarkkev ,

First of all, thanks for pushing this code to GitHub. As training this model takes 7 days on a GTX TITAN, could you please link to or push the pre-trained model so that it can be used for transfer learning?

Thanks.

Dev CoNLL score converges in the first few epochs

When I train the model on the CoNLL-2012 data, the dev CoNLL score is 0.657888797045302 at the first epoch and remains stable around 0.658. This is even better than your reported result of 0.657.

The dev conll score of the first 10 epochs:
(0, 0.657888797045302)
(1, 0.6585094296994586)
(2, 0.6582570233258541)
(3, 0.6582991986114178)
(4, 0.6590045611564873)
(5, 0.6580189901300686)
(6, 0.6585166713504456)
(7, 0.6574807587739219)
(8, 0.660184576001444)
(9, 0.6592850994459925)
(10, 0.6583200324261912)

All I did was follow the 4 steps in the instructions. Do you have any idea why this happens?
