
giveme5w1h's People

Contributors

bkrrrr, dependabot[bot], fhamborg, lgov


giveme5w1h's Issues

Other languages compatibility?

I'd like to know what is needed to adapt it to another language (e.g., Brazilian Portuguese).
Or is it possible to integrate it with Polyglot, which already implements many NLTK-compatible languages?

Results are not encouraging

Describe the bug
The output of who did what to whom, when, where, and how is not really encouraging:
Who - should not have NER tag O
Whom - should not have NER tag O
Where - should not have NER tag O

Also, when run on a whole news article the results do not make sense. Running on individual sentences produces better results.

To Reproduce
After filtering by NER tag on Who, Whom, Where, etc., the results below are achieved.
Run on the following sample text to see the output.
Running the extractor sentence by sentence generates slightly better results but does not capture all the events properly.
Article 1:
Indian Navy Commander Abhilash Tomy, who suffered a severe back injury in September after his yacht was hit by a vicious storm with 14-metre-high waves mid-way across south Indian Ocean, says it will take him another couple of years to fully recover from the injury.

He was participating in the Golden Globe Race 2018(GGR) representing India in the historic race without modern navigation aids. He had to drop out of the competition due to the severity of the hit and was rescued after three days from Indian Ocean.

In an exclusive interview to CNN-News18, Commander Tomy says the experience of going through the storm was a once-in-a-lifetime moment and a complete ‘paisa vasool’. Excerpts:

Q. Yours has been an incredible journey. How does it feel to be alive?

It's good.

Not yet, not completely. My neurosurgeon has said that I've recovered, but I think I'm not back to being normal yet. It'll take another couple of years to be normal and then I'll start working on my fitness.

Q. You're the only Asian to have competed in the Golden Globe. Do you remember the moment when it happened? What you were thinking back then?

Very well. I've been through storms before…I usually look forward to storms. It's a once-in-a-life moment. You get a 30 knot storm almost every week, but a 70 knot storm is...

Q. It doesn't sound fun at all. Commander Tomy: For me, it's paisa vasool. I get my money back if I go through a storm like that. Otherwise I can always stay at home, you know what I mean. We were given some sort of warning about the storm. Nobody had any idea that this would turn out to be so bad. So the night before the 20th, I prepared the boat, took the main sail down and latched it completely. I checked the mast, checked all the split pins, I checked for cracks, damages or anything that could go wrong. I did full inspection of the boat and was prepared for the storm the next day.

When you're in a storm like this, you adjust the boat in the direction of the waves so that the boat would sail. With waves coming from two directions, it became very difficult to judge which the better way to put the boat is. If you put it in one direction, one wave slaps you. The other way round, the other wave slaps you. The first wave got to me close to noon (India time). The boat had a knock down, its boom broke. I went inside and it was a complete mess, everything was thrown out everywhere. I put things back in place ... the gas was leaking and I put the galley back in place, switched off the gas.

The side glass broke and diesel was leaking out, float boats were stuck on the roof. I put everything back, all the charts, everything back in place. Then I went out and started sailing the boat again and there was a second knock down and I found myself on top of the mast. I fell from there on top of the boom. The mast was about nine meters high and the waves were 15 meters high. I don't know by what height I fell. But I landed on the boom, fell on the deck and I thought there was something wrong with my back. I went inside and again started cleaning up the mess. Finally, after 30 minutes, I stood up and my knees were not obeying me. I was collapsing. I thought I would lie down for some time and maybe after 30 minutes I'll sail again. But it didn't improve. So I pulled myself on to the bunk and secured myself there.

Q. What happened in those three days when you were in the ocean all by yourself? It was like a black hole...how did you survive? What do you drink, eat? What was going on in your mind?

I'm lying in my bunk and the race organisers (are) asking questions and they want me to activate my EPER, which is the emergency processing indication radiator beacon. I don't know the extent of my injury — I think it's the lower back as my back has become stiff. And I decide to wait for a day, 24 hours. And maybe if the back is better, I can take the boat to Mauritius or to Australia.

Q. At any point during those three days did you think, ‘Ab toh main gaya’ (I may not survive)?

Not a chance. I'm a reconnaissance pilot in the navy. I've done this exercise so many times — gone looking for survivors, been the survivor, you know, in simulated drills. I know how this plays out.

Q. You clearly have recovered from the trauma. Has your family given an ultimatum of no more sailing?

There’s nothing like that. In fact, I was speaking to my wife one day and was kind of testing the waters. I told her, ‘Five years ago if I would have introduced myself, say as an IT engineer or something like that, would you have married me?’ She said no. I told her, ‘See, I'm a sailor, that's why you like me. So maybe I should go back to sailing’. Then she got the drift and said, ‘You get better, get fit, then maybe you can’.

Q. What about your mom?

We were having this conversation day before yesterday. She asked me, ‘I hope you're not planning to do something like this again’. And I said, ‘Maybe if I get fit I could think of these things’. And mom said that's

Output:
-- who --
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
('Tomy', 1.0)
-- what --
('was hit by a vicious storm', 1.0)
('has said that I', 1.0)
("'ve recovered", 1.0)
('think I', 1.0)
("'m not back to being normal yet", 1.0)
('was participating in the Golden Globe Race 2018 in the historic race without modern navigation aids', 1.0)
('will take him', 1.0)
("'ll take another couple and then I", 1.0)
("'ll start working on my fitness", 1.0)
('had to drop out of the competition and was rescued after three days', 1.0)
('has been an incredible journey', 1.0)
("'s good", 1.0)
("'ve been through storms before … I", 1.0)
('look forward to storms', 1.0)
("'re the only Asian", 1.0)
('happened', 1.0)
('get my money', 1.0)
('go through a storm like that', 1.0)
('were thinking back then', 1.0)
('can always stay at home', 1.0)
('mean', 1.0)
("'s a once-in-a-life moment", 1.0)
('prepared the boat , took the main sail and latched it', 1.0)
('get a 30 knot storm', 1.0)
('checked the mast , checked all the split pins , I', 1.0)
('checked for cracks', 1.0)
("does n't sound fun", 1.0)
('did full inspection and was prepared for the storm', 1.0)
("'s paisa vasool", 1.0)
('know what I', 1.0)
('went inside', 1.0)
('put things', 1.0)
('put the galley', 1.0)
('would sail', 1.0)
("'re in a storm", 1.0)
('adjust the boat', 1.0)
('became very difficult to judge', 1.0)
('put everything', 1.0)
('put it in one direction', 1.0)
('went out and started sailing the boat and there was a second knock down and I', 1.0)
('found myself on top of the mast', 1.0)
('slaps you', 1.0)
('fell from there', 1.0)
('had a knock down', 1.0)
('broke', 1.0)
('got to me close to noon ( India time', 1.0)
("do n't know by what height I", 1.0)
('fell', 1.0)
('was a complete mess', 1.0)
('landed on the boom , fell on the deck and I', 1.0)
('thought there', 1.0)
('was thrown out everywhere', 1.0)
('went inside and again started cleaning up the mess', 1.0)
('was leaking', 1.0)
('stood up', 1.0)
('were not obeying me', 1.0)
('was collapsing', 1.0)
('thought I', 1.0)
('would lie down for some time and maybe after 30 minutes', 1.0)
("'ll sail again", 1.0)
('pulled myself on to the bunk and secured myself', 1.0)
('was about nine meters', 1.0)
("'m lying in my bunk", 1.0)
('to activate my EPER', 1.0)
("do n't know the extent", 1.0)
('think it', 1.0)
('has become stiff', 1.0)
('decide to wait for a day , 24 hours', 1.0)
('were in the ocean all by yourself', 1.0)
('can take the boat or to Australia', 1.0)
('may not survive', 1.0)
("'m a reconnaissance pilot", 1.0)
("'ve done this exercise", 1.0)
('know how this', 1.0)
("'s the lower back as my back", 1.0)
('is better', 1.0)
('was speaking to my wife and was kind', 1.0)
('told her', 1.0)
('would have introduced myself', 1.0)
("'m a sailor", 1.0)
('have recovered from the trauma', 1.0)
('should go back to sailing', 1.0)
('hope you', 1.0)
('get fit I', 1.0)
('could think of these things ’', 1.0)
('get better , get fit', 1.0)
('can ’', 1.0)
('got the drift and said , ‘ You , then maybe you', 1.0)
("'re not planning to do something again", 1.0)
('asked me', 1.0)
('expects of me', 1.0)
-- where --
('Indian Ocean', 0.7)
('India', 0.7)
-- why --
('his yacht', 0.5860000000000001)
('it', 0.5860000000000001)
('He', 0.5860000000000001)
('Commander Tomy', 0.5860000000000001)
('the experience of going through the storm', 0.5860000000000001)
('Q. Yours', 0.5860000000000001)
('you', 0.5860000000000001)
('I', 0.5860000000000001)
('It', 0.5860000000000001)
('You', 0.5860000000000001)
('We', 0.5860000000000001)
('The mast', 0.5860000000000001)
('there', 0.5860000000000001)
('me', 0.5860000000000001)
-- how --
('a severe back injury in September after his yacht was', 1.0)
('a vicious storm with 14-metre-high waves mid-way across south Indian', 1.0)
('with 14-metre-high waves mid-way across south Indian Ocean , says', 1.0)
('waves mid-way across south Indian Ocean , says it will', 1.0)
('across south Indian Ocean , says it will take him', 1.0)
('to fully recover from the injury .', 1.0)
('the historic race without modern navigation aids .', 1.0)
('without modern navigation aids .', 1.0)
('from Indian Ocean .', 1.0)
('not back to being normal yet .', 1.0)
('competition due to the severity of the hit and was', 1.0)
('a once-in-a-lifetime moment and a complete ‘ paisa vasool ’', 1.0)
('a complete ‘ paisa vasool ’ .', 1.0)
('an exclusive interview to CNN-News18 , Commander Tomy says the', 1.0)
('an incredible journey .', 1.0)
('be alive ?', 1.0)
('thinking back then ?', 1.0)
("'s good .", 1.0)
('Not yet , not completely .', 1.0)
('not completely .', 1.0)
('being normal yet .', 1.0)
('normal yet .', 1.0)
("be normal and then I 'll start working on my", 1.0)
('only Asian to have competed in the Golden Globe .', 1.0)
('money back if I go through a storm like that', 1.0)
('Very well .', 1.0)
('I usually look forward to storms .', 1.0)
('look forward to storms .', 1.0)
('a once-in-a-life moment .', 1.0)
('storm almost every week , but a 70 knot storm', 1.0)
('Otherwise I can always stay at home , you know', 1.0)
('can always stay at home , you know what I', 1.0)
('so bad .', 1.0)
('the main sail down and latched it completely .', 1.0)
('it completely .', 1.0)
('go wrong .', 1.0)
('did full inspection of the boat and was prepared for', 1.0)
('very difficult to judge which the better way to put', 1.0)
('the better way to put the boat is .', 1.0)
('things back in place ... the gas was leaking and', 1.0)
('galley back in place , switched off the gas .', 1.0)
('everything back , all the charts , everything back in', 1.0)
('everything back in place .', 1.0)
('went inside and it was a complete mess , everything', 1.0)
('a complete mess , everything was thrown out everywhere .', 1.0)
('out everywhere .', 1.0)
('meters high and the waves were 15 meters high .', 1.0)
('meters high .', 1.0)
('something wrong with my back .', 1.0)
('went inside and again started cleaning up the mess .', 1.0)
("and maybe after 30 minutes I 'll sail again .", 1.0)
('Finally , after 30 minutes , I stood up and', 1.0)
('the lower back as my back has become stiff .', 1.0)
('lower back as my back has become stiff .', 1.0)
('a black hole ... how did you survive ?', 1.0)
('And maybe if the back is better , I can', 1.0)
('become stiff .', 1.0)
('I pulled myself on to the bunk and secured myself', 1.0)
('maybe I should go back to sailing', 1.0)
('is better , I can take the boat to Mauritius', 1.0)
("me , it 's paisa vasool", 1.0)
('toh main gaya ’ ( I may not survive )', 1.0)
("I 've recovered , but I think I 'm not", 1.0)
('so many times — gone looking for survivors , been', 1.0)
('in simulated drills .', 1.0)
('You clearly have recovered from the trauma .', 1.0)
('So maybe I should go back to sailing ’ .', 1.0)
('go back to sailing ’ .', 1.0)
('get better , get fit , then maybe you can', 1.0)
('then maybe you can ’ .', 1.0)
('‘ Maybe if I get fit I could think of', 1.0)
('get fit I could think of these things ’ .', 1.0)
("'s exactly what she expects of me .", 1.0)
('the', 1.0)
('this would turn out to be so bad', 1.0)
('some time and maybe after', 1.0)
('the storm', 1.0)
('cracks , damages or anything that could go wrong', 1.0)
('survivors , been the survivor , you know , in', 1.0)
('his yacht was hit by a vicious storm with 14-metre-high', 1.0)
('my back has become stiff', 1.0)
('an IT engineer or something like that , would you', 1.0)
('that the boat would sail', 1.0)

Expected behavior
Expecting a more meaningful who-did-what-to-whom listing, one event at a time.


Versions (please complete the following information):
OS: MacOS 10.13.6
Python Version 3.6

AttributeError: 'ParentedTree' object has no attribute 'unicode_repr'

Describe the bug:
In the script cause_extractor.py, line 131 calls unicode_repr() while trying to find an NP-VP-NP clause; this is no longer a method of NLTK's ParentedTree and thus results in an error. Would there be an alternate way to do this without using unicode_repr()?

Log
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/td/.local/lib/python3.6/site-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
extractor.process(document)
File "/home/td/.local/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 40, in process
self._extract_candidates(document)
File "/home/td/.local/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/cause_extractor.py", line 92, in _extract_candidates
for candidate in self._evaluate_tree(tree):
File "/home/td/.local/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/cause_extractor.py", line 131, in _evaluate_tree
if sibling.label() == 'VP' and "('NP'" in unicode_repr(sibling.label)):
AttributeError: 'ParentedTree' object has no attribute 'unicode_repr'
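A minimal workaround sketch (an assumption, not an official fix): newer NLTK versions removed unicode_repr from their tree classes, but the built-in repr() produces the same bracketed string that the substring check relies on. Self-contained demo:

# Hypothetical patch for cause_extractor.py line 131: repr() yields the
# same "ParentedTree('NP', ...)" text that unicode_repr() used to, so the
# "('NP'" substring test keeps working.
from nltk.tree import ParentedTree

sibling = ParentedTree.fromstring("(VP (VBD hit) (NP (DT the) (NN boat)))")
if sibling.label() == 'VP' and "('NP'" in repr(sibling):
    print("VP with an NP child found")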

Input
Tried the same as the one given in the single file example.
titleshort = "Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008."

title = "Taliban attacks German consulate in northern Afghan city of Mazar-i-Sharif with truck bomb"
lead = "The death toll from a powerful Taliban truck bombing at the German consulate in Afghanistan's Mazar-i-Sharif city rose to at least six Friday, with more than 100 others wounded in a major militant assault."
text = """The Taliban said the bombing late Thursday, which tore a massive crater in the road and overturned cars, was a "revenge attack" for US air strikes this month in the volatile province of Kunduz that left 32 civilians dead.
The explosion, followed by sporadic gunfire, reverberated across the usually tranquil northern city, smashing windows of nearby shops and leaving terrified local residents fleeing for cover.
"The suicide attacker rammed his explosives-laden car into the wall of the German consulate," local police chief Sayed Kamal Sadat told AFP.
All German staff from the consulate were unharmed, according to the foreign ministry in Berlin.
But seven Afghan civilians were killed, including two motorcyclists who were shot dead by German forces close to the consulate after they refused to heed their warning to stop, said deputy police chief Abdul Razaq Qadri.
A suspect had also been detained near the diplomatic mission on Friday morning, Qadri added.
Local doctor Noor Mohammad Fayez said the city hospitals received six dead bodies, including two killed by bullets.
At least 128 others were wounded, some of them critically and many with shrapnel injuries, he added.
"The consulate building has been heavily damaged," the German foreign ministry said in a statement. "Our sympathies go out to the Afghan injured and their families."
A diplomatic source in Berlin said Foreign Minister Frank-Walter Steinmeier had convened a crisis meeting.
"There was fighting outside and on the grounds of the consulate," a ministry spokesman said. "Afghan security forces and Resolute Support (NATO) forces from Camp Marmal (German base in Mazar-i-Sharif) are on the scene."
Afghan special forces have cordoned off the consulate, previously well-known as Mazar Hotel, as helicopters flew over the site and ambulances with wailing sirens rushed to the area after the explosion.
The carnage underscores worsening insecurity in Afghanistan as Taliban insurgents ramp up nationwide attacks despite repeated government attempts to jump-start stalled peace negotiations.
Taliban spokesman Zabihullah Mujahid said the "martyrdom attack" on the consulate had left "tens of invaders" dead. The insurgents routinely exaggerate battlefield claims.
Posting a Google Earth image of the consulate on Twitter, Mujahid said the assault was in retaliation for American air strikes in Kunduz.
US forces conceded last week that its air strikes "very likely" resulted in civilian casualties in Kunduz, pledging a full investigation into the incident.
The strikes killed several children, after a Taliban assault left two American soldiers and three Afghan special forces soldiers dead near Kunduz city.
The strikes triggered impassioned protests in Kunduz city, with the victims' relatives parading mutilated bodies of dead children piled into open trucks through the streets.
Civilian casualties caused by NATO forces have been one of the most contentious issues in the 15-year campaign against the insurgents, prompting strong public and government criticism.
The country's worsening conflict has prompted US forces to step up air strikes to support their struggling Afghan counterparts, fuelling the perception that they are increasingly being drawn back into the conflict.
The latest attack in Mazar-i-Sharif comes just two days after a bitter US presidential election.
Afghanistan got scarcely a passing mention in the election campaign - even though the situation there will be an urgent matter for the new president.
President-elect Donald Trump is set to inherit America's longest war with no end in sight.
"""
date_publish = '2016-11-10 07:44:00'

Versions

  • OS : Ubuntu 18.04.4 LTS
  • Python Version : 3.6.9
  • NLTK Version: 3.5

Example code does not work return self.get_answers(question=question)[0]

Describe the bug
The below example crashes.
https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/examples/extracting/parse_single_from_code.py

To Reproduce
run
https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/examples/extracting/parse_single_from_code.py

Expected behavior
The example should print the extracted answers. Instead, it crashes with:
File "/Users/samrat.saha/PycharmProjects/EventExtraction/event_extractor.py", line 84, in
top_when_answer = doc.get_top_answer('when').get_parts_as_text()
File "/Users/samrat.saha/miniconda3/envs/py36/lib/python3.6/site-packages/Giveme5W1H/extractor/document.py", line 151, in get_top_answer
return self.get_answers(question=question)[0]
IndexError: list index out of range

Process finished with exit code 1
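A defensive sketch (an assumption about the cause: get_answers() returned an empty list because no 'when' candidate was extracted from the text), using only the Document API visible in the trace:

# Hypothetical guard: avoid indexing into an empty candidate list when no
# answer was extracted for a question; doc is the parsed Document.
answers = doc.get_answers(question='when')
top_when_answer = answers[0].get_parts_as_text() if answers else None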


Versions (please complete the following information):

  • OS: MacOS 10.13.6
  • Python Version 3.6

document._date never set but used

I found that the member _date in document.py is never actually set, even though environment_extractor.py reads its value. I suppose the value should at some point be set from Document._rawData.publish_date, right? Could you do that, probably during preprocessing?

Please also update afterward in https://github.com/bkrrr/Giveme5W/blob/6062f756de2f91f356d12cbfbf03741ecb25ba33/extractor/preprocessors/preprocessor_core_nlp.py#L88

Installation error

After I ran pip3 install giveme5w1h, the packages installed successfully.
But when I try "giveme5w1h-corenlp install", the following error message comes up:

"from Giveme5W1H.examples.startup.environment import start
ImportError: No module named 'Giveme5W1H'
"

Versions (please complete the following information):

  • OS: [Ubuntu 16.04.6 LTS]
  • Python Version [3.6]
  • Giveme5W1H Version [1.2]
  • Stanford CoreNLP Version

giveme5w1h-corenlp install code not work for windows

When running giveme5w1h-corenlp install on Windows, the command fails because rm, mv, and unzip are not Windows commands.

For Windows, rm needs to change to del, mv to move, and an unzip.bat (or equivalent) needs to be added; a cross-platform sketch follows below.
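A portable sketch of that sequence (an assumption, not the project's actual installer), using only Python's standard library so it runs on Windows and Unix alike; the archive and target paths are placeholders:

# Cross-platform replacement for the rm/unzip/mv steps.
import shutil
import zipfile
from pathlib import Path

def install_corenlp(archive: Path, target: Path) -> None:
    if target.exists():
        shutil.rmtree(target)             # replaces `rm -r`
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target.parent)      # replaces `unzip`
    extracted = target.parent / archive.stem
    if extracted != target:
        extracted.rename(target)          # replaces `mv`

install_corenlp(Path("stanford-corenlp-full-2017-06-09.zip"), Path("corenlp"))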

AttributeError: 'list' object has no attribute 'values'

I've installed the proper requirement files and am able to open a server and run the code as well. But every time I run the script, it shows the error "AttributeError: 'list' object has no attribute 'values'" from action_extractor.py line 104, i.e., doc_coref.values().
script: https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/examples/extracting/parse_single_from_code.py

action_extractor.py: https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/extractor/extractors/action_extractor.py

error from command line (kubuntu 19.10):
No extractors passed: initializing default configuration.
No combinedScorers passed: initializing default configuration.
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/maheeth/Desktop/WP4/Giveme5W1H/Giveme5W1H/extractor/extractor.py", line 20, in run
extractor.process(document)
File "/home/maheeth/Desktop/WP4/Giveme5W1H/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process
self._evaluate_candidates(document)
File "/home/maheeth/Desktop/WP4/Giveme5W1H/Giveme5W1H/extractor/extractors/action_extractor.py", line 104, in _evaluate_candidates
if any(doc_coref.values()):
AttributeError: 'list' object has no attribute 'values'

The same problem occurs when running via the RESTful API.
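A defensive sketch of what line 104 could do instead (an assumption about the cause: some CoreNLP versions deliver the coref annotation as a list rather than a dict, so .values() fails):

# Hypothetical guard for action_extractor.py line 104: accept both the
# dict-shaped and the list-shaped 'corefs' annotation.
coref_chains = doc_coref.values() if isinstance(doc_coref, dict) else doc_coref
if any(coref_chains):
    ...  # proceed with the coreference-based scoring as before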

add some doc to combined scorer distanceofcandidate.py

It's currently unclear what specifically happens. Most importantly, what are the weights for? I've seen that the first element, weight[0], is actually not a weight; a counter is checked against it instead. Can you formulate the scoring process as an equation, e.g., in LaTeX style? Please add it either to the project report or to the comment in the code.
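As a starting point, here is a guess at the requested equation (an assumption, inferred only from the accumulation line visible in the "crash in combined scoring" traceback further down this page, dist_factor += distance_matrix[question][i] * self._weight[iq]):

\[
\mathit{distFactor}_i = \sum_{q \in \text{questions}} D_{q,i} \, w_q
\]

where \(D_{q,i}\) would be the distance of candidate \(i\) for question \(q\) and \(w_q\) that question's weight; whether weight[0] really acts as a counter threshold rather than a \(w_q\) is exactly what the requested documentation should settle.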

Issues with MasterExtractor()

Hi,

I am trying to get Giveme5W1H up and running, but when calling MasterExtractor() in Python I get the following:

ConfigurationError: Using Nominatim with default or sample user_agent "geopy/2.0.0" is strongly discouraged, as it violates Nominatim's ToS https://operations.osmfoundation.org/policies/nominatim/ and may possibly cause 403 and 429 HTTP errors. Please specify a custom user_agent with Nominatim(user_agent="my-application") or by overriding the default user_agent: geopy.geocoders.options.default_user_agent = "my-application".

Not sure what is wrong, but I assume the issue might be caused by the CoreNLP server. I ran:

$ giveme5w1h-corenlp

and got:

[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - using SR parser: edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - Threads: 12
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /[0:0:0:0:0:0:0:0]:9000

The readme said it could take a few minutes, but I waited a long time and nothing ever happened. Has anyone had the same issue?
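For the ConfigurationError itself, the geopy message quoted above already names the fix; a sketch (the application name is a placeholder):

# Override geopy's default Nominatim user agent before constructing the
# extractor, as the error message suggests.
import geopy.geocoders
geopy.geocoders.options.default_user_agent = "my-giveme5w1h-client"

from Giveme5W1H.extractor.extractor import MasterExtractor
extractor = MasterExtractor()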

"newsCluster" item in json files in Giveme5W1H/examples/datasets/40er/data

Thanks for your great work!

Recently, I've been working on a project to cluster news by event. The item named "newsCluster" in your dataset looks good for this purpose.

Is it extracted automatically? If so, would you mind providing more details about how to get it?

"newsCluster": {
    "CategoryId": 1,
    "Category": "world",
    "TopicId": 2,
    "Topic": "legancy",
    "EventId": 49,
    "Event": "las_vegas_shooting",
    "Url": "http://usa.chinadaily.com.cn/world/2017-10/03/content_32788252.htm"
  }

add doc to learning classes

please add documentation to

  • I've added a few todos with commit 81b8fc8
  • the classes/files Learn, Worker (in run.py), and evaluate.py could use some docstrings at file level (at the top) to inform other programmers what each class is actually doing

TypeError: unorderable types: int() < str()

Describe the bug
Hi,
An error occurred when I was applying the 5W1H extractor to my JSON news dataset.

The error occurs in the _evaluate_locations code when it tries to run "raw_locations.sort(key=lambda x: x[1], reverse=True)"; the console then gives the error "TypeError: unorderable types: int() < str()".

My question is: does this mean something is wrong with my dataset format? But if so, shouldn't the extractor treat all the news data as one long string when working on this corpus? I'm eagerly looking for a solution to this problem.

Log
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
extractor.process(document)
File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process
self._evaluate_candidates(document)
File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 75, in _evaluate_candidates
locations = self._evaluate_locations(document)
File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 224, in _evaluate_locations
raw_locations.sort(key=lambda x: x[1], reverse=True)
TypeError: unorderable types: int() < str()
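A sketch of a defensive fix (an assumption about the cause: the sort key x[1] mixes int and str values, which Python 3 refuses to compare). If the string values are numeric, coercing the key makes the ordering total:

# Hypothetical change for environment_extractor.py line 224; assumes the
# second tuple element is always a number or a numeric string.
raw_locations.sort(key=lambda x: float(x[1]), reverse=True)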

Here is one of the news JSON files that caused an error when I tried to analyse it with Giveme5W1H.

{
"title": "Martian rock named for Rolling StonesRolling Stones get name on little Martian rock that rolled",
"body": "PASADENA, Calif. - There is now a Rolling Stones Rock on Mars, and it's giving Mick, Keith and the boys some serious satisfaction.NASA named a little stone for the legendary rockers after its InSight robotic lander captured it rolling across the surface of Mars last year, and the new moniker was made public at Thursday night's Rolling Stones' concert at the Rose Bowl.NASA has given us something we have always dreamed of, our very own rock on Mars. I can't believe it, Mick Jagger told the crowd after grooving through a rendition of Tumbling Dice. I want to bring it back and put it on our mantelpiece.Robert Downey Jr. announced the name, taking the stage just before the band's set at the Southern California stadium that is just a stone's throw from NASA's Jet Propulsion Laboratory, which manages InSight.Cross-pollinating science and a legendary rock band is always a good thing, the Iron Man actor said backstage.He told the crowd that JPL scientists had come up with the name in a fit of fandom and clever association.Charlie, Ronnie, Keith and Mick - they were in no way opposed to the notion, Downey said, but in typical egalitarian fashion, they suggested I assist in procuring 60,000 votes to make it official, so that's my mission.He led the audience in a shout of aye before declaring the deed done.Jagger later said, I want to say a special thanks to our favorite action man Robert Downey Jr. That was a very nice intro he gave.The rock, just a little bigger than a golf ball, was moved by InSight's own thrusters as the robotic lander touched down on Mars on Nov. 26.It only moved about 3 feet, but that's the farthest NASA has seen a rock roll while landing a craft on another planet.I've seen a lot of Mars rocks over my career, Matt Golombek, a JPL geologist who has helped NASA land all its Mars missions since 1997, said in a statement. This one probably won't be in a lot of scientific papers, but it's definitely one of the coolest.The Rolling Stones and NASA logos were shown side by side in the run-up to the show as the sun set over the Rose Bowl, leaving many fans perplexed as to what the connection was before it was announced.The concert had originally been scheduled for spring, before the Stones postponed their No Filter North American tour because Jagger had heart surgery.",
"published_at": "2019-08-24",
}

Versions (please complete the following information):

  • OS: [e.g. Linux]
  • Python Version [e.g. 3.5]
  • Giveme5W1H Version [e.g. 1.2]
  • Stanford CoreNLP Version

issue with geopy (after fresh installation)

felix@fxa:~/IdeaProjects/Giveme5W1H$ python3 -m examples.extracting.server
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/felix/IdeaProjects/Giveme5W1H/examples/extracting/server.py", line 55, in <module>
    extractor = FiveWExtractor()
  File "/Users/felix/IdeaProjects/Giveme5W1H/extractor/extractor.py", line 59, in __init__
    environment_extractor.EnvironmentExtractor(),
  File "/Users/felix/IdeaProjects/Giveme5W1H/extractor/extractors/environment_extractor.py", line 67, in __init__
    self._cache_nominatim = CacheManager.instance().get_cache('../examples/caches/Nominatim')
  File "/Users/felix/IdeaProjects/Giveme5W1H/extractor/tools/cache_manager.py", line 26, in get_cache
    instance = KeyValueCache(path)
  File "/Users/felix/IdeaProjects/Giveme5W1H/extractor/tools/key_value_cache.py", line 30, in __init__
    self._cache = pickle.load(ff)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/geopy/point.py", line 211, in __setstate__
    self.latitude, self.longitude, self.altitude = state
ValueError: not enough values to unpack (expected 3, got 2)
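A workaround sketch (an assumption read off the traceback: the pickled cache was written by an older geopy whose Point pickled two values, while the installed geopy expects latitude, longitude, and altitude): delete the stale cache so it is rebuilt on the next run.

# Remove the incompatible pickle cache; the path comes from the traceback
# above and will differ per installation.
import os
os.remove("examples/caches/Nominatim.prickle")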

Upgrade Corenlp

Describe the bug
The current CoreNLP dependency is almost four years old.

Expected behavior
A more up-to-date version.

`Encountered key without value` in corenlp console

Describe the bug
When I call doc = extractor.parse(doc), I get the following in the CoreNLP server console:

java.lang.IllegalArgumentException: Encountered key without value
	at edu.stanford.nlp.util.StringUtils.decodeMap(StringUtils.java:2591)
	at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.getProperties(StanfordCoreNLPServer.java:738)
	at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:602)
	at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
	at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
	at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
	at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
	at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
	at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

To Reproduce

from newsplease import NewsPlease
from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor

extractor = MasterExtractor()


def main():
    article = NewsPlease.from_url('https://www.foxnews.com/politics/house-democrat-subpoenas-mnuchin-irs-for-trumps-tax-returns')
    doc = Document.from_text(article.text, article.date_publish)
    doc = extractor.parse(doc)
    answers = doc.get_top_answer('who').get_parts_as_text()
    pass


if __name__ == '__main__':
    main()

Versions (please complete the following information):

  • OS: Running in Docker image Ubuntu 16.04 from Windows 10
  • Giveme5W1H Version: 1.0.15

can't connect to external CoreNLP host

Describe the bug
I noted the instructions for connecting to an external CoreNLP host. However, when I tried it, I got an error saying Could not handle incoming annotation / AttributeError: 'list' object has no attribute 'values'. Two notes:

  1. I had to specify "http://" in front of my IP to get it to work, which differs from what is noted in the documentation
  2. I am running a dockerized version of the java app (https://github.com/NLPbox/stanford-corenlp-docker). I verified that I can hit this install via HTTP and use it just fine (but can't share the URL because it is behind a firewall).

I'm wondering whether I have things configured wrong.

To Reproduce
Clone the stanford-corenlp-docker repo and deploy it to a docker host. Modify a simple Giveme5W1H sample to try and connect to that:

from Giveme5W1H.extractor.preprocessors.preprocessor_core_nlp import Preprocessor
from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor
preprocessor = Preprocessor('http://my.dockerized.server:9000')
extractor = MasterExtractor(preprocessor=preprocessor)
text = """The Taliban said the bombing late Thursday, which tore a massive crater in the road and overturned cars, was a "revenge attack" for US air strikes this month in the volatile province of Kunduz that left 32 civilians dead.
The explosion, followed by sporadic gunfire, reverberated across the usually tranquil northern city, smashing windows of nearby shops and leaving terrified local residents fleeing for cover.
"The suicide attacker rammed his explosives-laden car into the wall of the German consulate," local police chief Sayed Kamal Sadat told AFP.
"""
doc = Document.from_text(text)
doc = extractor.parse(doc)
parts = ['who', 'what', 'when', 'where', 'why', 'how']
for p in parts:
    print("{}: {}".format(p, doc.get_top_answer(p).get_parts_as_text()))

Expected behavior
I expect the 5W1H answers to be printed. Instead, after the extractor.parse(doc) line I get an error:

Could not handle incoming annotation
.... stack trace
File "[REMOVED PATH]/Giveme5W1H/extractor/extractors/action_extractor.py", line 104, in _evaluate_candidates
    if any(doc_coref.values()):
AttributeError: 'list' object has no attribute 'values'

Screenshots
None

Versions (please complete the following information):

  • OS: macOS 10.14.2
  • Python Version 3.6.5
  • Giveme5W1H Version 1.0.13
  • Stanford CoreNLP Version 3.9.1

Environment: use also organisation for where

Well-known organisations have a location, but it's seldom mentioned because everybody knows it.

dId:e8f99dba78d92238a950e1546491354e98b58b369741d67e74f5cf00
about: Houston_Astros (US-Baseball-Team)

Example:

"where": {
      "extracted": [
        {
          "parts": [
            [
              {
                "nlpToken": {
                  "index": 29,
                  "word": "City",
                  "originalText": "City",
                  "lemma": "City",
                  "characterOffsetBegin": 388,
                  "characterOffsetEnd": 392,
                  "pos": "NNP",
                  "ner": "LOCATION",
                  "speaker": "PER0",
                  "before": " ",
                  "after": " "
                },
                "aida": [
                  {
                    "mention": {
                      "allEntities": [
                        {
                          "kbIdentifier": "YAGO:Newcastle_City_Hall",
                          "disambiguationScore": "0.12285"
                        }
                      ],
                      "offset": 388,
                      "name": "City Hall",
                      "length": 9,
                      "bestEntity": {
                        "kbIdentifier": "YAGO:Newcastle_City_Hall",
                        "disambiguationScore": "0.12285"
                      }
                    },
                    "bestEntityMetadata": {
                      "knowledgebase": "YAGO",
                      "depictionurl": "http://upload.wikimedia.org/wikipedia/commons/5/51/Newcastle_City_Hall.jpg",
                      "depictionthumbnailurl": "http://upload.wikimedia.org/wikipedia/commons/thumbNewcastle_City_Hall.jpg/200px-Newcastle_City_Hall.jpg",
                      "importance": 0.013199101842693224,
                      "entityId": "Newcastle_City_Hall",
                      "type": [
                        "YAGO_yagoGeoEntity",
                        "YAGO_yagoPermanentlyLocatedEntity",
                        "YAGO_wordnet_physical_entity_100001930",
                        "YAGO_yagoLegalActorGeo",
                        "YAGO_wordnet_entity_100001740",
                        "YAGO_wordnet_object_100002684",
                        "YAGO_wordnet_area_108497294",
                        "YAGO_wordnet_scene_108645963",
                        "YAGO_wikicategory_Music_venues_in_Tyne_and_Wear",
                        "YAGO_wordnet_location_100027167",
                        "YAGO_wordnet_region_108630985",
                        "YAGO_wordnet_venue_108677628"
                      ],
                      "readableRepr": "Newcastle City Hall",
                      "url": "http://en.wikipedia.org/wiki/Newcastle%20City%20Hall"
                    }
                  }
                ]
              },
              "NNP"
            ],
            [
              {
                "nlpToken": {
                  "index": 30,
                  "word": "Hall",
                  "originalText": "Hall",
                  "lemma": "Hall",
                  "characterOffsetBegin": 393,
                  "characterOffsetEnd": 397,
                  "pos": "NNP",
                  "ner": "LOCATION",
                  "speaker": "PER0",
                  "before": " ",
                  "after": ""
                },
                "aida": [
                  {
                    "mention": {
                      "allEntities": [
                        {
                          "kbIdentifier": "YAGO:Newcastle_City_Hall",
                          "disambiguationScore": "0.12285"
                        }
                      ],
                      "offset": 388,
                      "name": "City Hall",
                      "length": 9,
                      "bestEntity": {
                        "kbIdentifier": "YAGO:Newcastle_City_Hall",
                        "disambiguationScore": "0.12285"
                      }
                    },
                    "bestEntityMetadata": {
                      "knowledgebase": "YAGO",
                      "depictionurl": "http://upload.wikimedia.org/wikipedia/commons/5/51/Newcastle_City_Hall.jpg",
                      "depictionthumbnailurl": "http://upload.wikimedia.org/wikipedia/commons/thumbNewcastle_City_Hall.jpg/200px-Newcastle_City_Hall.jpg",
                      "importance": 0.013199101842693224,
                      "entityId": "Newcastle_City_Hall",
                      "type": [
                        "YAGO_yagoGeoEntity",
                        "YAGO_yagoPermanentlyLocatedEntity",
                        "YAGO_wordnet_physical_entity_100001930",
                        "YAGO_yagoLegalActorGeo",
                        "YAGO_wordnet_entity_100001740",
                        "YAGO_wordnet_object_100002684",
                        "YAGO_wordnet_area_108497294",
                        "YAGO_wordnet_scene_108645963",
                        "YAGO_wikicategory_Music_venues_in_Tyne_and_Wear",
                        "YAGO_wordnet_location_100027167",
                        "YAGO_wordnet_region_108630985",
                        "YAGO_wordnet_venue_108677628"
                      ],
                      "readableRepr": "Newcastle City Hall",
                      "url": "http://en.wikipedia.org/wiki/Newcastle%20City%20Hall"
                    }
                  }
                ]
              },
              "NNP"
            ]
          ],
          "score": 0.6972377801930305,
          "text": "City Hall",
          "enhancement": {
            "openstreetmap_nominatim": {
              "place_id": "141523934",
              "licence": "Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright",
              "osm_type": "way",
              "osm_id": "317293294",
              "boundingbox": [
                42.049309,
                42.0496815,
                13.0389901,
                13.0394836
              ],
              "lat": "42.0494534",
              "lon": "13.0392638330905",
              "display_name": "city hall, Via Tiburtina Valeria, Oricola, RM, LAZ, 67063, Italia",
              "class": "historic",
              "type": "castle",
              "importance": 0.32063094963,
              "icon": "https://nominatim.openstreetmap.org/images/mapicons/tourist_castle.p.20.png"
            }
          },
          "nlpIndexSentence": 4
        }
      ],
      "label": "where"
    },

Error installing corenlp

Hello!
Thanks for making this project available; however, I cannot successfully install the RESTful API for it. I run:
$ pip3 install giveme5w1h
$ giveme5w1h-corenlp install
The former installs successfully while the latter issues the following error:
giveme5w1h-corenlp: command not found
Would you please advise? Thanks in advance!

Error when I run the example

Describe the bug
I ran the command "python3 -m Giveme5W1H.examples.extracting.parse_documents",
and the following error occurred.

Log
(log attached as a screenshot in the original issue)

Versions (please complete the following information):

  • OS: [Ubuntu]
  • Python Version [3.7]
  • Giveme5W1H Version [1.2]

Error whilst running extractor.parse(doc)

I'm getting an error when trying to run extractor.parse(doc). I've tried it on both my own text and the example text, and it occurs both times.

Error as follows:

Could not handle incoming annotation
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
    extractor.process(document)
  File "/usr/local/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process
    self._evaluate_candidates(document)
  File "/usr/local/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/action_extractor.py", line 104, in _evaluate_candidates
    if any(doc_coref.values()):
AttributeError: 'list' object has no attribute 'values'

Thanks!

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

Hi,

I tried getting the 5Ws and the How from this article: https://digiday.com/media/guardian-launches-nonprofit-fund-journalism/?utm_medium=email&utm_campaign=digidaydis&utm_source=publishing&utm_content=170828

For that I used:

  1. news-please, to extract article title/text/date-publish (Thanks by the way, this tool is just awesome!)
  2. I started the CoreNLP Java server on a remote machine and configured the preprocessor like this:
    from Giveme5W1H.extractor.preprocessors.preprocessor_core_nlp import Preprocessor
    preprocessor = Preprocessor('http://XX.XXX.XX.XXX:80')

and tried accessing it with this bit of code:

extractor = FiveWExtractor(preprocessor=preprocessor)
doc = Document(title=article.title, text=article.text, date=article.date_publish.strftime('%Y-%m-%d'))
doc = extractor.parse(doc)
print("done")
Unfortunately it yields the same error (after roughly 1 min).
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
I attached the logs for the extractor.parse call
logs.txt

Could you please help figure out how to make this work? Thanks!

Python 3.8 compatibility issue

I have the following code and I am trying to execute the library

extractor = MasterExtractor()
date_publish = '2016-11-10 07:44:00'
doc = Document.from_text(titleshort, date_publish)
print('doc annotation process....')
doc = extractor.parse(doc)
top_who_answer = doc.get_top_answer('who').get_parts_as_text()
top_what_answer = doc.get_top_answer('what').get_parts_as_text()
top_when_answer = doc.get_top_answer('when').get_parts_as_text()
top_where_answer = doc.get_top_answer('where').get_parts_as_text()
top_why_answer = doc.get_top_answer('why').get_parts_as_text()
top_how_answer = doc.get_top_answer('how').get_parts_as_text()
print(top_who_answer)
print(top_what_answer)
print(top_when_answer)
print(top_where_answer)
print(top_why_answer)
print(top_how_answer)

I got the following error after solving all the other issues as they appeared here on the repository.

Exception in thread Thread-2:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/Users/omarsayed/PycharmProjects/GiveMe5W1HTest/venv/lib/python3.8/site-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
extractor.process(document)
File "/Users/omarsayed/PycharmProjects/GiveMe5W1HTest/venv/lib/python3.8/site-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 40, in process
self._extract_candidates(document)
File "/Users/omarsayed/PycharmProjects/GiveMe5W1HTest/venv/lib/python3.8/site-packages/Giveme5W1H/extractor/extractors/cause_extractor.py", line 92, in _extract_candidates
for candidate in self._evaluate_tree(tree):
File "/Users/omarsayed/PycharmProjects/GiveMe5W1HTest/venv/lib/python3.8/site-packages/Giveme5W1H/extractor/extractors/cause_extractor.py", line 131, in _evaluate_tree
if sibling.label() == 'VP' and "('NP'" in sibling.unicode_repr():
AttributeError: 'ParentedTree' object has no attribute 'unicode_repr'

Is this error related to the higher Python version? If yes, how do I solve it? I'd prefer not to roll back to an earlier version of Python. Thanks!

Versions

  • OS: macOS 10.15.6
  • Python Version: 3.8
  • Giveme5W1H Version: 1.0.17 (I guess; installed using pip3 install giveme5w1h)
  • Stanford CoreNLP Version: the one installed using giveme5w1h-corenlp install

Things to improve

  • semantic similarity in learnweights
  • tree access is not speed optimized

Connection aborted

Describe the bug
The CoreNLP server was killed when I tried to parse an article.

[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-1] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.0 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-1] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.ser.gz ... done [10.2 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [3.7 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [48.4 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-1] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[pool-1-thread-1] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 12.837 (s)
[pool-1-thread-1] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [25.6 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Killed

The script returns this error:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

To Reproduce
I ran this code https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/examples/extracting/parse_from_newsplease.py

The CoreNLP server runs in a Docker Ubuntu 18.04 image.
The example at http://localhost:9000 works, but when I use Giveme5W1H as a library, it breaks.

Versions (please complete the following information):

  • OS: Windows 10 + Docker Ubuntu 18.04
  • Python Version: 3.7
  • Giveme5W1H Version: 1.0.16

AttributeError: 'list' object has no attribute 'values'

I just installed Giveme5W1H by following the tutorial and tried to run a simple example (parse_single_from_code.py) in the terminal,

but it pops up the error below:

No extractors passed: initializing default configuration.
No combinedScorers passed: initializing default configuration.
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/chason/5w1h/lib/python3.6/site-packages/Giveme5W1H/extractor/extractor.py", line 20, in run extractor.process(document)
File "/home/chaosn/5w1h/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process self._evaluate_candidates(document)
File "/home/chaosn/5w1h/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/action_extractor.py", line 104, in _evaluate_candidates
if any(doc_coref.values()):
AttributeError: 'list' object has no attribute 'values'

Expected behavior
Can anyone tell me how to fix this problem?

Versions

  • OS: Ubuntu 18.04
  • Python Version 3.6
  • Giveme5W1H Version 1.0.17
  • Stanford CoreNLP Version 0.0.14

ModuleNotFoundError: No module named 'extractor'

Hi,

First thanks for this very promising tool.

I followed the step-by-step installation guide and could get all the dependencies, but when I run this snippet of code:

from extractor.document import Document
from extractor.extractor import FiveWExtractor

extractor = MasterExtractor()
doc = Document(title, lead, text, date_publish)
doc = extractor.parse(doc)

It gives me the ModuleNotFoundError: No module named 'extractor'
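For comparison, other reports on this page (e.g., the "Encountered key without value" one) use package-qualified import paths, which would avoid this particular ModuleNotFoundError; a sketch, assuming title, lead, text, and date_publish are already defined:

# Package-qualified imports; MasterExtractor must also be imported
# before it can be called.
from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor

extractor = MasterExtractor()
doc = Document(title, lead, text, date_publish)
doc = extractor.parse(doc)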

I have Python 3.6.2, installed in a Docker container with Ubuntu 16.04. I ran:
$ pip3 install giveme5w1h
$ giveme5w1h-corenlp install
and install went OK.

Thanks for your help!

Error message when running parse_single_from_code.py

Hi,

I tried to run the example code parse_single_from_code.py and received this error message:

$ python parse_single_from_code.py
No extractors passed: initializing default configuration.
No combinedScorers passed: initializing default configuration.
/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/Giveme5W1H/examples/caches/Nominatim.prickle CACHED: Mazar-i-Sharif: مزار شریف, بلخ, افغانستان
Exception in thread Thread-2:
Traceback (most recent call last):
File "/anaconda2/envs/Giveme5W1H/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
extractor.process(document)
File "/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 40, in process
self._extract_candidates(document)
File "/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 147, in _extract_candidates
self._cache_nominatim.cache(location_string, location)
File "/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/Giveme5W1H/extractor/tools/key_value_cache.py", line 58, in cache
self.persist()
File "/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/Giveme5W1H/extractor/tools/key_value_cache.py", line 43, in persist
with open(self._cache_path, 'wb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/Giveme5W1H/examples/caches/Nominatim.prickle'
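A workaround sketch (an assumption: the examples/caches directory is simply missing from the installed package, so opening the cache file for writing fails): create the directory before running the example.

# Create the missing cache directory; the path is the one from the
# traceback and will differ per installation.
import os
os.makedirs('/anaconda2/envs/Giveme5W1H/lib/python3.6/site-packages/'
            'Giveme5W1H/examples/caches', exist_ok=True)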

Versions:

  • OS: MacOS 10.14.1
  • Python 3.6.8
  • Giveme5W1H-1.0.13
  • Stanford CoreNLP Version : stanford-corenlp-full-2017-06-09

Thanks.

crash in combined scoring

Running with everything on defaults (as checked out from the repo):

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 "/Users/felix/Library/Application Support/IntelliJIdea2017.3/python/helpers/pydev/pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 50787 --file /Users/felix/IdeaProjects/Giveme5W/examples/extracting/parse_documents.py
pydev debugger: process 1278 is connecting

Connected to pydev debugger (build 173.3942.27)
No extractors passed: initializing default configuration.
KeyValueCache: /Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle restored
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle entries: 654 size: 18.1 KB
No combinedScorers passed: initializing default configuration.
processing documents from file system

Handler: 	Title:	Barack Obama describes first meeting with Donald Trump at White House as 'excellent'
         	Id:   	36ed35f8c78eda121afe93cbe867946d07b7e59278d2a125a23ec1bd
          	already preprocessed
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: US: [Location(United States of America, (39.7837304, -100.4458825, 0.0)), '1512382161.288082']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: White House: [Location(White House, 1600, Pennsylvania Avenue Northwest, Golden Triangle, Washington, District of Columbia, 20500, United States of America, (38.8976998, -77.0365534886228, 0.0)), '1512398588.765672']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: US: [Location(United States of America, (39.7837304, -100.4458825, 0.0)), '1512382161.288082']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: America: [Location(America, Pereira, Risaralda, Colombia, (4.81804785, -75.6888617199333, 0.0)), '1512382173.760488']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: Trump Tower: [Location(Trump Tower, 721/725, 5th Avenue, Diamond District, Manhattan Community Board 5, New York County, NYC, New York, 10022, United States of America, (40.7623148, -73.9739028212589, 0.0)), '1512398597.173803']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: US: [Location(United States of America, (39.7837304, -100.4458825, 0.0)), '1512382161.288082']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: US: [Location(United States of America, (39.7837304, -100.4458825, 0.0)), '1512382161.288082']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: White House: [Location(White House, 1600, Pennsylvania Avenue Northwest, Golden Triangle, Washington, District of Columbia, 20500, United States of America, (38.8976998, -77.0365534886228, 0.0)), '1512398588.765672']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: US: [Location(United States of America, (39.7837304, -100.4458825, 0.0)), '1512382161.288082']
/Users/felix/IdeaProjects/Giveme5W/examples/caches/Nominatim.prickle LOADED: White House: [Location(White House, 1600, Pennsylvania Avenue Northwest, Golden Triangle, Washington, District of Columbia, 20500, United States of America, (38.8976998, -77.0365534886228, 0.0)), '1512398588.765672']
Traceback (most recent call last):
  File "/Users/felix/Library/Application Support/IntelliJIdea2017.3/python/helpers/pydev/pydevd.py", line 1683, in <module>
    main()
  File "/Users/felix/Library/Application Support/IntelliJIdea2017.3/python/helpers/pydev/pydevd.py", line 1677, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Users/felix/Library/Application Support/IntelliJIdea2017.3/python/helpers/pydev/pydevd.py", line 1087, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/felix/Library/Application Support/IntelliJIdea2017.3/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/felix/IdeaProjects/Giveme5W/examples/extracting/parse_documents.py", line 60, in <module>
    .set_preprocessed_path(preprocessedPath)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/handler.py", line 173, in process
    self._process_document(document)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/handler.py", line 142, in _process_document
    self._extractor.parse(document)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/extractor.py", line 114, in parse
    combinedScorer.score(doc)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/combined_scoring/distance_of_candidate.py", line 97, in score
    dist_factor += distance_matrix[question][i] * self._weight[iq]
IndexError: list index out of range
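
The IndexError at distance_of_candidate.py line 97 suggests that self._weight holds fewer entries than the number of questions being iterated. A hypothetical standalone re-implementation of the failing loop (not the project's actual fix) shows how zip() would stop at the shorter sequence instead of crashing:

from typing import Dict, List

def combined_distance(distance_matrix: Dict[str, List[float]],
                      questions: List[str],
                      weights: List[float],
                      i: int) -> float:
    # zip() pairs each question with a weight and stops at the shorter
    # sequence, so a missing weight no longer raises IndexError.
    return sum(distance_matrix[q][i] * w for q, w in zip(questions, weights))

# Example: three questions but only two configured weights.
matrix = {'who': [0.5], 'what': [0.2], 'when': [0.9]}
print(combined_distance(matrix, ['who', 'what', 'when'], [1.0, 0.5], 0))  # 0.6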

Giveme5W1H issue #25 - The error seems to be coming from Stanford CoreNLP

I am getting the same error that Tommo565 was getting a couple of years ago.

I have executed both my own code and sample code. Both produce the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Python38\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "C:\Python38\lib\site-packages\Giveme5W1H\extractor\extractor.py", line 20, in run
extractor.process(document)
File "C:\Python38\lib\site-packages\Giveme5W1H\extractor\extractors\abs_extractor.py", line 41, in process
self._evaluate_candidates(document)
File "C:\Python38\lib\site-packages\Giveme5W1H\extractor\extractors\action_extractor.py", line 104, in _evaluate_candidates
if any(doc_coref.values()):
AttributeError: 'list' object has no attribute 'values'
Could not handle incoming annotation

The error seems to be coming from Stanford CoreNLP, not our tool. Could you provide a full, minimal code example with which I can reproduce the error, including an article text, please?

Originally posted by @fhamborg in #25 (comment)
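
For reference, a minimal, self-contained call along the lines of the project README (assuming a CoreNLP server is running locally, e.g. via giveme5w1h-corenlp); the article text below is just a placeholder and can be substituted freely:

from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor

# Minimal sketch following the README; requires a running CoreNLP server.
title = 'Taxi driver arrested in connection with hunt for crossbow killer.'
text = 'Police searching for the crossbow killer have arrested a taxi driver.'
date_publish = '2016-10-10 10:00:00'

extractor = MasterExtractor()
doc = Document.from_text(title + ' ' + text, date_publish)
doc = extractor.parse(doc)
print(doc.get_top_answer('who').get_parts_as_text())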

Fails to install after 'giveme5w1h-corenlp install'

I was following the installation process as described in the readme. However, after the second command I run into an error and am unable to figure out why. Please help.

Desktop % giveme5w1h-corenlp install
mkdir: runtime-resources: File exists
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/bin/giveme5w1h-corenlp", line 11, in
load_entry_point('giveme5w1h==1.0.17', 'console_scripts', 'giveme5w1h-corenlp')()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/Giveme5W1H/examples/startup/environment.py", line 9, in start
RuntimeResourcesInstaller.check_and_install()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/Giveme5W1H/examples/startup/util.py", line 54, in check_and_install
check_output(cmd, shell=True, cwd=path_giveme5w_installation)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'mkdir runtime-resources && cd runtime-resources && wget http://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip && unzip stanford-corenlp-full-2017-06-09.zip && rm stanford-corenlp-full-2017-06-09.zip && wget http://nlp.stanford.edu/software/stanford-english-corenlp-2017-06-09-models.jar && mv stanford-english-corenlp-2017-06-09-models.jar stanford-corenlp-full-2017-06-09/ && cd ..' returned non-zero exit status 1.

Versions (please complete the following information):

  • OS: [e.g. MacOS 10.15]
  • Python Version [e.g. 3.8]
  • Giveme5W1H Version [e.g. 1.2]
  • Stanford CoreNLP Version
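
Note that the installer chains everything with &&, so the whole command aborts as soon as mkdir runtime-resources fails on a directory left over from an earlier attempt. Deleting runtime-resources and rerunning may already help. Alternatively, a hypothetical manual fallback in Python that mirrors the failing command but tolerates the existing directory:

import io
import urllib.request
import zipfile
from pathlib import Path

# Hypothetical manual fallback mirroring `giveme5w1h-corenlp install`.
root = Path('runtime-resources')
root.mkdir(exist_ok=True)  # unlike `mkdir`, tolerates an existing directory

zip_url = 'http://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip'
models_url = ('http://nlp.stanford.edu/software/'
              'stanford-english-corenlp-2017-06-09-models.jar')

with urllib.request.urlopen(zip_url) as resp:
    zipfile.ZipFile(io.BytesIO(resp.read())).extractall(root)
urllib.request.urlretrieve(
    models_url,
    str(root / 'stanford-corenlp-full-2017-06-09'
             / 'stanford-english-corenlp-2017-06-09-models.jar'))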

About the parameters in the "environment extraction"

Thanks for the answer. Could you show me where I can find the specific definitions of the three parameters "entailment", "distance_from_publisher_date", and "accurate" in the environment extractor's initialization? I'd like to understand them so I can adjust them properly.

use SUtime

Next:

  • merge time occurrences for multiple tokens (look for this in the previous Giveme5W code)

learn ALL parameters

Unfortunately, only the weights are learned, not the other critical parameters, such as the merge range. For instance, in https://github.com/bkrrr/Giveme5W/blob/master/extractor/extractors/environment_extractor.py, phrase_range=3 was found only by trial and error, without any serious basis for argumentation. Hence, all the parameters in each of the four extractors & evaluators need to be moved into the constructor and added to the learner (where is the learner, by the way?). Please let me know once this is done, and I'll revise.
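
A hypothetical sketch of what the requested refactor could look like: tuning parameters such as phrase_range become constructor arguments, so a learner can search over them together with the weights. This is an illustration, not the project's actual class:

class ConfigurableExtractor:
    """Hypothetical illustration of the requested refactor: every
    hard-coded tuning literal becomes a constructor argument that the
    learner can optimize."""

    def __init__(self, weights=(1.0, 1.0), phrase_range=3):
        self.weights = list(weights)
        self.phrase_range = phrase_range  # previously a hard-coded literal

# A learner could then search this space alongside the weights.
param_grid = {'phrase_range': [1, 2, 3, 4, 5]}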

Serialization does not work, maybe Timex class (?)

Something in the document cannot be serialized. It might be the Timex JSON representation; I'm not quite sure though, as it looks fine to me. I tried to debug it but didn't get through. I suppose you are more familiar with the workflow, so could you please have a look at it? FYI, I changed the get_json() method in Timex.py in commit 2cd58d7, but that should be fine: it's a regular array with a dict inside (I also tried removing the array, but it still does not work). In case it was me, sorry; I promise it will be the last time, I'm through with my changes :-)

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/felix/IdeaProjects/Giveme5W/examples/extracting/parse_documents.py
No extractors passed: initializing default configuration.
No combinedScorers passed: initializing default configuration.
processing documents from file system

Handler: 	Title:	Equifax Seized 138 Scammy Lookalike Domains Instead of Just Changing Its Dumb 'Security' Site
         	Id:   	8c9b29cd27637c7db20792acbcb554139e9f91c9ae7f8c0ae0671c5c
Timex(2017-10-01 00:00:00, 2017-10-31 23:59:59)
Timex(2017-09-01 00:00:00, 2017-09-30 23:59:59)
Timex(2017-11-08 00:00:00, 2017-11-08 23:59:59)
Timex(2017-10-01 00:00:00, 2017-10-31 23:59:59)
Timex(2017-09-01 00:00:00, 2017-09-30 23:59:59)
Timex(2017-09-27 00:00:00, 2017-09-27 23:59:59)
         	processed
         	saved to output
Timex(2017-09-01 00:00:00, 2017-09-30 23:59:59)
Timex(2017-11-14 00:00:00, 2017-11-14 23:59:59)
Timex(2017-09-01 00:00:00, 2017-09-30 23:59:59)
Timex(2016-11-15 16:00:00, 2016-11-15 16:00:59)
Traceback (most recent call last):
  File "/Users/felix/IdeaProjects/Giveme5W/examples/extracting/parse_documents.py", line 57, in <module>
    .set_output_path(outputPath)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/handler.py", line 141, in process
    self._process_document(document)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/handler.py", line 115, in _process_document
    self._writer.write(document)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/writer.py", line 94, in write
    self._write_json(self.generate_json(document))
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/writer.py", line 21, in _write_json
    outfile.write(json.dumps(output_object, sort_keys=False, indent=2))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 430, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 437, in _iterencode
    o = _default(o)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'builtin_function_or_method' is not JSON serializable

Process finished with exit code 1
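
The message "'builtin_function_or_method' is not JSON serializable" typically means a bound method slipped into the output dict in place of its return value, e.g. a forgotten pair of parentheses somewhere along the generate_json() path. A minimal reproduction of the symptom (hypothetical, not the actual offending line):

import json

payload = {'timex': {}.items}  # note the missing parentheses after .items
try:
    json.dumps(payload)
except TypeError as err:
    # Object of type 'builtin_function_or_method' is not JSON serializable
    print(err)

json.dumps({'timex': list({}.items())})  # calling the method serializes fine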

Confusing results for small text

I was trying 5W1H on smaller texts (2-3 lines), for example this sentence: "taxes on bill not revised after new financial year. taxes applicable for last month to be recalculated".

The results sometimes

  1. don't show whys/whats/hows/whens/wheres (whs) that could easily have been drawn by a human.
    e.g. what could have been: taxes, financial year, bill
    when: year, last month

  2. are empty when there could have been at least something.
    e.g. why could be: not revised

  3. are hard to interpret.
    e.g. see the hows below ("to be recalculated" fits, but it is accompanied by noisy text)

I am attaching a test case I tried. I understand this is not a bug/technical glitch and may have something to do with the logic of the code. Even so, I'd like to know why I am getting such results. Is there something on my part that I could improve/tune? Do the results improve significantly as the length of the text increases? Is the code designed specifically for a certain size or type of text?

{
  "dId": "bdeb60d977fa2f95e871198ca96204d7928da3b35b40288011b288f9",
  "title": "taxes on bill not revised after new financial year. taxes applicable for last month to be recalculated.",
  "text": "",
  "description": "",
  "category": "billing",
  "filename": "5.txt",
  "fiveWoneH": {
    "who": {
      "extracted": [],
      "label": "who"
    },
    "what": {
      "extracted": [],
      "label": "what"
    },
    "where": {
      "extracted": [],
      "label": "where"
    },
    "when": {
      "extracted": [],
      "label": "when"
    },
    "why": {
      "extracted": [],
      "label": "why"
    },
    "how": {
      .................
      "text": "after new financial year ."
      "text": "new financial year ."
      "text": "taxes applicable for last month to be recalculated .",
      "text": "new financial",
      ...................
    }
  }
}

How to change the parameters in the "environment extraction"

As mentioned in the title, I would like to modify the "when" and "where" scoring weights according to your paper, but the weight-setting format in the initialization doesn't look the same as described in the paper. Please give me some clue how to change W0, W1, W2, W3 in the S(when) calculation and W0, W1 in the S(where) calculation. Thanks a lot.
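
Roughly, the weights live on the extractor instances that the master extractor is built from (the log line "No extractors passed: initializing default configuration." suggests they can be passed in). A hypothetical sketch, assuming the environment extractor exposes a weights sequence as in environment_extractor.py; verify the exact attribute name and the mapping of W0-W3 against that file before relying on this:

from Giveme5W1H.extractor.extractor import MasterExtractor
from Giveme5W1H.extractor.extractors import (action_extractor, cause_extractor,
                                             environment_extractor,
                                             method_extractor)

# Hypothetical: customize the environment extractor's weights before
# handing it to MasterExtractor. The order of the values (W0..W3 for
# S(when), W0..W1 for S(where)) is an assumption - check
# environment_extractor.py for the authoritative layout.
env = environment_extractor.EnvironmentExtractor()
env.weights = [0.24, 0.16, 0.4, 0.2]

extractor = MasterExtractor(extractors=[
    action_extractor.ActionExtractor(),
    env,
    cause_extractor.CauseExtractor(),
    method_extractor.MethodExtractor(),
])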

AttributeError: 'list' object has no attribute 'values' (LIKELY: CoreNLP issue)

I am running the giveme5w1h-rest command. I followed what @abhimanyuNitSri mentioned in #30: I moved the cache into the local package folder and replaced the '_' inside the cache folder. But I am still unable to get the library working.

Output at giveme5w1h-rest console

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ff3af333c18>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff3af333c18>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pycorenlp/corenlp.py", line 19, in annotate
    requests.get(self.server_url)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff3af333c18>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/dist-packages/Giveme5W1H/examples/extracting/server.py", line 101, in extract
    extractor.parse(document)
  File "/usr/local/lib/python3.6/dist-packages/Giveme5W1H/extractor/extractor.py", line 104, in parse
    self.preprocess(doc)
  File "/usr/local/lib/python3.6/dist-packages/Giveme5W1H/extractor/extractor.py", line 87, in preprocess
    self.preprocessor.preprocess(doc)
  File "/usr/local/lib/python3.6/dist-packages/Giveme5W1H/extractor/preprocessors/preprocessor_core_nlp.py", line 112, in preprocess
    annotation = self.cnlp.annotate(document.get_full_text(), actual_config)
  File "/usr/local/lib/python3.6/dist-packages/pycorenlp/corenlp.py", line 21, in annotate
    raise Exception('Check whether you have started the CoreNLP server e.g.\n'
Exception: Check whether you have started the CoreNLP server e.g.
$ cd stanford-corenlp-full-2015-12-09/ 
$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
192.168.121.149 - - [05/Mar/2019 17:35:48] "POST /extract HTTP/1.1" 500 -
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
    extractor.process(document)
  File "/usr/local/lib/python3.6/dist-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process
    self._evaluate_candidates(document)
  File "/usr/local/lib/python3.6/dist-packages/Giveme5W1H/extractor/extractors/action_extractor.py", line 104, in _evaluate_candidates
    if any(doc_coref.values()):
AttributeError: 'list' object has no attribute 'values'

Output at giveme5w1h-corenlp console

[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.ser.gz ... done [4.7 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [0.9 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.6 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.7 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
	at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:40)
	at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(TimeExpressionExtractorFactory.java:57)
	at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(TimeExpressionExtractorFactory.java:38)
	at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.<init>(NumberSequenceClassifier.java:86)
	at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:136)
	at edu.stanford.nlp.pipeline.NERCombinerAnnotator.<init>(NERCombinerAnnotator.java:91)
	at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:70)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$44(StanfordCoreNLP.java:498)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
	at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
	at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
	at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
	at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:319)
	at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$500(StanfordCoreNLPServer.java:50)
	at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:642)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
	at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:691)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:663)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public edu.stanford.nlp.time.TimeExpressionExtractorImpl(java.lang.String,java.util.Properties) with args [sutime, {}]
	at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
	at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:382)
	at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:38)
	... 27 more
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
	... 29 more
Caused by: java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException
	at de.jollyday.util.CalendarUtil.<init>(CalendarUtil.java:42)
	at de.jollyday.HolidayManager.<init>(HolidayManager.java:66)
	at de.jollyday.impl.DefaultHolidayManager.<init>(DefaultHolidayManager.java:46)
	at edu.stanford.nlp.time.JollyDayHolidays$MyXMLManager.<init>(JollyDayHolidays.java:148)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
	at java.base/java.lang.Class.newInstance(Class.java:560)
	at de.jollyday.caching.HolidayManagerValueHandler.instantiateManagerImpl(HolidayManagerValueHandler.java:60)
	at de.jollyday.caching.HolidayManagerValueHandler.createValue(HolidayManagerValueHandler.java:41)
	at de.jollyday.caching.HolidayManagerValueHandler.createValue(HolidayManagerValueHandler.java:13)
	at de.jollyday.util.Cache.get(Cache.java:51)
	at de.jollyday.HolidayManager.createManager(HolidayManager.java:168)
	at de.jollyday.HolidayManager.getInstance(HolidayManager.java:148)
	at edu.stanford.nlp.time.JollyDayHolidays.init(JollyDayHolidays.java:57)
	at edu.stanford.nlp.time.Options.<init>(Options.java:90)
	at edu.stanford.nlp.time.TimeExpressionExtractorImpl.init(TimeExpressionExtractorImpl.java:44)
	at edu.stanford.nlp.time.TimeExpressionExtractorImpl.<init>(TimeExpressionExtractorImpl.java:39)
	... 34 more
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.JAXBException
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:190)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
	... 53 more
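
The root cause is the last entry: java.lang.ClassNotFoundException: javax.xml.bind.JAXBException. The JAXB classes were removed from the default classpath in Java 9+, while CoreNLP 2017-06-09 (here, SUTime's JollyDay holiday loading) still expects them, so running the server under Java 8 usually resolves this. A quick check of which JVM will be picked up (hypothetical helper, not part of Giveme5W1H):

import subprocess

# Print the JVM version the CoreNLP server will run under; a "1.8.x"
# version string avoids the missing javax.xml.bind classes of Java 9+.
print(subprocess.check_output(['java', '-version'],
                              stderr=subprocess.STDOUT).decode())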

  • OS: Ubuntu 18.04
  • Python Version 3.6.7
  • Giveme5W1H Version 1.0.13
  • Stanford CoreNLP Version 2017-06-09

KeyError: 'text' when running parse_documents

I get the following when running parse_documents.py (after starting up the environment using /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/felix/IdeaProjects/Giveme5W/examples/startup/environment.py). I also did a cleanup of the cache directory.

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/felix/IdeaProjects/Giveme5W/examples/extracting/parse_documents.py
No extractors passed, initializing default configuration.
No combinedScorers: initializing default configuration.
processing documents from file system

Traceback (most recent call last):
  File "/Users/felix/IdeaProjects/Giveme5W/examples/extracting/parse_documents.py", line 44, in <module>
    .set_output_path(outputPath)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/handler.py", line 125, in process
    document = self._reader.read(filepath)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/reader.py", line 69, in read
    document = self.parse_newsplease(data, path)
  File "/Users/felix/IdeaProjects/Giveme5W/extractor/tools/file/reader.py", line 61, in parse_newsplease
    tmp_anno.append([None, None, annotation['text']])
KeyError: 'text'

Process finished with exit code 1
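
The KeyError means one of the annotation dicts in the input file has no 'text' key. A defensive variant of the failing line (a sketch, not the project's actual fix) would fall back to None instead of raising:

# Hypothetical defensive variant of the inner loop in
# reader.parse_newsplease: annotations without a 'text' key map to None.
def to_tmp_anno(annotations):
    return [[None, None, annotation.get('text')] for annotation in annotations]

print(to_tmp_anno([{'text': 'Obama'}, {'label': 'who'}]))
# -> [[None, None, 'Obama'], [None, None, None]]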

something wrong with "giveme5w1h-corenlp"

giveme5w1h-corenlp:
/bin/sh: 1: java: not found
CoreNLPclosed. Return code = 127
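
Exit code 127 from /bin/sh means the java binary is not on the PATH, so the wrapper cannot start CoreNLP. Installing a JRE/JDK (Java 8 works best with this CoreNLP release) and re-running should fix it. A hypothetical pre-flight check:

import shutil

# `giveme5w1h-corenlp` shells out to `java`; verify it is resolvable first.
if shutil.which('java') is None:
    print('No `java` on PATH - install a JRE/JDK (Java 8 recommended).')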

Versions (please complete the following information):

  • OS: [e.g.Linux]
  • Python Version [e.g. 3.7]
  • Giveme5W1H Version [e.g. 1.0.18]
  • Stanford CoreNLP Version stanford-corenlp-full-2017-06-09
