openeventdata / petrarch2
Another next-generation event coding platform.
License: MIT License
The README does not have a contribution section, which makes it hard for me to know if I can open PRs, open issues, or what pathways there are for contribution in general. While there is a Contribute section in the docs, this only covers coding guidelines and a task-like process, not whether or not this is encouraged. It would be good to make this explicit here, in the README.
This proposed change is blocking openjournals/joss-reviews#133.
The install instructions currently call for:
pip install git+https://github.com/openeventdata/petrarch2.git
I was able to get this working with:
pip install git+https://github.com/openeventdata/petrarch2.git --ignore-installed
I needed the extra flag because I am running El Capitan, which has some issues. Unfortunately, the following step didn't work:
petrach -h
Because this installs the executable:
petrarch2
The instructions should be modified to add the digit 2 to the program name.
This was brought up in openjournals/joss-reviews#133.
The functions that @philip-schrodt wrote to extract verb and actor phrases from coded sentences need documentation and unit tests.
@philip-schrodt is working on adding support in Petrarch2 for the entity coreference information that CoreNLP outputs. This should increase yield for later sentences in a story. Feel free to add details below, Phil!
Petrarch2's code should be distinct from the dictionaries it uses. To make changes to the dictionaries more visible and to make it easier to switch in custom dictionaries, take the built-in dictionaries out of Petrarch2 and have it look for them at a location defined in the config file. Move the built-in dictionaries to the Dictionaries repo.
This should help clear up questions like #19.
In the read_verb_dictionary function of PETRreader.py, while reading a pattern line (lines 860 to 956), I found a difference between the handling of "#pre-verb prepositional phrase" and "#post-verb prepositional phrase".
For "#pre-verb prepositional phrase", we add a pipe '|' before the for loop on line 896. For "#post-verb prepositional phrase", we add the pipe '|' inside the for loop on line 939.
I think the latter way is correct.
Is this a small bug, or is there some other reason for the difference?
The NullVerbs and NullActor options that @philip-schrodt added are turned on and off in PETRglobals. Add an option to the config file and the config parser so that the config file can change these options.
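A minimal sketch of what such a config option could look like, written for Python 3's configparser. The section name "Options" and the keys null_verbs and null_actor are assumptions made up for illustration, not names taken from Petrarch2's actual config file or parser.

```python
import configparser

# Hypothetical config fragment; the section and key names are assumptions.
SAMPLE_CONFIG = """
[Options]
null_verbs = True
null_actor = False
"""


def read_null_options(config_text):
    """Read the two flags from config text, defaulting both to False."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        "NullVerbs": parser.getboolean("Options", "null_verbs", fallback=False),
        "NullActor": parser.getboolean("Options", "null_actor", fallback=False),
    }


options = read_null_options(SAMPLE_CONFIG)
print(options)
```

The returned values could then be copied onto the corresponding PETRglobals attributes at startup, so the defaults in the module only apply when the config file is silent.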
This sentence codes:
'(S (NP (NNP GERMANY ) ) (VP (VBD INVADED ) (NP (NNP FRANCE ) ) ) ) '
This sentence doesn't:
'(S (NP (NNP GERMANY ) ) (VP (VBD INVADED ) (NP (NNP FRANCE ) ) ) ) '
The needed input looks like:
(ROOT (S (NP (NNP GERMANY)) (VP (VBD INVADED) (NP (NNP FRANCE)))))
The difference is in the spacing between the parentheses. This has apparently changed at some point, since a lot of my helper software has broken, so the format of the parse tree needs to be 1) documented and 2) frozen.
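Until the format is documented, helper software could normalize the spacing itself. The sketch below is only an illustration: normalize_parse is a hypothetical helper, not part of Petrarch2, and it assumes the only difference between the two formats is whitespace around parentheses.

```python
import re


def normalize_parse(tree, spaced=True):
    """Rewrite a bracketed parse with consistent spacing.

    spaced=True  gives "(NNP GERMANY )" style (space before each ')').
    spaced=False gives "(NNP GERMANY)" style (no space before ')').
    """
    # Isolate every parenthesis as its own token, then rebuild the string.
    tokens = re.sub(r"([()])", r" \1 ", tree).split()
    out = " ".join(tokens).replace("( ", "(")
    if not spaced:
        out = out.replace(" )", ")")
    return out


compact = "(ROOT (S (NP (NNP GERMANY)) (VP (VBD INVADED) (NP (NNP FRANCE)))))"
print(normalize_parse(compact))
print(normalize_parse(compact, spaced=False))
```

Because the function tokenizes first, it gives the same result regardless of which spacing the input started with.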
When used via hypnos, I see the following error:
petrarch_1 | File "/usr/local/lib/python2.7/dist-packages/petrarch2/petrarch2.py", line 250, in do_coding
petrarch_1 | print(sentence.txt)
petrarch_1 | UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 67: ordinal not in range(128)
Probably want to yank or suppress that print(). Alternatively, coerce everything into utf-8.
In the check_date function, there are four lines of code that compare integers with strings using a value comparison. The date attribute of the NounPhrase class and its Phrase superclass are strings. These lines don't throw errors in Python 2 but they do in Python 3.
Also, I'm not sure it works as intended. When going through the test_noun_meaning1 test, we see the following conflict in Python 2:
>>> '081315' >= int('150578')
True
>>> int('081315') >= int('150578')
False
Here is the output in Python 3:
>>> '081315' >= int('150578')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '>=' not supported between instances of 'str' and 'int'
>>> int('081315') >= int('150578')
False
What should the expected behavior be here?
It looks like we should be comparing dstr_to_ordate(curdate) to the integer date.
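One way to make the comparison unambiguous in both Python 2 and Python 3 is to convert both sides to ordinals before comparing. The sketch below does not use Petrarch2's own dstr_to_ordate; to_ordinal and in_range are hypothetical helpers, and the YYYYMMDD format is an assumption.

```python
from datetime import datetime


def to_ordinal(datestr, fmt="%Y%m%d"):
    """Convert a date string to a proleptic Gregorian ordinal (an int)."""
    return datetime.strptime(datestr, fmt).toordinal()


def in_range(curdate, start=None, end=None):
    """True if curdate falls within [start, end]; None means unbounded."""
    cur = to_ordinal(curdate)
    if start is not None and cur < to_ordinal(start):
        return False
    if end is not None and cur > to_ordinal(end):
        return False
    return True


# int-to-int comparisons behave the same in Python 2 and Python 3.
print(in_range("20150823", start="20150101", end="20151231"))
print(in_range("20150823", end="20150115"))
```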
I don't really want to write every event/sentence to stdout all the time. It would be nice to have an argument to control that.
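A possible shape for this, sketched with the standard argparse and logging modules. The --verbose flag and the logger name are assumptions for illustration; Petrarch2's real command line may be organized differently.

```python
import argparse
import logging


def configure(argv):
    """Build a logger whose level depends on a hypothetical --verbose flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="report every sentence as it is coded")
    args = parser.parse_args(argv)
    logger = logging.getLogger("petrarch2_sketch")
    logger.setLevel(logging.DEBUG if args.verbose else logging.WARNING)
    return logger


def report_sentence(logger, text):
    """Emit the sentence text only when debug output is enabled."""
    logger.debug("%s", text)


logging.basicConfig()
quiet_logger = configure([])
report_sentence(quiet_logger, "A Tunisian court has jailed a Nigerian student.")
```

Routing the message through a logger instead of print() also means a handler can send it to a file rather than the screen.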
When using the petrarch output, it would be helpful if the output were both Python-friendly and valid JSON. Currently, the petrarch output is Python-specific. Can we make the output follow JSON rules? That way the data can still be read in Python (via the json package), and other languages and tools (e.g. mongo, Redis) can read and use the output without custom converters.
Below is a snippet of petrarch2 output that shows the problem. The main issue is that actorroot, actortext, and eventtext contain dictionaries, and those dictionaries have a tuple as their key.
python3
>>> s = """{u'nytasiapacific20160622.0002': {'sents': {1: {'geo-location': [{u'placename': u'Beirut', u'countrycode': u'LBN', u'lon': 35.49442, u'admin1': u'Beyrouth', u'lat': 33.88894, u'searchterm': u'Beirut'}], u'events': [(u'TUNJUD', u'NGAEDU', u'173')], 'content': u'A Tunisian court has jailed a Nigerian student for two years for helping young militants join an armed Islamic group in Beirut, his lawyer said Wednesday.', u'meta': {u'actorroot': {(u'TUNJUD', u'NGAEDU', u'173'): [u'', u'']}, (u'TUNJUD', u'NGAEDU', u'173'): [[u'JAILED'], [u'HAS']], u'eventtext': {(u'TUNJUD', u'NGAEDU', u'173'): u'has jailed'}, u'nouns': [([u' TUNISIAN', u' COURT'], [u'TUNJUD'], [(u'TUN', []), [u'~']]), ([u' NIGERIAN', u' STUDENT'], [u'NGAEDU'], [(u'NGA', []), [u'~']]), ([u' MILITANTS', u' ARMED ISLAMIC GROUP', u' BEIRUT'], [u'DZAREBUAF', u'LBNUAF'], [[u'~'], (u'DZAREB', []), (u'LBN', [])]), ([u' LAWYER'], [u'~JUD'], [[u'~']])], u'actortext': {(u'TUNJUD', u'NGAEDU', u'173'): [u'Tunisian court', u'Nigerian student']}}, 'parsed': u'(S (S (NP (DT A ) (NNP TUNISIAN ) (NN COURT ) ) (VP (VBZ HAS ) (VP (VBN JAILED ) (NP (DT A ) (NNP NIGERIAN ) (NN STUDENT ) ) (PP (IN FOR ) (NP (NP (CD TWO ) (NNS YEARS ) ) (PP (IN FOR ) (S (VP (VBG HELPING ) (S (NP (JJ YOUNG ) (NNS MILITANTS ) ) (VP (VB JOIN ) (NP (DT AN ) (JJ ARMED ) (JJ ISLAMIC ) (NN GROUP ) ) (PP (IN IN ) (NP (NNP BEIRUT ) ) ) ) ) ) ) ) ) ) ) ) ) (, , ) (NP (PRP$ HIS ) (NN LAWYER ) ) (VP (VBD SAID ) (NP (NNP WEDNESDAY ) ) ) (. . 
) ) ', u'issues': [[u'STUDENTS', 1], [u'NAMED_TERROR_GROUP', 1]]}}, 'meta': {'date': '20160621', 'headline': u'Lightning Ridge Journal: An Amateur Undertaking in Australian Mining Town With No Funeral Home', u'verbs': {u'actorroot': {(u'TUNJUD', u'NGAEDU', u'173'): [u'', u'']}, (u'TUNJUD', u'NGAEDU', u'173'): [[u'JAILED'], [u'HAS']], u'eventtext': {(u'TUNJUD', u'NGAEDU', u'173'): u'has jailed'}, u'nouns': [([u' TUNISIAN', u' COURT'], [u'TUNJUD'], [(u'TUN', []), [u'~']]), ([u' NIGERIAN', u' STUDENT'], [u'NGAEDU'], [(u'NGA', []), [u'~']]), ([u' MILITANTS', u' ARMED ISLAMIC GROUP', u' BEIRUT'], [u'DZAREBUAF', u'LBNUAF'], [[u'~'], (u'DZAREB', []), (u'LBN', [])]), ([u' LAWYER'], [u'~JUD'], [[u'~']])], u'actortext': {(u'TUNJUD', u'NGAEDU', u'173'): [u'Tunisian court', u'Nigerian student']}}}}}"""
>>> import ast
>>> import pprint
>>> z = ast.literal_eval(s)
>>> pprint.pprint(z)
{'nytasiapacific20160622.0002':
{'meta':
{'date': '20160621',
'headline': 'Lightning Ridge Journal: An Amateur Undertaking in Australian Mining Town With No Funeral Home',
'verbs': {'actorroot': {('TUNJUD', 'NGAEDU', '173'): ['', '']},
'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
'eventtext': {('TUNJUD', 'NGAEDU', '173'): 'has jailed'},
'nouns': [([' TUNISIAN', ' COURT'], ['TUNJUD'], [('TUN', []), ['~']]),
([' NIGERIAN', ' STUDENT'], ['NGAEDU'], [('NGA', []), ['~']]),
([' MILITANTS', ' ARMED ISLAMIC GROUP', ' BEIRUT'], ['DZAREBUAF', 'LBNUAF'],
[['~'], ('DZAREB', []), ('LBN', [])]),
([' LAWYER'], ['~JUD'], [['~']])],
('TUNJUD', 'NGAEDU', '173'): [['JAILED'], ['HAS']]}},
'sents': {1: {'content': 'A Tunisian court has jailed a Nigerian student for two years for helping young militants join an armed '
'Islamic group in Beirut, his lawyer said Wednesday.',
'events': [('TUNJUD', 'NGAEDU', '173')],
'geo-location': [{'admin1': 'Beyrouth',
'countrycode': 'LBN',
'lat': 33.88894,
'lon': 35.49442,
'placename': 'Beirut',
'searchterm': 'Beirut'}],
'issues': [['STUDENTS', 1], ['NAMED_TERROR_GROUP', 1]],
'meta': {'actorroot': {('TUNJUD', 'NGAEDU', '173'): ['', '']},
'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
'eventtext': {('TUNJUD', 'NGAEDU', '173'): 'has jailed'},
'nouns': [([' TUNISIAN', ' COURT'], ['TUNJUD'], [('TUN', []), ['~']]),
([' NIGERIAN', ' STUDENT'], ['NGAEDU'], [('NGA', []), ['~']]),
([' MILITANTS', ' ARMED ISLAMIC GROUP', ' BEIRUT'], ['DZAREBUAF', 'LBNUAF'],
[['~'], ('DZAREB', []), ('LBN', [])]),
([' LAWYER'], ['~JUD'], [['~']])],
('TUNJUD', 'NGAEDU', '173'): [['JAILED'], ['HAS']]},
'parsed': '(S (S (NP (DT A ) (NNP TUNISIAN ) (NN COURT ) ) (VP (VBZ HAS ) (VP (VBN JAILED ) (NP (DT A ) (NNP '
'NIGERIAN ) (NN STUDENT ) ) (PP (IN FOR ) (NP (NP (CD TWO ) (NNS YEARS ) ) (PP (IN FOR ) (S (VP '
'(VBG HELPING ) (S (NP (JJ YOUNG ) (NNS MILITANTS ) ) (VP (VB JOIN ) (NP (DT AN ) (JJ ARMED ) (JJ '
'ISLAMIC ) (NN GROUP ) ) (PP (IN IN ) (NP (NNP BEIRUT ) ) ) ) ) ) ) ) ) ) ) ) ) (, , ) '
'(NP (PRP$ HIS ) (NN LAWYER ) ) (VP (VBD SAID ) (NP (NNP WEDNESDAY ) ) ) (. . ) ) '}}}}
Three alternatives are:
Quotify the key:
'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
to
'actortext': {"['TUNJUD', 'NGAEDU', '173']": ['Tunisian court', 'Nigerian student']},
Use arrays instead of tuples/dictionaries:
'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
to
'actortext': [["TUNJUD", "NGAEDU", "173"], ["Tunisian court", "Nigerian student"]],
Use more descriptive dictionaries (code, text key pairs):
'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
to
'actortext': {"code" : ["TUNJUD", "NGAEDU", "173"], "text": ["Tunisian court", "Nigerian student"]},
This output structure is decided during the do_coding phase of petrarch2. It seems like this change may break a lot of existing code.
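To avoid breaking existing code, the conversion could also happen at export time rather than inside do_coding. The sketch below follows the first alternative (quotified keys); to_jsonable is a hypothetical helper, not an existing Petrarch2 function.

```python
import json


def to_jsonable(obj):
    """Recursively convert tuples to lists and tuple keys to JSON strings."""
    if isinstance(obj, dict):
        converted = {}
        for key, value in obj.items():
            if isinstance(key, tuple):
                # JSON object keys must be strings, so encode the tuple.
                key = json.dumps(list(key))
            converted[key] = to_jsonable(value)
        return converted
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    return obj


record = {
    "events": [("TUNJUD", "NGAEDU", "173")],
    "actortext": {
        ("TUNJUD", "NGAEDU", "173"): ["Tunisian court", "Nigerian student"],
    },
}
print(json.dumps(to_jsonable(record), indent=2))
```

A consumer in any language can recover the event code by parsing the key string as JSON, at the cost of one extra decoding step per key.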
As part of our dictionary development work, we need a function that will take in a CoreNLP parsed sentence and return a list/dict/tuple of the source actor phrase, the verb phrase, and the target actor phrase. Clayton improved the functionality for this (5d724b1), but we still need an easy function to return this.
Example pseudocode:
parse = "(ROOT (S (NP (JJ German) (NNS troops)) (VP (VBD drove) (NP (NNS tanks)) (PP (IN into) (NP (NNP France)))) (. .)))"
def get_phrases(parse):
parsed = utilities._format_parsed_str(parse)
# ...
return (source_actor_phrase, verb_phrase, target_actor_phrase)
get_phrases(parse)
("German troops", "drove", "France")
(or whatever phrases Petrarch would actually find)
I think both @PTB-OEDA's team at UT Dallas and @philip-schrodt may be working on this.
I am trying to use PETRARCH and the output file is empty. I tried to use a single sentence from the file GigaWord.sample.PETR. in particular:
"Israel on Wednesday released the mayor of the northern West Bank city of
Nablus, a member of the Islamist Hamas movement jailed over a year ago,
Palestinian security officials said."
How is it possible that the library doesn't find anything? I have of course also tried with all the sentences in the file.
Do you have any idea why this is happening?
There are actually two issues here:
Hi,
I was trying to use PETRARCH2 with a newer version of CoreNLP, specifically through Stanza. It also includes a new pipeline that is robust and based on neural networks.
The parse tree is somewhat different from the one produced by the CoreNLP Docker image available within this repo. An example is given below. Because of these small differences, I cannot get events from the new parse tree format.
Original Text
Winterfell has asked the Lannister families to clarify those issues but Beijing has not yet straightened them out, he said, adding that Japan would continue to talk to China about this.
Parsed Output from the included CoreNLP Docker
(S (S (NP (NNP WINTERFELL ) ) (VP (VP (VBZ HAS ) (VP (VBN ASKED ) (NP (DT THE ) (NNP LANNISTER ) (NNS FAMILIES ) ) (S (VP (TO TO ) (VP (VB CLARIFY ) (NP (DT THOSE ) (NNS ISSUES ) ) ) ) ) ) ) (CC BUT ) (S (NP (NNP BEIJING ) ) (VP (VBZ HAS ) (RB NOT ) (ADVP (RB YET ) ) (VP (VBD STRAIGHTENED ) (NP (PRP THEM ) ) (PRT (RP OUT ) ) ) ) ) ) ) (PRN (, , ) (S (NP (PRP HE ) ) (VP (VBD SAID ) ) ) (, , ) ) (S (VP (VBG ADDING ) (SBAR (IN THAT ) (S (NP (NNP JAPAN ) ) (VP (MD WOULD ) (VP (VB CONTINUE ) (S (VP (TO TO ) (VP (VB TALK ) (PP (TO TO ) (NP (NNP CHINA ) ) ) (PP (IN ABOUT ) (NP (DT THIS ) ) ) ) ) ) ) ) ) ) ) ) (. . ) )
Parsed Output from the Stanza CoreNLP Client
(S (S (S (NP (NNP WINTERFELL ) ) (VP (VBZ HAS ) (VP (VBN ASKED ) (NP (DT THE ) (NNP LANNISTER ) (NNS FAMILIES ) ) (S (VP (TO TO ) (VP (VB CLARIFY ) (NP (DT THOSE ) (NNS ISSUES ) ) ) ) ) ) ) ) (CC BUT ) (S (NP (NNP BEIJING ) ) (VP (VBZ HAS ) (RB NOT ) (ADVP (RB YET ) ) (VP (VBN STRAIGHTENED ) (NP (PRP THEM ) ) (PRT (RP OUT ) ) ) ) ) ) (, , ) (NP (PRP HE ) ) (VP (VBD SAID ) (, , ) (S (VP (VBG ADDING ) (SBAR (IN THAT ) (S (NP (NNP JAPAN ) ) (VP (MD WOULD ) (VP (VB CONTINUE ) (S (VP (TO TO ) (VP (VB TALK ) (PP (IN TO ) (NP (NNP CHINA ) ) ) (PP (IN ABOUT ) (NP (DT THIS ) ) ) ) ) ) ) ) ) ) ) ) ) (. . ) )
Could you please explain how I can use the new Treebank format with PETRARCH2?
Although we'll be switching to Universal Dependencies in the medium term (#7), we have an immediate need for the ability to extract actor and verb phrases from Arabic so that we can begin human coding (and machine-assisted) dictionary development. The phrases extracted should be suitable for adding to the dictionaries and dictionary entries alongside their human-assigned code, meaning that they should mirror the way that Petrarch2 will see the phrases.
This code will be somewhat parallel to the later UD approach, but hopefully writing it will help with the future integration of Arabic text into Petrarch2.
python2.7 petrarch2.py parse -i ./data/text/GigaWord.sample.PETR.xml -o my_output
leads to the following problem:
Traceback (most recent call last):
...
File ".../petrarch2/petrarch2.py", line 530, in <module>
main()
File ".../petrarch2/petrarch2.py", line 438, in main
if cli_args.outputs:
AttributeError: 'Namespace' object has no attribute 'outputs'
Furthermore, the "-o" argument of the batch mode has an incorrect description.
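An AttributeError like this typically appears when the code reads an attribute that the selected subcommand never defined. The sketch below is not Petrarch2's actual parser. The subcommand and option names are assumptions that mirror the command line above. It shows one way to keep the attribute name consistent across subcommands and to read it defensively.

```python
import argparse


def build_parser():
    """Define -o with the same dest on every subcommand (names are assumed)."""
    parser = argparse.ArgumentParser(prog="petrarch2-sketch")
    subparsers = parser.add_subparsers(dest="command")
    for name in ("parse", "batch"):
        command = subparsers.add_parser(name)
        command.add_argument("-i", "--inputs", help="input file")
        command.add_argument("-o", "--outputs", dest="outputs", default=None,
                             help="file to write coded events to")
    return parser


def output_path(cli_args, default="events.txt"):
    """Read the option without assuming the attribute exists."""
    return getattr(cli_args, "outputs", None) or default


args = build_parser().parse_args(["parse", "-i", "in.xml", "-o", "my_output"])
print(output_path(args))
```

Sharing one dest across subcommands means main() can test cli_args.outputs without knowing which mode was invoked.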
Petrarch2 currently codes only English language articles in the Stanford dependencies format. Other languages, including the Spanish and Arabic that we need to code, are parsed with different tags. We should consider switching Petrarch2's internals to using Universal Dependencies, and then convert the input to universal dependencies. This will avoid having to have separate Petr2s for each language.
CoreNLP itself has also switched to UD in the most recent versions: http://nlp.stanford.edu/software/stanford-dependencies.shtml
Information on Universal Dependencies is here: http://universaldependencies.org/
Right now the code will not create the plural for multi-word synset nouns, such as "WRITTEN_AGREEMENT" and "DIFFERENCE_OF_OPINION_". For the second example, I know that we do not want to create a plural because of the "_" at the end of the string. But should we create a plural for the first example?
There is still an assortment of these (very low probability contingencies, probably), but they should go to the log, since generally no one is watching the screen except when debugging.
When internal cooperation in compounds is being expanded in Sentence.get_events() [PETRtree.py], the new events aren't being added to the 'meta' storage of information, so consequently the routines for picking up the actor, event and actor-root texts don't have this information and instead just return '---' for all of the fields. Or rather do this because I've trapped this in a couple of places; otherwise the program crashes on a key-error due to the incompatibility of the primary event list and the information available in 'meta': I've inserted comments at the various points where this is relevant.
More generally, now that the actor/eventtext and actorroot options have been added, the 'meta' storage needs to be consolidated and refactored -- again, I've made a couple of notes on this.
The input file below will generate this issue:
The United States , United Kingdom and European Union have come down heavily on the violence and shrinking democratic space in Bangladesh and urged all parties to engage in dialogue .
(ROOT (S (NP (NP (DT The) (NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom)) (CC and) (NP (NNP European) (NNP Union))) (VP (VP (VBP have) (VP (VBN come) (PRT (RP down)) (ADVP (RB heavily)) (PP (IN on) (NP (NP (DT the) (NN violence)) (CC and) (VP (VBG shrinking) (NP (JJ democratic) (NN space)) (PP (IN in) (NP (NNP Bangladesh)))))))) (CC and) (VP (VBD urged) (NP (DT all) (NNS parties)) (S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue)))))))) (. .)))
The United States , United Kingdom and European Union have criticized Bangladesh and urged all parties to engage in dialogue
(ROOT (S (NP (NP (DT The) (NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom)) (CC and) (NP (NNP European) (NNP Union))) (VP (VP (VBP have) (VP (VBN criticized) (NP (NNP Bangladesh)))) (CC and) (VP (VBD urged) (NP (DT all) (NNS parties)) (S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue) )))))))))
China , the US , South Africa , India , and Pakistan , who stockpiled their current net requirements , would now deplete their rubber in hand on releasing their rubber stocks to the market over the next few months .
(ROOT (S (NP (NP (NP (NNP China)) (, ,) (NP (DT the) (NNP US)) (, ,) (NP (NNP South) (NNP Africa)) (, ,) (NP (NNP India)) (, ,) (CC and) (NP (NNP Pakistan))) (, ,) (SBAR (WHNP (WP who)) (S (VP (VBD stockpiled) (NP (PRP$ their) (JJ current) (JJ net) (NNS requirements))))) (, ,)) (VP (MD would) (ADVP (RB now)) (VP (VB deplete) (NP (PRP$ their) (NN rubber)) (PP (IN in) (NP (NN hand))) (PP (IN on) (S (VP (VBG releasing) (NP (PRP$ their) (NN rubber) (NNS stocks)) (PP (TO to) (NP (NP (DT the) (NN market)) (PP (IN over) (NP (DT the) (JJ next) (JJ few) (NNS months)))))))))) (. .)))
======= Event output ==========
(actor/eventtext and actorroot == True)
20150823 CHN IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150115 IGOEUREEC BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx European Union Bangladesh ... criticized THE EUROPEAN UNION BANGLADESH
20150115 GBR USA 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx The United States Bangladesh ... criticized UNITED STATES OF AMERICA BANGLADESH
20150115 IGOEUREEC USA 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 GBR IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 IGOEUREEC GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 GBR BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx United ... Kingdom Bangladesh ... criticized UNITED KINGDOM BANGLADESH
20150115 GBR USA 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 IGOEUREEC USA 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 GBR IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 USA IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 USA GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 IGOEUREEC GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
I installed petrarch2 on Ubuntu 16.04 using pip3.5, but running petrarch2 -h gives me the following error:
ImportError: No module named 'PETRglobals'
I am also unable to install PETRglobals using pip install PETRglobals.
However, it works perfectly fine with pip2.7.
How can I run an XML input file without a parse block?
I tried deleting the content of the parse block in GigaWord.sample.PETR.xml and running it, but it failed.
The README says the block is optional.
Why can't I run it without a parse block?
What is the right way to use petrarch2 on plain text?
Thanks!
Hi guys, is it possible to include a customized event type dictionary into Petrarch(2)? I understand that Petrarch(2) is CAMEO based and I have some organic definitions derived from other sources. Any suggestions? Thanks!
Trying to run some sample data to explore the phrase extraction pieces. I'm using the following data:
{'abc123': {'meta': {'date': '20010101'},
'sents': {0: {'content': u'At least 37 people are dead after Islamist radical group Boko Haram assaulted a town in northeastern Nigeria .',
'parsed': u'(ROOT (S (NP (QP (IN AT ) (JJS LEAST ) (CD 37 ) ) (NNS PEOPLE ) ) (VP (VBP ARE ) (ADJP (JJ DEAD ) ) (SBAR (IN AFTER ) (S (NP (JJ ISLAMIST ) (JJ RADICAL ) (NN GROUP ) (NNP BOKO ) (NNP HARAM ) ) (VP (VBD ASSAULTED ) (NP (NP (DT A ) (NN TOWN ) ) (PP (IN IN ) (NP (JJ NORTHEASTERN ) (NNP NIGERIA ) ) ) ) ) ) ) ) (. . ) ) )'}}}}
I then run it through the do_coding routine:
event_dict_updated = petrarch2.do_coding(event_dict, None)
Which yields the following updated dictionary:
{'abc123': {'meta': {'date': '20010101',
u'verbs': {u'nouns': [([u' PEOPLE'], [u'~PPL'], [[u'~']]),
([u' ISLAMIST', u' BOKO HARAM'],
[u'NGAREBMUS'],
[[u'~'], (u'NGAREB', [])]),
([u' NIGERIA'], [u'NGA'], [(u'NGA', [])])]}},
'sents': {0: {'content': u'At least 37 people are dead after Islamist radical group Boko Haram assaulted a town in northeastern Nigeria .',
'parsed': u'(ROOT (S (NP (QP (IN AT ) (JJS LEAST ) (CD 37 ) ) (NNS PEOPLE ) ) (VP (VBP ARE ) (ADJP (JJ DEAD ) ) (SBAR (IN AFTER ) (S (NP (JJ ISLAMIST ) (JJ RADICAL ) (NN GROUP ) (NNP BOKO ) (NNP HARAM ) ) (VP (VBD ASSAULTED ) (NP (NP (DT A ) (NN TOWN ) ) (PP (IN IN ) (NP (JJ NORTHEASTERN ) (NNP NIGERIA ) ) ) ) ) ) ) ) (. . ) ) )'}}}}
There are a couple of issues here:
The meta, verbs, nouns construct is incorrect.
It is unclear what the [[u'~'], (u'NGAREB', [])]) construct refers to in the sentence.
This isn't relevant to this issue, but it should also be noted that this sentence doesn't code an event even though PETR is clearly identifying potential source and target actors and "assaulted" should be a relevant verb.