petrarch2's People

Contributors

ahalterman, arfon, civet-software, cnnorris, gitter-badger, janekdb, johnb30, marcusdeng22, myi100, paktor132, philip-schrodt, richardlitt

petrarch2's Issues

Add a Contribute section to README

The README does not have a contribution section, which makes it hard for me to know whether I can open PRs, open issues, or what pathways there are for contribution in general. While there is a Contribute section in the docs, it only covers coding guidelines and a task-like process, not whether contributions are encouraged. It would be good to make this explicit here, in the README.

This proposed change is blocking openjournals/joss-reviews#133.

Install instructions reference incorrect petrarch version

The install instructions currently call for:

pip install git+https://github.com/openeventdata/petrarch2.git

I was able to get this working with:

pip install git+https://github.com/openeventdata/petrarch2.git --ignore-installed

I needed the extra flag because I am running El Capitan, which has some issues. Unfortunately, the next step in the instructions didn't work:

petrarch -h

because the package installs the executable as:

petrarch2

The instructions should be modified to add the digit 2 to the program name.

This was brought up in openjournals/joss-reviews#133.

Add entity coreference

@philip-schrodt is working on adding support in Petrarch2 for the entity coreference information that CoreNLP outputs. This should increase yield for later sentences in a story. Feel free to add details below, Phil!

Pull dictionaries out of repo

Petrarch2's code should be distinct from the dictionaries it uses. To make changes to the dictionaries more visible and to make it easier to switch in custom dictionaries, take the built-in dictionaries out of Petrarch2 and have it look for them at a location defined in the config file. Move the built-in dictionaries to the Dictionaries repo.

This should help clear up questions like #19.
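A minimal sketch of how the dictionary location could be read from a config file, assuming an INI-style file with a hypothetical [Dictionaries] section that holds a dictionary_dir base path plus one key per dictionary file. The section name, the keys, and the dictionary_paths helper are illustrative assumptions, not petrarch2's actual configuration schema.

```python
import configparser
import os

# Hypothetical config layout: a base directory plus one key per dictionary file.
EXAMPLE_CONFIG = """
[Dictionaries]
dictionary_dir = /opt/petrarch_dictionaries
verbs = verbs.txt
agents = agents.txt
"""


def dictionary_paths(config_text):
    """Return {key: full path} for each dictionary file listed in [Dictionaries]."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["Dictionaries"]
    base_dir = section.get("dictionary_dir", "")
    # Every key other than the base directory is a file name relative to it.
    return {
        key: os.path.join(base_dir, value)
        for key, value in section.items()
        if key != "dictionary_dir"
    }


if __name__ == "__main__":
    for name, path in sorted(dictionary_paths(EXAMPLE_CONFIG).items()):
        print(name, path)
```

Keeping only file names in the config and resolving them against one base directory means switching to a custom dictionary set is a one-line change.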

When to add a pipe ‘|’

In the read_verb_dictionary function of PETRreader.py, when reading a pattern line (lines 860 to 956), I found a difference between "#pre-verb prepositional phrase" and "#post-verb prepositional phrase".
When dealing with "#pre-verb prepositional phrase", we add a pipe '|' before the for loop on line 896, but when dealing with "#post-verb prepositional phrase", we add the pipe '|' inside the for loop on line 939.
I think the latter way is correct.
Is this a minor bug, or is there some other reason for it?

Strict documentation/freezing of parse tree input is needed

This sentence codes:

  '(S (NP (NNP GERMANY ) ) (VP (VBD INVADED ) (NP (NNP FRANCE ) ) ) ) '

This sentence doesn't:

  '(S (NP (NNP GERMANY ) ) (VP (VBD INVADED ) (NP (NNP FRANCE ) ) ) ) '

The needed input looks like:

  (ROOT (S (NP (NNP GERMANY)) (VP (VBD INVADED) (NP (NNP FRANCE)))))

The difference is in the spacing between the parentheses. The format apparently changed at some point, since a lot of my helper software has broken. The format of the parse tree therefore needs to be 1) documented and 2) frozen.
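One way to "freeze" the input format is to define a single canonical spacing and normalize every parse to it before coding. The sketch below uses a hypothetical helper, canonical_tree, that is not part of petrarch2. It only normalizes whitespace and case (upper case is assumed because the examples above use it). It does not add or remove a ROOT node.

```python
import re

# One token per opening "(LABEL", closing ")", or word.
_TOKEN = re.compile(r"\([^\s()]+|\)|[^\s()]+")


def canonical_tree(parse_str):
    """Rewrite a bracketed parse with uniform spacing.

    '(LABEL' stays attached to its label, and every word and ')' is
    separated from its neighbours by exactly one space.
    """
    tokens = _TOKEN.findall(parse_str)
    return " ".join(tokens).upper()


if __name__ == "__main__":
    compact = "(ROOT (S (NP (NNP GERMANY)) (VP (VBD INVADED) (NP (NNP FRANCE)))))"
    padded = "(ROOT (S (NP (NNP GERMANY ) )  (VP (VBD INVADED )  (NP (NNP FRANCE ) ) ) ) ) "
    print(canonical_tree(compact))
    print(canonical_tree(compact) == canonical_tree(padded))
```

With a normalization step like this at the input boundary, differences in spacing between parsers or helper scripts stop mattering.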

Printing causes UnicodeEncodeError

When using it via hypnos, I see the following error:

petrarch_1 |   File "/usr/local/lib/python2.7/dist-packages/petrarch2/petrarch2.py", line 250, in do_coding
petrarch_1 |     print(sentence.txt)
petrarch_1 | UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 67: ordinal not in range(128)

We probably want to remove or suppress that print(). Alternatively, coerce everything into UTF-8.
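A minimal sketch of a third option, different from both suggestions above: escape non-ASCII characters before printing so that an ASCII-only console never raises. safe_text is a hypothetical helper. It would wrap the existing call as print(safe_text(sentence.txt)).

```python
def safe_text(text):
    """Return an ASCII-only version of `text` that any console can print.

    Non-ASCII characters are replaced by backslash escapes instead of
    raising UnicodeEncodeError.
    """
    return text.encode("ascii", "backslashreplace").decode("ascii")


if __name__ == "__main__":
    print(safe_text(u"Pe\xf1a Nieto"))  # prints: Pe\xf1a Nieto
```

The trade-off is readability: escaped names are harder to read than UTF-8 output, but the coder keeps running regardless of the terminal encoding.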

Date comparison bug

In the check_date function, there are lines of code here, here, here, and here that compare integers with strings using value comparisons. The date attribute of the NounPhrase class and of its Phrase superclass is a string. These lines don't throw errors in Python 2, but they do in Python 3.

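Also, I'm not sure it works as intended. Python 2 orders every int before every str regardless of value, so a str-vs-int comparison ignores the actual dates. When going through the test_noun_meaning1 test, we see the following conflict: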

>>> '081315' >= int('150578')
True
>>> int('081315') >= int('150578')
False

Here is the output in Python 3:

>>> '081315' >= int('150578')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>=' not supported between instances of 'str' and 'int'
>>> int('081315') >= int('150578')
False

What should the expected behavior be here?

It looks like we should be comparing dstr_to_ordate(curdate) to the integer date. Here is dstr_to_ordate.
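The suggested fix, converting the current date to an integer before comparing, can be sketched as follows. date_to_ordinal is a hypothetical stand-in for dstr_to_ordate. It assumes YYYYMMDD input and uses Python's proleptic Gregorian day number. petrarch2's own function may count days differently, so the bounds must be produced by the same function.

```python
from datetime import datetime


def date_to_ordinal(datestr):
    """Hypothetical stand-in for dstr_to_ordate: 'YYYYMMDD' -> day number (int)."""
    return datetime.strptime(datestr, "%Y%m%d").date().toordinal()


def date_in_range(curdate, lower=None, upper=None):
    """True if curdate ('YYYYMMDD') lies within the optional inclusive int bounds.

    Both sides of every comparison are ints, so the result is the same
    in Python 2 and Python 3.
    """
    day = date_to_ordinal(curdate)
    if lower is not None and day < lower:
        return False
    if upper is not None and day > upper:
        return False
    return True


if __name__ == "__main__":
    start = date_to_ordinal("20150101")
    print(date_in_range("20150823", lower=start))
    print(date_in_range("20150823", upper=start))
```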

Option to control log level

I don't really want to write every event/sentence to stdout all the time. It would be nice to have an argument to control that.
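A minimal sketch using the standard argparse and logging modules, assuming the per-sentence print() calls were changed to logger calls. The --log-level flag and the "petrarch2" logger name are hypothetical, not part of petrarch2's current CLI.

```python
import argparse
import logging


def build_parser():
    parser = argparse.ArgumentParser(prog="petrarch2")
    # Hypothetical flag: not part of petrarch2's current command line.
    parser.add_argument(
        "--log-level",
        default="WARNING",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        help="minimum severity written to the console",
    )
    return parser


def configure_logging(level_name):
    """Set the threshold for per-sentence output sent through logging."""
    logger = logging.getLogger("petrarch2")
    logger.setLevel(getattr(logging, level_name))
    if not logger.handlers:
        # StreamHandler writes to stderr by default, keeping stdout for results.
        logger.addHandler(logging.StreamHandler())
    return logger


if __name__ == "__main__":
    args = build_parser().parse_args(["--log-level", "INFO"])
    log = configure_logging(args.log_level)
    log.debug("hidden at INFO: per-sentence details")
    log.info("shown at INFO: summary")
```

Sending diagnostics through logging rather than print() also helps with the UnicodeEncodeError issue above, because the output can be filtered or redirected without touching the coding logic.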

Make petrarch2 output more JSON friendly

When using the petrarch output, it would be helpful to make it both Python-friendly and valid JSON. Currently, the petrarch output is Python-specific. Can we make the output follow JSON rules? That way the data can still be read in Python (via the json package), and other languages and tools (e.g. Mongo, Redis) can easily read and use the output without custom converters.

Below is a snippet of petrarch2 output that shows the problem. The main issue is that actorroot, actortext, and eventtext contain dictionaries, and those dictionaries have tuples as their keys.

python3
>>> s = """{u'nytasiapacific20160622.0002': {'sents': {1: {'geo-location': [{u'placename': u'Beirut', u'countrycode': u'LBN', u'lon': 35.49442, u'admin1': u'Beyrouth', u'lat': 33.88894, u'searchterm': u'Beirut'}], u'events': [(u'TUNJUD', u'NGAEDU', u'173')], 'content': u'A Tunisian court has jailed a Nigerian student for two years for helping young militants join an armed Islamic group in Beirut, his lawyer said Wednesday.', u'meta': {u'actorroot': {(u'TUNJUD', u'NGAEDU', u'173'): [u'', u'']}, (u'TUNJUD', u'NGAEDU', u'173'): [[u'JAILED'], [u'HAS']], u'eventtext': {(u'TUNJUD', u'NGAEDU', u'173'): u'has jailed'}, u'nouns': [([u' TUNISIAN', u' COURT'], [u'TUNJUD'], [(u'TUN', []), [u'~']]), ([u' NIGERIAN', u' STUDENT'], [u'NGAEDU'], [(u'NGA', []), [u'~']]), ([u' MILITANTS', u' ARMED ISLAMIC GROUP', u' BEIRUT'], [u'DZAREBUAF', u'LBNUAF'], [[u'~'], (u'DZAREB', []), (u'LBN', [])]), ([u' LAWYER'], [u'~JUD'], [[u'~']])], u'actortext': {(u'TUNJUD', u'NGAEDU', u'173'): [u'Tunisian court', u'Nigerian student']}}, 'parsed': u'(S (S (NP (DT A )  (NNP TUNISIAN )  (NN COURT )  )  (VP (VBZ HAS )  (VP (VBN JAILED )  (NP (DT A )  (NNP NIGERIAN )  (NN STUDENT )  )  (PP (IN FOR )  (NP (NP (CD TWO )  (NNS YEARS )  )  (PP (IN FOR )  (S (VP (VBG HELPING )  (S (NP (JJ YOUNG )  (NNS MILITANTS )  )  (VP (VB JOIN )  (NP (DT AN )  (JJ ARMED )  (JJ ISLAMIC )  (NN GROUP )  )  (PP (IN IN )  (NP (NNP BEIRUT )  )  )  )  )  )  )  )  )  )  )  )  )  (, , )  (NP (PRP$ HIS )  (NN LAWYER )  )  (VP (VBD SAID )  (NP (NNP WEDNESDAY )  )  )  (. . 
)  )  ', u'issues': [[u'STUDENTS', 1], [u'NAMED_TERROR_GROUP', 1]]}}, 'meta': {'date': '20160621', 'headline': u'Lightning Ridge Journal: An Amateur Undertaking in Australian Mining Town With No Funeral Home', u'verbs': {u'actorroot': {(u'TUNJUD', u'NGAEDU', u'173'): [u'', u'']}, (u'TUNJUD', u'NGAEDU', u'173'): [[u'JAILED'], [u'HAS']], u'eventtext': {(u'TUNJUD', u'NGAEDU', u'173'): u'has jailed'}, u'nouns': [([u' TUNISIAN', u' COURT'], [u'TUNJUD'], [(u'TUN', []), [u'~']]), ([u' NIGERIAN', u' STUDENT'], [u'NGAEDU'], [(u'NGA', []), [u'~']]), ([u' MILITANTS', u' ARMED ISLAMIC GROUP', u' BEIRUT'], [u'DZAREBUAF', u'LBNUAF'], [[u'~'], (u'DZAREB', []), (u'LBN', [])]), ([u' LAWYER'], [u'~JUD'], [[u'~']])], u'actortext': {(u'TUNJUD', u'NGAEDU', u'173'): [u'Tunisian court', u'Nigerian student']}}}}}"""
>>> import ast, pprint
>>> z = ast.literal_eval(s)
>>> pprint.pprint(z)
{'nytasiapacific20160622.0002':
  {'meta': 
    {'date': '20160621',
             'headline': 'Lightning Ridge Journal: An Amateur Undertaking in Australian Mining Town With No Funeral Home',
             'verbs': {'actorroot': {('TUNJUD', 'NGAEDU', '173'): ['', '']},
                      'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
                      'eventtext': {('TUNJUD', 'NGAEDU', '173'): 'has jailed'},
                      'nouns': [([' TUNISIAN', ' COURT'], ['TUNJUD'], [('TUN', []), ['~']]),
                               ([' NIGERIAN', ' STUDENT'], ['NGAEDU'], [('NGA', []), ['~']]),
                               ([' MILITANTS', ' ARMED ISLAMIC GROUP', ' BEIRUT'], ['DZAREBUAF', 'LBNUAF'],
                               [['~'], ('DZAREB', []), ('LBN', [])]),
                               ([' LAWYER'], ['~JUD'], [['~']])],
                      ('TUNJUD', 'NGAEDU', '173'): [['JAILED'], ['HAS']]}},
   'sents': {1: {'content': 'A Tunisian court has jailed a Nigerian student for two years for helping young militants join an armed '
                          'Islamic group in Beirut, his lawyer said Wednesday.',
               'events': [('TUNJUD', 'NGAEDU', '173')],
               'geo-location': [{'admin1': 'Beyrouth',
                               'countrycode': 'LBN',
                               'lat': 33.88894,
                               'lon': 35.49442,
                               'placename': 'Beirut',
                               'searchterm': 'Beirut'}],
               'issues': [['STUDENTS', 1], ['NAMED_TERROR_GROUP', 1]],
               'meta': {'actorroot': {('TUNJUD', 'NGAEDU', '173'): ['', '']},
                       'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
                       'eventtext': {('TUNJUD', 'NGAEDU', '173'): 'has jailed'},
                       'nouns': [([' TUNISIAN', ' COURT'], ['TUNJUD'], [('TUN', []), ['~']]),
                                ([' NIGERIAN', ' STUDENT'], ['NGAEDU'], [('NGA', []), ['~']]),
                                ([' MILITANTS', ' ARMED ISLAMIC GROUP', ' BEIRUT'], ['DZAREBUAF', 'LBNUAF'],
                                [['~'], ('DZAREB', []), ('LBN', [])]),
                                ([' LAWYER'], ['~JUD'], [['~']])],
                       ('TUNJUD', 'NGAEDU', '173'): [['JAILED'], ['HAS']]},
               'parsed': '(S (S (NP (DT A )  (NNP TUNISIAN )  (NN COURT )  )  (VP (VBZ HAS )  (VP (VBN JAILED )  (NP (DT A )  (NNP '
                         'NIGERIAN )  (NN STUDENT )  )  (PP (IN FOR )  (NP (NP (CD TWO )  (NNS YEARS )  )  (PP (IN FOR )  (S (VP '
                         '(VBG HELPING )  (S (NP (JJ YOUNG )  (NNS MILITANTS )  )  (VP (VB JOIN )  (NP (DT AN )  (JJ ARMED )  (JJ '
                         'ISLAMIC )  (NN GROUP )  )  (PP (IN IN )  (NP (NNP BEIRUT )  )  )  )  )  )  )  )  )  )  )  )  )  (, , )  '
                         '(NP (PRP$ HIS )  (NN LAWYER )  )  (VP (VBD SAID )  (NP (NNP WEDNESDAY )  )  )  (. . )  )  '}}}}

Three alternatives are:

  1. Quotify the key:
    'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
    to
    'actortext': {"['TUNJUD', 'NGAEDU', '173']": ["Tunisian court", "Nigerian student"]},

  2. Use arrays instead of tuples/dictionaries
    'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
    to
    'actortext': [["TUNJUD", "NGAEDU", "173"], ["Tunisian court", "Nigerian student"]],

  3. Use more descriptive dictionaries (code, text key pairs)
    'actortext': {('TUNJUD', 'NGAEDU', '173'): ['Tunisian court', 'Nigerian student']},
    to
    'actortext': {"code" : ["TUNJUD", "NGAEDU", "173"], "text": ["Tunisian court", "Nigerian student"]},

This output/structure is decided during the do_coding phase of petrarch2. It seems like this change may break a lot of existing code.
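A minimal Python 3 sketch of a post-processing step that makes the existing structure serializable without changing do_coding itself. It uses a variant of alternative 1: each tuple key becomes a JSON-encoded list string rather than a Python repr. to_jsonable is a hypothetical helper, not part of petrarch2.

```python
import json


def to_jsonable(obj):
    """Recursively convert petrarch2-style output into JSON-serializable data.

    - tuple keys become JSON-encoded list strings (a variant of alternative 1)
    - other non-string keys (e.g. sentence number 1) become str(key)
    - tuples become lists
    """
    if isinstance(obj, dict):
        converted = {}
        for key, value in obj.items():
            if isinstance(key, tuple):
                key = json.dumps(list(key))
            elif not isinstance(key, str):
                key = str(key)
            converted[key] = to_jsonable(value)
        return converted
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    return obj


if __name__ == "__main__":
    event = ("TUNJUD", "NGAEDU", "173")
    sample = {1: {"events": [event],
                  "meta": {"actortext": {event: ["Tunisian court", "Nigerian student"]},
                           event: [["JAILED"], ["HAS"]]}}}
    print(json.dumps(to_jsonable(sample), sort_keys=True, indent=2))
```

Because the conversion runs after do_coding, existing code that consumes the Python dictionaries keeps working; only the export path changes.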

Function to return phrases tuple from CoreNLP input

As part of our dictionary development work, we need a function that will take in a CoreNLP parsed sentence and return a list/dict/tuple of the source actor phrase, the verb phrase, and the target actor phrase. Clayton improved the functionality for this (5d724b1), but we still need an easy function to return this.

Example pseudocode:

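from petrarch2 import utilities

parse = "(ROOT (S (NP (JJ German) (NNS troops)) (VP (VBD drove) (NP (NNS tanks)) (PP (IN into) (NP (NNP France)))) (. .)))"

def get_phrases(parse):
    # Convert the CoreNLP bracketed parse into Petrarch2's internal string format.
    parsed = utilities._format_parsed_str(parse)
    # ... locate the source actor NP, the verb phrase, and the target actor NP
    return (source_actor_phrase, verb_phrase, target_actor_phrase)

get_phrases(parse)
# expected: ("German troops", "drove", "France")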

(or whatever phrases Petrarch would actually find)

I think both @PTB-OEDA's team at UT Dallas and @philip-schrodt may be working on this.
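For concreteness, here is a self-contained sketch of the requested return shape. It uses a naive, hypothetical heuristic: the subject NP is the source, the first verb in the VP is the verb, and the last NP inside the VP is the target. It does not use Petrarch2's own tree classes or dictionaries, so it will not necessarily match what Petrarch would actually code. It only illustrates the interface.

```python
import re

_TREE_TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(parse_str):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are str."""
    tokens = _TREE_TOKEN.findall(parse_str)

    def build(i):
        # tokens[i] is "(" and tokens[i + 1] is the node label.
        label, i = tokens[i + 1], i + 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = build(i)
                children.append(child)
            else:
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1

    tree, _ = build(0)
    return tree


def leaves(node):
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def subtrees(node, label):
    """All descendants of node (pre-order) whose label equals `label`."""
    found = []
    for child in node[1]:
        if not isinstance(child, str):
            if child[0] == label:
                found.append(child)
            found.extend(subtrees(child, label))
    return found


def first_child(node, predicate):
    # Raises StopIteration if no direct child matches.
    return next(c for c in node[1] if not isinstance(c, str) and predicate(c[0]))


def get_phrases(parse_str):
    """Hypothetical heuristic: (subject NP, first verb in VP, last NP inside VP)."""
    root = parse_tree(parse_str)
    clause = first_child(root, lambda lab: lab == "S") if root[0] == "ROOT" else root
    source = first_child(clause, lambda lab: lab == "NP")
    vp = first_child(clause, lambda lab: lab == "VP")
    verb = first_child(vp, lambda lab: lab.startswith("VB"))
    target = subtrees(vp, "NP")[-1]
    return tuple(" ".join(leaves(n)) for n in (source, verb, target))


if __name__ == "__main__":
    parse = ("(ROOT (S (NP (JJ German) (NNS troops)) (VP (VBD drove) (NP (NNS tanks)) "
             "(PP (IN into) (NP (NNP France)))) (. .)))")
    print(get_phrases(parse))
```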

Bug in generating text

I am trying to use PETRARCH, but the output file is empty. I tried using a single sentence from the file GigaWord.sample.PETR, in particular:

""" Israel on Wednesday released the mayor of the northern West Bank city of
Nablus, a member of the Islamist Hamas movement jailed over a year ago,
Palestinian security officials said."""

How is it possible that the library doesn't find anything? Of course, I have also tried with all the sentences in the file.
Do you have any idea why this is happening?

Code to extract noun or verb phrases from sentences that don't have complete events

There are actually two issues here:

  1. In situations where there is a valid verb/pattern that would generate a code, and there are noun phrases where the source and/or target actor are expected, return the noun phrase(s). PETR-1 had this capability; see these validation records:

Gryffindor's head Minerva McGonagall left for the Ministry of Magic on Wednesday for meetings of the joint OWL standards committee with Albus Dumbledore, Luna Lovegood's news agency reported. (ROOT (S ...

Gryffindor's head Minerva McGonagall left for Minas Tirith on Wednesday for meetings of the joint OWL standards committee with Albus Dumbledore, Luna Lovegood's news agency reported. (ROOT (S...

  2. Situations where either a source or a target (or, more selectively, both) is present but there is no recognized verb phrase, which indicates that a dyad is doing something we aren't capturing in the dictionaries.

Adapting new Treebank format

Hi,

I was trying to use PETRARCH2 with a newer version of CoreNLP, specifically Stanza. It also includes a new pipeline that is robust and based on neural networks.

The parse tree differs somewhat from the one produced by the CoreNLP Docker image available in this repo; an example is given below. Because of these small differences, I cannot get events from the new parse tree format.

Original Text

Winterfell has asked the Lannister families to clarify those issues but Beijing has not yet straightened them out, he said, adding that Japan would continue to talk to China about this.

Parsed Output from the Included CoreNLP Docker

(S (S (NP (NNP WINTERFELL )  )  (VP (VP (VBZ HAS )  (VP (VBN ASKED )  (NP (DT THE )  (NNP LANNISTER )  (NNS FAMILIES )  )  (S (VP (TO TO )  (VP (VB CLARIFY )  (NP (DT THOSE )  (NNS ISSUES )  )  )  )  )  )  )  (CC BUT )  (S (NP (NNP BEIJING )  )  (VP (VBZ HAS )  (RB NOT )  (ADVP (RB YET )  )  (VP (VBD STRAIGHTENED )  (NP (PRP THEM )  )  (PRT (RP OUT )  )  )  )  )  )  )  (PRN (, , )  (S (NP (PRP HE )  )  (VP (VBD SAID )  )  )  (, , )  )  (S (VP (VBG ADDING )  (SBAR (IN THAT )  (S (NP (NNP JAPAN )  )  (VP (MD WOULD )  (VP (VB CONTINUE )  (S (VP (TO TO )  (VP (VB TALK )  (PP (TO TO )  (NP (NNP CHINA )  )  )  (PP (IN ABOUT )  (NP (DT THIS )  )  )  )  )  )  )  )  )  )  )  )  (. . )  )  

Parsed Output from the Stanza CoreNLP Client

(S (S (S (NP (NNP WINTERFELL )  )  (VP (VBZ HAS )  (VP (VBN ASKED )  (NP (DT THE )  (NNP LANNISTER )  (NNS FAMILIES )  )  (S (VP (TO TO )  (VP (VB CLARIFY )  (NP (DT THOSE )  (NNS ISSUES )  )  )  )  )  )  )  )  (CC BUT )  (S (NP (NNP BEIJING )  )  (VP (VBZ HAS )  (RB NOT )  (ADVP (RB YET )  )  (VP (VBN STRAIGHTENED )  (NP (PRP THEM )  )  (PRT (RP OUT )  )  )  )  )  )  (, , )  (NP (PRP HE )  )  (VP (VBD SAID )  (, , )  (S (VP (VBG ADDING )  (SBAR (IN THAT )  (S (NP (NNP JAPAN )  )  (VP (MD WOULD )  (VP (VB CONTINUE )  (S (VP (TO TO )  (VP (VB TALK )  (PP (IN TO )  (NP (NNP CHINA )  )  )  (PP (IN ABOUT )  (NP (DT THIS )  )  )  )  )  )  )  )  )  )  )  )  )  (. . )  )  

Could you please explain how I can use the new Treebank format with PETRARCH2?

Code to extract Arabic actor and event phrases

Although we'll be switching to Universal Dependencies in the medium term (#7), we have an immediate need to extract actor and verb phrases from Arabic so that we can begin human-coded (and machine-assisted) dictionary development. The extracted phrases should be suitable for adding to the dictionaries as entries alongside their human-assigned codes, meaning that they should mirror the way Petrarch2 will see the phrases.

This code will be somewhat parallel to the later UD approach, but hopefully writing it will help with the future integration of Arabic text into Petrarch2.

Incorrect Command line Parsing Function: parse_cli_args

Running

python2.7 petrarch2.py parse -i ./data/text/GigaWord.sample.PETR.xml -o my_output

leads to the following error:

Traceback (most recent call last):
...
  File ".../petrarch2/petrarch2.py", line 530, in <module>
    main()
  File ".../petrarch2/petrarch2.py", line 438, in main
    if cli_args.outputs:
AttributeError: 'Namespace' object has no attribute 'outputs'

Furthermore, the "-o" argument of batch mode has an incorrect description.
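The traceback suggests that main() reads cli_args.outputs even when the chosen subcommand's parser never defines it. Here is a hedged sketch of one fix with argparse subparsers. Every subcommand gets the same -o/--outputs destination, and the value is read defensively with getattr. The -i/-o options mirror the command in the report. The help texts, the command_name destination, and output_target are illustrative assumptions, not petrarch2's actual parse_cli_args.

```python
import argparse


def parse_cli_args(argv=None):
    parser = argparse.ArgumentParser(prog="petrarch2")
    subparsers = parser.add_subparsers(dest="command_name")

    parse_cmd = subparsers.add_parser("parse", help="code a single input file")
    parse_cmd.add_argument("-i", "--inputs", required=True, help="input file to code")
    # Same dest ("outputs") as in batch mode, so main() can read it for either command.
    parse_cmd.add_argument("-o", "--outputs", help="file to write coded events to")

    batch_cmd = subparsers.add_parser("batch", help="code files listed in the config")
    batch_cmd.add_argument("-o", "--outputs", help="file to write coded events to")

    return parser.parse_args(argv)


def output_target(cli_args):
    # getattr with a default avoids AttributeError if a subcommand lacks -o.
    return getattr(cli_args, "outputs", None)


if __name__ == "__main__":
    args = parse_cli_args(["parse", "-i", "input.xml", "-o", "my_output"])
    print(args.command_name, output_target(args))
```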

Switch to Universal Dependencies (UD)

Petrarch2 currently codes only English-language articles in the Stanford dependencies format. Other languages, including the Spanish and Arabic that we need to code, are parsed with different tags. We should consider switching Petrarch2's internals to Universal Dependencies and converting the input to UD. This would avoid having to maintain a separate Petrarch2 for each language.

CoreNLP itself has also switched to UD in the most recent versions: http://nlp.stanford.edu/software/stanford-dependencies.shtml

Information on Universal Dependencies is here: http://universaldependencies.org/

make_plural_noun(noun) function when reading verb dictionary

Right now the code will not create the plural for multi-word synset nouns such as "WRITTEN_AGREEMENT" and "DIFFERENCE_OF_OPINION_". For the second example, I know that we do not want to create a plural because of the "_" at the end of the string. But should we create a plural for the first example?
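If the answer is yes, one option is to pluralize only the last word of an underscore-joined synset noun and keep a trailing "_" as an explicit "no plural" marker. The sketch below is hypothetical. make_plural_synset and its small set of regular English rules are not petrarch2's make_plural_noun, and irregular plurals would still need explicit dictionary entries.

```python
def naive_plural(word):
    """A very small set of regular English plural rules for upper-case words."""
    if word.endswith(("S", "X", "Z", "CH", "SH")):
        return word + "ES"
    if len(word) > 1 and word.endswith("Y") and word[-2] not in "AEIOU":
        return word[:-1] + "IES"
    return word + "S"


def make_plural_synset(noun):
    """Pluralize the last word of a synset noun; return None for a trailing '_'."""
    if noun.endswith("_"):
        return None  # trailing underscore: the entry should not be pluralized
    head, sep, last = noun.rpartition("_")
    return head + sep + naive_plural(last)


if __name__ == "__main__":
    for noun in ["WRITTEN_AGREEMENT", "DIFFERENCE_OF_OPINION_", "TREATY"]:
        print(noun, make_plural_synset(noun))
```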

Adding information to 'meta' when expanding cooperating compounds

When internal cooperation in compounds is expanded in Sentence.get_events() [PETRtree.py], the new events aren't added to the 'meta' storage of information. As a result, the routines that pick up the actor, event and actor-root texts don't have this information and just return '---' for all of those fields. More precisely, they return '---' because I've trapped the error in a couple of places; otherwise the program crashes with a KeyError because the primary event list and the information available in 'meta' are out of sync. I've inserted comments at the various points where this is relevant.

More generally, now that the actor/eventtext and actorroot options have been added, the 'meta' storage needs to be consolidated and refactored -- again, I've made a couple of notes on this.

The input file below will generate this issue:

The United States , United Kingdom and European Union have come down heavily on the violence and shrinking democratic space in Bangladesh and urged all parties to engage in dialogue . (ROOT (S (NP (NP (DT The) (NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom)) (CC and) (NP (NNP European) (NNP Union))) (VP (VP (VBP have) (VP (VBN come) (PRT (RP down)) (ADVP (RB heavily)) (PP (IN on) (NP (NP (DT the) (NN violence)) (CC and) (VP (VBG shrinking) (NP (JJ democratic) (NN space)) (PP (IN in) (NP (NNP Bangladesh)))))))) (CC and) (VP (VBD urged) (NP (DT all) (NNS parties)) (S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue)))))))) (. .)))

The United States , United Kingdom and European Union have criticized Bangladesh and urged all parties to engage in dialogue (ROOT (S (NP (NP (DT The) (NNP United) (NNPS States)) (, ,) (NP (NNP United) (NNP Kingdom)) (CC and) (NP (NNP European) (NNP Union))) (VP (VP (VBP have) (VP (VBN criticized) (NP (NNP Bangladesh)))) (CC and) (VP (VBD urged) (NP (DT all) (NNS parties)) (S (VP (TO to) (VP (VB engage) (PP (IN in) (NP (NN dialogue) )))))))))

China , the US , South Africa , India , and Pakistan , who stockpiled their current net requirements , would now deplete their rubber in hand on releasing their rubber stocks to the market over the next few months . (ROOT (S (NP (NP (NP (NNP China)) (, ,) (NP (DT the) (NNP US)) (, ,) (NP (NNP South) (NNP Africa)) (, ,) (NP (NNP India)) (, ,) (CC and) (NP (NNP Pakistan))) (, ,) (SBAR (WHNP (WP who)) (S (VP (VBD stockpiled) (NP (PRP$ their) (JJ current) (JJ net) (NNS requirements))))) (, ,)) (VP (MD would) (ADVP (RB now)) (VP (VB deplete) (NP (PRP$ their) (NN rubber)) (PP (IN in) (NP (NN hand))) (PP (IN on) (S (VP (VBG releasing) (NP (PRP$ their) (NN rubber) (NNS stocks)) (PP (TO to) (NP (NP (DT the) (NN market)) (PP (IN over) (NP (DT the) (JJ next) (JJ few) (NNS months)))))))))) (. .)))

======= Event output ==========
(actor/eventtext and actorroot == True)

20150823 CHN IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 CHN ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF USA 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND ZAF 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 USA IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 ZAF PAK 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 IND CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK CHN 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150823 PAK IND 030 000dc436-5eea-4062-b807-43f5e2808d10_6 en1_1-5_story+1+000dc436-5eea-4062-b807-43f5e2808d10 --- --- --- --- ---
20150115 IGOEUREEC BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx European Union Bangladesh ... criticized THE EUROPEAN UNION BANGLADESH
20150115 GBR USA 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx The United States Bangladesh ... criticized UNITED STATES OF AMERICA BANGLADESH
20150115 IGOEUREEC USA 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 GBR IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 IGOEUREEC GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 USA GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx --- --- --- --- ---
20150115 GBR BGD 111 0026f8d5-744c-4199-ae99-1ca9d160d8xx_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d8xx United ... Kingdom Bangladesh ... criticized UNITED KINGDOM BANGLADESH
20150115 GBR USA 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 IGOEUREEC USA 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 GBR IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 USA IGOEUREEC 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 USA GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
20150115 IGOEUREEC GBR 044 0026f8d5-744c-4199-ae99-1ca9d160d877_1 en1_1-5_story+1+0026f8d5-744c-4199-ae99-1ca9d160d877 --- --- --- --- ---
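The output above is consistent with internal cooperation being expanded into every ordered pair of actors. Five actors give 20 events coded 030, and three actors give 6 events coded 044. Below is a hedged sketch of doing that expansion while writing the matching 'meta' entries in the same loop, so that later text lookups never hit a missing key. The meta layout (actortext/actorroot/eventtext dictionaries keyed by event tuple) follows the structure visible in the JSON output issue above. expand_cooperation is a hypothetical helper, not the code in Sentence.get_events().

```python
from itertools import permutations


def expand_cooperation(actors, code, events, meta,
                       actor_texts=None, actor_roots=None, verb_text="---"):
    """Append one event per ordered pair of actors and register it in meta.

    Missing texts fall back to '---', the placeholder seen in the output above.
    """
    actor_texts = actor_texts or {}
    actor_roots = actor_roots or {}
    for source, target in permutations(actors, 2):
        event = (source, target, code)
        events.append(event)
        # Write meta in the same step, so events and meta cannot drift apart.
        meta.setdefault("actortext", {})[event] = [
            actor_texts.get(source, "---"), actor_texts.get(target, "---")]
        meta.setdefault("actorroot", {})[event] = [
            actor_roots.get(source, "---"), actor_roots.get(target, "---")]
        meta.setdefault("eventtext", {})[event] = verb_text
    return events, meta


if __name__ == "__main__":
    events, meta = expand_cooperation(["USA", "GBR", "IGOEUREEC"], "044", [], {})
    print(len(events), all(e in meta["actortext"] for e in events))
```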

ImportError: No module named 'PETRglobals'

I installed petrarch2 on Ubuntu 16.04 using pip3.5, but running petrarch2 -h gives me the following error:
ImportError: No module named 'PETRglobals'

I am also not able to install PETRglobals using pip install PETRglobals.

However, it works perfectly fine with pip2.7.

GigaWord.sample.PETR.xml file without parse blocks

How can I run an XML input file without a parse block?
I tried deleting the content of the parse blocks in GigaWord.sample.PETR.xml and running it, but it failed.
The README says the block is optional.
Why can't I run it without the parse block?

What is the right way to use petrarch2 to run plain text?

Thanks!!!!!

How do I include a custom dictionary in Petrarch(2)?

Hi guys, is it possible to include a customized event type dictionary into Petrarch(2)? I understand that Petrarch(2) is CAMEO based and I have some organic definitions derived from other sources. Any suggestions? Thanks!

Strange output format for phrase extraction.

Trying to run some sample data to explore the phrase extraction pieces. I'm using the following data:

{'abc123': {'meta': {'date': '20010101'},
  'sents': {0: {'content': u'At least 37 people are dead after Islamist radical group Boko Haram assaulted a town in northeastern Nigeria .',
    'parsed': u'(ROOT (S (NP (QP (IN AT ) (JJS LEAST ) (CD 37 ) ) (NNS PEOPLE ) ) (VP (VBP ARE ) (ADJP (JJ DEAD ) ) (SBAR (IN AFTER ) (S (NP (JJ ISLAMIST ) (JJ RADICAL ) (NN GROUP ) (NNP BOKO ) (NNP HARAM ) ) (VP (VBD ASSAULTED ) (NP (NP (DT A ) (NN TOWN ) ) (PP (IN IN ) (NP (JJ NORTHEASTERN ) (NNP NIGERIA ) ) ) ) ) ) ) ) (. . ) ) )'}}}}

I then run it through the do_coding routine:

event_dict_updated = petrarch2.do_coding(event_dict, None)

Which yields the following updated dictionary:

{'abc123': {'meta': {'date': '20010101',
   u'verbs': {u'nouns': [([u' PEOPLE'], [u'~PPL'], [[u'~']]),
     ([u' ISLAMIST', u' BOKO HARAM'],
      [u'NGAREBMUS'],
      [[u'~'], (u'NGAREB', [])]),
     ([u' NIGERIA'], [u'NGA'], [(u'NGA', [])])]}},
  'sents': {0: {'content': u'At least 37 people are dead after Islamist radical group Boko Haram assaulted a town in northeastern Nigeria .',
    'parsed': u'(ROOT (S (NP (QP (IN AT ) (JJS LEAST ) (CD 37 ) ) (NNS PEOPLE ) ) (VP (VBP ARE ) (ADJP (JJ DEAD ) ) (SBAR (IN AFTER ) (S (NP (JJ ISLAMIST ) (JJ RADICAL ) (NN GROUP ) (NNP BOKO ) (NNP HARAM ) ) (VP (VBD ASSAULTED ) (NP (NP (DT A ) (NN TOWN ) ) (PP (IN IN ) (NP (JJ NORTHEASTERN ) (NNP NIGERIA ) ) ) ) ) ) ) ) (. . ) ) )'}}}}

There are a couple of issues here:

  1. The nested meta, verbs, nouns construct is incorrect.
  2. It's unclear what, exactly, is associated with what. For example, it isn't clear what the [[u'~'], (u'NGAREB', [])]) construct refers to in the sentence.

This isn't relevant to this issue, but it should also be noted that this sentence doesn't code an event even though PETR is clearly identifying potential source and target actors and "assaulted" should be a relevant verb.

cc @philip-schrodt @ahalterman
