How can I run an XML input file without a parse block? I tried to delete the content

GigaWord.sample.PETR.xml file without parse blocks about petrarch2 HOT 8 CLOSED

ZxXinZhang commented on July 30, 2024

GigaWord.sample.PETR.xml file without parse blocks

Comments (8)

johnb30 commented on July 30, 2024

As seen in this section of the README: https://github.com/openeventdata/petrarch2/blob/master/README.md#installing under the heading StanfordNLP, PETRARCH2 no longer supports direct integration with CoreNLP, which means that raw text input is no longer supported.

ZxXinZhang commented on July 30, 2024

Hi John, Thank you very much! So I have to parse the raw text with StanfordNLP or Hypnos first, and then add the parsed content to the <Parse> block? Hope you had a good holiday! Best, Xin

ahalterman commented on July 30, 2024

The easiest way to go from text to event data is to use the full pipeline. There are step-by-step instructions here: https://andrewhalterman.com/2017/05/08/making-event-data-from-scratch-a-step-by-step-guide/

I don't think any of us have used the XML-based method in years so I'm not sure what would happen.

ZxXinZhang commented on July 30, 2024

Hi Andy, Thanks for your sweet answer! I just have one more question. The full pipeline solution seems perfect for scraping content from web pages. What if my raw data is plain text? Is there some way to get my raw data into the required format in the MongoDB database? Best, Xin

philip-schrodt commented on July 30, 2024

The XML method definitely still works -- I used it on about 25 million stories about a year ago (okay, so at least it still worked a year ago, but I don't think there have been any changes that would break it since then). However, unlike the pipeline, you need a customized program that will convert from your input format (in my case, it was the NewsML standard) to the XML standard.
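
For readers wondering what such a conversion program might look like, here is a minimal Python sketch. It assumes that the input XML uses a `Sentences` root with `Sentence` elements carrying `date`, `id`, `source`, and `sentence` attributes and `Text` and `Parse` children. These names are an assumption and should be checked against `GigaWord.sample.PETR.xml` in the repository. The function `build_petr_xml` and the record layout are hypothetical, and the parse string is a hand-written illustration rather than real parser output.

```python
import xml.etree.ElementTree as ET


def build_petr_xml(records):
    """Build a PETRARCH-style XML string from already-parsed sentences.

    Each record is a dict with the keys date, id, source, text, and parse.
    The element and attribute names are assumed, not confirmed by the thread.
    """
    root = ET.Element("Sentences")
    for rec in records:
        sentence = ET.SubElement(
            root,
            "Sentence",
            {
                "date": rec["date"],
                "id": rec["id"],
                "source": rec["source"],
                "sentence": "true",
            },
        )
        # Raw sentence text, kept alongside the parse for reference.
        ET.SubElement(sentence, "Text").text = "\n" + rec["text"] + "\n"
        # Constituency parse produced beforehand by an external parser.
        ET.SubElement(sentence, "Parse").text = "\n" + rec["parse"] + "\n"
    return ET.tostring(root, encoding="unicode")


# Illustrative record; the parse tree here was written by hand.
records = [
    {
        "date": "20170530",
        "id": "DEMO0001-01",
        "source": "DEMO",
        "text": "Rebels attacked the town.",
        "parse": "(ROOT (S (NP (NNS Rebels)) (VP (VBD attacked) "
                 "(NP (DT the) (NN town))) (. .)))",
    }
]
xml_text = build_petr_xml(records)
print(xml_text)
```

The sketch writes one `Sentence` element per parsed sentence, so the sentence splitting and parsing have to happen before this step.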

ZxXinZhang commented on July 30, 2024

Hi Philip, Thank you. I tried to convert my content into the XML format. I first ran my text through StanfordNLP and then put the Stanford-parsed text into the <Parse> blocks in the XML file. But very few events seem to be generated. A whole paragraph generated 0 events, while occasionally a single sentence works, though that is rare. So I just want to know how you did it: with the <Parse> block filled in, or left empty? Best, Xin

philip-schrodt commented on July 30, 2024

If you are getting any events, then you've got things formatted correctly, and definitely keep the <Parse> block, since that is where the program is getting the information. Usually, however, PETRARCH is used to code individual sentences, rather than paragraphs (in fact I'm not sure what it would do with a paragraph-length Stanford parse, though probably it would stop at the end of the first sentence), so you'll probably get a higher yield of events if you split the paragraphs into sentences.
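
As a rough illustration of splitting paragraphs into sentences before parsing, here is a deliberately naive Python sketch. The helper `split_sentences` is hypothetical and uses a simple regular expression, which will mis-split abbreviations such as "U.S."; a dedicated sentence splitter, for example the one that ships with the parser you already use, is likely a better choice for real data.

```python
import re

# Split after ., ! or ? when followed by whitespace and then a capital
# letter, quote, or opening parenthesis. Naive: abbreviations will break it.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'(])")


def split_sentences(paragraph):
    """Return the sentences of a paragraph as a list of stripped strings."""
    parts = _SENTENCE_END.split(paragraph.strip())
    return [part.strip() for part in parts if part.strip()]


paragraph = "Rebels attacked the town. The army responded on Monday."
for sentence in split_sentences(paragraph):
    print(sentence)
```

Each resulting sentence would then be parsed on its own and written to its own sentence entry in the XML file.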

The number of events generated will very much depend on the texts you are trying to code (the existing verbs dictionary is designed to primarily code events associated with political conflict situations, since that was the focus of the CAMEO ontology) and the actor dictionaries you are using. You can override the actor dictionaries and have the program produce any events where it finds a verb phrase in the dictionary by setting the variable new_actor_length in the file PETR_config.ini to a value > 0: I'd suggest something in the range 15-35 -- the higher the number, the more cases you will get. It is also relatively easy to add in actors to the dictionaries if you are interested in specific cases.
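
The setting can of course be changed by hand in a text editor. For anyone who wants to script it, here is a small Python sketch using the standard `configparser` module. It works on an in-memory excerpt rather than the real file, and it assumes that `new_actor_length` lives in a section named `Options`; that section name is an assumption, so check your own copy of `PETR_config.ini`. The helper `set_new_actor_length` is hypothetical.

```python
import configparser
import io

# Hypothetical excerpt of PETR_config.ini. The [Options] section name is
# an assumption; only the variable name comes from the comment above.
SAMPLE_CONFIG = """\
[Options]
new_actor_length = 0
"""


def set_new_actor_length(config_text, value, section="Options"):
    """Return config_text with new_actor_length set to value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "new_actor_length", str(value))
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


updated = set_new_actor_length(SAMPLE_CONFIG, 20)
print(updated)
```

Note that `configparser` drops comments when it writes a file back out, so editing the real config file by hand may be preferable if you want to keep its annotations.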

ZxXinZhang commented on July 30, 2024

Thank you very much!!
