
Comments (8)

johnb30 commented on July 30, 2024

As seen in this section of the README: https://github.com/openeventdata/petrarch2/blob/master/README.md#installing under the heading StanfordNLP, PETRARCH2 no longer supports direct integration with CoreNLP, which means that raw text input is no longer supported.

from petrarch2.

ZxXinZhang commented on July 30, 2024


ahalterman commented on July 30, 2024

The easiest way to go from text to event data is to use the full pipeline. There are step-by-step instructions here: https://andrewhalterman.com/2017/05/08/making-event-data-from-scratch-a-step-by-step-guide/

I don't think any of us have used the XML-based method in years so I'm not sure what would happen.


ZxXinZhang commented on July 30, 2024


philip-schrodt commented on July 30, 2024

The XML method definitely still works -- I used it on about 25 million stories about a year ago (so at least it still worked then, and I don't think there have been any changes since that would break it). However, unlike the pipeline, you need a custom program that converts your input format (in my case, the NewsML standard) to the XML standard.
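As a rough illustration of such a converter, here is a minimal Python sketch that wraps already-parsed sentences in XML. The element and attribute names (`Sentences`, `Sentence`, `Text`, `Parse`, `date`, `id`, `source`, `sentence`) are my recollection of the layout described in the PETRARCH2 README and should be checked against it; the record fields and the sample parse are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_sentences_xml(records):
    """Wrap pre-parsed sentences in an XML document.

    `records` is an iterable of dicts with the (hypothetical) keys
    id, date, source, text and parse. The tag and attribute names
    are assumptions about the PETRARCH2 input layout.
    """
    root = ET.Element("Sentences")
    for rec in records:
        sentence = ET.SubElement(
            root,
            "Sentence",
            {
                "date": rec["date"],
                "id": rec["id"],
                "source": rec["source"],
                "sentence": "true",
            },
        )
        # Raw sentence text, kept for reference.
        ET.SubElement(sentence, "Text").text = rec["text"]
        # Constituency parse produced separately (e.g. by CoreNLP).
        ET.SubElement(sentence, "Parse").text = rec["parse"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; the parse string is illustrative only.
example = [
    {
        "id": "story1_1",
        "date": "20150805",
        "source": "EXAMPLE",
        "text": "Turkey criticized Syria.",
        "parse": "(ROOT (S (NP (NNP Turkey)) (VP (VBD criticized) "
                 "(NP (NNP Syria))) (. .)))",
    }
]
xml_out = build_sentences_xml(example)
```

A real converter would read each story from the source format (NewsML in the case above), run the parser on every sentence, and then pass the results through a function like this one.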


ZxXinZhang commented on July 30, 2024


philip-schrodt commented on July 30, 2024

If you are getting any events, then you've got things formatted correctly, and definitely keep the block, since that is where the program is getting the information. Usually, however, PETRARCH is used to code individual sentences, rather than paragraphs (in fact I'm not sure what it would do with a paragraph-length Stanford parse, though probably it would stop at the end of the first sentence), so you'll probably get a higher yield of events if you split the paragraphs into sentences.
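For the sentence-splitting step, a deliberately naive Python sketch is shown below. It is not part of PETRARCH; the function name is hypothetical, and the regex simply breaks after `.`, `!` or `?` followed by whitespace, so it will mis-split abbreviations such as "U.S." and is only meant to show the idea.

```python
import re

# Break after sentence-final punctuation that is followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph):
    """Return the sentences in `paragraph` (naive, punctuation-based)."""
    return [s for s in _SENTENCE_END.split(paragraph.strip()) if s]


paragraph = "Rebels attacked the town. Troops responded!  Will talks resume?"
sentences = split_sentences(paragraph)
```

Each resulting sentence would then need its own Stanford parse before coding. For real news text, a proper sentence splitter (for example the one built into CoreNLP) is likely to do much better than this regex.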

The number of events generated will very much depend on the texts you are trying to code (the existing verbs dictionary is designed primarily to code events associated with political conflict situations, since that was the focus of the CAMEO ontology) and the actor dictionaries you are using. You can override the actor dictionaries and have the program produce any events where it finds a verb phrase in the dictionary by setting the variable new_actor_length in the file PETR_config.ini to a value > 0: I'd suggest something in the range 15 to 35 -- the higher the number, the more cases you will get. It is also relatively easy to add actors to the dictionaries if you are interested in specific cases.
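If you prefer to change that setting from a script rather than by hand, something like the following Python sketch could work. It uses the standard `configparser` module on the file's text; the `[Options]` section name and the sample contents are assumptions, so the function first looks for whichever section already holds `new_actor_length` and only falls back to the assumed name if the key is missing.

```python
import configparser
import io


def set_new_actor_length(ini_text, value, fallback_section="Options"):
    """Return `ini_text` with new_actor_length set to `value`.

    `fallback_section` is an assumed section name, used only when
    the key is not found anywhere in the file.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    # Prefer the section that already defines the option.
    for section in parser.sections():
        if parser.has_option(section, "new_actor_length"):
            target = section
            break
    else:
        target = fallback_section
        if not parser.has_section(target):
            parser.add_section(target)

    parser.set(target, "new_actor_length", str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical stand-in for the contents of PETR_config.ini.
sample = "[Options]\nnew_actor_length = 0\n"
updated = set_new_actor_length(sample, 20)
```

Note that `configparser` drops comments and lowercases option names when it rewrites a file, so for a one-off change editing PETR_config.ini by hand is probably simpler.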


ZxXinZhang commented on July 30, 2024

