Comments (8)
As noted in the README's installation section (https://github.com/openeventdata/petrarch2/blob/master/README.md#installing), under the heading StanfordNLP, PETRARCH2 no longer integrates directly with CoreNLP, so raw text input is no longer supported.
from petrarch2.
The easiest way to go from text to event data is to use the full pipeline. There are step-by-step instructions here: https://andrewhalterman.com/2017/05/08/making-event-data-from-scratch-a-step-by-step-guide/
I don't think any of us have used the XML-based method in years so I'm not sure what would happen.
The XML method definitely still works: I used it on about 25 million stories about a year ago (so at least it still worked a year ago, and I don't think there have been any changes since then that would break it). However, unlike the pipeline, you need a custom program that converts your input format (in my case, the NewsML standard) to the XML standard.
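For anyone writing such a converter, here is a minimal sketch of the output side. The element and attribute names (`Sentences`, `Sentence`, `Text`, `Parse`, and the `date`, `id`, `source`, `sentence` attributes) are my recollection of the PETRARCH input format and should be checked against the sample files in the repo; the `records` structure and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_sentences_xml(records):
    """Wrap pre-parsed sentences in an XML document.

    `records` is a hypothetical list of dicts with the keys
    id, date, source, text, and parse (a Stanford-style tree string).
    Element and attribute names are assumptions, not a verified schema.
    """
    root = ET.Element("Sentences")
    for rec in records:
        sentence = ET.SubElement(
            root,
            "Sentence",
            {
                "date": rec["date"],
                "id": rec["id"],
                "source": rec["source"],
                "sentence": "true",
            },
        )
        ET.SubElement(sentence, "Text").text = rec["text"]
        ET.SubElement(sentence, "Parse").text = rec["parse"]
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


example_xml = build_sentences_xml(
    [
        {
            "id": "demo-001_1",
            "date": "20150805",
            "source": "DEMO",
            "text": "Rebels attacked the town.",
            "parse": "(ROOT (S (NP (NNS Rebels)) (VP (VBD attacked) "
                     "(NP (DT the) (NN town))) (. .)))",
        }
    ]
)
```

The input side (reading NewsML or whatever format you have, and running the parser) is left out because it depends entirely on your data.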
If you are getting any events, then you've got things formatted correctly, and definitely keep the block, since that is where the program is getting the information. Usually, however, PETRARCH is used to code individual sentences, rather than paragraphs (in fact I'm not sure what it would do with a paragraph-length Stanford parse, though probably it would stop at the end of the first sentence), so you'll probably get a higher yield of events if you split the paragraphs into sentences.
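A rough sketch of the splitting step, assuming plain English text. This is a naive regex splitter written for illustration, not something PETRARCH provides; it will break on abbreviations such as "Mr." or "U.S.", so a proper sentence splitter (for example the one in CoreNLP) is the better choice for real data. Each resulting sentence would then need its own parse.

```python
import re

# Split after ., ! or ? when followed by whitespace and something that
# looks like the start of a new sentence (capital letter, quote, or bracket).
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'(])')


def split_sentences(paragraph):
    """Naively split a paragraph into sentences (illustrative only)."""
    parts = _SENTENCE_BOUNDARY.split(paragraph.strip())
    return [part.strip() for part in parts if part.strip()]


sentences = split_sentences(
    "Rebels attacked the town. The army responded on Monday! Talks resume soon."
)
```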
The number of events generated will depend heavily on the texts you are trying to code and on the actor dictionaries you are using. The existing verbs dictionary is designed primarily to code events associated with political conflict, since that was the focus of the CAMEO ontology. You can override the actor dictionaries and have the program produce an event wherever it finds a verb phrase in the dictionary by setting the variable new_actor_length in the file PETR_config.ini to a value > 0. I'd suggest something in the range 15 to 35: the higher the number, the more cases you will get. It is also relatively easy to add actors to the dictionaries if you are interested in specific cases.
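If you would rather script the config change than edit the file by hand, something like the following should work. It is a sketch using Python's standard `configparser`; the section name `Options` is an assumption about where new_actor_length lives in PETR_config.ini, so check your copy of the file. Note that `configparser` drops comments when it writes, so keep a backup of the original.

```python
import configparser
import io


def set_new_actor_length(ini_text, value, section="Options"):
    """Return ini_text with new_actor_length set to `value`.

    The default section name is an assumption; pass the real one
    if your PETR_config.ini differs.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(ini_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "new_actor_length", str(value))
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical two-line config used only for demonstration
sample_config = "[Options]\nnew_actor_length = 0\n"
updated_config = set_new_actor_length(sample_config, 25)
```

To apply it to the real file, read PETR_config.ini into a string, pass it through the function, and write the result back.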