PyRATA

PyRATA is an acronym that stands for both "Python Rule-based feAture sTructure Analysis" and "Python Rule-bAsed Text Analysis": PyRATA is not limited to processing textual data.

Features

In short, PyRATA

  • provides regular expression (re) matching methods over structures more complex than a list of characters (string), namely sequences of feature sets (i.e. a list of dicts in Python jargon);
  • in addition to the re methods, provides modification methods to replace, update or extend (sub-parts of) the data structure itself (also called annotation);
  • offers an API similar to that of the Python re module, so that Python re users are not disoriented;
  • defines a pattern matching language whose syntax follows the de facto standard of Perl regexes;
  • is implemented in Python 3;
  • can be used for processing textual data but is not limited to it (the only restriction is that the data to explore respects the expected data structure);
  • is released under the MIT License, a short and simple permissive license;
  • is fun and easy to use to explore data for research or pedagogical purposes, define machine learning features, or formulate expert knowledge in a declarative way.

Documentation

See the Quick overview section below and the user guide for more details and examples.

Download and installation procedure

The simplest way

pyrata is currently published on PyPI, so the simplest way to install it is to type in a console:

sudo pip3 install pyrata

Alternatively, you can install it manually

Download the latest PyRATA release

wget https://github.com/nicolashernandez/PyRATA/archive/master.zip
unzip master.zip -d .
cd PyRATA-master/

or clone it

git clone https://github.com/nicolashernandez/PyRATA.git
cd PyRATA/

Then install pyrata:

sudo pip3 install . 

Of course, as with any Python module, you can simply copy the pyrata subdirectory into your project to make it available. This can be an alternative if you do not have root privileges or do not want to use a virtualenv.

Requirement

PyRATA uses PLY, an implementation of the lex and yacc parsing tools for Python (version 3.10).

You do not need to care about this step if you followed the pip3 install procedure above.

If you did not install pyrata with pip3, you will have to install ply manually (or download it and copy it into your local working directory):

sudo pip3 install ply

Run tests (optional)

python3 test_pyrata.py

The test named test_search_any_class_step_error_step_in_data may fail. The failure is due to a syntactic parsing error: unexpected token type="NAME" with value="pos" at position 35. Search an error before this point. Currently, the processing of a pattern does not stop when it encounters a parsing error, whereas the test expects it to stop. The obtained result therefore differs from the expected one, and the test fails.

Quick overview (in console)

First, run Python:

python3

Then import the main pyrata regular expression module:

Let's work with the following sentence:

Suppose the output of your processing is in the pyrata data structure format: a list of dicts, i.e. a sequence of feature sets, each feature having a name and a value.

There is no requirement on the names of the features. You can easily turn a sentence into the pyrata data structure, for example by doing:
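As a minimal sketch of this conversion, the snippet below builds the list of dicts from tokens tagged by hand. The sentence and its Penn Treebank tags are illustrative assumptions; in practice the tags would typically come from a part-of-speech tagger.

```python
# Illustrative sentence, already tokenized and tagged by hand (assumption:
# Penn Treebank tags; a real pipeline would obtain them from a POS tagger).
tagged_tokens = [
    ("It", "PRP"), ("is", "VBZ"), ("fast", "JJ"), ("easy", "JJ"),
    ("and", "CC"), ("funny", "JJ"), ("to", "TO"), ("write", "VB"),
    ("regular", "JJ"), ("expressions", "NNS"), ("with", "IN"),
    ("PyRATA", "NNP"),
]

# One dict (feature set) per token, in sentence order.
data = [{"raw": word, "pos": pos} for word, pos in tagged_tokens]

print(data[2])  # {'raw': 'fast', 'pos': 'JJ'}
```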

In the previous code, you see that the names raw and pos have been arbitrarily chosen to mean, respectively, the surface form of a word and its part of speech.

At this point you can use the available regular expression methods to explore the data. Let's say you want to search for all the adjectives in the sentence. Conveniently, the pos feature specifies the part of speech of each token, and the pos value that stands for adjectives is JJ. Your pattern will be:

To find all the non-overlapping matches of pattern in data, you will use the findall method:
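To make the expected behaviour concrete without requiring pyrata to be installed, here is a plain-Python sketch of what such a search amounts to for a single constraint. The helper `findall_single_constraint` is hypothetical and is not part of PyRATA. The pattern syntax `pos="JJ"` and the shape of the result (one list of tokens per match) are assumptions made for illustration. With pyrata installed, the real call would presumably go through the findall method of the imported module instead.

```python
def findall_single_constraint(pattern, data):
    """Hypothetical stand-in, not PyRATA code.

    Handles only a pattern made of one name="value" constraint and
    returns every token whose feature `name` equals `value`.
    """
    name, _, quoted = pattern.partition("=")
    name = name.strip()
    value = quoted.strip().strip('"')
    # Each match is a list of tokens; a one-step pattern yields one token.
    return [[token] for token in data if token.get(name) == value]


# Small hand-tagged sample in the list-of-dicts format (illustrative data).
data = [
    {"raw": "It", "pos": "PRP"},
    {"raw": "is", "pos": "VBZ"},
    {"raw": "fast", "pos": "JJ"},
    {"raw": "easy", "pos": "JJ"},
    {"raw": "and", "pos": "CC"},
    {"raw": "funny", "pos": "JJ"},
]

pattern = 'pos="JJ"'  # assumed syntax: feature name, equals sign, quoted value
matches = findall_single_constraint(pattern, data)

print([match[0]["raw"] for match in matches])  # ['fast', 'easy', 'funny']
```

This sketch covers only the simplest case. The pattern language described in the features list supports far more than a single equality test.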

To go further, the next step is to have a look at the user guide.
