python-korp's Introduction

Korp API for Python

A Python library for the Korp API. It provides an easy way to query Korp systems for language corpora.

Installation

sudo pip install korp

Usage

You can initialise Korp with either a service_name (språkbanken, kielipankki or GT) or the URL of your Korp's API interface (for example https://korp.csc.fi/cgi-bin/korp.cgi).

The following example gets all concordances in the North Sami corpora of Giellatekno Korp for the query [pos="A"] "go" [pos="N"]:

from korp.korp import Korp

# Use Giellatekno; "kielipankki" and "språkbanken" are the other possible service_name values
korppi = Korp(service_name="GT")
# List the corpora whose names start with the North Sami language code
corpora = korppi.list_corpora("SME")
number_of_results, concordances = korppi.all_concordances('[pos="A"] "go" [pos="N"]', corpora)
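
The query string in the example is written in CQP syntax: each bracketed group constrains one token, and a bare quoted string matches a word form. As a rough sketch of how such a query could be assembled programmatically, the helpers below build the same string. Note that `cqp_token` and `cqp_query` are hypothetical names introduced here for illustration and are not part of the korp library; the sketch also assumes the values contain no double quotes that would need escaping.

```python
def cqp_token(word=None, **attributes):
    """Return one CQP token expression, e.g. [pos="A"] or "go".

    Hypothetical helper, not part of the korp library.
    """
    # A bare word form with no other constraints is written as a quoted string.
    if word is not None and not attributes:
        return '"{}"'.format(word)
    conditions = []
    if word is not None:
        conditions.append('word="{}"'.format(word))
    # Sort attribute names so the output is deterministic.
    for name, value in sorted(attributes.items()):
        conditions.append('{}="{}"'.format(name, value))
    # With no conditions at all this yields [], which matches any token.
    return "[" + " & ".join(conditions) + "]"


def cqp_query(*tokens):
    """Join token expressions into a query over consecutive tokens."""
    return " ".join(tokens)


query = cqp_query(cqp_token(pos="A"), cqp_token("go"), cqp_token(pos="N"))
print(query)
```

The resulting string can be passed as the first argument to `all_concordances` in place of the hand-written literal.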

More information

See the Wiki for a complete description, or my blog for a real-life Korp example.

Need for NLP solutions for your business?

My company, Rootroo, offers consulting related to multilingual NLP tasks. We have a strong academic background in state-of-the-art AI solutions for every NLP need. Just contact us, we won't bite.

Cite

If you use this in an academic publication, I would be ever so grateful if you cited it as follows:

Mika Hämäläinen. (2018, January 9). Python Korp Library (Version v1). Zenodo. http://doi.org/10.5281/zenodo.1143374

Licence

Apache License 2.0 (C) 2017-2019 Mika Hämäläinen

python-korp's Issues

Receiving additional attributes with query result

I see that with corpus_information I can get all the attributes available for a corpus, its tokens and its texts, but I'm not sure how I can receive this information alongside the query results. Preferably I would receive just the matching token with the rest of its attributes, since I'm interested in studying the distribution of specific patterns and forms. I don't need the KWIC context for anything, so it would be good if that could be omitted from the response somehow.

One possibility would be to list the wanted attributes, or to have some way to specify that all existing attributes should be returned. Of course this gets very specific to individual corpora, but I guess that can't be helped. Maybe there is already a way to do this that I didn't see in the documentation; sorry if I have missed something that is already there.

Otherwise the package works really well and is useful to access Korp, thanks for your good work!
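
A possible client-side workaround can be sketched as follows. It assumes that each concordance row follows the Korp backend's KWIC layout, that is, a `tokens` list of attribute dicts plus a `match` dict whose `start` and `end` offsets delimit the hit, with `end` taken to be exclusive. Whether `all_concordances` returns rows in exactly this shape is not confirmed here, and the sample row, its corpus name and its token values are made up purely for illustration. `matching_tokens` is a hypothetical helper, not part of the library.

```python
from collections import Counter


def matching_tokens(concordances):
    """Yield only the tokens inside each row's match span, dropping the context.

    Assumes each row has row["tokens"] (a list of dicts) and
    row["match"]["start"] / row["match"]["end"] (end exclusive).
    """
    for row in concordances:
        start = row["match"]["start"]
        end = row["match"]["end"]
        for token in row["tokens"][start:end]:
            yield token


# Invented sample data in the assumed layout; not real corpus output.
sample_rows = [
    {
        "corpus": "EXAMPLE_CORPUS",
        "match": {"start": 1, "end": 4},
        "tokens": [
            {"word": "w0", "pos": "Pron"},
            {"word": "w1", "pos": "A"},
            {"word": "go", "pos": "CS"},
            {"word": "w3", "pos": "N"},
            {"word": ".", "pos": "CLB"},
        ],
    }
]

matches = list(matching_tokens(sample_rows))
# Distribution of one attribute over the matched tokens only.
pos_counts = Counter(token["pos"] for token in matches)
print(pos_counts)
```

This only filters what the server already sent; it does not reduce the size of the response, and it can only expose attributes that were included in the returned tokens.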
