rorqual (finner)

TL;DR: Like Apache OpenNLP (TokenNameFinder) but faster and actually working.

Detects all kinds of entities in financial news texts, like ISINs, FIGIs (BBGIDs), monetary values, date and time stamps, etc.

Motivation

Unlike the named-entity recognisers (NERs) from the big NLP packages, this one is rule-based, and hence the false positive rate is low while the false negative rate is zero.

Instead of accounting for every edge case, finner aims to cover the top 95% of use cases and is meant to be part of a booster setup. This way, a bunch of heuristics, each one mediocre in coverage and accuracy, can outperform a single well-trained (but more complex) model.

Technicalities

The command-line tool finner comes with a built-in tokeniser. This is mostly due to the finner anno subcommand, which would otherwise be hard to implement. However, if interactive annotation is not one of your use cases and the built-in tokeniser proves problematic for your input, it is highly advisable to pre-tokenise the input before passing it to finner.

A somewhat more comprehensive tokeniser, based on Unicode's character classes (and therefore with full UTF-8 support), is terms(1) from the glod project.
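To illustrate what tokenising by Unicode character class can look like, here is a minimal Python sketch. It is not how terms(1) or finner's built-in tokeniser work, and the function names are made up for this example. It only shows the general idea: consecutive characters of the same major Unicode category form one token, whitespace is dropped, and every punctuation or symbol character becomes a token of its own.

```python
import unicodedata
from itertools import groupby
from typing import List


def char_class(ch: str) -> str:
    """Major Unicode category of ch: 'L' letter, 'N' number,
    'P' punctuation, 'S' symbol, 'Z' separator, 'C' control, ..."""
    return unicodedata.category(ch)[0]


def tokenise(text: str) -> List[str]:
    """Split text into runs of characters sharing a major category.

    Hypothetical helper for illustration only; real tokenisers
    apply many more rules than this.
    """
    tokens: List[str] = []
    for cls, run in groupby(text, key=char_class):
        chunk = "".join(run)
        if chunk.isspace():
            continue  # whitespace only separates tokens
        if cls in ("P", "S"):
            tokens.extend(chunk)  # one token per punctuation/symbol char
        else:
            tokens.append(chunk)
    return tokens


if __name__ == "__main__":
    print(tokenise("5 May 2015 Issue of $500,000,000"))
```

Note that such a naive splitter breaks 500,000,000 into several tokens, so any downstream rule that recognises amounts has to reassemble them.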

Example

A simple showcase:

$ echo '5 May 2015 Issue of $500,000,000 0.875 percent' | finner anno
5 May 2015/date Issue of $500,000,000/amt(USD) 0.875 percent/num(*0.01)

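Or alternatively, with the extr subcommand, which lists each match together with its half-open offset range in the input: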

$ echo '5 May 2015 Issue of $500,000,000 0.875 percent' | finner extr
5 May 2015      date    [0,10)
5       num     [0,1)
May     date    [2,5)
2015    date    [6,10)
$ 500,000,000   amt(USD)        [20,32)
$       ccy(USD)        [20,21)
500,000,000     num     [21,32)
0.875 percent   num(*0.01)      [33,46)
0.875   num     [33,38)
percent unit(*0.01)     [39,46)
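
The extr output above lends itself to post-processing. The following Python sketch parses it under a few assumptions inferred from the sample only: each line consists of the matched text, a tag without whitespace, and a half-open range [start,end), separated by whitespace (presumably tabs), and the offsets index into the original input. The names Span, parse_extr and outermost are invented for this example and are not part of finner.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    text: str   # matched text as printed by finner (tokenised form)
    tag: str    # e.g. "date", "amt(USD)", "num(*0.01)"
    start: int  # inclusive offset into the input
    end: int    # exclusive offset into the input


# Assumed line layout: <text> <ws> <tag> <ws> [start,end)
LINE_RE = re.compile(r"^(.*?)\s+(\S+)\s+\[(\d+),(\d+)\)\s*$")


def parse_extr(output: str) -> List[Span]:
    """Parse lines of `finner extr` output, skipping lines that do not fit."""
    spans = []
    for line in output.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        text, tag, start, end = m.groups()
        spans.append(Span(text, tag, int(start), int(end)))
    return spans


def outermost(spans: List[Span]) -> List[Span]:
    """Drop spans that lie strictly inside another span."""
    def inside(inner: Span, outer: Span) -> bool:
        return (outer.start <= inner.start and inner.end <= outer.end
                and (outer.start, outer.end) != (inner.start, inner.end))
    return [s for s in spans if not any(inside(s, o) for o in spans)]


SOURCE = "5 May 2015 Issue of $500,000,000 0.875 percent"
EXTR_OUTPUT = "\n".join([
    "5 May 2015\tdate\t[0,10)",
    "5\tnum\t[0,1)",
    "May\tdate\t[2,5)",
    "2015\tdate\t[6,10)",
    "$ 500,000,000\tamt(USD)\t[20,32)",
    "$\tccy(USD)\t[20,21)",
    "500,000,000\tnum\t[21,32)",
    "0.875 percent\tnum(*0.01)\t[33,46)",
    "0.875\tnum\t[33,38)",
    "percent\tunit(*0.01)\t[39,46)",
])

if __name__ == "__main__":
    for s in outermost(parse_extr(EXTR_OUTPUT)):
        # Slice the original input rather than trusting the printed text.
        print(s.tag, repr(SOURCE[s.start:s.end]))
```

Slicing the original input by offset matters because the printed text is in tokenised form: the sample shows "$ 500,000,000" where the input reads "$500,000,000".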

Contributors

hroptatyr
