Brill's POS Tagger

Installation

npm install natural

Usage

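var path = require("path");
var Tagger = require("./lib/brill_pos_tagger");

// Data files are resolved relative to this script, assumed to sit in the repository root
var base_folder = __dirname;
var rules_file = path.join(base_folder, "data", "tr_from_pos.txt");
var lexicon_file = path.join(base_folder, "data", "lexicon.json");
// Category used for words that are not found in the lexicon
var default_category = 'N';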

var tagger = new Tagger(lexicon_file, rules_file, default_category, function(error) {
  if (error) {
    console.log(error);
  }
  else {
    var sentence = ["I", "see", "the", "man", "with", "the", "telescope"];
    console.log(JSON.stringify(tagger.tag(sentence)));
  }
});

Lexicon

The lexicon is either a JSON file that has the following structure:

{
  "word1": ["cat1"],
  "word2": ["cat2", "cat3"],
  ...
}

or a text file with one word per line, followed by its categories:

word1 cat1 cat2
word2 cat3
...

Words may have multiple categories in the lexicon file. The tagger uses only the first one.
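The two lexicon formats carry the same information. The sketch below, assuming whitespace-separated fields and one entry per line, converts the text format into the JSON shape and then gives each word the first category listed for it, falling back to a default category for unknown words. The fallback mirrors `default_category` in the usage example but is an assumption about how the tagger uses it; `parseTextLexicon`, `initialTags` and the `{ word, tag }` representation are hypothetical, not part of the library.

```javascript
// Sketch (hypothetical helpers): convert the text lexicon format into the JSON shape,
// then assign initial categories. Fields are assumed to be whitespace-separated.

function parseTextLexicon(text) {
  const lexicon = {};
  for (const line of text.split("\n")) {
    const fields = line.trim().split(/\s+/);
    // Skip blank lines and words without any category
    if (fields.length < 2) continue;
    const [word, ...categories] = fields;
    lexicon[word] = categories;
  }
  return lexicon;
}

// Each word gets the first category listed for it; unknown words get
// defaultCategory (assumed behaviour, mirroring default_category above).
function initialTags(lexicon, sentence, defaultCategory) {
  return sentence.map((word) => {
    const known = Object.prototype.hasOwnProperty.call(lexicon, word);
    return { word, tag: known ? lexicon[word][0] : defaultCategory };
  });
}

const lex = parseTextLexicon("the DT\nman NN VB\n");
console.log(JSON.stringify(initialTags(lex, ["the", "man", "saw"], "N")));
// [{"word":"the","tag":"DT"},{"word":"man","tag":"NN"},{"word":"saw","tag":"N"}]
```

The own-property check keeps words such as `constructor` from being resolved through `Object.prototype`.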

Specifying transformation rules

Transformation rules are specified as follows:

OLD_CAT NEW_CAT PREDICATE PARAMETER

This means that if the word at the current position has category OLD_CAT and the predicate is true, its category is replaced by NEW_CAT. The predicate may use the parameter in different ways. Sometimes the parameter specifies the expected outcome of the predicate:

NN CD CURRENT-WORD-IS-NUMBER YES

This means that a word tagged NN is retagged CD if the outcome of CURRENT-WORD-IS-NUMBER is YES. The parameter can also be used to check the category of another word in the sentence:

VBD NN PREV-TAG DT

Here a word tagged VBD is retagged NN only if the previous word is tagged DT.
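A minimal sketch of reading and applying such rules, assuming one whitespace-separated rule per line with an optional second parameter. The predicate bodies are guesses at what the keywords mean (in particular, what counts as a number), and `parseRule`, `applyRule` and the `{ word, tag }` entries are hypothetical names, not the library's API.

```javascript
// Sketch: parse "OLD_CAT NEW_CAT PREDICATE PARAMETER [PARAMETER2]" lines and apply
// a rule at one position. Names and predicate bodies are hypothetical.

// Words are {word, tag} objects in this sketch.
const predicates = {
  // Parameter is the expected outcome, "YES" or "NO".
  // Assumed definition of a number: optional minus sign, digits, optional decimals.
  "CURRENT-WORD-IS-NUMBER": (tagged, i, parameter) =>
    /^-?\d+(\.\d+)?$/.test(tagged[i].word) === (parameter === "YES"),
  // Parameter is the category the previous word must have.
  "PREV-TAG": (tagged, i, parameter) => i > 0 && tagged[i - 1].tag === parameter,
};

function parseRule(line) {
  const [oldCat, newCat, name, ...parameters] = line.trim().split(/\s+/);
  if (!Object.prototype.hasOwnProperty.call(predicates, name)) {
    throw new Error("Unknown predicate: " + name);
  }
  return { oldCat, newCat, predicate: predicates[name], parameters };
}

// Retag position i only if it currently has oldCat and the predicate holds.
function applyRule(rule, tagged, i) {
  if (tagged[i].tag === rule.oldCat && rule.predicate(tagged, i, ...rule.parameters)) {
    tagged[i].tag = rule.newCat;
  }
}

const tagged = [
  { word: "the", tag: "DT" },
  { word: "saw", tag: "VBD" },
  { word: "42", tag: "NN" },
];
applyRule(parseRule("VBD NN PREV-TAG DT"), tagged, 1);
applyRule(parseRule("NN CD CURRENT-WORD-IS-NUMBER YES"), tagged, 2);
console.log(tagged.map((t) => t.tag).join(" ")); // DT NN CD
```

Parameters are spread into the predicate call, so a two-parameter predicate receives both values from the rule line.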

Algorithm

The tagger applies transformation rules that may change the category of words. The input sentence must first be split into words, each of which is assigned an initial category. The tagged sentence is then processed from left to right: at each position, all rules are applied once, in the order in which they are specified. Algorithm:

function(sentence) {
  var tagged_sentence = new Array(sentence.length);

  // snip

  // Apply transformation rules
  for (var i = 0, size = sentence.length; i < size; i++) {
    this.transformation_rules.forEach(function(rule) {
      rule.apply(tagged_sentence, i);
    });
  }
  return(tagged_sentence);
}
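The loop above can be illustrated with a self-contained sketch. It assumes initial tags come from the first lexicon category (else a default), represents words as `{ word, tag }` objects and rules as plain objects with an inline predicate; `tagSentence` is a hypothetical name, not the library's method.

```javascript
// Sketch of the tagging loop: initial tagging (assumed: first lexicon category,
// else defaultCategory), then one left-to-right pass trying every rule in order.

function tagSentence(sentence, lexicon, defaultCategory, rules) {
  // Initial tagging
  const tagged = sentence.map((word) => ({
    word,
    tag: Object.prototype.hasOwnProperty.call(lexicon, word) ? lexicon[word][0] : defaultCategory,
  }));

  // Apply transformation rules: at each position, every rule once, in order
  for (let i = 0; i < tagged.length; i++) {
    for (const rule of rules) {
      if (tagged[i].tag === rule.oldCat && rule.predicate(tagged, i)) {
        tagged[i].tag = rule.newCat;
      }
    }
  }
  return tagged;
}

const lexicon = { I: ["PRP"], saw: ["VBD", "NN"], the: ["DT"] };
const rules = [
  // VBD -> NN when the previous word is tagged DT
  { oldCat: "VBD", newCat: "NN", predicate: (t, i) => i > 0 && t[i - 1].tag === "DT" },
];
const result = tagSentence(["I", "saw", "the", "saw"], lexicon, "N", rules);
console.log(result.map((t) => t.tag).join(" ")); // PRP VBD DT NN
```

Because the sentence is mutated in place in this sketch, a change made at one position is visible to rules evaluated at later positions, and a rule can fire on a tag produced by an earlier rule at the same position.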

Adding a predicate

Predicates are defined in the module lib/Predicate.js. To add one, first create a function in that file that implements the predicate. A predicate accepts the tagged sentence, the current position in the sentence, and the parameter(s) given in the rule (for example, the expected outcome of the predicate). An example of a predicate that checks the category of the current word:

function current_word_is_tag(tagged_sentence, i, parameter) {
  return(tagged_sentence[i][0] === parameter);
}

Some predicates accept two parameters. The next step is to map a keyword to the new predicate so that it can be used in transformation rules. This mapping is also defined in lib/Predicate.js:

var predicates = {
  "CURRENT-WORD-IS-TAG": current_word_is_tag,
  "PREV-WORD-IS-CAP": prev_word_is_cap
}
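The two steps, writing a predicate and registering its keyword, might look like the sketch below. The body of `prev_word_is_cap` is a guess (the source only names it), the `SURROUND-TAG` keyword and `surround_tag` function are hypothetical illustrations of a two-parameter predicate, and words are `{ word, tag }` objects rather than the library's own representation.

```javascript
// Sketch: define predicates, then map keywords to them.
// prev_word_is_cap's body and SURROUND-TAG are illustrative assumptions.

// One parameter: "YES"/"NO" outcome for "previous word starts with a capital letter"
function prev_word_is_cap(tagged_sentence, i, parameter) {
  const result = i > 0 && /^[A-Z]/.test(tagged_sentence[i - 1].word);
  return result === (parameter === "YES");
}

// Two parameters: previous word tagged parameter1 and next word tagged parameter2
function surround_tag(tagged_sentence, i, parameter1, parameter2) {
  return i > 0 && i < tagged_sentence.length - 1 &&
    tagged_sentence[i - 1].tag === parameter1 &&
    tagged_sentence[i + 1].tag === parameter2;
}

// Keyword -> predicate mapping used when reading transformation rules
const predicates = {
  "PREV-WORD-IS-CAP": prev_word_is_cap,
  "SURROUND-TAG": surround_tag,
};

// A rule such as "VB NN SURROUND-TAG DT IN" would call its predicate like this:
const s = [{ word: "the", tag: "DT" }, { word: "run", tag: "VB" }, { word: "in", tag: "IN" }];
console.log(predicates["SURROUND-TAG"](s, 1, "DT", "IN")); // true
```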

Acknowledgements/references

Contributors

hugo-ter-doest