crf4j: CRF model training and testing for Java

This is a pure Java port of taku's crfpp (also known as CRF++), based on the crfpp-0.58 source code.

Credit to komiya for his Java double-array trie implementation.

Features

  • Pure Java with minimal dependencies (commons-cli is the only runtime dependency)
  • Command-line options and template/input formats compatible with crfpp
  • Models can be loaded from the classpath
  • Text model format compatible with crfpp
  • Conversion between text models and crf4j's own binary model format, in both directions
  • Multi-threading support
  • Support for the CRF-L1, CRF-L2, and MIRA algorithms
  • N-best outputs
  • CRF model wrapper for API calls
  • Tests and a demo that illustrate usage

Usage

Building

mvn clean package

Run tests:

mvn test

Training

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfLearn <template file> <train datafile> <model path>

For more options, run:

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfLearn -h

For details on the format of the template file and the training file, refer to the original crfpp page.
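As a rough illustration of what a crfpp-style template line does, the sketch below expands `%x[row,col]` macros (row relative to the current token, column absolute) against a small tokenized sentence. It is not crf4j's own code: the class name `TemplateMacroSketch`, the `expand` helper, and the `_OUT_OF_RANGE_` placeholder are assumptions made for this example, and crfpp itself uses its own boundary markers for positions outside the sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateMacroSketch {
    // Matches a macro such as %x[-1,0]: relative row, then absolute column.
    private static final Pattern MACRO = Pattern.compile("%x\\[(-?\\d+),(\\d+)\\]");

    /** Expands every %x[row,col] macro in one template line for the token at index pos. */
    static String expand(String templateLine, List<String[]> sentence, int pos) {
        Matcher m = MACRO.matcher(templateLine);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int row = pos + Integer.parseInt(m.group(1));
            int col = Integer.parseInt(m.group(2));
            String value;
            if (row < 0 || row >= sentence.size()) {
                // Placeholder invented for this sketch, not crfpp's real boundary marker.
                value = "_OUT_OF_RANGE_";
            } else {
                value = sentence.get(row)[col];
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Each row is one token; columns are the whitespace-separated fields of a data line.
        List<String[]> sentence = new ArrayList<>();
        sentence.add(new String[] {"He", "PRP"});
        sentence.add(new String[] {"reckons", "VBZ"});
        sentence.add(new String[] {"the", "DT"});

        System.out.println(expand("U01:%x[0,0]", sentence, 1));          // U01:reckons
        System.out.println(expand("U05:%x[-1,0]/%x[0,1]", sentence, 1)); // U05:He/VBZ
    }
}
```

Each expanded string is one feature generated for the current token, which is why adding template lines increases the feature count of the trained model.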

Testing

To print the output to the console:

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfTest -m <model path> <test datafile>

To write the output to a file:

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfTest -m <model path> <test datafile> -o <outputfile>
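If you want to drive the same command from a Java program rather than a shell, one option is to assemble the argument list and hand it to `ProcessBuilder`. The sketch below uses only the class name and the `-m` and `-o` options shown above; the class `CrfTestLauncherSketch`, the `buildCommand` helper, and the argument order of `main` are assumptions made for this example and are not part of crf4j.

```java
import java.util.ArrayList;
import java.util.List;

class CrfTestLauncherSketch {
    /**
     * Builds the CrfTest command line shown above.
     * Pass null as outputFile to print the result to the console.
     */
    static List<String> buildCommand(String jarPath, String modelPath,
                                     String testFile, String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(jarPath);
        cmd.add("com.github.zhifac.crf4j.CrfTest");
        cmd.add("-m");
        cmd.add(modelPath);
        cmd.add(testFile);
        if (outputFile != null) {
            cmd.add("-o");
            cmd.add(outputFile);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: <jar> <model path> <test datafile> [outputfile]");
            return;
        }
        String outputFile = args.length > 3 ? args[3] : null;
        List<String> cmd = buildCommand(args[0], args[1], args[2], outputFile);

        // inheritIO() forwards the child's stdout/stderr to this process's console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        int exitCode = process.waitFor();
        System.out.println("CrfTest exited with code " + exitCode);
    }
}
```

Passing the arguments as a list avoids shell quoting problems when a model or data path contains spaces.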

API call

For API usage, refer to CrfDemo.java.

Performance

Concurrent Access

In one example, a crf4j model was used for named-entity recognition behind an HTTP interface. We used JMeter to test 400 concurrent requests against that interface, with the following result:

#Samples  Average  Median  90% Line  Min  Max  Throughput
4000      41       4       60        0    746  1250/sec

The test environment is:

OS             CPU                            MEM
Windows 7 x64  Intel Core [email protected]  8GB

Notes

The binary model generated by CrfLearn is incompatible with crfpp, but the text model is compatible. To reuse a crfpp model with crf4j, generate a text model when you train with crfpp (add the -t option), then convert the crfpp text model to a crf4j binary model by running:

java -cp crf4j.jar com.github.zhifac.crf4j.EncoderFeatureIndex <crfpp_text_model> <output_crf4j_binarymodel>

If you cannot retrain to produce the text model (e.g. the training data is missing), you can still convert an existing crfpp binary model to a text model with the modified version of crfpp from here.

TODO

  • Optimize memory usage during training (it currently consumes about 8GB of heap memory for 24224128 features, whereas crfpp uses 2GB)

License

LGPL & Modified BSD


Chinese version:

crf4j: a Java implementation of crfpp (crf++)

(based on crfpp 0.58)
