
Morpha Stemmer

This is a Java version of the Morpha stemmer, a fast and robust morphological analyser for English based on finite-state techniques. Given a word form and, optionally, its part-of-speech tag, it returns the word's lemma and inflection type. The tag is optional, but accuracy degrades when it is not supplied.

The original .lex and verb-stem files (see src/main/resources) were converted to Java using JFlex by Michael Schmitz. This fork of his Morpha wrapper has been rewritten to retain affix information and to throw fewer exceptions.

Usage

import org.samthomson.morpha.Morpha.Lemma;
import static org.samthomson.morpha.Morpha.stem;

public class MorphaExample {
    public static void main(String[] args) {
        // when part-of-speech tags are available:
        Lemma sawVerb = stem("saw", "VBD");  // Lemma(see, ed)
        System.out.println(sawVerb.lemma);   // "see"
        System.out.println(sawVerb.affix);   // "ed"
        Lemma sawNoun = stem("saw", "NN");   // Lemma(saw)
        System.out.println(sawNoun.lemma);   // "saw"
        System.out.println(sawNoun.affix);   // ""

        // when part-of-speech tags are not available, a best effort is made:
        stem("saw");      // Lemma(see, ed)
        stem("finding");  // Lemma(find, ing)
    }
}
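The same calls can be applied token by token to POS-tagged text. The sketch below assumes input in the common `word/TAG` format; `TaggedLemmatizer` and `lemmatizeTagged` are hypothetical names, not part of the morpha library. The stemmer is passed in as a function so the helper can be compiled and tried without the dependency; the demo uses a lowercasing stand-in rather than Morpha.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Hypothetical helper (not part of the morpha library): lemmatizes a sentence
 * written as space-separated "word/TAG" tokens using a caller-supplied stemmer.
 */
public class TaggedLemmatizer {

    public static List<String> lemmatizeTagged(String taggedSentence,
                                               BiFunction<String, String, String> stemmer) {
        List<String> lemmas = new ArrayList<>();
        for (String token : taggedSentence.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;  // only happens when the input is empty or all whitespace
            }
            // Split at the LAST slash so a word that itself contains '/' keeps it.
            int slash = token.lastIndexOf('/');
            String word = slash > 0 ? token.substring(0, slash) : token;
            // null signals "no tag available" (no slash, or nothing after it).
            String tag = (slash > 0 && slash < token.length() - 1) ? token.substring(slash + 1) : null;
            lemmas.add(stemmer.apply(word, tag));
        }
        return lemmas;
    }

    public static void main(String[] args) {
        // Stand-in stemmer for demonstration only: lowercases the word and ignores the tag.
        BiFunction<String, String, String> lowercase = (word, tag) -> word.toLowerCase();
        System.out.println(lemmatizeTagged("The/DT Cats/NNS saw/VBD", lowercase));  // prints [the, cats, saw]
    }
}
```

To plug in Morpha itself, one option (assuming the `stem` overloads and public `lemma` field shown in the usage example above) is to pass `(w, t) -> (t == null ? stem(w) : stem(w, t)).lemma`, which falls back to the tag-free best-effort mode for tokens that carry no tag.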

SBT

To use this library as a dependency, add the following to your build.sbt:

libraryDependencies ++= Seq(
  "org.samthomson" % "morpha" % "1.1"
)

Maven

To use this library as a dependency, add the following to your pom.xml:

<repositories>
  <repository>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
    <id>central</id>
    <name>bintray</name>
    <url>http://jcenter.bintray.com</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.samthomson</groupId>
    <artifactId>morpha</artifactId>
    <version>1.1</version>
  </dependency>
</dependencies>

From the README for the original distribution:


This directory contains software for morphological processing of English as developed by Kevin Humphreys, John Carroll and Guido Minnen.

To be used for research purposes only (see section 4 below). If you make any changes, the authors would appreciate it if you sent them details of what you have done.

Covers the English inflectional suffixes:

  • -s plural of nouns, 3rd person singular present of verbs
  • -ed past tense
  • -en past participle
  • -ing progressive of verbs
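The four suffixes above can be made concrete with a deliberately naive sketch, which is not Morpha's algorithm: look the word up in a small exception list, otherwise strip the longest matching suffix. `NaiveSuffixStripper`, `analyse` and the exception entries are hypothetical and chosen only for illustration; the code assumes Java 16+ for the `record`.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Deliberately naive illustration, NOT Morpha's algorithm: split an English word
 * into a lemma and one of the inflectional suffixes -s, -ed, -en, -ing.
 */
public class NaiveSuffixStripper {

    /** A lemma plus the suffix that was removed ("" if none). */
    public record Analysis(String lemma, String affix) {}

    // Hypothetical, tiny exception list for irregular forms; the affix labels simply
    // mirror the suffix list and are not taken from Morpha's output.
    private static final Map<String, Analysis> EXCEPTIONS = Map.of(
            "saw", new Analysis("see", "ed"),
            "taken", new Analysis("take", "en"),
            "children", new Analysis("child", "s"));

    // Regular suffixes, checked in this order (longest first).
    private static final String[] SUFFIXES = {"ing", "ed", "en", "s"};

    public static Analysis analyse(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        Analysis irregular = EXCEPTIONS.get(w);
        if (irregular != null) {
            return irregular;  // irregular forms are looked up, not computed
        }
        for (String suffix : SUFFIXES) {
            // Keep at least 3 letters of stem so words like "sing", "red" or "is" are left alone.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return new Analysis(w.substring(0, w.length() - suffix.length()), suffix);
            }
        }
        return new Analysis(w, "");  // no recognised suffix: treat the word as its own lemma
    }

    public static void main(String[] args) {
        for (String w : new String[] {"cats", "walked", "walking", "saw", "making"}) {
            System.out.println(w + " -> " + analyse(w));
        }
        // "making" comes out as lemma "mak": plain stripping cannot restore the dropped "e".
    }
}
```

Its failures, such as "making" becoming "mak" or "garden" being split into "gard" plus "-en", illustrate why a practical analyser needs spelling rules and extensive exception lists like those mentioned in the acknowledgements.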

4. Acknowledgements, copyrights etc.

Copyright (c) 1995-2000 University of Sheffield, University of Sussex. All rights reserved.

Redistribution and use of source and derived binary forms are permitted without fee provided that:

  • they are not used in commercial products
  • the above copyright notice and this paragraph are duplicated in all such forms
  • any documentation, advertising materials, and other materials related to such distribution and use acknowledge that the software was developed by Kevin Humphreys, John Carroll and Guido Minnen and refer to the following related publication:

Guido Minnen, John Carroll and Darren Pearce. 2001. "Applied morphological processing of English". Natural Language Engineering, 7(3), 207-223.

The name of the University of Sheffield may not be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided "as is" and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.

The exception lists were derived semi-automatically from WordNet 1.5 and various other corpora and machine-readable dictionaries (MRDs).

Many thanks to Tim Baldwin, Chris Brew, Bill Fisher, Gerald Gazdar, Dale Gerdemann, Adam Kilgarriff and Ehud Reiter for suggested improvements.

WordNet 1.5 Copyright 1995 by Princeton University. All rights reserved.

THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The name of Princeton University or Princeton may not be used in advertising or publicity pertaining to distribution of the software and/or database. Title to copyright in this software, database and any associated documentation shall at all times remain with Princeton University and LICENSEE agrees to preserve same.
