- Phrase candidate generation: Phrase candidates are generated from a Lucene index (offsets should be indexed) or from a text collection (indexed internally with Lucene's RAMDirectory).
- Phrase quality scoring: P(hrase)F-IDF and CValue-based scorers are implemented, i.e. two unsupervised baselines of SegPhrase. A reference Lucene index can be plugged in to provide DF statistics.
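To give a feel for what the two scorers compute, here is a minimal, self-contained sketch in plain Java. It is not this library's API: the class `PhraseScoringSketch` and the methods `pfIdf`, `cValue` and `isNestedIn` are hypothetical names made up for illustration. The formulas are the textbook ones (phrase frequency times log inverse document frequency, and the classic C-value with a log2 length weight), and the implementation in this project may differ in smoothing or normalization.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names, not classes from lucene-frequent-phrase.
class PhraseScoringSketch {

  // PF-IDF analogue of TF-IDF: phrase frequency times log(N / DF).
  // Assumes docFreq > 0; DF would come from a reference index.
  static double pfIdf(int phraseFreq, int docFreq, int numDocs) {
    return phraseFreq * Math.log((double) numDocs / docFreq);
  }

  // True if `shorter` occurs as a whole-word subsequence of a different phrase `longer`.
  static boolean isNestedIn(String shorter, String longer) {
    return !shorter.equals(longer)
        && (" " + longer + " ").contains(" " + shorter + " ");
  }

  // Classic C-value: log2(|a|) * (f(a) - mean frequency of longer candidates containing a).
  // If no longer candidate contains a, the subtracted term is zero.
  static double cValue(String phrase, Map<String, Integer> freqs) {
    int words = phrase.split(" ").length;
    int freq = freqs.getOrDefault(phrase, 0);
    int containerCount = 0;
    long containerFreqSum = 0;
    for (Map.Entry<String, Integer> e : freqs.entrySet()) {
      if (isNestedIn(phrase, e.getKey())) {
        containerCount++;
        containerFreqSum += e.getValue();
      }
    }
    double adjusted = containerCount == 0
        ? freq
        : freq - (double) containerFreqSum / containerCount;
    return (Math.log(words) / Math.log(2)) * adjusted;
  }

  public static void main(String[] args) {
    // Made-up candidate frequencies, for demonstration only.
    Map<String, Integer> freqs = new HashMap<>();
    freqs.put("machine learning", 10);
    freqs.put("machine learning model", 4);
    freqs.put("statistical machine learning", 2);

    System.out.println(cValue("machine learning", freqs));
    System.out.println(cValue("machine learning model", freqs));
    System.out.println(pfIdf(10, 5, 100));
  }
}
```

Note that with the log2 length weight a single-word candidate always scores 0 under C-value, so this sketch is only meaningful for phrases of two or more words.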
Configure your pom.xml by adding this repository:
<repository>
  <id>ziy-mvnrepo-releases</id>
  <name>ziy GitHub Personal Repo</name>
  <url>https://raw.github.com/ziy/mvn-releases/master/</url>
</repository>
and adding this dependency:
<dependency>
  <groupId>edu.cmu.lti.oaqa.core</groupId>
  <artifactId>lucene-frequent-phrase</artifactId>
  <version>0.0.1</version>
</dependency>