epic's Introduction

Archived

NLP, like all of AI, has changed a lot since I wrote this back in 2012-2014. I don't have the time to maintain this library, much less modernize it. Maybe one day...

Epic

(c) 2014 David Hall.

Epic is a structured prediction framework for Scala. It also includes classes for training high-accuracy syntactic parsers, part-of-speech taggers, named entity recognizers, and more.

Epic is distributed under the Apache License, Version 2.0.

The current version is 0.3.

Documentation

Documentation will (eventually) live at the GitHub wiki: https://github.com/dlwh/epic/wiki

See some example usages at https://github.com/dlwh/epic-demo.

Using Epic

Epic can be used programmatically or from the command line, using either pretrained models (see below) or models you have trained yourself.

Currently, Epic supports three kinds of models: parsers, sequence labelers, and segmenters. Parsers produce syntactic representations of sentences. Sequence labelers, such as part-of-speech taggers, associate each word in a sentence with a label; for instance, a part-of-speech tagger can identify nouns, verbs, etc. Segmenters break a sentence into a sequence of fields; for instance, a named entity recognition system might identify all the people, places and things in a sentence.

Command-line Usage

Epic bundles command line interfaces for using parsers, NER systems, and POS taggers (and more generally, segmentation and tagging systems). There are three classes, one for each kind of system:

  • epic.parser.ParseText runs a parser.
  • epic.sequences.SegmentText runs an NER system, or any kind of segmentation system.
  • epic.sequences.TagText runs a POS tagger, or any kind of tagging system.

All of these systems expect plain text files as input, along with a path to a model file. The syntax is:

java -Xmx4g -cp /path/to/epic-assembly-0.3-SNAPSHOT.jar epic.parser.ParseText --model /path/to/model.ser.gz --nthreads <number of threads> [files]

Currently, all text is output to standard out. In the future, we will support output in a way that differentiates the files. If no files are given, the system will read from standard input. By default, the system will use all available cores for execution.

Models can be downloaded from http://www.scalanlp.org/models/ or from Maven Central. (See below.)

Programmatic Usage

Epic also supports programmatic usage. All of the models assume that text has been segmented and tokenized.

Preprocessing text

To preprocess text so that the models can use it, you will need to split it into sentences and tokenize each sentence into individual words. Epic comes with classes to do both.

Once you have a sentence, you can tokenize it using an epic.preprocess.TreebankTokenizer, which takes a string and returns a sequence of tokens. All told, the pipeline looks like this:

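import epic.preprocess.MLSentenceSegmenter // assumed to live alongside TreebankTokenizer

val text = getSomeText() // any String you want to process

// .get fails if the bundled sentence segmenter cannot be loaded
val sentenceSplitter = MLSentenceSegmenter.bundled().get
val tokenizer = new epic.preprocess.TreebankTokenizer()

// split into sentences, then tokenize each sentence
val sentences: IndexedSeq[IndexedSeq[String]] = sentenceSplitter(text).map(tokenizer(_)).toIndexedSeq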

for(sentence <- sentences) {
  // use the sentence tokens
}

Parser

To use the parser programmatically, deserialize a parser model, either using epic.models.deserialize[Parser[AnnotatedLabel, String]](path) or using the ParserSelector. Then, give the parser segmented and tokenized text:

val parser = epic.models.deserialize[Parser[AnnotatedLabel, String]](path)

// or:

val parser = epic.models.ParserSelector.loadParser("en").get // or another 2 letter code.

val tree = parser(sentence)

println(tree.render(sentence))

Trees have a number of methods on them. See the class definition or API docs.

Part-of-Speech Tagger

Using a Part-of-Speech tagger is similar to using a parser: load a model, tokenize some text, run the tagger. All taggers are (currently) linear chain conditional random fields, or CRFs. (You don't need to understand them to use them. They are just a machine learning method for assigning a sequence of tags to a sequence of words.)

val tagger = epic.models.deserialize[CRF[AnnotatedLabel, String]](path)

// or:

val tagger = epic.models.PosTagSelector.loadTagger("en").get // or another 2 letter code.

val tags = tagger.bestSequence(sentence)

println(tags.render)
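
The selectors return an Option, so calling .get throws a NoSuchElementException when no model for the requested language is found. The following is a minimal sketch of handling the missing case explicitly. The loadTagger function here is a hypothetical stand-in (it is not Epic's PosTagSelector, and its "tagger" simply labels every word NN) so that the example runs without any model jars:

```scala
object SafeModelLoading {
  type Tagger = IndexedSeq[String] => IndexedSeq[String]

  // Hypothetical stand-in for a selector such as PosTagSelector.loadTagger:
  // Some(tagger) if a model exists for the language, None otherwise.
  def loadTagger(language: String): Option[Tagger] =
    if (language == "en") Some((words: IndexedSeq[String]) => words.map(_ => "NN"))
    else None

  // Tag a sentence, or return a readable error instead of throwing on None.get.
  def tagOrExplain(language: String, sentence: IndexedSeq[String]): Either[String, IndexedSeq[String]] =
    loadTagger(language) match {
      case Some(tagger) => Right(tagger(sentence))
      case None =>
        Left(s"No tagger found for language '$language'; is the model jar on the classpath?")
    }
}
```

With the real selectors, the same pattern match can replace the .get calls shown above.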

Named Entity Recognition

Using a named entity recognizer is similar to using a POS tagger: load a model, tokenize some text, run the recognizer. All NER systems are (currently) linear chain semi-Markov conditional random fields, or SemiCRFs. (You don't need to understand them to use them. They are just a machine learning method for segmenting text into fields.)

val ner = epic.models.deserialize[SemiCRF[AnnotatedLabel, String]](path)

// or:

val ner = epic.models.NerSelector.loadNer("en").get // or another 2 letter code.

val segments = ner.bestSequence(sentence)

println(segments.render)

The outside label of a SemiCRF is the label that is considered not part of a "real" segment. For instance, in NER, it is the label given to words that are not named entities.

Pre-trained Models

Epic provides a number of pre-trained models. These are available as Maven artifacts from Maven Central, and can be loaded at runtime. To use a specific model, just depend on it (or alternatively download the jar file). You can then load the parser by calling, for example:

epic.parser.models.en.span.EnglishSpanParser.load()

This will load the model and return a Parser object. If you would rather not hardwire dependencies, either for internationalization or to try different models, use epic.models.ParserSelector.loadParser(language), where language is the two-letter code for the language you want to use.

The following models are available at this time:

AS OF WRITING ONLY MODELS FOR ENGLISH ARE AVAILABLE! Write me if you want these other models.

  • Parser
    • English:
      "org.scalanlp" %% "epic-parser-en-span" % "2015.1.25"
      
  • POS Taggers
    • English:
      "org.scalanlp" %% "epic-pos-en" % "2015.1.25"
      
  • Named Entity Recognizers
    • English:
      "org.scalanlp" %% "epic-ner-en-conll" % "2015.1.25"
      

There is also a meta-dependency that includes the above three models:

"org.scalanlp" %% "english"  % "2015.1.25"

I meant to name that "epic-english" but messed up. So it's that for now. Expect it to change.
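
A minimal build.sbt sketch for pulling in the library together with the English models. The "org.scalanlp" %% "epic" coordinates and the 0.3 version are assumptions pieced together from the version stated at the top of this README and from a build file quoted in the issues; check Maven Central for the versions that match your Scala version.

```scala
// build.sbt (sketch; artifact versions are assumptions, see the note above)
libraryDependencies ++= Seq(
  "org.scalanlp" %% "epic"    % "0.3",       // core library
  "org.scalanlp" %% "english" % "2015.1.25"  // meta-dependency: English parser, POS tagger, NER
)
```

Because the models are stored with Java serialization, the model artifacts and the core library generally need to come from compatible releases.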

TODO:

  • Parser
    • English:
      "org.scalanlp" %% "epic-parser-en-span" % "2014.9.15-SNAPSHOT"
      
    • Basque:
      "org.scalanlp" %% "epic-parser-eu-span" % "2014.9.15-SNAPSHOT"
      
    • French:
      "org.scalanlp" %% "epic-parser-fr-span" % "2014.9.15-SNAPSHOT"
      
    • German:
      "org.scalanlp" %% "epic-parser-de-span" % "2014.9.15-SNAPSHOT"
      
    • Hungarian:
      "org.scalanlp" %% "epic-parser-hu-span" % "2014.9.15-SNAPSHOT"
      
    • Korean:
      "org.scalanlp" %% "epic-parser-ko-span" % "2014.9.15-SNAPSHOT"
      
    • Polish:
      "org.scalanlp" %% "epic-parser-pl-span" % "2014.9.15-SNAPSHOT"
      
    • Swedish:
      "org.scalanlp" %% "epic-parser-sv-span" % "2014.9.15-SNAPSHOT"
      
  • POS Taggers
    • Basque:
      "org.scalanlp" %% "epic-pos-eu" % "2014.9.15-SNAPSHOT"
      
    • French:
      "org.scalanlp" %% "epic-pos-fr" % "2014.9.15-SNAPSHOT"
      
    • German:
      "org.scalanlp" %% "epic-pos-de" % "2014.9.15-SNAPSHOT"
      
    • Hungarian:
      "org.scalanlp" %% "epic-pos-hu" % "2014.9.15-SNAPSHOT"
      
    • Polish:
      "org.scalanlp" %% "epic-pos-pl" % "2014.9.15-SNAPSHOT"
      
    • Swedish:
      "org.scalanlp" %% "epic-pos-sv" % "2014.9.15-SNAPSHOT"
      
  • Named Entity Recognizers
    • English:
      "org.scalanlp" %% "epic-ner-en-conll" % "2014.9.15-SNAPSHOT"
      

If you use any of the parser models in research publications, please cite:

David Hall, Greg Durrett, and Dan Klein. 2014. Less Grammar, More Features. In ACL.

If you use the other things, just link to Epic.

Building Epic

In order to do anything besides use pre-trained models, you will probably need to build Epic.

To build, you need a release of SBT 0.13.2. Then run:

$ sbt assembly

which will compile everything, run tests, and build a fatjar that includes all dependencies.

Training Models

Training Parsers

There are several different discriminative parsers you can train, and the trainer main class has lots of options. To get a sense of them, run the following command:

$ java -cp target/scala-2.10/epic-assembly-0.2-SNAPSHOT.jar epic.parser.models.ParserTrainer --help

You'll get a list of all the available options (so many!). The important ones are:

--treebank.path "path/to/treebank"
--cache.path "constraint.cache"
--modelFactory  XXX                              # the kind of parser to train. See below.
--opt.useStochastic true                         # turn on stochastic gradient
--opt.regularization 1.0                         # regularization constant. you need to regularize, badly.

There are 4 kinds of base models you can train, and you can tie them together with an EPParserModel, if you want. The 4 base models are:

  • epic.parser.models.LatentModelFactory: Latent annotation (like the Berkeley parser)
  • epic.parser.models.LexModelFactory: Lexical annotation (kind of like the Collins parser)
  • epic.parser.models.StructModelFactory: Structural annotation (kind of like the Stanford parser)
  • epic.parser.models.SpanModelFactory: Span features (Hall, Durrett, and Klein, 2014)

These models all have their own options. You can see those by specifying the modelFactory and adding --help:

$ java -cp target/scala-2.10/epic-assembly-0.2-SNAPSHOT.jar epic.parser.models.ParserTrainer --modelFactory "model" --help

If you use the first three in research papers, please cite

David Hall and Dan Klein. 2012. Training Factored PCFGs with Expectation Propagation. In EMNLP.

If you use the SpanModel, please cite:

David Hall, Greg Durrett, and Dan Klein. 2014. Less Grammar, More Features. In ACL.

If you use something else, cite one of these, or something.

For training a SpanModel, the following configurations are known to work well in general:

  • English:
epic.parser.models.ParserTrainer \
  --modelFactory epic.parser.models.SpanModelFactory \
  --cache.path constraints.cache \
  --opt.useStochastic \
  --opt.regularization 5 \
  --opt.batchSize 500 \
  --alpha 0.1 \
  --maxIterations 1000 \
  --trainer.modelFactory.annotator epic.trees.annotations.PipelineAnnotator \
  --ann.0 epic.trees.annotations.FilterAnnotations \
  --ann.1 epic.trees.annotations.ForgetHeadTag \
  --ann.2 epic.trees.annotations.Markovize \
  --vertical 1 \
  --horizontal 0 \
  --treebank.path /home/dlwh/wsj/
  • Other (SPMRL languages):
epic.parser.models.ParserTrainer \
  --treebankType spmrl \
  --binarization head \
  --modelFactory epic.parser.models.SpanModelFactory \
  --opt.useStochastic --opt.regularization 5.0 \
  --opt.batchSize 400 --maxIterations 502 \
  --iterationsPerEval 100 \
  --alpha 0.1 \
  --trainer.modelFactory.annotator epic.trees.annotations.PipelineAnnotator \
  --ann.0 epic.trees.annotations.FilterAnnotations  \
  --ann.1 epic.trees.annotations.ForgetHeadTag \
  --ann.2 epic.trees.annotations.Markovize \
  --ann.2.vertical 1 \
  --ann.2.horizontal 0 \
  --ann.3 epic.trees.annotations.SplitPunct \
  --cache.path $languc-constraints.cache \
  --treebank.path ${SPMRL}/${languc}_SPMRL/gold/ptb/ \
  --supervisedHeadFinderPtbPath ${SPMRL}/${languc}_SPMRL/gold/ptb/${train}/${train}.$lang.gold.ptb \
  --supervisedHeadFinderConllPath ${SPMRL}/${languc}_SPMRL/gold/conll/${train}/${train}.$lang.gold.conll \
  --threads 8 

Training a parser currently needs four files that are cached to the pwd:

  • xbar.gr: caches the topology of the grammar
  • constraints.cache, constraints.cache.*: remembers pruning masks computed from the base grammar.

TODO: remove this reliance.

Treebank types

There is a treebank.type command-line flag (written --treebankType in the example commands above) that supports a few different formats for treebanks. They are:

  • penn: Reads from the wsj/ subdirectory of the Penn Treebank. This expects a set of directories 00-24, each of which contains a number of mrg files. Standard splits are used.
  • chinese: Expects a number of chtbNN.mrg files in a single directory.
  • negra: Expects a directory with three files, negra_[1-3].mrg
  • conllonto: Expects data formatted like the 2011 CoNLL shared task. Only reads the trees.
  • spmrl: Expects a directory layout like that used in the 2012 SPMRL shared task.
  • simple: Expects a directory with 3 files: {train, dev, test}.txt

Training a parser programmatically

You can also train a span model programmatically, by using the SpanModelFactory.buildSimple method. For example:

SpanModelFactory.buildSimple(trees, OptParams(regularization=1.0, useStochastic = true))

The buildSimple method also supports custom featurizers.

Training POS taggers and other sequence models

The main class epic.sequences.TrainPosTagger can be used to train a POS Tagger from a treebank. It expects the same treebank options (namely treebank.path and treebank.type) as the Parser trainer does, as well as the same optimization options.

The following configuration is known to work well:

  • English:
     epic.sequences.TrainPosTagger \
     --treebank.path $PATH_TO/wsj \
     --opt.regularization 2.0 \
     --useStochastic \
     --maxIterations 1000
  • Others (SPMRL):
  epic.sequences.TrainPosTagger --opt.regularization 2.0 --useStochastic --maxIterations 1000 \
  --treebankType spmrl \
  --binarization left \
  --treebank.path ${SPMRL}/${languc}_SPMRL/gold/ptb/

If you want to train other kinds of models, you will probably need to build CRFs programmatically. For inspiration, you should probably look at the source code for TrainPosTagger. It's wonderfully short:

object TrainPosTagger extends LazyLogging {
  case class Params(opt: OptParams, treebank: ProcessedTreebank, hashFeatureScale: Double = 0.00)

  def main(args: Array[String]) {
    val params = CommandLineParser.readIn[Params](args)
    logger.info("Command line arguments for recovery:\n" + Configuration.fromObject(params).toCommandLineString)
    import params._
    val train = treebank.trainTrees.map(_.asTaggedSequence)
    val test = treebank.devTrees.map(_.asTaggedSequence)

    val crf = CRF.buildSimple(train, AnnotatedLabel("TOP"), opt = opt, hashFeatures = hashFeatureScale)

    val stats = TaggedSequenceEval.eval(crf, test)
    println("Final Stats: " + stats)
    println("Confusion Matrix:\n" + stats.confusion)

  }

}

Basically, you need to create a collection of TaggedSequences, each of which is a pair of sequences: one for tags and one for words. Then pass the training data to CRF.buildSimple, along with a start symbol (used for the "beginning of sentence" tag), an optional Gazetteer (not shown), and an OptParams, which controls the optimization. There is also an optional hashFeatures argument, which isn't used.

We can also pass in two [WordFeaturizer] instances, one for "label" features, and one for "transition" features. Most of the featurizers in Epic have a cross product form (Label x Surface), where Label is a feature on the label (e.g. the POS tag) and Surface is a feature on the surface string. Here, the label featurizer features are crossed with the tag, and the transition featurizer features are crossed with pairs of successive labels. See the wiki page on [[Featurizers]] for more detail.
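
To make the cross product form concrete, here is a small sketch in plain Scala. It does not use Epic's WordFeaturizer API; the surface features (word identity, two-character suffix, initial capital) and all names below are hypothetical and chosen only for illustration:

```scala
object CrossProductFeatures {
  // Hypothetical surface features computed from the word string alone.
  def surface(word: String): Vector[String] =
    Vector(s"word=$word", s"suffix=${word.takeRight(2)}") ++
      (if (word.headOption.exists(_.isUpper)) Vector("initcap") else Vector.empty[String])

  // "Label" features: every surface feature crossed with the current tag.
  def labelFeatures(word: String, tag: String): Vector[(String, String)] =
    surface(word).map(f => (tag, f))

  // "Transition" features: every surface feature crossed with a pair of successive tags.
  def transitionFeatures(word: String, prev: String, cur: String): Vector[((String, String), String)] =
    surface(word).map(f => ((prev, cur), f))
}
```

For example, CrossProductFeatures.labelFeatures("Chicago", "NNP") pairs each of the three surface features of "Chicago" with the tag NNP.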

Training NER systems and other segmentation models

Training an NER system or other SemiCRF is very similar to training a CRF. The main difference is that the inputs are Segmentations, rather than TaggedSequences. The main class epic.sequences.SemiConllNERPipeline can be used to train NER models, with data in the CoNLL 2003 shared task format. This class completely ignores all fields except the first and last. The commandline takes two paths, --train and --test, to specify training and test set files, respectively.

If you need to do something more complicated, you will need to write your own code. As an example, here is the code for epic.sequences.SemiConllNERPipeline. This code is somewhat more complicated, as the CoNLL sequences need to be turned into segmentations.

def main(args: Array[String]) {
    val params:Params = CommandLineParser.readIn[Params](args)
    logger.info("Command line arguments for recovery:\n" + Configuration.fromObject(params).toCommandLineString)
    val (train,test) = {
      val standardTrain = CONLLSequenceReader.readTrain(new FileInputStream(params.path), params.path.getName).toIndexedSeq
      val standardTest = CONLLSequenceReader.readTrain(new FileInputStream(params.test), params.path.getName).toIndexedSeq

      standardTrain.take(params.nsents).map(makeSegmentation) -> standardTest.map(makeSegmentation)
    }


    // you can optionally pass in a Gazetteer, though I've not gotten much mileage out of them.
    val crf = SemiCRF.buildSimple(train, "--BEGIN--", "O", params.opt)

    val stats = SegmentationEval.eval(crf, test)

    println(stats)


  }
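
The step of turning per-word CoNLL tags into segments can be sketched in plain Scala as follows. This is not Epic's makeSegmentation; the Span class, the function name, and the handling of B-/I- prefixes are assumptions made for illustration. Only the outside label "O" is taken from the code above:

```scala
object BioToSegments {
  /** A labeled span over token positions [begin, end). */
  final case class Span(label: String, begin: Int, end: Int)

  /** Groups tags such as B-PER / I-PER / O into labeled spans; outside words are skipped. */
  def segments(tags: IndexedSeq[String], outside: String = "O"): Vector[Span] = {
    val out = Vector.newBuilder[Span]
    var current: Option[Span] = None
    for ((tag, i) <- tags.zipWithIndex) {
      if (tag == outside) {
        current.foreach(out += _) // an outside word closes any open span
        current = None
      } else {
        val (prefix, label) = tag.splitAt(2) // "B-" or "I-", then the entity type
        current match {
          case Some(s) if prefix == "I-" && s.label == label =>
            current = Some(s.copy(end = i + 1)) // continue the open span
          case _ =>
            current.foreach(out += _) // close the previous span, if any
            current = Some(Span(label, i, i + 1)) // and open a new one
        }
      }
    }
    current.foreach(out += _)
    out.result()
  }
}
```

A B- tag always opens a new span, while an I- tag continues the open span only when the entity type matches, which covers both common CoNLL tagging variants.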

We can also pass in featurizers, as in the CRF trainer. In this case, we can pass in a [WordFeaturizer] and a [SpanFeaturizer]. WordFeaturizers work as before, while SpanFeaturizers give features over the entire input span. For NER, this can be useful for adding features noting that an entity is entirely surrounded by quotation marks, for instance, or for matching against entries in a [[Gazetteer]].

OptParams

OptParams is a configuration class that controls the optimizer. There are a bunch of different options:

--opt.batchSize: Int = 512
--opt.regularization: Double = 0.0
--opt.alpha: Double = 0.5
--opt.maxIterations: Int = 1000
--opt.useL1: Boolean = false
--opt.tolerance: Double = 1.0E-5
--opt.useStochastic: Boolean = false
--opt.randomSeed: Int = 0

Regularization is generally very important. Using a value of 1.0 usually works pretty well. 5.0 works better on the SpanModel for parsing. useStochastic turns on stochastic gradient descent (rather than full batch optimization). It makes training much faster, usually.
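
For intuition about what regularization and mini-batch (stochastic) updates do, here is a toy sketch in plain Scala. It is not Epic's or Breeze's optimizer: the one-dimensional least-squares objective, the fixed step size, and every name below are assumptions for illustration, and how the real --opt.* flags map onto these parameters is not shown here.

```scala
object ToySgd {
  /**
   * Fits a single weight w so that w * x approximates y, using mini-batch
   * gradient steps on 0.5 * (w*x - y)^2 plus an L2 penalty (lambda / 2) * w^2.
   */
  def fit(data: Vector[(Double, Double)], lambda: Double, batchSize: Int,
          stepSize: Double, epochs: Int): Double = {
    var w = 0.0
    for (_ <- 0 until epochs; batch <- data.grouped(batchSize)) {
      // Average loss gradient over the mini-batch.
      val lossGrad = batch.map { case (x, y) => (w * x - y) * x }.sum / batch.size
      // The L2 penalty adds lambda * w, pulling the weight toward zero.
      w -= stepSize * (lossGrad + lambda * w)
    }
    w
  }

  // Toy data generated by y = 2 * x.
  val data: Vector[(Double, Double)] = Vector((1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0))
  val plain: Double = fit(data, lambda = 0.0, batchSize = 2, stepSize = 0.1, epochs = 200)
  val regularized: Double = fit(data, lambda = 1.0, batchSize = 2, stepSize = 0.1, epochs = 200)
}
```

Without the penalty the weight converges to 2; with it, the weight settles at a smaller value. Each update looks at only batchSize examples, which is why stochastic training is usually faster per step than full batch optimization.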

epic's People

Contributors

adampauls, adampingel, alexnisnevich, bethard, briantopping, dlwh, gregdurrett, iantabolt, jacobandreas, jacopofar, jasonbaldridge, matbesancon, reactormonk

epic's Issues

Where is the parsing model?

Dear dlwh,

Thank you so much for publishing epic!

I have trained a new parsing model using the following command:
"java -cp target/scala-2.11/epic-assembly-0.3.jar epic.parser.models.ParserTrainer
--treebankType simple
--treebank.path "src/main/resources/smallbank"
--modelFactory epic.parser.models.SpanModelFactory
--cache.path constraints.cache
--opt.useStochastic true
--opt.regularization 1.0"

The result after training is as follows:
[Image: screen shot 2017-01-21 at 4 06 12 pm]

Could you please let me know where I can find the parsing model?

Quy Nguyen

Generality in CRFModel

Hi,
It would be nice to change every String in CRFModel to a type parameter W. Currently it requires String when you want to make a CRF.

Am I right?

Documentation updates.

1
val sentences: IndexedSeq[IndexedSeq[String]] = sentenceSplitter(text).map(tokenizer).toIndexedSeq

Should be
val sentences: IndexedSeq[IndexedSeq[String]] = sentenceSplitter(text).map(tokenizer(_)).toIndexedSeq

2
loadTagger returns an Option, so you need a for to access the tagger:
epic.models.PosTagSelector.loadTagger("en")

3
tagger.sentence)

Should be
tagger.bestSequence(sentence)

ClassCastException loading model in Apache Spark

Hi there,

I'm trying to use epic in an Apache Spark Streaming environment but I'm experiencing some difficulty loading the models. I'm not really sure whether this is an Epic issue, a Breeze issue, a Spark issue or where/how to solve this now! I get the following exception (for English NER):


Exception in thread "main" java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.HashMap$SerializationProxy to field epic.features.BrownClusterFeaturizer.epic$features$BrownClusterFeaturizer$$clusterFeatures of type scala.collection.immutable.Map in instance of epic.features.BrownClusterFeaturizer
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
    ... trimmed ...
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at breeze.util.package$.readObject(package.scala:21)
    at epic.models.package$.deserialize(package.scala:54)
        ... trimmed calls from my code ...

I've tried running my code (compiled into uberjar using 'sbt assembly') in a raw scala console and I can load the model and run it fine. However, using Spark, I get the exception described. The ONLY difference as far as I can tell is the way the model file is referenced. For the raw scala environment, I can point directly at the model file on disk (e.g. new File("mymodels/model.ser.gz")) and it loads. In Spark, I have to load the file doing something similar to:

sc.addFile("model.ser.gz")
new File(SparkFiles.get("model.ser.gz"))

I've tried narrowing the code down and depending whether I point at the model extracted from the jar or the jar itself I get the same result. It's definitely loading the file (I think) as it fails in other ways if the file doesn't exist. I even tried bypassing the Breeze nonStupidObjectInputStream to no avail.

Any idea what's going on or how to test? For reference, my JVM is 1.7.0_51 and same in both scala and Spark environments.

Thanks.

Packaging refactor

Hiyas, I'd like to create a PR for packaging and wanted to see if it would be accepted before spending time on it. I'm still stuck on e0238ce given epic-parser-en-span_2.11/2015.2.19 being incompatible with anything newer, so I could just as easily make the changes I need locally and remain forked.

What I'm after at the minimum is to create two modules, one for "core" and one for "tools". Goal here is to get Tika out of the core dependencies, used only in epic.preprocess.TextExtractor, which is really a command-line tool. I don't know how many other tools there are like this or what other effects it might have on the dependency closure, but I think it will be significant.

The reason I am even doing that is Tika depends on Apache POI and a kitchen sink of other detritus. POI has a split-package problem. Once everything is cleaned up, I should at least be able to make the core module into an OSGi bundle.

Lacks documentation

Hello,
How to use trained models with neural CRF parsing is not described.

Couldn't deserialize model

Sorry to bother you.
In epic/target/scala-2.11

java -Xmx4g -cp epic-assembly-0.4.4.jar epic.sequences.TagText --model epic-pos-en_2.10-2014.6.3-SNAPSHOT.jar --nthreads 4 test

[main] ERROR breeze.util.HashIndex$ - Deserializing an old-style HashIndex. Taking counter measures
Couldn't deserialize model due to exception, epic.features.WordFeature; local class incompatible: stream classdesc serialVersionUID = 1068855398746279726, local class serialVersionUID = 1. Trying classPathLoad...
Exception in thread "main" java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at epic.sequences.TagText$.classPathLoad(TagText.scala:20)
    at epic.sequences.TagText$.classPathLoad(TagText.scala:11)
    at epic.util.ProcessTextMain$class.main(ProcessTextMain.scala:47)
    at epic.sequences.TagText$.main(TagText.scala:11)
    at epic.sequences.TagText.main(TagText.scala)

I don't know how to solve the problem. It's the English model.

$ scala -version
Scala code runner version 2.12.4 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.

Implementation of CRF parser in another language

Hi, I would like to reimplement the CRF parser in another language (C++/flex, Python/Boost, and so on). I have read the paper "Less Grammar, More Features", which proposes more features for the CRF model. I am trying to figure out how the parser is implemented using the CRF model:

  1. Syntactic parser: implement transition functions (anchor rule production in a context-free grammar) to build up the nodes of a syntactic tree. (part 1)
  2. CRF-based model training with the proposed features, to provide a baseline for the syntactic transition function. (part 2)

I am reading the source code to understand how this works. I hope the author can help me figure out how these two parts are implemented.

For the first part, since I am going to use a neural CRF, more details about data preprocessing would be appreciated.

Have a nice weekend!

Serialization with Epic and Breeze Dependencies

My project depends on both epic and breeze (because they're both awesome!)

Unfortunately, in the process of upgrading to breeze 0.12 (we also depend on MLlib which depends on breeze 0.12 as of recent versions of spark) our code broke because the current English POS model:

"org.scalanlp" %% "epic-pos-en" % "2015.2.19"

Depends on breeze 0.11-M0.

If my project depends on both breeze 0.12 and this model, when I go to load the model I get the following:

scala> val model: SemiCRF[Any, String] = epic.models.NerSelector.loadNer("en").get
java.lang.ClassCastException: cannot assign instance of epic.lexicon.SimpleLexicon$SerializedForm to field epic.constraints.LabeledSpanConstraints$LayeredTagConstraintsFactory.lexicon of type epic.constraints.TagConstraints$Factory in instance of epic.constraints.LabeledSpanConstraints$LayeredTagConstraintsFactory
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1995)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at epic.models.ClassPathModelLoader.load(ModelLoader.scala:23)
	at epic.models.DelegatingLoader.load(ModelLoader.scala:33)
	at epic.models.NerSelector$.loadNer(NerModelLoader.scala:12)
	at .<init>(<console>:8)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
	at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
	at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
	at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
	at scala.tools.nsc.interpreter.ILoop.main(ILoop.scala:904)
	at xsbt.ConsoleInterface.run(ConsoleInterface.scala:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101)
	at sbt.compiler.AnalyzingCompiler.console(AnalyzingCompiler.scala:76)
	at sbt.Console.sbt$Console$$console0$1(Console.scala:22)
	at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply$mcV$sp(Console.scala:23)
	at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:23)
	at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:23)
	at sbt.Logger$$anon$4.apply(Logger.scala:85)
	at sbt.TrapExit$App.run(TrapExit.scala:248)
	at java.lang.Thread.run(Thread.java:724)


I'm not sure what the best way forward is here. I can envision:

  1. no longer relying on Java serialization for pre-trained models
  2. publishing a build of the models for each of several versions of Breeze
  3. some classpath hacks (?)
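
As a rough illustration of option 1, here is a minimal sketch of storing a model as plain text instead of a Java-serialized object graph. `ToyModel` and `ToyModelText` are hypothetical names made up for this example and are not Epic or Breeze classes; real models hold much more than a feature list and a weight vector, and the sketch assumes feature names contain no newlines.

```scala
import java.io.{BufferedReader, BufferedWriter}

// Hypothetical stand-in for a trained model: feature names plus one weight each.
// This is NOT an Epic class; it only illustrates the idea.
final case class ToyModel(features: Vector[String], weights: Vector[Double])

object ToyModelText {
  // Write one "feature<TAB>weight" line per feature; no Java serialization involved.
  def write(model: ToyModel, out: BufferedWriter): Unit = {
    model.features.zip(model.weights).foreach { case (f, w) =>
      out.write(f + "\t" + w.toString)
      out.newLine()
    }
    out.flush()
  }

  // Read the same format back. Only the text layout matters here,
  // not the version of whichever library defined the in-memory classes.
  def read(in: BufferedReader): ToyModel = {
    val rows = Iterator
      .continually(in.readLine())
      .takeWhile(_ != null)
      .map { line =>
        // Split on the last tab so a tab inside a feature name is tolerated.
        val tab = line.lastIndexOf('\t')
        (line.substring(0, tab), line.substring(tab + 1).toDouble)
      }
      .toVector
    ToyModel(rows.map(_._1), rows.map(_._2))
  }
}
```

The trade-off is that every model component needs its own reader and writer, but a file in a format like this should stay loadable when a dependency changes the binary layout of its classes.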

Production ready?

Hi, I just wanted to know whether Epic is production ready.

How does it compare or relate to CoreNLP?

I need a library that does POS tagging in both English and Spanish. I need to extract nouns or compound nouns from documents.

Documentation

It is really hard to evaluate this library when no documentation related to usage is available.

POS tagging fails on word "1stgeneration" with java.lang.AssertionError

build.sbt:

...
libraryDependencies += "org.scalanlp" %% "epic" % "0.4.3"
libraryDependencies += "org.scalanlp" %% "epic-pos-en" % "2017.3.10"
...

code snippet:

...
val tagger = epic.models.PosTagSelector.loadTagger("en").get
val titleTags = tagger.bestSequence(titleSentence)
...

titleSentence is a Vector of strings containing the string "1stgeneration".
When I try to tag the string "1stgeneration", my application fails with the error below (which is rather incomprehensible to me). Is this a bug, or can anyone help me fix it? My own research led nowhere.

[error] (run-main-0) java.lang.AssertionError: assertion failed: Vector(WordFeature(-LC-NUM-ion,'Class)) Index(IndicatorFeature(In),IndicatorFeature(an),IndicatorFeature(Oct.),WordFeature(-LC-NUM,'Class),WordFeature(-LC,'Class),IndicatorFeature(of),IndicatorFeature(``),IndicatorFeature(The),WordFeature(-INITC,'Class),IndicatorFeature(''),IndicatorFeature(at),IndicatorFeature(Chicago),IndicatorFeature('s),IndicatorFeature((),WordFeature(-INITC-ed,'Class),WordFeature(-INITC-s,'Class),IndicatorFeature(the),IndicatorFeature(in),WordFeature(-INITC-y,'Class),IndicatorFeature(City),IndicatorFeature(,),IndicatorFeature(&),IndicatorFeature()),IndicatorFeature(role),WordFeature(-LC-ed,'Class),IndicatorFeature(by),IndicatorFeature(was),WordFeature(-LC-ly,'Class),IndicatorFeature(to),IndicatorFeature(.),IndicatorFeature(Ms.),WordFeature(-LC-s,'Class),WordFeature(-CAPS-DASH,'Class),IndicatorFeature(Inc.),IndicatorFeature(said),IndicatorFeature(it),IndicatorFeature(expects),IndicatorFeature(its),IndicatorFeature(U.S.),IndicatorFeature(sales),IndicatorFeature(remain),WordFeature(-LC-y,'Class),IndicatorFeature(about),IndicatorFeature(cars),IndicatorFeature(1990),IndicatorFeature(auto),IndicatorFeature(maker),IndicatorFeature(last),IndicatorFeature(year),IndicatorFeature(sold),WordFeature(-INITC-er,'Class),IndicatorFeature(president),IndicatorFeature(and),IndicatorFeature(chief),IndicatorFeature(executive),IndicatorFeature(officer),IndicatorFeature(he),IndicatorFeature(growth),IndicatorFeature(for),IndicatorFeature(Britain),IndicatorFeature(Europe),IndicatorFeature(Eastern),IndicatorFeature(markets),WordFeature(-CAPS,'Class),WordFeature(-CAPS-s,'Class),IndicatorFeature(increased),IndicatorFeature(10),IndicatorFeature(cents),IndicatorFeature(from),IndicatorFeature(seven),IndicatorFeature(a),IndicatorFeature(share),IndicatorFeature(new),IndicatorFeature(rate),IndicatorFeature(will),IndicatorFeature(be),IndicatorFeature(15),IndicatorFeature(A),IndicatorFeature(record),IndicatorFeature(
has),IndicatorFeature(n't),IndicatorFeature(been),IndicatorFeature(set),IndicatorFeature(based),IndicatorFeature(Los),IndicatorFeature(Angeles),IndicatorFeature(makes),IndicatorFeature(computer),IndicatorFeature(building),IndicatorFeature(products),IndicatorFeature(Investors),IndicatorFeature(are),WordFeature(-LC-ing,'Class),IndicatorFeature(Securities),IndicatorFeature(Exchange),IndicatorFeature(Commission),IndicatorFeature(not),IndicatorFeature(their),IndicatorFeature(information),IndicatorFeature(stock),IndicatorFeature(corporate),IndicatorFeature(proposal),IndicatorFeature(some),IndicatorFeature(company),IndicatorFeature(executives),IndicatorFeature(would),IndicatorFeature(on),WordFeature(-LC-er,'Class),IndicatorFeature(as),WordFeature(-LC-DASH,'Class),IndicatorFeature(individual),IndicatorFeature(investors),WordFeature(-LC-al,'Class),IndicatorFeature(money),IndicatorFeature(managers),IndicatorFeature(They),IndicatorFeature(make),IndicatorFeature(agency),IndicatorFeature(changes),IndicatorFeature(proposed),IndicatorFeature(this),IndicatorFeature(past),IndicatorFeature(that),IndicatorFeature(among),IndicatorFeature(other),IndicatorFeature(things),IndicatorFeature(many),IndicatorFeature(own),IndicatorFeature(companies),IndicatorFeature('),IndicatorFeature(shares),IndicatorFeature(also),IndicatorFeature(allow),IndicatorFeature(report),IndicatorFeature(options),IndicatorFeature(later),IndicatorFeature(less),IndicatorFeature(often),IndicatorFeature(Many),IndicatorFeature(investor),IndicatorFeature(so),IndicatorFeature(1987),IndicatorFeature(market),IndicatorFeature(--),IndicatorFeature(already),IndicatorFeature(against),IndicatorFeature(little),IndicatorFeature(any),IndicatorFeature(might),IndicatorFeature(get),IndicatorFeature(out),IndicatorFeature(stocks),IndicatorFeature(paid),IndicatorFeature(level),IndicatorFeature(one),IndicatorFeature(received),IndicatorFeature(since),IndicatorFeature(were),IndicatorFeature(17),WordFeature(-INITC-ly,'Class),WordFeature(-LC-ion
,'Class),IndicatorFeature(did),IndicatorFeature(really),IndicatorFeature(believe),IndicatorFeature(rules),IndicatorFeature(force),IndicatorFeature(directors),IndicatorFeature(within),IndicatorFeature(month),IndicatorFeature(after),IndicatorFeature(transaction),IndicatorFeature(But),IndicatorFeature(25),IndicatorFeature(%),IndicatorFeature(according),IndicatorFeature(figures),IndicatorFeature(reports),IndicatorFeature(late),IndicatorFeature(effort),IndicatorFeature(federal),IndicatorFeature(boost),IndicatorFeature(who),IndicatorFeature(special),IndicatorFeature(office),IndicatorFeature(policy),IndicatorFeature(which),IndicatorFeature(officials),IndicatorFeature(had),IndicatorFeature(until),IndicatorFeature(today),IndicatorFeature(comment),IndicatorFeature(issue),IndicatorFeature(more),IndicatorFeature(than),IndicatorFeature(almost),IndicatorFeature(Mr.),IndicatorFeature(probably),IndicatorFeature(vote),IndicatorFeature(early),IndicatorFeature(next),IndicatorFeature(all),IndicatorFeature(those),IndicatorFeature(Committee),IndicatorFeature(Federal),WordFeature(-INITC-ion,'Class),IndicatorFeature(American),IndicatorFeature(Association),IndicatorFeature(example),IndicatorFeature({),IndicatorFeature(law),IndicatorFeature(}),IndicatorFeature(business),IndicatorFeature(What),IndicatorFeature(most),IndicatorFeature(is),IndicatorFeature(effect),IndicatorFeature(they),IndicatorFeature(say),IndicatorFeature(have),WordFeature(-LC-ity,'Class),IndicatorFeature(trading),IndicatorFeature(activity),IndicatorFeature(buying),IndicatorFeature(or),IndicatorFeature(selling),IndicatorFeature(director),IndicatorFeature(short),IndicatorFeature(period),IndicatorFeature(time),WordFeature(-INITC-ing,'Class),IndicatorFeature(estimates),IndicatorFeature(cut),IndicatorFeature(third),IndicatorFeature(such),IndicatorFeature(marketing),IndicatorFeature(finance),IndicatorFeature(research),IndicatorFeature(development),IndicatorFeature(still),IndicatorFeature(required),IndicatorFeature(annual),Indicato
rFeature(under),IndicatorFeature(least),IndicatorFeature(if),IndicatorFeature(following),IndicatorFeature(Robert),IndicatorFeature(North),IndicatorFeature(data),IndicatorFeature(key),IndicatorFeature(may),IndicatorFeature(while),IndicatorFeature(them),IndicatorFeature(when),IndicatorFeature(do),IndicatorFeature(want),IndicatorFeature(change),IndicatorFeature(should),IndicatorFeature(Congress),IndicatorFeature(added),IndicatorFeature(likely),IndicatorFeature(legislation),IndicatorFeature(basis),IndicatorFeature(nation),IndicatorFeature(largest),IndicatorFeature(fund),IndicatorFeature($),IndicatorFeature(billion),IndicatorFeature(employees),IndicatorFeature(plans),IndicatorFeature(offer),IndicatorFeature(two),IndicatorFeature(investment),IndicatorFeature(million),WordFeature(-INITC-ity,'Class),IndicatorFeature(bond),WordFeature(-LC-est,'Class),IndicatorFeature(Both),IndicatorFeature(funds),IndicatorFeature(expected),IndicatorFeature(begin),IndicatorFeature(around),IndicatorFeature(March),IndicatorFeature(1),IndicatorFeature(subject),IndicatorFeature(approval),IndicatorFeature(For),IndicatorFeature(up),IndicatorFeature(must),IndicatorFeature(plan),IndicatorFeature(Some),IndicatorFeature(institutions),IndicatorFeature(part),IndicatorFeature(agreement),IndicatorFeature(pressure),IndicatorFeature(provide),IndicatorFeature(reached),IndicatorFeature(with),IndicatorFeature(December),IndicatorFeature(securities),IndicatorFeature(South),IndicatorFeature(power),IndicatorFeature(cases),IndicatorFeature(investments),IndicatorFeature(significant),IndicatorFeature(going),IndicatorFeature(into),IndicatorFeature(bonds),IndicatorFeature(including),IndicatorFeature(much),IndicatorFeature(foreign),IndicatorFeature(buy),IndicatorFeature(sell),IndicatorFeature(futures),IndicatorFeature(contracts),IndicatorFeature(New),IndicatorFeature(York),IndicatorFeature(State),IndicatorFeature(Department),IndicatorFeature(Under),IndicatorFeature(able),IndicatorFeature(receive),IndicatorFeature(cash),I
ndicatorFeature(offered),IndicatorFeature(currently),IndicatorFeature(limited),IndicatorFeature(Co.),IndicatorFeature(equipment),IndicatorFeature(shareholders),IndicatorFeature(right),IndicatorFeature(purchase),IndicatorFeature(half),IndicatorFeature(price),IndicatorFeature(certain),IndicatorFeature(takeover),IndicatorFeature(years),IndicatorFeature(old),IndicatorFeature(senior),IndicatorFeature(vice),IndicatorFeature(concern),IndicatorFeature(technology),IndicatorFeature(group),IndicatorFeature(position),IndicatorFeature(work),IndicatorFeature(lot),IndicatorFeature(because),IndicatorFeature(taken),IndicatorFeature(hard),IndicatorFeature(line),IndicatorFeature(problem),IndicatorFeature(:),IndicatorFeature(He),IndicatorFeature(does),IndicatorFeature(same),IndicatorFeature(over),IndicatorFeature(again),IndicatorFeature(Richard),IndicatorFeature(ago),IndicatorFeature(very),IndicatorFeature(like),IndicatorFeature(dropped),IndicatorFeature(though),IndicatorFeature(now),IndicatorFeature(show),IndicatorFeature(few),IndicatorFeature(thing),WordFeature(-INITC-DASH,'Class),IndicatorFeature(you),IndicatorFeature(ca),IndicatorFeature(his),IndicatorFeature(commercial),IndicatorFeature(what),IndicatorFeature(His),IndicatorFeature(recent),WordFeature(-INITC-al,'Class),IndicatorFeature(case),IndicatorFeature(point),IndicatorFeature(It),IndicatorFeature(black),IndicatorFeature(suit),IndicatorFeature(announced),IndicatorFeature(just),IndicatorFeature(family),IndicatorFeature(her),WordFeature(-CAPS-DASH-s,'Class),IndicatorFeature(no),IndicatorFeature(could),IndicatorFeature(well),IndicatorFeature(second),IndicatorFeature(And),IndicatorFeature(went),IndicatorFeature(through),IndicatorFeature(first),IndicatorFeature(longer),IndicatorFeature(five),IndicatorFeature(An),IndicatorFeature(beginning),IndicatorFeature(lead),IndicatorFeature(high),IndicatorFeature(off),IndicatorFeature(Air),IndicatorFeature(him),IndicatorFeature(great),IndicatorFeature(then),IndicatorFeature(way),IndicatorFeatu
re(end),IndicatorFeature(however),IndicatorFeature(ever),IndicatorFeature(but),IndicatorFeature(too),IndicatorFeature(away),IndicatorFeature(That),IndicatorFeature(before),IndicatorFeature(during),IndicatorFeature(?),IndicatorFeature(take),IndicatorFeature(us),IndicatorFeature(11),IndicatorFeature(1\/2),IndicatorFeature(without),IndicatorFeature(having),IndicatorFeature(future),IndicatorFeature(can),IndicatorFeature(only),IndicatorFeature(performance),IndicatorFeature(One),IndicatorFeature(President),IndicatorFeature(international),IndicatorFeature(fact),IndicatorFeature(world),IndicatorFeature(This),IndicatorFeature(members),IndicatorFeature(United),IndicatorFeature(Now),IndicatorFeature(Bush),IndicatorFeature(decision),IndicatorFeature(we),IndicatorFeature(think),IndicatorFeature(along),IndicatorFeature(left),IndicatorFeature(financial),IndicatorFeature(top),IndicatorFeature(got),IndicatorFeature(personal),IndicatorFeature(several),IndicatorFeature(plants),IndicatorFeature(France),IndicatorFeature(sent),IndicatorFeature(even),IndicatorFeature(projects),IndicatorFeature(continue),IndicatorFeature(International),IndicatorFeature(means),IndicatorFeature(West),IndicatorFeature(pay),IndicatorFeature(World),IndicatorFeature(give),IndicatorFeature(government),IndicatorFeature(rights),IndicatorFeature(;),IndicatorFeature(held),IndicatorFeature(support),IndicatorFeature(impact),IndicatorFeature(Soviet),IndicatorFeature(holding),IndicatorFeature(seems),IndicatorFeature(current),IndicatorFeature(public),IndicatorFeature(charge),IndicatorFeature(programs),IndicatorFeature(former),IndicatorFeature(head),IndicatorFeature(military),WordFeature(-LC-DASH-s,'Class),IndicatorFeature(division),IndicatorFeature(staff),IndicatorFeature(working),IndicatorFeature(once),IndicatorFeature(budget),IndicatorFeature(changed),IndicatorFeature(John),IndicatorFeature(state),IndicatorFeature(told),IndicatorFeature(soon),IndicatorFeature(week),IndicatorFeature(raise),WordFeature(-LC-NUM-DASH,'Class
),IndicatorFeature(French),IndicatorFeature(owns),IndicatorFeature(Other),IndicatorFeature(countries),IndicatorFeature(Germany),IndicatorFeature(continued),IndicatorFeature(We),IndicatorFeature(see),IndicatorFeature(free),IndicatorFeature(course),IndicatorFeature(made),IndicatorFeature(largely),IndicatorFeature(these),IndicatorFeature(home),IndicatorFeature(times),IndicatorFeature(there),IndicatorFeature(reason),IndicatorFeature(Western),IndicatorFeature(Systems),IndicatorFeature(number),IndicatorFeature(plant),IndicatorFeature(production),IndicatorFeature(itself),IndicatorFeature(another),IndicatorFeature(known),IndicatorFeature(similar),IndicatorFeature(seen),IndicatorFeature(especially),IndicatorFeature(subsidiary),IndicatorFeature(producers),IndicatorFeature(On),IndicatorFeature(process),IndicatorFeature(each),IndicatorFeature(making),IndicatorFeature(using),WordFeature(-LC-NUM-s,'Class),IndicatorFeature(20),IndicatorFeature(majority),IndicatorFeature(difficult),IndicatorFeature(China),IndicatorFeature(workers),IndicatorFeature(country),IndicatorFeature(At),IndicatorFeature(major),IndicatorFeature(Canada),IndicatorFeature(University),IndicatorFeature(California),IndicatorFeature(every),IndicatorFeature(hurt),IndicatorFeature(order),IndicatorFeature(large),IndicatorFeature(enough),IndicatorFeature(produce),IndicatorFeature(yield),IndicatorFeature(30),IndicatorFeature(used),IndicatorFeature(include),IndicatorFeature(need),IndicatorFeature(costs),IndicatorFeature(demand),IndicatorFeature(Co),IndicatorFeature(problems),IndicatorFeature(However),IndicatorFeature(Paul),IndicatorFeature(Corp.),IndicatorFeature(Calif.),IndicatorFeature(called),IndicatorFeature(remains),IndicatorFeature(active),IndicatorFeature(days),IndicatorFeature(rather),IndicatorFeature(There),IndicatorFeature(acquire),IndicatorFeature(try),IndicatorFeature(between),IndicatorFeature(Bank),IndicatorFeature(Japan),IndicatorFeature('re),IndicatorFeature(being),IndicatorFeature(latest),IndicatorFeature(
50),IndicatorFeature(Japanese),IndicatorFeature(banks),IndicatorFeature(led),IndicatorFeature(help),IndicatorFeature(loans),IndicatorFeature(bank),IndicatorFeature(owned),IndicatorFeature(put),IndicatorFeature(financing),IndicatorFeature(biggest),IndicatorFeature(far),IndicatorFeature(domestic),IndicatorFeature(issues),IndicatorFeature(3),IndicatorFeature(dollars),IndicatorFeature(4),IndicatorFeature(come),IndicatorFeature(fiscal),IndicatorFeature(June),IndicatorFeature(With),IndicatorFeature(here),IndicatorFeature(David),IndicatorFeature(Tuesday),IndicatorFeature(Ltd.),IndicatorFeature(National),IndicatorFeature(I),IndicatorFeature(interests),IndicatorFeature(banking),IndicatorFeature(lower),IndicatorFeature(months),IndicatorFeature(Meanwhile),IndicatorFeature(smaller),IndicatorFeature(seeking),IndicatorFeature(open),IndicatorFeature(full),IndicatorFeature(competition),IndicatorFeature(capital),IndicatorFeature(run),IndicatorFeature(both),IndicatorFeature(businesses),IndicatorFeature(started),IndicatorFeature(venture),IndicatorFeature(use),IndicatorFeature(accounts),IndicatorFeature(account),IndicatorFeature(London),IndicatorFeature(move),IndicatorFeature(brokerage),IndicatorFeature(firms),IndicatorFeature(simply),IndicatorFeature(reported),IndicatorFeature(third-quarter),IndicatorFeature(earnings),IndicatorFeature(yesterday),IndicatorFeature(average),IndicatorFeature(analysts),IndicatorFeature(higher),IndicatorFeature(named),IndicatorFeature(industrial),IndicatorFeature(systems),IndicatorFeature(A.),IndicatorFeature(Monday),IndicatorFeature(equity),IndicatorFeature(As),IndicatorFeature(manager),IndicatorFeature(different),IndicatorFeature(four),IndicatorFeature(Series),IndicatorFeature(says),IndicatorFeature(long),IndicatorFeature(good),IndicatorFeature(television),IndicatorFeature(contract),IndicatorFeature(near),IndicatorFeature(down),IndicatorFeature(involved),IndicatorFeature(always),IndicatorFeature(You),IndicatorFeature(hit),IndicatorFeature(lawyers),Indicat
orFeature(real),IndicatorFeature(estate),IndicatorFeature(better),IndicatorFeature(never),IndicatorFeature(car),IndicatorFeature(six),IndicatorFeature(James),IndicatorFeature(When),IndicatorFeature(offering),IndicatorFeature(available),IndicatorFeature(life),IndicatorFeature(back),IndicatorFeature(clear),IndicatorFeature(themselves),IndicatorFeature(something),IndicatorFeature(close),IndicatorFeature(Washington),IndicatorFeature(claims),IndicatorFeature(insurance),IndicatorFeature(Texas),IndicatorFeature(team),IndicatorFeature(me),IndicatorFeature(White),WordFeature(-CAPS-y,'Class),IndicatorFeature(Big),IndicatorFeature(These),IndicatorFeature(While),IndicatorFeature(getting),IndicatorFeature(man),IndicatorFeature(magazine),IndicatorFeature(people),IndicatorFeature(Most),IndicatorFeature(others),IndicatorFeature(recently),IndicatorFeature('m),IndicatorFeature(look),IndicatorFeature(whether),IndicatorFeature(So),IndicatorFeature(base),IndicatorFeature(Still),IndicatorFeature(how),IndicatorFeature(expect),IndicatorFeature(three),IndicatorFeature(Even),IndicatorFeature(If),IndicatorFeature(know),IndicatorFeature(wo),IndicatorFeature(lost),IndicatorFeature(become),IndicatorFeature(control),IndicatorFeature(keep),IndicatorFeature(After),IndicatorFeature(disclosed),IndicatorFeature(operations),IndicatorFeature(local),IndicatorFeature(service),IndicatorFeature(states),IndicatorFeature(build),IndicatorFeature(gas),IndicatorFeature(acquisition),IndicatorFeature(April),IndicatorFeature(completed),IndicatorFeature(common),IndicatorFeature(outstanding),IndicatorFeature(assets),IndicatorFeature(department),IndicatorFeature(Nov.),IndicatorFeature(previously),IndicatorFeature(operating),IndicatorFeature(retail),IndicatorFeature(Bay),IndicatorFeature(October),IndicatorFeature(31),IndicatorFeature(1989),IndicatorFeature(interest),IndicatorFeature(rates),IndicatorFeature(below),IndicatorFeature(general),IndicatorFeature(levels),WordFeature(-CAPS-al,'Class),IndicatorFeature(9),Indicat
orFeature(8),IndicatorFeature(low),IndicatorFeature(7\/8),IndicatorFeature(bid),IndicatorFeature(traded),IndicatorFeature(Inc),IndicatorFeature(7),IndicatorFeature(3\/4),IndicatorFeature(brokers),IndicatorFeature(exchange),WordFeature(-CAPS-er,'Class),IndicatorFeature(General),IndicatorFeature(60),IndicatorFeature(150),IndicatorFeature(notes),IndicatorFeature(dealers),IndicatorFeature(Average),IndicatorFeature(unit),IndicatorFeature(credit),IndicatorFeature(5\/8),IndicatorFeature(3\/8),WordFeature(-CAPS-ed,'Class),IndicatorFeature(dollar),IndicatorFeature(auction),IndicatorFeature(bills),IndicatorFeature(face),IndicatorFeature(value),IndicatorFeature(units),IndicatorFeature(13),IndicatorFeature(weeks),IndicatorFeature(mortgage),IndicatorFeature(2),WordFeature(-CAPS-ion,'Class),IndicatorFeature(priced),IndicatorFeature(return),IndicatorFeature(returns),IndicatorFeature(product),IndicatorFeature(rose),IndicatorFeature(August),IndicatorFeature(result),IndicatorFeature(year-earlier),IndicatorFeature(total),IndicatorFeature(goods),IndicatorFeature(services),IndicatorFeature(July),IndicatorFeature(index),IndicatorFeature(September),IndicatorFeature(decline),IndicatorFeature(industry),IndicatorFeature(Dow),IndicatorFeature(Jones),IndicatorFeature(acquired),IndicatorFeature(Financial),IndicatorFeature(loan),IndicatorFeature(losses),IndicatorFeature(construction),IndicatorFeature(caused),IndicatorFeature(previous),IndicatorFeature(sale),IndicatorFeature(asked),IndicatorFeature(British),IndicatorFeature(Jaguar),IndicatorFeature(PLC),IndicatorFeature(House),IndicatorFeature(best),IndicatorFeature(quickly),IndicatorFeature(possible),IndicatorFeature(leading),IndicatorFeature(Ford),IndicatorFeature(trying),IndicatorFeature(GM),IndicatorFeature(joint),IndicatorFeature(stake),IndicatorFeature(action),IndicatorFeature(management),IndicatorFeature(deal),IndicatorFeature(European),IndicatorFeature(analyst),IndicatorFeature(Stock),IndicatorFeature(gain),IndicatorFeature(heavy),Indicat
orFeature(volume),IndicatorFeature(closed),IndicatorFeature(Analysts),IndicatorFeature(#),IndicatorFeature(Sept.),IndicatorFeature(independent),IndicatorFeature(wants),IndicatorFeature(huge),IndicatorFeature(Institute),IndicatorFeature(talks),IndicatorFeature(agreed),IndicatorFeature(either),IndicatorFeature(start),IndicatorFeature(statement),IndicatorFeature(board),IndicatorFeature(holders),IndicatorFeature(meeting),IndicatorFeature(noted),IndicatorFeature(call),IndicatorFeature(declined),IndicatorFeature(our),IndicatorFeature(further),IndicatorFeature(Although),IndicatorFeature(yet),IndicatorFeature(long-term),IndicatorFeature(taking),IndicatorFeature(risk),IndicatorFeature(administration),IndicatorFeature(done),IndicatorFeature(economy),IndicatorFeature(Mrs.),IndicatorFeature(America),IndicatorFeature(revenue),IndicatorFeature(William),IndicatorFeature(additional),IndicatorFeature(chairman),IndicatorFeature(parent),IndicatorFeature(due),IndicatorFeature(parts),IndicatorFeature(D.),IndicatorFeature(J.),IndicatorFeature(nine),IndicatorFeature(ended),IndicatorFeature(net),IndicatorFeature(loss),IndicatorFeature(compared),IndicatorFeature(national),IndicatorFeature(Sales),IndicatorFeature(fell),IndicatorFeature(earlier),IndicatorFeature(ahead),IndicatorFeature(12),IndicatorFeature(showed),IndicatorFeature(Tokyo),IndicatorFeature(points),IndicatorFeature(First),IndicatorFeature(estimated),IndicatorFeature(failed),IndicatorFeature(bought),IndicatorFeature(official),IndicatorFeature(profits),IndicatorFeature(despite),IndicatorFeature(political),IndicatorFeature(14),IndicatorFeature(inflation),IndicatorFeature(consumer),IndicatorFeature(prices),IndicatorFeature(economic),IndicatorFeature(oil),IndicatorFeature(although),IndicatorFeature(supply),IndicatorFeature(day),IndicatorFeature(gains),IndicatorFeature(traders),IndicatorFeature(40),IndicatorFeature(gained),IndicatorFeature(turned),IndicatorFeature(news),IndicatorFeature(Friday),IndicatorFeature(yen),IndicatorFeature(W
all),IndicatorFeature(Street),IndicatorFeature(makers),IndicatorFeature(scheduled),IndicatorFeature(currency),IndicatorFeature(helped),IndicatorFeature(health),IndicatorFeature(Hong),IndicatorFeature(Kong),IndicatorFeature(Morgan),IndicatorFeature(Capital),IndicatorFeature(To),IndicatorFeature(100),IndicatorFeature(percentage),IndicatorFeature(food),IndicatorFeature(rise),IndicatorFeature(above),IndicatorFeature(strong),IndicatorFeature(orders),IndicatorFeature(measure),IndicatorFeature(name),IndicatorFeature(5),IndicatorFeature(Board),IndicatorFeature(increase),IndicatorFeature(terms),IndicatorFeature(slightly),IndicatorFeature(Treasury),IndicatorFeature(bill),IndicatorFeature(considered),IndicatorFeature(increases),IndicatorFeature(Among),IndicatorFeature(range),IndicatorFeature(San),IndicatorFeature(Francisco),IndicatorFeature(my),IndicatorFeature(go),IndicatorFeature(kind),IndicatorFeature(500),IndicatorFeature(6),IndicatorFeature(began),IndicatorFeature(came),IndicatorFeature(hold),IndicatorFeature(turn),IndicatorFeature(view),IndicatorFeature(your),IndicatorFeature(coming),IndicatorFeature(doing),IndicatorFeature(whose),WordFeature(-LC-NUM-DASH-s,'Class),IndicatorFeature(where),IndicatorFeature(aid),IndicatorFeature(included),IndicatorFeature(efforts),IndicatorFeature(fall),IndicatorFeature(saying),IndicatorFeature(damage),IndicatorFeature(drop),IndicatorFeature(reduce),IndicatorFeature(Fed),IndicatorFeature(Chairman),IndicatorFeature(Rep.),IndicatorFeature(instead),IndicatorFeature(spokesman),IndicatorFeature(German),IndicatorFeature(trade),IndicatorFeature(firm),IndicatorFeature(Michael),IndicatorFeature(partner),IndicatorFeature(defense),IndicatorFeature(raised),IndicatorFeature(potential),IndicatorFeature(leader),IndicatorFeature(seem),IndicatorFeature(Airlines),IndicatorFeature(1988),IndicatorFeature(nearly),IndicatorFeature(Pacific),IndicatorFeature(area),IndicatorFeature(gold),IndicatorFeature(growing),IndicatorFeature(income),IndicatorFeature(By),Indic
atorFeature(results),IndicatorFeature(given),IndicatorFeature(Dec.),IndicatorFeature(quarter),IndicatorFeature(approved),IndicatorFeature(18),IndicatorFeature(airline),IndicatorFeature(network),IndicatorFeature(Drexel),IndicatorFeature(leaders),IndicatorFeature(Senate),IndicatorFeature(program),IndicatorFeature(includes),IndicatorFeature(history),IndicatorFeature(labor),IndicatorFeature(small),IndicatorFeature(meet),IndicatorFeature(job),IndicatorFeature(toward),IndicatorFeature(tax),IndicatorFeature(thought),IndicatorFeature(Industries),WordFeature(-INITC-NUM-DASH,'Class),IndicatorFeature(debt),IndicatorFeature(paper),IndicatorFeature(situation),IndicatorFeature(manufacturing),IndicatorFeature(profit),IndicatorFeature(planned),IndicatorFeature(composite),IndicatorFeature(fourth),IndicatorFeature(particularly),IndicatorFeature(reduced),IndicatorFeature(found),IndicatorFeature(All),IndicatorFeature(Boston),IndicatorFeature(settlement),IndicatorFeature(charges),IndicatorFeature(customers),IndicatorFeature(computers),IndicatorFeature(East),IndicatorFeature(system),IndicatorFeature(1986),IndicatorFeature(lines),IndicatorFeature(legal),IndicatorFeature(took),IndicatorFeature(Corp),WordFeature(-CAPS-NUM,'Class),IndicatorFeature(cost),IndicatorFeature(concerns),IndicatorFeature(she),IndicatorFeature(Last),IndicatorFeature(Group),IndicatorFeature(amount),IndicatorFeature(deficit),IndicatorFeature(issued),IndicatorFeature(Trust),IndicatorFeature(spending),IndicatorFeature(bad),IndicatorFeature(big),IndicatorFeature(question),IndicatorFeature(city),IndicatorFeature('ve),IndicatorFeature(house),IndicatorFeature('ll),IndicatorFeature(attorney),IndicatorFeature(dividend),IndicatorFeature(16),WordFeature(-CAPS-est,'Class),IndicatorFeature(payments),IndicatorFeature(trust),IndicatorFeature(portfolio),IndicatorFeature(note),IndicatorFeature(addition),IndicatorFeature(Judge),IndicatorFeature(judge),IndicatorFeature(steel),IndicatorFeature(court),IndicatorFeature(find),IndicatorFeatu
re(areas),IndicatorFeature(clients),IndicatorFeature(outside),IndicatorFeature(Court),IndicatorFeature(...),IndicatorFeature(hours),IndicatorFeature(filing),IndicatorFeature(filed),IndicatorFeature(Union),IndicatorFeature(earthquake),IndicatorFeature(private),IndicatorFeature(1\/4),IndicatorFeature(S&P),IndicatorFeature(Merrill),IndicatorFeature(Lynch),WordFeature(-INITC-est,'Class),IndicatorFeature(via),IndicatorFeature(200),IndicatorFeature(marks),IndicatorFeature(Revenue),IndicatorFeature(taxes),IndicatorFeature(creditors),IndicatorFeature(Since),IndicatorFeature(strategy),IndicatorFeature(Canadian),IndicatorFeature(property),IndicatorFeature(IBM),IndicatorFeature(Business),IndicatorFeature(place),IndicatorFeature(needed),IndicatorFeature(jumped),IndicatorFeature(project),IndicatorFeature(Warner),IndicatorFeature(CBS),IndicatorFeature(committee),IndicatorFeature(advertising),IndicatorFeature(ad),IndicatorFeature(`),IndicatorFeature(campaign),IndicatorFeature(stores),WordFeature(-CAPS-ing,'Class),WordFeature(-CAPS-NUM-DASH,'Class),IndicatorFeature(pilots),IndicatorFeature(estimate),IndicatorFeature(calls),IndicatorFeature(union),IndicatorFeature(drug),IndicatorFeature(important),IndicatorFeature(adds),IndicatorFeature(eight),WordFeature(-INITC-NUM,'Class),IndicatorFeature(George),IndicatorFeature(groups),IndicatorFeature(conference),IndicatorFeature(looking),IndicatorFeature(TV),IndicatorFeature(quake),IndicatorFeature(posted),IndicatorFeature(related),IndicatorFeature(Sen.),IndicatorFeature(22),IndicatorFeature(Wednesday),IndicatorFeature(reserves),IndicatorFeature(restructuring),IndicatorFeature(buyers),IndicatorFeature(buy-out),IndicatorFeature(Shearson),IndicatorFeature(UAL),IndicatorFeature(francs),IndicatorFeature(1\/8),WordFeature(-INITC-DASH-s,'Class),IndicatorFeature(junk),IndicatorFeature(study),IndicatorFeature(Thursday),IndicatorFeature(abortion),WordFeature(-CAPS-NUM-DASH-s,'Class),IndicatorFeature(Noriega),WordFeature(-CAPS-ly,'Class),WordFeature(-IN
ITC-NUM-DASH-s,'Class),WordFeature(-CAPS-ity,'Class),WordFeature(-INITC-NUM-s,'Class))
[error] java.lang.AssertionError: assertion failed: Vector(WordFeature(-LC-NUM-ion,'Class)) Index(IndicatorFeature(In),IndicatorFeature(an),IndicatorFeature(Oct.),WordFeature(-LC-NUM,'Class),WordFeature(-LC,'Class),IndicatorFeature(of),IndicatorFeature(``),IndicatorFeature(The),WordFeature(-INITC,'Class),IndicatorFeature(''),IndicatorFeature(at),IndicatorFeature(Chicago),IndicatorFeature('s),IndicatorFeature((),WordFeature(-INITC-ed,'Class),WordFeature(-INITC-s,'Class),IndicatorFeature(the),IndicatorFeature(in),WordFeature(-INITC-y,'Class),IndicatorFeature(City),IndicatorFeature(,),IndicatorFeature(&),IndicatorFeature()),IndicatorFeature(role),WordFeature(-LC-ed,'Class),IndicatorFeature(by),IndicatorFeature(was),WordFeature(-LC-ly,'Class),IndicatorFeature(to),IndicatorFeature(.),IndicatorFeature(Ms.),WordFeature(-LC-s,'Class),WordFeature(-CAPS-DASH,'Class),IndicatorFeature(Inc.),IndicatorFeature(said),IndicatorFeature(it),IndicatorFeature(expects),IndicatorFeature(its),IndicatorFeature(U.S.),IndicatorFeature(sales),IndicatorFeature(remain),WordFeature(-LC-y,'Class),IndicatorFeature(about),IndicatorFeature(cars),IndicatorFeature(1990),IndicatorFeature(auto),IndicatorFeature(maker),IndicatorFeature(last),IndicatorFeature(year),IndicatorFeature(sold),WordFeature(-INITC-er,'Class),IndicatorFeature(president),IndicatorFeature(and),IndicatorFeature(chief),IndicatorFeature(executive),IndicatorFeature(officer),IndicatorFeature(he),IndicatorFeature(growth),IndicatorFeature(for),IndicatorFeature(Britain),IndicatorFeature(Europe),IndicatorFeature(Eastern),IndicatorFeature(markets),WordFeature(-CAPS,'Class),WordFeature(-CAPS-s,'Class),IndicatorFeature(increased),IndicatorFeature(10),IndicatorFeature(cents),IndicatorFeature(from),IndicatorFeature(seven),IndicatorFeature(a),IndicatorFeature(share),IndicatorFeature(new),IndicatorFeature(rate),IndicatorFeature(will),IndicatorFeature(be),IndicatorFeature(15),IndicatorFeature(A),IndicatorFeature(record),IndicatorFeature(has),Indicato
rFeature(n't),IndicatorFeature(been),IndicatorFeature(set),IndicatorFeature(based),IndicatorFeature(Los),IndicatorFeature(Angeles),IndicatorFeature(makes),IndicatorFeature(computer),IndicatorFeature(building),IndicatorFeature(products),IndicatorFeature(Investors),IndicatorFeature(are),WordFeature(-LC-ing,'Class),IndicatorFeature(Securities),IndicatorFeature(Exchange),IndicatorFeature(Commission),IndicatorFeature(not),IndicatorFeature(their),IndicatorFeature(information),IndicatorFeature(stock),IndicatorFeature(corporate),IndicatorFeature(proposal),IndicatorFeature(some),IndicatorFeature(company),IndicatorFeature(executives),IndicatorFeature(would),IndicatorFeature(on),WordFeature(-LC-er,'Class),IndicatorFeature(as),WordFeature(-LC-DASH,'Class),IndicatorFeature(individual),IndicatorFeature(investors),WordFeature(-LC-al,'Class),IndicatorFeature(money),IndicatorFeature(managers),IndicatorFeature(They),IndicatorFeature(make),IndicatorFeature(agency),IndicatorFeature(changes),IndicatorFeature(proposed),IndicatorFeature(this),IndicatorFeature(past),IndicatorFeature(that),IndicatorFeature(among),IndicatorFeature(other),IndicatorFeature(things),IndicatorFeature(many),IndicatorFeature(own),IndicatorFeature(companies),IndicatorFeature('),IndicatorFeature(shares),IndicatorFeature(also),IndicatorFeature(allow),IndicatorFeature(report),IndicatorFeature(options),IndicatorFeature(later),IndicatorFeature(less),IndicatorFeature(often),IndicatorFeature(Many),IndicatorFeature(investor),IndicatorFeature(so),IndicatorFeature(1987),IndicatorFeature(market),IndicatorFeature(--),IndicatorFeature(already),IndicatorFeature(against),IndicatorFeature(little),IndicatorFeature(any),IndicatorFeature(might),IndicatorFeature(get),IndicatorFeature(out),IndicatorFeature(stocks),IndicatorFeature(paid),IndicatorFeature(level),IndicatorFeature(one),IndicatorFeature(received),IndicatorFeature(since),IndicatorFeature(were),IndicatorFeature(17),WordFeature(-INITC-ly,'Class),WordFeature(-LC-ion,'Class),Indi
catorFeature(did),IndicatorFeature(really),IndicatorFeature(believe),IndicatorFeature(rules),IndicatorFeature(force),IndicatorFeature(directors),IndicatorFeature(within),IndicatorFeature(month),IndicatorFeature(after),IndicatorFeature(transaction),IndicatorFeature(But),IndicatorFeature(25),IndicatorFeature(%),IndicatorFeature(according),IndicatorFeature(figures),IndicatorFeature(reports),IndicatorFeature(late),IndicatorFeature(effort),IndicatorFeature(federal),IndicatorFeature(boost),IndicatorFeature(who),IndicatorFeature(special),IndicatorFeature(office),IndicatorFeature(policy),IndicatorFeature(which),IndicatorFeature(officials),IndicatorFeature(had),IndicatorFeature(until),IndicatorFeature(today),IndicatorFeature(comment),IndicatorFeature(issue),IndicatorFeature(more),IndicatorFeature(than),IndicatorFeature(almost),IndicatorFeature(Mr.),IndicatorFeature(probably),IndicatorFeature(vote),IndicatorFeature(early),IndicatorFeature(next),IndicatorFeature(all),IndicatorFeature(those),IndicatorFeature(Committee),IndicatorFeature(Federal),WordFeature(-INITC-ion,'Class),IndicatorFeature(American),IndicatorFeature(Association),IndicatorFeature(example),IndicatorFeature({),IndicatorFeature(law),IndicatorFeature(}),IndicatorFeature(business),IndicatorFeature(What),IndicatorFeature(most),IndicatorFeature(is),IndicatorFeature(effect),IndicatorFeature(they),IndicatorFeature(say),IndicatorFeature(have),WordFeature(-LC-ity,'Class),IndicatorFeature(trading),IndicatorFeature(activity),IndicatorFeature(buying),IndicatorFeature(or),IndicatorFeature(selling),IndicatorFeature(director),IndicatorFeature(short),IndicatorFeature(period),IndicatorFeature(time),WordFeature(-INITC-ing,'Class),IndicatorFeature(estimates),IndicatorFeature(cut),IndicatorFeature(third),IndicatorFeature(such),IndicatorFeature(marketing),IndicatorFeature(finance),IndicatorFeature(research),IndicatorFeature(development),IndicatorFeature(still),IndicatorFeature(required),IndicatorFeature(annual),IndicatorFeature(unde
r),IndicatorFeature(least),IndicatorFeature(if),IndicatorFeature(following),IndicatorFeature(Robert),IndicatorFeature(North),IndicatorFeature(data),IndicatorFeature(key),IndicatorFeature(may),IndicatorFeature(while),IndicatorFeature(them),IndicatorFeature(when),IndicatorFeature(do),IndicatorFeature(want),IndicatorFeature(change),IndicatorFeature(should),IndicatorFeature(Congress),IndicatorFeature(added),IndicatorFeature(likely),IndicatorFeature(legislation),IndicatorFeature(basis),IndicatorFeature(nation),IndicatorFeature(largest),IndicatorFeature(fund),IndicatorFeature($),IndicatorFeature(billion),IndicatorFeature(employees),IndicatorFeature(plans),IndicatorFeature(offer),IndicatorFeature(two),IndicatorFeature(investment),IndicatorFeature(million),WordFeature(-INITC-ity,'Class),IndicatorFeature(bond),WordFeature(-LC-est,'Class),IndicatorFeature(Both),IndicatorFeature(funds),IndicatorFeature(expected),IndicatorFeature(begin),IndicatorFeature(around),IndicatorFeature(March),IndicatorFeature(1),IndicatorFeature(subject),IndicatorFeature(approval),IndicatorFeature(For),IndicatorFeature(up),IndicatorFeature(must),IndicatorFeature(plan),IndicatorFeature(Some),IndicatorFeature(institutions),IndicatorFeature(part),IndicatorFeature(agreement),IndicatorFeature(pressure),IndicatorFeature(provide),IndicatorFeature(reached),IndicatorFeature(with),IndicatorFeature(December),IndicatorFeature(securities),IndicatorFeature(South),IndicatorFeature(power),IndicatorFeature(cases),IndicatorFeature(investments),IndicatorFeature(significant),IndicatorFeature(going),IndicatorFeature(into),IndicatorFeature(bonds),IndicatorFeature(including),IndicatorFeature(much),IndicatorFeature(foreign),IndicatorFeature(buy),IndicatorFeature(sell),IndicatorFeature(futures),IndicatorFeature(contracts),IndicatorFeature(New),IndicatorFeature(York),IndicatorFeature(State),IndicatorFeature(Department),IndicatorFeature(Under),IndicatorFeature(able),IndicatorFeature(receive),IndicatorFeature(cash),IndicatorFeatu
re(offered),IndicatorFeature(currently),IndicatorFeature(limited),IndicatorFeature(Co.),IndicatorFeature(equipment),IndicatorFeature(shareholders),IndicatorFeature(right),IndicatorFeature(purchase),IndicatorFeature(half),IndicatorFeature(price),IndicatorFeature(certain),IndicatorFeature(takeover),IndicatorFeature(years),IndicatorFeature(old),IndicatorFeature(senior),IndicatorFeature(vice),IndicatorFeature(concern),IndicatorFeature(technology),IndicatorFeature(group),IndicatorFeature(position),IndicatorFeature(work),IndicatorFeature(lot),IndicatorFeature(because),IndicatorFeature(taken),IndicatorFeature(hard),IndicatorFeature(line),IndicatorFeature(problem),IndicatorFeature(:),IndicatorFeature(He),IndicatorFeature(does),IndicatorFeature(same),IndicatorFeature(over),IndicatorFeature(again),IndicatorFeature(Richard),IndicatorFeature(ago),IndicatorFeature(very),IndicatorFeature(like),IndicatorFeature(dropped),IndicatorFeature(though),IndicatorFeature(now),IndicatorFeature(show),IndicatorFeature(few),IndicatorFeature(thing),WordFeature(-INITC-DASH,'Class),IndicatorFeature(you),IndicatorFeature(ca),IndicatorFeature(his),IndicatorFeature(commercial),IndicatorFeature(what),IndicatorFeature(His),IndicatorFeature(recent),WordFeature(-INITC-al,'Class),IndicatorFeature(case),IndicatorFeature(point),IndicatorFeature(It),IndicatorFeature(black),IndicatorFeature(suit),IndicatorFeature(announced),IndicatorFeature(just),IndicatorFeature(family),IndicatorFeature(her),WordFeature(-CAPS-DASH-s,'Class),IndicatorFeature(no),IndicatorFeature(could),IndicatorFeature(well),IndicatorFeature(second),IndicatorFeature(And),IndicatorFeature(went),IndicatorFeature(through),IndicatorFeature(first),IndicatorFeature(longer),IndicatorFeature(five),IndicatorFeature(An),IndicatorFeature(beginning),IndicatorFeature(lead),IndicatorFeature(high),IndicatorFeature(off),IndicatorFeature(Air),IndicatorFeature(him),IndicatorFeature(great),IndicatorFeature(then),IndicatorFeature(way),IndicatorFeature(end),Indic
atorFeature(however),IndicatorFeature(ever),IndicatorFeature(but),IndicatorFeature(too),IndicatorFeature(away),IndicatorFeature(That),IndicatorFeature(before),IndicatorFeature(during),IndicatorFeature(?),IndicatorFeature(take),IndicatorFeature(us),IndicatorFeature(11),IndicatorFeature(1\/2),IndicatorFeature(without),IndicatorFeature(having),IndicatorFeature(future),IndicatorFeature(can),IndicatorFeature(only),IndicatorFeature(performance),IndicatorFeature(One),IndicatorFeature(President),IndicatorFeature(international),IndicatorFeature(fact),IndicatorFeature(world),IndicatorFeature(This),IndicatorFeature(members),IndicatorFeature(United),IndicatorFeature(Now),IndicatorFeature(Bush),IndicatorFeature(decision),IndicatorFeature(we),IndicatorFeature(think),IndicatorFeature(along),IndicatorFeature(left),IndicatorFeature(financial),IndicatorFeature(top),IndicatorFeature(got),IndicatorFeature(personal),IndicatorFeature(several),IndicatorFeature(plants),IndicatorFeature(France),IndicatorFeature(sent),IndicatorFeature(even),IndicatorFeature(projects),IndicatorFeature(continue),IndicatorFeature(International),IndicatorFeature(means),IndicatorFeature(West),IndicatorFeature(pay),IndicatorFeature(World),IndicatorFeature(give),IndicatorFeature(government),IndicatorFeature(rights),IndicatorFeature(;),IndicatorFeature(held),IndicatorFeature(support),IndicatorFeature(impact),IndicatorFeature(Soviet),IndicatorFeature(holding),IndicatorFeature(seems),IndicatorFeature(current),IndicatorFeature(public),IndicatorFeature(charge),IndicatorFeature(programs),IndicatorFeature(former),IndicatorFeature(head),IndicatorFeature(military),WordFeature(-LC-DASH-s,'Class),IndicatorFeature(division),IndicatorFeature(staff),IndicatorFeature(working),IndicatorFeature(once),IndicatorFeature(budget),IndicatorFeature(changed),IndicatorFeature(John),IndicatorFeature(state),IndicatorFeature(told),IndicatorFeature(soon),IndicatorFeature(week),IndicatorFeature(raise),WordFeature(-LC-NUM-DASH,'Class),IndicatorFe
ature(French),IndicatorFeature(owns),IndicatorFeature(Other),IndicatorFeature(countries),IndicatorFeature(Germany),IndicatorFeature(continued),IndicatorFeature(We),IndicatorFeature(see),IndicatorFeature(free),IndicatorFeature(course),IndicatorFeature(made),IndicatorFeature(largely),IndicatorFeature(these),IndicatorFeature(home),IndicatorFeature(times),IndicatorFeature(there),IndicatorFeature(reason),IndicatorFeature(Western),IndicatorFeature(Systems),IndicatorFeature(number),IndicatorFeature(plant),IndicatorFeature(production),IndicatorFeature(itself),IndicatorFeature(another),IndicatorFeature(known),IndicatorFeature(similar),IndicatorFeature(seen),IndicatorFeature(especially),IndicatorFeature(subsidiary),IndicatorFeature(producers),IndicatorFeature(On),IndicatorFeature(process),IndicatorFeature(each),IndicatorFeature(making),IndicatorFeature(using),WordFeature(-LC-NUM-s,'Class),IndicatorFeature(20),IndicatorFeature(majority),IndicatorFeature(difficult),IndicatorFeature(China),IndicatorFeature(workers),IndicatorFeature(country),IndicatorFeature(At),IndicatorFeature(major),IndicatorFeature(Canada),IndicatorFeature(University),IndicatorFeature(California),IndicatorFeature(every),IndicatorFeature(hurt),IndicatorFeature(order),IndicatorFeature(large),IndicatorFeature(enough),IndicatorFeature(produce),IndicatorFeature(yield),IndicatorFeature(30),IndicatorFeature(used),IndicatorFeature(include),IndicatorFeature(need),IndicatorFeature(costs),IndicatorFeature(demand),IndicatorFeature(Co),IndicatorFeature(problems),IndicatorFeature(However),IndicatorFeature(Paul),IndicatorFeature(Corp.),IndicatorFeature(Calif.),IndicatorFeature(called),IndicatorFeature(remains),IndicatorFeature(active),IndicatorFeature(days),IndicatorFeature(rather),IndicatorFeature(There),IndicatorFeature(acquire),IndicatorFeature(try),IndicatorFeature(between),IndicatorFeature(Bank),IndicatorFeature(Japan),IndicatorFeature('re),IndicatorFeature(being),IndicatorFeature(latest),IndicatorFeature(50),Indicator
Feature(Japanese),IndicatorFeature(banks),IndicatorFeature(led),IndicatorFeature(help),IndicatorFeature(loans),IndicatorFeature(bank),IndicatorFeature(owned),IndicatorFeature(put),IndicatorFeature(financing),IndicatorFeature(biggest),IndicatorFeature(far),IndicatorFeature(domestic),IndicatorFeature(issues),IndicatorFeature(3),IndicatorFeature(dollars),IndicatorFeature(4),IndicatorFeature(come),IndicatorFeature(fiscal),IndicatorFeature(June),IndicatorFeature(With),IndicatorFeature(here),IndicatorFeature(David),IndicatorFeature(Tuesday),IndicatorFeature(Ltd.),IndicatorFeature(National),IndicatorFeature(I),IndicatorFeature(interests),IndicatorFeature(banking),IndicatorFeature(lower),IndicatorFeature(months),IndicatorFeature(Meanwhile),IndicatorFeature(smaller),IndicatorFeature(seeking),IndicatorFeature(open),IndicatorFeature(full),IndicatorFeature(competition),IndicatorFeature(capital),IndicatorFeature(run),IndicatorFeature(both),IndicatorFeature(businesses),IndicatorFeature(started),IndicatorFeature(venture),IndicatorFeature(use),IndicatorFeature(accounts),IndicatorFeature(account),IndicatorFeature(London),IndicatorFeature(move),IndicatorFeature(brokerage),IndicatorFeature(firms),IndicatorFeature(simply),IndicatorFeature(reported),IndicatorFeature(third-quarter),IndicatorFeature(earnings),IndicatorFeature(yesterday),IndicatorFeature(average),IndicatorFeature(analysts),IndicatorFeature(higher),IndicatorFeature(named),IndicatorFeature(industrial),IndicatorFeature(systems),IndicatorFeature(A.),IndicatorFeature(Monday),IndicatorFeature(equity),IndicatorFeature(As),IndicatorFeature(manager),IndicatorFeature(different),IndicatorFeature(four),IndicatorFeature(Series),IndicatorFeature(says),IndicatorFeature(long),IndicatorFeature(good),IndicatorFeature(television),IndicatorFeature(contract),IndicatorFeature(near),IndicatorFeature(down),IndicatorFeature(involved),IndicatorFeature(always),IndicatorFeature(You),IndicatorFeature(hit),IndicatorFeature(lawyers),IndicatorFeature(rea
l),IndicatorFeature(estate),IndicatorFeature(better),IndicatorFeature(never),IndicatorFeature(car),IndicatorFeature(six),IndicatorFeature(James),IndicatorFeature(When),IndicatorFeature(offering),IndicatorFeature(available),IndicatorFeature(life),IndicatorFeature(back),IndicatorFeature(clear),IndicatorFeature(themselves),IndicatorFeature(something),IndicatorFeature(close),IndicatorFeature(Washington),IndicatorFeature(claims),IndicatorFeature(insurance),IndicatorFeature(Texas),IndicatorFeature(team),IndicatorFeature(me),IndicatorFeature(White),WordFeature(-CAPS-y,'Class),IndicatorFeature(Big),IndicatorFeature(These),IndicatorFeature(While),IndicatorFeature(getting),IndicatorFeature(man),IndicatorFeature(magazine),IndicatorFeature(people),IndicatorFeature(Most),IndicatorFeature(others),IndicatorFeature(recently),IndicatorFeature('m),IndicatorFeature(look),IndicatorFeature(whether),IndicatorFeature(So),IndicatorFeature(base),IndicatorFeature(Still),IndicatorFeature(how),IndicatorFeature(expect),IndicatorFeature(three),IndicatorFeature(Even),IndicatorFeature(If),IndicatorFeature(know),IndicatorFeature(wo),IndicatorFeature(lost),IndicatorFeature(become),IndicatorFeature(control),IndicatorFeature(keep),IndicatorFeature(After),IndicatorFeature(disclosed),IndicatorFeature(operations),IndicatorFeature(local),IndicatorFeature(service),IndicatorFeature(states),IndicatorFeature(build),IndicatorFeature(gas),IndicatorFeature(acquisition),IndicatorFeature(April),IndicatorFeature(completed),IndicatorFeature(common),IndicatorFeature(outstanding),IndicatorFeature(assets),IndicatorFeature(department),IndicatorFeature(Nov.),IndicatorFeature(previously),IndicatorFeature(operating),IndicatorFeature(retail),IndicatorFeature(Bay),IndicatorFeature(October),IndicatorFeature(31),IndicatorFeature(1989),IndicatorFeature(interest),IndicatorFeature(rates),IndicatorFeature(below),IndicatorFeature(general),IndicatorFeature(levels),WordFeature(-CAPS-al,'Class),IndicatorFeature(9),IndicatorFeature(8),
IndicatorFeature(low),IndicatorFeature(7\/8),IndicatorFeature(bid),IndicatorFeature(traded),IndicatorFeature(Inc),IndicatorFeature(7),IndicatorFeature(3\/4),IndicatorFeature(brokers),IndicatorFeature(exchange),WordFeature(-CAPS-er,'Class),IndicatorFeature(General),IndicatorFeature(60),IndicatorFeature(150),IndicatorFeature(notes),IndicatorFeature(dealers),IndicatorFeature(Average),IndicatorFeature(unit),IndicatorFeature(credit),IndicatorFeature(5\/8),IndicatorFeature(3\/8),WordFeature(-CAPS-ed,'Class),IndicatorFeature(dollar),IndicatorFeature(auction),IndicatorFeature(bills),IndicatorFeature(face),IndicatorFeature(value),IndicatorFeature(units),IndicatorFeature(13),IndicatorFeature(weeks),IndicatorFeature(mortgage),IndicatorFeature(2),WordFeature(-CAPS-ion,'Class),IndicatorFeature(priced),IndicatorFeature(return),IndicatorFeature(returns),IndicatorFeature(product),IndicatorFeature(rose),IndicatorFeature(August),IndicatorFeature(result),IndicatorFeature(year-earlier),IndicatorFeature(total),IndicatorFeature(goods),IndicatorFeature(services),IndicatorFeature(July),IndicatorFeature(index),IndicatorFeature(September),IndicatorFeature(decline),IndicatorFeature(industry),IndicatorFeature(Dow),IndicatorFeature(Jones),IndicatorFeature(acquired),IndicatorFeature(Financial),IndicatorFeature(loan),IndicatorFeature(losses),IndicatorFeature(construction),IndicatorFeature(caused),IndicatorFeature(previous),IndicatorFeature(sale),IndicatorFeature(asked),IndicatorFeature(British),IndicatorFeature(Jaguar),IndicatorFeature(PLC),IndicatorFeature(House),IndicatorFeature(best),IndicatorFeature(quickly),IndicatorFeature(possible),IndicatorFeature(leading),IndicatorFeature(Ford),IndicatorFeature(trying),IndicatorFeature(GM),IndicatorFeature(joint),IndicatorFeature(stake),IndicatorFeature(action),IndicatorFeature(management),IndicatorFeature(deal),IndicatorFeature(European),IndicatorFeature(analyst),IndicatorFeature(Stock),IndicatorFeature(gain),IndicatorFeature(heavy),IndicatorFeature(vol
ume),IndicatorFeature(closed),IndicatorFeature(Analysts),IndicatorFeature(#),IndicatorFeature(Sept.),IndicatorFeature(independent),IndicatorFeature(wants),IndicatorFeature(huge),IndicatorFeature(Institute),IndicatorFeature(talks),IndicatorFeature(agreed),IndicatorFeature(either),IndicatorFeature(start),IndicatorFeature(statement),IndicatorFeature(board),IndicatorFeature(holders),IndicatorFeature(meeting),IndicatorFeature(noted),IndicatorFeature(call),IndicatorFeature(declined),IndicatorFeature(our),IndicatorFeature(further),IndicatorFeature(Although),IndicatorFeature(yet),IndicatorFeature(long-term),IndicatorFeature(taking),IndicatorFeature(risk),IndicatorFeature(administration),IndicatorFeature(done),IndicatorFeature(economy),IndicatorFeature(Mrs.),IndicatorFeature(America),IndicatorFeature(revenue),IndicatorFeature(William),IndicatorFeature(additional),IndicatorFeature(chairman),IndicatorFeature(parent),IndicatorFeature(due),IndicatorFeature(parts),IndicatorFeature(D.),IndicatorFeature(J.),IndicatorFeature(nine),IndicatorFeature(ended),IndicatorFeature(net),IndicatorFeature(loss),IndicatorFeature(compared),IndicatorFeature(national),IndicatorFeature(Sales),IndicatorFeature(fell),IndicatorFeature(earlier),IndicatorFeature(ahead),IndicatorFeature(12),IndicatorFeature(showed),IndicatorFeature(Tokyo),IndicatorFeature(points),IndicatorFeature(First),IndicatorFeature(estimated),IndicatorFeature(failed),IndicatorFeature(bought),IndicatorFeature(official),IndicatorFeature(profits),IndicatorFeature(despite),IndicatorFeature(political),IndicatorFeature(14),IndicatorFeature(inflation),IndicatorFeature(consumer),IndicatorFeature(prices),IndicatorFeature(economic),IndicatorFeature(oil),IndicatorFeature(although),IndicatorFeature(supply),IndicatorFeature(day),IndicatorFeature(gains),IndicatorFeature(traders),IndicatorFeature(40),IndicatorFeature(gained),IndicatorFeature(turned),IndicatorFeature(news),IndicatorFeature(Friday),IndicatorFeature(yen),IndicatorFeature(Wall),Indicato
rFeature(Street),IndicatorFeature(makers),IndicatorFeature(scheduled),IndicatorFeature(currency),IndicatorFeature(helped),IndicatorFeature(health),IndicatorFeature(Hong),IndicatorFeature(Kong),IndicatorFeature(Morgan),IndicatorFeature(Capital),IndicatorFeature(To),IndicatorFeature(100),IndicatorFeature(percentage),IndicatorFeature(food),IndicatorFeature(rise),IndicatorFeature(above),IndicatorFeature(strong),IndicatorFeature(orders),IndicatorFeature(measure),IndicatorFeature(name),IndicatorFeature(5),IndicatorFeature(Board),IndicatorFeature(increase),IndicatorFeature(terms),IndicatorFeature(slightly),IndicatorFeature(Treasury),IndicatorFeature(bill),IndicatorFeature(considered),IndicatorFeature(increases),IndicatorFeature(Among),IndicatorFeature(range),IndicatorFeature(San),IndicatorFeature(Francisco),IndicatorFeature(my),IndicatorFeature(go),IndicatorFeature(kind),IndicatorFeature(500),IndicatorFeature(6),IndicatorFeature(began),IndicatorFeature(came),IndicatorFeature(hold),IndicatorFeature(turn),IndicatorFeature(view),IndicatorFeature(your),IndicatorFeature(coming),IndicatorFeature(doing),IndicatorFeature(whose),WordFeature(-LC-NUM-DASH-s,'Class),IndicatorFeature(where),IndicatorFeature(aid),IndicatorFeature(included),IndicatorFeature(efforts),IndicatorFeature(fall),IndicatorFeature(saying),IndicatorFeature(damage),IndicatorFeature(drop),IndicatorFeature(reduce),IndicatorFeature(Fed),IndicatorFeature(Chairman),IndicatorFeature(Rep.),IndicatorFeature(instead),IndicatorFeature(spokesman),IndicatorFeature(German),IndicatorFeature(trade),IndicatorFeature(firm),IndicatorFeature(Michael),IndicatorFeature(partner),IndicatorFeature(defense),IndicatorFeature(raised),IndicatorFeature(potential),IndicatorFeature(leader),IndicatorFeature(seem),IndicatorFeature(Airlines),IndicatorFeature(1988),IndicatorFeature(nearly),IndicatorFeature(Pacific),IndicatorFeature(area),IndicatorFeature(gold),IndicatorFeature(growing),IndicatorFeature(income),IndicatorFeature(By),IndicatorFeature(r
esults),IndicatorFeature(given),IndicatorFeature(Dec.),IndicatorFeature(quarter),IndicatorFeature(approved),IndicatorFeature(18),IndicatorFeature(airline),IndicatorFeature(network),IndicatorFeature(Drexel),IndicatorFeature(leaders),IndicatorFeature(Senate),IndicatorFeature(program),IndicatorFeature(includes),IndicatorFeature(history),IndicatorFeature(labor),IndicatorFeature(small),IndicatorFeature(meet),IndicatorFeature(job),IndicatorFeature(toward),IndicatorFeature(tax),IndicatorFeature(thought),IndicatorFeature(Industries),WordFeature(-INITC-NUM-DASH,'Class),IndicatorFeature(debt),IndicatorFeature(paper),IndicatorFeature(situation),IndicatorFeature(manufacturing),IndicatorFeature(profit),IndicatorFeature(planned),IndicatorFeature(composite),IndicatorFeature(fourth),IndicatorFeature(particularly),IndicatorFeature(reduced),IndicatorFeature(found),IndicatorFeature(All),IndicatorFeature(Boston),IndicatorFeature(settlement),IndicatorFeature(charges),IndicatorFeature(customers),IndicatorFeature(computers),IndicatorFeature(East),IndicatorFeature(system),IndicatorFeature(1986),IndicatorFeature(lines),IndicatorFeature(legal),IndicatorFeature(took),IndicatorFeature(Corp),WordFeature(-CAPS-NUM,'Class),IndicatorFeature(cost),IndicatorFeature(concerns),IndicatorFeature(she),IndicatorFeature(Last),IndicatorFeature(Group),IndicatorFeature(amount),IndicatorFeature(deficit),IndicatorFeature(issued),IndicatorFeature(Trust),IndicatorFeature(spending),IndicatorFeature(bad),IndicatorFeature(big),IndicatorFeature(question),IndicatorFeature(city),IndicatorFeature('ve),IndicatorFeature(house),IndicatorFeature('ll),IndicatorFeature(attorney),IndicatorFeature(dividend),IndicatorFeature(16),WordFeature(-CAPS-est,'Class),IndicatorFeature(payments),IndicatorFeature(trust),IndicatorFeature(portfolio),IndicatorFeature(note),IndicatorFeature(addition),IndicatorFeature(Judge),IndicatorFeature(judge),IndicatorFeature(steel),IndicatorFeature(court),IndicatorFeature(find),IndicatorFeature(areas),Ind
icatorFeature(clients),IndicatorFeature(outside),IndicatorFeature(Court),IndicatorFeature(...),IndicatorFeature(hours),IndicatorFeature(filing),IndicatorFeature(filed),IndicatorFeature(Union),IndicatorFeature(earthquake),IndicatorFeature(private),IndicatorFeature(1\/4),IndicatorFeature(S&P),IndicatorFeature(Merrill),IndicatorFeature(Lynch),WordFeature(-INITC-est,'Class),IndicatorFeature(via),IndicatorFeature(200),IndicatorFeature(marks),IndicatorFeature(Revenue),IndicatorFeature(taxes),IndicatorFeature(creditors),IndicatorFeature(Since),IndicatorFeature(strategy),IndicatorFeature(Canadian),IndicatorFeature(property),IndicatorFeature(IBM),IndicatorFeature(Business),IndicatorFeature(place),IndicatorFeature(needed),IndicatorFeature(jumped),IndicatorFeature(project),IndicatorFeature(Warner),IndicatorFeature(CBS),IndicatorFeature(committee),IndicatorFeature(advertising),IndicatorFeature(ad),IndicatorFeature(`),IndicatorFeature(campaign),IndicatorFeature(stores),WordFeature(-CAPS-ing,'Class),WordFeature(-CAPS-NUM-DASH,'Class),IndicatorFeature(pilots),IndicatorFeature(estimate),IndicatorFeature(calls),IndicatorFeature(union),IndicatorFeature(drug),IndicatorFeature(important),IndicatorFeature(adds),IndicatorFeature(eight),WordFeature(-INITC-NUM,'Class),IndicatorFeature(George),IndicatorFeature(groups),IndicatorFeature(conference),IndicatorFeature(looking),IndicatorFeature(TV),IndicatorFeature(quake),IndicatorFeature(posted),IndicatorFeature(related),IndicatorFeature(Sen.),IndicatorFeature(22),IndicatorFeature(Wednesday),IndicatorFeature(reserves),IndicatorFeature(restructuring),IndicatorFeature(buyers),IndicatorFeature(buy-out),IndicatorFeature(Shearson),IndicatorFeature(UAL),IndicatorFeature(francs),IndicatorFeature(1\/8),WordFeature(-INITC-DASH-s,'Class),IndicatorFeature(junk),IndicatorFeature(study),IndicatorFeature(Thursday),IndicatorFeature(abortion),WordFeature(-CAPS-NUM-DASH-s,'Class),IndicatorFeature(Noriega),WordFeature(-CAPS-ly,'Class),WordFeature(-INITC-NUM-DASH-
s,'Class),WordFeature(-CAPS-ity,'Class),WordFeature(-INITC-NUM-s,'Class))
[error] 	at epic.features.IndexedWordFeaturizer$.epic$features$IndexedWordFeaturizer$$stripEncode(IndexedWordFeaturizer.scala:56)
[error] 	at epic.features.IndexedWordFeaturizer$MyWordFeaturizer$$anonfun$1.apply(IndexedWordFeaturizer.scala:39)
[error] 	at epic.features.IndexedWordFeaturizer$MyWordFeaturizer$$anonfun$1.apply(IndexedWordFeaturizer.scala:39)
[error] 	at scala.Array$.tabulate(Array.scala:331)
[error] 	at epic.features.IndexedWordFeaturizer$MyWordFeaturizer.anchor(IndexedWordFeaturizer.scala:39)
[error] 	at epic.sequences.TaggedSequenceModelFactory$IndexedStandardFeaturizer$$anon$2.<init>(CRFModel.scala:207)
[error] 	at epic.sequences.TaggedSequenceModelFactory$IndexedStandardFeaturizer.anchor(CRFModel.scala:205)
[error] 	at epic.sequences.CRFInference$Anchoring.<init>(CRFModel.scala:99)
[error] 	at epic.sequences.CRFInference.anchor(CRFModel.scala:79)
[error] 	at epic.sequences.CRFInference.anchor(CRFModel.scala:58)
[error] 	at epic.sequences.CRF$class.bestSequence(CRF.scala:42)
[error] 	at epic.sequences.CRFInference.bestSequence(CRFModel.scala:58)
[error] 	at scraper$$anonfun$9.apply(main.scala:81)
[error] 	at scraper$$anonfun$9.apply(main.scala:57)
[error] 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[error] 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[error] 	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
[error] 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
[error] 	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
[error] 	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
[error] 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[error] 	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[error] 	at scraper$.delayedEndpoint$scraper$1(main.scala:57)
[error] 	at scraper$delayedInit$body.apply(main.scala:17)
[error] 	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
[error] 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[error] 	at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] 	at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] 	at scala.collection.immutable.List.foreach(List.scala:392)
[error] 	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
[error] 	at scala.App$class.main(App.scala:76)
[error] 	at scraper$.main(main.scala:17)
[error] 	at scraper.main(main.scala)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] 	at java.lang.reflect.Method.invoke(Method.java:498)
[error] Nonzero exit code: 1
[error] (Compile / run) Nonzero exit code: 1

Dependencies Old! Bug

For newcomers who want to use it, sbt 0.13.11 and sbt-assembly 0.14.5 are preferred. Versions of sbt before 0.13.8 are extremely slow. The community recently resolved the problem by bumping up to 0.13.11.
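As a rough sketch, those versions could be pinned as follows. The file locations assume the usual sbt project layout and are not taken from the Epic build itself, so adjust them to your own project.

```scala
// project/build.properties is a plain properties file, not Scala;
// its single relevant line is shown here as a comment:
//   sbt.version=0.13.11

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
```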

Pos-en model load failure

Hello. I found a problem: the pos-en model can't be loaded.
I tried the following code:

val tagger: CRF[AnnotatedLabel,String] = epic.models.PosTagSelector.loadTagger("en").get

and got the following error:

Exception in thread "main" java.lang.NullPointerException
    at java.util.zip.InflaterInputStream.<init>(InflaterInputStream.java:83)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:76)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:90)
    at epic.models.ClassPathModelLoader.load(ModelLoader.scala:21)
    at epic.models.DelegatingLoader.load(ModelLoader.scala:33)
    at epic.models.PosTagSelector$.loadTagger(PosTagModelLoader.scala:13)

My epic/epic-models versions:

libraryDependencies ++= Seq(
  "org.scalanlp"                  % "epic_2.10"                 % "0.2-SNAPSHOT",
  "org.scalanlp"                  %% "epic-pos-en"              % "2014.6.2-SNAPSHOT",
  "org.scalanlp"                  %% "epic-parser-en-span"      % "2014.6.2-SNAPSHOT",
  "org.scalanlp"                  %% "epic-ner-en-conll"        % "2014.6.2-SNAPSHOT"
//  "org.scalanlp"                  %% "epic-pos-en"              % "0.1",
//  "org.scalanlp"                  %% "epic-parser-en-span"      % "0.1",
//  "org.scalanlp"                  %% "epic-ner-en-conll"        % "0.1"
)
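The trace above suggests that the stream handed to GZIPInputStream was null, which is what would happen if the model resource could not be found on the classpath. A minimal sketch for checking this is below. The object name, the helper name and the resource path are hypothetical and are not part of Epic's API; the real path used by ClassPathModelLoader is not shown in this report.

```scala
import java.io.InputStream
import java.util.zip.GZIPInputStream

object ModelResourceCheck {
  /** Opens a gzipped classpath resource. Returns Left with a message when the
    * resource is missing, instead of passing a null stream to GZIPInputStream. */
  def openGzipResource(path: String): Either[String, GZIPInputStream] = {
    // getResourceAsStream returns null when nothing on the classpath matches.
    val raw: InputStream = getClass.getClassLoader.getResourceAsStream(path)
    if (raw == null) Left(s"resource not found on classpath: $path")
    else Right(new GZIPInputStream(raw))
  }
}
```

Calling `ModelResourceCheck.openGzipResource(...)` with the path you expect the model jar to provide gives a `Left` if the jar is missing from the classpath or was resolved without the model file.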

Problem with English Parser in Scala 2.10

Hi Epic Team,

I love using your library, but I get an error when using the English parser in Scala 2.10:

Exception in thread "main" java.io.InvalidClassException: breeze.linalg.Counter2$$anon$1; local class incompatible: stream classdesc serialVersionUID = -8653601685403516672, local class serialVersionUID = 6118148492784004600
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at epic.models.ClassPathModelLoader.load(ModelLoader.scala:23)
    at epic.models.DelegatingLoader.load(ModelLoader.scala:33)
    at epic.models.PosTagSelector$.loadTagger(PosTagModelLoader.scala:13)
    at de.unima.dws.oamatching.measures.StringMeasureHelper$.addPosTag(StringMeasureHelper.scala:116)
    at de.unima.dws.oamatching.measures.StringMeasureHelper$delayedInit$body.apply(StringMeasureHelper.scala:18)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at de.unima.dws.oamatching.measures.StringMeasureHelper$.main(StringMeasureHelper.scala:15)
    at de.unima.dws.oamatching.measures.StringMeasureHelper.main(StringMeasureHelper.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

sbt Issue using Epic model

Hi,

I tried to use an Epic model in my project, but I have a problem with the sbt dependencies. This is the error I got:
"unresolved dependency: org.scala-sbt#sbt;0.13.1: not found"

Here is the code that reproduces the error:
https://github.com/jiangss/epic-example

Please advise.

Thanks!

Documentation issues

Hi, could you please check the README extract below?

val tagger = epic.models.deserialize[CRF[AnnotatedLabel, String]]("lib/epic-ner-en-conll_2.10-2014.6.3-SNAPSHOT.jar")
val segments = tagger(sentence)
println(tags.render(tagger.outsideLabel))

This gives the compile error "epic.sequences.CRF[epic.trees.AnnotatedLabel,String] does not take parameters".

I tried updating it slightly to the following:

val tagger = epic.models.deserialize[SemiCRF[AnnotatedLabel, String]](("lib/epic-ner-en-conll_2.10-2014.6.3-SNAPSHOT.jar"))

val sentenceSplitter = MLSentenceSegmenter.bundled().get
val tokenizer = new epic.preprocess.TreebankTokenizer()
val sentences: IndexedSeq[IndexedSeq[String]] = sentenceSplitter(input).map(tokenizer(_))

var result = ""

sentences.map{
  sentence =>

    val tags = tagger.bestSequence(sentence)
    result = tags.render(tagger.outsideSymbol)

}

result

This compiles and runs, but doesn't give very interesting output.

Thanks,
Brent
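
One possible reason for the uninteresting output in the updated snippet: `result` is reassigned on every iteration, so only the last sentence's rendering survives. A minimal sketch of collecting one rendering per sentence instead is below. `renderSentence` is a hypothetical stand-in for `tagger.bestSequence(sentence).render(tagger.outsideSymbol)`, used only so the example runs without Epic on the classpath.

```scala
object CollectRenders {
  // Hypothetical stand-in for tagger.bestSequence(sentence).render(tagger.outsideSymbol);
  // it just brackets the tokens so the example runs without Epic.
  def renderSentence(sentence: IndexedSeq[String]): String =
    sentence.mkString("[", " ", "]")

  // map yields one rendered string per sentence; mkString keeps all of them
  // instead of overwriting a single var on each iteration.
  def renderAll(sentences: IndexedSeq[IndexedSeq[String]]): String =
    sentences.map(renderSentence).mkString("\n")
}
```

With the real tagger, the body of `renderSentence` would presumably be the two lines from the loop above, and `renderAll(sentences)` would replace the `var result` bookkeeping.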

Exception in thread "main" java.lang.NullPointerException

[main] INFO epic.parser.models.ParserTrainer$ - Training Parser...
Exception in thread "main" java.lang.NullPointerException
at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
at scala.collection.IndexedSeqLike$class.iterator(IndexedSeqLike.scala:90)
at scala.collection.mutable.ArrayOps$ofRef.iterator(ArrayOps.scala:186)
at epic.trees.Treebank$$anon$2.treesFromSection(Treebank.scala:125)
at epic.trees.Treebank$$anonfun$treesFromSections$1.apply(Treebank.scala:67)
at epic.trees.Treebank$$anonfun$treesFromSections$1.apply(Treebank.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:836)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.immutable.VectorBuilder.$plus$plus$eq(Vector.scala:732)
at scala.collection.immutable.VectorBuilder.$plus$plus$eq(Vector.scala:708)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toIndexedSeq(TraversableOnce.scala:300)
at scala.collection.AbstractIterator.toIndexedSeq(Iterator.scala:1336)
at epic.trees.ProcessedTreebank.transformTrees(ProcessedTreebank.scala:84)
at epic.trees.ProcessedTreebank.devTrees$lzycompute(ProcessedTreebank.scala:73)
at epic.trees.ProcessedTreebank.devTrees(ProcessedTreebank.scala:73)
at epic.parser.ParserPipeline$class.trainParser(ParserPipeline.scala:88)
at epic.parser.models.ParserTrainer$.trainParser(ParserTrainer.scala:47)
at epic.parser.ParserPipeline$class.main(ParserPipeline.scala:107)
at epic.parser.models.ParserTrainer$.main(ParserTrainer.scala:47)
at epic.parser.models.ParserTrainer.main(ParserTrainer.scala)

Exception in thread "main" java.lang.IllegalAccessError: DB has been closed

WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
[main] INFO epic.framework.ModelObjective - Inference took: 2.876s
[main] INFO epic.parser.models.NeuralParserTrainer$ - Validating...
[ForkJoinPool-1-worker-13] INFO epic.parser.ParseEval$ - Sentences parsed 100/100 (0.224s elapsed.)
[main] INFO epic.parser.models.NeuralParserTrainer$ - Overall statistics for validation: Statistics(precision=1, recall=0.9403, f1=0.9692, exact=0.92, tagAccuracy=1)
[main] INFO epic.dense.AdadeltaGradientDescentDVD - Step Size: 1.000
[main] INFO epic.framework.ModelObjective - Inference took: 2.660s
[main] INFO epic.dense.AdadeltaGradientDescentDVD - Val and Grad Norm: 782.761 (rel: 0.329) 1884.69
[main] INFO epic.dense.AdadeltaGradientDescentDVD - Step Size: 1.000
Exception in thread "main" java.lang.IllegalAccessError: DB has been closed
at org.mapdb.EngineWrapper.checkClosed(EngineWrapper.java:297)
at org.mapdb.CacheWeakSoftRef.get(CacheWeakSoftRef.java:123)
at org.mapdb.HTreeMap.get(HTreeMap.java:414)
at scala.collection.convert.Wrappers$JConcurrentMapWrapper.get(Wrappers.scala:323)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:192)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
at epic.util.CacheBroker$CacheMap.getOrElseUpdate(Cache.scala:151)
at epic.constraints.CachedChartConstraintsFactory.constraints(CachedChartConstraintsFactory.scala:31)
at epic.parser.models.ParserInference$class.scorer(ParserModel.scala:51)
at epic.parser.models.PositionalNeuralModel$Inference.scorer(PositionalNeuralModel.scala:148)
at epic.parser.models.PositionalNeuralModel$Inference.scorer(PositionalNeuralModel.scala:148)
at epic.framework.Model$class.accumulateCounts(Model.scala:52)
at epic.parser.models.PositionalNeuralModel.accumulateCounts(PositionalNeuralModel.scala:30)
at epic.framework.ModelObjective$$anonfun$3.apply(ModelObjective.scala:68)
at epic.framework.ModelObjective$$anonfun$3.apply(ModelObjective.scala:65)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foldLeft_quick(ParArray.scala:174)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foldLeft(ParArray.scala:165)
at scala.collection.parallel.ParIterableLike$Aggregate.leaf(ParIterableLike.scala:1008)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
at scala.collection.parallel.ParIterableLike$Aggregate.tryLeaf(ParIterableLike.scala:1005)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:169)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Compilation failed

Hi, two errors occurred when I tried to build epic.
(Screenshot: 2015-09-29 12 53 02)
Is there any way I can fix them?

EpicSeqDemo doesn't compile

With epic-2.11 and breeze-2.11, I get

missing or invalid dependency detected while loading class file 'SafeLogging.class'. Could not access term scalalogging in value com.typesafe, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.) A full rebuild may help if 'SafeLogging.class' was compiled against an incompatible version of com.typesafe. Epic Unknown Scala Problem

although SafeLogging is shown in the class browser in my Eclipse window.

java.io.InvalidClassException on updating to Epic 0.3 and using POS tagger

I upgraded a project I had to use Epic 0.3 and the most recent taggers and parsers. I'm getting a pretty nasty error here:

[info]   java.io.InvalidClassException: breeze.linalg.Counter2$$anon$1; local class incompatible: stream classdesc serialVersionUID = -8653601685403516672, local class serialVersionUID = 6118148492784004600
...
[info]   at epic.models.PosTagSelector$.loadTagger(PosTagModelLoader.scala:13)

(full stacktrace below)

It looks like there's a problem in breeze.linalg.Counter2. Did the API possibly change in this upgrade? Perhaps it's related to the fact that I'm using Spark as well as Epic in my project? Here are all of my dependencies:

"org.scalanlp" %% "epic" % "0.3",
"org.scalanlp" %% "epic-parser-en-span" % "2015.1.25",
"org.scalanlp" %% "epic-ner-en-conll" % "2015.1.25",
"org.scalanlp" %% "epic-pos-en" % "2015.1.25",
"org.apache.spark" % "spark-core_2.10" % "1.3.0",
"org.apache.spark" % "spark-mllib_2.10" % "1.3.0",

Stacktrace:

[info]   java.io.InvalidClassException: breeze.linalg.Counter2$$anon$1; local class incompatible: stream classdesc serialVersionUID = -8653601685403516672, local class serialVersionUID = 6118148492784004600
[info]   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
[info]   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
[info]   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[info]   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
[info]   at epic.models.ClassPathModelLoader.load(ModelLoader.scala:23)
[info]   at epic.models.DelegatingLoader.load(ModelLoader.scala:33)
[info]   at epic.models.PosTagSelector$.loadTagger(PosTagModelLoader.scala:13)
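
An InvalidClassException of this kind means the serialVersionUID stored in the model file differs from the one computed for the class found on the local classpath. A minimal sketch for reading the local value, using only the JDK's `ObjectStreamClass`, follows. The object and method names (`SerialUid`, `localUid`) are made up for illustration, and the idea that a different Breeze jar is being picked up is only an assumption about the cause here.

```scala
import java.io.ObjectStreamClass

object SerialUid {
  // Returns the serialVersionUID that the local classpath yields for `cls`,
  // or None when the class is not Serializable (lookup returns null then).
  def localUid(cls: Class[_]): Option[Long] =
    Option(ObjectStreamClass.lookup(cls)).map(_.getSerialVersionUID)
}
```

Calling `SerialUid.localUid(Class.forName("breeze.linalg.Counter2$$anon$1"))` in the failing project and comparing the result with the "stream classdesc" value in the exception would show which side of the mismatch the local jars are on.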

at org.mapdb.Volume$ByteBufferVol.getLong(Volume.java:300)

Exception in thread "main" java.lang.NullPointerException
at org.mapdb.Volume$ByteBufferVol.getLong(Volume.java:300)
at org.mapdb.StoreDirect.checkHeaders(StoreDirect.java:112)
at org.mapdb.StoreDirect.<init>(StoreDirect.java:100)
at org.mapdb.StoreWAL.<init>(StoreWAL.java:46)
at org.mapdb.DBMaker.makeEngine(DBMaker.java:582)
at org.mapdb.DBMaker.make(DBMaker.java:556)
at epic.util.CacheBroker$ActualCache.db$lzycompute(Cache.scala:50)
at epic.util.CacheBroker$ActualCache.db(Cache.scala:49)
at epic.util.CacheBroker.db(Cache.scala:38)
at epic.util.CacheBroker$CacheMap.liftedTree1$1(Cache.scala:103)
at epic.util.CacheBroker$CacheMap.theMap(Cache.scala:100)
at epic.util.CacheBroker$CacheMap.getOrElseUpdate(Cache.scala:151)
at epic.constraints.CachedChartConstraintsFactory.constraints(CachedChartConstraintsFactory.scala:31)
at epic.parser.models.NeuralParserTrainer$$anonfun$trainParser$1.apply(NeuralParserTrainer.scala:113)
at epic.parser.models.NeuralParserTrainer$$anonfun$trainParser$1.apply(NeuralParserTrainer.scala:113)
at scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:115)
at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:62)
at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1054)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1051)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:159)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:378)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:443)
at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:426)
at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:56)
at scala.collection.parallel.ExecutionContextTasks$class.executeAndWaitResult(Tasks.scala:558)
at scala.collection.parallel.ExecutionContextTaskSupport.executeAndWaitResult(TaskSupport.scala:80)
at scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:958)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
at scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:953)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

ModelSelector is broken.

This probably has a simple answer, but loading the English model through Maven and calling it with NerSelector.loadNer breaks. However, calling it directly with epic.parser.models.en.span.EnglishSpanParser.load() works.

POM dependency

       <dependency>
            <groupId>org.scalanlp</groupId>
            <artifactId>epic-parser-en-span_2.10</artifactId>
            <version>0.1</version>
        </dependency>

Load the NerSelector

val tagger = epic.models.NerSelector.loadNer("en").get

epic.models.NerSelector.loadNer("en") returns None and explodes all over the .get

On line 24 of ModelSelector, serviceLoader.asScala yields zero elements, which causes the filter to return None.
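
The behaviour described above can be reproduced without Epic: `java.util.ServiceLoader` yields no providers when nothing on the classpath registers one for the requested service type. Note that the POM above lists the parser artifact while the call loads an NER model, which may be why nothing is found; that is a guess, not something confirmed in this thread. In the sketch below, `HypotheticalModelLoader`, `providerCount` and `requireModel` are illustrative names, not Epic's API.

```scala
import java.util.ServiceLoader

object LoaderCheck {
  // Hypothetical service interface; Epic's real loader trait is not used here.
  trait HypotheticalModelLoader

  // Counts the providers ServiceLoader can discover for a service type.
  // With no META-INF/services entry on the classpath the count is 0.
  def providerCount[S](service: Class[S]): Int = {
    val it = ServiceLoader.load(service).iterator()
    var n = 0
    while (it.hasNext) { it.next(); n += 1 }
    n
  }

  // Turns a missing model into a descriptive error instead of calling .get on None.
  def requireModel[A](loaded: Option[A], language: String): Either[String, A] =
    loaded.toRight(s"no model found for language '$language'; is the model jar on the classpath?")
}
```

Wrapping the selector call as `LoaderCheck.requireModel(epic.models.NerSelector.loadNer("en"), "en")` would give a readable message in place of the `None.get` failure.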

Training a Parser: Error while Indexing BinaryRule

Hi,

Experiencing problems using the parser trainer. My goal is to train a parser with CRAFT's treebank for the biology domain; not sure if the layout of CRAFT's treebank is supported by epic or not. Tried to start out by training a parser on "smallbank" -- no luck. Any advice?

$ java -cp target/scala-2.11/epic-assembly-0.2.jar epic.parser.models.ParserTrainer \
   --treebankType simple \
   --treebank.path "src/main/resources/smallbank" \
   --modelFactory epic.parser.models.SpanModelFactory \
   --cache.path constraints.cache \
   --opt.useStochastic true \
   --opt.regularization 1.0
[main] INFO epic.parser.models.ParserTrainer$ - Training Parser...
Exception in thread "main" java.lang.RuntimeException: 
error while indexingBinaryRule(VP[^SINV], VBN[^VP], PP[^VP]) to 
BinaryRule(VP, VBN, PP)0
at epic.parser.projections.ProjectionIndexer$$anonfun$apply$5.apply(ProjectionIndexer.scala:115)
...

new release for "epic-parser-en-span"

Hi, David,

May I ask whether there is a release of "epic-parser-en-span" newer than "2014.9.15"? If not, would you please make one for both Scala 2.10 and 2.11?

Thanks!

French models can't be used with last Epic version

When I call the French model of the dependency parser I get:

Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ...  done [2,8 sec].
Exception in thread "main" java.io.InvalidClassException: epic.parser.models.ParserTrainer$$anonfun$2; local class incompatible: stream classdesc serialVersionUID = 0, local class serialVersionUID = 5531977503861241212
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at epic.models.ClassPathModelLoader.load(ModelLoader.scala:23)
    at epic.models.DelegatingLoader.load(ModelLoader.scala:33)
    at epic.models.ParserSelector$.loadParser(ParserSelector.scala:15)
    at prepare.data.Parser$.delayedEndpoint$prepare$data$Parser$1(Parser.scala:19)
    at prepare.data.Parser$delayedInit$body.apply(Parser.scala:6)

This seems closely related to #26.

Is there a way to download new models?

Kind regards,
Michaël

correct models for 0.4-SNAPSHOT

Which Maven dependency for the models (POS & NER) should I pick with the 0.4-SNAPSHOT release?

I tried:
"org.scalanlp" %% "english" % "2015.1.25"

and other older releases, but when I do

 val ner = epic.models.NerSelector.loadNer("en").get 

I keep on getting serialization errors.

Can't build

Hello, I just cloned the master branch and tried to build epic as the README instructs: sbt assembly

Then I got the following error:
module not found: org.scalanlp#sbt-jflex_2.10_0.13;0.1-SNAPSHOT
...
org.scalanlp:sbt-jflex_2.10_0.13:0.1-SNAPSHOT (sbtVersion=0.13, scalaVersion=2.10)

I'm also unable to use any of the epic versions published to https://oss.sonatype.org/content/repositories/snapshots/org/scalanlp/ (they depend on Breeze 0.8-SNAPSHOT, which sbt can't find), so I'm willing to build from source, but as you can see that isn't possible either.

Failed to train CTB

I tried to retrain the parser on CTB by converting original .fid files to .mrg files and adding the parameter --treebank.treebankType chinese, but it failed and here is the error message.

$ java -Xmx47g -cp path/to/assembly.jar epic.parser.models.NeuralParserTrainer --cache.path constraints.cache --opt.useStochastic -treebank.path path/to/ctb/ --treebank.treebankType chinese --evalOnTest --includeDevInTrain --trainer.modelFactory.annotator epic.trees.annotations.PipelineAnnotator --ann.0 epic.trees.annotations.FilterAnnotations --ann.1 epic.trees.annotations.ForgetHeadTag --ann.2 epic.trees.annotations.Markovize --ann.2.horizontal 0 --ann.2.vertical 0 --modelFactory epic.parser.models.PositionalNeuralModelFactory --threads 8
[main] INFO epic.parser.models.NeuralParserTrainer$ - Training Parser...
Exception in thread "main" java.lang.RuntimeException: error while indexingBinaryRule(@QP[^DP],QP[^QP],CC[^QP]) to BinaryRule(@QP,QP,CC)0
        at epic.parser.projections.ProjectionIndexer$$anonfun$apply$5.apply(ProjectionIndexer.scala:114)
        at epic.parser.projections.ProjectionIndexer$$anonfun$apply$5.apply(ProjectionIndexer.scala:110)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at epic.parser.projections.ProjectionIndexer$.apply(ProjectionIndexer.scala:110)
        at epic.parser.projections.GrammarRefinements$.apply(GrammarRefinements.scala:177)
        at epic.parser.GenerativeParser$.annotated(GenerativeParser.scala:124)
        at epic.parser.GenerativeParser$.annotatedParser(GenerativeParser.scala:107)
        at epic.parser.models.NeuralParserTrainer$.trainParser(NeuralParserTrainer.scala:91)
        at epic.parser.models.NeuralParserTrainer$.trainParser(NeuralParserTrainer.scala:35)
        at epic.parser.ParserPipeline$class.trainParser(ParserPipeline.scala:92)
        at epic.parser.models.NeuralParserTrainer$.trainParser(NeuralParserTrainer.scala:35)
        at epic.parser.ParserPipeline$class.main(ParserPipeline.scala:107)
        at epic.parser.models.NeuralParserTrainer$.main(NeuralParserTrainer.scala:35)
        at epic.parser.models.NeuralParserTrainer.main(NeuralParserTrainer.scala)

So could you help me to find where the mistake is? Thanks a lot!

"Parsing" with gold segmentation

Hi @dlwh,

first off, thanks for making Epic available, it's a great tool!

I have a "parsing" problem where I would like to restrict the search space to a fully segmented binarized tree and would like to use the neural parser to only do the labelling for me (the trees come from a customized treebank that I pass to epic for training). I have been trying to do this via constraints (using GoldConstraintsFactory) in NeuralParserTrainer but so far without success. Is there any established/recommended way for using "gold spans" and only letting Epic do the labelling?

Jo

PL parser doesn't work

The trained Polish parser from http://www.scalanlp.org/models/ does not work with the current master branch. I tried to run:
java -Xmx6g -cp target/scala-2.11/epic-assembly-0.4-SNAPSHOT.jar epic.parser.ParseText --model epic-parser-pl-span_2.10-2014.6.3-SNAPSHOT.jar --nthreads 4 exampleArticle.txt

and I get:
Couldn't deserialize model due to exception, epic.parser.models.ParserTrainer$$anonfun$2; local class incompatible: stream classdesc serialVersionUID = 0, local class serialVersionUID = 5531977503861241212. Trying classPathLoad...
Exception in thread "main" java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at epic.parser.ParseText$.classPathLoad(ParseText.scala:21)
at epic.parser.ParseText$.classPathLoad(ParseText.scala:11)
at epic.util.ProcessTextMain$class.main(ProcessTextMain.scala:47)
at epic.parser.ParseText$.main(ParseText.scala:11)
at epic.parser.ParseText.main(ParseText.scala)

which says that the class versions do not match.

I also downloaded the JAR from the Maven repository and got the same error.

weird behavior when using LBFGS

java -Xmx1g -cp target/scala-2.10/epic-assembly-0.2-SNAPSHOT.jar epic.parser.models.ParserTrainer --treebank.path wsj --modelFactory epic.parser.models.SpanModelFactory --maxLength 10

org.scalanlp#breeze_2.11;0.12-SNAPSHOT: not found

I just cloned Epic and went to try and run "sbt assembly" and got the following error message:

error sbt.ResolveException: unresolved dependency: org.scalanlp#breeze_2.11;0.12-SNAPSHOT: not found

So I went ahead and installed Breeze just fine and reran the Epic build, but still got the same error. Suggestions?

Can not find the path to the Model file

I got the model from http://www.scalanlp.org/models/ and put it in the project home directory.

Then:
val tagger = epic.models.deserialize[CRF[AnnotatedLabel, String]]("model.ser.gz")

But I still get an exception:
Exception in thread "main" java.lang.RuntimeException: Could not find model model.ser.gz in path /home/myuser/projects/nlp/epic-mast

But the model file is definitely there
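
A relative path such as "model.ser.gz" is resolved by the JVM against its working directory, which under an IDE or sbt is not necessarily the project home. A minimal sketch for checking where the JVM actually looks is below; `ModelPath`, `resolve` and `report` are illustrative names, and whether `deserialize` resolves its argument the same way is an assumption.

```scala
import java.io.File

object ModelPath {
  // Resolves a possibly relative path against the JVM's working directory.
  def resolve(path: String): File = new File(path).getAbsoluteFile

  // Reports the absolute location and whether a regular file exists there.
  def report(path: String): String = {
    val f = resolve(path)
    s"${f.getPath} exists=${f.isFile}"
  }
}
```

Printing `ModelPath.report("model.ser.gz")` just before the `deserialize` call would show whether the file is visible from the process's working directory.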

Please make the epic-pos-en model available

If you have time, please make the epic-pos-en model available. I have the Treebank data, but it would be easier to not build it myself (and other people might use it if it were available).

Development environment

Hi!
I'm trying to set up the project in IntelliJ 14 with the Scala plugin and sbt 0.13.7.
IntelliJ reports a lot of errors in build.sbt as well as in the classes. However, the project compiles fine; I can run sbt assembly and get a jar.

Please describe your development environment in the wiki (IDE, additional tools, and versions) so that the project can be loaded without errors.
