creg's People

Contributors

brendano, nschneid, redpony, vchahun

creg's Issues

Proposal: composite labels

Currently, responses are always atomic. However, for some applications it would be nice to represent labels (categorical responses) with limited structure such that features are extracted over parts of the structure, as well as the full label. We are still talking about making a single prediction, not structured prediction; the proposal is simply to enable a richer space of features over labels.

For example, in building a part-of-speech classifier, one could have features that score the full, fine-grained POS tag, as well as features that group together related tags into coarser categories to share statistical strength.

Define a composite label as a categorical response that is made up of multiple categorical parts, or components. The components could be characters in a string (such as a bit string), or in an explicit structure (such as a JSON data structure).

In the model, there will be a feature for every input characteristic (percept) conjoined with each full (simple or composite) label. In addition, when a percept is scored against a composite label, a feature fires for every component, conjoining the percept with that component. So if every label is a POS tag consisting of two components, a coarse component and a fine component, then for each label three features fire per percept: one with the coarse component, one with the fine component, and one with the full label.
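
For concreteness, here is a small sketch of the features that would fire for one percept scored against one two-component label; the feature-naming scheme is purely illustrative, not creg's actual internals:

    // Illustrative only: list the features that fire when one percept is scored
    // against one composite label with a coarse and a fine component.
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    int main() {
      const std::string percept = "suffix=-s";
      const std::string full_label = "NS";  // e.g. noun (coarse "N") + plural (fine "S")
      const std::vector<std::pair<std::string, std::string>> components = {
          {"coarse", "N"}, {"fine", "S"}};

      std::vector<std::string> fired;
      fired.push_back(percept + "&" + full_label);   // percept conjoined with the full label
      for (const auto& c : components)               // plus one feature per component
        fired.push_back(percept + "&" + c.first + "=" + c.second);

      for (const auto& f : fired) std::cout << f << '\n';
      // prints:
      //   suffix=-s&NS
      //   suffix=-s&coarse=N
      //   suffix=-s&fine=S
    }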

We assume the output space of the classifier will not be affected by the use of composite labels—only full labels (simple or composite) seen during training will be candidates for prediction.

Interface

Information about label structure could be (a) inferred automatically from the name of the label, (b) specified in the response file, in place of a single string name for the label, or (c) specified in some other file as a mapping of label names to richer structures. The interface proposed here will allow (a) or (b).

Let the option --composite-labels [json|string] [positional|bag] enable this feature:

  • If json (the default format) is specified, then all responses will be read as JSON objects. There are three allowed types of responses: JSON strings, lists of strings, and maps from strings to strings. JSON strings are interpreted as simple labels; in a list of strings, each string is a component; and in a map, the key-value pairs are components.
  • If string is specified, then all responses will be read as unquoted strings and treated as composite; the components are individual characters.
  • If positional (the default ordering) is specified, then any sequential composite labels (the label name in string mode, lists in json mode) are treated as ordered slot-fillers; i.e., each component is conjoined with its offset in the sequence.
  • If bag is specified, then any sequential composite labels are interpreted as bags of components; within a label, any repetition of a component will trigger an error. JSON maps are always treated as bags of key-value pairs.

Examples

If all labels are length-2 POS tags like NN = noun singular, NS = noun plural, PN = pronoun singular, PS = pronoun plural, etc., the following are equivalent ways to specify the response:

  • PN with --composite-labels string positional (note that bag would conflate the two possible uses of N!)
  • ["P", "N"] with --composite-labels json positional
  • {"coarse": "P", "fine": "N"} with --composite-labels json

If all labels are fixed-length bitstrings, the following are equivalent:

  • 01011 with --composite-labels string positional
  • ["0", "1", "0", "1", "1"] with --composite-labels json positional
  • {"0": "0", "1": "1", "2": "0", "3": "1", "4": "1"} with --composite-labels json

If the labels are clusters of morphosyntactic attributes, then with --composite-labels json bag, the two labels ["noun", "singular", "accusative"] and ["verb", "past", "singular", "causative"] would share one component in common: features associated with the "singular" component would fire for both.
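
To make the proposed semantics concrete, here is a minimal sketch of how a string-format response could be decomposed into components under positional vs. bag ordering; json mode would be analogous but would need a JSON parser, and the function name and (slot, value) representation are placeholders rather than anything already in creg:

    #include <set>
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>

    // Split a string-format composite label into (slot, value) components.
    // positional: each character is paired with its offset in the string.
    // bag: offsets are dropped, and a repeated component is an error.
    std::vector<std::pair<std::string, std::string>>
    DecomposeStringLabel(const std::string& label, bool positional) {
      std::vector<std::pair<std::string, std::string>> components;
      std::set<std::string> seen;
      for (size_t i = 0; i < label.size(); ++i) {
        const std::string value(1, label[i]);
        if (positional) {
          components.push_back({std::to_string(i), value});
        } else if (seen.insert(value).second) {
          components.push_back({"", value});
        } else {
          throw std::runtime_error("repeated component in bag label: " + label);
        }
      }
      return components;
    }

    // DecomposeStringLabel("PN", true)  -> {("0","P"), ("1","N")}
    // DecomposeStringLabel("PN", false) -> {("","P"), ("","N")}; "NN" would be an error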

better interface to weight vector?

Right now the weight vector is always a vector<double>, and for discrete regression (ordinal regression and multiclass logistic regression) the vector subscript is computed inline.

Suggested interface for the discrete weight vector: an "enhanced" vector in which the () operator is overloaded to take the class (label index) and the feature id as separate arguments. (Can this be done by subclassing or wrapping vector and still play nicely with optimization routines?) So weights(k, fid) would access the weight for a given class-feature pair, and weights(k, fid, w) would assign the value w to that weight. This should make working with the weights vector more intuitive.

An additional benefit would be that the weights instance can store extra information, e.g. whether or not one of the K classes should be treated as a background class (which affects indexing into the vector).
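
A rough sketch of what such an enhanced weight vector might look like as a wrapper around the existing flat vector<double>; the class name, the background-class handling, and the indexing scheme are illustrative assumptions, not creg's actual layout:

    #include <vector>

    // Wraps the flat weight storage so callers index weights by (class, feature id).
    class ClassWeights {
     public:
      ClassWeights(unsigned num_classes, unsigned num_features, bool has_background = false)
          : num_classes_(num_classes), num_features_(num_features),
            has_background_(has_background),
            w_((has_background ? num_classes - 1 : num_classes) * num_features, 0.0) {}

      // weights(k, fid): read the weight for class k and feature fid.
      double operator()(unsigned k, unsigned fid) const {
        if (has_background_ && k == 0) return 0.0;  // background class is pinned at zero
        return w_[Index(k, fid)];
      }

      // weights(k, fid, w): assign the value w to that weight.
      void operator()(unsigned k, unsigned fid, double w) {
        if (has_background_ && k == 0) return;  // background weights are not stored
        w_[Index(k, fid)] = w;
      }

      // The flat storage stays visible so the optimization routines are unaffected.
      std::vector<double>& raw() { return w_; }
      unsigned num_classes() const { return num_classes_; }

     private:
      unsigned Index(unsigned k, unsigned fid) const {
        const unsigned kk = has_background_ ? k - 1 : k;
        return kk * num_features_ + fid;
      }

      unsigned num_classes_, num_features_;
      bool has_background_;
      std::vector<double> w_;
    };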

Better support for feature engineering

For feature engineering, it would be nice not to have to generate a separate feature file for every combination of features to be trained on. One solution would be to allow multiple feature files to be loaded for the same training instances; the instance IDs should prevent any ambiguity, and this would also let features be extracted in parallel. Another would be a command-line option taking a regex of features to ablate.
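
As a rough illustration of the ablation idea, a filter like the following could drop matching features before training; the (name, value) feature representation and the function name are assumptions for this sketch, not creg's internals:

    #include <regex>
    #include <string>
    #include <utility>
    #include <vector>

    // Drop every feature whose name matches the ablation regex before training.
    std::vector<std::pair<std::string, double>> AblateFeatures(
        const std::vector<std::pair<std::string, double>>& feats,
        const std::string& pattern) {
      const std::regex re(pattern);
      std::vector<std::pair<std::string, double>> kept;
      for (const auto& f : feats)
        if (!std::regex_search(f.first, re)) kept.push_back(f);
      return kept;
    }

    // e.g. AblateFeatures(instance_feats, "^suffix_") keeps all non-suffix features.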

use OpenMP to parallelize learning

During learning, computing the loss and its gradient with respect to the parameters can be quite expensive, especially with large numbers of training instances or features. OpenMP (http://openmp.org/wp/), which g++ supports out of the box, could easily be used to parallelize this computation. Basically, all loops of the form

    for (unsigned i = 0; i < training.size(); ++i)

are good candidates for parallelization. From reading about OpenMP, such "reductions" will have to be implemented by giving each thread its own gradient buffer and summing the buffers at the end (although the summing could also be parallelized).
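
A minimal sketch of the per-thread buffering, compiled with -fopenmp; Instance, ComputeInstanceGradient, and AccumulateGradient are stand-ins for whatever creg actually uses, not its real API:

    #include <omp.h>
    #include <vector>

    struct Instance {};  // stand-in for creg's training-instance type

    // Stand-in for the per-instance loss/gradient computation.
    void ComputeInstanceGradient(const Instance& /*x*/,
                                 const std::vector<double>& /*weights*/,
                                 std::vector<double>* /*g*/) {}

    void AccumulateGradient(const std::vector<Instance>& training,
                            const std::vector<double>& weights,
                            std::vector<double>* gradient) {
      const int num_threads = omp_get_max_threads();
      // One buffer per thread so no two threads write to the same gradient entries.
      std::vector<std::vector<double>> buffers(
          num_threads, std::vector<double>(gradient->size(), 0.0));
      const unsigned n = training.size();
      #pragma omp parallel for
      for (unsigned i = 0; i < n; ++i)
        ComputeInstanceGradient(training[i], weights, &buffers[omp_get_thread_num()]);
      // Sum the per-thread buffers into the shared gradient (could itself be parallelized).
      for (int t = 0; t < num_threads; ++t)
        for (size_t j = 0; j < gradient->size(); ++j)
          (*gradient)[j] += buffers[t][j];
    }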
