Model Matrix

Machine Learning Feature Engineering

An alternative to the Spark machine learning pipeline feature extractors, focused on building sparse feature vectors.

Where to get it

The Model Matrix workflow is built around a command line interface; however, you can also use the client library to apply Model Matrix transformations to a DataFrame in your own application.

To get the latest version of Model Matrix, add the following resolver to your SBT build:

resolvers += "Collective Media Bintray" at "https://dl.bintray.com/collectivemedia/releases"

And add the following library dependency:

libraryDependencies += "com.collective.modelmatrix" %% "modelmatrix-client" % "0.0.1"

Developing

A local PostgreSQL database is required for integration tests.

Database config

Model Matrix can use either H2 or Postgres as its database.

modelmatrix-cli is configured to use a Postgres database by default. The database configuration is located in modelmatrix-cli/src/main/resources/reference.conf:

url      = "jdbc:postgresql://localhost/modelmatrix"  
user     = "modelmatrix"  
password = "modelmatrix"  

modelmatrix-core unit and integration tests are configured to use H2 (in memory):

  • Integration test database configuration is located in modelmatrix-core/src/it/resources/database_it.conf
  • Unit test database configuration is located in modelmatrix-core/src/test/resources/database_test.conf

N.B.: The DATABASE_TO_UPPER=FALSE setting is required for H2 because of compatibility issues between Flyway and Slick.

Install schema

Schema migrations are managed by Flyway.

If you want to add a test that expects the Model Matrix schema and tables to be present, implement the trait com.collective.modelmatrix.catalog.InstallSchemaBefore.

Testing

Unit and integration tests automatically create or update the schema and use H2 by default:

sbt test
sbt it:test

If you want to test against Postgres, you can override the file-based database configuration by running sbt with the following runtime system properties:

sbt -Dmodelmatrix.catalog.db.url="jdbc:postgresql://localhost/modelmatrix?user=modelmatrix&password=modelmatrix" -Dmodelmatrix.catalog.db.driver="org.postgresql.Driver" test
sbt -Dmodelmatrix.catalog.db.url="jdbc:postgresql://localhost/modelmatrix?user=modelmatrix&password=modelmatrix" -Dmodelmatrix.catalog.db.driver="org.postgresql.Driver" it:test
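The two commands above differ only in the sbt task they run, so the shared part can be factored out. The following is a minimal sketch, not something shipped with the repository: the function names `jdbc_url` and `sbt_postgres` and the `PG_*` environment variables are hypothetical, while the property names, driver class, and default values are taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository: builds the JDBC URL
# from its parts and runs an sbt task with the Postgres overrides.

# Defaults mirror the values shown in this README; override via environment.
PG_HOST="${PG_HOST:-localhost}"
PG_DB="${PG_DB:-modelmatrix}"
PG_USER="${PG_USER:-modelmatrix}"
PG_PASSWORD="${PG_PASSWORD:-modelmatrix}"

# Print the JDBC URL in the same form as the commands above.
jdbc_url() {
  printf 'jdbc:postgresql://%s/%s?user=%s&password=%s\n' \
    "$PG_HOST" "$PG_DB" "$PG_USER" "$PG_PASSWORD"
}

# Run the given sbt task(s), e.g. test or it:test, against Postgres.
sbt_postgres() {
  sbt "-Dmodelmatrix.catalog.db.url=$(jdbc_url)" \
      "-Dmodelmatrix.catalog.db.driver=org.postgresql.Driver" \
      "$@"
}
```

After sourcing the script, `sbt_postgres test` and `sbt_postgres it:test` should be equivalent to the two commands above. Quoting each `-D` argument as a whole keeps the `&` in the URL from being interpreted by the shell.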

N.B.: This requires Postgres running locally, with a schema named modelmatrix created and owned by the user modelmatrix.
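The README does not say how the local database was provisioned, so the following is only one possible setup, sketched under assumptions: the role, password, and database name come from the connection settings shown earlier, the function name `modelmatrix_setup_sql` is hypothetical, and a separate schema named modelmatrix is created on the assumption that "schema" is meant literally.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository: prints the SQL for one
# possible local Postgres setup matching the settings in this README.
modelmatrix_setup_sql() {
  cat <<'SQL'
CREATE ROLE modelmatrix LOGIN PASSWORD 'modelmatrix';
CREATE DATABASE modelmatrix OWNER modelmatrix;
\connect modelmatrix
CREATE SCHEMA modelmatrix AUTHORIZATION modelmatrix;
SQL
}
```

The output can be piped into psql as a superuser, for example `modelmatrix_setup_sql | psql -U postgres`, assuming your installation has a superuser role named postgres. Printing the statements first, rather than running them directly, lets you review or adapt them before anything is changed.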

Assembling CLI application

To run the CLI, you first need to build the application distribution (zip or xz-compressed tarball):

sbt universal:packageBin        
sbt universal:packageXzTarball

The application is packaged into modelmatrix-cli/target/universal.

Git Workflow

This repository's workflow is based on "A successful Git branching model", with two main branches that have an infinite lifetime:

  • master
  • develop

The master branch at origin should be familiar to every Git user. Parallel to the master branch, another branch exists called develop.

We consider origin/master to be the main branch where the source code of HEAD always reflects a production-ready state.

We consider origin/develop to be the main branch where the source code of HEAD always reflects a state with the latest delivered development changes for the next release. Some would call this the “integration branch”. This is where any automatic nightly builds are built from.

Further details are available in the "A successful Git branching model" blog post.

Contributors

ezhulenev, jpocalan-collective
