Trove

🔥 Deploy machine learning models in Ruby (and Rails)

Works great with XGBoost, Torch.rb, fastText, and many other gems

Installation

Add this line to your application’s Gemfile:

gem "trove"

And run:

bundle install
trove init

And configure your storage in .trove.yml.

Storage

Amazon S3

Create a bucket and enable object versioning.

Next, set up your AWS credentials. You can use the AWS CLI:

pip install awscli
aws configure

Or environment variables:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=...

IAM users need:

  • s3:GetObject and s3:GetObjectVersion to pull files
  • s3:PutObject to push files
  • s3:ListBucket and s3:ListBucketVersions to list files and versions
  • s3:DeleteObject and s3:DeleteObjectVersion to delete files

Here’s an example policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Trove",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "s3:ListBucket",
                "s3:ListBucketVersions",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/trove/*"
            ]
        }
    ]
}

If your production servers only need to pull files, only give them s3:GetObject and s3:GetObjectVersion permissions.
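
As a rough illustration of that advice, the sketch below builds a pull-only policy document in Ruby and prints it as JSON. The bucket name `my-bucket` and the `trove` prefix are placeholders copied from the example policy above, and the `pull_only_policy` helper and the `TrovePullOnly` Sid are hypothetical names used only for this example.

```ruby
require "json"

# Build a pull-only IAM policy document as a Ruby hash.
# bucket and prefix are placeholders taken from the example policy above;
# pull_only_policy is a hypothetical helper, not part of Trove.
def pull_only_policy(bucket: "my-bucket", prefix: "trove")
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        "Sid" => "TrovePullOnly",
        "Effect" => "Allow",
        # Only the two actions needed to pull files and specific versions
        "Action" => ["s3:GetObject", "s3:GetObjectVersion"],
        "Resource" => ["arn:aws:s3:::#{bucket}/#{prefix}/*"]
      }
    ]
  }
end

puts JSON.pretty_generate(pull_only_policy)
```

Both actions apply to objects, so this sketch lists only the object-level resource.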

How It Works

Git is great for code, but it’s not ideal for large files like models. Instead, we use an object store like Amazon S3 to store and version them.

Trove creates a trove directory for you to use as a workspace. Files in this directory are ignored by Git but can be pushed to and pulled from the object store. By default, files are tracked in .trove.yml to make it easy to deploy specific versions with code changes.

Getting Started

Use the trove directory to save and load models.

# training code
model.save_model("trove/model.bin")

# prediction code
model = FastText.load_model("trove/model.bin")
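
Loading a model from disk on every prediction can be slow, so one option is to load it once per process and reuse it. The sketch below is a minimal, generic wrapper for that. `LazyModel` is a hypothetical class, not part of Trove, and the demonstration uses a stand-in loader so it runs without any model file.

```ruby
# Wraps a slow loader so the model is loaded at most once per process.
# LazyModel is a hypothetical helper for illustration, not part of Trove.
class LazyModel
  def initialize(&loader)
    raise ArgumentError, "a loader block is required" unless loader

    @loader = loader
    @mutex = Mutex.new
    @model = nil
  end

  # Returns the cached model, calling the loader on first use only.
  def get
    @mutex.synchronize { @model ||= @loader.call }
  end
end

# Stand-in loader for demonstration. In an app using the fastText gem,
# the block could instead be: FastText.load_model("trove/model.bin")
load_count = 0
model = LazyModel.new do
  load_count += 1
  { name: "stub model" }
end

model.get
model.get
puts "loader ran #{load_count} time(s)"
```

Because the loader runs on first use rather than at boot, the model file only needs to exist by the time the first prediction is made.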

When a model is ready, push it to the object store with:

trove push model.bin

And commit the changes to .trove.yml. The model is now ready to be deployed.

Deployment

We recommend pulling files during the build process.

Make sure your storage credentials are available in the build environment.

Heroku and Dokku

Add to your Rakefile:

Rake::Task["assets:precompile"].enhance do
  Trove.pull
end

This will pull files at the very end of the asset precompile. Check the build output for:

remote:        Pulling model.bin...
remote:        Asset precompilation completed (30.00s)

Docker

Add to your Dockerfile:

RUN bundle exec trove pull

Commands

Push a file

trove push model.bin

Pull all files in .trove.yml

trove pull

Pull a specific file (uses the version in .trove.yml if present)

trove pull model.bin

Pull a specific version of a file

trove pull model.bin --version 123

Delete a file

trove delete model.bin

List files

trove list

List versions

trove versions model.bin

Ruby API

You can use the Ruby API in addition to the CLI.

Trove.push(filename)
Trove.pull
Trove.pull(filename)
Trove.pull(filename, version: version)
Trove.delete(filename)
Trove.list
Trove.versions(filename)

This makes it easy to perform operations from code, IRuby notebooks, and the Rails console.
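
As one possible use of the API from code, the sketch below pulls a file only when it is missing from the workspace. `ensure_trove_file` is a hypothetical helper. It assumes the workspace is the `trove` directory described earlier, and its default `puller` calls `Trove.pull(filename)` from the list above. The puller is injectable so the demonstration runs without the gem or storage credentials.

```ruby
require "tmpdir"

# Pull a file into the workspace only if it is not already present.
# ensure_trove_file is a hypothetical helper, not part of Trove.
# The default puller assumes the trove gem is loaded.
def ensure_trove_file(filename, dir: "trove", puller: ->(name) { Trove.pull(name) })
  path = File.join(dir, filename)
  puller.call(filename) unless File.exist?(path)
  path
end

# Demonstration with a fake puller that writes a placeholder file.
Dir.mktmpdir do |dir|
  pulled = []
  fake_puller = lambda do |name|
    pulled << name
    File.write(File.join(dir, name), "placeholder")
  end

  ensure_trove_file("model.bin", dir: dir, puller: fake_puller) # file missing, so it pulls
  ensure_trove_file("model.bin", dir: dir, puller: fake_puller) # file present, so it skips
  puts "pull attempts: #{pulled.size}"
end
```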

Automated Training

By default, Trove tracks files in .trove.yml to make it easy to deploy specific versions with code changes. However, this functionality is entirely optional. Disable it with:

vcs: false

This is useful if you want to automate training or build more complex workflows.
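
An automated training job might then look roughly like the sketch below: train, write the model into the workspace, and push it. `retrain_and_push` and its `trainer` argument are hypothetical names for illustration. The default `pusher` calls `Trove.push(filename)` from the Ruby API and assumes the trove gem is loaded. The demonstration substitutes fakes so it runs without the gem or storage access.

```ruby
require "fileutils"
require "tmpdir"

# Train a model into the workspace, then push it to the object store.
# retrain_and_push is a hypothetical helper, not part of Trove.
# In real use, dir should be the actual trove workspace directory.
def retrain_and_push(filename, trainer:, dir: "trove", pusher: ->(name) { Trove.push(name) })
  FileUtils.mkdir_p(dir)
  path = File.join(dir, filename)
  trainer.call(path) # the trainer is expected to write the model to path
  raise "training did not produce #{path}" unless File.exist?(path)

  pusher.call(filename)
  path
end

# Demonstration with a fake trainer and a fake pusher.
Dir.mktmpdir do |dir|
  pushed = []
  fake_trainer = ->(path) { File.write(path, "model bytes") }
  fake_pusher = ->(name) { pushed << name }

  retrain_and_push("model.bin", trainer: fake_trainer, dir: dir, pusher: fake_pusher)
  puts "pushed: #{pushed.join(', ')}"
end
```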

Non-Ruby

Trove can be used in non-Ruby projects as well.

gem install trove
trove init

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/trove.git
cd trove
bundle install

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=...
export S3_BUCKET=my-bucket

bundle exec rake test

trove's Issues

Loading model on initialization

Hey thanks for the gem. Really useful.

Just curious whether you've had a need to preload the model and keep it in memory, so that you don't have to re-read the model file every time you want to parse some text. There is a noticeable delay of a second or two.

I tried loading the model into a constant in a Rakefile after fetching the file from AWS, as you documented, but it's not accessible in the app afterwards. And I can't load the model on initialization, since the file is not present until after the precompilation step.

Can this be used for more than ML models?

I stumbled on this while looking for a simple way to push and pull blobs of data.

Trove seems to do this with models, but from a cursory glance it can be used to pass up and down sqlite files as well (for example).

Cannot pull

  1. I push the model:
$ bundle exec trove push lid.176.ftz
Already up-to-date
  2. I remove the model:
rm trove/lid.176.ftz
  3. I pull the model:
$ bundle exec trove pull lid.176.ftz
File not found

Expected result: the model appears as trove/lid.176.ftz.
Actual result: I see the "File not found" error.

The model was indeed pushed to AWS:

$ bundle exec trove list
FILENAME           SIZE      UPDATED
lid.176.ftz        916KB     5 days ago

Ideas

  • Support more storage providers
    • Google Cloud Storage
  • Support latest version identifier

Heroku deployment question

Hi - I'm using the NER and sentiment analysis models, and together they push me over the Heroku slug limit. I tried taking Trove.pull out of my Rakefile and putting it in an initializer instead, but it looks like initializers get loaded during compilation, and the deploy fails.

Any tips on how to get around this?
