
langpacks's Introduction

LangPacks

LangPack: A language-specific package that encompasses steps to set up, build, and run language-specific algorithms.

LangServer: A server that serves a LangPack's bin/pipe runner in a way that emulates a light-weight version of the Algorithmia API.

LangServer

LangServer can be summed up like this: it's about 1000 lines of code emulating a simple API that looks and feels like our API server's API for calling an algo (minus features like auth).

It translates that HTTP interface into STDIO-based input and a named pipe for output (defined in langpack_guide.md; a sketch of the runner side of this protocol follows the list below). Some considerations:

  • It was important that multiple subsequent requests could reuse the same process.
  • The focus was a standard that every language could easily implement. Having each language implement a web server was considered, but there was uncertainty about how easy that would be for languages like R, and any change to how it integrates with the rest of the back end would then need to be reimplemented once per language (e.g., if we wanted to expose stdout/stderr via websockets, it would only need to be implemented in LangServer, not in each LangPack).
  • I also considered other IPC mechanisms, e.g., POSIX message queues, but struggled to gain confidence they would work well for all languages (R again being a concern).
  • Ultimately, every language has very simple ways of interacting with files (including stdio and named pipes), so stdio plus a named pipe (fifo) were chosen because they are simple to work with in any language. The fifo choice also allowed us to leave stdout/stderr intact.
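
To make the contract concrete, here is a minimal sketch of what a language's bin/pipe runner could look like, written in Python. It assumes the conventions referenced elsewhere in this repo (one JSON request per line on stdin, a JSON response written to the /tmp/algoout fifo, a PIPE_INIT_COMPLETE line printed to stdout once loading finishes); the exact request/response fields shown are illustrative, and langpack_guide.md remains the authoritative spec.

#!/usr/bin/env python
# Hypothetical minimal pipe runner sketch -- not the official bin/pipe.
import json
import sys

ALGOOUT = "/tmp/algoout"  # named pipe that LangServer reads responses from

def apply(input):
    # The algorithm author's entry point.
    return "hello {}".format(input)

# Signal readiness; LangServer filters this line out of the captured stdout.
print("PIPE_INIT_COMPLETE")
sys.stdout.flush()

# One JSON request per line on stdin; the same process handles every request.
for line in sys.stdin:
    request = json.loads(line)
    try:
        result = apply(request.get("data"))
        response = {"result": result, "metadata": {"content_type": "text"}}
    except Exception as e:
        response = {"error": {"message": str(e), "error_type": "AlgorithmError"}}
    # Writing to the fifo hands the response back, leaving stdout/stderr free
    # for the algorithm's own prints.
    with open(ALGOOUT, "w") as algoout:
        algoout.write(json.dumps(response) + "\n")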

Weirder details:

  1. LangServer spins up two threads that collect stdout/stderr and recombine them into the result (a rough sketch of the pattern follows this list).
  2. LangServer has two modes: sync vs. async. Sync is how it was originally built, to be easy to debug as a simple web server that looks like the API server. Async mode was added to integrate with dockherder, so that dockherder could call it and forget about it until a callback informed dockherder that the work was complete.
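
Point 1 follows a standard pattern for draining a child process's output; here is a rough Python sketch of the idea (the actual implementation lives in LangServer's Rust code, so treat this as illustration only):

# Illustrative sketch of collecting a child process's stdout/stderr on
# separate threads so the child never blocks on a full pipe buffer.
import subprocess
import threading

def collect(stream, buffer):
    for line in iter(stream.readline, b""):
        buffer.append(line.decode("utf-8", errors="replace"))
    stream.close()

proc = subprocess.Popen(
    ["bin/pipe"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
stdout_lines, stderr_lines = [], []
for stream, buf in ((proc.stdout, stdout_lines), (proc.stderr, stderr_lines)):
    threading.Thread(target=collect, args=(stream, buf), daemon=True).start()

# Requests are written to the child's stdin and responses read from the fifo;
# when a call completes, the collected output is folded into the result.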

Building LangServer(s) (Partially deprecated)

Disclaimer: The intent was to prototype LangServer in Rust (because I knew it better) and eventually rewrite it in Go (lower barrier to entry), but it turned into an official project before the rewrite happened. So, for now: start by installing the latest stable Rust, and then:

bin/build langserver     # just builds the base LangServer images (default)
bin/build <lang>         # builds language-specific image (and deps)
bin/build all            # builds all images for all langpacks
bin/build single-runner  # builds 1 image containing the LangServer runner and running setup on all langpacks
bin/build single-builder # builds 1 image containing the LangServer builder and running setup on all langpacks
bin/build single         # builds the single-runner and single-builder

Note: the initial plan is to NOT use these images, but they are helpful for implementing and testing langpacks locally, and they provide some "code documentation" for how setup/build/pipe/langserver all fit together.

Building LangServer with Libraries

We're in the process of refactoring the way that images get generated and algorithms are compiled. The initial approach created an algorithm.zip containing a compiled binary (or source, for interpreted languages) along with any needed dependencies. Additionally, a number of libraries were installed side by side, which made certain scenarios difficult to debug and made it hard to evolve individual languages independently. In particular, some libraries required certain environment variables to be set during install/compilation but not during execution, and it was difficult to determine which variables or even system packages were needed by which libraries.

The new process (still experimental) involves templating a Dockerfile based on a set of desired libraries (which could be language runtimes/buildtimes, services, or deep-learning frameworks) and then building an image with just that subset of libraries. Ideally, a library's install.sh script should be able to run on an Ubuntu 16.04 host/VM the same as it does during a Docker build (this greatly eases writing the install script).
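
Conceptually, the templating step amounts to something like the following simplified Python sketch. This is not the actual bin/build-template: the file layout it reads and the {{LIBRARIES}} placeholder are assumptions for illustration; only the install.sh and config.json conventions come from the tool's help text below.

# Simplified sketch of templating a Dockerfile from library directories.
# Not the actual bin/build-template; the placeholder name is an assumption.
import json
import os

def render_dockerfile(template_path, library_dirs, output_path):
    install_steps = []
    env = {}
    for lib in library_dirs:  # order matters: libraries install in the order given
        install_steps.append("COPY {0} /tmp/{0}".format(lib))
        install_steps.append("RUN /tmp/{0}/install.sh".format(lib))
        config_path = os.path.join(lib, "config.json")
        if os.path.exists(config_path):
            with open(config_path) as f:
                env.update(json.load(f).get("env_variables", {}))
    env_lines = ["ENV {} {}".format(k, v) for k, v in env.items()]
    with open(template_path) as f:
        template = f.read()
    dockerfile = template.replace("{{LIBRARIES}}", "\n".join(install_steps + env_lines))
    with open(output_path, "w") as f:
        f.write(dockerfile)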

Algorithms no longer have a single bin/build script but two separate scripts: install-dependencies, which performs the appropriate pip/npm/cargo/etc. install or fetch, and install-algorithm, which compiles or bundles the algorithm source to /opt/algorithmia.
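
For a Python-based algorithm, the two scripts might amount to roughly the following (a hypothetical sketch written in Python for illustration; the real scripts and paths vary by langpack):

# Hypothetical sketches of the two per-algorithm scripts for a Python langpack.

# --- bin/install-dependencies: fetch whatever the dependency file declares ---
import subprocess
subprocess.check_call(["pip", "install", "-r", "requirements.txt"])

# --- bin/install-algorithm: bundle the algorithm source to /opt/algorithmia ---
import shutil
shutil.copytree("src", "/opt/algorithmia/src")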

Templating and building a Dockerfile:

$ ./bin/build-template --help
usage: build-template [-h] [-l LIBRARY] [-p TEMPLATE] -t TAG [-o OUTPUT]
                      [-u USER_ID]

Creates a Dockerfile, templating in any needed files and environment variables
to set up different libraries. Libraries will be installed _in order
specified_ so if one needs to be installed before another, then list them in
that order on the command line

Will then run a docker build and tag operation

Library directories should include the following:
  - install.sh : a script to install the library
  - config.json (optional): a json file containing configuration such as:
    - env_variables: dictionary of environment variables to
      set at the end of execution
    - install_scripts: list of order to run scripts in to create
      multiple layers (particularly for testing)

optional arguments:
  -h, --help            show this help message and exit
  -l LIBRARY, --library LIBRARY
                        library directories to include in generating this
                        Dockerfile
  -p TEMPLATE, --template TEMPLATE
                        location of the Dockerfile template file
  -t TAG, --tag TAG     tag to label the docker image once produced
  -o OUTPUT, --output OUTPUT
                        name of file to write output to
  -u USER_ID, --user_id USER_ID
                        user id to use for the "algo" user, defaults to
                        current user

Examples:

# Create a langpack consisting only of python2 and tag it as algorithmia/langpack-runner:python2
./bin/build-template -u 1001 -t algorithmia/langpack-runner:python2 -l python2 -o docker/templated/Dockerfile.python2

# Create a langpack with NVIDIA GPU drivers, python and caffe and tag it as algorithmia/langpack-runner:python2-caffe
./bin/build-template -u 1001 -t algorithmia/langpack-runner:python2-caffe -l gpu-driver -l python2 -l caffe -o Dockerfile.python2-advanced

Building an algorithm

  1. Bind mount an algorithm working directory to /tmp/build - docker run -it -v `pwd`:/tmp/build algorithmia/langpack-runner:python2
  2. Run /tmp/build/bin/install-dependencies
  3. Run /tmp/build/bin/install-algorithm
  4. Outside of the container, commit the image with the appropriate entrypoint - docker commit -c 'ENTRYPOINT /bin/init-langserver' -c 'WORKDIR /opt/algorithm' <container_id> algorithmia/<algorithm_name>

Running an algorithm

  1. docker run --rm -ti -p 9999:9999 algorithmia/<algorithm_name>
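
Once the container is up, you can exercise it over HTTP. A minimal Python sketch, assuming LangServer accepts a plain POST of JSON input at the root path on port 9999 (the exact route, headers, and response shape are assumptions here -- check LangServer's source or langpack_guide.md):

# Minimal sketch of calling a locally running LangServer (route is an assumption).
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:9999/",
    data=json.dumps("world").encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # e.g. {"result": "hello world", "metadata": {...}}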

Building an algorithm with a langbuilder image

Bind mount an algorithm working directory to /tmp/build and start the langbuilder-<lang> image. It should create an algorithm.zip that can be served by the init-langserver script (containing bin/pipe, the algorithm, and any dependencies):

docker run --rm -it -v `pwd`:/tmp/build algorithmia/langbuilder-<lang>

Note: unless you're using Docker user namespacing, don't be shocked if writes to the bind mount result in permission errors.

Running LangServer

The init-langserver script provides two ways to run an algorithm:

Bind mount algorithm.zip to /tmp/algorithm.zip

Note: Make sure you use the absolute path to the algorithm.zip.

docker run --rm -it -v /path/to/algorithm.zip:/tmp/algorithm.zip -p 9999:9999 algorithmia/langserver-<lang>

Bind mount algorithm directory to /tmp/algorithm

docker run --rm -it -v `pwd`:/tmp/algorithm -p 9999:9999 algorithmia/langserver-<lang>

Contributing

Bonus 🌮🌮tacos🌮🌮 for you if you write a LangPack.

More to come...


langpacks's Issues

Importing a library from Anaconda Python times out the algorithm. [python2]

We use Anaconda Python to support the majority of machine learning libraries in Python. When I try to import the package gensim, the algorithm times out.

This is the code I used:

import Algorithmia
import gensim

def apply(input):
    return "hello {}".format(input)

This is the error I always get:

Error: Operation timed out after 300000 millis

I haven't changed the dependency file; it uses the default langpack Python dependency file:

algorithmia>=1.0.0
six

MemoryError in Python 2.7 Langpack

I get a Python error when I try to run memory-intensive methods for the word2vec algorithm.

Locally, when I load the model into memory it takes around 8.3 GB and works without much of a problem.

How can we address this issue?

No error when R package install fails.

This is very much related to #24, but I am getting errors when loading R packages. If the install of a package fails at build time, shouldn't this break the build process with the error propagated from R? Or else, is there a build log that the user can inspect?

Cleaning up stdout from R

We filter out the PIPE_INIT_COMPLETE message from an algorithm's stdout to keep it a little cleaner; however, R prints it in a strange format, so it doesn't get filtered:
[1] "PIPE_INIT_COMPLETE"


More reason to move that message to a separate channel (or to algoout?)

R version default

I don't know if this is the most desirable behaviour for an R build or not, but could the latest version of R be the default, with the user able to override it with a specific version if needed? This might also reduce package installation problems.

RunnerOutput error type does not match langpack guide specification

The RunnerOutput object is defined here:

Failure { error: ErrorMessage },

The Output section of bin/pipe makes no mention of an error key:
https://github.com/algorithmiaio/langpacks/blob/master/langpack_guide.md#binpipe

In the existing Python runtime, you can see an encapsulating error key that wraps the error object, here:
https://github.com/algorithmiaio/langpacks/blob/master/libraries/python3-runtime/context/pipe#L66-L77

stdout/stderr control characters aren't properly escaped

stdout and stderr are parsed as lossy UTF8 and inserted as Value::String(..) exactly as output by the user.

However, if stdout/stderr contains an ASCII control code, langserver doesn't properly escape it, and for now dockherder turns around and throws something like Unable to parse work output: Illegal unquoted character ((CTRL-CHAR, code 14)): has to be escaped using backslash to be included in string value

It looks like we need only wait on serde-rs/json#58 to be merged, or perhaps serde-rs/json#66.
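
For reference, a standards-compliant JSON encoder escapes those control characters rather than passing them through; a quick Python illustration of the behaviour langserver needs:

# stdout captured from a user's process may contain ASCII control characters,
# e.g. code 14 (shift-out).
import json

raw_stdout = "progress\x0e50%"

# A compliant encoder escapes the control character as \u000e, producing a
# string that downstream parsers (like dockherder's) will accept.
print(json.dumps({"stdout": raw_stdout}))
# prints: {"stdout": "progress\u000e50%"}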

Trouble installing R packages via rip.py

I am trying to install these R packages by adding the following lines to packages.txt:

dplyr
tidytext

When I subsequently compile and run the algorithm, I get either:

Error: Failed to start algorithm - Loading required package: methods Error in library("tidytext") : there is no package called ‘tidytext’ Calls: source -> withVisible -> eval -> eval -> library Execution halted Failed to load: exited with code 1

or:

Error: Failed to start algorithm - Loading required package: methods Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : namespace ‘R6’ 2.2.0 is being loaded, but >= 2.2.2 is required Error: package or namespace load failed for ‘dplyr’ Execution halted Failed to load: exited with code 1

depending on which package I load first. I started to go through a combination of installations from the archives of those two packages (sometimes older versions installed and ran fine), but is there something in the build workflow that could be breaking these installations?
