
Clipper


Note: Clipper is no longer actively maintained. It is available as a research artifact.

What is Clipper?

Clipper is a prediction serving system that sits between user-facing applications and a wide range of commonly used machine learning models and frameworks. Learn more about Clipper and view documentation at our website http://clipper.ai.

What does Clipper do?

  • Clipper simplifies integration of machine learning techniques into user-facing applications by providing a simple, standard REST interface for prediction and feedback across a wide range of commonly used machine learning frameworks. Clipper makes product teams happy.

  • Clipper simplifies model deployment and helps reduce common bugs by using the same tools and libraries used in model development to render live predictions. Clipper makes data scientists happy.

  • Clipper improves throughput and ensures reliable millisecond latencies by introducing adaptive batching, caching, and straggler mitigation techniques. Clipper makes the infra-team less unhappy.

  • Clipper improves prediction accuracy by introducing state-of-the-art bandit and ensemble methods to intelligently select and combine predictions and achieve real-time personalization across machine learning frameworks. Clipper makes users happy.
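The adaptive batching idea above can be sketched as a loop that grows each batch only while the latency objective allows. This is a toy illustration, not Clipper's actual algorithm; the function name and the linear per-item latency model are our own assumptions.

```python
def adaptive_batch(queue, max_batch_size, est_latency_per_item_us, slo_us):
    """Toy sketch of SLO-aware batching (not Clipper's implementation):
    pull items off the queue only while the estimated batch latency
    stays within the latency objective."""
    batch = []
    while queue and len(batch) < max_batch_size:
        # Estimate latency if one more item were added to the batch.
        projected = (len(batch) + 1) * est_latency_per_item_us
        if projected > slo_us:
            break
        batch.append(queue.pop(0))
    return batch

# With a 100us budget and ~30us per item, only three items fit per batch.
print(adaptive_batch([1, 2, 3, 4, 5], 4, 30, 100))  # [1, 2, 3]
```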

Quickstart

Note: This quickstart works for the latest version of the code. For a quickstart that works with the released version of Clipper available on PyPI, go to our website.

This quickstart requires Docker and supports Python 2.7, 3.5, 3.6 and 3.7.

Clipper Example Code

Start a Clipper Instance and Deploy a Model

Install Clipper

You can either install Clipper directly from GitHub:

pip install git+https://github.com/ucbrise/clipper.git@develop#subdirectory=clipper_admin

or clone Clipper and install directly from the file system:

pip install -e </path/to/clipper_repo>/clipper_admin

Start a local Clipper cluster

First start a Python interpreter session.

$ python

# Or start one with IPython
$ conda install ipython
$ ipython

Create a ClipperConnection object and start Clipper. Running this command for the first time will download several Docker containers, so it may take some time.

from clipper_admin import ClipperConnection, DockerContainerManager
clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.start_clipper()
17-08-30:15:48:41 INFO     [docker_container_manager.py:95] Starting managed Redis instance in Docker
17-08-30:15:48:43 INFO     [clipper_admin.py:105] Clipper still initializing.
17-08-30:15:48:44 INFO     [clipper_admin.py:107] Clipper is running

Register an application called "hello-world". This will create a prediction REST endpoint at http://localhost:1337/hello-world/predict. The slo_micros argument sets the latency objective in microseconds (100000 = 100 ms); if Clipper cannot meet it for a query, the default_output is returned instead.

clipper_conn.register_application(name="hello-world", input_type="doubles", default_output="-1.0", slo_micros=100000)
17-08-30:15:51:42 INFO     [clipper_admin.py:182] Application hello-world was successfully registered

Inspect Clipper to see the registered apps

clipper_conn.get_all_apps()
[u'hello-world']

Define a simple model that just returns the sum of each feature vector. Note that the prediction function takes a list of feature vectors as input and returns a list of strings.

def feature_sum(xs):
    return [str(sum(x)) for x in xs]
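For example, calling the function on a batch of two feature vectors returns two string predictions (the input values here are just illustrative):

```python
def feature_sum(xs):
    # xs is a list of feature vectors; the result is one string per vector.
    return [str(sum(x)) for x in xs]

print(feature_sum([[1.0, 2.0, 3.0], [4.0, 5.0]]))  # ['6.0', '9.0']
```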

Import the python deployer package

from clipper_admin.deployers import python as python_deployer

Deploy the "feature_sum" function as a model. Notice that the application and model must have the same input type.

python_deployer.deploy_python_closure(clipper_conn, name="sum-model", version=1, input_type="doubles", func=feature_sum)
17-08-30:15:59:56 INFO     [deployer_utils.py:50] Anaconda environment found. Verifying packages.
17-08-30:16:00:04 INFO     [deployer_utils.py:150] Fetching package metadata .........
Solving package specifications: .

17-08-30:16:00:04 INFO     [deployer_utils.py:151]
17-08-30:16:00:04 INFO     [deployer_utils.py:59] Supplied environment details
17-08-30:16:00:04 INFO     [deployer_utils.py:71] Supplied local modules
17-08-30:16:00:04 INFO     [deployer_utils.py:77] Serialized and supplied predict function
17-08-30:16:00:04 INFO     [python.py:127] Python closure saved
17-08-30:16:00:04 INFO     [clipper_admin.py:375] Building model Docker image with model data from /tmp/python_func_serializations/sum-model
17-08-30:16:00:05 INFO     [clipper_admin.py:378] Pushing model Docker image to sum-model:1
17-08-30:16:00:07 INFO     [docker_container_manager.py:204] Found 0 replicas for sum-model:1. Adding 1
17-08-30:16:00:07 INFO     [clipper_admin.py:519] Successfully registered model sum-model:1
17-08-30:16:00:07 INFO     [clipper_admin.py:447] Done deploying model sum-model:1.

Possible error: if model deployment is stuck at logs like the following, try pip install -U cloudpickle==0.5.3

18-05-21:12:19:59 INFO     [deployer_utils.py:44] Saving function to /tmp/clipper/tmpx6d_zqeq
18-05-21:12:19:59 INFO     [deployer_utils.py:54] Serialized and supplied predict function
18-05-21:12:19:59 INFO     [python.py:192] Python closure saved
18-05-21:12:19:59 INFO     [python.py:206] Using Python 3.6 base image
18-05-21:12:19:59 INFO     [clipper_admin.py:451] Building model Docker image with model data from /tmp/clipper/tmpx6d_zqeq
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': 'Step 1/2 : FROM clipper/python36-closure-container:develop'}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': '\n'}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': ' ---> 1aaddfa3945e\n'}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': 'Step 2/2 : COPY /tmp/clipper/tmpx6d_zqeq /model/'}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': '\n'}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': ' ---> b7c29f531d2e\n'}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'aux': {'ID': 'sha256:b7c29f531d2eaf59dd39579dbe512538be398dcb5fdd182db14e4d58770d2055'}}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': 'Successfully built b7c29f531d2e\n'}
18-05-21:12:20:00 INFO     [clipper_admin.py:455] {'stream': 'Successfully tagged sum-model:1\n'}
18-05-21:12:20:00 INFO     [clipper_admin.py:457] Pushing model Docker image to sum-model:1
18-05-21:12:20:02 INFO     [docker_container_manager.py:247] Found 0 replicas for sum-model:1. Adding 1

This is caused by a cloudpickle version mismatch. You may see the following error in the model container's Docker logs.

$ docker logs 439ba722d79a # model container logs; for this example, it will be the sum-model container
Starting Python Closure container
Connecting to Clipper with default port: 7000
Traceback (most recent call last):
  File "/container/python_closure_container.py", line 56, in <module>
    rpc_service.get_input_type())
  File "/container/python_closure_container.py", line 28, in __init__
    self.predict_func = load_predict_func(predict_path)
  File "/container/python_closure_container.py", line 17, in load_predict_func
    return cloudpickle.load(serialized_func_file)
  File "/usr/local/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 1060, in _make_skel_func
    base_globals['__builtins__'] = __builtins__
TypeError: 'str' object does not support item assignment

Tell Clipper to route requests for the "hello-world" application to the "sum-model" model.

clipper_conn.link_model_to_app(app_name="hello-world", model_name="sum-model")
17-08-30:16:08:50 INFO     [clipper_admin.py:224] Model sum-model is now linked to application hello-world

Your application is now ready to serve predictions.

Query Clipper for predictions

Now that you've deployed your first model, you can start requesting predictions at the REST endpoint that Clipper created for your application: http://localhost:1337/hello-world/predict

With cURL:

$ curl -X POST --header "Content-Type:application/json" -d '{"input": [1.1, 2.2, 3.3]}' 127.0.0.1:1337/hello-world/predict

With Python:

import requests, json, numpy as np
headers = {"Content-type": "application/json"}
requests.post("http://localhost:1337/hello-world/predict", headers=headers, data=json.dumps({"input": list(np.random.random(10))})).json()
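The same request can also be wrapped in a small standard-library helper (Python 3). This is a sketch; the function and parameter names here are ours, not part of clipper_admin.

```python
import json
from urllib import request

def predict_url(app_name, host="localhost", port=1337):
    # Clipper serves one REST endpoint per registered application.
    return "http://{}:{}/{}/predict".format(host, port, app_name)

def predict(app_name, input_vector):
    """POST a single feature vector and return the parsed JSON response."""
    req = request.Request(
        predict_url(app_name),
        data=json.dumps({"input": list(input_vector)}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running cluster, predict("hello-world", [1.1, 2.2, 3.3]) would return the parsed JSON response from the endpoint.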

Clean up

If you closed the Python REPL you were using to start Clipper, you will need to start a new Python REPL and create another connection to the Clipper cluster. If you still have the Python REPL session active from earlier, you can re-use your existing ClipperConnection object.

Create a new connection. If you still have the Python REPL session from earlier, you can skip this step.

from clipper_admin import ClipperConnection, DockerContainerManager
clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.connect()

Stop all Clipper Docker containers

clipper_conn.stop_all()
17-08-30:16:15:38 INFO     [clipper_admin.py:1141] Stopped all Clipper cluster and all model containers

Contributing

To file a bug or request a feature, please file a GitHub issue. Pull requests are welcome. Additional help and instructions for contributors can be found on our website at http://clipper.ai/contributing.

The Team

You can contact us at [email protected]

Acknowledgements

This research is supported in part by DHS Award HSHQDC-16-3-00083, DOE Award SN10040 DE-SC0012463, NSF CISE Expeditions Award CCF-1139158, and gifts from Ant Financial, Amazon Web Services, CapitalOne, Ericsson, GE, Google, Huawei, Intel, IBM, Microsoft and VMware.


clipper's Issues

Pausing an application or a model

Q: Is there a concept of a “paused model / application” in Clipper?
Can you “turn off” a model (container) and turn it back on without deleting & re-deploying?

boost error when executing make

I could successfully build the old Clipper source code (cloned in March), but I came across the following Boost error when running make after updating to the latest Clipper source today.

../libclipper/libclipper.a(query_processor.cpp.o): In function boost::detail::shared_state_base::set_exception_at_thread_exit(boost::exception_ptr)': /home/guest/tools/boost-1.60.0/include/boost-1_60/boost/thread/future.hpp:434: undefined reference to boost::detail::make_ready_at_thread_exit(boost::shared_ptrboost::detail::shared_state_base)'
../libclipper/libclipper.a(query_processor.cpp.o): In function boost::detail::shared_state<void>::set_value_at_thread_exit()': /home/guest/tools/boost-1.60.0/include/boost-1_60/boost/thread/future.hpp:791: undefined reference to boost::detail::make_ready_at_thread_exit(boost::shared_ptrboost::detail::shared_state_base)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/frontends/query_frontend] Error 1
make[1]: *** [src/frontends/CMakeFiles/query_frontend.dir/all] Error 2
make: *** [all] Error 2

Error when trying to run Clipper

Hi, I ran into two issues when trying the Clipper project:

  1. When running "pip install -r requirements.txt", the following error occurs:

Exception:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 784, in install
**kwargs
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 851, in install
self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 1064, in move_wheel_files
isolated=self.isolated,
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 377, in move_wheel_files
clobber(source, dest, False, fixer=fixer, filter=filter)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/wheel.py", line 316, in clobber
ensure_dir(destdir)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/init.py", line 83, in ensure_dir
os.makedirs(path)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 1] Operation not permitted: '/System/Library/Frameworks/Python.framework/Versions/2.7/share'

This error happened after I skipped installing "six"; that error was "OSError: [Errno 1] Operation not permitted: '/var/folders/wj/wkz200q12ssg05sqrxzqb7sr0000gp/T/pip-JbPkys-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'". Is there anything I need to do to install all of these? This error might be OS-related; my OS is macOS Sierra.

  2. When I tried to run example_client.py, I got the following error message:

Traceback (most recent call last):
File "example_client.py", line 29, in
"-1.0", 40000)
File "/Users/jidai/Downloads/clipper/management/clipper_manager.py", line 294, in register_application
r = requests.post(url, headers=headers, data=req_json)
File "/Library/Python/2.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/Library/Python/2.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "/Library/Python/2.7/site-packages/requests/adapters.py", line 502, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=1338): Max retries exceeded with url: /admin/add_app (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x10d916450>: Failed to establish a new connection: [Errno 61] Connection refused',))

Should I try another host and port, and if so, any suggestions for the host and port? (These Docker containers are running: clipper_redis_1, clipper_mgmt_frontend_1, clipper_query_frontend_1.)

Looking forward to hearing from you.

Model release cycle

Do you have plans to support the model release cycle? What I mean is a workflow where:

  1. The calling application triggers a model retrain based on deterioration in performance
  2. Then a new model could be mounted in place of the old version.

#1 can be handled by the calling application, but for #2 I wanted to know whether it's possible to programmatically upload a new version.

Allow model versions to be strings

There's no real reason to enforce that model versions be numeric. We should allow them to be strings, which for example would allow versions to be SHAs (e.g. git hashes).

pthread_mutex_lock error after deploying two applications

Clipper works well with a single deployed application. With two applications deployed, requests to the first application still work, but requests to the second produce the following error:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc7fff700 (LWP 59250)]
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.

Clipper does not build on OS X 10.10 due to dependency on std::shared_timed_mutex

Full error message:

[ 13%] Built target redox
[ 23%] Built target gmock_main
[ 28%] Built target gtest
Scanning dependencies of target clipper
[ 31%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/query_processor.cpp.o
[ 34%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/metrics.cpp.o
[ 36%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/selection_policies.cpp.o
[ 39%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/task_executor.cpp.o
[ 42%] Building CXX object src/libclipper/CMakeFiles/clipper.dir/src/rpc_service.cpp.o
[ 44%] Linking CXX static library libclipper.a
[ 57%] Built target clipper
Scanning dependencies of target frontendtests
Scanning dependencies of target libclippertests
[ 60%] Building CXX object src/frontends/CMakeFiles/frontendtests.dir/src/query_frontend_tests.cpp.o
[ 63%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/test_main.cpp.o
[ 65%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/metrics_test.cpp.o
[ 68%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/rpc_service_test.cpp.o
[ 71%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/timers_test.cpp.o
[ 73%] Linking CXX executable frontendtests
Undefined symbols for architecture x86_64:
  "std::__1::shared_timed_mutex::lock_shared()", referenced from:
      clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
      clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::compute_stats() in libclipper.a(metrics.cpp.o)
  "std::__1::shared_timed_mutex::lock()", referenced from:
      clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::clear() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::insert(long long) in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::clear() in libclipper.a(metrics.cpp.o)
  "std::__1::shared_timed_mutex::unlock()", referenced from:
      clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
      clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
      clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
      ...
  "std::__1::shared_timed_mutex::shared_timed_mutex()", referenced from:
      clipper::metrics::RatioCounter::RatioCounter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::EWMA(long, clipper::metrics::LoadAverage) in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::Meter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<clipper::metrics::MeterClock>) in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::Histogram(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long) in libclipper.a(metrics.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [src/frontends/frontendtests] Error 1
make[2]: *** [src/frontends/CMakeFiles/frontendtests.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
[ 76%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/serialization_test.cpp.o
[ 78%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/persistent_state_test.cpp.o
[ 81%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/redis_test.cpp.o
[ 84%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/config_test.cpp.o
[ 86%] Building CXX object src/libclipper/CMakeFiles/libclippertests.dir/test/selection_policies_test.cpp.o
[ 89%] Linking CXX executable libclippertests
Undefined symbols for architecture x86_64:
  "std::__1::shared_timed_mutex::lock_shared()", referenced from:
      clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
      clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::compute_stats() in libclipper.a(metrics.cpp.o)
  "std::__1::shared_timed_mutex::lock()", referenced from:
      clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::clear() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::insert(long long) in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::clear() in libclipper.a(metrics.cpp.o)
  "std::__1::shared_timed_mutex::unlock()", referenced from:
      clipper::metrics::RatioCounter::increment(unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
      clipper::metrics::RatioCounter::get_ratio() in libclipper.a(metrics.cpp.o)
      clipper::metrics::RatioCounter::clear() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::tick() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::reset() in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::get_rate_seconds() in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::get_rate_micros() in libclipper.a(metrics.cpp.o)
      ...
  "std::__1::shared_timed_mutex::shared_timed_mutex()", referenced from:
      clipper::metrics::RatioCounter::RatioCounter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned int, unsigned int) in libclipper.a(metrics.cpp.o)
      clipper::metrics::EWMA::EWMA(long, clipper::metrics::LoadAverage) in libclipper.a(metrics.cpp.o)
      clipper::metrics::Meter::Meter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<clipper::metrics::MeterClock>) in libclipper.a(metrics.cpp.o)
      clipper::metrics::Histogram::Histogram(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long) in libclipper.a(metrics.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [src/libclipper/libclippertests] Error 1
make[2]: *** [src/libclipper/CMakeFiles/libclippertests.dir/all] Error 2
make[1]: *** [CMakeFiles/unittests.dir/rule] Error 2
make: *** [unittests] Error 2

The project builds successfully when the std::shared_timed_mutex instances are replaced with boost::shared_timed_mutex (which turns out to be the same thing as boost::shared_mutex), but produces the following error when I run the unit tests using run_unittests.sh:

[ 13%] Built target redox
[ 23%] Built target gmock_main
[ 28%] Built target gtest
[ 57%] Built target clipper
[ 65%] Built target managementtests
[ 92%] Built target libclippertests
Scanning dependencies of target frontendtests
[ 94%] Building CXX object src/frontends/CMakeFiles/frontendtests.dir/src/query_frontend_tests.cpp.o
[ 97%] Linking CXX executable frontendtests
[100%] Built target frontendtests
Scanning dependencies of target unittests
[100%] Built target unittests
[==========] Running 41 tests from 8 test cases.
[----------] Global test environment set-up.
[----------] 8 tests from MetricsTests
[ RUN      ] MetricsTests.CounterCorrectness
[       OK ] MetricsTests.CounterCorrectness (0 ms)
[ RUN      ] MetricsTests.RatioCounterCorrectness
Ratio Test Ratio Counter has denominator zero!
Assertion failed: (exclusive), function assert_locked, file /usr/local/include/boost/thread/pthread/shared_mutex.hpp, line 51.
./bin/run_unittests.sh: line 43: 18339 Abort trap: 6           ./src/libclipper/libclippertests

@dcrankshaw @Corey-Zumar
@atumanov Whenever you can, see if the current Clipper project builds for you on OS X 10.11 (I believe you mentioned you had El Capitan installed).

boost error when make

Hi, I filed an issue titled "make can't find the shared_mutex". dcrankshaw suggested I compile with GCC 5.2 or later. I now compile with GCC 5.4.0 and run into other Boost-related problems.

My PC information is
OS: Ubuntu 14.04.3 LTS
gcc: 5.4.0
boost: boost-1.63

[ 73%] Linking CXX executable bench
../libclipper/libclipper.a(selection_policies.cpp.o): In function void boost::serialization::throw_exception<boost::archive::archive_exception>(boost::archive::archive_exception const&)': /home/test/tools/boost-1.63/include/boost-1_63/boost/serialization/throw_exception.hpp:36: undefined reference to boost::archive::archive_exception::archive_exception(boost::archive::archive_exception const&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function void boost::archive::save_access::save_primitive<boost::archive::binary_oarchive, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(boost::archive::binary_oarchive&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)': /home/test/tools/boost-1.63/include/boost-1_63/boost/archive/detail/oserializer.hpp:89: undefined reference to boost::archive::basic_binary_oprimitive<boost::archive::binary_oarchive, char, std::char_traits >::save(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function void boost::archive::load_access::load_primitive<boost::archive::binary_iarchive, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(boost::archive::binary_iarchive&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)': /home/test/tools/boost-1.63/include/boost-1_63/boost/archive/detail/iserializer.hpp:107: undefined reference to boost::archive::basic_binary_iprimitive<boost::archive::binary_iarchive, char, std::char_traits >::load(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
../libclipper/libclipper.a(selection_policies.cpp.o): In function void boost::archive::binary_iarchive_impl<boost::archive::binary_iarchive, char, std::char_traits<char> >::load_override<boost::archive::class_name_type>(boost::archive::class_name_type&)': /home/test/tools/boost-1.63/include/boost-1_63/boost/archive/binary_iarchive_impl.hpp:58: undefined reference to boost::archive::basic_binary_iarchiveboost::archive::binary_iarchive::load_override(boost::archive::class_name_type&)'
collect2: error: ld returned 1 exit status
make[2]: *** [src/frontends/bench] Error 1
make[1]: *** [src/frontends/CMakeFiles/bench.dir/all] Error 2
make: *** [all] Error 2

mkdir permission denied issue

I am getting the following error when deploying models

Fatal error: run() received nonzero return code 1 while executing!

Requested: mkdir -p /tmp/clipper-models/mom_predict_model/3
Executed: /bin/bash -l -c "mkdir -p /tmp/clipper-models/mom_predict_model/3"

=============================== Standard output ===============================

mkdir: cannot create directory ‘/tmp/clipper-models/mom_predict_model/3’: Permission denied

================================================================================

Aborting.

How to replicate

  • Start clipper
  • Deploy a model
  • sudo reboot the server
  • Deploy a new model

Make Clipper compatible with Python 3

While running the tutorials, I realized that Clipper does not support Python 3. The Python files will need to be updated to add Python 3 support while maintaining support for Python 2.

It will also be necessary to check the project's dependencies. For example, fabric is only available for Python 2.

Support for different SSH port

Can you support a different sshd port when initializing Clipper?

clipper = cm.Clipper(host, PORT_NUMBER, user, key)

When I specify the port as part of the host address, Redis attempts to connect to an incorrect address:

Could not connect to Redis at XXX:XX:XX:XX:2201:6379: nodename nor servname provided, or not known

clipper asks for SSH password

When I connect to Clipper, even though I am providing the SSH key, Clipper still asks for the password. This only happens once.

cm.Clipper(host, user, key)

Maybe something to do with the Fabric API? I am not logging in as root, but the user has sudo permissions.

Volume mounting host paths into docker containers precludes distributed deployment of models

To deploy a model, Clipper currently volume-mounts the model's files from the host filesystem into the model container.

Reading models from the host's filesystem is not durable, as all models are lost when a single machine fails. Furthermore, it prevents distributed deployment, since there is no guarantee about which physical host a container will be scheduled on. Under the current design, a model can only be deployed if its host machine has the model in its filesystem.

Kubernetes provides a volumes concept, but these are backed either by the host filesystem or a cloud provider volume (e.g. AWS EBS), both of which k8s restricts to be accessible from a single pod (EBS can only be mounted to one EC2 instance at a time).

There are a few possible solutions to the problem:

  1. When deploying a model, build a Docker image on the client and push the image to a Docker registry. Have Redis track the pushed Docker image instead of the magical /tmp/{model_repo/... file path. This will make builds significantly slower, but improves durability (if the Docker registry is durable) and scalability (each replica pulls the image and deploys it with no sharing of global state)
  2. Have deployed containers request the model after container startup (e.g. from S3, or from the clipper host). This keeps builds at the same speed as before (time-cost of predict function serialization), but introduces additional setup costs (S3 bucket setup requires configuration and couples design to AWS) or could also lack durability (if models are still only persisted on the clipper host)
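Option 1 might look roughly like the following sketch. The registry hostname, helper name, and Docker SDK calls here are illustrative assumptions, not Clipper's actual admin code:

```python
def registry_image_ref(registry, model_name, model_version):
    # Hypothetical helper: the fully qualified image reference that Redis
    # would track instead of a /tmp/... host path.
    return "{0}/{1}:{2}".format(registry, model_name, model_version)

# With the Docker Python SDK, the client-side build-and-push could then be:
#   client = docker.from_env()
#   ref = registry_image_ref("registry.example.com", "tf_cifar", 1)
#   client.images.build(path=model_dir, tag=ref)
#   client.images.push(ref)
```

Each replica would then pull `ref` on startup, with no shared host state.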

Remove uid from prediction queries until personalization is supported

Currently, users must supply a uid as part of the prediction request body, but it is never used and must always be 0. We should remove the uid field entirely. The simplest way to do this is to stop checking for uid when we parse the prediction request JSON and hardcode the uid to 0.

Replace this line with

long uid = 0;

and update the corresponding schema here
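A Python sketch of the proposed parsing change (the real parser is C++; this only illustrates the intended behavior):

```python
import json

def parse_predict_request(raw_body):
    # Sketch of the proposed change: stop reading `uid` from the request
    # JSON and hardcode it to 0 until personalization is supported.
    request = json.loads(raw_body)
    uid = 0  # any client-supplied uid is ignored
    return uid, request["input"]
```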

make can't find the shared_mutex

Hi, I set up the Clipper environment following the instructions on GitHub.
I could successfully execute the configure command, but when I ran "make" I hit the following error:

clipper-develop/src/libclipper/include/clipper/persistent_state.hpp:6:35: fatal error: shared_mutex: No such file or directory

The relevant include in persistent_state.hpp is on line 6:

#include <shared_mutex>

A shared_mutex.hpp file exists under boost-1.63/include/boost-1_63/boost/thread/pthread/, so I changed the include to "#include <boost/thread/pthread/shared_mutex.hpp>", but that produced another Boost-related error.

Could anyone give me some suggestions?

Thanks.

download_cifar not working on Python 3

Because of how urllib is used, the download_cifar script does not work on Python 3: URLopener has been deprecated since Python 3.3.

Should we add Python 3 support to the files related to the tutorials, or just tell users to run them with Python 2?
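If the script only needs to download files, a common compatibility shim is to fall back on urlretrieve, which exists in both Python 2 and 3 (the download URL below is illustrative):

```python
try:
    # Python 3: use urllib.request instead of the old URLopener class
    from urllib.request import urlretrieve
except ImportError:
    # Python 2 fallback
    from urllib import urlretrieve

# urlretrieve(cifar_url, "cifar_test.data")  # cifar_url is an assumed variable
```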

kubernetes integration

Do you plan to allow Kubernetes integration instead of using docker-compose / Swarm?

No selection state found for query with user_id: 1

When I send a query request with "uid=0" to the predict endpoint of a registered application, it works well. But if I assign uid any other value (e.g. uid=1 or uid=2), it reports the following error. What is the uid variable used for?
No selection state found for query with user_id: 1

The code is similar to cifar_utils.py at line 122:
import json
uid = 1
url = "http://%s:1337/%s/predict" % (host, app)
req_json = json.dumps({'uid': uid, 'input': list(x)})

Support deploy_predict_function running outside Anaconda

If deploy_predict_function is called outside an Anaconda environment, we should still serialize the function and try to run it in the Python container. Basically, we should wrap this block in an if-statement that saves and checks the Anaconda environment if one is present, and otherwise prints a warning to the user and just serializes the function.
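A minimal sketch of that if-statement, assuming the Anaconda check can be done via the CONDA_DEFAULT_ENV environment variable (both helper names are hypothetical):

```python
import warnings

def anaconda_env_name(environ):
    # Hypothetical helper: return the active Anaconda environment name,
    # or None when running outside Anaconda (os.environ would be passed in).
    return environ.get("CONDA_DEFAULT_ENV")

def maybe_serialize_env(environ):
    # Sketch of the proposed if-statement: save and check the Anaconda
    # environment when one is active; otherwise warn and fall back to
    # serializing only the predict function.
    if anaconda_env_name(environ) is None:
        warnings.warn("Not running in an Anaconda environment; "
                      "deploying without an environment export.")
        return False
    return True
```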

JSON response from prediction

Correct me if I am wrong: currently query_frontend.hpp returns a string output all the time.

Line 145
ss << "qid:" << r.query_id_ << ", predict:" << r.output_.y_hat_;

I want to return a JSON response from the predict_strings handler in my Docker container. Is this currently possible?
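One possible workaround, if the frontend does always return string outputs: have the container's predict function emit serialized JSON strings itself, which the client then parses back out of the `predict` field. A sketch, not an endorsed Clipper API:

```python
import json

def predict(inputs):
    # Hypothetical workaround: the query frontend returns each model output
    # as a plain string, so the container can emit serialized JSON itself
    # and the client can parse it back out of the `predict` field.
    return [json.dumps({"length": len(s)}) for s in inputs]
```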

Failed to test the tutorial in the clipper source code version

I could run the tutorial successfully with the Clipper Docker version, but it always reports "No containers found for model tf_cifar:1" when I run the source-code version of Clipper.

The source build of Clipper was done on Ubuntu 14.04 (can Clipper work on Ubuntu 14.04?).

My steps for the source-code version are the same as for the Docker version, as follows:

  1. run ./bin/start_clipper.sh to start clipper.
  2. Run the following python code to deploy new model container

################################################################
cifar_loc = "./data"
import sys
import os
sys.path.append(os.path.abspath('../../management/'))
import clipper_manager as cm
import cifar_utils
test_x, test_y = cifar_utils.filter_data(
    *cifar_utils.load_cifar(cifar_loc, cifar_filename="cifar_test.data", norm=True))

user = ""
key = ""
host = "localhost"
clipper = cm.Clipper(host, user, key)

# Clipper has already been started from source, so do not start the Docker version here
# clipper.start()

app_name = "cifar_demo"
candidate_models = [
    {"model_name": "tf_cifar", "model_version": 1},
]

clipper.register_application(
    app_name,
    candidate_models,
    "doubles",
    "EXP4",
    slo_micros=20000)

model_added = clipper.deploy_model(
    "tf_cifar",
    1,
    os.path.abspath("tf_cifar_model"),
    "clipper/tf_cifar_container:latest",
    ["cifar", "tf"],
    "doubles",
    num_containers=1)
print("Model deploy successful? {success}".format(success=model_added))
################################################################
After executing this code, it outputs:
Found clipper/tf_cifar_container:latest in Docker hub
Copied model data to host
Published model to Clipper
Model deploy successful? True

So the model container was deployed to Clipper successfully.

However, the output from ./bin/start_clipper.sh is:
[22:38:54.493][info] [REDIS] Successfully issued command "SELECT 2"
[22:38:54.493][info] [REDIS] Successfully issued command "HMSET tf_cifar:1 model_name tf_cifar model_version 1 load 0.000000 input_type doubles labels cifar,tf container_name clipper/tf_cifar_container:latest model_data_path /tmp/clipper-models/tf_cifar/1/tf_cifar_model"
[22:39:24.963][error] [REDIS] Error with command "GET cifar_demo:0:0":
[22:39:24.964][info] [QUERYPR...] Found 1 tasks
[22:39:24.964][info] [TASKEXE...] No active containers found for model tf_cifar:1

And if I send a prediction request, the result is random; obviously the new model container is not being used.

In addition, the following RPC-related messages appeared when I deployed the new container with the Docker version of Clipper, but not with the source-code version. So I suspect there may be a problem with the RPC interface. What is the exact problem?
[14:33:04.732][info] [RPC] Found message to receive
[14:33:04.732][info] [RPC] New container connected
[14:33:04.732][info] [RPC] Container added

[14:33:04.733][info] [REDIS] Successfully issued command "SELECT 3"
[14:33:04.733][info] [REDIS] Successfully issued command "HMSET tf_cifar:1:0 model_id tf_cifar:1 model_name tf_cifar model_version 1 model_replica_id 0 zmq_connection_id 0 batch_size 1 input_type doubles"

Clipper prediction doesn't seem to match the model prediction before deployment

We're getting mostly 1's for predictions coming from Clipper. The model we deployed was giving 0's on some examples, but that is no longer true when querying through Clipper.

One hypothesis:
The latency objective for the application is set at 20 ms and the /metrics endpoint shows a mean latency of 100 ms, so maybe Clipper is not serving real predictions in order to meet the latency goal? Is there a way to check that?

Are there other things we could check?

Also some ideas for making debugging easier:

  • Some way to easily view errors as metrics: the number of errors, and of what type. I'm guessing we can currently look at the Clipper and model-container logs for this? (Not sure what is currently logged.)
  • Model metadata, such as when a model was deployed, which (name, version) a particular model container is serving, and which model containers are connected to Clipper (basically a bird's-eye view of the system).

Unloading old models

When a new version of a model is deployed, can Clipper unload previous versions of that model?

We deploy at least 10 versions of a model a day, and all the old Docker containers hog the machine. Right now we stop the old ones manually. Is it possible to have an API to undeploy?
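A hypothetical undeploy helper might first select the stale replicas from container metadata and then stop them via Docker. The dict keys below mirror the fields Clipper writes to Redis, but the helper itself is an assumption, not an existing API:

```python
def old_model_containers(containers, model_name, current_version):
    # Hypothetical selection step for an undeploy API: given container
    # metadata records, pick replicas running stale versions of a model so
    # they can be stopped (e.g. with `docker stop` or the Docker SDK).
    return [c for c in containers
            if c["model_name"] == model_name
            and c["model_version"] != current_version]
```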

port collisions with hard-coded redis ports

The hard-coded port number causes collisions when the PRB and master builds run simultaneously.

I suggest something like this in your run_unittests.sh script:

set +e  # turn off exit on command failure
REDIS_PORT=$((34256 + RANDOM % 1000))
lsof -i :$REDIS_PORT &> /dev/null

if [ $? -eq 0 ]; then # existing port in use found
  while true; do
    REDIS_PORT=$(($REDIS_PORT + RANDOM % 1000))
    lsof -i :$REDIS_PORT &> /dev/null
    if [ $? -eq 1 ]; then  # port not in use
      break
    fi
  done
fi
export REDIS_PORT  # if you want to slurp this into your test_constants before compilation or something
set -e  # turn exit on fail back on

Now you need to update your test_constants.hpp file to respect this new port. A couple of ideas:

  1. Export the REDIS_PORT variable from your bash and, upon compilation, slurp it in and set it that way (I think this should work).
  2. Stupid Unix tricks to the rescue: sed -i "s/34256/$REDIS_PORT/g" /home/jenkins/workspace/$JOB_NAME/src/libclipper/include/clipper/test_constants.hpp
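If a Python helper is acceptable at build time, an alternative to probing ports with lsof is to ask the kernel for a free ephemeral port. This is a sketch, not what run_unittests.sh does today:

```python
import socket

def free_port():
    # Bind to port 0 and let the kernel pick an unused ephemeral port.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port
```

The returned port could then be exported as REDIS_PORT before compilation, exactly as in the bash version.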

Failing tests when executing run_unittests.sh

Some tests appear to fail while the run_unittests.sh script is executing, but at the end it reports that all tests were executed without errors. This can be seen in the following image:

(screenshot: broken_tests)

This also happens in the project build. Is that the expected behavior, or should these tests break the build?

safe handling of duplicate register_application requests

Q: What happens if you try to register the same application twice (same name)?

A: The update will overwrite the old application with the new one. Where I think this will cause problems is that you will end up with multiple query handlers registered to the same REST endpoint, and I'm not sure which one will get called. We should probably prevent this from happening (it's a two-line code change).
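The proposed guard, sketched in Python (the real change belongs in the C++ management layer backed by Redis, so the registry dict here is a stand-in):

```python
def register_application(applications, name, config):
    # Sketch of the proposed two-line guard: refuse to overwrite an existing
    # application so two query handlers never share one REST endpoint.
    if name in applications:
        raise ValueError("application '%s' is already registered" % name)
    applications[name] = config
```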

README needs information about redis

When installing Clipper and attempting to run the IPython tutorial, I get the following:


The documentation possibly needs a note about installing Redis before running Clipper.

Confusing parts of Clipper Manager API

The following are a list of peculiarities I found in the python admin API:

  • The clipper_admin Python module contains clipper_manager. Is the second "clipper" needed?
  • The register_application function takes a model argument, but must the model exist? It seems (from the demo) that the model does not need to be present when registering it with an application. This should probably raise an error.
  • Are selection policies active in this release? (See inspect_selection_policy.)
  • What are external models? (See register external models.)
  • The use of labels is a bit unclear.

Restart clipper containers and app containers on server reboot

Upon server reboot, can we restart:

  1. Clipper containers: I think this can be done by adding restart: always to the docker-compose.yaml file.
  2. Model containers: if a model is currently active, can we restart its specific containers? Not sure how this can be done.

Can we also move Redis storage to a persistent volume in docker-compose? I think I saw a discussion about this in your Jira, but the issue was closed.

services:
  redis:
    volumes:
      - redisdata:/data

volumes:
  redisdata:

query_frontend - very high CPU usage

I am on a Google Cloud Platform server (n1-highmem-2: 2 vCPUs, 13 GB memory) with no apps running.

query_frontend is using 200% of the CPU.

Is this normal? Do you need any other logs?

Clipper returns default predictions when a new version of a model is deployed but not connected

When a user updates a model version, Clipper is informed that the model version has changed and it immediately starts to query the new version. However, it can take several minutes for the container for that new version to initialize and connect to Clipper. In that intervening period, Clipper attempts to query the latest version, cannot find any containers for that model version, and instead returns the default prediction.

Instead, it may be desirable for Clipper to wait until the new container has finished initializing and connects to Clipper before switching to the new version.
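The wait-before-switching behavior could be sketched as a polling loop. Here `connected_replicas` is an assumed callback (e.g. backed by the replica records Clipper keeps in Redis), not an existing Clipper API:

```python
import time

def wait_for_replica(connected_replicas, model_name, version, timeout=300.0):
    # Hypothetical polling loop: hold off switching traffic to a new model
    # version until at least one replica for it has connected to Clipper.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if connected_replicas(model_name, version) > 0:
            return True
        time.sleep(1.0)
    return False
```

Only when this returns True would Clipper flip the active version; on timeout it could keep routing to the old version instead of returning default predictions.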

Why it exits in configure line 221?

Is the "exit 0" on line 221 of configure for testing only? Should it be removed? Thanks!

  # cd into the extracted directory and install
  cd cmake-*
  exit 0

Deploy model causes docker container to exit

To reproduce this:

  1. Deploy any model with version number 1.
  2. Call the /predict API.
  3. Deploy version 2 of the same model.
  4. Call /predict again.

The query_frontend container exits with code 139:

clipper/query_frontend:latest        "/clipper/release/..."   38 minutes ago      Exited (139) 26 seconds ago

This is the container log before exiting:

[15:17:44.212][info]     [CLIPPER] Adding new container - model: captcha_predict_model, version: 6, ID: 1, input_type: strings
[15:17:44.212][info]  [TASKEXE...] Created queue for new model: captcha_predict_model : 6
[15:17:44.224][info]       [REDIS] Successfully issued command "HMSET captcha_predict_model,6,0 model_id captcha_predict_model:6 model_name captcha_predict_model model_version 6 model_replica_id 0 zmq_connection_id 1 batch_size 1 input_type strings"
[15:17:46.600][info]         [RPC] Found message to receive
[15:17:49.217][info]         [RPC] Found message to receive
[15:17:51.609][info]         [RPC] Found message to receive
[15:17:52.517][info]       [REDIS] Successfully issued command "GET captcha_predict:0:0"
[15:17:52.517][info]  [QUERYPR...] Found 1 tasks
[15:17:52.525][info]         [RPC] Found message to receive

Is there any other log I can look at to see what is happening? I am using the NoopContainer.

Deploying a model from S3 fails for Clipper running inside non-Debian distros

https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_manager.py#L513 requires dpkg-query, a tool only available in Debian-based Linux distros. As a result, dockerized deployments of Clipper must use an Ubuntu/Debian base image, and lighter-weight container distros (e.g. Alpine) cannot be used. This impacts container image size and may also introduce performance overhead from unneeded OS subsystems.

Instead of using Fabric to run aws s3, we could use boto to remove this dependence on the OS distro.
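A sketch of the boto-based direction: a small path helper plus a boto3 download, with no shelling out to the aws CLI. The helper name is hypothetical:

```python
def split_s3_path(s3_path):
    # Hypothetical helper: "s3://bucket/prefix/key" -> ("bucket", "prefix/key").
    if not s3_path.startswith("s3://"):
        raise ValueError("not an s3 path: %r" % s3_path)
    bucket, _, key = s3_path[len("s3://"):].partition("/")
    return bucket, key

# With boto3 the download then becomes:
#   bucket, key = split_s3_path(model_data_path)
#   boto3.client("s3").download_file(bucket, key, local_path)
# with no dependence on dpkg-query or the host OS distro.
```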
