PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

Our trunk health (Continuous Integration signals) can be found at

More About PyTorch

Learn the basics of PyTorch

At a granular level, PyTorch is a library that consists of the following components:

Component              Description
torch                  A Tensor library like NumPy, with strong GPU support
torch.autograd         A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
torch.jit              A compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
torch.nn               A neural networks library deeply integrated with autograd designed for maximum flexibility
torch.multiprocessing  Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
torch.utils            DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

  • A replacement for NumPy to use the power of GPUs.
  • A deep learning research platform that provides maximum flexibility and speed.

Elaborating Further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).

[Figure: Tensor illustration]

PyTorch provides Tensors that can live on either the CPU or the GPU, and running them on the GPU accelerates computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs, such as slicing, indexing, mathematical operations, linear algebra, and reductions. And they are fast!

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.
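To make the tape idea concrete, here is a minimal pure-Python sketch of reverse-mode auto-differentiation. It is an illustration only and not how PyTorch implements autograd: the names Tape, Var, backward, and model are hypothetical, and only addition and multiplication are supported. Each operation that runs appends a closure to the tape, and the backward pass replays the tape in reverse. Because the tape is rebuilt on every call, ordinary Python control flow (the loop over depth) can change the recorded graph from one call to the next.

```python
class Tape:
    """Records one backward closure per executed operation, in order."""

    def __init__(self):
        self.steps = []


class Var:
    """A scalar value that records the operations applied to it."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        self.tape.steps.append(_backward)
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        self.tape.steps.append(_backward)
        return out


def backward(output):
    """Replay the tape in reverse, starting from d(output)/d(output) = 1."""
    output.grad = 1.0
    for step in reversed(output.tape.steps):
        step()


def model(x, w, depth):
    # Plain Python control flow decides how many operations get recorded.
    h = x
    for _ in range(depth):
        h = h * w
    return h + x


tape = Tape()
x, w = Var(3.0, tape), Var(2.0, tape)
y = model(x, w, depth=2)  # computes x * w * w + x
backward(y)
print(y.value, w.grad, x.grad)  # 15.0 12.0 5.0
```

Calling model again with a different depth and a fresh Tape records a different sequence of steps, with no separate graph-compilation stage in between.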

[Figure: Dynamic graph]

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use NumPy / SciPy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries and packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.

Imperative Experiences

PyTorch is designed to be intuitive, linear in thought, and easy to use. When you execute a line of code, it gets executed. There isn't an asynchronous view of the world. When you drop into a debugger or receive error messages and stack traces, understanding them is straightforward. The stack trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.

Fast and Lean

PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and neural network backends are mature and have been tested for years.

Hence, PyTorch is quite fast — whether you run small or large neural networks.

The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. We've written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.

Extensions Without Pain

Writing new neural network modules and interfacing with PyTorch's Tensor API are designed to be straightforward, with minimal abstractions.

You can write new neural network layers in Python using the torch API or your favorite NumPy-based libraries such as SciPy.

If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate. No wrapper code needs to be written. You can see a tutorial here and an example here.



Commands to install binaries via Conda or pip wheels are on our website:

NVIDIA Jetson Platforms

Python wheels for NVIDIA's Jetson Nano, Jetson TX1/TX2, Jetson Xavier NX/AGX, and Jetson AGX Orin are provided here and the L4T container is published here

They require JetPack 4.2 and above, and @dusty-nv and @ptrblck are maintaining them.

From Source


If you are installing from source, you will need:

  • Python 3.8 or later (for Linux, Python 3.8.1+ is needed)
  • A compiler that fully supports C++17, such as clang or gcc (gcc 9.4.0 or newer is required)
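As a quick pre-flight check, the Python requirement above can be sketched as a small function. This is an illustrative helper, not part of PyTorch's build tooling: the name python_is_supported and its parameters are hypothetical, and it checks only the interpreter version, not the compiler.

```python
import sys


def python_is_supported(version_info=None, platform=None):
    """Return True if the interpreter meets the stated minimum version.

    Linux needs 3.8.1 or later; other platforms need 3.8 or later.
    """
    version_info = sys.version_info if version_info is None else version_info
    platform = sys.platform if platform is None else platform
    minimum = (3, 8, 1) if platform.startswith("linux") else (3, 8)
    return tuple(version_info[:3]) >= minimum


if not python_is_supported():
    print("This Python is older than the documented minimum for building from source.")
```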

We highly recommend installing an Anaconda environment. You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.

If you want to compile with CUDA support, select a supported version of CUDA from our support matrix, then install the following:

Note: You can refer to the cuDNN Support Matrix for the cuDNN versions that match the various supported CUDA, CUDA driver, and NVIDIA hardware versions.

If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in

If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions to install PyTorch for Jetson Nano are available here.

If you want to compile with ROCm support, install

  • AMD ROCm 4.0 and above installation
  • ROCm is currently supported only for Linux systems.

If you want to disable ROCm support, export the environment variable USE_ROCM=0. Other potentially useful environment variables may be found in
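If you drive the build from a Python script rather than an interactive shell, the USE_CUDA=0 and USE_ROCM=0 switches can be set on a copy of the environment and passed to the build process. The sketch below is an assumption about how you might wire this up: build_environment is a hypothetical helper, and the build command itself is left to you.

```python
import os


def build_environment(base=None, disable_cuda=False, disable_rocm=False):
    """Return a copy of the environment with the requested backends disabled."""
    env = dict(os.environ if base is None else base)
    if disable_cuda:
        env["USE_CUDA"] = "0"  # same effect as `export USE_CUDA=0`
    if disable_rocm:
        env["USE_ROCM"] = "0"  # same effect as `export USE_ROCM=0`
    return env


env = build_environment(disable_cuda=True)
# Pass `env` to your own build invocation, for example:
#   subprocess.run(build_command, env=env, check=True)
# where build_command is whatever command you use to build PyTorch.
```

Working on a copy keeps the change scoped to the build process instead of altering the environment of the calling script.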

Install Dependencies


conda install cmake ninja
# Run this command from the PyTorch directory after cloning the source code using the "Get the PyTorch Source" section below
pip install -r requirements.txt

On Linux

conda install intel::mkl-static intel::mkl-include
# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda110  # or the magma-cuda* that matches your CUDA version from

# (optional) If using torch.compile with inductor/triton, install the matching version of triton
# Run from the pytorch directory after cloning
make triton

On macOS

# Add this package on Intel x86 processor machines only
conda install intel::mkl-static intel::mkl-include
# Add these packages if torch.distributed is needed
conda install pkg-config libuv

On Windows

conda install intel::mkl-static intel::mkl-include
# Add these packages if torch.distributed is needed.
# Distributed package support on Windows is a prototype feature and is subject to changes.
conda install -c conda-forge libuv=1.39

Get the PyTorch Source

git clone --recursive
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

Install PyTorch

On Linux

If you would like to compile PyTorch with new C++ ABI enabled, then first run this command:


If you're compiling for AMD ROCm then first run this command:

# Only run this if you're compiling for ROCm
python tools/amd_build/

Install PyTorch

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python develop
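The CMAKE_PREFIX_PATH line above relies on the shell's ${VAR:-default} expansion: it uses CONDA_PREFIX when that variable is set and non-empty, and otherwise falls back to the parent of the directory that contains the conda executable. The following Python sketch mirrors that logic for readers who want to see it spelled out. The function name resolve_cmake_prefix_path is hypothetical, and returning None when conda is not found is a choice made for this sketch rather than what the shell line does.

```python
import os
import shutil


def resolve_cmake_prefix_path(environ=None, which=shutil.which):
    """Mirror ${CONDA_PREFIX:-"$(dirname $(which conda))/../"} in Python."""
    environ = os.environ if environ is None else environ
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix:
        # ${VAR:-default} keeps VAR when it is set and non-empty.
        return conda_prefix
    conda_exe = which("conda")
    if conda_exe is None:
        # Sketch-specific behavior: report that no prefix could be derived.
        return None
    return os.path.dirname(conda_exe) + "/../"


print(resolve_cmake_prefix_path())
```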

Aside: If you are using Anaconda, you may experience an error caused by the linker:

build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

This is caused by ld from the Conda environment shadowing the system ld. You should use a newer version of Python that fixes this issue. The recommended Python version is 3.8.1+.

On macOS

python3 develop

On Windows

Choose Correct Visual Studio Version.

PyTorch CI uses Visual C++ BuildTools, which come with Visual Studio Enterprise, Professional, or Community Editions. You can also install the build tools from The build tools do not come with Visual Studio Code by default.

If you want to build legacy Python code, please refer to Building on legacy code and CUDA.

CPU-only builds

In this mode PyTorch computations will run on your CPU, not your GPU

conda activate
python develop

Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). To link against iomp, you need to manually download the library and set up the build environment by tweaking CMAKE_INCLUDE_PATH and LIB. The instructions here are an example of setting up both MKL and Intel OpenMP. Without these configurations for CMake, the Microsoft Visual C OpenMP runtime (vcomp) will be used.

CUDA based build

In this mode PyTorch computations will leverage your GPU via CUDA for faster number crunching

NVTX is needed to build PyTorch with CUDA. NVTX is part of the CUDA distribution, where it is called "Nsight Compute". To install it onto an existing CUDA installation, run the CUDA installer again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio.

Currently, VS 2017 / 2019, and Ninja are supported as the generator of CMake. If ninja.exe is detected in PATH, then Ninja will be used as the default generator, otherwise, it will use VS 2017 / 2019.
If Ninja is selected as the generator, the latest MSVC will get selected as the underlying toolchain.
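The generator rule described above can be written down as a tiny decision function. This is only a sketch of the stated rule, not the logic in PyTorch's build scripts: default_cmake_generator is a hypothetical name, and the returned strings are descriptive labels rather than exact CMake generator names.

```python
import shutil


def default_cmake_generator(which=shutil.which):
    """Pick Ninja when a ninja executable is on PATH, else Visual Studio."""
    if which("ninja"):
        return "Ninja"
    return "VS 2017 / 2019"


print(default_cmake_generator())
```

On Windows, shutil.which("ninja") also matches ninja.exe through PATHEXT, which corresponds to the detection described in the text.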

Additional libraries such as Magma, oneDNN (a.k.a. MKLDNN or DNNL), and Sccache are often needed. Please refer to the installation-helper to install them.

You can refer to the build_pytorch.bat script for other environment variable configurations.


:: Set the environment variables after you have downloaded and unzipped the mkl package,
:: else CMake would throw an error as `Could NOT find OpenMP`.
set CMAKE_INCLUDE_PATH={Your directory}\mkl\include
set LIB={Your directory}\mkl\lib;%LIB%

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2019 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,17^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the CUDA host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe

python develop

Adjust Build Options (Optional)

You can optionally adjust the configuration of CMake variables (without building first) by doing the following. For example, this step lets you adjust the pre-detected directories for cuDNN or BLAS.

On Linux

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python build --cmake-only
ccmake build  # or cmake-gui build

On macOS

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python build --cmake-only
ccmake build  # or cmake-gui build

Docker Image

Using pre-built images

You can also pull a pre-built Docker image from Docker Hub and run it with Docker v19.03+:

docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest

Please note that PyTorch uses shared memory to share data between processes. If torch multiprocessing is used (e.g. for multithreaded data loaders), the default shared memory segment size that the container runs with is not enough, so you should increase the shared memory size with either the --ipc=host or the --shm-size command line option to nvidia-docker run.
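To see how much shared memory a running container actually has, you can inspect the shared memory mount from inside it. The sketch below assumes a Linux container where shared memory is mounted at /dev/shm. The helper names and the one-gigabyte threshold are hypothetical and only illustrate the check; the right size depends on your data loaders.

```python
import os
import shutil


def shared_memory_total_bytes(path="/dev/shm"):
    """Return the total size of the filesystem at `path`, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total


def has_enough_shared_memory(total_bytes, required_bytes):
    """True when the shared memory size is known and at least the required size."""
    return total_bytes is not None and total_bytes >= required_bytes


total = shared_memory_total_bytes()
required = 1024 ** 3  # hypothetical threshold: 1 GiB
if not has_enough_shared_memory(total, required):
    print("Shared memory looks small; consider --ipc=host or a larger --shm-size.")
```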

Building the image yourself

NOTE: Must be built with a docker version > 18.06

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8. You can pass PYTHON_VERSION=x.y make variable to specify which Python version is to be used by Miniconda, or leave it unset to use the default.

make -f docker.Makefile
# images are tagged as${your_docker_username}/pytorch

You can also pass the CMAKE_VARS="..." environment variable to specify additional CMake variables to be passed to CMake during the build. See for the list of available variables.


Building the Documentation

To build documentation in various formats, you will need Sphinx and the readthedocs theme.

cd docs/
pip install -r requirements.txt

You can then build the documentation by running make <format> from the docs/ folder. Run make to get a list of all available output formats.

If you get a katex error run npm install katex. If it persists, try npm install -g katex

Note: if you installed nodejs with a different package manager (e.g., conda) then npm will probably install a version of katex that is not compatible with your version of nodejs and doc builds will fail. A combination of versions that is known to work is [email protected] and [email protected]. To install the latter with npm you can run npm install -g [email protected]

Previous Versions

Installation instructions and binaries for previous PyTorch versions may be found on our website.

Getting Started

Three pointers to get you started:



Releases and Contributing

Typically, PyTorch has three minor releases a year. Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.

To learn more about making a contribution to PyTorch, please see our Contribution page. For more information about PyTorch releases, see the Release page.

The Team

PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.

PyTorch is currently maintained by Soumith Chintala, Gregory Chanan, Dmytro Dzhulgakov, Edward Yang, and Nikita Shulga with major contributions coming from hundreds of talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.

Note: This project is unrelated to hughperkins/pytorch with the same name. Hugh is a valuable contributor to the Torch community and has helped with many things Torch and PyTorch.


PyTorch has a BSD-style license, as found in the LICENSE file.

serve's Issues

Problem loading image_classifier handler?

When I do an image classification task, I get the following error:

2020-02-18 23:29:48,871 [INFO ] W-9001-densenet161_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - No module named 'image_classifier'

This is consistent on Mac (latest MacOS) and Linux (Ubuntu 16.04 DL AMI v26). Latest master from the TorchServe repo. The thing still works and I get inference results, but that error message was a surprise.

If things are actually working correctly (which they mostly appear to be), we shouldn't be getting an error message. Conversely, if the error message is describing a legitimate issue, it should be fixed, and we should understand why things are working anyway.

Add example for Custom Service

Add a new example showing how to use a Custom Service. Given that most users of TorchServe will have their own custom models, this will help jump-start deploying their models into production.

Incorrect docs for --model-store option

Docs for the torchserve command line at make the claim:

model-store: optional, A location where models are stored by default, all models in this location are loaded, the model name is same as archive or folder name.

This is in reference to the --model-store argument. This is incorrect; if it were true, I should be able to call torchserve --start --model-store model_store (where I have multiple models in the folder model_store) and get endpoints - but I get only 404s when trying to call them.

Document JDK version in install steps

Install stopped working after the latest changes. Failing with "FindBugs rule violations" for modelarchive.

Error details:

(myenv) ubuntu@ip-172-31-18-32:~/pt/serve$ pip install .
Processing /home/ubuntu/pt/serve
Requirement already satisfied: Pillow in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchserve==0.0.1b20200221) (7.0.0)
Requirement already satisfied: psutil in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchserve==0.0.1b20200221) (5.6.7)
Processing /home/ubuntu/.cache/pip/wheels/8e/70/28/3d6ccd6e315f65f245da085482a2e1c7d14b90b30f239e2cf4/future-0.18.2-py3-none-any.whl
Requirement already satisfied: torch in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchserve==0.0.1b20200221) (1.4.0)
Requirement already satisfied: torchvision in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchserve==0.0.1b20200221) (0.5.0)
Requirement already satisfied: torchtext in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchserve==0.0.1b20200221) (0.5.0)
Requirement already satisfied: numpy in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchvision->torchserve==0.0.1b20200221) (1.18.1)
Requirement already satisfied: six in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchvision->torchserve==0.0.1b20200221) (1.14.0)
Requirement already satisfied: tqdm in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchtext->torchserve==0.0.1b20200221) (4.42.1)
Requirement already satisfied: requests in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from torchtext->torchserve==0.0.1b20200221) (2.22.0)
Collecting sentencepiece
Using cached sentencepiece-0.1.85-cp38-cp38-manylinux1_x86_64.whl (1.0 MB)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from requests->torchtext->torchserve==0.0.1b20200221) (1.25.8)
Requirement already satisfied: idna<2.9,>=2.5 in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from requests->torchtext->torchserve==0.0.1b20200221) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from requests->torchtext->torchserve==0.0.1b20200221) (2019.11.28)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages (from requests->torchtext->torchserve==0.0.1b20200221) (3.0.4)
Building wheels for collected packages: torchserve
Building wheel for torchserve ( ... error
ERROR: Command errored out with exit status 1:
command: /home/ubuntu/anaconda3/envs/myenv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-srpt9i6p/'"'"'; file='"'"'/tmp/pip-req-build-srpt9i6p/'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);'"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-kibaudnl
cwd: /tmp/pip-req-build-srpt9i6p/
Complete output (135 lines):
running bdist_wheel
running build
running build_py
running build_frontend

> Task :cts:clean
> Task :modelarchive:clean

> Task :server:killServer
No server running!

> Task :server:clean
> Task :cts:compileJava NO-SOURCE
> Task :cts:processResources NO-SOURCE
> Task :cts:classes UP-TO-DATE
> Task :cts:jar
> Task :cts:assemble
> Task :cts:checkstyleMain NO-SOURCE
> Task :cts:compileTestJava NO-SOURCE
> Task :cts:processTestResources NO-SOURCE
> Task :cts:testClasses UP-TO-DATE
> Task :cts:checkstyleTest NO-SOURCE
> Task :cts:findbugsMain NO-SOURCE
> Task :cts:findbugsTest NO-SOURCE
> Task :cts:test NO-SOURCE
> Task :cts:jacocoTestCoverageVerification SKIPPED
> Task :cts:jacocoTestReport SKIPPED
> Task :cts:pmdMain NO-SOURCE
> Task :cts:pmdTest SKIPPED
> Task :cts:verifyJava
> Task :cts:check
> Task :cts:build
> Task :modelarchive:compileJava
> Task :modelarchive:processResources NO-SOURCE
> Task :modelarchive:classes
> Task :modelarchive:jar
> Task :modelarchive:assemble
> Task :modelarchive:checkstyleMain
> Task :modelarchive:compileTestJava
> Task :modelarchive:processTestResources
> Task :modelarchive:testClasses
> Task :modelarchive:checkstyleTest

> Task :modelarchive:findbugsMain
The following classes needed for analysis were missing:

> Task :modelarchive:findbugsMain FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':modelarchive:findbugsMain'.
> FindBugs rule violations were found. See the report at: file:///tmp/pip-req-build-srpt9i6p/frontend/modelarchive/build/reports/findbugs/main.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at

13 actionable tasks: 13 executed
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-srpt9i6p/", line 137, in <module>
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/setuptools/", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 148, in setup
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 966, in run_commands
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/wheel/", line 223, in run
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 313, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/command/", line 135, in run
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 313, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/tmp/pip-req-build-srpt9i6p/", line 98, in run
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 313, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/tmp/pip-req-build-srpt9i6p/", line 85, in run
    subprocess.check_call('frontend/gradlew -p frontend clean build', shell=True)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'frontend/gradlew -p frontend clean build' returned non-zero exit status 1.

ERROR: Failed building wheel for torchserve
Running clean for torchserve
Failed to build torchserve
Installing collected packages: future, torchserve, sentencepiece
Running install for torchserve ... error
ERROR: Command errored out with exit status 1:
command: /home/ubuntu/anaconda3/envs/myenv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-srpt9i6p/'"'"'; file='"'"'/tmp/pip-req-build-srpt9i6p/'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);'"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mk_hdekw/install-record.txt --single-version-externally-managed --compile --install-headers /home/ubuntu/anaconda3/envs/myenv/include/python3.8/torchserve
cwd: /tmp/pip-req-build-srpt9i6p/
Complete output (135 lines):
running install
running build
running build_py
running build_frontend
> Task :cts:clean
> Task :modelarchive:clean

> Task :server:killServer
No server running!

> Task :server:clean UP-TO-DATE
> Task :cts:compileJava NO-SOURCE
> Task :cts:processResources NO-SOURCE
> Task :cts:classes UP-TO-DATE
> Task :cts:jar
> Task :cts:assemble
> Task :cts:checkstyleMain NO-SOURCE
> Task :cts:compileTestJava NO-SOURCE
> Task :cts:processTestResources NO-SOURCE
> Task :cts:testClasses UP-TO-DATE
> Task :cts:checkstyleTest NO-SOURCE
> Task :cts:findbugsMain NO-SOURCE
> Task :cts:findbugsTest NO-SOURCE
> Task :cts:test NO-SOURCE
> Task :cts:jacocoTestCoverageVerification SKIPPED
> Task :cts:jacocoTestReport SKIPPED
> Task :cts:pmdMain NO-SOURCE
> Task :cts:pmdTest SKIPPED
> Task :cts:verifyJava
> Task :cts:check
> Task :cts:build
> Task :modelarchive:compileJava
> Task :modelarchive:processResources NO-SOURCE
> Task :modelarchive:classes
> Task :modelarchive:jar
> Task :modelarchive:assemble
> Task :modelarchive:checkstyleMain
> Task :modelarchive:compileTestJava
> Task :modelarchive:processTestResources
> Task :modelarchive:testClasses
> Task :modelarchive:checkstyleTest

> Task :modelarchive:findbugsMain FAILED
The following classes needed for analysis were missing:

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':modelarchive:findbugsMain'.
> FindBugs rule violations were found. See the report at: file:///tmp/pip-req-build-srpt9i6p/frontend/modelarchive/build/reports/findbugs/main.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at

13 actionable tasks: 12 executed, 1 up-to-date
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-srpt9i6p/", line 137, in <module>
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/setuptools/", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 148, in setup
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 966, in run_commands
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/site-packages/setuptools/command/", line 61, in run
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/command/", line 545, in run
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 313, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/command/", line 135, in run
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 313, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/tmp/pip-req-build-srpt9i6p/", line 98, in run
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 313, in run_command
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/distutils/", line 985, in run_command
  File "/tmp/pip-req-build-srpt9i6p/", line 85, in run
    subprocess.check_call('frontend/gradlew -p frontend clean build', shell=True)
  File "/home/ubuntu/anaconda3/envs/myenv/lib/python3.8/", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'frontend/gradlew -p frontend clean build' returned non-zero exit status 1.

ERROR: Command errored out with exit status 1: /home/ubuntu/anaconda3/envs/myenv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-srpt9i6p/'"'"'; file='"'"'/tmp/pip-req-build-srpt9i6p/'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);'"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mk_hdekw/install-record.txt --single-version-externally-managed --compile --install-headers /home/ubuntu/anaconda3/envs/myenv/include/python3.8/torchserve Check the logs for full command output

On running server, workers die

I finally got the server to start, but when I do, I see a lot of this kind of thrash:

2019-12-19 12:01:32,130 [DEBUG] W-9006-densenet161 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(
	at java.util.concurrent.ArrayBlockingQueue.poll(
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
2019-12-19 12:01:32,130 [WARN ] W-9006-densenet161 org.pytorch.serve.wlm.BatchAggregator - Load model failed: densenet161, error: Worker died.
2019-12-19 12:01:32,130 [DEBUG] W-9006-densenet161 org.pytorch.serve.wlm.WorkerThread - W-9006-densenet161 State change WORKER_STARTED -> WORKER_STOPPED
2019-12-19 12:01:32,131 [INFO ] W-9006-densenet161 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9006 in 55 seconds.

Java 1.8.0, Python 3.6.9, macOS 10.14.6

TorchServe failing to batch multi-image requests

My endpoint is configured thus:

    "modelName": "d161good",
    "modelVersion": "1.0",
    "modelUrl": "d161good.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 4,
    "maxBatchDelay": 5000,

When I make a multi-file request, it processes both inputs correctly, but does them as separate batches. For example, when I use the following command line:

curl -X POST -T "{kitten.jpg,kitten2.jpg}"

I get the following log:

2020-02-21 16:09:24,419 [INFO ] W-9000-d161good_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 271
2020-02-21 16:09:24,419 [INFO ] W-9000-d161good_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:270.15|#ModelName:d161good,Level:Model|#hostname:bradheintz-mbp,requestID:3cb6117f-bcf7-4195-b515-965f9ab45e73,timestamp:1582330164
2020-02-21 16:09:24,419 [INFO ] W-9000-d161good_1.0 ACCESS_LOG - / "POST /predictions/d161good HTTP/1.1" 200 5275
2020-02-21 16:09:24,419 [INFO ] W-9000-d161good_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:bradheintz-mbp,timestamp:null
2020-02-21 16:09:24,419 [DEBUG] W-9000-d161good_1.0 org.pytorch.serve.wlm.Job - Waiting time: 5002, Backend time: 273
2020-02-21 16:09:29,704 [INFO ] W-9000-d161good_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 276
2020-02-21 16:09:29,704 [INFO ] W-9000-d161good_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:275.3|#ModelName:d161good,Level:Model|#hostname:bradheintz-mbp,requestID:af4897b6-7b65-4b13-b650-52add0a75147,timestamp:1582330169
2020-02-21 16:09:29,704 [INFO ] W-9000-d161good_1.0 ACCESS_LOG - / "POST /predictions/d161good HTTP/1.1" 200 5283
2020-02-21 16:09:29,705 [INFO ] W-9000-d161good_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:bradheintz-mbp,timestamp:null
2020-02-21 16:09:29,705 [DEBUG] W-9000-d161good_1.0 org.pytorch.serve.wlm.Job - Waiting time: 5004, Backend time: 278

Note that it is handling them as separate requests and not batching them: it waits for the maxBatchDelay to run down before processing each file passed in the single request.
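The observation can be checked mechanically by pulling the waiting times out of the Job log lines and comparing them with the configured maxBatchDelay. The sketch below uses lines copied from the log above. The helper names are hypothetical, and the regular expression assumes the "Waiting time: N, Backend time: M" format shown there.

```python
import re

WAIT_RE = re.compile(r"Waiting time: (\d+), Backend time: (\d+)")


def waiting_times(log_lines):
    """Return (waiting_ms, backend_ms) pairs found in the log lines."""
    pairs = []
    for line in log_lines:
        match = WAIT_RE.search(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def waited_full_delay(pairs, max_batch_delay_ms):
    """For each job, report whether it waited at least the full batch delay."""
    return [wait >= max_batch_delay_ms for wait, _ in pairs]


LOG = [
    "2020-02-21 16:09:24,419 [DEBUG] W-9000-d161good_1.0 org.pytorch.serve.wlm.Job - Waiting time: 5002, Backend time: 273",
    "2020-02-21 16:09:29,704 [INFO ] W-9000-d161good_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 276",
    "2020-02-21 16:09:29,705 [DEBUG] W-9000-d161good_1.0 org.pytorch.serve.wlm.Job - Waiting time: 5004, Backend time: 278",
]

pairs = waiting_times(LOG)
print(pairs)                           # [(5002, 273), (5004, 278)]
print(waited_full_delay(pairs, 5000))  # [True, True]
```

Both jobs waited slightly longer than the 5000 ms maxBatchDelay, which matches the reading that each file sat through the full delay on its own instead of joining one batch.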

I've verified this up to 5 files.

IllegalMonitorState Exceptions on running benchmark tests

Seeing many IllegalMonitorStateException errors when running the benchmark tests. Tested on p2.8xlarge
with the default configuration in the benchmark folder, with the model store path added to it.
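The stack traces in the console log further down end in ReentrantLock.unlock, called from Model.pollBatch. In Java, unlock throws IllegalMonitorStateException when the calling thread does not hold the lock. The sketch below is a Python analogue of that failure mode, not the TorchServe code: `threading.RLock` raises `RuntimeError` in the same situation. The function names are hypothetical, and whether pollBatch actually follows the unguarded pattern is an assumption the report does not confirm.

```python
import threading


def poll_unguarded(lock, timeout_s: float) -> str:
    """Release in `finally` even if the lock was never acquired."""
    acquired = lock.acquire(timeout=timeout_s)
    try:
        return "polled" if acquired else "timed out"
    finally:
        # Raises RuntimeError when acquired is False, because this
        # thread does not own the lock.
        lock.release()


def poll_guarded(lock, timeout_s: float) -> str:
    """Release only when this thread actually holds the lock."""
    acquired = lock.acquire(timeout=timeout_s)
    try:
        return "polled" if acquired else "timed out"
    finally:
        if acquired:
            lock.release()


def demo():
    lock = threading.RLock()
    held = threading.Event()
    done = threading.Event()

    def holder():
        # Keep the lock held by another thread for the whole demo.
        with lock:
            held.set()
            done.wait()

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()
    try:
        try:
            poll_unguarded(lock, 0.01)
            unguarded = "no error"
        except RuntimeError:
            unguarded = "RuntimeError"
        guarded = poll_guarded(lock, 0.01)
    finally:
        done.set()
        worker.join()
    return unguarded, guarded
```

Calling `demo()` shows the unguarded version failing on release while the guarded version simply reports a timeout.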


(pytorch_p36) ubuntu@ip-172-31-41-247:~/serve/benchmarks$ python throughput --ts
Running benchmark throughput with model resnet-18
Processing jmeter output
Output available at /tmp/TSBenchmark/out/throughput/resnet-18
Report generated at /tmp/TSBenchmark/out/throughput/resnet-18/report/index.html
[{'throughput_resnet-18_Inference_Request_Average': 670,
'throughput_resnet-18_Inference_Request_Median': 526,
'throughput_resnet-18_Inference_Request_Throughput': 137.0,
'throughput_resnet-18_Inference_Request_aggregate_report_90_line': 681,
'throughput_resnet-18_Inference_Request_aggregate_report_99_line': 8585,
'throughput_resnet-18_Inference_Request_aggregate_report_error': '0.00%'}]

JMeter reports don't show any errors.

Even though hits/sec reaches 200, the metrics logged on the console never show more than one active request at a time, so it is unclear whether the metrics are being logged properly. Requests2XX.Count is always 1, as below:

2020-03-07 01:58:50,800 [INFO ] W-9015-resnet-18_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null
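One way to check whether requests are going missing is to sum the counts over the whole console log and compare the total with what JMeter reports. The sketch below assumes each TS_METRICS line is a per-response increment rather than a gauge of concurrent requests. That is an assumption about the metric's meaning, not something the logs confirm. The regular expression and the `total_counts` helper are hypothetical, written against the line format shown in this report.

```python
import re
from collections import Counter

# Matches e.g. "TS_METRICS - Requests2XX.Count:1|#Level:Host|..."
METRIC_RE = re.compile(r"TS_METRICS - (?P<name>[\w.]+):(?P<value>[\d.]+)\|")


def total_counts(log_lines):
    """Sum the value of every TS_METRICS entry, keyed by metric name."""
    totals = Counter()
    for line in log_lines:
        match = METRIC_RE.search(line)
        if match:
            totals[match.group("name")] += float(match.group("value"))
    return totals


# Three lines copied from the console output in this report.
SAMPLE_LINES = [
    "2020-03-07 01:58:50,800 [INFO ] W-9015-resnet-18_1.0 TS_METRICS - "
    "Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null",
    "2020-03-07 01:58:51,109 [INFO ] W-9008-resnet-18_1.0 TS_METRICS - "
    "Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null",
    "2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0 TS_METRICS - "
    "Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null",
]

totals = total_counts(SAMPLE_LINES)
```

If the summed Requests2XX.Count over a full run is close to the number of samples JMeter recorded, the per-line value of 1 is just one increment per response.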

Errors in TorchServe console logs
2020-03-07 01:58:51,109 [INFO ] W-9008-resnet-18_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null
2020-03-07 01:58:51,109 [DEBUG] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.Job - Waiting time: 0, Backend time: 38
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 40
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0 ACCESS_LOG - / "POST /predictions/resnet-18 HTTP/1.1" 200 50
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null
2020-03-07 01:58:51,115 [DEBUG] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.Job - Waiting time: 0, Backend time: 49
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:36.98|#ModelName:resnet-18,Level:Model|#hostname:ip-172-31-41-247,requestID:63c3d128-d2ab-4e60-8ab7-fa9905b6de93,timestamp:1583546331
2020-03-07 01:58:51,130 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.ModelVersionedRefs - Removed model: resnet-18 version: 1.0
2020-03-07 01:58:51,131 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9031-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,133 [WARN ] W-9031-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:51,145 [INFO ] epollEventLoopGroup-4-32 org.pytorch.serve.wlm.WorkerThread - 9031 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,146 [DEBUG] W-9031-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9031-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,148 [DEBUG] W-9031-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:51,315 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9030-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,316 [WARN ] W-9030-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:51,317 [INFO ] epollEventLoopGroup-4-29 org.pytorch.serve.wlm.WorkerThread - 9030 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,317 [DEBUG] W-9030-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9030-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,318 [DEBUG] W-9030-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:51,584 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9029-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,585 [WARN ] W-9029-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:51,585 [INFO ] epollEventLoopGroup-4-31 org.pytorch.serve.wlm.WorkerThread - 9029 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,586 [DEBUG] W-9029-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9029-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,587 [DEBUG] W-9029-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:51,855 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9028-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,855 [WARN ] W-9028-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:51,856 [INFO ] epollEventLoopGroup-4-28 org.pytorch.serve.wlm.WorkerThread - 9028 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,857 [DEBUG] W-9028-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9028-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,858 [DEBUG] W-9028-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,124 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9027-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,125 [WARN ] W-9027-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:52,126 [INFO ] epollEventLoopGroup-4-30 org.pytorch.serve.wlm.WorkerThread - 9027 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,127 [DEBUG] W-9027-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9027-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,129 [DEBUG] W-9027-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,394 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9026-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,394 [WARN ] W-9026-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:52,395 [INFO ] epollEventLoopGroup-4-26 org.pytorch.serve.wlm.WorkerThread - 9026 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,395 [DEBUG] W-9026-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9026-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,396 [DEBUG] W-9026-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,664 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9025-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,664 [WARN ] W-9025-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:52,665 [INFO ] epollEventLoopGroup-4-27 org.pytorch.serve.wlm.WorkerThread - 9025 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,665 [DEBUG] W-9025-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9025-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,666 [DEBUG] W-9025-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,932 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9024-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,932 [WARN ] W-9024-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:52,933 [INFO ] epollEventLoopGroup-4-24 org.pytorch.serve.wlm.WorkerThread - 9024 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,933 [DEBUG] W-9024-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9024-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,934 [DEBUG] W-9024-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,202 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9023-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,202 [WARN ] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:53,203 [INFO ] epollEventLoopGroup-4-25 org.pytorch.serve.wlm.WorkerThread - 9023 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,203 [DEBUG] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9023-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,204 [DEBUG] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,359 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9022-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,359 [WARN ] W-9022-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:53,360 [INFO ] epollEventLoopGroup-4-17 org.pytorch.serve.wlm.WorkerThread - 9022 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,360 [DEBUG] W-9022-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9022-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,361 [DEBUG] W-9022-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,626 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9021-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,626 [WARN ] W-9021-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:53,627 [INFO ] epollEventLoopGroup-4-19 org.pytorch.serve.wlm.WorkerThread - 9021 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,627 [DEBUG] W-9021-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9021-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,628 [DEBUG] W-9021-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,892 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9020-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,893 [WARN ] W-9020-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:53,893 [INFO ] epollEventLoopGroup-4-21 org.pytorch.serve.wlm.WorkerThread - 9020 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,893 [DEBUG] W-9020-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9020-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,895 [DEBUG] W-9020-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,158 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9019-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,158 [WARN ] W-9019-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:54,159 [INFO ] epollEventLoopGroup-4-18 org.pytorch.serve.wlm.WorkerThread - 9019 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,159 [DEBUG] W-9019-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9019-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,159 [DEBUG] W-9019-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,427 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9018-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,428 [INFO ] epollEventLoopGroup-4-4 org.pytorch.serve.wlm.WorkerThread - 9018 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,429 [DEBUG] W-9018-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:54,429 [DEBUG] W-9018-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9018-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,429 [DEBUG] W-9018-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,696 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9017-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,697 [WARN ] W-9017-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:54,698 [INFO ] epollEventLoopGroup-4-9 org.pytorch.serve.wlm.WorkerThread - 9017 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,698 [DEBUG] W-9017-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9017-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,698 [DEBUG] W-9017-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,964 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9016-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,965 [WARN ] W-9016-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:54,965 [INFO ] epollEventLoopGroup-4-13 org.pytorch.serve.wlm.WorkerThread - 9016 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,965 [DEBUG] W-9016-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9016-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,965 [DEBUG] W-9016-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,235 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9015-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,235 [WARN ] W-9015-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:55,235 [INFO ] epollEventLoopGroup-4-22 org.pytorch.serve.wlm.WorkerThread - 9015 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,235 [DEBUG] W-9015-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9015-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,236 [DEBUG] W-9015-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,389 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9014-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,389 [WARN ] W-9014-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:55,390 [INFO ] epollEventLoopGroup-4-5 org.pytorch.serve.wlm.WorkerThread - 9014 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,390 [DEBUG] W-9014-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9014-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,390 [DEBUG] W-9014-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,654 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9013-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,654 [WARN ] W-9013-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:55,654 [INFO ] epollEventLoopGroup-4-8 org.pytorch.serve.wlm.WorkerThread - 9013 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,655 [DEBUG] W-9013-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9013-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,655 [DEBUG] W-9013-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,919 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9012-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,919 [WARN ] W-9012-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:55,920 [INFO ] epollEventLoopGroup-4-11 org.pytorch.serve.wlm.WorkerThread - 9012 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,920 [DEBUG] W-9012-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9012-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,920 [DEBUG] W-9012-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,185 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9011-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,185 [WARN ] W-9011-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:56,185 [INFO ] epollEventLoopGroup-4-14 org.pytorch.serve.wlm.WorkerThread - 9011 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,186 [DEBUG] W-9011-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9011-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,187 [DEBUG] W-9011-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,449 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9010-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,449 [WARN ] W-9010-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:56,449 [INFO ] epollEventLoopGroup-4-20 org.pytorch.serve.wlm.WorkerThread - 9010 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,450 [DEBUG] W-9010-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9010-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,450 [DEBUG] W-9010-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,712 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9009-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,712 [INFO ] epollEventLoopGroup-4-15 org.pytorch.serve.wlm.WorkerThread - 9009 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,712 [WARN ] W-9009-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:56,713 [DEBUG] W-9009-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9009-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,713 [DEBUG] W-9009-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,976 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9008-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,976 [INFO ] epollEventLoopGroup-4-1 org.pytorch.serve.wlm.WorkerThread - 9008 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,976 [WARN ] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:56,977 [DEBUG] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9008-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,977 [DEBUG] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,240 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9007-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,240 [INFO ] epollEventLoopGroup-4-23 org.pytorch.serve.wlm.WorkerThread - 9007 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,240 [WARN ] W-9007-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:57,240 [DEBUG] W-9007-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9007-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,240 [DEBUG] W-9007-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,391 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9006-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,391 [INFO ] epollEventLoopGroup-4-2 org.pytorch.serve.wlm.WorkerThread - 9006 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,391 [WARN ] W-9006-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:57,391 [DEBUG] W-9006-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9006-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,392 [DEBUG] W-9006-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,655 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9005-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,656 [INFO ] epollEventLoopGroup-4-6 org.pytorch.serve.wlm.WorkerThread - 9005 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,656 [WARN ] W-9005-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:57,656 [DEBUG] W-9005-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9005-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,656 [DEBUG] W-9005-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,919 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9004-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,919 [INFO ] epollEventLoopGroup-4-16 org.pytorch.serve.wlm.WorkerThread - 9004 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,919 [DEBUG] W-9004-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:57,920 [DEBUG] W-9004-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9004-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,920 [DEBUG] W-9004-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,182 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9003-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,182 [WARN ] W-9003-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:58,182 [INFO ] epollEventLoopGroup-4-7 org.pytorch.serve.wlm.WorkerThread - 9003 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,183 [DEBUG] W-9003-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9003-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,183 [DEBUG] W-9003-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,443 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9002-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,443 [WARN ] W-9002-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
at java.util.concurrent.locks.ReentrantLock.unlock(
at org.pytorch.serve.wlm.Model.pollBatch(
at org.pytorch.serve.wlm.BatchAggregator.getRequest(
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2020-03-07 01:58:58,443 [INFO ] epollEventLoopGroup-4-3 org.pytorch.serve.wlm.WorkerThread - 9002 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,444 [DEBUG] W-9002-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,444 [DEBUG] W-9002-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,706 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9001-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,706 [DEBUG] W-9001-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:58,706 [INFO ] epollEventLoopGroup-4-10 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,707 [DEBUG] W-9001-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,707 [DEBUG] W-9001-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,968 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,968 [DEBUG] W-9000-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:58,968 [INFO ] epollEventLoopGroup-4-12 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,968 [DEBUG] W-9000-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,969 [DEBUG] W-9000-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:59,344 [INFO ] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.ModelManager - Model resnet-18 unregistered.

Corresponding GPU / CPU utilizations via Cloudwatch (using the script from AWS)
[Screenshot of CloudWatch GPU / CPU utilization, 2020-03-06]

Security issue loading model archives from arbitrary URL

There is a significant security issue with fetching new models via URL, in that model archives may contain arbitrary code, which will be run by TorchServe with no verification. Vulnerabilities include replacing a model at a target URL (with one containing malicious code), or registering such a model by getting access to the Management API, and active MITM attacks that insert their own model archives. (Other attacks are possible, such as inducing someone to pull a compromised model archive from the command line, e.g. via social engineering methods.) Model archive code has access to anything the TorchServe process does, including the model store directory, which may contain significant IP for the company running TorchServe. If the server is not locked down well, it could be a beachhead for a more aggressive network incursion.

Suggested mitigations (from security engineers and others within Facebook):

  • Only allow fetching model archives from a whitelist of URLs in the server config.
  • Require all such hosts to use https://, perform proper certificate validation, and offer the option of certificate pinning
  • Checksum or code signing on model archives
  • Authentication on the Management API
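
A minimal sketch of how the first three mitigations might fit together, assuming a hypothetical whitelist of hosts and an expected SHA-256 digest obtained out of band. The names ALLOWED_HOSTS, is_allowed_model_url, and verify_checksum are illustrative and are not part of TorchServe; certificate validation and pinning would be handled by whatever HTTP client does the download.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical whitelist; in practice this would come from the server config.
ALLOWED_HOSTS = {"models.example.com"}


def is_allowed_model_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Accept only https:// URLs whose host is on the whitelist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts


def verify_checksum(archive_bytes, expected_sha256):
    """Compare the archive's SHA-256 digest with a digest obtained out of band."""
    actual = hashlib.sha256(archive_bytes).hexdigest()
    return actual == expected_sha256.lower()
```

A server following this approach would refuse to register an archive unless both checks pass, before any code from the archive is unpacked or executed.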

This will not be a launch blocker for the experimental release, but will be for 1.0.

Can't log custom metrics

I attempted to log custom metrics according to the docs. I did the following:

At the top of my custom model handler:

import ts
from ts.metrics import dimension

In the inference method:

t = time.process_time()
dim1 = dimension('byzantine pernicious', t)
self.metrics.add_metric('byzantine pernicious', t + 1, dimensions=[dim1]) # test custom metrics

This fails with the error:

2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Invoking custom service failed.
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ts/", line 100, in predict
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     ret = self._entry_point(input_batch, self.context)
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/d37a667068d56c2de0b1c0a38aec56aa152ca6ba/", line 334, in handle
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     raise e
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/d37a667068d56c2de0b1c0a38aec56aa152ca6ba/", line 329, in handle
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     data = _service.inference(data)
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/d37a667068d56c2de0b1c0a38aec56aa152ca6ba/", line 300, in inference
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 67
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     dim1 = dimension('byzantine pernicious', t)
2020-02-21 00:24:26,537 [INFO ] W-9023-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - TypeError: 'module' object is not callable

It's not really clear to me what's wrong there. I tried using it without the allegedly optional dimensions argument, and got:

2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Invoking custom service failed.
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ts/", line 100, in predict
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     ret = self._entry_point(input_batch, self.context)
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/829d527168600113f021d498f082c0e95ba63c6c/", line 333, in handle
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     raise e
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/829d527168600113f021d498f082c0e95ba63c6c/", line 328, in handle
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     data = _service.inference(data)
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/829d527168600113f021d498f082c0e95ba63c6c/", line 300, in inference
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.metrics.add_metric('byzantine pernicious', t) # test custom metrics
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 67
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ts/metrics/", line 201, in add_metric
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self._add_or_update(name, value, req_id, unit, dimensions)
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ts/metrics/", line 58, in _add_or_update
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     dim_str = '-'.join(dim_str)
2020-02-21 00:34:22,018 [INFO ] W-9017-my_tc_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - TypeError: sequence item 1: expected str instance, NoneType found

Examining the code made it clear that the dimensions arg was not actually optional.
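
The final TypeError in that traceback is what str.join raises whenever the sequence contains None, and it can be reproduced without TorchServe. The sketch below is an assumption about the failure mode, not the actual ts.metrics code; build_dim_str is a hypothetical helper showing one way a None default could be tolerated.

```python
def build_dim_str(name, dimensions=None):
    """Join a metric name with its dimensions, treating None as 'no dimensions'."""
    parts = [name] + [str(d) for d in (dimensions or [])]
    return "-".join(parts)


# Reproduce the error from the log: joining a list that contains None.
try:
    "-".join(["byzantine pernicious", None])
except TypeError as err:
    message = str(err)
```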

Also, the documentation is ambiguous as to whether the correct object name is dimension or Dimension. It looks to be the former.
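
For context, "'module' object is not callable" is the error Python raises when a name bound to a module is called like a function. That suggests the lowercase name imports a module and that the callable lives inside it. The stand-in below uses a synthetic module so it runs without TorchServe; the inner class name Dimension is an assumption, not something confirmed by the docs quoted here.

```python
import types

# Synthetic stand-in for a module named `dimension` that defines a class inside it.
dimension = types.ModuleType("dimension")


class Dimension:
    """Hypothetical name/value pair attached to a metric."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


dimension.Dimension = Dimension

# Calling the module itself fails in the same way as in the first traceback.
try:
    dimension("byzantine pernicious", 1.0)
except TypeError as err:
    error_text = str(err)

# Calling the class inside the module works.
dim1 = dimension.Dimension("byzantine pernicious", 1.0)
```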

`model.eval()` call in the wrong place

The various included model handlers include calls to model.eval(), which is good - we want the model put in that mode for inference.

Where this call really belongs, though, is in the parent class for all handlers, where the model gets loaded. In fact, calling model.eval() should probably be the next thing that happens after the model loads - someplace like line 61 in the current version of

This protects against someone forgetting to add the call (and unwittingly hobbling their performance), and doesn't stop them from calling model.train() if for some reason that's what they really need.
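
A rough sketch of the proposal, assuming a hypothetical parent class BaseHandlerSketch with a load_model hook. Neither name is TorchServe's, and FakeModel stands in for torch.nn.Module so that the example runs without PyTorch installed.

```python
class BaseHandlerSketch:
    """Hypothetical parent handler: switches to eval mode right after loading."""

    def __init__(self):
        self.model = None

    def load_model(self, context):
        # Subclasses return a model object exposing eval() and train().
        raise NotImplementedError

    def initialize(self, context):
        self.model = self.load_model(context)
        # Done once here so individual handlers cannot forget it.
        self.model.eval()


class FakeModel:
    """Stand-in for torch.nn.Module, tracking only the training flag."""

    def __init__(self):
        self.training = True  # torch.nn.Module also starts in training mode

    def eval(self):
        self.training = False
        return self

    def train(self, mode=True):
        self.training = mode
        return self


class MyHandler(BaseHandlerSketch):
    def load_model(self, context):
        return FakeModel()


handler = MyHandler()
handler.initialize(context=None)
```

A handler that really does need training mode can still call self.model.train() after initialize() returns.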

Question: Should new model version always become default?

If I put up an image classifier model with a version of 1.0, then register another with the same name and a version of 1.1, version 1.1 automatically becomes the default. Is this intentional?

Part of the point of versioning is to be able to test new versions without taking down the old version (or appearing to by taking over the default URL). Based on this, it would seem that the current behavior is broken.

One possible fix: Include a management API config flag that determines whether the new version should become default. I'm not even all that concerned with what that flag would default to, as long as I have a way to register v1.1 without taking "default" status away from v1.0.
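
One way the suggested flag could behave, sketched with a hypothetical in-memory registry. ModelRegistrySketch and make_default are made-up names for illustration, not part of the Management API.

```python
class ModelRegistrySketch:
    """Hypothetical registry that keeps the default version unless told otherwise."""

    def __init__(self):
        self.versions = {}  # model name -> {version: archive file}
        self.default = {}   # model name -> default version

    def register(self, name, version, archive, make_default=False):
        self.versions.setdefault(name, {})[version] = archive
        # The first registered version becomes the default; later ones only on request.
        if make_default or name not in self.default:
            self.default[name] = version


registry = ModelRegistrySketch()
registry.register("classifier", "1.0", "classifier.mar")
registry.register("classifier", "1.1", "classifier_v11.mar")
```

With this behavior, version 1.1 is reachable for testing by its explicit version while 1.0 keeps serving the default URL until someone promotes 1.1.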

Error installing

Error during install while running tests:

    2019-12-16 13:55:36,353 [WARN ] W-9008-noop-stderr org.pytorch.serve.wlm.WorkerLifeCycle -     import psutil
    2019-12-16 13:55:36,353 [WARN ] W-9008-noop-stderr org.pytorch.serve.wlm.WorkerLifeCycle - ModuleNotFoundError: No module named 'psutil'

Looks like a Python dependency issue.
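
A small pre-flight check along these lines could surface the problem before the tests run. This is only a sketch: missing_modules is a hypothetical helper, and the module list is taken from the pip output and logs quoted in these issues rather than from an authoritative requirements file.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names assumed from the packages mentioned in these reports; adjust as needed.
required = ["psutil", "future", "PIL"]
missing = missing_modules(required)
if missing:
    print("Missing Python modules:", ", ".join(missing))
```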

Following instructions, model server does not start

» torchserve --start --model-store model_store --models densenet161=densenet161.mar
» Error: Could not find or load main class org.pytorch.serve.ModelServer

Install prereqs are met:

» python --version
Python 3.6.9 :: Anaconda, Inc.
» java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

Also, java is in my PATH.

ResNet-152 example code does suboptimal things with tensors and loops

In examples/image_classifier/resnet_152_batch/, in the preprocess() and postprocess() methods, tensors are looped over and subtensors/elements are handled individually. This processing will probably be faster and take less code using tensor broadcasting semantics and a small amount of refactoring. I'll handle it when I have a little time.

TorchServe ignores batch config properties

In my file, I have the lines:


I started TorchServe with the command line:

torchserve --start --ts-config --models d161good=d161good.mar  --model-store model_store

When I query the status of the endpoint with curl, I get:

    "modelName": "d161good",
    "modelVersion": "1.0",
    "modelUrl": "d161good.mar",
    "runtime": "python",
    "minWorkers": 12,
    "maxWorkers": 12,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,

Note the "batchSize" and "maxBatchDelay" entries.

torch-model-archiver Installation

Despite what the instructions say, I don't see any installation instructions for torch-model-archiver - and it's needed for the quick start.

It was simple enough to cd to the right folder and run, but this should still be called out correctly in the quick start.

config steps for serving via public-ip

The default configuration works for serving models on localhost. In order to run predictions on models via the public IP (making calls from a different host), one has to specify or the explicit IP-address through the --ts-config file settings. This is not very clear in the documentation. Please add clear instructions for new users; otherwise it can take a while to figure out why predictions are not working even after opening up the external security ports on the machine.

benchmark dependencies install script failing on fresh ubuntu 18.04

The '' script is not working on a fresh Ubuntu 18.04 machine. Tested for both GPU and CPU installs.

Error details:

  • brew install jmeter --with-plugins
    Usage: brew install [options] formula
    Error: invalid option: --with-plugins
  • true
  • wget -O /home/ubuntu/.linuxbrew/Cellar/jmeter/5.2.1/libexec/lib/ext/jmeter-plugins-manager-1.3.jar
    /home/ubuntu/.linuxbrew/Cellar/jmeter/5.2.1/libexec/lib/ext/jmeter-plugins-manager-1.3.jar: No such file or directory

Conda preferable to pip

Anaconda is the preferred method for installing PyTorch & Torchvision - it would be good if the installation instructions mirrored that.

Unable to install on GPU machine

The pip install . command failed with errors when installing on an Ubuntu 18.04 GPU server (AWS p3.8xlarge). The Gradle tests that run as part of the install fail with many "Backend worker monitoring thread interrupted or backend worker process died" errors in the logs.

Error logs are attached.

Java concurrency crash when attempting batch processing

I have an endpoint configured thus:

    "modelName": "d161good",
    "modelVersion": "1.0",
    "modelUrl": "d161good.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 4,
    "maxBatchDelay": 5000,

When I pass in a single image, this behaves as expected. It takes about 5 seconds to return while it waits for a batch, gives up, and processes the request.
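
The single-request behavior matches a loop that waits for the first request and then allows up to maxBatchDelay for the batch to fill. The Python sketch below only illustrates that logic under those assumptions; it is not TorchServe's Java implementation, and poll_batch is a hypothetical name.

```python
import queue
import time


def poll_batch(jobs, batch_size, max_batch_delay_ms):
    """Block for the first job, then wait up to max_batch_delay_ms for the rest."""
    batch = [jobs.get()]  # wait indefinitely for the first request
    deadline = time.monotonic() + max_batch_delay_ms / 1000.0
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # give up waiting and process a partial batch
        try:
            batch.append(jobs.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


jobs = queue.Queue()

# One request: the call returns a batch of one after the delay expires.
jobs.put("kitten.jpg")
single = poll_batch(jobs, batch_size=4, max_batch_delay_ms=50)

# Four requests already queued: the batch fills without waiting for the deadline.
for name in ["kitten.jpg", "kitten2.jpg", "kitten3.jpg", "nickcage1.jpg"]:
    jobs.put(name)
full = poll_batch(jobs, batch_size=4, max_batch_delay_ms=50)
```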

It fails consistently when passed multiple requests in rapid succession (much less than maxBatchDelay), e.g.:

curl -X POST -T kitten.jpg &
curl -X POST -T kitten2.jpg &
curl -X POST -T kitten3.jpg &
curl -X POST -T nickcage1.jpg &

The error from the logs:

2020-02-21 15:38:09,027 [INFO ] W-9000-d161good_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 343
2020-02-21 15:38:09,027 [INFO ] W-9000-d161good_1.0 ACCESS_LOG - / "POST /predictions/d161good HTTP/1.1" 503 354
2020-02-21 15:38:09,027 [INFO ] W-9000-d161good_1.0 TS_METRICS - Requests5XX.Count:1|#Level:Host|#hostname:bradheintz-mbp,timestamp:null
2020-02-21 15:38:09,028 [DEBUG] W-9000-d161good_1.0 org.pytorch.serve.wlm.Job - Waiting time: 1, Inference time: 355
  "code": 503,
  "type": "InternalServerException",
  "message": "number of batch response mismatched"

[2]    57291 done       curl -X POST -T kitten2.jpg
2020-02-21 15:38:09,029 [WARN ] W-9000-d161good_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
    at java.util.LinkedHashMap$LinkedHashIterator.nextNode(
    at java.util.LinkedHashMap$
    at org.pytorch.serve.wlm.BatchAggregator.sendResponse(
    at java.util.concurrent.Executors$
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
2020-02-21 15:38:09,030 [ERROR] W-9000-d161good_1.0 org.pytorch.serve.wlm.BatchAggregator - Unexpected job: f006ec22-7c7f-4f94-8c3a-2b53692081e8

The java.util.ConcurrentModificationException is a 100% repro for me, with as few as two requests.
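
In Java, this exception is typically thrown when a collection is structurally modified while it is being iterated, and the stack trace above passes through a LinkedHashMap iterator. The actual cause in BatchAggregator is not established here. The Python analog below (a RuntimeError rather than a Java exception) only illustrates the general failure class and the common fix of iterating over a snapshot.

```python
jobs = {"job-1": "kitten.jpg", "job-2": "kitten2.jpg"}

# Removing entries while iterating the live mapping fails.
try:
    for job_id in jobs:
        del jobs[job_id]
except RuntimeError as err:
    failure = str(err)

# Iterating over a snapshot of the keys avoids the problem.
jobs = {"job-1": "kitten.jpg", "job-2": "kitten2.jpg"}
for job_id in list(jobs):
    del jobs[job_id]
```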

Model Archiver instructions incomplete & incorrect in README

In /, in the "Serve a Model" section, there is a command line for creating a model archive for the downloaded DenseNet-161 model:

torch-model-archiver --model-name densenet161 --model-file serve/examples/densenet_161/ --serialized-file densenet161-8d451a50.pth --extra-files serve/examples/index_to_name.json

This fails with the message:

torch-model-archiver: error: the following arguments are required: --handler

Also, the paths are incorrect - it should be examples/image_classifier/densenet_161 for both of the specified paths.

Benchmarks have dependency on Mxnet

if [[ $1 = True ]]
then
        echo "Installing pip packages for GPU"
        sudo apt install -y nvidia-cuda-toolkit
        pip install future psutil mxnet-cu92 pillow --user
else
        echo "Installing pip packages for CPU"
        pip install future psutil mxnet pillow --user
fi

Command line does not allow starting with multiple model versions

In my model store, I have three model archives:

  • name densenet161, version 1.0, file densenet161.mar
  • name densenet161, version 1.1, file densenet161a.mar
  • name tsd161, version 1.0, file tsd161.mar

I attempted to use this command line to start the server:

torchserve --start --model-store model_store --models densenet161=densenet161.mar densenet161=densenet161a.mar tsd161=tsd161.mar

Expected behavior: There would be two versions of the densenet161 endpoint, as specified in the two model archive files.

Actual behavior from curl http://localhost:8081/models:

{
  "models": [
    {
      "modelName": "densenet161",
      "modelUrl": "densenet161a.mar"
    },
    {
      "modelName": "tsd161",
      "modelUrl": "tsd161.mar"
    }
  ]
}

Restricting worker env var access should use whitelist, not blacklist

When restricting access to something with a functionally infinite address space - like the namespace of env vars - blacklisting is a poor practice. Every new env variable that the blacklist doesn't know about becomes a new potential vulnerability. It is impossible for the blacklist to capture all possible invalid inputs that it might want to restrict.

A better practice is to use whitelisting. It is possible to know a priori which env vars the worker needs, and hopefully a runtime failure will alert the user to an attempt to access a non-whitelisted variable.
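
A minimal sketch of the whitelist approach, assuming a hypothetical set of variables the worker needs. The variable names and the build_worker_env helper are illustrative, not TorchServe's.

```python
import os

# Hypothetical whitelist; the real set would be whatever the worker actually needs.
WORKER_ENV_WHITELIST = {"PATH", "PYTHONPATH", "LD_LIBRARY_PATH", "CUDA_VISIBLE_DEVICES"}


def build_worker_env(parent_env, whitelist=WORKER_ENV_WHITELIST):
    """Copy only whitelisted variables into the worker's environment."""
    return {k: v for k, v in parent_env.items() if k in whitelist}


worker_env = build_worker_env(dict(os.environ))
```

The resulting dictionary could be passed as the env argument when the worker process is spawned, so that anything not listed is never visible to model archive code.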

Run AWS DeepLearning Benchmarks against TS containers

Config changes are not preserved

Previous high-priority item from the last round of feedback:

[p0] For Operational ease, model management to support at a minimum, the ability to add new models dynamically via the API and being able to preserve those changes on restarting the server along with monitoring and tracing capabilities.

AFAICT, this is not happening.
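
For illustration, preserving dynamically registered models across restarts could be as simple as writing a snapshot on every change and reading it back at startup. The sketch below assumes a hypothetical JSON snapshot file and helper names; it does not describe how TorchServe stores its state.

```python
import json
import os
import tempfile


def save_snapshot(path, models):
    """Write the registered models to disk, replacing the old snapshot in one step."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(models, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_snapshot(path):
    """Return the saved models, or an empty dict when no snapshot exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


snapshot_file = os.path.join(tempfile.mkdtemp(), "models_snapshot.json")
save_snapshot(snapshot_file, {"densenet161": {"version": "1.0", "minWorkers": 1}})
restored = load_snapshot(snapshot_file)
```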

Serve installation fails on SageMaker Notebook and Cloud9

Following the steps in the README produces the errors below. The environments are Cloud9 and Amazon SageMaker Notebooks.
(venv) chzar:~/environment/serve/serve-setup/serve (master) $ pip install .
Processing /home/ec2-user/environment/serve/serve-setup/serve
Requirement already satisfied: Pillow in /home/ec2-user/environment/serve/serve-setup/venv/lib/python3.6/dist-packages (from torchserve==0.0.1b20200308) (7.0.0)
Collecting psutil (from torchserve==0.0.1b20200308)
Downloading (449kB)
100% |████████████████████████████████| 450kB 20.8MB/s
Collecting future (from torchserve==0.0.1b20200308)
Downloading (829kB)
100% |████████████████████████████████| 829kB 21.7MB/s
Requirement already satisfied: torch in /home/ec2-user/environment/serve/serve-setup/venv/lib/python3.6/dist-packages (from torchserve==0.0.1b20200308) (1.4.0)
Requirement already satisfied: torchvision in /home/ec2-user/environment/serve/serve-setup/venv/lib/python3.6/dist-packages (from torchserve==0.0.1b20200308) (0.5.0)
Collecting torchtext (from torchserve==0.0.1b20200308)
Downloading (73kB)
100% |████████████████████████████████| 81kB 23.4MB/s
Requirement already satisfied: six in /home/ec2-user/environment/serve/serve-setup/venv/lib/python3.6/dist-packages (from torchvision->torchserve==0.0.1b20200308) (1.14.0)
Requirement already satisfied: numpy in /home/ec2-user/environment/serve/serve-setup/venv/lib/python3.6/dist-packages (from torchvision->torchserve==0.0.1b20200308) (1.18.1)
Collecting sentencepiece (from torchtext->torchserve==0.0.1b20200308)
Downloading (1.0MB)
100% |████████████████████████████████| 1.0MB 21.5MB/s
Collecting requests (from torchtext->torchserve==0.0.1b20200308)
Downloading (58kB)
100% |████████████████████████████████| 61kB 23.7MB/s
Collecting tqdm (from torchtext->torchserve==0.0.1b20200308)
Downloading (59kB)
100% |████████████████████████████████| 61kB 24.2MB/s
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests->torchtext->torchserve==0.0.1b20200308)
Downloading (125kB)
100% |████████████████████████████████| 133kB 32.5MB/s
Collecting certifi>=2017.4.17 (from requests->torchtext->torchserve==0.0.1b20200308)
Downloading (156kB)
100% |████████████████████████████████| 163kB 31.7MB/s
Collecting chardet<4,>=3.0.2 (from requests->torchtext->torchserve==0.0.1b20200308)
Downloading (133kB)
100% |████████████████████████████████| 143kB 32.0MB/s
Collecting idna<3,>=2.5 (from requests->torchtext->torchserve==0.0.1b20200308)
Downloading (58kB)
100% |████████████████████████████████| 61kB 26.6MB/s
Installing collected packages: psutil, future, sentencepiece, urllib3, certifi, chardet, idna, requests, tqdm, torchtext, torchserve
Running install for psutil ... done
Running install for future ... done
Running install for torchserve ... error
Complete output from command /home/ec2-user/environment/serve/serve-setup/venv/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-req-build-lpcdwxb3/';f=getattr(tokenize, 'open', open)(file);'\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-record-3aonh65v/install-record.txt --single-version-externally-managed --compile --install-headers /home/ec2-user/environment/serve/serve-setup/venv/include/site/python3.6/torchserve:
running install
running build
running build_py
running build_frontend
Unzipping /home/ec2-user/.gradle/wrapper/dists/gradle-4.9-bin/e9cinqnqvph59rr7g70qubb4t/ to /home/ec2-user/.gradle/wrapper/dists/gradle-4.9-bin/e9cinqnqvph59rr7g70qubb4t
Set executable permissions for: /home/ec2-user/.gradle/wrapper/dists/gradle-4.9-bin/e9cinqnqvph59rr7g70qubb4t/gradle-4.9/bin/gradle

Welcome to Gradle 4.9!

Here are the highlights of this release:
 - Experimental APIs for creating and configuring tasks lazily
 - Pass arguments to JavaExec via CLI
 - Auxiliary publication dependency support for multi-project builds
 - Improved dependency insight report

For more details see

Starting a Gradle Daemon (subsequent builds will be faster)

FAILURE: Build failed with an exception.

* Where:
Build file '/tmp/pip-req-build-lpcdwxb3/frontend/build.gradle' line: 28

* What went wrong:
A problem occurred evaluating root project 'frontend'.
> Could not open dsl remapped class cache for 4116txk5uyl7fsx1ml41gcwgu (/home/ec2-user/.gradle/caches/4.9/scripts-remapped/formatter_8g1qvyqrv7q4wxxey9do0wgaw/4116txk5uyl7fsx1ml41gcwgu/dsl1724d65be2ee623103b4c593e57fc0c5).
   > Could not open dsl generic class cache for script '/tmp/pip-req-build-lpcdwxb3/frontend/tools/gradle/formatter.gradle' (/home/ec2-user/.gradle/caches/4.9/scripts/4116txk5uyl7fsx1ml41gcwgu/dsl/dsl1724d65be2ee623103b4c593e57fc0c5).
      > com/google/googlejavaformat/java/Main : Unsupported major.minor version 52.0

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at

Deprecated Gradle features were used in this build, making it incompatible with Gradle 5.0.
Use '--warning-mode all' to show the individual deprecation warnings.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-lpcdwxb3/", line 160, in <module>
    license='Apache License Version 2.0'
  File "/home/ec2-user/environment/serve/serve-setup/venv/lib64/python3.6/dist-packages/setuptools/", line 143, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib64/python3.6/distutils/", line 148, in setup
  File "/usr/lib64/python3.6/distutils/", line 955, in run_commands
  File "/usr/lib64/python3.6/distutils/", line 974, in run_command
  File "/home/ec2-user/environment/serve/serve-setup/venv/lib64/python3.6/dist-packages/setuptools/command/", line 61, in run
  File "/usr/lib64/python3.6/distutils/command/", line 593, in run
  File "/usr/lib64/python3.6/distutils/", line 313, in run_command
  File "/usr/lib64/python3.6/distutils/", line 974, in run_command
  File "/usr/lib64/python3.6/distutils/command/", line 135, in run
  File "/usr/lib64/python3.6/distutils/", line 313, in run_command
  File "/usr/lib64/python3.6/distutils/", line 974, in run_command
  File "/tmp/pip-req-build-lpcdwxb3/", line 98, in run
  File "/usr/lib64/python3.6/distutils/", line 313, in run_command
  File "/usr/lib64/python3.6/distutils/", line 974, in run_command
  File "/tmp/pip-req-build-lpcdwxb3/", line 85, in run
    subprocess.check_call('frontend/gradlew -p frontend clean build', shell=True)
  File "/usr/lib64/python3.6/", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'frontend/gradlew -p frontend clean build' returned non-zero exit status 1.


Command "/home/ec2-user/environment/serve/serve-setup/venv/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-req-build-lpcdwxb3/';f=getattr(tokenize, 'open', open)(file);'\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-record-3aonh65v/install-record.txt --single-version-externally-managed --compile --install-headers /home/ec2-user/environment/serve/serve-setup/venv/include/site/python3.6/torchserve" failed with error code 1 in /tmp/pip-req-build-lpcdwxb3/
You are using pip version 18.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

text_classification example not working

The text_classification example is not working. Adding the model works, but on scaling the workers one gets a 500 error with "failed to start workers".

After this error, TorchServe keeps trying to restart the workers and the logs are flooded with errors until one explicitly scales the workers for the model back to 0.

Detailed error logs in console:
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - No module named 'text_classifier'
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process die.
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/", line 163, in
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - worker.run_server()
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/", line 141, in run_server
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.handle_connection(cl_socket)
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/", line 105, in handle_connection
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service, result, code = self.load_model(msg)
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/", line 83, in load_model
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service = model_loader.load(model_name, model_dir, handler, gpu, batch_size)
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/", line 107, in load
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - entry_point(None, service.context)
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/torch_handler/", line 79, in handle
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - raise e
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/torch_handler/", line 68, in handle
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - _service.initialize(context)
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/ts/torch_handler/", line 20, in initialize
2020-02-16 02:35:53,352 [INFO ] epollEventLoopGroup-4-30 org.pytorch.serve.wlm.WorkerThread - 9007 Worker disconnected. WORKER_STARTED
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.source_vocab = torch.load(self.manifest['model']['sourceVocab'])
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/torch/", line 525, in load
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - with _open_file_like(f, 'rb') as opened_file:
2020-02-16 02:35:53,352 [INFO ] W-9007-my_text_classifier_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/ubuntu/anaconda3/envs/serve/lib/python3.8/site-packages/torch/", line 212, in _open_file_like
2020-02-16 02:35:53,352 [DEBUG] W-9007-my_text_classifier_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(
at java.util.concurrent.ArrayBlockingQueue.poll(
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

Long wait times for first request from TorchScript model

I have two identical models, one in code + weights, the other in TorchScript. Doing inference with TorchScript takes far, far longer, which is surprising.

The setup:

The non-TorchScript model is just the DenseNet-161 model archive from the quick start.

The TorchScript model is the same one, but exported to TorchScript thus:

import torch
import torchvision
d161 = torchvision.models.densenet161(pretrained=True)
tsd161 = torch.jit.script(d161)'')

It was then packaged with:

torch-model-archiver --model-name tsd161 --version 1.0 --serialized-file --handler image_classifier

The server is started with:

torchserve --start --model-store model_store --models densenet161=densenet161.mar tsd161=tsd161.mar

This is the timing output from calling the regular model:

time curl -X POST -T kitten.jpg
    "tiger_cat": 0.46933549642562866
    "tabby": 0.4633878469467163
    "Egyptian_cat": 0.06456148624420166
    "lynx": 0.0012828214094042778
    "plastic_bag": 0.00023323034110944718
curl -X POST -T kitten.jpg  0.01s user 0.01s system 2% cpu 0.428 total

And from the TorchScript:

time curl -X POST -T kitten.jpg
    "282": "0.46933549642562866"
    "281": "0.4633878469467163"
    "285": "0.06456148624420166"
    "287": "0.0012828214094042778"
    "728": "0.00023323034110944718"
]
curl -X POST -T kitten.jpg  0.01s user 0.01s system 0% cpu 1:16.54 total

The identical output between the two (except for the human-readable labels) shows we're dealing with the same model in both instances.

I'm marking this launch blocking, at least until we understand what's happening.
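
One way to narrow this down is to time several consecutive requests and see whether the cost is paid only on the first call or on every call. The helper below is a minimal sketch, not part of TorchServe: `time_calls` is a hypothetical name, and the callable passed in is assumed to wrap whatever request is being measured (for example, the same POST of kitten.jpg shown above).

```python
import time


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Split the first call from the rest so warm-up cost stands out."""
    first, rest = durations[0], durations[1:]
    steady = sum(rest) / len(rest) if rest else None
    return {"first": first, "steady_state_mean": steady}
```

If `first` is far larger than `steady_state_mean`, the delay is a one-time cost on the first request; if every call is slow, the problem lies elsewhere.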

Add integration tests into the CI build process for catching regression issues

Add basic integration tests into the CI build process for catching regression issues. Please ensure:

  1. Basic install on a fresh machine or docker sandbox takes place as part of the sanity testing.
  2. All API endpoints are verified for regression. It can be as simple as running a Postman script with all the endpoints and testing that models get deployed and inference is working for the examples bundled.

TorchServe startup fails to honor valid combinations of and command line

I have models in my model_store folder, and the line model_store=model_store in my file.

The following two command lines work, with the first starting no endpoints and the second starting workers for the model:

torchserve --start --ts-config
torchserve --start --ts-config --models d161good=d161good.mar  --model-store model_store

In the first one, it takes the model store location from config; in the second, it ignores that in favor of the command line.

This command line fails (error message in the second line):

> torchserve --start --ts-config --models d161good=d161good.mar
--model-store is required to load model locally.

Again the model store is specified, just not on the command line. TorchServe should check the config for missing parameters before rejecting a command line.
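
The requested behavior can be sketched as a simple fallback: use the command-line value when present, otherwise look in the config, and only fail if both are missing. This is an illustration of the idea, not TorchServe's actual startup code; `parse_properties` and `resolve_model_store` are hypothetical names, and the config is assumed to be plain `key=value` lines.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_model_store(cli_model_store, config_text):
    """Prefer the command-line value; fall back to the config before failing."""
    if cli_model_store:
        return cli_model_store
    value = parse_properties(config_text).get("model_store")
    if not value:
        raise ValueError("--model-store is required to load model locally.")
    return value
```

With this ordering, the failing command line above would pick up `model_store=model_store` from the config instead of being rejected.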

Update error message when attempting inference on model with 0 workers

When a model is added via the management API, its default min/max workers are set to 0. As a result, running a prediction against the model right after registering it gives a 503 error with the detail 'No worker is available to serve request: densenet161'. This will be confusing for users trying to add models from the inference API.

Register model using:
curl -X POST "http://:/models?url=https://<s3_path>/densenet161.mar"
"status": "Model densenet161 registered"

Inference using:
curl -X POST http://:/predictions/densenet161 -T cutekit.jpeg
"code": 503,
"type": "ServiceUnavailableException",
"message": "No worker is available to serve request: densenet161"

Model details:
curl -X GET http://:/models/densenet161
"modelName": "densenet161",
"modelVersion": "1.0",
"modelUrl": "https:///densenet161.mar",
"runtime": "python",
"minWorkers": 0,
"maxWorkers": 0,
"batchSize": 1,
"maxBatchDelay": 100,
"loadedAtStartup": false,
"workers": []
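
Until the message is improved, the 503 can be avoided by asking for workers explicitly. As far as I understand the management API, registration accepts an `initial_workers` query parameter and a later `PUT` on the model accepts `min_worker`. The sketch below only builds the request URLs; the helper names are hypothetical, and the base address `http://localhost:8081` and the `.mar` URL in the usage note are assumptions standing in for the host, port, and S3 path elided above.

```python
from urllib.parse import quote, urlencode


def register_url(base, mar_url, initial_workers=1):
    """URL for POST /models that registers a model and starts workers for it."""
    query = urlencode({
        "url": mar_url,
        "initial_workers": initial_workers,
        "synchronous": "true",  # wait until the workers are up before returning
    })
    return f"{base}/models?{query}"


def scale_url(base, model_name, min_worker=1):
    """URL for PUT /models/{name} that raises the worker count of a registered model."""
    return f"{base}/models/{quote(model_name)}?{urlencode({'min_worker': min_worker})}"
```

For example, `register_url("http://localhost:8081", "https://example.com/densenet161.mar")` yields a URL to pass to `curl -X POST`, and `scale_url("http://localhost:8081", "densenet161")` yields one for `curl -X PUT`.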

docker build failing on line 27

Docker build failing on line 27.

Error details:

Step 9/24 : ADD serve serve
ADD failed: stat /var/lib/docker/tmp/docker-builder277367370/serve: no such file or directory

Undocumented and legacy endpoints

Calling curl -X OPTIONS http://localhost:8080 (not 8443 as the docs originally had it) gives an API description for inference that has a number of undocumented endpoints, including /predict (which it says is "A legacy predict entry point for each model") and /invoke (it is not clear what this is supposed to do if it's not prediction).

I'm not opposed to having /predict be an alias for /predictions, but in that case, we should probably have it described as an alias and not legacy cruft. Or we could just cut it. Similarly, /invoke and related endpoints should probably be documented or removed from the API description.

Java8 install steps using brew for macOS do not work anymore

The Java8 installation steps using brew for macOS are not working anymore. Please provide updated steps. It would be better to use OpenJDK instead for future-proofing.

The following worked for me on macOS Catalina (10.15.1):

brew tap AdoptOpenJDK/openjdk
brew cask install adoptopenjdk8

Verify using:
java -version

If you get a security error for installing from an unverified source, add an exception in the security settings under Settings > Security & Privacy > General > "Allow apps downloaded from".

NOTE: The TorchServe install failed when using the latest openjdk13 (gradle error); things worked for me only with jdk8. Please verify the supported version as well.

TorchServe fails to start multiple worker threads on multiple GPUs with large model.

On a c5.12xlarge instance, I was able to run 16 instances of the FairSeq English-to-German translation model, all simultaneously running translations. This model's weights take up about 2.5GB on disk (though its resident footprint in memory seems smaller).

Attempting a similar feat on a p3.8xlarge turned out to be impossible. I could get a single instance running, but if I attempted to get even 4 workers running, they crash repeatedly with OOMEs:

2020-02-26 02:46:34,454 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-02-26 02:46:34,454 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.6
2020-02-26 02:46:34,454 [DEBUG] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-fairseq_model_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-02-26 02:46:34,454 [INFO ] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9001
2020-02-26 02:46:34,455 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: /tmp/.ts.sock.9001.
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process die.
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/", line 163, in <module>
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/", line 141, in run_server
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/", line 105, in handle_connection
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service, result, code = self.load_model(msg)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/", line 83, in load_model
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service = model_loader.load(model_name, model_dir, handler, gpu, batch_size)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/", line 107, in load
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     entry_point(None, service.context)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/fa2ad7fda70376da33595966d0cf3c38702ea6d1/", line 120, in handle
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     raise e
2020-02-26 02:46:38,735 [INFO ] epollEventLoopGroup-4-15 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/fa2ad7fda70376da33595966d0cf3c38702ea6d1/", line 109, in handle
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     _service.initialize(context)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/fa2ad7fda70376da33595966d0cf3c38702ea6d1/", line 73, in initialize
2020-02-26 02:46:38,735 [DEBUG] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(
        at java.util.concurrent.ArrayBlockingQueue.poll(
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     state_dict = torch.load(model_pt_path, map_location=self.device)
2020-02-26 02:46:38,735 [WARN ] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: fairseq_model, error: Worker died.
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/", line 529, in load
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
2020-02-26 02:46:38,735 [DEBUG] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-fairseq_model_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/", line 702, in _legacy_load
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     result = unpickler.load()
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/", line 665, in persistent_load
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     deserialized_objects[root_key] = restore_location(obj, location)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 21 seconds.
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/", line 740, in restore_location
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return default_restore_location(storage, str(map_location))
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/", line 156, in default_restore_location
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     result = fn(storage, location)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/", line 136, in _cuda_deserialize
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return storage_type(obj.size())
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/cuda/", line 480, in _lazy_new
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)

On digging through the logs, it appears that it's attempting to start all workers on the same GPU. The following is the output of grep GPU ts_log.log:

2020-02-26 02:45:13,375 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 3.03 GiB already allocated; 19.12 MiB free; 3.03 GiB reserved in total by PyTorch)
2020-02-26 02:45:13,375 [INFO ] W-9000-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 3.16 GiB already allocated; 19.12 MiB free; 3.16 GiB reserved in total by PyTorch)
2020-02-26 02:45:13,383 [INFO ] W-9003-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 2.60 GiB already allocated; 19.12 MiB free; 2.60 GiB reserved in total by PyTorch)
2020-02-26 02:45:26,382 [INFO ] W-9003-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.75 GiB total capacity; 3.05 GiB already allocated; 19.12 MiB free; 3.05 GiB reserved in total by PyTorch)
2020-02-26 02:45:26,519 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 15.75 GiB total capacity; 2.47 GiB already allocated; 19.12 MiB free; 2.47 GiB reserved in total by PyTorch)
2020-02-26 02:45:38,899 [INFO ] W-9003-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 2.89 GiB already allocated; 7.12 MiB free; 2.89 GiB reserved in total by PyTorch)
2020-02-26 02:45:38,904 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 2.65 GiB already allocated; 7.12 MiB free; 2.65 GiB reserved in total by PyTorch)
2020-02-26 02:45:52,201 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.75 GiB total capacity; 3.05 GiB already allocated; 19.12 MiB free; 3.05 GiB reserved in total by PyTorch)
2020-02-26 02:45:59,593 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)
2020-02-26 02:46:08,962 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)
2020-02-26 02:46:21,358 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)

Note that multiple workers (W-9000, W-9001, W-9003) are shown, but only one GPU turns up (GPU 0). The p3.8xlarge has 4 GPUs.

I attempted to use arguments of the Management API, such as number_gpu=4, to fix this, but nothing worked. Same result every time.
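
For reference, the placement I expected is a simple round-robin of workers over the available devices. The sketch below illustrates that expectation only; it is not how TorchServe assigns devices, and `assign_gpus` is a hypothetical helper keyed by the worker port numbers seen in the logs.

```python
def assign_gpus(worker_ports, num_gpus):
    """Map each worker port to a GPU index, cycling through the GPUs in order.

    Returns None as the device for every worker when no GPU is available.
    """
    if num_gpus <= 0:
        return {port: None for port in worker_ports}
    return {port: index % num_gpus for index, port in enumerate(worker_ports)}
```

On a 4-GPU host, `assign_gpus([9000, 9001, 9002, 9003], 4)` places one worker on each of GPUs 0 through 3, rather than stacking all of them on GPU 0 as the logs above show.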

Can't load two versions of same model from config file

I attempted to start a model server with two versions of the same model specified in the config options. Both model archives had a model name of "d161", with one of them having version number 1.0 and the other version 1.1. The two files were named d161_1_0.mar and d161_1_1.mar and they were both in the model store directory. My contained the line:


The expected outcome is that I should have both model versions available.

The actual outcome is that only the later one is loaded, according to the output of curl

  "models": [
      "modelName": "d161",
      "modelUrl": "d161_1_1.mar"

