roboflow / inference

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.

Home Page: https://inference.roboflow.com

License: Other

Python 96.77% Makefile 0.10% Roff 0.19% Shell 0.40% JavaScript 0.01% HTML 1.44% TypeScript 0.77% CSS 0.04% Slim 0.20% Smarty 0.08%
computer-vision inference-api inference-server vit yolact yolov5 yolov7 yolov8 jetson tensorrt

inference's Introduction

👋 hello

Roboflow Inference is an open-source platform designed to simplify the deployment of computer vision models. It enables developers to perform object detection, classification, and instance segmentation, and to utilize foundation models like CLIP, Segment Anything, and YOLO-World, through a Python-native package, a self-hosted inference server, or a fully managed API.

Explore our enterprise options for advanced features like server deployment, device management, active learning, and commercial licenses for YOLOv5 and YOLOv8.

💻 install

The inference package requires Python>=3.8,<=3.11. Click here to learn more about running Inference inside Docker.

pip install inference
👉 additional considerations
  • hardware

    Enhance model performance in GPU-accelerated environments by installing CUDA-compatible dependencies.

    pip install inference-gpu
  • models

    The inference and inference-gpu packages install only the minimal shared dependencies. Install model-specific dependencies to ensure code compatibility and license compliance. Learn more about the models supported by Inference.

    pip install inference[yolo-world]

🔥 quickstart

Use the Inference SDK to run models locally with just a few lines of code. The image input can be a URL, a NumPy array (BGR), or a PIL image.

from inference import get_model

model = get_model(model_id="yolov8n-640")

results = model.infer("https://media.roboflow.com/inference/people-walking.jpg")
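The input can also be a local image. For example, a minimal sketch assuming OpenCV is installed (the file path is hypothetical):

import cv2
from inference import get_model

model = get_model(model_id="yolov8n-640")

# cv2.imread returns a BGR numpy array, which infer() accepts directly
image = cv2.imread("people-walking.jpg")
results = model.infer(image)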
👉 roboflow models

Set up your ROBOFLOW_API_KEY to access thousands of fine-tuned models shared by the Roboflow Universe community, as well as your own custom models. Navigate to the 🔑 keys section to learn more.

from inference import get_model

model = get_model(model_id="soccer-players-5fuqs/1")

results = model.infer(
    image="https://media.roboflow.com/inference/soccer.jpg",
    confidence=0.5,
    iou_threshold=0.5
)
👉 foundational models
  • CLIP Embeddings - generate text and image embeddings that you can use for zero-shot classification or assessing image similarity.

    from inference.models import Clip
    
    model = Clip()
    
    embeddings_text = model.embed_text("a football match")
    embeddings_image = model.embed_image("https://media.roboflow.com/inference/soccer.jpg")
  • Segment Anything - segment all objects visible in the image or only those associated with selected points or boxes.

    from inference.models import SegmentAnything
    
    model = SegmentAnything()
    
    result = model.segment_image("https://media.roboflow.com/inference/soccer.jpg")
  • YOLO-World - a near real-time zero-shot detector that enables the detection of arbitrary objects without training.

    from inference.models import YOLOWorld
    
    model = YOLOWorld(model_id="yolo_world/l")
    
    result = model.infer(
        image="https://media.roboflow.com/inference/dog.jpeg",
        text=["person", "backpack", "dog", "eye", "nose", "ear", "tongue"],
        confidence=0.03
    )

📟 inference server

  • deploy server

    The inference server is distributed via Docker. Behind the scenes, inference will download and run the image that is appropriate for your hardware. Here, you can learn more about the supported images.

    inference server start
  • run client

    Consume inference server predictions using the HTTP client available in the Inference SDK.

    from inference_sdk import InferenceHTTPClient
    
    client = InferenceHTTPClient(
        api_url="http://localhost:9001",
        api_key=<ROBOFLOW_API_KEY>
    )
    with client.use_model(model_id="soccer-players-5fuqs/1"):
        predictions = client.infer("https://media.roboflow.com/inference/soccer.jpg")

    If you're using the hosted API, change the local API URL to https://detect.roboflow.com. Accessing the hosted inference server and/or using any of the fine-tuned models requires a ROBOFLOW_API_KEY. For further information, visit the 🔑 keys section.

🎥 inference pipeline

The inference pipeline is an efficient method for processing static video files and streams. Select a model, define the video source, and set a callback action. You can choose from predefined callbacks that allow you to display results on the screen or save them to a file.

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

pipeline = InferencePipeline.init(
    model_id="yolov8x-1280",
    video_reference="https://media.roboflow.com/inference/people-walking.mp4",
    on_prediction=render_boxes
)

pipeline.start()
pipeline.join()
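To save annotated output to a file instead of rendering it on screen, here is a sketch using the VideoFileSink that appears elsewhere in this repo:

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import VideoFileSink

# write annotated frames to output.avi instead of displaying them
video_sink = VideoFileSink.init(video_file_name="output.avi")

pipeline = InferencePipeline.init(
    model_id="yolov8x-1280",
    video_reference="https://media.roboflow.com/inference/people-walking.mp4",
    on_prediction=video_sink.on_prediction,
)
pipeline.start()
pipeline.join()
video_sink.release()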

🔑 keys

Inference enables the deployment of a wide range of pre-trained and foundational models without an API key. To access thousands of fine-tuned models shared by the Roboflow Universe community, configure your API key.

export ROBOFLOW_API_KEY=<YOUR_API_KEY>
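The key can also be passed directly in code, as the examples elsewhere in this repo do:

from inference import get_model

# equivalent to exporting ROBOFLOW_API_KEY before running
model = get_model(model_id="soccer-players-5fuqs/1", api_key="<YOUR_API_KEY>")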

📚 documentation

Visit our documentation to explore comprehensive guides, detailed API references, and a wide array of tutorials designed to help you harness the full potential of the Inference package.

© license

The Roboflow Inference code is distributed under the Apache 2.0 license. However, each supported model is subject to its own license. Detailed information on each model's license can be found here.

๐Ÿ† contribution

We would love your input to improve Roboflow Inference! Please see our contributing guide to get started. Thank you to all of our contributors! 🙏


inference's People

Contributors

artemwheat, artyaltanzaya, bigbitbus, capjamesg, chandlersupple, dependabot[bot], github-actions[bot], grzegorz-roboflow, hansent, hvaria, jchens, josephofiowa, kresetar, linasko, onuralpszr, pacificdou, paulguerrie, pawelpeczek-roboflow, probicheaux, robiscoding, ryanjball, sberan, shingomatsuura, skalskip, skylargivens, solomonlake, stellasphere, tonylampada, yeldarby

inference's Issues

Include `class_id` and `class_list` in inference response

Adding the class_id value to the inference response would help simplify the API between supervision and inference. sv.Detections requires an int value representing class_id. Currently, we force users to provide class_list as one of the arguments of the sv.Detections.from_roboflow method to perform the mapping. We could drop that requirement if class_id were provided.

sv.Detections.from_roboflow is the only connector that requires an additional class_list argument. All other connectors (from_detectron2, from_mmdetection, from_paddledet, from_sam, from_transformers, from_ultralytics, from_yolo_nas, from_yolov5, and from_yolov8) can retrieve class_id from the inference result object. It is a common standard in computer vision.

Before

>>> import supervision as sv

>>> roboflow_result = {
...     "predictions": [
...         {
...             "x": 0.5,
...             "y": 0.5,
...             "width": 0.2,
...             "height": 0.3,
...             "class": "person",
...             "confidence": 0.9
...         },
...         # ... more predictions ...
...     ]
... }
>>> class_list = ["person", "car", "dog"]

>>> detections = sv.Detections.from_roboflow(roboflow_result, class_list)

After

>>> import supervision as sv

>>> roboflow_result = {
...     "predictions": [
...         {
...             "x": 0.5,
...             "y": 0.5,
...             "width": 0.2,
...             "height": 0.3,
...             "class": "person",
...             "class_id": 0,
...             "confidence": 0.9
...         },
...         # ... more predictions ...
...     ]
... }

>>> detections = sv.Detections.from_roboflow(roboflow_result)

🙋🏻 I would love to work on the implementation of this feature.

GPG keys not available

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I am trying to build a Docker image locally from the Inference Server repository and am facing an error during the build process.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
Get:2 http://deb.debian.org/debian bookworm-updates InRelease [52.1 kB]
Get:3 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Err:1 http://deb.debian.org/debian bookworm InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0E98404D386FA1D9 NO_PUBKEY 6ED0E7B82643E131 NO_PUBKEY F8D2585B8783D481
Err:2 http://deb.debian.org/debian bookworm-updates InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0E98404D386FA1D9 NO_PUBKEY 6ED0E7B82643E131
Err:3 http://deb.debian.org/debian-security bookworm-security InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 54404762BBB6E853 NO_PUBKEY BDE6D2B9216EC7A8
Reading package lists...
W: GPG error: http://deb.debian.org/debian bookworm InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0E98404D386FA1D9 NO_PUBKEY 6ED0E7B82643E131 NO_PUBKEY F8D2585B8783D481
E: The repository 'http://deb.debian.org/debian bookworm InRelease' is not signed.
W: GPG error: http://deb.debian.org/debian bookworm-updates InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0E98404D386FA1D9 NO_PUBKEY 6ED0E7B82643E131
E: The repository 'http://deb.debian.org/debian bookworm-updates InRelease' is not signed.
W: GPG error: http://deb.debian.org/debian-security bookworm-security InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 54404762BBB6E853 NO_PUBKEY BDE6D2B9216EC7A8
E: The repository 'http://deb.debian.org/debian-security bookworm-security InRelease' is not signed.
E: Problem executing scripts APT::Update::Post-Invoke 'rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true'
E: Sub-process returned an error code
The command '/bin/sh -c apt update -y && apt install -y     ffmpeg     libxext6     libopencv-dev     uvicorn     python3-pip     git     libgdal-dev     && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100

Environment

  • OS: Fedora
  • Docker: version 1.13.1, build 7d71120/1.13.1

Minimal Reproducible Example

No response

Additional

I believe the issue will be solved if we use a Python bullseye base image, or pin a particular recent Debian release.
It's just a hunch.
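A minimal sketch of that hunch, assuming the failing stage builds from a Debian bookworm Python base image (untested; the exact base image in the repo's Dockerfile may differ):

# hypothetical change: swap the assumed bookworm base for bullseye
FROM python:3.9-bullseye

RUN apt update -y && apt install -y \
    ffmpeg libxext6 libopencv-dev uvicorn python3-pip git libgdal-dev \
    && rm -rf /var/lib/apt/lists/*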

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

inference.core.exceptions.ModelArtefactError: Unable to load ONNX session. Cause: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/cache\chinese-calligraphy-recognition-sl0eb/2\best.onnx failed:Protobuf parsing failed.

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

I apologize for any confusion my explanation may cause. I'm a beginner and I need to use roboflow inference to complete my project. There's a problem that needs to be solved.

Here's the situation:
When using the get_model() function to get the model, an error occurred, as follows:

  1. When getting the model (model_id: kitchenfire/1), everything went smoothly;
  2. When getting the model (model_id: chinese-calligraphy-styles/1) and the model (model_id: chinese-calligraphy-recognition-sl0eb/2), the following error message was displayed in the python terminal:
C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:65: UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names.Available providers: 'TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider'
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\roboflow.py", line 713, in initialize_model
    self.onnx_session = onnxruntime.InferenceSession(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 383, in __init__   
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 424, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/cache\chinese-calligraphy-recognition-sl0eb/2\best.onnx failed:Protobuf parsing failed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\OTHERs\Windows Shortcut Folder\Documents\Code Projects\VS Code Projects\CCRS\test.py", line 4, in <module>
    model = inference.get_model("chinese-calligraphy-recognition-sl0eb/2", api_key=__ApiKey)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\models\utils.py", line 195, in get_model
    return ROBOFLOW_MODEL_TYPES[(task, model)](model_id, api_key=api_key, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\models\vit\vit_classification.py", line 26, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\classification_base.py", line 40, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\roboflow.py", line 612, in __init__
    self.initialize_model()
  File "C:\Users\Sui\anaconda3\envs\CCRS_ps-pms-if\Lib\site-packages\inference\core\models\roboflow.py", line 720, in initialize_model
    raise ModelArtefactError(
inference.core.exceptions.ModelArtefactError: Unable to load ONNX session. Cause: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/cache\chinese-calligraphy-recognition-sl0eb/2\best.onnx failed:Protobuf parsing failed.

Additional

No response

Inference on video is extremely slow

Hi,
I ran the code as mentioned in the official docs (https://inference.roboflow.com/quickstart/run_model_on_rtsp_webcam/#installation), but the output seems to be at a very slow frame rate.
Any advice?

# Import the InferencePipeline object
from inference import InferencePipeline
# Import the built in render_boxes sink for visualizing results
from inference.core.interfaces.stream.sinks import render_boxes

# initialize a pipeline object
pipeline = InferencePipeline.init(
    model_id="rock-paper-scissors-sxsw/11", # Roboflow model to use
    video_reference=0, # Path to video, device id (int, usually 0 for built in webcams), or RTSP stream url
    on_prediction=render_boxes, # Function to run after each prediction
)
pipeline.start()
pipeline.join()
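One thing to try (a sketch, not an official fix): cap the pipeline's frame rate and/or use a smaller model. This assumes your installed version of InferencePipeline.init accepts a max_fps argument; verify against your version.

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

pipeline = InferencePipeline.init(
    model_id="rock-paper-scissors-sxsw/11",
    video_reference=0,
    on_prediction=render_boxes,
    max_fps=10,  # assumed parameter: drops frames instead of falling behind
)
pipeline.start()
pipeline.join()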

Update description for workflows steps

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Description

Update the descriptions of workflow steps: start each sentence with a verb, noting the action the block will take.

Use case

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Batch Inference

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

Is batch inference possible for models that support it? I am currently using Triton Server directly, which supports arrays of inputs, but I'd like to try Inference Server, which seems to be a level of abstraction higher (especially once you support custom, non-Roboflow weights). I got significant performance improvements from inferring in batches (yolov8) vs. inferring one image at a time.
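A sketch of what batched usage might look like, assuming the Python package's model wrappers accept a list of images (verify against your installed version; the file paths are hypothetical):

from inference import get_model

model = get_model(model_id="yolov8n-640")

# passing a list is assumed to run the images through the model as a batch
results = model.infer(["image_1.jpg", "image_2.jpg", "image_3.jpg"])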

Additional

No response

orjson conversion is faulty

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

JSON serialisation introduced in #166 is faulty when used against requests to /v1 endpoints that require visualisation, due to the presence of the field visualization with binary content in the response.

Traceback (most recent call last):
  File "/app/inference/core/interfaces/http/http_api.py", line 133, in wrapped_route
    return await route(*args, **kwargs)
  File "/app/inference/core/interfaces/http/http_api.py", line 487, in infer_object_detection
    return await process_inference_request(
  File "/app/inference/core/interfaces/http/http_api.py", line 287, in process_inference_request
    return orjson_response(resp)
  File "/app/inference/core/interfaces/http/http_api.py", line 115, in orjson_response
    return ORJSONResponse(content=content)
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 196, in __init__
    super().__init__(content, status_code, headers, media_type, background)
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 55, in __init__
    self.body = self.render(content)
  File "/usr/local/lib/python3.9/site-packages/fastapi/responses.py", line 34, in render
    return orjson.dumps(
TypeError: Type is not JSON serializable: bytes
INFO:     192.168.65.1:19999 - "POST /infer/object_detection HTTP/1.1" 500 Internal Server Error

Environment

No response

Minimal Reproducible Example

run HTTP container from main

from inference_sdk import InferenceHTTPClient, InferenceConfiguration, VisualisationResponseFormat

LOCALHOST_CLIENT = InferenceHTTPClient(
    api_url="http://127.0.0.1:9001",
    api_key=API_KEYS["MY_PRIVATE_WORKSPACE_API_KEY"],
)
config = InferenceConfiguration(
    visualize_predictions=True,
)

with LOCALHOST_CLIENT.use_configuration(config):
    r = LOCALHOST_CLIENT.infer(
        DATASET["asl"][0],
        model_id="barbel-detection/2",
    )

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Hello, how can I run on a local computer to obtain the predicted labels and the detection bbox coordinates?

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

Hello, how can I run on a local computer to obtain the predicted labels and the detection bbox coordinates?
The following is my code, please take a look at it for me, thank you very much (I am running on a local computer and do not use an API KEY):

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import VideoFileSink

video_sink = VideoFileSink.init(video_file_name="output.avi")

pipeline = InferencePipeline.init_with_yolo_world(
    video_reference="./video_1.mp4",
    classes=["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
             "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
             "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
             "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
             "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
             "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "couch",
             "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard",
             "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
             "teddy bear", "hair drier", "toothbrush", "faucet", "Trash can", "chair", "Refrigerator"],
    model_size="s",
    on_prediction=video_sink.on_prediction,
)
# start the pipeline
pipeline.start()
# wait for the pipeline to finish
pipeline.join()

video_sink.release()
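One possible approach (a sketch, not an official recipe): swap the VideoFileSink for a custom on_prediction callback, which receives the raw prediction dict alongside each frame. The field names below assume the standard Roboflow detection response format.

from inference import InferencePipeline

def print_predictions(predictions, video_frame):
    # assumed format: {"predictions": [{"x": ..., "y": ..., "width": ..., "height": ..., "class": ..., "confidence": ...}]}
    for p in predictions.get("predictions", []):
        print(p["class"], p["confidence"], p["x"], p["y"], p["width"], p["height"])

pipeline = InferencePipeline.init_with_yolo_world(
    video_reference="./video_1.mp4",
    classes=["person", "car", "dog"],
    model_size="s",
    on_prediction=print_predictions,
)
pipeline.start()
pipeline.join()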

Additional

No response

How can I use custom weights with Yolo-World Model?

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

The model performance of the Yolo World Hugging Face model (https://huggingface.co/spaces/stevengrove/YOLO-World) is better for my purpose than the standard inference one "yolo_world/v2-x" for example.

Because of my lack of domain knowledge regarding these topics, I can't make the HF model work on my Mac (not an M processor), so I would like to be able to use the model weights with the abstraction from the Roboflow inference library.

Is there any way to use the author's provided weights together with the inference library?

From the author's instruction on how to run the model on an image one needs the configuration file and the weights: https://github.com/AILab-CVC/YOLO-World

The conf file I want to use and the weights are available here:
https://huggingface.co/spaces/stevengrove/YOLO-World/blob/main/configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py
https://huggingface.co/spaces/stevengrove/YOLO-World/blob/main/yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth

Thank you in advance

Additional

No response

Inference Server on NVIDIA Jetson with JP5.1.1 and TensorRT support

Hello,

I have a Jetson Orin NX 16GB running JetPack 5.1.1 and want to run Roboflow open-source inference server on the device.

I am able to successfully do this using the docker command which utilizes TensorRT:

docker run --privileged --net=host --runtime=nvidia roboflow/roboflow-inference-server-trt-jetson-5.1.1:latest

But, how can I also utilize TensorRT with the open-source inference server running on the Jetson device?

Here I have installed the inference server via pip install inference-gpu on the Jetson.

Best Regards,
Lakshantha
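(A hedged note on the above: if the pip-installed package selects ONNX Runtime providers via the ONNXRUNTIME_EXECUTION_PROVIDERS environment variable, TensorRT could be requested as below. Treat the variable name and format as assumptions to verify against your installed version.)

export ONNXRUNTIME_EXECUTION_PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"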

Add error status code to benchmark output

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Description

Adding status code to benchmark output would help to distinguish between API gateway rate limiting and actual errors (i.e. 500)

Use case

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Error Running cogvlm model on Self-Hosted GPU Server with Roboflow Inference (Transformer Version)

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I'm encountering an issue while attempting to deploy the cogvlm model on my own GPU server using Roboflow inference code. The server setup seems to be correct, but when I try to run the model, I run into the following error:

  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
    ):
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3926, in _load_pretrained_model
    model._is_quantized_training_enabled = True
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 802, in _load_state_dict_into_meta_model
    state_dict_index = offload_weight(param, param_name, state_dict_folder, state_dict_index)
  File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 124, in check_quantized_param
KeyError: 'inv_freq'
INFO:     172.17.0.1:45116 - "POST /llm/cogvlm HTTP/1.1" 500 Internal Server Error

Upon further investigation and based on this GitHub issue (THUDM/CogVLM#396), it's recommended to downgrade the transformers library to version 4.37 due to compatibility issues. However, the current deployment is using version 4.38. Could you please confirm if the transformers version could be the source of this issue and if downgrading would be appropriate? Any other insights or suggestions would also be greatly appreciated.
Thank you!
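If the linked CogVLM thread is right, a possible workaround (untested here) is to pin transformers below 4.38 in the server environment:

pip install "transformers>=4.37,<4.38"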

Environment

inference 0.9.20
inference-cli 0.9.20
inference-gpu 0.9.20
inference-sdk 0.9.20

x86-gpu(rtx3090)

Minimal Reproducible Example

cog-vlm-client$ python script.py --image "data/tire.jpg" --prompt "read serial number from tire"

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Inference SDK missing requirement when running tests

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

_ ERROR collecting tests/inference_sdk/unit_tests/http/test_client.py _
ImportError while importing test module '/home/perovskite/Development/professional/open-source/roboflow/inference-dev/inference/tests/inference_sdk/unit_tests/http/test_client.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../../mambaforge/envs/inference-dev/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
inference/tests/inference_sdk/unit_tests/http/test_client.py:11: in <module>
    from requests_mock.mocker import Mocker
E   ModuleNotFoundError: No module named 'requests_mock'

_ ERROR collecting tests/inference_sdk/unit_tests/http/utils/test_executors.py _
ImportError while importing test module '/home/perovskite/Development/professional/open-source/roboflow/inference-dev/inference/tests/inference_sdk/unit_tests/http/utils/test_executors.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../../mambaforge/envs/inference-dev/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
inference/tests/inference_sdk/unit_tests/http/utils/test_executors.py:9: in <module>
    from requests_mock import Mocker
E   ModuleNotFoundError: No module named 'requests_mock'

Environment

  • Inference: 0.9.14rc2 (dev install from main)
  • OS: arch linux
  • Device: macbook pro
  • Python: 3.11.8
  • mamba virtual env

Minimal Reproducible Example

I am using a mamba environment with the provided minimal environment file below in the code section. Any virtual environment will work, I'm using mamba as it is one of the tools I use most often.

I wanted to see the state of tests for each of the inference packages in the repo, so I was selectively installing the requirements using pip in my mamba environment with the following command.

# SDK requirements
pip install -r requirements/requirements.sdk.http.txt

After installation completed, I ran tests using the following command. I created an HTML coverage output so I could inspect tests that I might be able to contribute to.

pytest --cov-report html --cov=inference_sdk/ tests/inference_sdk/

The error shown above was thrown, indicating that requests-mock was missing in the environment. I'm not sure if each requirements file is supposed to be self-contained for running the different packages in the repo, but I did make this assumption and I couldn't easily see a spot in the documentation where it said this was not the case. So, I may have made a terrible assumption...

Nonetheless, I inspected the requirements.sdk.http.txt file and saw that requests-mock was not in the list. After adding it and rerunning the tests, the tests completed successfully. If each requirements file is supposed to be self-contained for a specific package in the repo, then requests-mock is missing as a requirement. If not, then I need to read the documentation more closely to figure out the correct order of operations for installing inference in a development environment.
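In the meantime, the missing dependency can be installed manually (the permanent fix would be adding requests-mock to requirements/requirements.sdk.http.txt):

pip install requests-mock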

code

# environment.yaml
name: inference-dev
channels:
  - conda-forge
dependencies:
  - python >3.8,<3.12

  # Dependencies

  # Package managers
  - pip

  # Development dependencies
  - black
  - flake8
  - isort
  - jupyterlab
  - pip:
    - pyre-check

  # Debugging dependencies
  - ipdb
  - pytest
  - pytest-asyncio
  - pytest-cov

  # Documents dependencies
  - mkdocs
  - mkdocstrings
  - mkdocstrings-python

Additional

During my inspection of the requirements.sdk.http.txt file I noticed that requests is listed twice. I figured this out by sorting the requirements.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

H100 Support

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Description

Currently the version of CUDA we use in our GPU base images doesn't support the sm_90 compute capability, which the H100 needs. I think it needs at least CUDA 11.8, and our base image is 11.7 (FROM nvcr.io/nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04).

Sidenote: Latest is 12.3; may be good to try to get all the way up there if possible while maintaining backwards compatibility to pull in bug fixes & performance improvements.
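A minimal sketch of the proposed bump (assuming the 11.8 runtime tag is available on NGC; untested):

# before
FROM nvcr.io/nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04

# after (hypothetical; CUDA 11.8 is the first release with sm_90/H100 support)
FROM nvcr.io/nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04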

Use case

People with H100s.

Additional

NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

'TypeError: quote_from_bytes() expected' bytes when loading model using get_roboflow_model

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I just followed the steps in https://github.com/roboflow/inference#single-image-inference

But it raised an error on this line:
----> 1 model = get_roboflow_model(

Environment

Python 3.10.11
inference==0.9.5

Minimal Reproducible Example

In [1]: from inference.models.utils import get_roboflow_model
   ...: 

In [2]: model = get_roboflow_model(
   ...:     model_id="soccer-players-5fuqs/1"
   ...: )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = get_roboflow_model(
      2     model_id="soccer-players-5fuqs/1"
      3 )

File /opt/conda/lib/python3.10/site-packages/inference/models/utils.py:164, in get_roboflow_model(model_id, api_key, **kwargs)
    163 def get_roboflow_model(model_id, api_key=API_KEY, **kwargs):
--> 164     task, model = get_model_type(model_id, api_key=api_key)
    165     return ROBOFLOW_MODEL_TYPES[(task, model)](model_id, api_key=api_key, **kwargs)

File /opt/conda/lib/python3.10/site-packages/inference/core/registries/roboflow.py:80, in get_model_type(model_id, api_key)
     78 if cached_metadata is not None:
     79     return cached_metadata[0], cached_metadata[1]
---> 80 workspace_id = get_roboflow_workspace(api_key=api_key)
     81 project_task_type = get_roboflow_dataset_type(
     82     api_key=api_key, workspace_id=workspace_id, dataset_id=dataset_id
     83 )
     84 if version_id == STUB_VERSION_ID:

File /opt/conda/lib/python3.10/site-packages/inference/core/roboflow_api.py:74, in wrap_roboflow_api_errors.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     72 def wrapper(*args, **kwargs) -> Any:
     73     try:
---> 74         return function(*args, **kwargs)
     75     except (requests.exceptions.ConnectionError, ConnectionError) as error:
     76         logger.error(f"Could not connect to Roboflow API. Error: {error}")

File /opt/conda/lib/python3.10/site-packages/inference/core/roboflow_api.py:110, in get_roboflow_workspace(api_key)
    108 @wrap_roboflow_api_errors()
    109 def get_roboflow_workspace(api_key: str) -> WorkspaceID:
--> 110     api_url = _add_params_to_url(
    111         url=API_BASE_URL,
    112         params=[("api_key", api_key), ("nocache", "true")],
    113     )
    114     api_key_info = _get_from_url(url=api_url)
    115     workspace_id = api_key_info.get("workspace")

File /opt/conda/lib/python3.10/site-packages/inference/core/roboflow_api.py:389, in _add_params_to_url(url, params)
    387 if len(params) == 0:
    388     return url
--> 389 params_chunks = [
    390     f"{name}={urllib.parse.quote_plus(value)}" for name, value in params
    391 ]
    392 parameters_string = "&".join(params_chunks)
    393 return f"{url}?{parameters_string}"

File /opt/conda/lib/python3.10/site-packages/inference/core/roboflow_api.py:390, in <listcomp>(.0)
    387 if len(params) == 0:
    388     return url
    389 params_chunks = [
--> 390     f"{name}={urllib.parse.quote_plus(value)}" for name, value in params
    391 ]
    392 parameters_string = "&".join(params_chunks)
    393 return f"{url}?{parameters_string}"

File /opt/conda/lib/python3.10/urllib/parse.py:885, in quote_plus(string, safe, encoding, errors)
    883 else:
    884     space = b' '
--> 885 string = quote(string, safe + space, encoding, errors)
    886 return string.replace(' ', '+')

File /opt/conda/lib/python3.10/urllib/parse.py:869, in quote(string, safe, encoding, errors)
    867     if errors is not None:
    868         raise TypeError("quote() doesn't support 'errors' for bytes")
--> 869 return quote_from_bytes(string, safe)

File /opt/conda/lib/python3.10/urllib/parse.py:894, in quote_from_bytes(bs, safe)
    889 """Like quote(), but accepts a bytes object rather than a str, and does
    890 not perform string-to-bytes encoding.  It always returns an ASCII string.
    891 quote_from_bytes(b'abc def\x3f') -> 'abc%20def%3f'
    892 """
    893 if not isinstance(bs, (bytes, bytearray)):
--> 894     raise TypeError("quote_from_bytes() expected bytes")
    895 if not bs:
    896     return ''

TypeError: quote_from_bytes() expected bytes

In [3]: 
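Judging by the traceback (a hedged diagnosis), api_key ends up as None because no key was configured, and urllib.parse.quote_plus(None) raises. Supplying a key explicitly should avoid this:

from inference.models.utils import get_roboflow_model

# passing the key directly (or exporting ROBOFLOW_API_KEY) avoids quote_plus(None)
model = get_roboflow_model(
    model_id="soccer-players-5fuqs/1",
    api_key="<YOUR_API_KEY>",
)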

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

CLI Should Work Without Docker

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I installed inference-gpu on a fresh VM (inside a docker container). I want to use the CLI to run the benchmark. But it's complaining that the docker daemon is not running.

inference_cli.lib.exceptions.DockerConnectionErrorException: Error connecting to Docker daemon. Is docker installed and running? See https://www.docker.com/get-started/ for installation instructions.

I understand I'd need that for commands like inference server start but I shouldn't need it to do other things with the CLI.

Environment

[email protected]:/$ pip freeze | grep inference
inference-cli==0.9.15
inference-gpu==0.9.15
inference-sdk==0.9.15
[email protected]:/$ 

pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel docker with after running pip install inference-gpu


Minimal Reproducible Example

pip install inference-gpu
inference

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

NotImplementedError

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

When I was running this code (main/examples/sam-client/app.py), this error occurred. What could be the cause? I set --image_path to "./cat.jpg".
Traceback (most recent call last):
File "/home/z/deeplearning/roboflow/test3.py", line 42, in
inference_results = model.infer(image_path)
File "/home/z/anaconda3/envs/inference1/lib/python3.10/site-packages/inference/core/models/base.py", line 23, in infer
preproc_image, returned_metadata = self.preprocess(image, **kwargs)
File "/home/z/anaconda3/envs/inference1/lib/python3.10/site-packages/inference/core/models/base.py", line 35, in preprocess
raise NotImplementedError
NotImplementedError
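A guess at the cause (hedged, based on the Segment Anything example earlier in this README): the SAM wrapper does not implement the generic infer()/preprocess() path, so model.infer(image_path) falls through to the base class stub. Its dedicated method may work instead:

from inference.models import SegmentAnything

model = SegmentAnything()
# segment_image is the entry point shown in the README for SAM
result = model.segment_image("./cat.jpg")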

Additional

No response

Can add, delete, update stream realtime for only one pipeline has been run before?

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

Hello everyone,
I'm very interested in Roboflow Inference. I'm wondering: with the current InferencePipeline, is it possible to add, edit, and delete the inputs of a single pipeline?
For example, the AI model has a max_batch_size of 8.
Initially I initialize:
pipeline = InferencePipeline(...) with only one RTSP stream
pipeline.start()

Can I add an RTSP stream to the pipeline running above? For example:
pipeline.add_stream("rtsp://...")
Would adding a stream affect the previously running threads?

Can anyone tell me whether InferencePipeline currently has the ability to do that?
And in your opinion, is add_stream on a running pipeline better than having multiple copies of the pipeline running at the same time?
Thanks!

Additional

No response

Same structure for API and library results

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Description

The API (HTTP) inference and the library (pip install) inference models have different result structures.
It would be neat if they both returned results with the structure of the API inference. This way the approaches are interchangeable and, more importantly, the results of the library inference can be used with supervision, like the API inference results.

Use case

Let's take the quickstart example as usecase.
If you run the examples (ready to run scripts further down below), the following results are returned.

The API inference result:

{
   "time":0.2851218069990864,
   "image":{
      "width":398,
      "height":224
   },
   "predictions":[
      {
         "x":5.5,
         "y":152.0,
         "width":11.0,
         "height":30.0,
         "confidence":0.9074968099594116,
         "class":"player",
         "class_id":1
      },
      {
         "x":145.0,
         "y":96.0,
         "width":14.0,
         "height":24.0,
         "confidence":0.8891444206237793,
         "class":"player",
         "class_id":1
      },
      ...
   ]
}

The library inference result:

[[
[0.0, 137.0, 11.0, 167.0, 0.9075440764427185, 0.8860160708427429, 1.0], 
[138.0, 84.0, 152.0, 108.0, 0.8891727924346924, 0.9700851440429688, 1.0], 
[17.0, 77.0, 33.0, 102.0, 0.8874990940093994, 0.9652542471885681, 1.0], 
[305.0, 162.0, 321.0, 194.0, 0.8796935081481934, 0.9643577337265015, 1.0], 
...
]]

The API result can be easily processed with supervision, while the library result needs different logic.
For a bounding box the logic is fairly simple, but for polygons (instance segmentation) not necessarily. Also, it would be a lot more convenient to have the library results integrated with supervision as well.

Please let me know if the library results can be converted into a supervision-compatible format. I just couldn't find anything. From the documentation, it was not clear to me how to reformat the instance segmentation results for usage with supervision.

Thanks for taking the step of open-sourcing 'inference'!
If I can help with the implementation, I'm happy to, but I might need some guidance/feedback.

Additional

Examples ready to run

API Example

import requests
import cv2
import supervision as sv
from roboflow import Roboflow
import urllib
import numpy as np

ROBOFLOW_API_KEY = "YOUR_API_KEY"

# Roboflow quickstart example

dataset_id = "soccer-players-5fuqs"
version_id = "1"
image_url = (
    "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
)

api_key = ROBOFLOW_API_KEY
confidence = 0.5

url = f"http://localhost:9001/{dataset_id}/{version_id}"

params = {
    "api_key": api_key,
    "confidence": confidence,
    "image": image_url,
}

res = requests.post(url, params=params)
print(res.json())

# Roboflow visualization with supervision

req = urllib.request.urlopen(image_url)
arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
img = cv2.imdecode(arr, -1)

rf = Roboflow(api_key=api_key)
project = rf.workspace().project("soccer-players-5fuqs")
model = project.version(1).model
annotator = sv.BoxAnnotator()

class_list = ["player", "referee", "football"]
detections = sv.Detections.from_roboflow(res.json(), class_list)
labels = [
    f"{class_list[class_id]} {confidence_value:0.2f}"
    for _, _, confidence_value, class_id, _ in detections
]
annotated_frame = annotator.annotate(
    scene=img.copy(), detections=detections, labels=labels
)
cv2.imwrite("frame.jpg", img)
cv2.imwrite("annotated_frame.jpg", annotated_frame)

Library Example

from inference.models.utils import get_roboflow_model

ROBOFLOW_API_KEY = "YOUR_API_KEY"

# Roboflow quickstart example

model = get_roboflow_model(
    model_id="soccer-players-5fuqs/1",
    # Replace ROBOFLOW_API_KEY with your Roboflow API Key
    api_key=ROBOFLOW_API_KEY,
)

results = model.infer(
    image="https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg",
    confidence=0.5,
    iou_threshold=0.5,
)

print(results)

# Roboflow visualization with supervision

# Here a custom logic will be needed for visualizing the results.
# Bboxes can be done fairly easy and the format can be inferred from the result itself, but not necessarily for polygons / instance segmentation
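# A rough sketch of that custom logic for boxes, assuming (from the sample
# output above) that each raw row is
# [x_min, y_min, x_max, y_max, confidence, class_confidence, class_id]:

import numpy as np
import supervision as sv

raw = np.array(results[0])
detections = sv.Detections(
    xyxy=raw[:, :4],
    confidence=raw[:, 4],
    class_id=raw[:, 6].astype(int),
)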

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

How can I get predicted label and the coordinates of the detection bbox ?

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

I referenced this blog to build and run the Docker image on a Jetson. Then I referenced this repo to implement webcam inference. It runs successfully.
But the requests.post() call simply returns an image with the bboxes drawn on it. How can I get the predicted labels and the coordinates of the detection bboxes?
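One option (a sketch; the model ID is a placeholder): query the server with the Inference SDK's HTTP client, which returns the predictions as a dict instead of a rendered image.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<ROBOFLOW_API_KEY>",
)
result = client.infer("frame.jpg", model_id="<your-model>/1")
for p in result["predictions"]:
    print(p["class"], p["x"], p["y"], p["width"], p["height"], p["confidence"])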

Additional

No response

Poetry-installing `inference` with `roboflow` takes 26 - 38 minutes.

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

Starting with an empty project, poetry, python 3.10 environment, I ran:

poetry init -n
poetry shell
poetry add notebook
poetry add roboflow
poetry add inference

This was enough to make dependency resolution last 2319.0s.

A subsequent run of poetry add tqdm took 213.2s, after which I got "No dependencies to install or update".

Adding ipywidgets took another 220.5s.


Whether it's an issue with inference, poetry or roboflow or their dependencies, the end result is a very miserable dev experience.

Environment

  • Inference 0.9.14

  • Roboflow 1.1.20

  • Ubuntu 22.04.4 LTS

  • Alienware 15 r3

  • Python 3.10.12

  • Pip 23.3.2

  • Poetry 1.3.1

Minimal Reproducible Example

poetry init -n
poetry shell
poetry add roboflow
poetry add inference
# Took 1578.6s

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Run Multiple Models all together.

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

Hi, how can I run multiple models together in the following code? The models are public models on Roboflow, not created by me.
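One simple pattern (a sketch; the model IDs are placeholders): load each model separately and run them on the same input.

from inference import get_model

model_a = get_model(model_id="<public-model-a>/1")
model_b = get_model(model_id="<public-model-b>/1")

image = "https://media.roboflow.com/inference/soccer.jpg"
results_a = model_a.infer(image)
results_b = model_b.infer(image)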

Additional

No response

yolov8-cls outputs differ

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I could not reproduce the ultralytics behaviour with the exported ONNX model. It seems that treating the model as multi-class leads to applying softmax to already-softmaxed predictions, and I am getting mispredictions even if this additional softmax is removed.
Tested against this image: https://3.bp.blogspot.com/-W__wiaHUjwI/Vt3Grd8df0I/AAAAAAAAA78/7xqUNj8ujtY/s1600/image02.png

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

inference --version

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Description

Would be great if inference --version told you what version you had installed.

Use case

I'm in a remote env and want to know what I'm working with.

Additional

Currently:

[email protected]:~$ inference --version
Usage: inference [OPTIONS] COMMAND [ARGS]...
Try 'inference --help' for help.
╭─ Error ─────────────────────────────────────────────────────────────────────╮
│ No such option: --version                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯
[email protected]:~$
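In the meantime, the installed versions can be read from pip, as other reports in this tracker do:

pip freeze | grep inference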

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Video and webcam inference

Hello,

I want to know how to use the open-source inference server to inference on video inputs and webcam inputs.

I saw there is a repo for video inference here:
https://github.com/roboflow/inference-client

However, that repo is only suitable if you are running the inference server via Docker.

So how can we do video and webcam inputs after installing the inference server via pip?
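A hedged pointer: recent versions of the pip package ship InferencePipeline (see the 🎥 inference pipeline section above), which handles video files, RTSP streams, and webcams without Docker:

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

# video_reference=0 selects the default webcam; a file path or RTSP URL also works
pipeline = InferencePipeline.init(
    model_id="yolov8n-640",
    video_reference=0,
    on_prediction=render_boxes,
)
pipeline.start()
pipeline.join()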

inference infer should warn that it needs the Docker running

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I tried to run inference via the CLI and it looks like it requires me to have Docker running (vs. using the pip package or hitting the Hosted API). The error message is extremely long and non-obvious.

Environment

[email protected]:/$ pip freeze | grep inference
inference-cli==0.9.15
inference-gpu==0.9.15
inference-sdk==0.9.15
[email protected]:/$ 

pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel docker with after running pip install inference-gpu


Minimal Reproducible Example

(snipped some of the error output for brevity since Github complained that There was an error creating your Issue: body is too long (maximum is 65536 characters).)

[email protected]:/$ inference infer -i "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/660px-Flower_poster_2.jpg" -m "yolov8n-640"
Running inference on https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Flower_poster_2.jpg/660px-Flower_poster_2.jpg, using model: yolov8n-640, and host: http://localhost:9001
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ /opt/conda/lib/python3.10/site-packages/urllib3/connection.py:174 in _new_conn                   โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   171 โ”‚   โ”‚   โ”‚   extra_kw["socket_options"] = self.socket_options                               โ”‚
โ”‚   172 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   173 โ”‚   โ”‚   try:                                                                               โ”‚
โ”‚ โฑ 174 โ”‚   โ”‚   โ”‚   conn = connection.create_connection(                                           โ”‚
โ”‚   175 โ”‚   โ”‚   โ”‚   โ”‚   (self._dns_host, self.port), self.timeout, **extra_kw                      โ”‚
โ”‚   176 โ”‚   โ”‚   โ”‚   )                                                                              โ”‚
โ”‚   177                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ locals โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ                      โ”‚
โ”‚ โ”‚ extra_kw = {'socket_options': [(6, 1, 1)]}                              โ”‚                      โ”‚
โ”‚ โ”‚     self = <urllib3.connection.HTTPConnection object at 0x7f1e1023a350> โ”‚                      โ”‚
โ”‚ โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /opt/conda/lib/python3.10/site-packages/urllib3/util/connection.py:95 in create_connection       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    92 โ”‚   โ”‚   โ”‚   โ”‚   sock = None                                                                โ”‚
โ”‚    93 โ”‚                                                                                          โ”‚
โ”‚    94 โ”‚   if err is not None:                                                                    โ”‚
โ”‚ โฑ  95 โ”‚   โ”‚   raise err                                                                          โ”‚
โ”‚    96 โ”‚                                                                                          โ”‚
โ”‚    97 โ”‚   raise socket.error("getaddrinfo returns an empty list")                                โ”‚
โ”‚    98                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ locals โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ                           โ”‚
โ”‚ โ”‚        address = ('localhost', 9001)                               โ”‚                           โ”‚
โ”‚ โ”‚             af = <AddressFamily.AF_INET: 2>                        โ”‚                           โ”‚
โ”‚ โ”‚      canonname = ''                                                โ”‚                           โ”‚
โ”‚ โ”‚            err = ConnectionRefusedError(111, 'Connection refused') โ”‚                           โ”‚
โ”‚ โ”‚         family = <AddressFamily.AF_INET: 2>                        โ”‚                           โ”‚
โ”‚ โ”‚           host = 'localhost'                                       โ”‚                           โ”‚
โ”‚ โ”‚           port = 9001                                              โ”‚                           โ”‚
โ”‚ โ”‚          proto = 6                                                 โ”‚                           โ”‚
โ”‚ โ”‚            res = (                                                 โ”‚                           โ”‚
โ”‚ โ”‚                  โ”‚   <AddressFamily.AF_INET: 2>,                   โ”‚                           โ”‚
โ”‚ โ”‚                  โ”‚   <SocketKind.SOCK_STREAM: 1>,                  โ”‚                           โ”‚
โ”‚ โ”‚                  โ”‚   6,                                            โ”‚                           โ”‚
โ”‚ โ”‚                  โ”‚   '',                                           โ”‚                           โ”‚
โ”‚ โ”‚                  โ”‚   ('127.0.0.1', 9001)                           โ”‚                           โ”‚
โ”‚ โ”‚                  )                                                 โ”‚                           โ”‚
โ”‚ โ”‚             sa = ('127.0.0.1', 9001)                               โ”‚                           โ”‚
โ”‚ โ”‚           sock = None                                              โ”‚                           โ”‚
โ”‚ โ”‚ socket_options = [(6, 1, 1)]                                       โ”‚                           โ”‚
โ”‚ โ”‚       socktype = <SocketKind.SOCK_STREAM: 1>                       โ”‚                           โ”‚
โ”‚ โ”‚ source_address = None                                              โ”‚                           โ”‚
โ”‚ โ”‚        timeout = None                                              โ”‚                           โ”‚
โ”‚ โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /opt/conda/lib/python3.10/site-packages/urllib3/util/connection.py:85 in create_connection       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    82 โ”‚   โ”‚   โ”‚   โ”‚   sock.settimeout(timeout)                                                   โ”‚
โ”‚    83 โ”‚   โ”‚   โ”‚   if source_address:                                                             โ”‚
โ”‚    84 โ”‚   โ”‚   โ”‚   โ”‚   sock.bind(source_address)                                                  โ”‚
โ”‚ โฑ  85 โ”‚   โ”‚   โ”‚   sock.connect(sa)                                                               โ”‚
โ”‚    86 โ”‚   โ”‚   โ”‚   return sock                                                                    โ”‚
โ”‚    87 โ”‚   โ”‚                                                                                      โ”‚
โ”‚    88 โ”‚   โ”‚   except socket.error as e:                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
ConnectionRefusedError: [Errno 111] Connection refused
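This innermost frame is the root cause: `localhost` resolved to `127.0.0.1`, and the attempt to open a TCP connection to port 9001 was refused by the OS (errno 111) because nothing was listening there. A minimal pre-flight check makes the condition obvious before any HTTP machinery runs; this is an illustrative sketch, not part of the SDK:

import socket

# Sketch: confirm something is listening on localhost:9001 before pointing
# the CLI/SDK at it. connect_ex returns 0 on success and an errno (111 here)
# on failure, instead of raising.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2)
    if sock.connect_ex(("127.0.0.1", 9001)) != 0:
        print("Nothing is listening on localhost:9001 - start the inference server first.")

urllib3 catches the raw socket error and re-raises it inside its own exception hierarchy, which is what the next traceback in the chain shows.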

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/opt/conda/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/opt/conda/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f1e1023a350>: Failed to establish a new connection: [Errno 111] Connection refused
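At this level urllib3 has wrapped the socket failure: `_new_conn` catches `SocketError` and re-raises it as `NewConnectionError` with the original message preserved. Callers using `requests` never see these intermediate types directly; the whole chain ultimately surfaces as `requests.exceptions.ConnectionError`, so the unreachable-server case can be handled cleanly. A sketch, using the same endpoint the SDK queries later in this trace:

import requests

try:
    # The same request the SDK issues; fails fast if the server is down.
    requests.get("http://localhost:9001/model/registry", timeout=2)
except requests.exceptions.ConnectionError as error:
    # Wraps urllib3's MaxRetryError / NewConnectionError chain.
    print(f"Inference server unreachable: {error}")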

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
MaxRetryError: HTTPConnectionPool(host='localhost', port=9001): Max retries exceeded with url: /model/registry (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1e1023a350>: Failed to establish a new connection: [Errno 111] Connection refused'))
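`MaxRetryError` suggests repeated attempts, but none were made: `requests` hands urllib3 a zero retry budget (`Retry(total=0)`) for this call, so the first refused connection exhausts it (`total` drops to -1, `is_exhausted()` returns true) and the error is raised immediately. If retries are actually wanted, for instance while a freshly started server container warms up, an adapter with an explicit budget can be mounted; an illustrative sketch:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Sketch: allow three connection retries with exponential backoff instead of
# failing on the first refused connection.
session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5)))
response = session.get("http://localhost:9001/model/registry", timeout=5)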

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/inference_cli/main.py", line 89, in infer
    inference_cli.lib.infer(
  File "/opt/conda/lib/python3.10/site-packages/inference_cli/lib/infer_adapter.py", line 120, in infer
    infer_on_image(
  File "/opt/conda/lib/python3.10/site-packages/inference_cli/lib/infer_adapter.py", line 280, in infer_on_image
    prediction = client.infer(inference_input=input_reference, model_id=model_id)
  File "/opt/conda/lib/python3.10/site-packages/inference_sdk/http/client.py", line 81, in decorate
    return function(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/inference_sdk/http/client.py", line 231, in infer
    return self.infer_from_api_v1(
  File "/opt/conda/lib/python3.10/site-packages/inference_sdk/http/client.py", line 379, in infer_from_api_v1
    model_description = self.get_model_description(model_id=model_id_to_be_used)
  File "/opt/conda/lib/python3.10/site-packages/inference_sdk/http/client.py", line 518, in get_model_description
    registered_models = self.list_loaded_models()
  File "/opt/conda/lib/python3.10/site-packages/inference_sdk/http/client.py", line 81, in decorate
    return function(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/inference_sdk/http/client.py", line 564, in list_loaded_models
    response = requests.get(f"{self.__api_url}/model/registry")
  File "/opt/conda/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
โ”‚ โ”‚           hooks = {'response': []}                                         โ”‚                   โ”‚
โ”‚ โ”‚          kwargs = {                                                        โ”‚                   โ”‚
โ”‚ โ”‚                   โ”‚   'timeout': None,                                     โ”‚                   โ”‚
โ”‚ โ”‚                   โ”‚   'proxies': OrderedDict(),                            โ”‚                   โ”‚
โ”‚ โ”‚                   โ”‚   'stream': False,                                     โ”‚                   โ”‚
โ”‚ โ”‚                   โ”‚   'verify': True,                                      โ”‚                   โ”‚
โ”‚ โ”‚                   โ”‚   'cert': None                                         โ”‚                   โ”‚
โ”‚ โ”‚                   }                                                        โ”‚                   โ”‚
โ”‚ โ”‚         request = <PreparedRequest [GET]>                                  โ”‚                   โ”‚
โ”‚ โ”‚            self = <requests.sessions.Session object at 0x7f1e10238f10>     โ”‚                   โ”‚
โ”‚ โ”‚           start = 1710104956.0023863                                       โ”‚                   โ”‚
โ”‚ โ”‚          stream = False                                                    โ”‚                   โ”‚
โ”‚ โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ                   โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /opt/conda/lib/python3.10/site-packages/requests/adapters.py:519 in send                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   516 โ”‚   โ”‚   โ”‚   โ”‚   # This branch is for urllib3 v1.22 and later.                              โ”‚
โ”‚   517 โ”‚   โ”‚   โ”‚   โ”‚   raise SSLError(e, request=request)                                         โ”‚
โ”‚   518 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚ โฑ 519 โ”‚   โ”‚   โ”‚   raise ConnectionError(e, request=request)                                      โ”‚
โ”‚   520 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   521 โ”‚   โ”‚   except ClosedPoolError as e:                                                       โ”‚
โ”‚   522 โ”‚   โ”‚   โ”‚   raise ConnectionError(e, request=request)                                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ locals โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ               โ”‚
โ”‚ โ”‚    cert = None                                                                 โ”‚               โ”‚
โ”‚ โ”‚ chunked = False                                                                โ”‚               โ”‚
โ”‚ โ”‚    conn = <urllib3.connectionpool.HTTPConnectionPool object at 0x7f1e10239f00> โ”‚               โ”‚
โ”‚ โ”‚ proxies = OrderedDict()                                                        โ”‚               โ”‚
โ”‚ โ”‚ request = <PreparedRequest [GET]>                                              โ”‚               โ”‚
โ”‚ โ”‚    self = <requests.adapters.HTTPAdapter object at 0x7f1e10239de0>             โ”‚               โ”‚
โ”‚ โ”‚  stream = False                                                                โ”‚               โ”‚
โ”‚ โ”‚ timeout = Timeout(connect=None, read=None, total=None)                         โ”‚               โ”‚
โ”‚ โ”‚     url = '/model/registry'                                                    โ”‚               โ”‚
โ”‚ โ”‚  verify = True                                                                 โ”‚               โ”‚
โ”‚ โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ               โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
ConnectionError: HTTPConnectionPool(host='localhost', port=9001): Max retries exceeded with url: /model/registry (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1e1023a350>: Failed to establish a new connection: [Errno 111] Connection refused'))

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Does Inference not work with Windows?

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

I am trying to use Inference on Windows to run YOLOv8 inference on my GPU. When I run inference server start I get the following error: TypeError: Invalid type for device_requests param: expected list but found <class 'tuple'>.

This works: docker run --network=host --gpus=all roboflow/roboflow-inference-server-gpu:latest but then the server does not accept any requests:
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9001): Max retries exceeded with url: /doctr/ocr?api_key=St4BLhrH2dLkV8EQuXG4 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001C585F67150>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
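One likely culprit, sketched below: Docker Desktop on Windows generally does not support --network=host, so the container's port is never exposed to the Windows host. Publishing the port explicitly (e.g. docker run -p 9001:9001 --gpus=all roboflow/roboflow-inference-server-gpu:latest) and then checking reachability is the usual fix. A minimal smoke test, assuming the container was started with that port mapping:

import requests

# Hypothetical reachability check: once the container publishes port 9001
# to the Windows host, the server should answer on 127.0.0.1.
resp = requests.get("http://127.0.0.1:9001")
print(resp.status_code)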

Additional

No response

GPU Acceleration Doesn't Work with CUDA 12

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

If you install inference-gpu on a machine with CUDA 12, it'll complain and fall back to CPU execution mode.

2024-03-10 21:00:56.403012156 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:640 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

There's a special version of onnxruntime needed for CUDA 12: https://onnxruntime.ai/docs/install/

Ideally we'd detect this automatically & make it "just work" with CUDA 12. But alternatively we could let the user know why they're not getting GPU acceleration.
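A quick diagnostic sketch (assuming onnxruntime is importable in the affected environment) that shows whether the CUDA provider is actually available:

import onnxruntime as ort

# If 'CUDAExecutionProvider' is absent from this list, the installed
# onnxruntime build does not match the system CUDA version and execution
# silently falls back to CPU.
print(ort.get_available_providers())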

Environment

$ pip freeze | grep inference
inference-cli==0.9.15
inference-gpu==0.9.15
inference-sdk==0.9.15

pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel Docker image, after running pip install inference-gpu


Minimal Reproducible Example

pip install inference-gpu
inference benchmark python-package-speed -m "yolov8n-640"

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Inference directly on the image in the form of `np.ndarray` without the need to save it to hard drive

Inference currently takes a path to the image rather than the image itself. This makes video-file processing highly suboptimal and creates a bottleneck.

import requests

url = f"http://localhost:9001/{DATASET_ID}/{VERSION_ID}"

params = {
    "api_key": ROBOFLOW_API_KEY,
    "confidence": 0.5,
    "image": IMAGE_URL,
}

res = requests.post(url, params=params)
print(res.json())

It would be great if inference took np.ndarray directly. Example below is just pseudo-code.

import requests
import supervision as sv

url = f"http://localhost:9001/{DATASET_ID}/{VERSION_ID}"

for frame in sv.get_video_frames_generator(source_path=VIDEO_PATH):

    params = {
        "api_key": ROBOFLOW_API_KEY,
        "confidence": 0.5,
        "image": frame,
    }
    
    res = requests.post(url, params=params)
    print(res.json())
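In the meantime, a workaround sketch that avoids writing frames to disk (the placeholders come from the snippets above; cv2 and the base64 payload type are assumptions based on the HTTP API docs, and whether api_key belongs in the body or the query string may differ):

import base64

import cv2
import requests
import supervision as sv

url = f"http://localhost:9001/{DATASET_ID}/{VERSION_ID}"

for frame in sv.get_video_frames_generator(source_path=VIDEO_PATH):
    # Encode the BGR ndarray to JPEG in memory, then to base64 for JSON transport.
    _, buffer = cv2.imencode(".jpg", frame)
    payload = {
        "api_key": ROBOFLOW_API_KEY,
        "confidence": 0.5,
        "image": {
            "type": "base64",
            "value": base64.b64encode(buffer.tobytes()).decode("utf-8"),
        },
    }
    res = requests.post(url, json=payload)
    print(res.json())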

๐Ÿ™‹๐Ÿป I would love to work on the implementation of this feature.

YOLOWorld strange behavior

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Question

After playing around with the YOLO-World model for a while, I have noticed that it only ever returns results when it detects two of the classes set with set_classes. I passed the example dog image, as well as other images, repeatedly with the single class "dog" and it returned nothing, but if I pass ["person", "dog"] it returns both. I don't see any flaw in the text-embedding logic, but something in the model seems to stop it from detecting single classes. A minimal repro is sketched below.
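A minimal repro sketch (using the YOLOWorld interface from inference.models and the public example dog image):

from inference.models import YOLOWorld

model = YOLOWorld(model_id="yolo_world/l")

# Single class: reportedly returns no detections.
single = model.infer(
    image="https://media.roboflow.com/inference/dog.jpeg",
    text=["dog"],
    confidence=0.03,
)

# Two classes: reportedly returns detections for both.
pair = model.infer(
    image="https://media.roboflow.com/inference/dog.jpeg",
    text=["person", "dog"],
    confidence=0.03,
)
print(single, pair)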

Additional

No response

LMM workflow step - rename to LLM

Search before asking

  • I have searched the Inference issues and found no similar feature requests.

Description

Rename:
LMM -> LLM
LMMForClassification -> LLMForClassification

Use case

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Model manager with fixed batch size is broken

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

If we exceed the limit of models kept in memory and a purge is triggered, the following error occurs:

Traceback (most recent call last):
  File "/app/inference/core/interfaces/http/http_api.py", line 158, in wrapped_route
    return await route(*args, **kwargs)
  File "/app/inference/core/interfaces/http/http_api.py", line 476, in model_add
    self.model_manager.add_model(request.model_id, request.api_key)
  File "/app/inference/core/managers/decorators/fixed_size_cache.py", line 43, in add_model
    self.remove(to_remove_model_id)
  File "/app/inference/core/managers/decorators/fixed_size_cache.py", line 58, in remove
    return super().remove(model_id)
  File "/app/inference/core/managers/decorators/base.py", line 130, in remove
    return self.model_manager.remove(model_id)
  File "/app/inference/core/managers/base.py", line 260, in remove
    self._models[model_id].clear_cache()
  File "/app/inference/core/models/roboflow.py", line 145, in clear_cache
    clear_cache(model_id=self.endpoint)
  File "/app/inference/core/cache/model_artifacts.py", line 92, in clear_cache
    shutil.rmtree(cache_dir)
  File "/usr/local/lib/python3.9/shutil.py", line 724, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/usr/local/lib/python3.9/shutil.py", line 722, in rmtree
    orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/cache/coco-dataset-vdnr1/4'
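A sketch of one possible fix (a hypothetical helper; per the traceback the real code lives in inference/core/cache/model_artifacts.py): make the purge tolerant of a cache directory that is already gone.

import os
import shutil

def clear_cache_dir(cache_dir: str) -> None:
    # shutil.rmtree raises FileNotFoundError when the directory has already
    # been removed; checking first (or passing ignore_errors=True) makes the
    # model purge idempotent.
    if os.path.exists(cache_dir):
        shutil.rmtree(cache_dir, ignore_errors=True)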

Environment

HTTP API container

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Video Inference

Hello, Roboflow presented an amazing detection of soccer players in this tweet. Is it possible to provide an example of video inference?

Thank you in advance!
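For reference, a minimal video-inference sketch using the InferencePipeline interface that ships with inference (the model id and video path are placeholders):

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

pipeline = InferencePipeline.init(
    model_id="soccer-players-5fuqs/1",  # placeholder detection model id
    video_reference="soccer.mp4",       # path to a local video file
    on_prediction=render_boxes,         # draw boxes on each frame
    api_key="<API-KEY>",
)
pipeline.start()
pipeline.join()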

Error on Inference on Raspberry Pi

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

Hi, I am using a Raspberry Pi to run Inference on an RTSP stream with the code below:

# import the InferencePipeline interface
from inference import InferencePipeline
# import a built in sink called render_boxes (sinks are the logic that happens after inference)
from inference.core.interfaces.stream.sinks import render_boxes

# create an inference pipeline object
pipeline = InferencePipeline.init(
    model_id="cow-lie-stand-walk/2", # set the model id to a yolov8x model with in put size 1280
    video_reference="rtsp://192.168.1.100:5543/live/channel0", # set the video reference (source of video), it can be a link/path to a video file, an RTSP stream url, or an integer representing a device id (usually 0 for built in webcams)
    on_prediction=render_boxes, # tell the pipeline object what to do with each set of inference by passing a function
    api_key="<API-KEY>", # provide your roboflow api key for loading models from the roboflow api
)
# start the pipeline
pipeline.start()
# wait for the pipeline to finish
pipeline.join()

The above code works fine on my Windows 11 machine but gives the following error on my Raspberry Pi:

[02/10/24 20:11:11] ERROR    Could not sent prediction with frame_id=3493 to sink due to error: OpenCV(4.8.0) /io/opencv/modules/highgui/src/window.cpp:1272:   sinks.py:252 error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvShowImage'  

Please let me know how to fix this.
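One workaround, sketched below, is to replace render_boxes with a sink that does not call cv2.imshow (the error above comes from OpenCV's GUI code); the video_frame attribute names are assumptions based on the pipeline interface.

import cv2
from inference import InferencePipeline

def save_frame(predictions: dict, video_frame) -> None:
    # Write each captured frame to disk instead of opening a GUI window;
    # video_frame.image (BGR ndarray) and video_frame.frame_id are assumed
    # attribute names.
    cv2.imwrite(f"frame_{video_frame.frame_id}.jpg", video_frame.image)

pipeline = InferencePipeline.init(
    model_id="cow-lie-stand-walk/2",
    video_reference="rtsp://192.168.1.100:5543/live/channel0",
    on_prediction=save_frame,  # headless-safe sink
    api_key="<API-KEY>",
)
pipeline.start()
pipeline.join()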

Environment

  • inference: 0.9.9
  • OS: Ubuntu 23.04
  • Device: Raspberry Pi 4 Model B (8GB)
  • Python: 3.11.6

Minimal Reproducible Example

# import the InferencePipeline interface
from inference import InferencePipeline
# import a built in sink called render_boxes (sinks are the logic that happens after inference)
from inference.core.interfaces.stream.sinks import render_boxes

# create an inference pipeline object
pipeline = InferencePipeline.init(
    model_id="cow-lie-stand-walk/2", # set the model id to a yolov8x model with in put size 1280
    video_reference="rtsp://192.168.1.100:5543/live/channel0", # set the video reference (source of video), it can be a link/path to a video file, an RTSP stream url, or an integer representing a device id (usually 0 for built in webcams)
    on_prediction=render_boxes, # tell the pipeline object what to do with each set of inference by passing a function
    api_key="<API-KEY>", # provide your roboflow api key for loading models from the roboflow api
)
# start the pipeline
pipeline.start()
# wait for the pipeline to finish
pipeline.join()

Save the above code in a Python file and run it on the device and OS mentioned above. Make sure to use a valid RTSP URL.

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

If you install inside of Docker, you need to ensure additional system dependencies are present; otherwise you get an error.

Ideally we should not require these (there's a headless build of cv2, opencv-python-headless, we could use). Alternatively, maybe we could catch this error and recommend the solve: apt-get update && apt-get install ffmpeg libsm6 libxext6 -y. A sketch of that behavior follows.
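Here is a sketch of that catch-and-recommend behavior (a hypothetical wrapper, not the actual inference code):

try:
    import cv2
except ImportError as error:
    if "libGL.so.1" in str(error):
        raise ImportError(
            "OpenCV is missing system libraries. Try: apt-get update && "
            "apt-get install -y ffmpeg libsm6 libxext6, or depend on "
            "opencv-python-headless instead."
        ) from error
    raise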

Environment

$ pip freeze | grep inference
inference-cli==0.9.15
inference-gpu==0.9.15
inference-sdk==0.9.15

pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel Docker image, after running pip install inference-gpu


Minimal Reproducible Example

$ inference benchmark python-package-speed -m "yolov8n-640"
Traceback (most recent call last):
  File "/usr/local/bin/inference", line 5, in <module>
    from inference_cli.main import app
  File "/usr/local/lib/python3.10/dist-packages/inference_cli/main.py", line 6, in <module>
    import inference_cli.lib
  File "/usr/local/lib/python3.10/dist-packages/inference_cli/lib/__init__.py", line 8, in <module>
    from inference_cli.lib.container_adapter import (
  File "/usr/local/lib/python3.10/dist-packages/inference_cli/lib/container_adapter.py", line 10, in <module>
    from inference_cli.lib.utils import read_env_file
  File "/usr/local/lib/python3.10/dist-packages/inference_cli/lib/utils.py", line 5, in <module>
    from supervision.utils.file import read_yaml_file
  File "/usr/local/lib/python3.10/dist-packages/supervision/__init__.py", line 9, in <module>
    from supervision.annotators.core import (
  File "/usr/local/lib/python3.10/dist-packages/supervision/annotators/core.py", line 4, in <module>
    import cv2
  File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

`TypeError: Invalid type for device_requests param: expected list but found <class 'tuple'>` when `inference server start` on AWS `g4dn.xlarge` instance

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I am trying to run CogVLM inference on an AWS instance; however, when I run inference server start I encounter the error TypeError: Invalid type for device_requests param: expected list but found <class 'tuple'>.

GPU detected. Using a GPU image.
Starting inference server container...
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/inference_cli/server.py", line 30, in start
    start_inference_container("", port=port, project=rf_env)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/inference_cli/lib/container_adapter.py", line 92, in start_inference_container
    docker_client.containers.run(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/docker/models/containers.py", line 858, in run
    container = self.create(image=image, command=command, detach=detach, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/docker/models/containers.py", line 916, in create
    create_kwargs = _create_container_args(kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/docker/models/containers.py", line 1140, in _create_container_args
    create_kwargs['host_config'] = HostConfig(**host_config_kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/docker/types/containers.py", line 641, in __init__
    raise host_config_type_error('device_requests', device_requests, 'list')
TypeError: Invalid type for device_requests param: expected list but found <class 'tuple'>

The locals captured along the way show the root cause: device_requests reaches docker-py as a one-element tuple wrapping the expected list (most likely a stray trailing comma where the value is built):

device_requests = (
    [
        {
            'Driver': '',
            'Count': 0,
            'DeviceIDs': ['all'],
            'Capabilities': [['gpu']],
            'Options': {}
        }
    ],
)
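A sketch of the shape docker-py expects (a hypothetical call, not the actual inference_cli code): device_requests must be a plain list.

import docker

client = docker.from_env()

# A plain list; a trailing comma after the closing bracket would turn it
# into a tuple and trigger the TypeError above.
device_requests = [
    docker.types.DeviceRequest(device_ids=["all"], capabilities=[["gpu"]])
]

client.containers.run(
    image="roboflow/roboflow-inference-server-gpu:latest",
    detach=True,
    device_requests=device_requests,
    ports={"9001": 9001},
)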

Environment

  • pip list | grep inference result:

    inference                         0.9.6
    inference-cli                     0.9.6
    
  • nvidia-smi result:

    Mon Dec 18 14:31:51 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   22C    P8    14W /  70W |      0MiB / 15360MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
  • docker --version result:

    Docker version 20.10.17, build 100c701
    

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Possible documentation mistake for HTTP inference with numpy arrays

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

On the Roboflow website, the local HTTP inference payload structure for "NumPy Array" image payloads is documented as

{
# ...
    "image": {
        "type": "numpy",
        "value": image,
    },
# ...
}

Where image is initialized as

image = Image.open(file_name)

However, if you attempt to use requests.post as recommended on the same page, it complains with TypeError: Object of type PngImageFile is not JSON serializable

There seem to be two mistakes here:

  1. The documentation titles the example as "numpy array" but never retrieves the numpy array from the ImageFile object.
  2. The documentation appears to be missing a step that serializes the numpy array into a form acceptable for JSON transport (a corrected sketch follows this list).
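A corrected sketch (file_name and url come from the surrounding example; the base64 payload type is an assumption based on roboflow/roboflow-api-snippets):

import base64
from io import BytesIO

import numpy as np
import requests
from PIL import Image

image = Image.open(file_name)  # PIL ImageFile, as in the docs
array = np.asarray(image)      # the numpy array the docs never extract

# A raw ndarray is not JSON serializable; encode the pixels as base64 instead.
buffer = BytesIO()
Image.fromarray(array).save(buffer, format="PNG")
payload = {
    "image": {
        "type": "base64",
        "value": base64.b64encode(buffer.getvalue()).decode("utf-8"),
    },
}
res = requests.post(url, json=payload)
print(res.json())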

Environment

Python 3.10.12

requests 2.31.0
Pillow 10.1.0

Docker Image roboflow/roboflow-inference-server-gpu version 0.9.4

Minimal Reproducible Example

  1. Launch roboflow inference docker container with docker run --network=host --gpus=all roboflow/roboflow-inference-server-gpu -v roboflow-model-cache:/cache --name=roboflow_inference
  2. Follow example as documented in https://inference.roboflow.com/quickstart/http_inference/#step-2-run-inference under the tab "NumPy Array"

Additional

Checking against roboflow/roboflow-api-snippets suggests that using numpy arrays over the HTTP inference API is not possible, as snippets in that repository always serialize the numpy array to base64 first.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Duplicate Inference with `client.infer_on_stream()` on Windows

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

The Issue

I'm currently experiencing unexpected behavior when running inference on a directory of images using inference_sdk. When I run the method infer_on_stream(), the generator loops through the files twice on Windows 10, resulting in double the inferences.

Below is a code snippet to recreate the issue.

from inference_sdk import InferenceHTTPClient

image_path="C:\\Users\\nick\\Desktop\\bug\\images"

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com/",
    api_key="" # redacted
)

for file_path, image, prediction in client.infer_on_stream(input_uri=image_path, model_id="coco/3"):
    print(file_path)

Image Dir Contents: (screenshot)

Console Output: (screenshot)

Environment

  • inference-sdk 0.9.20
  • Python 3.10.11
  • Windows 10

Minimal Reproducible Example

from inference_sdk import InferenceHTTPClient

image_path="C:\\Users\\nick\\Desktop\\bug\\images"

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com/",
    api_key="" # redacted
)

for file_path, image, prediction in client.infer_on_stream(input_uri=image_path, model_id="coco/3"):
    print(file_path)

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Inference-cli installation doesn't install needed dependencies.

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

After installing the inference CLI on an Nvidia Jetson Orin Nano, I'm trying to start the server by running inference server start. I received a ModuleNotFoundError: No module named 'aiohttp'. I can move past the error by running pip install aiohttp. I then attempt inference server start again and receive a new error, ModuleNotFoundError: No module named 'backoff', which I can move past by running pip install backoff. I would expect aiohttp and backoff to be installed with my original inference-cli install if they are dependencies.

Environment

  • inference-cli==0.9.15
  • inference-sdk==0.9.8
  • OS: Jetpack 5.1.1
  • Device: Nvidia Jetson Orin Nano
  • Python 3.8.10

Minimal Reproducible Example

Create a virtual environment, install the CLI, and start the server:

python3 -m venv venv
source venv/bin/activate
pip install inference-cli
inference server start

When running the last command I get the following error:

(venv) nickherrig@jetson:~$ inference server start
Traceback (most recent call last):
  File "/home/nickherrig/venv/bin/inference", line 5, in <module>
    from inference_cli.main import app
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/main.py", line 6, in <module>
    import inference_cli.lib
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/lib/__init__.py", line 8, in <module>
    from inference_cli.lib.container_adapter import (
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/lib/container_adapter.py", line 10, in <module>
    from inference_cli.lib.utils import read_env_file
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/lib/utils.py", line 9, in <module>
    from inference_sdk import InferenceConfiguration, InferenceHTTPClient
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_sdk/__init__.py", line 1, in <module>
    from inference_sdk.http.client import InferenceHTTPClient
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_sdk/http/client.py", line 4, in <module>
    import aiohttp
ModuleNotFoundError: No module named 'aiohttp'

I can fix the error by running pip install aiohttp.

I then run inference server start and receive the new error:

(venv) nickherrig@jetson:~$ inference server start
Traceback (most recent call last):
  File "/home/nickherrig/venv/bin/inference", line 5, in <module>
    from inference_cli.main import app
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/main.py", line 6, in <module>
    import inference_cli.lib
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/lib/__init__.py", line 8, in <module>
    from inference_cli.lib.container_adapter import (
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/lib/container_adapter.py", line 10, in <module>
    from inference_cli.lib.utils import read_env_file
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_cli/lib/utils.py", line 9, in <module>
    from inference_sdk import InferenceConfiguration, InferenceHTTPClient
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_sdk/__init__.py", line 1, in <module>
    from inference_sdk.http.client import InferenceHTTPClient
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_sdk/http/client.py", line 34, in <module>
    from inference_sdk.http.utils.executors import (
  File "/home/nickherrig/venv/lib/python3.8/site-packages/inference_sdk/http/utils/executors.py", line 9, in <module>
    import backoff
ModuleNotFoundError: No module named 'backoff'

From here I am able to start the inference server using the CPU.
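For reference, the manual workaround described above as a single command:

pip install aiohttp backoff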

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Ctrl+C Doesn't Exit Benchmark

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I got a configuration error, so I want to exit the CLI benchmark before it finishes, but Ctrl+C doesn't do anything. It just keeps going.

$ inference benchmark python-package-speed -m "yolov8n-640"
Loading images...: 100%|████████████████████████████████████████| 8/8 [00:04<00:00,  1.79it/s]
Detected images dimensions: {(612, 612), (440, 640), (427, 640), (500, 375), (334, 500), (480, 640), (375, 500)}
Inference will be executed with the following parameters: {}
2024-03-10 21:00:56.403012156 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:640 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
Model details | task_type=object-detection | model_type=yolov8n | batch_size=batch | input_height=640 | input_width=640
Warming up model...: 100%|████████████████████████████████████████| 10/10 [00:07<00:00,  1.31it/s]
avg: 798.6ms    | rps: 1.2      | p75: 804.1ms  | p90: 804.1    | %err: 0.0
avg: 812.9ms    | rps: 0.9      | p75: 848.7ms  | p90: 894.4    | %err: 0.0
avg: 1287.8ms   | rps: 0.7      | p75: 896.0ms  | p90: 2625.1   | %err: 0.0
avg: 1390.6ms   | rps: 0.7      | p75: 1699.6ms | p90: 2337.5   | %err: 0.0
avg: 1439.3ms   | rps: 0.7      | p75: 1698.5ms | p90: 2125.1   | %err: 0.0
avg: 1460.6ms   | rps: 0.7      | p75: 1698.9ms | p90: 1912.7   | %err: 0.0
^C^C^C^Cavg: 1472.2ms   | rps: 0.7      | p75: 1700.0ms | p90: 1901.5   | %err: 0.0
^C^C^C^Cavg: 1381.2ms   | rps: 0.7      | p75: 1698.5ms | p90: 1845.4   | %err: 0.0
avg: 1321.5ms   | rps: 0.7      | p75: 1694.0ms | p90: 1790.5   | %err: 0.0
avg: 1239.1ms   | rps: 0.8      | p75: 1668.0ms | p90: 1730.3   | %err: 0.0
avg: 1186.2ms   | rps: 0.8      | p75: 1607.6ms | p90: 1700.2   | %err: 0.0
avg: 1129.2ms   | rps: 0.9      | p75: 1599.4ms | p90: 1700.2   | %err: 0.0
avg: 1084.3ms   | rps: 0.9      | p75: 1548.2ms | p90: 1699.6   | %err: 0.0
^C^C^Cavg: 1059.4ms     | rps: 0.9      | p75: 1178.0ms | p90: 1698.8   | %err: 0.0
^C^C^C^C
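For whoever picks this up: a minimal sketch of the kind of cooperative SIGINT handling the benchmark loop could adopt (my assumption about a possible fix, not the CLI's current code; run_benchmark_round is a hypothetical stand-in for one measurement round):

import signal

interrupted = False

def _handle_sigint(signum, frame):
    # Flip a flag instead of raising, so the loop can exit cleanly
    # between rounds.
    global interrupted
    interrupted = True

signal.signal(signal.SIGINT, _handle_sigint)

while not interrupted:
    run_benchmark_round()  # hypothetical: one benchmark measurement round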

Environment

$ pip freeze | grep inference
inference-cli==0.9.15
inference-gpu==0.9.15
inference-sdk==0.9.15

Running in the pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel Docker image, after pip install inference-gpu.


Minimal Reproducible Example

inference benchmark python-package-speed -m "yolov8n-640"

Then try Ctrl+C

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Support for Pose Estimation?

Thank you so much for this valuable contribution! Curious if there are plans to add support for pose estimation / key point detection in the near future?

Can't run inference on AWS Lambda

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

I am running this code on AWS Lambda

import os
from inference_sdk import InferenceHTTPClient

def handler(event, context):
    client = InferenceHTTPClient(api_url="https://detect.roboflow.com",
                                 api_key=os.environ["ROBOFLOW_API_KEY"])
    img_path = "./pizza.jpg"
    return client.infer(img_path, model_id="pizza-identifier/3")

As part of a Docker container built from the following Dockerfile:

FROM public.ecr.aws/lambda/python:3.11

RUN yum install -y mesa-libGL

COPY requirements.txt ${LAMBDA_TASK_ROOT}

RUN pip install -r requirements.txt

COPY pizza.jpg ${LAMBDA_TASK_ROOT}

COPY lambda_function.py ${LAMBDA_TASK_ROOT}

CMD [ "lambda_function.handler" ]

My requirements.txt contains nothing but inference==0.9.17

When the code runs I get the following error. I have been trying to fix this and have tried several workarounds, but to no avail. I understand that the error is somehow related to multiprocessing. I found this post, from which I understand that multiprocessing isn't possible on AWS Lambda; however, my script does not control or trigger any multiprocessing itself.

This is the full error:

{
  "errorMessage": "[Errno 38] Function not implemented",
  "errorType": "OSError",
  "requestId": "703be804-fd86-4b44-88f9-ac54c87717be",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 10, in handler\n    return client.infer(img_path, model_id=\"pizza-identifier/3\")\n",
    "  File \"/var/lang/lib/python3.11/site-packages/inference_sdk/http/client.py\", line 82, in decorate\n    return function(*args, **kwargs)\n",
    "  File \"/var/lang/lib/python3.11/site-packages/inference_sdk/http/client.py\", line 237, in infer\n    return self.infer_from_api_v0(\n",
    "  File \"/var/lang/lib/python3.11/site-packages/inference_sdk/http/client.py\", line 299, in infer_from_api_v0\n    responses = execute_requests_packages(\n",
    "  File \"/var/lang/lib/python3.11/site-packages/inference_sdk/http/utils/executors.py\", line 42, in execute_requests_packages\n    responses = make_parallel_requests(\n",
    "  File \"/var/lang/lib/python3.11/site-packages/inference_sdk/http/utils/executors.py\", line 58, in make_parallel_requests\n    with ThreadPool(processes=workers) as pool:\n",
    "  File \"/var/lang/lib/python3.11/multiprocessing/pool.py\", line 930, in __init__\n    Pool.__init__(self, processes, initializer, initargs)\n",
    "  File \"/var/lang/lib/python3.11/multiprocessing/pool.py\", line 196, in __init__\n    self._change_notifier = self._ctx.SimpleQueue()\n",
    "  File \"/var/lang/lib/python3.11/multiprocessing/context.py\", line 113, in SimpleQueue\n    return SimpleQueue(ctx=self.get_context())\n",
    "  File \"/var/lang/lib/python3.11/multiprocessing/queues.py\", line 341, in __init__\n    self._rlock = ctx.Lock()\n",
    "  File \"/var/lang/lib/python3.11/multiprocessing/context.py\", line 68, in Lock\n    return Lock(ctx=self.get_context())\n",
    "  File \"/var/lang/lib/python3.11/multiprocessing/synchronize.py\", line 169, in __init__\n    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
    "  File \"/var/lang/lib/python3.11/multiprocessing/synchronize.py\", line 57, in __init__\n    sl = self._semlock = _multiprocessing.SemLock(\n"
  ]
}
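One way to sidestep the SDK's ThreadPool entirely (a hedged sketch, assuming the hosted API's base64-over-POST convention used in Roboflow's docs; not an official fix) is to call the endpoint directly with requests:

import base64
import os

import requests

def handler(event, context):
    # POST the image as base64 directly to the hosted endpoint, avoiding the
    # SDK's ThreadPool (which needs multiprocessing semaphores that AWS Lambda
    # does not provide).
    with open("./pizza.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = requests.post(
        "https://detect.roboflow.com/pizza-identifier/3",
        params={"api_key": os.environ["ROBOFLOW_API_KEY"]},
        data=image_b64,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    return response.json()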

Environment

No response

Minimal Reproducible Example

No response

Additional

I am incredibly frustrated since I've been working on this for 9 hours now and would appreciate any hints!

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

`Docker Getting Started` link in docs returns 404

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

The link to Getting Started With Docker on this page: https://inference.roboflow.com/quickstart/devices/ points to https://inference.roboflow.com/docs/quickstart/docker, which returns a 404 Page Not Found.

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Installing Inference does not work as it produces an error.

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

Running pip install inference-cpu fails while getting the requirements to build wheels, with the error
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
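A likely factor (my assumption; not confirmed in the report): pkgutil.ImpImporter was removed in Python 3.12, and older setuptools releases still reference it while building wheels, so installs under Python 3.12 can fail exactly this way. Recreating the environment with Python 3.11 is one way to test that:

conda create -n inference python=3.11
conda activate inference
pip install inference-cpu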

Environment

  • OS: macOS Sonoma 14.2.1
  • Conda version (for environment): 23.9.0
  • Python version: 3.12.1
  • Shell: zsh
  • Hardware: Apple M1

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
