[Bug]: VLLM usage on AWS Inferentia instances (vllm, open, 6 comments)

ashutoshsaboo commented on July 23, 2024

Comments (6)

youkaichao commented on July 23, 2024

cc @liangfu

ashutoshsaboo commented on July 23, 2024

@liangfu, I would appreciate it if you could help with the above issue!

mgoin commented on July 23, 2024

@aws-patlange could you please look into this?

aws-patlange commented on July 23, 2024

We currently don't support paged attention in the Neuron integration, so you need to explicitly set `--block-size` to the same value as `--max-model-len`. See https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide-for-continuous-batching.html.

This will likely require some edits here so that the value can be passed to one of the API entrypoints provided in vLLM.
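Because the two flags must always match on Neuron, a launcher script can derive both from one variable so they cannot drift apart. A minimal sketch, assuming a bash launcher; `MAX_MODEL_LEN` and `neuron_block_args` are hypothetical names, and only the two flags come from this thread:

```shell
#!/usr/bin/env bash
# Hypothetical launcher helper. The Neuron integration has no paged attention,
# so the block size is set to the maximum model length.
MAX_MODEL_LEN="${MAX_MODEL_LEN:-2048}"

# Print the two flags (one token per line) with identical values.
neuron_block_args() {
  local max_len="$1"
  # Reject anything that is not a positive integer.
  if ! [[ "$max_len" =~ ^[1-9][0-9]*$ ]]; then
    echo "max model len must be a positive integer, got: $max_len" >&2
    return 1
  fi
  printf '%s\n' --max-model-len "$max_len" --block-size "$max_len"
}
```

For example, `mapfile -t NEURON_ARGS < <(neuron_block_args "$MAX_MODEL_LEN")` loads the flags into an array that can replace the two hard-coded flags in the `api_server` command. This does not remove the need for the argument-parser edit, since the parser still restricts which `--block-size` values it accepts.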

aws-patlange commented on July 23, 2024

Please try the following after editing the argument parser, which currently restricts `--block-size` to a few specific values:

```shell
python -u -m vllm.entrypoints.openai.api_server \
    --port 8081 \
    --model $MODEL_NAME \
    --trust-remote-code \
    --max-num-seqs 1 \
    --device neuron \
    --max-model-len 2048 \
    --block-size 2048 \
    2>&1 | tee api_server.log &
```
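Since the server is started in the background, a script that sends requests immediately may reach it before the model has finished loading. A minimal readiness check, assuming `curl` is installed and that the OpenAI-compatible server answers `GET /v1/models` on the port used above (8081); the function name and timeout values are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it responds successfully or the
# timeout (in seconds) is reached. Returns 0 when ready, 1 on timeout.
wait_for_server() {
  local url="$1" timeout="${2:-600}" waited=0
  # curl -f treats HTTP errors as failures; --max-time bounds each attempt.
  until curl -fsS --max-time 5 "$url" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "server at $url not ready after ${timeout}s; check api_server.log" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  echo "server at $url is ready"
}
```

For example, `wait_for_server http://localhost:8081/v1/models 600` blocks until the endpoint answers; on a timeout, the `api_server.log` written by `tee` is the first place to look.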

minhtcai commented on July 23, 2024

@aws-patlange Hi, I used your command but I'm getting:

TypeError: Can't instantiate abstract class NeuronWorker with abstract method execute_worker

Any pointers? Thanks!
