Comments (4)
Hi,
I get the same error loading Phi-3-128k on the latest Docker image:
Status: Downloaded newer image for ghcr.io/huggingface/text-generation-inference:latest
2024-04-30T11:03:01.808284Z INFO text_generation_launcher: Args {
model_id: "/home/nitro/models//microsoft_Phi-3-mini-128k-instruct",
revision: None,
validation_workers: 15,
sharded: None,
num_shard: Some(
2,
),
quantize: None,
speculate: None,
dtype: None,
trust_remote_code: true,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: Some(
57344,
),
max_total_tokens: Some(
65536,
),
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: Some(
57344,
),
max_batch_total_tokens: Some(
65536,
),
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: Some(
[
1,
2,
4,
8,
16,
32,
],
),
hostname: "0.0.0.0",
port: 80,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: Some(
"/data",
),
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 0.99,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
cors_allow_origin: [],
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
}
2024-04-30T11:03:01.808396Z WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/home/nitro/models//microsoft_Phi-3-mini-128k-instruct` do not contain malicious code.
2024-04-30T11:03:01.808403Z INFO text_generation_launcher: Sharding model on 2 processes
2024-04-30T11:03:01.808519Z INFO download: text_generation_launcher: Starting download process.
2024-04-30T11:03:05.695523Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-30T11:03:06.315199Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-30T11:03:06.315540Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-30T11:03:06.315622Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-04-30T11:03:12.513292Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 217, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 333, in get_model
return FlashLlama(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
model = FlashLlamaForCausalLM(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 385, in __init__
self.model = FlashLlamaModel(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 309, in __init__
[
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 310, in <listcomp>
FlashLlamaLayer(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 249, in __init__
self.self_attn = FlashLlamaAttention(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 107, in __init__
self.rotary_emb = PositionRotaryEmbedding.static(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 1032, in static
scaling_factor = rope_scaling["factor"]
KeyError: 'factor'
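For context (my own reading of the failure, not stated in the logs above): Phi-3-mini-128k's `config.json` declares a `rope_scaling` block built around `short_factor`/`long_factor` lists rather than the single `factor` key that `PositionRotaryEmbedding.static` looks up, so the lookup raises. A minimal sketch of the failing access, using a hypothetical config dict modeled on that layout (values abbreviated):

```python
# Hypothetical rope_scaling block modeled on Phi-3-mini-128k's config.json.
# The real lists have one entry per rotary dimension; these are abbreviated.
rope_scaling = {
    "type": "su",
    "short_factor": [1.0, 1.0, 1.0],
    "long_factor": [1.0, 1.25, 1.5],
}

# TGI's layers.py does the equivalent of the lookup below, which raises
# KeyError because this scaling scheme has no single "factor" key.
try:
    scaling_factor = rope_scaling["factor"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # same failure as in the traceback above
```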
Thank you for all the work on TGI!
from text-generation-inference.
I'm able to get a bit farther if I run with a newer TGI build, e.g.:
docker run -it --rm --name tgi -p 8080:80 --gpus all --shm-size 1g \
ghcr.io/huggingface/text-generation-inference:sha-986b404 \
--model-id microsoft/Phi-3-mini-128k-instruct/ \
--trust-remote-code \
--num-shard $(nvidia-smi -L | wc -l)
But TGI errors out because `factor` isn't set. I've tried various combinations of `rope-factor` and `rope-scaling` (e.g. `--rope-factor=32 --rope-scaling=dynamic`), but the model generates garbage.
Has anyone gotten farther with phi-3-128k? phi-3-4k works fine using the command above.
Same issue.
I am able to run the model with the following command on 2.0.2:
text-generation-launcher --model-id=microsoft/Phi-3-mini-128k-instruct --port=80 --trust-remote-code --rope-factor=32 --rope-scaling=dynamic
However, I receive the warning:
2024-05-02T10:09:32.001826Z WARN text_generation_router: router/src/main.rs:266: Could not parse config Error("unknown variant `phi3`, expected one of `llava_next`, `clip_vision_model`, `mistral`, `idefics`, `idefics2`, `ssm`, `gpt_bigcode`, `santacoder`, `bloom`, `mpt`, `gpt_neox`, `phi`, `phi-msft`, `llama`, `baichuan`, `gemma`, `cohere`, `drbx`, `falcon`, `mixtral`, `starcoder2`, `qwen2`, `opt`, `t5`", line: 19, column: 22)
@Narsil since you added support for phi3, the above warning may be interesting to you.
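The warning comes from the router's config parser, and its list of known `model_type` variants is enumerated right in the error message. A sketch derived only from that message, showing why `phi3` fails to parse while plain `phi` would not:

```python
# Variant list copied verbatim from the router warning above; "phi3" is
# absent, which is exactly why the parse fails with "unknown variant".
known_variants = {
    "llava_next", "clip_vision_model", "mistral", "idefics", "idefics2",
    "ssm", "gpt_bigcode", "santacoder", "bloom", "mpt", "gpt_neox",
    "phi", "phi-msft", "llama", "baichuan", "gemma", "cohere", "drbx",
    "falcon", "mixtral", "starcoder2", "qwen2", "opt", "t5",
}
print("phi3" in known_variants)  # → False
```

Since the model still runs, the router apparently only warns and falls back when the config can't be parsed, but router-side features keyed on the model type presumably won't engage.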