Comments (4)

nitronomic commented on June 5, 2024

Hi

I get the same error loading phi-3-128k with the latest Docker image:

Status: Downloaded newer image for ghcr.io/huggingface/text-generation-inference:latest
2024-04-30T11:03:01.808284Z  INFO text_generation_launcher: Args {
    model_id: "/home/nitro/models//microsoft_Phi-3-mini-128k-instruct",
    revision: None,
    validation_workers: 15,
    sharded: None,
    num_shard: Some(
        2,
    ),
    quantize: None,
    speculate: None,
    dtype: None,
    trust_remote_code: true,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        57344,
    ),
    max_total_tokens: Some(
        65536,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        57344,
    ),
    max_batch_total_tokens: Some(
        65536,
    ),
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: Some(
        [
            1,
            2,
            4,
            8,
            16,
            32,
        ],
    ),
    hostname: "0.0.0.0",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.99,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
}
2024-04-30T11:03:01.808396Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/home/nitro/models//microsoft_Phi-3-mini-128k-instruct` do not contain malicious code.
2024-04-30T11:03:01.808403Z  INFO text_generation_launcher: Sharding model on 2 processes
2024-04-30T11:03:01.808519Z  INFO download: text_generation_launcher: Starting download process.
2024-04-30T11:03:05.695523Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-04-30T11:03:06.315199Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-30T11:03:06.315540Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-30T11:03:06.315622Z  INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-04-30T11:03:12.513292Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 217, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 333, in get_model
    return FlashLlama(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
    model = FlashLlamaForCausalLM(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 385, in __init__
    self.model = FlashLlamaModel(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 309, in __init__
    [
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 310, in <listcomp>
    FlashLlamaLayer(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 249, in __init__
    self.self_attn = FlashLlamaAttention(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 107, in __init__
    self.rotary_emb = PositionRotaryEmbedding.static(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 1032, in static
    scaling_factor = rope_scaling["factor"]
KeyError: 'factor'
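
The KeyError points at a shape mismatch: PositionRotaryEmbedding.static reads a scalar factor out of rope_scaling, but Phi-3-mini-128k-instruct ships per-dimension scaling lists instead of a single number. A minimal sketch, with the rope_scaling shape assumed from the model's config.json and the factor values truncated for illustration:

# rope_scaling as shipped by Phi-3-mini-128k-instruct (assumed shape,
# factor values truncated for illustration)
rope_scaling = {
    "type": "su",                  # Phi-3's long-context ("su"/longrope) scaling
    "short_factor": [1.05, 1.10],  # one entry per rotary dimension (truncated)
    "long_factor": [1.30, 1.45],   # one entry per rotary dimension (truncated)
}

# What utils/layers.py line 1032 does:
scaling_factor = rope_scaling["factor"]  # KeyError: 'factor' -- the key never exists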

Thank you for all the work on TGI

amihalik commented on June 5, 2024

I'm able to get a bit farther if I run with a newer TGI build, e.g.:

docker run -it --rm --name tgi -p 8080:80 --gpus all --shm-size 1g  \
    ghcr.io/huggingface/text-generation-inference:sha-986b404 \
    --model-id microsoft/Phi-3-mini-128k-instruct/ \
    --trust-remote-code \
    --num-shard $(nvidia-smi -L | wc -l) 

But TGI errors out because factor isn't set. I've tried various combinations of rope-factor and rope-scaling (e.g. --rope-factor=32 --rope-scaling=dynamic), but the model generates garbage.

Has anyone gotten farther with phi-3-128k? phi-3-4k works fine using the command above.
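
One plausible reason the dynamic workaround produces garbage: --rope-scaling=dynamic only rescales the single rope base (NTK-style), while the 128k model appears to have been trained with the per-dimension "su"/longrope factors in its config, so the resulting position encodings don't match what the weights expect. A rough sketch of what dynamic scaling computes, assuming the NTK formula from the transformers dynamic rope implementation (all numbers illustrative):

def dynamic_ntk_base(base, factor, seq_len, orig_max_pos, head_dim):
    """Rescaled rope base under dynamic NTK scaling (transformers-style formula)."""
    scale = (factor * seq_len / orig_max_pos) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))

# Illustrative numbers: base=10000, --rope-factor=32, a 128k-token prompt,
# an assumed 4k original context, head_dim=96.
print(dynamic_ntk_base(10000.0, 32.0, 131072, 4096, 96))

A single scalar, however it is chosen, cannot reproduce per-dimension factors, which would explain why no flag combination helps.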

RonanKMcGovern commented on June 5, 2024

> I'm able to get a bit farther if I run with a newer TGI build […] Has anyone gotten farther with phi-3-128k? phi-3-4k works fine using the command above.

Same issue.

ChristophRaab commented on June 5, 2024

I am able to run the model with the following command on 2.0.2:

text-generation-launcher --model-id=microsoft/Phi-3-mini-128k-instruct --port=80  --trust-remote-code --rope-factor=32  --rope-scaling=dynamic

However, I receive the following warning:

2024-05-02T10:09:32.001826Z  WARN text_generation_router: router/src/main.rs:266: Could not parse config Error("unknown variant `phi3`, expected one of `llava_next`, `clip_vision_model`, `mistral`, `idefics`, `idefics2`, `ssm`, `gpt_bigcode`, `santacoder`, `bloom`, `mpt`, `gpt_neox`, `phi`, `phi-msft`, `llama`, `baichuan`, `gemma`, `cohere`, `drbx`, `falcon`, `mixtral`, `starcoder2`, `qwen2`, `opt`, `t5`", line: 19, column: 22) 

@Narsil, since you added phi3 support, the above warning may be of interest to you.
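
For anyone reproducing this on 2.0.2: a quick way to check whether the launch above actually yields coherent text (rather than the garbage reported earlier in the thread with the same rope flags) is to probe TGI's /generate endpoint. A minimal sketch; host, port, and prompt are placeholders:

import requests

# Minimal sanity probe of a running TGI instance (host/port are placeholders).
resp = requests.post(
    "http://localhost:80/generate",
    json={
        "inputs": "Summarize in one sentence: the quick brown fox jumps over the lazy dog.",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["generated_text"])  # eyeball the output for repetition or garbage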
