Comments (4)
Hi,
I get the same error loading Phi-3-128k on the latest Docker image:
Status: Downloaded newer image for ghcr.io/huggingface/text-generation-inference:latest
2024-04-30T11:03:01.808284Z INFO text_generation_launcher: Args {
model_id: "/home/nitro/models//microsoft_Phi-3-mini-128k-instruct",
revision: None,
validation_workers: 15,
sharded: None,
num_shard: Some(
2,
),
quantize: None,
speculate: None,
dtype: None,
trust_remote_code: true,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: Some(
57344,
),
max_total_tokens: Some(
65536,
),
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: Some(
57344,
),
max_batch_total_tokens: Some(
65536,
),
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: Some(
[
1,
2,
4,
8,
16,
32,
],
),
hostname: "0.0.0.0",
port: 80,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: Some(
"/data",
),
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 0.99,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
cors_allow_origin: [],
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
}
2024-04-30T11:03:01.808396Z WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/home/nitro/models//microsoft_Phi-3-mini-128k-instruct` do not contain malicious code.
2024-04-30T11:03:01.808403Z INFO text_generation_launcher: Sharding model on 2 processes
2024-04-30T11:03:01.808519Z INFO download: text_generation_launcher: Starting download process.
2024-04-30T11:03:05.695523Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-30T11:03:06.315199Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-30T11:03:06.315540Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-30T11:03:06.315622Z INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-04-30T11:03:12.513292Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 217, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 333, in get_model
return FlashLlama(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
model = FlashLlamaForCausalLM(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 385, in __init__
self.model = FlashLlamaModel(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 309, in __init__
[
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 310, in <listcomp>
FlashLlamaLayer(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 249, in __init__
self.self_attn = FlashLlamaAttention(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 107, in __init__
self.rotary_emb = PositionRotaryEmbedding.static(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 1032, in static
scaling_factor = rope_scaling["factor"]
KeyError: 'factor'
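For context (my own reading of the failure, not stated in the logs above): Phi-3-mini-128k's `config.json` declares a `rope_scaling` block built around `short_factor`/`long_factor` lists rather than the single `factor` key that `PositionRotaryEmbedding.static` looks up, so the lookup raises. A minimal sketch of the failing access, using a hypothetical config dict modeled on that layout (values abbreviated):

```python
# Hypothetical rope_scaling block modeled on Phi-3-mini-128k's config.json.
# The real lists have one entry per rotary dimension; these are abbreviated.
rope_scaling = {
    "type": "su",
    "short_factor": [1.0, 1.0, 1.0],
    "long_factor": [1.0, 1.25, 1.5],
}

# TGI's layers.py does the equivalent of the lookup below, which raises
# KeyError because this scaling scheme has no single "factor" key.
try:
    scaling_factor = rope_scaling["factor"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # same failure as in the traceback above
```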
Thank you for all the work on TGI!
from text-generation-inference.
I'm able to get a bit farther if I run with a newer TGI build, e.g.:
docker run -it --rm --name tgi -p 8080:80 --gpus all --shm-size 1g \
ghcr.io/huggingface/text-generation-inference:sha-986b404 \
--model-id microsoft/Phi-3-mini-128k-instruct/ \
--trust-remote-code \
--num-shard $(nvidia-smi -L | wc -l)
But TGI errors out because `factor` isn't set. I've tried various combinations of `rope-factor` and `rope-scaling` (e.g. `--rope-factor=32 --rope-scaling=dynamic`), but the model generates garbage.
Has anyone gotten farther with phi-3-128k? phi-3-4k works fine using the command above.
Same issue.
I am able to run the model with the following command on 2.0.2:
text-generation-launcher --model-id=microsoft/Phi-3-mini-128k-instruct --port=80 --trust-remote-code --rope-factor=32 --rope-scaling=dynamic
However, I receive the warning:
2024-05-02T10:09:32.001826Z WARN text_generation_router: router/src/main.rs:266: Could not parse config Error("unknown variant `phi3`, expected one of `llava_next`, `clip_vision_model`, `mistral`, `idefics`, `idefics2`, `ssm`, `gpt_bigcode`, `santacoder`, `bloom`, `mpt`, `gpt_neox`, `phi`, `phi-msft`, `llama`, `baichuan`, `gemma`, `cohere`, `drbx`, `falcon`, `mixtral`, `starcoder2`, `qwen2`, `opt`, `t5`", line: 19, column: 22)
@Narsil since you added support for phi3, the above warning may be interesting to you.
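The warning comes from the router's config parser, and its list of known `model_type` variants is enumerated right in the error message. A sketch derived only from that message, showing why `phi3` fails to parse while plain `phi` would not:

```python
# Variant list copied verbatim from the router warning above; "phi3" is
# absent, which is exactly why the parse fails with "unknown variant".
known_variants = {
    "llava_next", "clip_vision_model", "mistral", "idefics", "idefics2",
    "ssm", "gpt_bigcode", "santacoder", "bloom", "mpt", "gpt_neox",
    "phi", "phi-msft", "llama", "baichuan", "gemma", "cohere", "drbx",
    "falcon", "mixtral", "starcoder2", "qwen2", "opt", "t5",
}
print("phi3" in known_variants)  # → False
```

Since the model still runs, the router apparently only warns and falls back when the config can't be parsed, but router-side features keyed on the model type presumably won't engage.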