Comments (5)
Maybe it would be better to use this volume read-only. Then I would just need to make the models available in the bucket before starting the process?
Could you please guide me through the procedure to provision the S3 bucket?
Thanks :)
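For reference, a minimal sketch of provisioning such a bucket with the AWS CLI (bucket name and region here are made-up placeholders, not from this thread):

```shell
# Create the bucket (name and region are hypothetical; adjust for your account)
aws s3 mb s3://my-tgi-models --region eu-west-3

# Block all public access, since the bucket only holds model weights for the pod
aws s3api put-public-access-block \
  --bucket my-tgi-models \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```

The pod then still needs read access to the bucket (e.g. s3:GetObject and s3:ListBucket) via its service account or node role.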
from text-generation-inference.
Ok I managed to do what I wanted:
- clone the model
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- sync the model to S3
aws s3 sync Mistral-7B-Instruct-v0.2 s3://<bucket_name>/Mistral-7B-Instruct-v0.2
- use it in the pod
text-generation-launcher --model-id=/data/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
2024-04-26T12:57:37.746280Z INFO text_generation_launcher: Args { model_id: "/data/Mistral-7B-Instruct-v0.2", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "text-generation-inference-58d9869995-gxzx2", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-26T12:57:37.746720Z INFO download: text_generation_launcher: Starting download process.
2024-04-26T12:57:48.114689Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-04-26T12:57:50.144159Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-26T12:57:50.144763Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-26T12:58:00.242683Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-04-26T12:58:02.873865Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
rank=0
2024-04-26T12:58:02.873894Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 9 rank=0
2024-04-26T12:58:02.944252Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-26T12:58:02.944282Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
I have another error that might not be related. I'm going to resolve that before closing this issue.
Ok my first issue was caused by insufficient memory allocation.
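As a rough sanity check (my own back-of-envelope estimate, not from the TGI docs): NF4 quantization stores about half a byte per parameter, so the weights alone of a 7B model need roughly 3.5 GiB, before KV cache, activations, and CUDA overhead:

```shell
# Back-of-envelope weight memory for a 7B model quantized to NF4 (~0.5 bytes/param).
# Excludes KV cache, activations, and CUDA context, which add several GiB more.
PARAMS=7000000000
BYTES=$(( PARAMS / 2 ))
echo "approx weight memory: $(( BYTES / 1024 / 1024 / 1024 )) GiB"
```

So a pod memory limit sized only for the raw weights will still get OOM-killed (signal 9) once the server allocates its runtime buffers.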
Now I'm getting this error:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
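For anyone hitting the same thing: HeaderTooLarge from safetensors is a typical symptom of the earlier git clone having fetched Git LFS pointer files instead of the real weights (safetensors then tries to read the ASCII pointer as a binary header length). A quick way to check, sketched here against a fake pointer file:

```shell
# Simulate what a clone without git-lfs leaves behind: a small ASCII pointer file
f=$(mktemp)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:deadbeef\nsize 123\n' > "$f"

# A real safetensors file starts with a binary 8-byte header length, not "version"
if head -c 7 "$f" | grep -q '^version'; then
  echo "LFS pointer file, not real weights"
fi
```

If your .safetensors files start with "version", re-fetch them with git-lfs installed, or with huggingface-cli as below.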
Well I managed to download the model using the recommended way with huggingface-cli
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2
aws s3 sync /home/smana/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2 s3://<bucket>/models--mistralai--Mistral-7B-Instruct-v0.2
When the pod starts I still get permission errors :/
text-generation-launcher --model-id=mistralai/Mistral-7B-Instruct-v0.2 --quantize bitsandbytes-nf4
...
2024-04-26T15:37:48.725974Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
...
PermissionError: [Errno 1] Operation not permitted: '/data/models--mistralai--Mistral-7B-Instruct-v0.2/tmp_7e2fd113-2af9-4a1a-bf0e-22d328d4bc8b'
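My guess (an assumption based on Mountpoint for S3's documented POSIX limitations, not something verified in this thread): TGI downloads weights into a tmp_<uuid> file and then renames it into place, and the S3 mountpoint rejects that rename, which surfaces as the PermissionError above. On a normal filesystem the pattern is just:

```shell
# The write-then-rename pattern TGI appears to use for atomic downloads.
# On Mountpoint for S3 the rename step is not supported, hence the error;
# on EFS or any POSIX filesystem it works fine.
dir=$(mktemp -d)
tmp="$dir/tmp_example"               # stand-in for TGI's tmp_<uuid> file
echo "weights" > "$tmp"
mv "$tmp" "$dir/model.safetensors"   # the rename that fails on an S3 mountpoint
cat "$dir/model.safetensors"
```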
It is working much better with EFS storage, but I'll leave this issue open in case someone finds a solution for the S3 mountpoint.