Comments (5)
text-generation-launcher --model-id /root/autodl-tmp/bloom --num-shard 1
This is the startup command I entered; --model-id points at my local model path.
from text-generation-inference.
model-id is the name of the model on the Hugging Face Hub. The weights will be downloaded automatically to your HUGGINGFACE_HUB_CACHE. If you want to specify where the weights are stored on your disk, you can use --weights-cache-override.
In your case: text-generation-launcher --model-id bigscience/bloom --weights-cache-override /root/autodl-tmp/bloom
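To make the cache precedence concrete, here is a small Python sketch of how the hub cache directory is resolved from HUGGINGFACE_HUB_CACHE and HF_HOME. This illustrates huggingface_hub's documented behaviour, not TGI's actual code; resolve_hub_cache is a name introduced for illustration:

```python
import os

def resolve_hub_cache(env: dict) -> str:
    # Documented precedence in huggingface_hub: an explicit
    # HUGGINGFACE_HUB_CACHE wins; otherwise the cache lives under
    # $HF_HOME/hub; otherwise under ~/.cache/huggingface/hub.
    if env.get("HUGGINGFACE_HUB_CACHE"):
        return env["HUGGINGFACE_HUB_CACHE"]
    hf_home = env.get("HF_HOME") or os.path.expanduser("~/.cache/huggingface")
    return os.path.join(hf_home, "hub")

# With HF_HOME pointing at the data disk, weights are looked up in its hub/ subfolder:
print(resolve_hub_cache({"HF_HOME": "/root/autodl-tmp"}))  # /root/autodl-tmp/hub
```

This is why moving the files under HF_HOME (as the questioner eventually did) works: the hub cache is derived from it when HUGGINGFACE_HUB_CACHE is unset.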
from text-generation-inference.
I downloaded the complete model from Hugging Face. Why would it be downloaded again after entering this command?
(my-env) root@autodl-container-b369119e00-448c7f66:~/autodl-tmp/text-generation-inference# text-generation-launcher --model-id bigscience/bloom --weights-cache-override /root/autodl-tmp/bloom
2023-02-21T06:28:55.309769Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom", revision: None, num_shard: 1, quantize: false, max_concurrent_requests: 128, max_input_length: 1000, max_batch_size: 32, max_waiting_tokens: 20, port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: Some("/root/autodl-tmp/bloom"), disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [] }
2023-02-21T06:28:55.309876Z INFO text_generation_launcher: weights_cache_override is set to Some("/root/autodl-tmp/bloom").
2023-02-21T06:28:55.309890Z INFO text_generation_launcher: Skipping download.
2023-02-21T06:28:55.310058Z INFO text_generation_launcher: Starting shard 0
2023-02-21T06:29:05.320211Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-02-21T06:29:12.805928Z ERROR shard-manager: text_generation_launcher: "Error when initializing model
Traceback (most recent call last):
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py\", line 386, in _make_request
self._validate_conn(conn)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py\", line 1042, in _validate_conn
conn.connect()
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connection.py\", line 414, in connect
self.sock = ssl_wrap_socket(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/util/ssl_.py\", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/util/ssl_.py\", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File \"/root/miniconda3/envs/my-env/lib/python3.9/ssl.py\", line 500, in wrap_socket
return self.sslsocket_class._create(
File \"/root/miniconda3/envs/my-env/lib/python3.9/ssl.py\", line 1040, in _create
self.do_handshake()
File \"/root/miniconda3/envs/my-env/lib/python3.9/ssl.py\", line 1309, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1105: The handshake operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/adapters.py\", line 489, in send
resp = conn.urlopen(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py\", line 787, in urlopen
retries = retries.increment(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/util/retry.py\", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/packages/six.py\", line 770, in reraise
raise value
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py\", line 703, in urlopen
httplib_response = self._make_request(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py\", line 389, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py\", line 340, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out. (read timeout=10.0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/root/miniconda3/envs/my-env/bin/text-generation-server\", line 8, in <module>
sys.exit(app())
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/typer/main.py\", line 311, in __call__
return get_command(self)(*args, **kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/click/core.py\", line 1130, in __call__
return self.main(*args, **kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/typer/core.py\", line 778, in main
return _main(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/typer/core.py\", line 216, in _main
rv = self.invoke(ctx)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/click/core.py\", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/click/core.py\", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/click/core.py\", line 760, in invoke
return __callback(*args, **kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/typer/main.py\", line 683, in wrapper
return callback(**use_params) # type: ignore
File \"/root/autodl-tmp/text-generation-inference/server/text_generation/cli.py\", line 55, in serve
server.serve(model_id, revision, sharded, quantize, uds_path)
File \"/root/autodl-tmp/text-generation-inference/server/text_generation/server.py\", line 130, in serve
asyncio.run(serve_inner(model_id, revision, sharded, quantize))
File \"/root/miniconda3/envs/my-env/lib/python3.9/asyncio/runners.py\", line 44, in run
return loop.run_until_complete(main)
File \"/root/miniconda3/envs/my-env/lib/python3.9/asyncio/base_events.py\", line 629, in run_until_complete
self.run_forever()
File \"/root/miniconda3/envs/my-env/lib/python3.9/asyncio/base_events.py\", line 596, in run_forever
self._run_once()
File \"/root/miniconda3/envs/my-env/lib/python3.9/asyncio/base_events.py\", line 1890, in _run_once
handle._run()
File \"/root/miniconda3/envs/my-env/lib/python3.9/asyncio/events.py\", line 80, in _run
self._context.run(self._callback, *self._args)
> File \"/root/autodl-tmp/text-generation-inference/server/text_generation/server.py\", line 99, in serve_inner
model = get_model(model_id, revision, sharded, quantize)
File \"/root/autodl-tmp/text-generation-inference/server/text_generation/models/__init__.py\", line 59, in get_model
return BLOOM(model_id, revision, quantize=quantize)
File \"/root/autodl-tmp/text-generation-inference/server/text_generation/models/causal_lm.py\", line 253, in __init__
self.model = AutoModelForCausalLM.from_pretrained(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/models/auto/auto_factory.py\", line 464, in from_pretrained
return model_class.from_pretrained(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/modeling_utils.py\", line 2333, in from_pretrained
resolved_archive_file, sharded_metadata = get_checkpoint_shard_files(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/utils/hub.py\", line 920, in get_checkpoint_shard_files
cached_filename = cached_file(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/utils/hub.py\", line 410, in cached_file
resolved_file = hf_hub_download(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py\", line 124, in _inner_fn
return fn(*args, **kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 1283, in hf_hub_download
http_get(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 503, in http_get
r = _request_wrapper(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 440, in _request_wrapper
return http_backoff(
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/utils/_http.py\", line 129, in http_backoff
response = requests.request(method=method, url=url, **kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/api.py\", line 59, in request
return session.request(method=method, url=url, **kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/sessions.py\", line 587, in request
resp = self.send(prep, **send_kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/sessions.py\", line 701, in send
r = adapter.send(request, **kwargs)
File \"/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/adapters.py\", line 578, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out. (read timeout=10.0)
" rank=0
2023-02-21T06:29:13.524601Z ERROR text_generation_launcher: Shard 0 failed to start:
We're not using custom kernels.
Traceback (most recent call last):
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
conn.connect()
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/root/miniconda3/envs/my-env/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/root/miniconda3/envs/my-env/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/root/miniconda3/envs/my-env/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1105: The handshake operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 389, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 340, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out. (read timeout=10.0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/envs/my-env/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/root/autodl-tmp/text-generation-inference/server/text_generation/cli.py", line 55, in serve
server.serve(model_id, revision, sharded, quantize, uds_path)
File "/root/autodl-tmp/text-generation-inference/server/text_generation/server.py", line 130, in serve
asyncio.run(serve_inner(model_id, revision, sharded, quantize))
File "/root/miniconda3/envs/my-env/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/root/miniconda3/envs/my-env/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/root/autodl-tmp/text-generation-inference/server/text_generation/server.py", line 99, in serve_inner
model = get_model(model_id, revision, sharded, quantize)
File "/root/autodl-tmp/text-generation-inference/server/text_generation/models/__init__.py", line 59, in get_model
return BLOOM(model_id, revision, quantize=quantize)
File "/root/autodl-tmp/text-generation-inference/server/text_generation/models/causal_lm.py", line 253, in __init__
self.model = AutoModelForCausalLM.from_pretrained(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/models/auto/auto_factory.py", line 464, in from_pretrained
return model_class.from_pretrained(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/modeling_utils.py", line 2333, in from_pretrained
resolved_archive_file, sharded_metadata = get_checkpoint_shard_files(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/utils/hub.py", line 920, in get_checkpoint_shard_files
cached_filename = cached_file(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/transformers-4.27.0.dev0-py3.9.egg/transformers/utils/hub.py", line 410, in cached_file
resolved_file = hf_hub_download(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
return fn(*args, **kwargs)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1283, in hf_hub_download
http_get(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 503, in http_get
r = _request_wrapper(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 440, in _request_wrapper
return http_backoff(
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 129, in http_backoff
response = requests.request(method=method, url=url, **kwargs)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/root/miniconda3/envs/my-env/lib/python3.9/site-packages/requests/adapters.py", line 578, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out. (read timeout=10.0)
2023-02-21T06:29:13.524841Z INFO text_generation_launcher: Shutting down shards
from text-generation-inference.
I downloaded the bigscience/bloom model files from Hugging Face and placed the model .bin files in the /root/autodl-tmp/bloom folder, then executed the command text-generation-launcher --model-id bigscience/bloom --weights-cache-override /root/autodl-tmp/bloom. The console kept printing "2023-02-21T07:04:55.545088Z INFO text_generation_launcher: Waiting for shard 0 to be ready...". While it was printing, I checked the /root/autodl-tmp/bloom folder and found a new folder "models--bigscience--bloom/" inside it. I believe this is the Hugging Face cache folder, but I'm not sure. My understanding is that there should be a parameter that lets me use a model I have already downloaded locally, by pointing at its path instead of the model cache path, but I don't know which parameter that is. I have downloaded this bloom model 3 or 4 times already, and because of network problems the download is very slow. If you can help me, I will thank you very much @OlivierDehaene
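The "models--bigscience--bloom" folder the questioner found matches the hub cache layout: each repo is stored under a directory whose name is derived from the repo id. A small sketch of that naming rule, based on huggingface_hub's documented cache layout (cache_folder_name is a name introduced for illustration):

```python
def cache_folder_name(repo_id: str, repo_type: str = "model") -> str:
    # huggingface_hub (>= 0.8) stores each repo under
    # <cache>/<repo_type>s--<namespace>--<name>, i.e. "/" becomes "--".
    return f"{repo_type}s--" + repo_id.replace("/", "--")

print(cache_folder_name("bigscience/bloom"))  # models--bigscience--bloom
```

So seeing that folder appear means the launcher is treating /root/autodl-tmp/bloom as a hub cache root, not as a directory that already contains the weight files directly.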
from text-generation-inference.
I moved the locally downloaded files to the directory specified by HF_HOME and then executed the command text-generation-launcher --model-id bigscience/bloom --weights-cache-override /root/autodl-tmp/bloom. Now it runs correctly! @OlivierDehaene thank you
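For anyone hitting the same problem: moving pre-downloaded files helps only if they end up in the layout the hub cache expects. The sketch below is a hypothetical helper (install_into_hub_cache is not a real TGI or huggingface_hub function) that copies local weight files into a simplified models--&lt;org&gt;--&lt;name&gt;/snapshots/&lt;revision&gt; structure. Note that the real cache keys snapshots by commit hash and deduplicates files through a blobs/ directory with symlinks, which this sketch deliberately skips:

```python
import os
import shutil

def install_into_hub_cache(local_dir: str, hub_cache: str, repo_id: str,
                           revision: str = "main") -> str:
    """Copy pre-downloaded files into a simplified hub-cache layout so tools
    that read the cache can find them without re-downloading. Hypothetical
    helper; the real huggingface_hub layout also uses blobs/ and commit-hash
    snapshot names."""
    folder = "models--" + repo_id.replace("/", "--")
    snapshot = os.path.join(hub_cache, folder, "snapshots", revision)
    os.makedirs(snapshot, exist_ok=True)
    for name in os.listdir(local_dir):
        shutil.copy2(os.path.join(local_dir, name), os.path.join(snapshot, name))
    # Write a refs/ pointer so the revision can be resolved by name.
    refs = os.path.join(hub_cache, folder, "refs")
    os.makedirs(refs, exist_ok=True)
    with open(os.path.join(refs, revision), "w") as f:
        f.write(revision)
    return snapshot
```

With the files arranged this way under the directory that HF_HOME (or HUGGINGFACE_HUB_CACHE) points at, the "Skipping download" path has something to find.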
from text-generation-inference.