Hello, I've just done a fresh manual install of h2ogpt on linux. My OS is Rocky Linux

pytorch_model.bin 1.34G download hangs forever on Linux,about h2oai/h2ogpt

Comments (7)

han-sogawa commented on June 18, 2024

https://huggingface.co/api/models/hkunlp/instructor-large is the file it cannot download, although I can access it on the browser.

from h2ogpt.

pseudotensor commented on June 18, 2024

Are you using that as the base model? What is your actual generate.py line?

from h2ogpt.

han-sogawa commented on June 18, 2024

No, it looks like it is another dependency, which attempts to download regardless of which base model I am using

One example of a generate.py line that I have tried:
python generate.py --base_model=meta-llama/llama-2-7b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048

from h2ogpt.

han-sogawa commented on June 18, 2024

I think this may be where it is entering to try to download the file:

h2ogpt/src/gpt_langchain.py

Line 530 in e0f5ab9

embedding = HuggingFaceInstructEmbeddings(model_name=hf_embedding_model,

from h2ogpt.

pseudotensor commented on June 18, 2024

What if you try a different embedding model, e.g. add to generate.py line:

--hf_embedding_model=sentence-transformers/all-MiniLM-L12-v2

Also, you can try disabling hf_transfer by setting this env:

export HF_HUB_ENABLE_HF_TRANSFER=0

from h2ogpt.

pseudotensor commented on June 18, 2024

FYI this is what it looks like when running your command you gave:

(h2ogpt) jon@pseudotensor:~/h2ogpt$ python generate.py --base_model=meta-llama/llama-2-7b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048
Using Model meta-llama/llama-2-7b-chat-hf
load INSTRUCTOR_Transformer
max_seq_length  512
Starting get_model: meta-llama/llama-2-7b-chat-hf 
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 614/614 [00:00<00:00, 1.45MB/s]
Overriding max_seq_len -> 4096
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.62k/1.62k [00:00<00:00, 3.93MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 8.89MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 5.95MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 876kB/s]
Overriding max_seq_len -> 4096
Overriding max_seq_len -> 4096
device_map: {'': 0}
pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 85.6MB/s]
pytorch_model-00001-of-00002.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 9.98G/9.98G [01:29<00:00, 112MB/s]
pytorch_model-00002-of-00002.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 3.50G/3.50G [00:31<00:00, 110MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:01<00:00, 60.77s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.30s/it]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 188/188 [00:00<00:00, 530kB/s]
Model {'base_model': 'meta-llama/llama-2-7b-chat-hf', 'base_model0': 'meta-llama/llama-2-7b-chat-hf', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'llama2', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n", 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', '</s>'], 'chat_sep': ' ', 'chat_turn_sep': ' </s>', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.", 'can_handle_system_prompt': True}, 'display_name': 'meta-llama/llama-2-7b-chat-hf', 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': True, 'use_flash_attention_2': False, 'load_gptq': '', 'load_awq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': None, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': '', 'model_name_gptj': '', 'model_name_gpt4all_llama': '', 'model_name_exllama_if_no_config': ''}, 'rope_scaling': {}, 'max_seq_len': 4096, 'max_output_seq_len': None, 'exllama_dict': {}, 'gptq_dict': {}, 'attention_sinks': False, 'sink_dict': {}, 'truncation_generation': False, 'hf_model_dict': {}, 'force_seq2seq_type': False, 'force_t5_type': False, 'trust_remote_code': True}
Begin auto-detect HF cache text generation models
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
No loading model philschmid/bart-large-cnn-samsum because is_encoder_decoder=True
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-instruct/68deee8b69383b30826ea2fc642ba170b89e4edd/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting `learned_pos_emb` to `False.`
  warnings.warn(f'alibi or rope is turned on, setting `learned_pos_emb` to `False.`')
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-instruct/68deee8b69383b30826ea2fc642ba170b89e4edd/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
  warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
WARNING:transformers_modules.tiiuae.falcon-40b-instruct.ecb78d97ac356d098e79f0db222c9ce7c5d9ee5f.configuration_falcon:
WARNING: You are currently loading Falcon using legacy code contained in the model repository. Falcon has now been fully ported into the Hugging Face transformers library. For the most up-to-date and high-performance version of the Falcon model code, please update to the latest version of transformers and then load the model without the trust_remote_code=True argument.

No loading model openai/whisper-large-v3 because is_encoder_decoder=True
No loading model openai/whisper-base.en because is_encoder_decoder=True
No loading model h2oai/ggml because h2oai/ggml does not appear to have a file named config.json. Checkout 'https://huggingface.co/h2oai/ggml/main' for available files.
No loading model Systran/faster-whisper-large-v3 because is_encoder_decoder=True
No loading model openai/whisper-medium because is_encoder_decoder=True
No loading model philschmid/flan-t5-base-samsum because is_encoder_decoder=True
No loading model stabilityai/stable-diffusion-xl-refiner-1.0 because stabilityai/stable-diffusion-xl-refiner-1.0 does not appear to have a file named config.json. Checkout 'https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/main' for available files.
No loading model distil-whisper/distil-large-v2 because is_encoder_decoder=True
No loading model tloen/alpaca-lora-7b because tloen/alpaca-lora-7b does not appear to have a file named config.json. Checkout 'https://huggingface.co/tloen/alpaca-lora-7b/main' for available files.
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b/039e37745f00858f0e01e988383a8c4393b1a4f5/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting `learned_pos_emb` to `False.`
  warnings.warn(f'alibi or rope is turned on, setting `learned_pos_emb` to `False.`')
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b/039e37745f00858f0e01e988383a8c4393b1a4f5/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
  warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
No loading model distil-whisper/distil-large-v3 because is_encoder_decoder=True
No loading model microsoft/speecht5_hifigan because The checkpoint you are trying to load has model type `hifigan` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
No loading model unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x because unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x does not appear to have a file named config.json. Checkout 'https://huggingface.co/unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x/main' for available files.
No loading model stabilityai/stable-diffusion-xl-base-1.0 because stabilityai/stable-diffusion-xl-base-1.0 does not appear to have a file named config.json. Checkout 'https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/main' for available files.
No loading model Salesforce/blip2-flan-t5-xl because is_encoder_decoder=True
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:103: FutureWarning: The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`. Passing this argument has no effect
  warnings.warn(
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:143: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.42, Please use `text_config.vocab_size` instead.
  warnings.warn(
No loading model google/pix2struct-textcaps-base because is_encoder_decoder=True
No loading model Salesforce/blip2-flan-t5-xxl because is_encoder_decoder=True
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/28fc475f7b73a5631fbbc6419645c27177f275d4/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting `learned_pos_emb` to `False.`
  warnings.warn(f'alibi or rope is turned on, setting `learned_pos_emb` to `False.`')
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/28fc475f7b73a5631fbbc6419645c27177f275d4/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
  warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
No loading model microsoft/speecht5_vc because is_encoder_decoder=True
No loading model microsoft/speecht5_tts because is_encoder_decoder=True
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Started Gradio Server and/or GUI: server_name: localhost port: 7860
Use local URL: http://localhost:7860/
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/pydantic/_internal/_fields.py:160: UserWarning: Field "model_info" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/pydantic/_internal/_fields.py:160: UserWarning: Field "model_names" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
OpenAI API URL: http://0.0.0.0:5000
INFO:__name__:OpenAI API URL: http://0.0.0.0:5000
OpenAI API key: EMPTY
INFO:__name__:OpenAI API key: EMPTY

All fine here.

If I remove the instructor-large model and try again:

(h2ogpt) jon@pseudotensor:~/h2ogpt$ rm -rf ~/.cache/torch/sentence_transformers/hkunlp_instructor-large/
(h2ogpt) jon@pseudotensor:~/h2ogpt$ python generate.py --base_model=meta-llama/llama-2-7b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048
Using Model meta-llama/llama-2-7b-chat-hf
.gitattributes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.48k/1.48k [00:00<00:00, 3.83MB/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [00:00<00:00, 792kB/s]
2_Dense/config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 1.52MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.15M/3.15M [00:00<00:00, 31.9MB/s]
README.md: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 1.16MB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 3.43MB/s]
config_sentence_transformers.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 122/122 [00:00<00:00, 1.59MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 1.34G/1.34G [00:13<00:00, 100MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 116kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 6.28MB/s]
spiece.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 13.6MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 12.8MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 7.18MB/s]
modules.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 461/461 [00:00<00:00, 5.86MB/s]
load INSTRUCTOR_Transformer
... same as before

It downloads fine. So I guess you have some network complication.

from h2ogpt.

han-sogawa commented on June 18, 2024

The workaround of adding --hf_embedding_model=sentence-transformers/all-MiniLM-L12-v2 worked for me, thank you! Still don't know why the instructor-large embedding file wouldn't download. I'll update if I find out more, but for now, my issue is resolved. Thank you very much!

from h2ogpt.

pytorch_model.bin 1.34G download hangs forever on Linux about h2ogpt HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent