Comments (7)
It cannot download https://huggingface.co/api/models/hkunlp/instructor-large, although I can access that URL in the browser.
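Because the URL opens in the browser but not from h2ogpt, one quick check is to request it from a Python process in the same environment, which uses that environment's proxy variables and certificates rather than the browser's. The sketch below uses only the standard library. `probe_url` is a hypothetical helper, not part of h2ogpt, and huggingface_hub uses its own HTTP client, so treat the result as a rough signal rather than an exact reproduction of the failing download.

```python
import urllib.error
import urllib.request
from typing import Tuple


def probe_url(url: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Fetch `url` from this Python process and summarize the outcome.

    Returns (True, "HTTP <status>") when the request succeeds, otherwise
    (False, <short reason>). urllib honours HTTP_PROXY / HTTPS_PROXY /
    NO_PROXY from the environment.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server (or a proxy) answered, but with an error status such as 403.
        return False, f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # No usable answer: DNS failure, refused connection, TLS/certificate error.
        return False, f"URL error: {exc.reason}"
    except ValueError as exc:
        # Malformed URL, e.g. a missing scheme.
        return False, f"bad URL: {exc}"
    except OSError as exc:
        # Anything else at the socket level, e.g. a read timeout.
        return False, f"{type(exc).__name__}: {exc}"
```

For example, calling `probe_url("https://huggingface.co/api/models/hkunlp/instructor-large")` inside the h2ogpt conda environment and getting a URL error (proxy, DNS or certificate) while the browser works would point at the environment's network setup rather than at h2ogpt itself.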
Are you using that as the base model? What is your exact generate.py command?
No, it looks like it is a separate dependency that tries to download regardless of which base model I use.
One example of a generate.py command that I have tried:
python generate.py --base_model=meta-llama/llama-2-7b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048
I think this may be where it tries to download the file:
Line 530 in e0f5ab9
What if you try a different embedding model? For example, add this to the generate.py command:
--hf_embedding_model=sentence-transformers/all-MiniLM-L12-v2
Also, you can try disabling hf_transfer by setting this environment variable:
export HF_HUB_ENABLE_HF_TRANSFER=0
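Both suggestions can be applied at once. A minimal sketch, assuming you launch h2ogpt from Python rather than from a shell: it reuses the flags from the generate.py command earlier in this thread, appends `--hf_embedding_model`, and sets `HF_HUB_ENABLE_HF_TRANSFER=0` only in the child process's environment. `build_launch` and `launch` are hypothetical helper names. Passing the variable through the subprocess environment also ensures it is already set when huggingface_hub reads its configuration at import time.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

# Flags copied from the generate.py command used earlier in this thread.
BASE_ARGS = [
    "--base_model=meta-llama/llama-2-7b-chat-hf",
    "--score_model=None",
    "--langchain_mode=UserData",  # the shell strips the quotes in 'UserData'
    "--user_path=user_path",
    "--use_auth_token=True",
    "--max_seq_len=4096",
    "--max_max_new_tokens=2048",
]


def build_launch(
    embedding_model: str = "sentence-transformers/all-MiniLM-L12-v2",
    disable_hf_transfer: bool = True,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running generate.py with the suggested workarounds."""
    argv = [sys.executable, "generate.py", *BASE_ARGS,
            f"--hf_embedding_model={embedding_model}"]
    env = dict(os.environ)  # copy, so the current process is left untouched
    if disable_hf_transfer:
        # Equivalent of `export HF_HUB_ENABLE_HF_TRANSFER=0`, scoped to the child
        env["HF_HUB_ENABLE_HF_TRANSFER"] = "0"
    return argv, env


def launch() -> None:
    """Run generate.py; assumes the current directory is the h2ogpt checkout."""
    argv, env = build_launch()
    subprocess.run(argv, env=env, check=True)
```

Setting the variable to 0 only matters if the hf_transfer package is installed; with it off, huggingface_hub falls back to its default downloader, which is often easier to debug behind proxies.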
FYI, this is what it looks like when I run the command you gave:
(h2ogpt) jon@pseudotensor:~/h2ogpt$ python generate.py --base_model=meta-llama/llama-2-7b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048
Using Model meta-llama/llama-2-7b-chat-hf
load INSTRUCTOR_Transformer
max_seq_length 512
Starting get_model: meta-llama/llama-2-7b-chat-hf
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 614/614 [00:00<00:00, 1.45MB/s]
Overriding max_seq_len -> 4096
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.62k/1.62k [00:00<00:00, 3.93MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 8.89MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 5.95MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 876kB/s]
Overriding max_seq_len -> 4096
Overriding max_seq_len -> 4096
device_map: {'': 0}
pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 85.6MB/s]
pytorch_model-00001-of-00002.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 9.98G/9.98G [01:29<00:00, 112MB/s]
pytorch_model-00002-of-00002.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 3.50G/3.50G [00:31<00:00, 110MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:01<00:00, 60.77s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.30s/it]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 188/188 [00:00<00:00, 530kB/s]
Model {'base_model': 'meta-llama/llama-2-7b-chat-hf', 'base_model0': 'meta-llama/llama-2-7b-chat-hf', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'llama2', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n", 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', '</s>'], 'chat_sep': ' ', 'chat_turn_sep': ' </s>', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.", 'can_handle_system_prompt': True}, 'display_name': 'meta-llama/llama-2-7b-chat-hf', 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': True, 'use_flash_attention_2': False, 'load_gptq': '', 'load_awq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': None, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': '', 'model_name_gptj': '', 'model_name_gpt4all_llama': '', 'model_name_exllama_if_no_config': ''}, 'rope_scaling': {}, 'max_seq_len': 4096, 'max_output_seq_len': None, 'exllama_dict': {}, 'gptq_dict': {}, 'attention_sinks': False, 'sink_dict': {}, 'truncation_generation': False, 'hf_model_dict': {}, 'force_seq2seq_type': False, 'force_t5_type': False, 'trust_remote_code': True}
Begin auto-detect HF cache text generation models
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
No loading model philschmid/bart-large-cnn-samsum because is_encoder_decoder=True
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-instruct/68deee8b69383b30826ea2fc642ba170b89e4edd/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting `learned_pos_emb` to `False.`
warnings.warn(f'alibi or rope is turned on, setting `learned_pos_emb` to `False.`')
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-instruct/68deee8b69383b30826ea2fc642ba170b89e4edd/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
WARNING:transformers_modules.tiiuae.falcon-40b-instruct.ecb78d97ac356d098e79f0db222c9ce7c5d9ee5f.configuration_falcon:
WARNING: You are currently loading Falcon using legacy code contained in the model repository. Falcon has now been fully ported into the Hugging Face transformers library. For the most up-to-date and high-performance version of the Falcon model code, please update to the latest version of transformers and then load the model without the trust_remote_code=True argument.
No loading model openai/whisper-large-v3 because is_encoder_decoder=True
No loading model openai/whisper-base.en because is_encoder_decoder=True
No loading model h2oai/ggml because h2oai/ggml does not appear to have a file named config.json. Checkout 'https://huggingface.co/h2oai/ggml/main' for available files.
No loading model Systran/faster-whisper-large-v3 because is_encoder_decoder=True
No loading model openai/whisper-medium because is_encoder_decoder=True
No loading model philschmid/flan-t5-base-samsum because is_encoder_decoder=True
No loading model stabilityai/stable-diffusion-xl-refiner-1.0 because stabilityai/stable-diffusion-xl-refiner-1.0 does not appear to have a file named config.json. Checkout 'https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/main' for available files.
No loading model distil-whisper/distil-large-v2 because is_encoder_decoder=True
No loading model tloen/alpaca-lora-7b because tloen/alpaca-lora-7b does not appear to have a file named config.json. Checkout 'https://huggingface.co/tloen/alpaca-lora-7b/main' for available files.
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b/039e37745f00858f0e01e988383a8c4393b1a4f5/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting `learned_pos_emb` to `False.`
warnings.warn(f'alibi or rope is turned on, setting `learned_pos_emb` to `False.`')
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b/039e37745f00858f0e01e988383a8c4393b1a4f5/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
No loading model distil-whisper/distil-large-v3 because is_encoder_decoder=True
No loading model microsoft/speecht5_hifigan because The checkpoint you are trying to load has model type `hifigan` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
No loading model unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x because unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x does not appear to have a file named config.json. Checkout 'https://huggingface.co/unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x/main' for available files.
No loading model stabilityai/stable-diffusion-xl-base-1.0 because stabilityai/stable-diffusion-xl-base-1.0 does not appear to have a file named config.json. Checkout 'https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/main' for available files.
No loading model Salesforce/blip2-flan-t5-xl because is_encoder_decoder=True
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:103: FutureWarning: The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`. Passing this argument has no effect
warnings.warn(
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/llava/configuration_llava.py:143: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.42, Please use `text_config.vocab_size` instead.
warnings.warn(
No loading model google/pix2struct-textcaps-base because is_encoder_decoder=True
No loading model Salesforce/blip2-flan-t5-xxl because is_encoder_decoder=True
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/28fc475f7b73a5631fbbc6419645c27177f275d4/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting `learned_pos_emb` to `False.`
warnings.warn(f'alibi or rope is turned on, setting `learned_pos_emb` to `False.`')
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/28fc475f7b73a5631fbbc6419645c27177f275d4/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
No loading model microsoft/speecht5_vc because is_encoder_decoder=True
No loading model microsoft/speecht5_tts because is_encoder_decoder=True
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Started Gradio Server and/or GUI: server_name: localhost port: 7860
Use local URL: http://localhost:7860/
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/pydantic/_internal/_fields.py:160: UserWarning: Field "model_info" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/pydantic/_internal/_fields.py:160: UserWarning: Field "model_names" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
OpenAI API URL: http://0.0.0.0:5000
INFO:__name__:OpenAI API URL: http://0.0.0.0:5000
OpenAI API key: EMPTY
INFO:__name__:OpenAI API key: EMPTY
All fine here.
If I remove the cached instructor-large model and try again:
(h2ogpt) jon@pseudotensor:~/h2ogpt$ rm -rf ~/.cache/torch/sentence_transformers/hkunlp_instructor-large/
(h2ogpt) jon@pseudotensor:~/h2ogpt$ python generate.py --base_model=meta-llama/llama-2-7b-chat-hf --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048
Using Model meta-llama/llama-2-7b-chat-hf
.gitattributes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.48k/1.48k [00:00<00:00, 3.83MB/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [00:00<00:00, 792kB/s]
2_Dense/config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 1.52MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.15M/3.15M [00:00<00:00, 31.9MB/s]
README.md: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 1.16MB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 3.43MB/s]
config_sentence_transformers.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 122/122 [00:00<00:00, 1.59MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 1.34G/1.34G [00:13<00:00, 100MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 116kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 6.28MB/s]
spiece.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 13.6MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 12.8MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 7.18MB/s]
modules.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 461/461 [00:00<00:00, 5.86MB/s]
load INSTRUCTOR_Transformer
... same as before
It downloads fine, so I guess you have some network issue.
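To repeat that test from a clean state, the cached copy has to be deleted first, as the `rm -rf` in the transcript does. A small Python equivalent, assuming the default location shown in this transcript (`~/.cache/torch/sentence_transformers/hkunlp_instructor-large`). Newer sentence-transformers releases may cache under the Hugging Face hub cache instead, and the `SENTENCE_TRANSFORMERS_HOME` environment variable may override the location, so confirm where your files actually live before relying on it. Both helper names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def instructor_cache_dir(home: Optional[Path] = None) -> Path:
    """Cache folder removed by the `rm -rf` in the transcript (default layout assumed)."""
    base = Path.home() if home is None else home
    return base / ".cache" / "torch" / "sentence_transformers" / "hkunlp_instructor-large"


def clear_instructor_cache(home: Optional[Path] = None) -> bool:
    """Delete the cached instructor-large folder; return True if one was removed."""
    target = instructor_cache_dir(home)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

If `clear_instructor_cache()` returns True, the next generate.py run should re-download the instructor-large files, as in the transcript above; a failure at that point reproduces the download problem on demand.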
The workaround of adding --hf_embedding_model=sentence-transformers/all-MiniLM-L12-v2 worked for me, thank you! I still don't know why the instructor-large embedding files wouldn't download. I'll update if I find out more, but for now my issue is resolved. Thank you very much!