
chatglm3.openvino Demo

Here is an example of how to deploy ChatGLM3 with OpenVINO.

1. Environment configuration

We recommend that you create a new virtual environment and then install the dependencies as follows. The recommended Python version is 3.10+.

Linux

python3 -m venv openvino_env

source openvino_env/bin/activate

python3 -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt

Windows PowerShell

python3 -m venv openvino_env

.\openvino_env\Scripts\activate

python3 -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt

2. Convert model

Since the Hugging Face model needs to be converted to an OpenVINO IR model, you first need to download the model and then run the conversion:

python3 convert.py --model_id THUDM/chatglm3-6b --precision int4 --output {your_path}/chatglm3-6b-ov

Available parameters

  • --model_id - the model ID from the Hugging Face Hub (https://huggingface.co/models) or the path to the directory where the model is located.

  • --precision - model precision: fp16, int8 or int4.

  • --output - the path where the converted model is saved.

  • If you have difficulty accessing Hugging Face, you can try using the hf-mirror endpoint for the download:

    Linux

    export HF_ENDPOINT=https://hf-mirror.com
    

    Windows PowerShell

    $env:HF_ENDPOINT = "https://hf-mirror.com"
    

    Download the model

    huggingface-cli download --resume-download --local-dir-use-symlinks False THUDM/chatglm3-6b --local-dir {your_path}/chatglm3-6b
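
For reference, here is a minimal sketch of what a conversion script like convert.py typically does via optimum-intel. This illustrates the general export-and-compress flow only; the exact flags and structure of the repository's convert.py may differ, and output_dir below is a placeholder.

from optimum.intel import OVWeightQuantizationConfig
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "THUDM/chatglm3-6b"
output_dir = "chatglm3-6b-ov"  # placeholder output path

# Export the PyTorch checkpoint to OpenVINO IR, compressing weights to int4.
# ChatGLM3 ships custom modeling code, hence trust_remote_code=True.
ov_model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    trust_remote_code=True,
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
ov_model.save_pretrained(output_dir)

# Save the tokenizer next to the IR so the chat script can load both from one directory.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.save_pretrained(output_dir)

The int4 option maps to NNCF weight compression, which is why conversion logs (like those quoted in the issues below) print NNCF bitwidth statistics.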
    

3. Run the streaming chatbot

python3 chat.py --model_path {your_path}/chatglm3-6b-ov --max_sequence_length 4096 --device CPU

Available parameters

  • --model_path - The path to the directory where the OpenVINO IR model is located.
  • --max_sequence_length - Maximum number of output tokens.
  • --device - The device to run inference on, e.g. "CPU" or "GPU".
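
For reference, this is a minimal sketch of how such a streaming chat loop can be wired up with optimum-intel and transformers. It shows the general pattern, not the repository's exact chat.py; the model path is a placeholder and the generation settings are illustrative.

from threading import Thread

from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, TextIteratorStreamer

model_path = "{your_path}/chatglm3-6b-ov"  # placeholder: output of the convert step

# ChatGLM3 ships custom tokenizer/model code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_path, device="CPU", trust_remote_code=True)

# Stream tokens as they are generated instead of waiting for the full answer.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("Hello", return_tensors="pt")
generation_kwargs = dict(**inputs, max_new_tokens=256, do_sample=True, streamer=streamer)

# generate() blocks, so it runs in a worker thread while the main thread prints the stream.
Thread(target=model.generate, kwargs=generation_kwargs).start()
for new_text in streamer:
    print(new_text, end="", flush=True)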

Example

User: Hello
AI Assistant: Hello! Is there anything I can do to help you?

User: Who are you?
ChatGLM3-6B-OpenVINO: I am an artificial intelligence assistant named ChatGLM3-6B, which was developed from a language model jointly trained by Tsinghua University’s KEG Laboratory and Zhipu AI Company in 2023. My role is to provide appropriate responses and support to users' questions and requests.

User: Please tell me an interesting story
ChatGLM3-6B-OpenVINO: Once upon a time, there was a little boy named Xiao Ming who loved animals very much. One day, while walking in the forest, he discovered a very beautiful bird. The bird was injured and could not fly. Xiao Ming felt very distressed and decided to take care of it. He took the bird home, built a small nest for it, and found some soft grass and food. Every day, he fed the bird and changed its water. Gradually, the bird recovered from its injury and began to fly around Xiao Ming's home. They became very good friends. However, one day, Xiao Ming's parents told him that they had to send the little bird back to the forest. Xiao Ming was very sad because he had become good friends with the bird. However, his parents told him that the bird would be freer in the forest and that he could continue to watch its life there. So, Xiao Ming and his parents sent the bird to the forest. The little bird was very happy because it could fly again and had many other little animal friends. Xiao Ming also felt very happy because he knew that even if he could not always keep the bird, he could still appreciate its beauty in nature. From then on, Xiao Ming often came to the forest to look for the bird.

User: Please give this story a title
ChatGLM3-6B-OpenVINO: "The Power of Friendship: Xiao Ming and Bird's Forest Adventure"

Common problems

  1. Why does a Hugging Face link error appear when importing a local model?

    • Downgrade the transformers library to version 4.37.2.
  2. Do I need to install the OpenVINO C++ inference engine?

    • No, it is not necessary.
  3. Do I have to use Intel hardware?

    • We have only tried it on Intel devices, and we recommend x86 Intel devices, including but not limited to:
    • Intel CPUs, including personal computer CPUs and server CPUs.
    • Intel integrated GPUs, for example the Arc™ and Iris® series.
    • Intel discrete GPUs, for example the Arc™ A770 graphics card.
  4. Why can't OpenVINO find a GPU device on my system?

    • Ensure the OpenCL drivers are installed correctly.
    • Ensure you have enabled the right permissions for the GPU device.
    • More information can be found in Install GPU drivers; a quick way to check what OpenVINO detects is shown after this list.
  5. Is C++ supported?
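
Regarding question 4, a quick way to check which devices OpenVINO actually sees is to query the runtime directly; this is the standard OpenVINO Python API, shown here as a small self-contained check.

import openvino as ov

core = ov.Core()
# Lists the devices the runtime detects, e.g. ['CPU', 'GPU'] once drivers are set up.
print(core.available_devices)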


chatglm3.openvino's Issues

rag_chain.invoke has an error: "TypeError: 'NoneType' object is not callable"

My code is here:
import argparse
from typing import List, Tuple
from threading import Event, Thread

import torch
from optimum.intel.openvino import OVModelForCausalLM
from transformers import (AutoTokenizer, AutoConfig, pipeline,
                          TextIteratorStreamer, StoppingCriteriaList, StoppingCriteria)
from langchain_community.vectorstores import FAISS
from langchain.prompts.prompt import PromptTemplate
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA


def create_and_load_faiss_index(read_local=None, path=None, document_list=[]):
    global db
    if read_local is True:
        # Read the local index
        # db = FAISS.load_local(path, embeddings)
        db = FAISS.load_local(path, embeddings, allow_dangerous_deserialization=True)
    else:
        print("Building the vector database...")
        db = FAISS.from_documents(document_list, embeddings)
        db.save_local(path)
    return db


class StopOnTokens(StoppingCriteria):
    def __init__(self, token_ids):
        self.token_ids = token_ids

    def __call__(
            self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        for stop_id in self.token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-h',
                        '--help',
                        action='help',
                        help='Show this help message and exit.')
    parser.add_argument('-m',
                        '--model_path',
                        required=True,
                        type=str,
                        help='Required. model path')
    parser.add_argument('-l',
                        '--max_sequence_length',
                        default=256,
                        required=False,
                        type=int,
                        help='Required. maximum length of output')
    parser.add_argument('-d',
                        '--device',
                        default='CPU',
                        required=False,
                        type=str,
                        help='Required. device for inference')
    args = parser.parse_args()
    model_dir = args.model_path

    ov_config = {"PERFORMANCE_HINT": "LATENCY",
                 "NUM_STREAMS": "1", "CACHE_DIR": ""}

    tokenizer = AutoTokenizer.from_pretrained(
        model_dir, trust_remote_code=True)

    print("====Compiling model====")
    ov_model = OVModelForCausalLM.from_pretrained(
        model_dir,
        device=args.device,
        ov_config=ov_config,
        config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
        trust_remote_code=True,
    )
    # TextIteratorStreamer ???
    streamer = TextIteratorStreamer(
        tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True
    )
    stop_tokens = [0, 2]
    print('StopOnTokens')

    stop_tokens = [StopOnTokens(stop_tokens)]

    embeddingsModelPath: str = 'D:/AI_projects/Langchain-Chatchat/llm_model/Embedding/bge-large-zh-v1.5'
    embeddings = HuggingFaceEmbeddings(
        model_name=embeddingsModelPath,  # Provide the pre-trained model's path
        model_kwargs={"trust_remote_code": True, 'device': 'cpu'},  # Pass the model configuration options
        encode_kwargs={'normalize_embeddings': False}  # Pass the encoding options
    )
    db = create_and_load_faiss_index(read_local=True, path="pkl_chatglm3", document_list=[])

    history = []
    print("====Starting conversation====")

    input_text = '发什么快递'  # "Which courier do you ship with?"

    print("ChatGLM3-6B-OpenVINO:", end=" ")
    # history = history + [[parse_text(input_text), ""]]
    # model_inputs = convert_history_to_token(history)

    docs_and_scores = db.similarity_search_with_score(input_text)
    print('docs_and_scores')
    print(input_text)
    print(docs_and_scores)

    retriever = db.as_retriever(search_kwargs={"k": 3})

    # StoppingCriteriaList ???
    generate_kwargs = dict(
        # input_ids=model_inputs,
        model=ov_model,
        max_new_tokens=args.max_sequence_length,
        temperature=0.1,
        do_sample=True,
        top_p=1.0,
        top_k=50,
        repetition_penalty=1.1,
        streamer=streamer,
        stopping_criteria=StoppingCriteriaList(stop_tokens)
    )

    pipe = pipeline("text-generation", **generate_kwargs)
    llm = HuggingFacePipeline(pipeline=pipe)

    # prompt = PromptTemplate.from_template(llm_model_configuration["rag_prompt_template"])
    # Chinese RAG prompt: answer in detail from the provided context; if the answer
    # is not in the context, say it is unavailable rather than guessing.
    prompt_template = """
            尽可能详细地回答问题,从提供的上下文中提供所有细节。如果答案不在提供的上下文中,请说“答案在上下文中不可用”,不要提供错误的答案。\n\n
            上下文:\n {context}?\n
            问题: \n{question}\n

            答案:"""
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    chain_type_kwargs = {"prompt": prompt}
    rag_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        verbose=True
        # chain_type_kwargs=chain_type_kwargs,
    )

    print('question')
    print(input_text)
    rag_chain.invoke(input_text)
    # stream_complete.set()
But I encounter this error:

> Entering new RetrievalQA chain...

Traceback (most recent call last):
  File "D:\AI_projects\chatglm3.openvino\chat_from_doc_new.py", line 209, in <module>
    rag_chain.invoke(input_text)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 163, in invoke
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\retrieval_qa\base.py", line 144, in _call
    answer = self.combine_documents_chain.run(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 574, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 378, in __call__
    return self.invoke(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 163, in invoke
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\combine_documents\base.py", line 137, in _call
    output, extra_return_dict = self.combine_docs(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 244, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\llm.py", line 293, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 378, in __call__
    return self.invoke(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 163, in invoke
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\llm.py", line 103, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\llm.py", line 115, in generate
    return self.llm.generate_prompt(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 597, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 767, in generate
    output = self._generate_helper(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 634, in _generate_helper
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 621, in _generate_helper
    self._generate(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_community\llms\huggingface_pipeline.py", line 267, in _generate
    responses = self.pipeline(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\text_generation.py", line 240, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\base.py", line 1187, in __call__
    outputs = list(final_iterator)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\pt_utils.py", line 19, in __getitem__
    processed = self.process(item, **self.params)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\text_generation.py", line 264, in preprocess
    inputs = self.tokenizer(
TypeError: 'NoneType' object is not callable

Can you help with this? Thanks!
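
The final frame shows that self.tokenizer is None inside transformers' text-generation pipeline. One plausible cause, offered here only as an unconfirmed suggestion: pipeline() receives the model object but no tokenizer, and it cannot auto-resolve ChatGLM3's custom tokenizer from an OpenVINO model. A minimal sketch of passing the already-loaded tokenizer explicitly, reusing the variables from the code above:

pipe = pipeline(
    "text-generation",
    model=ov_model,
    tokenizer=tokenizer,  # without this, the pipeline's tokenizer can end up as None
    max_new_tokens=args.max_sequence_length,
    streamer=streamer,
    stopping_criteria=StoppingCriteriaList(stop_tokens),
)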

convert error

An error happens when I convert the bin model to OpenVINO; can somebody explain?
[Screenshot: Snipaste_2024-04-24_17-32-52]

DLL load failed while importing nct_ufunc: Operation did not complete successfully because the file contains a virus or potentially unwanted software.

(chatglm3) C:\Intel\chatglm3.openvino>python chat.py --model_path c:/models/chatglm3-6b-ov-int4 --max_sequence_length 4096 --device CPU
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
Traceback (most recent call last):
  File "C:\env\chatglm3\lib\site-packages\transformers\utils\import_utils.py", line 1472, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Program Files\Python39\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\env\chatglm3\lib\site-packages\transformers\data\__init__.py", line 26, in <module>
    from .metrics import glue_compute_metrics, xnli_compute_metrics
  File "C:\env\chatglm3\lib\site-packages\transformers\data\metrics\__init__.py", line 19, in <module>
    from scipy.stats import pearsonr, spearmanr
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\__init__.py", line 606, in <module>
    from ._stats_py import *
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\_stats_py.py", line 49, in <module>
    from . import distributions
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\distributions.py", line 10, in <module>
    from . import _continuous_distns
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\_continuous_distns.py", line 33, in <module>
    import scipy.stats._boost as boost
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\_boost\__init__.py", line 37, in <module>
    from scipy.stats._boost.nct_ufunc import (
ImportError: DLL load failed while importing nct_ufunc: Operation did not complete successfully because the file contains a virus or potentially unwanted software.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Intel\chatglm3.openvino\chat.py", line 4, in <module>
    from optimum.intel.openvino import OVModelForCausalLM
  File "C:\env\chatglm3\lib\site-packages\optimum\intel\openvino\__init__.py", line 39, in <module>
    from .quantization import OVQuantizer
  File "C:\env\chatglm3\lib\site-packages\optimum\intel\openvino\quantization.py", line 36, in <module>
    from transformers import AutoTokenizer, DataCollator, PreTrainedModel, default_data_collator
  File "<frozen importlib._bootstrap>", line 1055, in _handle_fromlist
  File "C:\env\chatglm3\lib\site-packages\transformers\utils\import_utils.py", line 1462, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "C:\env\chatglm3\lib\site-packages\transformers\utils\import_utils.py", line 1474, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.data.data_collator because of the following error (look up to see its traceback):
DLL load failed while importing nct_ufunc: Operation did not complete successfully because the file contains a virus or potentially unwanted software.

Convert error

I tried to convert the model on Windows and encountered different errors in two attempts. The errors are reported below; how can I fix these problems?

First:
(openvino_env) C:\Users\dell\chatglm3.openvino>python convert.py --model_id F:/LLM/chatglm3-6b-modelscope/chatglm3-6b --precision int4 --output F:/chatglm3-6b-ov
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
====Exporting IR=====
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [06:13<00:00, 53.39s/it]
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.2+cpu
WARNING:root:Cannot apply model.to_bettertransformer because of the exception:
The model type chatglm is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']).. Usage model with stateful=True may be non-effective if model does not contain torch.functional.scaled_dot_product_attention
Overriding 1 configuration item(s)
- use_cache -> True
C:\Users\dell\openvino_env\lib\site-packages\transformers\modeling_utils.py:4225: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
C:\Users\dell\openvino_env\lib\site-packages\optimum\exporters\openvino\model_patcher.py:198: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_length:
WARNING:nncf:Weight compression expects a single reduction axis, but 2 given. Weight shape: (8192, 32, 2), reduction axes: (1, 2), node name: __module.transformer/aten::index/Gather. The node won't be quantized.
Searching for Mixed-Precision Configuration ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 112/112 • 0:03:55 • 0:00:00
INFO:nncf:Statistics of the bitwidth distribution:
+--------------+---------------------------+-----------------------------------+
| Num bits (N) | % all parameters (layers) | % ratio-defining parameters |
| | | (layers) |
+==============+===========================+===================================+
| 8 | 28% (31 / 114) | 21% (29 / 112) |
+--------------+---------------------------+-----------------------------------+
| 4 | 72% (83 / 114) | 79% (83 / 112) |
+--------------+---------------------------+-----------------------------------+
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 114/114 • 0:04:48 • 0:00:00
Exception ignored in: <finalize object at 0x285f445c720; dead>
Traceback (most recent call last):
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 859, in _cleanup
    cls._rmtree(name, ignore_errors=ignore_errors)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 750, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 620, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 846, in onerror
    cls._rmtree(path, ignore_errors=ignore_errors)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 750, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 601, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 598, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\dell\AppData\Local\Temp\tmpl2bqxzug\openvino_model.bin'
Configuration saved in F:\chatglm3-6b-ov\openvino_config.json
====Exporting tokenizer=====
WARNING:transformers_modules.chatglm3-6b.tokenization_chatglm:Setting eos_token is not supported, use the default one.
WARNING:transformers_modules.chatglm3-6b.tokenization_chatglm:Setting pad_token is not supported, use the default one.
WARNING:transformers_modules.chatglm3-6b.tokenization_chatglm:Setting unk_token is not supported, use the default one.

Second:
(openvino_env) C:\Users\dell\chatglm3.openvino>python convert.py --model_id F:\LLM\chatglm3-6b-modelscope\chatglm3-6b --output F:\chatglm3-6b-OV
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
====Exporting IR=====
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [06:38<00:00, 56.91s/it]
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.2+cpu
WARNING:root:Cannot apply model.to_bettertransformer because of the exception:
The model type chatglm is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']).. Usage model with stateful=True may be non-effective if model does not contain torch.functional.scaled_dot_product_attention
Overriding 1 configuration item(s)
- use_cache -> True
C:\Users\dell\openvino_env\lib\site-packages\transformers\modeling_utils.py:4225: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
C:\Users\dell\openvino_env\lib\site-packages\optimum\exporters\openvino\model_patcher.py:198: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_length:
Traceback (most recent call last):
  File "C:\Users\dell\chatglm3.openvino\convert.py", line 53, in <module>
    ov_model = OVModelForCausalLM.from_pretrained(args.model_id, export=True,
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\modeling_base.py", line 401, in from_pretrained
    return from_pretrained_method(
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 268, in _from_transformers
    return cls._from_pretrained(
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 571, in _from_pretrained
    model = cls.load_model(model_cache_path, quantization_config=None if load_in_4bit else quantization_config)
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\intel\openvino\modeling_base.py", line 126, in load_model
    model = core.read_model(file_name) if not file_name.suffix == ".onnx" else convert_model(file_name)
  File "C:\Users\dell\openvino_env\lib\site-packages\openvino\runtime\ie_api.py", line 479, in read_model
    return Model(super().read_model(model))
RuntimeError: Exception from src\inference\src\cpp\core.cpp:92:
RuntimeError: Exception from src\inference\src\cpp\core.cpp:92:
Check 'false' failed at src\frontends\common\src\frontend.cpp:54:
Converting input model
stoll argument out of range
