
chatglm3.openvino Demo

Here is an example of how to deploy ChatGLM3 with OpenVINO.

1. Environment configuration

We recommend that you create a new virtual environment and then install the dependencies as follows. The recommended Python version is 3.10+.

Linux

python3 -m venv openvino_env

source openvino_env/bin/activate

python3 -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt

Windows PowerShell

python3 -m venv openvino_env

.\openvino_env\Scripts\activate

python3 -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt

2. Convert model

Since the Hugging Face model needs to be converted to an OpenVINO IR model, you first need to download the model and then run the conversion:

python3 convert.py --model_id THUDM/chatglm3-6b --precision int4 --output {your_path}/chatglm3-6b-ov

Available parameters

  • --model_id - the model ID from the Hugging Face Hub (https://huggingface.co/models) or the path to the directory where the model is located.

  • --precision - model precision: fp16, int8 or int4.

  • --output - the path where the converted model is saved.

  • If you have difficulty accessing Hugging Face, you can try using the hf-mirror endpoint for the download:

    Linux

    export HF_ENDPOINT=https://hf-mirror.com
    

    Windows PowerShell

    $env:HF_ENDPOINT = "https://hf-mirror.com"
    

    Download the model

    huggingface-cli download --resume-download --local-dir-use-symlinks False THUDM/chatglm3-6b --local-dir {your_path}/chatglm3-6b
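
For reference, here is a minimal sketch of what a conversion script like convert.py typically does via optimum-intel. This illustrates the general export-and-compress flow only; the exact flags and structure of the repository's convert.py may differ, and output_dir below is a placeholder.

from optimum.intel import OVWeightQuantizationConfig
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "THUDM/chatglm3-6b"
output_dir = "chatglm3-6b-ov"  # placeholder output path

# Export the PyTorch checkpoint to OpenVINO IR, compressing weights to int4.
# ChatGLM3 ships custom modeling code, hence trust_remote_code=True.
ov_model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    trust_remote_code=True,
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
ov_model.save_pretrained(output_dir)

# Save the tokenizer next to the IR so the chat script can load both from one directory.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.save_pretrained(output_dir)

The int4 option maps to NNCF weight compression, which is why conversion logs (like those quoted in the issues below) print NNCF bitwidth statistics.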
    

3. Run the streaming chatbot

python3 chat.py --model_path {your_path}/chatglm3-6b-ov --max_sequence_length 4096 --device CPU

Available parameters

  • --model_path - The path to the directory where the OpenVINO IR model is located.
  • --max_sequence_length - Maximum number of output tokens.
  • --device - The device to run inference on, e.g. "CPU" or "GPU".
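
For reference, this is a minimal sketch of how such a streaming chat loop can be wired up with optimum-intel and transformers. It shows the general pattern, not the repository's exact chat.py; the model path is a placeholder and the generation settings are illustrative.

from threading import Thread

from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, TextIteratorStreamer

model_path = "{your_path}/chatglm3-6b-ov"  # placeholder: output of the convert step

# ChatGLM3 ships custom tokenizer/model code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_path, device="CPU", trust_remote_code=True)

# Stream tokens as they are generated instead of waiting for the full answer.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("Hello", return_tensors="pt")
generation_kwargs = dict(**inputs, max_new_tokens=256, do_sample=True, streamer=streamer)

# generate() blocks, so it runs in a worker thread while the main thread prints the stream.
Thread(target=model.generate, kwargs=generation_kwargs).start()
for new_text in streamer:
    print(new_text, end="", flush=True)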

Example

User: Hello
AI Assistant: Hello! Is there anything I can do to help you?

User: Who are you?
ChatGLM3-6B-OpenVINO: I am an artificial intelligence assistant named ChatGLM3-6B, which was developed from a language model jointly trained by Tsinghua University’s KEG Laboratory and Zhipu AI Company in 2023. My role is to provide appropriate responses and support to users' questions and requests.

User: Please tell me an interesting story
ChatGLM3-6B-OpenVINO: Once upon a time, there was a little boy named Xiao Ming who loved animals very much. One day, while walking in the forest, he discovered a very beautiful bird. The bird was injured and could not fly. Xiao Ming felt very distressed and decided to take care of it. He took the bird home, built a small nest for it, and found some soft grass and food. Every day, he fed the bird and changed its water. Gradually, the bird recovered from its injury and began to fly around Xiao Ming's home. They became very good friends. However, one day, Xiao Ming's parents told him that they had to send the little bird back to the forest. Xiao Ming was very sad because he had become good friends with the bird. However, his parents told him that the bird would be freer in the forest and that he could continue to watch its life there. So, Xiao Ming and his parents sent the bird to the forest. The little bird was very happy because it could fly again and had many other little animal friends. Xiao Ming also felt very happy because he knew that even if he could not always keep the bird, he could still appreciate its beauty in nature. From then on, Xiao Ming often came to the forest to look for the bird.

User: Please give this story a title
ChatGLM3-6B-OpenVINO: "The Power of Friendship: Xiao Ming and Bird's Forest Adventure"

Common problems

  1. Why does a Hugging Face link error appear when importing a local model?

    • Downgrade the transformers library to version 4.37.2.
  2. Do I need to install the OpenVINO C++ inference engine?

    • No, it is not necessary.
  3. Do I have to use Intel hardware?

    • We have only tried it on Intel devices, and we recommend x86 Intel devices, including but not limited to:
    • Intel CPUs, including personal computer CPUs and server CPUs.
    • Intel integrated GPUs, for example the Arc™ and Iris® series.
    • Intel discrete GPUs, for example the Arc™ A770 graphics card.
  4. Why can't OpenVINO find a GPU device on my system?

    • Ensure the OpenCL drivers are installed correctly.
    • Ensure you have enabled the right permissions for the GPU device.
    • More information can be found in Install GPU drivers; a quick way to check what OpenVINO detects is shown after this list.
  5. Is C++ supported?
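
Regarding question 4, a quick way to check which devices OpenVINO actually sees is to query the runtime directly; this is the standard OpenVINO Python API, shown here as a small self-contained check.

import openvino as ov

core = ov.Core()
# Lists the devices the runtime detects, e.g. ['CPU', 'GPU'] once drivers are set up.
print(core.available_devices)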


chatglm3.openvino's Issues

rag_chain.invoke has an error: "TypeError: 'NoneType' object is not callable"

My code is here:
import argparse
from typing import List, Tuple
from threading import Event, Thread

import torch
from optimum.intel.openvino import OVModelForCausalLM
from transformers import (AutoTokenizer, AutoConfig, pipeline,
                          TextIteratorStreamer, StoppingCriteriaList, StoppingCriteria)
from langchain_community.vectorstores import FAISS
from langchain.prompts.prompt import PromptTemplate
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA


def create_and_load_faiss_index(read_local=None, path=None, document_list=[]):
    global db
    if read_local is True:
        # Read the local index
        # db = FAISS.load_local(path, embeddings)
        db = FAISS.load_local(path, embeddings, allow_dangerous_deserialization=True)
    else:
        print("Building the vector database...")
        db = FAISS.from_documents(document_list, embeddings)
        db.save_local(path)
    return db


class StopOnTokens(StoppingCriteria):
    def __init__(self, token_ids):
        self.token_ids = token_ids

    def __call__(
            self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        for stop_id in self.token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-h',
                        '--help',
                        action='help',
                        help='Show this help message and exit.')
    parser.add_argument('-m',
                        '--model_path',
                        required=True,
                        type=str,
                        help='Required. model path')
    parser.add_argument('-l',
                        '--max_sequence_length',
                        default=256,
                        required=False,
                        type=int,
                        help='Required. maximum length of output')
    parser.add_argument('-d',
                        '--device',
                        default='CPU',
                        required=False,
                        type=str,
                        help='Required. device for inference')
    args = parser.parse_args()
    model_dir = args.model_path

    ov_config = {"PERFORMANCE_HINT": "LATENCY",
                 "NUM_STREAMS": "1", "CACHE_DIR": ""}

    tokenizer = AutoTokenizer.from_pretrained(
        model_dir, trust_remote_code=True)

    print("====Compiling model====")
    ov_model = OVModelForCausalLM.from_pretrained(
        model_dir,
        device=args.device,
        ov_config=ov_config,
        config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
        trust_remote_code=True,
    )
    # TextIteratorStreamer ???
    streamer = TextIteratorStreamer(
        tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True
    )
    stop_tokens = [0, 2]
    print('StopOnTokens')

    stop_tokens = [StopOnTokens(stop_tokens)]

    embeddingsModelPath: str = 'D:/AI_projects/Langchain-Chatchat/llm_model/Embedding/bge-large-zh-v1.5'
    embeddings = HuggingFaceEmbeddings(
        model_name=embeddingsModelPath,  # Provide the pre-trained model's path
        model_kwargs={"trust_remote_code": True, 'device': 'cpu'},  # Pass the model configuration options
        encode_kwargs={'normalize_embeddings': False}  # Pass the encoding options
    )
    db = create_and_load_faiss_index(read_local=True, path="pkl_chatglm3", document_list=[])

    history = []
    print("====Starting conversation====")

    input_text = '发什么快递'  # "Which courier do you ship with?"

    print("ChatGLM3-6B-OpenVINO:", end=" ")
    # history = history + [[parse_text(input_text), ""]]
    # model_inputs = convert_history_to_token(history)

    docs_and_scores = db.similarity_search_with_score(input_text)
    print('docs_and_scores')
    print(input_text)
    print(docs_and_scores)

    retriever = db.as_retriever(search_kwargs={"k": 3})

    # StoppingCriteriaList ???
    generate_kwargs = dict(
        # input_ids=model_inputs,
        model=ov_model,
        max_new_tokens=args.max_sequence_length,
        temperature=0.1,
        do_sample=True,
        top_p=1.0,
        top_k=50,
        repetition_penalty=1.1,
        streamer=streamer,
        stopping_criteria=StoppingCriteriaList(stop_tokens)
    )

    pipe = pipeline("text-generation", **generate_kwargs)
    llm = HuggingFacePipeline(pipeline=pipe)

    # prompt = PromptTemplate.from_template(llm_model_configuration["rag_prompt_template"])
    # Chinese RAG prompt: answer in detail from the provided context; if the answer
    # is not in the context, say it is unavailable rather than guessing.
    prompt_template = """
            尽可能详细地回答问题,从提供的上下文中提供所有细节。如果答案不在提供的上下文中,请说“答案在上下文中不可用”,不要提供错误的答案。\n\n
            上下文:\n {context}?\n
            问题: \n{question}\n

            答案:"""
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    chain_type_kwargs = {"prompt": prompt}
    rag_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        verbose=True
        # chain_type_kwargs=chain_type_kwargs,
    )

    print('question')
    print(input_text)
    rag_chain.invoke(input_text)
    # stream_complete.set()
But I encounter this error:

> Entering new RetrievalQA chain...

Traceback (most recent call last):
  File "D:\AI_projects\chatglm3.openvino\chat_from_doc_new.py", line 209, in <module>
    rag_chain.invoke(input_text)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 163, in invoke
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\retrieval_qa\base.py", line 144, in _call
    answer = self.combine_documents_chain.run(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 574, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 378, in __call__
    return self.invoke(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 163, in invoke
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\combine_documents\base.py", line 137, in _call
    output, extra_return_dict = self.combine_docs(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 244, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\llm.py", line 293, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 378, in __call__
    return self.invoke(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 163, in invoke
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\llm.py", line 103, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain\chains\llm.py", line 115, in generate
    return self.llm.generate_prompt(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 597, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 767, in generate
    output = self._generate_helper(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 634, in _generate_helper
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_core\language_models\llms.py", line 621, in _generate_helper
    self._generate(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\langchain_community\llms\huggingface_pipeline.py", line 267, in _generate
    responses = self.pipeline(
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\text_generation.py", line 240, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\base.py", line 1187, in __call__
    outputs = list(final_iterator)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\pt_utils.py", line 19, in __getitem__
    processed = self.process(item, **self.params)
  File "C:\ProgramData\anaconda3\envs\llm_310_onv\lib\site-packages\transformers\pipelines\text_generation.py", line 264, in preprocess
    inputs = self.tokenizer(
TypeError: 'NoneType' object is not callable

Can you help with this? Thanks!
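
The final frame shows that self.tokenizer is None inside transformers' text-generation pipeline. One plausible cause, offered here only as an unconfirmed suggestion: pipeline() receives the model object but no tokenizer, and it cannot auto-resolve ChatGLM3's custom tokenizer from an OpenVINO model. A minimal sketch of passing the already-loaded tokenizer explicitly, reusing the variables from the code above:

pipe = pipeline(
    "text-generation",
    model=ov_model,
    tokenizer=tokenizer,  # without this, the pipeline's tokenizer can end up as None
    max_new_tokens=args.max_sequence_length,
    streamer=streamer,
    stopping_criteria=StoppingCriteriaList(stop_tokens),
)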

convert error

An error happens when I convert the bin model to OpenVINO; can somebody explain?
[Screenshot: Snipaste_2024-04-24_17-32-52]

DLL load failed while importing nct_ufunc: Operation did not complete successfully because the file contains a virus or potentially unwanted software.

(chatglm3) C:\Intel\chatglm3.openvino>python chat.py --model_path c:/models/chatglm3-6b-ov-int4 --max_sequence_length 4096 --device CPU
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
Traceback (most recent call last):
  File "C:\env\chatglm3\lib\site-packages\transformers\utils\import_utils.py", line 1472, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Program Files\Python39\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\env\chatglm3\lib\site-packages\transformers\data\__init__.py", line 26, in <module>
    from .metrics import glue_compute_metrics, xnli_compute_metrics
  File "C:\env\chatglm3\lib\site-packages\transformers\data\metrics\__init__.py", line 19, in <module>
    from scipy.stats import pearsonr, spearmanr
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\__init__.py", line 606, in <module>
    from ._stats_py import *
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\_stats_py.py", line 49, in <module>
    from . import distributions
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\distributions.py", line 10, in <module>
    from . import _continuous_distns
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\_continuous_distns.py", line 33, in <module>
    import scipy.stats._boost as boost
  File "C:\env\chatglm3\lib\site-packages\scipy\stats\_boost\__init__.py", line 37, in <module>
    from scipy.stats._boost.nct_ufunc import (
ImportError: DLL load failed while importing nct_ufunc: Operation did not complete successfully because the file contains a virus or potentially unwanted software.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Intel\chatglm3.openvino\chat.py", line 4, in <module>
    from optimum.intel.openvino import OVModelForCausalLM
  File "C:\env\chatglm3\lib\site-packages\optimum\intel\openvino\__init__.py", line 39, in <module>
    from .quantization import OVQuantizer
  File "C:\env\chatglm3\lib\site-packages\optimum\intel\openvino\quantization.py", line 36, in <module>
    from transformers import AutoTokenizer, DataCollator, PreTrainedModel, default_data_collator
  File "<frozen importlib._bootstrap>", line 1055, in _handle_fromlist
  File "C:\env\chatglm3\lib\site-packages\transformers\utils\import_utils.py", line 1462, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "C:\env\chatglm3\lib\site-packages\transformers\utils\import_utils.py", line 1474, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.data.data_collator because of the following error (look up to see its traceback):
DLL load failed while importing nct_ufunc: Operation did not complete successfully because the file contains a virus or potentially unwanted software.

Convert error

I tried to convert the model on Windows and encountered different errors in two attempts. The errors are reported below; how can I fix these problems?

First:
(openvino_env) C:\Users\dell\chatglm3.openvino>python convert.py --model_id F:/LLM/chatglm3-6b-modelscope/chatglm3-6b --precision int4 --output F:/chatglm3-6b-ov
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
====Exporting IR=====
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [06:13<00:00, 53.39s/it]
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.2+cpu
WARNING:root:Cannot apply model.to_bettertransformer because of the exception:
The model type chatglm is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']).. Usage model with stateful=True may be non-effective if model does not contain torch.functional.scaled_dot_product_attention
Overriding 1 configuration item(s)
- use_cache -> True
C:\Users\dell\openvino_env\lib\site-packages\transformers\modeling_utils.py:4225: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
C:\Users\dell\openvino_env\lib\site-packages\optimum\exporters\openvino\model_patcher.py:198: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_length:
WARNING:nncf:Weight compression expects a single reduction axis, but 2 given. Weight shape: (8192, 32, 2), reduction axes: (1, 2), node name: __module.transformer/aten::index/Gather. The node won't be quantized.
Searching for Mixed-Precision Configuration ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 112/112 • 0:03:55 • 0:00:00
INFO:nncf:Statistics of the bitwidth distribution:
+--------------+---------------------------+-----------------------------------+
| Num bits (N) | % all parameters (layers) | % ratio-defining parameters |
| | | (layers) |
+==============+===========================+===================================+
| 8 | 28% (31 / 114) | 21% (29 / 112) |
+--------------+---------------------------+-----------------------------------+
| 4 | 72% (83 / 114) | 79% (83 / 112) |
+--------------+---------------------------+-----------------------------------+
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 114/114 • 0:04:48 • 0:00:00
Exception ignored in: <finalize object at 0x285f445c720; dead>
Traceback (most recent call last):
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 859, in _cleanup
    cls._rmtree(name, ignore_errors=ignore_errors)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 750, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 620, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 846, in onerror
    cls._rmtree(path, ignore_errors=ignore_errors)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 750, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 601, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "C:\Users\dell\AppData\Local\Programs\Python\Python310\lib\shutil.py", line 598, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\dell\AppData\Local\Temp\tmpl2bqxzug\openvino_model.bin'
Configuration saved in F:\chatglm3-6b-ov\openvino_config.json
====Exporting tokenizer=====
WARNING:transformers_modules.chatglm3-6b.tokenization_chatglm:Setting eos_token is not supported, use the default one.
WARNING:transformers_modules.chatglm3-6b.tokenization_chatglm:Setting pad_token is not supported, use the default one.
WARNING:transformers_modules.chatglm3-6b.tokenization_chatglm:Setting unk_token is not supported, use the default one.

Second:
(openvino_env) C:\Users\dell\chatglm3.openvino>python convert.py --model_id F:\LLM\chatglm3-6b-modelscope\chatglm3-6b --output F:\chatglm3-6b-OV
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
====Exporting IR=====
Framework not specified. Using pt to export the model.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [06:38<00:00, 56.91s/it]
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.2.2+cpu
WARNING:root:Cannot apply model.to_bettertransformer because of the exception:
The model type chatglm is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']).. Usage model with stateful=True may be non-effective if model does not contain torch.functional.scaled_dot_product_attention
Overriding 1 configuration item(s)
- use_cache -> True
C:\Users\dell\openvino_env\lib\site-packages\transformers\modeling_utils.py:4225: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
warnings.warn(
C:\Users\dell\openvino_env\lib\site-packages\optimum\exporters\openvino\model_patcher.py:198: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_length:
Traceback (most recent call last):
  File "C:\Users\dell\chatglm3.openvino\convert.py", line 53, in <module>
    ov_model = OVModelForCausalLM.from_pretrained(args.model_id, export=True,
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\modeling_base.py", line 401, in from_pretrained
    return from_pretrained_method(
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 268, in _from_transformers
    return cls._from_pretrained(
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 571, in _from_pretrained
    model = cls.load_model(model_cache_path, quantization_config=None if load_in_4bit else quantization_config)
  File "C:\Users\dell\openvino_env\lib\site-packages\optimum\intel\openvino\modeling_base.py", line 126, in load_model
    model = core.read_model(file_name) if not file_name.suffix == ".onnx" else convert_model(file_name)
  File "C:\Users\dell\openvino_env\lib\site-packages\openvino\runtime\ie_api.py", line 479, in read_model
    return Model(super().read_model(model))
RuntimeError: Exception from src\inference\src\cpp\core.cpp:92:
RuntimeError: Exception from src\inference\src\cpp\core.cpp:92:
Check 'false' failed at src\frontends\common\src\frontend.cpp:54:
Converting input model
stoll argument out of range
