qihoo360 / 360zhinao

360zhinao

License: Apache License 2.0

Python 97.95% Shell 2.05%

360zhinao's Issues

{ "msg": "name 'message' is not defined", "status_code": 500 }

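On a 4090 server, after setting up the environment, running python openai_api.py starts the service normally, but every request returns: { "msg": "name 'message' is not defined", "status_code": 500 }
Is this an error somewhere in the internal code?

Postman request example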

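The project cannot run inference with the two quantized versions, 360Zhinao-7B-Chat-360K-Int4 and 360Zhinao-7B-Chat-32K-Int4

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /home/ai-models/360_model/360Zhinao-7B-Chat-360K-Int4.
Model inference fails with this error about missing files, but the ModelScope model folder does not contain any of these files. Running the non-quantized 360Zhinao-7B-Chat-360K works normally.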

Failed building wheel for flash-attn

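I followed the README and ran into the problem below. I also tried to fix it myself, but flash-attn still fails to compile.

System: Windows 11
Python: 3.10.0

cmake, ninja, and nvcc are as shown in the screenshot above.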

BFloat16 is not supported on MPS

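How to access the multimodal model

Hello, the introduction says that 360 Zhinao can understand images and text, but I could not find a place to try it out. Has the multimodal model been open-sourced, or is there a technical blog or similar about it? Thanks.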

Fine-tuning fails with "torch.cuda.OutOfMemoryError: CUDA out of memory."

The following error appears when running fine-tuning:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.48 GiB. GPU has a total capacity of 23.68 GiB of which 1.51 GiB is free. Including non-PyTorch memory, this process has 22.16 GiB memory in use. Of the allocated memory 21.79 GiB is allocated by PyTorch, and 18.10 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

My machine has two RTX 3090 GPUs.
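The error message above suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A minimal sketch of doing that from Python follows; the helper name `with_expandable_segments` is hypothetical, and the variable has to be set before the first CUDA allocation (setting it before importing torch is the simplest way). Note that the message reports only 18.10 MiB reserved but unallocated, while the request is 14.48 GiB against 1.51 GiB free, so fragmentation is probably not the main cause here; reducing batch size or sequence length may still be necessary.

```python
import os


def with_expandable_segments(current: str) -> str:
    """Return an allocator config string that includes expandable_segments:True.

    Existing comma-separated options (for example max_split_size_mb:128)
    are kept; any earlier expandable_segments entry is replaced.
    """
    options = [
        opt
        for opt in current.split(",")
        if opt and not opt.startswith("expandable_segments:")
    ]
    options.append("expandable_segments:True")
    return ",".join(options)


# Set the variable before torch touches the GPU, ideally before `import torch`.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = with_expandable_segments(
    os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
)
```

The same setting can be exported in the shell that launches the fine-tuning script instead of being set in Python.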

Token indices sequence length is longer than the specified maximum sequence length for this model (8473 > 4096). Running this sequence through the model will result in indexing errors

Why does an input of about 8K tokens raise an error saying that at most 4K is supported? I downloaded the 360K chat model.

Running 360Zhinao-7B-Chat-32K fails with 'NoneType' object is not callable

Environment:
python = 3.11.7
pytorch = 2.2.2
transformers = 4.38.2
CUDA = 12.1

Reference, the official GitHub repository: https://github.com/Qihoo360/360zhinao

Install in order:
pip install -r requirements.txt
pip install flash_attn-2.5.6+cu118torch2.2cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
Download 360Zhinao-7B-Chat-32K from the ModelScope community:
from modelscope import snapshot_download
model_dir_360Zhinao_7B_Chat_32K = snapshot_download("qihoo360/360Zhinao-7B-Chat-32K", revision = "master")
Replace the model path with:
MODEL_NAME_OR_PATH = "/home/zhifeng.zhao/.cache/modelscope/hub/qihoo360/360Zhinao-7B-Chat-32K"
Running streamlit run web_demo.py then fails with 'NoneType' object is not callable. The full error output is below.

(360zhinao) xxx@xxx:~/360zhinao$ streamlit run web_demo.py

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

2024-04-17 17:58:25.122 Did not auto detect external IP.
Please go to https://docs.streamlit.io/ for debugging hints.

You can now view your Streamlit app in your browser.

Network URL: http://192.168.50.126:8501

Please install FlashAttention first, e.g., with pip install flash-attn
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 2.71it/s]
ic| self.eos_token_id: 158326
self.pad_token_id: 158323
self.im_start_id: 158332
self.im_end_id: 158333
generation_config: GenerationConfig {
"do_sample": true,
"eos_token_id": [
158326,
158332,
158333
],
"max_new_tokens": 512,
"pad_token_id": 158326,
"top_p": 0.8
}

Exception in thread Thread-7 (generate):
Traceback (most recent call last):
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 918, in generate
response = super().generate(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
outputs = self(
^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 816, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 711, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 513, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/anaconda3/envs/360zhinao/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 416, in forward
attn_output = self.flash_attention(query_states, key_states, value_states, attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 345, in flash_attention
query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = self._upad_input(
^^^^^^^^^^^^^^^^^
File "/home/zhifeng.zhao/.cache/huggingface/modules/transformers_modules/360Zhinao-7B-Chat-32K/modeling_zhinao.py", line 442, in _upad_input
key_layer = index_first_axis(key_layer.reshape(batch_size * kv_seq_len, num_heads, head_dim), indices_k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable
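The log above prints "Please install FlashAttention first" before the traceback, and the call that fails is index_first_axis, which flash-attn provides in flash_attn.bert_padding. One plausible reading is that the flash-attn import failed and the name was left as None; this is an assumption about the cause, not a quote from modeling_zhinao.py. The environment lists CUDA 12.1 while the installed wheel is a cu118 build, which may be worth checking. A small sketch for testing whether the function can be imported at all, using a hypothetical helper `load_optional`:

```python
import importlib


def load_optional(module_name: str, attr: str):
    """Return module_name.attr, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


# index_first_axis lives in flash_attn.bert_padding when flash-attn is usable.
index_first_axis = load_optional("flash_attn.bert_padding", "index_first_axis")

if index_first_axis is None:
    # Calling a None placeholder is what produces
    # "TypeError: 'NoneType' object is not callable".
    print("flash-attn is not importable in this environment")
else:
    print("flash-attn import works")
```

If this prints that flash-attn is not importable, running `import flash_attn` directly in the same environment should show the underlying import error.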

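Question about model size and required GPU memory

How large are the following models, and how much GPU memory does each one need to run? It would be best to state this in the environment requirements or the model download section: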
360Zhinao-7B-Base
360Zhinao-7B-Chat-4K
360Zhinao-7B-Chat-32K
360Zhinao-7B-Chat-360K
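As a rough lower bound while official numbers are missing, the memory needed just for the weights can be estimated as parameter count times bytes per parameter. The sketch below assumes exactly 7e9 parameters (inferred only from the "7B" in the model names, so the real count may differ) and ignores the KV cache, activations, and framework overhead, all of which grow with context length.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GiB (no KV cache, no activations)."""
    return num_params * bits_per_param / 8 / 1024**3


ASSUMED_PARAMS = 7e9  # assumption: taken from the "7B" in the model name

for label, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_memory_gib(ASSUMED_PARAMS, bits):.2f} GiB")
# fp16/bf16: 13.04 GiB
# int8: 6.52 GiB
# int4: 3.26 GiB
```

Actual usage will be higher than these figures, especially for the 32K and 360K variants when long inputs are used.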

Will converting the model to GGUF with llama.cpp be supported?

I am currently using Ollama. Will GGUF conversion with llama.cpp be supported in the future?

Error when asking the model a question

Model: 360Zhinao-7B-Chat-4K
Environment: Windows WSL2
GPU: NVIDIA 4090, 24 GB
Environment setup: everything installed

The error is as follows:
Traceback (most recent call last):
File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/usr/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/qihoo360/360Zhinao-7B-Chat-4K/7ac2410120e0bd9a91baa92c0f3f973590dac490/modeling_zhinao.py", line 918, in generate
response = super().generate(
^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
outputs = self(
^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/qihoo360/360Zhinao-7B-Chat-4K/7ac2410120e0bd9a91baa92c0f3f973590dac490/modeling_zhinao.py", line 816, in forward
outputs = self.model(
^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/qihoo360/360Zhinao-7B-Chat-4K/7ac2410120e0bd9a91baa92c0f3f973590dac490/modeling_zhinao.py", line 711, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/qihoo360/360Zhinao-7B-Chat-4K/7ac2410120e0bd9a91baa92c0f3f973590dac490/modeling_zhinao.py", line 513, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/flyfo/Desktop/360zhinao/myenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/qihoo360/360Zhinao-7B-Chat-4K/7ac2410120e0bd9a91baa92c0f3f973590dac490/modeling_zhinao.py", line 416, in forward
attn_output = self.flash_attention(query_states, key_states, value_states, attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/qihoo360/360Zhinao-7B-Chat-4K/7ac2410120e0bd9a91baa92c0f3f973590dac490/modeling_zhinao.py", line 345, in flash_attention
query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = self._upad_input(
^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/qihoo360/360Zhinao-7B-Chat-4K/7ac2410120e0bd9a91baa92c0f3f973590dac490/modeling_zhinao.py", line 442, in _upad_input
key_layer = index_first_axis(key_layer.reshape(batch_size * kv_seq_len, num_heads, head_dim), indices_k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable

Running the fine-tuning code fails with #error "PYTHON < 3.6 IS UNSUPPORTED. pybind11 v2.9 was the last to support Python 2 and 3.5."

Ubuntu 22.04
NVIDIA driver 550.67
CUDA 12.4

Not compatible with vLLM 0.4.0

File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/zhinao.py", line 31, in <module>
from vllm.model_executor.input_metadata import InputMetadata
ModuleNotFoundError: No module named 'vllm.model_executor.input_metadata'

Maybe the line from vllm.model_executor.input_metadata import InputMetadata should be changed to

from vllm.attention import Attention, AttentionMetadata
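If the model file has to work with more than one vLLM release, a common pattern is to try the newer import location first and fall back to the older one. The sketch below shows only that general pattern with a hypothetical helper `first_importable`; the two module paths are the ones quoted in this issue, and whether the rest of zhinao.py works with the newer attention API is not checked here.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that imports, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Module paths as quoted in the issue: newer location first, older one second.
attention_api = first_importable([
    "vllm.attention",
    "vllm.model_executor.input_metadata",
])

if attention_api is None:
    print("vLLM is not installed, or neither module path exists in this version")
else:
    print(f"using {attention_api.__name__}")
```

Swapping the import alone may not be enough, since code that used InputMetadata would also need to be adapted to whatever the newer module exposes.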

Is a magnet link available?

I would like a magnet link to be provided for downloading the models.