yvonwin / qwen2.cpp Goto Github PK

qwen2 and llama3 cpp implementation

License: Other

CMake 1.86% C++ 63.91% Python 33.94% Shell 0.29%

large-language-models nlp qwen1-5 moe qwen2 qwen

qwen2.cpp's Issues

gpu版本为啥无法使用

gpu版本会报错：
ggml_aligned_malloc: insufficient memory (attempted to allocate 876546.56 MB)
GGML_ASSERT: /tmp/pip-req-build-5898jsey/third_party/ggml/src/ggml.c:2327: ctx->mem_buffer != NULL
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: 不允许的操作.
No stack.
The program is not being run.
已放弃 (核心已转储)
请问是哪里导致的memory过载？

error: use of undeclared identifier 'ggml_metal_free' auto operator()(ggml_metal_context *ctx) const noexcept -> void { ggml_metal_free(ctx); }

error: use of undeclared identifier 'ggml_metal_free'
auto operator()(ggml_metal_context *ctx) const noexcept -> void { ggml_metal_free(ctx); }
error: unknown type name 'ggml_metal_context'; did you mean 'ggml_opt_context'?
using unique_ggml_metal_context_t = std::unique_ptr<ggml_metal_context, ggml_metal_context_deleter_t>;
error: use of undeclared identifier 'ggml_metal_init'; did you mean 'ggml_numa_init'?
return unique_ggml_metal_context_t(ggml_metal_init(n_cb));
^~~~~~~~~~~~~~~
ggml_numa_init

如何使用openai在gpu上启动推理服务

MODEL=./qwen2_1.8b-ggml.bin python -m uvicorn qwen_cpp.openai_api:app --host 127.0.0.1 --port 8000 这种方式模型启动是运行在cpu上，使用build/bin/main是在gpu上，如何使用openai的方式设置只在gpu上启动模型服务

windows support

Can it be compiled on windows?

使用aarch64-linux-gnu-g++编译时，在编译ggml会报错

报了如下错误

aarch64-linux-gnu-gcc: 错误： unrecognized command line option ‘-mavx’
aarch64-linux-gnu-gcc: 错误： unrecognized command line option ‘-mavx2’
aarch64-linux-gnu-gcc: 错误： unrecognized command line option ‘-mfma’
aarch64-linux-gnu-gcc: 错误： unrecognized command line option ‘-mf16c’
aarch64-linux-gnu-gcc: 错误： unrecognized command line option ‘-msse3’

应该怎么解决？

输入长度怎么设置

GGML_ASSERT: /tmp/pip-req-build-obcizsli/third_party/ggml/src/ggml.c:2493: view_src == NULL || data_size + view_offs <= ggml_nbytes(view_src)
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: 对设备不适当的 ioctl 操作.
No stack.
The program is not being run.

max_length设置1024，推理报错，请问是转化时候要设置么

Is there a converted ggml file for sharing?

Model conversion requires cloning the original model,
but it takes a lot of time to download.

I would like to ask the author if he has a converted model to share.

如何找到或生成Qwen2-7B-Instruct-q8.ggml需要的qwen.tiktoken？

附带的qwen.tiktoken报错：
$ ./build/bin/main -m Qwen2-7B-Instruct-q8.ggml -p 你想活出怎样的人生 -s "你是一个猫娘"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
unknown token: 152063

yvonwin / qwen2.cpp Goto Github PK

qwen2.cpp's Issues

gpu版本为啥无法使用

error: use of undeclared identifier 'ggml_metal_free' auto operator()(ggml_metal_context *ctx) const noexcept -> void { ggml_metal_free(ctx); }

如何使用openai在gpu上启动推理服务

windows support

使用aarch64-linux-gnu-g++编译时，在编译ggml会报错

输入长度怎么设置

Is there a converted ggml file for sharing?

如何找到或生成Qwen2-7B-Instruct-q8.ggml需要的qwen.tiktoken？

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent