Comments (3)
Thanks. I get the desired output after adding the tokenizer type. The 7B World q8_0 model runs at about 10 tokens/sec on my 16 GB M1 MacBook.
```
python rwkv/generate_completions.py rwkv-cpp-readflow-7B-ctx32k-q8_0.bin world
Loading world tokenizer
System info: AVX=0 AVX2=0 AVX512=0 FMA=0 NEON=1 ARM_FMA=1 F16C=0 FP16_VA=1 WASM_SIMD=0 BLAS=1 SSE3=0 VSX=0
Loading RWKV model
91 tokens in prompt
```
--- Generation 0 ---
# rwkv.cpp
This is a port of [BlinkDL/RWKV-LM](https://github.com/BlinkDL/RWKV-LM) to [ggerganov/ggml](https://github.com/ggerganov/ggml).
Besides usual **FP32**, it supports **FP16** and **quantized INT4** inference on CPU. This project is **CPU only**.[
# Example
```cpp
#include "ggml.h"
#include "common.h"
#include "logger.h"
int main()
{
// Load model
model::Ptr model = model::load("E:/workspace/ggml/examples/yolo/yolo.onnx");
// Convert input to float
real input_data = 1.0;
real input_data_f = model->input_to_float]
```
Took 9.785 sec, 97 ms per token
from rwkv.cpp.
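As a quick sanity check (numbers taken from the log above), the reported 97 ms per token and "about 10 tokens/sec" are consistent:

```python
# Sanity check on the reported generation speed:
# 97 ms per token corresponds to roughly 10 tokens per second.
ms_per_token = 97
tokens_per_sec = 1000 / ms_per_token
print(f"{tokens_per_sec:.1f} tokens/sec")  # 10.3
```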
Looks like you're using the wrong tokenizer...
Or the wrong model. Try with Raven.
Also use a bigger one, maybe 3B.
Hi!
You are using an RWKV World model, which uses the `world` tokenizer. By default, generate_completions.py / chat_with_bot.py use the 20B tokenizer, which will give garbage output when used with an RWKV World model.
You need to explicitly specify the `world` tokenizer when running the script:

```
python rwkv/chat_with_bot.py rwkv-cpp-world-1.5B-q8_0.bin world
```
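To illustrate why a tokenizer mismatch produces garbage, here is a toy sketch (the vocabularies are hypothetical, not the real 20B or World ones): token IDs produced with one vocabulary decode into unrelated strings under another.

```python
# Toy illustration of a tokenizer mismatch (hypothetical vocabularies,
# not the actual 20B or World tokenizers).
encode_vocab = {"hello": 0, "world": 1, "!": 2}
# A different tokenizer's decode table maps the same IDs to other strings.
decode_vocab = {0: "fish", 1: "quantum", 2: "###"}

ids = [encode_vocab[t] for t in ["hello", "world", "!"]]
garbage = " ".join(decode_vocab[i] for i in ids)
print(garbage)  # fish quantum ###
```

The model itself is fine in this situation; only the mapping between text and token IDs is wrong, which is why switching the script to the matching tokenizer fixes the output.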
> Or the wrong model. Try with Raven. Also use a bigger one, maybe 3B.
In my experience, even 1B5 models are fluent and (when used with the correct tokenizer) generate okay text.
Related Issues (20)
- Tutorial for python script?
- it seems does not support the newly RWKV-4-World-CHNtuned-3B-v1-20230625-ctx4096.bin
- Can't build with ggml
- Support hipBLAS
- Support Metal in apple macOS?
- [QUESTION] Implementing RNN/LSTM with ggml
- crash on GGML_ASSERT: 'rwkv.cpp/ggml/src/ggml.c:5316: ggml_can_repeat_rows(b, a)'
- Update new GGML for GGML_MAX_NODES limit?
- Support RWKV v5
- llama-node is not working in the moment
- Fix extras/CMakeList.txt file for static build
- Support build with cublas and hipblas on github action
- CMake Error
- hipblas cannot build using cmake on windows with rocm5.7.1
- The linked Huging Face page in the README doesn't have any .bin files
- Add mac ARM build as part of the build process
- Replace all assertions in Python code with if statements
- Support RWKV v6
- Colab notebook to start faster?