rwkv / rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
License: MIT License
I just tried running cmake on the latest release (master-0a8157d) and I get this error:
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
CMake Error at CMakeLists.txt:229 (add_subdirectory):
The source directory
/Users/jhogue/Desktop/_Notes/ML_Local/rwkv.cpp/ggml
does not contain a CMakeLists.txt file.
CMake Error at CMakeLists.txt:232 (set_target_properties):
set_target_properties Can not find target to add properties to: ggml
-- Configuring incomplete, errors occurred!
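This error usually means the ggml directory exists but is empty. Assuming ggml is tracked as a git submodule in this checkout (an assumption; the log above does not confirm it), running `git submodule update --init --recursive` in the repository root, or cloning with `git clone --recursive`, should populate it. Below is a small sketch for checking this before invoking cmake. The helper name `missing_cmake_dirs` is made up for illustration and is not part of rwkv.cpp.

```python
from pathlib import Path
from typing import Iterable, List


def missing_cmake_dirs(repo_root: str, subdirs: Iterable[str] = ("ggml",)) -> List[str]:
    """Return the subdirectories of repo_root that lack a CMakeLists.txt file."""
    root = Path(repo_root)
    return [d for d in subdirs if not (root / d / "CMakeLists.txt").is_file()]


if __name__ == "__main__":
    # Run from the repository root; "ggml" is the directory named in the error.
    missing = missing_cmake_dirs(".")
    if missing:
        print("Not populated:", ", ".join(missing))
        print("Try: git submodule update --init --recursive")
    else:
        print("All expected CMake subdirectories are present.")
```

If the directory is reported as not populated, re-run cmake after fetching the submodule.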
Is it possible to make prompt processing faster with the help of a GPU, the way cuBLAS or CLBlast does for CPU-hosted LLaMA models and others?
I'm using the https://huggingface.co/xzuyn/RWKV-4-Raven-7B-v11x-Eng99-Other1-20230429-ctx8192-ggml-q5_1 ggml weights with a modified version of the rwkv.cpp inference scripts:
import argparse
import os
import pathlib
import time
import tokenizers
from typing import Optional, List, Mapping, Any
from langchain.llms.base import LLM
from rwkv_utils import sampling
from rwkv_utils import rwkv_cpp_model
from rwkv_utils import rwkv_cpp_shared_library
import fire

class RWKV_LLM():
    rwkv_model: Optional[str] = None

    def __init__(
        self,
        model_path: Optional[str],
        temperature: float = 0.8,
        top_p: float = 0.5,
        max_tokens: int = 100,
        tokenizer_path: Optional[str] = "../utils/20B_tokenizer.json"
    ):
        super().__init__()
        self.model_path = model_path
        self.temperature = temperature
        self.top_p = top_p
        self.max_tokens = max_tokens
        self.tokenizer_path = tokenizer_path
        assert self.model_path, "Please Provide The Path of the Model"
        assert self.tokenizer_path, "Please Provide The Path of the RWKV Tokenizer"
        assert self.temperature, "Please Provide The Temperature"
        assert self.top_p, "Please Provide The Top Probability for Sampling"
        assert self.max_tokens, "Please Provide Max Token to Generate"

    def generate_prompt(self, instruction: str, input_ctxt: str = None) -> str:
        if input_ctxt:
            return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input_ctxt}
### Response:"""
        else:
            return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:"""

    def initialize_model(self):
        # load tokenizer
        self.tokenizer = tokenizers.Tokenizer.from_file(str(self.tokenizer_path))
        # load RWKV Model
        library = rwkv_cpp_shared_library.load_rwkv_shared_library()
        print(f'System info: {library.rwkv_get_system_info_string()}')
        print('Loading RWKV model....')
        self.rwkv_model = rwkv_cpp_model.RWKVModel(library, self.model_path, thread_count=2)
        print('Loaded Successfully....')

    def ask(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        if stop is not None:
            pass
        if self.rwkv_model is None:
            self.initialize_model()
        # Generates completions from RWKV model based on a prompt.
        prompt = self.generate_prompt(prompt)
        prompt_tokens = self.tokenizer.encode(prompt).ids
        print(f'{len(prompt_tokens)} tokens in prompt')
        init_logits, init_state = None, None
        for token in prompt_tokens:
            init_logits, init_state = self.rwkv_model.eval(token, init_state, init_state, init_logits)
        start = time.time()
        logits, state = init_logits.clone(), init_state.clone()
        for i in range(self.max_tokens):
            token = sampling.sample_logits(logits, self.temperature, self.top_p)
            print(self.tokenizer.decode([token]), end='')
            logits, state = self.rwkv_model.eval(token, state, state, logits)
        delay = time.time() - start
        print(']\n\nTook %.3f sec, %d ms per token' % (delay, delay / self.max_tokens * 1000))

def main(model_path):
    llm = RWKV_LLM(model_path)
    llm.ask(input("> "))

if __name__ == '__main__':
    fire.Fire(main)
I expected it to generate good outputs, but I got the following:
> Write a Poem About AI
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model....
Loaded Successfully....
35 tokens in prompt
/content/Intellique/llms/rwkv_utils/rwkv_cpp_model.py:100: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
state_out.storage().data_ptr(),
Ai, the machine that thinks and dreams,
A powerful force that cannot be stopped,
A machine that knows no bounds,
A machine that dreams and thinks and dreams.
Ai, the machine that dreams,
A machine that dreams,
Ai, the machine that dreams,
A machine that dreams,
Ai, the machine that dreams,
A machine that dreams,
Ai, the machine that dreams,
A machine that dreams,
A]
I tried multiple times with multiple prompts, but had no luck.
Is there anything I can do to get good outputs, or what is the problem here?
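The script samples with top_p = 0.5 and nothing in its loop discourages tokens that were already emitted, which is one plausible (unconfirmed) reason for the looping output. A common mitigation is to apply a presence/frequency penalty to the logits before sampling. The sketch below is a generic plain-Python illustration of that idea, not part of the rwkv.cpp API; the function name `penalize_repeats` and its default penalty values are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def penalize_repeats(
    logits: Sequence[float],
    generated: Sequence[int],
    presence_penalty: float = 0.2,
    frequency_penalty: float = 0.2,
) -> List[float]:
    """Lower the logit of every token id that was already generated.

    Each seen token loses presence_penalty once, plus frequency_penalty
    for every time it has appeared so far. Unseen tokens are unchanged.
    """
    counts: Dict[int, int] = {}
    for token in generated:
        counts[token] = counts.get(token, 0) + 1

    adjusted = list(logits)
    for token, count in counts.items():
        adjusted[token] -= presence_penalty + frequency_penalty * count
    return adjusted
```

In the generation loop this would be called on the logits (converted to a list, or rewritten for tensors) together with the token ids produced so far, and the adjusted values passed to the sampler. Raising top_p is another knob worth trying.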
Hi,
I am working on the rwkv-rs project, which is also based on ggml. Recently I have been referring to your version of ggml, but I found it hard to keep the Rust-friendly type definitions (which are on the master branch of ggml) and Q4_1_O support (on your master branch) together.
Do you have any plan to update your ggml, or to contribute Q4_1_O back to ggml?
Currently I have manually merged part of your code (without Q4_1_O) with the newest ggml code in my project.
Here is my project:
https://github.com/yorkzero831/rwkv-rs
Very nice work! Any chance we can get a binary to do inference with, similar to llama.cpp, that is to say, without having to use Python? Thanks.
I'm running RWKV-14B on 12 GB of system RAM. This is my inference speed: Took 231.591 sec, 2315 ms per token
Is there anything I can do to speed up inference?
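A speed of about 2 seconds per token would be consistent with the weights not fitting in RAM and being paged in from disk, although the post does not confirm this. A back-of-the-envelope sketch of the arithmetic follows. The bits-per-weight values are illustrative assumptions; real quantized formats add per-block overhead, and the estimate ignores state buffers and everything else the process needs.

```python
def estimate_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


def fits_in_ram(n_params: float, bits_per_weight: float, ram_bytes: float) -> bool:
    """True if the weights alone would fit in the given amount of RAM."""
    return estimate_weight_bytes(n_params, bits_per_weight) <= ram_bytes


if __name__ == "__main__":
    n_params = 14e9   # "14B" from the post
    ram = 12e9        # "12 gigs" from the post, taken as 12e9 bytes
    for bits in (16, 8, 5, 4):  # assumed effective bits per weight
        size_gb = estimate_weight_bytes(n_params, bits) / 1e9
        print(f"{bits:>2} bits/weight: ~{size_gb:.2f} GB, fits: {fits_in_ram(n_params, bits, ram)}")
```

Under these assumptions an FP16 14B model needs about 28 GB for weights alone, so on a 12 GB machine only the more aggressive quantization levels, or a smaller model, leave room to avoid swapping.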
(rwkv) C:\Users\micro\Downloads\rwkv.cpp>python rwkv\chat_with_bot.py C:\Users\micro\Downloads\raven7B_q.bin
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 5589895568, available 5589886801)
Traceback (most recent call last):
File "C:\Users\micro\Downloads\rwkv.cpp\rwkv\chat_with_bot.py", line 45, in <module>
model = rwkv_cpp_model.RWKVModel(library, args.model_path)
File "C:\Users\micro\Downloads\rwkv.cpp\rwkv\rwkv_cpp_model.py", line 37, in __init__
self.ctx = self.library.rwkv_init_from_file(model_path, thread_count)
File "C:\Users\micro\Downloads\rwkv.cpp\rwkv\rwkv_cpp_shared_library.py", line 74, in rwkv_init_from_file
ptr = self.library.rwkv_init_from_file(model_file_path.encode('utf-8'), ctypes.c_uint32(thread_count))
OSError: exception: access violation writing 0x0000000000000038
Should I add some code like below?
from typing import List
A: List[int]=[]
Traceback (most recent call last):
processed_tokens: list[int] = []
TypeError: 'type' object is not subscriptable
Exception ignored in: <function RWKVModel.__del__ at 0x000001DEFD5F7D30>
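The `TypeError` above is what Python versions before 3.9 raise when a built-in type such as `list` is subscripted in an annotation that is evaluated at runtime; subscriptable built-in generics arrived in Python 3.9 (PEP 585). A minimal sketch of an annotation that works on both older and newer interpreters, reusing the variable name from the traceback:

```python
from typing import List

# typing.List can be subscripted on interpreters older than 3.9,
# where the built-in list cannot.
processed_tokens: List[int] = []

processed_tokens.append(42)
print(processed_tokens)
```

Upgrading to Python 3.9 or newer would avoid the change altogether, since `list[int]` is valid there.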
There is a sequence mode (seq_mode) in ChatRWKV:
if seq_mode:
    if 'cuda' in str(dev) and os.environ["RWKV_CUDA_ON"] == '1':
        ATT = self.cuda_att_seq if wtype != torch.uint8 else self.cuda_att_seq_i8
    else:
        ATT = self.att_seq if wtype != torch.uint8 else self.att_seq_i8
    FFN = self.ffn_seq if wtype != torch.uint8 else self.ffn_seq_i8
else:
    ATT = self.att_one if wtype != torch.uint8 else self.att_one_i8
    FFN = self.ffn_one if wtype != torch.uint8 else self.ffn_one_i8
It allows processing a sequence of tokens at once, which makes prompt loading faster (not only for the initial prompt, but also for the user's inputs).
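A toy illustration of why a sequence mode can help: the per-token projections do not depend on the recurrent state, so they can be computed for the whole prompt up front (as a single matrix-matrix product in a real implementation), leaving only a cheap recurrence to run token by token. The sketch below is not RWKV's actual equations; the recurrence, the names `run_one_by_one` and `run_sequence`, and the values are all made up for illustration. Pure Python shows no speedup here; it only shows that both orderings reach the same state.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def step(state: Vector, projected: Vector, decay: float) -> Vector:
    """One step of a toy recurrence: new_state = decay * state + projected."""
    return [decay * s + p for s, p in zip(state, projected)]


def run_one_by_one(w: Matrix, xs: List[Vector], decay: float) -> Vector:
    """Project and update the state for each token in turn."""
    state = [0.0] * len(w)
    for x in xs:
        state = step(state, matvec(w, x), decay)
    return state


def run_sequence(w: Matrix, xs: List[Vector], decay: float) -> Vector:
    """Project every token first, then run only the recurrence sequentially."""
    projected = [matvec(w, x) for x in xs]  # batchable part
    state = [0.0] * len(w)
    for p in projected:
        state = step(state, p, decay)
    return state


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 2.0]]
    xs = [[1.0, 1.0], [2.0, 0.0]]
    print(run_one_by_one(w, xs, 0.5) == run_sequence(w, xs, 0.5))
```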
python3 rwkv/quantize.py ../RWKV-4-Pile-14B-20230313-ctx8192-test1050.bin ../RWKV-4-Pile-14B-20230313-ctx8192-test1050-Q4_1.bin 3
Traceback (most recent call last):
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/quantize.py", line 28, in <module>
main()
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/quantize.py", line 17, in main
library = rwkv_cpp_shared_library.load_rwkv_shared_library()
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 202, in load_rwkv_shared_library
return RWKVSharedLibrary(path)
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 29, in __init__
self.library = ctypes.cdll.LoadLibrary(shared_library_path)
File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: bin/Release/rwkv.so: undefined symbol: max
System: Ubuntu 20.04.6 LTS
GCC: 9.4.0
CPU: Intel(R) Xeon(R) Platinum 8358P
Issue:
$ python rwkv/chat_with_bot.py /path/to/models/Raven-14B-v9-Q4.bin
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
Processing 92 prompt tokens, may take a while
Segmentation fault (core dumped)
cmake -DBUILD_SHARED_LIBS=ON .
-- The C compiler identification is GNU 11.1.0
-- The CXX compiler identification is GNU 11.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Configuring done
-- Generating done
-- Build files have been written to: /www/rwkv.cpp
But the build then fails:
/www/rwkv.cpp$ cmake --build . --config Release
[ 33%] Building C object CMakeFiles/ggml.dir/ggml.c.o
In file included from /www/rwkv.cpp/ggml.c:4:
/www/rwkv.cpp/ggml.h:836:1: warning: function declaration isn’t a prototype [-Wstrict-prototypes]
836 | void ggml_run_test_suite();
| ^~~~
/www/rwkv.cpp/ggml.c: In function ‘dequantize_row_q4_1’:
/www/rwkv.cpp/ggml.c:1086:13: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
1086 | const uint16x8_t vi_0 = vmovl_s8(vget_low_u8 (vq));
| ^~~~~
/www/rwkv.cpp/ggml.c:1086:46: error: incompatible type for argument 1 of ‘vmovl_s8’
1086 | const uint16x8_t vi_0 = vmovl_s8(vget_low_u8 (vq));
| ^~~~~~~~~~~~~~~~
| |
| uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
7989 | vmovl_s8 (int8x8_t __a)
| ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c:1087:46: error: incompatible type for argument 1 of ‘vmovl_s8’
1087 | const uint16x8_t vi_1 = vmovl_s8(vget_high_u8(vq));
| ^~~~~~~~~~~~~~~~
| |
| uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
7989 | vmovl_s8 (int8x8_t __a)
| ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c: In function ‘dequantize_row_q4_1_o’:
/www/rwkv.cpp/ggml.c:1304:46: error: incompatible type for argument 1 of ‘vmovl_s8’
1304 | const uint16x8_t vi_0 = vmovl_s8(vget_low_u8 (vq));
| ^~~~~~~~~~~~~~~~
| |
| uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
7989 | vmovl_s8 (int8x8_t __a)
| ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c:1305:46: error: incompatible type for argument 1 of ‘vmovl_s8’
1305 | const uint16x8_t vi_1 = vmovl_s8(vget_high_u8(vq));
| ^~~~~~~~~~~~~~~~
| |
| uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
7989 | vmovl_s8 (int8x8_t __a)
| ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_compute_forward_mul_mat_q4_1_o_f32’:
/www/rwkv.cpp/ggml.c:7144:15: warning: unused variable ‘ne10’ [-Wunused-variable]
7144 | const int ne10 = src1->ne[0];
| ^~~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_test_quantization’:
/www/rwkv.cpp/ggml.c:11442:60: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11442 | GGML_TEST_ASSERT(max_result == max_expected, "%f, %f", max_result, max_expected);
| ^~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11442:72: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11442 | GGML_TEST_ASSERT(max_result == max_expected, "%f, %f", max_result, max_expected);
| ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11453:64: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11453 | GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
| ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11453:78: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11453 | GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
| ^~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11456:60: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11456 | GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
| ^~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11456:72: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11456 | GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
| ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_test_quantization_q4_1_o’:
/www/rwkv.cpp/ggml.c:11478:64: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11478 | GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
| ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11478:78: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11478 | GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
| ^~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11482:60: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11482 | GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
| ^~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11482:72: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11482 | GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
| ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11490:80: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11490 | GGML_TEST_ASSERT(outlier_value_result == outlier_value_expected, "%f, %f", outlier_value_result, outlier_value_expected);
| ^~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11490:102: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11490 | GGML_TEST_ASSERT(outlier_value_result == outlier_value_expected, "%f, %f", outlier_value_result, outlier_value_expected);
| ^~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11506:57: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11506 | GGML_TEST_ASSERT(diff <= 1.0F, "%d: %f, %f", i, actual, expected);
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11506:65: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11506 | GGML_TEST_ASSERT(diff <= 1.0F, "%d: %f, %f", i, actual, expected);
| ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_run_test_suite’:
/www/rwkv.cpp/ggml.c:11542:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11542 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 0, 2.7322F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11542:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11542 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 0, 2.7322F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11542:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11542 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 0, 2.7322F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11543:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11543 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 1, 2.8531F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11543:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11543 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 1, 2.8531F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11543:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11543 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 1, 2.8531F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11544:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11544 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 2, 0.6466F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11544:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11544 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 2, 0.6466F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11544:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11544 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 2, 0.6466F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11545:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11545 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 3, 0.4974F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11545:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11545 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 3, 0.4974F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11545:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11545 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 3, 0.4974F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11546:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11546 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 4, 5.6463F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11546:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11546 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 4, 5.6463F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11546:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11546 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 4, 5.6463F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11547:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11547 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 5, 0.9564F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11547:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11547 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 5, 0.9564F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11547:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11547 | GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 5, 0.9564F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11556:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11556 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 0, -0.0051F);
| ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11556:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11556 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 0, -0.0051F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11556:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11556 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 0, -0.0051F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11557:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11557 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 1, -0.0484F);
| ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11557:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11557 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 1, -0.0484F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11557:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11557 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 1, -0.0484F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11558:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11558 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 2, 1.4361F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11558:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11558 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 2, 1.4361F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11558:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11558 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 2, 1.4361F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11559:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11559 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 3, 1.6984F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11559:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11559 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 3, 1.6984F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11559:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11559 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 3, 1.6984F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11560:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11560 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 4, -0.7310F);
| ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11560:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11560 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 4, -0.7310F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11560:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11560 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 4, -0.7310F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11561:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11561 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 5, 1.0446F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11561:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11561 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 5, 1.0446F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11561:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11561 | GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 5, 1.0446F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11570:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11570 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 0, 0.7321F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11570:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11570 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 0, 0.7321F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11570:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11570 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 0, 0.7321F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11571:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11571 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 1, 0.7405F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11571:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11571 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 1, 0.7405F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11571:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11571 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 1, 0.7405F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11572:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11572 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 2, 0.3927F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11572:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11572 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 2, 0.3927F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11572:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11572 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 2, 0.3927F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11573:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11573 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 3, 0.3322F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11573:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11573 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 3, 0.3322F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11573:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11573 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 3, 0.3322F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11574:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11574 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 4, 0.8495F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11574:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11574 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 4, 0.8495F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11574:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11574 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 4, 0.8495F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11575:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11575 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 5, 0.4889F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11575:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11575 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 5, 0.4889F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11575:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11575 | GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 5, 0.4889F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11584:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11584 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 0, 1.0051F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11584:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11584 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 0, 1.0051F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11584:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11584 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 0, 1.0051F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11585:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11585 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 1, 1.0484F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11585:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11585 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 1, 1.0484F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11585:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11585 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 1, 1.0484F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11586:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11586 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 2, 1.6200F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11586:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11586 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 2, 1.6200F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11586:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11586 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 2, 1.6200F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11587:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11587 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 3, 0.5156F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11587:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11587 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 3, 0.5156F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11587:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11587 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 3, 0.5156F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11588:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11588 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 4, 1.7310F);
| ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11588:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11588 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 4, 1.7310F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11588:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11588 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 4, 1.7310F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11589:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11589 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 5, -0.0446F);
| ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11589:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11589 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 5, -0.0446F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 | GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
| ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 | fprintf(stderr, __VA_ARGS__);\
| ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11589:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11589 | GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 5, -0.0446F);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
CMakeFiles/ggml.dir/build.make:75: recipe for target 'CMakeFiles/ggml.dir/ggml.c.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml.c.o] Error 1
CMakeFiles/Makefile2:84: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
After several rounds of writing with the novel model, "+++" no longer produces any output, and other instructions produce no output either. I can only reset the model.
decoded: str = tokenizer.decode(accumulated_tokens)
if '\uFFFD' not in decoded:
Hi 👋
First of all, amazing project :)
To be clear, I don't have a Mac to double-check this, and our CI doesn't seem to catch this bug. I'm the author of LocalAI, and since the last rwkv.cpp release people have been reporting that compilation is broken on Mac (mudler/LocalAI#411). Any chance you can reproduce this locally?
Thanks!
generate_completions seems to be very bad at narration of any meaningful length (past about 200 words), often hallucinating or repeating passages (with the 8GB fp16quanti4 14B Raven instruct 6 model).
Also, general usage could easily be improved with a while loop.
It would be good to implement a simple while loop, so that the model stays in RAM if someone wants to generate more, along with some of the repetition penalty logic that BlinkDL uses in ChatRWKV (GEN_alpha_presence = 0.2 # Presence Penalty and GEN_alpha_frequency = 0.2 # Frequency Penalty).
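A minimal sketch of the kind of presence/frequency penalty mentioned above. It assumes the penalty is subtracted from the logit of every token that has already been generated, as `alpha_presence + count * alpha_frequency`; the function name `apply_repetition_penalty` and the plain-list logits are illustrative and are not part of rwkv.cpp or ChatRWKV.

```python
from collections import Counter


def apply_repetition_penalty(logits, generated_tokens,
                             alpha_presence=0.2, alpha_frequency=0.2):
    """Return a copy of `logits` with already-generated tokens penalized.

    Each token that occurred `count` times loses
    alpha_presence + count * alpha_frequency from its logit.
    """
    counts = Counter(generated_tokens)
    penalized = list(logits)
    for token, count in counts.items():
        penalized[token] -= alpha_presence + count * alpha_frequency
    return penalized


# Token 0 was generated twice, token 2 once, token 1 never.
example_logits = [1.0, 1.0, 1.0]
example_penalized = apply_repetition_penalty(example_logits, [0, 0, 2])
```

In a generation loop, the penalized logits would be passed to the sampler in place of the raw ones, and the token history would grow by one entry per step.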
It's pretty fast on AVX2! This is an awesome repo, and thank you for your work.
Hi, super cool-looking project.
Is there going to be any way to exchange ideas, such as GitHub Discussions or a Discord server?
Or did I miss something?
My build crashes with "Illegal Instruction" when running inference with a model.
I debugged it, and it seems to crash on an endbr64 instruction. I think my CPU doesn't support the instruction set.
Is there a build option to turn off the instruction set?
Version: Master, commit e84c446d9533dabef2d8d60735d5924db63362ff
Command to reproduce
python rwkv/chat_with_bot.py ../models/xxxxxxx.bin
It crashed with "Illegal Instruction"
I debugged the program:
> gdb python
(gdb) handle SIGILL stop
(gdb) run rwkv/chat_with_bot.py ../models/xxxx.bin
...
[New Thread 0x7fff6fa49640 (LWP 738136)]
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffde693135 in ggml_init () from /*****/rwkv.cpp/librwkv.so
(gdb) disassemble
Dump of assembler code for function ggml_init:
0x00007fffde692fd0 <+0>: endbr64
0x00007fffde692fd4 <+4>: push %r15
0x00007fffde692fd6 <+6>: mov $0x1,%eax
0x00007fffde692fdb <+11>: push %r14
...
% cmake . -DRWKV_CUBLAS=ON
-- GGML CUDA sources found, configuring CUDA architecture
-- Configuring done (2.4s)
CMake Error at CMakeLists.txt:250 (add_library):
Cannot find source file:
/tmp/rwkv.cpp/ggml/src/ggml.c
Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
.hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc
CMake Error at CMakeLists.txt:250 (add_library):
No SOURCES given to target: ggml
CMake Generate step failed. Build files cannot be regenerated correctly.
Hi @saharNooby, first off, amazing work in this repo. I've been looking for a CPU implementation of RWKV to experiment with the pre-trained models (I don't have a large GPU).
I've put together a basic port of my OpenAI-compatible webserver from llama-cpp-python and tested it on Linux with your library and the RWKV Raven 3B model in f16, q4_0, and q4_1 (pictured below). I'm going to try some larger models this weekend to test performance and quality. The cool thing about exposing the model through this server is that it opens the project up to any OpenAI client (langchain, chat UIs, multi-language client libraries).
Let me know if you want me to open a PR to merge this in somewhere and, if so, the best place to put it.
Cheers, and again great work!
Thank you for this promising work,
like this one
https://github.com/ggerganov/llama.cpp#android
I compiled following the Linux instructions, and everything seemed OK, but when I run the Python file, this error occurs:
Thanks for the great work. Maybe we can use Q4_1 for some of the matrices? (and Q4_1_O for others)
Hey, I have noticed this doesn't seem to contain samplers in C. I was wondering, would it be difficult to implement them? Why not just copy the llama samplers? Likely a stupid question! I am not a C++ or ggml pro, sorry.
Correct me if I'm wrong, but quantizing would require loading the models in their unquantized form (as per torch.load in https://github.com/saharNooby/rwkv.cpp/blob/master/rwkv/convert_pytorch_to_ggml.py, line 126). Not to mention how much heavier the unquantized models are on bandwidth.
Converts an RWKV model checkpoint in PyTorch format to an rwkv.cpp compatible file using convert_pytorch_to_ggml.py.
Get a LoRA checkpoint with https://github.com/Blealtan/RWKV-LM-LoRA.
Merges a LoRA checkpoint in PyTorch format (.pth) into an rwkv.cpp model file using merge_lora_into_ggml.py.
Warnings like "Unused parameter in LoRA state dict blocks.13.att.receptance.lora_B" were printed during the merge, and likewise for att.key.lora_A, att.value.lora_A, att.receptance.lora_A, ffn.key.lora_A, ffn.receptance.lora_A, ffn.value.lora_A, att.key.lora_B, att.value.lora_B, att.receptance.lora_B, ffn.key.lora_B, ffn.receptance.lora_B and ffn.value.lora_B.
The merged model performs poorly and does not reflect the effect of the LoRA.
Why does this happen?
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 4625438848, available 4538236928)
Traceback (most recent call last):
File "D:\AI\rwkv.cpp\rwkv\chat_with_bot.py", line 4, in <module>
model = rwkv_cpp_model.RWKVModel(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI\rwkv.cpp\rwkv\rwkv_cpp_model.py", line 37, in __init__
self.ctx = self.library.rwkv_init_from_file(model_path, thread_count)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI\rwkv.cpp\rwkv\rwkv_cpp_shared_library.py", line 74, in rwkv_init_from_file
ptr = self.library.rwkv_init_from_file(model_file_path.encode('utf-8'), ctypes.c_uint32(thread_count))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: exception: access violation reading 0x0000000000000008
Anyone know how to fix it?
...even after building with
cmake -DRWKV_AVX=OFF -DRWKV_AVX2=OFF -DRWKV_AVX512=OFF -DRWKV_FMA=OFF -DBUILD_SHARED_LIBS=ON .
cmake --build . --config Release
This is on an N4200 cpu (has SSE2, SSSE3, SSE4_1 & SSE4_2, but not FMA or AVXs) running ubuntu jaunty (in a singularity container).
(Issue unlikely to be due to singularity: apptainer/singularity#5795)
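As a rough aid for reports like this one, the sketch below lists which SIMD feature flags a Linux x86 CPU does not advertise. It assumes the usual `/proc/cpuinfo` layout with a `flags` line; the helper names and the default list of wanted flags are illustrative, and the sample text is made up for the demonstration.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags<TAB>: fpu vme sse sse2 ..."
            return set(line.split(":", 1)[1].split())
    return set()


def unsupported_features(cpuinfo_text,
                         wanted=("avx", "avx2", "avx512f", "fma", "f16c")):
    """Return the wanted flags that the CPU does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


# Made-up sample resembling a CPU with SSE but without AVX or FMA.
sample_cpuinfo = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2\n"
missing = unsupported_features(sample_cpuinfo)

# On a real Linux machine one would read the file instead:
#   with open("/proc/cpuinfo") as f:
#       print(unsupported_features(f.read()))
```

Any flag reported as missing is a candidate for an instruction that the compiled library must not contain.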
First of all, amazing work on this implementation!
I wanted to really test pushing the context length to its limits (using the chat example), but noticed that the state is always the same size when saved, even after it had gone through 50k tokens' worth of context.
How do I increase the size of the state available to accommodate the larger context?
Any help would be appreciated!
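One possible explanation, not stated in the post: RWKV is a recurrent model, so its state has a fixed size that depends on the model dimensions and not on how many tokens have been processed. The sketch below assumes the RWKV-4 layout of 5 vectors of `n_embd` FP32 values per layer; the function name and the example dimensions are illustrative.

```python
def rwkv4_state_bytes(n_layer, n_embd, vectors_per_layer=5, bytes_per_element=4):
    """Estimate the size in bytes of a fixed-size RWKV-4 state.

    Assumes `vectors_per_layer` vectors of length `n_embd` per layer,
    stored as FP32 (4 bytes per element).
    """
    return n_layer * vectors_per_layer * n_embd * bytes_per_element


# Illustrative dimensions; the result is the same after 1 token or 50k tokens.
example_state_bytes = rwkv4_state_bytes(n_layer=32, n_embd=4096)
```

Under this assumption, a larger saved state would only come from a model with more layers or a wider embedding, not from a longer conversation.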
Porting over the mmap support from https://github.com/ggerganov/llama.cpp would reduce the minimum RAM requirement for the model.
I know, you know, haha. I'm just putting it here to keep track of it (in case I get around to picking it up, to optimize the RWKV-cpp-node bindings).
I've been measuring loss and perplexity of different sizes and data types on a very small private dataset:
rwkv.cpp-169M-Q4_0.bin averages: loss [3.629], perplexity 37.691
rwkv.cpp-169M-Q4_1.bin, averages: loss [3.163], perplexity 23.642
rwkv.cpp-169M-float16.bin, averages: loss [2.699], perplexity 14.861
rwkv.cpp-169M.bin, averages: loss [2.699], perplexity 14.861
RWKV-4-Pile-430M-20220808-8066-q4_0.bin, averages: loss [2.911], perplexity 18.375
RWKV-4-Pile-430M-20220808-8066-q4_1.bin, averages: loss [2.631], perplexity 13.885
RWKV-4-Pile-430M-20220808-8066-FP16.bin, averages: loss [2.377], perplexity 10.777
RWKV-4-Pile-430M-20220808-8066-FP32.bin, averages: loss [2.377], perplexity 10.777
RWKV-4-Pile-1B5-20220929-ctx4096-Q4_0.bin, averages: loss [3.079], perplexity 21.745
RWKV-4-Pile-1B5-20220929-ctx4096-Q4_1.bin, averages: loss [2.655], perplexity 14.231
RWKV-4-Pile-1B5-20220929-ctx4096-FP16.bin, averages: loss [2.060], perplexity 7.847
RWKV-4-Pile-1B5-20220929-ctx4096-FP32.bin, averages: loss [2.060], perplexity 7.847
RWKV-4-Pile-3B-20221110-ctx4096-Q4_0.bin, averages: loss [4.689], perplexity 108.724
RWKV-4-Pile-3B-20221110-ctx4096-Q4_1.bin, averages: loss [2.916], perplexity 18.475
RWKV-4-Pile-3B-20221110-ctx4096-FP16.bin, averages: loss [2.067], perplexity 7.901
RWKV-4-Pile-7B-20230109-ctx4096-Q4_0.bin, averages: loss [6.296], perplexity 542.322
RWKV-4-Pile-7B-20230109-ctx4096-Q4_1.bin, averages: loss [3.017], perplexity 20.423
The measuring method may not be entirely correct, but these huge losses and perplexities really do show in the quality of generated text -- it is almost incoherent.
Of course, we need proper measuring on WikiText; but it would be very slow on my hardware, and WikiText is not representative of my use case.
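The loss and perplexity figures listed above look consistent with the standard relation perplexity = exp(loss), assuming the loss is a mean per-token cross-entropy in nats. The post does not state how the numbers were computed, so this is only a sanity check under that assumption; the helper name is illustrative.

```python
import math


def perplexity_from_loss(mean_loss):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(mean_loss)


# (loss, perplexity) pairs copied from the list above.
reported = [
    (2.699, 14.861),   # rwkv.cpp-169M-float16.bin
    (2.060, 7.847),    # RWKV-4-Pile-1B5 FP16
    (6.296, 542.322),  # RWKV-4-Pile-7B Q4_0
]

for loss, ppl in reported:
    print(f"loss {loss:.3f}: exp(loss) = {perplexity_from_loss(loss):.3f}, reported {ppl:.3f}")
```

Small differences are expected, since the listed losses are rounded to three decimals.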
An interesting thing to note is the min and max values of the RWKV matrix weights:
169M: -13.8750 14.0000
430M: -14.5000 14.9375
1.5B: -27.2500 27.3750
3B: -12.6875 14.1250
For comparison, llama 7B min and max values are around -2.5 and 2.5!
As a next step, I'll try to determine whether these huge values are outliers, or whether most weights really are distributed in this range.
I guess we need an alternative quantization scheme for RWKV.
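To illustrate why a wide weight range can hurt 4-bit quantization, here is a simplified sketch of symmetric block quantization with one scale per block. It is not ggml's actual Q4_0 or Q4_1 format; the block contents and the outlier value of 14.0 are made up for the demonstration.

```python
def quantize_block_4bit(block):
    """Quantize and dequantize a block with one shared scale.

    Values are mapped to integers in [-7, 7]; the scale is chosen so that
    the largest magnitude in the block is representable.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 7 if amax > 0 else 1.0
    quantized = [max(-7, min(7, round(x / scale))) for x in block]
    return [q * scale for q in quantized]


def mean_abs_error(block):
    """Mean absolute difference between a block and its dequantized copy."""
    restored = quantize_block_4bit(block)
    return sum(abs(a - b) for a, b in zip(block, restored)) / len(block)


# 32 weights spread evenly over [-0.5, 0.5].
typical_block = [-0.5 + i / 31 for i in range(32)]
# The same block with the last weight replaced by a single large outlier.
outlier_block = typical_block[:-1] + [14.0]

err_typical = mean_abs_error(typical_block)
err_outlier = mean_abs_error(outlier_block)
```

With the outlier present, the shared scale grows so much that all the small weights in the block round to zero, which is one way a few extreme values could degrade an entire block.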
This is a really great package. I'm not yet understanding the training mathematics, however. In order to get a system that integrates with legacy C++ code and runs fast, ideally faster than a python bridge, how easy would it be to slap together a baby trainer demo? Something similar to the (pick one) character-based / word-based tiny-shakespeare / OpenWebText training examples in https://github.com/karpathy/nanoGPT/tree/master/data? This would be really useful and great if it could be done.
How are the quantized INT4, INT5 and INT8 models generated?
Do you use GPTQ/RPTQ or normal per-tensor/per-channel PTQ? For the quantized INT8 model, do you use int8 @ int8 -> int32 cuBLAS?
Hi,
I am getting this error when trying to load the model. Can someone please help me? I downloaded the quantized model from huggingface
File e:\minions\RWKV\rwkv.cpp\rwkv\rwkv_cpp_shared_library.py:90, in RWKVSharedLibrary.rwkv_init_from_file(self, model_file_path, thread_count)
74 """
75 Loads the model from a file and prepares it for inference.
76 Throws an exception in case of any error. Error messages would be printed to stderr.
(...)
85 Count of layers to load on gpu, must be positive only enabled with cuBLAS.
86 """
88 ptr = self.library.rwkv_init_from_file(model_file_path.encode('utf-8'),
89 ctypes.c_uint32(thread_count))
---> 90 assert ptr is not None, 'rwkv_init_from_file failed, check stderr'
91 return RWKVContext(ptr)
AssertionError: rwkv_init_from_file failed, check stderr
typo here
python rwkv\convert_rwkv_to_ggml.py
# Windows
python rwkv\convert_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float16
should be
# Windows
python rwkv\convert_pytorch_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float16
Could some GetLastError-like function be added, with error messages or error codes, so it's possible to get error messages without redirecting stderr?
cmake .
cmake --build . --config Release
-- The C compiler identification is GNU 12.2.1
-- The CXX compiler identification is GNU 12.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Error at CMakeLists.txt:215 (add_subdirectory):
The source directory
/home/rexommendation/Downloads/rwkv.cpp-master-a3178b2/ggml
does not contain a CMakeLists.txt file.
CMake Error at CMakeLists.txt:218 (set_target_properties):
set_target_properties Can not find target to add properties to: ggml
-- Configuring incomplete, errors occurred!
make: Makefile: No such file or directory
make: *** No rule to make target 'Makefile'. Stop.
I'm running chat_with_bot.py using torch==1.13, but an error occurs:
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
Processing 185 prompt tokens, may take a while
Traceback (most recent call last):
File "rwkv/chat_with_bot.py", line 115, in <module>
process_tokens(split_last_end_of_line(tokenizer.encode(init_prompt).ids))
File "rwkv/chat_with_bot.py", line 81, in process_tokens
logits, state = model.eval(_token, state, state, logits)
File "/root/RWKVcpu/rwkv/rwkv_cpp_model.py", line 102, in eval
state_out.untyped_storage().data_ptr(),
AttributeError: 'Tensor' object has no attribute 'untyped_storage'
It looks like there are some version conflicts; I need help.
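A possible workaround sketch for the AttributeError above, assuming the only incompatibility is that older PyTorch tensors lack `untyped_storage()` while still providing `storage()`. The helper name `storage_data_ptr` is hypothetical, and the stand-in classes below only mimic the two tensor APIs so that the snippet runs without PyTorch installed.

```python
def storage_data_ptr(tensor):
    """Return the storage data pointer on both older and newer PyTorch.

    Prefers tensor.untyped_storage() when it exists and falls back to
    tensor.storage() otherwise.
    """
    untyped_storage = getattr(tensor, "untyped_storage", None)
    if untyped_storage is not None:
        return untyped_storage().data_ptr()
    return tensor.storage().data_ptr()


class _FakeStorage:
    """Stand-in for a storage object exposing data_ptr()."""

    def __init__(self, ptr):
        self._ptr = ptr

    def data_ptr(self):
        return self._ptr


class _OldStyleTensor:
    """Stand-in for a tensor that only has storage()."""

    def storage(self):
        return _FakeStorage(1234)


class _NewStyleTensor(_OldStyleTensor):
    """Stand-in for a tensor that also has untyped_storage()."""

    def untyped_storage(self):
        return _FakeStorage(5678)


old_ptr = storage_data_ptr(_OldStyleTensor())
new_ptr = storage_data_ptr(_NewStyleTensor())
```

The other option would be to upgrade PyTorch to a version whose tensors provide `untyped_storage()`.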
python3 rwkv/quantize.py ../RWKV-4-Pile-14B-20230313-ctx8192-test1050.bin ../RWKV-4-Pile-14B-20230313-ctx8192-test1050-Q4_1.bin 3
Traceback (most recent call last):
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/quantize.py", line 28, in <module>
main()
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/quantize.py", line 17, in main
library = rwkv_cpp_shared_library.load_rwkv_shared_library()
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 202, in load_rwkv_shared_library
return RWKVSharedLibrary(path)
File "/home/redboxing/RWKV/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 29, in __init__
self.library = ctypes.cdll.LoadLibrary(shared_library_path)
File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: bin/Release/rwkv.so: undefined symbol: max
rwkv world 3b or 7b Q8_0
input
Translate the following text into Korean: "Hello"
output
File "/www/wenda-pi/llms/rwkvcpp/rwkv_tokenizer.py", line 94, in decode
return self.decodeBytes(tokens).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data
After modifying the above file to:
return self.decodeBytes(tokens).decode('utf-8','ignore')
output
안하세요.
but the correct one should be
안녕하세요
The character 녕 is lost.
With the RWKV World FP16 model, the output is correct.
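One way a character can disappear like this, offered as an assumption and not as a confirmed diagnosis: if a multi-byte UTF-8 character is split across tokens and each piece is decoded separately with 'ignore', the incomplete bytes are silently dropped. The sketch below buffers incomplete sequences with Python's incremental decoder; the byte chunks are made up to simulate such a split.

```python
import codecs


def stream_decode(byte_chunks):
    """Decode byte chunks to text, holding back incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    pieces = []
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            pieces.append(text)
    # Raises UnicodeDecodeError if the stream ends in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        pieces.append(tail)
    return pieces


data = "안녕하세요".encode("utf-8")
# Each character is 3 bytes; these cuts fall inside 녕 and inside 하.
chunks = [data[:4], data[4:7], data[7:]]

streamed = "".join(stream_decode(chunks))
# Decoding each chunk on its own with 'ignore' drops the split characters.
lossy = "".join(chunk.decode("utf-8", "ignore") for chunk in chunks)
```

Accumulating tokens until the decoded text contains no replacement character is another approach to the same problem.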
After successfully converting the RWKV model, quantizing it does not work. Here is the error:
Unsupported quantization type 84426624
/Users/dac/Documents/_AI/rwkv.cpp/rwkv.cpp:583: q_type == 2 || q_type == 3 || q_type == 4 || q_type == 5 || q_type == 6
Traceback (most recent call last):
File "/Users/dac/Documents/_AI/rwkv.cpp/rwkv/quantize.py", line 31, in <module>
main()
File "/Users/dac/Documents/_AI/rwkv.cpp/rwkv/quantize.py", line 22, in main
library.rwkv_quantize_model_file(
File "/Users/dac/Documents/_AI/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 177, in rwkv_quantize_model_file
assert self.library.rwkv_quantize_model_file(
AssertionError: rwkv_quantize_model_file failed, check stderr
I have tried Q8_0, Q5_0, Q5_1 and Q4_2, and all result in the same error.
Hi @saharNooby, I really enjoy the project. However, chat_with_bot.py does not work well with the Raven models (see below). I think this may be due to the prompt. Based on the demo, I wrote a new script that fits my needs. I am putting it here as it may help others. Corrections and comments are welcome.
https://gist.github.com/zklhp/a60c4501060383d1cb99b4b6e24109d1
Thank you very much for your input.
With chat_with_bot.py, it does not really follow instructions. When you press Enter, it talks to itself. The rwkv.cpp-14B.bin is converted from RWKV-4-Raven-14B-v8-Eng-20230408-ctx4096.pth (link).
$ python -i rwkv/chat_with_bot.py rwkv.cpp-14B.bin
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
Processing 92 prompt tokens, may take a while
/ssd1/ai/rwkv.cpp/rwkv/rwkv_cpp_model.py:100: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
state_out.storage().data_ptr(),
/ssd1/ai/rwkv.cpp/rwkv/rwkv_cpp_model.py:101: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
logits_out.storage().data_ptr()
/ssd1/ai/rwkv.cpp/rwkv/rwkv_cpp_model.py:82: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
state_in_ptr = state_in.storage().data_ptr()
Chat initialized! Write something and press Enter.
Tell me about ravens.
> Bob: Ravens are large, black birds of the family Corvidae, known for their intelligence, keen eyesight, and ability to mimic human speech.
Write a song about ravens.
> Bob: Once upon a time, there was a group of ravens who lived in a big, old tree. They were all very different, but they all loved to sing. One day, they decided to form a band and sing together. They were the best band in the forest, and they sang all night long.
> Bob: Tell me about the history of the Beatles.
> Bob: Tell me about the history of the Beatles.
> Bob: Tell me about the history of the Beatles.
With my script, it looks like this. Note that I added two functions.
Blah Blah are ignored.
Chat initialized! Write something and press Enter.
- Use '+' to start a new dialog.
- To fill the input, use '\' at the end of line.
> Write a song about ravens.
# Response: Verse 1:
Birds of the sky,
Birdies of the earth,
Black and white,
Ravens, in the sky.
Verse 2:
Cawing and calling,
Stalking the ground,
Birds of prey,
Ravens, in the sky.
Chorus:
Ravens,
Cawing and flying,
Black and white,
In the sky.
Verse 3:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 4:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 5:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 6:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 7:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Chorus:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Ravens,
Cawing and singing,
Black and white,
In the sky.
Ravens,
Cawing and singing,
Black and white,
In the sky.
Ravens,
Cawing and singing,
Black and white,
In the sky.
> + Write a song about ravens.
Open a new dialog.
# Response: Verse 1:
Listen to the ravens, they're calling so loud,
Cawing and cawing, spreading their wings so high.
With a croak and a croak, a flutter and a flutter,
They're telling a tale, of secrets so important.
Chorus:
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their stories so far.
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever, I'll be there.
Verse 2:
Cawing and cawing, telling their tales,
Ravens, ravens, soaring and singing.
In the sky, so high, soaring and singing,
Cawing and cawing, telling their tales so proud.
Chorus:
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their tales so far.
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever I'll be there.
Bridge:
Ravens, ravens, living in the sky,
Cawing and cawing, telling their tales so far.
Ravens, ravens, living in the sky,
Cawing and cawing, forever, forever I'll be there.
Chorus:
Ravens, ravens, living in the sky,
Cawing and cawing, telling their tales so far.
Ravens, ravens, living in the sky,
Cawing and cawing, forever, forever I'll be there.
> Extend the following song\
> Cawing and cawing, forever, forever I'll be there.
# Response: Cawing and cawing, forever, forever I'll be there,
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their tales so loud,
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever I'll be there.
Cawing and cawing, forever, forever I'll be there.
> Extend the following song\
> Cawing and cawing, forever, forever I'll be there.
# Response: Cawing and cawing, forever, forever I'll be there,
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their tales so loud,
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever I'll be there.
Cawing and cawing, forever, forever I'll be there.
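The two input features listed in the startup banner above ('+' to start a new dialog, '\' at the end of a line to continue the input) could be handled roughly as follows. This is only a minimal sketch, not the poster's actual script: the function name read_user_input and the injectable read_line parameter are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Tuple


def read_user_input(read_line: Callable[[str], str] = input) -> Tuple[bool, str]:
    """Read one user message and return (new_dialog, text).

    Hypothetical helper: a trailing backslash continues the message on the
    next line, and a leading '+' asks for a fresh dialog (state reset).
    """
    lines: List[str] = []
    while True:
        line = read_line("> ")
        if line.endswith("\\"):
            # Drop the continuation marker and keep reading.
            lines.append(line[:-1])
            continue
        lines.append(line)
        break

    text = "\n".join(lines).strip()
    new_dialog = text.startswith("+")
    if new_dialog:
        # Remove the '+' marker; the caller decides how to reset the state.
        text = text[1:].lstrip()
    return new_dialog, text
```

In a chat loop, the caller would reset the model state to the state saved right after prompt processing whenever new_dialog is true, and otherwise keep feeding tokens into the current state.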
On Windows 11 installed per instructions, conversion seems to only support float16/float32, not quantized formats.
~\src\rwkv.cpp> python rwkv\convert_pytorch_to_ggml.py RWKV-4-Raven-14B-v12-Eng98%-Other2%-20230523-ctx8192.pth Q8_0_RWKV-4-Raven-14B-v12.bin Q8_0
usage: convert_pytorch_to_ggml.py [-h] src_path dest_path {float16,float32}
convert_pytorch_to_ggml.py: error: argument data_type: invalid choice: 'Q8_0' (choose from 'float16', 'float32')
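The usage line above shows that the conversion script only accepts float16 or float32. As far as the project README describes it, quantization is a separate second step done with rwkv/quantize.py on the converted file. The sketch below only builds the two command lines. It assumes that quantize.py is present in the checkout and takes source path, destination path and format name in that order; the intermediate file name is a hypothetical placeholder.

```python
from typing import List, Tuple


def two_step_commands(
    src_pth: str,
    fp16_bin: str,
    quant_bin: str,
    quant_format: str = "Q8_0",
) -> Tuple[List[str], List[str]]:
    """Build the convert and quantize command lines (nothing is executed)."""
    # Step 1: PyTorch checkpoint -> ggml file in float16.
    convert = ["python", "rwkv/convert_pytorch_to_ggml.py", src_pth, fp16_bin, "float16"]
    # Step 2 (assumed script and argument order): float16 ggml -> quantized ggml.
    quantize = ["python", "rwkv/quantize.py", fp16_bin, quant_bin, quant_format]
    return convert, quantize


convert_cmd, quantize_cmd = two_step_commands(
    "RWKV-4-Raven-14B-v12-Eng98%-Other2%-20230523-ctx8192.pth",
    "FP16_RWKV-4-Raven-14B-v12.bin",  # hypothetical intermediate file name
    "Q8_0_RWKV-4-Raven-14B-v12.bin",
)
print(" ".join(convert_cmd))
print(" ".join(quantize_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from the repository root to actually run the step.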
When I ran "cmake --build . --config Release", the following error occurred:
"ggml/src/CMakeFiles/ggml.dir/build.make:75: recipe for target 'ggml/src/CMakeFiles/ggml.dir/ggml.c.o' failed
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/ggml.c.o] Error 1
CMakeFiles/Makefile2:164: recipe for target 'ggml/src/CMakeFiles/ggml.dir/all' failed
make[1]: *** [ggml/src/CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2"
Do you know how to fix it?
I followed the Apple silicon instructions in README.md, and cmake . reports CMAKE_SYSTEM_PROCESSOR as arm64.
However, cmake . also prints the warning below:
-- Accelerate framework found
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- CMAKE_SYSTEM_PROCESSOR: arm64
CMake Warning at ggml/src/CMakeLists.txt:48 (message):
Your arch is announced as x86_64, but it seems to actually be ARM64. Not
fixing that can lead to bad performance. For more info see:
https://github.com/ggerganov/whisper.cpp/issues/66#issuecomment-1282546789
-- ARM detected
-- Accelerate framework found
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /my/path/rwkv.cpp
Furthermore, the error below occurs when running python rwkv/generate_completions.py /my/path/rwkv.cpp/models/rwkv-4-raven/ggml-model-q5_1.bin:
OSError: dlopen(/my/path/rwkv.cpp/librwkv.dylib, 0x0006): tried: '/my/path/rwkv.cpp/librwkv.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/my/path/rwkv.cpp/librwkv.dylib' (no such file), '/my/path/rwkv.cpp/librwkv.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
For reference, a similar error with llama-cpp-python was solved with arch -arm64 pip install llama-cpp-python --no-cache-dir
The fix may be similar to the one described in the GitHub link from the warning above.
Additionally, running arch -arm64 cmake . and arch -arm64 cmake --build . --config Release did not work.
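The dlopen message above says the library was built for x86_64 while the Python process needs arm64. One way to confirm this kind of mismatch is to compare the architecture of the running interpreter with the CPU type stored in the library's Mach-O header. The sketch below makes assumptions: the helper names are hypothetical, it only understands thin 64-bit little-endian Mach-O files, and it reports universal (fat) binaries and anything else as "unknown".

```python
import platform
import struct

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic, read as little-endian
CPU_TYPES = {
    0x01000007: "x86_64",  # CPU_TYPE_X86_64
    0x0100000C: "arm64",   # CPU_TYPE_ARM64
}


def macho_arch(header: bytes) -> str:
    """Return the architecture named in the first 8 bytes of a Mach-O file."""
    if len(header) < 8:
        return "unknown"
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return "unknown"  # fat binary, 32-bit Mach-O, or not Mach-O at all
    return CPU_TYPES.get(cputype, "unknown")


def library_arch(path: str) -> str:
    """Read the header of a shared library and report its architecture."""
    with open(path, "rb") as f:
        return macho_arch(f.read(8))


# Architecture of the interpreter that will try to load the library.
print("python:", platform.machine())
```

Calling library_arch("/my/path/rwkv.cpp/librwkv.dylib") and comparing the result with platform.machine() shows which side is wrong. If the library is x86_64 while Python is arm64, the build tools themselves may be running as x86_64 binaries, which would also be consistent with the CMake warning.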
Compilation fails on M1 macOS; it looks like a type is missing.
-- Accelerate framework found
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- Accelerate framework found
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/jeremyprice/git/go-rwkv.cpp/rwkv.cpp
[ 12%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 25%] Linking C static library libggml.a
[ 25%] Built target ggml
[ 37%] Building CXX object CMakeFiles/rwkv.dir/rwkv.cpp.o
/git/go-rwkv.cpp/rwkv.cpp/rwkv.cpp:471:19: error: variable has incomplete type 'struct stat64'
struct stat64 file_stat;
^
/git/go-rwkv.cpp/rwkv.cpp/rwkv.cpp:471:12: note: forward declaration of 'stat64'
struct stat64 file_stat;
^
Is it somehow possible to use rwkv.cpp with langchain.js (https://js.langchain.com/docs/)?
Currently, only the matrices of the layers are offloaded to the GPU. The head, the biggest matrix in the model, stays on the CPU and is evaluated there.
On my machine, offloading the head of the 14B model in addition to offloading all layers gives 60 ms per-token latency vs. 70 ms without head offloading.
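To put the two latencies in perspective, the short calculation below converts them to throughput and relative gain. It uses only the 60 ms and 70 ms figures quoted above; the function name is made up for illustration.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert per-token latency in milliseconds to tokens per second."""
    return 1000.0 / latency_ms


with_head_ms = 60.0     # head offloaded in addition to all layers
without_head_ms = 70.0  # layers only, head evaluated on the CPU

# Relative latency reduction and throughput speedup from offloading the head.
latency_reduction = (without_head_ms - with_head_ms) / without_head_ms
speedup = without_head_ms / with_head_ms

print(f"with head:    {tokens_per_second(with_head_ms):.1f} tokens/s")
print(f"without head: {tokens_per_second(without_head_ms):.1f} tokens/s")
print(f"latency reduction: {latency_reduction:.1%}, speedup: {speedup:.2f}x")
```

In other words, the head accounts for roughly one seventh of the per-token time in this measurement.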
As always, the hardest question here is API design -- we need to preserve compatibility and not inflate the API with small new functions.
then if the bot returns a '\n', the loop hits break and generation is stopped.