rwkv / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

License: MIT License

C++ 40.53% Python 34.87% C 15.43% CMake 9.17%

deep-learning language-model llm machine-learning quantization rwkv ggml

rwkv.cpp's Issues

Mac Build stops with Errors

I just tried running cmake on the latest release (master-0a8157d) and got this error:

-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
CMake Error at CMakeLists.txt:229 (add_subdirectory):
The source directory
/Users/jhogue/Desktop/_Notes/ML_Local/rwkv.cpp/ggml

does not contain a CMakeLists.txt file.

CMake Error at CMakeLists.txt:232 (set_target_properties):
set_target_properties Can not find target to add properties to: ggml

-- Configuring incomplete, errors occurred!
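
The CMake error says the ggml directory has no CMakeLists.txt. One plausible cause, assuming ggml is pulled in as a git submodule (the log is consistent with that but does not prove it), is that the repository was cloned without its submodules. A minimal sketch of a pre-build check follows; the helper name `find_empty_subprojects` is hypothetical and not part of rwkv.cpp:

```python
from pathlib import Path


def find_empty_subprojects(repo_root, subdirs=("ggml",)):
    """Return the subdirectories of repo_root that lack a CMakeLists.txt."""
    root = Path(repo_root)
    return [
        name
        for name in subdirs
        if not (root / name / "CMakeLists.txt").is_file()
    ]


if __name__ == "__main__":
    missing = find_empty_subprojects(".")
    if missing:
        print("Missing CMakeLists.txt in:", ", ".join(missing))
        # Assumption: these directories are git submodules.
        print("Try: git submodule update --init --recursive")
    else:
        print("All subprojects look populated.")
```

If the check reports ggml as missing, fetching the submodules and re-running cmake is the first thing to try.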

Blas-like Prompt Parallelization? (sequence processing mode)

Is it possible to make prompt processing faster with the help of a GPU, the way cuBLAS or CLBlast do for CPU-hosted LLaMA and other models?

Unexpected Outputs

I'm using the https://huggingface.co/xzuyn/RWKV-4-Raven-7B-v11x-Eng99-Other1-20230429-ctx8192-ggml-q5_1 ggml weights with a modified version of the rwkv.cpp inference scripts:

import argparse
import os
import pathlib
import time
import tokenizers
from typing import Optional, List, Mapping, Any
from langchain.llms.base import LLM
from rwkv_utils import sampling
from rwkv_utils import rwkv_cpp_model
from rwkv_utils import rwkv_cpp_shared_library
import fire

class RWKV_LLM():

    rwkv_model: Optional[str] = None

    def __init__(
        self,
        model_path: Optional[str],
        temperature: float = 0.8,
        top_p: float = 0.5,
        max_tokens: int = 100,
        tokenizer_path: Optional[str] = "../utils/20B_tokenizer.json"
    ):
        super().__init__()
        self.model_path = model_path
        self.temperature = temperature
        self.top_p = top_p
        self.max_tokens = max_tokens
        self.tokenizer_path = tokenizer_path

        assert self.model_path, "Please Provide The Path of the Model"
        assert self.tokenizer_path, "Please Provide The Path of the RWKV Tokenizer"
        assert self.temperature, "Please Provide The Temperature"
        assert self.top_p, "Please Provide The Top Probability for Sampling"
        assert self.max_tokens, "Please Provide Max Token to Generate"

    def generate_prompt(self, instruction: str, input_ctxt: str = None) -> str:
        if input_ctxt:
            return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_ctxt}

### Response:"""
        else:
            return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

        
    def initialize_model(self):
        # load tokenizer
        self.tokenizer = tokenizers.Tokenizer.from_file(str(self.tokenizer_path))

        # load RWKV Model
        library = rwkv_cpp_shared_library.load_rwkv_shared_library()
        print(f'System info: {library.rwkv_get_system_info_string()}')
        print('Loading RWKV model....')
        self.rwkv_model = rwkv_cpp_model.RWKVModel(library, self.model_path, thread_count=2)
        print('Loaded Successfully....')


    def ask(self, prompt: str, stop: Optional[List[str]] = None) -> str:

        if stop is not None:
            pass

        if self.rwkv_model is None:
            self.initialize_model()

        # Generates completions from RWKV model based on a prompt.
        prompt = self.generate_prompt(prompt)
        prompt_tokens = self.tokenizer.encode(prompt).ids 
        print(f'{len(prompt_tokens)} tokens in prompt')

        init_logits, init_state = None, None

        for token in prompt_tokens:
            init_logits, init_state = self.rwkv_model.eval(token, init_state, init_state, init_logits)

        start = time.time()

        logits, state = init_logits.clone(), init_state.clone()

        for i in range(self.max_tokens):

            token = sampling.sample_logits(logits, self.temperature, self.top_p)
            print(self.tokenizer.decode([token]), end='')
            logits, state = self.rwkv_model.eval(token, state, state, logits)

        delay = time.time() - start
        print(']\n\nTook %.3f sec, %d ms per token' % (delay, delay / self.max_tokens * 1000))

def main(model_path):
    llm = RWKV_LLM(model_path)
    llm.ask(input("> "))


if __name__ == '__main__':
    fire.Fire(main)

I expected it to generate good outputs, but I got the following:

> Write a Poem About AI
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
Loading RWKV model....
Loaded Successfully....
35 tokens in prompt
/content/Intellique/llms/rwkv_utils/rwkv_cpp_model.py:100: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  state_out.storage().data_ptr(),

Ai, the machine that thinks and dreams,
A powerful force that cannot be stopped,
A machine that knows no bounds,
A machine that dreams and thinks and dreams.
Ai, the machine that dreams,
A machine that dreams,
Ai, the machine that dreams,
A machine that dreams,
Ai, the machine that dreams,
A machine that dreams,
Ai, the machine that dreams,
A machine that dreams,
A]

I tried multiple times with multiple prompts, but had no luck.

Is there anything I can do to get good outputs, or what is the problem here?
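
The script above samples with temperature and top_p only, so nothing discourages a token that has already been emitted many times. One common mitigation for loops like this (not confirmed in the issue as the fix) is to subtract a presence and frequency penalty from the logits of already-generated tokens before sampling. A minimal sketch on plain Python lists; the function name and the default values are illustrative assumptions:

```python
from collections import Counter


def penalize_repeats(logits, generated_tokens, presence=0.2, frequency=0.2):
    """Lower the logits of tokens that were already generated.

    logits: list of floats indexed by token id.
    generated_tokens: token ids produced so far.
    presence: flat penalty for any token seen at least once (assumed value).
    frequency: extra penalty per occurrence (assumed value).
    """
    counts = Counter(generated_tokens)
    adjusted = list(logits)  # copy so the caller's logits stay untouched
    for token_id, n in counts.items():
        adjusted[token_id] -= presence + frequency * n
    return adjusted
```

In the generation loop this would be applied to the logits right before the call to sampling.sample_logits, with each sampled token appended to the history list. The script holds logits as a tensor, so the same subtraction would need to be done on that tensor rather than on a list.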

Code difference between ggml and rwkv.cpp is growing

Hi,
I am working on the rwkv-rs project, which is also based on ggml. Recently I have been referring to your fork of ggml, but I found it hard to keep the Rust-friendly type definitions (which are on the master branch of ggml) and Q4_1_O support (on your master branch) together.

Do you have any plan to update your ggml, or to contribute Q4_1_O back to ggml?

Currently I have manually merged part of your code (without Q4_1_O) with the newest ggml code in my project.
Here is my project:
https://github.com/yorkzero831/rwkv-rs

Inference binary

Very nice work! Any chance we can get a binary to do inference with, similar to llama.cpp, that is to say, without having to use Python? Thanks.

Slow Inference

I'm running RWKV-14B on 12 GB of system RAM. This is my inference speed: Took 231.591 sec, 2315 ms per token

Is there anything I can do to speed up inference?
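
A rough size estimate helps judge whether the weights fit in RAM at all; if they do not, the system has to page them in from disk, which could explain multi-second tokens. The sketch below is back-of-the-envelope arithmetic only: it ignores state, activations and per-block quantization overhead, and the bits-per-weight figures are illustrative assumptions rather than exact values for any rwkv.cpp format.

```python
def estimate_weight_bytes(n_params, bits_per_weight):
    """Approximate size of the weights alone, in bytes."""
    return n_params * bits_per_weight // 8


def fits_in_ram(n_params, bits_per_weight, ram_bytes):
    """True if the weights alone are smaller than the available RAM."""
    return estimate_weight_bytes(n_params, bits_per_weight) < ram_bytes


if __name__ == "__main__":
    params_14b = 14_000_000_000
    ram_12_gib = 12 * 1024**3
    for label, bits in (("FP16", 16), ("about 5 bits per weight", 5)):
        size_gb = estimate_weight_bytes(params_14b, bits) / 1e9
        verdict = "fits" if fits_in_ram(params_14b, bits, ram_12_gib) else "does not fit"
        print(f"{label}: ~{size_gb:.1f} GB, {verdict} in 12 GiB")
```

By this estimate a 14B model in FP16 needs about 28 GB for weights alone, so on a 12 GB machine only a quantized file has a chance of staying resident in memory.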

Memory error

(rwkv) C:\Users\micro\Downloads\rwkv.cpp>python rwkv\chat_with_bot.py C:\Users\micro\Downloads\raven7B_q.bin
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 5589895568, available 5589886801)
Traceback (most recent call last):
  File "C:\Users\micro\Downloads\rwkv.cpp\rwkv\chat_with_bot.py", line 45, in <module>
    model = rwkv_cpp_model.RWKVModel(library, args.model_path)
  File "C:\Users\micro\Downloads\rwkv.cpp\rwkv\rwkv_cpp_model.py", line 37, in __init__
    self.ctx = self.library.rwkv_init_from_file(model_path, thread_count)
  File "C:\Users\micro\Downloads\rwkv.cpp\rwkv\rwkv_cpp_shared_library.py", line 74, in rwkv_init_from_file
    ptr = self.library.rwkv_init_from_file(model_file_path.encode('utf-8'), ctypes.c_uint32(thread_count))
OSError: exception: access violation writing 0x0000000000000038
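
The ggml message reports both the requested and the available pool size, so the shortfall can be computed directly. A small sketch that parses the line from the log above; the helper name `pool_shortfall` is made up for illustration:

```python
import re

LOG_LINE = (
    "ggml_new_tensor_impl: not enough space in the context's memory pool "
    "(needed 5589895568, available 5589886801)"
)


def pool_shortfall(line):
    """Return needed minus available bytes, or None if the line does not match."""
    match = re.search(r"needed (\d+), available (\d+)", line)
    if match is None:
        return None
    needed, available = (int(value) for value in match.groups())
    return needed - available


print(pool_shortfall(LOG_LINE))  # prints 8767
```

The pool is short by only 8767 bytes out of roughly 5.6 GB, which suggests the context size was estimated slightly too low rather than the machine running out of RAM. The access violation that follows is likely a consequence of the failed allocation, though the log alone does not confirm that.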

Python type hint does not work: A: list[int]=[] when running chat_with_bot.py

Should I add some code like below?

from typing import List
A: List[int]=[]

Error

Traceback (most recent call last):
    processed_tokens: list[int] = []
TypeError: 'type' object is not subscriptable
Exception ignored in: <function RWKVModel.__del__ at 0x000001DEFD5F7D30>

environment:

win10
Anaconda3
python 3.8.8
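
Subscripting the built-in list type in annotations (list[int]) is only supported from Python 3.9 onward, which matches the TypeError on Python 3.8.8. A minimal sketch of the typing.List form proposed above, which works on both older and newer versions:

```python
from typing import List

# typing.List can be subscripted on Python 3.8, unlike the built-in list.
processed_tokens: List[int] = []
processed_tokens.append(42)
```

An alternative on Python 3.7 and later is to put `from __future__ import annotations` at the top of the file, which defers evaluation of annotations so that list[int] is never evaluated at runtime. Upgrading to Python 3.9 or newer also removes the problem.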

Is it possible to implement the seq mode for loading prompt?

There is a seq mode in ChatRWKV:

                if seq_mode:
                    if 'cuda' in str(dev) and os.environ["RWKV_CUDA_ON"] == '1':
                        ATT = self.cuda_att_seq if wtype != torch.uint8 else self.cuda_att_seq_i8
                    else:
                        ATT = self.att_seq if wtype != torch.uint8 else self.att_seq_i8
                    FFN = self.ffn_seq if wtype != torch.uint8 else self.ffn_seq_i8
                else:
                    ATT = self.att_one if wtype != torch.uint8 else self.att_one_i8
                    FFN = self.ffn_one if wtype != torch.uint8 else self.ffn_one_i8

Source: https://github.com/BlinkDL/ChatRWKV/blob/e63830f03669c01ff4e567db57420adf2096d06a/rwkv_pip_package/src/rwkv/model.py#L586-L594

It allows processing a sequence of tokens at once, which makes prompt loading faster (not only for the initial prompt, but also for the user's inputs).

Can someone provide a compiled version?

OSError: bin/Release/rwkv.so: undefined symbol: max

python3 rwkv/quantize.py ../RWKV-4-Pile-14B-20230313-ctx8192-test1050.bin ../RWKV-4-Pile-14B-20230313-ctx8192-test1050-Q4_1.bin 3

Traceback (most recent call last):
  File "/home/redboxing/RWKV/rwkv.cpp/rwkv/quantize.py", line 28, in <module>
    main()
  File "/home/redboxing/RWKV/rwkv.cpp/rwkv/quantize.py", line 17, in main
    library = rwkv_cpp_shared_library.load_rwkv_shared_library()
  File "/home/redboxing/RWKV/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 202, in load_rwkv_shared_library
    return RWKVSharedLibrary(path)
  File "/home/redboxing/RWKV/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 29, in __init__
    self.library = ctypes.cdll.LoadLibrary(shared_library_path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: bin/Release/rwkv.so: undefined symbol: max

(Ubuntu x86_64) Segmentation Fault Running Q4_1_O Model

System: Ubuntu 20.04.6 LTS
GCC: 9.4.0
CPU: Intel(R) Xeon(R) Platinum 8358P

Issue:

$ python rwkv/chat_with_bot.py /path/to/models/Raven-14B-v9-Q4.bin 
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
Loading RWKV model
Processing 92 prompt tokens, may take a while
Segmentation fault (core dumped)

Q4_1_O quantization test is not passing (Linux aarch64)

cmake -DBUILD_SHARED_LIBS=ON .
-- The C compiler identification is GNU 11.1.0
-- The CXX compiler identification is GNU 11.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Configuring done
-- Generating done
-- Build files have been written to: /www/rwkv.cpp

But the build then fails:

/www/rwkv.cpp$ cmake --build . --config Release
[ 33%] Building C object CMakeFiles/ggml.dir/ggml.c.o
In file included from /www/rwkv.cpp/ggml.c:4:
/www/rwkv.cpp/ggml.h:836:1: warning: function declaration isn’t a prototype [-Wstrict-prototypes]
  836 | void ggml_run_test_suite();
      | ^~~~
/www/rwkv.cpp/ggml.c: In function ‘dequantize_row_q4_1’:
/www/rwkv.cpp/ggml.c:1086:13: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
 1086 |             const uint16x8_t vi_0 = vmovl_s8(vget_low_u8 (vq));
      |             ^~~~~
/www/rwkv.cpp/ggml.c:1086:46: error: incompatible type for argument 1 of ‘vmovl_s8’
 1086 |             const uint16x8_t vi_0 = vmovl_s8(vget_low_u8 (vq));
      |                                              ^~~~~~~~~~~~~~~~
      |                                              |
      |                                              uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
 7989 | vmovl_s8 (int8x8_t __a)
      |           ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c:1087:46: error: incompatible type for argument 1 of ‘vmovl_s8’
 1087 |             const uint16x8_t vi_1 = vmovl_s8(vget_high_u8(vq));
      |                                              ^~~~~~~~~~~~~~~~
      |                                              |
      |                                              uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
 7989 | vmovl_s8 (int8x8_t __a)
      |           ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c: In function ‘dequantize_row_q4_1_o’:
/www/rwkv.cpp/ggml.c:1304:46: error: incompatible type for argument 1 of ‘vmovl_s8’
 1304 |             const uint16x8_t vi_0 = vmovl_s8(vget_low_u8 (vq));
      |                                              ^~~~~~~~~~~~~~~~
      |                                              |
      |                                              uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
 7989 | vmovl_s8 (int8x8_t __a)
      |           ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c:1305:46: error: incompatible type for argument 1 of ‘vmovl_s8’
 1305 |             const uint16x8_t vi_1 = vmovl_s8(vget_high_u8(vq));
      |                                              ^~~~~~~~~~~~~~~~
      |                                              |
      |                                              uint8x8_t
In file included from /www/rwkv.cpp/ggml.c:193:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:7989:20: note: expected ‘int8x8_t’ but argument is of type ‘uint8x8_t’
 7989 | vmovl_s8 (int8x8_t __a)
      |           ~~~~~~~~~^~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_compute_forward_mul_mat_q4_1_o_f32’:
/www/rwkv.cpp/ggml.c:7144:15: warning: unused variable ‘ne10’ [-Wunused-variable]
 7144 |     const int ne10 = src1->ne[0];
      |               ^~~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_test_quantization’:
/www/rwkv.cpp/ggml.c:11442:60: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11442 |     GGML_TEST_ASSERT(max_result == max_expected, "%f, %f", max_result, max_expected);
      |                                                            ^~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11442:72: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11442 |     GGML_TEST_ASSERT(max_result == max_expected, "%f, %f", max_result, max_expected);
      |                                                                        ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11453:64: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11453 |     GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
      |                                                                ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11453:78: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11453 |     GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
      |                                                                              ^~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11456:60: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11456 |     GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
      |                                                            ^~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11456:72: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11456 |     GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
      |                                                                        ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_test_quantization_q4_1_o’:
/www/rwkv.cpp/ggml.c:11478:64: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11478 |     GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
      |                                                                ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11478:78: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11478 |     GGML_TEST_ASSERT(delta_result == delta_expected, "%f, %f", delta_result, delta_expected);
      |                                                                              ^~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11482:60: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11482 |     GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
      |                                                            ^~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11482:72: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11482 |     GGML_TEST_ASSERT(min_result == min_expected, "%f, %f", min_result, min_expected);
      |                                                                        ^~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11490:80: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11490 |     GGML_TEST_ASSERT(outlier_value_result == outlier_value_expected, "%f, %f", outlier_value_result, outlier_value_expected);
      |                                                                                ^~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11490:102: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11490 |     GGML_TEST_ASSERT(outlier_value_result == outlier_value_expected, "%f, %f", outlier_value_result, outlier_value_expected);
      |                                                                                                      ^~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11506:57: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11506 |         GGML_TEST_ASSERT(diff <= 1.0F, "%d: %f, %f", i, actual, expected);
      |                                                         ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11506:65: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11506 |         GGML_TEST_ASSERT(diff <= 1.0F, "%d: %f, %f", i, actual, expected);
      |                                                                 ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c: In function ‘ggml_run_test_suite’:
/www/rwkv.cpp/ggml.c:11542:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11542 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 0, 2.7322F);
      |                                            ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11542:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11542 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 0, 2.7322F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11542:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11542 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 0, 2.7322F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11543:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11543 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 1, 2.8531F);
      |                                            ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11543:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11543 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 1, 2.8531F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11543:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11543 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 1, 2.8531F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11544:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11544 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 2, 0.6466F);
      |                                            ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11544:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11544 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 2, 0.6466F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11544:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11544 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 2, 0.6466F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11545:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11545 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 3, 0.4974F);
      |                                            ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11545:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11545 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 3, 0.4974F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11545:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11545 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 3, 0.4974F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11546:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11546 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 4, 5.6463F);
      |                                            ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11546:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11546 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 4, 5.6463F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11546:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11546 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 4, 5.6463F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11547:44: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11547 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 5, 0.9564F);
      |                                            ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11547:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11547 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 5, 0.9564F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11547:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11547 |     GGML_TEST_ASSERT_ELEMENT_F32(exp_a, 5, 0.9564F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11556:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11556 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 0, -0.0051F);
      |                                                  ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11556:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11556 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 0, -0.0051F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11556:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11556 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 0, -0.0051F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11557:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11557 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 1, -0.0484F);
      |                                                  ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11557:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11557 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 1, -0.0484F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11557:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11557 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 1, -0.0484F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11558:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11558 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 2, 1.4361F);
      |                                                  ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11558:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11558 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 2, 1.4361F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11558:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11558 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 2, 1.4361F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11559:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11559 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 3, 1.6984F);
      |                                                  ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11559:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11559 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 3, 1.6984F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11559:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11559 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 3, 1.6984F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11560:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11560 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 4, -0.7310F);
      |                                                  ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11560:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11560 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 4, -0.7310F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11560:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11560 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 4, -0.7310F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11561:50: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11561 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 5, 1.0446F);
      |                                                  ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11561:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11561 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 5, 1.0446F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11561:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11561 |     GGML_TEST_ASSERT_ELEMENT_F32(one_minus_a, 5, 1.0446F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11570:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11570 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 0, 0.7321F);
      |                                                ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11570:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11570 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 0, 0.7321F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11570:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11570 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 0, 0.7321F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11571:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11571 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 1, 0.7405F);
      |                                                ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11571:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11571 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 1, 0.7405F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11571:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11571 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 1, 0.7405F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11572:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11572 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 2, 0.3927F);
      |                                                ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11572:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11572 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 2, 0.3927F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11572:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11572 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 2, 0.3927F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11573:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11573 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 3, 0.3322F);
      |                                                ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11573:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11573 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 3, 0.3322F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11573:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11573 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 3, 0.3322F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11574:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11574 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 4, 0.8495F);
      |                                                ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11574:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11574 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 4, 0.8495F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11574:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11574 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 4, 0.8495F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11575:48: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11575 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 5, 0.4889F);
      |                                                ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11575:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11575 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 5, 0.4889F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11575:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11575 |     GGML_TEST_ASSERT_ELEMENT_F32(sigmoid_a, 5, 0.4889F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11584:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11584 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 0, 1.0051F);
      |                                              ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11584:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11584 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 0, 1.0051F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11584:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11584 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 0, 1.0051F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11585:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11585 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 1, 1.0484F);
      |                                              ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11585:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11585 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 1, 1.0484F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11585:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11585 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 1, 1.0484F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11586:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11586 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 2, 1.6200F);
      |                                              ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11586:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11586 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 2, 1.6200F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11586:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11586 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 2, 1.6200F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11587:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11587 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 3, 0.5156F);
      |                                              ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11587:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11587 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 3, 0.5156F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11587:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11587 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 3, 0.5156F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11588:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11588 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 4, 1.7310F);
      |                                              ^~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11588:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11588 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 4, 1.7310F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11588:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11588 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 4, 1.7310F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11589:46: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11589 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 5, -0.0446F);
      |                                              ^~~~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11589:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11589 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 5, -0.0446F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11425:134: warning: implicit conversion from ‘float’ to ‘double’ when passing argument to function [-Wdouble-promotion]
11425 |         GGML_TEST_ASSERT(fabsf(actual - expected_value) <= 0.0001F, "At %s[%d]: expected %f, actual %f", #tensor, i, expected_value, actual);\
      |                                                                                                                                      ^~~~~~
/www/rwkv.cpp/ggml.c:11417:29: note: in definition of macro ‘GGML_TEST_ASSERT’
11417 |             fprintf(stderr, __VA_ARGS__);\
      |                             ^~~~~~~~~~~
/www/rwkv.cpp/ggml.c:11589:5: note: in expansion of macro ‘GGML_TEST_ASSERT_ELEMENT_F32’
11589 |     GGML_TEST_ASSERT_ELEMENT_F32(max_a_b, 5, -0.0446F);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
CMakeFiles/ggml.dir/build.make:75: recipe for target 'CMakeFiles/ggml.dir/ggml.c.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml.c.o] Error 1
CMakeFiles/Makefile2:84: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

"+++" continuous writing exception

With the novel model, the "+++" command produces no output after several rounds of continued writing, and other instructions produce no output either. I can only reset the model.
decoded: str = tokenizer.decode(accumulated_tokens)
if '\uFFFD' not in decoded:
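The two lines above hold back output while the decoded text still contains U+FFFD (the replacement character), which appears when a multi-byte UTF-8 character is split across tokens. A minimal self-contained sketch of that idea follows; `ByteTokenizer` and `stream_text` are hypothetical names, and the stand-in tokenizer simply treats every token as one byte.

```python
from typing import Iterable, Iterator, List


class ByteTokenizer:
    """Hypothetical stand-in tokenizer: every token id is a single byte."""

    def decode(self, tokens: List[int]) -> str:
        # errors="replace" turns an incomplete byte sequence into U+FFFD.
        return bytes(tokens).decode("utf-8", errors="replace")


def stream_text(tokenizer: ByteTokenizer, token_stream: Iterable[int]) -> Iterator[str]:
    """Yield text only once the accumulated tokens decode cleanly."""
    accumulated_tokens: List[int] = []
    for token in token_stream:
        accumulated_tokens.append(token)
        decoded: str = tokenizer.decode(accumulated_tokens)
        if '\uFFFD' not in decoded:
            # Complete characters only: emit them and start a new buffer.
            yield decoded
            accumulated_tokens = []


# "é" is two bytes in UTF-8, so it is emitted only after both have arrived.
pieces = list(stream_text(ByteTokenizer(), "né".encode("utf-8")))
print(pieces)
```

If the model never completes a character, nothing more is emitted, so a real loop would probably also want a cap on how many tokens it buffers.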

Compilation issue on Mac

Hi 👋

First of all, amazing project :)

To clarify, I don't have a Mac to double-check this, and our CI doesn't seem to catch this bug. I'm the author of LocalAI, and people have been reporting since the last rwkv.cpp release that compilation is broken on Mac (mudler/LocalAI#411). Any chance you can reproduce this locally?

Thanks!

generate_completions feedback

generate_completions seems to be very bad at narration for any meaningful length (past about 200 words), often hallucinating more or repeating passages (for the 8GB fp16quanti4 14B Raven instruct 6 model).

also general usage could be easily improved with a while loop

It would be good to implement a simple while loop, so that the model stays loaded in RAM if someone wants to generate more, along with some of the repetition penalty logic that BlinkDL uses in ChatRWKV (GEN_alpha_presence = 0.2 for the presence penalty and GEN_alpha_frequency = 0.2 for the frequency penalty).
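As a rough illustration of the penalty idea mentioned above, here is a minimal sketch assuming the common formulation in which each already-generated token has its logit reduced by the presence penalty plus the frequency penalty times its count. The function name `apply_repetition_penalty` is hypothetical, and this is not taken from the ChatRWKV source.

```python
from collections import Counter
from typing import List

GEN_alpha_presence = 0.2   # Presence Penalty (value quoted above)
GEN_alpha_frequency = 0.2  # Frequency Penalty (value quoted above)


def apply_repetition_penalty(logits: List[float], generated: List[int]) -> List[float]:
    """Return a copy of logits with already-generated tokens penalized."""
    occurrence = Counter(generated)
    penalized = list(logits)
    for token, count in occurrence.items():
        # A flat cost for having appeared at all, plus a cost per occurrence.
        penalized[token] -= GEN_alpha_presence + count * GEN_alpha_frequency
    return penalized


logits = [1.0, 2.0, 3.0]
# Token 2 was generated twice and token 0 once; token 1 is untouched.
penalized = apply_repetition_penalty(logits, [2, 2, 0])
print(penalized)
```

The penalized logits would then go to the sampler in place of the raw ones on each step of the generation loop.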

It's pretty fast on AVX2! This is an awesome repo, and thank you for your work.

chat room

Hi, super cool looking project.

Is there going to be any way to exchange ideas, such as GitHub Discussions or a Discord server?
Or did I miss something?

Crash on an `endbr64` instruction.

My build crashes with "Illegal Instruction" when running inference with a model.
I debugged it, and it seems to crash on an endbr64 instruction. I think my CPU doesn't support this instruction set.
Is there a build option to turn this instruction set off?

Version: Master, commit e84c446d9533dabef2d8d60735d5924db63362ff

Command to reproduce
python rwkv/chat_with_bot.py ../models/xxxxxxx.bin

It crashed with "Illegal Instruction"

I debugged the program:

> gdb python 
(gdb) handle SIGILL stop
(gdb) run rwkv/chat_with_bot.py ../models/xxxx.bin
...
[New Thread 0x7fff6fa49640 (LWP 738136)]
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
Loading RWKV model

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffde693135 in ggml_init () from /*****/rwkv.cpp/librwkv.so
(gdb) disassemble
Dump of assembler code for function ggml_init:
   0x00007fffde692fd0 <+0>:	endbr64 
   0x00007fffde692fd4 <+4>:	push   %r15
   0x00007fffde692fd6 <+6>:	mov    $0x1,%eax
   0x00007fffde692fdb <+11>:	push   %r14
...

jetson orin build failure

% cmake . -DRWKV_CUBLAS=ON
-- GGML CUDA sources found, configuring CUDA architecture
-- Configuring done (2.4s)
CMake Error at CMakeLists.txt:250 (add_library):
Cannot find source file:

/tmp/rwkv.cpp/ggml/src/ggml.c

Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
.hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc

CMake Error at CMakeLists.txt:250 (add_library):
No SOURCES given to target: ggml

CMake Generate step failed. Build files cannot be regenerated correctly.

rwkv.cpp server

Hi @saharNooby, first off, amazing work in this repo. I've been looking for a CPU implementation of RWKV to experiment with the pre-trained models (I don't have a large GPU).

I've put together a basic port of my OpenAI-compatible webserver from llama-cpp-python and tested it on Linux with your library and the RWKV Raven 3B model in f16, q4_0, and q4_1 (pictured below). I'm going to try some larger models this weekend to test performance and quality. The cool thing about exposing the model through this server is that it opens the project up to any OpenAI client (langchain, chat UIs, multi-language client libraries).

Let me know if you want me to open a PR to merge this in, and if so, the best place to put it.

Cheers, and again great work!

How can I build rwkv.cpp for Android?

Thank you for this promising work.

Is there an Android build like this one?
https://github.com/ggerganov/llama.cpp#android

I compiled following the Linux instructions, and everything seemed OK, but when I ran the Python file, this error occurred:

Can we use Q4_1 for some of the matrices?

Thanks for the great work. Maybe we can use Q4_1 for some of the matrices? (and Q4_1_O for others)

Add CuBLAS support

Basic Samplers?

Hey, I have noticed that this doesn't seem to contain samplers in C. Would they be difficult to implement? Why not just copy the llama samplers? Likely a stupid question, sorry; I am not a C++ or ggml pro.
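For reference, a basic temperature plus top-p (nucleus) sampler is short. The sketch below is written in Python rather than C and is not the llama.cpp implementation; the function names and default values are illustrative assumptions.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> List[float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]


def sample(logits: List[float], temperature: float = 0.8, top_p: float = 0.5,
           rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    probs = top_p_filter(softmax(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


filtered = top_p_filter([0.5, 0.3, 0.2], 0.7)
print(filtered)
print(sample([0.0, 10.0, 0.0]))
```

A C version would follow the same three steps (scale, sort and truncate, draw), operating on the logits buffer returned by the library.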

Consider uploading some quantized checkpoints to Hugging Face

Correct me if I'm wrong, but quantizing requires loading the models in their unquantized form (as per torch.load in https://github.com/saharNooby/rwkv.cpp/blob/master/rwkv/convert_pytorch_to_ggml.py, line 126), not to mention how much more bandwidth the unquantized models take to download.

WARNING: Unused parameter in LoRA state dict

I converted an RWKV model checkpoint in PyTorch format to an rwkv.cpp compatible file using convert_pytorch_to_ggml.py.
I obtained a LoRA checkpoint with https://github.com/Blealtan/RWKV-LM-LoRA.
I merged the LoRA checkpoint in PyTorch format (.pth) into the rwkv.cpp model file using merge_lora_into_ggml.py.
Warnings like "Unused parameter in LoRA state dict blocks.13.att.receptance.lora_B" were printed during the merge, for each of these parameters: att.key.lora_A, att.value.lora_A, att.receptance.lora_A, ffn.key.lora_A, ffn.receptance.lora_A, ffn.value.lora_A, att.key.lora_B, att.value.lora_B, att.receptance.lora_B, ffn.key.lora_B, ffn.receptance.lora_B, ffn.value.lora_B.
The merged model performs poorly and does not reflect the effect of the LoRA.
Why does this happen?
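For context, the standard LoRA merge adds a scaled low-rank product to each base weight: W' = W + (alpha / r) * B @ A. The sketch below shows that arithmetic in plain Python under the usual shape conventions (A is r x in, B is out x r). It illustrates the general technique only and is not the code in merge_lora_into_ggml.py; `merge_lora` and `matmul` are hypothetical helpers. If the lora_A and lora_B tensors are skipped as "unused", this update is never applied, and the output would be expected to behave like the base model.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, lora_alpha: float) -> Matrix:
    """Return weight + (lora_alpha / rank) * (lora_b @ lora_a)."""
    rank = len(lora_a)              # A has shape (rank, in_features)
    scale = lora_alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, rank) @ (rank, in) -> (out, in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # rank 1, in_features 2
lora_b = [[1.0], [0.0]]  # out_features 2, rank 1
merged = merge_lora(weight, lora_a, lora_b, lora_alpha=2.0)
print(merged)
```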

ggml_new_tensor_impl: not enough space in the context's memory pool

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 4625438848, available 4538236928)
Traceback (most recent call last):
File "D:\AI\rwkv.cpp\rwkv\chat_with_bot.py", line 4, in <module>
model = rwkv_cpp_model.RWKVModel(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI\rwkv.cpp\rwkv\rwkv_cpp_model.py", line 37, in __init__
self.ctx = self.library.rwkv_init_from_file(model_path, thread_count)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI\rwkv.cpp\rwkv\rwkv_cpp_shared_library.py", line 74, in rwkv_init_from_file
ptr = self.library.rwkv_init_from_file(model_file_path.encode('utf-8'), ctypes.c_uint32(thread_count))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: exception: access violation reading 0x0000000000000008

Anyone know how to fix it?
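The two numbers in the error message show how far the allocation fell short. A small worked calculation, using only the values quoted in the message above:

```python
needed = 4625438848     # bytes, from the error message
available = 4538236928  # bytes, from the error message

MIB = 1024 * 1024

shortfall = needed - available
print(f"needed:    {needed / MIB:.1f} MiB")
print(f"available: {available / MIB:.1f} MiB")
print(f"shortfall: {shortfall} bytes ({shortfall / MIB:.1f} MiB)")
```

The pool is short by 87201920 bytes, roughly 2% of what was requested. That suggests the memory estimate for this model file is slightly too low, rather than the machine being far out of RAM.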

Illegal instruction (Intel N4200, Linux Ubuntu Jaunty)

...even after building with
cmake -DRWKV_AVX=OFF -DRWKV_AVX2=OFF -DRWKV_AVX512=OFF -DRWKV_FMA=OFF -DBUILD_SHARED_LIBS=ON .
cmake --build . --config Release

This is on an N4200 CPU (it has SSE2, SSSE3, SSE4_1 and SSE4_2, but not FMA or any AVX variant) running Ubuntu Jaunty (in a Singularity container).

(The issue is unlikely to be due to Singularity: apptainer/singularity#5795)

How to increase state size?

First of all, amazing work on this implementation!

I wanted to really test pushing the context length to its limits (using the chat example), but noticed that the state is always the same size when saved, even after it had gone through 50k tokens' worth of context.

How do I increase the size of the state available to accommodate the larger context?

Any help would be appreciated!
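
For context on the observation above: RWKV is a recurrent model, so its state is a fixed-size buffer that is overwritten on every token rather than a cache that grows with the context. The sketch below shows the arithmetic under an assumption that is not stated in this issue: an RWKV-4 style layout with 5 vectors of n_embed floats per layer. The helper names and the example sizes (12 layers, 768 embedding width) are illustrative only.

```python
def state_element_count(n_layer: int, n_embed: int, vectors_per_layer: int = 5) -> int:
    """Number of floats in the recurrent state (assumed RWKV-4 style layout)."""
    return n_layer * vectors_per_layer * n_embed


def state_size_bytes(n_layer: int, n_embed: int, bytes_per_element: int = 4) -> int:
    """Size of the state buffer in bytes, assuming FP32 elements by default."""
    return state_element_count(n_layer, n_embed) * bytes_per_element


# Hypothetical small model: 12 layers, embedding width 768.
for tokens_seen in (1, 1_000, 50_000):
    # The token count is deliberately not an input: the size depends only
    # on the model dimensions, which is why the saved state never grows.
    print(tokens_seen, state_size_bytes(12, 768))
```

Under this assumption, a longer context changes the values stored in the state, not its size.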

Add MMAP support

Porting over the mmap support from https://github.com/ggerganov/llama.cpp would reduce the minimum RAM requirement for the model.

I know, you know, haha - just putting it here for me to keep track of (if I get around to picking it up, to optimize the RWKV-cpp-node bindings)

Q4_0 and Q4_1 quantization breaks RWKV due to weight/activation outliers

I've been measuring loss and perplexity of different sizes and data types on a very small private dataset:

rwkv.cpp-169M-Q4_0.bin                      averages: loss [3.629], perplexity  37.691
rwkv.cpp-169M-Q4_1.bin,                     averages: loss [3.163], perplexity  23.642
rwkv.cpp-169M-float16.bin,                  averages: loss [2.699], perplexity  14.861
rwkv.cpp-169M.bin,                          averages: loss [2.699], perplexity  14.861

RWKV-4-Pile-430M-20220808-8066-q4_0.bin,    averages: loss [2.911], perplexity  18.375
RWKV-4-Pile-430M-20220808-8066-q4_1.bin,    averages: loss [2.631], perplexity  13.885
RWKV-4-Pile-430M-20220808-8066-FP16.bin,    averages: loss [2.377], perplexity  10.777
RWKV-4-Pile-430M-20220808-8066-FP32.bin,    averages: loss [2.377], perplexity  10.777

RWKV-4-Pile-1B5-20220929-ctx4096-Q4_0.bin,  averages: loss [3.079], perplexity  21.745
RWKV-4-Pile-1B5-20220929-ctx4096-Q4_1.bin,  averages: loss [2.655], perplexity  14.231
RWKV-4-Pile-1B5-20220929-ctx4096-FP16.bin,  averages: loss [2.060], perplexity   7.847
RWKV-4-Pile-1B5-20220929-ctx4096-FP32.bin,  averages: loss [2.060], perplexity   7.847

RWKV-4-Pile-3B-20221110-ctx4096-Q4_0.bin,   averages: loss [4.689], perplexity 108.724
RWKV-4-Pile-3B-20221110-ctx4096-Q4_1.bin,   averages: loss [2.916], perplexity  18.475
RWKV-4-Pile-3B-20221110-ctx4096-FP16.bin,   averages: loss [2.067], perplexity   7.901

RWKV-4-Pile-7B-20230109-ctx4096-Q4_0.bin,   averages: loss [6.296], perplexity 542.322
RWKV-4-Pile-7B-20230109-ctx4096-Q4_1.bin,   averages: loss [3.017], perplexity  20.423

The measuring method may not be entirely correct, but these huge losses and perplexities really do show in the quality of generated text -- it is almost incoherent.

Of course, we need proper measuring on WikiText; but it would be very slow on my hardware, and WikiText is not representative of my use case.
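
For readers comparing the two columns above, here is a minimal sketch of the usual relationship between them: perplexity is the exponential of the mean per-token cross-entropy loss (in nats). The function name and the sample losses are made up for illustration, and the exact averaging used to produce the numbers in this issue is not stated, so small differences from the table are expected.

```python
import math


def perplexity_from_losses(token_losses):
    """Perplexity = exp(mean per-token cross-entropy loss, in nats)."""
    if not token_losses:
        raise ValueError("need at least one token loss")
    mean_loss = sum(token_losses) / len(token_losses)
    return math.exp(mean_loss)


# Made-up per-token losses, only to show the calculation.
sample_losses = [2.5, 2.7, 2.9]
print(f"mean loss {sum(sample_losses) / len(sample_losses):.3f}, "
      f"perplexity {perplexity_from_losses(sample_losses):.3f}")
```

Because of the exponential, a loss increase of about 0.7 roughly doubles the perplexity, which is why the Q4_0 rows look so much worse than the FP16 rows.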

An interesting thing to note is the min and max values of the RWKV matrix weights:

169M: -13.8750 14.0000
430M: -14.5000 14.9375
1.5B: -27.2500 27.3750
3B: -12.6875 14.1250

For comparison, LLaMA 7B min and max values are around -2.5 and 2.5!

As a next step, I'll try to determine whether these huge values are outliers, or whether most weights really are distributed in this range.

I guess we need an alternative quantization scheme for RWKV.
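
To illustrate why a few large weights could hurt 4-bit quantization, here is a minimal sketch. It uses a simplified symmetric absmax scheme with one scale per block of 32 values; this is an assumption made for illustration and is not ggml's exact Q4_0 or Q4_1 layout. The block contents are synthetic, with 14.0 chosen to resemble the extremes listed above.

```python
def quantize_dequantize_absmax(block, bits=4):
    """Quantize a block with one absmax-derived scale, then dequantize it."""
    levels = 2 ** (bits - 1) - 1  # 7 positive levels for 4 bits
    absmax = max(abs(x) for x in block)
    if absmax == 0.0:
        return list(block)
    scale = absmax / levels
    return [round(x / scale) * scale for x in block]


def mean_abs_error(block, bits=4):
    """Mean absolute error introduced by the quantize/dequantize round trip."""
    restored = quantize_dequantize_absmax(block, bits)
    return sum(abs(a - b) for a, b in zip(block, restored)) / len(block)


# Synthetic block of 32 weights spread over [-1, 1).
typical_block = [(i - 16) / 16 for i in range(32)]
# Same block, but the last weight is replaced by a single large outlier.
outlier_block = typical_block[:-1] + [14.0]

err_typical = mean_abs_error(typical_block)
err_outlier = mean_abs_error(outlier_block)
print(f"typical block error: {err_typical:.4f}")
print(f"outlier block error: {err_outlier:.4f}")
```

In this sketch the single outlier stretches the scale so far that the remaining small weights collapse to zero, which is one plausible mechanism for the quality loss reported here.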

Training, in .cpp, on one machine?

This is a really great package. I don't yet understand the training mathematics, however. In order to get a system that integrates with legacy C++ code and runs fast, ideally faster than a Python bridge, how easy would it be to slap together a baby trainer demo? Something similar to the (pick one) character-based / word-based tiny-shakespeare / OpenWebText training examples in https://github.com/karpathy/nanoGPT/tree/master/data? This would be really useful and great if it could be done.

question about the quantization

How to generate the quantized INT4, INT5 and INT8 model?

Do you use GPTQ/RPTQ or normal per-tensor/per-channel PTQ? For the quantized INT8 model, do you use int8 @ int8 -> int32 cuBLAS?

AssertionError: rwkv_init_from_file failed, check stderr

Hi,

I am getting this error when trying to load the model. Can someone please help me? I downloaded the quantized model from Hugging Face.

File e:\minions\RWKV\rwkv.cpp\rwkv\rwkv_cpp_shared_library.py:90, in RWKVSharedLibrary.rwkv_init_from_file(self, model_file_path, thread_count)
     74 """
     75 Loads the model from a file and prepares it for inference.
     76 Throws an exception in case of any error. Error messages would be printed to stderr.
   (...)
     85     Count of layers to load on gpu, must be positive only enabled with cuBLAS.
     86 """
     88 ptr = self.library.rwkv_init_from_file(model_file_path.encode('utf-8'),
     89                                        ctypes.c_uint32(thread_count))
---> 90 assert ptr is not None, 'rwkv_init_from_file failed, check stderr'
     91 return RWKVContext(ptr)

AssertionError: rwkv_init_from_file failed, check stderr

Typo in README.md

There is a typo here:

python rwkv\convert_rwkv_to_ggml.py

# Windows
python rwkv\convert_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float16

should be

# Windows
python rwkv\convert_pytorch_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float16

Get error status and message without stderr

Could some GetLastError-like function be added, with error messages or error codes, so it's possible to get error messages without redirecting stderr?

Library Fails to Build

cmake .
cmake --build . --config Release
-- The C compiler identification is GNU 12.2.1
-- The CXX compiler identification is GNU 12.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Error at CMakeLists.txt:215 (add_subdirectory):
The source directory

/home/rexommendation/Downloads/rwkv.cpp-master-a3178b2/ggml

does not contain a CMakeLists.txt file.

CMake Error at CMakeLists.txt:218 (set_target_properties):
set_target_properties Can not find target to add properties to: ggml

-- Configuring incomplete, errors occurred!
make: Makefile: No such file or directory
make: *** No rule to make target 'Makefile'. Stop.

AssertionError: xxxxxxxxx.bin is not a file

Do you know how to fix it?

AttributeError: 'Tensor' object has no attribute 'untyped_storage' (PyTorch version)

I'm running chat_with_bot.py with torch==1.13, but an error occurs:

Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
Processing 185 prompt tokens, may take a while
Traceback (most recent call last):
  File "rwkv/chat_with_bot.py", line 115, in <module>
    process_tokens(split_last_end_of_line(tokenizer.encode(init_prompt).ids))
  File "rwkv/chat_with_bot.py", line 81, in process_tokens
    logits, state = model.eval(_token, state, state, logits)
  File "/root/RWKVcpu/rwkv/rwkv_cpp_model.py", line 102, in eval
    state_out.untyped_storage().data_ptr(),
AttributeError: 'Tensor' object has no attribute 'untyped_storage'

It looks like there is a version conflict. I need help.
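
One possible workaround for the version mismatch described above is to pick the storage accessor at runtime. The sketch below is an assumption about how such a shim could look, not the project's actual fix: storage_data_ptr is a hypothetical helper, and the two small classes are stand-ins so the example runs without PyTorch installed. It relies on newer PyTorch tensors exposing untyped_storage() and older ones (such as 1.13) exposing only storage().

```python
def storage_data_ptr(tensor):
    """Return the address of the tensor's underlying storage.

    Prefers untyped_storage() when the tensor has it and falls back to
    storage() on older PyTorch versions where untyped_storage() is missing.
    """
    if hasattr(tensor, "untyped_storage"):
        return tensor.untyped_storage().data_ptr()
    return tensor.storage().data_ptr()


class _FakeStorage:
    """Stand-in for a PyTorch storage object (illustration only)."""

    def __init__(self, address):
        self._address = address

    def data_ptr(self):
        return self._address


class OldStyleTensor:
    """Mimics a tensor that only has storage()."""

    def storage(self):
        return _FakeStorage(0x1000)


class NewStyleTensor:
    """Mimics a tensor that also has untyped_storage()."""

    def storage(self):
        return _FakeStorage(0x1000)

    def untyped_storage(self):
        return _FakeStorage(0x2000)


print(hex(storage_data_ptr(OldStyleTensor())))
print(hex(storage_data_ptr(NewStyleTensor())))
```

With real tensors, the call sites in rwkv_cpp_model.py shown in the traceback would pass state_out and similar tensors to such a helper; upgrading PyTorch is the other obvious option.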

OSError: bin/Release/rwkv.so: undefined symbol: max

python3 rwkv/quantize.py ../RWKV-4-Pile-14B-20230313-ctx8192-test1050.bin ../RWKV-4-Pile-14B-20230313-ctx8192-test1050-Q4_1.bin 3

"Unexpected end of data" when decoding partial Unicode characters with World tokenizer

Model: RWKV World 3B or 7B, Q8_0

Input:
Translate the following text into Korean: "Hello"

Output:
File "/www/wenda-pi/llms/rwkvcpp/rwkv_tokenizer.py", line 94, in decode
return self.decodeBytes(tokens).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data

After modifying the line above to
return self.decodeBytes(tokens).decode('utf-8','ignore')

the output is
안하세요.
but the correct output should be
안녕하세요

The character 녕 is lost.

With the RWKV World FP16 model, the output is correct.
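
A character can be lost this way when its UTF-8 bytes are split across separately decoded pieces and 'ignore' silently drops the fragments. Whether that is exactly what happens in this report is an assumption. The sketch below shows one way to avoid it, using Python's standard incremental decoder to hold back incomplete byte sequences until the rest arrives. The stream_decode helper and the chunk boundaries are illustrative and not part of rwkv_tokenizer.py.

```python
import codecs


def stream_decode(byte_chunks):
    """Yield decoded text, buffering incomplete UTF-8 sequences between chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ends mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = "안녕하세요".encode("utf-8")
# Split in the middle of the first and second characters on purpose.
chunks = [data[:2], data[2:4], data[4:]]

buffered = "".join(stream_decode(chunks))
lossy = "".join(chunk.decode("utf-8", "ignore") for chunk in chunks)
print("buffered:", buffered)
print("lossy:   ", lossy)
```

The buffered variant reproduces the full string, while decoding each chunk with 'ignore' drops every character that straddles a chunk boundary.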

"Unsupported quantization type" when quantizing model

After successfully converting the RWKV model, quantizing it does not work. Here is the error:

Unsupported quantization type 84426624
/Users/dac/Documents/_AI/rwkv.cpp/rwkv.cpp:583: q_type == 2 || q_type == 3 || q_type == 4 || q_type == 5 || q_type == 6
Traceback (most recent call last):
  File "/Users/dac/Documents/_AI/rwkv.cpp/rwkv/quantize.py", line 31, in <module>
    main()
  File "/Users/dac/Documents/_AI/rwkv.cpp/rwkv/quantize.py", line 22, in main
    library.rwkv_quantize_model_file(
  File "/Users/dac/Documents/_AI/rwkv.cpp/rwkv/rwkv_cpp_shared_library.py", line 177, in rwkv_quantize_model_file
    assert self.library.rwkv_quantize_model_file(
AssertionError: rwkv_quantize_model_file failed, check stderr

I have tried Q8_0, Q5_0, Q5_1 and Q4_2, and all result in the same error.

chat_with_bot.py does not work well with Raven v8

Hi @saharNooby, I really enjoy the project. However, chat_with_bot.py does not work well with the Raven models (see below). I think this may be due to the prompt. Based on the demo, I wrote a new script that fits my needs. I put it here as it may help others. Please correct it and comment.

https://gist.github.com/zklhp/a60c4501060383d1cb99b4b6e24109d1

Thank you very much for your input.

With chat_with_bot.py, the model does not really follow instructions. When you press Enter, it talks to itself. The rwkv.cpp-14B.bin is converted from RWKV-4-Raven-14B-v8-Eng-20230408-ctx4096.pth.

$ python -i rwkv/chat_with_bot.py rwkv.cpp-14B.bin
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
Processing 92 prompt tokens, may take a while
/ssd1/ai/rwkv.cpp/rwkv/rwkv_cpp_model.py:100: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  state_out.storage().data_ptr(),
/ssd1/ai/rwkv.cpp/rwkv/rwkv_cpp_model.py:101: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  logits_out.storage().data_ptr()
/ssd1/ai/rwkv.cpp/rwkv/rwkv_cpp_model.py:82: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  state_in_ptr = state_in.storage().data_ptr()

Chat initialized! Write something and press Enter.
Tell me about ravens.
> Bob: Ravens are large, black birds of the family Corvidae, known for their intelligence, keen eyesight, and ability to mimic human speech.
Write a song about ravens.
> Bob: Once upon a time, there was a group of ravens who lived in a big, old tree. They were all very different, but they all loved to sing. One day, they decided to form a band and sing together. They were the best band in the forest, and they sang all night long.

> Bob: Tell me about the history of the Beatles.

> Bob: Tell me about the history of the Beatles.

> Bob: Tell me about the history of the Beatles.

With my script, it looks like this. Note that I added two functions.

The startup messages are omitted.

Chat initialized! Write something and press Enter.
- Use '+' to start a new dialog.
- To fill the input, use '\' at the end of line.

> Write a song about ravens.
# Response: Verse 1:
Birds of the sky,
Birdies of the earth,
Black and white,
Ravens, in the sky.
Verse 2:
Cawing and calling,
Stalking the ground,
Birds of prey,
Ravens, in the sky.
Chorus:
Ravens,
Cawing and flying,
Black and white,
In the sky.
Verse 3:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 4:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 5:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 6:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Verse 7:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Chorus:
Ravens,
Cawing and singing,
Black and white,
In the sky.
Ravens,
Cawing and singing,
Black and white,
In the sky.
Ravens,
Cawing and singing,
Black and white,
In the sky.
Ravens,
Cawing and singing,
Black and white,
In the sky.
> + Write a song about ravens.
Open a new dialog.
# Response: Verse 1:
Listen to the ravens, they're calling so loud,
Cawing and cawing, spreading their wings so high.
With a croak and a croak, a flutter and a flutter,
They're telling a tale, of secrets so important.
Chorus:
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their stories so far.
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever, I'll be there.
Verse 2:
Cawing and cawing, telling their tales,
Ravens, ravens, soaring and singing.
In the sky, so high, soaring and singing,
Cawing and cawing, telling their tales so proud.
Chorus:
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their tales so far.
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever I'll be there.
Bridge:
Ravens, ravens, living in the sky,
Cawing and cawing, telling their tales so far.
Ravens, ravens, living in the sky,
Cawing and cawing, forever, forever I'll be there.
Chorus:
Ravens, ravens, living in the sky,
Cawing and cawing, telling their tales so far.
Ravens, ravens, living in the sky,
Cawing and cawing, forever, forever I'll be there.
> Extend the following song\
> Cawing and cawing, forever, forever I'll be there.
# Response: Cawing and cawing, forever, forever I'll be there,
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their tales so loud,
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever I'll be there.
Cawing and cawing, forever, forever I'll be there.
> Extend the following song\
> Cawing and cawing, forever, forever I'll be there.
# Response: Cawing and cawing, forever, forever I'll be there,
Ravens, ravens, flying in the sky,
Cawing and cawing, telling their tales so loud,
Ravens, ravens, living in the wind,
Cawing and cawing, forever, forever I'll be there.
Cawing and cawing, forever, forever I'll be there.

Conversion seems to support only float16/float32, not quantized formats

On Windows 11, installed per the instructions, conversion seems to support only float16/float32, not quantized formats.

~\src\rwkv.cpp> python rwkv\convert_pytorch_to_ggml.py RWKV-4-Raven-14B-v12-Eng98%-Other2%-20230523-ctx8192.pth Q8_0_RWKV-4-Raven-14B-v12.bin Q8_0
usage: convert_pytorch_to_ggml.py [-h] src_path dest_path {float16,float32}
convert_pytorch_to_ggml.py: error: argument data_type: invalid choice: 'Q8_0' (choose from 'float16', 'float32')

CMake Error

When I ran "cmake --build . --config Release", an error occurred:
"ggml/src/CMakeFiles/ggml.dir/build.make:75: recipe for target 'ggml/src/CMakeFiles/ggml.dir/ggml.c.o' failed
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/ggml.c.o] Error 1
CMakeFiles/Makefile2:164: recipe for target 'ggml/src/CMakeFiles/ggml.dir/all' failed
make[1]: *** [ggml/src/CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2"
Do you know how to fix it?

Apple Silicon : mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')

Because the instructions for Apple Silicon in README.md have been applied, CMAKE_SYSTEM_PROCESSOR is reported as arm64 by cmake .
However, cmake . prints the warning below:

-- Accelerate framework found
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- CMAKE_SYSTEM_PROCESSOR: arm64
CMake Warning at ggml/src/CMakeLists.txt:48 (message):
  Your arch is announced as x86_64, but it seems to actually be ARM64.  Not
  fixing that can lead to bad performance.  For more info see:
  https://github.com/ggerganov/whisper.cpp/issues/66#issuecomment-1282546789


-- ARM detected
-- Accelerate framework found
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /my/path/rwkv.cpp

Furthermore, the error below occurs when running python rwkv/generate_completions.py /my/path/rwkv.cpp/models/rwkv-4-raven/ggml-model-q5_1.bin

OSError: dlopen(/my/path/rwkv.cpp/librwkv.dylib, 0x0006): tried: '/my/path/rwkv.cpp/librwkv.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/my/path/rwkv.cpp/librwkv.dylib' (no such file), '/my/path/rwkv.cpp/librwkv.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

For reference, a similar error with llama-cpp-python was solved with arch -arm64 pip install llama-cpp-python --no-cache-dir
The solution might be similar to the one mentioned in the GitHub link from the warning above.

Additionally, arch -arm64 cmake . and arch -arm64 cmake --build . --config Release did not work.

error: variable has incomplete type 'struct stat64'

Compilation fails on M1 macOS; it looks like a type is missing.

-- Accelerate framework found
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- Accelerate framework found
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/jeremyprice/git/go-rwkv.cpp/rwkv.cpp
[ 12%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 25%] Linking C static library libggml.a
[ 25%] Built target ggml
[ 37%] Building CXX object CMakeFiles/rwkv.dir/rwkv.cpp.o

/git/go-rwkv.cpp/rwkv.cpp/rwkv.cpp:471:19: error: variable has incomplete type 'struct stat64'
    struct stat64 file_stat;
                  ^
/git/go-rwkv.cpp/rwkv.cpp/rwkv.cpp:471:12: note: forward declaration of 'stat64'
    struct stat64 file_stat;
           ^

LangChain.js integration

Is it somehow possible to use rwkv.cpp with LangChain.js (https://js.langchain.com/docs/)?

Offload model head when using cuBLAS

Currently, only the matrices of layers are offloaded to the GPU. The head, the biggest matrix in the model, stays on the CPU and is evaluated there.

On my machine, offloading the head of the 14B model in addition to offloading all layers gives 60 ms per token latency vs 70 ms without head offloading.

As always, the hardest question here is API design -- we need to preserve compatibility and not inflate API with new small functions.

chat_with_bot stops when the response contains '\n'

https://github.com/saharNooby/rwkv.cpp/blob/1be9fda24874f87d7e604b76f76359b1bb20cb2b/rwkv/chat_with_bot.py#L84-L86

If the bot returns a '\n', the loop breaks and generation stops.


Error

environment:
