abdeladim-s / pyllamacpp
Python bindings for llama.cpp
Home Page: https://abdeladim-s.github.io/pyllamacpp/
License: MIT License
For example:
https://huggingface.co/TheBloke/gpt4-alpaca-lora-30B-4bit-GGML/tree/main
The main branch should contain a GGML v3 model, but Python crashes when trying to run it.
I can run it fine using the llama_cpp package.
Thanks
Hi,
Thanks for the contribution. I tried the model llama-2-13b-chat.ggmlv3.q8_0.bin on a Mac M1 Max with 64 GB RAM, pyllamacpp==2.4.1, python==3.9, and it works like a charm. Sample code:
from pyllamacpp.model import Model

input = "I want you to act as a physician. Explain what superconductors are."
model_path = './llama-2-13b-chat.ggmlv3.q8_0.bin'
model = Model(model_path)
for token in model.generate(input):
    print(token, end='', flush=True)
$python testLLM13B.py
llama.cpp: loading model from ./llama-2-13b-chat.ggmlv3.q8_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 7 (mostly Q8_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: mem required = 15237.95 MB (+ 3216.00 MB per state)
.
llama_init_from_file: kv self size = 800.00 MB
Explain their properties and the potential benefits they offer.
Superconductors are materials that exhibit zero electrical resistance when cooled below a certain temperature, known as the critical temperature (Tc). This means that superconductors can conduct electricity with perfect efficiency and without any loss of energy.
The properties of superconductors include:
1. Zero electrical resistance: Superconductors have zero electrical resistance when cooled below Tc, which makes them ideal for high-power applications such as power transmission and storage.
2. Perfect diamagnetism: Superconductors expel magnetic fields when cooled below Tc, which makes them useful in MRI machines and other medical applications.
3. Quantum levitation: Superconductors can levitate above a magnet when cooled below Tc, which has potential applications in transportation and energy storage.
4. High-temperature superconductivity: Some superconductors have critical temperatures above the boiling point of liquid nitrogen (77 K), making them more practical for real-world applications.
The potential benefits of superconductors include:
1. More efficient power transmission and storage: Superconductors can transmit and store electricity with perfect efficiency, which could lead to significant energy savings and reduced carbon emissions.
2. Improved medical imaging: Superconducting magnets are used in MRI machines, which provide higher-resolution images and faster scan times than traditional magnets.
3. High-speed transportation: Superconductors could be used to create magnetic levitation trains that are faster and more efficient than conventional trains.
4. Enhanced security: Superconducting sensors can detect even slight changes in magnetic fields, which could be useful in security applications such as intrusion detection.
5. Energy storage: Superconductors could be used to store energy generated by renewable sources such as wind and solar power, which could help to reduce our reliance on fossil fuels.
Overall, superconductors have the potential to revolutionize a wide range of industries and provide significant benefits to society. However, more research is needed to fully understand their properties and potential applications.
Hi there. I am upgrading my bindings for the lollms (Lord of LLMs) tool, and I now need to be able to vectorize text into the embedding space of the current model. Is there a way to access the model's latent space, i.e. input a text and get the encoder output in latent space?
Best regards
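Pyllamacpp itself may not expose an embeddings call, but the sibling binding llama-cpp-python does. The following is a hypothetical sketch (not pyllamacpp API; `embed_text` is a made-up helper) of getting the latent vector that way:

```python
def embed_text(model_path: str, text: str) -> list:
    """Return the model's embedding vector for `text`.

    Hedged sketch: uses llama-cpp-python's Llama(embedding=True), which
    returns the model's latent representation rather than generated text.
    """
    from llama_cpp import Llama  # pip install llama-cpp-python (assumption)
    llm = Llama(model_path=model_path, embedding=True, verbose=False)
    return llm.embed(text)  # list of floats from the model's latent space
```

This would need a GGML model file on disk; whether the same is reachable through pyllamacpp's lower-level `pp` bindings is an open question for the maintainer.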
Windows build failed while llama-cpp-py works.
Is this CPU ONLY?
Hi, I'm very new to all of this and pyllamacpp so I'm sorry in advance if the details provided in this issue aren't good enough or up to par but I've been having some issues when doing:
python -c 'from pyllamacpp.model import Model'
I know this has something to do with my CPU, and I've also followed this guide exactly: nomic-ai/pygpt4all#71.
I have an older server machine with 2 Intel Xeon X5670.
How do I figure out what's going on and how do I fix it?
I am not sure where exactly the issue comes from (either it is from model or from pyllamacpp), so opened also this one nomic-ai/gpt4all#529
I tried with GPT4All models (for instance https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).
I am able to run this model as well as lighter models, but after about 2-4 prompts given to the model (in the process of answering) it fails with "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)". If I provide the maximum allowed prompt (±4000 tokens), it fails on the first request to generate a response. The same behavior occurs for all gpt4all models downloaded 2-3 days ago. I am running it on a MacBook Pro M1 (2021) with 16 GB RAM. I tried Python versions from 3.9 to 3.11, and also with JupyterLab (kernel 3.10), PyCharm, and the terminal; it is all the same. pyllamacpp is version 2.1.3. Any ideas where the problem may come from?
I traced the calls and found the exact code line where it fails:
`call, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:225
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:226
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:227
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:230
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:183
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:184
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:185
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:186
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:187
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:188
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:189
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:185
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)`
This is in the generate method, at the call into the C code, as far as I can judge:
pp.llama_eval(self._ctx, predicted_tokens, len(predicted_tokens), self._n_past, n_threads)
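The crash on ±4000-token prompts is consistent with overflowing the default n_ctx = 512 context window that llama_eval receives. A minimal, hypothetical workaround sketch (`clamp_prompt` is not a pyllamacpp API; the cause is an assumption) is to truncate the token list before generation:

```python
def clamp_prompt(tokens: list, n_ctx: int = 512, n_predict: int = 256) -> list:
    # Keep only the most recent tokens so that the prompt plus the tokens
    # to be generated fit inside the model's context window.
    budget = max(n_ctx - n_predict, 0)
    return tokens[-budget:] if len(tokens) > budget else tokens

# a ~4000-token prompt gets clamped to the last 256 positions
print(len(clamp_prompt(list(range(4000)))))  # 256
```

A proper fix would be for the binding to raise a Python exception instead of segfaulting when the prompt exceeds the context size.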
Hi @abdeladim-s. I am trying to test lollms on a Raspberry Pi 4 with orca-mini-3b and can't manage to compile your code without errors.
Did you test building the wheels for Raspberry Pi? It would be cool to have a wheel compatible with the Raspberry Pi 4, because building burns my Pi, takes a long time, and fails after more than an hour of compilation. With a Raspberry Pi 4 we could fuse whispercpp and lollms to have a 100% local assistant.
I need the wheel built for Python 3.10 if possible. This could be a really good challenge, and I think your binding is small enough to be used. The other bindings are too complicated.
What do you think?
When I try to load the vicuna models downloaded from this page, I have the following error :
# pyllamacpp /models/ggml-vicuna-7b-1.1-q4_2.bin
[PyLLaMACpp ASCII banner]
PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.1.3
=========================================================================================
[+] Running model `/models/ggml-vicuna-7b-1.1-q4_2.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama_model_load: loading model from '/models/ggml-vicuna-7b-1.1-q4_2.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 5
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: invalid model file '/models/ggml-vicuna-7b-1.1-q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)
I do not have this problem when using the gpt4all models. Running the vicuna models with the latest version of llama.cpp works just fine.
Process finished with exit code -1073741795 (0xC000001D)
appears after I try to import the class Model with
from pyllamacpp.model import Model
I've tried downgrading the library version but nothing changed.
I use Windows 10, Python version 3.11.0.
Hi, many thanks for the great work. pyllamacpp is my favorite llama.cpp binding, and I love using it on my Mac. But on Windows, I see that the color codes don't seem to work, like:
"You: What's the point of Zen Buddhism ?
οΏ½[94mAI: οΏ½[0mοΏ½[96mZοΏ½[0mοΏ½[96menοΏ½[0mοΏ½[96m BuddhοΏ½[0mοΏ½[96mismοΏ½[0mοΏ½[96m isοΏ½[0mοΏ½[96m aοΏ½[0mοΏ½[96m branchοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m MahοΏ½[0mοΏ½[96mayοΏ½[0mοΏ½[96manaοΏ½[0mοΏ½[96m BuddhοΏ½[0mοΏ½[96mismοΏ½[0mοΏ½[96m thatοΏ½[0mοΏ½[96m emphasοΏ½[0mοΏ½[96mizesοΏ½[0mοΏ½[96m theοΏ½[0mοΏ½[96m attοΏ½[0mοΏ½[96mainοΏ½[0mοΏ½[96mmentοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m enοΏ½[0mοΏ½[96mlightοΏ½[0mοΏ½[96menοΏ½[0mοΏ½[96mmentοΏ½[0mοΏ½[96m throughοΏ½[0mοΏ½[96m medοΏ½[0mοΏ½[96mitationοΏ½[0mοΏ½[96m andοΏ½[0mοΏ½[96m theοΏ½[0mοΏ½[96m experienceοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m momentοΏ½[0mοΏ½[96m-οΏ½[0mοΏ½[96mtoοΏ½[0mοΏ½[96m-οΏ½[0mοΏ½[96mmοΏ½[0mοΏ½[96momentοΏ½[0mοΏ½[96m awοΏ½[0mοΏ½[96marenοΏ½[0mοΏ½[96messοΏ½[0mοΏ½[96m.οΏ½[0mοΏ½[96m TheοΏ½[0mοΏ½[96m ultοΏ½[0mοΏ½[96mimateοΏ½[0mοΏ½[96m goalοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m ZοΏ½[0mοΏ½[96menοΏ½[0mοΏ½[96m BuddhοΏ½[0mοΏ½[96mismοΏ½[0mοΏ½[96m isοΏ½[0mοΏ½[96m toοΏ½[0mοΏ½[96m realizeοΏ½[0mοΏ½[96m oneοΏ½[0mοΏ½[96m'οΏ½[0mοΏ½[96msοΏ½[0mοΏ½[96m trueοΏ½[0mοΏ½[96m natureοΏ½[0mοΏ½[96m orοΏ½[...."
Any idea how to fix this, please?
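A possible workaround (a hedged sketch, not part of pyllamacpp): on Windows 10 and later, the console only renders ANSI escape sequences after virtual-terminal mode has been enabled, which the well-known `os.system('')` quirk triggers from Python. `enable_ansi_colors` below is a made-up helper name:

```python
import os
import sys

def enable_ansi_colors() -> bool:
    # Best-effort: on Windows 10+, running os.system('') switches the
    # console into VT mode so ESC[94m-style color codes render instead
    # of printing garbage. No-op (returns False) on other platforms.
    if sys.platform != "win32":
        return False
    os.system("")  # side effect: enables virtual terminal processing
    return True

enable_ansi_colors()
print("\x1b[94mAI:\x1b[0m ready")  # shows "AI:" in blue once VT mode is on
```

Alternatively, the cross-platform `colorama` package wraps stdout and translates the escape codes on older Windows consoles.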
In a clean venv, installing only pip install pyllamacpp==2.4.1 or pip install pyllamacpp,
when I run
from pyllamacpp.model import Model
I get the following error:
import _pyllamacpp as pp
ImportError: initialization failed
When I downgrade to pyllamacpp==2.1.3, everything works.
def detokenize(self, tokens: list):
    """
    Returns a list of tokens for the text  <- wrong description
    :param text: text to be tokenized  <- wrong description
    :return: A string representing the text extracted from the tokens
    """
    return pp.llama_tokens_to_str(self._ctx, tokens)
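The docstring could be corrected as follows (a sketch; only the documentation changes, the body is unchanged):

```python
def detokenize(self, tokens: list):
    """
    Converts a list of token ids back into text.

    :param tokens: the list of token ids to convert
    :return: A string representing the text extracted from the tokens
    """
    return pp.llama_tokens_to_str(self._ctx, tokens)
```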
Hi Abdeladim, many thanks for this new branch, which I didn't expect to be done this quickly! I tried it on 3 platforms, i.e. OSX Mojave, WSL2 (Ubuntu), and Ubuntu 22.04, but can't make it work... First, pip/git install did not work on any of the three. So I downloaded the project and installed it with 'python setup.py install', but again all three failed with the same error messages. I attach the error messages for your reference. It is obviously above my understanding, as you've guessed! I'd appreciate it if you could have a look and advise how to make this work in my environment. Cheers!
I'm trying to use cpp_generate instead of generate so I can run a callback when generation completes, but cpp_generate complains about the anti_prompt attribute. I can't seem to run generation at all with cpp_generate; can anyone show me a working use case?
Here's where I am with model.generate. Replacing it with cpp_generate fails. I tried both antiprompt and anti_prompt, as the docs show a difference.
@app.post("/chat")
async def chat(request: ChatRequest):
    prompt = request.prompt
    global conversation_history
    conversation_history += request.conversation_history
    # Pass prompt and conversation_history to the model
    full_prompt = conversation_history + "\n" + prompt

    def iter_tokens():
        for token in model.generate(
            prompt=full_prompt,
            antiprompt="Human:",
            n_threads=6,
            n_batch=1024,
            n_predict=256,
            n_keep=48,
            repeat_penalty=1.0,
        ):
            yield token.encode()

    return StreamingResponse(iter_tokens(), media_type="text/plain")
Hi @abdeladim-s, thanks for the update!
I was trying to update to pyllamacpp==2.4.0 but found that even the example on the README, which is similar to llama.cpp's ./examples/chat.sh but not identical, is not working properly. For example, when I copied the example code into a foo.py and ran it, I got:
If I go to llama.cpp, check out 66874d4, then make clean && make && ./examples/chat.sh, I got:
I just want an equivalent of running llama.cpp's chat.sh with pyllamacpp==2.4.0, no more, no less. How should I do it?
Everything works fine and the model can be loaded, but it takes a very long time (2 minutes) before it starts generating at a decent speed.
Sometimes it stops after generating 5-10 tokens and then proceeds 20 seconds later.
Using 13B gpt4-x-alpaca, 12th-gen i7 12700F (n_threads = 20), f16_kv = 1, 16 GB RAM (the model fits).
Using alpaca.cpp, it loads in about 2 seconds and generates right away, without stopping.
pp.llama_tokenize with the True param will add a BOS token to the string in the suffix.
Then, in line 202,
input_tokens = self._prompt_prefix_tokens + pp.llama_tokenize(self._ctx, prompt, True) + self._prompt_suffix_tokens
the input prompt will be [BOS]<prompt>[BOS]<suffix>.
If the BOS token is not "", I think this is incorrect.
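The double-BOS effect described above can be illustrated with a toy tokenizer (purely illustrative; `toy_tokenize` is a stand-in for pp.llama_tokenize, and BOS id 1 matches llama.cpp's convention):

```python
BOS = 1  # stand-in BOS token id (llama.cpp uses id 1)

def toy_tokenize(text: str, add_bos: bool) -> list:
    # stand-in for pp.llama_tokenize: one token per character,
    # optionally prefixed with a BOS token
    tokens = [ord(c) for c in text]
    return ([BOS] + tokens) if add_bos else tokens

# buggy: tokenizing the suffix with add_bos=True injects a second BOS
# in the middle of the sequence
buggy = toy_tokenize("hello", True) + toy_tokenize("### Response:", True)
# fixed: only the leading prompt carries BOS
fixed = toy_tokenize("hello", True) + toy_tokenize("### Response:", False)
print(buggy.count(BOS), fixed.count(BOS))  # 2 1
```

If this reading is right, the fix would be to tokenize the suffix with the BOS flag set to False so that only the start of the full sequence carries a BOS token.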
Hi Abdeladim, there are many new models that can't run on the pyllamacpp binding because they use version 2 of GGML.
If you have some time, could you try to add support for this, please?