abdeladim-s / pyllamacpp
Python bindings for llama.cpp
Home Page: https://abdeladim-s.github.io/pyllamacpp/
License: MIT License
For example:
https://huggingface.co/TheBloke/gpt4-alpaca-lora-30B-4bit-GGML/tree/main
The main branch should contain a GGML v3 model, but Python crashes when trying to run it.
I can run it fine using the llama_cpp package.
Thanks
Hi,
Thanks for the contribution. I tried the model llama-2-13b-chat.ggmlv3.q8_0.bin on a Mac M1 Max with 64 GB RAM, pyllamacpp==2.4.1, python==3.9, and it works like a charm. Sample code:
from pyllamacpp.model import Model

input = "I want you to act as a physician. Explain what superconductors are."
model_path = './llama-2-13b-chat.ggmlv3.q8_0.bin'
model = Model(model_path)
for token in model.generate(input):
    print(token, end='', flush=True)
$python testLLM13B.py
llama.cpp: loading model from ./llama-2-13b-chat.ggmlv3.q8_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 7 (mostly Q8_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: mem required = 15237.95 MB (+ 3216.00 MB per state)
.
llama_init_from_file: kv self size = 800.00 MB
Explain their properties and the potential benefits they offer.
Superconductors are materials that exhibit zero electrical resistance when cooled below a certain temperature, known as the critical temperature (Tc). This means that superconductors can conduct electricity with perfect efficiency and without any loss of energy.
The properties of superconductors include:
1. Zero electrical resistance: Superconductors have zero electrical resistance when cooled below Tc, which makes them ideal for high-power applications such as power transmission and storage.
2. Perfect diamagnetism: Superconductors expel magnetic fields when cooled below Tc, which makes them useful in MRI machines and other medical applications.
3. Quantum levitation: Superconductors can levitate above a magnet when cooled below Tc, which has potential applications in transportation and energy storage.
4. High-temperature superconductivity: Some superconductors have critical temperatures above the boiling point of liquid nitrogen (77 K), making them more practical for real-world applications.
The potential benefits of superconductors include:
1. More efficient power transmission and storage: Superconductors can transmit and store electricity with perfect efficiency, which could lead to significant energy savings and reduced carbon emissions.
2. Improved medical imaging: Superconducting magnets are used in MRI machines, which provide higher-resolution images and faster scan times than traditional magnets.
3. High-speed transportation: Superconductors could be used to create magnetic levitation trains that are faster and more efficient than conventional trains.
4. Enhanced security: Superconducting sensors can detect even slight changes in magnetic fields, which could be useful in security applications such as intrusion detection.
5. Energy storage: Superconductors could be used to store energy generated by renewable sources such as wind and solar power, which could help to reduce our reliance on fossil fuels.
Overall, superconductors have the potential to revolutionize a wide range of industries and provide significant benefits to society. However, more research is needed to fully understand their properties and potential applications.
Hi there. I am upgrading my bindings for the lollms (Lord of LLMs) tool, and I now need to be able to vectorize text into the embedding space of the current model. Is there a way to access the model's latent space, i.e. input a text and get the encoder output in latent space?
Best regards
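Pyllamacpp itself may not expose an embeddings call, but the sibling binding llama-cpp-python does. The following is a hypothetical sketch (not pyllamacpp API; `embed_text` is a made-up helper) of getting the latent vector that way:

```python
def embed_text(model_path: str, text: str) -> list:
    """Return the model's embedding vector for `text`.

    Hedged sketch: uses llama-cpp-python's Llama(embedding=True), which
    returns the model's latent representation rather than generated text.
    """
    from llama_cpp import Llama  # pip install llama-cpp-python (assumption)
    llm = Llama(model_path=model_path, embedding=True, verbose=False)
    return llm.embed(text)  # list of floats from the model's latent space
```

This would need a GGML model file on disk; whether the same is reachable through pyllamacpp's lower-level `pp` bindings is an open question for the maintainer.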
Windows build failed while llama-cpp-py works.
Is this CPU ONLY?
Hi, I'm very new to all of this and pyllamacpp so I'm sorry in advance if the details provided in this issue aren't good enough or up to par but I've been having some issues when doing:
python -c 'from pyllamacpp.model import Model'
I know this has something to do with my CPU, and I've also followed this guide exactly: nomic-ai/pygpt4all#71.
I have an older server machine with 2 Intel Xeon X5670.
How do I figure out what's going on and how do I fix it?
I am not sure where exactly the issue comes from (either it is from model or from pyllamacpp), so opened also this one nomic-ai/gpt4all#529
I tried with GPT4All models (for instance https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).
I am able to run this model as well as lighter models, but after about 2-4 prompts given to the model (in the process of answering) it fails with "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)". If I provide the maximum allowed prompt (±4000 tokens), it fails on the first request to generate a response. The same behavior occurs for all gpt4all models downloaded 2-3 days ago. I am running it on a MacBook Pro M1 (2021) with 16 GB RAM. I tried Python versions from 3.9 to 3.11, and also with JupyterLab (kernel 3.10), PyCharm, and the terminal; it is all the same. pyllamacpp is version 2.1.3. Any ideas where the problem may come from?
I traced the calls and found the exact code line where it fails:
`call, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:225
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:226
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:227
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:230
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:183
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:184
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:185
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:186
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:187
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:188
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:189
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:185
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)`
This is in the generate method, at the call into the C code, as far as I can judge:
pp.llama_eval(self._ctx, predicted_tokens, len(predicted_tokens), self._n_past, n_threads)
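The crash on ±4000-token prompts is consistent with overflowing the default n_ctx = 512 context window that llama_eval receives. A minimal, hypothetical workaround sketch (`clamp_prompt` is not a pyllamacpp API; the cause is an assumption) is to truncate the token list before generation:

```python
def clamp_prompt(tokens: list, n_ctx: int = 512, n_predict: int = 256) -> list:
    # Keep only the most recent tokens so that the prompt plus the tokens
    # to be generated fit inside the model's context window.
    budget = max(n_ctx - n_predict, 0)
    return tokens[-budget:] if len(tokens) > budget else tokens

# a ~4000-token prompt gets clamped to the last 256 positions
print(len(clamp_prompt(list(range(4000)))))  # 256
```

A proper fix would be for the binding to raise a Python exception instead of segfaulting when the prompt exceeds the context size.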
Hi @abdeladim-s. I am trying to test lollms on a Raspberry Pi 4 with orca-mini-3b and can't manage to compile your code without errors.
Did you test building the wheels for Raspberry Pi? It would be cool to have a wheel compatible with the Raspberry Pi 4, because building burns my Pi, takes a long time, and fails after more than an hour of compilation. With a Raspberry Pi 4 we could fuse whispercpp and lollms to have a 100% local assistant.
I need the wheel built for Python 3.10 if possible. This could be a really good challenge, and I think your binding is small enough to be used. The other bindings are too complicated.
What do you think?
When I try to load the vicuna models downloaded from this page, I have the following error :
# pyllamacpp /models/ggml-vicuna-7b-1.1-q4_2.bin
[PyLLaMACpp ASCII banner]
PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.1.3
=========================================================================================
[+] Running model `/models/ggml-vicuna-7b-1.1-q4_2.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama_model_load: loading model from '/models/ggml-vicuna-7b-1.1-q4_2.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 5
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: invalid model file '/models/ggml-vicuna-7b-1.1-q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)
I do not have this problem when using the gpt4all models. Running the vicuna models with the latest version of llama.cpp works just fine.
Process finished with exit code -1073741795 (0xC000001D)
appears after I try to import the class Model with
from pyllamacpp.model import Model
I've tried downgrading the library version but nothing changed.
I use Windows 10, Python version 3.11.0.
Hi, many thanks for the great work. pyllamacpp is my favorite llama.cpp binding, and I love using it on my Mac. But on Windows, I see that the color codes don't seem to work, like:
"You: What's the point of Zen Buddhism ?
οΏ½[94mAI: οΏ½[0mοΏ½[96mZοΏ½[0mοΏ½[96menοΏ½[0mοΏ½[96m BuddhοΏ½[0mοΏ½[96mismοΏ½[0mοΏ½[96m isοΏ½[0mοΏ½[96m aοΏ½[0mοΏ½[96m branchοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m MahοΏ½[0mοΏ½[96mayοΏ½[0mοΏ½[96manaοΏ½[0mοΏ½[96m BuddhοΏ½[0mοΏ½[96mismοΏ½[0mοΏ½[96m thatοΏ½[0mοΏ½[96m emphasοΏ½[0mοΏ½[96mizesοΏ½[0mοΏ½[96m theοΏ½[0mοΏ½[96m attοΏ½[0mοΏ½[96mainοΏ½[0mοΏ½[96mmentοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m enοΏ½[0mοΏ½[96mlightοΏ½[0mοΏ½[96menοΏ½[0mοΏ½[96mmentοΏ½[0mοΏ½[96m throughοΏ½[0mοΏ½[96m medοΏ½[0mοΏ½[96mitationοΏ½[0mοΏ½[96m andοΏ½[0mοΏ½[96m theοΏ½[0mοΏ½[96m experienceοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m momentοΏ½[0mοΏ½[96m-οΏ½[0mοΏ½[96mtoοΏ½[0mοΏ½[96m-οΏ½[0mοΏ½[96mmοΏ½[0mοΏ½[96momentοΏ½[0mοΏ½[96m awοΏ½[0mοΏ½[96marenοΏ½[0mοΏ½[96messοΏ½[0mοΏ½[96m.οΏ½[0mοΏ½[96m TheοΏ½[0mοΏ½[96m ultοΏ½[0mοΏ½[96mimateοΏ½[0mοΏ½[96m goalοΏ½[0mοΏ½[96m ofοΏ½[0mοΏ½[96m ZοΏ½[0mοΏ½[96menοΏ½[0mοΏ½[96m BuddhοΏ½[0mοΏ½[96mismοΏ½[0mοΏ½[96m isοΏ½[0mοΏ½[96m toοΏ½[0mοΏ½[96m realizeοΏ½[0mοΏ½[96m oneοΏ½[0mοΏ½[96m'οΏ½[0mοΏ½[96msοΏ½[0mοΏ½[96m trueοΏ½[0mοΏ½[96m natureοΏ½[0mοΏ½[96m orοΏ½[...."
Any idea how to fix this, please?
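A possible workaround (a hedged sketch, not part of pyllamacpp): on Windows 10 and later, the console only renders ANSI escape sequences after virtual-terminal mode has been enabled, which the well-known `os.system('')` quirk triggers from Python. `enable_ansi_colors` below is a made-up helper name:

```python
import os
import sys

def enable_ansi_colors() -> bool:
    # Best-effort: on Windows 10+, running os.system('') switches the
    # console into VT mode so ESC[94m-style color codes render instead
    # of printing garbage. No-op (returns False) on other platforms.
    if sys.platform != "win32":
        return False
    os.system("")  # side effect: enables virtual terminal processing
    return True

enable_ansi_colors()
print("\x1b[94mAI:\x1b[0m ready")  # shows "AI:" in blue once VT mode is on
```

Alternatively, the cross-platform `colorama` package wraps stdout and translates the escape codes on older Windows consoles.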
In a clean venv, installing only pip install pyllamacpp==2.4.1 or pip install pyllamacpp,
when I run
from pyllamacpp.model import Model
I get the following error:
import _pyllamacpp as pp
ImportError: initialization failed
When I downgrade to pyllamacpp==2.1.3, everything works.
def detokenize(self, tokens: list):
    """
    Returns a list of tokens for the text  <- wrong description
    :param text: text to be tokenized  <- wrong description
    :return: A string representing the text extracted from the tokens
    """
    return pp.llama_tokens_to_str(self._ctx, tokens)
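The docstring could be corrected as follows (a sketch; only the documentation changes, the body is unchanged):

```python
def detokenize(self, tokens: list):
    """
    Converts a list of token ids back into text.

    :param tokens: the list of token ids to convert
    :return: A string representing the text extracted from the tokens
    """
    return pp.llama_tokens_to_str(self._ctx, tokens)
```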
Hi Abdeladim, many thanks for this new branch, which I didn't expect to be done this quickly! I tried it on 3 platforms, i.e. OSX Mojave, WSL2 (Ubuntu), and Ubuntu 22.04, but can't make it work... First, pip/git install did not work on any of the three. So I downloaded the project and installed it with 'python setup.py install', but again all three failed with the same error messages. I attach the error messages for your reference. It is obviously above my understanding, as you've guessed! I'd appreciate it if you could have a look and advise how to make this work in my environment. Cheers!
I'm trying to use cpp_generate instead of generate so I can run a callback when generation completes, but cpp_generate complains about the anti_prompt attribute. I can't seem to run generation at all with cpp_generate; can anyone show me a working use case?
Here's where I am with model.generate. Replacing it with cpp_generate fails. I tried both antiprompt and anti_prompt, as the docs show a difference.
@app.post("/chat")
async def chat(request: ChatRequest):
    prompt = request.prompt
    global conversation_history
    conversation_history += request.conversation_history
    # Pass prompt and conversation_history to the model
    full_prompt = conversation_history + "\n" + prompt

    def iter_tokens():
        for token in model.generate(
            prompt=full_prompt,
            antiprompt="Human:",
            n_threads=6,
            n_batch=1024,
            n_predict=256,
            n_keep=48,
            repeat_penalty=1.0,
        ):
            yield token.encode()

    return StreamingResponse(iter_tokens(), media_type="text/plain")
Hi @abdeladim-s, thanks for the update!
I was trying to update to pyllamacpp==2.4.0 but found that even the example on the README, which is similar to llama.cpp's ./examples/chat.sh but not identical, is not working properly. For example, when I copied the example code into a foo.py and ran it, I got:
If I go to llama.cpp, check out 66874d4, then make clean && make && ./examples/chat.sh, I got:
I just want an equivalent of running llama.cpp's chat.sh with pyllamacpp==2.4.0, no more, no less. How should I do it?
Everything works fine and the model can be loaded, but it takes a very long time (2 minutes) before it starts generating at a decent speed.
Sometimes it stops after generating 5-10 tokens and then proceeds 20 seconds later.
Using 13B gpt4-x-alpaca, 12th-gen i7 12700F (n_threads = 20), f16_kv = 1, 16 GB RAM (the model fits).
Using alpaca.cpp, it loads in about 2 seconds and generates right away, without stopping.
pp.llama_tokenize with the True param will add a BOS token to the string in the suffix.
Then, in line 202,
input_tokens = self._prompt_prefix_tokens + pp.llama_tokenize(self._ctx, prompt, True) + self._prompt_suffix_tokens
the input prompt will be [BOS]<prompt>[BOS]<suffix>.
If the BOS token is not "", I think this is incorrect.
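The double-BOS effect described above can be illustrated with a toy tokenizer (purely illustrative; `toy_tokenize` is a stand-in for pp.llama_tokenize, and BOS id 1 matches llama.cpp's convention):

```python
BOS = 1  # stand-in BOS token id (llama.cpp uses id 1)

def toy_tokenize(text: str, add_bos: bool) -> list:
    # stand-in for pp.llama_tokenize: one token per character,
    # optionally prefixed with a BOS token
    tokens = [ord(c) for c in text]
    return ([BOS] + tokens) if add_bos else tokens

# buggy: tokenizing the suffix with add_bos=True injects a second BOS
# in the middle of the sequence
buggy = toy_tokenize("hello", True) + toy_tokenize("### Response:", True)
# fixed: only the leading prompt carries BOS
fixed = toy_tokenize("hello", True) + toy_tokenize("### Response:", False)
print(buggy.count(BOS), fixed.count(BOS))  # 2 1
```

If this reading is right, the fix would be to tokenize the suffix with the BOS flag set to False so that only the start of the full sequence carries a BOS token.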
Hi Abdeladim, there are many new models that can't run on the pyllamacpp binding because they use version 2 of GGML.
If you have some time, could you try to add support for this, please?