Comments (13)
Hi Saifeddine,
Yes, I am in the process of syncing the bindings with the latest llama.cpp progress, but there have been so many breaking changes to llama.cpp lately that it is taking time.
Meanwhile, could you please share some links to models that do not work, so I can test them?
Thank you!
from pyllamacpp.
Hi, thanks, here are some. TheBloke has up-to-date models, for example:
https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML
Yeah, those models are converted to the newest version of ggml, after this breaking change: ggerganov/llama.cpp#1405.
The problem is there is no backward compatibility with older models!
I think I will push a release up to that change, and then another release after it, so one can choose which version to use based on the models they have.
What do you think?
That could work. For now, I use the official llama.cpp bindings for the new model format and yours for the previous one, but having two releases would be a good thing.
- Version 2.3.0 is now built with the latest llama.cpp release (699b1ad) and works with the newest version of the models (I've tested it with TheBloke's model above, at least).
- Version 2.2.0 can still be used for older models.

But yeah, feel free to use either one. The official bindings are great as well.
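In practice, that means pinning the binding version to match the model format you have on disk. A sketch of the two installs, using the version numbers from this thread:

```shell
# Older models (converted before llama.cpp #1405): pin the last compatible release
pip install "pyllamacpp==2.2.0"

# Models reconverted to the new format: use the build against the newer llama.cpp
pip install --upgrade "pyllamacpp==2.3.0"
```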
I used these steps to update to the new model format.
Follow these steps to acquire the (Alpaca/LLaMA) F16 model:
1. Download and install the Alpaca-lora repo: https://github.com/tloen/alpaca-lora
2. Once you've successfully downloaded the model weights, you should have them inside a folder like the one shown in the next step (on Linux).
3. Run: python convert-pth-to-ggml.py ~/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348 1
4. Once you get your F16 model, copy it into the llama.cpp/models folder.
5. Run: ./quantize ./models/ggml-model-f16.bin ./models/ggml-model-q4_0.bin q4_0
Done.
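For intuition about what the q4_0 quantization step in the recipe above does, here is a simplified sketch of symmetric 4-bit block quantization. This is an illustration only, not the exact llama.cpp q4_0 layout (which packs pairs of values into nibbles over fixed 32-element blocks):

```python
def quantize_block(xs, bits=4):
    """Symmetric per-block quantization: one float scale plus small integers."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    amax = max(abs(x) for x in xs)
    scale = amax / qmax if amax else 1.0
    # Clamp to the signed 4-bit range [-8, 7]
    qs = [max(-qmax - 1, min(qmax, round(x / scale))) for x in xs]
    return scale, qs

def dequantize_block(scale, qs):
    """Recover approximate floats; per-value error is bounded by ~scale/2."""
    return [scale * q for q in qs]
```

The quantized file stores one scale per block plus the 4-bit codes, which is why q4_0 models are roughly a quarter the size of F16 ones.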
...but indeed, the pyllamacpp bindings are now broken.
I'll have a look and see if I can switch to the abetlen/llama-cpp-python bindings in the meantime and get it to work. But yeah, a version upgrade is a real time waster, which is why developers should take note and either 1. make it easy to update, or 2. make your app/framework/etc. backward compatible with older versions.
> Version 2.3.0 is now built with the latest llama.cpp release (699b1ad) and works with the newest models; 2.2.0 can still be used for older models.
It doesn't work for me. It hallucinates 100% of the time, often in random languages, and doesn't respond coherently to any of my prompts.
Ok, I have a workaround for now.
It seems that prompt_context, prompt_suffix and prompt_prefix are broken, so I have to add them manually into the prompt for now.
These Python bindings are the only ones working with the new update of llama.cpp, so well done!
UPDATE:
It seems to repeat a lot of answers, which I don't think used to happen before (or maybe I missed it?).
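The manual workaround amounts to stitching the pieces together yourself before calling the binding. A minimal sketch; the Alpaca-style tags below are placeholder assumptions for illustration, not the binding's actual defaults:

```python
def build_prompt(user_input,
                 prompt_prefix="### Instruction:\n",
                 prompt_suffix="\n### Response:\n",
                 prompt_context=""):
    """Manually prepend/append what the broken binding parameters were supposed to add."""
    return f"{prompt_context}{prompt_prefix}{user_input}{prompt_suffix}"
```

The resulting string is then passed to the binding as the plain prompt, with the prefix/suffix/context parameters left unset.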
How do I upgrade a model to the new ggjt v2 format?
I'm using gpt4-x-alpaca-13b-native-ggml-model-q4_0.
(I'm now able to compile with CMake.)
@twinlizzie, where did you get the steps you described here? Is convert-pth-to-ggml.py now updated to convert the models to the new ggjt v2 format?
Yeah, unfortunately, llama.cpp introduced some breaking changes and it is not backward compatible; the models need to be reconverted! I tried to push a version to PyPI (v2.2.0) before updating to the latest version, to stay compatible with the older models. You can give it a try as well.
> It seems to repeat a lot of answers, which I don't think used to happen before (or maybe I missed it?).

What do you mean by it repeats answers? Do you mean you get the same answer every time you run the generation?
> How do I upgrade a model to the new ggjt v2 format?

@Naugustogi, AFAIK you will need to get the PyTorch models and re-quantize them to a supported format.
Yep, convert-pth-to-ggml.py is now updated and works for converting larger models to ggjt v2. And I figured out the steps on my own.
> What do you mean by it repeats answers? Do you mean you get the same answer every time you run the generation?
Actually, I'm not entirely sure. I set repeat_penalty to 1.2 and it seems to have fixed it, for now.
It would sometimes get stuck in a loop where you get the same type of answer no matter what you ask.
On the llama-cpp-python repo it seems to be even worse, because you always get the same answer 100% of the time (to the same question, that is). Or maybe I'm missing something about how to properly run the API...
llama.cpp itself does not have this problem and works perfectly, even with my DIY-upgraded model. I get a different answer to the same question, which is how I want it.
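For reference, the repeat penalty works roughly like this: logits of recently generated tokens are pushed down before sampling, so 1.0 disables it and larger values such as 1.2 discourage loops. A simplified sketch of the llama.cpp-style rule:

```python
def apply_repeat_penalty(logits, recent_tokens, penalty=1.2):
    """Discourage tokens seen in the recent window (simplified llama.cpp rule)."""
    out = list(logits)
    for t in set(recent_tokens):
        # Dividing positive logits (and multiplying negative ones) by the
        # penalty always lowers the token's score relative to its raw value.
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out
```

With `penalty=1.2`, a token the model just emitted becomes measurably less likely to be picked again, which is why it breaks the answer loops described above.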
I just tested it again on my end with the models above and I don't have this problem. Every time I run it I get a different answer.
Could you please share the code? Maybe you are doing something wrong!
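Getting a different answer on every run is the expected behaviour with temperature sampling and a fresh seed; identical answers every run usually point to a fixed seed (or greedy decoding). A toy sketch of the mechanism, not the bindings' actual sampler:

```python
import math
import random

def sample_token(logits, temperature=0.8, seed=None):
    """Softmax-with-temperature sampling; seed=None draws a fresh RNG each call."""
    rng = random.Random(seed)
    m = max(logits)
    weights = [math.exp((l - m) / temperature) for l in logits]  # stable softmax
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(logits) - 1
```

With a fixed seed the same token comes back on every call, which matches the "always the same answer" symptom; with seed=None the choice varies from run to run.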