
pyllamacpp's Introduction

Hi there 👋 I am Abdeladim

πšƒπš˜πšπšŠπš• πš‚πšπšŠπš›πšœ β€’ abdeladim-s β€’ π™Άπš’πšπ™·πšžπš‹ πšπš˜πš•πš•πš˜πš πšŽπš›πšœ

abdeladim-s

abdeladim-s

pyllamacpp's People

Contributors

abdeladim-s, adarshxs, andrewmelis, bmschmidt, deadredmond, dependabot[bot], kasimir123, nuance1979, pajoma, parisneo


pyllamacpp's Issues

Llama 2 model on Apple Silicon is supported

Hi,

Thanks for the contribution. I tried the model llama-2-13b-chat.ggmlv3.q8_0.bin on a Mac M1 Max with 64 GB RAM, pyllamacpp==2.4.1 and python==3.9, and it works like a charm. Sample code:

from pyllamacpp.model import Model

# Prompt and path to the quantized GGML model file
input = "I want you to act as a physician. Explain what superconductors are."
model_path = './llama-2-13b-chat.ggmlv3.q8_0.bin'
model = Model(model_path)

# Stream the generated tokens to stdout as they are produced
for token in model.generate(input):
    print(token, end='', flush=True)
$python testLLM13B.py
llama.cpp: loading model from ./llama-2-13b-chat.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 15237.95 MB (+ 3216.00 MB per state)
.
llama_init_from_file: kv self size  =  800.00 MB
 Explain their properties and the potential benefits they offer.
  Superconductors are materials that exhibit zero electrical resistance when cooled below a certain temperature, known as the critical temperature (Tc). This means that superconductors can conduct electricity with perfect efficiency and without any loss of energy.

The properties of superconductors include:

1. Zero electrical resistance: Superconductors have zero electrical resistance when cooled below Tc, which makes them ideal for high-power applications such as power transmission and storage.
2. Perfect diamagnetism: Superconductors expel magnetic fields when cooled below Tc, which makes them useful in MRI machines and other medical applications.
3. Quantum levitation: Superconductors can levitate above a magnet when cooled below Tc, which has potential applications in transportation and energy storage.
4. High-temperature superconductivity: Some superconductors have critical temperatures above the boiling point of liquid nitrogen (77 K), making them more practical for real-world applications.
The potential benefits of superconductors include:
1. More efficient power transmission and storage: Superconductors can transmit and store electricity with perfect efficiency, which could lead to significant energy savings and reduced carbon emissions.
2. Improved medical imaging: Superconducting magnets are used in MRI machines, which provide higher-resolution images and faster scan times than traditional magnets.
3. High-speed transportation: Superconductors could be used to create magnetic levitation trains that are faster and more efficient than conventional trains.
4. Enhanced security: Superconducting sensors can detect even slight changes in magnetic fields, which could be useful in security applications such as intrusion detection.
5. Energy storage: Superconductors could be used to store energy generated by renewable sources such as wind and solar power, which could help to reduce our reliance on fossil fuels.
Overall, superconductors have the potential to revolutionize a wide range of industries and provide significant benefits to society. However, more research is needed to fully understand their properties and potential applications.

Embeddings

Hi there. I am upgrading my bindings for the lollms (Lord of LLMs) tool and I now need to be able to vectorize text into the embedding space of the current model. Is there a way to access the model's latent space, i.e. input a text and get back its embedding vector?

Best regards
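
To make the request concrete, here is a sketch of what such an API could look like from the Python side. The get_embeddings method shown below does not exist in the current bindings and is purely illustrative of the requested feature (llama.cpp itself exposes the last evaluation's embedding vector via llama_get_embeddings, so a binding along these lines seems feasible); the model path is a placeholder.

# Purely illustrative: a hypothetical embedding API for pyllamacpp.
# `get_embeddings` does not exist in the current bindings; this sketch only
# shows the shape of the feature being requested.
from pyllamacpp.model import Model

model = Model('./path/to/model.bin')           # placeholder model path
vector = model.get_embeddings("Hello, world")  # hypothetical method returning a list of floats
print(len(vector))                             # would equal the model's embedding size (n_embd)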

Illegal Instruction (core dumped) even after disabling AVX2 and FMA

Hi, I'm very new to all of this and to pyllamacpp, so I'm sorry in advance if the details in this issue aren't up to par, but I've been having issues when running:
python -c 'from pyllamacpp.model import Model'

I know this has something to do with my CPU, and I've also followed this guide exactly: nomic-ai/pygpt4all#71.
I have an older server machine with two Intel Xeon X5670 CPUs.

How do I figure out what's going on and how do I fix it?
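
For context, the Xeon X5670 is a Westmere-era CPU which, as far as I know, supports SSE4.2 but not AVX, AVX2 or FMA, so any of those instruction sets left enabled in the build will trigger an illegal instruction. A quick way to double-check what the CPU actually reports is the sketch below (Linux only, reads /proc/cpuinfo).

# Print which relevant SIMD extensions this CPU reports (Linux only).
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for ext in ("sse4_2", "avx", "avx2", "fma"):
    print(f"{ext}: {'yes' if ext in flags else 'no'}")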

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

I am not sure where exactly the issue comes from (either from the model or from pyllamacpp), so I also opened nomic-ai/gpt4all#529.

I tried with GPT4All models (for instance, https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).

I am able to run this model as well as lighter models, but after about 2-4 prompts given to the model (while it is answering) it fails with "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)". If I provide the maximum allowed prompt (±4000 tokens), it fails on the first request to generate a response. The behavior is the same for all gpt4all models downloaded 2-3 days ago. I am running it on a MacBook Pro M1 (2021) with 16 GB RAM. I tried Python versions from 3.9 to 3.11, and also tried Jupyter Lab (3.10 kernel), PyCharm, and the terminal; it is all the same. pyllamacpp is version 2.1.3. Any ideas where the problem may come from?

I traced the calls and found the exact code line where it fails:

call, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:225
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:226
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:227
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:230
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:183
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:184
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:185
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:186
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:187
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:188
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:189
line, /opt/homebrew/anaconda3/envs/gpt4all-converted_conda/lib/python3.10/site-packages/pyllamacpp/model.py:185

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

As far as I can judge, this is in the generate method, at the call into the C++ code:
pp.llama_eval(self._ctx, predicted_tokens, len(predicted_tokens), self._n_past, n_threads)
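
For anyone trying to reproduce, a minimal script along these lines (the model path is a placeholder and the repeated prompt only approximates a long input) should exercise the same generate path:

# Minimal reproduction sketch: load a model, feed it a long prompt, stream the output.
# The model path and the prompt length are placeholders, not values from the report.
from pyllamacpp.model import Model

model = Model('./ggml-gpt4all-13b-snoozy.bin')  # placeholder path to a local GPT4All model
long_prompt = "Tell me a story. " * 200         # roughly a thousand tokens

for token in model.generate(long_prompt):
    print(token, end='', flush=True)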

Compilation on raspberry pi fails

Hi @abdeladim-s. I am trying to test lollms on a Raspberry Pi 4 with orca-mini-3b and can't manage to compile your code without errors.

Did you test building the wheels for Raspberry Pi? It would be cool to have a wheel compatible with the Raspberry Pi 4, because building locally overheats my Pi, takes a long time, and fails after more than an hour of compilation. With a Raspberry Pi 4 we could fuse whisper.cpp and lollms to get a 100% local assistant.

I need the wheel to be built for Python 3.10 if possible. This could be a really good challenge, and I think your binding is small enough to be used. The other bindings are too complicated.

What do you think?

Can't import vicuna models : `(bad f16 value 5)`

When I try to load the vicuna models downloaded from this page, I get the following error:

# pyllamacpp /models/ggml-vicuna-7b-1.1-q4_2.bin 


[PyLLaMACpp ASCII-art banner]

PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.1.3 

         
=========================================================================================

[+] Running model `/models/ggml-vicuna-7b-1.1-q4_2.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama_model_load: loading model from '/models/ggml-vicuna-7b-1.1-q4_2.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 5
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: invalid model file '/models/ggml-vicuna-7b-1.1-q4_2.bin' (bad f16 value 5)
llama_init_from_file: failed to load model
Segmentation fault (core dumped)

I do not have this problem when using the gpt4all models. Running the vicuna models with the latest version of llama.cpp works just fine.
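
For what it's worth, the f16/ftype value the loader complains about sits near the start of the file and can be inspected without pyllamacpp. The sketch below assumes the legacy layout matching the log above (a 4-byte magic followed by seven int32 hyperparameters); newer ggjt-style files carry an extra int32 version right after the magic, so the offsets would shift by 4 bytes in that case.

import struct

# Inspect the hyperparameter block of a legacy GGML llama model file.
# Assumed layout: 4-byte magic, then seven int32 fields in the order
# n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, f16.
path = "/models/ggml-vicuna-7b-1.1-q4_2.bin"
names = ["n_vocab", "n_embd", "n_mult", "n_head", "n_layer", "n_rot", "f16"]
with open(path, "rb") as f:
    magic = f.read(4)
    values = struct.unpack("<7i", f.read(4 * len(names)))

print("magic:", magic)
print(dict(zip(names, values)))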

Process finished with exit code -1073741795 (0xC000001D)

Process finished with exit code -1073741795 (0xC000001D) appears when I try to import the Model class with
from pyllamacpp.model import Model.
I've tried downgrading the library version, but nothing changed.

I use Windows 10, python version 3.11.0

Color code on windows.

Hi, many thanks for the great work. pyllamacpp is my favorite llama.cpp binding, and I love using it on my Mac. But on Windows, the color codes don't seem to be working, like:
"You: What's the point of Zen Buddhism ?
�[94mAI: �[0m�[96mZ�[0m�[96men�[0m�[96m Buddh�[0m�[96mism�[0m�[96m is�[0m�[96m a�[0m�[96m branch�[0m�[96m of�[0m�[96m Mah�[0m�[96may�[0m�[96mana�[0m�[96m Buddh�[0m�[96mism�[0m�[96m that�[0m�[96m emphas�[0m�[96mizes�[0m�[96m the�[0m�[96m att�[0m�[96main�[0m�[96mment�[0m�[96m of�[0m�[96m en�[0m�[96mlight�[0m�[96men�[0m�[96mment�[0m�[96m through�[0m�[96m med�[0m�[96mitation�[0m�[96m and�[0m�[96m the�[0m�[96m experience�[0m�[96m of�[0m�[96m moment�[0m�[96m-�[0m�[96mto�[0m�[96m-�[0m�[96mm�[0m�[96moment�[0m�[96m aw�[0m�[96maren�[0m�[96mess�[0m�[96m.�[0m�[96m The�[0m�[96m ult�[0m�[96mimate�[0m�[96m goal�[0m�[96m of�[0m�[96m Z�[0m�[96men�[0m�[96m Buddh�[0m�[96mism�[0m�[96m is�[0m�[96m to�[0m�[96m realize�[0m�[96m one�[0m�[96m'�[0m�[96ms�[0m�[96m true�[0m�[96m nature�[0m�[96m or�[...."
Any idea how to fix this, please?
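
Not a pyllamacpp-specific answer, but the usual workaround on Windows terminals that print ANSI escape sequences literally is to initialize colorama before the chat loop; a minimal sketch:

# General Windows workaround: colorama translates ANSI color escapes into
# Win32 console calls on terminals that don't interpret them natively.
import colorama

colorama.init()
print("\x1b[94mAI:\x1b[0m \x1b[96mhello\x1b[0m")  # should now render in color instead of as literal codes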

Wrong description of detokenize() parameter 'tokens'

def detokenize(self, tokens: list):
    """
    Returns a list of tokens for the text <- wrong description
    :param text: text to be tokenized <- wrong description
    :return: A string representing the text extracted from the tokens
    """
    return pp.llama_tokens_to_str(self._ctx, tokens)
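
A corrected version, keeping the same signature and body, could read something like:

def detokenize(self, tokens: list):
    """
    Returns the text corresponding to a list of tokens
    :param tokens: the list of tokens to convert back to text
    :return: A string representing the text extracted from the tokens
    """
    return pp.llama_tokens_to_str(self._ctx, tokens)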

ggllm branch

Hi Abdeladim, many thanks for this new branch, which I didn't expect to be done this quickly! I tried it on 3 platforms, i.e. OSX Mojave, WSL2 (Ubuntu) and Ubuntu 22.04, but can't make it work... First, the pip/git install did not work on any of the three. So I downloaded the project and installed it with 'python setup.py install', but again all three failed with the same error messages. I attach the error messages for your reference. It is obviously above my understanding, as you've guessed! I'd appreciate it if you could have a look and advise how to make this work on my environment. Cheers!

pyllamacpp_ggllm_errors.txt

Using model.cpp_generate

I'm trying to use cpp_generate instead of generate so I can run a callback when generation completes, but cpp_generate complains about the anti_prompt attribute. I can't seem to run generation at all with cpp_generate; can anyone show me a working use case?

Here's where I am with model.generate; replacing it with cpp_generate fails. I tried both antiprompt and anti_prompt, as the docs show a difference.

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from pyllamacpp.model import Model

app = FastAPI()
model = Model("./path/to/model.bin")  # placeholder model path
conversation_history = ""

class ChatRequest(BaseModel):
    prompt: str
    conversation_history: str = ""

@app.post("/chat")
async def chat(request: ChatRequest):
    prompt = request.prompt

    global conversation_history
    conversation_history += request.conversation_history

    # Pass prompt and conversation_history to model
    full_prompt = conversation_history + "\n" + prompt

    def iter_tokens():
        # Stream tokens as they are produced; this works with model.generate
        for token in model.generate(
            prompt=full_prompt,
            antiprompt="Human:",
            n_threads=6,
            n_batch=1024,
            n_predict=256,
            n_keep=48,
            repeat_penalty=1.0,
        ):
            yield token.encode()

    return StreamingResponse(iter_tokens(), media_type="text/plain")
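
Assuming the app is served locally (for example with uvicorn on port 8000), a minimal client sketch to exercise the streaming endpoint could look like this; the URL and JSON field names simply mirror the snippet above:

import requests

# Hypothetical client for the /chat endpoint above; prints tokens as they arrive.
resp = requests.post(
    "http://localhost:8000/chat",
    json={"prompt": "Hello, who are you?", "conversation_history": ""},
    stream=True,
)
for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)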

chat with bob example broken

Hi @abdeladim-s , thanks for the update!

I was trying to update to pyllamacpp==2.4.0 but found that even the example on the README, which is similar to llama.cpp's ./examples/chat.sh but not identical, is not working properly. For example, when I copied the example code into foo.py and ran it, I got:

[Screenshot: output of foo.py, 2023-05-27 2:29 PM]

If I go to llama.cpp, check out 66874d4, and then run make clean && make && ./examples/chat.sh, I get:

[Screenshot: output of llama.cpp's chat.sh, 2023-05-27 2:25 PM]

I just want to get an equivalent of running llama.cpp's chat.sh with pyllamacpp==2.4.0, no more, no less. How should I do it?

Model loading speed?

Everything works fine and the model can be loaded, but it takes a very long time (about 2 minutes) before it starts generating at a decent speed.
Sometimes it stops after generating 5-10 tokens and proceeds about 20 seconds later.

Using 13B GPT4-x-Alpaca, 12th-gen i7 12700F (n_threads = 20), f16_kv = 1, 16 GB RAM (the model fits).

Using alpaca.cpp, it loads in about 2 seconds and generates right away without stopping.

support new version of llamacpp

Hi Abdeladim, there are many new models that can't run on the pyllamacpp binding because they use version 2 of the ggml format.
If you have some time, could you try to add support for this, please?
