
auto-llama-cpp's Issues

cuBLAS implementation?

Duplicates

  • I have searched the existing issues

Summary 💡

For llama.cpp, there's a flag called --gpu-layers N that offloads some layers to the GPU for processing.

Examples 🌈

[Screenshot from oobabooga's text-generation-webui showing the --gpu-layers option]

Motivation 🔦

Since CPU-only inference is very slow, GPU offloading would be nice.
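
If this gets picked up, the hook-up could be as small as passing llama-cpp-python's existing n_gpu_layers parameter through to the Llama constructor. A minimal sketch, assuming a cuBLAS-enabled build of llama-cpp-python (the model path and layer count are placeholders):

from llama_cpp import Llama

# Sketch: offload 32 transformer layers to the GPU. Requires llama-cpp-python
# compiled with cuBLAS support; path and layer count are placeholders.
llm = Llama(
    model_path="./models/ggml-vicuna-13b-4bit.bin",
    n_ctx=2048,
    embedding=True,
    n_gpu_layers=32,
)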

Error message when calling scripts/main.py

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

I pulled the Git repository, edited my copy of .env to suit my needs, and even recreated a run.bat from the original AutoGPT project, adding the check_requirements.py script back to scripts/ (why was it removed, actually?).
I am using a conda environment with Python 3.10.10.
Pip successfully installed all requirements.

Current behavior 😯

Now calling
python scripts/main.py
results in:

Traceback (most recent call last):
  File "E:\LLama\Auto-Llama-cpp\scripts\main.py", line 3, in <module>
    import commands as cmd
  File "E:\LLama\Auto-Llama-cpp\scripts\commands.py", line 1, in <module>
    import browse
  File "E:\LLama\Auto-Llama-cpp\scripts\browse.py", line 4, in <module>
    from llm_utils import create_chat_completion
  File "E:\LLama\Auto-Llama-cpp\scripts\llm_utils.py", line 7
    def create_chat_completion(messages[0]["content"], model=None, temperature=cfg.temperature, max_tokens=0)->str:
                                       ^
SyntaxError: invalid syntax

What am I doing wrong?
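
For anyone else hitting this: the reported line is simply not valid Python, since an expression like messages[0]["content"] cannot appear in a parameter list. A sketch of what the signature was presumably meant to be, with the indexing moved into the body (mirroring the llm(messages[0]["content"], ...) call quoted in the "LLM call" issue further down):

def create_chat_completion(messages, model=None, temperature=cfg.temperature, max_tokens=0) -> str:
    # Index into the message list inside the body, not in the signature.
    response = llm(messages[0]["content"], stop=["Q:", "### Human:"],
                   echo=False, temperature=temperature, max_tokens=max_tokens)
    return response["choices"][0]["text"]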

Expected behavior 🤔

It never executes as it should and doesn't seem to find my model.
I'm not sure if that is the problem or if it fails even earlier.

Your prompt 📝

There is no last_run_ai_settings.yaml, because it never executes.

What am I doing wrong?

Running the app in Docker, but it cannot find the EMBED_DIM var.

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

  1. Using the ggml-vicuna-13b-4bit.bin model
  2. Changed the .env file (from the default):

SMART_LLM_MODEL=./models/ggml-vicuna-13b-4bit.bin
FAST_LLM_MODEL=./models/ggml-vicuna-13b-4bit.bin
EMBED_DIM = 8192

  3. Ran docker build -t foo/auto-llama .
  4. Ran docker run -p80:3000 foo/auto-llama

Current behavior 😯

docker run -p80:3000 foo/auto-llama
Traceback (most recent call last):
  File "/app/main.py", line 3, in <module>
    import commands as cmd
  File "/app/commands.py", line 1, in <module>
    import browse
  File "/app/browse.py", line 4, in <module>
    from llm_utils import create_chat_completion
  File "/app/llm_utils.py", line 4, in <module>
    cfg = Config()
          ^^^^^^^^
  File "/app/config.py", line 18, in __call__
    cls._instances[cls] = super(
                          ^^^^^^
  File "/app/config.py", line 69, in __init__
    self.EMBED_DIM = int(os.getenv("EMBED_DIM"))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
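The crash itself is just os.getenv("EMBED_DIM") returning None: the .env file is not visible inside the container, so the variable is unset at runtime. Passing it explicitly (docker run --env-file ./.env -p80:3000 foo/auto-llama) should help; alternatively, config.py could fall back to a default instead of crashing. A hedged sketch of that line (the 5120 default is an assumption; use your model's embedding width):

import os

# Sketch of the config.py line: use a default instead of crashing when
# EMBED_DIM is unset. 5120 (the 13B LLaMA embedding width) is an assumed default.
EMBED_DIM = int(os.getenv("EMBED_DIM", "5120"))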

Expected behavior 🤔

I was hoping the app would run after the steps above.
I'm sure I'm misconfiguring the setup.

Your prompt 📝

# Paste your prompt here

Memory Error -- shapes (0,8192) and (5120,) not aligned: 8192 (dim 1) != 5120 (dim 0)

After "thinking", I got the following error (on an Ubuntu 22.04 VM):

Using memory of type: LocalCache
| Thinking...
llama_print_timings: load time = 629.49 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 629.34 ms / 2 tokens ( 314.67 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 629.68 ms
Traceback (most recent call last):
  File "/data/Auto-Llama-cpp/scripts/main.py", line 331, in <module>
    assistant_reply = chat.chat_with_ai(
  File "/data/Auto-Llama-cpp/scripts/chat.py", line 77, in chat_with_ai
    relevant_memory = permanent_memory.get_relevant(str(full_message_history[-5:]), 10)
  File "/data/Auto-Llama-cpp/scripts/memory/local.py", line 105, in get_relevant
    scores = np.dot(self.data.embeddings, embedding)
  File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (0,8192) and (5120,) not aligned: 8192 (dim 1) != 5120 (dim 0)
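
For what it's worth, the shapes suggest a configuration mismatch rather than a llama.cpp bug: the local memory cache was sized for 8192-dimensional embeddings (the EMBED_DIM value appearing elsewhere in these issues) while the 13B model emits 5120-dimensional ones. A minimal reproduction of the failing dot product, under that assumption:

import numpy as np

# Cache allocated for 8192-dim embeddings, currently holding 0 rows...
embeddings = np.zeros((0, 8192))
# ...while the model returns a 5120-dim embedding (13B LLaMA width).
query = np.zeros(5120)

# Raises: shapes (0,8192) and (5120,) not aligned, exactly as reported.
np.dot(embeddings, query)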

json loads error Expecting value: line 1 column 1 (char 0)

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

I have the same problem when running any model. I tried different versions of Vicuna, since the original 13B gives the same problem. I am running the project as it comes: I only added the model to the .env file, and the prompt is the default one.

Llama.generate: prefix-match hit
| Thinking...
llama_print_timings: load time = 1466.45 ms
llama_print_timings: sample time = 30.25 ms / 31 runs ( 0.98 ms per run)
llama_print_timings: prompt eval time = 768713.51 ms / 987 tokens ( 778.84 ms per token)
llama_print_timings: eval time = 24162.74 ms / 30 runs ( 805.42 ms per run)
llama_print_timings: total time = 794693.56 ms
Assistent Reply If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json loads error Expecting value: line 1 column 1 (char 0)
Error:
Traceback (most recent call last):
  File "scripts/main.py", line 79, in print_assistant_thoughts
    assistant_reply_json = fix_and_parse_json(assistant_reply)
  File "/root/llama.cpp/Auto-Llama-cpp/scripts/json_parser.py", line 52, in fix_and_parse_json
    brace_index = json_str.index("{")
ValueError: substring not found
json If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json loads error Expecting value: line 1 column 1 (char 0)
NEXT ACTION: COMMAND = Error: ARGUMENTS = substring not found
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for Entrepreneur-GPT...
Input:

Current behavior 😯

Default

Expected behavior 🤔

Default

Your prompt 📝

Default prompt
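
The shared root cause in this issue (and in "Error Message after 'thinking'" below) is that the model's reply contains no '{' at all, so json_str.index("{") raises ValueError instead of failing gracefully. A hedged sketch of a more defensive version of that step (the function name is from the traceback; the error handling is an assumption):

import json

def fix_and_parse_json(json_str: str) -> dict:
    # find() returns -1 instead of raising when no brace is present.
    brace_index = json_str.find("{")
    last_brace = json_str.rfind("}")
    if brace_index == -1 or last_brace == -1:
        # Assumption: report the unparseable reply instead of crashing the loop.
        raise ValueError(f"No JSON object in model reply: {json_str[:80]!r}")
    return json.loads(json_str[brace_index : last_brace + 1])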

Dockerfile label error, image exports to dangling; easy fix for you if you want :D

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

=> => exporting layers 8.5s
=> => exporting manifest sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4 0.0s
=> => exporting config sha256:00a84a43fa0487344f94620038375e6d5607fc5f482c144bf8e938ceb7c76803 0.0s
=> => naming to dangling@sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4 0.0s
=> => unpacking to dangling@sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4

But something like the following perks it right up. The source image can be swapped for ROCm, Intel, ARM, etc.; I have a CUDA GPU, so I played to my strong suit.

# Use an official CUDA runtime as a parent image
FROM nvidia/cuda:11.5.0-runtime-ubuntu20.04

# Install Python and any necessary dependencies
RUN apt-get update && apt-get install -y python3.11 python3-pip

# Set the working directory to /app
WORKDIR /app

# Copy the scripts and requirements.txt files into the container at /app
COPY scripts/ /app/scripts/
COPY requirements.txt /app/

# Install any necessary Python packages
RUN pip install -r /app/requirements.txt

# Set any necessary environment variables
ENV CUDA_VISIBLE_DEVICES=all

# Set the command to run when the container starts
# (the original CMD ["python3.11", "/bin/bash"] would fail: Python cannot
# execute /bin/bash as a script)
CMD ["python3.11", "scripts/main.py"]

Current behavior 😯

Failure to store and run the image.

Expected behavior 🤔

The image should be stored and run.

Your prompt 📝

Doesn't get that far.

Hard-coded file location of json.gbnf

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

In scripts/llm_utils.py (the grammar setup), there is this line of code:

grammar = LlamaGrammar.from_file("/home/ruben/Code/Auto-Llama-cpp/grammars/json.gbnf")

This line of code should not read the file via an absolute path, but I am not sure what should be used instead.
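
One option, sketched here, is to resolve the grammar relative to the module's own location instead of an absolute path (assuming the repo layout where grammars/ sits one level above scripts/, as in the Docker workaround below):

from pathlib import Path
from llama_cpp import LlamaGrammar

# Resolve <repo>/grammars/json.gbnf relative to this file
# (scripts/llm_utils.py), so it works from any install location.
GRAMMAR_PATH = Path(__file__).resolve().parent.parent / "grammars" / "json.gbnf"
grammar = LlamaGrammar.from_file(str(GRAMMAR_PATH))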

Current behavior 😯

An error is printed and the program terminates. As I use Docker, I have to modify the Dockerfile, add the lines below, and rebuild the image:

RUN mkdir -p /home/ruben/Code/Auto-Llama-cpp
COPY grammars /home/ruben/Code/Auto-Llama-cpp/grammars

Expected behavior 🤔

It should be possible to build the application with

docker build -t auto-llama .

And run with

docker run -it --env-file "./.env" -v "<MODEL_PATH>:/models" auto-llama

Your prompt 📝

# Paste your prompt here

I hope it can support petals.dev

Duplicates

  • I have searched the existing issues

Summary 💡

I think it would be useful to add support for petals.dev.
It should run faster and work with bigger models like Llama 2 70B.
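
Roughly what an integration might look like, following the petals.dev quickstart (the model name is an example; this is a sketch, not a tested integration):

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Sketch per the Petals quickstart: model layers are served by a public
# swarm, so a 70B model can run without fitting it locally.
model_name = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Next action for the agent:", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))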

Examples 🌈

No response

Motivation 🔦

No response

LLM call

Hi, I noticed that when calling the LLM in the code, only the first item in the messages list is passed as the prompt. Is this an error?

response = llm(messages[0]["content"], stop=["Q:", "### Human:"], echo=False, temperature=temperature, max_tokens=max_tokens)
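
If it is an error, here is a hedged sketch of one way to pass the whole conversation instead of just the first message (the role/content formatting is an assumption):

def flatten_messages(messages: list[dict]) -> str:
    # Serialize every chat message into the prompt, not just messages[0].
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

prompt = flatten_messages(messages)
response = llm(prompt, stop=["Q:", "### Human:"], echo=False,
               temperature=temperature, max_tokens=max_tokens)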

I want to use other Hugging Face local models.

Duplicates

  • I have searched the existing issues

Summary 💡

  1. If I want to use other local Hugging Face models, should I modify this field: llm = Llama(model_path="ggml-vicuna-13b-4bit.bin", n_ctx=2048, embedding=True)? Models like https://huggingface.co/chavinlo/gpt4-x-alpaca/tree/main are split across many .bin shards; how are those loaded? (See the sketch after this list.)
    pytorch_model-00001-of-00006.bin
    pytorch_model-00002-of-00006.bin
    pytorch_model-00003-of-00006.bin
    pytorch_model-00004-of-00006.bin
    pytorch_model-00005-of-00006.bin
    pytorch_model-00006-of-00006.bin
  2. Do I need to fill in my OpenAI key in the configuration file .env.template (OPENAI_API_KEY=your-openai-api-key)? The code does not seem to use the key; is it reserved for accessing OpenAI in later versions? Can I omit this parameter for now?
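
On question 1: llama-cpp-python's Llama class loads a single GGML-format file, not sharded PyTorch checkpoints, so the shards above would first have to be converted (llama.cpp ships conversion scripts for Hugging Face checkpoints). A sketch, with a hypothetical converted filename:

from llama_cpp import Llama

# The sharded pytorch_model-*.bin files cannot be loaded directly; point
# model_path at a converted GGML file instead (filename is hypothetical).
llm = Llama(
    model_path="./models/gpt4-x-alpaca-ggml-q4_0.bin",
    n_ctx=2048,
    embedding=True,
)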

Examples 🌈

No response

Motivation 🔦

No response

Inference time slow: running llama.cpp in child processes doesn't use full CPU capacity

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

npm start

./test-installation.sh

Current behavior 😯

On Mac Mini M1, at 8 threads, llama.cpp is way slower than expected.
It only uses 20-30% of available resources for each worker.
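
If the child processes are undersubscribing the cores, one thing to try is pinning llama-cpp-python's n_threads explicitly rather than relying on the default. A sketch (the model path and thread count are placeholders):

from llama_cpp import Llama

# Sketch: explicitly request 8 threads per worker; path and count
# are placeholders, tune to the machine's performance cores.
llm = Llama(
    model_path="./models/ggml-vicuna-13b-4bit.bin",
    n_ctx=2048,
    n_threads=8,
)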

Expected behavior 🤔

It should use 100% of available resources for each thread.

Your prompt 📝

N/A

Error Message after "thinking"

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

Start the program by executing the main.py file.
Then press "y" and Enter.
After a couple of minutes you will get an error message.
I am using the Vicuna 13B model.

Current behavior 😯

I press Enter, and this is the output after letting it "think":

AutoGPT INFO Error:
Traceback (most recent call last):
  File "C:\Users\alexr\Documents\Auto-Llama-cpp\scripts\main.py", line 79, in print_assistant_thoughts
    assistant_reply_json = fix_and_parse_json(assistant_reply)
  File "C:\Users\alexr\Documents\Auto-Llama-cpp\scripts\json_parser.py", line 52, in fix_and_parse_json
    brace_index = json_str.index("{")
ValueError: substring not found

Expected behavior 🤔

I think it should continue with the process.

Your prompt 📝

# Paste your prompt here

How to run with CUDA

As newer versions of llama.cpp support GPU offloading, how can we use that with this project?

I am new here.

Error when running the Docker image

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

When I ran docker run -p80:3000 auto-llama1, I got the following error:

Welcome to Auto-Llama! Enter the name of your AI and its role below. Entering nothing will load defaults.
Name your AI: For example, 'Entrepreneur-GPT'
AI Name: Traceback (most recent call last):
  File "/app/main.py", line 313, in <module>
    prompt = construct_prompt()
             ^^^^^^^^^^^^^^^^^^
  File "/app/main.py", line 205, in construct_prompt
    config = prompt_user()
             ^^^^^^^^^^^^^
  File "/app/main.py", line 231, in prompt_user
    ai_name = utils.clean_input("AI Name: ")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/utils.py", line 3, in clean_input
    return input(prompt)
           ^^^^^^^^^^^^^
EOFError: EOF when reading a line

Any idea how to fix it?
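
(A likely cause, noted here for other readers: docker run without -it gives the container no interactive stdin, so the input() call in utils.py hits end-of-file immediately. Running the image with docker run -it -p80:3000 auto-llama1, as the json.gbnf issue above also does with --env-file, should avoid the EOFError.)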

Current behavior 😯

No response

Expected behavior 🤔

No response

Your prompt 📝

# Paste your prompt here
