
auto-llama-cpp's Issues

cuBLAS implementation?

Duplicates

  • I have searched the existing issues

Summary 💡

For llama.cpp, there's a flag called --gpu-layers N that offloads some layers to the GPU for processing.

Examples 🌈

[Screenshot from oobabooga's text-generation-webui showing the --gpu-layers option]

Motivation 🔦

Since CPU-only inference is very slow, GPU offloading would be nice.
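
If this gets picked up, the hook-up could be as small as passing llama-cpp-python's existing n_gpu_layers parameter through to the Llama constructor. A minimal sketch, assuming a cuBLAS-enabled build of llama-cpp-python (the model path and layer count are placeholders):

from llama_cpp import Llama

# Sketch: offload 32 transformer layers to the GPU. Requires llama-cpp-python
# compiled with cuBLAS support; path and layer count are placeholders.
llm = Llama(
    model_path="./models/ggml-vicuna-13b-4bit.bin",
    n_ctx=2048,
    embedding=True,
    n_gpu_layers=32,
)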

Error message when calling scripts/main.py

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

I pulled the Git repository, edited my copy of .env to suit my needs, and even recreated a run.bat from the original AutoGPT project, adding the check_requirements.py script back to scripts/ (why was it removed, actually?).
I am using a conda environment with Python 3.10.10.
Pip successfully installed all requirements.

Current behavior 😯

Now calling
python scripts/main.py
results in:

Traceback (most recent call last):
  File "E:\LLama\Auto-Llama-cpp\scripts\main.py", line 3, in <module>
    import commands as cmd
  File "E:\LLama\Auto-Llama-cpp\scripts\commands.py", line 1, in <module>
    import browse
  File "E:\LLama\Auto-Llama-cpp\scripts\browse.py", line 4, in <module>
    from llm_utils import create_chat_completion
  File "E:\LLama\Auto-Llama-cpp\scripts\llm_utils.py", line 7
    def create_chat_completion(messages[0]["content"], model=None, temperature=cfg.temperature, max_tokens=0)->str:
                                       ^
SyntaxError: invalid syntax

What am I doing wrong?
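
For anyone else hitting this: the reported line is simply not valid Python, since an expression like messages[0]["content"] cannot appear in a parameter list. A sketch of what the signature was presumably meant to be, with the indexing moved into the body (mirroring the llm(messages[0]["content"], ...) call quoted in the "LLM call" issue further down):

def create_chat_completion(messages, model=None, temperature=cfg.temperature, max_tokens=0) -> str:
    # Index into the message list inside the body, not in the signature.
    response = llm(messages[0]["content"], stop=["Q:", "### Human:"],
                   echo=False, temperature=temperature, max_tokens=max_tokens)
    return response["choices"][0]["text"]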

Expected behavior 🤔

It never executes as it should and doesn't seem to find my model.
I'm not sure if that is the problem or if it fails even earlier.

Your prompt 📝

There is no last_run_ai_settings.yaml, because it never executes.

What am I doing wrong?

Running the app in Docker, but it cannot find the EMBED_DIM var.

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

  1. Using the ggml-vicuna-13b-4bit.bin model
  2. Changed the .env file (from the default):

SMART_LLM_MODEL=./models/ggml-vicuna-13b-4bit.bin
FAST_LLM_MODEL=./models/ggml-vicuna-13b-4bit.bin
EMBED_DIM = 8192

  3. Ran docker build -t foo/auto-llama .
  4. Ran docker run -p80:3000 foo/auto-llama

Current behavior 😯

docker run -p80:3000 foo/auto-llama
Traceback (most recent call last):
  File "/app/main.py", line 3, in <module>
    import commands as cmd
  File "/app/commands.py", line 1, in <module>
    import browse
  File "/app/browse.py", line 4, in <module>
    from llm_utils import create_chat_completion
  File "/app/llm_utils.py", line 4, in <module>
    cfg = Config()
          ^^^^^^^^
  File "/app/config.py", line 18, in __call__
    cls._instances[cls] = super(
                          ^^^^^^
  File "/app/config.py", line 69, in __init__
    self.EMBED_DIM = int(os.getenv("EMBED_DIM"))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
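The crash itself is just os.getenv("EMBED_DIM") returning None: the .env file is not visible inside the container, so the variable is unset at runtime. Passing it explicitly (docker run --env-file ./.env -p80:3000 foo/auto-llama) should help; alternatively, config.py could fall back to a default instead of crashing. A hedged sketch of that line (the 5120 default is an assumption; use your model's embedding width):

import os

# Sketch of the config.py line: use a default instead of crashing when
# EMBED_DIM is unset. 5120 (the 13B LLaMA embedding width) is an assumed default.
EMBED_DIM = int(os.getenv("EMBED_DIM", "5120"))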

Expected behavior 🤔

I was hoping the app would run after the steps above.
I'm sure I'm misconfiguring the setup.

Your prompt 📝

# Paste your prompt here

Memory Error -- shapes (0,8192) and (5120,) not aligned: 8192 (dim 1) != 5120 (dim 0)

After "thinking", I got the following error (on an Ubuntu 22.04 VM):

Using memory of type: LocalCache
| Thinking...
llama_print_timings: load time = 629.49 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 629.34 ms / 2 tokens ( 314.67 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 629.68 ms
Traceback (most recent call last):
  File "/data/Auto-Llama-cpp/scripts/main.py", line 331, in <module>
    assistant_reply = chat.chat_with_ai(
  File "/data/Auto-Llama-cpp/scripts/chat.py", line 77, in chat_with_ai
    relevant_memory = permanent_memory.get_relevant(str(full_message_history[-5:]), 10)
  File "/data/Auto-Llama-cpp/scripts/memory/local.py", line 105, in get_relevant
    scores = np.dot(self.data.embeddings, embedding)
  File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (0,8192) and (5120,) not aligned: 8192 (dim 1) != 5120 (dim 0)
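
For what it's worth, the shapes suggest a configuration mismatch rather than a llama.cpp bug: the local memory cache was sized for 8192-dimensional embeddings (the EMBED_DIM value appearing elsewhere in these issues) while the 13B model emits 5120-dimensional ones. A minimal reproduction of the failing dot product, under that assumption:

import numpy as np

# Cache allocated for 8192-dim embeddings, currently holding 0 rows...
embeddings = np.zeros((0, 8192))
# ...while the model returns a 5120-dim embedding (13B LLaMA width).
query = np.zeros(5120)

# Raises: shapes (0,8192) and (5120,) not aligned, exactly as reported.
np.dot(embeddings, query)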

json loads error Expecting value: line 1 column 1 (char 0)

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

I have the same problem when running any model. I tried different versions of Vicuna, since the original 13B gives the same problem. I am running the project as it comes: I only added the model to the .env file, and the prompt is the default one.

Llama.generate: prefix-match hit
| Thinking...
llama_print_timings: load time = 1466.45 ms
llama_print_timings: sample time = 30.25 ms / 31 runs ( 0.98 ms per run)
llama_print_timings: prompt eval time = 768713.51 ms / 987 tokens ( 778.84 ms per token)
llama_print_timings: eval time = 24162.74 ms / 30 runs ( 805.42 ms per run)
llama_print_timings: total time = 794693.56 ms
Assistent Reply If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json loads error Expecting value: line 1 column 1 (char 0)
Error:
Traceback (most recent call last):
  File "scripts/main.py", line 79, in print_assistant_thoughts
    assistant_reply_json = fix_and_parse_json(assistant_reply)
  File "/root/llama.cpp/Auto-Llama-cpp/scripts/json_parser.py", line 52, in fix_and_parse_json
    brace_index = json_str.index("{")
ValueError: substring not found
json If you understand these rules, enter 'Ready' and I will start the game.

Assistant: Ready.

json loads error Expecting value: line 1 column 1 (char 0)
NEXT ACTION: COMMAND = Error: ARGUMENTS = substring not found
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for Entrepreneur-GPT...
Input:

Current behavior 😯

Default

Expected behavior 🤔

Default

Your prompt 📝

Default prompt
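
The shared root cause in this issue (and in "Error Message after 'thinking'" below) is that the model's reply contains no '{' at all, so json_str.index("{") raises ValueError instead of failing gracefully. A hedged sketch of a more defensive version of that step (the function name is from the traceback; the error handling is an assumption):

import json

def fix_and_parse_json(json_str: str) -> dict:
    # find() returns -1 instead of raising when no brace is present.
    brace_index = json_str.find("{")
    last_brace = json_str.rfind("}")
    if brace_index == -1 or last_brace == -1:
        # Assumption: report the unparseable reply instead of crashing the loop.
        raise ValueError(f"No JSON object in model reply: {json_str[:80]!r}")
    return json.loads(json_str[brace_index : last_brace + 1])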

Dockerfile label error, image exports to dangling; easy fix for you if you want :D

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

=> => exporting layers 8.5s
=> => exporting manifest sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4 0.0s
=> => exporting config sha256:00a84a43fa0487344f94620038375e6d5607fc5f482c144bf8e938ceb7c76803 0.0s
=> => naming to dangling@sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4 0.0s
=> => unpacking to dangling@sha256:81b29524e6ca86716c44c2fa16b8dc312af04dd88c3e3c03af98f087b650c8f4

But something like the following perks it right up. The source image can be swapped for ROCm, Intel, ARM, etc.; I have a CUDA GPU, so I played to my strong suit.

# Use an official CUDA runtime as a parent image
FROM nvidia/cuda:11.5.0-runtime-ubuntu20.04

# Install Python and any necessary dependencies
RUN apt-get update && apt-get install -y python3.11 python3-pip

# Set the working directory to /app
WORKDIR /app

# Copy the scripts and requirements.txt files into the container at /app
COPY scripts/ /app/scripts/
COPY requirements.txt /app/

# Install any necessary Python packages
RUN pip install -r /app/requirements.txt

# Set any necessary environment variables
ENV CUDA_VISIBLE_DEVICES=all

# Set the command to run when the container starts
# (the original CMD ["python3.11", "/bin/bash"] would fail: Python cannot
# execute /bin/bash as a script)
CMD ["python3.11", "scripts/main.py"]

Current behavior 😯

Failure to store and run the image.

Expected behavior 🤔

The image should be stored and run.

Your prompt 📝

Doesn't get that far.

Hard-coded file location of json.gbnf

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

In scripts/llm_utils.py (the grammar setup), there is this line of code:

grammar = LlamaGrammar.from_file("/home/ruben/Code/Auto-Llama-cpp/grammars/json.gbnf")

This line of code should not read the file via an absolute path, but I am not sure what should be used instead.
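
One option, sketched here, is to resolve the grammar relative to the module's own location instead of an absolute path (assuming the repo layout where grammars/ sits one level above scripts/, as in the Docker workaround below):

from pathlib import Path
from llama_cpp import LlamaGrammar

# Resolve <repo>/grammars/json.gbnf relative to this file
# (scripts/llm_utils.py), so it works from any install location.
GRAMMAR_PATH = Path(__file__).resolve().parent.parent / "grammars" / "json.gbnf"
grammar = LlamaGrammar.from_file(str(GRAMMAR_PATH))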

Current behavior 😯

An error is printed and the program terminates. As I use Docker, I have to modify the Dockerfile, add the lines below, and rebuild the image:

RUN mkdir -p /home/ruben/Code/Auto-Llama-cpp
COPY grammars /home/ruben/Code/Auto-Llama-cpp/grammars

Expected behavior 🤔

It should be possible to build the application with

docker build -t auto-llama .

And run with

docker run -it --env-file "./.env" -v "<MODEL_PATH>:/models" auto-llama

Your prompt 📝

# Paste your prompt here

I hope it can support petals.dev

Duplicates

  • I have searched the existing issues

Summary 💡

I think it would be useful to add support for petals.dev.
It should run faster and work with bigger models like Llama 2 70B.
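
Roughly what an integration might look like, following the petals.dev quickstart (the model name is an example; this is a sketch, not a tested integration):

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Sketch per the Petals quickstart: model layers are served by a public
# swarm, so a 70B model can run without fitting it locally.
model_name = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Next action for the agent:", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))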

Examples 🌈

No response

Motivation 🔦

No response

LLM call

Hi, I noticed that when calling the LLM in the code, only the first item in the messages list is passed as the prompt. Is this an error?

response = llm(messages[0]["content"], stop=["Q:", "### Human:"], echo=False, temperature=temperature, max_tokens=max_tokens)
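
If it is an error, here is a hedged sketch of one way to pass the whole conversation instead of just the first message (the role/content formatting is an assumption):

def flatten_messages(messages: list[dict]) -> str:
    # Serialize every chat message into the prompt, not just messages[0].
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

prompt = flatten_messages(messages)
response = llm(prompt, stop=["Q:", "### Human:"], echo=False,
               temperature=temperature, max_tokens=max_tokens)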

I want to use other Hugging Face local models.

Duplicates

  • I have searched the existing issues

Summary 💡

  1. If I want to use other local Hugging Face models, should I modify this field: llm = Llama(model_path="ggml-vicuna-13b-4bit.bin", n_ctx=2048, embedding=True)? Models like https://huggingface.co/chavinlo/gpt4-x-alpaca/tree/main are split across many .bin shards; how are those loaded? (See the sketch after this list.)
    pytorch_model-00001-of-00006.bin
    pytorch_model-00002-of-00006.bin
    pytorch_model-00003-of-00006.bin
    pytorch_model-00004-of-00006.bin
    pytorch_model-00005-of-00006.bin
    pytorch_model-00006-of-00006.bin
  2. Do I need to fill in my OpenAI key in the configuration file .env.template (OPENAI_API_KEY=your-openai-api-key)? The code does not seem to use the key; is it reserved for accessing OpenAI in later versions? Can I omit this parameter for now?
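
On question 1: llama-cpp-python's Llama class loads a single GGML-format file, not sharded PyTorch checkpoints, so the shards above would first have to be converted (llama.cpp ships conversion scripts for Hugging Face checkpoints). A sketch, with a hypothetical converted filename:

from llama_cpp import Llama

# The sharded pytorch_model-*.bin files cannot be loaded directly; point
# model_path at a converted GGML file instead (filename is hypothetical).
llm = Llama(
    model_path="./models/gpt4-x-alpaca-ggml-q4_0.bin",
    n_ctx=2048,
    embedding=True,
)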

Examples 🌈

No response

Motivation 🔦

No response

Inference time slow: running llama.cpp in child processes doesn't use full CPU capacity

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

npm start

./test-installation.sh

Current behavior 😯

On Mac Mini M1, at 8 threads, llama.cpp is way slower than expected.
It only uses 20-30% of available resources for each worker.
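
If the child processes are undersubscribing the cores, one thing to try is pinning llama-cpp-python's n_threads explicitly rather than relying on the default. A sketch (the model path and thread count are placeholders):

from llama_cpp import Llama

# Sketch: explicitly request 8 threads per worker; path and count
# are placeholders, tune to the machine's performance cores.
llm = Llama(
    model_path="./models/ggml-vicuna-13b-4bit.bin",
    n_ctx=2048,
    n_threads=8,
)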

Expected behavior 🤔

It should use 100% of available resources for each thread.

Your prompt 📝

N/A

Error Message after "thinking"

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

Start the program by executing the main.py file.
Then press "y" and Enter.
After a couple of minutes you will get an error message.
I am using the Vicuna 13B model.

Current behavior 😯

I press Enter, and this is the output after letting it "think":

AutoGPT INFO Error:
Traceback (most recent call last):
  File "C:\Users\alexr\Documents\Auto-Llama-cpp\scripts\main.py", line 79, in print_assistant_thoughts
    assistant_reply_json = fix_and_parse_json(assistant_reply)
  File "C:\Users\alexr\Documents\Auto-Llama-cpp\scripts\json_parser.py", line 52, in fix_and_parse_json
    brace_index = json_str.index("{")
ValueError: substring not found

Expected behavior 🤔

I think it should continue with the process.

Your prompt 📝

# Paste your prompt here

How to run with CUDA

As newer versions of llama.cpp support GPU offloading, how can we use that with this project?

I am new here.

Error when running the Docker image

Duplicates

  • I have searched the existing issues

Steps to reproduce 🕹

When I ran docker run -p80:3000 auto-llama1, I got the following error:

Welcome to Auto-Llama! Enter the name of your AI and its role below. Entering nothing will load defaults.
Name your AI: For example, 'Entrepreneur-GPT'
AI Name: Traceback (most recent call last):
  File "/app/main.py", line 313, in <module>
    prompt = construct_prompt()
             ^^^^^^^^^^^^^^^^^^
  File "/app/main.py", line 205, in construct_prompt
    config = prompt_user()
             ^^^^^^^^^^^^^
  File "/app/main.py", line 231, in prompt_user
    ai_name = utils.clean_input("AI Name: ")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/utils.py", line 3, in clean_input
    return input(prompt)
           ^^^^^^^^^^^^^
EOFError: EOF when reading a line

Any idea how to fix it?
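
(A likely cause, noted here for other readers: docker run without -it gives the container no interactive stdin, so the input() call in utils.py hits end-of-file immediately. Running the image with docker run -it -p80:3000 auto-llama1, as the json.gbnf issue above also does with --env-file, should avoid the EOFError.)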

Current behavior 😯

No response

Expected behavior 🤔

No response

Your prompt 📝

# Paste your prompt here
