biobootloader / wolverine Goto Github PK

View Code? Open in Web Editor NEW

5.2K 89.0 523.0 59 KB

License: MIT License

Python 100.00%

wolverine's Introduction

DEPRECATED: Try Mentat instead! https://github.com/AbanteAI/mentat

Wolverine

About

Give your python scripts regenerative healing abilities!

Run your scripts with Wolverine and when they crash, GPT-4 edits them and explains what went wrong. Even if you have many bugs it will repeatedly rerun until it's fixed.

For a quick demonstration see my demo video on twitter.

Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.sample .env

Add your openAI api key to .env

warning! By default wolverine uses GPT-4 and may make many repeated calls to the api.

Example Usage

To run with gpt-4 (the default, tested option):

python -m wolverine examples/buggy_script.py "subtract" 20 3

You can also run with other models, but be warned they may not adhere to the edit format as well:

python -m wolverine --model=gpt-3.5-turbo examples/buggy_script.py "subtract" 20 3

If you want to use GPT-3.5 by default instead of GPT-4 uncomment the default model line in .env:

DEFAULT_MODEL=gpt-3.5-turbo

You can also use flag --confirm=True which will ask you yes or no before making changes to the file. If flag is not used then it will apply the changes to the file

python -m wolverine examples/buggy_script.py "subtract" 20 3 --confirm=True

Environment variables

env name	description	default value
OPENAI_API_KEY	OpenAI API key	None
DEFAULT_MODEL	GPT model to use	"gpt-4"
VALIDATE_JSON_RETRY	Number of retries when requesting OpenAI API (-1 means unlimites)	-1

Future Plans

This is just a quick prototype I threw together in a few hours. There are many possible extensions and contributions are welcome:

add flags to customize usage, such as asking for user confirmation before running changed code
further iterations on the edit format that GPT responds in. Currently it struggles a bit with indentation, but I'm sure that can be improved
a suite of example buggy files that we can test prompts on to ensure reliability and measure improvement
multiple files / codebases: send GPT everything that appears in the stacktrace
graceful handling of large files - should we just send GPT relevant classes / functions?
extension to languages other than python

Star History

wolverine's People

Contributors

Stargazers

Watchers

Forkers

nigelrudolf t-holland oaustegard saransh-sharma touristshaun chasebrignac faisal-alsrheed samching therockstardba rahulmeena baocin vikgomat kristianmk husnainfareed kaxzuma x86nick rishirelan techthiyanes valeman msiesse youminxue axsddlr freecamel vladsokolovv9 vangig fabiorizzomatos projecttopstep maseratigo audyzhu gigglegithub aldiakhou kennethcassel tranquilo12 tonyhanzhisu lee-b marcohoovy xdyb marouan-chak wildownes jmenergyinsights coletrumbocole kreijstal irvinealgotrading cat-stack-boop hogenyuan dan255 cosmojg mistobaan ktavabi monigarr ohmnow 0xsmw klu-ai hanafsky saafaaaa brandino bellyfat hiroakiyamane iatosh kurarusu rmonsur aswanthmanoj 0xcha05 rahul711sharma lowdias gotchiee900 gijigae sikkgit etherair cibernin pedroa2silva masonjames annias epylar zakorainc danfred360 azure-arc-0 nathanielrh darnyte gabnix zeranamu ra312 lusephur gr3yr0n1n robertartigas lupin4 peacedata0 macgio jorik041 kortuma corlixa calvinloveland josefalio dzhang-sc polpod caliraftdude v0rts prehensilecode hhy5277 dboee

wolverine's Issues

<Asigned>

Originally posted by @Alphanummeric in Alphanummeric/H#3 (comment)

reorder venv commands in README

Hello,

I noticed an issue in the README file. The current instructions are:

python3 -m venv venv
pip install -r requirements.txt
source venv/bin/activate

I believe the correct order should be:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

4823

Originally posted by @Alphanummeric in awesome-lists/awesome-bash#91

The term 'source' is not recognized

source : The term 'source' is not recognized as the name of a cmdlet, function,
script file, or operable program. Check the spelling of the name, or if a path
was included, verify that the path is correct and try again.
At line:1 char:1
+ source venv/bin/activate
+ ~~~~~~
    + CategoryInfo          : ObjectNotFound: (source:String) [], CommandNotFou
   ndException
    + FullyQualifiedErrorId : CommandNotFoundException

Can Wolverine modify machine learning code?Why didn't I output anything after running it?

just an idea

Hi.

I've been writing a package for probing ChatGPT using self-referential patterns and figured I would try running Wolverine swapping out your dict parser or combining with what I have here.

Happy to send you PR's or collaborate but don't want to create bloat or deps you don't want.

-Peter

no package

idunno if am stupid but this is my issus

Traceback (most recent call last):

File "", line 189, in _run_module_as_main
File "", line 112, in _get_module_details
File "/root/wolverine/wolverine/wolverine.py", line 8, in
import openai
ModuleNotFoundError: No module named 'openai'

OPEN FEATURES

Some ideas for Future Features

Configuration file: Add support for a configuration file where users can specify their API key, default model, and other settings, making it easier to manage and customize the script.
Multiple models: Allow users to choose from different GPT models or use multiple models sequentially, which could potentially improve the quality of suggestions.
Rate limiting and retries: Implement rate limiting and automatic retries for API requests, which can help avoid exceeding API limits and handle occasional API errors more gracefully.
Code formatting: Integrate with code formatters like Black or autopep8 to automatically format the fixed code according to Python style guidelines.
Version control integration: Add support for automatically creating a new branch or commit in the version control system (e.g., Git) when changes are applied, making it easier to track and manage changes made by the script.
Test execution: If the project includes unit tests, run them after applying changes to verify that the fixes haven't broken any existing functionality.
Incremental improvements: Instead of applying all suggested changes at once, apply one change at a time and rerun the script to see if the issue has been resolved. This approach can help identify which suggestions are most effective and minimize unnecessary changes.
Custom prompt: Allow users to provide a custom prompt for the GPT model, giving more control over the type of suggestions generated.
Interactive mode: Implement an interactive mode where users can review and approve or reject each suggestion before applying it. This can help ensure that only the desired changes are made to the script.
#23
Performance metrics: Collect and display performance metrics, such as the number of iterations, time taken for each iteration, and total time taken to fix the script, helping users understand the efficiency of the script.
Logging: Add proper logging to keep track of the actions taken by the script, which can be useful for debugging and monitoring purposes.
#25
User-friendly error messages: Improve error messages to be more descriptive and user-friendly, making it easier for users to understand and resolve issues.

81143a3

Please add 3.5

I think 3.5 would be worth adding because even though it may not get things right as much as 4, it's still way cheaper and GPT-4 is still only available to a small number of people.

docker support, gpt 3.5 turbo

I've added these to my fork, seems to do a decent job with 3.5-turbo

https://github.com/0xcha05/elixir

Planning for smart utilization of 3.5

GPT 3.5 is so much cheaper (.002 vs .06 / k tokens), not to mention it it usually returns faster and is less throttled.
Given that, it makes sense to always at least attempt to use GPT 3.5 first.

Given we are gonna try GPT 3.5 first, how do we determine when to fallback to GPT 4?

When compiling the prompt for our completion, if it leaves less than n tokens remaining for completion, where n is the smallest number we expect to possibly hold an expected completion. IMO 500 tokens is a reasonable amount to reserve. But that's a variable that could use empirical measurement.
When receiving the completion, we should prompt for the answer to be wrapped in some delimiters to detect if there were not enough tokens for GPT 3.5 to complete it's attempted answer.
When detecting if the completion resulted in a fix for the current error. It may however be worth retrying here while slowly ratcheting up temperature. Or feeding new error back in. Need a way to check if GPT is just introducing new errors that happen before the original error could happen.

Additionally, code should include future proofing for fallback to the 32K model using rules 1 & 2 (since it's not smarter, just bigger). Obviously disabled by flag. Similarly allow disabling of 4-8k using the same system.

Can I make CodeLlama 70b python work with this?

Can I make CodeLlama 70b python work with this package? I only see info about apis gpt3.5 and gpt4.

Is anyone running LLM hooked up to this on own hardware?

Allow for installing and running system-wide with just `wolverine`

Free

Validate that there are actual code changes suggested to prevent loop

In json_validated_response ensure that there is at least one change to the code suggested. I saw it get stuck in a loop once making no changes

Extensions for Github Actions

This project seems great if combined with CI/CD process.

For example...

User upload project with test codes
CI workflow, such as GitHub Action, run tests on top of wolverine
Wolverine catches the bug and create pull requests

As result, the developers can accelerate debugging in TDD.

Extension for oobabooga/text-generation-webui

If this thing converted to extension for obabooga/text-generation-webui it may be run locally.

apoyo

class Empleado:
def init(self, nombre, apellido, sueldo_base, afp, fecha_ingreso, hijos):
self.nombre = nombre
self.apellido = apellido
self.sueldo_base = sueldo_base
self.afp = afp
self.fecha_ingreso = fecha_ingreso
self.hijos = hijos

def calcular_base_imponible(self):
meses_trabajados = (2021 - int(self.fecha_ingreso.split("/")[-1])) * 12
bonificacion = self.sueldo_base * (meses_trabajados * 0.01)
asignacion_familiar = self.sueldo_base * (self.hijos * 0.05)
base_imponible = self.sueldo_base + bonificacion + asignacion_familiar
return base_imponible

def calcular_descuentos(self):
base_imponible = self.calcular_base_imponible()

    essalud = base_imponible * 0.07
    
    if self.afp == "AFP(X)":
        afp = base_imponible * 0.12
    elif self.afp == "AFP(Y)":
        afp = base_imponible * 0.114
    else:
        afp = 0 # Asignación predeterminada si no se cumple ninguna condición
    
    return essalud, afp

def calcular_pago_total(self):
essalud, afp = self.calcular_descuentos()
base_imponible = self.calcular_base_imponible()

    pago_total = base_imponible - essalud - afp
    
    return pago_total

Pedir datos de los empleados

empleados = []
for i in range(2):
print(f"Ingrese los datos del empleado {i+1}:")
nombre = input("Nombre: ")
apellido = input("Apellido: ")
sueldo_base = float(input("Sueldo base: "))
afp = input("AFP (AFP(X) o AFP(Y)): ")
fecha_ingreso = input("Fecha de ingreso (DD/MM/YYYY): ")
hijos = int(input("Cantidad de hijos: "))

empleado = "Empleados"(nombre, apellido, sueldo_base, afp, fecha_ingreso, hijos)
empleados.append(empleado)

Calcular y mostrar los pagos individuales

for empleado in empleados:
base_imponible = empleado.calcular_base_imponible()
essalud, afp = empleado.calcular_descuentos()
pago_total = empleado.calcular_pago_total()

print(f"\nEmpleado: {empleado.nombre} {empleado.apellido}")
print(f"Base imponible: {base_imponible:.2f}")
print(f"Descuento ESSALUD: {essalud:.2f}")
print(f"Descuento AFP: {afp:.2f}")
print(f"Pago total: {pago_total:.2f}")

Calcular y mostrar los promedios de pago

total_pagos = sum(empleado.calcular_pago_total() for empleado in empleados)
promedio_pago = total_pagos / len(empleados)

print(f"\nPromedio de pago a los empleados: {promedio_pago:.2f}")

openai.error.InvalidRequestError: The model `gpt-4` does not exist

[$USER@$OS wolverine]$ python3 -m venv venv
[$USER@$OS wolverine]$ source venv/bin/activate
(venv) [$USER@$OS wolverine]$ pip install -r requirements.txt
Collecting aiohttp==3.8.4
Using cached aiohttp-3.8.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting aiosignal==1.3.1
Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting async-timeout==4.0.2
Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting attrs==22.2.0
Using cached attrs-22.2.0-py3-none-any.whl (60 kB)
Collecting certifi==2022.12.7
Using cached certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting charset-normalizer==3.1.0
Using cached charset_normalizer-3.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (197 kB)
Collecting fire==0.5.0
Using cached fire-0.5.0.tar.gz (88 kB)
Preparing metadata (setup.py) ... done
Collecting flake8==6.0.0
Using cached flake8-6.0.0-py2.py3-none-any.whl (57 kB)
Collecting frozenlist==1.3.3
Using cached frozenlist-1.3.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (154 kB)
Collecting idna==3.4
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting mccabe==0.7.0
Using cached mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Collecting multidict==6.0.4
Using cached multidict-6.0.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (117 kB)
Collecting openai==0.27.2
Using cached openai-0.27.2-py3-none-any.whl (70 kB)
Collecting pycodestyle==2.10.0
Using cached pycodestyle-2.10.0-py2.py3-none-any.whl (41 kB)
Collecting pyflakes==3.0.1
Using cached pyflakes-3.0.1-py2.py3-none-any.whl (62 kB)
Collecting requests==2.28.2
Using cached requests-2.28.2-py3-none-any.whl (62 kB)
Collecting six==1.16.0
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting termcolor==2.2.0
Using cached termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting tqdm==4.65.0
Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting urllib3==1.26.15
Using cached urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting yarl==1.8.2
Using cached yarl-1.8.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (278 kB)
Using legacy 'setup.py install' for fire, since package 'wheel' is not installed.
Installing collected packages: urllib3, tqdm, termcolor, six, pyflakes, pycodestyle, multidict, mccabe, idna, frozenlist, charset-normalizer, certifi, attrs, async-timeout, yarl, requests, flake8, fire, aiosignal, aiohttp, openai
Running setup.py install for fire ... done
Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 attrs-22.2.0 certifi-2022.12.7 charset-normalizer-3.1.0 fire-0.5.0 flake8-6.0.0 frozenlist-1.3.3 idna-3.4 mccabe-0.7.0 multidict-6.0.4 openai-0.27.2 pycodestyle-2.10.0 pyflakes-3.0.1 requests-2.28.2 six-1.16.0 termcolor-2.2.0 tqdm-4.65.0 urllib3-1.26.15 yarl-1.8.2

[notice] A new release of pip available: 22.2.2 -> 23.0.1
[notice] To update, run: pip install --upgrade pip
(venv) [$USER@$OS wolverine]$ python wolverine.py buggy_script.py "subtract" 20 3
Script crashed. Trying to fix...
Output: Traceback (most recent call last):
File "/home/$USER/rse/open_source/wolverine/buggy_script.py", line 30, in
fire.Fire(calculate)
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/buggy_script.py", line 18, in calculate
result = subtract_numbers(num1, num2)
^^^^^^^^^^^^^^^^
NameError: name 'subtract_numbers' is not defined

Traceback (most recent call last):
File "/home/$USER/rse/open_source/wolverine/wolverine.py", line 153, in
fire.Fire(main)
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/wolverine.py", line 142, in main
json_response = send_error_to_gpt(
^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/wolverine.py", line 55, in send_error_to_gpt
response = openai.ChatCompletion.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/openai/api_requestor.py", line 226, in request
resp, got_stream = self._interpret_response(result, stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/openai/api_requestor.py", line 619, in _interpret_response
self._interpret_response_line(
File "/home/$USER/rse/open_source/wolverine/venv/lib64/python3.11/site-packages/openai/api_requestor.py", line 682, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: The model gpt-4 does not exist

from .wolverine import main ImportError: attempted relative import with no known parent package！

3.5 - has anyone tested?

First of all, this is f-ing brilliant and I'm going to be using it frequently... I'm wondering if the agent is able to do its job properly when you drop down to 3.5 turbo, because i have an idea that involves the automated deployment of wolverine at scale, and gpt4 is too expensive for that purpose - will 3.5 work or is it a waste of time?

Use subprocess.run with encoding="utf-8"

wolverine/wolverine.py

Lines 36 to 40 in 2f5a026

 try: 

 result = subprocess.check_output(subprocess_args, stderr=subprocess.STDOUT) 

 except subprocess.CalledProcessError as e: 

 return e.output.decode("utf-8"), e.returncode 

 return result.decode("utf-8"), 0

if you pass encoding='utf-8' to subprocess.run, your strings are automatically decoded:

>>> s = subprocess.run("/bin/echo 'hello'".split(" "), stdout=subprocess.PIPE)
>>> s.stdout
b"'hello'\n"
>>> s = subprocess.run("/bin/echo 'hello'".split(" "), stdout=subprocess.PIPE, encoding="utf-8")
>>> s.stdout
"'hello'\n"

So:

    try:
        result = subprocess.check_output(subprocess_args, stderr=subprocess.STDOUT, encoding="utf-8")
    except subprocess.CalledProcessError as e:
        return e.output, e.returncode
    return result, 0

Suggestion: add pysnooper.snoop decorator to add failing function's variables' types/values/updates to error message context

pysnooper.snoop() is a decorator that helps automate printf style debugging: example

import pysnooper

@pysnooper.snoop()
def number_to_bits(number):
    if number:
        bits = []
        while number:
            number, remainder = divmod(number, 2)
            bits.insert(0, remainder)
        return bits
    else:
        return [0]

number_to_bits(6)

I think this will help focus/improve GPT-4's debugging ability - https://github.com/cool-RR/PySnooper

There's also torchsnooper for even better snoop insight into pytorch

Add chroma to wolverine

If you will add the vector database for the future update to change large amounts of code it will help out GPT-4 & GPT-3.5.

	try:
	result = subprocess.check_output(subprocess_args, stderr=subprocess.STDOUT)
	except subprocess.CalledProcessError as e:
	return e.output.decode("utf-8"), e.returncode
	return result.decode("utf-8"), 0