helixml / helix

208.0 5.0 15.0 37.15 MB

Multi-node production AI stack. Run the best of open source AI easily on your own servers. Create your own AI by fine-tuning open source models. Integrate LLMs with APIs. Run gptscript securely on the server

Home Page: https://tryhelix.ai

License: Other

Languages: Shell 0.62%, Go 54.98%, TypeScript 41.22%, HTML 0.15%, Dockerfile 0.14%, Python 2.57%, Mako 0.05%, Smarty 0.27%
Topics: golang, llama, llm, mistral, openai, self-hosted, mixtral, sdxl, stable-diffusion, api

helix's People

Contributors: bigadamknight, binocarlos, chocobar, lukemarsden, obianuoobi, philwinder, rusenask

helix's Issues

non-english language qapairs

currently the QA-pair generation seems to translate non-English input data to English, however we have users who want to be able to do it all in, say, French

when this is working, get back to the French user on Crisp

JSONL input data

If the user uploads their own qapairs, skip the qapair generation phase

Multi GPU support

Support multiple GPUs on a single node. Initially we can work around this by running N runners, with CUDA_VISIBLE_DEVICES passed through to each runner's Python process
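The N-runners workaround could be sketched like this (the runner.py entry point and launch command are assumptions, not the actual runner invocation):

```python
import os
import subprocess

def runner_env(gpu_index, base_env=None):
    # Copy the environment and pin this runner to a single GPU.
    env = dict(base_env if base_env is not None else os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

def launch_runners(num_gpus, runner_cmd=("python", "runner.py")):
    # One runner process per GPU; each process sees exactly one device.
    return [subprocess.Popen(runner_cmd, env=runner_env(i)) for i in range(num_gpus)]
```

Each process then behaves as a normal single-GPU runner, so the scheduler needs no changes for a first pass.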

show API calls to replicate many actions

(e.g. text & image inference to start with)

basically show the curl equivalent of the UI action - i.e. make it clear that you can use the API for each of these actions
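A sketch of generating the curl equivalent of a UI action (the /api/v1/inference path and the payload shape are illustrative assumptions, not Helix's actual API):

```python
import json
import shlex

def curl_for_inference(api_host, api_key, session_id, prompt):
    # Hypothetical endpoint and payload; swap in the real API route.
    payload = json.dumps({"session_id": session_id, "prompt": prompt})
    return (
        f"curl -X POST {shlex.quote(api_host + '/api/v1/inference')} "
        f"-H 'Authorization: Bearer {api_key}' "
        f"-H 'Content-Type: application/json' "
        f"-d {shlex.quote(payload)}"
    )
```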

the session page scrolls to the bottom randomly

There is a useMemo that is re-running (possibly triggered by Keycloak) and causing the "the session has changed, scroll to the bottom" behaviour even when the session clearly has not changed. It's annoying because you are actively scrolling up and down just reading, and then the page suddenly jumps to the bottom.

too few questions in small dataset

If you put a small bit of text like:

Bob lives at 6 Crow Terrace

It will generate a single question/answer pair, and then axolotl complains that there are too few questions in the training dataset
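One fix is to validate the dataset size before handing it to axolotl, so the user gets an actionable error instead of a training failure (the threshold of 10 is an assumed value, not axolotl's actual minimum):

```python
MIN_QA_PAIRS = 10  # assumed threshold; axolotl's real minimum may differ

def validate_dataset(qa_pairs):
    # Fail early, before training starts, when too few pairs were generated.
    if len(qa_pairs) < MIN_QA_PAIRS:
        raise ValueError(
            f"only {len(qa_pairs)} question/answer pairs were generated; "
            f"at least {MIN_QA_PAIRS} are needed to fine-tune - try adding more text"
        )
```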

Model seems obsessed with more fine tuning of dataset

Having submitted a document (random doc, outline of a fictional story), then asking what a character should do in the story, I keep being met with "Character should continue fine-tuning the data to improve the accuracy of the model." This seemed to be an inescapable answer, no matter how I posed the question.

It also does not appear to learn from any further conversation I have after the dataset is submitted.

switch to isStale everywhere

the logic for whether a model instance is stale is currently in 3 places (search for stale := and nonStale :=)

move it to one
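The consolidated check could be a single predicate like this (shown in Python for brevity; the real logic is in the Go runner code, and the TTL parameter name is an assumption):

```python
from datetime import datetime, timedelta

def is_stale(last_activity, ttl_seconds, now=None):
    # Single source of truth: a model instance is stale once it has been
    # idle for longer than its TTL.
    now = now or datetime.utcnow()
    return (now - last_activity) > timedelta(seconds=ttl_seconds)
```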

url box mime type detection

if you put a URL to a file in the URL box - detect the bloody MIME type so we don't split docs that are downloaded

the URL box should download files first
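A minimal sketch of the detection, assuming we prefer the server's Content-Type header (e.g. from a HEAD request) and fall back to the URL's file extension:

```python
import mimetypes
from urllib.parse import urlparse

def guess_mime(url, content_type_header=None):
    # Prefer the Content-Type the server reports.
    if content_type_header:
        return content_type_header.split(";")[0].strip()
    # Otherwise fall back to guessing from the URL path's extension.
    mime, _ = mimetypes.guess_type(urlparse(url).path)
    return mime or "application/octet-stream"
```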

Old list "done"

Things we did whilst using the old "list"

  • when continuing a cloned session, the messages are missing
  • if there are no files - the "view files" button shows an error
  • "add new documents" button at bottom of text session (add more documents, dataprep new ones into jsonl qa-pairs, concatenate qa-pairs, retrain model)
  • retry button for errors
  • plugin sentry
  • share mode where original training data is not copied
  • auto-scroll broken
  • put the name of the session in topbar
  • rather than system as the username, put the name of the session
  • sessions are updating other sessions https://mlops-community.slack.com/archives/C0675EX9V2Q/p1702476943225859
  • add a restart button whilst doing a fine-tune so if things get stuck we can restart
    • possibly only show this if we've not seen any progress for > 30 seconds (fixed by the error throwing an error if runner reports job still active)
  • dashboard not showing finetune interactions
  • performance of auto-save before login (image fine tune text is slow)
  • for session updates check we are on the same page
    • whilst we are on one page and another session is processing - it's updating the page we are on with the wrong session
  • react is rendering streaming updates to the sessions slowly
  • progress bars on text fine tuning
  • fork session (fork from an interaction)
  • add data after the model is trained
  • pdfs are broken in production
  • for HTML conversion, use Puppeteer to render the page into a PDF, then convert the PDF into plain text
  • reliable and fast, scale to 5 concurrent users (Luke)
    • Dockerize the runner & deploy some on vast.ai / runpod.io
  • finish and deploy dashboard
  • logged out state when trying to do things - show a message "please register"
  • fix bug with "create image" dropdown etc not working
  • fix bug with openAI responding with "GPT 4 Answer: Without providing a valid context, I am unable to generate 50 question and answer pairs as requested"
    • make it so user can see whole message from OpenAI
  • replace the thinking face with a spinning progress (small horizontal bouncing three dots)
  • there is a dashboard bug where the runner model job history reverses itself
  • you lose keyboard focus when the chat box disables and re-enables
  • make the chatbox have keyboard focus the first time you load the page
  • pasting a long chunk of text into training text box makes the box go taller than the screen and you cannot scroll
  • create images says “chat with helix” should say “describe what you want to see in an image”
  • enforce min-width on left sidebar
  • the event cancel handler on drop downs is not letting you click the same mode
  • hide technical details behind "technical details" button ?
    • where it currently says "Session ...." - put the session title
    • put a link next to "View Files" called "Info" that will open a modal window with more session details
    • e.g. we put the text summary above in the modal along with the ID and other things we want to show
    • in the text box say "Chat with Helix" <- for txt models
    • in the text box say "Make images with Helix" <- for image models
  • edit session name (pencil icon to left of bin icon)
  • obvious buttons (on fine tuning)
    • in default starting state - make both buttons (add docs / text) - blue and outlined
    • in the default starting state - make the files button say "or choose files"
    • when you start typing in the box make the "Add Text" button pink and make the upload files not pink
    • once there are > 0 files - make the "choose more files" button outlined so the "upload docs" is the main button
  • performance on text fine tuning (add concurrency to openAI calls)
  • URL to fetch text for text fine tuning
  • homepage uncomment buttons
  • re-train, will add more interactions to add files to
  • we should keep previous Lora files at the interaction level
  • we hoist lora_dir from the latest interaction to the session

place in the queue indication

if it's more than 5 seconds

we already have the "this is taking a while" window - this is to show the place in the queue also
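The position lookup itself is simple, assuming the control plane exposes the ordered list of waiting session IDs (the message wording is illustrative):

```python
def queue_position(queue, session_id):
    # 0-based place in the queue, or None if the session is not waiting.
    try:
        return queue.index(session_id)
    except ValueError:
        return None

def queue_message(queue, session_id):
    # Text to show alongside the "this is taking a while" window.
    pos = queue_position(queue, session_id)
    return None if pos is None else f"You are number {pos + 1} in the queue"
```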

new activity dot

show a dot next to sessions that are currently active or have new replies

check URL type

make it clear that URLs need to point to text content - for example, a YouTube URL will not work

Use huggingface tokenizer chat template for inference

In the LLM model Go code (e.g. here) we build up a prompt that is a formatted string based on the chat template associated with the model.

We could instead store a generic JSON-ised version of the chat history in task.Prompt, like:

[{"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "It's Paris."}]

and then use the model's tokenizer to format the messages for us inside axolotl at inference time:

import json
from transformers import AutoTokenizer

messages = json.loads(json_messages)
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoded_messages = tokenizer.apply_chat_template(messages, tokenize=False)

This will reduce the effort needed to add subsequent models with potentially different chat templates.
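On the Go side this would mean storing the history as JSON; here is a Python sketch of the shape task.Prompt would then contain (the helper name is hypothetical):

```python
import json

def chat_history_to_prompt(history):
    # history is a list of (role, content) tuples; the JSON string produced
    # here is what would be stored in task.Prompt for the tokenizer to format.
    return json.dumps(
        [{"role": role, "content": content} for role, content in history]
    )
```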

scheduler not hitting spun up model

quite often there's a model ready to serve and a new one gets spun up on the other node - maybe the clocks are drifting between the machines, so the 2-second head start doesn't work? or the Python processes aren't polling every 100ms or something?

fine tuning hangs

why? do we need to automatically restart things if they haven't started in a timeout?

analyse all sessions in the database

for each one:

  • are there errors? if so, add issues to github. calculate which issues caused the most errors
  • is there a trained model with no interactions? if so, add to Chris's spreadsheet and ping him. also, #43
  • were they successful at doing anything?

overall: what % of sessions were successful and what were the biggest pain points? categorise the use cases

url error reporting

detect when we did not manage to extract any text, and tell the user that this is the error
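A sketch of the check, with an illustrative message (the function name and wording are assumptions):

```python
def extraction_error(url, extracted_text):
    # Return a user-facing error when no text came back, or None on success.
    if extracted_text and extracted_text.strip():
        return None
    return (
        f"We could not extract any text from {url} - "
        "please check that it points at text content"
    )
```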
