
h2oai / h2ogpt


Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

Home Page: http://h2o.ai

License: Apache License 2.0

Python 91.42% Dockerfile 0.03% Shell 0.60% Makefile 0.15% TeX 2.52% Groovy 0.26% Smarty 0.06% HTML 1.79% Jupyter Notebook 3.17%
chatgpt llm ai embeddings generative gpt gpt4all pdf private privategpt

h2ogpt's Introduction

h2oGPT

Turn ★ into ⭐ (top-right corner) if you like the project!

Query and summarize your documents or just chat with local private GPT LLMs using h2oGPT, an Apache V2 open-source project.

  • Private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, Youtube, Audio, Code, Text, MarkDown, etc.)
    • Persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.)
    • Efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach)
    • Parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model
    • HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses
  • Variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM; with AutoGPTQ, 4-bit/8-bit, LoRA, etc.)
    • GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models
    • Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.)
  • UI or CLI with streaming of all models
    • Upload and View documents through the UI (control multiple collaborative or personal collections)
    • Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision
    • Image Generation Stable Diffusion (sdxl-turbo, sdxl) and PlaygroundAI (playv2)
    • Voice STT using Whisper with streaming audio conversion
    • Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion
    • Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion
    • AI Assistant Voice Control Mode for hands-free control of h2oGPT chat
    • Bake-off UI mode against many models at the same time
    • Easy Download of model artifacts and control over models like LLaMa.cpp through the UI
    • Authentication in the UI by user/password via Native or Google OAuth
    • State Preservation in the UI by user/password
  • Linux, Docker, macOS, and Windows support
  • Inference Servers support (oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, OpenAI, Azure OpenAI, Anthropic)
  • OpenAI-compliant
    • Server Proxy API (h2oGPT acts as a drop-in replacement for an OpenAI server); see the Python sketch after this list
    • Python client API (to talk to Gradio server)
  • JSON Mode with any model via code block extraction. Also supports MistralAI JSON mode, Claude-3 via function calling with strict Schema, OpenAI via JSON mode, and vLLM via guided_json with strict Schema
  • Web-Search integration with Chat and Document Q/A
  • Agents for Search, Document Q/A, Python Code, CSV frames (Experimental, best with OpenAI currently)
  • Evaluate performance using reward models
  • Quality maintained with over 1000 unit and integration tests taking over 4 GPU-hours
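
For example, the OpenAI-compliant server proxy mentioned above can be driven with the standard openai Python client. This is only a minimal sketch: the base URL/port, API key, and model name below are assumptions that depend on how you launch h2oGPT, not fixed defaults.

# Minimal sketch of talking to h2oGPT's OpenAI-compatible proxy with the
# standard `openai` client. The base_url, api_key, and model name are
# assumptions -- adjust them to match your own h2oGPT launch options.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",  # assumed proxy address/port
    api_key="EMPTY",                      # assumed placeholder key
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed model name
    messages=[{"role": "user", "content": "Give three bullet points on why water is healthy."}],
    temperature=0.0,
)
print(response.choices[0].message.content)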

Get Started


To quickly try out h2oGPT with limited document Q/A capability, create a fresh Python 3.10 environment and run:

  • CPU or MAC (M1/M2):
    # for windows/mac use "set" or relevant environment setting mechanism
    export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"
  • Linux/Windows CPU/CUDA/ROC:
    # for windows/mac use "set" or relevant environment setting mechanism
    export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu121 https://huggingface.github.io/autogptq-index/whl/cu121"
    # for cu118 use export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu118 https://huggingface.github.io/autogptq-index/whl/cu118"

Then choose your llama_cpp_python build options by setting CMAKE_ARGS for your system, according to the llama_cpp_python backend documentation. E.g. CUDA on Linux:

export LLAMA_CUBLAS=1
export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
export FORCE_CMAKE=1

Note: for some reason the llama_cpp_python build fails unless all CUDA architectures are included, and building for all of them does take some time. Windows CUDA:

set CMAKE_ARGS=-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all
set LLAMA_CUBLAS=1
set FORCE_CMAKE=1

The same note about including all CUDA architectures applies to the Windows build. Metal M1/M2:

export CMAKE_ARGS="-DLLAMA_METAL=on"
export FORCE_CMAKE=1

Then run the following commands on any system:

git clone https://github.com/h2oai/h2ogpt.git
cd h2ogpt
pip install -r requirements.txt
pip install -r reqs_optional/requirements_optional_langchain.txt

pip uninstall llama_cpp_python llama_cpp_python_cuda -y
pip install -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt --no-cache-dir

pip install -r reqs_optional/requirements_optional_langchain.urls.txt
# GPL, only run next line if that is ok:
pip install -r reqs_optional/requirements_optional_langchain.gpllike.txt

# choose up to 32768 if have enough GPU memory:
python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF --prompt_type=mistral --max_seq_len=4096

Next, open http://127.0.0.1:7860 or http://localhost:7860 in your browser. Choose a 13B model for better quality than 7B.

We recommend quantized models for most small-GPU systems, e.g. LLaMa-2-7B-Chat-GGUF for 9GB+ GPU memory or larger models like LLaMa-2-13B-Chat-GGUF if you have 16GB+ GPU memory.

See Offline for how to run h2oGPT offline.


Note that on all platforms, some packages such as DocTR, Unstructured, BLIP, Stable Diffusion, etc. download models at runtime, which can appear to stall operations in the UI. The download progress appears in the console logs.

Windows 10/11 64-bit with full document Q/A capability

  • One-Click Installer

    • CPU or GPU: Download h2oGPT Windows Installer (1.3GB file)
      • Once installed, feel free to change the icon's start directory from %HOMEDRIVE%\%HOMEPATH% to (e.g.) %HOMEDRIVE%\%HOMEPATH%\h2ogpt_data so that all created files (like the database) go there. All saved paths are relative to this path.
    • CPU: Click the h2oGPT icon in the Start menu. Give it about 15 seconds to open in a browser if many optional packages are included. By default, the browser will launch with the actual local IP address, not localhost.
    • GPU: Before starting, run the following commands (replace pseud with your user):
      C:\Users\pseud\AppData\Local\Programs\h2oGPT\Python\python.exe -m pip uninstall -y torch
      C:\Users\pseud\AppData\Local\Programs\h2oGPT\Python\python.exe -m pip install https://h2o-release.s3.amazonaws.com/h2ogpt/torch-2.1.2%2Bcu118-cp310-cp310-win_amd64.whl
      
      Now click the h2oGPT icon in the Start menu. Give it about 20 seconds to open in a browser if many optional packages are included. By default, the browser will launch with the actual local IP address, not localhost.
      • Some other users may have python located here: C:\Program Files (x86)\h2oGPT\Python\python.exe.
    • To debug any issues, run the following (replace pseud with your user):
      C:\Users\pseud\AppData\Local\Programs\h2oGPT\Python\python.exe "C:\Users\pseud\AppData\Local\Programs\h2oGPT\h2oGPT.launch.pyw"
      
      Any start-up exceptions are appended to log, e.g. C:\Users\pseud\h2ogpt_exception.log.
  • To control startup, tweak the python startup file, e.g. for user pseud: C:\Users\pseud\AppData\Local\Programs\h2oGPT\pkgs\win_run_app.py

    • In this Python code, set ENVs anywhere before main_h2ogpt() is called
      • E.g. os.environ['name'] = 'value', e.g. os.environ['n_jobs'] = '10' (the value must always be a string).
    • Environment variables can be changed, e.g.:
      • n_jobs: number of cores for various tasks
      • OMP_NUM_THREADS: thread count for LLaMa
      • CUDA_VISIBLE_DEVICES: which GPUs are used. Recommended to set to a single fast GPU, e.g. CUDA_VISIBLE_DEVICES=0 if you have multiple GPUs. Note that the UI cannot control which GPUs (or CPU mode) are used for LLaMa models.
      • Any CLI argument from python generate.py --help can be set as an environment variable named h2ogpt_x, e.g. h2ogpt_h2ocolors set to False (see the Python sketch at the end of this section).
      • Set h2ogpt_server_name to the machine's actual LAN IP address so other devices can reach the app, e.g. h2ogpt_server_name set to 192.168.1.172, and allow access through the firewall if Windows Defender is active.
  • One can tweak installed h2oGPT code at, e.g. C:\Users\pseud\AppData\Local\Programs\h2oGPT.

  • To terminate the app, go to the System tab, click Admin, and click Shutdown h2oGPT.

    • If startup fails, run from a console and check for errors, and kill any old Python processes.
  • Full Windows 10/11 Manual Installation Script

    • Single .bat file for installation (if you do not skip any optional packages, it takes about 9GB of disk space).
    • A base Conda env is recommended, since it allows DocTR, which requires pygobject; pygobject otherwise has no Windows support (except MSYS2, which h2oGPT cannot use).
    • It also allows the TTS package by Coqui, which is not currently enabled in the one-click installer.
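
Following up on the environment-variable control described above, here is a minimal sketch of what one might add near the top of win_run_app.py before main_h2ogpt() is called. The particular values are illustrative assumptions, not recommended defaults.

# Hypothetical edit to win_run_app.py: set environment variables before
# main_h2ogpt() runs. Values must always be strings; these settings are
# illustrative only.
import os

os.environ['n_jobs'] = '10'                          # cores for various tasks
os.environ['OMP_NUM_THREADS'] = '8'                  # thread count for LLaMa
os.environ['CUDA_VISIBLE_DEVICES'] = '0'             # pin to a single fast GPU
os.environ['h2ogpt_h2ocolors'] = 'False'             # any CLI arg as h2ogpt_<arg>
os.environ['h2ogpt_server_name'] = '192.168.1.172'   # expose the app on the LAN

# ... the installer's existing code then calls main_h2ogpt() ...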

Linux (CPU/CUDA) with full document Q/A capability


macOS (CPU/M1/M2) with full document Q/A capability

  • One-click Installers (experimental and subject to change; we haven't tested every feature with these installers, so we encourage the community to try them and report any issues)

    Mar 07, 2024

    Nov 08, 2023

    Download the runnable file and open it from the Finder. It will take a few minutes to unpack and run the application. These one-click installers are experimental. Report any issues with steps to reproduce at https://github.com/h2oai/h2ogpt/issues.

    Note: The app bundle is unsigned. If you experience any issues with running the app, run the following commands:

    $ xattr -dr com.apple.quarantine {file-path}/h2ogpt-osx-m1-gpu
    $ chmod +x {file-path}/h2ogpt-osx-m1-gpu
  • macOS Manual Install and Run Docs


Example Models

GPU mode requires CUDA support via torch and transformers. A 7B/13B model in 16-bit uses 14GB/26GB of GPU memory to store the weights (2 bytes per weight). Compression such as 4-bit precision (bitsandbytes, AWQ, GPTQ, etc.) can further reduce memory requirements down to less than 6GB when asking a question about your documents. (For more information, see low-memory mode.)
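
These figures follow from simple arithmetic: weight memory is roughly the number of parameters times the bytes per weight, and the KV cache and activations add context-dependent overhead on top. A quick sketch:

# Back-of-the-envelope weight-memory estimate: parameters x bytes per weight.
# KV-cache and activation overhead is extra and depends on context length.
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

print(weight_memory_gb(7, 16))   # ~14 GB for a 7B model in 16-bit
print(weight_memory_gb(13, 16))  # ~26 GB for a 13B model in 16-bit
print(weight_memory_gb(7, 4))    # ~3.5 GB for a 7B model at 4-bit, before overhead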

CPU mode uses GPT4ALL and LLaMa.cpp, e.g. gpt4all-j, requiring about 14GB of system RAM in typical use.


Live Demos

Inference Benchmarks for Summarization & Generation

Resources

Partners

Video Demo

demo2.mp4

YouTube 4K version: https://www.youtube.com/watch?v=_iktbj4obAI

Docs Guide

Experimental features

These are not part of normal installation instructions and are experimental.

  • Agents -- in alpha testing. Works best with OpenAI, but even that fails sometimes.

Roadmap

  • Integration of code and resulting LLMs with downstream applications and low/no-code platforms
  • Complement h2oGPT chatbot with other APIs like ToolBench
  • Enhance the model's code completion, reasoning, and mathematical capabilities, ensure factual correctness, minimize hallucinations, and avoid repetitive output
  • Add better agents for SQL and CSV question/answer

Development

  • To create a development environment for training and generation, follow the installation instructions.
  • To fine-tune any LLM models on your data, follow the fine-tuning instructions.
  • To run h2oGPT tests:
    pip install requirements-parser pytest-instafail pytest-random-order playsound==1.3.0
    conda install -c conda-forge gst-python
    sudo apt-get install gstreamer-1.0
    pip install pygame
    pytest --instafail -s -v tests
    # for client tests
    make -C client setup
    make -C client build
    pytest --instafail -s -v client/tests
    # for openai server test on already-running local server
    pytest -s -v -n 4 openai_server/test_openai_server.py::test_openai_client
    or tweak/run tests/test4gpus.sh to run tests in parallel.

Help

Acknowledgements

Why H2O.ai?

Our Makers at H2O.ai have built several world-class Machine Learning, Deep Learning and AI platforms:

We also built platforms for deployment and monitoring, and for data wrangling and governance:

  • H2O MLOps to deploy and monitor models at scale
  • H2O Feature Store in collaboration with AT&T
  • Open-source Low-Code AI App Development Frameworks Wave and Nitro
  • Open-source Python datatable (the engine for H2O Driverless AI feature engineering)

Many of our customers are creating models and deploying them enterprise-wide and at scale in the H2O AI Cloud:

We are proud to have over 25 (of the world's 280) Kaggle Grandmasters call H2O home, including three Kaggle Grandmasters who have made it to world #1.

Disclaimer

Please read this disclaimer carefully before using the large language model provided in this repository. Your use of the model signifies your agreement to the following terms and conditions.

  • Biases and Offensiveness: The large language model is trained on a diverse range of internet text data, which may contain biased, racist, offensive, or otherwise inappropriate content. By using this model, you acknowledge and accept that the generated content may sometimes exhibit biases or produce content that is offensive or inappropriate. The developers of this repository do not endorse, support, or promote any such content or viewpoints.
  • Limitations: The large language model is an AI-based tool and not a human. It may produce incorrect, nonsensical, or irrelevant responses. It is the user's responsibility to critically evaluate the generated content and use it at their discretion.
  • Use at Your Own Risk: Users of this large language model must assume full responsibility for any consequences that may arise from their use of the tool. The developers and contributors of this repository shall not be held liable for any damages, losses, or harm resulting from the use or misuse of the provided model.
  • Ethical Considerations: Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activities.
  • Reporting Issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
  • Changes to this Disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to periodically review the disclaimer to stay informed about any changes.

By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.

Star History

Star History Chart

h2ogpt's People

Contributors

achraf-mer, aniketp04, antoninadert, anush008, arnocandel, blacksuan19, chathurindaranasinghe, efii, eltociear, eshamaaqib, fazpu, hemenkapadia, hsm207, jamesbraza, jefffohl, kohakublueleaf, lweren, mathanraj-sharma, mins0o, ozahavi, pseudotensor, robinliubin, ryanchesler, squidwardthetentacles, this, tloen, tomkraljevic, us8945, zainhaq-h2o, zba


h2ogpt's Issues

gradio matplotlib issue then Tcl_AsyncDelete: async handler deleted by the wrong thread

/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/gradio/utils.py:901: UserWarning: Starting a Matplotlib GUI outside of the main thread will likely fail.
  fig = plt.figure(figsize=(0.01, 0.01))
Exception ignored in: <function Image.__del__ at 0x7f17e015f2e0>
Traceback (most recent call last):
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/tkinter/__init__.py", line 4056, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f17e0107b50>
Traceback (most recent call last):
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f17e0107b50>
Traceback (most recent call last):
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f17e0107b50>
Traceback (most recent call last):
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f17e0107b50>
Traceback (most recent call last):
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/tkinter/__init__.py", line 388, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
Aborted (core dumped)

API for LLM

Design an API for applications and for composability with the h2o LLM (along the lines of LangChain compatibility).

  • PR for langchain

input_ids are not moved to GPU

I'm running this locally with downloaded h2oai_pipeline:

import torch
from h2oai_pipeline import H2OTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("h2oai/h2ogpt-oig-oasst1-256-20b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("h2oai/h2ogpt-oig-oasst1-256-20b", torch_dtype=torch.bfloat16, device_map="auto")

generate_text = H2OTextGenerationPipeline(model=model, tokenizer=tokenizer)

res = generate_text("Why is drinking water so healthy?", return_full_text=True, max_new_tokens=100)
print(res[0]["generated_text"])

And while the generation works, I get this Warning:

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. /opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:1359: UserWarning: You are calling .generate() with `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`. warnings.warn(

Question 1: How do I make your custom pipeline move the input_ids to GPU?

Question 2: How do I make your custom pipeline set the pad_token_id to suppress the info log?

Question 3: The response from your custom pipeline is just plain text, no history. How do I build a conversation?

Thanks!
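
One hedged way to address Questions 1 and 2 with the plain transformers API (rather than the custom pipeline, whose internals are not shown in this issue) is to tokenize manually, move the inputs to the model's device, and pass pad_token_id explicitly:

# Sketch using plain transformers: tokenize, move the inputs to the model's
# device, and set pad_token_id explicitly to silence the open-end-generation
# warning. This bypasses H2OTextGenerationPipeline, so prompt formatting may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("h2oai/h2ogpt-oig-oasst1-256-20b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    "h2oai/h2ogpt-oig-oasst1-256-20b", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Why is drinking water so healthy?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id,  # avoids the pad_token_id info log
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))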

Increase in GPU memory usage as generation continues, imbalanced across GPUs

>>> import torch
>>> from transformers import pipeline
>>> from transformers import pipeline
>>> generate_text = pipeline(model="h2oai/h2ogpt-oasst1-512-20b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
>>> res = generate_text("Why is drinking water so healthy?", max_new_tokens=3000)
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.

During this long generation, GPU memory usage first starts out balanced, then becomes increasingly imbalanced.

Thu Apr 20 16:37:04 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000                On | 00000000:3B:00.0 Off |                  Off |
|  0%   45C    P2              105W / 250W|  12220MiB / 49140MiB |     33%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000                On | 00000000:5E:00.0 Off |                  Off |
|  0%   45C    P2               72W / 250W|  11744MiB / 49140MiB |     17%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000                On | 00000000:86:00.0 Off |                  Off |
|  0%   45C    P2               98W / 250W|  11744MiB / 49140MiB |     19%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000                On | 00000000:AF:00.0 Off |                  Off |
|  0%   45C    P2              103W / 250W|  11125MiB / 49140MiB |     23%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000                On | 00000000:3B:00.0 Off |                  Off |
|  0%   50C    P2               95W / 250W|  40566MiB / 49140MiB |     73%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000                On | 00000000:5E:00.0 Off |                  Off |
|  0%   48C    P2               76W / 250W|  15926MiB / 49140MiB |     36%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000                On | 00000000:86:00.0 Off |                  Off |
|  0%   48C    P2               87W / 250W|  15926MiB / 49140MiB |     10%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000                On | 00000000:AF:00.0 Off |                  Off |
|  0%   49C    P2              130W / 250W|  14682MiB / 49140MiB |     21%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

but it can then go back down by a lot, still during generation:

Thu Apr 20 16:47:17 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000                On | 00000000:3B:00.0 Off |                  Off |
|  0%   50C    P2               95W / 250W|  18334MiB / 49140MiB |     75%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000                On | 00000000:5E:00.0 Off |                  Off |
|  0%   49C    P2               74W / 250W|  17642MiB / 49140MiB |      8%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000                On | 00000000:86:00.0 Off |                  Off |
|  0%   50C    P2              117W / 250W|  17642MiB / 49140MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000                On | 00000000:AF:00.0 Off |                  Off |
|  0%   49C    P2              115W / 250W|  16139MiB / 49140MiB |     16%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Also eventually fails:

Traceback (most recent call last):
  <module>:1
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/pipelines/text_generation.py:209 in __call__
      return super().__call__(text_inputs, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/pipelines/base.py:1109 in __call__
      return self.run_single(inputs, preprocess_params, forward_params, postproces...
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/pipelines/base.py:1116 in run_single
      model_outputs = self.forward(model_inputs, **forward_params)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/pipelines/base.py:1015 in forward
      model_outputs = self._forward(model_inputs, **forward_params)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/pipelines/text_generation.py:251 in _forward
      generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=att...
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in decorate_context
      return func(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/utils.py:1437 in generate
      return self.greedy_search(
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/utils.py:2248 in greedy_search
      outputs = self(
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl
      return forward_call(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/accelerate/hooks.py:165 in new_forward
      output = old_forward(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:662 in forward
      outputs = self.gpt_neox(
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl
      return forward_call(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:553 in forward
      outputs = layer(
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl
      return forward_call(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/accelerate/hooks.py:165 in new_forward
      output = old_forward(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:320 in forward
      attention_layer_outputs = self.attention(
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl
      return forward_call(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/accelerate/hooks.py:165 in new_forward
      output = old_forward(*args, **kwargs)
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:152 in forward
      attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_m...
  /home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:219 in _attn
      attn_scores = torch.where(causal_mask, attn_scores, mask_value)
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3
>>> 

Ensemble multi-task LORAs

The plan is to develop multiple LoRAs. The point is that the base model can be inferenced once, then each new task can be:
1) base + first
2) -first + second
3) -second + third
etc.

So the base is only run forward once. This is a normal part of the LoRA paper.

A mixture-of-experts idea can then be used, where yet another LoRA is built, but this time it sits in front of all the other LoRA outputs as an ensemble model, to be able to handle the diverse tasks. In principle a lot less data is required for the ensemble LoRA, since it only has to choose which task LoRAs to blend.
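
As a rough sketch of the adapter-swapping part of this idea, the peft library can attach several task LoRAs to a single loaded base model and switch between them without reloading the base; the adapter paths below are hypothetical placeholders, and the mixture-of-experts ensemble LoRA itself is not shown.

# Sketch with peft: load the base once, attach task LoRAs, and swap between
# them so the base weights stay resident. Adapter paths are hypothetical.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/GPT-NeoXT-Chat-Base-20B", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "loras/task-first", adapter_name="first")
model.load_adapter("loras/task-second", adapter_name="second")

model.set_adapter("first")   # base + first
# ... run inference for task 1 ...
model.set_adapter("second")  # swap adapters: -first + second
# ... run inference for task 2 ...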

Benchmarks on 2xA6000 Ada vs 2xA100 80GB (roughly same speed)

2x A6000 Ada:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node=2 --nnodes=1 finetune.py --data_path=ShareGPT_unfiltered_cleaned_split.json.generate_human_bot.train_plain.json --num_epochs=1 --base_model=togethercomputer/GPT-NeoXT-Chat-Base-20B --prompt_type=plain --data_mix_in_path=None --micro_batch_size=4 --batch_size=16 --cutoff_len=1024 --run_id=4
54%|█████▌    | 2888/5311 [21:08:17<17:33:41, 26.09s/it]

chatbot: starlette.websockets.WebSocketDisconnect: 1001

Task exception was never retrieved
future: <Task finished name='xsce894h9ta_5' coro=<Queue.process_events() done, defined at /home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/queueing.py:343> exception=WebSocketDisconnect(1001)>
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/queueing.py", line 347, in process_events
    client_awake = await self.gather_event_data(event)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/queueing.py", line 220, in gather_event_data
    data, client_awake = await self.get_message(event, timeout=receive_timeout)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/queueing.py", line 453, in get_message
    data = await asyncio.wait_for(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/starlette/websockets.py", line 133, in receive_json
    self._raise_on_disconnect(message)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/starlette/websockets.py", line 105, in _raise_on_disconnect
    raise WebSocketDisconnect(message["code"])
starlette.websockets.WebSocketDisconnect: 1001

Recover when GPU OOMs

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 22.20 GiB total capacity; 20.67 GiB already allocated; 4.12 MiB free; 21.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This brings the app down, and it can no longer generate. Protect against GPU OOM, or at least recover without hanging.
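
One hedged way to at least recover rather than hang is to catch the CUDA OOM around the generate call, free the allocator cache, and surface an error to the UI. In this minimal sketch, run_generation and report_error are hypothetical placeholders, not h2oGPT functions.

# Sketch: catch CUDA OOM around generation, release cached memory, and report
# an error instead of bringing the whole app down. Placeholders only.
import torch

def generate_with_oom_guard(run_generation, report_error, **gen_kwargs):
    try:
        return run_generation(**gen_kwargs)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # release cached blocks so later requests can proceed
        report_error("GPU out of memory; try a shorter prompt or fewer max_new_tokens.")
        return None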

raise ValueError("\n" + ParseException.explain(err, 0)) from None

A non-fatal matplotlib math-text processing issue seen in the HF demo:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1305, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1239, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/components.py", line 4626, in postprocess
    self._postprocess_chat_messages(message_pair[1]),
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/components.py", line 4599, in _postprocess_chat_messages
    return self.md.renderInline(chat_message)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/markdown_it/main.py", line 299, in renderInline
    return self.renderer.render(self.parseInline(src, env), self.options, env)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/markdown_it/renderer.py", line 87, in render
    result += self.renderInline(token.children, options, env)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/markdown_it/renderer.py", line 108, in renderInline
    result += self.rules[token.type](tokens, i, options, env)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/mdit_py_plugins/dollarmath/index.py", line 70, in render_math_inline
    content = _renderer(str(tokens[idx].content).strip(), {"display_mode": False})
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/utils.py", line 904, in tex2svg
    fig.savefig(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/figure.py", line 3343, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2342, in print_figure
    self.figure.draw(renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/artist.py", line 95, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/figure.py", line 3140, in draw
    mimage._draw_list_compositing_images(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 752, in draw
    bbox, info, descent = self._get_layout(renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 386, in _get_layout
    w, h, d = _get_text_metrics_with_cache(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 97, in _get_text_metrics_with_cache
    return _get_text_metrics_with_cache_impl(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 105, in _get_text_metrics_with_cache_impl
    return renderer_ref().get_text_width_height_descent(text, fontprop, ismath)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/backends/backend_svg.py", line 1317, in get_text_width_height_descent
    return self._text2path.get_text_width_height_descent(s, prop, ismath)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/textpath.py", line 60, in get_text_width_height_descent
    self.mathtext_parser.parse(s, 72, prop)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/mathtext.py", line 226, in parse
    return self._parse_cached(s, dpi, prop)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/mathtext.py", line 247, in _parse_cached
    box = self._parser.parse(s, fontset, fontsize, dpi)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/_mathtext.py", line 1995, in parse
    raise ValueError("\n" + ParseException.explain(err, 0)) from None
ValueError: 
$"{p1.Name} is {p1.Age} years old.");<br>    Console.WriteLine($
^
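One possible mitigation (a sketch, not what the demo does): escape `$` in chat messages before they reach the Gradio markdown renderer, so markdown_it's dollarmath plugin never hands C#-style `$"..."` strings to matplotlib's mathtext parser.

```python
# Hypothetical helper: neutralize inline-math delimiters before rendering.
# The exact replacement depends on what the markdown renderer expects.
def escape_dollars(chat_message: str) -> str:
    return chat_message.replace("$", "\\$")
```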

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
  File "app.py", line 914, in bot
    for output in fun1(*tuple(args_list)):
  File "app.py", line 1346, in evaluate
    for output in CallbackToGenerator(generate, callback=None, **gen_kwargs):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/_collections_abc.py", line 317, in __next__
    return self.send(None)
  File "/home/user/app/stopping.py", line 119, in send
    return self._put('send', value)
  File "/home/user/app/stopping.py", line 111, in _put
    raise val
  File "/home/user/app/stopping.py", line 95, in thread_func
    ret = func(callback=val_callback, **self.kwargs)
  File "app.py", line 1324, in generate
    model.generate(**kwargs)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/transformers/generation/utils.py", line 2560, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
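A hedged workaround (sketch only): catch the sampling failure and retry with greedy decoding, since the `inf`/`nan` probabilities come from the `torch.multinomial` sampling path.

```python
# Hypothetical wrapper, assumes a transformers model and tokenized `inputs`.
def generate_robust(model, inputs, **gen_kwargs):
    try:
        return model.generate(**inputs, **gen_kwargs)
    except RuntimeError as e:
        if "probability tensor" not in str(e):
            raise
        # Greedy decoding avoids torch.multinomial entirely.
        gen_kwargs.update(do_sample=False, num_beams=1)
        return model.generate(**inputs, **gen_kwargs)
```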

"Unable to locate package nvidia-container-toolkit" on Debian (Ubuntu) x86_64

Hi Team,

Nice work, and I appreciate your efforts on this project 🫡

I am trying to run the Docker container, and I hit the following issue when executing `sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base`:

Hit:1 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 https://download.docker.com/linux/ubuntu jammy InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Fetched 110 kB in 1s (195 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package nvidia-container-toolkit-base

The solution I found was to run:

wget https://nvidia.github.io/nvidia-docker/gpgkey --no-check-certificate
sudo apt-key add gpgkey
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

sudo apt-get install -y nvidia-container-toolkit

This fixed the package problem, but I still get the following error for the command `docker run --runtime=nvidia --shm-size=64g -p 7860:7860 -v ${HOME}/.cache:/root/.cache --rm h2o-llm -it generate.py --base_model=EleutherAI/gpt-neox-20b --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot`:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

Could someone help me with this? I am trying to run the Docker container. I also tried `docker compose up`, but got the same result.

Have to push Stop twice: once to stop the streaming output, and again to stop the actual GPU generation. Fix this.

Tried adding click_event twice to cancels, but it didn't help.

Also, while the message stops instantly, generation might continue for 2-3 more seconds, since it is in the middle of hard generation.

Also, it's a bit uncontrolled: it hits this ValueError when generation finally stops:

Traceback (most recent call last):
  File "/data/jon/h2o-llm/callbacks.py", line 48, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/data/jon/h2o-llm/generate.py", line 597, in generate_with_callback
    model.generate(**kwargs)
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/peft/peft_model.py", line 581, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/utils.py", line 1406, in generate
    return self.greedy_search(
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/utils.py", line 2256, in greedy_search
    if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 113, in __call__
    return any(criteria(input_ids, scores) for criteria in self)
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 113, in <genexpr>
    return any(criteria(input_ids, scores) for criteria in self)
  File "/data/jon/h2o-llm/callbacks.py", line 22, in __call__
    self.callback_func(input_ids[0])
  File "/data/jon/h2o-llm/callbacks.py", line 43, in _callback
    raise ValueError
ValueError
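One cleaner pattern (a sketch, not the current callbacks.py): have the Stop button set a `threading.Event` that a `StoppingCriteria` checks, so generation ends on the next decoding step without raising `ValueError` from inside the streaming callback.

```python
# Sketch: cooperative stop via StoppingCriteria; names are illustrative.
import threading
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnEvent(StoppingCriteria):
    def __init__(self, stop_event: threading.Event):
        self.stop_event = stop_event

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Returning True ends generation after the current token; no exception needed.
        return self.stop_event.is_set()

stop_event = threading.Event()
stopping_criteria = StoppingCriteriaList([StopOnEvent(stop_event)])
# The Stop button handler calls stop_event.set(); pass
# stopping_criteria=stopping_criteria into model.generate(...).
```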


Cannot train 'EleutherAI/gpt-neox-20b' on 2x 24GB cards

Need to step up to larger models with a permissive license. The 30B LLaMa works, but can't be used (license). 6B is too small and gives bad results, so the next best choice is gpt-neox-20b.

This works:
CUDA_VISIBLE_DEVICES=0,1 WORLD_SIZE=2 python finetune.py --data_path=alpaca_data_cleaned.json --base_model="decapoda-research/llama-30b-hf" --llama_type=True --ddp=False

This fails:
CUDA_VISIBLE_DEVICES=0,1 WORLD_SIZE=2 torchrun finetune.py --data_path=alpaca_data_cleaned.json --base_model="decapoda-research/llama-30b-hf" --llama_type=True --ddp=False

This fails (with torchrun and with python):
CUDA_VISIBLE_DEVICES=0,1 WORLD_SIZE=2 torchrun finetune.py --data_path=alpaca_data_cleaned.json --llama_type=False --ddp=False --lora_target_modules="['query_key_value']" --base_model="EleutherAI/gpt-neox-20b"
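For scale reference, a minimal sketch (assumptions: bitsandbytes and peft installed; this is not the exact finetune.py code path) of fitting gpt-neox-20b across two 24 GB cards via 8-bit weights plus a LoRA adapter on `query_key_value`, matching the --lora_target_modules flag above:

```python
# Sketch: 8-bit gpt-neox-20b sharded over both GPUs with a LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
# (older peft versions call this prepare_model_for_int8_training)

base_model = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,    # requires bitsandbytes
    device_map="auto",    # shard layers across GPU 0 and GPU 1
    torch_dtype=torch.float16,
)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```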

Add option to replace attention with flash attention

Flash attention has already been integrated into gpt-neox models here: https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/models/gpt.py#L215

We can add the swapped model definition as an option to the training and generation scripts and benchmark the speed difference.

Converting LLaMa and others might be more work. LLaMa uses a pretty standard-looking attention, but it's not clear how it differs from the PyTorch default; it might just need some layer names remapped: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L160
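As a frame of reference for the benchmark (not the flash-attn repo's integration), PyTorch 2.x exposes fused/flash kernels through `torch.nn.functional.scaled_dot_product_attention`, so a low-effort comparison point is to route the module's attention math through that call:

```python
# Sketch: fused attention that can dispatch to a flash kernel on supported GPUs.
# Expected shapes are [batch, heads, seq_len, head_dim]; requires PyTorch >= 2.0.
import torch
import torch.nn.functional as F

def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```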

Something went wrong $"{p1.Name} is {p1.Age} years old.");<br> Console.WriteLine($ ^ ParseException: Expected end of text, found '$' (at char 0), (line:1, col:1)

Gradio error for certain inputs:

Downloading pytorch_model.bin: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 1.74G/1.74G [00:25<00:00, 67.6MB/s]
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 1}
  warnings.warn(
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Started GUI
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
WARNING: Special characters in prompt
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1305, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1239, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/components.py", line 4626, in postprocess
    self._postprocess_chat_messages(message_pair[1]),
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/components.py", line 4599, in _postprocess_chat_messages
    return self.md.renderInline(chat_message)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/markdown_it/main.py", line 299, in renderInline
    return self.renderer.render(self.parseInline(src, env), self.options, env)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/markdown_it/renderer.py", line 87, in render
    result += self.renderInline(token.children, options, env)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/markdown_it/renderer.py", line 108, in renderInline
    result += self.rules[token.type](tokens, i, options, env)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/mdit_py_plugins/dollarmath/index.py", line 70, in render_math_inline
    content = _renderer(str(tokens[idx].content).strip(), {"display_mode": False})
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/utils.py", line 904, in tex2svg
    fig.savefig(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/figure.py", line 3343, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2342, in print_figure
    self.figure.draw(renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/artist.py", line 95, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/figure.py", line 3140, in draw
    mimage._draw_list_compositing_images(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 752, in draw
    bbox, info, descent = self._get_layout(renderer)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 386, in _get_layout
    w, h, d = _get_text_metrics_with_cache(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 97, in _get_text_metrics_with_cache
    return _get_text_metrics_with_cache_impl(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/text.py", line 105, in _get_text_metrics_with_cache_impl
    return renderer_ref().get_text_width_height_descent(text, fontprop, ismath)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/backends/backend_svg.py", line 1317, in get_text_width_height_descent
    return self._text2path.get_text_width_height_descent(s, prop, ismath)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/textpath.py", line 60, in get_text_width_height_descent
    self.mathtext_parser.parse(s, 72, prop)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/mathtext.py", line 226, in parse
    return self._parse_cached(s, dpi, prop)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/mathtext.py", line 247, in _parse_cached
    box = self._parser.parse(s, fontset, fontsize, dpi)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/matplotlib/_mathtext.py", line 1995, in parse
    raise ValueError("\n" + ParseException.explain(err, 0)) from None
ValueError: 
$"{p1.Name} is {p1.Age} years old.");<br>    Console.WriteLine($
^
ParseException: Expected end of text, found '$'  (at char 0), (line:1, col:1)
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.

Not able to run inference with Docker

When I ran `sudo docker-compose up -d --build` and used `docker-compose logs -f` to check, I got the following errors.
My system has 32 GB of RAM and a Titan X GPU with 12 GB of VRAM:

h2ogpt-h2o-llm-1 | python generate.py --base_model='togethercomputer/GPT-NeoXT-Chat-Base-20B' --prompt_type='human_bot' --lora_weights='GPT-NeoXT-Chat-Base-20B.merged.json.8_epochs.57b2892c53df5b8cefac45f84d019cace803ef26.28'
h2ogpt-h2o-llm-1 |
h2ogpt-h2o-llm-1 |
h2ogpt-h2o-llm-1 | Using Model eleutherai/gpt-j-6b
h2ogpt-h2o-llm-1 | Get EleutherAI/gpt-j-6B model
h2ogpt-h2o-llm-1 | Traceback (most recent call last):
h2ogpt-h2o-llm-1 | File "/workspace/generate.py", line 1515, in
h2ogpt-h2o-llm-1 | fire.Fire(main)
h2ogpt-h2o-llm-1 | File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
h2ogpt-h2o-llm-1 | component_trace = _Fire(component, args, parsed_flag_args, context, name)
h2ogpt-h2o-llm-1 | File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
h2ogpt-h2o-llm-1 | component, remaining_args = _CallAndUpdateTrace(
h2ogpt-h2o-llm-1 | File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
h2ogpt-h2o-llm-1 | component = fn(*varargs, **kwargs)
h2ogpt-h2o-llm-1 | File "/workspace/generate.py", line 249, in main
h2ogpt-h2o-llm-1 | go_gradio(**locals())
h2ogpt-h2o-llm-1 | File "/workspace/generate.py", line 490, in go_gradio
h2ogpt-h2o-llm-1 | model0, tokenizer0, device = get_model(**all_kwargs)
h2ogpt-h2o-llm-1 | File "/workspace/generate.py", line 358, in get_model
h2ogpt-h2o-llm-1 | device = get_device()
h2ogpt-h2o-llm-1 | File "/workspace/generate.py", line 256, in get_device
h2ogpt-h2o-llm-1 | raise RuntimeError("only cuda supported")
h2ogpt-h2o-llm-1 | RuntimeError: only cuda supported
h2ogpt-h2o-llm-1 | /usr/local/lib/python3.10/dist-packages/torch/cuda/init.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
h2ogpt-h2o-llm-1 | return torch._C._cuda_getDeviceCount() > 0
h2ogpt-h2o-llm-1 | /usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
h2ogpt-h2o-llm-1 | warn("The installed version of bitsandbytes was compiled without GPU support. "

Train with all clean OSS data + model

Step 1: Get best open-source model:

model: togethercomputer/GPT-NeoXT-Chat-Base-20B https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B

Step 2: Get good open-source instruct data:

Inspired by
https://bair.berkeley.edu/blog/2023/04/03/koala/

Note: GPT-NeoXT-Chat-Base-20B was already trained on OIG data, so there is "nothing new" here; this is just fine-tuning on high-quality data. We need to include new, good datasets too.

Run these pytests to create data:
https://github.com/h2oai/h2o-llm/blob/8a1636e35bba5be28d41ab27719d0f70d7eccd91/scrape_dai_docs.py#L364-L398

Direct link to data (136 MB): https://slack-files.com/T0329MHH6-F051UHFFUTD-d93fe5bb76

Adversarial attack on reward models

Question: What do reward models really optimize for? How much assumed context do they have?

E.g., an adversarial attack might include:

  • arbitrary newlines (\n) after some average number of words
  • long, semi-random sequences of words arranged into paragraphs,
    i.e., just formatting.

It might still give a high score. If it detects coherence, etc., that would be impressive, since then it has to be as good as an LLM itself.

So reward models might assume a lot about the nature of the input data, e.g. that it is already human-readable, correct, etc.

How can RLHF prune wrong/hallucinated responses?

Also, humans may be picking up on trivial changes, like formatting, which are easy to train for. E.g.:

  • thesis at front
  • average words per sentence
  • average sentences per paragraph
  • new lines between paragraphs
  • summary at end.

At least the length part is easily chosen from available open data. Summaries can be generated from samsum-type models, and the thesis may not be as important for now.
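A hedged sketch of such a probe (the reward-model checkpoint below is just an assumed public example, not necessarily the one used here): score the same answer with and without injected formatting noise and compare.

```python
# Sketch: does a reward model's score survive pure formatting perturbations?
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def score(question: str, answer: str) -> float:
    inputs = tokenizer(question, answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

def add_arbitrary_newlines(text: str, every: int = 7) -> str:
    # Pure formatting change: insert "\n" roughly every `every` words.
    words = text.split()
    return " ".join(w + ("\n" if i % every == every - 1 else "") for i, w in enumerate(words))

question = "Explain what a reward model does."
answer = "A reward model scores candidate responses so RLHF can prefer better ones."
print(score(question, answer), score(question, add_arbitrary_newlines(answer)))
```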

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/gradio/blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/gradio/blocks.py", line 868, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/data/jon/h2o-llm/generate.py", line 132, in evaluate
    outputs = model.generate(
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/peft/peft_model.py", line 581, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/utils.py", line 1528, in generate
    return self.beam_sample(
  File "/home/jon/miniconda3/envs/alpaca/lib/python3.10/site-packages/transformers/generation/utils.py", line 3126, in beam_sample
    next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0


"recipe refers to # Recipe type## Recipes override any GUI settings- **'auto'**: all models and features automatically determined by experiment settings, toml settings, and feature_engineering_effort- **'compliant'** : like 'auto' except:    - *interpretability=10* (to avoid complexity, overrides GUI or python client chose for interpretability)    - *enable_glm='on'* (rest 'off', to avoid complexity and be compatible with algorithms supported by MLI)    - *fixed_ensemble_level=0*: Don't use any ensemble    - *feature_brain_level=0*(: No feature brain used (to ensure every restart is identical)    - *max_feature_interaction_depth=1*: interaction depth is set to 1 (no multi-feature interactions to avoid complexity)    - *target_transformer='identity'*: for regression (to avoid complexity)    - *check_distribution_shift_drop='off'*: Don't use distribution shift between train, valid, and test to drop features (bit risky without fine-tuning)- **'monotonic_gbm'** : like 'auto' except:    - *monotonicity_constraints_interpretability_switch=1*: enable monotonicity constraints    - *self.config.monotonicity_constraints_correlation_threshold = 0.01*: see below    - *monotonicity_constraints_drop_low_correlation_features=true*: drop features that aren't correlated with target by at least 0.01 (specified by parameter above)    - *fixed_ensemble_level=0*: Don't use any ensemble (to avoid complexity)    - *included_models=['LightGBMModel']*    - *included_transformers=['OriginalTransformer']*: only original (numeric) features will be used    - *feature_brain_level=0*: No feature brain used (to ensure every restart is identical)    - *monotonicity_constraints_log_level='high'*    - *autodoc_pd_max_runtime=-1*: no timeout for PDP creation in AutoDoc- **'kaggle'** : like 'auto' except:    - external validation set is concatenated with train set, with target marked as missing    - test set is concatenated with train set, with target marked as missing    - transformers that do not use the target are allowed to fit_transform across entire train + validation + test    - several config toml expert options open-up limits (e.g. more numerics are treated as categoricals)    - Note: If plentiful memory, can:        - choose kaggle mode and then change fixed_feature_interaction_depth to large negative number,    otherwise default number of features given to transformer is limited to 50 by default        - choose mutation_mode = \"full\", so even more types are transformations are done at once per transformer- **'nlp_model'**: Only enables NLP models that process pure text- **'nlp_transformer'**: Only enables NLP transformers that process pure text, while any model type is allowed- **'image_model'**: Only enables Image models that process pure images- **'image_transformer'**: Only enables Image transformers that process pure images, while any model type is allowed- **'unsupervised'**: Only enables unsupervised transformers, models and scorers- **'gpus_max'**: Maximize use of GPUs (e.g. use XGBoost, rapids, Optuna hyperparameter search, etc.)- **'more_overfit_protection'**: Potentially improve overfit, esp. for small data, by disabling target encoding and making GA behave like final model for tree counts and learning rate- **'feature_store_mojo'**: Creates a MOJO to be used as transformer in the H2O Feature Store, to augment data on a row-by-row level based on Driverless AI's feature engineering. 
Only includes transformers that don't depend on the target, since features like target encoding need to be created at model fitting time to avoid data leakage. And features like lags need to be created from the raw data, they can't be computed with a row-by-row MOJO transformer.Each pipeline building recipe mode can be chosen, and then fine-tuned using each expert settings.  Changing thepipeline building recipe will reset all pipeline building recipe options back to default and then re-apply thespecific rules for the new mode, which will undo any fine-tuning of expert options that are part of pipeline buildingrecipe rules.If choose to do new/continued/refitted/retrained experiment from parent experiment, the recipe rules are not re-appliedand any fine-tuning is preserved.  To reset recipe behavior, one can switch between 'auto' and the desired mode.  Thisway the new child experiment will use the default settings for the chosen recipe." Summarize the above into a single paragraph.
