
OpenChatKit

OpenChatKit provides a powerful, open-source base for creating both specialized and general-purpose models for various applications. The kit includes instruction-tuned language models, a moderation model, and an extensible retrieval system for incorporating up-to-date responses from custom repositories. OpenChatKit models were trained on the OIG-43M training dataset, a collaboration between Together, LAION, and Ontocord.ai.

In this repo, you'll find code for:

  • Training GPT-NeoXT-Chat-Base-20B, a 20B parameter chat model (see docs/GPT-NeoXT-Chat-Base-20B.md)
  • Fine-tuning Llama-2-7B-32K-beta, a 7B parameter long context model
  • Training Pythia-Chat-Base-7B, a 7B parameter chat model
  • Testing inference using either of the chat models
  • Augmenting the model with additional context from a retrieval index


Getting Started

In this tutorial, you will download Pythia-Chat-Base-7B, an instruction-tuned language model, and run some inference requests against it using a command-line tool.

Pythia-Chat-Base-7B is a 7B-parameter fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. Pre-trained weights for this model are available on Hugging Face as togethercomputer/Pythia-Chat-Base-7B under an Apache 2.0 license.

More details can be found on the model card for Pythia-Chat-Base-7B on Hugging Face.

Requirements

Before you begin, you need to install PyTorch and other dependencies.

  1. Install Miniconda from their website.

  2. Install Git LFS from their website.

  3. Install the Git LFS hooks.

     git lfs install

  4. Install mamba in the base environment so it's available in all environments.

     conda install mamba -n base -c conda-forge

  5. Create an environment called OpenChatKit using the environment.yml file at the root of this repo.

     Note Use mamba to create the environment. It's much faster than conda.

     mamba env create -f environment.yml

  6. Activate the new conda environment.

     conda activate OpenChatKit

Chatting with Pythia-Chat-Base-7B

To help you try the model, inference/bot.py is a simple command-line test harness that provides a shell interface enabling you to chat with the model. Simply enter text at the prompt and the model replies. The test harness also maintains conversation history to provide the model with context.

Start the bot by calling bot.py from the root of the repo.

python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B

Loading the model can take some time, but once it's loaded, you are greeted with a prompt. Say hello.

$ python inference/bot.py 
Loading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:1...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> Hello.
Hello human.

>>> 

Enter additional queries at the prompt, and the model replies. Under the covers, the shell forms a prompt containing all previous queries and passes that to the model to generate more text.
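To make that concrete, here is a minimal sketch, not the repo's actual code, of how such a shell can accumulate history and build each prompt. The <human>/<bot> tags follow the convention OpenChatKit's chat models are tuned on; treat the exact formatting as an assumption.

history = []

def build_prompt(user_text):
    # Append the new query, render every turn so far, and ask the
    # model to continue as the bot.
    history.append(f"<human>: {user_text}")
    return "\n".join(history) + "\n<bot>:"

def record_reply(bot_text):
    # Store the reply so it becomes context for the next turn.
    history.append(f"<bot>: {bot_text}")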

The shell also supports additional commands to inspect hyperparameters, the full prompt, and more. Commands are prefixed with a /.

Note The /quit command exits the shell.

Please see the inference README for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.

Fine-tuning Llama-2-7B-32K-beta

The Llama-2-7B-32K-beta model can be fine-tuned on various datasets. In this tutorial, we will use the multi-document natural questions dataset and the BookSum dataset.

Downloading and converting the base model

To download the Llama-2-7B-32K-beta model and prepare it for fine-tuning, run this command from the root of the repository.

python pretrained/Llama-2-7B-32K-beta/prepare.py

The weights for this model will be in the pretrained/Llama-2-7B-32K-beta/togethercomputer_Llama-2-7B-32K-beta directory.

Fine-tuning the model

The training/finetune_llama-2-7b-32k-mqa.sh and training/finetune_llama-2-7b-32k-booksum.sh scripts configure and run the training loop.

  1. To fine-tune on the multi-document natural questions dataset, run:

    bash training/finetune_llama-2-7b-32k-mqa.sh
  2. To fine-tune on the BookSum dataset, run:

    bash training/finetune_llama-2-7b-32k-booksum.sh

As the training loop runs, checkpoints are saved to the model_ckpts directory at the root of the repo.

Please see the training README for more details about customizing the training run.

Converting trained weights to Hugging Face format

Before you can use this model to perform inference, it must be converted to the Hugging Face format. To do so, run a command like the following from the root of the repo:

mkdir huggingface_models \
  && python tools/convert_to_hf_llama.py \
       --config-name togethercomputer/Llama-2-7B-32K-beta \
       --ckpt-path model_ckpts/llama-2-7b-32k-mqa/checkpoint_10 \
       --save-path huggingface_models/llama-2-7b-32k-mqa \
       --n-stages 4 \
       --n-layer-per-stage 8 \
       --fp16

where the --fp16 flag loads and stores the model weights in fp16.

Make sure to replace model_ckpts/llama-2-7b-32k-mqa/checkpoint_10 with the latest checkpoint in the model_ckpts/llama-2-7b-32k-mqa or model_ckpts/llama-2-7b-32k-booksum directory.
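Once converted, the directory should load like any Hugging Face causal language model. A minimal sketch, assuming the save path from the command above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "huggingface_models/llama-2-7b-32k-mqa"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16)

inputs = tokenizer("Summarize the following document:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))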

Reproducing Pythia-Chat-Base-7B

This tutorial walks through reproducing the Pythia-Chat-Base-7B model by fine-tuning Eleuther AI's Pythia-6.9B-deduped model using the OIG dataset.

Downloading training data and the base model

The chat model was trained on the OIG dataset built by LAION, Together, and Ontocord.ai. To download the dataset from Hugging Face run the command below from the root of the repo.

python data/OIG/prepare.py

Note You can help make this chat model better by contributing data! See the OpenDataHub repo for more details.

Once the command completes, the data will be in the data/OIG/files directory.
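To sanity-check the download, a small sketch like the following can peek at the first record. It assumes the prepared files are JSONL with a "text" field; check the actual layout before relying on it.

import glob
import json

for path in sorted(glob.glob("data/OIG/files/*.jsonl")):
    with open(path) as infile:
        first = json.loads(infile.readline())
    print(path, "->", str(first.get("text", first))[:80])
    break  # one file is enough for a sanity check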

Pythia-Chat-Base-7B is a fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. To download the model and prepare it for fine-tuning, run this command from the root of the repo.

python pretrained/Pythia-6.9B-deduped/prepare.py

The weights for this model will be in the pretrained/Pythia-6.9B-deduped/EleutherAI_pythia-6.9b-deduped directory.

(Optional) 8bit Adam

To use 8bit-adam during training, install the bitsandbytes package.

pip install bitsandbytes # optional, to use 8bit-adam
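The training script wires the optimizer up itself; the sketch below only shows the shape of the bitsandbytes API, with a stand-in model. Note that bitsandbytes requires a CUDA GPU.

import torch
import bitsandbytes as bnb

model = torch.nn.Linear(16, 16).cuda()  # stand-in for the real model
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)
loss = model(torch.randn(4, 16).cuda()).sum()
loss.backward()
optimizer.step()  # optimizer state is kept in 8 bits, saving GPU memory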

Training the model

The training/finetune_Pythia-Chat-Base-7B.sh script configures and runs the training loop. After downloading the dataset and the base model, run:

bash training/finetune_Pythia-Chat-Base-7B.sh

As the training loop runs, checkpoints are saved to the model_ckpts directory at the root of the repo.

Please see the training README for more details about customizing the training run.

Converting weights to Hugging Face format

Before you can use this model to perform inference, it must be converted to the Hugging Face format. Run this command from the root of the repo to do so.

mkdir huggingface_models \
  && python tools/convert_to_hf_gptneox.py \
       --config-name EleutherAI/pythia-6.9b-deduped \
       --ckpt-path model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 \
       --save-path huggingface_models/Pythia-Chat-Base-7B \
       --n-stages 4 \
       --n-layer-per-stage 8 \
       --fp16

where the --fp16 flag loads and stores the model weights in fp16.

Make sure to replace model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 with the latest checkpoint in the model_ckpts/Pythia-Chat-Base-7B directory.

Testing the new model

You can use the OpenChatKit Shell test harness to chat with the new model. From the root of the repo, run

python inference/bot.py

By default, the script loads the model named Pythia-Chat-Base-7B from the huggingface_models directory, but you can override that behavior by specifying --model.

python inference/bot.py --model ./huggingface_models/GPT-NeoXT-Chat-Base-20B

Once the model has loaded, enter text at the prompt and the model will reply.

$ python inference/bot.py 
Loading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:1...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> Hello.
Hello human.

>>> 

The shell also supports additional commands to inspect hyperparameters, the full prompt, and more. Commands are prefixed with a /.

Note The /quit command exits the shell.

Please see the inference README for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.

Monitoring

By default, the training script simply prints the loss as training proceeds, but it can also output metrics to a file using loguru or report them to Weights & Biases.

Loguru

Add the flag --train-log-backend loguru to your training script to log to ./logs/file_{time}.log.
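Under the hood this is plain loguru; a minimal sketch of the equivalent calls (the exact fields the training script logs are not shown here):

from loguru import logger

logger.add("./logs/file_{time}.log")  # loguru expands {time} in the sink path

def log_metrics(step, loss):
    logger.info("step={} loss={:.4f}", step, loss)

log_metrics(100, 2.3456)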

Weights & Biases

To use Weights & Biases, first login with your Weights & Biases token.

wandb login

And set --train-log-backend wandb in the training script to enable logging to Weights & Biases.
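For reference, a hedged sketch of the equivalent Weights & Biases calls; the project name and metric key below are illustrative, not necessarily what the training script uses.

import wandb

run = wandb.init(project="openchatkit")  # project name is illustrative
for step, loss in enumerate([3.2, 3.1, 3.0], start=1):
    wandb.log({"train/loss": loss}, step=step)
run.finish()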

Experimental: Retrieval-Augmented Models

Warning Retrieval support is experimental.

The code in /retrieval implements a Python package for querying a Faiss index of Wikipedia. The following steps explain how to use this index to augment queries in the test harness with context from the retriever.

  1. Download the Wikipedia index.

     python data/wikipedia-3sentence-level-retrieval-index/prepare.py

  2. Run the bot with the --retrieval flag.

     python inference/bot.py --retrieval

After starting, the bot will load both the chat model and the retrieval index, which takes a long time. Once the model and the index are loaded, all queries will be augmented with extra context.

$ python inference/bot.py --retrieval
Loading /OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:0...
Loading retrieval index...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> Where is Zurich?
Where is Zurich?
Zurich is located in Switzerland.

>>>
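For intuition, here is a rough sketch of the retrieval flow, not the repo's exact code: embed the query, search the Faiss index, and prepend the hits to the prompt. The index path, passage store, and query vector are hypothetical.

import faiss
import numpy as np

index = faiss.read_index("wikipedia.index")  # hypothetical path
passages = ["Zurich is the largest city in Switzerland."]  # loaded elsewhere

def retrieve(query_vec, k=3):
    # Faiss expects a float32 matrix of shape (n_queries, dim).
    _, ids = index.search(np.asarray(query_vec, dtype="float32").reshape(1, -1), k)
    return [passages[i] for i in ids[0] if i >= 0]

def augment(prompt, query_vec):
    # Prepend retrieved context so the model can ground its answer.
    return "\n".join(retrieve(query_vec)) + "\n\n" + prompt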

See Also

  • docs/GPT-NeoXT-Chat-Base-20B.md. OpenChatKit also provides a larger, 20B-parameter chat model, GPT-NeoXT-Chat-Base-20B, fine-tuned from Eleuther AI's GPT-NeoX-20B.

License

All code in this repository was developed by Together Computer except where otherwise noted. Copyright (c) 2023, Together Computer. All rights reserved. The code is licensed under the Apache 2.0 license.

Copyright 2023 Together Computer

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This repository also contains code written by a number of other authors. Such contributions are marked and the relevant licensing is included where appropriate.

For full terms, see the LICENSE file. If you have any questions, comments, or concerns about licensing please contact us.

Citing OpenChatKit

@software{openchatkit,
  title = {{OpenChatKit: An Open Toolkit and Base Model for Dialogue-style Applications}},
  author = {Together Computer},
  url = {https://github.com/togethercomputer/OpenChatKit},
  month = {3},
  year = {2023},
  version = {0.15},
}

Acknowledgements

Our models are fine-tuned versions of large language models trained by Eleuther AI. We evaluated our model on HELM, provided by the Center for Research on Foundation Models, and collaborated with both CRFM and HazyResearch at Stanford to build this model.

We collaborated with LAION and Ontocord.ai to build the training data used to fine-tune this model.


openchatkit's Issues

Cannot create conda environment

Describe the bug
Followed the instructions but could not get

conda env create -f environment.yml

to work because of

ResolvePackageNotFound: 
  - cudatoolkit=11.6.0
  - faiss-gpu=1.7.2
  - nccl=2.12.12.1
  - cupy=10.4.0

To Reproduce
Steps to reproduce the behavior:
  1. Install Miniconda.
  2. Run conda env create -f environment.yml.

Expected behavior
An environment called OpenChatKit should be created, but creation fails.


Desktop: Mac


OpenChatKit Feedback Report

My question:

胜多负少的 ("more wins than losses")

Bot response:

所得到的多多 ("much is obtained")

Ideal bot response:

点对点 ("point-to-point")

Bot response was:

  • Factually incorrect
  • Not helpful
  • Harmful, inappropriate or unsafe

Training script needs to print more progress

Feedback from a user: When running the training script, it's not clear that it's making progress. The only way to know that it's doing something is by looking at nvidia-smi.

Resources required to launch

Hello.
What is the minimum specification to run (but not train) it on a local machine at normal speed?
Thank you.

Can't prepare pretrained model

(OpenChatKit) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit$ python pretrained/GPT-NeoX-20B/prepare.py
Downloading config.json: 100%|████| 613/613 [00:00<00:00, 272kB/s]
Downloading tokenizer_config.json: 100%|████| 156/156 [00:00<00:00, 55.0kB/s]
Downloading vocab.json: 100%|████| 1.03M/1.03M [00:01<00:00, 748kB/s]
Downloading merges.txt: 100%|████| 446k/446k [00:00<00:00, 555kB/s]
Downloading tokenizer.json: 100%|████| 2.02M/2.02M [00:01<00:00, 1.61MB/s]
Downloading special_tokens_map.json: 100%|████| 90.0/90.0 [00:00<00:00, 39.5kB/s]
Downloading pytorch_model.bin.index.json: 100%|████| 56.4k/56.4k [00:00<00:00, 3.44MB/s]
Downloading pytorch_model-00001-of-00046.bin: 100%|████| 883M/883M [01:16<00:00, 12.1MB/s]
[... pytorch_model-00002 through -00044 (868M each) download to 100% in the same way ...]
Downloading pytorch_model-00045-of-00046.bin: 100%|████| 576M/576M [00:50<00:00, 11.9MB/s]
Downloading pytorch_model-00046-of-00046.bin: 100%|████| 591M/591M [00:54<00:00, 11.3MB/s]
Killed


(base) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit/pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b$ ls
config.json  special_tokens_map.json  tokenizer_config.json  tokenizer.json
(base) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit/pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b$ 

However, /home/georgi/.cache/huggingface/transformers is 41.3 GB. Any ideas what went wrong?

LoRA Training

Hi, it would be super nice if you provided LoRA training to reduce the computational cost, because 8x 80GB A100s are too expensive.

error de login (login error)


python inference/bot.py Killed

git clone https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B
run python inference/bot.py --model GPT-NeoXT-Chat-Base-20B
Loading GPT-NeoXT-Chat-Base-20B to cuda:0...
Killed

run python inference/bot.py
OSError: Can't load the configuration of '/root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B' is the correct path to a directory containing a config.json file

so cp -r GPT-NeoXT-Chat-Base-20B huggingface_models/

root@msi:~/test/OpenChatKit-main# python inference/bot.py
Loading /root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:0...
Killed

I am confused. It is running in Docker; does the GPU not have enough video memory?

Does it support Chinese Q&A?

ChatGPT supports multi-language question answering and reasoning, although in most cases English answers are generated first and then translated into other languages. So I want to ask whether OpenChatKit supports direct Chinese Q&A, or whether I need to train on a Chinese dataset before I can do Chinese Q&A.

Can't install nccl package

Describe the bug
When trying to set up the conda environment, it is failing to install the nccl package.

(base) PS D:\OpenChatKit> conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - nccl=2.12.12.1

To Reproduce
Steps to reproduce the behavior:

  1. Enter conda env create -f environment.yml

Expected behavior
It should install all of the packages

Desktop (please complete the following information):

  • Windows 11 Pro

Add print statements to `pretrained/GPT-NeoX-20B/prepare.py` to show progress

Describe the bug
pretrained/GPT-NeoX-20B/prepare.py can take a long time to prepare the base model. It should print progress as it's converting.

To Reproduce
Steps to reproduce the behavior:

  1. run python pretrained/GPT-NeoX-20B/prepare.py from the root of the repo.

Expected behavior
The script should print progress.

One issue on env ResolvePackageNotFound

Describe the bug
(base) samchen@Sams-MacBook-Pro OpenChatKit % conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • cupy=10.4.0
  • nccl=2.12.12.1
  • faiss-gpu=1.7.2
  • cudatoolkit=11.6.0


Add documentation for running inference on multiple GPUs

While trying out python inference/bot.py --retrieval --model togethercomputer/GPT-NeoXT-Chat-Base-20B
I got this error on an A100 GPU:

File "inference/bot.py", line 185, in <module>
    main()
  File "inference/bot.py", line 173, in main
    OpenChatKitShell(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/cmd.py", line 217, in onecmd
    return func(arg)
  File "inference/bot.py", line 87, in do_say
    output = self._model.do_inference(
  File "inference/bot.py", line 32, in do_inference
    outputs = self._model.generate(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/generation_utils.py", line 1944, in sample
    outputs = self(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 619, in forward
    outputs = self.gpt_neox(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 511, in forward
    outputs = layer(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 319, in forward
    attention_layer_outputs = self.attention(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 115, in forward
    qkv = self.query_key_value(hidden_states)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

Is there an error in this line of code: "python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B"? No parameter specifies an offline directory.


Training LLaMa?

In theory the LLaMa 30b & 65b should be much more capable than the GPT-NeoX 20b.

Does OpenChatKit support LLaMa? If not, is it on the roadmap?

I appreciate that togethercomputer might not be able to release pretrained LLaMa weights due to the licence, but it'd be great if researchers could at least play with it.

Why does instruction tuning calculate loss on the whole sentence?

I noticed that the OIG dataset adds human and bot tags to each sample. In your code, you pack samples directly to the max sequence length and calculate cross-entropy on the whole sentence. Will this make the model output the human and bot tags without knowing when to stop? Would calculating the loss only on the last bot response be more suitable?
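For reference, a sketch of the alternative the question proposes: mask the labels so cross-entropy is computed only on bot-response tokens. -100 is the ignore_index that PyTorch's CrossEntropyLoss (and Hugging Face causal-LM losses) skip; the token positions here are made up for illustration.

import torch

input_ids = torch.tensor([[11, 12, 13, 14, 15, 16, 17, 18]])  # toy token ids
labels = input_ids.clone()
bot_start = 5                  # hypothetical index where the bot reply begins
labels[:, :bot_start] = -100   # human/prompt tokens contribute no loss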

Having issue on EnvironmentFileNotFound

Describe the bug
(base) samchen@Sams-MacBook-Pro miniconda3 % conda env create -f environment.yml

EnvironmentFileNotFound: '/Users/samchen/miniconda3/environment.yml' file not found

To Reproduce
Steps to reproduce the behavior:

  1. Install miniconda3
  2. run " conda env create -f environment.yml"

Expected behavior
Should move on to the next step.


Desktop:

  • OS: MacOS


torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 432.00 MiB (GPU 2; 23.65 GiB total capacity; 20.88 GiB already allocated; 259.56 MiB free;


[feature] Do you support RLHF training?

After viewing your code, I found that you don't support RLHF training yet. Your code is mainly about distributed training using pipeline and data parallelism.
Do you plan to support RLHF training? Do you think it is necessary?

RuntimeError: Failed to import transformers.optimization

Describe the bug
I've downloaded the corpus and the model weights. I ran the command bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh and got the following:
https://gist.github.com/riatzukiza/0930307fc90bf940103364be2d3db5c1

To Reproduce
Steps to reproduce the behavior:

  1. Download weights
  2. download corpus
  3. run bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh
  4. Bam error

Expected behavior

To fine tune the model, or get an out of memory error


Desktop (please complete the following information):

  • OS: Pop!_OS


Build a docker image for openchatkit

Is your feature request related to a problem? Please describe.
A Docker image might be easier for people to use.

Describe the solution you'd like
We could add a /docker folder or a simple Dockerfile to the repo, so people could build the image themselves. And maybe we could push the image to Docker Hub so they could just pull and test.

Issue Converting Weights to Huggingface Format

I'm trying to convert the weights as per the example but running into an issue.

After running

mkdir huggingface_models \
  && python tools/convert_to_hf_gptneox.py \
  --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path /huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6

I'm getting this error:

Traceback (most recent call last):
  File "/mnt/c/Users/name/OpenChatKit/tools/convert_to_hf_gptneox.py", line 102, in <module>
    assert args.save_path is not None
AssertionError
--save-path: command not found
--n-stages: command not found
--n-layer-per-stage: command not found

I'm using Windows 11 WSL Ubuntu 22.04.2 LTS

FileNotFoundError: [Errno 2] No such file or directory: 'model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5/prank_0_checkpoint.pt'

Ubuntu 22.04.2 LTS
After downloading the model and now trying to convert:

(OpenChatKit) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit$ python3.10 tools/convert_to_hf_gptneox.py --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6
loading stage 0
Traceback (most recent call last):
  File "/home/georgi/Documents/GitHub/OpenChatKit/tools/convert_to_hf_gptneox.py", line 110, in <module>
    load_decentralized_checkpoint(
  File "/home/georgi/Documents/GitHub/OpenChatKit/tools/convert_to_hf_gptneox.py", line 43, in load_decentralized_checkpoint
    checkpoint = torch.load(os.path.join(input_path, f'prank_{i}_checkpoint.pt'), map_location=torch.device("cpu"))
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/serialization.py", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/serialization.py", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5/prank_0_checkpoint.pt'

Any ideas?

Exception in subprocess.py

I run the following command:
python prepare.py

The result is as follows:

error: RPC failed; curl 56 GnuTLS recv error (-110): The TLS connection was non-properly terminated.
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
Traceback (most recent call last):
  File "prepare.py", line 18, in <module>
    process = subprocess.run(
  File "/root/miniconda3/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git clone https://huggingface.co/datasets/laion/OIG /www/wwwroot/OpenChatKit/data/OIG/files' returned non-zero exit status 128.

Additional notes:
curl https://huggingface.co/datasets/laion/OIG works fine.
And permissions are 777 on /www/wwwroot/OpenChatKit/data/OIG/files.

Why?

Is it possible to have a Chinese version of the README?

Is your feature request related to a problem? Please describe.
It looks like the installation instructions are not clear for Chinese readers.

Describe the solution you'd like
I can help translate it into Chinese.


Conda takes too long to install dependencies

Describe the bug
One user reported conda env create -f environment.yml taking over 60 minutes. We need a better solution.

To Reproduce
Steps to reproduce the behavior:

  1. Run conda env create -f environment.yml from the root of the repo.

Expected behavior
Should finish in a "reasonable" amount of time.

How do I manage my own domain-knowledge articles?

I want to know the format my documents should be in if I want to fine-tune a model on my domain knowledge.
If my documents are complete articles, should I split them into many small question/answer pairs (questions from the articles, answers from the articles)?

Or can I feed the model the original articles (and if so, how do I feed the model a whole article)?

Many thanks!
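For what it's worth, a hedged sketch of the Q&A-pair option, formatting each extracted pair with the <human>/<bot> tags that OIG-style training data uses; the tag convention is an assumption, so check the dataset before relying on it.

def to_training_example(question, answer):
    # One training string per Q&A pair extracted from an article.
    return f"<human>: {question}\n<bot>: {answer}"

print(to_training_example(
    "What does the article say about X?",
    "According to the article, ...",
))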

How to see the progress of training?

I started a training run with 4x V100S GPUs (32GB VRAM each) at 18:00, and got a "training starts..." message.
With nvidia-smi, I can see that 3 GPUs are running at 100% utilization.
The next morning, the processes were still running, but there was nothing in the output folder and no log messages.
So, is there some way to see how the training job is going?

Cupy error while training (`CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal`)

Describe the bug
The bash script to train the model does not work because of a Cupy error:

(OpenChatKit-Test) user@pc:~/OpenChatKit$ bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh
Initialize NCCLCommunicator: < pipeline_group_0 >; rank: 0
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal

(The same traceback is printed, interleaved, by each of the launched worker processes.)

To Reproduce
Steps to reproduce the behavior:

  1. Run code on WSL-Ubuntu in a Conda Env
  2. Run the bash script bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh
  3. The error above is produced

Expected behavior
The code is supposed to execute.


Desktop (please complete the following information):

  • OS: Windows 11
  • Ubuntu-WSL
  • Miniconda
  • Nvidia GeForce 3060 (Could this be the issue?)

Additional context
Also, the previous steps to download the data and weights gave me errors. These steps:
python data/OIG/prepare.py
python pretrained/GPT-NeoX-20B/prepare.py

ended after a couple of minutes/hours with the error message "Killed". I was able to acquire the datasets with a simple wget command, but I thought that was weird too.

Bug when running inference with retrieval augmented model

Describe the bug
Using retrieval-augmented models, a sequence of prompts leads to a runtime error (size mismatch between two tensors).

To Reproduce
Steps to reproduce the behavior:

  1. After downloading the Wikipedia index, run inference using python inference/bot.py --retrieval
  2. In the OpenChatKit Shell, run the following set of queries:
>>> Where is Bern?
...
>>> Where is Switzerland?
...
>>> Is Switzerland in Europe or in America?

Traceback
The queries lead to the following error:

Traceback (most recent call last):
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 185, in <module>
    main()
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 181, in main
    ).cmdloop()
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 217, in onecmd
    return func(arg)
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 87, in do_say
    output = self._model.do_inference(
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 32, in do_inference
    outputs = self._model.generate(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1944, in sample
    outputs = self(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 619, in forward
    outputs = self.gpt_neox(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 511, in forward
    outputs = layer(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 319, in forward
    attention_layer_outputs = self.attention(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 153, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 220, in _attn
    attn_scores = torch.where(causal_mask, attn_scores, mask_value)
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2247) at non-singleton dimension 3

Environment
Set up using mamba in the root dir: mamba env create -f environment.yml

Hardware:

  • OS: Ubuntu 20.04.5 LTS
  • 1x A100 80G GPU
  • 8 vCPU with 128GB RAM
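(For context, the mismatch of 2048 vs. 2247 suggests the retrieval-augmented prompt grew past the model's 2048-token window. A hedged sketch of one possible workaround, clipping the prompt from the left before generation; this is illustrative, not a confirmed fix.)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "huggingface_models/GPT-NeoXT-Chat-Base-20B"  # path from the report
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
prompt = "...long retrieval-augmented prompt..."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids[:, -2048:]  # keep only the most recent 2048 tokens
outputs = model.generate(input_ids, max_new_tokens=128)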

Exception in prepare.py

python data/OIG/prepare.py
  File "data/OIG/prepare.py", line 27
    gzip.open(f, 'rb') as infile,
                       ^
SyntaxError: invalid syntax

Tell me why, thank you!

Roadmap

What's the roadmap for the project becoming a true open alternative to ChatGPT?

While its capabilities are impressive on their own, stacked against ChatGPT there's a lot lacking.

For example…

  • it can’t really generate useful code
  • the code it generates is only python
  • its reasoning capabilities need a lot of improvement
  • its knowledge of the world needs a lot of improvement

To me it seems that it is good at generating coherent sentences, but massively lacks reasoning.

Hopefully this feedback doesn't come across as harsh or critical. It seems this project is the closest there is to a ChatGPT alternative. Impressive work, everyone who has contributed so far. I'm rooting for this project's success and hope it will truly rival ChatGPT someday.

OpenChatKit Feedback Report

My question:

Test

Bot response:

Test

Ideal bot response:

Test!

Bot response was:

  • Factually incorrect
  • Not helpful
  • Harmful, inappropriate or unsafe
