This repo aims to fine-tune the Mixtral model.
I provide a small Docker image to run the code. You can build it yourself or pull it from this repo. I have not been able to install the latest flash_attn in any container that has the necessary PyTorch and CUDA versions. This image is built to run on H100 GPUs.
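The build/pull/run steps might look like the following; the image tag `mixtral-finetune` and the registry path are placeholders, not the actual names used by this repo:

```shell
# Build the image locally from the repo's Dockerfile
docker build -t mixtral-finetune .

# Or pull a prebuilt image (placeholder registry path)
docker pull ghcr.io/<org>/mixtral-finetune:latest

# Run interactively on an H100 box with all GPUs visible
docker run --gpus all -it --rm mixtral-finetune bash
```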
- Run the simple_inference.py script to test the model. It even runs on an A100 with 40 GB of memory!
- Test your system by running the simple_train.py script. It trains a model on a small dataset and takes around 1 hour on an 8xH100 machine.
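A multi-GPU launch for the training step could be sketched like this; whether simple_train.py is meant to be started via torchrun is an assumption, not something the repo states:

```shell
# Launch the training script across all 8 GPUs of one machine
# (assumes simple_train.py supports distributed launch via torchrun)
torchrun --nproc_per_node=8 simple_train.py
```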
Our experiments are conducted using the Axolotl library. For this we use a Docker image that can be pulled from this repo. It bakes in the execution of the axolotl_launcher.py script as a replacement for the axolotl.cli.train script. We do this so we can inject parameters from the W&B UI as a launch job.
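One way to picture the launcher is as a thin wrapper that merges parameter overrides (e.g., supplied by a W&B launch job) into the base Axolotl config before handing off to training. The sketch below is a hypothetical helper illustrating that merge step; it is not the actual axolotl_launcher.py, and the config keys are examples only:

```python
import copy


def apply_overrides(base_config: dict, overrides: dict) -> dict:
    """Return a copy of base_config with overrides applied.

    Nested dicts are merged recursively; scalar values are replaced.
    This mimics injecting hyperparameters from a W&B launch job into
    an Axolotl YAML config loaded as a dict (hypothetical helper).
    """
    merged = copy.deepcopy(base_config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Example: a launch job overrides the learning rate and the run name
base = {"learning_rate": 2e-5, "micro_batch_size": 4, "wandb": {"project": "mixtral"}}
overrides = {"learning_rate": 1e-4, "wandb": {"name": "run-42"}}
config = apply_overrides(base, overrides)
print(config["learning_rate"])  # 0.0001
```

The merged dict would then be written back to a config file (or passed in memory) before invoking Axolotl's training entry point.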