This repo aims to fine-tune the Mixtral model.
I provide a small Docker image to run the code. You can build it yourself or pull it from this repo. I have not been able to install the latest flash_attn in any container that has the necessary PyTorch and CUDA versions. This image is built to run on H100 GPUs.
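The build/pull/run steps might look like the following; the image tag `mixtral-finetune` and the registry path are placeholders, not the actual names used by this repo:

```shell
# Build the image locally from the repo's Dockerfile
docker build -t mixtral-finetune .

# Or pull a prebuilt image (placeholder registry path)
docker pull ghcr.io/<org>/mixtral-finetune:latest

# Run interactively on an H100 box with all GPUs visible
docker run --gpus all -it --rm mixtral-finetune bash
```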
- Run the simple_inference.py script to test the model. It even runs on an A100 with 40 GB of memory!
- Test your system by running the simple_train.py script. It trains a model on a small dataset and takes around 1 hour on an 8xH100 machine.
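A multi-GPU launch for the training step could be sketched like this; whether simple_train.py is meant to be started via torchrun is an assumption, not something the repo states:

```shell
# Launch the training script across all 8 GPUs of one machine
# (assumes simple_train.py supports distributed launch via torchrun)
torchrun --nproc_per_node=8 simple_train.py
```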
Our experiments are conducted using the Axolotl library. For this we use a Docker image that can be pulled from this repo. It bakes in the execution of the axolotl_launcher.py script as a replacement for the axolotl.cli.train script. We do this so we can inject parameters from the W&B UI as a launch job.
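One way to picture the launcher is as a thin wrapper that merges parameter overrides (e.g., supplied by a W&B launch job) into the base Axolotl config before handing off to training. The sketch below is a hypothetical helper illustrating that merge step; it is not the actual axolotl_launcher.py, and the config keys are examples only:

```python
import copy


def apply_overrides(base_config: dict, overrides: dict) -> dict:
    """Return a copy of base_config with overrides applied.

    Nested dicts are merged recursively; scalar values are replaced.
    This mimics injecting hyperparameters from a W&B launch job into
    an Axolotl YAML config loaded as a dict (hypothetical helper).
    """
    merged = copy.deepcopy(base_config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Example: a launch job overrides the learning rate and the run name
base = {"learning_rate": 2e-5, "micro_batch_size": 4, "wandb": {"project": "mixtral"}}
overrides = {"learning_rate": 1e-4, "wandb": {"name": "run-42"}}
config = apply_overrides(base, overrides)
print(config["learning_rate"])  # 0.0001
```

The merged dict would then be written back to a config file (or passed in memory) before invoking Axolotl's training entry point.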