
Mistral

Mistral: A strong and cool northwesterly wind that builds as it moves, bringing good health and clear skies.


A framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗. It includes tools and helpful scripts for incorporating new pre-training datasets, various schemes for single-node and distributed training (including on cloud providers like GCP), and, importantly, scripts for evaluation.

Visit our Read the Docs for the full documentation.

A Propulsion Endeavor 🚀


Quickstart

Installation

Mistral has been tested with Python 3.8.12, PyTorch 1.11.0 (compiled with CUDA 11.3), CUDA 11.3, NCCL 2.10, Transformers 4.17.0, and DeepSpeed 0.6.0.

The environment can be easily built with the following commands:

conda create -n mistral python=3.8.12 pytorch=1.11.0 torchdata cudatoolkit=11.3 -c pytorch
conda activate mistral
pip install -r setup/pip-requirements.txt

A .yaml export of a tested environment is provided at environments/environment-gpu.yaml.

Environments and non-Python dependencies are managed with conda, while Python dependencies are managed with pip (note: PyTorch is installed via conda to get the build compiled with CUDA 11.3).

Training GPT-2 Micro

Prerequisites

First, make sure to update conf/mistral-micro.yaml with the directories where you want to store the Hugging Face cache and model runs.

# Artifacts & Caching
artifacts:
    cache_dir: /path/to/artifacts
    run_dir: /path/to/runs

Next, make sure that /path/to/mistral is on your PYTHONPATH.
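For example, assuming the repository was cloned to /path/to/mistral (substitute your actual clone location), the path can be added for the current shell session like so:

```shell
# Add the repository root to PYTHONPATH (adjust /path/to/mistral to your clone).
export PYTHONPATH="/path/to/mistral:$PYTHONPATH"

# To make this persistent, append the same line to your shell profile, e.g. ~/.bashrc.
```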

Single-node single-GPU training

For single-node, single-GPU training, run:

conda activate mistral
cd mistral
CUDA_VISIBLE_DEVICES=0 python train.py --config conf/mistral-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro

Multi-node multi-GPU training with DeepSpeed

Modify /job/hostfile in the following way:

<Hostname of first machine> slots=<Number of GPUs>
<Hostname of second machine> slots=<Number of GPUs>
...
<Hostname of the nth machine> slots=<Number of GPUs>

Below is an example hostfile where we train on machine1 and machine2 with 8 GPUs each:

machine1 slots=8
machine2 slots=8

To start distributed training, run:

conda activate mistral
cd mistral
deepspeed --num_gpus 8 --num_nodes 2 --master_addr machine1 train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 2 --nproc_per_node 8 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 4 --training_arguments.deepspeed conf/deepspeed/z2-small-conf.json --run_id tutorial-gpt2-micro-multi-node

Note: You may need to adjust your batch size depending on the capacity of your GPUs.
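As a rough guide when tuning, the global batch size for the multi-node command above works out as follows (assuming no gradient accumulation; a nonzero --training_arguments.gradient_accumulation_steps would multiply this further):

```python
# Back-of-the-envelope check of the global batch size implied by the
# multi-node command above (assumption: no gradient accumulation).
per_device_train_batch_size = 4
nproc_per_node = 8  # GPUs per machine
nnodes = 2          # machines in the hostfile

global_batch_size = per_device_train_batch_size * nproc_per_node * nnodes
print(global_batch_size)  # 64 sequences per optimizer step
```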

If you are interested in training a model on Google Cloud, check out our Google Cloud + Kubernetes Tutorial.

Using the model

Model checkpoints will be stored in the directory specified by artifacts.run_dir. For example, a checkpoint might be saved at /path/to/runs/tutorial-gpt2-micro/checkpoint-1000.

Mistral stores model checkpoints in the Hugging Face format, so models can be loaded and used in the same manner as if one had trained the model with Hugging Face.

For instance, to generate text with 🤗 Transformers:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

model = GPT2LMHeadModel.from_pretrained("stanford-crfm/eowyn-x777-checkpoint-400000")

input_ids = tokenizer.encode(
    "Hello world, this is a language model prompt.", return_tensors="pt"
)

sample_output = model.generate(input_ids, do_sample=True, max_length=50, top_k=50)

print("Output:\n" + 100 * "-")
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Check out this Google Colab notebook to run this demo!


Resources

The Propulsion team has trained 5 GPT-2 Medium models and 5 GPT-2 Small models on the OpenWebText corpus, as found in 🤗 Datasets.

Each model has 600 checkpoints, subject to the following checkpoint schedule:

  • Every 10 steps, from 0 to 100 steps.
  • Every 50 steps, from 100 to 2,000 steps.
  • Every 100 steps, from 2,000 to 20,000 steps.
  • Every 1,000 steps, from 20,000 to 400,000 steps.
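The schedule above can be enumerated with a quick sketch; counting the boundaries this way yields just over 600 checkpoints, consistent with the approximate figure quoted above:

```python
# Sketch: enumerate the checkpointed steps implied by the schedule above.
# Boundary handling (whether e.g. step 100 belongs to the first or second
# interval) is an assumption; the README quotes ~600 checkpoints.
def checkpoint_steps():
    steps = list(range(10, 101, 10))           # every 10 steps up to 100
    steps += list(range(150, 2001, 50))        # every 50 steps to 2,000
    steps += list(range(2100, 20001, 100))     # every 100 steps to 20,000
    steps += list(range(21000, 400001, 1000))  # every 1,000 steps to 400,000
    return steps

steps = checkpoint_steps()
print(len(steps), steps[-1])  # 608 400000
```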

Checkpoints can be downloaded from the 🤗 Hub.

Run          Type          Seed   Download
Alias        GPT-2 Small   21     download
Battlestar   GPT-2 Small   49     download
Caprica      GPT-2 Small   81     download
Darkmatter   GPT-2 Small   343    download
Expanse      GPT-2 Small   777    download
Arwen        GPT-2 Medium  21     download
Beren        GPT-2 Medium  49     download
Celebrimbor  GPT-2 Medium  81     download
Durin        GPT-2 Medium  343    download
Eowyn        GPT-2 Medium  777    download

Each model has a distinct Git repository, and each checkpoint is stored as a branch.

As an example, here's how to get the battlestar model's checkpoint for step 300000:

# Make sure you have git-lfs installed
# (https://git-lfs.github.com)
git lfs install

# get checkpoint 300000 for battlestar
git clone https://huggingface.co/stanford-crfm/battlestar-gpt2-small-x49 --branch checkpoint-300000 --single-branch
cd battlestar-gpt2-small-x49
git lfs pull

For convenience, every model and step checkpoint is listed in mistral_models.json.
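Since branch names follow the checkpoint-&lt;step&gt; pattern, 🤗 Transformers can also load a specific checkpoint directly via the revision argument of from_pretrained, without cloning the repository by hand. A sketch (the checkpoint_revision helper is hypothetical, and the actual load requires network access and disk space):

```python
# Hypothetical helper: Hub branch names follow the "checkpoint-<step>" pattern.
def checkpoint_revision(step):
    return f"checkpoint-{step}"

# With Hugging Face Transformers, a checkpoint branch can be loaded directly
# (downloads the weights from the Hub on first use), e.g.:
#   from transformers import GPT2LMHeadModel
#   model = GPT2LMHeadModel.from_pretrained(
#       "stanford-crfm/battlestar-gpt2-small-x49",
#       revision=checkpoint_revision(300000),
#   )
print(checkpoint_revision(300000))  # checkpoint-300000
```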


Issues

To ask questions, report issues, or request features, please use the GitHub Issue Tracker. Before creating a new issue, please make sure to search for existing issues that may solve your problem.


Differences between Mistral and Hugging Face

Please visit the following page that outlines the differences between the two codebases.


Contributing

Please see the following page for information on contributing.

Contributors

j38, siddk, dlwh, tiiiger, lorr1, teetone, anarayan, krandiash, skylion007, yifanmai
