
EasyLM's Issues

How do you handle the attention mask for dataset chunks?

Hi,

I find that in the following code, EasyLM processes the dataset by taking fixed-size chunks. From my understanding, this can place different documents in the same chunk. For example, the first document might take 512 tokens while the second takes 128 tokens in a chunk of 640 tokens. In that case, I think generation for the second document should not attend to the first document, so we would need an attention mask to block it. But I don't see any related code in EasyLM, so I am wondering how you handle the attention mask for this case.

EasyLM/EasyLM/data.py

Lines 158 to 183 in 18375bd

for index, example in enumerate(self._dataset):
    tokens, loss_masks = self.text_processor(example)
    token_buffer.extend(tokens)
    loss_mask_buffer.extend(loss_masks)
    while len(token_buffer) > chunk_size + 1:
        total_tokens += chunk_size
        metrics = {
            'dataset_example_index': index,
            'dataset_total_tokens': total_tokens,
        }
        batch = {
            'input_tokens': np.array(token_buffer[:chunk_size], dtype=np.int32).reshape(
                self.config.batch_size, -1
            ),
            'target_tokens': np.array(token_buffer[1:chunk_size + 1], dtype=np.int32).reshape(
                self.config.batch_size, -1
            ),
            'loss_masks': np.array(loss_mask_buffer[1:chunk_size + 1], dtype=np.float32).reshape(
                self.config.batch_size, -1
            ),
        }
        if self.config.always_start_with_bos:
            batch['input_tokens'][:, 0] = self.tokenizer.bos_token_id
        yield batch, metrics
        token_buffer = token_buffer[chunk_size:]
        loss_mask_buffer = loss_mask_buffer[chunk_size:]
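
For illustration only (this is not EasyLM code, and the question above stands): if one wanted to block attention across packed documents, a block-diagonal causal mask can be derived from the EOS positions inside each chunk. A minimal NumPy sketch, with eos_token_id as an assumed placeholder:

import numpy as np

def document_attention_mask(input_tokens, eos_token_id):
    """Causal mask of shape [batch, seq, seq] that also blocks attention
    across document boundaries (marked by EOS) within a packed chunk.
    Illustrative sketch only, not part of EasyLM."""
    batch_size, seq_len = input_tokens.shape
    eos = (input_tokens == eos_token_id)
    # A new document starts at the position right after each EOS token.
    doc_ids = np.cumsum(np.pad(eos[:, :-1], ((0, 0), (1, 0))), axis=-1)
    same_doc = doc_ids[:, :, None] == doc_ids[:, None, :]
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    return same_doc & causal  # True where attention is allowed

# Example: two documents packed into one 8-token sequence (2 = assumed EOS id).
tokens = np.array([[5, 6, 7, 2, 8, 9, 10, 11]])
mask = document_attention_mask(tokens, eos_token_id=2)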

When using the 13B model with the following configuration, a memory error occurs on v3-8. May I ask what the reason is?

2023-04-06 07:42:51.278258: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 7
2023-04-06 07:42:51.278359: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 1
2023-04-06 07:42:51.278403: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 3
2023-04-06 07:42:51.278439: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 2
2023-04-06 07:42:51.278486: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 0
2023-04-06 07:42:51.278526: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 4
2023-04-06 07:42:51.278553: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 5
2023-04-06 07:42:51.278591: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 6

nohup python -m EasyLM.models.llama.llama_train \
    --mp_mesh_dim='-1,1' \
    --load_llama_config='13b' \
    --load_checkpoint="params::${EASYLM_CHECKPOINT_DIR}/checkpoint" \
    --tokenizer.vocab_file=${TOKENIZER_FILE} \
    --seed=42 \
    --initialize_jax_distributed=False \
    --total_steps=1000 \
    --log_freq=10 \
    --save_model_freq=100 \
    --save_milestone_freq=500 \
    --eval_steps=100 \
    --train_dataset.text_processor.fields='[input],output' \
    --train_dataset.text_processor.add_eos_token=True \
    --train_dataset.type='json' \
    --train_dataset.json_dataset.path=${TRAIN_DATA_FILE} \
    --train_dataset.json_dataset.seq_length=1024 \
    --train_dataset.json_dataset.batch_size=2 \
    --eval_dataset.text_processor.fields='[input],output' \
    --eval_dataset.text_processor.add_eos_token=True \
    --eval_dataset.type='json' \
    --eval_dataset.json_dataset.path=${EVAL_DATA_FILE} \
    --eval_dataset.json_dataset.seq_length=1024 \
    --eval_dataset.json_dataset.batch_size=2
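
For rough context (a back-of-envelope estimate under assumed settings, not a diagnosis of this exact run): full fine-tuning with AdamW keeps parameters, gradients, and two optimizer moments in memory, which for 13B parameters in fp32 already exceeds the 128 GiB of total HBM on a v3-8:

# Back-of-envelope HBM estimate; assumes fp32 parameters and full AdamW state.
params = 13e9                                   # 13B parameters
bytes_per_value = 4                             # fp32
train_state = params * bytes_per_value * 4      # params + grads + 2 Adam moments
print(f"train state: ~{train_state / 2**30:.0f} GiB")   # ~194 GiB
print(f"v3-8 total HBM: {8 * 16} GiB")                   # 128 GiB

Sharding across a larger slice, reduced-precision parameters or optimizer states, and parameter-efficient fine-tuning are the usual ways around this.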

Error when converting to HF

Command:

python -m EasyLM.models.llama.convert_easylm_to_hf \
    --load_checkpoint='params::/home/nap/Downloads/githubs/EasyLM/easylm_checkpoint/koala_13b.diff.weights' \
    --tokenizer_path='/home/nap/Documents/text-generation-webui/models/llama_original/13B/tokenizer.model' \
    --model_size='13b' \
    --output_dir='/home/nap/Downloads/githubs/EasyLM/easylm_checkpoint/koala-13B-HF'

Output:

Traceback (most recent call last):
  File "/home/nap/miniconda3/envs/EasyLM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/nap/miniconda3/envs/EasyLM/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/nap/Downloads/githubs/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 32, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
ImportError: cannot import name 'LlamaConfig' from 'transformers' (/home/nap/miniconda3/envs/EasyLM/lib/python3.8/site-packages/transformers/__init__.py)

Do I not have the right version of Transformers? I used these commands from the docs to set up conda:

conda env create -f scripts/gpu_environment.yml
conda activate EasyLM

(EasyLM) nap@wintermute:~/Downloads/githubs/EasyLM$ pip freeze | grep transform
transformers==4.27.2
(Ubuntu 22)
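
For what it's worth, LlamaConfig and LlamaForCausalLM were only added in transformers 4.28.0, so the pinned 4.27.2 cannot provide them. An illustrative sanity check (not from the repo):

# Illustrative check: the LLaMA classes require transformers >= 4.28.0.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.28.0"), (
    f"transformers {transformers.__version__} is too old for LlamaConfig"
)
from transformers import LlamaConfig, LlamaForCausalLM  # succeeds on >= 4.28.0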

Strategies for handling large corpora through Google Cloud Storage?

Hi!
I have recently come across this repository and have been conducting tests using TPUv4 pods.

As part of my experimentation, I have explored several approaches for feeding datasets into the model,
including utilizing Hugging Face datasets or employing JSON files (with lines) either locally or through a GCS bucket.

During my analysis, I noticed that the JSON data loader appears to download the entire JSON file from the gs:// directory and then tokenize and yield the data line by line. This approach presents a challenge when dealing with corpus files exceeding 1TB in size, as it is not practical to store such extensive data in a single JSON file.

I am curious to learn how you handle this issue, and I would appreciate any insights!

Thanks for your great work 👍
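
For illustration, a minimal sketch of streaming a JSONL corpus from GCS one line at a time instead of materializing the whole file (gcsfs is an assumed dependency and the path is a placeholder; this is an illustrative alternative, not EasyLM's own loader):

import json
import gcsfs  # assumed dependency, not pinned by EasyLM

def iter_jsonl_from_gcs(path):
    """Stream a JSONL file from a gs:// path one line at a time."""
    fs = gcsfs.GCSFileSystem()
    with fs.open(path, 'rt') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage sketch: shard a >1TB corpus into many files and iterate over the shards.
# for example in iter_jsonl_from_gcs('gs://my-bucket/corpus/shard-00000.jsonl'):
#     tokens, loss_masks = text_processor(example)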

[Doc error]: Outdated doc for LLAMA

I am trying to run LLaMA using EasyLM, following the README for LLaMA. The first step is converting the raw LLaMA parameters.

python -m EasyLM.models.llama.convert_torch_to_easylm.py \
    --checkpoint_dir='path/to/torch/llama/checkpoint' \
    --output_dir='path/to/output/easylm/checkpoint' \
    --streaming=True

The argument output_dir does not appear in convert_torch_to_easylm.py; it should now be output_file, as shown in the code.

I wonder if the doc is outdated.

Add LoRA support?

LoRA fine-tuning is much faster and uses less memory than full fine-tuning.

Can't understand how to convert Koala deltas into HF format? Keep getting error `TypeError: can't convert np.ndarray of type bfloat16.`

Firstly, thanks very much for releasing Koala and the code. I'm really looking forward to trying it.

I downloaded the 7B delta from https://drive.google.com/drive/folders/10f7wrlAFoPIy-TECHsx9DKIvbQYunCfl and would like to convert the delta to an HF format that I can use with other tools.

Here are the commands I have run:

First, I convert base Llama weights to EasyLM format

$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.models.llama.convert_torch_to_easylm \
--checkpoint_dir=/Users/tomj/Downloads/Torrents/Done/LLaMA/7B \
--output_file=/Users/tomj/src/llama.cpp/models/koala/7B/llama-7b-LM \
--streaming=True

Then I run diff_checkpoint, comparing the original LLaMA weights with the 7B delta from the Google Drive link

$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.scripts.diff_checkpoint --recover_diff=True \
--load_base_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/7B/llama-7b-LM' \
--load_target_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/7B/koala_7b_diff_v2' \
--output_file=/Users/tomj/src/llama.cpp/models/koala/7B/koala_7b_diff.diff \
--streaming=True

Finally, I run convert_easylm_to_hf, trying to convert the resulting weights to HF

$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.models.llama.convert_easylm_to_hf --model_size=7b \
--output_dir=/Users/tomj/src/llama.cpp/models/koala/7B/HF \
--load_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/7B/koala_7b_diff.diff' \
--tokenizer_path=/Users/tomj/src/llama.cpp/models/tokenizer.model

But I always get this error:

TypeError: can't convert np.ndarray of type bfloat16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Am I misunderstanding how these scripts are supposed to work? How can I get an HF version of the Koala deltas?

Or, how can I apply the Koala deltas to the original Llama 7B, and then convert that to HF?

Here's the full output from running the convert script:

Fetching the tokenizer from /Users/tomj/src/llama.cpp/models/tokenizer.model.
/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py:94: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1680419296502/work/torch/csrc/utils/tensor_numpy.cpp:212.)
  torch_params[key] = torch.from_numpy(tensor)
Traceback (most recent call last):
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 233, in <module>
    mlxu.run(main)
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 226, in main
    load_and_convert_checkpoint(FLAGS.load_checkpoint),
  File "/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 94, in load_and_convert_checkpoint
    torch_params[key] = torch.from_numpy(tensor)
TypeError: can't convert np.ndarray of type bfloat16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

I am running on an Intel macOS system on Ventura 13.3. Here's some env info:

  • transformers version: 4.28.0.dev0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.10.10
  • Huggingface_hub version: 0.13.3
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.1.0.dev20230402 (False)
  • Tensorflow version (GPU?): 2.12.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.6.8 (cpu)
  • Jax version: 0.4.8
  • JaxLib version: 0.4.7

Any help would be much appreciated.
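
For what it's worth, torch.from_numpy does not accept NumPy arrays that use the extension bfloat16 dtype found in JAX/Flax checkpoints. A common workaround (an illustrative sketch, not the repository's own fix) is to cast such arrays to float32 before handing them to PyTorch:

import numpy as np
import torch

def to_torch(tensor):
    """Convert a (possibly bfloat16) NumPy array to a torch.Tensor.

    torch.from_numpy rejects the extension bfloat16 dtype, so cast such
    arrays to float32 first. Illustrative workaround, not EasyLM code."""
    if tensor.dtype.name == 'bfloat16':
        tensor = tensor.astype(np.float32)
    return torch.from_numpy(np.ascontiguousarray(tensor))

The resulting HF checkpoint can then be re-cast to float16 or bfloat16 on the PyTorch side if a smaller file is needed.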

When using fsdp=true, the error message is the same. Does this not have any effect?

When I use 13B with batch_size=1, an error occurs on v3-8 as follows. With fsdp=True the error message is the same. Does this flag not have any effect?

jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Failed to allocate request for 12.50MiB (13107200B) on device ordinal 0: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

...
--fsdp=True
...

Windows gpu_environment.yml

It appears there is an issue running this on Windows? I'm getting the following error:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • jaxlib==0.3.15[build=cuda]
  • pytorch-cpu=1.13.0

I will likely wrap this up in a docker container in the meantime.

transformers version doesn't support Llama conversion to huggingface format

The transformers version pinned in the scripts/gpu_environment.yml file (transformers==4.27.2) leads to an import error when I run EasyLM.models.llama.convert_easylm_to_hf:

File "/scratch/users/ruiqi-zhong/conda/envs/EasyLM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/scratch/users/ruiqi-zhong/conda/envs/EasyLM/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/scratch/users/ruiqi-zhong/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 33, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
ImportError: cannot import name 'LlamaConfig' from 'transformers'

It can be easily fixed by pip installing the latest transformers library, though.

Install is slow; is there any Discord channel or group for communication?

Is there any Discord channel or group?

conda env create -f scripts/gpu_environment.yml

The install process is very slow:

Executing transaction: / By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

/ By downloading and using the cuDNN conda packages, you accept the terms and conditions of the NVIDIA cuDNN EULA -
  https://docs.nvidia.com/deeplearning/cudnn/sla/index.html
|

LLaMA Training

Hi!

First of all: thanks for your amazing repo. It's pretty easy to use, even for someone like me without an ML/Python background.

I am, however, struggling to use it to train LLaMA on a TPU.
Can you maybe give a short step-by-step guide?
For example: which weights should I load as the checkpoint (the original ones from Meta, the Hugging Face ones, or the JAX weights), and how do I convert the original Meta weights to JAX?

Script to Merge Koala Weights

What format is the model weight diff in, and how is it combined with the base weights?

Is there a script to merge them?
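
For illustration (a hedged sketch, not a confirmed description of the released format): a weight diff of this kind is typically merged by element-wise addition over matching parameter trees, and the repository ships EasyLM.scripts.diff_checkpoint with --recover_diff=True for exactly this recovery step. A minimal JAX sketch with checkpoint I/O left abstract:

import jax

def recover_params(base_params, diff_params):
    """Recover fine-tuned weights as base + diff over matching pytrees.

    Illustrative sketch; assumes the diff was produced element-wise as
    (fine-tuned - base), which is what a simple diff script would do."""
    return jax.tree_util.tree_map(lambda b, d: b + d, base_params, diff_params)

# The packaged route in this repo:
#   python -m EasyLM.scripts.diff_checkpoint --recover_diff=True ...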

FSDP vs Model Parallelism

@young-geng

Great project and documentation.

Can you further elucidate the difference between FSDP and model parallelism? Isn't FSDP already a form of model parallelism? I'm trying to understand the nuanced differences between 3-stage DeepSpeed ZeRO, FSDP, and "model parallelism".

Thanks!

Checksum for recovered models?

Hello and thank you for setting up an excellent repo!
I was wondering if you could provide checksums (say, md5sums) for the models recovered from the original LLaMA weights and the diff files?
(I am especially interested in Koala)
This way, people can be confident that they have managed to recover a sane model.

Thanks!
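
For reference, a minimal sketch of computing an md5 checksum for a recovered checkpoint file (the path is a placeholder):

import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# print(md5sum('path/to/recovered/koala_checkpoint'))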

A detailed question on LLaMA training script.

Thanks so much for implementing the JAX/Flax version of these foundation language models! This is really helpful for TPU-backed researchers.

I am still a beginner of Jax/Flax, and I have a detailed question on the LLaMA training script. When the train_state is created at:

train_state = sharded_create_trainstate_from_params(restored_params)

I wonder why you are using the sharded version of create_trainstate_from_params? It seems that in

FLAGS.load_checkpoint, train_state_shapes, shard_fns

the shard_fns are already passed into the checkpointer, so the output restored_params is already sharded across all TPU devices. Will there be any problems if I use create_trainstate_from_params instead of sharded_create_trainstate_from_params in Line 232 (assuming that I am not using distributed training)?

Thanks!

Training OPT with Koala dataset

Hi, thank you for making such nice work publicly available.

I have two issues I want to raise.

No. 1: in the code for processing all the datasets, https://github.com/young-geng/koala_data_pipeline,
I'm afraid some of the datasets are missing.
For example, line 14 of process_chat_data.py contains

input_file='/nfs/vault/data/language/chat_data_v3.json'

and that file must exist in order to run the script without an error.
Where can I get all of the input datasets that are listed in the processing Python files?

No. 2: I've looked for documentation on using the EasyLM library to fine-tune
the OPT model with the Koala dataset, but there is only documentation for fine-tuning
the LLaMA model.
Can I get any documentation on fine-tuning, for example, OPT-6.7B with the Koala dataset?

Again, thank you so much for this amazing work!

HF tokenizer taking too long to load

I followed the steps to convert the model into HF format, but when I load the tokenizer it takes around 300 seconds to load the converted tokenizer using tokenizer = AutoTokenizer.from_pretrained(model_path). Any ideas why?
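
One common cause (an assumption, not a confirmed diagnosis for this checkpoint): if the output directory only contains the slow SentencePiece tokenizer files, AutoTokenizer converts them to the fast Rust tokenizer on every load, which can take minutes for LLaMA-sized vocabularies. A sketch of two workarounds (model_path is a placeholder):

from transformers import AutoTokenizer

model_path = '/path/to/converted/hf/model'  # placeholder

# Option 1: skip the slow-to-fast conversion entirely.
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Option 2: pay the conversion cost once, then save the fast tokenizer
# files next to the model so subsequent loads are quick.
fast_tokenizer = AutoTokenizer.from_pretrained(model_path)  # slow only the first time
fast_tokenizer.save_pretrained(model_path)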

Error when trying to convert the Koala deltas to Hf format "can't convert np.ndarray of type bfloat16"

Firstly, thanks very much for releasing Koala and the code. I'm really looking forward to trying it.

I downloaded the 7B delta from https://drive.google.com/drive/folders/10f7wrlAFoPIy-TECHsx9DKIvbQYunCfl and am trying to use convert_easylm_to_hf to put the model into a format I can use with other tools.

I am running this on the command line:

tomj@Eddie ~/src/EasyLM (main)$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.models.llama.convert_easylm_to_hf --model_size=7b \
--output_dir=/Users/tomj/src/llama.cpp/models/koala/7B \
--load_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/koala_7b_diff_v2' \
--tokenizer_path=/Users/tomj/src/llama.cpp/models/tokenizer.model

And getting this error:

TypeError: can't convert np.ndarray of type bfloat16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.


I am running on an Intel macOS system on Ventura 13.3. Here's some env info:

  • transformers version: 4.28.0.dev0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.10.10
  • Huggingface_hub version: 0.13.3
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.1.0.dev20230402 (False)
  • Tensorflow version (GPU?): 2.12.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.6.8 (cpu)
  • Jax version: 0.4.8
  • JaxLib version: 0.4.7

Any help would be much appreciated.

Script for SFT?

First of all, thanks for the great work!
Could you point me to the SFT script? It might sound silly, but I can't find it anywhere.

Model serving example

Hi,

Thanks for the amazing repo. I am trying to serve local models and then run evaluation, but I can only find the related classes implemented in serving.py. Can you give some examples of how to initialize the classes, call the functions, and serve the model?

Thanks a lot!

Advice/expectations on throughput

Hello!

I'm looking into fine-tuning LLaMA-7b with EasyLM on a TPU v3-8. From my initial runs, I've found that I can get around 975 tokens/sec. I've tested all the flag combinations I can think of, but I'm unable to increase the batch size or gradient accumulation steps beyond 1 without OOMing.

I saw that you achieved a high throughput of 2,200 tokens/sec per TPU v4 chip on OpenLLaMA-7b, and that mesh-transformer-jax gets about 5k tokens/sec on a v3-8 for GPT-J, so I was curious whether there is an issue in my config.

Here's how I'm running it:

# Removed "jax_enable_async_all_gather", as it causes a crash on a v3-8. Without these flags, the throughput is 590 tokens/sec.
export LIBTPU_INIT_ARGS='--xla_jf_spmd_threshold_for_windowed_einsum_mib=0 --xla_tpu_spmd_threshold_for_allgather_cse=10000 --xla_tpu_spmd_rewrite_einsum_with_reshape=true --jax_enable_async_collective_offload=true --xla_tpu_enable_latency_hiding_scheduler=true TPU_MEGACORE=MEGACORE_DENSE'

python -m EasyLM.models.llama.llama_train \
    --dtype='fp32' \ # bf16 causes errors - is it only intended for serving?
    --mesh_dim='1,-1,1' \
    --load_llama_config='7b' \
    --optimizer.type='adamw' \
    --train_dataset.json_dataset.seq_length=2048 \
    --train_dataset.json_dataset.batch_size=1 \
    # ... omitting other flags which shouldn't affect throughput

Do you have any tips? Or is higher throughput only expected on larger TPU pods?

Thanks!

Code for fine-tuning?

Hey, awesome work! Thanks for sharing such amazing work with us all.

Can you provide or point to a fine-tuning script that we can use on our own data in a multi-GPU setup, if possible preferably with HF?

Recommended setup given a v4-512?

Hey,
If I understand correctly, you trained using a v4-512.
Can you share how you configured your setup and what sizes you used?
Thanks!
Ohad

Conda install slow

Hi!

First, thank you for this great repository. I noticed that conda takes a long time to examine conflicts and solve the environment. I was wondering if it would be better to include a setup.py and use pip install -e . instead. This would be much faster and wouldn't require modifying the system PYTHONPATH (export PYTHONPATH="${PWD}:$PYTHONPATH").
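
For illustration, a minimal sketch of what such a setup.py could look like (the package name, version, and dependency list are assumptions, not taken from the repository):

# setup.py (illustrative sketch; the dependencies below are placeholders)
from setuptools import setup, find_packages

setup(
    name='EasyLM',
    version='0.1.0',
    packages=find_packages(include=['EasyLM', 'EasyLM.*']),
    python_requires='>=3.8',
    install_requires=[
        # The real pins live in scripts/gpu_environment.yml.
        'jax',
        'flax',
        'optax',
        'transformers',
        'mlxu',
    ],
)

# Afterwards: pip install -e .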

Koala hyperparameters + Running on TPU pod?

Hi, Thanks for this great clean codebase!

I was a bit curious about 2 things:

  • What hyperparameters did you use to train the Koala model? It would be useful to know what worked well for training in this framework.
  • Is there anything special you have to do to run on TPU pods, especially with regard to data loading? I'm not sure how the data partitioning works with respect to multi-host processes and making sure each TPU device processes the correct data chunks.

For context, I've been trying to replicate the Stanford Alpaca model using this codebase and TPU pods, and so far I have found the trained model isn't as good (~36% on MMLU vs. ~41% for a 7B model trained on the formatted Alpaca data). Any pointers or advice regarding potential gotchas would be much appreciated!

optimizer.accumulate_gradient_steps: does changing this configuration increase memory usage?

For LLaMA 7B on TPU v3-8,
with --optimizer.accumulate_gradient_steps=1 it runs normally,
but with --optimizer.accumulate_gradient_steps=2 it runs out of memory.
Does increasing optimizer.accumulate_gradient_steps raise memory usage?
Do you have any good solutions?

python3 -m EasyLM.models.llama.llama_train \
    --mp_mesh_dim='4,1' \
    --optimizer.accumulate_gradient_steps=1 \
    --fsdp=True \
    ...
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 23.43G of 15.48G hbm. Exceeded hbm capacity by 7.95G.
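
For context (a hedged explanation, not a confirmed diagnosis): if gradient accumulation is implemented with something like optax.MultiSteps, the optimizer state gains an extra gradient-sized accumulator pytree, so enabling accumulation does add roughly one parameter-sized buffer of memory. A minimal sketch of the mechanism:

import jax.numpy as jnp
import optax

# Wrapping an optimizer with optax.MultiSteps accumulates gradients over
# k steps, which stores one extra gradient-shaped buffer in the state.
base_optimizer = optax.adamw(learning_rate=3e-4)
optimizer = optax.MultiSteps(base_optimizer, every_k_schedule=2)

params = {'w': jnp.zeros((4, 4))}
opt_state = optimizer.init(params)
# opt_state.acc_grads is a pytree shaped like params: this is the extra
# memory that appears when accumulation is enabled.

For a 7B model in fp32 that accumulator alone is roughly 28 GB spread over the mesh, which can be enough to tip a v3-8 (16 GiB of HBM per core) over the edge.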

difference between 13b v1 and v2 weight diffs + GPU requirements

First of all, I would like to sincerely thank you for providing the model weight diffs.

This is just in time for a tool I'm planning to start building tomorrow, which is intended to increase the accessibility of technical material for folks with disabilities.

Would you mind sharing some information about the differences between version 1 and version 2 of the model weight diffs?

Also, are you able to provide any details on the memory and GPU requirements for running each model for inference? Here is a spreadsheet (related thread) someone made for LLaMA, if that helps.

Support for multi-host GPU training

Does this support multi-host GPU training? I see the README says it supports GPU/TPU on a single host and multi-host training for TPU, but does not mention multi-host GPU training.

RAM requirements

Very nice library and I can't wait to get everything up and running. Thank you for sharing!!!!

I have installed the conda env and run the initial conversion as outlined in Koala.md.

Once that was done, I wanted to recover the model using the diff and that is where I ran out of memory.

I watched the system monitor steadily climb until my virtual memory and physical memory were 95% full and that is when the process was killed by the system.

I have a Dell Precision workstation with 32GB of RAM and Quadro P6000 with 24GB VRAM.

Do you provide an API ?

Hello, I just learned about your Koala AI through the media. If I understood correctly that it is possible to try Koala locally (with 128GB of RAM, according to one of the discussion topics!), I was wondering whether you offer an API, like ChatGPT for example?

On Linux there is an application called Bavarder, distributed as a Flatpak, which lets you query several AIs of this type, and I thought it would be really nice to be able to use a truly open-source and unrestricted AI.
Thanks a lot for your feedback.
