
EasyLM's Issues

How do you handle the attention mask for dataset chunks?

Hi,

I find that in the following code, EasyLM processes the dataset by taking fixed-size chunks. From my understanding, this can place different documents in the same chunk. For example, the first document might take 512 tokens while the second takes 128 tokens in a chunk of 640 tokens. In that case, I think generation for the second document should not attend to the first document, so we would need an attention mask to block it. But I don't see any related code in EasyLM, so I am wondering how you handle the attention mask for this case.

EasyLM/EasyLM/data.py

Lines 158 to 183 in 18375bd

for index, example in enumerate(self._dataset):
    tokens, loss_masks = self.text_processor(example)
    token_buffer.extend(tokens)
    loss_mask_buffer.extend(loss_masks)
    while len(token_buffer) > chunk_size + 1:
        total_tokens += chunk_size
        metrics = {
            'dataset_example_index': index,
            'dataset_total_tokens': total_tokens,
        }
        batch = {
            'input_tokens': np.array(token_buffer[:chunk_size], dtype=np.int32).reshape(
                self.config.batch_size, -1
            ),
            'target_tokens': np.array(token_buffer[1:chunk_size + 1], dtype=np.int32).reshape(
                self.config.batch_size, -1
            ),
            'loss_masks': np.array(loss_mask_buffer[1:chunk_size + 1], dtype=np.float32).reshape(
                self.config.batch_size, -1
            ),
        }
        if self.config.always_start_with_bos:
            batch['input_tokens'][:, 0] = self.tokenizer.bos_token_id
        yield batch, metrics
        token_buffer = token_buffer[chunk_size:]
        loss_mask_buffer = loss_mask_buffer[chunk_size:]
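
For illustration only (this is not EasyLM code, and the question above stands): if one wanted to block attention across packed documents, a block-diagonal causal mask can be derived from the EOS positions inside each chunk. A minimal NumPy sketch, with eos_token_id as an assumed placeholder:

import numpy as np

def document_attention_mask(input_tokens, eos_token_id):
    """Causal mask of shape [batch, seq, seq] that also blocks attention
    across document boundaries (marked by EOS) within a packed chunk.
    Illustrative sketch only, not part of EasyLM."""
    batch_size, seq_len = input_tokens.shape
    eos = (input_tokens == eos_token_id)
    # A new document starts at the position right after each EOS token.
    doc_ids = np.cumsum(np.pad(eos[:, :-1], ((0, 0), (1, 0))), axis=-1)
    same_doc = doc_ids[:, :, None] == doc_ids[:, None, :]
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    return same_doc & causal  # True where attention is allowed

# Example: two documents packed into one 8-token sequence (2 = assumed EOS id).
tokens = np.array([[5, 6, 7, 2, 8, 9, 10, 11]])
mask = document_attention_mask(tokens, eos_token_id=2)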

When using the 13B model with the following configuration, a memory error occurs on v3-8. May I ask what the reason is?

2023-04-06 07:42:51.278258: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 7
2023-04-06 07:42:51.278359: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 1
2023-04-06 07:42:51.278403: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 3
2023-04-06 07:42:51.278439: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 2
2023-04-06 07:42:51.278486: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 0
2023-04-06 07:42:51.278526: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 4
2023-04-06 07:42:51.278553: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 5
2023-04-06 07:42:51.278591: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Failed to allocate request for 625.00MiB (655360000B) on device ordinal 6

nohup python -m EasyLM.models.llama.llama_train \
    --mp_mesh_dim='-1,1' \
    --load_llama_config='13b' \
    --load_checkpoint="params::${EASYLM_CHECKPOINT_DIR}/checkpoint" \
    --tokenizer.vocab_file=${TOKENIZER_FILE} \
    --seed=42 \
    --initialize_jax_distributed=False \
    --total_steps=1000 \
    --log_freq=10 \
    --save_model_freq=100 \
    --save_milestone_freq=500 \
    --eval_steps=100 \
    --train_dataset.text_processor.fields='[input],output' \
    --train_dataset.text_processor.add_eos_token=True \
    --train_dataset.type='json' \
    --train_dataset.json_dataset.path=${TRAIN_DATA_FILE} \
    --train_dataset.json_dataset.seq_length=1024 \
    --train_dataset.json_dataset.batch_size=2 \
    --eval_dataset.text_processor.fields='[input],output' \
    --eval_dataset.text_processor.add_eos_token=True \
    --eval_dataset.type='json' \
    --eval_dataset.json_dataset.path=${EVAL_DATA_FILE} \
    --eval_dataset.json_dataset.seq_length=1024 \
    --eval_dataset.json_dataset.batch_size=2
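
For rough context (a back-of-envelope estimate under assumed settings, not a diagnosis of this exact run): full fine-tuning with AdamW keeps parameters, gradients, and two optimizer moments in memory, which for 13B parameters in fp32 already exceeds the 128 GiB of total HBM on a v3-8:

# Back-of-envelope HBM estimate; assumes fp32 parameters and full AdamW state.
params = 13e9                                   # 13B parameters
bytes_per_value = 4                             # fp32
train_state = params * bytes_per_value * 4      # params + grads + 2 Adam moments
print(f"train state: ~{train_state / 2**30:.0f} GiB")   # ~194 GiB
print(f"v3-8 total HBM: {8 * 16} GiB")                   # 128 GiB

Sharding across a larger slice, reduced-precision parameters or optimizer states, and parameter-efficient fine-tuning are the usual ways around this.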

Error when converting to HF

Command:

python -m EasyLM.models.llama.convert_easylm_to_hf \
    --load_checkpoint='params::/home/nap/Downloads/githubs/EasyLM/easylm_checkpoint/koala_13b.diff.weights' \
    --tokenizer_path='/home/nap/Documents/text-generation-webui/models/llama_original/13B/tokenizer.model' \
    --model_size='13b' \
    --output_dir='/home/nap/Downloads/githubs/EasyLM/easylm_checkpoint/koala-13B-HF'

Output:

Traceback (most recent call last):
  File "/home/nap/miniconda3/envs/EasyLM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/nap/miniconda3/envs/EasyLM/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/nap/Downloads/githubs/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 32, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
ImportError: cannot import name 'LlamaConfig' from 'transformers' (/home/nap/miniconda3/envs/EasyLM/lib/python3.8/site-packages/transformers/__init__.py)

Do I not have the right version of Transformers? I used these commands from the docs to set up conda:

conda env create -f scripts/gpu_environment.yml
conda activate EasyLM

(EasyLM) nap@wintermute:~/Downloads/githubs/EasyLM$ pip freeze | grep transform
transformers==4.27.2
(Ubuntu 22)
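
For what it's worth, LlamaConfig and LlamaForCausalLM were only added in transformers 4.28.0, so the pinned 4.27.2 cannot provide them. An illustrative sanity check (not from the repo):

# Illustrative check: the LLaMA classes require transformers >= 4.28.0.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.28.0"), (
    f"transformers {transformers.__version__} is too old for LlamaConfig"
)
from transformers import LlamaConfig, LlamaForCausalLM  # succeeds on >= 4.28.0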

Strategies for handling large corpora through Google Cloud Storage?

Hi!
I have recently come across this repository and have been conducting tests using TPUv4 pods.

As part of my experimentation, I have explored several approaches for feeding datasets into the model,
including utilizing Hugging Face datasets or employing JSON files (with lines) either locally or through a GCS bucket.

During my analysis, I noticed that the JSON data loader appears to download the entire JSON file from the gs:// directory and then tokenize and yield the data line by line. This approach presents a challenge when dealing with corpus files exceeding 1TB in size, as it is not practical to store such extensive data in a single JSON file.

I am curious to learn how you handle this issue, and I would appreciate any insights!

Thanks for your great work 👍
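
For illustration, a minimal sketch of streaming a JSONL corpus from GCS one line at a time instead of materializing the whole file (gcsfs is an assumed dependency and the path is a placeholder; this is an illustrative alternative, not EasyLM's own loader):

import json
import gcsfs  # assumed dependency, not pinned by EasyLM

def iter_jsonl_from_gcs(path):
    """Stream a JSONL file from a gs:// path one line at a time."""
    fs = gcsfs.GCSFileSystem()
    with fs.open(path, 'rt') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage sketch: shard a >1TB corpus into many files and iterate over the shards.
# for example in iter_jsonl_from_gcs('gs://my-bucket/corpus/shard-00000.jsonl'):
#     tokens, loss_masks = text_processor(example)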

[Doc error]: Outdated doc for LLAMA

I am trying to run LLaMA using EasyLM, following the README for LLaMA. The first step is converting the raw LLaMA parameters.

python -m EasyLM.models.llama.convert_torch_to_easylm.py \
    --checkpoint_dir='path/to/torch/llama/checkpoint' \
    --output_dir='path/to/output/easylm/checkpoint' \
    --streaming=True

The argument output_dir does not appear in convert_torch_to_easylm.py; it should now be output_file, as shown in the code.

I wonder if the doc is outdated.

Add LoRA support?

LoRA fine-tuning is much faster and uses less memory than full fine-tuning.

Can't understand how to convert Koala deltas into HF format? Keep getting error `TypeError: can't convert np.ndarray of type bfloat16.`

Firstly, thanks very much for releasing Koala and the code. I'm really looking forward to trying it.

I downloaded the 7B delta from https://drive.google.com/drive/folders/10f7wrlAFoPIy-TECHsx9DKIvbQYunCfl and would like to convert the delta to an HF format that I can use with other tools.

Here are the commands I have run:

First, I convert base Llama weights to EasyLM format

$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.models.llama.convert_torch_to_easylm \
--checkpoint_dir=/Users/tomj/Downloads/Torrents/Done/LLaMA/7B \
--output_file=/Users/tomj/src/llama.cpp/models/koala/7B/llama-7b-LM \
--streaming=True

Then I run diff_checkpoint, comparing the original LLaMA weights with the 7B delta from the Google Drive link

$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.scripts.diff_checkpoint --recover_diff=True \
--load_base_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/7B/llama-7b-LM' \
--load_target_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/7B/koala_7b_diff_v2' \
--output_file=/Users/tomj/src/llama.cpp/models/koala/7B/koala_7b_diff.diff \
--streaming=True

Finally, I run convert_easylm_to_hf, trying to convert the resulting weights to HF

$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.models.llama.convert_easylm_to_hf --model_size=7b \
--output_dir=/Users/tomj/src/llama.cpp/models/koala/7B/HF \
--load_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/7B/koala_7b_diff.diff' \
--tokenizer_path=/Users/tomj/src/llama.cpp/models/tokenizer.model

But I always get this error:

TypeError: can't convert np.ndarray of type bfloat16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Am I misunderstanding how these scripts are supposed to work? How can I get an HF version of the Koala deltas?

Or, how can I apply the Koala deltas to the original Llama 7B, and then convert that to HF?

Here's the full output from running the convert script:

Fetching the tokenizer from /Users/tomj/src/llama.cpp/models/tokenizer.model.
/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py:94: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1680419296502/work/torch/csrc/utils/tensor_numpy.cpp:212.)
  torch_params[key] = torch.from_numpy(tensor)
Traceback (most recent call last):
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 233, in <module>
    mlxu.run(main)
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/Users/tomj/anaconda3/envs/torch21/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 226, in main
    load_and_convert_checkpoint(FLAGS.load_checkpoint),
  File "/Users/tomj/src/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 94, in load_and_convert_checkpoint
    torch_params[key] = torch.from_numpy(tensor)
TypeError: can't convert np.ndarray of type bfloat16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

I am running on an Intel macOS system on Ventura 13.3. Here's some env info:

  • transformers version: 4.28.0.dev0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.10.10
  • Huggingface_hub version: 0.13.3
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.1.0.dev20230402 (False)
  • Tensorflow version (GPU?): 2.12.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.6.8 (cpu)
  • Jax version: 0.4.8
  • JaxLib version: 0.4.7

Any help would be much appreciated.
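
For what it's worth, torch.from_numpy does not accept NumPy arrays that use the extension bfloat16 dtype found in JAX/Flax checkpoints. A common workaround (an illustrative sketch, not the repository's own fix) is to cast such arrays to float32 before handing them to PyTorch:

import numpy as np
import torch

def to_torch(tensor):
    """Convert a (possibly bfloat16) NumPy array to a torch.Tensor.

    torch.from_numpy rejects the extension bfloat16 dtype, so cast such
    arrays to float32 first. Illustrative workaround, not EasyLM code."""
    if tensor.dtype.name == 'bfloat16':
        tensor = tensor.astype(np.float32)
    return torch.from_numpy(np.ascontiguousarray(tensor))

The resulting HF checkpoint can then be re-cast to float16 or bfloat16 on the PyTorch side if a smaller file is needed.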

When using fsdp=true, the error message is the same. Does this not have any effect?

When I use 13B with batch_size=1, an error occurs on v3-8 as follows. With fsdp=True the error message is the same. Does this flag not have any effect?

jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Failed to allocate request for 12.50MiB (13107200B) on device ordinal 0: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

...
--fsdp=True
...

Windows gpu_environment.yml

It appears there is an issue running this on Windows? I'm getting the following error:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • jaxlib==0.3.15[build=cuda]
  • pytorch-cpu=1.13.0

I will likely wrap this up in a docker container in the meantime.

transformers version doesn't support Llama conversion to huggingface format

The transformers version pinned in the scripts/gpu_environment.yml file (transformers==4.27.2) leads to an import error when I run EasyLM.models.llama.convert_easylm_to_hf:

File "/scratch/users/ruiqi-zhong/conda/envs/EasyLM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/scratch/users/ruiqi-zhong/conda/envs/EasyLM/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/scratch/users/ruiqi-zhong/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 33, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
ImportError: cannot import name 'LlamaConfig' from 'transformers'

It can be easily fixed by pip installing the latest transformers library, though.

Install is slow; is there any Discord channel or group for communication?

Is there any Discord channel or group?

conda env create -f scripts/gpu_environment.yml

The install process is very slow:

Executing transaction: / By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

/ By downloading and using the cuDNN conda packages, you accept the terms and conditions of the NVIDIA cuDNN EULA -
  https://docs.nvidia.com/deeplearning/cudnn/sla/index.html
|

LLaMA Training

Hi!

First of all: thanks for your amazing repo. It's pretty easy to use, even for someone like me without an ML/Python background.

I am, however, struggling to use it to train LLaMA on a TPU.
Can you maybe give a short step-by-step guide?
For example: which weights should I load as the checkpoint (the original ones from Meta, the Hugging Face ones, or the JAX weights), and how do I convert the original Meta weights to JAX?

Script to Merge Koala Weights

What format is the model weight diff in, and how is it combined with the base weights?

Is there a script to merge them?
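
For illustration (a hedged sketch, not a confirmed description of the released format): a weight diff of this kind is typically merged by element-wise addition over matching parameter trees, and the repository ships EasyLM.scripts.diff_checkpoint with --recover_diff=True for exactly this recovery step. A minimal JAX sketch with checkpoint I/O left abstract:

import jax

def recover_params(base_params, diff_params):
    """Recover fine-tuned weights as base + diff over matching pytrees.

    Illustrative sketch; assumes the diff was produced element-wise as
    (fine-tuned - base), which is what a simple diff script would do."""
    return jax.tree_util.tree_map(lambda b, d: b + d, base_params, diff_params)

# The packaged route in this repo:
#   python -m EasyLM.scripts.diff_checkpoint --recover_diff=True ...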

FSDP vs Model Parallelism

@young-geng

Great project and documentation.

Can you further elucidate the difference between FSDP and model parallelism? Isn't FSDP already a form of model parallelism? I'm trying to understand the nuanced differences between 3-stage DeepSpeed ZeRO, FSDP, and "model parallelism".

Thanks!

Checksum for recovered models?

Hello and thank you for setting up an excellent repo!
I was wondering if you could provide checksums (say, md5sums) for the models recovered from the original LLaMA weights and the diff files?
(I am especially interested in Koala)
This way, people can be confident that they have managed to recover a sane model.

Thanks!
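
For reference, a minimal sketch of computing an md5 checksum for a recovered checkpoint file (the path is a placeholder):

import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# print(md5sum('path/to/recovered/koala_checkpoint'))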

A detailed question on LLaMA training script.

Thanks so much for implementing the JAX/Flax version of these foundation language models! This is really helpful for TPU-backed researchers.

I am still a beginner of Jax/Flax, and I have a detailed question on the LLaMA training script. When the train_state is created at:

train_state = sharded_create_trainstate_from_params(restored_params)

I wonder why you are using the sharded version of create_trainstate_from_params? It seems that in

FLAGS.load_checkpoint, train_state_shapes, shard_fns

the shard_fns are already passed into the checkpointer, so the output restored_params is already sharded across all TPU devices. Will there be any problems if I use create_trainstate_from_params instead of sharded_create_trainstate_from_params in Line 232 (assuming that I am not using distributed training)?

Thanks!

Training OPT with Koala dataset

Hi, thank you for making such nice work publicly available.

I have two issues I want to raise.

No. 1: in the code for processing all the datasets, https://github.com/young-geng/koala_data_pipeline,
I'm afraid some of the datasets are missing.
For example, line 14 of process_chat_data.py contains

input_file='/nfs/vault/data/language/chat_data_v3.json'

and that file must exist in order to run the script without an error.
Where can I get all of the input datasets that are listed in the processing Python files?

No. 2: I've looked for documentation on using the EasyLM library to fine-tune
the OPT model with the Koala dataset, but there is only documentation for fine-tuning
the LLaMA model.
Can I get any documentation on fine-tuning, for example, OPT-6.7B with the Koala dataset?

Again, thank you so much for this amazing work!

HF tokenizer taking too long to load

I followed the steps to convert the model into HF format, but when I load the tokenizer it takes around 300 seconds to load the converted tokenizer using tokenizer = AutoTokenizer.from_pretrained(model_path). Any ideas why?
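
One common cause (an assumption, not a confirmed diagnosis for this checkpoint): if the output directory only contains the slow SentencePiece tokenizer files, AutoTokenizer converts them to the fast Rust tokenizer on every load, which can take minutes for LLaMA-sized vocabularies. A sketch of two workarounds (model_path is a placeholder):

from transformers import AutoTokenizer

model_path = '/path/to/converted/hf/model'  # placeholder

# Option 1: skip the slow-to-fast conversion entirely.
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Option 2: pay the conversion cost once, then save the fast tokenizer
# files next to the model so subsequent loads are quick.
fast_tokenizer = AutoTokenizer.from_pretrained(model_path)  # slow only the first time
fast_tokenizer.save_pretrained(model_path)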

Error when trying to convert the Koala deltas to Hf format "can't convert np.ndarray of type bfloat16"

Firstly, thanks very much for releasing Koala and the code. I'm really looking forward to trying it.

I downloaded the 7B delta from https://drive.google.com/drive/folders/10f7wrlAFoPIy-TECHsx9DKIvbQYunCfl and am trying to use convert_easylm_to_hf to put the model into a format I can use with other tools.

I am running this on the command line:

tomj@Eddie ~/src/EasyLM (main)$ PYTHON_PATH="${PWD}:$PYTHONPATH" ~/anaconda3/envs/torch21/bin/python \
-m EasyLM.models.llama.convert_easylm_to_hf --model_size=7b \
--output_dir=/Users/tomj/src/llama.cpp/models/koala/7B \
--load_checkpoint='params::/Users/tomj/src/llama.cpp/models/koala/koala_7b_diff_v2' \
--tokenizer_path=/Users/tomj/src/llama.cpp/models/tokenizer.model

And getting this error:

TypeError: can't convert np.ndarray of type bfloat16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.


I am running on an Intel macOS system on Ventura 13.3. Here's some env info:

  • transformers version: 4.28.0.dev0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.10.10
  • Huggingface_hub version: 0.13.3
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.1.0.dev20230402 (False)
  • Tensorflow version (GPU?): 2.12.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.6.8 (cpu)
  • Jax version: 0.4.8
  • JaxLib version: 0.4.7

Any help would be much appreciated.

Script for SFT?

First of all, thanks for the great work!
Could you point me to the SFT script? It might sound silly, but I can't find it anywhere.

Model serving example

Hi,

Thanks for the amazing repo. I am trying to serve local models and then run evaluation, but I can only find the related classes implemented in serving.py. Can you give some examples of how to initialize the classes, call the functions, and serve the model?

Thanks a lot!

Advice/expectations on throughput

Hello!

I'm looking into fine-tuning LLaMA-7b with EasyLM on a TPU v3-8. From my initial runs, I've found that I can get around 975 tokens/sec. I've tested all the flag combinations I can think of, but I'm unable to increase the batch size or gradient accumulation steps beyond 1 without OOMing.

I saw that you achieved a high throughput of 2,200 tokens/sec per TPU v4 chip on OpenLLaMA-7b, and that mesh-transformer-jax gets about 5k tokens/sec on a v3-8 for GPT-J, so I was curious whether there is an issue in my config.

Here's how I'm running it:

# Removed "jax_enable_async_all_gather", as it causes a crash on a v3-8. Without these flags, the throughput is 590 tokens/sec.
export LIBTPU_INIT_ARGS='--xla_jf_spmd_threshold_for_windowed_einsum_mib=0 --xla_tpu_spmd_threshold_for_allgather_cse=10000 --xla_tpu_spmd_rewrite_einsum_with_reshape=true --jax_enable_async_collective_offload=true --xla_tpu_enable_latency_hiding_scheduler=true TPU_MEGACORE=MEGACORE_DENSE'

python -m EasyLM.models.llama.llama_train \
    --dtype='fp32' \ # bf16 causes errors - is it only intended for serving?
    --mesh_dim='1,-1,1' \
    --load_llama_config='7b' \
    --optimizer.type='adamw' \
    --train_dataset.json_dataset.seq_length=2048 \
    --train_dataset.json_dataset.batch_size=1 \
    # ... omitting other flags which shouldn't affect throughput

Do you have any tips? Or is higher throughput only expected on larger TPU pods?

Thanks!

Code for fine-tuning?

Hey, awesome work! Thanks for sharing such amazing work with us all.

Can you provide or point to a fine-tuning script that we can use on our own data in a multi-GPU setup, if possible preferably with HF?

Recommended setup given a v4-512?

Hey,
If I understand correctly, you trained using a v4-512.
Can you share how you configured your setup and what sizes you used?
Thanks!
Ohad

Conda install slow

Hi!

First, thank you for this great repository. I noticed that conda takes a long time to examine conflicts and solve the environment. I was wondering if it would be better to include a setup.py and use pip install -e . instead. This would be much faster and wouldn't require modifying the system PYTHONPATH (export PYTHONPATH="${PWD}:$PYTHONPATH").
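
For illustration, a minimal sketch of what such a setup.py could look like (the package name, version, and dependency list are assumptions, not taken from the repository):

# setup.py (illustrative sketch; the dependencies below are placeholders)
from setuptools import setup, find_packages

setup(
    name='EasyLM',
    version='0.1.0',
    packages=find_packages(include=['EasyLM', 'EasyLM.*']),
    python_requires='>=3.8',
    install_requires=[
        # The real pins live in scripts/gpu_environment.yml.
        'jax',
        'flax',
        'optax',
        'transformers',
        'mlxu',
    ],
)

# Afterwards: pip install -e .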

Koala hyperparameters + Running on TPU pod?

Hi, Thanks for this great clean codebase!

I was a bit curious about 2 things:

  • What hyperparameters did you use to train the Koala model? It would be useful to know what worked well for training in this framework.
  • Is there anything special you have to do to run on TPU pods, especially with regard to data loading? I'm not sure how the data partitioning works with respect to multi-host processes and making sure each TPU device processes the correct data chunks.

For context, I've been trying to replicate the Stanford Alpaca model using this codebase and TPU pods, and so far I have found the trained model isn't as good (~36% on MMLU vs. ~41% for a 7B model trained on the formatted Alpaca data). Any pointers or advice regarding potential gotchas would be much appreciated!

optimizer.accumulate_gradient_steps: does changing this configuration increase memory usage?

For LLaMA 7B on TPU v3-8,
with --optimizer.accumulate_gradient_steps=1 it runs normally,
but with --optimizer.accumulate_gradient_steps=2 it runs out of memory.
Does increasing optimizer.accumulate_gradient_steps raise memory usage?
Do you have any good solutions?

python3 -m EasyLM.models.llama.llama_train \
    --mp_mesh_dim='4,1' \
    --optimizer.accumulate_gradient_steps=1 \
    --fsdp=True \
    ...
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 23.43G of 15.48G hbm. Exceeded hbm capacity by 7.95G.
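
For context (a hedged explanation, not a confirmed diagnosis): if gradient accumulation is implemented with something like optax.MultiSteps, the optimizer state gains an extra gradient-sized accumulator pytree, so enabling accumulation does add roughly one parameter-sized buffer of memory. A minimal sketch of the mechanism:

import jax.numpy as jnp
import optax

# Wrapping an optimizer with optax.MultiSteps accumulates gradients over
# k steps, which stores one extra gradient-shaped buffer in the state.
base_optimizer = optax.adamw(learning_rate=3e-4)
optimizer = optax.MultiSteps(base_optimizer, every_k_schedule=2)

params = {'w': jnp.zeros((4, 4))}
opt_state = optimizer.init(params)
# opt_state.acc_grads is a pytree shaped like params: this is the extra
# memory that appears when accumulation is enabled.

For a 7B model in fp32 that accumulator alone is roughly 28 GB spread over the mesh, which can be enough to tip a v3-8 (16 GiB of HBM per core) over the edge.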

difference between 13b v1 and v2 weight diffs + GPU requirements

First of all, I would like to sincerely thank you for providing the model weight diffs.

This is just in time for a tool I'm planning to start building tomorrow, which is intended to increase the accessibility of technical material for folks with disabilities.

Would you mind sharing some information about the differences between version 1 and version 2 of the model weight diffs?

Also, are you able to provide any details on the memory and GPU requirements for running each model for inference? Here is a spreadsheet (related thread) someone made for LLaMA, if that helps.

Support for multi-host GPU training

Does this support multi-host GPU training? I see the README says it supports GPU/TPU on a single host and multi-host training for TPU, but does not mention multi-host GPU training.

RAM requirements

Very nice library and I can't wait to get everything up and running. Thank you for sharing!!!!

I have installed the conda env and run the initial conversion as outlined in Koala.md.

Once that was done, I wanted to recover the model using the diff and that is where I ran out of memory.

I watched the system monitor steadily climb until my virtual memory and physical memory were 95% full and that is when the process was killed by the system.

I have a Dell Precision workstation with 32GB of RAM and Quadro P6000 with 24GB VRAM.

Do you provide an API ?

Hello, I just learned about your Koala AI through the media. If I understood correctly that it is possible to try Koala locally (with 128GB of RAM, according to one of the discussion topics!), I was wondering whether you offer an API, like ChatGPT for example?

On Linux there is an application called Bavarder, distributed as a Flatpak, which lets you query several AIs of this type, and I thought it would be really nice to be able to use a truly open-source and unrestricted AI.
Thanks a lot for your feedback.
