
mpttune's Issues

Add license

Please add a license to this repository; an open-source license such as Apache 2.0 would be a good choice.

output, instruction, input have been ignored

I'm training MPT-7B-STORYWRITER-4BIT-128G with alpaca_data_cleaned.json and the example setup. Training runs smoothly, but I get this warning:

The following columns in the training set don't have a corresponding argument in PeftModelForCausalLM.forward and have been ignored: output, instruction, input. If output, instruction, input are not expected by PeftModelForCausalLM.forward, you can safely ignore this message.

I'm not sure what effect it had on my results, and I can't find the part of the code that produces this warning.

Where can I find this part of the code?
Has anyone else seen this warning?
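This warning comes from Hugging Face's Trainer, not from mpttune itself: before training, Trainer's `_remove_unused_columns` step inspects the model's forward signature and drops any dataset columns the model does not accept. Since the Alpaca fields are already tokenized into `input_ids`/`labels` before reaching the model, dropping the raw `output`/`instruction`/`input` text columns is normally harmless; you can also disable the behavior with `remove_unused_columns=False` in `TrainingArguments`. A minimal sketch of the filtering logic (the forward signature below is illustrative, not mpttune's actual one):

```python
import inspect

# Illustrative stand-in for PeftModelForCausalLM.forward.
def forward(self, input_ids=None, attention_mask=None, labels=None):
    pass

# Columns present in the Alpaca-style dataset.
dataset_columns = ["input_ids", "attention_mask", "labels",
                   "instruction", "input", "output"]

# Trainer keeps only columns matching a forward() parameter name.
accepted = set(inspect.signature(forward).parameters)
ignored = [c for c in dataset_columns if c not in accepted]
print(ignored)  # ['instruction', 'input', 'output'] -- the warned-about columns
```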

Can this run on a multi-GPU setup?

Hi, I am using an EC2 'g5.12xlarge' instance with 4 A10G GPUs (4 x 24 GB). I was able to successfully fine-tune and generate on a single GPU, but on the multi-GPU machine I get this error:

Traceback (most recent call last):
  File "/home/ec2-user/venv/bin/mpttune", line 33, in
    sys.exit(load_entry_point('mpttune==0.1.0', 'console_scripts', 'mpttune')())
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/run.py", line 87, in main
    args.func(args)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/generate.py", line 71, in generate
    generated_ids = model.generate(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/generate.py", line 27, in autocast_generate
    return self.model.non_autocast_generate(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1565, in generate
    return self.sample(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2612, in sample
    outputs = self(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 864, in forward
    outputs = self.transformer(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 772, in forward
    layer_outputs = decoder_layer(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 443, in forward
    (b, self_attn_weights, present_key_value) = self.attn(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 373, in forward
    qkv = self.Wqkv(hidden_states)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/quantlinear.py", line 17, in forward
    out = self._forward_no_grad(x)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/quantlinear.py", line 26, in _forward_no_grad
    return tu.triton_matmul(x, self.qweight, self.scales, self.qzeros, self.g_idx, self.bits, self.maxq)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/triton_utils.py", line 246, in triton_matmul
    matmul_248_kernel[grid](input, qweight, output,
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/custom_autotune.py", line 110, in run
    return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
  File "", line 23, in matmul_248_kernel
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
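The crash is raised inside the Triton 4-bit matmul kernel. A plausible cause (not confirmed in this thread) is that accelerate's hooks have sharded the model across the four GPUs, so the kernel ends up reading a quantized weight tensor that lives on a different device than the activations. Since single-GPU runs work, one workaround sketch is to pin the process to a single device; the `generate` arguments below are placeholders for your actual mpttune command line:

```shell
# Restrict the process to GPU 0 so accelerate cannot shard the quantized
# weights across devices. Replace "..." with your real mpttune arguments.
CUDA_VISIBLE_DEVICES=0 mpttune generate ...
```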

A few questions on fine-tuning

  1. What is the maximum context window a 7B model can take? I am looking at a business problem with inputs ranging from 4K to 32K tokens.
  2. For the above task, is it better to fine-tune your 4-bit GPTQ-quantized model or the base model from scratch?
  3. Will a single A100 GPU be enough for the above task?
  4. How long would it take for, say, 10K samples and 10 epochs?
  5. I want to predict in batches. I see that evaluation already happens in batches. Can you add, or point me to, code that would let me use the fine-tuned model to predict on a batch of, say, size 8 or 10?
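On question 5, batched prediction is mostly a matter of padding all prompts in a batch to one length and decoding the results together. A minimal sketch, assuming a standard transformers-style tokenizer and model rather than mpttune's exact `generate` entry point; note that decoder-only models generally need left padding (`tokenizer.padding_side = "left"`) so that new tokens continue directly from each prompt:

```python
def batched(items, batch_size):
    """Split a list of prompts into fixed-size chunks (last one may be short)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def generate_batch(model, tokenizer, prompts, max_new_tokens=128):
    """Run generation on one batch of prompts (model/tokenizer are assumed
    to be a transformers-compatible pair; this is a sketch, not mpttune's API)."""
    # Pad the whole batch to a single tensor shape.
    enc = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    out = model.generate(**enc, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

# Usage sketch: iterate over batches of 8 prompts.
# for chunk in batched(all_prompts, 8):
#     results.extend(generate_batch(model, tokenizer, chunk))
```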
