
mpttune's Issues

Add license

Please add a license to this repository; an open-source license such as Apache 2.0 would be a good choice.

output, instruction, input have been ignored

I'm training MPT-7B-STORYWRITER-4BIT-128G with alpaca_data_cleaned.json and the example setup. Training runs smoothly, but I get this warning:

The following columns in the training set don't have a corresponding argument in PeftModelForCausalLM.forward and have been ignored: output, instruction, input. If output, instruction, input are not expected by PeftModelForCausalLM.forward, you can safely ignore this message.

I'm not sure what effect it had on my results, and I can't find the part of the code that produces this warning.

Where can I find this part of the code?
Has anyone else seen this warning?
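This warning comes from Hugging Face's Trainer, not from mpttune itself: before training, Trainer's `_remove_unused_columns` step inspects the model's forward signature and drops any dataset columns the model does not accept. Since the Alpaca fields are already tokenized into `input_ids`/`labels` before reaching the model, dropping the raw `output`/`instruction`/`input` text columns is normally harmless; you can also disable the behavior with `remove_unused_columns=False` in `TrainingArguments`. A minimal sketch of the filtering logic (the forward signature below is illustrative, not mpttune's actual one):

```python
import inspect

# Illustrative stand-in for PeftModelForCausalLM.forward.
def forward(self, input_ids=None, attention_mask=None, labels=None):
    pass

# Columns present in the Alpaca-style dataset.
dataset_columns = ["input_ids", "attention_mask", "labels",
                   "instruction", "input", "output"]

# Trainer keeps only columns matching a forward() parameter name.
accepted = set(inspect.signature(forward).parameters)
ignored = [c for c in dataset_columns if c not in accepted]
print(ignored)  # ['instruction', 'input', 'output'] -- the warned-about columns
```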

Can this run on a multi-GPU setup?

Hi, I am using an EC2 'g5.12xlarge' instance with 4 A10G GPUs (4 x 24 GB). I was able to successfully fine-tune and generate on a single GPU, but on the multi-GPU machine I get this error:

Traceback (most recent call last):
  File "/home/ec2-user/venv/bin/mpttune", line 33, in
    sys.exit(load_entry_point('mpttune==0.1.0', 'console_scripts', 'mpttune')())
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/run.py", line 87, in main
    args.func(args)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/generate.py", line 71, in generate
    generated_ids = model.generate(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/generate.py", line 27, in autocast_generate
    return self.model.non_autocast_generate(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1565, in generate
    return self.sample(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2612, in sample
    outputs = self(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 864, in forward
    outputs = self.transformer(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 772, in forward
    layer_outputs = decoder_layer(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 443, in forward
    (b, self_attn_weights, present_key_value) = self.attn(
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 373, in forward
    qkv = self.Wqkv(hidden_states)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/quantlinear.py", line 17, in forward
    out = self._forward_no_grad(x)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/quantlinear.py", line 26, in _forward_no_grad
    return tu.triton_matmul(x, self.qweight, self.scales, self.qzeros, self.g_idx, self.bits, self.maxq)
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/triton_utils.py", line 246, in triton_matmul
    matmul_248_kernel[grid](input, qweight, output,
  File "/home/ec2-user/venv/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/backend/triton/custom_autotune.py", line 110, in run
    return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
  File "", line 23, in matmul_248_kernel
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
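The crash is raised inside the Triton 4-bit matmul kernel. A plausible cause (not confirmed in this thread) is that accelerate's hooks have sharded the model across the four GPUs, so the kernel ends up reading a quantized weight tensor that lives on a different device than the activations. Since single-GPU runs work, one workaround sketch is to pin the process to a single device; the `generate` arguments below are placeholders for your actual mpttune command line:

```shell
# Restrict the process to GPU 0 so accelerate cannot shard the quantized
# weights across devices. Replace "..." with your real mpttune arguments.
CUDA_VISIBLE_DEVICES=0 mpttune generate ...
```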

A few questions on fine-tuning

  1. What is the maximum context window a 7B model can take? I am looking at a business problem with inputs ranging from 4K to 32K tokens.
  2. For the above task, is it better to fine-tune your 4-bit GPTQ-quantized model or the base model from scratch?
  3. Will a single A100 GPU be enough for the above task?
  4. How long would it take for, say, 10K samples and 10 epochs?
  5. I want to predict in batches. I see that evaluation already happens in batches. Can you add, or point me to, code that would let me use the fine-tuned model to predict on a batch of, say, size 8 or 10?
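On question 5, batched prediction is mostly a matter of padding all prompts in a batch to one length and decoding the results together. A minimal sketch, assuming a standard transformers-style tokenizer and model rather than mpttune's exact `generate` entry point; note that decoder-only models generally need left padding (`tokenizer.padding_side = "left"`) so that new tokens continue directly from each prompt:

```python
def batched(items, batch_size):
    """Split a list of prompts into fixed-size chunks (last one may be short)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def generate_batch(model, tokenizer, prompts, max_new_tokens=128):
    """Run generation on one batch of prompts (model/tokenizer are assumed
    to be a transformers-compatible pair; this is a sketch, not mpttune's API)."""
    # Pad the whole batch to a single tensor shape.
    enc = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    out = model.generate(**enc, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

# Usage sketch: iterate over batches of 8 prompts.
# for chunk in batched(all_prompts, 8):
#     results.extend(generate_batch(model, tokenizer, chunk))
```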
