Comments (4)
Based on that import, it appears that the minimum supported version of transformers is actually v4.35.0.
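If it helps, a check along these lines (just a sketch; the 4.35.0 floor is inferred from that import, not confirmed anywhere else) can fail fast when an older transformers is installed:

# Hedged sketch: fail fast if the installed transformers is older than the
# version that import appears to require.
import transformers
from packaging import version

MIN_TRANSFORMERS = "4.35.0"  # inferred minimum, not an official requirement
if version.parse(transformers.__version__) < version.parse(MIN_TRANSFORMERS):
    raise RuntimeError(
        f"transformers>={MIN_TRANSFORMERS} required, found {transformers.__version__}"
    )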
Yes, I appreciate your input. I did attempt to use Transformers v4.35.0, but encountered the following error:
/opt/conda/lib/python3.10/site-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
  table = cls._concat_blocks(blocks, axis=0)
Traceback (most recent call last):
  File "/app/yarn_4/finetune.py", line 295, in <module>
    main(args.parse_args())
  File "/app/yarn_4/finetune.py", line 158, in main
    model.gradient_checkpointing_enable()
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1872, in gradient_checkpointing_enable
    self._set_gradient_checkpointing(enable=True, gradient_checkpointing_func=gradient_checkpointing_func)
TypeError: LlamaPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'
Further investigation led me to related GitHub issues:
According to these discussions, it's suggested that using Transformers v4.34.0 resolves the issue. I wanted to share this additional information for context.
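If staying on v4.35.0 were necessary, one idea (only a sketch, assuming the patched Llama model still overrides the old (module, value) form of _set_gradient_checkpointing) would be to dispatch on the override's signature instead of calling gradient_checkpointing_enable() directly:

# Sketch of a version-tolerant helper; enable_gradient_checkpointing is a
# hypothetical name, not part of the repo.
import inspect
from functools import partial

def enable_gradient_checkpointing(model):
    params = inspect.signature(model._set_gradient_checkpointing).parameters
    if "enable" in params:
        # transformers >= 4.35 style: the base-class helper handles everything.
        model.gradient_checkpointing_enable()
    else:
        # transformers <= 4.34 style: apply the old (module, value) override
        # over all submodules, the way older gradient_checkpointing_enable did.
        model.apply(partial(model._set_gradient_checkpointing, value=True))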
Besides, I can replace the is_flash_attn_2_available check with another implementation to work around the problem, changing the imports here to:
from transformers.utils import (
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
    # is_flash_attn_2_available,
    logging,
    replace_return_docstrings,
)
from .configuration_llama import LlamaConfig

# if is_flash_attn_2_available():
try:
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
except ImportError:
    print("flash_attn import failed; is_flash_attn_2_available check bypassed")
But at the same time, when I try to use --deepspeed, something goes wrong:
Traceback (most recent call last):
  File "/app/yarn_4/finetune.py", line 295, in <module>
    main(args.parse_args())
  File "/app/yarn_4/finetune.py", line 160, in main
    accelerator.register_for_checkpointing(scheduler)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 3139, in register_for_checkpointing
    raise ValueError(err)
ValueError: All `objects` must include a `state_dict` and `load_state_dict` function to be stored. The following inputs are invalid:
  - Item at index 0, `DummyScheduler`
Further investigation led me to related GitHub issues.
How can I fix this problem? (This time Transformers is still 4.34.0.)
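One workaround I am considering (just a sketch, not a confirmed fix for this repo) is to skip the registration when DeepSpeed supplies the scheduler, since accelerate's DummyScheduler has no state_dict to checkpoint:

# Sketch around the line from the traceback (finetune.py:160); the scheduler
# and accelerator variables are assumed from that context.
from accelerate.utils import DummyScheduler

if not isinstance(scheduler, DummyScheduler):
    accelerator.register_for_checkpointing(scheduler)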
> I did attempt to use Transformers v4.35.0, but encountered the following error:
> TypeError: LlamaPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'
This was caused by a recent transformers change -- I've pushed cdb6459 to fix this!