Comments (6)
Replacing
model.resize_token_embeddings(len(tokenizer))
with
model_to_resize = model.module if hasattr(model, 'module') else model
model_to_resize.resize_token_embeddings(len(tokenizer))
(source: huggingface/transformers#7146)
no longer gives the above error message. I would like to know whether this is the right fix.
from ort.
Hi Bhavya,
In general, ORTModule does not forward the attributes of the underlying model, so methods such as resize_token_embeddings have to be called on the wrapped model itself. For now, yes, this is the correct fix. However, this API is subject to change, because exposing the attribute .module to get the underlying model has led to issues elsewhere. The name will likely change to something a bit less friendly, e.g. ._original_module.
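A minimal sketch of that workaround, written so it would keep working if the attribute is renamed. The helper name unwrap_model and the stand-in classes TinyModel and Wrapper are hypothetical and exist only for illustration; ._original_module is just the possible future name mentioned above, not a confirmed API. With a real model, the same helper would be applied to the ORTModule- or DDP-wrapped object.

```python
def unwrap_model(model, attr_names=("module", "_original_module")):
    """Peel wrapper layers until no known wrapper attribute is found.

    attr_names is an assumption: "module" is what the wrappers expose today,
    "_original_module" is only a possible future name.
    """
    while True:
        for name in attr_names:
            inner = getattr(model, name, None)
            if inner is not None and inner is not model:
                model = inner
                break
        else:
            # No wrapper attribute present: this is the underlying model.
            return model


class TinyModel:
    """Hypothetical stand-in for a Hugging Face model."""

    def __init__(self):
        self.vocab_size = 10

    def resize_token_embeddings(self, new_size):
        self.vocab_size = new_size
        return new_size


class Wrapper:
    """Hypothetical stand-in for a wrapper that does not forward attributes."""

    def __init__(self, module):
        self.module = module


inner = TinyModel()
wrapped = Wrapper(Wrapper(inner))  # e.g. two layers of wrapping

# Call the method on the underlying model, not on the wrapper.
unwrap_model(wrapped).resize_token_embeddings(12)
```

The loop handles nested wrappers, and an unwrapped model is returned unchanged, so the call site does not need its own hasattr check.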
For HF-GPT2, you should be able to use the following repository as-is:
https://github.com/microsoft/huggingface-transformers
In that repository, ORTModule is inserted in the Hugging Face trainer.py script itself:
https://github.com/microsoft/huggingface-transformers/blob/c1b959563ebb677f744382ea95ca891295092187/src/transformers/trainer.py#L1109
There is another tweak here to ensure that the DDP wrapping occurs correctly with more than one GPU:
https://github.com/microsoft/huggingface-transformers/blob/c1b959563ebb677f744382ea95ca891295092187/src/transformers/trainer.py#L926
I run the model using the following launch command:
python -m torch.distributed.launch --nproc_per_node 8 huggingface-transformers/examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 --do_train --label_smoothing 0.1 --max_steps 260 --logging_steps 1 --overwrite_output_dir --output_dir gpt2-results --logging_dir gpt2-tensorboard --per_device_train_batch_size 8 --fp16 --dataloader_num_workers 1 --ort --skip_memory_metrics
Let me know if you have other issues.
-- Suffian
Thank you, Suffian. I will try the repository at https://github.com/microsoft/huggingface-transformers.
Bhavya
Hi @bmedishe, is your issue resolved now?
Yes, @natke. Thank you.
Great, thanks. I will close this issue. Please reach out again if you need to.