Comments (5)
This is my command to start the training job:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_deploy.py",
    source_dir="code_chesterton",
    role=role,
    framework_version="1.5",
    py_version="py3",
    instance_count=2,  # this script only supports distributed training on GPU instances
    instance_type="ml.p3.8xlarge",
    debugger_hook_config=False,
)
estimator.fit({"training": inputs_train, "validation": inputs_valid})
from sagemaker-debugger.
In the test script, calling the following tokenizer function while mapping the dataset changes the type of 'os.environ' from 'os._Environ' to 'dict':
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
This makes the later get() call fail, because the get() method of the 'dict' class does not accept 'default' as a keyword argument.
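The difference between the two types can be reproduced without transformers or smdebug. The following is a minimal sketch, not the code path from either library: the variable name `SMDEBUG_HYPOTHETICAL_KEY` is made up for illustration and is assumed not to be set in the environment, and the copy via `dict(os.environ)` only stands in for whatever replaces 'os.environ' during the dataset mapping.

```python
import os

# Hypothetical variable name, assumed to be unset in the current environment.
key = "SMDEBUG_HYPOTHETICAL_KEY"

# os.environ is an os._Environ (a MutableMapping), whose get() is a normal
# Python method, so 'default' can be passed as a keyword argument.
value_from_environ = os.environ.get(key, default="fallback")

# A plain dict stands in for the replaced os.environ.
plain = dict(os.environ)

# dict.get() only takes positional arguments, so the same call raises TypeError.
try:
    plain.get(key, default="fallback")
    keyword_ok = True
    error_message = ""
except TypeError as exc:
    keyword_ok = False
    error_message = str(exc)

# Passing the default positionally works for both types.
value_positional = plain.get(key, "fallback")

print(type(os.environ).__name__, type(plain).__name__)
print(keyword_ok, error_message)
```

Passing the default positionally, as in the last call, works whether the object is an 'os._Environ' or a 'dict'.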
IMO we should file an issue with the transformers package.
We have filed an issue here: huggingface/datasets#2115
I ran into this issue recently. I use the HuggingFace container because it is supported on SageMaker.
This is the command I used (I referred to this doc for the versions of the HuggingFace container):
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point='train.py',
    role=role,
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    transformers_version='4.4.2',
    pytorch_version='1.6.0',
    py_version='py36'
)
Later I found that this issue is fixed in the newest version of the container (thanks to the contributors).
After upgrading to sagemaker==2.62.0, we can use:
estimator = HuggingFace(
    entry_point='train.py',
    role=role,
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    transformers_version='4.11.0',
    pytorch_version='1.9.0',
    py_version='py38'
)
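Before launching a job with the newer container versions, it may help to confirm that the local SDK is at least the version mentioned above. This is a rough sketch using only the standard library; `parse_version` and `sdk_is_recent_enough` are hypothetical helper names, and the simple parsing only looks at the leading digits of the first three version components.

```python
import re
from importlib import metadata

# SDK version quoted in the comment above.
MIN_SDK_VERSION = (2, 62, 0)


def parse_version(text):
    """Turn a string such as '2.62.0' into a tuple such as (2, 62, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, so '0rc1' -> 0
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '2.62'
        parts.append(0)
    return tuple(parts)


def sdk_is_recent_enough(installed=None):
    """Return True if the given (or locally installed) sagemaker version is new enough."""
    if installed is None:
        try:
            installed = metadata.version("sagemaker")
        except metadata.PackageNotFoundError:
            return False  # the sagemaker package is not installed at all
    return parse_version(installed) >= MIN_SDK_VERSION


if __name__ == "__main__":
    print("sagemaker SDK recent enough:", sdk_is_recent_enough())
```

If this reports False, upgrading with pip install --upgrade sagemaker should bring the SDK up to date.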