Comments (4)
Hello, yes it is possible to run it on a single GPU. You need to edit train.sh to run a command similar to the following:

nnunet_use_progress_bar=1 CUDA_VISIBLE_DEVICES=0 \
python3 -m torch.distributed.launch --master_port=4322 --nproc_per_node=1 \
./train.py --fold=${fold} --config=$CONFIG --resume='local_latest' --npz

Note the changes in CUDA_VISIBLE_DEVICES and --nproc_per_node compared to the default values. Basically, this will use the default _DDP trainer if you haven't edited the network_trainer entry in your config file, but it will still run on a single GPU. The proper way is to use a non-DDP trainer, but that requires some more modifications. Also, there might be a typo in train.py: you may need to change the --local-rank argument to --local_rank; at least that was one of the issues in my case. Hope this helps.
from 3d-transunet.
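For illustration, a minimal sketch of that argparse fix (a hypothetical snippet, not the repository's exact code). Declaring both spellings makes the script tolerant of whichever flag the launcher injects:

import argparse

# Sketch only: accept the rank flag under either spelling, since older
# launchers pass --local_rank and newer ones pass --local-rank.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                    type=int, default=0,
                    help="rank injected by the distributed launcher")
args, _ = parser.parse_known_args()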
Hello, I changed the --local-rank argument to --local_rank, but it still reports an error:
usage: train.py [-h] [--network NETWORK] [--network_trainer NETWORK_TRAINER] [--task TASK] [--task_pretrained TASK_PRETRAINED] [--fold FOLD]
[--model MODEL] [--disable_ds DISABLE_DS] [--resume RESUME] [-val] [-c] [-p P] [--use_compressed_data] [--deterministic]
[--fp32] [--dbs] [--npz] [--valbest] [--vallatest] [--find_lr] [--val_folder VAL_FOLDER] [--disable_saving]
[--disable_postprocessing_on_folds] [-pretrained_weights PRETRAINED_WEIGHTS] [--config FILE] [--batch_size BATCH_SIZE]
[--max_num_epochs MAX_NUM_EPOCHS] [--initial_lr INITIAL_LR] [--min_lr MIN_LR] [--opt_eps EPSILON] [--opt_betas BETA [BETA ...]]
[--weight_decay WEIGHT_DECAY] [--local_rank LOCAL_RANK] [--world-size WORLD_SIZE] [--rank RANK]
[--total_batch_size TOTAL_BATCH_SIZE] [--hdfs_base HDFS_BASE] [--optim_name OPTIM_NAME] [--lrschedule LRSCHEDULE]
[--warmup_epochs WARMUP_EPOCHS] [--val_final] [--is_ssl] [--is_spatial_aug_only] [--mask_ratio MASK_RATIO]
[--loss_name LOSS_NAME] [--plan_update PLAN_UPDATE] [--crop_size CROP_SIZE [CROP_SIZE ...]] [--reclip RECLIP [RECLIP ...]]
[--pretrained] [--disable_decoder] [--model_params MODEL_PARAMS] [--layer_decay LAYER_DECAY] [--drop_path PCT]
[--find_zero_weight_decay] [--n_class N_CLASS]
[--deep_supervision_scales DEEP_SUPERVISION_SCALES [DEEP_SUPERVISION_SCALES ...]] [--fix_ds_net_numpool] [--skip_grad_nan]
[--merge_femur] [--is_sigmoid] [--max_loss_cal MAX_LOSS_CAL]
train.py: error: unrecognized arguments: --local-rank=0
Could you help me?
Thank you.
from 3d-transunet.
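For context, the "unrecognized arguments: --local-rank=0" message suggests the launcher is injecting the hyphenated flag (the behaviour of torch.distributed.launch from PyTorch 2.0 onward) while the script only declares the underscored one. A minimal sketch of a workaround, assuming nothing about the parser in train.py, is to read the rank from the LOCAL_RANK environment variable, which torchrun (and torch.distributed.launch with --use_env) exports:

import os

# torchrun and torch.distributed.launch --use_env export LOCAL_RANK,
# so reading the environment sidesteps the --local-rank/--local_rank
# command-line mismatch altogether.
local_rank = int(os.environ.get("LOCAL_RANK", 0))

With this in place, the script can be launched as torchrun --nproc_per_node=1 ./train.py ... without any rank flag on the command line.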
Hello, I'm facing the same problem. Have you solved it?
from 3d-transunet.
Change the argument to --local_rank in train.py (line 109 in 190fe40). Also, delete any __pycache__ dirs you might have; sometimes these can cause issues. If you are running the train.sh script, it should work.
from 3d-transunet.
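If clearing the caches by hand is tedious, here is a small sketch (nothing repository-specific) that removes every __pycache__ directory under the project root:

import pathlib
import shutil

# Collect matches first, then delete, so the directory tree is not
# modified while it is still being traversed.
for cache_dir in list(pathlib.Path(".").rglob("__pycache__")):
    shutil.rmtree(cache_dir, ignore_errors=True)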
Related Issues (20)
- from torch.optim.lr_scheduler import LambdaLR, _LRScheduler
- Inquiry Regarding Reproduction of 3D-TransUNet Results on Brats-MET Dataset HOT 4
- pre-trained model
- Typo in inference.py
- How to creat nnUNetPlansv2.1_plans_3D.pkl? HOT 2
- Synapse dataset training issues HOT 2
- how to setup config yaml HOT 1
- Error when running train.py
- Can you share preprocessed Datasets?
- How to generate default_plans_identifier_plans_2D.pkl? HOT 2
- Problem with validation on folds and inference
- Unresolvable import
- An error:RuntimeError: Function 'SigmoidBackward0' returned nan values in its 0th output. HOT 1
- hi,I want to know if I can run the code with nnUnet v1? HOT 3
- How to start training?
- Can this model be directly used to train MRI?
- Incompatible with NNUnetv2
- Could find module 'ml_collections' HOT 1
- How to train the three configurations - encoder only, decoder only, encoder plus decoder?