Comments (4)

dimitar10 commented on June 24, 2024

Hello, yes it is possible to run it on a single GPU. You need to edit train.sh to run a command similar to the following:

nnunet_use_progress_bar=1 CUDA_VISIBLE_DEVICES=0 \
        python3 -m torch.distributed.launch --master_port=4322 --nproc_per_node=1 \
        ./train.py --fold=${fold} --config=$CONFIG --resume='local_latest' --npz

Note the changes to CUDA_VISIBLE_DEVICES and --nproc_per_node compared to the default values. Basically, this will still use the default _DDP trainer if you haven't edited the network_trainer entry in your config file, but it will run on a single GPU. The proper way would be to use a non-DDP trainer, but that requires some more modifications.
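
If you want to double-check that the process really only sees one GPU after exporting CUDA_VISIBLE_DEVICES, here is a minimal sketch (assuming PyTorch is available in your environment, not code from this repo):

import torch

# With CUDA_VISIBLE_DEVICES=0 exported, only one device should be visible
# to the process, and it is re-indexed as cuda:0.
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # the physical GPU selected above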

Also, there might be a typo in train.py: you might need to change the --local-rank argument to --local_rank. At least, that was one of the issues in my case.

Hope this helps.

Heanhu commented on June 24, 2024

Hello, I changed the --local-rank argument to --local_rank, but it still reports an error:
usage: train.py [-h] [--network NETWORK] [--network_trainer NETWORK_TRAINER] [--task TASK] [--task_pretrained TASK_PRETRAINED] [--fold FOLD]
[--model MODEL] [--disable_ds DISABLE_DS] [--resume RESUME] [-val] [-c] [-p P] [--use_compressed_data] [--deterministic]
[--fp32] [--dbs] [--npz] [--valbest] [--vallatest] [--find_lr] [--val_folder VAL_FOLDER] [--disable_saving]
[--disable_postprocessing_on_folds] [-pretrained_weights PRETRAINED_WEIGHTS] [--config FILE] [--batch_size BATCH_SIZE]
[--max_num_epochs MAX_NUM_EPOCHS] [--initial_lr INITIAL_LR] [--min_lr MIN_LR] [--opt_eps EPSILON] [--opt_betas BETA [BETA ...]]
[--weight_decay WEIGHT_DECAY] [--local_rank LOCAL_RANK] [--world-size WORLD_SIZE] [--rank RANK]
[--total_batch_size TOTAL_BATCH_SIZE] [--hdfs_base HDFS_BASE] [--optim_name OPTIM_NAME] [--lrschedule LRSCHEDULE]
[--warmup_epochs WARMUP_EPOCHS] [--val_final] [--is_ssl] [--is_spatial_aug_only] [--mask_ratio MASK_RATIO]
[--loss_name LOSS_NAME] [--plan_update PLAN_UPDATE] [--crop_size CROP_SIZE [CROP_SIZE ...]] [--reclip RECLIP [RECLIP ...]]
[--pretrained] [--disable_decoder] [--model_params MODEL_PARAMS] [--layer_decay LAYER_DECAY] [--drop_path PCT]
[--find_zero_weight_decay] [--n_class N_CLASS]
[--deep_supervision_scales DEEP_SUPERVISION_SCALES [DEEP_SUPERVISION_SCALES ...]] [--fix_ds_net_numpool] [--skip_grad_nan]
[--merge_femur] [--is_sigmoid] [--max_loss_cal MAX_LOSS_CAL]
train.py: error: unrecognized arguments: --local-rank=0
Could you help me?
Thank you.

2DangFilthy commented on June 24, 2024

Hello, I'm facing the same problem. Have you solved it?

dimitar10 commented on June 24, 2024

@Heanhu @2DangFilthy

The argument change to --local_rank in train.py that I suggested,

parser.add_argument("--local-rank", type=int) # must pass

apparently is not necessary. According to argparse's docs, internal hyphens in option names are automatically converted to underscores for the resulting attribute name, so args.local_rank is available either way. Perhaps try deleting any __pycache__ dirs you might have; sometimes these can cause issues. If you are running the train.sh script, it should work.
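
To illustrate what I mean about argparse, here is a minimal standalone sketch (not code from this repo):

import argparse

# argparse converts internal hyphens in an option name to underscores when it
# builds the attribute name, so "--local-rank" is stored as args.local_rank.
parser = argparse.ArgumentParser()
parser.add_argument("--local-rank", type=int, default=0)
args = parser.parse_args(["--local-rank=3"])
print(args.local_rank)  # -> 3

Note that this conversion affects the attribute name, not the flag itself: the spelling used on the command line still has to match whatever the parser declares.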
