Comments (6)

GuoxiaWang avatar GuoxiaWang commented on September 28, 2024

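This is not an error. The process is most likely waiting for a response from 10.10.11.51.

What commands did you run on the two machines? Could you paste them?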

from plsc.

alexiycv avatar alexiycv commented on September 28, 2024

TRAINER_IP_LIST=10.10.11.50,10.10.11.51
CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --ips=$TRAINER_IP_LIST --gpus=$CUDA_VISIBLE_DEVICES tools/train.py \
    --config_file configs/ms1mv3_r50.py \
    --is_static False \
    --backbone FresResNet50 \
    --classifier LargeScaleClassifier \
    --embedding_size 512 \
    --model_parallel True \
    --dropout 0.0 \
    --sample_ratio 0.1 \
    --loss ArcFace \
    --batch_size 128 \
    --dataset MS1M_v3 \
    --num_classes 93431 \
    --data_dir MS1M_v3/ \
    --label_file MS1M_v3/label.txt \
    --is_bin False \
    --log_interval_step 100 \
    --validation_interval_step 2000 \
    --fp16 True \
    --use_dynamic_loss_scaling True \
    --init_loss_scaling 27648.0 \
    --num_workers 8 \
    --train_unit 'epoch' \
    --warmup_num 0 \
    --train_num 25 \
    --decay_boundaries "10,16,22" \
    --output MS1M_v3_arcface_dynamic_0.1_NHWC_FP16

GuoxiaWang avatar GuoxiaWang commented on September 28, 2024

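Are these two machines in the same cluster environment? Have you run multi-machine training jobs before? The command looks fine. Could it be a network connectivity problem? Are these IP addresses the actual addresses in your environment?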
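
One rough way to check connectivity is to test whether each machine can open a TCP connection to the other, since a successful ping does not guarantee that a given port is reachable. The sketch below uses only the Python standard library. The helper names `can_connect` and `check_peers` are made up for illustration, and the port is an assumption: use a port that is known to be listening on the peer (for example an SSH port), because the ports chosen by the launcher are not stated in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(hosts, port: int, timeout: float = 3.0) -> dict:
    """Map each host to whether a TCP connection to host:port succeeded."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Example call (not executed here); port 22 is a hypothetical choice:
# print(check_peers(["10.10.11.50", "10.10.11.51"], 22))
```

Running the check from both machines shows whether the connection works in each direction. A `False` for the peer suggests a firewall or routing problem rather than a training configuration problem.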

alexiycv avatar alexiycv commented on September 28, 2024

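> Are these two machines in the same cluster environment? Have you run multi-machine training jobs before? The command looks fine. Could it be a network connectivity problem? Are these IP addresses the actual addresses in your environment?

The network is reachable. I have not run multi-machine training jobs before.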

GuoxiaWang avatar GuoxiaWang commented on September 28, 2024

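Are you sure you ran the launch command above on each of the two machines?

For multi-machine training, the launch command must be executed on every machine.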

alexiycv avatar alexiycv commented on September 28, 2024

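> Are you sure you ran the launch command above on each of the two machines?
>
> For multi-machine training, the launch command must be executed on every machine.

Oh, I see. I'll give it a try.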
