E:\projects\PaddleClas-master>python -m paddle.distributed.launch --selected_gpus='0' tools/train.py -c configs/quick_start/ResNet50_vd_finetune_my.yaml
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: None
node_ip: 127.0.0.1
print_config: True
selected_gpus: '0'
started_port: 6170
training_script: tools/train.py
training_script_args: ['-c', 'configs/quick_start/ResNet50_vd_finetune_my.yaml']
use_paddlecloud: False
trainers_endpoints: 127.0.0.1:6170 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 1
2020-05-13 23:57:14 INFO:
== PaddleClas is powered by PaddlePaddle ! ==
== ==
== For more info please go to the following website. ==
== ==
== https://github.com/PaddlePaddle/PaddleClas ==
2020-05-13 23:57:14 INFO: ARCHITECTURE :
2020-05-13 23:57:14 INFO: name : ResNet50_vd
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: LEARNING_RATE :
2020-05-13 23:57:14 INFO: function : Cosine
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: lr : 0.00375
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: OPTIMIZER :
2020-05-13 23:57:14 INFO: function : Momentum
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: momentum : 0.9
2020-05-13 23:57:14 INFO: regularizer :
2020-05-13 23:57:14 INFO: factor : 1e-06
2020-05-13 23:57:14 INFO: function : L2
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: TRAIN :
2020-05-13 23:57:14 INFO: batch_size : 32
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513train.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: RandCropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: RandFlipImage :
2020-05-13 23:57:14 INFO: flip_code : 1
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1./255.
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: VALID :
2020-05-13 23:57:14 INFO: batch_size : 20
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513test.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: ResizeImage :
2020-05-13 23:57:14 INFO: resize_short : 256
2020-05-13 23:57:14 INFO: CropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1.0/255.0
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: classes_num : 3
2020-05-13 23:57:14 INFO: epochs : 20
2020-05-13 23:57:14 INFO: image_shape : [3, 224, 224]
2020-05-13 23:57:14 INFO: mode : train
2020-05-13 23:57:14 INFO: model_save_dir : E:/projects/PaddleClas-master/output/
2020-05-13 23:57:14 INFO: pretrained_model : E:/projects/PaddleClas-master/ResNet50_vd_pretrained
2020-05-13 23:57:14 INFO: save_interval : 1
2020-05-13 23:57:14 INFO: topk : 5
2020-05-13 23:57:14 INFO: total_images : 795
2020-05-13 23:57:14 INFO: valid_interval : 1
2020-05-13 23:57:14 INFO: validate : True
API is deprecated since 2.0.0 Please use FleetAPI instead.
WIKI: https://github.com/PaddlePaddle/Fleet/blob/develop/markdown_doc/transpiler
Traceback (most recent call last):
File "tools/train.py", line 124, in <module>
main(args)
File "tools/train.py", line 69, in main
config, train_prog, startup_prog, is_train=True)
File "E:\projects\PaddleClas-master\tools\program.py", line 341, in build
optimizer.minimize(fetchs['loss'][0])
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 424, in minimize
fleet.main_program = self._try_to_compile(startup_program, main_program)
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 358, in _try_to_compile
self._transpile(startup_program, main_program)
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 285, in _transpile
current_endpoint=current_endpoint)
File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 625, in transpile
wait_port=self.config.wait_port)
File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 397, in _transpile_nccl2
self.config.hierarchical_allreduce_inter_nranks
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1797, in __init__
proto = OpProtoHolder.instance().get_op_proto(type)
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1679, in get_op_proto
raise ValueError("Operator \"%s\" has not been registered." % type)
ValueError: Operator "gen_nccl_id" has not been registered.
2020-05-13 15:57:16,981-ERROR: ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
ERROR 2020-05-13 15:57:16,981 launch.py:284] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
What is the problem here?
It seems that you are running on the Windows platform.
Are you running in the CPU environment?
from paddleclas.
I installed the GPU version of Paddle successfully:
python -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://mirror.baidu.com/pypi/simple
C:\Users\Administrator>python
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Paddle Program ...
W0513 11:50:57.826320 13336 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0513 11:50:57.831334 13336 device_context.cc:245] device: 0, cuDNN Version: 7.6.
Your Paddle works well on SINGLE GPU or CPU.
I0513 11:51:01.541204 13336 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I0513 11:51:01.556252 13336 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0513 11:51:01.556252 13336 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0513 11:51:01.556252 13336 parallel_executor.cc:322] Cross op memory reuse strategy is enabled, when build_strategy.memory_optimize = True or garbage collection strategy is disabled, which is not recommended
Your Paddle works well on MUTIPLE GPU or CPU.
Your Paddle is installed successfully! Let's start deep Learning with Paddle now
>>> exit()
And the command:
python -m paddle.distributed.launch --selected_gpus='0'
means training on the GPU.
On Windows, distributed training is not enabled yet because the NVIDIA NCCL library is not available there; that is why the "gen_nccl_id" operator is not registered.
We will support it later.
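The traceback ends in a registry lookup: framework.py asks for the proto of the "gen_nccl_id" operator and raises ValueError because that op was never compiled into the binary. The toy model below is a minimal sketch of that failure, not PaddlePaddle's code; the op sets and the helpers build_op_registry and lookup_op are hypothetical, and it assumes NCCL-dependent ops are simply absent from builds made without NCCL, as on Windows.

```python
# Toy model of an operator registry; NOT PaddlePaddle's implementation.
# Assumption: ops that depend on NCCL (e.g. "gen_nccl_id") are only compiled
# into builds that have NCCL, which Windows builds do not.
COMMON_OPS = {"conv2d", "relu", "softmax", "momentum"}  # illustrative subset
NCCL_OPS = {"gen_nccl_id"}


def build_op_registry(compiled_with_nccl):
    """Return the set of op types a (hypothetical) build would register."""
    registry = set(COMMON_OPS)
    if compiled_with_nccl:
        registry |= NCCL_OPS
    return registry


def lookup_op(registry, op_type):
    """Mimic the failing check: unknown op types raise ValueError."""
    if op_type not in registry:
        raise ValueError('Operator "%s" has not been registered.' % op_type)
    return op_type


if __name__ == "__main__":
    windows_like_build = build_op_registry(compiled_with_nccl=False)
    try:
        lookup_op(windows_like_build, "gen_nccl_id")
    except ValueError as err:
        print(err)  # Operator "gen_nccl_id" has not been registered.
```

The point of the model is that the launcher itself is fine; the distributed transpiler inserts an op that a build without NCCL cannot provide.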
If distributed training is not enabled, how can I train on a single PC?
@WuHaobo What is the problem here? Can PaddleClas not be used on Windows?
As we replied before, distributed training is not enabled on Windows yet.
Also see the issue: #90
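A small pre-flight check can make this failure explicit before the launcher starts. The sketch below is hypothetical and not part of PaddleClas: build_train_command is an illustrative helper that only encodes the maintainers' statement that paddle.distributed.launch is not yet supported on Windows; it does not provide a Windows workaround.

```python
import platform


def build_train_command(config_path, gpus="0", system=None):
    """Hypothetical pre-flight helper, not part of PaddleClas.

    Returns the argument list for paddle.distributed.launch, or raises on
    Windows, where (per the maintainers) distributed training is not yet
    supported because NCCL is unavailable.
    """
    system = system or platform.system()
    if system == "Windows":
        raise RuntimeError(
            "paddle.distributed.launch needs NCCL, which is not available on "
            "Windows; see PaddleClas issue #90."
        )
    return [
        "python", "-m", "paddle.distributed.launch",
        "--selected_gpus=" + gpus,
        "tools/train.py", "-c", config_path,
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        "configs/quick_start/ResNet50_vd_finetune_my.yaml", system="Linux")
    print(" ".join(cmd))
```

With system="Linux" this prints python -m paddle.distributed.launch --selected_gpus=0 tools/train.py -c configs/quick_start/ResNet50_vd_finetune_my.yaml; on Windows it fails fast with a readable message instead of the transpiler traceback.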