A Distributed Programming Example Library for Popular Deep Learning Frameworks
Launch the PyTorch example with one process per GPU (replace `GPU_nums` with the number of GPUs):

```shell
python -m torch.distributed.launch --nproc_per_node=GPU_nums pytorch_based_distributed_mnist.py
```

Run the TensorFlow example:

```shell
python tensorflow_based_distributed_mnist.py
```
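The PyTorch launcher starts one process per GPU and sets `RANK` and `WORLD_SIZE` in each process's environment. A minimal sketch of the distributed setup such a script typically performs is below; the model, dummy batch, and master port are illustrative placeholders, not the repository's actual code. It falls back to the `gloo` backend so it can also run on a single CPU process.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# torch.distributed.launch sets RANK and WORLD_SIZE per process;
# the defaults below let this sketch run as a single process.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# NCCL (the backend used in the benchmark) requires GPUs; gloo is the CPU fallback.
backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend, rank=rank, world_size=world_size)

model = nn.Linear(28 * 28, 10)   # stand-in for an MNIST classifier
ddp_model = DDP(model)           # gradients are all-reduced across ranks on backward()

opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
x = torch.randn(32, 28 * 28)     # dummy batch in place of real MNIST data
y = torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(ddp_model(x), y)
loss.backward()                  # DDP synchronizes gradients here
opt.step()
print("step done, world size:", dist.get_world_size())
dist.destroy_process_group()
```

When run under the launcher with `--nproc_per_node=N`, the same code executes in `N` processes, each bound to one GPU, with gradient averaging handled by `DistributedDataParallel`.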
Benchmark settings:

- Dataset: MNIST
- Training length: 1 epoch
- Backend: NCCL
- Batch size per GPU for training: 9000
- No validation
Training time for one epoch:

| Framework | Single GPU | Distributed |
|---|---|---|
| PyTorch | 5.50 s | 2.72 s |
| TensorFlow | 0.65 s | 0.27 s |
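From the timings above, the speedup of distributed over single-GPU training works out the same way for both frameworks:

```python
# Per-epoch training times (seconds) taken from the table above.
times = {
    "PyTorch":    {"single": 5.50, "distributed": 2.72},
    "TensorFlow": {"single": 0.65, "distributed": 0.27},
}

for framework, t in times.items():
    speedup = t["single"] / t["distributed"]
    print(f"{framework}: {speedup:.2f}x speedup")
# → PyTorch: 2.02x speedup
# → TensorFlow: 2.41x speedup
```

Both frameworks land close to the ideal 2x for this setup, suggesting communication overhead is small relative to per-epoch compute at this batch size.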