BERT distillation and quantization in a distributed setting using the FairScale library. The repository contains several versions of the training code for the GLUE MRPC task, each implementing a different level of parallelism with PyTorch and FairScale constructs.
Repository: rkv0id/fair-distribert
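The repository's own training code is not reproduced here, but the standard knowledge-distillation objective it refers to (a soft cross-entropy between temperature-scaled teacher and student logits, in the style of Hinton et al.) can be sketched in plain Python. The function names and the temperature value below are illustrative assumptions, not taken from the repository:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients keep a comparable magnitude as T varies."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

When the student's logits match the teacher's exactly, the loss is zero; in practice this term is combined with the ordinary cross-entropy on the hard MRPC labels, with a weighting factor between the two.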