Hi, thank you very much for your work!
The technical report mentioned that you train T5-XXL-11B and OPT-13B on 2 TPU-v4.
I would like to know whether the 2 TPU-v4 refers to two independent TPU-v4 VMs connected over a VPC network, or to a TPU-v4 Pod Slice (as shown below)?
I am very curious about this because I have access to 5 TPU v3-8 VMs obtained from the TRC program. I wanted to use all 5 VMs to train a single model, but I could not find an appropriate way to make it work. I know that within a TPU Pod the chips are connected by a dedicated high-speed interconnect, so distributed training can be very fast. But I don't know whether it is possible to make multiple separate TPU VMs work together by connecting them over a VPC network.
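For concreteness, this is the kind of launch I was imagining (just a sketch, not something I have working — `jax.distributed.initialize` is JAX's multi-host setup call, but the IP address shown is a placeholder, and whether this actually works across independent VMs over a plain VPC network is exactly what I'm unsure about):

```python
# Hypothetical sketch: run the SAME script on each of the 5 v3-8 VMs,
# pointing every process at one coordinator VM over the VPC-internal IP.
import jax

jax.distributed.initialize(
    coordinator_address="10.128.0.2:8476",  # internal IP of VM 0 (placeholder)
    num_processes=5,                        # one process per v3-8 VM
    process_id=0,                           # unique 0..4, set per VM
)

# If the hosts really did form one logical cluster, this would report
# the devices across all 5 VMs rather than just the local 8 cores.
print(jax.device_count())
```

Is something like this supported outside a Pod Slice, or does the cross-host collective communication require the dedicated interconnect?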
Any information would be really appreciated!