I was using SpeechBrain 0.5.11 in a Docker environment with PyTorch 1.11 and CUDA 11.1, and it was working fine.
When I tried the same Docker image on a machine with CUDA 11.9, there were issues using the GPUs, so I created a new Docker image supporting CUDA 11.9. In this image, however, I had to reinstall SpeechBrain at version 1.13, which is supported on torch 1.13 (the CUDA 11.8 build).
In this new image, I ran the same training script that I used with SpeechBrain 0.5.11, and it ran fine; all the hyperparameters in train.yaml were unchanged as well. However, the model it produced is not the same as the one I got with SpeechBrain 0.5.11. I tried to debug this but could not pin down the cause: whether it is due to the SpeechBrain version change, the PyTorch version change, or an internal change in SpeechBrain's model architecture.
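To make runs from the two images comparable, it may help to record the exact package versions alongside each checkpoint. A minimal, stdlib-only sketch (package names are just the ones mentioned above; nothing here is SpeechBrain-specific):

```python
from importlib import metadata

def installed_version(package: str) -> str:
    """Return the installed version of *package*, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"

# Log the environment next to every saved checkpoint so a mismatch like
# this one can be traced back to a specific version change.
for pkg in ("speechbrain", "torch"):
    print(pkg, installed_version(pkg))
```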
I am attaching a snapshot of the error showing the difference between the models from the two SpeechBrain versions.
RuntimeError: Error(s) in loading state_dict for ModuleDict:
Missing key(s) in state_dict: "0.convblock_0.convs.norm_0.norm.running_mean", "0.convblock_0.convs.norm_0.norm.running_var", "0.convblock_1.convs.norm_0.norm.running_mean", "0.convblock_1.convs.norm_0.norm.running_var", "1.encoder.layers.0.convolution_module.after_conv.0.running_mean", "1.encoder.layers.0.convolution_module.after_conv.0.running_var", "1.encoder.layers.1.convolution_module.after_conv.0.running_mean", "1.encoder.layers.1.convolution_module.after_conv.0.running_var", "1.encoder.layers.2.convolution_module.after_conv.0.running_mean", "1.encoder.layers.2.convolution_module.after_conv.0.running_var", "1.encoder.layers.3.convolution_module.after_conv.0.running_mean", "1.encoder.layers.3.convolution_module.after_conv.0.running_var", "1.encoder.layers.4.convolution_module.after_conv.0.running_mean", "1.encoder.layers.4.convolution_module.after_conv.0.running_var", "1.encoder.layers.5.convolution_module.after_conv.0.running_mean", "1.encoder.layers.5.convolution_module.after_conv.0.running_var", "1.encoder.layers.6.convolution_module.after_conv.0.running_mean", "1.encoder.layers.6.convolution_module.after_conv.0.running_var", "1.encoder.layers.7.convolution_module.after_conv.0.running_mean", "1.encoder.layers.7.convolution_module.after_conv.0.running_var", "1.encoder.layers.8.convolution_module.after_conv.0.running_mean", "1.encoder.layers.8.convolution_module.after_conv.0.running_var", "1.encoder.layers.9.convolution_module.after_conv.0.running_mean", "1.encoder.layers.9.convolution_module.after_conv.0.running_var", "1.encoder.layers.10.convolution_module.after_conv.0.running_mean", "1.encoder.layers.10.convolution_module.after_conv.0.running_var", "1.encoder.layers.11.convolution_module.after_conv.0.running_mean", "1.encoder.layers.11.convolution_module.after_conv.0.running_var".
size mismatch for 0.convblock_0.convs.norm_0.norm.weight: copying a param with shape torch.Size([40, 64]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for 0.convblock_0.convs.norm_0.norm.bias: copying a param with shape torch.Size([40, 64]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for 0.convblock_1.convs.norm_0.norm.weight: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for 0.convblock_1.convs.norm_0.norm.bias: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for 1.encoder.layers.0.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.1.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.2.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.3.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.4.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.5.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.6.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.7.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.8.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.9.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.10.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
size mismatch for 1.encoder.layers.11.convolution_module.after_conv.2.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([256, 256, 1]).
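One way to narrow this down is to diff the two state_dicts directly rather than relying on load_state_dict's error message. Below is a minimal sketch: shapes are stood in for by plain tuples, and the two dicts contain only a few illustrative entries modelled on the error above. With real checkpoints you would build them via something like `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}`.

```python
def diff_state_dicts(ckpt, model):
    """Compare two {parameter name: shape tuple} dicts."""
    missing = sorted(k for k in model if k not in ckpt)         # in model, absent from checkpoint
    unexpected = sorted(k for k in ckpt if k not in model)      # in checkpoint, absent from model
    mismatched = sorted(k for k in ckpt
                        if k in model and ckpt[k] != model[k])  # in both, shapes differ
    return missing, unexpected, mismatched

# Illustrative entries taken from the error trace (not the full dicts).
ckpt = {
    "0.convblock_0.convs.norm_0.norm.weight": (40, 64),
    "1.encoder.layers.0.convolution_module.after_conv.2.weight": (256, 256),
}
model = {
    "0.convblock_0.convs.norm_0.norm.weight": (64,),
    "0.convblock_0.convs.norm_0.norm.running_mean": (64,),
    "1.encoder.layers.0.convolution_module.after_conv.2.weight": (256, 256, 1),
}

missing, unexpected, mismatched = diff_state_dicts(ckpt, model)
print("missing:", missing)        # -> ['0.convblock_0.convs.norm_0.norm.running_mean']
print("mismatched:", mismatched)  # both illustrative keys differ in shape
```

Reading the diff this way also hints at what changed: the `after_conv.2.weight` going from `(256, 256)` to `(256, 256, 1)` looks like a Linear layer replaced by an equivalent 1x1 Conv1d (which could in principle be remapped with `unsqueeze(-1)`), whereas the norm weights going from `(40, 64)` to `(64,)` plus newly required `running_mean`/`running_var` suggests a LayerNorm-style norm was swapped for a BatchNorm-style one, which is a genuine architecture change that cannot be remapped. This is my reading of the shapes, not a confirmed SpeechBrain changelog entry.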
Any guidance on how to resolve this would be much appreciated.
Naval