Thanks for the great work. However, I am running into an issue when training a LeViT-128S model and then loading the trained model. Loading fails with a long list of missing and unexpected keys:
Missing key(s) in state_dict: "patch_embed.0.weight", "patch_embed.0.bias", "patch_embed.2.weight", "patch_embed.2.bias", "patch_embed.4.weight", "patch_embed.4.bias", "patch_embed.6.weight", "patch_embed.6.bias", "blocks.0.m.qkv.weight", "blocks.0.m.qkv.bias", "blocks.0.m.proj.1.weight", "blocks.0.m.proj.1.bias", "blocks.1.m.0.weight", "blocks.1.m.0.bias", "blocks.1.m.2.weight", "blocks.1.m.2.bias", "blocks.2.m.qkv.weight", "blocks.2.m.qkv.bias", "blocks.2.m.proj.1.weight", "blocks.2.m.proj.1.bias", "blocks.3.m.0.weight", "blocks.3.m.0.bias", "blocks.3.m.2.weight", "blocks.3.m.2.bias", "blocks.4.kv.weight", "blocks.4.kv.bias", "blocks.4.q.1.weight", "blocks.4.q.1.bias", "blocks.4.proj.1.weight", "blocks.4.proj.1.bias", "blocks.5.m.0.weight", "blocks.5.m.0.bias", "blocks.5.m.2.weight", "blocks.5.m.2.bias", "blocks.6.m.qkv.weight", "blocks.6.m.qkv.bias", "blocks.6.m.proj.1.weight", "blocks.6.m.proj.1.bias", "blocks.7.m.0.weight", "blocks.7.m.0.bias", "blocks.7.m.2.weight", "blocks.7.m.2.bias", "blocks.8.m.qkv.weight", "blocks.8.m.qkv.bias", "blocks.8.m.proj.1.weight", "blocks.8.m.proj.1.bias", "blocks.9.m.0.weight", "blocks.9.m.0.bias", "blocks.9.m.2.weight", "blocks.9.m.2.bias", "blocks.10.m.qkv.weight", "blocks.10.m.qkv.bias", "blocks.10.m.proj.1.weight", "blocks.10.m.proj.1.bias", "blocks.11.m.0.weight", "blocks.11.m.0.bias", "blocks.11.m.2.weight", "blocks.11.m.2.bias", "blocks.12.kv.weight", "blocks.12.kv.bias", "blocks.12.q.1.weight", "blocks.12.q.1.bias", "blocks.12.proj.1.weight", "blocks.12.proj.1.bias", "blocks.13.m.0.weight", "blocks.13.m.0.bias", "blocks.13.m.2.weight", "blocks.13.m.2.bias", "blocks.14.m.qkv.weight", "blocks.14.m.qkv.bias", "blocks.14.m.proj.1.weight", "blocks.14.m.proj.1.bias", "blocks.15.m.0.weight", "blocks.15.m.0.bias", "blocks.15.m.2.weight", "blocks.15.m.2.bias", "blocks.16.m.qkv.weight", "blocks.16.m.qkv.bias", "blocks.16.m.proj.1.weight", "blocks.16.m.proj.1.bias", "blocks.17.m.0.weight", "blocks.17.m.0.bias", 
"blocks.17.m.2.weight", "blocks.17.m.2.bias", "blocks.18.m.qkv.weight", "blocks.18.m.qkv.bias", "blocks.18.m.proj.1.weight", "blocks.18.m.proj.1.bias", "blocks.19.m.0.weight", "blocks.19.m.0.bias", "blocks.19.m.2.weight", "blocks.19.m.2.bias", "blocks.20.m.qkv.weight", "blocks.20.m.qkv.bias", "blocks.20.m.proj.1.weight", "blocks.20.m.proj.1.bias", "blocks.21.m.0.weight", "blocks.21.m.0.bias", "blocks.21.m.2.weight", "blocks.21.m.2.bias", "head.weight", "head.bias".
Unexpected key(s) in state_dict: "patch_embed.0.c.weight", "patch_embed.0.bn.weight", "patch_embed.0.bn.bias", "patch_embed.0.bn.running_mean", "patch_embed.0.bn.running_var", "patch_embed.0.bn.num_batches_tracked", "patch_embed.2.c.weight", "patch_embed.2.bn.weight", "patch_embed.2.bn.bias", "patch_embed.2.bn.running_mean", "patch_embed.2.bn.running_var", "patch_embed.2.bn.num_batches_tracked", "patch_embed.4.c.weight", "patch_embed.4.bn.weight", "patch_embed.4.bn.bias", "patch_embed.4.bn.running_mean", "patch_embed.4.bn.running_var", "patch_embed.4.bn.num_batches_tracked", "patch_embed.6.c.weight", "patch_embed.6.bn.weight", "patch_embed.6.bn.bias", "patch_embed.6.bn.running_mean", "patch_embed.6.bn.running_var", "patch_embed.6.bn.num_batches_tracked", "blocks.0.m.qkv.c.weight", "blocks.0.m.qkv.bn.weight", "blocks.0.m.qkv.bn.bias", "blocks.0.m.qkv.bn.running_mean", "blocks.0.m.qkv.bn.running_var", "blocks.0.m.qkv.bn.num_batches_tracked", "blocks.0.m.proj.1.c.weight", "blocks.0.m.proj.1.bn.weight", "blocks.0.m.proj.1.bn.bias", "blocks.0.m.proj.1.bn.running_mean", "blocks.0.m.proj.1.bn.running_var", "blocks.0.m.proj.1.bn.num_batches_tracked", "blocks.1.m.0.c.weight", "blocks.1.m.0.bn.weight", "blocks.1.m.0.bn.bias", "blocks.1.m.0.bn.running_mean", "blocks.1.m.0.bn.running_var", "blocks.1.m.0.bn.num_batches_tracked", "blocks.1.m.2.c.weight", "blocks.1.m.2.bn.weight", "blocks.1.m.2.bn.bias", "blocks.1.m.2.bn.running_mean", "blocks.1.m.2.bn.running_var", "blocks.1.m.2.bn.num_batches_tracked", "blocks.2.m.qkv.c.weight", "blocks.2.m.qkv.bn.weight", "blocks.2.m.qkv.bn.bias", "blocks.2.m.qkv.bn.running_mean", "blocks.2.m.qkv.bn.running_var", "blocks.2.m.qkv.bn.num_batches_tracked", "blocks.2.m.proj.1.c.weight", "blocks.2.m.proj.1.bn.weight", "blocks.2.m.proj.1.bn.bias", "blocks.2.m.proj.1.bn.running_mean", "blocks.2.m.proj.1.bn.running_var", "blocks.2.m.proj.1.bn.num_batches_tracked", "blocks.3.m.0.c.weight", "blocks.3.m.0.bn.weight", "blocks.3.m.0.bn.bias", 
"blocks.3.m.0.bn.running_mean", "blocks.3.m.0.bn.running_var", "blocks.3.m.0.bn.num_batches_tracked", "blocks.3.m.2.c.weight", "blocks.3.m.2.bn.weight", "blocks.3.m.2.bn.bias", "blocks.3.m.2.bn.running_mean", "blocks.3.m.2.bn.running_var", "blocks.3.m.2.bn.num_batches_tracked", "blocks.4.kv.c.weight", "blocks.4.kv.bn.weight", "blocks.4.kv.bn.bias", "blocks.4.kv.bn.running_mean", "blocks.4.kv.bn.running_var", "blocks.4.kv.bn.num_batches_tracked", "blocks.4.q.1.c.weight", "blocks.4.q.1.bn.weight", "blocks.4.q.1.bn.bias", "blocks.4.q.1.bn.running_mean", "blocks.4.q.1.bn.running_var", "blocks.4.q.1.bn.num_batches_tracked", "blocks.4.proj.1.c.weight", "blocks.4.proj.1.bn.weight", "blocks.4.proj.1.bn.bias", "blocks.4.proj.1.bn.running_mean", "blocks.4.proj.1.bn.running_var", "blocks.4.proj.1.bn.num_batches_tracked", "blocks.5.m.0.c.weight", "blocks.5.m.0.bn.weight", "blocks.5.m.0.bn.bias", "blocks.5.m.0.bn.running_mean", "blocks.5.m.0.bn.running_var", "blocks.5.m.0.bn.num_batches_tracked", "blocks.5.m.2.c.weight", "blocks.5.m.2.bn.weight", "blocks.5.m.2.bn.bias", "blocks.5.m.2.bn.running_mean", "blocks.5.m.2.bn.running_var", "blocks.5.m.2.bn.num_batches_tracked", "blocks.6.m.qkv.c.weight", "blocks.6.m.qkv.bn.weight", "blocks.6.m.qkv.bn.bias", "blocks.6.m.qkv.bn.running_mean", "blocks.6.m.qkv.bn.running_var", "blocks.6.m.qkv.bn.num_batches_tracked", "blocks.6.m.proj.1.c.weight", "blocks.6.m.proj.1.bn.weight", "blocks.6.m.proj.1.bn.bias", "blocks.6.m.proj.1.bn.running_mean", "blocks.6.m.proj.1.bn.running_var", "blocks.6.m.proj.1.bn.num_batches_tracked", "blocks.7.m.0.c.weight", "blocks.7.m.0.bn.weight", "blocks.7.m.0.bn.bias", "blocks.7.m.0.bn.running_mean", "blocks.7.m.0.bn.running_var", "blocks.7.m.0.bn.num_batches_tracked", "blocks.7.m.2.c.weight", "blocks.7.m.2.bn.weight", "blocks.7.m.2.bn.bias", "blocks.7.m.2.bn.running_mean", "blocks.7.m.2.bn.running_var", "blocks.7.m.2.bn.num_batches_tracked", "blocks.8.m.qkv.c.weight", "blocks.8.m.qkv.bn.weight", 
"blocks.8.m.qkv.bn.bias", "blocks.8.m.qkv.bn.running_mean", "blocks.8.m.qkv.bn.running_var", "blocks.8.m.qkv.bn.num_batches_tracked", "blocks.8.m.proj.1.c.weight", "blocks.8.m.proj.1.bn.weight", "blocks.8.m.proj.1.bn.bias", "blocks.8.m.proj.1.bn.running_mean", "blocks.8.m.proj.1.bn.running_var", "blocks.8.m.proj.1.bn.num_batches_tracked", "blocks.9.m.0.c.weight", "blocks.9.m.0.bn.weight", "blocks.9.m.0.bn.bias", "blocks.9.m.0.bn.running_mean", "blocks.9.m.0.bn.running_var", "blocks.9.m.0.bn.num_batches_tracked", "blocks.9.m.2.c.weight", "blocks.9.m.2.bn.weight", "blocks.9.m.2.bn.bias", "blocks.9.m.2.bn.running_mean", "blocks.9.m.2.bn.running_var", "blocks.9.m.2.bn.num_batches_tracked", "blocks.10.m.qkv.c.weight", "blocks.10.m.qkv.bn.weight", "blocks.10.m.qkv.bn.bias", "blocks.10.m.qkv.bn.running_mean", "blocks.10.m.qkv.bn.running_var", "blocks.10.m.qkv.bn.num_batches_tracked", "blocks.10.m.proj.1.c.weight", "blocks.10.m.proj.1.bn.weight", "blocks.10.m.proj.1.bn.bias", "blocks.10.m.proj.1.bn.running_mean", "blocks.10.m.proj.1.bn.running_var", "blocks.10.m.proj.1.bn.num_batches_tracked", "blocks.11.m.0.c.weight", "blocks.11.m.0.bn.weight", "blocks.11.m.0.bn.bias", "blocks.11.m.0.bn.running_mean", "blocks.11.m.0.bn.running_var", "blocks.11.m.0.bn.num_batches_tracked", "blocks.11.m.2.c.weight", "blocks.11.m.2.bn.weight", "blocks.11.m.2.bn.bias", "blocks.11.m.2.bn.running_mean", "blocks.11.m.2.bn.running_var", "blocks.11.m.2.bn.num_batches_tracked", "blocks.12.kv.c.weight", "blocks.12.kv.bn.weight", "blocks.12.kv.bn.bias", "blocks.12.kv.bn.running_mean", "blocks.12.kv.bn.running_var", "blocks.12.kv.bn.num_batches_tracked", "blocks.12.q.1.c.weight", "blocks.12.q.1.bn.weight", "blocks.12.q.1.bn.bias", "blocks.12.q.1.bn.running_mean", "blocks.12.q.1.bn.running_var", "blocks.12.q.1.bn.num_batches_tracked", "blocks.12.proj.1.c.weight", "blocks.12.proj.1.bn.weight", "blocks.12.proj.1.bn.bias", "blocks.12.proj.1.bn.running_mean", "blocks.12.proj.1.bn.running_var", 
"blocks.12.proj.1.bn.num_batches_tracked", "blocks.13.m.0.c.weight", "blocks.13.m.0.bn.weight", "blocks.13.m.0.bn.bias", "blocks.13.m.0.bn.running_mean", "blocks.13.m.0.bn.running_var", "blocks.13.m.0.bn.num_batches_tracked", "blocks.13.m.2.c.weight", "blocks.13.m.2.bn.weight", "blocks.13.m.2.bn.bias", "blocks.13.m.2.bn.running_mean", "blocks.13.m.2.bn.running_var", "blocks.13.m.2.bn.num_batches_tracked", "blocks.14.m.qkv.c.weight", "blocks.14.m.qkv.bn.weight", "blocks.14.m.qkv.bn.bias", "blocks.14.m.qkv.bn.running_mean", "blocks.14.m.qkv.bn.running_var", "blocks.14.m.qkv.bn.num_batches_tracked", "blocks.14.m.proj.1.c.weight", "blocks.14.m.proj.1.bn.weight", "blocks.14.m.proj.1.bn.bias", "blocks.14.m.proj.1.bn.running_mean", "blocks.14.m.proj.1.bn.running_var", "blocks.14.m.proj.1.bn.num_batches_tracked", "blocks.15.m.0.c.weight", "blocks.15.m.0.bn.weight", "blocks.15.m.0.bn.bias", "blocks.15.m.0.bn.running_mean", "blocks.15.m.0.bn.running_var", "blocks.15.m.0.bn.num_batches_tracked", "blocks.15.m.2.c.weight", "blocks.15.m.2.bn.weight", "blocks.15.m.2.bn.bias", "blocks.15.m.2.bn.running_mean", "blocks.15.m.2.bn.running_var", "blocks.15.m.2.bn.num_batches_tracked", "blocks.16.m.qkv.c.weight", "blocks.16.m.qkv.bn.weight", "blocks.16.m.qkv.bn.bias", "blocks.16.m.qkv.bn.running_mean", "blocks.16.m.qkv.bn.running_var", "blocks.16.m.qkv.bn.num_batches_tracked", "blocks.16.m.proj.1.c.weight", "blocks.16.m.proj.1.bn.weight", "blocks.16.m.proj.1.bn.bias", "blocks.16.m.proj.1.bn.running_mean", "blocks.16.m.proj.1.bn.running_var", "blocks.16.m.proj.1.bn.num_batches_tracked", "blocks.17.m.0.c.weight", "blocks.17.m.0.bn.weight", "blocks.17.m.0.bn.bias", "blocks.17.m.0.bn.running_mean", "blocks.17.m.0.bn.running_var", "blocks.17.m.0.bn.num_batches_tracked", "blocks.17.m.2.c.weight", "blocks.17.m.2.bn.weight", "blocks.17.m.2.bn.bias", "blocks.17.m.2.bn.running_mean", "blocks.17.m.2.bn.running_var", "blocks.17.m.2.bn.num_batches_tracked", "blocks.18.m.qkv.c.weight", 
"blocks.18.m.qkv.bn.weight", "blocks.18.m.qkv.bn.bias", "blocks.18.m.qkv.bn.running_mean", "blocks.18.m.qkv.bn.running_var", "blocks.18.m.qkv.bn.num_batches_tracked", "blocks.18.m.proj.1.c.weight", "blocks.18.m.proj.1.bn.weight", "blocks.18.m.proj.1.bn.bias", "blocks.18.m.proj.1.bn.running_mean", "blocks.18.m.proj.1.bn.running_var", "blocks.18.m.proj.1.bn.num_batches_tracked", "blocks.19.m.0.c.weight", "blocks.19.m.0.bn.weight", "blocks.19.m.0.bn.bias", "blocks.19.m.0.bn.running_mean", "blocks.19.m.0.bn.running_var", "blocks.19.m.0.bn.num_batches_tracked", "blocks.19.m.2.c.weight", "blocks.19.m.2.bn.weight", "blocks.19.m.2.bn.bias", "blocks.19.m.2.bn.running_mean", "blocks.19.m.2.bn.running_var", "blocks.19.m.2.bn.num_batches_tracked", "blocks.20.m.qkv.c.weight", "blocks.20.m.qkv.bn.weight", "blocks.20.m.qkv.bn.bias", "blocks.20.m.qkv.bn.running_mean", "blocks.20.m.qkv.bn.running_var", "blocks.20.m.qkv.bn.num_batches_tracked", "blocks.20.m.proj.1.c.weight", "blocks.20.m.proj.1.bn.weight", "blocks.20.m.proj.1.bn.bias", "blocks.20.m.proj.1.bn.running_mean", "blocks.20.m.proj.1.bn.running_var", "blocks.20.m.proj.1.bn.num_batches_tracked", "blocks.21.m.0.c.weight", "blocks.21.m.0.bn.weight", "blocks.21.m.0.bn.bias", "blocks.21.m.0.bn.running_mean", "blocks.21.m.0.bn.running_var", "blocks.21.m.0.bn.num_batches_tracked", "blocks.21.m.2.c.weight", "blocks.21.m.2.bn.weight", "blocks.21.m.2.bn.bias", "blocks.21.m.2.bn.running_mean", "blocks.21.m.2.bn.running_var", "blocks.21.m.2.bn.num_batches_tracked", "head.bn.weight", "head.bn.bias", "head.bn.running_mean", "head.bn.running_var", "head.bn.num_batches_tracked", "head.l.weight", "head.l.bias".
Update: the `create_model` function from timm accepts a `fuse` argument; setting it to `False` loads the model correctly. Is there a function to save the fused model, though?
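For anyone with the same question, here is a minimal sketch of one way to save and reload fused weights. It uses a toy model, not LeViT itself. `ConvBN`, `build_unfused`, and `fuse_model` are hypothetical names made up for this example, and `torch.nn.utils.fusion.fuse_conv_bn_eval` stands in for whatever fusing the repository performs internally. The point it illustrates is that a fused `state_dict` can be saved with plain `torch.save`, provided it is later loaded into a model that has been fused the same way.

```python
import copy
import io

import torch
from torch import nn
from torch.nn.utils.fusion import fuse_conv_bn_eval


class ConvBN(nn.Module):
    """Hypothetical toy block: a bias-free conv `c` followed by batch norm `bn`."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.c = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.c(x))


def build_unfused():
    """Toy model in eval mode; its keys look like `0.c.weight`, `0.bn.running_mean`."""
    return nn.Sequential(ConvBN(3, 8), nn.ReLU(), ConvBN(8, 8)).eval()


def fuse_model(model):
    """Replace each ConvBN with a single Conv2d that has `weight` and `bias`."""
    for i, child in enumerate(list(model)):
        if isinstance(child, ConvBN):
            model[i] = fuse_conv_bn_eval(child.c, child.bn)
    return model


torch.manual_seed(0)
trained = build_unfused()
for m in trained.modules():
    if isinstance(m, nn.BatchNorm2d):
        # Non-trivial running statistics, standing in for real training.
        m.running_mean.uniform_(-1.0, 1.0)
        m.running_var.uniform_(0.5, 2.0)

# Fuse a copy of the trained model and save its (fused) state_dict.
fused = fuse_model(copy.deepcopy(trained))
fused_keys = sorted(fused.state_dict().keys())
buffer = io.BytesIO()  # an in-memory file; a path on disk works the same way
torch.save(fused.state_dict(), buffer)
buffer.seek(0)

# Reload into a model with the same fused structure.
restored = fuse_model(build_unfused())
restored.load_state_dict(torch.load(buffer))

x = torch.randn(2, 3, 16, 16)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x), atol=1e-4)
print(fused_keys, same)
```

The fused checkpoint has no batch-norm statistics left, so it is suited to inference; for resuming training, the unfused checkpoint is the one to keep.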