Hi, I am trying to reproduce the CIFAR10 and CIFAR100 results reported in the paper (an 8.34% test error rate for CIFAR10 and a 34.30% error rate for CIFAR100).
Using ex2_input_target_max_rand.py, ex4_input_target_topk.py, and examples.py, I ran:
python ex2_input_target_max_rand.py --sigprop --model vgg8 --dataset CIFAR10 --dropout 0.2 --lr 5e-4 --nonlin leakyrelu
python ex4_input_target_topk.py --sigprop --model vgg8 --dataset CIFAR10 --dropout 0.2 --lr 5e-4 --nonlin leakyrelu
Here are some additional configurations I tested:
- --norm batch_norm or --norm instance_norm
- the "v9_input_target_max_all" and "v1_input_label_direct" losses, substituted for input_target_max_rand in the ex2_input_target_max_rand.py file (see the Python sketch below)
- different topk values for the input_target_topk loss on CIFAR10, e.g. 2 (default = 6), as in the command below
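For example, for topk = 2 (note: the --topk flag name is my assumption; the value may instead need to be changed inside the script):
python ex4_input_target_topk.py --sigprop --model vgg8 --dataset CIFAR10 --dropout 0.2 --lr 5e-4 --nonlin leakyrelu --topk 2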
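And a minimal Python sketch of the loss substitution I made; only the loss names come from the repo, while the select_loss helper and the lambda signatures are hypothetical stand-ins for however the script actually selects its loss:

def select_loss(name):
    # Hypothetical registry, standing in for the script's actual loss selection.
    losses = {
        "input_target_max_rand": lambda output, target: ...,    # original loss
        "v9_input_target_max_all": lambda output, target: ...,  # variant I tested
        "v1_input_label_direct": lambda output, target: ...,    # variant I tested
    }
    return losses[name]

# Original: loss_fn = select_loss("input_target_max_rand")
loss_fn = select_loss("v9_input_target_max_all")  # swapped in for testing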
The default values of --lr-decay-milestones and --lr-decay-fact, together with the MultiStepLR scheduler in HyperParamsWrapper, should produce the learning-rate schedule described in the paper. However, I could not get test accuracy above 87% for CIFAR10 or 50% for CIFAR100. Could you please provide the training configuration or environment needed to reproduce the results described in the paper?
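To be explicit about the schedule I expect those defaults to produce, here is a minimal PyTorch sketch of a MultiStepLR setup; the milestone and decay values are illustrative placeholders, not the repo's actual defaults:

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(10, 10)  # placeholder model
optimizer = optim.Adam(model.parameters(), lr=5e-4)

# --lr-decay-milestones / --lr-decay-fact presumably map to milestones / gamma;
# the values below are illustrative only.
scheduler = MultiStepLR(optimizer, milestones=[200, 300, 350], gamma=0.25)

for epoch in range(400):
    # ... one epoch of training ...
    scheduler.step()  # lr is multiplied by gamma at each milestone epoch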
Here are some example printouts. The accuracy stopped improving after roughly 250-300 epochs. I ran the scripts on a Linux machine with Python 3.9.16 and PyTorch 1.13.1.
For CIFAR100:
Epoch Start: 399
[Info][Train Epoch 399/400][Batch 390/391] [loss 2.0399] [acc 0.4259]
[Sequential] Acc: 0.4750 (0.4345, 21727/50000) Loss: 24.8152 (27.2924)
[BlockConv] Acc: 0.3375 (0.3133, 15664/50000) Loss: 22.0432 (22.7874)
[BlockConv] Acc: 0.3125 (0.2719, 13596/50000) Loss: 21.8403 (22.1944)
[BlockConv] Acc: 0.2625 (0.2637, 13185/50000) Loss: 21.3491 (21.5723)
[BlockConv] Acc: 0.2125 (0.2766, 13830/50000) Loss: 20.7694 (20.7466)
[BlockConv] Acc: 0.2750 (0.2814, 14068/50000) Loss: 19.9495 (19.8684)
[BlockConv] Acc: 0.2750 (0.2650, 13250/50000) Loss: 19.6585 (19.0249)
[BlockLinear] Acc: 0.3250 (0.2553, 12764/50000) Loss: 20.2943 (19.3511)
[Info][Test Epoch 399/400] [loss 1.7940] [acc 0.4967]
[Sequential] Acc: 0.6875 (0.4439, 4439/10000) Loss: 17.5677 (27.2676)
[BlockConv] Acc: 0.6250 (0.3671, 3671/10000) Loss: 20.6346 (21.4852)
[BlockConv] Acc: 0.4375 (0.3414, 3414/10000) Loss: 22.7599 (20.6480)
[BlockConv] Acc: 0.5000 (0.3426, 3426/10000) Loss: 23.8240 (20.1075)
[BlockConv] Acc: 0.7500 (0.3659, 3659/10000) Loss: 25.7010 (20.2997)
[BlockConv] Acc: 0.7500 (0.3792, 3792/10000) Loss: 30.8968 (20.1886)
[BlockConv] Acc: 0.7500 (0.3566, 3566/10000) Loss: 29.7236 (18.8794)
[BlockLinear] Acc: 0.6250 (0.3487, 3487/10000) Loss: 21.7481 (18.2785)
For CIFAR10:
Epoch Start: 399
[Info][Train Epoch 399/400][Batch 390/391] [loss 0.4211] [acc 0.8557]
[Sequential] Acc: 0.5125 (0.6852, 34259/50000) Loss: 3.3270 (3.4904)
[BlockConv] Acc: 0.5875 (0.6892, 34460/50000) Loss: 3.3551 (3.6682)
[BlockConv] Acc: 0.6500 (0.7443, 37215/50000) Loss: 3.2299 (3.4566)
[BlockConv] Acc: 0.6500 (0.7846, 39229/50000) Loss: 3.0880 (3.3314)
[BlockConv] Acc: 0.6750 (0.8164, 40821/50000) Loss: 2.9348 (3.2015)
[BlockConv] Acc: 0.7750 (0.8463, 42317/50000) Loss: 2.8138 (3.1072)
[BlockConv] Acc: 0.8000 (0.8634, 43169/50000) Loss: 2.6970 (3.0334)
[BlockLinear] Acc: 0.7875 (0.8558, 42789/50000) Loss: 2.6793 (3.0491)
[Info][Test Epoch 399/400] [loss 0.4259] [acc 0.8633]
[Sequential] Acc: 0.6875 (0.7101, 7101/10000) Loss: 1.6176 (3.4179)
[BlockConv] Acc: 0.6250 (0.7316, 7316/10000) Loss: 1.6897 (3.3831)
[BlockConv] Acc: 0.7500 (0.7777, 7777/10000) Loss: 1.5615 (3.2262)
[BlockConv] Acc: 0.7500 (0.8093, 8093/10000) Loss: 1.3019 (3.1340)
[BlockConv] Acc: 0.7500 (0.8350, 8350/10000) Loss: 1.2648 (3.0709)
[BlockConv] Acc: 0.7500 (0.8573, 8573/10000) Loss: 1.2363 (3.0167)
[BlockConv] Acc: 0.7500 (0.8650, 8650/10000) Loss: 1.2136 (2.9883)
[BlockLinear] Acc: 0.7500 (0.8627, 8627/10000) Loss: 1.2342 (2.9913)
I am also having trouble finding the code implementation of Equation 10 in the paper. Could you please point me to where it is?