AlexNet vs. ResNet

Zhuoquan
Jun 4, 2021

INTRODUCTION:

The purpose of this test is to compare how well the AlexNet and ResNet models perform under fine-tuning. I focus on two parts: how the prediction accuracies of the two models differ when (1) the final fully connected layer is reset and the whole network is fine-tuned, and (2) the whole network is frozen except for the final layer. Here is what I prepared (a short setup sketch follows the list below):

> AlexNet model and ResNet model

> “hymenoptera_data” dataset
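For reference, here is a minimal data-loading sketch, assuming the directory layout used by the PyTorch transfer-learning tutorial (hymenoptera_data/train and hymenoptera_data/val, each containing an ants and a bees folder); the transforms, batch size, and paths below are illustrative assumptions, not a record of the exact run.

import torch
from torchvision import datasets, transforms

# Standard ImageNet-style preprocessing; augmentation only on the training split.
data_transforms = {
    "train": transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]),
    "val": transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]),
}

image_datasets = {x: datasets.ImageFolder(f"hymenoptera_data/{x}", data_transforms[x])
                  for x in ["train", "val"]}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                              shuffle=True, num_workers=4)
               for x in ["train", "val"]}
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")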

AlexNet:

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Phase 1: The Outcome of Resetting Final Fully Connected Layers
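Before the log, here is roughly what this phase's setup looks like: the pretrained AlexNet is loaded, its last fully connected layer is replaced by a fresh two-class layer (ants vs. bees), and every parameter stays trainable. The optimizer and scheduler settings below (SGD with momentum, step decay every 7 epochs) are assumptions borrowed from the standard fine-tuning recipe, not a record of the exact run.

import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models

# Phase 1 sketch: reset the final fully connected layer and fine-tune the whole network.
model_ft = models.alexnet(pretrained=True)
num_ftrs = model_ft.classifier[6].in_features      # 4096
model_ft.classifier[6] = nn.Linear(num_ftrs, 2)    # new two-class head
model_ft = model_ft.to(device)                     # device from the loading sketch above

criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)   # all parameters
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

Training then runs for 15 epochs with the usual train/val loop, producing the log below.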

Epoch 0/14 | train Loss: 0.6595 Acc: 0.6598 | val Loss: 0.4382 Acc: 0.8235
Epoch 1/14 | train Loss: 0.4752 Acc: 0.7664 | val Loss: 0.4759 Acc: 0.8105
Epoch 2/14 | train Loss: 0.4995 Acc: 0.7787 | val Loss: 0.4776 Acc: 0.7778
Epoch 3/14 | train Loss: 0.3268 Acc: 0.8566 | val Loss: 0.6494 Acc: 0.8301
Epoch 4/14 | train Loss: 0.4099 Acc: 0.8279 | val Loss: 0.4451 Acc: 0.8301
Epoch 5/14 | train Loss: 0.4701 Acc: 0.7951 | val Loss: 0.5185 Acc: 0.6928
Epoch 6/14 | train Loss: 0.3866 Acc: 0.8320 | val Loss: 0.2749 Acc: 0.8889
Epoch 7/14 | train Loss: 0.2740 Acc: 0.8852 | val Loss: 0.2851 Acc: 0.8627
Epoch 8/14 | train Loss: 0.2431 Acc: 0.9057 | val Loss: 0.3033 Acc: 0.8824
Epoch 9/14 | train Loss: 0.2497 Acc: 0.8893 | val Loss: 0.2900 Acc: 0.8889
Epoch 10/14 | train Loss: 0.2187 Acc: 0.8975 | val Loss: 0.2969 Acc: 0.8824
Epoch 11/14 | train Loss: 0.2277 Acc: 0.8934 | val Loss: 0.2945 Acc: 0.8889
Epoch 12/14 | train Loss: 0.1979 Acc: 0.9426 | val Loss: 0.2922 Acc: 0.8889
Epoch 13/14 | train Loss: 0.1998 Acc: 0.9180 | val Loss: 0.2901 Acc: 0.9020
Epoch 14/14 | train Loss: 0.1743 Acc: 0.9262 | val Loss: 0.2924 Acc: 0.8889

Training complete in 14m 23s

Best val Acc: 0.901961

Phase 2: The Outcome of Freezing All The Network Except The Final Layers
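Phase 2 starts from the same pretrained weights but freezes everything except the new final layer, so only the two-class head receives gradient updates. A rough sketch of the difference, reusing the imports and the same assumed optimizer settings as above:

# Phase 2 sketch: freeze the whole network except the final layer.
model_conv = models.alexnet(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False                    # freeze the pretrained weights

num_ftrs = model_conv.classifier[6].in_features
model_conv.classifier[6] = nn.Linear(num_ftrs, 2)  # new layer, requires_grad=True by default
model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()
# Only the final layer's parameters are handed to the optimizer.
optimizer_conv = optim.SGD(model_conv.classifier[6].parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)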

Epoch 0/14 | train Loss: 0.6314 Acc: 0.6598 | val Loss: 0.2297 Acc: 0.9477
Epoch 1/14 | train Loss: 0.5601 Acc: 0.7500 | val Loss: 0.2439 Acc: 0.9281
Epoch 2/14 | train Loss: 0.4822 Acc: 0.7869 | val Loss: 0.1763 Acc: 0.9346
Epoch 3/14 | train Loss: 0.4494 Acc: 0.7910 | val Loss: 0.1831 Acc: 0.9412
Epoch 4/14 | train Loss: 0.4496 Acc: 0.7992 | val Loss: 0.2922 Acc: 0.9085
Epoch 5/14 | train Loss: 0.3606 Acc: 0.8525 | val Loss: 0.3132 Acc: 0.9020
Epoch 6/14 | train Loss: 0.4802 Acc: 0.7828 | val Loss: 0.1922 Acc: 0.9477
Epoch 7/14 | train Loss: 0.3938 Acc: 0.8156 | val Loss: 0.2064 Acc: 0.9412
Epoch 8/14 | train Loss: 0.3838 Acc: 0.8361 | val Loss: 0.2250 Acc: 0.9216
Epoch 9/14 | train Loss: 0.3087 Acc: 0.8566 | val Loss: 0.2187 Acc: 0.9346
Epoch 10/14 | train Loss: 0.3046 Acc: 0.8689 | val Loss: 0.2033 Acc: 0.9346
Epoch 11/14 | train Loss: 0.3461 Acc: 0.8361 | val Loss: 0.1966 Acc: 0.9477
Epoch 12/14 | train Loss: 0.3689 Acc: 0.8443 | val Loss: 0.2092 Acc: 0.9412
Epoch 13/14 | train Loss: 0.4293 Acc: 0.8279 | val Loss: 0.1785 Acc: 0.9477
Epoch 14/14 | train Loss: 0.3445 Acc: 0.8607 | val Loss: 0.2066 Acc: 0.9346

Training complete in 11m 6s

Best val Acc: 0.947712

ResNet:

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
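The printout corresponds to torchvision's resnet18, whose single fully connected head is model.fc (512 inputs). The same two phases are run for it; only the head replacement (and, in phase 2, the optimizer's parameter list) changes. A small sketch, with the same assumed settings as the AlexNet snippets:

# ResNet18 sketch: identical strategies, but the head is model.fc instead of classifier[6].
model = models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features                    # 512

# Phase 1: replace the head and fine-tune all parameters.
model.fc = nn.Linear(num_ftrs, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Phase 2 variant: freeze the backbone first, then train only the new head.
# for param in model.parameters():
#     param.requires_grad = False
# model.fc = nn.Linear(num_ftrs, 2)
# optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)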

Phase 1: The Outcome of Resetting Final Fully Connected Layers

Epoch 0/14 | train Loss: 0.5882 Acc: 0.7459 | val Loss: 0.1426 Acc: 0.9346
Epoch 1/14 | train Loss: 0.5981 Acc: 0.7951 | val Loss: 0.4745 Acc: 0.8301
Epoch 2/14 | train Loss: 0.6853 Acc: 0.7828 | val Loss: 0.8639 Acc: 0.7778
Epoch 3/14 | train Loss: 0.9354 Acc: 0.6967 | val Loss: 0.3016 Acc: 0.8758
Epoch 4/14 | train Loss: 0.6290 Acc: 0.7541 | val Loss: 0.3368 Acc: 0.8562
Epoch 5/14 | train Loss: 0.5338 Acc: 0.7541 | val Loss: 0.4152 Acc: 0.8301
Epoch 6/14 | train Loss: 0.4063 Acc: 0.8320 | val Loss: 0.3412 Acc: 0.9020
Epoch 7/14 | train Loss: 0.4077 Acc: 0.8279 | val Loss: 0.2137 Acc: 0.9477
Epoch 8/14 | train Loss: 0.3439 Acc: 0.8770 | val Loss: 0.2119 Acc: 0.9412
Epoch 9/14 | train Loss: 0.3291 Acc: 0.8607 | val Loss: 0.2117 Acc: 0.9346
Epoch 10/14 | train Loss: 0.4157 Acc: 0.8443 | val Loss: 0.2027 Acc: 0.9412
Epoch 11/14 | train Loss: 0.2794 Acc: 0.9016 | val Loss: 0.2076 Acc: 0.9346
Epoch 12/14 | train Loss: 0.2976 Acc: 0.8770 | val Loss: 0.2107 Acc: 0.9281
Epoch 13/14 | train Loss: 0.2501 Acc: 0.9057 | val Loss: 0.1999 Acc: 0.9412
Epoch 14/14 | train Loss: 0.3074 Acc: 0.8811 | val Loss: 0.2140 Acc: 0.9412

Training complete in 20m 60s

Best val Acc: 0.947712

Phase 2: The Outcome of Freezing All The Network Except The Final Layers

Epoch 0/14 | train Loss: 0.6285 Acc: 0.6475 | val Loss: 0.2612 Acc: 0.9085
Epoch 1/14 | train Loss: 0.5436 Acc: 0.7459 | val Loss: 0.2110 Acc: 0.9150
Epoch 2/14 | train Loss: 0.5486 Acc: 0.7500 | val Loss: 0.3061 Acc: 0.8824
Epoch 3/14 | train Loss: 0.7213 Acc: 0.7336 | val Loss: 0.2644 Acc: 0.9216
Epoch 4/14 | train Loss: 0.3707 Acc: 0.8443 | val Loss: 0.1817 Acc: 0.9477
Epoch 5/14 | train Loss: 0.4422 Acc: 0.8197 | val Loss: 0.3583 Acc: 0.8758
Epoch 6/14 | train Loss: 0.5729 Acc: 0.7869 | val Loss: 0.2564 Acc: 0.9216
Epoch 7/14 | train Loss: 0.4123 Acc: 0.8320 | val Loss: 0.1989 Acc: 0.9477
Epoch 8/14 | train Loss: 0.3596 Acc: 0.8238 | val Loss: 0.1899 Acc: 0.9477
Epoch 9/14 | train Loss: 0.3535 Acc: 0.8484 | val Loss: 0.1852 Acc: 0.9477
Epoch 10/14 | train Loss: 0.3129 Acc: 0.8525 | val Loss: 0.1914 Acc: 0.9412
Epoch 11/14 | train Loss: 0.2913 Acc: 0.8689 | val Loss: 0.1811 Acc: 0.9542
Epoch 12/14 | train Loss: 0.3694 Acc: 0.8361 | val Loss: 0.1829 Acc: 0.9542
Epoch 13/14 | train Loss: 0.3626 Acc: 0.8607 | val Loss: 0.2132 Acc: 0.9216
Epoch 14/14 | train Loss: 0.3358 Acc: 0.8279 | val Loss: 0.2175 Acc: 0.9346

Training complete in 9m 5s

Best val Acc: 0.954248

CONCLUSION:

In this test, phase 1 of the AlexNet model trains in 14m 23s with a best validation accuracy of 0.901961, while phase 2 trains in 11m 6s with a best validation accuracy of 0.947712, so AlexNet's total training time is 25m 29s. Phase 1 of the ResNet model trains in 20m 60s (effectively 21m) with a best validation accuracy of 0.947712, while phase 2 trains in 9m 5s with a best validation accuracy of 0.954248, so ResNet's total training time is 30m 5s.

Going from resetting the final fully connected layer (phase 1) to freezing the whole network except the final layer (phase 2), the validation accuracy does improve, and the improvement is larger for AlexNet than for ResNet. ResNet also uses more total run time than AlexNet, which is the opposite of what I expected: AlexNet has more parameters than ResNet, so I assumed its computation would be more expensive. My guess is that the depth of the network, rather than the parameter count, determines the run time; most of AlexNet's parameters sit in its fully connected layers, which are cheap to compute compared with ResNet's much deeper stack of convolutions. Finally, the test shows that ResNet's accuracy ends up slightly higher than AlexNet's.
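As a quick check on the parameter-count claim, one can count the weights directly; assuming the torchvision pretrained models used above, this should print roughly 61M for AlexNet against roughly 12M for ResNet18, which is why the run times were surprising.

# Rough parameter-count comparison for the run-time discussion above.
from torchvision import models

def count_params(model):
    return sum(p.numel() for p in model.parameters())

print("AlexNet params: ", count_params(models.alexnet(pretrained=True)))    # ~61.1M
print("ResNet18 params:", count_params(models.resnet18(pretrained=True)))   # ~11.7M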
