- You could simply do pruning after the pre-training stage (see the pruning sketch after this list).
- You could maybe also try a student-teacher network (knowledge distillation) to reduce your model's parameter count (sketch after this list too): https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764
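For the pruning suggestion, here is a minimal sketch of what post-training magnitude pruning could look like with torch.nn.utils.prune. The 30% amount and the conv-only scope are illustrative choices, not something we actually ran:

import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_layers(model, amount=0.3):
    # Zero out the smallest-magnitude `amount` fraction of weights in every conv layer
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Bake the mask into the weight tensor so the pruning is permanent
            prune.remove(module, "weight")
    return model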
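And for the student-teacher idea, the standard knowledge-distillation loss boils down to something like this. The temperature T and mixing weight alpha are illustrative; the teacher is the big pretrained model and the student is the smaller one you want to keep:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft part: push the student's softened predictions toward the teacher's
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard part: plain cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard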
Then someone asks this question, and I don't know how we did it:
What's the fine-tuning strategy? Are you just training the head of the model, or unfreezing everything?
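For context, the two options the question is contrasting look roughly like this in PyTorch. This is illustrative only, since I don't know which one we actually used; model is an EfficientNet like in the code further down, and _fc is its classification head:

import torch
from efficientnet_pytorch import EfficientNet

model = EfficientNet.from_pretrained("efficientnet-b0")

# Option A: train only the head, keep the backbone frozen
for p in model.parameters():
    p.requires_grad = False
for p in model._fc.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

# Option B: unfreeze everything and fine-tune end to end, usually with a smaller learning rate
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)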
Then there's this other solution that sounds interesting, but I need to think about it:
If you don't want to remove the original feature detection, then create a second decoder (classification/segmentation/whatever) that accepts the output of the encoder.
You can train it separately and then combine them in production, or you can freeze all the layers, attach the second decoder (unfrozen), and train from there.
He even wrote some code for us!
import torch
import torch.nn as nn
import torch.nn.functional as F
from efficientnet_pytorch import EfficientNet

class ComboNetB0(nn.Module):
    def __init__(self, fc1_n_classes, fc2_n_classes, **override_params):
        super().__init__()
        # Create a pretrained EfficientNet model from the efficientnet_pytorch package
        self.network = EfficientNet.from_pretrained("efficientnet-b0",
                                                    num_classes=fc1_n_classes,
                                                    **override_params)
        # The pretrained model creates self.network._fc, the final classification layer;
        # in our setup it looks something like this:
        # self.network._fc = nn.Sequential(nn.Linear(self.network._fc.in_features, 512),
        #                                  nn.ReLU(),
        #                                  nn.Dropout(0.25),
        #                                  nn.Linear(512, 128),
        #                                  nn.ReLU(),
        #                                  nn.Dropout(0.50),
        #                                  nn.Linear(128, num_classes))
        # Create a second final layer for the new classification set
        self.network._fc2 = nn.Sequential(nn.Linear(self.network._fc.in_features, 512),
                                          nn.ReLU(),
                                          nn.Dropout(0.25),
                                          nn.Linear(512, 128),
                                          nn.ReLU(),
                                          nn.Dropout(0.50),
                                          nn.Linear(128, fc2_n_classes))

    def forward(self, x):
        # Shared encoder: feature maps of shape (batch, 1280, H, W) for b0
        features = self.network.extract_features(x)
        # Global average pool and flatten so the linear heads can consume the features
        pooled = F.adaptive_avg_pool2d(features, 1).flatten(start_dim=1)
        out1 = self.network._fc(pooled)   # logits of the pretrained classification set
        out2 = self.network._fc2(pooled)  # logits of your second classification set
        # Concatenate along the class dimension so both heads come back in one tensor
        return torch.cat((out1, out2), dim=1)
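A quick usage sketch for the combo model (the class counts and input size are made up), plus how you could freeze everything except the new head, as he suggested:

model = ComboNetB0(fc1_n_classes=1000, fc2_n_classes=5)
x = torch.randn(8, 3, 224, 224)                    # batch of 8 RGB images
logits = model(x)                                  # shape (8, 1000 + 5)
out1, out2 = logits[:, :1000], logits[:, 1000:]    # split back into the two heads

# Freeze everything except the second decoder before training it
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("network._fc2")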