The following tasks are performed:
- A pretrained ResNet50 model from the PyTorch torchvision library is used in the project (https://pytorch.org/vision/master/generated/torchvision.models.resnet50.html)
- Fine-tune the model with hyperparameter tuning and network re-shaping
- Implement Profiling and Debugging with hooks
- Deploy the model and perform inference
Enter AWS through the gateway in the course and open SageMaker Studio. Download the starter files, then download the dataset or make it available.
Udacity's Dog Classification dataset is used to complete the task.
The dataset can be downloaded here.
Python 3.7
PyTorch AWS Instance
hpo.py
- This script is used by the hyperparameter tuning jobs to train and test/validate models with different hyperparameters in order to find the best combination.

train_model.py
- This script is used by the training job to train and test/validate the model with the best hyperparameters obtained from hyperparameter tuning.

inference.py
- This script is used by the deployed endpoint to perform preprocessing (transformations), serialization/deserialization, predictions/inferences, and post-processing using the model saved by the training job.

train_and_deploy.ipynb
- This Jupyter notebook contains all the code and the steps performed in this project, along with their outputs.
- The ResNet model implements the deep residual learning framework, which eases the training of deep networks.
- A pair of fully connected layers has been added on top of the pretrained model to perform the classification task, with 133 output nodes (one per dog breed).
- AdamW from torch.optim is used as the optimizer.
- The following hyperparameter ranges are used:
  - "batch-size": sagemaker.tuner.CategoricalParameter([32, 64, 128, 256])
  - "lr": sagemaker.tuner.ContinuousParameter(0.01, 0.1)
  - "epochs": sagemaker.tuner.IntegerParameter(2, 4)
The hpo.py script is used to perform hyperparameter tuning.
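SageMaker passes each tuned hyperparameter to the training script as a command-line argument. A minimal sketch of how hpo.py might read them, assuming argument names matching the ranges above:

```python
import argparse

def parse_args(argv=None):
    # SageMaker injects each tuned hyperparameter as a CLI flag,
    # e.g. --batch-size 128 --lr 0.01 --epochs 3.
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--lr", type=float, default=0.05)
    parser.add_argument("--epochs", type=int, default=2)
    return parser.parse_args(argv)

args = parse_args(["--batch-size", "128", "--lr", "0.01", "--epochs", "3"])
print(args.batch_size, args.lr, args.epochs)  # -> 128 0.01 3
```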
A graphical representation of the cross entropy loss is shown below.
Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?
- The loss curve is not smooth; it shows distinct highs and lows across the batch sets.

If not, suppose there was an error. What would that error look like and how would you have fixed it?
- A proper mix of the batches with shuffling could help the model learn better.
- Trying out a different neural network architecture could also help.
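Shuffling is enabled on the training DataLoader so each epoch sees the samples in a different order, mixing the batches. A minimal sketch, where the toy tensors stand in for the actual dog-breed training set:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the real training data (an assumption):
# 8 small random "images" with labels in the 133-class range.
dataset = TensorDataset(torch.randn(8, 3, 8, 8), torch.randint(0, 133, (8,)))

# shuffle=True re-orders the samples every epoch, so consecutive
# batches are not drawn from the same fixed slice of the dataset.
train_loader = DataLoader(dataset, batch_size=4, shuffle=True)

for images, labels in train_loader:
    print(images.shape, labels.shape)
```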
The profiler report can be found here.
- The model was deployed to an "ml.m5.large" instance type, and the "endpoint_inference.py" script is used to set up and deploy the working endpoint.
- For testing purposes, a few test images are stored in the "testImages" folder.
- Those images are fed to the endpoint for inference.
- The inference is performed using both of the approaches.
- Using the Predictor Object