Azure Container Instances is a great deployment option for ML model endpoints that receive consistent traffic or need accelerators such as GPUs. You can also add autoscaling capabilities on top of it.
- An active Azure account configured on the machine with Azure CLI installed and configured
  - Install instruction: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli (Version >= 2.6.0)
  - Configure Azure account instruction: https://docs.microsoft.com/en-us/cli/azure/authenticate-azure-cli
- Docker is installed and running on the machine
  - Install instruction: https://docs.docker.com/install
- Install the required python packages

  ```bash
  $ pip install -r requirements.txt
  ```
To try this out, let's deploy the IrisClassifier demo from the BentoML quick start guide.

1. Build and save the Bento bundle from the BentoML quick start guide notebook mentioned above.

2. Create the Azure Container Instance deployment with the deployment tool. Make sure that you copy the config file and make the changes required for your deployment. The reference for the config file is given below.

   Run the deploy script in the command line:

   ```bash
   $ BENTO_BUNDLE_PATH=$(bentoml get IrisClassifier:latest --print-location -q)
   $ python deploy.py $BENTO_BUNDLE_PATH iristest azure_config.json

   # Sample output
   Creating Azure ACR iristest0acr
   Build and push image iristest0acr.azurecr.io/irisclassifier:20210803234622_65f4f4
   Generating ACI template
   Creating the Container Instance
   Deployment successful!
   ```

3. Get Container Instance deployment information and status:

   ```bash
   $ python describe.py iristest azure_config.json

   # Sample output
   {
     "state": "Running",
     "IPAddress": {
       "dnsNameLabel": "iristest-aci",
       "fqdn": "iristest-aci.eastus.azurecontainer.io",
       "ip": "20.81.69.156",
       "ports": [
         {
           "port": 5000,
           "protocol": "TCP"
         }
       ],
       "type": "Public"
     }
   }
   ```

4. Make a sample request against the deployed service:

   ```bash
   $ curl -i \
     --header "Content-Type: application/json" \
     --request POST \
     --data '[[5.1, 3.5, 1.4, 0.2]]' \
     http://iristest-aci.eastus.azurecontainer.io:5000/predict

   # Sample output
   HTTP/1.1 200 OK
   Content-Type: application/json
   X-Request-Id: 3ca3526e-4278-4812-9d30-a448f43e878d
   Content-Length: 3
   Date: Wed, 04 Aug 2021 19:08:43 GMT
   Server: Python/3.8 aiohttp/3.7.4.post0

   [0]
   ```

5. Delete the Container Instance deployment:

   ```bash
   $ python delete.py iristest azure_config.json

   # Sample output
   Deleting ACR
   Deleting Container Instance
   ```
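If you prefer Python over curl, the same prediction request can be built with the standard library. This is a sketch, not part of the deployment tool; the URL comes from the describe output above and will differ for your own deployment:

```python
import json
import urllib.request

# Endpoint from the describe output above; yours will differ
URL = "http://iristest-aci.eastus.azurecontainer.io:5000/predict"

def build_predict_request(features, url=URL):
    """Build a POST request equivalent to the curl command above."""
    payload = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request([[5.1, 3.5, 1.4, 0.2]])
# urllib.request.urlopen(req) sends it once the deployment is live
```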
Use command line

```bash
$ python deploy.py <Bento_bundle_path> <Deployment_name> <Config_JSON default is azure_config.json>
```

Example:

```bash
$ MY_BUNDLE_PATH=$(bentoml get IrisClassifier:latest --print-location -q)
$ python deploy.py $MY_BUNDLE_PATH my_first_deployment azure_config.json
```

Use Python API

```python
from deploy import deploy_to_azure

deploy_to_azure(BENTO_BUNDLE_PATH, DEPLOYMENT_NAME, CONFIG_JSON)
```
You can use a config file to specify the details of your deployment. A sample config is provided below:
```json
{
  "resource_group_name": "bentoml",
  "location": "location",
  "acr_sku": "Standard",
  "port": 5000,
  "memory": 2,
  "cpu_count": 1,
  "gpu": {"count": 1, "type": "K80"},
  "environment_vars": {
    "var": "value",
    "another_var": "value"
  }
}
```
- `resource_group_name`: All Azure resources are created inside a resource group. If you already have a resource group that you would like to use for the deployment, put its name here. If you don't have one, you can easily create it with `az group create --name <rg_name> --location <location>`
- `location`: Azure region or location that you want to deploy to. By default it will use the same one as your resource group
- `acr_sku`: The SKU of the container registry. Allowed values: Basic, Classic, Premium, Standard. Default is `Standard`
- `port`: The port you want the endpoint to use. By default it is 5000
- `memory`: The memory (in GB) you want each instance to have
- `cpu_count`: The number of CPU cores you want for your instance
- `gpu`: Optional field which specifies the GPU you want the instance to have. Takes a dict with `count` and `type` specified. Possible types are K80, P100, V100, e.g. `"gpu": {"count": 1, "type": "K80"}`
- `environment_vars`: Optional field to specify any additional environment variables you want to pass to the container instance, e.g. `"environment_vars": {"BENTOML_ENABLE_MICROBATCH": "", "BENTOML_MB_MAX_LATENCY": "100"}`
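Before deploying, it can be handy to sanity-check the config file against the field reference above. The following helper is a minimal sketch (it is not part of the deployment tool; the field names and GPU types follow the reference above):

```python
import json

# Field names from the config reference above
REQUIRED = {"resource_group_name", "location", "acr_sku", "port", "memory", "cpu_count"}
OPTIONAL = {"gpu", "environment_vars"}

def validate_config(path):
    """Load an azure_config.json-style file and check its fields."""
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"missing config fields: {sorted(missing)}")
    unknown = cfg.keys() - REQUIRED - OPTIONAL
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    # GPU types listed in the reference above; compared case-insensitively
    if "gpu" in cfg and cfg["gpu"]["type"].upper() not in {"K80", "P100", "V100"}:
        raise ValueError("gpu type must be one of K80, P100, V100")
    return cfg
```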
Use command line

```bash
$ python update.py <Bento_bundle_path> <Deployment_name> <Config_JSON>
```

Use Python API

```python
from update import update_azure

update_azure(BENTO_BUNDLE_PATH, DEPLOYMENT_NAME, CONFIG_JSON)
```

Note that there are limitations on which features of the container instance you can update without first deleting and recreating the instance. Changing properties such as CPU, memory, or GPU resources requires you to delete and redeploy, but updating the image works just fine. For more info, check out the official docs.
Use command line

```bash
$ python describe.py <Deployment_name> <Config_JSON>
```

Use Python API

```python
from describe import describe_azure

describe_azure(DEPLOYMENT_NAME)
```
Use command line

```bash
$ python delete.py <Deployment_name> <Config_JSON>
```

Use Python API

```python
from delete import delete_azure

delete_azure(DEPLOYMENT_NAME)
```