Spot VMs let you use unused Azure capacity at significant cost savings. Whenever Azure needs the capacity back, the Azure infrastructure evicts Spot VMs. Spot VMs are therefore a good fit for workloads that can handle interruptions, such as batch processing jobs, dev/test environments, and large compute workloads.
A Spot VM offers no high availability guarantees. Whenever Azure needs the capacity back, the Azure infrastructure evicts Spot VMs with 30 seconds' notice.
To handle evictions efficiently, checkpointing is recommended. A checkpointing scheme saves the execution state of a task whenever a given condition is met and, after a failure, restores the task from the last saved point.
Use the second hard drive, mounted at /datadrive, to store your data.
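The checkpointing idea described above can be sketched in Python as follows. This is only an illustration, not the project's own code: the file name `checkpoint.json`, the state layout, and the `run` workload (summing a list) are hypothetical, and the only detail taken from this README is that data lives under /datadrive.

```python
import json
import os
import tempfile

# Hypothetical checkpoint file on the data disk mentioned above.
CHECKPOINT_PATH = "/datadrive/checkpoint.json"


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write the state atomically so an eviction mid-write cannot corrupt it."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    # Rename within the same directory replaces the old checkpoint in one step.
    os.replace(tmp_path, path)


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0, "total": 0}


def run(items, path=CHECKPOINT_PATH, every=10):
    """Sum the items, saving progress every `every` items and resuming on restart."""
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        state["total"] += items[i]
        state["next_item"] = i + 1
        if state["next_item"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

Calling `run(...)` again after an eviction picks up from the last saved `next_item` instead of starting over. At most `every` items of work are repeated, so a smaller value trades more disk writes for less lost work.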
In this project we provide:
- Manifest to create VMs using Terraform.
- Manifest to configure VMs using Ansible.
- Monitoring system that reports the status of the VM through a Slack channel.
- Script that tries to restart the machine once every hour if it has been deallocated by Azure and the restart tag is set to true.
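The restart rule in the last item can be sketched as a small decision function plus an hourly loop. This is a hedged illustration rather than the script shipped with the project: the tag name `restart`, the function names, and the injected `get_vm_state` and `start_vm` callables are assumptions. The status code `PowerState/deallocated` is the value Azure reports in a VM's instance view for a deallocated machine.

```python
import time


def is_deallocated(status_codes):
    """True if the VM instance view reports the deallocated power state."""
    return "PowerState/deallocated" in status_codes


def restart_requested(tags):
    """True if the (assumed) `restart` tag is set to true, ignoring case."""
    return str(tags.get("restart", "")).strip().lower() == "true"


def should_restart(status_codes, tags):
    """Restart only when the VM is deallocated AND the restart tag is true."""
    return is_deallocated(status_codes) and restart_requested(tags)


def watch(get_vm_state, start_vm, interval_seconds=3600,
          sleep=time.sleep, max_checks=None):
    """Check the VM once per interval and start it when the rule matches.

    get_vm_state() must return (status_codes, tags); start_vm() starts the VM.
    Both are hypothetical hooks to be backed by the Azure CLI or SDK.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        status_codes, tags = get_vm_state()
        if should_restart(status_codes, tags):
            start_vm()
        checks += 1
        if max_checks is None or checks < max_checks:
            sleep(interval_seconds)
```

Keeping the decision separate from the Azure calls makes the rule easy to test without touching a real subscription. Setting the tag to false, as in the stop procedure described later, is what keeps the loop from restarting a VM you stopped on purpose.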
TO DO
- Sign in to the Azure portal at https://portal.azure.com.
- From the main menu on the left side, select Resource groups.
- From the list of resource groups, select your Resource Group, and select the VM.
- Select the Start button on the overview page for your virtual machine.
- From the menu on the left side, select Tags and change the restart tag from false to true.
- Select the Stop button on the overview page for your virtual machine.
- From the menu on the left side, select Tags, change the restart tag from true to false, and click the Save button.
- Select the Stop button on the overview page for your virtual machine.
- From the menu on the left side, select Size, choose the appropriate VM size from the list, and click the Resize button.
- User [email protected] requested to start the VMGPUSPOTTEST virtual machine.
VM VMGPUSPOTTEST was Started by [email protected]
- The VM is ready. You can connect to the VM through your favorite SSH client.
VM VMGPUSPOTTEST is Ready
- User [email protected] requested to shut down the VMGPUSPOTTEST virtual machine.
VM VMGPUSPOTTEST was Stopped by [email protected]
- The VM gets evicted. You get a 30-second notification before the actual eviction.
VM VMGPUSPOTTEST will be evicted in 30s
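One way a VM can learn about an upcoming eviction is the Azure Scheduled Events endpoint of the Instance Metadata Service, which reports Spot evictions as events of type `Preempt`. The sketch below assumes that approach; this README does not say how the project's monitoring detects evictions, and the function names are hypothetical. The message text mirrors the notification shown above.

```python
import json
import urllib.request

# Instance Metadata Service endpoint, reachable only from inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(url=SCHEDULED_EVENTS_URL, timeout=5):
    """Fetch the current scheduled events document (requires the Metadata header)."""
    request = urllib.request.Request(url, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def preempted_vms(document):
    """Return the names of VMs that have a pending Preempt (eviction) event."""
    names = []
    for event in document.get("Events", []):
        if event.get("EventType") == "Preempt":
            names.extend(event.get("Resources", []))
    return names


def eviction_message(vm_name):
    """Format the eviction notification in the style shown above."""
    return f"VM {vm_name} will be evicted in 30s"
```

Because the notice is short, a process using this would need to poll `fetch_scheduled_events()` every few seconds and save a checkpoint as soon as `preempted_vms(...)` returns the local VM name.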