ec2-spot-labs is a collection of code examples and scripts that illustrates some of the best practices in using Amazon EC2 Spot Instances.
Please address any issues or feedback via issues.
Collection of tools and code examples to demonstrate best practices in using Amazon EC2 Spot Instances.
Home Page: https://aws.amazon.com/ec2/spot/
License: Other
Hi, I have a use case where I need to use a Spot Fleet. I'm referring to the solution in "asg-capacity-optimized.json". I would like to know how to handle interruptions if I use this solution. Also, with this capacity-optimized Spot Fleet strategy, how can I use a combination of On-Demand and Spot Instances?
I'm going to try this from the kops launch configuration template to provision a self-managed Kubernetes cluster. I believe this should work. Please advise.
Thanks for making it clear. I've also watched your video at https://www.youtube.com/watch?v=whFb8YHjdFo and it's a pretty clear demonstration. We're planning to use this in a dev environment first. The plan is to keep away from the spotinst tool.
When deploying the app with CodeDeploy, the deployment fails because vfsStream is written with an uppercase S.
The fix is very simple: edit composer.json and change vfsStream to vfsstream.
I made the change locally and verified that it works. I will create a pull request with the fix soon.
https://github.com/awslabs/ec2-spot-labs/blob/master/ecs-ec2-spot-fleet/ecs-ec2-spot-fleet.yaml#L442
[ec2-user@ip-172-31-7-218 ~]$ curl --version
curl 7.61.1 (x86_64-koji-linux-gnu) libcurl/7.61.1 OpenSSL/1.0.2k zlib/1.2.7 libidn2/2.3.0 libssh2/1.4.3 nghttp2/1.41.0
Release-Date: 2018-09-05
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz HTTP2 UnixSockets HTTPS-proxy Metalink
[ec2-user@ip-172-31-7-218 ~]$ [ -z $(curl -Isf http://169.254.169.254/latest/meta-data/spot/termination-time) ];
[ec2-user@ip-172-31-7-218 ~]$
[ec2-user@ip-172-31-7-218 ~]$ curl --version
curl 7.76.1 (x86_64-koji-linux-gnu) libcurl/7.76.1 OpenSSL/1.0.2k-fips zlib/1.2.7 libidn2/2.3.0 libssh2/1.4.3 nghttp2/1.41.0
Release-Date: 2021-04-14
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB SPNEGO SSL UnixSockets
[ec2-user@ip-172-31-7-218 ~]$ [ -z $(curl -Isf http://169.254.169.254/latest/meta-data/spot/termination-time) ];
-bash: [: too many arguments
If a customer uses the spot-instance-termination-notice-handler.sh script in their environment, the change in curl's output format makes the test fail with "too many arguments", so the code always executes the else branch. As a result, every registered container instance is put into the draining state permanently, regardless of whether the instance is On-Demand or Spot.
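One way to sidestep the dependence on curl's header output is to decide from the HTTP status code alone. The following is a minimal Python sketch, not the repository's script. It assumes the metadata endpoint answers without a session token (IMDSv1), and the function name and the injectable `opener` parameter are hypothetical, added only so the logic can be exercised off-instance.

```python
import urllib.error
import urllib.request

# Same metadata path the shell test above queries.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def termination_scheduled(url=TERMINATION_URL, opener=urllib.request.urlopen, timeout=2):
    """Return True only when the metadata path answers with HTTP 200.

    A 404 means no termination time is published, so the instance
    should not be drained. Any other failure is re-raised so the
    caller can decide what to do.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Because the decision is based on a status code rather than on text that is later word-split by the shell, a change in how a client prints headers does not alter the result.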
On a mac.
MacBook-Pro:ec2-spot-duration XXXXXX$ python get_spot_duration.py --r us-east-1 --product-description 'Linux/UNIX' --bids c5.18xlarge:3
Duration Instance Type Availability Zone
I have tried different bids and still get nothing. I have also tried --region instead of --r.
The AWS CLI is up to date, as is AWS Shell.
The daemon set returns 404 because /meta-data/spot/ is not available. I checked in the EC2 console that the instance Lifecycle is spot.
Any other ways to get interruption notices?
This is what I get when I kubectl exec into the pod that runs the spot-sig.
/ # curl http://169.254.169.254/latest/meta-data/spot
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>404 - Not Found</title>
</head>
<body>
<h1>404 - Not Found</h1>
</body>
</html>
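On the question of other ways to get notices: per the EC2 documentation, the `spot/instance-action` metadata item is only present once an interruption is scheduled, so a 404 beforehand is expected. EC2 also emits an "EC2 Spot Instance Interruption Warning" event to EventBridge. The sketch below shows how a handler might pick the instance ID out of such an event. The function name and the sample event values are hypothetical, and wiring the rule to a target is left out.

```python
def interrupted_instance_id(event):
    """Return the instance ID from a Spot interruption warning event.

    Returns None for any event that is not an EC2 Spot interruption warning.
    """
    if event.get("source") != "aws.ec2":
        return None
    if event.get("detail-type") != "EC2 Spot Instance Interruption Warning":
        return None
    return event.get("detail", {}).get("instance-id")


# Hypothetical sample event, trimmed to the fields the function reads.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
}
```

A handler built this way can then cordon or drain the matching node instead of polling the metadata service from inside a pod.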
It seems that autoscaling:DescribeTags was added twice.
Thanks for making this list. There are so many wonderful sessions at re:Invent, it's hard to get to all of them. Do you think the detailed walkthrough documentation that each Workshop and Builders Session has will be made public? They are fantastic learning resources.
Is there any way for a Lambda function to request a Spot Fleet at a specific time?
Hello,
I am new to EKS. I am following this link to create worker nodes using a combination of On-Demand and Spot Instances.
[ec2-user@ip-192-168-100-253 ec2-spot-eks-solution]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-101-67.eu-central-1.compute.internal Ready <none> 13m v1.13.8-eks-cd3eb0
ip-192-168-103-103.eu-central-1.compute.internal Ready <none> 13m v1.13.8-eks-cd3eb0
ip-192-168-103-70.eu-central-1.compute.internal Ready <none> 13m v1.13.8-eks-cd3eb0
While using the cluster autoscaler, I am getting the errors below.
E0828 16:27:42.353452 1 static_autoscaler.go:168] Failed to update node registry: Unable to get first autoscaling.Group for [REDACTED]
I0828 16:27:42.895068 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:44.905532 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:46.915096 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:48.924381 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:50.934511 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:52.353611 1 static_autoscaler.go:114] Starting main loop
E0828 16:27:52.450797 1 static_autoscaler.go:168] Failed to update node registry: Unable to get first autoscaling.Group for [REDACTED]
Here is the cluster-autoscaler policy that is attached to NodeInstanceRole:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:SetDesiredCapacity",
"autoscaling:DescribeTags",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"autoscaling:DescribeTags"
],
"Resource": "*",
"Effect": "Allow",
"Sid": "K8NodeASGPerms"
}
]
}
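The policy above lists autoscaling:DescribeTags twice. As a small illustration, the Python sketch below removes repeated actions from each statement while keeping their original order. The function name is hypothetical, and the sketch only tidies the document; it is not claimed to resolve the autoscaler error reported above.

```python
import copy


def dedupe_actions(policy):
    """Return a copy of an IAM policy with repeated actions removed per statement."""
    cleaned = copy.deepcopy(policy)
    for statement in cleaned.get("Statement", []):
        actions = statement.get("Action")
        if isinstance(actions, list):
            # dict.fromkeys keeps the first occurrence and preserves order.
            statement["Action"] = list(dict.fromkeys(actions))
    return cleaned


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:DescribeTags",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "autoscaling:DescribeTags",
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "K8NodeASGPerms",
        }
    ],
}

cleaned_policy = dedupe_actions(policy)
```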
cluster-autoscaler-ds.yaml
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --nodes=1:3:[<REDACTED>]
- --nodes=1:3:[<REDACTED>]
- --nodes=1:3:[<REDACTED>]
- --skip-nodes-with-system-pods=false
env:
- name: AWS_REGION
value: eu-central-1
Am I missing something?
Hi,
I'm trying to spin up ec2 spot instances and attach them to my EKS cluster, but your user-data instructions specified in https://github.com/awslabs/ec2-spot-labs/blob/master/ec2-spot-eks-solution/provision-worker-nodes/amazon-eks-nodegroup-with-spot.yaml are not working:
aws eks describe-cluster --region=us-east-1 --service-name=aws-eks-spot-serverless-demo-dev --query 'cluster.{certificateAuthorityData: certificateAuthority.data, endpoint: endpoint}' --debug
2018-09-23 17:40:03,396 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/1.16.13 Python/2.7.14 Linux/4.14.62-70.117.amzn2.x86_64 botocore/1.12.3
2018-09-23 17:40:03,397 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['eks', 'describe-cluster', '--region=us-east-1', '--service-name=aws-eks-spot-serverless-demo-dev', '--query', 'cluster.{certificateAuthorityData: certificateAuthority.data, endpoint: endpoint}', '--debug']
2018-09-23 17:40:03,397 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_scalar_parsers at 0x7f9b98c43758>
2018-09-23 17:40:03,397 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,398 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function register_uri_param_handler at 0x7f9b92696ed8>
2018-09-23 17:40:03,398 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,398 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x7f9b92663230>
2018-09-23 17:40:03,398 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,398 - MainThread - botocore.session - DEBUG - Loading variable credentials_file from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable config_file from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable metadata_service_timeout from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable metadata_service_num_attempts from defaults.
2018-09-23 17:40:03,400 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,401 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function attach_history_handler at 0x7f9b91e501b8>
2018-09-23 17:40:03,401 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,401 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,401 - MainThread - botocore.session - DEBUG - Loading variable api_versions from defaults.
2018-09-23 17:40:03,402 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /root/.aws/models/eks/2017-11-01/service-2.json
2018-09-23 17:40:03,418 - MainThread - botocore.hooks - DEBUG - Event service-data-loaded.eks: calling handler <function register_retries_for_service at 0x7f9b9383c7d0>
2018-09-23 17:40:03,418 - MainThread - awscli.clidriver - DEBUG - Exception caught in main()
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 207, in main
return command_table[parsed_args.command](remaining, parsed_args)
File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 341, in __call__
service_parser = self._create_parser()
File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 381, in _create_parser
command_table = self._get_command_table()
File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 326, in _get_command_table
self._command_table = self._create_command_table()
File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 348, in _create_command_table
service_model = self._get_service_model()
File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 334, in _get_service_model
self._service_name, api_version=api_version)
File "/usr/lib/python2.7/site-packages/botocore/session.py", line 540, in get_service_model
service_description = self.get_service_data(service_name, api_version)
File "/usr/lib/python2.7/site-packages/botocore/session.py", line 568, in get_service_data
service_name=service_name, session=self)
File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/usr/lib/python2.7/site-packages/botocore/handlers.py", line 278, in register_retries_for_service
service_event_name = hyphenize_service_id(service_id)
File "/usr/lib/python2.7/site-packages/botocore/utils.py", line 981, in hyphenize_service_id
return service_id.replace(' ', '-').lower()
AttributeError: 'NoneType' object has no attribute 'replace'
2018-09-23 17:40:03,419 - MainThread - awscli.clidriver - DEBUG - Exiting with rc 255
'NoneType' object has no attribute 'replace'
I guess something is wrong with the model.
AMI (official from AWS documentation): ami-0440e4f6b9713faf6
aws --version
aws-cli/1.16.13 Python/2.7.14 Linux/4.14.62-70.117.amzn2.x86_64 botocore/1.12.3
I think something strange is happening in this script:
$ python get_spot_duration.py --region us-east-1 --product-description 'Linux/UNIX' --bids c3.2xlarge:0.42
Duration Instance Type Availability Zone
11.4 c3.2xlarge us-east-1a
1.0 c3.2xlarge us-east-1e
0.6 c3.2xlarge us-east-1d
0.4 c3.2xlarge us-east-1b
0.3 c3.2xlarge us-east-1c
I think the script reports that the price in AZ us-east-1a stayed below $0.42 for about 11 hours, but the AWS Spot price history graph shows a stable price of $4.2 instead of $0.42.
Greetings and thanks!
Hi,
While trying to follow the example in this blog post, I keep getting this error in the ec2_spot_keras_training.py script: AttributeError: 'SpotTermination' object has no attribute 'on_train_batch_begin', and the same error for 'on_train_batch_end'.
Please help with this.
Thanks and Regards,
Arpit
This is difficult to understand and provides no value:
UserData:
IyEvYmluL2Jhc2gKeXVtIHVwZGF0ZSAteQphbWF6b24tbGludXgtZXh0cmFzIGluc3RhbGwgLXkgbGFtcC1tYXJpYWRiMTAuMi1waHA3LjIgcGhwNy4yCnl1bSBpbnN0YWxsIC15IGh0dHBkIG1hcmlhZGItc2VydmVyCnN5c3RlbWN0bCBzdGFydCBodHRwZApzeXN0ZW1jdGwgZW5hYmxlIGh0dHBkCnVzZXJtb2QgLWEgLUcgYXBhY2hlIGVjMi11c2VyCmNob3duIC1SIGVjMi11c2VyOmFwYWNoZSAvdmFyL3d3dwpjaG1vZCAyNzc1IC92YXIvd3d3CmZpbmQgL3Zhci93d3cgLXR5cGUgZCAtZXhlYyBjaG1vZCAyNzc1IHt9IFw7CmZpbmQgL3Zhci93d3cgLXR5cGUgZiAtZXhlYyBjaG1vZCAwNjY0IHt9IFw7CmVjaG8gIjw/cGhwIHBocGluZm8oKTsgPz4iID4gL3Zhci93d3cvaHRtbC9waHBpbmZvLnBocA==
Please consider using the Fn::Base64 intrinsic function so we can understand what is happening there.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-base64.html
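To see what the encoded UserData above actually runs, it can be decoded with the Python standard library. This is a minimal sketch; the variable names are illustrative, and the string is copied verbatim from the template excerpt above.

```python
import base64

# UserData value copied from the template excerpt above.
encoded_user_data = "IyEvYmluL2Jhc2gKeXVtIHVwZGF0ZSAteQphbWF6b24tbGludXgtZXh0cmFzIGluc3RhbGwgLXkgbGFtcC1tYXJpYWRiMTAuMi1waHA3LjIgcGhwNy4yCnl1bSBpbnN0YWxsIC15IGh0dHBkIG1hcmlhZGItc2VydmVyCnN5c3RlbWN0bCBzdGFydCBodHRwZApzeXN0ZW1jdGwgZW5hYmxlIGh0dHBkCnVzZXJtb2QgLWEgLUcgYXBhY2hlIGVjMi11c2VyCmNob3duIC1SIGVjMi11c2VyOmFwYWNoZSAvdmFyL3d3dwpjaG1vZCAyNzc1IC92YXIvd3d3CmZpbmQgL3Zhci93d3cgLXR5cGUgZCAtZXhlYyBjaG1vZCAyNzc1IHt9IFw7CmZpbmQgL3Zhci93d3cgLXR5cGUgZiAtZXhlYyBjaG1vZCAwNjY0IHt9IFw7CmVjaG8gIjw/cGhwIHBocGluZm8oKTsgPz4iID4gL3Zhci93d3cvaHRtbC9waHBpbmZvLnBocA=="

# Decode to the plain shell script that the instance executes at boot.
user_data_script = base64.b64decode(encoded_user_data).decode("utf-8")

if __name__ == "__main__":
    print(user_data_script)
```

The decoded text is an ordinary bash script, which is what would appear in the template in readable form if it were wrapped in Fn::Base64 instead of being pre-encoded.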
It would be good if the Lambda function could start with a delay.
I tried to use amazon-eks-nodegroup-with-spot.yaml to provision some EKS nodes. I entered all parameters correctly, but cluster creation fails with:
Encountered non numeric value for property VolumeSize
and
Encountered non numeric value for property MinSize
I quadruple-checked all values entered ARE indeed numeric.
What am I missing?
Hi,
I have been following this blog to spawn multiple Spot Instances. However, while creating the policy using the command below,
aws iam create-policy \
--policy-name ec2-permissions-dl-training \
--policy-document ec2-permissions-dl-training.json
I am facing the error:
An error occurred (MalformedPolicyDocument) when calling the CreatePolicy operation: Syntax errors in policy.
Could you suggest what the issue could be with the policy document that is used?
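One common cause of this error is passing a bare file name to --policy-document: the AWS CLI then treats the literal string as the policy itself, whereas a file:// prefix tells it to read the file. The Python sketch below checks that the file parses as JSON and builds the argument list with the prefix. The helper names are hypothetical, and the JSON content itself may still need fixing if this is not the cause.

```python
import json
from pathlib import Path


def policy_document_arg(path):
    """Validate that the file is JSON and return it as a file:// CLI argument."""
    policy_path = Path(path)
    # json.loads raises ValueError if the document is not valid JSON.
    json.loads(policy_path.read_text())
    return "file://" + str(policy_path)


def create_policy_command(policy_name, path):
    """Build the argument list for 'aws iam create-policy' (not executed here)."""
    return [
        "aws", "iam", "create-policy",
        "--policy-name", policy_name,
        "--policy-document", policy_document_arg(path),
    ]
```

The resulting list can be passed to `subprocess.run`, or the same `file://ec2-permissions-dl-training.json` value can be typed directly on the command line.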
Terraform v4.16.0 added support for ABS. Let's create some examples of how to use it.
$ grep 'echo' /etc/init/spot-instance-termination-notice-handler.conf
echo 2791 > /var/run/spot-instance-notice-handler.pid
This is because the $$ isn't escaped like the other dollar signs in other files. Another option is to quote the EOF so that the dollar signs aren't expanded on the fly (and therefore don't need to be escaped):
cat <<"EOF" > /etc/init/spot-instance-termination-notice-handler.conf