aegea's Introduction

See the following GitHub organizations for more open source projects that I am involved in:

pyauth, XML-Security, cloud-utils, taxoniq

Development and maintenance of some of my open source projects is generously sponsored by Tidelift. Thanks to Tidelift for investing in our open source software ecosystem. You can support Tidelift by purchasing a Tidelift Subscription, or support my projects directly by clicking the "Sponsor" button.

aegea's People

Contributors

brentp, extemporaneousb, jameshowardwang, jshoe, kislyuk, markazhang, midnighteuler, mpcusack-color, mrolm, yunfangjuan

aegea's Issues

aws batch submission leaves orphan volumes behind due to an uncaught exception

I tried launching over 1500 jobs using aegea batch submit. In some cases, it resulted in the following error:

...
...
Processing triggers for libc-bin (2.24-11+deb9u4) ...
Creating volume
An error occurred (RequestLimitExceeded) when calling the CreateTags operation (reached max retries: 4): Request limit exceeded.

Since this error occurs right after the volume has been created, the uncaught tagging failure kills the instance but leaves an orphaned volume behind (in my case, 90 of them!).
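
One way to avoid the orphaned volume is to delete it when tagging fails. The following is a minimal sketch, not aegea's actual code: the function name `create_tagged_volume` and its parameters are hypothetical, and `ec2_client` is assumed to behave like a boto3 EC2 client.

```python
def create_tagged_volume(ec2_client, size_gb, availability_zone, tags):
    """Create an EBS volume and tag it; delete the volume if tagging fails.

    `ec2_client` is assumed to expose create_volume, create_tags and
    delete_volume with boto3-style keyword arguments.
    """
    volume = ec2_client.create_volume(Size=size_gb, AvailabilityZone=availability_zone)
    volume_id = volume["VolumeId"]
    try:
        ec2_client.create_tags(
            Resources=[volume_id],
            Tags=[{"Key": key, "Value": value} for key, value in tags.items()],
        )
    except Exception:
        # Tagging failed (for example with RequestLimitExceeded): clean up so
        # that no untagged, unattached volume is left behind, then re-raise.
        ec2_client.delete_volume(VolumeId=volume_id)
        raise
    return volume_id
```

In practice the delete may have to wait until the volume leaves the "creating" state. An alternative that avoids the second API call is to pass the tags to `create_volume` via its `TagSpecifications` parameter, if the installed boto3 version supports it.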

AWS thinks ComputeEnvironment exists

After deleting the image named 'dex',

aws ecr delete-repository --repository-name dex

an attempt to create it anew gives the following stack trace:

Traceback (most recent call last):
  File "/Users/bek/.pyenv/versions/2.7.13/bin/aegea", line 23, in <module>
    aegea.main()
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/site-packages/aegea/__init__.py", line 80, in main
    result = parsed_args.entry_point(parsed_args)
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/site-packages/aegea/build_docker_image.py", line 107, in build_docker_image
    job = submit(submit_args)
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/site-packages/aegea/batch.py", line 265, in submit
    ensure_queue(args.queue)
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/site-packages/aegea/batch.py", line 242, in ensure_queue
    create_compute_environment(cce_parser.parse_args(args=[name]))
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/site-packages/aegea/batch.py", line 103, in create_compute_environment
    serviceRole=batch_iam_role.arn)
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/site-packages/botocore/client.py", line 253, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/site-packages/botocore/client.py", line 557, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the CreateComputeEnvironment operation: Object already exists
Traceback (most recent call last):
  File "/Users/bek/.pyenv/versions/2.7.13/bin/aegea-build-image-for-mission", line 42, in <module>
    env=dict(os.environ, AEGEA_CONFIG_FILE=os.path.join(mission_wd, "config.yml"))
  File "/Users/bek/.pyenv/versions/2.7.13/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[u'aegea', u'build_docker_image', 'dex', u'--tags', u'AegeaMission=docker-example']' returned non-zero exit status 1

'dex' then shows up in the list of images but trying to run a Batch job throws the same error. Could this be an AWS API issue?
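
Whatever the root cause, the traceback shows that the "Object already exists" error from CreateComputeEnvironment is not caught. A possible workaround is to treat that error as success and look the environment up instead. This is only a sketch: `ensure_compute_environment`, `create` and `describe` are hypothetical names, and the callables are assumed to wrap the corresponding Batch API calls.

```python
def ensure_compute_environment(create, describe, name):
    """Return the description of compute environment `name`, creating it if needed.

    `create` and `describe` are assumed to be callables that wrap the Batch
    CreateComputeEnvironment and DescribeComputeEnvironments calls.
    """
    try:
        create(name)
    except Exception as e:
        # botocore client errors carry the parsed response in `e.response`.
        error = getattr(e, "response", {}).get("Error", {})
        if "already exists" not in error.get("Message", ""):
            raise
    return describe(name)
```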

attach additional storage during aegea launch

Again, apologies if I don't see this in the options. I built AMIs without configuring additional storage but most users are going to need an additional volume to download and work with their data. When using aegea batch it is straightforward to request /mnt=500 or something but I don't see a way to do this with aegea launch. This would be a useful feature, as the alternatives are

a) rebuild the images so they have storage volumes already (which requires assumptions about space requirements), or
b) have people launch instances from the console, although they'd have to format the volume themselves.

trying to build an ECR image: "unspecified location" when calling CreateBucket (???)

Not sure why it's trying to create a bucket in the first place, but I'm now getting this error when I try to run aegea-build-image-for-mission.

aegea-build-image-for-mission --image-type docker --mission-dir /Users/james.webber/projects/utilities aligner aligner  
Traceback (most recent call last):
  File "/Users/james.webber/anaconda3/envs/utilities/bin/aegea", line 23, in <module>
    aegea.main()
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/aegea/__init__.py", line 78, in main
    result = parsed_args.entry_point(parsed_args)
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/aegea/build_docker_image.py", line 110, in build_docker_image
    job = submit(submit_args)
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/aegea/batch.py", line 353, in submit
    command, environment = get_command_and_env(args)
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/aegea/batch.py", line 273, in get_command_and_env
    bucket = ensure_s3_bucket("aegea-batch-jobs-{}".format(ARN.get_account_id()))
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/aegea/util/aws/__init__.py", line 148, in ensure_s3_bucket
    bucket.create()
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/boto3/resources/factory.py", line 520, in do_action
    response = action(self, *args, **kwargs)
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/botocore/client.py", line 320, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/site-packages/botocore/client.py", line 623, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.
Traceback (most recent call last):
  File "/Users/james.webber/anaconda3/envs/utilities/bin/aegea-build-image-for-mission", line 42, in <module>
    env=dict(os.environ, AEGEA_CONFIG_FILE=os.path.join(mission_wd, "config.yml"))
  File "/Users/james.webber/anaconda3/envs/utilities/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['aegea', 'build_docker_image', 'aligner', '--tags', 'AegeaMission=aligner']' returned non-zero exit status 1.

This is with aegea 2.2.4 and the config.yml found here (the same command used to work, as far as I can remember): https://github.com/czbiohub/utilities/tree/master/aligner
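
For context, S3 returns IllegalLocationConstraintException when a CreateBucket request sent to a regional endpoint other than us-east-1 does not name that region as its location constraint. Whether that is what happens inside aegea 2.2.4 is an assumption. The helper below is a hypothetical sketch of how the request arguments could be built per region:

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for a boto3-style S3 create_bucket call.

    us-east-1 must not be given as a LocationConstraint; every other region
    has to be named explicitly.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With a boto3 S3 client this would be used as `s3_client.create_bucket(**create_bucket_kwargs(name, region))`.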

aegea launch should not die if SFR instance is terminated during waiting period

INFO:aegea:Launching <aegea.util.aws.SpotFleetBuilder object at 0x10f6c9908: {'cores': 1, 'dry_run': False, 'gpus_per_instance': 0, 'iam_fleet_role': iam.Role...otFleet'), ...}>
Traceback (most recent call last):
  File "/Users/kislyuk/projects/mutt/build/mutt-master/VE/bin/aegea", line 23, in <module>
    aegea.main()
  File "/Users/kislyuk/projects/mutt/build/mutt-master/VE/lib/python2.7/site-packages/aegea/__init__.py", line 58, in main
    result = parsed_args.entry_point(parsed_args)
  File "/Users/kislyuk/projects/mutt/build/mutt-master/VE/lib/python2.7/site-packages/aegea/launch.py", line 142, in launch
    instance.wait_until_running()
  File "/Users/kislyuk/projects/mutt/build/mutt-master/VE/lib/python2.7/site-packages/boto3/resources/factory.py", line 369, in do_waiter
    waiter(self, *args, **kwargs)
  File "/Users/kislyuk/projects/mutt/build/mutt-master/VE/lib/python2.7/site-packages/boto3/resources/action.py", line 202, in __call__
    response = waiter.wait(**params)
  File "/Users/kislyuk/projects/mutt/build/mutt-master/VE/lib/python2.7/site-packages/botocore/waiter.py", line 53, in wait
    Waiter.wait(self, **kwargs)
  File "/Users/kislyuk/projects/mutt/build/mutt-master/VE/lib/python2.7/site-packages/botocore/waiter.py", line 321, in wait
    last_response=response,
botocore.exceptions.WaiterError: Waiter InstanceRunning failed: Waiter encountered a terminal failure state

How does aegea launch batch jobs?

Hello,

I'm trying to do some local testing of my aws batch jobs as submitted through aegea, and I often get different results when running commands through aegea vs. running them locally. Here's an example of how I run the commands locally for testing:

docker run DOCKER_IMAGE_LOCATION /bin/bash -c "COMMAND_PASSED TO AEGEA HERE"

Given a Docker image location and a command, how is aegea running this information through aws batch?

Thank you in advance,
Matt

aegea launch will overwrite role definition

The launch sub-command sets up IAM roles for newly launched instances according to the configuration specified by the user. If multiple users use the default (aegea.launch) role, they will clobber each other's configuration.

Example, in a single AWS account:

  • user 1 launches an instance using a custom config, e.g. adding a policy to the launch role in their .../aegea/config.yml file
  • user 2 launches an instance, with the default config
    At this point, aegea.launch role is reset to the default (missing user 1's customization)

Ideally, two users would not share the namespace when using the default role.

--ami-name option or similar for aegea launch

We're building up a set of common AMIs for general Biohub users, and it'd be nice to launch them by AMI Name rather than ID. I don't believe this is possible unless I just didn't see the option.
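
Until such an option exists, the name can be resolved to an ID before calling aegea launch --ami. A minimal sketch, assuming a boto3-style EC2 client is passed in; `resolve_ami_by_name` is a hypothetical helper, not part of aegea:

```python
def resolve_ami_by_name(ec2_client, name, owners=("self",)):
    """Return the ID of the newest available AMI whose name matches `name`."""
    response = ec2_client.describe_images(
        Owners=list(owners),
        Filters=[
            {"Name": "name", "Values": [name]},
            {"Name": "state", "Values": ["available"]},
        ],
    )
    images = response["Images"]
    if not images:
        raise LookupError("No available AMI named {!r}".format(name))
    # CreationDate is an ISO 8601 string, so string order is chronological.
    newest = max(images, key=lambda image: image["CreationDate"])
    return newest["ImageId"]
```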

NVMe volume mounts do not persist upon reboot

This happens because systemd interferes with mount -a and unmounts the mountpoint specified in /etc/fstab:

systemd[1]: dev-xvdz.device: Job dev-xvdz.device/start timed out.
systemd[1]: Timed out waiting for device /dev/xvdz.

The device name /dev/xvdz is a symlink that we create in cloudinit code to the actual device node. This apparently breaks systemd logic and it starts to unmount the device immediately even when we mount it manually after this.

Address this by disabling this systemd mount "helper" behavior.

How to send AWS_ACCESS_KEY/.aws and public key to instance?

Hi @kislyuk! I'm running Reflow on aegea-launched EC2 instances and am having trouble getting Reflow to recognize the AWS credentials. I know they're there because I'm able to aws s3 sync to the buckets I have access to.

 Tue 19 Jun - 03:49  ~ 
 ubuntu@olgabot-reflow-v5  reflow setup-ec2
reflow: error reading SSH key: open /home/ubuntu/.ssh/id_rsa.pub: no such file or directory
failed to retrieve AWS credentials: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment
SharedCredsLoad: failed to load shared credentials file
caused by: open /home/ubuntu/.aws/credentials: no such file or directory

In the end, I scp-d over the credentials to get it to work:

(base) 
 Mon 18 Jun - 13:56  ~/code/sourmash   origin ☊ master ✔ ☗v2.0.0a6  
  scp -r -i ~/.ssh/aegea.launch.olgabot.Olgas-MacBook-Pro.pem ~/.aws [email protected]:~
config                                                                                                                                                                                                    100%   43     1.2KB/s   00:00    
credentials                                                                                                                                                                                               100%  116     2.8KB/s   00:00    
olga-czirna1.pem                                                                                                                                                                                          100% 1692    42.1KB/s   00:00    
(base) 
 Mon 18 Jun - 20:55  ~/code/sourmash   origin ☊ master ✔ ☗v2.0.0a6  
  scp -r -i ~/.ssh/aegea.launch.olgabot.Olgas-MacBook-Pro.pem ~/.ssh/id_rsa.pub [email protected]:~/.ssh/
id_rsa.pub  

But I'm wondering if I'm missing something and it's easier to do this already with Aegea.

Here's my home directory and environment variables:

 Tue 19 Jun - 03:49  ~ 
 ubuntu@olgabot-reflow-v5  ls -lha
total 204K
drwxr-xr-x 15 ubuntu ubuntu 4.0K Jun 19 03:49 .
drwxr-xr-x  3 root   root   4.0K Jan 30 21:24 ..
drwxrwxr-x  4 ubuntu ubuntu 4.0K Jun 19 03:43 agnosterzak-ohmyzsh-theme
drwxrwxr-x 13 ubuntu ubuntu 4.0K Feb  8 22:24 anaconda
-rw-r--r--  1 ubuntu ubuntu  220 Aug 31  2015 .bash_logout
-rw-r--r--  1 ubuntu ubuntu 4.3K Jun 18 22:59 .bashrc
drwx------  3 ubuntu ubuntu 4.0K Jun 18 22:58 .cache
drwxrwxr-x  3 ubuntu ubuntu 4.0K Jun 19 03:43 code
drwxrwxr-x  3 ubuntu ubuntu 4.0K Feb  8 22:24 .conda
-rw-rw-r--  1 ubuntu ubuntu   92 Feb  8 22:23 .condarc
-rw-rw-r--  1 ubuntu ubuntu 1.8K Jun 19 03:41 .emacs
drwx------  5 ubuntu ubuntu 4.0K Jun 19 03:42 .emacs.d
-rw-rw-r--  1 ubuntu ubuntu  344 Jun 19 03:44 .gitconfig
-rw-rw-r--  1 ubuntu ubuntu 1.5K Jun 19 03:44 .gitignore
drwxrwxr-x  5 ubuntu ubuntu 4.0K Jun 18 22:59 gocode
drwxrwxr-x  3 ubuntu ubuntu 4.0K Jun 19 03:44 hc-zenburn-emacs
drwxrwxr-x  5 ubuntu ubuntu 4.0K Jun 19 03:45 kmer-hashing
-rw-rw-r--  1 ubuntu ubuntu  917 Jun 19 03:41 .macbook_bash_profile
-rw-rw-r--  1 ubuntu ubuntu  694 Jun 19 03:44 Makefile
drwxr-xr-x 11 ubuntu ubuntu 4.0K Jun 19 03:40 .oh-my-zsh
-rw-r--r--  1 ubuntu ubuntu  655 May 16  2017 .profile
drwxrwxr-x  4 ubuntu ubuntu 4.0K Jun 19 03:44 rcfiles
drwxrwxr-x  3 ubuntu ubuntu 4.0K Jun 19 03:46 reflow-workflows
-rwxrwxr-x  1 ubuntu ubuntu  256 Jun 19 03:44 .screenrc
drwx------  2 ubuntu ubuntu 4.0K Jan 30 21:24 .ssh
-rw-r--r--  1 ubuntu ubuntu    0 Jan 30 21:25 .sudo_as_admin_successful
-rw-rw-r--  1 ubuntu ubuntu 3.2K Jun 19 03:41 .ucsd_bashrc
-rw-rw-r--  1 ubuntu ubuntu  217 Jun 19 03:44 .wget-hsts
-rw-rw-r--  1 ubuntu ubuntu  39K Jun 19 03:41 .zcompdump
-rw-rw-r--  1 ubuntu ubuntu  39K Jun 19 03:41 .zcompdump-olgabot-reflow-v5-5.1.1
-rw-------  1 ubuntu ubuntu 1.3K Jun 19 03:49 .zsh_history
-rw-r--r--  1 ubuntu ubuntu 4.4K Jun 19 03:44 .zshrc

 Tue 19 Jun - 03:49  ~ 
 ubuntu@olgabot-reflow-v5  env 
XDG_SESSION_ID=7
SHELL=/bin/bash
TERM=xterm-256color
SSH_CLIENT=24.6.75.181 53191 22
SSH_TTY=/dev/pts/1
ZSH=/home/ubuntu/.oh-my-zsh
USER=ubuntu
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
TERMCAP=SC|xterm-256color|VT 100/ANSI X3.64 virtual terminal:\
        :DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\
        :cd=\E[J:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:ct=\E[3g:\
        :do=^J:nd=\E[C:pt:rc=\E8:rs=\Ec:sc=\E7:st=\EH:up=\EM:\
        :le=^H:bl=^G:cr=^M:it#8:ho=\E[H:nw=\EE:ta=^I:is=\E)0:\
        :li#64:co#236:am:xn:xv:LP:sr=\EM:al=\E[L:AL=\E[%dL:\
        :cs=\E[%i%d;%dr:dl=\E[M:DL=\E[%dM:dc=\E[P:DC=\E[%dP:\
        :im=\E[4h:ei=\E[4l:mi:IC=\E[%d@:ks=\E[?1h\E=:\
        :ke=\E[?1l\E>:vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l:\
        :ti=\E[?1049h:te=\E[?1049l:us=\E[4m:ue=\E[24m:so=\E[3m:\
        :se=\E[23m:mb=\E[5m:md=\E[1m:mh=\E[2m:mr=\E[7m:\
        :me=\E[m:ms:\
        :Co#8:pa#64:AF=\E[3%dm:AB=\E[4%dm:op=\E[39;49m:AX:\
        :vb=\Eg:G0:as=\E(0:ae=\E(B:\
        :ac=\140\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00:\
        :po=\E[5i:pf=\E[4i:Km=\E[M:k0=\E[10~:k1=\EOP:k2=\EOQ:\
        :k3=\EOR:k4=\EOS:k5=\E[15~:k6=\E[17~:k7=\E[18~:\
        :k8=\E[19~:k9=\E[20~:k;=\E[21~:F1=\E[23~:F2=\E[24~:\
        :F3=\E[1;2P:F4=\E[1;2Q:F5=\E[1;2R:F6=\E[1;2S:\
        :F7=\E[15;2~:F8=\E[17;2~:F9=\E[18;2~:FA=\E[19;2~:kb=:\
        :K2=\EOE:kB=\E[Z:kF=\E[1;2B:kR=\E[1;2A:*4=\E[3;2~:\
        :*7=\E[1;2F:#2=\E[1;2H:#3=\E[2;2~:#4=\E[1;2D:%c=\E[6;2~:\
        :%e=\E[5;2~:%i=\E[1;2C:kh=\E[1~:@1=\E[1~:kH=\E[4~:\
        :@7=\E[4~:kN=\E[6~:kP=\E[5~:kI=\E[2~:kD=\E[3~:ku=\EOA:\
        :kd=\EOB:kr=\EOC:kl=\EOD:km:
PAGER=less
LSCOLORS=Gxfxcxdxbxegedabagacad
PATH=/home/ubuntu/anaconda/bin:/usr/lib/go-1.10/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/ubuntu/gocode/bin:/home/ubuntu/gocode/bin
MAIL=/var/mail/ubuntu
STY=5811.pts-1.olgabot-reflow-v5
PWD=/home/ubuntu
EDITOR=emacs
LANG=en_US.UTF-8
SSH_KEY_PATH=~/.ssh/rsa_id
HOME=/home/ubuntu
SHLVL=5
LESS=-R
LOGNAME=ubuntu
WINDOW=2
SSH_CONNECTION=24.6.75.181 53191 172.31.34.0 22
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
LESSOPEN=| /usr/bin/lesspipe %s
GOPATH=/home/ubuntu/gocode
XDG_RUNTIME_DIR=/run/user/1000
LESSCLOSE=/usr/bin/lesspipe %s %s
_=/usr/bin/env
OLDPWD=/home/ubuntu
LC_CTYPE=en_US.UTF-8

`aegea launch` error

Here's an example of the command:

aegea launch --instance-type m3.medium --ssh-key-name ...  --ami ... <inst-name>

It returns an error:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListHostedZones operation: User:...:assumed-role/aegea.launch/... is not authorized to perform: route53:ListHostedZones

Initialize on-demand and don't fail on read-only FS

Currently, the following works:

import aegea
aegea.initialize()
from aegea.batch import submit, submit_parser
submit(submit_parser.parse_args([...]))

While the following does not:

import aegea
from aegea.batch import submit, submit_parser
aegea.initialize()
submit(submit_parser.parse_args([...]))

This is because the global aegea.config object is rebound at runtime in the init sequence.

Replace the global object with an auto-vivifying placeholder.

Also, init fails on read-only FS. Fall back and print a warning instead of failing.
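
The placeholder idea can be sketched as a proxy that loads the real configuration on first attribute access, so that a name imported before initialization still points at a usable object. `LazyConfig` and `load_config` are hypothetical names for illustration, not aegea's implementation:

```python
from types import SimpleNamespace


class LazyConfig:
    """Placeholder that loads the real config object on first attribute access."""

    def __init__(self, loader):
        self._loader = loader
        self._target = None

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for config keys,
        # not for _loader or _target, which are set in __init__.
        if self._target is None:
            self._target = self._loader()
        return getattr(self._target, name)


def load_config():
    # Stand-in for reading the real configuration file.
    return SimpleNamespace(region="us-west-2")


config = LazyConfig(load_config)
early_reference = config  # simulates an import made before initialization
print(early_reference.region)
```

Because the module-level object is never rebound, the import order no longer matters.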

Fix IAM eventual consistency

kislyuk@Aurora:~/projects/aegea>aegea launch aktest4 --duration-hours 5
INFO:tweak:Loaded configuration from /Users/kislyuk/.config/aegea/config.yml
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:boto3.resources.collection:Calling ec2:describe_key_pairs with {}
INFO:boto3.resources.collection:Calling ec2:describe_images with {'Owners': ['self'], 'Filters': [{'Name': 'state', 'Values': ['available']}]}
INFO:boto3.resources.collection:Calling ec2:describe_vpcs with {'Filters': [{'Name': 'isDefault', 'Values': ['true']}]}
INFO:boto3.resources.collection:Calling ec2:describe_subnets with {'Filters': [{'Name': 'vpc-id', 'Values': ['vpc-bc9b15d7']}]}
INFO:boto3.resources.collection:Calling ec2:describe_security_groups with {'GroupNames': ['aegea.launch'], 'Filters': [{'Name': 'vpc-id', 'Values': ['vpc-bc9b15d7']}]}
INFO:boto3.resources.collection:Calling paginated iam:list_instance_profiles with {}
INFO:boto3.resources.collection:Calling paginated iam:list_roles with {}
INFO:boto3.resources.action:Calling iam:attach_role_policy with {'RoleName': 'aegea.launch', 'PolicyArn': 'arn:aws:iam::aws:policy/IAMReadOnlyAccess'}
INFO:boto3.resources.action:Calling iam:attach_role_policy with {'RoleName': 'aegea.launch', 'PolicyArn': 'arn:aws:iam::aws:policy/AmazonElasticFileSystemFullAccess'}
INFO:boto3.resources.collection:Calling paginated iam:list_roles with {}
INFO:boto3.resources.action:Calling iam:create_role with {'RoleName': 'SpotFleet', 'AssumeRolePolicyDocument': '{"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRole"], "Principal": {"Service": "spotfleet.amazonaws.com"}, "Effect": "Allow"}]}'}
INFO:boto3.resources.action:Calling iam:attach_role_policy with {'RoleName': 'SpotFleet', 'PolicyArn': 'arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetRole'}
INFO:boto3.resources.action:Calling iam:get_role with {'RoleName': 'SpotFleet'}
INFO:aegea:Launching <aegea.util.aws.SpotFleetBuilder object at 0x109c1e438: {'cores': 1, 'dry_run': False, 'gpus_per_instance': 0, 'iam_fleet_role': iam.Role(name='SpotFleet'), ...}>
Traceback (most recent call last):
  File "/usr/local/bin/aegea", line 24, in <module>
    args.entry_point(args)
  File "/Users/kislyuk/projects/aegea/aegea/launch.py", line 117, in launch
    expect_error_codes(e, "DryRunOperation")
  File "/Users/kislyuk/projects/aegea/aegea/launch.py", line 96, in launch
    sfr_id = spot_fleet_builder(ec2.meta.client)
  File "/Users/kislyuk/projects/aegea/aegea/util/aws.py", line 351, in __call__
    res = client.request_spot_fleet(DryRun=self.dry_run, SpotFleetRequestConfig=self.spot_fleet_request_config, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/botocore/client.py", line 228, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.5/site-packages/botocore/client.py", line 492, in _make_api_call
    raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidSpotFleetRequestConfig) when calling the RequestSpotFleet operation: Parameter: SpotFleetRequestConfig.IamFleetRole is invalid.
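
In the log above, the SpotFleet role is created immediately before RequestSpotFleet is called, so the role may not yet be visible to EC2. A common way to absorb this kind of delay is to retry with exponential backoff. The sketch below is an illustration, not the fix that was applied: `retry_on_error` is a hypothetical helper, and which error codes count as retryable is left to the caller.

```python
import time


def retry_on_error(call, is_retryable, attempts=5, initial_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the delay after each failure.

    Errors for which `is_retryable(error)` is false, and the error from the
    final attempt, are re-raised unchanged.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            return call()
        except Exception as error:
            if attempt == attempts - 1 or not is_retryable(error):
                raise
            sleep(delay)
            delay *= 2
```

A caller could, for example, pass a predicate that checks `error.response["Error"]["Code"]` for InvalidSpotFleetRequestConfig; treating that code as transient is an assumption.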

str encoding error with aegea-build-image-for-mission

Might just be a python 2 vs 3 problem:

➜  Desktop aegea-build-image-for-mission demuxer demuxer
Traceback (most recent call last):
  File "/Users/james.webber/anaconda3/bin/aegea", line 23, in <module>
    aegea.main()
  File "/Users/james.webber/anaconda3/lib/python3.6/site-packages/aegea/__init__.py", line 76, in main
    result = parsed_args.entry_point(parsed_args)
  File "/Users/james.webber/anaconda3/lib/python3.6/site-packages/aegea/build_docker_image.py", line 94, in build_docker_image
    exec_fh.write(build_docker_image_shellcode % (encode_dockerfile(args), ))
TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'
Traceback (most recent call last):
  File "/Users/james.webber/anaconda3/bin/aegea-build-image-for-mission", line 42, in <module>
    env=dict(os.environ, AEGEA_CONFIG_FILE=os.path.join(mission_wd, "config.yml"))
  File "/Users/james.webber/anaconda3/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['aegea', 'build_docker_image', 'demuxer', '--tags', 'AegeaMission=demuxer']' returned non-zero exit status 1.

I was able to fix it by just encoding the input to line 94, but that seems brittle:

exec_fh.write((build_docker_image_shellcode % (encode_dockerfile(args), )).encode())
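
The TypeError in the traceback is what Python 3 raises when a bytes template is %-formatted with a str argument, so a less brittle fix is to encode the argument before interpolation. A sketch, with `render_shellcode` as a hypothetical helper and the template contents made up for the example:

```python
def render_shellcode(template, dockerfile):
    """Interpolate `dockerfile` into a bytes `template`, accepting str or bytes."""
    if isinstance(dockerfile, str):
        # bytes %-formatting only accepts bytes-like arguments in Python 3.
        dockerfile = dockerfile.encode("utf-8")
    return template % (dockerfile,)


print(render_shellcode(b"echo %s", "FROM ubuntu"))
```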

Unable to unmount from EC2

Hello,

I'm running aegea version 2.6.9 for compatibility reasons and am hitting the following error after my batch jobs finish:

2021-09-30 02:20:48+00:00 Detaching EBS volume vol-0d10807d4007a84bf
2021-09-30 02:20:49+00:00 umount: /mnt: target is busy
2021-09-30 02:20:49+00:00         (In some cases useful info about processes that
2021-09-30 02:20:49+00:00          use the device is found by lsof(8) or fuser(1).)
2021-09-30 02:20:49+00:00 Traceback (most recent call last):
2021-09-30 02:20:49+00:00   File "/usr/local/bin/aegea", line 23, in <module>
2021-09-30 02:20:49+00:00     aegea.main()
2021-09-30 02:20:49+00:00   File "/usr/local/lib/python3.5/dist-packages/aegea/__init__.py", line 89, in main
2021-09-30 02:20:49+00:00     result = parsed_args.entry_point(parsed_args)
2021-09-30 02:20:49+00:00   File "/usr/local/lib/python3.5/dist-packages/aegea/ebs.py", line 177, in detach
2021-09-30 02:20:49+00:00     subprocess.check_call(["umount", find_devnode(volume_id)])
2021-09-30 02:20:49+00:00   File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
2021-09-30 02:20:49+00:00     raise CalledProcessError(retcode, cmd)
2021-09-30 02:20:49+00:00 subprocess.CalledProcessError: Command '['umount', '/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0d10807d4007a84bf-ns-1']' returned non-zero exit status 32
INFO:aegea:Job 6a0d141f-67e4-4510-998e-0a36ff9fd833: Essential container in task exited

The result is that the drives are left up after the jobs finish. Do you have any idea what could be causing this issue? I'm happy to play around in the aegea code of this version to fix things myself, I just don't know where to start.

Thanks in advance,
Matt

aegea has a lot of old requirements

These are starting to become an issue as they get older. For instance, it requires the typing package which conflicts with Python's built-in typing module, breaking it. It also prevents lots of things from being updated. Basically it has to be in its own little environment which is pretty inconvenient for a command-line tool.

IMDSv2 compatibility

git grep 169.254

python3 -c 'import sys, botocore.utils as b; i=b.IMDSFetcher(); i._get_request(sys.argv[1], None, i._fetch_metadata_token()).text' latest/meta-data/hostname

aegea_imds(){ python3 -c 'import sys, botocore.utils as b; i=b.IMDSFetcher(); print(i._get_request(sys.argv[1], None, i._fetch_metadata_token()).text)' $@; }
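
For reference, the IMDSv2 flow that the one-liners above delegate to botocore can also be written with the standard library alone: a PUT request obtains a session token, which is then sent as a header on the metadata GET. The function names here are hypothetical, and `fetch_metadata` only works from inside an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds=21600):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        IMDS_BASE + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """Build the GET request for a metadata path, authenticated with `token`."""
    return urllib.request.Request(
        IMDS_BASE + "/" + path.lstrip("/"),
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path, timeout=2):
    """Fetch a metadata value such as latest/meta-data/hostname."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as response:
        token = response.read().decode()
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as response:
        return response.read().decode()
```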

"Permission denied (publickey)" for recently launched instances

Hello!
This is a bit of a heisenbug as it doesn't always happen. Sometimes when I'm launching an instance and then immediately ssh-ing into it, I get "permission denied." The image does have its own public/private key pair, but I thought aegea launch added the .pem file (in this case /Users/olgabot/.ssh/aegea.launch.olgabot.Olgas-MacBook-Pro.pem) to the ~/.ssh/authorized_keys in ensure_ssh_key.

 ✘  Tue 11 Dec - 09:01  ~/code/packer-images   origin ☊ olgabot/update-reflow ✔ 1⚙ 3☀ 
  aegea launch --ami-tags Name=czbiohub-reflow -t t2.micro  --iam-role S3fromEC2 olgabot-reflow-v08
Identity added: /Users/olgabot/.ssh/aegea.launch.olgabot.Olgas-MacBook-Pro.pem (/Users/olgabot/.ssh/aegea.launch.olgabot.Olgas-MacBook-Pro.pem)
INFO:aegea:Launch spec user data is 1878 bytes long
INFO:aegea:Launched ec2.Instance(id='i-030141a3fd6f30b43') in ec2.Subnet(id='subnet-672e832e') using ami-0969d6307208434cb
{
  "instance_id": "i-030141a3fd6f30b43"
}
(base) 
 ✘  Tue 11 Dec - 09:13  ~/code/packer-images   origin ☊ olgabot/update-reflow 1⚙ 3☀ 1● 
  aegea ssh ubuntu@olgabot-reflow-v08                                                                     
Permission denied (publickey).

Do you know what may be happening?
Thank you!
Warmest,
Olga

python 3.6 compat

$ aegea ssh foo@bar
Traceback (most recent call last):
  File "/home/andrey.kislyuk/.local/bin/aegea", line 23, in <module>
    aegea.main()
  File "/home/andrey.kislyuk/.local/lib/python3.6/site-packages/aegea/__init__.py", line 82, in main
    result = parsed_args.entry_point(parsed_args)
  File "/home/andrey.kislyuk/.local/lib/python3.6/site-packages/aegea/ssh.py", line 228, in ssh
    ssh_opts += init_ssm(get_instance(name).id)
  File "/home/andrey.kislyuk/.local/lib/python3.6/site-packages/aegea/ssh.py", line 217, in init_ssm
    ssm_plugin_path = ensure_session_manager_plugin()
  File "/home/andrey.kislyuk/.local/lib/python3.6/site-packages/aegea/util/aws/ssm.py", line 47, in ensure_session_manager_plugin
    elif "Ubuntu" in subprocess.run(["uname", "-a"], capture_output=True).stdout.decode():  # type: ignore
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
TypeError: __init__() got an unexpected keyword argument 'capture_output'
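
The capture_output argument of subprocess.run was added in Python 3.7, which is why the call fails on 3.6. Passing the pipes explicitly works on both versions. A minimal sketch, with `run_and_capture` as a hypothetical helper name:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run `cmd` and return its decoded stdout; works on Python 3.5 and later."""
    # Equivalent to capture_output=True, which only exists on Python 3.7+.
    completed = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return completed.stdout.decode()


print(run_and_capture([sys.executable, "-c", "print('hello')"]).strip())
```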

Error while running aegea-build-image-for-mission

I tried running the docker-example mission, but I'm getting the following error:

Traceback (most recent call last):
  File "/usr/local/bin/aegea", line 23, in <module>
    aegea.main()
  File "/Library/Python/2.7/site-packages/aegea/__init__.py", line 80, in main
    result = parsed_args.entry_point(parsed_args)
  File "/Library/Python/2.7/site-packages/aegea/build_docker_image.py", line 78, in build_docker_image
    ensure_ecr_repo(args.name, read_access=args.read_access)
  File "/Library/Python/2.7/site-packages/aegea/build_docker_image.py", line 70, in ensure_ecr_repo
    clients.ecr.set_repository_policy(repositoryName=name, policyText=str(policy))
  File "/Library/Python/2.7/site-packages/botocore/client.py", line 253, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Library/Python/2.7/site-packages/botocore/client.py", line 544, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the SetRepositoryPolicy operation: Invalid parameter at 'PolicyText' failed to satisfy constraint: 'Invalid repository policy provided'
Traceback (most recent call last):
  File "/usr/local/bin/aegea-build-image-for-mission", line 42, in <module>
    env=dict(os.environ, AEGEA_CONFIG_FILE=os.path.join(mission_wd, "config.yml"))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[u'aegea', u'build_docker_image', 'dex', u'--tags', u'AegeaMission=docker-example']' returned non-zero exit status 1

Is PolicyText something I should provide to boto?

aegea deploy ls takes a very long time

Despite the results yielding an empty table, this command takes a very long time to run, which makes me feel that there is some detritus that I cannot see, as it is not clear how to see what resources the deploy setup currently has access to.

aegea subscriptions returns thousands of things, so I'm guessing that is what's going on.

launch: allow launch with PowerUser and iam:PassRole only

  • Avoid using IAM resource objects that require iam:Get* on .load()
  • If the role policy compositor encounters an IAM permission denied error, fall back to passing the instance profile ARN blindly

Traceback (most recent call last):
  File "/usr/local/bin/aegea", line 23, in <module>
    aegea.main()
  File "/Users/andrey.kislyuk/projects/aegea/aegea/__init__.py", line 86, in main
    result = parsed_args.entry_point(parsed_args)
  File "/Users/andrey.kislyuk/projects/aegea/aegea/launch.py", line 190, in launch
    umbrella_policy = compose_managed_policies(args.iam_policies)
  File "/Users/andrey.kislyuk/projects/aegea/aegea/util/aws/iam.py", line 147, in compose_managed_policies
    doc = resources.iam.Policy(arn="arn:aws:iam::aws:policy/" + policy_name).default_version.document
  File "/usr/local/lib/python3.9/site-packages/boto3/resources/factory.py", line 431, in get_reference
    self.load()
  File "/usr/local/lib/python3.9/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetPolicy operation: User: arn:aws:sts::1234567890123:assumed-role/ROLE is not authorized to perform: iam:GetPolicy on resource: policy arn:aws:iam::aws:policy/IAMReadOnlyAccess
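The fallback described in the second bullet could look roughly like the sketch below. It is not aegea's implementation: `compose_or_fall_back`, `fetch_document`, and the returned dict shape are hypothetical names chosen for illustration. The only thing assumed about botocore is that a `ClientError` carries a `response` dict with an `Error.Code` entry, which the trace above is consistent with.

```python
def is_access_denied(exc):
    """True if exc looks like a botocore ClientError with code AccessDenied."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "AccessDenied"


def compose_or_fall_back(policy_names, fetch_document, fallback_profile_arn):
    """Read each managed policy document; on AccessDenied, stop composing
    and hand back the instance profile ARN without inspecting it."""
    documents = []
    for name in policy_names:
        try:
            documents.append(fetch_document(name))
        except Exception as exc:
            if not is_access_denied(exc):
                raise  # unrelated failures still surface
            return {"mode": "blind", "instance_profile_arn": fallback_profile_arn}
    return {"mode": "composed", "documents": documents}


class FakeClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError, for demonstration only."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def denied(name):
    raise FakeClientError("AccessDenied")


result = compose_or_fall_back(
    ["IAMReadOnlyAccess"],
    denied,
    "arn:aws:iam::123456789012:instance-profile/example",  # placeholder ARN
)
```

Taking the fetcher as a parameter keeps the permission check separate from the AWS call, so the fallback path can be exercised without credentials.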

error on `aegea batch submit`: command field cannot have empty strings

It seems that a recent change has broken our previous workflow somehow. We have scripts that build up a command for aegea batch and then use subprocess to submit the job, like so:

aegea batch submit --queue aegea_batch --vcpus 16 --memory 64000 --ecr-image aligner --storage /mnt=500 --command 'PATH=$HOME/anaconda/bin:$PATH; cd utilities; git pull; git checkout master; python setup.py install; python -m utilities.alignment.run_star_and_htseq --taxon mm10-plus --num_partitions 1 --partition_id 0 --s3_input_path s3://czb-seqbot/fastqs/190906_A00111_0366_AHNKGFDSXX/ --s3_output_path s3://czb_maca/Plate_seq/parabiosis/190906_A00111_0366_AHNKGFSDSXX/mm10/

Which now raises the error:

botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the SubmitJob operation: Error executing request, Exception : Command field cannot have empty strings, RequestId: b1aa9dd3-b9cb-4c7d-9e52-72a043b3519e

This happens with versions v2.6.4 and v2.6.5. I recommended downgrading to a known working version of aegea as a quick fix, and they reported success with v2.3.6. We could try bisecting that space but hopefully you will have a better idea what happened. My guess is that there's some issue with how aegea is building the batch command? Maybe due to the complexity of the command we're submitting?
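One way to narrow this down is to check the argument list for empty strings on the client side before it reaches SubmitJob. The sketch below is a debugging aid under stated assumptions, not a description of how aegea builds the command: `find_empty_args` and `wrap_for_shell` are hypothetical helpers, and the sample command line is invented.

```python
import shlex


def find_empty_args(command):
    """Return the positions of empty-string elements in a command list."""
    return [i for i, arg in enumerate(command) if arg == ""]


def wrap_for_shell(script):
    """Wrap a shell snippet as a three-element command list; none of the
    elements is empty as long as the script itself is non-empty."""
    if not script.strip():
        raise ValueError("script must not be empty")
    return ["/bin/bash", "-c", script]


# A quoted empty argument survives tokenization as an empty string.
tokens = shlex.split("python run.py --label ''")
empty_positions = find_empty_args(tokens)

# Passing the whole snippet as a single "-c" argument avoids per-token splitting.
wrapped = wrap_for_shell("cd utilities; git pull; python setup.py install")
```

If `find_empty_args` reports a position for the list that is actually submitted, that element is a likely candidate for what the service rejects.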

Launch command stuck when launching instances in a private subnet

The instance is successfully launched, but the command never completes its execution:

$ aegea launch --subnet subnet-xxxx-xxxxx davidrc-dev-2

Identity added: /Users/david.rissatocruz/.ssh/aegea.launch.david.rissatocruz.xxxxxxx.pem (/Users/david.rissatocruz/.ssh/aegea.launch.david.rissatocruz.xxxxxxx.pem)
INFO:aegea:Launch spec user data is 1901 bytes long

(... command was stuck here ...)

It looks like this loop will never complete, because there is no public_dns_name in this case:

aegea/aegea/launch.py

Lines 210 to 212 in 49e0c75

while not instance.public_dns_name:
    instance = resources.ec2.Instance(instance.id)
    time.sleep(1)
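A bounded version of that wait might look like the sketch below. It is one possible approach, not a proposed patch: `wait_for_dns_name` and its `refresh` callback are hypothetical, and it assumes the instance object exposes `public_dns_name` and `private_dns_name` attributes as boto3's EC2 `Instance` resource does.

```python
import time
from types import SimpleNamespace


def wait_for_dns_name(instance, refresh, timeout=30.0, interval=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll for a public DNS name until the timeout expires, then fall
    back to the private DNS name; raise TimeoutError if neither exists."""
    deadline = clock() + timeout
    while clock() < deadline:
        if instance.public_dns_name:
            return instance.public_dns_name
        sleep(interval)
        instance = refresh(instance)  # e.g. re-fetch the instance by ID
    if instance.private_dns_name:
        return instance.private_dns_name
    raise TimeoutError("instance has no DNS name after %s seconds" % timeout)


# Stand-in for an instance launched in a private subnet.
private_only = SimpleNamespace(
    public_dns_name="",
    private_dns_name="ip-10-0-0-5.ec2.internal",  # invented example value
)
name = wait_for_dns_name(private_only, refresh=lambda i: i, timeout=0.0)
```

With a real instance, `refresh` could be something like `lambda i: resources.ec2.Instance(i.id)`, mirroring the original loop.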

add support for amazon-linux-2022 ami

It would be nice to have an Amazon Linux 2022 equivalent of the launch --amazon-linux-ami option (which uses the most recent Amazon Linux 2 AMI).

Is there an equivalent of --profile?

I expected I could use --profile staging and --profile production like aws-cli to access the profiles in my credentials file. Looks like --profile is not supported? Is there another way to access these named profiles?

[staging]
aws_access_key_id = XXXXX
aws_secret_access_key = XXXXX

[production]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX
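One thing that may be worth trying: boto3 and botocore read the `AWS_PROFILE` environment variable when resolving credentials, so a named profile can be selected without a command-line flag. The sketch below assumes aegea relies on that default credential resolution, which is not confirmed here; `env_for_profile` is a hypothetical helper.

```python
import os


def env_for_profile(profile, base_env=None):
    """Return a copy of the environment with AWS_PROFILE set to profile."""
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_PROFILE"] = profile
    return env


staging_env = env_for_profile("staging")
# The environment can then be handed to a child process, for example:
#   subprocess.check_call(["aegea", "deploy", "ls"], env=staging_env)
```

Building a copy rather than mutating `os.environ` keeps the profile choice scoped to the one child process.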
