
aws-wrapper's People

Contributors

afred, foo4thought, mccalluc, muraszko-wgbh

Forkers

nick-otter

aws-wrapper's Issues

DNS lookup failed during build

Not sure why this happened. Maybe I still had an old record cached locally? I have not reproduced it.

DEBUG [2015-12-16 13:14:52]: [Aws::Route53::Client 200 0.209527 0 retries] get_change(id:"C39I947A3HT0N6")

INFO [2015-12-16 13:14:52]: Created CNAMEs
...
INFO [2015-12-16 13:15:49]: Swap instances and do it again.
/Users/chuck_mccallum/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/resolv.rb:491:in `getresource': DNS result has no information for zzz.wgbh-mla-test.org (Resolv::ResolvError)
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/core/dns_wrapper.rb:25:in `lookup_cname'
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/util/swapper.rb:28:in `lookup_elb_and_instance'
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/util/swapper.rb:8:in `swap'
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/util/builder.rb:64:in `block in build'
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/util/builder.rb:49:in `tap'
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/util/builder.rb:49:in `build'
        from scripts/build.rb:46:in `<main>'
[aws-wrapper]$ dig zzz.wgbh-mla-test.org
...
zzz.wgbh-mla-test.org.  300     IN      CNAME   zzz-wgbh-mla-test-org-a-176640697.us-east-1.elb.amazonaws.com.
zzz-wgbh-mla-test-org-a-176640697.us-east-1.elb.amazonaws.com. 60 IN A 50.19.104.155
zzz-wgbh-mla-test-org-a-176640697.us-east-1.elb.amazonaws.com. 60 IN A 23.23.141.161

Perhaps try to build/destroy/build quickly in succession and see if that reproduces it.

build.rb doesn't do DNS?

I thought build.rb also set up DNS, but that seems to be a separate step now? Make sure the behavior, documentation, and tests are in sync.

--ips_by_dns option raising error even though DNS is valid

@mccalluc bug?

;^) ruby scripts/ssh_opt.rb --name openvault.wgbh-mla.org --ips_by_dns
/Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/core/dns_wrapper.rb:68:in `lookup_dns_cname_record_set': Expected 1 record set, not 0 (RuntimeError)
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/core/dns_wrapper.rb:55:in `lookup_dns_cname_record'
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/core/dns_wrapper.rb:13:in `lookup_cname'
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/util/ssh_opter.rb:25:in `lookup_ip'
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/util/ssh_opter.rb:36:in `ip_by_dns'
    from scripts/ssh_opt.rb:39:in `<main>'
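The trace only shows that no record set matched. One possibility worth ruling out, offered as an assumption that the trace does not confirm: Route 53's ListResourceRecordSets returns record sets in order starting at start_record_name (not only exact matches) and reports names fully qualified with a trailing dot, so the lookup has to filter on a normalized name and type; and if the name is really an A alias rather than a CNAME, a CNAME-only filter finds zero. A minimal sketch of that filtering step over plain hashes; matching_record_sets and the example.org sample data are hypothetical.

```ruby
# Hypothetical helpers: Route 53 reports names fully qualified (trailing dot),
# and DNS names compare case-insensitively.
def normalize_dns_name(name)
  dotted = name.end_with?('.') ? name : "#{name}."
  dotted.downcase
end

# Keep only record sets whose name and type both match exactly.
def matching_record_sets(record_sets, name, type: 'CNAME')
  wanted = normalize_dns_name(name)
  record_sets.select { |rs| normalize_dns_name(rs[:name]) == wanted && rs[:type] == type }
end

# Made-up sample shaped like list_resource_record_sets output.
record_sets = [
  { name: 'demo.example.org.', type: 'CNAME' },
  { name: 'example.org.', type: 'A' } # e.g. an alias record, not a CNAME
]
matching_record_sets(record_sets, 'demo.example.org').size    # => 1
matching_record_sets(record_sets, 'example.org').size         # => 0
matching_record_sets(record_sets, 'example.org', type: 'A').size # => 1
```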

Cannot build with t2.medium instance type

When specifying instance_type: "t2.medium" in defaults.yml, I get the following error.

The specified instance type can only be used in a VPC. A subnet ID or network interface ID is required to carry out the request. (Aws::EC2::Errors::VPCResourceNotSpecified)

Done when I can run the build script with the t2.medium instance type.
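The error says the instance type can only be launched in a VPC and needs a subnet (or network interface). A minimal sketch, assuming the build passes a params hash to Aws::EC2::Client#run_instances: add subnet_id whenever the instance type belongs to a VPC-only family. The family list, the helper names, and the AMI and subnet IDs are placeholders, not aws-wrapper's code.

```ruby
# Hypothetical, hand-maintained list of families that can only launch in a VPC.
VPC_ONLY_FAMILIES = %w[t2].freeze

def vpc_only?(instance_type)
  VPC_ONLY_FAMILIES.include?(instance_type.split('.').first)
end

# Builds options for Aws::EC2::Client#run_instances; fails early (before any
# AWS call) if a VPC-only type is requested without a subnet.
def run_instances_params(image_id:, instance_type:, subnet_id: nil)
  params = { image_id: image_id, instance_type: instance_type, min_count: 1, max_count: 1 }
  if vpc_only?(instance_type)
    raise ArgumentError, "#{instance_type} can only be used in a VPC; pass subnet_id" unless subnet_id
    params[:subnet_id] = subnet_id
  end
  params
end

params = run_instances_params(image_id: 'ami-12345678', instance_type: 't2.medium',
                              subnet_id: 'subnet-12345678')
# ec2 = Aws::EC2::Client.new; ec2.run_instances(params)
```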

Add script for creating EC2 + mounted EBS

As a DevOps engineer
I can create a combo of EC2 instance with a mounted EBS volume
independently of anything else
in order to create one-off instances for demos, testing provisioning scripts, etc.
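The format-and-mount part of this already appears in the ssh command logged under "ssh retry only if 255?": mkfs -t ext4 /dev/sdb && mkdir /mnt/ebs && mount /dev/sdb /mnt/ebs && chown ec2-user /mnt/ebs. A minimal sketch that builds the same sequence from parameters so a standalone EC2 + EBS script could reuse it; mount_ebs_command is hypothetical, and it uses mkdir -p (a deliberate change from the logged mkdir) so a rerun does not fail on an existing directory.

```ruby
require 'shellwords'

# Hypothetical helper: command to format a *freshly attached, empty* EBS device
# (mkfs destroys existing data) and mount it for the given user.
def mount_ebs_command(device: '/dev/sdb', mount_point: '/mnt/ebs', owner: 'ec2-user')
  [
    "mkfs -t ext4 #{device.shellescape}",
    "mkdir -p #{mount_point.shellescape}",
    "mount #{device.shellescape} #{mount_point.shellescape}",
    "chown #{owner.shellescape} #{mount_point.shellescape}"
  ].join(' && ')
end
```

The result would be wrapped in sudo sh -c on the instance, as the existing build command does.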

Relax errors when multiple reservations are returned

Rationale: I built a pair of instances, but it failed midway due to an intermittent network connection. So I destroyed that pair and tried to rebuild another, which worked. But then I couldn't retrieve the IPs using scripts/ssh_opt.rb --name [name] --just_ips because the SDK returned multiple "reservations".

Apparently the destroy.rb script created reservations for the instances it was terminating. And as we have seen, instances will stick around for an unknown amount of time with the state of "terminated". So too, it seems, will the reservation that terminated them.

So the presence of additional reservations does not necessarily mean things are in a bad state.

Done when

  • an error is not thrown immediately upon finding that a request returns multiple reservations
  • rather, an error is thrown only when multiple reservations are returned that contain instances not in a "terminated" state.
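A minimal sketch of the "Done when" rule, using small structs as stand-ins for the SDK's reservation and instance objects (in aws-sdk the state is instance.state.name). It ignores reservations whose instances are all terminated and fails only if more than one reservation still has live instances; it also fails on zero, on the assumption that the existing "expected 1" behavior should stay. All names here are hypothetical.

```ruby
# Stand-ins for SDK objects; only the fields used below are modeled.
FakeInstance = Struct.new(:instance_id, :state_name)
FakeReservation = Struct.new(:reservation_id, :instances)

def live?(reservation)
  reservation.instances.any? { |i| i.state_name != 'terminated' }
end

def only_live_reservation(reservations)
  live = reservations.select { |r| live?(r) }
  raise "Expected 1 reservation with non-terminated instances, not #{live.size}" unless live.size == 1
  live.first
end

old = FakeReservation.new('r-old', [FakeInstance.new('i-1', 'terminated'),
                                    FakeInstance.new('i-2', 'terminated')])
new_pair = FakeReservation.new('r-new', [FakeInstance.new('i-3', 'running'),
                                         FakeInstance.new('i-4', 'running')])
only_live_reservation([old, new_pair]).reservation_id # => "r-new"
```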

ELB less fussy

It should tolerate more failed health checks, and require fewer consecutive successes before an instance counts as healthy again. Still annoyed that this is even necessary.
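For a classic ELB, these two knobs are the health check's unhealthy_threshold (failures tolerated) and healthy_threshold (successes needed to recover); both accept 2 to 10. A minimal sketch that builds a more lenient health_check hash for Aws::ElasticLoadBalancing::Client#configure_health_check; the target, interval, timeout, and threshold values are illustrative guesses, not tuned settings.

```ruby
THRESHOLD_RANGE = (2..10) # allowed range for both thresholds on a classic ELB

# Hypothetical defaults: tolerate up to 10 failures, recover after 2 successes.
def lenient_health_check(target: 'HTTP:80/', interval: 30, timeout: 5,
                         unhealthy_threshold: 10, healthy_threshold: 2)
  [unhealthy_threshold, healthy_threshold].each do |t|
    raise ArgumentError, "threshold #{t} outside #{THRESHOLD_RANGE}" unless THRESHOLD_RANGE.cover?(t)
  end
  raise ArgumentError, 'timeout must be less than interval' unless timeout < interval
  { target: target, interval: interval, timeout: timeout,
    unhealthy_threshold: unhealthy_threshold, healthy_threshold: healthy_threshold }
end

health_check = lenient_health_check
# elb = Aws::ElasticLoadBalancing::Client.new
# elb.configure_health_check(load_balancer_name: 'zzz-wgbh-mla-test-org-a', health_check: health_check)
```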

quick delete script

The build script creates all kinds of things across AWS. Another script to clean them up would be good. It should double-check that you really want to delete everything.
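One way to make the double-check hard to click through is to require the exact name to be typed back. A minimal sketch; confirm_delete? is hypothetical, and input/output are injectable so it can be exercised without a terminal. Anything other than the exact name, including end of input, counts as "no".

```ruby
require 'stringio' # only needed for the example call below

def confirm_delete?(name, input: $stdin, output: $stdout)
  output.print "This will delete every AWS resource for #{name}. Type the name to confirm: "
  answer = input.gets
  !answer.nil? && answer.strip == name
end

confirm_delete?('zzz.wgbh-mla-test.org',
                input: StringIO.new("zzz.wgbh-mla-test.org\n"), output: StringIO.new) # => true
```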

conceptual review of swap_both

@afred: Not quite a code review, but I'd like it if we could go through swap_both.rb, make sure it makes sense to you (code organization and the sequence of API calls), and figure out the next step.

Clean up unused EBS volumes

This ticket isn't specific to aws-wrapper, but it needs to be done, so I'm putting it here to make it part of the workflow. Is there a better place for tracking infrastructure tasks not related to any single project?

Done when

  1. we've gone through all EBS volumes and determined what each is being used for
  2. we are certain we have backups of the ones we need backups for
  3. we've deleted the ones we know we no longer need
  4. we've asked about the ones we're unsure of
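For step 1, volumes in the EBS "available" state are not attached to any instance, which makes them the first candidates to review (not automatically safe to delete; step 2 still applies). A minimal sketch over a struct standing in for describe_volumes results; the server-side equivalent would be describe_volumes(filters: [{ name: 'status', values: ['available'] }]). VolumeInfo, unattached, and review_report are hypothetical.

```ruby
# Stand-in for an SDK volume; name_tag is the value of the Name tag, if any.
VolumeInfo = Struct.new(:volume_id, :state, :size_gib, :name_tag)

def unattached(volumes)
  volumes.select { |v| v.state == 'available' }
end

# One tab-separated line per candidate, for pasting into the review.
def review_report(volumes)
  unattached(volumes).map do |v|
    "#{v.volume_id}\t#{v.size_gib} GiB\t#{v.name_tag || '(no Name tag)'}"
  end
end
```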

Error when using ssh_opt.rb with --ips_by_dns

Getting this error:

;^) ruby scripts/ssh_opt.rb --ips_by_dns --name demo.openvault.wgbh-mla.org
/Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/core/dns_wrapper.rb:68:in `lookup_dns_cname_record_set': Expected 1 record set, not 0 (RuntimeError)
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/core/dns_wrapper.rb:55:in `lookup_dns_cname_record'
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/core/dns_wrapper.rb:13:in `lookup_cname'
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/util/ssh_opter.rb:25:in `lookup_ip'
    from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/util/ssh_opter.rb:36:in `ip_by_dns'
    from scripts/ssh_opt.rb:39:in `<main>'

rsync between two remotes?

rsync can't handle two remotes. Alas. So here are some other options:

  • BAD: two-hop rsync: remoteA -> local -> remoteB. Wastes local disk, and does so for every person who does a deploy.
  • BAD: scp with two remotes. Blah: Will be really slow with AAPB.
  • BAD: put private keys on servers so remoteA <-> remoteB can connect to each other directly.
  • MAYBE: rsyncd? rsync can run as its own daemon, instead of relying on ssh. Not impossible: could imagine opening up the firewall on both to allow them only to talk with each other on that port. This just seems like a whole lot more setup on the server, when I'd like to encapsulate it in one line.
  • MAYBE: ssh tunneling: http://unix.stackexchange.com/questions/183504 or http://superuser.com/questions/179412. Still two hops, so all the data is coming through the local machine, and that could be a bottleneck.
  • HOPEFULLY: ssh agent forwarding? http://serverfault.com/questions/411552 The idea is that successive connections from the remote machine can use credentials from the original connection.
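With agent forwarding (ssh -A), rsync runs on remoteA and authenticates to remoteB with keys held by the local agent, so the data goes straight from A to B and only the ssh session passes through the local machine. A minimal sketch of the command; rsync_between_remotes, the hosts, and the paths are hypothetical. It assumes the keys were loaded locally with ssh-add, sshd on A allows agent forwarding (the build's sshd_config edit sets AllowAgentForwarding yes), and B's security group accepts ssh from A.

```ruby
require 'shellwords'

# Returns an argv array for system/Open3: ssh to A with agent forwarding and run
# rsync there, pushing to B. The remote command is shell-escaped as one string
# because the remote shell re-splits it.
def rsync_between_remotes(src_host:, src_path:, dest_host:, dest_path:, user: 'ec2-user')
  remote_cmd = ['rsync', '-a', '-e', 'ssh -o StrictHostKeyChecking=no',
                src_path, "#{user}@#{dest_host}:#{dest_path}"].shelljoin
  ['ssh', '-A', "#{user}@#{src_host}", remote_cmd]
end

cmd = rsync_between_remotes(src_host: 'a.example.org', src_path: '/mnt/ebs/',
                            dest_host: 'b.example.org', dest_path: '/mnt/ebs/')
# system(*cmd)
```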

Apply security updates to new servers

3 package(s) needed for security, out of 28 available
Run "sudo yum update" to apply all updates.
  • make it part of the build script
  • non-interactive
  • wait for completion?
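A minimal sketch of a non-interactive update step, reusing the yum update --assumeyes form the build already sends over ssh. Open3.capture3 blocks until the remote command exits, which is one answer to "wait for completion?" for a single host. apply_security_updates is hypothetical; the runner is injectable so the step can be tested without a real host.

```ruby
require 'open3'

def apply_security_updates(host, user: 'ec2-user', runner: Open3.method(:capture3))
  cmd = ['ssh', '-o', 'StrictHostKeyChecking=no', "#{user}@#{host}", 'sudo yum update --assumeyes']
  _out, err, status = runner.call(*cmd) # returns only after yum has finished
  raise "yum update failed on #{host} (exit #{status.exitstatus}): #{err}" unless status.success?
  cmd
end
```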

Partially created gives weird error

Not sure about the exact preconditions for this, but look into it:

$ ruby scripts/ec2_elb_start.rb --name zyx.wgbh-mla-test.org
INFO [2015-12-04 09:24:55]: Created key pair and stored private key at /Users/chuck_mccallum/.ssh/zyx.wgbh-mla-test.org.pem. Fingerprint: 8d:0c:6a:d3:16:1b:9e:22:f4:7c:15:d3:fd:17:76:34:34:c9:9f:34
INFO [2015-12-04 09:24:55]: Created PK for zyx.wgbh-mla-test.org
INFO [2015-12-04 09:24:55]: Created group zyx.wgbh-mla-test.org, and added current user
/Users/chuck_mccallum/starting/aws-wrapper/lib/core/ec2_wrapper.rb:199:in `block in config_wait': undefined method `reservations' for nil:NilClass (NoMethodError)
...
        from /Users/chuck_mccallum/.rvm/gems/ruby-2.0.0-p481/gems/aws-sdk-core-2.1.23/lib/aws-sdk-core/client_waiters.rb:110:in `wait_until'
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/core/ec2_wrapper.rb:96:in `start_instances'
        from /Users/chuck_mccallum/starting/aws-wrapper/lib/util/ec2_elb_starter.rb:13:in `start'
        from scripts/ec2_elb_start.rb:32:in `<main>'

travis ssh-add missing -K?

The -K option seems to be missing from the ssh-add on travis. (We need to add keys to the ssh agent, and we need the agent so we can do agent forwarding for rsync.)

travis:

ssh-add: illegal option -- K
usage: ssh-add [options] [file ...]
Options:
  -l          List fingerprints of all identities.
  -L          List public key parameters of all identities.
  -d          Delete identity.
  -D          Delete all identities.
  -x          Lock agent.
  -X          Unlock agent.
  -t life     Set lifetime (in seconds) when adding identities.
  -c          Require confirmation to sign using identities
  -s pkcs11   Add keys from PKCS#11 provider.
  -e pkcs11   Remove keys provided by PKCS#11 provider.

but locally:

usage: ssh-add [options] [file ...]
Options:
  -l          List fingerprints of all identities.
  -L          List public key parameters of all identities.
  -k          Load only keys and not certificates.
  -c          Require confirmation to sign using identities
  -t life     Set lifetime (in seconds) when adding identities.
  -d          Delete identity.
  -D          Delete all identities.
  -x          Lock agent.
  -X          Unlock agent.
  -s pkcs11   Add keys from PKCS#11 provider.
  -e pkcs11   Remove keys provided by PKCS#11 provider.
  -A          Add all identities stored in your keychain.
  -K          Store passphrases in your keychain.
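The -K flag ("Store passphrases in your keychain") appears only in the local usage text, which is the macOS ssh-add; the Linux ssh-add on Travis rejects it. A minimal sketch that adds the flag only on macOS; ssh_add_command is hypothetical, and the platform is injectable for testing.

```ruby
def ssh_add_command(key_path, platform: RUBY_PLATFORM)
  args = ['ssh-add']
  args << '-K' if platform.include?('darwin') # keychain option exists only on macOS
  args << key_path
end

# system(*ssh_add_command(File.expand_path('~/.ssh/zzz.wgbh-mla-test.org.pem')))
```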

Restrict the IPs that can connect to instances

Limit to only the WGBH IP range (plus travis). If something really needs to be done from off-site, that's what VPNs are for. (Also, we should not be counting on the longevity of any particular instance, so it shouldn't be the end of the world if we can't connect and have to start over.)
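In EC2 this would be a security group rule that opens port 22 only to specific CIDR blocks. A minimal sketch that builds the ip_permissions for Aws::EC2::Client#authorize_security_group_ingress, plus a local check using Ruby's IPAddr. The CIDRs below are documentation placeholders, since the real WGBH and Travis ranges are not given here.

```ruby
require 'ipaddr'

# Placeholders: substitute the actual WGBH and Travis ranges.
ALLOWED_CIDRS = ['203.0.113.0/24', '198.51.100.0/24'].freeze

def ssh_ingress_permissions(cidrs = ALLOWED_CIDRS)
  cidrs.each { |c| IPAddr.new(c) } # raises on a malformed CIDR before any AWS call
  [{ ip_protocol: 'tcp', from_port: 22, to_port: 22,
     ip_ranges: cidrs.map { |c| { cidr_ip: c } } }]
end

def allowed?(ip, cidrs = ALLOWED_CIDRS)
  cidrs.any? { |c| IPAddr.new(c).include?(IPAddr.new(ip)) }
end

# ec2.authorize_security_group_ingress(group_id: 'sg-12345678', ip_permissions: ssh_ingress_permissions)
```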

Error when trying to destroy

When I try to run:

ruby scripts/destroy.rb --name deploy-test2.wgbh-mla-test.org

... I get the following error:

/Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/util/lister.rb:4:in `list': wrong number of arguments (3 for 1..2) (ArgumentError)

ssh retry only if 255?

Not sure about this, but it seems the failures to connect after launch always return 255: Maybe only repeat the loop for those failures, and for anything else we fail fast?

INFO [2015-12-14 11:13:01]: try 4: ssh -A -o StrictHostKeyChecking=no ec2-user@54.158.191.217 -t -t 'sudo sh -c '\''mkfs -t ext4 /dev/sdb && mkdir /mnt/ebs && mount /dev/sdb /mnt/ebs && chown ec2-user /mnt/ebs && ruby -i.back -pne '\''\'\'''\''$_="AllowAgentForwarding yes\n" if /AllowAgentForwarding/'\''\'\'''\'' /etc/ssh/sshd_config && yum update --assumeyes'\'''
INFO [2015-12-14 11:13:01]: demo.zee.wgbh-mla-test.org: ssh: connect to host 54.158.191.217 port 22: Connection refused
WARN [2015-12-14 11:13:01]: ssh was not successful: pid 90982 exit 255
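The ssh manual says ssh exits with the remote command's status, or with 255 if ssh itself hit an error, so "connection refused" while the instance boots shows up as 255 while a failing remote command shows up as something else. A minimal sketch of retrying only on 255; ssh_with_retry is hypothetical, and the block is assumed to run ssh and return its integer exit status.

```ruby
SSH_ERROR = 255 # ssh(1): exit status when ssh itself fails (e.g. connection refused)

def ssh_with_retry(max_tries:, sleep_seconds: 5)
  try = 0
  loop do
    try += 1
    status = yield(try)
    return status if status.zero?
    # Any other non-zero status came from the remote command: fail fast.
    raise "ssh command failed with exit #{status}" unless status == SSH_ERROR
    raise "Giving up after #{try} tries" if try >= max_tries
    sleep(sleep_seconds)
  end
end

# ssh_with_retry(max_tries: 10) { system('ssh', host, command); $?.exitstatus }
```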

update readme?

No such file or directory -- scripts/build.rb

This got replaced with ec2_elb_start.rb, perhaps?

Script to get IP of demo instance

The EC2 instances are behind ELBs, which won't have the ssh port open, so we can't use DNS to point us at the right address. (And even if the port were open, the machine behind the ELB gets swapped, so ssh will complain about a changed host key.)

Intended usage:

ssh -i ~/.ssh/xxx.wgbh-mla-test.org.pem  ec2-user@`ruby scripts/demo_ip.rb --name xxx.wgbh-mla-test.org`

or maybe:

ssh `ruby scripts/demo_ssh_args.rb --name xxx.wgbh-mla-test.org`
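A minimal sketch of what a demo_ssh_args-style script might print, given the demo instance's IP (looked up by some other step, not shown). It follows the key path convention ~/.ssh/NAME.pem and the ec2-user login from the intended usage. The path must be absolute because the shell does not tilde-expand text produced by backticks. demo_ssh_args is hypothetical.

```ruby
def demo_ssh_args(name, ip, user: 'ec2-user', ssh_dir: File.join(Dir.home, '.ssh'))
  ['-i', File.join(ssh_dir, "#{name}.pem"), "#{user}@#{ip}"]
end

# puts demo_ssh_args('xxx.wgbh-mla-test.org', ip).join(' ')
```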

Discuss: how to "reset" a failed build, in order to prepare for a new one?

After getting some failures during a build, there's a lot of stuff to undo before you can try a new build using the same parameters (mainly the name parameters).

Not sure if there are clean ways to undo everything that has been done, but the things I'm bumping into include:

  • key exists (on local machine)
  • key pair already exists (in AWS)
  • group already exists (in AWS)

maybe more. That's what I've seen so far.

Error when trying to safely destroy ELBs

When trying to destroy an otherwise complete set of AWS resources, I'm getting this error:

/aws-wrapper/lib/util/destroyer.rb:50:in `block in safe_destroy': Still need to clean up [:elb_names] (RuntimeError)

This forces me to use the --unsafe flag.

shell script tests

Now that we have the cleaner script, we could have a non-travis test that runs the entire create-login-delete cycle.

SshOpter#just_ips fails

Reproduce:

  • build resources with: ruby scripts/build.sh --name foo.wgbh-mla-test.org

  • try to run: ruby scripts/ssh_opt.rb --name foo.wgbh-mla-test.org --just_ips

  • You'll get the following error:

    /Users/andrew_myers/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/resolv.rb:491:in `getresource': DNS result has no information for openvault-qa1.wgbh-mla-test.org (Resolv::ResolvError)
    
        from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/core/dns_wrapper.rb:12:in `lookup_cname'
        from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/util/ssh_opter.rb:25:in `lookup_ip'
        from /Users/andrew_myers/Projects/WGBH/aws-wrapper/lib/util/ssh_opter.rb:32:in `just_ips'
    

The #just_ips method expects DNS to have been set up already, but we've factored that out: scripts/build.sh doesn't do that anymore, by design.

So we need a way to retrieve IP addresses, given only the name tag (not the CNAME, which may not exist).

Done when running ruby scripts/ssh_opt.rb --name foo.wgbh-mla-test.org --just_ips returns IPs associated with EC2 instances tagged with --name.
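A minimal sketch of looking up IPs by name tag alone, assuming instances are tagged with the key "Name" and skipping terminated ones (which can linger, per the reservations issue). The filtering is done locally over stand-in structs so it runs without credentials; in the SDK, the server-side equivalent is describe_instances(filters: [{ name: 'tag:Name', values: [name] }]), and an instance's tags arrive as key/value pairs rather than a Hash. ips_by_name_tag is hypothetical.

```ruby
# Stand-in for an SDK instance: tags as a Hash, e.g. { 'Name' => 'foo.wgbh-mla-test.org' }.
FakeEc2Instance = Struct.new(:tags, :state_name, :public_ip_address)

def ips_by_name_tag(instances, name)
  instances
    .select { |i| i.tags['Name'] == name && i.state_name != 'terminated' }
    .map(&:public_ip_address)
    .compact
end
```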

Release EC2/EIP pair

(For developing scripts, easy cleanup is good... but we want to be sure we don't accidentally delete live sites. Perhaps check list of known names and make sure they don't resolve to these? Not the end of the world to clean-up by hand, either.)

Inventory script

Start from DNS -> ELB -> EC2 -> volume and identify all the resources in use under a given name.

(and at some point maybe delete all the resources which are not in use.)

Restrict name length

Rationale: when trying to run the build script with the name openvault-demo.wgbh-mla-test.org, I get an error from the AWS SDK: LoadBalancer name cannot be longer than 32 characters (Aws::ElasticLoadBalancing::Errors::ValidationError)

This doesn't get thrown until a lot of resources have already been created. Many of those need to be deleted manually before trying again. It would be better to catch invalid names before any resources are created.

Done when invalid names raise exceptions before resources are created.
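A minimal sketch of a pre-flight check, assuming ELB names are derived from --name by replacing dots with hyphens and appending a pair suffix. That derivation is inferred from the ELB hostname zzz-wgbh-mla-test-org-a-176640697.us-east-1.elb.amazonaws.com in the DNS issue; the a/b suffix list and the helper names are hypothetical. Under that assumption, openvault-demo.wgbh-mla-test.org yields a 34-character ELB name, over the 32-character limit.

```ruby
MAX_ELB_NAME_LENGTH = 32 # limit quoted in the ValidationError
PAIR_SUFFIXES = %w[a b].freeze # hypothetical pair suffixes

def elb_name_for(name, suffix)
  "#{name.tr('.', '-')}-#{suffix}"
end

# Call before creating any resources.
def validate_name!(name)
  PAIR_SUFFIXES.each do |suffix|
    elb = elb_name_for(name, suffix)
    next if elb.length <= MAX_ELB_NAME_LENGTH
    raise ArgumentError,
          "#{name} yields ELB name #{elb} (#{elb.length} chars); limit is #{MAX_ELB_NAME_LENGTH}"
  end
  name
end
```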

Create group + elb perms for pair

As part of the pair creation script

  • create a group
  • add current user to that group
  • provide instructions on how to add other users
  • only members of the group may swap the elbs
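A minimal sketch of the IAM policy such a group might carry, assuming a swap is done by registering and deregistering instances on the pair's classic ELBs (a real swap may also need read-only describe permissions). The region, account ID, and ELB name are placeholders. The group itself would come from the IAM API (create_group, put_group_policy, add_user_to_group); adding another user could be documented as aws iam add-user-to-group --group-name NAME --user-name USER.

```ruby
require 'json'

def elb_swap_policy(elb_names, region: 'us-east-1', account_id: '123456789012')
  {
    'Version' => '2012-10-17',
    'Statement' => [{
      'Effect' => 'Allow',
      'Action' => %w[elasticloadbalancing:RegisterInstancesWithLoadBalancer
                     elasticloadbalancing:DeregisterInstancesFromLoadBalancer],
      # Scope the permission to this pair's load balancers only.
      'Resource' => elb_names.map { |n| "arn:aws:elasticloadbalancing:#{region}:#{account_id}:loadbalancer/#{n}" }
    }]
  }.to_json
end
```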

rethink ssh_opter/sudoer

Looking at how we're actually using it, I think these changes would be good:

  • no sudo one-liners: just get rid of this entirely
  • easy to get the IP of the current demo

Wait loops are asking for off-by-one errors

1.upto(WAIT_ATTEMPTS) do |try|
  fail('Giving up') if try >= WAIT_ATTEMPTS
  # ... poll, and break out on success ...
end

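The loop should be infinite, with success and an explicit failure as its only exits. Otherwise, if someone forgets the '=' (writing try > WAIT_ATTEMPTS), the fail never fires and we'll still fall out of the loop after the last attempt, even in the error state.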
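A minimal sketch of the suggested shape: an unbounded loop whose only exits are a successful check or an explicit failure. If the comparison is mistyped as >, this version runs one extra attempt and then still fails, rather than exiting silently. wait_for is hypothetical; the block returns truthy once the awaited condition holds.

```ruby
WAIT_ATTEMPTS = 5

def wait_for(attempts: WAIT_ATTEMPTS, sleep_seconds: 1)
  try = 0
  loop do
    try += 1
    return try if yield(try)
    raise 'Giving up' if try >= attempts
    sleep(sleep_seconds)
  end
end
```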
