dce's Issues

Error requesting SES email identity verification: Invalid email

terraform apply is failing with:

Error: Error requesting SES email identity verification: InvalidParameterValue: Invalid email address<STUB>.
        status code: 400, request id: 1265ca55-d539-4571-a909-b77400a19dfc

  on email_identity.tf line 2, in resource "aws_ses_email_identity" "from_email_address":
   2: resource "aws_ses_email_identity" "from_email_address" {

...when using default tfvar values.

redbox-artifacts logs are in //

Looks like a folder name is missing from the log prefix, so logs for the artifacts bucket are being written under //.

Also the access logs are being put into the same bucket. We should move them to another bucket or use CloudTrail with S3 logging enabled. There should at least be a switch to turn this off.
When your source bucket and target bucket are the same, additional logs are created for the logs that are written to the bucket. These extra logs can increase your storage billing and make it harder to find the logs that you're looking for.

Account Pool Monitoring

I would like to have better visibility into the state of my DCE Account pool.
Specifically, I would like to know how many accounts are in each AccountStatus, and be notified if too many are NotReady, or not enough are Ready.

This would allow us to:

  • dive into reset CodeBuild failures, if too many accounts are failing to reset
  • add more accounts to the pool, if demand for leases grows over time.

As a business-analyst-type user, I may also be interested in time-series graphs of lease usage (how many accounts are Status=Leased).


I'm thinking we can put together a simple Lambda that runs every hour and queries DynamoDB on the AccountStatus GSI, then writes the results to three custom CloudWatch metrics (one for each status).
We can configure alarms against these metrics, and maybe create a CloudWatch Dashboard for quick viewing.
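The aggregation step for such a Lambda could be sketched as follows. This is a minimal, hypothetical sketch: in the real function the statuses would come from a DynamoDB Query against the AccountStatus GSI, and each count would be published with cloudwatch.PutMetricData (both left as comments here).

```go
package main

import "fmt"

// countByStatus tallies accounts per AccountStatus. In the real Lambda,
// the status values would come from a DynamoDB Query against the
// AccountStatus GSI, and each count would be published as a custom
// CloudWatch metric via cloudwatch.PutMetricData.
func countByStatus(statuses []string) map[string]int {
	counts := map[string]int{"Ready": 0, "NotReady": 0, "Leased": 0}
	for _, s := range statuses {
		counts[s]++
	}
	return counts
}

func main() {
	counts := countByStatus([]string{"Ready", "Leased", "NotReady", "Ready"})
	for _, status := range []string{"Ready", "NotReady", "Leased"} {
		fmt.Printf("%s=%d\n", status, counts[status])
	}
}
```

Alarms would then be configured on the NotReady and Ready metrics.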

S3 Malformed Policy issue

Version information

v0.3.0 of the CLI

Describe the bug

Error: Error putting S3 policy: MalformedPolicy: Policies must be valid JSON and the first byte must be '{'
	status code: 400, request id: 61647D60EB3D3F2B, host id: odaWMv3YYNxrPvZd7ochsjTvdpkKeKOGBSUO9ah6FaVBl3NUOe6m74fcxcqnMeckAHcnzs+rBWU=

  on init.tf line 57, in resource "aws_s3_bucket_policy" "reset_codepipeline_source_ssl_policy":
  57: 	resource "aws_s3_bucket_policy" "reset_codepipeline_source_ssl_policy" {

To Reproduce

  1. Download the CLI via zip file
  2. Run DCE init
  3. Run DCE system deploy
  4. Hit the error

Additional context
This is a fresh account. We had some SCP policies but moved to an org with no policies and attempted to redeploy locally via CLI.

Principal missing `execute-api:Invoke` permissions

Version information

v0.24.1

Describe the bug

DCEPrincipal role is missing the execute-api:Invoke permissions.
Besides this being something DCE users should have access to, it's also blocking some of our "DCE inside DCE" work (see #206), because our DCE principal can't run functional tests.
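A sketch of the missing policy statement is below; the resource ARN scoping is an assumption and would need to match how the rest of the DCEPrincipal policy is structured.

```json
{
  "Effect": "Allow",
  "Action": "execute-api:Invoke",
  "Resource": "arn:aws:execute-api:*:*:*/*/*/*"
}
```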

POST /accounts does not create an account

Currently doing a create operation on the accounts API doesn't actually create an account.

Given that a lot of people use a variation of a landing zone process when creating accounts, this may be the ideal behavior. If we are to create an account, we would need a way to initiate that landing zone process.

However, this shouldn't prevent us from evaluating creating accounts inside an organization. We could make this configurable but the payload in the accounts API would have to be different which could make things confusing.

Recommended SCP in docs results in failing update_lease_status lambda

Version information

v0.23.0

Describe the bug

The documentation contains an example Service Control Policy (SCP) to use in an AWS organization here.

Using this SCP results in an error in the update_lease_status lambda when AWS Cost Explorer is queried:

Failed check budget: Failed to calculate spend for lease [username] @ [username]: Failed to calculate spend for account [child account number]: AccessDeniedException: User: arn:aws:sts::[child account number]:assumed-role/OrganizationAccountAccessRole/0000000000000000 is not authorized to perform: ce:GetCostAndUsage on resource: arn:aws:ce:[region]:[child account number]:/GetCostAndUsage with an explicit deny

To Reproduce

  1. Deploy DCE into an OU in an AWS org
  2. Attach this SCP to that OU.
  3. Invoke the fan_out_update_lease_status lambda
  4. Check CloudWatch alarms for update_lease_status. It will be in alarm.

Expected behavior

Using the SCP should lock down DCE users and not interfere with DCE operations.
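One possible direction, sketched below, is to exclude the role DCE assumes from the deny statement. This is an assumption about the fix, not the documented SCP; the role name and condition key would need to match the actual DCE setup.

```json
{
  "Effect": "Deny",
  "Action": ["ce:*"],
  "Resource": "*",
  "Condition": {
    "ArnNotLike": {
      "aws:PrincipalARN": "arn:aws:iam::*:role/OrganizationAccountAccessRole"
    }
  }
}
```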

Additional context

N/A

Create a service layer

Create a service layer for accounts/leases/usage. Move business logic out of /cmd into the service layer.

Use interfaces to abstract away the data and eventing systems.

Uninstall DCE

Is your feature request related to a problem? Please describe.
To clean up after a bug I need to recompile the dce software. It appears that to get things right I would have to redeploy (to change scripts running on aws).

As far as I can see, deploying again recreates everything. For example, the api gateway would change.

While I am testing I would just like to remove everything that was installed by the first deploy, then deploy it again with the new software. Even a list in the docs that describes what to remove manually would help. More generally a suggestion about how to go about repairing bugs or upgrading the software would be nice.

However, in production, all the users of the service would seem to have to adjust their settings (the gateway URL, for example) to cope with an upgrade. So a feature that updates the software without changing basic settings or duplicating scripts at AWS would help.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Add consistent query parameter handling to resources

Such as /accounts?status=active or /accounts?filter=. Leases as well. Starting with support for all fields that are indexed in the DB, then adding query string parameters for pagination, sorting, etc. that are consistent across all REST URLs.

Normal users should not be able to create/end leases for other people

Is your feature request related to a problem? Please describe.
While authenticating with temporary sts credentials mapped to a cognito user called testuser, I observed the following behavior.

~ dce leases create --budget-amount 100.0 --budget-currency USD --email [email protected] --principle-id jdoe99
Lease created: {"accountId":"XXX","budgetAmount":100,"budgetCurrency":"USD","budgetNotificationEmails":["[email protected]"],"createdOn":1574347343,"expiresOn":1574952143,"id":"d7586b82-2b57-4ba0-9469-ba59d865e823","lastModifiedOn":1574347343,"leaseStatus":"Active","leaseStatusModifiedOn":1574347343,"leaseStatusReason":"Active","principalId":"jdoe99"}
~ dce leases end --account-id XXX --principle-id jdoe99
Lease ended

Describe the solution you'd like
Respond with 403 for any lease requests involving a principalId that is not your own.
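The check itself could be as simple as comparing the authenticated principal against the principalId in the request. A minimal sketch, where the function name and the admin flag are hypothetical:

```go
package main

import "fmt"

// authorizeLeaseRequest returns the HTTP status for a lease create/end
// request. requester is the principalId of the authenticated Cognito
// user; requested is the principalId in the request payload. Admins may
// act on any lease; normal users only on their own.
func authorizeLeaseRequest(requester, requested string, isAdmin bool) int {
	if isAdmin || requester == requested {
		return 200
	}
	return 403 // normal users may only manage their own leases
}

func main() {
	fmt.Println(authorizeLeaseRequest("testuser", "jdoe99", false))   // another user's lease
	fmt.Println(authorizeLeaseRequest("testuser", "testuser", false)) // own lease
}
```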

DCE should serve Swagger UI

Is your feature request related to a problem? Please describe.

In order to understand our API, users must either read the how-to or the swagger file itself. The former is not guaranteed to be comprehensive and must be manually kept up to date. The latter is not easy for humans to parse.

Describe the solution you'd like

DCE could serve swagger-ui from the DCE Master Account. One way of doing this may be to serve swagger-ui static web assets through API Gateway in the same way as we serve the credentials web page.

Describe alternatives you've considered

Additional context

Ending a lease does not return the child account to a ready state

Version information
dce 0.2.1, os centos 5 go 1.13.1

Describe the bug
An account was added and was in the Ready state; a lease was created; dce auth login succeeded; the leased account was used successfully; and the lease was ended. At this point the account was never returned to the Ready state (though the lease was properly ended).

The last error message was admin role is not authorized to perform iam:ListRoles.
Indeed, the AdministratorAccess policy had been removed from the admin role in the child account, which makes the authorization error seem sensible.
I suppose the question is why dce removed the AdministratorAccess policy (using the role with remote login from the parent account does not have that effect).

Replacing the AdministratorAccess policy in the child account doesn't help. The account just stays in the NotReady state forever.

To Reproduce
Follow quick start guide then end the lease as described in the how to.

Expected behavior
The lease is terminated and the child account is cleaned up and returned to ready state

Additional context

Fix API Reference Docs

API Reference docs are rendering in a way that isn't particularly useful.

  • No navigation links
  • Missing response object schemas
  • No JSON examples

We actually have some other rendered HTML docs, using swagger-ui. They aren't perfect either, but more usable than the markdown ones we currently have. Might be worth looking into our options here.


Markdown docs

(screenshot)

Swagger-ui docs

(screenshots)

Use pkg/config to configure Lambdas and CodeBuild

The new pkg/config exists in master but needs to be implemented for all appropriate resources.

  • cmd/lambda/accounts
  • cmd/lambda/fan_out_update_lease_status
  • cmd/lambda/lease_auth
  • cmd/lambda/leases
  • cmd/lambda/populate_reset_queue
  • cmd/lambda/process_reset_queue
  • cmd/lambda/publish_lease_events_queue
  • cmd/lambda/update_lease_status
  • cmd/lambda/update_principal_policy
  • cmd/lambda/usage
  • cmd/codebuild/reset

Delete lease by lease id

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
There is currently no DELETE method implementation on the /leases/{id} endpoint. Users must instead provide principal id and account id to the /leases endpoint in order to end a lease. Deleting leases by id would be more intuitive.
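A rough sketch of the route handling such an endpoint might use; the function name is hypothetical, and the real implementation would end the lease after resolving it by id:

```go
package main

import (
	"fmt"
	"strings"
)

// parseLeaseID extracts the lease id from a DELETE /leases/{id} path,
// returning 404 when the request doesn't match the proposed route.
func parseLeaseID(method, path string) (string, int) {
	id := strings.TrimPrefix(path, "/leases/")
	if method != "DELETE" || id == "" || id == path || strings.Contains(id, "/") {
		return "", 404
	}
	// A real handler would now look up the lease by id and end it,
	// rather than requiring principalId + accountId in the body.
	return id, 200
}

func main() {
	id, status := parseLeaseID("DELETE", "/leases/d7586b82-2b57-4ba0-9469-ba59d865e823")
	fmt.Println(id, status)
}
```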

Describe alternatives you've considered

Additional context

Credentials Web Page Lambda not populating Cognito values in Lambda template

Version information

  • DCE v0.24.0
  • Go 1.13.3
  • TFE: 12.19

Describe the bug
Vue template values not populating from Credentials Web Page Lambda

To Reproduce
On a fresh deploy, after configuring cognito, the values aren't being populated in the index.html template via the Settings struct in the lambda.

The Settings struct specifically only has &{us-east-1 auth api } set.

These ENV variables are all set, and each has a corresponding value in parameter store

PS_IDENTITY_POOL_ID, 
PS_USER_POOL_APP_WEB_DOMAIN, 
PS_USER_POOL_CLIENT_ID, 
PS_USER_POOL_ID, 
PS_USER_POOL_PROVIDER_NAME

Expected behavior
A Cognito login should appear and the values should not be blank in the Vue template

Additional context
Currently debugging, trying to verify whether this is a problem with my setup or with the specific lambda.

Each SSM parameter is being returned from SSM based on the ENV values

2020/01/10 18:39:06 Retrieved SSM Parameter: {
18:39:06 ARN: "arn:aws:ssm:us-east-1:114018273183:parameter/sandbox/auth/client_id",
18:39:06 LastModifiedDate: 2020-01-10 15:49:55 +0000 UTC,
18:39:06 Name: "/sandbox/auth/client_id",
18:39:06 Type: "String",
18:39:06 Value: "5m01il8aisn2dud8no5rso45uk",
18:39:06 Version: 1
18:39:06 }

Edit: Additional Info:

	// load up the values into the various settings...
	...
	if err := cfgBldr.Dump(Settings); err != nil {
		errorMessage := fmt.Sprintf("Failed to initialize parameter store: %s", err)
		log.Fatal(errorMessage)
	}

	fmt.Println("Settings in Init")
	fmt.Println(Settings)

I put a print here, after the cfg builder runs with the parameter store config, and the output of the settings is the same as above: still &{us-east-1 auth api }

dce-api-test-admin-role created by functional tests is not getting deleted from PR account

Describe the bug
dce-api-test-admin-role created by functional tests is not getting deleted from PR account

To Reproduce
Steps to reproduce the behavior:

  1. Run api_test.go in a PR environment
  2. Verify dce-api-test-admin-role created by the test is deleted
  3. See dce-api-test-admin-role still exists

Expected behavior
dce-api-test-admin-role should be deleted after test completes.

Merge CLI "quickstart" with DCE "How To" doc

Is your feature request related to a problem? Please describe.

DCE documentation is currently split between two repos and there isn't a single clearly documented "happy path" that a user can expect to follow while using the application.

Describe the solution you'd like

Merge the DCE CLI quickstart with the How To section of the docs in this repo. The end result should consist of detailed steps that users can follow chronologically to deploy and use DCE.

Describe alternatives you've considered

Additional context

Clean up Lambda Permissions and Lambda module work

We have two locations that IAM Permissions for Lambdas are defined. https://github.com/Optum/Redbox/blob/v0.18.0/modules/lambda_iam.tf#L3
and
https://github.com/Optum/Redbox/blob/v0.18.0/modules/lambda/iam.tf#L2

Additionally, the lambda/iam.tf approach looks to use the same role name in every module.

Two issues here.

  1. The per-module IAM permissions do not use distinct role names, so every module is writing to the same role (which may be fine, but the repetition is worth noting)
  2. Some Lambdas are configured directly with aws_lambda_function, while others are defined using the lambda module. We should standardize on one approach and clean up the remaining items.

Manual RDS nuking fails on new Aurora RDS clusters

2020/01/23 14:14:13 Modify dbInstance retention period to 0 for arn:aws:rds:us-east-1:352684066999:db:database-1-instance-1
2020/01/23 14:14:13 Failed to execute aws-nuke RDS backup on account 352684066999: InvalidParameterCombination: The specified DB Instance is a member of a cluster. Modify backup retention for the DB Cluster using the ModifyDbCluster API
status code: 400, request id: 2964690c-d880-439f-98d7-f143162ca104

terraform specify lambda checksum in module

https://www.terraform.io/docs/providers/aws/r/lambda_function.html#source_code_hash

is a key attribute for Lambda-with-Terraform best practice, as it avoids re-uploading the function if there are no actual changes to its deployment package. There are some comments in the tf lambda module source, a little hard to parse, about DCE deployments actually being done by Jenkins (?), but for normal terraform usage (including in Jenkins), this prevents spurious uploads by checksum comparison.
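For reference, the usual pattern looks something like this (resource names and paths are illustrative, not taken from the DCE module):

```hcl
resource "aws_lambda_function" "example" {
  function_name = "example"
  filename      = "build/example.zip"
  handler       = "example"
  runtime       = "go1.x"
  role          = aws_iam_role.example.arn

  # Re-upload only when the package contents actually change
  source_code_hash = filebase64sha256("build/example.zip")
}
```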

Create a config service

Create a Configurater interface for configuring the Lambda functions in a common way (e.g., using something like a Builder pattern) to reduce the amount of duplicate code for creating resources.

S3 Buckets shouldn't have logging enabled

Terraform is currently enabling S3 logging on buckets. This should not be enabled by default, since this implementation is specific to the user and isn't core to the functionality of DCE.

Rename `accountStatus`/`leaseStatus` --> `status`

Currently, our account and lease JSON payloads have accountStatus and leaseStatus fields. The naming is redundant, since the field is a property of the account or lease object.

Also, it's inconsistent with our API GET query param syntax, which accepts, for example, GET /accounts?status=Ready

PR #236 proposes to rename the query param to GET /accounts?accountStatus=Ready. If we're making a breaking change here, I'd rather make a change to the API payload than the query parameter.


Note that we do have internal systems at Optum relying on these API interfaces, and DCE CLI may rely on these params as well. I'd recommend that if we make a change here, we have at least one release that's backwards compatible (i.e. contains both status and accountStatus fields). This will give us time to update logic in our clients.

eg
GET /accounts?status=Ready

[
  {
     id: "123456789012",
     status: "Ready",
     accountStatus: "Ready",
     ...
  }
]

POST /accounts does not persist metadata in request body

The account model contains a Metadata field (data not relevant to DCE itself, but relevant to the DCE implementer); however, when you make a POST request to the /accounts endpoint with a metadata property in the request body, it is not persisted to the database.

The /cmd/lambda/accounts/create.go should be updated to handle metadata fields and persist them in the Dynamo table.

GET /leases - returns 404 when there are no records matching query parameters

Version information

This was found in PR https://github.com/Optum/dce/pull/205/files

Describe the bug

When there are no records satisfying query parameters, GET /leases returns 404. It should return an empty array.
To Reproduce

  1. Go to API Gateway
  2. Select /leases GET endpoint
  3. Ensure no records exist in Leases table
  4. Run the above endpoint with query parameter status=Active

Expected output:
Returns 200 OK with an empty array

Actual output:
Returns a 404 error
Expected behavior

Additional context
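The intended behavior could be sketched as follows, with the Lease model trimmed to one field for illustration: an empty result set is a valid 200 response with an empty JSON array, not a 404.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Lease struct {
	ID string `json:"id"`
}

// listLeasesResponse returns 200 with "[]" when no leases match the
// query parameters, instead of a 404.
func listLeasesResponse(leases []Lease) (int, string) {
	if leases == nil {
		leases = []Lease{} // marshal as [] rather than null
	}
	b, _ := json.Marshal(leases)
	return 200, string(b)
}

func main() {
	status, body := listLeasesResponse(nil)
	fmt.Println(status, body) // 200 []
}
```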

Log in page does not work on Firefox

Version information

Describe the bug

When logging in using a Firefox browser, the credentials web page infinitely refreshes after entering your log in info.

To Reproduce

  1. Set default browser to Firefox
  2. Type dce auth
  3. Log in
  4. The credentials web page infinitely refreshes

Expected behavior

Additional context

Fix broken documentation links

Version information

Describe the bug

Documentation links (http://dce.readthedocs.org) in Markdown files end in .md and do not work after being rendered to HTML.

To Reproduce

  1. Go to http://dce.readthedocs.org
  2. Click Quickstart
  3. Click the link for concepts
  4. See 404

Expected behavior

All links should work

Additional context

setup repo for squash and merge

Currently commits are being done as merge commits, which for some pull-request commit histories will only clutter the repo history. Switching to squash and merge as the repo setting will ensure a cleaner history regardless of the branch/PR history.

Flesh out docs / Migrate from README.md

Flesh out some parts to the docs.
We also have a good deal of content in the README.md that could be moved into our docs/ dir

Here's what I'm thinking, as a general outline, for new/modified pages:


How to

  • Use the DCE API
  • Use the DCE CLI
  • Login to your DCE account
  • Add accounts to your account pool
  • Deploy DCE with Terraform
  • Configure budgets
  • Customize account resets
  • Customize email templates
  • Authenticate using cognito
  • Hook into lifecycle events
  • Manage DB backups
  • Customize IAM user policies

Local Development

  • how to run unit tests locally
  • how to run functional tests locally
  • guidelines? patterns?

Contributing
update with info about opening PRs, etc.
signing changes? (squash and merge gets signed)
etc?

Review quickstart

  • maybe want to point at a separate section for deploying with terraform
    (eg. "Deploy using REST API" doesn't really make sense)

Some of these will be short stubs, pointing at docs elsewhere. For example, Using the CLI might just point at dce-cli docs.

Query Parameters for API Pagination

The prior art in the repo has us:

  • Return a Link header, which includes the URL for the next page of results
  • The URL includes query parameters for the "next" primary key in DynDB. eg:
    GET /leases?principalId=&nextPrincipalId=&nextAccountId=&

That URL will look a little nicer if we can treat lease IDs as the primary key:
GET /leases?principalId=&nextId=
...though we may actually need to migrate to using Lease.ID as a primary key to get that to work (today it's a GSI, for historical reasons...).


One current pain point with pagination queries is that if your DB query is using a FilterExpression, DynamoDB will potentially return you some pages of empty results, in between pages of actual results. This is because DynDB grabs a page of data, and then filters out values.
This is not a great experience for end users. I'd like to either:

  1. Only support API query params that can be queried against an index (no FilterExpressions)
  2. Hide this behavior from end users (eg. continue to paginate server-side until we get the requested number of records)

Option 1 would be much simpler, if we're OK with it.
For example, the GET /leases?status= request should be using our GSI on LeaseStatus (currently it does a full table scan).
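For reference, the indexed query would look roughly like this with aws-sdk-go; the table, index, and attribute names are assumptions about the DCE schema, and this fragment is a sketch rather than runnable code:

```go
// Query the LeaseStatus GSI instead of scanning the whole table.
input := &dynamodb.QueryInput{
	TableName:              aws.String("Leases"),
	IndexName:              aws.String("LeaseStatus"),
	KeyConditionExpression: aws.String("LeaseStatus = :status"),
	ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
		":status": {S: aws.String("Active")},
	},
}
result, err := dynamoDBClient.Query(input)
```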

Implement pagination for GET /usage

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Implement pagination for GET /usage
Describe alternatives you've considered

Additional context

Remove TF provider from module

Version information
DCE v0.24.0

Describe the bug
The version of the TF AWS provider is explicitly hard-coded into the DCE terraform module.
Terraform docs recommend not configuring providers in submodules:
https://www.terraform.io/docs/configuration/modules.html#providers-within-modules

While in principle provider blocks can appear in any module, it is recommended that they be placed only in the root module of a configuration, since this approach allows users to configure providers just once and re-use them across all descendent modules.

Specifically, this causes problems for consumers of DCE:

  • Cannot use any other version except for the one required by DCE
  • Other bugs related to providers in submodules (e.g. https://github.com/hashicorp/terraform/issues/15762)

To Reproduce

# main.tf
provider "aws" {
  version = "2.13.0"
}

module "base" {
  source = "path/to/dce/modules"
}
terraform apply

> No provider "aws" plugins meet the constraint "2.13.0,2.43.0".
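The usual fix is to drop the provider block from the module and let the root module's provider configuration flow in. A sketch of what the consuming configuration would then look like (versions and paths illustrative):

```hcl
# Root module configures the provider once...
provider "aws" {
  region  = "us-east-1"
  version = "~> 2.43"
}

# ...and the DCE module inherits it, because the module itself
# no longer contains a provider block.
module "dce" {
  source = "path/to/dce/modules"
}
```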

Unable to create lease: "has already spent $0.00 of their budget"

Version information

DCE v0.24.1

Describe the bug

Unable to create a lease for the same principal a second time:

$ dce leases create -p myuser -b 100 -c USD -e [email protected]
> Lease created: { "accountId": 123, ... }
$ dce leases end -a 123 -p myuser
$ dce leases create -p myuser -b 100 -c USD -e [email protected]
> [POST /leases][400] postLeasesBadRequest 

Showing with dce-cli for clarity, but I'm able to reproduce this hitting the API directly.
Here's the actual 400 response in postman:

(screenshot of the 400 response)

I verified that my leases lambda is configured with the correct values:
(screenshot)
...so I would guess this is something in the code (not in my TF configuration)

To Reproduce

See above

Expected behavior

I should be able to create a new lease for the same principal, unless that principal has actually exceeded the usage limit

Additional context

This is a new bug introduced since v0.23.0 (our last working deployment)

Automated / Scheduled DB Backups

Is your feature request related to a problem? Please describe.

I would like an out-of-the-box solution to automated/scheduled DB backups

Describe the solution you'd like

A lambda on a cloudwatch scheduled event, that backs up the DB

Additional context

This is something we've dealt with internally already at Optum. Seems like an easy win to either bring the DB backup work into this repo, or to create a new open source repo to do scheduled DynDB backups.

We could make this toggle-able in Terraform, so end-users could disable it if they don't want it for any reason.
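The Lambda body itself would be small; a sketch with aws-sdk-go, where the table names and date-based backup naming are illustrative and error handling is trimmed:

```go
// Triggered by a CloudWatch scheduled event; backs up each DCE table.
svc := dynamodb.New(session.Must(session.NewSession()))
for _, table := range []string{"Accounts", "Leases", "Usage"} {
	_, err := svc.CreateBackup(&dynamodb.CreateBackupInput{
		TableName:  aws.String(table),
		BackupName: aws.String(table + "-" + time.Now().Format("2006-01-02")),
	})
	if err != nil {
		log.Printf("backup of %s failed: %v", table, err)
	}
}
```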

Automated testing updates

Currently a PR runs a pipeline that does unit/lint testing, terraform create, and functional tests, and then creates artifacts if it's a tagged release. Additionally there is a destroy pipeline that needs to be run on comment.

What I'm proposing.

  1. Switch PR automated testing to just run unit & lint tests
  2. Create a separate Terraform create and functional test pipeline that is initiated on comment by contributor
  3. Leave the destroy Terraform pipeline as is

Additionally, we'd move artifact creation into the unit/lint stage and have it work only off of tags, which means functional tests would run after artifacts are created. Ideally we could create artifacts on every unit/lint run, then run a release pipeline that tests those artifacts before publishing them to GitHub. Not sure how feasible this is.

readme - remove `tree` output

The package content listing in the readme feels a bit gratuitous; it's basically the output of the tree command, and I think it primarily adds noise to the readme. A top-level explanation of the pkg/ and cmd/ directories would suffice, but even that might be overkill, as this structure is very common for Go projects.
