reply-fr / sustainable-personal-accounts

automate the control, the purge and the management of AWS accounts assigned permanently to selected employees, to foster innovation in cloud teams

License: Apache License 2.0

Python 88.05% Gherkin 9.01% Makefile 1.63% Shell 1.31%
accounts aws management aws-codebuild aws-control-tower aws-lambda-python aws-organizations aws-tags python aws-incident-manager

sustainable-personal-accounts's People

Contributors

bernard357, bgauchon-reply, geoffroyrenaud, nassim-reply


sustainable-personal-accounts's Issues

design ChatOps event-driven architecture

What is this?

With ChatOps we can bring progressive and explainable automation to the operation of SPA. One use case: when a billing alert has been triggered, send a message over chat to someone and validate that an action has been taken. Another use case is the conditional deletion of cloud resources, validated by the resource owner.

Why is it important?

SPA has been introduced to reduce as much as possible the burden of administering many AWS accounts. In order to scale, SPA needs to interact directly with the owners of AWS accounts for simple and repetitive tasks.

How to implement this?

The architecture is event-driven, built on Amazon EventBridge.
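As a sketch of this event-driven approach, a Lambda function could put a chat notification request on the default EventBridge bus, to be routed by a rule to the chat integration. The event source and detail-type below are illustrative, not the actual SPA schema, and the client is injected for testability:

```python
import json

def publish_chat_notification(events_client, account_id, message):
    # Put an illustrative 'ChatNotificationRequest' event on the default bus;
    # in a Lambda function, events_client would be boto3.client('events')
    return events_client.put_events(
        Entries=[{
            "Source": "SustainablePersonalAccounts",
            "DetailType": "ChatNotificationRequest",
            "Detail": json.dumps({"account": account_id, "message": message}),
        }])
```

A downstream EventBridge rule matching this detail-type would then invoke the function that talks to Slack or Microsoft Teams.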

Learn more:

Definition of Done

  • consolidate reference information from other public projects:
  • produce reference architecture for integrating social media with event-driven architecture
  • write feature file for notifications to Microsoft Teams
  • write feature file for integration with Slack

Other tickets will be created for the progressive development of ChatOps capabilities:

  • transmit notifications from SPA to Microsoft Teams -- feature existed previously, this is an updated architecture and setup
  • transmit notifications from SPA to Slack
  • accept, execute and respond to commands received from Slack
  • accept, execute and respond to commands received from Microsoft Teams
  • interact with individual persons over Slack

epic: purge IAM users from AWS sandbox account 792281704736

The purpose of this epic is to ensure that no Storm Reply consultant is using IAM user credentials to access the sandbox account, once everybody has received a personal AWS account.

For each IAM user found on sandbox account 792281704736:

  • disable console access
  • disable CLI access with AK and SK
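A hedged boto3 sketch of the per-user loop, with the IAM client injected so it can be exercised with a stub. The IAM calls used (delete_login_profile, list_access_keys, update_access_key) exist in the API, but the overall flow is an assumption:

```python
def disable_iam_user(iam_client, user_name):
    # Disable console access by removing the login profile, if any
    try:
        iam_client.delete_login_profile(UserName=user_name)
    except iam_client.exceptions.NoSuchEntityException:
        pass
    # Disable CLI access by deactivating every access key (AK and SK)
    keys = iam_client.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    for key in keys:
        iam_client.update_access_key(UserName=user_name,
                                     AccessKeyId=key["AccessKeyId"],
                                     Status="Inactive")
```

Deactivating keys rather than deleting them keeps the change reversible while the migration to personal accounts completes.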

Send alerts and notifications to Microsoft Teams

What is this?

This optional feature aims to close the feedback loop via social channels. When a billing alert is raised, or when a CodeBuild project fails for some reason, the information is pushed to a channel with multiple eyes on it.

Why is it important?

Currently SPA distributes alerts and notifications over e-mail. While mailing lists can be helpful, they have proven less reactive than social exposure. For example, a team member who goes beyond their budget may neglect the billing alert sitting in their inbox. But if the same alert is exposed on a social channel, the person will receive encouragement from the entire team to fix the issue.

How is this implemented?

  • add the pymsteams package as a project dependency
  • accept a webhook parameter in the settings file
  • pass the webhook to Lambda functions as an environment variable
  • expand the OnAlert Lambda function to push messages to webhooks
  • add a workbook to explain how to set up a Microsoft Teams webhook and how to use it in SPA

Create a role in Management account that can be assumed from within Automation account

Our first requirement is to limit the blast radius of any action engaged in the context of the Management account of the AWS Organization. For that, we will run code either in the context of the Automation account, or in the context of individual personal accounts. Since code run in Automation has to move accounts around, there is a need to grant specific permissions via a role and a trust relationship managed in IAM.

Here is what the sequence of actions looks like for moving an account from one OU to another:

  • code runs in the context of the Automation account - this one is in OU Sandbox
  • code assumes role ServiceRoleForAutomation in the Management account
  • this role grants permissions to manage accounts in OU Sandbox and in child OUs of Sandbox via a customized policy SandboxPowerAccess
  • code can use boto3 to move accounts around, act on accounts, create IAM resources within accounts, and create CodeBuild projects within accounts, all within the context of the OU Sandbox and its child OUs

Policy SandboxPowerAccess is defined using multiple identifiers:

  • organization_identifier - something like o-a1b2c3d4e5
  • root_ou_identifier - something like r-f6g7h8i9j0example
  • sandbox_ou_identifier - a string starting with ou-
  • vanilla_accounts_ou_identifier - a string starting with ou-
  • assigned_accounts_ou_identifier - a string starting with ou-
  • released_accounts_ou_identifier - a string starting with ou-
  • expired_accounts_ou_identifier - a string starting with ou-
  • management_account
  • automation_account
  • test_account - account used for tests

With that in mind, you can create the policy SandboxPowerAccess by substituting values in the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "organizations:*"
            ],
            "Resource": [
                "arn:aws:organizations::{management_account}:account/{organization_identifier}/*",
                "arn:aws:organizations::{management_account}:ou/{organization_identifier}/{vanilla_accounts_ou_identifier}",
                "arn:aws:organizations::{management_account}:ou/{organization_identifier}/{assigned_accounts_ou_identifier}",
                "arn:aws:organizations::{management_account}:ou/{organization_identifier}/{released_accounts_ou_identifier}",
                "arn:aws:organizations::{management_account}:ou/{organization_identifier}/{expired_accounts_ou_identifier}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "account:*",
                "codebuild:*",
                "iam:*"
            ],
            "Resource": "*"
        }
    ]
}

Epic:

  • Authenticate to the root account and take note of the identifiers listed above

  • Go to IAM console and create policy SandboxPowerAccess, based on previous template

  • Create role ServiceRoleForAutomation and attach policy SandboxPowerAccess to it. Add trusted relationship with account Automation.

  • Authenticate to account Automation and add an IAM policy AssumeServiceRoleForAutomation that allows assuming role ServiceRoleForAutomation in account Management

  • To test the overall setup, create an EC2 instance in the context of Automation, connect to it, assume role ServiceRoleForAutomation in the Management account, and pass AWS CLI commands to move accounts
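The sequence above can be sketched in boto3 terms. Here the sts_client and the make_client factory are injected for testability; in real code make_client would be functools.partial(boto3.client, "organizations"), and role_arn would point to ServiceRoleForAutomation in the Management account:

```python
def move_account(sts_client, make_client, account_id, source_ou,
                 destination_ou, role_arn):
    # Assume ServiceRoleForAutomation in the Management account
    credentials = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName="SpaAutomation")["Credentials"]
    # Build an Organizations client with the temporary credentials
    organizations = make_client(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"])
    # Move the account between organizational units
    organizations.move_account(AccountId=account_id,
                               SourceParentId=source_ou,
                               DestinationParentId=destination_ou)
```

This is a sketch under the permissions described above, not the definitive implementation.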

add cost report and tags to incident record

What is this?

With this feature, SPA fetches a report for a given account from the Cost Explorer API and attaches it to an ongoing incident record.

Why is this important?

When an incident record is created on a budget alert, there is a need to provide contextual information to responders. This eliminates manual checks that would otherwise be performed during resolution.

How to implement it?

  • when an account id is provided in an exception, get a cost report from the Cost Explorer API and add it to the incident record
  • develop a Lambda function URL to fetch the report over the public Internet
  • when an account id is provided in an exception, get account information and tag the incident record
  • add an architectural diagram made with draw.io
  • complement the feature file related to exceptions
  • add the account id to the budget exception
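A minimal sketch of the Cost Explorer call, with the client injected. Grouping by service and the monthly granularity are choices of this example, not something the ticket specifies:

```python
def get_cost_report(ce_client, account_id, start, end):
    # Fetch unblended costs for one account, grouped by service;
    # start and end are dates formatted as 'YYYY-MM-DD'
    return ce_client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={"Dimensions": {"Key": "LINKED_ACCOUNT",
                               "Values": [account_id]}},
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}])
```

In a Lambda function, ce_client would be boto3.client("ce"), and the result could be summarized before being attached to the incident record.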

More information:

Problems between AWS organization and AWS ControlTower

When accounts are moved between OUs, this causes a drift status in the CT dashboard, and the account is not governed anymore. You need to re-register the OU afterwards.
Problems:

  • Only one registration seems to be able to occur at a time, and it can take a fair amount of time (~40 min for 6 accounts)
  • Sometimes enrollment fails. Each time it happened, it was due to the CT SNS queue, which cannot be reused and needs to be manually deleted prior to enrollment (you need to assume the CTExecution role in the account, else it is blocked by SCPs)

NPM -g fails if not root in make setup

In make setup:

	npm install -g aws-cdk@latest --force
...
	npm install -g @marp-team/marp-cli --force

fails with permission denied if not root (because -g installs globally, and neither /usr/lib/node_modules/ nor the installed modules are world-writable)

Prepare Automation AWS account

The Automation AWS account is hosting all central computing and storage resources used for SPA. The EventBridge bus used by SPA is the default bus of the Automation AWS account itself.

  • Grant permissions to allow events from other AWS accounts. While the bus does not need to be created, there is a need to configure it for cross-account event submission. This has to be done once.

  • Create a CloudWatch log group to reflect events generated by AWS Organizations. After authentication to Automation account, go to Ireland region and create a CloudWatch log group. Select name /aws/events/aws-organizations-events and retention period of 30 days.

  • #9

  • Test the manual deployment by moving an account from one OU to another OU, and then move it back to the original OU. This should generate two events visible in CloudWatch logs. Copy these events and put each of them in a .json file in the folder /tests/events of this git repository.

  • Delete the CloudWatch log group and the EventBridge rule that were created manually.

  • Create python code with CDK to automate the creation of the CloudWatch log group and the EventBridge rule. The script should be named deploy_resources.py and be placed in the root directory of this repository.

  • Run python code and deploy resources in Automation account.

  • Test the automated deployment by moving an account from one OU to another OU, and then move it back to the original OU. This should generate two events visible in CloudWatch logs.

  • NB: tag all manually created resources with "Stack:sustainable-personal-accounts". BGA

publish purge log to parameter store

Scenario: where cloud engineers can review what would be destroyed in their accounts
Given an expired account
When the purge is done in dry-run mode
Then the log of the purge is pushed to parameter store for easy access by cloud engineer
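A minimal sketch of the publishing step, with the SSM client injected; the "/spa/purge-log/" parameter prefix is hypothetical:

```python
def publish_purge_log(ssm_client, account_id, log):
    # Push the dry-run purge log to SSM Parameter Store under a
    # per-account name, so cloud engineers can review it easily
    name = "/spa/purge-log/{}".format(account_id)
    ssm_client.put_parameter(Name=name, Value=log, Type="String",
                             Overwrite=True)
    return name
```

Note that standard String parameters are limited to 4 KB; a long purge log may require the advanced parameter tier or truncation.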

add a Lambda function to release all accounts

What is this?

This Lambda function will scan organizational units and managed accounts, and tag accounts with state 'released'.

Why is it important?

During normal operations, accounts transition automatically from the state released to the states expired, assigned, prepared and released again. However, in case of a bug in one of the processing Lambda functions, it may happen that the cycle breaks and that accounts stay in an intermediate state without additional processing.
Since the state of an account is contained in a tag attached to it, for such situations you can use AWS Organizations to change the state of any account. For this you would visit the page of each account and change the tag account:state to the value released. This operation is feasible for a few dozen accounts, but becomes tedious for a large number of accounts. Hence the need to reset accounts to the state released with a simple invocation of a Lambda function devoted to this usage.

How to implement it?

  • document recovery scenarios in a feature file
  • create a new Lambda function ReleaseAccounts
  • add a workbook to help system administrators
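A sketch of the core of ReleaseAccounts. The tag key account:state and the value released come from the description above; the scanning of organizational units is left out, and the client is injected:

```python
def release_accounts(organizations_client, account_ids):
    # Tag every managed account back to the state 'released'
    for account_id in account_ids:
        organizations_client.tag_resource(
            ResourceId=account_id,
            Tags=[{"Key": "account:state", "Value": "released"}])
```

In the Lambda function, organizations_client would be boto3.client("organizations") and account_ids would come from listing accounts in the managed OUs.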

Enrollment issue of existing accounts

With the last arrivals, I encountered 2 problems and had to open a case with AWS support to get back control of the account:

  • If the newly created account is enrolled BEFORE the root user finishes the configuration, you end up in a stale state where you lose all permissions from both sides and cannot remove the account from the organization, because the billing information is not completed.

Accounts should be created from the Account Factory only. Otherwise, setup must be completed and the CTExecution role manually created in the account (the problem occurred with the saad account).

  • In some cases, you have to create the CTExecution role manually. This happens if the account was used prior to enrollment or was in another organization before (you have to change the org id in the trust relationship)

Change the tagging prefix used by SPA in settings

What is this?

Today SPA tags accounts with names such as account-owner and account-state. However, large corporations may want to use a different prefix.

Why is this important?

Purpose of this change is to align SPA with corporate tagging policies defined for cloud resources.

How to implement this feature?

  • Define a new parameter for settings file
  • Use the parameter for the creation of event rules
  • Pass the parameter to Lambda functions with a new environment variable
  • Leverage the environment variable within tagging code
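The tagging code could then derive tag names from the environment variable, as in this sketch; the variable name TAG_PREFIX and the default prefix "account" are assumptions of this example:

```python
import os

def get_tag_name(suffix, default_prefix="account"):
    # Build a tag name such as 'account-owner' or 'corp-owner',
    # depending on the (hypothetical) TAG_PREFIX environment variable
    prefix = os.environ.get("TAG_PREFIX", default_prefix)
    return "{}-{}".format(prefix, suffix)
```

With this in place, every place that currently hard-codes account-owner or account-state would call get_tag_name("owner") or get_tag_name("state") instead.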

add constructor to Worker

Currently, Worker only has class methods. Most methods rely on a session parameter, so sessions cascade heavily from method to method. To fix this, we suggest creating instances of Worker that encapsulate the session as an instance attribute.
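The suggested refactoring, sketched; the get_client method is illustrative, not the actual Worker interface:

```python
class Worker:

    def __init__(self, session):
        # Encapsulate the session once, instead of cascading it
        # across every class method
        self.session = session

    def get_client(self, service):
        # Instance methods use self.session instead of receiving
        # a session parameter from their caller
        return self.session.client(service)
```

Call sites would build one Worker per session, e.g. Worker(boto3.Session()), and stop threading the session through every call.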

generalize default settings

What is this?

We introduce a single section of the settings file to capture all default settings that apply at account level. This is a change from the previous situation, where the default keyword could be used both within the accounts section and in the organizational_units section.

Why is this important?

This change streamlines the configuration of SPA with a single place to set default settings. Previously you could have to replicate default settings at two different places in the configuration file.

How to implement it?

  • change configuration scenarios in feature files
  • change fixtures/settings/settings.yaml accordingly
  • change tests/test_configuration.py accordingly
  • introduce a new section default in resources/configuration.py
  • eliminate usage of the sub-section default in the section organizational_units in resources/configuration.py
  • ensure that all sub-sections of the section organizational_units have names prefixed with ou-
  • eliminate usage of the sub-section default in the section accounts in resources/configuration.py
  • ensure that all sub-sections of the section accounts have 12-digit names
  • deploy in production
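The lookup of settings could then merge the single default section with the specific entry, as in this sketch (the function name is illustrative):

```python
def get_settings(defaults, specific):
    # Values from the single top-level 'default' section are overridden
    # by values set for a specific account or organizational unit
    settings = dict(defaults)
    settings.update(specific or {})
    return settings
```

For example, a default budget of 500 would apply everywhere, except for accounts whose section sets its own value.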

Arm architecture is not supported in eu-west-3 region

Make deploy will fail if you configure a deployment to eu-west-3, because arm64 is not available there yet.

 ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named SpaCommitted failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Architecture "arm64" is not supported in eu-west-3. Please select "x86_64" or remove the Architectures value from your request and try again (Service: Lambda, Status Code: 400, Request ID: 86cc75f7-53e7-481c-b049-7f464154bdd8)" (RequestToken: c8b878fb-3909-4930-05cb-2f6b01d54543, HandlerErrorCode: InvalidRequest), Resource handler returned message: "Architecture "arm64" is not supported in eu-west-3. Please select "x86_64" or remove the Architectures value from your request and try again (Service: Lambda, Status Code: 400, Request ID: a0e085ba-c4c7-41cf-a058-17ef5f3f99f0)" (RequestToken: 6b221b58-2201-02de-8377-f3dc062d7b5b, HandlerErrorCode: InvalidRequest), Resource handler returned message: "Architecture "arm64" is not supported in eu-west-3. Please select "x86_64" or remove the Architectures value from your request and try again (Service: Lambda, Status Code: 400, Request ID: 4ffd6aff-3192-4abc-8510-bf5fbddef80e)" (RequestToken: d31c3eb6-ed6f-8098-97da-0fa70f2bb833, HandlerErrorCode: InvalidRequest), Resource handler returned message: "Architecture "arm64" is not supported in eu-west-3. Please select "x86_64" or remove the Architectures value from your request and try again (Service: Lambda, Status Code: 400, Request ID: 43986f59-f13c-4b92-81e7-1ca615fce04d)" (RequestToken: f7fd9bb9-9b3e-72a4-e687-e7b9cffcbb7b, HandlerErrorCode: InvalidRequest), Resource handler returned message: "Architecture "arm64" is not supported in eu-west-3. Please select "x86_64" or remove the Architectures value from your request and try again (Service: Lambda, Status Code: 400, Request ID: 56180a28-09d3-48de-b2d9-38ff471a8da1)" (RequestToken: 3c159c43-bfc9-5a35-62b8-217a119c6e36, HandlerErrorCode: InvalidRequest)
    at deployStacks (/usr/lib/node_modules/aws-cdk/lib/deploy.ts:61:11)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at CdkToolkit.deploy (/usr/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:314:7)
    at initCommandLine (/usr/lib/node_modules/aws-cdk/lib/cli.ts:357:12)

Document SPA contribution to GDPR compliance

What is this about?

Since SPA tags accounts with personal addresses of persons, it should be considered a data store of PII.

Why is it important?

The management of PII is related to GDPR compliance.

How to implement it?

[ ] Review GDPR constraints on PII for company employees
[ ] Add workbooks to document the life cycle of AWS accounts and related PII

Make setup errors

Hello, on my side I encountered:

ERROR: flake8 4.0.1 has requirement mccabe<0.7.0,>=0.6.0, but you'll have mccabe 0.7.0 which is incompatible.

and

npm ERR! Error: EACCES: permission denied, rename '/usr/lib/node_modules/aws-cdk'

while running make setup, due to a prior cdk install I guess.
When doing sudo make setup, the setup manages both the downgrading of mccabe and of cdk.

Deploy lambda functions in Automation account

With this issue we develop and deploy all Lambda functions that are part of the SPA project. We do not include CodeBuild projects yet, and only want to control the overall flow of transitions across OUs.

Pre-conditions:

  • Issue #8 has been completed
  • Issue #4 has been completed

Epic:

  • Write code for Lambda function MoveVanillaAccount in /code/move_vanilla_account_lambda.py. This function will receive an event generated by AWS Organizations, ensure that the OU is the one for Vanilla Accounts, extract the account id from the event itself, and move the account to the OU Assigned Accounts. Identifiers will be provided to the function as environment variables VANILLA_ACCOUNTS_OU_IDENTIFIER and ASSIGNED_ACCOUNTS_OU_IDENTIFIER.

  • Extend python script deploy_resources.py to automate the deployment of Lambda function MoveVanillaAccount. You should create a specific python construct for that purpose, with two parameters vanilla_accounts_ou_identifier and assigned_accounts_ou_identifier.

  • Test the deployment by moving one AWS account to the OU Vanilla Accounts. Ensure that execution of MoveVanillaAccount is reported in CloudWatch logs, and that the account is moved automatically to the OU Assigned Accounts.

  • Write code for Lambda function SignalAssignedAccount in /code/signal_assigned_account_lambda.py. This function will receive an event generated by AWS Organizations, ensure that the OU is the one for Assigned Accounts, and put an event PreparedAccount on the event bus. Identifier will be provided to the function as environment variable ASSIGNED_ACCOUNTS_OU_IDENTIFIER. Note that code is not complete here, since we do not create nor start a CodeBuild project yet.

  • Extend python script deploy_resources.py to automate the deployment of Lambda function SignalAssignedAccount. You should create a specific python construct for that purpose, with one parameter assigned_accounts_ou_identifier.

  • Write code for Lambda function MoveAssignedAccount in /code/move_assigned_account_lambda.py. This function will receive an event PreparedAccount, extract the account id from the event itself, and move the account from the OU Assigned Accounts to the OU Released Accounts. Identifiers will be provided to the function as environment variables ASSIGNED_ACCOUNTS_OU_IDENTIFIER and RELEASED_ACCOUNTS_OU_IDENTIFIER.

  • Extend python script deploy_resources.py to automate the deployment of Lambda function MoveAssignedAccount. You should create a specific python construct for that purpose, with two parameters assigned_accounts_ou_identifier and released_accounts_ou_identifier.

  • Test the deployment by moving one AWS account to the OU Vanilla Accounts. Ensure that execution of MoveVanillaAccount, of SignalAssignedAccount, and of MoveAssignedAccount are reported in CloudWatch logs, and that the account is moved automatically to the OU Released Accounts.

  • Write code for Lambda function SignalExpiredAccount in /code/signal_expired_account_lambda.py. This function will receive an event generated by AWS Organizations, ensure that the OU is the one for Expired Accounts, and put an event PurgedAccount on the event bus. Identifier will be provided to the function as environment variable EXPIRED_ACCOUNTS_OU_IDENTIFIER. Note that code is not complete here, since we do not create nor start a CodeBuild project yet.

  • Extend python script deploy_resources.py to automate the deployment of Lambda function SignalExpiredAccount. You should create a specific python construct for that purpose, with one parameter expired_accounts_ou_identifier.

  • Write code for Lambda function MoveExpiredAccount in /code/move_expired_account_lambda.py. This function will receive an event PurgedAccount, extract the account id from the event itself, and move the account from the OU Expired Accounts to the OU Assigned Accounts. Identifiers will be provided to the function as environment variables EXPIRED_ACCOUNTS_OU_IDENTIFIER and ASSIGNED_ACCOUNTS_OU_IDENTIFIER.

  • Extend python script deploy_resources.py to automate the deployment of Lambda function MoveExpiredAccount. You should create a specific python construct for that purpose, with two parameters expired_accounts_ou_identifier and assigned_accounts_ou_identifier.

  • Test the deployment by moving one AWS account to the OU Expired Accounts. Ensure that execution of SignalExpiredAccount, of MoveExpiredAccount, of SignalAssignedAccount and of MoveAssignedAccount are reported in CloudWatch logs, and that the account is moved automatically to the OU Released Accounts.

select best platform for codebuild project (x86 versus ARM, size)

Currently CodeBuild projects are deployed on ARM small. However, the CodeBuild free tier provides 100 minutes every month on an x86 small environment. If this proves better value for money, code should be adjusted to better align with the CodeBuild free tier. Also, buildspec templates should be adjusted accordingly. For example, a different release of aws-nuke has to be considered.

Reference:

Screen rendering of make coverage fails on WSL

venv/bin/python -c "$BROWSER_PYSCRIPT" htmlcov/index.html
Start : Impossible d'exécuter cette commande en raison de l'erreur : Le fichier spécifié est introuvable.
Au caractère Ligne:1 : 1
+ Start "file:///home/bgauchon/Melexis-OracleEBS-POC/sustainable-person ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation : (:) [Start-Process], InvalidOperationException
    + FullyQualifiedErrorId : InvalidOperationException,Microsoft.PowerShell.Commands.StartProcessCommand

Use Microsoft Teams webhooks from native python

What is this?

The purpose of this enhancement is to remove the dependency on pymsteams.

Why is it important?

We try to keep SPA as slim as possible, and to use native web requests instead of loading external modules.

How can it be implemented?

[ ] write a function to post a notification (title and subject) to Microsoft Teams
[ ] change code to use this function instead of pymsteams
[ ] remove dependency in setup.py
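A minimal sketch of such a function with the standard library only, assuming the MessageCard payload format accepted by Teams incoming webhooks:

```python
import json
import urllib.request

def build_teams_payload(title, text):
    # Minimal MessageCard payload accepted by Teams incoming webhooks
    return {"@type": "MessageCard",
            "@context": "https://schema.org/extensions",
            "title": title,
            "text": text}

def post_to_teams(webhook_url, title, text):
    # Post the notification over HTTPS; the webhook URL comes from settings
    data = json.dumps(build_teams_payload(title, text)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return response.status
```

Splitting payload construction from transport keeps the payload unit-testable without a network call.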

Reference code exists to submit a notification to an incoming webhook of Teams:

automate the approval of terms of service

What is this?

When an account is assigned to a person, we want to share terms of service for the corporate resource that is provided.

Why is this important?

There is a need to align the management of personal AWS accounts with the management of other corporate resources such as computers or smart phones.

How to implement it?

  • add an architectural diagram in draw.io
  • add a feature file to describe the scenarios related to the approval of terms of service
  • add a fixture file to be used as default terms of service template
  • add a workbook to explain the experience of the system manager
  • add cdk/documents.py to handle templates with frontmatter
  • add tests/test_cdk_documents.py using fixtures/documents/terms-of-use.md
  • pass make all-tests
  • pass make lint
  • add configuration parameters to integrate external message files into fixtures/settings/settings-with-notifications.yaml
  • add loading settings test to tests/test_cdk_configuration.py
  • edit cdk/parameters_construct.py to load notifications in SSM parameter store
  • add integration test to tests/test_cdk_parameters_construct.py with mocked SSM
  • pass make all-tests
  • pass make lint
  • add cdk/on_notification_construct.py to build DynamoDB table and related Lambdas functions
  • add tests/test_cdk_on_notification_construct.py to ensure data encryption in DynamoDB
  • edit cdk/serverless_stack.py to integrate new construct
  • add notifications to tests/fixture_small_setup.py
  • add function get_notification_for_account to lambdas/settings.py
  • add a related integration test to `tests/test_lambda_settings.py`
  • pass make all-tests
  • pass make lint
  • introduce module lambdas/on_notification_handler.py to handle events related to notifications
  • add a Lambda that can be invoked over the web to confirm explicit approval by the end user
  • add tests/test_lambda_on_integration_handler.py for integration tests
  • pass make all-tests
  • pass make lint
  • deploy in production
  • validate notifications sent on next maintenance window
  • confirm notifications in production
  • inspect DynamoDB table on notifications
  • inspect DynamoDB table on activities
  • inspect monthly notifications report
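As a sketch of what cdk/documents.py could do, a Markdown template with frontmatter can be split with the standard library only; the exact behavior of the real module is an assumption:

```python
def split_frontmatter(document):
    # Split a Markdown template into its frontmatter block and its body;
    # returns ('', document) when no frontmatter is present
    if document.startswith("---\n"):
        frontmatter, _, body = document[4:].partition("\n---\n")
        return frontmatter, body.lstrip("\n")
    return "", document
```

The frontmatter string could then be handed to a YAML parser to extract template metadata such as the document title.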

automate the life cycle of accounts

What is this?

This is a workbook for enterprises that have thousands of AWS accounts to manage.

Learn more:

Why is it important?

SPA by itself does not create or terminate accounts. When a new consultant joins Reply, we use Control Tower to create an account from the Account Factory (implemented with Service Catalog). This works well at the current rate of staff arrivals, but of course it will not sustain high-rate on-boarding, and is prone to configuration errors.

Hence the need to provide, alongside SPA itself, workbooks and code that control the creation and termination of thousands of AWS accounts. Current thinking is to limit code debt by leveraging existing solutions.

How to implement it?

  • Create a new workbook in Markdown .md in the workbooks directory

rename 'master' branch to 'main' branch

While this is not an urgent topic, we should not use terms such as 'master' across Reply projects. For trunk-based workflows, we should prefer the name 'main' for the trunk itself.

prevent lambda loops in metrics

Last night AWS sent me the alert message below. A quick look at the billing console of my personal account confirms a cost of more than 1,000 USD since the beginning of the month... After analysis, it turns out that a Lambda deployed three days ago was invoked nearly 380 times every second. The problem comes from the way EventBridge handles event sources for the sending of CloudWatch metrics, which created an infinite invocation loop. The execution of the Lambda creates an event that triggers the Lambda, and so on. The entire stack was destroyed to stop the phenomenon and put an end to the useless spending.

A few lessons learned:

  • Consumption and billing alerts are really important and useful!

  • /!\ The namespace used for sending CloudWatch Metrics is interpreted as an event Source by EventBridge, which is not indicated in the documentation

  • Good practice: 15 minutes after deploying or updating a Lambda triggered on events, monitor the global number of invocations on the Lambda console to detect a possible loop; if it climbs to 115,000 invocations or more per 5 minutes, there is a problem...

  • Since I was on my personal account, nobody else was impacted: no service limit issue on the other accounts, nor in production; the isolation worked

  • Since we are organized in Control Tower, the costs of more than 1,000 USD were covered by the credits that AWS gives us; fortunately, there was no real expense

duplicated events on multiple deployments

Despite precautions taken, we observe extraneous 'AssignedAccount' and 'ReleasedAccount' events when the solution is deployed multiple times into the same Automation account.

PB regarding cdk version

The setup installs cdk v2.0.0, but we have 2.13 available and already set up for other projects; this results in downgrading the existing install and breaking other cdk stacks, as it installs in another path and takes precedence in the system PATH.

cdk version installed by this stack:

/usr/local/bin/cdk --version
2.0.0-rc.33 (build 336ff5e)

On other stacks, for example Melexis:

npm install -g aws-cdk@latest

installs cdk in /usr/bin:

 /usr/bin/cdk --version
2.13.0 (build b0b744d)

restructure code with simple naming conventions

What is this?

With this ticket we rename directories that contain CDK code and Lambda code. We also rename files in the test directory to reflect the location of target python module.

Why is this important?

The structure of the code base is not simple for newcomers to the project. Currently the directory resources contains code for CDK, and the directory code contains code for the Lambda functions. Names of test modules do not reflect whether tested modules are in resources or in code. We want to streamline the experience of software engineers coming to SPA so as to accelerate their on-boarding.

How to implement it?

  • rename the directory code to lambdas
  • rename the directory resources to cdk
  • fix all tests in the command make all-tests
  • review the README and other Markdown files to reflect the updated code structure
  • rename files in the directory test to reflect the origin of tested modules, e.g., test_lambda_events.py
  • add lambda function CheckHealth to validate the state of the run-time
  • fix run-time errors

make setup installs packages locally, in a path which is not in $PATH

... and so we encounter a command not found error.

make setup executes this command: npm install --save-dev @marp-team/marp-cli
This installs marp in ~/sustainable-personal-accounts/node_modules/.bin/marp, which of course is not part of $PATH.

In order to fix the problem we should either:

  • expand PATH and source it for the user
    or
  • install globally, which puts the binary in /usr/bin/: sudo npm install -g @marp-team/marp-cli

Introduce per-account settings

What is this?

This is an adaptation of SPA where the configuration file accepts settings for individual accounts.

Why is it important?

Today SPA settings are structured per organizational unit. This is great for collections of similar accounts. However, production accounts and other atypical accounts need individual settings, for example to adjust budget thresholds account by account.

How to implement it?

  • add a feature file features/account_management.feature to document settings file and configuration options
  • introduce new key accounts: in SPA settings file
  • save account settings under individual parameter name that reflects account identifier
  • add a python module code/settings.py to fetch settings for a given account, falling back to OU settings if needed
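The fallback logic described above can be sketched as a pure function. The store layout and parameter paths (`/spa/ou/...`, `/spa/accounts/...`) are assumptions for illustration, not the project's actual schema:

```python
# Sketch of per-account settings resolution with fallback to OU settings.
# Key names and paths are hypothetical, not the project's actual schema.

def get_account_settings(account_id, ou_id, store):
    """Return settings for an account, with OU settings as defaults."""
    ou_settings = store.get(f"/spa/ou/{ou_id}", {})
    account_settings = store.get(f"/spa/accounts/{account_id}", {})
    merged = dict(ou_settings)       # OU defaults first
    merged.update(account_settings)  # per-account values take precedence
    return merged

store = {
    "/spa/ou/ou-prod": {"budget": 500, "purge": True},
    "/spa/accounts/123456789012": {"budget": 2000},
}
print(get_account_settings("123456789012", "ou-prod", store))
# -> {'budget': 2000, 'purge': True}
```

Any key not set for the individual account is inherited from its organizational unit, so most accounts need no dedicated entry at all.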

segment parameters per organizational unit

The purpose is to spread managed AWS accounts across multiple OUs. Each OU will have its own set of parameters for the accounts that it contains, including:

  • budget limit
  • feature flags
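One way to segment parameters per OU is to derive parameter names from the OU identifier. The naming convention below is a hypothetical sketch, not the convention actually used by SPA:

```python
# Hypothetical per-OU parameter naming; the real prefix used by SPA may differ.

def parameters_for_ou(ou_id, budget_limit, feature_flags):
    """Build a flat map of parameter names to string values for one OU."""
    prefix = f"/spa/ou/{ou_id}"
    params = {f"{prefix}/budget-limit": str(budget_limit)}
    for flag, enabled in sorted(feature_flags.items()):
        params[f"{prefix}/feature/{flag}"] = "true" if enabled else "false"
    return params

print(parameters_for_ou("ou-abcd-12345678", 500, {"purge": True}))
# -> {'/spa/ou/ou-abcd-12345678/budget-limit': '500',
#     '/spa/ou/ou-abcd-12345678/feature/purge': 'true'}
```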

epic: provide one AWS account to each consultant

Epic steps:

Steps for the provisioning of an AWS account:

  • go to SSO > Users and ensure that target e-mail account exists and has at least one registered MFA device
  • collect input information: e-mail address, first name and last name
  • go to Service Catalog > Products > AWS Control Tower Account Factory
  • click on button Launch product
  • in field Provisioned product name, append automatic label with consultant id, e.g. 'AWS_Control_Tower_Account_Factory-11261013-s.loubeyres'
  • paste email address in fields: AccountEmail, AccountName, SSOUserEmail
  • select OU Released Accounts for field ManagedOrganizationalUnit
  • type first name in field SSOUserFirstName
  • type last name in field SSOUserLastName
  • go to Control Tower > Accounts
  • wait until account has been created and successfully enrolled
  • go to Organizations and navigate to the new account in its OU
  • add a tag with key 'account:holder' and e-mail address as value
  • go to AWS SSO > AWS Accounts
  • select new AWS account and click on Assign users ; on following page, select the user with same e-mail address ; on following page, select AWSAdministratorAccess and click on button Finish
  • go to AWS SSO > Users
  • click on user account for the target e-mail and review groups ; apply changes
  • update table on Teams with the list of AWS accounts used by team members
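The tagging step above can be automated with the Organizations API (`tag_resource`). This sketch defers the boto3 import so a fake client can be injected for testing:

```python
def tag_account_holder(account_id, email, client=None):
    """Tag an AWS account with its holder's e-mail, as in the manual step above."""
    if client is None:
        import boto3  # deferred so a fake client can be injected in tests
        client = boto3.client("organizations")
    tags = [{"Key": "account:holder", "Value": email}]
    client.tag_resource(ResourceId=account_id, Tags=tags)
    return tags
```

The caller needs Organizations permissions in the management account; the rest of the provisioning epic (Account Factory, SSO assignment) still requires the console or further automation.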

Add CodeBuild projects to prepare personal accounts

With this issue we introduce a CodeBuild project to prepare each assigned account.

Pre-conditions:

  • Issue #6 has been completed

Epic:

  • Add a file templates/prepare_account_buildspec.yaml that will use the AWS CLI to put an event PreparedAccount. This template should use the placeholder {account_identifier} that will be replaced in Python with the id of the actual account handled by a given CodeBuild project.

  • Modify the code of Lambda function SignalAssignedAccount in /code/signal_assigned_account_lambda.py. Remove the code that puts the event PreparedAccount. Instead, add boto3 code to create a CodeBuild project from templates/prepare_account_buildspec.yaml, transformed with the account id that is passed to the Lambda function. The code should also start the execution of the CodeBuild project.

  • Run deploy_resources.py again to update the Lambda function, then test the deployment by moving one AWS account to the OU Assigned Accounts. Ensure that the executions of Lambda function SignalAssignedAccount, of CodeBuild project PrepareAccount and of Lambda function MoveAssignedAccount are reported in CloudWatch logs, and that the account is moved automatically to the OU Released Accounts. Then delete the CodeBuild project that was created by the boto3 code.

  • Modify again the code of Lambda function SignalAssignedAccount in /code/signal_assigned_account_lambda.py. This time, add code that 1) assumes the role ServiceRoleForAutomation in the Management account, and 2) creates a CodeBuild project in the target account from templates/prepare_account_buildspec.yaml, transformed with the account id that is passed to the Lambda function. The code should also start the execution of the CodeBuild project.

  • Run deploy_resources.py again to update the Lambda function, then test the deployment by moving one AWS account to the OU Assigned Accounts. Ensure that the executions of Lambda function SignalAssignedAccount, of CodeBuild project PrepareAccount and of Lambda function MoveAssignedAccount are reported in CloudWatch logs, and that the account is moved automatically to the OU Released Accounts.
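The placeholder substitution in the buildspec template can be done with a plain string replace, which avoids Python format() tripping over the YAML braces. The buildspec content below is illustrative; the real file lives in templates/prepare_account_buildspec.yaml:

```python
# Illustrative buildspec with the {account_identifier} placeholder; the actual
# template content in templates/prepare_account_buildspec.yaml may differ.
BUILDSPEC_TEMPLATE = """version: 0.2
phases:
  build:
    commands:
      - echo "preparing account {account_identifier}"
      - aws events put-events --entries file://prepared_account_event.json
"""

def render_buildspec(template, account_id):
    # str.replace keeps the template free of Python formatting pitfalls
    return template.replace("{account_identifier}", account_id)
```

The rendered buildspec is then passed to `create_project` and `start_build` on a CodeBuild client, assumed either in the automation account or after assuming ServiceRoleForAutomation in the target account.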

add a new Lambda function to reset all accounts

What is this?

This Lambda function will scan organizational units and managed accounts, and tag accounts with state 'vanilla'.

Why is it important?

The state machine is based on a tag structure. When code or settings break this tag structure, the state machines stop working. Since the state of an account is held in a tag attached to it, in such situations you can use AWS Organizations to change the state of any account: visit the page of each account and change the tag account-state to the value vanilla. This operation is feasible for a few dozen accounts, but becomes tedious for large numbers of accounts. Hence the need to reset accounts to the state vanilla with a single invocation of a Lambda function devoted to this purpose.

How to implement it?

  • document recovery scenarios in a feature file
  • create a new Lambda function ResetAccounts
  • add a workbook to help system administrators
  • test in production environment
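The core of such a Lambda function can be sketched with the Organizations APIs `list_accounts_for_parent` and `tag_resource`; the client is injectable so the pagination and tagging logic can be tested without AWS:

```python
def reset_accounts(parent_ou, client=None):
    """Tag every account under an organizational unit with account-state=vanilla."""
    if client is None:
        import boto3  # deferred so a fake client can be injected in tests
        client = boto3.client("organizations")
    reset = []
    token = None
    while True:
        kwargs = {"ParentId": parent_ou}
        if token:
            kwargs["NextToken"] = token
        chunk = client.list_accounts_for_parent(**kwargs)
        for account in chunk["Accounts"]:
            client.tag_resource(
                ResourceId=account["Id"],
                Tags=[{"Key": "account-state", "Value": "vanilla"}])
            reset.append(account["Id"])
        token = chunk.get("NextToken")
        if not token:
            return reset
```

A production version would iterate over every managed OU rather than a single one, and log each reset for the workbook mentioned above.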

add a Makefile command to estimate infrastructure costs

What is this?

This is a new CLI command added to the SPA project to estimate the cost of a SPA deployment. When you type the command make cost-estimation, it generates a CloudFormation template and passes it to the AWS CLI command estimate-template-cost. The output is a link to the AWS cost calculator with a cost indication for the resources in the template.

Why is this important?

While SPA is based on serverless AWS products and services, and is very cost-effective, we still need to estimate factually the costs that it can incur.

How to implement it?

The AWS CLI has a command, estimate-template-cost, that bridges to the AWS cost calculator.

Definition of Done

  • Add to Makefile a new target cost-estimation with appropriate shell commands
  • send the temporary template to an S3 bucket
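The same call is available from boto3 as `estimate_template_cost` on the CloudFormation client, which returns a calculator URL. Passing a template by S3 URL is what motivates the upload step, since inline template bodies are size-limited; the client is injectable here for testing:

```python
def estimate_costs(template_url, client=None):
    """Return a cost calculator link for a template stored in S3.

    Mirrors: aws cloudformation estimate-template-cost --template-url ...
    """
    if client is None:
        import boto3  # deferred so a fake client can be injected in tests
        client = boto3.client("cloudformation")
    response = client.estimate_template_cost(TemplateURL=template_url)
    return response["Url"]
```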

add an event store to persist events

What is this?

This is a new persistence layer for events related to SPA. We would like to persist all events captured from the bus into a store, and to stream these changes to publishers. In addition, we will also correlate events that participate in the same transaction. Initially we have two kinds of transactions:

  • the release of a new account: vanilla --> preparation --> release
  • the maintenance of an account: expiration --> purge --> preparation --> release
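The two transaction kinds above can be recognized by matching a per-account event sequence against the expected cycles. A minimal sketch, with labels taken from the sequences above:

```python
# Sketch of correlating per-account events into maintenance transactions.
# Labels follow the two sequences listed above.
RELEASE_CYCLE = ["vanilla", "preparation", "release"]
MAINTENANCE_CYCLE = ["expiration", "purge", "preparation", "release"]

def classify_transaction(events):
    """Return the kind of completed transaction, or None if incomplete."""
    if events == RELEASE_CYCLE:
        return "release-of-new-account"
    if events == MAINTENANCE_CYCLE:
        return "maintenance-of-account"
    return None
```

A real implementation would also flag partial cycles, which is exactly the "complete and partial maintenance cycles" aggregation mentioned below.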

Learn more:

Why is it important?

The introduction of an event store into SPA serves multiple goals:

  • Aggregate events per account to identify complete and partial maintenance cycles -- from technical view to business view
  • Provide a clear representation of each account state and of the history of the most recent maintenance window -- functional observability
  • Generate new events on completed maintenance cycles -- to meter SPA as a service over the long run

How to implement it?

The following is the recommended list of activities as per Behaviour-Driven Development (BDD):

  • Add a new features/metering.feature file to describe expected behaviour of the event store
  • Add to fixtures/settings/setting.yaml a parameter related to events TTL in days
  • Ensure that make all-tests fails
  • Change resources/configuration.py to accept configurable TTL for events in days
  • Add unit tests in tests/test_configuration.py to validate TTL configuration input
  • Ensure that make all-tests passes
  • Create a new resources/metering_construct.py with a construct Metering and an empty __init__() function
  • Include this new construct into resources/serverless_stack.py and ensure that make all-tests passes correctly
  • Add code to resources/metering_construct.py that creates a DynamoDB table to persist events for configurable TTL
  • Copy code/on_events_handler.py to code/on_events_then_store_handler.py
  • Copy tests/test_on_events_handler.py to tests/test_on_events_then_store_handler.py
  • Ensure that make all-tests passes
  • Edit tests/test_on_events_then_store_handler.py to create a list of events for a given account and to pass it to the handler
  • Edit Lambda code in code/on_events_then_store_handler.py to save events to the DynamoDB table
  • Ensure that make all-tests passes
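Persisting one event with a configurable TTL boils down to writing an epoch-seconds expiry attribute that DynamoDB's TTL feature can consume. The attribute and key names below are assumptions, and the table is injectable for testing:

```python
import time

def put_event(table, account_id, label, ttl_days, now=None):
    """Persist one event with a DynamoDB TTL attribute in epoch seconds."""
    now = int(now if now is not None else time.time())
    item = {
        "account": account_id,                    # partition key (assumed schema)
        "stamp": now,                             # sort key (assumed schema)
        "label": label,
        "expire-at": now + ttl_days * 24 * 3600,  # consumed by DynamoDB TTL
    }
    table.put_item(Item=item)
    return item
```

The table's TTL configuration must point at the same attribute name (`expire-at` here) for expired items to be deleted automatically.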
