Comments (5)
After some research, I'm going to give Pulumi a try as our IaC tooling.
Terraform, the industry-standard IaC product, is a great tool but has a steep learning curve and uses a domain-specific configuration language (HCL). Given the team's lack of full-time DevOps staff and our fairly simple AWS usage, Terraform seems like overkill.
I haven't used Pulumi before, but it has reasonable adoption and supports Python as our IaC language.
Detailed notes on the research (links to internal Reich Lab Confluence...if anyone outside the lab is interested, please let me know): https://reichlab.atlassian.net/wiki/spaces/RLD/pages/4325787/Infrastructure+as+Code+IaC
Will go through the Pulumi tutorial and report back.
from hubverse-cloud.
Would Ansible be a potential candidate for our purposes?
I have worked with it before and found it reasonably easy to pick up.
The fact that it is effectively coded in YAML makes it more language-agnostic, which I find quite appealing. Of course, that's because I'm not a Python person. 😜
Heya @annakrystalli sorry for not seeing your note sooner...I don't have github notifications going to my main inbox b/c they're so noisy, will try to fix it up so I see the important pings!
I didn't have Ansible on my "things to look at list" because the practice of "provisioning infrastructure with Terraform and doing config management/deployments/orchestration with Ansible" is so common that I hadn't considered Ansible for the former.
Assuming that Ansible can configure every type of AWS resource we'll need (it probably can, our needs are simple: S3 buckets, IAM roles, IAM policies, OIDC identity provider), the main difference between it and a tool like Terraform or Pulumi is state management.
IaC tools that track state (usually by storing it in the vendor's cloud or by self-hosting state in your own cloud) make it much easier to 1) audit changes to infrastructure and 2) manage "drift", which happens when someone modifies a managed resource outside of the tool (e.g., using the AWS console).
For example, tools that manage state are able to provide a "diff" before you actually apply infrastructure changes (this is from pulumi, but Terraform works the same way):
pulumi up
Previewing update (hubverse)
View in Browser (Ctrl+O): https://app.pulumi.com/bsweger/hubverse-aws/hubverse/previews/283337c1-0892-431f-8618-00938007ca32
     Type                              Name                   Plan
 +   pulumi:pulumi:Stack               hubverse-aws-hubverse  create
 +   └─ aws:iam:OpenIdConnectProvider  github-actions         create
Resources:
+ 2 to create
Do you want to perform this update?
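Conceptually, a preview like this comes from comparing the resources declared in code with the resources recorded in the tool's state. Here's a minimal, tool-agnostic sketch of that comparison in plain Python. It is not how Pulumi or Terraform are actually implemented, and the `plan` function and the resource dictionaries are hypothetical names made up for illustration.

```python
from __future__ import annotations


def plan(desired: dict[str, dict], recorded: dict[str, dict]) -> dict[str, list[str]]:
    """Compare resources declared in code with resources recorded in state."""
    return {
        # Declared in code but unknown to the state: needs to be created.
        "create": sorted(name for name in desired if name not in recorded),
        # Known to both, but the declared properties differ: needs an update.
        "update": sorted(
            name
            for name in desired
            if name in recorded and desired[name] != recorded[name]
        ),
        # Recorded in state but no longer declared: needs to be deleted.
        "delete": sorted(name for name in recorded if name not in desired),
    }


# Hypothetical resources, loosely modeled on the preview output.
desired = {
    "hubverse-aws-hubverse": {"type": "pulumi:pulumi:Stack"},
    "github-actions": {"type": "aws:iam:OpenIdConnectProvider"},
}
recorded: dict[str, dict] = {}  # nothing provisioned yet, so the state is empty

changes = plan(desired, recorded)
for action, names in changes.items():
    for name in names:
        print(f"{action}: {name}")
```

With an empty state, both resources land in the "create" list, which lines up with the "2 to create" summary in the preview.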
Ansible doesn't track state; it just creates or changes infrastructure as defined in the playbook. That's easier in some ways, because there's no state to manage, but you lose the advantages described above.
All things considered, I'd vote for something with state management. Despite the extra moving part, having a mechanism to know when the state of our infrastructure as defined via code differs from the state of our actual infrastructure is worthwhile for a smaller team.
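To make the drift idea concrete, here's a small sketch in plain Python. It assumes the tool keeps a recorded copy of each resource's properties and can read the live properties back from the cloud provider. The `detect_drift` function, the `hub-bucket` name, and the bucket properties are all hypothetical and only illustrate the comparison.

```python
from __future__ import annotations


def detect_drift(recorded: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Return, per resource, the properties whose live value differs from state.

    Each differing property maps to a (recorded_value, live_value) tuple.
    """
    drift: dict[str, dict] = {}
    for name, recorded_props in recorded.items():
        live_props = actual.get(name)
        if live_props is None:
            # The resource was deleted outside of the tool.
            drift[name] = {"<resource>": ("present", "missing")}
            continue
        changed = {
            key: (recorded_props.get(key), live_props.get(key))
            for key in sorted(recorded_props.keys() | live_props.keys())
            if recorded_props.get(key) != live_props.get(key)
        }
        if changed:
            drift[name] = changed
    return drift


# Hypothetical example: someone changed a bucket setting in the AWS console.
recorded = {"hub-bucket": {"versioning": True, "public_access_blocked": True}}
actual = {"hub-bucket": {"versioning": True, "public_access_blocked": False}}

drift = detect_drift(recorded, actual)
print(drift)
```

A tool without state has nothing to put on the `recorded` side of this comparison, which is the trade-off being weighed here.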
@annakrystalli given that the lab as a whole is trying to level-up on Python, my original thought was that using Python here might be more accessible than Terraform's specific language. But maybe I'm making unwarranted assumptions? FWIW, my YAML experience in this space is that it's hard to test, maintain, and debug once you reach a certain level of complexity.
What do you think about something like this: https://github.com/Infectious-Disease-Modeling-Hubs/hubverse-infrastructure/blob/main/__main__.py
This is an experimental repo to get a feel for Pulumi, which I've never used before.
Will be adding a README soon, but the upshot is that you define your resources in the Python app and then apply them either:
- via a GitHub action which adds the diff (see above comment) to PRs for review
- via command line, which is what I'm doing to experiment
I changed the title of this issue after realizing that "deciding" on a tool isn't something we can do via a discussion here.
I did a test drive of Pulumi and got a process working that will provision the AWS infrastructure required to mirror hub data to S3. The repo is here: https://github.com/Infectious-Disease-Modeling-Hubs/hubverse-infrastructure
It looks somewhat intimidating (every tool will look intimidating in its own way as we learn about it). It's important to note, however, that the learning curve for IaC tools is separate from the learning curve for understanding the AWS resources themselves. The former we can control to some extent; the latter we can't...it's part of the cost of being on the cloud.
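As one example of the AWS-side concepts we'd need to learn regardless of tool: letting a GitHub Actions workflow assume an IAM role through an OIDC identity provider generally requires a trust policy on that role. The sketch below builds such a policy document in plain Python. This illustrates the general pattern and is an assumption on my part, not a description of what the linked repo actually does. The account ID is the placeholder used in AWS docs, and the repo name and `github_oidc_trust_policy` helper are hypothetical.

```python
import json

# GitHub's OIDC issuer host, as used in the IAM provider ARN and condition keys.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str) -> dict:
    """Build an IAM trust policy letting workflows in `repo` assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must be intended for AWS STS...
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # ...and must come from the named repository.
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


# Placeholder account ID and a hypothetical repository name.
policy = github_oidc_trust_policy("123456789012", "example-org/example-hub")
print(json.dumps(policy, indent=2))
```

None of this is specific to Pulumi; whichever IaC tool we pick would need the same policy document expressed in its own syntax.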
I propose that the next step be a Pulumi demo, so we can get feedback and determine whether we'd like to explore an alternative.