getwilds / sixtyfour
Home Page: https://getwilds.org/sixtyfour/
License: Other
... and we should get really opinionated about what the columns of those data frames should be.
Originally via @seankross in #3 (review)
On branch s3-iam
aws_secrets_all()
Error in `relocate()`:
! Can't select columns that don't exist.
✖ Column `arn` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
Probably the ideal is it returns an empty tibble (like when there are no buckets)
I'm thinking we should have a set of docs like these as a baseline, and then more of a "cookbook" set of docs based on the interactions that we find people using the most, but that's down the line.
Originally posted by @seankross in #25 (review)
The list (add new ones as needed) and how to help users avoid them:
Some error messages that I've seen thus far in paws are not going to be useful to the typical sixtyfour user. e.g., below: users probably won't be familiar with HTTP status codes. I know that a 404 means not found, so I can intuit that something wasn't found - but was it the bucket or the key? Beyond the status code, the error message is not useful to the user at all.
aws_file_attr(bucket = "s64-test-2", key = "doesntexist")
#> Error: SerializationError (HTTP 404). failed to read from query HTTP response body
Though sometimes the error messages are good:
aws_bucket_list_objects(bucket="s64-test-211")
#> Error: NoSuchBucket (HTTP 404). The specified bucket does not exist
and another good error message
desc_file <- file.path(system.file(), "DESCRIPTION")
aws_file_upload(bucket = "not-a-bucket", path = desc_file)
#> Error: BucketAlreadyExists (HTTP 409). The requested bucket name is not available.
#> The bucket namespace is shared by all users of the system. Please select a different name and try again.
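One pattern that could help (a sketch only - `with_friendly_errors()` and the message lookup table are hypothetical names, not part of the package): catch the paws error, match on the error code at the front of the message, and rethrow a plain-language explanation, chaining the original error via `parent` so nothing is lost.

```r
# Hypothetical sketch: translate opaque paws error codes into friendlier
# messages; all names here are made up, not sixtyfour API.
friendly_messages <- c(
  SerializationError = "Not found: check that both the bucket and the key exist",
  NoSuchBucket = "That bucket doesn't exist"
)

with_friendly_errors <- function(expr) {
  tryCatch(expr, error = function(e) {
    # paws messages lead with the error code, e.g. "NoSuchBucket (HTTP 404). ..."
    code <- sub(" .*", "", conditionMessage(e))
    msg <- friendly_messages[code]
    if (!is.na(msg)) rlang::abort(unname(msg), parent = e)
    stop(e)
  })
}

# e.g.:
# with_friendly_errors(aws_file_attr(bucket = "s64-test-2", key = "doesntexist"))
```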
@seankross just a placeholder to maybe deal with this, any thoughts welcome
It's too big - factor out some stuff, simplify, etc.
Right now on the dbs branch we allow only one client instance for each of Redshift and RDS.
One needs a new client for either of those services only if there are different credentials; if credentials don't change then the same client can be reused.
How to do this?
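One way to sketch it (all names made up; `rlang::hash()` used as the cache key): cache one client per service + credentials combination, so a fresh client is only constructed when the credentials actually differ.

```r
# Sketch: reuse a paws client unless the credentials change.
# `.client_cache` and `cached_client()` are hypothetical names.
.client_cache <- new.env(parent = emptyenv())

cached_client <- function(service, service_fun, credentials = list()) {
  # key on service name + creds, so same creds -> same client
  key <- rlang::hash(list(service, credentials))
  if (is.null(.client_cache[[key]])) {
    .client_cache[[key]] <- service_fun(credentials = credentials)
  }
  .client_cache[[key]]
}

# e.g. both calls below would return the same client object:
# cached_client("rds", paws::rds, creds)
# cached_client("rds", paws::rds, creds)
```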
I was trying to get aws_user_add_to_rds working to make the RDS IAM flow workable, but it's still not working. As far as I can remember, I still wasn't getting the code for granting a user permissions in the database via IAM (i.e., the AWSAuthenticationPlugin bit) working correctly in the function.
work is on branch rds-iam-flow
paws is using httr under the hood https://cran.r-project.org/web/packages/paws.common/index.html - should be able to use vcr to cache requests, speed up tests, hide secrets, etc.
That is, examples for any function shouldn't depend on anything existing already
see https://github.com/sckott/s3fs/commit/bb34aadd0e5801d5ea2742c1aa437a12e251fbed
relying on a fork for now
e.g., was doing this on the manage-secrets branch but it's not specific to that branch ...
invisible(vcr::vcr_configure(
dir = vcr::vcr_test_path("fixtures"),
filter_sensitive_data = list(
"<<aws_region>>" = Sys.getenv("AWS_REGION"),
"ClientRequestToken" = "something"
),
filter_request_headers = list(
Authorization = "redacted",
"X-Amz-Content-Sha256" = "redacted",
"X-Amz-Target" = "redacted",
"User-Agent" = "redacted"
),
filter_response_headers = list(
"x-amz-id-2" = "redacted",
"x-amz-request-id" = "redacted",
"x-amzn-requestid" = "redacted"
)
))
might be harder to automate locally for everyone
E.g., use @family buckets for all bucket fxns
either in markdown in pkg or in notion?
I was looking at addressing this comment about checking an env var, and then realized that paws has a number of different ways to find user auth details. So we can't just look for env vars.
However, we still need to have access to some credentials for our own use within this package, e.g., the link above where we want to get the AWS region the user has set in their creds.
We can hack getting the access key and secret key by calling this anonymous function in an environment, but that's hacky for sure - and it doesn't give the AWS region either:
s3 <- paws::s3()
s3$.internal$config$credentials$provider[[2]]()
Perhaps there's a way in paws to fetch user creds somehow, and I just haven't found it yet
user story: people want to be able to interact with their data via dplyr, etc.
costexplorer
It'd be perfect if there was a way to spin up a temporary top level S3 account to run a test suite, then clean it up afterwards INSTEAD OF messing with our real account.
Though unit tests I think will be using cached fixtures anyway, so on CI won't be hitting our real buckets.
It could delete a bucket even if it has files in it. We should keep the Are you sure? interactive feature and force = FALSE as an arg.
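A sketch of what that guard could look like (the function body and names below are illustrative, not the current implementation):

```r
# Sketch: refuse to delete a non-empty bucket unless force = TRUE,
# and keep the interactive "Are you sure?" prompt.
six_bucket_delete_sketch <- function(bucket, force = FALSE) {
  objects <- aws_bucket_list_objects(bucket)  # assumed to return a tibble
  if (NROW(objects) > 0 && !force) {
    stop("bucket not empty; set force = TRUE to delete it and its files",
      call. = FALSE
    )
  }
  if (interactive() &&
    !isTRUE(utils::askYesNo(sprintf("Delete bucket %s?", bucket)))) {
    return(invisible(FALSE))
  }
  # ... delete files (when force = TRUE), then the bucket, via paws ...
  invisible(TRUE)
}
```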
paws and s3fs docs via #17 (comment) and #17 (comment)
Let's make an interface to AWS Secrets Manager: a prompt interface with the cli/symbols pkgs to select credentials, or a DBI interface, etc.
rds <- paws::rds()
token <- rds$build_auth_token(endpoint, region, user)
# then token passed to DBI::dbConnect()
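Filling in that last step as a sketch - the driver choice and argument names are assumptions (shown with RMariaDB; the endpoint/region/user values are placeholders, not real resources):

```r
# Sketch: use the short-lived IAM auth token in place of a password.
# All values below are placeholders.
rds <- paws::rds()
token <- rds$build_auth_token(
  endpoint = "mydb.abc123.us-west-2.rds.amazonaws.com:3306",
  region = "us-west-2",
  user = "jane"
)
con <- DBI::dbConnect(
  RMariaDB::MariaDB(),
  host = "mydb.abc123.us-west-2.rds.amazonaws.com",
  port = 3306,
  username = "jane",
  password = token  # the IAM token stands in for a password
)
```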
Everything else, but we SHOULD document in vignettes how to do certain things:
Uploads a file to a bucket by specifying the bucket name, not a remote path. In fact maybe there should just be a six_bucket_upload that can handle files or folders as things that are upload-able, and bucket names or remote paths to specify where to put the files.
when an admin adds a user
as an admin
Prefix for magic fxns: six_
Analogy we talked about: magic fxns (six_) are like Rails, and non-magic fxns (aws_) are not Rails (maybe ~ Sinatra).
Idea is to maintain aws_ as the non-magic fxns: they don't do any magic, just interface with the AWS REST API, while magic fxns do more and have side effects that are harder to reason about.
The code below was originally part of the #55 pull request, but was removed as it wasn't completely working and deemed not important enough to spend more time on now to make it work.
check_simulated_user was called as the last step before finishing in six_admin_setup
#' Check if a user has access to an AWS service
#' @export
#' @param fun (function) a function. required
#' @param ... additional named args passed to `fun`
#' @return single boolean. checks [rlang::is_null()] against `$error` result of
#' call to [purrr::safely()]
#' @details really just a generic check that any function can run with
#' its inputs; not specific to AWS or any particular function
has_access <- function(fun, ...) {
rlang::is_null(purrr::safely(fun, FALSE)(...)$error)
}
#' @importFrom dplyr any_of
check_simulated_user <- function(group) {
rlang::check_installed("callr")
cli_info("Checking that a simulated user can access {.strong {group}} group")
randuser <- random_user()
creds <- suppm(six_user_create(randuser))
aws_user_add_to_group(randuser, group)
creds_mapper <- c(
"AWS_ACCESS_KEY_ID" = "AccessKeyId",
"AWS_SECRET_ACCESS_KEY" = "SecretAccessKey",
"AWS_REGION" = "AwsRegion"
)
creds_lst <- as_tibble(creds) %>%
rename(any_of(creds_mapper)) %>%
select(starts_with("AWS")) %>%
as.list()
all_checks <- callr::r(function(creds) {
withr::with_envvar(
creds,
{
check_iam <- sixtyfour::has_access(sixtyfour::aws_user)
check_rds <- sixtyfour::has_access(sixtyfour::aws_db_instance_details)
check_rs <- sixtyfour::has_access(sixtyfour::aws_db_cluster_details)
check_s3 <- sixtyfour::has_access(sixtyfour::aws_buckets)
check_bil <- sixtyfour::has_access(sixtyfour::aws_billing_raw,
date_start = Sys.Date() - 1,
metrics = "BlendedCost"
)
list(
IAM = check_iam,
RDS = check_rds,
Redshift = check_rs,
S3 = check_s3,
Billing = check_bil
)
}
)
}, args = list(creds_lst))
if (all(unlist(all_checks))) {
cli_success(" All checks passed!")
} else {
cli_warning(c(
" At least one check didn't pass ",
"({names(keep(all_checks, isFALSE))}) ",
"try again or open an issue"
))
}
cli_info(" Cleaning up simulated user")
aws_user_remove_from_group(randuser, group)
suppm(six_user_delete(randuser))
cli_alert_info("") # nolint
}
Notes
aws_db_instance_details is the same as instance_details in the current version of the pkg; aws_db_cluster_details is the same as cluster_details in the current version of the pkg.
The parts that were not working:
callr::r WAS working interactively after loading all the code in the package, but WAS NOT working if I load sixtyfour then call the six_admin_setup function - I'm not sure exactly why, but I think it has to do with the complex-ish nature of how paws loads credentials. I think I needed to make sure the R session that callr::r was running was not loading any of the credentials I have saved, only the creds passed into the function, but that was not happening successfully (I kept getting a 403 error), like:
six_admin_setup("uzers", "zadmin")
#> ℹ whoami: scott (account: 744061095407)
#> ℹ
#> ! uzers group NOT created - a uzers group already exists in your account
#> ℹ Not adding policies to the uzers group
#> ℹ
#> ! zadmin group NOT created - an zadmin group already exists in your account
#> ℹ Not adding policies to the zadmin group
#> ℹ
#> ℹ Checking that a simulated user can access uzers group
#> Error:
#> ! in callr subprocess.
#> Caused by error:
#> ! InvalidClientTokenId (HTTP 403). The security token included in the request is invalid.
#> Type .Last.error to see the more details.
#>
#> :p .Last.error
#> <callr_error/rlib_error_3_0/rlib_error/error>
#> Error:
#> ! in callr subprocess.
#> Caused by error:
#> ! InvalidClientTokenId (HTTP 403). The security token included in the request is invalid.
#> ---
#> Backtrace:
#> 1. sixtyfour::six_admin_setup("uzers", "zadmin")
#> 2. sixtyfour:::check_simulated_user(users_group) at admin.R:113:3
#> 3. callr::r(function(creds) { … at admin.R:155:3
#> 4. callr:::get_result(output = out, options)
#> 5. callr:::throw(callr_remote_error(remerr, output), parent = fix_msg(remerr[[3]]))
#> ---
#> Subprocess backtrace:
#> 1. sixtyfour::aws_user()
#> 2. env64$iam$get_user(username)$User %>% list(.) %>% user_list_tidy()
#> 3. sixtyfour:::user_list_tidy(.)
#> 4. rlang::is_empty(x)
#> 5. env64$iam$get_user(username)
#> 6. paws.common::send_request(request)
#> 7. paws.common:::retry(request)
#> 8. paws.common:::run(request, retry)
#> 9. handler$fn(request)
#> 10. base::stop(error)
#> 11. global (function (e) …
@seankross we chatted briefly about this. some notes
It might be nice for an admin of an AWS account to see what the other folks on their account see - just to check that permissions are set correctly i imagine
Was thinking this
users <- list(
list(
user = "sally",
AWS_ACCESS_KEY_ID = "ASPDF80ASDFDF",
AWS_SECRET_ACCESS_KEY = "ADFPA8FAADF",
AWS_REGION = "us-west-2"
),
list(
user = "malorie",
AWS_ACCESS_KEY_ID = "ASDF08AFAD80ADSF",
AWS_SECRET_ACCESS_KEY = "ADFPAADF80A999",
AWS_REGION = "us-west-2"
)
)
fake_aws_user <- function() {
Filter(
function(z) z$AWS_ACCESS_KEY_ID == Sys.getenv("AWS_ACCESS_KEY_ID"),
users
)
}
withr::with_envvar(
c(
"AWS_ACCESS_KEY_ID" = "ASDF08AFAD80ADSF",
"AWS_SECRET_ACCESS_KEY" = "ADFPA8FAADF",
"AWS_REGION" = "us-west-2"
),
fake_aws_user()
)
aws_user_impersonate <- function(username, code) {
  withr::with_envvar(
    # get user creds somehow?
    new = c(),
    force(code)
  )
}
# hmm, this wouldn't work - as an admin i'd want to put in a username, but you wouldn't have those creds
# unless you saved them all somewhere, which seems unlikely
aws_user_impersonate("sally")
But then thought this probably doesn't make sense b/c the admin wouldn't probably have tokens for each user saved - and you can't look them up after the fact unless you create a new set.
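One possible workaround (a sketch; it assumes the admin has iam:CreateAccessKey / iam:DeleteAccessKey on the user, and note AWS limits users to two active access keys): mint a throwaway key for the user on the fly, run the code under it, then delete it.

```r
# Sketch: impersonate by creating a temporary access key for the user,
# running `code` under those creds, then deleting the key on exit.
aws_user_impersonate <- function(username, code) {
  iam <- paws::iam()
  key <- iam$create_access_key(UserName = username)$AccessKey
  on.exit(
    iam$delete_access_key(UserName = username, AccessKeyId = key$AccessKeyId),
    add = TRUE
  )
  withr::with_envvar(
    c(
      AWS_ACCESS_KEY_ID = key$AccessKeyId,
      AWS_SECRET_ACCESS_KEY = key$SecretAccessKey
    ),
    force(code)
  )
}
```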
What would have to be done so that this doesn't happen?
six_user_create("amy")
ℹ Added policy UserInfo to amy
✔ Key pair created for amy
ℹ UserName: amy
...
aws_bucket_add_user("dasl-project1", "amy", permissions = "read")
✔ amy now has read access to bucket dasl-project1
aws_bucket_permissions("dasl-project1")
# A tibble: 3 × 4
user permissions policy_read policy_admin
<chr> <chr> <chr> <chr>
1 amy read S3ReadOnlyAccessDaslProject1 NA
2 scott admin NA NA
3 sean admin NA NA
> Sys.setenv(
+ AWS_ACCESS_KEY_ID = "AmysKey",
+ AWS_SECRET_ACCESS_KEY = "AmysSecret",
+ AWS_REGION = "us-west-2"
+ )
aws_user_current()
[1] "amy"
aws_buckets()
# A tibble: 2 × 8
bucket_name key uri size type owner etag last_modified
<chr> <chr> <chr> <fs::bytes> <chr> <chr> <chr> <dttm>
1 dasl-project1 "" s3://dasl-project1 0 bucket "" "" NA
2 dasl-project2 "" s3://dasl-project2 0 bucket "" "" NA
https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/
@seankross any sense for whether FH folks are generally running newer versions of R or not so much?
If folks generally use 4.1 or greater, then we could use |>, but if not, then we'd probably want to stick with %>%. Thoughts?
Can sixtyfour::aws_users() include a list column called Tags that contains data frames with columns Key and Value? Same with aws_user()?
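Seems doable - a sketch of the tidying step, assuming each paws user record carries Tags as a list of Key/Value pairs (the helper name is made up):

```r
library(purrr)
library(tibble)

# Sketch: turn paws-style Tags (a list of Key/Value pairs) into a tibble,
# suitable for storing in a list column on the users tibble.
tags_to_tbl <- function(tags) {
  if (length(tags) == 0) {
    return(tibble(Key = character(), Value = character()))
  }
  tibble(
    Key = map_chr(tags, "Key"),
    Value = map_chr(tags, "Value")
  )
}

# e.g. with two users, one tagged and one not:
users <- tibble(
  UserName = c("jane", "sally"),
  Tags = list(list(list(Key = "team", Value = "data")), list())
)
users$Tags <- map(users$Tags, tags_to_tbl)
```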
probably could be cleaner
I feel like I left things up in the air with respect to RDS features, and if you don't think it's wise for it to be in scope for v0.1 that's okay with me. For now I'm getting these errors on branch s3-iam #43:
> aws_db_rds_list()
Error in `mutate()`:
ℹ In argument: `AccountId = split_grep(DBInstanceArn, ":", "^[0-9]+$")`.
Caused by error:
! object 'DBInstanceArn' not found
Run `rlang::last_trace()` to see where the error occurred.
> aws_db_rds_create(
+ id = "testing135",
+ engine = "mariadb",
+ class = "db.t3.micro",
+ BackupRetentionPeriod = 0
+ )
ℹ `user` is NULL; created user: ListeningDiffere
ℹ `pwd` is NULL; created password: *******
Error in `dplyr::filter()`:
ℹ In argument: `map_lgl(IpPermissions, ~.$ToPort == port)`.
Caused by error in `map_lgl()`:
ℹ In index: 5.
Caused by error:
! Result must be length 1, not 0.
Run `rlang::last_trace()` to see where the error occurred.
> aws_file_delete("s3://dasl-project2/account_id.Rd")
$DeleteMarker
logical(0)
$VersionId
character(0)
$RequestCharged
character(0)
I'm okay with this return value but maybe we should return this value invisibly.
The results of paws calls are generally named lists, sometimes nested. What should we do to these data before giving them back to users:
paws - a named list, in most cases
There are other cases where what we return is more clear cut. E.g., in a fxn that checks if a bucket exists, we give back a boolean.
@seankross thoughts?
possible fxn to add in the future
Thinking about this from the perspective of this image from the youtube video Sean shared.
Here's what I'm thinking:
aws_user* / aws_users*
aws_group* / aws_groups*
aws_role* / aws_roles*
aws_policy* / aws_policies* - some of these fxns used for attaching policies to users, groups, roles
So in the end we could have a workflow like:
# in each case below aws_policy_attach determines from input whether
# its a group, role, or user. And prefixes policy with `arn:aws:iam::aws:policy`
aws_group_create("testers") %>% aws_policy_attach("ReadOnlyAccess")
aws_role_create("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")
aws_user_create("jane") %>% aws_policy_attach("AdministratorAccess")
# or if already created, then:
aws_role("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")
Another example
aws_group_add_users(group = "testers",
aws_user_create("jane"),
aws_user_create("sally"),
aws_user_create("susy")
)
@seankross feedback plz
see also #32
how to work with databases with sixtyfour, make sure to include
At least I currently don't have permission to modify bucket ACLs, so I can't test and make sure that aws_bucket_acl_modify works.
Perhaps with the new test AWS account I'll be able to test this.
Messing around uploading folders in the sixtyfour working directory, I get:
> aws_bucket_upload("man", "dasl-project1")
[1] "s3://Users/skross/Developer/sixtyfour/man"
The return value strikes me as weird. That's not the path in the bucket (which is correct):
> aws_bucket_list_objects("dasl-project1") |> head() |> glimpse()
Rows: 6
Columns: 8
$ bucket_name <chr> "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1"
$ key <chr> "account_id.Rd", "as_policy_arn.Rd", "aws_billing.Rd", "aws_billing_raw.Rd", "aws_bucket_add_user.Rd",…
$ uri <chr> "s3://dasl-project1/account_id.Rd", "s3://dasl-project1/as_policy_arn.Rd", "s3://dasl-project1/aws_bil…
$ size <fs::bytes> 401, 1.69K, 2.37K, 1.39K, 1.21K, 1.46K
$ type <chr> "file", "file", "file", "file", "file", "file"
$ owner <chr> NA, NA, NA, NA, NA, NA
$ etag <chr> "\"ee5e5d92046f900647bfa4c0f0ef14cf\"", "\"a4c6748e4380c88eac01878e298f882e\"", "\"ec1816624f1e3e4e221…
$ last_modified <dttm> 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 0…
Does the return value of aws_bucket_upload() strike you as weird?
can't do this yet as we can't get historical data - should be able to do with LocalStack
https://github.com/DyfanJones/s3fs
just saw pkg, check to see if it might make sense to use here
from #10
Possibly via some helper fxn, ideally not through every function, which would add extra work for everyone.
e.g., costexplorer has:
costexplorer(
config = list(),
credentials = list(),
endpoint = NULL,
region = NULL
)
We initialize that list of functions for cost explorer ourselves - we could have helper funs to allow users to set creds so whenever it initializes they are used
Originally posted by @sckott in #10 (comment)
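A sketch of such a helper (all names here - `aws_configure()`, `.sixtyfour_env` - are hypothetical, not package API):

```r
# Sketch: one place to stash creds, consulted whenever a client is made.
.sixtyfour_env <- new.env(parent = emptyenv())

aws_configure <- function(credentials = list(), region = NULL) {
  .sixtyfour_env$credentials <- credentials
  .sixtyfour_env$region <- region
  invisible(NULL)
}

costexplorer_client <- function() {
  creds <- .sixtyfour_env$credentials
  if (is.null(creds)) creds <- list()
  # paws service constructors accept credentials and region directly
  paws::costexplorer(credentials = creds, region = .sixtyfour_env$region)
}
```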
aws_file_upload uses s3fs::s3_file_copy internally. Look into that function; perhaps something's wrong there.
On branch more-magic
> aws_user_current()
[1] "sean"
> six_bucket_permissions("dasl-project1")
# A tibble: 3 × 4
user permissions policy_write policy_admin
<chr> <chr> <chr> <chr>
1 amy write S3FullAccessDaslProject1 NA
2 scott admin NA NA
3 sean admin NA NA
...
> library(sixtyfour)
> aws_user_current()
[1] "amy"
> aws_bucket_upload("data-raw", "dasl-project1")
Error: AccessDenied (HTTP 403). Access Denied
> aws_file_upload("DESCRIPTION", "s3://dasl-project1")
Error in `map2()`:
ℹ In index: 1.
Caused by error:
! AccessDenied (HTTP 403). Access Denied
Run `rlang::last_trace()` to see where the error occurred.