getwilds / sixtyfour
Home Page: https://getwilds.org/sixtyfour/
License: Other
... and we should get really opinionated about what the columns of those data frames should be.
Originally via @seankross in #3 (review)
On branch s3-iam
aws_secrets_all()
Error in `relocate()`:
! Can't select columns that don't exist.
✖ Column `arn` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
Probably the ideal is it returns an empty tibble (like when there are no buckets)
I'm thinking we should have a set of docs like these as a baseline, and then more of a "cookbook" set of docs based on the interactions that we find people using the most, but that's down the line.
Originally posted by @seankross in #25 (review)
The list (add new ones as needed) and how to help users avoid them:
Some error messages that I've seen thus far in paws are not going to be useful to the typical sixtyfour user. e.g., below: users probably won't be familiar with HTTP status codes. I know that a 404 means not found, so I can intuit that something wasn't found - but was it the bucket or the key? Beyond the status code, the error message is not useful to the user at all.
aws_file_attr(bucket = "s64-test-2", key = "doesntexist")
#> Error: SerializationError (HTTP 404). failed to read from query HTTP response body
Though sometimes the error messages are good:
aws_bucket_list_objects(bucket="s64-test-211")
#> Error: NoSuchBucket (HTTP 404). The specified bucket does not exist
and another good error message
desc_file <- file.path(system.file(), "DESCRIPTION")
aws_file_upload(bucket = "not-a-bucket", path = desc_file)
#> Error: BucketAlreadyExists (HTTP 409). The requested bucket name is not available.
#> The bucket namespace is shared by all users of the system. Please select a different name and try again.
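One pattern that could help (a sketch only - `with_friendly_errors()` and the message lookup table are hypothetical names, not part of the package): catch the paws error, match on the error code at the front of the message, and rethrow a plain-language explanation, chaining the original error via `parent` so nothing is lost.

```r
# Hypothetical sketch: translate opaque paws error codes into friendlier
# messages; all names here are made up, not sixtyfour API.
friendly_messages <- c(
  SerializationError = "Not found: check that both the bucket and the key exist",
  NoSuchBucket = "That bucket doesn't exist"
)

with_friendly_errors <- function(expr) {
  tryCatch(expr, error = function(e) {
    # paws messages lead with the error code, e.g. "NoSuchBucket (HTTP 404). ..."
    code <- sub(" .*", "", conditionMessage(e))
    msg <- friendly_messages[code]
    if (!is.na(msg)) rlang::abort(unname(msg), parent = e)
    stop(e)
  })
}

# e.g.:
# with_friendly_errors(aws_file_attr(bucket = "s64-test-2", key = "doesntexist"))
```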
@seankross just a placeholder to maybe deal with this, any thoughts welcome
It's too big - factor out some stuff, simplify, etc.
Right now on the dbs branch we allow only one client instance for each of Redshift and RDS.
One needs a new client for either of those services only if there are different credentials; if credentials don't change then the same client can be reused.
How to do this?
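One way to sketch it (all names made up; `rlang::hash()` used as the cache key): cache one client per service + credentials combination, so a fresh client is only constructed when the credentials actually differ.

```r
# Sketch: reuse a paws client unless the credentials change.
# `.client_cache` and `cached_client()` are hypothetical names.
.client_cache <- new.env(parent = emptyenv())

cached_client <- function(service, service_fun, credentials = list()) {
  # key on service name + creds, so same creds -> same client
  key <- rlang::hash(list(service, credentials))
  if (is.null(.client_cache[[key]])) {
    .client_cache[[key]] <- service_fun(credentials = credentials)
  }
  .client_cache[[key]]
}

# e.g. both calls below would return the same client object:
# cached_client("rds", paws::rds, creds)
# cached_client("rds", paws::rds, creds)
```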
I was trying to get aws_user_add_to_rds working to make the RDS IAM flow workable, but it's still not working. As far as I can remember, I still wasn't getting the code for granting a user permissions in the database via IAM (i.e., the AWSAuthenticationPlugin bit) working correctly in the function.
work is on branch rds-iam-flow
paws is using httr under the hood https://cran.r-project.org/web/packages/paws.common/index.html - should be able to use vcr to cache requests, speed up tests, hide secrets, etc.
That is, examples for any function shouldn't depend on anything existing already
see https://github.com/sckott/s3fs/commit/bb34aadd0e5801d5ea2742c1aa437a12e251fbed
relying on a fork for now
e.g., was doing this on the manage-secrets branch but it's not specific to that branch ...
invisible(vcr::vcr_configure(
dir = vcr::vcr_test_path("fixtures"),
filter_sensitive_data = list(
"<<aws_region>>" = Sys.getenv("AWS_REGION"),
"ClientRequestToken" = "something"
),
filter_request_headers = list(
Authorization = "redacted",
"X-Amz-Content-Sha256" = "redacted",
"X-Amz-Target" = "redacted",
"User-Agent" = "redacted"
),
filter_response_headers = list(
"x-amz-id-2" = "redacted",
"x-amz-request-id" = "redacted",
"x-amzn-requestid" = "redacted"
)
))
might be harder to automate locally for everyone
E.g., use @family buckets for all bucket fxns
either in markdown in pkg or in notion?
I was looking at addressing this comment about checking an env var, and then realized that paws has a number of different ways to find user auth details. So we can't just look for env vars.
However, we still need to have access to some credentials for our own use within this package, e.g., the link above where we want to get the AWS region the user has set in their creds.
We can hack getting the access key and secret key by calling this anonymous function in an environment, but that's hacky for sure - and it doesn't give the AWS region either:
s3 <- paws::s3()
s3$.internal$config$credentials$provider[[2]]()
Perhaps there's a way in paws to fetch user creds somehow, and I just haven't found it yet
user story: people want to be able to interact with their data via dplyr, etc.
costexplorer
It'd be perfect if there was a way to spin up a temporary top level S3 account to run a test suite, then clean it up afterwards INSTEAD OF messing with our real account.
Though unit tests I think will be using cached fixtures anyway, so on CI won't be hitting our real buckets.
It could delete a bucket even if it has files in it. We should keep the Are you sure? interactive feature and force = FALSE as an arg.
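A sketch of what that guard could look like (the function body and names below are illustrative, not the current implementation):

```r
# Sketch: refuse to delete a non-empty bucket unless force = TRUE,
# and keep the interactive "Are you sure?" prompt.
six_bucket_delete_sketch <- function(bucket, force = FALSE) {
  objects <- aws_bucket_list_objects(bucket)  # assumed to return a tibble
  if (NROW(objects) > 0 && !force) {
    stop("bucket not empty; set force = TRUE to delete it and its files",
      call. = FALSE
    )
  }
  if (interactive() &&
    !isTRUE(utils::askYesNo(sprintf("Delete bucket %s?", bucket)))) {
    return(invisible(FALSE))
  }
  # ... delete files (when force = TRUE), then the bucket, via paws ...
  invisible(TRUE)
}
```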
paws and s3fs docs via #17 (comment) and #17 (comment)
Let's make an interface to AWS Secrets Manager: a prompt interface with the cli/symbols pkgs to select credentials, or a DBI interface, etc.
rds <- paws::rds()
token <- rds$build_auth_token(endpoint, region, user)
# then token passed to DBI::dbConnect()
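Filling in that last step as a sketch - the driver choice and argument names are assumptions (shown with RMariaDB; the endpoint/region/user values are placeholders, not real resources):

```r
# Sketch: use the short-lived IAM auth token in place of a password.
# All values below are placeholders.
rds <- paws::rds()
token <- rds$build_auth_token(
  endpoint = "mydb.abc123.us-west-2.rds.amazonaws.com:3306",
  region = "us-west-2",
  user = "jane"
)
con <- DBI::dbConnect(
  RMariaDB::MariaDB(),
  host = "mydb.abc123.us-west-2.rds.amazonaws.com",
  port = 3306,
  username = "jane",
  password = token  # the IAM token stands in for a password
)
```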
Everything else, but we SHOULD document in vignettes how to do certain things:
Uploads a file to a bucket by specifying the bucket name, not a remote path. In fact maybe there should just be a six_bucket_upload that can handle files or folders as things that are upload-able, and bucket names or remote paths to specify where to put the files.
when an admin adds a user
as an admin
Prefix for magic fxns: six_
Analogy we talked about: magic fxns (six_) are like Rails, and non-magic fxns (aws_) are not Rails (maybe ~ Sinatra).
Idea is to maintain aws_ as the non-magic fxns: they don't do any magic, just interface with the AWS REST API, while magic fxns do more and have side effects that are harder to reason about.
The code below was originally part of the #55 pull request, but was removed as it wasn't completely working and deemed not important enough to spend more time on now to make it work.
check_simulated_user was called as the last step before finishing in six_admin_setup
#' Check if a user has access to an AWS service
#' @export
#' @param fun (function) a function. required
#' @param ... additional named args passed to `fun`
#' @return single boolean. checks [rlang::is_null()] against `$error` result of
#' call to [purrr::safely()]
#' @details really just a generic check that any function can run with
#' its inputs; not specific to AWS or any particular function
has_access <- function(fun, ...) {
rlang::is_null(purrr::safely(fun, FALSE)(...)$error)
}
#' @importFrom dplyr any_of
check_simulated_user <- function(group) {
rlang::check_installed("callr")
cli_info("Checking that a simulated user can access {.strong {group}} group")
randuser <- random_user()
creds <- suppm(six_user_create(randuser))
aws_user_add_to_group(randuser, group)
creds_mapper <- c(
"AWS_ACCESS_KEY_ID" = "AccessKeyId",
"AWS_SECRET_ACCESS_KEY" = "SecretAccessKey",
"AWS_REGION" = "AwsRegion"
)
creds_lst <- as_tibble(creds) %>%
rename(any_of(creds_mapper)) %>%
select(starts_with("AWS")) %>%
as.list()
all_checks <- callr::r(function(creds) {
withr::with_envvar(
creds,
{
check_iam <- sixtyfour::has_access(sixtyfour::aws_user)
check_rds <- sixtyfour::has_access(sixtyfour::aws_db_instance_details)
check_rs <- sixtyfour::has_access(sixtyfour::aws_db_cluster_details)
check_s3 <- sixtyfour::has_access(sixtyfour::aws_buckets)
check_bil <- sixtyfour::has_access(sixtyfour::aws_billing_raw,
date_start = Sys.Date() - 1,
metrics = "BlendedCost"
)
list(
IAM = check_iam,
RDS = check_rds,
Redshift = check_rs,
S3 = check_s3,
Billing = check_bil
)
}
)
}, args = list(creds_lst))
if (all(unlist(all_checks))) {
cli_success(" All checks passed!")
} else {
cli_warning(c(
" At least one check didn't pass ",
"({names(keep(all_checks, isFALSE))}) ",
"try again or open an issue"
))
}
cli_info(" Cleaning up simulated user")
aws_user_remove_from_group(randuser, group)
suppm(six_user_delete(randuser))
cli_alert_info("") # nolint
}
Notes
aws_db_instance_details is the same as instance_details in the current version of the pkg; aws_db_cluster_details is the same as cluster_details in the current version of the pkg.
The parts that were not working:
callr::r WAS working interactively after loading all the code in the package, but WAS NOT working if I load sixtyfour then call the six_admin_setup function - I'm not sure exactly why, but I think it has to do with the complex-ish nature of how paws loads credentials. I think I needed to make sure the R session that callr::r was running was not loading any of the credentials I have saved, only the creds passed into the function, but that was not happening successfully (I kept getting a 403 error), like:
six_admin_setup("uzers", "zadmin")
#> ℹ whoami: scott (account: 744061095407)
#> ℹ
#> ! uzers group NOT created - a uzers group already exists in your account
#> ℹ Not adding policies to the uzers group
#> ℹ
#> ! zadmin group NOT created - an zadmin group already exists in your account
#> ℹ Not adding policies to the zadmin group
#> ℹ
#> ℹ Checking that a simulated user can access uzers group
#> Error:
#> ! in callr subprocess.
#> Caused by error:
#> ! InvalidClientTokenId (HTTP 403). The security token included in the request is invalid.
#> Type .Last.error to see the more details.
#>
#> :p .Last.error
#> <callr_error/rlib_error_3_0/rlib_error/error>
#> Error:
#> ! in callr subprocess.
#> Caused by error:
#> ! InvalidClientTokenId (HTTP 403). The security token included in the request is invalid.
#> ---
#> Backtrace:
#> 1. sixtyfour::six_admin_setup("uzers", "zadmin")
#> 2. sixtyfour:::check_simulated_user(users_group) at admin.R:113:3
#> 3. callr::r(function(creds) { … at admin.R:155:3
#> 4. callr:::get_result(output = out, options)
#> 5. callr:::throw(callr_remote_error(remerr, output), parent = fix_msg(remerr[[3]]))
#> ---
#> Subprocess backtrace:
#> 1. sixtyfour::aws_user()
#> 2. env64$iam$get_user(username)$User %>% list(.) %>% user_list_tidy()
#> 3. sixtyfour:::user_list_tidy(.)
#> 4. rlang::is_empty(x)
#> 5. env64$iam$get_user(username)
#> 6. paws.common::send_request(request)
#> 7. paws.common:::retry(request)
#> 8. paws.common:::run(request, retry)
#> 9. handler$fn(request)
#> 10. base::stop(error)
#> 11. global (function (e) …
@seankross we chatted briefly about this. some notes
It might be nice for an admin of an AWS account to see what the other folks on their account see - just to check that permissions are set correctly i imagine
Was thinking this
users <- list(
list(
user = "sally",
AWS_ACCESS_KEY_ID = "ASPDF80ASDFDF",
AWS_SECRET_ACCESS_KEY = "ADFPA8FAADF",
AWS_REGION = "us-west-2"
),
list(
user = "malorie",
AWS_ACCESS_KEY_ID = "ASDF08AFAD80ADSF",
AWS_SECRET_ACCESS_KEY = "ADFPAADF80A999",
AWS_REGION = "us-west-2"
)
)
fake_aws_user <- function() {
Filter(
function(z) z$AWS_ACCESS_KEY_ID == Sys.getenv("AWS_ACCESS_KEY_ID"),
users
)
}
withr::with_envvar(
c(
"AWS_ACCESS_KEY_ID" = "ASDF08AFAD80ADSF",
"AWS_SECRET_ACCESS_KEY" = "ADFPA8FAADF",
"AWS_REGION" = "us-west-2"
),
fake_aws_user()
)
aws_user_impersonate <- function(username, code) {
  withr::with_envvar(
    # get user creds somehow?
    new = c(),
    force(code)
  )
}
# hmm, this wouldn't work - as an admin i'd want to put in a username, but you wouldn't have those creds
# unless you saved them all somewhere, which seems unlikely
aws_user_impersonate("sally")
But then thought this probably doesn't make sense b/c the admin wouldn't probably have tokens for each user saved - and you can't look them up after the fact unless you create a new set.
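One possible workaround (a sketch; it assumes the admin has iam:CreateAccessKey / iam:DeleteAccessKey on the user, and note AWS limits users to two active access keys): mint a throwaway key for the user on the fly, run the code under it, then delete it.

```r
# Sketch: impersonate by creating a temporary access key for the user,
# running `code` under those creds, then deleting the key on exit.
aws_user_impersonate <- function(username, code) {
  iam <- paws::iam()
  key <- iam$create_access_key(UserName = username)$AccessKey
  on.exit(
    iam$delete_access_key(UserName = username, AccessKeyId = key$AccessKeyId),
    add = TRUE
  )
  withr::with_envvar(
    c(
      AWS_ACCESS_KEY_ID = key$AccessKeyId,
      AWS_SECRET_ACCESS_KEY = key$SecretAccessKey
    ),
    force(code)
  )
}
```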
What would have to be done so that this doesn't happen?
six_user_create("amy")
ℹ Added policy UserInfo to amy
✔ Key pair created for amy
ℹ UserName: amy
...
aws_bucket_add_user("dasl-project1", "amy", permissions = "read")
✔ amy now has read access to bucket dasl-project1
aws_bucket_permissions("dasl-project1")
# A tibble: 3 × 4
user permissions policy_read policy_admin
<chr> <chr> <chr> <chr>
1 amy read S3ReadOnlyAccessDaslProject1 NA
2 scott admin NA NA
3 sean admin NA NA
> Sys.setenv(
+ AWS_ACCESS_KEY_ID = "AmysKey",
+ AWS_SECRET_ACCESS_KEY = "AmysSecret",
+ AWS_REGION = "us-west-2"
+ )
aws_user_current()
[1] "amy"
aws_buckets()
# A tibble: 2 × 8
bucket_name key uri size type owner etag last_modified
<chr> <chr> <chr> <fs::bytes> <chr> <chr> <chr> <dttm>
1 dasl-project1 "" s3://dasl-project1 0 bucket "" "" NA
2 dasl-project2 "" s3://dasl-project2 0 bucket "" "" NA
https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/
@seankross any sense for whether FH folks are generally running newer versions of R or not so much?
If folks generally use 4.1 or greater, then we could use |>, but if not, then we'd probably want to stick with %>%. Thoughts?
Can sixtyfour::aws_users() include a list column called Tags that contains data frames with columns Key and Value? Same with aws_user()?
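Seems doable - a sketch of the tidying step, assuming each paws user record carries Tags as a list of Key/Value pairs (the helper name is made up):

```r
library(purrr)
library(tibble)

# Sketch: turn paws-style Tags (a list of Key/Value pairs) into a tibble,
# suitable for storing in a list column on the users tibble.
tags_to_tbl <- function(tags) {
  if (length(tags) == 0) {
    return(tibble(Key = character(), Value = character()))
  }
  tibble(
    Key = map_chr(tags, "Key"),
    Value = map_chr(tags, "Value")
  )
}

# e.g. with two users, one tagged and one not:
users <- tibble(
  UserName = c("jane", "sally"),
  Tags = list(list(list(Key = "team", Value = "data")), list())
)
users$Tags <- map(users$Tags, tags_to_tbl)
```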
probably could be cleaner
I feel like I left things up in the air with respect to RDS features, and if you don't think it's wise for it to be in scope for v0.1 that's okay with me. For now I'm getting these errors on branch s3-iam #43:
> aws_db_rds_list()
Error in `mutate()`:
ℹ In argument: `AccountId = split_grep(DBInstanceArn, ":", "^[0-9]+$")`.
Caused by error:
! object 'DBInstanceArn' not found
Run `rlang::last_trace()` to see where the error occurred.
> aws_db_rds_create(
+ id = "testing135",
+ engine = "mariadb",
+ class = "db.t3.micro",
+ BackupRetentionPeriod = 0
+ )
ℹ `user` is NULL; created user: ListeningDiffere
ℹ `pwd` is NULL; created password: *******
Error in `dplyr::filter()`:
ℹ In argument: `map_lgl(IpPermissions, ~.$ToPort == port)`.
Caused by error in `map_lgl()`:
ℹ In index: 5.
Caused by error:
! Result must be length 1, not 0.
Run `rlang::last_trace()` to see where the error occurred.
> aws_file_delete("s3://dasl-project2/account_id.Rd")
$DeleteMarker
logical(0)
$VersionId
character(0)
$RequestCharged
character(0)
I'm okay with this return value but maybe we should return this value invisibly.
The results of paws calls are generally named lists, sometimes nested. What should we do to these data before giving them back to users:
paws - a named list, in most cases
There are other cases where what we return is more clear cut. E.g., in a fxn that checks if a bucket exists, we give back a boolean.
@seankross thoughts?
possible fxn to add in the future
Thinking about this from the perspective of this image from the youtube video Sean shared.
Here's what I'm thinking:
aws_user* / aws_users*
aws_group* / aws_groups*
aws_role* / aws_roles*
aws_policy* / aws_policies* - some of these fxns used for attaching policies to users, groups, roles
So in the end we could have a workflow like:
# in each case below aws_policy_attach determines from input whether
# its a group, role, or user. And prefixes policy with `arn:aws:iam::aws:policy`
aws_group_create("testers") %>% aws_policy_attach("ReadOnlyAccess")
aws_role_create("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")
aws_user_create("jane") %>% aws_policy_attach("AdministratorAccess")
# or if already created, then:
aws_role("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")
Another example
aws_group_add_users(group = "testers",
aws_user_create("jane"),
aws_user_create("sally"),
aws_user_create("susy")
)
@seankross feedback plz
see also #32
how to work with databases with sixtyfour, make sure to include
At least I currently don't have permission to modify bucket ACLs, so I can't test and make sure that aws_bucket_acl_modify works.
Perhaps with the new test AWS account I'll be able to test this.
Messing around uploading folders in the sixtyfour working directory, I get:
> aws_bucket_upload("man", "dasl-project1")
[1] "s3://Users/skross/Developer/sixtyfour/man"
The return value strikes me as weird. That's not the path in the bucket (which is correct):
> aws_bucket_list_objects("dasl-project1") |> head() |> glimpse()
Rows: 6
Columns: 8
$ bucket_name <chr> "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1"
$ key <chr> "account_id.Rd", "as_policy_arn.Rd", "aws_billing.Rd", "aws_billing_raw.Rd", "aws_bucket_add_user.Rd",…
$ uri <chr> "s3://dasl-project1/account_id.Rd", "s3://dasl-project1/as_policy_arn.Rd", "s3://dasl-project1/aws_bil…
$ size <fs::bytes> 401, 1.69K, 2.37K, 1.39K, 1.21K, 1.46K
$ type <chr> "file", "file", "file", "file", "file", "file"
$ owner <chr> NA, NA, NA, NA, NA, NA
$ etag <chr> "\"ee5e5d92046f900647bfa4c0f0ef14cf\"", "\"a4c6748e4380c88eac01878e298f882e\"", "\"ec1816624f1e3e4e221…
$ last_modified <dttm> 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 0…
Does the return value of aws_bucket_upload() strike you as weird?
can't do this yet as we can't get historical data - should be able to do with LocalStack
https://github.com/DyfanJones/s3fs
just saw pkg, check to see if it might make sense to use here
from #10
Possibly via some helper fxn, ideally not through every function, which would add extra work for everyone.
e.g., costexplorer has:
costexplorer(
config = list(),
credentials = list(),
endpoint = NULL,
region = NULL
)
We initialize that list of functions for cost explorer ourselves - we could have helper funs to allow users to set creds so whenever it initializes they are used
Originally posted by @sckott in #10 (comment)
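A sketch of such a helper (all names here - `aws_configure()`, `.sixtyfour_env` - are hypothetical, not package API):

```r
# Sketch: one place to stash creds, consulted whenever a client is made.
.sixtyfour_env <- new.env(parent = emptyenv())

aws_configure <- function(credentials = list(), region = NULL) {
  .sixtyfour_env$credentials <- credentials
  .sixtyfour_env$region <- region
  invisible(NULL)
}

costexplorer_client <- function() {
  creds <- .sixtyfour_env$credentials
  if (is.null(creds)) creds <- list()
  # paws service constructors accept credentials and region directly
  paws::costexplorer(credentials = creds, region = .sixtyfour_env$region)
}
```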
aws_file_upload uses s3fs::s3_file_copy internally. Look into that function; perhaps something's wrong there.
On branch more-magic
> aws_user_current()
[1] "sean"
> six_bucket_permissions("dasl-project1")
# A tibble: 3 × 4
user permissions policy_write policy_admin
<chr> <chr> <chr> <chr>
1 amy write S3FullAccessDaslProject1 NA
2 scott admin NA NA
3 sean admin NA NA
...
> library(sixtyfour)
> aws_user_current()
[1] "amy"
> aws_bucket_upload("data-raw", "dasl-project1")
Error: AccessDenied (HTTP 403). Access Denied
> aws_file_upload("DESCRIPTION", "s3://dasl-project1")
Error in `map2()`:
ℹ In index: 1.
Caused by error:
! AccessDenied (HTTP 403). Access Denied
Run `rlang::last_trace()` to see where the error occurred.