sixtyfour's Issues

aws_secrets_all errors when there are no secrets

On branch s3-iam

aws_secrets_all()
Error in `relocate()`:
! Can't select columns that don't exist.
✖ Column `arn` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.

Ideally it would return an empty tibble, as it does when there are no buckets.

"cookbook" docs

I'm thinking we should have a set of docs like these as a baseline, and then, down the line, a more "cookbook"-style set of docs based on the interactions we find people using the most.

Originally posted by @seankross in #25 (review)

Stumbling blocks and avoiding them

The list (add new ones as needed) and how to help users avoid them:

  • authentication
  • leaving an expensive process running
  • messing up permissions

How to do error handling with paws?

Some error messages that I've seen so far in paws are not going to be useful to the typical sixtyfour user, e.g.:

Below, users probably won't be familiar with HTTP status codes. I know that a 404 means "not found", so I can intuit that something wasn't found, but was it the bucket or the key? Beyond that, the message itself is not useful to the user at all.

aws_file_attr(bucket = "s64-test-2", key = "doesntexist")
#> Error: SerializationError (HTTP 404). failed to read from query HTTP response body

Though sometimes the error messages are good:

aws_bucket_list_objects(bucket="s64-test-211")
#> Error: NoSuchBucket (HTTP 404). The specified bucket does not exist

And another good error message:

desc_file <- file.path(system.file(), "DESCRIPTION")
aws_file_upload(bucket = "not-a-bucket", path = desc_file)
#> Error: BucketAlreadyExists (HTTP 409). The requested bucket name is not available. 
#> The bucket namespace is shared by all users of the system. Please select a different name and try again.

@seankross just a placeholder to maybe deal with this, any thoughts welcome
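One possible direction (just a sketch, not an existing sixtyfour helper): wrap paws calls with tryCatch() and rethrow a friendlier message, optionally with a hint about what to check.

# Sketch only - with_aws_errors() is a hypothetical helper, not part of sixtyfour
with_aws_errors <- function(expr, hint = NULL) {
  tryCatch(
    expr,
    error = function(e) {
      msg <- conditionMessage(e)
      # Translate the opaque serialization error into something actionable
      if (grepl("HTTP 404", msg, fixed = TRUE)) {
        msg <- "Not found (HTTP 404) - check that both the bucket and the key exist"
      }
      stop(paste(c(msg, hint), collapse = "\n"), call. = FALSE)
    }
  )
}

# e.g.
# with_aws_errors(
#   aws_file_attr(bucket = "s64-test-2", key = "doesntexist"),
#   hint = "Run aws_buckets() to see the buckets you can access"
# )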

Allow for managing separate client objects?

Right now on the dbs branch we allow only one client instance for each of Redshift and RDS.

One needs a new client for either of those services only if there are different credentials; if credentials don't change then the same client can be reused.

How to do this?

  1. Tell users to open up separate R sessions, one for each set of unique credentials.
  2. Require passing a client object into each function that interacts with Redshift/RDS (sketched below).
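If we go with option 2, the call pattern might look like this (a sketch; the `client` argument is hypothetical and not the current API):

# Each client is built with its own credentials (profile names are illustrative)
con_a <- paws::rds(config = list(credentials = list(profile = "project-a")))
con_b <- paws::rds(config = list(credentials = list(profile = "project-b")))

# Hypothetical: every Redshift/RDS-facing function takes the client explicitly
# aws_db_rds_list(client = con_a)
# aws_db_rds_list(client = con_b)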

RDS IAM flow function aws_user_add_to_rds

I was trying to get aws_user_add_to_rds working to make the RDS IAM flow workable, but it's still not working. As far as I can remember, I still hadn't got the code for granting a user permissions in the database via IAM working correctly in the function, i.e., the AWSAuthenticationPlugin bit in the code.

Work is on branch rds-iam-flow.

Update vcr filters

e.g., I was doing this on the manage-secrets branch, but it's not specific to that branch:

invisible(vcr::vcr_configure(
  dir = vcr::vcr_test_path("fixtures"),
  filter_sensitive_data = list(
    "<<aws_region>>" = Sys.getenv("AWS_REGION"),
    "ClientRequestToken" = "something"
  ),
  filter_request_headers = list(
    Authorization = "redacted",
    "X-Amz-Content-Sha256" = "redacted",
    "X-Amz-Target" = "redacted",
    "User-Agent" = "redacted"
  ),
  filter_response_headers = list(
    "x-amz-id-2" = "redacted",
    "x-amz-request-id" = "redacted",
    "x-amzn-requestid" = "redacted"
  )
))

Add Getting Started vignette

  • start vignette, add vignette infrastructure
  • use our new testing AWS account so it's easier to build this and other vignettes

Dealing with credentials

I was looking at addressing this comment about checking an env var, and then realized that paws has a number of different ways to find user auth details. So we can't just look for env vars.

However, we still need access to some credentials for our own use within this package, e.g., the link above, where we want to get the AWS region the user has set in their creds.

We can hack getting the access key and secret key by calling this anonymous function in an environment, but that's hacky for sure, and it doesn't give the AWS region either:

s3 <- paws::s3()
s3$.internal$config$credentials$provider[[2]]()

Perhaps there's a way in paws to fetch user creds somehow, and I just haven't found it yet
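In the meantime, a deliberately limited approach (sketch): read env vars only and fail loudly, while acknowledging it won't cover every way paws can locate credentials (shared config files, instance metadata, SSO, etc.).

# Sketch: env-var-only lookup for the region, with an informative failure.
# Intentionally incomplete relative to paws' full credential chain.
aws_region <- function() {
  region <- Sys.getenv("AWS_REGION", unset = Sys.getenv("AWS_DEFAULT_REGION"))
  if (!nzchar(region)) {
    stop("No region found in env vars AWS_REGION or AWS_DEFAULT_REGION", call. = FALSE)
  }
  region
}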

Billing improvements

User story: people want to be able to interact with their data via dplyr, etc.

Temporary accounts for testing?

It'd be perfect if there were a way to spin up a temporary top-level S3 account to run a test suite against, then clean it up afterwards, instead of messing with our real account.

Though unit tests will likely use cached fixtures anyway, so CI won't be hitting our real buckets.

repo tidying

  • README:
    • move scan secrets into a new "developer" section or similar
    • add sections like docs, bugs, etc., as in rcromwell
    • link to paws and s3fs docs
    • link to the WILDS website and WILDS guide
  • change repo status badge to experimental

Database functionality

What kinds of functions do we want to provide?

  • Set up a simple database; make sane choices for users, with the assumption that users' needs are basic data science tasks
  • Change permissions for a database, table, or row
  • Fetch a key to use to connect to a database, e.g.:
rds <- paws::rds()
token <- rds$build_auth_token(endpoint, region, user)
# then token passed to DBI::dbConnect()
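For example (a sketch; host/port/user are illustrative, and the SSL options typically required for IAM auth are omitted):

# Sketch: use the IAM auth token as the password when connecting
con <- DBI::dbConnect(
  RMariaDB::MariaDB(),
  host = endpoint,
  port = 3306,
  username = user,
  password = token
)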

Things we will leave as an exercise for the user

Everything else, but we SHOULD document in vignettes how to do certain things:

  • Table management
  • Simple queries

magic function: six_file_upload

Uploads a file to a bucket by specifying the bucket name rather than a remote path. In fact, maybe there should just be a six_bucket_upload that accepts files or folders as the things to upload, and bucket names or remote paths to specify where to put them.
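For illustration only (the exact signatures are hypothetical and not settled):

# Hypothetical usage - exact signatures to be decided
six_file_upload(c("data.csv", "notes.txt"), bucket = "my-bucket")
# or a more general bucket-level upload that accepts files or folders
six_bucket_upload("data-raw/", bucket = "my-bucket")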

Magic functions

when an admin adds a user

  • add policies for that user:
    • get user
  • not adding
    • list other users
    • etc.

as an admin

  • should there be a setup fxn? it could
    • create a users group
    • create an admin group

Prefix for magic fxns: six_

Analogy we talked about: magic fxns (six_) are like Rails, and the non-magic aws_ fxns are not Rails (maybe ~ Sinatra).

The idea is to keep the aws_ fxns non-magic: they don't do any magic, they just interface with the AWS REST API, while the magic fxns do more and have side effects that are harder to reason about.

@seankross

Simulate user

The code below was originally part of the #55 pull request, but was removed as it wasn't completely working and was deemed not important enough to spend more time on right now.

check_simulated_user was called as the last step before finishing in six_admin_setup.

#' Check if a user has access to an AWS service
#' @export
#' @param fun (function) a function. required
#' @param ... additional named args passed to `fun`
#' @return single boolean. checks [rlang::is_null()] against `$error` result of
#' call to [purrr::safely()]
#' @details really just a generic check that any function can run with
#' its inputs; not specific to AWS or any particular function
has_access <- function(fun, ...) {
  rlang::is_null(purrr::safely(fun, FALSE)(...)$error)
}

#' @importFrom dplyr any_of
check_simulated_user <- function(group) {
  rlang::check_installed("callr")
  cli_info("Checking that a simulated user can access {.strong {group}} group")
  randuser <- random_user()
  creds <- suppm(six_user_create(randuser))
  aws_user_add_to_group(randuser, group)

  creds_mapper <- c(
    "AWS_ACCESS_KEY_ID" = "AccessKeyId",
    "AWS_SECRET_ACCESS_KEY" = "SecretAccessKey",
    "AWS_REGION" = "AwsRegion"
  )
  creds_lst <- as_tibble(creds) %>%
    rename(any_of(creds_mapper)) %>%
    select(starts_with("AWS")) %>%
    as.list()

  all_checks <- callr::r(function(creds) {
    withr::with_envvar(
      creds,
      {
        check_iam <- sixtyfour::has_access(sixtyfour::aws_user)
        check_rds <- sixtyfour::has_access(sixtyfour::aws_db_instance_details)
        check_rs <- sixtyfour::has_access(sixtyfour::aws_db_cluster_details)
        check_s3 <- sixtyfour::has_access(sixtyfour::aws_buckets)
        check_bil <- sixtyfour::has_access(sixtyfour::aws_billing_raw,
          date_start = Sys.Date() - 1,
          metrics = "BlendedCost"
        )
        list(
          IAM = check_iam,
          RDS = check_rds,
          Redshift = check_rs,
          S3 = check_s3,
          Billing = check_bil
        )
      }
    )
  }, args = list(creds_lst))

  if (all(unlist(all_checks))) {
    cli_success("  All checks passed!")
  } else {
    cli_warning(c(
      "  At least one check didn't pass ",
      "({names(keep(all_checks, isFALSE))}) ",
      "try again or open an issue"
    ))
  }

  cli_info("  Cleaning up simulated user")
  aws_user_remove_from_group(randuser, group)
  suppm(six_user_delete(randuser))
  cli_alert_info("") # nolint
}

Notes

  • aws_db_instance_details is the same as instance_details in the current version of the pkg
  • aws_db_cluster_details is the same as cluster_details in the current version of the pkg

The parts that were not working:

  • Running callr::r WAS working interactively after loading all the code in the package, but WAS NOT working if I loaded sixtyfour and then called the six_admin_setup function. I'm not sure exactly why, but I think it has to do with the complex-ish nature of how paws loads credentials. I think I needed to make sure the R session that callr::r was running did not load any of the credentials I have saved, only the creds passed into the function, but that was not happening successfully (I kept getting a 403 error), like:
six_admin_setup("uzers", "zadmin")
#> ℹ whoami: scott (account: 744061095407)
#> ℹ
#> ! uzers group NOT created - a uzers group already exists in your account
#> ℹ Not adding policies to the uzers group
#> ℹ
#> ! zadmin group NOT created - an zadmin group already exists in your account
#> ℹ Not adding policies to the zadmin group
#> ℹ
#> ℹ Checking that a simulated user can access uzers group
#> Error:
#> ! in callr subprocess.
#> Caused by error:
#> ! InvalidClientTokenId (HTTP 403). The security token included in the request is invalid.
#> Type .Last.error to see the more details.
#> 
#> > .Last.error
#> <callr_error/rlib_error_3_0/rlib_error/error>
#> Error:
#> ! in callr subprocess.
#> Caused by error:
#> ! InvalidClientTokenId (HTTP 403). The security token included in the request is invalid.
#> ---
#> Backtrace:
#> 1. sixtyfour::six_admin_setup("uzers", "zadmin")
#> 2. sixtyfour:::check_simulated_user(users_group) at admin.R:113:3
#> 3. callr::r(function(creds) { … at admin.R:155:3
#> 4. callr:::get_result(output = out, options)
#> 5. callr:::throw(callr_remote_error(remerr, output), parent = fix_msg(remerr[[3]]))
#> ---
#> Subprocess backtrace:
#> 1. sixtyfour::aws_user()
#> 2. env64$iam$get_user(username)$User %>% list(.) %>% user_list_tidy()
#> 3. sixtyfour:::user_list_tidy(.)
#> 4. rlang::is_empty(x)
#> 5. env64$iam$get_user(username)
#> 6. paws.common::send_request(request)
#> 7. paws.common:::retry(request)
#> 8. paws.common:::run(request, retry)
#> 9. handler$fn(request)
#> 10. base::stop(error)
#> 11. global (function (e) …

Impersonate for admins?

@seankross we chatted briefly about this; some notes.

It might be nice for an admin of an AWS account to see what the other folks on their account see, just to check that permissions are set correctly, I imagine.

I was thinking something like this:

users <- list(
  list(
    user = "sally",
    AWS_ACCESS_KEY_ID = "ASPDF80ASDFDF", 
    AWS_SECRET_ACCESS_KEY = "ADFPA8FAADF",
    AWS_REGION = "us-west-2"
  ),
  list(
    user = "malorie",
    AWS_ACCESS_KEY_ID = "ASDF08AFAD80ADSF", 
    AWS_SECRET_ACCESS_KEY = "ADFPAADF80A999",
    AWS_REGION = "us-west-2"
  )
)

fake_aws_user <- function() {
  Filter(
    function(z) z$AWS_ACCESS_KEY_ID == Sys.getenv("AWS_ACCESS_KEY_ID"), 
    users
  )
}

withr::with_envvar(
  c(
    "AWS_ACCESS_KEY_ID" = "ASDF08AFAD80ADSF", 
    "AWS_SECRET_ACCESS_KEY" = "ADFPA8FAADF",
    "AWS_REGION" = "us-west-2"
  ),
  fake_aws_user()
)

aws_user_impersonate <- function(username, code) {
  withr::with_envvar(
   # get user creds somehow?,
    force(code)
  )
}

# hmm, this wouldn't work - as an admin I'd want to put in a username, but you
# wouldn't have those creds unless you saved them all somewhere, which seems unlikely
aws_user_impersonate("sally")

But then I thought this probably doesn't make sense because the admin probably wouldn't have tokens for each user saved, and you can't look them up after the fact unless you create a new set.

@seankross

Users can see buckets they haven't been granted permissions for

What would have to be done so that this doesn't happen?

six_user_create("amy")
ℹ Added policy UserInfo to amy
✔ Key pair created for amy
ℹ UserName: amy
...
aws_bucket_add_user("dasl-project1", "amy", permissions = "read")
✔ amy now has read access to bucket dasl-project1
aws_bucket_permissions("dasl-project1")
# A tibble: 3 × 4
  user  permissions policy_read                  policy_admin
  <chr> <chr>       <chr>                        <chr>       
1 amy   read        S3ReadOnlyAccessDaslProject1 NA          
2 scott admin       NA                           NA          
3 sean  admin       NA                           NA          
> Sys.setenv(
+   AWS_ACCESS_KEY_ID = "AmysKey",
+   AWS_SECRET_ACCESS_KEY = "AmysSecret",
+   AWS_REGION = "us-west-2"
+ )
aws_user_current()
[1] "amy"
aws_buckets()
# A tibble: 2 × 8
  bucket_name   key   uri                       size type   owner etag  last_modified
  <chr>         <chr> <chr>              <fs::bytes> <chr>  <chr> <chr> <dttm>       
1 dasl-project1 ""    s3://dasl-project1           0 bucket ""    ""    NA           
2 dasl-project2 ""    s3://dasl-project2           0 bucket ""    ""    NA        

Tags and Users

Can sixtyfour::aws_users() include a list column called Tags that contains data frames with columns Key and Value? Same with aws_user()?
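Something like this shape (a sketch of the desired output, not current behavior; column names are illustrative):

# Desired shape (sketch): one row per user, Tags as a list column of
# data frames, each with Key and Value columns
tibble::tibble(
  UserName = c("amy", "scott"),
  Tags = list(
    data.frame(Key = "team", Value = "dasl"),
    data.frame(Key = character(0), Value = character(0))
  )
)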

RDS list and create throwing errors

I feel like I left things up in the air with respect to RDS features, and if you don't think it's wise for this to be in scope for v0.1, that's okay with me. For now I'm getting these errors on branch s3-iam #43:

> aws_db_rds_list()
Error in `mutate()`:
In argument: `AccountId = split_grep(DBInstanceArn, ":", "^[0-9]+$")`.
Caused by error:
! object 'DBInstanceArn' not found
Run `rlang::last_trace()` to see where the error occurred.
> aws_db_rds_create(
+   id = "testing135", 
+   engine = "mariadb",
+   class = "db.t3.micro",
+   BackupRetentionPeriod = 0
+ )
ℹ `user` is NULL; created user: ListeningDiffere
ℹ `pwd` is NULL; created password: *******
Error in `dplyr::filter()`:
In argument: `map_lgl(IpPermissions, ~.$ToPort == port)`.
Caused by error in `map_lgl()`:
In index: 5.
Caused by error:
! Result must be length 1, not 0.
Run `rlang::last_trace()` to see where the error occurred.

Return value for aws_file_delete

> aws_file_delete("s3://dasl-project2/account_id.Rd")
$DeleteMarker
logical(0)

$VersionId
character(0)

$RequestCharged
character(0)

I'm okay with this return value but maybe we should return this value invisibly.
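i.e., something along these lines (a sketch, not the actual implementation):

# Sketch: keep the same return value but wrap it in invisible() so it
# doesn't print at the console unless assigned or explicitly printed
result <- list(
  DeleteMarker = logical(0),
  VersionId = character(0),
  RequestCharged = character(0)
)
invisible(result)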

How to handle data from paws functions

The results of paws calls are generally named lists, sometimes nested. What should we do to these data before giving back to users:

  1. Just give them what we get back from paws - a named list, in most cases
  2. Coerce named lists to an S3 object that summarizes the list and hides the full list underneath?
  3. Coerce named lists to a vctrs object that summarizes the list and hides the full list underneath?
  4. Coerce named lists to tibbles? (a rough sketch is below)
  5. Something else?

There are other cases where what we return is more clear-cut, e.g., a fxn that checks whether a bucket exists gives back a boolean.

@seankross thoughts?
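For reference, option 4 might look something like this (a rough sketch with made-up input):

# Sketch of option 4: coerce a (possibly nested) named list from paws into a
# one-row tibble, keeping nested pieces as list columns
one_row_tibble <- function(x) {
  tibble::as_tibble(lapply(x, function(el) {
    if (is.list(el) || length(el) != 1) list(el) else el
  }))
}

paws_result <- list(
  Name = "s64-test-2",
  CreationDate = Sys.time(),
  Tags = list(list(Key = "env", Value = "test"))
)
one_row_tibble(paws_result)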

Permissions framework?

Thinking about this from the perspective of an image (screenshot, 2023-11-08) from the YouTube video Sean shared.

Here's what I'm thinking:

  • suite of fxns for users (already in the works) - aws_user*/aws_users*
  • suite of fxns for groups - aws_group*/aws_groups*
  • suite of fxns for roles - aws_role*/aws_roles*
  • suite of fxns for policies - aws_policy*/aws_policies* - some of these fxns used for attaching policies to users, groups, roles

so in the end we could have a workflow like:

# In each case below, aws_policy_attach determines from its input whether
# it's a group, role, or user, and prefixes the policy with `arn:aws:iam::aws:policy`
aws_group_create("testers") %>% aws_policy_attach("ReadOnlyAccess")
aws_role_create("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")
aws_user_create("jane") %>% aws_policy_attach("AdministratorAccess")

# or if already created, then:
aws_role("ReadOnlyRole") %>% aws_policy_attach("ReadOnlyAccess")

Another example

aws_group_add_users(group = "testers", 
  aws_user_create("jane"),
  aws_user_create("sally"),
  aws_user_create("susy")
)

@seankross feedback plz

Vignette for working with databases

See also #32.

How to work with databases with sixtyfour; make sure to include:

  • security groups and how to deal with them
  • errors you might see and what they mean and how to fix
  • the suggested workflow (create, get a DBI connection, use dplyr, etc. to work with data) - see the sketch after this list
  • screenshots probably of the user interface prompts for rds connect
  • what else?
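A sketch of that workflow (connection details are placeholders; the exact functions the vignette uses may differ):

# Sketch: create, connect via DBI, then use dplyr/dbplyr
aws_db_rds_create(id = "mydb", engine = "mariadb", class = "db.t3.micro")
con <- DBI::dbConnect(
  RMariaDB::MariaDB(),
  host = endpoint, port = 3306,
  username = user, password = pwd
)
DBI::dbWriteTable(con, "mtcars", mtcars)
dplyr::tbl(con, "mtcars") |>
  dplyr::filter(cyl == 4) |>
  dplyr::collect()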

Bucket (and file?) policies

At least I currently don't have permission to modify bucket ACLs, so can't test and make sure that aws_bucket_acl_modify works.

Perhaps with the new test AWS account I'll be able to test this.

Is return value of aws_bucket_upload correct?

Messing around uploading folders in the sixtyfour working directory, I get:

> aws_bucket_upload("man", "dasl-project1")
[1] "s3://Users/skross/Developer/sixtyfour/man"

The return value strikes me as weird; that's not the path in the bucket (the actual bucket contents, shown below, are correct):

> aws_bucket_list_objects("dasl-project1") |> head() |> glimpse()
Rows: 6
Columns: 8
$ bucket_name   <chr> "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1", "dasl-project1"
$ key           <chr> "account_id.Rd", "as_policy_arn.Rd", "aws_billing.Rd", "aws_billing_raw.Rd", "aws_bucket_add_user.Rd",…
$ uri           <chr> "s3://dasl-project1/account_id.Rd", "s3://dasl-project1/as_policy_arn.Rd", "s3://dasl-project1/aws_bil…
$ size          <fs::bytes> 401, 1.69K, 2.37K, 1.39K, 1.21K, 1.46K
$ type          <chr> "file", "file", "file", "file", "file", "file"
$ owner         <chr> NA, NA, NA, NA, NA, NA
$ etag          <chr> "\"ee5e5d92046f900647bfa4c0f0ef14cf\"", "\"a4c6748e4380c88eac01878e298f882e\"", "\"ec1816624f1e3e4e221…
$ last_modified <dttm> 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 03:44:29, 2024-04-03 0…

Does the return value of aws_bucket_upload() strike you as weird?

Localstack

  • Can it be used to test some AWS services? It has to work in at least one of: locally, or on CI (see the sketch below for pointing paws at Localstack)
  • if it can be used, document usage wherever #51 is done
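If it does work, pointing paws at it is probably just a matter of overriding the endpoint, e.g. (a sketch, assuming Localstack's default edge port and dummy credentials):

# Sketch: point a paws client at a locally running Localstack
s3_local <- paws::s3(config = list(
  endpoint = "http://localhost:4566",
  credentials = list(creds = list(
    access_key_id = "testing",
    secret_access_key = "testing"
  )),
  region = "us-east-1"
))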

Allow users to set their AWS account within exported functions?

from #10

Possibly via some helper fxn, ideally not through every function, which would add extra work for everyone.

e.g., costexplorer has

costexplorer(
  config = list(),
  credentials = list(),
  endpoint = NULL,
  region = NULL
)

We initialize that list of functions for Cost Explorer ourselves; we could have helper funs that let users set creds so that whenever it initializes, those creds are used (see the sketch below).

Originally posted by @sckott in #10 (comment)
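One possible shape for such a helper (a sketch; names are hypothetical):

# Hypothetical helper: stash credentials once, then reuse them whenever a
# paws client (e.g. costexplorer) is initialized internally
the <- new.env(parent = emptyenv())

aws_configure <- function(access_key_id = NULL, secret_access_key = NULL,
                          region = NULL) {
  the$config <- list(
    credentials = list(creds = list(
      access_key_id = access_key_id,
      secret_access_key = secret_access_key
    )),
    region = region
  )
  invisible(the$config)
}

# then internally, something like: paws::costexplorer(config = the$config)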

user with write permissions for bucket unable to upload folders or files

On branch more-magic

> aws_user_current()
[1] "sean"
> six_bucket_permissions("dasl-project1")
# A tibble: 3 × 4
  user  permissions policy_write             policy_admin
  <chr> <chr>       <chr>                    <chr>       
1 amy   write       S3FullAccessDaslProject1 NA          
2 scott admin       NA                       NA          
3 sean  admin       NA                       NA    
...
> library(sixtyfour)
> aws_user_current()
[1] "amy"
> aws_bucket_upload("data-raw", "dasl-project1")
Error: AccessDenied (HTTP 403). Access Denied
> aws_file_upload("DESCRIPTION", "s3://dasl-project1")
Error in `map2()`:
ℹ In index: 1.
Caused by error:
! AccessDenied (HTTP 403). Access Denied
Run `rlang::last_trace()` to see where the error occurred.
