
data-lake-as-code's Issues

Access denied Chembl Datasets

Hi, I am no longer able to access the ChEMBL 29/27/25 datasets on S3, e.g. s3://aws-roda-hcls-datalake/chembl_29/

Error: All access to this object has been disabled (Status Code: 403; Error Code: AllAccessDisabled)

Can someone please confirm that the access is still available?

fix: s3.Bucket.fromBucketName scope argument type

Code version:

Steps to reproduce:

Expected output:

App at '' should be created in the scope of a Stack, but no Stack found
Subprocess exited with error 1

As of AWS CDK 1.70.0, s3.Bucket.fromBucketName expects the type of its first argument (scope) to be Construct, not App.

Proposed solution:

Initialise the S3 bucket in lib/ExampleS3DataSet-stack.ts line 26 instead of passing it as a prop:

sourceBucket: s3.Bucket.fromBucketName(this, 'exampleS3DataSetSourceBucket', '--- YOUR EXISTING BUCKET NAME GOES HERE ---'),
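The error above can be illustrated with a toy model of scope lookup. This is only a sketch of the idea, not CDK's implementation: the classes `Construct`, `App`, `Stack` and the helper `find_stack` are hypothetical stand-ins written for this example. It assumes that a resource locates its owning stack by walking up its chain of scopes, which is why passing the App itself as the scope fails while passing `this` from inside a stack works.

```python
class Construct:
    """Toy stand-in for a construct: a node with an optional parent scope."""

    def __init__(self, scope=None, construct_id=""):
        self.scope = scope
        self.construct_id = construct_id


class App(Construct):
    """Root of the tree; it has no parent scope."""


class Stack(Construct):
    """Deployment unit that resources are expected to live in."""


def find_stack(construct):
    """Walk up the scope chain and return the first enclosing Stack."""
    node = construct
    while node is not None:
        if isinstance(node, Stack):
            return node
        node = node.scope
    raise ValueError("no Stack found in the scope chain")


app = App()
stack = Stack(app, "ExampleS3DataSetStack")

# Bucket reference created inside the stack (the proposed fix).
bucket_in_stack = Construct(stack, "exampleS3DataSetSourceBucket")

# Bucket reference created directly under the App (the failing case).
bucket_in_app = Construct(app, "exampleS3DataSetSourceBucket")

print(find_stack(bucket_in_stack).construct_id)
try:
    find_stack(bucket_in_app)
except ValueError as err:
    print("lookup failed:", err)
```

In this model the first lookup succeeds because the stack sits between the bucket and the App, and the second fails because the chain contains only the App.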

Trying to create cloud formation template causes error

I'm consistently getting the following error when trying to create the data lake as code from the CloudFormation template. I don't seem to have access to the template. Is it no longer public?

S3 error: The bucket you are attempting to access must be addressed using the specified endpoint.

Submission Summary table in RODA Clinvar dataset not queryable in Athena

I'm trying to demo Athena using the pre-built RODA Clinvar dataset. I deployed using the stack here: https://github.com/aws-samples/data-lake-as-code/tree/roda#readme.

The stack deploys without any issues noted, and after that I'm able to query most of the tables from Athena. However, when I run the following command in Athena:

SELECT * FROM "clinvar_summary_variants_dl-awsroda"."submission_summary";

The count of records returned is as expected, but the values in every column are empty.

Notably, there appears to be a disconnect between the column names in the glue data catalog and the column names in the corresponding parquet metadata: in the data catalog columns are assigned distinct names (e.g. variationid, clinicalsignificance,...) in accordance with the official clinvar file definition README, but in the parquet file for this table each column appears to be named 'col' + autoincremented integer (e.g. col0, col1,...).
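The mismatch described above can be sketched in a few lines. This is a minimal illustration, assuming columns are resolved by name when the table is read; the helper names `read_by_name` and `rename_positional_columns` and the sample values are hypothetical, and only the two catalog column names come from the report.

```python
import re

# Column names as defined in the data catalog (from the report above).
catalog_columns = ["variationid", "clinicalsignificance"]

# One row as stored in the file, keyed by positional names.
# The values are placeholders, not real ClinVar data.
file_row = {"col0": "placeholder-id", "col1": "placeholder-significance"}


def read_by_name(row, columns):
    """Name-based lookup: names missing from the file come back as None."""
    return {name: row.get(name) for name in columns}


def rename_positional_columns(row, columns):
    """Map 'col0', 'col1', ... to the catalog names by position."""
    renamed = {}
    for key, value in row.items():
        match = re.fullmatch(r"col(\d+)", key)
        if match and int(match.group(1)) < len(columns):
            renamed[columns[int(match.group(1))]] = value
        else:
            renamed[key] = value  # leave unrecognized keys untouched
    return renamed


# Row count is preserved, but every value is empty.
print(read_by_name(file_row, catalog_columns))

# After renaming by position, the same lookup finds the values.
fixed_row = rename_positional_columns(file_row, catalog_columns)
print(read_by_name(fixed_row, catalog_columns))
```

Under that assumption the symptom matches the report: the rows are all there, but no catalog column name matches anything in the file, so every value reads as empty.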

No data in "opentargets_latest/targets" S3 path

Hi,

I am trying to launch this Athena query on "opentargets_latest_dl_awsroda" database:

SELECT *
FROM targets
LIMIT 10;

The result is empty. Indeed, following the location recorded in the AWS Glue Data Catalog, it appears that the S3 path "s3://aws-roda-hcls-datalake/opentargets_latest/targets/" does not contain any data.

Is this intentional?

starterAdminPermission Invalid principal

The build scripts call aws sts get-caller-identity and then generate a role ARN from the user ARN. This looks very strange: why should the role name be related to the current user name? Could you please add an explanation? For now I get this error: starterAdminPermission Invalid principal, arn: arn:aws:iam::111111111111:role/user.name (Service: AWSLakeFormation; Status Code: 400; Error Code: InvalidInputException; Request ID: 32d6e4c9-a0e3-4af9-a938-521737333d7a; Proxy: null)

Could you please post the IAM role definition so that I can create it manually?
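The derivation described in this report can be sketched as follows. The actual build script is not shown here, so this is only a guess at its logic: the function name `role_arn_from_caller_arn` is hypothetical, and the input ARN is inferred from the role ARN in the error message.

```python
def role_arn_from_caller_arn(caller_arn):
    """Build a role ARN that reuses the last path segment of the caller ARN.

    ARNs have the form arn:partition:service:region:account-id:resource.
    """
    _, partition, _, _, account_id, resource = caller_arn.split(":", 5)
    name = resource.split("/")[-1]
    return f"arn:{partition}:iam::{account_id}:role/{name}"


# Caller identity of an IAM user (assumed input for this sketch).
caller_arn = "arn:aws:iam::111111111111:user/user.name"

# Yields the principal from the error message. It names a role called
# "user.name", which is only valid if such a role actually exists.
print(role_arn_from_caller_arn(caller_arn))
```

If the script works along these lines, it would only produce a valid principal when the caller's name happens to match an existing role, which is consistent with the Invalid principal error for a plain IAM user.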

OpenTargets: some data in the tables is missing

Hi,

I used OpenTargets CloudFormation to create an AWS Glue catalog and queried data using Athena. However, I recently noticed that some data in the tables is missing, such as in the searchdisease, searchdrug, searchtarget, and molecule tables.

I'm certain that the data was previously there, but I'm not sure why it disappeared, or if it was just my case. I checked and it's possible that the data was removed from the source (https://platform.opentargets.org/downloads/data), and as a result, was also removed from the associated Glue tables.

Could you please check the issue? Thank you!

Best,
Yuki

OpenTargets dataset update in the S3 buckets

Hi Guys,

I'm from OpenTargets. One of our users reported a problem with OT data fetched from S3: the data seems to contain unexplained duplication. We believe the problem might be due to how the data is synced from the EBI FTP. The datasets our pipelines generate via Spark are partitioned into smaller chunks whose filenames contain a release-specific hash. As the hash differs from release to release, the line below probably does not overwrite the content of the S3 buckets; instead, these chunks keep accumulating.

"aws s3 sync opentargets/sourceExports/open-targets-data-releases/latest/ s3://{{openTargetsSourceFileTargetBucketLocation}}/opentargets/sourceExports/latest/"

For more details, please see the issue in our tracker.
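The accumulation described above can be reproduced with a small simulation. This is a sketch only: the `sync` function is a hypothetical in-memory model of a one-way sync, and the file names are invented to show the release-specific hash. The `delete` option models the behavior of the `--delete` flag of `aws s3 sync`, which removes destination files that no longer exist in the source.

```python
def sync(source, target, delete=False):
    """Copy every source file into target; optionally drop stale files."""
    result = dict(target)
    result.update(source)  # new or changed files overwrite by key
    if delete:
        # Keep only files that still exist in the source.
        result = {key: value for key, value in result.items() if key in source}
    return result


# Two releases of the same dataset; only the hash in the name differs.
release_1 = {"targets/part-00000-aaa111.parquet": "release 1 rows"}
release_2 = {"targets/part-00000-bbb222.parquet": "release 2 rows"}

# Plain sync: the old chunk is never overwritten, so both remain.
bucket = sync(release_1, {})
bucket = sync(release_2, bucket)
print(sorted(bucket))

# Sync with deletion: only the current release remains.
clean_bucket = sync(release_1, {})
clean_bucket = sync(release_2, clean_bucket, delete=True)
print(sorted(clean_bucket))
```

In the plain case a reader of the `targets/` prefix sees rows from both releases, which would look like duplicated data.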

Access denied on S3 buckets

Hi!

I'm trying to access the yt8m dataset, but I'm getting an access error:

$ aws s3 ls --no-sign-request s3://aws-roda-ml-datalake/yt8m_ods/

An error occurred (AllAccessDisabled) when calling the ListObjectsV2 operation: All access to this object has been disabled

A couple of weeks ago the same command worked fine. Could you please suggest how to get the data with the S3 client?
