
delta-sharing-rs's Introduction

Delta Sharing Server

delta-sharing-rs is a Rust-based Delta Sharing server that includes administration functionality. Unlike the reference implementation of a Delta Sharing server, which primarily focuses on the API specification and uses static file-based sharing information, delta-sharing-rs manages its sharing information through an API.

Supported Platforms

| Amazon AWS | Google GCP | Microsoft Azure |
| --- | --- | --- |
| 🟩 | 🟩 | 🟥 |

Configure Credentials for Cloud Storage Backends

  1. Amazon AWS

To access the S3 Delta table backend, create an IAM user with an Amazon S3 permissions policy, then configure the profile name and region so that the Delta Sharing server can access the S3 bucket. The location of the credentials file is specified by the environment variable AWS_SHARED_CREDENTIALS_FILE; if this variable is not set, the credentials file should be located at ~/.aws/credentials. delta-sharing-rs uses the Object Store crate with the aws-config feature, which requires the AWS_PROFILE and AWS_REGION environment variables when you use the S3 Delta table backend.
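The lookup rules above can be checked before starting the server. The following is a minimal sketch, not part of delta-sharing-rs: the helper name `resolve_aws_settings` is hypothetical, and it only mirrors the fallback to ~/.aws/credentials and the two required variables described in this section.

```python
import os


def resolve_aws_settings(env):
    """Collect the AWS settings described above from an environment mapping."""
    # AWS_SHARED_CREDENTIALS_FILE wins; otherwise fall back to ~/.aws/credentials.
    credentials_file = env.get("AWS_SHARED_CREDENTIALS_FILE") or os.path.expanduser(
        "~/.aws/credentials"
    )
    # AWS_PROFILE and AWS_REGION are both required for the S3 backend.
    missing = [name for name in ("AWS_PROFILE", "AWS_REGION") if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "credentials_file": credentials_file,
        "profile": env["AWS_PROFILE"],
        "region": env["AWS_REGION"],
    }


# Example with an explicit mapping; pass os.environ to check the real environment.
settings = resolve_aws_settings({"AWS_PROFILE": "default", "AWS_REGION": "us-east-1"})
print(settings["credentials_file"])
```

Calling `resolve_aws_settings(os.environ)` in a startup script would fail early with a readable message instead of an error from the storage backend.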

  2. Google GCP

To access the GCS Delta table backend, you need to create a GCS service account. The location of the GCP service account private key JSON is specified by the environment variable GOOGLE_APPLICATION_CREDENTIALS. If this variable is not set, the private key JSON file should be located at ~/.gcp/service-account-file.json.

  3. Microsoft Azure

Microsoft Azure backed Delta tables will be supported in the near future.

Starting Delta Sharing Server

Since the project was donated on 07/07/2023, no dedicated Docker repository offering the latest image of delta-sharing-rs has been set up yet. You will therefore need to build your own release manually. To build the release binary, run the following command in the project directory:

$ just build

The following files might be helpful for you in creating your own docker-compose file.

You can start Delta Sharing using one of the following two options:

  1. Docker Hub
  2. Docker Compose

Please choose the option that best fits your needs and follow the instructions in the corresponding link to start Delta Sharing. You can also find deployment examples here. Note that, as a result of the donation, these two repositories are not part of the delta-incubator project, but they still provide the latest image of the official build. This arrangement may change in the future.

Starting the Development Server

Since the implementation is still in the early stages, only the development server is currently available. A Helm chart will be added to the project in the near future.

To run the development server, execute the following commands in this directory:

 $ just docker
 $ just server

To run the unit tests, execute the following commands in this directory:

 $ just docker
 $ just test
 $ just testdb

Create a New Sharing via the API

Once you've started the development server, you can create a new sharing via the API. Follow these steps:

  1. Log in to the Delta Sharing server and obtain an admin access token by running the following command:
 $ curl -s -X POST http://localhost:8080/admin/login -H "Content-Type: application/json" -d '{"account": "deltars", "password": "password"}' | jq '.'
{
  "profile": {
    "shareCredentialsVersion": 1,
    "endpoint": "http://127.0.0.1:8080",
    "bearerToken": "YOUR_ADMIN_ACCESS_TOKEN",
    "expirationTime": "2023-04-09 19:34:04 UTC"
  }
}
  2. Register a new share by running the following command:
 $ curl -s -X POST "http://localhost:8080/admin/shares" -H "Authorization: Bearer YOUR_ADMIN_ACCESS_TOKEN" -H "Content-Type: application/json" -d '{ "name": "share1" }' | jq '.'
{
  "share": {
    "id": "6986c361-5e6a-4554-b698-11875d6598e0",
    "name": "share1"
  }
}
  3. Register a new table by running the following command:
 $ curl -s -X POST "http://localhost:8080/admin/tables" -H "Authorization: Bearer YOUR_ADMIN_ACCESS_TOKEN" -H "Content-Type: application/json" -d '{ "name": "table1", "location": "s3://delta-sharing-test/examination" }' | jq '.'
{
  "table": {
    "id": "579df9cd-a674-459d-9599-d38d54583cd0",
    "name": "table1",
    "location": "s3://delta-sharing-test/examination"
  }
}
  4. Add table1 to schema1 in share1 by running the following command:
 $ curl -s -X POST "http://localhost:8080/admin/shares/share1/schemas/schema1/tables" -H "Authorization: Bearer YOUR_ADMIN_ACCESS_TOKEN" -H "Content-Type: application/json" -d '{ "table": "table1" }' | jq '.'
{
  "schema": {
    "id": "689ed733-bec8-4796-a2dd-4f82dce6beab",
    "name": "schema1"
  }
}
  5. Issue a new recipient profile by running the following command:
 $ curl -s -X GET "http://localhost:8080/admin/profile" -H "Authorization: Bearer YOUR_ADMIN_ACCESS_TOKEN" -H "Content-Type: application/json" | jq '.'
{
  "profile": {
    "shareCredentialsVersion": 1,
    "endpoint": "http://127.0.0.1:8080",
    "bearerToken": "YOUR_RECIPIENT_ACCESS_TOKEN",
    "expirationTime": "2023-04-09 19:55:19 UTC"
  }
}
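
The five curl calls above can also be scripted. The sketch below uses only the Python standard library and assumes the development server at http://localhost:8080 and the request and response shapes shown in the steps. The helper names `admin_request`, `send`, and `create_sharing` are hypothetical and not part of delta-sharing-rs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # development server address used in the steps above


def admin_request(method, path, token=None, body=None):
    """Build (but do not send) a JSON request for an admin endpoint."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = "Bearer " + token
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def create_sharing(account, password):
    """Log in, register share1/table1, attach the table to schema1, and return a profile."""
    login = send(admin_request("POST", "/admin/login",
                               body={"account": account, "password": password}))
    token = login["profile"]["bearerToken"]
    send(admin_request("POST", "/admin/shares", token, {"name": "share1"}))
    send(admin_request("POST", "/admin/tables", token,
                       {"name": "table1", "location": "s3://delta-sharing-test/examination"}))
    send(admin_request("POST", "/admin/shares/share1/schemas/schema1/tables", token,
                       {"table": "table1"}))
    return send(admin_request("GET", "/admin/profile", token))["profile"]
```

With the development server running, `create_sharing("deltars", "password")` would return the recipient profile as a dictionary. Building the request separately from sending it keeps the request construction easy to inspect without a live server.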

Delta Sharing Configuration

All TOML, JSON, YAML, INI, RON, and JSON5 files located in the configuration directory will be loaded as configuration files [1]. The path to the configuration directory can be set using the DELTA_SHARING_RS_CONF_DIR environment variable. You can also configure Delta Sharing using the corresponding environment variables, which is helpful when setting up a Kubernetes cluster [2]. Please make sure that the environment variables AWS_SHARED_CREDENTIALS_FILE and GOOGLE_APPLICATION_CREDENTIALS are set properly if necessary. Below is a list of the configuration variables:

| Name | Environment Variable | Required | Description |
| --- | --- | --- | --- |
| db_url | DELTA_SHARING_RS_DB_URL | yes | URL of the PostgreSQL server |
| server_addr | DELTA_SHARING_RS_SERVER_ADDR | yes | URL of the Delta Sharing server, used in sharing profiles |
| server_bind | DELTA_SHARING_RS_SERVER_BIND | yes | IP address of the Delta Sharing server, used for Axum server binding |
| admin_name | DELTA_SHARING_RS_ADMIN_NAME | yes | Default admin user name |
| admin_email | DELTA_SHARING_RS_ADMIN_EMAIL | yes | Default admin user email |
| admin_password | DELTA_SHARING_RS_ADMIN_PASSWORD | yes | Default admin user password |
| admin_namespace | DELTA_SHARING_RS_ADMIN_NAMESPACE | yes | Default admin user namespace |
| admin_ttl | DELTA_SHARING_RS_ADMIN_TTL | yes | Default admin user access token TTL in seconds |
| signed_url_ttl | DELTA_SHARING_RS_SIGNED_URL_TTL | yes | Validity period of signed URLs for cloud backends, in seconds |
| jwt_secret | DELTA_SHARING_RS_JWT_SECRET | yes | JWT secret key |
| use_json_log | DELTA_SHARING_RS_USE_JSON_LOG | yes | If set to true, logs are output in JSON format |
| log_filter | DELTA_SHARING_RS_LOG_FILTER | yes | Tracing log filter |
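
As the footnotes state, a value set through an environment variable takes precedence over the same variable in a configuration file. The sketch below illustrates only that precedence rule and does not reproduce the loader delta-sharing-rs actually uses. The function name `merge_config` is hypothetical, and the name mapping (strip the DELTA_SHARING_RS_ prefix, then lowercase) is inferred from the table above.

```python
ENV_PREFIX = "DELTA_SHARING_RS_"


def merge_config(file_values, env):
    """Overlay DELTA_SHARING_RS_* environment variables on file-based values."""
    merged = dict(file_values)
    for name, value in env.items():
        # CONF_DIR only locates the configuration directory; it is not a setting itself.
        if name.startswith(ENV_PREFIX) and name != ENV_PREFIX + "CONF_DIR":
            merged[name[len(ENV_PREFIX):].lower()] = value
    return merged


file_values = {"db_url": "postgres://from-file", "admin_ttl": "3600"}
env = {"DELTA_SHARING_RS_DB_URL": "postgres://from-env", "PATH": "/usr/bin"}
print(merge_config(file_values, env))
```

In this example `db_url` comes from the environment while `admin_ttl` keeps its file value, which matches how a Kubernetes deployment would typically override a few settings per environment.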

API

SEE ALSO

| Status | Official | Method | URL |
| --- | --- | --- | --- |
| ✔️ | 🟥 | GET | /swagger-ui |
| ✔️ | 🟥 | POST | /admin/login |
| ✔️ | 🟥 | GET | /admin/profile |
| ✔️ | 🟥 | GET | /admin/accounts |
| ✔️ | 🟥 | POST | /admin/accounts |
| ✔️ | 🟥 | GET | /admin/accounts/{account} |
| ✔️ | 🟥 | POST | /admin/shares |
| ✔️ | 🟥 | GET | /admin/tables |
| ✔️ | 🟥 | POST | /admin/tables |
| ✔️ | 🟥 | GET | /admin/tables/{table} |
| ✔️ | 🟥 | POST | /admin/shares/{share}/schemas/{schema}/tables |
| | 🟥 | POST | /admin/shares/{share}/all-tables |
| ✔️ | 🟩 | GET | /shares |
| ✔️ | 🟩 | GET | /shares/{share} |
| ✔️ | 🟩 | GET | /shares/{share}/schemas |
| ✔️ | 🟩 | GET | /shares/{share}/schemas/{schema}/tables |
| ✔️ | 🟩 | GET | /shares/{share}/all-tables |
| ✔️ | 🟩 | GET | /shares/{share}/schemas/{schema}/tables/{table}/version |
| ✔️ | 🟩 | GET | /shares/{share}/schemas/{schema}/tables/{table}/metadata |
| ✔️ | 🟩 | POST | /shares/{share}/schemas/{schema}/tables/{table}/query |
| | 🟩 | GET | /shares/{share}/schemas/{schema}/tables/{table}/changes |
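
On the recipient side, the protocol routes listed above are called with the bearer token from a recipient profile. The sketch below builds such requests with the Python standard library. The helpers `table_url` and `recipient_request` are hypothetical, and the `limitHint` body field is taken from the Delta Sharing protocol specification rather than from this README, so treat it as an assumption about what the server accepts.

```python
import json
import urllib.parse
import urllib.request


def table_url(profile, share, schema, table, action):
    """Build the URL of a table-level route such as version, metadata, or query."""
    segments = [urllib.parse.quote(name, safe="") for name in (share, schema, table)]
    base = profile["endpoint"].rstrip("/")
    return "{}/shares/{}/schemas/{}/tables/{}/{}".format(base, *segments, action)


def recipient_request(profile, method, url, body=None):
    """Build (but do not send) a protocol request authorized by the profile token."""
    headers = {"Authorization": "Bearer " + profile["bearerToken"]}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


profile = {"endpoint": "http://127.0.0.1:8080", "bearerToken": "YOUR_RECIPIENT_ACCESS_TOKEN"}
list_shares = recipient_request(profile, "GET", profile["endpoint"] + "/shares")
query = recipient_request(
    profile, "POST", table_url(profile, "share1", "schema1", "table1", "query"), {"limitHint": 10}
)
print(query.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running server.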

TODO

  • API
    • CDF Related API
    • Microsoft Azure Pre-Signed URL
  • Documentation
    • README
    • Wiki
  • DevOps
    • Dockerfile
  • Admin Console (React/Frontend)
  • Data Access Audit
    • Enrich Access Log
    • Share Namespaces
    • Token Blacklist

References

Official

  1. Delta Sharing: An Open Protocol for Secure Data Sharing

You can find the Delta Sharing open protocol specification here.

  2. Open source self-hosted Delta Sharing server

My blog post on the official Delta Lake community.

Related Projects

  1. Riverbank

This project originally started as the kotosiro-sharing project and was strongly motivated by the riverbank project. Without the preceding riverbank project, development would have been much harder.

  2. delta-rs

Needless to say, this great Rust crate allowed me low level access to Delta tables in Rust.

Contributing

We encourage you to reach out, and we are committed to providing a welcoming community.

Footnotes

  1. An example configuration can also be found in the config directory.

  2. When delta-sharing-rs detects duplicated configuration variables, the values from environment variables take precedence over those from configuration files.

delta-sharing-rs's People

Contributors

ognis1205, r3stl355, roeap, rtyler, tdikland


delta-sharing-rs's Issues

provide client implementation

Problem description

The APIs implemented by the delta-sharing-rs project go beyond the pure protocol specification. Thus a dedicated client implementation is warranted to interact with the management / metadata APIs on top of the sharing protocol.

Solution

Provide a client implementation to support protocol and additional delta-sharing-rs APIs.

Alternative solutions

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

basic token handling capabilities

Problem description

Currently, the next generation of the sharing server cannot properly handle authorization tokens, which are part of the Delta Sharing spec. While we want to enable adopters to inject their own token management, we need to provide some basics to run a secured server.

Once we do this it would of course be great to also support #46 to not expose tokens to the whole world.

Solution

Provide basic abstractions and implementations to handle tokens / profiles.

Alternative solutions

No response


Separate protocol routes from admin routes

Currently the admin routes and protocol routes (also known as the guest routes) are coupled. Separating them would allow people to implement their own admin routes, while still depending on a sound implementation of the open Delta Sharing protocol.

Microsoft Azure backed Delta tables

Contact Details

[email protected]

Is your feature request related to a problem? Please describe.

I was trying to connect sharing server with ADLS gen 2, but I am getting internal server error once I execute request.

Describe the solution you'd like.

Hi,
is supporting Microsoft Azure backed Delta tables in development? Is there some estimate when it is expected to be done, or if help is needed, I can try to help, with some minor guidelines and info from someone in charge :) ?
Thanks in advance!


Add documentation site

Contact Details

No response

Is your feature request related to a problem? Please describe.

There aren't any docs.

Describe the solution you'd like.

It would be cool to have a documentation site on how to use this project, like this one: https://delta-io.github.io/delta-rs/

We should probably wait till this project is promoted to the delta-io organization, so the URL is stable.



provide fine granular access to table partitions

Problem description

Adopters may want to share only specific partitions with their recipients. Currently, access to shares is binary. We may want to also allow sharing only partial access to Delta tables.

Solution

The most straightforward approach is to track a query that we can evaluate against file metadata, much like file skipping, to decide whether a user may access a specific file in a table.
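
As a rough illustration of this idea, the sketch below filters a table's files by their partition values before they would be returned to a recipient. The policy shape (a mapping from partition column to the set of allowed values) and the function name `visible_files` are hypothetical. The `partitionValues` key mirrors the field of that name on Delta file entries.

```python
def visible_files(files, allowed_partitions):
    """Return only the files whose partition values the recipient may read.

    allowed_partitions maps a partition column to the set of values the
    recipient may see; columns that are not listed are unrestricted.
    """
    return [
        f
        for f in files
        if all(
            f["partitionValues"].get(column) in allowed
            for column, allowed in allowed_partitions.items()
        )
    ]


files = [
    {"path": "country=JP/part-0.parquet", "partitionValues": {"country": "JP"}},
    {"path": "country=US/part-0.parquet", "partitionValues": {"country": "US"}},
]
print([f["path"] for f in visible_files(files, {"country": {"JP"}})])
```

Because the check uses only file metadata, no data files need to be opened to decide what a recipient may receive.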

Alternative solutions

No response


Move URL signing to object_store crate

In the discussion on #22, @tustvold and @roeap pointed out that the Rust ecosystem is building abstractions over object stores (and consequently URL signing) in the object_store crate. Moving to object_store will simplify the implementation.

Dependencies

Before object_store can be integrated, it needs to support signing URLs for Azure and GCP. Alternatively we could also implement the Signer trait in this crate, but I think that would be wasteful.

References:

object_store crate

Introduce extension points for share storage

Currently the Delta Sharing server depends on a PostgreSQL database to store information about the defined shares, schemas, and tables. For a variety of reasons, users may want to use a different store. My proposed solution is to introduce a ShareStore trait (open to naming suggestions) and let the app state depend on a concrete implementation, either defined in a library or supplied by the user.
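
The shape of such an extension point can be sketched as follows. The proposal is for a Rust trait, so this Python abstract base class is only an analogy, and the method names and the in-memory implementation are hypothetical.

```python
from abc import ABC, abstractmethod


class ShareStore(ABC):
    """Storage interface for shares, schemas, and tables."""

    @abstractmethod
    def add_table(self, share, schema, table, location):
        """Register a table location under a share and schema."""

    @abstractmethod
    def list_shares(self):
        """Return the names of all known shares."""

    @abstractmethod
    def list_tables(self, share, schema):
        """Return a mapping of table name to location for one schema."""


class InMemoryShareStore(ShareStore):
    """A store backed by a dictionary, useful for tests or small deployments."""

    def __init__(self):
        self._tables = {}  # (share, schema) -> {table: location}

    def add_table(self, share, schema, table, location):
        self._tables.setdefault((share, schema), {})[table] = location

    def list_shares(self):
        return sorted({share for share, _schema in self._tables})

    def list_tables(self, share, schema):
        return dict(self._tables.get((share, schema), {}))


store = InMemoryShareStore()
store.add_table("share1", "schema1", "table1", "s3://delta-sharing-test/examination")
print(store.list_shares())
```

Request handlers would then depend on `ShareStore` alone, so a database-backed implementation and a user-supplied one become interchangeable.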

Move azure signing to object_store crate

Contact Details

No response

Is your feature request related to a problem? Please describe.

When trying to update dependencies, there were some issues since our server (axum) and azure crates have conflicting dependencies on the base64 crate.

Updating dependencies is particularly important, since right now we have a large number of security findings from dependabot.

Describe the solution you'd like.

Use object_store for URL signing, such that we no longer need to pull in azure-* dependencies.


Allow for configuring multiple locations per provider

Contact Details

No response

Is your feature request related to a problem? Please describe.

The current setup assumes that there is either a single bucket per provider, or at least that all locations share the same credential.

We should allow for configuring dedicated locations, possibly assigning a default credential that is used, when no dedicated credential for that provider is available.

Describe the solution you'd like.

Define some sort of registry that takes the form of the well-known URLs, including the host.

  • az://container <-- requires additional account info
  • s3://bucket
  • gs://bucket
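
One way to picture such a registry is a lookup keyed by URL scheme and host, with a per-provider default as the fallback. The sketch below is hypothetical throughout: the class name, its methods, and the use of plain strings as credentials are illustrative only.

```python
from urllib.parse import urlparse


class CredentialRegistry:
    """Map storage locations to credentials, with a default per provider."""

    def __init__(self):
        self._by_location = {}  # (scheme, host) -> credential
        self._defaults = {}  # scheme -> credential

    def register(self, location, credential):
        """Assign a dedicated credential to a location such as s3://bucket."""
        parsed = urlparse(location)
        self._by_location[(parsed.scheme, parsed.netloc)] = credential

    def set_default(self, scheme, credential):
        """Assign the credential used when a location has no dedicated one."""
        self._defaults[scheme] = credential

    def lookup(self, table_location):
        """Return the dedicated credential, else the provider default, else None."""
        parsed = urlparse(table_location)
        key = (parsed.scheme, parsed.netloc)
        if key in self._by_location:
            return self._by_location[key]
        return self._defaults.get(parsed.scheme)


registry = CredentialRegistry()
registry.set_default("s3", "default-aws-profile")
registry.register("s3://delta-sharing-test", "test-bucket-profile")
print(registry.lookup("s3://delta-sharing-test/examination"))
```

A table in a registered bucket resolves to its dedicated credential, and any other s3:// location falls back to the provider default.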

