puffin's People

Contributors

ghalimi

puffin's Issues

How should values of DuckDB's HUGEINT be cast to an Iceberg type?

DuckDB supports the HUGEINT type for signed sixteen-byte (128-bit) integers, but Apache Iceberg does not. How should values of this type be cast? A pragmatic option would be to use Iceberg's long type (64-bit signed integers), but any value outside the 64-bit range would not fit, which means a significant loss of information. Assuming this is acceptable, what should be done with values that are out of bounds? Otherwise, which alternative options should be considered?

Link: Types
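
To make the out-of-bounds question concrete, here is a minimal sketch of three possible policies (raise an error, return NULL, or saturate at the bounds). It uses plain Python integers as a stand-in for HUGEINT values; the function name and the policy names are hypothetical and are not part of DuckDB or Iceberg.

```python
# Sketch only: cast_hugeint_to_long and its policy names are hypothetical.
LONG_MIN = -(2**63)      # smallest 64-bit signed integer
LONG_MAX = 2**63 - 1     # largest 64-bit signed integer


def cast_hugeint_to_long(value, on_overflow="error"):
    """Cast a Python int standing in for a HUGEINT value to a 64-bit long."""
    # NULLs and in-range values pass through unchanged.
    if value is None or LONG_MIN <= value <= LONG_MAX:
        return value
    if on_overflow == "error":
        raise OverflowError(f"{value} does not fit in a 64-bit signed integer")
    if on_overflow == "null":
        return None
    if on_overflow == "saturate":
        # Clamp to the nearest representable bound.
        return LONG_MAX if value > LONG_MAX else LONG_MIN
    raise ValueError(f"unknown overflow policy: {on_overflow!r}")
```

Raising an error is the only policy that never changes data silently; the other two trade correctness for availability and would probably need to be opt-in.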

SELECT THROUGH

In EDDI.md you propose a SELECT THROUGH syntax (I think this was previously SELECT REMOTE), such as:

SELECT THROUGH 'https://myPuffinDB.com/' * FROM remoteTable;

I would suggest making THROUGH a separate clause instead. The current proposal might read more naturally but, IMHO, it makes less sense semantically because THROUGH sits inside the SELECT clause. The SELECT clause should be about specifying the projections of the relation, and it's not clear to me that THROUGH relates to that.

The following still reads like sensible English while having more semantic separation:

THROUGH 'https://myPuffinDB.com/' SELECT * FROM remoteTable;
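
One practical consequence of the prefix form is that the clause can be peeled off before the rest of the statement is handed to an ordinary SQL parser. A minimal sketch, assuming a hypothetical grammar of THROUGH, a single-quoted URL, then a regular statement (the function name is illustrative and this is not how PuffinDB parses queries):

```python
import re

# Hypothetical grammar: THROUGH '<url>' followed by an ordinary SQL statement.
_THROUGH = re.compile(r"^\s*THROUGH\s+'([^']*)'\s+(.*)$", re.IGNORECASE | re.DOTALL)


def split_through(statement):
    """Return (url, remaining_sql); url is None when there is no THROUGH clause."""
    match = _THROUGH.match(statement)
    if match is None:
        return None, statement.strip()
    return match.group(1), match.group(2).strip()
```

With the SELECT THROUGH form, the same split would require the parser to understand the inside of the SELECT clause first.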

What would be the best way to integrate a client-side DuckDB engine with PuffinDB?

Moving forward, DuckDB will be found everywhere, both client-side and cloud-side. When used client-side, what would be the best way to integrate it with PuffinDB running cloud-side? As a developer, I would like to make a query from my DuckDB client, have it executed cloud-side by PuffinDB, have its result streamed with Apache Arrow to my client, and have that result saved as a local Apache Parquet file, or loaded into my local DuckDB client with a CREATE TABLE. With that in mind, what needs to be changed in DuckDB to make that dataflow as seamless as possible?

And could this dataflow be further improved upon?

Authentication and Authorization

Hello!

I've been thinking a little about authentication and authorisation.

A few assumptions:

  1. DuckDB does not implement roles. I am 90% sure of this, based on a quick scan of the documentation and my working experience.
  2. A role-based approach (possibly inherited from or controlled by IAM, or similar concepts in GCS & Azure) is the way to go.
  3. Extensions in DuckDB are helpful.

My rough proposal is that:

  1. Puffin builds a (potentially lightweight) role system (obviously there is some work here).
  2. This is configured for each user of Puffin, using cloud services to set it up (e.g. Parameter Store in AWS).
  3. This is configured at run time for Puffin users via configuration variables in DuckDB. This is a pattern that works well for S3, but the allowed configurations are limited in DuckDB. Thankfully, extensions allow these configuration variables to be added.

@ghalimi have you thought about auth at all? I am happy to flesh this out a little if the above is agreeable. I think the most important point is leaning into the cloud that puffin is hosted on (point 2 above).
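
To give the rough proposal above something concrete to react to, here is a minimal sketch of a lightweight role check. Everything in it is an assumption: the role names, the `puffin_role` setting, and both functions are hypothetical, and the `settings` dict merely stands in for configuration variables that an extension might expose in DuckDB and that could be populated from a cloud parameter store.

```python
# Hypothetical names throughout: neither DuckDB nor PuffinDB defines these.
ROLES = {
    "reader": {"SELECT"},
    "writer": {"SELECT", "INSERT", "UPDATE", "DELETE"},
}


def resolve_role(settings, default="reader"):
    """Read the role from session settings, falling back to the least-privileged role."""
    role = settings.get("puffin_role", default)
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return role


def is_allowed(settings, statement):
    """Check the first keyword of a statement against the role's permitted actions."""
    words = statement.split()
    if not words:
        return False  # an empty statement is never allowed
    return words[0].upper() in ROLES[resolve_role(settings)]
```

Matching on the first keyword is only a placeholder; a real check would need to work on a parsed statement and on the objects it touches.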
