
quilt's Introduction


Quilt is a self-organizing data hub

Python Quick start, tutorials

If you have Python and an S3 bucket, you're ready to create versioned datasets with Quilt. Visit the Quilt docs for installation instructions, a quick start, and more.


Who is Quilt for?

Quilt is for data-driven teams and offers features for coders (data scientists, data engineers, developers) and business users alike.

What does Quilt do?

Quilt manages data like code so that teams in machine learning, biotech, and analytics can experiment faster, build smarter models, and recover from errors.

How does Quilt work?

Quilt consists of a Python client, a web catalog, and Lambda functions, all of which are open source, plus a suite of backend services and Docker containers orchestrated by CloudFormation.

The backend services are available under a paid license on quiltdata.com.

Use cases

  • Share data at scale. Quilt wraps AWS S3 to add simple URLs, web previews for large files, and sharing via email address (no need to create an IAM role).
  • Understand data better through inline documentation (Jupyter notebooks, Markdown) and visualizations (Vega, Vega-Lite).
  • Discover related data by indexing objects in Elasticsearch.
  • Model data by providing a home for large data and models that don't fit in git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages").
  • Decide by broadening data access within the organization and supporting the documentation of decision processes through auditable versioning and inline documentation.
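
The idea of immutable, versioned data sets can be illustrated with a small content-addressing sketch. This is not Quilt's actual manifest format or API; the function names (`sha256_hex`, `build_manifest`) and the manifest layout are assumptions made purely for illustration. It shows how a version identifier derived from the contents changes whenever any object changes, while unchanged objects keep the same hash.

```python
import hashlib
import json


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files, message):
    """Build a hypothetical package manifest from {logical_key: bytes}.

    Each entry records the content hash and size of one object. The
    top hash is computed over a canonical JSON encoding of the whole
    manifest body, so it identifies this exact version of the data set.
    """
    entries = {
        key: {"sha256": sha256_hex(content), "size": len(content)}
        for key, content in sorted(files.items())
    }
    body = {"message": message, "entries": entries}
    # Sorted keys and fixed separators make the encoding deterministic.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    top_hash = sha256_hex(canonical.encode("utf-8"))
    return {"top_hash": top_hash, "message": message, "entries": entries}


# Two versions that differ in a single object (illustrative data only).
v1 = build_manifest(
    {"data/train.csv": b"a,b\n1,2\n", "README.md": b"# demo\n"},
    "initial version",
)
v2 = build_manifest(
    {"data/train.csv": b"a,b\n1,3\n", "README.md": b"# demo\n"},
    "initial version",
)

print(v1["top_hash"] != v2["top_hash"])  # the version identifier changed
print(v1["entries"]["README.md"] == v2["entries"]["README.md"])  # unchanged object
```

Because the identifier depends only on the contents, rebuilding a manifest from the same inputs yields the same top hash, which is what makes a version reproducible and auditable in this sketch.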

Roadmap

I - Performance and core services

  • Address performance issues with push (e.g. re-hash)
  • Provide Presto-DB-powered services for filtering package repos with SQL
  • Investigate and implement more efficient manifest formats (e.g. Parquet) that scale to 10M keys; consider abbreviated "fast manifests" for lazy browsing
  • Refactor s3://bucket/.quilt for improved listing and delete performance

II - CI/CD for data

  • Ability to fork/merge packages
  • Data quality monitoring

III - Storage agnostic (support Azure, GCP buckets)

  • Evaluate min.io and ceph.io as shims
  • Evaluate feasibility of on-prem local storage as a repo

IV - Cloud agnostic

  • Evaluate K8s and Terraform to replace CloudFormation
  • Shim lambdas (consider serverless.com)
  • Shim Elasticsearch (consider Solr)
  • Shim IAM via RBAC
