Basin

Extract, transform, load (ETL) using visual programming, with Spark jobs that can run in any environment.

Create and debug pipelines in your browser, then export them to pure Python code.

Basin screenshot

Features

  • Get up and running with a single docker pull

  • Create complex pipelines and flows using drag and drop

  • Debug and preview step by step

  • Integrated dataview grid viewer for easier debugging

  • Auto-generates comments so you don't have to

  • Export to beautiful, pure Python code

  • Build artifacts for AWS Glue deployment (Work in progress)

Install

Install from Docker Hub

$ docker pull zalmane/basin:latest

Create data folder

$ mkdir data

This folder holds all input and output files.
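To have something to extract on a first run, a small input file can be placed in the data folder. The sketch below is only an illustration: the file name `sample_orders.csv`, its columns, and the helper `write_sample_csv` are hypothetical and not part of Basin.

```python
import csv
from pathlib import Path


def write_sample_csv(data_dir="data", filename="sample_orders.csv"):
    """Write a tiny CSV file into the data folder and return its path.

    The file name and columns are made up for illustration only.
    """
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir data`
    path = folder / filename
    rows = [
        (1, "alice", 12.50),
        (2, "bob", 7.25),
    ]
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "customer", "amount"])  # header row
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    print(write_sample_csv())
```

Once the container is started with the data folder mounted, the file should be visible to it under /opt/basin/data.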

Run image

Run the image, mapping the data directory to your local environment. This is where input and output files go (extract and load).

docker run --rm -d -v $PWD/data:/opt/basin/data --name basin_server -p 3000:3000 zalmane/basin:latest

That's it. Point your browser to http://localhost:3000 and you're done!

Notes:

  • Metadata is stored in the browser's IndexedDB.

Install from source

Install dev environment with docker

docker-compose up

This sets up two containers: basin-client and basin-server.

That's it. Point your browser to http://localhost:8860 and you're done!

To run npm commands in the basin-client container use:

docker exec basin-client npm <command>

To apply changes made to Python files (block templates, lib), use:

docker exec basin-client npm run build-py

Getting started

Creating sources

A source defines the information needed to parse and import a dataset. Sources are referenced when using an Extract block. The source defines the following information:

  • type of file (delimited, fixed width, JSON, Parquet)
  • regular expression used to identify the file; it is matched against the file name
  • information about headers and footers
  • metadata specific to the file type (for CSV, this includes the delimiter, for example)
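As a rough illustration of the fields listed above, a source could be modeled as a small record whose regular expression is tested against incoming file names. This is a sketch only: the `Source` class, its field names, and `find_source` are hypothetical and do not reflect Basin's actual metadata format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical model of a source; all field names are assumptions."""
    name: str
    file_type: str        # "delimited", "fixed width", "json" or "parquet"
    filename_regex: str   # matched against the file name
    header_rows: int = 0  # number of header lines
    footer_rows: int = 0  # number of footer lines
    options: dict = field(default_factory=dict)  # type-specific metadata

    def matches(self, filename: str) -> bool:
        """Return True if the regular expression matches the file name."""
        return re.search(self.filename_regex, filename) is not None


def find_source(sources, filename):
    """Return the first source whose regex matches filename, or None."""
    for source in sources:
        if source.matches(filename):
            return source
    return None


sources = [
    Source("orders", "delimited", r"^orders_\d{8}\.csv$",
           header_rows=1, options={"delimiter": ","}),
    Source("events", "json", r"^events.*\.json$"),
]

if __name__ == "__main__":
    for fname in ("orders_20200131.csv", "events-web.json", "notes.txt"):
        match = find_source(sources, fname)
        print(fname, "->", match.name if match else None)
```

Anchoring the patterns with `^` and `$` keeps a file such as `old_orders_20200131.csv.bak` from matching the `orders` source by accident.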

Creating a flow

Running and debugging a flow

Exporting to Python code

Configuration

Extending

Creating new block types

Each block type consists of:

  • Descriptor JSON
  • Code template
  • Optional code library template
  • Properties panel
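To make the relationship between these parts more concrete, here is a minimal sketch of how a descriptor and a code template might combine into generated Python. Everything in it is an assumption for illustration: the descriptor keys, the `filter_rows` block, the `$placeholder` template syntax (Python's `string.Template`), and the `render_block` helper are hypothetical and are not Basin's actual formats.

```python
import json
from string import Template

# Hypothetical descriptor: names the block and declares its editable properties.
descriptor = json.loads("""
{
  "type": "filter_rows",
  "label": "Filter rows",
  "properties": [{"name": "condition", "type": "string"}]
}
""")

# Hypothetical code template: placeholders are filled with the values
# a user would enter in the properties panel.
code_template = Template(
    "# $label\n"
    "$output = $input.filter(\"$condition\")\n"
)


def render_block(descriptor, template, values):
    """Check that every declared property has a value, then render the code."""
    missing = [p["name"] for p in descriptor["properties"]
               if p["name"] not in values]
    if missing:
        raise ValueError(f"missing properties: {missing}")
    return template.substitute(label=descriptor["label"], **values)


code = render_block(
    descriptor,
    code_template,
    {"condition": "amount > 0", "input": "df_in", "output": "df_out"},
)

if __name__ == "__main__":
    print(code)
```

In this sketch the descriptor drives validation and the template drives code generation, so a new block type would need no changes to the rendering logic.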

Descriptor

Code template

Code library template

Properties panel

License

This program is free software: you can redistribute it and/or modify it under the terms of the Server Side Public License, version 1, as published by MongoDB, Inc. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Server Side Public License for more details. You should have received a copy of the Server Side Public License along with this program. If not, see http://www.mongodb.com/licensing/server-side-public-license

Copyright © 2018-2020 G.M.M Ltd.
