Pluk

Pluk is a simple dataset management system that stores data in chunks and keeps a virtual filesystem in a database.

The virtual filesystem contains only links to the data chunks, while the real data is split into chunks, each named after the SHA512 hash of its contents.
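For illustration only, the naming scheme can be reproduced with standard tools (the input below is an arbitrary example, not a real chunk):

# the SHA512 digest of a chunk's contents becomes the chunk's file name
echo -n "example chunk contents" | sha512sum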

It supports mounting a dataset filesystem (read-only) using FUSE.

Installation and running

Using docker image

To quickly run pluk in a Docker container, use the kuberlab/pluk:latest image:

docker run -it --rm kuberlab/pluk:latest

Using this git repo

Prerequisites:

  • Go toolchain (with $GOPATH configured)
  • glide (Go dependency manager)

Installation steps:

  • clone the repository
  • run glide install -v
  • run go install -v ./...
  • binaries are saved in $GOPATH/bin and named pluk, plukefs and kdataset

Note: the paths given by the DATA_DIR and DB_NAME environment variables (by default /data and /pluk/pluke.db respectively, see below) must be writable.
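Putting the steps together, a minimal sketch (assuming the repository is cloned under $GOPATH, as Go and glide expect):

git clone https://github.com/kuberlab/pluk.git $GOPATH/src/github.com/kuberlab/pluk
cd $GOPATH/src/github.com/kuberlab/pluk
glide install -v
go install -v ./...
# the binaries appear in $GOPATH/bin
ls $GOPATH/bin/pluk $GOPATH/bin/plukefs $GOPATH/bin/kdataset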

Configuration variables

Several environment variables configure authentication, master-slave communication, and other behavior:

  • DEBUG: if set to true, enables debug log level. Defaults to false.

  • AUTH_VALIDATION: if set, this URL is used to proxy authentication to a third-party service. Pluk forwards the Authorization and Cookie headers to that URL; if the response status code is not in the 4xx/5xx range, authentication succeeds and the result is cached for future requests. Currently this is used with the cloud-dealer service auth.

  • MASTERS: may contain the URL(s) of master pluk instance(s). Instances with masters configured are treated as slaves: a slave re-requests a dataset's file structure from its master, as well as any file chunks that are absent locally. When data is pushed to a slave, the slave reports it to the master to keep the data consistent.

  • INTERNAL_KEY: used for internal slave-to-master requests to skip authentication on the master. In this case the key on the master must be equal to the key on each slave.

  • PLUK_HTTP_PORT: HTTP port the server listens on at startup.

  • DATA_DIR: directory which contains real file chunks. Defaults to /data.

  • DB_TYPE: Database type. Only mysql, postgres and sqlite3 are supported. Defaults to sqlite3.

  • DB_NAME: Database name (or path to sqlite3 database). Defaults to /pluk/pluke.db.

  • DB_HOST: Database server host (for mysql or postgres).

  • DB_PORT: Database server port (for mysql or postgres). Defaults: 5432 for postgres and 3306 for mysql.

  • DB_USER: Database user (for mysql or postgres).

  • DB_PASSWORD: Database password (for mysql or postgres).
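For example, a minimal sketch of running pluk against PostgreSQL in Docker; the host, credentials, and volume path below are placeholders, not defaults:

docker run -it --rm \
  -e DEBUG=true \
  -e DB_TYPE=postgres \
  -e DB_NAME=pluk \
  -e DB_HOST=db.example.com \
  -e DB_PORT=5432 \
  -e DB_USER=pluk \
  -e DB_PASSWORD=secret \
  -e DATA_DIR=/data \
  -v /srv/pluk-data:/data \
  kuberlab/pluk:latest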

Mounting dataset using plukefs

Pluk supports mounting a dataset using FUSE; the FUSE implementation lives in plukefs. To mount a dataset, use either the plukefs binary directly or the docker image kuberlab/plukefs:latest:

plukefs binary:

plukefs --debug -o workspace=<workspace> -o dataset=<dataset-name> \
-o version=<version> -o server=http://<IP>:8082 -o mountPoint=<mount-path>

docker image:

docker run -it --rm --mount \
type=bind,source=<host-mount-path>,target=/mnt/mountpoint,bind-propagation=shared \
--privileged kuberlab/plukefs:latest \
plukefs --debug -o workspace=<workspace> -o dataset=<dataset-name> \
-o version=<version> -o server=http://<IP>:8082 -o mountPoint=/mnt/mountpoint

Note: the --privileged flag is needed to allow using FUSE in Docker.

Note: bind-propagation=shared is needed to allow the host to see mounts that appear in the container.
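A concrete sketch with example values (the workspace demo, dataset mnist, version 1.0.0, local server address, and mount path are assumptions):

plukefs --debug -o workspace=demo -o dataset=mnist \
-o version=1.0.0 -o server=http://127.0.0.1:8082 -o mountPoint=/mnt/mnist

# when finished, unmount with the standard FUSE tool
fusermount -u /mnt/mnist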

CLI reference

Installation:

Download the version for your OS from the kdataset release page

https://github.com/kuberlab/pluk/releases

Uncompress the downloaded tarball.

Copy the kdataset binary to a folder on your PATH, e.g. /usr/bin/ or /usr/local/bin/:

sudo cp kdataset /usr/local/bin

Description

The CLI simplifies dataset download, upload, and authentication.

Once the CLI is installed, a kdataset entry appears in your PATH, so it can be invoked simply by typing kdataset.

To see the help, type kdataset --help.

kdataset provides the following commands:

  • kdataset push <workspace> <dataset-name>:<version>
  • kdataset pull <workspace> <dataset-name>:<version>
  • kdataset list <workspace>
  • kdataset version-list <workspace> <dataset-name>
  • kdataset delete <workspace> <dataset-name>
  • kdataset version-delete <workspace> <dataset-name>:<version>
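For example, a typical round-trip; the workspace and dataset names are placeholders:

# push (upload) a dataset version
kdataset push my-workspace my-dataset:1.0.0

# list datasets in the workspace and versions of one dataset
kdataset list my-workspace
kdataset version-list my-workspace my-dataset

# pull (download) the same version
kdataset pull my-workspace my-dataset:1.0.0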

CLI Configuration

In order to authenticate against the server and resolve the right pluk URL, there must be a config file located at ~/.kuberlab/config by default. If the config file doesn't exist, create it. It contains simple YAML with the following values:

base_url: https://cloud.kibernetika.io/api/v0.2
token: <your-user-token>
# pluk_url: https://cloud.kibernetika.io/pluk/v1 (optional; needed if you want to use another pluk instance)

By default, the pluk URL is derived automatically from base_url in the YAML config. It can also be passed to the CLI via:

  • config value pluk_url
  • --url parameter of kdataset CLI, e.g. kdataset --url http://host:port/pluk/v1 push workspace dataset:1.0.0

Note: the --url parameter takes precedence over the config value.
