bionode-watermill

Watermill: A Streaming Workflow Engine

Watermill lets you orchestrate tasks using operators like join, junction, and fork. Each task has a lifecycle:

  1. Input glob patterns are resolved to absolute file paths (e.g. *.bam to reads.bam).
  2. The operation is run and passed the resolved input, params, and other props.
  3. The operation completes.
  4. Output glob patterns are resolved to absolute file paths.
  5. Validators are run over the output. These check for non-null files, and custom validators can be passed in.
  6. Post-validations are run. The task and its output are added to the DAG.
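The lifecycle above can be sketched with nothing but the Node.js standard library. This is a simplified, synchronous illustration and not Watermill's implementation: `globToRegExp`, `resolveGlob`, `nonEmpty`, and `runLifecycle` are hypothetical helpers, only the `*` wildcard is supported, and a plain returned object stands in for the DAG.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: turn a simple glob such as '*.bam' into a RegExp.
// Only '*' is handled; real glob libraries support much more.
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$')
}

// Resolve a pattern to the absolute paths of matching files in `dir`.
function resolveGlob(dir, pattern) {
  const re = globToRegExp(pattern)
  return fs.readdirSync(dir)
    .filter((name) => re.test(name))
    .sort()
    .map((name) => path.resolve(dir, name))
}

// Stand-in validator: an output file must be non-empty.
const nonEmpty = (file) => fs.statSync(file).size > 0

function runLifecycle(dir, props, operation, validators = [nonEmpty]) {
  // 1. Resolve input glob patterns to absolute paths.
  const input = resolveGlob(dir, props.input)
  // 2-3. Run the operation to completion (synchronously, for brevity).
  operation({ ...props, input })
  // 4. Resolve output glob patterns to absolute paths.
  const output = resolveGlob(dir, props.output)
  // 5. Run every validator over every output file.
  const valid = output.length > 0 &&
    output.every((file) => validators.every((check) => check(file)))
  if (!valid) throw new Error('Output validation failed')
  // 6. Return what a real engine would record in its DAG.
  return { input, output }
}

// Usage: uppercase a file inside a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lifecycle-'))
fs.writeFileSync(path.join(dir, 'reads.lowercase'), 'acgt')

const result = runLifecycle(
  dir,
  { input: '*.lowercase', output: '*.uppercase' },
  ({ input }) => {
    for (const file of input) {
      const text = fs.readFileSync(file, 'utf8').toUpperCase()
      fs.writeFileSync(file.replace(/lowercase$/, 'uppercase'), text)
    }
  }
)
```

Resolving the output only after the operation completes is what lets the validators work on concrete files rather than on patterns.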

CWL?

Coming soon.

What is a task?

A task is the fundamental unit from which pipelines are built. For more details, see Task. At a glance, a task is created by passing in props and an operationCreator, which will later be called with the resolved input. Consider this task, which takes a "lowercase" file and creates an "uppercase" one:

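// Assumed imports: `through` is taken to be the through2 package,
// and `task` to be exported by bionode-watermill.
const fs = require('fs')
const through = require('through2')
const { task } = require('bionode-watermill')

const uppercase = task({
  input: '*.lowercase',
  output: '*.uppercase'
}, function(resolvedProps) {
  // By the time this runs, the input glob has been resolved to a file path
  const input = resolvedProps.input

  // Stream the file through an uppercasing transform into the output file
  return fs.createReadStream(input)
    .pipe(through(function(chunk, enc, next) {
      next(null, chunk.toString().toUpperCase())
    }))
    .pipe(fs.createWriteStream(input.replace(/lowercase$/, 'uppercase')))
})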

A "task declaration" like above will not immediately run the task. Instead, the task declaration returns an "invocable task" that can either be called directly or used with an orchestration operator. Tasks can also be created to run shell programs:

const fastqDump = task({
  input: '**/*.sra',
  output: [1, 2].map(n => `*_${n}.fastq.gz`),
  name: 'fastq-dump **/*.sra'
}, ({ input }) => `fastq-dump --split-files --skip-technical --gzip ${input}` )
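The difference between declaring a task and invoking it can be illustrated with a toy stand-in. This is a minimal sketch and not bionode-watermill's implementation: `toyTask` is a hypothetical function, it skips glob resolution and validation entirely, and it assumes an operationCreator that returns a string should be run as a shell command.

```javascript
const { execSync } = require('child_process')

// Toy stand-in for `task` (hypothetical, for illustration only).
function toyTask(props, operationCreator) {
  // Declaring a task only returns a function; nothing runs yet.
  return function invocableTask() {
    const operation = operationCreator(props)
    // Assumption: a returned string is treated as a shell command.
    if (typeof operation === 'string') {
      return execSync(operation, { encoding: 'utf8' }).trim()
    }
    return operation
  }
}

let calls = 0
const echoTask = toyTask({ input: 'reads.sra' }, ({ input }) => {
  calls += 1
  return `echo processing ${input}`
})

// The operationCreator has not been called at declaration time.
const callsBeforeInvocation = calls

// Invoking the task calls the operationCreator and runs the command.
const stdout = echoTask()
```

Deferring execution this way is what allows the same declared task to be handed to an orchestration operator instead of being run on the spot.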

What are orchestrators?

Orchestrators are functions which can take tasks as params in order to let you compose your pipeline from a high level view. This separates task order from task declaration. For more details, see Orchestration. At a glance, here is a complex usage of join, junction, and fork:

const pipeline = join(
  junction(
    join(getReference, bwaIndex),
    join(getSamples, fastqDump)
  ),
  trim, mergeTrimEnds,
  decompressReference, // only b/c mpileup did not like fna.gz
  join(
    fork(filterKMC, filterKHMER),
    alignAndSort, samtoolsIndex, mpileupAndCall // 2 instances each of these
  )
)
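To make the composition idea concrete, here is a toy model of two of the operators built on promises. It is a sketch under stated assumptions and not the library's code: `toyJoin` is assumed to run tasks one after another, `toyJunction` is assumed to start its branches together and wait for all of them, and `step` is a hypothetical helper that produces a fake task. fork, which duplicates downstream tasks for each branch, is deliberately left out.

```javascript
// Records the order in which the fake tasks finish.
const log = []

// Hypothetical helper: a fake task that resolves asynchronously.
const step = (name) => () =>
  Promise.resolve().then(() => {
    log.push(name)
    return name
  })

// Assumption: join runs its tasks in sequence.
const toyJoin = (...tasks) => () =>
  tasks.reduce((previous, next) => previous.then(() => next()), Promise.resolve())

// Assumption: junction starts all branches and waits for every one.
const toyJunction = (...tasks) => () =>
  Promise.all(tasks.map((run) => run()))

// Same shape as the first half of the pipeline above.
const toyPipeline = toyJoin(
  toyJunction(
    toyJoin(step('getReference'), step('bwaIndex')),
    toyJoin(step('getSamples'), step('fastqDump'))
  ),
  step('trim')
)

// Building the pipeline runs nothing; calling it starts the work.
const done = toyPipeline()
```

In this model the two branches of the junction may interleave, but `trim` only starts once both branches have finished.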

Check out the bionode-watermill tutorial!

Example pipelines

Why bionode-watermill?

This blog post compares the available tools to deal with NGS workflows, explaining the advantages of each one, including bionode-watermill.

Who is this tool for?

Bionode-watermill is for programmers who want an efficient, easy-to-write way of developing complex and dynamic data pipelines, with parallelization handled for them as much as possible. Bionode-watermill is an npm module and is accessible to anyone willing to learn a little JavaScript. This is in contrast to other tools that develop their own DSL (domain-specific language), which is not useful outside the tool. By leveraging the npm ecosystem and JavaScript on the client, Bionode-watermill can be built upon for inclusion in web APIs, modern web applications, and native applications through Electron. Look forward to seeing Galaxy-like applications backed by a completely configurable Node API.

Bionode-watermill is for biologists who understand that it is important to experiment with sample data, parameter values, and tools. Compared to other workflow systems, swapping parameters and tools is much easier, allowing you to iteratively compare results and draw more confident inferences. Consider the ability to construct your own Teaser for your data with a simple syntax, while getting the best performance out of the box.

Contributors

ayangromano, bmpvieira, thejmazz, tiagofilipe12
