This project forked from scipipe/scipipe

SciPipe is a library for writing scientific workflows (sometimes also called "pipelines") of shell commands that depend on each other, in the Go programming language (aka golang). It was initially designed for problems in cheminformatics and bioinformatics, but applies equally well to any domain involving complex pipelines of interdependent shell commands.

Home Page: http://scipipe.org

License: MIT License

Go 99.15% Shell 0.85%

SciPipe

Project links: Documentation & Main Website | Issue Tracker | Mailing List

Project updates

Introduction

SciPipe is a library for writing Scientific Workflows, sometimes also called "pipelines", in the Go programming language.

When you need to run many command-line programs that depend on each other in complex ways, SciPipe helps by making the process of running them flexible, robust and reproducible. SciPipe also lets you restart an interrupted run without overwriting output that has already been produced, and it writes an audit report of what was run, among many other things.

SciPipe is built on the proven principles of Flow-Based Programming (FBP) to achieve maximum flexibility, productivity and agility when designing workflows. Compared to plain dataflow, FBP has the benefit that processes are fully self-contained, so that a library of reusable components can be created and plugged into new workflows ad hoc.

Similar to other FBP systems, SciPipe workflows can be likened to a network of assembly lines in a factory, where items (files) flow through a network of conveyor belts, stopping at different independently running stations (processes) for processing.
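The factory analogy maps quite directly onto Go's goroutines and channels. The following is a minimal sketch in plain Go, not SciPipe code: the names station and runLine are made up for illustration, with channels playing the role of conveyor belts and goroutines the role of stations.

```go
package main

import (
	"fmt"
	"strings"
)

// station is a hypothetical stand-in for a process: it reads items from an
// inbound channel, transforms each one, and sends the result downstream.
func station(in <-chan string, transform func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out) // closing the belt tells the next station we are done
		for item := range in {
			out <- transform(item)
		}
	}()
	return out
}

// runLine wires a source and two stations together and collects the results.
func runLine(items []string) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, it := range items {
			src <- it
		}
	}()

	upper := station(src, strings.ToUpper)
	tagged := station(upper, func(s string) string { return s + ".done" })

	var results []string
	for r := range tagged {
		results = append(results, r)
	}
	return results
}

func main() {
	for _, r := range runLine([]string{"a.txt", "b.txt"}) {
		fmt.Println(r)
	}
}
```

Each station knows only about its own inbound and outbound channel, which is what makes such components self-contained and easy to rewire into a different network.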

SciPipe was initially created for problems in bioinformatics and cheminformatics, but works equally well for any problem involving pipelines of command-line applications.

Project status: SciPipe is still alpha software, and minor breaking API changes still happen as we try to streamline the process of writing workflows. Please follow the commit history closely for any API updates if you have code already written in SciPipe (let us know if you need any help migrating code to the latest API).

Benefits

Some key benefits of SciPipe that are not always found in similar systems:

  • Intuitive behaviour: SciPipe operates by flowing data (files) through a network of channels and processes, not unlike the conveyor belts and stations in a factory.
  • Flexible: Processes that wrap command-line programs or scripts can be combined with processes coded directly in Golang.
  • Custom file naming: SciPipe gives you full control over how files are named, making it easy to find your way among the output files of your workflow.
  • Portable: Workflows can be distributed either as Go code to be run with go run, or as stand-alone executable files that run on almost any UNIX-like operating system.
  • Easy to debug: As everything in SciPipe is just Go code, you can use some of the available debugging tools, or just println() statements, to debug your workflow.
  • Supports streaming: Can stream outputs via UNIX FIFO files, to avoid temporary storage.
  • Efficient and Parallel: Workflows are statically compiled, so they run fast. SciPipe also leverages pipeline parallelism between processes as well as task parallelism when there are multiple inputs to a process, making efficient use of multiple CPU cores.

Known limitations

Hello World example

Let's look at an example workflow to get a feel for what writing workflows in SciPipe looks like:

package main

import (
    // Import the SciPipe package, aliased to 'sp'
    sp "github.com/scipipe/scipipe"
)

func main() {
    // Init workflow
    wf := sp.NewWorkflow("hello_world")

    // Initialize processes and set output file paths
    hello := wf.NewProc("hello", "echo 'Hello ' > {o:out}")
    hello.SetPathStatic("out", "hello.txt")

    world := wf.NewProc("world", "echo $(cat {i:in}) World >> {o:out}")
    world.SetPathReplace("in", "out", ".txt", "_world.txt")

    // Connect network
    world.In("in").From(hello.Out("out"))

    // Run workflow
    wf.Run()
}

Running the example

Let's put the code in a file named scipipe_helloworld.go and run it:

$ go run scipipe_helloworld.go 
AUDIT   2017/05/04 17:05:15 Task:hello        Executing command: echo 'Hello ' > hello.txt.tmp
AUDIT   2017/05/04 17:05:15 Task:world        Executing command: echo $(cat hello.txt) World >> hello_world.txt.tmp

Let's check which files SciPipe has generated:

$ ls -1tr hello*
hello.txt.audit.json
hello.txt
hello_world.txt
hello_world.txt.audit.json

As you can see, it has created the files hello.txt and hello_world.txt, along with an accompanying .audit.json file for each of them.

Now, let's check the output of the final resulting file:

$ cat hello_world.txt
Hello World

Now we can rejoice that it contains the text "Hello World", exactly as a proper Hello World example should :)

You can find many more examples in the examples folder in the GitHub repo.

For more information about how to write workflows using SciPipe, and much more, see the SciPipe website (scipipe.org)!

More material on SciPipe

Acknowledgements

Related tools

Find below a few tools that are more or less similar to SciPipe and that are worth checking out before deciding which tool fits you best (in approximate order of similarity to SciPipe):
