pipeline's Introduction

# Allen-AI Pipeline Framework

The Allen-AI Pipeline (AIP) framework is a library that facilitates collaboration on data-driven projects. It allows users to define workflows that share data resources transparently while maintaining complete freedom over the environment in which those workflows execute. AIP falls somewhere between Unix make and KNIME. Unlike make, it can operate on cloud storage resources and execute in a distributed environment. Unlike KNIME, it does not lock you into any particular repository for storing your data or any particular execution environment in which the workflows must run.

AIP can be used in two ways:

  1. As PipeScript, a binary that interprets a simple scripting language to execute native commands locally and store the results remotely.
  2. Via the Scala API, to define workflows that produce strongly-typed objects and execute within any JVM-based environment.

In summary, AIP provides the following benefits:

  • Intermediate data is cached and is sharable by different users on different systems.
  • A record of past runs is maintained, with navigable links to all inputs and outputs of the pipeline.
  • A pipeline can be visualized before running.
  • Output resource naming is managed to eliminate naming collisions.
  • Input/output data is always compatible with the code reading the data.

Send questions to [email protected]

pipeline's People

Contributors

afader, andrewlmurray, aria42, cristipp, dirkgr, jakemannix, jefeweisen, markschaake, rodneykinney, sbhaktha, schmmd

pipeline's Issues

Make a PersistedProducer interface

In src/main/scala/org/allenai/pipeline/Producer.scala:

> @@ -125,7 +156,7 @@ trait CachingDisabled extends CachingEnabled {
>    override def cachingEnabled: Boolean = false
>  }
>  
> -class PersistedProducer[T, -A <: Artifact](
> +class PersistedProducer[T, -A <: Artifact] private[pipeline] (

The Producer.persist method returns a PersistedProducer, not a simple Producer. This really should be changed to return an interface instead of a concrete class. The only additional thing that PersistedProducer exposes is the Artifact that the data was saved to.
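
A minimal sketch of what such an interface could look like. It assumes simplified stand-ins for `Artifact` and `Producer`, not the real org.allenai.pipeline types. `FileArtifact`, `PersistedProducerImpl`, and the `persist` helper are hypothetical names used only for illustration, and the type parameter `A` is left invariant here for simplicity, whereas the existing class declares it as `-A`.

```scala
// Hypothetical, simplified stand-ins; NOT the real org.allenai.pipeline types.
trait Artifact { def url: String }
final case class FileArtifact(url: String) extends Artifact

trait Producer[T] {
  def get: T
}

// The proposed interface: a Producer that also exposes the Artifact
// its data was saved to, and nothing else.
trait PersistedProducer[T, A <: Artifact] extends Producer[T] {
  def artifact: A
}

object PersistedProducerSketch {
  // Callers only ever see the trait; the concrete class stays private.
  def persist[T, A <: Artifact](step: Producer[T], target: A): PersistedProducer[T, A] =
    new PersistedProducerImpl(step, target)

  private final class PersistedProducerImpl[T, A <: Artifact](
      step: Producer[T],
      val artifact: A
  ) extends PersistedProducer[T, A] {
    // Computed once and reused. A real implementation would write the value
    // to the artifact and read it back; that part is omitted in this sketch.
    lazy val get: T = step.get
  }
}
```

With this shape, code that needs only the value can keep depending on `Producer[T]`, while code that needs the storage location can ask for `PersistedProducer[T, A]` without referring to a concrete class.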

Show pipeline output dynamically

Presently the pipeline produces a static HTML file after a pipeline run completes.

We could decouple the HTML generation from the underlying pipeline graph data. This way we could use an external tool to view the pipeline graph data and that tool could update in real time. For example, it could highlight stages of the pipeline once they complete (if they have output data).

One wrinkle is that timing data would be hard to add to the pipeline data unless the pipeline updated its graph data as it ran.
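
One way to picture the decoupling is a small graph data model that the pipeline updates as steps finish and that an external viewer polls or reads. The sketch below is only an illustration under that assumption: `StepNode`, `PipelineGraph`, `markCompleted`, and the JSON layout are all hypothetical and do not come from the AIP code base.

```scala
// Hypothetical data model; none of these names come from the AIP code base.
sealed trait StepStatus
case object Pending extends StepStatus
case object Completed extends StepStatus

final case class StepNode(
    id: String,
    dependsOn: Seq[String],
    status: StepStatus = Pending,
    durationMillis: Option[Long] = None
)

final case class PipelineGraph(steps: Map[String, StepNode]) {
  // Return a new graph in which the given step is marked complete,
  // together with its timing. Unknown ids leave the graph unchanged.
  def markCompleted(id: String, durationMillis: Long): PipelineGraph =
    steps.get(id) match {
      case Some(node) =>
        val done = node.copy(status = Completed, durationMillis = Some(durationMillis))
        copy(steps = steps.updated(id, done))
      case None => this
    }

  // Render as JSON by hand to stay dependency-free.
  // Step ids are assumed to contain no characters that need escaping.
  def toJson: String = {
    val entries = steps.values.toSeq.sortBy(_.id).map { n =>
      val deps = n.dependsOn.map(d => "\"" + d + "\"").mkString("[", ",", "]")
      val time = n.durationMillis.map(_.toString).getOrElse("null")
      s"""{"id":"${n.id}","dependsOn":$deps,"status":"${n.status}","durationMillis":$time}"""
    }
    entries.mkString("[", ",", "]")
  }
}
```

If the pipeline rewrote this JSON after each step, a viewer could highlight completed stages and show timings without the pipeline generating any HTML itself.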

sbt 0.13.9 support

Off a fresh clone: this is likely very minor, but changing build.properties to use sbt 0.13.9 rather than 0.13.8 seems to break on Scalariform. One disclaimer: I have not set bintray credentials locally, but I'll venture that these are unrelated. Presumably this stems from the allenai sbt plugin toolchain rather than anything specific to pipeline, and it is not very exciting either, so this is just a heads-up.

[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/ReplicateResource.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,594,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/Pipeline.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,3320,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/ReadFromArtifact.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,63,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/PipelineStep.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,3736,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/ArtifactFactory.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,449,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/IoHelpers.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,304,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/ColumnFormats.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,425,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/Producer.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,5411,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/SaveToArtifact.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,61,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/ArtifactIo.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,223,<)
[error] Scalariform parser error in file /repos/pipeline/src/main/scala/org/allenai/pipeline/s3/CreateCoreArtifacts.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,454,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/ReplicateResource.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,594,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/Pipeline.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,3320,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/ReadFromArtifact.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,63,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/PipelineStep.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,3736,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/ArtifactFactory.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,449,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/IoHelpers.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,304,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/ColumnFormats.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,425,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/Producer.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,5411,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/SaveToArtifact.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,61,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/ArtifactIo.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,223,<)
[error] /repos/pipeline/src/main/scala/org/allenai/pipeline/s3/CreateCoreArtifacts.scala: Expected token RBRACKET but got Token(XML_START_OPEN,<,454,<)
