biopsy

An automatic optimisation framework for programs and pipelines.

Biopsy is a framework for optimising the settings of any program or pipeline that produces a measurable output. It is particularly intended for bioinformatics, where computational pipelines take a long time to run, which makes optimising parameters by crude methods infeasible. Biopsy uses a range of discrete optimisation strategies to rapidly find the best-performing settings.

It can handle parameter spaces of any size: if it is possible to try every parameter combination in the time you have available, Biopsy will do this. However, Biopsy really shines when handling large numbers of parameter combinations.

Development status

This project is in early development and is not yet ready for deployment. Please don't report issues or request documentation until we are ready for release. If you have a burning desire to use biopsy, get in touch: [email protected].

Installation

Make sure you have Ruby installed, then:

gem install biopsy

Usage

Detailed usage instructions are on the wiki. Here's a quick overview:

  1. Define your optimisation target. This is a program or pipeline you want to optimise, and you define it by filling in a template YAML file and wrapping your program in a tiny Ruby launcher.
  2. Define your objective function. This is a program that analyses the output of your program and gives it a score. You define it by writing a small amount of Ruby code. Don't worry - there's a template and detailed instructions on the wiki.
  3. Run Biopsy, and wait while the experiment runs. Maybe grab a cup of tea, read some Hacker News.
  4. Bask in the brilliance of your new optimal settings.

Command line examples

biopsy list targets
biopsy list objectives
biopsy run --target test_target --objective test_objective --input test_file.txt --time-limit 24h

Optimisation algorithms

  1. Parameter Sweeper - a simple combinatorial parameter sweep, with optional subsampling of the parameter space
  2. Tabu Search - a local search with a long memory that takes the consensus of multiple searchers
  3. SPEA2 - a high performance general-purpose genetic algorithm
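
As an illustration of the first strategy, here is a minimal sketch of a combinatorial parameter sweep with optional subsampling. It is not Biopsy's implementation: the `sweep` method, its keyword arguments and the layout of `space` are assumptions made for this example.

```ruby
# Enumerate every combination in a parameter space, optionally subsampling.
# `space` maps parameter names to arrays of allowed values (hypothetical layout).
def sweep(space, sample_size: nil, seed: 0)
  return [] if space.empty?

  names = space.keys
  first, *rest = space.values
  # Cartesian product of all value lists, turned back into name => value hashes.
  combos = first.product(*rest).map { |values| names.zip(values).to_h }
  return combos if sample_size.nil? || sample_size >= combos.size

  # Subsample without replacement, using a seeded generator so it is repeatable.
  combos.sample(sample_size, random: Random.new(seed))
end

space = { k: [21, 25, 31], d: [0, 1] }
puts sweep(space).size                    # 3 * 2 = 6 combinations
puts sweep(space, sample_size: 2).inspect # two of the six combinations
```

The full product grows multiplicatively with each parameter, which is why subsampling (or a search strategy such as the other two listed) becomes necessary for large spaces.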

Documentation

Documentation is in development and will be released with the beta.

Citation

This is pre-release, pre-publication academic software. In lieu of a paper to cite, please cite this GitHub repo and/or the Figshare DOI (http://dx.doi.org/10.6084/m9.figshare.790660) if your use of the software leads to a publication.

Software using Biopsy

  • Assemblotron can fully optimise any de-novo transcriptome assembler to produce the best possible assembly for a given input. This typically takes little more time than running a single assembly.

Contributors

blahah, cboursnell, parsaakbari

biopsy's Issues

reuse objective results if outputs are exactly the same as a previous run

Currently, objectives are run for the output of every target run. In some use cases (e.g. transcriptome assembly), multiple different parameter sets will result in exactly the same assembly. Let's take md5 hashes of all output files, and use a set of those hashes as the key in a hash of objective scores. Could potentially shave hours off a run for computationally intensive objectives.
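
A minimal sketch of the caching idea described above, assuming the outputs are ordinary files on disk. The `ObjectiveCache` class and its method names are hypothetical and are not part of Biopsy.

```ruby
require 'digest/md5'
require 'set'

# Caches objective scores, keyed by the set of MD5 digests of the output files.
# Hypothetical helper for illustration only.
class ObjectiveCache
  def initialize
    @scores = {}
  end

  # Build the cache key: a Set of hex digests, one per output file.
  def key_for(paths)
    paths.map { |path| Digest::MD5.file(path).hexdigest }.to_set
  end

  # Return the cached score for these outputs, or run the block
  # (the expensive objective) once and remember its result.
  def fetch(paths)
    key = key_for(paths)
    @scores.fetch(key) { @scores[key] = yield }
  end
end
```

A caller would wrap each objective evaluation in `cache.fetch(output_paths) { run_objective(output_paths) }`, where `run_objective` stands in for whatever computes the score. Because the key is a set, the order in which output files are listed does not matter.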

Flags

Add parsing of flags from the YAML file so that an option can be optimised to be either on or off.
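
One way this could work is to treat each flag as a parameter with two possible values, on and off. The sketch below assumes a hypothetical YAML layout with `parameters` and `flags` keys, and hypothetical `parameter_space` and `command_args` helpers; none of these are taken from Biopsy's actual target format.

```ruby
require 'yaml'

# Hypothetical target definition: the `flags` key is an assumption.
TARGET_YAML = <<~YAML
  parameters:
    k: [21, 25, 31]
  flags:
    - careful
YAML

# Merge flags into the parameter space as on/off choices.
def parameter_space(definition)
  space = definition.fetch('parameters', {}).dup
  definition.fetch('flags', []).each { |flag| space[flag] = [true, false] }
  space
end

# Turn one chosen set of settings into command line arguments.
# A flag set to true is emitted bare; a flag set to false is omitted.
def command_args(settings)
  settings.flat_map do |name, value|
    case value
    when true then ["--#{name}"]
    when false then []
    else ["--#{name}", value.to_s]
    end
  end
end

space = parameter_space(YAML.safe_load(TARGET_YAML))
puts space.inspect
puts command_args('k' => 21, 'careful' => true).join(' ')
```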

domain should deep symbolize definitions on load

assemblotron test failure:

ERROR (0:00:00.003) test: SoapDenovoTrans constructor should automatically include defaults. 
 undefined method `include?' for nil:NilClass
 @ /Users/rs404/.rvm/gems/ruby-1.9.3-p362/gems/biopsy-0.1.0.alpha/lib/biopsy/domain.rb:88:in `block in validate_target_filetypes'

caused by domain definition loading as:

{"min"=>2, "allowed_extensions"=>[".fastq", ".fq", ".fasta", ".fa"]}

The keys load as strings, but the code expects them to be symbols.
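
A minimal sketch of a recursive key conversion that could be applied to a definition after loading. The `deep_symbolize` name is an assumption for this example, not an existing Biopsy method.

```ruby
# Recursively convert hash keys to symbols, descending into nested
# hashes and arrays. Values other than hashes and arrays are left untouched.
def deep_symbolize(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      new_key = key.respond_to?(:to_sym) ? key.to_sym : key
      out[new_key] = deep_symbolize(value)
    end
  when Array
    obj.map { |item| deep_symbolize(item) }
  else
    obj
  end
end

loaded = { 'min' => 2, 'allowed_extensions' => ['.fastq', '.fq', '.fasta', '.fa'] }
puts deep_symbolize(loaded).inspect
```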

target definitions should allow range-step

Currently, target definitions must include an array of possible values for each parameter. We should also allow each parameter to be specified as a hash giving the min, max and step of a range.
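
A minimal sketch of how a range specification could be expanded into the array form that definitions already use. The `expand_values` helper and the use of symbol keys `:min`, `:max` and `:step` are assumptions for this example.

```ruby
# Accept either an explicit array of values or a { min:, max:, step: } hash,
# and always return an array of values.
def expand_values(definition)
  return definition if definition.is_a?(Array)

  min, max, step = definition.values_at(:min, :max, :step)
  raise ArgumentError, 'min, max and step are all required' if [min, max, step].any?(&:nil?)
  raise ArgumentError, 'step must be positive' unless step > 0

  # Numeric#step walks from min up to (and including, if reached) max.
  min.step(max, step).to_a
end

puts expand_values(min: 21, max: 31, step: 5).inspect
puts expand_values([1, 2, 4]).inspect
```

Expanding ranges at load time means the optimisation code itself only ever sees arrays of values.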

Set seed

It should be possible to repeat a run exactly.
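
A minimal sketch of the general approach: route every random choice through a single `Random` instance created from a user-supplied seed. The `random_settings` method and the layout of `space` are hypothetical.

```ruby
# Draw `count` random parameter sets from a space, using one seeded
# generator for every choice so the sequence can be reproduced.
def random_settings(space, count, seed)
  rng = Random.new(seed)
  Array.new(count) do
    space.map { |name, values| [name, values.sample(random: rng)] }.to_h
  end
end

space = { k: [21, 25, 31], d: [0, 1] }
first_run  = random_settings(space, 5, 1234)
second_run = random_settings(space, 5, 1234)
puts first_run == second_run # true: same seed, same sequence
```

This only makes the optimiser's own choices repeatable. A target program with its own internal randomness would need its seed fixed separately.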

implement time limits if no convergence

For some parameter spaces, convergence can take a long time depending on stochastic elements of the exploration. We should allow users to specify a time limit and return the best solution reached at that time.
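
A minimal sketch of a time-limited search loop that keeps the best result seen so far. The `best_within` method is hypothetical, it takes the limit in seconds, and it assumes a higher score is better.

```ruby
# Evaluate candidates until they run out or the time limit (in seconds)
# is reached, then return the best result found so far.
def best_within(candidates, time_limit)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + time_limit
  best = nil
  candidates.each do |params|
    # The limit is checked between evaluations, so a run already in
    # progress is allowed to finish.
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    score = yield(params)
    best = { params: params, score: score } if best.nil? || score > best[:score]
  end
  best
end

result = best_within([{ k: 21 }, { k: 25 }, { k: 31 }], 60) { |params| -(params[:k] - 25).abs }
puts result.inspect
```

The monotonic clock is used so that changes to the system time during a long run do not affect the limit. If the limit expires before any candidate is evaluated, the method returns `nil`.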
