GenHub (deprecated)

NOTE: GenHub has been integrated into the AEGeAn Toolkit as the fidibus module and IS NO LONGER ACTIVELY DEVELOPED HERE!!!

GenHub is a free open-source software framework for analyzing eukaryotic genome content and organization. The Fidibus program calculates and reports a variety of statistics on interval loci (iLoci). Fidibus can analyze user-supplied genomes, and can also retrieve and process dozens of reference genomes directly from public databases (such as NCBI RefSeq) for easily reproducible comparative analysis.

For more information, see the GenHub user manual.

Obtaining GenHub

The easiest way to obtain GenHub is to install from the Python Package Index (PyPI) using the pip command.

pip install genhub

Make sure you have GenomeTools and AEGeAn installed. For more info and troubleshooting tips, be sure to check out the complete installation instructions.
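Before running Fidibus, it can help to confirm that the external dependencies are on your PATH. The Python sketch below does this with the standard library's `shutil.which`. The executable names it checks (`gt` for GenomeTools, `locuspocus` for AEGeAn) are assumptions, so adjust them to match your installation.

```python
"""Check that external GenHub dependencies appear to be installed (a sketch).

Assumption: GenomeTools is detected via its `gt` executable and AEGeAn via
`locuspocus`; these names are illustrative and may need adjusting.
"""
import shutil
from typing import Iterable, List

# Hypothetical default list, not taken from the GenHub documentation.
DEFAULT_TOOLS = ("gt", "locuspocus")


def missing_tools(tools: Iterable[str] = DEFAULT_TOOLS) -> List[str]:
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked tools were found.")
```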

Quick start: example usages

# Show all configuration settings
fidibus --help

# Compute iLoci for a user-supplied genome
fidibus --workdir=./ --local --gdna=MyGenome.fasta --gff3=MyAnnotation.gff3 \
        --prot=MyProteins.fasta --label=Gnm1 \
        prep iloci

# List all available reference genomes
fidibus list

# Download and pre-process the budding yeast genome, but do not compute iLoci
fidibus --workdir=/opt/data/genomes/ --refr=Scer download prep

# Download and completely process a few dozen Hymenopteran genomes, 4 at a time
fidibus --workdir=/opt/data/genomes/ --refrbatch=hymenoptera --numprocs=4 \
        download prep iloci breakdown stats

# Download 9 green algae genomes, cluster proteins to identify homologous iLoci
fidibus --workdir=~/mydata/ --refrbatch=chlorophyta --numprocs=6 \
        download prep iloci breakdown cleanup cluster

# Process a user-supplied genome and several reference genomes for comparison
fidibus --workdir=/data/ --numprocs=4 --local --gdna=MyGenome.fasta \
        --gff3=MyAnnotation.gff3 --prot=MyProteins.fasta --label=Gnm1 \
        --refr=Atha,Bdis,Bole,Cari,Gmax,Grai,Mtru,Osat,Tcac \
        download prep iloci breakdown stats
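If you drive Fidibus from a Python pipeline, the invocations above can be assembled programmatically. The sketch below builds an argument list using only flags that appear in these examples; the helper `fidibus_command` and its parameters are hypothetical and not part of GenHub's own API.

```python
"""Assemble a fidibus command line from Python (a sketch).

Only flags shown in the Quick start examples are used. The helper name
`fidibus_command` is hypothetical.
"""
import shlex
from typing import Dict, List, Optional, Sequence


def fidibus_command(workdir: str, tasks: Sequence[str],
                    refr: Optional[Sequence[str]] = None,
                    numprocs: Optional[int] = None,
                    local: Optional[Dict[str, str]] = None) -> List[str]:
    """Return an argument list suitable for subprocess.run()."""
    cmd = ["fidibus", f"--workdir={workdir}"]
    if local is not None:
        # User-supplied genome: expects 'gdna', 'gff3', 'prot' and 'label' keys.
        cmd += ["--local",
                f"--gdna={local['gdna']}",
                f"--gff3={local['gff3']}",
                f"--prot={local['prot']}",
                f"--label={local['label']}"]
    if refr:
        # Reference genome labels are comma-separated, as in --refr=Atha,Bdis
        cmd.append("--refr=" + ",".join(refr))
    if numprocs is not None:
        cmd.append(f"--numprocs={numprocs}")
    cmd.extend(tasks)  # e.g. download, prep, iloci, breakdown, stats
    return cmd


if __name__ == "__main__":
    cmd = fidibus_command("/opt/data/genomes/", ["download", "prep"], refr=["Scer"])
    print(shlex.join(cmd))
    # To execute it (requires fidibus on the PATH):
    # subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string to `subprocess.run` sidesteps shell quoting problems with unusual file paths.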

For more detailed instructions on running Fidibus and other ancillary scripts, see the user manual.

Citing GenHub

GenHub is research software and must be cited if it is used in a published research project. GenHub will soon be in print, but in the meantime it can be cited as follows.

Standage DS, Brendel VP (2016) GenHub. GitHub repository, https://github.com/standage/genhub.

Additional Details

GenHub was originally dubbed HymHub and designed specifically to facilitate reproducible analysis of hymenopteran genomes. The need for a more general solution motivated the development of GenHub in its current incarnation. Rather than distributing processed data (which can occupy more than 1 GB of storage space per genome), GenHub provides portable code so that researchers can easily process reference genomes on their own computing resources. This is all tied closely to our research philosophy and our conviction that published computational results (along with supporting software and data) should be reproducible and transparent. More recently, we have implemented support for processing user-supplied non-reference genomes.

Built by Daniel Standage
Development repository at https://github.com/standage/genhub
Installation instructions
User manual
Developer documentation
GenHub code of conduct

Improve configuration parsing

Currently there are two available options for loading configuration files.

the -c/--cfg option for providing the path of a single config file
the --cfgdir option for providing the path of a directory, from which GenHub will attempt to load all .yml files
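The --cfgdir behavior described above amounts to collecting every .yml file in a directory. Below is a minimal Python sketch of that discovery step, assuming files are loaded in sorted name order (GenHub's actual ordering is not stated here) and leaving the YAML parsing itself out; the function name is hypothetical.

```python
"""Sketch of --cfgdir file discovery: collect every .yml file in a directory.

The function name is hypothetical; parsing the YAML (e.g. with PyYAML) is
intentionally left out.
"""
from pathlib import Path
from typing import List


def yml_files_in_dir(cfgdir: str) -> List[Path]:
    """Return the .yml files directly inside `cfgdir`, sorted by path."""
    return sorted(p for p in Path(cfgdir).glob("*.yml") if p.is_file())
```

Sorting is a design choice in this sketch: it makes the load order deterministic, which matters if later files may override settings from earlier ones.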

The --cfgdir option is fine as is, but I propose the following additions and changes for other config loading options.

--cfglist option for providing a file with config files (one per line)
--cfgpath option for providing one or more directories in which to search for config files
--cfgfullpath option for indicating that the value(s) provided by the -c/--cfg or --cfglist option are full file paths; by default, they are treated as relative paths and GenHub searches all directories specified by --cfgpath for these files
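One way the proposed options could fit together is sketched below in Python. Because these options are only proposals, the function names and the first-match-wins search order through the --cfgpath directories are assumptions for illustration, not implemented GenHub behavior.

```python
"""Sketch of the proposed --cfglist / --cfgpath / --cfgfullpath resolution.

All names and the search order are illustrative assumptions.
"""
import os
from typing import List, Sequence


def read_cfglist(listfile: str) -> List[str]:
    """Read config file names from `listfile`, one per line, skipping blanks."""
    with open(listfile) as fh:
        return [line.strip() for line in fh if line.strip()]


def resolve_configs(names: Sequence[str], cfgpath: Sequence[str],
                    fullpath: bool = False) -> List[str]:
    """Map each config name to an existing file path.

    With fullpath=True (--cfgfullpath) names are used as given. Otherwise
    each name is treated as relative and looked up in the --cfgpath
    directories in order; the first existing match wins.
    """
    resolved = []
    for name in names:
        if fullpath:
            candidates = [name]
        else:
            candidates = [os.path.join(d, name) for d in cfgpath]
        match = next((c for c in candidates if os.path.isfile(c)), None)
        if match is None:
            raise FileNotFoundError(f"config file not found: {name}")
        resolved.append(match)
    return resolved
```

Raising an error for an unresolved name, rather than silently skipping it, keeps a typo in a config list from going unnoticed.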

The option labels might need tweaking, but I think the functionality supports most/all conceivable use cases with a relatively simple interface.
