mgems's People

Contributors

tmaklin

Forkers

gaworj

mgems's Issues

Incorporate the steps performed with shell commands

msweep-assembly relies on several commands that must be run manually on the command line. To avoid confusion and shell-specific issues, the executable should perform these steps itself whenever possible.

`Error: basic_string` when using mSWEEP abundances with bootstrap iterations in them as input

mGEMS crashes with the error message `Error: basic_string` when it is called with an input abundances file (`-a` option) that contains bootstrap iterations from mSWEEP version 1.4.0 or earlier, that is, an abundances file created by running mSWEEP with the `--iters` option. The crash is caused by an extra empty line at the end of the abundances file; abundances files produced without the mSWEEP bootstrapping option do not contain this line.

Since the issue is in mSWEEP, a workaround for successfully running mGEMS when bootstrapped abundances are used is to delete the empty line from the end of the input files.
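The workaround can be scripted. The following is a minimal sketch, not part of mGEMS or mSWEEP; the function names are hypothetical, and it assumes the abundances file is a plain (uncompressed) text file whose only problem is blank lines at the end.

```python
from pathlib import Path


def strip_trailing_blank_lines(text: str) -> str:
    """Remove blank lines from the end of text, keeping one final newline."""
    lines = text.splitlines()
    # Drop lines from the end for as long as they are empty or whitespace-only.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


def fix_abundances_file(path: str) -> None:
    """Rewrite the file at path in place without trailing blank lines.

    Hypothetical helper: assumes an uncompressed text file.
    """
    p = Path(path)
    p.write_text(strip_trailing_blank_lines(p.read_text()))
```

Running `fix_abundances_file` on each bootstrapped abundances file before passing it to mGEMS with `-a` should remove the line that triggers the crash. Keep a copy of the original file, since the rewrite happens in place.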

However, mGEMS should produce more informative error messages if the input files contain nonsense or are in the wrong format. Closing this issue requires adding better error messages for such cases.

Add option to write the raw read assignments to reference groups table

mGEMS cannot currently output the raw table of read assignments to reference groups. While it is possible to manually extract the table from the current output (either the lists of assigned reads for each group, or the extracted fastq files), this approach is cumbersome.

For some applications it would be useful to have the read assignments in table form. Since writing such a table is straightforward, it should be included as an option in the next version of the software.
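To illustrate what such a table could look like, here is a rough sketch that rebuilds it from per-group lists of assigned reads. The input structure (a mapping from group name to read identifiers), the function name, and the 0/1 tab-separated layout are all assumptions made for this example, not the format mGEMS uses.

```python
def assignment_table(assigned: dict[str, list[str]]) -> str:
    """Build a tab-separated read-by-group table of 0/1 assignments.

    assigned maps a group name to the reads assigned to it
    (hypothetical input; a read may belong to several groups).
    """
    groups = sorted(assigned)
    members = {g: set(assigned[g]) for g in groups}
    # Every read that was assigned to at least one group, in sorted order.
    reads = sorted(set().union(*members.values()))
    rows = ["read\t" + "\t".join(groups)]
    for read in reads:
        flags = ("1" if read in members[g] else "0" for g in groups)
        rows.append(read + "\t" + "\t".join(flags))
    return "\n".join(rows) + "\n"
```

Reads that were not assigned to any group do not appear in this table, because the per-group lists never mention them.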

Create the 'mGEMS' executable

All commands should be merged under a single 'mGEMS' executable. Proposed behaviour:

  • ./mGEMS — run everything.
  • ./mGEMS read — process the alignment files.
  • ./mGEMS assign — assign the reads to the references.
  • ./mGEMS filter — create the samples.
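The proposed behaviour is the usual subcommand pattern. As a language-neutral illustration (mGEMS itself is not written in Python, and the help texts and `dispatch` function here are hypothetical), the dispatch logic could be sketched as follows:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with the three subcommands named in the proposal."""
    parser = argparse.ArgumentParser(prog="mGEMS")
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("read", help="process the alignment files")
    sub.add_parser("assign", help="assign the reads to the references")
    sub.add_parser("filter", help="create the samples")
    return parser


def dispatch(argv: list[str]) -> str:
    """Return the step to run; no subcommand means run everything."""
    args = build_parser().parse_args(argv)
    # args.command is None when the program is called without a subcommand.
    return args.command or "all"
```

With this layout, `dispatch([])` selects the full pipeline and `dispatch(["assign"])` selects only the assignment step; options specific to each step would be attached to the corresponding subparser.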

Filter mSWEEP input

At the moment I think mGEMS generates fastq files for all groups from the mSWEEP output. Would it be possible to add a filter so that only groups with a prevalence above, say, 1% are considered?

I'm currently doing this manually using the --groups option but it would be great if it could be made simpler.
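Until such a filter exists, the manual step can be scripted. The sketch below is an illustration only: it assumes the abundances file is tab-separated with the group name in the first column and the relative abundance in the second, and that header lines start with `#`. Check this against your own mSWEEP output before relying on it; the function name is hypothetical.

```python
def groups_above_threshold(lines, threshold=0.01):
    """Return the names of groups whose abundance exceeds threshold.

    Assumed layout (not verified against mSWEEP): '#' header lines,
    then 'group<TAB>abundance' with optional further columns.
    """
    keep = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if float(fields[1]) > threshold:
            keep.append(fields[0])
    return keep
```

The returned names can then be supplied to the `--groups` option in place of a hand-written list.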

Add kallisto support

Current version (v0.2.0) of mGEMS does not support using kallisto as the pseudoaligner because it is not possible to extract the read assignments to equivalence classes from the standard kallisto output.

kallisto pseudo provides a --pseudobam flag to write a .sam file which contains the relevant information for mGEMS. However, this file is massive and impractical for many applications, so adding support for this cumbersome format is relatively low-priority.

Create better error messages

The program will crash with cryptic error messages if anything fails. The messages should be made more useful for public release.

Improve usage documentation

Several points should be addressed in the documentation in order to make mGEMS easier to adopt.

  • Add more information about how to use (and install) themisto & mSWEEP since they are essential parts of the pipeline.
  • Create a new tutorial on how to prepare a reference database and use pubMLST or PopPUNK to assign the reference sequences to lineages.

Improve `mGEMS extract`

Currently running mGEMS extract always names the files with the suffix "_1.fastq.gz", "_2.fastq.gz", "_3.fastq.gz" etc. depending on the number and order of the input files. It would be useful to add an option to change the name (or print to cout) to enable usage with calls like the following:

mGEMS extract --bins input.bin -r reads_1.fastq.gz -o outdir &
mGEMS extract --bins input.bin -r reads_2.fastq.gz -o outdir &
wait

This may be faster than extracting both read files with a single command, because compressing the reads sometimes takes more time than writing them. The current implementation does not allow the above calls to work, because both calls will attempt to write to "input_1.fastq.gz".

Add option to write unassigned reads

It is currently possible for some input reads to remain unassigned. mGEMS should have an option (off by default) to write these reads to a separate bin.

Create a conda recipe for easy installation

(Bio)conda has become something of a standard way to install bioinformatics tools and pipelines. mGEMS should be installable via conda to make the tool available to less tech-savvy users.
