virtool's Introduction

Virtool

Virtool is a web-based application for diagnosing pathogen infections using high-throughput sequencing.

Website: https://www.virtool.ca

Getting Started

See the Virtool documentation to get started with the latest version of Virtool 4.0.0.

About Versions

Virtool is currently undergoing a major transformation into a cloud-native application. This will mean Virtool can scale work across multiple hosts and run natively in Kubernetes and public cloud providers.

For current users and administrators:

  1. Virtool 4.0.0 series should be used for now.
  2. Virtool 4.0.0 series will continue to receive bug and security fixes for the foreseeable future.
  3. Virtool 5.0.0 will comprise multiple containerized services that need to run together. A deployment and migration guide will be provided.

Tests

In the source directory root:

  1. Start the required backing services in Docker.

    docker compose -f tests/docker-compose.yml -p virtool-test up -d
    
  2. Run the test suite:

    poetry run pytest
    

Multiplexing

The test suite works with pytest-xdist.

poetry run pytest -n 4

This will use multiple Python processes to run the tests in parallel.

Snapshots

We use Syrupy for snapshot testing.

Snapshots are used for tests where we want to assert that an object (e.g. database record, Pydantic object, API response) has an expected shape and set of values.

If snapshots need to be updated:

poetry run pytest <path_to_test_file> --su

You can be even more specific by specifying the test class or function:

poetry run pytest <path_to_test_file>::<class_or_function> --su

Always be specific about what snapshots you are updating. Don't blindly update a ton of snapshot files just to make your tests pass.

Commits

All commits must follow the Conventional Commits specification.

These standardized commit messages are used to automatically publish releases using semantic-release after commits are merged to main from successful PRs.

Example

feat: add API support for assigning labels to existing samples

Descriptive bodies and footers are required where necessary to describe the impact of the commit. Use bullets where appropriate.
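
As a rough illustration of the header format, the sketch below checks the first line of a commit message against a Conventional Commits style pattern. The list of allowed types and the function name `is_conventional_header` are assumptions made for this example; they are not taken from the Virtool tooling, and semantic-release applies its own parsing.

```python
import re

# Assumption: a commonly used set of commit types. The set actually
# enforced for this repository may differ.
TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# <type>[(scope)][!]: <description>
HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional_header(header: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    return HEADER_PATTERN.match(header) is not None
```

With this pattern, the example header above is accepted, while a header such as "fixed bug" is rejected because it has no type prefix followed by a colon.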

Additional Requirements

  1. Write in the imperative. For example, "fix bug", not "fixed bug" or "fixes bug".
  2. Don't refer to issues or code reviews. For example, don't write something like this: "make style changes requested in review". Instead, write "update styles to improve accessibility".
  3. Commits are not your personal journal. For example, don't write something like this: "got server running again" or "oops. fixed my code smell".

From Tim Pope: A Note About Git Commit Messages

virtool's People

Contributors

blakeasmith, bryce-davidson, buddy326, christinewc, colevoelpel, deepsource-autofix[bot], eroberts9789, igboyes, jakeale, jinxuchen, mattcurts, miminko, officialarms, peterk87, reecehoffmann, ryanfang5, swovelandm, tianshengsui

virtool's Issues

Handle Large Alignment Outputs

Large amounts of SAM output use too much memory. Find a way to deal with these rare situations.

One idea is to flush the data to a file when the memory usage reaches a set threshold.
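
One possible way to sketch this idea in Python is with the standard library's `tempfile.SpooledTemporaryFile`, which keeps data in memory until a size limit is exceeded and then moves it to a temporary file on disk. The function name `buffer_sam_output` and its parameters are hypothetical. Note that the threshold here is the number of bytes buffered, not the memory usage of the whole process.

```python
import tempfile
from typing import IO, Iterable


def buffer_sam_output(lines: Iterable[bytes], max_memory_bytes: int) -> IO[bytes]:
    """Collect SAM lines, spilling to a temporary file past the threshold.

    Data stays in memory while the buffered size is at most
    ``max_memory_bytes``; once it grows beyond that, the standard library
    transparently rolls the buffer over to a file on disk.
    """
    buffer = tempfile.SpooledTemporaryFile(max_size=max_memory_bytes)

    for line in lines:
        buffer.write(line)

    # Rewind so the caller can read the output from the start.
    buffer.seek(0)
    return buffer
```

The caller reads from the returned object like any binary file and closes it when done, which also removes the temporary file if one was created.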

Memory leak in main virtool process

The virtool process consumes 10+ GB of memory in short order.

This may be due to file upload buffers not being cleared after the upload completes.

Move host indexes from reference/hosts/index to reference/hosts

Since v1.8.0, hosts have been imported from FASTA files in the file manager. The host data tree was changed from:

data_path/reference/hosts
data_path/reference/hosts/fasta
data_path/reference/hosts/index

to

data_path/reference/hosts

Only indexes are retained in this directory, not FASTA files.

Hosts imported before v1.8.0 must have their host file trees migrated to the new layout.
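
A minimal sketch of such a migration is shown below. It assumes that index files sit directly inside `reference/hosts/index` and can simply be moved up one level. The function name `migrate_host_tree` and the `remove_fasta` option are hypothetical; deleting the old FASTA directory is left off by default because it is destructive.

```python
import shutil
from pathlib import Path


def migrate_host_tree(data_path: Path, remove_fasta: bool = False) -> None:
    """Move host index files from reference/hosts/index to reference/hosts."""
    hosts_path = data_path / "reference" / "hosts"
    index_path = hosts_path / "index"
    fasta_path = hosts_path / "fasta"

    if index_path.is_dir():
        # Materialize the listing first so we do not iterate over a
        # directory while moving entries out of it.
        for entry in list(index_path.iterdir()):
            shutil.move(str(entry), str(hosts_path / entry.name))
        index_path.rmdir()

    # Assumption: the old FASTA files are no longer needed once the
    # indexes have been moved. Only remove them when explicitly asked.
    if remove_fasta and fasta_path.is_dir():
        shutil.rmtree(fasta_path)
```

Running the function a second time is a no-op, since the old sub-directories no longer exist.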

Change Client Transaction API

Change client transactions to use function chaining or promises:

eg. collection.request('test.test', {}).onSuccess().onFailure()

Alternatively, we could look into moving to Redux, which may resolve this issue.

Dependency status view

A list of the external dependencies required by Virtool, their versions, and statuses.

Ideas:

  • functions for querying the dependency binaries that are required in PATH
  • functions for automatically installing these programs specifically for Virtool (installation sub-directory)
  • button for refreshing information
  • publication references
  • view licenses
  • descriptions
  • path returned by which
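
The PATH lookup in the first and last ideas could be sketched with `shutil.which` from the Python standard library, which returns the same kind of path the `which` command would. The function name `find_dependencies` is hypothetical, and querying versions is left out because each program reports its version differently.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_dependencies(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each program name to its resolved path, or None if not in PATH."""
    return {name: shutil.which(name) for name in names}
```

For example, `find_dependencies(["hmmstat"])` would return the path to `hmmstat` if it is installed, and `None` for that key otherwise.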

Improve Jobs View

Improve and simplify the jobs view in the client by:

  • removing archiving function
  • switching to a ListGroup instead of DynamicTable
  • allowing easy removal of completed, erroneous, or cancelled jobs
  • improving sorting

Virus Import Improvement

Little feedback or progress information is shown to the user while importing a virus reference. Upgrade this functionality to show:

  • errors
  • progress bar
  • stats on imported viruses
  • better formatting

Host FASTA uploads

FASTA files currently have to be manually placed in the data/references/hosts/fasta path. A new interface for uploading host FASTA files or downloading them from external URLs must be implemented.

Fix Read File Watching

Assess and fix a bug that causes read files that no longer exist in the watch path on the server to remain in the client read listing.

Required enhancements to the file manager

There are some things the file manager currently does not do that it needs to:

  • remove untracked files from the files data path
  • remove database documents for files that have been manually removed from the files data path
  • remove tracked files that have expired
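
The three cleanup cases above amount to comparing the files on disk with the files tracked in the database. The sketch below only computes what would need to be removed and performs no deletion. The names `FileCleanupPlan` and `plan_cleanup`, and the idea of storing one expiry time per tracked file, are assumptions for illustration and do not reflect the actual Virtool schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Set


@dataclass
class FileCleanupPlan:
    untracked_files: Set[str]     # on disk, but no database document
    orphaned_documents: Set[str]  # database document, but no file on disk
    expired_files: Set[str]       # tracked and on disk, but past expiry


def plan_cleanup(
    files_on_disk: Set[str],
    tracked: Dict[str, datetime],
    now: datetime,
) -> FileCleanupPlan:
    """Work out which files and documents the file manager should remove.

    ``tracked`` maps each tracked file name to its expiry time.
    """
    tracked_names = set(tracked)

    return FileCleanupPlan(
        untracked_files=files_on_disk - tracked_names,
        orphaned_documents=tracked_names - files_on_disk,
        expired_files={
            name for name in files_on_disk & tracked_names
            if tracked[name] <= now
        },
    )
```

Keeping the comparison separate from the deletion makes it possible to log or review the plan before anything is removed.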

Unit Testing

Implement unit testing for:

  • Python code
  • JavaScript code

Unhandled exception during software update

This exception is raised during the software update.

Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7fdc147e6158>, <tornado.concurrent.Future object at 0x7fdc34074438>)
Traceback (most recent call last):
  File "site-packages/tornado/ioloop.py", line 604, in _run_callback
  File "site-packages/tornado/stack_context.py", line 275, in null_wrapper
  File "site-packages/tornado/ioloop.py", line 619, in <lambda>
  File "site-packages/tornado/concurrent.py", line 237, in result
  File "<string>", line 3, in raise_exc_info
  File "site-packages/tornado/gen.py", line 1021, in run
  File "virtool/updates.py", line 180, in install_update
  File "site-packages/tornado/gen.py", line 1015, in run
  File "site-packages/tornado/concurrent.py", line 237, in result
  File "<string>", line 3, in raise_exc_info
  File "site-packages/tornado/gen.py", line 270, in wrapper
  File "/home/igboyes/.virtualenvs/virtool/lib/python3.5/types.py", line 243, in wrapped
  File "virtool/web.py", line 265, in _reload
  File "virtool/web.py", line 89, in initialize
  File "virtool/dispatcher.py", line 54, in add_interface
ValueError: Dispatcher already has interface with name 'settings'

Software update install needs more testing overall!

Jobs View Improvement

Improve the jobs view with the following changes:

  • remove the archived section as it is not necessary
  • add a button for clearing all completed, successful, or failed jobs
  • add a dashboard showing real resource usage on the host machine
  • add a component showing Virtool job resource reservations

Persistent 'no host' alert

An alert is visible in the host management view when no hosts have been successfully added. When a new host is added, the alert does not disappear without a page refresh.

Report Printing

Implement report printing. Features:

  • options for automatic formatting and letterheads
  • print a summary for multiple samples
  • print a report for a single analysis
  • print a report for all analyses attached to a single sample

Improve BASH Install Script

The install script needs a number of improvements:

  • render new lines cleanly when requesting input from the user
  • handle updates (in addition to installs)

Restore watch folder functionality

  • Any files placed in the watch folder will be uploaded by the Virtool file manager
  • Allow administrators to configure whether uploaded files are removed from the watch folder on upload

Automated Database Backups

Add options for automatically dumping the Virtool database to a local or remote file system location. Possible features:

  • backup interval setting
  • use SSH or SFTP

Extend organize.py module

This module contains functions that clean up the Virtool database on start. None of these functions are currently called.

They should be called as one of the first steps of starting the server.

Tailor job argument lists in client to specific task types

These lists are basically useless to the user at this point. Make more informative lists or consider getting rid of lists and having info blurbs.

For example, here is a current argument list:

  • algorithm: pathoscope_bowtie
  • username: igboyes
  • sample_id: fktma5
  • index_id: 9e94a2fc
  • name:
  • analysis_id: 0199cb5b

Modifications could include:

  • replacing database ids with human readable links to samples, analyses, etc
  • formatting algorithm names
  • giving index versions instead of ids
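
As one example of the second modification, algorithm names could be mapped to display names before they are shown. The sketch below is written in Python for consistency with the rest of this page, although the same logic would live in the client. The lookup table is an assumption: only `pathoscope_bowtie` appears in the argument list above, and the other identifiers are guessed from the display names used elsewhere in these issues.

```python
# Assumption: identifiers other than "pathoscope_bowtie" are hypothetical.
ALGORITHM_DISPLAY_NAMES = {
    "pathoscope_bowtie": "PathoscopeBowtie",
    "pathoscope_snap": "PathoscopeSNAP",
    "nuvs": "NuVs",
}


def format_algorithm_name(algorithm: str) -> str:
    """Return a human-readable name for an algorithm identifier.

    Known identifiers use the lookup table. Unknown ones fall back to
    joining the capitalized parts of the snake_case identifier.
    """
    if algorithm in ALGORITHM_DISPLAY_NAMES:
        return ALGORITHM_DISPLAY_NAMES[algorithm]

    return "".join(part.capitalize() for part in algorithm.split("_"))
```

An explicit table is used because names such as PathoscopeSNAP and NuVs have capitalization that cannot be derived from the identifier alone.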

TypeError during analysis of some samples

Getting this exception during some analyses:

Traceback (most recent call last):
File "virtool/job.py", line 116, in run
File "virtool/analysis.py", line 217, in pathoscope
TypeError: 'NoneType' object is not subscriptable

This occurs during the Pathoscope stage of PathoscopeBowtie. Not sure if it occurs during PathoscopeSNAP. Is this due to having no high-quality mappings for that sample (i.e. it is clean)?

One CPH sample that causes this error is 16TFP146.

Decision Making Framework

Implement a framework that will allow users to attach diagnostic decisions to analysis and sample records. Features will include:

  • marking of decisive diagnostic hits in the Pathoscope viewer
  • marking of 'needs more work' hits in the Pathoscope viewer
  • searching samples by viruses as described by diagnostic decisions
  • text box or drop-down for hit decision rationale

Optimize use of D3 library

The D3 library has recently moved to a more modular structure.

This modularity is not currently being utilised by Virtool (the whole library is imported). Refactor all usage of D3 to import only the bits needed.

Fix Virus Collection Checks

The code for verifying virus collections and their associated data in the data path does not work. Fix this for use during setup and normal repair of the database.

HMM Management View

Add a view for managing the HMM files and annotations included in Virtool for NuVs. Annotations should have:

  • editable names or nicknames
  • name of the user that assigned the nickname
  • date of nickname
  • read-only information from the annotation text files

HMM files should be described with:

  • file names and sizes
  • stats generated by hmmstat

FASTQ Uploading

Right now, sample FASTQ files have to be manually placed in the watch path. Implement an upload system for these files.

Biopython Replacement

Biopython is a large library and only a very small part of it is used in Virtool. It drags Numpy and Scipy into the application. It should be replaced with a slimmer custom module.
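
The issue does not say which parts of Biopython are used. Assuming FASTA parsing is among them, a slim replacement could look something like the sketch below, which depends only on the standard library. The function name `read_fasta` is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file.

    The header is returned without the leading ">". Sequence lines that
    belong to one record are joined into a single string.
    """
    header: Optional[str] = None
    chunks: List[str] = []

    for raw_line in lines:
        line = raw_line.strip()

        if not line:
            continue

        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)

    # Emit the final record, if any.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function accepts any iterable of lines, it can be passed an open file handle directly and processes the file one line at a time.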
