tron-bioinformatics / covigator
CoVigator - Monitoring SARS-CoV-2 mutations
License: MIT License
When taking screenshots for the manuscript I noticed that this column takes up space and is not very informative. Whatever effect a mutation has is already encoded in the HGVS code, which is enough for the trained eye.
We agreed we cannot get rid of the apply button. But maybe we can make it more explicit that it needs to be clicked.
Following up on #61, we want to include in the DB the mutations that are described as defining a given lineage. This is slightly different from the mutations that are observed in the samples from this lineage.
We could create a table LineageVariant with one FK pointing to the Lineage table and another FK pointing to the Variant table.
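A minimal sketch of such an association table, written with the standard-library `sqlite3` module so it runs without the project's database. The table names, column names and the variant identifier format are hypothetical and are not taken from the CoVigator schema.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE lineage (
    pango_lineage_id TEXT PRIMARY KEY
);
CREATE TABLE variant (
    variant_id TEXT PRIMARY KEY
);
CREATE TABLE lineage_variant (
    pango_lineage_id TEXT NOT NULL REFERENCES lineage (pango_lineage_id),
    variant_id TEXT NOT NULL REFERENCES variant (variant_id),
    PRIMARY KEY (pango_lineage_id, variant_id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def defining_variants(conn: sqlite3.Connection, lineage_id: str) -> list:
    """Return the variant identifiers described as defining a lineage."""
    rows = conn.execute(
        "SELECT variant_id FROM lineage_variant "
        "WHERE pango_lineage_id = ? ORDER BY variant_id",
        (lineage_id,),
    )
    return [row[0] for row in rows]
```

With foreign keys enforced, linking a lineage to a variant that has no row in the variant table is rejected, so a defining mutation that has not yet been observed would first need its own variant row.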
Potential issues:
The Variant table contains the functional annotations coming from SnpEff. This information is obtained from the VCF files that come out of the pipeline. We won't be able to add this information if the database does not already contain the variant.
In the recurrent mutations tab, the top table of recurrent mutations can show two metrics segregated by month: the count within the month (the default) and the frequency within the month (i.e., the fraction of samples collected in that month that carry the mutation). The frequency option is not working properly.
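To make the intended difference between the two monthly metrics concrete, here is a small sketch. The function name and the input format (lists of `YYYY-MM` strings) are assumptions made for illustration; the dashboard computes these values from its database.

```python
from collections import Counter


def monthly_metrics(sample_months, mutated_sample_months):
    """Compute count and frequency of one mutation per collection month.

    sample_months: one 'YYYY-MM' string per sample collected.
    mutated_sample_months: one 'YYYY-MM' string per sample carrying the mutation.
    """
    totals = Counter(sample_months)
    counts = Counter(mutated_sample_months)
    return {
        month: {
            "count": counts[month],
            # Frequency is relative to all samples collected in that month.
            "frequency": counts[month] / total,
        }
        for month, total in sorted(totals.items())
    }
```

The denominator of the frequency is the number of samples collected in that month, not the number of samples overall.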
I initially thought of having a new lineages table in the database, where the PK is the Pangolin lineage identifier (e.g., B.1.1.7) and which additionally holds metadata such as the WHO designation (e.g., Delta), a VOC flag, the parent lineage, etc.
This is a good resource to fetch such data: https://github.com/cov-lineages/constellations
There may be others.
Subtasks:
Before we implement a search tab, which poses more challenges, we could allow selecting a lineage in the lineages dashboard with a dropdown.
In the ENA dataset we found 105 samples that have an empty collection date. These were processed otherwise properly and they have mutations called.
We want to add an exclusion criterion for new samples coming in from both datasets and apply this exclusion retroactively.
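A sketch of what such an exclusion rule could look like. The `Sample` class and its field names are hypothetical stand-ins for the project's own sample records; the same predicate could be applied to incoming samples and, retroactively, to samples already stored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    # Hypothetical minimal record; the real sample tables hold more fields.
    run_accession: str
    collection_date: Optional[str]


def has_collection_date(sample: Sample) -> bool:
    """True when the collection date is neither missing nor blank."""
    return bool(sample.collection_date and sample.collection_date.strip())


def split_by_collection_date(
    samples: List[Sample],
) -> Tuple[List[Sample], List[Sample]]:
    """Return (kept, excluded) according to the collection-date rule."""
    kept = [s for s in samples if has_collection_date(s)]
    excluded = [s for s in samples if not has_collection_date(s)]
    return kept, excluded
```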
The accessor queries the ENA REST API and stores each sample's metadata (e.g., URLs to FASTQs, country, collection date). Optionally, it also downloads the FASTQ files and stores them in the file structure where they can later be found.
The ENA API was updated in May 2023. We need to check if the changes affect the CoVigator accessor module.
More information here:
https://docs.google.com/document/d/1RPHmK8Pvm9UxSa21Ej3MkGoGYO9baSxwxk_dOuWWyNE/edit#heading=h.jq87g3izg5xr
In the lineages tab, the second plot shows the abundance of each lineage per day. The lines have a lot of local peaks that hamper the interpretation of the plot.
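One possible way to reduce the local peaks, not necessarily the approach the project will take, is to smooth each daily series with a centered rolling mean before plotting. The function below is a plain-Python sketch; its name and the `half_window` parameter are assumptions.

```python
def rolling_mean(values, half_window=3):
    """Centered rolling mean over up to 2 * half_window + 1 points.

    Near the edges the window is truncated, so the output has the same
    length as the input.
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half_window)
        end = min(len(values), i + half_window + 1)
        chunk = values[start:end]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A `half_window` of 3 corresponds to a 7-day window on daily data. Larger values give smoother lines at the cost of blurring genuine short-term changes.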
Within queries.py some SQL queries are built by string formatting. This is a possible entry point for SQL injection.
SQLAlchemy provides the TextClause object for simple SQL queries. In it, you can specify placeholders that are replaced with values from a dictionary when the query is run with the execute method.
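The difference between string formatting and bound parameters can be sketched with the standard-library `sqlite3` module, which is used here instead of SQLAlchemy only to keep the example self-contained. SQLAlchemy's `text()` accepts the same `:name` placeholder style, with the values passed as a dictionary to `execute`. The `sample` table, its columns and the function names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table to run the queries against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (run_accession TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?)",
        [("run1", "Germany"), ("run2", "Germany"), ("run3", "Spain")],
    )
    return conn


def unsafe_count_query(country: str) -> str:
    # Anti-pattern: the value becomes part of the SQL text itself.
    return f"SELECT COUNT(*) FROM sample WHERE country = '{country}'"


def count_samples(conn: sqlite3.Connection, country: str) -> int:
    # The placeholder is bound by the driver, so the value is never
    # interpreted as SQL.
    query = "SELECT COUNT(*) FROM sample WHERE country = :country"
    return conn.execute(query, {"country": country}).fetchone()[0]
```

Passing a crafted value such as `x' OR '1'='1` to the formatted query makes the condition match every row, whereas the bound version simply looks for a country with that literal name.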
Some ideas from our team meeting:
Use lineage metadata from #61 to enrich the dashboard. Update the combo boxes in the lineages plot where we can select the lineages to show.
A search bar in the lineages dashboard that allows searching at least for Pangolin identifiers and WHO designations.
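The search idea above could start from something as simple as a case-insensitive substring match over both fields. The `Lineage` class and `search_lineages` function are hypothetical; in the dashboard the records would come from the lineage metadata of #61.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Lineage:
    # Hypothetical record with the two searchable fields.
    pango_id: str
    who_designation: Optional[str] = None


def search_lineages(lineages: List[Lineage], term: str) -> List[Lineage]:
    """Match the term against Pangolin identifiers and WHO designations."""
    needle = term.strip().lower()
    if not needle:
        return []
    return [
        lineage
        for lineage in lineages
        if needle in lineage.pango_id.lower()
        or (lineage.who_designation is not None
            and needle in lineage.who_designation.lower())
    ]
```

The matching lineages could then be used to populate the options of the combo boxes in the lineages plot.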
It is difficult to map each of these grey boxes on the top left to the corresponding plot.
We tried in the past to capture user feedback by providing buttons to create GitHub issues in the acknowledgments section. We have not received anything so far, either because the dashboard is not yet published and has not reached many people, or because the buttons are somewhat hidden.
It may be helpful to have a more prominent feedback feature, with a visible button available from every location in the dashboard. The article below describes how to do this in a Dash dashboard.
https://medium.com/codex/how-to-create-a-dashboard-with-a-contact-form-using-python-and-dash-ee3aacffd349
Describe the bug
When switching to the samples tab it takes a long time (more than 10 seconds and less than a minute) to show the data.
This behaviour is also reproducible when, already in the samples tab, the GISAID source is selected. When the ENA source is selected everything is much faster.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Something faster!
Additional context
Reproducible from any environment
I thought of having a third button on the landing page that points to an independent URL, rather than having an additional tab within the existing dashboards.
Other ideas?
The constellation repository has been updated. It now contains the defining mutations for the XBB lineages.