Comments (4)
Good idea! We will add a dataset soon. What has been holding us back is that clades are defined only for the HA gene. But we'll add NA datasets and can include Tamiflu resistance mutations as 'labeled mutations'.
from nextclade_data.
We have added NA datasets for A/H3N2, A/H1N1pdm, and B/Vic!
Hi @rneher, thank you for this implementation. I haven't spotted the 'labeled mutations' column when running Nextclade H1N1 NA characterization. Is this in the pipeline, or am I missing something?
@jrotieno As far as I know this hasn't been done for flu datasets.
The split of mutations into labeled and unlabeled mutations requires some additional configuration in the dataset. The labels-to-mutations mapping and the reverse mapping need to be precomputed (from the existing data) and then added to the nucMutLabelMap and nucMutLabelMapReverse fields of the virus_properties.json file of the dataset. This is how it's configured in SARS-CoV-2:
I don't know if Richard and Cornelius plan to do this, but you can also do it yourself by taking the existing virus_properties.json from a flu dataset and adding these fields. Contributions are very welcome!
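As a rough illustration of the precomputation described above, here is a minimal Python sketch that derives both mappings from a label-to-mutations table and adds them to an existing virus properties object. The key format (position followed by nucleotide, such as "823T"), the list-valued entries, the label names, and the helper names `build_label_maps` and `add_label_maps` are all assumptions made for illustration; check them against the SARS-CoV-2 dataset's virus_properties.json before relying on them.

```python
import json
from collections import defaultdict


def build_label_maps(label_to_mutations):
    """Derive the two mappings from a {label: [mutation, ...]} table.

    Returns (mutation -> labels, label -> mutations), both with sorted,
    de-duplicated lists so the output is stable between runs.
    """
    mutation_to_labels = defaultdict(set)
    for label, mutations in label_to_mutations.items():
        for mutation in mutations:
            mutation_to_labels[mutation].add(label)

    forward = {m: sorted(labels) for m, labels in sorted(mutation_to_labels.items())}
    reverse = {label: sorted(set(muts)) for label, muts in sorted(label_to_mutations.items())}
    return forward, reverse


def add_label_maps(virus_properties, label_to_mutations):
    """Return a copy of virus_properties with the two label-map fields set."""
    forward, reverse = build_label_maps(label_to_mutations)
    updated = dict(virus_properties)  # shallow copy; other fields are left untouched
    updated["nucMutLabelMap"] = forward
    updated["nucMutLabelMapReverse"] = reverse
    return updated


# Hypothetical labels and mutations, not taken from any real dataset.
example_labels = {
    "label-A": ["823T", "100G"],
    "label-B": ["823T"],
}
# Stand-in for the parsed contents of an existing virus_properties.json.
existing_properties = {"existingField": "kept as-is"}

props = add_label_maps(existing_properties, example_labels)
print(json.dumps(props, indent=2))
```

In practice you would load the flu dataset's virus_properties.json with `json.load`, pass it through `add_label_maps`, and write the result back out with `json.dump`.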
If you decide to try, you can test things by overriding virus_properties.json: in Nextclade CLI, use the --input-virus-properties CLI parameter; in Nextclade Web, select a flu dataset, click "Customize dataset files" and drop a file in the "Virus properties" section.
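For the CLI route, a small Python sketch like the following can assemble the command line for a test run. Only `--input-virus-properties` comes from the comment above; the `run` subcommand and the `--input-dataset` and `--output-tsv` options are assumptions based on the Nextclade CLI version that uses virus_properties.json, and all file paths and the helper name `nextclade_command` are hypothetical. Check `nextclade run --help` for your installed version before using it.

```python
import shlex


def nextclade_command(dataset_dir, virus_properties, sequences, output_tsv):
    """Build (but do not execute) a Nextclade CLI invocation as an argument list."""
    return [
        "nextclade", "run",                              # assumed subcommand
        "--input-dataset", dataset_dir,                  # assumed option
        "--input-virus-properties", virus_properties,    # option named in the thread
        "--output-tsv", output_tsv,                      # assumed option
        sequences,
    ]


# Hypothetical paths, for illustration only.
cmd = nextclade_command(
    dataset_dir="data/flu_h1n1pdm_na",
    virus_properties="custom/virus_properties.json",
    sequences="sequences.fasta",
    output_tsv="output/nextclade.tsv",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the options have been confirmed.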