Context
Currently we use a long composite clade name for SARS-CoV-2 datasets that combines Nextstrain clade, WHO name, and sometimes legacy extra names like 20H (Beta, V2).
It would be useful to break these up more cleanly, especially since the historic names are only kept for backwards compatibility, not because they are still meaningful.
In the future we would like to cleanly use the following:
- Nextstrain_clade -> 22D
- WHO_name -> Omicron
- Pango lineage is already annotated in the Nextclade_pango column, so no change is necessary.
For a transition period, we would like to keep the old column alongside the new columns.
The migration should work as follows:
Step 1
Add extra columns Nextstrain_clade and WHO_name.
Add a column Nextstrain_legacy, which will maintain the old naming scheme for backwards compatibility.
clade_membership will stay as is in step 1.
These attributes should be written to the TSV/CSV output, but the web version should not display these columns. This may require a small code change/extension to Nextclade by @ivan-aksamentov:
Currently, extra columns are specified in the tree.json as dicts of the form {name, displayName, description}; this should be extended by a showWeb (or similarly named) boolean attribute. If it is set to false, the column is not shown in the web version; it defaults to true to maintain backwards compatibility.
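The intended showWeb semantics can be sketched in Python. This is not Nextclade's own code: the descriptor list, its displayName and description values, and the helper names output_columns and web_columns are hypothetical, and where exactly the list sits inside tree.json is left open here.

```python
from typing import Any, Dict, List

# Hypothetical extra-column descriptors for step 1, shaped like the
# {name, displayName, description} dicts in tree.json plus the proposed showWeb flag.
STEP1_COLUMNS: List[Dict[str, Any]] = [
    # No showWeb key: treated as true, so this column stays visible as today.
    {"name": "clade_membership", "displayName": "Clade",
     "description": "Composite clade name (unchanged in step 1)"},
    {"name": "Nextstrain_clade", "displayName": "Nextstrain clade",
     "description": "Nextstrain clade, e.g. 22D", "showWeb": False},
    {"name": "WHO_name", "displayName": "WHO name",
     "description": "WHO name, e.g. Omicron", "showWeb": False},
    {"name": "Nextstrain_legacy", "displayName": "Legacy clade",
     "description": "Old composite naming scheme, e.g. 20H (Beta, V2)", "showWeb": False},
]


def output_columns(columns: List[Dict[str, Any]]) -> List[str]:
    """Every extra column is written to the TSV/CSV output, whatever showWeb says."""
    return [col["name"] for col in columns]


def web_columns(columns: List[Dict[str, Any]]) -> List[str]:
    """Only columns whose showWeb is true are displayed in the web version.

    A missing showWeb key defaults to True, so existing tree.json files
    without the flag keep their current behaviour.
    """
    return [col["name"] for col in columns if col.get("showWeb", True)]
```

With these descriptors, web_columns returns only ["clade_membership"], while output_columns returns all four names. Defaulting a missing flag to true is what keeps older tree.json files unaffected.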
Data users can start using the new names Nextstrain_clade and WHO_name from step 1 onwards. Those who want to keep using the historic names should switch to Nextstrain_legacy so their software keeps working once we implement step 2.
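A downstream script could start reading the new columns while keeping the historic names available. Below is a minimal sketch using Python's csv module on tab-separated Nextclade output. The helper read_clades and the sample rows are hypothetical, and "clade" is assumed to be the name of the old composite column in the TSV.

```python
import csv
import io
from typing import Dict, List


def read_clades(tsv_text: str) -> List[Dict[str, str]]:
    """Extract clade information from tab-separated Nextclade output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = []
    for row in reader:
        result.append({
            "seqName": row["seqName"],
            # New, clean names available from step 1 onwards ("" if the column is absent).
            "nextstrain_clade": row.get("Nextstrain_clade", ""),
            "who_name": row.get("WHO_name", ""),
            # Historic composite name: prefer Nextstrain_legacy, which keeps the old
            # scheme after step 2; fall back to the "clade" column (assumed name)
            # for output from datasets that predate step 1.
            "legacy": row.get("Nextstrain_legacy") or row.get("clade", ""),
        })
    return result


# Hypothetical sample output after step 1.
SAMPLE = (
    "seqName\tclade\tNextstrain_clade\tWHO_name\tNextstrain_legacy\n"
    "seq1\t20H (Beta, V2)\t20H\tBeta\t20H (Beta, V2)\n"
)
```

For SAMPLE, the single returned record has nextstrain_clade "20H", who_name "Beta" and legacy "20H (Beta, V2)". Because legacy prefers Nextstrain_legacy, the result stays the same after step 2 changes what clade_membership contains.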
Step 2 (due 2023-02-01)
clade_membership will stop using composite names like 20H (Beta, V2) and will instead use the Nextstrain_clade value. The web view will show WHO_name and keep Nextstrain_clade and Nextstrain_legacy hidden.
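Scripts that may encounter clade_membership values from both before and after step 2 could normalise them to the short Nextstrain clade. This is a minimal sketch. It assumes every composite name starts with the Nextstrain clade token (two digits and a capital letter, as in 20H or 22D), optionally followed by a space and a parenthesised suffix. The function name to_nextstrain_clade is hypothetical.

```python
import re

# Assumption: composite names look like "20H (Beta, V2)" and new names like "22D",
# i.e. they begin with two digits plus one capital letter, followed by whitespace or the end.
_CLADE_PREFIX = re.compile(r"^(\d{2}[A-Z])(?=\s|$)")


def to_nextstrain_clade(value: str) -> str:
    """Return the Nextstrain clade part of an old or new clade_membership value."""
    value = value.strip()
    match = _CLADE_PREFIX.match(value)
    # Values that do not fit the assumed pattern are returned unchanged rather than guessed at.
    return match.group(1) if match else value
```

With this sketch, to_nextstrain_clade("20H (Beta, V2)") gives "20H" and to_nextstrain_clade("22D") gives "22D". Where the Nextstrain_clade column is present, reading it directly avoids depending on this naming pattern at all.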
Step 3 (if ever, in the far future)
Nextstrain_legacy is deprecated and removed.
Discussion
As soon as step 1 is implemented, we can start using the new metadata fields in ncov-ingest and in builds, for example as colourings.
Users can likewise start migrating away from the clade column in the metadata, switching to Nextstrain_legacy if they want to keep the old names for backwards compatibility.
For data users, it is step 1 that's critical.
Step 2 completes the migration on the frontend. It should happen roughly 1-2 months after step 1 is implemented, to give users enough warning to make the few code changes necessary.
Implementation
For step 1 to go ahead, we need the code change in Nextclade described above (@ivan-aksamentov) and dataset changes (@corneliusroemer). This should be possible within 2-4 weeks.
Once step 1 is complete, we can communicate the changes and provide a migration guide.
Step 2 just involves @corneliusroemer making another dataset change.