Comments (4)
I'm sorry about the mistaken tags. I believe new issues have those tags by default.
Yes, the index is causing issues and is not really compatible with our analysis system. Our data is chunked and analyzed in independent batches, so the index is meaningless and non-unique when the data is re-combined.
The problem caused by the extra index column is probably going to be easy to fix on our end, but any change in table outputs or command line options does require us to change our code base. In this case, it won't require a database schema update because I think the correct course of action is to simply drop it.
I just wanted to check in because this has never been an identifier in Nextclade output before and, at first glance, it looked trivial (row number).
Thank you for your response, @ivan-aksamentov
from nextclade.
@jacaravas Hi Jason. Yes, this is an intentional addition. I forgot to mention it in the changelog and in the docs.
The index column contains an index of the corresponding FASTA entry. This is a better identifier than sequence name, because sequence names are not unique.
You marked it as a bug. Are you facing any problems because of that addition?
> any change in table outputs or command line options does require us to change our code base
I see. Nextclade is a fast-moving project. We research things and add and remove stuff all the time. So it might be pretty annoying for you :)
One way you could protect your code from breakage is to not rely on column order. We try hard to keep column names stable.
Additionally, you could pin the Nextclade version for some time to avoid sudden breakage, and upgrade periodically, so that things break only when you choose to upgrade and have time to resolve the problems.
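As a rough illustration of the "don't rely on column order" advice, here is a minimal Python sketch that looks columns up by name. The sample table and the `read_rows` helper are made up for illustration, and the column names used (`index`, `seqName`, `clade`) are assumptions about the TSV output; check them against the output of the version you run.

```python
import csv
import io

# Hypothetical, heavily abbreviated table; real output has many more columns.
SAMPLE_TSV = "index\tseqName\tclade\n0\tsampleA\t21J\n1\tsampleA\t22B\n"


def read_rows(handle, wanted=("seqName", "clade")):
    """Yield one dict per row, containing only the wanted columns.

    Columns are looked up by name, so added or reordered columns
    (such as a new leading index column) do not change the result.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        # Fail loudly if a column we depend on was renamed or removed.
        raise KeyError(f"missing expected columns: {missing}")
    for row in reader:
        yield {c: row[c] for c in wanted}


rows = list(read_rows(io.StringIO(SAMPLE_TSV)))
```

With this approach an extra column is ignored unless it is explicitly listed in `wanted`, so dropping it needs no further code changes.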
> Our data is chunked and analyzed in independent batches, so the index is meaningless and non-unique when the data is re-combined.
You should be very careful about attributing particular analysis results back to their original samples (if you need this). People don't take much care in naming their sequences, so the index is the only reliable way to tie outputs back to the inputs. In your case, uniqueness can be ensured, for example, by combining the batch index (or another unique identifier of a batch) with the result index within that batch.
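A minimal sketch of that idea in Python, assuming each batch's results have already been parsed into dicts with an `index` field. The `global_id` and `combine` helpers, the batch names, and the `batch:index` key format are all hypothetical choices, not anything Nextclade provides.

```python
def global_id(batch_id: str, index: int) -> str:
    """Build a key that is unique across batches (format is an arbitrary choice)."""
    return f"{batch_id}:{index}"


def combine(batches):
    """Merge per-batch result rows into one dict keyed by (batch, index).

    `batches` maps a unique batch identifier to that batch's result rows.
    """
    combined = {}
    for batch_id, rows in batches.items():
        for row in rows:
            key = global_id(batch_id, int(row["index"]))
            if key in combined:
                # A repeated index within one batch would indicate a problem.
                raise ValueError(f"duplicate key: {key}")
            combined[key] = row
    return combined


# Two hypothetical batches whose per-batch indices and names collide.
batches = {
    "batch_001": [{"index": "0", "seqName": "sampleA"}],
    "batch_002": [{"index": "0", "seqName": "sampleA"}],
}
merged = combine(batches)
```

Both rows survive the merge under distinct keys, even though their per-batch index and sequence name are identical.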