
Comments (4)

jacaravas commented on September 25, 2024

Hi @ivan-aksamentov

I'm sorry about the mistaken tags. I believe new issues have those tags by default.

Yes, the index is causing issues and is not really compatible with our analysis system. Our data is chunked and analyzed in independent batches, so the index is meaningless and non-unique when the data is re-combined.

The problem caused by the extra index column is probably going to be easy to fix on our end, but any change in table outputs or command-line options does require us to change our codebase. In this case, it won't require a database schema update, because I think the correct course of action is to simply drop the column.
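A minimal sketch of dropping the column on the consumer side. It assumes the Nextclade output is a tab-separated file whose header names the column `index`; the helper `drop_column` is hypothetical and written only for illustration. Matching the column by header name, rather than by position, keeps the sketch working if other columns move.

```python
import csv
import io


def drop_column(tsv_text: str, column: str = "index") -> str:
    """Return the TSV text with `column` removed, matched by header name (assumed to be "index")."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep every header except the one being dropped, preserving the original order.
    fieldnames = [name for name in (reader.fieldnames or []) if name != column]
    out = io.StringIO()
    # extrasaction="ignore" makes the writer skip the dropped key instead of raising.
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t",
        extrasaction="ignore", lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Toy input, not real Nextclade output.
sample = "index\tseqName\tclade\n0\tseqA\tX\n1\tseqB\tY\n"
print(drop_column(sample), end="")
```

If the file has no `index` column (for example, output from an older version), the function returns the table unchanged, so the same code can handle both layouts.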

I just wanted to check in because this has never been an identifier in Nextclade output before and, at first glance, it looked trivial (row number).

Thank you for your response, @ivan-aksamentov


ivan-aksamentov commented on September 25, 2024

@jacaravas Hi Jason. Yes, this is an intentional addition. I forgot to mention it in the changelog and in the docs.

The index column contains the index of the corresponding FASTA entry. This is a better identifier than the sequence name, because sequence names are not guaranteed to be unique.
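The following minimal sketch shows why a positional index is a safer key than the name. It assumes the index is the zero-based position of each record in the input FASTA file; `read_fasta` and `index_records` are hypothetical helpers written for this example, not Nextclade code.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA text, in file order."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def index_records(text: str) -> List[Tuple[int, str, str]]:
    """Attach a zero-based positional index (assumed convention) to every record."""
    return [(i, name, seq) for i, (name, seq) in enumerate(read_fasta(text))]


# Two records share the name "sample1", but their indices differ.
fasta = ">sample1\nACGT\n>sample1\nGGCC\n>sample2\nTTAA\n"
for index, name, seq in index_records(fasta):
    print(index, name, seq)
# 0 sample1 ACGT
# 1 sample1 GGCC
# 2 sample2 TTAA
```

A join on `name` would merge the first two records; a join on the index keeps them apart.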

You marked it as a bug. Are you facing any problems because of that addition?


ivan-aksamentov commented on September 25, 2024

@jacaravas

> any change in table outputs or command line options does require us to change our code base

I see. Nextclade is a fast-moving project. We research things and add and remove stuff all the time. So it might be pretty annoying for you :)

One way you could protect your code from breakage is to not rely on column order. We try hard to keep column names stable.
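A minimal sketch of that advice: look columns up by header name and fail early if an expected column disappears, so that a reordered or extended table does not silently shift values. The names in `REQUIRED_COLUMNS` are assumptions for illustration. Check them against the output of the Nextclade version you actually run.

```python
import csv
import io
from typing import Dict, List

# Hypothetical set of columns a downstream pipeline depends on.
REQUIRED_COLUMNS = ("seqName", "clade")


def load_rows(tsv_text: str) -> List[Dict[str, str]]:
    """Parse a TSV by header name and verify that the required columns exist."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # Fail loudly on a schema change instead of reading the wrong column.
        raise ValueError(f"missing expected columns: {missing}")
    # Keep only the columns we use; new or reordered columns are ignored.
    return [{col: row[col] for col in REQUIRED_COLUMNS} for row in reader]


old_layout = "seqName\tclade\nseqA\tX\n"
new_layout = "index\tclade\tseqName\n0\tX\tseqA\n"
print(load_rows(old_layout) == load_rows(new_layout))
```

Both toy layouts produce the same rows, even though the second adds a column and changes the column order.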

Additionally, you could pin the Nextclade version for a while to avoid sudden breakage, and schedule periodic upgrades where things may break, but only when you want them to and have time to resolve the issues.


ivan-aksamentov commented on September 25, 2024

> Our data is chunked and analyzed in independent batches, so the index is meaningless and non-unique when the data is re-combined.

You should be very careful about attributing particular analysis results back to their original samples (if you need to). People don't pay much attention to how they name their sequences, so the index is the only reliable way to tie outputs back to their inputs. In your case, uniqueness can be ensured, for example, by combining the batch index (or another unique identifier of the batch) with the result's index within that batch.
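A minimal sketch of that suggestion. It assumes each batch has its own identifier (here a hypothetical `batch_id` string) and that each result row carries its per-batch `index` value. The pair `(batch_id, index)` stays unique after the batches are recombined, even though `index` alone does not. `combine_batches` is an illustrative helper, not part of Nextclade.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def combine_batches(batches: Iterable[Tuple[str, List[Row]]]) -> Dict[Tuple[str, int], Row]:
    """Merge per-batch results keyed by (batch_id, index within that batch)."""
    combined: Dict[Tuple[str, int], Row] = {}
    for batch_id, rows in batches:
        for row in rows:
            key = (batch_id, int(row["index"]))
            if key in combined:
                # Happens only if a batch id is reused or a batch repeats an index.
                raise ValueError(f"duplicate key {key}")
            combined[key] = row
    return combined


# Toy batches: both restart their index at 0 and share the name "s1".
batch_a = [{"index": "0", "seqName": "s1"}, {"index": "1", "seqName": "s2"}]
batch_b = [{"index": "0", "seqName": "s1"}, {"index": "1", "seqName": "s3"}]
merged = combine_batches([("batch-a", batch_a), ("batch-b", batch_b)])
print(sorted(merged))
# [('batch-a', 0), ('batch-a', 1), ('batch-b', 0), ('batch-b', 1)]
```

In a database, the same idea corresponds to a composite key over the batch identifier and the per-batch index.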
