Comments (3)

qmarcou commented on July 25, 2024

Hi Jeremy,
Thank you for pointing this out. This is intentional for now, as we do not have a probabilistic in/del model.
For now, alignments with in/dels are discarded by the inference module as a safeguard, because it is not clear how they should be weighted. However, I realize how limiting this might be, and I will make a quick fix so that these sequences are taken into account, albeit without a proper probabilistic treatment of in/dels.
Would that suit your needs?

from igor.

jeremycfd commented on July 25, 2024

Hi @qmarcou! Wanted to follow up on this after thinking about it... I'm interested in using IGoR for estimating Pgen, but many of the sequences that exist for TCR are fairly low-quality Sanger sequences. We have fairly robust pipelines for annotating and finding the CDR3 regions despite the low-quality bases interspersed throughout the sequence, but I haven't yet figured out how to feed our parsed annotation/CDR3 info into your model. I've played around with setting the alignment thresholds to 15 for all segments and deleting all the insertion, deletion, and mismatch information in the IGoR alignment outputs, simply because that allows Pgen to be calculated, but I'm concerned that some of that information is actually necessary to get an accurate Pgen estimate. Do you have any thoughts on this? Would it be better to just put in the CDR3 nucleotide sequences and set the ---thresh to something low so that it will capture V alignments? That's obviously easy to do, but I worry that, by decreasing the amount of V and J sequence available for mapping, we would be inferring incorrect V and J segments...

Thanks!

qmarcou commented on July 25, 2024

Hi,
Your intuition is correct, and you should probably refrain from editing the alignment results: they are used by the inference machinery, and simply removing potential insertions/deletions makes them nonsensical. If you want to avoid considering gapped alignments for now, the safest option would be to set the gap penalty to a very high value (e.g. 9999).
The other solution is connected to a second issue you have opened (#7), and would be to directly provide the V/J templates as inferred by your upstream computational pipeline. Of course this would not be as precise, but it should still give a reasonable Pgen estimate for your sequences.
For me, the cleanest solution remains to handle gapped alignments correctly. I have made a bit of progress there, but the full probabilistic treatment of these in/dels is still quite far off. I'll keep working on this and keep you updated!
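
To illustrate why a very high gap penalty effectively switches off gapped alignments, here is a minimal sketch of a local (Smith-Waterman style) alignment with a linear gap penalty. This is not IGoR's aligner: the function name `local_align`, the match/mismatch scores, and the toy sequences are all assumptions chosen for the example.

```python
def local_align(query, ref, match=5, mismatch=-14, gap_penalty=50):
    """Toy Smith-Waterman local alignment with a linear gap penalty.

    Illustrative only (not IGoR code); scores are arbitrary assumptions.
    Returns (best_score, aligned_query, aligned_ref), gaps shown as '-'.
    """
    rows, cols = len(query) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,       # match or mismatch
                score[i - 1][j] - gap_penalty,   # gap in the reference
                score[i][j - 1] - gap_penalty,   # gap in the query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_q, aligned_r = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if query[i - 1] == ref[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_r.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            aligned_q.append(query[i - 1])
            aligned_r.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_r.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_r))


# Toy data: the read is the template with one base deleted.
template = "ACGTTGCATGCTTAGC"
read = "ACGTTGCAGCTTAGC"

# A cheap gap lets the alignment bridge the deletion.
print(local_align(read, template, gap_penalty=2))
# A prohibitive gap penalty leaves only the best ungapped stretch.
print(local_align(read, template, gap_penalty=9999))
```

With the cheap gap the whole read aligns across the deletion, whereas with a penalty of 9999 opening a gap can never pay off, so only an ungapped stretch is reported.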
