
mixcr's People

Contributors

alex-davydov, amikelov, chudakovdm, dbolotin, denkoren, github-actions[bot], gnefedev, mike-ainsel, mizraelson, nicbarker, poslavskysv, tavinathanson


mixcr's Issues

Extended export options

  • Export canonical clonotypes only: CXX...XX[WF] mask for CDR3
  • Export functional clonotypes only: no * or _ in CDR3, V segment is not a pseudogene
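A minimal sketch of the two filters, assuming the CDR3 is available as an amino acid string and a per-clone pseudogene flag is known. The function names and the clone dict layout are hypothetical, not MiXCR's API. The mask is read as "leading C, one or more residues, trailing W or F", since the issue does not give a minimum length, and '_' is assumed to mark an out-of-frame (incomplete) codon.

```python
import re

# "CXX...XX[WF]": a leading Cys, one or more residues, a trailing Trp or Phe.
# Assumption: at least one residue between the anchors (minimum not specified).
CANONICAL_CDR3 = re.compile(r"C[A-Z]+[WF]")


def is_canonical(cdr3_aa: str) -> bool:
    """True if the whole CDR3 matches the canonical mask."""
    return CANONICAL_CDR3.fullmatch(cdr3_aa) is not None


def is_functional(cdr3_aa: str, v_is_pseudogene: bool) -> bool:
    """True if the CDR3 has no stop ('*') or out-of-frame ('_') symbol and V is not a pseudogene."""
    return "*" not in cdr3_aa and "_" not in cdr3_aa and not v_is_pseudogene


def filter_clones(clones, canonical_only=False, functional_only=False):
    """Keep clones passing the requested filters.

    Each clone is (hypothetically) a dict with keys 'cdr3_aa' and 'v_is_pseudogene'.
    """
    kept = []
    for clone in clones:
        if canonical_only and not is_canonical(clone["cdr3_aa"]):
            continue
        if functional_only and not is_functional(clone["cdr3_aa"], clone["v_is_pseudogene"]):
            continue
        kept.append(clone)
    return kept
```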

Shortcuts for frequently used sets of settings

For example:

rna = -OvParameters.geneFeatureToAlign=VTranscript
full-length = ...
short-full-length = full-length - FR4
dont-cluster = ....

Possible usage:

mixcr align -:rna -:dont-cluster ...

Here dont-cluster will be skipped, as it affects only the assembling stage. Permitting such things allows the same set of parameters to be passed to all stages, which in the end will simplify the implementation of #14 to the following form:

mixcr analyse -r report.txt -:compress-intermediate -:rna -:dont-cluster my_name_R1.fastq.gz my_name_R2.fastq.gz

which will produce the following set of files:

my_name.vdjca.gz
my_name.clns.gz
my_name.txt
report.txt
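One way the proposed -: shortcuts could be expanded per stage, sketched in Python. The preset table is hypothetical: only the rna value comes from the issue, the dont-cluster parameters are elided there, so a placeholder string stands in for them, and the output-naming rule (strip _R1/_R2 and the FASTQ extension) is an assumption inferred from the example file names.

```python
import re

# Hypothetical preset table: shortcut -> {stage: [parameters]}.
# Only "rna" is taken from the issue; the dont-cluster value is a placeholder.
PRESETS = {
    "rna": {"align": ["-OvParameters.geneFeatureToAlign=VTranscript"]},
    "dont-cluster": {"assemble": ["<dont-cluster parameters>"]},
    "compress-intermediate": {},  # only affects output file names
}


def expand_shortcuts(stage, args):
    """Replace each -:name with the parameters it defines for this stage.

    Shortcuts that do not affect the stage expand to nothing, so the same
    argument list can be passed to every stage.
    """
    expanded = []
    for arg in args:
        if arg.startswith("-:"):
            name = arg[2:]
            if name not in PRESETS:
                raise ValueError(f"unknown shortcut: {name}")
            expanded.extend(PRESETS[name].get(stage, []))
        else:
            expanded.append(arg)
    return expanded


def sample_name(r1_path):
    """my_name_R1.fastq.gz -> my_name (assumes the _R1/_R2 naming convention)."""
    base = r1_path.rsplit("/", 1)[-1]
    match = re.match(r"(.+?)_R[12]\.(?:fastq|fq)(?:\.gz)?$", base)
    if match is None:
        raise ValueError(f"cannot derive sample name from {r1_path}")
    return match.group(1)


def output_files(args, r1_path):
    """Output names for an analyse run; intermediates are gzipped on request."""
    name = sample_name(r1_path)
    gz = ".gz" if "-:compress-intermediate" in args else ""
    return [f"{name}.vdjca{gz}", f"{name}.clns{gz}", f"{name}.txt"]
```

The report file is not derived here because the example passes it explicitly with -r.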

Amino acid mutations/alignments with reference genes

From this letter:

Oh well. One more question about the mutations: these are interpretable as SHM, right? Assuming there are no sequencing/PCR errors, i.e. NGS being completely error-free, these mutations would be SHM, and not, as is now the case, a mixture of SHM and NGS-related errors, right? And a suggestion: it would be nice to have them at the amino acid level as well (similar to IMGT).
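A minimal sketch of reporting mutations at the amino acid level, assuming the germline and read segments are already aligned, in the same reading frame, and differ only by substitutions (no indels). It uses the standard genetic code; the S2N-style notation is an assumption, not the IMGT or MiXCR format. Note that it cannot separate SHM from sequencing/PCR errors: every difference is reported.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt):
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[p:p + 3], "X") for p in range(0, len(nt) - 2, 3))


def aa_mutations(germline_nt, read_nt):
    """List amino acid substitutions as <ref><1-based position><alt>, e.g. S2N."""
    if len(germline_nt) != len(read_nt):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    ref_aa, alt_aa = translate(germline_nt), translate(read_nt)
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(ref_aa, alt_aa), start=1)
        if ref != alt
    ]
```

Synonymous nucleotide mutations produce no entry, which is the main difference from a nucleotide-level list.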

Implement actions for simple loci library creation

User story 1 (from IMGT-like reference):

  1. I have a set of FASTA files with reference sequences of V, D, J or C genes.

  2. In each file, sequences are padded with . symbols (or something similar) to align anchor points, so each anchor point has the same position in all sequences (exactly like IMGT gaps).

  3. There is a file or command-line argument with the positions of all anchor points, something like this:

    V=108:117:125:148:157: etc...
    
  4. I can create a new loci library from this information or append it to an already existing one:

    mixcr addReferenceGenes --taxonId 9615 --speciesCommonName dog,canis --locus TRB --geneType V --anchorPoints 108:117:125:148:157 --geneNamePattern '...' input.fasta myLL.ll
    

    This will create the myLL.ll file, or add the locus information to it if it already exists.

    • A loci library file can't store two records with the same combination of taxonId, locus and geneName.
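The core conversion in user story 1 can be sketched as follows, assuming anchor points are 0-based offsets into the gapped (IMGT-style) sequence and that the gene name is the first token of each FASTA header. The helper names and the in-memory LociLibrary class are hypothetical; --geneNamePattern is not modelled because its value is elided in the issue.

```python
def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_anchor_points(spec):
    """'V=108:117:125:' -> ('V', [108, 117, 125]); empty fields are ignored."""
    gene_type, _, points = spec.partition("=")
    return gene_type, [int(p) for p in points.split(":") if p]


def ungapped_anchor_points(gapped_seq, gapped_points, gap_chars="."):
    """Map gapped-alignment positions to ungapped positions (count of non-gap symbols before each)."""
    result = []
    for p in gapped_points:
        if not 0 <= p <= len(gapped_seq):
            raise ValueError(f"anchor point {p} outside sequence of length {len(gapped_seq)}")
        result.append(sum(1 for ch in gapped_seq[:p] if ch not in gap_chars))
    return result


class LociLibrary:
    """Toy in-memory library enforcing unique (taxonId, locus, geneName) records."""

    def __init__(self):
        self.records = {}

    def add(self, taxon_id, locus, gene_name, sequence, anchor_points):
        key = (taxon_id, locus, gene_name)
        if key in self.records:
            raise ValueError(f"duplicate record: {key}")
        self.records[key] = (sequence, anchor_points)


def add_reference_genes(library, fasta_text, taxon_id, locus, anchor_spec):
    """Strip gaps from each sequence and store it with its ungapped anchor points."""
    _, gapped_points = parse_anchor_points(anchor_spec)
    for header, gapped in parse_fasta(fasta_text):
        gene_name = header.split()[0]
        sequence = "".join(ch for ch in gapped if ch != ".")
        library.add(taxon_id, locus, gene_name, sequence,
                    ungapped_anchor_points(gapped, gapped_points))
```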

User story 2 (library from genomic data, MiXCR way of LL creation):

  1. I have a big FASTA file with the genomic sequence of a chromosome or a particular locus.

  2. There is another file with a tab-delimited list of reference genes. Example segments.txt:

    GeneName Locus GeneType AnchorPoints
    TRBV12-3 TRB V 123341:123356:123387:123456
    ... ... ... ...
  3. I can create a new loci library from this information or append it to an already existing one:

    mixcr addReferenceLocus --taxonId 9615 --speciesCommonName dog,canis input.fasta segments.txt myLL.ll
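For user story 2, a sketch of reading segments.txt and cutting each gene out of the genomic sequence, assuming anchor points are 0-based offsets into that sequence and that a gene spans from its first to its last anchor point. Columns are split on any whitespace, so tab-delimited files work; the names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    gene_name: str
    locus: str
    gene_type: str
    anchor_points: List[int]


EXPECTED_HEADER = ["GeneName", "Locus", "GeneType", "AnchorPoints"]


def parse_segments(text: str) -> List[Segment]:
    """Parse the header line plus one gene per line; anchor points are colon-separated."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or lines[0].split() != EXPECTED_HEADER:
        raise ValueError("missing or unexpected header line")
    segments = []
    for line in lines[1:]:
        gene_name, locus, gene_type, points = line.split()
        anchors = [int(p) for p in points.split(":")]
        if anchors != sorted(anchors):
            raise ValueError(f"{gene_name}: anchor points must be increasing")
        segments.append(Segment(gene_name, locus, gene_type, anchors))
    return segments


def extract_gene(genome: str, segment: Segment) -> Tuple[str, List[int]]:
    """Return the gene sequence and its anchor points relative to the gene start."""
    start, end = segment.anchor_points[0], segment.anchor_points[-1]
    if end > len(genome):
        raise ValueError(f"{segment.gene_name}: anchor point {end} beyond genome length")
    return genome[start:end], [p - start for p in segment.anchor_points]
```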
    

Unexpected output of mixcr program (v1.3). Fixed in v1.4?

I am testing the MiXCR program (v1.3) and have found an unusual situation when running 'exportAlignments': the order in which sequences are provided in a FASTA or FASTQ file affects the number of sequences that are successfully aligned.

In the example(s) I provide below, I made a FASTA file containing 7 sequences in total. There are only 4 unique NGS reads in the FASTA file; that is, I repeated one sequence 3x and a second sequence 2x. The remaining two sequences should not return strong hits.

If I run MiXCR using the 7 test sequences (test1.fasta), the MiXCR log file says that 2/7 sequences (rather than 5/7) returned results. This is problematic in that not all 5 are found, but even more problematic is that if I simply change the order of the sequences in the file (test2.fasta), the MiXCR log file says 4/7 (rather than 5/7) returned results.

The fact that I do not see 5/7 sequences successfully returned seems to be a bug. Also, I would not expect the output of exportAlignments to be sensitive to the order of the sequences in a file. Is this correct? If so, is it a known problem?

If it's not a bug, how should I set the parameters so that all 5 sequences are successfully returned when using 'exportAlignments'?

Implement infrastructure for simple loci library installation and usage

This issue is connected to #42

User story 1 (installation of loci library)

  1. If I have a custom loci library and want to use it without specifying the full path, I can put it into one of the following locations:
    • PATH_TO_MIXCR_SCRIPT/reference/ for system-wide installation
    • ~/.mixcr/reference/ for user-local installation
    • working directory . or ./reference
  2. Symlinks in any of these locations should be correctly dereferenced by MiXCR.

User story 2 (usage of custom loci library)

  1. If I have a custom loci library, I can use it, whether or not I have installed it, in the following way:
    • If it is installed as described above:

      mixcr align --lociLibrary myLL ....
      mixcr assemble ...
      mixcr exportClones ...
      

      I don't have to specify the loci library a second time in assemble, as the *.vdjca file already contains this information.

    • If I just have a file somewhere in the file system:

      mixcr align --lociLibrary /path/to/myLL ....
      mixcr assemble  --lociLibrary /path/to/myLL ...
      mixcr exportClones  --lociLibrary /path/to/myLL ...
      

User story 3 (default loci library)

  1. Installed libraries with names other than default.ll will be used only if the user specifies them at the align step; if the --lociLibrary option is not used, the internal mi.ll will be used.
  2. A library installed with the name default.ll will be used by default.
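The lookup rules in these stories could be implemented roughly as below. This is a sketch under stated assumptions: the search directories are tried in the order listed in user story 1, a bare name N matches the file N or N.ll, an installed default.ll takes precedence over the internal mi.ll (returned here as None), and Path.resolve() is used to dereference symlinks. All function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def search_dirs(script_dir, home, cwd) -> List[Path]:
    """Installation locations, in the order listed in user story 1 (assumed priority)."""
    return [
        Path(script_dir) / "reference",       # system-wide
        Path(home) / ".mixcr" / "reference",  # user-local
        Path(cwd),                            # working directory
        Path(cwd) / "reference",
    ]


def _existing_file(path: Path) -> Optional[Path]:
    """Return the symlink-resolved path of `path` or `path`.ll, if either is a file."""
    for candidate in (path, path.with_name(path.name + ".ll")):
        if candidate.is_file():  # is_file() follows symlinks
            return candidate.resolve()
    return None


def resolve_library(option, script_dir, home, cwd) -> Optional[Path]:
    """Resolve the --lociLibrary value; None means the internal mi.ll."""
    dirs = search_dirs(script_dir, home, cwd)
    if option is None:
        for d in dirs:
            found = _existing_file(d / "default.ll")
            if found is not None:
                return found
        return None
    if "/" in option or os.sep in option:
        # Explicit path (absolute or relative to cwd): no directory search.
        found = _existing_file(Path(cwd) / option)
        if found is None:
            raise FileNotFoundError(option)
        return found
    for d in dirs:
        found = _existing_file(d / option)
        if found is not None:
            return found
    raise FileNotFoundError(f"loci library {option!r} is not installed")
```

Whether the working directory should instead take priority over the system-wide location is a design choice the stories leave open.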

RnaSeq parameters as default

My test showed that the RNA-Seq parameters perform slightly better on real, highly enriched datasets, although I expected the opposite effect. With these parameters MiXCR has a nearly zero false-positive rate, and sensitivity is also very high (it detects nearly all V(D)J events even in short 75+75 RNA-Seq datasets).

So, why don't we use these parameters as the default?

Additional testing on a broader spectrum of real enriched datasets is required.

Limit possible set of D genes

Limit the possible set of D genes to the loci of the V and J genes only.

To exclude combinations like:
TRBV -- IGHD -- TRBJ

High priority for clones, lower priority for alignments (vdjca files).
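A minimal sketch of the restriction, assuming each D candidate carries its locus from the loci library and reading "loci of V and J genes" as the set of loci of the assigned V and J genes. The function name and tuple layout are hypothetical.

```python
from typing import Iterable, List, Tuple


def allowed_d_genes(v_locus: str, j_locus: str,
                    d_candidates: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only D candidates (gene_name, locus) whose locus matches the V or J locus."""
    allowed_loci = {v_locus, j_locus}
    return [name for name, locus in d_candidates if locus in allowed_loci]
```

Applied before D alignment, this removes combinations such as TRBV -- IGHD -- TRBJ from consideration.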

Filtered export

  • Export canonical clonotypes only: CXX...XX[WF] mask for CDR3
  • Export functional clonotypes only: no * or _ in CDR3, V segment is not a pseudogene

Fix documentation

  • add options --filter-out-of-frames and --filter-stops to export
  • rename -presetFile to preset-file in export
  • rename -listFields to list-fields in export
  • add description for --save-reads in align
  • add description for --index in assemble
  • add info on the possibility of passing JVM arguments, e.g. mixcr -Xmx2g align ...
  • add description for clones <-> reads mapping (export fields, new actions etc.)
