
vdjtools's People

Contributors

bvdmitri, dbolotin, jbengler, mikessh, rhilker, smoe

vdjtools's Issues

Overlap statistics

Hi everybody!
I am using VDJtools for RNA-seq analysis, but I am not well versed in informatics, mathematics or statistics. I am just a biologist :)
So I am asking how the "relative overlap diversity" is measured. I expected this value to differ substantially from the similarity index, but the two seem to follow the same variations when I overlap my samples...
Thanks in advance for the answer.

Best regards
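
For readers with the same question, here is a minimal Python sketch comparing two overlap measures on clonotype sets. It assumes that relative overlap diversity is the number of shared clonotypes divided by the product of the two sample diversities, and it uses the Jaccard index as the similarity index; both choices are assumptions for illustration and should be checked against the VDJtools documentation. The function name `overlap_metrics` is hypothetical.

```python
def overlap_metrics(sample_a, sample_b):
    """Compare two overlap measures on sets of clonotype identifiers.

    Assumed definitions (not taken from the VDJtools source):
      relative overlap diversity = shared / (div_a * div_b)
      Jaccard index              = shared / size of the union
    """
    a, b = set(sample_a), set(sample_b)
    if not a or not b:
        raise ValueError("both samples must contain at least one clonotype")
    shared = len(a & b)
    return {
        "relative_overlap_diversity": shared / (len(a) * len(b)),
        "jaccard": shared / len(a | b),
    }


metrics = overlap_metrics({"A", "B", "C", "D"}, {"C", "D", "E"})
```

Under these assumed definitions, both values grow with the number of shared clonotypes and differ only in how they are normalized, so they may well move together across sample pairs of similar size.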

Optimize DB lookup

  • Use ClonotypeKeyGenerator to reverse hash<->iterable implementation of DB lookup which is now very computationally inefficient (yet memory efficient)
  • Consider adding this together with low-mem option

EDIT: Use suffix tree searches (SequenceTreeMap from milib)

Additional options for Convert util

Clonotypes and frequencies:

  • Re-normalizing sample
  • Collapsing duplicates

Segments:

  • Re-calculating V/D/J boundaries (requires a built-in aligner)
  • Refinement/de novo of D mappings

Diversity measures

  • Add D50 diversity estimate
  • Re-organize diversity measures into "estimate" and "index".. or "richness" and "diversity".. ?
  • Test diversity measures
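
As a rough illustration of the D50 item above, the following Python sketch computes one common variant of D50: the smallest number of top-ranked clonotypes whose reads account for at least half of all reads. Definitions of D50 vary (some report it as a percentage of unique clonotypes), so the exact convention here is an assumption, and the function name `d50` is hypothetical.

```python
def d50(counts):
    """Smallest number of top clonotypes covering at least 50% of reads.

    `counts` is an iterable of per-clonotype read counts.
    The convention used here is an assumption, not the VDJtools definition.
    """
    counts = list(counts)
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    running = 0
    # Walk clonotypes from most to least abundant.
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if 2 * running >= total:
            return rank


top_heavy = d50([50, 30, 10, 10])
even = d50([1, 1, 1, 1])
```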

NullPointerException in CalcCdrAAProfile routine

A java NullPointerException occurs if I call the CalcCdrAAProfile routine. As far as I can see this seems to happen even before command line arguments are read. Here's what the _vdjtools_error.log says:

[Wed Jul 27 11:37:35 CEST 2016 BEGIN]
[Script]
CalcCdrAAProfile
[CommandLine]
executing vdjtools-1.1.0.jar CalcCdrAAProfile -h
[Message]
java.lang.NullPointerException: Cannot invoke method join() on null object
[StackTrace-Short]
com.antigenomics.vdjtools.profile.CalcCdrAAProfile.run(CalcCdrAAProfile.groovy:41)
com.antigenomics.vdjtools.profile.CalcCdrAAProfile$run.call(Unknown Source)
com.antigenomics.vdjtools.misc.ExecUtil.run(ExecUtil.groovy:94)
com.antigenomics.vdjtools.misc.ExecUtil$run.call(Unknown Source)
com.antigenomics.vdjtools.VdjTools.run(VdjTools.groovy:207)
com.antigenomics.vdjtools.VdjTools.main(VdjTools.groovy)
[StackTrace-Full]
java.lang.NullPointerException: Cannot invoke method join() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:88)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:32)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
    at com.antigenomics.vdjtools.profile.CalcCdrAAProfile.run(CalcCdrAAProfile.groovy:41)
    at com.antigenomics.vdjtools.profile.CalcCdrAAProfile$run.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
    at com.antigenomics.vdjtools.misc.ExecUtil.run(ExecUtil.groovy:94)
    at com.antigenomics.vdjtools.misc.ExecUtil$run.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
    at com.antigenomics.vdjtools.VdjTools.run(VdjTools.groovy:207)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1085)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:909)
    at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:901)
    at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:884)
    at org.codehaus.groovy.runtime.InvokerHelper.runScript(InvokerHelper.java:406)
    at org.codehaus.groovy.runtime.InvokerHelper$runScript.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
    at com.antigenomics.vdjtools.VdjTools.main(VdjTools.groovy)
[END]

V-D-J insert sizes bug?

Investigate V/D/J end/start coords reporting convention in various tools and update clonotype parsing accordingly

Sample pool

Implement sample pool (mass join)
Total diversity estimate for a population of samples using Chao2
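
A minimal Python sketch of the Chao2 estimate for a pool of samples follows. It uses the bias-corrected form of Chao2, S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1)), where m is the number of samples, Q1 is the number of clonotypes seen in exactly one sample and Q2 the number seen in exactly two. Representing each sample as a set of clonotype keys, and the function name `chao2`, are assumptions made for illustration.

```python
from collections import Counter


def chao2(samples):
    """Bias-corrected Chao2 richness estimate from incidence data.

    `samples` is a list of iterables of clonotype keys, one per sample.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sample is required")
    incidence = Counter()
    for sample in samples:
        # Count each clonotype at most once per sample.
        incidence.update(set(sample))
    s_obs = len(incidence)
    q1 = sum(1 for n in incidence.values() if n == 1)
    q2 = sum(1 for n in incidence.values() if n == 2)
    # The bias-corrected form stays defined when q2 == 0.
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


pool = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
estimate = chao2(pool)
```

The bias-corrected form was chosen here over the classic S_obs + Q1^2 / (2 * Q2) because it does not divide by zero when no clonotype occurs in exactly two samples.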

Implement PWM decomposition of repertoire

  • Collect N x M pwm matrices, where
    • N is the number of V segments
    • M is the number of peaks in the spectratype
  • Extract sequence motifs
    • Under some given threshold (e.g. if top AA is found in < 75% reads put X)
    • Consider visualization using R
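
The motif-extraction step above could be sketched in Python as follows. The sketch takes a group of equal-length CDR3 amino acid sequences (for example, one V segment and one spectratype peak), counts residues per position, and writes X wherever the most frequent residue is found in fewer than 75% of the sequences. Counting each sequence as one read and the function name `consensus_motif` are assumptions; a real implementation would probably weight by clonotype read count.

```python
from collections import Counter


def consensus_motif(sequences, threshold=0.75):
    """Consensus of equal-length sequences, with X at ambiguous positions.

    A position keeps its top residue only if that residue occurs in at
    least `threshold` of the sequences; otherwise it becomes "X".
    """
    if not sequences:
        raise ValueError("no sequences given")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    motif = []
    for pos in range(length):
        column = Counter(seq[pos] for seq in sequences)
        residue, count = column.most_common(1)[0]
        motif.append(residue if count / len(sequences) >= threshold else "X")
    return "".join(motif)


motif = consensus_motif(["CASSL", "CASSF", "CASSL", "CATSL"])
```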

Error converting from MiTCR

I'm using MiTCR latest version on a TRA experiment (mouse). The conversion fails like this - can you advise on what is wrong?

$ vdjtools Convert -S mitcr full/mid1_clones.csv vdjtools/mid1_clones.csv
Executing com.antigenomics.vdjtools.misc.Convert -S mitcr full/mid1_clones.csv vdjtools/mid1_clones.csv
[Mon Sep 12 13:18:04 CEST 2016 Convert] Reading sample(s)
[Mon Sep 12 13:18:04 CEST 2016 Convert] 1 sample(s) loaded
[Mon Sep 12 13:18:04 CEST 2016 SampleStreamConnection] Loading sample mid1_clones
[ERROR] java.lang.RuntimeException: Unable to parse clonotype string 1  2687    0.0605548419083677  TGTGCTTTGCGGGGGCAGCAAGGCACTGGGTCTAAGCTGTCATTT   GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG   TRAV12-1*03(306.7),TRAV12-1*04(306.6),TRAV12D-2*04(300.1),TRAV12D-2*02(298.9),TRAV12N-2*01(298.6)       TRAJ58*01(308.7)        270|279|304|0|9||45.0;270|279|302|0|9||45.0;267|276|300|0|9||45.0;267|276|301|0|9||45.0;267|276|301|0|9||45.0       21|52|83|14|45||155.0                                               TGTGCTTTGCGGGGGCAGCAAGGCACTGGGTCTAAGCTGTCATTT   38                              CALRGQQGTGSKLSF     :::::::::0::9:::::14::45::: for MiTcr input type., see _vdjtools_error.log for details

R wrappers and template

  • Add R wrappers for most useful VDJtools output types
  • Refactor existing scripts based on them
  • Add R template demonstrating running VDJtools and managing metadata

Additional analysis modes/output

Add R scripts for

  • Clonal homeostasis
  • PCA based on V/J usage profiles
  • A plot for sample pool #3
  • Visualize overlap between up to 10 repertoires in a way other than chord graph

Implement management of "unresolved" v d j families

We have ImmunoSeq datasets containing the value "unresolved" in the columns v, d or j.
If this occurs at least once in both v and d, the fancyvj plot fails.

I think the reason is line 75 of vj_pairing_plot.r. This line, grid.col = c(rcols, ccols), produces a vector with two occurrences of "unresolved" (one from v and one from d). Upon conversion to a factor, this messes up the plotting.

However, you could also argue that filtering out "unresolved" before the analysis would be the preferred way to go anyway. This is just to let you know that we ran into this problem.

All the best and thanks a lot for this amazing piece of software!

Double bug fix

Sample Annotation class

database.each { String seq ->
    dbCdrFreqs.put(seq, 0)
}

causes an error (0 is an Integer value).

Resolved by

database.each { String seq ->
    dbCdrFreqs.put(seq, (Double) 0)
}

Error on R step

Hello all.
I installed vdjtools 1.2 on my server as root.
But when my users run the tool, they get this error:

$ /srv/dna_tools/vdjtools-1.1.1/vdjtools CalcSegmentUsage -m /shared/vdjtools/CalcSegmentUsage/input/metadata.txt -p -f age -n /shared/vdjtools/CalcSegmentUsage/output/2

[RUtil] Executing Rscript vexpr_plot.r /shared/vdjtools/CalcSegmentUsage/output/2.segments.wt.V.txt 48 0 3 TRUE /shared/vdjtools/CalcSegmentUsage/output/2.segments.wt.V.pdf
[ERROR] Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

lowess

Loading required package: RColorBrewer
Loading required package: ggplot2
Loading required package: plotrix

Attaching package: ‘plotrix’

The following object is masked from ‘package:gplots’:

plotCI

Error in pdf(fname) :
cannot open file '/shared/vdjtools/CalcSegmentUsage/output/2.segments.wt.V.pdf'
Calls: custom.dev -> pdf
Execution halted

[RUtil] Executing Rscript vexpr_plot.r /shared/vdjtools/CalcSegmentUsage/output/2.segments.wt.J.txt 13 0 3 TRUE /shared/vdjtools/CalcSegmentUsage/output/2.segments.wt.J.pdf
[ERROR] Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

lowess

Loading required package: RColorBrewer
Loading required package: ggplot2
Loading required package: plotrix

Attaching package: ‘plotrix’

The following object is masked from ‘package:gplots’:

plotCI

Error in pdf(fname) :
cannot open file '/shared/vdjtools/CalcSegmentUsage/output/2.segments.wt.J.pdf'
Calls: custom.dev -> pdf
Execution halted

All files in the tool folder have permission 777.
vdjtools Rinstall ran correctly and installed the packages into the Rpackages folder inside the tool folder. When I run the tool as root, everything works.
Why does this error occur?

PlotFancySpectratype fatal error (Mac OS X El Capitan, R 3.3.1)

Hi there!

I am running the latest version of VDJtools on Mac OS X (installed via Homebrew). I can't run PlotFancySpectratype, as the following error pops up:


Executing com.antigenomics.vdjtools.basic.PlotFancySpectratype VDJ_.3_Nt-sequences.txt FINAL
[Thu Nov 17 21:31:17 CST 2016 PlotFancySpectratype] Reading sample
[Thu Nov 17 21:31:17 CST 2016 SampleStreamConnection] Loading sample VDJ_.3_Nt-sequences
[Thu Nov 17 21:31:18 CST 2016 ClonotypeStreamParser] Finished parsing. 1 header and 0 bad line(s) were skipped.
[Thu Nov 17 21:31:18 CST 2016 SampleStreamConnection] Loaded sample VDJ_.3_Nt-sequences with 508 clonotypes and 518 cells. Memory usage: 4 of 8 GB
[Thu Nov 17 21:31:18 CST 2016 PlotFancySpectratype] Writing output and plotting data
[RUtil] Executing Rscript fancy_spectratype.r FINAL.fancyspectra.txt FINAL.fancyspectra.pdf Clonotype TRUE
[ERROR] Loading required package: ggplot2

 *** caught segfault ***
address 0x18, cause 'memory not mapped'

Traceback:
 1: dyn.load(file, DLLpath = DLLpath, ...)
 2: library.dynam(lib, package, package.lib)
 3: loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]])
 4: asNamespace(ns)
 5: namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc,     .libPaths()), versionCheck = vI[[j]]), i[[2L]], from = package)
 6: loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]])
 7: namespaceImport(ns, loadNamespace(i, c(lib.loc, .libPaths()),     versionCheck = vI[[i]]), from = package)
 8: loadNamespace(package, lib.loc)
 9: doTryCatch(return(expr), name, parentenv, handler)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
11: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call)[1L]        prefix <- paste("Error in", dcall, ": ")        LONG <- 75L        msg <- conditionMessage(e)        sm <- strsplit(msg, "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if (is.na(w))             w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],                 type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditionMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && identical(getOption("show.error.messages"),         TRUE)) {        cat(msg, file = stderr())        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
13: try({    attr(package, "LibPath") <- which.lib.loc    ns <- loadNamespace(package, lib.loc)    env <- attachNamespace(ns, pos = pos, deps)})
14: library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,     warn.conflicts = warn.conflicts, quietly = quietly)
15: doTryCatch(return(expr), name, parentenv, handler)
16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
17: tryCatchList(expr, classes, parentenv, handlers)
18: tryCatch(library(package, lib.loc = lib.loc, character.only = TRUE,     logical.return = TRUE, warn.conflicts = warn.conflicts, quietly = quietly),     error = function(e) e)
19: require(ggplot2)
An irrecoverable exception occurred. R is aborting now ...

The error occurs when I run the following part of the fancy_spectratype.R script in RStudio:

ggplot(df.m, aes(x = Len, y = value, fill = variable)) +
  geom_bar(width = 1, stat = "identity") +
  xlab("CDR3 length, bp") +
  labs(fill=label) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  scale_fill_manual(values=c("grey75", pal)) +
  theme_bw() +
  theme(legend.text=element_text(size=8), axis.title.y=element_blank()) +
  guides(fill = guide_legend(reverse = TRUE))

Currently running R 3.3.1, and have been able to use PlotVJFancy and CalcBasicStats and CalcSpectratype without issue. Thanks in advance for your help!

CDR3 amino acid physical properties

  • Implement calculation of CDR3 aa property profiles using various amino acid groupings that are based on their physical properties
    • Basic properties: polar/non-polar, etc
    • Kidera factors

Major refactoring

Utilize methods from Common/Misc classes instead of manual implementation in worker scripts

Are mouse gene lists available?

Hi Mikhail and @antigenomics team!

Have really enjoyed using VDJTools so far for human data. Had to do a vanilla install of R 3.3.0 to get it working on my Mac but otherwise no issues until I tried to import IMGT alignments of mouse TRB sequences. The overwhelming majority of reads were rejected as bad lines (e.g. 38/18276 lines were retained after conversion from IMGT) -- is this because the mouse gene lists are currently unsupported in VDJTools? Thanks a ton for your help.

Re-normalization

Implement a re-normalizing utility (clonotype frequencies in the output sum to 1.0)
Consider cases where clonotype frequencies need to be retained (total < 1.0), e.g. VDJdb output re-annotation. Perhaps retaining them can be made the default option.
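
The core of such a utility is small. Here is a minimal Python sketch, assuming frequencies are held as a plain list of floats; the function name `renormalize` is hypothetical and not part of VDJtools.

```python
def renormalize(freqs):
    """Rescale clonotype frequencies so that they sum to 1.0."""
    freqs = list(freqs)
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must have a positive sum")
    return [f / total for f in freqs]


# A filtered sample whose remaining frequencies sum to 0.4.
rescaled = renormalize([0.2, 0.1, 0.1])
```

Skipping this call is all it takes to retain the original frequencies, which is why the choice of default matters more than the implementation.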

Search sample(s) for CDR3 pattern

  • Both amino acid and nucleotide
  • Only basic implementation, e.g. XGGXX; more complex searches should be done via cdr3align
  • (?) Additional filters, e.g. V/J/CDR3 length
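
A basic search of this kind can be sketched in Python with the standard `re` module. The sketch assumes that X in a pattern such as XGGXX stands for any single character and that a match anywhere inside the CDR3 counts; both are assumptions, as are the function names.

```python
import re


def compile_cdr3_pattern(pattern, wildcard="X"):
    """Turn a simple pattern like XGGXX into a compiled regular expression.

    The wildcard character matches any single residue or base; every
    other character must match literally.
    """
    parts = ["." if ch == wildcard else re.escape(ch) for ch in pattern.upper()]
    return re.compile("".join(parts))


def find_matches(cdr3_list, pattern):
    """Return the CDR3 sequences that contain the pattern anywhere."""
    regex = compile_cdr3_pattern(pattern)
    return [seq for seq in cdr3_list if regex.search(seq.upper())]


hits = find_matches(["CASSGGQYF", "CASSLAPF", "CAGGYEQ"], "XGGXX")
```

The same functions work for nucleotide sequences, although a nucleotide search might prefer N as the wildcard, which can be passed to `compile_cdr3_pattern` directly.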

Documentation

Introduction

VDJtools is an open-source Java/Groovy-based framework designed to facilitate analysis of immune repertoire sequencing (RepSeq) data. VDJtools computes a wide set of statistics and is able to perform diverse cross-sample analysis. Both comprehensive tabular output and publication-ready plots are provided.

RepSeq link does not work
http://www.ncbi.nlm.nih.gov/pubmed/22043864a

Plotting error

Likely an issue with ggplot

java -Xmx20G -jar vdjtools-1.0.7.jar OverlapPair  -p ./samples/TW437.txt ./samples/TW438.txt out/A
Executing com.antigenomics.vdjtools.overlap.OverlapPair -p ./samples/TW437.txt ./samples/TW438.txt out/A
[Sat Mar 26 23:15:51 CST 2016 OverlapPair] Reading samples ./samples/TW437.txt and ./samples/TW438.txt
[Sat Mar 26 23:15:51 CST 2016 SampleStreamConnection] Loading sample TW437
[Sat Mar 26 23:15:53 CST 2016 ClonotypeStreamParser] Finished parsing. 1 header and 0 bad line(s) were skipped.
[Sat Mar 26 23:15:53 CST 2016 SampleStreamConnection] Loaded sample TW437 with 18262 clonotypes and 3272130 cells. Memory usage: 1 of 18 GB
[Sat Mar 26 23:15:53 CST 2016 SampleStreamConnection] Loading sample TW438
[Sat Mar 26 23:15:54 CST 2016 ClonotypeStreamParser] Finished parsing. 1 header and 0 bad line(s) were skipped.
[Sat Mar 26 23:15:54 CST 2016 SampleStreamConnection] Loaded sample TW438 with 28973 clonotypes and 3317916 cells. Memory usage: 1 of 18 GB
[Sat Mar 26 23:15:54 CST 2016 OverlapPair] Intersecting
[Sat Mar 26 23:15:54 CST 2016 Overlap] Intersecting samples #0 and 1
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing Correlation
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing Diversity
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing Frequency
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing Frequency2
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing vJSD
[Sat Mar 26 23:15:54 CST 2016 SegmentUsage] Processing sample TW437
[Sat Mar 26 23:15:54 CST 2016 SegmentUsage] Processing sample TW438
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing vjJSD
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing vj2JSD
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing sJSD
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing Jaccard
[Sat Mar 26 23:15:54 CST 2016 OverlapEvaluator] Computing MorisitaHorn
[Sat Mar 26 23:15:54 CST 2016 OverlapPair] Writing output
[Sat Mar 26 23:15:54 CST 2016 OverlapPair] Plotting
[RUtil] Executing Rscript a5816912-425f-4434-9974-9619cf6d4d03_intersect_pair_scatter.r TW437 TW438 out/A.xy.txt out/A.xx.txt out/A.yy.txt out/A.strict.paired.scatter.pdf
null device
          1
[RUtil] Executing Rscript e6669afc-2f85-4cfb-b7cc-4fd8152689da_intersect_pair_area.r TW437 TW438 out/A.paired.strict.table.collapsed.txt out/A.paired.strict.table.collapsed.pdf
[ERROR] During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"
6: Setting LC_PAPER failed, using "C"
7: Setting LC_MEASUREMENT failed, using "C"
Loading required package: ggplot2
Loading required package: RColorBrewer
Error: Unknown parameters: guide
Execution halted
