The clustirr from snaketron

Benchmarking needed

gliphR vs. Jan's turboGliph vs. gliph's original versions vs. ting

Benchmarking to show
a) consistency: we should find similar results across tools given a simple sample
b) that the choices made in gliphR are biologically more suitable, by examining clonally expanded samples

on b) do we have an appropriate naive reference?

At first, benchmark internally against ground truth as a sanity check;
later on, maybe include the benchmark as a vignette in a follow-up version

What should be the main input (data_sample) of gliphR?

The original gliph algorithm uses as input the following:

minimum: vector of CDR3b sequences
maximum: data.frame with CDR3b + V + J (+ 3 columns for alpha chain)

To use V+J information in a way that affects local/global clustering, the
user also has to set additional input parameters. It is my impression that very few users
do this, i.e. most users will provide only CDR3b sequences as input.

Hence my suggestion for gliphR:

We use as main input (parameter data_sample) a data.frame with 1 or 2 columns:

if 1 column -> the column has to be named CDR3a or CDR3b
if 2 columns -> the columns will represent CDR3a and CDR3b (order not relevant)

What do you think?
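
The proposal above could be enforced with a check along the following lines. This is only a sketch: the helper name check_data_sample is hypothetical and not part of gliphR, and the example sequences are made up for illustration.

```r
# Hypothetical helper (not part of gliphR): validates the proposed
# data_sample format, a data.frame with columns CDR3a and/or CDR3b.
check_data_sample <- function(data_sample) {
  allowed <- c("CDR3a", "CDR3b")
  if (!is.data.frame(data_sample)) {
    stop("data_sample must be a data.frame")
  }
  if (!ncol(data_sample) %in% c(1L, 2L)) {
    stop("data_sample must have 1 or 2 columns")
  }
  cols <- colnames(data_sample)
  if (anyDuplicated(cols) > 0 || !all(cols %in% allowed)) {
    stop("columns must be named CDR3a and/or CDR3b")
  }
  # Column order is not relevant, so return the columns in a fixed order.
  data_sample[, intersect(allowed, cols), drop = FALSE]
}

# Illustrative sequences only.
one_chain <- check_data_sample(
  data.frame(CDR3b = c("CASSLGQYEQYF", "CASSPGTEAFF"))
)
two_chains <- check_data_sample(data.frame(
  CDR3b = c("CASSLGQYEQYF", "CASSPGTEAFF"),
  CDR3a = c("CAVRDSNYQLIW", "CAASGGSYIPTF")
))
```

Returning the columns in a fixed order means downstream code does not have to care how the user ordered them.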

General TODOs to make the R package suitable for Bioconductor

This is a general TODO list

graph.R and plot_graph.Rd

In some of our earlier meetings we talked about the visualization of ClustIRR. This is a summary:

The expand_clones parameter is not needed. Removing it makes the algorithm more tractable in terms of computation and memory.
plot_graph.Rd -> graph.Rd: it is more convenient to name the man files after the scripts they describe. In my opinion, the text of this man page needs more work. A novice user should be able to use it as a reference to interpret every visual symbol (shape, color, etc.) they encounter in the output.

If we remove expand_clones we can also simplify the vignette.

additional vignette about version comparison

Add get_igraph() function

replace or complement get_edges() function with function to receive igraph compatible data object
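
A rough sketch of what such a function could return. The column names from, to and type are assumptions about the get_edges() output, and as_igraph_input is a hypothetical helper name; only igraph::graph_from_data_frame() is an existing igraph function.

```r
# Assumed edge table: one row per edge. Column names are hypothetical and
# the sequences are illustrative only.
edges <- data.frame(
  from = c("CASSLGQYEQYF", "CASSPGTEAFF", "CASSLGQYEQYF"),
  to   = c("CASSLGQFEQYF", "CASSPGTEAYF", "CASSLGQYEAFF"),
  type = c("global", "global", "local")
)

# Hypothetical helper: returns the two tables that
# igraph::graph_from_data_frame() accepts (edges and vertices).
as_igraph_input <- function(edges) {
  vertices <- data.frame(name = sort(unique(c(edges$from, edges$to))))
  list(edges = edges, vertices = vertices)
}

g_input <- as_igraph_input(edges)

# Build the graph only if igraph is installed; directed = FALSE gives an
# undirected graph.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(g_input$edges, directed = FALSE,
                                     vertices = g_input$vertices)
}
```

Returning plain data frames keeps the object usable without igraph, while a user who has igraph installed can build the graph in one call.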

small examples at the end of man files

Integrate trim warning

directed graph?

https://github.com/snaketron/ClustIRR/blob/531e13aa1969e4b94f6e432d0f3457809df00c98/R/graph.R#L120C11-L120C32

@kaozkai

I thought we were creating an undirected (simple) graph. Is this correct?

Finish vignette

TODO:

finish section about edges
transfer the remaining parts about version 1 and version 2 into the comparison vignette
add section with minimal graphical output

For global distances allow "interface" to tcrdist3

Internally, global distances between CDR3 sequences are computed using the Hamming distance.

Well-known tools such as tcrdist3 already exist that can compute global distances using more sophisticated metrics. We should at least provide an interface that skips the internal global distance computation and uses the distances estimated by complementary methods.
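
A minimal sketch of how such an interface might look. The function get_global_dist, its global_dist argument and the from/to/dist column names are all hypothetical, and returning NA for sequences of unequal length is an assumption rather than a description of what the package does.

```r
# Hamming distance between two CDR3 sequences of equal length. Returns NA
# when lengths differ, because the Hamming distance is undefined there
# (assumption for this sketch).
hamming_dist <- function(a, b) {
  if (nchar(a) != nchar(b)) {
    return(NA_integer_)
  }
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical interface: if the caller supplies a distance table (for
# example one exported from tcrdist3), use it and skip the internal
# computation.
get_global_dist <- function(cdr3, global_dist = NULL) {
  if (!is.null(global_dist)) {
    stopifnot(is.data.frame(global_dist),
              all(c("from", "to", "dist") %in% colnames(global_dist)))
    return(global_dist)
  }
  pairs <- t(utils::combn(cdr3, 2))  # all unordered pairs of sequences
  data.frame(
    from = pairs[, 1],
    to = pairs[, 2],
    dist = mapply(hamming_dist, pairs[, 1], pairs[, 2], USE.NAMES = FALSE)
  )
}

# Illustrative sequences only.
cdr3 <- c("CASSLGQYEQYF", "CASSLGQFEQYF", "CASSPGTEAFF")
d_internal <- get_global_dist(cdr3)

# Externally computed distances are passed through unchanged.
d_external <- data.frame(from = cdr3[1], to = cdr3[2], dist = 12.5)
d_used <- get_global_dist(cdr3, global_dist = d_external)
```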

Find gliph3 difference supporting dataset for vignette

Single-cell RNA sequencing coupled to TCR profiling of large granular lymphocyte leukemia T cells

Data with HLA included

man files (e.g. using roxygen)

Check gliphR 1 algorithm

Find out where the noise/scattering in the OvE values comes from when identical input is provided (possibly related to sampling with replacement)
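
One way to test the sampling hypothesis is to check whether fixing the random seed removes the scatter. The toy below is only an assumption about how an observed-vs-expected ratio might be computed from a resampled reference. It is not the gliphR code, and the function name, motif and sequences are made up.

```r
# Toy observed-vs-expected (OvE) ratio for one motif. This is an assumed
# definition for illustration, not the gliphR implementation.
ove <- function(sample_seqs, ref_seqs, motif, n_sim = 100) {
  observed <- sum(grepl(motif, sample_seqs, fixed = TRUE))
  # Expected count: mean motif count over random draws from the reference,
  # sampled with replacement at the size of the sample.
  expected <- mean(replicate(n_sim, {
    draw <- sample(ref_seqs, size = length(sample_seqs), replace = TRUE)
    sum(grepl(motif, draw, fixed = TRUE))
  }))
  observed / expected
}

ref_seqs <- c(rep("CASSLGQYEQYF", 20), rep("CASSPGTEAFF", 80))
sample_seqs <- c(rep("CASSLGQYEQYF", 5), rep("CASSPGTEAFF", 5))

# With the same seed, two runs on identical input give identical values.
set.seed(1)
run1 <- ove(sample_seqs, ref_seqs, "LGQ")
set.seed(1)
run2 <- ove(sample_seqs, ref_seqs, "LGQ")
```

If the scatter in the real OvE values disappears once the seed is fixed, the resampling step is the likely source.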

NAMESPACE file

@kaozkai Which functions should be exported by ClustIRR?

Currently, we export:

export(cluster_irr)
export(get_graph)
export(get_edges)
export(plot_graph)

I would guess we need cluster_irr, get_graph and plot_graph.
Do we really need to have get_edges visible to the user?

add comparison plots
add section with minimal graphical output

Add additional features

Update gliphR with additional features after final release

Future ToDo List, for example scoring

clustering algorithm in gliphR -> Bioconductor package

General procedure:

make gliphR Bioconductor conform (test, input checks, vignettes, document, ...)
submit clustering algorithm to Bioconductor
update gliphR with scoring (quantification part)

Bioconductor changes

Incorporate all changes once the package is submitted

Future ToDo List

New input checks

In function R/input_checks we perform parameter checks.

We can borrow Jan's checks, but rename the parameters.

Additional checks have to be done for missing parameter
values, NAs, or NULLs, etc.
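
As a starting point, a check for a single numeric parameter might look like the sketch below. The helper name check_single_number and the parameter name "cores" are hypothetical and only serve as an example.

```r
# Hypothetical helper: rejects missing, NULL, non-numeric, non-scalar and
# NA values. `name` is only used to build the error message.
check_single_number <- function(x, name) {
  if (missing(x) || is.null(x)) {
    stop(name, " is missing or NULL")
  }
  if (length(x) != 1L || !is.numeric(x)) {
    stop(name, " must be a single number")
  }
  if (is.na(x)) {
    stop(name, " must not be NA")
  }
  invisible(TRUE)
}

# Small wrapper to capture the error message instead of stopping.
check_msg <- function(expr) {
  tryCatch({
    expr
    "ok"
  }, error = function(e) conditionMessage(e))
}

msg_ok <- check_msg(check_single_number(2, "cores"))
msg_null <- check_msg(check_single_number(NULL, "cores"))
msg_na <- check_msg(check_single_number(NA_real_, "cores"))
msg_chr <- check_msg(check_single_number("2", "cores"))
```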

package size: Package tarball exceeds the Bioconductor size requirement. Is: 6.2 MB, should be: < 5 MB (added hs_CD4_ref.RData to build ignore for now)
valid maintainer: Remove Maintainer field. Use Authors@R [cre] designation.
vignette directory: No 'vignettes' directory. (created dummy vignette)
man page documentation: At least 80% of man pages documenting exported objects must have runnable examples. (added minimal example, to be expanded with meaningful one)

Warnings

DESCRIPTION/NAMESPACE consistency: Import methods in NAMESPACE as well as DESCRIPTION.
coding practice: Avoid T/F variables; If logical, use TRUE/FALSE
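
The reason behind the T/F warning is that T and F are ordinary variables that can be overwritten, whereas TRUE and FALSE are reserved words. A short illustration:

```r
# T is just a variable bound to TRUE in base R, so it can be masked.
T <- 0
masked_result <- isTRUE(T)     # T now holds 0, so this is FALSE
literal_result <- isTRUE(TRUE) # TRUE cannot be reassigned
rm(T)                          # remove the masking variable again
```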

Notes

Registration

support site registration: Unable to connect to support site: Peer certificate cannot be authenticated with given CA certificates: [support.bioconductor.org] SSL certificate problem: certificate has expired
