
clustirr's People

Contributors

jwokaty, kaozkai, snaketron

clustirr's Issues

Benchmarking needed

gliphR vs. Jan's turboGliph vs. gliph's original versions vs. ting

Benchmarking to show
a) consistency: we should hope to find similar results given a simple sample
b) that the choices in gliphR are biologically more suitable, by examining clonally expanded samples

on b) do we have an appropriate naive reference?

  • At first, for ourselves, against ground truth as a sanity check
  • Later on, possibly included as a vignette in a follow-up version

What should be the main input (data_sample) of gliphR?

The original gliph algorithm uses as input the following:

  • minimum: vector of CDR3b sequences
  • maximum: data.frame with CDR3b + V + J (+ 3 columns for alpha chain)

To use V+J information in a way that affects local/global clustering, the
user also has to set additional input parameters. It is my impression that very few users
do this, i.e. most users will provide only CDR3b sequences as input.

Hence my suggestion for gliphR:

We use as main input (parameter data_sample) a data.frame with 1 or 2 columns:

  • if 1 column -> the column has to be named CDR3a or CDR3b
  • if 2 columns -> the columns will represent CDR3a and CDR3b (order not relevant)

What do you think?
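
To make the proposal concrete, here is a minimal sketch of a validator for the suggested data_sample format. The function name check_data_sample is hypothetical and not part of gliphR; it only encodes the two rules listed above (1 or 2 columns, named CDR3a and/or CDR3b, in any order).

```r
# Hypothetical helper (not gliphR code): validates the proposed input format.
check_data_sample <- function(data_sample) {
  if (!is.data.frame(data_sample)) {
    stop("data_sample must be a data.frame")
  }
  if (!ncol(data_sample) %in% c(1L, 2L)) {
    stop("data_sample must have 1 or 2 columns")
  }
  allowed <- c("CDR3a", "CDR3b")
  cols <- colnames(data_sample)
  # Every column must be CDR3a or CDR3b, and no name may appear twice.
  if (!all(cols %in% allowed) || anyDuplicated(cols) > 0) {
    stop("columns must be named CDR3a and/or CDR3b")
  }
  invisible(TRUE)
}

# One column (beta chain only) and two columns (order not relevant).
check_data_sample(data.frame(CDR3b = c("CASSLGTDTQYF", "CASSPGQGAYEQYF")))
check_data_sample(data.frame(CDR3b = "CASSLGTDTQYF", CDR3a = "CAVRDSNYQLIW"))
```

With this shape, V and J columns would simply be rejected rather than silently ignored, which is a design choice that would still need to be agreed on.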

graph.R and plot_graph.Rd

In some of our earlier meetings we talked about the visualization of ClustIRR. This is a summary:

  • expand_clones parameter is not needed. This makes the algorithm computation/memory-wise more tractable
  • plot_graph.Rd -> graph.Rd; it is more convenient to name the man files the same as the scripts they describe. In my opinion, we need to work a little more on the text in this man page. A novice user should be able to use this page as a reference to interpret all visual symbols (shapes, colors, etc.) they encounter in the output.

If we remove expand_clones we can also simplify the vignette.

Add get_igraph() function

  • replace or complement the get_edges() function with a function that returns an igraph-compatible data object
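
As a rough sketch of what such a function could look like: the edge table below (columns from, to, weight) is an assumed format, since the actual output of get_edges() is not described here, and edges_to_igraph is a hypothetical name. The only external call used is igraph::graph_from_data_frame().

```r
# Hypothetical sketch: turn an edge table into an undirected igraph object.
# The column layout (from, to, weight) is an assumption, not ClustIRR's format.
edges_to_igraph <- function(edges) {
  if (!requireNamespace("igraph", quietly = TRUE)) {
    stop("package 'igraph' is required")
  }
  # The first two columns define the edges; remaining columns become
  # edge attributes (here: weight).
  igraph::graph_from_data_frame(edges, directed = FALSE)
}

edges <- data.frame(
  from = c("CASSLF", "CASSLF"),
  to = c("CASSPF", "CASTPF"),
  weight = c(1, 2)
)

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- edges_to_igraph(edges)
  print(g)
}
```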

Finish vignette

TODO:

  • finish section about edges
  • transfer the remaining parts about version 1 and version 2 into the comparison vignette
  • add section with minimal graphical output

For global distances allow "interface" to tcrdist3

Internally global distances are computed between CDR3 sequences using hamming distance.

Well-known tools such as tcrdist3 already exist that can compute global distances using more sophisticated metrics. We should at least provide an interface that skips the internal global distance computation and uses the distances estimated by complementary methods instead.
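
A minimal sketch of the idea, under stated assumptions: hamming_dist and global_distances are hypothetical names, the precomputed argument is an assumed interface (not an existing ClustIRR parameter), and treating sequences of unequal length as NA is my assumption about how the internal Hamming distance behaves.

```r
# Hamming distance between two sequences; defined only for equal lengths
# (assumption: unequal lengths yield NA).
hamming_dist <- function(a, b) {
  if (nchar(a) != nchar(b)) {
    return(NA_integer_)
  }
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical interface: use a user-supplied distance matrix if given
# (e.g. computed externally with tcrdist3), else fall back to Hamming.
global_distances <- function(cdr3, precomputed = NULL) {
  if (!is.null(precomputed)) {
    stopifnot(is.matrix(precomputed),
              all(dim(precomputed) == length(cdr3)))
    return(precomputed)
  }
  d <- outer(cdr3, cdr3, Vectorize(hamming_dist))
  dimnames(d) <- list(cdr3, cdr3)
  d
}

cdr3 <- c("CASSLF", "CASSPF", "CASTPF")
global_distances(cdr3)
```

Passing a matrix via precomputed skips the internal computation entirely, which is the behaviour this issue asks for.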

Check gliphR 1 algorithm

Find out where the noise/scattering in OvE values comes from when identical input is provided (possibly related to sampling with replacement).

NAMESPACE file

@kaozkai Which functions should be exported by ClustIRR?

Currently, we export:

export(cluster_irr)
export(get_graph)
export(get_edges)
export(plot_graph)

I would guess we need cluster_irr, get_graph and plot_graph.
Do we really need to have get_edges visible to the user?

Add additional features

Update gliphR with additional features after final release

Future ToDo List, for example scoring

New input checks

In function R/input_checks we perform parameter checks.

We can borrow Jan's checks, but rename the parameters.

Additional checks have to be done for missing parameter
values, NAs, or NULLs, etc.
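
A minimal sketch of such a check for a single numeric parameter. The function name check_single_number and the parameter name "ks" in the usage line are hypothetical; the actual checks and parameter names in R/input_checks may differ.

```r
# Hypothetical check: rejects missing, NULL, NA, non-numeric and
# non-scalar values with a descriptive error.
check_single_number <- function(x, name) {
  if (missing(x) || is.null(x)) {
    stop("parameter '", name, "' is missing or NULL")
  }
  if (!is.numeric(x) || length(x) != 1L) {
    stop("parameter '", name, "' must be a single number")
  }
  if (is.na(x)) {
    stop("parameter '", name, "' must not be NA")
  }
  invisible(x)
}

check_single_number(3, "ks")
```

The messages are passed to stop() as separate arguments rather than through paste(), which also matches the BiocCheck note on condition signals.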

BiocCheck Status

Errors

  • package size: Package tarball exceeds the Bioconductor size requirement. is: 6.2mb should be: <5mb (added hs_CD4_ref.RData to build ignore for now)
  • valid maintainer: Remove Maintainer field. Use Authors@R [cre] designation.
  • vignette directory: No 'vignettes' directory. (created dummy vignette)
  • man page documentation: At least 80% of man pages documenting exported objects must have runnable examples. (added minimal example, to be expanded with meaningful one)

Warnings

  • DESCRIPTION/NAMESPACE consistency: Import methods in NAMESPACE as well as DESCRIPTION.
  • coding practice: Avoid T/F variables; If logical, use TRUE/FALSE

Notes

  • Description: field: The Description field in the DESCRIPTION is made up by less than 3 sentences. Please consider expanding this field, and structure it as a full paragraph
  • coding practice:
    • Avoid 1:...; use seq_len() or seq_along()
    • Avoid the use of 'paste' in condition signals
  • function lengths: The recommended function length is 50 lines or less. There are 5 functions greater than 50 lines.
  • unit tests: Consider adding unit tests. We strongly encourage them. See https://contributions.bioconductor.org/tests.html
  • formatting of DESCRIPTION, NAMESPACE, man pages, R source, and vignette source:
  • bioc-devel mailing list subscription: Cannot determine whether maintainer is subscribed to the Bioc-Devel mailing list (requires admin credentials). Subscribe here: https://stat.ethz.ch/mailman/listinfo/bioc-devel
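
The coding-practice items above can be illustrated with a short sketch. The helper names count_iterations and require_positive are hypothetical and exist only for this example.

```r
# Counts how many times a loop over idx runs.
count_iterations <- function(idx) {
  k <- 0L
  for (i in idx) {
    k <- k + 1L
  }
  k
}

n <- 0L
count_iterations(1:n)         # 1:0 is c(1, 0), so the loop runs twice
count_iterations(seq_len(n))  # seq_len(0) is empty, so the loop never runs

# Condition signals: pass message parts directly to stop() instead of
# wrapping them in paste(); use TRUE/FALSE rather than T/F.
require_positive <- function(value, verbose = FALSE) {
  if (value <= 0) {
    stop("value must be positive, got ", value)
  }
  if (verbose) {
    message("value accepted: ", value)
  }
  invisible(TRUE)
}

msg <- tryCatch(require_positive(-1), error = function(e) conditionMessage(e))
msg
```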

Registration

  • support site registration: Unable to connect to support site: Peer certificate cannot be authenticated with given CA certificates: [support.bioconductor.org] SSL certificate problem: certificate has expired
