Chromosight

License: GPLv3

Python package to detect chromatin loops (and other patterns) in Hi-C contact maps.

The preprint is available at https://www.biorxiv.org/content/10.1101/2020.03.08.981910v2

Installation

Stable version with pip:

pip3 install --user chromosight

Stable version with conda:

conda install -c bioconda -c conda-forge chromosight

or, if you want to get the latest development version:

pip3 install --user -e git+https://github.com/koszullab/chromosight.git@master#egg=chromosight

Usage

chromosight has 3 main subcommands: detect, quantify and generate-config. To list them with their descriptions, run:

chromosight --help

Pattern detection is done using the detect subcommand. The generate-config subcommand is used to create a new type of pattern that can then be fed to detect using the --custom-kernel option. The quantify subcommand is used to compute pattern matching scores for a list of 2D coordinates on a Hi-C matrix.

Get started

For a first look at a chromosight run, use chromosight test, which downloads a test dataset from the GitHub repository and runs chromosight detect on it.

Important options

  • --min-dist: Minimum genomic distance at which to detect patterns.
  • --max-dist: Maximum genomic distance at which to detect patterns. Increasing it also increases runtime and memory use.
  • --pearson: Pearson correlation threshold. Decrease it to detect more patterns (with potentially more false positives).
  • --perc-undetected: Proportion of empty pixels allowed in a window for detection.

Example

To detect all chromosome loops with sizes between 20kb and 200kb using 8 parallel threads:

chromosight detect --threads 8 --min-dist 20000 --max-dist 200000 hic_data.cool out_dir

Input

Input Hi-C contact maps should be in cool format. The cool format is an efficient and compact format for Hi-C data based on HDF5. It is maintained by the Mirny lab and documented here: https://mirnylab.github.io/cooler/

Most other Hi-C data formats (hic, homer, hic-pro) can be converted to cool using hicexplorer's hicConvertFormat. Bedgraph2 format can be converted directly using cooler with the command cooler load -f bg2 <chrom.sizes>:<binsize> in.bg2.gz out.cool. For more information, see the cooler documentation.

Output

Two files are generated in the output directory (replace pattern by the pattern used, e.g. loops or borders):

  • pattern_out.txt: List of genomic coordinates, bin ids and correlation scores for each pattern identified
  • pattern_out.json: JSON file containing the windows (of the same size as the kernel used) around the patterns from pattern_out.txt
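As an illustration of working with the coordinates table, here is a minimal Python sketch that filters detections by correlation score. It assumes the text file is tab-separated with a header row. The column names used here (chrom1, start1, score, ...) are hypothetical and should be checked against the header of your own output file.

```python
import csv
import io

# Synthetic stand-in for a detection table. The column names are
# assumptions for illustration only; check the header of your own file.
example = io.StringIO(
    "chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tscore\n"
    "chr1\t10000\t12000\tchr1\t50000\t52000\t0.62\n"
    "chr1\t30000\t32000\tchr1\t90000\t92000\t0.35\n"
)


def read_patterns(handle, min_score=0.0):
    """Return rows (as dicts) whose 'score' column is at least min_score."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["score"]) >= min_score]


# Keep only detections with a correlation score of 0.5 or more.
strong = read_patterns(example, min_score=0.5)
print(len(strong), strong[0]["start1"])
```

With a real run, the io.StringIO object would be replaced by an open file handle on the text file in the output directory.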

Alternatively, one can set the --win-fmt=npy option to dump windows into an npy file instead of JSON. This format can easily be loaded into a 3D array using numpy's np.load function.
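The following sketch shows what loading such a file could look like. It writes a synthetic 3D array to a temporary npy file so that the example runs on its own. The file name loops_out.npy and the axis order (first axis indexing the detected patterns) are assumptions, not something confirmed here.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for a windows file: 4 windows of 7x7 pixels.
# The file name and the axis order (pattern, row, column) are assumptions.
rng = np.random.default_rng(0)
fake_windows = rng.random((4, 7, 7))
path = os.path.join(tempfile.mkdtemp(), "loops_out.npy")
np.save(path, fake_windows)

# Load the windows back into a 3D array.
windows = np.load(path)

# Averaging over the first axis gives a single "pileup" window.
pileup = windows.mean(axis=0)
print(windows.shape, pileup.shape)
```

Averaging the windows this way is one common method for visually checking that the detected positions resemble the expected pattern.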

Options

Pattern exploration and detection
Explore and detect patterns (loops, borders, centromeres, etc.) in Hi-C contact
maps with pattern matching.
Usage:
    chromosight detect  [--kernel-config=FILE] [--pattern=loops]
                        [--pearson=auto] [--win-size=auto] [--iterations=auto]
                        [--win-fmt={json,npy}] [--force-norm] [--full]
                        [--subsample=no] [--inter] [--tsvd] [--smooth-trend]
                        [--n-mads=5] [--min-dist=0] [--max-dist=auto]
                        [--no-plotting] [--min-separation=auto] [--dump=DIR]
                        [--threads=1] [--perc-undetected=auto] <contact_map>
                        [<output>]
    chromosight generate-config [--preset loops] [--click contact_map]
                        [--force-norm] [--win-size=auto] [--n-mads=5]
                        [--threads=1] <prefix>
    chromosight quantify [--inter] [--pattern=loops] [--subsample=no]
                         [--win-fmt=json] [--kernel-config=FILE] [--force-norm]
                         [--threads=1] [--full] [--n-mads=5] [--win-size=auto]
                         [--no-plotting] [--tsvd] <bed2d> <contact_map> <output>
    chromosight test
  
    detect:
        performs pattern detection on a Hi-C contact map via template matching
    generate-config:
        Generate pre-filled config files to use for detect and quantify.
        A config consists of a JSON file describing parameters for the
        analysis and path pointing to kernel matrices files. Those matrices
        files are tsv files with numeric values as kernel to use for
        convolution.
    quantify:
        Given a list of pairs of positions and a contact map, computes the
        correlation coefficients between those positions and the kernel of the
        selected pattern.
    test:
        Download example data and run loop detection on it.
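To make the notions of kernel matrix and correlation score more concrete, here is a minimal Python sketch. It reads a small tab-separated kernel and computes the Pearson correlation between that kernel and a window of the same shape. This is an illustration under stated assumptions, not chromosight's implementation: the 3x3 kernel values and the pearson_score helper are made up for the example.

```python
import io

import numpy as np

# A tiny made-up kernel stored as tab-separated numeric values,
# mimicking the tsv layout described for kernel matrices.
kernel_tsv = io.StringIO("0\t1\t0\n1\t4\t1\n0\t1\t0\n")
kernel = np.loadtxt(kernel_tsv, delimiter="\t")


def pearson_score(window, kernel):
    """Pearson correlation between a window and a kernel of equal shape."""
    if window.shape != kernel.shape:
        raise ValueError("window and kernel must have the same shape")
    w = window.ravel() - window.mean()
    k = kernel.ravel() - kernel.mean()
    denom = np.sqrt((w ** 2).sum() * (k ** 2).sum())
    # The correlation is undefined when either input has zero variance.
    return float((w @ k) / denom) if denom > 0 else float("nan")


# A window that is a scaled and shifted copy of the kernel correlates fully.
matching = 2.0 * kernel + 3.0
# A constant window has no variance, so the score is undefined (nan).
flat = np.ones_like(kernel)

print(pearson_score(matching, kernel))
print(pearson_score(flat, kernel))
```

Because Pearson correlation ignores scaling and offset, a window scores high when its shape resembles the kernel, regardless of overall contact intensity.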

Contributing

All contributions are welcome. We use the numpy standard for docstrings when documenting functions.

The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use nose2 as our testing framework. Ideally, new functions should have associated unit tests, placed in the tests folder.

To test the code, you can run:

nose2 -s tests/
