
hmcnc's Introduction

hmcnc - Hidden Markov Copy Number Caller

Pipeline for calling CNVs in assemblies or alignments


HMM model
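To illustrate the idea behind a hidden Markov copy-number caller, the sketch below decodes binned read depth into integer copy-number states with the Viterbi algorithm. It is a toy model under stated assumptions, not hmcnc's implementation: the Poisson emissions (expected depth = copy number times haploid depth), the uniform start, the single switch_prob transition penalty, and the names viterbi_copy_number, haploid_depth and max_cn are all hypothetical.

```python
import math
from typing import List


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing depth k under a Poisson with mean lam."""
    lam = max(lam, 1e-3)  # copy number 0 has expected depth 0; avoid log(0)
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(depths: List[int], haploid_depth: float,
                        max_cn: int = 6, switch_prob: float = 1e-4) -> List[int]:
    """Most likely copy-number state per bin (hypothetical toy model)."""
    states = range(max_cn + 1)
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / max_cn)  # switch mass split over the other states

    def trans(prev: int, cur: int) -> float:
        return stay if prev == cur else move

    # Uniform start: the constant log(1/n) is dropped because it does not change the argmax.
    score = [poisson_logpmf(depths[0], s * haploid_depth) for s in states]
    back: List[List[int]] = []
    for k in depths[1:]:
        new_score, ptr = [], []
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            ptr.append(best_prev)
            new_score.append(score[best_prev] + trans(best_prev, s)
                             + poisson_logpmf(k, s * haploid_depth))
        score = new_score
        back.append(ptr)
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

For example, with haploid_depth=15 (a diploid mean of 30x), a run of bins at 60x between bins at 30x decodes to copy number 4 flanked by copy number 2. Lowering switch_prob makes the path stickier, so short noisy excursions are less likely to be called.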


First, install the required packages. On our Linux cluster, the easiest package manager to use is Anaconda/Miniconda.

  1. Download the Miniconda installer shell script (64-bit):

https://docs.conda.io/en/latest/miniconda.html#linux-installers

  2. Run the script and set up the channels:

bash Miniconda3-latest-Linux-x86_64.sh

https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html

conda config --add channels defaults

conda config --add channels bioconda

conda config --add channels conda-forge

  3. Project environment: there are many ways to do this, but you can set up a project-specific environment with all the packages you need.

https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands

Required packages

  • bedtools
  • samtools
  • snakemake
  • boost
  • R
  • gxx
  • tabix

conda create --name <proj_env> bedtools samtools snakemake boost R gxx tabix

conda install can be used to add further packages to the environment, with explicit version numbers if needed:

conda install -n <proj_env> scipy=0.15.0

Always activate the environment before attempting a run:

conda activate <proj_env>

You may get a conda init error the first time; if so, run conda init, open a new shell, and rerun conda activate.

Compiling the C++ source files

Run the Snakemake-based make file:

snakemake -s make.smk.py --config boost=<boost> -j 1 -p

where <boost> is the location of the boost_install/include folder, most likely {anaconda install}/envs/{proj_env}/include.
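The include folder can also be located programmatically. A minimal sketch, assuming the project environment is active (conda activate sets the CONDA_PREFIX variable) and that Boost was installed into it; boost_include_dir and snakemake_make_command are hypothetical helper names, and boost/version.hpp is used only as a marker that Boost headers are present.

```python
import os
import shlex
from pathlib import Path
from typing import Optional


def boost_include_dir(prefix: Optional[str] = None) -> Path:
    """Return <prefix>/include if it holds Boost headers (hypothetical helper)."""
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; run 'conda activate <proj_env>' first")
    include = Path(prefix) / "include"
    # boost/version.hpp ships with every Boost install, so use it as a marker file.
    if not (include / "boost" / "version.hpp").is_file():
        raise FileNotFoundError(f"no Boost headers found under {include}")
    return include


def snakemake_make_command(prefix: Optional[str] = None) -> str:
    """Build the compile command shown above with boost=<include dir> filled in."""
    include = boost_include_dir(prefix)
    return shlex.join(["snakemake", "-s", "make.smk.py",
                       "--config", f"boost={include}", "-j", "1", "-p"])
```

Inside the activated environment, print(snakemake_make_command()) prints a command ready to paste; if Boost is missing, the helper fails early instead of letting the compile step fail on missing headers.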

Running the program

./hmcnc

usage: hmcnc <command> [<args>]

Hidden Markov Copy Number Caller command options:

asm: Run a denovo assembly.

aln: Run a reference alignment.

Alignment

./hmcnc aln -h

usage:

./hmcnc aln --bam <input.bam> --index <ref.index> [<args>]

Runs the HMM caller on an alignment. If available, provide a repeat-mask annotation for the reference (--repeatMask, -r); it is used to filter out calls with >80 percent repeat content.
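One way to picture the >80 percent rule: for each call, compute the fraction of its bases that overlap the repeat-mask BED intervals, and drop calls above 0.8. This is a sketch under assumptions (BED-style half-open coordinates, calls as (chrom, start, end) tuples, overlapping repeat intervals merged first so no base is counted twice); hmcnc's own filtering step may be implemented differently, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so each base is counted once."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def repeat_fraction(call: Interval, repeats: List[Interval]) -> float:
    """Fraction of the call's bases covered by (already merged) repeat intervals."""
    chrom, start, end = call
    covered = sum(max(0, min(end, r_end) - max(start, r_start))
                  for r_chrom, r_start, r_end in repeats if r_chrom == chrom)
    return covered / (end - start)


def filter_repeat_calls(calls: List[Interval], repeats: List[Interval],
                        max_fraction: float = 0.8) -> List[Interval]:
    """Keep calls whose repeat content is at most max_fraction (drops >80% by default)."""
    merged = merge_intervals(repeats)
    return [call for call in calls if repeat_fraction(call, merged) <= max_fraction]
```

The linear scan keeps the logic visible; for whole-genome inputs, a sorted sweep or bedtools would be the faster choice.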

./hmcnc aln

required arguments:

  • --bam BAM Bam file of Alignment, bam index file should be in same dir. (default: None)
  • --index INDEX index file of reference/assembly coordinates (default: None)

optional arguments:

  • --mq MQ Min MapQ for reads (default: 10)
  • --outdir OUTDIR Output directory (default: .)
  • --repeatMask REPEATMASK Provide reference based repeat bed file. (default: No)
  • --coverage COVERAGE Provide genome-wide coverage, if not specified, caller will calculate mean coverage per contig. (default: No)
  • --subread SUBREAD [1|0], Needs subreads filtering or not.(PacBio clr reads) (default: 0)
  • -t THREADS, --threads THREADS Threads available (default: 1)
  • --epsi EPSI epsilon parameter (default: 90)
  • --minL MINL min collapse length (default: 15000)
  • --scr SCR Scripts DIR (default: /scratch2/rdagnew/hmmnew/snakemake)
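When --coverage is not given, the caller computes a mean coverage per contig itself. The sketch below shows one way such a per-contig mean could be derived from a binned depth file like coverage.bins.bed.gz. It assumes (this is not documented above) that each line holds chrom, start, end and depth in its first four whitespace-separated columns, and it weights each bin by its length so a shorter final bin on a contig does not skew the mean; the function names are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable


def mean_coverage_per_contig(lines: Iterable[str]) -> Dict[str, float]:
    """Length-weighted mean depth per contig from lines of: chrom start end depth."""
    total_bases: Dict[str, int] = defaultdict(int)
    weighted: Dict[str, float] = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        chrom, start, end, depth = line.split()[:4]
        length = int(end) - int(start)
        total_bases[chrom] += length
        weighted[chrom] += float(depth) * length
    return {c: weighted[c] / total_bases[c] for c in total_bases if total_bases[c] > 0}


def mean_coverage_from_file(path: str) -> Dict[str, float]:
    """Read a gzipped, BED-like bin file (assumed layout) and summarise it per contig."""
    with gzip.open(path, "rt") as handle:
        return mean_coverage_per_contig(handle)
```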

Assembly

./hmcnc asm

Same as above, but without the repeat-mask step.

Main Output

  • coverage.bins.bed.gz (coverage in 100 bp windows)
  • copy_number.tsv (copy-number profile of the whole genome)
  • DUPcalls.copy_number.tsv
  • DUPcalls.masked_CN.tsv (repeat-masked calls)
  • DUPcalls.composite.bed (bookended calls are merged)
  • DUPcalls.masked_CN.composite.tsv
  • {GENOME}.noclip.pdf (plot of coverage and copy number across the whole genome)
  • DELcalls.copy_number.tsv (deletion calls, which are recovered naturally)
  • CallSummary.tsv
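"Bookended" calls are calls on the same contig where one ends exactly at the coordinate where the next begins. A minimal sketch of such a merge, assuming BED half-open coordinates and an illustrative (chrom, start, end, copy_number) tuple layout (the actual column layout of the output files is not documented here); keeping the list of constituent copy numbers is likewise an assumption, made so no information is lost.

```python
from typing import List, Tuple

Call = Tuple[str, int, int, int]              # (chrom, start, end, copy_number), assumed layout
Composite = Tuple[str, int, int, List[int]]   # merged span plus constituent copy numbers


def merge_bookended(calls: List[Call]) -> List[Composite]:
    """Merge calls where one ends exactly where the next begins (half-open coordinates)."""
    merged: List[Composite] = []
    for chrom, start, end, cn in sorted(calls):
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            prev_chrom, prev_start, _, cns = merged[-1]
            merged[-1] = (prev_chrom, prev_start, end, cns + [cn])
        else:
            merged.append((chrom, start, end, [cn]))
    return merged
```

For example, chr1:100-200 (CN 3) and chr1:200-350 (CN 4) become one composite call chr1:100-350 with copy numbers [3, 4], while a call starting at 400 stays separate.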

hmcnc's People

Contributors

redndgreen8

Forkers

chaissonlab

hmcnc's Issues

ref hg38 abstraction

Masking with centromere, gap, telomere, etc. needs to be handled if not running on a human genome.

install issue, scipy=0.15.0 not available

Hi,
I was pointed to this tool for PB HiFi data analysis

I built the env as instructed but when it comes to scipy=0.15.0 it is not found by conda (or mamba)
Any idea what I can do to fix this?
Can I use the scipy version 1.7.3 proposed by mamba? (It seems to work.)

Also, is there a tutorial on how to use the commands? The dry-run output is not very explicit about where to get the inputs!

Thanks

(hmcnc) :/opt/biotools/hmcnc/HMM$ conda list
# packages in environment at /opt/miniconda3/envs/hmcnc:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_r-mutex                  1.0.0                     mro_2  
aioeasywebdav             2.4.0           py39hf3d152e_1001    conda-forge
aiohttp                   3.8.1            py39h7f8727e_1  
aiosignal                 1.2.0              pyhd3eb1b0_0  
amply                     0.1.5              pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4              pyhd3eb1b0_0  
async-timeout             4.0.1              pyhd3eb1b0_0  
attmap                    0.13.2             pyhd8ed1ab_0    conda-forge
attrs                     21.4.0             pyhd3eb1b0_0  
bcrypt                    3.2.0            py39he8ac12f_0  
bedtools                  2.30.0               h468198e_3    bioconda
binutils_impl_linux-64    2.33.1               he6710b0_7  
binutils_linux-64         2.33.1              h9595d00_15  
blas                      1.0                    openblas  
boost                     1.73.0          py39h06a4308_11  
boto3                     1.21.32            pyhd3eb1b0_0  
botocore                  1.24.32            pyhd3eb1b0_0  
bottleneck                1.3.4            py39hce1f21e_0  
brotlipy                  0.7.0           py39h27cfd23_1003  
bzip2                     1.0.8                h7b6447c_0  
c-ares                    1.18.1               h7f8727e_0  
ca-certificates           2022.4.26            h06a4308_0  
cachetools                4.2.2              pyhd3eb1b0_0  
cairo                     1.16.0               h19f5f5c_2  
certifi                   2022.5.18.1      py39h06a4308_0  
cffi                      1.15.0           py39hd667e15_1  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
coin-or-cbc               2.10.8               h3786ebc_0    conda-forge
coin-or-cgl               0.60.6               he2f9439_0    conda-forge
coin-or-clp               1.17.6               h59210d1_1    conda-forge
coin-or-osi               0.108.7              h3b589db_0    conda-forge
coin-or-utils             2.11.4               hd28eb2d_1    conda-forge
coincbc                   2.10.8            0_metapackage    conda-forge
configargparse            1.4                pyhd3eb1b0_0  
connection_pool           0.0.3              pyhd3deb0d_0    conda-forge
cryptography              37.0.1           py39h9ce1e76_0  
curl                      7.82.0               h7f8727e_0  
datrie                    0.8.2            py39h27cfd23_0  
decorator                 5.1.1              pyhd3eb1b0_0  
defusedxml                0.7.1              pyhd3eb1b0_0  
docutils                  0.18.1           py39h06a4308_2  
dpath                     2.0.6            py39hf3d152e_1    conda-forge
dropbox                   11.14.0          py39h06a4308_0  
filechunkio               1.8                        py_2    conda-forge
filelock                  3.6.0              pyhd3eb1b0_0  
fontconfig                2.13.1               h6c09931_0  
freetype                  2.11.0               h70c0345_0  
fribidi                   1.0.10               h7b6447c_0  
frozenlist                1.2.0            py39h7f8727e_0  
ftputil                   5.0.4              pyhd8ed1ab_0    conda-forge
gcc_impl_linux-64         7.3.0                habb00fd_1  
gcc_linux-64              7.3.0               h553295d_15  
gfortran_impl_linux-64    7.3.0                hdf63c60_1  
gfortran_linux-64         7.3.0               h553295d_15  
gitdb                     4.0.7              pyhd3eb1b0_0  
gitpython                 3.1.18             pyhd3eb1b0_1  
glib                      2.69.1               h4ff587b_1  
google-api-core           1.31.5             pyhd8ed1ab_0    conda-forge
google-api-python-client  2.49.0             pyhd8ed1ab_0    conda-forge
google-auth               1.33.0             pyhd3eb1b0_0  
google-auth-httplib2      0.1.0              pyhd8ed1ab_1    conda-forge
google-cloud-core         1.7.1              pyhd3eb1b0_0  
google-cloud-storage      1.41.0             pyhd3eb1b0_0  
google-crc32c             1.1.2            py39h27cfd23_0  
google-resumable-media    1.3.1              pyhd3eb1b0_1  
googleapis-common-protos  1.53.0           py39h06a4308_0  
graphite2                 1.3.14               h295c915_1  
grpcio                    1.42.0           py39hce63b2e_0  
gxx_impl_linux-64         7.3.0                hdf63c60_1  
gxx_linux-64              7.3.0               h553295d_15  
harfbuzz                  2.8.1                h6f93f22_0  
htslib                    1.15.1               h9753748_0    bioconda
httplib2                  0.20.4             pyhd8ed1ab_0    conda-forge
icu                       58.2                 he6710b0_3  
idna                      3.3                pyhd3eb1b0_0  
importlib-metadata        4.11.3           py39h06a4308_0  
iniconfig                 1.1.1              pyhd3eb1b0_0  
jinja2                    3.0.3              pyhd3eb1b0_0  
jmespath                  0.10.0             pyhd3eb1b0_0  
jsonschema                4.4.0            py39h06a4308_0  
jupyter_core              4.10.0           py39h06a4308_0  
krb5                      1.19.2               hac12032_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libblas                   3.9.0           13_linux64_openblas    conda-forge
libboost                  1.73.0              h3ff78a5_11  
libcblas                  3.9.0           13_linux64_openblas    conda-forge
libcrc32c                 1.1.1                he6710b0_2  
libcurl                   7.82.0               h0b77cf5_0  
libdeflate                1.10                 h7f98852_0    conda-forge
libedit                   3.1.20210910         h7f8727e_0  
libev                     4.33                 h7f8727e_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgfortran-ng            7.5.0               ha8ba4b0_17  
libgfortran4              7.5.0               ha8ba4b0_17  
libgomp                   12.1.0              h8d9b700_16    conda-forge
liblapack                 3.9.0           13_linux64_openblas    conda-forge
liblapacke                3.9.0           13_linux64_openblas    conda-forge
libnghttp2                1.46.0               hce63b2e_0  
libopenblas               0.3.18               hf726d26_0  
libpng                    1.6.37               hbc83047_0  
libprotobuf               3.20.1               h4ff587b_0  
libsodium                 1.0.18               h7b6447c_0  
libssh2                   1.10.0               h8f2d780_0  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.0.3                h7f8727e_2  
libxcb                    1.15                 h7f8727e_0  
libxml2                   2.9.14               h74e7548_0  
libzlib                   1.2.12               h166bdaf_0    conda-forge
logmuse                   0.2.6              pyh8c360ce_0    conda-forge
lz4-c                     1.9.3                h295c915_1  
make                      4.2.1                h1bed415_1  
markupsafe                2.0.1            py39h27cfd23_0  
mro-base                  3.5.1                         3  
mro-base_impl             3.5.1                h9a62091_0  
mro-basics                3.5.1                         0  
multidict                 5.2.0            py39h7f8727e_2  
nbformat                  5.3.0            py39h06a4308_0  
ncurses                   6.3                  h7f8727e_2  
numexpr                   2.8.1            py39hecfb737_0  
numpy                     1.22.3           py39h7a5d4dd_0  
numpy-base                1.22.3           py39hb8be1f0_0  
oauth2client              4.1.3                      py_0    conda-forge
openssl                   1.1.1o               h7f8727e_0  
packaging                 21.3               pyhd3eb1b0_0  
pandas                    1.4.2            py39h295c915_0  
pango                     1.45.3               hd140c19_0  
paramiko                  2.8.1              pyhd3eb1b0_0  
pcre                      8.45                 h295c915_0  
peppy                     0.31.2             pyhd8ed1ab_2    conda-forge
pip                       21.2.4           py39h06a4308_0  
pixman                    0.40.0               h7f8727e_1  
plac                      1.3.4              pyhd3eb1b0_0  
pluggy                    1.0.0            py39h06a4308_1  
ply                       3.11             py39h06a4308_0  
prettytable               3.3.0              pyhd8ed1ab_0    conda-forge
protobuf                  3.20.1           py39h295c915_0  
psutil                    5.8.0            py39h27cfd23_1  
pulp                      2.6.0            py39hf3d152e_1    conda-forge
py                        1.11.0             pyhd3eb1b0_0  
py-boost                  1.73.0          py39ha9443f7_11  
pyasn1                    0.4.8              pyhd3eb1b0_0  
pyasn1-modules            0.2.8                      py_0  
pycparser                 2.21               pyhd3eb1b0_0  
pygments                  2.11.2             pyhd3eb1b0_0  
pynacl                    1.4.0            py39he8ac12f_1  
pyopenssl                 22.0.0             pyhd3eb1b0_0  
pyparsing                 3.0.4              pyhd3eb1b0_0  
pyrsistent                0.18.0           py39heee7806_0  
pysftp                    0.2.9              pyhd3eb1b0_1  
pysocks                   1.7.1            py39h06a4308_0  
pytest                    7.1.1            py39h06a4308_0  
python                    3.9.12               h12debd9_1  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python-fastjsonschema     2.15.1             pyhd3eb1b0_0  
python-irodsclient        1.1.3              pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2021.3             pyhd3eb1b0_0  
pyyaml                    6.0              py39h7f8727e_1  
r                         3.5.1                  mro351_0  
r-boot                    1.3_20                 mro351_0  
r-checkpoint              0.4.4                  mro351_0  
r-class                   7.3_14          mro351hd10c6a6_0  
r-cluster                 2.0.7_1         mro351hac1494b_0  
r-codetools               0.2_15          mro351hf348343_0  
r-curl                    3.2             mro351hd10c6a6_1  
r-deployrrserve           9.0.0                  mro351_0  
r-doparallel              1.0.13                 mro351_0  
r-foreach                 1.5.0                  mro351_0  
r-foreign                 0.8_70                 mro351_0  
r-iterators               1.0.10          mro351hf348343_0  
r-jsonlite                1.5             mro351hd10c6a6_0  
r-kernsmooth              2.23_15         mro351hac1494b_0  
r-lattice                 0.20_35         mro351hd10c6a6_0  
r-mass                    7.3_49                 mro351_0  
r-matrix                  1.2_14          mro351hac1494b_0  
r-mgcv                    1.8_23                 mro351_0  
r-microsoftr              3.5.0.108              mro351_0  
r-nlme                    3.1_137         mro351hac1494b_0  
r-nnet                    7.3_12          mro351hd10c6a6_0  
r-png                     0.1_7           mro351hd10c6a6_0  
r-r6                      2.2.2           mro351hf348343_0  
r-recommended             3.5.1                  mro351_0  
r-revoioq                 10.0.0                 mro351_0  
r-revomods                11.0.0                 mro351_0  
r-revoutils               11.0.0                 mro351_0  
r-revoutilsmath           11.0.0                 mro351_0  
r-rpart                   4.1_13          mro351hd10c6a6_0  
r-runit                   0.4.26                 mro351_0  
r-spatial                 7.3_11                 mro351_0  
r-survival                2.41_3                 mro351_0  
ratelimiter               1.2.0                   py_1002    conda-forge
readline                  8.1.2                h7f8727e_1  
requests                  2.27.1             pyhd3eb1b0_0  
retry                     0.9.2                      py_0    conda-forge
rsa                       4.7.2              pyhd3eb1b0_1  
s3transfer                0.5.0              pyhd3eb1b0_0  
samtools                  1.15.1               h1170115_0    bioconda
setuptools                61.2.0           py39h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
slacker                   0.14.0                     py_0    conda-forge
smart_open                5.2.1            py39h06a4308_0  
smmap                     4.0.0              pyhd3eb1b0_0  
snakemake                 7.8.1                hdfd78af_0    bioconda
snakemake-minimal         7.8.1              pyhdfd78af_0    bioconda
sqlite                    3.38.3               hc218d9a_0  
stone                     3.2.1              pyhd3eb1b0_0  
stopit                    1.1.2                      py_0    conda-forge
tabix                     1.11                 hdfd78af_0    bioconda
tabulate                  0.8.9            py39h06a4308_0  
tk                        8.6.11               h1ccaba5_1  
tomli                     1.2.2              pyhd3eb1b0_0  
toposort                  1.7                pyhd8ed1ab_0    conda-forge
traitlets                 5.1.1              pyhd3eb1b0_0  
typing-extensions         4.1.1                hd3eb1b0_0  
typing_extensions         4.1.1              pyh06a4308_0  
tzdata                    2022a                hda174b7_0  
ubiquerg                  0.6.1              pyh9f0ad1d_0    conda-forge
uritemplate               4.1.1              pyhd8ed1ab_0    conda-forge
urllib3                   1.26.9           py39h06a4308_0  
veracitools               0.1.3                      py_0    conda-forge
wcwidth                   0.2.5              pyhd3eb1b0_0  
wheel                     0.37.1             pyhd3eb1b0_0  
wrapt                     1.13.3           py39h7f8727e_2  
xz                        5.2.5                h7f8727e_1  
yaml                      0.2.5                h7b6447c_0  
yarl                      1.5.1            py39h07f9747_0    conda-forge
yte                       1.4.0            py39hf3d152e_0    conda-forge
zipp                      3.8.0            py39h06a4308_0  
zlib                      1.2.12               h166bdaf_0    conda-forge
zstd                      1.4.9                haebb681_0  
