clubcpg's Issues

CluBCpG for RRBS data?

Is your feature request related to a problem? Please describe.
Correct me if I am mistaken, but CluBCpG will probably not work optimally on RRBS data because of the clubcpg-coverage --bin_size parameter. The coverage output defines n_reads as the number of reads which fully cover all CpGs within the bin.
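To make the concern concrete, here is a minimal sketch of the quoted n_reads definition applied to one fixed-width bin. It is an illustration only, not CluBCpG's code: the function name, the half-open interval convention, and all coordinates are assumptions made up for the example.

```python
def count_full_coverage_reads(reads, cpg_positions, bin_start, bin_size):
    """Count reads that span every CpG inside one fixed-width bin.

    reads: iterable of (start, end) half-open intervals (hypothetical format).
    cpg_positions: CpG coordinates on the same chromosome.
    """
    bin_end = bin_start + bin_size
    in_bin = [p for p in cpg_positions if bin_start <= p < bin_end]
    if not in_bin:
        return 0
    first, last = min(in_bin), max(in_bin)
    # A read counts only if it covers the first and the last CpG of the bin.
    return sum(1 for start, end in reads if start <= first and last < end)


# Invented coordinates: three CpGs in the bin starting at 100.
cpgs = [105, 140, 190]
reads = [(100, 200), (100, 150), (130, 230)]
n_reads = count_full_coverage_reads(reads, cpgs, bin_start=100, bin_size=100)
```

Only the first read spans all three CpGs, so reads that start at a restriction site in the middle of a bin would be discarded under this definition.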

Describe the solution you'd like
I don't know for sure, but one solution could be to accept a bin file (per chromosome) instead of the --bin_size parameter. The file would contain the expected RRBS bins, which depend on the restriction enzyme (in most cases MspI) and on the Illumina read length that was ordered.
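A rough sketch of how such expected bins could be derived from a reference sequence, assuming MspI digestion (recognition site CCGG, cut after the first C). The function name, the fragment size window, and the idea of truncating each bin at the read length are assumptions for illustration. CluBCpG does not currently accept such a file.

```python
def mspi_bins(sequence, read_length, min_frag=40, max_frag=220):
    """Return (start, end) bins for MspI fragments within a size window.

    min_frag and max_frag are hypothetical size-selection limits.
    Each bin is truncated to what a single read from the fragment start covers.
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find("CCGG")
    while pos != -1:
        cuts.append(pos + 1)  # MspI cuts C^CGG, i.e. after the first C
        pos = seq.find("CCGG", pos + 1)
    bins = []
    for start, end in zip(cuts, cuts[1:]):
        if min_frag <= end - start <= max_frag:
            bins.append((start, min(end, start + read_length)))
    return bins


# Toy sequence with three MspI sites; only the first fragment passes the size filter.
toy = "A" * 10 + "CCGG" + "A" * 50 + "CCGG" + "A" * 300 + "CCGG"
example_bins = mspi_bins(toy, read_length=50)
```

This version only models reads starting at the 5' end of each fragment. Paired-end data would also need a bin anchored at the other end.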

ReadTheDocs auto-build failing

Something has been updated on ReadTheDocs and the auto-builds no longer complete successfully using Sphinx. This needs to be diagnosed and fixed.

clubcpg-coverage error

Describe the bug
There appears to be an issue with the chromosome entry. I am trying to calculate coverage for an organism with a genome at scaffold level of assembly.

To Reproduce
clubcpg-coverage -a P_bismark_bt2_sorted.deduplicated.bam -o ${PWD} --bin_size 100 -chr NW_018395390.1 --no_overlap False

Error message
Log file: /media/kevlab/projects/helicoverpa_epigenetics/exp/wgbs/analysis/20200817/barcode_analysis/CompleteBins.P_bismark_bt2_sorted.deduplicated.bam.NW_018395390.1.log
Traceback (most recent call last):
  File "/media/kevlab/projects/helicoverpa_epigenetics/exp/wgbs/analysis/20200817/barcode_analysis/clubcpg/bin/clubcpg-coverage", line 98, in <module>
    output_file = calc.analyze_bins(chrom_of_interest)
  File "/media/kevlab/projects/helicoverpa_epigenetics/exp/wgbs/analysis/20200817/barcode_analysis/clubcpg/lib/python3.6/site-packages/clubcpg/CalculateBinCoverage.py", line 148, in analyze_bins
    new[individual_chrom] = chromosome_lengths[individual_chrom]
KeyError: 'NW_018395390.1'

Expected behavior
An output file containing coverage estimates across the specified bins for the chromosome/scaffold of interest.
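One thing that may be worth checking is whether the name passed to -chr matches a sequence name in the BAM header exactly. The sketch below parses the text printed by `samtools view -H` and looks the scaffold up in the @SQ lines. The function names and the example header (including its lengths) are made up for illustration. How CluBCpG builds its own chromosome_lengths table is not shown in the traceback, so this is a diagnostic aid rather than a fix.

```python
def contig_lengths(sam_header_text):
    """Map sequence name -> length from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def check_contig(sam_header_text, contig):
    """Return the contig length, or raise a KeyError with a clearer message."""
    lengths = contig_lengths(sam_header_text)
    if contig not in lengths:
        raise KeyError(f"{contig!r} not in BAM header ({len(lengths)} contigs listed)")
    return lengths[contig]


# Hypothetical header; the LN values are invented.
EXAMPLE_HEADER = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:NW_018395390.1\tLN:1000\n"
    "@SQ\tSN:chr19\tLN:2000\n"
)
scaffold_length = check_contig(EXAMPLE_HEADER, "NW_018395390.1")
```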

System specs (please complete the following information):

  • OS: Ubuntu
  • Version 18.04.4 LTS

Invalid literal error when running clubcpg-cluster

Hello,

When I try to run
clubcpg-cluster -a ../../d/Bismark_hg38/bam/small.bam -o /mnt/f/clubCpG --bins /mnt/f/clubCpG/CompleteBins.small.bam.chr16.log

I get the following error:

Only one input bam detected. Running in single-file mode
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/dgisch/anaconda3/envs/clubCpG/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/dgisch/anaconda3/envs/clubCpG/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/dgisch/anaconda3/envs/clubCpG/lib/python3.7/site-packages/clubcpg/ClusterReads.py", line 213, in process_bins
    bin_loc = int(bin_loc)
ValueError: invalid literal for int() with base 10: 'size=100'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dgisch/anaconda3/envs/clubCpG/bin/clubcpg-cluster", line 147, in <module>
    cluster_reads.execute()
  File "/home/dgisch/anaconda3/envs/clubCpG/lib/python3.7/site-packages/clubcpg/ClusterReads.py", line 344, in execute
    results = results.get()
  File "/home/dgisch/anaconda3/envs/clubCpG/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: invalid literal for int() with base 10: 'size=100'

I am running Ubuntu inside Windows Subsystem for Linux, with Python 3.7.

Thank you!
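In the traceback above, int() receives 'size=100', which suggests that a line that is not a bin ID reached the parser. The command passes a .log file to --bins, so one possible reading is that the coverage log was supplied where the coverage CSV was expected. The sketch below shows a stricter way to parse bin IDs of the form chrom_position. The function name is hypothetical and the string "bin_size=100" is a guess at what such a log line might contain. This is not CluBCpG's actual parsing code.

```python
def parse_bin_id(bin_id):
    """Split 'chrom_position' on the LAST underscore and validate the position.

    Splitting on the last underscore also keeps scaffold names that contain
    underscores (e.g. NW_018395390.1) intact.
    """
    chrom, sep, pos = bin_id.rpartition("_")
    if not sep or not chrom or not pos.isdecimal():
        raise ValueError(f"not a chrom_position bin ID: {bin_id!r}")
    return chrom, int(pos)


ok = parse_bin_id("chr16_3079700")
scaffold = parse_bin_id("NW_018395390.1_100")

try:
    parse_bin_id("bin_size=100")  # guessed log content, not a bin ID
except ValueError as err:
    message = str(err)
```

Failing early with a message that names the offending text makes it easier to spot that the wrong file was passed.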

very large log files

Describe the bug
I am seeing unusually large log files when I run the GLIA/NEURON data.

Not able to reproduce results for SampleData

Describe the bug
I installed CluBCpG in a Conda environment on a Linux server and was able to run test_Module.py successfully.

But when I run the clubcpg-coverage command on the A_test.chr19.bam file, my output (222 lines) differs from the one available on GitHub (https://github.com/waterlandlab/CluBCpG/tree/master/SampleData/COVERAGE/CompleteBins.A_test.chr19.bam.chr19.csv, which has 562 lines). Some bins are missing, and some bins have different numbers of reads or even CpGs:
chr19_3079700,2,3
chr19_3079800,13,2
chr19_3080000,2,8
chr19_3080100,16,1
chr19_3080200,5,8
chr19_3080300,16,1
chr19_3080400,24,1
chr19_3080500,5,1
chr19_3080800,4,1
chr19_3081300,12,1

I see 2 possible explanations:

  1. CluBCpG does not interact properly with samtools in my installation. Does test_Module.py evaluate this interaction?
  2. The SampleData and COVERAGE files on GitHub do not match?

Thanks in advance for your help,
PE

To Reproduce
clubcpg-coverage -a /b/home/path/CluBCpG/SampleData/A_test.chr19.bam -o /b/home/path/tests/ --bin_size 100 -chr chr19 --read1_5 0 --read1_3 0 --read2_5 0 --read2_3 0
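To pin down exactly which bins differ, the two coverage CSVs can be compared programmatically. This is a minimal sketch, assuming each row is a bin ID followed by two integer columns, as in the excerpt above. The function names are hypothetical and the "reference" rows below are invented for the example. They are not the contents of the file on GitHub.

```python
import csv
import io


def load_coverage(text):
    """Parse rows of 'bin,value1,value2' into {bin: (value1, value2)}."""
    rows = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank, short, or non-numeric rows (e.g. a header line).
        if len(row) >= 3 and row[1].isdecimal() and row[2].isdecimal():
            rows[row[0]] = (int(row[1]), int(row[2]))
    return rows


def diff_coverage(mine, reference):
    """Return bins missing from mine, bins only in mine, and bins whose values differ."""
    missing = sorted(set(reference) - set(mine))
    extra = sorted(set(mine) - set(reference))
    changed = {
        b: (mine[b], reference[b])
        for b in mine.keys() & reference.keys()
        if mine[b] != reference[b]
    }
    return missing, extra, changed


mine = load_coverage("chr19_3079700,2,3\nchr19_3079800,13,2\n")
# Invented reference values, for illustration only.
reference = load_coverage("chr19_3079700,2,3\nchr19_3079800,14,2\nchr19_3079900,5,1\n")
missing, extra, changed = diff_coverage(mine, reference)
```

In practice the two texts would come from reading the local output file and the downloaded sample file.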

Bug in clustering with 150-bp bins

We are using BAM files that were query-name sorted, deduplicated using Picard, and then coordinate sorted and indexed using samtools. Since our data were generated using 150bp paired-end sequencing, we're interested in using CluBCpG to analyze bins of 150bp, which might allow for more complex patterns. However, comparing results from 100bp bins with results from 150bp bins yields surprising results for 150bp. Our data have an average coverage of just below 10x, and we keep only bins with >= 5 reads and >= 2 CpGs per bin. In the CSV generated by clubcpg-cluster:

  • in the case of 100-bp bins, all clusters found contain >= 2 CpGs, as expected;
  • however, in the case of 150-bp bins, about half of the clusters have only 1 CpG.

Results clustering 100bp:
bin,input_label,methylation,class_label,read_number,cpg_number,cpg_pattern,class_split
chr19_3079000,EMseq_FM1_1M_reads_chr19.bam,1.0,0,7,2,1;1,EMseq_FM1_1M_reads_chr19.bam=7
chr19_3079600,EMseq_FM1_1M_reads_chr19.bam,1.0,0,7,2,1;1,EMseq_FM1_1M_reads_chr19.bam=7
chr19_3079700,EMseq_FM1_1M_reads_chr19.bam,1.0,0,5,5,1;1;1;1;1,EMseq_FM1_1M_reads_chr19.bam=5
chr19_3079800,EMseq_FM1_1M_reads_chr19.bam,0.5,0,2,2,1;0,EMseq_FM1_1M_reads_chr19.bam=2
chr19_3079800,EMseq_FM1_1M_reads_chr19.bam,1.0,1,3,2,1;1,EMseq_FM1_1M_reads_chr19.bam=3
chr19_3080000,EMseq_FM1_1M_reads_chr19.bam,1.0,0,6,8,1;1;1;1;1;1;1;1,EMseq_FM1_1M_reads_chr19.bam=6
chr19_3083700,EMseq_FM1_1M_reads_chr19.bam,1.0,0,4,8,1;1;1;1;1;1;1;1,EMseq_FM1_1M_reads_chr19.bam=4
chr19_3084300,EMseq_FM1_1M_reads_chr19.bam,0.0,0,4,2,0;0,EMseq_FM1_1M_reads_chr19.bam=4
chr19_3084300,EMseq_FM1_1M_reads_chr19.bam,1.0,1,2,2,1;1,EMseq_FM1_1M_reads_chr19.bam=2

Results clustering 150bp:
bin,input_label,methylation,class_label,read_number,cpg_number,cpg_pattern,class_split
chr19_3094650,EMseq_FM1_1M_reads_chr19.bam,1.0,1,6,2,1;1,EMseq_FM1_1M_reads_chr19.bam=6
chr19_3096000,EMseq_FM1_1M_reads_chr19.bam,1.0,0,7,1,1,EMseq_FM1_1M_reads_chr19.bam=7
chr19_3096000,EMseq_FM1_1M_reads_chr19.bam,0.0,1,4,1,0,EMseq_FM1_1M_reads_chr19.bam=4
chr19_3096750,EMseq_FM1_1M_reads_chr19.bam,1.0,0,6,2,1;1,EMseq_FM1_1M_reads_chr19.bam=6
chr19_3099150,EMseq_FM1_1M_reads_chr19.bam,1.0,0,8,2,1;1,EMseq_FM1_1M_reads_chr19.bam=8
chr19_3100500,EMseq_FM1_1M_reads_chr19.bam,1.0,0,7,1,1,EMseq_FM1_1M_reads_chr19.bam=7
chr19_3102450,EMseq_FM1_1M_reads_chr19.bam,1.0,0,4,3,1;1;1,EMseq_FM1_1M_reads_chr19.bam=4
chr19_3106200,EMseq_FM1_1M_reads_chr19.bam,1.0,0,4,2,1;1,EMseq_FM1_1M_reads_chr19.bam=4
chr19_3111150,Emseq_FM1_1M_reads_chr19.bam,1.0,0,7,1,1,Emseq_FM1_1M_reads_chr19.bam=7

The directory linked below contains the BAM and index files, the CSV files obtained after running CluBCpG (in case you want to see more examples of this bug with 150-bp bins), and a short file named "To_reproduce.txt" with the script that I used to run CluBCpG.
With the BAM file in that directory, it takes less than 10 min to reproduce the problem (coverage and cluster).

Bug_clubCpG
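A quick way to list the affected bins is to scan the cluster CSV for rows whose cpg_number falls below the filter threshold. The sketch below uses the column names from the header shown above and the first three 150-bp rows as sample input. The function name and the min_cpgs parameter are hypothetical.

```python
import csv
import io


def bins_below_cpg_threshold(cluster_csv_text, min_cpgs=2):
    """Return the sorted bin IDs with at least one cluster below min_cpgs CpGs."""
    offenders = set()
    for row in csv.DictReader(io.StringIO(cluster_csv_text)):
        if int(row["cpg_number"]) < min_cpgs:
            offenders.add(row["bin"])
    return sorted(offenders)


# First three rows of the 150-bp results quoted in this issue.
SAMPLE = """bin,input_label,methylation,class_label,read_number,cpg_number,cpg_pattern,class_split
chr19_3094650,EMseq_FM1_1M_reads_chr19.bam,1.0,1,6,2,1;1,EMseq_FM1_1M_reads_chr19.bam=6
chr19_3096000,EMseq_FM1_1M_reads_chr19.bam,1.0,0,7,1,1,EMseq_FM1_1M_reads_chr19.bam=7
chr19_3096000,EMseq_FM1_1M_reads_chr19.bam,0.0,1,4,1,0,EMseq_FM1_1M_reads_chr19.bam=4
"""
single_cpg_bins = bins_below_cpg_threshold(SAMPLE)
```

Comparing this list with the bins kept after the coverage filter would show whether those bins passed the >= 2 CpG filter before clustering.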

Comparative analysis for more than two libraries from two groupings?

Hello, this feature request is related to an enhancement that could enable a different type of analysis using your program. I was wondering if you had a recommendation for comparing more than two libraries in a CluBCpG analysis? I'm curious to see if this tool can be used for a pairwise comparison of experimental vs control with many samples in each group. I'm thinking it may be possible to use this output as the input or replacement for a WGCNA analysis.
