tjs23 / nuc_processing
Chromatin contact paired-read single-cell Hi-C processing module for Nuc3D and NucTools
License: GNU Lesser General Public License v3.0
I am trying to process single-cell Hi-C data as follows:
nuc_processing/nuc_process -i inbc_CGTCTCGT_oubc_CTGTCATT/inbc_CGTCTCGT_oubc_CTGTCATT.r?.fastq -g genome/B6 -g2 genome/Cast -re1 MboI -s 150-1000 -n 8 -f genome/bowtie2_B6/genome.fa -f2 genome/bowtie2_mask_CAST/genome_mask_CAST.fa -v -a -k -c 2
But it fails with this error:
nuc_process: error: unrecognized arguments: Cast genome/bowtie2_mask_CAST/genome_mask_CAST.fa
Can you please help with this?
Thanks
Gang
My interaction data is in .hic format. I am confused by the field "The number of the ambiguity group to which the paired reads belong". How is this number calculated? Can I get it from my .hic file?
Looking forward to your help.
Using the release_1.0 branch: when I run nuc_process and it tries to create the restriction enzyme fragment file, it fails with the following error:
INFO : Creating restriction enzyme fragment file /path/to/bowtieindex/RE_frag_MboI_GCA_000001405.15_GRCh38_no_alt_analysis_set.txt
INFO : Calculating MboI fragment locations for chr10 contig chr10
Traceback (most recent call last):
  File "/path/to/.conda/envs/nuc_process/bin/nuc_process", line 11, in <module>
    sys.exit(main())
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 2866, in main
    lig_junc, zip_files, sam_format, verbose)
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 2581, in nuc_process
    re1_files = [check_re_frag_file(genome_index, re1, g_fastas, align_exe, num_cpu, remap=remap)]
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 1594, in check_re_frag_file
    frag_data[contig] = get_chromo_re_fragments(fasta_file_objs, contig, seq.upper(), re_site, cut_pos)
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 1454, in get_chromo_re_fragments
    sub_seq = frag_seq[a:b]
TypeError: slice indices must be integers or None or have an __index__ method
This happened with several different assembly FASTA files, so I took a look at NucProcess.py and saw that the a and b variables at line 1454 seem to take on non-integer values. I was able to get nuc_process to run by changing line 1434 of NucProcess.py to step = int(mappability_length/2), which forces step to be an integer in case it would otherwise take on a non-integer value.
With this change nuc_process produces results; however, I am unsure whether it may cause unwanted effects down the line. I don't see how this would be influenced by my input files or by how I call nuc_process, but I may be missing something.
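The workaround above is consistent with how Python 3 handles division: `/` always returns a float, and a float cannot be used as a slice index. The following is a minimal sketch of that behaviour only. The values of `frag_seq` and `mappability_length` and the helper `slice_fails` are hypothetical stand-ins, not code taken from NucProcess.py.

```python
# Hypothetical stand-ins; the real NucProcess.py function is not reproduced here.
frag_seq = "GATCAAGGCCTTGATCAAGG"
mappability_length = 5

step_float = mappability_length / 2   # true division: a float in Python 3
step_int = mappability_length // 2    # floor division: stays an int


def slice_fails(seq, a, b):
    """Return True if seq[a:b] raises TypeError (non-integer slice index)."""
    try:
        seq[a:b]
    except TypeError:
        return True
    return False


# A float step reproduces the TypeError from the traceback.
float_slice_fails = slice_fails(frag_seq, 0, step_float)

# An integer step slices normally.
int_slice_ok = not slice_fails(frag_seq, 0, step_int)

# For a positive integer length, int(x / 2) and x // 2 agree, so the
# int() workaround truncates exactly as floor division would.
same_as_int_cast = step_int == int(mappability_length / 2)
```

If `mappability_length` is a positive integer, writing the step with `//` would give the same values as the `int(...)` workaround; whether a truncated step is what the original code intended is a question for the maintainers.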
For completeness' sake, here is my nuc_process call:
nuc_process -f /path/to/fastas/grch38_fastas/*.fa -o sample1 -v -a -k -re1 MboI -s 150-2000 -n 4 -g /path/to/bowtieindex/GCA_000001405.15_GRCh38_no_alt_analysis_set -i /path/to/fastq1 /path/to/fastq2
I'm currently working with single-cell data of the DNase type, which does not involve a restriction enzyme.
Is there a way to run nuc_process with that data?
Perhaps a specific way to annotate the enzymes.conf file?
Thanks and hope to hear from you again.
G.V.
When I try to clone this repo, the nuc_tools folder is not included in version 1.3.
The folder name has "@ 15b643c" in it, which is probably related to why this is happening.
Can you please provide an example of the homologous_chromos HOM_CHROMO_TSV_FILE, or explain its format a little?
Thanks
Gang
Hi Tim,
Currently the contact_map.svg file shows all chromosomes for a cell. Is it possible to obtain the contact matrix for individual chromosomes in .npy or a related format using the nuc_contact_map function?
Thanks,
Tarak
The issue is solved.
Dear developers,
The command line was as follows:
nuc_process -cn Chr.name.txt -re1 DpnII -qm 30 -n 128 -r 3 -o Fol007 -g GENOME_index/Fol007 -pdf report -b /home/data2/mals/anaconda3/envs/nucprocess/bin/bowtie2 -p -v -a -k -sam reads_R?.fastq.gz
The error message was as follows:
INFO: Min. contig size not specified, using 10.0% of largest: 550,000 bp
INFO: Considering 1 chromosomes/contigs
INFO: Full contact map size 6 x 6
Traceback (most recent call last):
  File "/home/data2/mals/anaconda3/envs/nucprocess/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/data2/mals/anaconda3/envs/nucprocess/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3808, in <module>
    main()
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3787, in main
    nuc_process(fastq_path_pair, genome_index, genome_index2, re1, re2, c1, c2, sizes, min_rep,
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3538, in nuc_process
    contact_map([npz_path], pdf_path, bin_size=None, bin_size2=250.0,
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 2507, in contact_map
    plot_contact_matrix(matrix, pair_bin_size, title, scale_label, None, pair,
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 1357, in plot_contact_matrix
    tick_delta, nminor = _get_tick_delta(b, bin_size/unit)
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 823, in _get_tick_delta
    sf = int(floor(np.log10(tick_delta_units)))
OverflowError: cannot convert float infinity to integer
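For anyone debugging this, the last frame can only raise this OverflowError if the value passed to int() is infinite. That happens when tick_delta_units is itself infinite, or when it is zero, since np.log10(0) evaluates to -inf. The traceback does not show which case applies here. Below is a minimal stdlib-only sketch of the failure and of one possible guard. The functions `raises_overflow` and `tick_exponent` are hypothetical illustrations, not part of nuc_tools.

```python
import math


def raises_overflow(value):
    """Return True if int(floor(value)) raises OverflowError, as in the traceback."""
    try:
        int(math.floor(value))
    except OverflowError:
        return True
    return False


def tick_exponent(tick_delta_units):
    """Hypothetical guard: power-of-ten exponent of a tick spacing.

    Returns None instead of raising when the spacing is zero, negative,
    infinite or NaN, so the caller can fall back to a default.
    """
    if not math.isfinite(tick_delta_units) or tick_delta_units <= 0:
        return None
    return int(math.floor(math.log10(tick_delta_units)))


# Both infinities trigger the error seen in the report.
neg_inf_fails = raises_overflow(float("-inf"))
pos_inf_fails = raises_overflow(float("inf"))

# The guarded version returns None for degenerate input and an int otherwise.
zero_result = tick_exponent(0.0)
normal_result = tick_exponent(250.0)
```

Printing the arguments passed to _get_tick_delta just before the failing line would show whether the tick spacing was zero or infinite for this run.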
Hello!
I want to convert my Hi-C contact data to NCC format, but I don't know how to obtain the fields "The number of the ambiguity group to which the paired reads belong" and "Whether read pairs are swapped relative to original FASTQ files". Can you help me? Thank you.