tjs23 / nuc_processing

Chromatin contact paired-read single-cell Hi-C processing module for Nuc3D and NucTools

License: GNU Lesser General Public License v3.0

Python 99.48% Batchfile 0.09% Shell 0.43%

nuc_processing's Issues

run for hybrid genome

I am trying to process single-cell Hi-C data as follows:
nuc_processing/nuc_process -i inbc_CGTCTCGT_oubc_CTGTCATT/inbc_CGTCTCGT_oubc_CTGTCATT.r?.fastq -g genome/B6 -g2 genome/Cast -re1 MboI -s 150-1000 -n 8 -f genome/bowtie2_B6/genome.fa -f2 genome/bowtie2_mask_CAST/genome_mask_CAST.fa -v -a -k -c 2

But it fails with this error:
nuc_process: error: unrecognized arguments: Cast genome/bowtie2_mask_CAST/genome_mask_CAST.fa

Can you please help with this?

Thanks

Gang

Confused by "The number of the ambiguity group to which the paired reads belong"

My interaction data is in .hic format. I am confused by the field "The number of the ambiguity group to which the paired reads belong". How is this number calculated? Can I get it from my .hic file?
Looking forward to your help.

TypeError when creating restriction enzyme fragment file

Using the release_1.0 branch, when I attempt to run nuc_process and create the restriction enzyme fragment file, it fails with the following error:

 INFO : Creating restriction enzyme fragment file /path/to/bowtieindex/RE_frag_MboI_GCA_000001405.15_GRCh38_no_alt_analysis_set.txt
    INFO : Calculating MboI fragment locations for chr10 contig chr10
Traceback (most recent call last):
  File "/path/to/.conda/envs/nuc_process/bin/nuc_process", line 11, in <module>
    sys.exit(main())
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 2866, in main
    lig_junc, zip_files, sam_format, verbose)
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 2581, in nuc_process
    re1_files = [check_re_frag_file(genome_index, re1, g_fastas, align_exe, num_cpu, remap=remap)]
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 1594, in check_re_frag_file
    frag_data[contig] = get_chromo_re_fragments(fasta_file_objs, contig, seq.upper(), re_site, cut_pos)
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 1454, in get_chromo_re_fragments
    sub_seq = frag_seq[a:b]
TypeError: slice indices must be integers or None or have an __index__ method

This happened with several different assembly FASTA files, so I took a look at NucProcess.py and saw that the a and b variables at line 1454 seem to take on non-integer values. I was able to get nuc_process to run by changing line 1434 of NucProcess.py to step = int(mappability_length/2), which forces step to be an integer.

With this change nuc_process produces results, but I am unsure whether it may cause unwanted effects down the line. I did not see how this could be influenced by my input files or by how I call nuc_process, but I may be missing something.
For completeness, here is my nuc_process call:

nuc_process -f /path/to/fastas/grch38_fastas/*.fa -o sample1 -v -a -k -re1 MboI -s 150-2000 -n 4 -g /path/to/bowtieindex/GCA_000001405.15_GRCh38_no_alt_analysis_set -i /path/to/fastq1 /path/to/fastq2
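
The TypeError reported above is consistent with Python 3 true division: `/` returns a float even when both operands are integers, and floats cannot be used as slice indices. The sketch below reproduces that behaviour in isolation. It assumes the original line was of the form `step = mappability_length/2`, which is inferred from the workaround described in this issue and not taken from the repository; the sequence and the value 75 are made up for illustration.

```python
# Minimal reproduction, not the actual NucProcess.py code.
frag_seq = "GATCAAGGTTCCGATC"      # hypothetical fragment sequence
mappability_length = 75            # hypothetical value

step_float = mappability_length / 2    # true division: always a float in Python 3
step_int = mappability_length // 2     # floor division: stays an int

try:
    frag_seq[0:step_float]             # float slice index raises TypeError
    error_text = ""
except TypeError as err:
    error_text = str(err)

# Both the reporter's workaround and floor division give a valid slice index.
workaround = int(mappability_length / 2)
sub_seq = frag_seq[0:step_int]
```

For non-negative integers, `int(x / 2)` and `x // 2` give the same result, so the workaround should not change which positions are sliced. Note that an even value such as 50 would still fail with `/`, because `50 / 2` is `25.0`, a float.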

Parameter for data without restriction site (e.g. DNase type)

I'm currently working with DNase-type single-cell data, which does not involve a restriction enzyme.
Is there a way to run nuc_process on such data?
Perhaps a specific way to annotate the enzymes.conf file?

Thanks and hope to hear from you again.
G.V.

nuc_tools not included with git clone

When I try to clone this repo, the nuc_tools folder is not included in version 1.3.

The folder name has "@ 15b643c" appended to it, which is probably related to why this is happening.

nuc_process for hybrid strain analysis

Can you please provide an example file of the homologous_chromos HOM_CHROMO_TSV_FILE? Or explain it a little bit.

Thanks

Gang

Getting contact map for individual chromosomes?

Hi Tim,

Currently the contact_map.svg file shows all chromosomes for a cell. Is it possible to obtain the contact matrix for individual chromosomes in .npy or a related format, using the nuc_contact_map function?

Thanks,
Tarak

Getting genome wide contact map in .npy or related format?

This issue is solved.

How can I solve the OverflowError?

Dear developers,

The command line was as follows:

nuc_process -cn Chr.name.txt -re1 DpnII -qm 30 -n 128 -r 3 -o Fol007 -g GENOME_index/Fol007 -pdf report -b /home/data2/mals/anaconda3/envs/nucprocess/bin/bowtie2 -p -v -a -k -sam reads_R?.fastq.gz

The error message was as follows:

INFO: Min. contig size not specified, using 10.0% of largest: 550,000 bp
INFO: Considering 1 chromosomes/contigs
INFO: Full contact map size 6 x 6
Traceback (most recent call last):
  File "/home/data2/mals/anaconda3/envs/nucprocess/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/data2/mals/anaconda3/envs/nucprocess/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3808, in <module>
    main()
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3787, in main
    nuc_process(fastq_path_pair, genome_index, genome_index2, re1, re2, c1, c2, sizes, min_rep,
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3538, in nuc_process
    contact_map([npz_path], pdf_path, bin_size=None, bin_size2=250.0,
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 2507, in contact_map
    plot_contact_matrix(matrix, pair_bin_size, title, scale_label, None, pair,
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 1357, in plot_contact_matrix
    tick_delta, nminor = _get_tick_delta(b, bin_size/unit)
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 823, in _get_tick_delta
    sf = int(floor(np.log10(tick_delta_units)))
OverflowError: cannot convert float infinity to integer
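
The last frame of the traceback shows `int(floor(...))` being applied to a logarithm that evaluated to infinity. The sketch below reproduces that error class in isolation and shows one possible guard. `tick_magnitude` is a hypothetical helper written for illustration, not a function from nuc_tools, and the sketch does not establish why the value became infinite in this particular run.

```python
import math


def tick_magnitude(tick_delta_units):
    """Hypothetical guard: floor(log10(x)) as an int, or None for unusable input."""
    # log10 is only finite for finite, strictly positive values.
    if not math.isfinite(tick_delta_units) or tick_delta_units <= 0:
        return None
    return int(math.floor(math.log10(tick_delta_units)))


# Reproduce the reported error class: floor() of infinity cannot become an int.
try:
    int(math.floor(float("inf")))
    message = ""
except OverflowError as err:
    message = str(err)

ok = tick_magnitude(250.0)             # finite, positive input
bad_inf = tick_magnitude(float("inf")) # rejected by the guard
bad_zero = tick_magnitude(0.0)         # rejected by the guard
```

A caller using a guard like this would still need to decide what to do when `None` comes back, for example skipping tick placement for that axis.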

No module named 'NucProcess' when running nuc_sequence_names

Help

Hello!
I want to convert my Hi-C contact data to NCC format, but I don't know how to obtain the fields "The number of the ambiguity group to which the paired reads belong" and "Whether read pairs are swapped relative to original FASTQ files". Can you help me? Thank you.
