nuc_processing's Issues

Run for hybrid genome

I am trying to process single-cell Hi-C data as follows:
nuc_processing/nuc_process -i inbc_CGTCTCGT_oubc_CTGTCATT/inbc_CGTCTCGT_oubc_CTGTCATT.r?.fastq -g genome/B6 -g2 genome/Cast -re1 MboI -s 150-1000 -n 8 -f genome/bowtie2_B6/genome.fa -f2 genome/bowtie2_mask_CAST/genome_mask_CAST.fa -v -a -k -c 2

But it fails with this error:
nuc_process: error: unrecognized arguments: Cast genome/bowtie2_mask_CAST/genome_mask_CAST.fa

Can you please help with this?

Thanks

Gang
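
One possible reading of this error, offered as a sketch rather than a diagnosis: the message format is the one Python's argparse produces, and argparse treats an undefined short option such as -g2 as the defined option -g with the attached value "2". The path that follows is then left over as an unrecognized argument. Whether the installed nuc_process actually defines -g2 and -f2 is not known from the report; the parser below is hypothetical and only demonstrates the argparse behaviour.

```python
import argparse

# Hypothetical parser: it defines -g and -f but NOT -g2 or -f2.
# This is not nuc_process's real argument parser.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-g")
parser.add_argument("-f")

argv = ["-g", "genome/B6", "-g2", "genome/Cast",
        "-f", "b6.fa", "-f2", "cast.fa"]

# parse_known_args returns the leftovers instead of exiting with
# "error: unrecognized arguments: ..."
args, leftover = parser.parse_known_args(argv)

print(args.g)     # "-g2" was read as "-g" with the value "2"
print(args.f)     # "-f2" was read as "-f" with the value "2"
print(leftover)   # the paths that followed -g2 and -f2
```

If this is what happened, comparing the options listed in the tool's help output with the ones on the command line should show whether the installed version supports a second genome.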

TypeError when creating restriction enzyme fragment file

Using the release_1.0 branch: when I run nuc_process and it tries to create the restriction enzyme fragment file, it fails with the following error:

 INFO : Creating restriction enzyme fragment file /path/to/bowtieindex/RE_frag_MboI_GCA_000001405.15_GRCh38_no_alt_analysis_set.txt
    INFO : Calculating MboI fragment locations for chr10 contig chr10
Traceback (most recent call last):
  File "/path/to/.conda/envs/nuc_process/bin/nuc_process", line 11, in <module>
    sys.exit(main())
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 2866, in main
    lig_junc, zip_files, sam_format, verbose)
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 2581, in nuc_process
    re1_files = [check_re_frag_file(genome_index, re1, g_fastas, align_exe, num_cpu, remap=remap)]
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 1594, in check_re_frag_file
    frag_data[contig] = get_chromo_re_fragments(fasta_file_objs, contig, seq.upper(), re_site, cut_pos)
  File "/path/to/.conda/envs/nuc_process/lib/python3.6/site-packages/nuc_processing/NucProcess.py", line 1454, in get_chromo_re_fragments
    sub_seq = frag_seq[a:b]
TypeError: slice indices must be integers or None or have an __index__ method

This happened with several different assembly FASTA files, so I looked at NucProcess.py and saw that the variables a and b at line 1454 can take on non-integer values. I was able to get nuc_process to run by changing line 1434 of NucProcess.py to step = int(mappability_length/2), which forces step to an integer in case it would otherwise be a non-integer value.

With this change nuc_process produces results, but I am unsure whether it may cause unwanted effects down the line. I don't see how this would be influenced by my input files or by how I call nuc_process, but I may be missing something.
For completeness, here is my nuc_process call:

nuc_process -f /path/to/fastas/grch38_fastas/*.fa -o sample1 -v -a -k -re1 MboI -s 150-2000 -n 4 -g /path/to/bowtieindex/GCA_000001405.15_GRCh38_no_alt_analysis_set -i /path/to/fastq1 /path/to/fastq2
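
The failure and the workaround described above can be reproduced in isolation. In Python 3 the / operator always returns a float, and a float cannot be used as a slice index, which matches the TypeError in the traceback. The sketch below is not the code from NucProcess.py: the function name and loop are hypothetical, and only the names step and mappability_length come from the report.

```python
def half_step_slices(seq, mappability_length):
    """Cut seq into windows that advance by half the window length.

    Hypothetical helper for illustration only.
    """
    # int(...) // 2 keeps step an integer even if the length arrives as a
    # float; max(1, ...) avoids a zero step and an endless loop.
    step = max(1, int(mappability_length) // 2)
    windows = []
    a = 0
    while a < len(seq):
        b = a + int(mappability_length)
        windows.append(seq[a:b])  # a and b are ints, so slicing is valid
        a += step
    return windows


# The original failure mode: true division yields 2.0, not 2.
try:
    "ACGTACGT"[0:4 / 2]
    slice_error = None
except TypeError as err:
    slice_error = str(err)

print(slice_error)
print(half_step_slices("ACGTACGT", 4))
```

For positive integer lengths, int(mappability_length/2) and mappability_length // 2 give the same value, so the workaround should produce the step size that integer division would have given.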

Parameter for data without restriction site (e.g. DNase type)

I'm currently working with DNase-type single-cell data, which does not use a restriction enzyme.
Is there a way to run nuc_process on this kind of data?
Perhaps there is a specific way to annotate the enzymes.conf file?

Thanks and hope to hear from you again.
G.V.

nuc_tools not included with git clone

When I try to clone this repo, the nuc_tools folder is not included in version 1.3.

The folder name has "@ 15b643c" in it, which is probably related to why this is happening.

Getting contact map for individual chromosomes?

Hi Tim,

Currently the contact_map.svg file shows all chromosomes for a cell. Is it possible to obtain the contact matrix for individual chromosomes in .npy or a related format, for example with the nuc_contact_map function?

Thanks,
Tarak

How can I solve the OverflowError ?

Dear developers,

The command line was as follows:

nuc_process -cn Chr.name.txt -re1 DpnII -qm 30 -n 128 -r 3 -o Fol007 -g GENOME_index/Fol007 -pdf report -b /home/data2/mals/anaconda3/envs/nucprocess/bin/bowtie2 -p -v -a -k -sam reads_R?.fastq.gz

The error message was as follows:

INFO: Min. contig size not specified, using 10.0% of largest: 550,000 bp
INFO: Considering 1 chromosomes/contigs
INFO: Full contact map size 6 x 6
Traceback (most recent call last):
  File "/home/data2/mals/anaconda3/envs/nucprocess/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/data2/mals/anaconda3/envs/nucprocess/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3808, in <module>
    main()
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3787, in main
    nuc_process(fastq_path_pair, genome_index, genome_index2, re1, re2, c1, c2, sizes, min_rep,
  File "/home/data2/mals/tools/nuc_processing/hic_core/nuc_process.py", line 3538, in nuc_process
    contact_map([npz_path], pdf_path, bin_size=None, bin_size2=250.0,
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 2507, in contact_map
    plot_contact_matrix(matrix, pair_bin_size, title, scale_label, None, pair,
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 1357, in plot_contact_matrix
    tick_delta, nminor = _get_tick_delta(b, bin_size/unit)
  File "/home/data2/mals/tools/nuc_processing/nuc_tools/tools/contact_map.py", line 823, in _get_tick_delta
    sf = int(floor(np.log10(tick_delta_units)))
OverflowError: cannot convert float infinity to integer
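
The last frame of the traceback converts the floor of a base-10 logarithm to an integer. That conversion raises exactly this OverflowError when the logarithm is infinite, which np.log10 returns for an input of zero (negative infinity) or infinity. Which of these occurred here is an assumption; the log only shows a very small 6 x 6 contact map. The sketch below uses the standard math module rather than NumPy, and the function tick_exponent is hypothetical, not part of nuc_tools. It shows the failure and one way to guard against it.

```python
import math


def tick_exponent(tick_delta_units):
    """Return floor(log10(x)) as an int, or None if x is not usable.

    Hypothetical guard for illustration; not the nuc_tools implementation.
    """
    # Zero, negative, infinite, and NaN inputs have no finite exponent.
    if not math.isfinite(tick_delta_units) or tick_delta_units <= 0:
        return None
    return int(math.floor(math.log10(tick_delta_units)))


# Reproduce the error message from the traceback with an infinite value.
try:
    int(math.floor(float("inf")))
    overflow_message = None
except OverflowError as err:
    overflow_message = str(err)

print(overflow_message)
print(tick_exponent(250.0), tick_exponent(0.0), tick_exponent(float("inf")))
```

A caller that receives None could fall back to a default tick spacing or skip plotting that axis, instead of aborting the whole run.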

Help

Hello!
I want to convert my Hi-C contact data to NCC format, but I don't know how to obtain the fields "The number of the ambiguity group to which the paired reads belong" and "Whether read pairs are swapped relative to original FASTQ files". Can you help me? Thank you.
