boyle-lab / f-seq2 Goto Github PK

View Code? Open in Web Editor NEW

6.0 2.0 0.0 17.17 MB

Improving the feature density based peak caller with dynamic statistics

License: GNU General Public License v3.0

Makefile 0.99% Python 91.76% C 5.82% Cython 1.43%

peak-caller dnase-seq chip-seq atac-seq kernel-density-estimation idr idr-analysis

f-seq2's People

Contributors

Stargazers

Watchers

f-seq2's Issues

"No reads contained in input file(s)."

Hi,

I'm working with fseq2 with murine data.
When I run fseq2, it works on a part of my .bed file (MT, X, Y, ....) and leaves out the "numerical" chromosomes (1 --> 18).
When I sub-select chromosomes 1 to 18, this is what it shows me:

PROGRAM: fseq2.0.3 callpeak
F-Seq Version 2.0.3
#1: Read in files and calculate parameters
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:57: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
bed['end_new'] = (bed[['end', 'strand']]).max(axis=1)
Error: No reads contained in input file(s).

Note that my file contains 12868084 lines and its format (bed) seems classic:
18 90600554 90600572 N1:1:HCGMLBGX5:3:23603:4197:5544/2 34 +
18 90600563 90600620 N1:1:HCGMLBGX5:2:13301:23722:11856/2 32 +
18 90600609 90600642 N1:1:HCGMLBGX5:3:23603:4197:5544/1 34 -
18 90600693 90600728 N1:1:HCGMLBGX5:2:13301:23722:11856/1 32 -
...

Do you have any idea what's going on?

Thanks !

peaks.narrowPeak and summits.narrowPeak is not loaded to UCSC as custom track error

@aboyle @nsamzhao

I got this error while loading to UCSC browser as custom track which seems there:

Error File 'GM12376-DEB-1000_peaks.narrowPeak.chr' - Error line 1 of custom track: thickStart after thickEnd

I have generated a narrow peaks from ATAC-seq data using the following command:

fseq2 callpeak ${bemin}/${filename}.prichr.sorted.rmDup.namesorted.bam \
         -f 0 -l 600 -t 4.0 -pe -cpus ${PPN} -nfr_upper_limit 150 -pe_fragment_size_range auto -p_thr 0.05  -standard_narrowpeak -sort_by pValue -o  fseqpeaks_21068Rsy/${filename} \
         -name ${filename}

The data is from paired-end ATAC-Seq. I got two files:

GM12376-DEB-100_peaks.narrowPeak
GM12376-DEB-100_summits.narrowPeak

On your github note, you suggested the summits file is to visualize in genome viewer such as UCSC. I see both has written in strand specific manner but without no strand information.

The summit file is one base long based on start and end (which may show the center summit of the peak region). The only thing I have added to the fseq2 output was to add chr at the start of the line (I used Ensembl as ref genome).

chr3 93470630 93470631 GM12376-DEB-1000_summit_0 2829 . 395.179 282.943 277.703 0
chr16 1782831 1782832 GM12376-DEB-1000_summit_1 1746 . 296.913 174.654 169.714 0
chr4 781944 781945 GM12376-DEB-1000_summit_2 1689 . 296.229 168.904 164.14 0
chr17 82023063 82023064 GM12376-DEB-1000_summit_3 1677 . 328.671 167.777 163.138 0
chr19 42268966 42268967 GM12376-DEB-1000_summit_4 1598 . 319.611 159.812 155.27 0
chr19 14117960 14117961 GM12376-DEB-1000_summit_5 1542 . 249.44 154.239 149.776 0
chr19 5904543 5904544 GM12376-DEB-1000_summit_6 1526 . 316.756 152.681 148.294 0
chr8 25458597 25458598 GM12376-DEB-1000_summit_7 1526 . 269.523 152.632 148.294 0
chr9 129080973 129080974 GM12376-DEB-1000_summit_8 1509 . 245.81 150.961 146.675 0
chr7 100586399 100586400 GM12376-DEB-1000_summit_9 1447 . 285.673 144.729 140.488 0

Any idea?

@demis001

If `ncuts` is 0, throws error instead of gracefully warning and exiting

F-Seq2/fseq2/fseq2.py

Line 182 in 2eba80f

cut_density = int((ncuts / size) * total_window)

Came across this when testing on some very small example files. If the source file actually yields no cuts, then calculating the cut_density in calculate threshold throws an error. It would be perhaps more helpful for the tool to recognize when an input file contains no cuts and warn the user and exit with a message.

Tagged release for pip/anaconda repo versions.

Corresponding version for tag would be 2.0.3

Tagged release for fseq2 published version.

Please add tagged release corresponding to the version used in Zhao and Boyle, 2021.

Incorrect peakcalling

Hello,

Another big problem when using f-seq2:

It calls peaks where there are no reads
It does not call peaks where it should
It only calls on half of each chromosome

Are these known problems?
When I use f-seq1, the calls are done correctly.

Half of the chromosome not processed by fseq2 :
Abnormal peaks on fseq2:

Thanks,

Kévin

Value error & invalid value in multiply statement

Hi,

I'm trying to run F-Seq2 on 3' RNA single end sequencing data to identify polyadenylation peaks. I'm running the following

fseq2 callpeak input.bam -chrom_size_file mm10.chromsizes.bed -o output_dir -name sample1 -sig_format bigwig -standard_narrowpeak -l 50 -t 8.0

However, this results in the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/campbell/mgeuenic/.local/share/virtualenvs/schramek-3p-analysis-XaDVbhKH/lib64/python3.8/site-packages/fseq2/fseq2.py", line 342, in run_kde_wo_control
    result_df = call_peaks(chrom=chrom, first_cut=first_cut, kdepy_result=kdepy_result,
  File "/home/campbell/mgeuenic/.local/share/virtualenvs/schramek-3p-analysis-XaDVbhKH/lib64/python3.8/site-packages/fseq2/fseq2.py", line 528, in call_peaks
    peak_indexes, properties = find_peaks(kdepy_result, height=min_height, distance=min_distance,
  File "/home/campbell/mgeuenic/.local/share/virtualenvs/schramek-3p-analysis-XaDVbhKH/lib64/python3.8/site-packages/scipy/signal/_peak_finding.py", line 934, in find_peaks
    raise ValueError('`distance` must be greater or equal to 1')
ValueError: `distance` must be greater or equal to 1
  File "/home/campbell/mgeuenic/.local/share/virtualenvs/schramek-3p-analysis-XaDVbhKH/bin/fseq2", line 440, in <module>
    main(temp_dir_name)
  File "/home/campbell/mgeuenic/.local/share/virtualenvs/schramek-3p-analysis-XaDVbhKH/bin/fseq2", line 49, in main
    submain(args)
  File "/home/campbell/mgeuenic/.local/share/virtualenvs/schramek-3p-analysis-XaDVbhKH/lib64/python3.8/site-packages/fseq2/callpeak_main.py", line 183, in main
    results = pool.starmap(fseq2.run_kde_wo_control, input_param_ls)
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib64/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: `distance` must be greater or equal to 1

Because the error mentioned distance I tried adding -f 150 to my call. This made the program run longer (and hence I think further). However, then I got the following error:

/home/campbell/mgeuenic/.local/share/virtualenvs/schramek-3p-analysis-XaDVbhKH/lib64/python3.8/site-packages/scipy/signal/signaltools.py:507: RuntimeWarning: invalid value encountered in multiply
  ret = ifft(sp1 * sp2, fshape, axes=axes)

Ultimately I want to estimate the fragment size from the data though, rather than setting it. Is this possible?

callpeak_idr not working

I have been unable to run callpeak_idr.

Is no longer compatible with numpy >= 1.20.0.
Even with the compatible versions it fails with an error

callpeak runs ok.

Here is a summary:

Installed version of numpy on my computer:

numpy                     1.26.4           py39h474f0d3_0    conda-forge

The call and error

fseq2 callpeak_idr \
    -o test/callpeak_idr \
    -control_file_1 ctl_rep1.bam bam ctl_rep2.bam \
    -control_file_2 ctl_rep1.bam bam ctl_rep2.bam  \
    -name_1 rep1 \
    -name_2 rep2 \
    -standard_narrowpeak \
    -v \
    -l 50 \
    -t 8 \
    -tp 4 \
    -cpus 28 \
    -pe \
    --plot rep1_rep2.IDR.png
    tf_rep1.bam \
    tf_rep2.bam \
    >callpeak_idr.out 2>callpeak_idr.err

#3: Done
-------------------------------------
Thanks for using F-seq2.0.3!

Workflow#4: idr2.0.3 with samples(1,2), oracle peak list(3)
Traceback (most recent call last):
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks/bin/fseq2", line 440, in <module>
    main(temp_dir_name)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks/bin/fseq2", line 54, in main
    submain(args)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks/lib/python3.9/site-packages/fseq2/callpeak_idr_main.py", line 60, in main
    idr_main(args)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks/lib/python3.9/site-packages/fseq2/idr_main.py", line 20, in main
    r1, r2 = build_rank_vectors(merged_peaks)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks/lib/python3.9/site-packages/fseq2/idr_2_0_3/idr.py", line 302, in build_rank_vectors
    return ( numpy.array(rank1, dtype=numpy.int),
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks/lib/python3.9/site-packages/numpy/__init__.py", line 324, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I created a new environment and followed the conda instructions on the install page:

conda install -c bioconda -c dmentipl -c samzhao fseq2

But it didn't install fseq2, so I used pip again

pip install fseq2

Now I have this version of numpy

numpy                     1.19.5                   pypi_0    pypi

But fseq2 is a different version now because it didn't recognize this argument

fseq2: error: unrecognized arguments: -standard_narrowpeak

I removed that argument, but I get a different error:

#3: Done
-------------------------------------
Thanks for using F-seq2.0.2!

Workflow#4: idr2.0.3 with samples(1,2), oracle peak list(3)
Initial parameter values: [0.10 1.00 0.20 0.50]
Traceback (most recent call last):
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/bin/fseq2", line 428, in <module>
    main(temp_dir_name)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/bin/fseq2", line 54, in main
    submain(args)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/lib/python3.6/site-packages/fseq2/callpeak_idr_main.py", line 60, in main
    idr_main(args)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/lib/python3.6/site-packages/fseq2/idr_main.py", line 40, in main
    fix_mu=args.fix_mu, fix_sigma=args.fix_sigma)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/lib/python3.6/site-packages/fseq2/idr_2_0_3/idr.py", line 449, in fit_model_and_calc_local_idr
    fix_mu=fix_mu, fix_sigma=fix_sigma)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/lib/python3.6/site-packages/fseq2/idr_2_0_3/optimization.py", line 468, in estimate_model_params
    fix_mu=fix_mu, fix_sigma=fix_sigma)
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/lib/python3.6/site-packages/fseq2/idr_2_0_3/optimization.py", line 420, in EMP_with_pseudo_value_algorithm
    z1 = compute_pseudo_values(r1, theta[0], theta[1], theta[3])
  File "/home/msimenc/software/mambaforge/envs/fastq2peaks.fseq2/lib/python3.6/site-packages/fseq2/idr_2_0_3/utility.py", line 46, in py_compute_pseudo_values
    -10, 10, EPS ) )
TypeError: py_cdf_i() takes 6 positional arguments but 7 were given

idr not working, KDEpy error

fseq2 idr fails for me due to KDEpy:

$ fseq2 idr \                                                                        
    --samples \                                                                    
        WRKY47_AtDNA_rep1_S5.trimmedAligned.sortedByCoord.out.noDuplicates_peaks.narrowPeak \
        WRKY47_AtDNA_rep2_S6.trimmedAligned.sortedByCoord.out.noDuplicates_peaks.narrowPeak \
    --input-file-type narrowPeak \                                                 
    --rank q.value \                                                               
    -o  . \                                                                        
    --output-file-type narrowPeak \                                                
    --log-output-file idr.log \                                                    
    --plot idr.png \                                                               
    >idr.out \                                                                     
    2>idr.err  

Traceback (most recent call last):
  File "/home/msimenc/software/mambaforge/envs/fseq2.built/bin/fseq2", line 4, in <module>
    __import__('pkg_resources').run_script('fseq2==2.0.3', 'fseq2')
  File "/home/msimenc/software/mambaforge/envs/fseq2.built/lib/python3.7/site-packages/pkg_resources/__init__.py", line 656, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/msimenc/software/mambaforge/envs/fseq2.built/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/home/msimenc/software/mambaforge/envs/fseq2.built/lib/python3.7/site-packages/fseq2-2.0.3-py3.7-linux-x86_64.egg/EGG-INFO/scripts/fseq2", line 10, in <module>
    from fseq2.fseq2 import __version__
  File "/home/msimenc/software/mambaforge/envs/fseq2.built/lib/python3.7/site-packages/fseq2-2.0.3-py3.7-linux-x86_64.egg/fseq2/fseq2.py", line 17, in <module>
    import KDEpy
  File "/home/msimenc/software/mambaforge/envs/fseq2.built/lib/python3.7/site-packages/KDEpy/__init__.py", line 4, in <module>
    import importlib.metadata
ModuleNotFoundError: No module named 'importlib.metadata'

Please add -V/--version option.

fseq2 callpeak_idr input

Hi, this is not a bug report but a question about usage. The inputs for callpeak_idr are a treatment + control for each replicate. I have two control replicates and two treatment replicates but my controls aren't specifically paired with a given replicate, so can I provide both controls as input for callpeak_idr using each of -control_file_1 and control_file_2? If not, can you recommend how I proceed? P.S. I ran callpeak once for each replicate inputting both controls under -control_file. Thanks!

pandas associated KeyError: 'query_value' ?

Hello, I was testing on a handful of samples and received the following KeyError. This error occurs in versions 2.0.0/2.0.1/2.0.2. Current pandas version is 1.2.3. Not sure if this is related to the chromosomes with no peaks or something else. Happy to provide any additional information that would be helpful.

$ fseq2 callpeak g2.bed -f 0 -l 600 -t 4.0 -name g2 -cpus 32 -pe -pe_fragment_size_range auto
no peaks find on chr14_GL000225v1_random
no peaks find on chr14_KI270724v1_random
no peaks find on chr22_KI270736v1_random
no peaks find on chr22_KI270738v1_random
no peaks find on chr9_KI270720v1_random
no peaks find on chrUn_GL000224v1
no peaks find on chrUn_KI270438v1
no peaks find on chrUn_KI270467v1
no peaks find on chrUn_KI270511v1
no peaks find on chrUn_KI270746v1
no peaks find on chrUn_KI270756v1
no peaks find on chrUn_KI270750v1
no peaks find on chr17_GL000205v2_random
no peaks find on chr22_KI270735v1_random
no peaks find on chr17_KI270729v1_random
no peaks find on chrUn_KI270336v1
no peaks find on chrUn_KI270337v1
no peaks find on chrUn_KI270333v1
no peaks find on chrUn_KI270507v1
no peaks find on chrUn_KI270747v1
no peaks find on chrUn_KI270539v1
no peaks find on chr4_GL000008v2_random
no peaks find on chrUn_KI270583v1
no peaks find on chrUn_KI270466v1
no peaks find on chr22_KI270732v1_random
no peaks find on chrUn_GL000216v2
Traceback (most recent call last):
  File "/home/$USER/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'query_value'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/$USER/.local/bin/fseq2", line 428, in <module>
    main(temp_dir_name)
  File "/home/$USER/.local/bin/fseq2", line 49, in main
    submain(args)
  File "/home/$USER/.local/lib/python3.8/site-packages/fseq2/callpeak_main.py", line 190, in main
    result_df = fseq2.interpolate_poisson_p_value(result_df)
  File "/home/$USER/.local/lib/python3.8/site-packages/fseq2/fseq2.py", line 864, in interpolate_poisson_p_value
    result_df['query_int'] = result_df['query_value'].astype(int)
  File "/home/$USER/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/$USER/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'query_value'

boyle-lab / f-seq2 Goto Github PK

f-seq2's People

Contributors

Stargazers

Watchers

f-seq2's Issues

"No reads contained in input file(s)."

peaks.narrowPeak and summits.narrowPeak is not loaded to UCSC as custom track error

If `ncuts` is 0, throws error instead of gracefully warning and exiting

Tagged release for pip/anaconda repo versions.

Tagged release for fseq2 published version.

Incorrect peakcalling

Value error & invalid value in multiply statement

callpeak_idr not working

idr not working, KDEpy error

Please add -V/--version option.

fseq2 callpeak_idr input

pandas associated KeyError: 'query_value' ?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent