giesselmann / strique

Nanopore raw signal repeat detection pipeline

License: MIT License

CMake 5.08% Python 74.86% C++ 18.00% Dockerfile 2.06%

nanopore signal-processing

strique's Issues

Two issues to run STRique

Hi @giesselmann ,

Thank you for your nice tool STRique. I have two issues when running it, one on your plasmid_c9orf72 data and one on my own data.

On plasmid_c9orf72, STRique does not finish even after 60 hours. My command is:

samtools view bam_results/barcode11.bam | python3 strique/0.4.0/scripts/STRique.py count strique_g4c2.fofn strique/0.4.0/models/r9_4_450bps.model strqiueg4c2.repeat_config.tsv > barcode11.strique.tsv

The repeat region I listed in strqiueg4c2.repeat_config.tsv is:

chr   begin end   name  repeat   prefix   suffix
pCRAmpBE 1111  1129  pCRAmpBEg4c2   GGGGCC   GATCCGCTCTTCCGGCC TGCGGCCGCCACCGCGG

On my own data for a CAG repeat region, STRique cannot find reliable repeat counts in many reads. For example, I have ~4000 long reads, but STRique outputs a repeat count of 0 for ~1800 of them, and many of the other reads also have low score_prefix/score_suffix values. My command is:

samtools view htt_bam_results/barcode07.bam | python3 strique/0.4.0/scripts/STRique.py count htt.fofn strique/0.4.0/models/r9_4_450bps.model htt.repeat_config.tsv > barcode07.strique.tsv

The repeat region I listed in htt.repeat_config.tsv is:

chr   begin end   name  repeat   prefix   suffix
chr4  3074876  3074933  httCAG   CAG   AGTCCCTCAAGTCCTTC CAACAGCCGCCACCGCC

Could you please help to address the issues above? Thank you.
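
Reads with a count of 0 or weak flank scores can at least be separated from the rest after the run. The following is a minimal sketch, assuming the output is tab-separated with the header shown elsewhere on this page (ID, target, strand, count, score_prefix, score_suffix, ...). The function name `filter_counts` and the threshold of 4.0 are placeholders for illustration, not values recommended by STRique, and the example rows are made up.

```python
import csv
import io


def filter_counts(tsv_text, min_score=4.0):
    """Keep rows with a non-zero count and both flank scores >= min_score.

    min_score is a hypothetical cut-off; choose it from your own score
    distribution.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        if float(row["count"]) == 0:
            continue  # no repeat detected for this read
        if float(row["score_prefix"]) < min_score:
            continue  # prefix flank matched poorly
        if float(row["score_suffix"]) < min_score:
            continue  # suffix flank matched poorly
        kept.append(row)
    return kept


# Made-up rows, for illustration only.
example = (
    "ID\ttarget\tstrand\tcount\tscore_prefix\tscore_suffix\n"
    "read1\thttCAG\t+\t21\t5.2\t5.0\n"
    "read2\thttCAG\t-\t0\t1.1\t0.9\n"
    "read3\thttCAG\t+\t19\t2.0\t5.5\n"
)
print([row["ID"] for row in filter_counts(example)])
```

Filtering after the run does not explain why the flanks score poorly; it only keeps unreliable counts out of downstream summaries.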

STRique

Thanks for the great package. I successfully used it ~6 months ago on a data set that I generated, and it worked beautifully.

I'm back in lab now generating some new data that I am trying to analyze. The index command works, but the count command does not. I'm getting an error running it on the new datasets, the old data sets, and with the STRique_test.py. I'm getting the same errors with both installed versions. The outputs are below. This is probably some issue on our end, but was hoping you might have an insight into what the issue is.
Thanks,
Thomas

The failed command trying to run an old analysis that was previously successful:
`python3 /camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py count --out test.txt --algn TST11465.sam --t 8 reads.fofn /camhpc/pkg/STRique/0.3.0/centos7/models/r9_4_450bps.model TST11465_C9.tsv
24.06.2020 16:45:52 [PID 6980] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):

File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 758, in worker
input = worker_callable(**input)

File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 684, in detect
self.init_hmm()

File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 645, in init_hmm
self.repeatCounter.add_target(target_name, repeat, prefix, suffix)

File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),

File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 403, in init
self.build_model()

File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 432, in build_model
self.bake(merge='All')

File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake

File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]

KeyError: 0`

The results from STRique_test.py (the same for both versions)

`python /camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py
EEEE

ERROR: test_Detection (main.DetectionTest)

Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 55, in test_Detection
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0

======================================================================
ERROR: test_Interpolation (main.DetectionTest)

Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 75, in test_Interpolation
dt.add_target('fmr1', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0

======================================================================
ERROR: test_Modification (main.DetectionTest)

Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 114, in test_Modification
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0

======================================================================
ERROR: test_Normalization (main.DetectionTest)

Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 93, in test_Normalization
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0

Ran 4 tests in 0.399s

FAILED (errors=4)`

ValueError: RepeatCounter: Target with name (21.13723577) already defined

Hey,
This error keeps recurring. Is it something I should be worried about?

Best,
ligia

05.02.2021 15:59:42 [PID 988702] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):

File "STRique.py", line 757, in worker
input = worker_callable(**input)

File "STRique.py", line 683, in detect
self.init_hmm()

File "STRique.py", line 644, in init_hmm
self.repeatCounter.add_target(target_name, repeat, prefix, suffix)

File "STRique.py", line 579, in add_target
raise ValueError("RepeatCounter: Target with name " + str(target_name) + " already defined.")

ValueError: RepeatCounter: Target with name 21.13723577 already defined.
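
According to the traceback, the message is raised when `add_target` is called for a name that is already registered. One possible cause, which nothing on this page confirms, is a repeat config in which the same value appears in the name column on more than one row. A minimal sketch for checking that, assuming a tab-separated config with the header `chr begin end name repeat prefix suffix` as shown in another report; the function name and the example rows are made up:

```python
import csv
import io
from collections import Counter


def duplicate_target_names(config_text):
    """Return the values of the 'name' column that occur more than once."""
    reader = csv.DictReader(io.StringIO(config_text), delimiter="\t")
    names = Counter(row["name"] for row in reader)
    return sorted(name for name, n in names.items() if n > 1)


# Made-up config, for illustration only.
example = (
    "chr\tbegin\tend\tname\trepeat\tprefix\tsuffix\n"
    "chr4\t100\t160\tlocusA\tCAG\tACGT\tTGCA\n"
    "chr9\t200\t260\tlocusB\tGGGGCC\tACGT\tTGCA\n"
    "chr4\t300\t360\tlocusA\tCAG\tACGT\tTGCA\n"
)
print(duplicate_target_names(example))
```

An empty list would rule out this particular cause and point elsewhere.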

command plot is not recognised

I have already used index and count commands from STRique and now I want to plot the results.

However, when I do cat D144018.striqueFilter.tsv | python3 /app/scripts/STRique.py plot --output plotFilterD144018 index_D14418.fofn

I get

Unrecognized command
usage: STRique.py <command> [<args>]
Available commands are:
   index      Index batch(es) of bulk-fast5 or tar archived single fast5
   count      Count single read repeat expansions

STRique: a nanopore raw signal repeat detection pipeline

positional arguments:
  command     Subcommand to run

optional arguments:
  -h, --help  show this help message and exit

I am running STRique in a Docker container (Docker version 17.04.0-ce), set up according to the official documentation.

Documentation for Fast5Masker (expansion request)

Hello,

Would it be possible for you to provide some documentation or usage notes for the fast5masker.py script? I am attempting to mask some data, similarly to your Nanopolish step, and then run the raw signal through Megalodon.

Thanks

Error Reading fast5

Hello,

I get an error reading fast5 files when trying to run your tool. Any ideas? We are using a MinIT and the latest version of MinKNOW (19). I think these are multi-read fast5 files. Is this a problem?

Thanks

Matt

repeat_config.tsv file

Hello! I am having problems creating the repeat_config.tsv file.
I still don't understand how you defined the prefix and suffix sequences. I tried to align the sequences you provide in the repeat_config.tsv file with the read in the c9orf72.sam file, without good results. Furthermore, if I count the number of GGCCCC repeats manually, I get a different number from the one reported by the tool.
I ask all this because I get an error that is probably caused by the repeat_config.tsv file I built. The file generated by you works fine with my SAM file.

thanks for your help
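
For a manual count on the basecalled sequence, note that GGCCCC is the reverse complement of GGGGCC, so a read may show either motif depending on the strand it came from. A minimal sketch of such a count, using exact string matching; the helper names are made up, and exact matching is sensitive to basecalling errors, so the result need not agree with a count made on the raw signal:

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_tandem_run(seq, motif):
    """Largest number of back-to-back exact copies of motif in seq."""
    pattern = re.compile("(?:%s)+" % re.escape(motif))
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(seq)]
    return max(runs, default=0)


def longest_run_either_strand(seq, motif):
    """Check the motif and its reverse complement, return the longer run."""
    return max(
        longest_tandem_run(seq, motif),
        longest_tandem_run(seq, reverse_complement(motif)),
    )


# Made-up sequence with three exact GGCCCC copies.
toy_read = "AA" + "GGCCCC" * 3 + "TT"
print(reverse_complement("GGGGCC"), longest_run_either_strand(toy_read, "GGGGCC"))
```

A single basecalling error inside the repeat splits one long run into two shorter ones, which is one way a manual count can come out lower than expected.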

Issues related to native DNA

Hello Pay,

Just recently our group managed to sequence multiple plasmids containing 50x STRs made of tri-nucleotides. Despite initial problems with the config file, we managed to analyse our dataset with STRique run on Docker. The results look quite good: the overall output showed an acceptable range of deviation when visualized as a whisker plot, and the data also looked very good after alignment and visualization in IGV. However, the same data plotted as a bar chart does not look as good as we initially thought. Question 1: is that something you would expect, or did we make a mistake during the analysis? The very large amount of data generated for the plasmid samples will allow us to pre- and post-filter the data, e.g. remove extreme outliers or filter on prefix and suffix scores.

Nonetheless, the newest dataset, generated from native DNA, seems to fail completely when processed with STRique, i.e. zero reads in the final output. Despite a substantial number of reads (>400k, Cas9-enriched), we cannot produce any significant output with STRique. Question 2: what would you suggest to troubleshoot this? We could shorten both prefix and suffix from 150 bp down to 20-30 nt; however, judging from the alignment results (SAM, minimap2), this will definitely fail again, as >95% of the data is missing the 5' and 3' flanking regions and our gene of interest is heavily truncated in >99% of reads. I know that STRique can identify methylation patterns on gDNA; we have not tried that yet, but the reads in FASTA format seem to contain an extreme number of errors. Question 3: do you think we may be sequencing highly modified gDNA that cannot be accurately basecalled or processed by the STRique algorithm? Have you observed something similar, or heard of such issues from other groups?

Kind regards
Simon

wrong repeat unit

Hi STRique developers,

We gave the wrong repeat unit for the repeat region, but the program still output quite a few results. Is this expected? For example, the expected repeat unit is CCGG and we gave it ACTG, but the program still output repeat counts. Or are there criteria other than prefix_score and suffix_score that we can use to filter these out?

Thanks.

George

ImportError: cannot import name 'pyseqan' from 'STRique_lib'

Hi, I was trying to install STRique on my cluster and ran into some problems; I was hoping they could be solved here.

I went through all the steps in this document on Read the Docs:
https://strique.readthedocs.io/en/latest/installation/src/
The installation went through without any error messages being emitted and I did get the message that it had finished processing all the dependencies of STRique.

But when I moved on to the test page to test the installation. I followed the steps on this page:
https://strique.readthedocs.io/en/latest/installation/test/

I ran into problems on the second line, namely python3 scripts/STRique_test.py

The error that I get is the following.
Traceback (most recent call last):
File "scripts/STRique_test.py", line 39, in <module>
import STRique
File "/.mounts/labs/simpsonlab/users/schaudhary/projects/2020.11.STRDetection/STRique/scripts/STRique.py", line 49, in <module>
from STRique_lib import fast5Index, pyseqan
ImportError: cannot import name 'pyseqan' from 'STRique_lib' (/path/to/directory/STRique/STRique_lib/__init__.py)

Would you happen to know how to get past this?

Thank you.

Error STRique count

Hi @giesselmann ,
I think your tool is very interesting and I would like to use it.
I tried to use the Docker version of the tool, but I ran into some errors that I do not completely understand.
My data are as follows:
fast5_pass: directory containing 484 .fast5 files produced by MinKNOW
my_sample.bam: reads aligned with minimap2
my_config.tsv: TSV file with my own regions of interest
I am running it on a PC with Windows 10.

I first started the Docker container with the command:

docker run -it --mount type=bind,source=$(pwd),target=/host/users/lenovo/desktop giesselmann/strique

Then I ran the indexing step:

python3 app/scripts/STRique.py index --recursive host/users/lenovo/desktop/my_sample/fast5_pass > host/users/lenovo/desktop/my_sample/fast5_pass/reads.fofn

When I ran the counting step:

cat host/users/lenovo/desktop/my_sample/my_sample.bam | python3 app/scripts/STRique.py count host/users/lenovo/desktop/my_sample/fast5_pass/reads.fofn app/models/r9_4_450bps.model host/users/lenovo/desktop/my_config.tsv > host/users/lenovo/desktop/my_sample/result.tsv

I got the following error many times:

[PID 61] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):

File "/usr/local/lib/python3.6/dist-packages/STRique-0.4.2-py3.6-linux-x86_64.egg/STRique_lib/fast5Index.py", line 81, in get_raw
signal = fp[os.path.join(offset, 'Raw', s)][()]

File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper

File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper

File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/dataset.py", line 787, in getitem
self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)

File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper

File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper

File "h5py/h5d.pyx", line 192, in h5py.h5d.DatasetID.read

File "h5py/_proxy.pyx", line 112, in h5py._proxy.dset_rw

OSError: Can't read data (can't open directory: /usr/local/hdf5/lib/plugin)

Followed by:

Could not retrieve bc752507-c038-4be3-bf31-93983f4a7ad6 from file host/users/lenovo/desktop/my_sample/fast5_pass/FAO49405_pass_c7cb835e_326.fast5.

And after that, many lines like:

10.11.2022 19:47:25 [PID 90] [ERROR] Detector: Error parsing alignment
b734757f-310d-4599-aa25-f8fae95a25c1 4 * 0 0 * * 0 0 TGCCTTCTAGTTTCAGTTACATCCATGCTCTATCTTCTGCTGGGATTACGGCATGACACACTTAAACATTTTCTTTATTTTTAATATGTTTCTTTCTTCTTCTTCTTCTTCTTTTTTTTTTTTTTTTTTTGTATTTTTAGTAGATATGGGTTTCACCATGTTGGCCAGGATAGTCTTGAACTCCTGACCTCAGGTGATCCACCTGCCTAGGCCTCCCAAAGTGCTGAGATTATGGGCGTGAGCTACCGCGTCCTGCCAGGAAATCCATTTTCTAAGTCTAACTTTTAAGCACTGTACCTTAATCCCTGAAGC ''('&(/)&%%'','&'++,&%$$$$%&31/0...+))3821.--(%$&*+&'((%%%%)4:9;887775,(&&%%&''(%$%,,3576---.1225570-4,022369<@ADECDB;?55:?CGCG20@>6)&$%')-3>D0///48;?5+,--0.''%&((8@?>52./+()((*,18<;<>AE>8445)'&&(+.../@A<95542352/-0778><1((()+++,54((((55:==4348846617:;=<:989/4.-,+,5))7>?7@@?><<=;54)'''''&&'**()('&$$ rl:i:141

How can I deal with such errors?
Thank you very much in advance!
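
The record printed with "Error parsing alignment" has FLAG 4 and `*` as reference name, which in the SAM format marks an unmapped read. A minimal sketch for dropping such records before the counting step; whether STRique needs them removed is an assumption, and the function names and example records are made up. On the command line, `samtools view -F 4` applies the same filter.

```python
def is_mapped(sam_line):
    """True for a SAM alignment record whose FLAG lacks the 0x4 (unmapped) bit."""
    if sam_line.startswith("@"):
        return False  # header line, not an alignment record
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # fewer than the 11 mandatory SAM columns
    return (int(fields[1]) & 0x4) == 0


def mapped_records(lines):
    """Keep only alignment records of mapped reads."""
    return [line for line in lines if is_mapped(line)]


# Made-up SAM lines, for illustration only.
example = [
    "@SQ\tSN:chr4\tLN:1000\n",
    "read1\t0\tchr4\t101\t60\t4M\t*\t0\t0\tACGT\t!!!!\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\n",
]
print(len(mapped_records(example)))
```

This only removes the unmapped records behind the "Error parsing alignment" lines; the HDF5 plugin error further up is a separate problem.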

No target found

Hi,

Thanks for a tool that I think could be very useful for me, although I can't quite get it to work on my data.

I get "Not target found for.." for every read in my data set, although I can see in the genome browser that a large number of reads map to my targeted repeat expansion. I use 150 bp prefix and suffix sequences with a CAG repeat. Any idea what could be going on here?

Thanks.

will Error parsing alignment affect results?

Hi, thanks again for the tool. "Error parsing alignment" appears, followed by unaligned reads such as the following:

936e3a5e-548e-482f-ae61-47e9ba16c2b0 4 * 0 0 ** 0 0 ......

Will such errors affect the results? I ask because the estimated repeat number is much smaller than the real value: although the real value is only around 40 repeats, the result shows 4 for all fast5 files. This is only the case when I set the flanking regions in the config file to 50 bp. When I set the flanking regions to 250 bp, as recommended in another issue, all results show 0. The reads were generated by sequencing after PCR.

repeat number 0

Hi,

I am trying to quantify repeat number in a large insertion of unknown (potentially varying) size. The alignment is very poor because this insert is not in the reference. When calling repeat number with STRique I am getting a lot of reads that have counts of 0 but when I look at the fast5 there is definitely a repeat present. Could this be a result of the poor mapping?

Count distribution:

example of read with count of 0:

STRique not providing results after 24 hours

Hi, I am using the Docker version of STRique as described in the documentation:
docker run -it --mount type=bind,source=$(pwd),target=/host giesselmann/strique
I am creating the fast5 index with:
python3 /app/scripts/STRique.py index /host/fast5 > fast5/reads.fofn
And then, I am executing the count command as follows:
python3 /app/scripts/STRique.py count --t 8 /host/fast5/reads.fofn /app/models/r9_4_450bps.model strique_input.tsv > strique_output.tsv

However, I stopped it after 24 hours: it had shown no log output and had printed nothing to strique_output.tsv apart from a header. I checked the CPU activity while STRique was running, but did not see much activity.

Am I doing something wrong?

Kind regards,
Francisco Abad.

RuntimeError: dictionary keys changed during iteration

Hi, there,

We recently upgraded to STRique 0.4.2, but when we tried to run the program we got the following error:

03.01.2022 14:00:58 [PID 46032] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.

Traceback (most recent call last):

File "/tools/STRique/0.4.2/scripts/STRique.py", line 758, in worker

input = worker_callable(**input)

File "/tools/STRique/0.4.2/scripts/STRique.py", line 684, in detect

self.__init_hmm__()

File "/tools/STRique/0.4.2/scripts/STRique.py", line 645, in init_hmm

self.repeatCounter.add_target(target_name, repeat, prefix, suffix)

File "/tools/STRique/0.4.2/scripts/STRique.py", line 567, in add_target

flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),

File "/tools/STRique/0.4.2/scripts/STRique.py", line 403, in init

self.__build_model__()

File "/tools/STRique/0.4.2/scripts/STRique.py", line 432, in build_model

self.bake(merge='All')

File "pomegranate/hmm.pyx", line 869, in pomegranate.hmm.HiddenMarkovModel.bake

File "/tools/STRique/0.4.2/lib/python3.8/site-packages/networkx/classes/reportviews.py", line 729, in

for nbr, dd in nbrs.items()

RuntimeError: dictionary keys changed during iteration
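
The traceback ends inside pomegranate's `bake()` and networkx, so the installed versions of those packages are useful context for a report like this one. A small sketch for collecting them, assuming Python 3.8 or newer; the function name and the package list are illustrative and taken from the tracebacks on this page, not from STRique's requirements:

```python
from importlib import metadata  # available from Python 3.8


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names as they appear in the tracebacks above.
print(installed_versions(["pomegranate", "networkx", "numpy", "h5py"]))
```

Running this in the same environment that produced the traceback shows exactly which combination of versions is in use.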

Distinguish between AAAAT and GAAAT

Hello,

We are using your tool to investigate a repeat expansion with the motif AAAAT. We assumed that there might be a mutation, so that the repeat could also have the motif GAAAT. Looking at the alignment, we saw that only 3% of the reads have GAAAT. When we counted the repeats with STRique using GAAAT and AAAAT as the motif in two separate config files, the results were the same. Even the number of evaluated reads with GAAAT as the motif was the same, although only a small fraction of reads should carry this motif. Is it possible that STRique doesn't distinguish between AAAAT and GAAAT?

Thank you for your help!

Best wishes,
Theresa

Dealing with concatemers

Hi,

I've noticed that with things like plasmid sequencing you sometimes get concatemers. When this happens, STRique often predicts massive repeats that span multiple copies of the plasmid, from the left flank in one copy to the right flank in another. I've attached an example: this is two plasmids concatenated; the repeat is around 100bp but it is predicted as 941.

Any ideas on a fix? I thought about some kind of pre-processing to split the concatemers, but tools for doing this are lacking (at least at the fast5 level).

Thanks!
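
Short of splitting concatemers at the signal level, suspicious counts can be flagged after the run. The sketch below is a generic median/MAD outlier check, not part of STRique; the function name, the cut-off `k` and the example counts are made up. It would also flag genuinely large expansions, so it suits samples where a narrow count distribution is expected, such as a single plasmid.

```python
import statistics


def flag_outliers(counts, k=5.0):
    """Return indices of counts further than k MADs from the median.

    The MAD is floored at 1.0 so that a perfectly uniform sample does
    not flag deviations of a single repeat unit.
    """
    med = statistics.median(counts)
    mad = statistics.median([abs(c - med) for c in counts])
    threshold = k * max(mad, 1.0)
    return [i for i, c in enumerate(counts) if abs(c - med) > threshold]


# Made-up counts: five plausible reads and one concatemer-like value.
example_counts = [98, 100, 101, 99, 102, 941]
print(flag_outliers(example_counts))
```

Flagged reads can then be inspected individually, for example by checking their read length against the expected plasmid size.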

NameError: name 'super' is not defined

Hi,

I am trying to run the test script:

cat data/c9orf72.sam | python3 scripts/STRique.py ./data ./models/template_median68pA6mer.model ./configs/repeat_config.tsv --config ./configs/STRique.json

/usr/lib/anaconda3/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
ID target strand count score_prefix score_suffix log_p offset ticks
Traceback (most recent call last):
File "scripts/STRique.py", line 765, in
ow.write_line(**counts)
TypeError: write_line() argument after ** must be a mapping, not NoneType
Traceback (most recent call last):
File "scripts/STRique.py", line 341, in free_bake_buffers
NameError: name 'super' is not defined
Exception ignored in: 'pomegranate.hmm.HiddenMarkovModel.dealloc'
Traceback (most recent call last):
File "scripts/STRique.py", line 341, in free_bake_buffers
NameError: name 'super' is not defined
Segmentation fault (core dumped)

Definitions of the output fields?

I couldn't find definitions of the fields in the output.

ID, target, strand, seem obvious, but I'd like to know the definitions of the rest:
count (repeats, or bases?), score_prefix, score_suffix, log_p, offset, ticks, mod.

Thanks in advance.

Chris

Interpretation of results from STRique

We are identifying repeats with (AAAAG)n/(AAGGG)n motifs in RFC1 (the region contains complex patterns of both repeats).
We first tried the c9orf72 example data, which is available in the Docker container, to understand the pipeline workflow.

We were able to replicate the plot shown in the paper (we got a repeat count of 733), but we could not understand how the number of repeats is calculated, as we were not able to find the GGGGCC / CCCCGG tandem repeats in the read from the SAM file.
We then tried our own data. The reads were counted as 44 repeats by STRique (plot attached), but we expected more repeats. We also could not understand the odd pattern in the repeat region.

What do the smaller-amplitude patterns denote?

Finally, how many reads, and what read length, do we need to sequence 500+ pentanucleotide repeats? We designed our protocol around 3K purified PCR amplicons. What would you suggest to get better output from STRique?

Test after installation failed

Hi, I just installed the tool STRique and I ran the test command using 'python3 scripts/STRique_test.py' and I received the following errors. Thank you.

`EEEE

ERROR: test_Detection (main.DetectionTest)

Traceback (most recent call last):
File "scripts/STRique_test.py", line 55, in test_Detection
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration

======================================================================
ERROR: test_Interpolation (main.DetectionTest)

Traceback (most recent call last):
File "scripts/STRique_test.py", line 75, in test_Interpolation
dt.add_target('fmr1', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration

======================================================================
ERROR: test_Modification (main.DetectionTest)

Traceback (most recent call last):
File "scripts/STRique_test.py", line 114, in test_Modification
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration

======================================================================
ERROR: test_Normalization (main.DetectionTest)

Traceback (most recent call last):
File "scripts/STRique_test.py", line 93, in test_Normalization
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration

Ran 4 tests in 0.239s

FAILED (errors=4)
`

prefix/suffix in repeat_config.tsv

ModuleNotFoundError: No module named 'ont_fast5_api'

Hi,

I get the following error:
ModuleNotFoundError: No module named 'ont_fast5_api'

I installed STRique following the installation guide for source installation, and things seemed to work just fine, except that I had to install pomegranate with pip because the installation from requirements.txt failed for that package.

It seems that I do have a submodule named 'ont_fast5_api', since I have the following:
~/src/STRique/submodules/ont_fast5_api/ont_fast5_api

Can you help me out?

I also installed STRique with udocker in a different environment, but the creation of the fofn files seems never-ending.

Installation issue?

Hi, I got these error messages when I tried the two test commands.

python scripts/STRique_test.py
EEE
ERROR: test_Detection (main.DetectionTest)
Traceback (most recent call last):
File "scripts/STRique_test.py", line 55, in test_Detection
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 338, in init
self.build_model()
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration

ERROR: test_Interpolation (main.DetectionTest)
Traceback (most recent call last):
File "scripts/STRique_test.py", line 75, in test_Interpolation
dt.add_target('fmr1', repeat, prefix, suffix)
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 338, in init
self.build_model()
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration

ERROR: test_normalization (main.DetectionTest)
Traceback (most recent call last):
File "scripts/STRique_test.py", line 92, in test_normalization
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 338, in init
self.build_model()
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration
Ran 3 tests in 0.365s

FAILED (errors=3)

cat data/c9orf72.sam | python3 scripts/STRique.py ./data/ ./models/template_median68pA6mer.model ./configs/repeat_config.tsv
ID target strand count score_prefix score_suffix log_p offset ticks
Traceback (most recent call last):
File "scripts/STRique.py", line 764, in
counts = rd.detect(line)
File "scripts/STRique.py", line 551, in detect
self.init_hmm()
File "scripts/STRique.py", line 512, in init_hmm
self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
File "scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "scripts/STRique.py", line 338, in init
self.build_model()
File "scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in <genexpr>
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration

I think I was using:
Python 3.7.0
networkx 2.1
numpy 1.15.1
pomegranate 0.11.0
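
To replace "I think I was using" with exact numbers, the installed versions can be queried directly. This is a minimal sketch, assuming Python 3.8 or newer (importlib.metadata is not in the 3.7 standard library); the helper name installed_versions is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["networkx", "numpy", "pomegranate"]
    for name, version in installed_versions(wanted).items():
        print(name, version or "not installed")
```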

The same thing occurred when I tested them with python 3.5.0.
I would be grateful if you could let me know what was wrong.

Best,
Satomi
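
For readers hitting the same traceback: the final RuntimeError is Python's generic complaint when a dictionary gains or loses keys while something is still iterating over it. The sketch below reproduces that class of error in isolation and shows the usual remedy, iterating over a snapshot. It is not taken from networkx or pomegranate and does not explain why bake() triggers the error in this environment; the graph dictionary is a made-up stand-in.

```python
def prune_isolated_unsafe(adjacency):
    """Deletes keys while iterating the live dict; raises RuntimeError."""
    for node, neighbours in adjacency.items():
        if not neighbours:
            del adjacency[node]


def prune_isolated_safe(adjacency):
    """Iterates over a list snapshot, so deleting from the dict is fine."""
    for node, neighbours in list(adjacency.items()):
        if not neighbours:
            del adjacency[node]
    return adjacency


# Hypothetical adjacency map: node "c" has no neighbours.
graph = {"c": [], "a": ["b"], "b": ["a"]}

try:
    prune_isolated_unsafe(dict(graph))
except RuntimeError as err:
    print("unsafe:", err)

print("safe:", prune_isolated_safe(dict(graph)))
```

Since the failing frames here are inside library code, the practical route is usually to try a different combination of library versions rather than to patch the loop yourself.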

Plot output using Docker

Hi Giesselmann,

I’m currently trying to get the new plot option to work. I managed to run the command without any error messages, but no output file was produced. Could you help, or do you see what might be wrong?

I used this command: cat /work/sdularsen/nanopore/04284circulomic_ligationkit_181119/04284circulomic_ligationkit_181119/04284circulomic/20191118_1413_1-E7-H7_PAE17010_81d9e4ee/fastq_pass/04284circulomic_ligationkit_181119_res.test.txt | python3 /app/scripts/STRique.py plot /work/sdularsen/nanopore/04284circulomic_ligationkit_181119/04284circulomic_ligationkit_181119/04284circulomic/20191118_1413_1-E7-H7_PAE17010_81d9e4ee/fast5_pass/reads.fofn

Failed STRique test

I am having some difficulty with the installation. Following the instructions for downloading via the command line (I'm using a Mac), I get a series of deprecation warnings. For example:

      'PyThread_create_key' is deprecated [-Wdeprecated-declarations]

(See this file for entire output following attempted installation:
STRique_download_output.txt)

When I try to run the test, it seems to work at first, but then ends up stopping (see this file for the output:
STRique_test_output.txt
). Do you have any ideas for what the issue could be?

Thanks!

Empty STRique output

Dear STRique developers,
I tried out your software, but got empty output apart from the header.
I created a config file with the sequences of interest and mapped my MinION reads to human reference genome hg38 with minimap2.
This is the command I used for running STRique:

STRIQUE_ROOT=/home/simone/MinION/software/STRique
STRIQUE="python3 /home/simone/MinION/software/STRique/scripts/STRique.py"
REPEAT=/home/simone/MinION/Cas9/repeat_config.tsv

$STRIQUE index --recursive $FAST5 > $INDEX
samtools view $BAM | $STRIQUE count $INDEX $STRIQUE_ROOT/models/template_median68pA6mer.model $REPEAT --t 30 > STRique_output.txt

And this is the output I got:

ID target strand count score_prefix score_suffix log_p offset ticks

The target region I am interested in has coverage depth of about 13X. Is it enough?
Is STRique able to differentiate between the two haplotypes?
All tests I performed with python3 scripts/STRique_test.py gave PASS results.
When looking in IGV at reads mapped to the target region I can roughly count the number of triplets.
Do you have any suggestions?
Thanks in advance
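
An empty table can have several causes, and this thread does not say which one applied here. One cheap thing to rule out is a malformed repeat config. The sketch below checks a config against the seven-column layout shown in other issues on this page (chr, begin, end, name, repeat, prefix, suffix). The function name, the tab separator, the header requirement, and the individual checks are assumptions for illustration and are not part of STRique.

```python
import csv
import io

EXPECTED = ["chr", "begin", "end", "name", "repeat", "prefix", "suffix"]
DNA = set("ACGT")


def check_repeat_config(text):
    """Return a list of human-readable problems found in a repeat config."""
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0] != EXPECTED:
        problems.append("header should be: " + " ".join(EXPECTED))
    for number, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED):
            problems.append(f"line {number}: expected 7 tab-separated fields, got {len(row)}")
            continue
        _, begin, end, _, repeat, prefix, suffix = row
        if not (begin.isdigit() and end.isdigit() and int(begin) < int(end)):
            problems.append(f"line {number}: begin/end must be integers with begin < end")
        for label, seq in (("repeat", repeat), ("prefix", prefix), ("suffix", suffix)):
            if not seq or set(seq.upper()) - DNA:
                problems.append(f"line {number}: {label} is not a plain A/C/G/T sequence")
    return problems


config = (
    "chr\tbegin\tend\tname\trepeat\tprefix\tsuffix\n"
    "chr4\t3074876\t3074933\thttCAG\tCAG\tAGTCCCTCAAGTCCTTC\tCAACAGCCGCCACCGCC\n"
)
print(check_repeat_config(config))
```

A clean result only shows that the file is well-formed; it says nothing about whether the flanking sequences match the reads.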

VBZ fast5 results in failure to read fast5

Newer versions of MinKNOW now compress fast5 files with VBZ compression, which causes STRique.py count to fail.

Nanopore have provided an updated HDF5 plugin to handle VBZ, but I'm struggling to get it to work with STRique:

https://github.com/nanoporetech/vbz_compression

I've successfully used the tool they provide in the fast5 api (compress_fast5) to convert back to gzip which fixes the issue so it's definitely the switch to VBZ compression.

Any ideas (apart from changing all fast5 to gzip)?
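
One thing that may be worth checking, offered as a hedged suggestion rather than a confirmed fix: HDF5 looks for dynamically loaded filter plugins in the directories named by the HDF5_PLUGIN_PATH environment variable, and setting it in the shell before launching Python is the safest way to make sure the library sees it. The sketch below only inspects the environment. The helper name is hypothetical, and matching on "vbz" in the filename is an assumption about how the plugin library is named on a given system.

```python
import os


def find_vbz_plugins(environ=os.environ):
    """List files whose name contains 'vbz' in the HDF5_PLUGIN_PATH directories."""
    hits = []
    for directory in environ.get("HDF5_PLUGIN_PATH", "").split(os.pathsep):
        if directory and os.path.isdir(directory):
            for entry in sorted(os.listdir(directory)):
                if "vbz" in entry.lower():
                    hits.append(os.path.join(directory, entry))
    return hits


if __name__ == "__main__":
    plugins = find_vbz_plugins()
    print(plugins if plugins else "no VBZ plugin found via HDF5_PLUGIN_PATH")
```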

the strique can't stop

Dear authors,
I'm using STRique for repeat quantification, and I found that the program never finishes.

I checked CPU usage with the Ubuntu system monitor, and it showed one CPU working at 100%.

I have run into this problem several times. Please give me some advice.

Here is the script I used:

#!/bin/bash

#conda activate strique

STRIQUE_PATH="/media/amax/disk1/shared/tools/STRique"


## fofn file

FAST5_DIR=$1
FOFN_PATH=$2
BAM_FILE=$3
OUTPUT=$4

python3 $STRIQUE_PATH/scripts/STRique.py index --recursive $FAST5_DIR --out_prefix $FAST5_DIR > $FOFN_PATH

## repeat quantification


samtools view -F 2308 $BAM_FILE | python3 $STRIQUE_PATH/scripts/STRique.py count --t 12 $FOFN_PATH $STRIQUE_PATH/models/r9_4_450bps.model strique.config > $OUTPUT
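
For reference, the -F 2308 filter in the script above drops any alignment that has at least one of three SAM flag bits set: unmapped (4), secondary (256), or supplementary (2048), and 4 + 256 + 2048 = 2308. The bit values come from the SAM specification; the helpers below are an illustrative decoder and are not part of STRique or samtools.

```python
# SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    1: "paired",
    2: "proper_pair",
    4: "unmapped",
    8: "mate_unmapped",
    16: "reverse",
    32: "mate_reverse",
    64: "first_in_pair",
    128: "second_in_pair",
    256: "secondary",
    512: "qc_fail",
    1024: "duplicate",
    2048: "supplementary",
}


def decode_flags(mask):
    """Return the names of all flag bits set in mask."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if mask & bit]


def passes_filter(flag, exclude=2308):
    """Mimic `samtools view -F exclude`: keep reads with none of the bits set."""
    return flag & exclude == 0


print(decode_flags(2308))
print(passes_filter(16))    # primary alignment on the reverse strand
print(passes_filter(2064))  # supplementary alignment on the reverse strand
```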

runtime is several hours

Hello, I am running STRique on a nanopore experiment with 20000 reads on a workstation with 64 threads, and the runtime is several hours. On the STRique github page it says the runtime should be a few minutes.

I was wondering, is this because you are running STRique on experiments with just a few hundred reads that you got from a Cas9/Cas12 enrichment experiment? Our experiment used PCR enrichment, so we have several thousand reads.

STRique model version R10.3

I found in an earlier issue that R10.3 would not work with the current version of STRique:
#6 (comment)

I have a couple of questions and would appreciate your feedback:

Are there any updates regarding the model version?
We couldn't find in the STRique documentation what the contents of the model file are, for example:

AAAAAA	87.31411831337803	0.7271229290351257	2556
AAAAAC	83.7620420260019	1.0166215079284922	3802
AAAAAG	84.87997176980885	0.6816026090898406	1660

What do the four columns refer to?

Is this an issue that can be solved by creating a model file for R10.3?
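
While the column meanings are not answered in this thread, the rows can at least be loaded for inspection. The sketch below assumes whitespace-separated rows with a k-mer followed by three numbers, as in the excerpt above. The field names value_a, value_b, and value_c are deliberately neutral placeholders: a common layout for nanopore k-mer models is a mean current level, a spread, and a count, but that reading is an assumption and is not confirmed by the STRique documentation.

```python
from typing import NamedTuple


class ModelRow(NamedTuple):
    kmer: str
    value_a: float  # placeholder name; meaning not confirmed
    value_b: float  # placeholder name; meaning not confirmed
    value_c: float  # placeholder name; meaning not confirmed


def parse_model(text):
    """Parse model rows, skipping blank or unexpected lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        kmer, a, b, c = fields
        rows.append(ModelRow(kmer, float(a), float(b), float(c)))
    return rows


sample = (
    "AAAAAA\t87.31411831337803\t0.7271229290351257\t2556\n"
    "AAAAAC\t83.7620420260019\t1.0166215079284922\t3802\n"
    "AAAAAG\t84.87997176980885\t0.6816026090898406\t1660\n"
)
rows = parse_model(sample)
print(len(rows), rows[0].kmer, len(rows[0].kmer))
```

The k-mers in the excerpt are six bases long, so a complete table of this shape would have 4**6 = 4096 rows.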

Explanation needed for model files

Thanks for the tool!

I understand that r9_4 and 450bps both describe the nanopore setup. What does mCpG refer to? Also, where can I obtain .model files for other nanopore versions? For example, we have some experiments planned for R10.

Thanks!

Paper?

Is there a paper describing the general idea of the algorithm?

Error in index

Hi there,

I followed the instructions to install STRique in a separate virtual environment and verified that everything works.

I then ran the index command as follows:
python3 $script index --recursive --out_prefix ${output_folder} ${input_folder}/ > ${output_folder}/${myseq}.fofn

I got the following error message:

[ERROR] Failed to open /scratch/stimulated_test/C9ORF72_c1_deep_simulator_read_5x/fast5/signal_23610_2d253b4e-3df3-4f1f-8a73-0be882763c81.fast5, skip file for indexing
Traceback (most recent call last):
File "/home/bin/STRique/scripts/STRique.py", line 1030, in <module>
main()
File "/home/bin/STRique/scripts/STRique.py", line 890, in __init__
getattr(self, args.command)(sys.argv[2:])
File "/home/bin/STRique/scripts/STRique.py", line 899, in index
for record in fast5Index.fast5Index.index(args.input, recursive=args.recursive, output_prefix=args.out_prefix, tmp_prefix=args.tmp_prefix):
File "/home/venv/STR/lib/python3.8/site-packages/STRique-0.4.2-py3.8-linux-x86_64.egg/STRique_lib/fast5Index.py", line 175, in index
ID = fast5Index.get_ID_single(input_file)
File "/home/venv/STR/lib/python3.8/site-packages/STRique-0.4.2-py3.8-linux-x86_64.egg/STRique_lib/fast5Index.py", line 65, in get_ID_single
return str(f5["/Raw/" + s.rpartition('/')[0]].attrs['read_id'], 'utf-8')
TypeError: decoding str is not supported

Therefore, I tried removing 'utf-8' on line 65 of fast5Index.py as follows:
return str(f5["/Raw/" + s.rpartition('/')[0]].attrs['read_id'])

That fixed the problem.

Do you have any suggestions or comments on this? Do I need to remove the other 'utf-8' arguments in this file?

Thank you!

Best Regards,
Hsin
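
On the question above: str(x, 'utf-8') only works when x is bytes-like; if x is already a str, Python raises exactly this TypeError. That suggests the read_id attribute came back as str in this environment (newer h5py releases returning str for string attributes is a plausible cause, but that is an assumption). Dropping 'utf-8' works in that case, but on a setup where the attribute is still bytes, plain str(x) would produce the text b'...' instead of the ID. A helper that accepts both avoids the problem; the name to_text is made up for illustration.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str whether it arrives as bytes or as str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Sample ID taken from the file name in the error message above.
read_id = "2d253b4e-3df3-4f1f-8a73-0be882763c81"

print(to_text(read_id.encode()))  # bytes input
print(to_text(read_id))           # str input
print(str(read_id.encode()))      # plain str() on bytes keeps the b'...' wrapper
```

Applying the same helper to the other str(..., 'utf-8') calls in the file would keep them working with either attribute type.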


python /camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py
EEEE

======================================================================
ERROR: test_Detection (__main__.DetectionTest)

======================================================================
ERROR: test_Interpolation (__main__.DetectionTest)

======================================================================
ERROR: test_Modification (__main__.DetectionTest)

======================================================================
ERROR: test_Normalization (__main__.DetectionTest)
