giesselmann / strique
Nanopore raw signal repeat detection pipeline
License: MIT License
Hi @giesselmann ,
I'm glad to be running your nice tool STRique, but I have two issues when running it on your plasmid_c9orf72 data and on my own data.
samtools view bam_results/barcode11.bam | python3 strique/0.4.0/scripts/STRique.py count strique_g4c2.fofn strique/0.4.0/models/r9_4_450bps.model strqiueg4c2.repeat_config.tsv > barcode11.strique.tsv
The repeat region I listed in strqiueg4c2.repeat_config.tsv is:
chr begin end name repeat prefix suffix
pCRAmpBE 1111 1129 pCRAmpBEg4c2 GGGGCC GATCCGCTCTTCCGGCC TGCGGCCGCCACCGCGG
samtools view htt_bam_results/barcode07.bam | python3 strique/0.4.0/scripts/STRique.py count htt.fofn strique/0.4.0/models/r9_4_450bps.model htt.repeat_config.tsv > barcode07.strique.tsv
The repeat region I listed in htt.repeat_config.tsv is:
chr begin end name repeat prefix suffix
chr4 3074876 3074933 httCAG CAG AGTCCCTCAAGTCCTTC CAACAGCCGCCACCGCC
Could you please help to address the issues above? Thank you.
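A quick sanity check of the repeat config can rule out formatting problems before running `count`. The sketch below assumes the layout shown above: a tab-separated file with a header row and the seven columns chr, begin, end, name, repeat, prefix, suffix. The function name `check_repeat_config` and the individual checks are illustrative and are not part of STRique.

```python
import csv
import io

EXPECTED = ["chr", "begin", "end", "name", "repeat", "prefix", "suffix"]
DNA = set("ACGT")

def check_repeat_config(handle):
    """Return a list of human-readable problems found in a repeat config."""
    problems = []
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows, None)
    if header != EXPECTED:
        problems.append(f"header is {header}, expected {EXPECTED}")
    for lineno, row in enumerate(rows, start=2):
        if len(row) != len(EXPECTED):
            # Typical symptom of spaces being used instead of tabs.
            problems.append(f"line {lineno}: {len(row)} fields (tabs missing?)")
            continue
        _chrom, begin, end, _name, repeat, prefix, suffix = row
        if not (begin.isdigit() and end.isdigit() and int(begin) < int(end)):
            problems.append(f"line {lineno}: begin/end must be integers with begin < end")
        for label, seq in (("repeat", repeat), ("prefix", prefix), ("suffix", suffix)):
            if not seq or set(seq.upper()) - DNA:
                problems.append(f"line {lineno}: {label} is not a plain ACGT sequence")
    return problems

# The htt entry from this issue, written with explicit tabs.
example = (
    "chr\tbegin\tend\tname\trepeat\tprefix\tsuffix\n"
    "chr4\t3074876\t3074933\thttCAG\tCAG\tAGTCCCTCAAGTCCTTC\tCAACAGCCGCCACCGCC\n"
)
print(check_repeat_config(io.StringIO(example)))
```

An empty list means none of these basic checks failed; it does not guarantee that the prefix and suffix actually flank the repeat in the reference.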
Thanks for the great package. I successfully used it ~6 months ago on a data set that I generated, and it worked beautifully.
I'm back in the lab now generating some new data that I am trying to analyze. The index command works, but the count command does not. I get an error running it on the new data sets, on the old data sets, and with STRique_test.py, and I get the same errors with both installed versions. The outputs are below. This is probably some issue on our end, but I was hoping you might have some insight into what the issue is.
Thanks,
Thomas
The failed command trying to run an old analysis that was previously successful:
`python3 /camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py count --out test.txt --algn TST11465.sam --t 8 reads.fofn /camhpc/pkg/STRique/0.3.0/centos7/models/r9_4_450bps.model TST11465_C9.tsv
24.06.2020 16:45:52 [PID 6980] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 758, in worker
input = worker_callable(**input)
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 684, in detect
self.init_hmm()
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 645, in init_hmm
self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0`
The results from STRique_test.py (the same for both versions)
Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 55, in test_Detection
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0
Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 75, in test_Interpolation
dt.add_target('fmr1', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0
Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 114, in test_Modification
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0
Traceback (most recent call last):
File "/camhpc/pkg/STRique/0.3.0/centos7/bin/STRique_test.py", line 93, in test_Normalization
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 403, in init
self.build_model()
File "/camhpc/pkg/STRique/0.3.0/centos7/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 755, in pomegranate.hmm.HiddenMarkovModel.bake
File "/camhpc/pkg/anaconda3/2019.03/centos7/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 178, in getitem
return self._nodes[n]
KeyError: 0
Ran 4 tests in 0.399s
FAILED (errors=4)
Hey,
This error keeps recurring. Is there something I should be worried about?
Best,
ligia
05.02.2021 15:59:42 [PID 988702] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):
File "STRique.py", line 757, in worker
input = worker_callable(**input)
File "STRique.py", line 683, in detect
self.init_hmm()
File "STRique.py", line 644, in init_hmm
self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
File "STRique.py", line 579, in add_target
raise ValueError("RepeatCounter: Target with name " + str(target_name) + " already defined.")
ValueError: RepeatCounter: Target with name 21.13723577 already defined.
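The message says that `add_target` was called with a name that was already registered. One possible cause (an assumption, not something confirmed in this thread) is that the name column of the repeat config is not unique. A minimal sketch to list repeated names, assuming a tab-separated config with the name in the fourth column; `duplicate_target_names` is a hypothetical helper, not a STRique function:

```python
from collections import Counter

def duplicate_target_names(lines):
    """Return names (4th column) that occur more than once in a repeat config."""
    names = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short or empty lines and the header row.
        if len(fields) < 4 or fields[0] == "chr":
            continue
        names.append(fields[3])
    return sorted(name for name, n in Counter(names).items() if n > 1)

# Made-up rows for illustration only.
demo = [
    "chr\tbegin\tend\tname\trepeat\tprefix\tsuffix\n",
    "chr1\t100\t130\tlocusA\tCAG\tACGTACGT\tTTGACCAA\n",
    "chr2\t500\t560\tlocusA\tCAG\tGGATCCAA\tCCATGGTT\n",
    "chr3\t900\t930\tlocusB\tGGGGCC\tACCTGACC\tTTGGAACC\n",
]
print(duplicate_target_names(demo))
```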
I have already used the index and count commands from STRique and now I want to plot the results.
However, when I run
cat D144018.striqueFilter.tsv | python3 /app/scripts/STRique.py plot --output plotFilterD144018 index_D14418.fofn
I get:
Unrecognized command
usage: STRique.py <command> [<args>]
Available commands are:
index Index batch(es) of bulk-fast5 or tar archived single fast5
count Count single read repeat expansions
STRique: a nanopore raw signal repeat detection pipeline
positional arguments:
command Subcommand to run
optional arguments:
-h, --help show this help message and exit
I am running STRique in a Docker container (Docker version 17.04.0-ce), and the Docker setup followed the official documentation.
Hello,
Would it be possible to provide some documentation or usage notes for the fast5masker.py script? I am attempting to mask some data, similarly to your Nanopolish step, and then run the raw signal through Megalodon.
Thanks
Hello,
I get an error reading fast5 files when trying to run your tool. Any ideas? We are using MinIT and the latest version of MinKNOW (19). I think these are multi-read fast5 files. Is this a problem?
Thanks
Matt
Hello! I am having problems creating the repeat_config.tsv file.
I still don't understand how you defined the prefix and suffix sequences. I tried to align the sequences you provide in the repeat_config.tsv file against the read in the c9orf72.sam file, without good results. Furthermore, if I count the number of GGCCCC repeats manually, I get a different number than the one reported by the tool.
I ask all this because I get an error that is probably caused by the repeat_config.tsv file that I built. The file you provide works fine with my SAM file.
Thanks for your help
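For the manual count mentioned above, a small script can be less error-prone than counting by eye. The sketch below reports the longest run of exact, back-to-back copies of a motif in a basecalled read, checking the reverse complement as well (GGCCCC is the reverse complement of GGGGCC). The helper names are illustrative. Because it only accepts exact copies, basecalling errors inside the repeat will break a run, so this is expected to differ from a count derived from the raw signal.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an ACGT sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def longest_tandem_run(read, motif):
    """Longest run of back-to-back exact copies of motif on either strand."""
    best = 0
    read = read.upper()
    for unit in (motif.upper(), reverse_complement(motif)):
        # Each regex match is one maximal uninterrupted stretch of the unit.
        for match in re.finditer(f"(?:{re.escape(unit)})+", read):
            best = max(best, len(match.group(0)) // len(unit))
    return best

# Synthetic read: three copies of GGCCCC between short flanks.
read = "TT" + "GGCCCC" * 3 + "AA"
print(longest_tandem_run(read, "GGGGCC"))
```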
Hello Pay,
Just recently our group managed to sequence multiple plasmids containing 50x STRs made of trinucleotides. Despite initial problems with the config file, we managed to analyse our dataset with STRique running on Docker. The results look quite good: the overall output showed an acceptable range of deviation when visualized as a box-and-whisker plot, and the data looked very good after alignment and visualization in IGV. However, the same data plotted as a bar chart do not look as good as we initially thought. Question 1: is that something you would expect, or did we make a mistake during the analysis? The very large amount of data generated for the plasmid samples will allow us to filter the data before and after analysis, e.g. by removing extreme outliers or by filtering on prefix and suffix scores.
Nonetheless, the newest dataset, generated from native DNA, seems to fail completely when processed with STRique, i.e. there are zero reads in the final output. Despite a substantial number of reads (>400k, Cas9-enriched) we cannot produce any meaningful output with STRique. Question 2: what would you suggest to troubleshoot this? We could shorten both prefix and suffix from 150 bp down to 20-30 nt; however, judging from the alignment results (SAM, minimap2), this will definitely fail again, as >95% of the reads are missing the 5' and 3' flanking regions and our gene of interest is heavily truncated in >99% of reads. I know that STRique can identify methylation patterns on gDNA. We have not tried that yet, but the reads in FASTA format seem to contain an extreme number of errors. Question 3: do you think we may be sequencing highly modified gDNA that cannot be accurately basecalled or processed with the STRique algorithm? Have you observed something similar, or heard of such issues from other groups?
Kind regards
Simon
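For the removal of extreme outliers mentioned in Question 1, one common convention (the same one that box-and-whisker plots usually use) is to drop values more than 1.5 interquartile ranges outside the quartiles. This is a generic sketch, not a STRique feature; the factor `k=1.5` and the function name are assumptions, and `statistics.quantiles` requires Python 3.8 or newer.

```python
import statistics

def drop_outliers(counts, k=1.5):
    """Keep values within k interquartile ranges of the first and third quartile."""
    q1, _median, q3 = statistics.quantiles(counts, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [c for c in counts if low <= c <= high]

# Made-up repeat counts for a plasmid with about 50 repeats.
counts = [48, 49, 50, 50, 51, 52, 300]
print(drop_outliers(counts))
```

A bar chart of the filtered counts should then show the main peak without the long tail; whether the tail is noise or real biology still has to be judged from the data.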
Hi STRique developers,
We gave the wrong repeat unit for the repeat region, yet the program still output quite a few results. Is this expected? For example, the expected repeat unit is CCGG but we specified ACTG, and the program still reported repeat counts. Are there criteria other than prefix_score and suffix_score that we can use to filter such results out?
Thanks.
George
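Filtering on the two flank scores can be done directly on the `count` output. The sketch below assumes the output is tab-separated with the header seen elsewhere on this page (ID, target, strand, count, score_prefix, score_suffix, log_p, offset, ticks) and that higher scores indicate a better flank match. The cut-off of 4.0 is a placeholder, not a recommended value, and the demo rows are invented.

```python
import csv
import io

def filter_by_flank_scores(handle, min_prefix, min_suffix):
    """Yield rows whose prefix and suffix scores both reach the cut-offs."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if (float(row["score_prefix"]) >= min_prefix
                and float(row["score_suffix"]) >= min_suffix):
            yield row

# Made-up rows for illustration only; real values come from `STRique.py count`.
demo = io.StringIO(
    "ID\ttarget\tstrand\tcount\tscore_prefix\tscore_suffix\tlog_p\toffset\tticks\n"
    "read_a\tc9orf72\t+\t12\t5.2\t5.0\t-100.0\t10\t500\n"
    "read_b\tc9orf72\t-\t300\t1.1\t4.9\t-900.0\t20\t9000\n"
)
kept = list(filter_by_flank_scores(demo, min_prefix=4.0, min_suffix=4.0))
print([row["ID"] for row in kept])
```

A reasonable cut-off is probably best chosen by looking at the score distribution of the reads in question rather than fixed in advance.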
Hi, I was trying to install STRique on my cluster and ran into some problems that I hope can be solved here.
I went through all the steps in the installation document provided on Read the Docs:
https://strique.readthedocs.io/en/latest/installation/src/
The installation went through without any error messages, and I got the message that it had finished processing all the dependencies of STRique.
Then I moved on to the test page to test the installation. I followed the steps on this page:
https://strique.readthedocs.io/en/latest/installation/test/
I ran into problems on the second line, namely python3 scripts/STRique_test.py.
The error I get is the following:
Traceback (most recent call last):
File "scripts/STRique_test.py", line 39, in <module>
import STRique
File "/.mounts/labs/simpsonlab/users/schaudhary/projects/2020.11.STRDetection/STRique/scripts/STRique.py", line 49, in <module>
from STRique_lib import fast5Index, pyseqan
ImportError: cannot import name 'pyseqan' from 'STRique_lib' (/path/to/directory/STRique/STRique_lib/__init__.py)
Would you happen to know how to get past this?
Thank you.
Hi @giesselmann ,
I think your tool is very interesting and I would like to use it.
I tried to use the Docker version of the tool, but I ran into some errors that I did not completely understand.
My data are as follows:
fast5_pass: directory containing 484 .fast5 files produced by MinKNOW
my_sample.bam: reads aligned with minimap2
my_config.tsv: tsv file with my own regions of interest
I am running it on a PC with Windows 10.
I first started the Docker version with the command:
docker run -it --mount type=bind,source=$(pwd),target=/host/users/lenovo/desktop giesselmann/strique
Then I did the indexing:
python3 app/scripts/STRique.py index --recursive host/users/lenovo/desktop/my_sample/fast5_pass > host/users/lenovo/desktop/my_sample/fast5_pass/reads.fofn
When I ran the counting step:
cat host/users/lenovo/desktop/my_sample/my_sample.bam | python3 app/scripts/STRique.py count host/users/lenovo/desktop/my_sample/fast5_pass/reads.fofn app/models/r9_4_450bps.model host/users/lenovo/desktop/my_config.tsv > host/users/lenovo/desktop/my_sample/result.tsv
I got the following error many times:
[PID 61] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/STRique-0.4.2-py3.6-linux-x86_64.egg/STRique_lib/fast5Index.py", line 81, in get_raw
signal = fp[os.path.join(offset, 'Raw', s)][()]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/dataset.py", line 787, in getitem
self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 192, in h5py.h5d.DatasetID.read
File "h5py/_proxy.pyx", line 112, in h5py._proxy.dset_rw
OSError: Can't read data (can't open directory: /usr/local/hdf5/lib/plugin)
Followed by:
Could not retrieve bc752507-c038-4be3-bf31-93983f4a7ad6 from file host/users/lenovo/desktop/my_sample/fast5_pass/FAO49405_pass_c7cb835e_326.fast5.
And, after them, a lot of:
10.11.2022 19:47:25 [PID 90] [ERROR] Detector: Error parsing alignment
b734757f-310d-4599-aa25-f8fae95a25c1 4 * 0 0 * * 0 0 TGCCTTCTAGTTTCAGTTACATCCATGCTCTATCTTCTGCTGGGATTACGGCATGACACACTTAAACATTTTCTTTATTTTTAATATGTTTCTTTCTTCTTCTTCTTCTTCTTTTTTTTTTTTTTTTTTTGTATTTTTAGTAGATATGGGTTTCACCATGTTGGCCAGGATAGTCTTGAACTCCTGACCTCAGGTGATCCACCTGCCTAGGCCTCCCAAAGTGCTGAGATTATGGGCGTGAGCTACCGCGTCCTGCCAGGAAATCCATTTTCTAAGTCTAACTTTTAAGCACTGTACCTTAATCCCTGAAGC ''('&(/)&%%'','&'++,&%$$$$%&31/0...+))3821.--(%$&*+&'((%%%%)4:9;887775,(&&%%&''(%$%,,3576---.1225570-4,022369<@ADECDB;?55:?CGCG20@>6)&$%')-3>D0///48;?5+,--0.''%&((8@?>52./+()((*,18<;<>AE>8445)'&&(+.../@A<95542352/-0778><1((()+++,54((((55:==4348846617:;=<:989/4.-,+,5))7>?7@@?><<=;54)'''''&&'**()('&$$ rl:i:141
How can I deal with such errors?
Thank you very much in advance!
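The record in the "Error parsing alignment" message has FLAG 4, i.e. the read is unmapped and so cannot be assigned to a target region. One option, assuming these messages are only noise from unmapped reads, is to drop unmapped, secondary and supplementary records before piping into `count`. The sketch below does this on SAM text lines; it has the same effect as the `samtools view -F 2308` call used in another issue on this page (2308 = 4 + 256 + 2048). The function name is illustrative.

```python
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800

def keep_primary_mapped(lines, exclude=UNMAPPED | SECONDARY | SUPPLEMENTARY):
    """Yield SAM alignment lines whose FLAG has none of the excluded bits set."""
    for line in lines:
        if line.startswith("@"):
            continue  # header lines are dropped, as with `samtools view`
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        if (int(fields[1]) & exclude) == 0:
            yield line

# Made-up records: mapped primary, unmapped, and supplementary.
demo = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr4\t3074800\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "read3\t2064\tchr4\t3074800\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]
print([line.split("\t")[0] for line in keep_primary_mapped(demo)])
```

This only addresses the alignment messages; the HDF5 plugin error earlier in the same log is a separate problem.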
Hi,
Thanks for a tool that I think could be very useful for me, although I can't quite get it to work for my data.
I get "Not target found for.." for every read in my data set, although I can see in the genome browser that a large number of reads map to my targeted repeat expansion. I use a 150 bp prefix and suffix with a CAG repeat. Any idea what could be going on here?
Thanks.
Hi, thanks again for the tool. "Error parsing alignment" appears, followed by unaligned reads such as the following:
936e3a5e-548e-482f-ae61-47e9ba16c2b0 4 * 0 0 ** 0 0 ......
Will such errors affect the results? I ask because the estimated repeat number is much smaller than the real value: although the real value is around 40 repeats, the result shows 4 for all fast5 files. This is only the case when I set the flanking regions in the config file to 50 bp. When I set the flanking regions to 250 bp, as recommended in another issue, all results show 0. The reads were generated by sequencing after PCR.
Hi,
I am trying to quantify repeat number in a large insertion of unknown (potentially varying) size. The alignment is very poor because this insert is not in the reference. When calling repeat number with STRique I am getting a lot of reads that have counts of 0 but when I look at the fast5 there is definitely a repeat present. Could this be a result of the poor mapping?
Hi, I am using the docker version of STRique as depicted in the documentation:
docker run -it --mount type=bind,source=$(pwd),target=/host giesselmann/strique
I am creating the fast5 index with:
python3 /app/scripts/STRique.py index /host/fast5 > fast5/reads.fofn
And then, I am executing the count command as follows:
python3 /app/scripts/STRique.py count --t 8 /host/fast5/reads.fofn /app/models/r9_4_450bps.model strique_input.tsv > strique_output.tsv
However, I stopped it after 24 hours: it had shown no log output and had written nothing to strique_output.tsv apart from a header. I checked the CPU activity while STRique was running, but did not see much activity.
Am I doing something wrong?
Kind regards,
Francisco Abad.
Hi, there,
We recently upgraded to STRique 0.4.2; but when we tried to run the program, we had the following weird error:
03.01.2022 14:00:58 [PID 46032] [WARNING] Factory: Unexpected error in Worker, proceeding wiht remaining reads.
Traceback (most recent call last):
File "/tools/STRique/0.4.2/scripts/STRique.py", line 758, in worker
input = worker_callable(**input)
File "/tools/STRique/0.4.2/scripts/STRique.py", line 684, in detect
self.__init_hmm__()
File "/tools/STRique/0.4.2/scripts/STRique.py", line 645, in init_hmm
self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
File "/tools/STRique/0.4.2/scripts/STRique.py", line 567, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/tools/STRique/0.4.2/scripts/STRique.py", line 403, in init
self.__build_model__()
File "/tools/STRique/0.4.2/scripts/STRique.py", line 432, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 869, in pomegranate.hmm.HiddenMarkovModel.bake
File "/tools/STRique/0.4.2/lib/python3.8/site-packages/networkx/classes/reportviews.py", line 729, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration
Hello,
We are using your tool to investigate a repeat expansion with the motif AAAAT. We assumed that there might be a mutation, so that it could also have the motif GAAAT. Looking at the alignment, we saw that only 3% of the reads have GAAAT. When we counted the repeats with STRique using GAAAT and AAAAT as the motif in two separate config files, the results were the same. Even the number of evaluated reads with GAAAT as the motif was the same, although only a small fraction of reads should contain this motif. Is it possible that STRique doesn't distinguish between AAAAT and GAAAT?
Thank you for your help!
Best wishes,
Theresa
Hi,
I've noticed that with things like plasmid sequencing you sometimes get concatemers. When this happens, STRique often predicts massive repeats that span multiple copies of the plasmid, from the left flank in one copy to the right flank in another. I've attached an example: two plasmids concatenated, where the repeat is around 100 bp but is predicted as 941.
Any ideas on a fix? I thought about some kind of pre-processing to split the concatemers but the tools are lacking to do this (at least on the fast5 level)
Thanks!
Hi,
I am trying to run the test script:
cat data/c9orf72.sam | python3 scripts/STRique.py ./data ./models/template_median68pA6mer.model ./configs/repeat_config.tsv --config ./configs/STRique.json
/usr/lib/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
ID target strand count score_prefix score_suffix log_p offset ticks
Traceback (most recent call last):
File "scripts/STRique.py", line 765, in
ow.write_line(**counts)
TypeError: write_line() argument after ** must be a mapping, not NoneType
Traceback (most recent call last):
File "scripts/STRique.py", line 341, in free_bake_buffers
NameError: name 'super' is not defined
Exception ignored in: 'pomegranate.hmm.HiddenMarkovModel.dealloc'
Traceback (most recent call last):
File "scripts/STRique.py", line 341, in free_bake_buffers
NameError: name 'super' is not defined
Segmentation fault (core dumped)
I couldn't find definitions of the fields in the output.
ID, target, strand, seem obvious, but I'd like to know the definitions of the rest:
count (repeats, or bases?), score_prefix, score_suffix, log_p, offset, ticks, mod.
Thanks in advance.
Chris
We are working on repeat identification of (AAAAG)n/(AAGGG)n motifs in RFC1 (the region contains complex patterns of both repeats).
We first tried the c9orf72 example data available in the Docker container to understand the pipeline workflow.
We were able to reproduce the plot shown in the paper (we got a repeat count of 733). However, we could not understand how the number of repeats is calculated, as we were not able to find the GGGGCC / CCCCGG tandem repeats in the read from the SAM file.
We then tried our own data; the read was counted as 44 repeats by STRique (plot attached), but we expected more. We also could not understand the odd pattern in the repeat region.
And what do the smaller-amplitude patterns denote?
Hi, I just installed the tool STRique and I ran the test command using 'python3 scripts/STRique_test.py' and I received the following errors. Thank you.
Traceback (most recent call last):
File "scripts/STRique_test.py", line 55, in test_Detection
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration
Traceback (most recent call last):
File "scripts/STRique_test.py", line 75, in test_Interpolation
dt.add_target('fmr1', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration
Traceback (most recent call last):
File "scripts/STRique_test.py", line 114, in test_Modification
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration
Traceback (most recent call last):
File "scripts/STRique_test.py", line 93, in test_Normalization
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 566, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config),
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 402, in init
self.build_model()
File "/home/ken/Kens/softwares.3/STRique/STRique/scripts/STRique.py", line 431, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 795, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/ken/Kens/softwares.3/STRique/v1/lib64/python3.8/site-packages/networkx/classes/reportviews.py", line 718, in
for nbr, dd in nbrs.items()
RuntimeError: dictionary keys changed during iteration
Ran 4 tests in 0.239s
FAILED (errors=4)
Hi,
I get the following error:
ModuleNotFoundError: No module named 'ont_fast5_api'
I installed STRique according to the installation guide for source installation, and things seemed to work just fine, except that I had to install pomegranate with pip because requirements.txt failed for that one.
It seems that I do have a submodule named 'ont_fast5_api', since I have the following:
~/src/STRique/submodules/ont_fast5_api/ont_fast5_api
Can you help me out?
I also installed STRique with udocker in a different environment, but the creation of fofn files seems never-ending.
Hi, I got this error message when I tried the two test commands.
python scripts/STRique_test.py
EEE
ERROR: test_Detection (main.DetectionTest)
Traceback (most recent call last):
File "scripts/STRique_test.py", line 55, in test_Detection
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 338, in init
self.build_model()
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration
ERROR: test_Interpolation (main.DetectionTest)
Traceback (most recent call last):
File "scripts/STRique_test.py", line 75, in test_Interpolation
dt.add_target('fmr1', repeat, prefix, suffix)
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 338, in init
self.build_model()
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration
ERROR: test_normalization (main.DetectionTest)
Traceback (most recent call last):
File "scripts/STRique_test.py", line 92, in test_normalization
dt.add_target('c9orf72', repeat, prefix, suffix)
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 338, in init
self.build_model()
File "/home/satomi/nanoSTRique/STRique/scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration
Ran 3 tests in 0.365s
FAILED (errors=3)
cat data/c9orf72.sam | python3 scripts/STRique.py ./data/ ./models/template_median68pA6mer.model ./configs/repeat_config.tsv
ID target strand count score_prefix score_suffix log_p offset ticks
Traceback (most recent call last):
File "scripts/STRique.py", line 764, in
counts = rd.detect(line)
File "scripts/STRique.py", line 551, in detect
self.init_hmm()
File "scripts/STRique.py", line 512, in init_hmm
self.repeatCounter.add_target(target_name, repeat, prefix, suffix)
File "scripts/STRique.py", line 439, in add_target
flankedRepeatHMM(repeat, prefix, suffix, self.pm, self.HMM_config) )
File "scripts/STRique.py", line 338, in init
self.build_model()
File "scripts/STRique.py", line 366, in build_model
self.bake(merge='All')
File "pomegranate/hmm.pyx", line 826, in pomegranate.hmm.HiddenMarkovModel.bake
File "/home/satomi/anaconda3/lib/python3.7/site-packages/networkx/classes/reportviews.py", line 666, in
return (self._report(n, nbr, dd) for n, nbrs in self._nodes_nbrs()
RuntimeError: dictionary changed size during iteration
I think I was using:
Python 3.7.0
networkx 2.1
numpy 1.15.1
pomegranate 0.11.0
The same thing occurred when I tested with Python 3.5.0.
I would be grateful if you could let me know what went wrong.
Best,
Satomi
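The tracebacks above all fail inside pomegranate's `bake`, within networkx code, which suggests (an assumption, not a confirmed diagnosis) a mismatch between the installed pomegranate and networkx versions. When reporting such errors it helps to list the versions exactly as Python sees them. A minimal sketch using the standard library; the package list is illustrative, and `importlib.metadata` requires Python 3.8 or newer:

```python
from importlib import metadata

def installed_versions(packages=("networkx", "pomegranate", "numpy", "h5py")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name:12s} {version or 'not installed'}")
```

Comparing this list with the versions pinned in the repository's requirements.txt is a reasonable first step.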
Hi Giesselmann,
I'm currently trying to get the new plot option to work. I managed to run the command without any error messages, but got no output file. Could you maybe help or see what might be wrong?
I used this command: cat /work/sdularsen/nanopore/04284circulomic_ligationkit_181119/04284circulomic_ligationkit_181119/04284circulomic/20191118_1413_1-E7-H7_PAE17010_81d9e4ee/fastq_pass/04284circulomic_ligationkit_181119_res.test.txt | python3 /app/scripts/STRique.py plot /work/sdularsen/nanopore/04284circulomic_ligationkit_181119/04284circulomic_ligationkit_181119/04284circulomic/20191118_1413_1-E7-H7_PAE17010_81d9e4ee/fast5_pass/reads.fofn
I am having some difficulty with the installation. Following the instructions for downloading via the command line (I'm using a Mac), I get a series of deprecation warnings. For example:
'PyThread_create_key' is deprecated [-Wdeprecated-declarations]
(See this file for the entire output of the attempted installation: STRique_download_output.txt)
When I try to run the test, it seems to work at first, but then stops (see this file for the output: STRique_test_output.txt). Do you have any ideas what the issue could be?
Thanks!
Dear STRique developers,
I tried out your software, but got empty output apart from the header.
I created a config file with the sequences of interest and mapped my MinION reads to human reference genome hg38 with minimap2.
This is the command I used for running STRique:
STRIQUE_ROOT=/home/simone/MinION/software/STRique
STRIQUE="python3 /home/simone/MinION/software/STRique/scripts/STRique.py"
REPEAT=/home/simone/MinION/Cas9/repeat_config.tsv
$STRIQUE index --recursive $FAST5 > $INDEX
samtools view $BAM | $STRIQUE count $INDEX $STRIQUE_ROOT/models/template_median68pA6mer.model $REPEAT --t 30 > STRique_output.txt
And this is the output I got:
ID target strand count score_prefix score_suffix log_p offset ticks
The target region I am interested in has a coverage depth of about 13X. Is that enough?
Is STRique able to differentiate between the two haplotypes?
All tests I ran with python3 scripts/STRique_test.py passed.
When looking in IGV at the reads mapped to the target region, I can roughly count the number of triplets.
Do you have any suggestions?
Thanks in advance
Newer versions of MinKNOW now compress fast5 files with VBZ compression, which causes STRique.py count to fail.
Nanopore have provided an updated HDF5 plugin to handle VBZ, but I'm struggling to get it to work with STRique:
https://github.com/nanoporetech/vbz_compression
I've successfully used the tool they provide in the fast5 api (compress_fast5
) to convert back to gzip which fixes the issue so it's definitely the switch to VBZ compression.
Any ideas (apart from changing all fast5 to gzip)?
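For reference, the gzip workaround described above can be scripted. The following is only a minimal sketch: the directory names are hypothetical, and the compress_fast5 flags (-i, -s, -c, -t, --recursive) are written from memory of the ont_fast5_api documentation, so they should be checked against compress_fast5 --help before use.

```python
import shlex


def build_compress_cmd(input_dir, output_dir, compression="gzip", threads=4):
    """Build the argument list for a compress_fast5 call.

    Assumption: flag names follow the ont_fast5_api README; verify them
    with `compress_fast5 --help` on the installed version.
    """
    return [
        "compress_fast5",
        "-i", input_dir,      # directory holding the VBZ-compressed fast5 files
        "-s", output_dir,     # directory for the re-compressed copies
        "-c", compression,    # target compression, gzip in this workaround
        "-t", str(threads),   # number of worker threads
        "--recursive",        # also descend into sub-directories
    ]


# Hypothetical paths, for illustration only.
cmd = build_compress_cmd("fast5_vbz", "fast5_gzip")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The converted copies can then be indexed with STRique.py index in place of the original files.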
Dear authors,
I'm using STRique for repeat quantification. I found that the program never stops.
When I checked CPU usage with the Ubuntu system monitor, it showed one CPU working at 100%.
I have run into this problem several times. Please give me some advice.
Here is the command I used:
#!/bin/bash
#conda activate strique
STRIQUE_PATH="/media/amax/disk1/shared/tools/STRique"
## fofn file
FAST5_DIR=$1
FOFN_PATH=$2
BAM_FILE=$3
OUTPUT=$4
python3 $STRIQUE_PATH/scripts/STRique.py index --recursive $FAST5_DIR --out_prefix $FAST5_DIR > $FOFN_PATH
## repeat quantification
samtools view -F 2308 $BAM_FILE | python3 $STRIQUE_PATH/scripts/STRique.py count --t 12 $FOFN_PATH $STRIQUE_PATH/models/r9_4_450bps.model strique.config > $OUTPUT
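As a side note on the command above, the -F 2308 filter can be decoded with a short sketch. The flag table follows the SAM specification; the function and dictionary names are made up for illustration.

```python
# SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all SAM flag bits set in `flag`."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(2308))  # ['unmapped', 'secondary', 'supplementary']
```

So samtools view -F 2308 drops unmapped, secondary, and supplementary records, leaving only primary alignments to be passed to STRique.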
Hello, I am running STRique on a nanopore experiment with 20000 reads on a workstation with 64 threads, and the runtime is several hours. The STRique GitHub page says the runtime should be a few minutes.
I was wondering, is this because you are running STRique on experiments with just a few hundred reads that you got from a Cas9/Cas12 enrichment experiment? Our experiment used PCR enrichment, so we have several thousand reads.
I found in an earlier issue that R10.3 would not work with the current version of STRique:
#6 (comment)
I have a couple of questions and would appreciate your feedback:
AAAAAA 87.31411831337803 0.7271229290351257 2556
AAAAAC 83.7620420260019 1.0166215079284922 3802
AAAAAG 84.87997176980885 0.6816026090898406 1660
What do the four columns refer to?
Thanks for the tool!
I understand that r9_4 and 450bps both refer to nanopore parameters. What about mCpG? Also, where can I obtain .model files for other nanopore chemistries? For example, we have some experiments planned for R10.
Thanks!
Is there a paper describing the general idea of the algorithm?
Hi there,
I followed the instructions to install STRique in a separate virtual environment and verified that all tests pass.
I then ran the index command as follows:
python3 $script index --recursive --out_prefix ${output_folder} ${input_folder}/ > ${output_folder}/${myseq}.fofn
I got this error message as follows:
[ERROR] Failed to open /scratch/stimulated_test/C9ORF72_c1_deep_simulator_read_5x/fast5/signal_23610_2d253b4e-3df3-4f1f-8a73-0be882763c81.fast5, skip file for indexing
Traceback (most recent call last):
  File "/home/bin/STRique/scripts/STRique.py", line 1030, in <module>
    main()
  File "/home/bin/STRique/scripts/STRique.py", line 890, in __init__
    getattr(self, args.command)(sys.argv[2:])
  File "/home/bin/STRique/scripts/STRique.py", line 899, in index
    for record in fast5Index.fast5Index.index(args.input, recursive=args.recursive, output_prefix=args.out_prefix, tmp_prefix=args.tmp_prefix):
  File "/home/venv/STR/lib/python3.8/site-packages/STRique-0.4.2-py3.8-linux-x86_64.egg/STRique_lib/fast5Index.py", line 175, in index
    ID = fast5Index.get_ID_single(input_file)
  File "/home/venv/STR/lib/python3.8/site-packages/STRique-0.4.2-py3.8-linux-x86_64.egg/STRique_lib/fast5Index.py", line 65, in get_ID_single
    return str(f5["/Raw/" + s.rpartition('/')[0]].attrs['read_id'], 'utf-8')
TypeError: decoding str is not supported
I therefore removed 'utf-8' on line 65 of fast5Index.py, as follows:
return str(f5["/Raw/" + s.rpartition('/')[0]].attrs['read_id'])
This fixed the problem.
Do you have any suggestions or comments on this? Do I need to remove the other 'utf-8' arguments in this file?
Thank you!
Best Regards,
Hsin
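One way to make such a line work whether the attribute comes back as bytes or as str is sketched below. This is not the STRique implementation: decode_read_id is a hypothetical helper, and the assumption is that the error comes from the installed h5py version returning read_id as str where older versions returned bytes.

```python
def decode_read_id(value):
    """Return a read ID as str, whether it arrives as bytes or as str.

    Hypothetical helper for illustration; not part of STRique.
    """
    if isinstance(value, bytes):
        # Byte strings need an explicit decode; str(value) alone
        # would produce the literal text "b'...'".
        return value.decode("utf-8")
    return str(value)


# Both forms yield the same plain string.
print(decode_read_id(b"2d253b4e-3df3-4f1f-8a73-0be882763c81"))
print(decode_read_id("2d253b4e-3df3-4f1f-8a73-0be882763c81"))
```

Simply dropping 'utf-8' works when the value is already a str, but would return "b'...'" if the same code later receives bytes, which is why the sketch checks the type first.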