beatsonlab-microbialgenomics / micropipe
A pipeline for high-quality bacterial genome construction using ONT sequencing
License: GNU General Public License v3.0
Dear micropipe team,
this looks like a wonderful tool for me. Sorry, this might be a newbie Nextflow question. I am running the micropipe sample data like this:
nextflow main.nf --samplesheet test_data/samples_1.csv --outdir micropipetest/ --gpu
I receive this error: NOTE: Process assembly:porechop (S24) terminated with an error exit status (255) -- Error is ignored
Where can I find log information about what went wrong?
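On the question of where the logs are: Nextflow keeps each task's script and logs in its own work directory, and the bracketed hash in the progress output (for example 94/71c68c) is the beginning of that directory's path under the work folder. Below is a minimal Python sketch for locating it, assuming the default `work` directory in the launch folder. The function names are illustrative and are not part of Nextflow or micropipe.

```python
from pathlib import Path

# Files Nextflow writes into a task's work directory.
TASK_FILES = (".command.sh", ".command.log", ".command.err",
              ".command.out", ".exitcode")


def find_task_dirs(work_root, short_hash):
    """Return work directories matching a short task hash such as '94/71c68c'."""
    prefix, _, rest = short_hash.partition("/")
    parent = Path(work_root) / prefix
    if not parent.is_dir():
        return []
    return sorted(d for d in parent.iterdir()
                  if d.is_dir() and d.name.startswith(rest))


def existing_task_files(task_dir):
    """List which of the usual task files are present in a work directory."""
    return [name for name in TASK_FILES if (Path(task_dir) / name).is_file()]


if __name__ == "__main__":
    # "work" is the default location; adjust it if the run used a custom work dir.
    for task_dir in find_task_dirs("work", "94/71c68c"):
        print(task_dir)
        for name in existing_task_files(task_dir):
            print("  ", name)
```

For a failed task, `.command.err` and `.command.log` in that directory are usually the first files to read, and `.command.sh` shows the exact command that was run.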
System information
Further output of nextflow
WARN: DSL 2 PREVIEW MODE IS DEPRECATED - USE THE STABLE VERSION INSTEAD -- Read more at https://www.nextflow.io/docs/latest/dsl2.html#dsl2-migration-notes
executor > local (1)
[94/71c68c] process > assembly:porechop (S24) [100%] 1 of 1, failed: 1 ?
[- ] process > assembly:japsa -
[- ] process > assembly:flye -
[- ] process > assembly:racon_cpu -
[- ] process > assembly:medaka_cpu -
[- ] process > assembly:nextpolish -
[- ] process > assembly:fixstart -
[- ] process > assembly:quast -
[barcode01, /home/software/micropipe/test_data/S24EC_1P_test.fastq.gz, /home/software/micropipe/test_data/S24EC_2P_test.fastq.gz]
[barcode01, /home/software/micropipe/test_data/barcode01.fastq.gz, S24, 5.5m]
Hi,
_nextpolish.log says
INFO: Converting SIF file to temporary sandbox...
[INFO] 2022-03-01 02:45:24,637 start...
[INFO] 2022-03-01 02:45:24,637 logfile: pid2111778.log.info
[WARNING] 2022-03-01 02:45:24,637 Re-write workdir
[INFO] 2022-03-01 02:45:24,645 scheduled tasks:
[1, 2, 1, 2]
[INFO] 2022-03-01 02:45:24,645 options:
[INFO] 2022-03-01 02:45:24,645 {'polish_options': ' -p 40', 'rewrite': 1, 'job_prefix': 'nextPolish', 'job_type': 'local', 'cluster_options': '', 'snp_valid': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.snp_valid', 'kmer_count': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.kmer_count', 'sgs_max_depth': '100', 'align_threads': '40', 'sgs_block_size': 91759816L, 'lgs_max_read_len': '150k', 'parallel_jobs': '6', 'multithread_jobs': '40', 'snp_phase': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.snp_phase', 'genome': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/consensus.fasta', 'genome_size': 5505589L, 'workdir': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204', 'cleantmp': 0, 'sgs_align_options': 'bwa mem -p -t 40', 'sgs_unpaired': '0', 'sgs_fofn': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn', 'lgs_polish': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.lgs_polish', 'sgs_use_duplicate_reads': 0, 'score_chain': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.score_chain', 'task': [1, 2, 1, 2], 'lgs_max_depth': '60', 'lgs_block_size': '500M', 'lgs_minimap2_options': '-x map-ont', 'rerun': 3, 'lgs_min_read_len': '1k'}
[INFO] 2022-03-01 02:45:24,645 step 0 and task 1 start:
[INFO] 2022-03-01 02:45:24,646 analysis tasks done
[INFO] 2022-03-01 02:45:24,647 total jobs: 3
[INFO] 2022-03-01 02:45:24,648 Throw jobID:[2111788] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh] in the local_cycle.
[INFO] 2022-03-01 02:45:25,149 Throw jobID:[2111845] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split1/nextPolish.sh] in the local_cycle.
[INFO] 2022-03-01 02:45:25,651 Throw jobID:[2112009] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split2/nextPolish.sh] in the local_cycle.
[ERROR] 2022-03-01 02:45:27,799 db_split failed: please check the following logs:
[ERROR] 2022-03-01 02:45:27,799 /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e
cat: '01.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part*.fasta': No such file or directory
cat: '03.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part*.fasta': No such file or directory
/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/.command.sh: line 12: //: Is a directory
And the log /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e says
hostname
cd /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0
cd /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0
time /opt/NextPolish/bin/seq_split -d /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204 -m 91759816 -n 6 -t 40 -i 1 -s 550558900 -p input.sgspart /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn
time /opt/NextPolish/bin/seq_split -d /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204 -m 91759816 -n 6 -t 40 -i 1 -s 550558900 -p input.sgspart /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn
Error! /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/110712RA1944_S13_L001_R1_001.fastq.gz does not exist!Command exited with non-zero status 1
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 4164maxresident)k
0inputs+0outputs (0major+132minor)pagefaults 0swaps
/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91
However, the file 110712RA1944_S13_L001_R1_001.fastq.gz does exist. It's basically a link to the raw data.
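One possible explanation, not confirmed in this thread: the log shows the step running inside a Singularity container ("Converting SIF file to temporary sandbox"), and a symlink whose target lies outside the directories mounted into the container exists on the host but cannot be followed from inside it. The Python sketch below lists the symlinks in a work directory and flags targets that fall outside a set of directories assumed to be visible in the container. Which directories are actually mounted depends on the local Singularity configuration, so `visible_roots` is an assumption the reader has to supply.

```python
import os
from pathlib import Path


def symlink_report(directory, visible_roots):
    """Describe each symlink in `directory`.

    Returns tuples of (link name, resolved target, target exists on host,
    target lies under one of `visible_roots`).
    """
    roots = [Path(os.path.realpath(r)) for r in visible_roots]
    report = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_symlink():
            continue
        target = Path(os.path.realpath(entry))
        visible = any(target == r or r in target.parents for r in roots)
        report.append((entry.name, str(target), target.exists(), visible))
    return report


if __name__ == "__main__":
    # Hypothetical choice: treat only the current directory as mounted.
    for name, target, exists, visible in symlink_report(".", ["."]):
        status = "ok" if (exists and visible) else "CHECK"
        print(f"{status}\t{name} -> {target}")
```

A link reported as existing on the host but not under a visible root would match the symptom above, where the file is present but the tool inside the container reports that it does not exist.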
Thanks
I have been trying to run micropipe, and it always fails at the demultiplexing step with my own data (basecalling always succeeds, and guppy is definitely being located because I made the appropriate changes in the config file). So I decided to run it with the example samplesheet, and that fails from the very start, before any basecalling happens. I get the following error. I ran this to get a better understanding of what a working samplesheet looks like, but it has not helped.
[- ] process > basecalling_demultiplexing_... -
[- ] process > pycoqc -
[- ] process > assembly:porechop -
[- ] process > assembly:japsa -
[- ] process > assembly:flye -
[- ] process > assembly:racon_cpu -
[- ] process > assembly:medaka_cpu -
[- ] process > assembly:nextpolish -
[- ] process > assembly:fixstart -
[- ] process > assembly:quast -
[barcode01, S24, 5.5m]
[barcode01, /scicomp/home-pure/suj7/test_data/S24EC_1P_test.fastq.gz, /scicomp/home-pure/suj7/test_data/S24EC_2P_test.fastq.gz]
No such file: /scicomp/home-pure/suj7/false
I am trying to run micropipe assembly-only. This is my sample sheet:
(base) [suj7@login02 ~]$ head sample0.txt
barcode_id,sample_id,long_fastq,genome_size
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_2.fastq,5m
I have attached an error file here
myerror.txt
I know I used nano-hq instead of the original nano-raw, but it doesn't work any better with nano-raw or nano-corr.
Hi,
I was wondering what a working sample sheet would look like. I have no Illumina files, and I wish to start from the FAST5 input. I added guppy to the config file and basecalling finishes successfully, but the demultiplexing step fails to identify the barcodes every time. I ran guppy_barcoder outside of micropipe and it identified the barcodes, so I suspect something in the sample sheet is causing the issue.
At first I ran the pipeline in assembly-only mode. I assumed that, when starting from assembly, long_fastq should point to the already demultiplexed fastq files, but I keep getting errors. This is the sample sheet I have been using:
barcode_id,sample_id,long_fastq,genome_size
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_2.fastq,5m
barcode17,barcode17,demux_guppy_fastq/barcode17/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode17,barcode17,demux_guppy_fastq/barcode17/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode20,barcode20,demux_guppy_fastq/barcode20/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode27,barcode27,demux_guppy_fastq/barcode27/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode46,barcode46,demux_guppy_fastq/barcode46/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode55,barcode55,demux_guppy_fastq/barcode55/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode58,barcode58,demux_guppy_fastq/barcode58/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode60,barcode60,demux_guppy_fastq/barcode60/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode64,barcode64,demux_guppy_fastq/barcode64/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode67,barcode67,demux_guppy_fastq/barcode67/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode72,barcode72,demux_guppy_fastq/barcode72/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode79,barcode79,demux_guppy_fastq/barcode79/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
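One thing that stands out in this sheet is that several barcodes appear on more than one row, one row per FASTQ chunk written by guppy. Whether micropipe accepts repeated barcode rows is not stated in this thread. Assuming it expects a single long_fastq file per sample, the Python sketch below groups the rows by barcode_id and reports which barcodes would need their chunks concatenated first. The column names come from the sheet above, while the file paths in the example are shortened, hypothetical stand-ins.

```python
import csv
import io
from collections import defaultdict


def group_by_barcode(sheet_text):
    """Map each barcode_id to the list of long_fastq paths given for it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(sheet_text)):
        groups[row["barcode_id"]].append(row["long_fastq"])
    return dict(groups)


def repeated_barcodes(groups):
    """Return only the barcodes that appear on more than one row."""
    return {bc: files for bc, files in groups.items() if len(files) > 1}


if __name__ == "__main__":
    # Hypothetical, shortened paths standing in for the real chunk files.
    example = (
        "barcode_id,sample_id,long_fastq,genome_size\n"
        "barcode13,barcode13,demux/barcode13/chunk_0.fastq,5m\n"
        "barcode13,barcode13,demux/barcode13/chunk_1.fastq,5m\n"
        "barcode20,barcode20,demux/barcode20/chunk_0.fastq,5m\n"
    )
    for barcode, files in repeated_barcodes(group_by_barcode(example)).items():
        print(f"{barcode}: {len(files)} files to merge")
```

If the single-file assumption holds, the chunks for each reported barcode could be concatenated into one FASTQ and the sheet reduced to one row per barcode.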
Dear micropipe team
While micropipe works fine with ONT data alone, we have problems when combining it with Illumina reads. The problem is that the file sample1/4_polishing_short_reads/04RR0090_flye_polishedLR_SR.fasta is empty, so quast throws an error. sample1_flye_polishedLR_SR_fixstart.log says "db_split failed:" and tells me to check /home/software/micropipe/work/ce/cf7d05849bd4f81c067beb16e92367/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e . However, the folder 00.score_chain does not exist.
This is the call
nextflow main.nf --basecalling --demultiplexing --gpu --samplesheet /data/samplesheet.csv --fast5 /data/20210525_0850_MN32008_FAQ18836_9cbcdc36/fast5/ --datadir /home/testmicropipemalle/illumina/ --outdir /home/testmicropipemallei nextflow
No changes regarding nextpolish were made in nextflow.config, so it should use this container: docker://pvstodghill/nextpolish:1.1.0__2020-05-12. Accordingly, nextpolish_version.txt says v.1.1.0.
Installing NextPolish v1.3.1 by hand from its Git page and running its sample data worked fine.
Thanks
I am working with a student who is hitting the following issue when running Nextflow:
executor > local (3)
[30/a60146] process > assembly:porechop (H37Rv.1) [100%] 1 of 1 ✔
[24/79a658] process > assembly:japsa (H37Rv.1) [100%] 1 of 1 ✔
[dd/860979] process > assembly:flye (H37Rv.1) [ 0%] 0 of 1
[- ] process > assembly:racon_cpu -
[- ] process > assembly:medaka_cpu -
[- ] process > assembly:nextpolish -
[- ] process > assembly:fixstart -
[- ] process > assembly:quast -
Error executing process > 'assembly:flye (H37Rv.1)'
Caused by:
Missing output file(s) `assembly.fasta` expected by process `assembly:flye (H37Rv.1)`
Command executed:
set +eu
flye --nano-raw filtered.fastq.gz --genome-size 5.0m --threads 4 --out-dir $PWD --plasmids
flye -v 2> flye_version.txt
Command exit status:
0
Command output:
(empty)
Command error:
WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
[2022-07-22 17:41:29] INFO: Starting Flye 2.5-release
[2022-07-22 17:41:29] INFO: >>>STAGE: configure
[2022-07-22 17:41:29] INFO: Configuring run
[2022-07-22 17:43:47] INFO: Total read length: 5089510998
[2022-07-22 17:43:47] INFO: Input genome size: 5000000
[2022-07-22 17:43:47] INFO: Estimated coverage: 1017
[2022-07-22 17:43:47] WARNING: Expected read coverage is 1017, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2022-07-22 17:43:47] INFO: Reads N50/N90: 9733 / 2679
[2022-07-22 17:43:47] INFO: Minimum overlap set to 3000
[2022-07-22 17:43:47] INFO: Selected k-mer size: 15
[2022-07-22 17:43:47] INFO: >>>STAGE: assembly
[2022-07-22 17:43:47] INFO: Assembling disjointigs
[2022-07-22 17:43:47] INFO: Reading sequences
[2022-07-22 17:45:15] INFO: Generating solid k-mer index
[2022-07-22 17:45:32] INFO: Counting k-mers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-07-22 17:48:26] INFO: Counting k-mers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-07-22 17:54:34] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-07-22 18:05:38] INFO: Extending reads
[2022-07-22 18:24:23] INFO: Overlap-based coverage: 868
[2022-07-22 18:24:23] INFO: Median overlap divergence: 0.0852075
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-07-24 03:32:08] INFO: Assembled 0 disjointigs
[2022-07-24 03:32:08] INFO: Generating sequence
[2022-07-24 03:32:09] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
Work dir:
/projectsp/alland/PanGenome_Project/ReviewerResponses/testing_pipelines/work/dd/8609795cae4b8d69393b8e7daee1bf
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Looking for some guidance on how to proceed.
Best,
Paul
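The Flye log above warns that the expected coverage is 1017x for the genome size that was entered. The Python sketch below reproduces that arithmetic from the two numbers in the log and works out what fraction of the reads would be kept to reach a chosen depth. The 100x target is only an illustrative assumption, and the thread does not establish that downsampling resolves the "No disjointigs were assembled" error.

```python
def parse_genome_size(text):
    """Convert a size such as '5.0m' or '5m' to a number of bases."""
    multipliers = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def estimated_coverage(total_read_length, genome_size):
    """Whole-number coverage: total bases divided by genome size."""
    return total_read_length // genome_size


def keep_fraction(total_read_length, genome_size, target_coverage):
    """Fraction of bases to keep so that coverage is about target_coverage."""
    return min(1.0, target_coverage * genome_size / total_read_length)


if __name__ == "__main__":
    total = 5089510998                 # "Total read length" from the log
    size = parse_genome_size("5.0m")   # --genome-size from the command
    print("coverage:", estimated_coverage(total, size))
    # 100x is a hypothetical target, not a value taken from the thread.
    print("keep fraction for 100x:", round(keep_fraction(total, size, 100), 4))
```

The same functions can be rerun with a different genome size to see how strongly the coverage estimate depends on the value passed to --genome-size.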