beatsonlab-microbialgenomics / micropipe

A pipeline for high-quality bacterial genome construction using ONT sequencing

License: GNU General Public License v3.0

Python 0.01% Nextflow 0.81% Shell 0.07% Roff 0.08% HTML 99.03%

genome-assembly long-read-sequencing nanopore-analysis-pipeline bioinformatics-pipeline microbial-genomics oxford-nanopore

micropipe's Issues

assembly:porechop terminated with an error exit status (255)

Dear micropipe team,
this looks like a wonderful tool for me. Sorry, this might be a newbie Nextflow question. I am running the micropipe sample data like this:
nextflow main.nf --samplesheet test_data/samples_1.csv --outdir micropipetest/ --gpu

I receive the following error:
NOTE: Process assembly:porechop (S24) terminated with an error exit status (255) -- Error is ignored

Where can I find log information about what went wrong?

System information

CentOS 7
N E X T F L O W ~ version 21.04.3
singularity version 2.4.2-dist

Further output of nextflow

WARN: DSL 2 PREVIEW MODE IS DEPRECATED - USE THE STABLE VERSION INSTEAD -- Read more at https://www.nextflow.io/docs/latest/dsl2.html#dsl2-migration-notes
executor > local (1)
[94/71c68c] process > assembly:porechop (S24) [100%] 1 of 1, failed: 1 ?
[- ] process > assembly:japsa -
[- ] process > assembly:flye -
[- ] process > assembly:racon_cpu -
[- ] process > assembly:medaka_cpu -
[- ] process > assembly:nextpolish -
[- ] process > assembly:fixstart -
[- ] process > assembly:quast -
[barcode01, /home/software/micropipe/test_data/S24EC_1P_test.fastq.gz, /home/software/micropipe/test_data/S24EC_2P_test.fastq.gz]
[barcode01, /home/software/micropipe/test_data/barcode01.fastq.gz, S24, 5.5m]
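On the question of where the logs are: Nextflow runs every task in its own directory under the work directory (`work/` in the launch directory unless `-w` was given), and the short hash in the progress line, here `[94/71c68c]`, is the start of that directory's path. The task's script, output, and exit code are stored there as hidden files (`.command.sh`, `.command.log`, `.command.out`, `.command.err`, `.exitcode`). A minimal sketch for locating them follows; the function names are hypothetical helpers and not part of micropipe or Nextflow.

```python
from pathlib import Path

# Hidden files that Nextflow writes into each task's work directory.
TASK_FILES = [".command.sh", ".command.log", ".command.out", ".command.err", ".exitcode"]


def find_task_dirs(work_root, short_hash):
    """Return work directories matching a short task hash such as '94/71c68c'."""
    prefix, _, rest = short_hash.partition("/")
    parent = Path(work_root) / prefix
    if not parent.is_dir():
        return []
    return sorted(
        p for p in parent.iterdir() if p.is_dir() and p.name.startswith(rest)
    )


def task_logs(task_dir):
    """Map each known task file name to its path, skipping files that are absent."""
    task_dir = Path(task_dir)
    return {name: task_dir / name for name in TASK_FILES if (task_dir / name).exists()}


if __name__ == "__main__":
    # Assumes the pipeline was launched from the current directory.
    for task_dir in find_task_dirs("work", "94/71c68c"):
        for name, path in task_logs(task_dir).items():
            print(f"--- {name} ({path})")
            print(path.read_text(errors="replace"))
```

For a failed task, `.command.err` and `.command.log` are usually the first files worth reading.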

Nextpolish db_split failed

Hi,
_nextpolish.log says

INFO: Converting SIF file to temporary sandbox...
[INFO] 2022-03-01 02:45:24,637 start...
[INFO] 2022-03-01 02:45:24,637 logfile: pid2111778.log.info
[WARNING] 2022-03-01 02:45:24,637 Re-write workdir
[INFO] 2022-03-01 02:45:24,645 scheduled tasks:
[1, 2, 1, 2]
[INFO] 2022-03-01 02:45:24,645 options:
[INFO] 2022-03-01 02:45:24,645 {'polish_options': ' -p 40', 'rewrite': 1, 'job_prefix': 'nextPolish', 'job_type': 'local', 'cluster_options': '', 'snp_valid': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.snp_valid', 'kmer_count': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.kmer_count', 'sgs_max_depth': '100', 'align_threads': '40', 'sgs_block_size': 91759816L, 'lgs_max_read_len': '150k', 'parallel_jobs': '6', 'multithread_jobs': '40', 'snp_phase': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.snp_phase', 'genome': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/consensus.fasta', 'genome_size': 5505589L, 'workdir': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204', 'cleantmp': 0, 'sgs_align_options': 'bwa mem -p -t 40', 'sgs_unpaired': '0', 'sgs_fofn': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn', 'lgs_polish': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.lgs_polish', 'sgs_use_duplicate_reads': 0, 'score_chain': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.score_chain', 'task': [1, 2, 1, 2], 'lgs_max_depth': '60', 'lgs_block_size': '500M', 'lgs_minimap2_options': '-x map-ont', 'rerun': 3, 'lgs_min_read_len': '1k'}
[INFO] 2022-03-01 02:45:24,645 step 0 and task 1 start:
[INFO] 2022-03-01 02:45:24,646 analysis tasks done
[INFO] 2022-03-01 02:45:24,647 total jobs: 3
[INFO] 2022-03-01 02:45:24,648 Throw jobID:[2111788] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh] in the local_cycle.
[INFO] 2022-03-01 02:45:25,149 Throw jobID:[2111845] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split1/nextPolish.sh] in the local_cycle.
[INFO] 2022-03-01 02:45:25,651 Throw jobID:[2112009] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split2/nextPolish.sh] in the local_cycle.
[ERROR] 2022-03-01 02:45:27,799 db_split failed: please check the following logs:
[ERROR] 2022-03-01 02:45:27,799 /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e
cat: '01.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part*.fasta': No such file or directory
cat: '03.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part*.fasta': No such file or directory
/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/.command.sh: line 12: //: Is a directory

And the log /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e says

hostname
cd /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0
cd /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0
time /opt/NextPolish/bin/seq_split -d /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204 -m 91759816 -n 6 -t 40 -i 1 -s 550558900 -p input.sgspart /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn
time /opt/NextPolish/bin/seq_split -d /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204 -m 91759816 -n 6 -t 40 -i 1 -s 550558900 -p input.sgspart /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn
Error! /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/110712RA1944_S13_L001_R1_001.fastq.gz does not exist!Command exited with non-zero status 1
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 4164maxresident)k
0inputs+0outputs (0major+132minor)pagefaults 0swaps
/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91

However, the file 110712RA1944_S13_L001_R1_001.fastq.gz does exist. It's basically a link to the raw data.

Thanks
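One possible explanation, which nothing in this report confirms, is that the link in the work directory points to a location that is not mounted inside the Singularity container. The link would then exist on the host but dangle inside the container, which would match the "does not exist" message from seq_split. The sketch below lists symlinks in a directory whose real targets fall outside a set of directories assumed to be visible in the container. The function name and the list of visible directories are illustrative assumptions, not micropipe settings.

```python
import os
from pathlib import Path


def links_outside(work_dir, visible_roots):
    """Return (link name, real target) for symlinks resolving outside visible_roots."""
    roots = [Path(os.path.realpath(r)) for r in visible_roots]
    outside = []
    for entry in sorted(Path(work_dir).iterdir()):
        if not entry.is_symlink():
            continue
        target = Path(os.path.realpath(entry))
        # A target is reachable if it is one of the roots or lies beneath one.
        reachable = any(target == r or r in target.parents for r in roots)
        if not reachable:
            outside.append((entry.name, str(target)))
    return outside


if __name__ == "__main__":
    # Hypothetical example: only the pipeline directory is assumed to be bound.
    work = "/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204"
    if os.path.isdir(work):
        for name, target in links_outside(work, ["/home/software/micropipe"]):
            print(f"{name} -> {target} (outside the assumed bind paths)")
```

If such links turn up, binding the raw-data directory into the container or copying the reads under an already visible directory would be the things to try.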

Test Data for Basecalling Not working

I have been trying to run micropipe, and it always fails at the demultiplexing step with my own data (basecalling always succeeds, and Guppy is definitely being located, since I made the appropriate changes in the config file). So I decided to run the example samplesheet, and that fails from the very start: no basecalling happens at all. I get the following error. I ran the example to get a better understanding of what a working samplesheet looks like, but this did not help.
[- ] process > basecalling_demultiplexing_... -
[- ] process > pycoqc -
[- ] process > assembly:porechop -
[- ] process > assembly:japsa -
[- ] process > assembly:flye -
[- ] process > assembly:racon_cpu -
[- ] process > assembly:medaka_cpu -
[- ] process > assembly:nextpolish -
[- ] process > assembly:fixstart -
[- ] process > assembly:quast -
[barcode01, S24, 5.5m]
[barcode01, /scicomp/home-pure/suj7/test_data/S24EC_1P_test.fastq.gz, /scicomp/home-pure/suj7/test_data/S24EC_2P_test.fastq.gz]
No such file: /scicomp/home-pure/suj7/false

Flye not creating assembly file

I am trying to run micropipe assembly-only. This is my sample sheet:
(base) [suj7@login02 ~]$ head sample0.txt
barcode_id,sample_id,long_fastq,genome_size
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_2.fastq,5m

I have attached an error file here: myerror.txt
I know I used nano-hq instead of the original nano-raw, but it doesn't work any better with nano-raw or nano-corr.

Working sample sheet

Hi,

I was wondering what a working sample sheet would look like. I have no Illumina files, and I wish to start from the FAST5 input. I added Guppy to the config file and basecalling finishes successfully, but the demultiplexing step fails to identify the barcodes and keeps failing. I ran guppy_barcoder outside of micropipe and it identified the barcodes. So this means it is something in the sample sheet that is causing the issues.
I first ran the pipeline in assembly-only mode. I assumed that long_fastq would point to the already demultiplexed FASTQ files when starting from assembly, but I keep getting errors. This is the sample sheet I have been using:
barcode_id,sample_id,long_fastq,genome_size
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode13,barcode13,demux_guppy_fastq/barcode13/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode14,barcode14,demux_guppy_fastq/barcode14/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode15,barcode15,demux_guppy_fastq/barcode15/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode16,barcode16,demux_guppy_fastq/barcode16/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_2.fastq,5m
barcode17,barcode17,demux_guppy_fastq/barcode17/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode17,barcode17,demux_guppy_fastq/barcode17/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_1.fastq,5m
barcode20,barcode20,demux_guppy_fastq/barcode20/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode27,barcode27,demux_guppy_fastq/barcode27/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode46,barcode46,demux_guppy_fastq/barcode46/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode55,barcode55,demux_guppy_fastq/barcode55/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode58,barcode58,demux_guppy_fastq/barcode58/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode60,barcode60,demux_guppy_fastq/barcode60/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode64,barcode64,demux_guppy_fastq/barcode64/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode67,barcode67,demux_guppy_fastq/barcode67/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode72,barcode72,demux_guppy_fastq/barcode72/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
barcode79,barcode79,demux_guppy_fastq/barcode79/fastq_runid_5363c81269c5ff44cc45e657b79b1135be0297cb_0.fastq,5m
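The sheet above lists several rows for the same barcode, one per Guppy output chunk, whereas the test-data run shown earlier on this page prints a single entry per barcode. Assuming the pipeline expects exactly one row and one FASTQ file per sample (an assumption, not something this thread confirms), the chunks can be concatenated per barcode and the sheet rewritten. This is a minimal sketch; the function name and the output locations are hypothetical, and paths in the sheet are resolved relative to the current directory.

```python
import csv
import shutil
from pathlib import Path

COLUMNS = ["barcode_id", "sample_id", "long_fastq", "genome_size"]


def merge_samplesheet(sheet_in, sheet_out, merged_dir):
    """Concatenate the FASTQ chunks of each barcode and write one row per sample."""
    merged_dir = Path(merged_dir)
    merged_dir.mkdir(parents=True, exist_ok=True)

    # Group chunk paths by sample; dicts keep the first-seen order of the sheet.
    groups = {}
    with open(sheet_in, newline="") as fh:
        for row in csv.DictReader(fh):
            key = (row["barcode_id"], row["sample_id"], row["genome_size"])
            groups.setdefault(key, []).append(row["long_fastq"])

    with open(sheet_out, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(COLUMNS)
        for (barcode, sample, size), chunks in groups.items():
            merged = merged_dir / f"{barcode}.fastq"
            # Plain FASTQ files can be joined byte for byte.
            with open(merged, "wb") as dst:
                for chunk in chunks:
                    with open(chunk, "rb") as src:
                        shutil.copyfileobj(src, dst)
            writer.writerow([barcode, sample, str(merged), size])
    return len(groups)
```

Calling `merge_samplesheet("sample0.txt", "samples_merged.csv", "merged_fastq")` would write one merged file per barcode and return the number of samples in the new sheet.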

Problem with NextPolish

Dear micropipe team

While micropipe works fine with ONT data alone, we have problems when combining it with Illumina reads. The problem is that the file sample1/4_polishing_short_reads/04RR0090_flye_polishedLR_SR.fasta is empty, so QUAST throws an error. sample1_flye_polishedLR_SR_fixstart.log says "db_split failed:" and tells me to check /home/software/micropipe/work/ce/cf7d05849bd4f81c067beb16e92367/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e . However, the folder 00.score_chain does not exist.

This is the call

nextflow main.nf --basecalling --demultiplexing --gpu --samplesheet /data/samplesheet.csv --fast5 /data/20210525_0850_MN32008_FAQ18836_9cbcdc36/fast5/ --datadir /home/testmicropipemalle/illumina/ --outdir /home/testmicropipemallei nextflow

No changes regarding NextPolish were made in nextflow.config, so it should use the container docker://pvstodghill/nextpolish:1.1.0__2020-05-12 and, accordingly, nextpolish_version.txt says v.1.1.0

Installing NextPolish v1.3.1 by hand from its Git repository and running the sample data worked fine.

Thanks
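Because the downstream QUAST error is only a symptom of the empty polished FASTA, it can help to check the output of each polishing step before looking at later stages. The sketch below counts the records and bases in a FASTA file; the function name is a hypothetical helper, and the path in the example is the one quoted in this report.

```python
import os


def fasta_stats(path):
    """Return (number of records, total bases) for a plain-text FASTA file."""
    records = 0
    bases = 0
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                records += 1  # header line starts a new record
            elif line:
                bases += len(line)  # sequence line
    return records, bases


if __name__ == "__main__":
    fasta = "sample1/4_polishing_short_reads/04RR0090_flye_polishedLR_SR.fasta"
    if os.path.exists(fasta):
        records, bases = fasta_stats(fasta)
        print(f"{fasta}: {records} records, {bases} bases")
```

A result of zero records for the short-read polishing output, with a non-empty long-read polishing input, would point at the NextPolish step itself.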

error at assembly (flye step)

I am working with a student who is hitting this issue when running the pipeline with Nextflow:


executor >  local (3)
[30/a60146] process > assembly:porechop (H37Rv.1) [100%] 1 of 1 ✔
[24/79a658] process > assembly:japsa (H37Rv.1)    [100%] 1 of 1 ✔
[dd/860979] process > assembly:flye (H37Rv.1)     [  0%] 0 of 1
[-        ] process > assembly:racon_cpu          -
[-        ] process > assembly:medaka_cpu         -
[-        ] process > assembly:nextpolish         -
[-        ] process > assembly:fixstart           -
[-        ] process > assembly:quast              -
Error executing process > 'assembly:flye (H37Rv.1)'

Caused by:
  Missing output file(s) `assembly.fasta` expected by process `assembly:flye (H37Rv.1)`

Command executed:

  set +eu
  flye --nano-raw filtered.fastq.gz --genome-size 5.0m --threads 4 --out-dir $PWD --plasmids
  flye -v 2> flye_version.txt

Command exit status:
  0

Command output:
  (empty)

Command error:
  WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  [2022-07-22 17:41:29] INFO: Starting Flye 2.5-release
  [2022-07-22 17:41:29] INFO: >>>STAGE: configure
  [2022-07-22 17:41:29] INFO: Configuring run
  [2022-07-22 17:43:47] INFO: Total read length: 5089510998
  [2022-07-22 17:43:47] INFO: Input genome size: 5000000
  [2022-07-22 17:43:47] INFO: Estimated coverage: 1017
  [2022-07-22 17:43:47] WARNING: Expected read coverage is 1017, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
  [2022-07-22 17:43:47] INFO: Reads N50/N90: 9733 / 2679
  [2022-07-22 17:43:47] INFO: Minimum overlap set to 3000
  [2022-07-22 17:43:47] INFO: Selected k-mer size: 15
  [2022-07-22 17:43:47] INFO: >>>STAGE: assembly
  [2022-07-22 17:43:47] INFO: Assembling disjointigs
  [2022-07-22 17:43:47] INFO: Reading sequences
  [2022-07-22 17:45:15] INFO: Generating solid k-mer index
  [2022-07-22 17:45:32] INFO: Counting k-mers (1/2):
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-22 17:48:26] INFO: Counting k-mers (2/2):
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-22 17:54:34] INFO: Filling index table
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-22 18:05:38] INFO: Extending reads
  [2022-07-22 18:24:23] INFO: Overlap-based coverage: 868
  [2022-07-22 18:24:23] INFO: Median overlap divergence: 0.0852075
  0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
  [2022-07-24 03:32:08] INFO: Assembled 0 disjointigs
  [2022-07-24 03:32:08] INFO: Generating sequence
  [2022-07-24 03:32:09] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

Work dir:
  /projectsp/alland/PanGenome_Project/ReviewerResponses/testing_pipelines/work/dd/8609795cae4b8d69393b8e7daee1bf

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Looking for some guidance on how to proceed.

Best,
Paul
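The coverage figure in the Flye log follows directly from the two numbers printed just before it: total read length divided by the genome size passed on the command line. The sketch below reproduces that arithmetic and computes what fraction of the reads would have to be kept to reach a lower coverage. The target of 100x is an arbitrary illustrative value, the function names are hypothetical, and downsampling is only one thing to try, not a confirmed fix for the "No disjointigs were assembled" error.

```python
def estimated_coverage(total_read_bases, genome_size):
    """Coverage as total sequenced bases divided by the assumed genome size."""
    return total_read_bases / genome_size


def keep_fraction(total_read_bases, genome_size, target_coverage):
    """Fraction of reads to keep so that the expected coverage hits the target."""
    coverage = estimated_coverage(total_read_bases, genome_size)
    return min(1.0, target_coverage / coverage)


# Values taken from the log above: "Total read length" and "Input genome size".
TOTAL_READ_BASES = 5_089_510_998
GENOME_SIZE = 5_000_000

coverage = estimated_coverage(TOTAL_READ_BASES, GENOME_SIZE)
fraction = keep_fraction(TOTAL_READ_BASES, GENOME_SIZE, target_coverage=100)

print(f"estimated coverage: {int(coverage)}x")
print(f"fraction of reads to keep for about 100x: {fraction:.4f}")
```

With these inputs the integer part of the coverage is 1017, matching the log, and roughly one read in ten would be kept for a 100x target.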
