Giter Club home page Giter Club logo

flagger's Introduction

Evaluating dual/diploid assemblies with Flagger

Overview

Here is a description of a read-based pipeline that can detect different types of mis-assemblies in a draft dual/diploid assembly. (What is a dual assembly? Read this page). The core component of this pipeline is Flagger. Flagger recieves the read alignments to a draft diploid assembly, detects anomalies in read coverage along the assembly and partitions the assembly into 4 main components; erroneous, (falsely) duplicated, haploid (correctly assembled) and collapsed.

This evaluation has 3 steps:

  • Align long reads to the diploid assembly
  • Phase and relocalize the reads with secondary alignments using secphase (Optional)
  • Run Flagger

1. Align long reads

The ONT and HiFi reads can be aligned to a dual/diploid assembly (~ 6Gbases long in human) with a long read aligner like winnowmap and minimap2. Here are the commands for producing the alignments (taken from the winnowmap docs):

  # making the k-mer table with meryl
  meryl count k=15 output merylDB asm.fa
  meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt
  
  # alignment with winnowmap
  winnowmap -W repetitive_k15.txt -ax [map-ont | map-pb] -Y -L --eqx --cs -I8g <(cat pat_asm.fa mat_asm.fa) reads.fq.gz | \
    samtools view -hb | samtools sort -@8 > read_alignment.bam

Any other appropriate long read aligner can be employed in this step.

2. Relocalize wrongly phased reads (Optional)

In this step we use Secphase to phase and relocalize the reads with multiple alignments. To be more precise all the secondary and primary alignments of the same read are scored based on marker consistency and the alignment with the highest score is selected as the primary alignment. The output of this section is a corrected version of the input bam file, in which the primary and secondary alignments are swapped whenever neccessary. Secphase can work properly only if the secondary alignments are available with their full sequence and base quality array.

More information about Secphase is available here

3. Run Flagger

The produced alignment file (in step 1) can be used as the input to Flagger. Flagger outputs a bed file with 5 labels; erroneous (Err), duplicated (Dup), haploid (Hap), collapsed (Col) and unkown (Unk). Any component other than "haploid" is pointing to unreliable blocks in assembly and unkown label is for the bases that couldn't be assigned confidently to the other components (for example when assigned blocks are extremely short). These components are explained in detail here.

More information about Flagger is available here

Running pipeline with WDL

It is easier to run the pipeline using the WDLs described below. A WDL file can be run locally using Cromwell, which is an open-source Workflow Management System for bioinformatics. The latest releases of Cromwell are available here and the documentation is available here.

It is recommended to run the whole pipeline using flagger_end_to_end_with_mapping.wdl.

Here is a list of input parameters for flagger_end_to_end_with_mapping.wdl (The parameters marked as "(Mandatory)" are mandatory to be defined in the input json):

Parameter Description Type Default
sampleName Sample name; for example 'HG002' String No Default (Mandatory)
suffixForFlagger Suffix string that contains information about this analysis; for example 'hifi_winnowmap_flagger_for_hprc' String No Default (Mandatory)
suffixForMapping Suffix string that contains information about this alignment. It will be appended to the name of the final alignment file. For example 'hifi_winnowmap_v2.03_hprc_y2' String No Default (Mandatory)
hap1AssemblyFasta Path to uncompressed or gzip-compressed fasta file of the 1st haplotype. File No Default (Mandatory)
hap2AssemblyFasta Path to uncompressed or gzip-compressed fasta file of the 2nd haplotype. File No Default (Mandatory)
readfiles An array of read files. Their format can be either fastq, fq, fastq.gz, fq.gz, bam or cram. For cram format referenceFastaForReadExtraction should also be passed. Array[File] No Default (Mandatory)
preset Paremeter preset should be selected based on aligner and sequencing platform. Common presets are map-pb/map-hifi/map-ont for minimap2, map-pb/map-ont for winnowmap and hifi-haploid/hifi-haploid-complete/hifi-diploid/ont-haploid-complete for veritymap String No Default (Mandatory)
aligner Name of the aligner. It can be either minimap2, winnowmap or veritymap. String winnowmap
kmerSize The kmer size for using minimap2 or winnowmap. With winnowmap kmer size should be 15 and with minimap2 kmer size should be 17 and 19 for using the presets map-ont and map-hifi/map-pb respectively. Int 15
alignerOptions Aligner options. It can be something like '--eqx --cs -Y -L -y' for minimap2/winnowmap. Note that if assembly is diploid and aligner is either minimap2 or winnowmap '-I8g' is necessary. If the reads contain modification tags and these tags are supposed to be present in the final alignment file, alignerOptions should contain '-y' and the aligner should be either minimap2 or winnowmap. If running secphase is enabled it is recommended to add '-p0.5' to alignerOptions; it will keep more secondary alignments so secphase will have more candidates per read. For veritymap '--careful' can be used but not recommended for whole-genome assembly since it increases the runtime dramatically. String --eqx -Y -L -y
readExtractionOptions The options to be used while converting bam to fastq with samtools fastq. If the reads contain epigentic modification tags it is necessary to use '-TMm,Ml,MM,ML' String -TMm,Ml,MM,ML
referenceFastaForReadExtraction If reads are in CRAM format then the related reference should be passed to this parameter for read extraction. File No Default (Optional)
enableAddingMDTag If true it will call samtools calmd to add MD tags to the final bam file. Boolean true
splitNumber The number of chunks which the input reads should be equally split into. Note that enableSplittingReadsEqually should be set to true if user wants to split reads into equally sized chunks. Int 16
enableSplittingReadsEqually If true it will merge all reads together and then split them into multiple chunks of roughly equal size. Each chunk will then be aligned via a separate task. This feature is useful for running alignment on cloud/slurm systems where there are multiple nodes available with enough computing power and having alignments tasks distributed among small nodes is more efficient or cheaper than running a single alignment task in a large node. If the whole workflow is being on a single node it is not recommened to use this feature since merging and splitting reads takes its own time. Boolean false
minReadLengthForMapping If it is greater than zero, a task will be executed for filtering reads shorter than this value before alignment. Int 0
alignerThreadCount The number of threads for mapping in each alignment task Int 16
alignerMemSize The size of the memory in Gb for mapping in each alignment task Int 48
alignerDockerImage The mapping docker image String mobinasri/long_read_aligner:v0.4.0
correctBamOptions Options for the correct_bam program that can filters short/highly divergent alignments String --primaryOnly --minReadLen 5000 --minAlignment 5000 --maxDiv 0.1
preemptible Number of retries to use preemptible nodes on Terra/GCP. Int 2
zones Name of the zone for taking nodes on Terra/GCP. String us-west2-a
maxReadDivergenceForFlagger Alignments with gap-compressed ratio higher than this will be filtered in the pre-process step of flagger. Float 0.1
potentialBiasesBedArray Array of bed files each of which contains regions with potential coverage bias for example one bed file can contain HSat2 regions in haplotype 1. Array[File] No Default (Optional)
sexBed Optional bed file containing regions assigned to X/Y chromosomes. (can be either in ref or asm coordinates) File No Default (Optional)
SDBed Optional Bed file containing Segmental Duplications. (can be either in ref or asm coordinates) File No Default (Optional)
cntrBed Optional Bed file containing peri/centromeric satellites (ASat, HSat, bSat, gSat) without 'ct' blocks. File No Default (Optional)
cntrCtBed Optional Bed file containing centromere transition 'ct' blocks. File No Default (Optional)
additionalStratificationBedArray Array of additional stratification bed files for final stats tsv file. Array[File] No Default (Optional)
additionalStratificationNameArray Array of names for the stratifications provided in the argument additionalStratificationBedArray. Array[File] No Default (Optional)
enableProjectingBedsFromRef2Asm If True it means that the given bed files are in ref coors (e.g. chm13v2) and they have to be projected to asm coors. Boolean false
projectionReferenceFasta The given bed files are in the coordinates of this reference. A reference should be passed if enableProjectingBedsFromRef2Asm is true. File No Default (Optional)
enableRunningSecphase If true it will run secphase in the marker mode using the wdl parameters starting with 'secphase' otherwise skip it. Boolean false
secphaseDockerImage Docker image for running Secphase String mobinasri/secphase:v0.4.3
secphaseOptions String containing secphase options (can be either --hifi or --ont). String --hifi
secphaseVersion Secphase version. String v0.4.3
enableOutputtingWig If true it will make wig files from cov files and output them. wig files can be easily imported into IGV sessions Boolean true
enableOutputtingBam If true it will output read alignment bam file and its related index Boolean false
windowSize The size of the window flagger uses for finding coverage distrubutions (Default = 5Mb) Int 5000000
sortPdfPagesByHaplotype Sort the coverage distributions plotted in the output pdf by haplotype Boolean false
hap1ContigPattern The pattern that will be used for finding the names of the contigs belonging to haplotype1. It will be skipped if sortPdfPagesByHaplotype is false. String hap1
hap2ContigPattern The pattern that will be used for finding the names of the contigs belonging to haplotype2. It will be skipped if sortPdfPagesByHaplotype is false. String hap2

Using CHM13 annotation files (Optional)

A set of annotations files in the coordinates of chm13v2.0 are prepared beforehand and they can be used as inputs to the workflow. These bed files are useful when there is no denovo annotation for the assemblies and we like to project annotations from chm13v2.0 to the assembly coordinates for two main purposes:

  1. Detecting regions with coverage biases: Satellite repeat arrays might have coverage biases so before running Flagger the pipeline will detect potentially baised regions. Flagger will then fit a separate Gaussian model to each detected annotation.
  2. Stratifying final results with the projected annotations.

Using these bed files are optional and the related parameters can be left undefined (being absent from the input json). In this case the workflow should still work properly. However users should be aware that in this case potential coverage biases in HSat arrays may mislead the pipeline. The final summary tsv files will contain zero values for any stratification whose bed file was left undefined.

Parameter Value
enableProjectingBedsFromRef2Asm true
projectionReferenceFastaGz https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz
potentialBiasesBedArray ['https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/potential_biases/chm13v2.0_bsat.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/potential_biases/chm13v2.0_hsat1A.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/potential_biases/chm13v2.0_hsat1B.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/potential_biases/chm13v2.0_hsat2.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/potential_biases/chm13v2.0_hsat3.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/potential_biases/chm13v2.0_hor.bed']
censat_bed https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_no_ct.bed
sd_bed https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/sd/chm13v2.0_SD.all.bed
sex_bed https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/sex/chm13v2.0_sex.bed
additional_bed_array ['https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_no_ct.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_only_ct.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_bsat.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_gsat.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_hor.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_hsat1A.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_hsat1B.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_hsat2.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_hsat3.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/censat/chm13v2.0_mon.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/sd/chm13v2.0_SD.g99.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/sd/chm13v2.0_SD.g98_le99.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/sd/chm13v2.0_SD.g90_le98.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/sd/chm13v2.0_SD.le90.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/sd/chm13v2.0_SD.all.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/repeat_masker/chm13v2.0_RM_4.1.2p1_le6_STR.bed','https://raw.githubusercontent.com/mobinasri/flagger/v0.4.0/misc/stratifications/repeat_masker/chm13v2.0_RM_4.1.2p1_ge7_VNTR.bed']
additional_name_array ['sat_no_ct','only_ct','bsat','gsat','hor','hsat1A','hsat1B','hsat2','hsat3','mon','sd_g99','sd_g98_le99','sd_g90_le98','sd_le90','sd_all','STR','VNTR']

All files with git and s3 urls are publicly accessible so if you are running the WDL on Terra/AnVIL platforms you can use the same urls. They are also available in the directories misc/annotations and misc/biased_regions of this repository for those who want to run locally.

Other major workflows

flagger_end_to_end_with_mapping.wdl is running two major workflows; long_read_aligner_scattered.wdl and flagger_end_to_end.wdl. The first one runs read mapping and the second one runs the end-to-end flagger pipeline (including annotation projection, bias detection, secphase and flagger).

Users can run each of them separately. For example if there is a read alignment file available beforehand users can run only flagger_end_to_end.wdl. The parameters for each of these workflows is a subset of the parameters listed above for flagger_end_to_end_with_mapping.wdl therefore this table can still be used as a reference for either long_read_aligner_scattered.wdl or flagger_end_to_end.wdl.

Running WDLs locally using Cromwell

Below are the main commands for running flagger_end_to_end_with_mapping.wdl locally using Cromwell.

wget https://github.com/broadinstitute/cromwell/releases/download/85/cromwell-85.jar
wget https://github.com/broadinstitute/cromwell/releases/download/85/womtool-85.jar

# Get version 0.4.0 of Flagger
wget https://github.com/mobinasri/flagger/archive/refs/tags/v0.4.0.zip

unzip v0.4.0.zip


# make a directory for saving outputs and json files
mkdir workdir 

cd workdir


java -jar ../womtool-85.jar inputs ../flagger-0.4.0/wdls/workflows/flagger_end_to_end_with_mapping.wdl > inputs.json

After modifying inputs.json, setting mandatory parameters: sampleName, suffixForFlagger, suffixForMapping, hap1AssemblyFasta, hap2AssemblyFasta, readfiles, preset and removing unspecified parameters (they will be set to default values), users can run the command below:

## run flagger workflow
java -jar ../cromwell-85.jar run ../flagger-0.4.0/wdls/workflows/flagger_end_to_end_with_mapping.wdl -i inputs.json -m outputs.json

The paths to output files will be saved in outputs.json. The instructions for running any other WDL is similar.

Running WDLs on Slurm using Toil

Instructions for running WDLs on Slurm are provided here , which includes some test data sets for each of the workflows:

Dockstore links

All WDLs are uploaded to Dockstore for easier import into platforms like Terra or AnVIL.

Components

Component Status Color Description
Err Erroneous Red This block has low read coverage. If it is located in the middle of a contig it could be either a misjoin or a region that needs polishing
Dup Duplicated Orange This block is potentially a false duplication of another block. It should mainly include low-MAPQ alignments with half of the expected coverage. Probably one of the copies has to be polished or removed to fix this issue
Hap Haploid Green This block is correctly assembled and has the expected read coverage
Col Collapsed Purple Two or more highly similar haplotypes are collapsed into this block
Unk Unknown Gray These blocks could not be assigned confidently (usually on the edges of other components)

Each of these components has their own color when they are shown in the IGV or the UCSC Genome Browser.

Running Flagger on HPRC assemblies

The haplotype-resolved assemblies of the HPRC-Y1 samples and their corresponding data sets are available in

https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=working/HPRC/

For more details read this github page:

https://github.com/human-pangenomics/HPP_Year1_Assemblies

We have used the Genbank version of the HPRC-Y1 assemblies.

The v0.1 results are available in https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/e9ad8022-1b30-11ec-ab04-0a13c5208311--COVERAGE_ANALYSIS_Y1_GENBANK/FLAGGER/

Publications

Liao, Wen-Wei, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu et al. "A draft human pangenome reference." Nature 617, no. 7960 (2023): 312-324.

flagger's People

Contributors

mobinasri avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

flagger's Issues

Updated files for CHM13-v2.0

Hello!

Do you have any suggestion for generating the biased_high and biased_low files for CHM13v2.0? It basically would require running the analysis you had in the first place on HG002-chrY. Or do you think this does not make a large difference in results.

Thanks so much!

Pipeline failing on call_stats step

Running flagger on the latest version, I get an error in the call_stats step (for both call_stats and call_stats_alt_removed).
Previous steps finish successfully.

The error is that the perc_only and flagger_stats tsv files are not created. However, the flagger.hap_ngx.txt file is created.

The stderr is attached, and ln error message is attached.
ln: failed to access '/cromwell-executions/FlaggerEndToEnd/9149f654-c23d-422b-9d68-70a44af71356/call-stats_alt_removed/execution/*perc_only.tsv': No such file or directory
ln: failed to access '/cromwell-executions/FlaggerEndToEnd/9149f654-c23d-422b-9d68-70a44af71356/call-stats_alt_removed/execution/*flagger_stats.tsv': No such file or directory
stderr.txt

Thanks.

Flagger with sequencing with target enrichment?

Hi,
I love the concept of your tool, and am wondering if it would be appropriate to use in a more special l case. I want to evaluate haplotype-resolved de-novo assemblies of a complex region, and I have both HiFi and ultra-long ONT reads. Am I right in thinking if the HiFi reads are obtained with capture probes, flagger would not work due to uneven coverage distribution? And on that note, would you foresee that target enrichment (adaptive sampling) with ONT would work with flagger on the enriched ROI assembly?

Thanks 👍

depth2cov fails silently

Hi,

I am in the process of computing the different coverages for my assembly and I have succeeded for 3 out of the 4 sets: corrected low mapq, alt filtered low mapq and corrected high mapq. For the last set (alt filtered high mapq), it seems that depth2cov fails silently: the tools prints the length of each contig, then Coverting depth to cov and stops immediately. No .cov file has been output. I have recomputed the .depth file just to be sure it wasn't a corruption issue but same problem.

Any idea on how I could solve this?

Thanks,
Guillaume

Genomic Reference for Files in FLAGGER_HIFI_PROJECTED_SIMPLIFIED_BEDS/ Directories

Hi,

For the files in FLAGGER_HIFI_PROJECTED_SIMPLIFIED_BEDS/ directories, are their cooddinate based on T2T-CHM13 (v.2.0) or T2T-CHM13 (v.1.1)or hg38?
For examples, files in here: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/e9ad8022-1b30-11ec-ab04-0a13c5208311--COVERAGE_ANALYSIS_Y1_GENBANK/FLAGGER/APR_08_2022/FINAL_HIFI_BASED/FLAGGER_HIFI_PROJECTED_SIMPLIFIED_BEDS/

Thank you!

flagger parameters with updated nanopore flow cell

Hello,

Thank you for the excellent tool. I am interested in running flagger locally on a de novo assembly that used PB hifi and ONT reads (R10.4 flowcell). Do you know if I should keep the ONT-related cutoffs as suggested on your github (which I think refer to the R9.4 accuracy) or change parameters to be a bit more stringent. For example with FlaggerEndToEnd.maxReadDivergence [0.02 for HiFi, 0.09 for ONT, 0.06 for ONT/R10]

Best,
Dustin

Could not localize, doesn't exist

Hello, I get the following error when I try to run the flagger pipeline.

[2024-05-03 07:30:55,44] [error] BackgroundConfigAsyncJobExecutionActor [ddf63c0arunSecPhase.secphase:NA:1]: Error attempting to Execute                                                                                                                                                                                                                                                                                                                                                               
java.lang.Exception: Failed command instantiation                                                                                                                                                                                                                                                                                                                                                                                                                                                      
        at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:674)                                                                                                                                                                                                                                                                                                                                                                            
        at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.scala:609)                                                                                                                                                                                                                                                                                                                                                                           
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.instantiatedCommand$lzycompute(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                              
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.instantiatedCommand(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                                         
        at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents(StandardAsyncExecutionActor.scala:385)                                                                                                                                                                                                                                                                                                                                                                          
        at cromwell.backend.standard.StandardAsyncExecutionActor.commandScriptContents$(StandardAsyncExecutionActor.scala:384)                                                                                                                                                                                                                                                                                                                                                                         
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.commandScriptContents(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                                       
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.writeScriptContents(SharedFileSystemAsyncJobExecutionActor.scala:175)                                                                                                                                                                                                                                                                                                                                                           
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.writeScriptContents$(SharedFileSystemAsyncJobExecutionActor.scala:174)                                                                                                                                                                                                                                                                                                                                                          
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.cromwell$backend$sfs$BackgroundAsyncJobExecutionActor$$super$writeScriptContents(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                            
        at cromwell.backend.sfs.BackgroundAsyncJobExecutionActor.writeScriptContents(BackgroundAsyncJobExecutionActor.scala:12)                                                                                                                                                                                                                                                                                                                                                                        
        at cromwell.backend.sfs.BackgroundAsyncJobExecutionActor.writeScriptContents$(BackgroundAsyncJobExecutionActor.scala:11)                                                                                                                                                                                                                                                                                                                                                                       
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.writeScriptContents(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                                         
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:158)                                                                                                                                                                                                                                                                                                                                                                       
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:155)                                                                                                                                                                                                                                                                                                                                                                      
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                                                     
        at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:751)                                                                                                                                                                                                                                                                                                                                                                        
        at scala.util.Try$.apply(Try.scala:210)                                                                                                                                                                                                                                                                                                                                                                                                                                                        
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:751)                                                                                                                                                                                                                                                                                                                                                                                   
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:751)                                                                                                                                                                                                                                                                                                                                                                                  
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                                                
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:1141)                                                                                                                                                                                                                                                                                                                                                                              
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:1133)                                                                                                                                                                                                                                                                                                                                                                             
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                                            
        at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)                                                                                                                                                                                                                                                                                                                                                              
        at cromwell.core.retry.Retry$.withRetry(Retry.scala:46)                                                                                                                                                                                                                                                                                                                                                                                                                                        
        at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)                                                                                                                                                                                                                                                                                                                                                                                      
        at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)                                                                                                                                                                                                                                                                                                                   
        at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)                                                                                                                                                                                                                                                                                                                                                                 
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:270)                                                                                                                                                                                                                                                                                                                                                                                                                         
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:270)                                                                                                                                                                                                                                                                                                                                                                                                                         
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:270)                                                                                                                                                                                                                                                                                                                                                                                                                         
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:270)                                                                                                                                                                                                                                                                                                                                                                                                                         
        at akka.actor.Actor.aroundReceive(Actor.scala:539)                                                                                                                                                                                                                                                                                                                                                                                                                                             
        at akka.actor.Actor.aroundReceive$(Actor.scala:537)                                                                                                                                                                                                                                                                                                                                                                                                                                            
        at cromwell.backend.impl.sfs.config.BackgroundConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:200)                                                                                                                                                                                                                                                                                                                                                               
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614)                                                                                                                                                                                                                                                                                                                                                                                                                                    
        at akka.actor.ActorCell.invoke(ActorCell.scala:583)                                                                                                                                                                                                                                                                                                                                                                                                                                            
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)                                                                                                                                                                                                                                                                                                                                                                                                                                     
        at akka.dispatch.Mailbox.run(Mailbox.scala:229)                                                                                                                                                                                                                                                                                                                                                                                                                                                
        at akka.dispatch.Mailbox.exec(Mailbox.scala:241)                                                                                                                                                                                                                                                                                                                                                                                                                                               
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)                                                                                                                                                                                                                                                                                                                                                                                                                           
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)                                                                                                                                                                                                                                                                                                                                                                                                               
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)                                                                                                                                                                                                                                                                                                                                                                                                                       
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)                                                                                                                                                                                                                                                                                                                                                                                                              
Caused by: common.exception.AggregatedMessageException: Error(s):                                                                                                                                                                                                                                                                                                                                                                                                                                      
:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
Could not localize  -> /scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ:                                                                                                                                                                                                                                                              
         doesn't exist                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
        /scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ -> /scratch/rranallo-benavidez/flagger/BJ: Operation not permitted                                                                                                                                                                                                           
        File not found /scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/ca
ll-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/8
9f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp
/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/
call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase
/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-
2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/Fl
aggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/10
78326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/assembly.haplotype2.chm13v2.0.sorted.bam
Could not localize  -> /scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ:
         doesn't exist
        /scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ -> /scratch/rranallo-benavidez/flagger/BJ: Operation not permitted
        File not found /scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/ca
ll-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/8
9f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp
/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/
call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase
/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-
2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/Fl
aggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/10
78326427/BJ.tmp/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/-519806452/BJ_hifi_all_vs_verkko_diploid_read_alignment.sorted_by_qname.bam
        at common.validation.Validation$ValidationTry$.toTry$extension(Validation.scala:94)
        at common.validation.Validation$ValidationTry$.toTry$extension(Validation.scala:90)
        at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:672)
        ... 44 common frames omitted

It mentions that

/scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ

doesn't exist. However,

/scratch/rranallo-benavidez/flagger/BJ/cromwell-executions/FlaggerEndToEnd/89f0d04c-b9bd-45b8-a08b-2c6eb6756128/call-secphase/runSecPhase/ddf63c0a-d8d9-47f7-abf9-997ef218c6e1/call-secphase/inputs/1078326427/BJ.tmp

does exist.

Do you have any idea what might be causing this error? Thanks for any insights. Also, is there any reason why the file names are so nested? For example, the file not found seems to repeat the same folder structure multiple times.

duplicated.bed is empty

Following your flagger pipline, I finally gain the four files, the haploid.bed(99.55%),error.bed(0.013%), collosed.bed( 0.437%) and duplicated.bed(0%). The duplicated.bed file here is completely empty. I don't know what the problem is, how to deal with it ? Thanks
This my code:

#1.Mapping HiFi reads by winnowmap
$Winnowmap/meryl count k=15 output merylDB $ref
$Winnowmap/meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt

$Winnowmap/winnowmap -Y -L --eqx --cs -I8g  -W repetitive_k15.txt  -ax map-pb $ref $reads_pool  > HiFi_all.sam
samtools view -hb -@20 DRCMale_HiFi_all.sam -o HiFi_all.bam
bam=HiFi_all.bam
ref=Combined.fa
prefix=dipGenome
EXPECTED_COVERAGE=28

#2.secphase:gain the corrected bam
#Phase reads for HiFi
docker run   \
-v "${PWD}":"/input" \
mobinasri/secphase:v0.1 \
secphase -q -c -t10 -d 1e-4 -e 0.1 -b20 -m20 -s40 \
-i "/input/$bam" \
-f "/input/$ref" >PHASING_OUT.log
#produce the modified bam file
docker run   \
-v "${PWD}":"/input" \
-v "${PWD}":"/output" \
mobinasri/secphase:v0.1 \
correct_bam \
-i "/input/$bam" \
-P "/input/PHASING_OUT.log" \
-o "/output/${prefix}.corrected.bam" \
--primaryOnly

samtools sort -@20 ${prefix}.corrected.bam >${prefix}.corrected.sorted.bam
samtools index -@20 ${prefix}.corrected.sorted.bam

#3.DeepVariants
mkdir -p output
mkdir -p output/intermediate_results_dir
BIN_VERSION="1.2.0"
docker run \
  -v "${PWD}":"/input" \
  -v "${PWD}/output":"/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type "PACBIO" \
  --ref /input/$ref \
  --reads /input/${prefix}.corrected.sorted.bam \
  --output_vcf /output/${prefix}.vcf.gz \
  --make_examples_extra_args="keep_supplementary_alignments=true,min_mapping_quality=0" \
  --call_variants_extra_args="use_openvino=true" \
  --num_shards 144 \
  --intermediate_results_dir /output/intermediate_results_dir \
  --dry_run=false
bcftools view -Ov -f PASS -m2 -M2 -v snps -e 'FORMAT/VAF< 0.3 | FORMAT/GQ< 10' output/${prefix}.vcf.gz > ${prefix}.filtered.vcf.gz

#4.Flagger: Detected misassmeblies
docker run   \
-v "${PWD}":"/input" \
-v "${PWD}":"/output" \
 mobinasri/flagger:v0.1 \
 filter_alt_reads \
-i "/input/${prefix}.corrected.sorted.bam" \
-o "/output/${prefix}.alt_filtered.bam" \
-f "/output/${prefix}.alt.bam" \
-v "/input/${prefix}.filtered.vcf.gz" \
-m 100 -r 0.4
                
## Find base-level coverages
#samtools depth  -aa -Q 0 ${prefix}.alt_filtered.bam > read_alignment.depth
# Convert depth to cov
docker run \
 -v "${PWD}":"/input" \
 -v "${PWD}":"/output" \
 mobinasri/flagger:v0.1 \
 depth2cov \
 -d "/input/read_alignment.depth" \
 -f "/input/$ref.fai" \
 -o "/output/read_alignment.cov"


#Coverage Distribution and Fitting The Mixture Model
docker run \
  -v "${PWD}":"/input" \
 -v "${PWD}":"/output" \
 mobinasri/flagger:v0.1 \
 cov2counts \
 -i "/input/read_alignment.cov" \
 -o "/output/read_alignment.counts"


### Run fit_model_extra.py to fit the model
docker run \
   -v "${PWD}":"/input" \
 -v "${PWD}":"/output" \
 mobinasri/flagger:v0.1 \
 python3 /home/programs/src/fit_gmm.py \
 --counts  "/input/read_alignment.counts" \
 --cov ${EXPECTED_COVERAGE} \
 --output "/output/read_alignment.table" 


## Run find_blocks_from_table
docker run \
  -v "${PWD}":"/input" \
 -v "${PWD}":"/output" \
 mobinasri/flagger:v0.1 \
 find_blocks_from_table \
 -c "/input/read_alignment.cov" \
 -t "/input/read_alignment.table" \
 -p "/output/$prefix"

A few questions about the WDL pipeline

Hi,

I skimmed through the WDL files recently and I have a handful of questions about the pipeline:

  • The fit_model steps seem to have a hard-coded coverage of 20 used as parameter of fit_gmm.py. Shouldn't it use the output of calc_mean_sd.py instead?
  • A few of the steps converting .cov files to .bed use the following awk instruction: if(substr($1,1,1) == ">") {contig=substr($1,2,40);. This is assuming that the assembly has contig names no longer than 40 characters?
  • The command split_cov_by_window creating coverage files for each window of 5mbp outputs a "summary" file windows.txt. This file seems to have a BED file structure from what I saw: every line is contig name, start position and end position. However, in task findBlocksByWindow, this file is parsed and filtered using the expression cat ~{windowsText} | awk '~{minWindowSize} <= $3'. Using this, it seems that the 3rd field of each line in windows.txt is the length of the window rather than the end position in the contig?

Thank you for your help.
Guillaume

Citation

Hi @mobinasri to cite this pipeline should we cite the pangenome paper or do you have a methods paper for this?

Thanks,
Mitchell

The most appropriate way to estimate sequence coverage

Hi,

Thanks for a great tool! I have a question on the expected coverage you use to input for running fit_model_extra.py. When we have no reason to suspect there should be systematic increase or decrease in coverage, what is the most appropriate way to estimate this? E.g. median or arithmetic mean from output of samtools depth?

"## Run fit_model_extra.py to fit the model
docker run
-v ${INPUT_DIR}:${INPUT_DIR}
-v ${OUTPUT_DIR}:${OUTPUT_DIR}
mobinasri/flagger:v0.3.2
python3 /home/programs/src/fit_gmm.py
--counts ${INPUT_DIR}/read_alignment.counts
--cov ${EXPECTED_COVERAGE}
--output ${OUTPUT_DIR}/read_alignment.table "

Kind regards,
Andreas

Install instructions

Hi,

I'd like to install this software locally - is using the docker image the fastest/best way to do that?

Could I use Flgger to evaluate the chromosome-level genome assembly

Hi,

Thank you for your softwore.
I am writing to inquire whether I can use Flogger to evaluate a chromosome-level genome assembly instead of a draft dual assembly for presenting my genome evaluation results.
I have compared the chromosome-level genome assembly with the draft dual assembly, and the results are quite similar.
Thank you for your assistance.

Best Regards,
Yuting

Segmentation fault when running depth2cov

When I run:
singularity exec /path/flagger_v0.3.2.sif /home/programs/bin/depth2cov -d dipo.depth -f dipo.fasta.fai -o dipo.cov
I got error report:
/var/spool/slurm/d/job8163912/slurm_script: line 7: 9001 Segmentation fault
In the output file dipo.cov, some right results has already been produced.
I check the input file dipo.depth, and the format of each chromosome is correct. I split the depth of each chr, and run depth2cov, olny chr13 report "Segmentation fault", what might be the cause?
Thanks a lot !

Space char. in DeepVariant command causes DeepVariant to fail

Hi,

Just FYI, the space character in

--make_examples_extra_args="keep_supplementary_alignments=true, min_mapping_quality=0"

of the DeepVariant command (Section 3 of README) will cause DeepVariant 1.4.0 to complain about positional arguments not being allowed and stop there.
The correct argument is:

--make_examples_extra_args="keep_supplementary_alignments=true,min_mapping_quality=0"

Thanks for the tools
Guillaume

samtools error: Chromosome blocks not continuous

Running flagger_end_to_end.wdl via Cromwell with a slurm backend, I have an error in the call-preprocess step.
It looks like the error is that the index cannot be created for the bam.

Inputs are the HPRC assemblies and aligned bam (minimap2) from https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=working/HPRC_PLUS/HG00733/. Specifically, HG00733.maternal.f1_assembly_v2_genbank.fa.gz, HG00733.paternal.f1_assembly_v2_genbank.fa.gz,and HG00733_20190925_EEE_m54329U_190607_185248.Q20.fastq.gz.
hap1 and hap2toRefBam files are generated with asm2asmAlignment.wdl.

cat HG00733.maternal.f1_assembly_v2_genbank.fa.gz HG00733.paternal.f1_assembly_v2_genbank.fa.gz > combined.fa.gz
minimap2-ax map-pb --threads 32 --cs --eqx -Y -L -I8g combined.fa.gz HG00733_20190925_EEE_m54329U_190607_185248.Q20.fastq.gz > read_alignment.sam
samtools view -hb read_alignment.sam > read_alignment.bam

Input along with recommended FlaggerEndToEnd inputs are passed into cromwell as a json file (see attached inputs.json).
Config is also attached.

Error message:

  • OPTIONS='--primaryOnly --minReadLen 5000 --minAlignment 5000 --maxDiv 0.02 --phasingLog /cromwell-executions/FlaggerEndToEnd/1082ae7f-76e1-4c86-b5fb-d4b30730c9ee/call-preprocess/runFlaggerPreprocess/f49c67ef-a8e5-4f8b-aa5b-4a1d08526a7d/call-correctBam/inputs/-1247867400/read_alignment_minimap2secphase_v0.3.0.phasing_out.txt --exclude read_alignment_minimap2.excluded_read_ids.txt'
  • correct_bam --primaryOnly --minReadLen 5000 --minAlignment 5000 --maxDiv 0.02 --phasingLog /cromwell-executions/FlaggerEndToEnd/1082ae7f-76e1-4c86-b5fb-d4b30730c9ee/call-preprocess/runFlaggerPreprocess/f49c67ef-a8e5-4f8b-aa5b-4a1d08526a7d/call-correctBam/inputs/-1247867400/read_alignment_minimap2secphase_v0.3.0.phasing_out.txt --exclude read_alignment_minimap2.excluded_read_ids.txt -i /cromwell-executions/FlaggerEndToEnd/1082ae7f-76e1-4c86-b5fb-d4b30730c9ee/call-preprocess/runFlaggerPreprocess/f49c67ef-a8e5-4f8b-aa5b-4a1d08526a7d/call-correctBam/inputs/-85231667/read_alignment_minimap2.bam -o output/read_alignment_minimap2.corrected.bam -n8
  • samtools index -@8 output/read_alignment_minimap2.corrected.bam
    [E::hts_idx_push] Chromosome blocks not continuous
    [E::sam_index] Read 'm54329U_190607_185248/28/ccs' with ref_name='HG00733#1#JAHEPQ010000069.1', ref_length=51427907, flags=0, pos=10296551 cannot be indexed
    samtools index: failed to create index for "output/read_alignment_minimap2.corrected.bam": No such file or directory

config_and_inputs.zip

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.