popitsch / nanopanel2

Nanopanel2: a somatic variant caller for Nanopore panel sequencing data

License: Other

Python 100.00%

nanopanel2's Issues

basecall_grp parameter

I have looked through all my output files from Guppy and can't see where the basecall_grp information is stored. Can you advise where I can find it, please?
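One way to look for candidate values is to list the group names stored under each read's "Analyses" group in a FAST5 file. The sketch below assumes the multi-read layout implied by the traceback further down this page (`h5_file[rid]["Analyses"][bcg]`); the function names are made up for illustration and are not part of nanopanel2.

```python
from collections import Counter


def basecall_groups(fast5):
    """Count the group names found under each read's 'Analyses' group.

    `fast5` is any nested mapping: an open h5py.File works, and so does a dict.
    """
    counts = Counter()
    for read_id in fast5.keys():
        read = fast5[read_id]
        # Skip anything that is not a group (or dict) holding an 'Analyses' entry.
        if hasattr(read, "keys") and "Analyses" in read:
            counts.update(read["Analyses"].keys())
    return counts


def scan_fast5(path):
    """Open one multi-read FAST5 file read-only and count its analysis groups."""
    import h5py  # imported lazily so basecall_groups() has no hard dependency

    with h5py.File(path, "r") as fast5:
        return basecall_groups(fast5)
```

Calling `scan_fast5()` on one of the basecalled files and printing the result shows which group names are actually present. If none of them matches the `basecall_grp` value in the config, that would be consistent with the "object ... doesn't exist" KeyError reported in another issue on this page.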

Can nanopanel2 accept FASTQ input?

Hi, distinguished developers of nanopanel2. In your README you describe that nanopanel2 accepts basecalled FAST5 files as input. As far as I understand, the workflow just extracts the FASTQ sequences from the basecalled FAST5 files, so can I use FASTQ files as input instead?

Error: file /var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01 was not found! Exiting...

Hello,
I don't quite understand how to write this configuration file.
I now have FAST5 files split by barcode:

❯ ls
barcode01  barcode15  barcode29  barcode43  barcode57  barcode71  barcode85
barcode02  barcode16  barcode30  barcode44  barcode58  barcode72  barcode86
barcode03  barcode17  barcode31  barcode45  barcode59  barcode73  barcode87
barcode04  barcode18  barcode32  barcode46  barcode60  barcode74  barcode88
barcode05  barcode19  barcode33  barcode47  barcode61  barcode75  barcode89
barcode06  barcode20  barcode34  barcode48  barcode62  barcode76  barcode90
barcode07  barcode21  barcode35  barcode49  barcode63  barcode77  barcode91
barcode08  barcode22  barcode36  barcode50  barcode64  barcode78  barcode92
barcode09  barcode23  barcode37  barcode51  barcode65  barcode79  barcode93
barcode10  barcode24  barcode38  barcode52  barcode66  barcode80  barcode94
barcode11  barcode25  barcode39  barcode53  barcode67  barcode81  barcode95
barcode12  barcode26  barcode40  barcode54  barcode68  barcode82  barcode96
barcode13  barcode27  barcode41  barcode55  barcode69  barcode83  unclassified
barcode14  barcode28  barcode42  barcode56  barcode70  barcode84

How can I configure the software so that it runs correctly?
Command line:

singularity run /home/guangzhoulab001/nanopanel2_1.01.sif call -c /home/guangzhoulab001/nanopanel2-1.01/config.json -o .

JSON file:

{
        "dataset_name": "seegene02-20210722-1",                   # name of this dataset (will be used in the output file names/tables)
        "ref":          "/home/guangzhoulab001/GCF_000195955.2_ASM19595v2_genomic.fna",      # the amplicon reference sequence
        "fast5_dir":    "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01",      # workspace output dir of guppy that contains basecalled FAST5 files 
        "fastq_dir":    "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01",                # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured 
        "basecall_grp": "Basecall_1D_001",              # the used basecall group identifier in the FAST5 files
        "demultiplex": {                                # This section is required only for multiplexed datasets.
                "BC01": "S01",                          # Maps the 1st barcode ('BC01') to a sample identified that will be used in the output files 
                "BC02": "S02",
                "BC03": "S03",
                "BC04": "S04",
                "BC05": "S05",
                "BC06": "S06",
                "BC07": "S07",
                "BC08": "S08"
                },
        "logfile": "nanopanel2.log",                    # name of the log file
        "consensus": "mean",                            # used for consensus calculation (only if multiple mappers are configured)
        "mappers": {                                    # configured long-read mappers. Supported types are 'minimap2', 'ngmlr' and 'last'. 
                "mm2" : {							
                        "type": "minimap2"
                        },
                "ngms": {
                        "type": "ngmlr",
                        "additional_param": [ "--no-smallinv", "--no-lowqualitysplit", "-k", "10", "--match", "3", "--mismatch", "-3", "--bin-size", "2", "--kmer-skip", "1" ] # additional runtime parameters for ngmlr
                        },
                "last": {
                        "type": "last"
                        }
                },
        "roi_intervals": ["chr:100-1000"],              # list of genomic intervals in which variant calling will be done 
        "truth_vcf": {                                  # only required if truth-set data is available. Links sample identifiers to truth set VCF files.
                "S01": "truth_vcf/S01.exp.vcf",
                "S02": "truth_vcf/S02.exp.vcf",
                "S03": "truth_vcf/S03.exp.vcf",
                "S04": "truth_vcf/S04.exp.vcf",
                "S05": "truth_vcf/S05.exp.vcf",
                "S06": "truth_vcf/S06.exp.vcf",
                "S07": "truth_vcf/S07.exp.vcf",
                "S08": "truth_vcf/S08.exp.vcf"
                },
        "threads":      8,                              # number of CPUs/threads used by np2 and 3rd part tools
        "suppress_snv": [],                             # list of filters; SNV calls filtered by those will not be included in the output VCF (but will still be in the output TSV file)
        "suppress_del": ["AF", "DP"],
        "suppress_ins": ["AF", "DP"],
        "max_h5_cache": 500,                            # maximum number of cached H5 files. Setting this to a number >= the number of input FAST5 will greatly speed up the pipeline (at the cost of memory) 
        "exe": {                                        # this section enables users to link to executables for 3rd party tools. Not needed when running via singularity. Supported sections: 'bgzip', 'samtools', 'porechop', 'minimap2', 'ngmlr', 'lastal', 'last-split', 'maf-convert')
                "ngmlr":    "singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif"  # in this example, ngmlr is called via an (external) singularity image
        }
}
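Because the error in the title is a "not found" message for the configured fastq_dir, a quick first check is whether each configured path exists in the environment where the pipeline runs. The sketch below is only an illustration: it strips the `#` comments used in the config above before parsing, and the helper names and the choice of keys to check are assumptions, not nanopanel2's own code.

```python
import json
import os


def strip_hash_comments(text):
    """Remove '#' comments that sit outside double-quoted JSON strings."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = in_string      # escapes only matter inside strings
            elif ch == '"':
                in_string = not in_string
            elif ch == "#" and not in_string:
                cut = i                  # comment starts here
                break
        cleaned.append(line[:cut])
    return "\n".join(cleaned)


def missing_paths(config_text, keys=("ref", "fast5_dir", "fastq_dir")):
    """Return {key: path} for every configured path that does not exist here."""
    config = json.loads(strip_hash_comments(config_text))
    return {k: config[k] for k in keys
            if k in config and not os.path.exists(config[k])}
```

The check is only meaningful when it runs where nanopanel2 runs. With Singularity, a directory that exists on the host may not be visible inside the container unless it is bound (Singularity's `--bind` option), so that is one possible cause worth ruling out here.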

Genotype always 1

In the VCF file, is the genotype always set to 1 regardless of the VAF_CORR= value, as long as the allele frequency is above the 1% threshold?

Regions of interest Bed

The block diagram lists a regions-of-interest BED file as an optional input, but I can't see how to specify it. There is a roi_intervals entry in the config, though.
Is it possible to supply a BED file of individual SNP locations so that SNPs are called only at those positions?
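Until that is answered, one workaround might be to convert a BED file into the "chr:start-end" strings that the README's roi_intervals example uses. This is a sketch under assumptions: the function name is made up, BED coordinates are treated as 0-based half-open, and it is not confirmed here whether roi_intervals expects 1-based coordinates or accepts single-position intervals.

```python
def bed_to_roi_intervals(bed_lines, one_based=True):
    """Turn BED records into 'chrom:start-end' strings for roi_intervals."""
    intervals = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # BED starts are 0-based; shift by one for 1-based inclusive output.
        first = int(start) + 1 if one_based else int(start)
        intervals.append(f"{chrom}:{first}-{int(end)}")
    return intervals
```

For example, passing the lines of an open BED file returns a list that can be pasted into the config as the roi_intervals value. A single-SNP BED record such as a start of 99 and an end of 100 becomes "chr:100-100" with the default setting.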

KeyError: "Unable to open object (object 'Trace' doesn't exist)"

I'm getting this error when I try to run your tool. I'm not sure whether it's something wrong with my config file or something else.

I'm running on the singularity image if that helps.

Here's the error:

Extracting FASTQ data for all: 100%|███████████████████████████| 50/50 [00:57<00:00, 1.14s/it]
Decorating reads in all/mm2: 0%| | 0/192821 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/nanopanel2/nanopanel2.py", line 2077, in <module>
    nanopanel2_pipeline(config, outdir)
  File "/nanopanel2/nanopanel2.py", line 2007, in nanopanel2_pipeline
    decorate_reads(config, samples)
  File "/nanopanel2/nanopanel2.py", line 677, in decorate_reads
    trace = np.array(h5_file[rid]["Analyses"][bcg]["BaseCalled_template"]["Trace"])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'Trace' doesn't exist)"

It seems to be where the decorated bam index file is getting generated since that's the only missing file as far as I can tell.
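The traceback shows the lookup path that fails: Analyses / basecall group / BaseCalled_template / Trace. A small check like the one below could tell whether the FAST5 files contain that dataset at all, independently of the config. It is a sketch with a made-up function name; it accepts any nested mapping, so an open h5py.File can be passed directly.

```python
def reads_missing_trace(fast5, basecall_grp):
    """Return IDs of reads lacking Analyses/<basecall_grp>/BaseCalled_template/Trace."""
    missing = []
    for read_id in fast5.keys():
        node = fast5[read_id]
        for key in ("Analyses", basecall_grp, "BaseCalled_template", "Trace"):
            # Stop at the first level that is not a group or lacks the key.
            if not hasattr(node, "keys") or key not in node:
                missing.append(read_id)
                break
            node = node[key]
    return missing
```

Opening one input file with `h5py.File(path, "r")` and passing it in together with the configured basecall_grp returns the affected read IDs. If every read is listed, the files were probably written without trace data for that basecall group, which would point at the basecalling output rather than the config file.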

tumor-normal somatic variant calling

Is it possible to use Nanopanel2 to do somatic variant calling on a pair of matched tumor and normal long-read (Oxford Nanopore) sequenced samples?

Problems running release v1.01

Hi,

I downloaded release v1.01 to try out nanopanel2, but I've had a fair few problems getting it to work. Command:

singularity run nanopanel2_1.01.sif call --conf config.json --out test/

While nanopanel2 runs using minimap2 alone, the program fails when using either or both ngmlr and last.

ngmlr:
The command to run ngmlr fails because nanopanel2 is trying to run using a singularity container within nanopanel2_1.01.sif, rather than just calling the ngmlr program that's already installed in the main nanopanel2_1.01.sif container. Failing command within the log file:

singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif -r nCoV-2019.reference.fasta -q test/2021-06-18/2021-06-18.sample.fq.gz -x ont -t 12 -o test/2021-06-18/2021-06-18.sample.ngms.sam --no-smallinv --no-lowqualitysplit -k 10 --match 3 --mismatch -3 --bin-size 2 --kmer-skip 1

If I shell into nanopanel2_1.01.sif and run the same command as above, but change 'singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif' to 'ngmlr', the command finishes with no problems.

last:
lastal fails because nanopanel2 does not correctly create a lastdb for the reference fasta file. It seems to be looking for a lastdb at '/home/cfos/Programs/nanopanel2/nCoV-2019.reference.last_db', as evidenced by the error:

ERROR:root:b"lastal: can't open file: /home/cfos/Programs/nanopanel2/nCoV-2019.reference.last_db.prj\n"
ERROR:root:ERROR Command 'lastal -Q1 /home/cfos/Programs/nanopanel2/nCoV-2019.reference.last_db test/2021-06-18/2021-06-18.sample.fq.gz > test/2021-06-18/2021-06-18.sample.last.sam.tmp' returned non-zero exit status 1. - removing outputfile test/2021-06-18/2021-06-18.sample.last.sam.tmp

The last step works if I shell into the container, create my own lastdb called 'last_db', then run the alignment:

lastdb last_db nCoV-2019.reference.fasta
lastal -Q1 last_db test/2021-06-18/2021-06-18.sample.fq.gz > test/2021-06-18/2021-06-18.sample.last.sam.tmp

If I delete the ngmlr and last sections from the config, nanopanel2 completes.

Cheers,
Charles
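The manual lastdb workaround described above can be wrapped in a small helper that builds the database only when it is missing. This is a sketch, not nanopanel2 code: the function names are made up, and the only facts taken from the report are the `lastdb <prefix> <fasta>` call and the `.prj` file that lastal looks for.

```python
import os
import subprocess


def lastdb_command(ref_fasta, db_prefix):
    """Build the same lastdb call that was run by hand in the report."""
    return ["lastdb", db_prefix, ref_fasta]


def ensure_lastdb(ref_fasta, db_prefix, run=subprocess.run):
    """Create the LAST database unless its .prj file already exists.

    Returns True if lastdb was invoked, False if the database was found.
    """
    if os.path.exists(db_prefix + ".prj"):
        return False
    run(lastdb_command(ref_fasta, db_prefix), check=True)
    return True
```

For example, `ensure_lastdb("nCoV-2019.reference.fasta", "last_db")` reproduces the manual step when run inside the container. The `run` parameter exists only so the helper can be exercised without LAST installed.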

basecall_grp?

Greetings!
I am trying to run nanopanel2 on my samples which were basecalled and demultiplexed with the following versions:
MinKNOW 21.02.2
MinKNOW Core 4.2.4
Bream 6.1.10
Guppy 4.3.4

And I am just trying to run np2 with the data for barcode02.
"fast5_dir": "//data/fast5_pass/barcode02", # workspace output dir of guppy that contains basecalled FAST5 files
"fastq_dir": "//data/fastq_pass/barcode02", # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured

I am getting this error:

KeyError: "Unable to open object (object 'Basecall_1D_001' doesn't exist)"
I do not see any reference to this term in your paper, and I cannot find anything with a Google search.

This error appears even after I remove the following line from the config file:
"basecall_grp": "Basecall_1D_001", # the used basecall group identifier in the FAST5 files

my config file is attached
config.json.zip

Any ideas?

I am hoping to use this program to analyze somatic variation in rat neurons after CRISPR mutagenesis.

Thanks
chris

Fast5 files split by barcode

I have barcoded reads which are being demultiplexed correctly by Porechop but my Fast5 files are not loading.
From the log file:
Reading FAST5 files from fast5_pass/
INFO:root:Loaded 0 FAST5 files

From looking at the code I suspect this may be because my fast5 reads are in separate subfolders by barcode within the folder I specify in the config (fast5_pass).
The contents of fast5_pass look like:
barcode01/
barcode02/
etc....
The .fast5 reads for each barcode are within their own folder.
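To confirm that suspicion, the files can be listed recursively and grouped by the subfolder they sit in. The sketch below uses only the standard library; the function name is made up, and it says nothing about how nanopanel2 itself scans fast5_dir.

```python
from collections import defaultdict
from pathlib import Path


def fast5_by_subfolder(fast5_dir):
    """Group every .fast5 file under fast5_dir by its first-level subfolder.

    Files directly inside fast5_dir are collected under the key '.'.
    """
    root = Path(fast5_dir)
    grouped = defaultdict(list)
    for path in sorted(root.rglob("*.fast5")):
        rel = path.relative_to(root)
        key = rel.parts[0] if len(rel.parts) > 1 else "."
        grouped[key].append(path)
    return dict(grouped)
```

If the result has entries only under barcode01, barcode02 and so on, and nothing under '.', a non-recursive scan of fast5_pass would find 0 files, which matches the log message. Other reports on this page point fast5_dir at a single barcode folder, which may be a usable workaround for one sample at a time.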
