nanopanel2's Issues

basecall_grp parameter

I have looked through all my output files from Guppy and can't see where the basecall_grp information is stored. Can you advise where I can find it, please?
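
A possible way to check, sketched under assumptions rather than taken from the nanopanel2 documentation: the traceback in the 'Trace' report on this page shows the pipeline reading `h5_file[rid]["Analyses"][bcg]`, which suggests the basecall group is the name of a group under each read's `Analyses` group inside the FAST5 file. The function names below are hypothetical, and opening a real file assumes the `h5py` package is available.

```python
from collections import Counter


def list_basecall_groups(fast5):
    """Count the names of the groups under each read's "Analyses" group.

    `fast5` can be an open h5py.File for a multi-read FAST5 file, or any
    nested mapping with the same layout (read id -> "Analyses" -> group).
    """
    counts = Counter()
    for read_id in fast5:
        read = fast5[read_id]
        if "Analyses" not in read:
            continue  # this read carries no analysis groups at all
        for group_name in read["Analyses"]:
            counts[group_name] += 1
    return counts


def print_basecall_groups(path):
    """Open one FAST5 file with h5py (assumed installed) and print its groups."""
    import h5py  # imported lazily so list_basecall_groups has no dependencies

    with h5py.File(path, "r") as fast5:
        for name, n_reads in sorted(list_basecall_groups(fast5).items()):
            print(f"{name}\t{n_reads} reads")
```

If names such as `Basecall_1D_000` show up, that string would be the candidate value for `basecall_grp`; if nothing is listed, the files may not contain basecall results at all.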

Can nanopanel2 accept FASTQ input?

Hi, distinguished developers of nanopanel2. In your README you describe basecalled FAST5 files as the accepted input. As far as I understand, the workflow just extracts the FASTQ sequences from the basecalled FAST5 files, so can I use FASTQ files as input instead?

Error: file /var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01 was not found! Exiting...

Hello,
I don't quite understand how to write this configuration file.
I now have FAST5 files split by barcode:

❯ ls
barcode01  barcode15  barcode29  barcode43  barcode57  barcode71  barcode85
barcode02  barcode16  barcode30  barcode44  barcode58  barcode72  barcode86
barcode03  barcode17  barcode31  barcode45  barcode59  barcode73  barcode87
barcode04  barcode18  barcode32  barcode46  barcode60  barcode74  barcode88
barcode05  barcode19  barcode33  barcode47  barcode61  barcode75  barcode89
barcode06  barcode20  barcode34  barcode48  barcode62  barcode76  barcode90
barcode07  barcode21  barcode35  barcode49  barcode63  barcode77  barcode91
barcode08  barcode22  barcode36  barcode50  barcode64  barcode78  barcode92
barcode09  barcode23  barcode37  barcode51  barcode65  barcode79  barcode93
barcode10  barcode24  barcode38  barcode52  barcode66  barcode80  barcode94
barcode11  barcode25  barcode39  barcode53  barcode67  barcode81  barcode95
barcode12  barcode26  barcode40  barcode54  barcode68  barcode82  barcode96
barcode13  barcode27  barcode41  barcode55  barcode69  barcode83  unclassified
barcode14  barcode28  barcode42  barcode56  barcode70  barcode84

How can I configure the software so that it runs correctly?
Command line:

singularity run /home/guangzhoulab001/nanopanel2_1.01.sif call -c /home/guangzhoulab001/nanopanel2-1.01/config.json -o .

JSON file:

{
        "dataset_name": "seegene02-20210722-1",                   # name of this dataset (will be used in the output file names/tables)
        "ref":          "/home/guangzhoulab001/GCF_000195955.2_ASM19595v2_genomic.fna",      # the amplicon reference sequence
        "fast5_dir":    "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01",      # workspace output dir of guppy that contains basecalled FAST5 files 
        "fastq_dir":    "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01",                # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured 
        "basecall_grp": "Basecall_1D_001",              # the used basecall group identifier in the FAST5 files
        "demultiplex": {                                # This section is required only for multiplexed datasets.
                "BC01": "S01",                          # Maps the 1st barcode ('BC01') to a sample identifier that will be used in the output files
                "BC02": "S02",
                "BC03": "S03",
                "BC04": "S04",
                "BC05": "S05",
                "BC06": "S06",
                "BC07": "S07",
                "BC08": "S08"
                },
        "logfile": "nanopanel2.log",                    # name of the log file
        "consensus": "mean",                            # used for consensus calculation (only if multiple mappers are configured)
        "mappers": {                                    # configured long-read mappers. Supported types are 'minimap2', 'ngmlr' and 'last'. 
                "mm2" : {							
                        "type": "minimap2"
                        },
                "ngms": {
                        "type": "ngmlr",
                        "additional_param": [ "--no-smallinv", "--no-lowqualitysplit", "-k", "10", "--match", "3", "--mismatch", "-3", "--bin-size", "2", "--kmer-skip", "1" ] # additional runtime parameters for ngmlr
                        },
                "last": {
                        "type": "last"
                        }
                },
        "roi_intervals": ["chr:100-1000"],              # list of genomic intervals in which variant calling will be done 
        "truth_vcf": {                                  # only required if truth-set data is available. Links sample identifiers to truth set VCF files.
                "S01": "truth_vcf/S01.exp.vcf",
                "S02": "truth_vcf/S02.exp.vcf",
                "S03": "truth_vcf/S03.exp.vcf",
                "S04": "truth_vcf/S04.exp.vcf",
                "S05": "truth_vcf/S05.exp.vcf",
                "S06": "truth_vcf/S06.exp.vcf",
                "S07": "truth_vcf/S07.exp.vcf",
                "S08": "truth_vcf/S08.exp.vcf"
                },
        "threads":      8,                              # number of CPUs/threads used by np2 and 3rd party tools
        "suppress_snv": [],                             # list of filters; SNV calls filtered by those will not be included in the output VCF (but will still be in the output TSV file)
        "suppress_del": ["AF", "DP"],
        "suppress_ins": ["AF", "DP"],
        "max_h5_cache": 500,                            # maximum number of cached H5 files. Setting this to a number >= the number of input FAST5 will greatly speed up the pipeline (at the cost of memory) 
        "exe": {                                        # this section enables users to link to executables for 3rd party tools. Not needed when running via singularity. Supported sections: 'bgzip', 'samtools', 'porechop', 'minimap2', 'ngmlr', 'lastal', 'last-split', 'maf-convert')
                "ngmlr":    "singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif"  # in this example, ngmlr is called via an (external) singularity image
        }
}
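
Since the error says a configured directory was not found, one thing worth ruling out is whether each path in the config is visible from where the pipeline runs. The sketch below is a generic check, not part of nanopanel2: it strips the `#` comments used in the annotated config, parses the rest as JSON, and reports configured paths that do not exist. The function names and the choice of keys to test are assumptions for illustration.

```python
import json
import os


def strip_hash_comments(text):
    """Remove '#' comments that sit outside of JSON string literals."""
    out = []
    in_string = False   # currently inside a "..." literal
    escaped = False     # previous character was a backslash inside a string
    skipping = False    # currently inside a comment, until end of line
    for ch in text:
        if skipping:
            if ch == "\n":
                skipping = False
                out.append(ch)
            continue
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == "#":
            skipping = True
            continue
        if ch == '"':
            in_string = True
        out.append(ch)
    return "".join(out)


def missing_paths(config_text, keys=("ref", "fast5_dir", "fastq_dir")):
    """Return {key: path} for configured paths that do not exist here."""
    config = json.loads(strip_hash_comments(config_text))
    return {
        key: config[key]
        for key in keys
        if key in config and not os.path.exists(config[key])
    }
```

Calling `missing_paths(open("config.json").read())` in the same environment as the pipeline (for example inside the container) gives the most meaningful answer, because a directory that exists on the host is only visible in a container if it is bound into it.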

Genotype always 1

In the VCF file, is the genotype always set to 1 regardless of the VAF_CORR value, as long as the allele frequency is above the 1% threshold?

Regions of interest Bed

The block diagram lists a regions-of-interest BED file as an optional input, but I can't see how to specify it. There is a line in the config for roi_intervals, though.
Is it possible to supply a BED file of individual SNP locations so that SNPs are called only at those positions?
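
Until a maintainer confirms whether a BED file can be passed directly, one workaround is to turn the BED records into the `chr:start-end` strings that the example config uses for `roi_intervals`. This is only a sketch: `bed_to_roi_intervals` is a hypothetical helper, and it assumes the interval strings are 1-based and inclusive, which the config comment does not state. Whether single-position intervals restrict calling to exactly those positions is also untested here.

```python
def bed_to_roi_intervals(bed_lines):
    """Convert BED records into "chrom:start-end" interval strings.

    BED coordinates are 0-based and half-open; the output is assumed
    to be 1-based and inclusive, so the start is shifted by one.
    """
    intervals = []
    for line in bed_lines:
        line = line.strip()
        # skip blank lines, comments and BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append(f"{chrom}:{int(start) + 1}-{int(end)}")
    return intervals
```

For example, `bed_to_roi_intervals(open("snps.bed"))` returns a list that can be pasted into the `roi_intervals` entry of the config; `snps.bed` is a placeholder file name.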

KeyError: "Unable to open object (object 'Trace' doesn't exist)"

I'm getting this error when I try to run your tool. I'm not sure if it's something screwy with my config file or what.

I'm running on the singularity image if that helps.

Here's the error:

Extracting FASTQ data for all: 100%|███████████████████████████| 50/50 [00:57<00:00, 1.14s/it]
Decorating reads in all/mm2: 0%| | 0/192821 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/nanopanel2/nanopanel2.py", line 2077, in <module>
    nanopanel2_pipeline(config, outdir)
  File "/nanopanel2/nanopanel2.py", line 2007, in nanopanel2_pipeline
    decorate_reads(config, samples)
  File "/nanopanel2/nanopanel2.py", line 677, in decorate_reads
    trace = np.array(h5_file[rid]["Analyses"][bcg]["BaseCalled_template"]["Trace"])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'Trace' doesn't exist)"

It seems to be where the decorated bam index file is getting generated since that's the only missing file as far as I can tell.
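
The traceback shows the lookup that fails: `h5_file[rid]["Analyses"][bcg]["BaseCalled_template"]["Trace"]`. A quick way to see whether the input files are the problem is to walk the same path for every read and list the ones where it breaks. The sketch below does that for an open `h5py.File` or any nested mapping with the same layout; `has_path` and `reads_missing_trace` are hypothetical names, not nanopanel2 functions.

```python
def has_path(node, keys):
    """Return True if every key in `keys` can be followed from `node`."""
    for key in keys:
        if key not in node:
            return False
        node = node[key]
    return True


def reads_missing_trace(fast5, basecall_grp):
    """List read ids whose basecall group has no "Trace" dataset.

    `fast5` is an open h5py.File (multi-read FAST5) or a nested mapping;
    `basecall_grp` is the value configured as "basecall_grp".
    """
    keys = ("Analyses", basecall_grp, "BaseCalled_template", "Trace")
    return [read_id for read_id in fast5 if not has_path(fast5[read_id], keys)]
```

With h5py installed, `reads_missing_trace(h5py.File("reads.fast5", "r"), "Basecall_1D_000")` would return the affected read ids (the file name and group name are placeholders). The check only reports which reads lack the dataset; it does not say why the basecaller left it out.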

tumor-normal somatic variant calling

Is it possible to use nanopanel2 for somatic variant calling on a pair of matched tumor and normal samples sequenced with long reads (Oxford Nanopore)?

Problems running release v1.01

Hi,

I downloaded release v1.01 to try out nanopanel2, but I've had a fair few problems getting it to work. Command:

singularity run nanopanel2_1.01.sif call --conf config.json --out test/

While nanopanel2 runs using minimap2 alone, the program fails when using ngmlr, last, or both.

ngmlr:
The command to run ngmlr fails because nanopanel2 is trying to run using a singularity container within nanopanel2_1.01.sif, rather than just calling the ngmlr program that's already installed in the main nanopanel2_1.01.sif container. Failing command within the log file:

singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif -r nCoV-2019.reference.fasta -q test/2021-06-18/2021-06-18.sample.fq.gz -x ont -t 12 -o test/2021-06-18/2021-06-18.sample.ngms.sam --no-smallinv --no-lowqualitysplit -k 10 --match 3 --mismatch -3 --bin-size 2 --kmer-skip 1

If I shell into nanopanel2_1.01.sif and run the same command as above, but changing 'singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif' to 'ngmlr', the command finishes with no problems.

last:
lastal fails because nanopanel2 does not correctly create a lastdb for the reference fasta file. It seems to be looking for a lastdb at '/home/cfos/Programs/nanopanel2/nCoV-2019.reference.last_db', as evidenced by the error:

ERROR:root:b"lastal: can't open file: /home/cfos/Programs/nanopanel2/nCoV-2019.reference.last_db.prj\n"
ERROR:root:ERROR Command 'lastal -Q1 /home/cfos/Programs/nanopanel2/nCoV-2019.reference.last_db test/2021-06-18/2021-06-18.sample.fq.gz > test/2021-06-18/2021-06-18.sample.last.sam.tmp' returned non-zero exit status 1. - removing outputfile test/2021-06-18/2021-06-18.sample.last.sam.tmp

The last step works if I shell into the container, create my own lastdb called 'last_db', then run the alignment:

lastdb last_db nCoV-2019.reference.fasta
lastal -Q1 last_db test/2021-06-18/2021-06-18.sample.fq.gz > test/2021-06-18/2021-06-18.sample.last.sam.tmp

If I delete the ngmlr and last sections from the config, nanopanel2 completes.

Cheers,
Charles
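
The failing ngmlr command starts with the same `singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif` string that the example config on this page sets under `exe`, and that config's own comment says the section is not needed when running via singularity. Assuming that override is what triggers the nested call, removing it from the parsed config should make the pipeline fall back to the `ngmlr` binary inside the container. `drop_exe_overrides` is a hypothetical helper for illustration.

```python
def drop_exe_overrides(config, tools=("ngmlr",)):
    """Return a copy of `config` without the listed "exe" overrides.

    The "exe" section is removed entirely if no overrides remain.
    The input dictionary is left unchanged.
    """
    cleaned = dict(config)
    remaining = {
        tool: command
        for tool, command in config.get("exe", {}).items()
        if tool not in tools
    }
    if remaining:
        cleaned["exe"] = remaining
    else:
        cleaned.pop("exe", None)
    return cleaned
```

Deleting the `exe` block by hand in config.json has the same effect; the function is only useful when configs are generated by a script.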

basecall_grp?

Greetings!
I am trying to run nanopanel2 on my samples which were basecalled and demultiplexed with the following versions:
MinKNOW 21.02.2
MinKNOW Core 4.2.4
Bream 6.1.10
Guppy 4.3.4

And I am just trying to run np2 with the data for barcode02.
"fast5_dir": "//data/fast5_pass/barcode02", # workspace output dir of guppy that contains basecalled FAST5 files
"fastq_dir": "//data/fastq_pass/barcode02", # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured

I am getting this error:

KeyError: "Unable to open object (object 'Basecall_1D_001' doesn't exist)"
I do not see any reference to this term in your paper, and I cannot find anything with a Google search.

This error appears even after I remove the following line from the config file:
"basecall_grp": "Basecall_1D_001", # the used basecall group identifier in the FAST5 files

My config file is attached:
config.json.zip

Any ideas?

I am hoping to use this program to analyze somatic variation in rat neurons after CRISPR mutagenesis.

Thanks
chris

Fast5 files split by barcode

I have barcoded reads which are being demultiplexed correctly by Porechop but my Fast5 files are not loading.
From the log file:
Reading FAST5 files from fast5_pass/
INFO:root:Loaded 0 FAST5 files

From looking at the code, I suspect this may be because my FAST5 reads are in separate subfolders by barcode within the folder I specify in the config (fast5_pass).
The contents of fast5_pass look like:
barcode01/
barcode02/
etc....
The .fast5 files for each barcode are within their own folder.
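
To confirm that the files are only one level down, a recursive search can show where the .fast5 files actually sit relative to the configured directory. This is a standalone sketch using only the Python standard library; `find_fast5_by_folder` is a hypothetical name, and it says nothing about how nanopanel2 itself scans the directory.

```python
from pathlib import Path


def find_fast5_by_folder(fast5_root):
    """Group all *.fast5 files below `fast5_root` by their subfolder.

    Returns a dict mapping the folder path relative to the root
    (for example "barcode01", or "." for the root itself) to a
    sorted list of file paths.
    """
    root = Path(fast5_root)
    by_folder = {}
    for path in sorted(root.rglob("*.fast5")):
        folder = path.parent.relative_to(root).as_posix()
        by_folder.setdefault(folder, []).append(path)
    return by_folder
```

If every key is a `barcodeNN` folder and `"."` is absent, no FAST5 file sits directly in `fast5_pass`, which would be consistent with the "Loaded 0 FAST5 files" log line if the pipeline does not search subfolders.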
