kishwarshafin / helen
H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)
License: MIT License
Hi,
Could you add HELEN to Bioconda? It is very difficult to install HELEN on a CentOS machine.
Best
Kun
Hi, I am trying to run the Docker image and get a torch runtime error.
Is this an error in the docker image? Thanks!
INFO: POLISH MODULE SELECTED
INFO: RUN-ID: 09012020_134236
INFO: PREDICTION OUTPUT DIRECTORY: /.../helen_out/predictions_09012020_134236
INFO: CALL CONSENSUS STARTING
INFO: OUTPUT FILE: /.../helen_out/predictions_09012020_134236/265L12.cont.cor.fa
INFO: MODEL LOADING TO ONNX
Traceback (most recent call last):
File "/opt/conda/bin/helen", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.7/site-packages/helen/helen.py", line 313, in main
FLAGS.callers)
File "/opt/conda/lib/python3.7/site-packages/helen/modules/python/PolishInterface.py", line 87, in polish_genome
callers)
File "/opt/conda/lib/python3.7/site-packages/helen/modules/python/CallConsensusInterface.py", line 153, in call_consensus
callers, threads_per_caller, num_workers)
File "/opt/conda/lib/python3.7/site-packages/helen/modules/python/models/predict_cpu.py", line 248, in predict_cpu
join=True)
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/opt/conda/lib/python3.7/site-packages/helen/modules/python/models/predict_cpu.py", line 194, in setup
threads)
File "/opt/conda/lib/python3.7/site-packages/helen/modules/python/models/predict_cpu.py", line 65, in predict
torch.set_num_threads(threads)
RuntimeError: set_num_threads expects a positive integer
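The last frame shows torch.set_num_threads being called with a value that is not a positive integer. One way this can happen, assuming the per-caller thread count is derived by integer division of the total threads by the number of callers (an assumption; the actual computation in CallConsensusInterface.py is not shown in this log), is that the quotient rounds down to zero. A minimal sketch of a guard, using the hypothetical helper `threads_per_caller`:

```python
def threads_per_caller(total_threads: int, callers: int) -> int:
    """Split threads across callers, never returning less than 1.

    Hypothetical helper for illustration only; it is not part of HELEN.
    """
    if total_threads < 1 or callers < 1:
        raise ValueError("total_threads and callers must be positive")
    # Integer division yields 0 when callers > total_threads, so clamp
    # to 1 before the value would be passed to torch.set_num_threads.
    return max(1, total_threads // callers)
```

If this is the cause, passing a thread count at least as large as the number of callers should avoid the error without any code change.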
I have downloaded the available models, but there are no models other than human. My data is from a mammal, but not human. How can I get the right model? Thank you for your help.
The server I'm using runs RedHat 4.8 with gcc 4.8.
I have spent more than a day installing HELEN, and the errors keep coming one after another. Has anybody run into similar problems?
Please let me know. Thanks.
Hi,
I ran the MarginPolish process (Docker version) and it failed.
root@ecs-9875:/media/datarun/blnanodata/data# tail marginPolish.log
/usr/bin/time -f '\nDEBUG_MAX_MEM:%M\nDEBUG_RUNTIME:%E\n' /opt/MarginPolish/build/marginPolish reads_2_assembly.bam new.fasta allParams.np.human.guppy-ff-233.json -t 32 -o output/marginpolish_images -f
Running OpenMP with 32 threads.
Parsing model parameters from file: allParams.np.human.guppy-ff-233.json
Calloc failed with request for -2 lots of 16 bytes
Command exited with non-zero status 1
DEBUG_MAX_MEM:3836
DEBUG_RUNTIME:0:00.00
Can you help me fix it?
Hi,
I really like your polishing pipeline and it gives great results so far.
Last week a new and improved version of Guppy with boosted accuracy was released. Are you planning to provide models for this version of Guppy and if so, when can we expect these?
Thank you,
Dominik
Hello!
I have a problem with stitch.py. When I run the script with command A) below, I get error B). Do you have any suggestions for resolving the issue? Thanks in advance!
A)
python3.6 /home/nozawa/Software/helen/stitch.py -t 16
-i /home/nozawa/Data/pal/MinION/MarginPolish_HELEN/consensus/HELEN_prediction.hdf
-o /home/nozawa/Data/pal/MinION/MarginPolish_HELEN/consensus
B)
Traceback (most recent call last):
File "/home/nozawa/Software/helen/stitch.py", line 5, in <module>
from modules.python.Stitch import Stitch
File "/home/nozawa/Software/helen/modules/python/Stitch.py", line 10, in <module>
from build import HELEN
ImportError: cannot import name 'HELEN'
I can't run 'python3 call_consensus.py'; I get the following message:
Traceback (most recent call last):
File "./call_consensus.py", line 3, in <module>
from modules.python.TextColor import TextColor
ImportError: No module named 'modules.python'
Can anyone help me?
Dear HELEN developers,
When running MarginPolish with the allParams.np.human.guppy-ff-235.json model, I get a Calloc error.
udocker run -v /mnt/SCRATCH/michelmo/Projects/MudMinnow/Nhub_guppy305_flye10K:/data mGPolish reads_2_assembly.bam assembly.fasta allParams.np.human.guppy-ff-235.json -t 32 -o /mnt/SCRATCH/michelmo/Projects/MudMinnow/Nhub_guppy305_flye10K/mG305 -f
******************************************************************************
* *
* STARTING 137351bb-4e04-3309-9bf5-ae016625cef7 *
* *
******************************************************************************
executing: sh
Set log level to INFO
Running OpenMP with 32 threads.
> Parsing model parameters from file: allParams.np.human.guppy-ff-235.json
Calloc failed with request for -2 lots of 16 bytes
Command exited with non-zero status 1
DEBUG_MAX_MEM:4608
DEBUG_RUNTIME:0:00.06
The program runs if using another model:
udocker run -v /mnt/SCRATCH/michelmo/Projects/MudMinnow/Nhub_guppy305_flye10K:/data mGPolish reads_2_assembly.bam assembly.fasta allParams.np.human.guppy-ff-233.json -t 32 -o /mnt/SCRATCH/michelmo/Projects/MudMinnow/Nhub_guppy305_flye10K/mG305 -f
******************************************************************************
* *
* STARTING 137351bb-4e04-3309-9bf5-ae016625cef7 *
* *
******************************************************************************
executing: sh
Set log level to INFO
Running OpenMP with 32 threads.
> Parsing model parameters from file: allParams.np.human.guppy-ff-233.json
> Parsing reference sequences from file: assembly.fasta
> Going to write polished reference in : /mnt/SCRATCH/michelmo/Projects/MudMinnow/Nhub_guppy305_flye10K/mG305.fa
...
Is the 235 model file corrupted?
Also, I saw that your latest polishing models are named after guppy 2.3.5.
Were they trained on data basecalled with the HAC configuration files?
We are currently using PromethION data basecalled with the HAC models in guppy 3.0.5 provided by ONT, and I wonder which model would fit the data best.
model files used for basecalling:
md5sum dna_r9.4.1_450bps_hac_prom.cfg c9dc5f42f63c005085ed89e4094e0bb4
md5sum template_r9.4.1_450bps_hac_prom.jsn 6ee479f9ae82a7d26cb47bd24a7882fd
Maybe it would be more accurate to name the models after the basecalling models used rather than after guppy versions?
Thanks,
michel
Request for MarginPolish to either:
Any of those 3 options would be great. I'm not sure what you need in terms of system configuration, but I'll provide you with some basics on my primary test system, and you can let me know if you need more:
O/S CentOS v. 7.6
Dual Intel(R) Xeon(R) CPU E5-2640 v2 CPUs
256GB RAM
GCC v. 4.8.5 default compiler, but other compilers are available
Using Environment modules system
Cmake 3.11
O/S repos include CentOS 7 Basic, Plus, and EPEL
We have a mixture of systems, but the configuration above is pretty typical. On the university HPC, they use SLURM for resource management. On our primary lab servers we can run in standard user mode, or using Torque/PBS. All of my tests have been performed running outside of a resource management system.
Let me know what else you might need.
Thanks,
John
Hi,
I just compiled marginPolish and HELEN according to your installation tutorial and ran them on a test dataset.
The marginPolish images are not being created.
I get the fasta output but not the images (hdf?).
Also, when looking at the marginPolish options, I don't get the same options as are posted on the GitHub repo (the -f parameter is missing).
Mine shows:
./marginPolish
usage: marginPolish <BAM_FILE> <ASSEMBLY_FASTA> <PARAMS> [options]
Version: 1.0.0
Polishes the ASSEMBLY_FASTA using alignments in BAM_FILE.
Required arguments:
BAM_FILE is the alignment of reads to the assembly (or reference).
ASSEMBLY_FASTA is the reference sequence BAM file in fasta format.
PARAMS is the file with marginPolish parameters.
Default options:
-h --help : Print this help screen
-a --logLevel : Set the log level [default = info]
-t --threads : Set number of concurrent threads [default = 1]
-o --outputBase : Name to use for output files [default = 'output']
-r --region : If set, will only compute for given chromosomal region.
Format: chr:start_pos-end_pos (chr3:2000-3000).
Miscellaneous supplementary output options:
-i --outputRepeatCounts : Output base to write out the repeat counts [default = NULL]
-j --outputPoaTsv : Output base to write out the poa as TSV file [default = NULL]
Theirs is:
marginPolish <BAM_FILE> <ASSEMBLY_FASTA> <PARAMS> [options]
Polishes the ASSEMBLY_FASTA using alignments in BAM_FILE.
Required arguments:
BAM_FILE is the alignment of reads to the assembly (or reference).
ASSEMBLY_FASTA is the reference sequence BAM file in fasta format.
PARAMS is the file with marginPolish parameters.
Default options:
-h --help : Print this help screen
-a --logLevel : Set the log level [default = info]
-t --threads : Set number of concurrent threads [default = 1]
-o --outputBase : Name to use for output files [default = 'output']
-r --region : If set, will only compute for given chromosomal region.
Format: chr:start_pos-end_pos (chr3:2000-3000).
HELEN feature generation options:
-f --produceFeatures : output features for HELEN.
-F --featureType : output features of chunks for HELEN. Valid types:
splitRleWeight: [default] run lengths split into chunks
nuclAndRlWeight: split into nucleotide and run length (RL across nucleotides)
rleWeight: weighted likelihood from POA nodes (RLE)
simpleWeight: weighted likelihood from POA nodes (non-RLE)
-L --splitRleWeightMaxRL : max run length (for 'splitRleWeight' type only) [default = 10]
-u --trueReferenceBam : true reference aligned to ASSEMBLY_FASTA, for HELEN
features. Setting this parameter will include labels
in output.
Miscellaneous supplementary output options:
-i --outputRepeatCounts : Output base to write out the repeat counts [default = NULL]
-j --outputPoaTsv : Output base to write out the poa as TSV file [default = NULL]
The whole HELEN feature generation options section is missing from mine.
Do you have a Docker image which I could use?
Thanks,
Michel
Running stitch.py -o . produced a hidden file, .HELEN_consensus.fa. It would be better to use os.path.join() here:
Line 47 in d372a9a
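The hidden file name is what plain string concatenation would produce when the output directory is "." with no trailing separator. The sketch below contrasts the two approaches; it assumes the script builds the path by concatenation (the referenced line is not shown here), and both function names are hypothetical.

```python
import os

CONSENSUS_NAME = "HELEN_consensus.fa"


def consensus_path_concat(output_dir: str) -> str:
    # Plain concatenation: an output_dir of "." becomes the prefix of the
    # file name, giving the hidden file ".HELEN_consensus.fa".
    return output_dir + CONSENSUS_NAME


def consensus_path_join(output_dir: str) -> str:
    # os.path.join inserts the separator when it is missing, so the file
    # lands inside output_dir under its intended name.
    return os.path.join(output_dir, CONSENSUS_NAME)
```

With os.path.join, the result is the same whether or not the user types a trailing slash on the output directory.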
Hi,
Which model is best for polishing highly heterozygous plant genome assemblies (genome size is ~3.6Gb, het rate is >> 2%)? The assemblies (~8Gb) were generated with Flye.
Best,
Kun
Hi,
I am running your new Docker container to streamline assembly polishing and have run into some trouble with marginPolish. It looks like MP is stalling at the very end.
singularity run /net/cn-1/mnt/SCRATCH/michelmo/Projects/CONTAINERS/helen_latest20200519.sif marginpolish ../SimonFlye27_15K.ONTremap.0x904.bam SimonFlye27_15K.fasta /net/cn-1/mnt/SCRATCH/michelmo/Projects/CONTAINERS/MP_r941_guppy344_human.json -t 64 -o . -f
Running OpenMP with 64 threads.
> Parsing model parameters from file: /net/cn-1/mnt/SCRATCH/michelmo/Projects/CONTAINERS/MP_r941_guppy344_human.json
> Parsing reference sequences from file: SimonFlye27_15K.fasta
> Going to write polished reference in : ./output.fa
> Set up bam chunker with chunk size 5000 and overlap 50 (for region=all), resulting in 546365 total chunks
> Polishing 1% complete (5623/546365). Estimated time remaining: 31h 25m
> Polishing 2% complete (10934/546365). Estimated time remaining: 25h 50m
> Polishing 3% complete (16427/546365). Estimated time remaining: 23h 52m
> Polishing 4% complete (21903/546365). Estimated time remaining: 22h 37m
> Polishing 5% complete (27374/546365). Estimated time remaining: 22h 48m
> Polishing 6% complete (32813/546365). Estimated time remaining: 22h 32m
> Polishing 7% complete (38250/546365). Estimated time remaining: 22h 2m
> Polishing 8% complete (43711/546365). Estimated time remaining: 21h 55m
> Polishing 9% complete (49186/546365). Estimated time remaining: 22h 18m
> Polishing 10% complete (54652/546365). Estimated time remaining: 22h 35m
> Polishing 11% complete (60114/546365). Estimated time remaining: 22h 50m
> Polishing 12% complete (65596/546365). Estimated time remaining: 22h 55m
> Polishing 13% complete (71045/546365). Estimated time remaining: 22h 54m
> Polishing 14% complete (76500/546365). Estimated time remaining: 22h 52m
> Polishing 15% complete (81977/546365). Estimated time remaining: 22h 50m
> Polishing 16% complete (87432/546365). Estimated time remaining: 22h 48m
> Polishing 17% complete (92924/546365). Estimated time remaining: 22h 43m
.....
> Polishing 91% complete (497222/546365). Estimated time remaining: 2h 41m
> Polishing 92% complete (502673/546365). Estimated time remaining: 2h 23m
> Polishing 93% complete (508160/546365). Estimated time remaining: 2h 5m
> Polishing 94% complete (513630/546365). Estimated time remaining: 1h 47m
> Polishing 95% complete (519110/546365). Estimated time remaining: 1h 29m
> Polishing 96% complete (524517/546365). Estimated time remaining: 1h 12m
> Polishing 97% complete (530110/546365). Estimated time remaining: 54m 3s
> Polishing 98% complete (535445/546365). Estimated time remaining: 35m 59s
> Polishing 99% complete (541585/546365). Estimated time remaining: 17m 57s
The H5 files have been created and written to, but no more writing has happened for the last few hours.
The process is still running but has used only 1 thread for the last 5 hours.
Is this expected? Does marginPolish do some final wrap-up at the end that takes longer than expected?
368136 michelmo 20 0 121.9g 119.1g 1484 S 100.0 3.9 112466:10 marginPolish
Thank you,
Michel
Hi, I used Shasta to assemble nanopore sequencing data. I have hundreds of assemblies, and I am only interested in the sequences that do not appear in the reference genome (NRS). Can I use M-H to polish only the NRS I extracted from the assemblies instead of the entire assembly fasta file?
I am running helen polish and received the warning message below. Is this a harmless warning, or should I do something about it?
INFO: POLISH MODULE SELECTED
INFO: RUN-ID: 04112022_102154
INFO: PREDICTION OUTPUT DIRECTORY: /HELEN/predictions_04112022_102154
INFO: CALL CONSENSUS STARTING
INFO: OUTPUT FILE: /HELEN/predictions_04112022_102154/output_AngusONTpolish.fa
INFO: MODEL LOADING TO ONNX
INFO: SAVING MODEL TO ONNX
/opt/conda/lib/python3.7/site-packages/torch/onnx/symbolic_opset9.py:1436: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable lenght with GRU can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
"or define the initial states (h0/c0) as inputs of the model. ")
INFO: TORCH THREADS SET TO: 4.
Dear,
In README.md Step 2 example script:
samtools sort -@ 32 unsorted.bam | samtools view > reads_2_assembly.0x904q60.bam
should be
samtools sort -@ 32 unsorted.bam >reads_2_assembly.0x904q60.bam
Best,
Jia
Hi,
I ran into the following error when running HELEN (CPU-only installation) on a Flye assembly:
python3 ~/tools/helen/call_consensus.py -i . -m /net/fs-1/home01/michelmo/tools/helen/r941_flip231_v001.pkl
INFO: OUTPUT DIRECTORY: ./output/
INFO: TORCH THREADS SET TO: 1.
Loading data
Traceback (most recent call last):
File "/mnt/users/michelmo/tools/helen/call_consensus.py", line 133, in <module>
FLAGS.gpu_mode)
File "/mnt/users/michelmo/tools/helen/call_consensus.py", line 53, in polish_genome
predict(image_filepath, output_filename, model_path, batch_size, num_workers, threads, gpu_mode)
File "/net/fs-1/home01/michelmo/tools/helen/modules/python/models/predict.py", line 61, in predict
test_data = SequenceDataset(test_file)
File "/net/fs-1/home01/michelmo/tools/helen/modules/python/models/dataloader_predict.py", line 35, in __init__
with h5py.File(hdf5_file_path, 'r') as hdf5_file:
File "/mnt/users/michelmo/.conda/envs/HELEN/lib/python3.7/site-packages/h5py/_hl/files.py", line 394, in __init__
swmr=swmr)
File "/mnt/users/michelmo/.conda/envs/HELEN/lib/python3.7/site-packages/h5py/_hl/files.py", line 170, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
I suspect that the h5 files are somehow corrupted (some are empty), but that's difficult for me to assess. I expected the total size of the images to be larger than the 5.9M I got from a 2.5 Gb genome.
total 5.9M
420K mGimageRainbowtrout.T00.h5 212K mGimageRainbowtrout.T11.h5 316K mGimageRainbowtrout.T22.h5
212K mGimageRainbowtrout.T01.h5 0 mGimageRainbowtrout.T12.h5 0 mGimageRainbowtrout.T23.h5
212K mGimageRainbowtrout.T02.h5 0 mGimageRainbowtrout.T13.h5 212K mGimageRainbowtrout.T24.h5
0 mGimageRainbowtrout.T03.h5 212K mGimageRainbowtrout.T14.h5 212K mGimageRainbowtrout.T25.h5
420K mGimageRainbowtrout.T04.h5 212K mGimageRainbowtrout.T15.h5 420K mGimageRainbowtrout.T26.h5
212K mGimageRainbowtrout.T05.h5 0 mGimageRainbowtrout.T16.h5 0 mGimageRainbowtrout.T27.h5
212K mGimageRainbowtrout.T06.h5 0 mGimageRainbowtrout.T17.h5 0 mGimageRainbowtrout.T28.h5
0 mGimageRainbowtrout.T07.h5 212K mGimageRainbowtrout.T18.h5 212K mGimageRainbowtrout.T29.h5
420K mGimageRainbowtrout.T08.h5 212K mGimageRainbowtrout.T19.h5 212K mGimageRainbowtrout.T30.h5
0 mGimageRainbowtrout.T09.h5 420K mGimageRainbowtrout.T20.h5 212K mGimageRainbowtrout.T31.h5
420K mGimageRainbowtrout.T10.h5 212K mGimageRainbowtrout.T21.h5
MarginPolish was run with default settings:
/marginPolish $BAM \
$ASM \
/net/fs-1/home01/michelmo/tools/marginPolish/params/allParams.np.human.guppy-ff-233.json \
-t 32 \
-o mGimageRainbowtrout \
-f 2>&1 | tee mG.log
Any ideas or hints about what could have gone wrong would be appreciated.
Michel
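The "file signature not found" error is what h5py reports when a file does not start with the HDF5 signature, and a zero-byte file like several in the listing above cannot contain one. The following sketch lists such files in a directory before they are handed to call_consensus.py. The helper names are hypothetical, and the check only looks at offset 0, so a valid file with a user block would be reported as suspect.

```python
import os

# The 8-byte signature that begins an HDF5 file (when no user block is present).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str) -> bool:
    """Return True if the file is non-empty and starts with the HDF5 signature."""
    if os.path.getsize(path) < len(HDF5_SIGNATURE):
        return False
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def find_suspect_files(directory: str) -> list:
    """Return the sorted names of .h5 files that are empty or lack the signature."""
    suspects = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if name.endswith(".h5") and not looks_like_hdf5(full_path):
            suspects.append(name)
    return suspects
```

Calling `find_suspect_files(".")` in the image directory would return the names of the empty files. This only identifies them; it does not explain why MarginPolish left them empty.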
Hi:
MarginPolish && HELEN is an excellent pipeline for polishing ONT assemblies; it is easy to run and has very high accuracy. I am using the latest models to polish some human data. I wonder what data you used to train the models MP_r941_guppy344_human.json and HELEN_r941_guppy344_human.pkl. The training datasets for these two models are not mentioned in the paper. Which species and which chromosomes were used: HG002, CHM13, or HG00733, and chr1-6 or chr1-19, chr21-22?
Neng
Dear author,
Thanks for your great assembly tool Shasta and the polishing tools MarginPolish and HELEN. I used Shasta to assemble a genome and generated the Assembly.fasta file.
Next, I tried to use HELEN to polish the genome. I generated the .bam file with minimap2 and indexed it with samtools. However, the marginpolish step generates an empty (zero-size) FASTA file, output.fa. The log is as follows.
Running OpenMP with 2 threads.
> Parsing model parameters from file: ./helen_model/MP_r941_guppy344_human.json
> Parsing reference sequences from file: Assembly.fasta
> Going to write polished reference in : margin_image/output.fa
> Set up bam chunker with chunk size 5000 and overlap 50 (for region=all), resulting in 538336 total chunks
Warning! ***HDF5 library version mismatched error***
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.8.12, library is 1.8.11
SUMMARY OF THE HDF5 CONFIGURATION
=================================
General Information:
-------------------
HDF5 Version: 1.8.11
Configured on: Wed May 8 16:20:56 CDT 2013
Configured by: hdftest@koala
Configure mode: production
Host system: x86_64-unknown-linux-gnu
Uname information: Linux koala 2.6.18-348.1.1.el5 #1 SMP Tue Jan 22 16:19:19 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Byte sex: little-endian
Libraries: static, shared
Installation point: /mnt/scr1/pre-release/hdf5/v1811/thg-builds/koala
Compiling Options:
------------------
Compilation Mode: production
C Compiler: /usr/bin/gcc ( gcc (GCC) 4.1.2 20080704 )
CFLAGS:
H5_CFLAGS: -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wno-long-long -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Wnonnull -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wvolatile-register-var -O3 -fomit-frame-pointer -finline-functions
AM_CFLAGS:
CPPFLAGS:
H5_CPPFLAGS: -D_POSIX_C_SOURCE=199506L -DNDEBUG -UH5_DEBUG_API
AM_CPPFLAGS: -I/mnt/hdf/packages/szip/shared/encoder/Linux2.6-x86_64-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_BSD_SOURCE
Shared C Library: yes
Static C Library: yes
Statically Linked Executables: yes
LDFLAGS:
H5_LDFLAGS:
AM_LDFLAGS: -L/mnt/hdf/packages/szip/shared/encoder/Linux2.6-x86_64-gcc/lib
Extra libraries: -lsz -lz -lrt -ldl -lm
Archiver: ar
Ranlib: ranlib
Debugged Packages:
API Tracing: no
Languages:
----------
Fortran: yes
Fortran Compiler: /usr/bin/gfortran ( GNU Fortran (GCC) 4.1.2 20080704 )
Fortran 2003 Compiler: no
Fortran Flags:
H5 Fortran Flags:
AM Fortran Flags:
Shared Fortran Library: yes
Static Fortran Library: yes
C++: yes
C++ Compiler: /usr/bin/g++ ( g++ (GCC) 4.1.2 20080704 )
C++ Flags:
H5 C++ Flags:
AM C++ Flags:
Shared C++ Library: yes
Static C++ Library: yes
Features:
---------
Parallel HDF5: no
High Level library: yes
Threadsafety: no
Default API Mapping: v18
With Deprecated Public Symbols: yes
I/O filters (external): deflate(zlib),szip(encoder)
I/O filters (internal): shuffle,fletcher32,nbit,scaleoffset
MPE: no
Direct VFD: no
dmalloc: no
Clear file buffers before write: yes
Using memory checker: no
Function Stack Tracing: no
GPFS: no
Strict File Format Checks: no
Optimization Instrumentation: no
Large File Support (LFS): yes
Bye...
The command I used to run marginpolish is
marginpolish reads_2_assembly.0x904q60.bam Assembly.fasta $MODELDIR/MP_r941_guppy344_human.json -t 2 -o margin_image/output -f
Do you have any solutions to my issue?
Best
Xiaofei
Hello,
I am trying to run helen in polishing mode. Here is my command:
helen polish -i marginPolish_images -m helen_models/HELEN_r941_guppy344_microbial.pkl -o helen_polish/ -t 16
However, I get the following error:
Traceback (most recent call last):
File "/lustre-gseg/software/bin/helen", line 33, in <module>
sys.exit(load_entry_point('helen==0.0.23', 'console_scripts', 'helen')())
File "/lustre-gseg/software/MarginPolish-HELEN/py36_venv/lib64/python3.6/site-packages/helen/helen.py", line 313, in main
FLAGS.callers)
File "/lustre-gseg/software/MarginPolish-HELEN/py36_venv/lib64/python3.6/site-packages/helen/modules/python/PolishInterface.py", line 87, in polish_genome
callers)
File "/lustre-gseg/software/MarginPolish-HELEN/py36_venv/lib64/python3.6/site-packages/helen/modules/python/CallConsensusInterface.py", line 153, in call_consensus
callers, threads_per_caller, num_workers)
File "/lustre-gseg/software/MarginPolish-HELEN/py36_venv/lib64/python3.6/site-packages/helen/modules/python/models/predict_cpu.py", line 248, in predict_cpu
join=True)
File "/lustre-gseg/software/MarginPolish-HELEN/py36_venv/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/lustre-gseg/software/MarginPolish-HELEN/py36_venv/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/lustre-gseg/software/MarginPolish-HELEN/py36_venv/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 108, in join
(error_index, name)
Exception: process 6 terminated with signal SIGKILL
Please could you shed some light on this?
Many thanks in advance.
Our lab is doing research on a channel protein whose sequencing errors seem different from those of the R9 pore, so we want to train MarginPolish/HELEN on this new data. Could you tell me how to do it? Thanks.
Hi Kishwar,
Can I use this polisher on a complex plant genome?
Does the model trained on human sequencing data work for plant species?
Thank you.
Jolvii
stitch.py throws a ValueError if one of my contigs has the following name, but works fine if I rename it to something like CLUS3951:
bc.1+2.clus.3951.fa.poa:1.0-7835.0
It is apparently trying to convert the 7835.0 at the end into an integer:
File "stitch.py", line 93, in <module>
process_marginpolish_h5py(FLAGS.sequence_hdf, FLAGS.output_dir, FLAGS.threads)
File "stitch.py", line 58, in process_marginpolish_h5py
consensus_sequence = stich_object.create_consensus_sequence(hdf_file_path, contig, chunk_keys, threads)
File "modules/python/Stitch.py", line 280, in create_consensus_sequence
sequence_chunk_key_list.append((contig, int(st), int(end)))
ValueError: invalid literal for int() with base 10: '7835.0'
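The traceback shows int(end) failing on the string '7835.0', because Python's int() does not accept a decimal point in a string. A minimal sketch of a more tolerant conversion follows. The helper `parse_position` is hypothetical and not part of HELEN, and how st and end are split out of the chunk key is not shown in the traceback, so it is not assumed here.

```python
def parse_position(text: str) -> int:
    """Convert a position string such as '7835' or '7835.0' to an int.

    Hypothetical helper for illustration. Values with a fractional part
    (for example '7835.5') are rejected rather than silently truncated.
    """
    value = float(text)
    if not value.is_integer():
        raise ValueError(f"non-integral position: {text!r}")
    return int(value)
```

Whether a contig name containing such values should be accepted at all is a design decision for the maintainers; renaming the contig, as described above, remains the workaround that is known to work.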
Hi,
I'm wondering if M + H is compatible with draft assemblies from other assemblers.
It would be great if there were already some documentation showing comparisons.
Thanks!
Steve
Hi,
I only found comparisons with racon and medaka. However, the best results I have seen are from nanopolish, so I wonder how M + H compares to it.
Thanks,
Adrian
Hi,
I ran HELEN to polish a draft assembly of NA12878 chromosome 21, but there seem to be some problems with the polished results.
First I ran marginPolish to generate image features with the command:
marginPolish read2assembly.sort.bam ../assembly.fasta ~/tools/MarginPolish/params/allParams.np.human.r94-g235.json -o chr21_margin -t 60 -f
Second, I ran helen to generate a more accurate assembly with the command:
helen polish -i output_files/ -m ~/tools/helen/models/HELEN_r941_guppy344_human.pkl -b 512 -w 4 -t 60 -o helenPolish -p chr21_helen -g
After that, I used pomoxis to evaluate the error rate of the polished assembly.
The following two figures are the results of marginPolish and helen.
Neng