nanoporetech / pod5-file-format
Pod5: a high performance file format for nanopore reads.
Home Page: https://pod5.nanoporetech.com
License: Other
Hi, thank you so much for sharing this tool and its detailed documentation.
I would like to know how fast the conversion from fast5 to pod5 is. For PromethION data there are a lot of .fast5 files, maybe 20 million reads. If I convert these files to .pod5, how long will it take? And can I use multiple CPU cores to do it?
Thank you so much!
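A back-of-the-envelope estimate can be had from the throughput numbers that pod5 prints during conversion; the sketch below assumes a sustained rate of ~2,500 reads/s (a figure seen in a conversion log elsewhere in this thread; real rates depend on storage, CPU, and the thread count):

```python
# Rough conversion-time estimate for a PromethION-scale run.
# ASSUMPTION: ~2,500 reads/s sustained throughput; actual rates vary
# widely with disk speed, CPU, and the --threads setting.
reads = 20_000_000
reads_per_second = 2_500

seconds = reads / reads_per_second
hours = seconds / 3600
print(f"~{hours:.1f} hours at {reads_per_second} reads/s")
```

The converter does accept a thread count (-t/--threads, per the --help output quoted later in this document), so the wall-clock time should scale down with available cores and I/O bandwidth.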
Hi, I am trying to convert some fast5 files to pod5. The command checks the files and starts converting them, but it crashes after a while.
pod5 convert fast5 fast5/*.fast5 -r -o pod5/ -O fast5/
Converting 886 Fast5s: 3%|##1 | 111250/3543065 [00:47<22:58, 2488.94Reads/s]
Killed
I'm trying to build pod5 based on the instructions using conan. I was able to successfully obtain all the dependencies with conan and create the build directory, but when I try to run make I get the error listed below.
In file included from pod5-file-format/c++/pod5_format/internal/combined_file_utils.h:3,
from pod5-file-format/c++/pod5_format/file_reader.cpp:3:
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h: In function ‘const char* Minknow::ReadsFormat::EnumNameContentType(Minknow::ReadsFormat::ContentType)’:
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h:49:20: error: ‘IsOutRange’ is not a
member of ‘flatbuffers’
49 | if (flatbuffers::IsOutRange(e, ContentType_ReadsTable, ContentType_OtherIndex)) return "";
| ^~~~~~~~~~
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h: In function ‘const char* Minknow::ReadsFormat::EnumNameFormat(Minknow::ReadsFormat::Format)’:
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h:76:20: error: ‘IsOutRange’ is not a
member of ‘flatbuffers’
76 | if (flatbuffers::IsOutRange(e, Format_FeatherV2, Format_FeatherV2)) return "";
| ^~~~~~~~~~
In file included from pod5-file-format/c++/pod5_format/internal/combined_file_utils.h:3,
from pod5-file-format/c++/pod5_format/file_writer.cpp:3:
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h: In function ‘const char* Minknow::ReadsFormat::EnumNameContentType(Minknow::ReadsFormat::ContentType)’:
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h:49:20: error: ‘IsOutRange’ is not a
member of ‘flatbuffers’
49 | if (flatbuffers::IsOutRange(e, ContentType_ReadsTable, ContentType_OtherIndex)) return "";
| ^~~~~~~~~~
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h: In function ‘const char* Minknow::ReadsFormat::EnumNameFormat(Minknow::ReadsFormat::Format)’:
pod5-file-format/build/c++/pod5_flatbuffers/footer_generated.h:76:20: error: ‘IsOutRange’ is not a
member of ‘flatbuffers’
76 | if (flatbuffers::IsOutRange(e, Format_FeatherV2, Format_FeatherV2)) return "";
| ^~~~~~~~~~
make[2]: *** [c++/CMakeFiles/pod5_format.dir/build.make:94: c++/CMakeFiles/pod5_format.dir/pod5_format/file_reader.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [c++/CMakeFiles/pod5_format.dir/build.make:80: c++/CMakeFiles/pod5_format.dir/pod5_format/file_writer.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:244: c++/CMakeFiles/pod5_format.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
Steps to reproduce:
mkdir build
cd build
conan remote add -I 0 conancenter https://center.conan.io
conan install --build=missing -s build_type=Release ..
cmake -DUSE_CONAN=ON -DCMAKE_BUILD_TYPE=Release ..
make
Hello,
When I convert a set of fast5 files to pod5, the adc_max/adc_min values are zero.
The description of these fields states that the digitisation comes from the max - min of these values; however, they are zero in all of my reads, so I can't calculate the expected 2048.0.
An alternative way to calculate the digitisation is from the adc_range; however, when a fast5 file is read by the converter ( https://github.com/nanoporetech/pod5-file-format/blob/dcc0b99a45f742f06fe45d7d99f4dc8a0255e5a7/python/pod5_format/pod5_format/writer.py ), this value is used together with the digitisation to calculate the scale, and only the scale is recorded; the adc_range is discarded.
Is it possible to keep the adc_range value in the conversion step, or ideally both the digitisation and the adc_range?
Data dumps below.
Cheers,
James
[types.RunInfo.fields.adc_max]
type = "int16"
description = "The maximum ADC value that might be encountered. This is a hardware constraint."
[types.RunInfo.fields.adc_min]
type = "int16"
description = "The minimum ADC value that might be encountered. This is a hardware constraint. adc_max - adc_min is the digitisation."
#### read.run_info dump
run info
acquisition_id: bfdfd1d840e2acaf5c061241fd9b8e5c3cfe729f
acquisition_start_time: 2020-10-27 05:41:50+00:00
adc_max: 0 <-----| these are zero
adc_min: 0 <-----|
context_tags
barcoding_enabled: 0
basecall_config_filename: dna_r9.4.1_450bps_hac_prom.cfg
experiment_duration_set: 4320
experiment_type: genomic_dna
local_basecalling: 1
package: bream4
package_version: 6.0.7
sample_frequency: 4000
sequencing_kit: sqk-lsk109
experiment_name:
flow_cell_id: PAF25452
flow_cell_product_code: FLO-PRO002
protocol_name: sequencing/sequencing_PRO002_DNA:FLO-PRO002:SQK-LSK109
protocol_run_id: 97d631c6-c622-473d-9e7d-3cb9297b0036
protocol_start_time: 1970-01-01 00:00:00+00:00
sample_id: NA12878_SRE
sample_rate: 4000
sequencing_kit: sqk-lsk109
sequencer_position: 3A
sequencer_position_type: promethion
software: python-pod5-converter
system_name:
system_type:
tracking_id
asic_id: 0004A30B00F25467
asic_id_eeprom: 0004A30B00F25467
asic_temp: 31.996552
asic_version: Unknown
auto_update: 0
auto_update_source: https://mirror.oxfordnanoportal.com/software/MinKNOW/
bream_is_standard: 0
configuration_version: 4.0.13
device_id: 3A
device_type: promethion
distribution_status: stable
distribution_version: 20.06.9
exp_script_name: sequencing/sequencing_PRO002_DNA:FLO-PRO002:SQK-LSK109
exp_script_purpose: sequencing_run
exp_start_time: 2020-10-27T05:41:50Z
flow_cell_id: PAF25452
flow_cell_product_code: FLO-PRO002
guppy_version: 4.0.11+f1071ce
heatsink_temp: 32.164288
hostname: PC24A004
hublett_board_id: 013b01308fa78662
hublett_firmware_version: 2.0.14
installation_type: nc
ip_address:
local_firmware_file: 1
mac_address:
operating_system: ubuntu 16.04
protocol_group_id: PLPN243131
protocol_run_id: 97d631c6-c622-473d-9e7d-3cb9297b0036
protocols_version: 6.0.7
run_id: bfdfd1d840e2acaf5c061241fd9b8e5c3cfe729f
sample_id: NA12878_SRE
satellite_board_id: 013c763bef6cca9d
satellite_firmware_version: 2.0.14
usb_config: firm_1.2.3_ware#rbt_4.5.6_rbt#ctrl#USB3
version: 4.0.3
### read and read.calibration
read_id: 000dab68-15a2-43c1-b33d-9598d95b37de
channel: 861
well: 1
pore_type: not_set
read_number: 261
start_sample: 3856185
end_reason: data_service_unblock_mux_change
median_before: 204.2
sample_count: 331742
byte_count: 226302
signal_compression_ratio: 0.341
scale: 0.36551764607429504
offset: -223.0
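Given the dump above, the discarded adc_range can be recovered from the recorded scale when the digitisation is known, because the converter computes scale = adc_range / digitisation. A minimal sketch, assuming the expected PromethION digitisation of 2048:

```python
# Recover adc_range from the recorded calibration scale.
# ASSUMPTION: digitisation = 2048 (the value expected for this
# PromethION run); the converter stores scale = adc_range / digitisation.
scale = 0.36551764607429504  # from the read dump above
digitisation = 2048.0

adc_range = scale * digitisation
print(f"adc_range = {adc_range:.2f} pA")
```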
Hi,
I need to partition 1500 reads that are spread across 1375 pod5 files into 5 new pod5 files. However, each time I try, I end up with a different output.
I made a mapping.csv file detailing read IDs and the corresponding filename I want each read to end up in. Here are the first three lines:
chikungunya_virus.pod5,2993b28e-b5f0-44dd-8612-e0fce1167e22
chikungunya_virus.pod5,27b8d110-bf05-4b78-ab4d-a5e8661343f3
chikungunya_virus.pod5,42292daf-5d66-47a5-a6d5-3900b83462dc
Then I run the following command:
pod5 subset --threads 5 --csv mapping.csv *.pod5
and get this message:
Subsetting 15000 read_ids into 5 outputs using 5 workers
and after it appears to have completed successfully, I check the POD5 file content using:
pod5 inspect summary *.pod5
Issue: sometimes I end up with POD5 files containing only a fraction of the reads/raw signals, while at other times I just get an error, for example: “Failed to open pod5 file: zika_virus.pod5: IOError: Invalid signature in file”.
When I run pod5 subset multiple times, I end up with different numbers of reads in each POD5 file, and always a low number of reads.
Any advice on what I should try next?
Thanks in advance!
Wim
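For what it's worth, a mapping file in the format shown above (target filename, read ID per line) can be generated with the standard library; a sketch using a hypothetical assignment of read IDs to targets:

```python
import csv

# Hypothetical assignment of read IDs to output pod5 targets.
targets = {
    "chikungunya_virus.pod5": [
        "2993b28e-b5f0-44dd-8612-e0fce1167e22",
        "27b8d110-bf05-4b78-ab4d-a5e8661343f3",
    ],
}

# One "filename,read_id" row per read, matching the mapping.csv above.
with open("mapping.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    for filename, read_ids in targets.items():
        for read_id in read_ids:
            writer.writerow([filename, read_id])
```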
Ideally the pod5 tool guides would include an example of generating a summary file from a folder of pod5s and then subsetting based on the channel information, to facilitate easier duplex calling. nanoporetech/dorado#68 (comment)
I am struggling a bit with the formats and I am not sure that what is called summary by pod5 subset is even the same file as is output from pod5 inspect reads.
From the guides:
pod5 inspect reads https://github.com/nanoporetech/pod5-file-format/blob/master/python/pod5/README.md#pod5-inspect-reads
"Inspect all reads and print a csv table"
pod5 subset https://github.com/nanoporetech/pod5-file-format/blob/master/python/pod5/README.md#subsetting-from-a-summary
"pod5 subset can dynamically generate output targets and collect associated reads based on a tab-separated file "
I tried to convert the csv file from pod5 inspect to tab-separated using csvkit and provide that file as the summary, but that gave me the following error: "Number of passed names did not match number of header fields in the file"
What I tried to run:
pod5 inspect reads pod5_original/*pod5 > summary.csv
csvformat -T summary.csv > summary.tsv
pod5 subset pod5_original/*pod5 --output barcode_channel_subset --summary summary.tsv --columns channel --template "{channel}.subset.pod5"
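As an aside, the CSV-to-TSV step can be done without csvkit using the standard csv module; the sketch below also strips whitespace around each cell, on the assumption (unverified) that padding in the header row is what makes the field names fail to match:

```python
import csv

# A tiny stand-in for `pod5 inspect reads` output (hypothetical columns;
# the real summary has many more fields).
with open("summary.csv", "w", newline="") as fh:
    fh.write("read_id, channel, well\n")
    fh.write("2993b28e-b5f0-44dd-8612-e0fce1167e22, 861, 1\n")

# Convert to tab-separated, stripping padding around every cell so the
# header names match exactly.
with open("summary.csv", newline="") as src, \
        open("summary.tsv", "w", newline="") as dst:
    writer = csv.writer(dst, delimiter="\t")
    for row in csv.reader(src):
        writer.writerow(cell.strip() for cell in row)
```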
Hello,
When trying to convert fast5 into pod5 using pod5-convert-from-fast5, I get the following error message relating to the channel_id. Any suggestions or help?
Error in file Raw/0/AllenMiseq2_20170425_FN_MN19868_mux_scan_sample_id_37726_ch507_read77_strand.fast5: "Unable to open object (object 'channel_id' doesn't exist)"
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/pod5_format_tools/pod5_convert_from_fast5.py", line 157, in get_reads_from_files
channel_id = inp[key]["channel_id"]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/usr/local/lib/python3.8/site-packages/h5py/_hl/group.py", line 288, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
Hi,
I wonder if there is any support to store pod5 file in Apache parquet format?
Thanks,
William
Hi
For some unknown reason we have some fast5 files in a skip folder that appear to be corrupted.
The pod5 convert runs happily until it reaches one of the "corrupt" files and then crashes completely. I can then manually remove the reported file, but I have to start over again.
I would prefer it if corrupt files were simply reported and left out of the pod5 conversion.
Best regards
Rasmus
Error message example:
pod5 convert fast5 corrupt_fast5s/ dummy.pod5
Converting 16 fast5 files..
0 reads, 0 Samples, 0/16 files, 0.0 MB/s
Error processing corrupt_fast5s/PAK66154_skip_ecff0cbe_52bfab07_177.fast5
Sub-process trace:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.10/site-packages/pod5/tools/pod5_convert_from_fast5.py", line 309, in get_reads_from_files
_f5[read_id],
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/ubuntu/.local/lib/python3.10/site-packages/h5py/_hl/group.py", line 357, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (file read failed: time = Mon Mar 13 15:03:13 2023\n, filename = 'corrupt_fast5s/PAK66154_skip_ecff0cbe_52bfab07_177.fast5', file descriptor = 10, errno = 5, error message = 'Input/output error', buf = 0x557f9c22e9d0, total read size = 80, bytes this sub-read = 80, bytes actually read = 18446744073709551615, offset = 0)"
An unexpected error occurred:
POD5 has encountered an error: ''
For detailed information set POD5_DEBUG=1'
I ended up looping until no more corrupt files were left:
RUNAGAIN=1
while [ "$RUNAGAIN" -gt 0 ]
do
    pod5 convert fast5 ./20230301_1540_1E_PAK66154_ecff0cbe/fast5_skip/*.fast5 out.pod5 2> errout.txt
    corrupt_file=$(grep "filename" errout.txt | sed -E "s/.*filename = '(.*.fast5).*/\1/")
    if [ -n "$corrupt_file" ] && [ -f "$corrupt_file" ]
    then
        mv "$corrupt_file" corrupt_fast5s/
        rm errout.txt
        rm out.pod5
    else
        echo "No more corrupt files"
        RUNAGAIN=0
    fi
done
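An alternative to re-running the converter after each crash is to pre-screen the inputs. Every valid fast5 file is an HDF5 file, which begins with the fixed 8-byte signature \x89HDF\r\n\x1a\n, and files on a failing disk tend to raise I/O errors as soon as they are read end-to-end. A stdlib sketch along those lines (it catches truncated or unreadable files, not every possible form of corruption):

```python
from pathlib import Path
import tempfile

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # fixed HDF5 superblock signature

def is_readable_fast5(path) -> bool:
    """True if the file starts with the HDF5 magic and reads to the end."""
    try:
        with open(path, "rb") as fh:
            if fh.read(8) != HDF5_MAGIC:
                return False
            while fh.read(1024 * 1024):  # a full read surfaces I/O errors
                pass
        return True
    except OSError:
        return False

# Demo on throwaway files; on real data, glob the fast5_skip directory.
tmp = Path(tempfile.mkdtemp())
(tmp / "ok.fast5").write_bytes(HDF5_MAGIC + b"\x00" * 64)
(tmp / "bad.fast5").write_bytes(b"not an hdf5 file")
good = [p.name for p in sorted(tmp.glob("*.fast5")) if is_readable_fast5(p)]
print(good)  # ['ok.fast5']
```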
Hello!
I have tried to install pod5 via pip inside a conda environment multiple times, on different machines, without success. The typical error is:
-- Checking for module 'arrow'
-- No package 'arrow' found
CMake Error at /home/asan/miniconda3/envs/4pod5-env/share/cmake-3.25/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Arrow (missing: ARROW_INCLUDE_DIR ARROW_LIB_DIR
ARROW_FULL_SO_VERSION ARROW_SO_VERSION)
Call Stack (most recent call first):
/home/asan/miniconda3/envs/4pod5-env/share/cmake-3.25/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
cmake_modules/FindArrow.cmake:450 (find_package_handle_standard_args)
cmake_modules/FindArrowPython.cmake:46 (find_package)
CMakeLists.txt:231 (find_package)
-- Configuring incomplete, errors occurred!
See also "/tmp/pip-install-959xe01e/pyarrow_81abfae6eef142db98c35f0c4d548b21/build/temp.linux-x86_64-cpython-311/CMakeFiles/CMakeOutput.log".
error: command '/home/asan/miniconda3/envs/4pod5-env/bin/cmake' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyarrow
Failed to build pyarrow
ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects
The conan route did not lead to any success either.
The conda environment contains python 3.11, cmake, boost and compilers (gcc_linux-64, gxx_linux-64 and gfortran_linux-64). The hosts are Ubuntu 22.04 and 23.04. Installing arrow (or pyarrow) inside conda, either via the conda package manager or via pip, did not do the job.
What would be a solution? Are there any plans to add pod5 to the conda repository?
Hi,
I am trying to install this program on an HPC system (PBSpro) but I am not sure which part of the instructions I should follow.
I have tried both "Developing with conan" and "Pre commit".
FYI, this is what I did:
conda activate pip and cmake
git clone https://github.com/nanoporetech/pod5-file-format.git
cd pod5-file-format
git submodule update --init --recursive
mkdir build
cd build
conan install --build=missing -s build_type=Release ..
cmake -DUSE_CONAN=ON -DCMAKE_BUILD_TYPE=Release ..
Both attempts failed at the cmake step.
ERROR: boost/1.78.0: Error in build() method, line 875
self.run(full_command, run_environment=True)
ConanException: Error 1 while executing b2 -q target-os=linux architecture=x86 address-model=64 binary-format=elf abi=sysv --layout=system --user-config=/home/uqhjung3/.conan/data/boost/1.78.0///source/source_subfolder/tools/build/user-config.jam -sNO_ZLIB=0 -sNO_BZIP2=0 -sNO_LZMA=1 -sNO_ZSTD=1 boost.locale.icu=off --disable-icu boost.locale.iconv=on boost.locale.iconv.lib=libc threading=multi visibility=hidden link=static variant=release --with-atomic --with-chrono --with-container --with-context --with-contract --with-coroutine --with-date_time --with-exception --with-filesystem --with-iostreams --with-locale --with-log --with-program_options --with-random --with-regex --with-serialization --with-stacktrace --with-system --with-test --with-thread --with-timer --with-type_erasure --with-wave toolset=gcc define=GLIBCXX_USE_CXX11_ABI=0 pch=on cxxflags="-fPIC -DBOOST_STACKTRACE_ADDR2LINE_LOCATION=/usr/bin/addr2line" install --prefix=/home/uqhjung3/.conan/data/boost/1.78.0///package/cf5b1011055d170fc18a05ba048979d2089d1695 -j24 --abbreviate-paths -d0 --debug-configuration --build-dir="/home/uqhjung3/.conan/data/boost/1.78.0//_/build/cf5b1011055d170fc18a05ba048979d2089d1695"
Any idea or suggestion on this matter?
Many thanks in advance!
Taek
Is there a roadmap for when POD5 will be natively written by MinKNOW?
If it writes POD5, will it write many files (as for fast5), just a handful of files, or even a single POD5 file in the end?
Any info on that?
Hi, when I run the pod5 convert fast5 command, I get this message after it has run for a while:
Sub-process trace:
A process in the process pool was terminated abruptly while the future was running or pending.
I do not know the cause, and I also do not know how to solve this problem. Thank you very much!
I use the following line in a bash script to convert my fast5 files to pod5
pod5 convert fast5 *.fast5 --output converted.pod5
In general, it works as expected. However, if I run the bash script on bigger data sets, conversion starts, reaches 100% and nothing happens. The next parts of my script are not executed. If I use the same script on a "smaller" data set, the conversion and the whole script finishes as expected.
Interestingly, if I terminate the conversion with "ctrl + C" when it reaches 100% the other steps are getting executed.
Here is the output of the terminal when killing the conversion:
Converting 206 Fast5s: 100%|#######| 821696/821696 [01:40<00:00, 8144.87Reads/s]
^CException ignored in atexit callback: <function _exit_function at 0x7f91e832ecb0>
Traceback (most recent call last):
File "/home/nanopore/software/anaconda3/lib/python3.10/multiprocessing/util.py", line 360, in _exit_function
_run_finalizers()
File "/home/nanopore/software/anaconda3/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers
finalizer()
File "/home/nanopore/software/anaconda3/lib/python3.10/multiprocessing/util.py", line 224, in call
res = self._callback(*self._args, **self._kwargs)
File "/home/nanopore/software/anaconda3/lib/python3.10/multiprocessing/queues.py", line 199, in _finalize_join
thread.join()
File "/home/nanopore/software/anaconda3/lib/python3.10/threading.py", line 1096, in join
self._wait_for_tstate_lock()
File "/home/nanopore/software/anaconda3/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
KeyboardInterrupt:
Of note, the same happens if I only run the command above in a terminal on its own. It finishes for small data sets but not for the bigger ones, although both reach 100%. I waited for an hour and nothing happened.
Hi,
it would be great to add an -r (recursive) option to convert all fast5 files in a directory. Although wildcards in the path work, an -r option would be great.
Best
Florian
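As a workaround until such an option lands, the recursive expansion can be done outside the converter and the resulting file list passed in; a stdlib sketch (demonstrated on a throwaway directory tree):

```python
from pathlib import Path
import tempfile

# Build a throwaway tree standing in for a real run folder.
root = Path(tempfile.mkdtemp())
(root / "pass" / "sub").mkdir(parents=True)
(root / "pass" / "a.fast5").touch()
(root / "pass" / "sub" / "b.fast5").touch()

# Collect every .fast5 beneath the tree, however deeply nested.
fast5s = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.fast5"))
print(fast5s)  # ['pass/a.fast5', 'pass/sub/b.fast5']
```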
ONT recently changed the sampling rate to 5 kHz.
I think I remember that the pod5 format was also developed to be able to store this kind of data.
By accident, the format of the raw reads was changed from pod5 to fast5 for one of our runs. Now I am not sure: if I convert the fast5 back to pod5 afterwards, is the resulting pod5 file real 5 kHz sampled data, or was it lost/reduced to 4 kHz due to the initial fast5 format?
Note: I'll also post this question on the dorado GitHub, as I am not sure which repository is correct for this issue.
Hi,
I would like to convert a large set of old MinION runs to pod5 for long-term storage and possible re-basecalling. By default only one pod5 file is created, and when I try --output-one-to-one each fast5 is converted to a separate pod5 but placed in the same folder. I got the error below, likely because files in the pass/fail folders have the same names. It would be great if either the folder structure could be kept or pass/fail added to the file names, and also if the conversion kept track of read IDs to remove duplicates, as I think is done in guppy and in the single_to_multi fast5 conversion tool.
An unexpected error occurred: Input path already exists. Refusing to overwrite.
Traceback (most recent call last):
File "/home/minion/anaconda3/bin/pod5-convert-from-fast5", line 8, in
sys.exit(main())
File "/home/minion/anaconda3/lib/python3.7/site-packages/pod5_format_tools/pod5_convert_from_fast5.py", line 623, in main
args.signal_chunk_size,
File "/home/minion/anaconda3/lib/python3.7/site-packages/pod5_format_tools/pod5_convert_from_fast5.py", line 603, in convert_from_fast5
raise exc
File "/home/minion/anaconda3/lib/python3.7/site-packages/pod5_format_tools/pod5_convert_from_fast5.py", line 565, in convert_from_fast5
writer = output_handler.get_writer(item.file)
File "/home/minion/anaconda3/lib/python3.7/site-packages/pod5_format_tools/pod5_convert_from_fast5.py", line 395, in get_writer
return self._open_writer(output_path=output_path)
File "/home/minion/anaconda3/lib/python3.7/site-packages/pod5_format_tools/pod5_convert_from_fast5.py", line 381, in _open_writer
writer = p5.Writer(output_path)
File "/home/minion/anaconda3/lib/python3.7/site-packages/pod5_format/writer.py", line 84, in init
raise FileExistsError("Input path already exists. Refusing to overwrite.")
FileExistsError: Input path already exists. Refusing to overwrite.
I am attempting to use Guppy v6.5.7 with a merged pod5 from a PromethION run because of some strange issues that I've been having with fast5s. I created the pod5 using pod5 convert to create a single merged pod5 file. It's ~900GB.
When I start Guppy, it seems to take a long time after the "init" phase before basecalling actually starts -- as much as 30 minutes. It doesn't seem to be doing much on CPU except for slowly ramping up the memory usage. Is it trying to do a memory map?
This long startup time is really a bummer because I am submitting small jobs to the HPC using Guppy's --resume feature, and this really cuts into the server time if this happens every single time a new job starts.
A sample log -- take a look in between the "Init time" timestamp and when the first read is loaded:
2023-06-07 10:39:13.108864 [guppy/message] ONT Guppy basecalling software version 6.5.7+ca6d6af, minimap2 version 2.24-r1122
config file: /home/groups/hanleeji/ont-guppy_v6.5.7/data/dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg
model file: /home/groups/hanleeji/ont-guppy_v6.5.7/data/template_r9.4.1_450bps_sup_prom.jsn
input path: pod5/
save path: guppy_5mc_prom_pod5/
chunk size: 2000
chunks per runner: 768
minimum qscore: 7
records per file: 4000
fastq compression: ON
num basecallers: 4
gpu device: cuda:all
kernel path:
runners per device: 12
alignment file: /home/groups/hanleeji/hs38_naa.mmi
alignment type: auto
Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /home/groups/hanleeji/ont-guppy_v6.5.7/bin
2023-06-07 10:39:13.110617 [guppy/info] crashpad_handler not supported on this platform.
2023-06-07 10:39:13.523133 [guppy/info] CUDA device 0 (compute 8.0) initialised, memory limit 85031714816B (84594458624B free)
2023-06-07 10:39:17.480038 [guppy/message] loading new index: /home/groups/hanleeji/hs38_naa.mmi
2023-06-07 10:40:39.909546 [guppy/message] Full alignment will be performed.
2023-06-07 10:40:51.638841 [guppy/message] Resuming basecall from previous logfile: guppy_5mc_prom_pod5/guppy_basecaller_log-2023-06-07_10-07-54.log
2023-06-07 10:41:35.454594 [guppy/message] Found 1 input read file to process.
2023-06-07 10:41:35.503698 [guppy/info] lamp_arrangements arrangement folder not found: /home/groups/hanleeji/ont-guppy_v6.5.7/data/barcoding/lamp_arrangements
2023-06-07 10:41:35.961987 [guppy/info] lamp_arrangements arrangement folder not found: /home/groups/hanleeji/ont-guppy_v6.5.7/data/read_splitting/lamp_arrangements
2023-06-07 10:41:35.997585 [guppy/message] Init time: 142887 ms
2023-06-07 11:10:37.310822 [guppy/info] Read '000b2109-cd1a-4713-bcb6-265a84b14ed4' from file "20211119_PRM_1118.pod5" has been loaded.
2023-06-07 11:10:37.342269 [guppy/info] Read '000c5547-4ccf-4e80-a8bf-e4c5e61356be' from file "20211119_PRM_1118.pod5" has been loaded.
2023-06-07 11:10:37.342317 [guppy/info] Read '001455e0-3bc8-4640-b672-d0ac87237293' from file "20211119_PRM_1118.pod5" has been loaded.
2023-06-07 11:10:37.342343 [guppy/info] Read '00187ac0-90c8-4807-9e51-9e5298fcff54' from file "20211119_PRM_1118.pod5" has been loaded.
2023-06-07 11:10:37.342361 [guppy/info] Read '00370473-eb9e-45bb-bf4b-efe54e32d8e6' from file "20211119_PRM_1118.pod5" has been loaded.
2023-06-07 11:10:37.342380 [guppy/info] Read '0039136e-a13c-4d6c-89a3-28ae5ce3c217' from file "20211119_PRM_1118.pod5" has been loaded.
2023-06-07 11:10:37.342397 [guppy/info] Read '003a74f4-fc9e-4273-9e17-0f3ddfef99ec' from file "20211119_PRM_1118.pod5" has been loaded.
2023-06-07 11:10:37.342413 [guppy/info] Read '0043353b-2c49-43ea-bba4-fd21b4f99a9e' from file "20211119_PRM_1118.pod5" has been loaded.
Hi,
I'm having difficulties installing pod5 on macOS Ventura. When I use pip install pod5 I get the following error:
ERROR: Cannot install pod5==0.0.43 and pod5==0.1 because these package versions have conflicting dependencies.
The same command on Ubuntu 20.04.5 works fine.
Cheers,
Angus
Hi,
I am trying to convert fast5 files to pod5 to perform base-calling and modification calling using Dorado on the nanopore-wgs-consortium NA12878 dataset, but I am getting KeyError: 'sample_id'.
I was getting a similar error, "can't locate attribute: 'sample_id'", while using Bonito for base-calling and modification calling.
Is there something I can do to make it work or to debug the issue?
Hi, I tried to merge all the pod5 files for one sample (>8k files) but encountered this:
POD5 has encountered an error: '[Errno 24] Too many open files'
For detailed information set POD5_DEBUG=1'
What should I try next? Should I use cat to merge the files directly? Many thanks!
Best,
CW
(pod5 0.1.16, python 3.7.13)
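"Too many open files" is the per-process file-descriptor limit (commonly 1024) being hit, so with >8k inputs one workaround is to merge in batches that stay below the limit and then merge the intermediate files; raising the limit with ulimit -n before merging may also be enough. A sketch of the batching arithmetic (file names are hypothetical; each batch would be handed to a separate pod5 merge invocation):

```python
# Split a long input list into batches small enough to stay under the
# open-file limit; each batch would go to one `pod5 merge` run, and the
# intermediate outputs would be merged in a final pass.
def batches(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

files = [f"reads_{i}.pod5" for i in range(8200)]  # hypothetical inputs
groups = list(batches(files, 500))
print(len(groups), len(groups[-1]))  # 17 batches, the last with 200 files
```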
I have a folder of ~7000 fast5 files that I want to convert into pod5. From running --help, I am using this command:
usage: pod5 convert fast5 [-h] -o OUTPUT [-r] [-t THREADS] [--strict]
[-O ONE_TO_ONE] [-f]
[--signal-chunk-size SIGNAL_CHUNK_SIZE]
inputs [inputs ...]
Convert fast5 file(s) into a pod5 file(s)
positional arguments:
inputs Input path for fast5 file
optional arguments:
-h, --help show this help message and exit
-r, --recursive Search for input files recursively (default: False)
-t THREADS, --threads THREADS
Set the number of threads to use [default: 10]
(default: 10)
--strict Immediately quit if an exception is encountered during
conversion instead of continuing with remaining inputs
after issuing a warning (default: False)
required arguments:
-o OUTPUT, --output OUTPUT
Output path for the pod5 file(s). This can be an
existing directory (creating 'output.pod5' within it)
or a new named file path. A directory must be given
when using --one-to-one. (default: None)
output control arguments:
-O ONE_TO_ONE, --one-to-one ONE_TO_ONE
Output files are written 1:1 to inputs. 1:1 output
files are written to the output directory in a new
directory structure relative to the directory path
provided to this argument. This directory path must be
a relative parent of all inputs. (default: None)
-f, --force-overwrite
Overwrite destination files (default: False)
--signal-chunk-size SIGNAL_CHUNK_SIZE
Chunk size to use for signal data set (default:
102400)
nohup pod5 convert fast5 -o pod5/CHM13.pod5 --recursive --strict -t 60 multi_fast5/ &
When I check nohup, I'm getting this:
cat nohup.out
Checking Fast5 Files: 0%| | 0/7469 [00:00<?, ?Files/s]
Using ps, I'm getting the following status codes -- I believe indicating it's waiting for something to finish:
ps ux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
billylau 3471706 1.5 0.0 19561020 308316 pts/12 Sl 15:19 0:32 /home/billylau/.conda/envs/pod5/bin/python /home/billylau/.conda/envs/pod5/bin/pod5 convert fast5 -o pod5/CHM13.pod5 --recursive --strict -t 60 multi_fast5/
billylau 3472653 0.0 0.0 8892 3280 pts/12 R+ 15:55 0:00 ps ux
top also indicates that nothing is happening:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3471706 billylau 20 0 18.7g 311000 42388 S 2.0 0.0 0:34.00 pod5
It's been like this for ~30 minutes now with nothing updating. When I look at the code, e.g. https://github.com/nanoporetech/pod5-file-format/blob/73617c63ac310cc4e9f8d23cf06f2bfde5d21b7b/python/pod5/src/pod5/tools/pod5_convert_from_fast5.py, it doesn't look like it's doing anything more than checking whether the files are multi-read or not, and it should update the progress bar quickly.
Edit: doing a single pod5 file works fine:
nohup pod5 convert fast5 -o pod5/CHM13.pod5 --recursive --strict -t 60 multi_fast5/FAK50913_c7ef4ac6a67eb7bce6220608fccf5b19227f4904_5.fast5 &
cat nohup.out
Converting 1 Fast5s: 100%|##########| 4000/4000 [00:11<00:00, 352.08Reads/s]
Edit 2: adding a wildcard to the input seems to make it "not stuck", but it still seems absurdly slow to me on the checking step:
nohup pod5 convert fast5 -o pod5/CHM13.pod5 --recursive --strict -t 60 multi_fast5/*
Checking Fast5 Files: 12%|#2 | 920/7469 [20:50<8:54:48, 4.90s/Files]
It seems like it's fast at first but then slows down in the first minute:
Checking Fast5 Files: 0%| | 0/7469 [00:00<?, ?Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 6%|5 | 425/7469 [00:13<03:03, 38.39Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 7%|6 | 512/7469 [00:16<03:15, 35.58Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 7%|6 | 512/7469 [00:16<03:15, 35.58Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 7%|6 | 512/7469 [00:16<03:15, 35.58Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 8%|7 | 566/7469 [00:23<05:25, 21.21Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 8%|7 | 566/7469 [00:23<05:25, 21.21Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 8%|7 | 566/7469 [00:23<05:25, 21.21Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 8%|7 | 584/7469 [00:40<14:34, 7.87Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 8%|7 | 584/7469 [00:40<14:34, 7.87Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 8%|7 | 584/7469 [00:40<14:34, 7.87Files/s](pod5) billylau@suzuki:/mnt/ix1/Projects_lite/20230403_BL_CHM13_nanopore_raw/00_fast5$ cat nohup.out
Checking Fast5 Files: 8%|7 | 593/7469 [00:50<23:56, 4.79Files/s]
Dear pod5 developers,
please consider creating a conda package for pod5.
I tried to create a pod5 recipe from the pypi pod5 package myself using grayskull, but it fails: there is no sdist package on PyPI for pod5.
It would be super helpful to create a conda pod5 package, as many scientists work with conda.
Kind regards,
Jannes Spangenberg
I am trying to train a basecaller with the raw signals saved in pod5 format. Does the team have any best practices for accessing reads individually from a single output.pod5 file? Currently I am using this with PyTorch, so when defining a dataset object each read is accessed individually, and each time we call:
with p5.Reader(fpath) as reader:
    read = next(reader.reads([read_id]))
Is this the most efficient way to access a single read from the reader object?
Thanks!
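One way to reduce the cost in the pattern above is to open the file once per worker and reuse the handle, rather than constructing a new Reader for every read. A minimal stand-in sketch of that caching idea - `FakeReader`, `get_reader`, and `fetch` are illustrative placeholders, not pod5 API:

```python
from functools import lru_cache

# Stand-in for p5.Reader that counts how often the file is actually opened.
open_count = 0

class FakeReader:
    def __init__(self, path):
        global open_count
        open_count += 1
        self.path = path

    def read(self, read_id):
        # Placeholder for next(reader.reads([read_id]))
        return f"signal-for-{read_id}"

@lru_cache(maxsize=None)
def get_reader(path):
    """Open each file at most once per process and reuse the handle."""
    return FakeReader(path)

def fetch(path, read_id):
    # A Dataset.__getitem__ would call this instead of reopening the file.
    return get_reader(path).read(read_id)

results = [fetch("out.pod5", i) for i in range(100)]
```

With this pattern the file is opened once across all 100 fetches; in a PyTorch DataLoader the same idea applies per worker process, since each spawned worker gets its own cache.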
I had no trouble getting pod5 convert fast5
up and running by installing with pip into a conda environment on my Linux server running Slurm. Initial tests on smaller datasets/numbers of files worked fine. However, when I run the command on my full ONT dataset, the program gets to 100% and never exits.
# command
pod5 convert fast5 --threads 16 ./fast5_pass/*.fast5 --output pod5 --one-to-one fast5_pass
# hang state in log - stays here for hours
Converting 674 Fast5s: 100%|##########| 2695235/2695235 [2:27:51<00:00, 303.81Reads/s]
# idle state of pod process
top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
336777 wilsonte 20 0 9780.7m 223624 43464 S 0.0 0.1 18:18.87 pod5
# verification of done state in file output
ls fast5_pass | wc -w
674
ls pod5 | wc -w
674
I can find no evidence that the command has anything more to do, or is doing anything, but it never exits, which prevents my pipeline from progressing. I have forced a stop and just continued on with Dorado basecalling - so far that seems to have no problems with the pod5 files created above.
I have been trying to install this tool in order to convert my fast5 files into pod5 files.
Unfortunately, every time I use the conan build command, as described in #5, I get the following error:
ERROR: Error loading conanfile at '/lustre/nobackup/WUR/ABGC/hoger006/Tools/pod5-file-format/conanfile.py': Unable to load conanfile in /lustre/nobackup/WUR/ABGC/hoger006/Tools/pod5-file-format/conanfile.py File "<frozen importlib._bootstrap_external>", line 940, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/lustre/nobackup/WUR/ABGC/hoger006/Tools/pod5-file-format/conanfile.py", line 3, in <module> from conans import CMake, ConanFile, tools ImportError: cannot import name 'CMake' from 'conans' (/home/WUR/hoger006/lustre_dir/Tools/mambaforge/envs/conan/lib/python3.11/site-packages/conans/__init__.py)
Because I am trying to install this tool on an HPC, I do not have administrative access, which might be causing this error.
Is there a package which contains a pre-built version of this tool, or a containerised version?
For reference I cloned the github and am using conan version 2.0.2 through mamba/conda.
What will the MinKNOW output look like - what would be the default batch size?
And also, is MinKNOW going to output one large single POD5 file per sequencing run, or will it be multiple POD5 files, as is currently done with FAST5?
I noticed an article, "Fast Nanopore Sequencing Data Analysis with SLOW5".
Does pod5 have anything to do with it?
Hi folks,
I was able to convert a folder of fast5s to pod5 without any issues. Is there a tool for reversing the process?
Hi, when I run the pod5 convert fast5
command, I get this error at 74% of the process:
An unexpected error occurred: Trying to re-open a closed Writer to ... .pod5
Hi,
In my software, it is useful to have access to the POD5 file format version. It's possible, but not via the public API.
Old version:
p5_handle._read_reader.reader.schema.metadata[b'MINKNOW:pod5_version']
Current version:
p5_handle._handles.read.reader.schema.metadata[b'MINKNOW:pod5_version']
The metadata
dict also contains a file UUID and the name of the software that made the file - both useful too. Would it be possible to make access to this metadata dict part of the public/documented Python API please?
Cheers,
TIM
Installing with pip in a Python 3.11 Docker image results in the following error, which suggests lib-pod5~=0.1 is not available:
(base) ➜ v0.1.0 git:(master) ✗ docker build --platform linux/amd64 -t zeunas/pod5tools:0.1.0 .
[+] Building 12.1s (7/7) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 519B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/python:3.11 1.2s
=> [auth] library/python:pull token for registry-1.docker.io 0.0s
=> [1/3] FROM docker.io/library/python:3.11@sha256:11560799e4311fd5abcca7ace13585756d7222ce5471162cd78c78a4ecaf62bd 0.0s
=> CACHED [2/3] WORKDIR /usr/src/app 0.0s
=> ERROR [3/3] RUN pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir pod5==0.1.0 10.8s
------
> [3/3] RUN pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir pod5==0.1.0:
#7 6.721 Requirement already satisfied: pip in /usr/local/lib/python3.11/site-packages (22.3.1)
#7 7.384 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#7 9.518 Collecting pod5==0.1.0
#7 9.694 Downloading pod5-0.1-py3-none-any.whl (47 kB)
#7 9.728 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 kB 2.8 MB/s eta 0:00:00
#7 9.890 Collecting iso8601
#7 9.920 Downloading iso8601-1.1.0-py3-none-any.whl (9.9 kB)
#7 10.10 Collecting jsonschema
#7 10.13 Downloading jsonschema-4.17.3-py3-none-any.whl (90 kB)
#7 10.16 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90.4/90.4 kB 4.9 MB/s eta 0:00:00
#7 10.23 ERROR: Could not find a version that satisfies the requirement lib-pod5~=0.1 (from pod5) (from versions: none)
#7 10.23 ERROR: No matching distribution found for lib-pod5~=0.1
------
executor failed running [/bin/sh -c pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir pod5==0.1.0]: exit code: 1
Any idea what's going on? Strangely enough, I can see lib-pod5 0.1.0 should be available on pypi: https://pypi.org/project/lib-pod5/
Hello,
What is the "best" way to do multiprocessing while reading a whole pod5 file?
Currently, I'm using something like this from the benchmarking code, which gets the total number of batches and splits those batches into groups of batches. Then each group is sent to a worker to run on a spawned process. That worker then goes through each batch, and reads each read in that batch, before moving to the next batch.
import multiprocessing as mp
import pod5_format

# worker to process a set of batches in a pod5 file
def batch_worker(filename, select_batches, result_queue):
    file = pod5_format.open_combined_file(filename)
    # for each batch in the set of batches, get it and process the reads
    for batch_id in select_batches:
        batch = file.get_batch(batch_id)
        for read in batch.reads():
            # get read stuff
            result_queue.put(stuff)

def main():
    # set up multiprocessing
    mp.set_start_method("spawn")
    result_queue = mp.Queue()
    runners = 10
    processes = []
    # open file
    file = pod5_format.open_combined_file(filename)
    # get range of batches and split into groups for each runner
    batches = list(range(file.batch_count))
    approx_chunk_size = max(1, len(batches) // runners)
    start_index = 0
    # submit each set of batches to the runners to process
    while start_index < len(batches):
        select_batches = batches[start_index : start_index + approx_chunk_size]
        p = mp.Process(
            target=batch_worker,
            args=(filename, select_batches, result_queue),
        )
        p.start()
        processes.append(p)
        start_index += len(select_batches)
    # clean up processes and other code not shown here
    for p in processes:
        p.join()
The other method I came up with was submitting each batch index to a queue; each worker then pulls from the queue, processes that batch of reads, and places the result in the results queue. But it's essentially the same, in that each worker processes a batch.
Is there another or better way to do this? What is the most efficient way to read a pod5 file if you are reading all the data, not just a selection of it?
Cheers,
James
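For reference, the splitting arithmetic from the snippet above can be isolated as a pure function and checked on its own - `split_batches` is a hypothetical helper for illustration, not part of the pod5 API:

```python
def split_batches(batch_count, runners):
    """Split batch indices 0..batch_count-1 into contiguous chunks of
    roughly batch_count // runners, as in the benchmarking code above.
    Note that with a remainder this can produce more chunks than runners."""
    batches = list(range(batch_count))
    chunk_size = max(1, batch_count // runners)
    chunks = []
    start = 0
    while start < batch_count:
        chunks.append(batches[start:start + chunk_size])
        start += chunk_size
    return chunks

chunks = split_batches(23, 10)  # chunks of size 2, plus a trailing [22]
```

Every batch index lands in exactly one chunk, and chunks are contiguous, which keeps each worker's reads sequential within the file.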
Hello,
I am converting a directory of fast5 files from an ONT run to POD5 for use with Dorado. pod5 convert keeps crashing Python. The main program keeps running, but eventually the other Python instances related to pod5 convert crash and the program stalls before converting all files. Is there a way I can avoid this happening?
System:
Apple M1 Pro 10-core CPU, 32 GB
MacOS 13.1
pod5 installed through pip
python 3.10
Error report below
Thanks.
-Isaac
-------------------------------------
Translated Report (Full Report Below)
-------------------------------------
Process: Python [4426]
Path: /Library/Frameworks/Python.framework/Versions/3.10/Resources/Python.app/Contents/MacOS/Python
Identifier: org.python.python
Version: 3.10.2 (3.10.2)
Code Type: ARM-64 (Native)
Parent Process: Python [4418]
Responsible: Terminal [632]
User ID: 501
Date/Time: 2022-12-27 21:08:55.7683 -0500
OS Version: macOS 13.1 (22C65)
Report Version: 12
Anonymous UUID: 3093BEF8-BED7-F432-D82E-4805C4F3C24B
Time Awake Since Boot: 2400 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000280800000
Exception Codes: 0x0000000000000001, 0x0000000280800000
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [4426]
VM Region Info: 0x280800000 is not in any region. Bytes after previous region: 1 Bytes before following region: 8388608
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
MALLOC_SMALL 280000000-280800000 [ 8192K] rw-/rwx SM=PRV
---> GAP OF 0x800000 BYTES
MALLOC_SMALL 281000000-281800000 [ 8192K] rw-/rwx SM=PRV
Kernel Triage:
VM - pmap_enter retried due to resource shortage
VM - pmap_enter retried due to resource shortage
VM - pmap_enter retried due to resource shortage
VM - pmap_enter retried due to resource shortage
I want to convert my single-read fast5 files into multi-read fast5 file(s) and then into one .pod5 file.
single_to_multi_fast5 -i {input} -s {output}
pod5 convert fast5 {output}/*.fast5 --output converted.pod5
where {input} is simply the folder with all the single-read fast5 files.
The command single_to_multi_fast5 converts my input files into a file "batch_0.fast5" and additionally outputs a "filename_mapping.txt".
But when I try to use the pod5 command, the following error appears:
Converting 1 Fast5s: 0%| | 0/4000 [00:00<?, ?Reads/s]ERROR:pod5:Enqueueing exception: batch_0.fast5 'sample_id'
Converting 1 Fast5s: 0%| | 0/4000 [00:00<?, ?Reads/s]
WARNING:pod5:Unfinished exceptions found during shutdown!
I can't do much with the error message; I think maybe something that pod5 needs got lost in the conversion.
I am using the following libraries:
pod5 0.2.2
ont-fast5-api 4.1.1
Compared to FAST5, how fast/efficient is POD5 for writing?
Will the PromethION P48 be able to write when all 48 flow cells are operating, even at double the current sampling rate (i.e. 8000)?
Are there any plans to expand the list of languages able to access POD5 data files? I'd be particularly interested in a Julia package.
Given the completeness of the Python package (which is brilliant for scripting), merely having the ability to load and extract data from a POD5 data file from other languages would be sufficient. There is no need for options to manipulate the data or write it out to new files. From the Julia perspective this should be fairly trivial, given that Arrow.jl can do the heavy lifting with the data tables, but I couldn't figure out from the documents how you have wrapped these up in the container.
Happy to put in some effort to get this off the ground, as it would remove another dependency in my workflows, and working with HDF5 is horrid. I'd like to make the jump ASAP!
Thanks, Tom.
Hello,
Could you please explain the streaming functionality in C/C++, if I were to extract raw data for Read Until/selective sequencing?
I know a FAST5 file has to be completely written before it can be read. Does POD5 have any advantages for Read Until?
How would future chunks of a read be handled - appended to the same file, or to a new file?
Dear POD5 developers,
I have been trying to use the POD5 C API to write a simple example of converting raw signal data to pico ampere. It is a single POD5 file containing a large number of reads and I want to iterate through all the reads while exploiting as many threads as possible. Learning from the Dorado code I have written something and a code snippet is given below. I have a few questions.
Is it possible to use pod5_get_signal_row_info() without a C++ vector (i.e., with pure C structs)? See the comment in the code below.
pod5_init();
Pod5FileReader_t* file = pod5_open_combined_file(argv[1]);
if (!file) {
fprintf(stderr,"Error in opening file\n");
perror("perr: ");
exit(EXIT_FAILURE);
}
size_t batch_count = 0;
if (pod5_get_read_batch_count(&batch_count, file) != POD5_OK) {
fprintf(stderr, "Failed to query batch count: %s\n", pod5_get_error_string());
}
int read_count = 0;
for (size_t batch_index = 0; batch_index < batch_count; ++batch_index) {
Pod5ReadRecordBatch_t* batch = NULL;
if (pod5_get_read_batch(&batch, file, batch_index) != POD5_OK) {
fprintf(stderr,"Failed to get batch: %s\n", pod5_get_error_string());
}
size_t batch_row_count = 0;
if (pod5_get_read_batch_row_count(&batch_row_count, batch) != POD5_OK) {
fprintf(stderr,"Failed to get batch row count\n");
}
rec_t *rec = (rec_t*)malloc(batch_row_count * sizeof(rec_t));
// need to find out if this part can be multi-threaded, and if so the best way; for instance, should this be parallelised with an OpenMP for loop? Or is it internally threaded by the arrow library, opaquely to the user?
for (size_t row = 0; row < batch_row_count; ++row) {
uint8_t read_id[16];
int16_t pore = 0;
int16_t calibration_idx = 0;
uint32_t read_number = 0;
uint64_t start_sample = 0;
float median_before = 0.0f;
int16_t end_reason = 0;
int16_t run_info = 0;
int64_t signal_row_count = 0;
if (pod5_get_read_batch_row_info(batch, row, read_id, &pore, &calibration_idx,
&read_number, &start_sample, &median_before,
&end_reason, &run_info, &signal_row_count) != POD5_OK) {
fprintf(stderr,"Failed to get read %ld\n", row );
}
read_count += 1;
char read_id_tmp[37];
pod5_error_t err = pod5_format_read_id(read_id, read_id_tmp);
CalibrationDictData_t *calib_data = NULL;
if (pod5_get_calibration(batch, calibration_idx, &calib_data) != POD5_OK) {
fprintf(stderr, "Failed to get read %ld calibration_idx data: %s\n", row, pod5_get_error_string());
}
uint64_t *signal_rows_indices= (uint64_t*) malloc(signal_row_count * sizeof(uint64_t));
if (pod5_get_signal_row_indices(batch, row, signal_row_count,
signal_rows_indices) != POD5_OK) {
fprintf(stderr,"Failed to get read %ld; signal row indices: %s\n", row, pod5_get_error_string());
}
// cannot get to work this in C, So using C++
//SignalRowInfo_t *signal_rows = (SignalRowInfo_t *)malloc(sizeof(SignalRowInfo_t)*signal_row_count);
std::vector<SignalRowInfo_t *> signal_rows(signal_row_count);
if (pod5_get_signal_row_info(file, signal_row_count, signal_rows_indices,
signal_rows.data()) != POD5_OK) {
fprintf(stderr,"Failed to get read %ld signal row locations: %s\n", row, pod5_get_error_string());
}
size_t total_sample_count = 0;
for (size_t i = 0; i < signal_row_count; ++i) {
total_sample_count += signal_rows[i]->stored_sample_count;
}
int16_t *samples = (int16_t*)malloc(sizeof(int16_t)*total_sample_count);
size_t samples_read_so_far = 0;
for (size_t i = 0; i < signal_row_count; ++i) {
if (pod5_get_signal(file, signal_rows[i], signal_rows[i]->stored_sample_count,
samples + samples_read_so_far) != POD5_OK) {
fprintf(stderr,"Failed to get read %ld; signal: %s\n", row, pod5_get_error_string());
}
samples_read_so_far += signal_rows[i]->stored_sample_count;
}
rec[row].len_raw_signal = samples_read_so_far;
rec[row].raw_signal = samples;
rec[row].scale = calib_data->scale;
rec[row].offset = calib_data->offset;
rec[row].read_id = strdup(read_id_tmp);
pod5_release_calibration(calib_data);
pod5_free_signal_row_info(signal_row_count, signal_rows.data());
free(signal_rows_indices);
}
//process the batch here
//print the output here
if (pod5_free_read_batch(batch) != POD5_OK) {
fprintf(stderr,"Failed to release batch\n");
}
for (size_t row = 0; row < batch_row_count; ++row) {
free(rec[row].read_id);
free(rec[row].raw_signal);
}
free(rec);
}
Is the above implementation the most efficient way to use POD5 on a multi core system?
❯ pod5 filter WTC-11-NGN2-hiPSC_chromatin_1.pod5 --ids readID_5e5.txt --output WTC-11-NGN2-hiPSC_chromatin_1_subset.pod5 --force
[1] 1432189 illegal hardware instruction (core dumped) pod5 filter WTC-11-NGN2-hiPSC_chromatin_1.pod5 --ids readID_5e5.txt --output
Running latest 0.2 version.
pod5 inspect reads input.pod5
works, so I think there is nothing wrong with the input pod5.
Apologies if this is not the place to ask this; I also asked on the Nanopore Community page, but figured I'd ask here too.
I recently converted some fast5 files to pod5 to modbasecall with guppy.
This is the command I used:
./ont-guppy/bin/guppy_basecaller -i pods/ -a resources/ref.mmi -s guppy_out/ -c dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg -x auto --recursive --bam_out --index --compress_fastq
I receive this output:
ONT Guppy basecalling software version 6.4.2+97a7f06, minimap2 version 2.24-r1122
config file: /home/matthew/snake_guppy/ont-guppy/data/dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg
model file: /home/matthew/snake_guppy/ont-guppy/data/template_r10.4_e8.1_sup.jsn
input path: pods/
save path: guppy_out/
chunk size: 2000
chunks per runner: 208
minimum qscore: 10
records per file: 4000
fastq compression: ON
num basecallers: 4
gpu device: auto
kernel path:
runners per device: 12
alignment file: resources/ref.mmi
alignment type: auto
Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /home/matthew/snake_guppy/ont-guppy/bin
loading new index: resources/ref.mmi
Full alignment will be performed.
Found 5 input read files to process.
Init time: 5485 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
Caller time: 101 ms, Samples called: 0, samples/s: 0
There were fast5 file loading problems! Failed to load 5 out of 5 fast5 files. Check log file for details.
Finishing up any open output files.
Basecalling completed successfully.
The log file says:
The EULA may be found in /home/matthew/snake_guppy/ont-guppy/bin
2023-01-25 16:49:07.330495 [guppy/info] crashpad_handler successfully launched.
2023-01-25 16:49:07.434523 [guppy/info] CUDA device 0 (compute 8.6) initialised, memory limit 25438322688B (24809373696B free)
2023-01-25 16:49:09.463378 [guppy/message] loading new index: resources/ref.mmi
2023-01-25 16:49:12.813098 [guppy/message] Full alignment will be performed.
2023-01-25 16:49:12.814092 [guppy/message] Found 5 input read files to process.
2023-01-25 16:49:12.814483 [guppy/info] Error attempting to open file "pods/OM2.pod5": Failed to query batch count: Invalid: null file passed to C API
2023-01-25 16:49:12.814569 [guppy/info] Error attempting to open file "pods/YM3.pod5": Failed to query batch count: Invalid: null file passed to C API
2023-01-25 16:49:12.814633 [guppy/info] Error attempting to open file "pods/OM1.pod5": Failed to query batch count: Invalid: null file passed to C API
2023-01-25 16:49:12.814701 [guppy/info] Error attempting to open file "pods/OM3.pod5": Failed to query batch count: Invalid: null file passed to C API
2023-01-25 16:49:12.814769 [guppy/info] Error attempting to open file "pods/YM2.pod5": Failed to query batch count: Invalid: null file passed to C API
2023-01-25 16:49:12.815703 [guppy/message] Init time: 5485 ms
2023-01-25 16:49:12.915834 [guppy/message] Caller time: 101 ms, Samples called: 0, samples/s: 0
2023-01-25 16:49:12.915859 [guppy/message] There were fast5 file loading problems! Failed to load 5 out of 5 fast5 files. Check log file for details.
2023-01-25 16:49:12.915872 [guppy/message] Finishing up any open output files.
2023-01-25 16:49:12.986101 [guppy/info] Stats for model /home/matthew/snake_guppy/ont-guppy/data/template_r10.4_e8.1_sup.jsn, 12 runners/device, 208 chunks/run, 2000 blocks/chunk, lifetime 4.52 s
CUDA device 0: 0 runs with 0 chunks (-nan%), 0 samples (-nan%), avg max size -nan, avg size -nan (-nan% of max), 0 samples/s
2023-01-25 16:49:13.012831 [guppy/message] Basecalling completed successfully.
I'm not sure what the issue is, does anyone have any advice?
I checked the files with pod5 inspect summary and they seem fine and have the expected sizes.
Hi,
I'm trying to read POD5 files with your C API. My problem comes specifically from the pod5_get_pore_type and pod5_get_end_reason functions. When I malloc a 16 char block for the end reason and an end reason larger than 16 chars is found, a buffer overflow occurs.
Specifically lines 663 to 667 contain:
POD5_C_ASSIGN_OR_RAISE(auto const end_reason_val, batch->batch.get_end_reason(end_reason));
*end_reason_string_value_size = end_reason_val.second.size() + 1;
if (end_reason_val.second.size() >= *end_reason_string_value_size) {
return POD5_ERROR_STRING_NOT_LONG_ENOUGH;
}
My understanding is that said if contains dead code: POD5_ERROR_STRING_NOT_LONG_ENOUGH is never returned, and thus a client application has no way of knowing whether the alloc'd memory is sufficient. Should I be checking the string value in another way?
Thanks,
Rafael.
Hi, I tried to find pod5_format_export.h
in this project, but it is missing. Your other cpp files include it many times - where is it?
❯ pip install --upgrade pod5
ERROR: Could not find a version that satisfies the requirement pod5 (from versions: none)
ERROR: No matching distribution found for pod5
Hi,
Could you please help me to understand what is happening behind the following piece of code (that I got from Dorado's POD5 reading):
if (pod5_get_signal_row_info(file, signal_row_count, signal_rows_indices,
signal_rows.data()) != POD5_OK) {
fprintf(stderr,"Failed to get read %ld signal row locations: %s\n", row, pod5_get_error_string());
}
fprintf(stderr,"ROw count\t%s\t%ld\n", read_id_tmp, signal_row_count);
size_t total_sample_count = 0;
for (size_t i = 0; i < signal_row_count; ++i) {
total_sample_count += signal_rows[i]->stored_sample_count;
}
int16_t *samples = (int16_t*)malloc(sizeof(int16_t)*total_sample_count);
size_t samples_read_so_far = 0;
for (size_t i = 0; i < signal_row_count; ++i) {
if (pod5_get_signal(file, signal_rows[i], signal_rows[i]->stored_sample_count,
samples + samples_read_so_far) != POD5_OK) {
}
samples_read_so_far += signal_rows[i]->stored_sample_count;
}
Is this signal_row_count
the number of chunks MinKNOW is expected to write when writing directly? When I converted 500,000 reads from fast5 to pod5, every read had a signal_row_count of 1. When MinKNOW is writing files, what would be the expected value of signal_row_count? I am asking because the primary design goal of POD5 has been writing (and the need to write in chunks); if the converter does not produce files of the shape MinKNOW would produce, then none of the reading-related benchmarks we run on pod5 files generated by fast5 conversion are representative of reality, since seek system calls (or major page faults, if mmap is used internally) are ignored. If MinKNOW is expected to reconvert chunked POD5 to unchunked POD5, the benchmarks would still be representative, but such a conversion would contradict the need for a 'balanced' file format.
Also, are there any benchmarks evaluating POD5's writing performance? And is there a C API for POD5 writing?
Thank you.
Hi, I've run into the following problem after migrating to 0.2.0 (I'm not sure if I'm misunderstanding the C API or if this is a bug):
When writing POD5 files through the C API via "pod5_add_reads_data" with more than 1000 reads, the program fails on a buffer assertion. The problem specifically happens at read 999. Changing read_table_batch_size to 10000 in the writer options fixes the issue.
I'll add some debug info in case this is a bug:
The exception happens in:
expandable_buffer.h@46 called by
read_table_writer.cpp@122 (write_batch) called by
read_table_writer@88 (add_read) called by
file_writer@92 (add_complete_read) called by
c_api@1124 (pod5_add_reads_data) called by my code (copy.cpp)
The dataset used is: s3://ont-open-data/gm24385_2020.09/analysis/r9.4.1/20200914_1357_1-E11-H11_PAF27462_d3c9678e/guppy_v4.0.11_r9.4.1_hac_prom/align_unfiltered/chr15/fast5/batch12.fast5
It was converted using the pod5 conversion tool provided in the Python package.
Here is the full call stack from VSCode:
libc.so.6!__pthread_kill_implementation(int no_tid, int signo, pthread_t threadid) (pthread_kill.c:44)
libc.so.6!__pthread_kill_internal(int signo, pthread_t threadid) (pthread_kill.c:78)
libc.so.6!__GI___pthread_kill(pthread_t threadid, int signo) (pthread_kill.c:89)
libc.so.6!__GI_raise(int sig) (raise.c:26)
libc.so.6!__GI_abort() (abort.c:79)
libc.so.6!__assert_fail_base(const char * fmt, const char * assertion, const char * file, unsigned int line, const char * function) (assert.c:92)
libc.so.6!__GI___assert_fail(const char * assertion, const char * file, unsigned int line, const char * function) (assert.c:101)
pod5::ExpandableBuffer::get_data_span(const pod5::ExpandableBuffer * const this) (PATH/pod5/c++/pod5_format/expandable_buffer.h:46)
pod5::detail::StringDictionaryKeyBuilder::get_typed_offset_data(const pod5::detail::StringDictionaryKeyBuilder * const this) (PATH/pod5/c++/pod5_format/read_table_writer_utils.h:90)
pod5::detail::get_array_data(const std::shared_ptrarrow::DataType & type, const pod5::detail::StringDictionaryKeyBuilder & builder, std::size_t expected_length) (PATH/pod5/c++/pod5_format/read_table_writer_utils.cpp:33)
pod5::RunInfoWriter::get_value_array(pod5::RunInfoWriter * const this) (PATH/pod5/c++/pod5_format/read_table_writer_utils.cpp:228)
pod5::DictionaryWriter::build_dictionary_array(pod5::DictionaryWriter * const this, const std::shared_ptrarrow::Array & indices) (PATH/pod5/c++/pod5_format/read_table_writer_utils.cpp:198)
pod5::detail::BuilderHelperarrow::DictionaryArray::Finish(pod5::detail::BuilderHelperarrow::DictionaryArray * const this, std::shared_ptrarrow::Array * dest) (PATH/pod5/c++/pod5_format/schema_field_builder.h:174)
pod5::FieldBuilder<pod5::Field<0, pod5::UuidArray>, pod5::ListField<1, arrow::ListArray, arrow::NumericArrayarrow::UInt64Type >, pod5::Field<2, arrow::NumericArrayarrow::UInt32Type >, pod5::Field<3, arrow::NumericArrayarrow::UInt64Type >, pod5::Field<4, arrow::NumericArrayarrow::FloatType >, pod5::Field<5, arrow::NumericArrayarrow::UInt64Type >, pod5::Field<6, arrow::NumericArrayarrow::FloatType >, pod5::Field<7, arrow::NumericArrayarrow::FloatType >, pod5::Field<8, arrow::NumericArrayarrow::FloatType >, pod5::Field<9, arrow::NumericArrayarrow::FloatType >, pod5::Field<10, arrow::NumericArrayarrow::UInt32Type >, pod5::Field<11, arrow::NumericArrayarrow::FloatType >, pod5::Field<12, arrow::NumericArrayarrow::UInt64Type >, pod5::Field<13, arrow::NumericArrayarrow::UInt16Type >, pod5::Field<14, arrow::NumericArrayarrow::UInt8Type >, pod5::Field<15, arrow::DictionaryArray>, pod5::Field<16, arrow::NumericArrayarrow::FloatType >, pod5::Field<17, arrow::NumericArrayarrow::FloatType >, pod5::Field<18, arrow::DictionaryArray>, pod5::Field<19, arrow::BooleanArray>, pod5::Field<20, arrow::DictionaryArray> >::finish_columns()::{lambda(auto:1&, unsigned long)#1}::operator()<pod5::detail::BuilderHelperarrow::DictionaryArray >(pod5::detail::BuilderHelperarrow::DictionaryArray&, unsigned long) const(const struct {...} * const __closure, pod5::detail::BuilderHelperarrow::DictionaryArray & element, std::size_t index) (PATH/pod5/c++/pod5_format/schema_field_builder.h:240)
#0  pod5::detail::for_each<…>(std::tuple<…> & t, {lambda(auto:1&, unsigned long)#1} f)  (PATH/pod5/c++/pod5_format/tuple_utils.h:11)
#1  pod5::detail::for_each_in_tuple<…>(std::tuple<…> & t, {lambda(auto:1&, unsigned long)#1} f)  (PATH/pod5/c++/pod5_format/tuple_utils.h:18)
#2  pod5::FieldBuilder<…>::finish_columns()  (PATH/pod5/c++/pod5_format/schema_field_builder.h:238)
#3  pod5::ReadTableWriter::write_batch()  (PATH/pod5/c++/pod5_format/read_table_writer.cpp:122)
#4  pod5::ReadTableWriter::add_read(pod5::ReadData const & read_data, gsl::span const & signal, uint64_t signal_duration)  (PATH/pod5/c++/pod5_format/read_table_writer.cpp:88)
#5  pod5::FileWriterImpl::add_complete_read(pod5::ReadData const & read_data, gsl::span const & signal)  (PATH/pod5/c++/pod5_format/file_writer.cpp:92)
#6  pod5::FileWriter::add_complete_read(pod5::ReadData const & read_data, gsl::span const & signal)  (PATH/pod5/c++/pod5_format/file_writer.cpp:340)
#7  pod5_add_reads_data(Pod5FileWriter_t * file, uint32_t read_count, uint16_t struct_version, void const * row_data, int16_t const ** signal, uint32_t const * signal_size)  (PATH/pod5/c++/pod5_format/c_api.cpp:1124)
#8  main(int argc, char ** argv)  (PATH/src/c++/copy.cpp:431)
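The top two frames of this backtrace sit in `tuple_utils.h`, pod5's compile-time tuple iteration helper, which `FieldBuilder::finish_columns()` uses to visit every column builder with its index. A minimal sketch of that pattern (names here are illustrative, not the pod5 originals) looks like this:

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

namespace detail {

// Expand an index sequence and call f(element, index) for each tuple
// element -- the same shape as the finish_columns() lambda in the trace.
template <typename Tuple, typename Fn, std::size_t... Is>
void for_each(Tuple & t, Fn && f, std::index_sequence<Is...>)
{
    // C++17 fold expression: one call per tuple element, in order.
    (f(std::get<Is>(t), Is), ...);
}

// Public entry point: derive the index sequence from the tuple's arity.
template <typename... Ts, typename Fn>
void for_each_in_tuple(std::tuple<Ts...> & t, Fn && f)
{
    for_each(t, std::forward<Fn>(f), std::index_sequence_for<Ts...>{});
}

}  // namespace detail
```

Because the lambda is instantiated once per element type, every frame in the trace carries the full 21-column template parameter list — which is why the raw backtrace above is so verbose, not because anything unusual is happening at runtime.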
Regards,
Rafael.
Hey,
I am trying to run guppy (v6.3.7) basecalling on our HPC on a pod5 file. The job finishes after about 5 seconds without crashing: no error message is thrown, and no output is created.