Comments (10)
Hi @hasindu2008,
The question is complex: the answer depends on the disk technology and on the write pattern of the calling software.
With regard to an increased sampling rate: yes, pod5 is capable of handling higher write rates on a live sequencer.
Thanks,
- George
from pod5-file-format.
Let us take our current P48 tower, which comes with an 8xSSD RAID0 setup, and the writing pattern of MinKNOW. How much higher are we talking about: 10 times that of FAST5?
10x faster is the number I got when I previously benchmarked in a single-threaded environment.
We have since optimised writing significantly, to the point where the disks are generally the bottleneck, and a standard MinKNOW sequencing environment is highly multithreaded.
However, without your exact machine revision and disks, I can't say with certainty what results you would see on your runs.
Further, when MinKNOW is running sequencing, there is a lot of IO and resource usage going on. It's not always the best idea to optimise the speed of final file output; basecalling, data analysis or acquisition may need to take priority instead.
- George
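For anyone wanting a rough figure for their own disks, a minimal sketch of a sequential buffered-write measurement follows. It is not pod5's or MinKNOW's write path, and the function name `measure_write_throughput` and its default sizes are hypothetical choices for illustration; it only shows what the disk accepts from a single thread doing plain buffered writes followed by an `fsync`.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_bytes=64 * 1024 * 1024,
                             chunk_bytes=1024 * 1024):
    """Write roughly total_bytes to a temporary file in `directory` and
    return the observed throughput in bytes per second.

    Hypothetical helper for illustration; not part of pod5 or MinKNOW.
    """
    chunk = os.urandom(chunk_bytes)  # incompressible payload
    n_chunks = total_bytes // chunk_bytes
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        for _ in range(n_chunks):
            handle.write(chunk)  # buffered write through Python's file object
        handle.flush()
        os.fsync(handle.fileno())  # include the time taken to reach the device
        elapsed = time.perf_counter() - start
    return (n_chunks * chunk_bytes) / elapsed


if __name__ == "__main__":
    rate = measure_write_throughput(tempfile.gettempdir())
    print(f"{rate / 1e6:.0f} MB/s")
```

A single-threaded number like this is only a baseline; under live sequencing the same disks are shared with basecalling and acquisition, so the usable rate will be lower.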
What is the underlying writing mechanism in pod5? Is it the standard write() system call, mmap, asynchronous IO, or io_uring?
Hi @hasindu2008,
It'll depend on the OS, but on a standard Linux OS, a buffered write using Arrow's standard writer is used.
- George
Do you know which system call Arrow uses internally?
I'm afraid I don't, sorry.
@jorj1988
How can I get this revision number for our PromethION? The name is something like PC48B226. The 8xSSD are Micron_5300_MTFD. What I am interested in knowing is what is the upper limit of the sampling frequency that POD5 will allow on our PromethION, during a standard MinKNOW sequencing environment, which is highly multithreaded, and when I/O priorities are properly set to cater for live fast basecalling and acquisition. A single-threaded benchmark is not really what I am after.
> What I am interested in knowing is what is the upper limit of the sampling frequency that POD5 will allow on our PromethION
Hi @hasindu2008, I'm afraid I don't have exact data on this question. It'll depend on a number of factors to do with the hardware and software stack.
pod5 inside MinKNOW is expected to support all the sequencing conditions the sequencer supports (we use a 4 kHz sampling rate right now for most conditions). When it comes to writing from a non-live source (repacking or selecting), the repack should operate at close to the disk's throughput.
Hope that helps,
- George
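As a back-of-the-envelope illustration of how raw data rate scales with sampling rate, here is a small sketch. Only the 48 flow cells (P48) and the 4 kHz rate come from this thread; the 3000 channels per flow cell, the 2 bytes per sample, and the 2 GB/s disk figure are assumptions for illustration, and the function names are hypothetical. It ignores compression unless a ratio is supplied, and it ignores all other IO on the machine.

```python
def raw_signal_rate(flow_cells, channels_per_flow_cell, sample_rate_hz,
                    bytes_per_sample=2):
    """Uncompressed raw signal rate in bytes per second."""
    return flow_cells * channels_per_flow_cell * sample_rate_hz * bytes_per_sample


def max_sample_rate(disk_bytes_per_s, flow_cells, channels_per_flow_cell,
                    bytes_per_sample=2, compression_ratio=1.0):
    """Highest sampling rate (Hz) a given sustained write budget could absorb.

    compression_ratio is uncompressed size divided by compressed size.
    """
    bytes_per_sample_tick = flow_cells * channels_per_flow_cell * bytes_per_sample
    return disk_bytes_per_s * compression_ratio / bytes_per_sample_tick


if __name__ == "__main__":
    # Assumed: 3000 channels per flow cell, 16-bit samples.
    rate = raw_signal_rate(48, 3000, 4000)
    print(f"{rate / 1e9:.3f} GB/s uncompressed at 4 kHz")  # 1.152 GB/s

    # Assumed: 2 GB/s of sustained write budget left over for signal output.
    print(f"{max_sample_rate(2e9, 48, 3000):.0f} Hz ceiling under that budget")
```

The real ceiling would sit below any such estimate, since the write budget actually available depends on what basecalling and acquisition are doing at the same time.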
So after all this, it appears that the selection of POD5 as a suitable file format for writing did not really rely on a comprehensive benchmark. Two of the predictions I made about POD5 (refer to that document) have already come true. Time will tell, and more headaches are likely to follow, perhaps even a critical flaw ;).
After all this, my conclusion is that, technically, a simple binary format like BLOW5 is much more efficient and better for both reading and writing than a complicated, over-engineered format like POD5, for all practical purposes. If you had gone down that path, MinKNOW could by now have been writing directly with much higher stability and reliability, perhaps 6 months ago, and with far less engineering effort.
Anyway, thank you very much @jorj1988 and @vellamike for doing your best and helping as far as you are allowed to.