0x55555555 commented on June 2, 2024

Hi @hasindu2008,

The question is complex, and the answer depends on the disk technology and on the write pattern of the calling software.

WRT increased sampling rate, yes. Pod5 is capable of handling higher write rates on a live sequencer.

Thanks,

  • George

from pod5-file-format.

hasindu2008 commented on June 2, 2024

Let us say our current P48 tower, which comes with an 8x SSD RAID0 setup, and the write pattern of MinKNOW. How much higher are we talking about? 10 times that of FAST5?

0x55555555 commented on June 2, 2024

10x faster is the figure I got when I benchmarked it previously in a single-threaded environment.

We have since optimised writing significantly, to the point where the disks have generally been the bottleneck, and a standard MinKNOW sequencing environment is highly multithreaded.

However, without knowing your exact machine revision and disks, I can't say with certainty what results you would see on your runs.

Furthermore, when MinKNOW is running a sequencing experiment, there is a lot of I/O and other resource usage going on. It's not always the best idea to optimise the speed of final file output; basecalling, data analysis or acquisition may need to take priority instead.
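Since the disks and the competing I/O both shape what you would actually see, one practical first step is to measure the raw sequential write throughput of the target volume. The following is a minimal, single-threaded Python sketch, not a MinKNOW or POD5 tool; the function name `measure_write_throughput` and the default sizes are hypothetical choices. Because it runs alone and writes incompressible data in 1 MiB chunks, it only gives a rough baseline and does not reproduce the concurrent I/O of a live sequencing run.

```python
import os
import tempfile
import time


def measure_write_throughput(path: str, total_bytes: int = 64 * 1024 * 1024,
                             chunk_size: int = 1024 * 1024) -> float:
    """Write `total_bytes` sequentially to `path` and return MB/s.

    Hypothetical helper for illustration. fsync() is called before the
    timer stops so the figure reflects data reaching the device rather
    than only the OS page cache.
    """
    chunk = os.urandom(chunk_size)  # incompressible payload
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()                 # push Python's buffer to the kernel
        os.fsync(f.fileno())      # wait until the kernel flushes to disk
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Uses the system temp dir by default; pass dir="/path/on/raid" to
    # TemporaryDirectory to benchmark the volume MinKNOW writes to.
    with tempfile.TemporaryDirectory() as d:
        mbps = measure_write_throughput(os.path.join(d, "bench.bin"))
        print(f"sequential write: {mbps:.0f} MB/s")
```

The `fsync()` before stopping the clock is the important design choice: without it, the number mostly measures how fast the page cache absorbs data, not how fast the SSDs accept it.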

  • George

hasindu2008 commented on June 2, 2024

What is the underlying write mechanism in POD5? Is it the standard write() system call, mmap, asynchronous I/O, or io_uring?

0x55555555 commented on June 2, 2024

Hi @hasindu2008,

It depends on the OS, but on a standard Linux system a buffered write through Arrow's standard writer is used.
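To illustrate what a "buffered write" means at the system-call level, here is a minimal Python sketch. It is not Arrow's code, and the `BufferedSink` class and its 64 KiB capacity are hypothetical. It only shows why buffering matters: many small appends collapse into a few `write()` calls. Which system call Arrow ultimately issues underneath is not established in this thread.

```python
import os
import tempfile


class BufferedSink:
    """Illustrative user-space write buffer (hypothetical, not Arrow's).

    Small writes accumulate in memory and are handed to the kernel with
    os.write() only once `capacity` bytes have built up.
    """

    def __init__(self, fd: int, capacity: int = 64 * 1024):
        self.fd = fd
        self.capacity = capacity
        self.buf = bytearray()
        self.syscalls = 0  # number of os.write() calls issued

    def write(self, data: bytes) -> None:
        self.buf += data
        if len(self.buf) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        data = bytes(self.buf)
        self.buf.clear()
        while data:
            n = os.write(self.fd, data)  # may write fewer bytes than asked
            self.syscalls += 1
            data = data[n:]

    def close(self) -> None:
        self.flush()  # write out whatever is still buffered


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        sink = BufferedSink(f.fileno())
        for _ in range(1000):
            sink.write(b"x" * 100)  # 1000 small 100-byte records
        sink.close()
        print(sink.syscalls)  # 2 write() calls instead of 1000
```

An unbuffered writer would make one system call per record; the trade-off is that data sits in user memory until the next flush.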

  • George

hasindu2008 commented on June 2, 2024

Do you know which system call Arrow uses internally?

0x55555555 commented on June 2, 2024

I'm afraid I don't, sorry.

hasindu2008 commented on June 2, 2024

@jorj1988
How can I get this revision number for our PromethION? The name is something like PC48B226, and the 8x SSDs are Micron_5300_MTFD. What I am interested in knowing is the upper limit of the sampling frequency that POD5 will allow on our PromethION in a standard MinKNOW sequencing environment, which is highly multithreaded, with I/O priorities properly set to cater to live fast basecalling and acquisition. A single-threaded benchmark is not really what I am after.
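The question of an upper limit can at least be framed as back-of-envelope arithmetic: sustained signal output is roughly flow cells × channels per flow cell × sampling rate × bytes per sample, divided by any compression ratio. The Python sketch below assumes 48 flow cells, 3,000 channels per flow cell, 2-byte (int16) samples, and that every channel streams continuously, which overstates real load. The function names and the `compression_ratio` and `disk_share` parameters are hypothetical. It says nothing about POD5's own overhead; it only gives the budget any output format would have to meet.

```python
def signal_write_rate(flow_cells: int, channels_per_cell: int,
                      sample_rate_hz: float, bytes_per_sample: int = 2,
                      compression_ratio: float = 1.0) -> float:
    """Estimated signal bytes per second for the whole device.

    Assumes every channel streams continuously (an upper bound).
    compression_ratio = uncompressed size / compressed size (1.0 = none).
    """
    raw = flow_cells * channels_per_cell * sample_rate_hz * bytes_per_sample
    return raw / compression_ratio


def max_sample_rate(disk_bytes_per_s: float, flow_cells: int,
                    channels_per_cell: int, bytes_per_sample: int = 2,
                    compression_ratio: float = 1.0,
                    disk_share: float = 1.0) -> float:
    """Highest sampling rate (Hz) a given disk budget could absorb.

    disk_share: hypothetical fraction of disk bandwidth left for signal
    output after basecalling, acquisition and other I/O take their share.
    """
    bytes_per_hz = (flow_cells * channels_per_cell * bytes_per_sample
                    / compression_ratio)
    return disk_bytes_per_s * disk_share / bytes_per_hz


if __name__ == "__main__":
    # Assumed P48-style load: 48 flow cells x 3000 channels, int16 samples, 4 kHz.
    rate = signal_write_rate(48, 3000, 4000)
    print(f"{rate / 1e9:.3f} GB/s uncompressed")  # 1.152 GB/s
```

Under these assumptions, 48 × 3,000 × 4,000 × 2 bytes is 1.152 GB/s before compression. Feeding a measured disk throughput and a realistic `disk_share` into `max_sample_rate` would turn the question into a rough ceiling, though only a benchmark on the real machine can confirm it.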

0x55555555 commented on June 2, 2024

> What I am interested in knowing is what is the upper limit of the sampling frequency that POD5 will allow on our PromethION

Hi @hasindu2008, I'm afraid I don't have exact data on this question; it will depend on a number of factors in the hardware and software stack.

POD5 inside MinKNOW is expected to support all the sequencing conditions the sequencer supports (we currently use a 4 kHz sampling rate for most conditions). When writing from a non-live source (repacking or selecting), the operation should run at close to the disk's throughput.

Hope that helps,

  • George

hasindu2008 commented on June 2, 2024

So after all this, it appears that POD5 was not selected as the format for writing on the basis of a comprehensive benchmark. Two of the predictions I made about POD5 (refer to that document) have already come true. Time will tell; more headaches may follow, perhaps even a critical flaw ;).

My conclusion is that, technically, a simple binary format like BLOW5 is much more efficient and better for both reading and writing than a complicated, over-engineered format like POD5, in all practical respects. Had you gone down that path, MinKNOW could by now have been writing it directly with much higher stability and reliability, perhaps as early as 6 months ago, and with much less engineering effort.

Anyway, thank you very much @jorj1988 and @vellamike for doing your best and helping as far as you are allowed to.
