Comments (5)

hasindu2008 commented on June 2, 2024

Hi George

Thank you for the quick response. (1) is not a big requirement, as I can use g++ for now. For (2), what do you envision the MinKNOW output looking like, and what would the default batch size be? Also, will MinKNOW output one large POD5 file per sequencing run, or multiple POD5 files as is currently done with FAST5?

(3) is what I am most interested in. Isn't Arrow capable of using threads internally for decompression and/or parsing?
From a C user's perspective, high-level API calls for threading are not really necessary, as long as thread-safe functions and an example of how to use them are provided. Can I simply parallelise the following loop? I tried, but it crashes, probably because something is not thread-safe.

for (size_t row = 0; row < batch_row_count; ++row) {
....
}
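
A minimal sketch of how the row loop could be split across threads, assuming the per-row work only reads from the batch and writes to state owned by that row. `process_row` and `process_rows_parallel` are hypothetical names standing in for the elided loop body; they are not part of the POD5 API, and the sketch is only safe if the library calls made inside `process_row` are themselves thread-safe. A common cause of crashes in this kind of change is threads sharing one scratch buffer or output counter, so each thread here writes only to its own slots in `results`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the body of the original loop ("....").
// It must only touch state owned by `row` (no shared mutable buffers).
using RowFn = std::function<long(std::size_t row)>;

// Split rows [0, batch_row_count) into contiguous ranges, one per thread.
// Each thread writes only to results[row] for rows in its own range,
// so no locking is needed.
std::vector<long> process_rows_parallel(std::size_t batch_row_count,
                                        unsigned num_threads,
                                        const RowFn& process_row) {
    assert(num_threads > 0);
    std::vector<long> results(batch_row_count);
    std::vector<std::thread> workers;
    const std::size_t per_thread = (batch_row_count + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min<std::size_t>(t * per_thread, batch_row_count);
        const std::size_t end = std::min(begin + per_thread, batch_row_count);
        if (begin == end) break;  // fewer rows than threads: no work left
        workers.emplace_back([&results, &process_row, begin, end] {
            for (std::size_t row = begin; row < end; ++row) {
                results[row] = process_row(row);
            }
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Static contiguous ranges keep the code simple; if rows vary a lot in cost, pulling row indices from a shared atomic counter would balance the load better.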

from pod5-file-format.

0x55555555 commented on June 2, 2024

Hi @hasindu2008,

Great shouts on the above, I'll try to answer in order:

  1. This API is on my list to improve: I am planning to add a call that returns all the signal for a read up front, with fewer calls. Any input on the call structure that would work best for you?
  2. Arrow writes data in a fixed batch size, so you cannot change it at read time; you can, however, read multiple batches, or half a batch, at once if your code allows this.
  3. Currently the above code is single-threaded internally, loading only one read into memory. However, I have implemented a multi-threaded reader for the Python API, and moving it to the general-purpose C API is also on my list. Do you have any input on how you would like to call this as a user?

I will endeavour to push a release shortly that at least makes getting signal in a single-threaded way easier, then move on to making it faster using multiple threads.

Thanks,

  • George

hasindu2008 commented on June 2, 2024

@jorj1988
In addition to the above, it would be great if you could give some insight into how this random access (https://github.com/nanoporetech/pod5-file-format/blob/master/c%2B%2B/examples/find_specific_read_ids.cpp) can be parallelised on the user side.
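
One way to parallelise random access on the user side is to first group the requested reads by the batch they live in, then let worker threads take whole batches, so each batch is loaded once and by a single thread. The sketch below assumes you already have each read's (batch, row) location, however it was looked up; `ReadLocation`, `BatchWork` and `handle_batch` are hypothetical names, not POD5 API types or functions, and `handle_batch` must be safe to call concurrently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical: where a requested read sits inside the file.
struct ReadLocation {
    std::size_t batch;
    std::uint32_t row;
};

// Hypothetical unit of work: a batch index plus the sorted rows wanted from it.
struct BatchWork {
    std::size_t batch;
    std::vector<std::uint32_t> rows;
};

// Group requested reads by batch so each batch is loaded only once.
std::vector<BatchWork> group_by_batch(const std::vector<ReadLocation>& locations) {
    std::map<std::size_t, std::vector<std::uint32_t>> grouped;  // ordered by batch index
    for (const auto& loc : locations) grouped[loc.batch].push_back(loc.row);
    std::vector<BatchWork> work;
    for (auto& [batch, rows] : grouped) {
        std::sort(rows.begin(), rows.end());
        work.push_back({batch, std::move(rows)});
    }
    return work;
}

// Threads pull whole batches from a shared atomic counter, so a batch with
// many requested reads does not hold up the other threads.
void process_batches_parallel(const std::vector<BatchWork>& work, unsigned num_threads,
                              const std::function<void(const BatchWork&)>& handle_batch) {
    assert(num_threads > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (std::size_t i = next++; i < work.size(); i = next++) {
                handle_batch(work[i]);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

An atomic counter is used instead of a fixed split because the number of requested reads per batch can differ widely.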

hasindu2008 commented on June 2, 2024

Is this multi-threading-related crash in POD5 fixed now?

0x55555555 commented on June 2, 2024

Hello @hasindu2008,

My apologies - this issue has slipped by me.

Yes, this issue is now resolved: it is now safe to read POD5 files from as many threads as you like. For cache efficiency, I would recommend reading one batch at a time in each thread; however, this is not required by the API.

  • George
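
The recommendation of reading one batch at a time in each thread can be sketched as follows. This is a minimal sketch, assuming `process_batch` is a hypothetical callback that would fetch one batch, process its rows and release it before returning; no actual POD5 calls are shown, and `process_file_batch_per_thread` is likewise an illustrative name. Each thread accumulates into a local variable and writes its total once, so threads never contend on a shared counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-batch work: in real code this would load batch
// `batch_index`, iterate its rows and free the batch before returning.
using BatchFn = std::function<std::uint64_t(std::size_t batch_index)>;

// Thread t handles batches t, t + num_threads, t + 2 * num_threads, ...
// so each batch is only ever touched by one thread.
std::uint64_t process_file_batch_per_thread(std::size_t batch_count, unsigned num_threads,
                                            const BatchFn& process_batch) {
    assert(num_threads > 0);
    std::vector<std::uint64_t> per_thread_totals(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            std::uint64_t local = 0;  // thread-local running total
            for (std::size_t b = t; b < batch_count; b += num_threads) {
                local += process_batch(b);
            }
            per_thread_totals[t] = local;  // one write per thread, own slot
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (auto v : per_thread_totals) total += v;
    return total;
}
```

The round-robin split assumes batches are of roughly similar size, which is reasonable when the writer uses a fixed batch size (the last batch may be smaller); if that does not hold, an atomic work counter balances better.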
