Comments (5)
Hi George
Thank you for the quick response. (1) is not a big requirement as I can use g++ for now. For (2), how do you envision the MinKNOW output: what would the default batch size be? Also, is MinKNOW going to output one large POD5 file per sequencing run, or multiple POD5 files as is currently done with FAST5?
(3) is what I am mostly interested in. Isn't Arrow capable of internally utilising threads for decompression and/or parsing?
From a C user's perspective, high-level API calls for threading are not strictly necessary, as long as thread-safe functions and an example of how to use them are provided. Can I simply parallelise the following loop? I tried, but it crashes, probably because something is not thread-safe.
for (size_t row = 0; row < batch_row_count; ++row) {
....
}
from pod5-file-format.
Hi @hasindu2008,
Great shouts on the above, I'll try to answer in order:
- This API is on my list to improve. I am planning to add a call that returns all the signal for a read up front, with fewer calls. Any input on the call structure that would work best for you?
- Arrow writes in a specific batch size, so you cannot change that at read time. You can, however, read multiple batches, or half a batch, at once if your code allows this.
- Currently the above code is single-threaded internally, loading only one read into memory. However, I have implemented a multi-threaded reader for the Python API, and it is on my list to move this to the general-purpose C API. Do you have input on how you would like to call this as a user?
I will endeavour to push a release shortly that at least makes getting signal in a single-threaded way easier, then move on to making it faster using multiple threads.
Thanks,
- George
@jorj1988
In addition to the above, it would be great if you could give some insight into how this random access (https://github.com/nanoporetech/pod5-file-format/blob/master/c%2B%2B/examples/find_specific_read_ids.cpp) can be parallelised on the user side.
Is this multi-threading-related crash in POD5 fixed now?
Hello @hasindu2008,
My apologies - this issue has slipped by me.
Yes, this issue is now resolved: it is now safe to read pod5 files from as many threads as you like. For cache efficiency, I would recommend reading one batch at a time in each thread; however, this is not required by the API.
- George
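For readers who want to try the one-batch-per-thread pattern recommended above, here is a minimal sketch of the work distribution only. It assumes C++11 or later and makes no pod5 calls: `for_each_batch_parallel` and `BatchFn` are hypothetical names, not part of the pod5 API, and the callback stands in for whatever code fetches one batch from the file and runs the per-row loop over it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical callback type: handle every row of the batch with this index.
// In real code, the body would fetch the batch from the pod5 reader and run
// the `for (row = 0; row < batch_row_count; ++row)` loop over it.
using BatchFn = std::function<void(std::size_t batch_index)>;

// Calls `process_batch` exactly once for every index in [0, batch_count),
// spread across `thread_count` worker threads. Each worker claims whole
// batches from a shared atomic counter, so no batch is split between threads.
void for_each_batch_parallel(std::size_t batch_count,
                             std::size_t thread_count,
                             const BatchFn& process_batch) {
    assert(thread_count > 0);
    std::atomic<std::size_t> next_batch{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t batch = next_batch.fetch_add(1);
            if (batch >= batch_count) {
                return;  // No batches left to claim.
            }
            process_batch(batch);
        }
    };

    std::vector<std::thread> workers;
    workers.reserve(thread_count);
    for (std::size_t i = 0; i < thread_count; ++i) {
        workers.emplace_back(worker);
    }
    for (std::thread& t : workers) {
        t.join();  // Wait until every batch has been processed.
    }
}
```

The callback should write only to state that belongs to its own batch (for example, one output slot per batch index), so the workers need no locking between them. Claiming batches from a counter, rather than pre-assigning fixed ranges, keeps threads busy when batches take uneven time to process.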