Comments (9)
Hi James,
You're right - there is an issue in the digitisation conversion to ADC range - I will get a release in to fix this ASAP.
WRT keeping range + digitisation, what is the need for these fields explicitly?
As you say, you can calculate digitisation (when everything is working) using adc_max - adc_min, and range can be derived using calibration.scale * digitisation.
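As a rough illustration of the derivation described above, here is a minimal Python sketch. It does not use the pod5 library; the function names and the example values are hypothetical, and the formula is taken as written in the comment (adc_max - adc_min). Whether an extra +1 is needed depends on whether the ADC bounds are inclusive, which should be checked against the library itself.

```python
def derive_digitisation(adc_min: int, adc_max: int) -> int:
    """Digitisation as written in the comment above: adc_max - adc_min.

    Assumption: no +1 is applied; verify against the library if the
    ADC bounds are inclusive.
    """
    return adc_max - adc_min


def derive_range(scale: float, digitisation: int) -> float:
    """Range recovered from the stored scale: scale * digitisation."""
    return scale * digitisation


# Hypothetical calibration values, chosen only for illustration.
adc_min, adc_max = 0, 8192
scale = 0.125

digitisation = derive_digitisation(adc_min, adc_max)
pa_range = derive_range(scale, digitisation)
print(digitisation, pa_range)
```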
We have chosen to store scale and offset in the pod5 format, as those are the values actually needed to convert ADC values to pA.
However, if there is a user need to actually know range and digitisation we could add it to the format - or add methods to the reader to recover the data?
Thanks,
- George
from pod5-file-format.
Hello George,
There are 2 steps in the pA conversion:
- scale = range / digitisation
- pA = scale * (signal + offset)
Keeping scale and offset in pod5 means you can skip straight to step 2. However, step 1 was never data-intensive, and multiple 3rd party tools expect range and digitisation as inputs for pA conversion.
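The two steps can be sketched in plain Python as follows. This is only an illustration of the arithmetic in the list above: the function names, the resolve_scale helper, and the sample values are hypothetical and are not part of any pod5 or fast5 API. The helper shows one way a tool could accept either a stored scale or the range/digitisation pair.

```python
from typing import List, Optional, Sequence


def resolve_scale(
    scale: Optional[float] = None,
    pa_range: Optional[float] = None,
    digitisation: Optional[float] = None,
) -> float:
    """Step 1: scale = range / digitisation, unless a scale is already given."""
    if scale is not None:
        return scale
    if pa_range is None or digitisation is None:
        raise ValueError("need either scale, or both range and digitisation")
    return pa_range / digitisation


def to_picoamps(signal: Sequence[int], scale: float, offset: float) -> List[float]:
    """Step 2: pA = scale * (signal + offset), applied to each raw sample."""
    return [scale * (s + offset) for s in signal]


# Hypothetical raw signal and calibration values.
signal = [100, 200, 300]
offset = 10.0

# fast5-style inputs: range and digitisation.
pa_from_range = to_picoamps(signal, resolve_scale(pa_range=1024.0, digitisation=8192.0), offset)

# pod5-style inputs: scale is stored directly, so step 1 is skipped.
pa_from_scale = to_picoamps(signal, resolve_scale(scale=0.125), offset)

print(pa_from_range, pa_from_scale)
```

Both paths give the same result here because 1024.0 / 8192.0 is exactly 0.125.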
This adds an extra complication when it comes to integration. Normally, when integrating a file format that contains the same data, you just write the data ingester and match the fields to the internal data structure used in the software, and you don't have to change anything else other than maybe input arguments.
In this case, integration would have to add some switch for using pod5 and skipping step 1.
It's not a huge deal, it's mostly around design in the existing ecosystem for ease of adoption.
One last, though probably more important, factor: if you do it this way, fast5->pod5 becomes non-reversible, as there is no way to get the range or digitisation values going backwards pod5->fast5. I know that isn't high on the priority list for moving forward, but from a scientific reproducibility point of view (and probably troubleshooting and growing pains), it is very important.
So it would be good if they could be included, or at least calculated from whatever is in pod5.
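To make the reversibility concern concrete, here is a small sketch of a fast5 -> pod5 -> fast5 round trip for the range value. It is not based on the pod5 library. It assumes, purely for illustration, that the scale is stored as a 32-bit float and that digitisation is still known when converting back; the calibration values are hypothetical.

```python
import math
import struct


def as_float32(value: float) -> float:
    """Round a Python float to the nearest 32-bit float (simulated storage)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical fast5 calibration values.
fast5_range = 1437.976
fast5_digitisation = 8192

# fast5 -> pod5: only the scale is kept (assumed to be a 32-bit float).
stored_scale = as_float32(fast5_range / fast5_digitisation)

# pod5 -> fast5: range can only be rebuilt if digitisation is also available.
recovered_range = stored_scale * fast5_digitisation

# The recovered value matches the original only up to rounding error.
round_trip_ok = math.isclose(recovered_range, fast5_range, rel_tol=1e-6)
print(recovered_range, round_trip_ok)
```

Without digitisation (or the ADC bounds it is derived from), the scale alone does not determine range, which is the gap described above.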
Cheers,
James
Hi @Psy-Fer ,
That's really good info, thanks. I will immediately add accessors so the values can be extracted - as you say, with some potential loss.
I will also discuss with the team internally which fields can be stored in the file, and I will refresh my knowledge on what MinKNOW handles internally - if we internally use offset + scale, then storing range on disk seems pointless.
Thanks,
- George
Hi @Psy-Fer ,
I hope 0.0.17 has resolved a lot of these issues - please let us know any feedback!
- George
Hey George,
Yep that fixed it. Thanks for that.
Cheers,
James
Hey George,
Just re-opening this one.
Any chance we can get the same hooks you fixed up for the Python API also put into the C API?
So we can get the range and digitisation values using the C API? (The Python ones are working great.)
If they are already there and I've just missed them, please let me know where I can find them.
Otherwise, yea, could we get them added?
Cheers,
James
Hi James,
Good shout - I will get the C API updated with these values too.
- George
Hi @Psy-Fer ,
There is a new API call available in 0.0.21 with these values present:
https://github.com/nanoporetech/pod5-file-format/blob/master/c++/pod5_format/c_api.h#L221
Thanks,
- George
Thanks George, I'll give it a go and let you know. Appreciate the turnaround on this too.
- James