

0x55555555 commented on June 2, 2024

Hi James,

You're right, there is an issue in the conversion between digitisation and the ADC range. I will get a release out to fix this ASAP.

With regard to keeping range and digitisation, why do you need these fields explicitly?

As you say, you can calculate digitisation (when everything is working) using adc_max - adc_min, and range can be derived using calibration.scale * digitisation.
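A minimal sketch of the two derivations, written exactly as the formulas are stated in this comment. The helper names and the ADC bounds used below are hypothetical and purely illustrative; since the thread itself reports a bug in this conversion, the released pod5 reader may define digitisation slightly differently, so check it before relying on these exact expressions.

```python
def digitisation_from_adc(adc_min: int, adc_max: int) -> int:
    """Digitisation as stated in the comment: adc_max - adc_min."""
    # Follows the thread's formula verbatim; the released reader may
    # define the ADC bounds (and so this difference) slightly differently.
    return adc_max - adc_min


def range_from_scale(scale: float, digitisation: int) -> float:
    """Range as stated in the comment: calibration.scale * digitisation."""
    return scale * digitisation


# Illustrative values only, not taken from a real device.
digitisation = digitisation_from_adc(adc_min=-4096, adc_max=4096)
adc_range = range_from_scale(scale=0.25, digitisation=digitisation)
print(digitisation, adc_range)  # 8192 2048.0
```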

We have chosen to store scale and offset in the pod5 format, as they are the values actually needed to convert ADC values to pA.

However, if there is a user need to know range and digitisation, we could add them to the format, or add methods to the reader to recover the data?

Thanks,

  • George

from pod5-file-format.

Psy-Fer commented on June 2, 2024

Hello George,

These are the two steps in the pA conversion:

  1. scale = range / digitisation
  2. pA = scale * (signal + offset)
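The two steps above can be sketched in plain Python as follows. The function names and the sample values are hypothetical and chosen only so the arithmetic is easy to check by hand; a real tool would typically apply step 2 to a whole signal array at once.

```python
from typing import List


def scale_from_range(adc_range: float, digitisation: float) -> float:
    # Step 1: scale = range / digitisation
    return adc_range / digitisation


def to_pa(signal: List[int], scale: float, offset: float) -> List[float]:
    # Step 2: pA = scale * (signal + offset), applied to every raw sample
    return [scale * (s + offset) for s in signal]


# Illustrative values: range 2048.0 and digitisation 8192 give scale 0.25.
scale = scale_from_range(2048.0, 8192)
print(to_pa([100, 200, -8], scale, offset=8.0))  # [27.0, 52.0, 0.0]
```

If a file already stores scale and offset, only `to_pa` is needed; if it stores range and digitisation, both steps run.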

Keeping scale and offset in pod5 means you can skip straight to step 2. However, step 1 was never computationally expensive, and multiple third-party tools expect range and digitisation as inputs for pA conversion.
This adds an extra complication when it comes to integration. Normally, when integrating a file format that contains the same data, you just write the data ingester and map its fields to the internal data structures used by the software; nothing else needs to change, except perhaps the input arguments.

In this case, integration would need some switch that detects pod5 input and skips step 1.
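A rough sketch of what such a switch might look like inside a third-party tool. The `Calibration` record and `load_calibration` helper are hypothetical, not part of pod5 or any existing tool: fast5-style input supplies range and digitisation (step 1 runs), while pod5-style input supplies scale directly (step 1 is skipped).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Calibration:
    """Hypothetical internal record: pA = scale * (signal + offset)."""
    offset: float
    scale: float


def load_calibration(
    offset: float,
    adc_range: Optional[float] = None,
    digitisation: Optional[float] = None,
    scale: Optional[float] = None,
) -> Calibration:
    """Build a Calibration from either fast5-style or pod5-style fields."""
    if scale is not None:
        # pod5-style input: scale is already stored, so step 1 is skipped.
        return Calibration(offset=offset, scale=scale)
    if adc_range is None or digitisation is None:
        raise ValueError("need either scale, or both adc_range and digitisation")
    # fast5-style input: step 1, scale = range / digitisation.
    return Calibration(offset=offset, scale=adc_range / digitisation)


fast5_like = load_calibration(offset=8.0, adc_range=2048.0, digitisation=8192)
pod5_like = load_calibration(offset=8.0, scale=0.25)
print(fast5_like == pod5_like)  # True
```

The branch is small, but it is exactly the kind of format-specific special case that a plain field-to-field mapping would otherwise avoid.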

It's not a huge deal; it's mostly about fitting the design of the existing ecosystem, for ease of adoption.

One last, and probably more important, factor: if you do it this way, fast5->pod5 conversion becomes non-reversible, as there is no way to recover the range or digitisation values going backwards (pod5->fast5). I know that isn't high on the priority list going forward, but from a scientific reproducibility point of view (and probably for troubleshooting and growing pains), it is very important.

So it would be good if they could be included, or at least calculated from whatever is in pod5.
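To make the reversibility point concrete, here is a small sketch showing that storing only scale loses information: different (range, digitisation) pairs collapse to the same ratio. The pairs are illustrative values, not real device settings.

```python
def scale_of(adc_range: float, digitisation: int) -> float:
    # Only this ratio survives if just scale is written to the file.
    return adc_range / digitisation


# Three different illustrative (range, digitisation) pairs...
pairs = [(2048.0, 8192), (1024.0, 4096), (512.0, 2048)]
# ...all map to the same stored scale, so the original pair can't be recovered.
scales = {scale_of(r, d) for r, d in pairs}
print(scales)  # {0.25}
```

Given only scale = 0.25, any of these pairs is equally plausible; range becomes recoverable as scale * digitisation only if digitisation (or the ADC bounds it is derived from) is also kept.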

Cheers,
James


0x55555555 commented on June 2, 2024

Hi @Psy-Fer ,

That's really good info, thanks. I will immediately add accessors so the values can be extracted, with some potential loss, as you say.

I will also discuss with the team internally which fields should be stored in the file, and I will refresh my knowledge of what MinKNOW handles internally: if we use offset + scale internally, then storing range on disk seems pointless.

Thanks,

  • George


0x55555555 commented on June 2, 2024

Hi @Psy-Fer ,

I hope 0.0.17 has resolved a lot of these issues; please let us know any feedback!

  • George


Psy-Fer commented on June 2, 2024

Hey George,

Yep, that fixed it. Thanks for that.

Cheers,
James


Psy-Fer commented on June 2, 2024

Hey George,

Just re-opening this one.
Any chance the same hooks you added to the Python API could also be put into the C API?

That way we can get the range and digitisation values using the C API (the Python ones are working great).

If they are already there and I've just missed them, please let me know where I can find them.
Otherwise, yeah, could we get them added?

Cheers,
James


0x55555555 commented on June 2, 2024

Hi James,

Good shout, I will get the C API updated with these values too.

  • George


0x55555555 commented on June 2, 2024

Hi @Psy-Fer ,

There is a new API call available in 0.0.21 with these values present:

https://github.com/nanoporetech/pod5-file-format/blob/master/c++/pod5_format/c_api.h#L221

Thanks,

  • George


Psy-Fer commented on June 2, 2024

Thanks George, I'll give it a go and let you know. Appreciate the turnaround on this too.

  • James

