Comments (6)

ilia-kats commented on June 2, 2024

It is not. TransposedDataset is a transposed HDF5.Dataset, the type was created to avoid reading the entire dataset into memory. So all of TransposedDataset's operations are forwarded to HDF5.Dataset, after permuting the indices if necessary. HDF5.Dataset does not define strides either.

from muon.jl.

orenbenkiki commented on June 2, 2024

Makes sense, except...

An HDF5 Dataset is not a Julia AbstractArray. Instead it provides read and readmmap (when possible), which allow getting such an array. The result does support strides etc.

In contrast, TransposedDataset does claim to be an AbstractArray, but does not support strides. It also works similarly to an HDF5 Dataset, but it does not support readmmap, ismappable and iscontiguous.

So the API of TransposedDataset provides neither the complete API of an HDF5 Dataset, nor the complete API of a Julia AbstractArray.

Intuitively, it should do both. That is, provide readmmap, ismappable and iscontiguous, just like an HDF5 Dataset; and also provide strides (when possible), just like a Julia AbstractArray.

Naturally readmmap would need to return the Transpose of the memory-mapped matrix (using the LinearAlgebra package). That's a zero-copy view of the data so would still be memory-mapped.
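
A minimal sketch of that idea, using only the Mmap and LinearAlgebra standard libraries. It does not touch HDF5 or Muon.jl: a scratch file stands in for a contiguous on-disk dataset, and the names path, original, mapped and transposed are made up for illustration. Under those assumptions it shows that transposing a memory-mapped matrix yields a lazy wrapper around the same mapped memory, and that the wrapper still reports strides.

```julia
using Mmap
using LinearAlgebra

# Write a small column-major matrix to a scratch file. This is only a
# stand-in for a contiguous on-disk dataset; no HDF5 is involved.
path, io = mktemp()
original = Float64[1 2 3; 4 5 6]   # 2×3
write(io, original)
close(io)

# Memory-map the file as a 2×3 matrix. The mapping stays valid after the
# file handle is closed.
mapped = open(path) do f
    Mmap.mmap(f, Matrix{Float64}, size(original))
end

# Lazy transpose: a 3×2 wrapper around the mapped matrix, no copy is made.
transposed = transpose(mapped)

@show size(transposed)
@show parent(transposed) === mapped
@show strides(mapped)
@show strides(transposed)
```

With a real dataset the mapped matrix would presumably come from HDF5's readmmap rather than Mmap.mmap; that substitution is an assumption here and is not exercised by the sketch.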

ilia-kats commented on June 2, 2024

strides is not part of the AbstractArray interface. AbstractArray only mandates that size and getindex are implemented; everything else is optional. strides is part of the strided array interface.
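
To make the distinction concrete, here is a rough sketch of a wrapper that implements only size and getindex. LazyTransposed is a hypothetical type written for this illustration; it is not the TransposedDataset from Muon.jl, and it wraps an ordinary in-memory matrix rather than an HDF5 dataset.

```julia
# Hypothetical wrapper for illustration only, not Muon.jl's TransposedDataset.
struct LazyTransposed{T,P<:AbstractMatrix{T}} <: AbstractMatrix{T}
    parent::P
end

# The two required AbstractArray methods. Indices are permuted before
# being forwarded to the wrapped matrix.
Base.size(A::LazyTransposed) = reverse(size(A.parent))
Base.getindex(A::LazyTransposed, i::Int, j::Int) = A.parent[j, i]

A = LazyTransposed([1 2 3; 4 5 6])

@show size(A)
@show A[3, 1]
@show sum(A)        # generic fallbacks work through getindex
@show collect(A)    # materializes the transposed matrix

# No strides method was defined for this type, so this reports whether
# Base supplies one on its own.
@show hasmethod(strides, Tuple{typeof(A)})
```

Generic code such as sum and collect works through the two methods alone, while anything that dispatches on strided storage cannot use this type.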

orenbenkiki commented on June 2, 2024

Technically correct; however, not supporting strides for arrays which are actually strided (memory-mappable vectors and matrices) disables all sorts of optimizations when actually working with these arrays.

My workaround for now is to memory-map the array using the internal dset (and transpose it), completely ignoring the fact that TransposedDataset is an AbstractArray.

Of course this runs into issue #24 so it only works for annotations written by the Python anndata package...

ilia-kats commented on June 2, 2024

Note that the strided Array interface also mandates an implementation of Base.unsafe_convert, which returns a pointer to the memory block where the array is stored. This is impossible with either HDF5.Dataset or TransposedDataset.
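
For comparison, a sketch of what the strided array interface asks of a type whose data does live in memory. InMemoryWrapper is a hypothetical type invented for this example. Besides strides and Base.unsafe_convert, the Julia manual also lists Base.elsize for this interface, so it is included. The pointer method is only possible because the wrapped Matrix owns a real memory block, which is exactly what a dataset handle lacks.

```julia
# Hypothetical wrapper around an in-memory Matrix, for illustration only.
struct InMemoryWrapper{T} <: AbstractMatrix{T}
    data::Matrix{T}
end

# Required AbstractArray methods.
Base.size(A::InMemoryWrapper) = size(A.data)
Base.getindex(A::InMemoryWrapper, i::Int, j::Int) = A.data[i, j]

# Strided array interface: memory layout, raw pointer, element size.
Base.strides(A::InMemoryWrapper) = strides(A.data)
Base.unsafe_convert(::Type{Ptr{T}}, A::InMemoryWrapper{T}) where {T} =
    Base.unsafe_convert(Ptr{T}, A.data)
Base.elsize(::Type{InMemoryWrapper{T}}) where {T} = sizeof(T)

A = InMemoryWrapper([1.0 2.0; 3.0 4.0])

@show strides(A)

# pointer(A) goes through the unsafe_convert method defined above.
# GC.@preserve keeps A alive while the raw pointer is in use.
third = GC.@preserve A unsafe_load(pointer(A), 3)
@show third
```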

orenbenkiki commented on June 2, 2024

Good point. So I guess the readmmap workaround is the only option, which depends on #24 (for data written by the Julia package).

Thanks!
