Agreed 100%. I was basically waiting for immutables to land before getting serious about compound-type support, and never got back to it.
Because I don't need this right away, I probably won't get to it immediately; feel free to tackle it, or I'll tackle it myself in a week or two.
from hdf5.jl.
(I have some julia/Profile.jl bugs that need fixing first.)
I've started on this, but I'm still thinking about the best way to handle things. The first decision is whether to store only immutable bits types as compound types, all immutable types as compound types, or all Julia types as compound types.
There is an undeniable appeal to storing all Julia types as compound types. Reading/writing objects that contain bits type fields would be significantly faster, since those fields wouldn't need to be references. The differences between immutables and ordinary objects would just be in the way arrays are handled (i.e., as arrays of references or arrays of values). I think we could even reconstruct missing/changed types by dynamically generating a new type based on the compound type definition. The major downside is that we'd need to break compatibility with existing JLD files, and then either keep a code path around for reading them or write a converter.
I've also been thinking about how to efficiently convert HDF5 compound types to Julia types. "Efficiently" ideally means that, once the compound type is read into memory, we convert the compound type to a Julia type in place and avoid additional allocations. For immutable bits types and arrays thereof, this is easy, since we just need to add padding in the right places. It might even be possible to get the HDF5 library to perform this conversion for us. For normal Julia types, where arrays are stored as references, there isn't necessarily a big advantage to in-place conversion, since we'll never be converting very much data at a time, although we would avoid an allocation for each object. For arrays of immutable types with pointers, we would need to convert HDF5 object references to pointers in-place to avoid allocating a second buffer.
Allocating a second buffer isn't that bad, and to start with I'll probably just do this, but it limits the maximum size of an array of non-bits immutables to roughly half of the system's available memory. This might be avoidable, though, either by giving the HDF5 library custom conversion functions using H5Tregister that would perform in-place conversion of objects with object references, or by reading from the HDF5 file directly from Julia (although I'm not sure how to make this work for chunked datasets).
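For immutable bits types, the conversion described above amounts to copying each field from its packed on-disk offset to the offset Julia uses in memory, where the gaps are alignment padding. Below is a minimal sketch in modern Julia (1.x) syntax, assuming the stored compound type is fully packed (no padding between members, as after HDF5's H5Tpack). The type `Point3` and the helpers `packed_offsets`, `julia_offsets`, `copy_fields!`, `pack`, and `unpack` are hypothetical illustrations, not HDF5.jl or JLD API, and unlike the in-place conversion discussed above this version allocates a fresh output array.

```julia
# Hypothetical immutable bits type (`immutable Point3` in pre-0.6 Julia).
struct Point3
    tag::Int8
    x::Float64
    y::Float32
end

# Offsets of each field when fields are stored back to back (packed), plus
# the total packed record size in bytes.
function packed_offsets(T::DataType)
    offs = Int[]
    pos = 0
    for i in 1:fieldcount(T)
        push!(offs, pos)
        pos += sizeof(fieldtype(T, i))
    end
    return offs, pos
end

# Offsets Julia actually uses in memory; these include alignment padding.
julia_offsets(T::DataType) = [Int(fieldoffset(T, i)) for i in 1:fieldcount(T)]

# Copy every field of n records from one layout to the other, byte for byte.
function copy_fields!(dst::Ptr{UInt8}, dst_offs, dst_stride,
                      src::Ptr{UInt8}, src_offs, src_stride, T::DataType, n::Integer)
    for r in 0:n-1, i in 1:fieldcount(T)
        unsafe_copyto!(dst + r * dst_stride + dst_offs[i],
                       src + r * src_stride + src_offs[i],
                       sizeof(fieldtype(T, i)))
    end
    return nothing
end

# Write direction: Julia array -> packed bytes (padding dropped).
function pack(v::Vector{T}) where {T}
    @assert isbitstype(T)
    poffs, psize = packed_offsets(T)
    packed = Vector{UInt8}(undef, length(v) * psize)
    GC.@preserve v packed begin
        copy_fields!(pointer(packed), poffs, psize,
                     convert(Ptr{UInt8}, pointer(v)), julia_offsets(T), sizeof(T),
                     T, length(v))
    end
    return packed
end

# Read direction: packed bytes -> Julia array ("add padding in the right places").
function unpack(::Type{T}, packed::Vector{UInt8}) where {T}
    @assert isbitstype(T)
    poffs, psize = packed_offsets(T)
    n = div(length(packed), psize)
    out = Vector{T}(undef, n)
    GC.@preserve out packed begin
        copy_fields!(convert(Ptr{UInt8}, pointer(out)), julia_offsets(T), sizeof(T),
                     pointer(packed), poffs, psize, T, n)
    end
    return out
end
```

With this sketch, `unpack(Point3, pack(v)) == v` should hold for any `v::Vector{Point3}`. Each packed record is 13 bytes, while `sizeof(Point3)` is larger because of padding (typically 24 bytes on 64-bit platforms).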
I haven't thought about this in ages, but how would one handle a type declaration like

    mutable struct MyType   # written `type MyType` in pre-0.6 Julia
        x::Real
    end

where, in an array of them, some elements have an x that is a Float64 and others an x that is an Int16?
We'd store anything that's not a bits type as a reference in the compound type, effectively mirroring the way Julia stores types in memory. If we have to reconstruct the type from the compound type definition because the Julia type changed or no longer exists, we'd just leave reference fields untyped.
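The rule in this reply can be sketched in modern Julia (1.x) syntax. This is a minimal illustration, not how JLD or HDF5.jl implements it; it assumes each stored compound member records its field name and declared Julia type. `Sample`, `storage_kind`, `compound_layout`, and `reconstruct_type` are hypothetical names. Bits-type fields are classified as inline members, everything else (including an abstract field type like `x::Real`) as an object reference, and a stand-in type rebuilt from that layout leaves the reference fields untyped (`Any`).

```julia
# Hypothetical type mixing bits and non-bits fields.
struct Sample
    id::Int32
    value::Float64
    label::String   # not a bits type -> stored as a reference
    x::Real         # abstract field type (the MyType case) -> stored as a reference
end

# Bits types can live inline in the compound type; anything else is a reference.
storage_kind(T::Type) = isbitstype(T) ? :inline : :reference

# (field name, declared field type, how the field would be stored)
compound_layout(T::DataType) =
    [(fieldname(T, i), fieldtype(T, i), storage_kind(fieldtype(T, i)))
     for i in 1:fieldcount(T)]

# If the original Julia type is missing or has changed, generate a stand-in
# type from the stored layout: inline fields keep their bits type, and
# reference fields are left untyped (Any).
function reconstruct_type(name::Symbol, layout)
    fields = [kind === :inline ? :($fname::$ftype) : :($fname::Any)
              for (fname, ftype, kind) in layout]
    Core.eval(Main, Expr(:struct, false, name, Expr(:block, fields...)))
    return getfield(Main, name)
end
```

Because `reconstruct_type` uses `Core.eval`, the stand-in type is defined globally in `Main`; a real reader would presumably also need to cache generated types and avoid name clashes, which this sketch ignores. Since each `x::Real` value sits behind its own reference, an array mixing Float64 and Int16 values in that field needs no special handling.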
That seems very reasonable.
Overall I think this sounds like a great plan. As much as it pains me to break JLD compatibility, the reality is that these files are not yet in heavy use, but probably will be some day (I'm just starting to use them in practice in my own work). So now is the time for breakage, if there ever is one. Moreover, since the files record a version number, in principle we have all the information we need. The "converter," if we need one, could even be a current snapshot of jld.jl with a different module name (e.g., JLD01).
As far as efficiency goes, there are presumably places where it matters and places where it doesn't (since I/O is expected to be somewhat limiting). To me it seems that you have a great plan. I agree that, ultimately, we could probably get the HDF5 library to insert padding etc. for us, but also that such optimizations can come second if they're nontrivial to get working.
Making progress: https://gist.github.com/simonster/50d282a533a76eaebbb3
Next step: reading it out.
Oo ooh! Very nice! I'm really excited about this.
Aside from my Vector{Vector{T}} blunder (#123), the major changes that I expect you must be making here were another motivation for the JLDArchives package. Always nice to have an easy way to run tests to see what has been broken.