lasersonlab / ndarray.scala
N-dimensional arrays, with Zarr and HDF5 integrations
License: Apache License 2.0
All dependencies of the zarr module are cross-published for JVM and JS except JBlosc.
Cross-publishing to JS would be "nice to have", both for use on the client in #11 and to prepare for adding a scala-native target later.
Native support is currently blocked on Cats, which in turn needs ScalaCheck, so it is probably at least a few months out. It would be really cool, though, as a potential way to support R, or even to be usable from Python in case features make it in here that aren't in the reference Python implementation.
Hi @ryan-williams,
The current example uses null as the fill value, but NaN is also often used.
With our data (NaN as fill_value), if I do
val zarrGroup: zarr.Group = get(Path("/home/lauri/Downloads/GFS/hycom_test").load[Group])
I get:
Not an array:
DecodingFailure(Double, List())
Not a group:
org.lasersonlab.zarr.Group$InvalidChild: Path /home/lauri/Downloads/GFS/hycom_test/lon/.zarray:
Not an array:
org.lasersonlab.io.FileNotFoundException
Not a group:
java.nio.file.NotDirectoryException: /home/lauri/Downloads/GFS/hycom_test/lon/.zarray
at org.lasersonlab.zarr.Group$.$anonfun$apply$20(Group.scala:138)
at scala.util.Either$LeftProjection.map(Either.scala:573)
at org.lasersonlab.zarr.Group$.$anonfun$apply$19(Group.scala:133)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: DecodingFailure(Double, List())
Same data works in Python Zarr.
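For context, the Zarr v2 metadata format stores non-finite float fill values as JSON strings ("NaN", "Infinity", "-Infinity") rather than JSON numbers, so a decoder that only accepts a number would reject them; that may be what the DecodingFailure(Double, ...) above reflects. The following is a minimal sketch of a tolerant parser, using only the standard library. `FillValue.parseDouble` is a hypothetical helper that works on the raw JSON token text; it is not part of this library's API.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper (not part of ndarray.scala): interpret the raw JSON
// token of a float `fill_value` field from a `.zarray` file.
object FillValue {
  def parseDouble(raw: String): Either[String, Option[Double]] =
    raw.trim match {
      case "null"          => Right(None)                          // no fill value
      case "\"NaN\""       => Right(Some(Double.NaN))              // string-encoded NaN
      case "\"Infinity\""  => Right(Some(Double.PositiveInfinity))
      case "\"-Infinity\"" => Right(Some(Double.NegativeInfinity))
      case other =>
        // Anything else is expected to be a plain numeric token
        Try(other.toDouble) match {
          case Success(d) => Right(Some(d))
          case Failure(_) => Left(s"not a float fill_value: $other")
        }
    }
}
```

With a JSON library, the same cases would presumably be expressed as a decoder that tries the number form first and then falls back to the three string forms.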
Currently Zarr arrays, and the ndarray.Vectors that typically back them, support Traverse operations as well as randomly-accessing individual elements.
Slicing along arbitrary dimensions would be good to add.
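To make the request concrete, here is a minimal sketch of slicing a dense, row-major array along every dimension at once. `Dense` is a hypothetical stand-in written for illustration; it is not the library's ndarray.Vector, and a real implementation would likely avoid materializing every index.

```scala
// Hypothetical stand-in for a dense N-d array stored in row-major order.
final case class Dense[A](shape: List[Int], data: Vector[A]) {
  require(shape.product == data.length, "shape does not match element count")

  // Row-major strides: the last dimension varies fastest.
  private val strides: List[Int] = shape.scanRight(1)(_ * _).tail

  // Random access to a single element by its N-d index.
  def apply(idx: Int*): A = {
    require(idx.length == shape.length, "wrong number of indices")
    data(idx.zip(strides).map { case (i, s) => i * s }.sum)
  }

  // Take one range per dimension and copy the selected elements.
  def slice(ranges: Range*): Dense[A] = {
    require(ranges.length == shape.length, "need one range per dimension")
    require(
      ranges.zip(shape).forall { case (r, n) => r.isEmpty || (r.min >= 0 && r.max < n) },
      "range out of bounds"
    )
    // Cartesian product of the per-dimension indices, in row-major order
    val indices = ranges.foldLeft(List(List.empty[Int])) { (acc, r) =>
      for (prefix <- acc; i <- r.toList) yield prefix :+ i
    }
    Dense(ranges.map(_.length).toList, indices.map(ix => apply(ix: _*)).toVector)
  }
}
```

For example, `Dense(List(2, 3), Vector(0, 1, 2, 3, 4, 5)).slice(0 to 1, 1 to 2)` keeps both rows and the last two columns.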
Would be nice to allow e.g.
Array((1000, 1000))(1 to 1000000)
in addition to the current
Array(1000 :: 1000 :: ⊥)(1 to 1000000)
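One way the tuple form could be supported is a small type class that turns a tuple into a list of dimensions. This is only a sketch under that assumption: `ToShape` and `Shaped` are hypothetical names, and the result is a plain (shape, elements) pair rather than the library's Array type.

```scala
// Hypothetical type class: evidence that a T can be read as a list of dimensions.
trait ToShape[T] { def dims(t: T): List[Int] }

object ToShape {
  implicit val fromTuple2: ToShape[(Int, Int)] =
    new ToShape[(Int, Int)] {
      def dims(t: (Int, Int)): List[Int] = List(t._1, t._2)
    }
  implicit val fromTuple3: ToShape[(Int, Int, Int)] =
    new ToShape[(Int, Int, Int)] {
      def dims(t: (Int, Int, Int)): List[Int] = List(t._1, t._2, t._3)
    }
}

// Hypothetical constructor mirroring the proposed call syntax.
object Shaped {
  def apply[T, A](shape: T)(elems: Seq[A])(implicit ev: ToShape[T]): (List[Int], Vector[A]) = {
    val ds = ev.dims(shape)
    require(ds.product == elems.length, "shape does not match element count")
    (ds, elems.toVector)
  }
}
```

With these definitions, `Shaped((2, 3))(1 to 6)` compiles, while a tuple of an unsupported type or arity is rejected at compile time for lack of a `ToShape` instance.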
@tomwhite's old prototype for traversing HDF5 files (via NetCDF) using Spark is still in the singlecell module.
It would be good to add parallelization options to the convert CLI. Most likely a Spark-specific code-path will be necessary for that, though in general it would be great to plumb parallelization through cats.Traverse, using Cats' Parallel. Futures and cloud-functions might be workable there.
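As a rough illustration of the Future-based option, the sketch below converts chunks concurrently with the standard library's `Future.traverse`. `convertChunk` is a hypothetical placeholder for the per-chunk work; it is not taken from the convert CLI, and a Cats-based version would look different.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ParallelConvert {
  // Hypothetical stand-in for the real per-chunk conversion work.
  def convertChunk(chunk: Vector[Int]): Vector[Double] = chunk.map(_.toDouble)

  // Start one Future per chunk; the resulting list keeps the input order.
  def convertAll(chunks: List[Vector[Int]])(
    implicit ec: ExecutionContext
  ): Future[List[Vector[Double]]] =
    Future.traverse(chunks)(c => Future(convertChunk(c)))
}
```

The degree of parallelism here is whatever the supplied `ExecutionContext` provides, so a caller could bound it by passing a fixed-size thread pool.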
Put a simple webapp on top of the zarr module that allows viewing (cloud-resident) Zarr (or HDF5) files in the browser
something like HDFView, but web-based, targeting data in the cloud, and working for multiple CDM-style formats
directory/file listings and tree-maps based on file sizes are features I've personally really missed so far while interacting with these formats