Comments (2)
Thanks for investigating this so thoroughly!
We can probably ensure that tensorstore includes the uncompressed size in the header in this case, but in general multiple variable-output-size codecs may be chained, and it is desirable to support streaming encoding, where the size is not known when the header is written.
Therefore, in addition to that change, other implementations should still support decoding without the size in the header.
from tensorstore.
Here's an illustration of saving a zarr array with tensorstore using zstd compression. python-zstandard is unable to open the chunk unless max_output_size is provided.
In [1]: import tensorstore as ts, zstandard as zstd

In [2]: ds = ts.open({
   ...:     'driver': 'zarr',
   ...:     'kvstore': {
   ...:         'driver': 'file',
   ...:         'path': 'tmp/zarr_zstd_dataset',
   ...:     },
   ...:     'metadata': {
   ...:         'compressor': {
   ...:             'id': 'zstd',
   ...:             'level': 3,
   ...:         },
   ...:         'shape': [1024, 1024],
   ...:         'chunks': [64, 64],
   ...:         'dtype': '|u1',
   ...:     }
   ...: }).result()

In [3]: ds[:] = 5

In [4]: with open("tmp/zarr_zstd_dataset/0/0", "rb") as f:
   ...:     src = f.read()
   ...:

In [5]: zstd.backend_c.frame_content_size(src)
Out[5]: -1

In [6]: zstd.ZstdDecompressor().decompress(src)
---------------------------------------------------------------------------
ZstdError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 zstd.ZstdDecompressor().decompress(src)

ZstdError: could not determine content size in frame header

In [7]: zstd.ZstdDecompressor().decompress(src, max_output_size=1024*1024)
Out[7]: b'\x05\x05\x05\x05\x05\x05\x05\x05\x05 [...] \x05\x05\x05 '
For an example of being unable to open the dataset with zarr-python, see zarr-developers/zarr-python#2056.
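To see why the content size lookup in the session above reports -1, it can help to look at the frame header directly. The following is a minimal standard-library sketch based on my reading of the Zstandard frame format (RFC 8878); `declared_content_size` is a hypothetical helper, not part of tensorstore or python-zstandard, and it ignores skippable frames and anything past the first frame header.

```python
from typing import Optional

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # magic number 0xFD2FB528, little-endian


def declared_content_size(frame: bytes) -> Optional[int]:
    """Return the content size stored in the frame header, or None if absent."""
    if len(frame) < 5 or frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")

    descriptor = frame[4]
    fcs_flag = descriptor >> 6              # Frame_Content_Size_flag (2 bits)
    single_segment = (descriptor >> 5) & 1  # Single_Segment_flag
    dict_id_flag = descriptor & 0b11        # Dictionary_ID_flag

    # Flag 0 means "no size field" unless the frame is single-segment.
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_size == 0:
        return None

    # The size field follows the optional window descriptor and dictionary ID.
    offset = 5 + (0 if single_segment else 1) + (0, 1, 2, 4)[dict_id_flag]
    field = frame[offset:offset + fcs_size]
    if len(field) < fcs_size:
        raise ValueError("truncated frame header")

    value = int.from_bytes(field, "little")
    # Two-byte size fields are stored with an offset of 256.
    return value + 256 if fcs_size == 2 else value


# Hand-built headers (no compressed blocks), for illustration only.
no_size = ZSTD_MAGIC + bytes([0x00, 0x58])    # descriptor 0, window descriptor
with_size = ZSTD_MAGIC + bytes([0x20, 0x05])  # single-segment, 1-byte size = 5
```

Applied to a chunk like `src` above, a `None` result would correspond to the -1 that python-zstandard reports.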