Comments (3)
@jasonkena - thanks for the report. Your diagnosis seems correct, but I'm not sure what we want to do about it. It's quite expensive to always reload metadata to protect against metadata modifications by another writer.
Finally, I should note that we haven't settled on whether or not to keep the synchronizer API around for the 3.0 release (it is not currently included).
from zarr-python.
The problem seems to be that the cached metadata is not updated after the shape is resized in another thread/process, leading to dropped rows.
I found two workarounds:
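- defining a custom append function which forces the metadata to be reloaded before each append:
    def fixed_append(arr, data, axis=0):
        def fixed_append_nosync(data, axis=0):
            # refresh the cached shape before computing where to write
            arr._load_metadata_nosync()
            return arr._append_nosync(data, axis=axis)
        # _write_op runs the inner function while holding the synchronizer lock
        return arr._write_op(fixed_append_nosync, data, axis=axis)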
- specifying cache_metadata=False to force reloading at all data accesses
Perhaps the default value for cache_metadata should be False when synchronizer is specified to prevent this behavior?
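To make the failure mode concrete, here is a toy model that runs without zarr. It is only a sketch of the mechanism as I understand it, not zarr's implementation: ToyStore, ToyArray and run are hypothetical names, and the two handles stand in for two processes that each opened the same array. With a cached row count, the second writer overwrites the first writer's rows even though every append holds the lock; reloading the metadata inside the lock avoids that.

```python
import threading


class ToyStore:
    """Shared backing store: authoritative metadata plus the stored rows."""

    def __init__(self):
        self.meta = {"nrows": 0}
        self.rows = {}


class ToyArray:
    """Hypothetical handle onto a ToyStore that caches the row count."""

    def __init__(self, store, lock, cache_metadata=True):
        self.store = store
        self.lock = lock
        self.cache_metadata = cache_metadata
        self._nrows = store.meta["nrows"]  # cached copy, may go stale

    def _load_metadata(self):
        self._nrows = self.store.meta["nrows"]

    def append(self, values):
        with self.lock:  # the lock alone does not refresh the cache
            if not self.cache_metadata:
                self._load_metadata()
            start = self._nrows
            for offset, value in enumerate(values):
                self.store.rows[start + offset] = value
            self._nrows = start + len(values)
            self.store.meta["nrows"] = self._nrows


def run(cache_metadata):
    """Append from two handles in turn; return (row count, stored rows)."""
    store, lock = ToyStore(), threading.Lock()
    first = ToyArray(store, lock, cache_metadata)
    second = ToyArray(store, lock, cache_metadata)
    first.append([1, 2])
    second.append([3, 4])  # with caching, second still believes nrows == 0
    return store.meta["nrows"], [store.rows[k] for k in sorted(store.rows)]


if __name__ == "__main__":
    print(run(cache_metadata=True))   # rows from the first writer are lost
    print(run(cache_metadata=False))  # all four rows survive
```

The cost is the one mentioned in the reply: with cache_metadata=False in this model, every append pays for an extra metadata read.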
I believe this resolves these StackOverflow questions:
- https://stackoverflow.com/questions/61929796/parallel-appending-to-a-zarr-store-via-xarray-to-zarr-and-dask
- https://stackoverflow.com/questions/61799664/how-can-one-write-lock-a-zarr-store-during-append
Oddly enough, both workarounds fail when working with in-memory zarr arrays (initialized with zarr.zeros(...)).