Comments (6)
When writing, tensorstore sometimes keeps a mask of the values to overwrite. This error indicates that such a mask is present but the corresponding data array is missing.
This is only a fragment of the code. Can you reduce it to a minimal single function with all of the code necessary to reproduce it?
from tensorstore.
Hi,
here is an example where I tried to summarize what my code is doing.
It is split into two parts, because the config generation happens on the main node before multiple nodes write the dataset in parallel. The boundaries of each node should always align with shard boundaries.
// generate multiscale neuroglancer config - done on the main node
tensorstore::Open(
    {{"driver", "neuroglancer_precomputed"},
     {"kvstore",
      {
          {"driver", "file"},
          {"path", "<path_to_new_dataset>"},
      }},
     {"multiscale_metadata",
      {
          {"data_type", "uint8_t"},
          {"num_channels", numberChannels},
          {"type", "image"},
      }},
     {"scale_metadata",
      {
          {"resolution", {resolution[0], resolution[1], resolution[2]}},
          {"encoding", encoding},
          {"chunk_size", {chunkSize[0], chunkSize[1], chunkSize[2]}},
          {"size", {sizeX, sizeY, sizeZ}},
          {"sharding",
           {{"@type", "neuroglancer_uint64_sharded_v1"},
            {"preshift_bits", neuroBits.preShiftBits},
            {"minishard_bits", neuroBits.miniShardBits},
            {"shard_bits", neuroBits.shardBits},
            {"minishard_index_encoding", "raw"},
            {"hash", "identity"}}},
      }},
     {"scale_index", index}},
    context, tensorstore::OpenMode::create,
    tensorstore::RecheckCached{false},
    tensorstore::ReadWriteMode::write).value();
// reopen tensorstore - the following code runs in parallel on multiple nodes
tensorstore::TensorStore store =
    tensorstore::Open({{"driver", "neuroglancer_precomputed"},
                       {"kvstore", {{"driver", "file"},
                                    {"path", "<path_to_new_dataset>"}}},
                       {"scale_index", index}},
                      context, tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{true},
                      tensorstore::ReadWriteMode::write).value();
// size and interval of the array should be aligned with the shard
boost::multi_array<uint8_t, 4> dataArray(
    boost::extents[D4][endD3 - startD3][endD2 - startD2][endD1 - startD1]);
// load data into array - not tensorstore
// do stuff with array
// write data to tensorstore
std::vector<int64_t> shape = {endD1 - startD1, endD2 - startD2,
                              endD3 - startD3, D4};
auto intervalD1 = tensorstore::Dims(0).HalfOpenInterval(startD1, endD1);
auto intervalD2 = tensorstore::Dims(1).HalfOpenInterval(startD2, endD2);
auto intervalD3 = tensorstore::Dims(2).HalfOpenInterval(startD3, endD3);
auto arr =
    tensorstore::Array(dataArray.data(), shape, tensorstore::fortran_order);
auto writeFuture =
    tensorstore::Write(tensorstore::UnownedToShared(arr),
                       store | intervalD1 | intervalD2 | intervalD3);
writeFuture.commit_future.value();
auto result = writeFuture.result();
exceptional_assert(result.ok(), "Error while writing to disk");
Do you need more information?
Thank you for your help.
Let's try and get a self-contained repro case; I've built a googletest test incorporating your spec.
Is it possible for you to edit this self-contained test so that it fails?
#include <cstdint>
#include <vector>
#include <gtest/gtest.h>
#include "absl/status/status.h"
#include "tensorstore/array.h"
#include "tensorstore/context.h"
#include "tensorstore/contiguous_layout.h"
#include "tensorstore/index_space/dim_expression.h"
#include "tensorstore/open.h"
#include "tensorstore/open_mode.h"
#include "tensorstore/staleness_bound.h"
#include "tensorstore/tensorstore.h"
#include "tensorstore/util/status_testutil.h"
// Boost
#include "boost/multi_array.hpp"
static constexpr int D4 = 1;
absl::Status CreateTensorstore(tensorstore::Context context) {
int chunkSize[] = {16, 16, 16};
int size[] = {1024, 1024, 1024};
int preShiftBits = 2;
int miniShardBits = 4;
int shardBits = 8;
return tensorstore::Open(
{
{"driver", "neuroglancer_precomputed"},
{"kvstore",
{
{"driver", "memory"},
{"path", "prefix/"},
}},
{"multiscale_metadata",
{
{"data_type", "uint8"}, // not uint8_t
{"num_channels", D4},
{"type", "image"},
}},
{"scale_metadata",
{
{"resolution", {1.0, 1.0, 1.0}},
{"encoding", "raw"},
{"chunk_size",
{chunkSize[0], chunkSize[1], chunkSize[2]}},
{"size", {size[0], size[1], size[2]}},
{"sharding",
{{"@type", "neuroglancer_uint64_sharded_v1"},
{"preshift_bits", preShiftBits},
{"minishard_bits", miniShardBits},
{"shard_bits", shardBits},
{"minishard_index_encoding", "raw"},
{"hash", "identity"}}},
}},
{"scale_index", 0},
},
context, tensorstore::OpenMode::create,
tensorstore::RecheckCached{false},
tensorstore::ReadWriteMode::write)
.status();
}
TEST(Issue155, Repro) {
tensorstore::Context context = tensorstore::Context::Default();
TENSORSTORE_ASSERT_OK(CreateTensorstore(context));
int startD1 = 0;
int endD1 = 64;
int startD2 = 0;
int endD2 = 64;
int startD3 = 0;
int endD3 = 64;
// reopen tensorstore - following code is done in parallel on multiple nodes
TENSORSTORE_ASSERT_OK_AND_ASSIGN(
auto store,
tensorstore::Open(
{
{"driver", "neuroglancer_precomputed"},
{"kvstore",
{
{"driver", "memory"},
{"path", "prefix/"},
}},
{"scale_index", 0},
},
context, tensorstore::OpenMode::open,
tensorstore::RecheckCached{true}, tensorstore::ReadWriteMode::write)
.result());
// size and interval of array should be aligned with shard
boost::multi_array<uint8_t, 4> dataArray(
boost::extents[D4][endD3 - startD3][endD2 - startD2][endD1 - startD1]);
// load data into array - not tensorstore
// do stuff with array
// write data to
std::vector<int64_t> shape = {endD1 - startD1, endD2 - startD2,
endD3 - startD3, D4};
auto intervalD1 = tensorstore::Dims(0).HalfOpenInterval(startD1, endD1);
auto intervalD2 = tensorstore::Dims(1).HalfOpenInterval(startD2, endD2);
auto intervalD3 = tensorstore::Dims(2).HalfOpenInterval(startD3, endD3);
auto arr =
tensorstore::Array(dataArray.data(), shape, tensorstore::fortran_order);
auto writeFuture =
tensorstore::Write(tensorstore::UnownedToShared(arr),
store | intervalD1 | intervalD2 | intervalD3);
writeFuture.commit_future.Wait();
TENSORSTORE_ASSERT_OK(writeFuture.commit_future.result());
}
You will need to put together a proper BUILD rule for this.
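A BUILD rule for the test above might look roughly like this; the dependency labels below are guesses derived from the #include list, so check them against the actual tensorstore workspace, and the boost dependency is left as a placeholder:

```starlark
cc_test(
    name = "issue155_repro_test",
    srcs = ["issue155_repro_test.cc"],
    deps = [
        "//tensorstore:array",
        "//tensorstore:context",
        "//tensorstore:open",
        "//tensorstore/driver/neuroglancer_precomputed",
        "//tensorstore/index_space:dim_expression",
        "//tensorstore/kvstore/memory",
        "//tensorstore/util:status_testutil",
        "@com_google_googletest//:gtest_main",
        # plus whatever target provides boost::multi_array in your workspace
    ],
)
```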
I don't think I can get this test case to crash.
The only time I have seen this error is when multiple workers run on multiple nodes at the same time and all open and write to the same dataset asynchronously. They should not interfere with each other, because each worker should be working on its own shard.
But I have also observed that tensorstore appears to load some surrounding data, even when it is not needed for the intended operation.
Do you attempt to create on every node? Or is the create completely independent?
We can make it a simple binary with a --create flag which takes start/end[1-3] as parameters.
You can try running your original code with verbose logging enabled. See https://github.com/google/tensorstore/blob/master/tensorstore/internal/log/verbose_flag.h
TENSORSTORE_VERBOSE_LOGGING=all
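For instance, an invocation could look like this (the binary name and flags below are placeholders following the --create suggestion above):

```shell
# One-off: enable all verbose-logging channels for a single run.
# ./repro, --create and the start/end flags are placeholder names.
#   TENSORSTORE_VERBOSE_LOGGING=all ./repro --create --start1=0 --end1=64
# Or export the variable for the whole session:
export TENSORSTORE_VERBOSE_LOGGING=all
echo "TENSORSTORE_VERBOSE_LOGGING=$TENSORSTORE_VERBOSE_LOGGING"
```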
Thanks for reporting this.
I identified the bug, and have a fix that we can hopefully push out shortly.
I believe the specific case that would trigger this is:
- Process 1: Write all zeros to just a portion of a chunk. Starts writeback of shard. Observes that the chunk is equal to the fill value (all zero) because the existing chunk is either not present, or all unmodified elements are zero. At this point, the data array is freed since it is equal to the fill value, but the mask remains as it was to indicate a partial modification.
- Process 2: Concurrently modifies the shard.
- Process 1: Writeback must be retried due to the concurrent modification. When integrating the new contents of the shard, an assertion is triggered by the unexpected combination of a partial modification with no data array.
There are two important things to note, though:
- If this sequence is indeed what triggered the bug, then that means your writes are in fact not shard aligned as you thought they were. Even with the bug fixed, shard aligned writes will be much more efficient.
- The assertion only triggers in debug builds (with NDEBUG not defined). For production use, disabling assertions may make it significantly faster. Usually NDEBUG will be defined automatically in release builds, so you may want to check to confirm you are building with optimizations.