Comments (6)
When writing, tensorstore sometimes keeps a mask of the values to overwrite. This error indicates that such a mask is present but the corresponding data array is missing.
This is only a fragment of the code. Can you reduce it to a minimal single function with all of the code necessary to reproduce it?
from tensorstore.
Hi,
here is an example where I tried to summarize what my code is doing.
It is split into two parts, because the config generation happens on the main node before multiple nodes write the dataset in parallel. The boundaries of each node should always align with shard boundaries.
// generate multiscale neuroglancer config - done on the main node
tensorstore::Open(
    {{"driver", "neuroglancer_precomputed"},
     {"kvstore",
      {
          {"driver", "file"},
          {"path", "<path_to_new_dataset>"},
      }},
     {"multiscale_metadata",
      {
          {"data_type", "uint8_t"},
          {"num_channels", numberChannels},
          {"type", "image"},
      }},
     {"scale_metadata",
      {
          {"resolution", {resolution[0], resolution[1], resolution[2]}},
          {"encoding", encoding},
          {"chunk_size", {chunkSize[0], chunkSize[1], chunkSize[2]}},
          {"size", {sizeX, sizeY, sizeZ}},
          {"sharding",
           {{"@type", "neuroglancer_uint64_sharded_v1"},
            {"preshift_bits", neuroBits.preShiftBits},
            {"minishard_bits", neuroBits.miniShardBits},
            {"shard_bits", neuroBits.shardBits},
            {"minishard_index_encoding", "raw"},
            {"hash", "identity"}}},
      }},
     {"scale_index", index}},
    context, tensorstore::OpenMode::create,
    tensorstore::RecheckCached{false},
    tensorstore::ReadWriteMode::write).value();
// reopen tensorstore - the following code runs in parallel on multiple nodes
tensorstore::TensorStore store =
    tensorstore::Open({{"driver", "neuroglancer_precomputed"},
                       {"kvstore", {{"driver", "file"},
                                    {"path", "<path_to_new_dataset>"}}},
                       {"scale_index", index}},
                      context, tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{true},
                      tensorstore::ReadWriteMode::write).value();
// size and interval of the array should be aligned with the shard
boost::multi_array<uint8_t, 4> dataArray(
    boost::extents[D4][endD3 - startD3][endD2 - startD2][endD1 - startD1]);
// load data into array - not tensorstore
// do stuff with array
// write data to tensorstore
std::vector<int64_t> shape = {endD1 - startD1, endD2 - startD2,
                              endD3 - startD3, D4};
auto intervalD1 = tensorstore::Dims(0).HalfOpenInterval(startD1, endD1);
auto intervalD2 = tensorstore::Dims(1).HalfOpenInterval(startD2, endD2);
auto intervalD3 = tensorstore::Dims(2).HalfOpenInterval(startD3, endD3);
auto arr =
    tensorstore::Array(dataArray.data(), shape, tensorstore::fortran_order);
auto writeFuture =
    tensorstore::Write(tensorstore::UnownedToShared(arr),
                       store | intervalD1 | intervalD2 | intervalD3);
writeFuture.commit_future.value();
auto result = writeFuture.result();
exceptional_assert(result.ok(), "Error while writing to disk");
Do you need more information?
Thank you for your help.
Let's try and get a self-contained repro case; I've built a googletest test incorporating your spec.
Is it possible for you to edit this self-contained test so that it fails?
#include <cstdint>
#include <vector>
#include <gtest/gtest.h>
#include "absl/status/status.h"
#include "tensorstore/array.h"
#include "tensorstore/context.h"
#include "tensorstore/contiguous_layout.h"
#include "tensorstore/index_space/dim_expression.h"
#include "tensorstore/open.h"
#include "tensorstore/open_mode.h"
#include "tensorstore/staleness_bound.h"
#include "tensorstore/tensorstore.h"
#include "tensorstore/util/status_testutil.h"
// Boost
#include "boost/multi_array.hpp"
static constexpr int D4 = 1;
absl::Status CreateTensorstore(tensorstore::Context context) {
int chunkSize[] = {16, 16, 16};
int size[] = {1024, 1024, 1024};
int preShiftBits = 2;
int miniShardBits = 4;
int shardBits = 8;
return tensorstore::Open(
{
{"driver", "neuroglancer_precomputed"},
{"kvstore",
{
{"driver", "memory"},
{"path", "prefix/"},
}},
{"multiscale_metadata",
{
{"data_type", "uint8"}, // not uint8_t
{"num_channels", D4},
{"type", "image"},
}},
{"scale_metadata",
{
{"resolution", {1.0, 1.0, 1.0}},
{"encoding", "raw"},
{"chunk_size",
{chunkSize[0], chunkSize[1], chunkSize[2]}},
{"size", {size[0], size[1], size[2]}},
{"sharding",
{{"@type", "neuroglancer_uint64_sharded_v1"},
{"preshift_bits", preShiftBits},
{"minishard_bits", miniShardBits},
{"shard_bits", shardBits},
{"minishard_index_encoding", "raw"},
{"hash", "identity"}}},
}},
{"scale_index", 0},
},
context, tensorstore::OpenMode::create,
tensorstore::RecheckCached{false},
tensorstore::ReadWriteMode::write)
.status();
}
TEST(Issue155, Repro) {
tensorstore::Context context = tensorstore::Context::Default();
TENSORSTORE_ASSERT_OK(CreateTensorstore(context));
int startD1 = 0;
int endD1 = 64;
int startD2 = 0;
int endD2 = 64;
int startD3 = 0;
int endD3 = 64;
// reopen tensorstore - following code is done in parallel on multiple nodes
TENSORSTORE_ASSERT_OK_AND_ASSIGN(
auto store,
tensorstore::Open(
{
{"driver", "neuroglancer_precomputed"},
{"kvstore",
{
{"driver", "memory"},
{"path", "prefix/"},
}},
{"scale_index", 0},
},
context, tensorstore::OpenMode::open,
tensorstore::RecheckCached{true}, tensorstore::ReadWriteMode::write)
.result());
// size and interval of array should be aligned with shard
boost::multi_array<uint8_t, 4> dataArray(
boost::extents[D4][endD3 - startD3][endD2 - startD2][endD1 - startD1]);
// load data into array - not tensorstore
// do stuff with array
// write data to
std::vector<int64_t> shape = {endD1 - startD1, endD2 - startD2,
endD3 - startD3, D4};
auto intervalD1 = tensorstore::Dims(0).HalfOpenInterval(startD1, endD1);
auto intervalD2 = tensorstore::Dims(1).HalfOpenInterval(startD2, endD2);
auto intervalD3 = tensorstore::Dims(2).HalfOpenInterval(startD3, endD3);
auto arr =
tensorstore::Array(dataArray.data(), shape, tensorstore::fortran_order);
auto writeFuture =
tensorstore::Write(tensorstore::UnownedToShared(arr),
store | intervalD1 | intervalD2 | intervalD3);
writeFuture.commit_future.Wait();
TENSORSTORE_ASSERT_OK(writeFuture.commit_future.result());
}
You will need to put together a proper BUILD rule for this.
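A BUILD rule for the test above might look roughly like this; the dependency labels below are guesses derived from the #include list, so check them against the actual tensorstore workspace, and the boost dependency is left as a placeholder:

```starlark
cc_test(
    name = "issue155_repro_test",
    srcs = ["issue155_repro_test.cc"],
    deps = [
        "//tensorstore:array",
        "//tensorstore:context",
        "//tensorstore:open",
        "//tensorstore/driver/neuroglancer_precomputed",
        "//tensorstore/index_space:dim_expression",
        "//tensorstore/kvstore/memory",
        "//tensorstore/util:status_testutil",
        "@com_google_googletest//:gtest_main",
        # plus whatever target provides boost::multi_array in your workspace
    ],
)
```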
I don't think I can get this test case to crash.
The only time I have seen this error is when multiple workers run on multiple nodes at the same time and all open and write to the same dataset asynchronously. They should not interfere with each other, because each worker should be working on its own shard.
But I have also observed that tensorstore appears to load some surrounding data, even when it is not needed for the intended operation.
Do you attempt to create on every node? Or is the create completely independent?
We can make it a simple binary with a --create flag which takes start/end[1-3] as parameters.
You can try running your original code with verbose logging enabled. See https://github.com/google/tensorstore/blob/master/tensorstore/internal/log/verbose_flag.h
TENSORSTORE_VERBOSE_LOGGING=all
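For instance, an invocation could look like this (the binary name and flags below are placeholders following the --create suggestion above):

```shell
# One-off: enable all verbose-logging channels for a single run.
# ./repro, --create and the start/end flags are placeholder names.
#   TENSORSTORE_VERBOSE_LOGGING=all ./repro --create --start1=0 --end1=64
# Or export the variable for the whole session:
export TENSORSTORE_VERBOSE_LOGGING=all
echo "TENSORSTORE_VERBOSE_LOGGING=$TENSORSTORE_VERBOSE_LOGGING"
```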
Thanks for reporting this.
I identified the bug, and have a fix that we can hopefully push out shortly.
I believe the specific case that would trigger this is:
- Process 1: Write all zeros to just a portion of a chunk. Starts writeback of shard. Observes that the chunk is equal to the fill value (all zero) because the existing chunk is either not present, or all unmodified elements are zero. At this point, the data array is freed since it is equal to the fill value, but the mask remains as it was to indicate a partial modification.
- Process 2: Concurrently modifies the shard.
- Process 1: Writeback must be retried due to the concurrent modification. When integrating the new contents of the shard, an assertion is triggered by the unexpected combination of a partial modification with no data array.
There are two important things to note, though:
- If this sequence is indeed what triggered the bug, then that means your writes are in fact not shard aligned as you thought they were. Even with the bug fixed, shard aligned writes will be much more efficient.
- The assertion only triggers in debug builds (with NDEBUG not defined). For production use, disabling assertions may make it significantly faster. Usually NDEBUG will be defined automatically in release builds, so you may want to check to confirm you are building with optimizations.