In my application I have several big data structures, that I want to dump to the disk,

Some ideas to maybe think about: use the <code class="notransl

Hello ! I like and agree with the ideas you described. <blockquo

I was playing around, but it seems to be harder than expected: <a href="https://gcc.go

Using cista array with big array size cause seg fault,about felixguendling/cista

Comments (20)

felixguendling commented on May 23, 2024

The problem is that currently, the type hashing requires the type to actually exist on the stack. However, your data structure does not fit on the stack. So the program segfaults when creating the value on the stack. This is nothing I can fix quickly. I will try to change the hashing function to work with pointers. Since the actual data is never really accessed (only the type names are hashed recursively), this could work.

from cista.

devildevilson commented on May 23, 2024

Hm, I see. Is there any notable overhead of choosing cista::vector over cista::array? If it's not then I'll use the first one.

from cista.

felixguendling commented on May 23, 2024

At least as a temporary solution using a cista::vector instead of a cista::array does probably make sense. A vector should have 32 byte overhead (pointer, used size, reserved size, self allocated flag). Alternatively, you could disable type hashing (versioning) the data structure (depending on your use case).

from cista.

devildevilson commented on May 23, 2024

Ok, I'll figure it out. But still It would be appreciate to make Cista work with big arrays. Thank you for the response.

from cista.

AdelKS commented on May 23, 2024

I just encountered this issue too: I used to raise the stack size to handle large std::arrays and wanted to not do that anymore by putting them behind cista::unique_ptr.

I believe "static"/"compile-time" hashing is an elegant solution:

Let's take std::array for example, we define a new kind of template functions static_type_hash like this

template <class T> requires std_array_type_v<T>
hash_t static_type_hash(hash_t h, std::map<hash_t, unsigned>& done)
{
  h = hash_combine(h, hash("array"));
  h = hash_combine(h, Size);
  if constexpr (has_static_type_hash_v<T::value_type>)
    return static_type_hash<T::value_type>(h, done);
  else return type_hash(T::value_type{}, h, done);
}

I started implementing a similar approach in #153 but it was using static methods within classes, we cannot do it any longer because we cannot define a static method within the std::array class. So we have to define these functions outside.

from cista.

felixguendling commented on May 23, 2024

Some ideas to maybe think about:

use the constexpr function type_str instead of canonical_type_str which cannot be constexpr in C++17
replace the std::map<hash_t, unsigned> with a constexpr-compatible std::array<std::pair<hash_t, unsigned>> and implement emplace() and contains() on it
switch the T const& argument to T const* (and always feed a nullptr) or replace the type with std::enable_if<std::is_same_v<T, ...>> (instead of the requires)
not sure, if the has_static_type_hash check needs to be done in every function. I think it's enough to do it top-level once for the call. has_static_type_hash needs to be recursive anyway - meaning it should check for a vector<X> also that has_static_type_hash<X>. So if you can make sure top-level that everything can be hashed at compile time, this should eliminate the need to have checks in each function.

For the implementation in general: As long as new C++2x features are not "system-enablers" but rather syntactic sugar, I would stick with code that is also C++17 compatible. In my own projects, I switched to C++20 or C++23. But I'm not sure how many people using this library have already switched. So if it's something small like std::enable_if_t vs requires, I would for now stick to the old version, even if it's a bit more code to write in cista, as long as the client code stays the same.

The implementation in #153 is problematic in the way that it doesn't even try to be compatible with the old hash, meaning for the same types it will always produce a different hash. I think it would make sense to stick closely to the old implementation, just make it constexpr by replacing T const& with T const* and feed nullptr, replace the map type and add the has_static_type_hash switch at the root call. Then testing should ensure that it produces the same hash values as the old implementation.

from cista.

AdelKS commented on May 23, 2024

Hello !

I like and agree with the ideas you described.

switch the T const& argument to T const* (and always feed a nullptr) or replace the type with std::enable_if<std::is_same_v<T, ...>> (instead of the requires)

I'd go with the std::enable_if route with template functions since the pointers serve no purpose other than expressing the type. But there's a caveat, see next point.

not sure, if the has_static_type_hash check needs to be done in every function. I think it's enough to do it top-level once for the call. has_static_type_hash needs to be recursive anyway - meaning it should check for a vector also that has_static_type_hash. So if you can make sure top-level that everything can be hashed at compile time, this should eliminate the need to have checks in each function.

I have a question: are there cases where we cannot know the type hash at compile time ? I assumed that, that's why I thought of two routes (static_type_hash and type_hash) where probably the check needs to happen at every level. Or can we simply change the type_hash function so it never instances any type ? If we can get a cista_members tuple of for every class then yes. The question is more about user classes: how do we ask them to define their type hash if we are to make this static type hashing change. If we are to take a template function approach, the user needs a constexpr construct to differenciate between his classes (std::enable_if with is_my_class_A_v<T>... Etc), whereas the nullptr approach would work with simple function overloads as is done now 🤔 but it doesn't feel clean :/

For the requires clause, I absolutely agree. I just wrote it to express meaning, I understand perfectly the need to remain C++17.

The implementation in #153 is problematic in the way that it doesn't even try to be compatible with the old hash, meaning for the same types it will always produce a different hash

Oh yeah that must be avoided if possible. It was still more of a POC than anything else x)

replace the std::map<hash_t, unsigned> with a constexpr-compatible std::array<std::pair<hash_t, unsigned>> and implement emplace() and contains() on it

Moving to entirely compile-time type hashing can be done in a follow-up PR I think, like this it would be more incremental.

I'll try to reopen my draft PR and edit it given your feedback. Let's see!

from cista.

felixguendling commented on May 23, 2024

are there cases where we cannot know the type hash at compile time ?

I don't think so. Types should all be known at compile time. So in theory, it should be possible to implement type hashing at compile time.

Or can we simply change the type_hash function so it never instances any type ?

I think this is possible (see above).

how do we ask them to define their type hash

The change should be backwards-compatible so that existing working code doesn't have to be changed.

but it doesn't feel clean :/

I think the pointer solution produces more readable and much shorter code (no need to implement is_std_array_v<T> for each and every class. If it works, this should be fine. Especially since everything that would be unclean doesn't compile with constexpr (compiler checks for UB in constexpr functions).

Moving to entirely compile-time type hashing can be done in a follow-up PR I think, like this it would be more incremental.

I think it would be more efficient to touch every function only once and then do it directly at compile time instead of touching every function twice (1x for conversion to pointer, another time for real constexprness). But maybe I'm just too lazy to touch everything twice.

from cista.

felixguendling commented on May 23, 2024

I was playing around, but it seems to be harder than expected: https://gcc.godbolt.org/z/ovrExTeWT
The done and member variables are not constexpr. done I understand, but member not really.

from cista.

AdelKS commented on May 23, 2024

It looks like the compiler wasn't reporting the correct error message because the issue was complex. Here's a working increment over your code: https://gcc.godbolt.org/z/6qbof7KTr

I removed handling pointers: actually aren't we supposed to not support pointers ? Since they have to be used with a cista::unique_ptr or cista::shared_ptr ?

from cista.

felixguendling commented on May 23, 2024

Thank you very much!

I converted it to C++17 compatible code: https://gcc.godbolt.org/z/P7zErb68q

Since the pair<hash_t, done_map> was always passed around together (either as argument or return value), I made a separate struct hash_data + some small other refactorings.

I removed handling pointers: actually aren't we supposed to not support pointers ? Since they have to be used with a cista::unique_ptr or cista::shared_ptr

Raw pointers are fully supported by cista. cista::raw::ptr<T> is just an alias to T*. cista::offset::ptr<T> is a class that stores the actual address it's pointing to relative to this and computes the actual address on access (dereference). unique_ptr and shared_ptr are used to express ownership over the pointed to memory (in cista but also in std). However, this is not sufficient if you want to have non-owning pointers. Therefore, cista also supports to serialize and deserialize raw pointers (to express "I point to something, but I don't own it"). So this should definitely be supported by type-hashing.

For reference, I paste the code here (not sure how permanent those Compiler Explorer links are):

#include "cista.h"

namespace cista {

template <typename T>
constexpr hash_t static_type2str_hash() noexcept {
  return hash_combine(hash(type_str<decay_t<T>>()), sizeof(T));
}

template <typename T>
constexpr auto to_member_type_list() {
  auto t = T{};
  return cista::to_tuple(t);
}

template <typename T>
using tuple_representation_t = decltype(to_member_type_list<T>());

template <typename K, typename V, std::size_t Size>
struct count_map {
  using key_type = K;
  using mapped_type = V;
  struct value_type {
    key_type first;
    mapped_type second;
  };
  using iterator = typename std::array<value_type, Size>::iterator;
  using const_iterator = typename std::array<value_type, Size>::const_iterator;

  constexpr count_map() = default;

  constexpr std::pair<mapped_type, bool> add(key_type k) {
    auto const it = find(k);
    if (it == end()) {
      arr_[size_] = value_type{k, static_cast<mapped_type>(size_)};
      ++size_;
      return {size_ - 1U, true};
    } else {
      return {it->second, false};
    }
  }

  constexpr const_iterator find(key_type k) const {
    for (auto it = begin(); it != end(); ++it) {
      if (it->first == k) {
        return it;
      }
    }
    return end();
  }

  constexpr const_iterator begin() const { return std::begin(arr_); }
  constexpr const_iterator end() const { return std::next(begin(), size_); }

  std::size_t size_{0U};
  std::array<value_type, Size> arr_{};
};

template <std::size_t NMaxTypes>
using hashes_t = count_map<hash_t, unsigned, NMaxTypes>;

template <std::size_t NMaxTypes>
struct hash_data {
  constexpr hash_data combine(hash_t const h) const {
    hash_data r;
    r.done_ = done_;
    r.h_ = hash_combine(h_, h);
    return r;
  }
  hashes_t<NMaxTypes> done_;
  hash_t h_{BASE_HASH};
};

template <std::size_t NMaxTypes>
constexpr auto static_type_hash(std::uint64_t const* el, hash_data<NMaxTypes> const h) {
  return h.combine(hash("u64"));
}

template <std::size_t NMaxTypes>
constexpr auto static_type_hash(float const* el,  hash_data<NMaxTypes> const h) {
  return h.combine(hash("f32"));
}

template <std::size_t NMaxTypes>
constexpr auto static_type_hash(double const* el, hash_data<NMaxTypes> const h) {
  return h.combine(hash("f64"));
}

template <typename Tuple, std::size_t NMaxTypes>
constexpr auto hash_tuple(hash_data<NMaxTypes> h) {
  auto const call_on_tuple_element = [&]<size_t I>() {
    using element_type = std::decay_t<std::tuple_element_t<I, Tuple>>;
    h = static_type_hash(static_cast<element_type const*>(nullptr), h);
  };

  auto const int_seq_for = [&]<std::size_t... I>(std::index_sequence<I...>) {
    (call_on_tuple_element.template operator()<I>(), ...);
  };

  int_seq_for(std::make_index_sequence<std::tuple_size_v<Tuple>>());

  return h;
}

template <typename T, std::size_t NMaxTypes>
constexpr auto static_type_hash(T const* el, hash_data<NMaxTypes> h) {
  using Type = decay_t<T>;

  auto const base_hash = static_type2str_hash<Type>();
  auto [ordering, inserted] = h.done_.add(base_hash);
  if (!inserted) {
    return h.combine(ordering);
  }

  static_assert(to_tuple_works_v<Type>, "Please implement custom type hash.");
  return hash_tuple<tuple_representation_t<T>>(h.combine(hash("struct")));
}

template <typename T, std::size_t Size, std::size_t NMaxTypes>
constexpr auto static_type_hash(array<T, Size> const*, hash_data<NMaxTypes> h) {
  return h.combine(hash("array")).combine(Size);
}

template <typename T, std::size_t NMaxTypes = 128U>
constexpr hash_t static_type_hash() {
  return static_type_hash(static_cast<T const*>(nullptr), hash_data<NMaxTypes>{}).h_;
}

}  // namespace cista

struct x {
  float x_;
  std::uint64_t y_;
  std::array<cista::offset::ptr<x>, 3> z_;
};

int main() {
  printf("%" PRIu64 "\n", cista::static_type_hash<x>());
}

from cista.

AdelKS commented on May 23, 2024

[... code block ...]

What a beautiful increment/refactor ! The hash_data struct is very nice !

(to express "I point to something, but I don't own it")

True that, we need that x)

It looks like you are close to implementing compile-time type hashing on your own, should I simply wait for you to complete it ?

from cista.

AdelKS commented on May 23, 2024

template <typename T, std::size_t Size, std::size_t NMaxTypes>
constexpr auto static_type_hash(array<T, Size> const*, hash_data<NMaxTypes> h) {
  return h.combine(hash("array")).combine(Size);
}

It looks like it's missing the hash of the type T 🤔

from cista.

felixguendling commented on May 23, 2024

Thanks!! :)

Should be better now: https://gcc.godbolt.org/z/zxGs5d5vG

Also added pointer hashing and used the old structure for the main hashing function that included also scalars.

I need to figure out how to include static type hashing.. options would be:

separate flag cista::mode::WITH_STATIC_VERSION
try to figure it out automatically (if the function compiles, it is used)

I think it's better to give the user more control and use the WITH_STATIC_VERSION. Otherwise, the automatic mode might not always chose the static type hashing because there are programming mistakes and it's hard to figure out what's wrong.

#include "cista.h"

namespace cista {

template <typename T>
constexpr hash_t static_type2str_hash() noexcept {
  return hash_combine(hash(type_str<decay_t<T>>()), sizeof(T));
}

template <typename T>
constexpr auto to_member_type_list() {
  auto t = T{};
  return cista::to_tuple(t);
}

template <typename T>
using tuple_representation_t = decltype(to_member_type_list<T>());

template <typename K, typename V, std::size_t Size>
struct count_map {
  using key_type = K;
  using mapped_type = V;
  struct value_type {
    key_type first;
    mapped_type second;
  };
  using iterator = typename std::array<value_type, Size>::iterator;
  using const_iterator = typename std::array<value_type, Size>::const_iterator;

  constexpr count_map() = default;

  constexpr std::pair<mapped_type, bool> add(key_type k) {
    auto const it = find(k);
    if (it == end()) {
      arr_[size_] = value_type{k, static_cast<mapped_type>(size_)};
      ++size_;
      return {size_ - 1U, true};
    } else {
      return {it->second, false};
    }
  }

  constexpr const_iterator find(key_type k) const {
    for (auto it = begin(); it != end(); ++it) {
      if (it->first == k) {
        return it;
      }
    }
    return end();
  }

  constexpr const_iterator begin() const { return std::begin(arr_); }
  constexpr const_iterator end() const { return std::next(begin(), size_); }

  std::size_t size_{0U};
  std::array<value_type, Size> arr_{};
};

template <std::size_t NMaxTypes>
struct hash_data {
  constexpr hash_data combine(hash_t const h) const {
    hash_data r;
    r.done_ = done_;
    r.h_ = hash_combine(h_, h);
    return r;
  }
  count_map<hash_t, unsigned, NMaxTypes> done_;
  hash_t h_{BASE_HASH};
};

template <typename Tuple, std::size_t NMaxTypes>
constexpr auto hash_tuple(hash_data<NMaxTypes> h) {
  auto const call_on_tuple_element = [&]<std::size_t I>() {
    using element_type = std::decay_t<std::tuple_element_t<I, Tuple>>;
    h = static_type_hash(static_cast<element_type*>(nullptr), h);
  };

  auto const int_seq_for = [&]<std::size_t... I>(std::index_sequence<I...>) {
    (call_on_tuple_element.template operator()<I>(), ...);
  };

  int_seq_for(std::make_index_sequence<std::tuple_size_v<Tuple>>());

  return h;
}

template <typename T, std::size_t NMaxTypes, typename = std::enable_if_t<!std::is_pointer_v<T>>>
constexpr auto static_type_hash(T const* el, hash_data<NMaxTypes> h) {
  using Type = decay_t<T>;
  
  auto const base_hash = static_type2str_hash<Type>();
  auto const [ordering, inserted] = h.done_.add(base_hash);
  if (!inserted) {
    return h.combine(ordering);
  }

  if constexpr (is_pointer_v<Type>) {
    using PointeeType = remove_pointer_t<Type>;
    if constexpr (std::is_same_v<PointeeType, void>) {
      return h.combine(hash("void*"));
    } else {
      h = h.combine(hash("pointer"));
      return static_type_hash(static_cast<remove_pointer_t<Type>*>(nullptr), h);
    }
  } else if constexpr (std::is_integral_v<Type>) {
    return h.combine(hash("i")).combine(sizeof(Type));
  } else if constexpr (std::is_scalar_v<Type>) {
    return h.combine(static_type2str_hash<T>());
  } else {
    static_assert(to_tuple_works_v<Type>, "Please implement custom type hash.");
    return hash_tuple<tuple_representation_t<T>>(h.combine(hash("struct")));
  }
}

template <typename T, std::size_t Size, std::size_t NMaxTypes>
constexpr auto static_type_hash(array<T, Size> const*, hash_data<NMaxTypes> h) {
  h = static_type_hash(static_cast<T*>(nullptr), h);
  return h.combine(hash("array")).combine(Size);
}

template <typename T, std::size_t NMaxTypes = 128U>
constexpr hash_t static_type_hash() {
  return static_type_hash(static_cast<T*>(nullptr), hash_data<NMaxTypes>{}).h_;
}

}  // namespace cista

struct x {
  float x_;
  std::uint64_t y_;
  std::array<cista::offset::ptr<x>, 3> z_;
};

static_assert(6399962213331131165 == cista::static_type_hash<x>());

int main() {
  constexpr auto const hash = cista::static_type_hash<x>();
  printf("%" PRIu64 "\n", hash);
}

from cista.

AdelKS commented on May 23, 2024

You told me we can in theory switch to 100% static type hashing so I don't really understand the dilemma you are facing 🤔

the automatic mode might not always chose the static type hashing

This is assuming to have both type hashing implemented right (runtime and compile time) ?

from cista.

felixguendling commented on May 23, 2024

With the current version of static type hashing it's not possible to reproduce exactly the type hashes from the old runtime-hashing version. So I think it would be better to support both versions in parallel. This way, people who use the library who have data that is serialized with the runtime-version don't need to re-serialize to upgrade cista. Of course static type hashing will be done at compile time.

Implementation wip is here: #172

Edit: reason why the new version doesn't produce compatible hashes is primarily the difference between canonical_type_str and type_str. This is done to get the same string for the same type on different compilers (but still isn't perfect..). These replacements cannot be done on std::string_view. One solution would be to switch from std::string_view to std::array<std::string_view> to be able to cut out unwanted parts of the string at compile time. But I didn't experiment with it yet. This would be a way to make static_type_hash<T> == type_hash<T> for every T.

from cista.

felixguendling commented on May 23, 2024

Merged #172

Closing here. Please open a new issue if there are any problems with the implementation.

from cista.

AdelKS commented on May 23, 2024

I finally tested the change on my side, all good ! Awesome work !

from cista.

felixguendling commented on May 23, 2024

Thank you!

And thank you for your work on making my "first shitty draft" compile! :-)

from cista.

AdelKS commented on May 23, 2024

Happy to help and discuss things to get the idea into shape !

from cista.

Using cista array with big array size cause seg fault about cista HOT 20 CLOSED

Comments (20)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent