Comments (20)
The problem is that currently, the type hashing requires the type to actually exist on the stack. However, your data structure does not fit on the stack. So the program segfaults when creating the value on the stack. This is nothing I can fix quickly. I will try to change the hashing function to work with pointers. Since the actual data is never really accessed (only the type names are hashed recursively), this could work.
from cista.
Hm, I see. Is there any notable overhead of choosing cista::vector
over cista::array
? If it's not then I'll use the first one.
from cista.
At least as a temporary solution using a cista::vector
instead of a cista::array
does probably make sense. A vector should have 32 byte overhead (pointer, used size, reserved size, self allocated flag). Alternatively, you could disable type hashing (versioning) the data structure (depending on your use case).
from cista.
Ok, I'll figure it out. But still It would be appreciate to make Cista work with big arrays. Thank you for the response.
from cista.
I just encountered this issue too: I used to raise the stack size to handle large std::array
s and wanted to not do that anymore by putting them behind cista::unique_ptr
.
I believe "static"/"compile-time" hashing is an elegant solution:
Let's take std::array
for example, we define a new kind of template functions static_type_hash
like this
template <class T> requires std_array_type_v<T>
hash_t static_type_hash(hash_t h, std::map<hash_t, unsigned>& done)
{
h = hash_combine(h, hash("array"));
h = hash_combine(h, Size);
if constexpr (has_static_type_hash_v<T::value_type>)
return static_type_hash<T::value_type>(h, done);
else return type_hash(T::value_type{}, h, done);
}
I started implementing a similar approach in #153 but it was using static methods within classes, we cannot do it any longer because we cannot define a static method within the std::array
class. So we have to define these functions outside.
from cista.
Some ideas to maybe think about:
- use the
constexpr
functiontype_str
instead ofcanonical_type_str
which cannot beconstexpr
in C++17 - replace the
std::map<hash_t, unsigned>
with aconstexpr
-compatiblestd::array<std::pair<hash_t, unsigned>>
and implementemplace()
andcontains()
on it - switch the
T const&
argument toT const*
(and always feed anullptr
) or replace the type withstd::enable_if<std::is_same_v<T, ...>>
(instead of therequires
) - not sure, if the
has_static_type_hash
check needs to be done in every function. I think it's enough to do it top-level once for the call.has_static_type_hash
needs to be recursive anyway - meaning it should check for avector<X>
also thathas_static_type_hash<X>
. So if you can make sure top-level that everything can be hashed at compile time, this should eliminate the need to have checks in each function.
For the implementation in general: As long as new C++2x features are not "system-enablers" but rather syntactic sugar, I would stick with code that is also C++17 compatible. In my own projects, I switched to C++20 or C++23. But I'm not sure how many people using this library have already switched. So if it's something small like std::enable_if_t
vs requires
, I would for now stick to the old version, even if it's a bit more code to write in cista, as long as the client code stays the same.
The implementation in #153 is problematic in the way that it doesn't even try to be compatible with the old hash, meaning for the same types it will always produce a different hash. I think it would make sense to stick closely to the old implementation, just make it constexpr
by replacing T const&
with T const*
and feed nullptr
, replace the map type and add the has_static_type_hash
switch at the root call. Then testing should ensure that it produces the same hash values as the old implementation.
from cista.
Hello !
I like and agree with the ideas you described.
switch the T const& argument to T const* (and always feed a nullptr) or replace the type with std::enable_if<std::is_same_v<T, ...>> (instead of the requires)
I'd go with the std::enable_if route with template functions since the pointers serve no purpose other than expressing the type. But there's a caveat, see next point.
not sure, if the has_static_type_hash check needs to be done in every function. I think it's enough to do it top-level once for the call. has_static_type_hash needs to be recursive anyway - meaning it should check for a vector also that has_static_type_hash. So if you can make sure top-level that everything can be hashed at compile time, this should eliminate the need to have checks in each function.
I have a question: are there cases where we cannot know the type hash at compile time ? I assumed that, that's why I thought of two routes (static_type_hash
and type_hash
) where probably the check needs to happen at every level. Or can we simply change the type_hash
function so it never instances any type ? If we can get a cista_members
tuple of for every class then yes. The question is more about user classes: how do we ask them to define their type hash if we are to make this static type hashing change. If we are to take a template function approach, the user needs a constexpr construct to differenciate between his classes (std::enable_if
with is_my_class_A_v<T>
... Etc), whereas the nullptr approach would work with simple function overloads as is done now 🤔 but it doesn't feel clean :/
For the requires
clause, I absolutely agree. I just wrote it to express meaning, I understand perfectly the need to remain C++17.
The implementation in #153 is problematic in the way that it doesn't even try to be compatible with the old hash, meaning for the same types it will always produce a different hash
Oh yeah that must be avoided if possible. It was still more of a POC than anything else x)
replace the std::map<hash_t, unsigned> with a constexpr-compatible std::array<std::pair<hash_t, unsigned>> and implement emplace() and contains() on it
Moving to entirely compile-time type hashing can be done in a follow-up PR I think, like this it would be more incremental.
I'll try to reopen my draft PR and edit it given your feedback. Let's see!
from cista.
are there cases where we cannot know the type hash at compile time ?
I don't think so. Types should all be known at compile time. So in theory, it should be possible to implement type hashing at compile time.
Or can we simply change the type_hash function so it never instances any type ?
I think this is possible (see above).
how do we ask them to define their type hash
The change should be backwards-compatible so that existing working code doesn't have to be changed.
but it doesn't feel clean :/
I think the pointer solution produces more readable and much shorter code (no need to implement is_std_array_v<T>
for each and every class. If it works, this should be fine. Especially since everything that would be unclean doesn't compile with constexpr
(compiler checks for UB in constexpr
functions).
Moving to entirely compile-time type hashing can be done in a follow-up PR I think, like this it would be more incremental.
I think it would be more efficient to touch every function only once and then do it directly at compile time instead of touching every function twice (1x for conversion to pointer, another time for real constexpr
ness). But maybe I'm just too lazy to touch everything twice.
from cista.
I was playing around, but it seems to be harder than expected: https://gcc.godbolt.org/z/ovrExTeWT
The done
and member
variables are not constexpr
. done
I understand, but member
not really.
from cista.
It looks like the compiler wasn't reporting the correct error message because the issue was complex. Here's a working increment over your code: https://gcc.godbolt.org/z/6qbof7KTr
I removed handling pointers: actually aren't we supposed to not support pointers ? Since they have to be used with a cista::unique_ptr
or cista::shared_ptr
?
from cista.
Thank you very much!
I converted it to C++17 compatible code: https://gcc.godbolt.org/z/P7zErb68q
Since the pair<hash_t, done_map>
was always passed around together (either as argument or return value), I made a separate struct hash_data
+ some small other refactorings.
I removed handling pointers: actually aren't we supposed to not support pointers ? Since they have to be used with a cista::unique_ptr or cista::shared_ptr
Raw pointers are fully supported by cista. cista::raw::ptr<T>
is just an alias to T*
. cista::offset::ptr<T>
is a class that stores the actual address it's pointing to relative to this
and computes the actual address on access (dereference). unique_ptr
and shared_ptr
are used to express ownership over the pointed to memory (in cista but also in std
). However, this is not sufficient if you want to have non-owning pointers. Therefore, cista also supports to serialize and deserialize raw pointers (to express "I point to something, but I don't own it"). So this should definitely be supported by type-hashing.
For reference, I paste the code here (not sure how permanent those Compiler Explorer links are):
#include "cista.h"
namespace cista {
template <typename T>
constexpr hash_t static_type2str_hash() noexcept {
return hash_combine(hash(type_str<decay_t<T>>()), sizeof(T));
}
template <typename T>
constexpr auto to_member_type_list() {
auto t = T{};
return cista::to_tuple(t);
}
template <typename T>
using tuple_representation_t = decltype(to_member_type_list<T>());
template <typename K, typename V, std::size_t Size>
struct count_map {
using key_type = K;
using mapped_type = V;
struct value_type {
key_type first;
mapped_type second;
};
using iterator = typename std::array<value_type, Size>::iterator;
using const_iterator = typename std::array<value_type, Size>::const_iterator;
constexpr count_map() = default;
constexpr std::pair<mapped_type, bool> add(key_type k) {
auto const it = find(k);
if (it == end()) {
arr_[size_] = value_type{k, static_cast<mapped_type>(size_)};
++size_;
return {size_ - 1U, true};
} else {
return {it->second, false};
}
}
constexpr const_iterator find(key_type k) const {
for (auto it = begin(); it != end(); ++it) {
if (it->first == k) {
return it;
}
}
return end();
}
constexpr const_iterator begin() const { return std::begin(arr_); }
constexpr const_iterator end() const { return std::next(begin(), size_); }
std::size_t size_{0U};
std::array<value_type, Size> arr_{};
};
template <std::size_t NMaxTypes>
using hashes_t = count_map<hash_t, unsigned, NMaxTypes>;
template <std::size_t NMaxTypes>
struct hash_data {
constexpr hash_data combine(hash_t const h) const {
hash_data r;
r.done_ = done_;
r.h_ = hash_combine(h_, h);
return r;
}
hashes_t<NMaxTypes> done_;
hash_t h_{BASE_HASH};
};
template <std::size_t NMaxTypes>
constexpr auto static_type_hash(std::uint64_t const* el, hash_data<NMaxTypes> const h) {
return h.combine(hash("u64"));
}
template <std::size_t NMaxTypes>
constexpr auto static_type_hash(float const* el, hash_data<NMaxTypes> const h) {
return h.combine(hash("f32"));
}
template <std::size_t NMaxTypes>
constexpr auto static_type_hash(double const* el, hash_data<NMaxTypes> const h) {
return h.combine(hash("f64"));
}
template <typename Tuple, std::size_t NMaxTypes>
constexpr auto hash_tuple(hash_data<NMaxTypes> h) {
auto const call_on_tuple_element = [&]<size_t I>() {
using element_type = std::decay_t<std::tuple_element_t<I, Tuple>>;
h = static_type_hash(static_cast<element_type const*>(nullptr), h);
};
auto const int_seq_for = [&]<std::size_t... I>(std::index_sequence<I...>) {
(call_on_tuple_element.template operator()<I>(), ...);
};
int_seq_for(std::make_index_sequence<std::tuple_size_v<Tuple>>());
return h;
}
template <typename T, std::size_t NMaxTypes>
constexpr auto static_type_hash(T const* el, hash_data<NMaxTypes> h) {
using Type = decay_t<T>;
auto const base_hash = static_type2str_hash<Type>();
auto [ordering, inserted] = h.done_.add(base_hash);
if (!inserted) {
return h.combine(ordering);
}
static_assert(to_tuple_works_v<Type>, "Please implement custom type hash.");
return hash_tuple<tuple_representation_t<T>>(h.combine(hash("struct")));
}
template <typename T, std::size_t Size, std::size_t NMaxTypes>
constexpr auto static_type_hash(array<T, Size> const*, hash_data<NMaxTypes> h) {
return h.combine(hash("array")).combine(Size);
}
template <typename T, std::size_t NMaxTypes = 128U>
constexpr hash_t static_type_hash() {
return static_type_hash(static_cast<T const*>(nullptr), hash_data<NMaxTypes>{}).h_;
}
} // namespace cista
struct x {
float x_;
std::uint64_t y_;
std::array<cista::offset::ptr<x>, 3> z_;
};
int main() {
printf("%" PRIu64 "\n", cista::static_type_hash<x>());
}
from cista.
[... code block ...]
What a beautiful increment/refactor ! The hash_data
struct is very nice !
(to express "I point to something, but I don't own it")
True that, we need that x)
It looks like you are close to implementing compile-time type hashing on your own, should I simply wait for you to complete it ?
from cista.
template <typename T, std::size_t Size, std::size_t NMaxTypes>
constexpr auto static_type_hash(array<T, Size> const*, hash_data<NMaxTypes> h) {
return h.combine(hash("array")).combine(Size);
}
It looks like it's missing the hash of the type T
🤔
from cista.
Thanks!! :)
Should be better now: https://gcc.godbolt.org/z/zxGs5d5vG
Also added pointer hashing and used the old structure for the main hashing function that included also scalars.
I need to figure out how to include static type hashing.. options would be:
- separate flag
cista::mode::WITH_STATIC_VERSION
- try to figure it out automatically (if the function compiles, it is used)
I think it's better to give the user more control and use the WITH_STATIC_VERSION
. Otherwise, the automatic mode might not always chose the static type hashing because there are programming mistakes and it's hard to figure out what's wrong.
#include "cista.h"
namespace cista {
template <typename T>
constexpr hash_t static_type2str_hash() noexcept {
return hash_combine(hash(type_str<decay_t<T>>()), sizeof(T));
}
template <typename T>
constexpr auto to_member_type_list() {
auto t = T{};
return cista::to_tuple(t);
}
template <typename T>
using tuple_representation_t = decltype(to_member_type_list<T>());
template <typename K, typename V, std::size_t Size>
struct count_map {
using key_type = K;
using mapped_type = V;
struct value_type {
key_type first;
mapped_type second;
};
using iterator = typename std::array<value_type, Size>::iterator;
using const_iterator = typename std::array<value_type, Size>::const_iterator;
constexpr count_map() = default;
constexpr std::pair<mapped_type, bool> add(key_type k) {
auto const it = find(k);
if (it == end()) {
arr_[size_] = value_type{k, static_cast<mapped_type>(size_)};
++size_;
return {size_ - 1U, true};
} else {
return {it->second, false};
}
}
constexpr const_iterator find(key_type k) const {
for (auto it = begin(); it != end(); ++it) {
if (it->first == k) {
return it;
}
}
return end();
}
constexpr const_iterator begin() const { return std::begin(arr_); }
constexpr const_iterator end() const { return std::next(begin(), size_); }
std::size_t size_{0U};
std::array<value_type, Size> arr_{};
};
template <std::size_t NMaxTypes>
struct hash_data {
constexpr hash_data combine(hash_t const h) const {
hash_data r;
r.done_ = done_;
r.h_ = hash_combine(h_, h);
return r;
}
count_map<hash_t, unsigned, NMaxTypes> done_;
hash_t h_{BASE_HASH};
};
template <typename Tuple, std::size_t NMaxTypes>
constexpr auto hash_tuple(hash_data<NMaxTypes> h) {
auto const call_on_tuple_element = [&]<std::size_t I>() {
using element_type = std::decay_t<std::tuple_element_t<I, Tuple>>;
h = static_type_hash(static_cast<element_type*>(nullptr), h);
};
auto const int_seq_for = [&]<std::size_t... I>(std::index_sequence<I...>) {
(call_on_tuple_element.template operator()<I>(), ...);
};
int_seq_for(std::make_index_sequence<std::tuple_size_v<Tuple>>());
return h;
}
template <typename T, std::size_t NMaxTypes, typename = std::enable_if_t<!std::is_pointer_v<T>>>
constexpr auto static_type_hash(T const* el, hash_data<NMaxTypes> h) {
using Type = decay_t<T>;
auto const base_hash = static_type2str_hash<Type>();
auto const [ordering, inserted] = h.done_.add(base_hash);
if (!inserted) {
return h.combine(ordering);
}
if constexpr (is_pointer_v<Type>) {
using PointeeType = remove_pointer_t<Type>;
if constexpr (std::is_same_v<PointeeType, void>) {
return h.combine(hash("void*"));
} else {
h = h.combine(hash("pointer"));
return static_type_hash(static_cast<remove_pointer_t<Type>*>(nullptr), h);
}
} else if constexpr (std::is_integral_v<Type>) {
return h.combine(hash("i")).combine(sizeof(Type));
} else if constexpr (std::is_scalar_v<Type>) {
return h.combine(static_type2str_hash<T>());
} else {
static_assert(to_tuple_works_v<Type>, "Please implement custom type hash.");
return hash_tuple<tuple_representation_t<T>>(h.combine(hash("struct")));
}
}
template <typename T, std::size_t Size, std::size_t NMaxTypes>
constexpr auto static_type_hash(array<T, Size> const*, hash_data<NMaxTypes> h) {
h = static_type_hash(static_cast<T*>(nullptr), h);
return h.combine(hash("array")).combine(Size);
}
template <typename T, std::size_t NMaxTypes = 128U>
constexpr hash_t static_type_hash() {
return static_type_hash(static_cast<T*>(nullptr), hash_data<NMaxTypes>{}).h_;
}
} // namespace cista
struct x {
float x_;
std::uint64_t y_;
std::array<cista::offset::ptr<x>, 3> z_;
};
static_assert(6399962213331131165 == cista::static_type_hash<x>());
int main() {
constexpr auto const hash = cista::static_type_hash<x>();
printf("%" PRIu64 "\n", hash);
}
from cista.
You told me we can in theory switch to 100% static type hashing so I don't really understand the dilemma you are facing 🤔
the automatic mode might not always chose the static type hashing
This is assuming to have both type hashing implemented right (runtime and compile time) ?
from cista.
With the current version of static type hashing it's not possible to reproduce exactly the type hashes from the old runtime-hashing version. So I think it would be better to support both versions in parallel. This way, people who use the library who have data that is serialized with the runtime-version don't need to re-serialize to upgrade cista. Of course static type hashing will be done at compile time.
Implementation wip is here: #172
Edit: reason why the new version doesn't produce compatible hashes is primarily the difference between canonical_type_str
and type_str
. This is done to get the same string for the same type on different compilers (but still isn't perfect..). These replacements cannot be done on std::string_view
. One solution would be to switch from std::string_view
to std::array<std::string_view>
to be able to cut out unwanted parts of the string at compile time. But I didn't experiment with it yet. This would be a way to make static_type_hash<T> == type_hash<T>
for every T
.
from cista.
Merged #172
Closing here. Please open a new issue if there are any problems with the implementation.
from cista.
I finally tested the change on my side, all good ! Awesome work !
from cista.
Thank you!
And thank you for your work on making my "first shitty draft" compile! :-)
from cista.
Happy to help and discuss things to get the idea into shape !
from cista.
Related Issues (20)
- Discussion: STL containers can actually be supported ? HOT 3
- Help: Serialization and inheritance (with standard layout classes) HOT 22
- Wiki Typo HOT 1
- Any plans for c++20? HOT 2
- cista_members() approach: serialize tuples properly HOT 2
- nigiri Branch entfernt HOT 1
- Dealing with Data Pointer Offset in Checked Mode HOT 4
- Documentation request: how to build and run tests HOT 6
- Silence clang specific #pragma directives for other compilers ? HOT 2
- Static type hashing of std::complex HOT 5
- Create a stable release branch HOT 5
- Missing const& specifier for return type
- Cleanup: static_cast for `adjustment` should be `offset_t` since `start` is that type
- Support storing trailing \0 byte at the end of string HOT 5
- Support for `resize` [perhaps also `capacity` and `reserve`] for cista::generic_string/basic_string HOT 2
- String gives different size at "empty" input for short/non_short HOT 1
- MSVC v14.1x build error: C2296,C2297 HOT 3
- How can I serialize a struct which contains anonymous union member HOT 2
- Bug in std_vector_test.cc HOT 1
- Support for std::visit HOT 8
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cista.