flow-ipc / ipc
[Start here!] Flow-IPC - Modern C++ toolkit for high-speed inter-process communication (IPC)
Home Page: https://flow-ipc.github.io/
License: Apache License 2.0
This is apparently a common trick: GitHub will host files if one attaches them to an issue or PR. (The issue need not even be created, but it seems best to make it visible, for our own sanity.)
This should be used sparingly. As of this writing, I'm researching embedding video in a README.md, and apparently one can directly do so using a <video> tag; but unlike Markdown image links - which can refer to images in the repo, via relative path - the same does not work for the video tag. One has to, it appears, give an absolute URL. Ideally such videos-for-docs would be in the repo too, but we do what we must. AGAIN: Sparingly. (And keep the size down; let's not abuse GitHub.)
Filed by @ygoldfeld pre-open-source:
The current situation is as follows:
General description: Whether run locally with my clang-17, or in the GitHub pipeline with clang-15/16/17, reliably, some tests in some situations hit a certain specific point within the test, at which point the console shows:
ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=74526)
and the test hangs forever right there. To be clear this is not a normal TSAN warning about a race or anything; but rather TSAN instrumentation code hitting a problem and refusing to proceed further. By the text of the problem, indeed some sort of limit of 64 "locks with contexts" is reached, and TSAN blows up. (No further analysis done on that but read on.)
1 test, even if run absolutely by itself, always hits this problem: Jemalloc_shm_pool_collection_test.Multiprocess. Hence it is explicitly skipped in the pipeline at the moment, using the gtest command line feature that can exclude tests individually.
The other problematic tests -- meaning that failing to exclude all of them from a run, while keeping all the others, triggers the problem -- are:
LOCK_HEAVY_TESTS='Shm_session_test.External_process_array:\
Shm_session_test.External_process_vector_offset_ptr:\
Shm_session_test.External_process_string_offset_ptr:\
Shm_session_test.External_process_list_offset_ptr:\
Shm_session_test.Multisession_external_process:\
Shm_session_test.Disconnected_external_process:\
Borrower_shm_pool_collection_test.Multiprocess:\
Shm_pool_collection_test.Multiprocess'
Happily, though, they run just fine as a group -- but not when run alongside all the many other tests. Therefore, to avoid hitting the limitation, I have changed the pipeline to the following:
It is not ideal, but it does give good TSAN coverage, thus reducing the priority of this ticket. The priority rises somewhat, however, due to Jemalloc_shm_pool_collection_test.Multiprocess being unable to complete even by itself.
As for what to do -- just ideas:
It is worth looking into, but it is not a hair-on-fire problem. We can skip one test w/r/t TSAN and survive.
As of this writing, in its original form, Flow-IPC (including Flow, repository flow, which is self-contained) builds in gcc/clang and targets Linux running in 64-bit mode (x86-64 aka AMD64). Nothing about its design is Linux-y per se (and earlier versions of Flow ran in such environments as Android, iOS, and Windows); Linux was just the thing needed in practice.
This issue:
I should note that some of the technical notes below could be wrong, if my knowledge was in error, or if I looked up something incorrectly. Still, they should be useful.
First, the goal(s):
*BSD, meaning FreeBSD, OpenBSD, maybe NetBSD. They're not the same, and also I personally am not super familiar with them. That said, my impression so far is that the stuff that would need to be ported has straightforward answers that would either carry over from macOS, or be different but fairly simple. EDIT: WAIT, NO! See note just below: No capnp for *BSD, so no Flow-IPC for *BSD until then.
So, basically, at the moment:
Now for the notes.
Firstly, we should get Flow-IPC/flow#88 out of the way before tackling this. Then it'll be static_assert()s failing instead of sometimes those, sometimes direct #errors. Detail... but anyway.
Then: Be aware that everything that we consciously knew was not portable, anywhere in the code, was #ifdefed and should #error or static_assert(false) if one tries to build for the port-target OS/arch. (NOTE!!!! SHM-jemalloc - namely anything in ipc_shm_arena_lend/ paths src/ipc/shm/arena_lend/** except src/ipc/shm/arena_lend/detail/shm_pool_offset_ptr_data.?pp and src/ipc/session/standalone/** - may not have held to this. Will need special attention. Other than that, though, yep.)
So, I went over all instances of such to-port code and made the following notes. Following these (where they're correct at least) may well be the bulk of the work.
Last thing before that though... I'd recommend, before doing any of that, setting up two things:
Once that's ready, one can start hacking away at the to-port code. Now, those notes:
This code checks the location of a certain executable, by itself: const auto bin_path = fs::read_symlink("/proc/self/exe"); // Can throw.
/var/run is assumed to exist in Linux by default (though I have a mechanism for overriding this location, if only for test/debug at least). Is it similarly assumed to exist in macOS/iOS? And is there a Windows equivalent? In particular, it would be nice if it were something that would get cleared via reboot.
A key one: We use /dev/shm/ listing for important SHM-pool and bipc-MQ management. (Also /dev/mqueue for POSIX MQs - however, since macOS has no POSIX MQs, it is moot.)
We use boost::interprocess::shared_memory_object::remove(X) to remove the SHM-segment (SHM-pool) named X. Error handling, specifically, relies on the undocumented fact that on failure it'll set errno; so we check that on failure and emit the appropriate error. What about the other OSes?
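To illustrate the Linux-only pattern just described, a minimal sketch (remove_shm_pool() and the thrown type are illustrative, not the actual Flow-IPC error path; the errno reliance is, again, undocumented bipc behavior):

#include <cerrno>
#include <system_error>
#include <boost/interprocess/shared_memory_object.hpp>

namespace bipc = boost::interprocess;

// Illustrative only; not the real API.
void remove_shm_pool(const char* pool_name)
{
  errno = 0;
  if (!bipc::shared_memory_object::remove(pool_name))
  {
    // Rely on the (undocumented) errno value to classify the failure.
    throw std::system_error(errno, std::generic_category(), "SHM-pool remove failed");
  }
}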
Using kill(process_id, 0) to check whether PID process_id is running at the moment.
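For reference, a minimal sketch of that POSIX-side check (note that EPERM still means the process exists, just under another user); the Windows counterpart is sketched just below:

#include <cerrno>
#include <signal.h>
#include <sys/types.h>

// Illustrative only: probe PID existence without sending a signal.
bool process_running(pid_t process_id)
{
  if (::kill(process_id, 0) == 0)
  {
    return true; // Exists, and we may signal it.
  }
  return errno == EPERM; // Exists but under another user; ESRCH => gone.
}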
#include <windows.h>

// Sketch of a Windows equivalent (OpenProcess() existence probe):
bool process_running(DWORD process_id)
{
  HANDLE processHandle = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, process_id);
  if (processHandle != NULL)
  {
    CloseHandle(processHandle);
    return true; // Process is running.
  }
  if (GetLastError() == ERROR_INVALID_PARAMETER)
  {
    return false; // Process does not exist.
  }
  return false; // ...etc...whatever... (e.g., ERROR_ACCESS_DENIED needs a decision).
}
PIDs being used as effectively unique across time:
Process_credentials::process_invoked_as() checks how (via what command line, argv[0]-ish) the this->m_process_id process was executed, assuming that guy is currently active (which it is). It reads /proc/...pid.../cmdline. This works, unlike some other things, even if ...pid... is running as some other user - which is an important thing in our use-case. We use this for a safety check when opening a session: A certain location shall be hard-coded by the user in the ipc::session::App struct known to us; and we use process_invoked_as() to see what the OS says. The two must match, or else the app is misbehaving, and we reject the session. So the goal here is not the specific behavior we use in Linux, but merely some check like it that we could execute.
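A rough sketch of the Linux mechanism (this is not the actual Process_credentials code; argv[0] is simply the bytes of /proc/<pid>/cmdline up to the first NUL):

#include <fstream>
#include <iterator>
#include <string>
#include <sys/types.h>

// Illustrative free function, not the real API.
std::string invoked_as(pid_t pid)
{
  std::ifstream in("/proc/" + std::to_string(pid) + "/cmdline", std::ios::binary);
  const std::string all((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
  return all.substr(0, all.find('\0')); // argv[0]; NUL-separated args follow.
}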
In Linux we use getsockopt(AF_LOCAL/SO_PEERCRED) to obtain the opposing process's info: PID (important, not just for safety), UID+GID (for safety check only).
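The Linux call looks roughly like this (sketch only; struct ucred requires _GNU_SOURCE with glibc):

#define _GNU_SOURCE // For struct ucred.
#include <sys/socket.h>

// Illustrative: query the peer's PID/UID/GID on a connected Unix-domain socket.
bool peer_credentials(int unix_sock_fd, ucred* creds_out)
{
  // On success: creds_out->pid (important), creds_out->uid/gid (safety checks).
  socklen_t len = sizeof(*creds_out);
  return ::getsockopt(unix_sock_fd, SOL_SOCKET, SO_PEERCRED, creds_out, &len) == 0;
}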
We use, for Native_socket_stream, on which very much stuff is built, a Unix-domain stream socket. This is for two important purposes without which everything falls apart:
It is a fast/good way to transmit binary blobs. (bipc-MQ is also available in all OSes. POSIX MQ is also available but in Linux only, not in Windows or macOS; BSD does have it, I believe, but moot.)
It is the only way to transmit I/O handles, including from socketpair() so as to create more pre-connected socket streams!
It also provides the bootstrap process-to-process acceptor-connector pair of classes. That's how sessions start, so it is very much a key thing!
Related: The very concept of transmitting Native_handles is well-defined in Linux and macOS (and BSD and any Unix): it just wraps an int (FD). (EDIT: I didn't mean to imply it's a mere matter of sending an int; rather it's sendmsg() with SOL_SOCKET/SCM_RIGHTS ancillary data, which will create an FD in the receiver process pointing to the same file-description. See the sketch just after this list.) But in Windows, what kind of thing would we even potentially want to transmit? SOCKET network handles? What else? Whatever it is, we should support that.
macOS: It has all of this the same as Linux, portably. It is FD-based, with Unix-domain stream sockets being able to transmit FDs via sendmsg()/recvmsg().... Whew!
Windows: Unclear! I have some notes, but I am not pasting them yet. We need a strategy for all of the above use-cases before starting to port to Windows. So filling this in is a MAJOR TODO and precludes all other Windows-port work.
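For concreteness, here is the FD-passing mechanism in sketch form (Linux/macOS/BSD; error handling omitted; not the actual Native_socket_stream code):

#include <cstring>
#include <sys/socket.h>

// Send fd_to_send over a connected Unix-domain socket as SCM_RIGHTS ancillary
// data; the receiver gets a new FD referring to the same file-description.
bool send_fd(int unix_sock_fd, int fd_to_send)
{
  char byte = 0; // At least 1 byte of regular data must accompany the FD.
  iovec iov{&byte, 1};

  alignas(cmsghdr) char cmsg_buf[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  cmsghdr* const cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

  return ::sendmsg(unix_sock_fd, &msg, 0) == 1;
}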
Flow's Logger::this_thread_set_logged_nickname(util::String_view thread_nickname, Logger* logger_ptr, bool also_set_os_name) -- if that bool=true -- uses pthread_setname_np(pthread_self(), os_name.c_str()) to set the system-visible thread nickname. This is useful in top, sanitizer output, debuggers... it's quite useful. However, Linux has a 15-char limit which we grapple with, not very well.
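In Windows there is no pthread_setname_np(); the classic trick is the following, which names the thread for an attached debugger by raising a magic exception. (Newer Windows also offers SetThreadDescription(), if memory serves; worth evaluating when we get there.)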
#include <windows.h>

const DWORD MS_VC_EXCEPTION = 0x406D1388;

#pragma pack(push, 8)
typedef struct tagTHREADNAME_INFO
{
  DWORD dwType;     // Must be 0x1000.
  LPCSTR szName;    // Pointer to name (in user addr space).
  DWORD dwThreadID; // Thread ID (-1=caller thread).
  DWORD dwFlags;    // Reserved for future use, must be zero.
} THREADNAME_INFO;
#pragma pack(pop)

void SetThreadName(DWORD dwThreadID, const char* threadName) // -1 for cur thread
{
  THREADNAME_INFO info;
  info.dwType = 0x1000;
  info.szName = threadName;
  info.dwThreadID = dwThreadID;
  info.dwFlags = 0;
  __try
  {
    RaiseException(MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(ULONG_PTR), (ULONG_PTR*)&info);
  }
  __except(EXCEPTION_EXECUTE_HANDLER)
  {
  }
}
cpu_idx() gets the current processor core index (0, 1, ..., N-1), where N = # of logical cores. In Linux can get it from ::sched_getcpu().
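In sketch form (Linux; sched_getcpu() needs _GNU_SOURCE on glibc and can legitimately fail):

#define _GNU_SOURCE // For sched_getcpu() on glibc.
#include <sched.h>

// Illustrative: index (0..N-1) of the logical core the calling thread runs on.
int cpu_idx()
{
  return ::sched_getcpu(); // -1 on error (errno set).
}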
optimize_pinning_in_thread_pool():
void optimize_pinning_in_thread_pool(flow::log::Logger* logger_ptr,
const std::vector<util::Thread*>& threads_in_pool,
[[maybe_unused]] bool est_hw_core_sharing_helps_algo,
bool est_hw_core_pinning_helps_algo,
bool hw_threads_is_grouping_collated)
where est_hw_core_pinning_helps_algo is to be set by the user to true if and only if the algorithm they're running over the given thread pool would be actively helped perf-wise if one were to pin each thread to a different CPU core. So if there are 32 threads and 32 logical cores, AND they set this to true, THEN it'll pin thread 1 to core 1, thread 2 to core 2, etc.; if there are 16 threads and 32 logical cores (but 16 physical cores), then it'll pin thread 1 to cores 1+17, thread 2 to 2+18, etc. (Unless hw_threads_is_grouping_collated, then t1 to 1+2, t2 to 3+4, etc.) I am not describing this perfectly, but I think you get the idea.
Linux: Use pthread_setaffinity_np().
Windows: SetThreadAffinityMask(GetCurrentThread(), affinityMask).
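A minimal sketch of the Linux side of such pinning (pin the calling thread to one logical core; the real function distributes a whole pool per the policy described above, and the name here is illustrative):

#define _GNU_SOURCE // For pthread_setaffinity_np() and the CPU_* macros.
#include <pthread.h>
#include <sched.h>

// Illustrative: pin the calling thread to logical core core_idx.
bool pin_this_thread(unsigned int core_idx)
{
  cpu_set_t cpus;
  CPU_ZERO(&cpus);
  CPU_SET(core_idx, &cpus);
  return ::pthread_setaffinity_np(::pthread_self(), sizeof(cpus), &cpus) == 0;
}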
Last but not least, in SHM-jemalloc fancy-pointers -- the core living in Shm_pool_offset_ptr_data which I wrote -- there is a pointer-tagging scheme in use, designed to ensure sizeof() is 64 bits. The MSB is set to 1 to indicate that, in fact, a SHM-pointer is there. Otherwise the MSB is 0, while the other bits are simply copied from the original real vaddr. Then, the get-vaddr operation in this case restores the canonical form expected by x86-64 (the top bits must replicate bit 47).
This is a processor thing, not an OS thing. So this is where ARM64 could have different behavior. Turns out, apparently, its behavior is different but simpler: still only the low 48-ish bits are the significant ones; hence it is safe for us to keep using the MSB for our pointer-tagging scheme. But, it is simpler: there is no "canonical form," so we can just return the entire thing as-is in the get-vaddr op. There will be an #if, but a simple one.
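To make the scheme concrete, a toy version (the bit layout and names are assumptions for illustration; the real logic lives in Shm_pool_offset_ptr_data):

#include <cstdint>

constexpr std::uint64_t TAG_MASK = std::uint64_t(1) << 63; // MSB: 1 => SHM-pointer.

// Raw-vaddr case (MSB = 0): recover the original pointer from the stored bits.
inline void* get_vaddr(std::uint64_t bits)
{
#if defined(__x86_64__)
  // x86-64 "canonical form": bits 63:48 must replicate bit 47.
  // Sign-extend from bit 47 to restore it (the MSB was borrowed for the tag).
  return reinterpret_cast<void*>(std::int64_t(bits << 16) >> 16);
#elif defined(__aarch64__)
  // ARM64: no canonical-form rule to restore; return the bits as-is.
  return reinterpret_cast<void*>(bits);
#else
#  error "Port: decide pointer-tag/vaddr recovery for this architecture."
#endif
}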
Filed by @ygoldfeld pre-open-source:
~All types in Flow-IPC (outside of non-ipc::session::shm SHM-jemalloc-land, I think, which should probably be changed, but that's orthogonal) have operator<< for ostream output, used in ~all log call sites. Plus, when creating background threads, they're usually named using the same thing. Some types also have nickname() (given by the user in the ctor) in the mix in there.
The names tend to be reasonably helpfully descriptive, though a once-over on that count wouldn't hurt as part of this ticket. However, the formatting is inconsistent; sometimes the this address is near the start, sometimes at the end (which can be particularly unnerving when an internally stored object is itself <<ed inside the <<). Sometimes they're too long to be used as Linux thread names helpfully. Sometimes nickname() is included, other times it IS nickname().
You get the idea. Someone should holistically look at this, come up with some conventions, and straighten it all out.
Additionally it would be nice to come up with some boiler-plate-reducing way of pre-pending the "<whatever object> [" << *this << "]:" thing automatically to log calls.
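One conceivable shape for such a boiler-plate reducer (pure sketch; FLOW_LOG_INFO is the existing Flow macro, while the wrapper name is made-up):

// Hypothetical wrapper: prepend the object's identity -- as printed by its
// operator<< -- to every log call site within the class.
#define LOG_INFO_THIS(ARG_stream_fragment) \
  FLOW_LOG_INFO("[" << *this << "]: " << ARG_stream_fragment)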
Anyway. Just make it nice. And probably it should share DNA with Flow-IPC/flow#86.
Addendum: Flow-IPC/flow#71 is related; or at least the musing in its Description "addendum" is. Basically that is about thread naming conventions, possibly having a separate API arg in flow.log to name a thread natively for the OS (top and such), which has a length limit in Linux at least. It's all part of one mess that would be good to straighten out. So this guy and that guy should probably be looked-at together.
Original description by issue filer: "your performance is higher than iceoryx?"
Then I (@ygoldfeld) edited in the following:
It would be good to add a "Comparison with other approaches, products" section to the documentation (guided Manual); it would cover iceoryx potentially but also other things like local gRPC, capnp-RPC - and the fact that Flow-IPC can be used together with those to make them faster and possibly easier to use.
I shall also pre-answer the iceoryx question in a comment below.
The state of documentation is good IMO. The Reference (as generated by Doxygen from source) covers everything (except see below); the guided Manual (as generated by Doxygen from additional not-really-source-but-source files) covers every feature (except see below); significant effort went into these. Plus there's a nice system making the generated docs available: the Actions CI pipeline checks-in generated docs into the main branch automatically; plus for each Release tag; plus links to these are auto-placed into the web site.
The present ticket covers gaps -- things lacking from the above. I labeled it as "bug," as they are IMO specific gaps we've identified -- not just niceties -- so "bug" would draw attention to it. That said, to reiterate, what is there is fairly comprehensive, meaning these gaps IMO don't constitute a major deficit. (Or not... it's subjective... examples-meant-to-be-examples really would be good to have, in particular.)
The gaps:
set(DOC_EXCLUDE_PATTERNS */ipc/session/standalone/shm/arena_lend/* */ipc/shm/arena_lend/*).
@page universes Multi-split Universes and @page transport_core Transport Core Layer: Transmitting Unstructured Data.
@page api_overview API Overview / Synopses (the not-yet-written page links to it for now, near where it says under construction).

In most of the Flow-IPC API, calls are non-blocking. There are various key async APIs, but that's not blocking. There are, however, a handful of potentially-blocking APIs. Ideally these should behave in some reasonable way on signal, so that at least it would be possible to interrupt a program on SIGTERM while it is blocking inside such an API. In POSIX, blocking I/O functions generally emit EINTR in this situation. We should do something analogous; or failing that, at least something predictable and consistent; or failing that, present mitigations (but in that case not resolve this ticket).
Here are the existing blocking APIs as of this writing.
Posix_mq_handle::*().
Bipc_mq_handle::*() -- under the hood these waits come down to pthread_cond_[timed]wait()s. How this will act on signal is unclear; as of this writing I have not checked, nor do I remember, what the Boost code does in wrapping the pthread_ call. But the call itself is documented to yield no EINTR ever in some man pages, and EINTR only from pthread_cond_timedwait() (not the one without timed_, though) in some other man pages.
promise::get_future().wait[_for]() -- which almost certainly internally in turn uses pthread_cond_[timed]wait().
Ideally all three -- and any future blocking impls -- should get interrupted if-and-only-if a built-in blocking call such as ::read() or ::poll() would be interrupted, on signal. How they're interrupted is less important: whether it's EINTR or another Error_code, as long as it's documented (that said, perhaps it's best to standardize around an error::Code of ours we'd add for this purpose).
I'll forego explaining how to impl this. The Posix_mq_handle guy is already there; but the other two are a different story. (For Bipc_mq_handle one could somehow leverage the already-existing Persistent_mq_handle::interrupt_*() APIs. But then another thread would be necessary. Food for thought at any rate.) (For Channel, not sure, but we should be able to swing something; maybe an intermediary thread; signal => satisfy the promise.)
Lastly, but very much not leastly (sorry), users should be aware of the following mitigating facts, until this ticket is resolved:
interrupt_*() is available to you. So if it is necessary to interrupt a wait or blocking-send/receive, then you can use a separate thread to run it in and invoke interrupt_*() from a signal handler. This isn't no-fuss, but it does work if needed at least.
timed_*() are also available: You could use those with a short, but not-too-short, timeout -- in which case a signal can affect them, but only after a bit of lag; could be fine. This is the usual approach... not perfect but doable and pretty simple. (A sketch follows this list.)
sync_request() can only block if there is no expect_msg[s]() pending on the opposing side (similarly for a future speculated sync_expect_msg() API: if there is no already-received-and-queued-inside in-message of the proper type, or one isn't forthcoming immediately).
In a well-behaved protocol, the opposing side will have issued expect_msg[s]() by the time we issue the sync_request(). That's just good for responsive IPC; and it's nice not to need an async call like async_request(), where a non-blocking one would do much more simply.
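As a sketch of that timed_*() mitigation (names assumed for illustration: some MQ-handle type with a timed_receive(); s_stop set from a SIGTERM/SIGINT handler):

#include <chrono>
#include <csignal>

volatile std::sig_atomic_t s_stop = 0; // Set to 1 by a signal handler.

// Illustrative: block "interruptibly" by looping over short timed waits.
template<typename Mq_handle, typename Blob>
bool receive_until_signal(Mq_handle& mq, Blob* blob)
{
  using namespace std::chrono_literals;
  while (!s_stop)
  {
    if (mq.timed_receive(blob, 100ms)) // Short, but not too short.
    {
      return true; // Got a message.
    }
    // Timed out: re-check the signal flag (worst-case lag ~100ms).
  }
  return false; // Interrupted.
}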
Sometimes, but especially over the past couple of weeks for some reason, the fancy GitHub Actions CI build/test workflow fails in ASAN- and/or TSAN-enabled build configs. Almost always the failure is before any of our code even builds -- namely during the build-dependencies phase (conan install). Exactly once I saw it when running some random test binary.
Example: https://github.com/Flow-IPC/ipc/actions/runs/8336944527/job/22819113854
At first it looked like jemalloc failed during configure; or m4 failed during its configure. As usual, default configure output does not help. But once one hacks together capturing config.log, the problem is clear:
configure:5928: ./conftest
FATAL: ThreadSanitizer: unexpected memory mapping 0x767804872000-0x767804d00000
From memory, the one time it was an actual test program failing (after deps built fine), it was just like that: Nothing at all executed, visibly at least, and instead an error like above was shown, program exiting.
This led to StackOverflow, namely the post below, which led to a work-around:
https://stackoverflow.com/questions/5194666/disable-randomization-of-memory-addresses
Essentially, something involving memory addresses being randomized (ASLR) "tickles" a clang-TSAN/ASAN problem/bug. According to the first link, the fix would be in clang-18; but we cannot simply drop testing of clang-15 through 17. The work-around is not 100% ideal but seems essentially okay. So prepending
setarch `uname -m` -R
to commands that'll invoke sanitizer-enabled executables (there are lots of those) will make it go away.
So far, at least, doing so for the problematic conan install command did indeed make it go away.
Since this is diagnosed, here are the tasks to resolve this:
One option is sudo sysctl (to tone down the address randomization system-wide), which may not be available. Just try it; this would be less of a change from vanilla *SAN runs, which is probably better than turning off the randomization thing.
This Issue is fairly high-priority, as it's been wreaking havoc lately in the pipeline results -- which were ever-so-clean up to that point.
TSAN warned once: https://github.com/Flow-IPC/ipc/actions/runs/8261259787/job/22598241886#step:27:190
Logs attached (also in above link for a while).
cli.log
cli.console.log
srv.log
srv.console.log
Filed by @ygoldfeld pre-open-source:
We should formally analyze coverage of the source code. Consider doing so both before and after #83.
Filed by @ygoldfeld pre-open-source:
unit_test (and its helpers) extensively tests ipc_shm_arena_lend (a/k/a IPC: SHM-jemalloc), which is the module dependent on all other modules. @echan-dev wrote this as a unit test (within the GoogleTest framework); and, in addition, in my opinion several of the higher-level tests, such as the session-test, are effectively integration-testing all of the dependencies: ipc_core, ipc_transport_structured (it is used by tests for their own purposes, as well as inside the ipc_shm_arena_lend impl internally), ipc_session (used to set up interprocess communication); though ipc_shm not so much.
transport_test is an extensive integration test of all of the above. The scripted mode focuses on ipc_core (and is extensible, both with new commands and more scripts), while the exercise mode focuses on APIs in ipc_session, ipc_transport_structured, ipc_shm, and ipc_shm_arena_lend. Additional functional testing is in perf_demo as of this writing; surely there will be more.
In that sense IMO, subjectively speaking, the test coverage is pretty damned good, and my confidence in correctness is at a high level.
However, the ipc_* modules other than ipc_shm_arena_lend really should have direct representation in unit_test in the same way ipc_shm_arena_lend (a/k/a IPC: SHM-jemalloc) enjoys. I.e., more or less every class/function -- at least public/protected APIs -- should have a GoogleTest, similarly to what Mr. Chan did.
Filed by @ygoldfeld pre-open-source:
Environments:
Observed:
The problem itself presents as follows: console prints:
==76700==WARNING: Can't read from symbolizer at fd 8
==76700==WARNING: Can't write to symbolizer at fd 44
LLVM ERROR: Sections with relocations should have an address of 0
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /usr/bin/llvm-symbolizer-17 --demangle --inlines --default-arch=x86_64
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 libLLVM-17.so.1 0x00007f4efaacc406 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 54
1 libLLVM-17.so.1 0x00007f4efaaca5b0 llvm::sys::RunSignalHandlers() + 80
2 libLLVM-17.so.1 0x00007f4efaacca9b
3 libc.so.6 0x00007f4ef9642520
4 libc.so.6 0x00007f4ef96969fc pthread_kill + 300
5 libc.so.6 0x00007f4ef9642476 raise + 22
6 libc.so.6 0x00007f4ef96287f3 abort + 211
7 libLLVM-17.so.1 0x00007f4efaa2eb15 llvm::report_fatal_error(llvm::Twine const&, bool) + 437
8 libLLVM-17.so.1 0x00007f4efaa2e956
9 libLLVM-17.so.1 0x00007f4efc1b3002
10 libLLVM-17.so.1 0x00007f4efc3855a8 llvm::DWARFContext::create(llvm::object::ObjectFile const&, llvm::DWARFContext::ProcessDebugRelocations, llvm::LoadedObjectInfo const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::function<void (llvm::Error)>, std::function<void (llvm::Error)>) + 4328
11 libLLVM-17.so.1 0x00007f4efc517fcf llvm::symbolize::LLVMSymbolizer::getOrCreateModuleInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) + 2479
12 libLLVM-17.so.1 0x00007f4efc5147aa llvm::Expected<llvm::DIGlobal> llvm::symbolize::LLVMSymbolizer::symbolizeDataCommon<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, llvm::object::SectionedAddress) + 58
13 libLLVM-17.so.1 0x00007f4efc514769 llvm::symbolize::LLVMSymbolizer::symbolizeData(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, llvm::object::SectionedAddress) + 9
14 llvm-symbolizer-17 0x0000557435c89893
15 llvm-symbolizer-17 0x0000557435c884ef
16 llvm-symbolizer-17 0x0000557435c87860
17 libc.so.6 0x00007f4ef9629d90
18 libc.so.6 0x00007f4ef9629e40 __libc_start_main + 128
19 llvm-symbolizer-17 0x0000557435c85905
==76700==WARNING: Can't read from symbolizer at fd 8
==76700==WARNING: Can't write to symbolizer at fd 15
[...the same "LLVM ERROR: Sections with relocations should have an address of 0" message and llvm-symbolizer stack dump repeat twice more...]
==76700==WARNING: Can't read from symbolizer at fd 8
2023-12-12 15:28:16.230671546 +0000 [info]: T7f36dc2ff640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 9
2023-12-12 15:28:16.249109513 +0000 [info]: T7f36dc2ff640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 9
2023-12-12 15:28:16.284143943 +0000 [info]: T7f36cf7fa640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 2
2023-12-12 15:28:16.300969510 +0000 [info]: T7f36cf7fa640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 2
2023-12-12 15:28:16.331467917 +0000 [info]: T7f36d07fb640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 3
2023-12-12 15:28:16.344776151 +0000 [info]: T7f36d07fb640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 3
2023-12-12 15:28:16.631431376 +0000 [info]: T7f36d27fd640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 6
2023-12-12 15:28:16.632357574 +0000 [info]: T7f36d27fd640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 6
2023-12-12 15:28:16.700983790 +0000 [info]: T7f36dedff640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 8
2023-12-12 15:28:16.702557775 +0000 [info]: T7f36dedff640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 8
2023-12-12 15:28:16.722868851 +0000 [info]: T7f36d47ff640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 7
2023-12-12 15:28:16.724069064 +0000 [info]: T7f36d47ff640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 7
2023-12-12 15:28:16.760789707 +0000 [info]: T7f36d37fe640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 4
2023-12-12 15:28:16.763157102 +0000 [info]: T7f36d37fe640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 4
2023-12-12 15:28:16.772291545 +0000 [info]: T7f36d17fc640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 5
2023-12-12 15:28:16.774043434 +0000 [info]: T7f36d17fc640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 5
2023-12-12 15:28:16.791601069 +0000 [info]: T7f36cd7f8640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 0
2023-12-12 15:28:16.793051823 +0000 [info]: T7f36cd7f8640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(179): Destroyed thread cache id 0
2023-12-12 15:28:16.815875854 +0000 [info]: T7f36ce7f9640: TEST-SHM: memory_manager.cpp:destroy_thread_cache(170): Destroying thread cache id 1
==76700==WARNING: Can't write to symbolizer at fd 15
==76700==WARNING: Failed to use and restart external symbolizer!
==================
WARNING: ThreadSanitizer: data race (pid=76700)
Write of size 8 at 0x7f36e08e5140 by thread T289:
#0 <null> <null> (libipc_unit_test.exec+0x76454d) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#1 <null> <null> (libipc_unit_test.exec+0x732a9b) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#2 <null> <null> (libipc_unit_test.exec+0x770bf4) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#3 <null> <null> (libipc_unit_test.exec+0x797533) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#4 <null> <null> (libipc_unit_test.exec+0x798fdd) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#5 <null> <null> (libipc_unit_test.exec+0x745dbb) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#6 <null> <null> (libipc_unit_test.exec+0x740872) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#7 <null> <null> (libipc_unit_test.exec+0x72cd58) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#8 <null> <null> (libipc_unit_test.exec+0x4fc012) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#9 <null> <null> (libipc_unit_test.exec+0x50ea6a) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#10 <null> <null> (libipc_unit_test.exec+0x50e7f3) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#11 <null> <null> (libipc_unit_test.exec+0x412ac9) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#12 <null> <null> (libipc_unit_test.exec+0x412f20) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#13 <null> <null> (libipc_unit_test.exec+0x4fe235) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#14 <null> <null> (libc.so.6+0x45d9e) (BuildId: a43bfc8428df6623cd498c9c0caeb91aec9be4f9)
Previous read of size 8 at 0x7f36e08e5140 by thread T290 (mutexes: write M0, write M1):
#0 <null> <null> (libipc_unit_test.exec+0x769d24) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#1 <null> <null> (libipc_unit_test.exec+0x76473e) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#2 <null> <null> (libipc_unit_test.exec+0x764564) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#3 <null> <null> (libipc_unit_test.exec+0x732a9b) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#4 <null> <null> (libipc_unit_test.exec+0x770bf4) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#5 <null> <null> (libipc_unit_test.exec+0x797275) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#6 <null> <null> (libipc_unit_test.exec+0x796720) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#7 <null> <null> (libipc_unit_test.exec+0x79a5b6) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#8 <null> <null> (libc.so.6+0x91690) (BuildId: a43bfc8428df6623cd498c9c0caeb91aec9be4f9)
Mutex M0 (0x7f36e08032a8) created at:
#0 <null> <null> (libipc_unit_test.exec+0xc19b0) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#1 <null> <null> (libipc_unit_test.exec+0x773e98) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#2 <null> <null> (libipc_unit_test.exec+0x762fdd) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#3 <null> <null> (libipc_unit_test.exec+0x738cdd) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#4 <null> <null> (libipc_unit_test.exec+0x720a43) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#5 <null> <null> (libipc_unit_test.exec+0x72d9b9) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#6 <null> <null> (libipc_unit_test.exec+0x73099a) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#7 <null> <null> (libipc_unit_test.exec+0x72d38b) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#8 <null> <null> (libc.so.6+0x29eba) (BuildId: a43bfc8428df6623cd498c9c0caeb91aec9be4f9)
Mutex M1 (0x561bce9e3ca8) created at:
#0 <null> <null> (libipc_unit_test.exec+0xc19b0) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#1 <null> <null> (libipc_unit_test.exec+0x773e98) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#2 <null> <null> (libipc_unit_test.exec+0x7740bd) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#3 <null> <null> (libipc_unit_test.exec+0x768c09) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#4 <null> <null> (libipc_unit_test.exec+0x72d90c) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#5 <null> <null> (libipc_unit_test.exec+0x73099a) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#6 <null> <null> (libipc_unit_test.exec+0x72d38b) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#7 <null> <null> (libc.so.6+0x29eba) (BuildId: a43bfc8428df6623cd498c9c0caeb91aec9be4f9)
Thread T289 (tid=77197, running) created by main thread at:
#0 <null> <null> (libipc_unit_test.exec+0xbffcb) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#1 <null> <null> (libstdc++.so.6+0xe6388) (BuildId: 2db998bd67acbfb235c464c0275d4070061695fb)
#2 <null> <null> (libipc_unit_test.exec+0x644b76) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#3 <null> <null> (libipc_unit_test.exec+0x61c32d) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#4 <null> <null> (libipc_unit_test.exec+0x61dc86) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#5 <null> <null> (libipc_unit_test.exec+0x61ef64) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#6 <null> <null> (libipc_unit_test.exec+0x637499) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#7 <null> <null> (libipc_unit_test.exec+0x646017) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#8 <null> <null> (libipc_unit_test.exec+0x636c51) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#9 <null> <null> (libipc_unit_test.exec+0x48b154) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#10 <null> <null> (libc.so.6+0x29d8f) (BuildId: a43bfc8428df6623cd498c9c0caeb91aec9be4f9)
Thread T290 (tid=77198, finished) created by main thread at:
#0 <null> <null> (libipc_unit_test.exec+0xbffcb) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#1 <null> <null> (libstdc++.so.6+0xe6388) (BuildId: 2db998bd67acbfb235c464c0275d4070061695fb)
#2 <null> <null> (libipc_unit_test.exec+0x644b76) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#3 <null> <null> (libipc_unit_test.exec+0x61c32d) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#4 <null> <null> (libipc_unit_test.exec+0x61dc86) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#5 <null> <null> (libipc_unit_test.exec+0x61ef64) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#6 <null> <null> (libipc_unit_test.exec+0x637499) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#7 <null> <null> (libipc_unit_test.exec+0x646017) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#8 <null> <null> (libipc_unit_test.exec+0x636c51) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#9 <null> <null> (libipc_unit_test.exec+0x48b154) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
#10 <null> <null> (libc.so.6+0x29d8f) (BuildId: a43bfc8428df6623cd498c9c0caeb91aec9be4f9)
SUMMARY: ThreadSanitizer: data race (/home/runner/work/ipc/ipc/install/RelWithDebInfo/bin/libipc_unit_test.exec+0x76454d) (BuildId: 7992d1dbec863670d21d6bb09687761223088e76)
What happens next depends. I've seen:
# First the reason in detail: This run semi-reliably (50%+) fails at this point in the server binary:
# 2023-12-20 11:36:11.322479842 +0000 [info]: Tguy: ex_srv.hpp:send_req_b(1428): App_session [0x7b3800008180]:
# Chan B[0]: Filling/send()ing payload (description = [reuse out-message + SHM-handle to modified (unless
# SHM-jemalloc) existing STL data]; alt-payload? = [0]; reusing msg? = [1]; reusing SHM payload? = [1]).
# LLVM ERROR: Sections with relocations should have an address of 0
# PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
# Stack dump:
# 0. Program arguments: /usr/bin/llvm-symbolizer-17 --demangle --inlines --default-arch=x86_64
# Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
# ...
# ==77990==WARNING: Can't read from symbolizer at fd 599
# 2023-12-20 11:36:31.592293322 +0000 [info]: Tguy: ex_srv.hpp:send_req_b(1547): App_session [0x7b3800008180]: Chan B[0]: Filling done. Now to send.
# Sometimes the exact point is different, depending on timing; but in any case it is always the above
# TSAN/LLVM error, at which point the thread gets stuck for a long time (10+ seconds); but eventually gets
# unstuck; however transport_test happens to be testing a feature in a certain way so that a giant blocking
# operation in this thread delays certain processing, causes an internal timeout, and the test exits/fails.
# Sure, we could make some changes to the test for that to not happen, but that's beside the point: TSAN
# at run-time is trying to do something and fails terribly; I have no wish to try to work around that situation;
# literally it says "PLEASE submit a bug report [to clang devs]."
#
# TODO: Revisit; figure out how to not trigger this; re-enable. For the record, I (ygoldfel) cannot reproduce
# in a local clang-17, albeit with libc++ (LLVM STL) instead of libstdc++ (GNU STL). I've also tried to
# reduce optimization to -O1, as well as with and without LTO, and with and without -fno-omit-frame-pointer;
# same result.
We should:
Do note that I've tried a few things (see last paragraph); but that's different from an investigation into the nitty-gritty of it. What's with this relocation thing? We should find out.
The priority is medium. We have TSAN coverage for the problematic tests (which are themselves only a subset), just not with that particular compiler, clang-17 (and locally that works too).
Last point! TSAN is officially in beta. This is not said in the ASAN, UBSAN, or even MSAN docs. So some level of problems is to be expected. THAT said, even though TSAN can be quite a pain in the butt with things like this, it is worth remembering that it has found tricky real problems and is an extremely valuable tool. It is worth fighting for, so to speak.