hpc / hxhim
License: Other
I get segfaults when I try to run with more than one process. Valgrind finds an "invalid read" here:
==14423== Invalid read of size 8
==14423== at 0x4E46615: client_bget(mdhim*, index*, TransportBGetMessage**) (client.cpp:135)
==14423== by 0x4E64906: _bget_records(mdhim*, index*, void**, int*, int, int, TransportGetMessageOp) (mdhim_private.cpp:334)
==14423== by 0x4E6253B: mdhimGet (mdhim.cpp:577)
==14423== by 0x109308: main (putget.c:86)
==14423== Address 0xffffffff is not stack'd, malloc'd or (recently) free'd
Nothing in the test subdirectory actually tests the hxhim interface.
I'm glad you found a pkg-config file for leveldb in your environment, but most environments don't have one.
The help text says it reads from stdin, but after generating an input file, it does not appear to work:
% ~/work/soft/hxhim/bin/mdhim/cli < examples/mdhim/commands.in
Syntax: /home/robl/work/soft/hxhim/bin/mdhim/cli print?
Input is passed in through stdin in the following formats:
PUT <KEY> <VALUE>
GET|DEL|WHICH <KEY>
BPUT N <KEY_1> <VALUE_1> ... <KEY_N> <VALUE_N>
BGET|BDEL|BWHICH N <KEY_1> ... <KEY_N>
COMMIT
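For reference, here is what an input file matching that documented format would look like (the keys and values are made up):

```
PUT key1 value1
GET key1
BPUT 2 key2 value2 key3 value3
BGET 2 key2 key3
DEL key1
COMMIT
```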
It's almost there, but a little C program that calls mdhimPut and mdhimGet will bring in src/comm/comm.h because mdhim_private needs CommTransport for one of its structs. If mdhim_t were truly opaque, client code would not need mdhim_private.h (and wouldn't need the C++ headers). Perhaps there needs to be a getter function instead of directly accessing md.primary_index as is done in the test case? It would also limit the number of header files needed to install.
When I run the 'cli' tool on 200 randomly generated key-value pairs, all the keys end up on server zero.
I haven't modified mdhim.conf, so if there is a hash setting I'm supposed to set, I haven't set it. I learned about this from the Darshan report, but I put a debug printf in _which_db and confirmed it.
I observed this with both the MPI and Thallium transports.
I'm running this on my laptop... would you expect different behavior if I ran this across multiple physical nodes?
Something changed while I was not paying attention for a few months. Now when I run the 'putget' example I get:
% ./build/examples/mdhim/putget
Error Reading Configuration
What am I forgetting to set up?
When I try to build HXHIM on the Argonne Theta machine (a Cray XC40, I think), I get this error:
from /home/robl/src/hxhim/include/hxhim_private.hpp(9),
from /home/robl/src/hxhim/include/return.hpp(5),
from /home/robl/src/hxhim/include/hxhim.hpp(7),
from /home/robl/src/hxhim/include/hxhim.cpp(8):
/home/robl/src/hxhim/src/transport/transport.hpp(314): error: namespace "std" has no member "enable_if_t"
template <typename T, typename = std::enable_if_t<std::is_convertible<T, TransportMessage>::value> >
This is with the /opt/cray/pe/craype/2.5.14/bin/CC compiler, which in turn wraps Intel compiler 18.0.0. I would have thought Intel's C++ would support this. Perhaps Intel is picking up an older libstdc++ on this system?
With MPICH-3.2.1, at higher numbers of ranks, MPI_Comm_create in index_init_comm will somehow modify md->p->db_opts->path and md->p->db_opts->name, but nothing else.
std::cout << md->p->db_opts->name << std::endl; // mdhimTstDB-
if ((ret = MPI_Comm_create(md->mdhim_comm, new_group, &new_comm))
!= MPI_SUCCESS) {
mlog(MDHIM_SERVER_CRIT, "MDHIM Rank %d - "
"Error while creating the new communicator in range_server_init_comm",
md->mdhim_rank);
return MDHIM_ERROR;
}
std::cout << md->p->db_opts->name << std::endl; // 8
Trying to continue from the first std::cout
to the second in gdb results in:
(gdb) bt
#0 0x00007ffff7def84f in _dl_runtime_resolve_sse_vex ()
from /lib64/ld-linux-x86-64.so.2
#1 0x0000000000000000 in ?? ()
This causes the database directory names to be corrupted:
$ ls
''$'\034''8-0-0' ''$'\034''8-0-23' ''$'\034''8-0-37_stats' ''$'\034''8-0-51' ''$'\034''8-0-65_stats' ''$'\034''8-0-7_stats' ''$'\034''8-0-93_stats'
''$'\034''8-0-0_stats' ''$'\034''8-0-23_stats' ''$'\034''8-0-38' ''$'\034''8-0-51_stats' ''$'\034''8-0-66' ''$'\034''8-0-8' ''$'\034''8-0-94'
''$'\034''8-0-1' ''$'\034''8-0-24' ''$'\034''8-0-38_stats' ''$'\034''8-0-52' ''$'\034''8-0-66_stats' ''$'\034''8-0-80' ''$'\034''8-0-94_stats'
''$'\034''8-0-10' ''$'\034''8-0-24_stats' ''$'\034''8-0-39' ''$'\034''8-0-52_stats' ''$'\034''8-0-67' ''$'\034''8-0-80_stats' ''$'\034''8-0-95'
''$'\034''8-0-10_stats' ''$'\034''8-0-25' ''$'\034''8-0-39_stats' ''$'\034''8-0-53' ''$'\034''8-0-67_stats' ''$'\034''8-0-81' ''$'\034''8-0-95_stats'
''$'\034''8-0-11' ''$'\034''8-0-25_stats' ''$'\034''8-0-3_stats' ''$'\034''8-0-53_stats' ''$'\034''8-0-68' ''$'\034''8-0-81_stats' ''$'\034''8-0-96'
''$'\034''8-0-11_stats' ''$'\034''8-0-26' ''$'\034''8-0-4' ''$'\034''8-0-54' ''$'\034''8-0-68_stats' ''$'\034''8-0-82' ''$'\034''8-0-96_stats'
''$'\034''8-0-12' ''$'\034''8-0-26_stats' ''$'\034''8-0-40' ''$'\034''8-0-54_stats' ''$'\034''8-0-69' ''$'\034''8-0-82_stats' ''$'\034''8-0-97'
''$'\034''8-0-12_stats' ''$'\034''8-0-27' ''$'\034''8-0-40_stats' ''$'\034''8-0-55' ''$'\034''8-0-69_stats' ''$'\034''8-0-83' ''$'\034''8-0-97_stats'
''$'\034''8-0-13' ''$'\034''8-0-27_stats' ''$'\034''8-0-41' ''$'\034''8-0-55_stats' ''$'\034''8-0-6_stats' ''$'\034''8-0-83_stats' ''$'\034''8-0-98'
''$'\034''8-0-13_stats' ''$'\034''8-0-28' ''$'\034''8-0-41_stats' ''$'\034''8-0-56' ''$'\034''8-0-7' ''$'\034''8-0-84' ''$'\034''8-0-98_stats'
''$'\034''8-0-14' ''$'\034''8-0-28_stats' ''$'\034''8-0-42' ''$'\034''8-0-56_stats' ''$'\034''8-0-70' ''$'\034''8-0-84_stats' ''$'\034''8-0-99'
''$'\034''8-0-14_stats' ''$'\034''8-0-29' ''$'\034''8-0-42_stats' ''$'\034''8-0-57' ''$'\034''8-0-70_stats' ''$'\034''8-0-85' ''$'\034''8-0-99_stats'
''$'\034''8-0-15' ''$'\034''8-0-29_stats' ''$'\034''8-0-43' ''$'\034''8-0-57_stats' ''$'\034''8-0-71' ''$'\034''8-0-85_stats' ''$'\034''8-0-9_stats'
''$'\034''8-0-15_stats' ''$'\034''8-0-2_stats' ''$'\034''8-0-43_stats' ''$'\034''8-0-58' ''$'\034''8-0-71_stats' ''$'\034''8-0-86' CMakeCache.txt
''$'\034''8-0-16' ''$'\034''8-0-3' ''$'\034''8-0-44' ''$'\034''8-0-58_stats' ''$'\034''8-0-72' ''$'\034''8-0-86_stats' CMakeFiles
''$'\034''8-0-16_stats' ''$'\034''8-0-30' ''$'\034''8-0-44_stats' ''$'\034''8-0-59' ''$'\034''8-0-72_stats' ''$'\034''8-0-87' cmake_install.cmake
''$'\034''8-0-17' ''$'\034''8-0-30_stats' ''$'\034''8-0-45' ''$'\034''8-0-59_stats' ''$'\034''8-0-73' ''$'\034''8-0-87_stats' examples
''$'\034''8-0-17_stats' ''$'\034''8-0-31' ''$'\034''8-0-45_stats' ''$'\034''8-0-5_stats' ''$'\034''8-0-73_stats' ''$'\034''8-0-88' gmock_main.pc
''$'\034''8-0-18' ''$'\034''8-0-31_stats' ''$'\034''8-0-46' ''$'\034''8-0-6' ''$'\034''8-0-74' ''$'\034''8-0-88_stats' gmock.pc
''$'\034''8-0-18_stats' ''$'\034''8-0-32' ''$'\034''8-0-46_stats' ''$'\034''8-0-60' ''$'\034''8-0-74_stats' ''$'\034''8-0-89' googletest-build
''$'\034''8-0-19' ''$'\034''8-0-32_stats' ''$'\034''8-0-47' ''$'\034''8-0-60_stats' ''$'\034''8-0-75' ''$'\034''8-0-89_stats' googletest-download
''$'\034''8-0-19_stats' ''$'\034''8-0-33' ''$'\034''8-0-47_stats' ''$'\034''8-0-61' ''$'\034''8-0-75_stats' ''$'\034''8-0-8_stats' googletest-src
''$'\034''8-0-1_stats' ''$'\034''8-0-33_stats' ''$'\034''8-0-48' ''$'\034''8-0-61_stats' ''$'\034''8-0-76' ''$'\034''8-0-9' gtest_main.pc
''$'\034''8-0-2' ''$'\034''8-0-34' ''$'\034''8-0-48_stats' ''$'\034''8-0-62' ''$'\034''8-0-76_stats' ''$'\034''8-0-90' gtest.pc
''$'\034''8-0-20' ''$'\034''8-0-34_stats' ''$'\034''8-0-49' ''$'\034''8-0-62_stats' ''$'\034''8-0-77' ''$'\034''8-0-90_stats' Makefile
''$'\034''8-0-20_stats' ''$'\034''8-0-35' ''$'\034''8-0-49_stats' ''$'\034''8-0-63' ''$'\034''8-0-77_stats' ''$'\034''8-0-91' mdhim.conf
''$'\034''8-0-21' ''$'\034''8-0-35_stats' ''$'\034''8-0-4_stats' ''$'\034''8-0-63_stats' ''$'\034''8-0-78' ''$'\034''8-0-91_stats' mdhim_manifest_1_0_0
''$'\034''8-0-21_stats' ''$'\034''8-0-36' ''$'\034''8-0-5' ''$'\034''8-0-64' ''$'\034''8-0-78_stats' ''$'\034''8-0-92' src
''$'\034''8-0-22' ''$'\034''8-0-36_stats' ''$'\034''8-0-50' ''$'\034''8-0-64_stats' ''$'\034''8-0-79' ''$'\034''8-0-92_stats' test
''$'\034''8-0-22_stats' ''$'\034''8-0-37' ''$'\034''8-0-50_stats' ''$'\034''8-0-65' ''$'\034''8-0-79_stats' ''$'\034''8-0-93'
When I run examples/putget, it will work... sometimes. But more often I get deadlock. I hooked up a debugger, and here's where the threads are:
(gdb) thread apply all bt
Thread 2 (Thread 0x7fe12f6d0700 (LWP 12598)):
#0 0x00007fe1314adbf8 in __GI___nanosleep (requested_time=requested_time@entry=0x7fe12f6cfdb0, remaining=remaining@entry=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:27
#1 0x00007fe1314e0184 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
#2 0x000055bc12ed0bcc in only_receive_rangesrv_work ()
#3 0x000055bc12ed0e3e in receive_rangesrv_work ()
#4 0x000055bc12edc484 in listener_thread(void*) ()
#5 0x00007fe1320af7fc in start_thread (arg=0x7fe12f6d0700) at pthread_create.c:465
#6 0x00007fe1314e9b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7fe1331d2300 (LWP 12596)):
#0 0x00007fe1320b6072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55bc14208cb8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x55bc14208c68, cond=0x55bc14208c90) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x55bc14208c90, mutex=0x55bc14208c68) at pthread_cond_wait.c:655
#3 0x000055bc12ef44f2 in get_msg_self(mdhim*) ()
#4 0x000055bc12ef461e in local_client_put(mdhim*, mdhim_putm_t*) ()
#5 0x000055bc12ecebaa in _put_record(mdhim*, index_t*, void*, int, void*, int) ()
#6 0x000055bc12ecd14b in mdhimPut ()
#7 0x000055bc12ecc789 in main ()
(gdb)
This is Ubuntu 17.10 running Linux 4.13.0-36-generic, compiled with gcc 7.2.0 and whatever pthread library comes along with libc-2.26-0ubuntu2.1.
I'll toss in the other libraries involved just to be thorough.
% ldd examples/putget
linux-vdso.so.1 => (0x00007ffe095dc000)
libleveldb.so.1 => /home/robl/work/spack/opt/spack/linux-ubuntu17.10-x86_64/gcc-7/leveldb-1.20-jtbwzhhonona22os62hq4qstjqiedje4/lib/libleveldb.so.1 (0x00007fe679de2000)
libmpi.so.0 => /home/robl/work/soft/mpich/lib/libmpi.so.0 (0x00007fe679550000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe679331000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe678fab000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe678c55000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe678a3e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe67865e000)
libsnappy.so.1 => /home/robl/work/spack/opt/spack/linux-ubuntu17.10-x86_64/gcc-7/snappy-1.1.7-ufim2grx4rm2dcwhaeib2jjkybrepjom/lib/libsnappy.so.1 (0x00007fe678456000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007fe678239000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe678031000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe67a27d000)
If I configure hxhim to install into ${HOME}/work/soft/hxhim, all I get in that prefix is gtest and gmock. Where did the hxhim header and library files go?
If you configure MPICH with --enable-g=all, MPICH will carp at you if you forget to release MPI resources. Now, these are not usually a big deal, but they can matter for long-running MPI-based services. Also, if you run a 1k-process job with this resource logging enabled, you get a lot of spam in the output.
The leaked resources look like this for examples/putget:
leaked context IDs detected: mask=0x7f2951e90a00 mask[0]=0xfffffffe
In direct memory block for handle type REQUEST, 2 handles are still allocated
In direct memory block for handle type GROUP, 1 handles are still allocated
[0] 16 at [0x000055b72e6d1c60], /home/robl/work/mpich/src/mpi/group/grouputil.c[80]
[0] 8 at [0x000055b72e6d0b70], e/robl/work/mpich/src/util/procmap/local_proc.c[97]
[0] 8 at [0x000055b72e6d0ab0], e/robl/work/mpich/src/util/procmap/local_proc.c[95]
[0] 24 at [0x000055b72e6d0860], home/robl/work/mpich/src/mpid/ch3/src/mpid_vc.c[77]
Working backwards: the last four lines refer to leaked memory. Those are only important to MPICH developers.
The GROUP leak is simply a matter of not freeing the range-server communicator. That won't grow over time.
The REQUEST leak is more concerning: the putget example does only one put and one get, yet leaks two MPI REQUEST handles. That one will probably grow over time.
Related to #12, I'm sure, but if I run the 'putget' example with two MPI processes, mdhim does not cleanly shut down. One process is stuck waiting for a response from another process that will never respond; it's in MPI_Finalize:
Here are some backtraces for you. First a process off in Finalize:
#0 0x00007faa68c27951 in __GI___poll (fds=0x556c04010e70, nfds=3, timeout=0) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007faa69b12977 in MPID_nem_tcp_connpoll (in_blocking_poll=1) at /home/robl/work/mpich/src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c:1819
#2 0x00007faa69aedae2 in MPID_nem_network_poll (in_blocking_progress=1) at /home/robl/work/mpich/src/mpid/ch3/channels/nemesis/src/mpid_nem_network_poll.c:16
#3 0x00007faa69ac950c in MPID_nem_mpich_test_recv (cell=0x7ffe2ab2fba8, in_fbox=0x7ffe2ab2fb88, in_blocking_progress=1) at /home/robl/work/mpich/src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:863
#4 0x00007faa69acb6a6 in MPIDI_CH3I_Progress (progress_state=0x7ffe2ab2fd10, is_blocking=1) at /home/robl/work/mpich/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:510
#5 0x00007faa699bb9dc in MPIDI_CH3U_VC_WaitForClose () at /home/robl/work/mpich/src/mpid/ch3/src/ch3u_handle_connection.c:383
#6 0x00007faa69a71fe5 in MPID_Finalize () at /home/robl/work/mpich/src/mpid/ch3/src/mpid_finalize.c:110
#7 0x00007faa697f7f71 in PMPI_Finalize () at /home/robl/work/mpich/src/mpi/init/finalize.c:250
#8 0x0000556c027e11b7 in cleanup ()
#9 0x0000556c027e14fa in main ()
and the other process waiting for a response:
#0 0x00007f72dc802bf8 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffd2ea74ec0, remaining=remaining@entry=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:27
#1 0x00007f72dc835184 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
#2 0x000055f968ea4e52 in MPIBase::receive_all_client_responses(int*, int, char***, int**) ()
#3 0x000055f968ea8723 in MPIEndpoint::AddGetReply(int*, int, mdhim_bgetrm_t***) ()
#4 0x00007f72ddfc6f34 in client_bget(mdhim*, index_t*, mdhim_bgetm_t**) () from /home/robl/work/hxhim/build/src/mdhim/libmdhim.so
#5 0x00007f72ddfe5d84 in _bget_records(mdhim*, index_t*, void**, int*, int, int, int) () from /home/robl/work/hxhim/build/src/mdhim/libmdhim.so
#6 0x00007f72ddfe32a1 in mdhimGet () from /home/robl/work/hxhim/build/src/mdhim/libmdhim.so
#7 0x000055f968ea33ea in main ()
I wanted to use the preprocessor symbol MDHIM_PUT in my Darshan module, but the header ./src/mdhim/messages.h had already defined it. That was surprising.
In mdhimPut and mdhimGet, the key and value length parameters are 'int'. Shouldn't they be size_t? I suppose no one is lobbing 3 GiB values into mdhim yet, but as someone who spent a big chunk of the 2010s fixing 'int' overflow bugs in MPICH, I'm a little sensitive to the issue.
is_range_server is defined to return an unsigned int, but MDHIM_ERROR is -1:
Line 871 in 4cd8949
How can is_range_server ever return a negative value?
In this darshan module I have
struct mdhim_brm_t *DARSHAN_DECL(mdhimPut)(mdhim_t *md,
void *key, int key_len,
void *value, int value_len,
struct secondary_info *secondary_global_info,
struct secondary_info * secondary_local_info)
and I want to turn that into a server id using as little internal information as possible. get_range_servers is close, except I would need a way to get the index out of the mdhim_private_t. I could do that with get_index, though maybe the "local index" vs "remote index" logic could be stuffed inside get_range_servers?
I tried to fix all the valgrind warnings where a C++ new was cleaned up with a C free in pull request #16. There was one spot where I could not do the right thing, because sometimes items are malloc()ed and sometimes items are new-ed; c.f. the new in local_client_commit and the malloc in listener_thread() from range_server.cpp:
hxhim/src/mdhim/range_server.cpp
Line 1102 in 39ebdca
In tracking down #16, the code here looked off to me:
Line 948 in 4cd8949
Why not use MPI_Comm_split? Instead of creating a group and then creating a communicator from the group, each process could derive its color from the 'ranks' array and its own rank in the parent communicator.
A fairly minor issue, all in all. I bet there's a good reason the group-based approach was used in the first place.
There is an issue with FixedBufferPool that is preventing more than 3 ranks from running at the same time.