Comments (4)
I've attached to one of the processes in deadlock, so I am going to dump some backtraces here in no particular order:
#0 0x00007f7b7647bb7d in __pthread_join (threadid=140166649870080, thread_return=0x0) at pthread_join.c:90
#1 0x0000557a20de3910 in std::sys::unix::thread::Thread::join () at libstd/sys/unix/thread.rs:176
#2 0x0000557a20d53ea2 in <std::thread::JoinInner<T>>::join (self=<optimized out>)
at /checkout/src/libstd/thread/mod.rs:1200
#3 <std::thread::JoinHandle<T>>::join (self=...) at /checkout/src/libstd/thread/mod.rs:1322
#4 0x0000557a20d6293a in dream_go::mcts::predict_aux (server=<optimized out>, num_workers=64, starting_tree=...,
starting_point=<optimized out>, starting_color=<optimized out>) at src/mcts/mod.rs:347
#5 0x0000557a20d62e99 in dream_go::mcts::predict (server=0x7ffc3587ae70, num_workers=..., starting_tree=...,
starting_point=0x7f7b69abb440, starting_color=dream_go::go::Color::Black) at src/mcts/mod.rs:388
#6 0x0000557a20d009bb in dream_go::gtp::Gtp::generate_move (self=0x7ffc35895910, id=...,
color=dream_go::go::Color::Black) at src/gtp/mod.rs:266
#7 0x0000557a20d013ba in dream_go::gtp::Gtp::process (self=0x7ffc35895910, id=..., cmd=...) at src/gtp/mod.rs:478
#8 0x0000557a20d05697 in dream_go::gtp::run () at src/gtp/mod.rs:549
#9 0x0000557a20cfdb03 in dream_go::main () at src/main.rs:89
#0 0x00007f7b75f92297 in accept4 (fd=11, addr=..., addr_len=0x7f7b6cc12718, flags=524288)
at ../sysdeps/unix/sysv/linux/accept4.c:32
#1 0x00007f7b73044216 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007f7b7303880d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007f7b73044e80 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007f7b7647a7fc in start_thread (arg=0x7f7b6cc13700) at pthread_create.c:465
#5 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0 0x00007f7b75f84951 in __GI___poll (fds=0x7f7b6ba10000, nfds=10, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007f7b7304348b in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007f7b730a878f in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007f7b73044e80 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007f7b7647a7fc in start_thread (arg=0x7f7b6c412700) at pthread_create.c:465
#5 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0 0x00007f7b76481786 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f7b6b7fe740, expected=0,
futex_word=0x7f7b6aa0d028) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7f7b6b7fe740, mutex=0x7f7b75094368, cond=0x7f7b6aa0d000) at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0x7f7b6aa0d000, mutex=0x7f7b75094368, abstime=0x7f7b6b7fe740) at pthread_cond_wait.c:667
#3 0x00007f7b73045a57 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007f7b72ffe2c7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007f7b73044e80 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007f7b7647a7fc in start_thread (arg=0x7f7b6b7ff700) at pthread_create.c:465
#7 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b73d07780, cond=0x7f7b73d077b0) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b73d077b0, mutex=0x7f7b73d07780) at pthread_cond_wait.c:655
#3 0x0000557a20d28837 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b73d07780, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d1195c in dream_go::parallel::service::worker_thread (is_running=..., state=..., queue=...)
at src/parallel/service.rs:87
#7 0x0000557a20d28000 in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#8 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#9 0x0000557a20d1e8a0 in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#10 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#11 0x0000557a20d1e4c3 in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#12 0x0000557a20d28124 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#13 0x0000557a20d554fd in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#14 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b73d07ab0, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#15 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#16 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#17 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#18 0x00007f7b7647a7fc in start_thread (arg=0x7f7b5d3fc700) at pthread_create.c:465
#19 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7aee670058)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7aee670000, cond=0x7f7aee670030) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7aee670030, mutex=0x7f7aee670000) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7aee670000, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d1d762 in dream_go::mcts::forward::{{closure}} () at src/mcts/mod.rs:104
#9 dream_go::mcts::global_cache::get_or_insert (board=0x7f7b175f6d20, color=<optimized out>, supplier=...)
at src/mcts/global_cache.rs:196
#10 0x0000557a20d43a60 in dream_go::mcts::forward (server=<optimized out>, board=<optimized out>, color=<optimized out>)
at src/mcts/mod.rs:96
#11 0x0000557a20d623a7 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:259
#12 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#13 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#14 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#15 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#16 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#17 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#18 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#19 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#20 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#21 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b73aa4200, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#22 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#23 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#24 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#25 0x00007f7b7647a7fc in start_thread (arg=0x7f7b175ff700) at pthread_create.c:465
#26 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This looks suspicious...
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b2aa63088)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b2aa63090, cond=0x7f7b2aa63060) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b2aa63060, mutex=0x7f7b2aa63090) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b2aa63090, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d62211 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:267
#9 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#10 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#11 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#12 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#13 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#14 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#15 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#16 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#17 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#18 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b744fa600, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#19 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#20 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#21 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#22 0x00007f7b7647a7fc in start_thread (arg=0x7f7b17bfe700) at pthread_create.c:465
#23 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This also looks suspicious...
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b2a40d088)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b2a40d090, cond=0x7f7b2a40d060) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b2a40d060, mutex=0x7f7b2a40d090) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b2a40d090, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d1d762 in dream_go::mcts::forward::{{closure}} () at src/mcts/mod.rs:104
#9 dream_go::mcts::global_cache::get_or_insert (board=0x7f7b17df6d20, color=<optimized out>, supplier=...)
at src/mcts/global_cache.rs:196
#10 0x0000557a20d43a60 in dream_go::mcts::forward (server=<optimized out>, board=<optimized out>, color=<optimized out>)
at src/mcts/mod.rs:96
#11 0x0000557a20d623a7 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:259
#12 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#13 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#14 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#15 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#16 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#17 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#18 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#19 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#20 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#21 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b744fb400, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#22 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#23 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#24 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#25 0x00007f7b7647a7fc in start_thread (arg=0x7f7b17dff700) at pthread_create.c:465
#26 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This also looks suspicious...
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b2a327058)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b2a327000, cond=0x7f7b2a327030) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b2a327030, mutex=0x7f7b2a327000) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b2a327000, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d62211 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:267
#9 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#10 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#11 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#12 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#13 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#14 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#15 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#16 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#17 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#18 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b744fc200, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#19 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#20 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#21 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#22 0x00007f7b7647a7fc in start_thread (arg=0x7f7b183ff700) at pthread_create.c:465
#23 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b29c14118)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b29c140c0, cond=0x7f7b29c140f0) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b29c140f0, mutex=0x7f7b29c140c0) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b29c140c0, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d62211 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:267
#9 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#10 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#11 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#12 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#13 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#14 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#15 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#16 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#17 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#18 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b73a9e000, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#19 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#20 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#21 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#22 0x00007f7b7647a7fc in start_thread (arg=0x7f7b18bfe700) at pthread_create.c:465
#23 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b28c6a028)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b28c6a030, cond=0x7f7b28c6a000) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b28c6a000, mutex=0x7f7b28c6a030) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b28c6a030, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d62211 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:267
#9 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#10 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#11 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#12 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#13 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#14 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#15 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#16 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#17 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#18 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b73a9ee00, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#19 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#20 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#21 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#22 0x00007f7b7647a7fc in start_thread (arg=0x7f7b18dff700) at pthread_create.c:465
#23 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b29998058)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b29998000, cond=0x7f7b29998030) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b29998030, mutex=0x7f7b29998000) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b29998000, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d1d762 in dream_go::mcts::forward::{{closure}} () at src/mcts/mod.rs:104
#9 dream_go::mcts::global_cache::get_or_insert (board=0x7f7b195f6d20, color=<optimized out>, supplier=...)
at src/mcts/global_cache.rs:196
#10 0x0000557a20d43a60 in dream_go::mcts::forward (server=<optimized out>, board=<optimized out>, color=<optimized out>)
at src/mcts/mod.rs:96
#11 0x0000557a20d623a7 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:259
#12 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#13 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#14 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#15 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#16 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#17 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#18 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#19 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#20 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#21 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b73a9fc00, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#22 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#23 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#24 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#25 0x00007f7b7647a7fc in start_thread (arg=0x7f7b195ff700) at pthread_create.c:465
#26 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#0 0x00007f7b76481072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f7b29251148)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f7b29251150, cond=0x7f7b29251120) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f7b29251120, mutex=0x7f7b29251150) at pthread_cond_wait.c:655
#3 0x0000557a20d28787 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>)
at /checkout/src/libstd/sys/unix/condvar.rs:78
#4 std::sys_common::condvar::Condvar::wait (mutex=0x7f7b29251150, self=<optimized out>)
at /checkout/src/libstd/sys_common/condvar.rs:51
#5 std::sync::condvar::Condvar::wait (self=<optimized out>, guard=...) at /checkout/src/libstd/sync/condvar.rs:212
#6 0x0000557a20d45739 in <dream_go::parallel::one_shot_channel::OneReceiver<T>>::recv (this=...)
at src/parallel/one_shot_channel.rs:83
#7 0x0000557a20d12275 in <dream_go::parallel::service::ServiceGuard<'a, I>>::send (self=<optimized out>, req=...)
at src/parallel/service.rs:205
#8 0x0000557a20d62211 in dream_go::mcts::predict_worker (context=..., server=...) at src/mcts/mod.rs:267
#9 0x0000557a20d3046e in dream_go::mcts::predict_aux::{{closure}}::{{closure}} () at src/mcts/mod.rs:343
#10 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:133
#11 0x0000557a20d2803e in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:406
#12 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>)
at /checkout/src/libstd/panic.rs:300
#13 0x0000557a20d1e83e in std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:479
#14 0x0000557a20def70f in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#15 0x0000557a20d1e75c in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:458
#16 0x0000557a20d28190 in std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:365
#17 0x0000557a20d5533f in std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:405
#18 <F as alloc::boxed::FnBox<A>>::call_box (self=0x7f7b73aa0a00, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:817
#19 0x0000557a20dde0b8 in _$LT$alloc..boxed..Box$LT$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hb0b36c038cd2d960 () at /checkout/src/liballoc/boxed.rs:827
#20 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#21 0x0000557a20de38c9 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#22 0x00007f7b7647a7fc in start_thread (arg=0x7f7b19bff700) at pthread_create.c:465
#23 0x00007f7b75f90b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I think this is enough to see what is going on. There are Monte Carlo search threads waiting on the service to respond to some requests, and all of the service workers are asleep, so there is some race condition in there.
from dream-go.
This might be a duplicate of #20 since the following is present in the console when logging stderr:
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:335:21
note: Run with `RUST_BACKTRACE=1` for a backtrace.
This is a fairly generic error message, so it might be a different problem, but I would not complain if I could kill two birds with one stone.
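If that panic is what takes down a service worker, the requesters parked in `OneReceiver::recv` would have nobody left to wake them. Below is a minimal sketch of that failure mode and one way to surface it. It uses `std::sync::mpsc` as a stand-in for dream-go's own `one_shot_channel` (whose implementation is not shown in this thread), and `request_once` and the request tuple are hypothetical names made up for illustration. With `std::sync::mpsc`, a reply sender dropped during unwinding makes `recv` return an error instead of blocking forever:

```rust
use std::sync::mpsc;
use std::thread;

/// Sends one request to a worker and reports what the requester observes.
/// When `worker_panics` is true the worker panics before replying.
/// (Hypothetical helper, not part of dream-go.)
fn request_once(worker_panics: bool) -> Result<u32, mpsc::RecvError> {
    // Hypothetical request type: a payload plus a one-shot reply sender.
    let (req_tx, req_rx) = mpsc::channel::<(u32, mpsc::Sender<u32>)>();

    let worker = thread::spawn(move || {
        let (payload, reply_tx) = req_rx.recv().unwrap();
        let value: Option<u32> = if worker_panics { None } else { Some(payload + 1) };
        // Mirrors the `Option::unwrap()` panic from the log. Unwinding drops
        // `reply_tx`, which is what wakes the requester below.
        reply_tx.send(value.unwrap()).unwrap();
    });

    let (reply_tx, reply_rx) = mpsc::channel();
    req_tx.send((41, reply_tx)).unwrap();

    // Returns Err(RecvError) once every sender has been dropped.
    let reply = reply_rx.recv();
    // `join` returns Err(..) if the worker panicked; ignored in this sketch.
    let _ = worker.join();
    reply
}
```

Whether the real `one_shot_channel` notifies its receiver when the sending half is dropped is something to check in the source; if it does not, a panic in a worker would produce exactly the kind of hang seen in the backtraces.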
Setting the number of service threads to one seems to be a workaround for this issue. This suggests the race condition is limited to either the `worker_thread` or the `process` method.
The current hypothesis is that, when using more than one service thread, the `has_more = false` thread is not always executed after the previous thread. To fix this we need to use the same lock inside of the server as inside, or provide another way to determine whether there are more requests pending.
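As a rough illustration of the second option, here is a sketch in which the "are there more requests pending?" answer is computed while the queue lock is still held, so it cannot go stale relative to the pop it belongs to. The `Pending` type and its methods are hypothetical and are not taken from dream-go's `service.rs`:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical request queue; `T` stands in for a pending request.
struct Pending<T> {
    queue: Mutex<VecDeque<T>>,
}

impl<T> Pending<T> {
    fn new() -> Self {
        Pending { queue: Mutex::new(VecDeque::new()) }
    }

    fn push(&self, item: T) {
        self.queue.lock().unwrap().push_back(item);
    }

    /// Pops the next request and reports, under the same lock, whether
    /// more requests remain. Because both facts come from one critical
    /// section, no other thread can pop or push in between them.
    fn pop_with_has_more(&self) -> Option<(T, bool)> {
        let mut queue = self.queue.lock().unwrap();
        let item = queue.pop_front()?;
        let has_more = !queue.is_empty();
        Some((item, has_more))
    }
}
```

Note that this only makes each `(item, has_more)` pair consistent at the moment of the pop. If the downstream code also depends on the pairs being processed in pop order, that ordering still has to be enforced separately.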
As suggested in the previous post, enforcing an order on the requests in the `Service` seems to have made this issue disappear.
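The thread does not show how the ordering was enforced in `Service`, so the following is only one possible shape of such a fix, with made-up names (`Turnstile`, `in_order`, `run_in_order`). Each request takes a ticket, and a `Mutex` plus `Condvar` pair admits tickets strictly in increasing order regardless of which thread gets scheduled first:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical turnstile: tickets are served strictly in increasing order.
struct Turnstile {
    next: Mutex<usize>,
    cond: Condvar,
}

impl Turnstile {
    fn new() -> Self {
        Turnstile { next: Mutex::new(0), cond: Condvar::new() }
    }

    /// Blocks until it is `ticket`'s turn, runs `f`, then admits the next ticket.
    fn in_order<R>(&self, ticket: usize, f: impl FnOnce() -> R) -> R {
        let mut next = self.next.lock().unwrap();
        // Re-check in a loop: condition variables may wake spuriously.
        while *next != ticket {
            next = self.cond.wait(next).unwrap();
        }
        let result = f();
        *next += 1;
        // Wake every waiter; only the holder of the next ticket proceeds.
        self.cond.notify_all();
        result
    }
}

/// Spawns `n` threads in reverse ticket order and returns the order in
/// which their critical sections actually ran.
fn run_in_order(n: usize) -> Vec<usize> {
    let turnstile = Arc::new(Turnstile::new());
    let log = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..n)
        .rev()
        .map(|ticket| {
            let turnstile = Arc::clone(&turnstile);
            let log = Arc::clone(&log);
            thread::spawn(move || {
                turnstile.in_order(ticket, || log.lock().unwrap().push(ticket));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let order = log.lock().unwrap().clone();
    order
}
```

The trade-off is that the ordered section is fully serialized, so it should stay as small as possible (for example, only the bookkeeping around `has_more`), or the benefit of having several service threads is lost.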