Giter Club home page Giter Club logo

Comments (14)

sfackler avatar sfackler commented on July 1, 2024

There is zero unsafe code in this library. It is extremely unlikely the crash is due to a bug in r2d2.

The screenshot of the stacktrace in that repo is of a thread blocked in a futex_wait call, which is not a location would be triggering a segfault. You should find the faulting thread.

from r2d2.

eloff avatar eloff commented on July 1, 2024

With no unsafe code, it does seem unlikely. It must be related to the interplay between the libpq based driver in diesel and r2d2. All the core dumps are for r2d2 worker threads.

Note that the crash doesn't happen if I comment out either of the min_idle or max_size lines:

Pool::builder()
        .min_idle(Some(10))
        .max_size(96)
        .build(manager)
        .expect("Failed to create pool.")

from r2d2.

sfackler avatar sfackler commented on July 1, 2024

All the core dumps are for r2d2 worker threads.

What specifically do you mean by that? Core dumps are normally a dump of process memory, including all threads in the application at the time of the crash.

from r2d2.

eloff avatar eloff commented on July 1, 2024

I use Python in my day job, so apologies if I'm uneducated here:

cat /proc/sys/kernel/core_pattern
core.%e.%p
ls core.*
core.r2d2-worker-0.245652  core.r2d2-worker-1.121711  core.r2d2-worker-1.260255
core.r2d2-worker-0.254359  core.r2d2-worker-1.208703  core.r2d2-worker-2.208734
core.r2d2-worker-0.254489  core.r2d2-worker-1.245542
core.r2d2-worker-0.254602  core.r2d2-worker-1.245628

from r2d2.

sfackler avatar sfackler commented on July 1, 2024

I believe that indicates that the crashing thread was an r2d2 worker thread, but not the one that is included in the screenshot.

from r2d2.

eloff avatar eloff commented on July 1, 2024

All the core dumps are for futex_wait, called from run() -> condvar -> parking_lot:

See: https://github.com/eloff/diesel-segfault/blob/main/coredump.png

from r2d2.

sfackler avatar sfackler commented on July 1, 2024

There are multiple thread in each core dump. Click the drop-down that says Thread-4.

from r2d2.

eloff avatar eloff commented on July 1, 2024

That is thread-4, the one that crashed. Here are the others:

https://github.com/eloff/diesel-segfault/blob/main/thread-1.png
https://github.com/eloff/diesel-segfault/blob/main/thread-2.png
https://github.com/eloff/diesel-segfault/blob/main/thread-3.png

from r2d2.

sfackler avatar sfackler commented on July 1, 2024

I have already said that thread-4 is not the one that crashed. I'm not sure how much more explicit I can be.

Thread 1 is trying to take a lock inside of OpenSSL while thread 2 is performing cleanup inside of an atexit handle set by OpenSSL. This is probably openssl/openssl#6214.

from r2d2.

eloff avatar eloff commented on July 1, 2024

I have already said that thread-4 is not the one that crashed. I'm not sure how much more explicit I can be.

Sorry, I thought the little lightning bolt there meant this was the thread that caused the problem. I don't load coredumps often.

You're right and the stacktrace looks similar to the ones in that openssl thread:

* thread #2, name = 'r2d2-worker-0', stop reason = signal SIGSEGV: invalid address (fault address: 0x18)
  * frame #0: 0x00007ffff7f21d46 libpthread.so.0`__GI___pthread_rwlock_wrlock at pthread_rwlock_common.c:604:7
    frame #1: 0x00007ffff7f21d46 libpthread.so.0`__GI___pthread_rwlock_wrlock(rwlock=0x0000000000000000) at pthread_rwlock_wrlock.c:27
    frame #2: 0x00007ffff7867889 libcrypto.so.1.1`CRYPTO_THREAD_write_lock(lock=<unavailable>) at threads_pthread.c:78
    frame #3: 0x00007ffff7839532 libcrypto.so.1.1`RAND_get_rand_method at rand_lib.c:849
    frame #4: 0x00007ffff78399f9 libcrypto.so.1.1`RAND_status at rand_lib.c:958
    frame #5: 0x00007ffff7f7f545 libpq.so.5`___lldb_unnamed_symbol294$$libpq.so.5 + 37
    frame #6: 0x00007ffff7f5ef2e libpq.so.5`___lldb_unnamed_symbol8$$libpq.so.5 + 142
    frame #7: 0x00007ffff7f76be1 libpq.so.5`___lldb_unnamed_symbol139$$libpq.so.5 + 1873
    frame #8: 0x00007ffff7f6362a libpq.so.5`PQconnectPoll + 3578
    frame #9: 0x00007ffff7f64797 libpq.so.5`___lldb_unnamed_symbol35$$libpq.so.5 + 279
    frame #10: 0x00007ffff7f679e8 libpq.so.5`PQconnectdb + 56
    frame #11: 0x00005555555c78ff acme`diesel::pg::connection::raw::RawConnection::establish::hf66f33a54d350dd8(database_url=(data_ptr = "postgres://acme:1021f155ba05dd9cbfa2a955@/acme?sslmode=disable", length = 62)) at raw.rs:23:39
    frame #12: 0x00005555555cfbf1 acme`_$LT$diesel..pg..connection..PgConnection$u20$as$u20$diesel..connection..Connection$GT$::establish::h55fc3bc2dc70ac2f(database_url=(data_ptr = "postgres://acme:1021f155ba05dd9cbfa2a955@/acme?sslmode=disable", length = 62)) at mod.rs:170:9
    frame #13: 0x000055555557a8aa acme`_$LT$diesel..r2d2..ConnectionManager$LT$T$GT$$u20$as$u20$r2d2..ManageConnection$GT$::connect::hb76e3f16e2530650(self=0x00005555556a0888) at r2d2.rs:192:9
    frame #14: 0x0000555555578e5c acme`r2d2::add_connection::inner::_$u7b$$u7b$closure$u7d$$u7d$::he22fd9af265b9649 at lib.rs:241:24
    frame #15: 0x000055555557a608 acme`scheduled_thread_pool::thunk::Thunk$LT$$LP$$RP$$C$R$GT$::new::_$u7b$$u7b$closure$u7d$$u7d$::hde78cba3c8d5d267((null)=<unavailable>) at thunk.rs:20:35
    frame #16: 0x000055555557a64d acme`_$LT$F$u20$as$u20$scheduled_thread_pool..thunk..Invoke$LT$A$C$R$GT$$GT$::invoke::h6651609b8afd4a0d(self=0x0000555555685680, arg=<unavailable>) at thunk.rs:50:9
    frame #17: 0x00005555555de540 acme`scheduled_thread_pool::thunk::Thunk$LT$A$C$R$GT$::invoke::h0d3f0d3691c28468(self=Thunk<(), ()> @ 0x00007ffff6af7ee0, arg=<unavailable>) at thunk.rs:35:9
    frame #18: 0x00005555555e1007 acme`scheduled_thread_pool::Worker::run_job::h204e5ba542e425b0(self=0x00007ffff6af8940, job=Job @ 0x00007ffff6af83b8) at lib.rs:364:33
    frame #19: 0x00005555555e0a82 acme`scheduled_thread_pool::Worker::run::_$u7b$$u7b$closure$u7d$$u7d$::h9f5de07f20041366 at lib.rs:326:61
    frame #20: 0x00005555555dd750 acme`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h922b39ce782e6da9(self=<unavailable>, _args=<unavailable>) at unwind_safe.rs:271:9
    frame #21: 0x00005555555e6efa acme`std::panicking::try::do_call::h353343075a44a7a2(data="\b\x88\xaf���) at panicking.rs:492:40
    frame #22: 0x00005555555e7b5b acme`__rust_try + 27
    frame #23: 0x00005555555e6ce6 acme`std::panicking::try::hbd1af4a4897eae11(f=<unavailable>) at panicking.rs:456:19
    frame #24: 0x00005555555dcba1 acme`std::panic::catch_unwind::h8487ef104e1ed0b7(f=<unavailable>) at panic.rs:137:14
    frame #25: 0x00005555555e0a0d acme`scheduled_thread_pool::Worker::run::h0b0fddce95686635(self=0x00007ffff6af8940) at lib.rs:326:21
    frame #26: 0x00005555555e08b0 acme`scheduled_thread_pool::Worker::start::_$u7b$$u7b$closure$u7d$$u7d$::h685180c0a62bc588 at lib.rs:320:30
    frame #27: 0x00005555555dcb6e acme`std::sys_common::backtrace::__rust_begin_short_backtrace::h148f92b33cf1c7fb(f={closure_env#0} @ 0x00007ffff6af8968) at backtrace.rs:122:18
    frame #28: 0x00005555555eb49b acme`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h6c24d17a05224da0 at mod.rs:514:17
    frame #29: 0x00005555555dd71f acme`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h329ec1e2ffc93f6a(self=AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<scheduled_thread_pool::{impl#8}::start::{closure_env#0}, ()>> @ 0x00007ffff6af89b0, _args=<unavailable>) at unwind_safe.rs:271:9
    frame #30: 0x00005555555e6f61 acme`std::panicking::try::do_call::h7ad9760e4b97e4ac(data="�,jUUU") at panicking.rs:492:40
    frame #31: 0x00005555555e7b5b acme`__rust_try + 27
    frame #32: 0x00005555555e6d8f acme`std::panicking::try::hf0bb4c1fa863e31d(f=AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<scheduled_thread_pool::{impl#8}::start::{closure_env#0}, ()>> @ 0x00007ffff6af8a80) at panicking.rs:456:19
    frame #33: 0x00005555555dcbcf acme`std::panic::catch_unwind::h86a8b4bcde1746d1(f=AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<scheduled_thread_pool::{impl#8}::start::{closure_env#0}, ()>> @ 0x00007ffff6af8ad0) at panic.rs:137:14
    frame #34: 0x00005555555eb26d acme`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::h2c1a5cd6dd347e00 at mod.rs:513:30
    frame #35: 0x00005555555e482f acme`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h662c6075d9edf8bc((null)=0x00005555556a0990, (null)=<unavailable>) at function.rs:248:5
    frame #36: 0x0000555555623c63 acme`std::sys::unix::thread::Thread::new::thread_start::h62ca48b42d48a8fc [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h49f797984e2121bf at boxed.rs:1940:9
    frame #37: 0x0000555555623c5d acme`std::sys::unix::thread::Thread::new::thread_start::h62ca48b42d48a8fc [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::hfa4f3d0ee6440e0b at boxed.rs:1940
    frame #38: 0x0000555555623c56 acme`std::sys::unix::thread::Thread::new::thread_start::h62ca48b42d48a8fc at thread.rs:108
    frame #39: 0x00007ffff7f1c609 libpthread.so.0`start_thread(arg=<unavailable>) at pthread_create.c:477:8
    frame #40: 0x00007ffff7cec133 libc.so.6`__clone at clone.S:95

from r2d2.

eloff avatar eloff commented on July 1, 2024

In that issue a dev states:

All threads using openssl must be done before main thread exits. I'm not so sure that's unreasonable, or out of step with other things. But it's not going to be addressed in this release. Maybe not ever, just to set expectations accurately.

Is there a way I can join() all the worker threads for the pool r2d2 is using at the end of main() to meet that requirement?

from r2d2.

sfackler avatar sfackler commented on July 1, 2024

I'd recommend disabling the atexit callback instead. One way to do this is to call openssl-sys's init function at the very beginning of your program.

from r2d2.

eloff avatar eloff commented on July 1, 2024

I'd recommend disabling the atexit callback instead. One way to do this is to call openssl-sys's init function at the very beginning of your program.

I'll give that a try and report back.

from r2d2.

eloff avatar eloff commented on July 1, 2024

You called it right. Bringing in the latest openssl crate (0.10.43) for rust as an explicit dependency, and calling openssl::init() at the top of main() fixes the crash.

This recent commit to openssl rust here is the one that fixes it: https://github.com/sfackler/rust-openssl/pull/1649/files#r1040260747, I guess I have openssl 111b on my machine. I checked under the debugger that it is that line being executed that disables the atexit handler.

from r2d2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.