Giter Club home page Giter Club logo

Comments (14)

lesteve avatar lesteve commented on August 20, 2024 1

Maybe this is OSX-specific, I can not reproduce the issue on my Linux machine:

❯ pytest joblib/test/test_parallel.py -k nested_parallel_warning -v
============================================================================================================ test session starts ============================================================================================================
platform linux -- Python 3.9.13, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/lesteve/miniconda3/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.9.13', 'Platform': 'Linux-5.15.0-47-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '6.2.5', 'py': '1.11.0', 'pluggy': '1.0.0'}, 'Plugins': {'anyio': '3.4.0', 'forked': '1.4.0', 'metadata': '2.0.1', 'cov': '3.0.0', 'asyncio': '0.16.0', 'json-report': '1.5.0', 'xdist': '2.5.0'}}
rootdir: /home/lesteve/dev/joblib, configfile: setup.cfg
plugins: anyio-3.4.0, forked-1.4.0, metadata-2.0.1, cov-3.0.0, asyncio-0.16.0, json-report-1.5.0, xdist-2.5.0
collected 388 items / 387 deselected / 1 selected                                                                                                                                                                                           

joblib/test/test_parallel.py::test_nested_parallel_warnings[loky-loky-False] PASSED                                                                                                                                                   [100%]

===================================================================================================== 1 passed, 387 deselected in 2.95s =====================================================================================================
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 140435047106112)>, (<unlocked _thread.lock object at 0x7fb976c96f90>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x7fb976c3f5e0>))]
[DEBUG:MainProcess:ExecutorManagerThread] releasing worker exit lock on 1320771: LokyProcess-1
[DEBUG:MainProcess:ExecutorManagerThread] releasing worker exit lock on 1320772: LokyProcess-2
[DEBUG:MainProcess:ExecutorManagerThread] found 2 processes to stop
[DEBUG:MainProcess:ExecutorManagerThread] sent 2 sentinels to the call queue
[DEBUG:MainProcess:ExecutorManagerThread] closing call_queue
[DEBUG:MainProcess:ExecutorManagerThread] telling queue thread to quit
[DEBUG:MainProcess:ExecutorManagerThread] Queue.join_thread()
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:ExecutorManagerThread] closing result_queue
[DEBUG:MainProcess:ExecutorManagerThread] closing thread_wakeup
[DEBUG:MainProcess:ExecutorManagerThread] joining 2 processes
[DEBUG:MainProcess:ExecutorManagerThread] joining process 1320771: LokyProcess-1
[DEBUG:MainProcess:ExecutorManagerThread] joined process 1320771: LokyProcess-1
[DEBUG:MainProcess:ExecutorManagerThread] joining process 1320772: LokyProcess-2
[DEBUG:MainProcess:ExecutorManagerThread] joined process 1320772: LokyProcess-2
[DEBUG:MainProcess:ExecutorManagerThread] executor management thread clean shutdown of worker processes: [1320771, 1320772]
[DEBUG:MainProcess:MainThread] Successfully deleted /dev/shm/joblib_memmapping_folder_1320763_a2813182f81947a4b81141fcd858f043_8fdec611b3e14f8a8d7cd5e07070fcce
[DEBUG:MainProcess:MainThread] Successfully deleted /dev/shm/joblib_memmapping_folder_1320763_a7f2a043e6f1478f89f82ddcfda4c2a3_bd11a0b27a244a6a858aa8ea46cf3120
[INFO:MainProcess:MainThread] process shutting down
[DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:MainProcess:MainThread] running the remaining "atexit" finalizers

from loky.

lesteve avatar lesteve commented on August 20, 2024 1

My feeling is that it could well be the same issue.

Another variation I tried:
On the main branch, edit joblib/test_parallel.py and uncomment the nested parallel test with loky nested within loky. I run only the loky within loky test, the test pass but I get a timeout at shutdown, something similar to #363 (comment).

❯ pytest joblib/test/test_parallel.py -v -k 'nested_parallel_warning and loky-loky'
==================================================================== test session starts ====================================================================
platform linux -- Python 3.9.13, pytest-7.1.3, pluggy-1.0.0 -- /home/lesteve/miniconda3/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.9.13', 'Platform': 'Linux-5.15.0-47-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '7.1.3', 'py': '1.11.0', 'pluggy': '1.0.0'}, 'Plugins': {'anyio': '3.4.0', 'forked': '1.4.0', 'metadata': '2.0.1', 'cov': '3.0.0', 'asyncio': '0.16.0', 'json-report': '1.5.0', 'xdist': '2.5.0'}}
rootdir: /home/lesteve/dev/joblib, configfile: setup.cfg
plugins: anyio-3.4.0, forked-1.4.0, metadata-2.0.1, cov-3.0.0, asyncio-0.16.0, json-report-1.5.0, xdist-2.5.0
collected 393 items / 392 deselected / 1 selected                                                                                                           

joblib/test/test_parallel.py::test_nested_parallel_warnings[loky-loky-False] PASSED                                                                   [100%]

============================================================= 1 passed, 392 deselected in 3.51s =============================================================
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 140602304767552)>, (<unlocked _thread.lock object at 0x7fe063e7aa80>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x7fe063e7af70>))]
[DEBUG:MainProcess:ExecutorManagerThread] closing call_queue
[DEBUG:MainProcess:ExecutorManagerThread] telling queue thread to quit
[DEBUG:MainProcess:ExecutorManagerThread] Queue.join_thread()
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:ExecutorManagerThread] closing result_queue
[DEBUG:MainProcess:ExecutorManagerThread] closing thread_wakeup
[DEBUG:MainProcess:ExecutorManagerThread] joining processes
Timeout (0:01:00)!
Thread 0x00007fe0866e2640 (most recent call first):
  File "/home/lesteve/dev/joblib/joblib/externals/loky/backend/popen_loky_posix.py", line 56 in poll
  File "/home/lesteve/dev/joblib/joblib/externals/loky/backend/popen_loky_posix.py", line 77 in wait
  File "/home/lesteve/miniconda3/lib/python3.9/multiprocessing/process.py", line 149 in join
  File "/home/lesteve/dev/joblib/joblib/externals/loky/process_executor.py", line 819 in join_executor_internals
  File "/home/lesteve/dev/joblib/joblib/externals/loky/process_executor.py", line 564 in run
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007fe069756640 (most recent call first):
  File "/home/lesteve/miniconda3/lib/python3.9/concurrent/futures/thread.py", line 81 in _worker
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 917 in run
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007fe08ccb3740 (most recent call first):
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 1080 in _wait_for_tstate_lock
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 1060 in join
  File "/home/lesteve/dev/joblib/joblib/externals/loky/process_executor.py", line 193 in _python_exit
  File "/home/lesteve/miniconda3/lib/python3.9/threading.py", line 1447 in _shutdown

On your debug-loky-deadlock-in-joblib branch, the shutdown does not time out, but I get lingering loky processes for some reason. Looking at joblib/joblib@master...ogrisel:joblib:debug-loky-deadlock-in-joblib it only looks like additional debug code, so not sure what is happening.

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

We get the following output if we wait for the timeout set in the pytest_unconfigure step of the conftest.py file:

======================================================= 6 passed, 2 skipped, 1219 deselected in 2.44s =======================================================
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 6187708416)>, (<unlocked _thread.lock object at 0x103493a40>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x103473c40>))]
[DEBUG:MainProcess:ExecutorManagerThread] closing call_queue
[DEBUG:MainProcess:ExecutorManagerThread] telling queue thread to quit
[DEBUG:MainProcess:ExecutorManagerThread] Queue.join_thread()
[DEBUG:MainProcess:ExecutorManagerThread] closing result_queue
[DEBUG:MainProcess:ExecutorManagerThread] closing thread_wakeup
[DEBUG:MainProcess:ExecutorManagerThread] joining processes
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
Timeout (0:00:05)!
Thread 0x0000000170d0f000 (most recent call first):
  File "/Users/ogrisel/code/joblib/joblib/externals/loky/backend/popen_loky_posix.py", line 56 in poll
  File "/Users/ogrisel/code/joblib/joblib/externals/loky/backend/popen_loky_posix.py", line 77 in wait
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py", line 149 in join
  File "/Users/ogrisel/code/joblib/joblib/externals/loky/process_executor.py", line 819 in join_executor_internals
  File "/Users/ogrisel/code/joblib/joblib/externals/loky/process_executor.py", line 564 in run
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00000001006f4580 (most recent call first):
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 1116 in _wait_for_tstate_lock
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 1096 in join
  File "/Users/ogrisel/code/joblib/joblib/externals/loky/process_executor.py", line 193 in _python_exit
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/threading.py", line 1537 in _shutdown

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

Thanks for the feedback, this is interesting.

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

Have you tried with the latest pytest version? I am running 7.1.3.

from loky.

lesteve avatar lesteve commented on August 20, 2024

It works fine with pytest 7.1.3 on my machine too.

from loky.

lesteve avatar lesteve commented on August 20, 2024

One thing I noticed though is that there are loky processes hanging around after running the tests.

❯ ps aux | grep loky | grep -v grep
lesteve   352648  2.2  0.2 515880 48040 pts/0    Sl   10:08   0:00 /home/lesteve/miniconda3/bin/python -c from joblib.externals.loky.backend.resource_tracker import main; main(16, False)
lesteve   352682  7.7  0.7 315368 114188 pts/0   Sl   10:08   0:01 /home/lesteve/miniconda3/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-2:1 --pipe 15
lesteve   352683  7.7  0.7 315368 114144 pts/0   Sl   10:08   0:01 /home/lesteve/miniconda3/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-1:1 --pipe 15
lesteve   352684  7.7  0.7 315368 114204 pts/0   Sl   10:08   0:01 /home/lesteve/miniconda3/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-2:2 --pipe 16
lesteve   352686  7.6  0.7 315368 114024 pts/0   Sl   10:08   0:01 /home/lesteve/miniconda3/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-1:2 --pipe 16

I noticed this because, running the tests multiple times, I got some errors because my memory was completely full.

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

Very interesting. Can you please open a dedicated issue?

It would be great to identify if this is caused by a specific test and see if you can reproduce this with a simple Python script? Either using joblib or using loky directly?

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

I confirm I can reproduce the deadlock and leaked loky workers by isolating a reproducer in a simple script outside of pytest:

from joblib import Parallel, delayed
import logging
import warnings
import multiprocessing as mp
import os


log = mp.util.log_to_stderr(logging.DEBUG)
log.handlers[0].setFormatter(logging.Formatter(
    '[%(levelname)s:%(processName)s:%(threadName)s] %(message)s'))


def nested_parallel_func(inner_n_jobs, backend):
    res = Parallel(n_jobs=inner_n_jobs, backend=backend)(
        delayed(lambda x: x ** 2)(i) for i in range(3))
    print(f"[{os.getpid()}] parallel_func done")
    return res


def _assert_warning_nested(backend, inner_n_jobs):
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter("always")
        nested_parallel_func(backend=backend, inner_n_jobs=inner_n_jobs)

    import faulthandler
    faulthandler.dump_traceback_later(5, exit=True)
    print("_assert_warning_nested ok")
    messages = [w.message for w in records]
    assert not messages
    return True


res = Parallel(n_jobs=2, backend="loky")(
    delayed(_assert_warning_nested)(
        backend="loky", inner_n_jobs=2,
    )
    for _ in range(5)
)

after running this, I get the usual timeout triggered by faulthandler and the following leaked processes:

ps aux | grep loky  | grep -v grep
ogrisel          32667   0.0  0.2 409343808  35280 s016  S     6:21PM   0:00.80 /Users/ogrisel/mambaforge/envs/dev/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-2:2 --pipe 33
ogrisel          32666   0.0  0.2 409353024  35856 s016  S     6:21PM   0:00.95 /Users/ogrisel/mambaforge/envs/dev/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-2:1 --pipe 31
ogrisel          32665   0.0  0.2 409615168  36368 s016  S     6:21PM   0:00.87 /Users/ogrisel/mambaforge/envs/dev/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-1:2 --pipe 33
ogrisel          32664   0.0  0.2 409474880  35824 s016  S     6:21PM   0:00.96 /Users/ogrisel/mambaforge/envs/dev/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-1:1 --pipe 31
ogrisel          32660   0.0  0.2 409617344  35040 s016  S     6:21PM   0:01.10 /Users/ogrisel/mambaforge/envs/dev/bin/python -c from joblib.externals.loky.backend.resource_tracker import main; main(3, False)

which I kill manually with: ps aux | grep loky | grep -v grep | awk '{print $2}' | xargs kill -9. I will investigate further and try to remove the joblib layer.

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

I simplified the reproducer to only use loky. It turns out that the warnings.catch_warning thing is not needed to reproduce the problem. Here is the new version of the reproducer and the full log of a run:

from loky import get_reusable_executor
import logging
import multiprocessing as mp
from time import sleep
import faulthandler


def enable_mp_logging():
    log = mp.util.log_to_stderr(logging.DEBUG)
    log.handlers[0].setFormatter(logging.Formatter(
        '[%(levelname)s:%(processName)s:%(threadName)s] %(message)s'))
    return log


def inner_parallel_func(j):
    sleep(0.1)
    # enable_mp_logging()  # apparently not needed
    mp.util.debug(f"inner_parallel_func({j}) done")
    faulthandler.dump_traceback_later(5, exit=True)
    return j ** 2


def outer_parallel_func(i):
    executor = get_reusable_executor(max_workers=2)
    list(executor.map(inner_parallel_func, range(2)))
    enable_mp_logging()
    mp.util.debug(f"outer_parallel_func({i}) done")
    # faulthandler.dump_traceback_later(10, exit=True)


enable_mp_logging()

executor = get_reusable_executor(max_workers=2)
list(executor.map(outer_parallel_func, range(2)))
mp.util.debug("main done")
[DEBUG:MainProcess:MainThread] Create a executor with max_workers=2.
[DEBUG:MainProcess:MainThread] created semlock with handle 3 and name "None"
[DEBUG:MainProcess:MainThread] launching resource tracker: ['/Users/ogrisel/mambaforge/envs/dev/bin/python', '-c', 'from loky.backend.resource_tracker import main; main(4, False)']
[DEBUG:MainProcess:MainThread] created semlock with handle 9 and name "None"
[DEBUG:MainProcess:MainThread] created semlock with handle 10 and name "None"
[DEBUG:MainProcess:MainThread] created semlock with handle 11 and name "None"
[DEBUG:MainProcess:MainThread] created semlock with handle 14 and name "None"
[DEBUG:MainProcess:MainThread] created semlock with handle 15 and name "None"
[DEBUG:MainProcess:MainThread] ProcessPoolExecutor is setup
[DEBUG:MainProcess:MainThread] created semlock with handle 16 and name "None"
[DEBUG:MainProcess:MainThread] launched python with pid 83371 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-1', '--pipe', '20']
[DEBUG:MainProcess:MainThread] created semlock with handle 19 and name "None"
[DEBUG:MainProcess:MainThread] launched python with pid 83372 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-2', '--pipe', '22']
[DEBUG:MainProcess:MainThread] Adjust process count : {83371: <LokyProcess name='LokyProcess-1' pid=83371 parent=83368 started>, 83372: <LokyProcess name='LokyProcess-2' pid=83372 parent=83368 started>}
[DEBUG:MainProcess:MainThread] _start_executor_manager_thread called
[DEBUG:MainProcess:ExecutorManagerThread] Queue._start_thread()
[DEBUG:MainProcess:ExecutorManagerThread] doing self._thread.start()
[DEBUG:MainProcess:QueueFeederThread] starting thread to feed data to pipe
[DEBUG:MainProcess:ExecutorManagerThread] ... done self._thread.start()
[DEBUG:LokyProcess-1:MainThread] recreated blocker with handle 9 and name "/loky-83368-jhh_7z62"
[DEBUG:LokyProcess-1:MainThread] recreated blocker with handle 10 and name "/loky-83368-j1ci5a0d"
[DEBUG:LokyProcess-1:MainThread] recreated blocker with handle 11 and name "/loky-83368-li11tprj"
[DEBUG:LokyProcess-1:MainThread] recreated blocker with handle 14 and name "/loky-83368-16mdkxj0"
[DEBUG:LokyProcess-1:MainThread] recreated blocker with handle 15 and name "/loky-83368-g2tt7p2i"
[DEBUG:LokyProcess-1:MainThread] recreated blocker with handle 3 and name "/loky-83368-mnm7ukax"
[DEBUG:LokyProcess-1:MainThread] recreated blocker with handle 16 and name "/loky-83368-lez89s5g"
[INFO:LokyProcess-1:MainThread] child process calling self.run()
[DEBUG:LokyProcess-1:MainThread] Worker started with timeout=10
[DEBUG:LokyProcess-1:MainThread] Create a executor with max_workers=2.
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 14 and name "None"
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 21 and name "None"
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 22 and name "None"
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 23 and name "None"
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 26 and name "None"
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 27 and name "None"
[DEBUG:LokyProcess-1:MainThread] ProcessPoolExecutor is setup
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 28 and name "None"
[DEBUG:LokyProcess-2:MainThread] recreated blocker with handle 9 and name "/loky-83368-jhh_7z62"
[DEBUG:LokyProcess-2:MainThread] recreated blocker with handle 10 and name "/loky-83368-j1ci5a0d"
[DEBUG:LokyProcess-2:MainThread] recreated blocker with handle 11 and name "/loky-83368-li11tprj"
[DEBUG:LokyProcess-2:MainThread] recreated blocker with handle 14 and name "/loky-83368-16mdkxj0"
[DEBUG:LokyProcess-2:MainThread] recreated blocker with handle 15 and name "/loky-83368-g2tt7p2i"
[DEBUG:LokyProcess-2:MainThread] recreated blocker with handle 3 and name "/loky-83368-mnm7ukax"
[DEBUG:LokyProcess-2:MainThread] recreated blocker with handle 19 and name "/loky-83368-eal6_mul"
[INFO:LokyProcess-2:MainThread] child process calling self.run()
[DEBUG:LokyProcess-2:MainThread] Worker started with timeout=10
[DEBUG:LokyProcess-2:MainThread] Create a executor with max_workers=2.
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 14 and name "None"
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 20 and name "None"
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 22 and name "None"
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 23 and name "None"
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 26 and name "None"
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 27 and name "None"
[DEBUG:LokyProcess-2:MainThread] ProcessPoolExecutor is setup
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 28 and name "None"
[DEBUG:LokyProcess-1:MainThread] launched python with pid 83373 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-1:1', '--pipe', '31']
[DEBUG:LokyProcess-1:MainThread] created semlock with handle 30 and name "None"
[DEBUG:LokyProcess-2:MainThread] launched python with pid 83374 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-2:1', '--pipe', '31']
[DEBUG:LokyProcess-2:MainThread] created semlock with handle 30 and name "None"
[DEBUG:LokyProcess-1:MainThread] launched python with pid 83375 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-1:2', '--pipe', '33']
[DEBUG:LokyProcess-1:MainThread] Adjust process count : {83373: <LokyProcess name='LokyProcess-1:1' pid=83373 parent=83371 started>, 83375: <LokyProcess name='LokyProcess-1:2' pid=83375 parent=83371 started>}
[DEBUG:LokyProcess-1:MainThread] _start_executor_manager_thread called
[DEBUG:LokyProcess-2:MainThread] launched python with pid 83376 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-2:2', '--pipe', '33']
[DEBUG:LokyProcess-2:MainThread] Adjust process count : {83374: <LokyProcess name='LokyProcess-2:1' pid=83374 parent=83372 started>, 83376: <LokyProcess name='LokyProcess-2:2' pid=83376 parent=83372 started>}
[DEBUG:LokyProcess-2:MainThread] _start_executor_manager_thread called
[DEBUG:LokyProcess-1:ExecutorManagerThread] Queue._start_thread()
[DEBUG:LokyProcess-1:ExecutorManagerThread] doing self._thread.start()
[DEBUG:LokyProcess-1:QueueFeederThread] starting thread to feed data to pipe
[DEBUG:LokyProcess-1:ExecutorManagerThread] ... done self._thread.start()
[DEBUG:LokyProcess-2:ExecutorManagerThread] Queue._start_thread()
[DEBUG:LokyProcess-2:ExecutorManagerThread] doing self._thread.start()
[DEBUG:LokyProcess-2:QueueFeederThread] starting thread to feed data to pipe
[DEBUG:LokyProcess-2:ExecutorManagerThread] ... done self._thread.start()
[DEBUG:LokyProcess-1:1:MainThread] recreated blocker with handle 21 and name "/loky-83371-8myhkx08"
[DEBUG:LokyProcess-2:1:MainThread] recreated blocker with handle 20 and name "/loky-83372-iepticds"
[DEBUG:LokyProcess-2:1:MainThread] recreated blocker with handle 22 and name "/loky-83372-kjw1qcbs"
[DEBUG:LokyProcess-1:1:MainThread] recreated blocker with handle 22 and name "/loky-83371-8z71_hbd"
[DEBUG:LokyProcess-2:2:MainThread] recreated blocker with handle 20 and name "/loky-83372-iepticds"
[DEBUG:LokyProcess-2:1:MainThread] recreated blocker with handle 23 and name "/loky-83372-b_q9howv"
[DEBUG:LokyProcess-1:1:MainThread] recreated blocker with handle 23 and name "/loky-83371-yqoeayvx"
[DEBUG:LokyProcess-2:2:MainThread] recreated blocker with handle 22 and name "/loky-83372-kjw1qcbs"
[DEBUG:LokyProcess-2:2:MainThread] recreated blocker with handle 23 and name "/loky-83372-b_q9howv"
[DEBUG:LokyProcess-1:2:MainThread] recreated blocker with handle 21 and name "/loky-83371-8myhkx08"
[DEBUG:LokyProcess-2:1:MainThread] recreated blocker with handle 26 and name "/loky-83372-nk_f0hpi"
[DEBUG:LokyProcess-1:1:MainThread] recreated blocker with handle 26 and name "/loky-83371-3q7bgbd3"
[DEBUG:LokyProcess-2:1:MainThread] recreated blocker with handle 27 and name "/loky-83372-yj33axj9"
[DEBUG:LokyProcess-1:1:MainThread] recreated blocker with handle 27 and name "/loky-83371-_8uen2gu"
[DEBUG:LokyProcess-1:2:MainThread] recreated blocker with handle 22 and name "/loky-83371-8z71_hbd"
[DEBUG:LokyProcess-2:2:MainThread] recreated blocker with handle 26 and name "/loky-83372-nk_f0hpi"
[DEBUG:LokyProcess-2:1:MainThread] recreated blocker with handle 14 and name "/loky-83372-kvw7ug62"
[DEBUG:LokyProcess-1:1:MainThread] recreated blocker with handle 14 and name "/loky-83371-irmx81f5"
[DEBUG:LokyProcess-2:2:MainThread] recreated blocker with handle 27 and name "/loky-83372-yj33axj9"
[DEBUG:LokyProcess-2:1:MainThread] recreated blocker with handle 28 and name "/loky-83372-91m3obxf"
[DEBUG:LokyProcess-1:2:MainThread] recreated blocker with handle 23 and name "/loky-83371-yqoeayvx"
[DEBUG:LokyProcess-1:1:MainThread] recreated blocker with handle 28 and name "/loky-83371-1js39_nr"
[DEBUG:LokyProcess-2:2:MainThread] recreated blocker with handle 14 and name "/loky-83372-kvw7ug62"
[DEBUG:LokyProcess-2:2:MainThread] recreated blocker with handle 30 and name "/loky-83372-ttndnrk9"
[DEBUG:LokyProcess-1:2:MainThread] recreated blocker with handle 26 and name "/loky-83371-3q7bgbd3"
[DEBUG:LokyProcess-1:2:MainThread] recreated blocker with handle 27 and name "/loky-83371-_8uen2gu"
[INFO:LokyProcess-2:1:MainThread] child process calling self.run()
[INFO:LokyProcess-1:1:MainThread] child process calling self.run()
[DEBUG:LokyProcess-2:1:MainThread] Worker started with timeout=10
[DEBUG:LokyProcess-1:2:MainThread] recreated blocker with handle 14 and name "/loky-83371-irmx81f5"
[DEBUG:LokyProcess-1:1:MainThread] Worker started with timeout=10
[DEBUG:LokyProcess-1:2:MainThread] recreated blocker with handle 30 and name "/loky-83371-ddkccs5z"
[INFO:LokyProcess-2:2:MainThread] child process calling self.run()
[DEBUG:LokyProcess-2:2:MainThread] Worker started with timeout=10
[INFO:LokyProcess-1:2:MainThread] child process calling self.run()
[DEBUG:LokyProcess-1:2:MainThread] Worker started with timeout=10
[DEBUG:LokyProcess-1:1:MainThread] inner_parallel_func(0) done
[DEBUG:LokyProcess-2:1:MainThread] inner_parallel_func(0) done
[DEBUG:LokyProcess-1:2:MainThread] inner_parallel_func(1) done
[DEBUG:LokyProcess-2:2:MainThread] inner_parallel_func(1) done
[DEBUG:LokyProcess-1:MainThread] outer_parallel_func(0) done
[DEBUG:LokyProcess-2:MainThread] outer_parallel_func(1) done
[DEBUG/LokyProcess-1] outer_parallel_func(0) done
[DEBUG/LokyProcess-2] outer_parallel_func(1) done
[DEBUG:MainProcess:MainThread] main done
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 6107000832)>, (<unlocked _thread.lock object at 0x11c345840>, <loky.process_executor._ThreadWakeup object at 0x11c327670>))]
[DEBUG:LokyProcess-2:1:MainThread] psutil return memory size: 18776064
[DEBUG:LokyProcess-1:2:MainThread] psutil return memory size: 18169856
[DEBUG:MainProcess:ExecutorManagerThread] closing call_queue
[DEBUG:MainProcess:ExecutorManagerThread] telling queue thread to quit
[DEBUG:MainProcess:ExecutorManagerThread] Queue.join_thread()
[DEBUG:MainProcess:ExecutorManagerThread] closing result_queue
[DEBUG:MainProcess:ExecutorManagerThread] closing thread_wakeup
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:ExecutorManagerThread] joining processes
[DEBUG:LokyProcess-2:2:MainThread] psutil return memory size: 18087936
[DEBUG:LokyProcess-1:1:MainThread] psutil return memory size: 18235392
[DEBUG:LokyProcess-1:MainThread] psutil return memory size: 18661376
[DEBUG/LokyProcess-1] psutil return memory size: 18661376
[INFO:LokyProcess-1:MainThread] Shutting down worker on sentinel
[INFO/LokyProcess-1] Shutting down worker on sentinel
[DEBUG:LokyProcess-1:MainThread] Exited cleanly
[DEBUG/LokyProcess-1] Exited cleanly
[INFO:LokyProcess-1:MainThread] process shutting down
[INFO/LokyProcess-1] process shutting down
[DEBUG:LokyProcess-1:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG/LokyProcess-1] running all "atexit" finalizers with priority >= 0
[DEBUG:LokyProcess-1:MainThread] telling queue thread to quit
[DEBUG/LokyProcess-1] telling queue thread to quit
[DEBUG:LokyProcess-1:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG/LokyProcess-1] feeder thread got sentinel -- exiting
[INFO:LokyProcess-1:MainThread] calling join() for process LokyProcess-1:2
[INFO/LokyProcess-1] calling join() for process LokyProcess-1:2
[DEBUG:LokyProcess-2:MainThread] psutil return memory size: 18087936
[DEBUG/LokyProcess-2] psutil return memory size: 18087936
[INFO:LokyProcess-2:MainThread] Shutting down worker on sentinel
[INFO/LokyProcess-2] Shutting down worker on sentinel
[DEBUG:LokyProcess-2:MainThread] Exited cleanly
[DEBUG/LokyProcess-2] Exited cleanly
[INFO:LokyProcess-2:MainThread] process shutting down
[INFO/LokyProcess-2] process shutting down
[DEBUG:LokyProcess-2:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG/LokyProcess-2] running all "atexit" finalizers with priority >= 0
[DEBUG:LokyProcess-2:MainThread] telling queue thread to quit
[DEBUG/LokyProcess-2] telling queue thread to quit
[DEBUG:LokyProcess-2:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG/LokyProcess-2] feeder thread got sentinel -- exiting
[INFO:LokyProcess-2:MainThread] calling join() for process LokyProcess-2:2
[INFO/LokyProcess-2] calling join() for process LokyProcess-2:2
Timeout (0:00:05)!
Thread 0x0000000100dc0580 (most recent call first):
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/selectors.py", line 416 in select
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/connection.py", line 936 in wait
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/connection.py", line 429 in _poll
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/connection.py", line 262 in poll
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/queues.py", line 113 in get
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/process_executor.py", line 390 in _process_worker
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/backend/popen_loky_posix.py", line 170 in <module>
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py", line 86 in _run_code
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py", line 196 in _run_module_as_main
Timeout (0:00:05)!
Timeout (0:00:05)!
Thread 0xThread 0x00000001011e0580000000010507c580 (most recent call first):
 (most recent call first):
  File   File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/queues.py"", line 108/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/selectors.py in "get, line 
416 in select
  File   File ""/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/process_executor.py/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/connection.py"", line , line 390936 in  in _process_workerwait

  File   File ""/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/connection.py"", line , line 108429 in  in run
  File _poll
  File ""/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/connection.py"", line , line 314262 in  in poll
_bootstrap
  File   File ""/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/queues.py/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/backend/popen_loky_posix.py"", line , line 113170 in  in get<module>

  File   File ""/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/process_executor.py/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py"", line , line 39086 in  in _process_worker_run_code

  File   File ""/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py"", line , line 108196 in  in _run_module_as_mainrun

  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/backend/popen_loky_posix.py", line 170 in <module>
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py", line 86 in _run_code
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py", line 196 in _run_module_as_main
Timeout (0:00:05)!
Thread 0x0000000102828580 (most recent call first):
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/queues.py", line 108 in get
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/process_executor.py", line 390 in _process_worker
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/site-packages/loky/backend/popen_loky_posix.py", line 170 in <module>
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py", line 86 in _run_code
  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.10/runpy.py", line 196 in _run_module_as_main
[DEBUG:LokyProcess-1:ExecutorManagerThread] A worker unexpectedly terminated. Workers that might have caused the breakage: {'LokyProcess-1:1': 1}
[DEBUG/LokyProcess-1] A worker unexpectedly terminated. Workers that might have caused the breakage: {'LokyProcess-1:1': 1}
[INFO:LokyProcess-2:MainThread] calling join() for process LokyProcess-2:1
[INFO/LokyProcess-2] calling join() for process LokyProcess-2:1
[INFO:LokyProcess-1:MainThread] calling join() for process LokyProcess-1:1
[DEBUG:LokyProcess-2:MainThread] running the remaining "atexit" finalizers
[INFO/LokyProcess-1] calling join() for process LokyProcess-1:1
[DEBUG/LokyProcess-2] running the remaining "atexit" finalizers
[DEBUG:LokyProcess-1:MainThread] running the remaining "atexit" finalizers
[DEBUG/LokyProcess-1] running the remaining "atexit" finalizers
[DEBUG:LokyProcess-1:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 6108229632)>, (<unlocked _thread.lock object at 0x105b0f5c0>, <loky.process_executor._ThreadWakeup object at 0x105ae3370>))]
[DEBUG:LokyProcess-2:MainThread] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 6177566720)>, (<unlocked _thread.lock object at 0x1017061c0>, <loky.process_executor._ThreadWakeup object at 0x1018df370>))]
[DEBUG/LokyProcess-1] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 6108229632)>, (<unlocked _thread.lock object at 0x105b0f5c0>, <loky.process_executor._ThreadWakeup object at 0x105ae3370>))]
[DEBUG/LokyProcess-2] Interpreter shutting down. Waking up executor_manager_thread [(<_ExecutorManagerThread(ExecutorManagerThread, started 6177566720)>, (<unlocked _thread.lock object at 0x1017061c0>, <loky.process_executor._ThreadWakeup object at 0x1018df370>))]
[DEBUG:LokyProcess-1:ExecutorManagerThread] terminate process LokyProcess-1:2, reason: broken executor
[DEBUG/LokyProcess-1] terminate process LokyProcess-1:2, reason: broken executor
[DEBUG:LokyProcess-1:ExecutorManagerThread] terminate process LokyProcess-1:1, reason: broken executor
[DEBUG/LokyProcess-1] terminate process LokyProcess-1:1, reason: broken executor
[DEBUG:LokyProcess-1:ExecutorManagerThread] closing call_queue
[DEBUG/LokyProcess-1] closing call_queue
[DEBUG:LokyProcess-1:ExecutorManagerThread] Queue.join_thread()
[DEBUG/LokyProcess-1] Queue.join_thread()
[DEBUG:LokyProcess-1:ExecutorManagerThread] closing result_queue
[DEBUG/LokyProcess-1] closing result_queue
[DEBUG:LokyProcess-1:ExecutorManagerThread] closing thread_wakeup
[DEBUG/LokyProcess-1] closing thread_wakeup
[DEBUG:LokyProcess-1:ExecutorManagerThread] joining processes
[DEBUG/LokyProcess-1] joining processes
[DEBUG:LokyProcess-1:ExecutorManagerThread] executor management thread clean shutdown of worker processes: []
[DEBUG/LokyProcess-1] executor management thread clean shutdown of worker processes: []
[INFO:LokyProcess-1:MainThread] process exiting with exitcode 0
[INFO/LokyProcess-1] process exiting with exitcode 0
[DEBUG:LokyProcess-2:ExecutorManagerThread] A worker unexpectedly terminated. Workers that might have caused the breakage: {'LokyProcess-2:1': 1}
[DEBUG/LokyProcess-2] A worker unexpectedly terminated. Workers that might have caused the breakage: {'LokyProcess-2:1': 1}
[DEBUG:LokyProcess-2:ExecutorManagerThread] terminate process LokyProcess-2:2, reason: broken executor
[DEBUG/LokyProcess-2] terminate process LokyProcess-2:2, reason: broken executor
[DEBUG:LokyProcess-2:ExecutorManagerThread] terminate process LokyProcess-2:1, reason: broken executor
[DEBUG/LokyProcess-2] terminate process LokyProcess-2:1, reason: broken executor
[DEBUG:LokyProcess-2:ExecutorManagerThread] closing call_queue
[DEBUG/LokyProcess-2] closing call_queue
[DEBUG:LokyProcess-2:ExecutorManagerThread] Queue.join_thread()
[DEBUG/LokyProcess-2] Queue.join_thread()
[DEBUG:LokyProcess-2:ExecutorManagerThread] closing result_queue
[DEBUG/LokyProcess-2] closing result_queue
[DEBUG:LokyProcess-2:ExecutorManagerThread] closing thread_wakeup
[DEBUG/LokyProcess-2] closing thread_wakeup
[DEBUG:LokyProcess-2:ExecutorManagerThread] joining processes
[DEBUG/LokyProcess-2] joining processes
[DEBUG:LokyProcess-2:ExecutorManagerThread] executor management thread clean shutdown of worker processes: []
[DEBUG/LokyProcess-2] executor management thread clean shutdown of worker processes: []
[INFO:LokyProcess-2:MainThread] process exiting with exitcode 0
[INFO/LokyProcess-2] process exiting with exitcode 0
[DEBUG:MainProcess:ExecutorManagerThread] executor management thread clean shutdown of worker processes: [83371, 83372]
[INFO:MainProcess:MainThread] process shutting down
[DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:MainProcess:MainThread] running the remaining "atexit" finalizers

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

My analysis is that we need to recursively trigger the interpreter shutdown events in the level 1 workers prior to so as to send the sentinels in time to the level 2 workers otherwise the call [INFO/LokyProcess-2] calling join() for process LokyProcess-2:2 is bound to deadlock.

from loky.

lesteve avatar lesteve commented on August 20, 2024

I can confirm, I see the same behaviour as you in both the joblib and loky snippet above.

from loky.

ogrisel avatar ogrisel commented on August 20, 2024

This fixes the problem:

diff --git a/loky/process_executor.py b/loky/process_executor.py
index 63eb234..736d0d5 100644
--- a/loky/process_executor.py
+++ b/loky/process_executor.py
@@ -412,6 +412,7 @@ def _process_worker(call_queue, result_queue, initializer, initargs,
             # Notify queue management thread about worker shutdown
             result_queue.put(pid)
             is_clean = worker_exit_lock.acquire(True, timeout=30)
+            _python_exit()
             if is_clean:
                 mp.util.debug('Exited cleanly')
             else:

I now need to turn my reproducer into a non-regression test, do the PR, make sure it passes on the CI on loky, sync it with joblib in a branch, make sure the joblib tests are stable with that fix, make a new release of loky, vendor loky into joblib and make a release of joblib :)

from loky.

lesteve avatar lesteve commented on August 20, 2024

Sounds like fun, let me know if I can help!

from loky.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.