Problem Description
A database cluster built with Zalando Spilo (https://github.com/zalando/spilo) and postgres-operator (https://github.com/zalando/postgres-operator) uses patroni (https://github.com/zalando/patroni) for cluster state management. The cluster runs on Kubernetes.
When Instana python instrumentation is enabled, the patroni python process inside each 'postgresql' pod hangs a few seconds after startup.
The command line used to execute patroni via Spilo is available at https://github.com/zalando/spilo/blob/master/postgres-appliance/runit/patroni/run#L29 (run via runsv by spilo). When the last exec in this script is changed to include INSTANA_DISABLE_AUTO_INSTR=true after env, patroni starts and runs normally.
This workaround was found only after spending some time trying to figure out what was wrong. This traceback (which fortunately was printed due to a mistake while manually trying to print a traceback from /usr/local/lib/python3.6/dist-packages/google/cloud/storage/batch.py) provided the hint about what is causing patroni to hang:
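For anyone who launches patroni from Python rather than from the runit script, the same workaround can be sketched as below. This is only an illustration: `run_with_instana_disabled` is a hypothetical helper name, and the only thing taken from this report is the `INSTANA_DISABLE_AUTO_INSTR=true` variable itself.

```python
import os
import subprocess
import sys


def run_with_instana_disabled(argv):
    """Run `argv` with INSTANA_DISABLE_AUTO_INSTR=true added to the environment.

    Rough equivalent of `env INSTANA_DISABLE_AUTO_INSTR=true <argv...>`.
    """
    env = dict(os.environ, INSTANA_DISABLE_AUTO_INSTR="true")
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)


if __name__ == "__main__":
    # Stand-in command; in the real setup this would be the patroni command line.
    probe = "import os; print(os.environ['INSTANA_DISABLE_AUTO_INSTR'])"
    print(run_with_instana_disabled([sys.executable, "-c", probe]).stdout.decode())
```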
File "<string>", line 1, in <module>
File "/tmp/.instana/python/instana/__init__.py", line 206, in <module>
boot_agent()
File "/tmp/.instana/python/instana/__init__.py", line 155, in boot_agent
from .instrumentation.google.cloud import storage
File "/tmp/.instana/python/instana/instrumentation/google/cloud/storage.py", line 14, in <module>
from google.cloud import storage
File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/__init__.py", line 35, in <module>
from google.cloud.storage.batch import Batch
File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/batch.py", line 30, in <module>
traceback.print_stack(file=sys.stdout)
NameError: name 'sys' is not defined
What pointed towards 'batch.py' was this traceback from Patroni when it got stuck (acquired via kill -SIGABRT when the process was started with PYTHONFAULTHANDLER=true):
Current thread 0x00007fbe8a4f0740 (most recent call first):
File "/usr/lib/python3.6/re.py", line 182 in search
File "/usr/lib/python3.6/ctypes/util.py", line 283 in _findSoname_ldconfig
File "/usr/lib/python3.6/ctypes/util.py", line 313 in find_library
File "/usr/lib/python3/dist-packages/asn1crypto/_perf/_big_num_ctypes.py", line 35 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/lib/python3/dist-packages/asn1crypto/_int.py", line 56 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/lib/python3/dist-packages/asn1crypto/_elliptic_curve.py", line 51 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/lib/python3/dist-packages/asn1crypto/keys.py", line 22 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/lib/python3/dist-packages/cryptography/x509/extensions.py", line 13 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/lib/python3/dist-packages/cryptography/x509/base.py", line 16 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/lib/python3/dist-packages/cryptography/x509/__init__.py", line 8 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/local/lib/python3.6/dist-packages/google/auth/crypt/_cryptography_rsa.py", line 27 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/dist-packages/google/auth/crypt/rsa.py", line 20 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/dist-packages/google/auth/crypt/__init__.py", line 43 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/dist-packages/google/auth/_service_account_info.py", line 22 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/dist-packages/google/oauth2/service_account.py", line 77 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/dist-packages/google/auth/transport/requests.py", line 48 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/local/lib/python3.6/dist-packages/google/cloud/_helpers/__init__.py", line 31 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/batch.py", line 28 in <module>
...
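The technique used to get the dump above can be tried in isolation. The following is a minimal sketch, not the exact commands I ran: it starts a child Python process with PYTHONFAULTHANDLER set, lets it block in a sleep, sends SIGABRT, and collects what the child writes to stderr. The function name and the sleeping child are stand-ins for the real patroni process.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a hung process: announce readiness, then block.
CHILD_CODE = "import time\nprint('ready', flush=True)\ntime.sleep(60)\n"


def dump_stack_of_blocked_child(timeout=10.0):
    """Abort a blocked child and return (returncode, faulthandler output)."""
    env = dict(os.environ, PYTHONFAULTHANDLER="true")
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    child.stdout.readline()            # wait until the child is up and running
    child.send_signal(signal.SIGABRT)  # same effect as `kill -SIGABRT <pid>`
    _, err = child.communicate(timeout=timeout)
    return child.returncode, err.decode()


if __name__ == "__main__":
    returncode, dump = dump_stack_of_blocked_child()
    print(returncode)
    print(dump)
```

The dump lists frames most recent call first, which is why the innermost frame (re.py in my case) appears at the top and batch.py near the bottom.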
While attempting to figure out what was wrong, based on the stack trace above, I noticed that uninstalling 'google-cloud-storage' fixed the problem. The curious part was that I could not find anything in patroni that would end up importing 'google-cloud-storage'. While patroni supports WAL backups to Google via WAL-E, and hence installs the google-cloud-storage python module, our configuration did not use it. It turned out the problem was not that patroni was importing it, but that instana was, at https://github.com/instana/python-sensor/blob/master/instana/instrumentation/google/cloud/storage.py#L14, and as a consequence the whole patroni process hangs.
Note that when strace is attached to the patroni pid (strace -f -v -s 128 -p <pid>), the hang does not occur and patroni runs fine. If there are other ways I could provide further debugging details, I will be glad to assist.
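In hindsight, a less intrusive way to find out who imports a module than editing batch.py by hand would be an import hook. The sketch below is an assumption on my part, not something that was run against patroni: `ImportTracer` and `install_tracer` are hypothetical names, and the hook would have to be installed before the instrumentation is loaded for it to see the import.

```python
import importlib.abc
import sys
import traceback


class ImportTracer(importlib.abc.MetaPathFinder):
    """Record the call stack each time the import system looks up `watched`."""

    def __init__(self, watched):
        self.watched = watched
        self.stacks = []

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.watched:
            # Capture who triggered the import (this frame is included too).
            self.stacks.append("".join(traceback.format_stack()))
        return None  # never load anything here; defer to the normal finders


def install_tracer(watched):
    """Put a tracer first on sys.meta_path and return it."""
    tracer = ImportTracer(watched)
    sys.meta_path.insert(0, tracer)
    return tracer


if __name__ == "__main__":
    # Demo with a stdlib module; the module of interest here would be
    # "google.cloud.storage".
    sys.modules.pop("colorsys", None)
    tracer = install_tracer("colorsys")
    importlib.import_module("colorsys")
    print(tracer.stacks[0])
```

The finder is only consulted for modules that are not yet in sys.modules, so it records the first import of the watched module and not later re-imports.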
root@host# uname -a
Linux <redacted> 5.4.190-107.353.amzn2.x86_64 #1 SMP Wed Apr 27 21:16:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@host# cat /etc/issue
Ubuntu 18.04.6 LTS \n \l
Minimal, Complete, Verifiable, Example
Set up Kubernetes, install zalando-spilo using postgres-operator, enable instana instrumentation and attempt to launch a postgres cluster. After a few seconds (< 30), the patroni API at localhost:8008 (inside any of the postgresql cluster containers) stops answering requests.
(Sorry, I was not able to replicate this in a minimal or complete way.)
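To detect the moment the API stops answering, a small probe with a timeout can be run inside the container. This is a sketch under assumptions: `api_responds` is a hypothetical helper, and it probes the root URL of the port mentioned above, treating any HTTP response (even an error status) as "alive".

```python
import time
import urllib.error
import urllib.request


def api_responds(url="http://localhost:8008/", timeout=3.0):
    """Return True if the endpoint sends any HTTP response within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # an error status still means the server is answering
    except OSError:
        return False  # refused, reset, or timed out (URLError is an OSError)


if __name__ == "__main__":
    # Poll once per second for a minute and report when the API goes silent.
    for second in range(60):
        if not api_responds():
            print("no answer after about %d s" % second)
            break
        time.sleep(1)
```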
Python Version
Python 3.6
Python Modules
`apt list '*python*' --installed`
libpython3-stdlib/now 3.6.7-1~18.04 amd64 [installed,local]
libpython3.6/now 3.6.9-1~18.04ubuntu1.7 amd64 [installed,local]
libpython3.6-minimal/now 3.6.9-1~18.04ubuntu1.7 amd64 [installed,local]
libpython3.6-stdlib/now 3.6.9-1~18.04ubuntu1.7 amd64 [installed,local]
postgresql-plpython3-10/now 10.21-1.pgdg18.04+1 amd64 [installed,local]
postgresql-plpython3-11/now 11.16-1.pgdg18.04+1 amd64 [installed,local]
postgresql-plpython3-12/now 12.11-1.pgdg18.04+1 amd64 [installed,local]
postgresql-plpython3-13/now 13.7-1.pgdg18.04+1 amd64 [installed,local]
postgresql-plpython3-14/now 14.4-1.pgdg18.04+1 amd64 [installed,local]
postgresql-plpython3-9.6/now 9.6.24-1.pgdg18.04+1 amd64 [installed,local]
python-babel-localedata/now 2.4.0+dfsg.1-2ubuntu1.1 all [installed,local]
python3/now 3.6.7-1~18.04 amd64 [installed,local]
python3-asn1crypto/now 0.24.0-1 all [installed,local]
python3-babel/now 2.4.0+dfsg.1-2ubuntu1.1 all [installed,local]
python3-blinker/now 1.4+dfsg1-0.1 all [installed,local]
python3-boto/now 2.44.0-1ubuntu2.18.04.1 all [installed,local]
python3-cachetools/now 2.0.0-2 all [installed,local]
python3-cdiff/now 1.0-1 all [installed,local]
python3-certifi/now 2018.1.18-2 all [installed,local]
python3-cffi/now 1.11.5-1 all [installed,local]
python3-cffi-backend/now 1.11.5-1 amd64 [installed,local]
python3-chardet/now 3.0.4-1 all [installed,local]
python3-click/now 6.7-3 all [installed,local]
python3-colorama/now 0.3.7-1 all [installed,local]
python3-consul/now 0.7.1-1 all [installed,local]
python3-cryptography/now 2.1.4-1ubuntu1.4 amd64 [installed,local]
python3-dateutil/now 2.6.1-1 all [installed,local]
python3-debtcollector/now 1.13.0-0ubuntu1 all [installed,local]
python3-dnspython/now 1.15.0-1 all [installed,local]
python3-docutils/now 0.14+dfsg-3 all [installed,local]
python3-etcd/now 0.4.3-2 all [installed,local]
python3-funcsigs/now 1.0.2-4 all [installed,local]
python3-gevent/now 1.2.2-2 amd64 [installed,local]
python3-greenlet/now 0.4.12-2 amd64 [installed,local]
python3-idna/now 2.6-1 all [installed,local]
python3-iso8601/now 0.1.11-1 all [installed,local]
python3-jwt/now 1.5.3+ds1-1 all [installed,local]
python3-kazoo/now 2.2.1-1ubuntu1 all [installed,local]
python3-keyring/now 10.6.0-1 all [installed,local]
python3-keystoneauth1/now 3.4.0-0ubuntu1 all [installed,local]
python3-keystoneclient/now 1:3.15.0-0ubuntu1 all [installed,local]
python3-lxml/now 4.2.1-1ubuntu0.6 amd64 [installed,local]
python3-meld3/now 1.0.2-2 amd64 [installed,local]
python3-minimal/now 3.6.7-1~18.04 amd64 [installed,local]
python3-monotonic/now 1.1-2 all [installed,local]
python3-msgpack/now 0.5.6-1 amd64 [installed,local]
python3-netaddr/now 0.7.19-1 all [installed,local]
python3-netifaces/now 0.10.4-0.1build4 amd64 [installed,local]
python3-oauthlib/now 2.0.6-1 all [installed,local]
python3-oslo.config/now 1:5.2.0-0ubuntu1 all [installed,local]
python3-oslo.i18n/now 3.19.0-0ubuntu1 all [installed,local]
python3-oslo.serialization/now 2.24.0-0ubuntu2 all [installed,local]
python3-oslo.utils/now 3.35.0-0ubuntu1.1 all [installed,local]
python3-pbr/now 3.1.1-3ubuntu3 all [installed,local]
python3-pkg-resources/now 39.0.1-2 all [installed,local]
python3-ply/now 3.11-1 all [installed,local]
python3-positional/now 1.1.1-3 all [installed,local]
python3-prettytable/now 0.7.2-3 all [installed,local]
python3-psutil/now 5.4.2-1ubuntu0.1 amd64 [installed,local]
python3-psycopg2/now 2.8.6-2~pgdg18.04+1 amd64 [installed,local]
python3-pyasn1/now 0.4.2-3 all [installed,local]
python3-pyasn1-modules/now 0.2.1-0.2 all [installed,local]
python3-pycparser/now 2.18-2 all [installed,local]
python3-pyparsing/now 2.2.0+dfsg1-2 all [installed,local]
python3-pystache/now 0.5.4-6 all [installed,local]
python3-requests/now 2.18.4-2ubuntu0.1 all [installed,local]
python3-rfc3986/now 0.3.1-2 all [installed,local]
python3-rsa/now 3.4.2-1 all [installed,local]
python3-six/now 1.11.0-2 all [installed,local]
python3-stevedore/now 1:1.28.0-0ubuntu1 all [installed,local]
python3-swiftclient/now 1:3.5.0-0ubuntu1 all [installed,local]
python3-tz/now 2018.3-2 all [installed,local]
python3-urllib3/now 1.22-1ubuntu0.18.04.2 all [installed,local]
python3-wrapt/now 1.9.0-3 amd64 [installed,local]
python3-yaml/now 3.12-1build2 amd64 [installed,local]
python3.6/now 3.6.9-1~18.04ubuntu1.7 amd64 [installed,local]
python3.6-minimal/now 3.6.9-1~18.04ubuntu1.7 amd64 [installed,local]
`pip3 list`
asn1crypto (0.24.0)
Babel (2.4.0)
blinker (1.4)
boto (2.44.0)
boto3 (1.23.10)
botocore (1.26.10)
cachetools (2.0.0)
cdiff (1.0)
certifi (2018.1.18)
cffi (1.11.5)
chardet (3.0.4)
click (6.7)
colorama (0.3.7)
cryptography (2.1.4)
debtcollector (1.13.0)
dnspython (1.15.0)
filechunkio (1.8)
funcsigs (1.0.2)
gevent (1.2.2)
google-api-core (2.8.2)
google-auth (2.8.0)
google-cloud-core (2.3.1)
google-cloud-storage (2.0.0)
google-crc32c (1.1.2)
google-resumable-media (2.3.3)
googleapis-common-protos (1.56.2)
greenlet (0.4.12)
idna (2.6)
iso8601 (0.1.11)
jmespath (0.10.0)
kazoo (2.2.1.dev0)
keystoneauth1 (3.4.0)
lxml (4.2.1)
meld3 (1.0.2)
monotonic (1.0)
msgpack (0.5.6)
netaddr (0.7.19)
netifaces (0.10.4)
oauthlib (2.0.6)
oslo.config (5.2.0)
oslo.i18n (3.19.0)
oslo.serialization (2.24.0)
oslo.utils (3.35.0)
patroni (2.1.4)
pbr (3.1.1)
pg-view (1.3.1)
pip (9.0.1)
ply (3.11)
positional (1.1.1)
prettytable (0.7.2)
protobuf (3.19.4)
psutil (5.4.2)
psycopg2 (2.8.6)
pyasn1 (0.4.2)
pyasn1-modules (0.2.1)
pycparser (2.18)
PyJWT (1.5.3)
pyparsing (2.2.0)
pystache (0.5.4)
python-consul (0.7.1)
python-dateutil (2.6.1)
python-etcd (0.4.3)
python-keystoneclient (3.15.0)
python-swiftclient (3.5.0)
pytz (2018.3)
PyYAML (3.12)
requests (2.18.4)
rfc3986 (0.3.1)
rsa (3.4.2)
s3transfer (0.5.2)
setuptools (59.6.0)
six (1.11.0)
stevedore (1.28.0)
urllib3 (1.22)
wal-e (1.1.1)
wrapt (1.9.0)
ydiff (1.2)
Python Environment
KUBERNETES_ROLE_LABEL=spilo-role
POD_IP=<redacted>
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://172.20.0.1:443
HOSTNAME=<redacted>
PGHOME=/home/postgres
KUBERNETES_PORT_443_TCP_ADDR=172.20.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/14/bin
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP=tcp://172.20.0.1:443
KUBERNETES_SERVICE_PORT_HTTPS=443
POD_NAMESPACE=platform
KUBERNETES_SERVICE_HOST=172.20.0.1
LC_ALL=en_US.utf-8
HOME=/home/postgres