We're noticing from time to time that apt updates never complete and hold the lock forever, presumably because of a missing timeout combined with some kind of underlying network issue. Each time we've looked at this, the AR transport binary (ar+https) was still running, which makes me think the missing timeout is somewhere within it.
$ apt-get update
Reading package lists... Done
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
E: Unable to lock directory /var/lib/apt/lists/
$ ps aux | grep -i apt
root 18760 0.0 0.0 37564 7040 ? S Mar31 6:06 /usr/bin/apt-get update
_apt 18768 0.0 0.1 45420 9180 ? S Mar31 0:00 /usr/lib/apt/methods/https
_apt 18769 0.0 0.1 45420 9108 ? S Mar31 0:00 /usr/lib/apt/methods/https
root 18770 0.0 0.1 108624 10468 ? Sl Mar31 0:48 /usr/lib/apt/methods/ar+https
_apt 18774 0.0 0.0 42388 6624 ? S Mar31 0:00 /usr/lib/apt/methods/http
_apt 18775 0.0 0.0 42396 6596 ? S Mar31 0:00 /usr/lib/apt/methods/http
_apt 18780 0.0 0.0 36412 5680 ? S Mar31 0:00 /usr/lib/apt/methods/gpgv
$ pstree -ap 18760
apt-get,18760 update
  ├─ar+https,18770
  │   ├─{ar+https},18771
  │   ├─{ar+https},18772
  │   ├─{ar+https},18773
  │   ├─{ar+https},18776
  │   ├─{ar+https},18777
  │   └─{ar+https},18778
  ├─gpgv,18780
  ├─http,18774
  ├─http,18775
  ├─https,18768
  └─https,18769
Unfortunately we don't have any logs or anything else available, as we mostly notice this when it's triggered via OSConfigAgent, which only seems to collect the resulting apt "exited uncleanly" error when you eventually kill ar+https.
I haven't had a look through the code to see if there are missing timeouts or context propagations. It might also be worth adding a retry mechanism.