Description
Something in the .NET version of the worker app causes it to intermittently fail to connect to db or redis on startup, which crashes the container. Over the last year I've had thousands of students use this app to learn Docker and Swarm, and one of the most common issues is this worker failing on startup. It happens on all modern Docker versions, across platforms, and I haven't found a common theme for why it fails.
Today in testing, the broken service kept re-creating its task, and each new task kept failing. We removed the service and recreated it, and it worked, with no changes to how we created it. See the log below for typical behavior. The stack trace shows a DNS lookup failing (System.Net.Dns.InternalGetHostByName), but not which hostname it was trying to resolve, so I can't tell what it thinks the problem is. From all the cases I've seen and the testing I've done, it's not related to other services being down or to general network issues.
This also happens on Kubernetes, as seen in other issues reported in this repo.
Hundreds of people have reported this problem to me, and deploying the Java version fixes it.
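Since the stack trace doesn't name the host it failed to look up, one way to narrow it down is to resolve the names by hand from a container on the same overlay network. This is only a rough sketch: resolve_with_retry is a hypothetical helper name, the retry count and delay are arbitrary, and it assumes getent is available in the container and that the worker looks up the service names db and redis (the names that appear in its log output).

```shell
#!/bin/sh
# Hypothetical helper: try to resolve a hostname a few times and report
# exactly which name failed, which the .NET stack trace does not show.
resolve_with_retry() {
  host="$1"
  attempts="${2:-5}"   # number of tries (illustrative default)
  delay="${3:-1}"      # seconds between tries (illustrative default)
  i=1
  while [ "$i" -le "$attempts" ]; do
    # getent exits non-zero when the name cannot be resolved.
    if addr=$(getent hosts "$host"); then
      echo "resolved '$host': $addr"
      return 0
    fi
    echo "attempt $i/$attempts: cannot resolve '$host'" >&2
    i=$((i + 1))
    # Only wait if another attempt is coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

Calling resolve_with_retry db and resolve_with_retry redis before the worker starts would at least show whether a lookup fails on the first try and then succeeds a moment later, which would match the behavior described above.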
Workaround
Deploy the Java version of the worker, which I have built here: bretfisher/examplevotingapp_worker:java
Steps to reproduce the issue, if relevant:
In Swarm:
- Deploy all voting app services except worker. Ensure they work as they should.
- Deploy the .NET version of worker, using the examples in this repo's README.
- Some deploys fail at random with a stack trace containing "No such device or address".
- The service then creates a new task, which may or may not work.
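To put a number on "random", here is a rough sketch that counts failed tasks in the service's retained task history. count_failed_tasks is a hypothetical helper name, vote_worker is the service name taken from the logs in this issue, and the match on "Failed" assumes the usual CurrentState wording such as "Failed 2 minutes ago".

```shell
#!/bin/sh
# Hypothetical helper: count tasks of a service whose current state is Failed.
count_failed_tasks() {
  # Print one "name state" line per task, then count the Failed ones.
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # function from reporting an error in that case.
  docker service ps --format '{{.Name}} {{.CurrentState}}' "$1" |
    grep -c ' Failed ' || true
}
```

Running count_failed_tasks vote_worker after each deploy gives a quick failure count. Swarm only keeps a limited task history per slot (the Task History Retention Limit shown in docker info), so older failures drop out of the count.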
Describe the results you received:
Notice below that we created a worker service with two replicas. One replica works (vote_worker.1 on node2); the other (vote_worker.2 on node3) fails, then gets re-created on the same node and works the second time. Whether a task fails, and which one, is random.
➜ vote git:(master) ✗ docker service logs 413rw4tamzd6
vote_worker.2.itk0wfpt22gh@node3 | Waiting for db
vote_worker.2.itk0wfpt22gh@node3 | Connected to db
vote_worker.2.itk0wfpt22gh@node3 | Found redis at 10.0.4.7
vote_worker.2.itk0wfpt22gh@node3 | Connecting to redis
vote_worker.2.itk0wfpt22gh@node3 | Processing vote for 'a' by 'fb54d895d481b473'
vote_worker.2.tv9i1viknli2@node3 | System.AggregateException: One or more errors occurred. (No such device or address) ---> System.Net.Internals.SocketExceptionFactory+ExtendedSocketException: No such device or address
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.InternalGetHostByName(String hostName, Boolean includeIPv6)
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.ResolveCallback(Object context)
vote_worker.2.tv9i1viknli2@node3 | --- End of stack trace from previous location where exception was thrown ---
vote_worker.2.tv9i1viknli2@node3 | at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.EndGetHostAddresses(IAsyncResult asyncResult)
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.<>c.<GetHostAddressesAsync>b__25_1(IAsyncResult asyncResult)
vote_worker.2.tv9i1viknli2@node3 | at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
vote_worker.2.tv9i1viknli2@node3 | --- End of inner exception stack trace ---
vote_worker.2.tv9i1viknli2@node3 | at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
vote_worker.2.tv9i1viknli2@node3 | at Npgsql.NpgsqlConnector.Connect(NpgsqlTimeout timeout)
vote_worker.2.tv9i1viknli2@node3 | at Npgsql.NpgsqlConnector.RawOpen(NpgsqlTimeout timeout)
vote_worker.2.tv9i1viknli2@node3 | at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout)
vote_worker.2.tv9i1viknli2@node3 | at Npgsql.ConnectorPool.Allocate(NpgsqlConnection conn, NpgsqlTimeout timeout)
vote_worker.2.tv9i1viknli2@node3 | at Npgsql.NpgsqlConnection.OpenInternal()
vote_worker.2.tv9i1viknli2@node3 | at Worker.Program.OpenDbConnection(String connectionString) in /code/src/Worker/Program.cs:line 78
vote_worker.2.tv9i1viknli2@node3 | at Worker.Program.Main(String[] args) in /code/src/Worker/Program.cs:line 19
vote_worker.2.tv9i1viknli2@node3 | ---> (Inner Exception #0) System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (0x00000005): No such device or address
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.InternalGetHostByName(String hostName, Boolean includeIPv6)
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.ResolveCallback(Object context)
vote_worker.2.tv9i1viknli2@node3 | --- End of stack trace from previous location where exception was thrown ---
vote_worker.2.tv9i1viknli2@node3 | at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.EndGetHostAddresses(IAsyncResult asyncResult)
vote_worker.2.tv9i1viknli2@node3 | at System.Net.Dns.<>c.<GetHostAddressesAsync>b__25_1(IAsyncResult asyncResult)
vote_worker.2.tv9i1viknli2@node3 | at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)<---
vote_worker.2.tv9i1viknli2@node3 |
vote_worker.1.hjt41aigtbwq@node2 | Waiting for db
vote_worker.1.hjt41aigtbwq@node2 | Waiting for db
vote_worker.1.hjt41aigtbwq@node2 | Waiting for db
vote_worker.1.hjt41aigtbwq@node2 | Waiting for db
vote_worker.1.hjt41aigtbwq@node2 | Waiting for db
vote_worker.1.hjt41aigtbwq@node2 | Connected to db
vote_worker.1.hjt41aigtbwq@node2 | Found redis at 10.0.4.7
vote_worker.1.hjt41aigtbwq@node2 | Connecting to redis
vote_worker.1.hjt41aigtbwq@node2 | Processing vote for 'b' by 'fb54d895d481b473'
Describe the results you expected:
Worker always works :)
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
root@node3:~# docker version
Client:
Version: 18.09.5
API version: 1.39
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:44:24 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.5
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:10:53 2019
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
root@node3:~# docker info
Containers: 20
Running: 0
Paused: 0
Stopped: 20
Images: 38
Server Version: 18.09.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: 1zyd50hmid7dado3s02tago7y
Is Manager: true
ClusterID: spa40crzsqgtn4pix7hj4sac1
Managers: 3
Nodes: 3
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 134.209.46.216
Manager Addresses:
134.209.46.216:2377
165.227.220.117:2377
68.183.159.208:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-154-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.1MiB
Name: node3
ID: UA2M:37YB:EPX6:BKDU:WMNA:3F4A:GVE6:X3GD:Q3UK:YKYF:6J2H:QQNC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
Additional environment details (AWS, Docker for Mac, Docker for Windows, VirtualBox, physical, etc.):
This has happened on Docker Desktop, DigitalOcean, Docker Toolbox on VirtualBox, and more.