I believe the key error is the one below. The response itself looks fine (its status is 'OK'), yet the distributed/worker.py
module treats it as unexpected and raises.
"/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/worker.py", line 248, in _register_with_scheduler
raise ValueError("Unexpected response from register: %r" % (resp,))
ValueError: Unexpected response from register: {'status': 'OK', 'time': 1507305239.205065}
distributed.nanny - WARNING - Restarting worker
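To illustrate what I think is happening, here is a minimal sketch. It assumes the check at line 248 compares the response against the bare string 'OK', while the scheduler replies with a dict. Both function names are hypothetical and this is not the actual code from distributed/worker.py. If the assumption holds, it might point to the worker and the scheduler running different versions of distributed, but I have not confirmed that.

```python
# Hypothetical reconstruction for illustration only;
# NOT copied from distributed/worker.py.


def check_register_response_strict(resp):
    """Accept only the bare string 'OK' (assumed behaviour of the failing worker)."""
    if resp != 'OK':
        raise ValueError("Unexpected response from register: %r" % (resp,))
    return True


def check_register_response_tolerant(resp):
    """Accept either the bare string 'OK' or a dict whose 'status' is 'OK'."""
    status = resp.get('status') if isinstance(resp, dict) else resp
    if status != 'OK':
        raise ValueError("Unexpected response from register: %r" % (resp,))
    return True


# The response exactly as it appears in the log above.
resp = {'status': 'OK', 'time': 1507305239.205065}

strict_error = None
try:
    check_register_response_strict(resp)
except ValueError as exc:
    # The dict is never equal to the string 'OK', so the strict check raises.
    strict_error = str(exc)
    print(strict_error)

# The tolerant variant looks inside the dict and accepts the same response.
print(check_register_response_tolerant(resp))
```

With the strict variant, the dict from the log is rejected with the same message as in the traceback, even though its status is 'OK'.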
Below is more of the container log. The nanny keeps restarting the worker until the container is killed. The final error, raised when it is killed, is at the bottom.
Container: container_e110_1506861552726_19299_01_000002 on hostname.allstate.com_8041
======================================================================================
LogType:stderr
Log Upload Time:Fri Oct 06 10:54:03 -0500 2017
LogLength:22808
Log Contents:
/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/config.py:55: UserWarning: Could not write default config file to '/home/.dask/config.yaml'. Received error [Errno 13] Permission denied: '/home/.dask'
UserWarning)
distributed.nanny - INFO - Start Nanny at: 'tcp://10.195.102.32:45126'
/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/config.py:55: UserWarning: Could not write default config file to '/home/.dask/config.yaml'. Received error [Errno 13] Permission denied: '/home/.dask'
UserWarning)
distributed.worker - INFO - Start worker at: tcp://10.195.102.32:37045
distributed.worker - INFO - Listening to: tcp://10.195.102.32:37045
distributed.worker - INFO - nanny at: 10.195.102.32:45126
distributed.worker - INFO - http at: 10.195.102.32:32817
distributed.worker - INFO - bokeh at: 10.195.102.32:8789
distributed.worker - INFO - Waiting to connect to: tcp://10.195.208.190:40025
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 1
distributed.worker - INFO - Memory: 0.50 GB
distributed.worker - INFO - Local Directory: worker-nfomhqoz
distributed.worker - INFO - -------------------------------------------------
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/nanny.py", line 467, in run
yield worker._start(*worker_start_args)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/worker.py", line 319, in _start
yield self._register_with_scheduler()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1069, in run
yielded = self.gen.send(value)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/worker.py", line 248, in _register_with_scheduler
raise ValueError("Unexpected response from register: %r" % (resp,))
ValueError: Unexpected response from register: {'status': 'OK', 'time': 1507305239.205065}
distributed.nanny - WARNING - Restarting worker
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f96cdcfb510>, <tornado.concurrent.Future object at 0x7f96ce9c5978>)
Traceback (most recent call last):
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/ioloop.py", line 605, in _run_callback
ret = callback()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/ioloop.py", line 626, in _discard_future_result
future.result()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/nanny.py", line 138, in _start
response = yield self.instantiate()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/nanny.py", line 205, in instantiate
yield self.process.start()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/nanny.py", line 311, in start
yield self._wait_until_running()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/tornado/gen.py", line 1069, in run
yielded = self.gen.send(value)
File "/hadoop02/yarn/nm/usercache/jlord/appcache/application_1506861552726_19299/container_e110_1506861552726_19299_01_000002/PYTHON_DIR/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58/lib/python3.6/site-packages/distributed/nanny.py", line 397, in _wait_until_running
raise ValueError("Worker not started")
ValueError: Worker not started