When indexing a collection, the process fails after ca. 74,000 documents:
```
2018-02-08 12:33:35 1 INFO hoover.search.index updating <Collection: mycollection>
2018-02-08 12:33:35 1 INFO hoover.search.index resuming load: {'feed_state': 'http://snoop/htmidi/feed?lt=2018-02-07T04:20:59.559099Z', 'report': {'indexed': 74000}}
2018-02-08 12:34:00 1 WARNING elasticsearch POST http://search-es:9200/_bulk [status:N/A request:1.140s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/usr/local/lib/python3.6/http/client.py", line 986, in send
    self.sock.sendall(data)
ConnectionResetError: [Errno 104] Connection reset by peer
```
I assume there is a very large file in mycollection causing this error, i.e. the `_bulk` request body exceeds what Elasticsearch accepts and the connection is reset.
As a workaround, a maximum document size could be introduced, or large files could be split into several pieces before indexing.
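The max-document-size workaround could look roughly like the sketch below. This is not the actual Hoover indexing code; the function name, the `text` field, and the 10 MB limit are all assumptions for illustration. The idea is to cap each document's payload before it is handed to the bulk indexer:

```python
MAX_DOC_BYTES = 10 * 1024 * 1024  # hypothetical 10 MB limit per document


def cap_large_docs(docs, max_bytes=MAX_DOC_BYTES):
    """Yield documents, truncating the 'text' field of any whose
    UTF-8 size exceeds max_bytes (a real fix might split instead)."""
    for doc in docs:
        text = doc.get('text', '')
        encoded = text.encode('utf-8')
        if len(encoded) > max_bytes:
            # truncate at a byte boundary; 'ignore' drops a split multi-byte char
            doc = dict(doc, text=encoded[:max_bytes].decode('utf-8', 'ignore'))
        yield doc
```

Alternatively (or additionally), large files could be split into several child documents so no single bulk item exceeds the limit; that would preserve the full content for search at the cost of more index entries.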