I have a crawl running, where after 87000 seconds since last refresh, the following er

Refresh crawl status after long time leads to memory error. about scrapydweb HOT 7 CLOSED

my8100 commented on June 30, 2024

Refresh crawl status after long time leads to memory error.

from scrapydweb.

Comments (7)

my8100 commented on June 30, 2024

Actually, you only need to reconfigure and restart ScrapydWeb; there is no need to interrupt your crawl.


my8100 commented on June 30, 2024

In my experience, this is caused by insufficient memory. Could you tell me the size of the current log file and your free / total RAM?


my8100 commented on June 30, 2024

Also, if ScrapydWeb and Scrapyd run on the same host, you can set the SCRAPYD_LOGS_DIR option so that ScrapydWeb reads the local log file directly. This works only when your Scrapyd server is added as '127.0.0.1' in the ScrapydWeb config file.
Note that parsing the log file with regular expressions may still cause a memory error if there is not enough memory.

https://github.com/my8100/scrapydweb/blob/master/scrapydweb/default_settings.py#L60

# Set to speed up loading scrapy logs.
# e.g., 'C:/Users/username/logs/' or '/home/username/logs/'
# The setting takes effect only when both ScrapydWeb and Scrapyd run on the same machine,
# and the Scrapyd server ip is added as '127.0.0.1'.
# Check out here to find out where the Scrapy logs are stored:
# https://scrapyd.readthedocs.io/en/stable/config.html#logs-dir
SCRAPYD_LOGS_DIR = ''
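
For illustration, the relevant part of a ScrapydWeb settings file might look like the sketch below. The log path is the placeholder from the comment above and the port 6800 is Scrapyd's default; both are assumptions, so substitute the values for your own setup.

```python
# Sketch of a ScrapydWeb settings fragment (values are assumptions, adjust to your host).

# The Scrapyd server must be listed by its loopback address for the
# local-log shortcut to apply.
SCRAPYD_SERVERS = [
    '127.0.0.1:6800',
]

# Directory where Scrapyd writes the Scrapy logs on this machine
# (placeholder path taken from the comment in default_settings.py).
SCRAPYD_LOGS_DIR = '/home/username/logs/'
```

After editing the file, restart ScrapydWeb so the new values are picked up.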


WNiels commented on June 30, 2024

Thanks for the fast reply.
I don't want to interrupt the crawl, but it should finish within a few days. Then I'll test the above and give an update.


my8100 commented on June 30, 2024

You may not be able to reproduce the problem once your crawl has finished, since there would then be enough memory for ScrapydWeb to parse the log.
As a temporary solution, you could run another ScrapydWeb instance on a different computer with enough memory.


WNiels commented on June 30, 2024

OK, there's the issue: 600 MB of RAM left and an 800 MB log.


my8100 commented on June 30, 2024

Fixed in v1.1.0: large log files are now cut into chunks and parsed periodically and incrementally with the help of LogParser.
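
To show the general idea of chunked, incremental parsing, here is a minimal sketch. It is not LogParser's actual implementation: the function names `parse_chunk` and `parse_all`, the 10 MB default chunk size, and the log-level regex are all hypothetical and chosen only for illustration. The point is that at most one chunk is held in memory at a time, and the saved byte offset lets a later run resume where the previous one stopped.

```python
import re
from collections import Counter

# Hypothetical pattern: picks out the log level of a Scrapy log line.
LEVEL_RE = re.compile(rb"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def parse_chunk(path, offset=0, chunk_size=10 * 1024 * 1024):
    """Parse at most chunk_size bytes of the log, starting at offset.

    Returns (new_offset, counts). Only complete lines are consumed, so a
    line cut off at the chunk boundary is read again on the next call.
    """
    counts = Counter()
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(chunk_size)
    # Stop after the last newline; the trailing partial line waits for next time.
    end = data.rfind(b"\n") + 1
    if end == 0 and len(data) == chunk_size:
        # A single line longer than the chunk: consume it anyway to keep moving.
        end = len(data)
    for line in data[:end].splitlines():
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1).decode()] += 1
    return offset + end, counts


def parse_all(path, chunk_size=10 * 1024 * 1024, offset=0):
    """Call parse_chunk repeatedly until no further complete line is available."""
    total = Counter()
    while True:
        new_offset, counts = parse_chunk(path, offset, chunk_size)
        if new_offset == offset:
            break
        total.update(counts)
        offset = new_offset
    return offset, total
```

A periodic job could store the returned offset and pass it back in on the next run, so that only the newly appended part of the log is read each time.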

