Comments (7)
Actually, you only need to reconfigure and restart ScrapydWeb; this does not interrupt your crawling.
from scrapydweb.
In my experience, it's due to insufficient memory. Could you tell me the size of the current log file and your free / total RAM?
from scrapydweb.
Also, if ScrapydWeb and Scrapyd run on the same host, you can set the SCRAPYD_LOGS_DIR option so that ScrapydWeb reads the local log file directly. This works only when your Scrapyd server is added as '127.0.0.1' in the ScrapydWeb config file.
Note that parsing the log file with regular expressions may still cause a memory error if there is not enough free memory.
https://github.com/my8100/scrapydweb/blob/master/scrapydweb/default_settings.py#L60
# Set to speed up loading scrapy logs.
# e.g., 'C:/Users/username/logs/' or '/home/username/logs/'
# The setting takes effect only when both ScrapydWeb and Scrapyd run on the same machine,
# and the Scrapyd server ip is added as '127.0.0.1'.
# Check out here to find out where the Scrapy logs are stored:
# https://scrapyd.readthedocs.io/en/stable/config.html#logs-dir
SCRAPYD_LOGS_DIR = ''
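For illustration, a filled-in version of the two relevant settings might look like the sketch below. The log path is a placeholder taken from the comment above, and port 6800 is only the usual Scrapyd default; both are assumptions, so adjust them to match your own setup.

```python
# Sketch of a ScrapydWeb settings excerpt (values are placeholders, not recommendations).

# The Scrapyd server must be listed by its loopback address for local log reading to apply.
SCRAPYD_SERVERS = [
    '127.0.0.1:6800',  # assumed default Scrapyd port
]

# Directory where Scrapyd writes Scrapy logs on this machine (hypothetical path).
SCRAPYD_LOGS_DIR = '/home/username/logs/'
```

ScrapydWeb has to be restarted after changing these values.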
from scrapydweb.
Thanks for the fast reply.
I don't want to interrupt the crawl, but it should finish within a few days. Then I'll test the above and give an update.
from scrapydweb.
You may not be able to reproduce the problem after your crawl finishes, since there would then be enough memory for ScrapydWeb to parse the log.
Alternatively, as a temporary workaround, you can run another ScrapydWeb instance on a different computer with enough memory.
from scrapydweb.
OK, there's the issue: 600 MB of RAM left and an 800 MB log.
from scrapydweb.
Fixed in v1.1.0: large log files are now cut into chunks and parsed periodically and incrementally with the help of LogParser.
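To show why chunked, incremental parsing keeps memory usage bounded, here is a minimal sketch of the general idea. It is not LogParser's actual code: the function name `parse_increment`, the regular expression, and the per-level counting are all hypothetical, and the pattern assumes Scrapy's default log line format.

```python
import re

# Hypothetical pattern: matches the level in a default-format Scrapy log line,
# e.g. b"2019-01-01 00:00:00 [scrapy.core.engine] INFO: Spider opened".
LEVEL_RE = re.compile(
    rb"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] "
    rb"(DEBUG|INFO|WARNING|ERROR|CRITICAL):"
)


def parse_increment(path, offset, counts, chunk_size=1024 * 1024):
    """Parse complete lines added since `offset`, update `counts` in place,
    and return the byte offset to resume from on the next call."""
    with open(path, "rb") as f:
        f.seek(offset)
        tail = b""  # bytes of a line that is split across chunk boundaries
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            lines = (tail + chunk).split(b"\n")
            tail = lines.pop()  # last piece may be an incomplete line
            for line in lines:
                match = LEVEL_RE.match(line)
                if match:
                    level = match.group(1).decode()
                    counts[level] = counts.get(level, 0) + 1
                offset += len(line) + 1  # +1 for the newline consumed
    # An unfinished trailing line is not counted; it is re-read next time.
    return offset
```

A periodic job would store the returned offset and pass it back in on the next run, so each pass only holds about one chunk in memory instead of the whole log file.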
from scrapydweb.