zhihuspider's People

Contributors

alextan-b-z

zhihuspider's Issues

Neither script runs

Neither zhihuspider0.py nor zhihuspider1.py runs. Both keep hanging on the first link, which makes me doubt whether the code still works.

selenium.common.exceptions.WebDriverException: Message: Service phantomjs unexpectedly exited

I installed PhantomJS with "npm install phantomjs-prebuilt", but I keep getting this error:

====

$ scrapy list
/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 149, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 249, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 137, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 336, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 61, in from_settings
    return cls(settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 25, in __init__
    self._load_all_spiders()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/apps/AlexTan-b-z_ZhihuSpider/zhihu/zhihu/spiders/zhihuspider.py", line 22, in <module>
    class ZhihuspiderSpider(RedisSpider):
  File "/apps/AlexTan-b-z_ZhihuSpider/zhihu/zhihu/spiders/zhihuspider.py", line 34, in ZhihuspiderSpider
    obj = webdriver.PhantomJS(desired_capabilities=dcap)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 56, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 98, in start
    self.assert_process_still_running()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 111, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service phantomjs unexpectedly exited. Status code was: -6
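The "Status code was: -6" at the end of this traceback is a subprocess return code. In Python, a negative Popen return code -N on POSIX means the child process was terminated by signal N, and on Linux signal 6 is SIGABRT, so the PhantomJS process aborted rather than exiting with an ordinary error status. The Python 3 sketch below decodes such a value; the helper name describe_exit_status is hypothetical and not part of Selenium, and the sketch does not tell you why PhantomJS aborted.

```python
import signal


def describe_exit_status(return_code):
    """Explain a return code as reported by subprocess.Popen (hypothetical helper).

    On POSIX, Popen.returncode is -N when the child was killed by signal N,
    and a non-negative exit status when it exited normally.
    """
    if return_code < 0:
        signum = -return_code
        try:
            # Map the number to its symbolic name, e.g. 6 -> SIGABRT on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            # Number is not a signal known on this platform.
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % return_code


print(describe_exit_status(-6))  # on Linux: killed by signal 6 (SIGABRT)
```

Signal numbers are platform-specific, which is why the name is looked up at run time instead of being hard-coded.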

No data retrieved, for unknown reasons

I configured 3 accounts and left every other setting untouched.

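At first it would not run because my PhantomJS installation was incomplete. That is fixed now, but no data comes back at all.
The output is as follows: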

qianzise@FengMaster-PC:~/ZhihuSpider-2.0/zhihu$ scrapy crawl zhihuspider
2017-11-27 16:55:37 [scrapy] INFO: Scrapy 1.1.0rc1 started (bot: zhihu)
2017-11-27 16:55:37 [scrapy] INFO: Overridden settings: {'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36', 'DUPEFILTER_CLASS': 'zhihu.scrapy_redis.dupefilter.RFPDupeFilter', 'SPIDER_MODULES': ['zhihu.spiders'], 'NEWSPIDER_MODULE': 'zhihu.spiders', 'DOWNLOAD_TIMEOUT': 10, 'SCHEDULER': 'zhihu.scrapy_redis.scheduler.Scheduler', 'RETRY_TIMES': 1, 'REDIRECT_ENABLED': False, 'BOT_NAME': 'zhihu'}
2017-11-27 16:55:37 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole']
2017-11-27 16:55:37 [zhihuspider] INFO: Reading start URLs from redis key 'zhihuspider:start_urls' (batch size: 16, encoding: utf-8
2017-11-27 16:55:37 [zhihu.cookie] WARNING: The num of the cookies is 3
2017-11-27 16:55:37 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'zhihu.middlewares.UserAgentMiddleware',
'zhihu.middlewares.CookiesMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-11-27 16:55:37 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-11-27 16:55:37 [scrapy] INFO: Enabled item pipelines:
['zhihu.pipelines.ZhihuPipeline']
2017-11-27 16:55:37 [scrapy] INFO: Spider opened
2017-11-27 16:55:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-11-27 16:55:37 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-11-27 16:55:37 [zhihu.scrapy_redis.dupefilter] DEBUG: Filtered duplicate request <GET https://www.zhihu.com/api/v4/members/yun-he-shu-ju-8?include=locations,employments,industry_category,gender,educations,business,follower_count,following_count,description,badge[?(type=best_answerer)].topics> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)

2017-11-27 16:56:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-11-27 16:57:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-11-27 16:58:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-11-27 16:59:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-11-27 17:00:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-11-27 17:01:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
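In this log the spider reads its start URLs from the Redis key 'zhihuspider:start_urls', and the only request that shows up is dropped by zhihu.scrapy_redis.dupefilter as a duplicate, after which nothing is crawled. One possible reading, offered as an assumption rather than a diagnosis: if the project's dupefilter, like the upstream scrapy-redis one, keeps request fingerprints in Redis and that set survives from an earlier run, a URL seen before will be filtered again. The Python sketch below is a simplified in-memory model of that behaviour; the class PersistentDupeFilter, the shared set standing in for Redis, and the sha1-over-method-and-URL fingerprint are all illustrative, not Scrapy's exact fingerprint algorithm.

```python
import hashlib


def request_fingerprint(method, url):
    """Simplified fingerprint: sha1 over the HTTP method and the URL.

    Scrapy's real fingerprint also canonicalizes the URL and hashes the
    request body; this is only an illustration.
    """
    h = hashlib.sha1()
    h.update(method.encode("utf-8"))
    h.update(url.encode("utf-8"))
    return h.hexdigest()


class PersistentDupeFilter:
    """Hypothetical dupefilter whose fingerprint set outlives one run."""

    def __init__(self, stored_fingerprints):
        # Shared set playing the role of a Redis key kept between runs.
        self.fingerprints = stored_fingerprints

    def request_seen(self, method, url):
        fp = request_fingerprint(method, url)
        if fp in self.fingerprints:
            return True  # duplicate: the scheduler would drop this request
        self.fingerprints.add(fp)
        return False


redis_like_store = set()
url = "https://www.zhihu.com/api/v4/members/yun-he-shu-ju-8"

first_run = PersistentDupeFilter(redis_like_store)
print(first_run.request_seen("GET", url))   # False: request is scheduled

second_run = PersistentDupeFilter(redis_like_store)
print(second_run.request_seen("GET", url))  # True: filtered as a duplicate
```

Under this assumption, the second run behaves like the log above: the seed request is filtered and the spider sits idle. Which Redis keys the project actually uses, and whether they persist between runs, would need to be checked in its settings and its zhihu/scrapy_redis code.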