awesome-spider's Introduction

awesome-spider


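Bright Data (formerly Luminati) is currently the leading overseas proxy IP provider, with a 99% proxy scraping success rate. A promotion is running now: if you need high-quality, stable proxies, it may be worth a look, and customers on any plan receive a USD 150 to 250 bonus. After registering through the link, contact the Chinese-language support team as described in the confirmation email.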


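A collection of assorted web crawlers (written in Python unless noted otherwise). Pull requests and issues are welcome. The script used to collect them lives in the github-search project.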

Warning: crawlers go stale over time. If one no longer runs as-is, adjust its logic accordingly.

A

B

C

D

E

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

V

W

X

Y

Z

#

Other

Feel free to follow the WeChat official account:

facert

awesome-spider's People

Contributors

0xff-dev, 3inchtime, angelkitty, but0n, carmenliukang, casterwx, chengyumeng, chenjiandongx, courierkyn, cyang812, danielyan86, dta0502, emc2-2022, facert, hatcat123, lonsty, qwertyuiop6, qzcool, silverbooker, sy-records, wildwizard404, xuefenghuang

awesome-spider's Issues

Kuaishou PC web: dynamic cookie validation when crawling a streamer's posts. Could anyone give me some pointers?

  1. Kuaishou streamer:

https://live.kuaishou.com/profile/3xsx74sidgkz2bq

  2. Kuaishou posts URL:

https://live.kuaishou.com/graphql

  3. Request headers:

Origin: https://live.kuaishou.com
Referer: https://live.kuaishou.com/profile/3xsx74sidgkz2bq
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134
Cache-Control: max-age=0
accept: */*
content-type: application/json
Accept-Language: zh-CN
Accept-Encoding: gzip, deflate, br
Host: live.kuaishou.com
Content-Length: 1323
Connection: Keep-Alive
Cookie: client_key=65890b29; clientid=3; did=web_221d3c8e7f94a5b146b513e484bf2a54; kuaishou.live.bfb1s=ac5f27b3b62895859c4c1622f49856a4
Result: with this cookie, calling the endpoint through the WebMagic framework returns null for the post list "$.data.publicFeeds.list".
  4. POST request body:

{"operationName":"publicFeedsQuery","variables":{"principalId":"3xsx74sidgkz2bq","pcursor":"","count":24},"query":"query publicFeedsQuery($principalId: String, $pcursor: String, $count: Int) {\n publicFeeds(principalId: $principalId, pcursor: $pcursor, count: $count) {\n pcursor\n live {\n user {\n id\n kwaiId\n eid\n profile\n name\n living\n __typename\n }\n watchingCount\n src\n title\n gameId\n gameName\n categoryId\n liveStreamId\n playUrls {\n quality\n url\n __typename\n }\n followed\n type\n living\n redPack\n liveGuess\n anchorPointed\n latestViewed\n expTag\n __typename\n }\n list {\n photoId\n caption\n thumbnailUrl\n poster\n viewCount\n likeCount\n commentCount\n timestamp\n workType\n type\n playUrl\n useVideoPlayer\n imgUrls\n imgSizes\n magicFace\n musicName\n location\n liked\n onlyFollowerCanComment\n relativeHeight\n width\n height\n user {\n id\n eid\n name\n profile\n __typename\n }\n expTag\n __typename\n }\n __typename\n }\n}\n"}
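For anyone reproducing this, here is a minimal standard-library sketch that builds (but does not send) the same POST request described above and reads `$.data.publicFeeds.list` without crashing when a level is null. The field selection is trimmed to a few `list` fields for brevity; the function names are hypothetical. It does not solve the dynamic cookie check itself: the cookie still has to come from a real browser session, and the endpoint may have changed since this issue was filed.

```python
import json
import urllib.request

GRAPHQL_URL = "https://live.kuaishou.com/graphql"

# Trimmed version of the issue's query; paste the full query string in unchanged if needed.
PUBLIC_FEEDS_QUERY = """query publicFeedsQuery($principalId: String, $pcursor: String, $count: Int) {
  publicFeeds(principalId: $principalId, pcursor: $pcursor, count: $count) {
    pcursor
    list { photoId caption viewCount likeCount timestamp __typename }
    __typename
  }
}"""


def build_public_feeds_request(principal_id, cookie, pcursor="", count=24):
    """Build (without sending) the publicFeedsQuery POST request. Hypothetical helper."""
    body = {
        "operationName": "publicFeedsQuery",
        "variables": {"principalId": principal_id, "pcursor": pcursor, "count": count},
        "query": PUBLIC_FEEDS_QUERY,
    }
    headers = {
        "Origin": "https://live.kuaishou.com",
        "Referer": f"https://live.kuaishou.com/profile/{principal_id}",
        "Content-Type": "application/json",
        "Accept": "*/*",
        "Cookie": cookie,  # must be copied from a live browser session
    }
    # urllib adds Content-Length itself when the request is actually sent
    return urllib.request.Request(
        GRAPHQL_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def extract_feed_list(response_json):
    """Return $.data.publicFeeds.list, or None if any level is missing or null."""
    feeds = (response_json.get("data") or {}).get("publicFeeds") or {}
    return feeds.get("list")
```

To actually send it, pass the request to `urllib.request.urlopen` and feed `json.load` of the response into `extract_feed_list`. A `None` result there matches the symptom in the issue and usually means the server rejected the cookie rather than that the streamer has no posts.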

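Request for a promotional partnership

Hello, we are also a professional IP proxy provider, 极速HTTP (Jisu HTTP), and would like to discuss a possible commercial promotion partnership. If you are interested, you can reach me on WeChat: 13982004324. Thank you (and sorry for the bother if you are not interested).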

m

m

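NetEase Cloud Music hot-comments crawler is broken

The NetEase Cloud Music hot-comments crawler no longer works. Could I submit a pull request with a hot-comments crawler I wrote that uses depth-first traversal?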
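The issue only says the replacement crawler walks its targets depth-first. A minimal sketch of that traversal pattern, assuming the crawl is a graph walk (nodes could be songs linked to related songs, playlists, or users): `get_neighbors` and `fetch_hot_comments` are hypothetical callables you would supply as wrappers around whatever endpoints you use. This is not the contributor's code and does not call any NetEase API.

```python
def dfs_crawl(start, get_neighbors, fetch_hot_comments, max_nodes=1000):
    """Iterative depth-first crawl; returns {node: hot_comments} in visit order.

    get_neighbors(node) -> iterable of linked nodes (hypothetical, user-supplied)
    fetch_hot_comments(node) -> hot comments for that node (hypothetical, user-supplied)
    """
    results = {}
    visited = set()
    stack = [start]
    while stack and len(results) < max_nodes:
        node = stack.pop()
        # Mark on pop, not on push, so the visit order is a true DFS preorder
        if node in visited:
            continue
        visited.add(node)
        results[node] = fetch_hot_comments(node)
        # Push in reverse so neighbors are explored in the order they are listed
        for nxt in reversed(list(get_neighbors(node))):
            if nxt not in visited:
                stack.append(nxt)
    return results
```

An explicit stack avoids Python's recursion limit on deep link chains, and `max_nodes` caps how far a single run wanders.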

Self-recommendation: HAipproxy, a distributed, highly available proxy crawler

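Thanks to the repo owner for including weibospider. After looking through all these awesome spiders, I think one thing is still missing: a piece of supporting infrastructure for crawlers. So I would like to recommend HAipproxy.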

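HAipproxy is a highly available, low-latency distributed proxy program. High availability covers two aspects:

  • High availability of proxy resources (achieved through IP validation and filtering strategies)
  • High availability of each component (achieved through distribution)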
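As an illustration of the filtering half of the first bullet, here is a minimal sketch that keeps only proxies whose validation history clears some thresholds and ranks the survivors. The `ProxyStats` record and all three thresholds are hypothetical assumptions for this example; HAipproxy's actual scoring and filtering rules may differ.

```python
from dataclasses import dataclass


@dataclass
class ProxyStats:
    """Hypothetical validation record for one proxy."""
    address: str
    successes: int
    attempts: int
    avg_latency_s: float

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


def filter_proxies(proxies, min_success_rate=0.8, max_latency_s=3.0, min_attempts=3):
    """Keep proxies that pass the (assumed) thresholds, best first."""
    usable = [
        p for p in proxies
        if p.attempts >= min_attempts          # enough validation samples to trust the rate
        and p.success_rate >= min_success_rate
        and p.avg_latency_s <= max_latency_s
    ]
    # Highest success rate first; lower latency breaks ties
    return sorted(usable, key=lambda p: (-p.success_rate, p.avg_latency_s))
```

Requiring a minimum number of attempts stops a proxy that happened to succeed once from jumping to the top of the pool.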

In current testing, HAipproxy reaches over 10,000 requests/hour. Below are single-machine test results with Zhihu as the target site:

Total requests  Time              Elapsed   IP load strategy  Client
0               2018/03/03 22:03  0         greedy            py_cli
10000           2018/03/03 23:03  1 hour    greedy            py_cli
20000           2018/03/04 00:08  2 hours   greedy            py_cli
30000           2018/03/04 01:02  3 hours   greedy            py_cli
40000           2018/03/04 02:15  4 hours   greedy            py_cli
50000           2018/03/04 03:03  5 hours   greedy            py_cli
60000           2018/03/04 05:18  7 hours   greedy            py_cli
70000           2018/03/04 07:11  9 hours   greedy            py_cli
80000           2018/03/04 08:43  11 hours  greedy            py_cli
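The throughput can be checked directly from the table's timestamps. This sketch computes the average rate since the start of the run at each checkpoint. It treats the second row's time as 23:03, consistent with its 1-hour elapsed time. By this arithmetic the run averages 10,000 requests/hour over the first hour and 7,500 requests/hour over the full 10 h 40 min.

```python
from datetime import datetime

# (cumulative requests, timestamp) copied from the table; second row read as 23:03
SAMPLES = [
    (0, "2018/03/03 22:03"),
    (10000, "2018/03/03 23:03"),
    (20000, "2018/03/04 00:08"),
    (30000, "2018/03/04 01:02"),
    (40000, "2018/03/04 02:15"),
    (50000, "2018/03/04 03:03"),
    (60000, "2018/03/04 05:18"),
    (70000, "2018/03/04 07:11"),
    (80000, "2018/03/04 08:43"),
]


def cumulative_rates(samples):
    """Return (requests, elapsed minutes, average requests/hour since the first sample)."""
    fmt = "%Y/%m/%d %H:%M"
    start = datetime.strptime(samples[0][1], fmt)
    rows = []
    for count, stamp in samples[1:]:
        seconds = (datetime.strptime(stamp, fmt) - start).total_seconds()
        # count * 3600 / seconds = average requests per hour over the run so far
        rows.append((count, seconds / 60, count * 3600 / seconds))
    return rows


for count, minutes, rate in cumulative_rates(SAMPLES):
    print(f"{count:>6} requests  {minutes:>4.0f} min  {rate:>8.0f} req/h")
```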

pip install requirements.txt Exception

Collecting requirements.txt
Exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python2.7/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python2.7/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python2.7/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python2.7/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/requirements-txt/
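The last line shows pip searching PyPI for a project named `requirements-txt`. That happens when the file name is passed as a package name. pip reads a requirements file only with the `-r` flag, so the usual command is `pip install -r requirements.txt`. A small sketch of doing the same from Python is below; the helper names are hypothetical, and `install_requirements` is defined but not called here.

```python
import subprocess
import sys


def pip_install_requirements_cmd(requirements_path="requirements.txt"):
    """Build the pip command that installs every package listed in a requirements file."""
    # `-r` makes pip read package names from the file; without it pip treats
    # "requirements.txt" as a package name and looks it up on PyPI (the 404 above).
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


def install_requirements(requirements_path="requirements.txt"):
    """Run the install with the current interpreter's pip; raises if pip fails."""
    subprocess.run(pip_install_requirements_cmd(requirements_path), check=True)
```

Invoking pip as `python -m pip` ties the install to the interpreter that will run the crawler, which avoids installing into a different Python by accident.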