python-spider's People

Contributors

chess99, crazybunqnq, dependabot[bot], hjlarry, jack-cherish, steven7851, sys0613, zyszys


python-spider's Issues

I'm here for the perks

I want to crawl all the novels on http://www.biqukan.com and then build my own novel-reading app. Could you give me some advice?

Code error

File "1.py", line 126, in run
video_names, video_urls, nickname = self.get_video_urls(user_id)
File "1.py", line 35, in get_video_urls
aweme_count = html['user_list'][0]['user_info']['aweme_count']
KeyError: 'user_list'
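
This KeyError usually means the API answered with an error object instead of search results, so html carries no 'user_list' key (compare the "请先登录" issue further down, where the response holds a status_msg instead). A minimal defensive sketch, assuming the same html dict as in the traceback:

    # Hypothetical guard, not the original script's code: bail out with the
    # API's own message when 'user_list' is missing.
    user_list = html.get('user_list')
    if not user_list:
        raise RuntimeError('search failed: %s' % html.get('status_msg', html))
    aweme_count = user_list[0]['user_info']['aweme_count']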

12306 ticket-grabbing error

As follows:

Waiting for the captcha; enter it manually...
Ticket-booking page loaded...
Clicking Query in a loop... attempt 1
[2832:8432:0114/111719.531:ERROR:service_manager.cc(157)] Connection InterfaceProviderSpec prevented service: content_renderer from binding interface: blink::mojom::ReportingServiceProxy exposed by: content_browser
Message: stale element reference: element is not attached to the page document
(Session info: chrome=63.0.3239.132)
(Driver info: chromedriver=2.34.522940 (1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.16299 x86_64)

Booking not started yet 1
Starting booking...
Selecting passengers...
Submitting order...
'ElementList' object has no attribute 'click'

F:\trainticket_booker-master>[5620:16128:0114/111729.132:ERROR:process_metrics.cc(105)] NOT IMPLEMENTED
[5620:16128:0114/111729.132:ERROR:process_metrics.cc(105)] NOT IMPLEMENTED
[5620:16128:0114/111729.133:ERROR:process_metrics.cc(105)] NOT IMPLEMENTED
[5620:16128:0114/111729.135:ERROR:process_metrics.cc(105)] NOT IMPLEMENTED

The browser first showed the login page; after I logged in it jumped to the booking page, but before I could get a good look it bounced back to the login page. The above is the error output from the command line.
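
The "stale element reference" message generally means the page re-rendered between looking an element up and clicking it, which fits the bounce back to the login page described above. A minimal retry sketch, assuming a Selenium driver and a hypothetical locator rather than the repo's actual code:

    # Re-locate the element on every attempt so a page re-render between
    # lookup and click doesn't leave us holding a stale handle.
    from selenium.common.exceptions import StaleElementReferenceException

    def click_fresh(driver, by, locator, attempts=3):
        for _ in range(attempts):
            try:
                driver.find_element(by, locator).click()
                return True
            except StaleElementReferenceException:
                continue  # element went stale; look it up again
        return False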

Got an error: KeyError: 'aweme_list'

Parsing video links:

Traceback (most recent call last):
  File "douyin_appsign.py", line 325, in <module>
    douyin.run()
  File "douyin_appsign.py", line 283, in run
    video_names, video_urls, share_urls, nickname = self.get_video_urls(user_id, type_flag)
  File "douyin_appsign.py", line 194, in get_video_urls
    for each in html['aweme_list']:
KeyError: 'aweme_list'

It errored out...

Incomplete Bilibili video downloads

python bilibili.py -d lex -k lexburner2009年的零 -p 1

When I download a 10-minute video this way, I only get the first 6 minutes.
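
One plausible cause, offered as a guess rather than a diagnosis: the older Bilibili playurl API returns a video as a 'durl' list of several segments, and fetching only the first segment yields just the opening minutes. A sketch of downloading every segment, with the JSON field names assumed from that API:

    # Fetch every entry in the 'durl' segment list and append them to one file.
    import requests

    def download_all_segments(playurl_json, headers, out_path):
        with open(out_path, 'wb') as f:
            for seg in playurl_json['durl']:      # one entry per video segment
                r = requests.get(seg['url'], headers=headers, stream=True)
                for chunk in r.iter_content(chunk_size=1 << 16):
                    f.write(chunk)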

12306 ticket-grabbing error

Waiting for the captcha; enter it manually...
Ticket-booking page loaded...
Clicking Query in a loop... attempt 1
no elements could be found with text "预订"
Booking not started yet
Clicking Query in a loop... attempt 2
Message: unknown error: Element ... is not clickable at point (1102, 15). Other element would receive the click: ...
  (Session info: chrome=63.0.3213.3)
  (Driver info: chromedriver=2.34.522940 (1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 6.1.7601 SP1 x86_64)
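
"Not clickable at point" usually means an overlay or loading mask is covering the target when the click fires. A common remedy is an explicit wait for clickability, sketched here with Selenium's expected conditions (the locator is an assumption; the script's real selectors may differ):

    # Wait up to 10 s until the booking link is genuinely clickable.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    btn = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.LINK_TEXT, "预订"))
    )
    btn.click()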

Runtime error

Traceback (most recent call last):
  File "D:/Files/Python Practice/test/test.py", line 160, in <module>
    douyin.run()
  File "D:/Files/Python Practice/test/test.py", line 125, in run
    video_names, video_urls, share_urls, nickname = self.get_video_urls(user_id)
  File "D:/Files/Python Practice/test/test.py", line 46, in get_video_urls
    uid = html['user_list'][0]['user_info']['uid']
KeyError: 'user_list'

VIP videos

There's no requirements.txt in the video_downloader directory, so the dependencies can't be installed.
For iQiyi VIP videos, does the tool only produce the free 6-minute preview?

New search API

What needs to change:

search_url = 'https://api.bilibili.com/x/web-interface/search/type?jsonp=jsonp&search_type=video&keyword={}&page={}'

In the returned JSON data:

videos = html["data"]['result']

Thanks a lot for sharing; I've learned a great deal!
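
Putting the two changes together, a minimal sketch of querying the new endpoint (the User-Agent header and error handling are assumptions, not part of the original suggestion):

    # Query the new Bilibili search endpoint and unwrap the 'data' envelope.
    import requests

    search_url = ('https://api.bilibili.com/x/web-interface/search/type'
                  '?jsonp=jsonp&search_type=video&keyword={}&page={}')

    def search_videos(keyword, page=1):
        req = requests.get(search_url.format(keyword, page),
                           headers={'User-Agent': 'Mozilla/5.0'})
        html = req.json()
        return html["data"]['result']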

Spider errors in PyCharm

Hi, I found your spider tutorial on CSDN, thought it was really interesting, gave it a like, and followed along, but ran into a problem; hope you can reply.
The problem is simple. In PyCharm:

import requests

if __name__ == '__main__':
    target = 'http://gitbook.cn/'
    req = requests.get(url=target)
    print(req.text)

It throws an error:

D:\Python\Python36\python.exe "C:/Users/zxy/PycharmProjects/Python Excerise/WebSpider/random.py"
Traceback (most recent call last):
  File "C:/Users/zxy/PycharmProjects/Python Excerise/WebSpider/random.py", line 1, in <module>
    import requests
  File "D:\Python\Python36\lib\site-packages\requests\__init__.py", line 97, in <module>
    from . import utils
  File "D:\Python\Python36\lib\site-packages\requests\utils.py", line 11, in <module>
    import cgi
  File "D:\Python\Python36\lib\cgi.py", line 44, in <module>
    import tempfile
  File "D:\Python\Python36\lib\tempfile.py", line 45, in <module>
    from random import Random as _Random
ImportError: cannot import name 'Random'

But copying the same code into the Python interpreter in cmd runs without any problem.
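
A likely cause can be read off the traceback itself: the script is saved as WebSpider/random.py, so when tempfile.py runs from random import Random, Python finds the script (whose directory sits first on sys.path) instead of the standard-library random module. Renaming the file and removing any stale random.pyc beside it should fix it; the cmd interpreter starts from a different working directory, which is why the same code runs there. A quick check, run from another script in the same folder:

    # Reveals which 'random' actually gets imported; if it prints the path
    # to WebSpider/random.py, the stdlib module is being shadowed.
    import random
    print(random.__file__)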

Douyin search API changed

I just tried the Douyin spider and found that its search API has changed: it now generates two token parameters for every keyword. I don't know how to crack them; could you take a look and see if there's a good approach?
Packet capture shows the latest endpoint is http://aweme.snssdk.com/aweme/v1/general/search
It uses a cookie to check whether you are logged in.
The search parameters also change per keyword: both the mas and as parameters vary.
keyword=***&offset=0&mas=010abf2fd5bc52c15a2cccb755d17528ee1d793a0e0b26f18808de&as=a135dc16bc6edbd9022563
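
Without the signing algorithm, one stopgap is to replay values captured from the app; a sketch under that assumption (every token and cookie below is a placeholder to be copied from a packet capture, and they expire quickly):

    # Replay a captured request against the new endpoint. 'mas' and 'as' are
    # app-generated signature tokens; they vary per keyword and go stale.
    import requests

    params = {
        'keyword': 'example',                 # placeholder keyword
        'offset': 0,
        'mas': '<captured mas token>',        # placeholder
        'as': '<captured as token>',          # placeholder
    }
    cookies = {'sessionid': '<captured session cookie>'}  # placeholder

    resp = requests.get('http://aweme.snssdk.com/aweme/v1/general/search',
                        params=params, cookies=cookies)
    print(resp.json())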

Some problems are troubling me

Hello, master Cui.
I have a question I'd like to ask. My code (it auto-places orders on an e-commerce platform) runs fine on Windows, but on Linux only GET requests get a response; POST requests get none, as if the platform is blocking them. Hoping you can offer some pointers.
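
One low-cost check, offered only as a guess: make sure the Linux run sends exactly the same headers as the Windows run, since platforms often fingerprint and silently drop bare default clients on POST. A sketch (URL, header values, and payload are illustrative placeholders):

    # Send the POST with an explicit, browser-like header set so both
    # platforms present identical requests.
    import requests

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Referer': 'https://example.com/checkout',                  # placeholder
    }
    resp = requests.post('https://example.com/order',               # placeholder
                         headers=headers, data={'item_id': '123'})  # placeholder
    print(resp.status_code, resp.text[:200])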

12306 ticket-grabbing error

Message: unknown error: Element ... is not clickable at point (1112, 99). Other element would receive the click: ...
  (Session info: chrome=63.0.3239.132)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.10586 x86_64)

Feels like you could start a small business with this.

Slider captcha: I built one following yours and hit some problems; could you take a look when you have time? Thanks

https://github.com/sys0613/python-spider/tree/master/geetest — I've uploaded it to my repo for now and will push it to yours once it's working. The slider on the industry-and-commerce site can't be tested right now, so I'm testing against the demo page on the geetest official site. For clicking Login and fetching the slider images, I've written up both the Fiddler captures and my notes.
I don't really understand JS and CSS, so I'd like you to briefly walk me through the flow. Right now I don't know how to move forward to obtain the complete captcha image and the captcha image with the gap.
I have two questions I'd like your help with, thanks. The details are all in the folder at my GitHub link.
Question 1: is the URL of my first request computed by a JS function on the login page? (I can't read JS at the moment.)
Question 2: should the scrambled images in requests 5 and 7 be recombined according to the JS from request 3 or the CSS from request 4 to produce one complete image and one gapped image?
Thanks. My weak spots right now are JS and CSS. I saw the captcha on the industry-and-commerce site you used has changed, so I want to get the geetest official demo working and push it so everyone can use it. That's where I'm stuck.
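
On question 2: in the classic geetest scheme the reassembly order comes from an offset array embedded in the JS (your request 3); the CSS in request 4 only positions the widget. Each downloaded image is a strip of fixed-size slices pasted back into two rows. A sketch with Pillow, where the offset list and the 10x58 slice geometry are assumptions to verify against the actual JS:

    # Reassemble a scrambled geetest-style captcha image. 'offsets' holds the
    # per-slice source coordinates recovered from the page's JS.
    from PIL import Image

    def reassemble(scrambled_path, offsets, slice_w=10, slice_h=58):
        src = Image.open(scrambled_path)
        cols = len(offsets) // 2                  # slices per row, two rows
        out = Image.new('RGB', (slice_w * cols, slice_h * 2))
        for i, (sx, sy) in enumerate(offsets):
            piece = src.crop((sx, sy, sx + slice_w, sy + slice_h))
            out.paste(piece, ((i % cols) * slice_w, 0 if i < cols else slice_h))
        return out

Running it on both the complete and the gapped image with the same offsets should yield the two pictures needed for the gap-matching step.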

NetEase Cloud Music download fails

The post_request function keeps returning "post_request error". After some analysis, it happens while get_song_url is executing. Any idea what the cause is?

"请先登录,再继续搜索吧"

After modifying douyin_pro_2.py:

			req = requests.get(search_url, headers=self.headers)
			html = json.loads(req.text)
			print(html) ###!!!!
			aweme_count = 32767 # html['user_list'][0]['user_info']['aweme_count']
			uid = html['user_list'][0]['user_info']['uid']
$ python douyin_pro_2.py
[...]
{'status_code': 2483, 'rid': '20180713080926010011047200583C2E', 'log_pb': {'impr_id': '20180713080926010011047200583C2E'}, 'status_msg': '请先登录,再继续搜索吧', 'extra': {'logid': '20180713080926010011047200583C2E', 'now': 1531440566657, 'fatal_item_ids': []}}
Traceback (most recent call last):                                                                                                                                                                                                            
  File "douyin_pro_2.py", line 153, in <module>
    douyin.run()
  File "douyin_pro_2.py", line 118, in run
    video_names, video_urls, share_urls, nickname = self.get_video_urls(user_id)
  File "douyin_pro_2.py", line 39, in get_video_urls
    uid = html['user_list'][0]['user_info']['uid']
KeyError: 'user_list'

How do I log in?
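
status_code 2483 with that status_msg indicates the API now rejects anonymous searches. The usual workaround is to attach a cookie captured from a logged-in session; a sketch under that assumption (the cookie name and the status_code-0-means-success convention are both assumptions):

    # Hypothetical fix: reuse a captured login cookie on the search request.
    cookies = {'sessionid': '<value captured from a logged-in session>'}
    req = requests.get(search_url, headers=self.headers, cookies=cookies)
    html = json.loads(req.text)
    if html.get('status_code') != 0:              # assuming 0 means success
        raise RuntimeError(html.get('status_msg', 'search rejected'))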

To the author

Could you add a spider for batch-downloading Ximalaya audio?

The novel script seems to error out on novels with longer chapter lists and doesn't show download progress

# The catalog URL I used; it errors on a novel with over a thousand chapters: http://www.biqukan.com/1_1094/
Traceback (most recent call last):
  File "D:/py17/爬虫/小说.py", line 137, in <module>
    name,numbers,url_dict = d.get_download_url()
  File "D:/py17/爬虫/小说.py", line 73, in get_download_url
    download_dict['第' + str(numbers) + '章 ' + names[1]] = download_url
IndexError: list index out of range

# The list doesn't seem to capture the whole catalog: it only reaches chapter 1259, while the novel actually runs to 1260
1277 1278 ['第1256', ' 帝戟']
1278 1279 ['第1257', ' 你敢骂我?']
1279 1280 ['第1258', ' 陨落']
1280 1281 ['第1259张 风波再起!']

Process finished with exit code 1
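
The last output line points at the cause: chapter 1259's title on the site reads 第1259张 风波再起! (张 where 章 was meant), so splitting the title on '章' yields a one-element list and names[1] raises the IndexError, which also explains why the list stops at 1259. A defensive sketch, with the raw title held in a hypothetical each_title variable:

    # Tolerate catalog entries whose titles don't split cleanly on '章',
    # e.g. the site's typo '第1259张 风波再起!'. Not the original script's code.
    parts = each_title.split('章')
    chapter_name = parts[1].strip() if len(parts) >= 2 else each_title
    download_dict['第' + str(numbers) + '章 ' + chapter_name] = download_url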
