driveit's People

Contributors

trim21, xiazy

driveit's Issues

DMZJ reports: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

My command: python3 ./driveit.py http://www.dmzj.com/info/chuanlingwuyu.html
Printed output:
url http://www.dmzj.com/info/chuanlingwuyu.html header {'Referer': '', 'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4'}
Traceback (most recent call last):
File "./driveit.py", line 99, in <module>
website_object = SiteClass(user_input_url)
File "/Users//Documents//DriveIt-master/sites.py", line 107, in __init__
self.flyleaf_data = self.get_data(self.flyleaf_url).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
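
A possible explanation, offered as a guess rather than a confirmed diagnosis: gzip streams start with the two bytes 0x1f 0x8b, so a 0x8b at position 1 suggests the server returned a gzip-compressed body that was decoded as text without being decompressed first. A minimal sketch of a guard, where the helper name decode_page is hypothetical and not part of DriveIt:

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_page(raw: bytes, encoding: str = "utf-8") -> str:
    """Decompress the body if it looks like gzip, then decode it as text."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)
```

Under that assumption, the failing line would become something like self.flyleaf_data = decode_page(self.get_data(self.flyleaf_url)).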

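PNG comic pages are mistakenly saved with a .jpg extension

Test URL:
http://www.dm5.com/manhua-yiquanchaoren/
The problem can be reproduced by downloading up to "外传:第8话" (Side Story: Chapter 8): the images cannot be opened, but after renaming the extension to .png they display normally.

Image URLs such as
http://manhua1025.61-174-50-141.cdndm5.com/11/10684/399544/1_7450.jpg?cid=399544&key=26598c8617417b16446db25625610c9a
http://manhua1023.61-174-50-131.cdndm5.com/11/10684/145458/3_5957.png?cid=145458&key=26598c8617417b16446db25625610c9a
contain the image format, so it is easy to detect.

P.S. Your source code has been very helpful. Thank you!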
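
As a rough sketch of the suggestion above, the extension can be read from the path component of the image URL instead of being hard-coded. The function name image_extension and the ".jpg" fallback are assumptions for illustration, not DriveIt's actual code:

```python
import posixpath
from urllib.parse import urlsplit


def image_extension(image_url: str, default: str = ".jpg") -> str:
    """Return the file extension found in the URL path, ignoring the query string."""
    path = urlsplit(image_url).path
    ext = posixpath.splitext(path)[1].lower()
    # Fall back to the default when the path carries no extension.
    return ext or default
```

The file name used when saving a page could then be built from the page number plus image_extension(link).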

I can't believe I'm actually filing an issue

It looks like the dm5 page structure has changed?

···
URL?
http://www.dm5.com/manhua-cudianxinzhanzheng/
Traceback (most recent call last):
File ".\driveit.py", line 30, in <module>
ref_box = website_object.get_parent_info()
File "C:\Users\Niu\DriveIt\sites.py", line 72, in get_parent_info
ref_title = li.a['title']
File "C:\Users\Niu\AppData\Local\Programs\Python\Python35\lib\site-packages\bs4\element.py", line 958, in __getitem__
return self.attrs[key]
KeyError: 'title'
···
This error came up just now while crawling 粗点心战争.
It would be great if the author were willing to fix it.
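
The KeyError comes from indexing the tag with li.a['title'] when the anchor has no title attribute. One defensive option, sketched here with a hypothetical helper name (link_title), is to use the tag's get method and fall back to the link text. Both get and get_text are Beautiful Soup Tag methods; whether the link text is an acceptable substitute on the new dm5 layout is an assumption:

```python
def link_title(anchor, default=""):
    """Return the anchor's title attribute, or its visible text if the attribute is missing."""
    # Tag.get returns None instead of raising KeyError for a missing attribute.
    title = anchor.get("title")
    if title:
        return title
    # Fall back to the stripped link text, then to the caller's default.
    return anchor.get_text(strip=True) or default
```

With this, the failing line could read ref_title = link_title(li.a).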

Problem when the URL is not pure ASCII

Test comic URL:

http://www.dm5.com/manhua-fangxuehoudefengbaoguanxianledui/

Steps:

PS E:\LearnPython\DriveIt> python3 .\driveit.py
URL?
http://www.dm5.com/manhua-fangxuehoudefengbaoguanxianledui/
Where to save?
E:\BaiduYunDownload\漫画\

Result:

放学后的风暴管弦乐队, total 5 chapters detected.
Traceback (most recent call last):
  File ".\driveit.py", line 67, in <module>
    main_loop(ref_box)
  File ".\driveit.py", line 12, in main_loop
    website_object.down(comic_name, parent_link, link, title, page)
  File "E:\LearnPython\DriveIt\sites.py", line 94, in down
    img_data = self.get_data(link, 'http://www.dm5.com%s' % parent_link)
  File "E:\LearnPython\DriveIt\base.py", line 23, in get_data
    web_page = request.urlopen(req)
  File "E:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "E:\Python34\lib\urllib\request.py", line 463, in open
    response = self._open(req, data)
  File "E:\Python34\lib\urllib\request.py", line 481, in _open
    '_open', req)
  File "E:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "E:\Python34\lib\urllib\request.py", line 1210, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "E:\Python34\lib\urllib\request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "E:\Python34\lib\http\client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "E:\Python34\lib\http\client.py", line 1116, in _send_request
    self.putrequest(method, url, **skips)
  File "E:\Python34\lib\http\client.py", line 973, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-21: ordinal not in range(128)

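Problem: the URL contains Chinese characters.
I inspected the page elements with Firebug and found that the image URL is: http://manhua1014.61-147-113-113.cdndm5.com/f/放学后的风暴管弦乐队/放学后的风暴管弦队_ch01/000001_fb457d98.jpg?cid=49096&key=acd841ae172b8fa7b82c1a60d545f8ae
Proposed fix:
I will try to put together a PR later. I do not know how to fix it myself, but the principle is explained in the Zhihu post "urlopen的中文问题" (the Chinese-character problem with urlopen).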
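
A common approach to this error is to percent-encode the non-ASCII parts of the URL before handing it to urlopen. The sketch below uses urllib.parse.quote, which encodes str input as UTF-8 by default. The helper name ascii_safe_url is hypothetical, and it only handles the path and query (a non-ASCII host name would need separate treatment):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def ascii_safe_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # Keep "/" and existing "%" escapes so already-encoded URLs are not double-encoded.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

In get_data, the link would be passed through ascii_safe_url before building the Request object. URLs that are already pure ASCII come back unchanged.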

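Compression support could be added to the headers

requests automatically decodes compressed responses such as gzip and deflate, so in the next commit it would be worth adding
"Accept-Encoding": "gzip, deflate, sdch" to the headers. It should help with speed.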

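One caveat, sketched under the assumption that the project fetches pages with urllib.request (as the tracebacks on this page suggest): urllib does not decompress response bodies on its own, so a client that advertises Accept-Encoding has to decode the body itself. The names REQUEST_HEADERS and decode_body are hypothetical, and sdch is left out because the standard library has no decoder for it:

```python
import gzip
import zlib

# Only advertise encodings that the code below can actually decode.
REQUEST_HEADERS = {"Accept-Encoding": "gzip, deflate"}


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Decompress a response body according to its Content-Encoding header value."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # Standard form: deflate data wrapped in a zlib header.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No (or unknown) encoding: return the body untouched.
    return body
```

With urllib, the header value would come from the response object, for example web_page.headers.get("Content-Encoding", "").
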
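dm5 seems to have been redesigned again...

Cause: dm5 apparently had no mobile-adapted pages before. Now it does, and many elements have been renamed.

Fix:
Do not spoof a mobile UA for dm5. A desktop (laptop) browser UA works.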

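from urllib import request  # required at the top of the file for this override


class DM5(SharedBase):
    def get_data(self, url, referrer=''):
        # Send a desktop browser UA so dm5 serves the non-mobile page layout.
        self.webheader = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
            'Referer': referrer}
        req = request.Request(url=url, headers=self.webheader)
        web_page = request.urlopen(req)
        page_data = web_page.read()
        return page_data

    def __init__(self, url):
        ...  # rest of the class is not shown in the original post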

I overrode the get_data method, but that requires adding from urllib import request at the top of the file. Is there a more elegant way?
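
One possibly tidier option is to keep a single get_data in the base class and let each site override only the User-Agent as a class attribute. This is a sketch under assumptions: the SharedBase below is a simplified stand-in and not the project's real base class, and the attribute name user_agent and the method build_headers are hypothetical.

```python
from urllib import request

MOBILE_UA = ('Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 '
             '(KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4')
DESKTOP_UA = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36')


class SharedBase:
    """Simplified stand-in for the project's base class (assumed structure)."""
    user_agent = MOBILE_UA  # default for sites that need the mobile layout

    def build_headers(self, referrer=''):
        return {'User-Agent': self.user_agent, 'Referer': referrer}

    def get_data(self, url, referrer=''):
        req = request.Request(url=url, headers=self.build_headers(referrer))
        with request.urlopen(req) as web_page:
            return web_page.read()


class DM5(SharedBase):
    # Only the UA differs, so the inherited get_data is reused as-is.
    user_agent = DESKTOP_UA
```

With this layout the site module would not need its own urllib import, since the request logic stays in the base module.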

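Please keep your personal information safe

Hello, good sir. I found my way here because of how active you are on a certain website. From now on, please delete the .git files. After all, the Two Sessions are under way, and leaving personal details exposed there is rather risky.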
