lightnovel-center / linovelib2epub
Crawl light novels from some websites and convert them to EPUB.
Home Page: https://pypi.org/project/linovelib2epub/
License: GNU Affero General Public License v3.0
ID 3721
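In the downloaded EPUB, the body of every chapter is displayed as follows:
(ò﹏ò)
Sorry, the chapter content cannot be displayed in this browser~
【To use the full reading features】
please consider reading with a native browser such as 〔Chrome〕, 〔Safari〕, or
〔Edge〕!
Thank you!!!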
I can read the chapter content normally in a browser:
https://www.linovelib.com/novel/8/194015.html
view-source:https://www.linovelib.com/novel/8/194015.html
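I'm using Chrome.
After downloading, the body text is garbled, and the garbling is identical to what appears in the chapter body when I right-click in the browser and view the page source. What is causing this?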
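On bilinovel (嗶哩), book ID 2978 has 431 chapters in total.
It took four attempts before one run completed.
[Runs stop with an error after roughly 300 chapters.]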
Traceback (most recent call last):
File "[path]\linovelib2epub-main\main.py", line 28, in <module>
linovelib_epub.run()
File "[path]\linovelib2epub-main\src\linovelib2epub\linovel.py", line 405, in run
novel = self._spider.fetch()
^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 49, in fetch
novel_whole = self._fetch()
^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 402, in _fetch
new_novel_with_content = self._crawl_book_content(book_catalog_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 181, in _crawl_book_content
raise Exception(f'[ERROR]: request {page_link} failed.')
Exception: [ERROR]: request [url] failed.
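I'm not sure whether the tool is configured to abort after a fixed number of errors.
If it is, I suggest scaling the allowed number of retries with the total number of chapters.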
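A minimal sketch of the suggestion above, assuming nothing about how linovelib2epub actually implements retries: the retry budget grows with the chapter count, and each retry waits roughly twice as long as the previous one, plus random jitter. `max_retries_for` and `fetch_with_backoff` are hypothetical helpers, not part of the project's API, and the numbers in the policy are arbitrary.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def max_retries_for(total_chapters: int, base: int = 5, per_100_chapters: int = 2) -> int:
    """Hypothetical policy: start at `base` retries and add more for every 100 chapters."""
    return base + per_100_chapters * (total_chapters // 100)


def fetch_with_backoff(
    fetch: Callable[[], Optional[T]],
    max_retries: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it returns something other than None.

    After the n-th failure (n starting at 0), wait base_delay * 2**n seconds,
    capped at max_delay, plus up to 50% random jitter.
    """
    for attempt in range(max_retries + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_retries:
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(f"giving up after {max_retries + 1} attempts")
```

For the 431-chapter book above, this particular policy gives max_retries_for(431) == 13 retries per page. The `sleep` parameter exists only so the waiting can be replaced in tests.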
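Describe the bug
After I select chapters to download, the browser refreshes frantically and then trips the site's anti-bot risk control. The browser and driver are confirmed to be the latest version, 124.0.6367.91.
To Reproduce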
from linovelib2epub import Linovelib2Epub

if __name__ == '__main__':
    # /path/to/chromedriver
    browser_driver_path = r'C:\Users\Administrator\Desktop\chromedriver-win64\chromedriver.exe'
    linovelib_epub = Linovelib2Epub(book_id=8, chapter_crawl_delay=3, page_crawl_delay=2, select_volume_mode=True,
                                    browser_driver_path=browser_driver_path,
                                    log_level='DEBUG')
    linovelib_epub.run()
Screenshots or Video
Environment
Python 3.10.9
pip --version
Additional context (optional)
www.bilinovel.com_8.log
A 403 error occurs; the full output is:
(.venv) PS [path]\linovelib2epub-main> python .\main.py
2024-02-18,02:55:00 WARNING MasiroSpider Request https://masiro.me/admin/auth/login succeed but status code is 403. utils.py:105
2024-02-18,02:55:01 WARNING MasiroSpider current_num_of_request: 1 utils.py:114
2024-02-18,02:55:02 WARNING MasiroSpider Request https://masiro.me/admin/auth/login succeed but status code is 403. utils.py:105
2024-02-18,02:55:03 WARNING MasiroSpider current_num_of_request: 2 utils.py:114
WARNING MasiroSpider Request https://masiro.me/admin/auth/login succeed but status code is 403. utils.py:105
2024-02-18,02:55:04 WARNING MasiroSpider current_num_of_request: 3 utils.py:114
WARNING MasiroSpider Request https://masiro.me/admin/auth/login succeed but status code is 403. utils.py:105
2024-02-18,02:55:05 WARNING MasiroSpider current_num_of_request: 4 utils.py:114
2024-02-18,02:55:06 WARNING MasiroSpider Request https://masiro.me/admin/auth/login succeed but status code is 403. utils.py:105
2024-02-18,02:55:07 WARNING MasiroSpider current_num_of_request: 5 utils.py:114
WARNING MasiroSpider Request https://masiro.me/admin/auth/login succeed but status code is 403. utils.py:105
2024-02-18,02:55:08 WARNING MasiroSpider current_num_of_request: 6 utils.py:114
Traceback (most recent call last):
File "[path]\linovelib2epub-main\main.py", line 28, in <module>
linovelib_epub.run()
File "[path]\linovelib2epub-main\src\linovelib2epub\linovel.py", line 405, in run
novel = self._spider.fetch()
^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\masiro_spider.py", line 49, in fetch
novel = asyncio.run(self._fetch())
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2288.0_x64__qbz5n2kfra8p0\Lib\asyncio\runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2288.0_x64__qbz5n2kfra8p0\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2288.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\masiro_spider.py", line 67, in _fetch
login_info = await self._login(session)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\masiro_spider.py", line 98, in _login
await self._masiro_get_token(login_info, session)
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\masiro_spider.py", line 453, in _masiro_get_token
page_body = html.fromstring(res)
^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\.venv\Lib\site-packages\lxml\html\__init__.py", line 872, in fromstring
is_full_html = _looks_like_full_html_unicode(html)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'NoneType'
I can access masiro.me normally in a browser (including reading articles after logging in), so my network environment is not the problem.
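The traceback shows that after six 403 responses, `_masiro_get_token` still passed `res` to `html.fromstring`, and `res` was `None`, which surfaced as an opaque `TypeError`. A minimal sketch of a guard that fails with a readable message instead; `parse_page` and `PageFetchFailed` are hypothetical names, and this only improves the error report, it does not get past the 403 itself.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PageFetchFailed(RuntimeError):
    """Raised when a request produced no body, e.g. because every retry got HTTP 403."""


def parse_page(res: Optional[str], url: str, parse: Callable[[str], T]) -> T:
    """Pass res to the parser only if the download actually succeeded."""
    if res is None:
        # Without this check, lxml.html.fromstring(None) raises the opaque
        # "TypeError: expected string or bytes-like object, got 'NoneType'".
        raise PageFetchFailed(
            f"no response body for {url} after all retries "
            "(for example, every attempt returned HTTP 403)"
        )
    return parse(res)
```

In the spider this would be called roughly as parse_page(res, login_url, html.fromstring), with html imported from lxml; login_url is a placeholder for whatever variable holds the login URL.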
from linovelib2epub.linovel import Linovelib2Epub

# warning!: must run within __main__ module guard due to process spawn issue.
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279, select_volume_mode=True)
    linovelib_epub.run()
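The EPUB downloaded with the code above has no body text; every chapter shows "NONE". I had used this for months without any problems, but it stopped working a few days ago. Thanks.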
I found that when there are more than four chapters, I get the error below. Adding a sleep after each fetch fixes it (I use a five-second pause).
Traceback (most recent call last):
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 244, in _expand_paginated_chapter_links
raise Exception(f'[ERROR]: request {url_next} failed.')
Exception: [ERROR]: request [url] failed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "[path]\linovelib2epub-main\main.py", line 6, in <module>
linovelib_epub.run()
File "[path]\linovelib2epub-main\src\linovelib2epub\linovel.py", line 405, in run
novel = self._spider.fetch()
^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 47, in fetch
novel_whole = self._fetch()
^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 395, in _fetch
new_novel_with_content = self._crawl_book_content(book_catalog_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 167, in _crawl_book_content
url_next = self._expand_paginated_chapter_links(catalog_chapter, url_next)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 251, in _expand_paginated_chapter_links
raise Exception(f'[ERROR]: request {url_next} failed.')
Exception: [ERROR]: request [url] failed.
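The workaround above amounts to throttling: wait a fixed interval between consecutive requests. A minimal sketch of that idea, where `crawl_politely` is a hypothetical helper and the 5-second default mirrors the reporter's setting. An earlier report in this thread passes `chapter_crawl_delay` and `page_crawl_delay` to `Linovelib2Epub`; judging by their names these may be the built-in way to do the same thing, but that is an assumption worth checking against the project's documentation.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Fetch URLs one at a time, pausing delay_seconds between consecutive requests."""
    results: List[T] = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # no pause before the very first request
        results.append(fetch(url))
    return results
```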
Code:
from linovelib2epub import Linovelib2Epub, TargetSite

bookId = 251
# raw string so the backslashes in the Windows path are not treated as escape sequences
browserPath = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"

if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=bookId, target_site=TargetSite.MASIRO, browser_path=browserPath)
    linovelib_epub.run()
Log:
2024-03-24,23:16:09 INFO Linovelib2Epub linovel.py :428 ================================================================================
2024-03-24,23:25:41 INFO MasiroSpider masiro_spider.py :250 -> 已登录
2024-03-24,23:25:45 INFO MasiroSpider masiro_spider.py :454 User points balance is 646.
2024-03-24,23:25:45 INFO MasiroSpider masiro_spider.py :98 当前所有卷都是免费积分或你已经购买,直接执行下载。
2024-03-24,23:25:45 INFO MasiroSpider masiro_spider.py :131 page url set = 0
2024-03-24,23:25:45 INFO MasiroSpider masiro_spider.py :139 DOWNLOAD_PAGES concurrency level: 1.
2024-03-24,23:25:45 INFO MasiroSpider base_spider.py :330 volume: 公告
2024-03-24,23:25:45 INFO MasiroSpider base_spider.py :330 volume: 译名整合
2024-03-24,23:25:45 INFO MasiroSpider base_spider.py :330 volume: 文库插图
2024-03-24,23:25:45 INFO MasiroSpider base_spider.py :330 volume: web版
2024-03-24,23:25:45 INFO Linovelib2Epub linovel.py :418 The data of book(id=251) except image files is ready.
2024-03-24,23:25:45 INFO MasiroSpider base_spider.py :207 Image download strategy: ASYNCIO
2024-03-24,23:25:45 INFO MasiroSpider base_spider.py :215 len of image list: 1
2024-03-24,23:25:45 INFO MasiroSpider base_spider.py :112 len of light_novel_images= 1
2024-03-24,23:25:48 INFO MasiroSpider base_spider.py :183 image url https://masiro.me/images/encode/cover-210615175153-CpMM.png => local relative path novel_images/masiro.me/251/cover-210615175153-CpMM.png ok.
2024-03-24,23:25:48 INFO MasiroSpider base_spider.py :154 SUCCEED_COUNT: 1
2024-03-24,23:25:48 INFO MasiroSpider base_spider.py :155 [NEXT TURN]Pending task count: 0
2024-03-24,23:25:48 INFO MasiroSpider base_spider.py :193 (Perf metrics) Download Images took: 3.654403300024569 seconds
2024-03-24,23:25:48 INFO EpubWriter linovel.py :42 [Config]: has_illustration: True; divide_volume: False
2024-03-24,23:25:48 INFO EpubWriter linovel.py :60 (Perf metrics) Write epub took: 0.0743418000638485 seconds
2024-03-24,23:25:48 INFO Linovelib2Epub linovel.py :425 Write epub finished. Now delete all the artifacts if set.
2024-03-24,23:25:48 INFO Linovelib2Epub linovel.py :428 ================================================================================
转生王女和天才千金的魔法革命【Web版】 has two chapters with the same title, which causes the earlier chapter to be swallowed when downloading by volume.
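One plausible cause, not confirmed by the report: if chapters are collected in a dict keyed by title, a later chapter with the same title silently overwrites the earlier one. A minimal sketch of one way to avoid that is to make repeated titles unique before using them as keys; `unique_chapter_keys` is a hypothetical helper, and the suffix format is an arbitrary choice.

```python
from collections import Counter
from typing import List


def unique_chapter_keys(titles: List[str]) -> List[str]:
    """Append " (2)", " (3)", ... to repeated titles so no chapter key is reused."""
    seen = Counter()
    keys: List[str] = []
    for title in titles:
        seen[title] += 1
        keys.append(title if seen[title] == 1 else f"{title} ({seen[title]})")
    return keys
```

With titles ["Chapter 1", "Interlude", "Chapter 2", "Interlude"], dict(zip(titles, contents)) keeps only three entries, while keying by unique_chapter_keys(titles) keeps all four. Keying by chapter position or chapter URL instead of title would avoid the collision as well.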
When I run python -m pip install -e . inside a venv, it fails with:
ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: ~/repos/linovelib2epub
(A "pyproject.toml" file was found, but editable mode currently requires a setup.py based build.)
Trying the --no-use-pep517 option still fails with the same error.
It looks like a setup.py is needed to install the project this way, so either the documentation should be updated or a setup.py should be added.
@all-contributors please add @inkroom as a contributor for bug, code.
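Cover URLs on 哔哩轻小说 (linovelib) now contain a ? character, for example https://w.linovelib.com/files/article/image/2/2044/2044s.jpg?540916
Windows does not allow ? in paths, so the cover cannot be saved when images are downloaded, and generating the EPUB then fails because the cover image cannot be found.
I haven't come up with a general fix. Perhaps the URL and the filename need to be stored in two separate dicts, or the values of the existing image_dict could be changed to lists that hold both the URL and the filename?
Test URL: https://w.linovelib.com/novel/3626.html
Error: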
OSError: [Errno 22] Invalid argument: 'images/image-3-3626-3626s.jpg?553400'
A temporary workaround: in the _crawl_book_basic_info method of src/linovelib2epub/spider/linovelib_mobile_spider.py, find:
book_cover_url = soup.find('img', {'class': 'book-cover'})['src']
and change it to:
book_cover_url = soup.find('img', {'class': 'book-cover'})['src'].split("?")[0]
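The workaround strips the query string from the cover URL itself, which also changes the URL used for downloading. A minimal sketch of the reporter's other idea, keeping the original URL and a Windows-safe local filename side by side; `local_image_name` and `build_image_map` are hypothetical helpers, not the project's code, and the project's real naming scheme (e.g. the image-3-3626- prefix in the error above) is not reproduced here.

```python
import posixpath
import re
from typing import Dict, Iterable
from urllib.parse import urlsplit

# Characters Windows does not allow in file names.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def local_image_name(url: str) -> str:
    """Build a file name from an image URL, ignoring its query string."""
    path = urlsplit(url).path        # the query string such as "?540916" is not part of .path
    name = posixpath.basename(path)  # last path segment, e.g. "2044s.jpg"
    return _WINDOWS_FORBIDDEN.sub("_", name)


def build_image_map(urls: Iterable[str]) -> Dict[str, str]:
    """Map each original URL (still used for downloading) to its safe local file name."""
    return {url: local_image_name(url) for url in urls}
```

Because the dict is keyed by the full URL, downloads keep using the exact address the site serves. Two different URLs can still map to the same file name, so a real implementation would need to detect or prevent such collisions.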
Describe the bug
The browser that pops up displays the page normally, but the tool reports that the content is empty.
To Reproduce
Code and steps to reproduce (e.g. branch selection, volume selection):
Python file:
from linovelib2epub import Linovelib2Epub

if __name__ == "__main__":
    linovelib_epub = Linovelib2Epub(
        book_id=2264,
        select_volume_mode=True,
        log_level="DEBUG",
    )
    linovelib_epub.run()
Run:
(.venv) PS D:\tmp\linovelib2epub> python .\main.py
2024-04-24,01:52:06 INFO LinovelibMobileSpider _html_content_id=acontentz linovelib_mobile_spider.py:33
INFO LinovelibMobileSpider len(_mapping_dict)=104 linovelib_mobile_spider.py:35
2024-04-24,01:52:07 INFO LinovelibMobileSpider Succeed to get the novel of book_id: linovelib_mobile_spider.py:88
2264
INFO LinovelibMobileSpider book name:《月光下的异世界之旅》 linovelib_mobile_spider.py:98
INFO LinovelibMobileSpider Succeed to get the catalog of book_id: linovelib_mobile_spider.py:148
2264
[?] Which volumes you want to download?(use SPACE to select one or multiple volumes):
> [X] 第一卷
[ ] 第二卷
[ ] 第三卷
[ ] 第四卷
[ ] 第五卷
[ ] 第六卷
[ ] 第七卷
[ ] 第八卷
[ ] 第8.5卷
[ ] 第九卷
[ ] 第十卷
[ ] 第十一卷
[ ] 第十二卷
2024-04-24,01:52:10 INFO LinovelibMobileSpider volume: 第一卷 linovelib_mobile_spider.py:164
INFO LinovelibMobileSpider chapter : 插图 linovelib_mobile_spider.py:178
DevTools listening on ws://127.0.0.1:64783/devtools/browser/87ee5b70-09c3-4015-a5bf-bc617950e392
2024-04-24,01:52:12 INFO LinovelibMobileSpider navigator.language.toLowerCase()=zh-tw linovelib_mobile_spider.py:368
2024-04-24,01:52:14 INFO LinovelibMobileSpider 初始化 Driver 完毕... linovelib_mobile_spider.py:379
2024-04-24,01:52:25 WARNING LinovelibMobileSpider linovelib_mobile_spider.py:306
https://www.bilinovel.com/novel/2264/122121.html encountered
TimeoutException.
WARNING LinovelibMobileSpider Retrying linovelib_mobile_spider.py:327
https://www.bilinovel.com/novel/2264/122121.html(1/10)...;
retry_interval: 1.76(s)
2024-04-24,01:52:28 DEBUG LinovelibMobileSpider linovelib_mobile_spider.py:202
page(https://www.bilinovel.com/novel/2264/122121.html)
size=17875
INFO LinovelibMobileSpider chapter : [插图] New Title= [插圖] linovelib_mobile_spider.py:210
CRITICAL LinovelibMobileSpider The content of linovelib_mobile_spider.py:224
https://www.bilinovel.com/novel/2264/122121.html is Empty
and content_id =acontentz.Please report this bug to [github
issue](https://github.com/lightnovel-center/linovelib2epub/i
ssues).
Log file:
2024-04-24,01:57:13 INFO LinovelibMobileSpider linovelib_mobile_spider.py:33 _html_content_id=acontentz
2024-04-24,01:57:13 INFO LinovelibMobileSpider linovelib_mobile_spider.py:35 len(_mapping_dict)=104
2024-04-24,01:57:14 INFO LinovelibMobileSpider linovelib_mobile_spider.py:88 Succeed to get the novel of book_id: 2264
2024-04-24,01:57:14 INFO LinovelibMobileSpider linovelib_mobile_spider.py:98 book name:《月光下的异世界之旅》
2024-04-24,01:57:14 INFO LinovelibMobileSpider linovelib_mobile_spider.py:148 Succeed to get the catalog of book_id: 2264
2024-04-24,01:57:17 INFO LinovelibMobileSpider linovelib_mobile_spider.py:164 volume: 第一卷
2024-04-24,01:57:17 INFO LinovelibMobileSpider linovelib_mobile_spider.py:178 chapter : 插图
2024-04-24,01:57:19 INFO LinovelibMobileSpider linovelib_mobile_spider.py:368 navigator.language.toLowerCase()=zh-tw
2024-04-24,01:57:23 INFO LinovelibMobileSpider linovelib_mobile_spider.py:379 初始化 Driver 完毕...
2024-04-24,01:57:25 DEBUG LinovelibMobileSpider linovelib_mobile_spider.py:202 page(https://www.bilinovel.com/novel/2264/122121.html) size=16861
2024-04-24,01:57:25 INFO LinovelibMobileSpider linovelib_mobile_spider.py:210 chapter : [插图] New Title= [插圖]
2024-04-24,01:57:25 CRITICAL LinovelibMobileSpider linovelib_mobile_spider.py:224 The content of https://www.bilinovel.com/novel/2264/122121.html is Empty and content_id =acontentz.Please report this bug to [github issue](https://github.com/lightnovel-center/linovelib2epub/issues).
Expected behavior
The content is fetched normally.
Screenshots or Video
The web page was displayed normally before the error.
Environment
Package Version Editable project location
------------------ ----------- -------------------------
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
ansicon 1.89.0
attrs 23.2.0
beautifulsoup4 4.12.3
blessed 1.20.0
Brotli 1.1.0
bs4 0.0.2
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
cssselect 1.2.0
DataRecorder 3.4.14
demjson3 3.0.6
DownloadKit 2.0.0
DrissionPage 4.0.4.21
dynaconf 3.2.5
EbookLib 0.18
editor 1.6.6
esprima 4.0.1
et-xmlfile 1.1.0
fake-useragent 1.5.1
filelock 3.13.4
frozenlist 1.4.1
h11 0.14.0
idna 3.7
inquirer 3.2.4
jinxed 1.2.1
linovelib2epub 0.1.3 D:\tmp\linovelib2epub
lxml 5.2.1
markdown-it-py 3.0.0
mdurl 0.1.2
multidict 6.0.5
openpyxl 3.1.2
outcome 1.3.0.post0
pillow 10.3.0
pip 24.0
psutil 5.9.8
pycparser 2.22
Pygments 2.17.2
PySocks 1.7.1
readchar 4.0.6
requests 2.31.0
requests-file 2.0.0
rich 13.7.1
runs 1.2.2
selenium 4.19.0
setuptools 69.5.1
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
soupsieve 2.5
tabulate 0.9.0
tldextract 5.1.2
trio 0.25.0
trio-websocket 0.11.1
typing_extensions 4.11.0
urllib3 2.2.1
uuid 1.30
wcwidth 0.2.13
websocket-client 1.7.0
wsproto 1.2.0
xmod 1.8.1
yarl 1.9.4
Log from the run:
2024-04-04,16:12:18 INFO LinovelibMobileSpider Succeed to get the novel of book_id: 8 linovelib_mobile_spider.py:85
INFO LinovelibMobileSpider book linovelib_mobile_spider.py:95
name:《欢迎来到实力至上主义的教室》
INFO LinovelibMobileSpider Succeed to get the catalog of book_id: linovelib_mobile_spider.py:145
8
[?] Which volumes you want to download?(use SPACE to select one or multiple volumes):
[ ] 第十三卷 二年级篇 2
[ ] 第十四卷 二年级篇 3
[ ] 第十五卷 二年级篇 4
[ ] 第15.5卷 二年级篇 4.5
[ ] 第十六卷 二年级篇 5
[ ] 第十七卷 二年级篇 6
[ ] 第十八卷 二年级篇 7
[ ] 第十九卷 二年级篇 8
[ ] 第〇卷
[ ] 第二十卷 二年级篇 9
[ ] 第20.5卷 二年级篇 9.5
[ ] 第二十一卷 二年级篇 10
> [X] 第二十二卷 二年级篇 11
2024-04-04,16:12:22 INFO LinovelibMobileSpider volume: 第二十二卷 二年级篇 11 linovelib_mobile_spider.py:161
2024-04-04,16:12:25 INFO LinovelibMobileSpider chapter : 插图 linovelib_mobile_spider.py:175
DevTools listening on ws://127.0.0.1:52724/devtools/browser/55e8e920-83b7-4954-9c4e-6d99bcc29dc1
2024-04-04,16:12:33 INFO LinovelibMobileSpider 初始化 Driver 完毕... linovelib_mobile_spider.py:344
2024-04-04,16:12:35 INFO LinovelibMobileSpider chapter : [插图] New Title= [插圖] linovelib_mobile_spider.py:208
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226398.html
2024-04-04,16:12:38 INFO LinovelibMobileSpider chapter : 山村M纪的独白 linovelib_mobile_spider.py:175
2024-04-04,16:12:41 INFO LinovelibMobileSpider chapter : [山村M纪的独白] New Title= linovelib_mobile_spider.py:208
[山村美紀的獨白]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226399.html
2024-04-04,16:12:44 INFO LinovelibMobileSpider chapter : 若隐若现的双方面谈 linovelib_mobile_spider.py:175
2024-04-04,16:12:47 INFO LinovelibMobileSpider chapter : [若隐若现的双方面谈] New linovelib_mobile_spider.py:208
Title= [若隱若現的雙方面談]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226400.html
2024-04-04,16:12:49 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226400_2.html
2024-04-04,16:12:52 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226400_3.html
2024-04-04,16:12:55 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226400_4.html
2024-04-04,16:12:57 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226400_5.html
2024-04-04,16:13:00 INFO LinovelibMobileSpider chapter : 交流合宿 linovelib_mobile_spider.py:175
2024-04-04,16:13:02 WARNING LinovelibMobileSpider Request https://www.bilinovel.com/novel/8/226401_8.html utils.py:52
succeed but data is empty, retry 1 times
WARNING LinovelibMobileSpider current_num_of_request: 1; retry_interval: 1.14(s) utils.py:77
2024-04-04,16:13:03 WARNING LinovelibMobileSpider Request https://www.bilinovel.com/novel/8/226401_8.html utils.py:52
succeed but data is empty, retry 2 times
WARNING LinovelibMobileSpider current_num_of_request: 2; retry_interval: 2.24(s) utils.py:77
2024-04-04,16:13:05 WARNING LinovelibMobileSpider Request https://www.bilinovel.com/novel/8/226401_8.html utils.py:52
succeed but data is empty, retry 3 times
WARNING LinovelibMobileSpider current_num_of_request: 3; retry_interval: 4.39(s) utils.py:77
2024-04-04,16:13:10 WARNING LinovelibMobileSpider Request https://www.bilinovel.com/novel/8/226401_8.html utils.py:52
succeed but data is empty, retry 4 times
WARNING LinovelibMobileSpider current_num_of_request: 4; retry_interval: 8.92(s) utils.py:77
2024-04-04,16:13:22 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401.html
2024-04-04,16:13:24 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_2.html
2024-04-04,16:13:27 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_3.html
2024-04-04,16:13:29 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_4.html
2024-04-04,16:13:32 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_5.html
2024-04-04,16:13:35 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_6.html
2024-04-04,16:13:37 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_7.html
2024-04-04,16:13:40 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_8.html
2024-04-04,16:13:43 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226401_9.html
2024-04-04,16:13:46 INFO LinovelibMobileSpider chapter : 堀北的请求与绫小路的请求 linovelib_mobile_spider.py:175
2024-04-04,16:13:49 INFO LinovelibMobileSpider chapter : [堀北的请求与绫小路的请求] linovelib_mobile_spider.py:208
New Title= [堀北的請求與綾小路的請求]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226402.html
2024-04-04,16:13:51 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226402_2.html
2024-04-04,16:13:54 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226402_3.html
2024-04-04,16:13:57 INFO LinovelibMobileSpider chapter : 奇妙的违和感 linovelib_mobile_spider.py:175
2024-04-04,16:14:00 INFO LinovelibMobileSpider chapter : [奇妙的违和感] New Title= linovelib_mobile_spider.py:208
[奇妙的違和感]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226403.html
2024-04-04,16:14:03 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226403_2.html
2024-04-04,16:14:05 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226403_3.html
2024-04-04,16:14:08 INFO LinovelibMobileSpider chapter : 监视者,被监视者 linovelib_mobile_spider.py:175
2024-04-04,16:14:12 INFO LinovelibMobileSpider chapter : [监视者,被监视者] New linovelib_mobile_spider.py:208
Title= [監視者,被監視者]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226404.html
2024-04-04,16:14:14 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226404_2.html
2024-04-04,16:14:17 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226404_3.html
2024-04-04,16:14:20 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226404_4.html
2024-04-04,16:14:22 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226404_5.html
2024-04-04,16:14:25 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226404_6.html
2024-04-04,16:14:28 INFO LinovelibMobileSpider chapter : 归于平静的终结 linovelib_mobile_spider.py:175
2024-04-04,16:14:32 INFO LinovelibMobileSpider chapter : [归于平静的终结] New Title= linovelib_mobile_spider.py:208
[歸於平靜的終結]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226405.html
2024-04-04,16:14:34 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226405_2.html
2024-04-04,16:14:37 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226405_3.html
2024-04-04,16:14:40 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226405_4.html
2024-04-04,16:14:42 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226405_5.html
2024-04-04,16:14:45 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226405_6.html
2024-04-04,16:14:47 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226405_7.html
2024-04-04,16:14:50 INFO LinovelibMobileSpider chapter : 迈出的勇气 linovelib_mobile_spider.py:175
2024-04-04,16:14:53 INFO LinovelibMobileSpider chapter : [迈出的勇气] New Title= linovelib_mobile_spider.py:208
[邁出的勇氣]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226406.html
2024-04-04,16:14:56 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226406_2.html
2024-04-04,16:14:59 INFO LinovelibMobileSpider chapter : 谁是挑战者 linovelib_mobile_spider.py:175
2024-04-04,16:15:02 INFO LinovelibMobileSpider chapter : [谁是挑战者] New Title= linovelib_mobile_spider.py:208
[誰是挑戰者]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226407.html
2024-04-04,16:15:05 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226407_2.html
2024-04-04,16:15:07 INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226407_3.html
2024-04-04,16:15:10 INFO LinovelibMobileSpider chapter : 后记 linovelib_mobile_spider.py:175
2024-04-04,16:15:13 INFO LinovelibMobileSpider chapter : [后记] New Title= [後記] linovelib_mobile_spider.py:208
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226408.html
2024-04-04,16:15:16 INFO LinovelibMobileSpider chapter : 茶柱佐枝特典 眼前的学生 linovelib_mobile_spider.py:175
2024-04-04,16:15:19 INFO LinovelibMobileSpider chapter : [茶柱佐枝特典 眼前的学生] linovelib_mobile_spider.py:208
New Title= [茶柱佐枝特典 眼前的學生]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226409.html
2024-04-04,16:15:22 INFO LinovelibMobileSpider chapter : 椎名日和特典 不想忘却的回忆 linovelib_mobile_spider.py:175
2024-04-04,16:15:24 INFO LinovelibMobileSpider chapter : [椎名日和特典 linovelib_mobile_spider.py:208
不想忘却的回忆] New Title= [椎名日和特典 不想忘卻的回憶]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226410.html
2024-04-04,16:15:27 INFO LinovelibMobileSpider chapter : 森下蓝特典 请代替我倾听 linovelib_mobile_spider.py:175
2024-04-04,16:15:30 INFO LinovelibMobileSpider chapter : [森下蓝特典 请代替我倾听] linovelib_mobile_spider.py:208
New Title= [森下藍特典 請代替我傾聽]
INFO LinovelibMobileSpider Processing page... linovelib_mobile_spider.py:237
https://www.bilinovel.com/novel/8/226411.html
INFO LinovelibMobileSpider (Perf metrics) Fetch Book took: linovelib_mobile_spider.py:55
192.94781769998372 seconds
2024-04-04,16:15:30 INFO Linovelib2Epub The data of book(id=8) except image files is ready. linovel.py:420
INFO LinovelibMobileSpider Image download strategy: ASYNCIO base_spider.py:207
INFO LinovelibMobileSpider len of image list: 33 base_spider.py:215
INFO LinovelibMobileSpider len of light_novel_images= 33 base_spider.py:112
2024-04-04,16:15:31 INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245401.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245401.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245403.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245403.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245405.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245405.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://www.bilinovel.com/files/system/avatar/758/758166.jpg => local
relative path novel_images/www.bilinovel.com/8/0/758166.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://www.bilinovel.com/files/system/avatar/406/406361.jpg => local
relative path novel_images/www.bilinovel.com/8/0/406361.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245407.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245407.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://www.bilinovel.com/files/system/avatar/832/832386.jpg => local
relative path novel_images/www.bilinovel.com/8/0/832386.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245399.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245399.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://www.bilinovel.com/files/system/avatar/736/736940.jpg => local
relative path novel_images/www.bilinovel.com/8/0/736940.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://www.bilinovel.com/files/system/avatar/683/683006.jpg => local
relative path novel_images/www.bilinovel.com/8/0/683006.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://www.bilinovel.com/files/system/avatar/217/217090.jpg => local
relative path novel_images/www.bilinovel.com/8/0/217090.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245392.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245392.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://www.bilinovel.com/files/article/image/0/8/8s.jpg => local
relative path novel_images/www.bilinovel.com/8/8s.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245404.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245404.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245404.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245404.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245407.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245407.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245405.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245405.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245398.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245398.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245406.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245406.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245406.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245406.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245400.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245400.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245397.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245397.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245398.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245398.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img3.readpai.com/0/8/226398/245402.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245402.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245403.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245403.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245401.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245401.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245402.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245402.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245399.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245399.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245400.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245400.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245393.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245393.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245394.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245394.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245395.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245395.jpg ok.
INFO LinovelibMobileSpider image url base_spider.py:183
https://img1.readpai.com/0/8/226398/245396.jpg => local relative path
novel_images/www.bilinovel.com/8/0/245396.jpg ok.
INFO LinovelibMobileSpider SUCCEED_COUNT: 33 base_spider.py:154
INFO LinovelibMobileSpider [NEXT TURN]Pending task count: 0 base_spider.py:155
INFO LinovelibMobileSpider (Perf metrics) Download Images took: base_spider.py:193
0.9820730993524194 seconds
2024-04-04,16:15:31 INFO EpubWriter [Config]: has_illustration: True; divide_volume: True linovel.py:42
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245398.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245399.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245400.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245401.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245402.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245403.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245404.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245405.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245406.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
C:\Users\power70521\AppData\Local\Programs\Python\Python310\lib\zipfile.py:1517: UserWarning: Duplicate name: 'EPUB/novel_images/www.bilinovel.com/8/0/245407.jpg'
return self._open_to_write(zinfo, force_zip64=force_zip64)
2024-04-04,16:15:32 INFO EpubWriter (Perf metrics) Write epub took: 0.8078119000419974 seconds linovel.py:60
The output epub is located in this folder. (You can see the link if you use a modern shell.)
2024-04-04,16:15:32 INFO Linovelib2Epub Write epub finished. Now delete all the artifacts if set. linovel.py:427
INFO Linovelib2Epub linovel.py:430
============================================================================
====
Also, I confirmed that my linovelib2epub version is 0.1.3, the version mentioned in the earlier issue:
Package Version Editable project location
------------------ ----------- -------------------------
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
ansicon 1.89.0
async-timeout 4.0.3
attrs 23.2.0
beautifulsoup4 4.12.3
blessed 1.20.0
Brotli 1.1.0
bs4 0.0.1
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 2.1.1
click 8.1.7
colorama 0.4.6
commonmark 0.9.1
cssselect 1.2.0
DataRecorder 3.4.12
demjson3 3.0.5
DownloadKit 2.0.0
DrissionPage 4.0.4.5
dynaconf 3.2.3
EbookLib 0.17.1
et-xmlfile 1.1.0
exceptiongroup 1.2.0
fake-useragent 1.1.1
filelock 3.13.3
frozenlist 1.4.1
h11 0.14.0
idna 3.6
inquirer 3.1.2
jinxed 1.2.1
linovelib2epub 0.1.3 H:\linovelib2epub
lxml 4.9.2
multidict 6.0.5
openpyxl 3.1.2
outcome 1.3.0.post0
Pillow 9.2.0
pip 22.2.1
psutil 5.9.8
pycparser 2.22
Pygments 2.17.2
PySocks 1.7.1
python-editor 1.0.4
readchar 4.0.6
requests 2.28.1
requests-file 2.0.0
rich 12.5.1
selenium 4.17.2
setuptools 63.2.0
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
soupsieve 2.5
tabulate 0.9.0
tldextract 5.1.2
trio 0.25.0
trio-websocket 0.11.1
typing_extensions 4.10.0
urllib3 1.26.18
uuid 1.30
wcwidth 0.2.13
websocket-client 1.7.0
wsproto 1.2.0
yarl 1.9.4
Contents of usage.py:
from linovelib2epub import Linovelib2Epub
if __name__ == '__main__':
linovelib_epub = Linovelib2Epub(book_id=8, chapter_crawl_delay=3, page_crawl_delay=2, select_volume_mode=True)
linovelib_epub.run()
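The `Duplicate name` warnings in the log above appear when two different image URLs (here the `img1.readpai.com` and `img3.readpai.com` mirrors of the same file) resolve to the same local relative path, so the same archive entry is likely written twice. A minimal sketch, assuming images are handled as `(url, local_path)` pairs (a hypothetical shape, not linovelib2epub's actual data model), that keeps only the first URL per local path before writing:

```python
import io
import zipfile
from typing import Iterable


def dedupe_by_local_path(images: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (url, local_path) pair for each local_path.

    Mirrors such as img1/img3 can resolve to the same local file; writing that
    path twice is what makes zipfile emit "Duplicate name" warnings.
    """
    seen: set[str] = set()
    unique: list[tuple[str, str]] = []
    for url, local_path in images:
        if local_path in seen:
            continue  # this archive entry is already scheduled; skip the mirror URL
        seen.add(local_path)
        unique.append((url, local_path))
    return unique


def write_images(zf: zipfile.ZipFile, images: Iterable[tuple[str, str]], data: bytes = b"") -> None:
    """Write each unique local path once (placeholder bytes stand in for image data)."""
    for _url, local_path in dedupe_by_local_path(images):
        zf.writestr("EPUB/" + local_path, data)


if __name__ == "__main__":
    images = [
        ("https://img1.readpai.com/0/8/226398/245406.jpg", "novel_images/www.bilinovel.com/8/0/245406.jpg"),
        ("https://img3.readpai.com/0/8/226398/245406.jpg", "novel_images/www.bilinovel.com/8/0/245406.jpg"),
    ]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        write_images(zf, images)
        print(zf.namelist())  # a single entry, so no duplicate warning
```

Deduplicating on the local path rather than the URL matters here, because the URLs differ while the archive names collide.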
Checked with the w3c/epubcheck tool:
ERROR(OPF-014): *.epub/EPUB/0.xhtml(-1,-1): The property "scripted" should be declared in the OPF file.
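OPF-014 means an XHTML content document contains scripting while its manifest `<item>` does not declare `properties="scripted"`, which EPUB 3 requires. A minimal stdlib sketch for listing the affected files in an output EPUB; `find_missing_scripted` is a hypothetical helper, and it uses a naive substring test for `<script`, so treat its hits as candidates to inspect:

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_missing_scripted(epub) -> list[str]:
    """Return hrefs of XHTML items that contain <script> but lack properties="scripted".

    `epub` may be a file path or a binary file-like object.
    """
    missing = []
    with zipfile.ZipFile(epub) as zf:
        # META-INF/container.xml points at the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf_dir = posixpath.dirname(opf_path)
        opf = ET.fromstring(zf.read(opf_path))
        for item in opf.findall(".//opf:manifest/opf:item", OPF_NS):
            if item.get("media-type") != "application/xhtml+xml":
                continue
            href = item.get("href")
            # Manifest hrefs are relative to the OPF file's directory.
            content = zf.read(posixpath.join(opf_dir, href)).decode("utf-8", "replace")
            props = (item.get("properties") or "").split()
            if "<script" in content.lower() and "scripted" not in props:
                missing.append(href)
    return missing
```

Usage: `find_missing_scripted("output.epub")` lists the documents that either need the property added in the manifest or need their scripts stripped before packaging.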
At some point the site changed its catalog page, which now makes _crawl_book_basic_info() raise an exception. From my testing, two places in src/linovelib2epub/spider/linovelib_mobile_spider.py need to change:
# line 82
book_title = soup.find('h2', {'class': 'book-title'}).text
--->
book_title = soup.find('h1', {'class': 'book-title'}).text
# line 334
if item.name == 'h3' and 'chapter-bar' in item['class']:
--->
if item.name == 'li' and 'chapter-bar' in item['class']:
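Instead of swapping the tag name each time the markup changes, the lookups could accept either variant. With BeautifulSoup, which the project already uses, that would be `soup.find(['h1', 'h2'], class_='book-title')` and `item.name in ('h3', 'li')`. A minimal stdlib-only sketch of the title lookup using `html.parser`; `BookTitleParser` and `extract_book_title` are hypothetical names:

```python
from html.parser import HTMLParser
from typing import Optional


class BookTitleParser(HTMLParser):
    """Collect the text of the first <h1> or <h2> whose class list contains 'book-title'."""

    def __init__(self) -> None:
        super().__init__()
        self._capturing = False
        self._tag: Optional[str] = None
        self._parts: list[str] = []
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.title is None and not self._capturing and tag in ("h1", "h2"):
            # attrs values may be None for valueless attributes.
            classes = (dict(attrs).get("class") or "").split()
            if "book-title" in classes:
                self._capturing = True
                self._tag = tag

    def handle_data(self, data):
        if self._capturing:
            self._parts.append(data)  # includes text of nested tags like <span>

    def handle_endtag(self, tag):
        if self._capturing and tag == self._tag:
            self._capturing = False
            self.title = "".join(self._parts).strip()


def extract_book_title(html: str) -> Optional[str]:
    """Return the book title, or None if neither heading variant is present."""
    parser = BookTitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Returning `None` rather than raising lets the caller decide whether a missing title should abort the crawl or be logged.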
I made a .py file that runs the code below, and it crashes immediately without leaving any log...
Please help, thanks.
from linovelib2epub import Linovelib2Epub, TargetSite
if __name__ == '__main__':
linovelib_epub = Linovelib2Epub(book_id=2961, target_site=TargetSite.WENKU8)
linovelib_epub.run()
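When a script started by double-clicking crashes on Windows, the console window closes before the traceback can be read. A minimal generic sketch, assuming the crash is an uncaught Python exception rather than a hard interpreter exit; `run_with_crash_log` and the `crash.log` file name are hypothetical:

```python
import traceback
from pathlib import Path
from typing import Callable


def run_with_crash_log(func: Callable[[], None], log_path: str = "crash.log", pause: bool = True) -> bool:
    """Call func(); on an uncaught exception, save the traceback and keep the console open.

    Returns True on success, False if an exception was caught and logged.
    """
    try:
        func()
        return True
    except Exception:
        text = traceback.format_exc()
        Path(log_path).write_text(text, encoding="utf-8")
        print(text)
        if pause:
            # Keeps a double-clicked console window open so the error can be read.
            input(f"An error occurred; traceback saved to {log_path}. Press Enter to exit...")
        return False
```

Usage: move the body of the `if __name__ == '__main__':` block (both the `Linovelib2Epub(...)` construction and `.run()`) into a `main()` function and call `run_with_crash_log(main)`, so errors raised by the constructor are captured too.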
Installed the latest version via git clone.
With the default source and book_id=75, it fails with:
AssertionError: content_id can't be empty string, please submit this bug to github issue.
Describe the bug
For example, the currently required Pillow version 9.2.0 does not work on Python 3.12; version 10 or later is needed.
(Other packages such as lxml have the same problem.)
Expected behavior
Document the supported Python versions, or change == to >= in requirements.txt.
Personally, I'd suggest declaring the dependencies in a setuptools setup.py, which allows finer-grained handling of different environments.
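If the exact pins are relaxed as suggested, the rewrite is mechanical. A minimal sketch (`relax_pins` is a hypothetical helper, not part of the project) that turns `name==version` lines into `name>=version` while leaving comments, other operators, and environment markers alone. Relaxing pins trades reproducibility for compatibility; a middle ground is PEP 508 environment markers, e.g. `Pillow==9.2.0; python_version < "3.12"` together with `Pillow>=10.0; python_version >= "3.12"`.

```python
import re

# "name[extras] == version", optionally followed by "; marker" or a comment.
# The (?!=) lookahead leaves arbitrary-equality pins ("===") untouched.
PIN_RE = re.compile(r"^(\s*[A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?\s*)==(?!=)(\s*[^\s;#]+)")


def relax_pins(requirements_text: str) -> str:
    """Rewrite exact pins (==) to minimum versions (>=); other lines are unchanged."""
    out = []
    for line in requirements_text.splitlines():
        out.append(PIN_RE.sub(r"\1>=\2", line, count=1))
    return "\n".join(out)
```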
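Bilinovel (哔哩轻小说) now applies font-based anti-scraping to the last sentence of each chapter:
Example: https://www.bilinovel.com/novel/3825/228977_2.html
Without the correct font, that paragraph renders as garbled text, so the final EPUB contains garbled text.
Possible solutions: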
Could it download as .txt, or download volume by volume?
Hi, if I just replace characters using a lookup table, I find the characters don't match up. For example:
ban_word = {
"": "的",
"": "一",
"": "是",
"": "了",
"": "我",
"": "不",
"": "人",
"": "在",
"": "他",
"": "有",
"": "这",
"": "个",
"": "上",
"": "们",
"": "来",
"": "到",
"": "时",
"": "大",
"": "地",
"": "为",
"": "子",
"": "中",
"": "你",
"": "说",
"": "生",
"": "国",
"": "年",
"": "着",
...
}
Source: https://www.bilinovel.com/themes/zhmb/js/readtools.js
I found that the table above does not correctly correspond to the obfuscated characters on the site. How did you solve this?
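One way to check whether a lookup table still matches a given page is to list which obfuscated code points actually occur in it and which remain after decoding. A minimal diagnostic sketch, assuming the obfuscated glyphs use code points in the BMP Private Use Area (U+E000 to U+F8FF); the sample mapping in the usage is a hypothetical placeholder, not the real table from readtools.js:

```python
from collections import Counter

PUA_START, PUA_END = 0xE000, 0xF8FF  # Basic Multilingual Plane Private Use Area


def find_pua_chars(text: str) -> Counter:
    """Count characters in the BMP Private Use Area (assumed home of obfuscated glyphs)."""
    return Counter(ch for ch in text if PUA_START <= ord(ch) <= PUA_END)


def decode_with_table(text: str, table: dict[str, str]) -> tuple[str, set[str]]:
    """Apply a one-character-to-one-character table.

    Returns the decoded text and the set of PUA characters the table did not cover.
    str.maketrans raises ValueError if any key is not exactly one character long.
    """
    decoded = text.translate(str.maketrans(table))
    leftover = set(find_pua_chars(decoded))
    return decoded, leftover
```

Usage: `decoded, unmapped = decode_with_table(page_text, table)`. A non-empty `unmapped`, or decoded text that reads wrongly, suggests the table does not correspond to the font served with that page.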
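After running it, the browser does not open 5 tabs at the same time; the tasks execute one by one. Any pointers would be appreciated.
Minimal reproduction code: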
import asyncio
import time
from DrissionPage import ChromiumPage, ChromiumOptions, WebPage
# Semaphore to control concurrency
semaphore = asyncio.Semaphore(5) # Allow up to 5 concurrent tasks
def getsession():
co = ChromiumOptions()
co.auto_port()
co.headless(False)
browser_path=r"C:\Users\Administrator\AppData\Local\ms-playwright\chromium-1124\chrome-win\chrome.exe"
co.set_paths(browser_path=browser_path)
page = WebPage(chromium_options=co)
return page
async def test(task_id,page):
print(f"Task {task_id} started")
await asyncio.sleep(1) # Simulating some work (1 second)
tab= page.new_tab()
tab.get('https://baidu.com')
print(page.title)
print(f"Task {task_id} completed")
tab.close()
# Function to simulate a task asynchronously
async def simulate_task(task_id, page):
async with semaphore:
await test('m'+str(task_id),page)
# Function to run tasks asynchronously with specific concurrency
async def run_async_tasks():
tasks = []
page=getsession()
for i in range(1, 10):
task = asyncio.create_task(simulate_task(i, page))
tasks.append(task)
await asyncio.gather(*tasks)
page.quit()
# Example usage: Main coroutine
async def main():
start_time = time.time()
await run_async_tasks()
print(f"Time taken for asynchronous execution with concurrency limited by semaphore: {time.time() - start_time} seconds")
# Manually manage the event loop in Jupyter Notebook or other environments
if __name__ == "__main__":
loop = asyncio.get_event_loop()
try:
loop.run_until_complete(main())
finally:
loop.close()
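A likely reason the tabs open one by one: inside `test()`, only `await asyncio.sleep(1)` yields to the event loop, while `page.new_tab()` and `tab.get(...)` are ordinary blocking calls, so each task holds the single event-loop thread until its page finishes loading. A minimal sketch of the usual remedy, offloading blocking work to worker threads with `asyncio.to_thread` (Python 3.9+). `blocking_visit` is a hypothetical stand-in that uses `time.sleep` in place of the browser calls; whether DrissionPage objects can safely be driven from several threads is an assumption to verify against its documentation.

```python
import asyncio
import time


def blocking_visit(task_id: int, seconds: float) -> str:
    """Stand-in for blocking browser work such as tab.get(url)."""
    time.sleep(seconds)
    return f"task {task_id} done"


async def visit(task_id: int, sem: asyncio.Semaphore, seconds: float) -> str:
    async with sem:
        # Run the blocking call in a worker thread so the event loop stays free
        # to start the other tasks while this one waits.
        return await asyncio.to_thread(blocking_visit, task_id, seconds)


async def run_all(n_tasks: int = 5, limit: int = 5, seconds: float = 0.2) -> list[str]:
    # Create the semaphore inside the running loop.
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(visit(i, sem, seconds) for i in range(1, n_tasks + 1)))


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(run_all()))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # roughly one task's duration, not five
```

Calling the blocking function directly inside `visit` (without `to_thread`) reproduces the one-by-one behavior described above.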
ID 3087
Error message:
2024-02-13,15:44:44 INFO LinovelibMobileSpider Succeed to get the linovelib_mobile_spider.py:80
novel of book_id: 3087
INFO LinovelibMobileSpider Succeed to get the linovelib_mobile_spider.py:138
catalog of book_id: 3087
Traceback (most recent call last):
File "[path]\linovelib2epub-main\main.py", line 6, in <module>
linovelib_epub.run()
File "[path]\linovelib2epub-main\src\linovelib2epub\linovel.py", line 405, in run
novel = self._spider.fetch()
^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 49, in fetch
novel_whole = self._fetch()
^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 399, in _fetch
new_novel_with_content = self._crawl_book_content(book_catalog_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 141, in _crawl_book_content
catalog_list: List[CatalogLinovelibMobileVolume] = self._convert_to_catalog_list(catalog_html)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[path]\linovelib2epub-main\src\linovelib2epub\spider\linovelib_mobile_spider.py", line 326, in _convert_to_catalog_list
catalog_html_items = catalog_wrapper.children # Use children to get both <h3> and <li> elements
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'children'
I saved the catalog_html input of _convert_to_catalog_list to a file; see Gofile. I did it like this:
with open('catalog_html.txt', mode='a+', encoding='utf8') as f:
f.write(catalog_html)
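The `AttributeError` above means the lookup for the catalog wrapper returned `None`, most likely because the page markup no longer matches the selector or the server returned a different page (for example an error or rate-limit page). A minimal sketch of failing fast with a descriptive message instead; `require_node` and `CatalogParseError` are hypothetical names:

```python
from typing import Optional, TypeVar

T = TypeVar("T")


class CatalogParseError(RuntimeError):
    """Raised when an expected element is missing from a fetched page."""


def require_node(node: Optional[T], what: str, html: str, preview: int = 200) -> T:
    """Return node, or raise an error that shows the start of the page that failed to parse."""
    if node is None:
        snippet = html[:preview].replace("\n", " ")
        raise CatalogParseError(f"{what} not found; page starts with: {snippet!r}")
    return node
```

Usage: wrap the existing lookup, e.g. `catalog_wrapper = require_node(soup.find(...), "catalog wrapper", catalog_html)`, keeping whatever selector the spider currently uses. The error then shows whether the page was a real catalog with new markup or something else entirely.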
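Without a rate limit, the crawler can easily overload the site, which then keeps returning 502 errors.
Suggested implementation: allow a base delay to be set via a parameter, and sleep for it before requesting the next page.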
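The proposal amounts to enforcing a minimum gap between consecutive requests. A minimal sketch; `MinIntervalLimiter`, `base_delay`, and `jitter` are hypothetical names, and the injectable `clock`/`sleep` arguments exist only to make the behavior easy to test:

```python
import random
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Block until at least base_delay (plus optional random jitter) has passed since the last call."""

    def __init__(self, base_delay: float, jitter: float = 0.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.base_delay = base_delay
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep as needed before the next request; return the number of seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            target = self.base_delay + random.uniform(0, self.jitter)
            remaining = target - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

Usage: create one limiter per site and call `limiter.wait()` immediately before each page request. Time already spent parsing a page counts toward the gap, so the crawler is not slowed more than necessary.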
- Attached test file:
没有明天的我们,在昨天相恋.zip
Bilinovel book ID 3915 triggers the site's access-rate limit after about 10 chapters. Log:
2024-03-04,01:01:15 INFO LinovelibMobileSpider linovelib_mobile_spider.py:80 Succeed to get the novel of book_id: 3915
2024-03-04,01:01:17 INFO LinovelibMobileSpider linovelib_mobile_spider.py:138 Succeed to get the catalog of book_id: 3915
2024-03-04,01:01:17 INFO LinovelibMobileSpider linovelib_mobile_spider.py:154 volume: 第一章 于此岸奉上充满爱意之花束
2024-03-04,01:01:17 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 作品相关
2024-03-04,01:01:18 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214946.html
2024-03-04,01:01:18 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 你为了什么玩游戏?
2024-03-04,01:01:19 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214987.html
2024-03-04,01:01:19 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 粪作游戏迷,挑战神作
2024-03-04,01:01:22 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214988.html
2024-03-04,01:01:22 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 您对效率的先进贡献了什么
2024-03-04,01:01:26 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214989.html
2024-03-04,01:01:26 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 真实悻不影响质量,反之则另当别论
2024-03-04,01:01:32 INFO LinovelibMobileSpider linovelib_mobile_spider.py:189 chapter : [真实悻不影响质量,反之则另当别论] New Title= [真实〇不影响质量,反之则另当别论]
2024-03-04,01:01:32 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214990.html
2024-03-04,01:01:32 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 等的人不来,我方蛮族也
2024-03-04,01:01:34 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214991.html
2024-03-04,01:01:34 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 野悻的变态跳出来了!
2024-03-04,01:01:37 INFO LinovelibMobileSpider linovelib_mobile_spider.py:189 chapter : [野悻的变态跳出来了!] New Title= [野〇的变态跳出来了!]
2024-03-04,01:01:37 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214992.html
2024-03-04,01:01:37 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 变态,痛感效率的代价
2024-03-04,01:01:39 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214993.html
2024-03-04,01:01:39 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 鸟头VS巨蟒
2024-03-04,01:01:41 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214994.html
2024-03-04,01:01:41 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 奔跑吧,一边被打滑的伤害追赶着
2024-03-04,01:01:43 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214995.html
2024-03-04,01:01:43 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 从经验者的角度看半果的变态
2024-03-04,01:01:44 INFO LinovelibMobileSpider linovelib_mobile_spider.py:189 chapter : [从经验者的角度看半果的变态] New Title= [从经验者的角度看半〇的变态]
2024-03-04,01:01:44 INFO LinovelibMobileSpider linovelib_mobile_spider.py:218 Processing page... https://www.bilinovel.com/novel/3915/214996.html
2024-03-04,01:01:44 INFO LinovelibMobileSpider linovelib_mobile_spider.py:166 chapter : 用粪作冲洗臃肿的价值观
2024-03-04,01:01:46 WARNING LinovelibMobileSpider utils.py :52 Request https://www.bilinovel.com/novel/3915/214997.html succeed but data is empty.
2024-03-04,01:01:46 WARNING LinovelibMobileSpider utils.py :77 current_num_of_request: 1; retry_interval: 1.35(s)
2024-03-04,01:01:48 WARNING LinovelibMobileSpider utils.py :52 Request https://www.bilinovel.com/novel/3915/214997.html succeed but data is empty.
2024-03-04,01:01:48 WARNING LinovelibMobileSpider utils.py :77 current_num_of_request: 2; retry_interval: 2.86(s)
2024-03-04,01:01:52 WARNING LinovelibMobileSpider utils.py :52 Request https://www.bilinovel.com/novel/3915/214997.html succeed but data is empty.
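I worked around this by inserting time.sleep(5) at line 159 of linovelib_mobile_spider.py.
I therefore suggest adding a parameter that sets the interval between crawling chapters, to avoid triggering this kind of mechanism.
PS: Because the program keeps retrying a request until it gets a valid result, this issue does not break anything for now, but the site administrators may well adopt more aggressive measures.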
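The log above shows the existing retry loop waiting longer after each empty response (1.35 s, then 2.86 s). A minimal sketch of the same general idea, exponential backoff with random jitter when a request succeeds but returns an empty body; `fetch_with_backoff` and its `fetch` callable are hypothetical, and this is not the project's `utils.py` implementation:

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(fetch: Callable[[], Optional[str]], max_retries: int = 5,
                       base: float = 1.0, cap: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it returns non-empty text, waiting longer after each empty result.

    After the k-th empty result (k = 0, 1, ...), wait a random time in
    [0.5, 1.0] * min(cap, base * 2**k) seconds.
    """
    for attempt in range(max_retries + 1):
        data = fetch()
        if data:
            return data
        if attempt == max_retries:
            break  # no more retries; fall through to the error below
        delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
        sleep(delay)
    raise RuntimeError(f"empty response after {max_retries + 1} attempts")
```

Backoff reacts after the limit has already been hit; combining it with a fixed delay between chapters, as suggested above, reduces how often it is needed at all.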