kuaishou-crawler's Introduction

kuaishou-crawler

A crawler for Kuaishou pictures and videos.

Latest

Version 0.5.0 (2020-08-06)

 View Change Log
  • A prebuilt exe is now available for one-click execution (see Release), or run the code from source (see Run)
  • Python 3.7.3
    • requests
    • json
    • os
    • BeautifulSoup
    • re
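  • Since v0.3.0 the code has been refactored in an object-oriented style; the core code is in lib/crawler.py and the entry points are crawl.py / ks.py
  • Function: crawls a Kuaishou user's works, both videos and pictures, by user ID
    1. List user ids one per line in the preset file (ignore this file when using the exe version); the file is created automatically if it is missing (the current version resolves the real eid from a numeric id automatically)
    2. Before use, log in to the Kuaishou website with your own account and replace cookie['headers']didweb with the value from your own cookie; the corresponding value in the source code is not guaranteed to work
    3. The Kuaishou site uses the cookie to detect whether you are online, so keep the site logged in and open while crawling
      • In testing, the site's user verification stays valid for roughly 30-60 minutes; a list index out of range error most likely means it has expired, and logging in to the site and verifying again resolves it
      • How Kuaishou handles excessive requests is not yet known. Besides the verification expiry above, requests may also be cut off after a certain count; if that happens, comment out the user ids already crawled in preset and rerun the script
    4. Crawled videos are now watermark-free (earlier versions produced watermarked videos). Thanks to @tjftjftjf for providing the mobile packet-capture link and method
    5. Fixed retrieval of the watermark-free video url
  • Notes:
    • Batch download from a selectable list is not planned
    • Reasonable feature requests can be filed as issues; they will be considered once seen
    • For custom needs, feel free to take the code and modify it yourself; if you like it, give it a star and a follow
    • This code is for learning purposes only. Do not crawl videos in violation of the law, and do not steal or repost videos without permission; you bear the consequences
    • This code is for learning purposes only. Do not crawl videos in violation of the law, and do not steal or repost videos without permission; you bear the consequences
    • This code is for learning purposes only. Do not crawl videos in violation of the law, and do not steal or repost videos without permission; you bear the consequences
    • Important things are said three times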
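
As an illustration of the preset file described above (one user id per line, created automatically if missing, crawled ids commented out), here is a minimal sketch of a reader. The function name `load_user_ids` and the `#` comment prefix are assumptions made for this example; the project's actual comment syntax and loading code may differ.

```python
from pathlib import Path


def load_user_ids(preset_path):
    """Return the user ids listed in a preset file, one id per line.

    Assumption: blank lines and lines starting with '#' are skipped.
    The '#' prefix is hypothetical, not confirmed by the project.
    """
    path = Path(preset_path)
    if not path.exists():
        # Create an empty file so the user has something to fill in.
        path.touch()
        return []
    ids = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids
```

Calling `load_user_ids("preset")` on a missing file creates it and returns an empty list, so the caller can report that there is nothing to crawl instead of failing.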

Run

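  1. A Python 3 environment and a command-line tool
  2. Enter the project directory: cd kuaishou-crawler
  3. Install dependencies: pip install -r requirements.txt
  4. Run. There are two versions: crawl.py is the regular run script, and ks.py is the version used to build the exe, though it can also be run directly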
    • python crawl.py / python ks.py

Release

https://github.com/oGsLP/kuaishou-crawler/releases

  • Download the packaged exe and run it with one click (just click download)
    • ks.exe
    • ks.7z

Future

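  • Automatically resolve eid from id
  • Obtain watermark-free videos √
  • Further enrich the configurable options of the preset file
  • Optimize code and logs
  • Provide convenient exe packaging √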

Again

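This code is for learning purposes only. Do not crawl videos in violation of the law, and do not steal or repost videos without permission; you bear the consequences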

Else

The crawler source is free and open; maintaining it takes effort, so feel free to tip the author if you like it >_<

kuaishou-crawler's People

Contributors

ogslp

kuaishou-crawler's Issues

Error when running ks.exe

Traceback (most recent call last):
File "ks.py", line 28, in <module>
File "ks.py", line 22, in crawl
File "lib\crawler.py", line 73, in crawl
File "lib\crawler.py", line 113, in __crawl_user
File "lib\crawler.py", line 167, in __crawl_work
AttributeError: 'NoneType' object has no attribute 'group'
[8828] Failed to execute script ks

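The did value in the user cookie

Hello author, I obtained the did value using your method and then entered the user's uid,
but the program crashes immediately.
I think I may have obtained the wrong value.
Could you explain in detail
what should be entered at the prompt "预先输入本用户cookie中的did值:" (enter the did value from your own cookie first)?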

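The exe version crashes

First of all, thanks to the author.
However, during use I found that after downloading 150-odd videos, the program exits on its own.

The exe always fails with an error

The error is shown below. When I run it after waiting a few days, it downloads normally at first and then fails after a few dozen downloads. Every rerun after that hits the same error, and I have to wait a few more days before it works again; the cycle repeats.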
Traceback (most recent call last):
File "ks.py", line 28, in <module>
File "ks.py", line 22, in crawl
File "lib\crawler.py", line 73, in crawl
File "lib\crawler.py", line 113, in __crawl_user
File "lib\crawler.py", line 167, in __crawl_work
AttributeError: 'NoneType' object has no attribute 'group'
[19088] Failed to execute script ks

Crawl error

Traceback (most recent call last):
File "ks.py", line 28, in <module>
File "ks.py", line 22, in crawl
File "lib\crawler.py", line 73, in crawl
File "lib\crawler.py", line 113, in __crawl_user
File "lib\crawler.py", line 167, in __crawl_work
AttributeError: 'NoneType' object has no attribute 'group'
[932] Failed to execute script ks

Runtime error: AttributeError: 'NoneType' object has no attribute 'group'

Traceback (most recent call last):
File "C:/mypythonfile/car_info/driving_attention_video/kuaishou-crawler/ks.py", line 28, in <module>
crawl()
File "C:/mypythonfile/car_info/driving_attention_video/kuaishou-crawler/ks.py", line 22, in crawl
crawler.crawl()
File "C:\mypythonfile\car_info\driving_attention_video\kuaishou-crawler\lib\crawler.py", line 81, in crawl
self.__crawl_user(uid)
File "C:\mypythonfile\car_info\driving_attention_video\kuaishou-crawler\lib\crawler.py", line 122, in __crawl_user
self.__crawl_work(dir, works[j], j + 1)
File "C:\mypythonfile\car_info\driving_attention_video\kuaishou-crawler\lib\crawler.py", line 178, in __crawl_work
v_url = re.search(pattern, html).group(1)+".mp4"
AttributeError: 'NoneType' object has no attribute 'group'

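It ran the first time, but after downloading a dozen or so videos it has failed every time since

Runtime error: AttributeError: 'NoneType' object has no attribute 'group'

Starting to crawl user xxx, saving to directory data/xxx/
21 works in total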
Traceback (most recent call last):
File "D:/develop-py/workspace/kuaishou-crawler-master/crawl.py", line 23, in <module>
crawl()
File "D:/develop-py/workspace/kuaishou-crawler-master/crawl.py", line 19, in crawl
crawler.crawl()
File "D:\develop-py\workspace\kuaishou-crawler-master\lib\crawler.py", line 73, in crawl
self.__crawl_user(uid)
File "D:\develop-py\workspace\kuaishou-crawler-master\lib\crawler.py", line 113, in __crawl_user
self.__crawl_work(dir, works[j], j + 1)
File "D:\develop-py\workspace\kuaishou-crawler-master\lib\crawler.py", line 167, in __crawl_work
v_url = re.search(pattern, html).group(1)+".mp4"
AttributeError: 'NoneType' object has no attribute 'group'
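
The tracebacks above all fail on `re.search(pattern, html).group(1)`: `re.search` returns `None` when the pattern is not found in the page, and calling `.group` on `None` raises the `AttributeError`. The sketch below shows one way to guard that call. It is not the project's fix; the helper name `extract_video_url` and the pattern in the usage note are hypothetical.

```python
import re


def extract_video_url(pattern, html):
    """Return the captured URL plus '.mp4', or None when the pattern is absent.

    Hypothetical helper: it wraps the failing expression from the traceback
    so that a page without a match no longer raises AttributeError.
    """
    match = re.search(pattern, html)
    if match is None:
        # Likely causes per the README: expired verification or rate limiting.
        return None
    return match.group(1) + ".mp4"
```

A caller could then skip the work or stop with a readable message when the result is `None`, for example `extract_video_url(r'src="(.*?)\.mp4"', html)` with an illustrative pattern.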

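Kuaishou limits the number of videos that can be downloaded

After a few dozen videos have been crawled, the error below appears, and downloads can only resume after a long wait. Switching to a different IP to continue making requests did not help either.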

| kuaishou-crawler (v0.5.0 20-08-06)
| This program is provided by oGsLP, www.github.com/oGsLP/kuaishou-crawler; if you like it, give it a star >_<

Preparing to crawl, 1 user in total...

{"data":{"privateFeeds":{"pcursor":"","list":[],"__typename":"PCProfileFeeds"}}}

[]
Traceback (most recent call last):
File "D:/python_project/test1/crawlers/main.py", line 26, in <module>
main()
File "D:/python_project/test1/crawlers/main.py", line 20, in main
kuaishou.crawler_kuaishou.main()
File "D:\python_project\test1\crawlers\kuaishou\crawler_kuaishou.py", line 30, in main
crawl(param_did,data_dir)
File "D:\python_project\test1\crawlers\kuaishou\crawler_kuaishou.py", line 15, in crawl
crawler.crawl()
File "D:\python_project\test1\crawlers\kuaishou\lib\crawler.py", line 81, in crawl
self.__crawl_user(uid)
File "D:\python_project\test1\crawlers\kuaishou\lib\crawler.py", line 106, in __crawl_user
if works[0]['id'] is None:
IndexError: list index out of range
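
In the output above, the response contains an empty `list`, so `works[0]` raises `IndexError`. A minimal sketch of checking for the empty case before indexing follows, using the JSON shape printed in this issue. The function name `first_work_id` is hypothetical, and other responses from the site may have a different shape.

```python
import json


def first_work_id(response_text):
    """Return the id of the first work, or None when the feed list is empty.

    Assumes the response shape shown in the issue:
    {"data": {"privateFeeds": {"list": [...]}}}
    """
    payload = json.loads(response_text)
    feeds = (payload.get("data") or {}).get("privateFeeds") or {}
    works = feeds.get("list") or []
    if not works:
        # An empty list may mean expired verification or a request limit.
        return None
    return works[0].get("id")
```

With the response printed above, this returns `None` instead of raising, which lets the caller report that no works were returned.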
