
MOOC-Download: a scraper for China University MOOC (icourse163.org)

A crawler written in Python 3 that fetches the videos and downloadable documents of a specified course.

For a write-up of how it was implemented, see my blog.

Features

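  • Video quality is selectable. Videos are saved as download links, which you can fetch with a third-party download manager.
  • A Rename.bat script is generated to batch-rename the downloaded videos.
  • All documents are named consistently by chapter.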
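The renaming workflow above can be sketched as follows. This is an illustration, not the code in main.py. It assumes each video is known by its chapter number, lesson number, title and download URL, that names follow a hypothetical "<chapter>.<lesson> <title>" scheme, and that the downloader saves each file under the last path segment of its URL (the real file names depend on your downloader). chapter_name, rename_line and write_rename_bat are hypothetical helpers.

```python
import posixpath
import re
from urllib.parse import urlparse

# Characters that Windows does not allow in file names.
_INVALID = re.compile(r'[\\/:*?"<>|]')


def chapter_name(chapter: int, lesson: int, title: str) -> str:
    """Hypothetical naming scheme: '<chapter>.<lesson> <title>'.

    Characters that are illegal in Windows file names become '_'.
    """
    return _INVALID.sub("_", f"{chapter}.{lesson} {title}").strip()


def rename_line(video_url: str, new_name: str) -> str:
    """Build one `rename` command for Rename.bat.

    Assumption: the downloader saved the video under the last path
    segment of its URL, e.g. '.../abc_hd.mp4' -> 'abc_hd.mp4'.
    """
    downloaded = posixpath.basename(urlparse(video_url).path)
    return f'rename "{downloaded}" "{new_name}.mp4"'


def write_rename_bat(videos, path="Rename.bat"):
    """Write one rename command per (chapter, lesson, title, url) tuple."""
    with open(path, "w", encoding="utf-8") as f:
        for chapter, lesson, title, url in videos:
            f.write(rename_line(url, chapter_name(chapter, lesson, title)) + "\n")
```

For example, with the made-up URL http://v.stu.126.net/mooc-video/example/abc_hd.mp4 and the title "Intro: basics" in chapter 1, lesson 2, rename_line returns `rename "abc_hd.mp4" "1.2 Intro_ basics.mp4"`. The colon is replaced because Windows file names cannot contain it.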

Usage

Installation

pip install -r requirements.txt

Running

python main.py
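The course id described in the notes below can also be pulled out of a pasted course-page URL. The following is a minimal sketch. It assumes course pages follow the https://www.icourse163.org/course/<id>?tid=... pattern from the example in those notes. course_id_from_url is a hypothetical helper, not part of main.py.

```python
from urllib.parse import urlparse


def course_id_from_url(url: str) -> str:
    """Return the course id from an icourse163 course-page URL.

    Takes the path segment right after "/course/" and ignores the
    query string (e.g. "?tid=..."). Raises ValueError otherwise.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "course" and parts[1]:
        return parts[1]
    raise ValueError(f"not a course URL: {url!r}")
```

With the URL from the notes, course_id_from_url("https://www.icourse163.org/course/WHUT-1001861003?tid=1002066005") returns "WHUT-1001861003".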

Notes

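  • The course id is the part of the course page's URL (in the browser's address bar) that follows /course/. For example, if the address is https://www.icourse163.org/course/WHUT-1001861003?tid=1002066005 , the course id is WHUT-1001861003.
  • After a run, the directory looks like this: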
MOOC_DOWNLOAD
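  -- PDFs                     all downloaded PDF documents
       -- something1.pdf
       -- something2.pdf
  -- main.py                  main program
  -- Links.txt                video download links
  -- Rename.bat               renames the videos after they are downloaded (place it in the videos' root directory)
  -- TOC.txt                  overall structure of the scraped course
  -- ...                      other files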
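  • If a third-party downloader cannot fetch the scraped links and opening them in a browser returns 404 (this happens with very few courses), try replacing the prefix http://v.stu.126.net/mooc-video/ at the start of each link with http://jdvodrvfb210d.vod.126.net/jdvodrvfb210d/ (this is probably due to how the MOOC site stores its resources). Files downloaded through the rewritten links are also named differently from what Rename.bat expects, so replace the code at line 171 of main.py that writes Rename.bat with the following: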
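with open('Rename.bat', 'a', encoding='utf-8') as file:
    # Replace every '/' in the URL with '_' before extracting the file name
    video_down_url = re.sub(r'/', '_', video_down_url)
    # Append: rename "<part after the last 'video_'>.mp4" "<name>.mp4"
    file.write('rename "' + re.search(r'http:.*video_(.*\.mp4)', video_down_url).group(1) + '" "' + name + '.mp4"' + '\n')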
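  • If only some links fail to download (if none of them work, see the previous note), you can ignore it: the crawler has most likely picked up resources that an administrator deleted or that have not been published yet. The links that download normally cover all of the course's available resources.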
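The prefix swap in the 404 workaround above can be applied to an existing Links.txt without re-running the crawler. The following is a minimal sketch. It assumes Links.txt holds one URL per line in UTF-8. The two prefixes are the ones given in the note, and there is no guarantee the alternate host serves every course. rewrite_link and rewrite_links_file are hypothetical helpers. Rename.bat still needs the main.py change described above.

```python
OLD_PREFIX = "http://v.stu.126.net/mooc-video/"
NEW_PREFIX = "http://jdvodrvfb210d.vod.126.net/jdvodrvfb210d/"


def rewrite_link(url: str) -> str:
    """Swap the storage prefix; URLs with any other prefix are returned unchanged."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url


def rewrite_links_file(path: str = "Links.txt") -> int:
    """Rewrite every link in the file in place; return how many were changed.

    Assumes one URL per line, UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as f:
        old_lines = [line.strip() for line in f.read().splitlines()]
    new_lines = [rewrite_link(line) for line in old_lines]
    changed = sum(1 for a, b in zip(old_lines, new_lines) if a != b)
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in new_lines))
    return changed
```

Lines that do not start with the old prefix pass through untouched, so running the helper twice on the same file is harmless.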

TO DO LIST

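  • Group downloaded documents by course and allow choosing the storage path
  • Validate user input
  • Add the ability to search for MOOC courses directly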

Acknowledgements

The idea for this program came from Adam's program. Many thanks!
