coursera-dl / coursera-dl

Script for downloading Coursera.org videos and naming them.

License: GNU Lesser General Public License v3.0

Python 94.41% Batchfile 1.96% PowerShell 3.45% Dockerfile 0.18%

coursera-dl video downloader lectures python coursera video-downloader storage archival

coursera-dl's Issues

Downloads mp4, srt, and txt, but doesn't download pdf

https://class.coursera.org/hetero-2012-001

H:\_learning>cd H:\_learning\hetero-2012-001
Downloaded http://class.coursera.org/hetero-2012-001/lecture/index (19738 bytes)
Week_1_Section_1
   Lecture_0-_Course_Overview
     None https://class.coursera.org/hetero-2012-001/lecture/3
     txt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=3_en&format=txt
     srt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=3_en&format=srt
     mp4 https://class.coursera.org/hetero-2012-001/lecture/download.mp4?lecture_id=3
   Lecture_1.1-_Introduction_to_Heterogeneous_Parallel_Programming
     None https://class.coursera.org/hetero-2012-001/lecture/9
     txt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=9_en&format=txt
     srt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=9_en&format=srt
     mp4 https://class.coursera.org/hetero-2012-001/lecture/download.mp4?lecture_id=9

and so on

Feature Request: automatically download the text or html

A great feature would be to download the course website, or at least its text, as well. It would be nice to have a copy of things like the news, syllabus, schedule, and exercises, for the sake of completeness.

invalid netscape format cookies file

Attempting to download the NLP videos with cookies.txt from the chrome extension, I get:

/usr/lib64/python2.7/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/usr/lib64/python2.7/_MozillaCookieJar.py", line 71, in _really_load
    line.split("\t")
ValueError: need more than 1 value to unpack

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "/home/andy/bin/coursera-dl", line 198, in <module>
    main()
  File "/home/andy/bin/coursera-dl", line 193, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "/home/andy/bin/coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "/home/andy/bin/coursera-dl", line 49, in get_page
    opener = get_opener(cookies_file)
  File "/home/andy/bin/coursera-dl", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/usr/lib64/python2.7/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'www.coursera.org    FALSE   /nlp    FALSE   1335746269  csrf_token  Tdh8Cj1qQGZ4AD7N7VWZ'
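
The failing call in the traceback above is `line.split("\t")`, which suggests that the fields in this cookies file are separated by spaces rather than tabs. A minimal sketch of a clean-up step follows, assuming the usual seven-field Netscape layout (domain, flag, path, secure, expiry, name, value). The helper name `normalize_cookie_line` is hypothetical and not part of coursera-dl.

```python
def normalize_cookie_line(line):
    """Return a cookies.txt line with exactly seven tab-separated fields.

    Hypothetical helper, not part of coursera-dl.
    """
    stripped = line.strip()
    # Pass blank lines and comments through without changes.
    if not stripped or stripped.startswith("#"):
        return stripped
    # Split on any run of whitespace; the 7th field (value) keeps inner spaces.
    fields = stripped.split(None, 6)
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d: %r" % (len(fields), line))
    # Assumption: a fractional expiry can be truncated to whole seconds.
    fields[4] = str(int(float(fields[4])))
    return "\t".join(fields)


if __name__ == "__main__":
    bad = ("www.coursera.org    FALSE   /nlp    FALSE   "
           "1335746269  csrf_token  Tdh8Cj1qQGZ4AD7N7VWZ")
    print(repr(normalize_cookie_line(bad)))
```

A cookie with an empty value has only six fields and is rejected by this sketch, so such lines would need separate handling.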

No Lectures Found for Current Class Error

Hi, I get the following error when I run the following command:

./coursera-dl pgm -c cookies.txt

Output:

Found 0 sections and 0 lectures on this page
Probably bad cookies file (or wrong class name)

The pgm class (Probabilistic Graphical Models) is currently running, and some of the lectures can even be previewed here: https://class.coursera.org/pgm/lecture/preview

I have a valid Coursera account, but I could not enroll in the class because I was too late, hence this attempt to download the videos. I am not sure what the cookie error means or why I get it.

I used the firefox extension to create the cookies.txt file.

Please respond.

Thanks.

on Windows: non-wget download creates bad files

On Windows 7, the default Python download code creates video files that are larger than they should be (and of course don't play).

Current workaround is to use a wget binary with the -w option.
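
One common cause of files growing on Windows is writing binary data through a file opened in text mode, where each newline byte is expanded to a carriage return plus newline. Whether that is what happened in the script at the time is an assumption. A minimal sketch of a binary-safe copy, with the hypothetical helper name `save_stream`:

```python
import shutil


def save_stream(stream, path, chunk_size=64 * 1024):
    """Copy a binary stream to disk without newline translation.

    Hypothetical helper, not taken from coursera-dl.
    """
    # "wb" (binary mode) is the important part: bytes are written as-is.
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)
```

Here `stream` can be any object with a `read()` method that returns bytes, such as the response object returned by a urllib opener.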

Password on command line potentially insecure

A password given on the command line may be visible system-wide in the process listing and may be written to the user's shell history.

It would be better to prompt for the password on the terminal when it is not supplied, rather than just exiting.
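
A minimal sketch of such a prompt using the standard `getpass` module, which reads the password without echoing it. The function name `resolve_password` and its `prompt` parameter are hypothetical and exist only to make the idea testable.

```python
import getpass


def resolve_password(username, password=None, prompt=getpass.getpass):
    """Return the given password, or ask for one on the terminal.

    Hypothetical helper; `prompt` can be replaced in tests.
    """
    if password is not None:
        return password
    # getpass does not echo the typed characters and leaves no trace
    # in the process listing or the shell history.
    return prompt("Coursera password for %s: " % username)
```

Calling `resolve_password("user@example.com")` with no password would then ask interactively instead of exiting.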

Not all resources downloaded: directory name files are skipped

Steps to reproduce:

Example: Go to the electrical engineering course from Professor Don H. Johnson at Rice University

https://class.coursera.org/eefun-001/lecture/index
You will see that several of the files to download have no file name in their URL.

For example, in week 1 these 8 files are not downloaded:

http://cnx.org/content/m0000/latest/

http://cnx.org/content/m0001/latest/

http://cnx.org/content/m0003/latest/

http://cnx.org/content/m0004/latest/

http://cnx.org/content/m0008/latest/

http://cnx.org/content/m0081/latest/

http://cnx.org/content/m0005/latest/

http://cnx.org/content/m0006/latest/

These are valid HTML web pages that can be downloaded
(just open any of these 8 URLs in a browser such as Firefox or Microsoft Internet Explorer,
and the page opens successfully),
but the latest version of coursera-dl.py (downloaded Sunday 10 March 2013)
does not download them.
The command line was similar to:

python.exe coursera_dl.py -u yourusername -p yourpassword eefun-001

Result: in week 1 there are 24 files to download, but only 16 are downloaded. Exactly these 8 files, whose URLs end in a directory rather than a file name, are skipped.
Expected: these 8 files should be downloaded as well.

Thanks
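
One way to handle URLs that end in a directory is to derive a local file name from the path segments. The sketch below is only an illustration under that assumption. `filename_from_url` and the `.html` default are hypothetical choices, not what coursera-dl does.

```python
from urllib.parse import urlparse


def filename_from_url(url, default_ext=".html"):
    """Build a local file name even when the URL has no file component.

    Hypothetical helper, not part of coursera-dl.
    """
    path = urlparse(url).path
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "index" + default_ext
    # A trailing slash or a last segment without a dot is treated as a page.
    if path.endswith("/") or "." not in parts[-1]:
        # Join all segments so that .../m0000/latest/ and .../m0001/latest/
        # do not collide on the same name.
        return "_".join(parts) + default_ext
    return parts[-1]


if __name__ == "__main__":
    print(filename_from_url("http://cnx.org/content/m0000/latest/"))
```

Joining every path segment keeps the eight week 1 pages distinct, since they differ only in the module number.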

SSL problem with downloading videos (only) from nlp-class

Hi,

I am unable to download videos from the nlp course website. I have tried recreating the cookies and changing browsers, but nothing worked. The backtrace is pasted below:

Downloaded http://class.coursera.org/nlp/lecture/index (174982 bytes)
Week_1_-_Course_Introduction
   Course_Introduction
     None https://class.coursera.org/nlp/lecture/view?lecture_id=124 
     pptx https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2Fintro.pptx
     pdf https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2Fintro.pdf
     txt https://class.coursera.org/nlp/lecture/subtitles?q=124_en&format=txt
     srt https://class.coursera.org/nlp/lecture/subtitles?q=124_en&format=srt
     mp4 https://class.coursera.org/nlp/lecture/download.mp4?lecture_id=124

(trimmed)

   Evaluating_Search_Engines
     None https://class.coursera.org/nlp/lecture/view?lecture_id=190 
     pptx https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2F05-02-09-IR-EvalSearchEngines-abridged.pptx
     pdf https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2F05-02-09-IR-EvalSearchEngines-abridged.pdf
     mp4 https://class.coursera.org/nlp/lecture/download.mp4?lecture_id=190
Found 19 sections and 87 lectures on this page
NLP_01_Week_1_-_Course_Introduction/01_Course_Introduction.pptx
Downloading https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2Fintro.pptx -> NLP_01_Week_1_-_Course_Introduction/01_Course_Introduction.pptx
Traceback (most recent call last):
  File "/home/abhinav/development/coursera/coursera-dl", line 235, in <module>
    main()
  File "/home/abhinav/development/coursera/coursera-dl", line 231, in main
    args.lecture_filter
  File "/home/abhinav/development/coursera/coursera-dl", line 145, in download_lectures
    download_file(url, lecfn, cookies_file, wget_bin)
  File "/home/abhinav/development/coursera/coursera-dl", line 155, in download_file
    download_file_nowget(url, fn, cookies_file)
  File "/home/abhinav/development/coursera/coursera-dl", line 171, in download_file_nowget
    urlfile = get_opener(cookies_file).open(url)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:504: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

"IndexError: list index out of range" error running coursera-dl

The following error consistently arises when running the program for several courses:

python coursera/coursera-dl compfinance-002
Downloading class: compfinance-002
Downloaded http://class.coursera.org/compfinance-002/lecture/index (208676 bytes)
Introduction
  Welcome_to_Introduction_to_Computational_Finance_and_Financial_Econometrics
Week_1-_Time_Value_of_Money
  1.0_Week_1_Introduction
Week_1-_Simple_Returns
  1.1_Future_Value_Present_Value_and_Compounding
  1.2_Asset_Returns
  1.3_Portfolio_Returns
  1.4_Dividends
  1.5_Inflation
  1.6_Annualizing_Returns
Week_1-_Continuously_Compounded_Returns
  1.7_Continuously_Compounded_Returns
  1.8_CC_Portfolio_Returns_and_Inflation
Week_1-_Excel_Examples
  1.9_Simple_Returns
  1.10_Getting_Financial_Data_from_Yahoo
  1.11_Return_Calculations
  1.12_Growth_of_1
Week_2-_Probability_Review
  2.0_Week_2_Introduction
  2.1_Univariate_Random_Variables
  2.2_Cumulative_Distribution_Function
  2.3_Quantiles
  2.4_Standard_Normal_Distribution
  2.5_Expected_Value_and_Standard_Deviation
  2.6_General_Normal_Distribution
  2.7_Standard_Deviation_as_a_Measure_of_Risk
  2.8_Normal_Distribution-_Appropriate_for_simple_returns
  2.9_Skewness_and_Kurtosis
  2.10_Students-t_Distribution
  2.11_Linear_Functions_of_Random_Variables
Week_2-_Example
  2.12_Value_at_Risk
Traceback (most recent call last):
  File "coursera/coursera-dl", line 709, in <module>
    main()
  File "coursera/coursera-dl", line 703, in main
    download_class(args, class_name)
  File "coursera/coursera-dl", line 671, in download_class
    or tmp_cookie_file, args.reverse)
  File "coursera/coursera-dl", line 277, in parse_syllabus
    section_name = clean_filename(stag.contents[0].contents[1])
IndexError: list index out of range

Inconsistent output written to stdout/stderr

It used to be that everything was written to stdout. Now some things are written to stdout (like the number of bytes being downloaded), while the line with the filename of what is being downloaded is written to stderr. I'm not sure why the change was made. It seemed more consistent when everything went to stdout.

How did you get cookies by hand (using wget)?

How did you get cookies by hand (using wget) before you decided to write this tool? Or have you always exported cookies from Firefox?

Problem downloading videos

Trying to download material from class econ1scientists-2012-001.
coursera-dl current as of 2013-01-28

Output from coursera-dl -u user -p password econ1scientists-2012-01:
https://gist.github.com/4655499

Bad cookies file or wrong class name - Coursera platform redesign

./coursera-dl -u -p wh1300-2012-001 -f "mp4 pdf"

Downloaded http://class.coursera.org/wh1300-2012-001/lecture/index (218312 bytes)

Found 0 sections and 0 lectures on this page
Probably bad cookies file (or wrong class name)

-- I keep getting this error on any course I try downloading. I opened the lecture/index URL in Safari and it displays just fine. As raszpl pointed out, the most likely cause is the platform redesign.

on Windows: Videos don't play after download

This happened with the nlp class specifically. I am going to re-export the cookies.txt file and try again. I'm wondering whether the video is not being decrypted correctly.

on Ubuntu: it doesn't like the cookies file produced by Export Cookies FF extension

Using the cookies.txt file saved by the Export Cookies FF extension I get:
$ ./coursera-dl saas -c ./cookies.txt
Downloaded http://class.coursera.org/saas/lecture/index (14530 bytes)
Found 0 sections and 0 lectures on this page
Probably bad cookies file (or wrong class name)

However, if I download the index with wget using the same cookies file and then pass the -w parameter to coursera-dl, it downloads happily, so I think something is wrong with the handling of cookies in coursera-dl.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric

ii python 2.7.2-7ubuntu2
ii python-argparse 1.1-1ubuntu1
ii python-beautifulsoup 3.2.0-2

A question about python code about extracting videos from coursera.

I have written some Python code to extract videos from Coursera, but the code below does not work.
It raises the error "urllib.error.HTTPError: HTTP Error 403: FORBIDDEN".
I know jplehmann / coursera is a popular tool for Coursera, and I hope you can help me.
Thank you very much!

import http.cookiejar
import urllib.parse
import urllib.request

login_page = "https://www.coursera.org/account/signin"

def set_cookie(username, password):
    # Cookie jar shared by every request made through this opener.
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    # Form fields as posted in the report; the site may expect other names.
    values = {"signin-email": username,
              "signin-password": password,
              "login:": "Login"}
    data = urllib.parse.urlencode(values)
    binary_data = data.encode(encoding='utf-8', errors='strict')
    headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6"}
    # POST the credentials; any cookies in the response land in cj.
    req = urllib.request.Request(login_page, binary_data, headers)
    opener.open(req)

    # Fetch the home page with the same opener and save it for inspection.
    with open("1.txt", encoding='utf-8', mode='w') as record_file:
        op = opener.open("https://www.coursera.org")
        record_file.write(op.read().decode('utf-8'))

Support preview pages for past courses

Past courses like:

https://class.coursera.org/modelthinking/lecture/preview

offer a preview page which contains all the lectures of the course. As time goes by there will be more and more courses with this condition and it would be great if your script supported them.

Support downloading annotated pdf files.

Many lectures offer both an annotated and a non-annotated pdf file. It would be very useful if the downloader could download both of them.

Downloading Issue

Following is the error I get when I download any course. Please let me know if anyone has any idea. I am using Python 2.6.6.

compmethods-2012-001\31_Week_10-Lecture_28-_Global_normal_forms_of_bifurcation_structures_in_PDEs\04_W10_L28_P4-reduction_of_a_neuro-sensory_systems.srt already downloaded
Traceback (most recent call last):
  File "./coursera_dl.py", line 820, in <module>
    main()
  File "./coursera_dl.py", line 810, in main
    if download_class(args, class_name):
  File "./coursera_dl.py", line 790, in download_class
    args.verbose_dirs,
  File "./coursera_dl.py", line 454, in download_lectures
    if time.time() - last_update > datetime.timedelta(days=30).total_seconds():
AttributeError: 'datetime.timedelta' object has no attribute 'total_seconds'
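
`timedelta.total_seconds()` was added in Python 2.7, which matches the Python 2.6.6 mentioned in this report. A minimal sketch of an equivalent calculation that does not rely on that method follows, using the formula given in the Python documentation. The helper name `total_seconds` is just an illustration.

```python
import datetime


def total_seconds(delta):
    """Seconds in a timedelta, computed without timedelta.total_seconds()."""
    whole_seconds = delta.seconds + delta.days * 24 * 3600
    # Dividing by a float keeps the fraction on Python 2 as well.
    return (delta.microseconds + whole_seconds * 10**6) / 10.0**6


if __name__ == "__main__":
    print(total_seconds(datetime.timedelta(days=30)))
```

Upgrading to Python 2.7 or later would avoid the need for such a helper.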

Utilize Requests for better session handling

Replace urllib2.

https://github.com/kennethreitz/requests

Can't find the file with the code itself

__init__.py and coursera_dl.py contain only HTML code. Where is the file with the code to run?

Getting files from Syllabus page

Hi,

Is there any way to download the files from the Syllabus page?

Thanks!

Not being able to download any video

I am having trouble downloading any video. This is the error I receive:
Traceback (most recent call last):
  File "coursera-dl", line 1, in <module>
    coursera/coursera_dl.py
NameError: name 'coursera' is not defined
I have downloaded the latest version of coursera-dl, but the problem does not go away. I am giving it the right username and password. Can someone tell me what I am doing wrong? Thank you.

Regards,

Ramana

Problem downloading course files, suspect not making use of http_proxy

Using the latest version of the script to access the dataanalysis-001 lectures, I get:
searlernz:~/coursera/data_analysis/lectures$ python ../../coursera-master/coursera/coursera_dl.py -u username -p pass --curl_bin /usr/bin/curl --debug dataanalysis-001
root[main] Downloading class: dataanalysis-001
Traceback (most recent call last):
File "../../coursera-master/coursera/coursera_dl.py", line 709, in <module>
main()
File "../../coursera-master/coursera/coursera_dl.py", line 703, in main
download_class(args, class_name)
File "../../coursera-master/coursera/coursera_dl.py", line 667, in download_class
or tmp_cookie_file, args.local_page)
File "../../coursera-master/coursera/coursera_dl.py", line 225, in get_syllabus
page = get_page(url, cookies_file)
File "../../coursera-master/coursera/coursera_dl.py", line 201, in get_page
ret = opener.open(url).read()
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 438, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 625, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

My http_proxy environment variable is set and I can access the course index URL from firefox without difficulty.
Fails with or without --curl_bin option.
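
The traceback passes through `http_error_302` and then `https_open`, so the refused connection is an HTTPS request made after a redirect. Python's urllib openers pick a proxy per URL scheme, which means that `http_proxy` alone may not cover `https` URLs. This is a guess at the cause, not a confirmed diagnosis. A small sketch for checking which schemes have a proxy configured follows (written for Python 3; `missing_proxy_schemes` is a hypothetical helper).

```python
import urllib.request


def missing_proxy_schemes(proxies, schemes=("http", "https")):
    """Return the schemes that have no proxy in a scheme-to-URL mapping."""
    return [s for s in schemes if not proxies.get(s)]


if __name__ == "__main__":
    # getproxies() reads variables such as http_proxy and https_proxy.
    detected = urllib.request.getproxies()
    print("proxies seen by Python:", detected)
    print("schemes without a proxy:", missing_proxy_schemes(detected))
```

If `https` shows up as missing, setting `https_proxy` next to `http_proxy` would be the first thing to try.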

Extending to other Coursera portals (http://class.stanford.edu/)?

http://class.stanford.edu/solar/Fall2012/
http://class.stanford.edu/networking/Fall2012

Those use "Class2Go" (which looks like the Coursera portal framework), but they don't work with the script. Is there a chance of extending or modifying the code to make it work? I'm such a slave to downloadable videos that I couldn't make myself participate and watch through the web.

Syntax error: invalid syntax <!DOCTYPE html> line 4

Hi,
I am getting this error while running the coursera-dl script to download the scientific computing class: https://class.coursera.org/scientificcomp-002/class/index

Bash shell command used: python ./coursera_dl.py -u email-address -p password scientificcomp-002

Error obtained:

  File "./coursera_dl.py", line 4
    <!DOCTYPE html>
    ^
SyntaxError: invalid syntax

Please provide your suggestion.

Thank you
Ron

Index Out Of Range Error

I get the following error running the latest code to date with this command:
./coursera-dl compfinance-2012-001 -u -p
(where user and pw are filled in)

Downloaded http://class.coursera.org/compfinance-2012-001/lecture/index (162511 bytes)
Introduction
Welcome_to_Introduction_to_Computational_Finance_and_Financial_Econometrics
None https://class.coursera.org/compfinance-2012-001/lecture/31
Traceback (most recent call last):
  File "./coursera-dl", line 308, in <module>
    main()
  File "./coursera-dl", line 292, in main
    sections = parse_syllabus(page, args.cookies_file or tmp_cookie_file)
  File "./coursera-dl", line 145, in parse_syllabus
    href = grab_hidden_video_url(a['data-lecture-view-link'], cookies_file)
  File "./coursera-dl", line 87, in grab_hidden_video_url
    return l[0]['src']
IndexError: list index out of range

Downloaded videos are all 7579 bytes

I'm running Python2.7.3 on Arch Linux. Everything works fine, and I can download the other files (pdf, pptx), but the mp4 files are all unplayable.

Here's what the console is showing

ALGO_01_I._INTRODUCTION/01_Introduction_-_Why_Study_Algorithms_.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=20 ->  ALGO_01_I._INTRODUCTION/01_Introduction_-_Why_Study_Algorithms_.mp4
7579 bytes read .
ALGO_01_I._INTRODUCTION/02_About_the_Course.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=21 ->  ALGO_01_I._INTRODUCTION/02_About_the_Course.mp4
7579 bytes read .
ALGO_01_I._INTRODUCTION/03_Merge_Sort-_Motivation_and_Example.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=1 ->   ALGO_01_I._INTRODUCTION/03_Merge_Sort-_Motivation_and_Example.mp4
7578 bytes read .
ALGO_01_I._INTRODUCTION/04_Merge_Sort-_Pseudocode.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=2 ->     ALGO_01_I._INTRODUCTION/04_Merge_Sort-_Pseudocode.mp4
7578 bytes read .
ALGO_01_I._INTRODUCTION/05_Merge_Sort-_Analysis.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=3 ->     ALGO_01_I._INTRODUCTION/05_Merge_Sort-_Analysis.mp4

They're all almost exactly the same size (7578 or 7579 bytes), and I can't figure out why.
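
Files of a few kilobytes that are all nearly the same size are often an HTML page (for example a login or error page) saved under an .mp4 name. That is a guess about this report, not a confirmed cause. A quick check is to look at the first bytes: MP4 files normally begin with a box whose type field, at bytes 4 to 8, is `ftyp`. The helper names below are hypothetical.

```python
def looks_like_mp4(path):
    """True if the file starts with an ISO media 'ftyp' box."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) >= 8 and header[4:8] == b"ftyp"


def looks_like_html(path):
    """True if the file appears to start with an HTML document."""
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

Running `looks_like_html` on one of the 7579-byte files would show whether the server returned a web page instead of the video.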

Crash if file not accessible

The Modelthinking course has file(s?) that can't be read. This gives an exception, and the whole download aborts.

To change this, I added a catch-all clause after line 159 in the method download_file(..):

159 sys.exit()
+160 except:
+161 print "\nXXXX Didnt work -- Removing partial file:", fn

Thanks for the downloader! This was a big help.
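
The same idea can be sketched as a small wrapper that removes the partial file and lets the caller continue with the next download. This is an illustration, not the script's code: `download_with_cleanup` and the `fetch` callback are hypothetical names. It catches `Exception` rather than using a bare `except`, so that Ctrl-C and `sys.exit()` still stop the program.

```python
import os


def download_with_cleanup(fetch, path):
    """Run fetch(path); on failure delete the partial file and return False."""
    try:
        fetch(path)
        return True
    except Exception as exc:
        print("Download failed (%s); removing partial file: %s" % (exc, path))
        if os.path.exists(path):
            os.remove(path)
        return False
```

A caller can loop over all lectures and collect the paths for which this returns `False`, instead of aborting on the first unreadable file.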

Getting urllib2.URLError: <urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol>

Traceback (most recent call last):
File "./coursera-dl.py", line 235, in <module>
main()
File "./coursera-dl.py", line 231, in main
args.lecture_filter
File "./coursera-dl.py", line 145, in download_lectures
download_file(url, lecfn, cookies_file, wget_bin)
File "./coursera-dl.py", line 155, in download_file
download_file_nowget(url, fn, cookies_file)
File "./coursera-dl.py", line 171, in download_file_nowget
urlfile = get_opener(cookies_file).open(url)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol>
vijayram@ubuntu:~/coursera/coursera-1$

It doesn't download videos

Hello,

I'm using openSUSE 11.4 to download the courses with the following arguments:
python coursera-dl compfinance-2012-001 -u -p

What works and what doesn't (for me)

Works: the .srt and .txt files are parsed and downloaded successfully.
Doesn't work: the videos are not downloaded.

log:
Downloaded http://class.coursera.org/compfinance-2012-001/lecture/index (75353 bytes)
Introduction
Welcome_to_Introduction_to_Computational_Finance_and_Financial_Econometrics
None https://class.coursera.org/compfinance-2012-001/lecture/31
Week_1-_Time_Value_of_Money
1.0_Week_1_Introduction
None https://class.coursera.org/compfinance-2012-001/lecture/29
1.1_Future_Value_Present_Value_and_Compounding
None https://class.coursera.org/compfinance-2012-001/lecture/13
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=13_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=13_en&format=srt
Week_1-_Simple_Returns
1.2_Asset_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/3
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=3_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=3_en&format=srt
1.3_Portfolio_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/12
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=12_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=12_en&format=srt
1.4_Dividends
None https://class.coursera.org/compfinance-2012-001/lecture/6
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=6_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=6_en&format=srt
1.5_Inflation
None https://class.coursera.org/compfinance-2012-001/lecture/11
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=11_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=11_en&format=srt
1.6_Annualizing_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/2
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=2_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=2_en&format=srt
Week_1-_Continuously_Compounded_Returns
1.7_Continuously_Compounded_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/5
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=5_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=5_en&format=srt
1.8_CC_Portfolio_Returns_and_Inflation
None https://class.coursera.org/compfinance-2012-001/lecture/4
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=4_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=4_en&format=srt
... etc ...
Found 12 sections and 56 lectures on this page
COMPFINANCE-2012-001_02_Week_1-_Time_Value_of_Money/02_1.1_Future_Value_Present_Value_and_Compounding.txt
COMPFINANCE-2012-001_02_Week_1-_Time_Value_of_Money/02_1.1_Future_Value_Present_Value_and_Compounding.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/01_1.2_Asset_Returns.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/01_1.2_Asset_Returns.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/02_1.3_Portfolio_Returns.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/02_1.3_Portfolio_Returns.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/03_1.4_Dividends.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/03_1.4_Dividends.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/04_1.5_Inflation.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/04_1.5_Inflation.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/05_1.6_Annualizing_Returns.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/05_1.6_Annualizing_Returns.srt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/01_1.7_Continuously_Compounded_Returns.txt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/01_1.7_Continuously_Compounded_Returns.srt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/02_1.8_CC_Portfolio_Returns_and_Inflation.txt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/02_1.8_CC_Portfolio_Returns_and_Inflation.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/01_1.9_Simple_Returns.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/01_1.9_Simple_Returns.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/02_1.10_Getting_Financial_Data_from_Yahoo.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/02_1.10_Getting_Financial_Data_from_Yahoo.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/03_1.11_Return_Calculations.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/03_1.11_Return_Calculations.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/04_1.12_Growth_of_1.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/04_1.12_Growth_of_1.srt

Any help would be greatly appreciated.

Thanks a lot, I think this idea is just great! Congrats!

Password complexity issue

When I first tried this script with the -u and -p args, I got an error:

bash: !an < rest of the password >: event not found

When I tried with the .netrc file, I got a bad cookie or wrong credentials error, even though the credentials were right. I was about to post the issue at that point, but I went through the code and found these lines:

if args.username and not args.password and not args.netrc:
  args.password = getpass.getpass("Coursera password for %s: " % args.username)

So I just entered my username via -u and the name of the class, without my password. I got the prompt for the password, and after I entered it the download started normally. So I guess there is a problem with parsing complex passwords. I'm not much of a coder, so I'm not sure what exactly the issue is; hopefully you guys will know and improve this script!

Good script, it's a lifesaver! Cheers! :)
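
The "event not found" message is printed by bash itself: an unquoted `!` triggers history expansion, so the command fails in the shell before the script ever sees the password. Wrapping the password in single quotes prevents that. A short sketch using the standard `shlex.quote` function shows the safely quoted form for a made-up password.

```python
import shlex

# Made-up example password containing characters that bash treats specially.
password = "pa!an$word"

# shlex.quote wraps the string in single quotes when it is not shell-safe.
quoted = shlex.quote(password)
print("coursera-dl -u user@example.com -p %s classname" % quoted)
```

Leaving out -p and typing the password at the prompt, as described above, avoids the quoting question entirely.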

[Question] Does the script overwrite already downloaded video?

Hi,

I might be using the script multiple times on the same course, for instance when a new week of videos is put online. Will the script skip already downloaded sections, or will it try to download everything from the start?

Script worked great to get me proglang videos :) Thanks a ton.

resume partially downloaded files

Currently, partially downloaded files are deleted when the user stops the script. Resuming them would be better, and other error conditions that leave partial files should be considered too.
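
One possible approach is an HTTP range request: ask the server for the bytes that follow what is already on disk, then append them. The sketch below only builds the request header. `resume_headers` is a hypothetical helper, and it assumes the server honours `Range` (a 206 response). If the server answers 200 instead, the file has to be downloaded again from the start.

```python
import os


def resume_headers(path):
    """Return a Range header that continues after the bytes already on disk."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    if size == 0:
        # Nothing to resume: request the whole file.
        return {}
    return {"Range": "bytes=%d-" % size}
```

The partial file would then be opened in append mode ("ab") so the new bytes are written after the existing ones.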

Unable to download anything (authentication failure)

It appears that the Coursera folk have changed where you access the videos again. The script seems to log in OK, but it finds "0 sections and 0 lectures". I tried both with the netrc file and with explicitly giving my username and password.

Probably bad cookies file (or wrong class name)

A related issue is here: https://github.com/jplehmann/coursera/issues/74. Sorry for the duplicate.

Coursera-dl previously worked, and I've already downloaded part of the data analysis course. I tried to continue downloading today and got the following error.
sudo python coursera-dl dataanalysis-001 -u ***** -p *****
Downloading class: dataanalysis-001
Downloaded http://class.coursera.org/dataanalysis-001/lecture/index (5332 bytes)
Found 0 sections and 0 lectures on this page
Probably bad cookies file (or wrong class name)

I am running the newest coursera-dl script, and I also tried to download the innovation-001 course, with the same error.

Please help.

Error in downloading!

$ coursera-dl algo2-2012-001 -u sarthaksahu****@gmail.com -p xxx
usage: coursera_dl.py [-h] (-c COOKIES_FILE | -u USERNAME | -n) [-p PASSWORD]
[-f FILE_FORMATS] [-sf SECTION_FILTER]
[-lf LECTURE_FILTER] [-w WGET_BIN] [--curl_bin CURL_BIN]
[--aria2_bin ARIA2_BIN] [-o] [-l LOCAL_PAGE]
[--skip-download] [--path PATH] [--verbose-dirs]
[--debug] [--quiet] [--add-class ADD_CLASS]
class_names [class_names ...]
coursera_dl.py: error: too few arguments

Supply a template or txt file with course names for easy lookup

When I tried to use the otherwise awesome script, I had to go and look up the names of all the courses I wanted in the course list. So I made a little txt file with the URL handle and the name of each course, which I could then easily copy into the command line.
Perhaps it would be an idea to maintain a list of all the courses?

Past courses

neuralnets-2012-001 Neural Networks for Machine Learning
sciwrite-2012-001 Writing in the Sciences
progfun-2012-001 Functional Programming Principles in Scala
maththink-2012-001 Introduction to Mathematical Thinking
bigdata-2012-001 Web Intelligence and Big Data
healthpolicy-2012-001 Health Policy and the Affordable Care Act
intrologic Introduction to Logic
compilers Compilers
automata Automata
gametheory Game Theory
crypto Cryptography I

Current courses (possibly incomplete)

algo2-2012-001 Algorithms: Design and Analysis, Part 2
thinkagain-2012-001 Think Again: How to Reason and Argue
hetero-2012-001 Heterogeneous Parallel Programming
compmethods-2012-001 Computational Methods for Data Analysis
precalculus-001 Pre-Calculus
algebra-001 Algebra
proglang-2012-001 Programming Languages
calcsing-2012-001 Calculus in a Single Variable

Authentication via cookies doesn't work anymore

Hi.

It seems that Coursera has changed its site: it now requires a session cookie, and the trick of exporting cookies from the browser doesn't work anymore.

OTOH, using wiedi/coursera@38c92a2 makes things work again.

Well, I actually pulled all of @wiedi's patches, but reverted the ones that tweaked the naming of the files, as I prefer how things currently are. :)

BTW, have you considered getting the code in our youtube-dl tree?

Regards.

Newbie problem

This is surely exposing my extreme lack of experience with such things, but I have two problems with running this wonderful script:

I get this error message when running the module... have I missed where to input my login info/ download command?

"usage: Python batch downloader.py [-h](-c COOKIES_FILE | -u USERNAME | -n)
[-p PASSWORD] [-f FILE_FORMATS]
[-sf SECTION_FILTER] [-lf LECTURE_FILTER]
[-w WGET_BIN] [-o] [-l LOCAL_PAGE]
[--skip-download]
class_name
Python batch downloader.py: error: too few arguments

Traceback (most recent call last):
File "D:/Documents/Desktop/Coursera/Python batch downloader.py", line 309, in <module>
main()
File "D:/Documents/Desktop/Coursera/Python batch downloader.py", line 289, in main
args = parseArgs()
File "D:/Documents/Desktop/Coursera/Python batch downloader.py", line 272, in parseArgs
args = parser.parse_args()
File "C:\Python27\lib\argparse.py", line 1688, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "C:\Python27\lib\argparse.py", line 1720, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "C:\Python27\lib\argparse.py", line 1937, in _parse_known_args
self.error(_('too few arguments'))
File "C:\Python27\lib\argparse.py", line 2347, in error
self.exit(2, _('%s: error: %s\n') % (self.prog, message))
File "C:\Python27\lib\argparse.py", line 2335, in exit
_sys.exit(status)
SystemExit: 2"

When typing the command in the python shell (after running the module):
coursera-dl -u <(with my email)> -p <(with my password)> progfun-2012-001

it says syntax error at the "@" of my email.

I'm sure I'm missing something obvious, but I have nonetheless spent too much time (admittedly randomly) trying different ways of making this work.

Thank you so much for your help!

Tashi

/usr/lib/python2.6/_MozillaCookieJar.py:109: UserWarning: cookielib bug!

log is

/usr/lib/python2.6/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/usr/lib/python2.6/_MozillaCookieJar.py", line 99, in _really_load
    {})
  File "/usr/lib/python2.6/cookielib.py", line 738, in __init__
    if expires is not None: expires = int(expires)
ValueError: invalid literal for int() with base 10: '1349111445.24318'

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "./coursera-dl", line 235, in <module>
    main()
  File "./coursera-dl", line 220, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "./coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "./coursera-dl", line 49, in get_page
    opener = get_opener(cookies_file)
  File "./coursera-dl", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/usr/lib/python2.6/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'www.coursera.org\tFALSE\t/\tFALSE\t1349111445.24318\tsessionid\t80b3f5ab0bf5fe0e19c7383606de7072'

Chrome's `cookie.txt export` plugin does not produce usable cookie file.

I've solved my issue by using a different cookie export plugin in Firefox. Copy-and-pasting from the Chrome plugin does not produce a usable file, even when tabs are preserved.

mike@*****:/*****$ ./coursera-dl/coursera-dl -c cookies.txt ml
/usr/lib/python2.7/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/usr/lib/python2.7/_MozillaCookieJar.py", line 99, in _really_load
    {})
  File "/usr/lib/python2.7/cookielib.py", line 739, in __init__
    if expires is not None: expires = int(expires)
ValueError: invalid literal for int() with base 10: '1344973754.473932'

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "./coursera-dl/coursera-dl", line 235, in <module>
    main()
  File "./coursera-dl/coursera-dl", line 220, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "./coursera-dl/coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "./coursera-dl/coursera-dl", line 49, in get_page
    opener = get_opener(cookies_file)
  File "./coursera-dl/coursera-dl", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/usr/lib/python2.7/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'www.coursera.org\tFALSE\t/\tFALSE\t1344973754.473932\tsessionid\t28eeaeea129425a90f0f08bfff38ea38'
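
The traceback shows the parser calling `int()` on an expiry value with a fractional part (`1344973754.473932`). A minimal workaround sketch, assuming the exported file is otherwise in the seven-field, tab-separated Netscape layout, is to truncate that field before loading the file. The helper name `normalize_cookie_line` is hypothetical and not part of coursera-dl.

```python
def normalize_cookie_line(line):
    """Truncate a fractional expiry timestamp in one Netscape cookie line."""
    text = line.rstrip("\n")
    fields = text.split("\t")
    # Data rows have exactly 7 tab-separated fields; the 5th is the expiry.
    if text.startswith("#") or len(fields) != 7:
        return text  # comments, blank lines and malformed rows pass through
    if "." in fields[4]:
        fields[4] = str(int(float(fields[4])))
    return "\t".join(fields)


def normalize_cookie_text(text):
    """Apply normalize_cookie_line to every line of a cookie file."""
    return "\n".join(normalize_cookie_line(l) for l in text.splitlines()) + "\n"
```

Writing the result of `normalize_cookie_text` to a new file and passing that file to `-c` leaves the original export untouched.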

Add support for a continuous integration system

Unfortunately, it seems that I don't have the privileges to make modifications to the repo here, but one thing that we should seriously consider is the use of something like this:

https://travis-ci.org/rbrito/coursera

The commit that created the configuration was:

https://github.com/rbrito/coursera/commit/b224009adcdbde61046ae4b907714dfbb7973a4d

It is so cool to see the build lights turning green... :)

Download failing when video is missing

John,

One of the Science Writing videos cannot be loaded. When the script hits this video it errors and stops and does not download the later videos. Here is the error.

Great tool. I use it all the time.

Best,
Vivek

SCIWRITE-2012-001_04_Unit_4/07_4.7-_Upcoming_Writing_and_Editing_Assignment.mp4
Downloading https://class.coursera.org/sciwrite-2012-001/lecture/download.mp4?lecture_id=59 -> SCIWRITE-2012-001_04_Unit_4/07_4.7-_Upcoming_Writing_and_Editing_Assignment.mp4
Traceback (most recent call last):
File "coursera-dl", line 308, in <module>
main()
File "coursera-dl", line 302, in main
args.lecture_filter
File "coursera-dl", line 193, in download_lectures
download_file(url, lecfn, cookies_file, wget_bin)
File "coursera-dl", line 203, in download_file
download_file_nowget(url, fn, cookies_file)
File "coursera-dl", line 219, in download_file_nowget
urlfile = get_opener(cookies_file).open(url)
File "/usr/lib/python2.6/urllib2.py", line 397, in open
response = meth(req, response)
File "/usr/lib/python2.6/urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.6/urllib2.py", line 435, in error
return self._call_chain(*args)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
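
One way to keep a single failing lecture from stopping the whole run is to catch the HTTP error per file and continue. The following is only a sketch under assumptions: it uses Python 3's `urllib.error` (the traceback above comes from Python 2's `urllib2`), and `download_all` and its `fetch` callback are hypothetical names, not functions from coursera-dl.

```python
import urllib.error


def download_all(urls, fetch):
    """Call fetch(url) for each URL; collect HTTP failures instead of aborting."""
    failed = []
    for url in urls:
        try:
            fetch(url)
        except urllib.error.HTTPError as err:
            # Remember the failure and move on to the next lecture.
            failed.append((url, err.code))
    return failed
```

The returned list can be printed at the end of the run so that missing videos are reported once, after everything else has been downloaded.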

Cookies Load error with Mac/Chrome

Got this error on mac:

$ ./coursera-dl saas -c cookies.txt
/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/_MozillaCookieJar.py", line 99, in _really_load
{})
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/cookielib.py", line 739, in __init__
if expires is not None: expires = int(expires)
ValueError: invalid literal for int() with base 10: '1334338629.053531'

_warn_unhandled_exception()
Traceback (most recent call last):
File "./coursera-dl", line 235, in <module>
main()
File "./coursera-dl", line 220, in main
page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
File "./coursera-dl", line 56, in get_syllabus
page = get_page(url, cookies_file)
File "./coursera-dl", line 49, in get_page
opener = get_opener(cookies_file)
File "./coursera-dl", line 44, in get_opener
cj._really_load(cookies, "StringIO.cookies", False, False)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/_MozillaCookieJar.py", line 111, in _really_load
(filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'developers.google.com\tFALSE\t/\tFALSE\t1334338629.053531\tsessionid\t60b9268cf45b27ee8af338942880ef36'

cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'Name: csrf_token'

vijayram@ubuntu:~/coursera/coursera-1$ ./coursera-dl.py -c ../class.coursera.org_csrf_token.txt nlp
/usr/lib/python2.7/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
File "/usr/lib/python2.7/_MozillaCookieJar.py", line 71, in _really_load
line.split("\t")
ValueError: need more than 1 value to unpack

_warn_unhandled_exception()
Traceback (most recent call last):
File "./coursera-dl.py", line 235, in <module>
main()
File "./coursera-dl.py", line 220, in main
page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
File "./coursera-dl.py", line 56, in get_syllabus
page = get_page(url, cookies_file)
File "./coursera-dl.py", line 49, in get_page
opener = get_opener(cookies_file)
File "./coursera-dl.py", line 44, in get_opener
cj._really_load(cookies, "StringIO.cookies", False, False)
File "/usr/lib/python2.7/_MozillaCookieJar.py", line 111, in _really_load
(filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'Name: csrf_token'
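
The rejected line, `Name: csrf_token`, has no tab-separated fields at all, which suggests the file was exported in a "name/value" text layout rather than the Netscape layout that the parser unpacks into seven fields. A small sketch for checking a file before passing it to `-c` follows; `find_bad_cookie_lines` is a hypothetical helper, not part of coursera-dl.

```python
def find_bad_cookie_lines(text):
    """Return (line_number, line) pairs that are not 7-field Netscape rows."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if len(line.split("\t")) != 7:
            bad.append((number, line))
    return bad


if __name__ == "__main__":
    sample = "Name: csrf_token\nValue: abc\n"
    for number, line in find_bad_cookie_lines(sample):
        print("line %d is not a Netscape cookie row: %r" % (number, line))
```

An empty result only means the field count is right; it does not check the values themselves.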

New feature - PDFs of quizes and other materials

This is clearly a new feature (one that would be nice, I think). Currently, to keep copies of the quizzes, I go in with Chrome and then print with "save to PDF". I do the same with the class syllabus and other materials. It would be nice if this could be automated in this program. I know someone who did a version of this in a different Python Coursera downloader and added functionality using wkhtmltopdf to convert HTML to PDF format. They would find the quizzes, download them as HTML files, and then do the conversion. Unfortunately, I found that wkhtmltopdf blew up (threw an exception) on my Windows box. It would be nice if it would also PDF the syllabus, etc. One last thing to point out (should you decide to do this): the announcements (aka "home") page typically changes at least once per week, so it might be good to recreate it every time.
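
A rough sketch of the conversion step described above, assuming the HTML pages are already saved locally and that `wkhtmltopdf` is invoked in its basic `wkhtmltopdf <input> <output>` form. The function names are hypothetical, and the sketch does nothing about the crash reported on Windows.

```python
import shutil
import subprocess


def html_to_pdf_command(html_path, pdf_path, binary="wkhtmltopdf"):
    """Build the argument list for one HTML-to-PDF conversion."""
    return [binary, html_path, pdf_path]


def convert_if_available(html_path, pdf_path):
    """Run the conversion only when the binary is on PATH; report success."""
    cmd = html_to_pdf_command(html_path, pdf_path)
    if shutil.which(cmd[0]) is None:
        return False  # converter not installed, nothing was done
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError:
        return False  # converter failed; keep the HTML copy instead
    return True
```

Keeping the downloaded HTML next to the PDF means a failed conversion still leaves a usable copy.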

not all resources downloaded

If multiple files in a resource have the same extension, they are not all downloaded; only the last one is.
This is probably a bug in parse_syllabus.

Please look into it.
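
The symptom is what one would expect if resources were stored in a mapping keyed by extension, where each new file with the same extension replaces the previous one. That is an assumption about the cause, not a reading of the actual `parse_syllabus` code. The sketch below shows the overwrite and one way around it; `group_by_extension` is a hypothetical name.

```python
from collections import defaultdict


def group_by_extension(resources):
    """Map each extension to the list of all URLs that carry it."""
    grouped = defaultdict(list)
    for ext, url in resources:
        grouped[ext].append(url)
    return dict(grouped)


if __name__ == "__main__":
    resources = [("pdf", "slides.pdf"), ("pdf", "notes.pdf"), ("mp4", "video.mp4")]
    # A plain dict keeps only the last value seen for each key.
    print(dict(resources)["pdf"])
    # Grouping into lists keeps every file.
    print(group_by_extension(resources)["pdf"])
```

With lists as values, the download loop has to iterate over every URL for an extension and give each file a distinct local name.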

HTTP Error 400 for new-courses.

I was trying to download new course 'Internet History, Technology and Security' and got this error:

Traceback (most recent call last):
File "./coursera-dl", line 235, in <module>
main()
File "./coursera-dl", line 220, in main
page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
File "./coursera-dl", line 56, in get_syllabus
page = get_page(url, cookies_file)
File "./coursera-dl", line 50, in get_page
return opener.open(url).read()
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

-n broken on windows

I have a version from January 16 and it works fine. However, the current version generates the following error:

coursera_dl.py: error: argument -n/--netrc: expected one argument

I'm executing:

python coursera_dl.py somecourse -n

I don't know why it would expect an argument.

HOME is set to my user directory and there is a .netrc file there (which is why the January 16 version works). The only thing I'm changing is the version of coursera_dl.py. I don't know python, so I haven't looked at the issue.
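
The message `expected one argument` is what argparse prints when an option is declared to take a value and none is given. As an illustration only (this is not the actual coursera_dl.py definition), an option can be declared with `nargs="?"` so that both a bare `-n` and `-n PATH` are accepted:

```python
import argparse


def build_parser():
    """Hypothetical parser showing an option with an optional value."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("class_name")
    # nargs="?" makes the value optional: a bare -n stores const,
    # "-n PATH" stores PATH, and omitting -n stores default.
    parser.add_argument("-n", "--netrc", nargs="?", const=True, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["somecourse", "-n"])
    print(args.class_name, args.netrc)
```

With this declaration a bare `-n` has to come after the course name, as in the command above. Written as `-n somecourse`, the course name would be taken as the option's value.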
