Supply a template or txt file with course names for easy lookup (coursera-dl issue, open)

coursera-dl commented on May 17, 2024

Supply a template or txt file with course names for easy lookup

Comments (4)

rbrito commented on May 17, 2024

Hi, Ivo.

On Mon, Feb 11, 2013 at 1:22 PM, Ivo Flipse wrote:

> When I tried to use the otherwise awesome script I had to go and look up all the names I wanted from the course list.

Well, supposedly, the idea would be to download material from courses
that you already know about (because you are subscribed to them). :)

> So I just made a little txt file with the URL handle and the name of the course, which I could then easily copy into the command line.
> Perhaps it would be an idea to maintain a list of all the courses?

I guess that one of the easiest routes would be to grab this
information from some site that aggregates this (e.g.,
classcentral.com), but this is on the borderline of the scope of
coursera-dl, which is meant for downloads, not discovery...

Furthermore, keeping such lists may need some manual intervention, and
it is not really clear how the script could use them. The
person has to sign up for the courses anyway (and if you try to sign up
for some courses after they are already running or after they have
concluded, you will be denied access).

The reason may be that the course is no longer offered on
Coursera (see, for instance, Jennifer Widom's db course
migrating to Class2Go, Umesh Vazirani's qcomp migrating to edX,
the saas courses moving to edX too, etc.).

And, of course, to have access to the courses, you have to click the
"I accept the honor code" or something like that. I don't intend to
make this particular step automated, for human/awareness reasons.

Please clarify how you intend to keep the list of courses up to date
without the maintainers of the program (John and I) having extra work.
If you are persuasive enough, we may implement your idea. :)

Thanks,

Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

ivoflipse commented on May 17, 2024

I personally only download courses when all the material is available, because otherwise I would have to come back later and download the rest anyway. But I can understand if others use it to download videos to watch them offline or on the go. The issue with course material no longer being available could (hopefully) be caught with an exception when you get an access denied error.

I guess the only workaround I could imagine would be to parse the course page for logged-in users:
https://www.coursera.org/user/i/<user_uuid>
Then check whether the left/width of the "coursera-course-listing-progress" element has reached 100%.
If so, extract the course URL from the "coursera-course-listing-meta" element and try to run the script.
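
A rough sketch of that check, using only the Python standard library. The two class names come from the description above; everything else about the markup is an assumption made for illustration: that the progress element carries an inline style containing "width: NN%", that it appears before the meta element of the same course, and that the first link inside the meta element is the course URL. The names FinishedCourseParser and finished_course_urls are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed style format, e.g. style="left: 0%; width: 100%"
WIDTH_RE = re.compile(r"width\s*:\s*(\d+(?:\.\d+)?)%")


class FinishedCourseParser(HTMLParser):
    """Collect course URLs whose progress bar has reached 100%.

    Assumed (hypothetical) markup per course: a progress element whose
    style carries "width: NN%", followed by a meta element holding a link.
    """

    def __init__(self):
        super().__init__()
        self.finished = False  # did the latest progress bar reach 100%?
        self.in_meta = False   # are we waiting for the meta element's link?
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "coursera-course-listing-progress" in classes:
            match = WIDTH_RE.search(attrs.get("style") or "")
            self.finished = bool(match) and float(match.group(1)) >= 100
        elif "coursera-course-listing-meta" in classes:
            self.in_meta = True
        elif tag == "a" and self.in_meta:
            # Only the first link after the meta element opens is considered
            if self.finished and attrs.get("href"):
                self.urls.append(attrs["href"])
            self.in_meta = False
            self.finished = False


def finished_course_urls(html):
    """Return the URLs of all courses whose progress is at 100%."""
    parser = FinishedCourseParser()
    parser.feed(html)
    return parser.urls
```

If the real page orders or nests these elements differently, the state handling would need to change accordingly.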

But I can understand if this level of automation is outside the scope of the script.

jplehmann commented on May 17, 2024

I've personally been facing a similar issue with the explosion of classes. I have used the following regex:

# extract all the currently open classes I'm enrolled in on a single line, space separated
grepo "class.coursera.org/(.*?)/" courses.html | uniq | paste -s -d" "

where courses.html is the page displayed when you click on "courses" underneath your name in the menu, and "grepo" is a script I wrote which does something like "grep -o" except it outputs only the text matched by the group.
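
For anyone without a "grepo"-style helper, roughly the same extraction can be sketched in Python. This is an approximation rather than a port: the function name enrolled_handles is made up, the dots in the pattern are escaped (the shell one-liner leaves them unescaped), and duplicates are removed everywhere instead of only on adjacent lines as "uniq" does.

```python
import re

# Same capture group as the shell one-liner, with the dots escaped
HANDLE_RE = re.compile(r"class\.coursera\.org/(.*?)/")


def enrolled_handles(html):
    """Return the unique course handles found in html, in first-seen order."""
    # dict.fromkeys keeps insertion order while dropping repeats
    return list(dict.fromkeys(HANDLE_RE.findall(html)))
```

Joining the result with " ".join(...) after reading courses.html gives the single space-separated line that the pipeline above produces.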

ivoflipse commented on May 17, 2024

Inspired by your comment I messed around a little to see if I could extract this information. I couldn't get to my /courses page from a script, so I just downloaded it manually. Automating this would be nice, but it works.

Then I load the page using BeautifulSoup:

# Assuming BeautifulSoup 4 (the bs4 package)
from bs4 import BeautifulSoup

with open("Courses.htm") as page:
    soup = BeautifulSoup(page, "html.parser")
# Find the boxes that contain the course information
course_elements = soup.findAll("div",
    {"class": "coursera-course-listing-box coursera-course-listing-box-wide coursera-account-course-listing-box"})

This gives us a list that contains each of the boxes on the /courses page. From here we can try to extract the relevant information:

# Iterate through each course box
for course in course_elements:
    # The date information is in a span element
    listing_start = course.findAll("span")
    # Some booleans for controlling behavior of the script
    is_course = True
    ended = False

    # Not every box seems to be a course, so we just try to parse it and else fail
    try:
        # There seem to be three different date formats:
        # Courses yet to start
        if "Starts" == listing_start[2].text.split()[0]:
            ending_time = listing_start[2].text
        # Courses that have already ended
        elif "Ended" == listing_start[2].text.split()[0]:
            ending_time = listing_start[2].text
            ended = True
        # Courses that have already started, but not yet ended
        else:
            ending_time = "End date: {}".format(listing_start[2].text)
    except (IndexError, AttributeError):
        # If we can't get the date, flip this boolean, so we don't bother with further parsing
        is_course = False

    # If the current element is a course, print the info
    # If you set this check to ended, it'll only give you info for completed courses
    if is_course:  # and ended:
        course_listing = course.findAll("h3")
        course_name = course_listing[0].text
        # The course URL is the fourth quote-delimited piece of the h3 markup
        course_url = str(course_listing[0]).split("\"")[3]
        split_course_url = course_url.split("/")
        # www.coursera.org/course/<handler> vs. class.coursera.org/<handler>/...
        if split_course_url[3] == "course":
            course_handler = split_course_url[4]
        else:
            course_handler = split_course_url[3]
        print("Course name: {}".format(course_name))
        print("Course handler: {}".format(course_handler))
        print("Course url: {}".format(course_url))
        print(ending_time)
        print()

I added some prints, which aren't really needed, but just show you that you can retrieve the information you'd want. You could either use the url that's passed when you press the green button or use the course name, like your script currently uses. It seems that courses that are no longer accessible have a different url (with the auth part), so that's useful info too.

So depending on the status of the course, you'd get something like this:

Course in progress
Course name: Think Again: How to Reason and Argue
Course handler: thinkagain-2012-001
Course url: https://class.coursera.org/thinkagain-2012-001/auth/auth_redirector?type=login&subtype=normal
End date: Nov 26th

Course not yet started
Course name: Know Thyself
Course handler: knowthyself
Course url: https://www.coursera.org/course/knowthyself
Starts in 20 days

Ended course
Course name: Automata
Course handler: automata
Course url: https://class.coursera.org/automata/auth/auth_redirector?type=login&subtype=normal
Ended 8 months ago

Ended and closed course
Course name: Statistics One
Course handler: stats1
Course url: https://www.coursera.org/course/stats1
Ended 4 months ago
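
The four examples show two URL shapes: www.coursera.org/course/<handler> and class.coursera.org/<handler>/auth/.... As a sketch, the handler could also be pulled out with urllib.parse instead of counting slashes. The function name describe_course_url is hypothetical, and the code assumes these two shapes are the only ones that occur.

```python
from urllib.parse import urlparse


def describe_course_url(course_url):
    """Return (handler, has_auth) for the two URL shapes shown above."""
    parsed = urlparse(course_url)
    # Path segments without the empty strings produced by leading slashes
    parts = [part for part in parsed.path.split("/") if part]
    if parsed.netloc == "www.coursera.org" and parts[:1] == ["course"]:
        # https://www.coursera.org/course/<handler>
        return parts[1], False
    # https://class.coursera.org/<handler>/auth/...
    return parts[0], "auth" in parts[1:]
```

With the URLs above, "https://www.coursera.org/course/stats1" yields ("stats1", False), and the Automata URL yields ("automata", True). The second value only reports whether the auth part is present; what that implies about access is left to the caller.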

It would require some fiddling, because you no longer have to pass the names through the command line, so you'd have to insert them somewhere. Or make the script get the names from the parsed file and go through them one by one.

Anyway, this was a fun experiment :-) If only I could get it to retrieve this information from the live page and possibly list the courses available for me, so I could pass the number of the course I wanted the script to download, that would be awesome!
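
The "pick a course by number" part is easy to sketch once the names and handlers have been parsed. The helpers below are hypothetical (format_menu and pick_handle are not part of coursera-dl) and assume the parsed courses are available as a list of (name, handler) pairs.

```python
def format_menu(courses):
    """Return one numbered line per (name, handler) pair, starting at 1."""
    return ["{}. {} ({})".format(number, name, handler)
            for number, (name, handler) in enumerate(courses, start=1)]


def pick_handle(courses, choice):
    """Map a 1-based menu number (as typed by the user) to a course handler."""
    index = int(choice) - 1
    if not 0 <= index < len(courses):
        raise ValueError("no course numbered {}".format(choice))
    return courses[index][1]


# Sample data taken from the output shown earlier in this comment
courses = [
    ("Think Again: How to Reason and Argue", "thinkagain-2012-001"),
    ("Automata", "automata"),
]
for line in format_menu(courses):
    print(line)
```

The handler returned by pick_handle(courses, "2") is "automata", which could then be handed to the download script in place of a name typed on the command line. How the script would accept it is not covered here.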
