Hi, Ivo.
On Mon, Feb 11, 2013 at 1:22 PM, Ivo Flipse wrote:
> When I tried to use the otherwise awesome script, I had to go and look up all the names I wanted from the course list.
Well, supposedly, the idea would be to download material from courses
that you already know about (because you are subscribed to them). :)
> So I just made a little txt file with the url handle and the name of the course, which I could then easily copy into the command line.
> Perhaps it would be an idea to maintain a list of all the courses?
I guess that one of the easiest routes would be to grab this
information from some site that aggregates this (e.g.,
classcentral.com), but this is on the borderline of the scope of
coursera-dl, which is meant for downloads, not discovery...
Furthermore, keeping such lists may need some manual intervention and
it is not really clear how they could be used by the script. The
person has to sign up for the courses anyway (and if you try to sign up
for some courses after they are already running or after they have
been concluded, you will be denied access).
The reason for that may be that the course won't be offered on
Coursera anymore (see, for instance, Jennifer Widom's db course
migrating to Class2Go, Umesh Vazirani's qcomp migrating to edX.org,
the SaaS courses moving to edX too, etc.).
And, of course, to have access to the courses, you have to click the
"I accept the honor code" button or something like that. I don't intend to
make this particular step automated, for human/awareness reasons.
Please clarify how you intend to keep the list of courses up to date
without the maintainers of the program (John and me) having extra work.
If you are persuasive enough, we may implement your idea. :)
Thanks,
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br
from coursera-dl.
I personally only download courses when all the material is available, because otherwise I would have to come back later and download the rest anyway. But I can understand if others use it to download videos to watch offline or on the go. The issue with course material no longer being available could (hopefully) be caught with an exception when you get an access denied error.
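A minimal sketch of that idea in Python 3, assuming the "access denied" case surfaces as an HTTP 403 Forbidden response (an assumption; the exact status Coursera returns is not confirmed here). `fetch_course_page` is a hypothetical helper, not part of coursera-dl, and the `opener` parameter only exists so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_course_page(url, opener=urllib.request.urlopen):
    """Return the body of `url`, or None when the server denies access.

    Assumes "access denied" shows up as HTTP 403 Forbidden; any other
    HTTP error is re-raised so real problems are not silently hidden.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # Closed course or one you never joined: skip it instead of crashing.
            return None
        raise
```

A caller can then simply skip any course for which the helper returns `None` and move on to the next one.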
I guess the only workaround I can imagine would be to parse the course page for logged-in users:
https://www.coursera.org/user/i/<user_uuid>
Then check whether the left/width of the "coursera-course-listing-progress" element has reached 100%.
If so, extract the course url from the "coursera-course-listing-meta" element and try to run the script.
But I can understand if this level of automation is out of scope for the script.
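A rough sketch of the completion check, assuming the progress element carries an inline `style` attribute such as `left: 0%; width: 100%` (the real markup is not shown here, so treat this as hypothetical). Getting that string out of the page, for example with BeautifulSoup's `tag.get("style")`, is left out, and the helper names are made up for illustration.

```python
import re


def parse_inline_style(style):
    """Split an inline CSS string such as 'left: 0%; width: 100%' into a dict."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip()
    return props


def progress_complete(style):
    """True if the bar's 'left' or 'width' percentage has reached 100%."""
    props = parse_inline_style(style)
    for key in ("left", "width"):
        match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*%", props.get(key, ""))
        if match and float(match.group(1)) >= 100:
            return True
    return False
```

Checking both properties covers either reading of "left/width"; if only one of them turns out to matter, drop the other from the tuple.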
I've personally been facing a similar issue with the explosion of classes. I have used the following regex:
# extract all the currently open classes I'm enrolled in on a single line, space separated
grepo "class.coursera.org/(.*?)/" courses.html | uniq | paste -s -d" "
where courses.html is the page displayed when you click on "courses" underneath your name in the menu, and "grepo" is a script I wrote which does something like "grep -o" except it outputs only the text matched by the group.
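For readers without `grepo`, a rough Python equivalent could look like this; `grepo` and `enrolled_handles` here are illustrative re-creations, not the original script. One difference worth noting: `uniq` only collapses adjacent duplicates, whereas `dict.fromkeys` drops every repeat while keeping page order.

```python
import re


def grepo(pattern, text):
    """Like `grep -o`, but yield only the text captured by group 1 of each match."""
    for match in re.finditer(pattern, text):
        yield match.group(1)


def enrolled_handles(html):
    """Course handles from class.coursera.org links, duplicates dropped, page order kept."""
    handles = grepo(r"class\.coursera\.org/(.*?)/", html)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(handles))
```

Joining the result with `" ".join(...)` gives the same single space-separated line as the `paste -s -d" "` step.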
Inspired by your comment, I messed around a little to see if I could get this information out. I couldn't get to my /courses page from a script, so I just downloaded it manually. Automating this would be nice, but this works.
Then I load the page using BeautifulSoup:
from bs4 import BeautifulSoup

# Parse the manually saved /courses page
with open("Courses.htm", encoding="utf-8") as page:
    soup = BeautifulSoup(page, "html.parser")
# Find the box that contains the course information
course_elements = soup.find_all("div",
    {"class": "coursera-course-listing-box coursera-course-listing-box-wide coursera-account-course-listing-box"})
This gives us a list that contains each of the boxes on the /courses page. From here we can try to extract the relevant information:
# Iterate through each course box
for course in course_elements:
    # The date information is in a span element
    listing_start = course.find_all("span")
    # Some booleans for controlling behavior of the script
    is_course = True
    ended = False
    # Not every box seems to be a course, so we just try to parse it and skip it on failure
    try:
        date_text = listing_start[2].text
        first_word = date_text.split()[0]
        # There seem to be three different date formats:
        # Courses yet to start
        if first_word == "Starts":
            ending_time = date_text
        # Courses that have already ended
        elif first_word == "Ended":
            ending_time = date_text
            ended = True
        # Courses that have already started, but not yet ended
        else:
            ending_time = "End date: {}".format(date_text)
    except IndexError:
        # No third span or empty date text: not a course, so skip further parsing
        is_course = False
    # If the current element is a course, print the info
    # Uncomment "and ended" to get info for completed courses only
    if is_course:  # and ended:
        course_listing = course.find_all("h3")
        course_name = course_listing[0].text
        # The course URL is the href of the link inside the <h3> heading
        course_url = course_listing[0].find("a")["href"]
        split_course_url = course_url.split("/")
        # www.coursera.org/course/<handle> vs. class.coursera.org/<handle>/...
        if split_course_url[3] == "course":
            course_handler = split_course_url[4]
        else:
            course_handler = split_course_url[3]
        print("Course name: {}".format(course_name))
        print("Course handler: {}".format(course_handler))
        print("Course url: {}".format(course_url))
        print(ending_time)
        print()
I added some print calls, which aren't really needed; they just show that you can retrieve the information you'd want. You could either use the url that's passed when you press the green button or use the course name, as your script currently does. It seems that courses that are no longer accessible have a different url (a plain www.coursera.org/course/... link rather than the auth_redirector one), so that's useful info too.
So depending on the status of the course, you'd get something like this:
Course in progress
Course name: Think Again: How to Reason and Argue
Course handler: thinkagain-2012-001
Course url: https://class.coursera.org/thinkagain-2012-001/auth/auth_redirector?type=login&subtype=normal
End date: Nov 26th

Course not yet started
Course name: Know Thyself
Course handler: knowthyself
Course url: https://www.coursera.org/course/knowthyself
Starts in 20 days

Ended course
Course name: Automata
Course handler: automata
Course url: https://class.coursera.org/automata/auth/auth_redirector?type=login&subtype=normal
Ended 8 months ago

Ended and closed course
Course name: Statistics One
Course handler: stats1
Course url: https://www.coursera.org/course/stats1
Ended 4 months ago
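The four cases above can be told apart from the URL and the date text alone. This is a minimal sketch under the assumption, taken only from this sample, that accessible courses link through `class.coursera.org/<handle>/auth/...` while closed or not-yet-started ones link to `www.coursera.org/course/<handle>`; the function names are hypothetical. Using `urlparse` avoids counting slashes by hand.

```python
from urllib.parse import urlparse


def course_handle(url):
    """Handle from either URL shape seen in the sample output."""
    parts = urlparse(url).path.strip("/").split("/")
    # www.coursera.org/course/<handle>  vs.  class.coursera.org/<handle>/auth/...
    return parts[1] if parts[0] == "course" else parts[0]


def course_status(url, date_text):
    """Label a course the way the sample output does, from its URL and raw date text."""
    words = date_text.split()
    first_word = words[0] if words else ""
    # Heuristic from the sample only: accessible courses go through an /auth/ redirector
    has_auth_link = "/auth/" in urlparse(url).path
    if first_word == "Starts":
        return "Course not yet started"
    if first_word == "Ended":
        return "Ended course" if has_auth_link else "Ended and closed course"
    return "Course in progress"
```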
It would require some fiddling: since you no longer pass the names through the command line, you'd have to insert them somewhere, or make the script get the names from the parsed file and go through them one by one.
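One way to go through them one by one is to build a coursera-dl command per handle. This sketch assumes your credentials are in `~/.netrc` so coursera-dl's `-n` option can pick them up; `download_commands` and `download_all` are hypothetical helpers, and `dry_run` defaults to printing the commands instead of running them.

```python
import subprocess


def download_commands(handles, use_netrc=True):
    """Build one coursera-dl argument list per course handle.

    Assumes credentials live in ~/.netrc (coursera-dl's -n option);
    with use_netrc=False you would need to add -u/-p yourself.
    """
    base = ["coursera-dl"] + (["-n"] if use_netrc else [])
    return [base + [handle] for handle in handles]


def download_all(handles, dry_run=True):
    """Run (or, with dry_run, just print) coursera-dl for each handle in turn."""
    for command in download_commands(handles):
        if dry_run:
            print(" ".join(command))
        else:
            # check=False: one failing course should not stop the rest
            subprocess.run(command, check=False)
```

Running each course as a separate process keeps one broken course from aborting the whole batch, at the cost of logging in once per course.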
Anyway, this was a fun experiment :-) If I could get it to retrieve this information from the live page, and possibly list the courses available to me so I could pass the number of the course I wanted the script to download, that would be awesome!
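The numbered-menu part is mostly bookkeeping. A minimal sketch, assuming you already have `(name, handle)` pairs from the parsing step (for example `course_name` and `course_handler` from the loop above); `format_menu` and `pick_handle` are illustrative names.

```python
def format_menu(courses):
    """Number (name, handle) pairs starting at 1, one per line."""
    return "\n".join(
        "{}. {} ({})".format(number, name, handle)
        for number, (name, handle) in enumerate(courses, start=1)
    )


def pick_handle(courses, choice):
    """Map the number the user typed back to a course handle."""
    index = int(choice) - 1
    if not 0 <= index < len(courses):
        raise ValueError("no course numbered {}".format(choice))
    return courses[index][1]
```

Interactively, you would print `format_menu(courses)` and then call `pick_handle(courses, input("Course number: "))`.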