udemy-crawler

udemy-crawler crawls a Udemy course page at a given URL, scrapes the course data, and returns it as JSON.

Installation

$ npm i -g udemy-crawler

Or, if you want to import it into a project, install it locally:

$ npm i udemy-crawler -S

If you change ./src/crawler.es.js, you will also need to install rollup globally to run the tests and to build the app (npm run build; see package.json for other scripts):

$ npm install --global rollup

Then, to run tests:

$ npm test

via Docker

You can build a Docker image for the udemy-crawler CLI.

$ docker build -t udemy-crawler .
$ docker run udemy-crawler

You need to provide a Udemy course URL, for example:

$ docker run udemy-crawler https://www.udemy.com/success-life/

Usage

Command-line

$ udemy-crawler [URL]

Example and sample output

$ udemy-crawler https://www.udemy.com/code-grow-rich-earn-more-as-an-entrepreneur-or-developer/

Parsing Url: https://www.udemy.com/code-grow-rich-earn-more-as-an-entrepreneur-or-developer/

You then get JSON back:
{
    "id": "481696",
    "title": "Code & Grow Rich: Earn More As An Entrepreneur Or Developer",
    "headline": "Make More Money as an Entrepreneur, Web & Mobile App Developer, Software Engineer, Startup Junkie, or Programmer",
    "date": "Last updated 2/2016",
    "image": "https://i.udemycdn.com/course/750x422/481696_756c_3.jpg",
    "description": "<p><strong>----------------------------------------------------------------------------------------------------------------<br><em><br>*Feb 16th 2016*</em></strong></p><ul><li><strong>Over 3100 student enrollments within the first four days of course launch!</strong></li><li><strong>Now with over 57 hours of video content and 500+ pages of curated resources!</strong></li><li><strong>11 discrete income producing trajectories!</strong></li><li><strong>Over 100 real-world mock interactions (phone calls and email templates)!  </strong></li></ul><p><strong><br>Code &amp; Grow Rich is a comprehensive course that empowers both non-technical entrepreneurs and software developers with the skills to succeed in today\'s techno-centric business world. </strong><strong>-----------------------------------------------------------------------------------------------------------------</strong></p> (...)",
    "audiences": [
                'entrepreneurs',
                'mobile app developers ',
                'web developers',
                'startup founders ',
                'non-technical individuals that want a blue-print for profiting through technology',
                'software engineers ',
                'computer science students ',
                'video game developers ',
                'non-technical co-founders who want to scale a startup',
                'software developers ',
                'computer programmers ',
                'software architects ',
                'intrapreneurs that want to learn how to better weave technology into their current role ',
                'web designers',
                'career switchers who are looking to explore how technology and entrepreneurship can be used to replace their 9-5 jobs',
                'marketers ',
                'growth hackers ',
                'traditional or online teachers / instructors ',
                'engineers and software developers ',
                'user experience designers (UX designers) & user interface designers (UI designers) ',
                'information technology professionals and managers ',
                'database administrators and database architects ',
                'information security analysts ',
                'hackers ',
                'computer systems analysts ',
                'website content developers and website conversion experts ',
                'content management system (CMS) developers and enterprise resource planning (ERP) experts  ',
                'data scrapers ',
                'data miners ',
                'data analysts ',
                'cyber-security experts '
            ],
    "topics": [
                'Development',
                'Web Development',
                'Entrepreneurship Fundamentals'
            ],
    "couponCode": "SUMMERSALE",
    "price": 10.99,
    "fullPrice": 199.99,
    "discount": 94,
    "curriculum":
    {
        "contents": [ [Object], [Object], [Object], [Object], [Object] ],
        "courseLength": '02:36:13'
    },
    "authors": [ 'Giles McMullen-Klein' ]
}

API

// import the package
import UdemyCrawler from 'udemy-crawler'

// apply options
const crawler = new UdemyCrawler({
  headers: {
    'User-Agent':
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'upgrade-insecure-requests': 1
  }
})

// execute the crawler
// The callback is wrapped in a Promise so the result can be awaited.
// (Returning a value from inside the callback would not reach the caller,
// assuming execute() reports its result only through the callback.)
const scrapeUdemyCourseToJSON = url =>
  new Promise((resolve, reject) => {
    crawler.execute(url, (err, course) => {
      if (err) {
        console.error(err.message)
        return reject(err)
      }
      // console.log('Finishing scraping...', course)
      resolve(course)
    })
  })

// somewhere else (inside an async function or an ES module)
const udemyCourseContents = await scrapeUdemyCourseToJSON('your_url')

Method: execute(url[, callback])

Arguments

  • url: String - The Udemy course URL to parse. It should start with https://www.udemy.com/ and may optionally end with ?couponCode=CODE. Crawling won't start without the course URL.
  • callback(err, course): Function (Optional).
    • err: Error - null if no error occurred.
    • course: Object - the resulting course info, in the format below.
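
The URL rules above can be checked before calling execute. The following is only a minimal sketch: isUdemyCourseUrl and getCouponCode are hypothetical helpers, not part of udemy-crawler, and the crawler itself may validate URLs differently.

```javascript
// Hypothetical helper (not part of udemy-crawler): checks the URL shape
// described in the arguments list.
function isUdemyCourseUrl(input) {
  let url
  try {
    url = new URL(input)
  } catch {
    return false // not a parseable absolute URL
  }
  // The URL must start with https://www.udemy.com/ ...
  if (url.origin !== 'https://www.udemy.com') return false
  // ... and name something after the leading slash (the course slug).
  return url.pathname.replace(/\//g, '') !== ''
}

// Hypothetical helper: reads the optional ?couponCode=CODE query parameter,
// returning "" when it is absent.
function getCouponCode(input) {
  return new URL(input).searchParams.get('couponCode') || ''
}

console.log(isUdemyCourseUrl('https://www.udemy.com/success-life/'))
console.log(getCouponCode('https://www.udemy.com/success-life/?couponCode=SUMMERSALE'))
```

A check like this lets a caller reject bad input early instead of waiting for the crawl to fail.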

course: Object format:

{
    "id": String,
    "title": String,
    "headline": String,
    "date": String, /* date of the course's last update */
    "image": String, /* the 750x422 size */
    "description": String, /* in HTML format */
    "couponCode": String, /* defaults to "" */
    "price": Number, /* current price (floating point number) */
    "fullPrice": Number, /* list price (floating point number) */
    "discount": Number, /* course discount in percent: default 0. Max value is 100. */
    "audiences": String[], /* target audiences */
    "curriculum": Object, /* table of contents: { "contents": Object[] (list of lectures), "courseLength": String } */
    "topics": String[], /* course keywords aka topics */
    "authors": String[], /* authors of the course */
    "discountExpiration": Date /* date of the discount's expiration */
}
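
Some fields arrive as display strings. In the sample output, date is "Last updated 2/2016" and curriculum.courseLength is "02:36:13". As a rough sketch, assuming those string shapes hold for other courses too, they could be post-processed like this (parseLastUpdated and courseLengthToSeconds are hypothetical helpers, not part of udemy-crawler):

```javascript
// Hypothetical helper: extracts month and year from a string such as
// "Last updated 2/2016". Returns null if no M/YYYY pattern is found.
function parseLastUpdated(dateText) {
  const match = /(\d{1,2})\/(\d{4})/.exec(dateText)
  if (!match) return null
  return { month: Number(match[1]), year: Number(match[2]) }
}

// Hypothetical helper: converts an "HH:MM:SS" string to seconds.
// Returns null if the string does not have three numeric parts.
function courseLengthToSeconds(length) {
  const parts = length.split(':').map(Number)
  if (parts.length !== 3 || parts.some(Number.isNaN)) return null
  const [hours, minutes, seconds] = parts
  return hours * 3600 + minutes * 60 + seconds
}

console.log(parseLastUpdated('Last updated 2/2016')) // { month: 2, year: 2016 }
console.log(courseLengthToSeconds('02:36:13')) // 9373
```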

Note:

  • discount = 0 and price = 0: the course is free.
  • discount = 100: the course is 100% off, with the couponCode value set.
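
The two rules in the note can be turned into a small classifier. This is a sketch only: describePricing and its labels are hypothetical, and it relies solely on the price, discount and couponCode fields described above.

```javascript
// Hypothetical helper (not part of udemy-crawler): labels a course's pricing
// from the fields documented in the course object format.
function describePricing(course) {
  const { price, discount, couponCode } = course
  // discount = 0 and price = 0: a free course
  if (discount === 0 && price === 0) return 'free'
  // discount = 100: fully discounted, normally through a coupon code
  if (discount === 100) {
    return couponCode ? '100% off with coupon' : '100% off'
  }
  if (discount > 0) return 'discounted'
  return 'full price'
}

// Values taken from the sample output above
console.log(describePricing({ price: 10.99, discount: 94, couponCode: 'SUMMERSALE' })) // discounted
```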

License

MIT, 2018 by Pete Houston, 2019 by D. Galakhov
