node-podcast-parser

Parses a podcast RSS feed and returns an easy-to-use object.

Output format

The parser takes an opinionated view of what should be included, so not every field in the feed appears in the output. The goal is a result that is as normalized as possible across multiple feeds.

{
  "title":       "<Podcast title>",
  "description": {
    "short":       "<Podcast subtitle>",
    "long":        "<Podcast description>"
  },
  "link":       "<Podcast link (usually website for podcast)>",
  "image":      "<Podcast image>",
  "language":   "<ISO 639 language>",
  "copyright":  "<Podcast copyright>",
  "updated":    "<pubDate or latest episode pubDate>",
  "explicit":   "<Podcast is explicit, true/false>",
  "categories": [
    "Category>Subcategory"
  ],
  "author": "<Author name>",
  "owner": {
    "name":  "<Owner name>",
    "email": "<Owner email>"
  },
  "episodes": [
    {
      "guid":        "<Unique id>",
      "title":       "<Episode title>",
      "description": "<Episode description>",
      "explicit":    "<Episode is explicit, true/false>",
      "image":       "<Episode image>",
      "published":   "<date>",
      "duration":    120,
      "categories":  [
        "Category"
      ],
      "enclosure": {
        "filesize": 5650889,
        "type":     "audio/mpeg",
        "url":      "<mp3 file>"
      }
    }
  ]
}
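
Podcast-level categories arrive as single strings with `>` between category and subcategory. A consumer that wants the two parts separately could split them as in the minimal sketch below. `splitCategory` is a hypothetical helper, not part of node-podcast-parser, and it assumes the `Category>Subcategory` shape shown above.

```javascript
// Hypothetical helper (not provided by node-podcast-parser):
// splits "Category>Subcategory" into its two parts.
function splitCategory(value) {
  const [category, subcategory = null] = value
    .split('>')
    .map((part) => part.trim());
  return { category, subcategory };
}

console.log(splitCategory('Technology>Podcasting'));
console.log(splitCategory('Arts'));
```

A category without a subcategory yields `subcategory: null`.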

Installation

yarn add node-podcast-parser

Usage

const parsePodcast = require('node-podcast-parser');

parsePodcast('<podcast xml>', (err, data) => {
  if (err) {
    console.error(err);
    return;
  }

  // data looks like the format above
  console.log(data);
});

Parsing a remote feed

node-podcast-parser only takes care of the parsing itself; you need to download the feed yourself first.

Download the feed however you want, for instance using request.

Example:

const request = require('request');
const parsePodcast = require('node-podcast-parser');

request('<podcast url>', (err, res, body) => {
  if (err) {
    console.error('Network error', err);
    return;
  }

  // body holds the raw feed XML returned by the server
  parsePodcast(body, (parseErr, podcast) => {
    if (parseErr) {
      console.error('Parsing error', parseErr);
      return;
    }

    console.log(podcast);
  });
});

Testing

yarn install
yarn run test

Test coverage

yarn install
yarn run cover

Special notes

Language

A lot of podcasts have the language set to something like en. The spec requires the language to be ISO 639, so it is converted to en-us. A non-English language becomes lang-lang, such as de-de. The language is always lowercase.
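
The rule can be illustrated with a small function. This is only a sketch of the behavior described in this section, not the library's actual implementation. `normalizeLanguage` is a hypothetical name, and leaving values that already carry a region (such as en-GB) unchanged apart from lowercasing is an assumption.

```javascript
// Hypothetical illustration of the language rule described above.
function normalizeLanguage(value) {
  const lang = String(value).trim().toLowerCase();
  if (!lang) return lang;                 // nothing to normalize
  if (lang.includes('-')) return lang;    // assumption: keep an existing region
  if (lang === 'en') return 'en-us';      // bare English becomes en-us
  return `${lang}-${lang}`;               // other bare codes become lang-lang
}

console.log(normalizeLanguage('en'));
console.log(normalizeLanguage('DE'));
```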

Cleanup

Most content is left as it is, but whitespace at the beginning and end of strings is trimmed.

Missing properties

Unfortunately, not all podcasts contain all properties. Missing properties are simply omitted from the output.

These properties include:

  • feed TTL
  • episode categories
  • episode image
  • etc

Episode categories are included as an empty array if the podcast doesn't contain any categories.
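
Because properties can be absent, code that consumes the output is safer if it supplies its own defaults. A minimal sketch, assuming an episode object shaped like the output format above. `describeEpisode` and the fallback values are hypothetical choices, not something the library provides.

```javascript
// Hypothetical consumer-side helper: fills in defaults for properties
// that the parser may have omitted from an episode.
function describeEpisode(episode) {
  return {
    title: episode.title || '(untitled)',
    image: episode.image || null,
    // Accept both a missing property and an empty array
    categories: Array.isArray(episode.categories) ? episode.categories : [],
    duration: typeof episode.duration === 'number' ? episode.duration : null,
  };
}

console.log(describeEpisode({ title: 'Pilot', duration: 120 }));
```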

Generic RSS feeds

This module is specifically aimed at parsing podcast RSS feeds and doesn't cater for more generic feeds from blogs etc.

For those, use node-feedparser.

node-podcast-parser's Issues

TypeError: Cannot read property 'length' of undefined on many podcasts

Seems to crash during parsing

TypeError: Cannot read property 'length' of undefined
    at charAt (/some/path/node_modules/sax/lib/sax.js:973:19)
    at SAXParser.write (/some/path/node_modules/sax/lib/sax.js:994:11)
    at parse (/some/path/node_modules/node-podcast-parser/index.js:211:12)
    at Request._callback (/some/path/jobs/update-podcasts.js:36:17)
    at self.callback (/some/path/node_modules/request/request.js:185:22)
    at Request.emit (events.js:200:13)
    at Request.onRequestError (/some/path/node_modules/request/request.js:881:8)
    at ClientRequest.emit (events.js:200:13)
    at Socket.socketOnEnd (_http_client.js:436:9)
    at Socket.emit (events.js:205:15)
    at endReadableNT (_stream_readable.js:1154:12)
    at processTicksAndRejections (internal/process/task_queues.js:74:11)
(node:9101) UnhandledPromiseRejectionWarning: ReferenceError: res is not defined
    at /some/path/jobs/update-podcasts.js:39:37
    at parse (/some/path/node_modules/node-podcast-parser/index.js:213:5)
    at Request._callback (/some/path/jobs/update-podcasts.js:36:17)
    at self.callback (/some/path/node_modules/request/request.js:185:22)
    at Request.emit (events.js:200:13)
    at Request.onRequestError (/some/path/node_modules/request/request.js:881:8)
    at ClientRequest.emit (events.js:200:13)
    at Socket.socketOnEnd (_http_client.js:436:9)
    at Socket.emit (events.js:205:15)
    at endReadableNT (_stream_readable.js:1154:12)
    at processTicksAndRejections (internal/process/task_queues.js:74:11)
(node:9101) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 6)

Haven't had the time to look it up yet, but I will try to do it.
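
The trace passes through Request.onRequestError before reaching parse, which suggests (though this is not confirmed) that the request failed and an undefined body was handed to the parser. One possible guard is to check the body before parsing. This is a sketch under that assumption. `parseFeedSafely` is a hypothetical helper, and `parseFn` stands for any parser with the same (xml, callback) shape as node-podcast-parser.

```javascript
// Hypothetical guard: refuses to call the parser unless the body is a
// non-empty string, and reports the problem through the callback instead.
function parseFeedSafely(parseFn, body, callback) {
  if (typeof body !== 'string' || body.length === 0) {
    callback(new Error('Feed body is empty or not a string'));
    return;
  }
  parseFn(body, callback);
}

// Stand-in parser so the sketch runs on its own
const fakeParse = (xml, cb) => cb(null, { length: xml.length });

parseFeedSafely(fakeParse, undefined, (err) => {
  console.error(err.message);
});
```

In the request callback, checking `err` and returning early before calling the parser avoids the same problem.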

Support for promise

It would be great to support promises in this library. For now, I wrote a quick wrapper to do this.
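
The wrapper mentioned here is not shown, so the following is only one way such a wrapper might look, not the issue author's code. It relies on Node's built-in `util.promisify`, which works with any function whose last argument is an (err, data) callback. `makePromiseParser` is a hypothetical name.

```javascript
const { promisify } = require('util');

// Hypothetical wrapper: turns a callback-style parser taking
// (xml, callback) into a function that returns a promise.
function makePromiseParser(parseFn) {
  return promisify(parseFn);
}

// Stand-in parser so the sketch runs on its own
const fakeParse = (xml, cb) =>
  xml ? cb(null, { title: xml }) : cb(new Error('empty feed'));

makePromiseParser(fakeParse)('My Show')
  .then((podcast) => console.log(podcast.title))
  .catch((err) => console.error('Parsing error', err));
```

With the real module, the same idea would be `makePromiseParser(require('node-podcast-parser'))`, assuming its exported function keeps the (xml, callback) signature shown in the Usage section.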

Support for new tags in iOS 11

iOS 11 has added new supported tags for podcast feeds. These fields are optional, but it would be nice if node-podcast-parser supported them.

itunes:type (show/channel level): “episodic” for non-chronological episodes that will behave as they have for years and download the latest episode, or “serial” for chronological episodes that should be consumed oldest to newest.
itunes:episodeType (episode/item level): “full” for normal episodes; “trailer” to promote an upcoming show, season, or episode; or “bonus” for extra content related to a show, season, or episode.
itunes:title (episode/item level): only the episode title—no episode number, season number, or show title. This can be used with any show and episode type. (note, this already exists - the change is that shows should be removing the season and episode number from here)
itunes:episode (episode/item level): any number to indicate the current episode number, which can be relative to the entire show (like “316”), or relative to the current season (like “5”). This can be used with any show and episode type.
itunes:season (episode/item level): any number to indicate the season in which this episode belongs. This can be used with any show and episode type.
itunes:summary (episode/item level): this updated (but not new) tag is best for a short description of your episode. It will display above the full show notes.
content:encoded (episode/item level): this updated (but not new) tag is for your full show notes. It will display below the title and summary.

Short description of the terms (and where I got the short descriptions from)
https://theaudacitytopodcast.com/how-to-start-using-the-new-itunes-podcast-tags-for-ios-11-tap316/

Official documentation: http://podcasts.apple.com/resources/spec/ApplePodcastsSpecUpdatesiOS11.pdf
