metaphor's People

Contributors

hueniverse

metaphor's Issues

The header content contains invalid characters

I am using metaphor to fetch information about a website, and I get the following error while retrieving the data:

_http_outgoing.js:358
    throw new TypeError('The header content contains invalid characters');
    ^

TypeError: The header content contains invalid characters
    at ClientRequest.OutgoingMessage.setHeader (_http_outgoing.js:358:11)
    at new ClientRequest (_http_client.js:86:14)
    at Object.exports.request (http.js:31:10)
    at Object.exports.request (https.js:199:15)
    at internals.Client.request (/project/node_modules/wreck/lib/index.js:158:24)
    at options.beforeRedirect (/project/node_modules/wreck/lib/index.js:206:38)
    at formatCookies (/project/node_modules/metaphor/lib/index.js:104:28)
    at cookies.forEach (/project/node_modules/metaphor/lib/index.js:118:28)
    at Array.forEach (native)
    at Object.beforeRedirect (/project/node_modules/metaphor/lib/index.js:113:25)
    at ClientRequest.onResponse (/project/node_modules/wreck/lib/index.js:204:24)
    at ClientRequest.g (events.js:291:16)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:474:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)

What to do now?
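
The trace passes through metaphor's cookie handling during a redirect, so one plausible cause (an assumption, not confirmed in this report) is a cookie value containing characters that Node refuses to put in a header. A minimal sketch of filtering such values before building a Cookie header follows. `isSafeHeaderValue` and `formatSafeCookies` are hypothetical names, not part of metaphor or wreck; `http.validateHeaderValue()` is the Node built-in (14.3 and later) that performs the same check that throws here.

```javascript
'use strict';

const Http = require('http');

// Returns true when Node would accept `value` as header content.
// http.validateHeaderValue() throws on undefined values and on
// characters that are not allowed in a header.
const isSafeHeaderValue = (value) => {

    try {
        Http.validateHeaderValue('cookie', value);
        return true;
    }
    catch (err) {
        return false;
    }
};

// Hypothetical helper: build a Cookie header from name/value pairs,
// skipping any pair that Node would reject.
const formatSafeCookies = (cookies) => {

    return cookies
        .map(({ name, value }) => `${name}=${value}`)
        .filter(isSafeHeaderValue)
        .join('; ');
};

console.log(formatSafeCookies([
    { name: 'session', value: 'abc123' },
    { name: 'bad', value: 'line\nbreak' }       // Newline is not allowed in a header
]));
```

Dropping a bad cookie silently is only one option; logging the offending cookie name would make the failing site easier to identify.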

engine.describe throws when site has malformed URL in its meta tags

Bumped into issues using engine.describe on the URL http://bit.ly/2u4uitl. On investigation, that URL redirects to https://collapsed.co/startups/sidecar, whose meta tags seem to have an error - the og:url and twitter:url meta tags give the canonical URL of the page as https:collapsed.co/startups/sidecar (no //). In https://github.com/hueniverse/metaphor/blob/master/lib/index.js#L189, Metaphor takes this URL and parses it with Url.parse, then tries to call split on the hostname; this throws an error, because the result of Url.parse on the malformed URL does not have a hostname.

Metaphor should probably be making sure the URL actually has a valid hostname there and using a fallback if not (and use similar measures if we try to parse this URL elsewhere).
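
As a rough illustration of the fallback suggested above, the sketch below returns the first hostname that can actually be parsed. `hostnameOf` is a hypothetical helper and not metaphor's code; it uses the legacy `Url.parse` only because that is what the issue describes.

```javascript
'use strict';

const Url = require('url');

// Hypothetical helper: return the hostname of `candidate`, or of `fallback`
// when `candidate` cannot be parsed into a URL that has a hostname.
const hostnameOf = (candidate, fallback) => {

    for (const value of [candidate, fallback]) {
        if (typeof value !== 'string') {
            continue;                           // Url.parse() throws on non-strings
        }

        try {
            const { hostname } = Url.parse(value);
            if (hostname) {
                return hostname;
            }
        }
        catch (err) {
            // Treat an unparseable value the same as a missing hostname
        }
    }

    return null;
};

// The og:url from the report, with the URL the request ended on as the fallback
console.log(hostnameOf('https:collapsed.co/startups/sidecar', 'https://collapsed.co/startups/sidecar'));
```

Callers would still need to handle the `null` case, for example by omitting the site name from the description.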

Fallback description URL does not follow redirects

If there is no canonical URL given by a site's meta tags etc., the URL that describe() was originally called with is used in the description. However, this means that if that URL redirected somewhere, we end up with the original URL, which might be (for instance) something shortened by a URL shortener, rather than the proper URL of the page.

This wouldn't be too much of an issue except that our application tried to describe() this website, which has no og:url, but does have an og:image: /template_images/ecliniqua/clinicalInformaticsNews-og.jpg. Since this is a relative URL, our application tries to get an absolute URL, using new URL(description.image.url, description.url) - but because description.url is http://bit.ly/2pR9IGK, the actual link we were given, we end up generating the incorrect URL http://bit.ly/template_images/ecliniqua/clinicalInformaticsNews-og.jpg.

We'd actually appreciate it if describe() returned absolute URLs for all images, so that we wouldn't have to deal with this in our application logic. Regardless, it seems obviously more correct for description.url (and site_name) to match the final destination URL rather than the input to the function. Failing that, it would be great to have some other property exposing this information, so we don't have to fetch the URL again just to see where it redirected to.

(Looking at the rest of the describe output, the icons appear to be constructed similarly to what we're doing with the images - for this site we get http://bit.ly/template_images/ecliniqua/favicon.ico, which is incorrect. So that at least is definitely a bug.)
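
To make the difference concrete, here is a small sketch of resolving the relative og:image against two different base URLs. `absolutize` is a hypothetical helper, and `https://example.com/some/article` is a placeholder for the real destination, which is not given in this report.

```javascript
'use strict';

// Hypothetical helper: resolve a possibly relative resource URL against the
// URL of the page it was found on. Returns null when resolution fails.
const absolutize = (resource, pageUrl) => {

    try {
        return new URL(resource, pageUrl).href;
    }
    catch (err) {
        return null;
    }
};

const image = '/template_images/ecliniqua/clinicalInformaticsNews-og.jpg';

// Resolved against the shortened input URL: points at bit.ly, which is wrong
console.log(absolutize(image, 'http://bit.ly/2pR9IGK'));

// Resolved against the (placeholder) final destination after redirects
console.log(absolutize(image, 'https://example.com/some/article'));
```

The same resolution would apply to icon URLs, provided the base is the URL the response was actually served from.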

Crash: Cannot read property 'tag' of undefined

For the most part this module works great. I'm puzzled by the following crash. I'm using the module in a static website generator, and the website I'm testing against has dozens of pages with links to YouTube video pages. Since this doesn't report the URL triggering the problem, I'm not entirely sure which page is crashing.

It appears to happen on one of the index pages. Each of these pages has about 60 YouTube links to process, only to get each video's thumbnail.

Could this be hitting a usage limit?

/Users/david/ws/t/hmp/node_modules/metaphor/lib/ogp.js:119
                if (prev[sub] &&
                        ^

TypeError: Cannot read property 'tag' of undefined
    at Object.exports.describe (/Users/david/ws/t/hmp/node_modules/metaphor/lib/ogp.js:119:25)
    at /Users/david/ws/t/hmp/node_modules/metaphor/lib/index.js:216:33
    at Object.HtmlParser2.Parser.onend (/Users/david/ws/t/hmp/node_modules/metaphor/lib/tags.js:139:20)
    at Parser.onend (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Parser.js:309:32)
    at Tokenizer._finish (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Tokenizer.js:838:12)
    at Tokenizer.end (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Tokenizer.js:829:25)
    at Parser.end (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Parser.js:337:18)
    at Object.exports.parse (/Users/david/ws/t/hmp/node_modules/metaphor/lib/tags.js:144:12)
    at Object.exports.Engine.constructor.options._describe.setup.redirected.beforeRedirect.Wreck.request.exports.parse.Tags.parse [as parse] (/Users/david/ws/t/hmp/node_modules/metaphor/lib/index.js:212:10)
    at /Users/david/ws/t/hmp/node_modules/metaphor/lib/index.js:151:36
    at finish (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/wreck/lib/index.js:328:20)
    at wrapped (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/hoek/lib/index.js:871:20)
    at onReaderFinish (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/wreck/lib/index.js:399:16)
    at g (events.js:260:16)
    at emitNone (events.js:72:20)
    at emit (events.js:166:7)

htmlparser throws on link tags without rel

Hey there!
I'm trying to parse a German newspaper website which apparently contains a link tag without a rel property.

<link itemprop="primaryImageOfPage" href="http://img.zeit.de/politik/ausland/2016-06/cameron-farage-tv-debatte/wide__1300x731">

This makes the htmlparser throw:

const Metaphor = require('metaphor')
const parser = new Metaphor.Engine({ preview: false })

parser.describe('http://www.zeit.de/politik/ausland/2016-06/brexit-tv-duell-david-cameron-nigel-farage-eu-austritt', (descr) => console.log(descr))

TypeError: Cannot read property 'split' of undefined
    at Object.HtmlParser2.Parser.onopentag (/Users/clemens/share-page/node_modules/metaphor/lib/tags.js:91:44)
    at Parser.onopentagend (/Users/clemens/share-page/node_modules/htmlparser2/lib/Parser.js:169:37)
    at Tokenizer._stateBeforeAttributeName (/Users/clemens/share-page/node_modules/htmlparser2/lib/Tokenizer.js:230:13)
    at Tokenizer._parse (/Users/clemens/share-page/node_modules/htmlparser2/lib/Tokenizer.js:658:9)
    at Tokenizer.write (/Users/clemens/share-page/node_modules/htmlparser2/lib/Tokenizer.js:632:7)
    at Parser.write (/Users/clemens/share-page/node_modules/htmlparser2/lib/Parser.js:331:18)
    at Object.exports.parse (/Users/clemens/share-page/node_modules/metaphor/lib/tags.js:142:12)
    at Object.exports.parse (/Users/clemens/share-page/node_modules/metaphor/lib/index.js:209:10)
    at Wreck.read (/Users/clemens/share-page/node_modules/metaphor/lib/index.js:148:36)
    at finish (/Users/clemens/share-page/node_modules/wreck/lib/index.js:328:20)

We should probably check that rel exists here: https://github.com/hueniverse/metaphor/blob/master/lib/tags.js#L88

if (name === 'link' && attributes.href && attributes.rel) {}
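
An alternative to widening the condition is to normalize rel before splitting it. The sketch below assumes the attributes arrive as a plain object, as htmlparser2 passes them to onopentag; `relTokens` is a hypothetical helper, not the code in tags.js.

```javascript
'use strict';

// Hypothetical helper: split a link tag's rel attribute into lowercase
// tokens, returning an empty list when the attribute is absent.
const relTokens = (attributes) => {

    const rel = attributes.rel;
    if (typeof rel !== 'string') {
        return [];
    }

    return rel.trim().toLowerCase().split(/\s+/).filter(Boolean);
};

// The tag from this report has no rel, so there is nothing to split
console.log(relTokens({
    itemprop: 'primaryImageOfPage',
    href: 'http://img.zeit.de/politik/ausland/2016-06/cameron-farage-tv-debatte/wide__1300x731'
}));

console.log(relTokens({ rel: 'shortcut icon', href: '/favicon.ico' }));
```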

Thank you!

Describing a url that has a faulty og:url value fails

This URL (http://kb.mailchimp.com/delivery/deliverability-research/gmail-is-clipping-my-email) sets og:url to http//127.0.0.1/ (note the missing colon after http), which breaks describe() because the url module cannot parse it.

Flipping the check here https://github.com/hueniverse/metaphor/blob/master/lib/index.js#L225 would fix it.

To reproduce:

const metaphor = require('metaphor');
const engine = new metaphor.Engine();

engine.describe('http://kb.mailchimp.com/delivery/deliverability-research/gmail-is-clipping-my-email', res => {
  console.log('res', res);
});

providers.json not in latest npm package v3.5.2

Hey Eran,
just noticed this:

ls -l node_modules/metaphor
total 40
-rwxr-xr-x  1 clemens  staff  1660 28 Jul 20:25 LICENSE
-rwxr-xr--  1 clemens  staff  8644 28 Jul 20:25 README.md
drwxr-xr-x  9 clemens  staff   306 29 Jul 12:48 lib
drwxr-xr-x  3 clemens  staff   102 29 Jul 12:48 node_modules
-rw-r--r--  1 clemens  staff  2636 29 Jul 12:48 package.json

which results in:

Error: Cannot find module '../providers.json'
    at Function.Module._resolveFilename (module.js:440:15)
    at Function.Module._load (module.js:388:25)
    at Module.require (module.js:468:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/Users/clemens/webapp-server/node_modules/metaphor/lib/index.js:13:19)
    at Module._compile (module.js:541:32)
    at Object.Module._extensions..js (module.js:550:10)
    at Module.load (module.js:458:32)
    at tryModuleLoad (module.js:417:12)
    at Function.Module._load (module.js:409:3)

In v3.5.1 it was still there.

Handle protected/private tweets

Debug: internal, implementation, error
TypeError: Uncaught error: Cannot read property 'html' of undefined
    at metaphor.describe (/Users/kye/sideway/server/node_modules/@sideway/embed/lib/index.js:134:47)
    at settings.preview (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:204:24)
    at Object.internals.preview (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:300:12)
    at internals.sizes (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:198:27)
    at Object.exports.parallel (/Users/kye/sideway/server/node_modules/items/lib/index.js:47:9)
    at Object.internals.sizes (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:374:11)
    at _preview (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:196:19)
    at exports.parse (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:152:104)
    at Oembed.describe (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:242:20)
    at Object.exports.describe (/Users/kye/sideway/server/node_modules/metaphor/lib/oembed.js:83:16)
    at Tags.parse (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:222:16)
    at Object.HtmlParser2.Parser.onend (/Users/kye/sideway/server/node_modules/metaphor/lib/tags.js:139:20)
    at Parser.onend (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Parser.js:310:32)
    at Tokenizer._finish (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Tokenizer.js:838:12)
    at Tokenizer.end (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Tokenizer.js:829:25)
    at Parser.end (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Parser.js:338:18)

Oembed support

The breaking change is the additional required options argument.
