outmoded / metaphor
Open Graph, Twitter Card, and oEmbed Metadata Collector
License: Other
I am using metaphor to collect information from a website, and I get the following error while fetching the data:
_http_outgoing.js:358
> throw new TypeError('The header content contains invalid characters');
> ^
> TypeError: The header content contains invalid characters
> at ClientRequest.OutgoingMessage.setHeader (_http_outgoing.js:358:11)
> at new ClientRequest (_http_client.js:86:14)
> at Object.exports.request (http.js:31:10)
> at Object.exports.request (https.js:199:15)
> at internals.Client.request (/project/node_modules/wreck/lib/index.js:158:24)
> at options.beforeRedirect (/project/node_modules/wreck/lib/index.js:206:38)
> at formatCookies (/project/node_modules/metaphor/lib/index.js:104:28)
> at cookies.forEach (/project/node_modules/metaphor/lib/index.js:118:28)
> at Array.forEach (native)
> at Object.beforeRedirect (/project/node_modules/metaphor/lib/index.js:113:25)
> at ClientRequest.onResponse (/project/node_modules/wreck/lib/index.js:204:24)
> at ClientRequest.g (events.js:291:16)
> at emitOne (events.js:96:13)
> at ClientRequest.emit (events.js:188:7)
> at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:474:21)
> at HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)
What should I do now?
Bumped into issues using engine.describe on the URL http://bit.ly/2u4uitl. On investigation, that URL redirects to https://collapsed.co/startups/sidecar, whose meta tags seem to have an error - the og:url and twitter:url meta tags give the canonical URL of the page as https:collapsed.co/startups/sidecar (no //). In https://github.com/hueniverse/metaphor/blob/master/lib/index.js#L189, Metaphor takes this URL and parses it with Url.parse, then tries to call split on the hostname; this throws an error, because the result of Url.parse on the malformed URL does not have a hostname.
Metaphor should probably be making sure the URL actually has a valid hostname there and using a fallback if not (and use similar measures if we try to parse this URL elsewhere).
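A minimal sketch of the kind of guard suggested above. The helper names hostnameOf and safeHostname are hypothetical and not part of metaphor. The sketch also assumes the global WHATWG URL class rather than the legacy Url.parse mentioned in the report, and the two parsers do not treat every malformed input the same way, so this is only an illustration of the fallback idea:

```javascript
// Hypothetical helpers, not part of metaphor's API.

// Return the hostname of `value`, or null when it cannot be parsed
// or when the parsed URL has an empty hostname.
const hostnameOf = (value) => {

    try {
        return new URL(value).hostname || null;
    }
    catch (err) {
        // new URL() throws a TypeError on input it cannot parse
        return null;
    }
};

// Prefer the hostname of the canonical URL from the meta tags, and
// fall back to the URL that was actually requested.
const safeHostname = (candidate, fallback) => hostnameOf(candidate) || hostnameOf(fallback);

console.log(safeHostname('not a url', 'https://collapsed.co/startups/sidecar'));
```

With a guard like this, a missing hostname becomes a null (or a fallback value) that the caller can check before calling split on it.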
If there is no canonical URL given by a site's meta tags etc., the URL that describe() was originally called with is used in the description. However, this means that if that URL redirected somewhere, we end up with the original URL, which might be (for instance) something shortened by a URL shortener, rather than the proper URL of the page.
This wouldn't be too much of an issue except that our application tried to describe() this website, which has no og:url, but does have an og:image: /template_images/ecliniqua/clinicalInformaticsNews-og.jpg. Since this is a relative URL, our application tries to get an absolute URL, using new URL(description.image.url, description.url) - but because description.url is http://bit.ly/2pR9IGK, the actual link we were given, we end up generating the incorrect URL http://bit.ly/template_images/ecliniqua/clinicalInformaticsNews-og.jpg.
We'd actually appreciate it if describe() would return absolute URLs for all images so we wouldn't have to deal with that in our application logic, but regardless, it seems obviously more correct for description.url (and site_name) to match the actual final destination URL than the input to the function - or else, it would be great to have some other property giving access to this information without having to fetch the URL again just to see where it redirected to.
(Looking at the rest of the describe output, the icons appear to be constructed similarly to what we're doing with the images - for this site we get http://bit.ly/template_images/ecliniqua/favicon.ico, which is incorrect. So that at least is definitely a bug.)
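To illustrate why the base URL matters, here is a small sketch using the standard URL constructor. The helper name resolveAgainst is hypothetical, and http://www.example.com/some/article is a placeholder standing in for the real post-redirect address, which is not given above:

```javascript
// Hypothetical helper: resolve a possibly-relative resource URL
// against the URL of the page it was found on.
const resolveAgainst = (resource, pageUrl) => new URL(resource, pageUrl).href;

const image = '/template_images/ecliniqua/clinicalInformaticsNews-og.jpg';

// Base is the shortened link that was passed to describe(): the
// result keeps the shortener's host.
console.log(resolveAgainst(image, 'http://bit.ly/2pR9IGK'));
// http://bit.ly/template_images/ecliniqua/clinicalInformaticsNews-og.jpg

// Base is the final URL after redirects (placeholder host, an
// assumption for this example): the result points at the real site.
console.log(resolveAgainst(image, 'http://www.example.com/some/article'));
// http://www.example.com/template_images/ecliniqua/clinicalInformaticsNews-og.jpg
```

The same resolution applies to the icon paths, which is why having the final destination URL available in the description would fix both cases.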
Hi,
First of all I would like to thank @hueniverse for this awesome package. Everything is working well in Safari 10 but when using it in Safari 9 I got the following error:
SyntaxError unexpected keyword 'const'. const declarations are not supported in strict mode.
Does anybody know how to fix this and make it work in Safari 9?
Hi, I tried to use the metaphor library, but got this error when calling describe.
For the most part this module works great. I'm puzzled by the following crash. I'm using the module in a static website generator, and the website I'm testing against has dozens of pages with links to YouTube video pages. Since this doesn't report the URL triggering the problem, I'm not entirely sure which page is crashing.
It appears to be happening on one of the index pages -- each of these pages has about 60 YouTube links to process, just to get the thumbnail for each.
Is this running past a usage limit?
/Users/david/ws/t/hmp/node_modules/metaphor/lib/ogp.js:119
if (prev[sub] &&
^
TypeError: Cannot read property 'tag' of undefined
at Object.exports.describe (/Users/david/ws/t/hmp/node_modules/metaphor/lib/ogp.js:119:25)
at /Users/david/ws/t/hmp/node_modules/metaphor/lib/index.js:216:33
at Object.HtmlParser2.Parser.onend (/Users/david/ws/t/hmp/node_modules/metaphor/lib/tags.js:139:20)
at Parser.onend (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Parser.js:309:32)
at Tokenizer._finish (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Tokenizer.js:838:12)
at Tokenizer.end (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Tokenizer.js:829:25)
at Parser.end (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/htmlparser2/lib/Parser.js:337:18)
at Object.exports.parse (/Users/david/ws/t/hmp/node_modules/metaphor/lib/tags.js:144:12)
at Object.exports.Engine.constructor.options._describe.setup.redirected.beforeRedirect.Wreck.request.exports.parse.Tags.parse [as parse] (/Users/david/ws/t/hmp/node_modules/metaphor/lib/index.js:212:10)
at /Users/david/ws/t/hmp/node_modules/metaphor/lib/index.js:151:36
at finish (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/wreck/lib/index.js:328:20)
at wrapped (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/hoek/lib/index.js:871:20)
at onReaderFinish (/Users/david/ws/t/hmp/node_modules/metaphor/node_modules/wreck/lib/index.js:399:16)
at g (events.js:260:16)
at emitNone (events.js:72:20)
at emit (events.js:166:7)
This will allow the layout to be changed through CSS.
Hey there!
I'm trying to parse a German newspaper website which apparently contains a link tag without a rel property.
<link itemprop="primaryImageOfPage" href="http://img.zeit.de/politik/ausland/2016-06/cameron-farage-tv-debatte/wide__1300x731">
This makes the htmlparser throw:
const Metaphor = require('metaphor')
const parser = new Metaphor.Engine({ preview: false })
parser.describe('http://www.zeit.de/politik/ausland/2016-06/brexit-tv-duell-david-cameron-nigel-farage-eu-austritt', (descr) => console.log(descr))
TypeError: Cannot read property 'split' of undefined
at Object.HtmlParser2.Parser.onopentag (/Users/clemens/share-page/node_modules/metaphor/lib/tags.js:91:44)
at Parser.onopentagend (/Users/clemens/share-page/node_modules/htmlparser2/lib/Parser.js:169:37)
at Tokenizer._stateBeforeAttributeName (/Users/clemens/share-page/node_modules/htmlparser2/lib/Tokenizer.js:230:13)
at Tokenizer._parse (/Users/clemens/share-page/node_modules/htmlparser2/lib/Tokenizer.js:658:9)
at Tokenizer.write (/Users/clemens/share-page/node_modules/htmlparser2/lib/Tokenizer.js:632:7)
at Parser.write (/Users/clemens/share-page/node_modules/htmlparser2/lib/Parser.js:331:18)
at Object.exports.parse (/Users/clemens/share-page/node_modules/metaphor/lib/tags.js:142:12)
at Object.exports.parse (/Users/clemens/share-page/node_modules/metaphor/lib/index.js:209:10)
at Wreck.read (/Users/clemens/share-page/node_modules/metaphor/lib/index.js:148:36)
at finish (/Users/clemens/share-page/node_modules/wreck/lib/index.js:328:20)
We should probably check for the existence of rel here: https://github.com/hueniverse/metaphor/blob/master/lib/tags.js#L88
if (name === 'link' && attributes.href && attributes.rel) {}
Thank you!
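As a standalone illustration of that check, here is a hedged sketch. The helper relTokens is hypothetical and is not how metaphor's tags.js is actually structured. It only shows one way to treat a missing rel attribute as "no relations" instead of calling split on undefined:

```javascript
// Hypothetical helper, not part of metaphor: split a link tag's rel
// attribute into lowercase tokens, returning an empty list when the
// attribute is absent.
const relTokens = (attributes) => {

    if (typeof attributes.rel !== 'string') {
        return [];
    }

    return attributes.rel.trim().toLowerCase().split(/\s+/).filter(Boolean);
};

// The link tag from the page above has itemprop and href but no rel.
console.log(relTokens({
    itemprop: 'primaryImageOfPage',
    href: 'http://img.zeit.de/politik/ausland/2016-06/cameron-farage-tv-debatte/wide__1300x731'
}));

console.log(relTokens({ rel: 'Shortcut Icon', href: '/favicon.ico' }));
```

The first call yields an empty array, so any later loop over the relations simply does nothing for that tag.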
This URL (http://kb.mailchimp.com/delivery/deliverability-research/gmail-is-clipping-my-email) has the following value set for og:url:
http//127.0.0.1/
which breaks describe because the url module cannot parse it.
Flipping the check here https://github.com/hueniverse/metaphor/blob/master/lib/index.js#L225 would fix it though.
To reproduce:
const metaphor = require('metaphor');
const engine = new metaphor.Engine();
engine.describe('http://kb.mailchimp.com/delivery/deliverability-research/gmail-is-clipping-my-email', res => {
    console.log('res', res);
});
Hey Eran,
just noticed this:
ls -l node_modules/metaphor
total 40
-rwxr-xr-x 1 clemens staff 1660 28 Jul 20:25 LICENSE
-rwxr-xr-- 1 clemens staff 8644 28 Jul 20:25 README.md
drwxr-xr-x 9 clemens staff 306 29 Jul 12:48 lib
drwxr-xr-x 3 clemens staff 102 29 Jul 12:48 node_modules
-rw-r--r-- 1 clemens staff 2636 29 Jul 12:48 package.json
which results in:
Error: Cannot find module '../providers.json'
at Function.Module._resolveFilename (module.js:440:15)
at Function.Module._load (module.js:388:25)
at Module.require (module.js:468:17)
at require (internal/module.js:20:19)
at Object.<anonymous> (/Users/clemens/webapp-server/node_modules/metaphor/lib/index.js:13:19)
at Module._compile (module.js:541:32)
at Object.Module._extensions..js (module.js:550:10)
at Module.load (module.js:458:32)
at tryModuleLoad (module.js:417:12)
at Function.Module._load (module.js:409:3)
In v3.5.1 it was still there.
Debug: internal, implementation, error
TypeError: Uncaught error: Cannot read property 'html' of undefined
at metaphor.describe (/Users/kye/sideway/server/node_modules/@sideway/embed/lib/index.js:134:47)
at settings.preview (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:204:24)
at Object.internals.preview (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:300:12)
at internals.sizes (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:198:27)
at Object.exports.parallel (/Users/kye/sideway/server/node_modules/items/lib/index.js:47:9)
at Object.internals.sizes (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:374:11)
at _preview (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:196:19)
at exports.parse (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:152:104)
at Oembed.describe (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:242:20)
at Object.exports.describe (/Users/kye/sideway/server/node_modules/metaphor/lib/oembed.js:83:16)
at Tags.parse (/Users/kye/sideway/server/node_modules/metaphor/lib/index.js:222:16)
at Object.HtmlParser2.Parser.onend (/Users/kye/sideway/server/node_modules/metaphor/lib/tags.js:139:20)
at Parser.onend (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Parser.js:310:32)
at Tokenizer._finish (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Tokenizer.js:838:12)
at Tokenizer.end (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Tokenizer.js:829:25)
at Parser.end (/Users/kye/sideway/server/node_modules/htmlparser2/lib/Parser.js:338:18)
The breaking change is the additional required options argument.