erikriver / opengraph
A python module to parse the Open Graph Protocol
Home Page: http://ogp.me/
License: MIT License
from opengraph import OpenGraph
cannot import name 'OpenGraph'
in opengraph/__init__.py
It works when I run with Python 2, but when I run with Python 3 I get the following error.
Traceback (most recent call last):
  File "og.py", line 9, in <module>
    import opengraph
  File "/usr/local/lib/python3.5/dist-packages/opengraph/__init__.py", line 1, in <module>
    from opengraph import OpenGraph
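The traceback is consistent with Python 3 having dropped implicit relative imports: inside opengraph/__init__.py, the statement "from opengraph import OpenGraph" resolves to the package itself rather than to the sibling opengraph.py module. A minimal sketch of the usual fix, an explicit relative import, is shown below on a throwaway package. The package name demo_og and its contents are hypothetical and exist only for illustration; this is not the project's actual patch.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package laid out like opengraph/: an __init__.py that
# re-exports a class from a sibling module named after the package.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_og")  # hypothetical package name
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "demo_og.py"), "w") as f:
    f.write("class OpenGraph(dict):\n    pass\n")

# Explicit relative import: the leading dot means "the module next to me".
# Without the dot, Python 3 looks up the top-level name, finds this package
# itself (still being initialised), and raises ImportError.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .demo_og import OpenGraph\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_og = importlib.import_module("demo_og")
print(demo_og.OpenGraph.__name__)
```

Applied to this project, the analogous one-line change would be writing "from .opengraph import OpenGraph" in opengraph/__init__.py.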
How do I set a custom user agent for the OpenGraph module?
Udemy.com is blocking the default User Agent of opengraph. I'm getting:
urllib2.HTTPError: HTTP Error 403: Unauthorized
As a workaround I have created a custom getter using the requests module:
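try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2
import requests
from opengraph import OpenGraph

def custom_get_img_from_link(link):
    """Return the og:image URL for link, fetched with a custom User-Agent."""
    # headers = {"User-Agent": get_random_UA()}
    headers = {"User-Agent": "My bot"}
    r = requests.get(link, headers=headers)
    parsed_uri = urlparse(link)
    domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    OpenGraph.parser = parser  # parser is defined elsewhere (not shown in the original post)
    OpenGraph.scrape = True  # workaround for some subtle bug in opengraph
    page = OpenGraph(html=r.content)
    if page.is_valid():
        image_url = page.get('image', None)
        # Guard against a missing image before resolving a relative URL
        if image_url and not image_url.startswith('http'):
            image_url = urljoin(domain, image_url)
        return image_url
    return None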
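For readers who would rather not depend on requests, the same idea can be sketched with the Python 3 standard library alone. The helper names build_request and fetch_html are hypothetical, and "My bot" is simply the placeholder User-Agent from the post above. Only the request construction runs here; fetch_html needs network access.

```python
import urllib.request


def build_request(url, user_agent="My bot"):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent="My bot", timeout=10):
    """Download the page body with the custom User-Agent (needs network)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


req = build_request("https://www.udemy.com/")
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

The bytes returned by fetch_html could then be handed to OpenGraph(html=...) in the same way the workaround passes r.content.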
Ran these commands in IPython:
from opengraph import OpenGraph
og=OpenGraph("http://www.livemint.com/Industry/VlSovF4AGkwhupYQ2Ps3YN/Bschool-placements-Modest-rise-in-average-salary-offered.html")
og
og=OpenGraph("http://facebook.com")
og
output1:
{'locale:alternate': 'ja_JP', 'site_name': 'http://www.livemint.com/', 'description': 'Initial analysis of the ongoing placement season at IIMs and other top B-schools indicate that most salaries offered show a single digit growth over last year', 'title': 'B-school placements see modest salary growth, fewer offers from start-ups', 'url': 'http://www.livemint.com/Industry/VlSovF4AGkwhupYQ2Ps3YN/Bschool-placements-Modest-rise-in-average-salary-offered.html', 'image': 'http://www.livemint.com/rf/Image-621x414/LiveMint/Period2/2017/02/08/Photos/Processed/[email protected]', 'locale': 'hi_IN', 'type': 'article'}
output2:
{'locale:alternate': 'ja_JP', 'site_name': 'Facebook', 'description': 'Initial analysis of the ongoing placement season at IIMs and other top B-schools indicate that most salaries offered show a single digit growth over last year', 'title': 'B-school placements see modest salary growth, fewer offers from start-ups', 'url': 'https://www.facebook.com/', 'image': 'https://www.facebook.com/images/fb_icon_325x325.png', 'locale': 'hi_IN', 'type': 'article'}
The description and title are the same in both outputs, even though the two URLs are different.
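To see exactly which fields leaked from one result into the other, the two dictionaries can be compared key by key. This is only a diagnostic sketch: identical_fields is a hypothetical helper, and the dictionaries below are subsets of the two outputs above with the values copied unchanged.

```python
def identical_fields(first, second):
    """Return the sorted keys whose values are equal in both results."""
    shared_keys = first.keys() & second.keys()
    return sorted(key for key in shared_keys if first[key] == second[key])


# Subsets of output1 and output2 from the report above.
livemint = {
    "site_name": "http://www.livemint.com/",
    "title": "B-school placements see modest salary growth, fewer offers from start-ups",
    "locale": "hi_IN",
    "locale:alternate": "ja_JP",
    "type": "article",
}
facebook = {
    "site_name": "Facebook",
    "title": "B-school placements see modest salary growth, fewer offers from start-ups",
    "locale": "hi_IN",
    "locale:alternate": "ja_JP",
    "type": "article",
}

print(identical_fields(livemint, facebook))
# ['locale', 'locale:alternate', 'title', 'type']
```

Only site_name differs in this subset, which suggests that values from one fetch survive into the next result. The sketch does not identify where in the library that happens.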
If you have lxml installed, BeautifulSoup4 will set lxml as the default parser, so it would be better to be able to specify the parser depending on the situation.
opengraph/opengraph/opengraph.py
Line 63 in e232256
Depending on the environment, this can lead to issues such as the following:
#37
As a solution, I think it would be a good idea to add a new argument for selecting the parser to the following signature:
opengraph/opengraph/opengraph.py
Line 28 in e232256
Is it possible to add licensing information to this project? I'd like to modify it to suit my custom needs (in commercial product).
Some OG tags are not found if the tags are in the body rather than the head. Currently the code only checks doc.html.head. I will provide a fix to search all of the HTML (maybe this should be configurable?).
extending dict makes using the module very confusing
It's mixing object attrs with returned code
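One way to read this complaint is that parsed tags and object state share a single namespace when the class subclasses dict. A minimal sketch of the alternative, composition, follows. OpenGraphResult is a hypothetical class and not part of the library; the required names are the four basic properties listed on ogp.me.

```python
class OpenGraphResult:
    """Hypothetical wrapper: parsed tags live in .properties, not on the object."""

    # The four properties ogp.me lists as required for every page.
    required_attrs = ("title", "type", "image", "url")

    def __init__(self, properties=None):
        # Copy so that callers cannot mutate the result through their own dict.
        self.properties = dict(properties or {})

    def is_valid(self):
        """True when every required property was found."""
        return all(name in self.properties for name in self.required_attrs)


result = OpenGraphResult({"title": "Example", "type": "article"})
print(result.is_valid())
print(sorted(result.properties))
```

Here configuration and methods stay on the object, while everything read from the page sits in one plain dictionary.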
Hi,
I am having an issue with getting the metadata using opengraph_py3, urllib and bs4.
In the parser method you are only checking the <head>, but it looks like <meta> tags are sometimes in the body. Any ideas how I can fix this? Is it due to the UserAgent?
import re
import urllib.request

import opengraph_py3 as opengraph
from bs4 import BeautifulSoup

# urlopen replaces FancyURLopener, which is deprecated
raw = urllib.request.urlopen("https://youtu.be/DQwU_kU4pUg")
html = raw.read()
soup = BeautifulSoup(html, 'html.parser')
# This is the same lookup as in `parser`: it only searches <head>
soup.html.head.find_all(property=re.compile(r'^og'))
# []
soup.html.body.find_all(property=re.compile(r'^og'))
# [<meta content="YouTube" property="og:site_na....]
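The head-only lookup can be avoided by scanning every <meta> tag in the document. The sketch below uses only the standard library's html.parser so that it runs without bs4. OGMetaCollector and collect_og are hypothetical names, and the sample markup is a made-up page that mimics the situation in the report.

```python
from html.parser import HTMLParser


class OGMetaCollector(HTMLParser):
    """Collect <meta property="og:..."> tags wherever they appear."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop[3:], content)


def collect_og(html_text):
    """Return a dict of Open Graph properties found anywhere in html_text."""
    collector = OGMetaCollector()
    collector.feed(html_text)
    return collector.properties


# Made-up page whose only og: tag sits in <body>, as in the report.
sample = (
    "<html><head><title>t</title></head>"
    '<body><meta property="og:site_name" content="YouTube"></body></html>'
)
print(collect_og(sample))
# {'site_name': 'YouTube'}
```

With BeautifulSoup, the analogous change would be to call find_all on the whole document object instead of on soup.html.head.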
https://pypi.org/project/PyOpenGraph/
It's suggested by https://ogp.me/
C:\Python27\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "html5lib")
markup_type=markup_type))