recursecenter / blaggregator
A blog aggregator for the Recurse Center community
Home Page: https://blaggregator.recurse.com
So I'm locked out.
Kenya's working on this
I get the following error message when I go to http://blaggregator.us/:
400 Bad Request
nginx
The admin login info is out of date given the OAuth changes. We need to add shell instructions for making your account a superuser.
A handful of blogs are throwing the following error when crawled in production. It was only three blogs on the last crawl, so this is not super high priority.
2014-11-02 01:00:00
Traceback (most recent call last):
File "/app/home/management/commands/crawlposts.py", line 86, in handle_noargs
self.crawlblog(blog)
File "/app/home/management/commands/crawlposts.py", line 50, in crawlblog
date = timezone.make_aware(date, timezone.get_default_timezone())
File "/app/.heroku/python/lib/python2.7/site-packages/django/utils/timezone.py", line 280, in make_aware
return timezone.localize(value, is_dst=None)
File "/app/.heroku/python/lib/python2.7/site-packages/pytz/tzinfo.py", line 349, in localize
raise AmbiguousTimeError(dt)
AmbiguousTimeError: 2014-11-02 01:00:00
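Here is a minimal sketch of one way to avoid the ambiguity. It assumes the crawler's dates come from feedparser's *_parsed fields, which feedparser normalizes to UTC time.struct_time values. The idea is to build an aware UTC datetime directly, instead of passing a naive value through timezone.make_aware with the default, DST-observing time zone. The helper name struct_time_to_utc is hypothetical.

```python
import time
from datetime import datetime, timezone


def struct_time_to_utc(parsed):
    """Turn a UTC time.struct_time (e.g. feedparser's published_parsed) into an aware datetime.

    UTC has no DST transitions, so 01:00 on 2014-11-02 is unambiguous here,
    unlike the same wall-clock time in a US local time zone.
    """
    return datetime(*parsed[:6], tzinfo=timezone.utc)


parsed = time.strptime("2014-11-02 01:00:00", "%Y-%m-%d %H:%M:%S")
print(struct_time_to_utc(parsed).isoformat())  # 2014-11-02T01:00:00+00:00
```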
on prod: announce
on staging: testing
locally: testing
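These lines read like a per-environment setting, presumably the Zulip stream that announcements go to. Below is a minimal sketch of keeping that choice in one place. The BLAGGREGATOR_ENV variable and the zulip_stream helper are hypothetical names, not existing settings.

```python
import os

# Hypothetical mapping of deployment environment to Zulip stream,
# following the list above (prod announces; staging and local test).
ZULIP_STREAMS = {
    "prod": "announce",
    "staging": "testing",
    "local": "testing",
}


def zulip_stream(env=None):
    """Return the stream for env, reading the hypothetical BLAGGREGATOR_ENV variable by default."""
    if env is None:
        env = os.environ.get("BLAGGREGATOR_ENV", "local")
    # Unknown environments fall back to the test stream so they never announce.
    return ZULIP_STREAMS.get(env, "testing")


print(zulip_stream("prod"))  # announce
```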
Write a script that can be run weekly. It grabs the last week of LogEvents and tallies up the visits. Outputs a CSV to be emailed. Maybe with a structure like this:
post_id, post_title, post_url, visits
This is the first step toward weekly digest emails out to alums.
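Here is a minimal sketch of the tally. It assumes LogEntry rows (the model behind the home_logentry table mentioned elsewhere) can be reduced to (post_id, visited_at) pairs, and that post titles and URLs are available as a dict. The function name and input shapes are hypothetical. Querying Django and emailing the result are left out.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta


def weekly_visits_csv(log_entries, posts, now):
    """Tally visits per post over the 7 days before `now` and return CSV text.

    log_entries: iterable of (post_id, visited_at) pairs (hypothetical shape of LogEntry rows).
    posts: dict mapping post_id -> (title, url).
    """
    since = now - timedelta(days=7)
    counts = Counter(
        post_id for post_id, visited_at in log_entries if since <= visited_at < now
    )
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["post_id", "post_title", "post_url", "visits"])
    # Most-visited posts first; ties broken by post_id for stable output.
    for post_id, visits in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        title, url = posts[post_id]
        writer.writerow([post_id, title, url, visits])
    return out.getvalue()
```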
Here's what I get when I click the new login button:
An error has occurred
The redirect uri included is not valid.
Works fine from blaggregator.us, though. If this is part of a plan to get people to use the new URL, I suspect you could figure out a more direct way to do it :-).
add unique constraint to feed URL
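Here is a sketch of what the constraint buys, using an in-memory SQLite table as a stand-in for the real database. The table and column names are assumptions. In a Django model, the equivalent would be unique=True on the feed URL field. Any existing duplicate rows would have to be removed before such a constraint can be added.

```python
import sqlite3

# In-memory stand-in for the blog table; the column name feed_url is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog (id INTEGER PRIMARY KEY, feed_url TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO blog (feed_url) VALUES (?)", ("http://example.com/feed/",))

try:
    # A second row with the same feed URL is rejected by the database itself.
    conn.execute("INSERT INTO blog (feed_url) VALUES (?)", ("http://example.com/feed/",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```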
/Users/sasha/code/blaggregator/lib/python2.7/site-packages/django/db/models/fields/__init__.py:827: RuntimeWarning: DateTimeField received a naive datetime (2013-03-04 22:31:00) while time zone support is active.
The current crawler script pulls the publication date from the post itself, so that the posts are correctly ordered when a new blog is registered. Otherwise, the entire history of one blog would appear to be the time when the blog was added to Blaggregator, which is confusing.
Now that there is a high volume of posts, someone can start a draft, take two days to publish it, and by the time it is public it's already buried on the second page of Blaggregator listings. It also doesn't get Zuliped out, since there is a filter that only announces posts less than 2 days old.
Solution: continue to use the existing crawlposts.py to crawl posts when a blog is first added. Create a second script that looks for new posts every 10 minutes and timestamps each new post it sees with datetime.now().
That way, a post will be at the top of Blaggregator and Zuliped when it becomes public, which is what users expect. This should obviously be throttled, in case a blog goes offline for a few days and then suddenly comes back.
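Here is a minimal sketch of the timestamp rule described above. The (url, published_date) input shape, the function name, and the max_fresh threshold are hypothetical. "Throttled" is interpreted here as falling back to published dates when a single crawl turns up an unusually large batch of new posts from one blog.

```python
from datetime import datetime


def assign_timestamps(new_posts, now, initial_crawl, max_fresh=3):
    """Return {post_url: timestamp} for posts not seen before.

    new_posts: list of (post_url, published_date) pairs found in this crawl.
    On a blog's first crawl, keep published dates so its history orders correctly.
    Afterwards, stamp new posts with `now` so they surface at the top, unless
    more than `max_fresh` appear at once (e.g. a blog back from being offline),
    in which case fall back to published dates. max_fresh is a hypothetical throttle.
    """
    if initial_crawl or len(new_posts) > max_fresh:
        return {url: published for url, published in new_posts}
    return {url: now for url, _ in new_posts}


print(assign_timestamps([("http://example.com/draft", datetime(2015, 1, 8))],
                        datetime(2015, 1, 10, 12), initial_crawl=False))
```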
Each registered blog is pinged, in order. This is the slowest (and most expensive) part of the crawling, and the biggest reason why Blaggregator costs me $30/mo to run. There are a LOT of blogs that aren't parsing, which really slows down the crawling. I've been ignoring this, but as blaggregator grows this is becoming a bigger issue.
There are several possible ways to do this. One way:
Thoughts?
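One possible direction, sketched under assumptions and not necessarily the approach meant here, is to fetch feeds concurrently with a thread pool. Slow or unparseable blogs then no longer hold up the rest, and failures are collected for later inspection. The fetch callable is a hypothetical stand-in for whatever downloads and parses one feed, and it should apply its own network timeout.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(feed_urls, fetch, max_workers=8):
    """Fetch every feed concurrently instead of one after another.

    fetch(url) is a caller-supplied (hypothetical) function that downloads and
    parses one feed. A feed that raises is recorded in `failures` rather than
    stalling or aborting the whole crawl.
    """
    results, failures = {}, {}

    def attempt(url):
        try:
            return url, fetch(url), None
        except Exception as exc:  # broken feeds should not stop the others
            return url, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url, parsed, error in pool.map(attempt, feed_urls):
            if error is None:
                results[url] = parsed
            else:
                failures[url] = error
    return results, failures


# Usage with a stand-in fetcher: one feed parses, one fails.
def demo_fetch(url):
    if "broken" in url:
        raise ValueError("could not parse feed")
    return ["post from " + url]


ok, failed = crawl_all(["http://example.com/feed", "http://broken.example/feed"], demo_fetch)
print(sorted(ok), sorted(failed))
```

Threads suit this because the work is mostly waiting on the network. The failures dict could also feed a rule that skips blogs that keep failing.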
@punchagan Links to view blog posts work great from blaggregator.us, but ALL links are broken when coming from Zulip. I'm at the end of the time I allotted to work on this today, so I haven't looked at this in detail. Going to roll back now.
Here is a portion of the logs. I believe this is 1.5 of these errors; it's a little hard to tell since Heroku doesn't record or display logs in order.
2015-03-08T23:02:10.017838+00:00 app[web.1]: Internal Server Error: /post/ThNclE/view
2015-03-08T23:02:10.017844+00:00 app[web.1]: Traceback (most recent call last):
2015-03-08T23:02:10.017846+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
2015-03-08T23:02:10.017848+00:00 app[web.1]: response = callback(request, *callback_args, **callback_kwargs)
2015-03-08T23:02:10.017850+00:00 app[web.1]: File "/app/home/views.py", line 60, in view_post
2015-03-08T23:02:10.017852+00:00 app[web.1]: user_agent=request.META.get('HTTP_USER_AGENT', None),
2015-03-08T23:02:10.017854+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/manager.py", line 149, in create
2015-03-08T23:02:10.017855+00:00 app[web.1]: return self.get_query_set().create(**kwargs)
2015-03-08T23:02:10.017859+00:00 app[web.1]: obj.save(force_insert=True, using=self.db)
2015-03-08T23:02:10.017860+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/base.py", line 546, in save
2015-03-08T23:02:10.017862+00:00 app[web.1]: force_update=force_update, update_fields=update_fields)
2015-03-08T23:02:10.017863+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/base.py", line 650, in save_base
2015-03-08T23:02:10.017865+00:00 app[web.1]: result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
2015-03-08T23:02:10.017867+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/manager.py", line 215, in _insert
2015-03-08T23:02:10.017869+00:00 app[web.1]: return insert_query(self.model, objs, fields, **kwargs)
2015-03-08T23:02:10.017870+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/query.py", line 1661, in insert_query
2015-03-08T23:02:10.017872+00:00 app[web.1]: return query.get_compiler(using=using).execute_sql(return_id)
2015-03-08T23:02:10.017873+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 937, in execute_sql
2015-03-08T23:02:10.017875+00:00 app[web.1]: cursor.execute(sql, params)
2015-03-08T23:02:10.017876+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 56, in execute
2015-03-08T23:02:10.017878+00:00 app[web.1]: six.reraise(utils.IntegrityError, utils.IntegrityError(*tuple(e.args)), sys.exc_info()[2])
2015-03-08T23:02:10.017880+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 54, in execute
2015-03-08T23:02:10.017881+00:00 app[web.1]: return self.cursor.execute(query, args)
2015-03-08T23:02:10.017883+00:00 app[web.1]: IntegrityError: null value in column "referer" violates not-null constraint
2015-03-08T23:02:10.017885+00:00 app[web.1]: DETAIL: Failing row contains (29, 6677, 2015-03-08 23:02:10.010438+00, null, 10.123.66.198, Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.3...).
2015-03-08T23:02:10.017886+00:00 app[web.1]:
2015-03-08T23:02:20.744301+00:00 app[web.1]: Internal Server Error: /post/UHZQQj/view
2015-03-08T23:02:20.744306+00:00 app[web.1]: Traceback (most recent call last):
2015-03-08T23:02:20.744323+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
2015-03-08T23:02:20.744325+00:00 app[web.1]: response = callback(request, *callback_args, **callback_kwargs)
2015-03-08T23:02:20.744327+00:00 app[web.1]: File "/app/home/views.py", line 60, in view_post
2015-03-08T23:02:20.744328+00:00 app[web.1]: user_agent=request.META.get('HTTP_USER_AGENT', None),
2015-03-08T23:02:20.744330+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/manager.py", line 149, in create
2015-03-08T23:02:20.744332+00:00 app[web.1]: return self.get_query_set().create(**kwargs)
2015-03-08T23:02:20.744333+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/query.py", line 402, in create
2015-03-08T23:02:20.744335+00:00 app[web.1]: obj.save(force_insert=True, using=self.db)
2015-03-08T23:02:20.744336+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/base.py", line 546, in save
2015-03-08T23:02:20.744338+00:00 app[web.1]: force_update=force_update, update_fields=update_fields)
2015-03-08T23:02:20.744339+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/base.py", line 650, in save_base
2015-03-08T23:02:20.744341+00:00 app[web.1]: result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
2015-03-08T23:02:20.744342+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/manager.py", line 215, in _insert
2015-03-08T23:02:20.744344+00:00 app[web.1]: return insert_query(self.model, objs, fields, **kwargs)
2015-03-08T23:02:20.744346+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/query.py", line 1661, in insert_query
2015-03-08T23:02:20.744347+00:00 app[web.1]: return query.get_compiler(using=using).execute_sql(return_id)
2015-03-08T23:02:20.744349+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 937, in execute_sql
2015-03-08T23:02:20.744350+00:00 app[web.1]: cursor.execute(sql, params)
2015-03-08T23:02:20.744352+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 56, in execute
2015-03-08T23:02:20.744353+00:00 app[web.1]: six.reraise(utils.IntegrityError, utils.IntegrityError(*tuple(e.args)), sys.exc_info()[2])
2015-03-08T23:02:20.744355+00:00 app[web.1]: File "/app/.heroku/python/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 54, in execute
2015-03-08T23:02:20.744356+00:00 app[web.1]: return self.cursor.execute(query, args)
2015-03-08T23:02:20.744358+00:00 app[web.1]: IntegrityError: null value in column "referer" violates not-null constraint
2015-03-08T23:02:20.744360+00:00 app[web.1]: DETAIL: Failing row contains (30, 6593, 2015-03-08 23:02:20.735531+00, null, 10.123.66.198, Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.3...).
2015-03-08T23:02:20.744361+00:00 app[web.1]:
2015-03-08T23:02:20.749020+00:00 heroku[router]: at=info method=GET path="/post/UHZQQj/view" host=www.blaggregator.us request_id=53f98df7-cb9f-400e-a04a-00e6ea1dd86b fwd="24.193.114.250" dyno=web.1 connect=1ms service=60ms status=500 bytes=225
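The traceback points at the LogEntry insert in view_post, where the referer value is NULL. This is presumably because clicks arriving from Zulip carry no Referer header. Here is a minimal sketch, assuming the view reads request.META the same way it does for the user agent: default the referer to an empty string. The helper name is hypothetical. Allowing NULL in the column would be the alternative.

```python
def log_entry_fields(meta):
    """Build request-derived LogEntry fields from a Django-style request.META dict.

    A request without a Referer header would otherwise store None in the
    NOT NULL "referer" column; an empty string keeps the insert valid.
    The user agent keeps the None default the original view uses.
    """
    return {
        "referer": meta.get("HTTP_REFERER", ""),
        "user_agent": meta.get("HTTP_USER_AGENT", None),
    }


print(log_entry_fields({"HTTP_USER_AGENT": "Mozilla/5.0"}))
```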
There are lingering files
Because I'm dumb
Please see this issue: brannerchinese/feedergrabber#1, affecting Blaggregator.
Add a try/catch to ask user to try again on the same page.
Blaggregator bot announced my latest post 3 times. Haven't seen this replicated with others' blog posts though.
Django 1.5 has long been unsupported. It is quite important to update to one of the supported versions, preferably the latest version of Django.
When Sonali updated everyone's avatars for this batch, the URLs all changed. We currently hotlink to their avatar images rather than storing them ourselves (that's another Issue entirely) so they're all broken.
Solution: write a web scraping script to run once to pull all the URLs from hackerschool.com/private and update the avatar URLs in our database.
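Here is a minimal sketch of the parsing half of such a script, using the standard library's HTMLParser. It assumes, hypothetically, that each avatar on the private page is an <img> whose alt text is the person's name; the real markup is not shown here. Fetching the logged-in page and writing the URLs back to the database are left out.

```python
from html.parser import HTMLParser


class AvatarParser(HTMLParser):
    """Collect (alt text, src) pairs from <img> tags.

    Assumes, hypothetically, that each avatar is an <img> whose alt attribute
    holds the person's name; the real page markup may differ.
    """

    def __init__(self):
        super().__init__()
        self.avatars = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Skip images without both a source and a name (logos, icons, etc.).
        if attrs.get("src") and attrs.get("alt"):
            self.avatars.append((attrs["alt"], attrs["src"]))


def extract_avatars(html_text):
    """Return {name: avatar_url} for every matching <img> in html_text."""
    parser = AvatarParser()
    parser.feed(html_text)
    parser.close()
    return dict(parser.avatars)
```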
Add a dummy 'HS' user and their blog feed so official posts are tracked.
after deploying #90
Updating at Sasha's request on 20130719.
Some blog-post titles display certain Unicode symbols incorrectly. The symbols involved are those for which there are special HTML escapes. The Blaggregator is receiving these symbols correctly as Unicode strings, but for some reason it converts them to HTML escapes. Those escapes then appear literally as ASCII representations, with the ampersand rendered as &amp;. This last issue is the reason for the incorrect appearance of the symbols on the website; if well-formed HTML escapes were being written to HTML, browsers would probably display them correctly.
Examples from feedergrabber27.py for feedparser.parse(), which are passing Unicode strings correctly:
Example 1, the greater-than symbol:
In [1]: text = u'Now filtering (> /dev/null) some spam before it reaches the Gmail Spam folder'
In [2]: text
Out[2]: u'Now filtering (> /dev/null) some spam before it reaches the Gmail Spam folder'
In [3]: print text
Now filtering (> /dev/null) some spam before it reaches the Gmail Spam folder
It appears on the Blaggregator at http://blaggregator.herokuapp.com/post/4ruqWY/; the HTML contains &amp;gt; rather than &gt; or >. It appears normally on WordPress: http://brannerchinese.wordpress.com/2013/07/07/now-filtering-devnull-some-spam-before-it-reaches-the-gmail-spam-folder/
Example 2, the cross-product symbol:
In [4]: another = u'Freeman-Halton 3\xd73 exact test'
In [5]: another
Out[5]: u'Freeman-Halton 3\xd73 exact test'
In [6]: print another
Freeman-Halton 3×3 exact test
It appears on the Blaggregator at http://blaggregator.herokuapp.com/post/GQrsuX/; the HTML contains &amp;#215; rather than &#215; or ×. It appears normally on WordPress: http://brannerchinese.wordpress.com/2013/06/26/freeman-halton-3x3-exact-test/
Important: note that the Blaggregator's HTML does not contain the actual HTML escapes, which would probably display correctly. Instead, it contains an ASCII rendering of the HTML escapes, with the ampersand of each escape replaced by &amp;.
The HTML should contain the line <meta charset="UTF-8"> within <head>. At the moment it does not.
Previously, I had thought that this was a problem in feedergrabber27.py. I am now able to demonstrate that it is not.
[end]
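Here is a short reproduction of the symptom described above, using Python's html module. It shows that escaping a string twice yields the literal &amp;gt; and &amp;#215; sequences. It does not establish where in Blaggregator the extra escaping happens. One plausible path, shown here as an assumption, is that × is converted to a numeric character reference before the template escapes it again.

```python
import html

# Title as feedparser delivers it: a plain Unicode string (shortened here).
title = "Now filtering (> /dev/null) some spam"

# Escaping once is what a template should do; the browser then shows ">".
escaped_once = html.escape(title)          # 'Now filtering (&gt; /dev/null) some spam'
# Escaping an already-escaped string produces the literal "&gt;" seen on the page.
escaped_twice = html.escape(escaped_once)  # 'Now filtering (&amp;gt; /dev/null) some spam'

# Non-ASCII characters go wrong the same way if they are first converted to
# numeric character references and then escaped again.
cross = "Freeman-Halton 3\xd73 exact test"
as_reference = cross.encode("ascii", "xmlcharrefreplace").decode("ascii")  # 'Freeman-Halton 3&#215;3 exact test'
double = html.escape(as_reference)         # 'Freeman-Halton 3&amp;#215;3 exact test'
```

If that is the cause, storing the Unicode title as-is and letting the template escape it once should be enough, since Django templates autoescape by default.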
This is a good test case (hah) for tests. Found a small bug in the pull request I just merged, after deploying:
http://www.blaggregator.us/post/e9VGgd/view
throws the standard 500 (I expected it to throw the pretty 404)
http://www.blaggregator.us/post/e9VnthoeunthGgd/view
throws a pretty 404 (as expected)
While reading http://www.blaggregator.us/post/5RrMX3/view, if I click the link near the end of the post ("this page of the ActiveRecord github repo"), nothing happens. If I first close the frame and then click the link, it works fine.
Now that #67 is deployed, new folks to the blogging stream on Zulip won't know about the blaggregator.us site, and won't know how to add their own blogs.
A periodic (perhaps monthly?) reminder to the stream about the site, how to add or edit their blogs, how to contribute, etc, might be useful for everyone.
cc @punchagan
Am I failing to find the UI element for this, or is there really no way to do it? I changed domains, and added my "new" blog, and now it seems to exist twice, which caused my most recent blog post to show up twice. I may try to figure out how to add the feature myself when I have time, but I know neither programming language used for blaggregator.
Also, for reasons that might be related (???), my second most recent blog post shows up more than just twice. Not sure what's going on there.
Whenever I look at Blaggregator, it's always the same articles at the top.
It looks like this happens when blogs set their post dates in the future, so nothing current will ever pass them.
Could Blaggregator filter out articles that have invalid dates, so that content doesn't get stuck at the top of the feed?
Thanks :)
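Here is a minimal sketch of one way to handle this, with hypothetical names and a (title, date) input shape. Any date more than a small tolerance in the future is treated as invalid and clamped to the crawl time, so the post sorts as if it had just been published. Dropping such posts outright, as suggested, would be a one-line change to the comprehension.

```python
from datetime import datetime, timedelta


def clamp_future_dates(posts, now, tolerance=timedelta(hours=1)):
    """Return (title, date) pairs with implausible future dates pulled back to `now`.

    posts: iterable of (title, date) pairs. A date more than `tolerance` ahead
    of `now` is treated as invalid; the hypothetical tolerance absorbs small
    clock skew between servers.
    """
    limit = now + tolerance
    return [(title, now if date > limit else date) for title, date in posts]


now = datetime(2015, 6, 1, 12)
print(clamp_future_dates([("future", datetime(2030, 1, 1))], now))
```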
I'm currently working on this. There's a lot of work that goes on in the background that we don't need to make a user wait for, like checking their blog for updates and sending messages to Humbug. Celery is a much less janky way to handle this.
It will also be able to handle upcoming features, like error notifications and email digests.
Looks like deleting the blog post violates the foreign key constraint on corresponding LogEntry instances.
Environment:
Request Method: GET
Request URL: http://blaggregator-staging.herokuapp.com/delete_blog/44/
Django Version: 1.5.1
Python Version: 2.7.4
Installed Applications:
('django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.sites',
'django.contrib.messages',
'django.contrib.staticfiles',
'home',
'django.contrib.admin',
'storages',
'south',
'django.contrib.humanize',
'social.apps.django_app.default')
Installed Middleware:
('django.middleware.common.CommonMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'social.apps.django_app.middleware.SocialAuthExceptionMiddleware')
Traceback:
File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
response = callback(request, *callback_args, **callback_kwargs)
return view_func(request, *args, **kwargs)
blog.delete()
collector.delete()
transaction.commit(using=self.using)
connection.commit()
self._commit()
six.reraise(utils.IntegrityError, utils.IntegrityError(*tuple(e.args)), sys.exc_info()[2])
return self.connection.commit()
Exception Type: IntegrityError at /delete_blog/44/
Exception Value: update or delete on table "home_post" violates foreign key constraint "post_id_refs_id_673a729446c2b217" on table "home_logentry"
DETAIL: Key (id)=(725) is still referenced from table "home_logentry".
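Here is a database-level sketch of the missing cascade, using in-memory SQLite with table names taken from the error message; the columns are simplified assumptions. This is not Django's own mechanism. In Django, the equivalent is on_delete=CASCADE on the ForeignKey, or deleting the related LogEntry rows before the blog. Why the delete collector did not cascade here is not clear from the traceback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE home_post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE home_logentry (
    id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES home_post (id) ON DELETE CASCADE
);
INSERT INTO home_post (id, title) VALUES (725, 'a post');
INSERT INTO home_logentry (post_id) VALUES (725), (725);
""")

# Deleting the post also removes the log entries that point at it,
# instead of failing with a foreign-key violation.
conn.execute("DELETE FROM home_post WHERE id = ?", (725,))
remaining = conn.execute("SELECT COUNT(*) FROM home_logentry").fetchone()[0]
print(remaining)  # 0
```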
Right now, the people with the most posts are the most likely to show up. Limit instead so it only shows one post per user.
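Here is a minimal sketch of that limit in plain Python, assuming posts can be represented as (author, date, title) tuples. The function name and shape are hypothetical. In the app, this would more likely be done in the queryset.

```python
from datetime import date


def latest_post_per_user(posts):
    """Keep only each author's most recent post, newest first.

    posts: iterable of (author, date, title) tuples (a hypothetical shape).
    """
    latest = {}
    for author, posted, title in posts:
        if author not in latest or posted > latest[author][1]:
            latest[author] = (author, posted, title)
    return sorted(latest.values(), key=lambda post: post[1], reverse=True)


feed = [("ann", date(2015, 1, 3), "a3"), ("ann", date(2015, 1, 5), "a5"), ("bob", date(2015, 1, 4), "b4")]
print([title for _, _, title in latest_post_per_user(feed)])  # ['a5', 'b4']
```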
After the db save on line 113, views.py line 127 uses get(), which assumes the queryset will return only one blog instance. If the user already has a blog on their profile, this will add the blog but throw a 500.
Not sure what the best solution is here. Probably to display the URLs associated with the logged-in user (so they know which blogs they've already added) and to make the logic more flexible.
When a user adds their blog URL, their blog is crawled properly, but the newest posts don't go out to Humbug.
Need to refactor the add_blog view: remove the auto-crawl feature and let the hourly crawlposts handle it.
Should be in the format used by Django models.
You are receiving this email because the following apps that you own are using an older release of the Python runtime (e.g. 2.7.0 through 2.7.10) that is not officially supported by Heroku:
blaggregator
No action is required, but using the latest stable release is highly recommended.
You can upgrade your app to Python 2.7.11 by adding a runtime.txt file (next to requirements.txt) with the contents: python-2.7.11. After deploying, this change will install the updated version of Python, as well as re-install all of your dependencies.
Exception:
2013-04-25T21:58:26.919945+00:00 app[scheduler.1307]: ** CRAWLING http://brannerchinese.wordpress.com/feed/atom/
2013-04-25T21:58:27.194885+00:00 app[scheduler.1307]: Retrieved 'ascii' codec can't encode character u'\u014d' in position 4: ordinal not in range(128)
2013-04-25T21:58:27.194885+00:00 app[scheduler.1307]: Traceback (most recent call last):
2013-04-25T21:58:27.194885+00:00 app[scheduler.1307]: File "/app/home/management/commands/crawlposts.py", line 78, in handle_noargs
2013-04-25T21:58:27.194885+00:00 app[scheduler.1307]: self.crawlblog(blog)
2013-04-25T21:58:27.194885+00:00 app[scheduler.1307]: File "/app/home/management/commands/crawlposts.py", line 57, in crawlblog
2013-04-25T21:58:27.194885+00:00 app[scheduler.1307]: print "Retrieved", title
2013-04-25T21:58:27.194885+00:00 app[scheduler.1307]: UnicodeEncodeError: 'ascii' codec can't encode character u'\u014d' in position 4: ordinal not in range(128)
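The traceback shows print failing to encode a non-ASCII title for an ASCII-only output stream. Here is a minimal sketch of one workaround, written for Python 3 with a hypothetical helper name: escape non-ASCII characters before logging. In the Python 2.7 crawler, printing title.encode('utf-8') would be the rough equivalent.

```python
def ascii_safe(text):
    """Return text with non-ASCII characters replaced by \\uXXXX-style escapes.

    Escaping before printing keeps the log line intact even when the
    output stream can only encode ASCII.
    """
    return text.encode("ascii", "backslashreplace").decode("ascii")


print("Retrieved", ascii_safe("Ky\u014dto"))  # Retrieved Ky\u014dto
```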
Add this to the README
Want to contribute? People may not check in every day but still want to stay updated, so a daily email digest of today's posts would help. This would also really help the community stay engaged, leading to better discussions on the site.
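Here is a minimal sketch of building such a digest with the standard library's email package. The (author, title, url, published) input shape, the subject line, and the function name are all hypothetical. Choosing recipients and actually sending the message are left out.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage


def daily_digest(posts, now, sender, recipient):
    """Build a plain-text digest email of posts published in the 24 hours before `now`.

    posts: iterable of (author, title, url, published) tuples (a hypothetical shape).
    Sending (e.g. via smtplib or a mail service) is not part of this sketch.
    """
    since = now - timedelta(days=1)
    recent = sorted(
        (p for p in posts if since <= p[3] < now), key=lambda p: p[3], reverse=True
    )
    lines = [f"{title} by {author}\n{url}" for author, title, url, _ in recent]
    msg = EmailMessage()
    msg["Subject"] = f"Blaggregator digest for {now:%Y-%m-%d}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n\n".join(lines) or "No new posts today.")
    return msg
```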
I get a failed login error when I use my email address with different capitalization from the one I registered with.
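Here is a minimal sketch of a case-insensitive lookup, assuming accounts can be found by email; the names are hypothetical. In Django, the analogous query would use the email__iexact lookup. Strictly speaking, the local part of an address may be case-sensitive, but mail providers almost always ignore case.

```python
def find_account(accounts, email):
    """Look up an account by email, ignoring case and surrounding whitespace.

    accounts: dict mapping the email as registered -> account (hypothetical).
    """
    wanted = email.strip().casefold()
    for registered, account in accounts.items():
        if registered.casefold() == wanted:
            return account
    return None


accounts = {"Ada@Example.com": "ada"}
print(find_account(accounts, "ada@example.com"))  # ada
```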
Hey, great work on Blaggregator, it's awesome that it's still being contributed to and maintained.
I propose retiring frames and pointing all Blaggregator links directly to the source. This would fix three problems frames cause:
leading to a 500.
see zulip petal: https://zulip.com/#narrow/stream/announce/topic/new.20blog.20post.3A.20Setting.20up.20automatic.20testing.20with.20Travis-C.2E.2E.2E
http://blaggregator.us/hackerschool --> the app
I'll take care of this since I own the domain.
Add full text search. Initial work here