
bridgy's Introduction

Bridgy connects your web site to social media. Likes, reposts, mentions, cross-posting, and more. See the user docs for more details, or the developer docs if you want to contribute.

https://brid.gy/

Bridgy is part of the IndieWeb ecosystem. In IndieWeb terminology, Bridgy offers backfeed, POSSE, and webmention support as a service.

License: This project is placed in the public domain. You may also use it under the CC0 License.

Development

Pull requests are welcome! Feel free to ping me in #indieweb-dev with any questions.

First, fork and clone this repo. Then, install the Google Cloud SDK and run gcloud components install cloud-firestore-emulator to install the Firestore emulator. Once you have them, set up your environment by running these commands in the repo root directory:

gcloud config set project brid-gy
python3 -m venv local
source local/bin/activate
pip install -r requirements.txt
# needed to serve static files locally
ln -s local/lib/python3*/site-packages/oauth_dropins/static oauth_dropins_static

Now, you can fire up the gcloud emulator and run the tests:

gcloud emulators firestore start --host-port=:8089 --database-mode=datastore-mode < /dev/null >& /dev/null &
python3 -m unittest discover -s tests -t .
kill %1

If you send a pull request, please include or update a test for your new code!

To run the app locally, use flask run:

gcloud emulators firestore start --host-port=:8089 --database-mode=datastore-mode < /dev/null >& /dev/null &
GAE_ENV=localdev FLASK_ENV=development flask run -p 8080

Open localhost:8080 and you should see the Bridgy home page!

To test a poll or propagate task, find the relevant Would add task line in the logs, eg:

INFO:root:Would add task: projects//locations/us-central1/queues/poll {'app_engine_http_request': {'http_method': 'POST', 'relative_uri': '/_ah/queue/poll', 'app_engine_routing': {'service': 'background'}, 'body': b'source_key=agNhcHByFgsSB1R3aXR0ZXIiCXNjaG5hcmZlZAw&last_polled=1970-01-01-00-00-00', 'headers': {'Content-Type': 'application/x-www-form-urlencoded'}}, 'schedule_time': seconds: 1591176072

...pull out the relative_uri and body, and then put them together in a curl command against localhost:8080 (but don't run it yet!), eg:

curl -d 'source_key=agNhcHByFgsSB1R3aXR0ZXIiCXNjaG5hcmZlZAw&last_polled=1970-01-01-00-00-00' \
  http://localhost:8080/_ah/queue/poll

Then, restart the app with FLASK_APP=background to run the background task processing service, eg:

gcloud emulators firestore start --host-port=:8089 --database-mode=datastore-mode < /dev/null >& /dev/null &
GAE_ENV=localdev FLASK_ENV=development FLASK_APP=background flask run -p 8080

Now, run the curl command you constructed above.

If you hit an error during setup, check out the oauth-dropins Troubleshooting/FAQ section. For searchability, here are a handful of error messages that have solutions there:

bash: ./bin/easy_install: ...bad interpreter: No such file or directory

ImportError: cannot import name certs

ImportError: cannot import name tweepy

File ".../site-packages/tweepy/auth.py", line 68, in _get_request_token
  raise TweepError(e)
TweepError: must be _socket.socket, not socket

error: option --home not recognized

There's a good chance you'll need to make changes to granary or oauth-dropins at the same time as bridgy. To do that, clone their repos elsewhere, then install them in "source" mode with:

pip uninstall -y oauth-dropins
pip install -e <path-to-oauth-dropins-repo>
ln -sf <path-to-oauth-dropins-repo>/oauth_dropins/static oauth_dropins_static

pip uninstall -y granary
pip install -e <path to granary>

To deploy to App Engine, run scripts/deploy.sh.

remote_api_shell is a useful interactive Python shell that can interact with the production app's datastore, memcache, etc. To use it, create a service account, download its JSON credentials, store the file somewhere safe, and point your GOOGLE_APPLICATION_CREDENTIALS environment variable at its path.
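
For example (a minimal sketch; the credentials path is your choice, and the exact remote_api_shell invocation depends on how you installed it):

export GOOGLE_APPLICATION_CREDENTIALS=~/.config/bridgy-service-account.json
remote_api_shell  # assumes it's on your PATH; see its own docs for details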

Deploying to your own App Engine project can be useful for testing, but is not recommended for production. To do it:

  1. Create a project on gcloud console and activate the Tasks API.
  2. Initialize the project on the command line with gcloud config set project <project-name> followed by gcloud app create.
  3. Update TASKS_LOCATION in util.py to match your project's location.
  4. Add your "background" domain (eg background.YOUR-APP-NAME.appspot.com) to OTHER_DOMAINS in util.py, and set host_url in tasks.py to your base app url (eg app-dot-YOUR-APP-NAME.wn.r.appspot.com).
  5. Finally, deploy (after testing) with gcloud -q beta app deploy --no-cache --project YOUR-APP-NAME *.yaml
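
Assembled in order, the command line pieces look like this (YOUR-APP-NAME is a placeholder):

gcloud config set project YOUR-APP-NAME
gcloud app create
# ...edit TASKS_LOCATION and OTHER_DOMAINS in util.py and host_url in tasks.py...
gcloud -q beta app deploy --no-cache --project YOUR-APP-NAME *.yaml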

To work on the browser extension:

cd browser-extension
npm install
npm run test

To run just one test:

npm run test -- -t 'part of test name'

Browser extension: logs in the JavaScript console

If you're working on the browser extension, or you're sending in a bug report for it, its JavaScript console logs are invaluable for debugging. Here's how to get them in Firefox:

  1. Open about:debugging
  2. Click This Firefox on the left
  3. Scroll down to Bridgy
  4. Click Inspect
  5. Click on the Console tab

Here's how to send them in with a bug report:

  1. Right click in the console, choose Export Visible Messages To => File, and save the file.
  2. Email the file to bridgy @ ryanb.org. Do not post or attach it to a GitHub issue, or anywhere else public, because it contains sensitive tokens and cookies.

Browser extension: release

Here's how to cut a new release of the browser extension and publish it to addons.mozilla.org:

  1. ln -fs manifest.firefox.json manifest.json
  2. Load the extension in Firefox (about:debugging). Check that it works.
  3. Bump the version in browser-extension/manifest.json.
  4. Update the Changelog in the README.md section below this one.
  5. Build and sign the artifact:
    cd browser-extension/
    npm test
    ./node_modules/web-ext/bin/web-ext.js build
  6. Submit it to AMO.
    # get API secret from Ryan if you don't have it
    ./node_modules/web-ext/bin/web-ext.js sign --api-key user:14645521:476 --api-secret ...
    
    # If this succeeds, it will say:
    ...
    Your add-on has been submitted for review. It passed validation but could not be automatically signed because this is a listed add-on.
    FAIL
    ...
    It's usually auto-approved within minutes. Check the public listing here.

Here's how to publish it to the Chrome Web Store:

  1. ln -fs manifest.chrome.json manifest.json
  2. Load the extension in Chrome (chrome://extensions/, Developer mode on). Check that it works.
  3. Build and sign the artifact:
    cd browser-extension/
    npm test
    ./node_modules/web-ext/bin/web-ext.js build
  4. Open the console.
  5. Open the Bridgy item.
  6. Choose Package on the left.
  7. Click the Upload new package button.
  8. Upload the new version's zip file from browser-extension/web-ext-artifacts/.
  9. Update the Changelog in the Description box. Leave the rest unchanged.
  10. Click Save draft, then Submit for review.

Browser extension: Changelog

0.7.0, 2024-01-03

  • Remove Instagram. Their anti-bot defenses have led them to suspend a couple people's accounts for using this extension, so we're disabling it out of an abundance of caution. Sorry for the bad news.

0.6.1, 2022-09-18

  • Don't open silo login pages if they're not logged in. This ran at extension startup time, which was mostly harmless in manifest v2 since the background page was persistent and stayed loaded, but in manifest v3 it's a service worker or non-persistent background page, which gets unloaded and then reloaded every 5m.

0.6.0, 2022-09-17

0.5, 2022-07-21

  • Update Instagram scraping.

0.4, 2022-01-30

  • Fix Instagram comments. Add extra client side API fetch, forward to new Bridgy endpoint.
  • Expand error messages in options UI.

0.3.5, 2021-03-04

  • Dynamically adjust polling frequency per silo based on how often we're seeing new comments and reactions, how recent the last successful webmention was, etc.

0.3.4, 2021-02-22

  • Allow individually enabling or disabling Instagram and Facebook.

0.3.3, 2021-02-20

  • Only override requests from the browser extension, not all requests to the silos' domains.

0.3.2, 2021-02-18

  • Fix compatibility with Facebook Container Tabs.

0.3.1, 2021-02-17

  • Add Facebook support!

0.2.1, 2021-01-09

  • Add more details to extensions option page: Instagram login, Bridgy IndieAuth registration, etc.
  • Support Firefox's Facebook Container Tabs addon.

0.2, 2021-01-03

  • Add IndieAuth login on https://brid.gy/ and token handling.
  • Add extension settings page with status info and buttons to login again and poll now.
  • Better error handling.

0.1.5, 2020-12-25

  • Initial beta release!

Adding a new silo

So you want to add a new silo? Maybe MySpace, or Friendster, or even Tinder? Great! Here are the steps to do it. It looks like a lot, but it's not that bad, honest.

  1. Find the silo's API docs and check that it can do what Bridgy needs. At minimum, it should be able to get a user's posts and their comments, likes, and reposts, depending on which of those the silo supports. If you want publish support, it should also be able to create posts, comments, likes, reposts, and/or RSVPs.
  2. Fork and clone this repo.
  3. Create an app (aka client) in the silo's developer console, grab your app's id (aka key) and secret, put them into new local files in the repo root dir, following this pattern. You'll eventually want to send them to @snarfed too, but no hurry.
  4. Add the silo to oauth-dropins if it's not already there:
    1. Add a new .py file for your silo with an auth model and handler classes. Follow the existing examples.
    2. Add a 100 pixel tall button image named [NAME]_2x.png, where [NAME] is your start handler class's NAME constant, eg 'twitter'.
    3. Add it to the app front page and the README.
  5. Add the silo to granary:
    1. Add a new .py file for your silo. Follow the existing examples. At minimum, you'll need to implement get_activities_response and convert your silo's API data to ActivityStreams; see the sketch after this list.
    2. Add a new unit test file and write some tests!
    3. Add it to api.py (specifically Handler.get), app.py, index.html, and the README.
  6. Add the silo to Bridgy:
    1. Add a new .py file for your silo with a model class. Follow the existing examples.
    2. Add it to app.py and handlers.py (just import the module).
    3. Add a 48x48 PNG icon to static/.
    4. Add a new [SILO]_user.html file in templates/ and add the silo to index.html. Follow the existing examples.
    5. Add the silo to about.html and this README.
    6. If users' profile picture URLs can change, add a cron job that updates them to cron.py.
  7. Optionally add publish support:
    1. Implement create and preview_create for the silo in granary.
    2. Add the silo to publish.py: import its module, add it to SOURCES, and update this error message.

Good luck, and happy hacking!

Monitoring

App Engine's built in dashboard and log browser are pretty good for interactive monitoring and debugging.

For alerting, we've set up Google Cloud Monitoring (née Stackdriver). Background in issue 377. It sends alerts by email and SMS when HTTP 4xx responses average >.1qps or 5xx >.05qps, latency averages >15s, or instance count averages >5 over the last 15m window.

Stats

I occasionally generate stats and graphs of usage and growth from the BigQuery dataset (#715). Here's how.

  1. Export the full datastore to Google Cloud Storage. Include all entities except *Auth, Domain and others with credentials or internal details. Check to see if any new kinds have been added since the last time this command was run.

    gcloud datastore export --async gs://brid-gy.appspot.com/stats/ --kinds Activity,Blogger,BlogPost,BlogWebmention,Bluesky,Facebook,FacebookPage,Flickr,GitHub,GooglePlusPage,Instagram,Mastodon,Medium,Meetup,Publish,PublishedPage,Reddit,Response,SyndicatedPost,Tumblr,Twitter,WordPress
    

    Note that --kinds is required. From the export docs: "Data exported without specifying an entity filter cannot be loaded into BigQuery." Also, expect this to cost around $10.

  2. Wait for it to be done with gcloud datastore operations list | grep done or by watching the Datastore Import/Export page.

  3. Import it into BigQuery:

    for kind in Activity BlogPost BlogWebmention Publish SyndicatedPost; do
      bq load --replace --nosync --source_format=DATASTORE_BACKUP datastore.$kind gs://brid-gy.appspot.com/stats/all_namespaces/kind_$kind/all_namespaces_kind_$kind.export_metadata
    done
    
    for kind in Blogger Bluesky Facebook FacebookPage Flickr GitHub GooglePlusPage Instagram Mastodon Medium Meetup Reddit Tumblr Twitter WordPress; do
      bq load --replace --nosync --source_format=DATASTORE_BACKUP sources.$kind gs://brid-gy.appspot.com/stats/all_namespaces/kind_$kind/all_namespaces_kind_$kind.export_metadata
    done
    

Open the Datastore entities page for the Response kind, sorted by updated ascending, and check out the first few rows: https://console.cloud.google.com/datastore/entities;kind=Response;ns=__$DEFAULT$__;sortCol=updated;sortDir=ASCENDING/query/kind?project=brid-gy

Open the existing Response table in BigQuery: https://console.cloud.google.com/bigquery?project=brid-gy&ws=%211m10%211m4%214m3%211sbrid-gy%212sdatastore%213sResponse%211m4%211m3%211sbrid-gy%212sbquxjob_371f97c8_18131ff6e69%213sUS

Update the year in the queries below to three years before this year. Query for the same first few rows sorted by updated ascending, check that they're the same:

SELECT * FROM `brid-gy.datastore.Response`
WHERE updated >= TIMESTAMP('202X-11-01T00:00:00Z')
ORDER BY updated ASC
LIMIT 10

Delete those rows:

DELETE FROM `brid-gy.datastore.Response`
WHERE updated >= TIMESTAMP('202X-11-01T00:00:00Z')

Load the new Response entities into a temporary table:

bq load --replace=false --nosync --source_format=DATASTORE_BACKUP datastore.Response-new gs://brid-gy.appspot.com/stats/all_namespaces/kind_Response/all_namespaces_kind_Response.export_metadata

Append that table to the existing Response table:

SELECT
leased_until,
original_posts,
type,
updated,
error,
sent,
skipped,
unsent,
created,
source,
status,
failed,

ARRAY(
  SELECT STRUCT<`string` string, text string, provided string>(a, null, 'string')
  FROM UNNEST(activities_json) as a
 ) AS activities_json,

IF(urls_to_activity IS NULL, NULL,
   STRUCT<`string` string, text string, provided string>
     (urls_to_activity, null, 'string')) AS urls_to_activity,

IF(response_json IS NULL, NULL,
   STRUCT<`string` string, text string, provided string>
     (response_json, null, 'string')) AS response_json,

ARRAY(
  SELECT STRUCT<`string` string, text string, provided string>(x, null, 'string')
  FROM UNNEST(old_response_jsons) as x
) AS old_response_jsons,

__key__,
__error__,
__has_error__

FROM `brid-gy.datastore.Response-new`

In the BigQuery console, choose More => Query settings, set a destination table for query results (dataset brid-gy.datastore, table Response, Append), check Allow large results, Save, then Run.
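
The same append can also be done from the command line with bq (a sketch, assuming you've saved the query above to append.sql):

bq query --nouse_legacy_sql --destination_table brid-gy:datastore.Response \
  --append_table < append.sql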

Open sources.Facebook, edit schema, add a url field, string, nullable.

  1. Check the jobs with bq ls -j, then wait for them with bq wait.
  2. Run the full stats BigQuery query. Download the results as CSV.
  3. Open the stats spreadsheet. Import the CSV, replacing the data sheet.
  4. Change the underscores in column headings to spaces.
  5. Open each sheet, edit the chart, and extend the data range to include all of the new rows.
  6. Check out the graphs! Save full size images with OS or browser screenshots, thumbnails with the Download Chart button. Then post them!

Final cleanup: delete the temporary Response-new table.
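
For example, with the bq CLI (-f skips the confirmation prompt, -t says the target is a table):

bq rm -f -t datastore.Response-new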

Delete old responses

Bridgy's online datastore only keeps responses for a year or two. I garbage collect (ie delete) older responses manually, generally just once a year when I generate statistics (above). All historical responses are kept in BigQuery for long term storage.

I use the Datastore Bulk Delete Dataflow template with a GQL query like this. (Update the year below to two years before today.)

SELECT * FROM Response WHERE updated < DATETIME('202X-11-01T00:00:00Z')

I either use the interactive web UI or this command line:

gcloud dataflow jobs run 'Delete Response datastore entities over 1y old' \
  --gcs-location gs://dataflow-templates-us-central1/latest/Datastore_to_Datastore_Delete \
  --region us-central1 \
  --staging-location gs://brid-gy.appspot.com/tmp-datastore-delete \
  --parameters datastoreReadGqlQuery="SELECT * FROM \`Response\` WHERE updated < DATETIME('202X-11-01T00:00:00Z'),datastoreReadProjectId=brid-gy,datastoreDeleteProjectId=brid-gy"

Expect this to take at least a day or so.

Once it's done, update the stats constants in admin.py.

Misc

The datastore is exported to BigQuery (#715) twice a year.

We use this command to set a Cloud Storage lifecycle policy on our buckets to prune older backups and other files:

gsutil lifecycle set cloud_storage_lifecycle.json gs://brid-gy.appspot.com
gsutil lifecycle set cloud_storage_lifecycle.json gs://brid-gy_cloudbuild
gsutil lifecycle set cloud_storage_lifecycle.json gs://staging.brid-gy.appspot.com
gsutil lifecycle set cloud_storage_lifecycle.json gs://us.artifacts.brid-gy.appspot.com
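
cloud_storage_lifecycle.json lives in the repo; a GCS lifecycle config has this general shape (the 90 day age here is illustrative, not necessarily the repo's actual setting):

{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 90}}
  ]
}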

See how much space we're currently using in this dashboard. Run this to download a single complete backup:

gsutil -m cp -r gs://brid-gy.appspot.com/weekly/datastore_backup_full_YYYY_MM_DD_\* .

bridgy's People

Contributors

aaronpk, dependabot[bot], edent, fhemberger, imgbotapp, jamietanna, jgmac1106, joelotter, jonnybarnes, jpcaruana, kevincox, kylewm, mblaney, singpolyma, sknebel, snarfed, snyk-bot, stedn, tantek, valpackett

bridgy's Issues

drop propagate tasks for deleted sources

e.g.:

  File "/base/data/home/apps/s~brid-gy/2.372909736058875896/tasks.py", line 199, in post
    response.source.kind(), response.key().name())
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 3720, in __get__
    reference_id.to_path())
ReferencePropertyResolveError: ReferenceProperty failed to be resolved: 

unshorten target URLs by following redirects before sending webmentions

it looks like pretty much all the webmention servers we've hit so far accept shortened target URLs that are different than the u-in-reply-to URL in the source markup, but we shouldn't depend on it. we should probably unshorten them ourselves first.

example of sending a shortened target URL (it happened to succeed):

2014-01-01 01:59:55.388259 D Params: UnicodeMultiDict([(u'response_key', u'aglzfmJyaWQtZ3lyNQsSCFJlc3BvbnNlIid0YWc6dHdpdHRlci5jb20sMjAxNDo0MTc3MTM5MDQ4ODY2MTE5NjgM')])
...
2014-01-01 02:00:06.735980 I Sent! {'body': u'<!DOCTYPE html>\n<html>\n  <head>\n    <title>WebMention</title>\n  </head>\n  <body>\n      <p>WebMention was successful</p>\n  </body>\n</html>', 'request': u'POST http://barryfrost.com/webmention (with source=http://www.brid.gy/comment/twitter/barryf/417676306151911424/417713904886611968, target=http://bfr.st/76)', 'http_status': 202}

here's the code to do it:

import logging

from google.appengine.api import urlfetch

# may be a shortened link. try following redirects.
# (could use a service like http://unshort.me/api.html instead,
# but not sure it'd buy us anything.)
try:
  resolved = urlfetch.fetch(expanded_url, method='HEAD',
                            follow_redirects=True, deadline=999)
  if getattr(resolved, 'final_url', None):
    logging.debug('Resolved short url %s to %s', expanded_url,
                  resolved.final_url)
    expanded_url = resolved.final_url
except urlfetch.DownloadError as e:
  logging.error("Couldn't resolve URL: %s", e)

don't ever let poll immediately retry

...we should just let it fail and re-insert the task with the same delay. no sense in thrashing on failures when an immediate retry usually won't help.
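
a rough sketch of the idea (all names here are hypothetical; the real poll task handler is more involved):

import datetime
import logging

from google.appengine.api import taskqueue

POLL_DELAY = datetime.timedelta(minutes=30)  # hypothetical poll frequency


def poll(source):
  """Hypothetical stand-in for the actual polling work."""


def poll_task(source):
  try:
    poll(source)
  except Exception:
    # swallow the failure so the task queue doesn't retry immediately...
    logging.exception('poll failed; retrying at the normal cadence')
  finally:
    # ...and re-insert the next poll task with the same delay either way
    taskqueue.add(url='/_ah/queue/poll',
                  params={'source_key': source.key.urlsafe()},
                  countdown=POLL_DELAY.total_seconds())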

new front page design, part 2

#19 was a good first step, and we're ok for now, but we'll eventually need another redesign. showing all users on the front page won't scale. :P

honestly, I have no idea what to do. I have plenty of ideas, but I don't like any of them. I'm kind of dreading this.

all bridgy posts should have explicit p-name inside their h-entry

Currently bridgy posts typically have an h-entry without any explicit p-name. Thus by the microformats2 implied p-name rules, the entirety of the h-entry's inner text becomes its p-name which results in unexpected / undesirable UI effects, e.g. see backfed mentions on this event post: http://aaronparecki.com/events/2014/01/29/1/homebrew-website-club

All bridgy posts with h-entry (which is all of them I think) should have an explicit p-name that makes sense for that post, that a receiver could use to display a brief text summary of the post.
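
For example, roughly (sketch markup, not bridgy's actual output):

<article class="h-entry">
  <span class="p-name">likes this post</span>
  <a class="u-like-of" href="http://example.com/post"></a>
</article>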

animate front page transitions

specifically, for opening and closing recent responses and the "more" div. i have a stash that does this with a CSS transition on max-height, which works but isn't great because you have to pick an arbitrary max-height, which won't match height: auto, so the timing is off. i also tried a transition on the background color when changing "Recent responses..." to "No responses yet," but no luck. @keyframes might work, but meh.

fix module import paths

most of the modules are imported from absolute paths, e.g. from activitystreams.oauth_dropins.python_instagram import bind, but some of the third party libraries, like python-instagram, expect that modules are installed and import them directly, e.g. import httplib2. this ends up with modules duplicated under different paths, which means that sometimes i can't import them from the same path that they're used from. example:

https://github.com/snarfed/activitystreams-unofficial/blob/ecf78d88587e778e4d0702641af0a781f16f639c/googleplus.py#L85

one solution is to import them from where they're expected, e.g. from apiclient above instead of from oauth_dropins.apiclient. another is to pre-fill sys.modules under the second path, e.g. sys.modules['httplib2'] = sys.modules['oauth_dropins.httplib2'].

bleh. fun with programming languages!

handle oauth declines

for both signup and delete. we currently 400, and don't even try to render something reasonable, which is pretty embarrassing.

cache served microformats2 responses

...with a short expiration. ideally include the cache expiration in the headers and/or content itself.

alternatively, serve them from the stored response JSON in the datastore. drawback is that we don't get updated content.
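
a minimal sketch of the short-expiration idea in Flask (the route and helper here are hypothetical, not bridgy's actual serving code):

from flask import Flask, jsonify

app = Flask(__name__)


def load_stored_response(silo, id):
  """Hypothetical stand-in for rendering a served microformats2 response."""
  return {'items': []}


@app.route('/response/<silo>/<id>')
def serve_response(silo, id):
  resp = jsonify(load_stored_response(silo, id))
  # short expiration, included in the response headers so clients see it too
  resp.cache_control.public = True
  resp.cache_control.max_age = 300
  return resp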

store oauth refresh tokens

good hygiene, and would speed up polling, since e.g. G+ has to refresh the token on every single API call right now, and sometimes it gets into long loops and fails entirely.

update profile pictures when they change

when an account is added, we store its profile picture URL in the datastore, but we never try to update it. facebook uses simple username-based URLs that automatically return the newest picture, but twitter doesn't, so its picture URLs start 404ing when a user changes to a new one. (G+ and instagram's picture URLs also point to specific pictures, but it's unclear whether they'll eventually 404 too, or live forever.)

example: https://www.brid.gy/#twitter-djp1974 , https://twitter.com/djp1974 . updating it manually for now.

hitting Google+ API quota

we currently hit our daily quota for Google+ API calls at around 6:30-7pm or so. :/ we're currently limited to the courtesy limit of 10k requests per day, and the G+ API doesn't yet support billing, so the only option is to request more quota.

https://code.google.com/apis/console/b/0/?noredirect&pli=1#project:1029605954231:quotas
https://support.google.com/plus/contact/request_quota?id=1029605954231

example log: https://www.brid.gy/log?start_time=1388982947&key=aglzfmJyaWQtZ3lyKQsSDkdvb2dsZVBsdXNQYWdlIhUxMDM2NTEyMzE2MzQwMTgxNTg3NDYM

HttpError 403 when requesting https://www.googleapis.com/plus/v1/people/me/activities/public?alt=json&maxResults=20 returned "Daily Limit Exceeded"

new front page design

fetching every single user and their recent responses won't last for long. :P either ajax in the recent responses on demand, add paging, or switch to global recent responses and individual user pages.

port to ndb

won't help front page latency much because queries aren't cached, but i think it's a pretty painless port, and it will help gets, which tasks and handlers do.

Invalid HTML output for microformats HTML

  • a <title> is missing; it's required, not optional.
  • the dt-updated datetime is empty; either drop the tag or fill in a date string.
  • <img> must have an alt attribute, even if it's empty.
  • <a> tags are non-void, so you can't self-close them like <a />; you need <a></a>.
  • the document should declare a character encoding; just add a <meta charset="utf-8"> in there and you're good.

You can check it here: http://validator.w3.org/check?uri=https%3A%2F%2Fbrid-gy.appspot.com%2Flike%2Ftwitter%2Fjeena%2F424554756917702656%2F109427493&charset=%28detect+automatically%29&doctype=Inline&group=0

handle bad URLs found in original post discovery

we currently choke in tasks.in_webmention_blacklist(). i can easily catch that, but i'm not sure what to do with the bad URL. drop it? say it doesn't support webmention?

  File "/base/data/home/apps/s~brid-gy/2.372832571667703411/tasks.py", line 111, in post
    self.do_post(source)
  File "/base/data/home/apps/s~brid-gy/2.372832571667703411/tasks.py", line 140, in do_post
    targets = get_webmention_targets(activity)
  File "/base/data/home/apps/s~brid-gy/2.372832571667703411/tasks.py", line 71, in get_webmention_targets
    not in_webmention_blacklist(url)):
  File "/base/data/home/apps/s~brid-gy/2.372832571667703411/tasks.py", line 78, in in_webmention_blacklist
    domain = urlparse.urlparse(url).netloc
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/urlparse.py", line 142, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/urlparse.py", line 190, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL

improve and handle httplib2.HttpException

httplib2 swallows some common errors, notably app engine's urlfetch deadline, and raises a bare httplib.HTTPException instead with no details. :/ see https://github.com/jcgregorio/httplib2/blob/master/python2/httplib2/__init__.py#L1119

i should patch it (hi @jcgregorio!) to include details, maybe as subclasses of the proper type based on HTTP status code, then handle those in the poll task.

example stacktraces:

Traceback (most recent call last):
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/activitystreams/httplib2/__init__.py", line 1108, in request
    validate_certificate=self.validate_certificate)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/urlfetch.py", line 270, in fetch
    return rpc.get_result()
...
DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL: https://...

Traceback (most recent call last):
...
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/tasks.py", line 116, in do_post
    etag=source.last_activities_etag, min_id=source.last_activity_id)
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/instagram.py", line 63, in get_activities_response
    return self.as_source.get_activities_response(*args, group_id=SELF, **kwargs)
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/activitystreams/instagram.py", line 115, in get_activities_response
    media, _ = self.api.user_recent_media(user_id, **kwargs)
...
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/activitystreams/httplib2/__init__.py", line 1585, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/activitystreams/httplib2/__init__.py", line 1333, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/activitystreams/httplib2/__init__.py", line 1303, in _conn_request
    response = conn.getresponse()
  File "/base/data/home/apps/s~brid-gy/2.373361167601449605/activitystreams/httplib2/__init__.py", line 1126, in getresponse
    raise httplib.HTTPException()
HTTPException

port to ronkyuu?

vrypan/webmention-tools has worked well enough so far, but it's not really actively maintained. i've been the only one working on it for a while now.

on the other hand, @bear and @kartikprabhu are actively working on https://github.com/bear/ronkyuu . it might be worth looking at using that instead.

not a pain point right now, but maybe eventually!

unicode!

i've spent approximately zero effort on this so far. :/ go back and do it right!

...as a first pass, it might be enough to centralize quote_plus(), etc into util.add_query_params.

some facebook response HTML missing an in-reply-to link for the webmention target

examples:

Error sending to endpoint: {u'error_description': u'The source URI does not contain a link to the target URI', 'code': 'RECEIVER_ERROR', 'request': u'POST http://webmention.io/tantek.com/webmention (with source=https://brid-gy.appspot.com/like/facebook/214611/10100896813096163/4001907, target=http://ttk.me/t4Tt1)', 'http_status': 400, u'error': u'no_link_found'}

Error sending to endpoint: {u'error_description': u'The source URI does not contain a link to the target URI', 'code': 'RECEIVER_ERROR', 'request': u'POST http://webmention.io/tantek.com/webmention (with source=https://brid-gy.appspot.com/comment/facebook/214611/10100895437243383/10100895437243383_7659064, target=http://ttk.me/t4Ts3)', 'http_status': 400, u'error': u'no_link_found'}

the second one is here, but it's hard to tell from the HTML.
https://www.facebook.com/tantek.celik/posts/10100895437243383?comment_id=7665356&offset=0&total_comments=5

should we send webmentions for event invites?

bridgy currently sends webmentions for FB event invitees as well as explicit rsvps. details in #42. should it not? what's the right thing to do in general?

the webmention sources contain u-in-reply-to but not p-rsvp. here's simplified markup from this example:

<article class="h-entry">
...
  <div class="h-card p-author">...</div>

  <div class="e-content">
  </div>

  <a class="u-in-reply-to" href="http://werd.io/2014/homebrew-website-club"></a>
  <a class="u-in-reply-to" href="http://indiewebcamp.com/events/2014-01-29-homebrew-website-club"></a>
</article>

cc @tantek, @aaronpk, @benwerd. feel free to add anyone else.

related: http://indiewebcamp.com/rsvp , http://indiewebcamp.com/event

allow connecting accounts

e.g. for the same person's accounts on each silo. could use indieauth to also let them attach their domain, but not strictly necessary.

handle events and backfeed rsvps

only for FB and G+ right now, since they're the only ones that do events.

mostly just consists of adding event and rsvp support to activitystreams-unofficial. after that, bridgy should automatically do original post discovery, send webmentions for rsvps, and render the rsvps as mf2 without any additional bridgy-specific code.

thanks to @aaronpk and @tantek for the idea!

Avoid empty HTML links inside e-content

Do the u-like links have to be inside the e-content? I think it would be nicer to move them one node higher. Then they would still be inside the h-entry, but they wouldn't be parsed as a "comment" on the mentioned side.

This one for example:

<a class="u-like u-like-of" href="http://twitter.com/pfefferle/status/423744359297585152"> </a>
