Comments (8)
@jlicht Not sure if you're still working on this, or who might know the answer, but I'd appreciate it if you had time to take a look at this issue, if only to tell me whether I'm crazy in thinking that the Internet Archive API endpoint is totally broken or not.
🙏🏻
from amber_wordpress.
Hi Jer, I used to work on Amber but am no longer at BKC. I'm going to include @jsdiaz here to make sure someone internal to the organization sees this. I'm sure they'll appreciate the detailed issue report!
from amber_wordpress.
Thanks Ryan!
If anyone sees this and just knows where I can find docs for the web.archive.org/save/ endpoint from InternetArchiveFetcher, that would help a lot.
Some more research shows lots of people expecting that Content-Location header to be there (Stack Overflow Question, researchers posting about it with examples as recent as July 8, but not after July 10).
But my attempts to replicate their example code on the command line aren't going well. I'm not getting Content-Location, and I'm not getting any 200 either, just a variety of 50* errors after long, long waits.
Maybe this comes down to some big problems over at IA. I tried tweeting them but didn't get anything back.
from amber_wordpress.
@jerclarke I don't see any reason to doubt your analysis of the problem, either with regard to IA's behavior or the bug that you've identified. Thanks for the detailed investigation!
from amber_wordpress.
Alright, an update on the API question.
This post from Oct 2019 seems very relevant: The Wayback Machine’s Save Page Now is New and Improved
It doesn't mention the "endpoint" type usage that Amber and many bookmarklets made of the old web.archive.org/save/ service, but it talks about a big change to the overall "Save Page Now" feature, and this comment implies that to some degree the GET-based web.archive.org/save/ service has been broken since the Oct 2019 changes:
I’ve been using a Save Page Now bookmarklet that doesn’t work anymore since this feature was launched. It simply appends the URL:
javascript:(function(){location.href='http://web.archive.org/save/'+(location.href);})();
I looked at the new source and the problem is the form now requires a fancy POST method. Why break what has worked?
Unfortunately I can't find any links anywhere to indicate what the POST method might be, and I'm not sure where the "source" mentioned in the comment could be found.
Another lead is this gist that uses the old endpoint, whose author also seems to think there is a new POST system (not sure if they are basing it on the same comment I found or not).
from amber_wordpress.
I tried just looking at dev tools when using the website version of /save/ and it seems like the POST request is super simple, just url=$url.
When I run that request through PHP (WordPress HTTP API) it seems to work based on the content that's returned, but there's still no Content-Location header:
$result = wp_remote_post('http://web.archive.org/save/', array( 'body' => array('url'=>'http://google.com')));
Headers:
[server] => nginx/1.15.8
[date] => Thu, 30 Jul 2020 22:48:12 GMT
[content-type] => text/html; charset=utf-8
[cache-control] => no-cache
[x-app-server] => wwwb-app102
[x-ts] => 200
[x-location] => /save/
[x-cache-key] => httpweb.archive.org/save/MX
[content-encoding] => gzip
Relevant section of body:
Saving page http://google.com
The capture is estimated to start in 0 minutes.
Save also in my web archive.
Done!
In the normal "browser" version of the save page, it first shows a progress message like "Saving..." then eventually you get this message:
A snapshot was captured. Visit page: /web/20200730225500/https://www.google.com/
It's possible there's no API way to get the Content-Location without waiting for that page anymore...
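For anyone picking this up later, here is a minimal sketch of the lookup described above: check for a Content-Location header first, then fall back to scanning the body for a /web/<timestamp>/<url> path like the one in the "snapshot was captured" message. The function name findSnapshotPath is hypothetical and not part of Amber, and the path pattern is only inferred from the example quoted in this comment. It only helps if the response itself ever carries that path, which the test above suggests it may not.

```javascript
// Sketch only: look for a snapshot path in a /save/ response.
// The header name and the "/web/<14-digit timestamp>/<url>" shape come from
// this thread; nothing here is a documented Internet Archive contract.

// Hypothetical helper, not part of Amber.
function findSnapshotPath(headers, body) {
  // Header names are case-insensitive, so normalize before looking one up.
  const normalized = {};
  for (const [name, value] of Object.entries(headers || {})) {
    normalized[name.toLowerCase()] = value;
  }
  if (normalized["content-location"]) {
    return normalized["content-location"];
  }
  // Fall back to the first "/web/<timestamp>/<original url>" path in the body.
  const match = /\/web\/\d{14}\/[^\s"'<>]+/.exec(body || "");
  return match ? match[0] : null;
}

// Headers like the ones dumped above: no Content-Location present.
const headers = { server: "nginx/1.15.8", "x-location": "/save/" };

// Progress page only, so nothing to extract.
console.log(findSnapshotPath(headers, "Saving page http://google.com"));

// Final browser message, which does contain the path.
console.log(
  findSnapshotPath(
    headers,
    "A snapshot was captured. Visit page: /web/20200730225500/https://www.google.com/"
  )
);
```

A null result would mean the caller has to find the archive path some other way, for example by checking again later.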
from amber_wordpress.
Alright, I think that was an appropriate amount of effort in the name of archiving the open web.
Hopefully someone someday finds this and gives me a hint at how I can programmatically save a URL to Wayback and get the path to the archive to save it to the amber_cache db.
Until then, sadly, Amber will have to go dormant on Global Voices. RIP sweet Amber, you were a pain in the butt as well as a worthy moonshot.
from amber_wordpress.
Making some notes about things I couldn't find earlier using Google:
"api.archivelab.org" seems to have an API that is relevant, and which was noted as down then "fixed" over the last few days:
ArchiveLabs/api.archivelab.org#21
When I tried it, it was still broken though, so 🤷🏻‍♀️
These docs seem to offer an API that will work at least in theory for Amber:
https://archive.readme.io/docs/creating-a-snapshot
The documentation site links to this JSON API for Wayback, which I had seen before but which doesn't seem to offer an option to save, and thus doesn't seem like it will help Amber, but maybe I'm missing something. I'm not super hot at JSON API usage.
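A rough sketch of how a read-only lookup could still be useful: after a save has been requested some other way, it could be used to find the archive path. This assumes the JSON API mentioned here is the Wayback availability endpoint at archive.org/wayback/available, which returns the closest existing capture for a URL. The helper names are hypothetical, the sample response is made up to show the shape, and whether a fresh capture shows up there quickly enough for Amber is not something this thread confirms.

```javascript
// Sketch only. Assumes the "JSON API for Wayback" is the availability
// endpoint at archive.org/wayback/available, which only reads existing captures.

// Hypothetical helper names; not part of Amber.
function availabilityUrl(targetUrl) {
  return "https://archive.org/wayback/available?url=" + encodeURIComponent(targetUrl);
}

// Pull the closest capture out of a parsed availability response.
function closestSnapshot(response) {
  const snapshots = (response && response.archived_snapshots) || {};
  const closest = snapshots.closest;
  if (!closest || !closest.available) {
    return null;
  }
  return { url: closest.url, timestamp: closest.timestamp };
}

// Made-up response for illustration; not captured from a real request.
const sample = {
  archived_snapshots: {
    closest: {
      available: true,
      url: "http://web.archive.org/web/20200730225500/https://www.google.com/",
      timestamp: "20200730225500",
      status: "200",
    },
  },
};

console.log(availabilityUrl("http://google.com"));
console.log(closestSnapshot(sample));
console.log(closestSnapshot({ archived_snapshots: {} }));
```

The url field returned by closestSnapshot is the kind of value that could be stored in amber_cache, if the lookup turns out to be reliable.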
from amber_wordpress.