Internet Archive API seems to be broken as of Jul 10, 2020: All URLs getting marked as down

jerclarke commented on July 25, 2024

Internet Archive API seems to be broken as of Jul 10, 2020: All URLs getting marked as down

from amber_wordpress.

Comments (8)

jerclarke commented on July 25, 2024

@jlicht Not sure if you're still working on this, or who might know the answer, but I'd appreciate it if you had time to take a look at this issue, if only to tell me whether I'm crazy in thinking that the Internet Archive API endpoint is totally broken or not.

🙏🏻

ryanttb commented on July 25, 2024

Hi Jer, I used to work on Amber but am no longer at BKC. I'm going to include @jsdiaz here to make sure someone internal to the organization sees this. I'm sure they'll appreciate the detailed issue report!

jerclarke commented on July 25, 2024

Thanks Ryan!

If anyone sees this and just knows where I can find docs for the web.archive.org/save/ endpoint from InternetArchiveFetcher that would help a lot.

Some more research shows lots of people expecting that Content-Location header to be there (Stack Overflow Question, researchers posting about it with examples as recent as July 8, but not after July 10)

But my attempts to replicate their example code on the command line aren't going well. I'm not getting Content-Location, and I'm not getting a 200 either, just a variety of 50* errors after long, long waits.
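For anyone trying to reproduce this, here is a rough sketch in Python (not Amber's PHP) of the old GET-style usage and of one way to tell the outcomes described above apart. The function names and the result labels are hypothetical, made up for illustration. The point is only that a 50* response or a missing Content-Location header from the archive does not by itself say the target site is down.

```python
from typing import Mapping

# Endpoint as named in this thread.
SAVE_ENDPOINT = "http://web.archive.org/save/"


def build_save_url(target_url: str) -> str:
    """Old GET-style usage: append the target URL to the save endpoint."""
    return SAVE_ENDPOINT + target_url


def classify_save_response(status: int, headers: Mapping[str, str]) -> str:
    """Label a save response (labels are hypothetical, not Amber's).

    'archived'      -> 200 with a Content-Location header
    'no_location'   -> 200 but no Content-Location header
    'archive_error' -> 5xx from the archive service itself
    'other'         -> anything else
    """
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    if 500 <= status <= 599:
        return "archive_error"
    if status == 200 and lowered.get("content-location"):
        return "archived"
    if status == 200:
        return "no_location"
    return "other"
```

With labels like these, a caller could retry later on "archive_error" or "no_location" instead of recording the URL as down.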

Maybe this comes down to some big problems over at IA. I tried tweeting them but didn't get anything back.

jlicht commented on July 25, 2024

@jerclarke I don't see any reason to doubt your analysis of the problem, either with regard to IA's behavior or the bug that you've identified. Thanks for the detailed investigation!

jerclarke commented on July 25, 2024

Alright, an update on the API question.

This post from Oct 2019 seems very relevant: The Wayback Machine’s Save Page Now is New and Improved

It doesn't mention the "endpoint" type usage that Amber and many bookmarklets made of the old web.archive.org/save/ service, but it talks about a big change to the overall "Save Page Now" feature, and this comment implies that to some degree the web.archive.org/save/ GET based service has been broken since the Oct 2019 changes:

I’ve been using a Save Page Now bookmarklet that doesn’t work anymore since this feature was launched. It simply appends the URL:

javascript:(function(){location.href='http://web.archive.org/save/'+(location.href);})();

I looked at the new source and the problem is the form now requires a fancy POST method. Why break what has worked?

Unfortunately I can't find any links anywhere to indicate what the POST method might be, and I'm not sure where the "source" mentioned in the comment could be found.

Another lead is this gist that uses the old endpoint, whose author also seems to think there is a new POST system (not sure if they are basing it on the same comment I found or not).

jerclarke commented on July 25, 2024

I tried just looking at dev tools when using the website version of /save/ and it seems like the POST request is super simple, just url=$url.

When I run that request through PHP (WordPress HTTP API) it seems to work based on the content that's returned, but there's still no Content-Location header:

$result = wp_remote_post('http://web.archive.org/save/', array( 'body' => array('url'=>'http://google.com')));

Headers:

                 [server] => nginx/1.15.8
                 [date] => Thu, 30 Jul 2020 22:48:12 GMT
                 [content-type] => text/html; charset=utf-8
                 [cache-control] => no-cache
                 [x-app-server] => wwwb-app102
                 [x-ts] => 200
                 [x-location] => /save/
                 [x-cache-key] => httpweb.archive.org/save/MX
                 [content-encoding] => gzip

Relevant section of body:

     Saving page http://google.com
                 The capture is estimated to start in 0 minutes.
    
         Save also in my web archive.
           Done!

In the normal "browser" version of the save page, it first shows a progress message like "Saving..." then eventually you get this message:

A snapshot was captured. Visit page: /web/20200730225500/https://www.google.com/

It's possible there's no API way to get the content-location without waiting for that page anymore...
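As a language-neutral illustration of the above, here is a minimal Python sketch of the same POST (form field url, as seen in dev tools) plus a fallback that looks for a /web/<timestamp>/ path in the body when Content-Location is missing. The function names and the regular expression are my own assumptions, not anything documented by the Internet Archive. Since the "snapshot was captured" message only shows up in the browser after the progress step, the body of a plain POST may never contain it.

```python
import re
from typing import Mapping, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as named in this thread.
SAVE_ENDPOINT = "http://web.archive.org/save/"

# Matches paths like /web/20200730225500/https://www.google.com/
# (14-digit timestamp, then the original URL up to the next whitespace).
SNAPSHOT_PATH = re.compile(r"/web/\d{14}/\S+")


def build_save_request(target_url: str) -> Request:
    """Build (but do not send) the form-encoded POST with a single url field."""
    body = urlencode({"url": target_url}).encode("ascii")
    return Request(SAVE_ENDPOINT, data=body, method="POST")


def find_snapshot_path(headers: Mapping[str, str], body: str) -> Optional[str]:
    """Prefer the Content-Location header, else search the body text."""
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("content-location")
    if location:
        return location
    match = SNAPSHOT_PATH.search(body)
    return match.group(0) if match else None
```

The request can be sent with urllib.request.urlopen(build_save_request(url)). Against raw HTML the \S+ part of the pattern could pick up a trailing quote or tag, so a stricter character class would be safer there.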

jerclarke commented on July 25, 2024

Alright, I think that was an appropriate amount of effort in the name of archiving the open web.

Hopefully someone someday finds this and gives me a hint at how I can programmatically save a URL to Wayback and get the path to the archive so I can save it to the amber_cache db.

Until then, sadly, Amber will have to go dormant on Global Voices. RIP sweet Amber, you were a pain in the butt as well as a worthy moonshot.

jerclarke commented on July 25, 2024

Making some notes about things I couldn't find earlier using Google:

"api.archivelab.org" seems to have an API that is relevant, and which was noted as down then "fixed" over the last few days:

ArchiveLabs/api.archivelab.org#21

When I tried it it was still broken though, so 🤷🏻‍♀️

These docs seem to offer an API that will work at least in theory for Amber:

https://archive.readme.io/docs/creating-a-snapshot

The documentation site links to this JSON API for Wayback which I had seen before, but doesn't seem to offer an option to save, and thus doesn't seem like it will help Amber, but maybe I'm missing something. I'm not super hot at JSON API usage.
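For reference, a minimal Python sketch of how a read-only JSON lookup could still help after a save: ask for the closest existing snapshot of a URL and take its archive address. This assumes the "JSON API for Wayback" is the availability endpoint at archive.org/wayback/available and that its response has the archived_snapshots / closest shape shown in the comments. Both are assumptions on my part, since the link itself is not preserved here, and the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint for the Wayback availability lookup (not confirmed in this thread).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(target_url: str) -> str:
    """Build the lookup URL for the closest snapshot of target_url."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": target_url})


def closest_snapshot_url(response_text: str) -> Optional[str]:
    """Pull the snapshot URL out of a response assumed to look like:

    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}

    Returns None when no snapshot is reported as available.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

This only finds snapshots that already exist, so it would have to be paired with some way of triggering the save, and a freshly saved page may not show up in the lookup right away.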
