
Comments (8)

jerclarke commented on June 27, 2024

@jlicht Not sure if you're still working on this, or who might know the answer, but I'd appreciate it if you had time to take a look at this issue, if only to tell me whether I'm crazy in thinking that the Internet Archive API endpoint is totally broken or not.

🙏🏻

from amber_wordpress.

ryanttb commented on June 27, 2024

Hi Jer, I used to work on Amber but am no longer at BKC. I'm going to include @jsdiaz here to make sure someone internal to the organization sees this. I'm sure they'll appreciate the detailed issue report!


jerclarke commented on June 27, 2024

Thanks Ryan!

If anyone sees this and just knows where I can find docs for the web.archive.org/save/ endpoint from InternetArchiveFetcher that would help a lot.

Some more research shows lots of people expecting that Content-Location header to be there (a Stack Overflow question, and researchers posting about it with examples as recent as July 8, but none after July 10).

But my attempts to replicate their example code on the command line aren't going well. I'm not getting Content-Location, but I'm not getting any 200 responses either, just a variety of 50* errors after very long waits.
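For anyone trying to reproduce this, the check can be sketched in Python roughly as follows. This is a minimal sketch and not Amber's actual code (the plugin is PHP). The helper names `archive_url_from_headers` and `save_via_get` are hypothetical, and it assumes the old behavior described above, where a successful GET to web.archive.org/save/ followed by the target URL answered with a Content-Location header holding the snapshot path.

```python
import urllib.request
from typing import Mapping, Optional

WAYBACK_ROOT = "http://web.archive.org"


def archive_url_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the snapshot URL if a Content-Location header is present."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("content-location")
    if not location:
        return None
    # Assumption: the header normally holds a path such as /web/<timestamp>/<url>;
    # pass absolute URLs through untouched just in case.
    if location.startswith(("http://", "https://")):
        return location
    return WAYBACK_ROOT + location


def save_via_get(url: str, timeout: float = 60.0) -> Optional[str]:
    """GET the old-style /save/ endpoint and look for Content-Location.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so the
    50* errors mentioned above would surface as exceptions here.
    """
    with urllib.request.urlopen(f"{WAYBACK_ROOT}/save/{url}", timeout=timeout) as response:
        return archive_url_from_headers(dict(response.headers.items()))
```

Calling `save_via_get("http://example.com/")` returns `None` whenever the response succeeds but carries no Content-Location header, which matches the symptom reported in this issue.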

Maybe this comes down to some big problems over at IA. I tried tweeting them but didn't get anything back.


jlicht commented on June 27, 2024

@jerclarke I don't see any reason to doubt your analysis of the problem, either with regard to IA's behavior or the bug that you've identified. Thanks for the detailed investigation!


jerclarke commented on June 27, 2024

Alright, an update on the API question.

This post from Oct 2019 seems very relevant: The Wayback Machine’s Save Page Now is New and Improved

It doesn't mention the "endpoint" type usage that Amber and many bookmarklets made of the old web.archive.org/save/ service, but it talks about a big change to the overall "Save Page Now" feature, and this comment implies that to some degree the web.archive.org/save/ GET based service has been broken since the Oct 2019 changes:

I’ve been using a Save Page Now bookmarklet that doesn’t work anymore since this feature was launched. It simply appends the URL:

javascript:(function(){location.href='http://web.archive.org/save/'+(location.href);})();

I looked at the new source and the problem is the form now requires a fancy POST method. Why break what has worked?

Unfortunately I can't find any links anywhere to indicate what the POST method might be, and I'm not sure where the "source" mentioned in the comment could be found.

Another lead is this gist that uses the old endpoint, whose author also seems to think there is a new POST system (not sure if they are basing it on the same comment I found or not).


jerclarke commented on June 27, 2024

I tried just looking at dev tools when using the website version of /save/ and it seems like the POST request is super simple, just url=$url.

When I run that request through PHP (WordPress HTTP API) it seems to work based on the content that's returned, but there's still no Content-Location header:

$result = wp_remote_post('http://web.archive.org/save/', array( 'body' => array('url'=>'http://google.com')));

Headers:

                 [server] => nginx/1.15.8
                 [date] => Thu, 30 Jul 2020 22:48:12 GMT
                 [content-type] => text/html; charset=utf-8
                 [cache-control] => no-cache
                 [x-app-server] => wwwb-app102
                 [x-ts] => 200
                 [x-location] => /save/
                 [x-cache-key] => httpweb.archive.org/save/MX
                 [content-encoding] => gzip

Relevant section of body:

     Saving page http://google.com
                 The capture is estimated to start in 0 minutes.
    
         Save also in my web archive.
           Done!

In the normal "browser" version of the save page, it first shows a progress message like "Saving..." then eventually you get this message:

A snapshot was captured. Visit page: /web/20200730225500/https://www.google.com/

It's possible there's no API way to get the content-location without waiting for that page anymore...
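As a rough illustration of the experiment above, here is a Python sketch (not the WordPress HTTP API call itself) that sends the same `url=` form field and then searches whatever text comes back for a Wayback-style snapshot path. The names `post_save_request` and `find_snapshot_path` are hypothetical. The regex assumes snapshot paths always look like the `/web/20200730225500/https://www.google.com/` example, i.e. `/web/`, a 14-digit timestamp, then the original URL. It can only find a path if the text it is given contains one, which the immediate POST response shown above does not appear to.

```python
import re
import urllib.parse
import urllib.request
from typing import Optional

SAVE_ENDPOINT = "http://web.archive.org/save/"

# Assumed shape of a snapshot path: /web/<14-digit timestamp>/<original URL>.
# The URL part stops at whitespace, quotes, or angle brackets so it can be
# pulled out of HTML as well as plain text.
SNAPSHOT_PATH = re.compile(r"/web/\d{14}/[^\s\"'<>]+")


def find_snapshot_path(text: str) -> Optional[str]:
    """Return the first Wayback-style snapshot path found in text, if any."""
    match = SNAPSHOT_PATH.search(text)
    return match.group(0) if match else None


def post_save_request(url: str, timeout: float = 60.0) -> str:
    """POST url=<url> as form data to /save/ and return the response body."""
    data = urllib.parse.urlencode({"url": url}).encode("ascii")
    request = urllib.request.Request(SAVE_ENDPOINT, data=data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Running `find_snapshot_path(post_save_request("http://google.com"))` would show quickly whether the POST response ever includes the path, or whether it only appears later in the browser's progress page.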


jerclarke commented on June 27, 2024

Alright, I think that was an appropriate amount of effort in the name of archiving the open web.

Hopefully someone someday finds this and gives me a hint at how I can programmatically save a URL to the Wayback Machine and get the path to the archive so I can save it to the amber_cache db.

Until then, sadly, Amber will have to go dormant on Global Voices. RIP sweet Amber, you were a pain in the butt as well as a worthy moonshot.


jerclarke commented on June 27, 2024

Making some notes about things I couldn't find earlier using Google:

"api.archivelab.org" seems to have an API that is relevant, and which was noted as down then "fixed" over the last few days:

ArchiveLabs/api.archivelab.org#21

When I tried it it was still broken though, so 🤷🏻‍♀️

These docs seem to offer an API that will work at least in theory for Amber:

https://archive.readme.io/docs/creating-a-snapshot

The documentation site links to this JSON API for Wayback, which I had seen before, but it doesn't seem to offer an option to save, and thus doesn't seem like it will help Amber. Maybe I'm missing something, though. I'm not super hot at JSON API usage.
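Even without a save option, a read-only JSON API could still help recover the archive path after a save has been requested some other way. The sketch below assumes the JSON API in question is the Wayback availability endpoint at archive.org/wayback/available (an assumption, since the link target isn't shown here), which reports the closest existing snapshot for a URL under `archived_snapshots.closest`. The function names `closest_snapshot_url` and `lookup_snapshot` are hypothetical, and this is Python rather than the plugin's PHP.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Mapping, Optional

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def closest_snapshot_url(payload: Mapping[str, Any]) -> Optional[str]:
    """Pull the closest snapshot URL out of a decoded availability response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    # An empty archived_snapshots object means nothing has been captured yet.
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def lookup_snapshot(url: str, timeout: float = 30.0) -> Optional[str]:
    """Ask the availability endpoint for the closest snapshot of url."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{AVAILABILITY_ENDPOINT}?{query}", timeout=timeout) as response:
        return closest_snapshot_url(json.load(response))
```

One possible use would be to trigger the save separately and then call `lookup_snapshot` after a delay. The snapshot it returns is only the closest one on record, so it may be older than the save that was just requested.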

