collections-api's Introduction

⚠️ the museum api has changed a bunch, this needs a rewrite

scrapi, a Metropolitan Museum collections API

scrAPI.org is an API that scrapes object information from the Metropolitan Museum of Art's collections website.

Get a random object (/random)

Try curl scrapi.org/random in a terminal, or just click on /random

$ curl 'scrapi.org/random'
{
  "CRDID": 12351,
  "accessionNumber": "65.211.3",
  ...
}

Object information (/object/:id)

Try curl scrapi.org/object/123 in a terminal, or just click on /object/123

$ curl 'scrapi.org/object/123'
{
  "CRDID": 123,
  "accessionNumber": "64.291.2",
  ...
}
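For scripted access, the two endpoints above can be wrapped in a couple of small helpers. This is a minimal Python sketch, assuming the service is reachable at http://scrapi.org and returns JSON as shown; `object_url` and `fetch_json` are illustrative names, not part of the project.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://scrapi.org"  # assumption: the public host used in the examples above


def object_url(object_id=None):
    """Return the URL for one object, or for /random when no id is given."""
    if object_id is None:
        return f"{BASE_URL}/random"
    return f"{BASE_URL}/object/{int(object_id)}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (this performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(object_url(123))` would then return the same dictionary that the curl example prints, provided the service is up.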

Searching for object ids (/search/:terms)

You can search for terms and get back an array of hrefs to object pages:

$ curl 'scrapi.org/search/mirror'
{
  "collection": {
    "items": [
      {
        "href": "http://scrapi.org/object/156225"
      },
      {
        "href": "http://scrapi.org/object/207785"
      },
      ...
    ]
  }
}

Additional search parameters:

&page=X for additional pages of results

&gallerynos=X for only objects in gallery X
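A search request and its response can be handled along these lines. This Python sketch assumes that page and gallerynos are ordinary query-string parameters and that every href ends in a numeric object id, as in the sample response; `search_url` and `object_ids` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode, urlparse

BASE_URL = "http://scrapi.org"  # assumption: the public host used in the examples above


def search_url(terms, page=None, gallerynos=None):
    """Build a /search/:terms URL, adding only the parameters that were given."""
    params = {}
    if page is not None:
        params["page"] = page
    if gallerynos is not None:
        params["gallerynos"] = gallerynos
    url = f"{BASE_URL}/search/{quote(terms, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


def object_ids(search_response):
    """Extract the numeric object ids from the hrefs of a decoded search response."""
    items = search_response["collection"]["items"]
    return [int(urlparse(item["href"]).path.rsplit("/", 1)[-1]) for item in items]
```

The ids returned by `object_ids` can be fed straight back into /object/:id requests.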

Filtering with the fields parameter

If you want to filter any response, use the fields parameter, like so:

$ curl 'scrapi.org/object/123?fields=title,whoList/who/name'
{
  "whoList": {
    "who": {
      "name": "Richard Wittingham"
    }
  },
  "title": "Andiron"
}

The syntax for selecting fields is loosely based on XPath:

  • a,b,c comma-separated list will select multiple fields
  • a/b/c path will select a field from its parent
  • a(b,c) sub-selection will select many fields from a parent
  • a/*/c the star * wildcard will select all items in a field

I like the following fields for basic object information: fields=title,primaryArtistNameOnly,primaryImageUrl,medium,whatList/what/name,whenList/when/name,whereList/where/name,whoList/who/name
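To make the four selection rules above concrete, here is a small Python sketch that parses a fields string into a nested mask and applies it to a decoded response. It is an illustration written for this README, not the code the server runs; `parse_fields` and `apply_mask` are hypothetical names, and edge cases such as whitespace or malformed input are not handled.

```python
import re


def parse_fields(spec):
    """Turn 'a,b/c,d(e,f)' into a nested dict; an empty dict means 'keep everything below'."""
    tokens = re.findall(r"[^,/()]+|[,/()]", spec)
    pos = 0

    def parse_prop():
        nonlocal pos
        name = tokens[pos]
        pos += 1
        sub = {}
        if pos < len(tokens) and tokens[pos] == "/":
            pos += 1  # a/b: descend into a single child
            child_name, child_sub = parse_prop()
            sub = {child_name: child_sub}
        elif pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # a(b,c): several children of the same parent
            sub = parse_props()
            pos += 1  # skip the closing ")"
        return name, sub

    def parse_props():
        nonlocal pos
        mask = {}
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == ",":
                pos += 1
                continue
            name, sub = parse_prop()
            mask.setdefault(name, {}).update(sub)
        return mask

    return parse_props()


def apply_mask(value, mask):
    """Keep only the parts of value selected by mask; unmatched parts (and None) are dropped."""
    if not mask:
        return value
    if isinstance(value, list):
        kept = [apply_mask(item, mask) for item in value]
        return [item for item in kept if item is not None]
    if not isinstance(value, dict):
        return None
    result = {}
    for key, sub in mask.items():
        names = list(value) if key == "*" else [key]  # "*" matches every key
        for name in names:
            if name in value:
                filtered = apply_mask(value[name], sub)
                if filtered is not None:
                    result[name] = filtered
    return result or None
```

Applying `parse_fields("title,whoList/who/name")` to the full object for /object/123 should leave only the title and the nested name, matching the filtered response shown earlier.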

Guidelines

The code is CC0, but if you do anything interesting with the data, it would be nice to give attribution to The Metropolitan Museum of Art. If you do anything interesting with the code, it would be nice to give attribution to the contributors, or even better, become one!

Please submit all questions, bugs and feature requests to the issue page.

Dedicated to the memory of Aaron Swartz.

Installation and Deployment

The API requires Node.js, uses Redis for caching, and is built on the Koa web framework.

If you already have Node.js installed:

which yarn || npm install -g yarn
yarn
yarn start
open http://127.0.0.1:8080 || xdg-open http://127.0.0.1:8080

If you don't want to set up Node.js, Yarn, and Redis on your local machine, I published a Docker image:

which docker || { sudo apt-get install -y docker.io || brew install --cask docker; }
docker pull jedahan/collections-api
docker run -d -p 8080:8080 --name collections-api jedahan/collections-api
open http://127.0.0.1:8080 || xdg-open http://127.0.0.1:8080
curl localhost:8080/random

You can build the docker image yourself if you want:

which docker || { sudo apt-get install -y docker.io || brew install --cask docker; }
docker build -t jedahan/collections-api:latest .
docker run -d -p 8080:8080 --name collections-api jedahan/collections-api
open http://127.0.0.1:8080 || xdg-open http://127.0.0.1:8080
curl localhost:8080/random

collections-api's People

Contributors

donundeen, josegonzalez, mathisonian, panman

collections-api's Issues

Reserved word "yield"

Trying to run
coffee -c server.coffee

fails with:
/Users/undeed/.Trash/collections-api/server.coffee:18:34: error: reserved word "yield"

The offending line is
cache = ratelimit = -> (next) -> yield next

cache returns objects that should 404

The first hit returns a 404 but still saves an empty object in the cache >_> To reproduce:

ssh scrapi.org
redis-cli
DEL /object/700
exit
exit
http -h scrapi.org/object/700
http -h scrapi.org/object/700
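One way to avoid this class of bug is to write to the cache only after a lookup has succeeded. The sketch below shows the idea in Python, with a plain dict standing in for Redis; it is not the project's CoffeeScript code, and `NotFound`, `cached_lookup`, and the `fetch` callback are hypothetical names.

```python
class NotFound(Exception):
    """Raised when the upstream site has no such object."""


def cached_lookup(cache, key, fetch):
    """Return cache[key], calling fetch(key) on a miss; failures and empty results are never stored."""
    if key in cache:
        return cache[key]
    value = fetch(key)  # if fetch raises NotFound, nothing is cached
    if not value:
        raise NotFound(key)  # an empty object is treated as a miss, not as data
    cache[key] = value
    return value
```

With this shape, a second request for a missing object hits the upstream site again and returns 404 again, instead of being served an empty cached object.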

image missing for some objects

For example, object 1981:
http://www.metmuseum.org/Collections/search-the-collections/1981
has images on the website, but
http://scrapi.org:80/object/1981
has no image attribute (though it does have related-images).

Object 1814, by contrast:
http://www.metmuseum.org/Collections/search-the-collections/1814
has images on the website, and
http://scrapi.org/object/1814
has an image attribute (and also related-images).

And object 5403:
http://www.metmuseum.org/Collections/search-the-collections/5403
has an image but no related images, and
http://scrapi.org/object/5403
has no image.

test using zombie

zombie seems closer to the metal and more robust as it uses webkit

problem in json's structure

When crawling /random to get the JSON response, I see fields like "timelineList" that contain only one "timeline" item. Those lists are returned without the surrounding "[" and "]" (that is, as a single object rather than an array), which only works for a single item. This makes it unclear what the response looks like when the list contains more than one "timeline" item. Please help.
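Until the response shape is settled, a client can protect itself by normalizing such fields before iterating over them. This is a minimal Python sketch; the field names "timelineList" and "timeline" are taken from the report above, and the assumption that multiple items arrive as a JSON array is unverified.

```python
def as_list(value):
    """Return value as a list, whether the API sent one item, several, or nothing."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def timelines(obj):
    """Collect timeline entries from a decoded object response (field names are assumptions)."""
    return as_list((obj.get("timelineList") or {}).get("timeline"))
```

The same `as_list` wrapper applies to any of the other List fields, so client code can always loop over the result.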

add human readable titles to rel links

"_links": {
  "self": {
    "href": "https://api-sandbox.foxycart.com/users/2/stores",
    "title": "This Collection"
  },
  "first": {
    "href": "https://api-sandbox.foxycart.com/users/2/stores?offset=0",
    "title": "First Page of this Collection"
  },
  "prev": {
    "href": "https://api-sandbox.foxycart.com/users/2/stores?offset=0",
    "title": "Previous Page of this Collection"
  },
  "next": {
    "href": "https://api-sandbox.foxycart.com/users/2/stores?offset=0",
    "title": "Next Page of this Collection"
  },
  "last": {
    "href": "https://api-sandbox.foxycart.com/users/2/stores?offset=0",
    "title": "Last Page of this Collection"
  }
}
