avanavana / esovdb-api

This project forked from daniloc/airtable-api-proxy

Public API cache proxy built on the Earth Science Online Video Database, an Airtable base, which also syncs to Zotero and broadcasts new submissions to Discord, Twitter, etc.

Home Page: http://www.esovdb.org

JavaScript 99.43% Shell 0.57%
airtable airtable-api cache cache-proxy discord-webhook-api earth-science earth-science-online-video-database earth-sciences esovdb express geology node proxy-server redis twitter-api-v2 webhook zotero zotero-api

esovdb-api's People

Contributors

avanavana, daniloc

esovdb-api's Issues

Webhooks API, with CRUD for public webhooks

Get all webhooks

GET /webhooks
Lists all webhook subscriptions registered with the provided callback URL.

URL query params:

  • url - the callback URL to query for associated webhook subscriptions

Sample response: 200 OK (req.query.url=http%3A%2F%2Fcallback%2Furl)

[
    "example.event2",
    "example.event3"
]

Sample response: 200 OK

[
    "example.event",
    "example.event2",
    "example.event3",
    "example.event4",
    "example.event5"
]
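
As a minimal sketch of how a client might build this request, the helper below percent-encodes the callback URL into the url query parameter. The function name and the base URL are assumptions made for illustration; they are not part of the API described above.

```javascript
// Builds the request URL for GET /webhooks, optionally filtered by callback URL.
// NOTE: the base URL passed in is a placeholder, not a documented value.
function buildWebhooksUrl(baseUrl, callbackUrl) {
  const url = new URL('/webhooks', baseUrl);
  if (callbackUrl) {
    // URLSearchParams percent-encodes the callback URL (":" and "/" included).
    url.searchParams.set('url', callbackUrl);
  }
  return url.toString();
}

console.log(buildWebhooksUrl('https://api.example.com', 'http://callback/url'));
```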

Create webhook(s)

POST /webhooks
Adds one or more new webhook subscriptions, associated with the provided callback URL, to each of the provided webhook events.

Request body:

{
    "events": ["example.event"],
    "callback": "https://www.example.com/callback"
}

Sample response: 200 OK

{
    "added":  ["example.event"],
    "unchanged": [],
    "failed": []
}

Update webhook(s)

PUT /webhooks
Adds specified webhook events to the provided callback URL's subscription, and also removes any existing webhook event subscriptions not included in the request body.

Request body:

{
    "events": ["example.event2", "example.event3", "example.event4"],
    "callback": "https://www.example.com/callback"
}

Sample response: 200 OK

{
    "added": ["example.event3", "example.event4"],
    "removed": ["example.event"],
    "unchanged": ["example.event2"],
    "failed": []
}
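
The replace semantics of PUT can be sketched as a set difference between a callback's existing subscriptions and the requested ones. This is only an illustration: diffSubscriptions and its inputs are hypothetical names, and the server's actual implementation may differ.

```javascript
// Computes how a PUT-style replacement changes a callback's subscriptions.
// `existing` and `requested` are arrays of event names (hypothetical inputs).
function diffSubscriptions(existing, requested) {
  const before = new Set(existing);
  const after = new Set(requested);
  return {
    // Requested events the callback was not yet subscribed to.
    added: [...after].filter((event) => !before.has(event)),
    // Existing subscriptions that the request left out.
    removed: [...before].filter((event) => !after.has(event)),
    // Events present both before and after.
    unchanged: [...after].filter((event) => before.has(event)),
  };
}

console.log(diffSubscriptions(['a', 'b'], ['b', 'c', 'd']));
```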

Delete webhook(s)

DELETE /webhooks
Removes one or more webhook subscriptions, associated with the provided callback URL, from each of the provided webhook events.

Request body:

{
    "events": ["example.event"],
    "callback": "https://www.example.com/callback"
}

Sample response: 200 OK

{
    "removed":  ["example.event"],
    "unchanged": [],
    "failed": []
}

Upgrade channel import script to get YouTube video durations

Unfortunately, YouTube makes it difficult (read: impossible) to get video durations in a search API call. Therefore, once the videos have been retrieved, their IDs must be sent in batches (max 50 per request, looping through groups of 50 or fewer if the total is greater) to the YouTube videos API, from which the contentDetails.duration value can be retrieved and appended to each video result. The raw duration retrieved from YouTube is in ISO 8601 duration format and must be converted into seconds before being inserted into an Airtable duration field. This utility function for converting to seconds has already been written.
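
For illustration only, the two steps described above (grouping IDs into batches of at most 50, and converting an ISO 8601 duration into seconds) could look like the sketch below. This is not the existing utility function mentioned above; isoDurationToSeconds and chunk are hypothetical names, and the parser only handles days, hours, minutes, and seconds.

```javascript
// Converts an ISO 8601 duration such as "PT1H2M3S" into whole seconds.
// Handles days, hours, minutes, and seconds only (no years, months, or weeks).
function isoDurationToSeconds(duration) {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  const [days, hours, minutes, seconds] = match.slice(1).map((part) => Number(part || 0));
  return days * 86400 + hours * 3600 + minutes * 60 + seconds;
}

// Splits a list of video IDs into groups of at most `size` items.
function chunk(items, size = 50) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

console.log(isoDurationToSeconds('PT4M13S'));
```

Each group returned by chunk would correspond to one videos API request.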

Refactor videos/query endpoint to allow for more params, filterByFormula, etc.

Params:

  • :pg
  • maxRecords
  • pageSize
  • modifiedAfter
  • createdAfter

Future params:

  • search (in fields[])
  • responseFields (choose the fields returned to the API consumer, per #29 )
  • filterByFormat
  • filterByMedium
  • filterByTopic (includes, excludes)
  • filterByTag (includes, excludes)
  • filterBySeries? (seems like it may be best used with search, along with publisher and presenter)
  • filterByYear (is, is not, before, after)
  • filterByDuration (shorter, longer)
  • filterByGeoLocated w/ GeoJSON? (see #29)
  • combine multiple filters + searches?
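
As a rough sketch of how the modifiedAfter and createdAfter params might be translated into an Airtable filterByFormula string, the helper below combines Airtable's IS_AFTER, LAST_MODIFIED_TIME, CREATED_TIME, DATETIME_PARSE, and AND formula functions. The function name, the date validation, and the exact formula shape are assumptions, not the endpoint's actual implementation.

```javascript
// Builds an Airtable filterByFormula string from optional date params.
// Assumes ISO-style date strings; anything else is rejected to keep
// untrusted input out of the formula.
function buildFilterFormula({ modifiedAfter, createdAfter } = {}) {
  const isoDate = /^\d{4}-\d{2}-\d{2}(T[\d:.]+Z?)?$/;
  const clauses = [];
  for (const [value, field] of [
    [modifiedAfter, 'LAST_MODIFIED_TIME()'],
    [createdAfter, 'CREATED_TIME()'],
  ]) {
    if (value === undefined) continue;
    if (!isoDate.test(value)) throw new Error(`Invalid date: ${value}`);
    clauses.push(`IS_AFTER(${field}, DATETIME_PARSE('${value}'))`);
  }
  if (clauses.length === 0) return '';
  // A single clause needs no AND() wrapper.
  return clauses.length === 1 ? clauses[0] : `AND(${clauses.join(', ')})`;
}

console.log(buildFilterFormula({ modifiedAfter: '2021-01-01', createdAfter: '2020-06-01' }));
```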

Tweet new submissions using the @esovdb Twitter account

For individual submissions, a single tweet with a link should be generated.

For batch submissions, the number of items in the batch should be announced in the main message, and then a random item should be selected from the batch, and information about that item, including a link, should be used to generate the tweet.

Tags and topics should turn into hashtags.
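
A minimal sketch of the text generation described above, assuming each item has a title, url, and tags array (a hypothetical shape). The wording of the messages is invented for illustration, and posting to Twitter is left out entirely.

```javascript
// Turns a tag or topic label into a hashtag by capitalizing each word and
// dropping spaces and punctuation.
function toHashtag(label) {
  const words = label.split(/[^A-Za-z0-9]+/).filter(Boolean);
  return '#' + words.map((word) => word[0].toUpperCase() + word.slice(1)).join('');
}

// Composes tweet text for a single item or a batch.
// For a batch, one randomly chosen item is featured.
// `random` is injectable so the choice can be made deterministic in tests.
function composeTweet(items, random = Math.random) {
  const featured = items[Math.floor(random() * items.length)];
  const hashtags = featured.tags.map(toHashtag).join(' ');
  const intro = items.length > 1
    ? `${items.length} new videos added to the ESOVDB, including "${featured.title}"`
    : `New on the ESOVDB: "${featured.title}"`;
  return `${intro} ${featured.url} ${hashtags}`.trim();
}

console.log(toHashtag('plate tectonics'));
```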

ESOVDB Homepage

Features:

  • What's new (pull using API)
  • DB link
  • DB explainer video
  • Future plans
  • Donations
  • Social links (Twitter, Discord)
  • Description/about/credits/team

Featured videos in social media posts

Rather than just picking random videos from a batch, add the ability to choose a video in the batch to feature before the batch is sent for processing.

The featured video will then be the one shown on social media posts to Twitter and the ESOVDB Discord.

Write tool to scrape Airtable metadata for tag categories, etc and open-source separately

Related to #39 .

To make the categories sync as well, either gain access to the Airtable Metadata API (not going to happen, since they've stopped onboarding new teams), or write code that scrapes the ESOVDB Airtable API docs and syncs the tag categories with Zotero, creating collections within the Tags parent collection and returning their Zotero Keys and Versions to the ESOVDB API, stored either in a new table on the Airtable (not preferable) or in a JSON data file. This should all happen on command, so set it up as a command-line tool.

Since Airtable is no longer onboarding new users on the Metadata API, all that can be done is to scrape the API documentation page, which contains the same data, but annoyingly distributed throughout rendered HTML. The idea is to create a command-line tool that will take an Airtable Base ID, email, and password, and spit out a JSON file for all the metadata in that Airtable base.

Schema:

{
    "name": "baseName",
    "id": "baseId",
    "apiBaseURL": "https://api.airtable.com/v0/baseId",
    "tables": [
        {
            "name": "tableName",
            "fields": [
                {
                    "name": "fieldName",
                    "fieldType": "airtableFieldType",
                    "type": "dataType",
                    "description": "airtableDescription",
                    "examples": [
                        {
                            "type": "text",
                            "value": "exampleText"
                        },
                        {
                            "type": "array",
                            "value": [
                                "arrayItem",
                                ...
                            ]
                        }
                    ]
                },
                ...
            ]
        },
        ...
    ]
}

This can be open-sourced and distributed separately from the ESOVDB to others who might find it useful, given the moratorium on the Airtable Metadata API.

The data will be scraped using puppeteer.js and cheerio.js, with puppeteer in stealth mode.

The JSON file that this script writes out can be parsed and used, for instance, to dynamically list tag categories in #40. The script can run on a schedule with crontab, alongside the various ESOVDB sync functions, or manually from the command line.

Support for multiple formats of response data

Currently all data from Airtable are sent to the user as JSON, formatted using names, format, and structure designed for Zotero, since originally the purpose of this proxy server was to download and sync the Airtable data with Zotero alone.

This format is far less useful to others, so I would like to create a format query parameter that uses a different default JSON structure, something closer to the data in Airtable itself, something more like what the future ESOVDB front-end website will use, and provide a few other formats.

Formats

  • Default (Airtable JSON)
  • Basic (simplified)
  • YouTube (id, recordId, esovdbId, zoteroKey, added)
  • Zotero (what I already have)
  • GeoJSON (should filter any records lacking geospatial data)
  • CSV
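
To illustrate the GeoJSON format listed above, here is a small sketch that drops records lacking geospatial data and wraps the rest in a FeatureCollection. The latitude and longitude field names are assumptions; the actual ESOVDB field names are not given here.

```javascript
// Converts flat records into a GeoJSON FeatureCollection, skipping records
// without usable coordinates. Field names are hypothetical.
function toGeoJSON(records) {
  const features = records
    .filter((r) => Number.isFinite(r.latitude) && Number.isFinite(r.longitude))
    .map(({ latitude, longitude, ...properties }) => ({
      type: 'Feature',
      // GeoJSON positions are ordered [longitude, latitude].
      geometry: { type: 'Point', coordinates: [longitude, latitude] },
      properties,
    }));
  return { type: 'FeatureCollection', features };
}

console.log(JSON.stringify(toGeoJSON([
  { title: 'Located', latitude: 46.2, longitude: -122.18 },
  { title: 'No location' },
])));
```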

Set up a service to cache the entire DB once every day, to send to a free public API endpoint.

See this issue: https://github.com/avanavana/airtable-api-proxy/projects/1#card-74236967

Along with providing a premium, paid, bleeding-edge endpoint for the current-to-the-millisecond version of the ESOVDB catalog, I am going to create a service or cron job that retrieves and caches the whole catalog once per day, and offer this data to the public for free, since it requires far less blocking activity on the server.

Set up API on RapidAPI to handle public access.

See: https://github.com/avanavana/airtable-api-proxy/projects/1#card-74236967 and https://github.com/avanavana/airtable-api-proxy/projects/1#card-74236962

I will be using RapidAPI to authenticate and monitor API requests going forward, as well as offer some premium, paid endpoints, to help offset the costs of maintaining and hosting the ESOVDB. The project is still a 'non-profit', in that I am not attempting to make any money from this, just enough to cover the costs of maintaining and running it.

Chrome Extension for Curators/Editors that displays whether or not a YouTube video currently being viewed is in the ESOVDB

This will use #26 (the new https://api.esovdb.org/v1/videos/youtube/:id endpoint). A Chrome extension merely has to detect that the current page is a YouTube video page, grab the video ID, and send a request to that endpoint. If the request is unsuccessful, it should display "no"; otherwise, it should return a handful of details about the video (Zotero key, ESOVDB ID, and added date), use the IDs/keys to create links that open in Zotero or the ESOVDB, and show a noticeable indicator icon next to or over the video, like a banner.

The content script should show a 'submit' button that allows easy adding to the ESOVDB submissions table, if the video hasn't already been added.
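
The detection step could be sketched as follows. Both helper names are hypothetical, and only standard watch URLs and youtu.be short links are handled; other YouTube URL shapes are ignored in this sketch.

```javascript
// Returns the video ID from a YouTube watch URL or youtu.be short link,
// or null if the URL is not a recognized video page.
function extractYouTubeId(href) {
  let url;
  try {
    url = new URL(href);
  } catch {
    return null; // not a parseable URL
  }
  const host = url.hostname.replace(/^www\./, '');
  if (host === 'youtu.be') return url.pathname.slice(1) || null;
  if (host === 'youtube.com' && url.pathname === '/watch') return url.searchParams.get('v');
  return null;
}

// Builds the lookup URL for the endpoint named in the text above.
function lookupUrl(videoId) {
  return `https://api.esovdb.org/v1/videos/youtube/${encodeURIComponent(videoId)}`;
}

console.log(lookupUrl(extractYouTubeId('https://youtu.be/abc123')));
```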

Implement donation feature using the Stripe API

The ESOVDB is a totally not-for-profit project, yet it requires a certain amount of budget to operate and maintain (domain, hosting the server VMs, Airtable pro plan, etc). Several people have inquired about how to help fund the ESOVDB, and up until now there has been no way to donate except for directly sending @avanavana cash.

It would be ideal to have a dedicated Stripe API-based solution or form that legitimizes and simplifies the donation process, and perhaps even spells out concrete ways to help (e.g. "$20 – Cover 1 month of operations"...).

Write onboarding manual

  • ESOVDB mission, values, scope
  • Editor job description and responsibilities
  • How the ESOVDB is organized
  • The ESOVDB API layer
  • The ESOVDB network
  • Automation
  • Reviewing submissions
  • Creating new submissions
  • How to find new content
  • Technical description of what happens when a submission is processed

Post new submissions to the ESOVDB on the ESOVDB Discord in the #whats-new channel

For individual submissions, a rich message with embeds should be generated.

For batch submissions, the number of items in the batch should be announced in the main message, and then a random item should be selected from the batch and a rich message with embeds for that item be generated.

Tags should be encoded to match the ESOVDB Discord tag channels, to turn them into clickable links.

Add Airtable script block that lets an editor paste in a YouTube channel URL or ID and automatically import all its videos

Additionally, editors should be able to choose a videoDuration as described in the YouTube Data API v3 (any, short (<4 min), medium (4-20 min), or long (>20 min)), so that they can filter out unwanted videos from a channel.

This requires a private API endpoint that exposes methods for looping through all of the pages returned by a YouTube Data API v3 search on a channel's ID, then aggregating and formatting the results before returning them to the user. The Airtable script block can then make a fetch call to the ESOVDB API.
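
The page-looping part can be sketched independently of any HTTP client. Here fetchPage is a hypothetical async function wrapping a single search request; it is assumed to resolve to an object with items and an optional nextPageToken, which is how YouTube Data API v3 list responses are paginated.

```javascript
// Collects every page of results by following `nextPageToken` until it is absent.
// `fetchPage(pageToken)` is a hypothetical wrapper around one search request.
async function collectAllPages(fetchPage) {
  const items = [];
  let pageToken; // undefined requests the first page
  do {
    const page = await fetchPage(pageToken);
    items.push(...page.items);
    pageToken = page.nextPageToken;
  } while (pageToken);
  return items;
}
```

Injecting fetchPage keeps the loop testable without network access; the real endpoint would pass a function that calls the search API with the channel ID and the chosen videoDuration.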

Set up tags as subcollections within collections (representing tag categories on the ESOVDB) in Zotero, and add to sync

This task involves the following steps:

  • Gather a list of tag categories.
  • (Optional) To make the categories sync as well, either gain access to the Airtable Metadata API (not going to happen, since they've stopped onboarding new teams), or write code that scrapes the ESOVDB Airtable API docs and syncs the tag categories with Zotero, creating collections within the Tags parent collection and returning their Zotero Keys and Versions to the ESOVDB API, stored either in a new table on the Airtable (not preferable) or in a JSON data file. This should all happen on command, so set it up as a command-line tool.
  • Write a script to add all the initial, extant tags to Zotero as subcollections in their respective tag category parent collections, and sync their Zotero versions and keys back to the ESOVDB.
  • Add tag collection syncing into the processItems() Zotero sync function on the API server.
  • Set up sync endpoints for create/update/delete tags themselves on the ESOVDB. Tag categories can run simultaneously with this, in case tags need to switch Zotero tag collections.

Batch process multiple single HTTP requests coming from both Airtable's `onCreateRecord` and `onUpdateRecord` automations

This is a larger issue, complicated by:

  1. The fact that the onUpdateRecord events are often non-deterministic (at least within a reasonable amount of time, because Airtable updates can trigger upstream updates in other records).

I decided to use the Observer/Observable pattern and handle the update data as a stream, within a moving time window that is reset with each new batch update that comes in before the time window closes.

  2. Because I am running the API proxy cache server in cluster mode using PM2, to prevent blocking due to the sometimes long response time when querying the entire DB contents (a direct result of Airtable's upstream rate limiting and pagination), it was impossible to use the naive approach I had first tried, using a simple in-memory JS Map() object to hold the batch data, and then clearing it when each batch was processed.

I considered (and built) a variation of the existing cache module, based on the Node filesystem module, to store the data in JSON files, but such a system is not really scalable, and I didn't want to be messing with the filesystem so often.

So I opted to use Redis, and now I am running a dedicated Redis server on one of the 4 cores of the ESOVDB VM, and clustering the API server on the other 3 using PM2. This has proved to work very well.
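
The moving time window described above can be illustrated with a plain debounce-style batcher. This is a simplified sketch, not the Observer/Observable implementation: createBatcher is a hypothetical name, and the batch is held in process memory purely for brevity, which is exactly the approach that does not work across PM2 cluster workers and that Redis replaced.

```javascript
// Collects items into a batch and flushes once no new item has arrived
// for `windowMs`. Each new item restarts the timer, so the window moves
// with incoming updates.
// `schedule` and `cancel` are injectable so the timing can be faked in tests.
function createBatcher(windowMs, onFlush, schedule = setTimeout, cancel = clearTimeout) {
  let batch = [];
  let timer = null;
  return function add(item) {
    batch.push(item);
    if (timer !== null) cancel(timer); // reset the window
    timer = schedule(() => {
      const ready = batch;
      batch = [];
      timer = null;
      onFlush(ready);
    }, windowMs);
  };
}
```

In a clustered setup, the batch array would be replaced by reads and writes against the shared store.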

Sync tags from ESOVDB as actual tags in the Zotero library

This will require a few steps:

  1. First, get a list of existing tags and add them in bulk to Zotero, using the API.
  2. Export the tags' Zotero Keys and Versions and copy them over to the ESOVDB to begin tracking those.
  3. Create a new rollup column on the Videos table that aggregates the Tag Zotero Keys as a comma-separated list, and use this when creating/updating videos to add the tags to Zotero items
  4. Will also need onCreateRecord and onUpdateRecord automation in Airtable and corresponding zoteroRoutes for those, can just use the existing /zotero/:kind route, with any of PUT, POST, or DELETE.
  5. Lastly, when a video is updated or created, similar to collections, if there is a tag without a Zotero Key, it means it is not yet created, so loop through all such tags, create them in Zotero, sync them back to ESOVDB Tags table, and then inject the new Tags' Zotero Keys into the Video's list of Zotero tag keys to be added to the item/video in question.

Set up monitoring using Cronitor

Create monitors for:

  • ESOVDB, whole catalog, every 24 hours*
  • This 24-hour monitor will force caching of the DB every 24 hours. Setting the cache interval to 24 hours will then keep the daily public version of the whole catalog response in the cache at all times, for quick responses and minimal server load.

Set up a service that never uses the cached DB, for a paid API endpoint.

In order to get a fresh download of the entire ESOVDB catalog, or large parts thereof, one of the three API server nodes is blocked for almost five minutes, because Airtable forces a rate limit on requests to its resources, and a maximum page size of 100 results per request. Given the ESOVDB has more than 6,000 records and that number is growing, that means a download of the whole catalog requires more than 60 separate requests, each of which must be spaced out to no more than 5 per second. At a minimum, with no processing time or latency, this would block the server for 12s, but in practice, each request takes quite a bit longer on Airtable's end, with some additional light processing on the API side, such that the whole transaction for downloading the entire catalog can take up to 5 minutes at times.
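
The 12-second lower bound quoted above follows directly from the page size and rate limit, as this small worked calculation shows. The function name is hypothetical; the defaults are the figures given in the text.

```javascript
// Lower bound on download time imposed by pagination and rate limiting alone,
// ignoring latency and processing time.
function minDownloadSeconds(recordCount, pageSize = 100, requestsPerSecond = 5) {
  const requests = Math.ceil(recordCount / pageSize);
  return requests / requestsPerSecond;
}

// 6,000 records at 100 per page is 60 requests; at 5 requests per second that is 12 seconds.
console.log(minDownloadSeconds(6000));
```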

To make this less of an issue, I have been caching the results of the entire DB query, and other common requests, since almost the beginning of the project. I am going to be introducing a service that runs on the server and automatically caches the latest version of the ESOVDB once per day, so that there is always a cached version that the public can retrieve, for free.

The flipside of this is I am also going to introduce a premium endpoint that allows a paying user to get the absolute bleeding edge version of the ESOVDB, on-demand.

Watch Zotero for changes and sync them back to the ESOVDB. (Items & Collections)

This will have to be done by making format=versions API calls to the ESOVDB Zotero library at regular intervals, and then diffing the current and previous lists of items/collections. Anything deleted or updated can be identified by key, and the change can then be synced back to the ESOVDB on a regular basis.

Doing this on a push rather than a pull basis would be preferable, but AFAIK Zotero does not offer webhook triggers/events. I would have to build my own Zotero plugin to trigger this kind of push sync request, perhaps when content in a certain folder/collection is changed.
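
The diffing step can be sketched as a comparison of two key-to-version maps, one from the previous poll and one from the current poll. The function name and the shape of the result are assumptions; fetching the maps from Zotero is not shown.

```javascript
// Compares two { key: version } maps from consecutive polls and reports
// which keys were created, updated, or deleted in between.
function diffVersions(previous, current) {
  const created = [];
  const updated = [];
  for (const [key, version] of Object.entries(current)) {
    if (!(key in previous)) created.push(key);
    else if (previous[key] !== version) updated.push(key);
  }
  // Keys that were present before but are missing now.
  const deleted = Object.keys(previous).filter((key) => !(key in current));
  return { created, updated, deleted };
}

console.log(diffVersions({ AAAA1111: 3, BBBB2222: 5 }, { BBBB2222: 6, CCCC3333: 7 }));
```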

Allows users to submit suggestions for updates/additions to specific fields on existing video records

Two possible methods of execution

  1. New "Updates" table, analogous to "Submissions", where records are duplicates of existing records, with different field content.
  2. Leverage existing "Submissions" table, distinguish as "update" behavior by simply using whether a "Submission" record already has a Zotero Key+Version or not, and/or corresponding "Video" record in the DB.
  • In the submissions table, indicate visually which are new submissions and which are updates, and which fields are being updated, with colors.

Break up series sync in Zotero to divide Series by their series category membership on the ESOVDB

This will involve:

  • Create Zotero collections for all high level series categories in Zotero
  • Map all high level series categories in the ESOVDB to their new collection in Zotero by Zotero Key
  • Write Zotero JS script that loops through all series collections and places them in the correct parent collection. May have to use an external JSON file for the missing parent-child relationship information
  • Amend all series code to also add the parent collection when syncing to Zotero, such that any video that is in a series subcollection is also in its parent.

Note: the series categories will probably not be changing much, so I won't write any code there for CRUD of that type of information.

I may consider writing code to handle changing a series' category parent, because it might often be the case that when creating a new series I (or someone else in the future) forget to assign the parent category, and later a bunch of series need to be moved out of "Uncategorized" series into their Zotero parent... we will see.
