avanavana / esovdb-api

This project forked from daniloc/airtable-api-proxy

Public API cache proxy built on the Earth Science Online Video Database, an Airtable base, which also syncs to Zotero and broadcasts new submissions to Discord, Twitter, etc.

Home Page: http://www.esovdb.org

JavaScript 99.43% Shell 0.57%

airtable airtable-api cache cache-proxy discord-webhook-api earth-science earth-science-online-video-database earth-sciences esovdb express geology node proxy-server redis twitter-api-v2 webhook zotero zotero-api

esovdb-api's People

Forkers

bluelittle akbarato

esovdb-api's Issues

Create an endpoint on the API server that serves a simple HTML record page for a single video, so that it can be used to link on social media instead of the original source (original source embedded in record page)

Add support for filterByFormula that selects all records after a user-provided date in the form of a URL query param

Use a filterByFormula such as the following for the createdDate param:

IS_AFTER({Added}, DATETIME_PARSE("2020-12-17T16:40:04.000Z"))

The datetime string would be passed in as a URL query param.
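A minimal sketch of building that formula from a query param, assuming the param is a hypothetical `createdAfter` and the date field is `{Added}` as in the formula above. Parsing and re-serializing the date means only a normalized ISO string ever reaches the formula, which also keeps arbitrary user input out of it. The URL follows the Airtable REST API's list-records shape.

```javascript
// Builds an Airtable filterByFormula selecting records added after a date.
// Assumption (hypothetical): the Airtable date field is named "Added".
function buildAddedAfterFormula(isoDate) {
  const parsed = new Date(isoDate);
  if (Number.isNaN(parsed.getTime())) {
    throw new RangeError(`Invalid date: ${isoDate}`);
  }
  // Re-serialize so user input is never spliced into the formula verbatim.
  return `IS_AFTER({Added}, DATETIME_PARSE("${parsed.toISOString()}"))`;
}

// Builds the Airtable list-records URL with the formula URL-encoded.
function buildListUrl(baseId, tableName, isoDate) {
  const params = new URLSearchParams({
    filterByFormula: buildAddedAfterFormula(isoDate),
  });
  return `https://api.airtable.com/v0/${baseId}/${encodeURIComponent(tableName)}?${params}`;
}
```

In an Express route this might be called as `buildListUrl(baseId, 'Videos', req.query.createdAfter)`, returning a 400 when the `RangeError` is thrown.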

Webhooks API, with CRUD for public webhooks

Get all webhooks

GET /webhooks
Lists the webhook events that the provided callback URL is subscribed to. Without a callback URL, lists every webhook event.

URL query params:

url (optional) - the callback URL to query for associated webhook subscriptions

Sample response with req.query.url=http%3A%2F%2Fcallback%2Furl: 200 OK

[
    "example.event2",
    "example.event3"
]

Sample response without the url param: 200 OK

[
    "example.event",
    "example.event2",
    "example.event3",
    "example.event4",
    "example.event5"
]

Create webhook(s)

POST /webhooks
Adds one or more new webhook subscriptions, associated with the provided callback URL, to each of the provided webhook events.

Request body:

{
    "events": ["example.event"],
    "callback": "https://www.example.com/callback"
}

Sample response: 200 OK

{
    "added":  ["example.event"],
    "unchanged": [],
    "failed": []
}

Update webhook(s)

PUT /webhooks
Adds specified webhook events to the provided callback URL's subscription, and also removes any existing webhook event subscriptions not included in the request body.

Request body:

{
    "events": ["example.event2", "example.event3", "example.event4"],
    "callback": "https://www.example.com/callback"
}

Sample response: 200 OK

{
    "added": ["example.event3", "example.event4"],
    "removed": ["example.event"],
    "unchanged": ["example.event2"],
    "failed": []
}

Delete webhook(s)

DELETE /webhooks
Removes one or more webhook subscriptions, associated with the provided callback URL, from each of the provided webhook events.

Request body:

{
    "events": ["example.event"],
    "callback": "https://www.example.com/callback"
}

Sample response: 200 OK

{
    "removed":  ["example.event"],
    "unchanged": [],
    "failed": []
}
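The four operations above can be modeled with a small in-memory registry. This is a sketch under stated assumptions, not the server's actual storage: subscriptions are kept as a hypothetical map of event name to a set of callback URLs, and an event counts as "failed" when it is not one of the known events. Under this model, a PUT for a callback subscribed to example.event and example.event2, with a body of example.event2 through example.event4, adds example.event3 and example.event4, removes example.event, and leaves example.event2 unchanged.

```javascript
// In-memory model of the webhook CRUD semantics (hypothetical storage).
class WebhookRegistry {
  constructor(knownEvents) {
    // event name -> Set of subscribed callback URLs
    this.subs = new Map(knownEvents.map((e) => [e, new Set()]));
  }

  // GET /webhooks: events for one callback URL, or every known event.
  list(url) {
    const events = [...this.subs.keys()];
    return url ? events.filter((e) => this.subs.get(e).has(url)) : events;
  }

  // POST /webhooks: subscribe the callback to each requested event.
  add(events, callback) {
    const result = { added: [], unchanged: [], failed: [] };
    for (const e of events) {
      if (!this.subs.has(e)) result.failed.push(e);
      else if (this.subs.get(e).has(callback)) result.unchanged.push(e);
      else { this.subs.get(e).add(callback); result.added.push(e); }
    }
    return result;
  }

  // DELETE /webhooks: unsubscribe the callback from each requested event.
  remove(events, callback) {
    const result = { removed: [], unchanged: [], failed: [] };
    for (const e of events) {
      if (!this.subs.has(e)) result.failed.push(e);
      else if (this.subs.get(e).delete(callback)) result.removed.push(e);
      else result.unchanged.push(e);
    }
    return result;
  }

  // PUT /webhooks: make the callback's subscriptions exactly match `events`.
  replace(events, callback) {
    const wanted = new Set(events);
    const stale = this.list(callback).filter((e) => !wanted.has(e));
    const { added, unchanged, failed } = this.add(events, callback);
    const { removed } = this.remove(stale, callback);
    return { added, removed, unchanged, failed };
  }
}
```

Each method returns an object shaped like the sample responses, so a route handler could simply `res.json()` the result.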

Refactor code to use try...catch with thrown errors

Currently, errors are not thrown and then handled in catch blocks, which results in noisy stack traces whenever an error occurs.
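One possible shape for this refactor, sketched with Express-style `req`/`res` objects: handlers throw errors, and a single wrapper catches them and turns them into JSON responses. `ApiError` and `withErrorHandling` are hypothetical names, not existing project code.

```javascript
// Hypothetical error type carrying an HTTP status code.
class ApiError extends Error {
  constructor(status, message) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
  }
}

// Wraps an async route handler so anything it throws (or any rejected
// promise it awaits) is caught in one place.
function withErrorHandling(handler) {
  return async (req, res) => {
    try {
      await handler(req, res);
    } catch (err) {
      const status = err instanceof ApiError ? err.status : 500;
      // Log a one-line summary instead of dumping the whole stack trace.
      console.error(`[${req.method} ${req.url}] ${status}: ${err.message}`);
      res.status(status).json({ error: err.message });
    }
  };
}
```

Usage might look like `app.get('/videos/:id', withErrorHandling(async (req, res) => { ... throw new ApiError(404, 'Video not found'); }))`.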

Add "Flag for Deletion" field to ESOVDB Videos and Series tables, and set up script that powers the API-powered "Delete" button to batch request all flagged records, if records are in fact flagged, rather than the button being clicked

Deduplication automation block for admins to clear Submissions table of duplicates

Add function for deduplicating records in the Submissions table as a script block accessible by editors
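A sketch of the deduplication logic, assuming (hypothetically) that two submissions are duplicates when their URLs match after normalization, and that records are plain objects `{ id, url, created }`. The earliest record in each group is kept and the rest are returned for deletion.

```javascript
// Normalizes a URL so trivially different links to the same video match.
function normalizeUrl(url) {
  try {
    const u = new URL(url.trim());
    // Treat youtu.be/ID and youtube.com/watch?v=ID as the same video.
    if (u.hostname === 'youtu.be') return `youtube:${u.pathname.slice(1)}`;
    if (u.hostname.endsWith('youtube.com') && u.searchParams.has('v')) {
      return `youtube:${u.searchParams.get('v')}`;
    }
    return `${u.hostname.replace(/^www\./, '')}${u.pathname.replace(/\/$/, '')}`;
  } catch {
    return url.trim().toLowerCase(); // not a parseable URL
  }
}

// Returns the IDs of every record except the earliest in each duplicate group.
// Records use hypothetical fields { id, url, created } (ISO date strings).
function findDuplicateIds(records) {
  const sorted = [...records].sort((a, b) =>
    a.created < b.created ? -1 : a.created > b.created ? 1 : 0);
  const seen = new Set();
  const duplicates = [];
  for (const record of sorted) {
    const key = normalizeUrl(record.url);
    if (seen.has(key)) duplicates.push(record.id);
    else seen.add(key);
  }
  return duplicates;
}
```

In an Airtable scripting extension, the returned IDs could then be passed to `table.deleteRecordsAsync` in chunks of at most 50.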

Upgrade channel import script to get YouTube video durations

Unfortunately, YouTube makes it difficult (read: impossible) to get video durations in a search API call. Therefore, all of the videos, once retrieved, must be added to a batch request (max 50, loop through groups of 50 or less if total is greater) for another YouTube videos API call, from which the contentDetails.duration value may be retrieved and appended to each video result. The raw duration retrieved from YouTube is in ISO-8601 duration format, and must be converted into seconds for inserting into an Airtable duration field. This utility function for converting to seconds has already been written.
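The batching and conversion steps described above might look like the following sketch. It is not the project's existing utility function; it assumes Node 18+ (global `fetch`) and uses the YouTube Data API v3 `videos` endpoint with `part=contentDetails`, which accepts up to 50 comma-separated IDs.

```javascript
// Splits an array into groups of at most `size` items.
function chunk(items, size = 50) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) groups.push(items.slice(i, i + size));
  return groups;
}

// Converts a YouTube ISO 8601 duration ("PT1H2M3S", "P1DT30M", "P0D") to seconds.
function isoDurationToSeconds(iso) {
  const m = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(iso);
  if (!m) throw new RangeError(`Unsupported duration: ${iso}`);
  const [d, h, min, s] = m.slice(1).map((v) => Number(v || 0));
  return ((d * 24 + h) * 60 + min) * 60 + s;
}

// Fetches durations for many video IDs, 50 per request, as a Map of id -> seconds.
async function fetchDurations(videoIds, apiKey, fetchFn = fetch) {
  const durations = new Map();
  for (const group of chunk(videoIds)) {
    const params = new URLSearchParams({ part: 'contentDetails', id: group.join(','), key: apiKey });
    const res = await fetchFn(`https://www.googleapis.com/youtube/v3/videos?${params}`);
    if (!res.ok) throw new Error(`YouTube API error: ${res.status}`);
    const { items = [] } = await res.json();
    for (const item of items) {
      durations.set(item.id, isoDurationToSeconds(item.contentDetails.duration));
    }
  }
  return durations;
}
```

The resulting map can be merged into the search results by video ID before the records are written to Airtable.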

Refactor videos/query endpoint to allow for more params, filterByFormula, etc.

Params:

Future params:

combine multiple filters + searches?

Move all encapsulating conditional logic for Airtable automations from explicit code to the new Airtable automation's "Conditional Actions" feature

Submissions › Create Video from Submission
Submissions › New Submission from Discord
Series › Sync to Zotero on Create
Videos › Sync to Zotero on Update

Mailchimp newsletter for new submissions, announcements, automatically generated as a digest on a schedule

Tweet new submissions using the @esovdb Twitter account

For individual submissions, a single tweet with a link should be generated.

For batch submissions, the number of items in the batch should be announced in the main message, and then a random item should be selected from the batch, and information about that item, including a link, should be used to generate the tweet.

Tags and topics should turn into hashtags.
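A sketch of composing the tweet text for both cases, treating a single submission as a batch of one. The video shape `{ title, url, tags }` and the message wording are hypothetical, and the random picker is injectable so the choice can be tested. The 280-character limit is not enforced here.

```javascript
// Converts a tag or topic such as "plate tectonics" into "#PlateTectonics".
function toHashtag(tag) {
  const words = tag.split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  return '#' + words.map((w) => w[0].toUpperCase() + w.slice(1)).join('');
}

// Composes tweet text for one or more new submissions.
// Assumption (hypothetical): each video is { title, url, tags }.
function composeTweet(videos, random = Math.random) {
  const video = videos[Math.floor(random() * videos.length)];
  const intro = videos.length === 1
    ? 'New on the ESOVDB:'
    : `${videos.length} new videos added to the ESOVDB, including:`;
  const hashtags = (video.tags || []).map(toHashtag).join(' ');
  // Drop empty parts (e.g. no tags) so no blank lines are emitted.
  return [intro, video.title, video.url, hashtags].filter(Boolean).join('\n');
}
```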

ESOVDB Homepage

Features:

What's new (pull using API)
DB link
DB explainer video
Future plans
Donations
Social links (Twitter, Discord)
Description/about/credits/team

Add "Reviewer" column to Submissions table, and modify all automations to automatically insert the reviewer of each submission according to the currently logged-in user, in advance of bringing on additional members of the curatorial team

Add query endpoints and parameters (other than "All") to RapidAPI

Other endpoint ideas:

Featured videos in social media posts

Rather than just picking random videos from a batch, add the ability to choose a video in the batch to feature before the batch is sent for processing.

The featured video will then be the one shown on social media posts to Twitter and the ESOVDB Discord.

Write tool to scrape Airtable metadata for tag categories, etc and open-source separately

Related to #39.

To make the categories also sync, either gain access to the Airtable Metadata API (not going to happen, as they've stopped onboarding new teams), or write code that scrapes the ESOVDB Airtable API docs and syncs the tag categories with Zotero, creating collections within the Tags parent collection, and then returns their Zotero Keys and Versions to the ESOVDB API, stored either in a new table in the Airtable base (not preferable) or in a JSON data file. This should all happen on command, so set it up as a command-line tool.

Since Airtable is no longer onboarding new users on the Metadata API, all that can be done is to scrape the API documentation page, which contains the same data, but annoyingly distributed throughout rendered HTML. The idea is to create a command-line tool that will take an Airtable Base ID, email, and password, and spit out a JSON file for all the metadata in that Airtable base.

Schema:

{
    "name": "baseName",
    "id": "baseId",
    "apiBaseURL": "https://api.airtable.com/v0/baseId",
    "tables": [
        {
            "name": "tableName",
            "fields": [
                {
                    "name": "fieldName",
                    "fieldType": "airtableFieldType",
                    "type": "dataType",
                    "description": "airtableDescription",
                    "examples": [
                        {
                            "type": "text",
                            "value": "exampleText"
                        },
                        {
                            "type": "array",
                            "value": [
                                "arrayItem",
                                ...
                            ]
                        }
                    ]
                },
                ...
            ]
        },
        ...
    ]
}

This can be open-sourced and distributed separately from the ESOVDB, for others who might find it useful given the moratorium on the Airtable Metadata API.

The data will be scraped using Puppeteer and Cheerio, with Puppeteer running in stealth mode.

The JSON file that this script writes out can be parsed and used for instance to dynamically list tag categories in #40, and this can either run on a schedule with crontab, or with the various ESOVDB sync functions, or manually through the command-line.

Support for multiple formats of response data

Currently all data from Airtable are sent to the user as JSON, formatted using names, format, and structure designed for Zotero, since originally the purpose of this proxy server was to download and sync the Airtable data with Zotero alone.

This format is far less useful to others, so I would like to create a format query parameter that uses a different default JSON structure, something closer to the data in Airtable itself, something more like what the future ESOVDB front-end website will use, and provide a few other formats.

Formats

Default (Airtable JSON)
Basic (simplified)
YouTube (id, recordId, esovdbId, zoteroKey, added)
Zotero (what I already have)
GeoJSON (should filter any records lacking geospatial data)
CSV
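A sketch of dispatching on a `format` query parameter for some of the formats above. The flat record fields (`id`, `recordId`, `esovdbId`, `zoteroKey`, `added`, `title`, `url`, `latitude`, `longitude`) are hypothetical stand-ins for the real Airtable fields, and the existing Zotero format is omitted. GeoJSON uses longitude-first coordinate order.

```javascript
// Escapes one CSV cell (quotes cells containing commas, quotes, or newlines).
function csvCell(v) {
  const s = v == null ? '' : String(v);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serializers keyed by format name (record fields are hypothetical).
const formatters = {
  basic: (records) => records.map(({ id, title, url }) => ({ id, title, url })),
  youtube: (records) => records.map(({ id, recordId, esovdbId, zoteroKey, added }) =>
    ({ id, recordId, esovdbId, zoteroKey, added })),
  geojson: (records) => ({
    type: 'FeatureCollection',
    // Records lacking geospatial data are filtered out.
    features: records
      .filter((r) => Number.isFinite(r.latitude) && Number.isFinite(r.longitude))
      .map(({ latitude, longitude, ...properties }) => ({
        type: 'Feature',
        geometry: { type: 'Point', coordinates: [longitude, latitude] },
        properties,
      })),
  }),
  csv: (records) => {
    const columns = [...new Set(records.flatMap((r) => Object.keys(r)))];
    const rows = records.map((r) => columns.map((c) => csvCell(r[c])).join(','));
    return [columns.map(csvCell).join(','), ...rows].join('\n');
  },
};

function formatRecords(records, format = 'default') {
  if (format === 'default') return records; // Airtable-shaped JSON, unchanged
  const formatter = formatters[format];
  if (!formatter) throw new RangeError(`Unknown format: ${format}`);
  return formatter(records);
}
```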

Set up Geospatial API, geospatial indexing of videos by location and/or tags, public geospatial API endpoints, and map-based query UI

This will entail:

Setting up a dedicated Geospatial API server that stores the data in a geospatially indexed DB
Processing complex geospatial queries that may come in via a future map-based UI
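To illustrate the kind of query involved, here is a naive "videos within a radius" filter using the haversine great-circle distance. It is a linear scan standing in for a geospatial index, not the indexed approach itself; a geospatially indexed DB would answer the same question without touching every record. The video fields `{ id, latitude, longitude }` are hypothetical.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two lat/lon points, in kilometres.
function haversineKm(lat1, lon1, lat2, lon2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2
    + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Returns videos within `radiusKm` of a point, nearest first (linear scan).
function videosNear(videos, lat, lon, radiusKm) {
  return videos
    .filter((v) => Number.isFinite(v.latitude) && Number.isFinite(v.longitude))
    .map((v) => ({ ...v, distanceKm: haversineKm(lat, lon, v.latitude, v.longitude) }))
    .filter((v) => v.distanceKm <= radiusKm)
    .sort((a, b) => a.distanceKm - b.distanceKm);
}
```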

Add videos to new 'formats' collection in Zotero, using ESOVDB 'format' column

When an item is added to the ESOVDB Zotero after update or creation, back-sync its Zotero key and version to the ESOVDB.

Set up a service to cache the entire DB once every day, to send to a free public API endpoint.

See this issue: https://github.com/avanavana/airtable-api-proxy/projects/1#card-74236967

Along with providing a premium, paid, bleeding-edge endpoint for the current-to-the-millisecond version of the ESOVDB catalog, I am going to create a service or cron job that retrieves and caches the whole catalog once per day, and offer this data to the public for free, since it requires far less blocking activity on the server.
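A minimal sketch of a once-per-day cache, assuming a hypothetical `fetchCatalog` function that performs the slow full-catalog download. Concurrent requests during a refresh share one in-flight download. Because the server runs in cluster mode, a real version would keep the cached value in a shared store rather than in process memory.

```javascript
// Caches the result of `fetchCatalog` for `ttlMs` (default 24 hours).
// `now` is injectable so expiry can be tested without waiting a day.
function createDailyCache(fetchCatalog, { ttlMs = 24 * 60 * 60 * 1000, now = Date.now } = {}) {
  let entry = null;   // { value, fetchedAt }
  let pending = null; // in-flight refresh shared by concurrent callers

  function refresh() {
    if (!pending) {
      pending = Promise.resolve()
        .then(fetchCatalog)
        .then((value) => { entry = { value, fetchedAt: now() }; return value; })
        .finally(() => { pending = null; });
    }
    return pending;
  }

  async function get() {
    if (entry && now() - entry.fetchedAt < ttlMs) return entry.value;
    return refresh();
  }

  return { get, refresh };
}
```

Calling `refresh()` from a daily timer or cron trigger keeps the cache warm, so public requests are never the ones that pay for the download.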

Set up API on RapidAPI to handle public access.

See: https://github.com/avanavana/airtable-api-proxy/projects/1#card-74236967 and https://github.com/avanavana/airtable-api-proxy/projects/1#card-74236962

I will be using RapidAPI to authenticate and monitor API requests going forward, as well as offer some premium, paid endpoints, to help offset the costs of maintaining and hosting the ESOVDB. The project is still a 'non-profit', in that I am not attempting to make any money from this, just enough to cover the costs of maintaining and running it.
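When RapidAPI fronts an API, it forwards requests with an `X-RapidAPI-Proxy-Secret` header that the backend can check to reject traffic that bypasses RapidAPI. The Express-style middleware below is a sketch of that check; the environment variable name `RAPIDAPI_PROXY_SECRET` is hypothetical.

```javascript
// Rejects requests that do not carry the expected RapidAPI proxy secret.
function requireRapidApiSecret(secret) {
  return (req, res, next) => {
    // Node lowercases incoming header names.
    const provided = req.headers['x-rapidapi-proxy-secret'];
    if (!secret || provided !== secret) {
      return res.status(403).json({ error: 'Requests must be made through RapidAPI.' });
    }
    next();
  };
}
```

It might be mounted as `app.use('/v1', requireRapidApiSecret(process.env.RAPIDAPI_PROXY_SECRET))`, leaving any free, cached endpoints outside that path.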

Chrome Extension for Curators/Editors that displays whether or not a YouTube video currently being viewed is in the ESOVDB

This will use #26 (the new https://api.esovdb.org/v1/videos/youtube/:id endpoint). A Chrome extension merely has to detect that the current page is a YouTube video page, grab the video ID, and send a request to that endpoint. If the video is not found, display "No"; otherwise, return a handful of details about the video (Zotero key, ESOVDB ID, and added date), use the IDs/keys to create links that open the video in Zotero or the ESOVDB, and format a noticeable indicator icon next to or over the video, like a banner.

The content script should show a 'Submit' button that makes it easy to add the video to the ESOVDB Submissions table, if it hasn't already been added.
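The detection and lookup steps could be sketched as follows. This assumes the endpoint responds with 404 for videos not in the ESOVDB and returns JSON otherwise; the response fields themselves are not specified here.

```javascript
// Extracts the video ID from a youtube.com/watch or youtu.be URL, else null.
function youtubeIdFromUrl(href) {
  let url;
  try { url = new URL(href); } catch { return null; }
  const host = url.hostname.replace(/^(www|m)\./, '');
  if (host === 'youtu.be') return url.pathname.slice(1) || null;
  if (host === 'youtube.com' && url.pathname === '/watch') return url.searchParams.get('v');
  return null;
}

// Looks up the current page's video; resolves to null when it is not in the ESOVDB.
// Assumption: a 404 means "not found"; `fetchFn` defaults to the global fetch.
async function lookupVideo(href, fetchFn = fetch) {
  const id = youtubeIdFromUrl(href);
  if (!id) return null;
  const res = await fetchFn(`https://api.esovdb.org/v1/videos/youtube/${encodeURIComponent(id)}`);
  if (res.status === 404) return null;
  if (!res.ok) throw new Error(`ESOVDB API error: ${res.status}`);
  return res.json();
}
```

In the content script, `lookupVideo(location.href)` would decide between rendering the "in the ESOVDB" banner and the 'Submit' button.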

Implement donation feature using the Stripe API

The ESOVDB is a totally not-for-profit project, yet it requires a certain amount of budget to operate and maintain (domain, hosting the server VMs, Airtable pro plan, etc). Several people have inquired about how to help fund the ESOVDB, and up until now there has been no way to donate except for directly sending @avanavana cash.

It would be ideal to have a dedicated Stripe API-based solution or form that legitimizes and simplifies the donation process, and perhaps even spells out concrete ways to help (e.g. "$20 – Cover 1 month of operations"...).

Write onboarding manual

ESOVDB mission, values, scope
Editor job description and responsibilities
How the ESOVDB is organized
The ESOVDB API layer
The ESOVDB network
Automation
Reviewing submissions
Creating new submissions
How to find new content
Technical description of what happens when a submission is processed

Post new submissions to the ESOVDB on the ESOVDB Discord in the #whats-new channel

For individual submissions, a rich message with embeds should be generated.

For batch submissions, the number of items in the batch should be announced in the main message, and then a random item should be selected from the batch and a rich message with embeds should be generated for that item.

Tags should be encoded to match the ESOVDB Discord tag channels, to turn them into clickable links.
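A sketch of building and sending the Discord message through a channel webhook. Discord renders `<#channelId>` as a clickable channel link, which is one way to turn tags into links. The video shape `{ title, url, description, tags }` and the `tagChannels` map from tag name to channel ID are hypothetical; Node 18+ global `fetch` is assumed.

```javascript
// Builds a webhook payload with one embed for the (randomly) featured video.
function buildDiscordPayload(videos, tagChannels, random = Math.random) {
  const video = videos[Math.floor(random() * videos.length)];
  // Tags with a matching tag channel become <#id> mentions; others stay plain text.
  const tags = (video.tags || []).map((tag) => (tagChannels[tag] ? `<#${tagChannels[tag]}>` : tag));
  const embed = { title: video.title, url: video.url };
  if (video.description) embed.description = video.description;
  if (tags.length) embed.fields = [{ name: 'Tags', value: tags.join(' ') }];
  return {
    content: videos.length === 1
      ? 'New video added to the ESOVDB:'
      : `${videos.length} new videos added to the ESOVDB, including:`,
    embeds: [embed],
  };
}

// Posts the payload to a Discord webhook URL.
async function postToDiscord(webhookUrl, payload, fetchFn = fetch) {
  const res = await fetchFn(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Discord webhook error: ${res.status}`);
}
```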

Add Airtable script block that lets an editor paste in a YouTube channel URL or ID and automatically import all its videos

Additionally, editors should be able to choose a videoDuration as described in the YouTube Data API v3 (any, short (<4 min), medium (4-20 min), or long (>20 min)), to filter out unwanted videos from a channel.

Requires a private API endpoint that loops through all of the pages returned by a YouTube Data API v3 search on a channel's ID, then aggregates and formats the results before returning them to the user. The Airtable script block can then make a fetch call to the ESOVDB API.
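The paging loop for that endpoint might look like the sketch below, which follows `nextPageToken` through YouTube Data API v3 search results (up to 50 per page). The API only honors `videoDuration` together with `type=video`. The output shape `{ id, title, publishedAt }` is a hypothetical choice for the Airtable script block.

```javascript
// Collects every video from a channel search, page by page.
// videoDuration: 'any' | 'short' | 'medium' | 'long'.
async function searchChannelVideos(channelId, apiKey, videoDuration = 'any', fetchFn = fetch) {
  const videos = [];
  let pageToken;
  do {
    const params = new URLSearchParams({
      part: 'snippet', channelId, type: 'video', videoDuration, maxResults: '50', key: apiKey,
    });
    if (pageToken) params.set('pageToken', pageToken);
    const res = await fetchFn(`https://www.googleapis.com/youtube/v3/search?${params}`);
    if (!res.ok) throw new Error(`YouTube API error: ${res.status}`);
    const page = await res.json();
    for (const item of page.items || []) {
      // Hypothetical output shape for the Airtable script block.
      videos.push({ id: item.id.videoId, title: item.snippet.title, publishedAt: item.snippet.publishedAt });
    }
    pageToken = page.nextPageToken; // undefined on the last page
  } while (pageToken);
  return videos;
}
```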

Set up tags as subcollections within collections (representing tag categories on the ESOVDB) in Zotero, and add to sync

This task involves the following steps:

Gather a list of tag categories.
(optional) To make the categories also sync, either gain access to the Airtable Metadata API (not going to happen, as they've stopped onboarding new teams), or write code that scrapes the ESOVDB Airtable API docs and syncs the tag categories with Zotero, creating collections within the Tags parent collection, and then returns their Zotero Keys and Versions to the ESOVDB API, stored either in a new table in the Airtable base (not preferable) or in a JSON data file. This should all happen on command, so set it up as a command-line tool.
Write a script to add all the initial, extant tags to Zotero as subcollections in their respective tag category parent collections, and sync their Zotero versions and keys back to the ESOVDB
Add tag collection syncing into the processItems() Zotero sync function on the API server
Set up sync endpoints for create/update/delete tags themselves on the ESOVDB. Tag categories can run simultaneously with this, in case tags need to switch Zotero tag collections.
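The initial tag import script could build Zotero Web API write requests like this sketch. Each tag becomes a collection object `{ name, parentCollection }`, and requests are split into groups of 50, the Zotero write limit. The `categories` map (category collection key to tag names) and the `libraryPath` value (e.g. `groups/<groupID>`) are hypothetical inputs. The second function pairs the `successful` entries of a write response, keyed by request index, back to tag names so their keys and versions can be synced to the ESOVDB.

```javascript
// Builds POST requests that create one subcollection per tag in its category.
function buildTagCollectionRequests(categories, libraryPath, apiKey) {
  const objects = Object.entries(categories).flatMap(([parentKey, tags]) =>
    tags.map((name) => ({ name, parentCollection: parentKey })));
  const requests = [];
  for (let i = 0; i < objects.length; i += 50) {
    const batch = objects.slice(i, i + 50); // at most 50 objects per write
    requests.push({
      url: `https://api.zotero.org/${libraryPath}/collections`,
      options: {
        method: 'POST',
        headers: { 'Zotero-API-Key': apiKey, 'Content-Type': 'application/json' },
        body: JSON.stringify(batch),
      },
      names: batch.map((o) => o.name),
    });
  }
  return requests;
}

// Maps a write response's `successful` entries back to the tag names sent.
function collectCreatedKeys(names, response) {
  return Object.entries(response.successful || {}).map(([index, obj]) => ({
    name: names[Number(index)],
    zoteroKey: obj.key,
    zoteroVersion: obj.version,
  }));
}
```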

Allow for "muting" of new submission announcements in social media channels

Batch process multiple single HTTP requests coming from both Airtable's `onCreateRecord` and `onUpdateRecord` automations

This is a larger issue, complicated by:

The fact that the onUpdateRecord events are often non-deterministic (at least within a reasonable amount of time, because Airtable updates can trigger upstream updates in other records).

I decided to use the Observer/Observable pattern and handle the update data as a stream, within a moving time window that is reset with each new batch update that comes in before the time window closes.

Because I am running the API proxy cache server in cluster mode using PM2, to prevent blocking due to the sometimes long response time when querying the entire DB contents (a direct result of Airtable's upstream rate limiting and pagination), it was impossible to use the naive approach I had first tried, using a simple in-memory JS Map() object to hold the batch data, and then clearing it when each batch was processed.

I considered (and built) a variation of the existing cache module, based on the Node filesystem module, to store the data in JSON files, but such a system is not really scalable, and I didn't want to be messing with the filesystem so often.

So I opted to use Redis, and now I am running a dedicated Redis server on one of the 4 cores of the ESOVDB VM, and clustering the API server on the other 3 using PM2. This has proved to work very well.
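The moving time window itself can be illustrated with an in-memory sketch: each new update restarts the window, and the batch is processed only once no update has arrived for `windowMs`. This illustrates the pattern only; as explained above, under PM2 cluster mode each worker has its own memory, so the real buffer lives in Redis. The class name and payload shape are hypothetical.

```javascript
// Debounced batch: flushes once updates stop arriving for `windowMs`.
class DebouncedBatch {
  constructor(windowMs, onFlush) {
    this.windowMs = windowMs;
    this.onFlush = onFlush;
    this.items = new Map(); // record ID -> latest payload, so repeat updates collapse
    this.timer = null;
  }

  add(id, payload) {
    this.items.set(id, payload);
    clearTimeout(this.timer); // every new update resets the window
    this.timer = setTimeout(() => this.flush(), this.windowMs);
  }

  flush() {
    const batch = [...this.items.values()];
    this.items.clear();
    this.timer = null;
    if (batch.length) this.onFlush(batch);
  }
}
```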

Create an endpoint on the API server that serves a simple HTML multiple record page for every processed batch, to be used to link on social media (instead of just showing the number of items added)

This will involve storing the batches in a processed batch Redis sorted set.

ESOVDB Airtable explainer video

Create a brief explainer video that shows the major features of the Airtable UI in action.

Sync tags from ESOVDB as actual tags in the Zotero library

This will require a few steps:

First, get a list of existing tags and add them in bulk to Zotero, using the API.
Export the tags' Zotero Keys and Versions and copy them over to the ESOVDB to begin tracking those.
Create a new rollup column on the Videos table that aggregates the Tag Zotero Keys as a comma-separated list, and use this when creating/updating videos to add the tags to Zotero items
Will also need onCreateRecord and onUpdateRecord automation in Airtable and corresponding zoteroRoutes for those, can just use the existing /zotero/:kind route, with any of PUT, POST, or DELETE.
Lastly, when a video is updated or created, similar to collections, if there is a tag without a Zotero Key, it means it is not yet created, so loop through all such tags, create them in Zotero, sync them back to ESOVDB Tags table, and then inject the new Tags' Zotero Keys into the Video's list of Zotero tag keys to be added to the item/video in question.
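The last step could be sketched as below. The tag shape `{ name, zoteroKey, zoteroVersion }` and the two callbacks are hypothetical: `createInZotero(names)` is assumed to resolve to `[{ key, version }]` in the same order, and `syncBack(tags)` to write the new keys and versions to the ESOVDB Tags table. The function fills in the missing keys on the tag objects it is given.

```javascript
// Creates any tags lacking a Zotero key, syncs them back, and returns
// the full list of Zotero tag keys for the video (in the original order).
async function resolveTagKeys(tags, createInZotero, syncBack) {
  const missing = tags.filter((t) => !t.zoteroKey);
  if (missing.length) {
    const created = await createInZotero(missing.map((t) => t.name));
    missing.forEach((t, i) => {
      t.zoteroKey = created[i].key;
      t.zoteroVersion = created[i].version;
    });
    await syncBack(missing);
  }
  return tags.map((t) => t.zoteroKey);
}
```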

Assign an item to its topic's collection in the ESOVDB Zotero library

Set up monitoring using Cronitor

Create monitors for:

ESOVDB, whole catalog, every 24 hours*

This 24-hour monitor will force caching of the DB every 24 hours. Setting the cache interval to 24 hours as well will then keep the daily public version of the whole-catalog response in the cache at all times, for quick responses and minimal server load.

Add endpoint and logic for ESOVDB Series onRecordCreate and onRecordUpdate events, using the new Redis-powered batch module

Set up a service that never uses the cached DB, for a paid API endpoint.

In order to get a fresh download of the entire ESOVDB catalog, or large parts thereof, one of the three API server nodes is blocked for almost five minutes, because Airtable forces a rate limit on requests to its resources, and a maximum page size of 100 results per request. Given the ESOVDB has more than 6,000 records and that number is growing, that means a download of the whole catalog requires more than 60 separate requests, each of which must be spaced out to no more than 5 per second. At a minimum, with no processing time or latency, this would block the server for 12s, but in practice, each request takes quite a bit longer on Airtable's end, with some additional light processing on the API side, such that the whole transaction for downloading the entire catalog can take up to 5 minutes at times.
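The arithmetic above follows from how Airtable pagination works: each list-records response returns at most 100 records plus an `offset` token for the next page. At 6,000 records, that is 60 requests, and spacing them 200 ms apart (5 per second) gives the 12-second floor. A sketch of the full download, assuming Node 18+ global `fetch` and an Airtable API token:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Downloads every record in a table, following `offset` page by page and
// pausing between requests to stay under 5 requests per second per base.
async function fetchAllRecords(baseId, table, token, { fetchFn = fetch, delayMs = 200 } = {}) {
  const records = [];
  let offset;
  do {
    const params = new URLSearchParams({ pageSize: '100' });
    if (offset) params.set('offset', offset);
    const res = await fetchFn(
      `https://api.airtable.com/v0/${baseId}/${encodeURIComponent(table)}?${params}`,
      { headers: { Authorization: `Bearer ${token}` } }
    );
    if (!res.ok) throw new Error(`Airtable API error: ${res.status}`);
    const page = await res.json();
    records.push(...page.records);
    offset = page.offset; // absent on the last page
    if (offset) await sleep(delayMs);
  } while (offset);
  return records;
}
```

Because every page waits on the previous one, the total time grows linearly with the catalog, which is why serving the whole catalog from a cache matters.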

To make this less of an issue, I have been caching the results of the entire DB query, and other common requests, since almost the beginning of the project. I am going to be introducing a service that runs on the server and automatically caches the latest version of the ESOVDB once per day, so that there is always a cached version that the public can retrieve, for free.

The flipside of this is I am also going to introduce a premium endpoint that allows a paying user to get the absolute bleeding edge version of the ESOVDB, on-demand.

Watch Zotero for changes and sync them back to the ESOVDB. (Items & Collections)

This will have to be done by making format=versions API calls to the ESOVDB Zotero library at regular intervals, and then diffing the current and previous lists of items/collections. Anything deleted or updated can be identified by key, and the change then synced back to the ESOVDB on a regular basis.

Doing this on a push, rather than pull, basis would be preferable, but AFAIK Zotero does not offer webhook triggers/events. I would have to build my own Zotero plugin to trigger these kinds of push sync requests, perhaps when content in a certain folder/collection is changed.
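The diffing step can be sketched as a comparison of two snapshots, each a JSON object mapping Zotero keys to versions (the shape returned by a `format=versions` request). How the snapshots are stored between polls is left open here.

```javascript
// Compares two { key: version } snapshots taken at successive polls.
function diffVersions(previous, current) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const added = [];
  const updated = [];
  const deleted = [];
  for (const [key, version] of Object.entries(current)) {
    if (!has(previous, key)) added.push(key);
    else if (previous[key] !== version) updated.push(key);
  }
  for (const key of Object.keys(previous)) {
    if (!has(current, key)) deleted.push(key);
  }
  return { added, updated, deleted };
}
```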

Push create and update video events via public webhooks, as well as create and update tag and series events

Determine name of new front-end UI/UX

Allows users to submit suggestions for updates/additions to specific fields on existing video records

Two possible methods of execution

New "Updates" table, analogous to "Submissions", where records are duplicates of existing records, with different field content.
Leverage existing "Submissions" table, distinguish as "update" behavior by simply using whether a "Submission" record already has a Zotero Key+Version or not, and/or corresponding "Video" record in the DB.

In the submissions table, indicate visually which are new submissions and which are updates, and which fields are being updated, with colors.

Set up Pipedream webhook workflow to receive Cronitor alerts and post them to the #admin channel on the ESOVDB discord

Add delete button in Airtable that runs an Airtable script on the selected record to sync its deletion with its counterpart in Zotero via new DELETE API endpoint, for both Videos and Series on the ESOVDB

Break up series sync in Zotero to divide Series by their series category membership on the ESOVDB

This will involve:

Create Zotero collections for all high-level series categories
Map all high-level series categories in the ESOVDB to their new Zotero collections by Zotero Key
Write a Zotero JS script that loops through all series collections and places them in the correct parent collection. May have to use an external JSON file for the missing parent-child relationship information
Amend all series code to also add the parent collection when syncing to Zotero, such that any video that is in a series subcollection is also in its parent.

*Note: the series categories will probably not be changing much, so I won't write any code there for CRUD of that type of information.

I may consider writing code to handle changing a series' category parent, because it might be the case often that when creating a new series I (or someone else in the future) forgets to assign the parent category, and later a bunch of series need to be moved out of "Uncategorized" series into their Zotero parent...we will see*

When an item is part of a series on the ESOVDB, assign that item to or create a new collection for that series under the 'Series' collection in the ESOVDB Zotero library.
