Ticket: DM-9818
Background
The purpose of DocHub is to make LSST's information artifacts, which are currently spread across many platforms, available and searchable from a single website. I did some research on DocHub in https://sqr-013.lsst.io last November, and that technote will provide useful background on what DocHub will (hopefully) become. But what you'll be building here is an initial prototype for DocHub. Rather than a sophisticated API+React app with JSON-LD metadata modeling, what we're looking for here is:
- A static website published with LSST the Docs to the www product so that its URL will be www.lsst.io (we can alias lsst.io to www.lsst.io too).
- There's no need for persistence yet in building the initial static site; all data can be obtained at build time from the keeper.lsst.codes API and from metadata.yaml files in the GitHub repositories of projects.
- The ltd-dasher project is similar to what you'll build here (Jinja2 templates, with data populated from APIs), except that there's no need to make dochub-prototype a server application (at least at this stage). LTD Keeper doesn't need to trigger a DocHub rebuild every time a new LSST the Docs build is pushed; I think hourly builds will be sufficient. The reason I'm cautious about making this a server app is that the build will take a significant amount of time, so any client would time out unless we build a background task queue. But if we design the entire thing to run as an asynchronous job that can be triggered by cron or launched as a Kubernetes Job, then we get that task queue feature for free (a rough sketch of such a job follows this list).
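To make the asynchronous-job idea concrete, here's a rough sketch of a run-to-completion driver. The /products/ response shape is my reading of the LTD Keeper API (verify against ltd-keeper.lsst.io), and render_site/upload_site are hypothetical helpers, not existing functions:

```python
#!/usr/bin/env python
"""Rough sketch of the hourly build job (cron or Kubernetes Job).

Assumes the LTD Keeper /products/ endpoint returns a JSON list of product
resource URLs; check ltd-keeper.lsst.io for the actual schema.
"""
import requests

KEEPER_URL = 'https://keeper.lsst.codes'


def get_products():
    """Fetch every product resource registered with LTD Keeper."""
    r = requests.get(KEEPER_URL + '/products/')
    r.raise_for_status()
    product_urls = r.json()['products']  # assumed key; verify against the API docs
    return [requests.get(url).json() for url in product_urls]


def main():
    """Entry point for a single run-to-completion build."""
    products = get_products()
    # render_site() and upload_site() are placeholder names for the rendering
    # and ltd-conveyor upload steps discussed below.
    # html = render_site(products)
    # upload_site(html)


if __name__ == '__main__':
    main()
```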
Python package
I think the core implementation can just be a standard Python package, dochubproto (it can even be deployed to PyPI). Inside the package will be a templates directory with the Jinja2 templates, plus Python modules that handle website rendering (getting data from the APIs and actually rendering the templates).
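A minimal sketch of the rendering side, assuming a templates/ directory shipped inside dochubproto (the template name and context variable are placeholders):

```python
from jinja2 import Environment, PackageLoader, select_autoescape

# Load templates from the dochubproto/templates/ directory inside the package.
env = Environment(
    loader=PackageLoader('dochubproto', 'templates'),
    autoescape=select_autoescape(['html']),
)


def render_index(technotes):
    """Render the homepage from a list of technote records
    (see the field list later in this ticket)."""
    template = env.get_template('index.html')
    return template.render(technotes=technotes)
```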
There can be a dochub-render.py executable for triggering a render.
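Whether that's a literal dochub-render.py script or a console_scripts entry point is up to you; for illustration only, the entry-point flavour could look like this (dochubproto.cli:main is a made-up module path):

```python
from setuptools import setup, find_packages

setup(
    name='dochubproto',
    packages=find_packages(),
    include_package_data=True,  # so the templates/ directory ships with the package
    entry_points={
        'console_scripts': [
            # installs a `dochub-render` command that calls dochubproto.cli:main
            'dochub-render = dochubproto.cli:main',
        ],
    },
)
```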
Like ltd-dasher, you can use ltd-conveyor to upload the built HTML/CSS/whatever to LSST the Docs with all the appropriate caching headers.
Dockerizing
If you want, you can Dockerize and deploy dochub-prototype with Kubernetes. I was thinking of doing this as a Kubernetes Job resource; once CronJob is available we can switch to that. The nice thing about this is that we could then build a lightweight api.lsst.codes microservice that triggers a DocHub rebuild simply by deploying the DocHub manifest. Again, this helps us avoid building our own task queue with Celery.
If you can set up a Jenkins or Travis job to run this every hour, that's great. But I think we can still close the epic without nailing the operational infrastructure 100%.
The index.html information content and API sources
The MVP for the sqre-s17-doceng epic is to list all technotes on www.lsst.io. We could also list LDMs and user guides (pipelines.lsst.io, firefly.lsst.io, developer.lsst.io, ltd-keeper.lsst.io, ltd-mason.lsst.io, ltd-conveyor.lsst.io), but I think that shipping just a list of DMTNs, SQRs, and SMTNs would be sufficient and also useful.
Without getting into front-end design, you can treat the DMTN, SQR, and SMTN sections (either all on the homepage, or as separate HTML pages) as ul lists of technote template partials.
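As a rough illustration only (not a design proposal), one of those ul partials could be shaped like this; it's written as an inline Jinja2 template so the snippet stays self-contained, and the field names anticipate the list in the next section:

```python
from jinja2 import Environment

env = Environment(autoescape=True)

# A ul of technote list items; in the package this would live in templates/.
TECHNOTE_LIST = env.from_string("""
<ul class="technote-list">
{% for t in technotes %}
  <li>
    <a href="{{ t.url }}">{{ t.handle }}: {{ t.title }}</a>
    {% if t.description %}<p>{{ t.description }}</p>{% endif %}
  </li>
{% endfor %}
</ul>
""")

# Example render with a single (illustrative) technote record.
print(TECHNOTE_LIST.render(technotes=[
    {'handle': 'SQR-013', 'title': 'DocHub prototyping notes',
     'url': 'https://sqr-013.lsst.io', 'description': None},
]))
```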
The template partials should provide the following information for each technote (a sketch of the merged record follows the list):
- Title (without the handle) (either from keeper.lsst.codes or metadata.yaml)
- The document handle (either from keeper.lsst.codes or metadata.yaml)
- The URL (from keeper.lsst.codes)
- The GitHub repo URL (from keeper.lsst.codes)
- Link to the edition dashboard (computed as https://product.lsst.io/v). For bonus points, use the GitHub API to state whether there are open PRs.
- Date last updated (from keeper.lsst.codes)
- The author list (from metadata.yaml)
- The description (from metadata.yaml, if available)
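Here's a sketch of the merged record each partial would consume, built from a keeper product resource plus the repo's metadata.yaml. The keeper field names (slug, published_url, doc_repo, date_rebuilt) and the metadata.yaml keys (doc_title, authors, description) are my guesses at the two schemas, so check ltd-keeper.lsst.io and the technote template before relying on them:

```python
def build_technote_record(product, metadata):
    """Combine an LTD Keeper product resource with the repo's metadata.yaml
    into the dict a template partial renders."""
    handle = product['slug'].upper()  # e.g. 'sqr-013' -> 'SQR-013'
    return {
        'handle': handle,                                            # keeper or metadata.yaml
        'title': metadata.get('doc_title', product.get('title')),    # either source works
        'url': product['published_url'],                             # keeper
        'repo_url': product['doc_repo'],                             # keeper
        'dashboard_url': product['published_url'].rstrip('/') + '/v',  # computed
        'updated': product.get('date_rebuilt'),                      # keeper; field name is a guess
        'authors': metadata.get('authors', []),                      # metadata.yaml
        'description': metadata.get('description'),                  # metadata.yaml, optional
    }
```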
Getting data from keeper.lsst.codes is straightforward, as you know. You can use the GitHub API to obtain the metadata.yaml file from technote repositories.
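For the metadata.yaml fetch, the GitHub contents API returns the file base64-encoded, so something like the sketch below works. The doc_repo URL parsing is deliberately naive, and unauthenticated GitHub API calls are rate-limited to 60 per hour, so you'll probably want to send a token for hourly builds:

```python
import base64

import requests
import yaml


def fetch_metadata(doc_repo_url):
    """Return the parsed metadata.yaml for a technote's GitHub repository."""
    # e.g. 'https://github.com/lsst-sqre/sqr-013.git' -> 'lsst-sqre/sqr-013'
    repo = doc_repo_url.rstrip('/').replace('https://github.com/', '').replace('.git', '')
    url = 'https://api.github.com/repos/{0}/contents/metadata.yaml'.format(repo)
    r = requests.get(url)
    r.raise_for_status()
    content = base64.b64decode(r.json()['content'])  # contents API returns base64
    return yaml.safe_load(content)
```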
One trick is that not all technotes are on LSST the Docs. Some of the originals are on Read the Docs but still have metadata.yaml files. You can either work around that, or (probably better) just list the technotes that are on LSST the Docs, and I'll get around to porting the old technotes over.