learn-astropy-librarian

The content crawler that supplies Learn Astropy's web search.

Command line interface

Usage: astropylibrarian [OPTIONS] COMMAND [ARGS]...

  Manage the content index for the Learn Astropy project.

  Astropy Librarian helps you work with the Algolia index that powers the
  content listing and search for Learn Astropy, https://learn.astropy.org.

  Astropy Librarian is developed at https://github.com/astropy/learn-
  astropy-librarian

Options:
  -v, --verbose                   Verbose output. Use -v for info-type logging
                                  and -vv for debug-level logging.  [default:
                                  0]

  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.

Commands:
  delete  Delete Algolia records.
  index   Content indexing commands.

astropylibrarian index tutorial

Usage: astropylibrarian index tutorial [OPTIONS] URL

  Index a single tutorial.

Arguments:
  URL  URL for a tutorial.  [required]

Options:
  --algolia-id TEXT   Algolia app ID.  [env var: ALGOLIA_ID; required]
  --algolia-key TEXT  Algolia API key.  [env var: ALGOLIA_KEY; required]
  --index TEXT        Name of the Algolia index.  [env var: ALGOLIA_INDEX;
                      required]

  --priority INTEGER  Priority for default sorting (higher numbers appear
                      first)  [default: 0]

  --path PATH         Local path of tutorial HTML, if available.
  --help              Show this message and exit.

astropylibrarian index tutorial-site

Usage: astropylibrarian index tutorial-site [OPTIONS] SITE_DIR URL

  Index a directory of tutorial HTML files.

  This command is useful for automated CI workflows. The site_dir argument
  is the directory of tutorials built by nbcollection and url is the root
  URL where these tutorials are published on the web. This command indexes
  each HTML file as a tutorial, except for those with paths specified in the
  --ignore argument. The root index.html file is always ignored.

Arguments:
  SITE_DIR  Local path tutorial build directory  [required]
  URL       Base URL for tutorials.  [required]

Options:
  --algolia-id TEXT   Algolia app ID.  [env var: ALGOLIA_ID; required]
  --algolia-key TEXT  Algolia API key.  [env var: ALGOLIA_KEY; required]
  --index TEXT        Name of the Algolia index.  [env var: ALGOLIA_INDEX;
                      required]

  --ignore TEXT       List of HTML files to ignore from indexing. The root
                      index.html file is always excluded.  [default:
                      (dynamic)]

  --help              Show this message and exit.
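
The file selection described for tutorial-site can be sketched in Python as follows. This is a minimal illustration, not Librarian's actual implementation: the function name iter_tutorial_pages is hypothetical, and it assumes that ignored files are given as paths relative to SITE_DIR.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def iter_tutorial_pages(
    site_dir: Path, base_url: str, ignore: Iterable[str] = ()
) -> Iterator[Tuple[Path, str]]:
    """Yield (local_path, published_url) for each tutorial HTML file.

    Hypothetical helper: assumes ``ignore`` holds paths relative to
    ``site_dir`` written with forward slashes.
    """
    # The root index.html is always skipped, plus any user-supplied paths.
    skipped = {"index.html", *ignore}
    root = base_url.rstrip("/")
    for html_path in sorted(site_dir.rglob("*.html")):
        relative = html_path.relative_to(site_dir).as_posix()
        if relative in skipped:
            continue
        # The published URL mirrors the file's location under the base URL.
        yield html_path, f"{root}/{relative}"
```

Each yielded pair gives the local HTML file to read and the URL under which the resulting records would be indexed, so no content needs to be fetched from the live site during a CI build.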

astropylibrarian index guide

Usage: astropylibrarian index guide [OPTIONS] URL

  Index a guide.

Arguments:
  URL  Root URL for a guide.  [required]

Options:
  --algolia-id TEXT   Algolia app ID.  [env var: ALGOLIA_ID; required]
  --algolia-key TEXT  Algolia API key.  [env var: ALGOLIA_KEY; required]
  --index TEXT        Name of the Algolia index.  [env var: ALGOLIA_INDEX;
                      required]

  --priority INTEGER  Priority for default sorting (higher numbers appear
                      first)  [default: 0]

  --help              Show this message and exit.

astropylibrarian delete

Usage: astropylibrarian delete [OPTIONS] URL

  Delete Algolia records.

Arguments:
  URL  Root URL to delete  [required]

Options:
  --algolia-id TEXT   Algolia app ID.  [env var: ALGOLIA_ID; required]
  --algolia-key TEXT  Algolia API key.  [env var: ALGOLIA_KEY; required]
  --index TEXT        Name of the Algolia index.  [env var: ALGOLIA_INDEX;
                      required]

  --help              Show this message and exit.

Development primer

Before developing learn-astropy-librarian, set up a new Python virtual environment. Then, install the application with development dependencies:

make init

This command installs pre-commit hooks for code linting, installs tox, resets the tox environment, and installs the package itself.

You can run all tests through tox:

tox

You can also run tox environments individually:

  • tox -e py runs unit tests with Pytest.
  • tox -e lint runs code linters (such as flake8 and pre-commit).
  • tox -e typing runs mypy to check type annotations.

learn-astropy-librarian's People

Contributors

adrn, dependabot[bot], jonathansick

learn-astropy-librarian's Issues

Add ability to run on local copies of a site for CI

In order to run learn-astropy-librarian from CI, it would be useful to have a capability to run against local copies of the tutorial HTML. This will prevent any sync issues scraping content from the website during a build-triggered re-indexing of the Learn Astropy content.

Link to WIP tutorials

There are often partially-completed tutorials, such as those that have complete functionality but are missing descriptive text, that are still valuable to share with the community. I suggest that these should be linked and marked with a "WIP" tag to increase their visibility and perhaps prevent duplication of effort.

Encode Python package keywords hierarchically

Rather than keeping separate astropy and Python package keywords, we can combine both into a single Python package keyword listing. Further, those keywords can be hierarchical to accommodate a drill-down into Python subpackages. See https://www.algolia.com/doc/api-reference/widgets/hierarchical-menu/react/

This is also related to #14, which is about introspecting Python package keywords from code imports, but I think these can be implemented mostly independently.
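
As a rough sketch of what hierarchical package keywords could look like, the function below expands a dotted module name into the per-level strings that Algolia's hierarchical menu widgets expect (one attribute per level, with each value containing the full path joined by " > "). The function name and the lvl0/lvl1 key naming are assumptions for illustration, not part of Librarian.

```python
from typing import Dict


def hierarchical_levels(module: str, separator: str = " > ") -> Dict[str, str]:
    """Expand a dotted module name into one path string per hierarchy level.

    Hypothetical helper: the ``lvlN`` key names follow a common Algolia
    convention but are an assumption here.
    """
    parts = module.split(".")
    return {
        f"lvl{depth}": separator.join(parts[: depth + 1])
        for depth in range(len(parts))
    }


print(hierarchical_levels("astropy.coordinates"))
# → {'lvl0': 'astropy', 'lvl1': 'astropy > coordinates'}
```

A record that imports astropy.coordinates would then match a search refined to either "astropy" or "astropy > coordinates".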

Break records across prose+code block boundaries and clean Jupyter notebook in/out furniture

To get cleaner snippet highlighting from content that includes code, we might want to break records down to just prose and just code blocks and then add a flag to the record indicating whether it's code or not. This could let us style matches to code blocks using a monospace font, and hopefully even retain line formatting.

As well, we probably want to strip the In/Out furniture that Jupyter adds around code blocks and their outputs.
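
A minimal sketch of this idea, assuming the page content has already been split into ("prose", text) and ("code", text) blocks. The is_code field name and both function names are hypothetical, and the prompt pattern only covers the common "In [n]:" and "Out[n]:" forms.

```python
import re
from typing import Dict, List, Tuple, Union

# Matches Jupyter prompts such as "In [3]:", "In [ ]:", "In [*]:" or "Out[3]:"
# at the start of a line, along with one trailing space or tab.
PROMPT = re.compile(r"^[ \t]*(?:In|Out) ?\[[\d *]*\]:[ \t]?", re.MULTILINE)


def strip_prompts(text: str) -> str:
    """Remove In/Out prompt furniture from the start of each line."""
    return PROMPT.sub("", text)


def make_records(
    blocks: List[Tuple[str, str]]
) -> List[Dict[str, Union[str, bool]]]:
    """Turn (kind, text) blocks into one record per block.

    Hypothetical record shape: ``content`` holds the text and ``is_code``
    flags code blocks so that a search UI can render them in monospace.
    """
    records: List[Dict[str, Union[str, bool]]] = []
    for kind, text in blocks:
        is_code = kind == "code"
        # Prompts are only stripped from code blocks; prose is left as-is.
        content = (strip_prompts(text) if is_code else text).strip()
        if content:
            records.append({"content": content, "is_code": is_code})
    return records
```

Because line breaks inside code blocks are preserved in content, the front end could keep the original line formatting when highlighting a match.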

Metadata standard for guides

In tutorials, we currently extract metadata based on a standardized format. For guides, with only one currently available, there isn't any such standard. Further, there are some metadata types that aren't currently extracted for guides, namely keywords. This is a good opportunity to design a metadata system for guides that is straightforward for authors to use and provides a rich set of metadata that Librarian can extract reliably.

One possibility is to use standard HTML meta headers in the JupyterBook output. Another possibility is to include a JSON sidecar file with each JupyterBook that's linked via a <link> header.
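
To illustrate the first option, here is a minimal sketch that collects meta headers from a guide's HTML using only the Python standard library. The "learn-astropy:" name prefix and the author/keyword field names are invented for this example; no such convention is defined yet.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> tags that share a name prefix."""

    def __init__(self, prefix: str) -> None:
        super().__init__()
        self.prefix = prefix
        self.metadata: Dict[str, List[str]] = {}

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith(self.prefix) and content:
            # Repeated tags (for example, several keywords) accumulate in a list.
            key = name[len(self.prefix):]
            self.metadata.setdefault(key, []).append(content)


def extract_guide_metadata(
    html: str, prefix: str = "learn-astropy:"
) -> Dict[str, List[str]]:
    """Return metadata grouped by field name (hypothetical prefix convention)."""
    collector = MetaTagCollector(prefix)
    collector.feed(html)
    return collector.metadata
```

With this convention, a guide author would add tags such as a meta element named "learn-astropy:keyword" to the page head, and unrelated tags like viewport would be ignored.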

Note a related issue, #14, to automatically capture package keywords. I think that an explicit metadata file could still be useful for types of metadata such as authors and scientific domain/task keywords that may be tricky to extract from the JupyterBook HTML.

Alternate metadata format for tutorials to support third-party sites

The current metadata standard for tutorials involves a specific format in the notebook content. Keywords, authorships, and descriptions are extracted heuristically from the content. For third-party tutorial sites, we can't impose this specific format. Instead, we should have a way for authors to define Learn Astropy metadata without affecting their notebook's content. Besides the schema for this metadata, the main question is the format (notebook metadata? YAML sidecar file that's hosted on the website?).
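
As one illustration of the notebook-metadata option, the sketch below reads a dedicated block from a notebook's top-level metadata, which leaves the notebook's cells untouched. The learn_astropy key and its fields are hypothetical; the schema itself is still the open question here.

```python
import json
from typing import Any, Dict


def read_notebook_metadata(
    notebook_json: str, key: str = "learn_astropy"
) -> Dict[str, Any]:
    """Read Learn Astropy metadata from a notebook's top-level metadata.

    Hypothetical schema: the ``learn_astropy`` key and its fields are
    assumptions made for illustration only.
    """
    notebook = json.loads(notebook_json)
    block = notebook.get("metadata", {}).get(key, {})
    return {
        "title": block.get("title"),
        "authors": list(block.get("authors", [])),
        "keywords": list(block.get("keywords", [])),
        "description": block.get("description"),
    }
```

Missing fields come back as None or empty lists, so a third-party notebook without the block would still parse and could fall back to the heuristic extraction.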
