debiman's Introduction

debiman

Goals

debiman makes (Debian) manpages accessible in a web browser. Its goals are, in order:

  1. completeness: all manpages in Debian should be available.
  2. visually appealing and convenient: reading manpages should be fun, and convenience features (e.g. permalinks, URL redirects, easy navigation) should be available.
  3. speed: manpages should be quick to load, new manpages should be ingested quickly, and the program should run quickly enough to keep development pleasant.

Currently, there is one known bug with regard to completeness (#12).

With regard to speed, debiman can process all manpages of Debian unstable in less than 10 minutes on a modern machine. Incremental updates complete in less than 15 seconds. For more details, see PERFORMANCE.md.

Prerequisites

  • mandoc
  • a local or remote Debian mirror or an apt-cacher-ng running on localhost:3142
  • a number of Go packages (which go get will automatically get for you, see below)
    • pault.ag/go/debian
    • pault.ag/go/archive
    • github.com/golang/protobuf/proto
    • golang.org/x/crypto/openpgp
    • golang.org/x/net/html
    • golang.org/x/sync/errgroup
    • golang.org/x/text

Architecture overview

debiman works in 4 stages:

  1. All Debian packages of all architectures of the specified suites are discovered. The following optimizations are used to reduce the number of packages, and hence the input size/required bandwidth:
    1. packages which do not own any files in /usr/share/man (as per the Contents- archive files) are skipped.
    2. each package is downloaded only for 1 of its architectures, as manpages are architecture-independent.
  2. Man pages and auxiliary files (e.g. content fragment files which are included by a number of manpages) are extracted from the identified Debian packages.
  3. All man pages are rendered into an HTML representation using mandoc(1).
  4. An index file for debiman-auxserver (which serves redirects) is written.
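
The first optimization of stage 1 can be sketched as follows. This is a minimal illustration, not debiman's actual code: it assumes the usual Contents line layout (a path without a leading slash, whitespace, then a comma-separated list of section/package entries), and the function name pkgsWithManpages and the sample data are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// pkgsWithManpages scans Contents-style lines and returns the sorted names
// of all packages that own at least one file below usr/share/man/.
// Hypothetical helper for illustration only.
func pkgsWithManpages(contents string) []string {
	seen := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "usr/share/man/") {
			continue // this file is not a manpage, ignore it
		}
		// The package list is the last whitespace-separated field;
		// the path itself may contain spaces.
		idx := strings.LastIndexAny(line, " \t")
		if idx == -1 {
			continue
		}
		for _, loc := range strings.Split(line[idx+1:], ",") {
			// Drop the leading section, e.g. "admin/cron" becomes "cron".
			if slash := strings.LastIndex(loc, "/"); slash != -1 {
				loc = loc[slash+1:]
			}
			if loc != "" {
				seen[loc] = true
			}
		}
	}
	pkgs := make([]string, 0, len(seen))
	for p := range seen {
		pkgs = append(pkgs, p)
	}
	sort.Strings(pkgs)
	return pkgs
}

func main() {
	// Made-up sample lines in Contents format.
	sample := "usr/bin/crontab\tadmin/cron\n" +
		"usr/share/man/man5/crontab.5.gz\tadmin/cron,admin/bcron-run\n" +
		"usr/share/doc/i3-wm/README\tx11/i3-wm\n"
	fmt.Println(pkgsWithManpages(sample))
}
```

Packages that never appear in the returned list can be skipped without downloading them.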

Each stage runs concurrently (e.g. Contents and Packages files are inspected concurrently), but only one stage runs at a time, e.g. extraction needs to complete before rendering can start.
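
The "concurrent within a stage, sequential across stages" pattern can be sketched like this. The sketch uses sync.WaitGroup from the standard library to stay self-contained; the prerequisites list golang.org/x/sync/errgroup, which would also give error propagation. The runStage helper and the stage bodies are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// runStage applies work to every item concurrently and only returns
// once all items are done, so the next stage never starts early.
func runStage(items []string, work func(string) string) []string {
	out := make([]string, len(items))
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		go func(i int, item string) {
			defer wg.Done()
			out[i] = work(item) // each goroutine writes to its own slot
		}(i, item)
	}
	wg.Wait()
	return out
}

func main() {
	pkgs := []string{"cron", "i3-wm"}
	// Extraction must complete before rendering starts.
	extracted := runStage(pkgs, func(p string) string { return p + ":extracted" })
	rendered := runStage(extracted, func(p string) string { return p + ":rendered" })
	fmt.Println(rendered)
}
```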

Development quick start

Set up Go

Install the latest supported version of Go from https://go.dev/dl. If you prefer to install Go from Debian, ensure you get the same version — if you use Debian stable, you likely need to install from backports.

Install debiman

To download, compile and install debiman to ~/go/bin, run:

go install github.com/Debian/debiman/cmd/...@main

Run debiman

To synchronize Debian testing to ~/man and render a handful of packages, run:

~/go/bin/debiman -serving_dir=~/man -only_render_pkgs=qelectrotech,i3-wm,cron

Test the output

To serve manpages from ~/man on localhost:8089, run:

~/go/bin/debiman-minisrv -serving_dir=~/man

Note that for a production setup, you should not use debiman-minisrv. Instead, refer to the web server example configuration files in example/.

Recompile debiman

To update your debiman installation after making changes to the HTML templates or code in your debiman git working directory, run:

go generate github.com/Debian/debiman/...
go install github.com/Debian/debiman/cmd/...

Synchronizing

For https://manpages.debian.org, we run:

flock /srv/manpages.debian.org/debiman/exclusive.lock \
nice -n 5 \
ionice -n 7 \
debiman \
  -sync_codenames=oldstable,oldstable-backports,stable,stable-backports \
  -sync_suites=testing,unstable,experimental \
  -serving_dir=/srv/manpages.debian.org/www \
  -local_mirror=/srv/mirrors/debian

…resulting in the directories wheezy/, wheezy-backports/, jessie/, jessie-backports/, testing/, unstable/ and experimental/ (respectively).

Note that you will NOT need to change this command line when a new version of Debian is released.

When interrupted, you can just run debiman again with the same options. It will resume where it left off.

If for some reason you notice corruption or other mistakes in some manpages, just delete the directory in which they are placed, then re-run debiman to download and re-process these pages from scratch.

It is safe to run debiman while you are serving from -serving_dir. debiman will swap files atomically using rename(2).
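
The general write-then-rename technique looks roughly like the sketch below. It is not taken from debiman's source; writeAtomically and demo are hypothetical names. The temporary file is created in the destination directory so that the rename stays on one filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomically writes data to a temporary file next to path and then
// renames it over path, so readers see either the old or the new content.
func writeAtomically(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	// Clean up on failure; after a successful rename this Remove fails
	// harmlessly because the temporary name no longer exists.
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo overwrites a file twice in a scratch directory and returns the
// final content.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "serving_dir")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	target := filepath.Join(dir, "crontab.5.en.html")
	for _, v := range []string{"v1", "v2"} {
		if err := writeAtomically(target, []byte(v)); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(target)
	return string(b), err
}

func main() {
	content, err := demo()
	fmt.Println(content, err)
}
```

Note that os.CreateTemp creates files readable only by the owner, so a real deployment would also adjust the file mode before renaming.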

Customization

You can copy the assets/ directory, modify its contents and start debiman with -inject_assets pointed to your directory. Any files whose name does not end in .tmpl are treated as static files and will be placed in -serving_dir (compressed and uncompressed).

There are a few requirements for the templates, so that debiman can re-use rendered manpages (for symlinked manpages):

  1. In assets/manpage.tmpl and assets/manpageerror.tmpl, the string <a class="toclink" is used to find table of content links.
  2. </div>\n</div>\n<div id="footer"> is used to delimit the mandoc output from the rest of the page.
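
To check that a customized template still contains both markers, something like the following sketch could be used. How debiman itself searches for these strings is not shown here; splitAtFooter and countTOCLinks are hypothetical helpers that only demonstrate plain substring matching on the two strings named above.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	tocLinkMarker   = `<a class="toclink"`
	footerDelimiter = "</div>\n</div>\n<div id=\"footer\">"
)

// splitAtFooter returns the part of the page before the footer delimiter.
// ok is false if the delimiter is missing, e.g. because a custom template
// changed the markup around the footer.
func splitAtFooter(page string) (before string, ok bool) {
	before, _, ok = strings.Cut(page, footerDelimiter)
	return before, ok
}

// countTOCLinks reports how many table of contents links the page contains.
func countTOCLinks(page string) int {
	return strings.Count(page, tocLinkMarker)
}

func main() {
	page := `<a class="toclink" href="#NAME">NAME</a>` + "\n" +
		"<div><div>mandoc output\n</div>\n</div>\n<div id=\"footer\">footer</div>"
	before, ok := splitAtFooter(page)
	fmt.Println(ok, countTOCLinks(before))
}
```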

Interesting test cases

crontab(5) is present in multiple Debian versions, multiple languages, multiple sections and multiple conflicting packages. Hence, it showcases all debiman features.

w3m(1) has a Japanese translation which is only present in UTF-8 starting with Debian jessie. It also has a German translation starting with Debian stretch.

qelectrotech(1) has a French translation in 3 different encodings (none specified, ISO8859-1, UTF-8).

mysqld(8) is present in two conflicting packages: mariadb-server-core-10.0 and mysql-server-core-5.6.

Recommended reading

https://wiki.debian.org/RepositoryFormat

URLs

The URL schema which debiman uses is (<suite>/)(<binarypkg>/)<name>(.<section>(.<lang>)). Any part aside from name can be omitted; here are a few examples:

Without suite and binary package:

  1. https://manpages.debian.org/i3
  2. https://manpages.debian.org/i3.fr
  3. https://manpages.debian.org/i3.1
  4. https://manpages.debian.org/i3.1.fr

With binary package:

  1. https://manpages.debian.org/i3-wm/i3
  2. https://manpages.debian.org/i3-wm/i3.fr
  3. https://manpages.debian.org/i3-wm/i3.1
  4. https://manpages.debian.org/i3-wm/i3.1.fr

With suite:

  1. https://manpages.debian.org/testing/i3
  2. https://manpages.debian.org/testing/i3.fr
  3. https://manpages.debian.org/testing/i3.1
  4. https://manpages.debian.org/testing/i3.1.fr

With suite and binary package:

  1. https://manpages.debian.org/testing/i3-wm/i3
  2. https://manpages.debian.org/testing/i3-wm/i3.fr
  3. https://manpages.debian.org/testing/i3-wm/i3.1
  4. https://manpages.debian.org/testing/i3-wm/i3.1.fr

debiman's People

Contributors

532910, anarcat, dependabot[bot], edwardbetts, gsquire, jwilk, jzacsh, kebertxela, pabs3, sevan, stapelberg, vincentbernat

debiman's Issues

Add a print stylesheet

The man page should fill the entire page. Whether there should be any borders or headers should be the user’s preference (via the print dialog).

optimization: tokenize HTML or process textually entirely

Tokenizing shaves off about 1 minute on a 6 minute rendering of Debian unstable.

The code is not entirely straightforward to port due to the HTML-tag-agnostic cross reference detection (e.g. for <i>crontab</i>(5)) which requires us to keep state after all.

If we could improve mandoc’s cross reference detection and id generation, we could probably get away with textually processing the HTML, which has the potential to shave off another 30 seconds.

Provide html anchors for Options

Hello,

Sometimes my usage of manpages is to send someone documentation about a command or specifically an option of a command. It might be nice to have anchors in the options as well to point exactly where somebody wants.

Cheers

table of contents links with spaces don’t work

From the debian-doc mailing list:

One little remark : the links in the table of content are inactives because spaces are not translated to underscores :
<a class="toclink" href="#EXIT%20STATUS" title="EXIT STATUS">EXIT STATUS</a>

And then :
<h1 id="EXIT_STATUS">EXIT STATUS<a class="anchor" href="#EXIT_STATUS">
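
One possible fix is to derive the link target from the same function that produces the heading id, so that the two cannot drift apart. The sketch below only handles spaces, which is the case reported here; mandoc's actual id generation may apply further rules, and headingID and tocLink are hypothetical names.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// headingID mirrors the id seen in the rendered heading above:
// spaces become underscores. Other characters are left untouched.
func headingID(title string) string {
	return strings.ReplaceAll(title, " ", "_")
}

// tocLink builds a table of contents entry that points at headingID(title).
func tocLink(title string) string {
	esc := html.EscapeString(title)
	return fmt.Sprintf(`<a class="toclink" href="#%s" title="%s">%s</a>`,
		headingID(title), esc, esc)
}

func main() {
	fmt.Println(tocLink("EXIT STATUS"))
}
```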

Un-brand templates

Other people might want to make their debiman repositories publicly available (intentionally or unintentionally). The default template should not be Debian-branded, so that there is no confusion about whether such repositories are to be considered official.

Paragraph (code?) not indented as in man

Looking at crontab(5) in my manpage viewer shows what I think are code blocks, indented like so:

will not work as you might expect. And neither will this work

    A=1
    B=2
    C=$A $B

There will not be any subsitution for the defined variables in the last value.

On the web page, however, the block is rendered like all other text:

...
will not work as you might expect. And neither will this work

A=1
B=2
C=$A $B

There will not be any subsitution for the defined variables in the last value.

Browsers:

$ firefox -v
Mozilla Firefox 45.6.0
$ chromium --version
Chromium 55.0.2883.75 built on Debian stretch/sid, running on Debian stretch/sid

Render HTML versions of symlinked manpages instead of symlinking them

…otherwise, the navigation is confusing.

Take for example the crontab(5) manpage, which is present in the cron and bcron-run binary packages. Looking at jessie/cron/crontab.5.en.html, you can see the conflicting packages, but looking at jessie/bcron-run/crontab.5.en.html, you can’t. This is because the latter is a symlink to jessie/bcron/bcrontab.5.en.html, which doesn’t conflict.

We should restructure renderAll so that it first processes regular files, then processes symlinks. That way, we can re-use the content fragment and avoid duplicate rendering cost.

add title= argument to opensearch reference for Firefox

Reported via email:

We should replace

<link rel="search" type="application/opensearchdescription+xml" href="https://manpages.debian.org/opensearch.xml">

with

<link rel="search" title="Debian manpages" type="application/opensearchdescription+xml" href="https://manpages.debian.org/opensearch.xml">

package renames break navigation between different Debian versions

This is easy to reproduce: use the search to reach the Jessie version of the systemd-nspawn manpage and note that it offers no link to the Testing and Unstable versions.
Alternatively, browse the Testing and Unstable manpage repositories: the systemd-nspawn manpages are missing there. The same is true for machinectl.
Might this be caused by the move of these tools to a separate package?

Investigate raw manpage deduplication

Manpages can be identical across different Debian versions. Our corpus of extracted manpages weighs about 2.5 GB. We should investigate whether it’d be worthwhile to de-duplicate content (by using hard links, most likely). If the savings are not substantial, it might be more trouble than it’s worth.
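
A rough way to measure the potential savings, or to perform the de-duplication, is sketched below. It is an exploratory sketch rather than a proposed implementation: it hashes every regular file with SHA-256, reads files fully into memory, and replaces later duplicates with hard links to the first file seen. The names dedupe and demo are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// dedupe walks root and replaces files with identical content by hard
// links to the first file seen with that content. It returns the number
// of files that were replaced.
func dedupe(root string) (int, error) {
	first := make(map[[sha256.Size]byte]string)
	replaced := 0
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories and symlinks
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data)
		orig, ok := first[sum]
		if !ok {
			first[sum] = path
			return nil
		}
		oi, err := os.Stat(orig)
		if err != nil {
			return err
		}
		pi, err := d.Info()
		if err != nil {
			return err
		}
		if os.SameFile(oi, pi) {
			return nil // already hard-linked, nothing to do
		}
		// Link under a temporary name, then rename over the duplicate,
		// so that path always refers to a complete file.
		tmp := path + ".dedup-tmp"
		if err := os.Link(orig, tmp); err != nil {
			return err
		}
		if err := os.Rename(tmp, path); err != nil {
			os.Remove(tmp)
			return err
		}
		replaced++
		return nil
	})
	return replaced, err
}

// demo creates two identical files and one different file in a scratch
// directory and returns how many files dedupe replaced.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "manpages")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	files := map[string]string{
		"jessie-crontab.5":  "same content",
		"stretch-crontab.5": "same content",
		"stretch-i3.1":      "different content",
	}
	for name, content := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o644); err != nil {
			return 0, err
		}
	}
	return dedupe(dir)
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

Hard links only work within one filesystem, and any later in-place modification of one name would affect all linked names, which is one reason the trade-off deserves investigation first.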

launch

This checklist tracks items that need to be fulfilled before we can launch:

  • set up git mirror on alioth: https://anonscm.debian.org/cgit/users/stapelberg/debiman.git/
    • git clone --mirror https://github.com/Debian/debiman/
    • crontab: 7 * * * * (cd /git/users/stapelberg/debiman.git && git fetch --quiet -p origin && git gc --quiet)
  • set up git repository for debian assets on alioth (pending permissions issue): https://anonscm.debian.org/cgit/srv-manpages/debian-assets.git
    • sudo -u manpages git -c http.sslCAInfo=/etc/ssl/ca-debian/ca-certificates.crt clone https://anonscm.debian.org/git/srv-manpages/debian-assets.git /srv/manpages.debian.org/debiman/debian-assets
    • sudo -u manpages git config --local --add http.sslCAInfo /etc/ssl/ca-debian/ca-certificates.crt
  • get mandoc into jessie-backports
  • get go 1.7 into jessie-backports
    • have go 1.7 installed on manziarly (pending: https://rt.debian.org/Ticket/Display.html?id=6538)
      • set up a group-writeable checkout of debiman so that jfs et al. can do maintenance when necessary
        • mkdir -p /srv/manpages.debian.org/debiman/gopath
        • chmod -R g+ws /srv/manpages.debian.org/debiman
        • setfacl -m "default:group::rwx" /srv/manpages.debian.org/debiman
        • add GOPATH environment variable to shell setup
  • verify debiman compiles with go 1.7
  • successfully run debiman
  • touch .nobackup in the output directory, because re-generating is simpler than restoring from backup
  • implement index swapping in debiman-auxserver (possibly triggered by SIGHUP, or use restart for now)
  • change apache configuration to serve both the current manpages.debian.org and the debiman output
  • verify all current URLs are covered by redirects
    • /man/<section>/<name>
    • /man<section>/<name>
    • /<lang>/man<section>/<name>
    • /man/<suite>/<lang>/<section>/<name>
    • /man/<suite>/<section>/<name>
    • /<section>/<name>
    • /man/<lang>/<name>
    • /man/<name>
  • set up debiman-auxserver
  • set up a cronjob to update the output (with injected assets), using flock
  • prepare launch announcement email/blog post
  • get sign-off from jfs and anarcat w.r.t. switching the index.html page and apache config, thereby publicly launching
    • Change <Directory> directive to apply for the entire vhost
    • add ErrorDocument 404 /auxserver/%{REQUEST_URI}?%{QUERY_STRING}
    • remove /srv/manpages.debian.org/www/.htaccess
    • change robots.txt to only Disallow /cgi-bin/
  • update https://wiki.debian.org/manpages.debian.org

After a few days:

  • de-activate the CGI script

Add (lintian?) tests to enforce assumptions / best practices

  • verify that manpages with .pl, .py, … suffixes are not accidentally installed into language sub-directories
  • verify language subdirectories are correctly spelled
  • verify manpages don’t differ between architectures (FHS violation)
  • verify correct encoding

Support man.freebsd.org's syntax

FreeBSD's corresponding service supports passing the section as:

http://man.freebsd.org/printf/3

Would you consider supporting the same URL structure? (If it's unambiguous...) The advantage over http://manpages.debian.org/printf%283%29 is that neither shell-quoting (for the ( )) nor percent-encoding are required, and over http://manpages.debian.org/printf.3 that (a) man pages with dots in their names would work unambiguously (think rsyncd.conf(5)), (b) same URL structure across platforms means fewer things to concern one's brain with.

Thanks for restoring the service!

package for Debian

To make deployment and usage easier, we should create a Debian package for debiman.

optimization: reduce auxserver memory usage

Currently, auxserver uses ≈ 200MB of memory, which likely can be significantly reduced.

Reducing the steady-state memory usage leaves us more memory for actual debiman runs and the filesystem cache.

Optimize webfont loading even more

Currently, we asynchronously load WOFF webfonts unconditionally. Additionally, we should:

  • skip any loads if the fonts are installed locally
  • detect whether the browser supports WOFF2 and load WOFF2 instead

Investigate also processing Debian experimental

At least one user wanted to access a manpage which was only in experimental. Let’s see how much additional processing time/disk space would be necessary to process experimental as well.

Symlinked manpages with ambiguous target

Symlinks which resolve to a file that is not shipped within the same package might be ambiguous if multiple packages ship the resolved file. We should see if the ambiguity can be resolved, or, if that isn’t possible, if we can at least make a deterministic choice and document this limitation.

The following Debian binary packages are currently affected:

  1. mencoder, mplayer-gui (referencing mplayer.1 from either mplayer or mplayer2)
  2. aterm-ml, aterm (referencing urxvt.1 from either rxvt-unicode, rxvt-unicode-256color or rxvt-unicode-lite)
  3. mingw-w64-tools (referencing pkg-config.1 from either pkg-config or pkgconf)
  4. udhcpc, udhcpd, busybox-syslogd (referencing busybox.1 from either busybox or busybox-static)
  5. mysql-client-5.6, mariadb-client-10.0, mariadb-client-10.1 (referencing either mysql-client-core-5.6, mysql-client-core-5.7, mariadb-client-core-10.0 or mariadb-client-core-10.1)
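
If the ambiguity cannot be resolved, a deterministic fallback could be as simple as the sketch below. Picking the lexicographically smallest package name is an arbitrary policy chosen for illustration, not a decision made in this issue; pickTarget is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// pickTarget deterministically chooses one of the packages that ship the
// file a symlink resolves to. The same candidates always yield the same
// choice, regardless of the order in which they were discovered.
func pickTarget(candidates []string) (string, bool) {
	if len(candidates) == 0 {
		return "", false
	}
	sorted := append([]string(nil), candidates...) // do not modify the input
	sort.Strings(sorted)
	return sorted[0], true
}

func main() {
	fmt.Println(pickTarget([]string{"mplayer2", "mplayer"}))
	fmt.Println(pickTarget([]string{"pkgconf", "pkg-config"}))
}
```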

Add Debian contrib

As per hmh:

Maybe you could consider adding the manpages from packages in contrib as
well?  Unlike non-free, the licenses in contrib are all compatible with
the DFSG, so they must not have any license restrictions that would get
in the way...

Implement slave alternative manpages

A large number of packages make manpages available via slave alternatives. See https://codesearch.debian.net/search?q=path%3Adebian%2F+update-alternatives+--install for an upper bound.

This is tricky to implement, because slave alternatives are configured in the postinst maintainer script, i.e. via update-alternatives calls in arbitrary shell script. It would be brittle to parse that shell script without running it, not to mention that its behavior might depend on the package actually being installed.

It seems like the only solution is to actually install the package and then introspect the alternatives database, e.g. /var/lib/dpkg/alternatives/vi.

A few test cases are:

  • vi
  • pg_dump

Table columns not aligned

crontab(5) does not align the table in the description.

What I see:

field allowed values
 
----- --------------
minute 0-59
hour 0-23
day of month 1-31
month 1-12 (or names, see below)
day of week 0-7 (0 or 7 is Sun, or use names)

What I expect to see:

field         allowed values
-----         --------------
minute        0-59
hour          0-23
day of month  1-31
month         1-12 (or names, see below)
day of week   0-7 (0 or 7 is Sun, or use names)

Browsers:

$ firefox -v
Mozilla Firefox 45.6.0
$ chromium --version
Chromium 55.0.2883.75 built on Debian stretch/sid, running on Debian stretch/sid

upstream manpage issues

This issue documents a known limitation: some Debian packages have issues with regards to the manpages they ship.

Manpages in invalid directory

report  binary package     issue
850600  mirmon             manpages in invalid dir (man/pm/man3)
850618  controlaula        manpages in invalid dir (man/py/man1)
850619  plainbox           manpages in invalid dir (man/py/man1)
850621  dhcp-probe         manpages in invalid dir (man/cf/man5)
850623  partclone          manpages in invalid dir (man/dd/man8)
850624  sarg               manpages in invalid dir (man/man1/man1)
850625  libifd-cyberjack6  manpages in invalid dir (man/)

Manpages in wrong directory

report  binary package    issue
TODO    clang-format-3.8  manpages in wrong dir (man/man8 instead of man/man1)

Manpages with invalid names

report  binary package  issue
850641  libqwt-doc      manpages with wrong suffix (.3 inside man1)
850642  libexosip2-dev  manpages with wrong suffix (.3 inside man1)

Dangling symlinks

report  binary package      issue
696277  pyro                manpages are dangling symlinks
696277  pyro-gui            manpages are dangling symlinks
TODO    obexftp             manpages are dangling symlinks
850636  maildrop            manpages are dangling symlinks
850637  ksh                 manpages are dangling symlinks
567093  ocsinventory-agent  manpages are dangling symlinks
850638  ng-cjk              manpages are dangling symlinks
850638  ng-cjk-canna        manpages are dangling symlinks
850638  ng-latin            manpages are dangling symlinks
850639  rust-lldb           manpages are dangling symlinks
850640  socks4-clients      manpages are dangling symlinks
883057  libcuse4bsd-dev     manpages are dangling symlinks
883059  pmud                manpages are dangling symlinks
883063  freebsd-utils       manpages are dangling symlinks

Quality issues

report  binary package     issue
852166  liblapack-doc-man  contains 10000 stub manpages
TODO    deja-dup           ships untranslated help output for dozens of languages
883055  postgresql         duplicate update-alternative calls
