executablebooks / myst-nb

199.0 199.0 79.0 3.03 MB

Parse and execute ipynb files in Sphinx

Home Page: https://myst-nb.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 34.03% CSS 4.71% Jupyter Notebook 61.25%

jupyter-notebooks markdown sphinx sphinx-extension

myst-nb's Issues

Non-notebook execution artifacts

Not sure if this is the right place, but I would like to bring up the question of storing the execution artifacts that aren't outputs, but rather external files/data. Jupyter provides a clear separation between data and outputs, namely reading the outputs isn't possible.

In developing interactive materials, it may be handy to preserve the result of a long-running computation and provide it to users when they spin up a Binder kernel. One cool option is pickling the complete kernel along the lines of this recipe.

Is this within the project scope? If so: how should these artifacts be stored?

Add an ipynb Sphinx parser

Since we have a markdown parser, it should be pretty straightforward to add an ipynb parser, with the caveat that we need to figure out how to handle the cell output -> AST conversion.

It could do something like:

Read in ipynb files with nbformat
Loop through cells. For each cell:
- Create a document section with a "cell" attribute (maybe just using a container class)
- If a runnable code cell:
  - Create a document section for an "input"
  - Pass the contents as a code block
  - Create a document section for an "output"
  - Grab the mimebundle of the output, and decide how to convert this to a docutils node
- If a markdown cell:
  - Parse it with myst_parser and append to doc

So the output AST would be something like:

<document>
    <container cell input>
        <container input>
            <code block of inputs>
        </container>
        <container output>
            <AST version of outputs>
        </container>
    </container>
    <container cell markdown>
        <AST produced by myst_parser>
    </container>
</document>
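
To make the proposed structure concrete, here is a minimal sketch that builds the same nesting from notebook JSON. It uses only the standard library and plain dicts as stand-ins for docutils nodes; the function name `notebook_to_tree` and the dict layout are assumptions for illustration, not existing API.

```python
import json


def notebook_to_tree(nb_json: str) -> dict:
    """Build a nested dict that mirrors the container layout sketched above."""
    nb = json.loads(nb_json)
    document = {"type": "document", "children": []}
    for cell in nb.get("cells", []):
        # In ipynb files "source" may be a string or a list of strings.
        source = "".join(cell.get("source", []))
        container = {
            "type": "container",
            "classes": ["cell", cell["cell_type"]],
            "children": [],
        }
        if cell["cell_type"] == "code":
            container["children"].append({
                "type": "container",
                "classes": ["input"],
                "children": [{"type": "code_block", "text": source}],
            })
            # Keep the raw mimebundles; how to render them is decided later.
            container["children"].append({
                "type": "container",
                "classes": ["output"],
                "children": [
                    {"type": "mimebundle", "data": out.get("data", {})}
                    for out in cell.get("outputs", [])
                ],
            })
        elif cell["cell_type"] == "markdown":
            # Placeholder: a real parser would hand `source` to myst_parser.
            container["children"].append({"type": "markdown", "text": source})
        document["children"].append(container)
    return document
```

In a real parser the dicts would be docutils container nodes and the notebook would be read with nbformat; the open question of converting mimebundles to nodes is left untouched here.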

Explore using jupyter-sphinx

I was looking into jupyter-sphinx and I think it might be useful as a part of our build chain. It specifies directives for parsing and running cells as a doctree transformation, and I think it expects people to use jupyter directives directly, as opposed to using jupyter notebooks as inputs.

Three ways I could imagine using it (off top of my head)

if we end up doing execution within sphinx we could try utilizing this directly. In this case we might want to add some caching somehow
if we keep execution pre-sphinx we could try seeing if jupyter-sphinx would be friendly to allowing a read-only ipynb parser that uses the myst markdown parser under the hood.
if they aren't OK with that, we can take heavy inspiration or reuse components from jupyter sphinx in how we render cells, outputs, etc.

I wonder if @akhmerov has thoughts on that!

Add sphinx integration testing

We could add a full sphinx integration test (like in myst-parser), but I'll save that for a later date

Originally posted by @chrisjsewell in #66 (comment)

How would this use `jupyter-cache`?

Once jupyter-cache is ready for prototyping etc, we should also figure out how to use it as a part of building Sphinx sites with notebooks. Here is one way to do it:

Each notebook will have a unique URI in the cache that is tied to its location on disk. In Sphinx, when we parse a source file we also have the file location of that file. So, when a source file is parsed that also has a key in the Jupyter Cache registry, then instead of pulling cell['outputs'] and inserting it into the cell mimebundle, we could instead grab those outputs from the cache. From then on, everything proceeds as normal. (somewhere around https://github.com/ExecutableBookProject/myst-nb/blob/master/sphinx_notebook/parser.py#L80)

So it would be something like the following (and assuming that the cache had already been run and cached before Sphinx entered the equation)

There's a configuration option like "myst_nb_use_cache"
Upon parsing a file and if the option is True, check in .jupyter_cache to see if there's a URI for the file.
- Maybe do some kind of sanity check to make sure the cache is up-to-date with the source file?
If there is, then when you get to https://github.com/ExecutableBookProject/myst-nb/blob/master/sphinx_notebook/parser.py#L80, instead grab the output corresponding to the current cell from the cache. It'll be returned as a mimebundle. Insert it as if you were doing cell['outputs'].
From then on, everything is the same.
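
A rough sketch of that lookup, under loud assumptions: `cache` here is a plain dict mapping a source path to a list of per-cell outputs, which is only a stand-in for whatever jupyter-cache ends up exposing, and `outputs_for_cell` is a hypothetical helper name.

```python
def outputs_for_cell(cell, cell_index, source_path, cache, use_cache=True):
    """Return cached outputs for a cell when available, else the notebook's own.

    `cache` is a hypothetical mapping: source path -> list of outputs per cell.
    """
    if use_cache:
        cached = cache.get(source_path)
        # Sanity check: only trust the cache if it actually covers this cell.
        if cached is not None and cell_index < len(cached):
            return cached[cell_index]
    # Fall back to whatever is stored in the notebook itself.
    return cell.get("outputs", [])
```

The `use_cache` flag plays the role of the "myst_nb_use_cache" option; a stricter staleness check (for example comparing a hash of the cell source) could replace the simple length check.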

Some confusion around the role / directive name for "glu"

I was just showing off some of the "glue" stuff to collaborators at UC Berkeley, and a piece of feedback that I got was that it was a bit confusing to use the word glue in the Python API (and also being the name of the package), but to use the short-hand glu: for the role and directive. They said it might be easier to use glu long-term when you're comfortable with it, but that glue would be more intuitive for newcomers.

So a few thoughts:

We could register both glu and glue?
We could replace glu with glue?

What do folks think?

How will notebook content be parsed into Sphinx

Aside from the question of what markdown flavor to support within text cells (see ExecutableBookProject/meta#14 and ExecutableBookProject/meta#19 and ExecutableBookProject/meta#18 etc for that), there is the question of what content will actually be read into Sphinx. Here's a place to discuss this.

A starting point for planning:

We assume there is always both a text-based and an ipynb-based version of any content file.
The ipynb file contains all of the outputs, and it is the thing that gets read in to Sphinx
A Sphinx parser loops through the notebook cells, and adds a docutils container around the cell that contains some cell metadata (e.g. tags, and the kind of cell)
For text cells, the Sphinx parser then parses the contents of the cell w/ a markdown parser (maybe an rST parser optionally?)
For code cells, it adds another container for "input" and parses the content as a code block. It then parses the output selectively based on what kind of output is in the mimetype.

How Jupyter Book caches / uses input content

A quick example for illustrative purposes:

The only assumption that Jupyter Book makes about the structure of incoming content is that all content is either

An ipynb file
A file that can be converted to ipynb with Jupytext

Building a Jupyter Book is a two-step process. The first step is to run jupyter-book build, which does the following:

For any text content files, convert them to ipynb and run them top to bottom, then convert to HTML in the cache folder
For any ipynb files, assume they've already been run and convert them to HTML in the cache folder

Then, when the full book is constructed, it only uses the cached HTML versions that are in the cache folder to stitch together the book.

Another approach - riff off of what Jupyter Book does?

As I wrote the above, it got me wondering whether we want to force two versions of the content, or whether we should instead use a caching system like the one described above. The main difference would be that instead of the cache step creating an HTML file (like in Jupyter Book), it would create an ipynb file that Sphinx would know how to parse into its AST when the whole book was built.

Create PyPi release

Need to get the new jupyter-sphinx version released first: jupyter/jupyter-sphinx#106 (and also ideally pydata/pydata-sphinx-theme#99)

Originally posted by @chrisjsewell in executablebooks/cli#22 (comment)

Develop text representation of IPYNB format

Develop a fully defined specification between the machine readable IPYNB and a text based representation. The emphasis will be on using one of the existing representations as much as possible (i.e. Rmarkdown).

How to handle titles?

Notebooks don't have a standard way of specifying a "title", and MyST also currently has no special syntax for titles. This is an issue to figure out how we should handle titles in our rendering.

For MyST

We could infer the title from the first header (I believe this is currently what happens)
We could decide upon a specific syntax that will be treated as a title, e.g.:
```
My title
=====
```
If no headers are found, as a fallback use the filename as a title.

For notebooks

We could adopt the following hierarchy for titles:

Check for ntbk.metadata['title']
If not found, see if the first markdown cell has either
- the "title" token we'd agree to above
- a setext heading.
- If either (in that order), use that as the title
If those aren't found, use the file name
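
The notebook hierarchy above could look roughly like this. It is a sketch only: the notebook is treated as a plain dict, `resolve_title` is a hypothetical name, and the "title token" and the setext heading are collapsed into one regular expression since the proposed token looks like a setext heading.

```python
import re
from pathlib import Path

# A line of text followed by a line of "=" characters (setext-style heading).
SETEXT_H1 = re.compile(r"\A\s*(?P<title>[^\n]+)\n=+[ \t]*(?:\n|\Z)")


def resolve_title(notebook: dict, filename: str) -> str:
    """Pick a title: metadata first, then first markdown cell, then file name."""
    title = notebook.get("metadata", {}).get("title")
    if title:
        return title
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = "".join(cell.get("source", []))
            match = SETEXT_H1.match(source)
            if match:
                return match.group("title").strip()
            break  # only the first markdown cell is inspected
    return Path(filename).stem
```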

How would titles be used in the doctree?

The other challenge is how to insert the titles into the doctree. If the document doesn't have a title, then Sphinx won't add it to the site-wide tree (see #30). So we could manually add the title to the document itself as a one-time operation. I'm not sure how to do it though, and it was a bit unclear from the MyST code. Maybe @chrisjsewell has suggestions?

Figure out how to capture parser warnings and provide more useful feedback

Sphinx provides warnings about errors that it finds on lines etc when it parses pages. However, since we are parsing cell-by-cell, line numbers don't mean anything unless you know the cell they belong to.

We should find a way to get Sphinx to report not only line numbers, but cell numbers in the warning reporting.

It seems that the warning message itself gets raised after the parsing process...so it seems like there's something inside the Sphinx machinery making this happen as opposed to one of our parsers.
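
One possible approach, sketched with the standard library only: if the parser keeps a running line offset per cell, a reported line number can be translated back into a cell index plus a line within that cell. The helper names and the convention that an empty cell occupies one line are assumptions.

```python
from bisect import bisect_right


def build_cell_offsets(cell_sources):
    """Return the 0-based starting line of each cell when cells are concatenated."""
    offsets, total = [], 0
    for source in cell_sources:
        offsets.append(total)
        # Assumption: an empty cell still occupies one line.
        total += len(source.splitlines()) or 1
    return offsets


def locate(line, offsets):
    """Map a 0-based global line number to (cell_index, line_within_cell)."""
    cell_index = bisect_right(offsets, line) - 1
    return cell_index, line - offsets[cell_index]
```

A warning formatter could then report both numbers, e.g. "cell 3, line 2", instead of a bare line number.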

Use jupyter-sphinx directives for markdown/rST pages

Right now the ipynb parser is the only thing that knows how to deal with outputs of running a cell. However, we could replicate similar behavior if we used machinery from jupyter-sphinx to run code and display outputs. I'm not 100% sure how this should happen, but this issue is to track the topic.

In particular,

if we could have code that did the same execution code that jupyter-sphinx uses (https://github.com/jupyter/jupyter-sphinx/blob/master/jupyter_sphinx/execute.py#L76)
but instead of directly inserting outputs we used the same OutputBundleNode that MyST-NB uses: https://github.com/ExecutableBookProject/MyST-NB/blob/master/myst_nb/parser.py#L136
then we could use the same CellOutputToNode transform that we use here: https://github.com/ExecutableBookProject/MyST-NB/blob/master/myst_nb/transform.py#L31

I think it'd be even better if jupyter-sphinx itself defined the CellOutputToNode object, and then Myst-NB could reuse that for both its markdown and ipynb uses. Maybe @akhmerov has thoughts on that

ping @rossbar who I think was interested in this issue

note: this is separate from the "two-way conversion between ipynb and myst nb markdown"...in this case, the markdown will be directly read into Sphinx without being converted to an ipynb file first.

another note: we should consider a future where a jupyter-cache exists as well, as at some point we'll want to leverage that cache to avoid having to run all the code each time if it is in a markdown cell

Rename this repository to MystNB

I was thinking that sphinx-notebook is a really generic name, particularly because there is already both jupyter-sphinx and nbsphinx (and probably lots of other "notebook" extensions out there).

What if we focused this repository around "Jupyter notebooks with MyST" and re-branded it as MystNB? I even mocked up a (very simple, we'd need to tweak it) logo to see how it feels:

A few considerations:

Are we reasonably sure we are sticking with "MyST"? Perhaps in general it could be called "MyST Markdown" as a colloquialism.
Is it reasonable to say this repository is very MyST-specific? The ipynb parser uses MyST to parse, and the text-based representation will also be MySTy...

thoughts? ping @chrisjsewell @jstac @mmcky

Avoid using Document in the parser

Right now we're parsing cells with a "Document" type...but this is probably too high-level because it will check for cell-level metadata etc. We should use something lower-level than this.

Looking at the Document class. It's calling "tokenize" on the list of cell

however, if I call tokenize on the split lines of the cell, I'm getting this error:

Exception occurred:
  File "/c/Users/chold/Dropbox/github/forks/python/ebp/mistletoe/mistletoe/base_renderer.py", line 97, in render
    return self.render_map[token.__class__.__name__](token)
KeyError: 'list'

should figure this out...

Improve integration with sphinx-copybutton

The copy button is inset for code cells:

Also see: https://3-241082514-gh.circle-artifacts.com/0/html/using/api.html

Error accessing `notebook.metadata.language_info.file_extension`

Trying to add a notebook to documentation in executablebooks/jupyter-cache#35, I'm getting this error

(expand for traceback)


  File "//anaconda/envs/mistune/lib/python3.7/site-packages/jupyter_sphinx/execute.py", line 259, in write_notebook_output
    ext = notebook.metadata.language_info.file_extension
  File "//anaconda/envs/mistune/lib/python3.7/site-packages/ipython_genutils/ipstruct.py", line 134, in __getattr__
    raise AttributeError(key)
AttributeError: file_extension

# Sphinx version: 2.4.0
# Python version: 3.7.6 (CPython)
# Docutils version: 0.16 release
# Jinja2 version: 2.11.1
# Last messages:
#   done
#   loading intersphinx inventory from https://docs.python.org/3.7/objects.inv...
#   building [mo]: targets for 0 po files that are out of date
#   building [html]: targets for 4 source files that are out of date
#   updating environment:
#   [new config]
#   4 added, 0 changed, 0 removed
#   reading sources... [ 25%] develop/contributing
#   reading sources... [ 50%] index
#   reading sources... [ 75%] using/api
# Loaded extensions:
#   sphinx.ext.mathjax (2.4.0) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/ext/mathjax.py
#   sphinxcontrib.applehelp (1.0.1) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinxcontrib/applehelp/__init__.py
#   sphinxcontrib.devhelp (1.0.1) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinxcontrib/devhelp/__init__.py
#   sphinxcontrib.htmlhelp (1.0.2) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinxcontrib/htmlhelp/__init__.py
#   sphinxcontrib.serializinghtml (1.1.3) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinxcontrib/serializinghtml/__init__.py
#   sphinxcontrib.qthelp (1.0.2) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinxcontrib/qthelp/__init__.py
#   alabaster (0.7.12) from //anaconda/envs/mistune/lib/python3.7/site-packages/alabaster/__init__.py
#   myst_parser (0.6.0) from //anaconda/envs/mistune/lib/python3.7/site-packages/myst_parser/__init__.py
#   sphinx_togglebutton (0.0.2) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinx_togglebutton/__init__.py
#   jupyter_sphinx (0.2.4a1) from //anaconda/envs/mistune/lib/python3.7/site-packages/jupyter_sphinx/__init__.py
#   myst_nb (0.2.1) from //anaconda/envs/mistune/lib/python3.7/site-packages/myst_nb/__init__.py
#   sphinx.ext.intersphinx (2.4.0) from //anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/ext/intersphinx.py
#   pandas_sphinx_theme (unknown version) from //anaconda/envs/mistune/lib/python3.7/site-packages/pandas_sphinx_theme/__init__.py
Traceback (most recent call last):
File "//anaconda/envs/mistune/lib/python3.7/site-packages/ipython_genutils/ipstruct.py", line 132, in __getattr__
result = self[key]
KeyError: 'file_extension'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/cmd/build.py", line 276, in build_main
app.build(args.force_all, filenames)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/application.py", line 349, in build
self.builder.build_update()
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 300, in build_update
len(to_build))
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 312, in build
updated_docnames = set(self.read())
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 419, in read
self._read_serial(docnames)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 440, in _read_serial
self.read_doc(docname)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/builders/__init__.py", line 480, in read_doc
doctree = read_doc(self.app, self.env, self.env.doc2path(docname))
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/io.py", line 316, in read_doc
pub.publish()
File "//anaconda/envs/mistune/lib/python3.7/site-packages/docutils/core.py", line 218, in publish
self.settings)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/sphinx/io.py", line 130, in read
self.parse()
File "//anaconda/envs/mistune/lib/python3.7/site-packages/docutils/readers/__init__.py", line 77, in parse
self.parser.parse(self.input, document)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/myst_nb/parser.py", line 133, in parse
write_notebook_output(ntbk, str(output_dir), doc_filename)
File "//anaconda/envs/mistune/lib/python3.7/site-packages/jupyter_sphinx/execute.py", line 259, in write_notebook_output
ext = notebook.metadata.language_info.file_extension
File "//anaconda/envs/mistune/lib/python3.7/site-packages/ipython_genutils/ipstruct.py", line 134, in __getattr__
raise AttributeError(key)
AttributeError: file_extension

@AakashGfude notes something similar in
#55 (comment)

Aakash can you add here the full traceback from the log file

Not sure where the actual fault here lies yet?
@akhmerov it also maybe is related to jupyter-sphinx, any ideas?
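
Wherever the fix belongs, the failing attribute access could be made tolerant of notebooks that lack `language_info.file_extension`. Below is a minimal sketch on the plain-dict (JSON) form of a notebook; `get_file_extension` and the `.txt` default are assumptions for illustration, not the behaviour of jupyter-sphinx.

```python
def get_file_extension(notebook: dict, default: str = ".txt") -> str:
    """Read metadata.language_info.file_extension without raising if it is absent."""
    language_info = notebook.get("metadata", {}).get("language_info", {})
    # `default` is an arbitrary fallback chosen for this sketch.
    return language_info.get("file_extension", default)
```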

How to handle notebook-level metadata

In a markdown file, this is pretty straightforward - it's just the --- blocks at the top.

But for an ipynb file, we have a couple options:

Use all keys in the notebook metadata
Use a special key meant for myst-nb
Use a front-matter cell of some kind (raw cell, or check the first markdown cell for YAML?)

Currently we do the first, just wanna make sure it's an intentional decision!
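
For comparison, the first two options differ by only a few lines. This sketch treats the notebook as a plain dict; the function name and the example key `myst_nb` are hypothetical.

```python
from typing import Optional


def document_metadata(notebook: dict, namespace: Optional[str] = None) -> dict:
    """Option 1 (namespace=None): use every notebook metadata key.
    Option 2: use only the keys stored under a dedicated namespace key."""
    metadata = notebook.get("metadata", {})
    if namespace is None:
        return dict(metadata)
    return dict(metadata.get(namespace, {}))
```

Calling `document_metadata(nb, "myst_nb")` would ignore kernel and language information that front-ends write into the notebook, which is the main practical difference from the current behaviour.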

Add new parser for text-based notebooks

Once mwouts/jupytext#458 is merged, and a new jupytext version is released, we can progress with the following:

Also these files won't work directly with the myst-parser extension (because it won't know what to do with code-cell and raw-cell). You need to add a separate parser to myst-nb that calls jupytext.myst.matches_mystnb, to work out if the markdown file should be read as pure markdown or converted to a notebook. (it should just need to be a small subclass of NotebookParser)

@AakashGfude will also need to use it in some manner in #55 to work out which markdown files to convert/execute/cache

Originally posted by @chrisjsewell in #82 (comment)

Defer cell metadata based manipulations to a transform

Currently, there are a number of places in the parser/renderer where we manipulate the docutils AST based on cell metadata

It would probably be better, for future development/extensibility, to handle these in a separate transform, rather than 'hard-baking' them into the parser.
The full cell metadata should also be added to the CellNode, rather than just tags:

https://github.com/ExecutableBookProject/MyST-NB/blob/ab4ba1d0964a7fe0a6cd516143ccc0a472b63570/myst_nb/parser.py#L203-L205
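
As an illustration of the separation being proposed, the parser would only record the full cell metadata, and a later pass would act on it. The sketch below uses plain dicts instead of docutils nodes, and the tag names `remove_cell` and `remove_output` are assumptions chosen for the example.

```python
import copy


def apply_cell_metadata(cells: list) -> list:
    """A separate pass that edits parsed cells based on their stored metadata."""
    result = []
    for cell in cells:
        tags = cell.get("metadata", {}).get("tags", [])
        if "remove_cell" in tags:
            continue  # drop the whole cell
        new_cell = copy.deepcopy(cell)  # leave the parser's output untouched
        if "remove_output" in tags:
            new_cell["outputs"] = []
        result.append(new_cell)
    return result
```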

Figure out a way for Sphinx to deal with mime bundles

Something that we have run into in implementing an IPYNB parser, as well as in thinking through an execution/caching step, is how to deal with the outputs of running Jupyter cells. AKA, how to handle mime bundles.

The way that jupyter-sphinx does this is to hard-code rules that parse the bundle and convert those directly to docutils objects at parse time. This isn't ideal, because this throws away information about the bundle before the rendering step, which is when you'd want that extra "mime-bundle-like" information.

So we should figure out a way to do something like:

Given the output of a Jupyter cell, find a way to either represent this in a docutils node or to represent a "pointer" in docutils that can be used to fetch the outputs later
Instructions for how, at the builder (transform?) step, this mimebundle should be grabbed and then rendered in an opinionated fashion (e.g. HTML renderers would pull JS/HTML if it's in an output, while PDF renderers would just pull static representations of the outputs)
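
The second point could be as simple as a per-builder priority list consulted at render time, with the whole bundle kept around until then. The lists below are illustrative only, and `select_mimetype` is a hypothetical helper.

```python
# Illustrative priorities: richest representation first for each builder.
RENDER_PRIORITY = {
    "html": ["text/html", "image/png", "text/plain"],
    "latex": ["text/latex", "text/plain"],
}


def select_mimetype(bundle: dict, builder: str):
    """Pick the first mimetype the builder supports, or None if nothing fits."""
    for mimetype in RENDER_PRIORITY.get(builder, ["text/plain"]):
        if mimetype in bundle:
            return mimetype, bundle[mimetype]
    return None
```

Because the choice happens at the builder step, the same stored bundle can yield HTML for a web build and plain text for a PDF build.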

A myst-based 'ipynb' document structure

Per a recent conversation with @mmcky and @chrisjsewell , we came up with the proposal for a MyST-based IPYNB structure.

Here is an example notebook with the latest syntactic ideas

---
kernel_info:
    name: python3
language_info:
    name: Python
title: "My notebook title"
comment: "If any of the above aren't specified then use jupyter defaults"
---

# Markdown syntax

## Cell breaks

We can manually break markdown cells quickly with this syntax

+++

### A subsection in another markdown cell

another proposal is to use

+++

### And here would be the other markdown cell...

## Markdown metadata

We can also explicitly separate a markdown cell and configure it like so:

```{markdown} tag1, tag2
---
key: val
---
## Here is some *configured* markdown!
```

And now this would be a third markdown cell

## Executable code

Code is always executed with 'execute' blocks, like so:

```{execute}
print('this would be run by the front-matter-specified, or default, kernel')
```

You can also add metadata to these

```{execute} kernelname
:key: val
:key2: val2
# Or perhaps we want a `metadata`: field for cell metadata, and other keys for options like jupyter-sphinx does
print('some python with cell metadata')
```
and that's it!

influences

Use MyST syntax to define code cells, markdown cells, and their breaks
Don't force extra syntax when it's not needed. E.g., if there's pure markdown between two code cells, treat that as a markdown cell.
Try to use markdown design influence in decisions, the syntax should suggest what it is doing.

constraints

Round-trip conversion

Content within cells

All content within cells, as well as the breaks between cells, should be 100% round-trippable in a lossless fashion. The markup language used in markdown cells and anything inside code cells should not be modified.

Cell-level metadata

All metadata specified in markdown will be converted into the ipynb file. Conversely, a subset of cell-level ipynb metadata will be converted into markdown. TODO: figure out what subset we want...only tags? Other publishing-specific stuff?

Notebook-level metadata

The same rule applies to notebook-level metadata (and we need to figure out the subset of metadata to keep)

Proposed syntax

Notebook-level metadata

A YAML header block at the top of the document will denote notebook-level metadata

example:

---
key: value
---

# My first header

Code cells

Code cells are defined with the "execute" directive, followed by the language that should be used to execute the code. If no language is specified, then a notebook-level metadata should define the default kernel to use. YAML or : configuration at the front of the code cell will convert into cell-level metadata.

example:

```{execute} python
---
key: val
key2: val2
---
print('hi')
```

```{execute} python
:key: val
:key2: val2
print('hi')
```

Markdown cells

Markdown that's in-between code cells will be treated as a single markdown cell that separates those code cells.

If a user wants to attach cell-level metadata to some markdown, then they must use the "markdown" directive. This accepts a list of tags as a short-hand input, and also accepts YAML configuration like code cells.

example:

```{markdown} tag1, tag2
:key1: val1

# This is my markdown
```

Simple markdown cell splits

To define a split between two markdown cells, but without attaching extra metadata to those cells, there is a short-hand one-liner:

+++

This simply defines where one block of markdown content should become two markdown cells. If the author wishes to add extra metadata to one of the markdown cells, they should instead use the

```{markdown}
```

pattern
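
To get a feel for how the proposed syntax would be read, here is a minimal splitter that turns text into a list of cells. It is a sketch only: it handles `+++` breaks and `{execute}` blocks, and deliberately ignores the YAML header, the `{markdown}` directive, and any cell metadata. The name `split_cells` is hypothetical.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def split_cells(text: str) -> list:
    """Split text into markdown and code cells using the proposed markers."""
    cells, buffer, in_code = [], [], False

    def flush(cell_type):
        source = "\n".join(buffer).strip("\n")
        # Skip empty markdown runs, but keep (possibly empty) code cells.
        if source or cell_type == "code":
            cells.append({"cell_type": cell_type, "source": source})
        buffer.clear()

    for line in text.splitlines():
        if in_code:
            if line.strip() == FENCE:
                flush("code")
                in_code = False
            else:
                buffer.append(line)
        elif line.startswith(FENCE + "{execute}"):
            flush("markdown")  # markdown before the block becomes its own cell
            in_code = True
        elif line.strip() == "+++":
            flush("markdown")  # explicit break between two markdown cells
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")
    return cells
```

Pure markdown between two code cells falls out as a markdown cell without any extra syntax, which matches the "don't force extra syntax" influence listed above.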

Support for DOCX

Migrated issue from: executablebooks/jupyter-book#229 (comment)

I've been working on using the new Jupyter Book to export DOCX and LaTeX files along with the standard download option available to get the source files. I've already made some progress, see the link above. Currently, I'm running into the following issue (copy/paste from link above):

I extended the jupyter-book build command to automatically also generate LaTeX (this works already, since it's built into Sphinx) and found a docx extension for sphinx (here: https://docxbuilder.readthedocs.io/en/latest/index.html). I also added docx to the MyST-NB transform.py render priority. So far, it's like this:

WIDGET_VIEW_MIMETYPE = "application/vnd.jupyter.widget-view+json"
RENDER_PRIORITY = {
    "html": [
        WIDGET_VIEW_MIMETYPE,
        "application/javascript",
        "text/html",
        "image/svg+xml",
        "image/png",
        "image/jpeg",
        "text/latex",
        "text/plain",
    ],
    "latex": ["text/latex", "text/plain"],
    # PG: Not sure about this...
    "docx": ["text/plain"]
}
RENDER_PRIORITY["readthedocs"] = RENDER_PRIORITY["html"]

However, now I'm running into errors:

Running Sphinx v2.4.4
Adding copy buttons to code blocks...
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [docx]: pass
updating environment: [config changed ('author')] 30 added, 0 changed, 0 removed
checking for /Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/references.bib in bibtex cache... up to date                                                          
checking for /Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/mdrefs.bib in bibtex cache... up to date                                                     
reading sources... [100%] test_pages/test                                                                                                                                      
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
processing JupyterBook.docx... 
resolving references.../Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/guide/03_build.md:11: WARNING: None:any reference target not found: 02_create
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/guide/04_publish.md:12: WARNING: None:any reference target not found: 03_build

writing... WARNING: Missing refuri :guide/old_docs/features/titles
WARNING: Missing refuri :guide/features/hiding
WARNING: Missing refuri :guide/old_docs/features/interactive_cells
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/markdown.md:: WARNING: Not support remote image files yet
WARNING: Missing refuri :features/features/myst#project-jupyter-proc-scipy-2018
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
WARNING: Missing refuri :features/old_docs/features/layout
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/notebooks.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/features/hiding.ipynb:: WARNING: Ignore unknown node CellNode
WARNING: Missing refuri :features/features/citations#holdgraf-rapid-2016
WARNING: Missing refuri :features/features/citations#holdgraf-evidence-2014
WARNING: Missing refuri :features/features/citations#holdgraf-portable-2017
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:: WARNING: Not support remote image files yet
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/layout_elements.ipynb:1: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/code.ipynb:: WARNING: Ignore unknown node CellNode
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/limits.md:209: WARNING: Not support remote image files yet
/Users/pgierz/Documents/Code/ExecutableBookProject/cli/docs/test_pages/limits.md:212: WARNING: Not support remote image files yet

Exception occurred:
  File "/Users/pgierz/opt/miniconda3/envs/jbook_dev/lib/python3.7/site-packages/docxbuilder/writer.py", line 1196, in visit_Text
    self._doc_stack[-1].add_text(node.astext())
AttributeError: 'Table' object has no attribute 'add_text'
The full traceback has been saved in /var/folders/nn/sdjny2nn5v338x7w6q999yt00000gp/T/sphinx-err-71o5gf09.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!

I guess this comes out of the docx-converter but maybe my changes for MyST-NB aren't complete yet...?

Kernel path settings

Notebooks being executed often require external assets: importing scripts/data/etc. These are prepared by the users.

At the same time, the notebook files will likely be generated during the build process rather than executed by the user directly. Their working directory will be defined by the kernel creation parameters.

I therefore think this is worth discussing: how will users access external assets from their executable code?
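
One possible convention, sketched here only to make the question concrete: resolve relative asset paths against the notebook's source directory instead of the kernel's working directory. This assumes the build tool knows each notebook's source path; `resolve_asset` and the example paths are hypothetical, not part of MyST-NB.

```python
from pathlib import Path


def resolve_asset(notebook_path, asset):
    """Resolve an asset path relative to the notebook's own directory.

    Hypothetical helper: absolute asset paths are returned unchanged.
    """
    asset_path = Path(asset)
    if asset_path.is_absolute():
        return asset_path
    # Anchor relative paths at the folder containing the notebook source,
    # not at whatever working directory the kernel was started in.
    return Path(notebook_path).parent / asset_path
```

An alternative with the same effect would be to start the kernel with its working directory set to the notebook's folder, so user code needs no helper at all.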

always use jupytext to read in markdown files?

I just wanna check my intuition here. Now that jupytext can read myst-markdown, why don't we simply always use jupytext to load in any markdown or ipynb file? If the file is not a "myst-notebook" style, then it'll simply be read in as a single markdown cell in a notebook, and then the parser should just treat it exactly as it would treat a regular .md file. If the file does have myst notebook syntax, then it will properly be read as a notebook. Any reason not to take this approach?
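
To illustrate the fallback described above, here is a minimal sketch of what "read a plain markdown file as a single markdown cell" could look like as a notebook-shaped dict. This is not jupytext's API; `markdown_as_single_cell_notebook` is a hypothetical name, and the dict only mimics the general nbformat 4 layout.

```python
def markdown_as_single_cell_notebook(text):
    """Wrap plain markdown in a minimal nbformat-4-style notebook dict."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # The whole file becomes one markdown cell, so a parser can
            # treat it exactly like a regular .md document.
            {"cell_type": "markdown", "metadata": {}, "source": text},
        ],
    }
```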

How to include glue artifacts in notebooks that aren’t built in the book

Another use case worth documenting: a notebook need not show up in the documentation itself (you signal this to Sphinx by adding "orphan: True" to the notebook metadata; see file-wide-metadata). A separate markdown file can then use these roles/directives to show selected figures/data from one or more such notebooks.

I've talked about this use case before; when you want to use the notebook(s) as a "log" of your data analysis, then have a separate document which is the final "publishable" output.

originally from @chrisjsewell in #77

Figure out why `ipynb_checkpoints` aren't being skipped even if in `exclude_patterns`

Currently Sphinx tries to build anything it finds with ipynb even if it's in a folder that is excluded by exclude_patterns. I'm not sure why...@chrisjsewell any thoughts on why this might be?
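
For reference, this is the kind of `conf.py` entry the issue is about. The `**.ipynb_checkpoints` glob is the form commonly used in Sphinx projects that contain notebooks; whether it avoids the behaviour described here has not been verified.

```python
# conf.py (sketch): patterns are relative to the Sphinx source directory
exclude_patterns = [
    "_build",
    "**.ipynb_checkpoints",  # intended to skip checkpoint folders at any depth
]
```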

Add Sphinx integration testing

Since this is a Sphinx parser, we should add some tests that use Sphinx infrastructure to make sure it behaves as expected. As a start, something like:

A notebook that has all of the syntax we'd expect to see from users
A test suite that builds a site from this notebook, and checks the outputs to make sure they behave as expected (or maybe tests the AST?)

Ideally, we could set this up so that, as functionality grows, we can naturally extend the test suite. So I'd like to not do anything too bespoke and hacky.

@chrisjsewell or @mmcky or @akhmerov do you have any suggestions or template code you can contribute to start us off with a good way to test Sphinx projects?
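
As one possible starting point for the "check the outputs" part, here is a standard-library sketch that counts elements with a given CSS class in built HTML. The class names in the usage note are hypothetical; the real ones depend on how cells end up being rendered, and building the site itself would still need Sphinx's own test tooling.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count elements whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in the HTML string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count
```

A test could then assert, for example, that the page built from the reference notebook contains as many `cell` elements as the notebook has cells.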

Code cell prefixes

As per https://nbsphinx.readthedocs.io/en/0.5.1/code-cells.html#Code-Cells,
it would be nice to have the option of including the cell prefixes [1]: (or in a format specified in the conf.py)

@akhmerov I don't think this is possible with jupyter-sphinx at present?
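
A minimal sketch of how a configurable prefix could be rendered, assuming the format comes from a template string set in conf.py. `format_prompt` and the `{count}` placeholder are hypothetical, not an existing option.

```python
def format_prompt(execution_count, template="[{count}]:"):
    """Render an input prompt; unexecuted cells get a blank count."""
    count = " " if execution_count is None else execution_count
    return template.format(count=count)
```

With the default template this gives `[1]:` for the first executed cell, and a project could switch to something like `In [{count}]:` without code changes.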

More generally, we should probably also be looking at nbsphinx (and talking to mgeiger) and thinking about what the synergies are: does it have other features that we would also want to implement, can we work together or are we essentially competitors??

Brainstorm if / how much of this to contribute up to `jupyter-sphinx`

It seems like there are a few major things that this repository does or will do, that jupyter-sphinx could do:

Refactor the output mimebundle handling to use a post-transform
Adds a myst-based ipynb reader
Will probably add its own directives to provide markdown-based structure of an ipynb file

Is any of that in-scope for jupyter-sphinx? I think the biggest bottleneck is that we currently depend on myst, which is still early-stages. But at least number 1 could be upstreamed.

Rough list of stuff to add

Improvements to package structure (to make it easier to upstream more changes) (jupyter/jupyter-sphinx#103)
Improvements to AST objects and post-transform for docutils nodes (jupyter/jupyter-sphinx#107)
Inline outputs (#24 (comment))
Thebelab improvements so we can re-use here

Curious what @akhmerov thinks about that

Update `myst-parser` version

This: https://github.com/ExecutableBookProject/MyST-NB/blob/51e97a23ed5232d558b16042ab65b2b667a33235/myst_nb/parser.py#L78

will now be:

        document = Document.read(cell["source"], front_matter=False)

When you update the dependency to myst-parser~=0.4.1. See: https://github.com/ExecutableBookProject/MyST-Parser/blob/786d99bba3c81fe86ca81ddeddb721b4a1fc3473/myst_parser/block_tokens.py#L44

Nothing else should have changed.

Use `glue` as a placeholder for `glu:any`?

As I was typing the docs in #77 , I found it a bit tedious to keep typing glu:any. I wonder if we could use the full word glue for this particular use-case (and keep glu:any for more explicit usage).

So, for any output-specific command, use glu:text, glu:figure, etc. For generic outputs, use glue or glu:any.

What do folks think?

Deal with markdown outputs of cells

We don't currently handle the case where cells output markdown content. We should do so!

Would we want the markdown to become part of the top-level doctree, or would we want it to remain inside the CellOutputNode container it is a part of?

originally mentioned in #30

Fix line number / cell index reporting

So that sphinx will report errors/warnings referring to correct place in the notebook

circleci installs aren't working

Currently the circleci builds are failing with an installation error (missing dependencies it seems). I think this is probably because I'm unfamiliar with the "extras" install pattern that this repository is using (which I pulled from myst_parser). Maybe @chrisjsewell can advise there.

Once we get the docs building it'll be easier to start sharing how different visualization changes etc work.

Add the ability to execute notebooks

In addition to full jupyter-cache support, we should also probably allow for lightweight execution of notebooks at build time, so that users don't have to embed outputs with their notebooks if they wish for them to be in the docs.

We could do this with nbclient, I think the question is how to handle things like "known exceptions" etc.

A simple rubric would be to follow what nbsphinx does: execute a notebook top-to-bottom only if none of its cells has any outputs.
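
The "no outputs in any cell" check could be sketched as below, operating on a notebook loaded as a plain dict (for example via `json.load`). This approximates the rubric as described in this issue, not nbsphinx's actual code; `needs_execution` is a hypothetical name.

```python
def needs_execution(notebook):
    """Return True if no code cell in the notebook dict has stored outputs."""
    code_cells = [
        cell for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    if not code_cells:
        return False  # nothing to run
    # Execute only when every code cell is output-free.
    return all(not cell.get("outputs") for cell in code_cells)
```

When this returns True, the notebook could be handed to nbclient for execution, which is where the open question about "known exceptions" would come in.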

Writing the Sphinx AST to HTML

We discussed the best way to write the Sphinx AST to both HTML and PDF. It sounds like going to PDF will require going through LaTeX, while HTML will use Sphinx's HTML templating engine. This issue is to keep track of the HTML side of this.

HTML

Would require two things:

A slight extension of the Sphinx HTML writer (https://github.com/sphinx-doc/sphinx/blob/master/sphinx/writers/html.py#L68) to include relevant notebook metadata (e.g., divs for cells, cell tags, etc)...maybe this could be done with the container directive?
A custom theme in Sphinx that knows how to make sense of these extra tags, html structure, etc.

myst and html comments

Pull request #94 shows an issue I'm having with an html comment that is ok in nbsphinx but requires a keyboard interrupt to quit out of myst-nb

Plotly example seems broken

I see

On this page: https://myst-nb.readthedocs.io/en/latest/use/interactive.html

Consider using a sphinx domain for paste roles/directives

With the paste role/directive introduced in b46c2f1 we also have the opportunity to exert a lot of control over the output. I could envisage creating a sphinx domain, e.g.

```{paste:literal} key1
```

```{paste:figure} key2
:name: label

A caption...
```

```{paste:subfig}
- key3
- key4
```

Originally posted by @chrisjsewell in #66 (comment)

Section headings in markdown render nodes outside of the InputCell nodes

I just noticed that we have a bug in how the rendered docutils nodes are put (or not) inside of the CellInputNode that should contain them. I think I know what the problem is.

For each markdown cell, we create a CellInputNode and set it as the current node. Then we render the markdown content inside. What we want is for whatever nodes are spit out to be put inside the CellInputNode (as children). This usually works for most markdown. However, if there are section headers, or other syntax that should trigger a new section, then the render function creates a new section and starts rendering outside of the CellInputNode. I think this is where that happens for headers:

https://github.com/ExecutableBookProject/myst_parser/blob/develop/myst_parser/docutils_renderer.py#L259

@chrisjsewell can you think of a way that we can ensure the renderer always places objects inside the CellInputNode?

Checkout sphinxtr

I just came across sphinxtr, and the sphinx-notebook fork,
which has some nice features, like subfigures (see https://github.com/jterrace/sphinxtr#changes)

Look at SoS notebooks

These are interesting: https://vatlab.github.io/sos-docs/doc/user_guide/multi_kernel_notebook.html
It means MyST-NB could support multiple languages in a single file, without having to split them up into separate notebooks (like jupyter-sphinx does)

BUG: exception encountered when building a lecture from Quantecon

I created a Sphinx project with myst_nb and myst_parser as dependencies.
When I used the example notebook from the documentation as a source file (https://github.com/ExecutableBookProject/MyST-NB/blob/master/docs/notebooks.ipynb), it worked fine for the html target. But when I used the ipynb file of the numpy lecture from the Quantecon lectures site (https://python.quantecon.org/_downloads/ipynb/numpy.ipynb) for conversion to html, I encountered the following exception:

The full traceback is in this log file:

sphinx-err-vmqywraz.log

Use better variables in the initial glue example

The initial glue example uses my_variable too much (both for the key and the val)...we should make this less confusing for people

Figure out how to keep a consistent Mistletoe AST before the docutils render

In executablebooks/MyST-Parser#113 we realized that there would be some unexpected behavior as a result of parsing each cell as a separate "document". e.g., footnote definitions would not be collected across multiple cells.

We should figure out how to get around this

@mmcky mentioned this would be particularly problematic for him!

Make CellNodes use directives?

I was just thinking through the CellNode code and how this would work if somebody was writing an .imd file instead of an ipynb file. If it is an imd file, then we'd have people writing directives to denote cells, like:

```{execute}
print('hi')
```
```{markdown} hide_input
# Some markdown
```

In that case we'd be calling an execute directive and a markdown directive, respectively, that I guess would themselves add a CellNode and the correct sub-nodes to the doctree.

Perhaps this pattern (using a directive to add the nodes to the tree) could also be used in parsing ipynb files, and that way there is only one canonical way to parse-and-add executable or markdown content.

IPYNB parsing: auto-register scrapbook scraps somehow

An idea I just had when thinking about how we can reference content from one notebook to another - what if as a part of the Sphinx parser we automatically generated docutils targets from any scraps that are in a notebook during the parsing process. That way references to those targets could already exist and users could reference them with a role, similar to what we do with :ref:. I'm imagining something like:

In a notebook

Somebody writes an analysis that generates a plot they'd like to include elsewhere in their docs. They run scrapbook.glue('myplot')

In a MyST document

Somebody wants to include that "scrap" from the notebook. They only need to do something like

{scrap}`myplot`

and Sphinx just looks up myplot against a list of scraps that it has found across any parsed notebook.
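
The lookup side of this idea could be sketched as a registry that the parser fills while reading notebooks and that a role later queries by key. `ScrapRegistry` and its methods are hypothetical; how scraps are actually extracted from notebook outputs is left open here.

```python
class ScrapRegistry:
    """Map scrap keys to the notebook and output they came from."""

    def __init__(self):
        self._scraps = {}

    def register(self, key, notebook, output):
        # Keys must be unique across all parsed notebooks, otherwise a
        # reference such as {scrap}`myplot` would be ambiguous.
        if key in self._scraps:
            first = self._scraps[key][0]
            raise ValueError(f"scrap {key!r} already defined in {first}")
        self._scraps[key] = (notebook, output)

    def lookup(self, key):
        """Return (notebook, output) for a key; raises KeyError if unknown."""
        return self._scraps[key]
```

Raising on duplicate keys is one design choice; a warning that names both notebooks would be a softer alternative.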

Add some instruction on modifying CSS styling in docs

Somewhere in the documentation, we should probably also add some instruction on modifying CSS styling

Originally posted by @chrisjsewell in #73

Rendering each markdown cell causes incorrect section headers

I just discovered a bug (from a user's perspective) that is technically a feature in rST :-P

I just noticed that the top-level markdown header of each cell in the notebook is being treated as an H1 header for the whole document, even if the header text is technically ##.

e.g. see how Sphinx thinks there are multiple page titles for the single notebook document that is here: https://sphinx-jupyter-notebook.readthedocs.io/en/latest/

I think this is because of Sphinx deciding on page titles etc based on the order in which headers happen, so if we parse a cell with only a single ## header, Sphinx will treat that as a document-level header.

e.g., note how the Sphinx document output is the same whether the two headers have different numbers of # symbols.

So from a docutils perspective, we need to make sure that cells that are sub-sections are nested properly. Even though notebooks have a single top-level hierarchy, docutils doesn't.

@chrisjsewell does that sound correct to you? Any ideas on the right way to do this?
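
To make the nesting requirement concrete, here is a sketch that tracks heading depth across cell boundaries and builds a nested section tree, so that a cell starting with `##` lands under the most recent `#` from an earlier cell. It is a plain-Python illustration, not docutils code: `nest_headings` is a hypothetical name, and it only recognises `#`-style headings (it would also misread `#` lines inside fenced code).

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def nest_headings(cells):
    """Build a nested section tree from the markdown sources of successive cells."""
    root = {"title": None, "level": 0, "children": []}
    stack = [root]  # currently open sections, outermost first
    for source in cells:
        for line in source.splitlines():
            match = HEADING.match(line)
            if not match:
                continue
            level = len(match.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1]["level"] >= level:
                stack.pop()
            section = {"title": match.group(2).strip(), "level": level, "children": []}
            stack[-1]["children"].append(section)
            stack.append(section)
    return root
```

The key point is that the stack persists between cells, which is exactly the state that is lost when each cell is rendered as its own document.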

Code Cell Output Improvements (labels, captions, ...)

We would like to, at a minimum, allow code outputs such as plots and tables to
have a label and caption, and then also, e.g., the other options allowed by docutils figures.

Using pasting and gluing (see b46c2f1 and #70) actually gives us a lot more flexibility in this respect, to add these options with markdown cell text.
It is basically an implementation of the Model-View-Controller pattern that I have banged on about before, so I would rather use this approach.

But for 'simpler' use cases we may also want to allow for these to be set in the metadata of the code cell, under a suitable key name:

{
  "output_format": {
    "label": "fig:myfig",
    "caption": "blah blah blah",
    "width": 400
  }
}
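
Reading such metadata could look like the sketch below, which picks the figure-related options out of a code cell's metadata dict. The `output_format` key follows the example above; `figure_options` and the list of allowed options are assumptions for illustration.

```python
def figure_options(cell_metadata, key="output_format"):
    """Extract label/caption/width options from a code cell's metadata dict."""
    allowed = ("label", "caption", "width")
    fmt = cell_metadata.get(key, {})
    # Ignore unknown entries so that typos do not leak into the output node.
    return {name: fmt[name] for name in allowed if name in fmt}
```

Cells without the key simply yield an empty dict, so the default rendering stays unchanged.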
