kynan / nbstripout
strip output from Jupyter and IPython notebooks
License: Other
@tbekolay has recently released pytest-cram on PyPI. Should we be using this instead of bundling our own @mforbes?
Both
sys.executable
and
path.abspath(__file__)
give strings with '\' in the path. Git (version 1.9.4), however, only interprets paths with '/' correctly, so when adding an ipynb file one encounters a "command not found" error. I have locally patched this with a
.replace('\\', '/')
but I'm not convinced that this is a safe transformation for path and file names on Linux.
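One way to address the Linux concern is to convert only when the path really is a Windows path. The following is a minimal sketch, not nbstripout's code: to_git_path is a hypothetical helper, and the platform check via sys.platform is an assumption about how one might decide.

```python
import sys
from pathlib import PureWindowsPath


def to_git_path(path, platform=sys.platform):
    """Return `path` with forward slashes on Windows, unchanged elsewhere.

    Hypothetical helper: a backslash is a legal file-name character on
    Linux, so the conversion is only applied to Windows paths.
    """
    if platform.startswith("win"):
        # as_posix() renders the Windows path with '/' separators.
        return PureWindowsPath(path).as_posix()
    return path
```

Calling to_git_path(sys.executable) would then leave paths on Linux and OS X untouched.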
In many of my notebooks I have an initialization cell whose output provides important formatting information (such as MathJax macro definitions). It would be nice to be able to somehow signal nbstripout
to keep certain cells.
This should probably use some sort of cell metadata, but perhaps could use some sort of pragma:
# nbstripout: keep
for example.
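A rough sketch of how the pragma variant could work, operating on the notebook as a plain dict in the nbformat 4 layout. The function name strip_unless_kept and the constant KEEP_PRAGMA are hypothetical and only illustrate the idea; this is not how nbstripout is implemented.

```python
import copy

KEEP_PRAGMA = "# nbstripout: keep"  # marker proposed above; hypothetical


def strip_unless_kept(notebook):
    """Return a copy of the notebook dict with code outputs cleared,
    except for code cells whose source contains KEEP_PRAGMA."""
    nb = copy.deepcopy(notebook)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # on disk, source may be a list of lines
            source = "".join(source)
        if KEEP_PRAGMA in source:
            continue  # leave this cell's output untouched
        cell["outputs"] = []
        cell["execution_count"] = None
    return nb
```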
What is the advantage of doing this vs. just letting Jupyter do it as described in the docs?
Maybe the fact that one can switch it on per repo and not generally for all Jupyter notebook activities?
If that is the reason, how can one reliably implement the concept of also version-controlling the python script version of the notebook using nbstripout
? Because the trouble I'm seeing is that if I only let Jupyter do the python script generation and have nbstripout
working as a pre-commit hook, then the script conversion will suffer from creating git diff noise because the pre-commit hook is run later, not when I save the notebook. So one would have to include the script generation somehow into the nbstripout
tool I guess to have this possibility working well?
v: 0.2.8 from pip
Python 3.5 using most recent miniconda on Mac 10.11.6
(stable) └─❱❱❱ nbstripout --uninstall +5357 16:31 ❰─┘
Traceback (most recent call last):
File "/Users/klay6683/miniconda3/envs/stable/bin/nbstripout", line 11, in <module>
sys.exit(main())
File "/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/nbstripout.py", line 248, in main
sys.exit(uninstall(args.attributes))
File "/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/nbstripout.py", line 191, in uninstall
f.write(''.join(lines))
ValueError: I/O operation on closed file.
Tests are currently failing on Windows due to cram not unifying line endings, see aiiie/cram#9.
Need a MANIFEST.in
.
Line 170 of nbstripout.py reads as f.write('\n*.ipynb filter=nbstripout')
. With the context manager on line 169, this appends the filter to .gitattributes
without a trailing new line character.
This might be a matter of opinion, but it would be nice to have a trailing newline character, especially because this line is being appended to the end of the file. Also, since the script includes from __future__ import print_function
, this line could be rewritten as print('\n*.ipynb filter=nbstripout', file=f)
to accomplish this.
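The difference between the two calls can be seen on any file-like object. In this small illustration the function names are made up for the comparison and the attribute line is the one quoted above.

```python
ATTRIBUTE_LINE = "*.ipynb filter=nbstripout"  # the entry quoted above


def append_with_write(f):
    # write() emits exactly what it is given: no newline after the entry.
    f.write("\n" + ATTRIBUTE_LINE)


def append_with_print(f):
    # print() appends its default end="\n" after the entry.
    print("\n" + ATTRIBUTE_LINE, file=f)
```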
Is there a way of using nbstripout
that would allow me to create a branch of cleaned notebooks from a branch that contains notebooks with populated output cells (e.g. ones with output cells populated that can be used for testing with nbval
)?
I'm thinking of a private github repo workflow where there is a testing-master branch containing executed notebooks with populated test output cells that begets a release branch containing notebooks that can be zipped and distributed to students.
Presumably, a variant of nbstripout
could also be used to add a git filter that would automatically run a notebook when committing it to a repository to ensure that all its output cells are populated?
Some simple unit tests to make sure we don't accidentally break stuff.
In the fall through import case for legacy IPython NO_CONVERT
is not being defined. Its value doesn't matter since it's ignored in that case.
I followed the demo from YouTube as follows.
(one_shot_face_recognition_env) zane@LZANELI-MB1 ~/Desktop/test git init
Initialized empty Git repository in /Users/zane/Desktop/test/.git/
(one_shot_face_recognition_env) zane@LZANELI-MB1 ~/Desktop/test master git add Untitled.ipynb
(one_shot_face_recognition_env) zane@LZANELI-MB1 ~/Desktop/test master ✚ git commit -a
[master (root-commit) 8bac162] qq
1 file changed, 51 insertions(+)
create mode 100644 Untitled.ipynb
(one_shot_face_recognition_env) zane@LZANELI-MB1 ~/Desktop/test master nbstripout --install
(one_shot_face_recognition_env) zane@LZANELI-MB1 ~/Desktop/test master nbstripout --status
nbstripout is installed in repository b'/Users/zane/Desktop/test'
Filter:
clean = b'"/Users/zane/Desktop/one_shot_face_recognition_env/bin/python3" "/Users/zane/Desktop/one_shot_face_recognition_env/lib/python3.5/site-packages/nbstripout.py"'
smudge = b'cat'
required = b'true'
diff= b'nbstripout -t'
Attributes:
b'*.ipynb: filter: nbstripout'
Diff Attributes:
b'*.ipynb: diff: ipynb'
(one_shot_face_recognition_env) zane@LZANELI-MB1 ~/Desktop/test master ● git add Untitled.ipynb
(one_shot_face_recognition_env) zane@LZANELI-MB1 ~/Desktop/test master ✚ git diff --cached
But after running git diff --cached
I cannot see anything; the output is just empty. After running nbstripout --uninstall
, git diff --cached
works fine as usual.
It does not strip output until I run nbstripout Untitled.ipynb
explicitly.
As proved by #18, we should be testing on Windows. I've already set up CI on AppVeyor, however this is currently blocked by tbekolay/pytest-cram#3 because cram defaults to running tests in /bin/sh
which is obviously not available on Windows.
Hi, this is a great tool, thanks! I suppose currently nbstripout install
installs to .git/info/attributes
. Would it be possible to support installing to .gitattributes
with some flag? For instance, nbstripout install --attributes=.gitattributes
or something better. Of course, one can just run mv .git/info/attributes .gitattributes
but still it'd be just a nice little polishing to support both installing options. What do you think?
We currently only store the version in setup.py
, which is not very useful. There should be a switch nbstripout --version
and/or nbstripout version
.
Options for implementing this include:
A version.py module, loaded in setup.py via execfile (would require turning nbstripout into a package)
Any thoughts @mforbes @michaelaye?
The metadata["language_info"]["version"] = ...
causes merge conflicts when using different Python versions. While this has no effect on the workings of the notebook, can we also strip out this attribute?
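A minimal sketch of what stripping this attribute could look like on the notebook dict. strip_language_version is a hypothetical name, not an existing nbstripout function.

```python
def strip_language_version(notebook):
    """Remove metadata.language_info.version in place, if present."""
    language_info = notebook.get("metadata", {}).get("language_info", {})
    language_info.pop("version", None)  # no error if the key is absent
    return notebook
```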
I found that the pip version doesn't have the -t switch, so I had to use the git version.
$ nbstripout -t my_notebook.ipynb
usage: nbstripout [-h] [--install] [--uninstall] [--is-installed] [--status]
[--attributes FILEPATH] [--version] [--force]
[files [files ...]]
nbstripout: error: unrecognized arguments: -t
$ pip install --upgrade nbstripout
Requirement already up-to-date: nbstripout in /Users/jinyoung.kim/Package/Conda36/anaconda/envs/py27/lib/python2.7/site-packages
$ nbstripout --version
0.3.0
The clean filter fails if there is whitespace in the paths to python and/or the nbstripout.py script.
C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py: C:/Program: No such file or directory
error: external filter 'C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py' failed -1
error: external filter 'C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py' failed
fatal: example.ipynb: clean filter 'nbstripout' failed
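The error suggests the command string is split at the space in "Program Files". One possible fix is to quote each path when building the filter command. This is a sketch only: build_filter_command is a hypothetical helper and assumes the paths themselves contain no double quotes.

```python
def build_filter_command(python_path, script_path):
    """Wrap each path in double quotes so that the shell running the
    filter sees exactly two arguments, even if a path contains spaces."""
    return '"{}" "{}"'.format(python_path, script_path)
```

The resulting string would be the value stored in filter.nbstripout.clean.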
When uninstalling nbstripout, only the notebook filter is supposed to be removed, however the entire attributes file is cleared.
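The intended behaviour could be sketched as follows. remove_nbstripout_lines is a hypothetical helper, and matching on the substring "filter=nbstripout" is an assumption about how the entry is written.

```python
def remove_nbstripout_lines(attributes_path):
    """Drop only the lines that reference the nbstripout filter and
    keep every other line (comments, unrelated attributes) as is."""
    with open(attributes_path) as f:
        lines = f.readlines()
    kept = [line for line in lines if "filter=nbstripout" not in line]
    # Write through a fresh handle; writing to an already closed
    # file object raises ValueError.
    with open(attributes_path, "w") as f:
        f.write("".join(kept))
```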
Could you please add a feature for stripping associated py files?
In py files there are lines
# In[XXX]:
which should be stripped back to
# In[ ]:
It would be immensely helpful to have this feature!
Thanks,
Andrew
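For the exported py files, the requested reset could be done with a single regular expression. This is a sketch: reset_prompts and PROMPT_RE are hypothetical names, and the pattern assumes the prompt comment starts at the beginning of a line.

```python
import re

# Matches prompt comments such as "# In[12]:" at the start of a line.
PROMPT_RE = re.compile(r"^# In\[[^\]]*\]:", flags=re.MULTILINE)


def reset_prompts(script_text):
    """Replace every "# In[XXX]:" prompt comment with "# In[ ]:"."""
    return PROMPT_RE.sub("# In[ ]:", script_text)
```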
I'm running Enthought Canopy Python on Windows 10, and when running nbstripout --install
on a newly created git repo, I'm getting the following error:
Traceback (most recent call last):
File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "c:\users\bdforbes\appdata\local\enthought\canopy\user\scripts\nbstripout.exe\__main__.py", line 9, in <module>
File "c:\users\bdforbes\appdata\local\enthought\canopy\user\lib\site-packages\nbstripout.py", line 254, in main
sys.exit(install(args.attributes))
File "c:\users\bdforbes\appdata\local\enthought\canopy\user\lib\site-packages\nbstripout.py", line 160, in install
git_dir = check_output(['git', 'rev-parse', '--git-dir']).strip()
File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 566, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Everything works fine in the MINGW64 git bash terminal that is installed with git on Windows. This may be an issue for people who use the Windows Command Prompt for their git work and would like to use this helpful package.
Reference:
Would be useful to have!
Would be great to have! cc @nehalecky
$ nbstripout --uninstall
Traceback (most recent call last):
File "/lscr_paper/allan/miniconda3/envs/msth/bin/nbstripout", line 11, in <module>
sys.exit(main())
File "/lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py", line 248, in main
sys.exit(uninstall(args.attributes))
File "/lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py", line 191, in uninstall
f.write(''.join(lines))
ValueError: I/O operation on closed file.
I've added a few comments to the file, and they disappeared along with the line added by nbstripout --install
, so I guess the file was overwritten with a blank file.
The problem happens on Linux running Bash and on OS X running Zsh.
I'm using the Anaconda Python Distribution, Python 3.5.2, inside a conda environment, and I installed using pip install nbstripout
.
Currently, nbstripout
ignores keep_output
and init_cell
cells entirely. However, this means that lots of stuff beyond the output stays, such as execution_count
. I think keeping this does not meet the design goals, and it can cause Git conflicts.
Feature request: Treat these two types of cells normally, except keep the output.
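The requested behaviour might look roughly like this for a single cell dict. strip_cell is a hypothetical function, and the assumption here is that keep_output and init_cell are boolean flags in the cell metadata.

```python
def strip_cell(cell):
    """Reset a code cell in place; outputs survive only if the cell
    metadata sets keep_output or init_cell."""
    if cell.get("cell_type") != "code":
        return cell
    metadata = cell.get("metadata", {})
    keep = metadata.get("keep_output", False) or metadata.get("init_cell", False)
    if keep:
        # Keep the outputs, but still reset their execution counters.
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
    else:
        cell["outputs"] = []
    cell["execution_count"] = None
    return cell
```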
Running on Anaconda for windows results in the following:
C:\Anaconda>pip install nbstripoutput
Collecting nbstripoutput
Could not find a version that satisfies the requirement nbstripoutput (from versions: )
No matching distribution found for nbstripoutput
Any idea how I could get this to work? Thanks!
When using IPython widgets, their state is stored in the notebook metadata. This metadata changes when interacting with the widget and should be stripped.
I'm using the (hugely useful) Execute Time
nbextension, which produces cell metadata like this:
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2016-08-15T11:49:40.736068",
"start_time": "2016-08-15T11:49:40.145965"
},
"code_folding": [],
"collapsed": false
},
"outputs": [],
"source": []
}
But naturally I end up with a load of changes seen by git after running nbstripout
because of re-running the cells and updating the ExecuteTime
field. Would you accept a pull request explicitly stripping this out, or alternatively one which strips all but whitelisted metadata entries?
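The whitelist variant could be sketched like this. The contents of ALLOWED_CELL_METADATA are an example chosen for illustration, not a list nbstripout defines, and filter_cell_metadata is a hypothetical name.

```python
# Example whitelist (assumption): every other metadata key is dropped.
ALLOWED_CELL_METADATA = frozenset({"collapsed", "keep_output", "init_cell"})


def filter_cell_metadata(cell, allowed=ALLOWED_CELL_METADATA):
    """Keep only whitelisted keys in the cell's metadata (in place)."""
    metadata = cell.get("metadata", {})
    cell["metadata"] = {key: value for key, value in metadata.items() if key in allowed}
    return cell
```

Applied to the cell shown above, this would leave only the "collapsed" entry and drop both ExecuteTime and code_folding.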
Doesn't seem to be working across OS X and Linux? The path to Python seems to be hardcoded. Any suggested workarounds will be greatly appreciated :)
I often work on a remote server, sometimes issuing commands in the ssh terminal (Linux, Bash), and at other times in my local terminal (OS X, Zsh), since I also mount the relevant folders using sshfs.
nbstripout seems to be working fine on the platform on which I run nbstripout --install
, but crashes on the other (and crashes commands such as git status
and git diff
). The behavior is the same whether I run the install-command on Linux and test on OS X, or vice versa.
Example output (from Linux, install-command on OS X):
$ git status
/Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py: 1: /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py: /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python: not found
error: external filter /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py failed -1
error: external filter /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py failed
fatal: tryLoadData/tryLoadData.ipynb: clean filter 'nbstripout' failed
Example output (from OS X, install-command on Linux):
(py35) ❯ git status
/lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py: /lscr_paper/allan/miniconda3/envs/msth/bin/python: No such file or directory
error: external filter /lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py failed -1
error: external filter /lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py failed
fatal: tryLoadData/tryLoadData.ipynb: clean filter 'nbstripout' failed
Add a CONTRIBUTING.rst
file.
I've had a go at recording a screencast. Not quite happy with it yet, I think it could be much shorter. Any suggestions welcome!
Steps to reproduce
$ cat /dev/null | nbstripout
Expected results
Zero bytes of output; exit code 0.
Actual results
Stack trace; exit code 1. E.g.:
Traceback (most recent call last):
[...]
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
[...]
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: ''...
Comments
Version 0.3.1 on OS X, installed with Conda.
This makes it harder to recover from bad Git commits, e.g. caused by #55, because Git won't do anything unless its filters exit successfully. One must disable the filter, untangle things, and then re-enable it.
Thank you!
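The expected behaviour could be sketched as a guard in front of the JSON parsing. This uses only the standard json module on the raw text; strip_text is a hypothetical function and does not reflect nbstripout's actual read path via nbformat.

```python
import json


def strip_text(text):
    """Return stripped notebook JSON, or '' for empty input."""
    if not text.strip():
        # Nothing to parse: produce zero bytes so the caller can exit 0.
        return ""
    nb = json.loads(text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1)
```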
Hi, sorry if this is already possible, but it would be great to apply this retroactively to a repository where I have already committed notebook files with output. I would think this would involve some kind of filter-branch wizardry. Is it already possible? Is this a feature you would want added?
Hi,
When stripping a notebook that contains unicode I get an error if I use nbstripout as a filter, piping to a file, e.g. cat test.ipynb | nbstripout > out.ipynb
, git also fails
File "/Users/gregor/anaconda/lib/python2.7/site-packages/nbstripout-0.2.2-py2.7.egg/nbstripout.py", line 176, in main
write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)
File "/Users/gregor/anaconda/lib/python2.7/site-packages/nbformat/__init__.py", line 169, in write
fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 20464: ordinal not in range(128)
then sys.stdout.encoding
is None
(which defaults to 'ascii'
codec)
As a workaround, I modified the last few lines of nbstripout.py:
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)
(according to this Stack Overflow entry, but I am not sure if this is really a correct solution)
I am on os x, anaconda python 2.7 with nbformat 4.0.1
Gregor
PS thanks for your work, I really like nbstripout install
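For comparison, on Python 3 a similar effect could be had by wrapping the binary stream explicitly. This is a sketch: utf8_writer is a hypothetical helper, and whether it fits nbstripout's write path is an assumption.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as
    UTF-8, regardless of the locale's default encoding."""
    return io.TextIOWrapper(binary_stream, encoding="utf8")
```

On Python 3 this could be used as utf8_writer(sys.stdout.buffer) in place of sys.stdout when writing the notebook.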
I'd like to combine nbdime and nbstripout when working with Git repositories: (in some cases) I don't want the output to be committed, but I would still like to use the nicer diff of the input cells provided by nbdime. Simply installing both does not work: nbdime always seems to take over meaning that the output isn't stripped out anymore when nbdime is active.
Maybe there is a simple configuration solution to that, but I am not familiar enough with the Git plugin architecture to come up with any.
I am using nbstripout 0.2.9, nbdime 0.1.0, and Git 2.11.0 on a Mac.
Every time I go into a repo with lots of notebooks, it takes several seconds before I get my prompt back, which can be annoying...
I'm wondering if the official PreProcessor of the notebook tools is faster than your manual filtering?
I am trying to run nbstripout programmatically within a Python script and create a new file from the input file (and not in-place). The documentation doesn't have this use case covered. Is there a way I can do it?
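Since nbstripout can act as a filter that reads from stdin and writes to stdout, one option that avoids relying on any internal API is to call the command through subprocess. This is a sketch only: strip_to_new_file is a hypothetical helper and assumes the nbstripout executable is on the PATH.

```python
import subprocess


def strip_to_new_file(src, dst, command=("nbstripout",)):
    """Pipe `src` through the filter command and write the result to
    `dst`. The input file itself is left untouched."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # check=True raises CalledProcessError if the filter fails.
        subprocess.run(list(command), stdin=fin, stdout=fout, check=True)
```

The command parameter is only there so that a different executable path can be passed in.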
Am I using it wrong?
(py35)
klay6683 at macd2860 in ~PYTHONPATH/planet4 (master●)
$ nbstripout install
Traceback (most recent call last):
File "/Users/klay6683/miniconda3/envs/py35/bin/nbstripout", line 9, in <module>
load_entry_point('nbstripout==0.2.0', 'console_scripts', 'nbstripout')()
File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout-0.2.0-py3.5.egg/nbstripout.py", line 127, in main
sys.exit(install())
File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout-0.2.0-py3.5.egg/nbstripout.py", line 111, in install
attrfile = path.join(git_dir, 'info', 'attributes')
File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/posixpath.py", line 89, in join
genericpath._check_arg_types('join', a, *p)
File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/genericpath.py", line 145, in _check_arg_types
raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
(py35)
Frequently the output is useful as informal "testing" of results,
and there is very little overhead in keeping non-graphical results.
Images, however, add considerable unwanted bulk to a commit
for any version control system (unless those images are very
expensive to reproduce, or are historical for some reason).
Proposal: provide an option to strip out only output cells with images.
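A sketch of what such an option could do for a single cell, assuming the nbformat 4 layout in which rich outputs carry a "data" dict keyed by MIME type. strip_image_outputs is a hypothetical name.

```python
def strip_image_outputs(cell):
    """Drop outputs that carry an image MIME type (in place) and keep
    text-only outputs such as streams and plain-text results."""
    def has_image(output):
        return any(mime.startswith("image/") for mime in output.get("data", {}))

    if cell.get("cell_type") == "code":
        cell["outputs"] = [o for o in cell.get("outputs", []) if not has_image(o)]
    return cell
```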
I have something weird going on with the status of the "collapsed": true
(and false
) lines in a git diff of a .ipynb file.
I did a commit, nbstripout did its work, and I can see that what was committed contains "metadata": { "collapsed": false },
(as seen with git diff HEAD^
as well as on the remote Gitlab repo).
I pulled this commit on another computer. Now when I do a git diff
, I get this:
"cell_type": "code",
"execution_count": null,
"metadata": {
- "collapsed": true
+ "collapsed": false
},
"outputs": [],
"source": [
So the current state of the file is correct, but it thinks the repository contains this line as "collapsed": true
, even though when I do git diff HEAD^
I can see that the committed line contains "false". Doing git checkout .
does not change anything.
Is it possible that nbstripout's handling is not symmetric in committing/checkout vs. status and diff?
P.S. On both computers I'm using Anaconda (Python 3.5.2), with nbstripout 0.2.9 from conda-forge. Config:
$ nbstripout --status
nbstripout is installed in repository b'/path/to/my/repo'
Filter:
clean = b'/path/to/anaconda3/bin/python /path/to/anaconda3/lib/python3.5/site-packages/nbstripout.py'
smudge = b'cat'
required = b'true'
Attributes:
b'*.ipynb: filter: nbstripout'
$ nbstripout < notebooks/example.ipynb
Traceback (most recent call last):
File "/nix/store/n3mk6yx1gcsjxf8y8y3c4pinwdmysqj5-python3.4-nbstripout-0.2.4/bin/.nbstripout-wrapped", line 12, in <module>
sys.exit(main())
File "/nix/store/n3mk6yx1gcsjxf8y8y3c4pinwdmysqj5-python3.4-nbstripout-0.2.4/lib/python3.4/site-packages/nbstripout.py", line 184, in main
write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)
File "/nix/store/bimqljy1xyw9j1rfn09dn53avjkzdrzj-python3-3.4.4-env/lib/python3.4/site-packages/nbformat/__init__.py", line 169, in write
fp.write(s)
File "/nix/store/bimqljy1xyw9j1rfn09dn53avjkzdrzj-python3-3.4.4-env/lib/python3.4/codecs.py", line 374, in write
self.stream.write(data)
TypeError: must be str, not bytes
EDIT: Seems like this might be fixed by f0056f5?
I'm losing track of where I have nbstripout in use and where not.
Could we have a little check run like:
nbstripout --is_installed
that simply returns: "nbstripout is installed in this repo" or "nbstripout is not installed in this repo" respectively?
Would be great, thanks!
Currently nbstripout installs the full path for filter.nbstripout.clean but not for diff.ipynb.textconv. Shouldn't it use the full path to nbstripout.py for both? This is especially relevant on Windows with conda, where nbstripout is not necessarily on the PATH.
Uninstall would be helpful for people just trying things out.
I'm trying to clean the test notebooks by using a pipe but it doesn't work on Python 3.5. I just get empty result:
$ cat tests/test_metadata.ipynb | nbstripout
Running nbstripout tests/test_metadata.ipynb
correctly cleans the notebook in place.
I tried with Python 2.7 and then it outputs the normal cleaned notebook when piping. Any ideas what could be wrong?
$ pip freeze
decorator==4.0.11
docopt==0.6.2
docutils==0.13.1
ipython==6.0.0
ipython-genutils==0.2.0
jedi==0.10.2
jsonschema==2.5.1
jupyter-core==4.3.0
nbformat==4.3.0
nbstripout==0.3.0
path.py==10.1
pexpect==4.2.1
pickleshare==0.7.4
prompt-toolkit==1.0.14
ptyprocess==0.5
Pygments==2.2.0
simplegeneric==0.8.1
six==1.10.0
testpath==0.3
traitlets==4.3.2
wcwidth==0.1.6
I seem to see changes for notebooks I haven't touched since I installed nbstripout. Interestingly, when I uninstall nbstripout and install it again, the changes are gone. Besides the general performance issue, this prevents us from adopting nbstripout as a team.
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
... list of notebooks I haven't touched ...
----(after uninstall nbstripout and install it again)---
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean