nbstripout's Issues

Installed filter fails on Windows

Both sys.executable and path.abspath(__file__) give strings with '\' in the path. Git (version 1.9.4), however, only interprets paths with '/' correctly, so adding an ipynb file fails with a "command not found" error. I have patched this locally with a .replace('\\', '/'), but I'm not convinced this is a safe transformation for path and file names on Linux.
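One way to keep the patch safe on Linux, sketched under the assumption that only Windows paths need rewriting: on POSIX systems a backslash is a legal file name character, so a blanket .replace('\\', '/') could corrupt such a name. The helper below (git_friendly_path is a hypothetical name, not part of nbstripout) converts separators only on Windows, using pathlib's PureWindowsPath.as_posix().

```python
import os
from pathlib import PureWindowsPath


def git_friendly_path(p, is_windows=None):
    """Return p with forward slashes on Windows; leave it untouched elsewhere.

    Hypothetical helper for illustration, not nbstripout's own code.
    """
    if is_windows is None:
        is_windows = os.name == "nt"
    if is_windows:
        # PureWindowsPath treats "\" as a separator; as_posix() emits "/".
        return PureWindowsPath(p).as_posix()
    # On Linux/OS X a backslash may be part of a file name, so keep it as-is.
    return p
```

It would be applied to both sys.executable and path.abspath(__file__) before building the filter command.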

ENH: Provide way of marking cells to keep output.

In many of my notebooks I have an initialization cell whose output provides important formatting information (such as MathJax macro definitions). It would be nice to be able to signal nbstripout somehow to keep certain cells.

This should probably use some sort of cell metadata, but perhaps could use some sort of pragma:

# nbstripout: keep

for example.
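A minimal sketch of how such a marker could be honoured, working on the notebook's JSON as a plain dict. Both markers are assumptions for illustration: a keep_output flag in the cell metadata, and the "# nbstripout: keep" pragma at the top of the cell source.

```python
# Assumed markers (illustrative, not a confirmed nbstripout interface):
KEEP_PRAGMA = "# nbstripout: keep"   # pragma on the first line of the source
KEEP_METADATA_KEY = "keep_output"    # boolean flag in cell metadata


def should_keep(cell):
    """Return True if the cell asks for its output to be preserved."""
    if cell.get("metadata", {}).get(KEEP_METADATA_KEY):
        return True
    source = cell.get("source", "")
    if isinstance(source, list):  # notebook JSON stores source as str or list
        source = "".join(source)
    return source.lstrip().startswith(KEEP_PRAGMA)


def strip_outputs(nb):
    """Clear outputs and execution counts of code cells not marked as keep."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code" or should_keep(cell):
            continue
        cell["outputs"] = []
        cell["execution_count"] = None
    return nb
```

Using metadata is the more robust option, since the pragma only works for code cells whose source the author controls.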

Usage vs. Jupyter save hook

What is the advantage of doing this vs. just letting Jupyter do it, as described in the docs?

Is it that one can switch it on per repo, rather than for all Jupyter notebook activity?

If that is the reason, how can one reliably also version-control the Python script version of the notebook when using nbstripout? The trouble I see is this: if I let Jupyter generate the Python script and have nbstripout working as a pre-commit hook, the script creates git diff noise, because the hook runs later, at commit time, not when I save the notebook. So I guess script generation would have to be built into nbstripout somehow for this to work well?

uninstall fails

Version: 0.2.8 from pip
Python 3.5, using the most recent Miniconda, on OS X 10.11.6

(stable) $ nbstripout --uninstall
Traceback (most recent call last):
  File "/Users/klay6683/miniconda3/envs/stable/bin/nbstripout", line 11, in <module>
    sys.exit(main())
  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/nbstripout.py", line 248, in main
    sys.exit(uninstall(args.attributes))
  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/nbstripout.py", line 191, in uninstall
    f.write(''.join(lines))
ValueError: I/O operation on closed file.
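The traceback suggests that uninstall writes the filtered lines after the file object has already been closed; if the file had been opened for writing (and thus truncated) first, that would also leave it empty. A minimal sketch of the read-then-rewrite pattern, with a hypothetical helper name and the filter line as written by --install:

```python
def remove_attribute_line(attrfile, line="*.ipynb filter=nbstripout"):
    """Remove `line` from attrfile, keeping every other line intact.

    Hypothetical helper for illustration, not nbstripout's own uninstall().
    """
    # Read everything while the file is open for reading...
    with open(attrfile) as f:
        kept = [l for l in f if l.strip() != line]
    # ...then reopen for writing and write *inside* this with block.
    with open(attrfile, "w") as f:
        f.write("".join(kept))
```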

Missing trailing newline

Line 170 of nbstripout.py reads f.write('\n*.ipynb filter=nbstripout'). With the context manager on line 169, this appends the filter to .gitattributes without a trailing newline character.

This might be a matter of opinion, but it would be nice to have a trailing newline, especially because this line is appended to the end of the file. Also, since the script includes from __future__ import print_function, this line could be rewritten as print('\n*.ipynb filter=nbstripout', file=f) to accomplish this.
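A slightly stricter variant, sketched here as a hypothetical helper: only add a separating newline when the existing file does not already end with one, and let print() supply the trailing newline. This avoids the blank line that a leading '\n' produces when the file already ends cleanly.

```python
import os


def append_attribute(attrfile, line="*.ipynb filter=nbstripout"):
    """Append `line` so that the file always ends with a newline.

    Hypothetical helper for illustration, not nbstripout's own install().
    """
    needs_separator = False
    if os.path.exists(attrfile) and os.path.getsize(attrfile) > 0:
        # Check whether the current last line is unterminated.
        with open(attrfile, "rb") as f:
            f.seek(-1, os.SEEK_END)
            needs_separator = f.read(1) != b"\n"
    with open(attrfile, "a") as f:
        if needs_separator:
            f.write("\n")
        print(line, file=f)  # print() adds the trailing newline
```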

Creating a branch with notebook outputs stripped

Is there a way of using nbstripout that would allow me to create a branch of cleaned notebooks from a branch that contains notebooks with populated output cells (e.g. ones whose populated output cells can be used for testing with nbval)?

I'm thinking of a private github repo workflow where there is a testing-master branch containing executed notebooks with populated test output cells that begets a release branch containing notebooks that can be zipped and distributed to students.

Presumably, a variant of nbstripout could also be used to add a git filter that would automatically run a notebook when committing it to a repository, to ensure that all its output cells are populated?
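For the release-branch part, one option is to check out the release branch and strip every notebook before committing. The sketch below does this with the standard json module instead of nbstripout itself, writing cleaned copies into a separate directory. It only clears outputs and execution counts, so it is a simplified stand-in rather than nbstripout's full behaviour; strip_tree is a hypothetical name.

```python
import json
import os


def strip_tree(src_dir, dst_dir):
    """Copy every .ipynb under src_dir to dst_dir with code outputs cleared.

    Simplified stand-in for nbstripout: metadata is left untouched.
    """
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if not name.endswith(".ipynb"):
                continue
            src = os.path.join(root, name)
            dst = os.path.join(dst_dir, os.path.relpath(src, src_dir))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(src, encoding="utf-8") as f:
                nb = json.load(f)
            for cell in nb.get("cells", []):
                if cell.get("cell_type") == "code":
                    cell["outputs"] = []
                    cell["execution_count"] = None
            with open(dst, "w", encoding="utf-8") as f:
                json.dump(nb, f, indent=1, ensure_ascii=False)
                f.write("\n")
```

Writing to a separate directory keeps the executed notebooks on the testing branch untouched.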

Add tests

Some simple unit tests to make sure we don't accidentally break stuff.

It doesn't work automatically and shows nothing when running "git diff"

I followed the demo from YouTube, as follows.

(one_shot_face_recognition_env) $ git init
Initialized empty Git repository in /Users/zane/Desktop/test/.git/
(one_shot_face_recognition_env) $ git add Untitled.ipynb
(one_shot_face_recognition_env) $ git commit -a
[master (root-commit) 8bac162] qq
 1 file changed, 51 insertions(+)
 create mode 100644 Untitled.ipynb
(one_shot_face_recognition_env) $ nbstripout --install
(one_shot_face_recognition_env) $ nbstripout --status
nbstripout is installed in repository b'/Users/zane/Desktop/test'

Filter:
  clean = b'"/Users/zane/Desktop/one_shot_face_recognition_env/bin/python3" "/Users/zane/Desktop/one_shot_face_recognition_env/lib/python3.5/site-packages/nbstripout.py"'
  smudge = b'cat'
  required = b'true'
  diff= b'nbstripout -t'

Attributes:
  b'*.ipynb: filter: nbstripout'

Diff Attributes:
  b'*.ipynb: diff: ipynb'
(one_shot_face_recognition_env) $ git add Untitled.ipynb
(one_shot_face_recognition_env) $ git diff --cached

But after running git diff --cached I see nothing; the output is just empty. After running nbstripout --uninstall, git diff --cached works as usual.

It does not strip output until I run nbstripout Untitled.ipynb explicitly.

Support installing to .gitattributes?

Hi, this is a great tool, thanks! As far as I can tell, nbstripout install currently installs to .git/info/attributes. Would it be possible to support installing to .gitattributes with some flag? For instance, nbstripout install --attributes=.gitattributes, or something better. Of course, one can just run mv .git/info/attributes .gitattributes, but supporting both install targets would still be a nice bit of polish. What do you think?

Metadata python version attribute

The metadata["language_info"]["version"] = ... entry causes merge conflicts when collaborators use different Python versions. Since it has no effect on how the notebook works, can we also strip out this attribute?
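A minimal sketch of the requested change on the notebook JSON as a plain dict; strip_language_version is a hypothetical name. It removes only the version key and leaves the rest of language_info (such as the language name) in place.

```python
def strip_language_version(nb):
    """Drop metadata.language_info.version, keeping the rest of language_info.

    Hypothetical helper for illustration, not an existing nbstripout option.
    """
    language_info = nb.get("metadata", {}).get("language_info")
    if isinstance(language_info, dict):
        language_info.pop("version", None)
    return nb
```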

The 0.3.0 version on pip is outdated

I found that the version from pip doesn't have the -t switch, so I had to use the version from git.

$ nbstripout -t my_notebook.ipynb
usage: nbstripout [-h] [--install] [--uninstall] [--is-installed] [--status]
                  [--attributes FILEPATH] [--version] [--force]
                  [files [files ...]]
nbstripout: error: unrecognized arguments: -t
$ pip install --upgrade nbstripout
Requirement already up-to-date: nbstripout in /Users/jinyoung.kim/Package/Conda36/anaconda/envs/py27/lib/python2.7/site-packages
$ nbstripout --version
0.3.0

The clean filter fails if there is whitespace in the paths

The clean filter fails if there is whitespace in the paths to python and/or the nbstripout.py script.

C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py: C:/Program: No such file or directory
error: external filter 'C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py' failed -1
error: external filter 'C:/Program Files/Anaconda3/python.exe C:/Program Files/Anaconda3/lib/site-packages/nbstripout.py' failed
fatal: example.ipynb: clean filter 'nbstripout' failed
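Git hands the clean command to a shell, which splits it on unquoted spaces, so "C:/Program" becomes the program name. Wrapping each path in double quotes keeps it together; the later --status output shown earlier on this page has exactly that quoted form. A sketch with a hypothetical helper name, assuming the paths themselves contain no double-quote characters:

```python
def quoted_filter_command(python, script):
    """Build a clean-filter command whose paths survive embedded spaces.

    Assumption: neither path contains a double-quote character.
    """
    # Each path is quoted so the shell passes it as a single argument.
    return '"{}" "{}"'.format(python, script)
```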

Add stripping for associated .py files

Could you please add a feature for stripping the associated .py files?
In .py files there are lines like
# In[XXX]:
which should be stripped back to
# In[ ]:

It would be immensely helpful to have this feature!

Thanks,
Andrew
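The requested transformation amounts to a regular-expression substitution over the exported script. A minimal sketch, assuming the prompts appear at the start of a line exactly as "# In[N]:"; reset_prompt_numbers is a hypothetical name.

```python
import re

# Matches numbered prompts such as "# In[12]:" at the start of a line.
_PROMPT = re.compile(r"^# In\[\d+\]:", re.MULTILINE)


def reset_prompt_numbers(text):
    """Rewrite every numbered "# In[N]:" prompt to "# In[ ]:"."""
    return _PROMPT.sub("# In[ ]:", text)
```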

`nbstripout --install` fails in Windows Command Prompt

I'm running Enthought Canopy Python on Windows 10, and when running nbstripout --install on a newly created git repo, I'm getting the following error:

Traceback (most recent call last):
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "c:\users\bdforbes\appdata\local\enthought\canopy\user\scripts\nbstripout.exe\__main__.py", line 9, in <module>
  File "c:\users\bdforbes\appdata\local\enthought\canopy\user\lib\site-packages\nbstripout.py", line 254, in main
    sys.exit(install(args.attributes))
  File "c:\users\bdforbes\appdata\local\enthought\canopy\user\lib\site-packages\nbstripout.py", line 160, in install
    git_dir = check_output(['git', 'rev-parse', '--git-dir']).strip()
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Users\bdforbes\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.5.3123.win-x86_64\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

Everything works fine in the MINGW64 git bash terminal that is installed with git on Windows. This may be an issue for people who use the Windows Command Prompt for their git work and would like to use this helpful package.

Reference:

https://stackoverflow.com/a/35670418/336001
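The WindowsError comes from subprocess failing to find the git executable, which suggests git is on PATH inside Git Bash but not in the Command Prompt. A sketch of a clearer failure, using shutil.which to look up git before calling it; git_dir is a hypothetical helper, and the search_path parameter exists only so the lookup can be tested.

```python
import shutil
import subprocess


def git_dir(search_path=None):
    """Return the repository's git dir, failing clearly if git isn't found.

    search_path=None means "use the PATH environment variable".
    """
    git = shutil.which("git", path=search_path)
    if git is None:
        raise RuntimeError(
            "git was not found on PATH; make sure the directory containing "
            "git is on PATH in this shell"
        )
    out = subprocess.check_output([git, "rev-parse", "--git-dir"])
    return out.decode().strip()
```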

nbstripout --uninstall fails

$ nbstripout --uninstall
Traceback (most recent call last):
  File "/lscr_paper/allan/miniconda3/envs/msth/bin/nbstripout", line 11, in <module>
    sys.exit(main())
  File "/lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py", line 248, in main
    sys.exit(uninstall(args.attributes))
  File "/lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py", line 191, in uninstall
    f.write(''.join(lines))
ValueError: I/O operation on closed file.

I had added a few comments to the file, and they disappeared along with the line added by nbstripout --install, so I guess the file was overwritten with a blank file.

The problem happens on Linux running Bash and on OS X running Zsh.

I'm using the Anaconda Python Distribution, Python 3.5.2, inside a conda environment, and I installed using pip install nbstripout.

Only output should be kept for keep_output and init_cell

Currently, nbstripout ignores keep_output and init_cell cells entirely. However, this means that lots of stuff beyond the output stays, such as execution_count. I think keeping this does not meet the design goals, and it can cause Git conflicts.

Feature request: Treat these two types of cells normally, except keep the output.
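A minimal sketch of the requested behaviour: every code cell loses its execution count, and only cells marked keep_output or init_cell keep their outputs. The metadata key names are assumed from the issue title, and strip_cell is a hypothetical name. execute_result outputs carry their own execution_count, so that is cleared too (the notebook format allows null there).

```python
def strip_cell(cell):
    """Strip a code cell, keeping only the outputs of keep_output/init_cell cells.

    Assumed metadata keys: "keep_output" and "init_cell" (boolean flags).
    """
    if cell.get("cell_type") != "code":
        return cell
    meta = cell.get("metadata", {})
    if not (meta.get("keep_output") or meta.get("init_cell")):
        cell["outputs"] = []
    cell["execution_count"] = None
    for output in cell.get("outputs", []):
        # execute_result outputs store their own copy of the execution count.
        if "execution_count" in output:
            output["execution_count"] = None
    return cell
```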

Pip install on Anaconda doesn't work

Running on Anaconda for Windows results in the following:

C:\Anaconda>pip install nbstripoutput
Collecting nbstripoutput
  Could not find a version that satisfies the requirement nbstripoutput (from versions: )
No matching distribution found for nbstripoutput

Any idea how I could get this to work? Thanks!

Stripping other metadata (execute time)

I'm using the (hugely useful) Execute Time nbextension, which produces cell metadata like this:

{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
  "ExecuteTime": {
   "end_time": "2016-08-15T11:49:40.736068",
   "start_time": "2016-08-15T11:49:40.145965"
  },
  "code_folding": [],
  "collapsed": false
 },
 "outputs": [],
 "source": []
}

But naturally I end up with a load of changes seen by git after running nbstripout because of re-running the cells and updating the ExecuteTime field. Would you accept a pull request explicitly stripping this out, or alternatively one which strips all but whitelisted metadata entries?
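The whitelist alternative could look roughly like this; the allowed set is a hypothetical choice that each project would tune, and filter_cell_metadata is an illustrative name rather than an existing option.

```python
# Hypothetical whitelist: which keys are worth keeping is a project decision.
ALLOWED_CELL_METADATA = frozenset({"collapsed", "scrolled"})


def filter_cell_metadata(nb, allowed=ALLOWED_CELL_METADATA):
    """Drop every cell metadata key (e.g. ExecuteTime) not in `allowed`."""
    for cell in nb.get("cells", []):
        metadata = cell.get("metadata", {})
        cell["metadata"] = {k: v for k, v in metadata.items() if k in allowed}
    return nb
```

Applied to the cell above, only "collapsed": false would survive; ExecuteTime and code_folding would be dropped.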

Not working across OS X and Linux?

It doesn't seem to be working across OS X and Linux? The path to Python seems to be hardcoded. Any suggested workarounds will be greatly appreciated :)

I often work on a remote server, sometimes issuing commands in the ssh terminal (Linux, Bash), and at other times in my local terminal (OS X, Zsh), since I also mount the relevant folders using sshfs.

nbstripout seems to work fine on the platform where I ran nbstripout --install, but crashes on the other (and makes commands such as git status and git diff fail). The behavior is the same whether I run the install command on Linux and test on OS X, or vice versa.

Example output (from Linux, install-command on OS X):

$ git status
/Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py: 1: /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py: /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python: not found
error: external filter /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py failed -1
error: external filter /Users/allan/homeInstalled/miniconda3/envs/py35/bin/python /Users/allan/homeInstalled/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout.py failed
fatal: tryLoadData/tryLoadData.ipynb: clean filter 'nbstripout' failed

Example output (from OS X, install-command on Linux):

(py35) ❯ git status
/lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py: /lscr_paper/allan/miniconda3/envs/msth/bin/python: No such file or directory
error: external filter /lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py failed -1
error: external filter /lscr_paper/allan/miniconda3/envs/msth/bin/python /lscr_paper/allan/miniconda3/envs/msth/lib/python3.5/site-packages/nbstripout.py failed
fatal: tryLoadData/tryLoadData.ipynb: clean filter 'nbstripout' failed

Screencast

I've had a go at recording a screencast. Not quite happy with it yet, I think it could be much shorter. Any suggestions welcome!

nbstripout crashes when input is empty

Steps to reproduce

$ cat /dev/null | nbstripout

Expected results

Zero bytes of output; exit code 0.

Actual results

Stack trace; exit code 1. E.g.:

Traceback (most recent call last):
  [...]
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  [...]
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: ''...

Comments

Version 0.3.1 on OS X, installed with Conda.

This makes it harder to recover from bad Git commits, e.g. those caused by #55, because Git won't do anything unless its filters exit successfully. One must disable the filter, untangle things, and then re-enable it.

Thank you!
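A sketch of the expected behaviour: treat empty (or whitespace-only) input as a no-op that emits nothing and succeeds, before any JSON parsing happens. It uses the standard json module instead of nbformat and only clears outputs and execution counts, so it illustrates the guard rather than nbstripout's actual main(); filter_stream is a hypothetical name.

```python
import json
import sys


def filter_stream(stdin=None, stdout=None):
    """Strip a notebook read from stdin; treat empty input as a no-op."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    text = stdin.read()
    if not text.strip():
        return 0  # nothing to clean: emit zero bytes and report success
    nb = json.loads(text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    json.dump(nb, stdout, indent=1, ensure_ascii=False)
    stdout.write("\n")
    return 0
```

Used as a filter, this would be called as sys.exit(filter_stream()).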

Apply retroactively?

Hi, sorry if this is already possible, but it would be great to apply this retroactively to a repository where I have already committed notebook files with output. I would think this would involve some kind of filter-branch wizardry. Is it already possible? Is this a feature you would want added?

encoding issues

Hi,

When stripping a notebook that contains unicode, I get an error if I use nbstripout as a filter piping to a file, e.g. cat test.ipynb | nbstripout > out.ipynb. Git also fails:

  File "/Users/gregor/anaconda/lib/python2.7/site-packages/nbstripout-0.2.2-py2.7.egg/nbstripout.py", line 176, in main
    write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)
  File "/Users/gregor/anaconda/lib/python2.7/site-packages/nbformat/__init__.py", line 169, in write
    fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 20464: ordinal not in range(128)

In that case sys.stdout.encoding is None, so Python 2 falls back to the 'ascii' codec.

As a workaround, I modified the last few lines of nbstripout.py:

        import codecs
        # Wrap stdout so unicode text is encoded as UTF-8 even when piped (Python 2)
        sys.stdout = codecs.getwriter('utf8')(sys.stdout)
        write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)

(following a Stack Overflow answer, though I am not sure this is really a correct solution)
I am on os x, anaconda python 2.7 with nbformat 4.0.1

Gregor

PS thanks for your work, I really like nbstripout install
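The codecs wrapper fits Python 2, where sys.stdout is a byte-oriented file. On Python 3, sys.stdout is already a text stream, and wrapping it this way makes the writer pass bytes to it, which matches the "must be str, not bytes" TypeError reported with Python 3.4 further down. A version-aware sketch (utf8_text_stream is a hypothetical name) that picks the appropriate mechanism:

```python
import codecs
import io


def utf8_text_stream(stream):
    """Return a stream that accepts text and encodes it as UTF-8."""
    if hasattr(stream, "reconfigure"):  # Python 3.7+ text stream
        stream.reconfigure(encoding="utf-8")
        return stream
    if hasattr(stream, "buffer"):  # older Python 3: wrap the byte buffer
        return io.TextIOWrapper(stream.buffer, encoding="utf-8")
    return codecs.getwriter("utf-8")(stream)  # Python 2 byte-oriented file
```

Usage would be sys.stdout = utf8_text_stream(sys.stdout) before writing the stripped notebook.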

git diff includes output when used in combination with nbdime

I'd like to combine nbdime and nbstripout when working with Git repositories: (in some cases) I don't want the output to be committed, but I would still like to use the nicer diff of the input cells provided by nbdime. Simply installing both does not work: nbdime always seems to take over, meaning the output is no longer stripped when nbdime is active.

Maybe there is a simple configuration solution to that, but I am not familiar enough with the Git plugin architecture to come up with any.

I am using nbstripout 0.2.9, nbdime 0.1.0, and Git 2.11.0 on a Mac.

Performance issue

Every time I go into a repo with lots of notebooks, it takes several seconds before I get my prompt back, which can be annoying.
I'm wondering whether the official preprocessor from the notebook tools is faster than your manual filtering?

Programmatically run nbstripout and output new file.

I am trying to run nbstripout programmatically within a Python script and create a new file from the input file (and not in-place). The documentation doesn't have this use case covered. Is there a way I can do it?
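Since nbstripout works as a stdin/stdout filter when given no file arguments (as in cat notebook.ipynb | nbstripout elsewhere on this page), one way to produce a new file without touching the input is to run it through subprocess. A sketch with a hypothetical helper name; the command is a parameter so a different executable path can be supplied.

```python
import subprocess


def strip_to_new_file(src, dst, command=("nbstripout",)):
    """Run nbstripout as a stdin/stdout filter, writing the result to dst.

    src is only read, so the original notebook is left unchanged.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # check=True raises CalledProcessError if the filter fails.
        subprocess.run(list(command), stdin=fin, stdout=fout, check=True)
```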

nbstripout install fails

Am I using it wrong?

(py35)
klay6683 at macd2860 in ~PYTHONPATH/planet4 (master●)
$ nbstripout install
Traceback (most recent call last):
  File "/Users/klay6683/miniconda3/envs/py35/bin/nbstripout", line 9, in <module>
    load_entry_point('nbstripout==0.2.0', 'console_scripts', 'nbstripout')()
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout-0.2.0-py3.5.egg/nbstripout.py", line 127, in main
    sys.exit(install())
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/site-packages/nbstripout-0.2.0-py3.5.egg/nbstripout.py", line 111, in install
    attrfile = path.join(git_dir, 'info', 'attributes')
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/posixpath.py", line 89, in join
    genericpath._check_arg_types('join', a, *p)
  File "/Users/klay6683/miniconda3/envs/py35/lib/python3.5/genericpath.py", line 145, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
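On Python 3, subprocess.check_output returns bytes, so the git dir it reports cannot be joined with the str components 'info' and 'attributes'. A minimal sketch of the fix, using os.fsdecode to turn the output into str (it passes str through unchanged); attributes_path is a hypothetical name.

```python
import os


def attributes_path(raw_git_dir):
    """Build <git dir>/info/attributes from check_output()'s result."""
    # bytes -> str with the filesystem encoding; strip the trailing newline.
    git_dir = os.fsdecode(raw_git_dir).strip()
    return os.path.join(git_dir, "info", "attributes")
```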
(py35)

Consider stripping out only IMAGES as option

Frequently the output is useful as informal "testing" of results,
and there is very little overhead in keeping non-graphical results.
Images, however, add considerable unwanted bulk to a commit
for any version control system (unless those images are very
expensive to reproduce, or are historical for some reason).

Proposal: provide an option to strip out only output cells with images.
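One way the proposed option could decide what to drop: an output counts as an image if its data bundle has any image/* MIME type (such as image/png), and only those outputs are removed, while stream and text results stay. A sketch on the notebook JSON as a plain dict, with hypothetical function names.

```python
def has_image(output):
    """True if an output's data bundle contains any image/* MIME type."""
    return any(key.startswith("image/") for key in output.get("data", {}))


def strip_image_outputs(nb):
    """Remove only image-bearing outputs from code cells; keep the rest."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = [o for o in cell.get("outputs", []) if not has_image(o)]
    return nb
```

An output carrying both a text/plain fallback and an image/png is dropped as a whole here; keeping the text fallback instead would be a different design choice.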

Differences in handling of "collapsed" metadata?

I have something weird going on with the status of the "collapsed": true (and false) lines in a git diff of a .ipynb file.

I did a commit, nbstripout did its work, and I can see that what was committed contains "metadata": { "collapsed": false }, (as seen with git diff HEAD^ as well as on the remote GitLab repo).

I pulled this commit on another computer. Now when I do a git diff, I get this:

    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
    "outputs": [],
    "source": [

So the current state of the file is correct, but Git thinks the repository contains this line as "collapsed": true, even though git diff HEAD^ shows that the committed line contains "false". Running git checkout . does not change anything.

Is it possible that nbstripout's handling is not symmetric in committing/checkout vs. status and diff?

P.S. On both computers I'm using Anaconda (Python 3.5.2), with nbstripout 0.2.9 from conda-forge. Config:

$ nbstripout --status

nbstripout is installed in repository b'/path/to/my/repo'

Filter:
  clean = b'/path/to/anaconda3/bin/python /path/to/anaconda3/lib/python3.5/site-packages/nbstripout.py'
  smudge = b'cat'
  required = b'true'

Attributes:
  b'*.ipynb: filter: nbstripout'

TypeError with python 3.4 and nbformat 4.0.1

$ nbstripout < notebooks/example.ipynb 
Traceback (most recent call last):
  File "/nix/store/n3mk6yx1gcsjxf8y8y3c4pinwdmysqj5-python3.4-nbstripout-0.2.4/bin/.nbstripout-wrapped", line 12, in <module>
    sys.exit(main())
  File "/nix/store/n3mk6yx1gcsjxf8y8y3c4pinwdmysqj5-python3.4-nbstripout-0.2.4/lib/python3.4/site-packages/nbstripout.py", line 184, in main
    write(strip_output(read(sys.stdin, as_version=NO_CONVERT)), sys.stdout)
  File "/nix/store/bimqljy1xyw9j1rfn09dn53avjkzdrzj-python3-3.4.4-env/lib/python3.4/site-packages/nbformat/__init__.py", line 169, in write
    fp.write(s)
  File "/nix/store/bimqljy1xyw9j1rfn09dn53avjkzdrzj-python3-3.4.4-env/lib/python3.4/codecs.py", line 374, in write
    self.stream.write(data)
TypeError: must be str, not bytes

EDIT: Seems like this might be fixed by f0056f5?

Can we have a `--is_installed` check?

I'm losing track of where I have nbstripout in use and where not.
Could we have a little check like:
nbstripout --is_installed
that simply returns "nbstripout is installed in this repo" or "nbstripout is not installed in this repo", respectively?
Would be great, thanks!

Install full path for diff.ipynb.textconv

Currently nbstripout installs the full path for filter.nbstripout.clean, but not for diff.ipynb.textconv. Shouldn't it use the full path to nbstripout.py for both? This is especially relevant on Windows with conda, where nbstripout is not necessarily on the PATH.
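A sketch of what using the full path for both settings could look like: derive both values from the same quoted interpreter and script paths, appending -t for textconv as in the diff entry shown by --status earlier on this page. The helper name is hypothetical; the values would then be applied with git config.

```python
def nbstripout_git_config(python, script):
    """Return git config values that invoke nbstripout via absolute paths.

    Hypothetical helper; -t mirrors the "nbstripout -t" textconv entry.
    """
    # Quote both paths so spaces (e.g. "C:/Program Files/...") survive the shell.
    base = '"{}" "{}"'.format(python, script)
    return {
        "filter.nbstripout.clean": base,
        "diff.ipynb.textconv": base + " -t",
    }
```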

Needs uninstall

Uninstall would be helpful for people just trying things out.

Piping not working on Python 3.5

I'm trying to clean the test notebooks using a pipe, but it doesn't work on Python 3.5. I just get an empty result:

$ cat tests/test_metadata.ipynb | nbstripout 

Running nbstripout tests/test_metadata.ipynb correctly cleans the notebook in place.

With Python 2.7, piping outputs the cleaned notebook as expected. Any ideas what could be wrong?

$ pip freeze
decorator==4.0.11
docopt==0.6.2
docutils==0.13.1
ipython==6.0.0
ipython-genutils==0.2.0
jedi==0.10.2
jsonschema==2.5.1
jupyter-core==4.3.0
nbformat==4.3.0
nbstripout==0.3.0
path.py==10.1
pexpect==4.2.1
pickleshare==0.7.4
prompt-toolkit==1.0.14
ptyprocess==0.5
Pygments==2.2.0
simplegeneric==0.8.1
six==1.10.0
testpath==0.3
traitlets==4.3.2
wcwidth==0.1.6

nbstripout installation causes incorrect 'git status' results

I see changes for notebooks I haven't touched since I installed nbstripout. Interestingly, when I uninstall nbstripout and install it again, the changes are gone. Besides the general performance issue, this prevents us from adopting nbstripout as a team.

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
... list of notebooks I haven't touched ...

(after uninstalling nbstripout and installing it again)

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
