Comments (29)
This issue arises in git generally because of how internal and external diff tools are handled. As explored here (for a similar case but with meld
rather than nbdime
as the external diff driver) external diff drivers compare the working tree file with a smudged repo file, while internal diff drivers compare the cleaned working tree file with the repo file. From that thread, it doesn't appear to be currently possible to override this in git.
A flag in nbdiff
would be great, but as a stopgap I'm wrapping nbstripout
around git-nbdiffdriver
and then using the script as the new diff driver:
#!/bin/bash
# pass arguments through to git-nbdiffdriver, but pass
# a stripped version of the working tree file
git-nbdiffdriver diff $1 $2 $3 $4 <(cat $1 | nbstripout) $6 $7
Making this file executable, adding it to the path and then adding it as a new diff driver will do the trick.
from nbstripout.
I repeated the steps from my comment above using nbdime 1.0.5 and nbstripout 0.3.4 and the difference is that nbdiff
now works straight out of the box. git diff
and nbdiff
still do show different things which is slightly confusing. For me atm, this is good enough.
from nbstripout.
Thanks for all the thoughts on this, did anyone find a solution that means git diff
works out the box as expected with both nbdime
and nbstripout
installed?
from nbstripout.
Can you be a bit more specific how you configured nbdime
and nbstripout
?
I haven't used nbdime in anger yet but did a simple test and found them to interact without any issue:
git init foo && cd foo
nbstripout --install
git-nbdiffdriver config --enable # without --global
git-nbmergedriver config --enable # without --global
# Create notebook foo.ipynb
git add foo.ipynb
git diff --cached # nbstripout kicking in, output stripped as expected
git commit -m init
# Make some changes to the notebook
git diff # nbstripout kicking in, output stripped as expected
git difftool --tool nbdime # diff shown in nbdime format, output *not* stripped
Note that in this setup nbstripout
's configuration is written to .git/info/attributes
whereas nbdime
's configuration is written to .gitattributes
.
This is the behavior I'd expect. Do you get / expect something different?
from nbstripout.
I've also tried written both configs to .gitattributes
and they don't seem to conflict. That's what I expect since nbdime
installs a diff and merge driver whereas nbstripout
installs a clean and smudge filter.
from nbstripout.
Sorry for being not precise enough here. The behaviour you are describing is what I'd expect, but not what I see. Your git diff # nbstripout kicking in, output stripped as expected
does use nbdime I assume?
For me nbstripout
isn't working anymore as soon as nbdime
is active. git diff
shows changes in a notebook using nbdime
, but outputs aren't stripped out. git add
even puts the output onto the staging area -- and git commit
commits them eventually. So that's not what I want.
As this is working for you, apparently it's a config issue on my side. I cannot spot any particularities in my setup though. It's purely repo local with gitattributes
looking like this
*.ipynb filter=nbstripout
*.ipynb diff=jupyternotebook
and .git/config
looking like this
[...]
[filter "nbstripout"]
clean = ~/bin/miniconda3/envs/dev/bin/python ~/bin/miniconda3/envs/dev/lib/python3.5/site-packages/nbstripout.py
smudge = cat
required = true
[diff "jupyternotebook"]
command = git-nbdiffdriver diff
Anyway, this being a config issue, feel free to close.
from nbstripout.
Your
git diff # nbstripout kicking in, output stripped as expected
does use nbdime I assume?
Aha, I had forgotten that I have an alias for git diff --no-ext-diff
which I normally use (and which shows a "normal" unified diff in this case). A regular git diff
does indeed use nbdime
.
Regardless, git add
should still respect the nbstripout
filter and in your commit you should not get the output. Can you confirm this?
To summarise:
- use
git diff
to get annbdime
diff with output - use
git diff --no-ext-diff
to get a diff with the output stripped bynbstripout
So what you really want is an nbdime
diff with the output stripped by nbstripout
?
from nbstripout.
So what you really want is an nbdime diff with the output stripped by nbstripout?
Yes, that's what I would like to have. It's nice to be able to use them side by side, but in the best case one could use them at the same time.
Regardless, git add should still respect the nbstripout filter and in your commit you should not get the output. Can you confirm this?
Yes I can confirm that, I just tried it. Slightly weird caveat though is that as soon as you have staged some changes, git diff --cached
will use nbdime
and show the changes including the output. This might be a Git particularity, but it's really confusing, because you would expect that whatever is staged (= whatever is shown by git diff --cached
) would be commited. But that's not the case in this setup.
from nbstripout.
OK, I'm not sure how to best get nbdime
and nbstripout
to cooperate nicely. Maybe the nbdime devs have suggestions? @vidartf @martinal @minrk
you would expect that whatever is staged (= whatever is shown by
git diff --cached
) would be commited
Does git diff --no-ext-diff --cached
not work in this case?
from nbstripout.
I'd have to look into the relationship of filters and drivers when both are active. nbdiff commands should get a flag for ignoring outputs, metadata, etc. nbshow has such flags, but nbdiff does not.
from nbstripout.
A flag in nbdime is now available (v. 0.3.0) (either use -s
/--sources
to only show source diff, or use -O
/--ignore-outputs
to exclude outputs from diff display).
from nbstripout.
Thanks, vidartf! However, I'm not sure either of these flags would solve OP's problem since neither appears to equivalent to nbstripout
, which removes execution counts, output and metadata. For example, even with the -s
flag, I'm still getting diffs for execution counts, although this may be a bug. Are these flags documented anywhere? I couldn't find much.
from nbstripout.
I don't think the flag will solve the problem on its own, but it should prevent the need for the stopgap diff driver wrapper.
The execution count is not really either source, metadata or output, so it got left out of the initial fix. It could either be included with metadata flag, or get its own flag (which could also include other uncovered fields like nbformat
/nbformat_minor
). The only documentation for the flags currently is the CLI --help
text.
from nbstripout.
Maybe I'm missing something, but since nbdime
flags aren't equivalent to nbstripout
, won't the diff given by nbdime
still be incorrect? (That is, not reflective of what will be committed because of nbstripout
filter.)
from nbstripout.
Any update on this @timtroendle et. al.?
from nbstripout.
@kynan apologies for being silent for quite a while. I am not using notebooks at the moment, gave it a quick try, but cannot make nbstripout
work right away in a fresh install: I actually don't see any diff at the moment. Not sure what's wrong but I don't have the time at the moment to dig deeper.
from nbstripout.
I've just opened a PR on nbdime (jupyter/nbdime#335) which seeks to ease integration with git filters. One problem that cropped up is that git diff drivers / tools cannot reliably determine whether it is diffing a blob or a file from the working tree, i.e. if it should apply the filters or not. Due to this, it needs to be hidden behind a flag (that defaults to off). For nbdiff
with git references it should work as expected. See the PR for details. Any feedback is welcome!
from nbstripout.
@vidartf I can see your PR was merged.
@timtroendle Does this resolve your issue?
from nbstripout.
I am afraid to say that this doesn't seem to solve the issue -- at least not with my setup and without manual tweaking. Here is what I do:
- I have
nbdime 1.0.2
andnbstripout 0.3.3
installed - I have a fresh Git repo with one commit including a notebook and a change in the notebook
git diff
shows me execution count and output and no nbdime magic -- as expected- I run
nbdime config-git --enable
and now the diff shows me the nbdime magic, but still including execution count and output -- as expected - I run
nbstripout --install
and now the diff shows me no output or execution count, but no nbdime magic either
Instead, I'd hope to see the nbdime
diff but without anything that is stripped out by nbstripout
.
from nbstripout.
Thanks for the update. I hope to find time to take a look.
from nbstripout.
Instead, I'd hope to see the nbdime diff but without anything that is stripped out by nbstripout.
That is what you should be seeing. This seems to me as a config issue, as nbstripout also configures diff drivers. Is there a difference if you call nbdiff
instead of git diff
? What if you perform manual config for nbstripout, but skip the diff parts?
from nbstripout.
Is there a difference if you call
nbdiff
instead ofgit diff
?
nbdiff throws an error:
FileNotFoundError: [Errno 2] No such file or directory: '"~/miniconda3/envs/nbdime/bin/python3.6"': '"~/miniconda3/envs/nbdime/bin/python3.6"'
(The file exists.)
What if you perform manual config for nbstripout, but skip the diff parts?
If I skip the last line of the manual config (*.ipynb diff=ipynb
), git diff shows me nbdime, but nothing stripped out. If I do add the line, git diff shows me everything stripped out, but not nbdime.
However, if I run nbdiff
without that last line, I do see exactly what I was looking for: output is stripped, and the diff is shown as per nbdime. Also, committing the changes does consider nbstripout.
So this is almost what I think would be great, but still has confusion from the difference in what git diff
shows and what gets committed eventually (which is what nbdiff
shows as well).
from nbstripout.
nbdiff throws an error:
Would you mind including the full stack trace? You can put it in a block like this to make it collapsible (nice github trick):
<details><summary>Full trace</summary>
<pre>
-- put your trace here --
</pre>
</details>
Full trace
-- put your trace here --
Could you try to add the argument --use-filter
to the git diff driver config for nbdime?
from nbstripout.
PS: Please note the caveat in this comment in nbdime for why --use-filter
is not enabled by default.
from nbstripout.
Here's the stack trace when running nbdiff
in a repo set up automatically as described in my post from August 6.
Full trace
Traceback (most recent call last):
File "/Users/trtim/miniconda3/envs/nbdime/bin/nbdiff", line 11, in
sys.exit(main())
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/site-packages/nbdime/nbdiffapp.py", line 128, in main
return main_diff(arguments)
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/site-packages/nbdime/nbdiffapp.py", line 39, in main_diff
for fbase, fremote in changed_notebooks(base, remote, paths):
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/site-packages/nbdime/gitfiles.py", line 160, in changed_notebooks
entry.b_path, entry.b_blob, ref_remote, repo_dir)
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/site-packages/nbdime/gitfiles.py", line 110, in _get_diff_entry_stream
ret = apply_possible_filter(path)
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/site-packages/nbdime/vcs/git/filter_integration.py", line 80, in apply_possible_filter
stderr=STDOUT, shell=USE_SHELL
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/Users/trtim/miniconda3/envs/nbdime/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '"/Users/trtim/miniconda3/envs/nbdime/bin/python3.6"': '"/Users/trtim/miniconda3/envs/nbdime/bin/python3.6"'
from nbstripout.
@timtroendle I opened a separate issue for the exception here: jupyter/nbdime#415
from nbstripout.
@vidartf @timtroendle I can see jupyter/nbdime#415 was fixed in 1.0.3.
Does that solve the issue? Can we close this?
from nbstripout.
Thanks for following this up. Feel free to open issues on the nbdime repo if there are any remaining actionable steps to take to further improve the cooperation between these tools! :)
from nbstripout.
I've not tried this in a while. @chris-rands could you provide reproduction steps i.e. what are you doing? What are you seeing? What would you expect / hope to see?
Ideally file a new issue for that.
from nbstripout.
Related Issues (20)
- `--dry-run` should exit non-0 if files would be updated HOT 1
- Should be agnosting on trailing blank lines HOT 9
- [Feature Request] Process Folders (Batch / Bulk) HOT 1
- Doesn't strip out pycharm metadata HOT 9
- New release HOT 2
- Replace cram with prysk HOT 2
- Read config from `setup.cfg` HOT 1
- Option to error on cell outputs exceeding `--max-size` HOT 4
- It is recommended to remove pytest-runnner from setup_requires in setup.py HOT 4
- Strip output_type=stderr only, with keep_output? HOT 4
- Possible nbstripout-fast integration HOT 6
- Prevent committing notebooks with errors in cell outputs HOT 6
- Specifying Python executable path in `nbstripout --install`
- 'nbstripout' is not recognized as an internal or external command, operable program or batch file. HOT 2
- Not compatible with `pre-commit-hooks/pretty-format-json` hook HOT 4
- `git config filter.nbstripout.extrakeys ` support for `attachments`? HOT 4
- Support setting defaults for command line arguments via git config
- No valid notebook detected HOT 4
- required = true by default or make doc more explicit about it HOT 5
- Support git-filter-repo HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from nbstripout.