kynan commented on August 26, 2024

I have to admit I wasn't aware of the Jupyter save hooks until now.

The main difference, I think, is that the save hook would prevent the output from ever being saved to disk in the first place, i.e. if you were to stop the server and reopen the notebook, you'd have no output.
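For comparison, a save hook of the kind described here might look roughly like the following minimal sketch. It is modeled on the output-scrubbing pre-save hook example in the Jupyter Notebook docs; the registration line assumes the classic `jupyter_notebook_config.py` setting `c.FileContentsManager.pre_save_hook` and may differ in newer Jupyter versions. Because the model is modified before it is written, the output never reaches the disk.

```python
def scrub_output_pre_save(model, **kwargs):
    """Clear outputs and execution counts before a notebook is written to disk."""
    # Only act on notebooks, not on other file types or directories.
    if model["type"] != "notebook":
        return
    # Only handle nbformat v4 documents.
    if model["content"]["nbformat"] != 4:
        return
    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None


# Registration in jupyter_notebook_config.py, where the config object `c` exists
# (classic Notebook setting; newer Jupyter versions may use a different mechanism):
# c.FileContentsManager.pre_save_hook = scrub_output_pre_save
```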

With the nbstripout Git filter you do have the "full" notebook saved on disk, but Git ignores the output when diffing and committing.
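The filter mechanism itself is simple: Git runs a clean filter with the working-tree file on stdin and stores whatever the filter prints on stdout, so the file on disk is never modified. Below is a minimal, standard-library-only sketch of such a filter. It is a hypothetical standalone script (`strip_notebook` and `run_filter` are illustrative names), not nbstripout's actual code, and it only clears outputs and execution counts.

```python
import json
import sys


def strip_notebook(nb):
    """Return the notebook dict with code-cell outputs and execution counts cleared."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def run_filter(instream, outstream):
    """Read a notebook from instream and write the stripped version to outstream."""
    data = instream.read()
    if not data.strip():
        # Pass empty input through unchanged rather than failing on invalid JSON.
        outstream.write(data)
        return
    nb = strip_notebook(json.loads(data))
    # indent=1 and sorted keys approximate how Jupyter itself formats .ipynb files.
    json.dump(nb, outstream, indent=1, sort_keys=True, ensure_ascii=False)
    outstream.write("\n")


if __name__ == "__main__":
    # As a Git clean filter: the working-tree file arrives on stdin, Git stores stdout.
    run_filter(sys.stdin, sys.stdout)
```

Assuming the script is saved as `strip_filter.py` at the repository root, one could register it with `git config filter.nbstrip-sketch.clean "python strip_filter.py"` and `git config filter.nbstrip-sketch.smudge cat`, then add `*.ipynb filter=nbstrip-sketch` to `.gitattributes`; the filter name and filename are made up for this example.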

I'm not sure I fully understand your use case of maintaining a separate (?) Python script. Could you give an example?

from nbstripout.

mforbes commented on August 26, 2024

FWIW: A problem with using the script option is that there is no guarantee that one will be able to convert the script back to a proper notebook. The Jupyter save-hook functionality is nice, but it seems to need two files on disk, one clean and one with output, and then some functionality needs to be implemented to allow these to be merged etc. (which I would love, but it seems a bit complicated to implement reliably).

My current strategy is to use the VCS to store the clean notebooks, and then tack on the notebook with output as an additional leaf node that could be stripped out later. One can then track differences along the clean branch, or along the output branch. As long as the workflow never merges the output branch back into the clean branch, it can always be stripped or pruned later as needed.

I still need to work with this for a while to see how easy and reliable it is though.

kynan commented on August 26, 2024

OK, reading the post_save_hook example in the docs you linked to makes it a bit clearer: is the problem that your script would contain the output (since you don't run a pre_save_hook to strip it) whereas your notebook (in Git's view) wouldn't?

michaelaye commented on August 26, 2024

Yes (and the execution counts), which would mean that the Python script would change on many saves and hence create git diff noise.

kynan commented on August 26, 2024

I haven't used the script output much. Have done some simple tests and it seems by default the execution counts are included as comments but the output is not?

Could you use nbstripout programmatically, e.g. call strip_output in your post-save hook?

michaelaye commented on August 26, 2024

I solved it by adapting the end of the post_save_hook to:

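    import io

    # Replace each "# In[N]:" cell marker with a fixed "# newcell" line so that
    # re-executing cells (which changes N) does not change the saved script.
    with io.open(script_fname, 'w', encoding='utf-8') as f:
        for line in script.splitlines():
            if line.startswith("# In["):
                f.write("# newcell\n")
                continue
            f.write(line + '\n')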

because the only thing left in there that could change was the execution counts. Yeah!
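The same marker normalization can be pulled out into a standalone function so it can be tested outside Jupyter. This is a minimal sketch; the name `normalize_script` is hypothetical, and, like the snippet above, it assumes nbconvert's Python script export marks each code cell with a comment line starting with `# In[`.

```python
def normalize_script(script):
    """Replace '# In[N]:' cell markers with a constant so execution counts don't cause diffs."""
    lines = []
    for line in script.splitlines():
        if line.startswith("# In["):
            lines.append("# newcell")
        else:
            lines.append(line)
    # Terminate every line with '\n', matching the f.write(line + '\n') loop above.
    return "".join(line + "\n" for line in lines)


# Inside the post_save_hook, the final write would then become:
# with io.open(script_fname, "w", encoding="utf-8") as f:
#     f.write(normalize_script(script))
```

Two exports that differ only in execution counts then normalize to the same text, which is exactly what keeps the script's git diffs quiet.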

michaelaye commented on August 26, 2024

Why would you need to be able to go from the Python script back to the notebook if you always store the notebook as well? I see the Python script only as a human readable version of the notebook that is just always being created with every save of the notebook. It has no functional use for me.

mforbes commented on August 26, 2024

@michaelaye I think I do not understand your use case: what are you using the script for? My use case was to use the clean script as the definitive version of the notebook in my VCS: if there had been a safe way to get back to a notebook, then this might have been ideal, since merges etc. could be performed on the script and the notebook reconstructed from it. Alas, this is not supported, so I store the clean notebook and (optionally) the notebook with output in the hope that I don't need to do any sophisticated merging (or that the nbdiff project is back up and running by the time I need to!)

michaelaye commented on August 26, 2024

Notebooks are my development environment, but git diffs of the notebooks' JSON are almost unreadable. So I store the Python version in Git as well, to get easily readable diffs of the code in the notebook.

michaelaye commented on August 26, 2024

So I store both the notebook AND the Python script, but the latter only because it gives an easier way to keep track of changes than the JSON of the notebook.

mforbes commented on August 26, 2024

What do you do about the output in your notebooks? Do you store that too and just not worry about the size of your repos?

michaelaye commented on August 26, 2024

No, I strip it either with the pre_save_hook in my jupyter config or with the pre-commit hook as offered by nbstripout.
The latter has the advantage that the content exists on disk but does not ‘exist’ for git.

mforbes commented on August 26, 2024

I understand. That is sort of what I have been doing (except not with the additional script), but now I would like to also include the output as a separate strippable branch in my VCS so that I can easily share results along with the code with collaborators.

michaelaye commented on August 26, 2024

So, why do you want to share output created on your machine with your collaborators? Is it hard for them to recreate? Isn't it paramount that they are able to recreate it? Otherwise you have code that runs on only one machine.
And if it’s for information only, wouldn’t it be enough to share the PDF version of the notebook that contains the output?
git is already complicated enough for me, your ‘strippable branches’ sound like a disaster to me. ;)

mforbes commented on August 26, 2024

Sometimes the results are the end product of several hours of simulations, so they are hard to recreate. Other times collaborators are not familiar with the code, so they just want to see the graphs. Finally, I like to have a way of temporarily backing up my output, or the work of collaborators who do not know how to use a VCS (especially when working on a shared machine like SageMathCloud).

I agree that git is already complicated enough. I only do this with mercurial now which to me is much more understandable, and it so far seems pretty straightforward.

kynan commented on August 26, 2024

@michaelaye, @mforbes: any conclusions on this?

michaelaye commented on August 26, 2024

My conclusion is that the nbstripout use with .gitattributes is currently the only way to have both a cleaned notebook in git while keeping outputs on local disk. Plus, it offers a simple way to activate and deactivate it for a respective git repo, while using the pre-save-hook would require some Jupyter config hackery to make it work here but not there, as Jupyter does not support profiles currently.

mforbes commented on August 26, 2024

I still like the idea of using nbstripout controlled by the VCS through hooks etc. Ultimately I think this should be rolled into Jupyter hooks, but not until it is clear how to manage these (i.e. dealing with profiles as Michael mentions).

kynan commented on August 26, 2024

Glad to hear nbstripout is a valuable complement to the Jupyter hooks. Maybe it's worth involving some of the Jupyter folks in this discussion? @minrk @takluyver @Carreau

Carreau commented on August 26, 2024

After a quick read, I guess nbstripout is mostly equivalent to a convenience wrapper around:

$ jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True --inplace <notebook>

You can also use the --stdin and --stdout flags (which might only be on master). But then it becomes hard to remember all the options.

It does though provide convenient methods to enable/disable the git clean/smudge filter, which is nice.
We usually try not to add support for a specific VCS like git to our codebase.

So I'm not sure there is anything to move into the Jupyter/nbconvert codebase, though if you'd like to link to nbstripout from the documentation or from nbconvert itself, I don't see any problem with that.

Also, rolling that into Jupyter would make the release cycle of such tools slower, which might not be a good thing.

michaelaye commented on August 26, 2024

Not really, b/c IIUC the jupyter nbconvert above will change the content on disk which we wanted to avoid? Or do you mean to put that command line somehow as a git filter itself?

Carreau commented on August 26, 2024

> Not really, b/c IIUC the jupyter nbconvert above will change the content on disk which we wanted to avoid? Or do you mean to put that command line somehow as a git filter itself?

You don't need to modify the file in place; you can add --stdin --stdout instead:

$ jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True --stdout --stdin < Untitled11.ipynb
[NbConvertApp] Converting notebook into notebook
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],

Unless I misunderstand what you are trying to achieve. Also, I might be wrong, but if I remember my Bash correctly, you can turn a command's output into a file path with process substitution, <(...) (note the omission of --stdin):

$ jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True --stdout <(cat Untitled11.ipynb)
[NbConvertApp] Converting notebook into notebook
{
 "cells": [
  {
   "cell_type": "co

But I don't expect anyone to be a Bash wizard.

Of course, what I do from the CLI can be done with the API, and yes, it could be used as a git filter.
You can try to do a similar thing in pre/post-save hooks, and if you want something more complex, you might want to subclass the contents manager, like ipymd or pgcontents do.

Unless I'm misunderstanding the question.
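The API route mentioned above could look roughly like the following minimal sketch, assuming nbformat and nbconvert are installed. It applies nbconvert's `ClearOutputPreprocessor` to a notebook read from stdin and writes the result to stdout, approximately what the `--ClearOutputPreprocessor.enabled=True --stdin --stdout` command does, so it could also serve as a Git clean filter. The names `clear_outputs` and `run_filter` are hypothetical.

```python
import sys

import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor


def clear_outputs(nb):
    """Clear code-cell outputs and execution counts with nbconvert's preprocessor."""
    # Calling preprocess() directly bypasses the `enabled` check done in __call__,
    # so no equivalent of --ClearOutputPreprocessor.enabled=True is needed here.
    nb, _resources = ClearOutputPreprocessor().preprocess(nb, {})
    return nb


def run_filter(instream, outstream):
    """Read a notebook from instream and write the cleared version to outstream."""
    data = instream.read()
    if not data.strip():
        # Pass empty input through unchanged rather than failing to parse it.
        outstream.write(data)
        return
    nb = nbformat.reads(data, as_version=4)
    nbformat.write(clear_outputs(nb), outstream)


if __name__ == "__main__":
    run_filter(sys.stdin, sys.stdout)
```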

michaelaye commented on August 26, 2024

Thanks! I think you understood our aim correctly: basically we want only cleared output in git, while leaving the locally stored version untouched, so that the output is still there when continuing to work on the notebook later. That's the best of both worlds for me: clean git diffs and much reduced git traffic, without having to reproduce output immediately to remember where I left off with my work.

kynan commented on August 26, 2024

Sure, we could potentially refactor nbstripout into a call to the nbconvert CLI. However I'm not convinced that buys us anything. We're already using the nbconvert API and I don't think that can be much simplified.

But if one didn't want to install nbstripout (and didn't care about the convenient (un)install facility ;)) one could formulate a git filter purely in terms of a call to the nbconvert CLI as sketched by @Carreau.

nehalecky commented on August 26, 2024

Nice reading this thread, thank you all for the discussions.

> My conclusion is that the nbstripout use with .gitattributes is currently the only way to have both a cleaned notebook in git while keeping outputs on local disk. Plus, it offers a simple way to activate and deactivate it for a respective git repo, while using the pre-save-hook would require some Jupyter config hackery to make it work here but not there, as Jupyter does not support profiles currently.

@michaelaye: exactly what I am looking for. Any way we could get a summary workflow (with examples) for how you're using nbstripout like this? It would be great to establish a common pattern for this for the Jupyter notebook community.

PS. I see that there are notes on nbstripout usage; however, I feel that a little more prose coupled to a workflow would help with conceptual mapping. :)

michaelaye commented on August 26, 2024

Sure, just let me finish this damn NASA proposal first (deadline Thursday). Kynan could have a go at adding what's missing to the README, and I'll chime in with what I think is useful to understand or what I found while searching for a workflow.

nehalecky commented on August 26, 2024

Of course, this sounds great. @kynan, thanks again for this library, so crucial for collaborative analysis and development with Jupyter. Just awesome, really.

Please let me know how I can help.

kynan commented on August 26, 2024

Thanks for the endorsement @nehalecky! I have to point out that this is based on work from @minrk and @mforbes has contributed a lot!

Would be great to have some more tests, so if you'd like to contribute some that'd be greatly appreciated!

nehalecky commented on August 26, 2024

Hi @kynan, thanks for the reply. Sorry I haven't had time to contribute, but hope to do so soonish when we start using this library over the next couple of weeks. I'm still interested to see some sample workflows, but I understand everyone's time is limited. Hope to chime back in soon! Cheers.

kynan commented on August 26, 2024

@nehalecky no worries! Documentation is still a bit lacking you're right. I demoed nbstripout at the PyData London meetup I co-organise on Tuesday. If I find some time I might turn this demo into a screencast, but no promises at this point.

jraviotta commented on August 26, 2024

> My conclusion is that the nbstripout use with .gitattributes is currently the only way to have both a cleaned notebook in git while keeping outputs on local disk. Plus, it offers a simple way to activate and deactivate it for a respective git repo, while using the pre-save-hook would require some Jupyter config hackery to make it work here but not there, as Jupyter does not support profiles currently.

I also believe this to be true. Also, because of this issue, I can't find a way to save a version of a notebook that includes the output using nbconvert and a pre-save hook.

However, I notice that the path to nbstripout is hard-coded into <project>/.git/config. I would prefer to save the path in my global .gitconfig and put a .gitattributes file in the repo that activates the filter. That way, I can install nbstripout in my base conda environment next to jupyter lab and know that it will work on all my environments, and on other machines.

I imagine I can make that work manually, but it would be nice to have a command such as:

nbstripout --install --global --attributes .gitattributes

Am I overlooking that functionality?

kynan commented on August 26, 2024

Installing in your global .gitconfig is not currently supported, but should be pretty straightforward to add. Could you add this as a feature request?

kynan commented on August 26, 2024

Using the global .gitconfig is now supported.
