
Comments (6)

kynan commented on July 23, 2024

By its nature, a git filter can only act on the commit about to be made. It's not designed to "spawn" another commit in a different branch. I suppose it could be made to do that with some trickery, but I would rather not implement such a "hack" in nbstripout.

You can, however, emulate the behaviour you want by installing nbstripout only in a .gitattributes file in your release branch. Assuming your "dirty" notebooks are on master, you can create a clean branch as follows:

Create an orphan release branch and install nbstripout in that branch only:

git checkout --orphan release
nbstripout --install --attributes=.gitattributes
git add .gitattributes
git commit -m 'Install nbstripout'

You can then cherry-pick notebook commits as follows (you will probably have to pick them in their original order to avoid merge conflicts, unless each commit is entirely self-contained):

git cherry-pick --no-commit <commit>
git commit -a --no-edit

From my quick test, both steps are necessary for the filter to kick in: if you just do a plain cherry-pick, the filter is not applied.

Hope this helps!

from nbstripout.

kynan commented on July 23, 2024

> Presumably, a variant of nbstripout could also be used to add a git filter that would automatically run a notebook when committing it to a repository to ensure that all its output cells are populated?

Not easily: as mentioned, nbstripout uses a git clean/smudge filter and operates purely on the file level. No cells are ever executed.

You would need to look at a pre-commit hook; however, I expect that's not easy to set up: you'd need to start a notebook server, run the notebook, and deal with failures. It would also be very slow.

If you only want to verify the output is populated, that's easier to do (and you could potentially reuse some of nbstripout's code for that).
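As a rough illustration of the "verify only" idea, here is a minimal sketch that reports code cells which appear not to have been run. It does not reuse nbstripout's code. It assumes the standard nbformat 4 JSON layout (a top-level "cells" list, with "cell_type", "source" and "execution_count" on each code cell), and the function names are hypothetical:

```python
from __future__ import annotations

import json


def unexecuted_code_cells(notebook: dict) -> list[int]:
    """Return the indices of non-empty code cells that have no execution count.

    Assumption: a cell with execution_count set to None has not been run
    (or has been stripped), so it cannot have populated output.
    """
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells never carry outputs
        source = cell.get("source", "")
        # nbformat allows the source to be a string or a list of strings
        if isinstance(source, list):
            source = "".join(source)
        if source.strip() and cell.get("execution_count") is None:
            missing.append(index)
    return missing


def check_notebook_file(path: str) -> list[int]:
    """Load a notebook from disk and return its unexecuted code cell indices."""
    with open(path, encoding="utf-8") as handle:
        return unexecuted_code_cells(json.load(handle))
```

A pre-commit hook could call check_notebook_file on each staged notebook and reject the commit if the returned list is non-empty. Note that this checks whether cells were executed, not whether they produced visible output, since many cells legitimately print nothing.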


psychemedia commented on July 23, 2024

@kynan Thanks for that - will give it a try. git is still a bit voodoo to me; I need to clear some time and try to get a proper understanding of how it works and also clarify in my own mind exactly what sort of process I want to implement.

For generating newly run notebooks, could that be done elsewhere in a GitHub-managed repository, e.g. using CI hooks to run something to create the new notebooks? (Apologies, this is going off-topic for nbstripout; I'm thinking aloud through my fingers...)


kynan commented on July 23, 2024

There's another option I didn't think of earlier: you can use the git filter-branch approach described in the README.

By creating "new" notebooks, do you mean creating stripped versions from "full" versions? Or the other way round?

You presumably could use CI hooks to automate either variant, but I don't have anything to suggest since I haven't tried anything of that sort.

If you haven't come across https://mybinder.org before, I wonder if that could be a starting point.


kynan commented on July 23, 2024

@psychemedia have you found a suitable workflow for your needs?


psychemedia commented on July 23, 2024

@kynan I've actually moved to a workflow around jupytext now, which uses a text-based representation for notebooks (no cell outputs).

Reflecting back, I think that a git filter-branch --tree-filter approach would probably work okay for a release: create a new branch, run the git filter to clean all the notebooks in it, then commit.

Here's another example of that approach: rewriting the contents of a branch as text files using jupytext; in a branch, run:

git filter-branch --tree-filter 'jupytext --to md */*.ipynb && rm -f */*.ipynb' HEAD
