microsoft / gather

Spit shine for Jupyter notebooks 🧽✨

Home Page: https://microsoft.github.io/gather

License: MIT License

TypeScript 48.19% CSS 2.87% JavaScript 0.46% Jupyter Notebook 48.41% Shell 0.06%

jupyterlab-extension

gather's Issues

Discuss binder integration

Moving the Binder conversation here so that we're not just chatting in a closed PR :-)

re: repository size, it's OK to include extra cruft up to a certain point. If it's "can I include a bunch of notebooks that might be big-ish" that's totally fine. If it's "I want to include a 500MB dataset in my repository" that might be more of a performance hit. The biggest time-sink in Binder is the time it takes to "Docker pull" onto a node when somebody launches a repository for the first time on that node. After that, Binder uses the docker cache within the same node and it is much faster.

re: this repository or another repository, it can be whatever you'd like! Many people just use whichever repository they use for their documentation.

Is it still supported for Jupyter 4 and above

Need to know if it can still be used.

Dependencies of class properties are not gathered

Describe the bug
When a function references dependencies of class properties set either in the class constructor or class functions, those dependencies are not gathered.

To Reproduce
Steps to reproduce the behavior:

Build and run Gather extension.
Upload this notebook to Jupyter.
Execute all cells.
Try gathering the outputs for each case.

Expected behavior
Dependencies like variable declarations and module imports should be gathered.
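
The notebook linked in the reproduction steps is not included here. As a rough, hypothetical illustration of the pattern described (every name below is invented for this sketch, not taken from the reporter's notebook), gathering `result` from cells like these would need `import math` and `scale` to come along with the class and the function:

```python
import math

scale = 2.0  # module-level variable that a class property depends on


class Circle:
    def __init__(self, radius):
        # Property set in the constructor; depends on `math` and `scale`.
        self.area = math.pi * radius ** 2 * scale

    def set_label(self, prefix):
        # Property set in an ordinary class function.
        self.label = f"{prefix}-{self.area:.2f}"


def describe(circle):
    # Only touches properties, so the dependencies are reached indirectly.
    return circle.label


c = Circle(1.0)
c.set_label("circle")
result = describe(c)
```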

Screenshots

Style: Use a formatting standard

Currently the TypeScript doesn't have a style guide, much less a consistent style. Using one could make it easier for contributors to know what style to use when contributing code. Consider using standard.

Parser does not correctly set this.indents = [0] when this.indents is undefined

Describe the bug
In python3.jison on line 69, this.indents is incorrectly set. This corresponds to line 1064 in the generated python3.js file.

if (this.indents == undefined) this.indents == [0]; // should be this.indents = [0]

This causes the parser to throw a TypeError: Cannot read property 'length' of undefined at Object.anonymous [as performAction] on the next line when the parser attempts to access this.indents.length:

if (this.indents.length > 1) {

Change Variable Highlighting Color

The light blue and pink highlighting that Gather uses look great in the default grey+white Jupyter Lab theme, but highlighting REALLY highlights when you are in Dark mode or the Material Darker theme. Is there a way to tone things down that I am missing? Thanks!

Lightweight API for adding new `gather` commands

In Issue #16, @micahjsmith proposed a new type of gather functionality. nbgather could be made more extensible to let others build new plugins like that one, which do new things with the gathered cells.

I imagine the interface for adding would involve a registerGatherCommand function, where the caller provides a command title, an icon, and a callback that will get triggered and provided with the gathered cells.

Let's use this issue to track interest in having an API for registering new gather commands, and design discussions about what that API would look like.
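
To make the design discussion concrete, here is a minimal sketch of the shape such a registry could take. It is written in Python only for brevity; the extension itself is TypeScript, and `register_gather_command`, `run_gather_command`, and `GatherCommand` are hypothetical names, not part of nbgather.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GatherCommand:
    title: str                               # label shown to the user
    icon: str                                # icon identifier (assumed to be a string)
    callback: Callable[[List[str]], None]    # receives the gathered cell sources


_commands: Dict[str, GatherCommand] = {}


def register_gather_command(title, icon, callback):
    """Register a command; titles are assumed to be unique."""
    if title in _commands:
        raise ValueError(f"gather command already registered: {title}")
    _commands[title] = GatherCommand(title, icon, callback)


def run_gather_command(title, gathered_cells):
    """Invoke the callback of a registered command with the gathered cells."""
    _commands[title].callback(list(gathered_cells))


# Example plugin: collect gathered cells into a plain list.
collected = []
register_gather_command("Gather to list", "list-icon", collected.extend)
run_gather_command("Gather to list", ["a = 1", "print(a)"])
```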

Cleanup: Remove Jupyter notebook install option and source code

It's likely that installing the Jupyter notebook extension will be a pain for those trying out the demo, given that it will only work in a narrow band of Jupyter notebook versions. We may want to remove the Jupyter notebook extension as an option from the README, remove the nb directory of code, and only let people install the Jupyter Lab version.

If this is the case, we should leave a note in the README referring readers to a previous version of the project that included the Jupyter notebook implementation.

Gather API

I'm not clear as to how gather actually works, whether it does a static analysis of a notebook, or whether it looks at cell execution / state history and then tries to identify the executed cells that influenced the execution of a given cell.

If it's a static analysis, given a notebook ipynb file, and an identified code cell, whether "code cell 5 of 10" or a code cell identified by a particular cell tag, it would be useful to be able to call something like gather("mynotebook.ipynb", code_cell=5) or gather("mynotebook.ipynb", code_id="final_chart") and return gather code cells with the identified code cell as the last cell.
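
To illustrate what a purely static version of such a call could look like, here is a deliberately simplified sketch. It is not how nbgather works: it analyses whole cells with Python's `ast` module, walks backwards from the target cell, and keeps any earlier cell that defines a name still needed. The function takes an already parsed notebook dict rather than a filename, and `gather` and `_names` are hypothetical helpers.

```python
import ast


def _names(source):
    """Return (defined, used) names appearing anywhere in a cell's source."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defined if isinstance(node.ctx, ast.Store) else used).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used


def gather(notebook, code_cell):
    """Return sources of the code cells that code cell `code_cell` (0-based) may depend on."""
    cells = ["".join(c["source"]) for c in notebook["cells"] if c["cell_type"] == "code"]
    needed = set(_names(cells[code_cell])[1])
    keep = [code_cell]
    for i in range(code_cell - 1, -1, -1):
        defined, used = _names(cells[i])
        if defined & needed:
            keep.append(i)
            needed |= used  # over-approximates: names are never removed
    return [cells[i] for i in sorted(keep)]


notebook = {"cells": [
    {"cell_type": "code", "source": "import re"},
    {"cell_type": "markdown", "source": "# notes"},
    {"cell_type": "code", "source": "a = 1"},
    {"cell_type": "code", "source": "b = 2"},
    {"cell_type": "code", "source": "def f():\n    return re.sub('1', 'x', str(a))"},
    {"cell_type": "code", "source": "print(f())"},
]}
gathered = gather(notebook, 4)
```

A file on disk could be passed in after `json.load`. Because this works on source text only, it ignores execution order and cell re-execution, which is exactly the distinction the question above is asking about.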

Add optional, opt-in logging

Describe the solution you'd like
It would be great if we could collect usage data from people who are using code gathering tools in their work. This could help us decide on future improvements to this tool, and provide data that could be shared with other researchers and tool builders to help them build better notebook tools.

Data collection would be strictly opt-in, and would be off by default.

Additional context
To do this, we would need the following:

Fill out relevant institutional paperwork
A non-intrusive, easy-to-dismiss popup that appears when someone uses the tool, that asks if folks want to share their usage data with us, with a description of the data that will be collected, anonymization of the data, and research goals
Persistent storage of a user's choice. Use this choice to decide whether to report log events
A server (e.g., notebooks.berkeley.edu) that receives log events over a secure connection

Some of the events we might want to collect are:

when code gathering events get invoked, how large are the slices? Where in the notebook do they come from? Are these slices opened in notebooks, scripts, or as cells in existing notebooks?
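
The "persistent storage of a user's choice" requirement could be sketched as follows. This is only an assumption-laden illustration: the settings file, the `share_usage_data` key, and the `UsageLogger` class are all hypothetical, and events are kept in a list where a real implementation would send them to a server.

```python
import json
import tempfile
from pathlib import Path


class UsageLogger:
    def __init__(self, settings_path):
        self.settings_path = Path(settings_path)
        self.events = []  # stand-in for events reported to a server

    def opted_in(self):
        # Off by default: no stored choice means no logging.
        if not self.settings_path.exists():
            return False
        settings = json.loads(self.settings_path.read_text())
        return settings.get("share_usage_data", False) is True

    def set_choice(self, share):
        # Persist the answer given in the consent popup.
        self.settings_path.write_text(json.dumps({"share_usage_data": bool(share)}))

    def log(self, event):
        # Report the event only if the user has opted in.
        if not self.opted_in():
            return False
        self.events.append(event)
        return True


with tempfile.TemporaryDirectory() as tmp:
    logger = UsageLogger(Path(tmp) / "settings.json")
    before = logger.log({"type": "gather", "slice_size": 3})  # dropped
    logger.set_choice(True)
    after = logger.log({"type": "gather", "slice_size": 3})   # recorded
```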

Docs: Add getting-started docs

Is your feature request related to a problem? Please describe.
Have enough documentation that someone can figure out:

why they should use this tool
how to install it
an easy case where they can try it out
how to use each of the major functions (including gather to notebook, gather to cells, and gather to script)
a real-world case where they can see it working for a complex problem
a sense of how it works (so people can get a better conceptual model for what will happen when they gather code in complex notebooks)

Describe the solution you'd like
We should have a README that describes the purpose of the tool, installation instructions, GIFs of using the major functions and of a more complex scenario, and a brief write-up about the implementation of the tool.

Describe alternatives you've considered
If pressed for time, we could link to the video figure and paper. We may also want to put this information on a github.io page or a project webpage instead of in the README.

Additional context
N/A

This repo is missing important files

There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.

Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

Merge this pull request

Slicing: Fails to include import for symbol used in method

Describe the bug

When a symbol is imported then used within a method, the import is not included even if the method is included in the slice. Originally reported by @barik.

To Reproduce

Here's an example of the part of a notebook where this bug occurs.

Expected behavior

import re should also be highlighted.
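
The original notebook excerpt is not reproduced here. A hypothetical minimal case of the same shape (names invented for illustration) would be a method whose body is the only place the imported module is referenced:

```python
import re


class Tokenizer:
    def split(self, text):
        # `re` is referenced only inside this method body.
        return re.split(r"\s+", text.strip())


tokens = Tokenizer().split("gather  these words")
```

Gathering `tokens` should pull in the class definition and, per the expected behavior above, the `import re` line as well.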

Desktop (please complete the following information):

OS: OSX Mojave
Browser: Chrome 73.0.3683.103

Dependencies of lambda functions with >0 args are not gathered

Describe the bug
UPDATE: After experimenting further it seems that program slicing fails specifically on a lambda function which accepts at least 1 argument.

To Reproduce
Steps to reproduce the behavior:

Build and run Gather extension.
Upload this notebook to Jupyter.
Execute all cells.
Try gathering the definitions and outputs for each case.

Expected Behavior
Dependencies of lambda functions should be gathered.
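
As a hypothetical minimal case (not the reporter's notebook), the two lambdas below differ only in whether they take an argument. According to the report, the dependency on `offset` would be missed for the second one:

```python
offset = 10

no_arg = lambda: offset + 1      # zero arguments
one_arg = lambda x: x + offset   # one argument: the case reported to fail

a = no_arg()
b = one_arg(5)
```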

Screenshots

JupyterLab 2.2.0 not supported

My JupyterLab is version 2.2.0, the latest one.

I run the command

jupyter labextension install nbgather

The error looks like

An error occured.
ValueError: The extension "nbgather" does not yet support the current version of JupyterLab.

Conflicting Dependencies:
JupyterLab              Extension      Package
>=2.2.0 <2.3.0          >=1.2.0 <2.0.0 @jupyterlab/application
>=2.2.0 <2.3.0          >=1.2.0 <2.0.0 @jupyterlab/apputils
>=4.2.0 <4.3.0          >=3.2.0 <4.0.0 @jupyterlab/coreutils
>=2.2.0 <2.3.0          >=1.2.0 <2.0.0 @jupyterlab/docmanager
>=2.2.0 <2.3.0          >=1.2.0 <2.0.0 @jupyterlab/fileeditor
>=2.2.0 <2.3.0          >=1.2.1 <2.0.0 @jupyterlab/notebook
>=2.2.0 <2.3.0          >=1.2.0 <2.0.0 @jupyterlab/rendermime

Add a "Clear History" action

Is your feature request related to a problem? Please describe.
A user might want to clear the history of a notebook, e.g., if they executed a cell with some sensitive data that they don't want to have stored to the notebook file.

Describe the solution you'd like
Add an action to the interface that lets someone "Clear History". This would then reset the execution history log, and make sure that any history metadata saved with the notebook is emptied.

Describe alternatives you've considered
Currently, an analyst could open up the ipynb file on their own, and delete the metadata that includes the execution history.

Additional context
One additional benefit of this is reducing storage space for notebooks with very long histories.
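
The manual workaround mentioned above could be scripted roughly as follows. This is a sketch that operates on a parsed notebook dict; the metadata key under which the execution history is stored is not stated in this issue, so it is left as a parameter, and the `"history"` key in the example is a placeholder.

```python
import copy


def clear_history(notebook, history_key):
    """Return a copy of a parsed .ipynb dict without the given metadata entry."""
    cleaned = copy.deepcopy(notebook)
    cleaned.get("metadata", {}).pop(history_key, None)
    return cleaned


# Placeholder notebook; "history" is an assumed key, not nbgather's actual one.
notebook = {
    "cells": [],
    "metadata": {"kernelspec": {"name": "python3"}, "history": ["secret = 'abc'"]},
    "nbformat": 4,
    "nbformat_minor": 4,
}
cleaned = clear_history(notebook, "history")
```

A notebook file could be read with `json.load` and written back with `json.dump` around this call.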

Function and class definition textSliceLines are incorrectly computed

Describe the bug
Originally reported in #23 for classes, but predates #28
The line after a class or function definition is included twice in the gathered program.

Here's an example illustrating the problem for funcdefs after executing the following program as a single cell in Jupyter:

def foo():
    print("Hello")
def bar():
    foo()
bar()

The gathered program looks like this:

def foo():
    print("Hello")
def bar():
def bar():
    foo()
bar()
bar()

Here's another example illustrating the problem for classes:

class Foo():
    class Bar():
        pass
v=1
Foo().Bar()

The gathered program looks like this:

class Foo():
    class Bar():
        pass
v=1
v=1
Foo().Bar()

Additional information
I'm somewhat out of my depth with debugging the parser, but it seems like the problem lies with the ILocation object that the parser returns, since for class and funcdefs, loc.last_column === 0 and loc.last_line is the true last line incremented by one. textSliceLines is then incorrectly computed based on the values of last_column and last_line in cellSlice.ts--in this case, lines 3 and 5 are included twice. I've added unit tests that illustrate this behavior here.

If I'm not mistaken, this means a fix will involve modifying the parser to return an ILocation object whose last_column and last_line properties have accounted for the dedent. Alternatively, the cellSlice.getTextSlice logic could be updated.
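
The second option can be sketched in Python, assuming 1-based line numbers and locations given as `(first_line, last_line, last_column)` tuples. This is not the code in cellSlice.ts; `naive_slice` only mimics the reported symptom, and `dedent_aware_slice` shows one possible adjustment for locations that end at column 0 of the following line.

```python
def naive_slice(source, locations):
    """Concatenate the lines covered by each location, trusting last_line as-is."""
    lines = source.split("\n")
    out = []
    for first_line, last_line, last_column in locations:
        out.extend(lines[first_line - 1:last_line])
    return "\n".join(out)


def dedent_aware_slice(source, locations):
    """Same, but a location ending at column 0 covers no text on its last line."""
    lines = source.split("\n")
    out = []
    for first_line, last_line, last_column in locations:
        if last_column == 0 and last_line > first_line:
            last_line -= 1
        out.extend(lines[first_line - 1:last_line])
    return "\n".join(out)


program = 'def foo():\n    print("Hello")\ndef bar():\n    foo()\nbar()'
# Locations as described above: each def ends at column 0 of the next line.
locations = [(1, 3, 0), (3, 5, 0), (5, 5, 5)]

broken = naive_slice(program, locations)
fixed = dedent_aware_slice(program, locations)
```

With these inputs, `broken` repeats lines 3 and 5 as in the gathered program shown earlier, while `fixed` equals the original program.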

Bump to support Jupyter Lab API 1.0

When setting up a development environment for nbgather, linking the extension fails because the Jupyter Lab package versions in package.json are not compatible with the version of Jupyter Lab that is now downloaded by pip by default.

The project should be refactored to use APIs from the Jupyter Lab project version >=1.0.

Notebook instead of lab

This extension looks pretty good and will be very useful for many cases. But is it possible to use it directly with the plain Jupyter Notebook instead of Jupyter Lab?

jupyterlab-extension topic for discoverability

Today I learned about this cool extension thanks to @trallard!

How about adding the jupyterlab-extension topic so it would show up in https://github.com/topics/jupyterlab-extension?

Revision browser doesn't work in Jupyter Lab version

Integration simply hasn't been implemented.

Icons for gathering not appearing in version browser

Describe the bug

The icons for "Open in notebook" and "Copy to clipboard" in the version browser are not appearing.

To Reproduce

Click on a variable or output, then click on "Gather to... Version Browser". You should be able to see that the icons are missing on the gather buttons.

Expected behavior

The icons should appear on the buttons, like so (screenshot taken from the version of gather currently hosted on BinderHub):

Additional context

This problem was probably introduced when gather started using Jupyter Lab libraries version >1.0 and the paths to the Jupyter icons changed. I already fixed the paths for the icons for the buttons in the Jupyter notebook toolbar in this commit, so we probably just need to point the CSS classes of the buttons in the version browser interface to the classes defined in that commit.

jupyterlab-extension fails due to HTTP Error 500

Describe the bug
Once I install this extension, the JupyterLab manager doesn't work:

The problem disappears when the extension is uninstalled.

To Reproduce
Steps to reproduce the behavior:

Install nbgather by running jupyter labextension install nbgather

Screenshots
Seen above

Desktop (please complete the following information):

OS: Red Hat Enterprise Linux Server release 7.6
Browser: Chrome

Jupyter versions:
jupyter core : 4.5.0
jupyter-notebook : 6.0.0
qtconsole : not installed
ipython : 7.6.1
ipykernel : 5.1.1
jupyter client : 5.3.1
jupyter lab : 1.0.2
nbconvert : 5.5.0
ipywidgets : 7.5.0
nbformat : 4.4.0
traitlets : 4.3.2

Additional context
I recall seeing this buggy behavior with another extension, "autoversion", but I didn't test it.

Make gathering conservative by default

Is your feature request related to a problem? Please describe.

Currently, the slicer assumes that methods don't modify their arguments. While this assumption is often correct, sometimes it's not. And when methods do modify their arguments, the gathered notebook will be missing these methods, and hence code needed to reproduce a result.

Describe the solution you'd like

Basically, more accurate slicing, that's more likely to gather code that might not be needed than to leave it out.

For the exact implementation, I suggest modifying the slicer to assume that:

Methods change their arguments, unless otherwise noted
Methods change the objects they're called on, unless otherwise noted

And providing an easy way for users to specify when methods don't modify their arguments. For example, they could provide a lightweight configuration file that looks like:

[
  {
    "obj-name": "m",
    "function-name": "fit",
    "does-not-modify": ["OBJECT"]
  }, {
    "function-name": "clean_data",
    "does-not-modify": [0, "auxiliary_data"]
  }
]

That is, a user could specify function calls that don't modify their arguments by the function-name, optionally the obj-name (the name of the object the function is called on), and a list of what the function does not modify. This can be the object the function was called on ("OBJECT"), positional arguments (e.g., 0 for the first argument), or keyword arguments (e.g., an argument named auxiliary_data).

The user could specify these rules of which methods don't modify their arguments in a Jupyter Lab setting editor. This could be populated with some defaults (e.g., some common Pandas data frame methods like df.head() and df.describe())
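
To show how a slicer might consult such rules, here is a small sketch. It assumes the configuration format proposed above and a conservative default in which everything is treated as possibly modified unless a matching rule says otherwise. `possibly_modified` is a hypothetical helper, not part of the slicer.

```python
import json

RULES = json.loads("""
[
  {"obj-name": "m", "function-name": "fit", "does-not-modify": ["OBJECT"]},
  {"function-name": "clean_data", "does-not-modify": [0, "auxiliary_data"]}
]
""")


def possibly_modified(rules, function_name, obj_name=None, n_args=0, kwarg_names=()):
    """Return what a call must be assumed to modify.

    Results use the same vocabulary as the rules: "OBJECT" for the receiver,
    integers for positional arguments, strings for keyword arguments.
    """
    candidates = set(range(n_args)) | set(kwarg_names)
    if obj_name is not None:
        candidates.add("OBJECT")
    for rule in rules:
        if rule["function-name"] != function_name:
            continue
        if "obj-name" in rule and rule["obj-name"] != obj_name:
            continue
        candidates -= set(rule["does-not-modify"])
    return candidates


fit_call = possibly_modified(RULES, "fit", obj_name="m", n_args=2)
clean_call = possibly_modified(RULES, "clean_data", n_args=2, kwarg_names=["auxiliary_data"])
other_fit = possibly_modified(RULES, "fit", obj_name="other", n_args=1)
```

In this sketch a rule with an obj-name applies only to calls on that object, so `other.fit(...)` stays fully conservative.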

Describe alternatives you've considered

The slicer could be improved to infer when functions modify their arguments. This would take some engineering effort that's not currently available.

The current implementation of the tool assumes that methods don't modify their arguments. I worry that this might make the tool unusable, because by default a lot of relevant code might be missing from slices.

The extension "nbgather" does not yet support the current version of JupyterLab (for jupyterlab 3.0.14)

Describe the bug
The extension "nbgather" does not yet support the current version of JupyterLab for
jupyterlab 3.0.14 pyhd8ed1ab_0 conda-forge

To Reproduce
Run in a terminal:
jupyter labextension install nbgather

Expected behavior
how to install this package

Screenshots
$ jupyter labextension install nbgather
An error occured.
ValueError: The extension "nbgather" does not yet support the current version of JupyterLab.

Conflicting Dependencies:
JupyterLab              Extension      Package
>=3.0.9 <3.1.0          >=1.2.0 <2.0.0 @jupyterlab/application
>=3.0.7 <3.1.0          >=1.2.0 <2.0.0 @jupyterlab/apputils
>=3.0.7 <3.1.0          >=1.2.0 <2.0.0 @jupyterlab/codeeditor
>=3.0.7 <3.1.0          >=1.2.0 <2.0.0 @jupyterlab/codemirror
>=5.0.5 <5.1.0          >=3.2.0 <4.0.0 @jupyterlab/coreutils
>=3.0.9 <3.1.0          >=1.2.0 <2.0.0 @jupyterlab/docmanager
>=3.0.9 <3.1.0          >=1.2.0 <2.0.0 @jupyterlab/fileeditor
>=3.0.9 <3.1.0          >=1.2.1 <2.0.0 @jupyterlab/notebook
>=3.0.8 <3.1.0          >=1.2.0 <2.0.0 @jupyterlab/rendermime
See the log file for details: /tmp/jupyterlab-debug-fyx19196.log

Desktop (please complete the following information):

OS:
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"

Turn off gather-from-revisions by default

Is your feature request related to a problem? Please describe.
If we let people explore the versions of slices that produced results by default, the tool may be unusable for long sessions. This is because outputs can take up a lot of memory and, when the notebook is saved, storage. To enable people to explore versions of slices with their output, we need to keep every version of every output produced.

Describe the solution you'd like
In the settings menu for the plugin, include a configuration option for turning on the "Gather from Revisions" action. Have a note next to the option that tells the user that if the feature is enabled, all outputs ever created will be saved to a notebook, which has the potential for increasing the size of the notebook many times. Add documentation for how to do this to the README.

Describe alternatives you've considered
Leave Gather to Revisions in by default. That said, our usability study suggested the tool is most useful for slicing and gathering code that was executed out of order, not gathering from revisions. I also feel that this feature, while exciting, hasn't yet reached the form where it will be obviously useful and easy-to-use in common cases where people want to use it. So for this tool to be useful, perhaps the feature of gathering from revisions should be opt-in only.

Additional context
N/A

Idea: support gathering functions at point of definition

Can functions be gathered from the point they are defined?

In the following example, f cannot be gathered by clicking on the function name or on the cell in which it is defined.

a = 1
# new cell
def f():
    return a+1

However, if I now assign f to a variable, that variable can be gathered, which collects f and a as well, as expected.

a = 1
# new cell
def f():
    return a+1
# new cell
b = f

Desired behavior is that after a function definition is evaluated, the function name (i.e., f) is highlighted indicating it is available for gathering.

Gather generates invalid Python code in simple scenario.

To repro with gather extension:

Open a notebook with these two cells:

#%%
A = [0,1,2,3]
B = [4,5,6,7]
sum = 0
diff_sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]
    diff_sum += A[i] - B[i]

#%%
print(sum)

Execute the first cell twice.
Execute the second cell.
Gather the second cell to a new notebook.

Expected: 2 cells

#%%
A = [0,1,2,3]
B = [4,5,6,7]
sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]

#%%
print(sum)

Actual: 3 cells

#%%
A = [0,1,2,3]
B = [4,5,6,7]
for i in range(min(len(A), len(B))):

#%%
A = [0,1,2,3]
B = [4,5,6,7]
sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]

#%%
print(sum)

The first cell is invalid Python code and shouldn't be included in the final gather.
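
One possible guard, independent of fixing the slicer itself, is to check that each gathered cell at least parses before emitting it. The sketch below uses Python's standard `ast` module for illustration; `is_valid_python` is a hypothetical helper, and the cell contents are the three cells from the "Actual" output above.

```python
import ast


def is_valid_python(cell_source):
    """Return True if the cell parses as Python source."""
    try:
        ast.parse(cell_source)
    except SyntaxError:  # also covers IndentationError
        return False
    return True


gathered = [
    "A = [0,1,2,3]\nB = [4,5,6,7]\nfor i in range(min(len(A), len(B))):\n",
    "A = [0,1,2,3]\nB = [4,5,6,7]\nsum = 0\n"
    "for i in range(min(len(A), len(B))):\n    sum += A[i] + B[i]\n",
    "print(sum)\n",
]
valid_cells = [cell for cell in gathered if is_valid_python(cell)]
```

Dropping unparseable cells only hides the symptom; the slice from the first execution of the cell should not have been gathered at all.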
