microsoft / gather
Spit shine for Jupyter notebooks 🧽✨
Home Page: https://microsoft.github.io/gather
License: MIT License
Moving the Binder conversation here so that we're not just chatting in a closed PR :-)
re: repository size, it's OK to include extra cruft up to a certain point. If it's "can I include a bunch of notebooks that might be big-ish" that's totally fine. If it's "I want to include a 500MB dataset in my repository" that might be more of a performance hit. The biggest time-sink in Binder is the time it takes to "Docker pull" onto a node when somebody launches a repository for the first time on that node. After that, Binder uses the docker cache within the same node and it is much faster.
re: this repository or another repository, it can be whatever you'd like! Many people just use whichever repository they use for their documentation.
Need to know if it can still be used.
Describe the bug
When a function references dependencies of class properties set either in the class constructor or class functions, those dependencies are not gathered.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Dependencies like variable declarations and module imports should be gathered.
Currently the TypeScript doesn't have a style guide, much less a consistent style. Using one could make it easier for contributors to know what style to use when contributing code. Consider using standard.
Describe the bug
In python3.jison on line 69, this.indents is incorrectly set. This corresponds to line 1064 in the generated python3.js file.
if (this.indents == undefined) this.indents == [0]; // should be this.indents = [0]
This causes the parser to throw a TypeError: Cannot read property 'length' of undefined at Object.anonymous [as performAction]
on the next line when the parser attempts to access this.indents.length:
if (this.indents.length > 1) {
The light blue and pink highlighting that Gather uses look great in the default grey+white Jupyter Lab theme, but highlighting REALLY highlights when you are in Dark mode or the Material Darker theme. Is there a way to tone things down that I am missing? Thanks!
In Issue #16, @micahjsmith proposed a new type of gather functionality. nbgather could be made more extensible to let others build new plugins like that one, which do new things with the gathered cells.
I imagine the interface for adding would involve a registerGatherCommand function, where the caller provides a command title, an icon, and a callback that will get triggered and provided with the gathered cells.
Let's use this issue to track interest in having an API for registering new gather commands, and design discussions about what that API would look like.
It's likely that installing the Jupyter notebook extension will be a pain for those trying out the demo, given that it will only work in a narrow band of Jupyter notebook versions. We may want to remove the Jupyter notebook extension as an option from the README, remove the nb directory of code, and only let people install the Jupyter Lab version.
If this is the case, we should leave a note in the README referring readers to a previous version of the project that included the Jupyter notebook implementation.
I'm not clear as to how gather actually works, whether it does a static analysis of a notebook, or whether it looks at cell execution / state history and then tries to identify the executed cells that influenced the execution of a given cell.
If it's a static analysis, given a notebook ipynb file, and an identified code cell, whether "code cell 5 of 10" or a code cell identified by a particular cell tag, it would be useful to be able to call something like gather("mynotebook.ipynb", code_cell=5) or gather("mynotebook.ipynb", code_id="final_chart") and return the gathered code cells with the identified code cell as the last cell.
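If the analysis were purely static, a call of the proposed shape could be approximated with Python's standard ast module. The sketch below is not how nbgather works internally (the project is written in TypeScript); the gather function, the code_cell parameter, and the name-based dependency rule are illustrative assumptions. It walks backwards from the chosen code cell and keeps every earlier code cell that defines a name the kept cells still need.

```python
import ast
import json


def _names(source):
    """Return (defined, used) name sets for one cell's source."""
    defined, used = set(), set()
    # Note: cells containing IPython magics will not parse with ast.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            # `x += 1` both reads and writes x.
            used.add(node.target.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used


def gather(path, code_cell):
    """Return the sources of the code cells that code cell `code_cell` (1-based) needs.

    Hypothetical helper: the identified cell is always the last item returned.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    cells = ["".join(c["source"]) for c in nb["cells"] if c["cell_type"] == "code"]
    target = code_cell - 1
    kept = [target]
    _, needed = _names(cells[target])
    for i in range(target - 1, -1, -1):
        defined, used = _names(cells[i])
        if defined & needed:
            kept.append(i)
            needed |= used
    return [cells[i] for i in reversed(kept)]
```

This over-approximates: once a name is needed, every earlier cell that assigns it is kept, and names local to a function body are treated like globals. It also ignores execution order, which matters for notebooks run out of order.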
Describe the solution you'd like
It would be great if we could collect usage data from people who are using code gathering tools in their work. This could help us decide on future improvements to this tool, and provide data that could be shared with other researchers and tool builders to help them build better notebook tools.
Data collection would be strictly opt-in, and would be off by default.
Additional context
To do this, we would need the following:
notebooks.berkeley.edu) that receives log events over a secure server
Some of the events we might want to collect are:
Is your feature request related to a problem? Please describe.
Have enough documentation that someone can figure out:
Describe the solution you'd like
We should have a README that describes the purpose of the tool, installation instructions, GIFs of using the major functions and of a more complex scenario, and a brief write-up about the implementation of the tool.
Describe alternatives you've considered
If pressed for time, we could link to the video figure and paper. We may also want to put this information on a github.io page or a project webpage instead of in the README.
Additional context
N/A
There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.
Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.
Describe the bug
When a symbol is imported then used within a method, the import is not included even if the method is included in the slice. Originally reported by @barik.
To Reproduce
Here's an example of the part of a notebook where this bug occurs.
Expected behavior
import re should also be highlighted.
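The original notebook excerpt is not reproduced above. As an illustration only, a cell sequence of the following shape would exercise the case described: a module imported in one cell and referenced only inside a method body. The class and variable names are hypothetical.

```python
# Cell 1
import re


# Cell 2
class Tokenizer:
    def split(self, text):
        # `re` is referenced only inside this method body.
        return re.split(r"\s+", text.strip())


# Cell 3
tokens = Tokenizer().split("spit shine for notebooks")
```

Gathering tokens would be expected to pull in all three cells, including the import in the first one.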
Desktop (please complete the following information):
Describe the bug
UPDATE: After experimenting further it seems that program slicing fails specifically on a lambda function which accepts at least 1 argument.
To Reproduce
Steps to reproduce the behavior:
Expected Behavior
Dependencies of lambda functions should be gathered.
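The reproduction steps are not included above. A hypothetical minimal case matching the description (a lambda that accepts one argument and depends on an earlier variable) might look like this; the names are made up for illustration.

```python
# Cell 1
scale = 2

# Cell 2
double = lambda x: x * scale  # lambda accepting one argument

# Cell 3
result = double(21)
```

Gathering result would be expected to include the cells defining both double and scale.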
My JupyterLab is version 2.2.0, the latest one.
I run the command
jupyter labextension install nbgather
The error looks like
An error occured.
ValueError: The extension "nbgather" does not yet support the current version of JupyterLab.
Conflicting Dependencies:
JupyterLab        Extension        Package
>=2.2.0 <2.3.0    >=1.2.0 <2.0.0   @jupyterlab/application
>=2.2.0 <2.3.0    >=1.2.0 <2.0.0   @jupyterlab/apputils
>=4.2.0 <4.3.0    >=3.2.0 <4.0.0   @jupyterlab/coreutils
>=2.2.0 <2.3.0    >=1.2.0 <2.0.0   @jupyterlab/docmanager
>=2.2.0 <2.3.0    >=1.2.0 <2.0.0   @jupyterlab/fileeditor
>=2.2.0 <2.3.0    >=1.2.1 <2.0.0   @jupyterlab/notebook
>=2.2.0 <2.3.0    >=1.2.0 <2.0.0   @jupyterlab/rendermime
Is your feature request related to a problem? Please describe.
A user might want to clear the history of a notebook, e.g., if they executed a cell with some sensitive data that they don't want to have stored to the notebook file.
Describe the solution you'd like
Add an action to the interface that lets someone "Clear History". This would then reset the execution history log, and make sure that any history metadata saved with the notebook is emptied.
Describe alternatives you've considered
Currently, an analyst could open up the ipynb file on their own, and delete the metadata that includes the execution history.
Additional context
One additional benefit of this is reducing storage space for notebooks with very long histories.
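The manual alternative can be scripted. The sketch below assumes the history is stored under a single notebook-level metadata key; the key name "history" is a placeholder, not nbgather's actual storage key, so inspect your own ipynb file before relying on it.

```python
import json


def clear_history(path, key="history"):
    """Remove one notebook-level metadata entry; return True if it was present.

    `key` is a placeholder name, not a documented nbgather field.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    metadata = nb.get("metadata", {})
    removed = key in metadata
    metadata.pop(key, None)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
        f.write("\n")
    return removed
```

Close the notebook in Jupyter before running something like this, otherwise the next save from the editor may write the old metadata back.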
Describe the bug
Originally reported in #23 for classes, but predates #28
The line after a class or function definition is included twice in the gathered program.
Here's an example illustrating the problem for funcdefs after executing the following program as a single cell in Jupyter:
def foo():
    print("Hello")
def bar():
    foo()
bar()
The gathered program looks like this:
def foo():
    print("Hello")
def bar():
def bar():
    foo()
bar()
bar()
Here's another example illustrating the problem for classes:
class Foo():
class Bar():
pass
v=1
Foo().Bar()
The gathered program looks like this:
class Foo():
class Bar():
pass
v=1
v=1
Foo().Bar()
Additional information
I'm somewhat out of my depth with debugging the parser, but it seems like the problem lies with the ILocation object that the parser returns, since for class and funcdefs, loc.last_column === 0 and loc.last_line is the true last line incremented by one. textSliceLines is then incorrectly computed based on the values of last_column and last_line in cellSlice.ts; in this case, lines 3 and 5 are included twice. I've added unit tests that illustrate this behavior here.
If I'm not mistaken, this means a fix will involve modifying the parser to return an ILocation object whose last_column and last_line properties have accounted for the dedent. Alternatively, the cellSlice.getTextSlice logic could be updated.
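One way to picture the second option is to normalize the location before slicing text. The sketch below is written in Python for brevity (the real code is TypeScript) and assumes 1-based line numbers and a 0-based, exclusive end column. Those conventions, the Location type, and the normalize helper are assumptions, not the parser's documented contract.

```python
from dataclasses import dataclass


@dataclass
class Location:
    first_line: int
    first_column: int
    last_line: int
    last_column: int


def normalize(loc, lines):
    """Pull an end position at column 0 back to the end of the previous line."""
    if loc.last_column == 0 and loc.last_line > loc.first_line:
        prev = loc.last_line - 1
        return Location(loc.first_line, loc.first_column, prev, len(lines[prev - 1]))
    return loc


lines = ["def foo():", '    print("Hello")', "def bar():", "    foo()", "bar()"]
# foo's definition reported as ending at line 3, column 0 (one line too far).
fixed = normalize(Location(1, 0, 3, 0), lines)
```

With the location normalized this way, the line following the definition would no longer fall inside the definition's range, so it would only be emitted once.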
When setting up a development environment for nbgather, linking the extension fails because the Jupyter Lab package versions in package.json are not compatible with the version of Jupyter Lab that is now downloaded by pip by default.
The project should be refactored to use APIs from the Jupyter Lab project version >=1.0.
This extension looks pretty good and will be very useful for many cases. But is it possible to use it directly with simple jupyter notebook instead of jupyter lab?
Today I learned about this cool extension thanks to @trallard!
How about adding the jupyterlab-extension topic so it would show up in https://github.com/topics/jupyterlab-extension?
Integration simply hasn't been implemented.
Describe the bug
The icons for "Open in notebook" and "Copy to clipboard" in the version browser are not appearing.
To Reproduce
Click on a variable or output, then click on "Gather to... Version Browser". You should be able to see that the icons are missing on the gather buttons.
Expected behavior
The icons should appear on the buttons, like so (screenshot taken from the version of gather currently hosted on BinderHub):
Additional context
This problem was probably introduced when gather started using Jupyter Lab libraries version >1.0 and the paths to the Jupyter icons changed. I already fixed the paths for the icons for the buttons in the Jupyter notebook toolbar in this commit, so we probably just need to point the CSS classes of the buttons in the version browser interface to the classes defined in that commit.
Describe the bug
A clear and concise description of what the bug is.
Once I install this extension, the jupyterlab manager doesn't work:
The problem disappears when the ext is uninstalled.
To Reproduce
Steps to reproduce the behavior:
jupyter labextension install nbgather
Screenshots
Seen above
Desktop (please complete the following information):
** Jupyter
jupyter core : 4.5.0
jupyter-notebook : 6.0.0
qtconsole : not installed
ipython : 7.6.1
ipykernel : 5.1.1
jupyter client : 5.3.1
jupyter lab : 1.0.2
nbconvert : 5.5.0
ipywidgets : 7.5.0
nbformat : 4.4.0
traitlets : 4.3.2
Additional context
I recall seeing this buggy behavior with another extension, "autoversion", but I didn't test it.
Is your feature request related to a problem? Please describe.
Currently, the slicer assumes that methods don't modify their arguments. While this assumption is often correct, sometimes it's not. And when methods do modify their arguments, the gathered notebook will be missing these methods, and hence code needed to reproduce a result.
Describe the solution you'd like
Basically, more accurate slicing, that's more likely to gather code that might not be needed than to leave it out.
For the exact implementation, I suggest modifying the slicer to assume that:
And providing an easy way for users to specify when methods don't modify their arguments. For example, they could provide a lightweight configuration file that looks like:
[
  {
    "obj-name": "m",
    "function-name": "fit",
    "does-not-modify": ["OBJECT"]
  }, {
    "function-name": "clean_data",
    "does-not-modify": [0, "auxiliary_data"]
  }
]
That is, a user could specify function calls that don't modify their arguments by the function-name, optionally the obj-name (the name of the object the function is called on), and a list of what the function does not modify. This can be the object the function was called on ("OBJECT"), positional arguments (e.g., 0 for the first argument), or keyword arguments (e.g., an argument named auxiliary_data).
The user could specify these rules of which methods don't modify their arguments in a Jupyter Lab settings editor. This could be populated with some defaults (e.g., some common Pandas data frame methods like df.head() and df.describe()).
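A sketch of how a slicer might consult such rules. It is in Python for brevity and is not nbgather code: the may_modify function, its default of treating everything as possibly modified, and the matching behaviour for rules without an obj-name are assumptions layered on the proposed file format.

```python
import json

RULES = json.loads("""
[
  {"obj-name": "m", "function-name": "fit", "does-not-modify": ["OBJECT"]},
  {"function-name": "clean_data", "does-not-modify": [0, "auxiliary_data"]}
]
""")


def may_modify(rules, function_name, target, obj_name=None):
    """Return True unless a matching rule lists `target` as not modified.

    `target` is "OBJECT", a positional index, or a keyword-argument name.
    """
    for rule in rules:
        if rule["function-name"] != function_name:
            continue
        # Assumption: a rule without "obj-name" applies to any receiver.
        if "obj-name" in rule and rule["obj-name"] != obj_name:
            continue
        if target in rule["does-not-modify"]:
            return False
    return True
```

Defaulting to True keeps the slicer conservative: a call with no matching rule is treated as modifying its receiver and arguments, so the code that follows it is gathered rather than dropped.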
Describe alternatives you've considered
The slicer could be improved to infer when functions modify their arguments. This would take some engineering effort that's not currently available.
The current implementation of the tool assumes that methods don't modify their arguments. I worry that this might make the tool unusable, as by default a lot of relevant code might be missing from slices.
Describe the bug
The extension "nbgather" does not yet support the current version of JupyterLab for
jupyterlab 3.0.14 pyhd8ed1ab_0 conda-forge
To Reproduce
Run in a terminal:
jupyter labextension install nbgather
Expected behavior
how to install this package
Screenshots
$ jupyter labextension install nbgather
An error occured.
ValueError: The extension "nbgather" does not yet support the current version of JupyterLab.
Conflicting Dependencies:
JupyterLab        Extension        Package
>=3.0.9 <3.1.0    >=1.2.0 <2.0.0   @jupyterlab/application
>=3.0.7 <3.1.0    >=1.2.0 <2.0.0   @jupyterlab/apputils
>=3.0.7 <3.1.0    >=1.2.0 <2.0.0   @jupyterlab/codeeditor
>=3.0.7 <3.1.0    >=1.2.0 <2.0.0   @jupyterlab/codemirror
>=5.0.5 <5.1.0    >=3.2.0 <4.0.0   @jupyterlab/coreutils
>=3.0.9 <3.1.0    >=1.2.0 <2.0.0   @jupyterlab/docmanager
>=3.0.9 <3.1.0    >=1.2.0 <2.0.0   @jupyterlab/fileeditor
>=3.0.9 <3.1.0    >=1.2.1 <2.0.0   @jupyterlab/notebook
>=3.0.8 <3.1.0    >=1.2.0 <2.0.0   @jupyterlab/rendermime
See the log file for details: /tmp/jupyterlab-debug-fyx19196.log
Desktop (please complete the following information):
Is your feature request related to a problem? Please describe.
If we let people explore the versions of slices that produced results by default, the tool may be unusable for long sessions. This is because outputs can take up a lot of memory and, when the notebook is saved, storage. To enable people to explore versions of slices with their output, we need to keep every version of every output produced.
Describe the solution you'd like
In the settings menu for the plugin, include a configuration option for turning on the "Gather from Revisions" action. Have a note next to the option that tells the user that if the feature is enabled, all outputs ever created will be saved to a notebook, which has the potential for increasing the size of the notebook many times. Add documentation for how to do this to the README.
Describe alternatives you've considered
Leave Gather to Revisions in by default. That said, our usability study suggested the tool is most useful for slicing and gathering code that was executed out of order, not gathering from revisions. I also feel that this feature, while exciting, hasn't yet reached the form where it will be obviously useful and easy-to-use in common cases where people want to use it. So for this tool to be useful, perhaps the feature of gathering from revisions should be opt-in only.
Additional context
N/A
Can functions be gathered from the point they are defined?
In the following example, f cannot be gathered by clicking on the function name or on the cell in which it is defined.
a = 1
# new cell
def f():
    return a+1
However, if I now assign f to a variable, that variable can be gathered, which collects f and a as well, as expected.
a = 1
# new cell
def f():
    return a+1
# new cell
b = f
Desired behavior is that after a function definition is evaluated, the function name (i.e., f) is highlighted, indicating it is available for gathering.
To repro with gather extension:
#%%
A = [0,1,2,3]
B = [4,5,6,7]
sum = 0
diff_sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]
    diff_sum += A[i] - B[i]
#%%
print(sum)
Expected: 2 cells
#%%
A = [0,1,2,3]
B = [4,5,6,7]
sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]
#%%
print(sum)
Actual: 3 cells
#%%
A = [0,1,2,3]
B = [4,5,6,7]
for i in range(min(len(A), len(B))):
#%%
A = [0,1,2,3]
B = [4,5,6,7]
sum = 0
for i in range(min(len(A), len(B))):
    sum += A[i] + B[i]
#%%
print(sum)
The first cell is invalid Python code and shouldn't be included in the final gather.
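A quick way to confirm that a gathered cell is not valid Python is to try compiling it. This check is a sketch for triaging gathered output, not part of nbgather; the helper name is made up for illustration.

```python
def is_valid_python(source):
    """Return True if `source` compiles as a Python module."""
    try:
        compile(source, "<gathered cell>", "exec")
    except SyntaxError:  # also covers IndentationError, a subclass
        return False
    return True


# First gathered cell from the "Actual" output: a for loop with no body.
first_cell = "A = [0,1,2,3]\nB = [4,5,6,7]\nfor i in range(min(len(A), len(B))):\n"

# Second gathered cell: the loop with its body intact.
second_cell = (
    "A = [0,1,2,3]\nB = [4,5,6,7]\nsum = 0\n"
    "for i in range(min(len(A), len(B))):\n    sum += A[i] + B[i]\n"
)
```

Here is_valid_python(first_cell) is False and is_valid_python(second_cell) is True, which matches the report that only the first cell is malformed.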