rsokl / noggin

A simple tool for logging and plotting measurements during machine learning experiments

Home Page: https://noggin.readthedocs.io/en/latest

License: MIT License

Python 100.00%

matplotlib neural-network machine-learning data-visualization real-time livedata python

noggin's Introduction

noggin

Noggin is a simple Python tool for ‘live’ logging and plotting measurements during experiments. Although Noggin can be used in a general context, it is designed around the train/test and batch/epoch paradigm for training a machine learning model.

Noggin’s primary features are its abilities to:

Log batch-level and epoch-level measurements by name
Seamlessly update a ‘live’ plot of your measurements, embedded within a Jupyter notebook
Organize your measurements into a data set of arrays with labeled axes, via xarray
Save and load your measurements & live-plot session: resume your experiment later without a hitch
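The batch/epoch distinction can be illustrated with a minimal sketch: metrics are logged by name at the batch level and then rolled up into one epoch-level value each. The `MiniLogger` class and its methods are hypothetical and are not noggin's API. The sketch also assumes that an epoch-level value is the batch-size-weighted mean of that epoch's batch values.

```python
from collections import defaultdict


class MiniLogger:
    """Hypothetical, simplified stand-in for a batch/epoch logger (not noggin's API)."""

    def __init__(self):
        self.batch = defaultdict(list)     # metric name -> every batch-level value
        self.epoch = defaultdict(list)     # metric name -> one value per finished epoch
        self._pending = defaultdict(list)  # metric name -> (value, batch_size) for current epoch

    def set_batch(self, metrics, batch_size):
        # Log each named measurement for this batch.
        for name, value in metrics.items():
            self.batch[name].append(value)
            self._pending[name].append((value, batch_size))

    def end_epoch(self):
        # Assumed rule: epoch value = batch-size-weighted mean of the epoch's batch values.
        for name, pairs in self._pending.items():
            total = sum(n for _, n in pairs)
            self.epoch[name].append(sum(v * n for v, n in pairs) / total)
        self._pending.clear()


logger = MiniLogger()
logger.set_batch({"loss": 2.0}, batch_size=10)
logger.set_batch({"loss": 1.0}, batch_size=30)
logger.end_epoch()  # epoch-level loss: (2.0*10 + 1.0*30) / 40 = 1.25
```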

You can read more about Noggin in its documentation: https://noggin.readthedocs.io/en/latest

noggin's People

Contributors

lgtm-migrator

noggin's Issues

Is noggin coming to conda?

I like to use conda rather than pip to keep all of my packages in one place. Will noggin be coming to conda via conda install at any point?

Make metrics saveable/loadable as x-arrays

Live metrics are already handled as ordered dictionaries of numpy arrays; this is nearly exactly the data format needed to form an xarray of the metrics.

This would permit users to seamlessly access their data as N-dimensional arrays with labeled axes.
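A rough sketch shows how little reshaping this needs. The function below maps a dictionary of 1-D NumPy arrays onto the `(data_vars, coords)` pair that `xarray.Dataset` accepts, giving each metric its own labeled iteration axis. The function name and the `<metric>_iterations` dimension naming are hypothetical choices, not noggin's implementation.

```python
import numpy as np


def to_dataset_spec(metrics):
    """Map {metric name: 1-D array} to (data_vars, coords) in the form that
    xarray.Dataset(data_vars, coords) accepts.

    Each metric gets its own iteration dimension, so metrics logged at
    different rates (or lengths) do not have to share one axis."""
    data_vars, coords = {}, {}
    for name, values in metrics.items():
        values = np.asarray(values)
        dim = f"{name}_iterations"                    # hypothetical naming scheme
        data_vars[name] = ((dim,), values)            # (dims, data) tuple
        coords[dim] = np.arange(1, len(values) + 1)   # 1-based iteration labels
    return data_vars, coords
```

With xarray installed, `xarray.Dataset(*to_dataset_spec(metrics))` would then build the labeled data set. Giving each metric its own dimension avoids padding metrics that were logged a different number of times.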

LivePlot should be a true drop-in replacement for LiveLogger

Need to update docs afterwards

Warn users when plotting is a substantial portion of their loop time
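One possible approach is to time each plotting call against the total elapsed loop time and warn once the ratio passes a threshold. This is a sketch only. `PlotTimer`, the 20% default threshold, and the warm-up iteration count are all assumptions for illustration.

```python
import time
import warnings


class PlotTimer:
    """Hypothetical helper: tracks the fraction of wall time spent plotting
    and warns (once) when it exceeds `threshold`."""

    def __init__(self, threshold=0.2, min_iterations=10):
        self.threshold = threshold            # assumed acceptable plotting fraction
        self.min_iterations = min_iterations  # avoid warning on noisy early timings
        self.start = time.perf_counter()
        self.plot_time = 0.0
        self.iterations = 0
        self.warned = False

    def timed_plot(self, plot_fn):
        t0 = time.perf_counter()
        plot_fn()
        self.plot_time += time.perf_counter() - t0
        self.iterations += 1

        total = time.perf_counter() - self.start
        fraction = self.plot_time / total if total > 0 else 0.0
        if not self.warned and self.iterations >= self.min_iterations and fraction > self.threshold:
            warnings.warn(
                f"Plotting accounts for {fraction:.0%} of loop time; "
                "consider plotting less frequently."
            )
            self.warned = True
```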

liveplot needs a proper docs page

Provide a compressed-save method

My logged data is taking up too much space!
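NumPy already provides `np.savez_compressed`, which writes a zip archive of arrays with compression enabled. A minimal sketch of saving and loading a metrics dictionary this way follows. The helper names are hypothetical. How much space this saves depends on the data: noisy floating-point values compress less well than smooth or repeated ones.

```python
import io

import numpy as np


def save_metrics_compressed(file, metrics):
    """Save {metric name: array} as a compressed .npz archive.
    `file` may be a path or a binary file-like object."""
    np.savez_compressed(file, **{name: np.asarray(v) for name, v in metrics.items()})


def load_metrics(file):
    """Load every array from an .npz archive back into a dict."""
    with np.load(file) as data:
        return {name: data[name] for name in data.files}


# Round trip through an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
save_metrics_compressed(buffer, {"loss": np.linspace(1.0, 0.0, 1000)})
buffer.seek(0)
restored = load_metrics(buffer)
```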

Update binder notebook / env

x-axis values

Iteration number can be pretty unwieldy. It would be nice to have an option to label the x-axis by iteration number, epoch, etc.
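Assuming a fixed number of batches per epoch, labeling the x-axis in epochs is a simple rescaling of iteration numbers. The helper below is a hypothetical sketch of that conversion.

```python
def iterations_to_epochs(iterations, batches_per_epoch):
    """Convert 1-based batch iteration numbers to fractional epoch numbers.

    Assumes every epoch contains exactly `batches_per_epoch` batches."""
    if batches_per_epoch <= 0:
        raise ValueError("batches_per_epoch must be positive")
    return [i / batches_per_epoch for i in iterations]
```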

Create gif of liveplot in action

The README needs a brief gif that shows liveplot in action. It should show at least two metrics (e.g. loss and accuracy) being plotted with both batch and epoch-level statistics.

Limit data rate for plotting

Currently liveplot will plot all available data regardless of how much data that is. This can lead to large computational costs, making plotting a bottleneck.

We should establish a heuristic for limiting the amount of data being plotted. Ideally this would involve estimating the computational cost of each "draw" during live plotting, and how this scales with the amount of data available.

We would also want to estimate the maximum visually-resolvable density of data. That is, if I am drawing 10,000 points on a typically-sized plot, does drawing every 10th point look just the same as drawing every point?

With these two pieces of analysis, we should be able to arrive at a sensible default for limiting the number of points that we draw in a given call. We could potentially plot sliding-window averages to coarsen the plot.
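A minimal sketch of the coarsening idea, assuming a target of at most `max_points` drawn points, averages non-overlapping windows of consecutive values. This is block averaging rather than a true sliding window, which would still need subsampling afterward to reduce the point count. The function name and the use of window centers as x-values are assumptions.

```python
import numpy as np


def coarsen(values, max_points):
    """Reduce `values` to at most `max_points` points by averaging
    non-overlapping windows. Returns (x, y), where x is the mean 1-based
    iteration number of each window."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    values = np.asarray(values, dtype=float)
    n = len(values)
    if n <= max_points:
        return np.arange(1, n + 1, dtype=float), values

    window = int(np.ceil(n / max_points))  # smallest window keeping us under the cap
    n_full = (n // window) * window
    y = values[:n_full].reshape(-1, window).mean(axis=1)
    x = np.arange(n_full, dtype=float).reshape(-1, window).mean(axis=1) + 1
    if n_full < n:  # average the leftover tail as one final, shorter window
        y = np.append(y, values[n_full:].mean())
        x = np.append(x, np.arange(n_full, n).mean() + 1)
    return x, y
```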

Add support for alternate plotting backends

Abstract away the specific plotting backend (i.e. matplotlib) from LivePlot. The current version of LivePlot would thus become MatplotlibLivePlot and retain the matplotlib-specific functionality, while LivePlot itself would serve as an abstract base class that handles all of the metric logging, saving, refresh logic, etc.

Ultimately, it would be nice to support bokeh and toyplot as backends.
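A rough sketch of the proposed split, assuming the base class owns metric storage and delegates only drawing, could use Python's `abc` module. `LivePlotBase`, `set_batch`, and `draw` are hypothetical names. `RecordingLivePlot` is a toy subclass standing in for a real backend such as a matplotlib or bokeh implementation.

```python
from abc import ABC, abstractmethod


class LivePlotBase(ABC):
    """Hypothetical backend-agnostic base: stores metrics, delegates drawing."""

    def __init__(self, metric_names):
        self.metrics = {name: [] for name in metric_names}

    def set_batch(self, values):
        # Backend-independent logging logic lives here...
        for name, value in values.items():
            self.metrics[name].append(value)
        # ...and only the rendering step is backend-specific.
        self.draw()

    @abstractmethod
    def draw(self):
        """Render the current metrics using a specific plotting backend."""


class RecordingLivePlot(LivePlotBase):
    """Toy 'backend' that records each frame it would have drawn."""

    def __init__(self, metric_names):
        super().__init__(metric_names)
        self.frames = []

    def draw(self):
        self.frames.append({name: list(vals) for name, vals in self.metrics.items()})
```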

Plotting in server mode

Add the ability to serve logged data to a plotter. This would permit people to manage a live plot from a separate notebook, or from multiple notebooks.

This is an ambitious enhancement that has the potential for a large payoff. I would like to carefully consider the best means for serving/listening to data in a simple but robust way. I'd like to get input from others about how to move forward with this (@davidmascharka , @ptran516 , @arjunmajum)

Add tests for last-N batches

fix indentation

    # record training epoch
    if i%10 == 0 and i > 0:
        plotter.plot_train_epoch()

       # cue test-evaluation of model
       for x in np.linspace(0, 10, 5):
           x += (np.random.rand(1) - 0.5)*5
           test_metrics = {"accuracy": x**2}
           plotter.set_test_batch(test_metrics, batch_size=1)
       plotter.plot_test_epoch()
plotter.plot()  # ensures final data gets plotted
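Presumably the test-evaluation block is meant to be nested under the `if` that records a training epoch. The sketch below shows that nesting in runnable form. `StubPlotter` is a hypothetical stand-in that only records calls (its method names are taken from the snippet, not from noggin's docs), and the outer `for i in range(25)` loop is an assumption, since the snippet omits it.

```python
import numpy as np


class StubPlotter:
    """Hypothetical stand-in that records which plotter methods were called."""

    def __init__(self):
        self.calls = []

    def plot_train_epoch(self):
        self.calls.append("train_epoch")

    def set_test_batch(self, metrics, batch_size):
        self.calls.append("test_batch")

    def plot_test_epoch(self):
        self.calls.append("test_epoch")

    def plot(self):
        self.calls.append("plot")


plotter = StubPlotter()
for i in range(25):  # assumed outer training loop
    # record training epoch
    if i % 10 == 0 and i > 0:
        plotter.plot_train_epoch()

        # cue test-evaluation of model (same indentation level as the line above)
        for x in np.linspace(0, 10, 5):
            x += (np.random.rand(1) - 0.5) * 5
            test_metrics = {"accuracy": x**2}
            plotter.set_test_batch(test_metrics, batch_size=1)
        plotter.plot_test_epoch()

plotter.plot()  # ensures final data gets plotted
```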

Add ability to plot only last N batches (or maybe epochs?)
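If the full history is kept for saving, plotting only the last N batches can be a slice taken at draw time rather than discarding data. The helper below is a hypothetical sketch, and the accompanying checks cover the kinds of edge cases (N larger than the history, N of zero) that tests for this feature would likely want.

```python
import numpy as np


def last_n(values, n):
    """Return (iterations, values) for the most recent `n` batches.
    Iteration numbers are 1-based and refer to the full history."""
    if n < 0:
        raise ValueError("n must be non-negative")
    values = np.asarray(values)
    start = max(len(values) - n, 0)
    return np.arange(start + 1, len(values) + 1), values[start:]
```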

Logger should permit per-metric batch domains / missing data

Users should be able to set nan in their batch
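One way to tolerate missing data is to store `nan` for a metric that was not measured in a batch and skip those entries when computing the epoch value. A sketch, assuming batch-size-weighted epoch means, follows. `epoch_mean` is a hypothetical helper, not part of noggin.

```python
import numpy as np


def epoch_mean(batch_values, batch_sizes):
    """Batch-size-weighted epoch mean that ignores batches where the metric is NaN."""
    values = np.asarray(batch_values, dtype=float)
    weights = np.asarray(batch_sizes, dtype=float)
    measured = ~np.isnan(values)
    if not measured.any():
        return float("nan")  # metric was never recorded during this epoch
    return float(np.average(values[measured], weights=weights[measured]))
```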

Experiment with various tips for speeding up plotting

Investigate and document compatibility with Jupyter lab

At first glance, it looks like the %matplotlib notebook magic doesn't work.

recreate_plot should take a figsize argument

It would be lovely to be able to pass a figsize to recreate_plot so as not to end up with a minuscule plot. When I work outside of interactive mode (e.g. when I work in emacs), I'd like to be able to construct the plot at the size I want via an interface like:

plotter, fig, ax = recreate_plot(train_metrics=train, test_metrics=test, figsize=(8, 12))

rather than:

plotter, fig, ax = recreate_plot(train_metrics=train, test_metrics=test)
fig.set_size_inches(8, 12)