man-group / notebooker

Productionise & schedule your Jupyter Notebooks as easily as you wrote them.

License: GNU Affero General Public License v3.0

Dockerfile 0.44% Python 81.48% Smarty 1.37% JavaScript 8.17% CSS 0.39% HTML 7.00% SCSS 0.79% Jupyter Notebook 0.36%

jupyter jupyter-notebook jupyter-notebooks notebooks productionise publishing

notebooker's Issues

Delete button does not work for scheduler screen beyond page 1

The fix will be to call addCallbacks() either for all rows or whenever the table of schedules is modified in some way.

[Document] Missing Prerequisites & Setup Instruction

Add Prerequisites:

yarn

Setup:

You also need to run npm run-script build; otherwise the schedule page will not work and fails with a 404 error for the missing schedule_bundle.js.

Fix image URLs on PyPI

Image URLs are broken here: https://pypi.org/project/notebooker/

Bug: generate_pdf_output and hide_code_output not working from scheduler

The scheduler does not seem to be passing these parameters correctly to the executor for some reason. This needs to be investigated.

Grey out/disable button when rerun is clicked

Clean up shims for old python version

For example, some code uses six to handle Python 2/3 differences, while other code requires 3.5+ (e.g. type annotation syntax) and the docs say 3.6+. As Python 2 went EOL earlier this year, it's probably a good time to clean up the old code and dependencies.

Rerunning a report from the result screen doesn't hide code input

It seems like the "don't generate code" command to nbconvert doesn't get sent when you ask for a rerun of a report which previously had this selected.

If --py-template-base-dir is None, report run fails

This means that the default behaviour of falling back to the notebook_template_examples does not work at present.

"Delete all" button on report listing screen

Do not show hidden directories in PY_TEMPLATE_DIR

For example, .git is shown in the docker-compose setup from #14
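A minimal sketch of one way to skip hidden directories while listing templates, assuming the listing is built by walking the template directory. The function name iter_template_files is hypothetical and is not taken from the notebooker codebase.

```python
import os


def iter_template_files(template_dir):
    """Yield relative paths of .py and .ipynb files, skipping hidden entries.

    Hypothetical helper: the name and the accepted extensions are assumptions.
    """
    for root, dirs, files in os.walk(template_dir):
        # Prune in place so os.walk never descends into dot-directories such as .git.
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        for name in sorted(files):
            if name.startswith("."):
                continue
            if name.endswith((".py", ".ipynb")):
                yield os.path.relpath(os.path.join(root, name), template_dir)
```

Assigning to dirs[:] rather than rebinding dirs matters here: os.walk only honours in-place changes to the list when deciding which subdirectories to visit.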

Configurable link back to Notebook Templates git repo

The URL in the "execute a notebook" sidebar should be configurable, so that we can link back to the repo if users so choose.

If the configuration is called something like GIT_REPO_BASE_URL, then we can also extrapolate the URL for the individual templates to link directly back to their source code in GitHub/BitBucket (only if GIT_REPO_BASE_URL has been defined).

Incorrect cron-schedule hint on when it is to run next

It looks like the cron scheduler used in Notebooker numbers weekdays differently from regular cron and from the engine used to generate the schedule hint.
Days 1-5 in Notebooker mean Tue-Sat (not Mon-Fri), while the hint resolves them to Mon-Fri.
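One possible workaround, sketched under the assumption that the scheduler is APScheduler (whose cron trigger numbers weekdays from 0 = Monday, whereas standard cron uses 0 = Sunday), is to rewrite numeric weekdays into day names before the expression is handed to the scheduler. Day names mean the same thing under both conventions. The function name cron_dow_to_names is hypothetical.

```python
import re

# Standard cron numbering: 0 and 7 both mean Sunday.
CRON_DAY_NAMES = ["sun", "mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def cron_dow_to_names(cron_expression):
    """Replace numeric weekdays in a five-field cron expression with day names."""
    fields = cron_expression.split()
    if len(fields) != 5:
        raise ValueError("expected a five-field cron expression")
    converted = []
    for item in fields[4].split(","):
        # Leave any step value (the part after "/") untouched; it is a count, not a day.
        base, sep, step = item.partition("/")
        base = re.sub(r"\d+", lambda m: CRON_DAY_NAMES[int(m.group())], base)
        converted.append(base + sep + step)
    fields[4] = ",".join(converted)
    return " ".join(fields)
```

With this, cron_dow_to_names("40 10 * * 1-5") gives "40 10 * * mon-fri", which reads as Monday to Friday regardless of which weekday the scheduler treats as 0.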

Support ipynb files without requiring conversion

It would be really useful to natively support ipynb files in notebooker, without requiring them to be converted to .py files first.

This would help reduce the cycle time from scratch notebook to automated report, if you could quickly change, commit and run via notebooker.

pymongo.errors.DocumentTooLarge: command document too large error

Hi,

I am seeing this error occur when I run a notebook in Notebooker. The notebook runs fine when run locally.

Traceback (most recent call last):
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/execute_notebook.py", line 184, in run_report
    result_serializer.save_check_result(result)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 124, in save_check_result
    self._save_to_db(notebook_result)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 73, in _save_to_db
    self._save_raw_to_db(out_data)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 62, in _save_raw_to_db
    self.library.replace_one({"_id": existing["_id"]}, out_data)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 907, in replace_one
    collation=collation, session=session),
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 835, in _update_retryable
    _update, session)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1099, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1076, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 831, in _update
    retryable_write=retryable_write)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 796, in _update
    retryable_write=retryable_write).copy()
  File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 247, in _wrapped
    raise e
  File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 241, in _wrapped
    return orig_method(self, *args, **kwargs)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 501, in command
    self._raise_connection_failure(error)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 649, in _raise_connection_failure
    raise error
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 496, in command
    collation=collation)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/network.py", line 107, in command
    name, size, max_bson_size + message._COMMAND_OVERHEAD)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/message.py", line 709, in _raise_document_too_large
    raise DocumentTooLarge("command document too large")
pymongo.errors.DocumentTooLarge: command document too large

The notebook result itself is maybe too large to be stored in mongo. Please address this issue.

Thank you
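For context, MongoDB caps a single BSON document at 16 MiB, so a result with very large rendered output cannot be saved as one document. A rough sketch of one mitigation is to separate oversized text or binary fields from the metadata and store them elsewhere (for example in GridFS). The function name partition_result, the field names in the usage note and the 1 MiB inline threshold are all assumptions for illustration, not notebooker's actual schema.

```python
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit


def partition_result(result, inline_limit=MAX_BSON_DOCUMENT_BYTES // 16):
    """Split a result dict into (inline, oversized) by the size of str/bytes values.

    Hypothetical helper: values larger than inline_limit bytes are returned
    separately so the caller can store them outside the main document.
    """
    inline, oversized = {}, {}
    for key, value in result.items():
        if isinstance(value, str):
            size = len(value.encode("utf-8"))
        elif isinstance(value, (bytes, bytearray)):
            size = len(value)
        else:
            size = 0  # non-text values are assumed to be small in this sketch
        (oversized if size > inline_limit else inline)[key] = value
    return inline, oversized
```

For a result such as {"report_name": "sample", "raw_html": "<very large string>"}, the report name stays inline and raw_html is returned in the oversized dict.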

Add option to pass scheduled cron time to the notebook

Being able to read the scheduled cron time from within the notebook would improve the use case of using notebooker as a tool to generate periodic reports. That time might also need to be preserved if the same report is re-run.

Native cron scheduler doesn't match convention

40 10 * * 1-5 is running Tuesday to Saturday rather than Monday to Friday.

Add option to customize email subject

Currently it is only possible to choose the email subject (when a report is sent via email) if it is run from the command line. The UI lacks the option to customize it. It would be nice to have.

AttributeError: 'Cursor' object has no attribute 'count'

With pymongo==4.0.2, the following line results in AttributeError: 'Cursor' object has no attribute 'count'

notebooker/notebooker/serialization/mongo.py

Line 450 in 3277684

return self._get_raw_results({"report_name": report_name}, {}, 0).count()
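PyMongo 4 removed Cursor.count(); the supported replacement is Collection.count_documents(), available since PyMongo 3.7. A minimal sketch of the equivalent count follows. The function name count_results is hypothetical, and the sketch queries the collection directly because any extra filtering that _get_raw_results applies is not shown in the issue.

```python
def count_results(collection, report_name):
    """Count stored results for a report without relying on Cursor.count().

    `collection` is expected to behave like a pymongo Collection.
    """
    return collection.count_documents({"report_name": report_name})
```

If _get_raw_results adds its own base filter, that filter would need to be merged into the query passed to count_documents.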

Report hunter thread should occasionally delete gridfs entries for reports marked as deleted

Enable me to explore on Jupyterhub behind reverse proxy

I want to try out notebooker. I don't have access to python on my laptop for enterprise reasons. I can work inside jupyterhub.

I run

mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb
conda install -y -c anaconda mongodb=6.0.2
pip install notebooker;python -m ipykernel install --user --name=notebooker_kernel

mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb

notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828

Both mongodb and notebooker start up successfully.

The jupyter-server-proxy is installed. So I would expect to be able to try out notebooker ui at

https://domain/namespace/user/user-name/proxy/11828

https://domain/namespace/user/user-name/proxy/11828/

But as you can see I cannot.

The issue is that all assets are expected to be found at the root /.

But they should not be. Either they should be served from a nested path that I can specify via a --prefix flag, or they should be referenced relatively as ./static, I believe.

Additional Context

I often run Panel successfully from the terminal in my JupyterHub. Panel lets me set a --prefix that is used to point to the static assets.

It's the same for other data app frameworks, for example Streamlit.

Error installing Notebooker and report hunter not finding updates

I followed the installation steps provided in the Notebooker documentation, but I am having issues with the report hunter not finding any updates. The logs show the following messages:

INFO:notebooker.web.app:Notebooker is now running at http://0.0.0.0:11828
INFO:notebooker.web.report_hunter:Found 0 updates since None.
INFO:notebooker.web.report_hunter:Found 0 updates since 2023-04-19 04:01:03.564799.

I'm not sure what the issue could be, and I would appreciate any guidance on how to troubleshoot and resolve this issue. Is there anything else I can check or try?

Thank you for your help.

Create a view of all report results divided by report name

And perhaps subdivided by parameters

Docker image doesn't have the tex packages needed to render PDF

As can be seen by running the example template and choosing to render a PDF while using the docker-compose setup from #14

Be able to configure a max timeout for long-running reports

The current limit set in the report_hunter is 60 minutes. There should be an option to extend this.

Improve install time from egg

Unzipping the multitude of JS files takes ages. Can we speed this up using e.g. webpack?

"Email From" not preserved on re-run

On notebook re-run "Email to" is preserved but "Email from" isn't

nbconvert --to slide support

Support Reveal.js HTML slideshow option of nbconvert such that the output of a scheduled notebook can be a slide deck.

https://nbconvert.readthedocs.io/en/latest/usage.html

Include Dockerfile in CI

This could also simplify things, as the Dockerfile includes the running of tests at the moment.

/latest-successful URL should use parameters if given

Add a button which displays the stdout of the job which executed the notebook

Different email address for failed and succeeded reports

It might make sense to split the target email address into two:

one for reports that succeeded - these normally go to the target audience
another for reports that failed - which might go to tech/support team

Bug: A very long-running check (>1h) will be marked as timed out

However, when the check completes it will save properly. In the time range from T0+1h to report completion, it appears that the report has completely failed, but it is working fine.

We need to potentially send a heartbeat to ensure that it is not improperly marked as having failed/timed out when it is actually running in the background.
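A rough sketch of the heartbeat idea, assuming the running job periodically records a UTC timestamp and the report hunter compares it with the current time instead of the job start time. The names heartbeat_now and is_stale and the five-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_now():
    """Timestamp the executing job would store periodically (timezone-aware UTC)."""
    return datetime.now(timezone.utc)


def is_stale(last_heartbeat, now, max_silence=timedelta(minutes=5)):
    """Return True only if no heartbeat has arrived within max_silence.

    A job running for hours is not stale as long as it keeps sending heartbeats.
    """
    return now - last_heartbeat > max_silence
```

Under this scheme a report that has been running for two hours but sent a heartbeat a minute ago stays marked as running, while one that has been silent for longer than max_silence can be marked as timed out.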

Improve Windows Installation Instructions

Installation instructions don't run successfully/consecutively on a clean Windows machine.
https://notebooker.readthedocs.io/en/latest/setup.html
E.g. "pushd ./notebooker/web/static/" fails because the directory does not exist in the current working directory.

Add ability to configure git branch

None parameter becomes "None"

Build in py38

Widen results display

The width is too narrow

Deleting a report should also delete the report on GridFS

Webapp configuration documentation is wrong

It should essentially just list out the click command as it is descriptive enough

Default from_email address is from a nonexistent domain

This email address belongs to a domain which doesn't exist. If someone responds either automatically or by mistake, a firewall may be triggered. This should be configurable by the user (perhaps as an attribute on the result object) and have a sensible default.

Add ability to add custom mongo connection logic

Usually you won't have a plaintext password in an environment variable (I hope) so we need to allow users to specify their own connection methods. This in future should be extendable to other storage mechanisms, e.g. postgres

Push directly to pypi from CI

'Last run X minutes ago' compares UTC to naive timestamp

This means they're incorrect unless your local timezone is equal to UTC.

In general, any newly generated report will have a 'last run' time of

babel.dates.format_timedelta(datetime.datetime.utcnow() - datetime.datetime.now())

and will just show your difference from UTC.
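A small sketch of the usual remedy: keep both timestamps timezone-aware (UTC) so that the subtraction does not depend on the machine's local timezone. The function name time_since is hypothetical; the result could then be passed to babel.dates.format_timedelta as before.

```python
from datetime import datetime, timedelta, timezone


def time_since(run_time, now=None):
    """Elapsed time since run_time; both datetimes must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    if run_time.tzinfo is None or now.tzinfo is None:
        # Refuse naive datetimes rather than silently mixing UTC with local time.
        raise ValueError("naive datetime passed; attach a timezone first")
    return now - run_time
```

Because aware datetimes are compared as instants, a report started at 04:00 UTC and checked at 06:05 in a UTC+2 zone is reported as five minutes old, not two hours and five minutes.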

Add ability to manually trigger a scheduled report

Enable me to run on kubernetes behind reverse proxy

I would like to try out notebooker deployed on kubernetes.

For quick exploration purposes I just run the command

mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb;conda install -y -c anaconda mongodb=6.0.2;mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb &pip install notebooker;python -m ipykernel install --user --name=notebooker_kernel;notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828

inside my one docker container.

The container runs and deploys fine.

But I cannot deploy to "/" as this path serves other purposes. I need to deploy to /some-subpath. So when I go to

https://domain/some-subpath

I get a 404 Not Found:

[2022-10-12 11:58:38] "GET /mt-uk-mongodb-notebooker HTTP/1.1" 404 331 0.001099

With other frameworks I deploy, like Panel, Streamlit, Dash, FastAPI and Flask, I can specify a --prefix for the server application.

But this does not seem to be available with notebooker. Could you add it?
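Until such an option exists, one generic approach for WSGI applications is to wrap the app in a small middleware that moves the prefix from PATH_INFO to SCRIPT_NAME. This is only a sketch: the class name PrefixMiddleware is hypothetical, and it helps only if the application builds its URLs from the request's script root (for example via Flask's url_for) rather than hard-coding paths such as /static.

```python
class PrefixMiddleware:
    """Serve a WSGI app under a URL prefix such as /some-subpath."""

    def __init__(self, app, prefix):
        self.app = app
        self.prefix = "/" + prefix.strip("/")

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == self.prefix or path.startswith(self.prefix + "/"):
            # Move the prefix into SCRIPT_NAME so URL generation includes it.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):] or "/"
            return self.app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found under this prefix"]
```

With a Flask application object, this would typically be applied as app.wsgi_app = PrefixMiddleware(app.wsgi_app, "/some-subpath") before the server starts.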

Grouped front page should be case-sensitive

e.g. if you run for Cowsay and cowsay, the capitalised version will take precedence.

MongoDB queries should work with sharded libraries

Results page is empty in Windows 10

Notebooker is a great and promising project; however, I could not get it working on Windows.

I had to figure out some specific dependency versions to be able to make it (almost fully) working in Docker
(in particular, pymongo==3.6, papermill==1.2.1)
I say "almost" because tests related to scheduling fail (and had some runtime errors related to git functionality).

On Windows 10 I tried to do the same as what worked in Docker: the same Anaconda version (2020.07 python=3.7), the same mongodb version (4.4.1), pymongo 3.6 and papermill 1.2.1.

Example notebooks get executed fine and results get stored in the database. But when I want to see them, the web interface shows an empty results table.
I've checked the underlying request:
http://127.0.0.1:11828/core/get_all_available_results?limit=100&report_name=sample/plot_random
it also gives empty output ([ ])

ERROR: Service 'git-repo-init' failed to build : Build failed

Sequence:

git clone
cd docker
docker-compose up

Freeze package dependencies

Builds aren't reproducible as the dependencies aren't pinned
