man-group / notebooker
Productionise & schedule your Jupyter Notebooks as easily as you wrote them.
License: GNU Affero General Public License v3.0
The fix will be to call addCallbacks() either for all rows or whenever the table of schedules is modified in some way.
Add to the Prerequisites/Setup docs:
npm run-script build
otherwise the schedule page will not work, failing with a 404 for the missing schedule_bundle.js.
Image URLs are broken here: https://pypi.org/project/notebooker/
The scheduler does not seem to be passing these parameters correctly to the executor for some reason. This needs to be investigated.
For example, there is code using six to handle Python 2/3 differences, but there is also code that requires 3.5+ (e.g. type annotation syntax), and the docs say 3.6+. As Python 2 went EOL earlier this year, it's probably a good idea to clean up the old code and dependencies.
It seems like the "don't generate code" command to nbconvert doesn't get sent when you ask for a rerun of a report which previously had this selected.
This means that the default behaviour of falling back to the notebook_template_examples does not work at present.
For example, .git is shown in the docker-compose setup from #14.
The URL in the "execute a notebook" sidebar should be configurable, so that we can link back to the repo if users so choose.
If the configuration is called something like GIT_REPO_BASE_URL, then we can also extrapolate the URL for the individual templates to link directly back to their source code in GitHub/BitBucket (only if GIT_REPO_BASE_URL has been defined).
It looks like the cron scheduler used in Notebooker differs both from regular cron and from the engine used to generate the type hints.
Days 1-5 in Notebooker mean Tue-Sat (not Mon-Fri), while the type hint resolves them to Mon-Fri.
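The off-by-one is consistent with a scheduler that numbers weekdays from 0 = Monday, which is how APScheduler's CronTrigger treats its day_of_week field, whereas standard cron uses 0 = Sunday. Whether Notebooker's scheduler is affected in exactly this way is an assumption here. One workaround is to spell weekdays as names, which both conventions read identically. A minimal sketch; the helper names are hypothetical:

```python
# Standard cron numbers weekdays 0-7, with both 0 and 7 meaning Sunday.
_CRON_DAY_NAMES = ["sun", "mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def _day_name(token: str) -> str:
    """Map one numeric weekday token to its name; leave '*' or names untouched."""
    return _CRON_DAY_NAMES[int(token)] if token.isdigit() else token


def dow_to_names(field: str) -> str:
    """Rewrite a standard-cron day-of-week field (e.g. '1-5') using day names."""
    converted = []
    for part in field.split(","):
        # Keep any step suffix such as '/2' as-is.
        base, sep, step = part.partition("/")
        if "-" in base:
            start, end = base.split("-", 1)
            base = f"{_day_name(start)}-{_day_name(end)}"
        else:
            base = _day_name(base)
        converted.append(base + sep + step)
    return ",".join(converted)


def normalise_crontab(expr: str) -> str:
    """Return a 5-field crontab with its weekday field spelled out as names."""
    minute, hour, day, month, dow = expr.split()
    return " ".join([minute, hour, day, month, dow_to_names(dow)])


print(normalise_crontab("40 10 * * 1-5"))  # 40 10 * * mon-fri
```

Named days avoid relying on either numbering convention, so the schedule and the human-readable hint can no longer disagree.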
It would be really useful to natively support ipynb files in notebooker, without requiring them to be converted to .py files first.
This would help reduce the cycle time from scratch notebook to automated report, if you could quickly change, commit and run via notebooker.
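Native .ipynb support would mostly mean reading the notebook JSON directly rather than a converted .py file. Papermill finds the defaults it overrides via a code cell tagged "parameters", so one first step could be locating that cell straight from the .ipynb file. A stdlib-only sketch; the function name is hypothetical and this is not how Notebooker currently loads templates:

```python
import json
from typing import Optional


def find_parameters_cell(notebook_json: str) -> Optional[str]:
    """Return the source of the first code cell tagged 'parameters', if any."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and "parameters" in tags:
            source = cell.get("source", "")
            # nbformat allows source as either a list of lines or a single string.
            return "".join(source) if isinstance(source, list) else source
    return None


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Report"]},
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": ["start_date = '2020-01-01'\n", "n_points = 100\n"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})
print(find_parameters_cell(example))
```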
Hi,
I am seeing this error occur when I run a notebook in Notebooker. The notebook runs fine when run locally.
Traceback (most recent call last):
File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/execute_notebook.py", line 184, in run_report
result_serializer.save_check_result(result)
File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 124, in save_check_result
self._save_to_db(notebook_result)
File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 73, in _save_to_db
self._save_raw_to_db(out_data)
File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 62, in _save_raw_to_db
self.library.replace_one({"_id": existing["_id"]}, out_data)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 907, in replace_one
collation=collation, session=session),
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 835, in _update_retryable
_update, session)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1099, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1076, in _retry_with_session
return func(session, sock_info, retryable)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 831, in _update
retryable_write=retryable_write)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 796, in _update
retryable_write=retryable_write).copy()
File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 247, in _wrapped
raise e
File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 241, in _wrapped
return orig_method(self, *args, **kwargs)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 501, in command
self._raise_connection_failure(error)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 649, in _raise_connection_failure
raise error
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 496, in command
collation=collation)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/network.py", line 107, in command
name, size, max_bson_size + message._COMMAND_OVERHEAD)
File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/message.py", line 709, in _raise_document_too_large
raise DocumentTooLarge("command document too large")
pymongo.errors.DocumentTooLarge: command document too large
The notebook result itself may be too large to be stored in MongoDB. Please address this issue.
Thank you
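For context, MongoDB caps a single BSON document at 16 MiB, and the usual remedy for larger payloads is GridFS, which stores the data as a series of chunk documents (pymongo exposes it as gridfs.GridFS(db).put(data)). The stdlib sketch below only illustrates the size check and the chunking idea; the helper names and the 64 KiB headroom are assumptions, not Notebooker's code:

```python
# MongoDB rejects BSON documents larger than 16 MiB (16,777,216 bytes).
MAX_BSON_BYTES = 16 * 1024 * 1024
# GridFS splits files into 255 KiB chunks by default.
GRIDFS_CHUNK_BYTES = 255 * 1024


def needs_external_storage(payload: bytes, headroom: int = 64 * 1024) -> bool:
    """True if the payload, plus room for the result's other fields, won't fit in one document."""
    return len(payload) + headroom > MAX_BSON_BYTES


def split_into_chunks(payload: bytes, chunk_size: int = GRIDFS_CHUNK_BYTES) -> list:
    """Split a payload into GridFS-style {'n': index, 'data': bytes} chunk records."""
    return [
        {"n": i, "data": payload[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(payload), chunk_size))
    ]


html = b"x" * (20 * 1024 * 1024)  # a hypothetical 20 MiB rendered report
if needs_external_storage(html):
    chunks = split_into_chunks(html)
    print(len(chunks), "chunks")  # 81 chunks
```

In practice the real GridFS API would handle chunking and reassembly; the point is that only a small reference to the file would need to live in the result document itself.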
Being able to read the scheduled cron time from within the notebook would improve the use case of Notebooker as a tool for generating periodic reports. That time might also need to be preserved if the same report is re-run.
40 10 * * 1-5 is running Tuesday to Saturday rather than Monday to Friday.
Currently it is only possible to choose the email subject (when the report is sent via email) if it is run from the command line. The UI lacks an option to customize it. Would be nice to have.
With pymongo==4.0.2, the following line results in AttributeError: 'Cursor' object has no attribute 'count'
notebooker/notebooker/serialization/mongo.py
Line 450 in 3277684
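PyMongo 4 removed Cursor.count() in favour of Collection.count_documents(filter), which has existed since PyMongo 3.7. The referenced line is not reproduced here, so as a sketch only: a small helper (hypothetical name) that prefers the new API and falls back to the old one on older PyMongo versions that lack it:

```python
def count_matching(collection, query: dict) -> int:
    """Count documents matching `query` on both old and new PyMongo versions.

    PyMongo 4 removed Cursor.count(); Collection.count_documents() (added in 3.7)
    is the replacement. Fall back to the old call only when the new one is missing.
    """
    if hasattr(collection, "count_documents"):
        return collection.count_documents(query)
    return collection.find(query).count()
```

A call site such as self.library.find(query).count() would then become count_matching(self.library, query).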
I want to try out notebooker. I don't have access to python on my laptop for enterprise reasons. I can work inside jupyterhub.
I run
mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb
conda install -y -c anaconda mongodb=6.0.2
pip install notebooker;python -m ipykernel install --user --name=notebooker_kernel
mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb
notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828
Both mongodb and notebooker start up successfully.
The jupyter-server-proxy is installed, so I would expect to be able to try out the Notebooker UI at
or
But as you can see I cannot.
The issue is that all assets are expected to be found at the root path /.
But they should not be. Either they should be found at some nested path that I can specify via a --prefix flag, or they should be referenced relatively as ./static, I believe.
I often run Panel successfully from the terminal in my JupyterHub. Panel lets me set a --prefix that is used to point to the static assets.
It's the same for other data app frameworks, for example Streamlit.
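Notebooker's web app is built on Flask, and a common way to serve any WSGI app under a sub-path is middleware that moves the prefix from PATH_INFO into SCRIPT_NAME (Werkzeug ships DispatcherMiddleware for this). A stdlib sketch, assuming it would be wrapped around app.wsgi_app; the class name is hypothetical, and it only fixes links built via url_for, so any asset paths hard-coded as /static/... in templates or JS bundles would still need changing:

```python
from typing import Callable, Iterable


class PrefixMiddleware:
    """Serve a WSGI app under a URL prefix such as '/some-subpath'.

    For requests under the prefix, the prefix is moved from PATH_INFO to
    SCRIPT_NAME, so frameworks that build URLs from SCRIPT_NAME (e.g. Flask's
    url_for) emit prefixed links. Anything outside the prefix gets a 404.
    """

    def __init__(self, app: Callable, prefix: str):
        self.app = app
        self.prefix = "/" + prefix.strip("/")

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if path == self.prefix or path.startswith(self.prefix + "/"):
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):] or "/"
            return self.app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
```

Usage with a Flask app would look like app.wsgi_app = PrefixMiddleware(app.wsgi_app, "/some-subpath"), with the prefix ideally coming from a CLI flag such as the requested --prefix.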
I followed the installation steps provided in the Notebooker documentation, but I am having issues with the report hunter not finding any updates. The logs show the following messages:
INFO:notebooker.web.app:Notebooker is now running at http://0.0.0.0:11828
INFO:notebooker.web.report_hunter:Found 0 updates since None.
INFO:notebooker.web.report_hunter:Found 0 updates since 2023-04-19 04:01:03.564799.
I'm not sure what the issue could be, and I would appreciate any guidance on how to troubleshoot and resolve this issue. Is there anything else I can check or try?
Thank you for your help.
And perhaps subdivided by parameters
As can be seen by running the example template and choosing to render a PDF while using the docker-compose setup from #14.
The current limit set in the report_hunter is 60 minutes. There should be an option to extend this.
Unzipping the multitude of JS files takes ages. Can we speed this up using e.g. webpack?
On notebook re-run, "Email to" is preserved but "Email from" isn't.
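Dropped settings on re-run (such as "Email from" here, or the "don't generate code" option mentioned earlier) are typical of a re-run that rebuilds its arguments field by field. One way to make such omissions less likely is to copy the whole original request and override only what changes. A sketch, not Notebooker's actual rerun code; RunRequest and make_rerun are hypothetical:

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class RunRequest:
    """Hypothetical container for everything needed to (re)run a report."""
    report_name: str
    overrides: dict = field(default_factory=dict)
    email_to: List[str] = field(default_factory=list)
    email_from: Optional[str] = None
    hide_code: bool = False


def make_rerun(original: RunRequest, **changes) -> RunRequest:
    """Copy every field of the original run, changing only what is asked for."""
    return replace(original, **changes)
```

Because dataclasses.replace copies all fields by default, adding a new option to RunRequest automatically carries it through re-runs.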
Support Reveal.js HTML slideshow option of nbconvert such that the output of a scheduled notebook can be a slide deck.
This could also simplify things, as the Dockerfile includes the running of tests at the moment.
It might make sense to split the target email address into two:
However, when the check completes it will save properly. In the time range from T0+1h to report completion, it appears that the report has completely failed, but it is working fine.
We may need to send a heartbeat to ensure that the report is not wrongly marked as failed/timed out when it is actually still running in the background.
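A sketch of the heartbeat idea, assuming the running job periodically writes a timezone-aware UTC timestamp somewhere the report hunter can read it. The function and parameter names are hypothetical; the timeout is a parameter defaulting to the 60 minutes mentioned above, so it could also be made configurable:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_timed_out(
    last_heartbeat: datetime,
    timeout: timedelta = timedelta(minutes=60),
    now: Optional[datetime] = None,
) -> bool:
    """Treat a run as timed out only if it has been silent for longer than `timeout`.

    Comparing against the latest heartbeat rather than the start time means a
    long-running report that keeps reporting in is never failed early.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > timeout
```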
Installation instructions don't run successfully/consecutively on a clean Windows machine.
https://notebooker.readthedocs.io/en/latest/setup.html
E.g. "pushd ./notebooker/web/static/" fails because the directory does not exist in the current working directory.
The width is too narrow
It should essentially just list the click command, as it is descriptive enough.
This email address belongs to a domain which doesn't exist. If someone responds either automatically or by mistake, a firewall may be triggered. This should be configurable by the user (perhaps as an attribute on the result object) and have a sensible default.
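A sketch of the suggested precedence: a per-result value first, then a deployment-wide setting, then a default. The environment variable NOTEBOOKER_EMAIL_FROM, the function name and the placeholder default are all hypothetical; a real default should use a domain the operator controls:

```python
import os
from typing import Mapping, Optional

# Hypothetical placeholder default; operators should override it.
DEFAULT_FROM = "notebooker@localhost"


def resolve_from_address(explicit: Optional[str] = None,
                         env: Mapping[str, str] = os.environ) -> str:
    """Pick the sender: per-result value, then deployment-wide setting, then default."""
    if explicit:
        return explicit
    return env.get("NOTEBOOKER_EMAIL_FROM", DEFAULT_FROM)
```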
Usually you won't have a plaintext password in an environment variable (I hope), so we need to allow users to specify their own connection methods. In future this should be extendable to other storage mechanisms, e.g. PostgreSQL.
This means they're incorrect unless your local timezone is equal to UTC.
In general, any newly generated report will have a 'last run' time of
babel.dates.format_timedelta(datetime.datetime.utcnow() - datetime.datetime.now())
and will just show your difference from UTC.
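The expression above shows the bug: a local-time "now" is subtracted from a UTC timestamp. A sketch of the fix, assuming stored timestamps are UTC (naive or aware); the function name is hypothetical, and its result could still be passed to babel.dates.format_timedelta as before:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def time_since_run(last_run: datetime, now: Optional[datetime] = None) -> timedelta:
    """Elapsed time since a run whose timestamp was stored in UTC.

    Naive timestamps are assumed to be UTC (as datetime.utcnow() produces), and
    'now' is taken in UTC too, so the local timezone never leaks into the result.
    """
    if last_run.tzinfo is None:
        last_run = last_run.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - last_run
```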
I would like to try out Notebooker deployed on Kubernetes.
For quick exploration purposes I just run the command
mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb;conda install -y -c anaconda mongodb=6.0.2;mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb &pip install notebooker;python -m ipykernel install --user --name=notebooker_kernel;notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828
inside my one docker container.
The container runs and deploys fine.
But I cannot deploy to "/" as this path serves other purposes; I need to deploy to /some-subpath. So when I go to
I see
I.e. I get a 404 Not Found:
[2022-10-12 11:58:38] "GET /mt-uk-mongodb-notebooker HTTP/1.1" 404 331 0.001099
With the other frameworks I deploy, like Panel, Streamlit, Dash, FastAPI and Flask, I can pass some --prefix to the server application.
But this does not seem to be available in Notebooker. Could you add it?
e.g. if you run reports for both Cowsay and cowsay, the capitalised version will take precedence.
Notebooker is a great and promising project, however I could not get it working on Windows.
I had to figure out some specific dependency versions to get it (almost fully) working in Docker
(in particular, pymongo==3.6, papermill==1.2.1).
I say "almost" because tests related to scheduling fail (and I had some runtime errors related to git functionality).
On Windows 10 I tried to do the same as what worked in Docker: the same Anaconda version (2020.07, python=3.7), the same MongoDB version (4.4.1), pymongo 3.6 and papermill 1.2.1.
Example notebooks get executed fine and results get stored in the database. But when I want to see them, the web interface shows an empty results table.
I've checked the underlying request:
http://127.0.0.1:11828/core/get_all_available_results?limit=100&report_name=sample/plot_random
it also gives empty output ([ ])
Builds aren't reproducible, as the dependencies aren't pinned.