niall-turbitt / e2e-mlops
[DEPRECATED] Demo repository implementing an end-to-end MLOps workflow on Databricks. Project derived from dbx basic python template
Hi,
When running the telco-churn-initial-model-train-register job, I managed to pass the deploy step but failed at the launch step. Specifically, the second task, feature creation, failed with the error below. It looks like the table ibm_telco_churn.bronze_customers could not be found.
Thank you.
AnalysisException: Table or view not found: ibm_telco_churn.bronze_customers;
'UnresolvedRelation [ibm_telco_churn, bronze_customers], [], false
22/08/03 08:42:45 INFO ProgressReporter$: Removed result fetcher for 2396866968764080233_7828214349434961232_job-443271849362666-run-1125-action-6888937696656292
22/08/03 08:42:45 INFO DriverCorral$: Cleaning the wrapper ReplId-21436-1cf17-0ce06-9 (currently in status Idle(ReplId-21436-1cf17-0ce06-9))
22/08/03 08:42:45 INFO DriverCorral$: sending shutdown signal for REPL ReplId-21436-1cf17-0ce06-9
22/08/03 08:42:45 INFO StraceUtils$: python shell exit code: 143
22/08/03 08:42:45 INFO PythonDriverLocalBase$RedirectThread: Python RedirectThread exit
22/08/03 08:42:45 INFO PythonDriverLocalBase$RedirectThread: Python RedirectThread exit
22/08/03 08:42:45 INFO DriverCorral$: sending the interrupt signal for REPL ReplId-21436-1cf17-0ce06-9
22/08/03 08:42:45 INFO DriverCorral$: waiting for localThread to stop for REPL ReplId-21436-1cf17-0ce06-9
22/08/03 08:42:45 INFO DriverCorral$: ReplId-21436-1cf17-0ce06-9 successfully discarded
22/08/03 08:42:46 INFO MlflowAutologEventPublisher$: Subscriber with repl ID ReplId-21436-1cf17-0ce06-9 not responding to health checks, removing it
Hello,
Thanks for setting up the e2e demo; it is very helpful. I'm trying to reproduce the process, but I got the following error during the feature-table-creation step.
AnalysisException: Table or view not found: ibm_telco_churn.bronze_customers;
'UnresolvedRelation [ibm_telco_churn, bronze_customers], [], false
May I know where I can get the ibm_telco_churn.bronze_customers dataset? Thanks!
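Before re-running the feature-table-creation step, it may help to confirm whether the source table is visible from the cluster at all. The following is a minimal sketch, not part of the repo: `table_exists` is a hypothetical helper, and it assumes an active `spark` session such as the one a Databricks notebook provides.

```python
def table_exists(spark, database: str, table: str) -> bool:
    """Return True if `table` is listed under `database` in the metastore.

    Hypothetical helper (not part of e2e-mlops). `spark` is assumed to be
    an active SparkSession.
    """
    # SHOW DATABASES returns one row per database; the name is the first column.
    databases = {row[0] for row in spark.sql("SHOW DATABASES").collect()}
    if database not in databases:
        return False
    # SHOW TABLES returns rows that include a `tableName` column.
    rows = spark.sql(f"SHOW TABLES IN {database}").collect()
    return table in {row["tableName"] for row in rows}
```

In a notebook attached to the same cluster, `table_exists(spark, "ibm_telco_churn", "bronze_customers")` returning False would suggest that the bronze table has not been created in that workspace yet.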
Hi,
I'm following the tutorial to run an e2e MLOps pipeline. After setting up my Databricks CLI and GitHub secrets, I ran the first workflow step:
dbx deploy --jobs=telco-churn-initial-model-train-register --environment=prod --files-only
dbx launch --job=telco-churn-initial-model-train-register --environment=prod --as-run-submit --trace
However, the deployment failed with the following error.
May I know if I have missed any step in between? Thank you.
[dbx][2022-08-02 11:41:28.299] Starting new deployment for environment prod
[dbx][2022-08-02 11:41:28.311] Using profile provided from the project file
[dbx][2022-08-02 11:41:28.318] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2022-08-02 11:41:28.318] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2022-08-02 11:41:28.318] Profile e2-demo-west will be used for deployment
[dbx][2022-08-02 11:41:30.899] Auto-discovery found deployment file conf/deployment.yml
[dbx][2022-08-02 11:41:30.939] Deployment will be performed only for the following jobs: ['telco-churn-initial-model-train-register']
Traceback (most recent call last):
File "/Users/di_lin/opt/anaconda3/envs/erebus/bin/dbx", line 8, in <module>
sys.exit(cli())
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/deploy.py", line 169, in deploy
_preprocess_deployment(deployment, requested_jobs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/deploy.py", line 254, in _preprocess_deployment
deployment["jobs"] = _preprocess_jobs(deployment["jobs"], requested_jobs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/deploy.py", line 263, in _preprocess_jobs
raise Exception(f"Job {requested_job_name} was requested, but not provided in deployment file")
Exception: Job telco-churn-initial-model-train-register was requested, but not provided in deployment file
[dbx][2022-08-02 11:41:32.726] Launching job telco-churn-initial-model-train-register on environment prod
[dbx][2022-08-02 11:41:32.728] Using profile provided from the project file
[dbx][2022-08-02 11:41:32.729] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2022-08-02 11:41:32.729] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2022-08-02 11:41:32.729] Profile e2-demo-west will be used for deployment
[dbx][2022-08-02 11:41:34.459] No additional tags provided
Traceback (most recent call last):
File "/Users/di_lin/opt/anaconda3/envs/erebus/bin/dbx", line 8, in <module>
sys.exit(cli())
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 151, in launch
run_info = _find_deployment_run(filter_string, additional_tags, as_run_submit, environment)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 250, in _find_deployment_run
raise Exception(
Exception: "
Run Submit API is available only when deployment was done with --files-only flag.
Currently there is no deployments with such flag under given filters.
Please re-deploy with --files-only flag and then re-run this launch command.
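The first traceback above is raised when a requested job name has no match among the jobs defined in the deployment file. The second then follows because no deployment was created for the launch to find. Another report on this page launches the job as PROD-telco-churn-initial-model-train-register, so the names in conf/deployment.yml may carry an environment prefix. The sketch below mirrors that name check so that a mismatch can be spotted before calling dbx deploy. It assumes each job entry is a mapping with a `name` key; `find_missing_jobs` and the sample job list are hypothetical.

```python
from typing import Dict, List


def find_missing_jobs(jobs: List[Dict], requested: List[str]) -> List[str]:
    """Return the requested job names that are not defined in `jobs`."""
    defined = {job.get("name") for job in jobs}
    return [name for name in requested if name not in defined]


# Illustrative job list; the real one would be read from conf/deployment.yml.
jobs = [{"name": "PROD-telco-churn-initial-model-train-register"}]

missing = find_missing_jobs(jobs, ["telco-churn-initial-model-train-register"])
print(missing)  # ['telco-churn-initial-model-train-register']
```

An empty result means every requested name is defined; a non-empty one lists the names to correct in the --jobs argument.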
Hi @niall-turbitt et al., this is a great demonstration of MLOps. Could you set a license for this repo, preferably the MIT License, so that I can use it as a starting point?
Hi,
After successfully completing the deploy step, I got the following error when executing dbx launch --job=PROD-telco-churn-initial-model-train-register --environment=prod --as-run-submit --trace.
I'm quite sure I have set the access token for my prod environment. Any idea why the connection might fail?
Thank you.
[dbx][2022-08-03 10:03:04.737] Deployment for environment prod finished successfully ✨
[dbx][2022-08-03 10:03:07.213] Launching job PROD-telco-churn-initial-model-train-register on environment prod
[dbx][2022-08-03 10:03:07.216] Using profile provided from the project file
[dbx][2022-08-03 10:03:07.216] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2022-08-03 10:03:07.216] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2022-08-03 10:03:07.216] Profile prd will be used for deployment
[dbx][2022-08-03 10:03:09.464] No additional tags provided
[dbx][2022-08-03 10:03:09.467] Successfully found deployment per given job name
[dbx][2022-08-03 10:03:10.763] Launching job via run submit API
Traceback (most recent call last):
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/databricks_cli/sdk/api_client.py", line 138, in perform_query
resp.raise_for_status()
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://adb-5150270794210211.11.azuredatabricks.net/api/2.0/jobs/runs/submit
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/di_lin/opt/anaconda3/envs/erebus/bin/dbx", line 8, in <module>
sys.exit(cli())
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 173, in launch
run_data, job_id = run_launcher.launch()
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 331, in launch
run_data = _submit_run(self.api_client, job_spec)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 400, in _submit_run
return api_client.perform_query("POST", "/jobs/runs/submit", data=payload)
File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/databricks_cli/sdk/api_client.py", line 146, in perform_query
raise requests.exceptions.HTTPError(message, response=e.response)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://adb-xxxxxxxxxxxxxxxx.11.azuredatabricks.net/api/2.0/jobs/runs/submit
Response from server:
{ 'error_code': 'INVALID_PARAMETER_VALUE',
'message': 'Node type i3.xlarge is not supported. Supported node types: '
'Standard_DS3_v2, Standard_DS4_v2, Standard_DS5_v2, '
'Standard_D4s_v3, Standard_D8s_v3, Standard_D16s_v3, '
'Standard_D32s_v3, Standard_D64s_v3, Standard_D4a_v4, '
'Standard_D8a_v4, Standard_D16a_v4, Standard_D32a_v4, '
'Standard_D48a_v4, Standard_D64a_v4, Standard_D96a_v4, '
'Standard_D8as_v4, Standard_D16as_v4, Standard_D32as_v4, '
'Standard_D48as_v4, Standard_D64as_v4, Standard_D96as_v4, '
'Standard_D4ds_v4, Standard_D8ds_v4, Standard_D16ds_v4, '
'Standard_D32ds_v4, Standard_D48ds_v4, Standard_D64ds_v4, '
'Standard_D3_v2, Standard_D4_v2, Standard_D5_v2, Standard_D8_v3, '
'Standard_D16_v3, Standard_D32_v3, Standard_D64_v3, '
'Standard_D4d_v4, Standard_D8d_v4, Standard_D16d_v4, '
'Standard_D32d_v4, Standard_D48d_v4, Standard_D64d_v4, '
'Standard_D12_v2, Standard_D13_v2, Standard_D14_v2, '
'Standard_D15_v2, Standard_DS12_v2, Standard_DS13_v2, '
'Standard_DS14_v2, Standard_DS15_v2, Standard_E8_v3, '
'Standard_E16_v3, Standard_E32_v3, Standard_E64_v3, '
'Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, '
'Standard_E64s_v3, Standard_E4d_v4, Standard_E8d_v4, '
'Standard_E16d_v4, Standard_E20d_v4, Standard_E32d_v4, '
'Standard_E48d_v4, Standard_E64d_v4, Standard_E4ds_v4, '
'Standard_E8ds_v4, Standard_E16ds_v4, Standard_E20ds_v4, '
'Standard_E32ds_v4, Standard_E48ds_v4, Standard_E64ds_v4, '
'Standard_E80ids_v4, Standard_E4a_v4, Standard_E8a_v4, '
'Standard_E16a_v4, Standard_E20a_v4, Standard_E32a_v4, '
'Standard_E48a_v4, Standard_E64a_v4, Standard_E96a_v4, '
'Standard_E4as_v4, Standard_E8as_v4, Standard_E16as_v4, '
'Standard_E20as_v4, Standard_E32as_v4, Standard_E48as_v4, '
'Standard_E64as_v4, Standard_E96as_v4, Standard_E4s_v4, '
'Standard_E8s_v4, Standard_E16s_v4, Standard_E20s_v4, '
'Standard_E32s_v4, Standard_E48s_v4, Standard_E64s_v4, '
'Standard_E80is_v4, Standard_L4s, Standard_L8s, Standard_L16s, '
'Standard_L32s, Standard_F4, Standard_F8, Standard_F16, '
'Standard_F4s, Standard_F8s, Standard_F16s, Standard_H8, '
'Standard_H16, Standard_F4s_v2, Standard_F8s_v2, '
'Standard_F16s_v2, Standard_F32s_v2, Standard_F64s_v2, '
'Standard_F72s_v2, Standard_NC12, Standard_NC24, '
'Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, '
'Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, '
'Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_L8s_v2, '
'Standard_L16s_v2, Standard_L32s_v2, Standard_L64s_v2, '
'Standard_L80s_v2, Standard_D4s_v5, Standard_D8s_v5, '
'Standard_D16s_v5, Standard_D32s_v5, Standard_D48s_v5, '
'Standard_D64s_v5, Standard_D96s_v5, Standard_D4ds_v5, '
'Standard_D8ds_v5, Standard_D16ds_v5, Standard_D32ds_v5, '
'Standard_D48ds_v5, Standard_D64ds_v5, Standard_D96ds_v5, '
'Standard_E4s_v5, Standard_E8s_v5, Standard_E16s_v5, '
'Standard_E20s_v5, Standard_E32s_v5, Standard_E48s_v5, '
'Standard_E64s_v5, Standard_E96s_v5, Standard_E4ds_v5, '
'Standard_E8ds_v5, Standard_E16ds_v5, Standard_E20ds_v5, '
'Standard_E32ds_v5, Standard_E48ds_v5, Standard_E64ds_v5, '
'Standard_E96ds_v5'}
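The server response points to the cluster definition rather than to authentication. The job requests the AWS instance type i3.xlarge, while the workspace URL is an Azure Databricks one, which only accepts the Azure node types listed above. A possible fix is to change `node_type_id` in the deployment file to one of the supported types. The sketch below shows the substitution on an already-parsed job spec; `replace_node_type` and the sample spec are hypothetical, and Standard_DS3_v2 is only one choice from the supported list.

```python
import copy
from typing import Any


def replace_node_type(spec: Any, old: str, new: str) -> Any:
    """Return a deep copy of `spec` with node type ids equal to `old` set to `new`."""
    spec = copy.deepcopy(spec)

    def _walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ("node_type_id", "driver_node_type_id") and value == old:
                    node[key] = new  # overwrite in place; the key set is unchanged
                else:
                    _walk(value)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(spec)
    return spec


# Illustrative job spec; the real one lives in conf/deployment.yml.
job_spec = {
    "name": "PROD-telco-churn-initial-model-train-register",
    "new_cluster": {"node_type_id": "i3.xlarge", "num_workers": 1},
}

fixed = replace_node_type(job_spec, "i3.xlarge", "Standard_DS3_v2")
print(fixed["new_cluster"]["node_type_id"])  # Standard_DS3_v2
```

The function works on a copy, so the original spec is left unchanged and the two versions can be compared before the deployment file is edited.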
The README section at https://github.com/niall-turbitt/e2e-mlops#running-integration-tests says:
Please note that for testing we recommend using [jobless deployments](https://dbx.readthedocs.io/en/latest/run_submit.html), so you won't affect existing job definitions.
The jobless deployments link leads to a page-not-found error.