
e2e-mlops's People

Contributors

harrysalmon, jkbradley, niall-turbitt

e2e-mlops's Issues

Task feature-table-creation failed during launch

Hi,

When running the telco-churn-initial-model-train-register job, I got past the deploy step but the launch step failed. Specifically, the second task (feature creation) failed with the error below. It looks like the table ibm_telco_churn.bronze_customers could not be found.

Thank you.

AnalysisException: Table or view not found: ibm_telco_churn.bronze_customers;
'UnresolvedRelation [ibm_telco_churn, bronze_customers], [], false

22/08/03 08:42:45 INFO ProgressReporter$: Removed result fetcher for 2396866968764080233_7828214349434961232_job-443271849362666-run-1125-action-6888937696656292
22/08/03 08:42:45 INFO DriverCorral$: Cleaning the wrapper ReplId-21436-1cf17-0ce06-9 (currently in status Idle(ReplId-21436-1cf17-0ce06-9))
22/08/03 08:42:45 INFO DriverCorral$: sending shutdown signal for REPL ReplId-21436-1cf17-0ce06-9
22/08/03 08:42:45 INFO StraceUtils$: python shell exit code: 143
22/08/03 08:42:45 INFO PythonDriverLocalBase$RedirectThread: Python RedirectThread exit
22/08/03 08:42:45 INFO PythonDriverLocalBase$RedirectThread: Python RedirectThread exit
22/08/03 08:42:45 INFO DriverCorral$: sending the interrupt signal for REPL ReplId-21436-1cf17-0ce06-9
22/08/03 08:42:45 INFO DriverCorral$: waiting for localThread to stop for REPL ReplId-21436-1cf17-0ce06-9
22/08/03 08:42:45 INFO DriverCorral$: ReplId-21436-1cf17-0ce06-9 successfully discarded
22/08/03 08:42:46 INFO MlflowAutologEventPublisher$: Subscriber with repl ID ReplId-21436-1cf17-0ce06-9 not responding to health checks, removing it
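When a task dies with this error, it helps to confirm exactly which database and table Spark failed to resolve, since that shows which database the job is pointed at. The helper below is a hypothetical diagnostic, not part of e2e-mlops or dbx; it assumes the Spark 3.x message wording seen in the log above and tolerates the terminal color codes that sometimes end up in copied driver output.

```python
import re
from typing import Optional, Tuple

# Matches the Spark message seen in the driver log, e.g.
# "Table or view not found: ibm_telco_churn.bronze_customers;"
_NOT_FOUND = re.compile(r"Table or view not found:\s*([\w.`]+?);?\s*$", re.MULTILINE)


def missing_table(log_text: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (database, table) from a 'Table or view not found' message, or None.

    The database is None when the name in the message is unqualified.
    """
    match = _NOT_FOUND.search(log_text)
    if match is None:
        return None
    name = match.group(1).replace("`", "")  # drop any backtick quoting
    database, _, table = name.rpartition(".")
    return (database or None, table)
```

For the log above, `missing_table(...)` yields `("ibm_telco_churn", "bronze_customers")`, pointing at the bronze layer table the feature-table-creation task expects to exist before it runs.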

Data table error

Hello,
Thanks for setting up the e2e demo. This is very helpful. I'm trying to reproduce the process, but got the following error during the feature-table-creation step.

AnalysisException: Table or view not found: ibm_telco_churn.bronze_customers;
'UnresolvedRelation [ibm_telco_churn, bronze_customers], [], false

May I know where I can get the ibm_telco_churn.bronze_customers dataset? Thanks!
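Wherever the data ends up coming from, the feature-table-creation task could fail earlier and more clearly by checking its inputs before doing any work. The sketch below is hypothetical: `REQUIRED_TABLES` lists only the table named in the error, and `existing` is assumed to come from something like `[f"{t.database}.{t.name}" for t in spark.catalog.listTables("ibm_telco_churn")]` in a Databricks notebook.

```python
from typing import Iterable, List

# Hypothetical input list; only ibm_telco_churn.bronze_customers is confirmed
# by the error above.
REQUIRED_TABLES = ["ibm_telco_churn.bronze_customers"]


def missing_tables(required: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return the required 'db.table' names absent from `existing` (case-insensitive)."""
    present = {name.lower() for name in existing}
    return [name for name in required if name.lower() not in present]


def assert_tables_exist(required: Iterable[str], existing: Iterable[str]) -> None:
    """Raise RuntimeError listing every missing input table."""
    gaps = missing_tables(required, existing)
    if gaps:
        raise RuntimeError("Missing input tables: " + ", ".join(gaps))
```

The comparison is lowercased because Spark table names are case-insensitive by default.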

Job telco-churn-initial-model-train-register was requested, but not provided in deployment file

Hi,

I'm following the tutorial to run an e2e MLOps pipeline. After setting up my Databricks CLI and GitHub secrets, I ran the first workflow step:

    dbx deploy --jobs=telco-churn-initial-model-train-register --environment=prod --files-only
    dbx launch --job=telco-churn-initial-model-train-register --environment=prod --as-run-submit --trace

However, the deployment failed with the following error.

May I know if I have missed any step in between? Thank you.

[dbx][2022-08-02 11:41:28.299] Starting new deployment for environment prod
[dbx][2022-08-02 11:41:28.311] Using profile provided from the project file
[dbx][2022-08-02 11:41:28.318] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2022-08-02 11:41:28.318] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2022-08-02 11:41:28.318] Profile e2-demo-west will be used for deployment
[dbx][2022-08-02 11:41:30.899] Auto-discovery found deployment file conf/deployment.yml
[dbx][2022-08-02 11:41:30.939] Deployment will be performed only for the following jobs: ['telco-churn-initial-model-train-register']
Traceback (most recent call last):
  File "/Users/di_lin/opt/anaconda3/envs/erebus/bin/dbx", line 8, in <module>
    sys.exit(cli())
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/deploy.py", line 169, in deploy
    _preprocess_deployment(deployment, requested_jobs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/deploy.py", line 254, in _preprocess_deployment
    deployment["jobs"] = _preprocess_jobs(deployment["jobs"], requested_jobs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/deploy.py", line 263, in _preprocess_jobs
    raise Exception(f"Job {requested_job_name} was requested, but not provided in deployment file")
Exception: Job telco-churn-initial-model-train-register was requested, but not provided in deployment file
[dbx][2022-08-02 11:41:32.726] Launching job telco-churn-initial-model-train-register on environment prod
[dbx][2022-08-02 11:41:32.728] Using profile provided from the project file
[dbx][2022-08-02 11:41:32.729] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2022-08-02 11:41:32.729] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2022-08-02 11:41:32.729] Profile e2-demo-west will be used for deployment
[dbx][2022-08-02 11:41:34.459] No additional tags provided
Traceback (most recent call last):
  File "/Users/di_lin/opt/anaconda3/envs/erebus/bin/dbx", line 8, in <module>
    sys.exit(cli())
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 151, in launch
    run_info = _find_deployment_run(filter_string, additional_tags, as_run_submit, environment)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 250, in _find_deployment_run
    raise Exception(
Exception: "
                Run Submit API is available only when deployment was done with --files-only flag.
                Currently there is no deployments with such flag under given filters.
                Please re-deploy with --files-only flag and then re-run this launch command.
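The deploy traceback shows dbx's `_preprocess_jobs` rejecting the name passed to `--jobs` because no job with that exact name exists in the deployment file for the prod environment. The launch error that follows appears to be a consequence rather than a separate problem: the deploy step raised, so no `--files-only` deployment was recorded for `dbx launch --as-run-submit` to find. The sketch below is a hypothetical pre-flight re-implementation of that name check, not dbx's own code; it assumes the per-environment section of conf/deployment.yml has been parsed into a dict with a `jobs` list whose entries carry a `name` key, and it uses `difflib` to suggest near matches. A later issue on this page launches `PROD-telco-churn-initial-model-train-register`, which hints (without confirming) that the deployment file may prefix job names with the environment.

```python
import difflib
from typing import Dict, List


def check_requested_jobs(environment: Dict, requested: List[str]) -> None:
    """Raise ValueError if a requested job is not defined for this environment.

    `environment` is assumed to be the parsed per-environment section of the
    deployment file, holding a "jobs" list of dicts with a "name" key.
    """
    defined = [job["name"] for job in environment.get("jobs", []) if "name" in job]
    problems = []
    for name in requested:
        if name in defined:
            continue
        # Suggest similarly named jobs, e.g. ones carrying an environment prefix.
        close = difflib.get_close_matches(name, defined, n=3)
        hint = f" (did you mean: {', '.join(close)}?)" if close else ""
        problems.append(f"'{name}'{hint}")
    if problems:
        raise ValueError("Requested job(s) not in deployment file: " + "; ".join(problems))
```

For example, `check_requested_jobs({"jobs": [{"name": "PROD-telco-churn-initial-model-train-register"}]}, ["telco-churn-initial-model-train-register"])` raises a `ValueError` whose message suggests the prefixed name.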

OSS License?

Hi @niall-turbitt et al., this is a great demonstration of MLOps. Could you add a license to this repo, preferably the MIT License, so I can use it as a starting point?

requests.exceptions.HTTPError: 400 Client Error: Bad Request for url

Hi,

After successfully completing the deploy step, I got the following error when executing dbx launch --job=PROD-telco-churn-initial-model-train-register --environment=prod --as-run-submit --trace.

I'm quite sure I have set the access token for my prod environment. Any idea why the connection might fail?

Thank you.

[dbx][2022-08-03 10:03:04.737] Deployment for environment prod finished successfully ✨
[dbx][2022-08-03 10:03:07.213] Launching job PROD-telco-churn-initial-model-train-register on environment prod
[dbx][2022-08-03 10:03:07.216] Using profile provided from the project file
[dbx][2022-08-03 10:03:07.216] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2022-08-03 10:03:07.216] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2022-08-03 10:03:07.216] Profile prd will be used for deployment
[dbx][2022-08-03 10:03:09.464] No additional tags provided
[dbx][2022-08-03 10:03:09.467] Successfully found deployment per given job name
[dbx][2022-08-03 10:03:10.763] Launching job via run submit API
Traceback (most recent call last):
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/databricks_cli/sdk/api_client.py", line 138, in perform_query
    resp.raise_for_status()
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://adb-5150270794210211.11.azuredatabricks.net/api/2.0/jobs/runs/submit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/di_lin/opt/anaconda3/envs/erebus/bin/dbx", line 8, in <module>
    sys.exit(cli())
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 173, in launch
    run_data, job_id = run_launcher.launch()
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 331, in launch
    run_data = _submit_run(self.api_client, job_spec)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/dbx/commands/launch.py", line 400, in _submit_run
    return api_client.perform_query("POST", "/jobs/runs/submit", data=payload)
  File "/Users/di_lin/opt/anaconda3/envs/erebus/lib/python3.9/site-packages/databricks_cli/sdk/api_client.py", line 146, in perform_query
    raise requests.exceptions.HTTPError(message, response=e.response)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://adb-xxxxxxxxxxxxxxxx.11.azuredatabricks.net/api/2.0/jobs/runs/submit
 Response from server: 
 { 'error_code': 'INVALID_PARAMETER_VALUE',
  'message': 'Node type i3.xlarge is not supported. Supported node types: '
             'Standard_DS3_v2, Standard_DS4_v2, Standard_DS5_v2, '
             'Standard_D4s_v3, Standard_D8s_v3, Standard_D16s_v3, '
             'Standard_D32s_v3, Standard_D64s_v3, Standard_D4a_v4, '
             'Standard_D8a_v4, Standard_D16a_v4, Standard_D32a_v4, '
             'Standard_D48a_v4, Standard_D64a_v4, Standard_D96a_v4, '
             'Standard_D8as_v4, Standard_D16as_v4, Standard_D32as_v4, '
             'Standard_D48as_v4, Standard_D64as_v4, Standard_D96as_v4, '
             'Standard_D4ds_v4, Standard_D8ds_v4, Standard_D16ds_v4, '
             'Standard_D32ds_v4, Standard_D48ds_v4, Standard_D64ds_v4, '
             'Standard_D3_v2, Standard_D4_v2, Standard_D5_v2, Standard_D8_v3, '
             'Standard_D16_v3, Standard_D32_v3, Standard_D64_v3, '
             'Standard_D4d_v4, Standard_D8d_v4, Standard_D16d_v4, '
             'Standard_D32d_v4, Standard_D48d_v4, Standard_D64d_v4, '
             'Standard_D12_v2, Standard_D13_v2, Standard_D14_v2, '
             'Standard_D15_v2, Standard_DS12_v2, Standard_DS13_v2, '
             'Standard_DS14_v2, Standard_DS15_v2, Standard_E8_v3, '
             'Standard_E16_v3, Standard_E32_v3, Standard_E64_v3, '
             'Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, '
             'Standard_E64s_v3, Standard_E4d_v4, Standard_E8d_v4, '
             'Standard_E16d_v4, Standard_E20d_v4, Standard_E32d_v4, '
             'Standard_E48d_v4, Standard_E64d_v4, Standard_E4ds_v4, '
             'Standard_E8ds_v4, Standard_E16ds_v4, Standard_E20ds_v4, '
             'Standard_E32ds_v4, Standard_E48ds_v4, Standard_E64ds_v4, '
             'Standard_E80ids_v4, Standard_E4a_v4, Standard_E8a_v4, '
             'Standard_E16a_v4, Standard_E20a_v4, Standard_E32a_v4, '
             'Standard_E48a_v4, Standard_E64a_v4, Standard_E96a_v4, '
             'Standard_E4as_v4, Standard_E8as_v4, Standard_E16as_v4, '
             'Standard_E20as_v4, Standard_E32as_v4, Standard_E48as_v4, '
             'Standard_E64as_v4, Standard_E96as_v4, Standard_E4s_v4, '
             'Standard_E8s_v4, Standard_E16s_v4, Standard_E20s_v4, '
             'Standard_E32s_v4, Standard_E48s_v4, Standard_E64s_v4, '
             'Standard_E80is_v4, Standard_L4s, Standard_L8s, Standard_L16s, '
             'Standard_L32s, Standard_F4, Standard_F8, Standard_F16, '
             'Standard_F4s, Standard_F8s, Standard_F16s, Standard_H8, '
             'Standard_H16, Standard_F4s_v2, Standard_F8s_v2, '
             'Standard_F16s_v2, Standard_F32s_v2, Standard_F64s_v2, '
             'Standard_F72s_v2, Standard_NC12, Standard_NC24, '
             'Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, '
             'Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, '
             'Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_L8s_v2, '
             'Standard_L16s_v2, Standard_L32s_v2, Standard_L64s_v2, '
             'Standard_L80s_v2, Standard_D4s_v5, Standard_D8s_v5, '
             'Standard_D16s_v5, Standard_D32s_v5, Standard_D48s_v5, '
             'Standard_D64s_v5, Standard_D96s_v5, Standard_D4ds_v5, '
             'Standard_D8ds_v5, Standard_D16ds_v5, Standard_D32ds_v5, '
             'Standard_D48ds_v5, Standard_D64ds_v5, Standard_D96ds_v5, '
             'Standard_E4s_v5, Standard_E8s_v5, Standard_E16s_v5, '
             'Standard_E20s_v5, Standard_E32s_v5, Standard_E48s_v5, '
             'Standard_E64s_v5, Standard_E96s_v5, Standard_E4ds_v5, '
             'Standard_E8ds_v5, Standard_E16ds_v5, Standard_E20ds_v5, '
             'Standard_E32ds_v5, Standard_E48ds_v5, Standard_E64ds_v5, '
             'Standard_E96ds_v5'}
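This is not a connection or authentication failure: the workspace accepted the request and answered with `INVALID_PARAMETER_VALUE` because the submitted cluster asks for `i3.xlarge`, an AWS EC2 instance type, while this workspace runs on Azure and lists only `Standard_*` VM sizes. The job's cluster definition (presumably in the prod section of conf/deployment.yml) would need a `node_type_id` from that list. The hypothetical helpers below check a job spec before submission; they assume the Jobs API layout where clusters appear under `new_cluster` at job level, inside `tasks`, or inside `job_clusters`, with `node_type_id` and `driver_node_type_id` fields.

```python
from typing import Dict, List


def supported_node_types(error_message: str) -> List[str]:
    """Extract the comma-separated names after 'Supported node types:'."""
    _, found, tail = error_message.partition("Supported node types:")
    if not found:
        return []
    return [item.strip().rstrip(".") for item in tail.split(",") if item.strip()]


def invalid_node_types(job_spec: Dict, supported: List[str]) -> List[str]:
    """List node types in the job spec's cluster definitions that are not in `supported`."""
    clusters = [job_spec.get("new_cluster", {})]
    clusters += [t.get("new_cluster", {}) for t in job_spec.get("tasks", [])]
    clusters += [c.get("new_cluster", {}) for c in job_spec.get("job_clusters", [])]
    allowed = set(supported)
    bad: List[str] = []
    for cluster in clusters:
        for key in ("node_type_id", "driver_node_type_id"):
            value = cluster.get(key)
            if value and value not in allowed and value not in bad:
                bad.append(value)
    return bad
```

Running `invalid_node_types` against the spec built from the deployment file, with `supported` taken from the server's message, would flag `i3.xlarge` before `runs/submit` is called.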
