
ml-template-azure's Issues

Getting error while running workflow

I followed the steps in the YouTube video, but my workflow failed with the error below.
After talking to Ashish, I learned that I need to run the following CLI commands:

az provider register --namespace Microsoft.MachineLearningServices
az provider register --namespace Microsoft.MachineLearning
az provider register --namespace Microsoft.Compute

This should be covered in the YouTube video and in the documentation as well.

Message: At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.
Exception Details:
Error Code: Conflict
Message: 'error': 'code': 'MissingSubscriptionRegistration', 'message': "The subscription is not registered to use namespace 'Microsoft.MachineLearningServices'. See https://aka.ms/rps-not-found for how to register subscriptions.", 'details': ['code': 'MissingSubscriptionRegistration', 'target': 'Microsoft.MachineLearningServices', 'message': "The subscription is not registered to use namespace 'Microsoft.MachineLearningServices'. See https://aka.ms/rps-not-found for how to register subscriptions."]***
ErrorResponse


"error": ***
    "message": "Unable to create the workspace. At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details."
***

Thanks,
Manish
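
For anyone who wants to script the registration step, here is a minimal sketch that wraps the three `az provider register` commands listed above. The helper names `build_register_commands` and `register_all` are hypothetical, and the sketch assumes the Azure CLI is installed and already logged in to the right subscription.

```python
import subprocess

# Namespaces taken from the CLI commands listed in this issue.
NAMESPACES = [
    "Microsoft.MachineLearningServices",
    "Microsoft.MachineLearning",
    "Microsoft.Compute",
]


def build_register_commands(namespaces):
    """Return one `az provider register` argument list per namespace."""
    return [["az", "provider", "register", "--namespace", ns] for ns in namespaces]


def register_all(namespaces, runner=subprocess.run):
    """Run each registration command in order.

    `runner` is injectable (hypothetical convenience) so the loop can be
    dry-run without calling the real Azure CLI.
    """
    for cmd in build_register_commands(namespaces):
        # check=True raises CalledProcessError if the CLI exits non-zero.
        runner(cmd, check=True)
```

Calling `register_all(NAMESPACES)` would run the same commands as above, one after another. Registration can take a while to complete on the Azure side, so the workflow may still fail if it is re-run immediately afterwards.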

Issue in "Connect/create Machine Learning Workspace step"

#### Error Details:

get_workspace error using subscription_id=xxx, resource_group_name=mlops-ga-rg , workspace_name=mlops-ga-ws
Message: ***
"error_details": ***
"error": ***
"code": "AuthorizationFailed",
"message": "The client 'xxx' with object id 'xxx' does not have authorization to perform action 'Microsoft.MachineLearningServices/workspaces/read' over scope '/subscriptions/***/resourceGroups/mlops-ga-rg /providers/Microsoft.MachineLearningServices/workspaces/mlops-ga-ws' or the scope is invalid. If access was recently granted, please refresh your credentials."
***
***,
"status_code": 403,
"url": "https://management.azure.com/subscriptions/***/resourceGroups/mlops-ga-rg%20/providers/Microsoft.MachineLearningServices/workspaces/mlops-ga-ws?api-version=2020-01-01"


InnerException None
ErrorResponse 

The object ID shown here does not match the service principal I created.
I double-checked the permissions of the service principal and the value of the secret in the GitHub repo.

Any ideas?
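
One detail in the log above may be worth ruling out: the resource group appears as `mlops-ga-rg ` with a trailing space, and the request URL encodes it as `mlops-ga-rg%20`. Whether this contributes to the 403 is not confirmed. The sketch below only shows a quick way to spot stray whitespace in the configured values; the function name `find_whitespace_issues` and the dictionary keys are illustrative.

```python
def find_whitespace_issues(params):
    """Return the names of string parameters with leading or trailing whitespace."""
    return [
        key
        for key, value in params.items()
        if isinstance(value, str) and value != value.strip()
    ]


# Values as they appear in the log above; note the space after the group name.
params = {
    "resource_group_name": "mlops-ga-rg ",
    "workspace_name": "mlops-ga-ws",
}

print(find_whitespace_issues(params))  # ['resource_group_name']
```

If the check flags a value, trimming it in the workspace parameters file and re-running the workflow is a cheap first test before digging further into the service principal.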

Docs issue: Ambiguous description in setting the workspace name

ReadMe.md, Getting Started, 4. Define your workspace parameters
The sentence below is ambiguous:
"Change the name parameter to the name of your workspace, if you have already created one or choose any name, if you want the Action to create a new workpace."

A change like the following would be clearer:
"If you already have an Azure ML Workspace under that resource group, change the name parameter to the name of your workspace. If you want the Action to create a new workspace in that resource group, pick a name for your new workspace and assign it to the name parameter."

Actions Tab Documentation

Add more details in the README:

  • Show users how to get to the actions tab
  • Show users how to create secrets if a service principal has already been created
  • Add a note that the first run takes a little longer, and explain why

FileNotFoundError

I am getting this error while following the steps in the instruction guide.
The error is:
FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_/bin/python3.6'")

I believe another issue was raised with the same error. Any leads would be appreciated.

Unexpected error in aml-deploy action step

Hi, we are experiencing an unexpected error that we haven't been able to solve for a while now. All the steps before aml-deploy run fine, but aml-deploy is suddenly unable to pull the registered model. The name and version of the model coming from aml-registermodel have been cross-checked and are passed correctly to aml-deploy. Based on the logs, there also appear to be no issues with workspace authorization.

We are hoping you have seen something similar happen with the template and can help us.

Attaching the action log, deploy.json parameters and aml_arm_config json. We have kept the workflow largely the same except for adding some debugging actions.

15_Deploy model.txt

deploy.json

{
"name": "test",
"deployment_compute_target": "akscompute"
}

aml_arm_config from aml-workspace (found in $GITHUB_WORKSPACE/.azureml/)

{"Id": null, "Scope": "/subscriptions/{subscriptionid}/resourceGroups/{resourcegroup}/providers/Microsoft.MachineLearningServices/workspaces/{workspacename}"}

Documentation issues

  • Warn the user that the workflow is triggered by every commit to master
  • Add more granular triggers to the GH workflow (for example, changes to specific code folders) and comment the code. Describe in the documentation how the user can change the behavior of the workflow.
  • Provide more details on how the secret can be set up
  • Make the links to the AML Actions that are used more prominent. Users just scroll past them and do not find the actual documentation of our GH Actions.
  • Instruct users to comment out parts of the workflow if they only want to work on training and not deployment

HTTP404 after the PythonScriptStep is finished

Hi,

We have a situation where, when we submit a Python pipeline run through GitHub Actions, the action detects a non-zero exit code even though the underlying Python script finished with exit code zero. Even a step containing a single print statement finishes, but the action reports it as failed.

For example, the Python step:

def main():
    print('AMAR Inside STEP 1 Choose Data Test')

Abbreviated Python code for the pipeline build; this is the script that the aml-run action calls to build the pipeline:

pipeline_steps = StepSequence(steps=[step_1_choose_data])
pipeline = Pipeline(workspace=workspace, steps=pipeline_steps)
pipeline.validate()
return pipeline

===

Individual step output

[2021-11-02T11:10:11.272611] The experiment completed successfully. Finalizing run...
Cleaning up all outstanding Run operations, waiting 900.0 seconds
3 items cleaning up...
Cleanup took 0.1468954086303711 seconds
[2021-11-02T11:10:11.547600] Finished context manager injector.
2021/11/02 11:10:13 Attempt 1 of http call to http://[REDACTED]/sendlogstoartifacts/status
2021/11/02 11:10:13 Send process info logs to master server succeeded
2021/11/02 11:10:13 Not exporting to RunHistory as the exporter is either stopped or there is no data.
Stopped: false
OriginalData: 3
FilteredData: 0.
2021/11/02 11:10:13 Process Exiting with Code: 0
2021/11/02 11:10:14 All App Insights Logs was sent successfully or the close timeout of 10 was reached

However, the action output reports a failure:
StepRun(STEP_1_Choose_Data) Execution Summary
==============================================
StepRun( STEP_1_Choose_Data ) Status: Finished
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 271, in attempt_get_deps
    blob_deps_to_file()
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 263, in blob_deps_to_file
    blob = request.urlopen(deps_url, context=ssl_context)
  File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/main.py", line 240, in <module>
    main()
  File "/code/main.py", line 187, in main
    run.wait_for_completion(show_output=True)
  File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 294, in wait_for_completion
    step_run.wait_for_completion(timeout_seconds=timeout_seconds - time_elapsed,
  File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 736, in wait_for_completion
    return self._stream_run_output(timeout_seconds=timeout_seconds,
  File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 827, in _stream_run_output
    print(final_details)
  File "/usr/local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 766, in __repr__
    steps = self._dataflow._get_steps()
  File "/usr/local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 218, in _dataflow
    dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py", line 185, in _set_auth_type
    get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(auth_type, json.dumps(auth_value)))
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 19, in get_engine_api
    _engine_api = EngineAPI()
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 110, in __init__
    self._message_channel = launch_engine()
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py", line 333, in launch_engine
    dependencies_path = runtime.ensure_dependencies()
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 285, in ensure_dependencies
    if not attempt_get_deps():
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 279, in attempt_get_deps
    raise NotImplementedError(err_msg + '\n' + _unsupported_help_msg)
NotImplementedError: Linux distribution debian 11. does not have automatic support.
.NET Core 2.1 can still be used via dotnetcore2 if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install dotnet-runtime-* and replace * with 2.1.

Happy to discuss details.
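
Reading the traceback, the exception is raised from `print(final_details)` inside `_stream_run_output`, after the step already reports `Status: Finished`, and it ends in a `NotImplementedError` from `dotnetcore2` on Debian 11. One possible workaround, untested against this failure, is to poll the run status instead of streaming the output. This assumes you can change the script that calls `wait_for_completion`; in the traceback that is `/code/main.py` inside the action, so it may require a fork. The sketch below is generic: `wait_quietly` is a hypothetical helper, and the set of terminal states is an assumption based on the `Finished` status in the log.

```python
import time

# Assumed terminal states; the log above shows "Finished" for a step run.
TERMINAL_STATES = {"Finished", "Completed", "Failed", "Canceled"}


def wait_quietly(get_status, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state, without printing run details.

    get_status is any zero-argument callable returning a status string,
    for example the bound method `run.get_status` of an Azure ML run.
    """
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still in state {status!r} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Usage would look like `status = wait_quietly(run.get_status)`, followed by an explicit check that `status` is not `"Failed"` before the script exits. Installing the .NET Core 2.1 dependencies in the action's container, as the error message suggests, is the other route.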

Submit training run fails while installing package 'conda-forge::cycler-0.11.0-pyhd8ed1ab_0'.

From the logs:

ERROR conda.core.link:_execute(502): An error occurred while installing package 'conda-forge::cycler-0.11.0-pyhd8ed1ab_0'.
FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'")
Attempting to roll back.

done

FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'")
