azure / azure-orbital-analytics-samples

Sample solution that demonstrates how to deploy and analyze spaceborne data using Azure Synapse Analytics

Home Page: https://aka.ms/synapse-geospatial-analytics

License: MIT License

Languages: Shell 16.00%, Bicep 37.08%, Python 46.63%, Dockerfile 0.15%, ASL 0.14%
Topics: geospatial, spaceborne, synapse-analytics, cognitive-services, computer-vision, azure-batch, azure-storage

azure-orbital-analytics-samples's Introduction

Project

This repository contains a sample solution that demonstrates how to deploy and run a geospatial analysis workload on Azure Synapse Analytics in your Azure tenant. We recommend that you read the document "Geospatial Analysis using Azure Synapse Analytics" before deploying this solution.

Disclaimer: The solution and samples provided in this repository are for learning purposes only. They're intended to explore the possibilities of the Azure services and serve as a starting point for developing your own solution. We recommend that you follow the security best practices in the Microsoft documentation for the individual services.

Getting Started

Start by following the README.md to set up the Azure resources required to run the pipeline.

This solution uses a Custom Vision model as a sample AI model to demonstrate an end-to-end Azure Synapse workflow for geospatial analysis. In this sample, the AI model detects swimming pools in the given geospatial data.

You can use this solution to integrate other AI models. Each AI model requires its input geospatial data to be in a specific format. When adapting this solution to a different AI model, make sure the geospatial data transformation steps are modified to match that model's needs, as illustrated in the sketch below.
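As a rough illustration, the transform chain can be modeled as an ordered list of steps that is swapped per model. This is a minimal sketch, not the repository's actual API; run_transforms and the step signature are illustrative names.

from typing import Callable, List

Transform = Callable[[str], str]  # takes an input path, returns an output path

def run_transforms(input_path: str, steps: List[Transform]) -> str:
    # Apply each geospatial transform in order, chaining outputs to inputs.
    path = input_path
    for step in steps:
        path = step(path)
    return path

For the sample Custom Vision model the chain is mosaic, crop, convert, and tile; a different AI model might need a different sequence, projection, or tile size.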

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

azure-orbital-analytics-samples's People

Contributors

karthick-rn, mandarinamdar, microsoft-github-operations[bot], microsoftopensource, senthilkungumaraj, sjyang18, sushilkm, taiyee, tushardhadiwal


azure-orbital-analytics-samples's Issues

Replace copy noop spark job with data flow

Replace the copy no-op Spark job with a data flow. This should unblock the pipeline and improve overall performance, since it avoids the file share mount in mssparkutils, which is currently failing.

Improve instructions for creating Synapse pipeline

Please provide more detailed instructions on which pipeline to import

The instructions currently say:

"Import the pipeline under the workflow folder to your Azure Synapse Analytics instance's workspace. Alternatively, you can copy the files to your repository (git or Azure DevOps) and link the repository to your Azure Synapse Analytics workspace."

Does the pipeline have to be created by using "Browse gallery" and picking "Spaceborne Data Analysis Master Pipeline"?

Under the workflow folder of the source code there are two folders with JSON files, and there appear to be multiple pipeline definitions under custom-vision-model-v2\pipeline, for example.

Implement a simplified transform for Custom Vision Model Pipeline

Implement a simplified version of the current transform pipeline for the Custom Vision model. The current pipeline performs mosaic, crop, convert, and tiling as a series of Spark jobs, which translates to a separate spark-submit to the Spark cluster in Synapse for each step. Every spark-submit incurs an overhead of 2 to 4 minutes, which adds significant lead time to each transform: mosaic, crop, convert, and tiling.

As a simple update, implement a consolidated version of the transform where the four transforms are submitted as a single Spark job, as sketched below. This removes the redundant lead time associated with having multiple Spark jobs.
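A minimal sketch of the consolidated job, assuming hypothetical mosaic/crop/convert/tile entry points (the function names and paths below are illustrative, not the repository's actual module layout):

from pyspark.sql import SparkSession

# Hypothetical per-step entry points; today each of these is its own
# spark-submit. Here they are plain functions sharing one Spark session.
def run_mosaic(spark: SparkSession, src: str) -> str: return src + "mosaic/"
def run_crop(spark: SparkSession, src: str) -> str: return src + "crop/"
def run_convert(spark: SparkSession, src: str) -> str: return src + "convert/"
def run_tiling(spark: SparkSession, src: str) -> str: return src + "tiles/"

def main() -> None:
    spark = SparkSession.builder.appName("consolidated-transform").getOrCreate()
    try:
        path = "raw/"
        # One application, one 2-4 minute startup cost for the whole chain.
        for step in (run_mosaic, run_crop, run_convert, run_tiling):
            path = step(spark, path)
    finally:
        spark.stop()

if __name__ == "__main__":
    main()

Submitted once with spark-submit, this pays the cluster startup overhead a single time instead of four.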

A Spark pool cannot have 1 nodes; it must have between 3 and 200 nodes.

While running the installation script, there is another error, this time about the Spark autoscale configuration.

{"status":"Failed","error":{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"Conflict","message":"{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n "code": "DeploymentFailed",\r\n "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",\r\n "details": [\r\n {\r\n "code": "Conflict",\r\n "message": "{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"DeploymentFailed\",\r\n \"message\": \"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.\",\r\n \"details\": [\r\n {\r\n \"code\": \"BadRequest\",\r\n \"message\": \"{\\r\\n \\\"error\\\": {\\r\\n \\\"code\\\": \\\"ValidationFailed\\\",\\r\\n \\\"message\\\": \\\"Spark pool request validation failed.\\\",\\r\\n \\\"details\\\": [\\r\\n {\\r\\n \\\"code\\\": \\\"NodeCountNotValid\\\",\\r\\n \\\"message\\\": \\\"The autoscale minimum node count is not valid. A Spark pool cannot have 1 nodes; it must have between 3 and 200 nodes.\\\"\\r\\n }\\r\\n ]\\r\\n }\\r\\n}\"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"}]}}

Setup.sh - Storage account name already taken prevents project setup

When setting up the solution with setup.sh, the deployment fails with a storage account name conflict. The error message is included below.

{"status":"Failed","error":{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"Conflict","message":"{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n "code": "DeploymentFailed",\r\n "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",\r\n "details": [\r\n {\r\n "code": "Conflict",\r\n "message": "{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"DeploymentFailed\",\r\n \"message\": \"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Conflict\",\r\n \"message\": \"{\\r\\n \\\"status\\\": \\\"Failed\\\",\\r\\n \\\"error\\\": {\\r\\n \\\"code\\\": \\\"ResourceDeploymentFailure\\\",\\r\\n \\\"message\\\": \\\"The resource operation completed with terminal provisioning state 'Failed'.\\\",\\r\\n \\\"details\\\": [\\r\\n {\\r\\n \\\"code\\\": \\\"DeploymentFailed\\\",\\r\\n \\\"message\\\": \\\"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.\\\",\\r\\n \\\"details\\\": [\\r\\n {\\r\\n \\\"code\\\": \\\"Conflict\\\",\\r\\n \\\"message\\\": \\\"{\\\\r\\\\n \\\\\\\"error\\\\\\\": {\\\\r\\\\n \\\\\\\"code\\\\\\\": \\\\\\\"StorageAccountAlreadyTaken\\\\\\\",\\\\r\\\\n \\\\\\\"message\\\\\\\": \\\\\\\"The storage account named synhnsrqjhqd is already taken.\\\\\\\"\\\\r\\\\n }\\\\r\\\\n}\\\"\\r\\n }\\r\\n ]\\r\\n }\\r\\n ]\\r\\n }\\r\\n}\"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"}]}}

Update README.md file with additional optional parameters

Since the release of the initial version of the script that deploys and configures the reference architecture, the README.md file has become outdated, specifically in the parameters section. The required and optional parameters in the README.md file need to be updated.

Let's clean up the version references for the pipeline

We have only one pipeline, but its name still follows the older v1/v2 nomenclature. Let's clean up this anomaly by removing the v2 references so the pipeline is referred to simply as custom-vision-model.

Need an option to create a Synapse workspace with a managed VNET

We need a way to create a Synapse workspace with a managed VNET. This enables the security architecture with managed private endpoints.

By design, DEP (Data Exfiltration Protection) does not support public channels for package deployment. So, to support the current deployment scenario without introducing a regression, DEP will be disabled by default.

Add test for `DEPLOY_PGSQL=false`

We are only testing DEPLOY_PGSQL=true; we should add another ADO test for the false condition, which would exercise the Synapse pipeline and deployment for a non-PGSQL setup.

Custom Vision model offline image issue with Protobuf

A protobuf issue introduced during the last build (triggered by the PR merge) causes the AI model run to fail. This issue needs to be addressed in a separate PR.


Traceback (most recent call last):
  File "./custom_vision.py", line 11, in <module>
    from predict import initialize, predict_image
  File "/predict.py", line 7, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.7/site-packages/tensorflow/__init__.py", line 98, in <module>
    from tensorflow_core import *
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/__init__.py", line 40, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 959, in _find_and_load_unlocked
  File "/usr/local/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/usr/local/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/__init__.py", line 52, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/core/framework/graph_pb2.py", line 16, in <module>
    from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/core/framework/node_def_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/core/framework/tensor_shape_pb2.py", line 42, in <module>
    serialized_options=None, file=DESCRIPTOR),
  File "/usr/local/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 560, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
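Until the image is rebuilt with regenerated protos or a pinned protobuf, the second workaround from the error message can be applied in code. A minimal sketch, assuming the container's Python entry point can be edited:

import os

# Force the pure-Python protobuf implementation before tensorflow (and its
# generated _pb2 modules) is imported; slower, per the error message, but it
# avoids the descriptor check that fails above.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

import tensorflow as tf  # noqa: E402  (must come after the variable is set)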

Orbital Analytics end-to-end solution using downlinked .bin data

The proposed sample uses a sample GeoTIFF (.tif) file, which is downloaded by the deploy/scripts/copy_geotiff.sh script and afterwards converted to tiles. However, there does not seem to be any documentation on how to produce such a GeoTIFF from the Azure Orbital .bin file downloaded at the end of https://docs.microsoft.com/en-us/azure/orbital/downlink-aqua.

The instructions at https://docs.microsoft.com/en-us/azure/orbital/satellite-imagery-with-orbital-ground-station also do not address this topic, so there is a large gap between these two tutorials.

The only relevant instruction available covers using demodulation to reduce the .bin file size; it says nothing about producing the final product.

Use restrictive permissions for batch account rather than Contributor access

Problem
Currently we grant the managed identities Contributor access on the batch account. This access is very open-ended and could be a security issue.

Proposed Solution
Let's create a custom role with selective permissions on the batch account and assign that custom role to the managed identities as required. A sketch follows.
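As a sketch, the custom role could look like the following definition body (e.g. for az role definition create). The role name and action strings are illustrative; confirm them against the Microsoft.Batch operations list before use:

import json

custom_role = {
    "Name": "Batch Pool Operator (sample)",  # hypothetical role name
    "Description": "Minimal rights the pipeline needs on the batch account.",
    "Actions": [
        "Microsoft.Batch/batchAccounts/read",
        "Microsoft.Batch/batchAccounts/pools/read",
        "Microsoft.Batch/batchAccounts/pools/write",
    ],
    "NotActions": [],
    "AssignableScopes": ["/subscriptions/<subscription-id>"],  # placeholder
}

print(json.dumps(custom_role, indent=2))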

User documentation seems confusing when coming from pipeline in Synapse gallery

When a user lands on https://github.com/Azure/Azure-Orbital-Analytics-Samples/blob/main/deploy/gallery/instructions.md from the pipeline help, it suggests going to https://github.com/Azure/Azure-Orbital-Analytics-Samples/blob/main/deploy/README.md, which contains not only the infrastructure deployment documentation but also instructions for deploying the pipelines standalone.

All in all, this is a little confusing for a user of the pipeline. Let's simplify the documentation and make it helpful and easy to understand and use.

Implement post-deployment checks

Is your feature request related to a problem? Please describe.
Currently, there is no post-deployment sanity check to make sure the required components, such as the Spark pool and Batch pool nodes, are provisioned successfully.

Describe the solution you'd like
Add a validation script that confirms the components are provisioned successfully post-deployment; see the sketch at the end of this issue.

Describe alternatives you've considered
Currently, I don't know of any alternatives, but I will consider evaluating options in due course.

Additional context
No additional context
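A minimal sketch of such a check, assuming the azure-identity, azure-mgmt-synapse, and azure-mgmt-batch packages; every resource name below is a placeholder:

from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.batch import BatchManagementClient

SUB, RG = "<subscription-id>", "<resource-group>"
cred = DefaultAzureCredential()

# Fetch the provisioning state of the Synapse Spark pool and the Batch pool.
spark_pool = SynapseManagementClient(cred, SUB).big_data_pools.get(
    RG, "<workspace-name>", "<spark-pool-name>")
batch_pool = BatchManagementClient(cred, SUB).pool.get(
    RG, "<batch-account-name>", "<pool-name>")

for name, state in [("spark pool", spark_pool.provisioning_state),
                    ("batch pool", batch_pool.provisioning_state)]:
    print(f"{name}: {state}")
    assert "succeeded" in str(state).lower(), f"{name} did not provision successfully"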

Deploying via Shell Scripts does not set the POSTGRES_ADMIN_LOGIN_PASS

Running setup.sh or install.sh does not prompt the user for the PGSQL admin password as described in the deployment documentation.

Without this prompt, $POSTGRES_ADMIN_LOGIN_PASS is empty and the deployment fails with the error below (a fail-fast sketch follows it):

{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"PasswordNotComplex","message":"Password validation failed. The password does not meet policy requirements because it is not complex enough."}]}

Thanks

env_code is not unique for workflow run

Since environment_code is currently derived from the commit SHA, re-runs of the workflow hit errors because the Azure Key Vault is recreated with the same name. Ideally, env_code should be unique per run, so that cleanup of a previous workflow run does not interfere with the new run. A sketch of one approach follows.
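The sketch mixes the run id (or a timestamp) into the code so re-runs of the same SHA no longer collide. The environment variable names are illustrative and depend on the CI system:

import hashlib
import os
import time

sha = os.environ.get("COMMIT_SHA", "local")               # per-commit input
run_id = os.environ.get("RUN_ID", str(int(time.time())))  # per-run input
env_code = hashlib.sha1(f"{sha}-{run_id}".encode()).hexdigest()[:8]
print(env_code)  # unique per run, stable within a run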

Copy Xml/JSON: The specified resource does not exist

The Synapse pipeline was created by searching the gallery for "Spaceborne Data Analysis Master Pipeline" and setting the 3 required linked services accordingly:

  1. A linked service to the Azure Blob Storage account rawdata******
  2. A linked service to the Azure Data Lake Storage Gen2 account rawdata******
  3. A linked service to

While running the Custom Vision pipeline, the following error occurs (similar for the Copy XML and Copy JSON activities). It is unclear which resource does not exist:

{
"errorCode": "2200",
"message": "ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error Message: The specified resource does not exist. (ErrorCode: 404, Detail: The specified resource does not exist., RequestId: 039b4525-001a-0061-2f79-d3625a000000),Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.Azure.Storage.StorageException,Message=The specified resource does not exist.,Source=Microsoft.Azure.Storage.Common,'",
"failureType": "UserError",
"target": "Copy Xml",
"details": []
}

The activity uses the following path expression:
"@concat(pipeline().parameters.Prefix, '/', activity('Read Spec Document').output['runStatus'].output.sink.value[0]['resultsDirectory'], '/other'

The results directory from the Read Spec Document activity is "out" and the prefix is the container name, so the path should be "-test-container/out/other".

A hint about the configuration would be more than welcome, as the directory was not there and nothing changed after creating it manually. @jfrazee Do you think you could give a hint?

Use pre-provisioned batch account

Problem
We hit a quota issue while provisioning the pool on a newly provisioned batch account, and we have to go to Azure support to get the quota fixed before we can proceed. This delays testing.

Proposed Solution
Use a pre-provisioned batch account with enough quota to create new pools.
