
purviewcli's People

Contributors

mdrakiburrahman, nemolin2008, sonnyhcl, tayganr, zeinab-mk

purviewcli's Issues

Allow PURVIEW_NAME input as flag

It would be syntactically simpler if we could use --purviewName as an alternative to the mandatory PURVIEW_NAME environment variable.

(I'm sure there are technical reasons why it's an environment variable, but from a user's perspective it's not obvious.)
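
As a rough illustration of the request, the flag could take precedence and the environment variable could remain as a fallback. This is only a sketch: `resolve_purview_name` is a hypothetical helper, `--purviewName` is the flag proposed above (not an existing option), and `args` is assumed to be a dictionary keyed by flag name.

```python
import os


def resolve_purview_name(args, environ=None):
    """Return the account name from --purviewName, else from PURVIEW_NAME.

    Hypothetical helper: --purviewName is the flag proposed in this issue.
    """
    environ = os.environ if environ is None else environ
    # Prefer the explicit flag; fall back to the environment variable.
    name = args.get('--purviewName') or environ.get('PURVIEW_NAME')
    if not name:
        raise ValueError(
            'Set --purviewName or the PURVIEW_NAME environment variable.'
        )
    return name
```

With this ordering, existing scripts that only set PURVIEW_NAME would keep working unchanged.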

Issue for updating businessmetadata for entity

Payload bug when calling POST atlas/v2/entity/guid/{guid}/businessmetadata.
The CLI tool calls the service API with the following incorrect payload:

{
  "entity": {
    "businessAttributes": {
      "businessMetadataGroupExample": {
        "businessMetadataAttributeExample1": [
          "1"
        ],
        "businessMetadataAttributeExample2": "5"
      }
    }
  }
}

while the correct payload should be

{
  "businessMetadataGroupExample": {
    "businessMetadataAttributeExample1": [
      "1"
    ],
    "businessMetadataAttributeExample2": "5"
  }
}

More details on this API: https://learn.microsoft.com/en-us/rest/api/purview/datamapdataplane/entity/add-or-update-business-metadata?view=rest-purview-datamapdataplane-2023-09-01&tabs=HTTP

Customers reported that they used the correct payload but still got the error.
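
The difference between the two payloads is only the entity/businessAttributes wrapper, so one possible fix is to unwrap it before sending. The sketch below assumes a hypothetical helper name, `to_business_metadata_payload`; it is not taken from the purviewcli source.

```python
def to_business_metadata_payload(payload):
    """Return the business metadata groups without the entity wrapper.

    Hypothetical helper: if the payload has the wrapped shape
    {"entity": {"businessAttributes": {...}}}, return the inner mapping;
    otherwise return the payload unchanged.
    """
    entity = payload.get('entity')
    if isinstance(entity, dict) and 'businessAttributes' in entity:
        return entity['businessAttributes']
    return payload
```

A payload that is already in the flat shape passes through untouched, so the helper is safe to apply in both cases.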

samples/sources.ipynb is outdated

The samples/sources.ipynb file is outdated:

  • Commands should be prefixed with the scan endpoint
  • registerSource should be createSource
  • Azure Purview performs validation on certain endpoints such as Azure Blob, Cosmos DB, Azure SQL, etc.

Not able to scan an asset using purviewCLI

Hi,

I am not able to scan an asset using purviewCLI, while other commands (like glossary, search, insight etc) are working fine.

Run Scan

!pv scan runScan --dataSourceName "AzureBlob-8hn" --scanName "Scan-Y95"

Traceback (most recent call last):
  File "/anaconda/envs/azureml_py36/bin/pv", line 8, in <module>
    sys.exit(main())
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/cli/cli.py", line 75, in main
    data = funcObj(command_args)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/endpoint.py", line 36, in wrapper
    data = get_data(http_dict)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/endpoint.py", line 17, in get_data
    data = client.http_get(http_dict['app'], http_dict['method'], http_dict['endpoint'], http_dict['params'], http_dict['payload'])
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/client.py", line 87, in http_get
    elif response.headers['Content-Type'] == 'text/csv; charset=UTF-8':
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-type'
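
The KeyError is raised because the response carries no Content-Type header at all and the header is read with square brackets. A defensive lookup avoids the crash. The sketch below uses a hypothetical helper, `content_type`, and works with any mapping that has a `.get` method; a plain dict is case-sensitive, whereas the headers object in requests is not.

```python
def content_type(headers):
    """Return the lower-cased media type, or '' when the header is absent.

    Hypothetical helper: `headers` is any mapping with a .get method,
    such as the headers of a requests response.
    """
    value = headers.get('Content-Type', '')
    # Drop parameters such as "; charset=UTF-8" before comparing.
    return value.split(';')[0].strip().lower()
```

The caller could then compare `content_type(response.headers) == 'text/csv'` and treat an empty string as a response without a body.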

`pv scan createSource` resourceName becomes only the first character of the specified parameter

I ran the pv scan createSource command as follows. The command succeeded and the source was created, but the resourceName became c instead of cosdb-example.

Command

!pv scan createSource \
    --datasource "cosdb-example" \
    --kind "AzureCosmosDb" \
    --accountUri "https://cosdb-example.documents.azure.com:443/" \
    --subscriptionId "xxx" \
    --resourceGroup "rg-hinakaza-private" \
    --location "westus2" \
    --resourceName "cosdb-example" \
    --parentCollection "ServiceEndpoint"

Response

{
    "id": "datasources/cosdb-example",
    "kind": "AzureCosmosDb",
    "name": "cosdb-example",
    "properties": {
        "accountUri": "https://cosdb-example.documents.azure.com:443/",
        "createdAt": "2021-05-06T15:11:47.903815Z",
        "lastModifiedAt": "2021-05-06T15:13:51.304859Z",
        "location": "westus2",
        "parentCollection": {
            "referenceName": "ServiceEndpoint",
            "type": "DataSourceReference"
        },
        "resourceGroup": "rg-hinakaza-private",
        "resourceName": "c",
        "subscriptionId": "xxx"
    }
}

Idea of solution

I'm wondering whether the [0] in "resourceName": args['--resourceName'][0] in the following function is the cause. How about removing the [0]?

def scanCreateSource(args):
    endpoint = '/datasources/%s' % args['--datasource']
    payload = {
        "kind": args['--kind'],
        "name": args['--datasource'],
        "properties": {}
    }
    # Source Properties
    if args['--kind'] == 'AzureCosmosDb':
        payload['properties'] = {
            "accountUri": args['--accountUri'],
            "subscriptionId": args['--subscriptionId'],
            "resourceGroup": args['--resourceGroup'],
            "location": args['--location'],
            "resourceName": args['--resourceName'][0]
        }
    elif args['--kind'] == 'AzureDataExplorer':
        payload['properties'] = {
            "endpoint": args['--endpoint'],
            "subscriptionId": args['--subscriptionId'],
            "resourceGroup": args['--resourceGroup'],
            "location": args['--location'],
            "resourceName": args['--resourceName'][0]
        }
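
To illustrate the suspected cause: indexing a string with [0] returns its first character, which matches the "c" seen in the response. The sketch below assumes the argument parser hands over `--resourceName` as a plain string; `cosmos_properties` is a hypothetical helper that shows only the Cosmos DB branch with the index removed.

```python
def cosmos_properties(args):
    """Build the AzureCosmosDb properties without indexing resourceName.

    Hypothetical helper; assumes args['--resourceName'] is a string.
    """
    return {
        "accountUri": args['--accountUri'],
        "subscriptionId": args['--subscriptionId'],
        "resourceGroup": args['--resourceGroup'],
        "location": args['--location'],
        # No [0] here: indexing a string would keep only its first character.
        "resourceName": args['--resourceName'],
    }


# Indexing a string yields a single character, which explains the "c".
first_char = 'cosdb-example'[0]
```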

includeTermHierarchy parameter error

Hi,

I am trying to use createTermsImport to import a CSV file with a term hierarchy.
When I use the --includeTermHierarchy parameter, I get the following error.
With --includeTermHierarchy True:
[screenshot]
Without:
[screenshot]

When I don't use the --includeTermHierarchy parameter, it asks me to use it:

[screenshot]

Am I using the --includeTermHierarchy parameter correctly?
Thank you!!

additional fields in json response for "pv scan readScanHistory"

Hi @tayganr,

In https://github.com/tayganr/purviewcli/blob/master/samples/notebooks%20(plus)/scan%20history.ipynb, under "Get Scan History", the JSON response contains two additional fields that are not in your header list: ingestionJobId and webScanResults. Because of those two (new?) fields, the result is shifted; for example, the last two columns have no headers. I've added the new values in my own work, but maybe you should add them here too?
headers = ["assetsClassified", "assetsDiscovered", "dataSourceType", "endTime", "error", "errorMessage", "id", "ingestionJobId", "parentId", "pipelineStartTime", "queuedTime", "resourceId", "runType", "scanLevelType", "scanRulesetType", "scanRulesetVersion", "startTime", "status", "webScanResults", "source", "scanName"]
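
One way to keep the columns from shifting when the service adds fields is to write each row by key rather than by position. The sketch below is not the notebook's code: `runs_to_csv` is a hypothetical helper, and it assumes each scan run is available as a dictionary.

```python
import csv
import io


def runs_to_csv(runs, headers):
    """Write scan runs as CSV, aligning values to headers by key.

    Hypothetical helper: unknown fields are ignored and missing fields
    are left empty, so new response fields cannot shift the columns.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=headers,
        restval='',
        extrasaction='ignore',
        lineterminator='\n',
    )
    writer.writeheader()
    writer.writerows(runs)
    return buffer.getvalue()
```

Fields that are not listed in `headers` are dropped silently, so the header list still needs updating if the new values should appear in the output.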

'pv insight assetDistributionByDataSource' command is failing

I am trying to get insights via the Azure Purview CLI, but I get this error when executing the insight commands.

A screenshot of the Purview portal is also attached for reference.

pv insight assetDistributionByDataSource

Traceback (most recent call last):
  File "/anaconda/envs/azureml_py36/bin/pv", line 8, in <module>
    sys.exit(main())
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/cli/cli.py", line 61, in main
    module = importlib.import_module('purviewcli.client._' + command)
  File "/anaconda/envs/azureml_py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/_insight.py", line 1, in <module>
    from .client import get_data
ImportError: cannot import name 'get_data'

SPN authentication error

When I use

%env AZURE_CLIENT_ID=YOUR_CLIENT_ID
%env AZURE_TENANT_ID=YOUR_TENANT_ID
%env AZURE_CLIENT_SECRET=YOUR_CLIENT_SECRET

and then run:

glossary = !pv glossary read

I get the following error, both in Databricks and in my local environment in ipynb:

Traceback (most recent call last):
  File "C:\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\Scripts\pv.exe\__main__.py", line 7, in <module>
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\cli\cli.py", line 77, in main
    data = funcObj(command_args)
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\client\endpoint.py", line 42, in wrapper
    data = get_data(http_dict)
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\client\endpoint.py", line 18, in get_data
    client.set_token(http_dict['app'])
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\client\client.py", line 39, in set_token
    credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_credentials\default.py", line 121, in __init__
    credentials.append(EnvironmentCredential(authority=authority, **kwargs))
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_credentials\environment.py", line 62, in __init__
    self._credential = ClientSecretCredential(
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_credentials\client_secret.py", line 40, in __init__
    super(ClientSecretCredential, self).__init__(
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_internal\msal_credentials.py", line 33, in __init__
    validate_tenant_id(self._tenant_id)
  File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_internal\__init__.py", line 61, in validate_tenant_id
    raise ValueError(
ValueError: Invalid tenant id provided. You can locate your tenant id by following the instructions here: https://docs.microsoft.com/partner-center/find-ids-and-domain-names
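
One possible cause, offered only as a guess, is that the AZURE_TENANT_ID value reaching the library contains characters a tenant ID cannot have, for example a placeholder left in place or stray quotes. The sketch below approximates such a check under the assumption that only letters, digits, "-" and "." are accepted; `looks_like_tenant_id` is a hypothetical helper, not the azure-identity function.

```python
import string

# Assumption: a tenant ID may contain only letters, digits, '-' and '.'.
ALLOWED_TENANT_CHARS = frozenset(string.ascii_letters + string.digits + '-.')


def looks_like_tenant_id(value):
    """Return True if value is non-empty and uses only allowed characters."""
    return bool(value) and all(ch in ALLOWED_TENANT_CHARS for ch in value)
```

Running this against `os.environ.get('AZURE_TENANT_ID')` before calling pv would show whether the value itself is the problem.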

Purview CLI vs. Azure CLI

I was expecting to find this functionality in (some preview version of) Azure CLI, like

az purview datasource create ...

Is there a particular reason for having this separate CLI?
Could (the functionality of) purviewcli be integrated into Azure CLI?

createTermsExport issues

%env PURVIEW_NAME=your_PURVIEW_NAME
%env AZURE_CLIENT_ID=your_CLIENT_ID
%env AZURE_TENANT_ID=your_TENANT_ID
%env AZURE_CLIENT_SECRET=your_CLIENT_SECRET

I've provided the account information as shown above. Then I ran the createTermsExport command as below:

!pv glossary createTermsExport --glossaryGuid=your_glossary_guid --termGuid=your_term_guid

I got the following response:
{
    "reason": "OK",
    "status_code": 200,
    "url": "https://digital-center-purview-prd.purview.azure.com/catalog/api/atlas/v2/glossary/your_glossary_guid/terms/export?api-version=2021-05-01-preview&includeTermHierarchy=False"
}

However, the example you provided looks like the following:
{
    "export": "/YOUR_FOLDER_PATH/export.csv",
    "status_code": 200
}

Could you please help clarify why the response differs from the expected one?
Thank you.

Seems to need to assign Collection admins role to identity

Issue detail

I tested with a Purview account newly created on October 21, 2021. First, I assigned the Data curators and Data source admins roles to a Service Principal (SP) from the Azure portal. I then ran pv scan readDatasources against the new account with the SP and got the following error message.

$ pv scan readDatasources
[Error]
Access to the requested resource is forbidden (HTTP status code 403).

[Resource]
[GET] https://purview-hinakaza-openhack-mdw.scan.purview.azure.com/datasources

[Response]
{'error': {'code': 'Unauthorized', 'message': 'Not authorized to access account'}}

[Credentials]
{
    "applicationId": "c89381ee-b8ad-4f60-a230-2e083061dc83",
    "objectId": "ab58b655-1a8f-44c5-9ae3-4bc4dfd2c99d",
    "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47"
}

After I assigned the Collection admins role to the SP and re-ran the command, it succeeded.

Idea for modification

From the above, it seems the Collection admins role must be assigned to the identity executing Azure Purview CLI commands. If so, I think the "Authorization" section in README.md should be updated.
https://github.com/tayganr/purviewcli#authorization

Tested version

  • Python - 3.8.12
  • purviewcli - 0.1.34

Resource not found error adding AzureBlob data source

This is probably user error, but I'm attempting to add an Azure Storage Account. I can do it through the portal successfully, but cannot make it work with the following JSON.

{
    "id": "datasources/AzureStorage",
    "kind": "AzureStorage",
    "name": "AzureStorage",
    "properties": {
        "collection": null,
        "endpoint": "https://armitagencypurview.blob.core.windows.net/",
        "location": "westeurope",
        "parentCollection": null,
        "resourceGroup": "purview-resources",
        "resourceName": "armitagencypurview",
        "subscriptionId": "57c28f9c-f58a-47a6-bb0b-bfc921735b62"
    }
}

I'm getting knocked back with Resource not found.

Any help would be greatly appreciated thank you.

!pv relationship creation failed with errorCode ATLAS-400-00-07D

Relationship creation is not working. I followed the sample code in relationship.ipynb:

typeName = 'process_dataset_outputs'
end1Guid = '2ab5525b-115a-4d82-93ea-63c33778020e'
end1Type = 'azure_datalake_gen2_path'
end2Guid = '9eb55cd7-911b-43b6-8fc6-bdf57c3e7d2a'
end2Type = 'adf_copy_activity'
!pv relationship create --typeName {typeName} --end1Guid {end1Guid} --end1Type {end1Type} --end2Guid {end2Guid} --end2Type {end2Type}

I get the following error:
{
    "errorCode": "ATLAS-400-00-07D",
    "errorMessage": "Relationship end is invalid. Expected Process but is NULL",
    "requestId": "b6210a2e-1a99-48b1-a80a-de18e6de413f"
}
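
The message "Expected Process" suggests that the two ends may be the wrong way round: in the command above, end1 is the data lake path and end2 is the copy activity. The sketch below assumes that process_dataset_outputs expects the process on end1 and the dataset on end2, and that the request body uses the typeName/end1/end2 shape; `relationship_payload` is a hypothetical helper.

```python
def relationship_payload(type_name, process_guid, process_type,
                         dataset_guid, dataset_type):
    """Build a relationship body with the process on end1.

    Hypothetical helper; the end order is an assumption drawn from the
    error message, not from the purviewcli source.
    """
    return {
        "typeName": type_name,
        "end1": {"guid": process_guid, "typeName": process_type},
        "end2": {"guid": dataset_guid, "typeName": dataset_type},
    }


# Same GUIDs as in the issue, with the copy activity moved to end1.
payload = relationship_payload(
    'process_dataset_outputs',
    '9eb55cd7-911b-43b6-8fc6-bdf57c3e7d2a', 'adf_copy_activity',
    '2ab5525b-115a-4d82-93ea-63c33778020e', 'azure_datalake_gen2_path',
)
```

If the assumption holds, swapping the --end1* and --end2* values in the pv relationship create call would be the equivalent change on the command line.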

RequestInvalid error for pv entity deleteBulk

When I run the following command: !pv entity deleteBulk --guid="" --guid=""

I am getting the below error:

{
    "errorCode": "RequestInvalid",
    "errorMessage": "Request is not recognized. Please verify the HTTP method, header or URL",
    "requestId": "xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Note that !pv entity delete --guid="" is working correctly
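
A "Request is not recognized" response can come from a malformed URL, so one thing worth checking is how the repeated --guid values end up in the query string. The sketch below assumes the bulk delete endpoint takes one guid query parameter per entity; `bulk_delete_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def bulk_delete_query(guids):
    """Encode a list of GUIDs as repeated guid=... query parameters.

    Hypothetical helper; assumes the endpoint accepts one guid
    parameter per entity.
    """
    return urlencode({'guid': guids}, doseq=True)
```

For two GUIDs "a" and "b" this produces guid=a&guid=b, rather than a single parameter holding a stringified list.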

APIVersionQueryParameterMissing error when running pv search query

I am getting the following error when I run the pv search query --keywords "" command:
{
    "errorCode": "APIVersionQueryParameterMissing",
    "errorMessage": "Please specify the query parameter api-version as one of the values in set [2021-09-01, 2021-05-01-preview] then retry.",
    "requestId": "----********"
}

Other pv commands are working as expected. Any help would be greatly appreciated.

Thanks
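
The error says the request was sent without an api-version query parameter. A minimal sketch of adding one when it is missing is below; `with_api_version` is a hypothetical helper, the default value is one of the two versions listed in the error message, and the URL in the usage line is a placeholder.

```python
from urllib.parse import urlencode


def with_api_version(url, params=None, api_version='2021-09-01'):
    """Return url with its query string, adding api-version if absent.

    Hypothetical helper; the default version comes from the error message.
    """
    query = dict(params or {})
    # Keep an explicitly supplied api-version; otherwise add the default.
    query.setdefault('api-version', api_version)
    return url + '?' + urlencode(query)


# Placeholder URL for illustration only.
example_url = with_api_version('https://example.invalid/search/query')
```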
