tayganr / purviewcli
Microsoft Purview CLI
Home Page: https://aka.ms/purviewcli
License: MIT License
It would be syntactically simpler if we could use --purviewName as an alternative to the mandatory PURVIEW_NAME environment variable.
(I'm sure there are technical reasons why it's an environment variable, but from a user's perspective it's not obvious.)
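A minimal sketch of how such a fallback could work, assuming the CLI parses its arguments into a dictionary. The function name `resolve_purview_name` and the `--purviewName` flag are hypothetical and not part of purviewcli today; only the `PURVIEW_NAME` variable comes from the request above.

```python
import os


def resolve_purview_name(args, environ=None):
    """Return the account name from --purviewName, else from PURVIEW_NAME.

    `args` is assumed to be a dict of parsed command-line options
    (hypothetical shape); `environ` defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    # Prefer the explicit flag, then fall back to the environment variable.
    name = args.get("--purviewName") or environ.get("PURVIEW_NAME")
    if not name:
        raise ValueError(
            "Set --purviewName or the PURVIEW_NAME environment variable."
        )
    return name
```

With this shape the environment variable keeps working for existing users, while a one-off command can override it.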
Payload bug for calling POST atlas/v2/entity/guid/{guid}/businessmetadata
The CLI tool calls the service API with the following incorrect payload:
{
"entity": {
"businessAttributes": {
"businessMetadataGroupExample": {
"businessMetadataAttributeExample1": [
"1"
],
"businessMetadataAttributeExample2": "5"
}
}
}
}
whereas the correct payload is:
{
"businessMetadataGroupExample": {
"businessMetadataAttributeExample1": [
"1"
],
"businessMetadataAttributeExample2": "5"
}
}
More details on this API: https://learn.microsoft.com/en-us/rest/api/purview/datamapdataplane/entity/add-or-update-business-metadata?view=rest-purview-datamapdataplane-2023-09-01&tabs=HTTP
Customers have reported that they still get the error even though they supply the correct payload.
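The difference between the two payloads can be sketched as a small unwrapping step. This is an illustration only: `unwrap_business_metadata` is a hypothetical helper, not a purviewcli function, and it assumes the wrapper always has the `entity` / `businessAttributes` shape shown above.

```python
def unwrap_business_metadata(payload):
    """Return the bare business-metadata body expected by the endpoint.

    If the payload is wrapped as {"entity": {"businessAttributes": {...}}},
    return only the inner mapping; otherwise return the payload unchanged.
    """
    entity = payload.get("entity")
    if isinstance(entity, dict) and "businessAttributes" in entity:
        return entity["businessAttributes"]
    return payload


wrapped = {
    "entity": {
        "businessAttributes": {
            "businessMetadataGroupExample": {
                "businessMetadataAttributeExample1": ["1"],
                "businessMetadataAttributeExample2": "5",
            }
        }
    }
}
body = unwrap_business_metadata(wrapped)
```

Here `body` has the business metadata group at the top level, which matches the second payload above.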
The samples/sources.ipynb file is outdated: the scan endpoint registerSource should be createSource.
Hi,
I am not able to scan an asset using purviewcli, although other commands (such as glossary, search, and insight) work fine.
!pv scan runScan --dataSourceName "AzureBlob-8hn" --scanName "Scan-Y95"
Traceback (most recent call last):
File "/anaconda/envs/azureml_py36/bin/pv", line 8, in <module>
sys.exit(main())
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/cli/cli.py", line 75, in main
data = funcObj(command_args)
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/endpoint.py", line 36, in wrapper
data = get_data(http_dict)
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/endpoint.py", line 17, in get_data
data = client.http_get(http_dict['app'], http_dict['method'], http_dict['endpoint'], http_dict['params'], http_dict['payload'])
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/client.py", line 87, in http_get
elif response.headers['Content-Type'] == 'text/csv; charset=UTF-8':
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/structures.py", line 54, in __getitem__
return self._store[key.lower()][1]
KeyError: 'content-type'
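The traceback shows the header being indexed directly, which raises when the response carries no Content-Type header (for example, an empty body). A defensive lookup could look like the sketch below. `classify_response` is a hypothetical helper, not purviewcli code, and it takes a plain mapping so it runs on its own; the headers object in requests also supports `.get`.

```python
def classify_response(headers):
    """Classify a response by media type without assuming the header exists."""
    # .get avoids the KeyError when Content-Type is absent.
    content_type = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return "json"
    if media_type == "text/csv":
        return "csv"
    return "raw"
```

A response without the header then falls through to the `"raw"` branch instead of crashing the command.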
I tried the pv scan createSource command as shown below. The command succeeded and the source was created, but the resourceName became c instead of cosdb-example.
!pv scan createSource \
--datasource "cosdb-example" \
--kind "AzureCosmosDb" \
--accountUri "https://cosdb-example.documents.azure.com:443/" \
--subscriptionId "xxx" \
--resourceGroup "rg-hinakaza-private" \
--location "westus2" \
--resourceName "cosdb-example" \
--parentCollection "ServiceEndpoint"
{
"id": "datasources/cosdb-example",
"kind": "AzureCosmosDb",
"name": "cosdb-example",
"properties": {
"accountUri": "https://cosdb-example.documents.azure.com:443/",
"createdAt": "2021-05-06T15:11:47.903815Z",
"lastModifiedAt": "2021-05-06T15:13:51.304859Z",
"location": "westus2",
"parentCollection": {
"referenceName": "ServiceEndpoint",
"type": "DataSourceReference"
},
"resourceGroup": "rg-hinakaza-private",
"resourceName": "c",
"subscriptionId": "xxx"
}
}
I'm wondering whether the [0] in "resourceName": args['--resourceName'][0] in the following function is the reason. How about removing the [0]?
purviewcli/purviewcli/client/_scan.py
Lines 108 to 132 in 711fa8c
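The symptom is consistent with indexing a string: if the parsed option holds a plain string rather than a list, `[0]` returns its first character. A small sketch, assuming a docopt-style `args` dictionary (the helper `resource_name_from_args` is hypothetical):

```python
def resource_name_from_args(args):
    """Return --resourceName whether the parser stored a str or a list."""
    value = args["--resourceName"]
    if isinstance(value, (list, tuple)):
        # Repeatable options arrive as a list; take the first item.
        return value[0] if value else None
    return value


args = {"--resourceName": "cosdb-example"}
first_char = args["--resourceName"][0]        # indexing a str yields "c"
full_name = resource_name_from_args(args)     # keeps "cosdb-example"
```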
Hi,
I am trying to use createTermsImport to import a CSV file with a term hierarchy.
When I use the --includeTermHierarchy parameter, I get the following error.
With --includeTermHierarchy True:
Without:
When I don't use the --includeTermHierarchy parameter, it asks me to use it.
Am I using the --includeTermHierarchy parameter correctly?
Thank you!!
Hello,
We tried to use the example in terms.json to set the template:
"templateName": []
https://github.com/tayganr/purviewcli/blob/master/samples/json/glossary/terms.json
But it doesn't work: when we check in Purview, the term still uses the default template.
Is there any way to choose a template?
Thank you
Hi @tayganr ,
In https://github.com/tayganr/purviewcli/blob/master/samples/notebooks%20(plus)/scan%20history.ipynb, in the "Get Scan History" section, there are two additional fields in the JSON response that are not in your header list: ingestionJobId and webScanResults. Because of those two (new?) fields, the result is shifted and, for example, the last two columns have no header. I've added the new values in my own work, but maybe you should add them here too?
headers = ["assetsClassified", "assetsDiscovered", "dataSourceType", "endTime", "error", "errorMessage", "id", "ingestionJobId", "parentId", "pipelineStartTime", "queuedTime", "resourceId", "runType", "scanLevelType", "scanRulesetType", "scanRulesetVersion", "startTime", "status","webScanResults", "source", "scanName"]
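To avoid the columns shifting again when the service adds fields, the header list could be derived from the response instead of hard-coded. The sketch below assumes each scan run is a dictionary and that `source` and `scanName` are appended by the notebook, as the list above suggests; `scan_history_headers` is a hypothetical helper.

```python
def scan_history_headers(runs, extra=("source", "scanName")):
    """Collect every key seen in the scan runs, sorted, then the extra columns."""
    keys = sorted({key for run in runs for key in run})
    return keys + [name for name in extra if name not in keys]


runs = [
    {"id": "1", "status": "Succeeded"},
    {"id": "2", "status": "Failed", "webScanResults": None},
]
headers = scan_history_headers(runs)
```

Rows should then be written by looking up each header in the run (for example with `run.get(header)`), so a missing field leaves an empty cell rather than shifting the row.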
I am trying to get insights via the Azure purviewcli, but I get this error when executing the insight commands:
A screenshot of the Purview portal is also attached for your reference.
pv insight assetDistributionByDataSource
Traceback (most recent call last):
File "/anaconda/envs/azureml_py36/bin/pv", line 8, in <module>
sys.exit(main())
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/cli/cli.py", line 61, in main
module = importlib.import_module('purviewcli.client._' + command)
File "/anaconda/envs/azureml_py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/purviewcli/client/_insight.py", line 1, in <module>
from .client import get_data
ImportError: cannot import name 'get_data'
When I use
%env AZURE_CLIENT_ID=YOUR_CLIENT_ID
%env AZURE_TENANT_ID=YOUR_TENANT_ID
%env AZURE_CLIENT_SECRET=YOUR_CLIENT_SECRET
and then run:
glossary = !pv glossary read
I get the following error, both in Databricks and in my local environment in ipynb:
['Traceback (most recent call last):', ' File "C:\Python38\lib\runpy.py", line 194, in _run_module_as_main', ' return _run_code(code, main_globals, None,', ' File "C:\Python38\lib\runpy.py", line 87, in _run_code', ' exec(code, run_globals)', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\Scripts\pv.exe\main.py", line 7, in ', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\cli\cli.py", line 77, in main', ' data = funcObj(command_args)', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\client\endpoint.py", line 42, in wrapper', ' data = get_data(http_dict)', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\client\endpoint.py", line 18, in get_data', " client.set_token(http_dict['app'])", ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\purviewcli\client\client.py", line 39, in set_token', ' credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_credentials\default.py", line 121, in init', ' credentials.append(EnvironmentCredential(authority=authority, **kwargs))', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_credentials\environment.py", line 62, in init', ' self._credential = ClientSecretCredential(', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_credentials\client_secret.py", line 40, in init', ' super(ClientSecretCredential, self).init(', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_internal\msal_credentials.py", line 33, in init', ' 
validate_tenant_id(self._tenant_id)', ' File "c:\Users\rebremer\gitpublic\azure-functions-datalake-recovery-pitr\.venv\lib\site-packages\azure\identity\_internal\init.py", line 61, in validate_tenant_id', ' raise ValueError(', 'ValueError: Invalid tenant id provided. You can locate your tenant id by following the instructions here: https://docs.microsoft.com/partner-center/find-ids-and-domain-names']
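The ValueError is raised while the credential is being constructed, before any request is sent, so one possible cause is that AZURE_TENANT_ID still holds a placeholder or a malformed value. A quick pre-flight check can be sketched as follows, assuming the tenant ID is supplied as a GUID (tenant domain names would not pass this check); `looks_like_guid` is a hypothetical helper.

```python
import os
import uuid


def looks_like_guid(value):
    """Return True if `value` parses as a GUID, else False."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


# Report which expected variables are missing or not GUID-shaped.
for name in ("AZURE_TENANT_ID", "AZURE_CLIENT_ID"):
    value = os.environ.get(name)
    if not looks_like_guid(value):
        print(f"{name} is missing or is not a GUID")
```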
I was expecting to find this functionality in (some preview version of) Azure CLI, like
az purview datasource create ...
Is there a particular reason for having this separate CLI?
Could (the functionality of) purviewcli be integrated into Azure CLI?
I'm getting an internal server error when I try to runScan.
Not sure if this is a Microsoft problem.
Do you have any ideas?
I've also tried writing custom code to call the run scan API as per the docs, but it doesn't work:
https://docs.microsoft.com/en-us/rest/api/purview/scanningdataplane/scan-result/run-scan
If I try readScanHistory, it does work.
This also falls under the scan-result API in the Microsoft docs, the main difference being GET vs PUT. Very puzzled.
%env PURVIEW_NAME=your_PURVIEW_NAME
%env AZURE_CLIENT_ID=your_CLIENT_ID
%env AZURE_TENANT_ID=your_TENANT_ID
%env AZURE_CLIENT_SECRET=your_CLIENT_SECRET
I've provided the account information as shown above. Then I ran the createTermsExport command as follows:
!pv glossary createTermsExport --glossaryGuid=your_glossary_guid --termGuid=your_term_guid
I got the following response:
{
"reason": "OK",
"status_code": 200,
"url": "https://digital-center-purview-prd.purview.azure.com/catalog/api/atlas/v2/glossary/your_glossary_guid/terms/export?api-version=2021-05-01-preview&includeTermHierarchy=False"
}
However, the example you provided is something like the following:
{
"export": "/YOUR_FOLDER_PATH/export.csv",
"status_code": 200
}
Could you please help clarify why the response differs from the expected one?
Thank you.
I tested with a Purview account newly created on October 21, 2021. First, I assigned the Data curators and Data source admins roles to a service principal (SP) from the Azure portal. I then ran pv scan readDatasources against the new account with the SP and got the following error message.
$ pv scan readDatasources
[Error]
Access to the requested resource is forbidden (HTTP status code 403).
[Resource]
[GET] https://purview-hinakaza-openhack-mdw.scan.purview.azure.com/datasources
[Response]
{'error': {'code': 'Unauthorized', 'message': 'Not authorized to access account'}}
[Credentials]
{
"applicationId": "c89381ee-b8ad-4f60-a230-2e083061dc83",
"objectId": "ab58b655-1a8f-44c5-9ae3-4bc4dfd2c99d",
"tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47"
}
After I assigned the Collection admins role to the SP and re-ran the command, it succeeded.
From the above, it seems that the Collection admins role must be assigned to the identity executing Azure Purview CLI commands. If so, I think the "Authorization" section in README.md should be updated.
https://github.com/tayganr/purviewcli#authorization
It would be more convenient to have a pv scan createScan command for copying scans between Purview accounts.
This is probably user error, but I'm attempting to add an Azure Storage Account. I can do it through the portal successfully, but cannot make it work with the following JSON.
{
"id": "datasources/AzureStorage",
"kind": "AzureStorage",
"name": "AzureStorage",
"properties": {
"collection": null,
"endpoint": "https://armitagencypurview.blob.core.windows.net/",
"location": "westeurope",
"parentCollection": null,
"resourceGroup": "purview-resources",
"resourceName": "armitagencypurview",
"subscriptionId": "57c28f9c-f58a-47a6-bb0b-bfc921735b62"
}
}
I'm getting knocked back with Resource not found.
Any help would be greatly appreciated, thank you.
Relationship creation is not working. I followed the sample code in relationship.ipynb:
typeName = 'process_dataset_outputs'
end1Guid = '2ab5525b-115a-4d82-93ea-63c33778020e'
end1Type = 'azure_datalake_gen2_path'
end2Guid = '9eb55cd7-911b-43b6-8fc6-bdf57c3e7d2a'
end2Type = 'adf_copy_activity'
!pv relationship create --typeName {typeName} --end1Guid {end1Guid} --end1Type {end1Type} --end2Guid {end2Guid} --end2Type {end2Type}
and I am getting the following error:
{
"errorCode": "ATLAS-400-00-07D",
"errorMessage": "Relationship end is invalid. Expected Process but is NULL",
"requestId": "b6210a2e-1a99-48b1-a80a-de18e6de413f"
}
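The message "Expected Process but is NULL" suggests the service expects the process entity at one specific end. In Apache Atlas relationship bodies, each end is an object with a `guid` and a `typeName`. The sketch below puts the process at end1; this assumes that `adf_copy_activity` is a Process subtype and that `process_dataset_outputs` takes the process as end1, neither of which is confirmed in this thread. `build_relationship` is a hypothetical helper.

```python
def build_relationship(type_name, process, dataset):
    """Build an Atlas-style relationship body with the process as end1."""
    return {
        "typeName": type_name,
        "end1": {"guid": process["guid"], "typeName": process["typeName"]},
        "end2": {"guid": dataset["guid"], "typeName": dataset["typeName"]},
    }


copy_activity = {
    "guid": "9eb55cd7-911b-43b6-8fc6-bdf57c3e7d2a",
    "typeName": "adf_copy_activity",
}
lake_path = {
    "guid": "2ab5525b-115a-4d82-93ea-63c33778020e",
    "typeName": "azure_datalake_gen2_path",
}
payload = build_relationship("process_dataset_outputs", copy_activity, lake_path)
```

If that assumption holds, swapping the end1 and end2 values in the command above would be the equivalent change on the CLI side.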
When I run the following command: !pv entity deleteBulk --guid="" --guid=""
I get the error below:
{
"errorCode": "RequestInvalid",
"errorMessage": "Request is not recognized. Please verify the HTTP method, header or URL",
"requestId": "xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Note that !pv entity delete --guid="" works correctly.
I am getting the following error when I run the pv search query --keywords "" command:
{
"errorCode": "APIVersionQueryParameterMissing",
"errorMessage": "Please specify the query parameter api-version as one of the values in set [2021-09-01, 2021-05-01-preview] then retry.",
"requestId": "----********"
}
Other pv commands are working as expected. Any help would be greatly appreciated.
Thanks
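The error indicates the request reached the service without an api-version query parameter. As an illustration of the fix on the client side, the sketch below adds the parameter only when it is missing. `with_api_version` is a hypothetical helper, the example URL is made up, and the default value is simply one of the two versions listed in the error message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_version(url, api_version="2021-09-01"):
    """Return `url` with an api-version query parameter, keeping any existing one."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    # Only add the parameter if the caller has not already set it.
    query.setdefault("api-version", api_version)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical endpoint, used only to show the resulting query string.
search_url = with_api_version("https://example.purview.azure.com/catalog/api/search/query")
```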