Databricks Terraform Provider
Home Page: https://registry.terraform.io/providers/databricks/databricks/latest
License: Other
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for jobs.
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for Instance Profiles.
The docs suggest using the output of a resource in the provider definition:
provider "databricks" {
azure_auth = {
managed_resource_group = azurerm_databricks_workspace.demo_test_workspace.managed_resource_group_name
azure_region = azurerm_databricks_workspace.demo_test_workspace.location
workspace_name = azurerm_databricks_workspace.demo_test_workspace.name
resource_group = azurerm_databricks_workspace.demo_test_workspace.resource_group_name
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
subscription_id = var.subscription_id
}
}
The Terraform Kubernetes provider documentation warns against this. Presumably this would affect the Databricks provider too, although I have not encountered this issue.
This is a list of things to call out in the docs: running terraform apply twice, and using terraform outputs.
The Terraform documentation for the Kubernetes provider states that this should not be done:
IMPORTANT WARNING When using interpolation to pass credentials to the Kubernetes provider from other resources, these resources SHOULD NOT be created in the same apply operation where Kubernetes provider resources are also used. This will lead to intermittent and unpredictable errors which are hard to debug and diagnose. The root issue lies with the order in which Terraform itself evaluates the provider blocks vs. actual resources.
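One common mitigation, sketched here under the assumption that the workspace and the Databricks resources can live in separate root modules (the directory layout and output names are illustrative, not from this issue), is to create the workspace in one configuration and consume its outputs from a second one, so each apply only evaluates a provider whose credentials already exist:

```hcl
# Stage 1 (workspace/ directory): create the workspace and export what the
# Databricks provider will need.
output "workspace_name" {
  value = azurerm_databricks_workspace.demo_test_workspace.name
}

# Stage 2 (databricks/ directory): read stage 1's state instead of
# interpolating resource attributes directly into the provider block.
data "terraform_remote_state" "workspace" {
  backend = "local"
  config = {
    path = "../workspace/terraform.tfstate"
  }
}

provider "databricks" {
  azure_auth = {
    workspace_name = data.terraform_remote_state.workspace.outputs.workspace_name
    # ... remaining azure_auth fields, as in the example above ...
  }
}
```

With this split, the Kubernetes-provider caveat about provider blocks being evaluated before their input resources exist no longer applies, at the cost of running two applies.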
Is your feature request related to a problem? Please describe.
Currently unable to create interactive single-user clusters, as the cluster resource doesn't allow setting the single_user_name property.
Describe the solution you'd like
Adding the single_user_name property to the cluster resource would solve this.
Describe alternatives you've considered
None (other than falling back to CLI etc)
Additional context
This would support #63
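The requested attribute might look like the following sketch; single_user_name is hypothetical here, since the provider does not yet expose it:

```hcl
resource "databricks_cluster" "single_user" {
  cluster_name  = "interactive-single-user"
  spark_version = "6.4.x-scala2.11"
  node_type_id  = "Standard_DS13_v2"
  num_workers   = 1

  # Hypothetical attribute proposed by this issue; not yet supported.
  single_user_name = "someone@example.com"
}
```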
Is your feature request related to a problem? Please describe.
Configuring the Git integration of notebooks is not supported in Terraform (at least on Azure Databricks).
Describe the solution you'd like
The ability to define Git integration for a notebook in the databricks_notebook resource.
Describe alternatives you've considered
In Azure Databricks (not sure about other flavors), you need to manually configure the Git repo in the notebook UI.
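A hypothetical shape for the requested setting (the git block and its attribute names are purely illustrative, not provider syntax):

```hcl
resource "databricks_notebook" "notebook" {
  content  = base64encode("# notebook source")
  path     = "/mynotebook"
  language = "PYTHON"
  format   = "SOURCE"

  # Hypothetical block illustrating the request; not yet supported.
  git {
    url    = "https://example.com/org/repo.git"
    branch = "main"
  }
}
```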
0.12.26
provider "databricks" {
host = var.databricks_host
token = var.databricks_api_token
}
resource "databricks_scim_group" "privileged-user-group" {
display_name = "Privileged user group"
}
resource "databricks_secret_scope" "privileged-scope" {
name = "privileged-secret-scope"
}
resource "databricks_secret_acl" "privileged-acl" {
principal = "Privileged user group"
permission = "READ"
scope = databricks_secret_scope.privileged-scope.name
}
resource "databricks_cluster" "standard_cluster" {
cluster_name = "standard-cluster"
spark_version = "6.4.x-scala2.11"
node_type_id = "Standard_DS13_v2"
autoscale {
min_workers = 1
max_workers = 3
}
library_whl {
path = "dbfs:/custom-whls/my_custom_whl.whl"
}
}
# Create high concurrency cluster with AAD credential passthrough enabled
resource "databricks_cluster" "high_concurrency_cluster" {
cluster_name = "high-concurrency-cluster"
spark_version = "6.4.x-scala2.11"
node_type_id = "Standard_DS13_v2"
autoscale {
min_workers = 1
max_workers = 3
}
spark_conf = {
"spark.databricks.cluster.profile": "serverless"
"spark.databricks.repl.allowedLanguages": "python, sql"
"spark.databricks.passthrough.enabled": true
"spark.databricks.pyspark.enableProcessIsolation": true
}
}
resource "databricks_notebook" "notebook" {
content = base64encode("# Welcome to your Jupyter notebook")
path = "/mynotebook"
overwrite = false
mkdirs = true
language = "PYTHON"
format = "SOURCE"
}
https://gist.github.com/masoncusack/3806347b0ef5ed873ac77689c63a4ab6
It should be recognised that a user group has been destroyed.
TF seems to hold the user group as part of the present state even though it's been deleted, causing future plan/applies to fail with error "Error: status 400: err Response from server {"error_code":"INVALID_PARAMETER_VALUE","message":"User or Group Privileged user group does not exist."}"
If we look in tfstate, the associated acl resource seems to still exist. Perhaps this wasn't successfully deleted by terraform destroy?
"resources": [
{
"mode": "managed",
"type": "databricks_secret_acl",
"name": "privileged-acl",
"provider": "provider.databricks",
"instances": [
{
"schema_version": 0,
"attributes": {
"id": "privileged-secret-scope|||Privileged user group",
"permission": "READ",
"principal": "Privileged user group",
"scope": "privileged-secret-scope"
}
}
]
},
terraform apply (with associated databricks_scim_group, secret_scope, and secret_acl resources in main.tf)
terraform destroy
There is a DefaultFunc set up for username and password, but because the attributes are required, you need to specify a valid string, which defeats the environment-variable fallback.
"basic_auth": &schema.Schema{
Type: schema.TypeList,
Optional: true,
MaxItems: 1,
Elem: &schema.Resource{
Schema: map[string]*schema.Schema{
"username": &schema.Schema{
Type: schema.TypeString,
Required: true,
DefaultFunc: schema.EnvDefaultFunc("DATABRICKS_USERNAME", nil),
},
"password": &schema.Schema{
Type: schema.TypeString,
Sensitive: true,
Required: true,
DefaultFunc: schema.EnvDefaultFunc("DATABRICKS_PASSWORD", nil),
},
},
},
ConflictsWith: []string{"token"},
},
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for Azure Mounts (Blob (both SAS key and Access Key), ADLS Gen1, ADLS Gen2).
Hi folks,
If you create an Azure Blob mount with Terraform, delete it manually (via a Databricks notebook), then re-run terraform plan, a file-not-found error is thrown.
terraform -v == 0.12.19
Please list the resources as a list, for example:
variable group_name {}
variable "client_id" {
type = string
}
variable "client_secret" {
type = string
}
variable "tenant_id" {
type = string
}
variable "subscription_id" {
type = string
}
variable "dbws_name" {
type = string
}
provider "azurerm" {
version = "~> 2.3"
features {}
subscription_id = var.subscription_id
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
}
provider "random" {
version = "~> 2.2"
}
resource "random_string" "name_prefix" {
special = false
upper = false
length = 6
}
resource "azurerm_resource_group" "example" {
name = var.group_name
location = "eastus" # note: must be lowercase without spaces, not the verbose style
}
resource "azurerm_databricks_workspace" "example" {
name = var.dbws_name
resource_group_name = azurerm_resource_group.example.name
location = azurerm_resource_group.example.location
sku = "standard"
}
resource "azurerm_storage_account" "account" {
name = "${random_string.name_prefix.result}blob"
resource_group_name = azurerm_resource_group.example.name
location = azurerm_resource_group.example.location
account_tier = "Standard"
account_replication_type = "LRS"
account_kind = "StorageV2"
}
resource "azurerm_storage_container" "example" {
name = "dev"
storage_account_name = azurerm_storage_account.account.name
container_access_type = "private"
}
resource "databricks_secret_scope" "terraform" {
name = "terraform"
initial_manage_principal = "users"
}
resource "databricks_secret" "blob_account_key" {
key = "blob_account_key"
string_value = azurerm_storage_account.account.primary_access_key
scope = databricks_secret_scope.terraform.name
}
provider "databricks" {
azure_auth = {
managed_resource_group = azurerm_databricks_workspace.example.managed_resource_group_name
azure_region = azurerm_databricks_workspace.example.location
workspace_name = azurerm_databricks_workspace.example.name
resource_group = azurerm_databricks_workspace.example.resource_group_name
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
subscription_id = var.subscription_id
}
}
resource "databricks_cluster" "cluster" {
cluster_name = "cluster1"
num_workers = 1
spark_version = "6.4.x-scala2.11"
node_type_id = "Standard_D3_v2"
}
resource "databricks_azure_blob_mount" "mount" {
cluster_id = databricks_cluster.cluster.id
container_name = "dev"
storage_account_name = azurerm_storage_account.account.name
mount_name = "dev"
auth_type = "ACCESS_KEY"
token_secret_scope = databricks_secret_scope.terraform.name
token_secret_key = databricks_secret.blob_account_key.key
}
Terraform plan should list the deleted mount as a resource to add.
Terraform plan throws a file not found error and terminates.
Please list the steps required to reproduce the issue, for example:
Terraform v0.12.24
+ provider.azurerm v1.44.0
+ provider.databricks v0.1.0
Please list the resources as a list, for example:
resource "databricks_notebook" "notebook" {
content = filebase64("${path.module}/nb/notebook1.scala")
path = "/Shared/Notebooks/notebook1.scala"
overwrite = false
mkdirs = true
format = "SOURCE"
language = "SCALA"
}
On first run, the resource is seen as an add, correctly, and deploys.
On a subsequent terraform apply it sees the content as having changed:
# databricks_notebook.notebook must be replaced
-/+ resource "databricks_notebook" "notebook" {
~ content = "3750311991" -> "2327128740" # forces replacement
format = "SOURCE"
~ id = "/Shared/Notebooks/notebook1.scala" -> (known after apply)
language = "SCALA"
mkdirs = true
~ object_id = 4081355166030977 -> (known after apply)
~ object_type = "NOTEBOOK" -> (known after apply)
overwrite = false
path = "/Shared/Notebooks/notebook1.scala"
}
This also exacerbates the issue #41 when deploying multiple files as it's trying to re-create all of them every time.
The content should be seen as the same and no-op
The content is seen as different and a delete/create is needed
Use the above hcl to deploy a notebook, then run tf apply
again.
This can break the provider during a refresh operation if the library has messages and the provider is expecting a single string.
Terraform v0.12.26
Please list the resources as a list, for example:
resource "databricks_azure_adls_gen2_mount" "mount_wibble" {
cluster_id = databricks_cluster.cluster.id
container_name = "wibble"
storage_account_name = "storeageaccount"
directory = "wibble"
mount_name = "wibbledir"
tenant_id = "<tenant_id>"
client_secret_scope = databricks_secret_scope.terraform.name
client_secret_key = databricks_secret.client_secret.key
initialize_file_system = true
}
Error: Response from server (403) <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 403 Invalid access token.</title>
</head>
<body><h2>HTTP ERROR 403</h2>
<p>Problem accessing /api/1.2/commands/status. Reason:
<pre> Invalid access token.</pre></p>
</body>
</html>
N/A
The terraform plan should throw an error saying that the directory does not match the validation logic of starting with a /.
No validation error; instead you get an error about an invalid access token. It looks like the dbutils.fs.mount call being made behind the scenes does not have a timeout, and because it is looking up an invalid URI, everything just times out.
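For reference, a configuration that would pass the expected validation (assuming the rule is that directory must begin with a slash, as the issue describes) would look like:

```hcl
resource "databricks_azure_adls_gen2_mount" "mount_wibble" {
  cluster_id             = databricks_cluster.cluster.id
  container_name         = "wibble"
  storage_account_name   = "storeageaccount"
  directory              = "/wibble" # leading slash satisfies the documented validation
  mount_name             = "wibbledir"
  tenant_id              = "<tenant_id>"
  client_secret_scope    = databricks_secret_scope.terraform.name
  client_secret_key      = databricks_secret.client_secret.key
  initialize_file_system = true
}
```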
Please list the steps required to reproduce the issue, for example:
terraform plan
terraform apply
➜ terraform -v
Terraform v0.12.24
resource "databricks_job" "transform" {
existing_cluster_id = databricks_cluster.cluster.id
notebook_path = databricks_notebook.transform.path
name = "transform"
schedule {
quartz_cron_expression = "0 2 * * *"
timezone_id = "UTC"
}
}
2020-04-30T18:30:30.347+0200 [DEBUG] plugin.terraform-provider-databricks_v0.1.0: 2020/04/30 18:30:30 {"Method":"POST","URI":"https://eastus2.azuredatabricks.net/api/2.0/jobs/create","Payload":{"existing_cluster_id":"0430-125417-toed833","new_cluster":{},"notebook_task":{"notebook_path":"/workspace/sample/transform.scala"},"name":"transform","schedule":{"quartz_cron_expression":"0 2 * * *","timezone_id":"UTC"}}}
2020/04/30 18:30:30 [DEBUG] databricks_job.transform: apply errored, but we're indicating that via the Error pointer rather than returning it: status 400: err Response from server {"error_code":"INVALID_PARAMETER_VALUE","message":"Missing required field: settings.cluster_spec.new_cluster.size"}
Job should be created without errors.
An error message indicating that a parameter is missing.
I think you need to leave out the new_cluster parameter from the HTTP request when existing_cluster_id is not null, to avoid server-side validation of that block.
terraform apply
Reuse an existing cluster when creating a job.
➜ tf -v
Terraform v0.12.24
+ provider.azuread v0.8.0
+ provider.azurerm v2.6.0
+ provider.databricks (unversioned)
+ provider.http v1.2.0
+ provider.local v1.4.0
+ provider.null v2.1.2
+ provider.random v2.2.1
Provider is a custom build from commit 6bce373
I later updated my version to da0a178 but it kept crashing.
➜ go version
go version go1.14.2 darwin/amd64
See https://gist.github.com/sdebruyn/f97beefc8670f643a2ec3e8894ebe81f
Terraform creates all listed resources
Terraform crashed
terraform apply -auto-approve
terraform plan
It happens every time with my current state. I did git clean -dfx and ran terraform init again, but terraform kept crashing.
Terraform stops crashing when I remove the last resource in my config (the notebook).
I tried with other notebooks and they worked fine, except for one. That one also uses base64encode(templatefile(... for the content.
During configuration, there is a check for whether the host or token is empty; if either is, the provider will try to read the Databricks CLI config file.
if config.Host == "" || config.Token == "" {
if err := tryDatabricksCliConfigFile(d, &config); err != nil {
return nil, fmt.Errorf("failed to get credentials from config file; error msg: %w", err)
}
}
The problem here is that for MWS you need to set up host and basic_auth, but if you don't provide a token it will try to read your config file, leading to two possible scenarios:
You also can't use a placeholder token to avoid the usage of the config file, because token conflicts with basic_auth. This means you need to create a config file, or add a new profile to it, just to put the correct host that you already specified in the .tf file.
Workaround:
provider "databricks" {
host = "https://accounts.cloud.databricks.com"
profile = "ACCOUNT"
basic_auth {
username = "username"
password = "password"
}
}
[DEFAULT]
...
[ACCOUNT]
host = https://accounts.cloud.databricks.com
token = placeholder
Creating a workspace with a secret scope, cluster, or possibly other references, and then manually deleting the workspace after creation, results in an error on terraform plan/apply.
0.12.24
Please list the resources as a list, for example:
# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key.
Error: parse :///api/2.0/secrets/scopes/list?: missing protocol scheme
Error: parse :///api/2.0/clusters/get?cluster_id=0610-100720-loss540: missing protocol scheme
Deleting an existing workspace, previously created by Terraform, should cause the provider to wait for a new workspace to be created before querying for secret scopes, clusters, etc.
Deleting an existing workspace, previously created by Terraform, results in an error on terraform plan/apply.
Please list the steps required to reproduce the issue, for example:
terraform plan
Error output of a workspace with cluster, secrets and ADAL Gen2 mount, that was manually deleted:
variable "user" {
type = string
}
variable "password" {
type = string
}
variable "client_id" {
type = string
}
variable "client_secret" {
type = string
}
variable "tenant_id" {
type = string
}
variable "subscription_id" {
type = string
}
provider "azurerm" {
version = "~> 2.10"
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
subscription_id = var.subscription_id
features {}
skip_provider_registration = true
}
resource "azurerm_resource_group" "db" {
name = "db-labs-resources"
location = "West Europe"
}
resource "azurerm_databricks_workspace" "module" {
name = "db-labs-worspace"
resource_group_name = azurerm_resource_group.db.name
location = azurerm_resource_group.db.location
sku = "premium"
}
data "azurerm_client_config" "current" {}
provider "databricks" {
version = "~> 0.1"
azure_auth = {
managed_resource_group = azurerm_databricks_workspace.module.managed_resource_group_name
azure_region = azurerm_databricks_workspace.module.location
workspace_name = azurerm_databricks_workspace.module.name
resource_group = azurerm_databricks_workspace.module.resource_group_name
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
subscription_id = var.subscription_id
}
}
resource "databricks_secret_scope" "sandbox_storage" {
name = "sandbox-storage"
initial_manage_principal = "users"
}
resource "databricks_secret" "secret" {
key = "secret"
string_value = "I am a secret"
scope = databricks_secret_scope.sandbox_storage.name
}
Is your feature request related to a problem? Please describe.
The Databricks UI provides some validation, so the provider should do so as well. E.g.:
Error: status 400: err Response from server {"error_code":"INVALID_PARAMETER_VALUE","message":"At least one EBS volume must be attached for clusters created with node type m4.xlarge."}
Describe the solution you'd like
Cluster config validation before sending the request.
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for DBFS Files.
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for DBFS File Sync.
Is your feature request related to a problem? Please describe.
I would like to be able to create cluster policies and cluster policy permissions. Please read more about it here in terms of the features they enable: https://docs.databricks.com/dev-tools/api/latest/policies.html#cluster-policy-permissions-api, https://docs.databricks.com/administration-guide/clusters/policies.html
Describe the solution you'd like
This requires:
Describe alternatives you've considered
Other alternatives could be making the cluster policy and the cluster policy permissions separate objects to make them easier to manage, but the permissions objects themselves are not really reusable or creatable on their own. So from a CRUD standpoint it does not make much sense.
Additional context
Please read these docs for more information: https://docs.databricks.com/dev-tools/api/latest/policies.html#cluster-policy-permissions-api, https://docs.databricks.com/administration-guide/clusters/policies.html
$ terraform -v
Terraform v0.12.6
The issue is not present with databricks_secret_scope alone; a databricks_secret is required to reproduce it.
provider "databricks" {
host = "https://[redacted].azuredatabricks.net"
token = "[redacted]"
}
resource "databricks_secret_scope" "my-scope" {
name = "terraform-demo-scope"
initial_manage_principal = "users"
}
resource "databricks_secret" "my_secret" {
key = "test-secret-1"
string_value = "hello world 123"
scope = "${databricks_secret_scope.my-scope.name}"
}
Terraform should be able to plan when the secret has been deleted manually. The plan should notice the deletion and re-create the secret in the secret scope.
Error message: Error: status 404: err Response from server {"error_code":"RESOURCE_DOES_NOT_EXIST","message":"Scope terraform-demo-scope does not exist!"}
terraform apply
databricks secrets list --scope my-scope - use the CLI and see the secret exists
databricks secrets delete-scope --scope terraform-demo-scope - delete the secret scope
terraform plan - error shown
➜ terraform -v
Terraform v0.12.24
+ provider.azuread v0.8.0
+ provider.azurerm v2.6.0
+ provider.databricks v0.1.0
+ provider.http v1.2.0
+ provider.local v1.4.0
+ provider.null v2.1.2
+ provider.random v2.2.1
Same one as in #21
2020-05-04T10:45:06.969+0200 [DEBUG] plugin.terraform-provider-databricks_v0.1.0: 2020/05/04 10:45:06 {"Method":"GET","URI":"https://eastus2.azuredatabricks.net/api/2.0/clusters/list-node-types?"}
2020/05/04 10:45:07 [ERROR] eval: *terraform.EvalConfigProvider, err: status 400: err Response from server {"error_code":"INVALID_PARAMETER_VALUE","message":"Delegate unexpected exception during listing node types: com.databricks.backend.manager.util.UnknownWorkerEnvironmentException: Unknown worker environment WorkerEnvId(workerenv-3375316063940170)"}
Error: status 400: err Response from server {"error_code":"INVALID_PARAMETER_VALUE","message":"Delegate unexpected exception during listing node types: com.databricks.backend.manager.util.UnknownWorkerEnvironmentException: Unknown worker environment WorkerEnvId(workerenv-1918878560143470)"}
on databricks.tf line 10, in provider "databricks":
10: provider "databricks" {
After creating the workspace, we should be able to create the cluster during the same apply run.
When you create a workspace and terraform goes on to immediately create a cluster, you get the mentioned exception. It works when you apply a second time after a few seconds.
terraform apply
This third party databricks provider has the same issue
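A workaround sometimes used for this kind of eventual-consistency race (a sketch, assuming the hashicorp/time provider, which appears in the version listings elsewhere in these issues; the 120s duration is a guess) is to insert an explicit delay between workspace creation and the first Databricks resource:

```hcl
# Give the workspace's worker environment time to become available before the
# Databricks provider queries it.
resource "time_sleep" "wait_for_workspace" {
  depends_on      = [azurerm_databricks_workspace.dbks]
  create_duration = "120s"
}

resource "databricks_cluster" "cluster" {
  depends_on    = [time_sleep.wait_for_workspace]
  cluster_name  = "cluster"
  spark_version = "6.4.x-scala2.11"
  node_type_id  = "Standard_DS3_v2"
  num_workers   = 1
}
```

This only papers over the race rather than fixing it; a retry in the provider's list-node-types call would be the proper solution.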
In the databricks_cluster resource, it'd be nice to be able to enable Azure AD credential passthrough.
Hi,
A bug we've hit that I'd like to pick up and PR a fix for; it looks like a super easy fix. We'd use the integration tests to be added in #37 to validate that it behaves correctly.
When re-running the adls_gen2_mount resource, it will always detect a change due to an additional slash in the directory. I think this is likely a one- or two-character fix, plus adding the tests to validate it.
# tf -v
Terraform v0.12.16
+ provider.azuread v0.8.0
+ provider.azurerm v2.8.0
+ provider.databricks v0.1.0
+ provider.random v2.2.1
resource "databricks_azure_adls_gen2_mount" "mount" {
cluster_id = databricks_cluster.cluster.id
container_name = "dev" #todo: replace with env...
storage_account_name = azurerm_storage_account.account.name
directory = "/dir"
mount_name = "localdir"
tenant_id = data.azuread_client_config.current.tenant_id
client_id = azuread_application.datalake.application_id
client_secret_scope = databricks_secret_scope.terraform.name
client_secret_key = databricks_secret.client_secret.key
}
> TF_LOG=debug tf plan -var-file vars.tfvars 2>&1 >/dev/null | grep "plugin.terraform-provider-databricks_v0.1.0"
2020-05-05T15:10:42.867Z [DEBUG] plugin.terraform-provider-databricks_v0.1.0: 2020/05/05 15:10:42 {"Method":"POST","URI":"https://eastus.azuredatabricks.net/api/1.2/commands/execute","Payload":{"language":"python","clusterId":"0504-155102-pram660","contextId":"2178625329361652192","command":"\ntry:\n configs = {\"fs.azure.account.auth.type\": \"OAuth\",\n \"fs.azure.account.oauth.provider.type\": \"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider\",\n \"fs.azure.account.oauth2.client.id\": \"REDACTED",\n \"fs.azure.account.oauth2.client.secret\": dbutils.secrets.get(scope = \"terraform\", key = \"datalake_sp_secret\"),\n \"fs.azure.account.oauth2.client.endpoint\": \"https://login.microsoftonline.com/REDACTED/oauth2/token\"}\n dbutils.fs.mount(\n source = \"abfss://[email protected]//dir\",\n mount_point = \"/mnt/localdir\",\n extra_configs = configs)\nexcept Exception as e:\n dbutils.fs.unmount(\"/mnt/localdir\")\n raise e\ndbutils.notebook.
No diff should be found and plan should be empty.
After the first apply, all plan/apply operations detect a diff and recreate the mount.
This is due to the added / character.
Please list the steps required to reproduce the issue, for example:
terraform apply
terraform plan
directory field
Is your feature request related to a problem? Please describe.
I work across a bunch of repos and each has their own requirements for tooling (and tooling versions). As part of working on the issues that @lawrencegripper recently raised, we will create a VS Code Devcontainer definition. This allows us to capture and share the requirements in a container definition and use that for any work on the project.
More information here: https://code.visualstudio.com/docs/remote/containers
Describe the solution you'd like
Contribute our .devcontainer
folder with the definition of the container to use when working with VS Code Devcontainers for this repo so that others can use it if they choose that workflow.
➜ terraform -v
Terraform v0.12.25
+ provider.azuread v0.9.0
+ provider.azurerm v2.11.0
+ provider.databricks (unversioned)
+ provider.http v1.2.0
+ provider.null v2.1.2
+ provider.random v2.2.1
+ provider.time v0.5.0
Current master branch
https://github.com/datarootsio/terraform-module-azure-datalake/runs/709963745?check_suite_focus=true
2020-05-26T16:40:31.5532525Z command.go:172: Error: Response from server (403) <html>
2020-05-26T16:40:31.5532687Z command.go:172: <head>
2020-05-26T16:40:31.5533058Z command.go:172: <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
2020-05-26T16:40:31.5533244Z command.go:172: <title>Error 403 Invalid access token.</title>
2020-05-26T16:40:31.5533404Z command.go:172: </head>
2020-05-26T16:40:31.5533565Z command.go:172: <body><h2>HTTP ERROR 403</h2>
2020-05-26T16:40:31.5533748Z command.go:172: <p>Problem accessing /api/2.0/secrets/put. Reason:
2020-05-26T16:40:31.5533922Z command.go:172: <pre> Invalid access token.</pre></p>
2020-05-26T16:40:31.5534079Z command.go:172: </body>
2020-05-26T16:40:31.5534228Z command.go:172: </html>
2020-05-26T16:40:31.5534573Z command.go:172: : invalid character '<' looking for beginning of value
2020-05-26T16:40:31.5534739Z command.go:172:
2020-05-26T16:40:31.5534912Z command.go:172: on databricks.tf line 73, in resource "databricks_secret" "cmdb_master":
2020-05-26T16:40:31.5535168Z command.go:172: 73: resource "databricks_secret" "cmdb_master" {
Create a databricks_secret
See error output above
Please list the steps required to reproduce the issue, for example:
At first sight I thought it was an issue with databricks_token, but it does not seem to be directly related to that resource. It seems to be an issue with the token that this provider uses underneath to create resources, as this also happens with a databricks_secret that isn't using any access tokens explicitly.
Hi there,
variable "client_id" {}
variable "client_secret" {}
variable "tenant_id" {}
variable "subscription_id" {}
variable "resource_group" {}
variable "managed_resource_group_name" {}
provider "azurerm" {
version = ">= 2.3.0"
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
subscription_id = var.subscription_id
features {}
}
resource "azurerm_databricks_workspace" "demo" {
location = "westeurope"
name = "databricks-demo-workspace"
resource_group_name = var.resource_group
managed_resource_group_name = var.managed_resource_group_name
sku = "standard"
}
resource "databricks_cluster" "demo" {
autoscale {
min_workers = 2
max_workers = 8
}
cluster_name = "databricks-demo-cluster"
spark_version = "6.4.x-scala2.11"
node_type_id = "Standard_DS3_v2"
autotermination_minutes = 30
}
provider "databricks" {
version = ">= 0.1"
azure_auth = {
managed_resource_group = azurerm_databricks_workspace.demo.managed_resource_group_name
azure_region = azurerm_databricks_workspace.demo.location
workspace_name = azurerm_databricks_workspace.demo.name
resource_group = azurerm_databricks_workspace.demo.resource_group_name
client_id = var.client_id
client_secret = var.client_secret
tenant_id = var.tenant_id
subscription_id = var.subscription_id
}
}
https://gist.github.com/christophecremon/f839ac7b0f342d277f0ceaba2fbab165
Terraform generates an execution plan.
Terraform will perform the following actions only once the apply command has been executed:
An error is generated:
Error 404 The workspace with resource ID /subscriptions/<REDACTED_FOR_GITHUB>/resourceGroups/databricks-rg/providers/Microsoft.Databricks/workspaces/databricks-demo-workspace could not be found.
Terraform actually creates the Databricks workspace, with a Premium pricing tier, even though the terraform apply command has not been executed.
Please list the steps required to reproduce the issue, for example:
terraform plan
I had some trouble finding a valid spark_version string for a cluster resource.
It'd be nice if the generic format of the string, or a selection of valid options to use from which this could be inferred, were provided under the usage example in the cluster resource documentation.
I guess the choice of documenting a generic format or specific working examples should depend on whether all spark versions that a user can select in the Databricks UI will always be supported by the TF resource. If this is the case, users can just translate the details there into a valid string.
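Based on the working examples elsewhere in these issues, the string appears to follow the pattern <runtime-major>.<runtime-minor>.x-scala<scala-version>; this pattern is inferred from those examples rather than taken from any documentation:

```hcl
resource "databricks_cluster" "example" {
  cluster_name  = "docs-example"
  # Inferred format: "<runtime>.x-scala<scala version>",
  # e.g. Databricks Runtime 6.4 on Scala 2.11.
  spark_version = "6.4.x-scala2.11"
  node_type_id  = "Standard_DS3_v2"
  num_workers   = 1
}
```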
Hi there,
$ terraform version
Terraform v0.12.24
+ provider.azuread v0.8.0
+ provider.azurerm v2.5.0
+ provider.databricks v0.1.0
Please list the resources as a list, for example:
resource "databricks_notebook" "tamers-databricks-nb-handson-unsupervised-02" {
content = filebase64("notebooks/handson-unsupervised-02-end-to-end-machine-learning-project.py")
language = "PYTHON"
path = "/Shared/xxxx/handson-unsupervised-02-end-to-end-machine-learning-project.py"
overwrite = false
mkdirs = true
format = "SOURCE"
}
https://gist.github.com/mikemowgli/42c32bd11e21d926cadc7b05788d49d7
No error nor warning when applying
warning or errors
TF_LOG=DEBUG terraform apply -target databricks_notebook.tamers-databricks-nb-handson-unsupervised-02
In an Azure Databricks workspace, a terraform apply on a specific notebook (using the -target option) yields the log in the gist link: only a warning, and the content of the notebook is not updated but left as-is.
However, when applying all my terraform plan, I get the same debug log, with only this difference at the very end:
databricks_notebook.tamers-databricks-nb-handson-unsupervised-02: Creation complete after 3s [id=/Shared/xxxx/handson-unsupervised-02-end-to-end-machine-learning-project.py]
Error: Response from server (429)
2020-05-13T12:58:48.240+0200 [DEBUG] plugin: plugin process exited: path=/home/mvdborne/.terraform.d/plugins/linux_amd64/terraform-provider-databricks_v0.1.0 pid=8267
2020-05-13T12:58:48.240+0200 [DEBUG] plugin: plugin exited
$ echo $?
1
I'm impacted by the notebook content issue, so this one might be a consequence of issue 42.
Is your feature request related to a problem? Please describe.
Creating a secret scope backed by Azure Key Vault is not supported at the moment.
Describe the solution you'd like
An extra setting in a databricks_secret_scope resource to link it with an Azure Key Vault.
Describe alternatives you've considered
The only alternative so far is using a Databricks-backed secret scope.
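A hypothetical shape for the requested setting (the keyvault_metadata block and its attribute names are illustrative only; the azurerm_key_vault reference assumes a Key Vault managed elsewhere in the same configuration):

```hcl
resource "databricks_secret_scope" "kv" {
  name = "keyvault-backed-scope"

  # Hypothetical block illustrating the request; not yet supported by the provider.
  keyvault_metadata {
    resource_id = azurerm_key_vault.example.id
    dns_name    = azurerm_key_vault.example.vault_uri
  }
}
```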
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for instance pools.
➜ terraform -v
Terraform v0.12.24
terraform {
required_version = "~> 0.12"
}
provider "azurerm" {
version = "~> 2.6.0"
features {}
}
provider "azuread" {
version = "~> 0.8.0"
}
data "azurerm_client_config" "current" {
}
resource "azuread_application" "aadapp" {
name = "app"
required_resource_access {
resource_app_id = "e406a681-f3d4-42a8-90b6-c2b029497af1"
resource_access {
id = "03e0da56-190b-40ad-a80c-ea378c433f7f"
type = "Scope"
}
}
required_resource_access {
resource_app_id = "00000003-0000-0000-c000-000000000000"
resource_access {
id = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
type = "Scope"
}
}
}
resource "random_password" "aadapp_secret" {
length = 32
# special = false - this fixes the issue...
}
resource "azuread_service_principal" "sp" {
application_id = azuread_application.aadapp.application_id
}
resource "azuread_service_principal_password" "sppw" {
service_principal_id = azuread_service_principal.sp.id
value = random_password.aadapp_secret.result
end_date = "2030-01-01T00:00:00Z"
}
resource "azurerm_resource_group" "rg" {
name = "rg"
location = var.region
}
resource "azurerm_role_assignment" "sprg" {
scope = azurerm_resource_group.rg.id
role_definition_name = "Owner"
principal_id = azuread_service_principal.sp.object_id
}
resource "azurerm_databricks_workspace" "dbks" {
name = "dbks"
resource_group_name = azurerm_resource_group.rg.name
managed_resource_group_name = "rgdbks"
location = var.region
sku = "standard"
}
provider "databricks" {
azure_auth = {
managed_resource_group = azurerm_databricks_workspace.dbks.managed_resource_group_name
azure_region = azurerm_databricks_workspace.dbks.location
workspace_name = azurerm_databricks_workspace.dbks.name
resource_group = azurerm_databricks_workspace.dbks.resource_group_name
client_id = azuread_application.aadapp.application_id
client_secret = random_password.aadapp_secret.result
tenant_id = data.azurerm_client_config.current.tenant_id
subscription_id = data.azurerm_client_config.current.subscription_id
}
}
resource "databricks_cluster" "cluster" {
spark_version = var.databricks_cluster_version
cluster_name = "cluster"
node_type_id = var.databricks_cluster_node_type
autotermination_minutes = 30
autoscale {
min_workers = 2
max_workers = 4
}
}
2020-04-30T13:44:32.563+0200 [DEBUG] plugin.terraform-provider-databricks_v0.1.0: 2020/04/30 13:44:32 Creating db client via azure auth!
2020-04-30T13:44:32.563+0200 [DEBUG] plugin.terraform-provider-databricks_v0.1.0: 2020/04/30 13:44:32 Running Azure Auth
2020-04-30T13:44:32.563+0200 [DEBUG] plugin.terraform-provider-databricks_v0.1.0: 2020/04/30 13:44:32 [DEBUG] Creating Azure Databricks management OAuth token.
2020-04-30T13:44:32.563+0200 [DEBUG] plugin.terraform-provider-databricks_v0.1.0: 2020/04/30 13:44:32 {"Method":"POST","URI":"https://login.microsoftonline.com/TENANTID/oauth2/token","Payload":"grant_type=client_credentials\u0026client_id=0123456-1234-1234-1234-52ef7bbab4af\u0026client_secret=NcntUf_9vBruvi5v8l}$GWolISz+kyXy\u0026resource=https://management.core.windows.net/"}
2020/04/30 13:44:33 [ERROR] <root>: eval: *terraform.EvalConfigProvider, err: status 401: err Response from server {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.\r\nTrace ID: 3386fd50-68af-4678-80e5-596f419e0d00\r\nCorrelation ID: 3ef4860f-7c98-41fb-8f63-ae37e9091033\r\nTimestamp: 2020-04-30 11:44:33Z","error_codes":[7000215],"timestamp":"2020-04-30 11:44:33Z","trace_id":"3386fd50-68af-4678-80e5-596f419e0d00","correlation_id":"3ef4860f-7c98-41fb-8f63-ae37e9091033","error_uri":"https://login.microsoftonline.com/error?code=7000215"}
The request to create an access token should work without issues.
The request fails because the client secret containing special characters isn't submitted correctly.
terraform apply
0.12.26
Please list the resources as a list, for example:
resource "databricks_cluster" "cluster" {
num_workers = 1
spark_version = "6.4.x-scala2.11"
node_type_id = "Standard_D3_v2"
autotermination_minutes = 15
}
resource "databricks_secret_scope" "terraform" {
name = "terraform${databricks_cluster.cluster.cluster_id}"
initial_manage_principal = "users"
}
resource "databricks_secret" "client_secret" {
key = "datalake_sp_secret"
string_value = "%[2]s"
scope = databricks_secret_scope.terraform.name
}
resource "databricks_azure_adls_gen2_mount" "mount" {
cluster_id = databricks_cluster.cluster.id
container_name = "dev" # Created by prereqs.tf
storage_account_name = "%[9]s"
directory = ""
mount_name = "localdir${databricks_cluster.cluster.cluster_id}"
tenant_id = "%[3]s"
client_id = "%[1]s"
client_secret_scope = databricks_secret_scope.terraform.name
client_secret_key = databricks_secret.client_secret.key
initialize_file_system = true
}
N/A
When the cluster that originally created the mount has been deleted inside Databricks, the tf plan should identify this. It should then identify that the mount most likely needs to be re-created, as there is a high likelihood that the cluster is also in the same Terraform configuration.
The provider throws an error during tf plan and renders the state file unusable unless you manually remove the mount from state.
Error: status 400: err Response from server {"error_code":"INVALID_PARAMETER_VALUE","message":"Cluster <some cluster id> does not exist"}
Please list the steps required to reproduce the issue, for example:
terraform apply
terraform plan
>>> Error occurs here during refresh of state
Is your feature request related to a problem? Please describe.
Support multiple workspaces api to be able to provision Databricks workspaces via terraform.
Describe the solution you'd like
Creation of new resources:
Additional context
This is a brand-new public preview API for the AWS cloud service provider.
Is your feature request related to a problem? Please describe.
Currently not a problem, but it's advised to start using the new unique URLs for each Databricks workspace as documented in https://docs.microsoft.com/en-us/azure/databricks/release-notes/product/2020/april#unique-urls-for-each-azure-databricks-workspace
Describe the solution you'd like
Replace the current code that uses https://.azuredatabricks.net/
Additional context
The current hostnames have not been deprecated (yet), so we still have time.
The method tries to check whether each library has a non-empty name, but the Pypi, Maven, and Cran libraries are pointers to structs, so a length check like the following can lead to a nil-pointer panic:
if len(library.Pypi.Package) > 0
Hi there,
When we try to create a Databricks job with the new_cluster field filled in, the payload sent to the API is empty.
provider "databricks" {
host = "https://xxxxxx.cloud.databricks.com/"
token = "xxxxxx"
}
resource "databricks_job" "my_job3" {
new_cluster {
autoscale {
min_workers = 2
max_workers = 3
}
spark_version = "6.4.x-scala2.11"
aws_attributes {
availability = "SPOT"
zone_id = "us-east-1a"
spot_bid_price_percent = "100"
}
node_type_id = "r3.xlarge"
}
notebook_path = "/Users/[email protected]/my-demo-notebook"
name = "my-demo-notebook"
timeout_seconds = 3600
max_retries = 1
max_concurrent_runs = 1
}
https://gist.github.com/Gnarik/21805a0ceb7e8fd26b67318d83ff80f6
We would expect the new_cluster field to be filled with the values specified in the Terraform script, and a job that spins up a new cluster when it starts to be created in the Databricks environment.
The new_cluster field is empty in the API HTTP payload and no job is created.
Please list the steps required to reproduce the issue, for example:
terraform init
terraform apply
None
None
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for Azure Mounts (Blob (both SAS key and Access Key), ADLS gen 1, ADLS gen 2).
Is your feature request related to a problem? Please describe.
Currently the resource_databricks_azure_* mounts don't have the "Sensitive" flag set on their secrets in the schema, making it possible to print secrets to standard output.
Describe the solution you'd like
Addition of the "Sensitive bool" field, as per the official documentation:
https://www.terraform.io/docs/extend/schemas/schema-methods.html
Describe alternatives you've considered
N/A
Additional context
Issue found in:
databricks/resource_databricks_auzre_adls_gen1_mount.go
databricks/resource_databricks_auzre_adls_gen2_mount.go
databricks/resource_databricks_auzre_blob_mount.go
The solution should look similar to the code below (note the added Sensitive field):
"token_secret_key": {
    Type:      schema.TypeString,
    Required:  true,
    ForceNew:  true,
    Sensitive: true,
},
The following data sources are missing documentation.
We expect all the attributes and types to be clearly documented. These data sources will be used in conjunction with other resources so it is important that they are documented.
Is your feature request related to a problem? Please describe.
Currently ADLS mounts allow mounts to be created using service principal details, but for some scenarios we want to be able to provision mounts using AAD Passthrough: https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough#--mount-azure-data-lake-storage-to-dbfs-using-credential-passthrough
Current ADLS Gen 2 mount resource:
resource "databricks_azure_adls_gen2_mount" "mount" {
cluster_id = ""
container_name = ""
storage_account_name = ""
directory = ""
mount_name = ""
tenant_id = ""
client_id = ""
client_secret_scope = ""
client_secret_key = ""
initialize_file_system = true
}
Describe the solution you'd like
Would like to be able to specify the use of AAD Passthrough rather than passing client_id etc.
The proposed change to the resource is shown below
Service principal:
resource "databricks_azure_adls_gen2_mount" "mount" {
cluster_id = ""
container_name = ""
storage_account_name = ""
directory = ""
mount_name = ""
initialize_file_system = true
mount_type = "ServicePrincipal"
service_principal {
tenant_id = ""
client_id = ""
client_secret_scope = ""
client_secret_key = ""
}
}
AAD Passthrough:
resource "databricks_azure_adls_gen2_mount" "mount" {
cluster_id = ""
container_name = ""
storage_account_name = ""
directory = ""
mount_name = ""
initialize_file_system = true
mount_type = "AADPassthrough"
}
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for AWS Mounts (both IAM User & IAM Role mounts).
Is your feature request related to a problem? Please describe.
We are using the databricks-terraform provider in conjunction with the azurerm provider to deploy an Azure Databricks workspace and set up Databricks using tasks in Azure DevOps Pipelines.
When using the Terraform task in Azure DevOps Pipelines to target Azure, it sets up the ARM_* env vars that the azurerm provider expects. Since these are not used by the databricks-terraform provider, we cannot use the Terraform task. As an alternative we are creating a separate script task that sets the additional env vars for databricks-terraform and then kicking off the terraform apply.
Describe the solution you'd like
The azurerm provider allows the ARM_CLIENT_ID and ARM_CLIENT_SECRET env vars to be set. If we could have a way to opt in to configuring the databricks-terraform provider to use these values to get an authorization token for talking to Azure Databricks, it would simplify the deployment pipeline.
Describe alternatives you've considered
Current alternative is wrapping the terraform execution inside a separate task in Azure DevOps Pipelines
Is your feature request related to a problem? Please describe.
I would like the scim service principal resource to be implemented, with acceptance tests and documented in the website docs. https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/scim/scim-sp
Describe the solution you'd like
This requires:
Describe alternatives you've considered
The design is straightforward and follows the pattern of the SCIM user resource.
Additional context
For more information read here: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/scim/scim-sp. It enables you to use Terraform to add service principals to the workspace via the SCIM API.
Is your feature request related to a problem? Please describe.
Currently this project only supports auth via a service principal and creates a token, but this should not be necessary; it should just use the AAD token with the right headers.
Describe the solution you'd like
Refactor the azure_ws_init file into an azure auth component for the dbclient.
Additional context
Azure customers would much rather use AAD tokens, rather than PAT tokens when interacting with the APIs.
Current Blockers
AAD token-based auth is blocked because API calls to the Secrets API via an AAD token are blocked.
Our goal is to make sure that there are acceptance tests for all resources.
There should be an acceptance test for Clusters.
Hi there,
When creating a new job with the new_cluster field filled in, the parameter ebs_volume_count disappears, which leads to an error response from the Databricks API.
Databricks provider version is master branch patched with #79
provider "databricks" {
host = "https://xxxxxx.cloud.databricks.com/"
token = "xxxxxx"
}
resource "databricks_job" "my_job3" {
new_cluster {
autoscale {
min_workers = 2
max_workers = 3
}
spark_version = "6.4.x-scala2.11"
aws_attributes {
availability = "SPOT"
zone_id = "us-east-1a"
spot_bid_price_percent = "100"
first_on_demand = 1
ebs_volume_type = "GENERAL_PURPOSE_SSD"
ebs_volume_count = 1
ebs_volume_size = 32
}
node_type_id = "r4.2xlarge"
}
notebook_path = "/Users/[email protected]/my-demo-notebook"
name = "my-demo-notebook"
timeout_seconds = 3600
max_retries = 1
max_concurrent_runs = 1
}
https://gist.github.com/Gnarik/dc16b034a1809011c7092897bc6326b9
Job is created with new_cluster behavior
No job is created and an error message is returned instead
terraform init
terraform apply
➜ terraform -v
Terraform v0.12.24
terraform {
required_version = "~> 0.12"
}
provider "azurerm" {
version = "~> 2.6.0"
features {}
}
provider "azuread" {
version = "~> 0.8.0"
}
data "azurerm_client_config" "current" {
}
resource "azuread_application" "aadapp" {
name = "app"
required_resource_access {
resource_app_id = "e406a681-f3d4-42a8-90b6-c2b029497af1"
resource_access {
id = "03e0da56-190b-40ad-a80c-ea378c433f7f"
type = "Scope"
}
}
required_resource_access {
resource_app_id = "00000003-0000-0000-c000-000000000000"
resource_access {
id = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
type = "Scope"
}
}
}
resource "random_password" "aadapp_secret" {
length = 32
special = false
}
resource "azuread_service_principal" "sp" {
application_id = azuread_application.aadapp.application_id
}
resource "azuread_service_principal_password" "sppw" {
service_principal_id = azuread_service_principal.sp.id
value = random_password.aadapp_secret.result
end_date = "2030-01-01T00:00:00Z"
}
resource "azurerm_resource_group" "rg" {
name = "rg"
location = var.region
}
resource "azurerm_role_assignment" "sprg" {
scope = azurerm_resource_group.rg.id
role_definition_name = "Owner"
principal_id = azuread_service_principal.sp.object_id
}
resource "azurerm_databricks_workspace" "dbks" {
name = "dbks"
resource_group_name = azurerm_resource_group.rg.name
managed_resource_group_name = "rgdbks"
location = var.region
sku = "standard"
}
provider "databricks" {
azure_auth = {
managed_resource_group = azurerm_databricks_workspace.dbks.managed_resource_group_name
azure_region = azurerm_databricks_workspace.dbks.location
workspace_name = azurerm_databricks_workspace.dbks.name
resource_group = azurerm_databricks_workspace.dbks.resource_group_name
client_id = azuread_application.aadapp.application_id
client_secret = random_password.aadapp_secret.result
tenant_id = data.azurerm_client_config.current.tenant_id
subscription_id = data.azurerm_client_config.current.subscription_id
}
}
resource "databricks_cluster" "cluster" {
spark_version = var.databricks_cluster_version
cluster_name = "cluster"
node_type_id = var.databricks_cluster_node_type
autotermination_minutes = 30
autoscale {
min_workers = 2
max_workers = 4
}
}
resource "databricks_azure_adls_gen2_mount" "mnt" {
cluster_id = databricks_cluster.cluster.cluster_id
container_name = "anything"
storage_account_name = "anything"
mount_name = "anything"
tenant_id = data.azurerm_client_config.current.tenant_id
client_id = azuread_application.aadapp.application_id
client_secret_scope = "anything"
client_secret_key = "anything"
}
Error: "cluster_id": required field is not set
You can see it in the state as well (I already deployed the cluster):
➜ terraform show terraform.tfstate | grep -A35 '# databricks_cluster.cluster'
# databricks_cluster.cluster:
resource "databricks_cluster" "cluster" {
autoscale = [
{
max_workers = 4
min_workers = 2
},
]
autotermination_minutes = 30
cluster_name = "cluster"
default_tags = {
"ClusterId" = "0430-125417-toed833"
"ClusterName" = "cluster"
"Creator" = "123456789-1234-1234-1234-52ef7bbab4af"
"Vendor" = "Databricks"
}
driver_node_type_id = "Standard_DS3_v2"
enable_elastic_disk = true
id = "0430-125417-toed833"
library_cran = []
library_egg = []
library_jar = []
library_maven = []
library_pypi = []
library_whl = []
node_type_id = "Standard_DS3_v2"
spark_version = "6.5.x-scala2.11"
state = "RUNNING"
}
According to the docs, cluster_id should be available; you can use both id and cluster_id. While id works as expected, cluster_id doesn't. I suggest either removing the attribute or filling it as expected.
terraform apply
Hi,
Great work on the provider. We've found a small bug we'd like to fix up and PR into the provider to make understanding a failure case easier.
@stuartleeks is looking at fixing this up by changing the behavior in the try-except block and adding an integration test to validate. Probably tackles #20 too as an added bonus 🎉
# tf -v
Terraform v0.12.16
+ provider.azuread v0.8.0
+ provider.azurerm v2.8.0
+ provider.databricks v0.1.0
+ provider.random v2.2.1
resource "databricks_azure_adls_gen2_mount" "mount" {
cluster_id = databricks_cluster.cluster.id
container_name = "dev" #todo: replace with env...
storage_account_name = azurerm_storage_account.account.name
directory = "/dir"
mount_name = "localdir"
tenant_id = data.azuread_client_config.current.tenant_id
client_id = azuread_application.datalake.application_id
client_secret_scope = databricks_secret_scope.terraform.name
client_secret_key = databricks_secret.client_secret.key
}
The error from dbutils.fs.mount should be returned by the provider. For example, if a secret is misconfigured or the client ID is wrong, the following should be returned:
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator$HttpException: AADToken: HTTP connection failed for getting token from AzureAD. Http response: 401 Unauthorized
If the mount operation fails, the exception details are swallowed by the provider and another exception is returned instead, as a result of dbutils.unmount failing.
For example if you misconfigure the Service Principal details inputted into the resource the following output is received.
The actual error occurring during the mount is:
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator$HttpException: AADToken: HTTP connection failed for getting token from AzureAD. Http response: 401 Unauthorized
This detail is lost as the except block triggers and attempts to call dbutils.fs.unmount. As the mount operation failed, this call throws an exception. This is not caught, and the throw e line is never reached, so the original error is never re-thrown.
See example repro'ing this manually in a notebook:
terraform apply
Directory not mounted error instead of the Not authorized error
Terraform v0.12.24
+ provider.azurerm v1.44.0
+ provider.databricks v0.1.0
Please list the resources as a list, for example:
resource "databricks_notebook" "notebook" {
for_each = fileset("${path.module}/notebooks", "*")
content = filebase64("${path.module}/notebooks/${each.value}")
path = "/Shared/Notebooks/${each.value}"
overwrite = false
mkdirs = true
format = "SOURCE"
language = "SCALA"
}
The above loops over a number of files in a local dir and deploys them to Databricks using the databricks_notebook resource. When more than about 3-4 files are present, I'm seeing pretty regular 429 errors returned when running tf apply. Not sure if there's a race condition whereby the notebook is still being deleted when tf is trying to re-create it.
On first create, all is successful. On further runs of tf apply it sees each notebook as needing to be recreated (will log a separate issue for this), and when trying to recreate we usually hit a 429 error.
databricks_notebook.notebook["notebook1.scala"]: Destroying... [id=/Shared/Notebooks/notebook1.scala]
databricks_notebook.notebook["notebook3.scala"]: Destroying... [id=/Shared/Notebooks/notebook3.scala]
databricks_notebook.notebook["notebook2.scala"]: Destroying... [id=/Shared/Notebooks/notebook2.scala]
databricks_notebook.notebook["notebook3.scala"]: Destruction complete after 0s
databricks_notebook.notebook["notebook3.scala"]: Creating...
databricks_notebook.notebook["notebook2.scala"]: Destruction complete after 0s
databricks_notebook.notebook["notebook2.scala"]: Creating...
databricks_notebook.notebook["notebook3.scala"]: Creation complete after 1s [id=/Shared/Notebooks/notebook3.scala]
databricks_notebook.notebook["notebook2.scala"]: Creation complete after 1s [id=/Shared/Notebooks/notebook2.scala]
Error: Response from server (429)
That the notebooks get re-created.
429 errors
tf apply to loop the directory and create all the notebooks
tf apply again.