Comments (25)

joakimhellum avatar joakimhellum commented on June 15, 2024 9

@mb290 @schoren this is also the behavior of Azure CLI when using the command az ad sp create-for-rbac, as it pauses execution for 5 seconds and retries role assignment creation up to 36 times, waiting for server replication.

References:
https://github.com/Azure/azure-cli/blob/master/src/command_modules/azure-cli-role/azure/cli/command_modules/role/custom.py#L959
https://github.com/Azure/azure-cli/blob/master/src/azure-cli-core/azure/cli/core/commands/arm.py#L995
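
For illustration, the retry behaviour described above can be sketched roughly like this in Python. This is not the Azure CLI's actual code: `PrincipalNotFoundError`, `create_with_retry` and the `create` callback are hypothetical names, and "up to 36 times" is read here as 36 attempts in total. Only the 5-second pause and the number 36 come from the description above.

```python
import time


class PrincipalNotFoundError(Exception):
    """Hypothetical stand-in for the API's PrincipalNotFound failure."""


def create_with_retry(create, max_attempts=36, delay=5.0, sleep=time.sleep):
    """Call create() until it succeeds, retrying on PrincipalNotFoundError.

    Makes at most `max_attempts` calls and pauses `delay` seconds between
    them. The last error is re-raised if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except PrincipalNotFoundError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
```

With these defaults the worst case is 35 pauses of 5 seconds, so just under three minutes of waiting before the error is surfaced.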

To be clear, the Terraform configuration below works most of the time because it waits 30s for server replication using a hack (but sometimes replication takes longer than 30s, and then it fails with the same error you describe above):

provider "azurerm" {
  version = "~> 1.10.0"
}

data "azurerm_subscription" "current" {}

resource "random_string" "password" {
  length = 32
}

resource "random_id" "name" {
  byte_length = 16
}

variable "role" {
  default = "Contributor"
}

variable "end_date" {
  default = "2020-01-01T01:02:03Z"
}

resource "azurerm_azuread_application" "service_principal" {
  name = "${random_id.name.hex}"
}

resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.service_principal.application_id}"
}

resource "azurerm_azuread_service_principal_password" "service_principal" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value                = "${random_string.password.result}"
  end_date             = "${var.end_date}"

  # wait 30s for server replication before attempting role assignment creation
  provisioner "local-exec" {
    command = "sleep 30"
  }
}

resource "azurerm_role_assignment" "service_principal" {
  scope                = "${data.azurerm_subscription.current.id}"
  role_definition_name = "${var.role}"
  principal_id         = "${azurerm_azuread_service_principal.service_principal.id}"
  depends_on           = ["azurerm_azuread_service_principal_password.service_principal"]
}

output "display_name" {
  description = "The Display Name of the Azure Active Directory Application associated with this Service Principal."
  value       = "${azurerm_azuread_service_principal.service_principal.display_name}"
}

output "application_id" {
  description = "The Application ID."
  value       = "${azurerm_azuread_application.service_principal.application_id}"
}

output "object_id" {
  description = "The Object ID for the Service Principal."
  value       = "${azurerm_azuread_service_principal.service_principal.id}"
}

output "password" {
  description = "The Password for this Service Principal."
  value       = "${azurerm_azuread_service_principal_password.service_principal.value}"
}

By contrast, this Terraform configuration does not wait for server replication using the above hack, and always fails:

provider "azurerm" {
  version = "~> 1.10.0"
}

data "azurerm_subscription" "current" {}

resource "random_string" "password" {
  length = 32
}

resource "random_id" "name" {
  byte_length = 16
}

variable "role" {
  default = "Contributor"
}

variable "end_date" {
  default = "2020-01-01T01:02:03Z"
}

resource "azurerm_azuread_application" "service_principal" {
  name = "${random_id.name.hex}"
}

resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.service_principal.application_id}"
}

resource "azurerm_azuread_service_principal_password" "service_principal" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value                = "${random_string.password.result}"
  end_date             = "${var.end_date}"
}

resource "azurerm_role_assignment" "service_principal" {
  scope                = "${data.azurerm_subscription.current.id}"
  role_definition_name = "${var.role}"
  principal_id         = "${azurerm_azuread_service_principal.service_principal.id}"
  depends_on           = ["azurerm_azuread_service_principal_password.service_principal"]
}

output "display_name" {
  description = "The Display Name of the Azure Active Directory Application associated with this Service Principal."
  value       = "${azurerm_azuread_service_principal.service_principal.display_name}"
}

output "application_id" {
  description = "The Application ID."
  value       = "${azurerm_azuread_application.service_principal.application_id}"
}

output "object_id" {
  description = "The Object ID for the Service Principal."
  value       = "${azurerm_azuread_service_principal.service_principal.id}"
}

output "password" {
  description = "The Password for this Service Principal."
  value       = "${azurerm_azuread_service_principal_password.service_principal.value}"
}

with the error:

Error: Error applying plan:

1 error(s) occurred:

* azurerm_role_assignment.service_principal: 1 error(s) occurred:

* azurerm_role_assignment.service_principal: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="PrincipalNotFound" Message="Principal 12eab7225e744ca7b617876179b68b95 does not exist in the directory ssssssss-ssss-ssss-ssss-ssssssssssss."

Does anyone have a suggestion for a workaround in Terraform? I don't yet understand how a fix for this would be implemented in any of these resources.

I really don't want to use this very ugly hack:

...

resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.service_principal.application_id}"
}

resource "azurerm_azuread_service_principal_password" "service_principal" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value                = "${random_string.password.result}"
  end_date             = "${var.end_date}"

  # wait 30s for server replication before attempting role assignment creation
  provisioner "local-exec" {
    command = "sleep 30"
  }
}

resource "azurerm_role_assignment" "service_principal" {
  scope                = "${data.azurerm_subscription.current.id}"
  role_definition_name = "${var.role}"
  principal_id         = "${azurerm_azuread_service_principal.service_principal.id}"
  depends_on           = ["azurerm_azuread_service_principal_password.service_principal"]
}

...

Many thanks,

from terraform-provider-azuread.

adamrbennett avatar adamrbennett commented on June 15, 2024 3

I've barely tested this, so it's probably flawed, but it worked the first time I tried it:

resource "azuread_service_principal_password" "main" {
  service_principal_id = "${azuread_service_principal.main.id}"
  value = "${var.password}"
  end_date = "${var.end_date}"

  provisioner "local-exec" {
    command = <<EOF
until az ad sp show --id ${azuread_service_principal.main.application_id}
do
  echo "Waiting for service principal..."
  sleep 3
done
EOF
  }
}

At least it's an idea, and someone can probably identify the flaws and improve on it.
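
One possible direction for improving on it, sketched in Python so it does not depend on `until` or `sleep` being available in the shell: the same `az ad sp show --id` probe, but with an upper bound on how long it waits. The function names, the 300-second timeout and the injectable `exists`/`sleep`/`clock` parameters are assumptions made for this sketch, not part of any existing tool.

```python
import subprocess
import time


def az_sp_exists(application_id):
    """Run `az ad sp show --id <id>` and report whether it exited with 0."""
    # Assumes the Azure CLI is on PATH. On Windows, where the CLI is
    # installed as az.cmd, shell=True or a full path may be needed.
    result = subprocess.run(
        ["az", "ad", "sp", "show", "--id", application_id],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_service_principal(application_id, timeout=300.0, delay=3.0,
                               exists=az_sp_exists, sleep=time.sleep,
                               clock=time.monotonic):
    """Poll until the service principal is visible or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if exists(application_id):
            return True
        if clock() >= deadline:
            return False  # give up instead of looping forever
        print("Waiting for service principal...")
        sleep(delay)
```

A wrapper script could call `wait_for_service_principal(...)` and exit non-zero when it returns `False`, so that the provisioner fails instead of hanging.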

joakimhellum avatar joakimhellum commented on June 15, 2024 2

We really want to avoid using the local-exec provisioner and the sleep command as a workaround, since we would have to pause execution for approximately 180 seconds to be really sure server replication is done (sometimes server replication takes a long time). We also run Terraform on multiple OSes/build agents where sleep is not always available, so it would be a really ugly hack. Using az ad sp create-for-rbac would currently be a better alternative for us than using the Terraform resources.

Any suggestions on how to implement a fix for this in Terraform are highly appreciated.

Update 1: yes, I really have no idea how to approach fixing this in Terraform other than retrying multiple times on failure like the Azure CLI does, as the error returned from the API is very generic. Maybe @tombuildsstuff could help with what direction to take here.

Update 2:
FYI, there is another issue, #841, that seems to have the same kind of problem, where retrying was implemented in the resource; see https://github.com/terraform-providers/terraform-provider-azurerm/blob/master/azurerm/resource_arm_storage_container.go#L111

Update 3:
#1644 is a bad example of a workaround; I would just like to start this discussion. Any advice is appreciated.
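
Since the API error is generic, a retry would presumably have to key on the `Code="..."` field in the error text. A small Python sketch of that idea follows. `error_code`, `is_retryable` and `RETRYABLE_CODES` are hypothetical names, and treating only `PrincipalNotFound` as retryable is an assumption based on the error quoted earlier in this thread.

```python
import re

# Matches the Code="..." field seen in the error text quoted in this thread.
_CODE_PATTERN = re.compile(r'Code="([^"]+)"')

# Assumption: only this code signals "not replicated yet" and is worth retrying.
RETRYABLE_CODES = {"PrincipalNotFound"}


def error_code(message):
    """Return the value of the first Code="..." field, or None if absent."""
    match = _CODE_PATTERN.search(message)
    return match.group(1) if match else None


def is_retryable(message):
    """True when the error text carries a code we have chosen to retry on."""
    return error_code(message) in RETRYABLE_CODES
```

Other comments in this thread report the same code for principals that have existed for days, so a retry keyed on it would still need an upper bound.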

Thanks again,

stevenicholl avatar stevenicholl commented on June 15, 2024 1

I am also encountering

Original Error: autorest/azure: Service returned an error. Status=400 Code="PrincipalNotFound" Message="Principal 6b3xxxxxxxxxxxx58755xxxx does not exist in the directory xxxxx-xxxx-xxxx-xxxx-xxxxxxxx."

In my scenario the service principal is pre-existing, so it cannot be a timing issue. I am attempting to give an AKS SP permission to act as "Managed Identity Operator" over a User Managed Identity.

When using the respective AZ CLI command as the same user running Terraform, I have no issues.

az role assignment create --role "Managed Identity Operator" --assignee [SP ID] --scope "/subscriptions/[SUBSCRIPTIONID]/resourcegroups/sandbox/providers/Microsoft.ManagedIdentity/userAssignedIdentities/sandbox-mid"

In this example it looks like (as @liamfoneill above) the issue may lie with the azurerm_role_assignment resource.

Resolved for now by running the az cli command via a local-exec. It works for now, but I would much prefer to use the native resource.

schoren avatar schoren commented on June 15, 2024

I had the same issue today. In my case, I fixed it by using the azurerm_azuread_application id instead of the azurerm_azuread_service_principal id. Something like this:

resource "azurerm_azuread_application" "test" {
  name                       = "exampleTFapplication"
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
}

resource "azurerm_azuread_service_principal" "test" {
  application_id = "${azurerm_azuread_application.test.application_id}"
}

resource "azurerm_azuread_service_principal_password" "test" {
  service_principal_id = "${azurerm_azuread_service_principal.test.id}"
  value                = "BVcKK237/&&)hyz@%nsadasdsa(*&^CC#Nd3"
  end_date             = "2020-01-01T01:02:03Z"
}

resource "azurerm_resource_group" "test" {
  name     = "testResourceGroup1"
  location = "West US"
}

resource "azurerm_role_assignment" "test" {
  scope                = "${azurerm_resource_group.test.id}"
  role_definition_name = "Reader"
  principal_id         = "${azurerm_azuread_application.test.application_id}"
}

It's a weird behavior, but I got that from the az ad sp create-for-rbac command. When comparing to the Azure Portal, the actual ID used was the application ID.

Hope it helps!

TechyMatt avatar TechyMatt commented on June 15, 2024

@schoren thanks for replying. I just tested this, and when I tried the update I got this response:

Error: Error applying plan:

1 error(s) occurred:

  • azurerm_role_assignment.test: 1 error(s) occurred:

  • azurerm_role_assignment.test: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="PrincipalNotFound" Message="Principal 408b56eeXXXXXXXXXXX does not exist in the directory #######-#####-######-#########."

I then confirmed the outputs and they are both different values:

Outputs:

azurerm_azuread_application_id = 408b56eeXXXXXXXXXXX
azurerm_azuread_service_principal_id = b711cba7XXXXXXXXXXX

I looked at the azurerm_role_assignment documentation and it does specifically call out that the principal ID is required.

Am I missing something obvious?

schoren avatar schoren commented on June 15, 2024

Yes, yesterday I had a similar issue. I'm checking now to see if it is still happening. In another env, I had successfully deployed and assigned roles to service principals using that method.

schoren avatar schoren commented on June 15, 2024

Ok, now it's working with the original solution, using azurerm_azuread_service_principal id. Not sure why it worked differently before, but it's working as expected now. Is it working for you?

schoren avatar schoren commented on June 15, 2024

@joakimhellum-in Thanks for that clarification. It is an ugly workaround, but maybe that's the best we can get. I don't have a very deep understanding of terraform and this provider's inner workings, so I cannot tell if there's a cleaner solution.

For the time being, I think I'll implement what you suggested

kjhosein avatar kjhosein commented on June 15, 2024

@tombuildsstuff and/or anyone - would you clarify something for me?

It appears (to me at least) that the solution to the various StatusCode=404, ErrorCode=ResourceNotFound issues in the AzureRM provider is to code a fix/retry into the particular resource component. I've noticed multiple such issues here.

Does this mean that you couldn't do something similar to the max_retries option in the AWS provider?

Thanks for any insight!

LaurentLesle avatar LaurentLesle commented on June 15, 2024

I can confirm I have the same behaviour. This is related to the time to replicate the SP through the Azure AD servers.

My scenario is:

  • Create the azurerm_azuread_application,
  • Create the azurerm_azuread_service_principal
  • Create the azurerm_azuread_service_principal_password
  • Create a Keyvault
  • Assign a policy to that SP in KeyVault
  • Connect to Azure RM provider using that SP to create a secret key

I get the error: AADSTS70001: Application with identifier 'app guid here' was not found in the directory

When I retry with another terraform apply 1 minute later, everything goes through.

kvolkovich-sc avatar kvolkovich-sc commented on June 15, 2024

Have the same issue.

andresguisado avatar andresguisado commented on June 15, 2024

I have tried with 30s, 60s, 180s and 200s and I am still getting the same issue...

Using the Azure CLI directly is what worked for me, as @joakimhellum-in mentioned previously:

resource "azurerm_azuread_service_principal_password" "app_spn_password" {
  service_principal_id = "${azurerm_azuread_service_principal.app_spn_id.id}"
  value                = "${random_string.password.result}"
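  end_date             = "${var.spn_end_date}" # e.g. 2020-01-01T01:02:03Z

  # Quote the role name and scope so that values containing spaces
  # (e.g. "Managed Identity Operator") are passed to az as single arguments.
  provisioner "local-exec" {
    command = "az role assignment create --role \"${var.spn_role_definition_name}\" --assignee-object-id ${azurerm_azuread_service_principal.app_spn_id.id} --scope \"${var.spn_scope}\""
  }
}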

andresguisado avatar andresguisado commented on June 15, 2024

Has anybody thought of querying the AD servers with PowerShell to check whether the SPN has been replicated, and only then carrying on?

http://community.idera.com/database-tools/powershell/ask_the_experts/f/active_directory__powershell_remoting-9/21621/check-if-user-exist-and-is-active-in-ad1-or-ad2

I am not sure if you can do this on Azure AD though...

clstokes avatar clstokes commented on June 15, 2024

I'm getting ServicePrincipalNotFound errors for azurerm_kubernetes_cluster resources as well and a subsequent apply works. @tombuildsstuff, should I open a different issue than this one?

tombuildsstuff avatar tombuildsstuff commented on June 15, 2024

@clstokes that sounds like the same underlying issue as this, so we can track that here. Thanks!

logankp avatar logankp commented on June 15, 2024

I'm getting the same issue but I'm not using depends_on. I created the cluster first then added the configuration to create the role assignment. No matter how many times I try to apply it fails.

katbyte avatar katbyte commented on June 15, 2024

Hi @mb290,

As we are deprecating all Azure AD resources and data sources in the Azure RM provider in 2.0 in favour of this new provider, I have moved the issue here.

R0quef0rt avatar R0quef0rt commented on June 15, 2024

I can confirm that this issue still exists with the new AzureAD provider.

liamfoneill avatar liamfoneill commented on June 15, 2024

I also cannot do role assignments with Terraform for Service Principals. It works fine for AAD groups, but I get the Status=400 Code="PrincipalNotFound" too. The service principal was created days ago, so I don't think it is the race condition that others seem to be experiencing. If this is being tracked in another issue, @tombuildsstuff, can you please post the link here, as I cannot find it.

antoinne85 avatar antoinne85 commented on June 15, 2024

If you happen to be running on Windows (where until is not available), here's another potential workaround:
Drop wait-for-service-principal.ps1 in your working directory and use a local-exec provisioner (similar to the previous option).

wait-for-service-principal.ps1

param(
    [string]$ApplicationId
)

$elapsed = 0;
$delay = 3;
$limit = 5 * 60;

$checkMsg = "Checking for service principal with Application ID $ApplicationId"
Write-Host $checkMsg
$cmd = "az ad sp show --id $ApplicationId";
Invoke-Expression $cmd
while($lastExitCode -ne 0 -and $elapsed -le $limit) {
    # Interpolate instead of adding: [int] + "s" tries to convert "s" to a number and fails
    $elapsedSeconds = "${elapsed}s";
    Write-Host "Service principal is not yet available. Retrying in $delay seconds... ($elapsedSeconds elapsed)"
    Start-Sleep -Seconds $delay;
    $elapsed += $delay;

    Write-Host $checkMsg
    Invoke-Expression $cmd;
}

if($lastExitCode -eq 0) {
    Write-Host "Service principal is ready."
    exit 0
}

Write-Host "Service principal did not become ready within the allotted time."
exit 1

resource "azuread_service_principal_password" "ad_principal_pw" {
  service_principal_id = "${azuread_service_principal.ad_principal.id}"
  value = "${var.password}"
  end_date = "${var.end_date}"

  provisioner "local-exec" {
    command    = ".\\wait-for-service-principal.ps1 -ApplicationId \"${azuread_application.ad_app.application_id}\""
    interpreter = ["PowerShell"]
  }
}

boeboe avatar boeboe commented on June 15, 2024

I am having the same issue. Is there a permanent solution on the roadmap? I see this issue was removed from the 0.3.0 milestone.

The workaround that uses local-exec to wait for "az ad sp show --id ${azuread_service_principal.main.application_id}" does not work either. The command returns OK and displays the service principal, but the principal is not yet ready to be consumed by AKS. I guess it is a timing/eventual-consistency issue between several Azure APIs.

Sleep 30 was the only way forward for me.
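
Since a single successful lookup evidently does not guarantee the principal is usable everywhere, one way to make a readiness probe less optimistic is to require several consecutive successes before continuing. The Python sketch below only illustrates that idea: `wait_until_stable`, its parameters and the `probe` callback are hypothetical, and nothing in it guarantees that another Azure API has caught up.

```python
import time


def wait_until_stable(probe, required=3, delay=5.0, max_checks=60, sleep=time.sleep):
    """Return True once probe() succeeds `required` times in a row.

    probe is any zero-argument callable returning True or False.
    Gives up and returns False after `max_checks` calls.
    """
    streak = 0
    for check in range(max_checks):
        if probe():
            streak += 1
            if streak >= required:
                return True
        else:
            streak = 0  # a failure resets the run of successes
        if check < max_checks - 1:
            sleep(delay)
    return False
```

This is a heuristic, much like the fixed sleep: it trades a longer wait for a smaller chance of continuing too early.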

jlpedrosa avatar jlpedrosa commented on June 15, 2024

Hi!

This also affects AKS clusters, as the SP (or its password) is not ready.

lukasmrtvy avatar lukasmrtvy commented on June 15, 2024

@adamrbennett

Maybe something like this can replace a resource timeout block.
Also, there is no need to query the API when destroying that resource (I am not familiar with what local-exec does at destroy time). It's just another guess.

resource "null_resource" "wait" {

  provisioner "local-exec" {
    command = <<EOF
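        COUNTER=$RETRIES
        until [ $COUNTER -eq 0 ] || az ad sp show --id ${azuread_application.application.application_id} -o none
        do
            echo "Waiting for service principal..."
            # POSIX arithmetic: "let" is a bashism and local-exec may run under /bin/sh
            COUNTER=$((COUNTER - 1))
            sleep $TIMEOUT
        done
        # COUNTER reaches 0 only if every attempt failed; fail the provisioner in that case
        [ $COUNTER -gt 0 ]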
    EOF

    environment = {
      TIMEOUT = "5"
      RETRIES = "20"
    }

  }

  provisioner "local-exec" {
    when = "destroy"
    command = "echo 'Wait hook'"
  }

}

 avatar commented on June 15, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
