microsoft / sdn
This repo includes PowerShell scripts and VMM service templates for setting up the Microsoft Software Defined Networking (SDN) Stack using Windows Server 2016
License: Other
Repro Steps:
Expected Result(s):
The SDN Express environment will be created successfully
Actual Result(s):
The SDN Express environment setup will always fail, because Network Controller access will always be denied with 401 (Unauthorized) error.
Root Cause:
The credential used to access the Network Controller is not set properly in the function Invoke-WebRequestWithRetries in NetworkControllerRESTWrappers.ps1. The current condition adds the credential only when it is empty or null:
if($Credential -eq [System.Management.Automation.PSCredential]::Empty -or $Credential -eq $null){
$params.Add('Credential', $Credential)
}
Should be changed to:
if($Credential -ne [System.Management.Automation.PSCredential]::Empty -and $Credential -ne $null) {
$params.Add('Credential', $Credential)
}
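The corrected condition can be checked in isolation. This is a minimal sketch, not the repository's function: `New-RequestParams` and its parameters are hypothetical names, and it only builds the parameter table that would later be splatted onto a web request.

```powershell
# Hypothetical helper: builds a parameter table and adds 'Credential'
# only when a real (non-null, non-empty) credential was supplied.
function New-RequestParams {
    param(
        [string] $Uri,
        $Credential = $null
    )

    $params = @{ Uri = $Uri }

    # Same test as the proposed fix: skip null and the PSCredential.Empty sentinel.
    if ($null -ne $Credential -and
        $Credential -ne [System.Management.Automation.PSCredential]::Empty) {
        $params.Add('Credential', $Credential)
    }

    return $params
}

# Usage: without a credential the key is absent, with one it is present.
$noCred = New-RequestParams -Uri 'https://nc.example/networking/v1/servers'
$secret = ConvertTo-SecureString -String 'placeholder' -AsPlainText -Force
$cred   = New-Object System.Management.Automation.PSCredential ('CONTOSO\admin', $secret)
$withCred = New-RequestParams -Uri 'https://nc.example/networking/v1/servers' -Credential $cred
```

With the original `-eq ... -or` condition the behavior is inverted: a valid credential is never added, which is consistent with the 401 responses described above.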
We are setting up SDN manually based on the information gathered from SDNExpress and other scripts in this repository. We are trying a simple setup at first with just:
Without Network Gateways or Load Balancers. The aim at first is just to have simple virtualised networks using VFP and VXLAN.
Using the northbound API we have managed to configure in Network Controller:
There is connectivity between the Hyper-V hosts and Network Controller on ports 6640 and 443, with three sessions permanently established:
PS C:\Users\gregory> netstat -aonp tcp | sls :6640
TCP 0.0.0.0:6640 0.0.0.0:0 LISTENING 1172
TCP 10.80.3.103:6640 10.80.3.109:64256 ESTABLISHED 1172
TCP 10.80.3.103:6640 10.80.3.109:64257 ESTABLISHED 1172
TCP 10.80.3.103:54819 10.80.7.234:6640 ESTABLISHED 1172
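Counting the established sessions by eye gets tedious across several hosts. The sketch below parses netstat-style lines like the ones above and returns the ESTABLISHED sessions that involve port 6640; `Get-EstablishedSession` is a hypothetical helper name, and the parsing assumes the five-column `netstat -aonp tcp` layout shown here.

```powershell
# Hypothetical helper: filters netstat output lines for ESTABLISHED TCP
# sessions whose local or remote endpoint uses the given port.
function Get-EstablishedSession {
    param(
        [string[]] $NetstatLine,
        [int] $Port = 6640
    )

    $portPattern = ':{0}$' -f $Port

    foreach ($line in $NetstatLine) {
        # Expected columns: Proto, Local, Remote, State, PID
        $fields = $line.Trim() -split '\s+'
        if ($fields.Count -lt 5) { continue }
        if ($fields[0] -ne 'TCP' -or $fields[3] -ne 'ESTABLISHED') { continue }

        if ($fields[1] -match $portPattern -or $fields[2] -match $portPattern) {
            [pscustomobject]@{
                Local     = $fields[1]
                Remote    = $fields[2]
                OwningPid = [int] $fields[4]
            }
        }
    }
}

# Usage with the lines from the report above.
$sample = @(
    'TCP 0.0.0.0:6640 0.0.0.0:0 LISTENING 1172',
    'TCP 10.80.3.103:6640 10.80.3.109:64256 ESTABLISHED 1172',
    'TCP 10.80.3.103:6640 10.80.3.109:64257 ESTABLISHED 1172',
    'TCP 10.80.3.103:54819 10.80.7.234:6640 ESTABLISHED 1172'
)
$sessions = @(Get-EstablishedSession -NetstatLine $sample)
```

On a live host the same function could be fed with the output of `netstat -aonp tcp`.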
Each Hyper-V host has two physical network adapters connected to a physical switch with a FULL trunk (all VLANs) allowed on the switch. Both physical network adapters are attached to the VMSwitch using a SET team.
Network Controller is able to send a request to the Hyper-V host to create two "PAhostVNIC" adapters, however NONE of them has an IP address assigned. This was checked using the vfpctrl command with the /list-vmswitch-port and /get-address-info options.
Network Controller SDN Diagnostics logs show an error message when allocating an IP address for the host.
SDNFNM, 119, 51001, PRIMARY#105,fnm\common\FnmTracing.cs#While allocating IPAddress for Host:71824f03-a079-4b9f-b616-27eae1d4dc9b, host was not connected to network:1bd21a65-d9c4-430e-a6fd-4b76c1d50156
Full log details follow later in this message.
Both Hyper-V hosts are connected to a switch, and a full trunk (all VLANs) is allowed to pass to the Hyper-V hosts.
Any ideas what could be causing this error and how to move forward?
Definition of the Logical Network is as follows:
{
"resourceRef": "/logicalnetworks/204fd6a4-6cab-4b1d-a12e-368ae702e570",
"resourceId": "204fd6a4-6cab-4b1d-a12e-368ae702e570",
"resourceMetadata": {
},
"etag": "W/\"eb4eae16-d491-444c-9edb-f83b8d86ae5f\"",
"instanceId": "1bd21a65-d9c4-430e-a6fd-4b76c1d50156",
"properties": {
"provisioningState": "Succeeded",
"subnets": [
{
"resourceRef": "/logicalnetworks/204fd6a4-6cab-4b1d-a12e-368ae702e570/subnets/3c0d4178-f1a4-422d-a908-adb1240619ab",
"resourceId": "3c0d4178-f1a4-422d-a908-adb1240619ab",
"etag": "W/\"eb4eae16-d491-444c-9edb-f83b8d86ae5f\"",
"instanceId": "c897ccff-3b69-41f7-b2f2-3ddd161f40af",
"properties": {
"provisioningState": "Succeeded",
"addressPrefix": "10.5.10.0/24",
"ipConfigurations": [
],
"networkInterfaces": [
],
"gatewayPools": [
],
"networkConnections": [
],
"vlanID": "704",
"ipPools": [
{
"resourceRef": "/logicalnetworks/204fd6a4-6cab-4b1d-a12e-368ae702e570/subnets/3c0d4178-f1a4-422d-a908-adb1240619ab/ipPools/d0342931-99d9-4ce4-bfe1-cbd96d5ab8c3",
"resourceId": "d0342931-99d9-4ce4-bfe1-cbd96d5ab8c3",
"etag": "W/\"eb4eae16-d491-444c-9edb-f83b8d86ae5f\"",
"instanceId": "2d361413-9b1e-4cb7-8988-7b62d2c2bead",
"properties": {
"provisioningState": "Succeeded",
"startIpAddress": "10.5.10.50",
"endIpAddress": "10.5.10.150"
}
}
],
"dnsServers": [
"10.5.10.7",
"10.5.10.8",
"10.5.10.9"
],
"defaultGateways": [
"10.5.10.1"
],
"isPublic": false,
"usage": {
"numberOfIPAddresses": 101,
"numberofIPAddressesAllocated": 0,
"numberOfIPAddressesInTransition": 0
}
}
}
],
"virtualNetworks": [
{
"resourceRef": "/virtualNetworks/Contoso_VNet1"
}
],
"networkVirtualizationEnabled": "True"
}
}
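The `usage` block in this definition reports `numberOfIPAddresses` as 101, which is the inclusive size of the pool from 10.5.10.50 to 10.5.10.150. A short sketch of that arithmetic, with hypothetical helper names and IPv4 only:

```powershell
# Hypothetical helper: converts a dotted IPv4 address to a number so that
# two addresses can be subtracted.
function ConvertTo-IPv4Number {
    param([string] $Address)

    $b = [System.Net.IPAddress]::Parse($Address).GetAddressBytes()
    return ([long] $b[0] * 16777216) + ([long] $b[1] * 65536) + ([long] $b[2] * 256) + [long] $b[3]
}

# Hypothetical helper: inclusive number of addresses between start and end.
function Get-IPv4PoolSize {
    param(
        [string] $StartIpAddress,
        [string] $EndIpAddress
    )

    return (ConvertTo-IPv4Number $EndIpAddress) - (ConvertTo-IPv4Number $StartIpAddress) + 1
}

# Pool from the logical network definition above.
$poolSize = Get-IPv4PoolSize -StartIpAddress '10.5.10.50' -EndIpAddress '10.5.10.150'
```

Since `numberofIPAddressesAllocated` is 0, none of these addresses has been handed out, which agrees with the PA host vNICs having no IP address.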
Network Controller SDN Diagnostics logs:
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#50,sdnvsm\common\Tracer.cs#Checking for queues to be retried
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#50,sdnvsm\common\Tracer.cs#Device queue 71824f03-a079-4b9f-b616-27eae1d4dc9b is under processing, attempt 30
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#50,sdnvsm\common\Tracer.cs#Scheduling device 71824f03-a079-4b9f-b616-27eae1d4dc9b update
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#50,sdnvsm\common\Tracer.cs#Found device 71824f03-a079-4b9f-b616-27eae1d4dc9b
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Computer: ml-sdn-test-1.RBX.WDC.PL
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#62,sdnvsm\common\Tracer.cs#ProcessDeviceWorkItem: resource Id: 71824f03-a079-4b9f-b616-27eae1d4dc9b, work item Id: a3755b09-541e-40cd-8aa9-cfec4e6ab884, type: AllocateProviderAddresses
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 8
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#1492,sdnvsm\service\GoalStateDriver.cs#Processing device work item
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 7
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#119,Utilities\HelperClasses\NCEndpointBehavior.cs#Send request to server for AllocateIpAddressForHost with Id a3755b09-541e-40cd-8aa9-cfec4e6ab884
Source: Microsoft-Windows-NetworkController-SDNFNM
Date: 5/5/2017 1:04:46 AM
Event ID: 8
Level: Information
Description:
05/04/2017 23:04:23, SDNFNM, 119, 0, PRIMARY#162,Utilities\HelperClasses\NCEndpointBehavior.cs#Received request at server for AllocateIpAddressForHost with Id a3755b09-541e-40cd-8aa9-cfec4e6ab884
Source: Microsoft-Windows-NetworkController-SDNFNM
Date: 5/5/2017 1:04:46 AM
Event ID: 5
Level: Information
Description:
05/04/2017 23:04:23, SDNFNM, 119, 0, PRIMARY#164,Utilities\HelperClasses\NCEndpointBehavior.cs#Start activity
Source: Microsoft-Windows-NetworkController-SDNFNM
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
05/04/2017 23:04:23, SDNFNM, 119, 51001, PRIMARY#100,fnm\common\FnmTracing.cs#GetAllLogicalSubnetsForNetwork succeded for logical network with name:1bd21a65-d9c4-430e-a6fd-4b76c1d50156
Source: Microsoft-Windows-NetworkController-SDNFNM
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
**05/04/2017 23:04:23, SDNFNM, 119, 51001, PRIMARY#105,fnm\common\FnmTracing.cs#While allocating IPAddress for Host:71824f03-a079-4b9f-b616-27eae1d4dc9b, host was not connected to network:1bd21a65-d9c4-430e-a6fd-4b76c1d50156**
Source: Microsoft-Windows-NetworkController-SDNFNM
Date: 5/5/2017 1:04:46 AM
Event ID: 7
Level: Information
Description:
05/04/2017 23:04:23, SDNFNM, 119, 0, PRIMARY#193,Utilities\HelperClasses\NCEndpointBehavior.cs#Stop activity
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 7
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#83,Utilities\HelperClasses\NCEndpointBehavior.cs#Received reply from server
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 2
Level: Error
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#86,sdnvsm\common\Tracer.cs#EXCEPTION: Unable to process device 71824f03-a079-4b9f-b616-27eae1d4dc9b queue
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 2
Level: Error
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#87,sdnvsm\common\Tracer.cs#System.ServiceModel.FaultException`1[Microsoft.Windows.Networking.NetworkController.Framework.Utilities.ControllerFault]: The creator of this fault did not specify a Reason. (Fault Detail is equal to Message: The specified host is not connected to specified network., Target: , InnerException: null).
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#50,sdnvsm\common\Tracer.cs#Device queue 71824f03-a079-4b9f-b616-27eae1d4dc9b failed
Source: Microsoft-Windows-NetworkController-VSwitchService
Date: 5/5/2017 1:04:46 AM
Event ID: 4
Level: Information
Description:
05/04/2017 23:04:23, SDNVSM, 231, 0, PRIMARY#50,sdnvsm\common\Tracer.cs#Scheduling retry in 60000 ms, now 873089000, requested due time 873149000
I'm getting this error (in the .err file) when deploying the Network Controller. It seems to be something wrong with the patch or the recent change from cmd.exe to PowerShell.
Hopefully someone can help.
.\PrepareNodeForNetworkController.ps1 : The term '.\PrepareNodeForNetworkController.ps1' is not
recognized as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:4
+ CategoryInfo : ObjectNotFound: (.\PrepareNodeForNetworkController.ps1:String) [],
CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Folders EdgeDeployment.cr and NCCertificate.cr need to be moved from Templates\NC to Templates\SLB.
NC ports were blocked incorrectly (or had wrong port settings) because the profile data was set to 2 on the NIC. To change it, I had to do the following and then restart the NC VMs and the NC Host Agent:
# Get the network adapters of the NC VM.
$vmNics = Get-VMNetworkAdapter -VMName "NC-01"
# Show the current port profile settings on the first adapter.
(Get-VMSwitchExtensionPortFeature -FeatureId 9940cd46-8b06-43bb-b9d5-93d50381fd56 -VMNetworkAdapter $vmNics[0]).SettingData
# Change ProfileData to 1 and write the updated port feature back.
$currentProfile = Get-VMSwitchExtensionPortFeature -FeatureId 9940cd46-8b06-43bb-b9d5-93d50381fd56 -VMNetworkAdapter $vmNics[0]
$currentProfile.SettingData.ProfileData = 1
Set-VMSwitchExtensionPortFeature -VMSwitchExtensionFeature $currentProfile -VMNetworkAdapter $vmNics[0]
This is happening on step 6 of the SDNExpress script. What could be the reason for this behavior? Sometimes the same happens to the NC host agent service. A single-host configuration with a single NIC is used. Thanks.
Update SDNExpress scripts to configure Set-NetworkControllerDiagnostic
After downloading the SLB templates and attempting to import them, the following error is generated:
Cannot import package 'C:\Templates\SLB Production Generation 2 VM.XML' The package is either corrupted or has an incorrect file type. Operations failed with error: '>' is an unextected token. The expected token is '=', Line 35, position 91.
This also happens with the Gen 1 template, although that one reports position 90.
I have a Network Controller deployment with 2 Hyper-V hosts without VMM, and I'm facing an issue where Network Controller reports errors when communicating with the NC Host Agent to plumb PA addresses. On the NC I get this informational message:
SDNVSM, 77, 0, PRIMARY#62,sdnvsm\common\Tracer.cs#Plumbing provider address /0, VLAN 0, MAC 401DD8B71D04 on PA VNIC 51768dff-61d2-4e37-9982-f573bbdcb956, VSwitch 69dce0c3-87c3-4be1-a552-a2cab5996171, device 8fe4794d-df18-4e1d-a351-db1ce87e2853
and later this error (for both NC Host Agents I have):
SDNVSM, 314, 0, PRIMARY#33,ovsdb\ovsdbhelper\OvsdbDriver.cs#Connect to /hostAgent/VSwitch failed. Error: The HTTP request was forbidden with client authentication scheme 'Anonymous'.: [stack trace follows]
Debug-ServiceFabricNodeStatus and Debug-NetworkControllerConfigurationState show no errors, and NCHostAgent is running, but with only one established connection:
netstat -anop tcp |sls :6640
TCP 0.0.0.0:6640 0.0.0.0:0 LISTENING 6040
TCP (NC Host Agent IP):54725 (NC IP):6640 ESTABLISHED 6040
The HostIds correspond to the Instance Id of a server resource on the NC. The thumbprints of the certificates used by the Hyper-V hosts and configured in the server resources are the same.
The certificates are self-signed (both on the hosts and on the NC), but the log states that the NC host certificates are valid:
SDNVSM, 314, 0, PRIMARY#282,framework\servicemodule\ControllerRuntime.cs#ValidateCertificate thumbprint [9A134854275D4496F33CC9529776DFCFF9F974CA]. [VALID]
The NC certificate is in cert:\LocalMachine\Root on the hosts, and the host certificates are in cert:\LocalMachine\Root on the NC (along with the NC certificate).
Any ideas?
Thanks in advance,
Malwina
Need to update the script so that slbhpconfig.xml on each host points to the SLBM VIP.
I have already set up my VMM\NC\SLB environment, and it worked well. I could assign a VIP to the SLB Muxes and use the VIP to access my back-end virtual network. To make things easier, I deployed a single-node NC and a 3-node SLB. But after I restart the NC VM, VMM can no longer connect to the NC. When I run "Get-NetworkControllerCluster" or "Get-NetworkController" in the NC VM, the command hangs for a long time.
I have redeployed the environment three times, and I encountered the problem every time. Could anyone tell me what I am doing wrong?
SDNexpress.ps1 fails with the following error:
Failed to start service 'NC Host Agent (NCHostAgent)'.
At C:\sdnexpress\scripts\SDNExpress.ps1:3283 char:9
+ Start-DscConfiguration -Path .\ConfigureHostNetworkingPreNCSe ...
The system event log shows the following error:
The NC Host Agent service terminated with the following error: A device attached to the system is not functioning.
The SCVMM 2016 Service Templates for the SLB and Gateway roles require the option "Enable spoofing of MAC addresses" to be selected on NICs that send and receive tenant data.
In our lab environment, we were unable to get the SLB and Gateway roles working properly until we discovered that MAC spoofing must be enabled on the interfaces that send and receive tenant data.
We are following the step-by-step articles below to set up the SDN infrastructure in the VMM 2016 fabric on Windows Server 2016, and we are facing a rather weird issue:
https://blogs.technet.microsoft.com/larryexchange/2016/09/05/configure-wap-to-support-new-sdn-stack-on-windows-server-2016/
and
https://technet.microsoft.com/en-us/system-center-docs/vmm/scenario/sdn-overview
We are able to set up the network controller VMs fine, but when the pre-install scripts run, we get the errors below:
Error (22042)
The service was not successfully deployed. Review the event log to determine the cause and corrective actions.
Recommended Action
The deployment can be restarted by retrying the job.
Error (22753)
The script command with properties: Type (PreInstall), Deployment Order (5) and Parent Type (ApplicationProfile), failed to complete successfully. Refer to the errors list for more information.
Recommended Action
If the script command's job restart action is set to restart, then the script will be re-executed. Otherwise, the script command will be skipped when the job is restarted, in which case corrective action should be taken to mitigate the effects of the script command failure.
Error (22753)
The script command with properties: Type (PreInstall), Deployment Order (5) and Parent Type (ApplicationProfile), failed to complete successfully. Refer to the errors list for more information.
Recommended Action
If the script command's job restart action is set to restart, then the script will be re-executed. Otherwise, the script command will be skipped when the job is restarted, in which case corrective action should be taken to mitigate the effects of the script command failure.
Error (22632)
The script command standard error matched the failure policy setting "Match any string" with its result The string is missing the terminator: '.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : TerminatorExpectedAtEndOfString
'joT'' is not recognized as an internal or external command,
operable program or batch file.. For more information, see the standard error log C:\NCInstall\InstallNetworkController-AllNodes.err.
From the looks of it, a string terminator is missing somewhere in the script, but we are not able to find where. Any help would be appreciated.
I get this error on the very last step of the script, when "add-SCFabricRoleResource" runs for the gateway VMs:
`Error (50125)
Network service threw an unhandled exception: '{
"error": {
"code": "InternalServerError",
"message": "An error occured.",
"innerError": "System.NullReferenceException: Object reference not set to an instance of an object.\r\n at Microsoft.Windows.Networking.NetworkController.RestApi.GetNetworkInterfacesOperation.SanitizeNetworkInterface(NetworkInterface networkInterface)\r\n at Microsoft.Windows.Networking.NetworkController.RestApi.GetNetworkInterfaceOperation.Execute()"
}
}'
Recommended Action
Work with the network service vendor to fix the problem.`
I have installed rollup 2.1 so I am on version 4.0.2051.0 now.
Someone else also mentioned this error in this post:
https://github.com/Microsoft/SDN/issues/78
I have chosen the standalone, non-HA Gen 2 settings.
Any help is much appreciated.
Is there any document that lists all plugins available for ncHostAgent and their options, with some description or examples?
https://github.com/Microsoft/SDN/blob/master/Containers/ConfigureMCNP.ps1
Here @JMesser81 has used ProxiedServices, InfraServices and NdResponder. I can work out what they are doing in this use case, however it would be nice to understand the full extent of the functionality and options available.
ConfigureSLBManager doesn't select the correct logical network if multiple logical networks use the same subnet. The FabricConfig example states that this is allowed (https://github.com/Microsoft/SDN/blob/master/SDNExpress/scripts/FabricConfig.example.psd1#L59).
It looks like the script locates all subnets that have a VIP pool. It then iterates through each logical network until it finds one with a matching subnet. In the case of multiple logical networks using the same subnet, the script always selects the logical network with the lowest resource reference, even though that may not be the correct logical network. See https://github.com/Microsoft/SDN/blob/master/SDNExpress/scripts/SDNExpress.ps1#L1484-L1522.
To give an example: I have two logical networks as VIP pools: PublicVIP (192.168.3.0/24 with a pool of 3.120-3.159) and PrivateVIP (192.168.3.0/24 with a pool of 3.160-3.199).
I also have a Transit logical network (not a VIP pool) with a pool of 3.80-3.119. Its resource reference ID is 00000000-2222-1111-9999-000000000001, which is the lowest of all logical networks. The script selects 192.168.3.80 as the SLBM VIP because the Transit logical network is the first one it encounters with a matching subnet.
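The selection problem can be reproduced with plain objects. This is only a sketch under stated assumptions: the property names (`Name`, `ResourceId`, `AddressPrefix`, `IsVipPool`), the function `Select-NetworkBySubnet`, and the two VIP resource IDs are hypothetical, and the real script works against Network Controller resources rather than this flat list.

```powershell
# Hypothetical flat model of the three logical networks from the example.
$logicalNetworks = @(
    [pscustomobject]@{ Name = 'PrivateVIP'; ResourceId = '00000000-2222-1111-9999-000000000003'; AddressPrefix = '192.168.3.0/24'; IsVipPool = $true }
    [pscustomobject]@{ Name = 'PublicVIP';  ResourceId = '00000000-2222-1111-9999-000000000002'; AddressPrefix = '192.168.3.0/24'; IsVipPool = $true }
    [pscustomobject]@{ Name = 'Transit';    ResourceId = '00000000-2222-1111-9999-000000000001'; AddressPrefix = '192.168.3.0/24'; IsVipPool = $false }
)

# Picks the network with the lowest resource ID whose prefix matches.
# With -VipOnly, networks without a VIP pool are excluded before matching.
function Select-NetworkBySubnet {
    param(
        [object[]] $Network,
        [string] $AddressPrefix,
        [switch] $VipOnly
    )

    $candidates = $Network | Where-Object { $_.AddressPrefix -eq $AddressPrefix }
    if ($VipOnly) {
        $candidates = $candidates | Where-Object { $_.IsVipPool }
    }

    return $candidates | Sort-Object -Property ResourceId | Select-Object -First 1
}

$byPrefixOnly = Select-NetworkBySubnet -Network $logicalNetworks -AddressPrefix '192.168.3.0/24'
$vipFiltered  = Select-NetworkBySubnet -Network $logicalNetworks -AddressPrefix '192.168.3.0/24' -VipOnly
```

Matching on the prefix alone lands on Transit, as described above. Filtering on the VIP-pool flag first avoids that, although choosing between two VIP networks that share a subnet would still need an explicit reference in the configuration.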
Limit the NC diagnostics log size to 5GB per node by default to prevent NC VMs from running out of disk space (make the 5GB value configurable via FabricConfig).
I would like to request support for two VM NICs (attached to the PA logical network and the transit network) when deploying SLB muxes.
The script has hardcoded values for timezone and user locale which are applied to the fabric and tenant VMs via unattend. These values may need to be changed for some deployments and we should therefore expose them via FabricConfig/TenantConfig instead of hardcoding them within the SDNExpress ps1 scripts.
With the recent changes in PrepareNodeForNetworkController.ps1 there is a critical issue.
The PowerShell script looks in TemplateClusterManifest.xml under C:\Windows\NetworkController on the network controller for the XML version number. The version number is 10.1.0.0, and the script throws an error because of that version.
Here is the snippet from the PrepareNodeForNetworkController.ps1 file.
versionCheckSnippet.txt
Please make sure that the version number of the TemplateClusterManifest.xml is changed.
You can work around the “VLAN 0” issue by forcing the function to run every time, which applies the port profile in all cases.
In the SDNExpress.ps1 script, there is a configuration script Script "SetPortAndProfile_$(
Script "SetPortAndProfile_$(
TestScript = {
return $false #This is the change: TestScript return $False Instead of $True
Those checks found that VLAN 0 matches the VLAN tag on the Management NIC and returned $true, which caused the SetScript to be skipped. Now, with TestScript returning $false, the SetScript executes.
This is a temporary fix (workaround); the final solution will come from the developers.
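The workaround relies on how a DSC Script resource decides whether to act: SetScript runs only when TestScript returns $false. The sketch below imitates that decision with ordinary script blocks; `Invoke-ScriptResourceSketch` is a hypothetical stand-in, not the DSC engine.

```powershell
# Hypothetical stand-in for the DSC Script resource decision:
# if TestScript reports the desired state is already met, SetScript is skipped.
function Invoke-ScriptResourceSketch {
    param(
        [scriptblock] $TestScript,
        [scriptblock] $SetScript
    )

    if (& $TestScript) {
        return 'Skipped'
    }

    # Discard any SetScript output so only the status string is returned.
    $null = & $SetScript
    return 'Applied'
}

# Earlier behavior: the VLAN check returned $true, so the port profile was never set.
$before = Invoke-ScriptResourceSketch -TestScript { return $true } -SetScript { 'set port profile' }

# Workaround: TestScript always returns $false, so SetScript runs on every pass.
$after = Invoke-ScriptResourceSketch -TestScript { return $false } -SetScript { 'set port profile' }
```

The cost of this workaround is that the SetScript work is repeated on every configuration pass, which is why it is described as temporary.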
I need to publish updated scripts with support for SLB peering with 2 TOR switches.
I've been experimenting with containers on Windows Server 2016 and I ran into a problem: somewhere during the container networking setup the MTU of my interface was changed. I think I've narrowed it down to the New-VMSwitch command (I believe this command is executed during the default docker network setup). Executing the steps below in PowerShell as Administrator should reproduce the problem; I just followed them in new Windows Server 2016 VMs on Azure / EC2 / GCE.
cmd /c 'netsh interface ipv4 show subinterfaces'
# Set the MTU to a lower value:
cmd /c 'netsh interface ipv4 set subinterface "Ethernet 2" mtu=1460 store=persistent'
# Install docker to get container and Hyper-V components:
Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
Install-Package -Name docker -ProviderName DockerMsftProvider
Restart-Computer -Force
# Reconnect RDP session. The vEthernet interface for the default container
# always has MTU 1500, rather than taking MTU 1460 from "Ethernet 2":
cmd /c 'netsh interface ipv4 show subinterfaces'
# Remove the existing container network, then reconnect the RDP session:
Stop-Service docker
Get-ContainerNetwork | Remove-ContainerNetwork -Force
# Ethernet 2 is now the only interface again, with MTU 1460:
cmd /c 'netsh interface ipv4 show subinterfaces'
# Create a new VMSwitch, then reconnect the RDP session:
New-VMSwitch -name testMTU -netadaptername "Ethernet 2"
# The only interface is now "vEthernet (testMTU)", with MTU forced to 1500
# instead of 1460:
cmd /c 'netsh interface ipv4 show subinterfaces'
This unexpected MTU change will cause packet fragmentation and potentially other issues (in my case my RDP connection did not work until I lowered the MTU again). Is there a reason that New-VMSwitch overrides the MTU on the Ethernet interface? Can the command be changed to inherit the MTU from the interface?
(Filing this bug here after looking at https://technet.microsoft.com/en-us/windows-server-docs/networking/sdn/contact-sdn-team - hopefully this is the right place.)
We successfully deployed a Network Controller infrastructure with VMM. During the setup we got no errors. The onboarding worked without any issue, and the creation of new tenant networks works perfectly.
But we are not able to get an IP address inside a tenant VM. Our hosts and NCs are running the latest patches. The debug output from the NC shows no errors. We used the production template with 3 nodes and a PKI certificate (not self-signed).
The only thing we are missing is the three established sessions from the hosts to the NC. We disabled the firewalls on all servers. We see connections in the TIME_WAIT state ...
What are we missing? There is nothing in the host Event Viewer or trace logs. There are many warnings in the NC Service Fabric event log, but no errors in the Network Controller event log.
Best regards
Dominik
I'm getting the following error on line 1132 in the onboardGateway function.
Not sure what's going on. Somehow I'm getting this error when associating the service "gateway manager" to the gateway manager role.
Hopefully someone can help me out.
Error (50125) Network service threw an unhandled exception: '{ "error": { "code": "InUseVirtualServerCannotBeDeleted", "message": "VirtualServer is being used by Gateway resource and cannot be deleted. Delete the dependent resource first and then retry.", "innerError": "Microsoft.WindowsAzure.Networking.Nrp.Frontend.Common.ValidationException: VirtualServer is being used by Gateway resource and cannot be deleted. Delete the dependent resource first and then retry.\r\n at Microsoft.Windows.Networking.NetworkController.Framework.Operations.DeleteVirtualServerOperation.ExecuteInternal(VirtualServer existingResource, ITransaction transaction)\r\n at Microsoft.Windows.Networking.NetworkController.RestApi.Common.Operations.DeleteResourceDefaultOperation`1.DefaultExecuteTopLevelResource()\r\n at Microsoft.Windows.Networking.NetworkController.RestApi.Common.Operations.DeleteResourceDefaultOperation`1.Execute()\r\n at Microsoft.WindowsAzure.Networking.Nrp.Frontend.Operations.OperationBase`1.Run()"
}
}'
Recommended Action
Work with the network service vendor to fix the problem.
When I was trying to deploy SLB using SCVMM RTM 2016, I was hit with multiple issues. First of all, the registry key was not found. I'm using WS2016 RTM hyperbase.
VERBOSE: [2016-10-09T21:15:41.1898573+08:00] Adding Network Controller Certificates to trusted Root Store
VERBOSE: [2016-10-09T21:15:41.2210983+08:00] Found certificate at path: C:\MuxInstall\NCCertificate\MultiNodeNC.cer
VERBOSE: [2016-10-09T21:15:41.2210983+08:00] Adding certificate to root store..
VERBOSE: [2016-10-09T21:15:41.2999737+08:00] Extracting subject Name from Certificate
VERBOSE: [2016-10-09T21:15:41.3156151+08:00] Parsing Subject Name CN=172.16.1.30 to get Subject Fqdn
VERBOSE: [2016-10-09T21:15:41.3156151+08:00] Updating registry values for Mux...
VERBOSE: [2016-10-09T21:15:50.6066217+08:00] Caught an exception:
VERBOSE: [2016-10-09T21:15:50.6339225+08:00] Exception Type: System.Management.Automation.ItemNotFoundException
VERBOSE: [2016-10-09T21:15:50.6495633+08:00] Exception Message: Cannot find path
'HKLM:\SYSTEM\CurrentControlSet\Services\SlbMux' because it does not exist.
VERBOSE: [2016-10-09T21:15:50.6651894+08:00] Excepti"
Secondly, the SLBMUX service was not installed by the script.
VERBOSE: [2016-10-09T21:22:11.1207763+08:00] Setting slbmux service to autostart
VERBOSE: [2016-10-09T21:22:11.1676501+08:00] Caught an exception:
VERBOSE: [2016-10-09T21:22:11.1989009+08:00] Exception Type: System.InvalidOperationException
VERBOSE: [2016-10-09T21:22:11.2145245+08:00] Exception Message: Service slbmux was not found on computer '.'.
VERBOSE: [2016-10-09T21:22:11.2145245+08:00] Excepti"
Can anyone help me find out what I did wrong?
Thanks
I deployed NC with the VMM templates, as well as the SLB MUX, but I've had issues with the GW template. Most notably, while all my VMs connected to the Transit and management networks, they didn't connect to the backend network. I'll try connecting it manually, but that didn't happen with the SLB template. Everything onboarded as planned there, but that's not the case with the GW template. Have you seen this in other tests?
Hi all,
I have another issue. After we onboarded the Gateway Service Template to the Network Controller, we got the following error when debugging Network Controller: "Gateway Cleanup failed". If we then try to remove the service from Network Controller, we get the error: "Network service threw an unhandled exception: 'DeleteGateway: 2837a962-a2a5-46bd-a2a4-12ec121bce4f Gateway cleanup is pending '
Another thing: after onboarding the Gateway Service I get an "InfrastructurePortBlocked" if I run Debug-NetworkController on the VirtualSwitch resource type.
After the SDNExpress.ps1 script gets to the "End Set" portion of InstallHostCert, I get the following error:
PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: The term '\Scripts\CertHelpers.ps1' is not recognized as the name of a cmdlet, function, script file, or operable program.
When trying to deploy either Standalone or Production using the VMM SDN Express script, the script errors out after authenticating to the domain. See below.
Script output:
Checking the Fabric Configuration Input Parameters
WARNING: The product Key is blank. Specify the Product key by logging into the infrastructure VM while is it being configured
Successfully authenticated with domain System.DirectoryServices.DirectoryEntry.name
Storage Classification : System.Collections.Hashtable.StorageClassification does not exist
Any assistance would be great.
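The literal text `System.Collections.Hashtable.StorageClassification` in that output looks like PowerShell string interpolation applied to a property access without a subexpression. Whether this is what the script actually does is an assumption, since the script line is not shown here, but the effect is easy to reproduce. `$fabricConfig` and its value are hypothetical.

```powershell
# Hypothetical configuration table standing in for the script's settings.
$fabricConfig = @{ StorageClassification = 'Local Storage' }

# Inside double quotes only the variable itself is expanded, so the hashtable's
# type name is printed and ".StorageClassification" is left as literal text.
$withoutSubexpression = "Storage Classification : $fabricConfig.StorageClassification does not exist"

# Wrapping the property access in $() evaluates it before insertion.
$withSubexpression = "Storage Classification : $($fabricConfig.StorageClassification) does not exist"
```

The first string reproduces the message from the script output exactly; the second names the classification that was looked up, which makes the underlying "does not exist" condition easier to diagnose.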
README.md references the non-existent file Config.psd1
After I configured the GW service in my NC network service in VMM, the following exception appeared:
I think it was because the frontend subnet that I specified on my nodes was also my management subnet. I forgot to change it. No configuration was saved, even though the job succeeded according to VMM. After trying again and changing only the frontend subnet, the exception didn't appear, and I was still able to peer with my ToR switch. Not an issue, but a problem I thought might interest you.
It is probably a good idea for all the hardware configurations for the VMs in the templates to be highly available. That way, the VMs will be highly available once deployment is done.
I followed this [procedure](https://technet.microsoft.com/en-us/library/mt729074%28v=sc.16%29.aspx?f=255&MSPPError=-2147217396) from TechNet to install the templates of the network controller deployment, and there was an issue with my logical switch. This other [procedure](https://technet.microsoft.com/en-us/library/mt732315%28v=sc.16%29.aspx) says that teaming is not supported for NC deployment. My switches didn't onboard. I am going to try to redo the procedure with a new switch. Also, you can insert the product key in the designer after you have imported the template. It is a useful trick to avoid having to go into the console. We are really excited to be working with this technology. Tell me if you want feedback or ideas for tests!
I changed my domain account's password after it had expired and tried to re-run the SDNExpress scripts. The scripts failed to complete most notably because the Network Controller could not be accessed. I discovered later that I forgot to update my password in the configuration file (FabricConfig.psd1).
We should add a mechanism to either request new user credentials each time the script is run or at least validate that the current domain account is valid and usable. Right now, there is no intelligible error produced when the domain account referenced is incorrect.
When running SDNExpressTenant.ps1 with the config file populated with IPv6 subnets, I get the following error:
VERBOSE: [HYPERV1]: [[Script]CreateVNet] Invoke-WebRequestWithRetries: Put Exception: {
"error": {
"code": "InvalidIPAddress",
"message": "IPAddress 2001:90::/64 is not in the correct format.",
"innerError": "Microsoft.WindowsAzure.Networking.Nrp.Frontend.Common.ValidationException: IPAddress 2001:90::/64 is not in the correct format.
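The error suggests that the REST endpoint rejected the IPv6 prefix, though whether IPv6 virtual subnets are supported at all is not stated here. Until that is clear, a configuration file can be pre-checked before anything is sent. The sketch below classifies a prefix by address family; `Get-PrefixAddressFamily` is a hypothetical helper and does not validate the prefix length.

```powershell
# Hypothetical helper: reports whether the address part of a CIDR prefix
# parses as IPv4, IPv6, or not at all.
function Get-PrefixAddressFamily {
    param([string] $AddressPrefix)

    $address = ($AddressPrefix -split '/')[0]
    $parsed = $null

    if (-not [System.Net.IPAddress]::TryParse($address, [ref] $parsed)) {
        return 'Invalid'
    }

    switch ($parsed.AddressFamily) {
        'InterNetwork'   { return 'IPv4' }
        'InterNetworkV6' { return 'IPv6' }
        default          { return 'Invalid' }
    }
}

# The prefix from the error message, and an IPv4 prefix for comparison.
$v6Family = Get-PrefixAddressFamily -AddressPrefix '2001:90::/64'
$v4Family = Get-PrefixAddressFamily -AddressPrefix '10.5.10.0/24'
```

A tenant script could use such a check to warn about IPv6 subnets up front instead of failing midway through the PUT calls.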
Function undoNCDeployment does not work because it tries to remove the Run As accounts before the other resources are removed.
First the Service and Service instances need to be removed.
Then the Service Template.
Then the resources in the library.
Then the Run As Accounts.
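The removal order above can be encoded once and applied to whatever the cleanup function has collected. This is a sketch with hypothetical names: the `Type` labels and `Get-RemovalSequence` are illustrative and do not correspond to VMM cmdlets.

```powershell
# Hypothetical type labels, listed in the order they should be removed.
$removalOrder = @('ServiceInstance', 'ServiceTemplate', 'LibraryResource', 'RunAsAccount')

# Sorts resources so that dependents are removed before what they depend on.
function Get-RemovalSequence {
    param(
        [object[]] $Resource,
        [string[]] $Order = $removalOrder
    )

    return $Resource | Sort-Object -Property { [array]::IndexOf($Order, $_.Type) }
}

# Resources collected in arbitrary order.
$collected = @(
    [pscustomobject]@{ Type = 'RunAsAccount';    Name = 'NC_MgmtAdminRAA' }
    [pscustomobject]@{ Type = 'ServiceTemplate'; Name = 'NC template' }
    [pscustomobject]@{ Type = 'ServiceInstance'; Name = 'NC service' }
    [pscustomobject]@{ Type = 'LibraryResource'; Name = 'NCSetup.cr' }
)
$sequence = @(Get-RemovalSequence -Resource $collected)
```

Each item in `$sequence` would then be passed to the matching removal step, with the Run As accounts handled last.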
NC Templates in https://github.com/Microsoft/SDN/blob/master/VMM/Templates/NC/ have the following setting vmmst:StorageClassificationRefLocal Storage</vmmst:StorageClassificationRef> for the virtual disk
For example in:
https://github.com/Microsoft/SDN/blob/master/VMM/Templates/NC/Network%20Controller%20Production%20Generation%201%20VM.xml
that is line 165.
Line 165 (<vmmst:StorageClassificationRef>Local Storage</vmmst:StorageClassificationRef>) should be removed.
This setting causes the deployment to go to C:\ locally on a machine rather than to C:\ClusterStorage\Volume1. In a cluster scenario this could result in deployment issues.
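As a rough illustration of the change, the element could be stripped from a local copy of a template before it is imported. `Remove-StorageClassificationRef` is a hypothetical helper, and it assumes the template file parses as well-formed XML; it only removes elements whose text is `Local Storage`.

```powershell
# Hypothetical helper: removes <...:StorageClassificationRef>Local Storage</...>
# elements from a template file and returns how many were removed.
function Remove-StorageClassificationRef {
    param([Parameter(Mandatory)] [string] $TemplatePath)

    # Resolve to a full path because XmlDocument ignores the PowerShell location.
    $fullPath = (Resolve-Path -LiteralPath $TemplatePath).ProviderPath
    $doc = New-Object System.Xml.XmlDocument
    $doc.PreserveWhitespace = $true
    $doc.Load($fullPath)

    # local-name() matches the element regardless of its namespace prefix.
    $nodes = @($doc.SelectNodes("//*[local-name()='StorageClassificationRef']") |
        Where-Object { $_.InnerText.Trim() -eq 'Local Storage' })
    foreach ($node in $nodes) {
        [void]$node.ParentNode.RemoveChild($node)
    }
    $doc.Save($fullPath)
    return $nodes.Count
}

# Example usage against a local copy of the template:
# Remove-StorageClassificationRef -TemplatePath '.\Network Controller Production Generation 1 VM.xml'
```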
There are a bunch of custom resources available on http://PowerShellGallery.com (with source code on GitHub @ https://github.com/PowerShell/DscResources/tree/master/xDscResources) that should be used instead of script resources, such as xVhdFileDirectory, WindowsFeature, xVMHyperV, xFireWall etc.
PS C:\Windows\system32> Get-VMNetworkAdapterIsolation -VMName NC-01 | select *
IsolationMode : None
AllowUntaggedTraffic : False
DefaultIsolationID : 0
MultiTenantStack : Off
ParentAdapter : VMNetworkAdapter (Name = 'Management', VMName = 'NC-01') [VMId = 'de4ba4c9-61e1-4fb7-ab32-f3770827cae7']
IsTemplate : True
CimSession : CimSession: .
ComputerName : CONTOSOHV01
IsDeleted : False
The code from line 439 till 442 looks like this:
$MgmtDomainCredPassword = ConvertTo-SecureString -String $node.ManagementDomainUserPassword -Force -AsPlainText
$MgmtDomainCred = New-Object System.Management.Automation.PSCredential ($node.ManagementDomainUser, $localAdminCredPassword)
$MgmtAdminRAA = New-SCRunAsAccount -Name "NC_MgmtAdminRAA" -Credential $MgmtDomainCred
It should be changed to this:
$MgmtDomainCredPassword = ConvertTo-SecureString -String $node.ManagementDomainUserPassword -Force -AsPlainText
$MgmtDomainCred = New-Object System.Management.Automation.PSCredential ($node.ManagementDomainUser, $MgmtDomainCredPassword)
$MgmtAdminRAA = New-SCRunAsAccount -Name "NC_MgmtAdminRAA" -Credential $MgmtDomainCred
After deploying NC with the VMM templates, the only thing I've been able to get to work is connecting a web server to my VM networks. I can't seem to make NAT or direct routing work. For the cmdlet Get-NetworkControllerServer -ConnectionUri $connectionURI |ConvertTo-Json -Depth 8 I get (among other output):
"Serial": null,
"ConfigurationState": {
"Status": "Warning",
"DetailedInfo": [
{
"Source": "VirtualNetwork",
"Message": "Failed to configure the policies on the host device.",
"Code": "PolicyConfigurationFailure"
},
{
"Source": "VirtualSwitch",
"Message": "Multiple switches with VFP enabled, exists on the host, which is unsupported.",
"Code": "MultipleVfpEnabledSwitches"
},
{
"Source": "SoftwareLoadBalancerManager",
"Message": "Host is not Connected.",
"Code": "HostNotConnectedToController"
}
],
"LastUpdatedTime": "/Date(1467659129642)/"
And when I try to set a direct routing VM network, I get the same settings each time.
Name : VNET2_Gateway
Description :
IPv4Address : 10.254.254.2
IPv4Subnet : 10.254.254.0/29
IPv6Address :
IPv6Subnet :
IPAddresses : {10.254.254.2}
IPSubnets : {10.254.254.0/29}
EnableBGP : False
AutonomousSystemNumber :
EffectiveRoutes : {}
VPNConnections : {}
NATConnections : {}
NetworkGateway : NC
BGPPeers : {}
VMNetwork : VNET2
ServerConnection : Microsoft.SystemCenter.VirtualMachineManager.Remoting.ServerConnection
ID : 55d87a71-56bd-4a6a-9fca-e2f27c21119e
IsViewOnly : False
ObjectType : VMNetworkGateway
MarkedForDeletion : True
IsFullyCached : True
I don't really know what this IP address means. I certainly haven't configured it manually. It may have something to do with the fact that only 1 out of 3 SLB/MUX instances peered with my switch, and only 2 out of 3 gateways peered with it. On one of my gateways, BGP hasn't been configured. I don't know how to verify that on the SLB, as it doesn't use Routing and Remote Access for it. Will it use that role in the final version of 2016?
Anyway, do you know if I might have missed a step in the installation? I don't think I did, though. I followed Larry Zhang's procedure by the book, except that I wasn't in a virtual environment. Feel free to ask for more details.
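The `MultipleVfpEnabledSwitches` warning in the output above can be checked on each Hyper-V host. The sketch below lists the virtual switches that have a VFP extension enabled. `Get-VfpEnabledSwitchName` is a hypothetical helper, and matching the extension by the `*VFP*` wildcard is an assumption, since the extension's display name may differ between builds.

```powershell
# Hypothetical helper: returns the names of switches with an enabled VFP extension.
function Get-VfpEnabledSwitchName {
    param(
        # Defaults to live data from the local Hyper-V host; evaluated only
        # when no value is passed in.
        [object[]] $Extension = (Get-VMSwitch | Get-VMSwitchExtension)
    )
    $Extension |
        Where-Object { $_.Name -like '*VFP*' -and $_.Enabled } |
        ForEach-Object { $_.SwitchName } |
        Sort-Object -Unique
}

# Example usage on a host:
# $vfpSwitches = @(Get-VfpEnabledSwitchName)
# if ($vfpSwitches.Count -gt 1) {
#     Write-Warning "VFP is enabled on more than one switch: $($vfpSwitches -join ', ')"
# }
```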
Hello,
I ran the VMMExpress script, but when I run the debug script "Debug-NetworkControllerConfigurationVMM.ps1", none of my VMs are connected to the Network Controller or to the virtual switch. I will attach the result as a text file.
Results.txt
Here is a snippet from the file:
---------------------------------------------------------------------------------------------------------
ResourcePath: https://10.100.8.1/Networking/v1/servers/4c4c4544-004d-3310-8030-b6c04f344732
Status: cchyperv-03.mail.cluster-center.de
Status: Warning
Source: SoftwareLoadBalancerManager
Code: HostNotConnectedToController
Message: Host is not Connected.
----------------------------------------------------------------------------------------------------------
Thanks for any help.
VERBOSE: [NC-01]: LCM: [ End Set ] [[Script]CreateControllerCluster] in 71.9900 seconds.
VERBOSE: Exception: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functio
nality with error message: The certificate provided for client authentication cannot be found on the node NC-
01.CONTOSO.COM. Ensure that the certificate exists and try again
VERBOSE: Disabling tracing for NC.
VERBOSE: Perform operation 'Invoke CimMethod' with following parameters, ''methodName' = SendConfigurationApp
ly,'className' = MSFT_DSCLocalConfigurationManager,'namespaceName' = root/Microsoft/Windows/DesiredStateConfi
guration'.
VERBOSE: An LCM method call arrived from computer CONTOSODC with user sid S-1-5-21-82120362-3978326868-922153
898-500.
VERBOSE: [NC-01]: LCM: [ Start Set ]
VERBOSE: [NC-01]: LCM: [ Start Resource ] [[Script]StopNCTracing]
VERBOSE: [NC-01]: LCM: [ Start Test ] [[Script]StopNCTracing]
VERBOSE: [NC-01]: LCM: [ End Test ] [[Script]StopNCTracing] in 0.0470 seconds.
VERBOSE: [NC-01]: LCM: [ Start Set ] [[Script]StopNCTracing]
VERBOSE: [NC-01]: [[Script]StopNCTracing] Performing the operation "Set-TargetReso
urce" on target "Executing the SetScript with the user supplied credential".
VERBOSE: [NC-01]: LCM: [ End Set ] [[Script]StopNCTracing] in 5.1060 seconds.
VERBOSE: [NC-01]: LCM: [ End Resource ] [[Script]StopNCTracing]
VERBOSE: [NC-01]: LCM: [ End Set ]
VERBOSE: [NC-01]: LCM: [ End Set ] in 5.8570 seconds.
VERBOSE: Operation 'Invoke CimMethod' complete.
VERBOSE: Time taken for configuration job to complete is 5.933 seconds
PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error
message: The certificate provided for client authentication cannot be found on the node NC-01.CONTOSO.COM.
Ensure that the certificate exists and try again
At C:\SDN\SDNExpress\scripts\SDNExpress.ps1:2656 char:9
Start-DscConfiguration -Path .\ConfigureNetworkControllerClus ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, I had almost everything set up, with just my gateways left to deploy. I ran into the issue that some other people here have had, and the solution was to install KB4010672 on the network controllers and reboot them twice. It all seems perfect in the beginning and the gateways can be deployed.
Now the real hell starts. If you create any new VMs or VM networks, nothing works. All configurations get the error PolicyConfigurationFailure. So my thought was to install this patch everywhere. All my hosts and VMs now have this patch, but I still face the same error.
My next step was to redeploy everything fresh with this patch from the beginning (including KB4013429). I had a fresh VMM (UR2.1) and fresh hosts with this patch. My image for the SDN deployment also had this patch. Now everything works perfectly: the network controllers get deployed and no errors show up. I deploy my HNV network and all seems fine. I deploy a test VM network with an IP pool and create a virtual machine on top of it. This is where I get PolicyConfigurationFailure again.
This patch breaks communication between hosts and network controllers. I have PACA mappings and they have established states on port 6640, but the policies do not get pushed to the hosts.
Does anyone know how to solve this issue?
Here is my output from Debug-NetworkControllerConfigurationState
Checking Network Controller for any Configuration State Errors...
Fetching ResourceType: accessControlLists
Fetching ResourceType: servers
ResourcePath: https://ncc01.mydomain.com/Networking/v1/servers/00000000-0000-0000-0000-0cc47a6eea42
Status: Warning
Source: VirtualNetwork
Code: PolicyConfigurationFailure
Message: Failed to configure the policies on the host device.
Fetching ResourceType: virtualNetworks
ResourcePath: https://ncc01.mydomain.com/Networking/v1/virtualNetworks/00377689-bdb5-4bf6-85a0-b16dad107469
Status: Failure
Fetching ResourceType: networkInterfaces
ResourcePath: https://ncc01.mydomain.com/Networking/v1/networkInterfaces/079ecf8c-7f86-466b-89f3-1731af29de3f
Status: Failure
Source: VirtualSwitch
Code: PolicyConfigurationFailure
Message: Failed to configure the policies on the host device.
Source: VirtualNetwork
Code: PolicyConfigurationFailure
Message: Failed to configure the policies on the host device.
Fetching ResourceType: virtualGateways
Fetching ResourceType: loadbalancerMuxes
Fetching ResourceType: Gateways
Here is an error from the networkcontroller-VSwitchService:
Goal State push on device failed: 15444805-6dc7-4a01-8b02-b78b334679a7, Error: TimedOut
Hi,
I need to insert an HTML view or a picture into an appointment body.
Please advise me how to do it with a web service or any other way.
I need to send it from my application.
Thank you,
Omer G
The Cisco documentation uses overlapping networks for the link-nets between spines and ToRs.
The diagnostic scripts below do not support network controllers with Kerberos authentication:
Get-ConnectivityResults.ps1
Test-LogicalNetworkPing.ps1
Test-VNetPing.ps1
The parameter -UseDefaultCredentials should be added for Kerberos authentication.
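One way this could look, as a sketch only: build the authentication arguments once and splat them into each REST call, falling back to `-UseDefaultCredentials` when no explicit credential is given. `Get-NcRestAuthParameter` is a hypothetical helper and is not taken from the diagnostic scripts; how those scripts actually issue their REST calls is not shown here.

```powershell
# Hypothetical helper: returns a hashtable to splat into Invoke-RestMethod
# or Invoke-WebRequest.
function Get-NcRestAuthParameter {
    param(
        [System.Management.Automation.PSCredential] $Credential =
            [System.Management.Automation.PSCredential]::Empty
    )
    if ($null -ne $Credential -and
        $Credential -ne [System.Management.Automation.PSCredential]::Empty) {
        # An explicit credential was supplied; pass it through.
        return @{ Credential = $Credential }
    }
    # No credential: use the logged-on user's identity (Kerberos).
    return @{ UseDefaultCredentials = $true }
}

# Example usage ($uri is a placeholder for the Network Controller REST endpoint):
# $auth = Get-NcRestAuthParameter
# Invoke-RestMethod -Uri "$uri/networking/v1/servers" -Method Get @auth
```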
Hi,
I have this problem when I use SDNExpress script
add-windowsfeature : ArgumentNotValid: The role, role service, or feature name is not valid: 'SoftwareLoadBalancer'. The name was not found.
At line:1 char:1
any idea?
Thanks