Friday, September 12, 2014

Misadventures with Network Location Awareness

Recently, I ran into a situation where an IIS-based application (Microsoft Dynamics CRM 2011) was running very slowly.  It worked, but pages took a very long time to load, on the order of minutes.  Lots of troubleshooting went into determining the cause: checking database configuration and integrity, application settings, and network latency and routing.  Ultimately, in an illogical act of total desperation, the host-based firewalls were completely disabled on all of the systems supporting the application.  After that, the application pages loaded instantly, and everything was very responsive!

This wouldn’t have been so surprising if Windows Firewall with Advanced Security hadn’t already been configured to allow all of the required connections between the servers.  A little more investigation revealed that the Domain-connected network adapters were either assigned to the Public profile or stuck in the “Identifying…” state.  This was a big problem, because some of the firewall rules necessary to support the service were configured to apply only to network connections assigned to the Domain profile.
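To make the failure mode concrete, here is a small Python model of profile-scoped rules. It is only an illustration of the idea, not how Windows Firewall is implemented, and the `Rule` type and rule names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    profiles: frozenset  # profiles this rule is scoped to


def active_rules(rules, adapter_profile):
    """Return the names of the rules that apply to an adapter on the given profile."""
    return [rule.name for rule in rules if adapter_profile in rule.profiles]


# Hypothetical rules, for illustration only.
RULES = [
    Rule("Allow SQL from app tier", frozenset({"Domain"})),
    Rule("Allow HTTPS", frozenset({"Domain", "Private", "Public"})),
]

for profile in ("Domain", "Public"):
    print(profile, active_rules(RULES, profile))
```

In this model, an adapter that lands on the Public profile silently loses every rule that was scoped to Domain only, which matches the symptom described above.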

So, I had found the root cause of the problem.  But, why were the network adapters being assigned to the incorrect profile, or not being assigned a profile at all?  According to Microsoft, in order for the Network Location Awareness service, which is responsible for assigning profiles to each of your networks, to assign the Domain profile, two conditions must be met:
  1. The system must be able to communicate with a DNS server that has the same connection-specific DNS name that the system is configured with.
  2. The system must be able to contact a Domain Controller via LDAP.
Unless both of these conditions are met, the system cannot be assigned the Domain profile.  I suspect that on this system, during the setup phase, there were network configuration issues or interruptions that prevented NLA from connecting to the DNS server when it was trying to identify the network.
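The two conditions can be approximated with a rough Python sketch for troubleshooting. This is a simplified model and not NLA's actual logic: treating the DNS condition as a name comparison and the LDAP condition as a TCP connection to port 389 are both assumptions, and all function names are hypothetical.

```python
import socket


def can_reach_ldap(host, port=389, timeout=3.0):
    """Return True if a TCP connection to the host's LDAP port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def same_dns_name(adapter_suffix, dns_server_suffix):
    """Compare two DNS names, ignoring case and a trailing dot."""
    return adapter_suffix.rstrip(".").lower() == dns_server_suffix.rstrip(".").lower()


def domain_profile_possible(adapter_suffix, dns_server_suffix, dc_reachable):
    """Both conditions must hold before the Domain profile can be assigned."""
    return same_dns_name(adapter_suffix, dns_server_suffix) and dc_reachable
```

For example, `domain_profile_possible("corp.example.com", "corp.example.com", can_reach_ldap("dc01.corp.example.com"))` would check a made-up domain and Domain Controller; substitute your own names.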

There isn’t a documented situation in which a system should get stuck in the “Identifying…” state, but I have certainly seen it happen.  I suspect it might be a bug in the NLA service: even if the system can’t reach a DNS server or contact a Domain Controller, it should end up in some state, even if that is the Public profile.  If I have time I would like to do more investigation into why this happens.  Until then, you can usually coerce it into making a decision by manually restarting the Network Location Awareness service, NlaSvc.
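If you want to script that restart, one possible approach is to call PowerShell's `Restart-Service` from Python. This is a minimal sketch: the helper names are hypothetical, and `-Force` is included on the assumption that services depending on NlaSvc are running and would otherwise block the restart.

```python
import subprocess


def restart_service_command(service_name="NlaSvc"):
    """Build the argument list for restarting a Windows service through PowerShell."""
    return [
        "powershell.exe", "-NoProfile", "-Command",
        f"Restart-Service -Name {service_name} -Force",
    ]


def restart_service(service_name="NlaSvc"):
    """Restart the service; run this elevated on the affected Windows machine."""
    subprocess.run(restart_service_command(service_name), check=True)
```

Calling `restart_service()` from an elevated prompt should have the same effect as restarting the service by hand.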

You might also find that you have lots of unused or incorrect profile information stored on your systems.  You can clear this out and let the NLA service regenerate it by removing all of the keys stored in HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkList\Nla.

I also discovered a corner case that is very unlikely to occur, but is worth considering if you are using NLA and the Windows Advanced Firewall to help secure your network.  If you have a complete system outage that includes your Primary Domain Controllers and DNS servers, one of those servers might get assigned the incorrect (Public) profile when it is restarted.  A solution for this is to create a scheduled task that triggers on DNS server event ID 4 and runs the PowerShell command “Restart-Service dhcp -Force”.  This works for two reasons: the DNS server logs event ID 4 when it has finished loading the AD-integrated zones and the domain is fully available, and NLA re-profiles the network when DHCP is restarted.  With this scheduled task in place, you can be more confident that in the event of a complete outage, the correct rules end up assigned to your AD/DNS servers.
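As a rough sketch of how such a task could be registered, the Python below builds a `schtasks /Create` command with an event trigger. The task name and helper names are hypothetical, and the channel name "DNS Server" and the SYSTEM run-as account are assumptions, so check them against your own servers before using this. Note that the query filters on the event ID only.

```python
import subprocess


def event_task_command(
    task_name,
    channel="DNS Server",  # assumed name of the DNS server event log
    event_id=4,
    action="powershell.exe -NoProfile -Command Restart-Service dhcp -Force",
):
    """Build a schtasks command that runs `action` when the event is logged."""
    query = f"*[System[(EventID={event_id})]]"  # XPath filter on the event ID
    return [
        "schtasks", "/Create",
        "/TN", task_name,
        "/SC", "ONEVENT",
        "/EC", channel,
        "/MO", query,
        "/TR", action,
        "/RU", "SYSTEM",  # assumption: run the task as the SYSTEM account
        "/F",             # overwrite an existing task with the same name
    ]


def register_task(task_name="Reprofile network after DNS start"):
    """Create the task; run this elevated on the AD/DNS server itself."""
    subprocess.run(event_task_command(task_name), check=True)
```

Building the argument list separately from running it makes it easy to print and review the exact command before anything is created on the server.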
