WatchGuard DHCP Relay and Network Access Protection
So I started a thread on the TechNet forums after having a problem with DHCP Network Access Protection (NAP) after I’d migrated our DHCP/Network Policy Server (NPS) to Windows Server 2012.
Part of the migration was to move the DHCP server to our core server subnet and to use our WatchGuard XTM 510 Firewall as a DHCP relay agent. As a relay agent for plain DHCP it was fine – put DHCP NAP into the security soup and suddenly it all goes sour…
After trying to figure out the problem myself and failing dismally I was faced with a choice – call our MS Partner for assistance or call MS direct. I chose the latter. Good job too.
The process of logging a call with MS is actually quite straightforward and lets you tell them about the issue you’re having. I pointed them to the forum thread so they could get a rough idea of what was happening.
It took several weeks of back and forth with MS tech support, who were very patient and collected a lot of network traces. They discovered that when a machine is in a non-compliant state under the NPS rules, the WatchGuard would tell the machine three times to change its DHCP configuration to the one given by the NPS server. That sounds excessive, and on top of that it seemed to tinker with the packet slightly. The result was that the client machine couldn’t understand the configuration and generated the message: An error occurred in processing the SOH Response. Error code is 0x800706C6
We tried putting an RRAS box in to act as the DHCP relay agent, which worked for the initial client DHCP broadcast and reply but failed after that – but why?
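As an aside, that error code can be pulled apart to see what Windows is actually complaining about. Here is a minimal Python sketch, assuming the standard HRESULT layout (failure flag in the top bit, facility in bits 16 to 26, code in the low 16 bits). The function name is my own and is not part of any Microsoft tooling.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its failure flag, facility and code."""
    return {
        "failure": bool((value >> 31) & 1),   # top bit set means failure
        "facility": (value >> 16) & 0x7FF,    # 11-bit facility field
        "code": value & 0xFFFF,               # low 16 bits: the error code
    }


parts = decode_hresult(0x800706C6)
print(parts)
```

Facility 7 is the Win32 facility, so the low 16 bits (1734) are a plain Win32 error number. As far as I can tell that one is listed as RPC_S_INVALID_BOUND, "The array bounds are invalid", which would fit a client choking on a packet that has been altered in transit.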
Because the DHCP server is in a different subnet!
I’ll explain… With the aid of a diagram!
So as you can see the client is in the 192.168.8.0/24 subnet, the DHCP/NPS server is in 192.168.5.0/24.
When the client booted it would do a DHCP broadcast, the WatchGuard would relay it to the DHCP/NPS server and pass back the DHCP configuration – all good until NAP comes in… When the client machine runs its System Health Validation (as told to do by Group Policy) it sends the information to the DHCP/NPS server via unicast (direct). The DHCP server would then send back the non-compliant DHCP configuration, which the WatchGuard messed around with and the client couldn’t understand. So MS tech support suggested putting in a Routing and Remote Access Service (RRAS) server to see if we could take the DHCP relaying away from the WatchGuard – this is what we ended up with:
So the WatchGuard was not doing the relaying – yay! But no… still not working. EH??
Whilst the DHCP broadcast was relaying correctly, the Statement of Health (SOH) information was being sent via the default gateway (the WatchGuard) to DHCP/NPS. Why? Once the IP address has been assigned, the client knows its DHCP server’s IP address – so it sends information to it directly, and because that address is in another subnet the traffic goes via the gateway. Consequently, when the NPS server evaluated the SOH information and determined the client to be non-compliant, it sent the response back to the client via the only route it knows how to use – the WatchGuard (it’s the default gateway for the server subnet too). At which point the WatchGuard set about tinkering with the packets and sent them on to the client. The client gets the amended packets and rejects them as malformed – hence the message: An error occurred in processing the SOH Response. Error code is 0x800706C6
So I was left with only three options:
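The routing decision that caught me out can be sketched in a few lines of Python using the standard ipaddress module. This is only an illustration of the on-link versus via-gateway choice, not a model of the Windows IP stack. The client address 192.168.8.50 and gateway 192.168.8.1 are made up for the example, and I’m assuming the DHCP/NPS server sits on 192.168.5.11.

```python
import ipaddress


def next_hop(source_interface: str, destination: str, default_gateway: str) -> str:
    """Return the address a host hands a unicast packet to.

    If the destination is inside the sender's own subnet the packet goes
    straight to it; otherwise it is handed to the default gateway.
    """
    iface = ipaddress.ip_interface(source_interface)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        return str(dest)
    return default_gateway


CLIENT = "192.168.8.50/24"        # hypothetical client address
CLIENT_GATEWAY = "192.168.8.1"    # hypothetical WatchGuard interface
DHCP_SERVER = "192.168.5.11"      # assumed DHCP/NPS server address

# The server is in a different subnet, so the SOH traffic goes via the gateway.
print(next_hop(CLIENT, DHCP_SERVER, CLIENT_GATEWAY))
```

With the server in 192.168.5.0/24 the answer is always the gateway, which is why swapping the relay agent made no difference to the unicast traffic.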
- Change all my routing to use the RRAS server on both subnets – no way. Far too much work – would’ve affected much more than just DHCP NAP!
- Remove DHCP NAP protection from the network and rely on IPsec NAP only. No – the DHCP is the first layer of protection for us, IPsec gets layered on top – with 802.1x coming later (maybe).
- Move the DHCP server…
I thought about the 3rd option and decided on a compromise. The wireless infrastructure we have has been configured to use the 192.168.5.11 address for RADIUS authentication and changing that would’ve been a massive pain in the *&^#! We don’t have centralised management for the APs – yet. The compromise was to multi-home the DHCP server: dip a toe into each subnet and sit on the proverbial fence between them.
An extra virtual network card added via System Center Virtual Machine Manager (SCVMM) and connected to the Desktop Range, some manual IP addresses assigned, and job done.
The DHCP server is now serving both sides as required (the server side is purely for PXE deployed servers – a post for another day). The DHCP server address the clients are given is now on their own subnet, so there is no need to go through the WatchGuard or the RRAS server.
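To show why the extra network card fixes things, here is a rough Python sketch that picks whichever of the server’s addresses is on-link for a given client. The desktop-side address 192.168.8.11 and the client addresses are invented for the example; only 192.168.5.11 and the two /24 subnets come from our actual setup.

```python
import ipaddress

# Addresses on the multi-homed DHCP server. The second one is hypothetical.
SERVER_INTERFACES = ["192.168.5.11/24", "192.168.8.11/24"]


def on_link_server_address(client_ip, server_interfaces):
    """Return the server address that shares a subnet with the client, if any."""
    client = ipaddress.ip_address(client_ip)
    for entry in server_interfaces:
        iface = ipaddress.ip_interface(entry)
        if client in iface.network:
            return str(iface.ip)
    return None  # no shared subnet: traffic would have to be routed


# A desktop client now reaches the server without touching the gateway.
print(on_link_server_address("192.168.8.50", SERVER_INTERFACES))
```

If the function returns None for a client, that client is back to routing through the firewall, which is exactly the situation I was trying to get away from.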
Testing! Testing! Testing! Whilst I’m confident I’ve solved the issue I don’t want to jump the gun. Time to pilot some users – I should probably tell them…