Network Protection in Windows Server 2012 R2 for Hyper-V Guests

In Windows Server 2012 R2 there is a new feature for VMs called Network Protection. This allows the Hyper-V host to monitor a VM’s network connectivity and live migrate the VM if that connectivity is lost. So if you’ve got enough switches you’re living the dream!
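If you want to check which of your VMs actually have this protection switched on, you can query it from the host. Here is a minimal sketch, assuming the Hyper-V PowerShell module is installed and that Get-VMNetworkAdapter exposes the setting as a ClusterMonitored property (verify the property name on your own build before relying on it):

    # Minimal sketch: list VM network adapters on the local host that still have the
    # 'Protected network' setting enabled. Assumes the Hyper-V PowerShell module is
    # available and that the setting surfaces as a ClusterMonitored property.
    import json
    import subprocess

    def protected_adapters() -> list[dict]:
        command = (
            "Get-VM | Get-VMNetworkAdapter | "
            "Select-Object VMName, Name, ClusterMonitored | ConvertTo-Json"
        )
        result = subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", command],
            capture_output=True, text=True, check=True,
        )
        if not result.stdout.strip():
            return []                        # no VMs / no adapters on this host
        adapters = json.loads(result.stdout)
        if isinstance(adapters, dict):       # ConvertTo-Json emits a bare object for one adapter
            adapters = [adapters]
        return [a for a in adapters if a.get("ClusterMonitored")]

    if __name__ == "__main__":
        for adapter in protected_adapters():
            print(f"{adapter['VMName']}: adapter '{adapter['Name']}' is protected")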

Example 1

  • 4 switches
  • 4 Hyper-V nodes

[Diagram: 4 Hyper-V nodes, 4 network switches]

If you were to lose just Switch 1 then Hyper-V 1 and Hyper-V 3 would lose half of their teamed NIC for VM guest traffic (and Hyper-V 2 would lose a NIC from one of its teams), but Network Protection would not kick in:

[Diagram: 4 Hyper-V nodes, 4 network switches, 1 switch down]

You would have to lose at least two switches before Network Protection kicks in. If you were to lose Switch 1 and Switch 2, then Hyper-V 1 would realise this and live migrate its VMs to another node (via the blue connections). Hyper-V 2 would lose half of its teamed NIC for VM guest traffic and half of its teamed NIC for Management functions, as would Hyper-V 3. Hyper-V 4 presents a massive issue:

[Diagram: 4 Hyper-V nodes, 4 network switches, 2 switches down]

If you’ve followed Microsoft best practice and locked your cluster communications (i.e. the heartbeat) to one network, or included your Live Migration network with it, then the rest of the cluster cannot see the host and will attempt to start up the VMs that were on that node, as per standard Hyper-V recovery rules. Here lies the issue: the VMs are still running on Hyper-V 4, so their VHD/VHDX files remain locked, the other Hyper-V hosts cannot obtain a lock on the files, and the VMs cannot be started elsewhere.
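To make the walkthrough above a bit more concrete, here is a rough sketch of Example 1 in code. The exact cabling in the two dictionaries is my own assumption reconstructed from the diagrams, not something stated in the post, so edit it to match your own patch panel (the same trick works for Examples 2 and 3):

    # Rough sketch of the Example 1 reasoning. The team-to-switch cabling below is an
    # assumption consistent with the diagrams; adjust it to model your own cluster.

    # Which switches each host's teams are patched into.
    GUEST_TEAM = {
        "Hyper-V 1": {1, 2}, "Hyper-V 2": {2, 3}, "Hyper-V 3": {1, 4}, "Hyper-V 4": {3, 4},
    }
    MGMT_TEAM = {
        "Hyper-V 1": {3, 4}, "Hyper-V 2": {1, 4}, "Hyper-V 3": {2, 3}, "Hyper-V 4": {1, 2},
    }

    def assess(failed_switches: set[int]) -> dict[str, str]:
        """Classify each host once the given switches have failed."""
        status = {}
        for host in GUEST_TEAM:
            guest_left = GUEST_TEAM[host] - failed_switches
            mgmt_left = MGMT_TEAM[host] - failed_switches
            if not mgmt_left:
                # Heartbeat gone: the cluster tries to restart the VMs elsewhere, but the
                # VHD/VHDX files are still locked by the VMs running on this host.
                status[host] = "isolated - cluster cannot see it, VHD/VHDX files still locked"
            elif not guest_left:
                # Network Protection: all guest traffic NICs are down, so live migrate.
                status[host] = "Network Protection kicks in - live migrates its VMs away"
            else:
                status[host] = "degraded but running"
        return status

    if __name__ == "__main__":
        for host, verdict in assess({1, 2}).items():   # Switch 1 and Switch 2 down
            print(f"{host}: {verdict}")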

Example 2

  • 2 switches
  • 4 Hyper-V nodes

[Diagram: 4 Hyper-V nodes, 2 network switches]

In the diagram above if you were to lose Switch 1 (or Switch 2) then Network Protection would not kick in; every Hyper-V node would lose half of both teams:

[Diagram: 4 Hyper-V nodes, 2 network switches, 1 switch down]

Obviously if you were to lose both switches then you’re up the creek.

Example 3

  • 3 switches
  • 4 Hyper-V nodes

[Diagram: 4 Hyper-V nodes, 3 network switches]

In the diagram above, if you were to lose Switch 1 then Network Protection would not kick in. Hyper-V 1 would lose half of its teamed NIC for VM guest traffic but keep all of its teamed NIC for Management functions, as would Hyper-V 3. Hyper-V 2 would lose half of its teamed NIC for Management functions but keep all of its teamed NIC for VM guest traffic, as would Hyper-V 4.

[Diagram: 4 Hyper-V nodes, 3 network switches, 1 switch down]

If you were then to lose Switch 2 as well, things get interesting (again)…

[Diagram: 4 Hyper-V nodes, 3 network switches, 2 switches down]

Hyper-V 1 would realise that its guest VMs had lost network connectivity and live migrate them to another host (Hyper-V 2 or Hyper-V 3, as they each still have one NIC available in their management team and one in their VM guest traffic team). Hyper-V 2 and Hyper-V 3 would be left with 50% of both teamed NICs. Hyper-V 4 presents a massive issue.

Again, if you’ve followed Microsoft best practice and locked your cluster communications to one network (or included your Live Migration network with it), the rest of the cluster cannot see Hyper-V 4 and will try to start its VMs on the other nodes, exactly as in Example 1. And again it will fail: the VMs are still running on Hyper-V 4, their VHD/VHDX files are still locked, and the other hosts cannot start them.

So what could you do to avoid this?

Easiest option – connect all Hyper-V hosts to all switches. Network Protection would then not be applicable to you, and you could turn the feature off on your VMs.
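If you do go that route, the setting lives on each VM network adapter (the ‘Protected network’ checkbox under Advanced Features). Below is a minimal sketch of turning it off for one VM; the cmdlet and parameter names (Set-VMNetworkAdapter with -NotMonitoredInCluster) are my assumption of the relevant switch, so verify them against your own hosts before scripting this across a cluster:

    # Minimal sketch: uncheck 'Protected network' on every adapter of one VM by calling
    # the Hyper-V PowerShell module. -NotMonitoredInCluster is assumed to map to that
    # checkbox; confirm on a test VM first.
    import subprocess

    def disable_network_protection(vm_name: str) -> None:
        command = f"Set-VMNetworkAdapter -VMName '{vm_name}' -NotMonitoredInCluster $true"
        subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)

    if __name__ == "__main__":
        disable_network_protection("TestVM01")   # hypothetical VM name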

So where would this be useful?

In big clusters! For example, a 1U switch will typically take 48 ports, be that 10Gb, 1Gb, whatever. Once the switch is full you need more switches to connect your hosts, and at that point you will not be able to connect every node to every switch – especially if you’ve got NIC teaming going on (and why wouldn’t you?).
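As a rough back-of-the-envelope check of that claim, the sketch below works out how many 48-port switches a cluster needs just to terminate its NICs, and whether an every-host-to-every-switch layout is still achievable. The figures (48 ports per switch, 4 NICs per host) are illustrative assumptions only:

    # Illustrative only: port budget for 'every host to every switch' cabling.
    SWITCH_PORTS = 48
    NICS_PER_HOST = 4   # e.g. 2 NICs in the guest traffic team + 2 in the management team

    def switches_needed(hosts: int) -> int:
        """Minimum number of switches just to terminate every host NIC."""
        total_ports = hosts * NICS_PER_HOST
        return -(-total_ports // SWITCH_PORTS)   # ceiling division

    def every_host_to_every_switch(hosts: int) -> bool:
        """Possible only if each host has a NIC spare for every switch and each
        switch has a port spare for every host."""
        return NICS_PER_HOST >= switches_needed(hosts) and hosts <= SWITCH_PORTS

    for hosts in (8, 16, 32, 64):
        print(f"{hosts} hosts -> at least {switches_needed(hosts)} switches, "
              f"full mesh possible: {every_host_to_every_switch(hosts)}")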

One other tip – even if you stack/cluster your switches, don’t put all your eggs in one stack/cluster if you can avoid it; what would happen if the whole stack/cluster failed?
