Month: March 2015

vSphere DRS and HA

Turning on DRS (Distributed Resource Scheduler) in a new cluster will add an additional option to choose from.

Automation Level has three options:

  • Manual

The DRS cluster will suggest VM migrations but will not execute them.

  • Partially Automated

DRS will place a newly created VM based on workload, and will also do this at VM power-on, but not afterwards. So the only automation here happens when a VM is first created or powered on; after that it behaves just like the Manual setting.

  • Fully Automated

Fully Automated lets DRS migrate VMs at any time, and an additional option comes with this setting:

  • Migration Threshold

A conservative threshold tolerates a partially imbalanced cluster, while an aggressive threshold tries to keep the cluster as close to balanced as possible. Moving VMs around requires resources too, so a slightly conservative threshold might be recommended, although best practice is to leave it at the default.
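
All of the DRS settings above can also be scripted. Below is a minimal sketch using pyVmomi (VMware's Python SDK); the vCenter address, credentials and cluster name are placeholders you would replace with your own:

    import ssl
    from pyVim.connect import SmartConnect
    from pyVmomi import vim

    # Placeholder connection details -- replace with your own vCenter and cluster.
    si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                      pwd='secret', sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == 'Cluster01')

    # Enable DRS in Fully Automated mode with the default (middle) migration threshold.
    spec = vim.cluster.ConfigSpecEx()
    spec.drsConfig = vim.cluster.DrsConfigInfo(
        enabled=True,
        defaultVmBehavior='fullyAutomated',   # or 'manual' / 'partiallyAutomated'
        vmotionRate=3)                        # migration threshold, valid values 1-5
    cluster.ReconfigureComputeResource_Task(spec, modify=True)

The later sketches in this post reuse the same si and cluster objects instead of repeating the connection code.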

Distributed Power Management (DPM) is not initially visible when first creating a cluster.

To enable this setting, right-click your cluster -> Settings -> vSphere DRS and press Edit.

DPM works in conjunction with DRS to consolidate the load onto as few hosts as possible and power the rest down. DPM uses Wake-on-LAN (WOL), IPMI or iLO to wake a host up when its resources are needed. Be careful with this, as in some cases hosts might shut down and start up all the time, or a host might simply not wake up when it is needed.
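
If you script the cluster configuration, DPM can be enabled with the same reconfigure call; this sketch assumes the cluster object from the DRS example above:

    from pyVmomi import vim
    # Assumes `cluster` is the vim.ClusterComputeResource from the DRS sketch above.

    spec = vim.cluster.ConfigSpecEx()
    spec.dpmConfig = vim.cluster.DpmConfigInfo(
        enabled=True,
        defaultDpmBehavior='automated')   # 'manual' only generates power recommendations
    cluster.ReconfigureComputeResource_Task(spec, modify=True)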

vSphere HA comes with the option Enable admission control. Turn it on if you want vSphere to prevent you from powering on more virtual machines than your cluster could restart after a failure. So if you have two hosts and each of them can run 10 VMs, you would not be able to power on more than 10 VMs, because the cluster reserves the capacity of the other host for failover.

If you do not enable admission control and oversubscribe your resources, vSphere will still try to power up as many VMs as possible after a host failure, but there is no guarantee. You can use VM restart priority to boot critical VMs first.

When deciding on admission control, you have to weigh high availability against running as many VMs as possible.

You can base the admission control policy on how many host failures your cluster should tolerate, or on the percentage of cluster resources you want to reserve for failover.

If you have a two-node setup and set Host failures cluster tolerates to 1, that is effectively the same as reserving 50% of CPU and memory capacity in that particular setup.
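
For reference, the same admission control choice can be made through the API; this is a sketch rather than a complete script, and it assumes the cluster object from the earlier DRS example:

    from pyVmomi import vim
    # Assumes `cluster` is the vim.ClusterComputeResource from the DRS sketch above.

    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo(
        enabled=True,                        # vSphere HA on
        admissionControlEnabled=True,
        admissionControlPolicy=vim.cluster.FailoverLevelAdmissionControlPolicy(
            failoverLevel=1))                # tolerate one host failure
    # Alternative: reserve a fixed percentage of resources instead.
    # spec.dasConfig.admissionControlPolicy = vim.cluster.FailoverResourcesAdmissionControlPolicy(
    #     cpuFailoverResourcesPercent=50, memoryFailoverResourcesPercent=50)
    cluster.ReconfigureComputeResource_Task(spec, modify=True)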

The VM Monitoring Status comes with three options:

  • Disabled
  • VM monitoring only

VM monitoring only gives vSphere the ability to monitor a VM's VMware Tools heartbeats and restart (reset) the VM if the heartbeats stop, for instance when the guest OS crashes.

  • VM and application monitoring

If you also want to monitor applications, this has to be set up in conjunction with the application vendor, so it is vendor specific and will not work right out of the box.
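
In the API the three monitoring levels map to a single string on the HA configuration; a small sketch, again assuming the cluster object from the first example:

    from pyVmomi import vim
    # Assumes `cluster` is the vim.ClusterComputeResource from the DRS sketch above.

    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo(
        enabled=True,
        vmMonitoring='vmMonitoringOnly')   # or 'vmMonitoringDisabled' / 'vmAndAppMonitoring'
    cluster.ReconfigureComputeResource_Task(spec, modify=True)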

Now the last setting is EVC.

EVC (Enhanced vMotion Compatibility) mode makes hosts of the same CPU vendor (Intel or AMD) compatible across models and generations so you can vMotion between them. Sometimes you want to add additional hosts to an existing cluster, but that model has gone out of production and you can only get a newer model with a CPU full of new features. You would then turn on EVC and downgrade the cluster baseline to the “worst” CPU to make the hosts compatible, masking out all those new CPU features in the newer model.

I usually think of Microsoft’s domain and forest functional levels, where you lower the level to match the oldest domain controller you have. So if you have a 2003 and a 2008 domain controller, you would set the functional level to 2003 in that forest.
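
On newer vCenter versions the EVC baseline can also be set through the cluster's EVC manager in the API; this sketch assumes the si and cluster objects from the first example and uses a purely illustrative baseline key:

    # Assumes `si` and `cluster` come from the DRS sketch above.

    # List the EVC baselines this vCenter knows about.
    for mode in si.capability.supportedEVCMode:
        print(mode.key)

    # Apply one of them to the cluster ('intel-sandybridge' is just an example key).
    evc_manager = cluster.EvcManager()
    evc_manager.ConfigureEvcMode_Task('intel-sandybridge')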

If you edit vSphere HA you will see some additional settings that are not visible when first creating a cluster.

I touched on VM restart priority earlier, and it can be overridden per VM. The next setting is the Host isolation response: what would you like to happen in the event that a host becomes isolated and can no longer migrate its powered-on VMs? The options are listed below, with a small configuration sketch after the list.

  • Leave powered on
  • Power off, then failover
  • Shut down, then failover
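
Both the cluster-wide default restart priority and the isolation response live in the HA configuration's default VM settings; here is a sketch, again assuming the cluster object from the first example:

    from pyVmomi import vim
    # Assumes `cluster` is the vim.ClusterComputeResource from the DRS sketch above.

    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo(
        enabled=True,
        defaultVmSettings=vim.cluster.DasVmSettings(
            restartPriority='high',          # cluster-wide default restart priority
            isolationResponse='powerOff'))   # 'none' = leave powered on, 'shutdown' = guest shutdown
    cluster.ReconfigureComputeResource_Task(spec, modify=True)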

So what exactly is host isolation?

Host isolation is when the management network of a host becomes unavailable. This can happen for many reasons, but it means you cannot migrate VMs off that host while they are powered on. The VMs will keep running, though, because they still have access to the storage network.

To address this issue, VMware introduced Datastore Heartbeating.

Datastore heartbeating lets HA tell the difference between a host that has actually failed and one that is merely isolated from the management network – a second layer of defense.
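
The heartbeat datastore selection policy is also exposed in the HA configuration; a final sketch assuming the cluster object from the first example, with property names following the ClusterDasConfigInfo object in the vSphere API:

    from pyVmomi import vim
    # Assumes `cluster` is the vim.ClusterComputeResource from the DRS sketch above.

    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo(
        enabled=True,
        # Let vCenter pick any feasible heartbeat datastores, preferring user-selected ones.
        hBDatastoreCandidatePolicy='allFeasibleDsWithUserPreference')
    cluster.ReconfigureComputeResource_Task(spec, modify=True)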