vSphere 5.5 EOGS - vSphere 6.5 Features Part 2 - Availability
vSphere 6.5 introduces some big enhancements to availability (I have listed the main points but for all details check here). The first is Proactive HA. The name suggests that it is an integral part of HA, but it actually invokes DRS to improve availability.
Proactive HA (my abbreviation: PHA) brings intelligence to vSphere through communication between the hardware and vSphere. Instead of suffering a host failure and paying the HA restart penalty (the host fails, HA engages and boots the VM on another available host), PHA communicates with the hardware so that it is aware of impending issues before they occur. If a hardware component shows an anomaly, PHA can invoke DRS to move the workload(s) off the host before it fails, increasing uptime and availability for the application(s).
Currently Dell OpenManage, HPE OneView and Cisco UCS monitoring software have Web Client plugins/providers that work with PHA. They pass health status and alerts to DRS.
Monitored areas include memory, local storage, PSUs, cooling fans and network interfaces.
There is a new host state called "Quarantine mode". When the provider signals to DRS that a failure is likely, the host is put into this mode and its VMs are migrated to healthy hosts (as long as affinity and anti-affinity rules are upheld). When new VMs are created, quarantined hosts are avoided if possible.
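To make the Quarantine mode behaviour a bit more concrete, here is a toy Python model of it. This is only a sketch of the idea as described above, not how DRS is implemented: the `Host` class, `pick_host`, `evacuate` and the "fewest VMs wins" load measure are all my own illustrative assumptions, and the `allowed` hooks simply stand in for affinity and anti-affinity rules.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical stand-in for an ESXi host; not a vSphere API object.
    name: str
    quarantined: bool = False
    vms: list = field(default_factory=list)


def pick_host(hosts, allowed=lambda host: True):
    """Pick a host for a new VM, avoiding quarantined hosts if possible.

    `allowed` stands in for affinity/anti-affinity rules (assumption).
    """
    candidates = [h for h in hosts if allowed(h)]
    if not candidates:
        raise RuntimeError("no host satisfies the placement rules")
    # Healthy hosts sort first (False < True), then the least loaded host.
    return min(candidates, key=lambda h: (h.quarantined, len(h.vms)))


def evacuate(host, hosts, allowed_for=lambda vm, target: True):
    """Quarantine `host` and move its VMs to healthy hosts where rules allow.

    Returns the VMs that had to stay because no healthy target was allowed.
    """
    host.quarantined = True
    remaining = []
    for vm in host.vms:
        targets = [
            h for h in hosts
            if h is not host and not h.quarantined and allowed_for(vm, h)
        ]
        if targets:
            min(targets, key=lambda h: len(h.vms)).vms.append(vm)
        else:
            remaining.append(vm)
    host.vms = remaining
    return remaining
```

Note that a quarantined host is only deprioritised, not excluded: if it is the only host that satisfies the rules, `pick_host` will still return it, which mirrors the "avoided if possible" wording.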
HA Orchestrated Restarts/Dependencies
This is a very useful feature and something that many customers have agreed is beneficial for their environment. Dependencies allow you to create VM-to-VM restart rules. For example, you have a web server VM that relies on a data access middleware VM, which in turn relies on a SQL DB VM. Without such rules, when HA recovers one or more VMs in the chain they may come up in the wrong order, and the application may not function correctly.
With dependencies, a mapping can be created so that in the event of a failure the restart rules bring services online in the correct order.
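The ordering problem is essentially a dependency graph, so the web/middleware/SQL example can be sketched in a few lines of Python. This is purely an illustration of the concept using the standard library's `graphlib` (Python 3.9+); the VM names and the `restart_order` helper are made up for the example and have nothing to do with the actual vSphere rule objects.

```python
from graphlib import TopologicalSorter

# Each VM maps to the set of VMs that must be up before it (illustrative names).
depends_on = {
    "web": {"middleware"},
    "middleware": {"sql-db"},
    "sql-db": set(),
}


def restart_order(depends_on, failed):
    """Return the failed VMs in an order that respects the dependency map.

    TopologicalSorter raises graphlib.CycleError if the rules are circular.
    """
    sorter = TopologicalSorter(depends_on)
    # static_order() yields every VM after all of the VMs it depends on.
    return [vm for vm in sorter.static_order() if vm in failed]


print(restart_order(depends_on, {"web", "middleware", "sql-db"}))
# prints ['sql-db', 'middleware', 'web']
```

If only part of the chain fails, the same map still gives a sensible order for whatever needs restarting.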
Predictive DRS
DRS works great and is one of the most used and relied-upon services among our customers. DRS works by looking at the last 5 minutes and makes placement and balancing decisions based on that. vROps is analytical and builds trends of what's happening in the environment over a much longer period. It runs an analysis of the environment nightly and then builds a forecast. This data will now be sent to DRS so it can make more "ahead of time" decisions, e.g. every Monday morning at 9AM there is a spike due to applications being consumed. With DRS on its own there would be a little catch-up (because of the 5-minute look-back window). Now, with vROps passing this information (as it has learned the weekly trend), DRS has plenty of "ahead of time" knowledge to balance workloads in preparation for the busy 9AM period.
I have not listed all the options here, such as VM distribution, memory metric for load balancing and network-aware DRS, but one I do want to mention is, in my opinion, a very cool option. We can now set CPU overcommitment, which enforces a maximum vCPU:pCPU ratio across the cluster. Once the cluster hits this value, no further VMs will be allowed to power on. A real use case is obviously VDI, where we want maximum performance and a peanut butter spread across the cluster. It gives a nice policy setting that defines absolute performance.
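The arithmetic behind the overcommitment cap is simple enough to show as a quick Python check. Treat this as a sketch under my own assumptions: `can_power_on` is a hypothetical helper, I count pCPUs as physical cores, and I assume a power-on that lands exactly on the limit is still allowed. How vSphere itself counts and rounds may differ.

```python
def can_power_on(vm_vcpus, powered_on_vcpus, physical_cores, max_ratio):
    """Return True if powering on the VM keeps vCPU:pCPU at or below max_ratio.

    All parameters are illustrative; this is not a vSphere API call.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return (powered_on_vcpus + vm_vcpus) / physical_cores <= max_ratio


# Cluster with 16 cores and a 4:1 cap, so at most 64 vCPUs powered on.
print(can_power_on(4, 60, 16, 4.0))  # prints True  (64 / 16 = 4.0)
print(can_power_on(4, 62, 16, 4.0))  # prints False (66 / 16 = 4.125)
```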