Autostandby has been available in PowerPath for over a year and a half. There must be something in the zeitgeist, because all of a sudden I have seen a couple of threads about it from customers and the field. These threads have covered the entire range: from customers who are positively gushing about the capability, to questions about how it works, to operational questions about which tweaks are and are not possible.
The background behind autostandby
We started down the autostandby road with some crucial observations:
- Most host I/O operations in a sequence are correlated with each other. In other words, purely random I/O workloads, while they do exist, are rare in customer operations.
(And yes, I realize that any generalization is dangerous territory. So, remember, we are following the 80:20 rule here).
- VPLEX has a read cache. To take advantage of this, you want to maximize the likelihood that read-type I/Os encounter cache hits, thereby reducing the latency for these I/Os.
Translation: If you combine the two observations above, then, for better performance, you want I/Os from a given host to a given volume to be directed to a given set of directors as much as possible.
Finally, let’s bring distance into this. The focus here is on the cross connected configuration (see Additional vMSC Qualifications for VPLEX Metro). In a cross connected configuration, there is a latency advantage to directing I/Os to the local cluster; otherwise, I/Os incur the cross site round trip latency penalty. By the way, this is one of the reasons we have restricted the supported latency envelope for cross connected configurations to 1 msec RTT.
Working with the PowerPath team, we set out to address the design goals outlined above. PowerPath already has a mechanism for paths that should not be used for normal multipathing: such paths can be manually set to standby. A standby path (if alive) is used only once all the primary (non-standby) paths have failed.
For VPLEX Metro cross connected environments, which paths go on standby depends on where the host is located relative to the VPLEX clusters. The host paths connected to the local VPLEX cluster will be the *active* paths, whereas those connected to the remote VPLEX cluster will be the *standby* paths. Because this differs from host to host, the path setting needs to be automatic and to work at scale across all hosts.
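To make the standby semantics concrete, here is a minimal Python sketch of the rule described above: standby paths carry I/O only when no primary path is left alive. This is an illustration, not PowerPath code; the function name `usable_paths` and the dictionary keys are hypothetical.

```python
def usable_paths(paths):
    """Return the paths eligible for I/O under standby semantics.

    Each path is a dict with hypothetical keys: "name", "mode"
    ("active" or "standby"), and "alive" (bool).
    """
    alive = [p for p in paths if p["alive"]]
    primary = [p for p in alive if p["mode"] == "active"]
    # Standby paths are used only when no primary path is left alive.
    return primary if primary else alive


paths = [
    {"name": "local-1", "mode": "active", "alive": True},
    {"name": "local-2", "mode": "active", "alive": False},
    {"name": "remote-1", "mode": "standby", "alive": True},
]
print([p["name"] for p in usable_paths(paths)])
```

With one local path still alive, only that path is returned. If both local paths were dead, the remote standby path would be returned instead.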
How does the solution work?
A lot of the recent questions have been focused on how the algorithm for path selection works. So at a high level, here goes:
- Each host sets its preferred cluster independently, so each host affinitizes to the appropriate VPLEX cluster.
- If the difference between the minimum path latencies to the two clusters is zero, the preferred designation is applied to one cluster or the other.
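The two points above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual PowerPath algorithm: the `Path` type, the function `assign_path_modes`, the microsecond unit, and the alphabetical tie-break are all hypothetical. The source only says that, on a tie, one cluster or the other is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    cluster: str       # VPLEX cluster this path terminates on
    latency_us: float  # measured path latency (unit is an assumption)


def assign_path_modes(paths):
    """Map each path name to "active" or "autostandby" for one host."""
    if not paths:
        return {}
    # Minimum observed latency per cluster, as seen from this host.
    min_latency = {}
    for p in paths:
        best = min_latency.get(p.cluster, float("inf"))
        min_latency[p.cluster] = min(best, p.latency_us)
    # Prefer the cluster with the lowest minimum latency. On a tie,
    # pick the alphabetically first cluster (assumed tie-break).
    preferred = min(sorted(min_latency), key=lambda c: min_latency[c])
    return {
        p.name: "active" if p.cluster == preferred else "autostandby"
        for p in paths
    }


host_paths = [
    Path("p1", "cluster-1", 200.0),
    Path("p2", "cluster-1", 220.0),
    Path("p3", "cluster-2", 900.0),
    Path("p4", "cluster-2", 950.0),
]
print(assign_path_modes(host_paths))
```

Because every host runs this decision on its own measurements, a host at the other site would see the latencies reversed and prefer cluster-2, without any central coordination.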
PowerPath versions where autostandby for VPLEX is supported
Here are the minimum versions where autostandby for VPLEX is supported:
Frequently Asked Questions
For a given distributed volume, if the cluster chosen as the preferred cluster has multiple paths, do all of those paths get utilized?
What is the frequency of the path latency test? What is the trigger for the path latency test?
What is the minimum latency difference between two paths before one of them is set to autostandby? What is the default, and is it settable?
I have a VPLEX Metro cluster deployment in which the cross connect latency is extremely small. I do not need the autostandby algorithm. Can I turn it off?
On failure of all active paths, the standby paths are made active. When the original paths return, does the user have to take any steps to return the configuration to its original state, or does the pathing revert on its own?
PowerPath also has an autostandby mode that was introduced to handle flaky paths (IOs-Per-Failure autostandby). This blog is focused on the VPLEX portion of autostandby (referred to as proximity-based autostandby).