Amidst all the fun of EMC World, there is some really important news for the VPLEX Metro and VMware community that I wanted to ensure was not lost.
What was supported until now
Prior to this change, the official vSphere Metro Storage Cluster (vMSC) support stance was that VMware HA and vMotion were supported up to 5 msec RTT (Round Trip Time). There was an additional wrinkle: vMotion was supported up to 10 msec RTT with vSphere Enterprise Plus licensing, and VPLEX Metro supported it. However, VMware HA was not supported up to 10 msec RTT.
What has changed
The big change is that, as a part of vMSC, support for both VMware HA and VMware vMotion with VPLEX Metro has been extended to 10 msec RTT. This qualification has been completed with both PowerPath/VE and NMP (Native Multi-Pathing) for the non-uniform access mode (i.e. the non-cross-connected configuration). This support is available starting with vSphere 5.5 and GeoSynchrony 5.2.
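If it helps to see the before and after side by side, here is a minimal sketch that encodes the limits exactly as stated in this post for the non-uniform (non-cross-connected) mode. The function name and the tuple-based version arguments are my own illustration, not any VMware or EMC tool, and the Knowledge Base article remains the authority on what is supported.

```python
def max_supported_rtt_ms(feature, vsphere, geosynchrony, enterprise_plus=False):
    """Return the RTT limit (msec) for 'ha' or 'vmotion' as described in this post.

    Hypothetical helper: versions are passed as tuples such as (5, 5).
    Covers only the non-uniform (non-cross-connected) access mode.
    """
    if feature not in ("ha", "vmotion"):
        raise ValueError("feature must be 'ha' or 'vmotion'")

    # New stance: both HA and vMotion are supported up to 10 msec RTT
    # starting with vSphere 5.5 and GeoSynchrony 5.2.
    if vsphere >= (5, 5) and geosynchrony >= (5, 2):
        return 10.0

    # Old stance: vMotion up to 10 msec RTT only with Enterprise Plus.
    if feature == "vmotion" and enterprise_plus:
        return 10.0

    # Old stance: everything else up to 5 msec RTT.
    return 5.0


if __name__ == "__main__":
    print(max_supported_rtt_ms("ha", (5, 5), (5, 2)))
    print(max_supported_rtt_ms("ha", (5, 1), (5, 1)))
```

The point of the sketch is simply that HA no longer lags behind vMotion once both version requirements are met.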
The VMware Knowledge Base article is updated here.
Many thanks to the VMware Ecosystem Engineering team as well as key technical leaders on both the VMware and EMC sides for helping drive this. This has been a long time coming.
My previous blog post spurred a lot of conversations internally and externally (all good!) about uniform and non-uniform configurations and how they apply to VPLEX. I am sure these are questions that a lot of folks have. So, here is my effort at explaining these options using the I/O flows for VPLEX. Are you ready to dive deeper into understanding uniform mode from a VPLEX frame of mind?
What got the discussion started?
It all started with the assumption of what you think ‘Uniform Access Mode’ means. If you start with the assumption that in uniform mode, hosts from either side of the stretched cluster access the same storage controller, then that begins to explain what the issues are.
If you start with that assumption, then here is what a uniform configuration looks like to you:
In this scenario, the dark blue lines are being used for access to the same volume. Typically, in this state, one of the controllers is providing active access. The other controller provides access to the volume in case of failover. To put it simply, Figure 1 does not apply to VPLEX (and hence this post!).
VPLEX, because of its AccessAnywhere™ distributed cache coherence, allows simultaneous read and write access to storage on all controllers. So, while VPLEX can certainly be configured to run in the mode above (or even in the mode where all paths above are dark blue), the recommendation for how to configure VPLEX is in Figure 2. It is important to point out that while this discussion is about the Uniform Mode, VPLEX, quite uniquely, can operate in the Non-Uniform mode as described in my previous post. I expect that an overwhelming majority of our customers will prefer the Non-Uniform mode since it is easier to administer, has less complexity and provides superior performance and resiliency. The cross-connected mode is for customers who are able to have the host servers in a different fault domain than VPLEX (and the storage arrays behind VPLEX). This gives such customers the ability to continue running without application failure even on failure of paths to one side of the storage.
In this mode above, the paths from the ESX Servers to the VPLEX Cluster closest to it (in terms of latency) should be the primary access paths (from the host perspective) while the cross-connected paths are set to standby. In the previous blog entry, this was the configuration that was referred to as the cross connected topology. The key benefit of this configuration is that the cross-connected paths (which are the longer latency paths) are used only when the local (and therefore, lower latency) paths are not available.
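To make the path preference concrete, here is a toy model of the behavior described above: local paths are preferred and the cross-connected paths are only used when no local path is alive. The Path class and select_path function are hypothetical names for illustration only; this is not how NMP or PowerPath/VE are implemented, and it does not use any vSphere API.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    local: bool         # True if the path goes to the site-local VPLEX cluster
    alive: bool = True  # False once the path has failed


def select_path(paths):
    """Pick a live local path if one exists, else a live cross-connected path."""
    live = [p for p in paths if p.alive]
    if not live:
        raise RuntimeError("no live paths to the volume")
    # sorted() is stable, and the key is False for local paths,
    # so live local paths always sort ahead of cross-connected ones.
    return sorted(live, key=lambda p: not p.local)[0]


if __name__ == "__main__":
    paths = [Path("cross-1", local=False), Path("local-1", local=True)]
    print(select_path(paths).name)   # local path is preferred
    paths[1].alive = False           # simulate loss of the local path
    print(select_path(paths).name)   # falls back to the cross-connected path
```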
The rest of the blog will help you understand how I/O flow works in the VPLEX scenario as compared to that outlined in Figure 1 and how the VPLEX configuration outlined in Figure 2 above is the preferred configuration when running the cross connect topology (even though VPLEX can support all configurations).
Understanding READ I/Os
Let us now take a look at how READ I/Os will be processed in the configuration in Figure 1 and the VPLEX configuration in Figure 2.
Figure 3 represents how I/Os will flow in the Uniform Mode single controller access topology described in Figure 1. The READ I/Os issued by ESX Server A are routed to Storage Controller B and the response will traverse back to ESX Server A on that same interconnect. In other words, the READ I/O from ESX Server A will incur one complete round trip latency of the cross-connect. To put some numbers to this, if the round trip time (RTT) between the sites hosting (ESX Server A + Storage Controller A) and (ESX Server B + Storage Controller B) is 5 msec, then the READ I/O in the case above will encounter a wire latency of at least 5 msec in addition to the time taken to execute the I/O. Note that the data itself is also being sent across the cross-connected wire (which is a consideration if you are using leased lines and want to control your bandwidth).
Figure 4 represents how I/Os will be executed with VPLEX and the Uniform Access Mode configuration described in Figure 2. Here the READ I/Os from ESX Server A will get issued on the primary paths which are connected to the site local cluster. Since VPLEX makes the data available on both sides before acknowledging to the host, the READ I/O gets serviced from VPLEX Cluster A. In other words, the READ I/Os in this topology will not incur the impact of the cross site latency nor will the data hit the cross connect links upon access.
[NOTE 1: One obvious question is how this configuration compares to one where all paths are actively used with round-robin as your pathing policy instead of the recommended fixed pathing policy (i.e. imagine the red-dashed lines in Fig 4 are also solid lines). Since in this case I/Os get served across the paths on a round robin basis, the I/Os going on the cross connected paths will incur the latency that was described for Fig 3. Secondly, if your I/Os have any expectation of a READ cache hit, you will improve your chances for a cache hit significantly by using the configuration in Fig 4 versus one where all paths are simultaneously marked as primary].
[NOTE 2: I/Os from ESX Server B to Storage Controller B or VPLEX Cluster B are identical in both scenarios].
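For those who like to see the READ math written down, here is a back-of-the-envelope sketch of the wire latency discussed above. It assumes that a READ on a cross-connected path costs one full RTT, that a READ on a local path costs no cross-site latency, and that round-robin spreads I/Os evenly over all paths. The function and parameter names are mine, and execution time and cache effects are deliberately ignored.

```python
def expected_read_wire_latency_ms(rtt_ms, local_paths, cross_paths, policy):
    """Average cross-site wire latency per READ, in msec (simplified model).

    policy: 'fixed'       -> use local paths whenever at least one exists
            'round_robin' -> spread I/Os evenly over local and cross paths
    """
    total = local_paths + cross_paths
    if total == 0:
        raise ValueError("host has no paths")

    if policy == "fixed":
        # Local paths are primary; cross-connect is used only if none exist.
        return 0.0 if local_paths > 0 else rtt_ms

    if policy == "round_robin":
        # Only the share of I/Os landing on cross-connected paths pays the RTT.
        return rtt_ms * cross_paths / total

    raise ValueError("policy must be 'fixed' or 'round_robin'")


if __name__ == "__main__":
    rtt = 5.0
    # Figure 3 style: the host only reaches the remote controller.
    print(expected_read_wire_latency_ms(rtt, 0, 2, "fixed"))
    # Figure 4 style: local paths primary, cross-connect on standby.
    print(expected_read_wire_latency_ms(rtt, 2, 2, "fixed"))
    # NOTE 1 style: all paths active with round-robin.
    print(expected_read_wire_latency_ms(rtt, 2, 2, "round_robin"))
```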
Understanding WRITE I/Os
Let us now turn our attention to WRITE I/Os to see if there are any differences between the configurations.
Figure 5 represents a WRITE I/O with the Uniform Mode single controller access topology depicted in Figure 1 whereas Figure 6 represents a WRITE I/O with the Uniform Access Mode with VPLEX (Figure 2).
In the scenario shown in Figure 5, WRITE I/O issued from ESX Server A will travel to Storage Controller B. Each of the subsequent I/O phases (XFER_RDY, DATA OUT and STATUS) will be driven through storage controller B. Similar to what happened in the READ I/O processing, each of these I/O phases will incur the cross connect latency. One additional facet to be aware of: since storage controller A and B are protecting the data synchronously, the WRITE DATA will have to be sent from Storage Controller B to Storage Controller A. In other words, there will be one more RTT latency incurred in sending that data across. Additionally, for such I/Os, you will be sending the data across the wire TWICE. If you are doing the math, for a WRITE I/O, you will incur 3x the RTT latency. To put some numbers around it, if the two sites are separated by 5 msec, each WRITE I/O will incur 15 msec (!!) of wire latency in addition to the latency needed to execute the I/O and you are consuming twice the cross connected bandwidth.
So, how does the VPLEX configuration in Figure 2 perform better? Let’s take a look.
Again, since VPLEX allows access to storage on both sides, ESX Server A is able to issue the WRITE I/Os and subsequent data phases to its local VPLEX Cluster. Since VPLEX Metro is synchronous, VPLEX will copy the WRITE data from VPLEX Cluster A to Cluster B. This will encounter RTT latency. It is also noteworthy that the I/O is sent across the site only once. For a WRITE I/O with the VPLEX deployment, you encounter 1 RTT latency in addition to the time taken for the I/O to execute.
[NOTE 3: In case you are wondering, if for some reason the paths to the local VPLEX Cluster fail, the standby paths are activated by vSphere and you have no application impact. However, WRITE I/Os will encounter the cross connect latency. This is one of the reasons we limit latency support for the cross connect topology on VPLEX to 1 msec RTT. If you happen to use products other than VPLEX (HOW DARE YOU? 😉), please ensure that you understand the architecture and the corresponding worst case I/O implications.]
[NOTE 4: I/Os from ESX Server B to Storage Controller B or VPLEX Cluster B are identical in both scenarios].
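The WRITE arithmetic above can be captured in a few lines. This is a sketch of the numbers quoted in this post, nothing more: the split of the 3x figure into two host-side round trips plus one mirroring round trip is my reading of the phases described for Figure 5, and the topology labels are made up for the example.

```python
def write_wire_cost(rtt_ms, topology):
    """Return (wire_latency_ms, times_data_crosses_sites) for one WRITE.

    topology: 'single_controller' -> Figure 5 (host writes to the remote controller)
              'vplex_local'       -> Figure 6 (host writes to its local VPLEX cluster)
    """
    if topology == "single_controller":
        host_round_trips = 2    # assumed: command/XFER_RDY, then DATA OUT/STATUS
        mirror_round_trips = 1  # controller B synchronously protects data on A
        data_crossings = 2      # host -> controller B, then B -> A
    elif topology == "vplex_local":
        host_round_trips = 0    # all host-facing phases stay within the site
        mirror_round_trips = 1  # VPLEX Cluster A synchronously copies to Cluster B
        data_crossings = 1
    else:
        raise ValueError("unknown topology")

    wire_latency_ms = (host_round_trips + mirror_round_trips) * rtt_ms
    return wire_latency_ms, data_crossings


if __name__ == "__main__":
    for topology in ("single_controller", "vplex_local"):
        latency, crossings = write_wire_cost(5.0, topology)
        print(f"{topology}: {latency} msec wire latency, data crosses {crossings}x")
```

With a 5 msec RTT this reproduces the 15 msec versus 5 msec comparison from the text.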
If all the above has your head spinning, here is a table to summarize all that was written above:
As you can see, having a cross connect topology with the cross-site paths on standby has some pretty significant advantages to using the single controller access topology and should be your cross-connect deployment of choice for VPLEX in the Uniform Access Mode.
One part of the debate that I will not follow up on is the question of whether the configuration in Figure 2 is Uniform Mode or not. You could legitimately view it as the Non-Uniform mode described in my previous blog post, with Uniform Mode as the fallback when no local path is available. My take: so long as you, our customers, understand the differences and the advantage of one over the other and are able to deploy it to your advantage, it doesn’t matter what I or anyone else call it.
During the initial VPLEX Metro qualification for the VMware Metro Storage Cluster (vMSC), VPLEX was qualified with the non-uniform host access mode together with VMware Native Multi-Pathing (NMP). With a recent update to the testing, VPLEX is now supported with the uniform host access mode.
Uniform Access Mode – What is it?
To understand uniform access mode, let us start with what non-uniform access mode is and work our way back to what uniform access mode is.
In the more common VPLEX Metro deployment, a host that is connected to one of the clusters of the VPLEX Metro does not connect to the other cluster. This is referred to as the non-cross connected topology. Here is a representation of this topology:
In this configuration, the hosts can only do I/Os with the VPLEX cluster that the host is connected to. VPLEX through AccessAnywhere™ presents the same storage on either side (Distributed Virtual Volume). The advantage of such a configuration is that the two sides are isolated. Combined with the VPLEX Witness and VMware HA, this configuration provides for automatic restart of VMs when there is a failure on one of the sides.
vMSC refers to this configuration as ‘non-uniform’ host access configuration. This was the original configuration that was qualified with VPLEX Metro. (If you need more details on this, refer to the Chad Sakac’s Virtual Geek blog here, Scott Lowe’s blog here or Duncan Epping’s Yellow Bricks blog here)
There are certain deployment requirements in which customers would like to further enhance the availability offered by the non-uniform host access configuration. They would like to avoid server restart upon storage failure. In order to accomplish this, there are two implementation requirements:
Hosts need an alternate path to access the storage (Translation: Hosts connected to one side of a VPLEX Metro need to access the same storage on the second side)
Hosts need to be in a different failure domain than the storage (Translation: Hosts need to be able to survive even when the storage might not. Examples of how this is accomplished are fire-cells within data centers or different floors)
An important side note – one of the rising trends we are seeing is where VPLEX Metro is deployed ‘within’ a data center. In this mode, customers want to protect:
Across two different arrays within a data center
Across equipment on two different floors or firecells
Across two different Vblocks (or other forms of converged infrastructure)
Here is a configuration that delivers on the requirements stated above:
In this configuration mode, using VPLEX Metro, the same storage volume is accessible from both sides. From a host perspective, it sees multiple paths to the same volume. This configuration is referred to as the cross connected configuration. The red dashed paths are referred to as the cross connect paths. When the storage on one side fails (the entire layer from VPLEX down), the VPLEX Witness enables I/Os on the second VPLEX Cluster. The host continues to see the cross connected paths as available, so the loss of storage on one side of the VPLEX Metro is reduced, from the host perspective, to a loss of redundant paths. As a result, there is no downtime when there is a storage failure in this configuration.
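Here is a small sketch of that failure scenario from the host's point of view. It is deliberately simplified: it assumes the surviving cluster serves I/O only when the VPLEX Witness is deployed (the only mechanism this post describes), and the function name and the path-to-cluster dictionary are hypothetical.

```python
def usable_paths_after_failure(paths, failed_cluster, witness_deployed):
    """Return the path names a host can still use after one side's storage fails.

    paths: dict mapping path name -> VPLEX cluster it lands on ('A' or 'B').
    """
    if not witness_deployed:
        # Simplification for this sketch: without the Witness, nothing
        # enables I/O on the surviving cluster, so no path is usable.
        return []
    # Paths to the failed side are gone; the remaining ones stay usable.
    return [name for name, cluster in paths.items() if cluster != failed_cluster]


if __name__ == "__main__":
    cross_connected = {"local-1": "A", "local-2": "A", "cross-1": "B", "cross-2": "B"}
    non_cross_connected = {"local-1": "A", "local-2": "A"}

    # Storage on side A fails (the entire layer from VPLEX down).
    print(usable_paths_after_failure(cross_connected, "A", witness_deployed=True))
    print(usable_paths_after_failure(non_cross_connected, "A", witness_deployed=True))
```

For the cross connected host the failure shows up as a loss of redundant paths; for the non-cross-connected host no paths remain.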
vMSC refers to this configuration as the ‘uniform’ host access configuration. Post completion of a recent qualification, this configuration is now supported for VPLEX. The VPLEX vMSC
So what’s the catch?
As with most good things in life, there are tradeoffs (And you thought there was no philosophy in storage!!).
For the configuration above, the cross connected paths represent paths of longer latencies than the non-cross connected paths. If the host has to use such a path, the read latency will be longer than it would be in the non-uniform access mode configuration. Another consideration is that if all paths are simultaneously active, then along the cross connected paths, the I/Os may need to traverse the cross site latency twice. To mitigate any increase in application latency, VPLEX supports a cross site latency of up to 1 msec RTT in the uniform host access mode.
A second aspect of this is that since the latencies are short, customers have the option of using stretched fabrics. In such configurations, proper care needs to be taken so as to not extend the failures from one side of the fabric to another. Among other things, fault isolation becomes a major design consideration in this configuration mode.
Some other questions / considerations
Here are some questions that we have been seeing from customers and the field.
(1) When is support for PP/VE going to be added through vMSC?
Engineers at VMware and EMC are working towards completing the vMSC qualification with PP/VE. Stay tuned! [Please note that EMC supports the use of PP/VE on VMware (and it is also supported within the base storage qual for VPLEX).]
(2) What if I need an increased latency together with the cross connect topology? OR Can I use fixed path policy on my NMP and use the cross connect topology over greater latencies?
While this is technically feasible, this is not currently a supported configuration. Please work with your account team to file an RPQ for this configuration.
(3) Can I deploy the cross connected topology without deploying the VPLEX Witness?
The benefit of the cross connected topology is the ability of the host to continue running when you lose access to storage on one side. The VPLEX Witness enables I/Os on the surviving side. This is what allows the hosts to continue running on the available alternate paths through the surviving VPLEX cluster. In other words, deploying without the Witness will not yield the core benefit of this topology.
(4) What about additional collateral?
Thought you’d never ask ;-). If you want to dive into this in more depth, here are documents that dive deeper into these and other topics: