My previous blog post spurred a lot of conversations internally and externally (all good!) about uniform and non-uniform configurations and how they apply to VPLEX. I am sure these are questions that a lot of folks have. So, here is my effort at explaining these options using the I/O flows for VPLEX. Are you ready to dive deeper into understanding uniform mode from a VPLEX frame of mind?
What got the discussion started?
It all started with what you assume ‘Uniform Access Mode’ means. If you assume that in uniform mode, hosts on either side of the stretched cluster access the same storage controller, then that begins to explain what the issues are.
If you start with that assumption, then here is what a uniform configuration looks like to you:
In this scenario, the dark blue lines are being used for access to the same volume. Typically, in this state, one of the controllers is providing active access. The other controller provides access to the volume in case of failover. To put it simply, Figure 1 does not apply to VPLEX (and hence this post!).
VPLEX, because of its AccessAnywhere™ distributed cache coherence, allows simultaneous read and write access to storage on all controllers. So, while VPLEX can certainly be configured to run in the mode above (or even in a mode where all of the paths above are dark blue), the recommended VPLEX configuration is shown in Figure 2. It is important to point out that while this discussion is about the Uniform Mode, VPLEX, quite uniquely, can also operate in the Non-Uniform mode described in my previous post. I expect that an overwhelming majority of our customers will prefer the Non-Uniform mode since it is easier to administer, is less complex, and provides superior performance and resiliency. The cross-connected mode is for customers who are able to place the host servers in a different fault domain than VPLEX (and the storage arrays behind VPLEX). This lets such customers continue running without application failure even when the paths to one side of the storage fail.
In the mode above, the paths from each ESX Server to the VPLEX Cluster closest to it (in terms of latency) should be the primary access paths (from the host perspective), while the cross-connected paths are set to standby. In the previous blog entry, this was the configuration referred to as the cross-connected topology. The key benefit of this configuration is that the cross-connected paths (which are the longer-latency paths) are used only when the local (and therefore lower-latency) paths are not available.
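The path policy described above can be sketched in a few lines of Python. This is a toy model only, not vSphere or VPLEX code: the `Path` class, the `assign_roles` helper, and the latency values are hypothetical and exist purely to illustrate "lowest-latency path active, everything else standby".

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str      # hypothetical label for a host-to-cluster path
    rtt_ms: float  # assumed round-trip time on this path, in milliseconds


def assign_roles(paths):
    """Mark the lowest-latency path(s) as active and all others as standby."""
    best = min(p.rtt_ms for p in paths)
    return {p.name: ("active" if p.rtt_ms == best else "standby") for p in paths}


# Illustrative values only: a local path and a cross-connected path from ESX Server A.
paths = [
    Path("ESX-A -> VPLEX-A (local)", 0.1),
    Path("ESX-A -> VPLEX-B (cross-connect)", 1.0),
]

roles = assign_roles(paths)
for name, role in roles.items():
    print(f"{name}: {role}")
```

In a real deployment the roles are set through the host's multipathing configuration rather than computed like this; the sketch only captures the intent of the recommendation.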
The rest of this post will help you understand how I/O flow works in the VPLEX scenario as compared to that outlined in Figure 1, and why the VPLEX configuration outlined in Figure 2 above is the preferred configuration when running the cross-connect topology (even though VPLEX can support all configurations).
Understanding READ I/Os
Let us now take a look at how READ I/Os will be processed in the configuration in Figure 1 and the VPLEX configuration in Figure 2.
Figure 3 represents how I/Os will flow in the Uniform Mode single controller access topology described in Figure 1. The READ I/Os issued by ESX Server A are routed to Storage Controller B, and the response traverses back to ESX Server A on that same interconnect. In other words, a READ I/O from ESX Server A incurs one complete round trip of cross-connect latency. To put some numbers to this, if the round trip time (RTT) between the sites hosting (ESX Server A + Storage Controller A) and (ESX Server B + Storage Controller B) is 5 msec, then the READ I/O in the case above will encounter a wire latency of at least 5 msec in addition to the time taken to execute the I/O. Note that the data itself is also sent across the cross-connected wire (which is a consideration if you are using leased lines and want to control your bandwidth).
Figure 4 represents how I/Os will be executed with VPLEX in the Uniform Access Mode configuration described in Figure 2. Here the READ I/Os from ESX Server A are issued on the primary paths, which are connected to the site-local cluster. Since VPLEX makes written data available on both sides before acknowledging the write to the host, the READ I/O is serviced from VPLEX Cluster A. In other words, READ I/Os in this topology do not incur the cross-site latency, nor does the data traverse the cross-connect links on access.
[NOTE 1: One obvious question is how this configuration compares to one where all paths are actively used with round-robin as the pathing policy instead of the recommended fixed pathing policy (i.e. imagine the red-dashed lines in Fig 4 are also solid lines). Since I/Os in that case are distributed across the paths on a round-robin basis, the I/Os that go down the cross-connected paths will incur the latency described for Fig 3. Secondly, if your I/Os have any expectation of a READ cache hit, you will improve your chances of a cache hit significantly by using the configuration in Fig 4 versus one where all paths are simultaneously marked as primary].
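To get a feel for the round-robin penalty in NOTE 1, here is a minimal sketch of the average cross-site wire latency per READ. It rests on two simplifying assumptions that are mine rather than measured behavior: round-robin spreads I/Os evenly over all active paths, and a READ sent down a cross-connected path pays exactly one RTT. The function name and the path counts are hypothetical.

```python
def expected_read_wire_latency_ms(rtt_ms, local_paths, cross_paths, use_cross_paths):
    """Average cross-site wire latency added to one READ, in milliseconds.

    Assumes an even round-robin spread over active paths and one RTT of
    extra latency for every READ that lands on a cross-connected path.
    """
    if not use_cross_paths:
        return 0.0  # fixed policy: every READ stays on a local path
    cross_fraction = cross_paths / (local_paths + cross_paths)
    return cross_fraction * rtt_ms


RTT_MS = 1.0  # example value: the cross-connect RTT limit quoted in this post

fixed = expected_read_wire_latency_ms(RTT_MS, local_paths=2, cross_paths=2, use_cross_paths=False)
round_robin = expected_read_wire_latency_ms(RTT_MS, local_paths=2, cross_paths=2, use_cross_paths=True)

print(f"fixed policy (local paths only): {fixed} ms of cross-site wire latency per READ")
print(f"round-robin over all paths:      {round_robin} ms on average per READ")
```

The sketch leaves out the cache-hit effect mentioned in the note, which would widen the gap further.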
[NOTE 2: I/Os from ESX Server B to Storage Controller B or VPLEX Cluster B are identical in both scenarios].
Understanding WRITE I/Os
Let us now turn our attention to WRITE I/Os to see if there are any differences between the configurations.
Figure 5 represents a WRITE I/O with the Uniform Mode single controller access topology depicted in Figure 1 whereas Figure 6 represents a WRITE I/O with the Uniform Access Mode with VPLEX (Figure 2).
In the scenario shown in Figure 5, a WRITE I/O issued from ESX Server A will travel to Storage Controller B. Each of the subsequent I/O phases (XFER_RDY, DATA OUT and STATUS) will be driven through Storage Controller B. Similar to what happened in the READ I/O processing, each of these I/O phases will incur the cross-connect latency. One additional facet to be aware of: since Storage Controllers A and B are protecting the data synchronously, the WRITE DATA will have to be sent from Storage Controller B to Storage Controller A. In other words, there will be one more RTT of latency incurred in sending that data across. Additionally, for such I/Os, you will be sending the data across the wire TWICE. If you are doing the math, a WRITE I/O will incur 3x the RTT latency: two round trips for the host-side phases plus one for the controller-to-controller copy. To put some numbers around it, if the two sites are separated by 5 msec, each WRITE I/O will incur 15 msec (!!) of wire latency in addition to the latency needed to execute the I/O, and you are consuming twice the cross-connect bandwidth.
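The 3x figure can be checked with a short tally. The sketch below is illustrative: grouping the host-side phases into two round trips (command and XFER_RDY, then DATA OUT and STATUS) is my reading of the usual SCSI write exchange, and the variable names are hypothetical.

```python
RTT_MS = 5.0  # example inter-site round-trip time used in the text

# Each entry: (step, round trips across the inter-site link, times the data crosses the link)
write_steps = [
    ("WRITE command -> XFER_RDY (ESX Server A <-> Storage Controller B)", 1, 0),
    ("DATA OUT -> STATUS (ESX Server A <-> Storage Controller B)", 1, 1),
    ("synchronous copy (Storage Controller B -> Storage Controller A)", 1, 1),
]

total_round_trips = sum(round_trips for _, round_trips, _ in write_steps)
total_data_crossings = sum(crossings for _, _, crossings in write_steps)
wire_latency_ms = total_round_trips * RTT_MS

for step, round_trips, crossings in write_steps:
    print(f"{step}: {round_trips} RTT, data crossings: {crossings}")

print(f"total: {total_round_trips} x RTT = {wire_latency_ms} ms of wire latency, "
      f"data crosses the link {total_data_crossings} times")
```

Changing `RTT_MS` shows how quickly the penalty grows with distance, since every WRITE pays the multiplier.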
So, how does the VPLEX configuration in Figure 2 perform better? Let’s take a look.
Again, since VPLEX allows access to storage on both sides, ESX Server A is able to issue the WRITE I/Os and subsequent data phases to its local VPLEX Cluster. Since VPLEX Metro is synchronous, VPLEX will copy the WRITE data from VPLEX Cluster A to Cluster B. This copy incurs one RTT of latency. It is also noteworthy that the data is sent across sites only once. For a WRITE I/O with the VPLEX deployment, you encounter 1 RTT of latency in addition to the time taken for the I/O to execute.
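Putting the READ and WRITE figures for both topologies side by side, here is a minimal sketch that turns them into numbers for a given RTT. The `COSTS` dictionary and the `wire_cost` helper are hypothetical names; the round-trip and data-crossing counts are taken from the discussion above and cover wire latency only, not the time to execute the I/O.

```python
# (topology, operation) -> (inter-site round trips, times the data crosses sites)
COSTS = {
    ("single controller (Fig 1)", "READ"): (1, 1),
    ("single controller (Fig 1)", "WRITE"): (3, 2),
    ("VPLEX, local paths primary (Fig 2)", "READ"): (0, 0),
    ("VPLEX, local paths primary (Fig 2)", "WRITE"): (1, 1),
}


def wire_cost(topology, operation, rtt_ms):
    """Return (wire latency in ms, number of times the data crosses sites)."""
    round_trips, crossings = COSTS[(topology, operation)]
    return round_trips * rtt_ms, crossings


RTT_MS = 5.0  # example value used in the text

print(f"{'topology':<36} {'op':<6} {'wire latency':>13} {'data crossings':>15}")
for (topology, operation) in COSTS:
    latency_ms, crossings = wire_cost(topology, operation, RTT_MS)
    print(f"{topology:<36} {operation:<6} {latency_ms:>10.1f} ms {crossings:>15}")
```

With a 5 msec RTT, the WRITE case drops from 15 msec of wire latency to 5 msec, and the READ case from 5 msec to none.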
[NOTE 3: In case you are wondering, if for some reason the paths to the local VPLEX Cluster fail, the standby paths are activated by vSphere and you have no application impact. However, WRITE I/Os will encounter the cross connect latency. This is one of the reasons we limit latency support for the cross connect topology on VPLEX to 1 msec RTT. If you happen to use products other than VPLEX (HOW DARE YOU? 😉), please ensure that you understand the architecture and the corresponding worst case I/O implications.]
[NOTE 4: I/Os from ESX Server B to Storage Controller B or VPLEX Cluster B are identical in both scenarios].
If all the above has your head spinning, here is a table to summarize all that was written above:
As you can see, having a cross-connect topology with the cross-site paths on standby has some pretty significant advantages over the single controller access topology and should be your cross-connect deployment of choice for VPLEX in the Uniform Access Mode.
One part of the debate that I will not follow up on is the question of whether the configuration in Figure 2 is Uniform Mode or not. You could legitimately view it as the Non-Uniform mode described in my previous blog post, with Uniform Mode used only when there is no alternative available. My take: So long as you, our customers, understand the differences and the advantage of one over the other and are able to deploy it to your advantage, it doesn’t matter what I or anyone else calls it.