How To: Expand VPLEX Virtual Volumes

One of the important yet lesser known additions to the VPLEX 5.2 release was the ability to expand virtual volumes non-disruptively.

Actually, let me step back a bit. VPLEX has always had the capability of expanding local-only (i.e. non-distributed) virtual volumes. We accomplished this by concatenating another segment onto the existing virtual volume. In GeoSynchrony 5.2, we introduced a newer method as a complement to the existing one. In general, this newer method is our preferred method of virtual volume expansion.

Why the new method?

To understand why we embarked on developing a new virtual volume expansion method, you need to understand the limitations of the concatenation approach:

  1. Concatenated volume expansion works only for local volumes. For distributed volumes, there was no convenient way to expand the virtual volume: you would have to break the distributed volume, expand the underlying device, and rejoin the distributed virtual volume with a complete resync.
  2. If you are using array-based local replication functionality (aka snaps and clones; see the ViPR with VPLEX section for an example of how you can do this), then when you concatenate different volumes to create a larger volume, those local replicas are no longer useful.
  3. Concatenating volumes (especially volumes coming from different arrays or from different performance tiers) is generally not a good practice from a performance standpoint, for two reasons. First, the two portions of the same volume can have different performance characteristics. Second, I/Os that cross the volume boundary have to be broken into multiple smaller I/Os, which usually (the 80:20 rule in play here) leads to poorer relative performance.

Alright – tell me more about the new method

We call this the storage-volume based virtual volume expansion method. Given the constraints established above, preserving the volume map geometry becomes crucial to addressing all of those limitations. This method works for local as well as distributed virtual volumes.

Supported geometries for storage-volume based virtual volume expansion

Here are the supported geometries for this method.

Supported Geometries for virtual volume expansion

Supported virtual volumes can be:

  1. A local virtual volume that is fully mapped to a single storage volume (1:1 mapped, RAID-0)
  2. A local virtual volume that is mirrored across two storage volumes (RAID-1, R1)
  3. A distributed virtual volume that is mirrored completely across two storage volumes (Distributed RAID-1, DR1)

If the virtual volume you want to expand meets the criteria above, you are ready to expand!

Step 1: Expanding the storage-volume

The first step is array-specific. You now need to expand the storage volume on the array to the capacity that you need (e.g. if you have a 500GB storage volume and need to add 250GB, you would expand it to 750GB on the backend). Remember that if you have a mirrored VPLEX virtual volume, you will need to do this for every leg of that mirror.

Virtual Volume Expansion – Step 1

You now need to get VPLEX to detect the increased capacity of the storage volumes. If I/Os are running to the virtual volume (and therefore to the storage volume), then upon expansion the storage volume will generate a Unit Attention; VPLEX detects it and probes the storage volume for the additional capacity. If I/Os are not running to the storage volume, you can run the rediscover command on VPLEX to reprobe the array and pick up the added capacity.
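
If you want to trigger this rediscovery manually from the VPLEX CLI, the flow looks roughly like the sketch below. Treat it as a hedged example: the cluster, array, and storage-volume names are placeholders, and the exact command names and context paths can vary by GeoSynchrony release, so check the CLI guide for your version.

VPlexcli:/> cd /clusters/cluster-1/storage-elements/storage-arrays/EMC-CLARiiON-APM00123456789
VPlexcli:/clusters/cluster-1/storage-elements/storage-arrays/EMC-CLARiiON-APM00123456789> array re-discover
VPlexcli:/> ll /clusters/cluster-1/storage-elements/storage-volumes/my_storage_volume

After the rediscovery completes, the capacity reported for the storage volume should reflect the expanded size.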

Step 2: Expanding the virtual volume

The next step is to expand the virtual volume so that it uses the additional capacity.

Virtual Volume Expansion – Step 2

You need to run the virtual-volume expand command on VPLEX. Here is what the command syntax looks like:
virtual-volume expand
[-v | --virtual-volume] context path
[-e | --extent] extent
[-f | --force]

NOTE: I have listed the optional extent parameter above for completeness. It is used by the concatenation expansion method, not by the storage-volume expansion method.

To expand the volume, you issue the above command against the specific virtual volume that you need to expand. The command performs some checks (more on that later) and, lo and behold, you have expanded the virtual volume without ever stopping I/Os.
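
As a hedged illustration (the volume name and cluster are placeholders), expanding a 1:1 mapped local volume would look something like this:

VPlexcli:/> virtual-volume expand --virtual-volume /clusters/cluster-1/virtual-volumes/my_app_vol_vol

If the geometry and capacity checks pass, the expansion proceeds; once the newly added capacity has been initialized (see the notes below), the virtual volume reports the larger size and hosts pick it up on their next rescan.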

Things to remember

This section captures an assortment of details that are important to know, along with tips and tricks about the command that I find useful.

  1. While VPLEX supports non-disruptive expansion of virtual volumes, whether a host-mounted volume can be expanded depends on the OS, the file system, and in some cases the application. Windows, for example, allows non-disruptive volume expansion with a host rescan; older UNIX variants do not. Check your host OS, file system, or application documentation for clarification. From a SCSI standpoint, once the additional capacity is available, VPLEX will report a Unit Attention indicating that the LUN capacity has changed. Host rescans will also show the added capacity.
  2. We have added four new attributes to help you figure out whether a volume can be expanded and what its current status is. If you run an ll on a VPLEX virtual volume, you can now see the following (there is an example after this list):
    • expandable (boolean denoting whether a virtual volume can be expanded or not)
    • expandable-capacity (how much capacity is available to expand)
    • expansion-method (what method needs to be used for volume expansion)
    • expansion-status (if a volume is being expanded, what is the current status)
  3. What if my volume is not one of the supported geometries? If your volumes are not mapped 1:1, then you have two choices:
    • Perform an extent migration to move the extent to a storage volume that is 1:1 mapped
    • Migrate the virtual volume to a larger storage volume so that it becomes 1:1 mapped
    From there, you should be able to perform the virtual-volume expansion as described above.

  4. The newly added capacity of the virtual volume will be zero-initialized (i.e. VPLEX will write zeroes to the new capacity) before the additional capacity is exposed to the host. This matters especially on mirrored volumes (R1s or DR1s), since from a host perspective a read of the added capacity should return the *same* data from either leg. In other words, as with everything else, VPLEX ensures single-disk consistency even for distributed virtual volumes when capacity is added.
  5. Today, RecoverPoint-protected virtual volumes cannot be expanded while the protection is in effect. This is something we are looking at for future releases. For now, you can turn off the RP protection, expand the virtual volume, and then re-engage the RP protection for that virtual volume.
  6. If a virtual volume is undergoing a migration, if the system is undergoing a non-disruptive upgrade, or if the system or the virtual volume has a failing health check, then VPLEX will block expansion of the virtual volume.
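
As promised in item 2, here is a rough illustration of the expansion-related attributes in an ll of a virtual volume. The volume name and the values are made up, and the exact attribute set and formatting depend on your GeoSynchrony version:

VPlexcli:/> ll /clusters/cluster-1/virtual-volumes/my_app_vol_vol

Name                 Value
-------------------  -----------------
capacity             500G
expandable           true
expandable-capacity  250G
expansion-method     storage-volume
expansion-status     -
...

In this hypothetical state, the backend storage volume has already been grown by 250GB, so running virtual-volume expand against this volume would take it to 750GB.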

ViPR with VPLEX

ViPR was launched with tremendous fanfare at EMC World last year (how time flies!). The product went GA in Sept 2013.

The key premise behind ViPR is that data center management has become too complex. As obvious as this problem is, it is a herculean task to address. I doff my hat to the ViPR team. They have taken a very complex challenge and built a product that they can be justifiably proud of.

Over the last few months, a number of customers have deployed VPLEX together with ViPR and used ViPR to simplify their management infrastructure. Our team has put together some demos to help explain how ViPR and VPLEX integrate.

We will be adding voice-overs at a later point in time but it seemed useful to make these available to customers to help them understand the value of VPLEX with ViPR.

Configuring VPLEX within ViPR

This demo shows you how to configure VPLEX within the ViPR context. ViPR takes over after the basic configuration of VPLEX (i.e. once it has been set up from a network accessibility standpoint).

  1. A VPLEX cluster gets configured as a virtual array within ViPR. For a VPLEX Metro, this equates to creating two virtual arrays.
  2. From there, you need to expose the network elements from the SAN to the specific VPLEX cluster.
  3. You can now create virtual pools describing what type of storage to provision. Based on the SAN connectivity exposed, you get options for which storage can be presented to which VPLEX cluster. Based on the configuration of the pools, you can assign different properties to VPLEX pools.

Note that this is a one-time configuration for a given virtual pool. This sets you up for end-to-end provisioning!

Provisioning VPLEX within ViPR

This now operationalizes what was set up in the prior demo. The first step is selecting the virtual array and a virtual pool and then creating a distributed volume. The next step is taking this volume and exposing it to the host. No zoning, no moving between multiple GUIs, all done with ease.

Deprovisioning VPLEX within ViPR

This is the flip side to the prior demo. Here the volumes that are exposed to the host are deprovisioned. Again, same paradigm as before. The orchestration happens through the ViPR controller and it is all in one interface.

Migration of Pools through ViPR

This takes the migration use-case and converts it into the catalog view. The change-pool catalog request results in the migration of volumes from one array to another. The orchestration is at the pool level, so you can migrate from one array or one tier to another.

This is just the beginning – we are looking at more complex use-cases to deliver a seamless experience to our end customers. You will hear more about this in the near future. What do you think?

Can engines in a VPLEX Cluster be split?

UPDATE Feb 1st, 2014: I had captured some details incorrectly about the Director Witness which I have corrected below.

This question has been asked on the EMC Community Network and comes up multiple times in various contexts.

The goal is to allow multiple engines within a VPLEX cluster to be deployed across multiple racks instead of a single rack.

There are two primary reasons that this request comes up:

  • Customer intends to add VPLEX engines. However, between the original deployment and the purchase of the new engines, the space in the rack where VPLEX is deployed has been repurposed for other equipment.
  • Customer considers a single rack as a single point of failure. More on that later.

Our usual (only?) answer to this is that we do not support this configuration. That usually leads to some perplexed looks followed by a long explanation.

Let’s start with how the VPLEX HW is built.

VPLEX hardware has been built with redundancy all the way through for a high-availability infrastructure. Every component in the platform is redundant. The basic building block is a VPLEX Engine that has two directors. As multiple engines are added, each engine is connected through an intra-cluster communication channel (colloquially called the Local COM). Again, with redundancy in mind, the Local COM consists of two physically independent networks.

The plot thickens. Some more platform details: The VPLEX directors share the responsibility of monitoring the Local COM for any failures so that partitions (severing of Local COM links between VPLEX engines) can be handled if appropriate. In fact, each VPLEX cluster has another witness we internally refer to as the Director Witness (not to be confused with its more illustrious and well known sibling – the VPLEX Witness which is responsible for monitoring across VPLEX Clusters).

Now, given the variability of potential customer deployments, it was critical that we find a scalable way of maintaining four physically redundant networks to enable delivery of the high availability that our customers expect.

The way that we accomplish this is by requiring that the engines be collocated in the same rack and configured in exactly the same way. Without this requirement, the level of redundancy becomes difficult to ensure: deployments can be highly variable, and the core platform requirements that I described above get compromised. Not to mention the challenges to our services organization of working through these variable configuration details. The bottom line is that without mandating the strict configuration and deployment requirements outlined above, the probability of multiple simultaneous failures increases, leading to compromised availability.

If this explanation works for you, you can skip the next two paragraphs.

======== [gory tech detail alert begin] ===========
[For those who want the next level of detail, we dream up quadruple failures and argue about the probability of failures before determining how failure handling should take place within the system. If that sounds like a whole lot of fun, it is! With a co-racked system, the perturbations to the system are dramatically reduced, changing the equation for what assumptions the director witness can make.

The VPLEX Witness has completely different failure handling characteristics since it has to account for two separate racks, WAN links, two data centers … You get the idea].
======== [gory tech detail alert end] ============

There was one additional question above – about a rack being considered as a single point of failure. There are multiple things to consider:

  • First and foremost, VPLEX has hardware availability built in from the ground up. Everything in the basic platform building block is a multiple of two, so the classic reasons for rack separation (around fault-domain separation such as power phases, etc.) are accounted for in the HW deployment architecture.
  • As we engage in these conversations further, what usually emerges is that the customer is concerned about fault-domain redundancy (e.g. protecting across sprinkler heads). And VPLEX Metro with the Witness is designed precisely to enable this particular use-case.

We are always open to feedback from you about new use-cases we can build our products for. And as I have seen, customers constantly provide insights that confound our assumptions (and that is GOOD!). This is one area with some interesting possibilities that we continue to explore. So in case you want to talk, you know how to reach me!

PowerPath: Auto standby for VPLEX

Autostandby as a capability has been available in PowerPath for over a year and a half. It must be something in the zeitgeist, but all of a sudden I have seen a couple of threads from customers and the field. These threads have covered the entire range: from customers who are positively gushing about this capability, to questions about how it works, to operational questions about which tweaks are and are not possible.

The background behind autostandby

We started down the autostandby road with some crucial observations:

    Most host I/O operations in a sequence are correlated to each other. In other words, random I/O workloads, while they do exist, are rare during customer operations.

(And yes, I realize that any generalization is dangerous territory. So, remember, we are following the 80:20 rule here).

    VPLEX has a read cache. To take advantage of this, you want to maximize the likelihood that read-type I/Os encounter cache hits, thereby reducing the latency for these I/Os.

Translation: If you combine the two observations above, then, for better performance, you want I/Os from a given host to a given volume to be directed to a given set of directors as much as possible.

Finally, let's bring the distance component into this. The focus here is on the cross connect (see Additional vMSC Qualifications for VPLEX Metro). In a cross-connected configuration, there is a latency advantage to directing I/Os to the local cluster; otherwise, I/Os are subjected to the cross-site round-trip latency penalty. By the way, this is one of the reasons we have chosen to restrict the supported latency envelope for cross-connected configurations to 1 msec RTT.

The solution

Working with the PowerPath team, we set out to address the design goals outlined above. PowerPath already had a mechanism to designate paths that should not normally be used for multipathing: paths can be set to manual standby. A standby path (if alive) becomes usable only once all the primary (non-standby) paths have failed.

For VPLEX Metro cross-connected environments, the designation of which paths are on standby depends on where the host is located relative to the VPLEX clusters. The host paths connected to the local VPLEX cluster should be the *active* paths, whereas those connected to the remote VPLEX cluster should be the *standby* paths. Setting this by hand on every host does not scale, so the path setting needs to be automatic.

How does the solution work?

A lot of the recent questions have been focused on how the algorithm for path selection works. So at a high level, here goes:

  • PowerPath measures the latency of SCSI INQUIRY commands issued on each path
  • It then determines the minimum path latency associated with each VPLEX cluster / frame
  • The VPLEX cluster / frame with the lowest latency is designated as the preferred cluster.
    1. Each host sets the preferred cluster independently, so each host affinitizes correctly to the appropriate VPLEX cluster
    2. If the delta between the minimum latencies of the two clusters is zero, the preferred designation is applied to one cluster or the other
  • The paths associated with the preferred VPLEX cluster are set to active mode. User-set active/standby always takes precedence over auto selection, so if those paths have previously been set manually to standby, those settings will not be overruled.
  • The paths associated with the non-preferred VPLEX cluster are set to autostandby; the same caveat as the previous bullet applies
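
To see this in action on a host, the PowerPath CLI flow looks roughly like the sketch below. This is a hedged example; the exact syntax and output differ by platform and PowerPath version, so verify against the PowerPath CLI and administration guide:

powermt display options                    (shows whether proximity-based autostandby is enabled)
powermt set autostandby=on trigger=prox    (enables the proximity-based flavor)
powermt display dev=all                    (paths to the non-preferred cluster should now show as autostandby)

The trigger=prox qualifier selects the proximity-based (VPLEX) flavor rather than the IOs-per-failure flavor mentioned in the note at the end of this post.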

PowerPath versions where autostandby for VPLEX is supported

Here are the minimum versions that support autostandby for VPLEX:

  • VMware: PP/VE 5.8
  • Linux: PP 5.7
  • Windows: PP 5.7
  • AIX: PP 5.7
  • HPUX: PP 5.2
  • Solaris: PP 5.5

Frequently Asked Questions

    For a given distributed volume, if there are multiple paths on a given cluster which is chosen as the preferred cluster, do all paths get utilized?

  • Yes.

    What is the frequency of the path latency test? What is the trigger for the path latency test?

  • Path latency is evaluated for autostandby at boot time (if autostandby is enabled), during runtime when the feature is turned from off to on, or when a user issues a reinitialize from the command line.
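
For reference, the command-line reinitialize mentioned above looks roughly like this (a hedged sketch; check the PowerPath CLI guide for your version):

powermt set autostandby=reinitialize trigger=prox

This re-runs the latency measurement and re-applies the preferred/non-preferred path designations.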

    What is the minimum latency difference between two paths before one of them will be set to autostandby? What is the default, and is it settable?

  • The granularity varies from platform to platform (it depends on the tick granularity of the OS). However, the granularity is really, really small and is not settable.

    I have a VPLEX Metro deployment in which the cross-connect latency is extremely small. I do not need the autostandby algorithm. Can I turn it off?

  • Yes, you can turn it off (see the example below); refer to the PowerPath administration guide for details. Now, here is the counter-argument: if you expect your I/Os to get any level of read cache hits, then it is still a good idea to leave the autostandby algorithm turned on.
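
For completeness, turning the proximity-based flavor off looks roughly like this (again a hedged sketch; verify against the PowerPath CLI guide for your platform):

powermt set autostandby=off trigger=prox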

    On failure of all active paths, the standby paths are made active. When the original paths return, does the user have to take any steps to return the configuration to its original state, or does the pathing revert automatically?

  • The pathing will automatically revert to the original state as soon as an active path comes back alive.

Note

PowerPath also has an autostandby mode that was introduced to handle flaky paths (IOs-Per-Failure autostandby). This blog is focused on the VPLEX portion of autostandby (referred to as proximity-based autostandby).