Can engines in a VPLEX Cluster be split?

UPDATE Feb 1st, 2014: I had captured some details incorrectly about the Director Witness which I have corrected below.

This question has been asked on the EMC Community Network and comes up multiple times in various contexts.

The goal is to allow multiple engines within a VPLEX cluster to be deployed across multiple racks instead of a single rack.

There are two primary reasons that this request comes up:

  • Customer intends to upgrade the number of VPLEX engines. However, in the time between the original deployment and when new engines are being purchased, they have repurposed the space in the rack where VPLEX is deployed for other equipment.
  • Customer considers a single rack as a single point of failure. More on that later.

Our usual (only?) answer to this is that we do not support this configuration. That usually leads to some perplexed looks followed by a long explanation.

Let’s start with how the VPLEX HW is built.

VPLEX hardware has been built with redundancy all the way through for a high availability infrastructure. Every component in the platform is redundant. The basic building block is a VPLEX Engine that has two directors. As multiple engines are added, each of these engines is connected through an intra-cluster communication channel (colloquially called the Local COM). Again, with redundancy in mind, the Local COM consists of two physically independent networks.

The plot thickens. Some more platform details: the VPLEX directors share the responsibility of monitoring the Local COM for any failures so that partitions (severing of Local COM links between VPLEX engines) can be handled appropriately. In fact, each VPLEX cluster has another witness that we internally refer to as the Director Witness (not to be confused with its more illustrious and well-known sibling – the VPLEX Witness, which is responsible for monitoring across VPLEX Clusters).

Now, given the variability of potential customer deployments, it was critical that we find a scalable way of maintaining four physically redundant networks to enable delivery of the high availability that our customers expect.

The way that we accomplish this is by requiring that the engines be collocated in the same rack and configured in exactly the same way. Without this requirement, the level of redundancy becomes difficult to ensure. Deployments can be highly variable, and the core platform requirements that I described above get compromised. Not to mention the challenges to our services organization of working through these variable configuration details. The bottom line is that without mandating the strict configuration and deployment requirements I outlined above, the probability of multiple failures happening simultaneously increases, leading to compromised availability.
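To make the "multiple failures happening simultaneously" point a little more concrete, here is a rough back-of-the-envelope sketch. It is not how VPLEX computes anything. It is a generic common-cause failure model, and the function name, the `p_common` parameter and every number in it are assumptions made up purely for illustration. The idea: two redundant networks only protect you as long as nothing takes both of them out at once, and a variable deployment (say, both cable runs sharing one path between racks) is exactly the kind of thing that can introduce such a shared event.

```python
def both_networks_fail(p_single: float, p_common: float = 0.0) -> float:
    """Probability that two redundant networks are down at the same time.

    p_single: chance that one network fails on its own; the two networks
              are assumed to fail independently of each other.
    p_common: chance of a shared event that takes out both networks at once
              (hypothetical, e.g. both cable runs sharing one inter-rack path).
    """
    if not (0.0 <= p_single <= 1.0 and 0.0 <= p_common <= 1.0):
        raise ValueError("probabilities must be between 0 and 1")
    # With no shared event, both networks must fail independently.
    independent_double_failure = p_single ** 2
    # Either the shared event happens, or it does not and both fail anyway.
    return p_common + (1.0 - p_common) * independent_double_failure


if __name__ == "__main__":
    p = 0.001  # illustrative number only, not a measured failure rate
    tightly_controlled = both_networks_fail(p)
    variable_deployment = both_networks_fail(p, p_common=0.0005)
    print(f"no shared failure mode:   {tightly_controlled:.2e}")
    print(f"with shared failure mode: {variable_deployment:.2e}")
```

Even a small shared failure mode dominates the result, because the independent double failure is so unlikely to begin with. That is the intuition behind insisting on a tightly controlled configuration.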

If this explanation works for you, you can skip the next two paragraphs.

======== [gory tech detail alert begin] ===========
[For those who want the next level of detail: we dream up quadruple failures and argue about the probability of failures before determining how failure handling should take place within the system. If that sounds like a whole lot of fun, it is! With a co-racked system, the perturbations to the system are dramatically reduced, changing the equation for what assumptions the Director Witness can make.

The VPLEX Witness has completely different failure handling characteristics since it has to account for two separate racks, WAN links, two data centers … You get the idea].
======== [gory tech detail alert end] ============

There was one additional question above – about a rack being considered as a single point of failure. There are multiple things to consider:

  • First and foremost, VPLEX has hardware availability built from the ground up. Everything in the basic platform building block is a multiple of two. So, the classic reasons for rack separation (fault domain separation such as power phases, etc.) are accounted for in the HW deployment architecture.
  • As we engage in this conversation further, what usually emerges is that the customer is concerned about fault domain redundancy (e.g. "I want to protect across sprinkler heads"). And VPLEX Metro with the Witness is designed precisely to enable this particular use case.

We are always open to feedback from you about new use cases we can build our products for. And as I have seen, customers always provide insights that constantly confound our assumptions (and that is GOOD!). This is one which has some interesting possibilities that we continue to explore. So in case you want to talk, you know how to reach me!
