Category Archives: continuous availability

Mission Critical Center: A community for continuous availability

This blog post is about an internal effort we have started within EMC, which we talked about at EMC World. Based on the initial response, interest in this effort seems to be quite high. Here is some more information about it.

The challenge and the concept behind the solution

Over the past few years, customers have increasingly been adopting, and expecting, continuous availability in their data centers. It may be obvious, but it still deserves saying: continuous availability is an end-to-end paradigm that spans the application, multipathing, SAN configuration, IP configuration, capabilities like VPLEX Metro and, last but not least, physical storage.

We have always recognized that this affects how customers view and purchase solutions. In other words, when customers think about continuous availability, they think about continuous availability for, say, their SAP environment running on VMware in a SAN across multiple data centers. This has major implications for how we think about testing and validating what customers are deploying.

Consider the normal testing paradigm for any product team: the team is responsible for testing the product's capabilities and its handling of failure conditions, as well as performance, scale and other system testing needs. A second envelope of testing is a superset of all of this: interoperability testing. EMC has built a core capability around interoperability testing with the world-class ELab within the EMC family. ELab is responsible for interoperability and protocol testing and for certifying products to work with EMC products. The result is a set of support matrices, which customers and the field treat as their bible for configuring and deploying products interoperably. One more envelope around this is solution testing: taking the supported end-to-end pieces, deploying them together and testing them for functionality and performance.

One critical piece is still missing, especially given the focus that customers are putting on continuous availability. With expectations rapidly moving to 6 9s and 7 9s of availability, it is not sufficient to test the individual pieces and trust that interop and solution testing will get customers to those hallowed availability levels. Instead, what is needed is proactive stress and failure testing of these end-to-end deployments. It is also important to understand the operational approach a customer is likely to take with such a deployment.
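
To make these targets concrete: availability expressed as a number of nines translates directly into a yearly downtime budget. The short Python sketch below does that arithmetic; it assumes a 365.25-day year, and the function name `allowed_downtime_seconds` is purely illustrative.

```python
# Yearly downtime budget for an availability expressed as a number of nines.
# Assumption: a 365.25-day year; other conventions shift the result slightly.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def allowed_downtime_seconds(nines: int) -> float:
    """Return the maximum downtime per year for `nines` nines of availability.

    For example, 6 nines means 99.9999% availability, i.e. the system may be
    unavailable for a fraction of 10**-6 of the year.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailable_fraction = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for n in (5, 6, 7):
        print(f"{n} nines: {allowed_downtime_seconds(n):.1f} seconds of downtime per year")
```

Under that assumption, 6 9s leaves roughly 31.6 seconds of downtime per year and 7 9s roughly 3.2 seconds, a budget that a single slow failover anywhere in the end-to-end stack could exhaust.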

How are we solving this challenge?

As you can imagine, in a company with multiple business units such as EMC, this is a herculean effort. You need different business units to buy into the concept of solution-level failure and stress testing, and then to align on what is needed to validate and test this capability. Ultimately, our vision at EMC was to deliver to customers a continuous availability experience at the data center level. Talk about setting ambitious goals! But then, our goal was to deliver value to our customers, and choosing goals only because they are achievable is not the way to get there.

As when we built ELab, the decision was to invest in a new competency center: the Mission Critical Center (MCC).

The mission of the MCC is to build a platform to test and demonstrate greater than 6 9s availability in production for products in the EMC portfolio.

And when we say production, we mean it. For our internal purposes, we treat the MCC exactly as we would treat a customer. The team files service requests (SRs), and escalations to engineering follow exactly the support route a customer's would. Systems are upgraded the same way a customer would upgrade them. For all practical purposes, the MCC gets exactly the same handling and care that EMC would provide in a customer environment. This teaches us not only how the product behaves but also how our support processes affect the customer. Finally, it helps us look at problems holistically: we approach debugging not from the perspective of a single product but from the perspective of the complete solution the customer deploys.

Mission Critical Center: What is in place and where are we going?

Now that we have talked through the concept, let’s look at what the MCC team has done so far. The MCC team was started as a ground-up effort, looking for like-minded and interested stakeholders across different business units (translation: it has largely been built through a lot of conviction and convincing). The team is essentially a shared collaboration among several business units (VMAX, VNX, RecoverPoint and VPLEX). Here is the configuration they have put together.

Mission Critical Center Architecture

Regular readers of this blog should be very familiar with this topology: it is the cascaded VPLEX and RecoverPoint topology discussed here, and the specific topology is captured here. The team has built use cases around stretched Oracle RAC across DC1 and DC2, stretched VMware HA and other applications, all running production-level workloads across DC1 and DC2 and protected in DC3. Once this mission-critical platform was built, the team's focus was first to run I/O and then to start accelerated failure testing (i.e. simulating data-center-type failure scenarios to understand which failures occur across the entire solution set). The goal is _NOT_ to test the interoperability of VPLEX with VMAX or VPLEX with VNX, or to test the performance of any one component. The goal is to take real-world customer workloads, deploy them across infrastructure the way a customer would, and learn both the operational challenges and how the infrastructure handles and recovers from failures. So the MCC team will routinely fail WAN links and entire arrays, do tech refreshes, introduce fabric-wide zoning changes, simulate the disaster of a data center, … you get the idea. Needless to say, I am a big fan!

The team has some very concrete plans for taking this forward. This configuration is now being converted into the MetroPoint configuration, so that the team can implement this new and exciting capability much as a customer would, which brings a whole new set of failure modes to test and simulate. We will continue to add more applications (SQL, SAP HANA, Hadoop), more infrastructure variations (data center moves, network outages, rolling outages and the like) and more of EMC’s product families (Data Domain, NetWorker, Avamar, ViPR).

Mission Critical Center: The call to action

As the team is building their capabilities, we have a very real need for active guides / participants to build a strong community around the mission critical center. So, here are the concrete asks:

  1. If you are a customer or field person with solutions or design experience and would like to participate in this effort, do reach out to me and I can put you in touch with the team. You can contribute as much or as little as you like. Your role will be to guide the team on what they should look for, to help them understand the operational processes on your end, and to help us follow how your data center is evolving, so that our products keep providing the same world-class capabilities in your environment that they provide today.
  2. If there are specific scenarios or applications that you think would be worthy additions to this environment, please reach out to me and we can work to get them onto the Mission Critical Center TODO list.

In the end, this is a community of some very talented engineers within EMC volunteering a big chunk of their time (in addition to doing their day jobs) to enable EMC products to deliver a 6 9s experience in customer data centers. Your help will get us there sooner and make this process more effective. Do consider contributing to this effort!

Talkin’ about VPLEX and RecoverPoint Part 4

The past three editions of this series have been very popular. Our marketing and CSE teams have created some new videos in support of the Q2 launches for VPLEX and RecoverPoint, so here are twelve videos for you to dig into.

  1. Why VPLEX for VMware Environments: Don Kirouac does an excellent job explaining how VPLEX integrates with VMware environments.
  2. Why VPLEX for Oracle RAC: Don Kirouac from the Corporate Systems Engineering team talks about the integration between Oracle RAC and VPLEX Metro to deliver continuous availability.
  3. VPLEX with XtremIO: Charlie Kraus from the Product Marketing team explains how VPLEX delivers value to XtremIO environments.
  4. ViPR with VPLEX and RecoverPoint: Devon Helms from the Product Marketing team explains how the ViPR Controller can make provisioning for VPLEX and RecoverPoint simple.
  5. Why VPLEX for SAP: Jim Whalen from the Solutions Marketing team explains how VPLEX can help deliver SAP application availability.
  6. Why VPLEX for Microsoft Hyper-V Environments: Charlie Kraus talks about how VPLEX integrates with Microsoft Hyper-V environments to deliver mobility and availability.
  7. VPLEX with Vblock: Charlie Kraus delves into how VPLEX integrates with and provides value to a Vblock environment.
  8. VSPEX Solutions for VPLEX and RecoverPoint: Karl Connolly from the VSPEX Marketing team walks through VSPEX solutions for VPLEX and RecoverPoint.
  9. MetroPoint topology: Paul Danahy and I walk through the benefits and value propositions of the MetroPoint topology.
  10. VPLEX Virtual Edition: Paul Danahy and I introduce the VPLEX Virtual Edition solution and explain why we think it is such a game changer.
  11. Simplified Provisioning with VPLEX: Paul Danahy and I talk through how VPLEX Integrated Array Services simplifies provisioning with VPLEX.
  12. EMC AppSync for RecoverPoint: Parag Pathak from the AppSync Marketing team and Devon Helms talk about the integration between AppSync and RecoverPoint to deliver application-consistent protection.

2014 Launch Post 2: MetroPoint: Extending the Availability and Protection Continuum

On April 4th, 2014, as part of the Data Protection and Availability Division (DPAD) launch, three VPLEX and RecoverPoint items were launched or made generally available (GA):

  • VPLEX Virtual Edition – Availability late Q2
  • MetroPoint Topology – Joint capability of VPLEX and RecoverPoint – Availability late Q2
  • VPLEX Integrated Array Services – Available now

This is the second in a series of posts to walk through what was launched / delivered.

VPLEX and RecoverPoint

It has been two years since we introduced the RecoverPoint splitter within VPLEX. The awesomeness of VPLEX was joined with the coolness of RecoverPoint. With this combination, we delivered operational and disaster recovery to VPLEX customers, adding to the continuous availability they already had. These use cases were extremely complementary. While there were a lot of skeptics outside of EMC about this combination, we were quietly confident that customers wanted an extended continuum between disaster recovery and continuous availability. Suffice it to say that this combination has exceeded our revenue expectations. Since the launch in May 2012, the two organizations have come even closer together within a single business unit, further solidifying the bond between the two teams.

Here is a quick recap of the current integration points between VPLEX and RecoverPoint.

RecoverPoint delivers continuous data protection, enabling local and/or remote protection. This is enabled by a RecoverPoint splitter that resides within the VPLEX platform; RecoverPoint has similar splitters in the VMAX and VNX platforms as well. The RP splitter sends a copy of every WRITE to a RecoverPoint Appliance (RPA). From there, you can enable local protection (where the writes are journaled locally), remote protection (where the writes are journaled remotely) or both. The beauty of RecoverPoint is that it can store every single write, giving recovery a DVR-like capability. The other benefit of RecoverPoint is that the protection is heterogeneous, i.e. it can protect between any combination of VPLEX, VMAX and VNX.
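
As a rough illustration of the splitter and journal idea (not of RecoverPoint's actual implementation), the Python sketch below models a splitter that applies each write to production storage and copies it, in order, to one or more journals. Because every write is kept, the volume can be rebuilt as of any earlier write, which is the DVR-like behaviour described above. The `Splitter` and `Journal` classes, the use of sequence numbers instead of timestamps, and the in-memory block map are all simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy model only: this is NOT how RecoverPoint is implemented. It illustrates
# that a splitter copies every write to a journal, and that keeping every
# write lets you rebuild the volume as of any earlier point in time.

Write = Tuple[int, int, bytes]  # (sequence number, block address, data)


@dataclass
class Journal:
    """Ordered record of every write sent by the splitter (hypothetical class)."""
    entries: List[Write] = field(default_factory=list)

    def append(self, seq: int, block: int, data: bytes) -> None:
        if self.entries and seq <= self.entries[-1][0]:
            raise ValueError("writes must arrive in increasing sequence order")
        self.entries.append((seq, block, data))

    def image_at(self, seq: int) -> Dict[int, bytes]:
        """Rebuild block contents as they were right after write number `seq`."""
        image: Dict[int, bytes] = {}
        for entry_seq, block, data in self.entries:
            if entry_seq > seq:
                break  # entries are ordered, so nothing later can qualify
            image[block] = data
        return image


class Splitter:
    """Applies each write to production storage and copies it to every journal."""

    def __init__(self, journals: List[Journal]):
        self.production: Dict[int, bytes] = {}
        self.journals = journals  # e.g. one local and one remote copy
        self.next_seq = 1

    def write(self, block: int, data: bytes) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.production[block] = data
        for journal in self.journals:
            journal.append(seq, block, data)
        return seq


local, remote = Journal(), Journal()
splitter = Splitter([local, remote])
splitter.write(0, b"v1")
good = splitter.write(1, b"ok")
splitter.write(0, b"corrupted")  # a bad write we want to rewind past
print(remote.image_at(good))  # block 0 still holds b"v1"
```

Here the local and remote journals stand in for local and remote protection; a real system also has to handle replica volumes, bookmarks, journal capacity and WAN transport, none of which is modeled.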

The combination of VPLEX and RecoverPoint supports the following topologies:

  1. VPLEX Local with RecoverPoint Local Protection
  2. VPLEX Local with RecoverPoint Remote Protection
  3. VPLEX Metro with RecoverPoint Local Protection
  4. VPLEX Metro with RecoverPoint Remote Protection
The slide below shows the currently supported topologies.

    Currently supported VPLEX and RecoverPoint topologies

    Customer topologies are all over the map: we see a lot of traction with VPLEX Local and RecoverPoint Remote Protection (as we expected). However, the second most common topology is the three-site cascaded topology, and that was a surprise. Upon digging further, we found that a lot of customers have business requirements for an out-of-region disaster recovery site. Other customers deploy VPLEX Metro within a single site and use RecoverPoint to provide DR for that Metro. Both cases result in the cascaded topology.

    As you can imagine, the downside of the cascaded topology is that if the replicating VPLEX cluster fails or loses connectivity, DR protection is lost. Since the launch of RecoverPoint on VPLEX, quite a few customers have been asking us to add the capability to protect both sides of a VPLEX Metro to a common third site using RecoverPoint. Well, that is exactly what we have done.

    MetroPoint: Operational and Disaster Protection across both sides of a VPLEX Metro

    MetroPoint Topology

    The MetroPoint solution, launched on April 4th, will GA at the end of Q2. It is a joint capability of RecoverPoint and VPLEX. Starting with RecoverPoint 4.1 and GeoSynchrony 5.4, customers will be able to add disaster recovery and operational recovery protection to both sides of a distributed volume. With MetroPoint, we took the time to do this right: although both sides of a distributed volume are protected, only one side actively replicates data, and that data goes to a single DR copy. In other words, enabling MetroPoint needs no additional bandwidth or storage compared to a standard DR scenario.

    To enable this, we have created a new kind of consistency group: the MetroPoint consistency group. It enables replication on both sides of a distributed volume. Another characteristic of the MetroPoint consistency group is that you can choose which site is the primary replication site, which lets you load balance replication across the sites. If the primary replication site fails, replication will AUTOMATICALLY switch to the surviving site. In other words, you do not lose DR protection even if you lose the primary replication site.
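
The difference between the cascaded topology and a MetroPoint consistency group can be sketched as a tiny state model. In this hypothetical Python `ReplicationGroup` class, only one site feeds the DR copy at any time; the only question is whether another site can take over when that site is lost. The failover rule (pick a surviving replication-capable site) is a simplification of the behaviour described above, not RecoverPoint's actual logic.

```python
from typing import Optional, Set

# Toy model of the failover behaviour described above; NOT RecoverPoint code.
# Assumptions: exactly one site sends data to the DR copy at a time, and the
# surviving site is chosen alphabetically just to keep the sketch deterministic.


class ReplicationGroup:
    """Tracks which side of a distributed volume feeds the single DR copy."""

    def __init__(self, capable_sites: Set[str], active: str):
        if active not in capable_sites:
            raise ValueError("the active site must be able to replicate")
        self.up_sites = set(capable_sites)  # replication-capable sites still running
        self.active: Optional[str] = active

    def fail_site(self, site: str) -> None:
        self.up_sites.discard(site)
        if site == self.active:
            survivors = sorted(self.up_sites)
            # Switch to a surviving site if there is one; otherwise DR is lost.
            self.active = survivors[0] if survivors else None

    @property
    def dr_protected(self) -> bool:
        return self.active is not None


# Cascaded topology: only one VPLEX cluster replicates to the third site.
cascaded = ReplicationGroup({"site-1"}, active="site-1")
# MetroPoint-style group: both sides can replicate, but only one does at a time.
metropoint = ReplicationGroup({"site-1", "site-2"}, active="site-1")

for group in (cascaded, metropoint):
    group.fail_site("site-1")  # lose the primary replication site

print(cascaded.dr_protected, metropoint.dr_protected, metropoint.active)
```

Losing site-1 ends DR protection for the cascaded group, while the MetroPoint-style group keeps protecting from site-2. Since only one site replicates in either case, the sketch also reflects why MetroPoint needs no extra bandwidth or DR storage.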

    To me, one of the more exciting implications is the extension of the VMware HA and VMware SRM use cases to the MetroPoint topology. Here is what this looks like:

    MetroPoint with VMware HA and SRM

    The VPLEX Metro sites are protected with VMware HA, and failover to the remote DR site is handled by VMware SRM. This gives our customers simultaneous HA and DR.

    One comment here: we talk about MetroPoint as a three-site deployment, and that is true. However, it is worth remembering that a number of customers deploy VPLEX Metro within a single data center, either to protect across multiple floors or multiple SANs, or across a campus-type environment. In those scenarios, customers can use MetroPoint to protect to a second site. There is a lot of interest in this deployment model.

    More coolness: along the way, we were able to meet one more customer request. With the MetroPoint consistency group, we were able to provide operational recovery on both sides of a VPLEX Metro. And this does not need a third site!!

    Operational Recovery on both sides of a VPLEX Metro

    To top this all off, MetroPoint is completely heterogeneous. All these goodies work with both EMC and non-EMC arrays. As long as the storage array is supported by VPLEX, you are good to go.

    Here is a short video that Paul Danahy and I put together to give you a brief overview of MetroPoint:

    With MetroPoint, we have raised the bar on continuous availability and disaster recovery. This is the result of collaboration between the VPLEX and RecoverPoint engineering teams, with a lot of input from some of our lead customers. To all those who helped us get here, a very BIG thank you!

2014 Launch Post 1: Software Defined Coolness: VPLEX Virtual Edition!!

2014-04-08: One correction below.

On April 4th, 2014, as part of the Data Protection and Availability Division (DPAD) launch, three VPLEX and RecoverPoint items were launched or made generally available (GA):

  • VPLEX Virtual Edition – Availability late Q2
  • MetroPoint Topology – Joint capability of VPLEX and RecoverPoint – Availability late Q2
  • VPLEX Integrated Array Services – Available now

This is the first in a series of posts to walk through what was launched / delivered.

The drivers towards a VPLEX Virtual Edition

Data center infrastructure is undergoing a massive shift. Virtualization in the data center has had a profound impact on customer expectations of flexibility and agility. Especially as customers reach 70% or more virtualization, they have the potential to realize tremendous operational savings by consolidating management in their virtualization framework. In this state, customers typically do not want to deploy physical appliances; they want everything handled from their virtualization context. Similar changes in networking and storage mean that more and more of the basic infrastructure now runs in software on generic hardware. This is the software defined data center. VPLEX has been no stranger to this conversation. Given the very strong affinity of VPLEX to VMware use cases, customers have been asking us for a software-only version of VPLEX. That is precisely what we have built. This past week, we launched VPLEX Virtual Edition, with GA towards the end of Q2.

What is the VPLEX Virtual Edition and what does it do?

The VPLEX Virtual Edition (VPLEX/VE) is a vApp version of VPLEX designed to run in an ESX server environment to provide continuous availability and mobility within and across data centers. We expect this to be the first in a series of virtual offerings. Compared to the appliance, all the VPLEX directors become virtual directors (vDirectors). For the first release, the supported configuration is called the ‘4×4’: four vDirectors on each side of a VPLEX Metro. From a configuration standpoint, that is the equivalent of two VPLEX engines on each side of a VPLEX Metro. The two sides of VPLEX/VE can be deployed within a data center or across data centers up to 5 msec apart.

4x4 VPLEX/VE Topology

VPLEX/VE supports iSCSI for front-end and back-end connectivity. For the initial release, we have decided to support only the use cases equivalent to VPLEX Metro. Most VPLEX Local use cases can be addressed with a combination of vMotion and Storage vMotion. The supported use cases are:

  • The ability to stretch VMware HA / DRS clusters across data centers, for automatic restart and protection of VMs across multiple arrays
  • Load balancing of virtual machines across data centers
  • Instant movement of VMs across distance
VPLEX Virtual Edition Supported Use Cases

From a performance perspective, VPLEX/VE is targeted at workloads of up to 100K IOPS. Obviously, actual performance will depend on your workload. The deployment is designed to be customer-installable from the get-go: an installation wizard guides you all the way through the installation. At GA, please refer to the release notes to determine which ESX servers are supported for VPLEX/VE. Each vDirector must be deployed on a separate ESX server, so that no two vDirectors share the same ESX server; this gives the system maximum availability. Running application VMs on an ESX server that also runs a vDirector is supported. This means that you should be able to use your existing ESX servers (subject to the minimum requirements that will be established for the vDirectors).
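
The placement rules above lend themselves to a simple pre-deployment check. The Python sketch below is hypothetical (it is not an EMC tool, and the input format is an assumption): it encodes the 4×4 layout of four vDirectors on each of two sides, the rule that no two vDirectors share an ESX server, and the correspondence of four vDirectors to two engines (a physical VPLEX engine holds two directors).

```python
from collections import Counter
from typing import Dict, List

# Hypothetical pre-deployment check for the 4×4 VPLEX/VE layout described above.
# Not an EMC tool: it only encodes the rules stated in this post.
VDIRECTORS_PER_SIDE = 4   # four vDirectors on each side in the 4×4 configuration
DIRECTORS_PER_ENGINE = 2  # a physical VPLEX engine has two directors


def placement_problems(placement: Dict[str, Dict[str, str]]) -> List[str]:
    """Check a layout given as {side: {vdirector_name: esx_host}}.

    Returns human-readable problems; an empty list means the layout follows
    the rules (two sides, four vDirectors each, one vDirector per ESX host).
    """
    problems: List[str] = []
    if len(placement) != 2:
        problems.append(f"expected 2 sides, found {len(placement)}")
    host_counts: Counter = Counter()
    for side, vdirectors in placement.items():
        if len(vdirectors) != VDIRECTORS_PER_SIDE:
            problems.append(
                f"{side}: expected {VDIRECTORS_PER_SIDE} vDirectors, found {len(vdirectors)}"
            )
        host_counts.update(vdirectors.values())
    for host, count in sorted(host_counts.items()):
        if count > 1:
            problems.append(f"{host} runs {count} vDirectors; each needs its own ESX server")
    return problems


layout = {
    side: {f"{side}-vd{i}": f"{side}-esx{i}" for i in range(1, 5)}
    for side in ("dc1", "dc2")
}
print(placement_problems(layout))  # an empty list means the layout is acceptable
print(VDIRECTORS_PER_SIDE // DIRECTORS_PER_ENGINE, "engine equivalents per side")
```

Application VMs are deliberately ignored by the check, since running them alongside a vDirector is supported; only vDirector-to-host placement is constrained.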

An I/O flows from the application VM (via iSCSI) to a VPLEX/VE vDirector VM, and from there to the iSCSI array connected to VPLEX/VE. Speaking of which, right out of the chute, we support VNXe arrays. We will add other iSCSI arrays over time.

One of the more interesting changes we have made with VPLEX/VE is the way it is managed. Since VPLEX/VE is tailored for ESX servers only, it is managed entirely through the vSphere Web Client. Here are some screenshots of VPLEX/VE management. The coolest part for me is that you can go from creating your VMs and setting up an HA cluster all the way to creating a distributed volume, all within the vSphere Web Client. _VERY_ nifty! In addition, VPLEX/VE events and alarms now show up in the vCenter Event Viewer. For all practical purposes, this is a seamless vApp designed for your vSphere environments.

Customer installable
VPLEX Virtual Edition in vSphere Web Client
VPLEX/VE vDirectors in vSphere Web Client
VPLEX/VE Operations in vSphere Web Client

When a distributed volume is provisioned for VPLEX/VE, it is configured as a VMFS-5 volume and made available as a resource to vCenter.

With VPLEX/VE, we have had the opportunity to do a lot of things differently. One of our guiding principles was to think of it not as a storage product but as a product designed for VMware environments and targeted at the ESX administrator. Naturally, I cannot wait to see this get into our customers’ hands, to see whether we have hit our marks and what adjustments are needed.

Equally important, this is a strategic imperative within EMC. You can expect to see a lot more of our product portfolio embarking on the software defined journey. There are a lot of intersections within the portfolio that we have only begun to explore (HINT: composing software is a lot easier than composing hardware!).

Frequently Asked Questions

Since launch, I have seen a ton of questions on Twitter, on internal mailing lists and from people reaching out to me directly or indirectly. So, here are the collated answers:

  • Is VPLEX/VE available right now?
    • A: VPLEX/VE will GA towards the end of Q2.
  • Will VPLEX/VE support non-EMC arrays?
    • A: As with VPLEX, we expect to qualify additional EMC and non-EMC arrays over time based on customer demand. Expect new additions fairly quickly after GA.
  • Will I be able to connect VMs from ESX clusters other than the one hosting VPLEX/VE?
    • A: No. (Corrected 2014-04-08; this answer originally read "Yes".)
  • Will I be able to connect non-VMware ESX hosts to VPLEX/VE?
    • A: At this point, we only support VMware iSCSI hosts connecting to VPLEX/VE. This is one of the reasons the management has been designed within the vSphere Web Client.
  • Can I connect VPLEX/VE with VPLEX?
    • A: VPLEX/VE is deployed as a Metro equivalent platform (i.e. both sides). Connecting to VPLEX is not supported. If there are interesting use-cases of this ilk, we would love to hear from you. Please use the comments section below and we can get in touch with you.
  • Is RecoverPoint supported with VPLEX/VE?
    • A: Not today. To be explicit, the MetroPoint topology, which also launched last week, is not supported with VPLEX/VE either.
  • Is VPLEX/VE supported with ViPR?
    • A: At GA, ViPR will not support VPLEX/VE. Both the ViPR and VPLEX/VE teams are actively looking at this.
  • Does VPLEX/VE support deployment configurations other than a 4×4?
    • A: Currently, 4×4 is the only allowed deployment configuration. Over time, we expect to support additional configurations, driven primarily by customer demand.
  • Will VPLEX/VE be qualified under vMSC (vSphere Metro Storage Cluster)?
    • A: Yes.

    If you are interested in a Cliff’s Notes version of this, here is a short video that Paul and I did to walk through the virtual edition:

Talkin’ about VPLEX and RecoverPoint Part 3

It is that time again. Our marketing team has been busy developing more videos to communicate the value of VPLEX and RecoverPoint. Some of them have incredible art in them and others have things blowing up, but they are all fun to share. So, without further ado, here is the next installment in the ‘Talkin’ about’ series:

  1. DataCrunchers: Data Center Detonation: Here Steve Todd (Twitter: @SteveTodd; see also his wonderful Information Playground blog) and Nga Nguyen demonstrate continuous availability with VPLEX using some C4 explosives.

  2. Big Ideas; Big Tech: How to protect mission critical environments: This one talks about the business impact of failures in mission-critical environments and how, with a combination of VMAX, RecoverPoint and VPLEX from EMC, you can build a comprehensive local and remote protection strategy for such environments.

Do take a look at these videos – I like how they simplify continuous availability concepts to make them consumable by a larger audience. Needless to say, I am a HUGE fan of these videos!