Mission Critical Center: A community for continuous availability

This blog post is about an internal effort we have started within EMC. We have talked about this at EMC World. Based on the initial response, the interest level behind this effort seems to be quite high. Here is some more information about the effort.

The challenge and the concept behind the solution

Over the past few years, customers have increasingly adopted, and come to expect, continuous availability in their data centers. While it may be obvious, it still deserves saying that continuous availability is an end-to-end paradigm: it runs from the application through multi-pathing, SAN configuration and IP configuration to capabilities like VPLEX Metro and, last but not least, physical storage.

We have always recognized that this has an impact on how customers view and purchase solutions. In other words, when a customer thinks about continuous availability, they think about continuous availability for their SAP Environment running on VMware in a SAN with multiple data centers etc. This has major implications for how we think about testing and validating what customers are deploying.

If you think of the normal testing paradigm for any product team, their responsibility is testing the product capabilities, product handling for failure conditions as well as performance, scale and other system testing needs. There is a second envelope of testing that is a superset of all of this – interoperability testing. EMC has built a core capability around interoperability testing with the world class ELab within the EMC family. ELab is responsible for interoperability and protocol testing and certifying products to work with EMC products. This results in generating Support Matrices. Customers and the field treat these support matrices as their bibles for how to configure and deploy products for interoperability. One more envelope around this testing is solution testing. This is now taking the end-to-end pieces that are supported and deploying them and testing them for functionality and performance.

One critical piece is still missing – especially with the focus that customers are putting on continuous availability. With the paradigm rapidly moving to 6 9s and 7 9s availability, it is not sufficient to test the part pieces and trust that interop and solution testing will result in customers reaching those hallowed availability levels. Instead, what is needed is proactive stress and failure testing of these end-to-end deployments. It is also important that we understand the operational paradigm a customer is likely to take in such a deployment.
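To put those numbers in perspective, here is a small back-of-the-envelope sketch of the arithmetic. It assumes a 365-day year and that the pieces of the stack fail independently and sit in series (the application needs all of them), which is a simplification; the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds; leap years ignored


def availability(nines):
    """Availability as a fraction for a given number of nines (6 -> 0.999999)."""
    return 1 - 10 ** (-nines)


def downtime_seconds_per_year(avail):
    """Downtime budget per year implied by an availability fraction."""
    return (1 - avail) * SECONDS_PER_YEAR


def series_availability(components):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the availabilities simply multiply.
    """
    result = 1.0
    for a in components:
        result *= a
    return result


if __name__ == "__main__":
    for n in (5, 6, 7):
        budget = downtime_seconds_per_year(availability(n))
        print(f"{n} 9s: {budget:.2f} seconds of downtime per year")

    # Five layers (say application, multi-pathing, SAN, VPLEX, array),
    # each at 6 9s on its own, still add up to less than 6 9s end to end.
    chain = series_availability([availability(6)] * 5)
    print(f"end-to-end: {downtime_seconds_per_year(chain):.2f} seconds per year")
```

Six 9s leaves a budget of roughly 31.5 seconds a year and seven 9s roughly 3 seconds, and a chain of components that each meet the target falls short of it as a whole. That is the arithmetic behind testing the end-to-end deployment rather than only the parts.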

How are we solving this challenge?

As you can imagine, in a multi-business unit company such as EMC, this is a herculean effort. You need different business units to buy into the concept of solution level failure and stress testing and then align on what is needed to validate and test this capability. Ultimately, our vision as EMC was to deliver to customers a continuous availability experience at the data center level. Talk about setting ambitious goals. But then, our goal was to deliver value to our customers. And setting goals only because they are achievable is not the way to get there.

Similar to when we built ELab, the decision was to invest in a new competency center – Mission Critical Center (MCC).

The mission of the MCC is to build a platform to test and demonstrate greater than 6 9s availability in production for products in the EMC portfolio.

And when we say production, we mean it. For our internal purposes, we treat the MCC exactly as we would treat a customer. They file an SR, and escalations to engineering go through exactly the support route that the customer would follow. Upgrades to systems are done the way customers would do them. For all practical purposes, they get exactly the same handling and care that EMC would provide in a customer environment. This teaches us not only how the product behaves but also what impact our support processes have from a customer perspective. Finally, this also helps us start to look at the problem holistically – i.e. we do not approach debugging the problem from a product perspective but rather from the perspective of the complete solution that the customer deploys.

Mission Critical Center: What is in place and where are we going?

Now that we have talked through the concept, let’s look at what the MCC team has done so far. The MCC team was started as a ground up effort looking for like minded and interested stakeholders across different business units (translation: it has largely been built through a lot of conviction and convincing). The team is essentially built through a shared collaboration between a lot of business units (VMAX, VNX, RecoverPoint and VPLEX). Here is the configuration they have put together.

Mission Critical Center Architecture

For readers of this blog, you should be very familiar with this topology – it represents the cascaded VPLEX and RecoverPoint topology discussed here and the specific topology captured here. The team has built use-cases around stretched Oracle RAC across DC1 and DC2, stretched VMware HA and other applications all running production level workloads across DC1 and DC2 and protected in DC3. Once this mission critical platform was built, their focus was certainly to run I/Os and then start to do accelerated failure testing (i.e. simulate data center type failure scenarios to understand what failures happen across the entire solution set). The goal of this is _NOT_ to test interoperability of VPLEX with VMAX or VPLEX with VNX or to test the performance of any one component. The goal is to take real world customer workloads and deploy them across infrastructure the way a customer would and to learn their operational challenges as well as how the infrastructure handles and recovers from failures. So, the MCC team will often fail WAN links, entire arrays, do tech refreshes, introduce a fabric wide zoning change, simulate disaster of a data center, … you get the idea. Needless to say, I am a big fan!
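To make the idea of enumerating failure scenarios a little more tangible, here is a toy sketch. It is not the MCC team's tooling and it does not model real VPLEX or RecoverPoint behavior (no WAN links, no witness, no arrays). It only assumes what the paragraph above states: workloads are stretched across DC1 and DC2 and protected in DC3. All names in it are hypothetical.

```python
from itertools import combinations

# Hypothetical three-site model: DC1 and DC2 serve the stretched workload,
# DC3 holds the protection copy. Purely illustrative.
ACTIVE_SITES = {"DC1", "DC2"}
REPLICA_SITE = "DC3"
ALL_SITES = ACTIVE_SITES | {REPLICA_SITE}


def classify(failed):
    """Return the coarse service state for a given collection of failed sites."""
    surviving_active = ACTIVE_SITES - set(failed)
    if surviving_active:
        return "online"                # at least one active site still serves I/O
    if REPLICA_SITE not in failed:
        return "recover-from-replica"  # both active sites lost, DC3 copy remains
    return "data-unavailable"          # nothing left to serve or recover from


def scenarios(max_failures=2):
    """Yield every combination of up to max_failures site failures."""
    for count in range(1, max_failures + 1):
        for failed in combinations(sorted(ALL_SITES), count):
            yield failed, classify(failed)


if __name__ == "__main__":
    for failed, state in scenarios():
        print(f"fail {', '.join(failed):<10} -> {state}")
```

A real test plan would replace `classify` with actual fault injection and observation of the running workloads; the point of the sketch is only that the scenario list is generated systematically rather than picked by hand.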

The team has some very concrete plans on how to take this forward. This configuration is now being morphed into the MetroPoint configuration. That way, they can implement this new and exciting capability in much the same way as a customer would, and with it comes a whole new set of failure modes to test and simulate. We will continue to add more applications (SQL, SAP HANA, Hadoop), more infrastructure variances (data center moves, network outages, rolling outages and the like) and then more of EMC’s product families (Data Domain, NetWorker, Avamar, ViPR).

Mission Critical Center: The call to action

As the team is building their capabilities, we have a very real need for active guides / participants to build a strong community around the mission critical center. So, here are the concrete asks:

  1. If you are a customer / field person with solutions / design experience and would like to participate in this effort, do reach out to me and I can put you in touch with the team. You can contribute as often or as little as you like. Your role will be to provide guidance to the team on what they should look for, help them understand operational processes on your end and help us follow how your data center is evolving, so that our products keep providing the same world class capabilities they provide in your environments today.
  2. If there are specific scenarios / applications that you think would be worthy additions to this environment, please reach out to me and we can work to get those on our TODO list for the Mission Critical Center.

In the end, this is a community of some very talented engineers within EMC volunteering a big chunk of their time (in addition to doing their day jobs) to enable EMC products to deliver a 6 9s experience in customer data centers. Your help is going to help us get there sooner and make this process more effective. Do consider contributing to this effort!

ViPR 2.0: New use-cases to support VPLEX and RecoverPoint

The GA of ViPR 2.0 was announced in time for EMC World. While there are significant announcements in ViPR 2.0, I will focus on the pieces that benefit VPLEX and RecoverPoint in this new integration.

A quick recap of what was supported prior to the 2.0 release is available here.

Support for Snaps and Clones on arrays behind VPLEX

In the 2.0 release, ViPR now supports full life cycle management of Snaps and Clones on arrays behind VPLEX. This allows customers to get a single pane of glass management function for snaps and clones. This seamless experience makes it easy for customers to take advantage of the performance and scale of these capabilities on underlying arrays and not compromise on the ease of use needed to make this capability work. Here is a demo of this capability.

Setting up a Local Mirror (RAID-1)

Another addition made in the ViPR 2.0 release is the ability to add a local mirror leg to a given virtual volume for the purposes of creating a RAID-1. This allows the volume to be protected across arrays. Here is a demo of what this capability is:

VPLEX and RP Protection

One of the big additions with the ViPR 2.0 release was common management for RecoverPoint within the VPLEX context. This allows RecoverPoint protection for VPLEX volumes to be accomplished through the same user interface. Combined with the end-to-end VPLEX provisioning through ViPR, you can now accomplish complete VPLEX provisioning with RecoverPoint. Please note that ViPR 2.0 does not support the MetroPoint topology. This is targeted for future releases.

Updated Provisioning use-case

Since ViPR 1.0, the provisioning for VPLEX has been updated. Here is a demo of the updated provisioning workflow.

vMSC Support now extended to 10 msec RTT

Amidst all the fun of EMC World, there is some really important news for the VPLEX Metro and VMware community that I wanted to ensure was not lost.

What was supported until now

Prior to this change, the official vSphere Metro Storage Cluster (vMSC) support stance was that VMware HA and vMotion were supported up to 5 msec RTT (Round Trip Time). There was an additional wrinkle: vMotion was supported up to 10 msec RTT with vSphere Enterprise Plus licensing, and VPLEX Metro supported it. However, VMware HA was not supported up to 10 msec RTT.

What has changed

The big change is that, as a part of vMSC, both VMware HA and VMware vMotion are now supported with VPLEX Metro up to 10 msec RTT. This qualification has been completed with both PowerPath/VE and NMP (Native Multi-pathing) for the non-uniform access mode (i.e. non-cross connected configuration). This support is available starting with vSphere 5.5 and GeoSynchrony 5.2.
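If you want to sanity-check a planned deployment against these numbers, the rule can be written down as a few lines of code. This is only a sketch of the statement above, not an official support tool: the function names are invented, version strings are assumed to be purely numeric and dotted (e.g. "5.5"), and it deliberately ignores licensing, multi-pathing choice and access mode. The VMware Knowledge Base article remains the authoritative source.

```python
# Thresholds taken from the post; everything else here is hypothetical.
MIN_VSPHERE = (5, 5)
MIN_GEOSYNCHRONY = (5, 2)
EXTENDED_RTT_MS = 10.0   # HA and vMotion limit with the qualified versions
BASELINE_RTT_MS = 5.0    # earlier limit for HA and vMotion together


def parse_version(text):
    """Turn a dotted numeric version such as '5.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def max_supported_rtt_ms(vsphere, geosynchrony):
    """RTT ceiling for HA plus vMotion under the simplified rule above."""
    qualified = (
        parse_version(vsphere) >= MIN_VSPHERE
        and parse_version(geosynchrony) >= MIN_GEOSYNCHRONY
    )
    return EXTENDED_RTT_MS if qualified else BASELINE_RTT_MS


def ha_and_vmotion_supported(rtt_ms, vsphere, geosynchrony):
    """True if the measured inter-site RTT is within the ceiling."""
    return rtt_ms <= max_supported_rtt_ms(vsphere, geosynchrony)


if __name__ == "__main__":
    print(ha_and_vmotion_supported(8.0, "5.5", "5.2"))
    print(ha_and_vmotion_supported(8.0, "5.1", "5.2"))
```

The first call falls within the extended 10 msec limit; the second uses an older vSphere release and falls back to the 5 msec limit, so an 8 msec link is out of range.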

vMSC 10 msec RTT support

The VMware Knowledge Base article is updated here.

Many thanks to the VMware Ecosystem Engineering team as well as key technical leaders on both the VMware and EMC sides for helping drive this. This has been a long time coming.

Spread the word!!

RecoverPoint and VPLEX @ EMC World 2014

It is THAT time of the year. All roads lead to EMC World. In fact, as I write this, I am on my way to completing the second leg of my journey to EMC World. I have spent the last few days in Edmonton, Alberta with the VPLEX team. Some of the ideas we have been discussing have been simply mind-blowing. I cannot wait to build on those post EMC World. But let us get to the work at hand first. (Incidentally, Edmonton had snow today (May 4th)!!!!!!!! Yeah – that’s their definition of spring, I am told. Winter is when it is too cold to snow.)

This will be our first EMC World as a part of the new DPAD organization. Tons of excitement on that front. As we worked through the planning for EMC World, one of the big positives was the level of intersection that VPLEX and RecoverPoint have with other parts of EMC and the level of collaboration between all these teams to help spread that message. To all these teams, a BIG THANK YOU! This will result in VPLEX and RecoverPoint products having a really good presence on the show floor.

As always the sessions will be chock full of technical and strategy insights. A key part of the messages will be around the launches that we did for RecoverPoint and VPLEX in early April. However, beyond the new capabilities, there will also be sessions for the practitioners as well as customer insights. In fact, one of the key themes for us this year is customers who are presenting at the various sessions with us.

So without further ado, here is the list of VPLEX and RecoverPoint sessions at EMC World this year.

  • VPLEX: Introducing the VPLEX Virtual Edition: Presented by Cody Garvin who led the product management effort for VPLEX Virtual Edition. Cody will cover the basics of VPLEX/VE, why we built it the way we have, the use-cases we are targeting it for and deployment topologies that it will work with.
  • VPLEX: VPLEX Virtual Edition Architecture and use cases for 2014: Presented by super-CSE Don Kirouac (@dk_defined). Don will cover the gory depths of the VPLEX Virtual Edition to help you understand the architecture as well as what use-cases will work and how they will work out of the gate.
  • VPLEX: New VPLEX Provisioning Options with ViPR and Unisphere: Steve Breault and Peter Lund, both product managers on the VPLEX team, will present the fabulous integration work that has been done between the ViPR and the VPLEX teams. They cover in depth the pieces discussed in my prior blog post (here), along with the new capabilities being added to ViPR to support VPLEX and RecoverPoint. They will also discuss the work we have done with VPLEX Integrated Array Services (VIAS) discussed here.
  • VPLEX: Advanced Configuration and Design – Performance, Design, Failure Modes and More: The very cool Jen Aspesi (@routr_grl) presents this session. This was a big draw last year. Jen is amping it up with new learnings from the year, recommendations for new configurations. I have a feeling that this session will be oversubscribed once again.
  • VPLEX: Continuous Availability For All Business-Critical Applications: Robert Wagner presents this session. His focus is on the application layer and how applications can be configured / structured to support continuous availability. If you think applications and VPLEX, this session is for you.
  • VPLEX: The Future of Availability powered by New VPLEX Use-cases: I have the privilege of presenting this session. This is an overview session which will talk about all the things that we have announced in 2014. It is a breadth first session which covers all the areas first, and then each of the sessions above goes into greater depth.
  • Introducing MetroPoint: Combining The Best Of VPLEX Metro And RecoverPoint Capabilities: This is an _awesome_ session presented by Saar Cohen, Chief Architect of RecoverPoint and Idan Kentor, one of our RecoverPoint focused CSE team members. I have seen Saar and Idan present this session in practice sessions. The content is top-notch and they cover the nitty-gritty of MetroPoint exceptionally well. You get the technical depth as well as the practical knowledge to realize your three site HA/DR dreams!
  • RecoverPoint Overview: Top Reasons Why Users Love It: Yossi Saad leads the Business Development team for VPLEX and RecoverPoint. He will cover the very well-known aspects as well as the not so well-known aspects of why RecoverPoint is the compelling solution that it is.
  • RecoverPoint: Accelerated Recovery for Virtual Environments: Yair Cohen presents this session that goes into depth on how RecoverPoint can protect and recover virtualized environments. This shows all the work that we are doing to integrate with VMware and how those environments can be tuned / modified to make the experience seamless.
  • RecoverPoint: Planning and Deployment Best practices: Zahid Fadli is another of the RecoverPoint focused CSEs on our team. He will go into depth on how RecoverPoint can be configured, sized and architected for different environments. He will also cover best practices for the product in different environments. If you are a disaster recovery practitioner, this is a session for you.
  • RecoverPoint: Data protection for cost sensitive environments: Boaz Michaely presents this session to help customers and potential customers understand how they can derive even more value from RecoverPoint – this will talk about all the choices we have to drive down cost – data compression, WAN optimization, virtual RecoverPoint appliance.
  • VNX with VPLEX: Making Continuous Operations seamless: I am presenting this VNX focused session which will cover all the goodness of VNX with VPLEX. In addition, it also covers the new items that we have introduced for integrations between VPLEX and VNX.

In addition, there are other partner sessions that also touch upon VPLEX and RecoverPoint. Here is the list:

  • EMC ViPR: Explore the New ViPR Control Services.
  • From Backup to Availability: Explore the Data Protection Continuum.
  • PowerPath Advanced Multipathing: What’s New in 2014.
  • VCE Vblock Data Protection and Mobility: Converge to Save Your Job.
  • Buckle Up! A 15-demo & Technical Tour of What’s New & What’s NEXT in Data Protection and Availability.
  • How Can ControlCenter & ProSphere Customers Make the Move to Storage Resource Management Suite.
  • Increasing Intelligence & Efficiency With Data Protection Advisor: Demonstrating Real Proof of Data Protection
  • AppSync 2.0: What’s New in 2014
  • VMAX Performance: Performance Aspects of Remote Replication

To all our internal partners: a BIG THANK YOU for making these possible and helping make RecoverPoint and VPLEX strategic to your ecosystem.

Other activities related to VPLEX and RecoverPoint:

  • Guy Churchward and Stephen Manley will cover the DPAD Portfolio in their super session on Tue at 3:00 PM
  • Area 53: This is a super secret session about future capabilities being developed in the bunkers at EMC. Rumors are that there will be a healthy dose of some of the products you know and love. This is on Tue at 4:30 PM
  • Birds-of-a-feather – Data Protection and Availability Executive Panel – Redefining Data Protection for a Software-Defined World: Ask the DPAD Executive team
  • Hands On Labs – We have eight hands-on-labs that are focused on VPLEX and RecoverPoint. Do take your time to go through these. This should give you a very tactile operational feel for how the products work. Do not hesitate to give us feedback on what things you would like to see improved. Here is the list of HoLs:

  • HOL 07 – VPLEX: Introduction to VPLEX and VPLEX Integrated Array Services (VIAS)
  • HOL 08 – MetroPoint: Enhanced 3-site protection with VPLEX Metro and RecoverPoint
  • HOL 09 – VPLEX Virtual Edition: Continuous Availability Delivered To Your Geographically Dispersed ESXi Environments
  • HOL 10 – RecoverPoint: Discover Operational Recovery and Disaster Recovery In A Multi-Site Environment
  • HOL 17 – SRM Suite for VPLEX
  • HOL 20 – The Data Protection Continuum: Getting More with EMC Data Protection & Availability
  • HOL 22 – ViPR 2.0 – Introduction to all new 2.0 features, including VPLEX Snaps, Local Mirroring
  • HOL 24 – Infrastructure-As-A-Service Made Easy With VSPEX
  • And there are show floor displays, booth presentations and theatre presentations in the DPAD theatre as well as in partner theatres. Last but not the least, if you are one of the 100+ customers with NDA conversations set up, we are looking forward to meeting you as well.

    We are sending engineers, corporate systems engineers, architects and product managers to EMC World to help you get the best in terms of technical knowledge to help you understand the product, its directions and what capabilities we have added to it over the last year. I cannot wait for the show to kick off!

    Federation – A new business model for disruptive innovation

    [Even if it has been stated elsewhere, for this particular post, it is even more important to state that these are my views. My interest in this topic is to help organize my thoughts on the implications of the Federation structure created at EMC. Equally importantly, I am an EMC employee and an EMC shareholder, so I have my biases. Consider yourself warned!].

    What is the Federation?

    For the past six months or so, EMC has organized itself as a Federation. For the official announcement and details behind it, please look here. In case you see it on slides, here is the logo that is being used for the federation.

    EMC Federation Logo

    In a nutshell, the concept of the Federation is the coexistence of independent companies with a shared vision and at the end of the day, to a large extent a common P&L.

    Why organize in this way?

    Usually, when looking at business organizations, there are two common ways of organizing. Obviously, no one mechanism of organizing is superior to the other. The correct answer for any given organization depends on a bunch of factors – the desired end goal, market dynamics, level of overlap between the businesses and finally, which bets does the organization want to place. In other words, there is no right answer.
    The two common ways of organizing are:

    1. Monolith
    2. Conglomerate

    Monolith

    While the word monolith creates all sorts of mental images, this is by far the most common way of organizing. The organization is built around a singular vision. Different businesses within that organization are organized as business units. Typically, such organizations are able to benefit from shared services – HR, IT, Sales etc. The goal of this mode of organization is clarity of purpose, driving more alignment and efficiency in terms of common capabilities that can be optimized. This doesn’t stop these companies from being diverse – just that the degrees of separation between the individual businesses are relatively small and the businesses are usually focused on the same or very adjacent markets. The selling motion is usually similar – the end customer tends to be similar. In other words, the channels are more or less consistent. Naturally, as with most things in life, there are shades of gray – the degree of freedom afforded each business unit is a variable, what shared services operate is a variable etc.
    So, what’s the downside of such an organization? The greatest strength of such organizations is also their weakness. These organizations tend to be pretty set in terms of their mode of business and what channels and markets they can go after. So, in some ways, for such companies, it is difficult to change their core fabric. Not impossible but definitely herculean. Purely from a business perspective, optimization for a particular market implies that you are completely subject to the vagaries of that market. So, any macro or micro economic trends that impact that particular market leave such a company vulnerable.

    Conglomerate

    To counter some of the issues seen in the monolith, the conglomerate emerged as a way to get diverse selling motions, diverse companies under the same organizational umbrella. The businesses are not required to be related to each other. The conglomerate functions as independent businesses pooling their P&Ls together. Berkshire Hathaway, General Electric and the Tata Group are some of the prominent examples of this structure. This organizational structure gives up on the efficiency of a common set of services in order to be able to diversify what they sell to their customers. Each business may have its own discrete vision, strategy and market. The common binding item across the conglomerate is the set of shared values across the teams. To illustrate this point, transactions across businesses within the conglomerate are identical to transactions with companies that are not within the conglomerate. The businesses are free to optimize to meet their objectives.

    Why Federation?

    With this perspective, let’s now look at the Federation.

    In some ways, EMC has been on this journey since the VMware acquisition in 2003 (yeah – it was that long ago!). Right from the get go, the interactions between EMC and VMware have been independent. I will not say that this was easy at first. My first interaction with VMware was as a developer dealing with a customer escalation where the customer expected that we would behave as one company. There were a few moments of awkward phone silence as we explained to the customer that we were independent entities. As we all grew comfortable operating in this model, folks on both sides understood why this “together but independent” state was important. At EMC, we realized that we had to win VMware’s business on our merits and that VMware had to interface with EMC’s competition in the same way that they interact with EMC. As that message was understood across both companies, EMC mobilized to have the best possible integration with VMware not because of our inherent affinity but because we recognized the business value that VMware brought.

    With VMware as a separate entity, this allowed two additional benefits:

    • VMware was able to maintain and develop its go-to-market independent of EMC. This gave it access to different markets and different levers to enable.
    • VMware was free to innovate independent of the impact to EMC. They were not beholden to EMC’s inherent interests and were able to take a different stance than a business internal to EMC might have been able to.

    Clayton Christensen has talked about the innovator’s dilemma where disruptive new technology suffers from the hegemony of the current dominant technology. It is far more deeply entrenched than just technology. For a big successful company like EMC, the entire company has been tuned to make the current technology successful. GTMs, incentives, selling motions, profitability structures, purchase cycles, relationships – the list is endless – are all geared to the current technology. Now imagine changing all that after recognizing a market shift while maintaining your current revenues and while people are changing what they do. Despite best intentions, this transformation is a perilous journey that few dare undertake and even fewer complete successfully. To say nothing about going to individuals who care passionately about the products they work on and telling them that the next big thing is going to surpass their pride and joy.

    With this backdrop, fast forward to today. The world of IT has been changing dramatically. You can see that there is a shift in how data is generated, how it is consumed and utilized. Data is the new frontier. The implications of this have not been fully realized. However, it is clear that the magnitude of this shift is enormous. So, the same conditions that led to VMware being kept as an independent entity are now in effect for attacking this new market. That independent entity is Pivotal.

    So, you have three independent companies (shades of a conglomerate) all working in adjacent markets (shades of a monolith) working with a shared vision (shades of a monolith) with the independence to optimize what they need for their business (shades of a conglomerate). That’s the Federation – not a monolith, not a conglomerate but an amalgamation of both.

    It is typical of what I have seen within EMC – we have a big challenge in front of us, we tackle it head on with a very creative solution. I have this mental image of whoever came up with this as waking up one morning with this solution in their heads and feeling like they have solved world hunger – this is brilliant stuff!

    Does this mean that everything is perfect? It never is and there are tradeoffs that are needed to make this successful. For one, this strategy does mean that the individual businesses will have to work that much harder to maintain alignment. The businesses have to be judicious about where it is okay to overlap in technology (since they are in adjacent markets) and what to do when that inevitable overlap arises.

    If this strategy does work, this may be an interesting way to address the innovator’s dilemma and guide future companies trying to innovate disruptively while continuing to execute on existing businesses. To me, the beauty of this approach is that it takes that challenge out of the realm of organizational discipline (which is frail) and canonizes it within the corporate structure. I, for one, am rooting for its success (and beyond just the ‘I am an EMC employee’ reason). We will see how the story unfolds. As Churchill aptly said, ‘However beautiful the strategy, you should occasionally look at the results.’

    A Blog About Clouds and Data Center Technology … Mostly
