Customer Focus: The EMC Way

Every company has some core founding principles / values – some rather overt, some implicit. These are not principles related to product or technology. Nor are they related to vision. More often than not, these values are not written down. You only learn them through the oral tradition of stories / myths told at bars by people who have been in the organization long enough. Rarely do you get to experience them firsthand.

So why bring this up now?

Having shipped myself to the West Coast, I am in the odd position of being in a minority (a person who was with EMC but moved to Isilon, from Hopkinton to Seattle). I am a conduit of these very myths – some I have learned (The ‘Yes it does snow in New England’), others I have lived. This is one in the latter category.

This story is from many _many_ years ago. All names (except some key principals who I am sure won’t mind) have been kept confidential for obvious reasons.

I had recently taken on a management role within the engineering organization. I was responsible for SW development and customer escalation management.

As most stories go, this one started with a rather innocuous request from a customer (BTW, they have since become one of my favorite customers – they are visionary, drive technology, take calculated risks and, in every way, partner with us to build better products. It also helps that they are a household name – one of the few ways I can help my non-techie family members understand what it is I work on). The request was to help them migrate between data centers, with a special request to ensure that engineering was involved.

As it came to us, this seemed like a normal request and we assumed that engineering involvement was needed largely for review. Then the oddness started – the product we had was specifically designed for migrations. But the customer was apparently not using this product for migration.

We dug in and contacted the account team, who contacted the customer. Turns out it was a migration, except it wasn’t a copy of the data but rather a physical move of infrastructure. The customer wanted to keep their operations online through the physical move and was convinced that they could do this with our product.

I have to share this in all honesty – as the person responsible for carrying the quality banner, I was petrified. While the customer, in theory, was RIGHT (aren’t they always?), we had never quite anticipated a customer contemplating using the product in this manner. Oh yeah, the move was going to happen on Thursday and we learned about this on the Monday of that week.

Once we got past the seven stages of coming to terms with reality, we got down to brass tacks (and yes, involvement from engineering was going to be more than just review :-)). One of the engineers from my team was going to head down to the customer site (drivable distance from Hopkinton) to perform this ‘move’. He got to testing and practicing the move procedure, working out any kinks. So far so good and nothing too far out of the ordinary as far as customer escalations go.

One more thing …

So we had worked through the kinks, and the engineer going to the customer site was feeling confident. Thursday morning we ran through a last check. Lo and behold, we found out that the long-distance SFPs needed to enable the migration had been misplaced at the customer site. For some reason that I cannot remember, these were not SFPs that were just lying around (I seem to remember that the default was to use short-distance SFPs). So, there we were at noon on Thursday with everything set except the key ingredient to make the move successful.

I remember going to my manager (@MattWaxman) with a completely dead look that basically said, ‘I am out of options’. In an inspired moment, he suggested something off the wall – since then I have learnt that desperation makes you creative – “Let’s email all the people we know at EMC (our PMTs, BMTs, execs, support) with a system-wide SOS that says something to the effect of ‘We need long-distance SFPs for a customer in the next three hours – here is the model number. Please contact us if you have any of these lying around. We need 24.’”

We sent that note – expecting this to be a complete Hail Mary with no chance of success.

Not having much else to do, we ran through one more dry run for the move and let the account team know that we might have to cancel since we didn’t have the SFPs. The final go/no-go was set for 3:00 PM. The engineer was leaving at 4:00 PM for the drive over.

Here is what happened instead.

I returned to my office after that dry run. And on my desk, I had ten to twenty different packages of SFPs – some from people I knew, most from people I didn’t know. One guy had driven from our factory in Franklin, MA with a box of these SFPs. His exact statement to me was – ‘Someone told me that they had heard about you needing SFPs for a customer. I have tested all of these – they work. Make the customer successful’. I had sticky notes that said the same thing.

Needless to say, the engineer was able to take these along on the drive to the customer site. The move was executed flawlessly. In fact, the customer’s end customers didn’t see a single app bounce. This customer and that account team became some of our biggest advocates.

What did I learn?

Customer focus was something Dick Egan intentionally drove into the EMC culture. Many many years removed from his direct involvement with the company, EMC employee #1 continued to cast a large shadow. It tells you how important founders (and the culture they establish) are.

Customer success is everyone’s responsibility whether you work on a product or not. Customers make the world go round. Over time I have been in many situations where the limits of my direct responsibility would have given me a reason not to act. In all those situations, I try to act with the same proactive ownership that I was the beneficiary of.

Always focus on what’s right for the customer. The easy answer above would have been to declare the use case as unsupported. The harder answer was to look past the execution risk and focus on what the customer justifiably needed. Many thanks to the customer for pushing us.

Even as I type this blog post many years removed from the actual incident, I continue to be touched by the camaraderie and the sheer stick-to-itiveness of the EMC culture that did not allow us or the customer to fail.

It is one of those rare moments where it feels like the entire company stands behind you as an individual helping you succeed. In my day-to-day interactions within EMC, I continue to use this incident as the yardstick by which I measure myself.

InsightIQ: Basic workflow demos

InsightIQ is Isilon’s software for capacity planning / reporting and performance troubleshooting / reporting. Over the past few releases, we have been working diligently to make some major shifts in how IIQ workflows function and what capabilities the product can provide.

Instead of trying to describe these workflow changes, our TME team (specifically the awesome Robert Chang) has come up with some demos to show these common workflows to our customers.

Use-case 1: Identify demanding NAS Clients

This use-case focuses on how you can identify a client that is consuming network resources. In this case, start with the external throughput and work your way to the actual workstation / IP address that is consuming the resources. You can then break that client’s traffic out by type of I/O and by protocol to help narrow down what is happening.

When do you use this? When you start seeing clients that are not able to get the bandwidth they need from Isilon, this can be a great first step to understand who is consuming the bandwidth.
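The drill-down described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not InsightIQ's interface: the sample records, the field layout, and the function names `throughput_by_client` and `breakdown_for_client` are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical samples, NOT an InsightIQ export format:
# (client_ip, protocol, io_type, bytes_per_second)
SAMPLES = [
    ("10.0.0.11", "nfs3", "read", 400_000_000),
    ("10.0.0.11", "nfs3", "write", 150_000_000),
    ("10.0.0.12", "smb2", "read", 90_000_000),
    ("10.0.0.13", "nfs3", "write", 60_000_000),
    ("10.0.0.11", "smb2", "read", 50_000_000),
]


def throughput_by_client(samples):
    """Sum external throughput per client, highest consumer first."""
    totals = defaultdict(int)
    for client, _protocol, _io_type, rate in samples:
        totals[client] += rate
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def breakdown_for_client(samples, client_ip):
    """Break one client's throughput out by (protocol, io_type)."""
    breakdown = defaultdict(int)
    for client, protocol, io_type, rate in samples:
        if client == client_ip:
            breakdown[(protocol, io_type)] += rate
    return dict(breakdown)


# Step 1: rank clients by total throughput.
ranked = throughput_by_client(SAMPLES)
top_client, top_rate = ranked[0]

# Step 2: drill into the heaviest client by protocol and I/O type.
detail = breakdown_for_client(SAMPLES, top_client)
```

With this made-up data, the heaviest client is `10.0.0.11`, and most of its traffic is NFS reads. That is the same top-down path the demo follows in the product.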

Use-case 2: Protocol Operations Average Latency

This use-case focuses on identifying the latency for protocol operations within the Isilon cluster. In scenarios when clients are trying to debug latency issues, this procedure can be very helpful. It is important to understand that Isilon can only help identify latency once the I/O enters the system. There may be network contention outside of Isilon or even on the client if multiple hosts are contending for the CPU. In a lot of cases, this is a good sanity check. I know of a couple of customers who maintain tabs on this latency as a means to indicate overall Isilon system health.
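For readers who want to keep tabs on this number themselves, here is a minimal Python sketch of the arithmetic behind an average latency per protocol operation. The sample tuples, the microsecond units, and the 2000 µs threshold are assumptions for illustration only, not values or formats taken from InsightIQ.

```python
from collections import defaultdict

# Hypothetical per-interval samples, NOT an InsightIQ format:
# (protocol, operation, op_count, total_time_us)
LATENCY_SAMPLES = [
    ("nfs3", "read", 1000, 2_000_000),
    ("nfs3", "read", 3000, 3_000_000),
    ("nfs3", "write", 500, 2_500_000),
    ("smb2", "read", 200, 100_000),
]


def average_latency_us(samples):
    """Average latency per (protocol, operation), weighted by op count."""
    ops = defaultdict(int)
    busy_us = defaultdict(int)
    for protocol, operation, count, total_us in samples:
        ops[(protocol, operation)] += count
        busy_us[(protocol, operation)] += total_us
    # Skip keys with zero operations to avoid dividing by zero.
    return {key: busy_us[key] / ops[key] for key in ops if ops[key] > 0}


def slow_operations(averages, threshold_us):
    """Return the (protocol, operation) keys whose average exceeds the threshold."""
    return sorted(key for key, avg in averages.items() if avg > threshold_us)


averages = average_latency_us(LATENCY_SAMPLES)
slow = slow_operations(averages, threshold_us=2000)  # assumed threshold
```

Summing time and operation counts before dividing keeps a quiet interval from skewing the result, which averaging the per-interval averages would not. As noted above, a figure like this only covers time spent after the I/O enters the cluster.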

Use-case 3: Capacity utilization

Isilon runs a job called File System Analytics (FSA) which collects the metadata for files. This is then combined with the raw performance and capacity data to derive some very helpful information. In this particular demo, Robert tracks through capacity utilization.

When is this useful? The primary case is when you are trying to understand where your capacity is being consumed and, more importantly, which client has had the biggest delta in capacity. Note that this is only one of the mechanisms for debugging capacity growth – you should always manage capacity through proper use of soft and hard quotas.
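The "biggest delta" question boils down to comparing two capacity snapshots. The sketch below assumes two hypothetical snapshots keyed by directory path with sizes in GiB. The paths, numbers, and the `capacity_deltas` helper are invented for illustration and do not reflect how FSA stores its results.

```python
# Hypothetical capacity snapshots in GiB, keyed by directory path.
BEFORE = {
    "/ifs/projects/render": 40_000,
    "/ifs/home": 10_000,
    "/ifs/scratch": 5_000,
}
AFTER = {
    "/ifs/projects/render": 42_000,
    "/ifs/home": 10_500,
    "/ifs/scratch": 12_000,
    "/ifs/archive": 3_000,  # did not exist in the earlier snapshot
}


def capacity_deltas(before, after):
    """Return (path, growth) pairs, largest growth first.

    Paths missing from one snapshot are treated as 0 GiB there, so
    newly created and deleted directories are both reported.
    """
    paths = set(before) | set(after)
    deltas = {path: after.get(path, 0) - before.get(path, 0) for path in paths}
    # Sort by growth descending, then by path so that ties are deterministic.
    return sorted(deltas.items(), key=lambda item: (-item[1], item[0]))


growth = capacity_deltas(BEFORE, AFTER)
biggest_path, biggest_delta = growth[0]
```

Here the largest directory (`/ifs/projects/render`) is not the fastest-growing one (`/ifs/scratch`), which is why looking at the delta rather than the absolute size is the more useful view.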

And we are just getting started – there is a lot more that we need to, and plan to, tell you about the InsightIQ space. Stay tuned!