AB Testing Podcast – Episode 2 – Leading change

The Angry Weasel and I have put together another podcast. In this episode, we talk about problems we see in leading change toward a better direction. We cover some changes we face, change management, and reasons change fails. We talk about the importance of “why” and use the “Waterfall vs. Agile” religious war as an example.

We managed to keep our “shinythingitis” to a minimum of slightly less than 70% of the time. 🙂 Enjoy!

Want to subscribe to the podcast?
RSS feed


A/B Testing Podcast goes live

So one day, a colleague and friend, Alan Page, said “Hey, Brent, why don’t we start a podcast?”. After much discourse (Alan pushing me) and challenges (me finding the time), I am happy to announce that I suck at it, but am doing it anyways. I am very thankful to have an old pro who is letting me tag along for the ride. Anyways, if you’ve got 30 minutes to kill and want to hear some more from a couple of guys who are trying to help lead change, please check it out. AB, in this context, stands for Alan/Brent, and in “Episode 1” we explore Testing vs. Quality as well as recent changes happening at Microsoft. Enjoy. Feedback is always welcome. We are likely to keep going until someone tells us that the pain is unbearable. 🙂

Download: AB Testing – “Episode” 1


Want to subscribe to the podcast? Here’s the RSS feed.

In Pursuit of Quality: Shifting the Tester mindset

Last time, I wrote a book review of Lean Analytics. Toward the end of that post, I lamented that I see a lot of testers in my neck of the woods trying to map their old way of thinking onto what’s coming next. Several folks (individual contributors and their managers) have come to me wondering why Test should move into this world of “data crap” and why the way they have previously been operating is suddenly so wrong. My hope today is to explain that.

But first, before continuing, I’d like to try something new and offer you a poll to take.

Please consider the following:

So which did you pick? Over time, it will be interesting to me to track how people view this simple comparison. I have been running this example for almost a year now. When I first started, about 1 in 2 testers polled would select the bug-free code, whereas with testers I talk to lately, about 1 in 3 will select it. I definitely view this as a good sign that folks are starting to reflect on these changes and adapt. My ideal is that 1 year from now the ratio is closer to 1 in 10.
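
As an aside, this is exactly the kind of question that a little stats muscle (more on that later in this post) helps with: is a shift from 1 in 2 to 1 in 3 a real change, or just noise? Here is a quick two-proportion z-test sketch; the sample sizes (50 testers per period) are invented purely for illustration and are not the actual poll counts.

```python
# Sketch: is the shift from ~1 in 2 to ~1 in 3 statistically meaningful?
# Sample sizes below are hypothetical, not real poll data.
from math import sqrt

def two_proportion_z(successes1, n1, successes2, n2):
    """z-statistic for the difference between two independent proportions."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Pretend 25 of 50 picked the bug-free code a year ago vs. 17 of 50 lately.
z = two_proportion_z(25, 50, 17, 50)
# |z| must exceed ~1.96 to be significant at the 5% level; with these
# made-up sample sizes the shift is suggestive but not yet conclusive.
```

The point is less the particular numbers than the habit: treating a hunch (“folks are adapting”) as a hypothesis you can actually test.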

Why is this poll so hard for folks?

Primarily, it is due to our training. Test was the last line of defense – a safety net – needed to assure we didn’t do a recall after we released the product to manufacturing. When I first started in the software development world, 1.44 MB floppy disks were the prevailing way customers installed new software onto their systems. Windows NT 3.1, for example, required 22 of them. It was horrible. Installation on a new machine would take the better part of a day, disks would be asked for out of order, and, lastly, people would often get to the end of the install only to discover that a setting they were asked for at the very beginning was wrong and that it was easier to just redo the install than to hunt through the manual to figure out how to fix it afterwards.

Customers who got their system up and running successfully and found a major bug afterwards would be quite sore with us. Thankfully, I have not heard of this in quite some time, but back then, Microsoft had the reputation of shipping quality in version 3.0. There was a strong and successful push within the company to train our testers with a singular mission: find the bugs before our customers do and push to get them fixed. I was proud to state back then that Microsoft was the best in the world at doing this.

The problem I am attempting to address is the perceived loss of value in Test’s innate ability to prevent bugs from hitting the customer. A couple of months ago, I presented to a group of testers, and one of the questions was: “All of this reacting to customer stuff is great, but how can we prevent bugs in the first place?” Thankfully, someone else answered that question more helpfully, as my initial response would’ve been “Stop trying to do so”.

The core of the issue, imo, is that we have continued to view our efforts as statically valuable: that our work to find bugs up front (assuring code correctness) will always be highly regarded. Unfortunately, we neglected to notice that the world was changing and that the value was, in fact, dynamic. Our need to get correctness right before shipping is actually tied to another variable: our ability to react to bugs found by customers after shipping. The longer it takes us to react, the more we need to prevent correctness issues up front.

“Quality redefinition” – from correctness to customer value

A couple of years ago, I wrote a blog post, Quality is a 4 letter word. Unfortunately, it seems that I wrote it well before its time. I have received feedback recently from folks stating that the series of posts is quite helpful to them now. One such person had read it then and had a violent allergic reaction to the post:

“Brent, you can’t redefine quality”.

“I’m not!”, I replied, “We’ve *always* had it wrong! But up until now, it’s been okay. Now we need to journey in a different direction.”

While I now refer to the 4 pillars of Quality differently, their essence remains the same. I encourage you to read that post.

The wholeness of Quality should now be evaluated on 4 fronts:

  • Features that customers use to create value
  • The correctness of those features
  • The extent to which those features feel finished/polished
  • The context in which those features should be used for maximum value.

Certainly, correctness is an important aspect of quality, but usage is a significantly greater one. If you take anything away from today’s post, please take this:

Fixing correctness issues on a piece of code that no one is using is a waste of time & resources.
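
To make that takeaway a bit more operational, here is a toy sketch that weights each correctness issue by how much the affected code is actually used. All feature names and numbers are invented for the example.

```python
# Toy illustration: prioritize correctness work by usage, not severity alone.
# Feature names and counts below are made up.
bugs = [
    {"feature": "legacy_import", "severity": 3, "weekly_users": 12},
    {"feature": "share_dialog",  "severity": 2, "weekly_users": 90_000},
    {"feature": "startup_path",  "severity": 1, "weekly_users": 400_000},
]

# Impact = severity weighted by the customers actually exposed to the code.
for bug in bugs:
    bug["impact"] = bug["severity"] * bug["weekly_users"]

ranked = sorted(bugs, key=lambda b: b["impact"], reverse=True)
# The severity-3 bug in barely-used code lands last in the queue, not first.
```

A severity-only triage would fix `legacy_import` first; a usage-weighted one sends it to the back of the line, which is the whole point of the takeaway above.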

We need to change

In today’s world, with services lighting up left and right, we need to shift to a model that allows us to identify and improve Quality faster. This is a market differentiator.

It is my belief that in the short term, the best way to do this is to focus on the following strategy:

    • Pre-production
      • Train your testers to rewrite their automation such that Pass/Fail is not determined by the automation itself, but rather by leveraging the instrumentation and data exhaust emitted by the system. Automation becomes a user simulator, while testers grow muscle in using product logs to evaluate the truth. This set of measurements can be applied directly to production traffic when the code ships live.
      • Train your testers to be comfortable with tweaking and adding instrumentation to enable measurement of the above.
    • Next, move to Post-production
      • Leverage their Correctness skillset and their new measurement muscle to understand the system’s behavior under actual usage load.
      • This is an evaluation of QoS, Quality of Service. What you want Testers learning is what the system does under production traffic, and why.
      • You can start here in order to grow their muscle in statistical analysis.
    • Then, focus their attention on Customer Behavior
      • Teach them to look for patterns in the data that show:
        • Places in the code where customers are trying to achieve some goal but encountering pain (errors, crashes, etc) or friction (latency issues, convoluted paths to the goal, etc). Generally, these are very easy to find.
        • Places in the code where customers are succeeding in achieving their goal and are walking away delighted. These are patterns that create entertainment or freedom for the customer. Unlike the above, these are much harder to find; they will require hypothesis testing, flighting, and experimentation, but they are significantly more valuable to the business at hand.
      • A stronger stats muscle will be key here. Since Quality is a subjective point of view, this will force Test away from a world of absolutes (pass/fail) and into one of probabilities (likelihood of adding value to customers vs. not). It is definitely wise to befriend your local Data Scientist and get them to share the magic. This will help you and your team scale sustainably.
      • This is an evaluation of QoE, Quality of Experience. What you want Testers learning is what the Customers do, and why.
    • You will then want to form a dynamic set of metrics and KPIs that capture the up-to-date learnings and help the organization quickly operationalize its goal of taking action toward adding customer value. This will generate Quality!
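
To make the pre-production step above concrete, here is a minimal sketch of what “automation as user simulator, logs as oracle” can look like. The event shape, field names, and thresholds are all invented for illustration; the point is that the verdict comes from the product’s data exhaust, not from asserts baked into the test.

```python
# Sketch: pass/fail decided by evaluating emitted telemetry, not by the
# automation itself. Event schema and thresholds here are hypothetical.
import json

def evaluate_telemetry(log_lines, max_error_rate=0.01, max_p95_latency_ms=500):
    """Decide system health from the product's own log output."""
    events = [json.loads(line) for line in log_lines]
    requests = [e for e in events if e.get("type") == "request"]
    if not requests:
        return False, "no traffic observed"
    errors = sum(1 for e in requests if e.get("status", 200) >= 500)
    error_rate = errors / len(requests)
    latencies = sorted(e["latency_ms"] for e in requests)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    if error_rate > max_error_rate:
        return False, f"error rate {error_rate:.2%} over budget"
    if p95 > max_p95_latency_ms:
        return False, f"p95 latency {p95}ms over budget"
    return True, "healthy"

# The same evaluator can run against logs from a lab run *or* against
# production traffic, which is exactly the muscle the strategy describes.
lab_log = [json.dumps({"type": "request", "status": 200, "latency_ms": 120})
           for _ in range(100)]
healthy, reason = evaluate_telemetry(lab_log)
```

The automation that produced `lab_log` only drives the product; swap in a day of production logs and the same measurement applies unchanged.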

Lastly, while executing on these mindshifts, it will be paramount to remain balanced. The message of this blog is NOT that we should stop preventing bugs (despite my visceral response above). Bugs, in my world view, fall into 2 camps: Catastrophes and Other. In order to keep Quality high, it is critical that we continue to work to prevent Catastrophe-class bugs from hitting our customers. At the same time, we need to build infrastructure that will enable us to react very quickly.

I simply ask you to consider that:

    As the speed with which we can react to our customers INCREASES, the number of equivalence classes of bugs that fall into the Catastrophe class DECREASES. Sacrificing speed of delivery, in the name of quality, makes delivering actual Quality so much harder. Usage, now, defines Quality better than correctness.

Ken Johnston, a friend, former manager, and co-author of the book “How We Test Software at Microsoft”, recently published a blog post on something he calls “MVQ”. Ken is still quite active on the Test Conference scene (one next month), but if you ever get the chance, ask him: “If you were to start writing the Second Edition of your book, how much of the content would still be important?” His response is quite interesting, but I’ll not steal that thunder here. 🙂

Here’s a graphic from his post for your consideration. I think it presents a very nice balance:

Thank you for reading.

Book Review: Lean Analytics – great primer for moving to DDE world


So O’Reilly has this great program for bloggers, called the Reader Review Program. They will let me pick out a book of my choosing and read it for free, as long as I write up an honest review of the book here on my blog site. Because I know that I will eventually be posting reviews here, I will be picking books that I think might have value to the audience that is following me. This is my first foray into this model. Right now, I think it’s an “everybody wins” thing, but I will pay heightened attention to how this affects the integrity and reputation of the site. Since I am generally reading 5-10 books at a time, I highly doubt that I will post blogs like this more than once or twice a year. Your feedback is welcome.


Lean Analytics by Alistair Croll & Benjamin Yoskovitz; O’Reilly Media

    The title above will take you to O’Reilly’s site so you can delve further if you choose.



    As the title suggests, Lean Analytics is a solid combination of two powerful movements in the Software Engineering world, Lean Agile and Data/Business Analytics. While there are several books out there discussing the need for data science and growth in statistics, this book really covers the What, How, and Why for using data to drive decision making in your specific business. Without being too technical or academic, it introduces readers to the techniques, metrics, and visualizations needed for several common business start-up models in operation in today’s world.


    I am ***REALLY*** fond of the Head-First series of books, and that is just about the only thing that could make this book better. After The Lean Startup, this is probably the most useful book for those trying to iterate fast in today’s software engineering world. I found the information to be very straightforward and easy to follow. While I think the authors really tried to cram everything they could into the book (at times making it read awkwardly), they introduce you to practical examples of how to use the material, and when.


Several sections of the book are quite good, looking at some lightweight case studies of startups and the analytics they used to navigate muddy waters. The book tries to make all types of software business accessible: how to categorize the growth phase of your company, what to measure during your value phase, what analytics are appropriate for various types of companies (mobile apps versus SaaS, e.g.), and even how to operate within enterprises. As a result, the depth at times can be lacking, but if you are looking for a breadth book that covers all of the basics, this one might be good for you. Reading it is one of the reasons I have decided to start my Masters in Analytics. I would have liked more information in the case studies, more examples of actual data to look at, and more guidance on how to avoid false metrics and what to look for instead.


One of the struggles that I am seeing at my place of employ is that Test is shifting away from automation roles and into data pipeline roles. This means we are just changing the way in which we deliver information so that others can analyze it and make the “adult” decisions. This, imho, is not good. But it falls within the Test wheelhouse, so it is safe. Please, please, please: instead, grab this book and take a leadership role. This book will help the discipline move into a direction-setting role instead of just a measurement one.


This will likely be the topic of my next post. Thanks for reading…


What would a Leader do?

Just over 2 years ago I wrote my most viewed blog post to date: The Tester’s Job. As 2013 comes to a close, there is much hullaballoo happening in the Microsoft Test community. After a series of non-trivial reorgs, it is clear Microsoft is taking a huge step away from the traditional role for Testers, and even the word Test is being stricken from the vernacular, with our fearless leaders’ titles being replaced with a new one: Director of Quality. Followers of this blog know this is a change I felt was coming for a while now, and one I aggressively supported.

Last May, my own team underwent this change with some great successes and some even greater learnings, which I presented recently to a set of Test leaders. This presentation has become the closest thing to viral that I’ve ever experienced. Mostly, I believe, that is because folks are anxious about the change and are eager to use whatever data they can find to help predict their own future. The deck mentions some pretty significant changes needed to make the paradigm shift happen, the most controversial being a very large change in the Dev to Test ratio (a number that was historically used to determine the “right size” of test teams during reorgs).

In my experience, some folks are more comfortable innovating and being starters, whereas other folks are superb at executing and getting work closed. Between the two, I have always been much more interested in being on the frontlines of change. Accordingly, I’ve never been much afraid of change, and view it as an opportunity to explore something new. I thoroughly enjoy seeing how new learnings can be used to grow myself, my team, and whatever product I happen to be working on. As a result, my love of the New and of Learning has made me quite adaptable over the years. However, I understand that not everyone is built this way, and that even for those who are, the changes coming might be more than they can tolerate.

One colleague sent me the following in email after seeing my presentation:

“Sobering. This is a lot of where we are headed in [my group], but without (so far) shifting any resources. This may be really hard on test from the sounds of it. Are there any suggestions on how to lessen the pain? Do we just rip the band aid and give people retraining?”

Since this is something that I am getting a lot of questions about lately, I felt it would be a good topic for a post. As I mentioned last time, Spider organizations don’t scale to the degree we will need them to, so we need to build up Starfish muscle. While this does mean a move toward Headless organizations, it by no means describes one that is leaderless. In fact, leaders become critical. One of the challenges with this shift is that people are so accustomed to living in Spider organizations that they forget, nay, are afraid, to lead. This is the first change I think people need to make.

Here’s a very simple strategy I have found that helps me when times are troubling:

  • Ask and answer: What would a Leader do? – If there were an actual leader right here and now, what would they be doing? Why? What goal would they be trying to achieve? How would they go about it?
  • Be that leader – Why not you? Everyone has the ability to lead. It’s just easier not to. Choose to lead. Bravery is doing the right thing even though you are afraid.

So what would a leader do in these times? Here’s what I think:

  1. Keep their head on straight. There are 4 key things people need from their leaders. Trust, Hope, Compassion, and Stability. You cannot provide *any* of these if you join the panic. Imagine the state of the fireman who is going UP the stairs of a burning building.
  2. Manage the Change
    1. Explain what, why, and how the change is occurring. All three! Many times I see leaders leave one of them out. Folks need all three in order to triangulate on the correct direction.
    2. Explain the goal and new direction. Telling folks where to head is easier and more beneficial than telling them what to avoid. “We need to ship weekly” is better than “We need to stop shipping every 3 years” as examples.
    3. Enlist others in making the change happen. People are more likely to follow something that they contributed to creating. I’ve always been a fan of enlisting the team to come up with the logistics and dates and to place themselves into the positions that they are the most passionate about.
    4. Pull the trigger. Be the tipping point to get the momentum going.
  3. Train Themselves – This is probably the single most important item. You cannot help others if you have not helped yourself. You need to learn more about the new world you are heading towards. Dive in. Then and only then will you be in a position to guide others. Seek out internal experts. Change jobs. Go back to school. Head to conferences. I read recently that if you just read 1 hour a day in the field of your choosing, you would be an international expert on that topic in just 7 years. Those investments add up very quickly. Do not underestimate it. (I, myself, will be starting my Masters in Analytics on January 6th! (very excited about it))
  4. Train Others – You need to distribute what you have learned. A couple of things I have been doing lately:
    1. When someone asks me to talk to them on a topic, I assume they will not be the last. So organize it as a presentation.
    2. Record it, so it can be shared.
    3. Create a local community and ask each interested person if they’d like to join it. I can no longer find the reference, but something like 80% of people will join if they are just asked. Try to drive participation in the community to make it self-sustaining.
  5. Get out of the way – Remain as the bottleneck to the change for as little time as possible. Someone may need to stay at the helm in order to make sure the momentum continues in the right direction, but once it is clear that it has, get out of the decision making process and let the team be empowered.

Rip the Band-Aid?

    To be honest, I am a proponent of Band-Aid ripping in these situations. People are afraid to make changes due to the unknown consequences. As Brad Pitt’s character asked in Moneyball, “Which would you prefer, a clean shot in the head or 5 shots to the body and bleed to death?” The longer you wait, the worse the pain will be for those involved. But DO NOT rip the Band-Aid without a plan.

One last note: One very popular question people have been asking me lately is do I think that Test is dying? I believe Test (as we know it) is like a chicken with its head cut off. It’s dead, but the rest of the body is still flapping about and doesn’t quite know it yet. I have now been at the company for 20 years and, in that time, have seen a number of these big transitions occur in the Test discipline. I find it wise to remember: Each time these transitions occurred, a fairly large number of people were affected, but as a whole, we improved and became more valuable to the company. I think this time around will be no different. My view is that, given our innate ability to code and test coupled with our passionate pursuit of quality, our staff is well suited for being the engineers of the future and perhaps better than any of the other disciplines. However, whichever way the wind blows, it’s clear we will need to change. My New Year’s Resolution is to help anyone and everyone I can to help make this migration. After all, it’s what a leader would do.


Irresponsible Accountability

How familiar is this narrative to you?

<Setting: Triage/shiproom discussion not so far away and not so long ago… Key contributors and decision makers are all in the room. They are trying to collaborate towards determining what bugs need to be fixed before they can ship. The planned ship date is looming ominously just around the corner….>

Dev Manager (of Managers): We should fix these bugs…

Dev: What’s the bar? I thought we were recall class only as of today.

Dev Manager: They are easy… We should do them… You guys should just be able to get them out of the way.


Sometime later


Dev: There’s one last thing I’d like to discuss. It’s clear that from the bugs we accepted today, we are ignoring the recall class only bar.

Dev Manager: I don’t care about the bar; we need to do the right thing.

Dev: But if we keep doing that, how are we going to land this project in time?

Dev Manager: I’m sorry. I don’t understand your question or concern.

Program Manager: Um, Mr. DM sir, what Mr. Dev means is we’d like to understand Leadership’s plan to get to the quality goals within the dates specified.

Dev Manager: What?!?! That’s *your* job. You want me to do that too?!?

When I heard this story from one of my mentees, I had already been thinking on the nature of Accountability Cultures. His team is in a bit of a mess right now as they quickly try to align reality with promises made to customers… They have bitten off far more than they can chew. This is made worse by the strong lack of leadership in his team, as evidenced by the Dev Manager in the story above, who is clearly a JeDI (and no, that is not good!).

So what do I mean by an accountability culture? I mean those workplaces where management is focused on assigning owners to tasks/work for the purpose of “knowing who is responsible”. These organizations are generally hierarchical in nature, and this ownership model is intended to both “empower” and “control” how work is being done. However, far too often, the result it achieves is simply the knowledge of who, precisely, to blame when the work fails. In other words, it does not optimize for helping teams succeed, but rather for helping them fail. Teams who want to succeed figure out ways to work around their management’s policies by “going dark” and doing work in secret, or by working much harder than needed in order to satisfy both management’s quest for owners and the desire to move the business forward.

My litmus test for the accountable person for anything: S/he is the one who apologizes when things go astray and is the one looking to make things right. This is significantly different than “the one who is to blame”.

Thankfully, I have only worked in 2 such teams where I felt the culture simply was too toxic to be fixed. Management usually does not want to hear, nor understand, that their behavior is causing a negative downstream effect on their staff.

How do teams get to this point?

Here’s how I think this happens.

  • Spider is beating the Starfish: Human society is constantly fighting it out over the best way to organize in order to produce the best results. Ori Brafman writes about this in his book The Starfish and the Spider.
    • Starfish models are headless. Teams of people work together to achieve goals. The individual is paramount and bands together with others toward complementary goals. If you have played or observed World of Warcraft or similar games, you’ve seen this model in action. They are tribal in nature, fast, and have a lot of flexibility, but they don’t efficiently utilize resources in a single direction. Instead, they are effective at being able to change direction at any moment in time, and collaboration amongst the team’s members thrives. Decisions can be made quickly because the framework is principled, not tactical (e.g., “do the features in your team’s backlog that add the most fiscal ROI” vs. “Add the ability to Bold text in a help file”).
    • Spider models have a head. They scale to a greater number of people, but have a critical flaw: kill the head and the body dies with it. If the head has a bad plan, the body goes along for the ride. This is also known as “Command and control”. Competition thrives in this model. Folks quickly understand that to “win” you don’t need to convince others… only the head. Spider models are more effective when you know the objective to be achieved and how it needs to happen, but they are not effective when decisions need to be made quickly and repeatedly, as is common in the Software services world.
  • Peter Principle is alive and well (it’s proven!)
    • People are getting promoted to where they are no longer competent.
      • They are not taking training to understand how to scale to their new role, nor are they learning the state of the art techniques
      • They have to make more and more decisions with less and less data, which is harder and harder. So the validity of their decision becomes more and more unstable.
  • Jensen’s Law of Politics – For years, I have been teaching folks this insight: “He who is defensive, loses… Always…”
    • One cannot win *any* game by playing defense only.

So what happens is: Spider models get propagated (for a variety of reasons: control, $$, clarity of purpose, etc.). People then get promoted to the point where they can no longer scale to efficiently provide good decisions to their subordinates. Since spider models are competitive by nature, people quickly (subconsciously) begin to realize that the way *they* continue to win and keep the head position is by taking the offensive position.

    They blame others.

This then gets sustained in a vicious loop:

  1. First off, Leaders hate getting blamed.
  2. But, Leaders don’t have the time to learn how to do things differently
  3. But, this process, as a decision making framework, is too slow. Leaders don’t (can’t?) take time to understand the root causes of failure or the dependencies needed for success. So they keep plodding on, essentially “hoping” they succeed.
  4. But, since they don’t, they resort to finger pointing and brute force tactics (“30 hr” workdays) to set things straight again.

Is this fixable?

I think so, but teams *must* learn Starfish techniques and the environment must support it.

  1. In my mind, this means any form of Adaptive project styles (scrum, kanban, XP, etc). But they need to make sure they are *adapting* and not just iterating… Teams should be encouraged to act, but to validate everything with actual learning.
  2. Create an environment where people commit to goals rather than being told to commit. Don’t fool yourself: people cannot be told to commit to anything. In order for them to commit, it must fulfill some important purpose for them, it must be achievable, and they must understand the risks, the rewards, and the degrees of freedom they have if the project goes out into the weeds.
  3. Teams need to be taught to work together to achieve goals. Taught to trust and (more importantly) rely on their peers to move forward.
  4. There is an important distinction between solving problems and owning solutions. Owning the solution doesn’t guarantee that it actually solves any problem that is important. On the past 3 teams I have led, I have had a team motto that I adore: “My team’s job is to solve problems, not write tools or code. If I can leverage another team’s component, I will. NIH kicks ass!” (not invented here)
  5. Give folks guidance and principles to make decisions on their own. Enforce this. This one is probably the hardest to do. Folks get used to not owning their own decisions. It’s uncomfortable for them. “Are you going to blame me if I fail?” My style is to let individuals make the decision, but reinforce that the team will own cleanup if the decision fails. I try to create a spirit on the team that individuals are shepherding work, but the whole team is accountable for all work in progress (see earlier post). This helps people feel empowered and forces the team to have each other’s back.

Lastly, I recently got certified as a Program Consultant for the Scaled Agile Framework, which means for the next year I am allowed to teach it and certify others. One of the really great things I think those folks have figured out is that in order to truly scale, you need to find an efficient way to decouple the business strategy from the implementation tactics. I’m over-simplifying, but in essence:

  • Management owns setting strategic direction, measurement of success, cadence, and funding
  • Team owns creating an implementation they commit to.
  • Management owns creating the decision framework, which is principle-based. Teams are pre-informed of the constraints.
  • Team owns making the decisions, staying within the constraints.  Team owns correctly setting expectations with Management.
  • When things go wrong, both sit in a room as peers to work out how to adapt. Both have an important and distinct role to serve. When they are working together to thrive, the business does too.

Special Bonuses:

  • Here’s one of my favorite videos of the famous All Blacks rugby team, showing what committed teamwork looks like.
  • The US military defines Command and Control as: “The Exercise of authority and direction by a properly designated commander over assigned forces in the accomplishment of the mission. Command and control functions are performed through an arrangement of personnel, equipment, communications, facilities, and procedures which are employed by a commander in planning, directing, coordinating, and controlling forces and operations in the accomplishment of the mission”

    Their most recent investigations show that C2 doesn’t scale due to:

    • The essence of command lying in the cognitive processes of the commander.
    • The influx of data now available.
    • The speed at which the operation must execute.

Automation doesn’t kill productivity. People do.

Shortly after I wrote Forgive me, Father, for I have sinned, I received the following email from a colleague of mine:


I read your most recent blog. Your blog is actually dangerously close to sinning as well. In principle I agree with your sentiment, but be aware of violent pendulum swings. There is still a lot of value in the type of automation systems we have built, but it has to be tempered with a self-enforcing quality of quality, and quality of developer code measures. Good test teams actually do enable bad developer behavior. We become like a cheap compiler. Test will catch any issues, and quickly too. Developers are perfectly capable of writing solid (not bug free) code. They are just not always incentivized to do so. With a good test team, they don’t have to. At [my company], they don’t get rewarded to do so. The test team carries the burden, and the blame, for quality. There are many factors that play into the subject you have chosen. You are only tackling one facet.

Also, you are not really presenting a fix in your “how to fix” section, but rather pointing out a possible end result of the automation effort.


I really appreciate this sort of feedback as it really helps me to understand where I communicated well and where I did so poorly. That blog can be read as written by someone who was newly “enlightened” and automation was not invited to the renaissance. This was not my intent and not the case. (Aside: I am very nearly at that point when it comes to UI automation… I get a very visceral nauseous feeling lately when I hear of folks investing in this…) When used properly, automation becomes one of the most important tools in a software engineer’s arsenal. That is the crux of it, though. It must be used properly. The point of my story is that I had not done so and it led to some bad outcomes: thoughtlessness and poor code quality. I had done a really great job doing something that the business wanted me to do, but in retrospect, it was not the right way to solve the problem. In fact, perhaps it was solving the wrong problem…

Damned if you do

My eyes really began to be opened about 10 years ago. I had changed teams and become a middle manager on a product I used every day and loved. I quickly learned they had two big problems. First, they could not get their Build Verification Tests to pass 100%. I later learned that this had been the case for six years running. This by itself was interesting to me. In my experience, no team kept moving forward when BVTs failed; they stopped and fixed the problem. When I asked about it, they mentioned they had tried several things, but none of them worked. Second, the test team did not have the payroll they needed to keep up with dev. This was the first wave of Agile development at Microsoft, and this team had decided to experiment with it. Dev believed documentation was overhead and velocity was all that mattered. As a consequence, dev would move *really* fast and ask for “smoke tests” – testing done by the test team before check-in. When the product still failed BVTs the next day, they would rally around the need for even deeper smoke testing. I saw a vicious loop and asked to own the solution. My manager readily agreed… The problem had gotten so bad, he was seriously considering banning all automation. He dreamed of the untrained Test Engineer world that had dominated Microsoft only a few years earlier. He felt automation killed productivity.

To solve the problem, I first measured it. I learned my teams were spending 50% of their time doing smoke testing and another 20% fixing automation. I was also able to show that these efforts were not *in any way* helping the BVTs to pass. The more things failed, the more time the teams spent trying to fix them, without success. It was depressing. Once I got to the bottom of the problem, it was fairly easy to fix. The hardest part was getting people to let go of sacred principles that they held to be true without proof. This team refused to recognize that their automation program, as implemented, was never going to work. In a nutshell, they were stuck in a vicious loop. They had super-complex automation running in their simplest suite (no unit testing existed in those days), and they were using it to validate the build. Since they had not pre-validated the individual components, the tests *always* failed when integration occurred. This high-level automation was hard to debug. As a result, the test team kept slowly losing more and more resources to maintenance. Bigger than that, the team was so overloaded that they did not notice they were not fixing the problem, but rather making it worse.

Once I realized how much it was costing the project, we did three things: 1) banned E2E automation in that suite, 2) limited smoke requests to 8 hours per week per feature team, and 3) built a tool for dev to run the new BVT suite themselves on their desktops. Once this was done, the automation began to work consistently and correctly. The dysfunctional bottleneck was removed from the system.

I would come to believe that I had learned the true point of automation:

To reduce the overall cost of development.

I concluded: Automation that didn’t do this, should be stopped. I would later learn this was wrong.

Damned if you don’t

Years later, I would join another team that had the opposite problem. Their system at that time was “not automatable” (or so I heard over and over). What this really meant was that automation was hard and expensive, and no one had created the hooks to make it possible. Because of this, they had a small army of vendor testers who would do manual testing every day. The team (including me) thought this was super expensive, so we looked into starting an automation program (after all, that would make it cheaper, right?).

Our constraints:

1) They did yet another variant of “agile” where they planned out their two-week sprints based on dev capacity only. As a result, time for automation was rarely available.

2) There were far too few unit tests. As a result, dev “needed” test to work night and day to validate the new code in time for sprint end.

3) As I mentioned above, test hooks were missing and/or unstable.

4) The vendor team was only able to keep running the same tests; they did not have the capacity to absorb more tests into their runs. As a result, monthly test passes had to be funded by the sprinting testers. This starved the sprint teams for roughly half of each month.

Lack of automation was killing productivity.

My manager and I worked on this over and over and finally came up with a solution. I would take a few of my team and create a new team responsible for curating the automation.

Their goal would be to understand and optimize the execution of test cases for the division.

NOTE: the following part is not really needed for this story, but I am including it mostly because I think it was a nifty process invention. You can skip ahead to “THE POINT” should you like.

Here’s how we started:

1) The Optimization team started by getting all teams to document and handoff their tests, automated or not. Teams were motivated: a team that handed off their tests would no longer be responsible for running their tests during the monthly test pass.

2) The Optimization team would own these passes instead.

3) The Sprint teams were required to write whatever automation they needed in order to get to done and exit the sprint. This largely meant sparse unit tests at best, but it enabled the sprint teams to have higher confidence that the code worked as expected each sprint. This by itself was a massive improvement.

4) The Sprint teams were also required to write the test hooks needed for that automation.

5) After the initial handoff, sprint teams were required to handoff again at the end of each sprint.

Once tests were handed off, the Optimization team owned the following work:

1) Establish SLA: Adjust the priorities on the test cases into 4 different SLA buckets: Daily, Sprintly, Monthly, Quarterly. (Aside: this team shipped every 4–6 months.)

2) Drive getting these tests executed using the Vendor team

3) Prune: The length of time a test had been ignored was used to determine its importance. Any test case that had been consistently failing for “too long” (initially set to 3 months) would be moved to an ‘archive’ folder (essentially deleting it), and mail would be sent to the team that owned the relevant area.

4) Categorize and Automate: Go through each test case and categorize by the type of automation problem that test represented. UI? Stress? Backend storage issue? API? Etc. There were eventually around 15-20 categories. They would then automate whole categories based on their ROI. This was considerably more efficient than automating all of the P1’s across all of the categories.

5) Maintenance: Frontline investigation on any test automation failure when the vendor team reported it and either fix the problem or move it to the sprint team’s backlog.
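The pruning rule in step 3 can be sketched as a small script. This is an illustrative sketch only: the record fields, the “archive” folder, and the use of a last-passed date to stand in for “consistently failing” are my assumptions based on the description above.

```python
from datetime import date, timedelta

ARCHIVE_AFTER = timedelta(days=90)  # "too long" -- initially set to three months

def prune(test_cases, today):
    """Archive tests that have been failing consistently for too long.

    Returns (test name, owning team) pairs so mail can be sent to each
    team that owns the relevant area.
    """
    archived = []
    for tc in test_cases:
        if today - tc["last_passed"] > ARCHIVE_AFTER:
            tc["folder"] = "archive"  # essentially deleting it
            archived.append((tc["name"], tc["owner_team"]))
    return archived
```

The point of automating the rule is that pruning happens by policy, not by someone's mood during a test pass.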

It took a good while to get the priorities right based on business need and the team’s desire/ability to react to a failure, but once we did, we had an efficient model for funding the execution of the manual suite.

Every day the vendor team would get a backlog of tests to run (see Fig 1):

  • Two-thirds of the vendor team’s time would be spent on running the daily tests… all of them.
  • Two-thirds of the remaining time would be spent on the sprint tests. (A small chunk would be executed each day, so that all would be executed at least once each sprint.)
  • Two-thirds of the then-remaining time would be spent on monthly tests.
  • The rest would be spent on the remaining tests.

Fig 1: Capacity allocation plan for test execution
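The repeated two-thirds split above is easy to compute mechanically. Here’s a minimal sketch; the 90 vendor-hours-per-day total is a made-up number for illustration, not a figure from the team.

```python
def allocate_capacity(total_hours):
    """Split vendor capacity with the repeated two-thirds rule:
    daily gets 2/3 of everything, sprint gets 2/3 of what's left,
    monthly gets 2/3 of what's left after that, quarterly gets the rest."""
    buckets = {}
    remaining = total_hours
    for bucket in ("daily", "sprint", "monthly"):
        buckets[bucket] = remaining * 2 / 3
        remaining -= buckets[bucket]
    buckets["quarterly"] = remaining
    return buckets

allocate_capacity(90)
# daily: 60.0, sprint: 20.0, monthly: ~6.7, quarterly: ~3.3
```

The geometric taper is what makes the plan self-balancing: the cheap, high-signal daily runs always get the lion’s share, and the long-tail suites compete for the remainder.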

This allocation meant we could predict and control our payroll costs for manual test execution. If the number of tests in a category exceeded its funding level, some other test got demoted. A test being demoted out of the quarterly runs meant a conversation: either 1) the test no longer represented a risk we cared about, or 2) more resources were needed on that team.


Once we had done all of this work and socialized it, we were able to reduce the vendor team by almost one half. In addition, the rest of the test team loved us. We had enabled them to focus on their sprint work and taken the tiresome test pass off of their shoulders. “WooHoo!” I thought, “Look how we reduced the cost, mitigated the risk, and boosted team morale…” That had saved a TON of payroll money. Greedily, I went to the manager I had put in charge of the Optimization team and asked how we could reduce the cost even more (we were still 80% or so manual, so I assumed we could use automation to make this super cheap!).

He then pointed out that, in general, for every 1000 test cases we automated or pruned from here on, we would be able to get rid of 1 of these vendors.

“That’s fantastic”, I said, “That doesn’t seem like very many tests to have to automate. Do you know the breakeven point? What’s the max we can pay for the automation in order for it to pay off?”

“$50 per test case per year”, he replied.

“What?!? $50 per test case?!? That’s impossible! That’s essentially 1 hour per test per year. I’m not certain we can even develop the automation at that pace.”

The really great thing was that we had built a system that made it easy to see the numbers and make the call. Though I am drastically simplifying things for this post, he could show me the math readily… It was all true. Over time, the automation system would improve and its price tag would lessen, but not to the degree necessary. At the time, this news was shocking. It turned out manual testers were very effective and a lot cheaper than the automated equivalent for our product.
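The breakeven math falls straight out of the two numbers in the conversation: roughly 1,000 tests automated or pruned frees one vendor, and the ceiling is $50 per test case per year. Worked backwards, those figures imply a fully loaded vendor cost of about $50,000 per year; the sketch below just restates that arithmetic.

```python
tests_per_vendor = 1000    # tests automated or pruned per vendor freed
breakeven_per_test = 50    # max $/test/year for automation to pay off

# Implied annual cost of one vendor tester: 1000 * 50 = $50,000.
implied_vendor_cost = tests_per_vendor * breakeven_per_test

# At roughly $50 per engineering hour, the $50/test/year ceiling is about
# one hour per test per year to write AND maintain the automation --
# the pace the author doubts is achievable.
```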

Automation on this team, clearly, was not reducing the cost of development.

Cost savings was not the reason to automate. Automation was a tax.

The moral of the story is that automation’s purpose is not about saving money. It’s about saving time. It’s about accelerating the product to shippable quality.

My colleague, H, is right, of course. There is “a lot of value in the type of automation systems we have built”. We have built great tools, but any tool can be abused. I believe the fix lies in transparency and measurement: understanding that the goal is accelerating the product to shippable quality, not accelerating the intellectual laziness of its dev and test teams. A dev team that leverages the automation system test built as a safety net might be making choices that contribute to slower releases and greater expense. Please send these folks to ATDD/TDD classes to start them in a better direction.

Ultimately, it comes down to choices. What do we choose to measure and what do we choose to believe? Automation is a tool; how we use it is a decision.