A/B Testing Podcast goes live

So one day, a colleague and friend, Alan Page,  said “Hey, Brent, why don’t we start a podcast?”.   After much discourse (Alan pushing me) and challenges (me finding the time), I am happy to announce that I suck at it, but am doing it anyways.  I am very thankful to have an old pro who is letting me tag along for the ride. Anyways, if you’ve got a 30 min to kill and want to hear some more from a couple of guys who are trying to help lead change, please check it out. AB, in this context, stands for Alan/Brent and in “Episode 1” we explore Testing vs. Quality as well as recent changes happening at Microsoft.  Enjoy.  Feedback is always welcome.   We are likely to keep going until someone tells us that the pain is unbearable.  🙂

Download:AB Testing – “Episode” 1

Or play it now:

Want to subscribe to the podcast? Here’s the RSS feed.

In Pursuit of Quality: Shifting the Tester mindset

Last time, I wrote a book review on Lean Analytics. Towards the end of that post, I lamented that I see a lot of testers in my neck of the woods trying to map their old way of thinking into what’s coming next. Several folks (Individual contributors and managers of same) have come to me wondering why should test move into this world of “data crap” and why is how they have been previously operating so wrong now. It is my hope today to explain this out.

But first, before continuing, I’d like to try something new and offer you a poll to take.

Please consider the following:

So which did you pick? Over time, it will be interesting to me to track how people are viewing this simple comparison. I have been doing this example for almost a year now. When I first started it, about 1 in 2 testers polled would select the bug-free code. Whereas with testers I talk to lately, about 1 in 3 will select it. I definitely view this as a good sign and that folks are starting to reflect on these changes and adapting. My ideal world is that 1 year from now the ratio is closer to 1 in 10.

Why is this poll so hard for folks?

Primarily, it is due to our training. Test was the last line of defense – a safety net – needed to assure we didn’t do a recall when we released product out to manufacturing. When I first started in the software development world, 1.44 floppy disks were the prevailing way customers installed new software on to their system. Windows NT 3.1, as example, required 22 of them. It was horrible. Installation of a new machine would take the better part of the day, disks would be asked for out of order, and lastly, people would often get to the end of the install to discover that a setting they were asked for at the very beginning was wrong and that it was easier to just redo install than to hunt through the manual to figure out how to fix it after the install.

Customers who got their system up and running successfully and found a major bug afterwards would be quite sore with us. Thankfully, I have not heard of this is quite some time, but back then, Microsoft had the reputation of shipping quality in version 3.0. There was a strong and successful push within the company to get our testers trained with a singular mission: find the bugs before our customers do and push to get them fixed. I was proud to state back then that Microsoft was the best in the world at doing this.

The problem I am attempting to address is the perceived value loss in Test’s innate ability to prevent bugs from hitting the customer. A couple of months ago I presented to a group of testers and one of the questions asked “All of this reacting to customer stuff is great, but how can we prevent bugs in the first place?” Thankfully, someone else answered that question more helpfully as my initial response would’ve been “Stop trying to do so“.

The core of the issue, imo, is that we have continued to view our efforts as statically valuable. That our efforts to find bugs up front (assuring code correctness) will always be highly regarded. Unfortunately, we neglected to notice that the world was changing. That, in fact, it was more dynamic: Our need to get correctness right before shipping was actually tied to another variable: Our ability to react to bugs found by customers after shipping. The longer the time it takes us to react, the more we need to prevent correctness issues.

“Quality redefinition” – from correctness to customer value

A couple of years ago, I wrote a blog, Quality is a 4 letter word. Unfortunately, it seems that I wrote it well before it’s time. I have received feedback recently from folks stating that series of posts were quite helpful to them now. One such person had read it then and had a violent allergic reaction to the post:

“Brent, you can’t redefine quality”.

“I’m not!”, I replied, “We’ve *always* had it wrong! But up until now, it’s been okay. Now we need to journey in a different direction.”

While I now refer to the 4 pillars of Quality differently, their essence remains the same. I encourage you to read that post.

The wholeness of Quality should now be evaluated on 4 fronts:

  • Features that customers use to create value
  • The correctness of those features
  • The extent to which those features feel finished/polished
  • The context in which those features should be used for maximum value.

Certainly, correctness is an important aspect of quality, but usage is a significantly greater one. If you take anything away from today’s post, please take this:

Fixing correctness issues on a piece of code that no one is using is a waste of time & resources.

We need to change

In today’s world, with services lighting up left and right, we need to shift to a model that allows us to identify and improve Quality faster. This is a market differentiator.

It is my belief that in the short term, the best way to do this is to focus on the following strategy:

    • Pre-production
      • Train your testers to rewrite their automation such that Pass/Fail is not determined by the automation, but rather, leveraging the instrumentation and data exhaust outputted by the system. Automation becomes a user simulator, but testers grow muscle in using product logs to evaluate the truth. This set of measurements can be directly applied to production traffic when the code ships live.
      • Train your testers to be comfortable with tweaking and adding instrumentation to enable measurement of the above.
    • Next, move to Post-production
      • Leverage their Correctness skillset and their new measurement muscle to learn to understand the system behavior under actual usage load
      • This is an evaluation of QoS, Quality of Service. What you want Testers learning is what & why the system does what it does under prodtraffic.
      • You can start here in order to grow their muscle in statistical analysis.
    • Then, focus their attention on Customer Behavior
      • Teach them to look for patterns in the data that show:
        • Places in the code where customers are trying to achieve some goal but encountering pain (errors, crashes, etc) or friction (latency issues, convoluted paths to goal, etc). This is very easy to find generally.
        • Places in the code where customers are succeeding in achieving goal and are walking away delighted. These are patterns that create entertainment or freedom for the customer. Unlike the above, this is much harder to find, it will require hypothesis testing, flighting, and experimentation, but are significantly more valuable to the business at hand.
      • Being stronger in stats muscle will be key here. Since Quality is a subjective point of view, this will force Test away from a world of absolutes (pass/fail) and into one of probabilities (likelihood of adding value to customers vs. not). Definitely, it is wise to befriend your local Data Scientist and get them to share the magic. This will help you and your team to scale sustainably.
      • This is an evaluation of QoE, Quality of Experience. What you want Testers learning is what & why the Customers do what they do
    • You will then want to form up a dynamic set of metrics and KPI’s that capture the up-to-date learnings and help the organization quickly operationalize their goals of taking action towards adding customer value. This will generate Quality!

Lastly, while executing on these mindshifts, it will be paramount to remain balanced. The message of this blog is NOT that we should stop preventing bugs (despite my visceral response above). Bugs, in my world view, fall into 2 camps: Catastrophes and Other. In order to have Quality high, it is critical that we continue to work to prevent Catastrophe class bugs from hitting our customers. At the same time, we need to build infrastructure that will enable us to react very quickly.

I simply ask you to consider that:

    As the speed to which we can react to our customers INCREASES, the number of equivalence classes of bugs that fall into Catastrophe class DECREASES. Sacrificing speed to delivery, in the name of quality, makes delivering actual Quality so much harder. Usage, now, defines Quality better than correctness.

Ken Johnston: a friend, former manager, and co-author of the book “How we test at Microsoft” recently published a blog on something he calls “MVQ”.    Ken is still quite active on the Test Conference scene (one next month), but if you ever get the chance to ask, ask him “If he were to starting writing the Second Edition of his book, how much of the content would still be important?“. His response is quite interesting, but I’ll not steal that thunder here. J

Here’s a graphic from his post for your consideration. I think it presents a very nice balance:

Thank you for reading.