Shortly after I wrote “Forgive me, Father, for I have sinned”, I received the following email from a colleague of mine:
I read your most recent blog. Your blog is actually dangerously close to sinning as well. In principle I agree with your sentiment, but be aware of violent pendulum swings. There is still a lot of value in the type of automation systems we have built, but it has to be tempered with a self-enforcing quality of quality, and quality of developer code measures. Good test teams actually do enable bad developer behavior. We become like a cheap compiler. Test will catch any issues, and quickly too. Developers are perfectly capable of writing solid (not bug free) code. They are just not always incentivized to do so. With a good test team, they don’t have to. At [my company], they don’t get rewarded to do so. The test team carries the burden, and the blame, for quality. There are many factors that play into the subject you have chosen. You are only tackling one facet.
Also, you are not really presenting a fix in your “how to fix” section, but rather pointing out a possible end result of the automation effort.
I really appreciate this sort of feedback, as it helps me understand where I communicated well and where I did so poorly. That post can be read as the work of someone newly “enlightened”, with automation not invited to the renaissance. That was neither my intent nor the case. (Aside: I am very nearly at that point when it comes to UI automation… Lately I get a visceral, nauseated feeling when I hear of folks investing in it…) When used properly, automation becomes one of the most important tools in a software engineer’s arsenal. That is the crux of it, though: it must be used properly. The point of my story is that I had not done so, and it led to some bad outcomes: thoughtlessness and poor code quality. I had done a really great job doing something the business wanted me to do, but in retrospect, it was not the right way to solve the problem. In fact, perhaps it was solving the wrong problem…
Damned if you do
My eyes really began to open about 10 years ago. I had changed teams and become a middle manager on a product I used every day and loved. I quickly learned the team had 2 big problems. First, they could not get their Build Verification Tests (BVTs) to pass 100%. I later learned that this had been the case for 6 years in a row. That by itself was interesting to me: in my experience, no team kept moving forward when BVTs failed; they stopped and fixed the problem. When I asked about it, they said they had tried several things, but none of them had worked. Second, the test team did not have the payroll they needed to keep up with dev. It was the first wave of Agile Development at Microsoft, and this team had decided to experiment with it. Dev believed documentation was overhead and velocity was all that mattered. As a consequence, Dev would move *really* fast and ask for “smoke tests” (testing done by the test team before checkin). When the product still failed BVTs the next day, they would rally around the need for even deeper smoke testing. I saw a vicious loop and asked to own the solution. My manager readily agreed… The problem had gotten so bad that he was seriously considering banning all automation. He dreamed of returning to the world of untrained Test Engineers that had dominated Microsoft only a few years earlier. He felt automation killed productivity.
To solve the problem, I first measured it. I learned my teams were spending 50% of their time doing smoke testing and another 20% fixing automation. I was also able to show that these efforts were not *in any way* helping the BVTs to pass. The more things failed, the more time they spent trying to fix them, without success. It was depressing. Once I got to the bottom of the problem, it was fairly easy to fix. The hardest part was getting people to let go of sacred principles that they held to be true. Without proof. This team refused to recognize that their automation program, as implemented, was never going to work. In a nutshell, they were stuck in a vicious loop. They had super complex automation running in their simplest suite (no unit testing existed in those days), and they were using it to validate the build. Since they had not pre-validated the individual components, they *always* failed when integration occurred. This high-level automation was hard to debug. As a result, the Test team kept losing more and more of its resources to maintenance. Worse, the team was so overloaded that they did not notice they were not fixing the problem but making it worse.
Once I realized how much it was costing the project, we did three things: 1) banned end-to-end (E2E) automation in that suite, 2) limited smoke-test requests to 8 hours per week per feature team, and 3) built a tool that let dev run the new BVT suite on their own desktops. Once this was in place, the automation began to work consistently and correctly. The dysfunctional bottleneck was removed from the system.
I would come to believe that I had learned the true point of automation:
To reduce the overall cost of development.
I concluded: automation that didn’t do this should be stopped. I would later learn this was wrong.
Damned if you don’t
Years later, I would join another team that had the opposite problem. Their system at that time was “not automatable” (or so I heard over and over). What this really meant was that automating it was hard and expensive, and no one had created the hooks to make it possible. Because of this, they had a small army of vendor testers doing manual testing every day. The team (including me) thought this was super expensive, so we looked into starting an automation program (after all, automation makes things cheaper, right?). We ran into four problems:
1) They practiced (yet another) variant of “agile” in which they planned their 2-week sprints on dev capacity alone. As a result, time for automation was rare.
2) There were far too few unit tests. As a result, dev “needed” test to work night and day at the end of each sprint to validate the new code in time.
3) As I mentioned above, test hooks were missing and/or unstable.
4) The vendor team could only keep running the same tests… They had no ability to absorb more tests into their runs. As a result, the testers on the sprint teams had to fund the monthly test passes, which starved the sprint teams for 50% of each month.
Lack of automation was killing productivity.
My manager and I worked on this over and over and finally came up with a solution: I would take a few people from my team and create a new team responsible for curating the automation.
Their goal would be to understand and optimize the execution of test cases for the division.
NOTE: the following part is not really needed for this story, but I am including it mostly because I think it was a nifty process invention. You can skip ahead to “THE POINT” should you like.
Here’s how we started:
1) The Optimization team started by getting all teams to document and hand off their tests, automated or not. Teams were motivated: a team that handed off its tests would no longer be responsible for running them during the monthly test pass.
2) The Optimization team would own these passes instead.
3) The Sprint teams were required to write whatever automation they needed in order to get to done and exit the sprint. This largely meant sparse unit tests at best, but it gave the sprint teams higher confidence that the code worked as expected each sprint. This by itself was a massive improvement.
4) The Sprint teams were also required to write the test hooks needed for that automation.
5) After the initial handoff, sprint teams were required to hand off again at the end of each sprint.
Once tests were handed off, the Optimization team owned the following work:
1) Establish SLAs: Sort the test cases by priority into 4 SLA buckets: Daily, Sprintly, Monthly, Quarterly. (Aside: this team shipped every 4-6 months.)
2) Execute: Drive the execution of these tests through the Vendor team.
3) Prune: How long a test had been ignored was used to determine its importance. Any test case that had been consistently failing for “too long” (initially set to 3 months) would be moved to an ‘archive’ folder (essentially deleting it), and mail would be sent to the team that owned the relevant area.
4) Categorize and Automate: Go through each test case and categorize it by the type of automation problem it represented. UI? Stress? Backend storage issue? API? Etc. There were eventually around 15-20 categories. The team would then automate whole categories based on their ROI. This was considerably more efficient than automating all of the P1s across all of the categories.
5) Maintenance: Do the frontline investigation of any test automation failure the vendor team reported, then either fix the problem or move it to the sprint team’s backlog.
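The Prune step is essentially a time-based rule. Below is a minimal Python sketch of it, assuming each handed-off test carries the date it started failing continuously and the team that owns it. `CaseRecord`, `prune`, and `owners_to_notify` are hypothetical names, not the team’s actual tooling, and 90 days stands in for “3 months”.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical threshold: "too long" was initially 3 months; 90 days approximates that.
PRUNE_AFTER = timedelta(days=90)


@dataclass
class CaseRecord:
    """Hypothetical record for one handed-off test case."""
    name: str
    owner_team: str
    failing_since: Optional[date] = None  # None means the test is currently passing


def prune(tests, today, threshold=PRUNE_AFTER):
    """Split tests into (kept, archived): archived tests have failed continuously for longer than threshold."""
    kept, archived = [], []
    for t in tests:
        if t.failing_since is not None and today - t.failing_since > threshold:
            archived.append(t)
        else:
            kept.append(t)
    return kept, archived


def owners_to_notify(archived):
    """Group archived test names by owning team, i.e. who should get the email."""
    by_team = {}
    for t in archived:
        by_team.setdefault(t.owner_team, []).append(t.name)
    return by_team


# Hypothetical example data.
tests = [
    CaseRecord("login", "auth", failing_since=date(2024, 1, 1)),
    CaseRecord("search", "search"),
    CaseRecord("export", "auth", failing_since=date(2024, 5, 1)),
]
kept, archived = prune(tests, today=date(2024, 6, 1))
print([t.name for t in archived])  # ['login']
print(owners_to_notify(archived))  # {'auth': ['login']}
```

Archiving rather than deleting keeps the decision reversible, which matches the post’s “essentially deleting it” framing.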
It took a good while to get the priorities right based on business need and the team’s desire/ability to react to a failure, but once we did, we had an efficient model for funding the execution of the manual suite.
Every day the vendor team would get a backlog of tests to run (see Fig 1):
- 2/3 of the Vendor team’s time would be spent running the daily tests… all of them.
- 2/3 of the remaining time would be spent on the sprint tests (a small chunk would be executed each day, so that all would be executed at least once each sprint).
- 2/3 of the time remaining after that would be spent on the monthly tests.
- The rest would be spent on the remaining (quarterly) tests.
Fig 1: Capacity allocation plan for test execution
This allocation meant we could predict and control our payroll costs for manual test execution. If the number of tests in a category exceeded its funding level, some other test got demoted. A test being demoted out of the quarterly runs meant a conversation: either 1) the test no longer represented a risk that we cared about, or 2) more resources were needed on that team.
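The “2/3 of what remains” rule from Fig 1 compounds to fixed shares of the vendor team’s day: 2/3 daily, 2/9 sprintly, 2/27 monthly, and 1/27 quarterly. The sketch below computes that split and flags buckets whose demand exceeds their funding, which is where the demotion conversations would start. The 270 daily vendor hours and the demand figures are hypothetical, and `allocate` and `over_budget` are illustrative names.

```python
from fractions import Fraction

# SLA buckets in priority order, as named in the post.
BUCKETS = ["daily", "sprintly", "monthly", "quarterly"]


def allocate(total_hours, share=Fraction(2, 3)):
    """Give each bucket `share` of the still-unallocated capacity; the last bucket gets the rest."""
    allocation = {}
    remaining = Fraction(total_hours)
    for bucket in BUCKETS[:-1]:
        allocation[bucket] = remaining * share
        remaining -= allocation[bucket]
    allocation[BUCKETS[-1]] = remaining
    return allocation


def over_budget(allocation, demand_hours):
    """Buckets whose demanded execution hours exceed their funding: candidates for demoting tests."""
    return [b for b in BUCKETS if demand_hours.get(b, 0) > allocation[b]]


# Hypothetical example: 270 vendor hours available per day.
plan = allocate(270)
print({b: int(h) for b, h in plan.items()})
# {'daily': 180, 'sprintly': 60, 'monthly': 20, 'quarterly': 10}
print(over_budget(plan, {"daily": 150, "monthly": 25}))
# ['monthly']
```

Using `Fraction` keeps the shares exact, so the four buckets always sum to the full capacity with no rounding drift.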
Once we had done all of this work and socialized it, we were able to reduce the vendor team by almost half. In addition, the rest of the test team loved us. We had freed them to focus on their sprint work and had taken the tiresome test pass off their shoulders. “WooHoo!” I thought, “Look how we reduced the cost, mitigated the risk, and boosted team morale…” We had saved a TON of payroll money. Greedily, I went to the manager I had put in charge of the Optimization team and asked how we could reduce the cost further (we were still 80% or so manual, so I assumed we could use automation to make this super cheap!).
He then pointed out that, in general, for every 1000 test cases we automated or pruned from here on, we would be able to get rid of 1 of these vendors.
“That’s fantastic,” I said. “That doesn’t seem like very many tests to automate. Do you know the breakeven point? What’s the most we can pay for the automation for it to pay off?”
“$50 per test case per year,” he replied.
“What?!? $50 per test case?!? That’s impossible! That’s essentially 1 hour per test per year. I’m not certain we can even develop the automation at that pace.”
The really great thing was that we had built a system that made it easy to see the numbers and make the call. Though I am drastically simplifying things for this post, he could show me the math readily… It was all true. Over time, the automation system would improve and its price tag would fall, but not to the degree necessary. At the time, this news was shocking. It turned out manual testers were very effective and a lot cheaper than the automated equivalent for our product.
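The breakeven arithmetic can be reproduced in a few lines. This is a sketch under assumptions: the post does not state a vendor’s yearly cost or an hourly rate, so the $50,000 and $50/hour inputs below are back-solved from its “1000 tests per vendor”, “$50 per test case per year”, and “1 hour per test” figures, and the $200 automation cost is purely illustrative. The function names are hypothetical.

```python
def breakeven_per_test(vendor_cost_per_year, tests_per_vendor):
    """Most we can spend per automated test per year: the vendor cost it frees up, spread over those tests."""
    return vendor_cost_per_year / tests_per_vendor


def hours_per_test(breakeven_dollars, hourly_rate):
    """Express the breakeven budget as engineering hours per test per year."""
    return breakeven_dollars / hourly_rate


def pays_off(automation_cost_per_test_per_year, breakeven_dollars):
    """True if one test's yearly automation cost (authoring amortized plus maintenance) fits the budget."""
    return automation_cost_per_test_per_year <= breakeven_dollars


# Hypothetical inputs back-solved from the post's figures (not stated in it).
budget = breakeven_per_test(vendor_cost_per_year=50_000, tests_per_vendor=1000)
print(budget)                      # 50.0
print(hours_per_test(budget, 50))  # 1.0
print(pays_off(200, budget))       # False
```

The point of the exercise is the shape of the comparison: automation pays for itself only when its full yearly cost per test, including maintenance, stays under the manual cost it replaces.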
Automation on this team, clearly, was not reducing the cost of development.
Cost savings were not the reason to automate. Automation was a tax.
The moral of the story is that automation’s purpose is not about saving money. It’s about saving time. It’s about accelerating the product to shippable quality.
My colleague, H, is right, of course. There is “a lot of value in the type of automation systems we have built”. We have built great tools, but any tool can be abused. I believe the fix lies in transparency and measurement, and in understanding that the goal is to accelerate the product to shippable quality, not to accelerate the intellectual laziness of its Dev and Test teams. A dev team that leans on the automation system test built as a safety net may be making choices that contribute to slower releases and greater expense. Please send these folks to ATDD/TDD classes to start them in a better direction.
Ultimately, it comes down to choices. What do we choose to measure and what do we choose to believe? Automation is a tool; how we use it is a decision.