13 Automated Testing, The Achilles Heel of DevOps #

Hello, I am Shi Xuefeng.

In ancient Greek mythology, the hero Achilles was incredibly brave and all but invincible; his one fatal weakness was his heel. During the Trojan War, he was struck in the heel by an arrow and fell, which ultimately led to his death. Since then, “Achilles’ heel” has been used to describe a fatal flaw. Today, I want to talk to you about automated testing, the Achilles’ heel of DevOps.

During my visits to many companies, I found that they generally do quite well with engineering practices such as configuration management and continuous integration. However, two major problem areas remain: development metrics and automated testing.

No one denies the value of automated testing, and many companies are practicing it to varying degrees. However, overall, the implementation of automated testing is generally not systematic, with most teams focusing on individual tools. Additionally, teams have doubts about the real effects of automated testing. Without addressing these issues, it is difficult to break through the ceiling of DevOps implementation.

So, what problems does automated testing actually solve, and which business situations and testing scenarios is it suitable for? How can we progressively build and measure the effects correctly to avoid pitfalls? These are the key topics I want to share with you in this lecture.

What problems does automated testing solve? #

The improvement in product delivery speed has posed great challenges to testing work. On one hand, testing time is constantly compressed, with what used to take three days now needing to be completed in a single day. On the other hand, the changes in requirements have also brought a lot of uncertainty to the execution of testing work. The core issue behind this is that the accumulation of business functions leads to continuously expanding test scope, which contradicts the compression of testing duration. In simple terms, the content that needs to be tested is increasing, but the time available for testing is decreasing.

Comprehensive testing brings relatively better quality, but the investment of time and manpower is enormous; rapid, iterative delivery, on the other hand, means accepting a certain degree of risk. So, should we prioritize speed or quality? This is a difficult question to answer.

Therefore, if we want to improve testing efficiency, it is natural to think of automation. In fact, automated testing is applicable to the following typical scenarios:

  1. A large number of mechanical and repetitive operations that need to be repeatedly executed, such as batch regression testing.
  2. Scenarios with clear design specifications and relative stability, such as interface testing.
  3. Compatibility testing across a large scale and multiple platforms, covering multiple versions and device models. It’s acceptable to cover dozens of device models, but if it extends to hundreds or thousands, automation is the only solution.
  4. Tests that need to be executed continuously for a long time, such as stress testing, usability testing, etc.

These typical scenarios often have several characteristics: clear design, stable functionality, repeatable execution, and long-term large-scale execution. The core purpose is to solve the problem of testing costs by using automation, which means solving the issue of human resources. However, this does not mean that manual testing has no value. On the contrary, when people are freed from repetitive work, they can be engaged in more valuable testing activities, such as exploratory testing, usability testing, user acceptance testing, and so on - all of which fall within the scope of manual testing.

This sounds reasonable, but why do many companies still prefer manual testing? In reality, not all testing activities are suitable for automation, and there are also some challenges in automated testing construction.

  1. Cost-effectiveness: Many requirements are only deployed once (e.g., promotional activities). Implementing automated testing is much more costly than manual testing, and it will not be reused in the future, which makes it clearly not worth the investment.
  2. Learning curve: Automated testing relies on code implementation, and developing a configurable testing framework and platform requires significant architectural design and coding capabilities. However, the coding skills of testing personnel are generally relatively weak.
  3. High maintenance costs: Whether it’s the testing environment, test cases, or test data, adjustments need to be made continuously with changes in requirements. Otherwise, outdated automated tests can easily result in execution failures.
  4. High investment in testing devices: For example, mobile app testing requires a large number of mobile resources. It is unrealistic to cover all device models and operating system versions. Moreover, the limited machines are often taken by testers for local debugging, exacerbating the situation where there are no available resources for online testing.

Design of Automated Testing #

It seems that automated testing is not a universal solution, and we cannot expect all tests to be automated. Only in the appropriate domain can automated testing provide the maximum value. So you may ask, in the face of so many types of tests, where should we start the development of automated testing?

First, let me introduce you to the classic testing pyramid. This model describes the progression from unit testing through integration testing up to UI testing. The closer a layer is to the bottom, the faster its test cases execute and the lower their maintenance cost. At the top UI layer, by contrast, execution is slower than unit and interface testing (though still faster than manual testing), and the maintenance cost is much higher than that of unit and interface testing.

Image source: “DevOps Handbook”

In this case, starting with unit testing, closest to the bottom, is a relatively cost-effective option. In practice, however, the state of unit testing varies widely from company to company: some achieve 80% coverage, while others struggle to make any progress. After all, unit testing is mostly driven by the development team, and the team's attitude largely determines the outcome. It is undeniable, though, that unit testing is still very necessary, especially coverage of core services such as core transaction modules. Of course, good unit testing requires considerable effort from the development team.

For the UI layer, the execution speed and maintenance cost go to the other extreme. However, this does not mean that there is no need to invest in UI automation development. The UI layer is the only way to simulate real user operation scenarios in end-to-end testing. A button on a webpage may trigger dozens of function calls internally, which is different from unit testing that only checks the logic of one function at a time. UI testing focuses more on the cascading logic after module integration and is the most effective means of integration testing.

In addition, many testers start with UI automation: the tools and frameworks are mature, and implementing UI automation does not depend on access to the source code, which makes it an easy approach to get started with. In practice, UI automation can save manual testing effort and improve the efficiency of functional testing. However, its disadvantages are just as apparent: as agile iterations speed up, frequent changes to UI controls make control location unstable and drive up the maintenance cost of test scripts.

Considering the cost-effectiveness and the difficulty of getting started, interface testing in the middle layer becomes a good choice. On the one hand, modern software architecture, regardless of whether it is a layered or service call pattern, has greatly increased the dependence on interfaces. For example, in typical front-end and back-end separation development patterns, both ends are developed and debugged around interfaces. On the other hand, compared with unit testing, interface testing involves more complete business logic and clear interface definitions, making it suitable for automated execution.

For this reason, for web-based applications, I recommend the oval-shaped model, which focuses on API interface testing in the middle layer, with unit testing and UI testing as supplementary. You can refer to the layered automation testing model chart.

Development of Automated Testing #

Effective automated testing relies on support from tools and platforms. Taking interface testing as an example, it has traditionally been executed with tools such as cURL, Postman, and JMeter. However, a successful interface test needs more than the ability to send service requests: it also requires pre-test data preparation and post-test result verification. And for real business needs, it is not enough to execute single-interface tests; you also need complex, logic-driven multi-interface tests, which relies on the ability to orchestrate interface calls and even on built-in mock services.
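
To make this concrete, here is a minimal sketch of a single interface test that covers the three steps just mentioned: data preparation, the actual request, and result verification. It uses only the JDK's built-in HttpClient; the URLs, request bodies, and expected fields are hypothetical and serve purely as illustration.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OrderInterfaceTest {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();

    // 1. Pre-test data preparation: create the order that the query interface depends on
    HttpRequest prepare = HttpRequest.newBuilder()
        .uri(URI.create("https://example.com/api/orders"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("{\"sku\":\"10000\",\"count\":1}"))
        .build();
    HttpResponse<String> created = client.send(prepare, HttpResponse.BodyHandlers.ofString());
    if (created.statusCode() != 201) {
      throw new AssertionError("Test data preparation failed, status: " + created.statusCode());
    }

    // 2. Call the interface under test
    HttpRequest query = HttpRequest.newBuilder()
        .uri(URI.create("https://example.com/api/orders/latest"))
        .GET()
        .build();
    HttpResponse<String> response = client.send(query, HttpResponse.BodyHandlers.ofString());

    // 3. Post-test result verification: check the status code and a key business field
    if (response.statusCode() != 200 || !response.body().contains("\"sku\":\"10000\"")) {
      throw new AssertionError("Unexpected query result: " + response.body());
    }
    System.out.println("Interface test passed");
  }
}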

Furthermore, a mature automated testing framework should have functions such as managing test data, test cases, and scripts, collecting, measuring, analyzing and displaying data during testing, and sending test reports.

For UI automation testing, the most troublesome issue is the cost of maintaining test cases after UI controls change. The solution is to decouple the control from its location method: at the operation layer, controls are referred to only by custom names, which are defined in control-related configuration files, while a proxy layer beneath the operation layer resolves each name to the actual control at runtime. This relies on the design and implementation of the framework. The sample code is as follows:

public void searchItem(String id) {
  // "SearchBar" and "Search" are custom control names; the proxy layer resolves them
  // to real controls using the location methods defined in the configuration files
  getTextBox("SearchBar").clearText();
  getTextBox("SearchBar").setText(id);
  getButton("Search").click();
}

In the code example, the search bar control is given the custom name “SearchBar”. Calling the getTextBox method of the proxy layer returns a text-box object, on which clearText and setText are then called. The custom name and the control’s location method are registered in the corresponding control configuration file.

In this way, even if the control changes, for the actual operation layer code, because custom names are used, you don’t need to modify the logic, just replace the control’s location method in the corresponding control’s configuration file. Regarding the specific control’s configuration file, the sample code is as follows:

<TextBox comment="Homepage search bar" id="SearchBar">
  <iOS>
    <appium>
      <dependMethod methodName="findElementByXPath">
        <xpath>
          //XCUIElementTypeNavigatorBar[@name="MainPageView"]/XCUIElementTypeOther/...
        </xpath>
      </dependMethod>
    </appium>
  </iOS>
</TextBox>
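
For illustration, here is a minimal sketch of how the proxy layer might resolve a custom name such as “SearchBar” into the XPath declared in this configuration file. The ControlRepository class and its method names are assumptions made for this example rather than part of any specific framework; it uses only the JDK's standard DOM parser.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical proxy-layer helper: maps a custom control name to the locator
// declared in the control configuration file shown above.
public class ControlRepository {

  private final Document config;

  public ControlRepository(File configFile) throws Exception {
    this.config = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(configFile);
  }

  // Returns the XPath registered under the given control id, or null if none is found.
  public String xpathFor(String controlId) {
    NodeList controls = config.getElementsByTagName("TextBox");
    for (int i = 0; i < controls.getLength(); i++) {
      Element control = (Element) controls.item(i);
      if (controlId.equals(control.getAttribute("id"))) {
        // Only this configuration changes when the UI changes; test code keeps using "SearchBar"
        return control.getElementsByTagName("xpath").item(0).getTextContent().trim();
      }
    }
    return null;
  }
}

The getTextBox method in the operation-layer example can then wrap the element located by this XPath into a text-box object that exposes clearText, setText, and similar operations.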

Of course, to reduce the cost of writing test cases for testers, you can apply the Page-Object pattern at the operation layer: encapsulate operating methods per page or module, so that specific functional operations read the way users think about the product. When actually writing test cases, you then simply call the interface definitions of the operation layer. The sample code is as follows:

@TestDriver(driverClass = AppiumDriver.class)
public void TC001() {
  String id = "10000";
  page.main.switchView(3);           // switch to the view that contains the cart
  page.cart.clearShoppingCart();     // precondition: start from an empty cart
  page.main.switchView(0);           // return to the main view
  page.search.searchProduct(id);     // search for the product by id
  page.infolist.selectlist(0);       // open the first result in the list
  page.infodetail.clickAddCart();    // add the product to the cart
  Assert.assertTrue(page.cart.isProductCartExist(), "Product added successfully");
}
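
For context, the page.main, page.search, and page.cart objects used in this test case come from Page-Object instances aggregated behind a single entry point. The sketch below shows one possible way to organize them; the class names and method bodies are simplified stubs for illustration, not the framework's actual implementation.

// Hypothetical aggregation of Page Objects behind one entry point, so that test
// cases read as a sequence of business steps. In a real framework each method
// would call the proxy-layer controls shown earlier.
public class Page {

  public final MainPage main = new MainPage();
  public final SearchPage search = new SearchPage();
  public final CartPage cart = new CartPage();

  public static class MainPage {
    public void switchView(int tabIndex) {
      // tap the tab at the given index via proxy-layer controls
    }
  }

  public static class SearchPage {
    public void searchProduct(String id) {
      // reuse the operation-layer searchItem(id) shown earlier
    }
  }

  public static class CartPage {
    public void clearShoppingCart() {
      // remove all items currently in the cart
    }
    public boolean isProductCartExist() {
      return false; // query whether the product is present in the cart
    }
  }
}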

From these examples, we can see that a good automated testing framework can significantly lower the threshold for writing test cases and reduce their maintenance cost. For a mature platform, usability is a very important capability. By declaring the testing process in a DSL style, testers can focus on designing and building the test business logic, which greatly improves the efficiency of automated testing implementation.

Regarding the capability model of automated testing frameworks, I will share a document with you. You can click on the cloud drive to access it, and the extraction code is gk9w. This capability model showcases the capabilities that the framework should have at various stages of framework development, including test script encapsulation, test data decoupling, test flow orchestration, and report generation.

Analysis of Automated Testing Results #

So, how do we measure the results of automated testing? The current commonly used approach is coverage, but does increasing test coverage really help to uncover more defects?

A large financial company had achieved 80% unit test coverage and 100% interface coverage, which is quite good as automated testing goes. However, when I asked what proportion of all defects had actually been discovered by automated testing, the answer surprised me: despite the high coverage, only about 5%. Does that mean the return on investment for automated testing is low, considering the amount of effort put into it?

In reality, thinking that automated testing is meant to uncover more defects is a typical misconception. In actual projects, manual testing identifies a much larger number of defects than automated testing. Automated testing is more about helping to maintain the baseline of software quality, especially when applied to regression testing. It ensures that existing functionality works properly and is not compromised by the introduction of new features. It can be said that if automated testing coverage is high enough, then the software quality will definitely not be compromised.

In the field of automated testing, beyond pursuing coverage as a metric, it is just as important to analyze the test results themselves. If automated test results are inaccurate and produce a large number of false positives, they become a distraction for the team. A test false positive is a failed test case whose failure was not introduced by a development change, that is, not caused by a change to the code under test. The common industry definition of the false positive rate is:

Automated test false positive rate = number of failures not introduced by development changes / total number of failed test cases

For example, if 100 test cases were executed in a single automated test run, and 20 of them failed, with 5 of the failures being caused by the current functionality or code changes, meaning they are real defects, then the false positive rate would be: (20 - 5) / 20 = 75%.
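
Expressed as code, the calculation is straightforward; the small helper below simply restates the formula, taking the failure counts of a single run as input.

public class FalsePositiveRate {

  // Restates the formula: failures not introduced by development changes / all failed cases
  public static double of(int failedCases, int realDefects) {
    if (failedCases == 0) {
      return 0.0; // nothing failed, so nothing was misreported
    }
    int falsePositives = failedCases - realDefects; // failures not caused by code changes
    return (double) falsePositives / failedCases;
  }

  public static void main(String[] args) {
    // Example from the text: 20 failed cases, 5 real defects -> (20 - 5) / 20 = 0.75
    System.out.println(FalsePositiveRate.of(20, 5)); // prints 0.75
  }
}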

The false positive rate is a key indicator of the stability of automated testing. The reasons for false positives can vary for different types of tests and product forms. For example, they can be caused by unstable network connections in the test environment, inherent defects in the test scripts or tools, incomplete test data, unavailable test resources, and so on.

Because of false positives, even after automated tests have run and produced results, manual review and judgment are still needed before real issues can be reported to the defect tracking system. This manual step tacked onto the end of every automated run makes it difficult to scale up automated testing, and it is one of the reasons automated testing often ends up feeling like a mixed bag.

So, how can we solve this problem? It relies on the analysis of automated testing results.

  1. Classify the issues identified by automated testing. Determine whether each failure is due to an environmental issue, a network issue, a functional change, or a system defect, and categorize the failed test cases accordingly. When a category accumulates a large number of issues, consider subdividing it further; for example, network issues can be split into network unreachable, delay timeout, domain name resolution errors, and so on.
  2. Improve the automatic identification capability for existing categories. For example, captured common exceptions can be automatically assigned to the corresponding error category based on the exception information, reducing the manual work of identifying and categorizing errors (see the sketch after this list).
  3. Improve the robustness of automated testing tools and environments, and add retry mechanisms for known issues.
  4. Continuously accumulate and enrich the error categories, and make targeted improvements to steadily raise the stability of automated testing.
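
As a sketch of point 2, the classifier below maps common exception signatures to error categories based on keywords in the failure message. The categories and keywords are illustrative assumptions, not a complete taxonomy; real frameworks usually combine exception type, message, and stack trace when making this decision.

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical keyword-based classifier for failed test cases.
public class FailureClassifier {

  private static final Map<String, String> KEYWORD_TO_CATEGORY = new LinkedHashMap<>();
  static {
    KEYWORD_TO_CATEGORY.put("UnknownHostException", "Network - domain name resolution error");
    KEYWORD_TO_CATEGORY.put("SocketTimeoutException", "Network - delay timeout");
    KEYWORD_TO_CATEGORY.put("Connection refused", "Environment - service unavailable");
    KEYWORD_TO_CATEGORY.put("NoSuchElementException", "Script - control locator outdated");
    KEYWORD_TO_CATEGORY.put("AssertionError", "Functional - possible real defect, needs confirmation");
  }

  // Returns the error category for a failure message; unknown failures go to manual triage.
  public static String classify(String failureMessage) {
    for (Map.Entry<String, String> entry : KEYWORD_TO_CATEGORY.entrySet()) {
      if (failureMessage != null && failureMessage.contains(entry.getKey())) {
        return entry.getValue();
      }
    }
    return "Unclassified - manual triage";
  }
}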

Let me share a diagram from one company illustrating the analysis of automated testing results. By categorizing the errors, you can see how they are distributed, make targeted optimizations for the most common false positive types, and establish metrics to track long-term results, ensuring the overall reliability of automated testing. These efforts require long-term investment before they pay off, and that is also the path to maximizing the value of automated testing and improving team capability.

Summary #

To sum up, in this lecture I walked you through four aspects of automated testing: the problems it aims to solve and the scenarios it suits, the path for building it up, the typical approach to framework and tool development, and the key points of result analysis. I hope this helps you form a complete picture of this challenging topic and gives you a clear direction when promoting the development of automated testing capabilities.

Thought-provoking Questions #

What challenges and problems did your company encounter during the automation construction process, and how did you solve them?

Please feel free to write your thoughts and answers in the comments section. Let’s discuss together and learn from each other. If you find this article helpful, please share it with your friends.