Tuesday, April 18, 2006

Ready to Ship? – Using Release Readiness Metrics

This is an article by John Fodeh, published in the first edition of the new EuroSTAR newsletter. It is an excerpt from John's 2005 tutorial in Copenhagen, "Establishing an Effective Test Metrics Programme", which sold out in advance of the conference.


Why Measure?

A metric is a measure. "Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules" (Fenton, Pfleeger). As you know, metrics play a crucial role in the advancement of all sciences, and computer science is no exception. Tom DeMarco stated, "You cannot control what you cannot measure". In software development and testing we use metrics to:

* Understand: Gain insight into the applied processes and identify relationships. We use measurements to establish a baseline and goals for future assessment.
* Assess: Determine status with respect to plans and goals, and then monitor progress.
* Predict: Detect bottlenecks, early warnings of problems and determine what tradeoffs can be made.
* Improve: Identify root causes and learn from historical data.
* Communicate: Visualize status and trends, and share the knowledge with the rest of the team.

What Metrics Are Needed?

Some organisations select a single or a few isolated "golden" metrics to collect and monitor. However, most metrics are not useful when used separately. For example, the number of defects found during testing has limited value until it is combined with other data such as the severity, type and root cause of the defects, the number of defects fixed, the type and effort of testing, the number of defects found by the end-users, etc.

On the other hand, having too many metrics to collect, analyse and report can be infeasible and problematic. There are numerous aspects of software development and testing that can be measured, and it can be tempting to measure them all. However, spamming your team with tons of metrics is certainly not the answer (and will probably not be appreciated either).

Instead of spending your time on a multitude of metrics that you or someone else might find useful sometime in the future, your time is much better spent doing more testing. The solution is a suite of key metrics. This set of metrics should be simple and balanced and help you track progress toward your target, i.e. release readiness. In this respect, applying a metrics paradigm such as Goal/Question/Metric (GQM) can be very effective. GQM provides a formalised framework for developing a metrics programme, prompting you to think "what information do I need?" instead of "what is easy to measure?" In other words, to derive your set of metrics you need to know what you need to know.
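As a hypothetical illustration (the goal, questions and metric names below are invented here, not prescribed by GQM or by this article), a release-readiness GQM breakdown could be recorded as simple data, from which the metric set is derived:

```python
# A hypothetical GQM breakdown for a release-readiness goal.
# Each question refines the goal; each metric answers a question.
gqm = {
    "goal": "Assess whether the product is ready for release",
    "questions": {
        "Is the planned testing complete?": [
            "tests executed / tests planned",
            "test pass rate",
        ],
        "How thorough was the testing?": [
            "requirements coverage",
            "risk coverage",
        ],
        "How stable is the product?": [
            "open high-severity defects",
            "defect-finding rate trend",
        ],
    },
}

def metrics_for(gqm):
    """Derive the flat metric set implied by the goal's questions."""
    return [m for metrics in gqm["questions"].values() for m in metrics]
```

Working top-down like this keeps the set small: every metric collected traces back to a question you actually need answered.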

The following describes some basic metrics that you can use and include in your set of release readiness metrics.

Test progress

Test progress metrics include information concerning the status with respect to the plans, e.g. how many of the planned tests have been run? What is the pass rate of the tests? Throughout development this information will show the current status and help detect bottlenecks and early warnings of trouble, e.g. if we have only completed a fraction of the planned tests and the pass rate is low, is it still possible to finish on time with the available resources? When approaching release this metric will speak for itself; can we release the system with the current test completion status?

Obtained coverage

Coverage metrics hold vital information regarding the thoroughness and completeness of the testing. Coverage can be expressed in terms of requirements, code or risks and provides a means of quantifying the portion of the requirements, code or risks exercised by the applied testing. A low value reveals an insufficient test effort and a risk of latent defects. This metric is typically used in conjunction with the test progress metrics and can reveal details not visible in the progress data alone, for example a situation where many tests have been completed while critical requirements or high-risk areas remain untested.
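A sketch of requirements coverage combined with a risk check, under the assumption (mine, not the article's) that each requirement carries a risk level:

```python
def requirement_coverage(requirements, tested_ids):
    """Return the fraction of requirements exercised by testing, plus
    the high-risk requirements that remain untested.
    `requirements` maps requirement id -> risk level ("high"/"medium"/"low");
    `tested_ids` is the set of requirement ids touched by executed tests."""
    covered = [r for r in requirements if r in tested_ids]
    gaps = [r for r, risk in requirements.items()
            if risk == "high" and r not in tested_ids]
    return len(covered) / len(requirements), gaps
```

A 75% coverage figure looks reassuring on its own; the returned gap list is what exposes the untested high-risk areas the progress data hides.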

System quality factors

Data about system quality factors contain information on different aspects of the product, such as functionality, performance, reliability, installability and conformance to standards (you might also consider using ISO 9126). I prefer to show the test completeness of these quality factors combined with information about the defect density. In this way I can monitor progress in the different areas and detect if some deserve special attention.
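One way to monitor completeness and defect density together, sketched here with illustrative thresholds (the 80% completeness floor and the density ceiling of 2 defects per KLOC are arbitrary assumptions, not recommendations from the article):

```python
def flag_attention(areas, completeness_floor=0.8, density_ceiling=2.0):
    """Flag quality areas that combine low test completeness with high
    defect density (defects per KLOC). Thresholds are illustrative only.
    `areas` maps area name -> {"tests_run", "tests_planned", "defect_density"}."""
    flagged = []
    for name, data in areas.items():
        completeness = data["tests_run"] / data["tests_planned"]
        if completeness < completeness_floor or data["defect_density"] > density_ceiling:
            flagged.append(name)
    return flagged
```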

Found defects

Defect metrics are usually collected by running the appropriate queries against the defect database. The metrics typically include the total number of defects reported, categorised into open and closed (fixed and verified) defects and sorted by severity.

This data delivers a snapshot of the system state at release time, making it possible to take into account the risk and consequences of releasing the system. For example, if the data reveals a large number of open high-severity or unverified defects, then releasing the system at this moment is clearly a high-risk decision. By plotting the found defects over time (or test effort) it is possible to create a defect trend: a graph showing the accumulated number of reported defects as a function of test effort (expressed in test days).
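The snapshot itself is a simple aggregation. A sketch, assuming a defect record shape of my own invention (`status`, `severity`, `verified` fields; real trackers will differ):

```python
def release_snapshot(defects):
    """Summarise a defect list into release-time counts: total defects,
    open high-severity defects, and closed-but-unverified fixes.
    Each defect is assumed to look like
    {"status": "open"|"closed", "severity": "high"|..., "verified": bool}."""
    open_high = sum(1 for d in defects
                    if d["status"] == "open" and d["severity"] == "high")
    unverified = sum(1 for d in defects
                     if d["status"] == "closed" and not d["verified"])
    return {"total": len(defects),
            "open_high": open_high,
            "unverified_fixes": unverified}
```

Non-zero `open_high` or `unverified_fixes` counts at release time are the quantitative form of the high-risk decision described above.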

This graph is typically S-shaped. When testing is started, the defect-finding rate is low, as the functionality of the software is restricted to few areas and because showstoppers might prevent testing in some areas. The defect-finding rate increases with the addition of new functionality and the correction of already found defects. As the software matures, the defect-finding rate starts decreasing, as it becomes harder to find new defects. Ultimately, the graph flattens. Finding further defects at this stage requires a huge test effort and shows that the software is possibly ready for release (or that the limitation of the applied testing technique has been reached).

Monitoring the status on the S-curve helps to determine when to stop testing, i.e. is the curve starting to flatten? It is even possible to extrapolate the graph, providing a predictive evolution of the defect-finding rate and a means of estimating the number of unknown defects.
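One simple way to extrapolate is to fit a saturating growth model to the trend and read off its asymptote. The sketch below uses an exponential-saturation model and a coarse grid search; this is a stand-in of my own choosing, not the specific extrapolation technique the article has in mind, and real reliability-growth models (and fitting methods) are more sophisticated:

```python
import math

def fit_defect_trend(days, cumulative):
    """Fit the saturating model N(t) = a * (1 - exp(-b * t)) to a
    cumulative defect trend by coarse grid search. Returns (a, b),
    where a estimates the total number of defects (found + unknown)
    and b the discovery rate per test day."""
    max_seen = max(cumulative)
    best_a, best_b, best_err = max_seen, 0.0, float("inf")
    for a in range(max_seen, max_seen * 3):            # total defects >= observed
        for b in (i / 100 for i in range(1, 101)):     # rate grid: 0.01 .. 1.00
            err = sum((a * (1 - math.exp(-b * t)) - n) ** 2
                      for t, n in zip(days, cumulative))
            if err < best_err:
                best_a, best_b, best_err = a, b, err
    return best_a, best_b
```

The estimated number of still-unknown defects is then `a` minus the defects found so far; a small difference suggests the curve has effectively flattened.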

Customer feedback

User feedback during development is of major importance. During testing we normally verify that the system performs as specified, i.e. "are we building the system right?" The user evaluation data helps you answer the question: "are we building the right system?"


Successful test management involves making complex decisions. These decisions need to be based on solid, quantitative data and well-calculated risk. Using a simple set of metrics you can get a snapshot of the system state and quality that is useful throughout the entire development process. In the closing stages, however, possessing the right metrics has tremendous value, in particular when you need to find the proper timing for release.

