The Library of Software Testing

Pavankumar Pothuraju's weblog

Coverage and Effective Testing

There should be a clear distinction between evaluating tests and evaluating testing. Evaluating one test case is generally extremely easy: the product (or that area of the product) either passes or fails the test case. But how do we know that the test cases we ran satisfied the overall purpose of the requirements? In other words, how do we say that the test cases demonstrated requirements, found faults/failures, and exercised code?

To evaluate how well the test cases fulfilled this tripartite purpose, we need a measure of overall test quality. In general, if we want to measure our test quality, we include three elements in our measure: requirements coverage, failure coverage, and code coverage. Notice I said we "want" to do this. I probably should not say that, because it seems that this is not what is often being done. To wit, it is the latter two that often get left off by many testers, even though they consider what they are doing "testing". I would disagree, at least once one starts speaking to how effectively they are testing. Even requirements coverage is often done poorly, which is odd because it is usually the easiest of the three. So let us consider the easiest first and, after that, the two that are less often done.

Requirements Coverage
Generally an effective tester will count how many requirements must be validated. Call this VR (Validated Requirements). Then an effective tester will attempt to trace each requirement to all test cases that exercise that particular requirement. After that, effective testers count all of the test cases that have passed. When all test cases that exercise one specific requirement pass, that requirement is said to have passed. The count of all passed requirements can be called RQ (Requirements Passed). As a final action, and to report on requirements coverage, effective testers calculate a requirements coverage factor by dividing RQ by VR (so that 1.0 means that tests for all requirements have passed). This is an objective measurement of how thoroughly requirements are demonstrated. (Caveat: One would and should expand this metric to include input domain and output domain coverage because while the above tells you the thoroughness of testing, it does not necessarily speak to its actual comprehensiveness.)
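As a rough sketch of the RQ / VR calculation described above, assuming a hypothetical traceability mapping (requirement id to test case ids) and a hypothetical results mapping (test case id to pass/fail), neither of which comes from a specific tool:

```python
def requirements_coverage(trace, results):
    """Return RQ / VR.

    trace:   requirement id -> list of test case ids that exercise it (assumed format)
    results: test case id -> True if the test case passed (assumed format)
    """
    vr = len(trace)  # VR: requirements that must be validated
    # RQ: a requirement passes only when it has tests and every one of them passed.
    rq = sum(
        1
        for tests in trace.values()
        if tests and all(results.get(t, False) for t in tests)
    )
    return rq / vr if vr else 0.0


# Illustrative, made-up data.
trace = {"R1": ["T1", "T2"], "R2": ["T3"], "R3": ["T4"], "R4": []}
results = {"T1": True, "T2": True, "T3": False, "T4": True}
print(requirements_coverage(trace, results))
```

Treating a requirement with no traced test cases as "not passed" is a design choice in this sketch; it keeps untested requirements from inflating the factor.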

Fault/Defect Coverage
Now what about this question: how well did your test cases find failures? Many testers count failures. That is an easy task, however; a truly effective tester should also try to estimate (via prediction or forecasting methods) how many defects (i.e., the causes of failures) were in the product before testing. Why is that a good thing? This is a crucial point: when testers do not know how many defects were present to begin with, they cannot make an accurate assessment of how well they have probed for defects or failures. It is so crucial that it often surprises me that there is ever debate about it. (Debate about how to do it is one thing and is healthy. Debate about whether it should even be done is another thing entirely, and is usually counterproductive.) By a simple extension of this logic, testers also cannot know how many defects or failures were (or might have been) left in the product to be passed on to the end users. If testers cannot predict how many failures customers might potentially find, they simply cannot evaluate quality in terms of failure or defect coverage. The point is simple: if you do not have estimates of how many faults/failures you might find in a given product, you cannot fully determine the reliability of the test effort, because you have no baseline to compare your results against.
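To make the comparison concrete, here is a minimal sketch. It assumes an estimate of the total defect count is already available from some prediction method; the function names and the numbers are illustrative only.

```python
def defect_coverage(found, estimated_total):
    """Fraction of the estimated defects that testing has found so far."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    # A result above 1.0 suggests the original estimate was too low.
    return found / estimated_total


def estimated_remaining(found, estimated_total):
    """Defects that may still be in the product, according to the estimate."""
    return max(estimated_total - found, 0)


# Made-up numbers: 40 defects found against an estimate of 50.
print(defect_coverage(40, 50))
print(estimated_remaining(40, 50))
```

Without the estimate, only the raw count of 40 would be available, and there would be nothing to compare it with.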

Code Coverage
Code coverage is also important, but it is another thing that most testers I encounter think of incorrectly. They tend to treat "code coverage" as synonymous with "code testing". That could not be more incorrect. Code testing is all about creating test cases that exercise the code. Code coverage is all about evaluating test cases. The focus is entirely different. So the idea is (hopefully) to write test cases from requirements specifications. When you run the requirement-based test cases against actual code (whether at a unit or system level), you are then supposed to take measurements to determine which statements and branches have been exercised (covered). That is where coverage comes in.
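The difference between running a test and measuring what it covered can be shown with a small sketch. It uses Python's standard sys.settrace hook to record which lines of a function a test input executes. The helper executed_lines and the sample function classify are made up for illustration; a real project would normally use a dedicated coverage tool instead.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args) and return the line offsets (from the def line) that executed."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only trace the function under measurement.
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hit


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


# One test input leaves the "negative" line unexecuted; adding a second covers it.
covered = executed_lines(classify, 5)
covered_both = covered | executed_lines(classify, -1)
print(sorted(covered), sorted(covered_both))
```

The test case with input 5 passes or fails on its own terms; the coverage measurement is a separate judgement about what that test left untouched.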

Evaluating your tests is very different from evaluating your overall testing. I have found that test professionals tend to concentrate more on the former than on the latter. I think fault and code coverage (by their correct definitions) are often the things most overlooked in effective testing practices.

Test Coverage - Defect Coverage

The use of test coverage measures (block coverage is just one example) to control the software test process has become an increasingly common practice. This is often justified by the assumption that higher test coverage helps achieve higher defect coverage and therefore improves software quality. That is, however, an assumption. In practice, the available data often seem to indicate that defect coverage and test coverage both grow over time, as additional testing is performed. It is unclear, however, whether this concurrent growth of both coverages (defect and test) is due to some sort of inherent causal dependency or simply due to the cumulative nature of both measures.

To my way of thinking, being able to answer this is somewhat important because it could point the way towards a procedure that determines whether a given test coverage measure should be monitored for quality control and used to drive a testing effort. Such a procedure would check whether a given test coverage criterion has a genuine additional impact on defect coverage, compared to the impact of just running additional test cases, and whether the results support a causal dependency of any sort.

The background for this is that a great deal of research is always directed towards the development of new, improved test methods. It has long been known that one way to better control testing - and thus to improve test resource allocation - is to measure estimators (referred to as test coverage) of the percentage of defects detected during testing (referred to as defect coverage). This has led to the construction of defect coverage models based on test coverage measures. Again, however, the basic assumption is made that there is some (ostensibly significant) causal effect between test coverage and defect coverage. As some commentators have pointed out, since both test coverage and defect coverage increase with test intensity or time, it is not really all that surprising that the data usually show a relationship. It is faulty logic, however, to jump to the conclusion that additional test coverage actually drives the detection of new defects. So, in light of the procedure mentioned above, one thing to ask is how we could test whether a given test coverage measurement, or several of them combined, actually has a significant impact on defect coverage. (This speaks very much to determining how effectively we are testing.)

The existing hypothesis that has been put forth is that test coverage leads to defect coverage. And, of course, relative to this type of discussion, test coverage is measured as the percentage of constructs - as defined by the coverage criterion being used - that have been executed at least once during testing. (So, for example, in statement coverage, the coverage criterion is statements or lines of code.) It is important to study this, however, because some researchers have suggested that, in reality, a more likely explanation of any alleged empirical relationship between a test coverage measure and defect coverage is that they are both driven by more testing (referred to as test intensity). But how do we go about studying it? Well, one way would be to try to determine whether test coverage has any additional impact on defect coverage as compared to test intensity alone. In other words, this would be equivalent to assessing whether test coverage is still a statistically significant indicator of defect coverage, when the effect of test intensity has already been accounted for.
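One possible way to frame "additional impact once test intensity is accounted for" is a comparison of two nested least-squares models: defect coverage explained by test intensity alone, and by test intensity plus test coverage. The sketch below computes a partial F statistic for that comparison in plain Python. This is my own illustration of the idea, not a procedure taken from any particular study, and the data in the usage example are invented purely to show the call.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares when y is fitted on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(intensity, coverage, defect_coverage):
    """F statistic for adding test coverage to a model that already has test intensity."""
    n = len(defect_coverage)
    rss_reduced = rss([intensity], defect_coverage)
    rss_full = rss([intensity, coverage], defect_coverage)
    # One added predictor; the full model has three fitted parameters.
    return (rss_reduced - rss_full) / (rss_full / (n - 3))


# Invented observations, one per test period (illustration only).
intensity = [1, 2, 3, 4, 5, 6]
coverage = [0.1, 0.3, 0.2, 0.6, 0.5, 0.9]
defects = [0.11, 0.24, 0.26, 0.49, 0.51, 0.74]
print(partial_f(intensity, coverage, defects))
```

The statistic would be compared against an F distribution with 1 and n - 3 degrees of freedom. A large value suggests that test coverage explains variation in defect coverage beyond what test intensity already explains; on its own it still does not establish causation.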

Another way to look at it might be to try to determine whether the combined effect of test intensity and test coverage can better explain the variations in defect coverage than test intensity alone. If that were shown to be the case, then one could provisionally conclude that there is reason to believe that test coverage has a significant additional impact on defect coverage. It probably pays to mention that a likely axiom of this whole study would be that in those situations where a test effort is primarily driven by test coverage, test intensity and test coverage cannot be differentiated, or at least not as easily. To determine whether test coverage is the main driver of the testing process, one could fairly easily check the relationship between the number of test cases executed and the increase in test coverage.

Functional and System Testing: Coverage

For what is traditionally called functional testing there are certain steps that one is supposed to take:

  • Decompose and analyze the functional design specification.
  • Partition the functionality into logical components and for each component, make a list of the detailed functions.
  • For each function, use the analytical black-box methods to determine inputs and outputs.
  • Develop the functional test cases.
  • Develop a function coverage matrix.
  • Execute the test cases and measure the logic coverage.
  • Develop additional functional tests, as indicated by the combined logic coverage of function and system testing.

With this talk of function coverage, the idea is to measure it as follows: The execution of a given test case against program P will exercise (cover) certain parts of P's external functionality. A measure of testedness for P is the degree of function coverage produced by the collective set of test cases for P. Function coverage can be measured with a function coverage matrix. What are often referred to as "black-box testing methods" are used to increase function coverage. A function coverage matrix is generally a matrix or table listing specific functions to be tested, the priority of the testing for each function, and the test cases that contain tests for each function.
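A function coverage matrix of the kind just described can be sketched as a simple table in code. The function names, priorities, and test case ids below are hypothetical, and counting a function as covered when at least one test case is listed for it is an assumption of this sketch.

```python
# Hypothetical matrix: function under test -> (priority, test cases containing tests for it).
matrix = {
    "set time": ("high", ["TC-01", "TC-02"]),
    "set date": ("high", ["TC-03"]),
    "tab between fields": ("low", []),
}


def function_coverage(matrix):
    """Fraction of listed functions that at least one test case exercises."""
    covered = [name for name, (_, tests) in matrix.items() if tests]
    return len(covered) / len(matrix)


def untested_functions(matrix):
    """Functions with no test case yet, i.e. where additional tests are needed."""
    return [name for name, (_, tests) in matrix.items() if not tests]


print(function_coverage(matrix))
print(untested_functions(matrix))
```

Keeping the priority next to each function makes it easy to see which gaps matter most when deciding what to test next.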

Regarding what we call system testing, there are certain steps that one is supposed to take:

  • Decompose and analyze the requirements specification.
  • Partition the requirements into logical categories and, for each category, make a list of the detailed requirements.
  • For each type of system testing do this: for each relevant requirement, determine inputs and outputs; develop the requirements test cases.
  • Develop a requirements coverage matrix, which is simply a table in which each entry describes a specific subtest that adds value to the requirements coverage, the priority of that subtest, and the specific test cases in which that subtest appears.
  • Execute the test cases and measure logic coverage.
  • Develop additional tests, as indicated by the combined coverage information.

With system testing we are more concerned with requirements coverage and this can be measured as follows: The execution of a given test case against program P will address (cover) certain requirements of P. A measure of testedness for P is the degree of requirements coverage produced by the collective set of test cases for P. It should be noted that system testing is not just the process of function testing the completely integrated system or program. It is the process of determining that a program or system does or does not meet its original requirements and objectives. This is tricky because requirements must be specific enough to be testable but general enough to allow freedom in the functional design. System tests are designed by analyzing the requirements specification and then formulated by analyzing the functional design specification or user documentation.

A Coverage Example

This example has been adapted from one by James Whittaker in his article "What is Software Testing? And Why Is It So Hard?" in IEEE Software for Jan/Feb 2000. Consider the source code example below:

====================
Input = GetInput()
While (Input != Alt-F4) do
   Case (Input = Time)
      If ValidHour(Time.Hour) and ValidMin(Time.Minute) and
         ValidSec(Time.Second) and ValidAP(Time.AmPm)
      Then
         UpdateSystemTime(Time)
      Else
         DisplayError("Invalid Time.")
      Endif

   Case (Input = Date)
      If ValidDay(Date.Day) and ValidMnth(Date.Month) and
         ValidYear(Date.Year)
      Then
         UpdateSystemDate(Date)
      Else
         DisplayError("Invalid Date.")
      Endif

   Case (Input = Tab)
      If TabLocation = 1
      Then
         MoveCursor(2)
         TabLocation = 2
      Else
         MoveCursor(1)
         TabLocation = 1
      Endif

   Endcase
   Input = GetInput()
Enddo
====================

This is sample code from a small application that presents a small clock interface that allows the user to change the date and time as well as tab between the two fields or close the application (via pressing Alt+F4). Now ask: How many test cases does it take to fully cover, or exercise, the source code? To determine this, we evaluate each condition to both true and false by means of a truth table. We thus execute not only each source statement, but we also cover each possible branch in the software. The truth table after the source code documents each possible combination of conditions in the While loop, the three parts of the Case statement, and the nested If statements.

Possible Cases  While  Case 1  If 1  Case 2  If 2  Case 3  If 3
1               F      -       -     -       -     -       -
2               T      T       T     -       -     -       -
3               T      T       F     -       -     -       -
4               T      F       -     T       T     -       -
5               T      F       -     T       F     -       -
6               T      F       -     F       -     T       T
7               T      F       -     F       -     T       F
8               T      F       -     F       -     F       -

These eight possible cases cover only statements and branches. When we consider how each complex condition in the If statements actually gets evaluated, we must add several more cases. Although there is only one way for these statements to evaluate true (that is, every condition must be true for the statement to be true), there is more than one way for the first two If statements to evaluate false. In fact, we would find that there are 2^x - 1 ways (where x is the number of conditions in the statement). Using this logic, there are 2^4 - 1 = 15 ways to execute the third case and 2^3 - 1 = 7 ways to execute the fifth (cases 3 and 5 in the table above). Together with the six remaining cases, that gives a total of 6 + 15 + 7 = 28 test cases. Now, imagine how many test cases would be required to test a software system with a few hundred thousand lines of code and thousands of such complex conditions to evaluate. It is easy to see why software is commonly released with unexecuted source code. (In addition to covering the source code, testers also must think about missing code. The fact that the Case statement has no default case could present problems. This also does not take into account any interface that operates as a front-end to the code.)
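The count can be checked by enumerating the truth assignments directly. This is a small sketch of the arithmetic only; the helper name false_outcomes is made up, and the condition counts are read off the pseudocode above (four conditions in the Time check, three in the Date check).

```python
from itertools import product


def false_outcomes(num_conditions):
    """All truth assignments for which an AND of the conditions evaluates to false."""
    return [combo for combo in product([True, False], repeat=num_conditions)
            if not all(combo)]


time_false = len(false_outcomes(4))  # ValidHour, ValidMin, ValidSec, ValidAP
date_false = len(false_outcomes(3))  # ValidDay, ValidMnth, ValidYear
other_cases = 8 - 2                  # truth-table rows other than cases 3 and 5
total_cases = other_cases + time_false + date_false
print(time_false, date_false, total_cases)
```

Each additional condition in a compound test doubles the number of assignments, which is why the total grows so quickly for larger systems.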

 


GlobalTester, TechQA, Copyright © 2002 All Rights Reserved. (Labelled with ICRA)

Posted on Thursday, July 08, 2004 6:13 PM | Filed Under [ Software Testing ]
