Java Forum / General / March 2006
unit testing guidelines
Jacob - 18 Mar 2006 00:00 GMT
I have compiled a set of unit testing recommendations based on my own experience with the concept.
Feedback and suggestions for improvements are appreciated:
http://geosoft.no/development/unittesting.html
Thanks.
Hendrik Maryns - 18 Mar 2006 18:36 GMT
Jacob wrote the following on 03/18/2006 12:03 AM:
> I have compiled a set of unit testing
> recommendations based on my own experience
> [quoted text clipped - 4 lines]
>
> http://geosoft.no/development/unittesting.html
Nice work.
I don't totally agree with point 16: a throws statement means an exception *might* be thrown, and the circumstances under which this can happen should be documented. It is seldom that an exception must be thrown.
You might want to give some explanation of what your assertX methods do.
H.
 Signature Hendrik Maryns
================== www.lieverleven.be http://aouw.org
Daniel T. - 18 Mar 2006 20:41 GMT
> [quoted text clipped - 15 lines]
> exception *might* be thrown, and the circumstances under which this can
> happen should be documented. It is seldom that an exception must be thrown.
I agree. Only test what you actually want the client code to rely on. Now if you want the client code to rely on the method throwing an exception...
 Signature Magic depends on tradition and belief. It does not welcome observation, nor does it profit by experiment. On the other hand, science is based on experience; it is open to correction by observation and experiment.
Jacob - 20 Mar 2006 08:10 GMT
> I don't totally agree with point 16: a throws statement means an
> exception *might* be thrown, and the circumstances under which this can
> happen should be documented. It is seldom that an exception must be thrown.
I assume the conditions under which an exception is thrown are deterministic and well documented (though this depends to a large extent on *documentation* rather than language syntax, which is a problem, as documentation is inherently inaccurate).
The simple example is the Java List.get(int index) method, which is documented to throw an exception if index < 0. This is the contract, and this is one of the things I want to test in a unit test.
Recommendation 16 just indicates how this is done in practice.
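A minimal sketch of such a contract test, written in plain Java without a test framework. The class and helper names are made up for illustration; only List.get and its documented IndexOutOfBoundsException are real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from the guidelines.
class ListGetContractDemo {

    // Returns true if list.get(index) throws IndexOutOfBoundsException,
    // which is what the List documentation promises for index < 0.
    static boolean getThrowsOutOfBounds(List<String> list, int index) {
        try {
            list.get(index);
            return false;   // no exception: the contract would be violated
        } catch (IndexOutOfBoundsException expected) {
            return true;    // documented behaviour
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        System.out.println(getThrowsOutOfBounds(list, -1));  // negative index
        System.out.println(getThrowsOutOfBounds(list, 0));   // valid index
    }
}
```

The test only relies on what the documentation states, so it would apply equally to any other List implementation.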
Ian Collins - 18 Mar 2006 22:52 GMT
> I have compiled a set of unit testing
> recommendations based on my own experience
> [quoted text clipped - 4 lines]
>
> http://geosoft.no/development/unittesting.html
I'd add point 0 - write the tests first.
8 - names should be more expressive. Rather than testSaveAs(), how about a series of tests: testSaveAsCreatesANewFile(), testSaveAsSavesCurrentDataInNewFile(), etc. Often tests with a broad name attempt to test too much and don't express their intent.
Point 0 covers point 11.
13 - take care with random numbers, they can lead to failures that are hard to reproduce. I'd use a pseudo-random sequence that is repeatable with a given seed.
0 and 8 cover 14.
0 covers 17.
0 covers 20.
 Signature Ian Collins.
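The repeatable pseudo-random sequence suggested under point 13 above could look roughly like this. It is a sketch only: the class and method names are invented, and it relies on the documented property of java.util.Random that two instances created with the same seed produce the same sequence.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, for illustration only.
class SeededInputDemo {

    // Generates `count` pseudo-random ints from a fixed seed, so a failing
    // run can be repeated exactly by reusing the seed.
    static int[] inputs(long seed, int count) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt();
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = inputs(42L, 5);
        int[] second = inputs(42L, 5);
        // Same seed, same sequence.
        System.out.println(Arrays.equals(first, second));
    }
}
```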
Jacob - 20 Mar 2006 08:28 GMT
> I'd add point 0 - write the tests first.
Personally, I find the XP approach to unit testing a bit too restrictive and therefore left the issue intentionally open. I would really like more feedback on it though: although I have practiced unit testing for years, I never adopted this practice myself.
> 8 - names should be more expressive, rather than testSaveAs(), how about
> a series of tests, testSaveAsCreatesANewFile(),
> testSaveAsSavesCurrentDataInNewFile() etc. Often tests with a broad name
> attempt to test too much and don't express their intent.
Agree. I think this is basically what's in #8 without being too verbose.
> Point 0 covers point 11.
I am not sure it does, and I wanted to define the two concepts "execution coverage" and "test coverage" anyway. There is a blurred distinction between the two in the literature as far as I have been able to dig up.
> 13 - take care with random numbers, they can lead to failures that are
> hard to reproduce. I'd use a pseudo-random sequence that is repeatable
> with a given seed.
>
> 0 and 8 covers 14.
To some degree, but I'd include them even if #0 was there. I don't see "testing first" as a silver bullet, but more as a different process approach.
> 0 covers 17.
Not necessarily. #0 states when to write the tests. #17 states that the *code* should be written so that the workload of the unit testing is minimized.
> 0 covers 20.
Yes, assuming everything is always tested. But in that case it is covered without #0 as well. What I see in the industry today is a major shift towards adding unit testing to legacy code. I added #20 as a suggestion to start this work at the bottom level.
Ian Collins - 20 Mar 2006 09:58 GMT
>> I'd add point 0 - write the tests first.
>
> Personally, I find the XP approach to unit testing a bit too restrictive
> and therefore left the issue intentionally open. I really like
> more feedback on it though, as though I have practiced unit testing
> for years, I never adopted this practice myself.
TDD is more than an approach to unit testing; it is an approach to the full design-test-code cycle.
>> 8 - names should be more expressive, rather than testSaveAs(), how
>> about a series of tests, testSaveAsCreatesANewFile(),
[quoted text clipped - 10 lines]
> distinction between the two in the literature as far as I have been
> able to dig up.
TDD done well will give you 100% execution coverage for free. How good your test coverage is depends on how good you are at thinking up edge cases to test.
>> 13 - take care with random numbers, they can lead to failures that are
>> hard to reproduce. I'd use a pseudo-random sequence that is
[quoted text clipped - 5 lines]
> see "testing first" as a silver bullet, but more as a different
> process approach.
Simple, incremental tests are the essence of good TDD.
>> 0 covers 17.
>
> Not necessarily. #0 states when to write the tests. #17 states that
> the *code* should be written so that the workload of the unit testing
> is minimized.
If you start with the tests, the code will have to be written that way.
>> 0 covers 20.
>
> Yes, assuming everything is tested always. But in that case
> it is covered without #0 as well. What I see in the industry today
> is a major shift in adding unit testing to legacy code. I added
> #20 as a suggestion to start this work at the bottom level.
Very true.
 Signature Ian Collins.
Andrew McDonagh - 20 Mar 2006 20:51 GMT
>>> I'd add point 0 - write the tests first.
>> [quoted text clipped - 5 lines]
> TDD is more than an approach to unit testing, it is an approach to the
> full design-test-code cycle.
More fundamentally, TDD is a design methodology, not a testing methodology.
It just happens to use unit tests as its means of describing the design, much like RUP uses UML.
Indeed, some TDD practitioners are starting to call it BDD - as in
http://www.google.co.uk/search?hl=en&q=behaviour+driven+development&btnG=Google+Search&meta=
>>> 8 - names should be more expressive, rather than testSaveAs(), how
>>> about a series of tests, testSaveAsCreatesANewFile(),
[quoted text clipped - 12 lines]
>>
> TDD done well will give you 100% execution coverage for free.
I'd clarify that with 'TDD done *Correctly will give you 100% execution coverage'.
*Correctly = write one failing test case, write only enough code to make the test pass, refactor to remove duplication, repeat.
More commonly referred to as Red, Green, Refactor.
> How good your test coverage is depends on how good you are at thinking up edge
> cases to test.
Always starting with the test first only allows for 100%.
>>> 13 - take care with random numbers, they can lead to failures that
>>> are hard to reproduce. I'd use a pseudo-random sequence that is
[quoted text clipped - 7 lines]
>>
> Simple, incremental tests are the essence of good TDD.
These kinds of tests are not unit tests in the TDD sense; they are stress tests that happen to be written in the same framework as the TDD unit tests.
However, looping over a random set of numbers isn't the best approach to this style of testing. If the OP wants to do this style, then using one of the various Agitating frameworks/products will give a better result.
These tools tend to use byte code manipulation to randomly change various values, which aren't just numbers but anything: int, long, float, Integer, Double, String, Boolean, boolean, introducing nulls, etc.
See http://www.agitar.com/
Phlip - 20 Mar 2006 21:56 GMT
>>>> I'd add point 0 - write the tests first.
>>>
>>> Personally, I find the XP approach to unit testing a bit too restrictive
I find debugging a bit too restrictive. I can't just use Undo to make the bug go >poof<.
Imagine if you had such a button on your debugger! You would hit it all the time!
You have such a button; it's just a little more expensive than raw code. The cost savings - no more debugging - overwhelmingly offset that cost.
>> TDD is more than an approach to unit testing, it is an approach to the
>> full design-test-code cycle.
[quoted text clipped - 5 lines]
>
> Indeed, some TDD practitioners are starting to call it BDD - as in http://www.google.co.uk/search?hl=en&q=behaviour+driven+development&btnG=Google+Search&meta=
And some call it Test First Programming, because TDD is positioned to replace the hideous name "eXtreme Programming".
And it doesn't create "unit tests", which are a different topic entirely.
The failure of a unit test implicates only one unit - such as the Ariane V engine controller.
The failure of a _Developer_ Test implicates the developer's last edit. Time to hit Undo.
>> TDD done well will give you 100% execution coverage for free. That's not exhaustive.
TDD done well will reduce the _odds_ that you need exhaustive unit testing.
 Signature Phlip http://www.greencheese.org/ZeekLand <-- NOT a blog!!!
Ian Collins - 21 Mar 2006 02:50 GMT
>>> I am not sure it does, and I wanted to define the two concepts
>>> "execution coverage" and "test coverage" anyway. There is a blurred
[quoted text clipped - 17 lines]
>
> Always starting with the test first, only allows for 100%.
I was using the OP's definition of "test coverage". It might just be me, but I've always had testers or users (normally testers) find some bizarre use case that wasn't catered for in the original user stories or unit tests.
 Signature Ian Collins.
Phlip - 21 Mar 2006 02:57 GMT
> I was using the OP's definition of "test coverage". It might just be me,
> but I've always had testers or users (normally testers) find some bizarre
> use case that wasn't catered for in the original user stories or unit
> tests.
That's why, regardless of your unit testing strategy, you work to lower the cost of acceptance tests, so anyone can write them, and they come up with all sorts of things.
Hence all of XP is driven by tests.
 Signature Phlip http://www.greencheese.org/ZeekLand <-- NOT a blog!!!
Andrew McDonagh - 21 Mar 2006 21:38 GMT
>> Always starting with the test first, only allows for 100%.
>>
> I was using the OP's definition of "test coverage". It might just be
> me, but I've always had testers or users (normally testers) find some
> bizarre use case that wasn't catered for in the original user stories or
> unit tests.
But we are talking about unit testing here - developers write and run unit tests.
Users/testers don't unit test - they run acceptance (integration, system) tests.
Everyone runs the acceptance tests.
Timo Stamm - 18 Mar 2006 23:24 GMT
Jacob wrote:
> I have compiled a set of unit testing
> recommendations based on my own experience
> on the concept.
>
> Feedback and suggestions for improvements
> are appreciated:
| 7. Keep tests close to the class being tested
|
| If the class to test is Foo the test class should be called FooTest
| and kept in the same package (directory) as Foo. The build environment
| must be configured so that the test classes doesn't make its way into
| production code.
It is necessary to have test classes in the same package as the tested class in order to test package private methods.
But you don't have to put the classes in the same directory. Most IDEs support several source folders. You can set up two source folders, for example "src" for your application source and "test" for your test source. If you use the same package structure in the test source folder, you can test package private methods and it is very easy to deploy only application code.
Timo
Ian Collins - 19 Mar 2006 00:13 GMT
> Jacob wrote:
> [quoted text clipped - 14 lines]
> It is necessary to have test classes in the same package as the tested
> class in order to test package private methods.
Another view is that tests that require access to private methods are a design smell. Often these can be refactored into objects that can be tested in isolation.
In C++, it's very tempting to make the test class a friend of the class under test. I've found that I end up with a better design by resisting this temptation.
 Signature Ian Collins.
Hendrik Maryns - 19 Mar 2006 00:49 GMT
Ian Collins wrote the following on 03/19/2006 12:13 AM:
>> Jacob wrote:
>> [quoted text clipped - 18 lines]
> design smell. Often these can be refactored into objects that can be
> tested in isolation.
I was about to answer the same: shouldn't problems in package private methods spill through to public methods? Then why test them separately? Find an error in a public method and retrace it with your favorite debugger to the package private method, I'd say (without much experience, so correct me if I'm wrong).
H.
 Signature Hendrik Maryns
================== www.lieverleven.be http://aouw.org
Bent C Dalager - 19 Mar 2006 01:00 GMT
> I was about to answer the same: shouldn't problems in package private
> methods spill through to public methods? Then why test them separately?
It makes it more time-consuming to find out where the error is.
> Find an error in a public method and retrace it with your favorite
> debugger to the package private method, I'd say (without much
> experience, so correct me if I'm wrong).
I prefer my unit tests to have obvious failure modes, so that I can tell from which test failed exactly where in my source the bug is. This means I don't have to muck around with a debugger; I can just fix it and get on with things.
For this to be the case, however, the methods that I test need to be reasonably small and not do a whole lot. These are my private helper methods that I invoke from my more involved algorithm methods. Many are one or two liners and they generally don't make sense to have publicly accessible since they're really just internal building blocks for constructing other more interesting methods.
Cheers Bent D
 Signature Bent Dalager - bcd@pvv.org - http://www.pvv.org/~bcd powered by emacs
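A small sketch of the style described in the post above: a one-line package-private helper with its own direct checks, plus a public method built on top of it. The Gauge class and its methods are invented for illustration and do not come from the thread.

```java
// Hypothetical class: a public method built from a tiny internal helper.
class Gauge {

    // Package-private building block. It is small enough that a failing
    // check points straight at this line.
    static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    // Public method that relies on the helper (assumes total != 0).
    public static int percentOf(int value, int total) {
        return clamp(value * 100 / total, 0, 100);
    }

    public static void main(String[] args) {
        // Direct checks of the helper, possible from the same package.
        System.out.println(clamp(-5, 0, 100));
        System.out.println(clamp(150, 0, 100));
        // Check of the public method that uses it.
        System.out.println(percentOf(1, 4));
    }
}
```

If clamp() were only exercised through percentOf(), a failure would leave two candidates for the bug instead of one.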
Ian Collins - 19 Mar 2006 02:27 GMT
>>>| 7. Keep tests close to the class being tested
>>>|
[quoted text clipped - 15 lines]
> debugger to the package private method, I'd say (without much
> experience, so correct me if I'm wrong).
As Bent said, you are testing too much with your tests. A golden rule is not to rely on indirect tests.
I've recently come to the conclusion (while working with PHP which doesn't have a handy debugger) that resorting to the debugger is a strong indicator that your tests aren't fine grained enough. Try working without one for a while and see your tests improve!
 Signature Ian Collins.
Timo Stamm - 19 Mar 2006 01:23 GMT
Ian Collins wrote:
>> It is necessary to have test classes in the same package as the tested
>> class in order to test package private methods.
>>
> Another view that tests that require access to private methods are a
> design smell. Often these can be refactored into objects that can be
> tested in isolation.
Not "private", but "package private".
Package private classes are only visible within the same package (same directory). They are useful in large APIs where you have a lot of functionality, but only want to expose a small interface.
Timo
Ian Collins - 19 Mar 2006 02:22 GMT
> Ian Collins wrote:
> [quoted text clipped - 10 lines]
> directory). They are useful in large APIs where you have a lot of
> functionality, but only want to expose a small interface.
I see, a concept not shared with C++.
 Signature Ian Collins.
Jacob - 20 Mar 2006 08:45 GMT
> Ian Collins wrote:
> [quoted text clipped - 10 lines]
> directory). They are useful in large APIs where you have a lot of
> functionality, but only want to expose a small interface.
I regard this as "private" in this context. An error in the inner logic between classes of the same package (or *friends* in C++ syntax) will eventually reveal itself through the public API.
I want to keep test classes close to the class being tested for practical reasons rather than technical reasons.
I understand the objection of "testing too large chunks of code" (Ian C.), but test code adds complexity and workload to your system after all, and I really want to keep it to a minimum. That's why I reduce the public API of classes as much as possible (by heavy use of package private methods, for instance) and insist on testing the public API only.
But I don't claim that this is the only way, and it might well depend on the nature of the project being tested.
Jacob - 20 Mar 2006 08:36 GMT
> Another view that tests that require access to private methods are a
> design smell. Often these can be refactored into objects that can be
[quoted text clipped - 3 lines]
> under test. I've found that I end up with a better design by resisting
> this temptation.
This is my experience as well, and the reason why I added recommendation #9 "Test public API".
That something is technically feasible (private method testing through reflection or by other means) doesn't necessarily mean it is a good idea.
You need to draw the line somewhere, and the public API seems quite natural in this case. This is also more robust against changes, in that it will be more stable and require less test maintenance during code refactoring.
Timo Stamm - 19 Mar 2006 01:59 GMT
Timo Stamm wrote:
> | 7. Keep tests close to the class being tested
> |
[quoted text clipped - 12 lines]
> test package private methods and it is very easy to deploy only
> application code.
Oops, I didn't realize that the guidelines aren't java-specific and that this thread is on c.l.java.p as well as c.l.c++.
My objection is specific to java. I doubt that the same applies to c++.
Adam Maass - 20 Mar 2006 07:42 GMT
> I have compiled a set of unit testing
> recommendations based on my own experience
[quoted text clipped - 6 lines]
>
> Thanks.
I strongly object to number 13. Unit-tests, especially in an automated framework, should be repeatable. (When a test fails, you need to know on what inputs it failed. Once you fix the failure, you should hard-code the inputs it failed on so that subsequent changes do not cause a regression of the error.)
I don't necessarily object to looping over large numbers of inputs and testing each one for expected outputs. But a unit test should contain no randomness at all. (Or at least should have a way of specifying the seed for the randomness generator(s).)
-- Adam Maass
Jacob - 20 Mar 2006 08:54 GMT
> I strongly object to number 13. Unit-tests, especially in an automated
> framework, should be repeatable. (When a test fails, you need to know on
> what inputs it failed. Once you fix the failure, you should hard-code the
> inputs it failed on so that subsequent changes do not cause a regression of
> the error.)
I understand your objection, but this is actually one of the mechanisms that has helped me find some of the hardest to trace and most subtle errors in the code. It has proven to be extremely helpful. Also, it gives me lots of confidence knowing that my test suite of several thousand tests is executed every hour with different input each time. It is like adding another dimension to unit testing.
But I agree that tests must be reproducible: I added #15 to ensure that when a test fails, the test report includes exactly the input parameters it failed with. Then you can add a test with this explicit input and debug it from there.
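As a rough illustration of the reporting idea in #15, the sketch below runs a random check and, on failure, returns a message naming both the seed and the exact failing input. Everything here is hypothetical: magnitude() is a deliberately buggy stand-in for the unit under test, and the check() helper is not part of any framework.

```java
import java.util.Random;

// Hypothetical demo class, for illustration only.
class ReportingRandomTest {

    // Stand-in unit under test with a deliberate bug: it should return
    // the absolute value, but it is wrong for negative input.
    static int magnitude(int v) {
        return v;
    }

    // Runs `runs` random checks. Returns null if all pass, otherwise a
    // message with the seed and the exact input that failed.
    static String check(long seed, int runs) {
        Random random = new Random(seed);
        for (int i = 0; i < runs; i++) {
            int input = random.nextInt(2001) - 1000;   // range -1000..1000
            int result = magnitude(input);
            if (result < 0) {
                return "magnitude(" + input + ") returned " + result
                        + " (seed " + seed + ", run " + i + ")";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long seed = System.currentTimeMillis();   // different input each run
        System.out.println(check(seed, 1000));
    }
}
```

The reported input can then be hard-coded into a regular, repeatable test, and rerunning with the reported seed reproduces the whole sequence.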
Adam Maass - 23 Mar 2006 04:29 GMT
>> I strongly object to number 13. Unit-tests, especially in an automated
>> framework, should be repeatable. (When a test fails, you need to know on
[quoted text clipped - 9 lines]
> are executed every hour with different input each time.
> It is like adding another dimension to unit testing.
There is sometimes value in testing on large numbers of random inputs. But this isn't *unit* testing; it's more akin to a system or stress test. It's something you hope your QAs will do for you; test on inputs that you weren't necessarily expecting and see what breaks. Unit testing is about the correctness of code for known inputs. If you come across a failure for a novel set of inputs in system or stress testing, by all means, take that input and add it to your unit test suite.
Note that test frameworks can be used both for unit tests as well as other kinds of tests. (Simply because it's called 'JUnit', for example, does not necessarily mean that all the test cases are, in fact, unit tests.)
-- Adam Maass
Jacob - 23 Mar 2006 20:03 GMT
> There is sometimes value in testing on large numbers of random inputs. But
> this isn't *unit* testing; it's more akin to a system or stress test. It's
> something you hope your QAs will do for you; test on inputs that you weren't
> necessarily expecting and see what breaks. Unit testing is about the
> correctness of code for known inputs.
Which definition of unit testing is this? I have searched the net but haven't been able to find any backing for it.
If I write a method void setLength(double length), who defines the input "necessarily expected", and why isn't this the entire double range? I'd claim the latter, and to cover as many inputs as possible I use the random trick.
I don't have a problem with defining this kind of testing differently, for instance as "stress testing", but on the other hand there isn't really any more "stress" in calling setLength(1.23076e+307) than setLength(2.0) as long as the method accepts a double as input?
And why do you care about "known" input as long as the actual (failing) input can be traced afterwards anyway?
You define this as a unit test:
    for (int i = 0; i < 1000; i++)
        testMyIntMethod(i);
while this is not:
    for (int i = 0; i < 1000; i++)
        testMyIntMethod(getRandomInt());
even if an error on input=42 will produce identical error reports in both cases. Only the latter will (eventually) reveal the error for input=-100042.
Also, if I have a setLength() method which covers the "typical" input cases just fine, but is in general crap (a common scenario), then a testSetLength() method that verifies that setLength() works fine for "typical" input isn't worth a lot. What you need is a test method that tests the non-typical inputs. From a black-box perspective you don't really know what is typical or non-typical, so why not just throw a random number generator at it?
Ben Pope - 23 Mar 2006 21:29 GMT
>> There is sometimes value in testing on large numbers of random inputs.
>> But this isn't *unit* testing; it's more akin to a system or stress
[quoted text clipped - 9 lines]
> double range? I'd claim the latter and to cover as many inputs
> as possible I use the random trick.
The programmer specifies the preconditions. If the preconditions are not met, there is no reason for it to produce valid results. Random is not repeatable, and is not predictable.
> I don't have a problem with defining this kind of testing
> differently, for instance "stress testing", but on the other
> hand there isn't really any more "stress" in calling
> setLength(1.23076e+307) than setLength(2.0) as long as the
> method accepts a double as input?
No, but the point is that when you unit test, you need to make informed choices about the inputs you choose. It's usually wise to throw in a couple of "normal", "everyday" values, but also explicitly check boundary cases and out of range.
> And why do you care about "known" input as long as the
> actual (failing) input can be traced afterwards anyway?
Repeatability. It's no use relying on randomness to thoroughly test. You have to design your test cases.
> You define this as a unit test:
>
> for (int i = 0; i < 1000; i++)
>     testMyIntMethod(i);
Not really. What are you testing? That it doesn't crash? Presumably you need to check the output against an array of 1000 pre-computed values? Not much fun.
> while this is not:
>
> for (int i = 0; i < 1000; i++)
>     testMyIntMethod(getRandomInt())
How can you possibly check the output is correct for a random input?
> even if an error on input=42 will produce identical error reports
> in both cases. Only the latter will (eventually) reveal the
> error for input=-100042.
I don't understand, are you checking for crashing?
> Also, if I have a setLength() method which cover the "typical"
> input cases just fine, but is in general crap (a common scenario),
[quoted text clipped - 3 lines]
> you don't really know what is typical or non-typical, so why not just
> throw a random number generator at it?
You know your preconditions. You know your postconditions.
If anything can happen on invalid input, then there is no point in testing. If you want a default output, exception or whatever for out-of-range, then check it with a unit test. You get to choose your input and you get to check your output.
Randomness just doesn't cut it, and I don't understand how you can check the output is correct, without knowing the input.
Ben Pope
 Signature I'm not just a number. To many, I'm known as a string...
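The designed-input approach argued for in the post above might be sketched as follows. The Segment class, its precondition (finite, non-negative length) and the choice of IllegalArgumentException are all assumptions made for this example; the thread only mentions a setLength(double) method.

```java
// Hypothetical class with an explicit precondition on setLength().
class Segment {
    private double length;

    // Assumed precondition for this sketch: length is finite and >= 0.
    void setLength(double length) {
        if (Double.isNaN(length) || Double.isInfinite(length) || length < 0.0) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        this.length = length;
    }

    double getLength() {
        return length;
    }

    // Returns true if setLength rejects the value with IllegalArgumentException.
    static boolean rejects(double value) {
        try {
            new Segment().setLength(value);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(2.0));               // everyday value
        System.out.println(rejects(0.0));               // lower boundary
        System.out.println(rejects(Double.MAX_VALUE));  // upper boundary
        System.out.println(rejects(-0.001));            // just out of range
        System.out.println(rejects(Double.NaN));        // not a number
    }
}
```

Each input is chosen deliberately: a normal value, both boundaries, and values just outside the stated range.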
Jacob - 25 Mar 2006 16:15 GMT
> Randomness just doesn't cut it, and I don't understand how you can check
> the output is correct, without knowing the input.
You *do* know the input!
Consider testing this method:
    double square(double v) {
        return v * v;
    }
Below is a typical unit test that verifies that the method behaves correctly on typical input:
    double v = 2.0;
    double v2 = square(v);   // You know the input: It is 2.0!
    assertEquals(v2, 4.0);
The same test using random input:
    double v = getRandomDouble();
    double v2 = square(v);   // You know the input: It is v!
    assertEquals(v2, v*v);
If the test fails, all the details will be in the error report.
And this method actually *does* fail for a majority of all possible inputs (abs of v exceeding sqrt(maxDouble)). This will be revealed instantly using the random approach.
For an experienced programmer the limitation of square() might be obvious, so border cases are probably covered sufficiently in both the code and the test. But for more complex logic this might not be as apparent, and throwing in random input (in ADDITION to the typical cases and all obvious border cases) has proven quite helpful, at least to me.
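One caveat with the random version of the square() test: the expected value v*v is computed with the same expression as the implementation, so in Java both sides overflow to Infinity together and the comparison still holds. The sketch below contrasts that mirrored check with an independent property (a finite input should give a finite result). The class and method names are illustrative and not from the thread.

```java
// Hypothetical demo class, for illustration only.
class SquareOverflowDemo {

    static double square(double v) {
        return v * v;
    }

    // Oracle that repeats the implementation: it holds even when the
    // result has overflowed, because both sides overflow the same way.
    static boolean mirrorCheck(double v) {
        return Double.compare(square(v), v * v) == 0;
    }

    // Independent property: a finite input should give a finite result.
    static boolean finiteCheck(double v) {
        return !Double.isInfinite(square(v));
    }

    public static void main(String[] args) {
        double big = 1.0e200;   // abs value above sqrt(Double.MAX_VALUE)
        System.out.println(square(big));        // prints Infinity
        System.out.println(mirrorCheck(big));   // prints true
        System.out.println(finiteCheck(big));   // prints false
        System.out.println(finiteCheck(2.0));   // prints true
    }
}
```

So random input finds the overflow only when the assertion checks something the implementation does not compute in the same way.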
Tom Leylan - 25 Mar 2006 17:17 GMT Jacob:
You've chosen a trivial example where your assert can compute the results of the Square() function you are calling. That is hardly a typical situation or there would be no reason for the function to have been created.
    double v = getRandomDouble();
    double v2 = AccountBalance( v );
    assertEquals( v2, ? );
So explain how you get the value to type in ? given you don't know what the input will be. Perhaps you would do the computations you read about in the AccountBalance() method inline to see if those and yours matched?
>> Randomness just doesn't cut it, and I don't understand how you can check
>> the output is correct, without knowing the input.
[quoted text clipped - 35 lines]
> obvious border cases) has proven quite helpful, at least
> to me.
Jacob - 25 Mar 2006 18:55 GMT
> You've chosen a trivial example where your assert can compute the results of
> the Square() function you are calling. That is hardly a typical situation
[quoted text clipped - 7 lines]
> input will be. Perhaps you would do the computations you read about in the
> AccountBalance() method inline to see if those and yours matched?
I chose a fairly typical example of a basic unit requiring unit testing, and I proved that random input easily identifies an error that could otherwise slip through.
I never said that using random input was useful in all cases and perhaps it isn't in your specific example. On the other hand, how do you know what goes into "?" given you know the input? There must be some sort of reasoning behind your result as well.
Below is a different example which might not be as trivial as my previous. It "proves" that encoding + decoding (according to some procedure) of any string should give back the original string:
    String text = getRandomString(0,1000000); // 0 - 1MB
    String encoded = Encoder.encode(text);
    String decoded = Encoder.decode(encoded);
    assertEquals(text, decoded);
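A self-contained version of this round-trip check might look like the sketch below. The thread's Encoder and getRandomString() are not shown, so java.util.Base64 stands in for the encoder and randomString() is a made-up helper; a seed is used so that a failing string can be reproduced.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Random;

// Hypothetical demo class; Base64 is only an assumed stand-in for Encoder.
class RoundTripDemo {

    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // Builds a pseudo-random printable ASCII string of length 0..maxLength.
    static String randomString(Random random, int maxLength) {
        int length = random.nextInt(maxLength + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (' ' + random.nextInt(95)));   // ' ' to '~'
        }
        return sb.toString();
    }

    // Returns the first string that does not survive encode + decode,
    // or null if every generated string round-trips.
    static String firstFailure(long seed, int runs) {
        Random random = new Random(seed);
        for (int i = 0; i < runs; i++) {
            String text = randomString(random, 1000);
            if (!text.equals(decode(encode(text)))) {
                return text;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstFailure(2006L, 200) == null);
    }
}
```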
Andrew McDonagh - 25 Mar 2006 19:27 GMT
> Below is a different example which might not be as trivial
> as my previous. It "proves" that encoding + decoding (according
[quoted text clipped - 5 lines]
> String decoded = Encoder.decode(encoded);
> assertEquals(text, decoded);
This only proves that the encoding & decoding scheme is the same. I can make this test pass like this...
    public String encode(String text) {
        return text;
    }
    public String decode(String encoded) {
        return encoded;
    }
Here, the unit test is testing every line of code, yet it's a worthless implementation.
The unit test is not testing the encoding mechanism as it should be implemented, because it is using decode() for the assertion.
In this case, it's prudent/better to have separate unit tests for both the encode & decode, and to do the opposite conversion locally within the unit test itself, as this prevents errors of implementation but, more importantly, forces the implementation to use the correct algorithms.
    public void testEncoder() {
        String text = getRandomString(0,1000000); // 0 - 1MB
        String encoded = Encoder.encode(text);
        String decoded = MD5.decode(encoded);
        assertEquals(text, decoded);
    }
    public void testDecoder() {
        String text = getRandomString(0,1000000); // 0 - 1MB
        String encoded = MD5.encode(text);
        String decoded = Encoder.decode(encoded);
        assertEquals(text, decoded);
    }
Here we have separated the logic needed for the test from the unit under test.
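Another way to keep the test's expectations independent of the unit under test is to check encode and decode separately against fixed, published input/output pairs. The sketch below assumes, purely for illustration, that the encoder is Base64: java.util.Base64 stands in for the thread's Encoder, and the pairs are the test vectors from RFC 4648. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; Base64 is only an assumed stand-in for Encoder.
class KnownAnswerDemo {

    // Plain text and expected encoding, taken from RFC 4648, section 10.
    static final String[][] VECTORS = {
        {"", ""},
        {"f", "Zg=="},
        {"fo", "Zm8="},
        {"foo", "Zm9v"},
        {"foobar", "Zm9vYmFy"}
    };

    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // Checks the encoder alone; decode() is not involved.
    static boolean encoderMatchesVectors() {
        for (String[] v : VECTORS) {
            if (!encode(v[0]).equals(v[1])) {
                return false;
            }
        }
        return true;
    }

    // Checks the decoder alone; encode() is not involved.
    static boolean decoderMatchesVectors() {
        for (String[] v : VECTORS) {
            if (!decode(v[1]).equals(v[0])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(encoderMatchesVectors());
        System.out.println(decoderMatchesVectors());
    }
}
```

An identity implementation of encode() would fail the first check, since "f" is not "Zg==", and a failure in either check names the method at fault.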
Alex Hunsley - 26 Mar 2006 05:26 GMT
>> Below is a different example which might not be as trivial
>> as my previous. It "proves" that encoding + decoding (according
[quoted text clipped - 19 lines]
> Here, the unit test is testing every line of code, yet it's a worthless
> implementation.
No, an encoding that doesn't do any encoding is still an encoding. It just happens to be the 'identity' encoding (in the same way that 1 is the identity for multiplication and 0 is the identity for addition). If an encoding and decoding function are presented as a pair of opposite functions, then you are perfectly justified in testing that one is the inverse of the other:
Decode(Encode(x)) = x
Testing that an encoding is *any good* by whatever means you judge 'good' is entirely a different matter to testing the reversibility of an encode/decode pair (and harder to do, to boot, although testing that the encoding did something, anything, to the data is trivial).
> The unit test is not testing the encoding mechanism as it should be
> implemented as its using the decode() for the assertion.
[quoted text clipped - 20 lines]
> Here we have separated the logic needed for the test from the unit under
> test.
Jacob - 26 Mar 2006 08:42 GMT
> Here, the unit test is testing every line of code, yet it's a worthless
> implementation.
The test is still useful. It can't prove that the code is correct, but if it fails, it can prove that the code is wrong.
And as stated several times already: The test comes in ADDITION to the test for typical cases and all obvious boundary cases, which in this particular case would have been written quite differently.
Andrew McDonagh - 26 Mar 2006 08:38 GMT >> Here, the unit test is testing every line of code, yet its a worthless >> implementation. > > The test is still useful. It can't prove that the code is correct, > but if it fails, it can prove that the code is wrong. But it cant tell you which of the collaborating en/de coding methods is the cause.
Jacob - 26 Mar 2006 09:14 GMT >>> Here, the unit test is testing every line of code, yet its a >>> worthless implementation. [quoted text clipped - 4 lines] > But it cant tell you which of the collaborating en/de coding methods is > the cause. An explicit input for which the operation fails should give you enough information to track this down yourself.
Alex Hunsley - 26 Mar 2006 13:24 GMT >>> Here, the unit test is testing every line of code, yet its a >>> worthless implementation. [quoted text clipped - 4 lines] > But it cant tell you which of the collaborating en/de coding methods is > the cause. But it is telling you that you have a problem, which is a good start. You can then investigate further, or back up with other complementary tests.
Tom Leylan - 25 Mar 2006 20:57 GMT Forgive me but you are terming it "fairly typical" and it isn't typical of anything I have seen. So rather than you or I decide (since we don't agree) let others weigh in on how typical a function is that an easily-contrived formula produces the exact same answer. Show me your assertEquals for IsPrime() for instance.
The discussion hasn't been (as I read it) that random input is of no value in all cases. It was illustrated by others that "unit tests" imply one does know the answer and that random input means you can rarely know the answer. To answer your question as to how one might know the value of ? it would be "computed" by whatever method was required. A test for IsPrime() would be fed known prime and non-prime values safely knowing which ones should return true and which should return false. A test for AccountBalance() would similarly have inputs and outputs which have been determined to test the functionality.
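A test of the kind described here, fed with values whose answers are known in advance, might look like the following sketch. The trial-division isPrime() is a hypothetical unit under test, not code from the thread, and the input tables are just a small hand-picked selection.

```java
class PrimeCheckSketch {
    // Hypothetical implementation under test: simple trial division.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2) { // long avoids overflow of d * d
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Inputs whose expected answers were determined beforehand.
        int[] knownPrimes = {2, 3, 5, 7, 11, 13, 97, 7919, 2147483647};
        int[] knownNonPrimes = {-7, 0, 1, 4, 9, 15, 91, 7917, 2147483646};

        for (int p : knownPrimes) {
            if (!isPrime(p)) throw new AssertionError(p + " should be prime");
        }
        for (int c : knownNonPrimes) {
            if (isPrime(c)) throw new AssertionError(c + " should not be prime");
        }
        System.out.println("all known values classified correctly");
    }
}
```

The tables include the boundary values 0, 1, 2 and Integer.MAX_VALUE, which is where an implementation like this is most likely to go wrong.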
I think Andrew's response does a great job of pointing out how relying on the two functions to test each other is a mistake.
>> You've chosen a trivial example where your assert can compute the results >> of the Square() function you are calling. That is hardly a typical [quoted text clipped - 28 lines] > String decoded = Encoder.decode(encoded); > assertEquals(text, decoded); Jacob - 26 Mar 2006 09:03 GMT > Forgive me but you are terming it "fairly typical" and it isn't typical of > anything I have seen. The most typical methods around are getters and setters which are even less complex than the square example I used previously:
String name = getRandomString(0,1000);
A.setName(name);
assertEquals(A.getName(), name);

They are not the most interesting ones to test, but they should still be tested, and using random input increases the test coverage.
> Show me your assertEquals for IsPrime() for instance. Not the best example I could come up with, but it indicates the principle:
for (int i = 0; i < 1000; i++) {
    int v1 = getRandomInt();
    if (isPrime(v1)) {
        for (int j = 0; j < 1000; j++) {
            int v2 = getRandomInt();
            if (isPrime(v2)) {
                assertNotEquals(v2 % v1, 0);
                assertNotEquals(v1 % v2, 0);
            }
        }
    }
}
Again: It doesn't prove that isPrime() is correct, but it may be able to prove that it is wrong.
Tom Leylan - 26 Mar 2006 17:15 GMT Of course one can insert getRandomString() into a test when the actual string value has no known limits. Put it into your US State or US Zip Get/Set... Any field which validates its entry should fail upon assignment and your assertion doesn't get a chance to run, does it? Again, nobody claimed that there is anything wrong with random string testing. It was pointed out that it shouldn't form the basis of one's unit tests.
It seems to me the purpose of unit testing is to verify that values known to be good, pass and that values known to be bad, fail. If a bad value makes it through it can be added to the test suite. That is different than "generating an additional random value."
So you should continue to include random string tests (to your unit tests) and I (and a few others) will probably recommend against it. There is no problem with differing viewpoints.
>> Forgive me but you are terming it "fairly typical" and it isn't typical >> of anything I have seen. [quoted text clipped - 29 lines] > Again: It doesn't prove that isPrime() is correct, but it may be able > to prove that it is wrong. Scott.R.Lemke@gmail.com - 28 Mar 2006 16:19 GMT > > Forgive me but you are terming it "fairly typical" and it isn't typical of > > anything I have seen. [quoted text clipped - 8 lines] > They are not the most interesting ones to test, but they should > still be tested, and using random input increase the test coverage. Unless of course you pass in an invalid string; too long, too short, not unique, etc, and your setter silently fixes/fails, then because of that your getter fails, and you get a false failure on your assertion.
> > Show me your assertEquals for IsPrime() for instance. > [quoted text clipped - 16 lines] > Again: It doesn't prove that isPrime() is correct, but it may be able > to prove that it is wrong. It doesn't prove either. You cannot prove that it was wrong based upon a random input, as the input might be wrong.
I have long stopped using terms like "Unit", "Black box", "System" when referring to tests, as there are too many definitions out there. Instead, describe tests by purpose and context, and leave names out. So, for your random test your purpose would be to test a variety of inputs, and the context would be on a method with unknown results. By doing that instead of pre-placing a term like "Unit" and all the prejudice/preconceptions that come with that term, you will better get your point across as to why you are doing a test.
Hendrik Maryns - 29 Mar 2006 11:17 GMT
Scott.R.Lemke@gmail.com wrote:
>>> Forgive me but you are terming it "fairly typical" and it isn't typical of >>> anything I have seen. [quoted text clipped - 11 lines] > not unique, etc, and your setter silently fixes/fails, then because of > that your getter fails, and you get a false failure on your assertion. Then you should have preconditions or postconditions for your setter method which take care of that, and integrate them in the test.
H.
Scott.R.Lemke@gmail.com - 29 Mar 2006 15:54 GMT > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 [quoted text clipped - 20 lines] > Then you should have preconditions or postconditions for you setter > method which take care of that, and integrate them in the test. And what if every one of your random choices fails those conditions, and the test is never run?
The point I was trying to make is that this type of random testing is actually a form of another type of test, often referred to as monkey testing, and by dropping the label of "unit" or "monkey", and instead stating the purpose and context you eliminate this whole argument.
Ben Pope - 27 Mar 2006 11:31 GMT >> Randomness just doesn't cut it, and I don't understand how you can >> check the output is correct, without knowing the input. [quoted text clipped - 23 lines] > If the test fails, all the details will be in the error > report. And how exactly did you come up with v*v as the value to test against? Did you copy it from the function you're testing? Do you expect that to fail?
Did you get somebody else to write the code? Do you implement all the code twice, independently and check them against each other?
> And this method actually *do* fail for a mjority of all > possible inputs (abs of v exceeding sqrt(maxDouble)). [quoted text clipped - 7 lines] > obvious border cases) has proven quite helpful, at least > to me. I fail to see how you are going to automatically test this complicated logic.
Ben Pope
 Signature I'm not just a number. To many, I'm known as a string...
Jacob - 27 Mar 2006 21:19 GMT > And how exactly did you come up with v*v as the value to test against? > Did you copy it from the function you're testing? Do you expect that to > fail? The unit test reflects the requirements and for a square() method the requirement is to return the square of its argument: v*v.
That this happens to be identical to the code implementation is purely coincidental and a result of picking a (too?) simple example. The square method may well be implemented by establishing a socket connection to the math query engine at MIT, by a fancy caching mechanism, or by some advanced bit operation.
davidrubin@warpmail.net - 30 Mar 2006 05:45 GMT > > Randomness just doesn't cut it, and I don't understand how you can check > > the output is correct, without knowing the input. [quoted text clipped - 14 lines] > double v2 = square(v); // You know the input: It is 2.0! > assertEquals(v2, 4.0); This is fine.
> The same test using random input: > > double v = getRandomDouble(); > double v2 = square(v); // You know the input: It is v! > assertEquals(v2, v*v); This is completely broken. You can't test an implementation of 'square' with an identical implementation. You need a separate representation for your expected result. Otherwise, you are not testing anything.
> If the test fails, all the details will be in the error > report. > > And this method actually *do* fail for a mjority of all > possible inputs (abs of v exceeding sqrt(maxDouble)). > This will be revealed instantly using the random approach. This may not ever be revealed using random inputs, but in the case of 'square' this is a moot point. The contract of 'square' must stipulate that the input (v) is invalid unless 'v * v < "max double"'. Since such inputs are invalid by the contract, there is no point in testing them.
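The contract-based view argued here can be made concrete with a short sketch. The contract check below (an input is valid only if its square does not overflow) is an assumption written for this example; the thread never shows the real square() or its documentation.

```java
class SquareContractSketch {
    // Hypothetical unit under test.
    static double square(double v) {
        return v * v;
    }

    // Assumed contract: v is valid only if v * v stays below the largest double.
    static boolean withinContract(double v) {
        return Math.abs(v) <= Math.sqrt(Double.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Inside the contract: results must be finite.
        double[] valid = {0.0, -3.0, 1e154};
        for (double v : valid) {
            if (!withinContract(v) || Double.isInfinite(square(v))) {
                throw new AssertionError("unexpected overflow for " + v);
            }
        }
        // Outside the contract: 1e155 squared exceeds Double.MAX_VALUE.
        double outside = 1e155;
        if (withinContract(outside) || !Double.isInfinite(square(outside))) {
            throw new AssertionError("expected " + outside + " to be out of contract");
        }
        System.out.println("contract boundary behaves as assumed");
    }
}
```

Under this reading, a random generator that draws from the full double range would mostly produce inputs the contract excludes, which is the point both sides are circling.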
> For an experienced programmer the limitation of square() > might be obvious so border cases are probably covered [quoted text clipped - 3 lines] > obvious border cases) has proven quite helpful, at least > to me. This is also wrong. The boundaries of the input are stated in the function's contract. They are not something determined by the user's level of experience. Your test cases must cover the boundary conditions stipulated by the function's documented contract *as* *well* *as* boundary conditions based on white-box knowledge of the function's implementation. If you cover these cases, plus a small assortment of well-chosen "sanity" values, you don't need to waste time with large amounts of random data.
If you can't test your function in this way, it is probably not factored correctly.
Jacob - 30 Mar 2006 08:02 GMT > This is completely broken. You can't test an implementation of 'square' > with an identical implementation. You need a separate representation > for your expected result. Otherwise, you are not testing anything. I've already answered this in a different posting: The unit test reflects the requirements. The requirement for square() is to return the square of the input: v*v. From a black-box perspective I don't know the implementation of square(). It can be anything.
> This is also wrong. The boundaries of the input is stated in the > function's contract. It is not something determined by the user's level [quoted text clipped - 4 lines] > well-chosen "sanity" values, you don't need to waste time with large > amounts of random data. This is all correct given that you are able to identify the boundary cases up front. In some cases you are, but for more complex ones you easily forget some in the same way you forget to handle these cases in the original code (that's why there are bugs after all).
Imagine implementing a tree container. In order to test correct removal of nodes, some of the boundary cases might be:
remove root
remove intermediate node
remove leaf node
remove root when this is the only node
remove root with exactly one leaf
remove root with exactly one intermediate node
remove intermediate node with one child
remove intermediate node with many children
remove leaf node without siblings
remove leaf node with siblings
remove intermediate node with root parent
remove intermediate node with only leaf nodes
remove intermediate node with leaf nodes and other intermediate nodes
remove intermediate node with only other intermediate node children
remove non-existing node
remove null
remove node with unique name
remove node with non-unique name
etc.

The above might or might not be boundary cases; that actually depends on the implementation: A good implementation has few! From experience you "know" which cases are more likely to contain bugs, even without knowing the implementation.
I don't say you shouldn't cover the boundary cases explicitly, of course you should (see #13 in the guidelines).
But when that is in place I would have built a tree at random, containing a random number of nodes (0 - 1.000.000 perhaps), and then picked nodes at random and performed a random operation (add, remove, move, copy, whatever) on those, a random number of times (0 - 10.000 perhaps), and verified that the operation behaves as expected and that the tree is always in a consistent state afterwards. This would leave me with the confidence that if there are cases I've forgotten (or that appear during code refactoring) they might be trapped by this additional test.
davidrubin@warpmail.net - 30 Mar 2006 16:40 GMT > > This is completely broken. You can't test an implementation of 'square' > > with an identical implementation. You need a separate representation [quoted text clipped - 4 lines] > return the square of the input: v*v. From a black-box perspecitive > I don't know the implementation of square(). It can be anything. This is why black-box tests are not entirely sufficient. You must (especially for unit tests) use some white-box knowledge to test the boundary conditions of both the contract and the implementation.
[snip - tree stuff]
> But when that is in place I whould have built a tree on random, containing > a random number of nodes (0 - 1.000.000 perhaps), and then picked nodes on [quoted text clipped - 4 lines] > cases I've forgotten (or that appears during code refactoring) they might > be trapped by this additional test. I went to Brian Kernighan's site at Princeton a while back. One of his assignments was to implement associative arrays similar to those in awk. Then, he provided a script generator that produces random output (add, remove, lookup, etc). You are supposed to run this script against both awk and your own implementation, and compare the results. So, I think you would probably appreciate this.
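The assignment described here, running the same random script against a trusted implementation and your own and comparing the results, can be sketched in Java as follows. The AssocList class is a made-up unit under test and java.util.HashMap plays the role that awk plays in the assignment; neither comes from the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Random;

class DifferentialSketch {
    // Hypothetical unit under test: a naive association list of {key, value} pairs.
    static class AssocList {
        private final List<int[]> entries = new ArrayList<>();

        Integer get(int key) {
            for (int[] entry : entries) {
                if (entry[0] == key) return entry[1];
            }
            return null;
        }

        Integer put(int key, int value) {
            for (int[] entry : entries) {
                if (entry[0] == key) {
                    int old = entry[1];
                    entry[1] = value;
                    return old;
                }
            }
            entries.add(new int[] {key, value});
            return null;
        }

        Integer remove(int key) {
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i)[0] == key) return entries.remove(i)[1];
            }
            return null;
        }

        int size() {
            return entries.size();
        }
    }

    // Applies the same seeded random operations to both implementations.
    // Returns the index of the first diverging operation, or -1 if none.
    static int firstDivergence(long seed, int operations) {
        Random rng = new Random(seed);
        AssocList actual = new AssocList();
        Map<Integer, Integer> expected = new HashMap<>(); // trusted reference
        for (int i = 0; i < operations; i++) {
            int key = rng.nextInt(50); // small key space forces collisions
            int value = rng.nextInt();
            Integer got;
            Integer want;
            switch (rng.nextInt(3)) {
                case 0:
                    got = actual.put(key, value);
                    want = expected.put(key, value);
                    break;
                case 1:
                    got = actual.remove(key);
                    want = expected.remove(key);
                    break;
                default:
                    got = actual.get(key);
                    want = expected.get(key);
                    break;
            }
            if (!Objects.equals(got, want) || actual.size() != expected.size()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long seed = 20060330L;
        int divergence = firstDivergence(seed, 10_000);
        if (divergence >= 0) {
            throw new AssertionError("diverged at operation " + divergence + " [seed=" + seed + "]");
        }
        System.out.println("no divergence for seed " + seed);
    }
}
```

Because the expected values come from an independent implementation, this sidesteps the objection that a random test can only compare the code against a copy of itself.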
Also, John Lakos' new book is due to be published later this year. In it, he promises to address the issue of component-level testing in great detail, including a section on random testing, which I think you will find very interesting.
Adam Maass - 27 Mar 2006 16:41 GMT >> There is sometimes value in testing on large numbers of random inputs. >> But this isn't *unit* testing; it's more akin to a system or stress test. [quoted text clipped - 32 lines] > in both cases. Only the latter will (eventually) reveal the > error for input=-100042. If you don't care about the result for input 100042, then the "random" version is flawed. In unit testing, you want to select several typical inputs, as well as boundary and out-of-range inputs. This is sufficient to obtain a general sense that the code is correct for the general case. It also requires the test-writer to /think/ about what the boundary conditions are. There may be several of these, at many points in the domain.
> Also, if I have a setLength() method which cover the "typical" > input cases just fine, but is in general crap (a common scenario), [quoted text clipped - 3 lines] > you don't really know what is typical or non-typical, so why not just > throw a random number genrator at it? My objection to random inputs is that unit-tests must be 100% repeatable for every run of the test suite. I don't ever want to see a failure of a unit test that doesn't reappear on the next run of the suite unless something significant -- either the test case or the code under test -- has changed. Random inputs are likely to skip those inputs that cause failures, even if every once in a while they do uncover a failure.
Note too that unit-testing is not black-box testing. Good unit tests usually have pretty good knowledge of the underlying algorithm under test.
-- Adam Maass
Timbo - 27 Mar 2006 16:51 GMT > My objection to random inputs is that unit-tests must be 100% repeatable for > every run of the test suite. I don't ever want to see a failure of a unit > test that doesn't reappear on the next run of the suite unless something > significant -- either the test case or the code under test -- has changed. > Random inputs are likely to skip those inputs that cause failures, even if > every once in a while they do uncover a failure. Agreed. A potential problem with randomly generated inputs is that the person fixing the fault has to write a unit test to reproduce the bug. Some people are lazy and will just fix the bug, run the random unit tests, see them pass (because the randomly generated input is not tested the next time), and recommit the new version.
Also, I've never seen anything to indicate that random tests are any more likely to uncover a fault than properly selected test cases.
Jacob - 27 Mar 2006 21:29 GMT > Also, I've never seen anything to indicate that random tests are any > more likely to uncover a fault than properly selected test cases. "Properly selected" is fine. If you miss some of those (there may be MANY remember), the random cases *may* catch them.
That's it. You are not supposed to replace any of the good stuff you are already doing. It's just a simple tool for making the whole package even better.
Roedy Green - 27 Mar 2006 18:24 GMT On Mon, 27 Mar 2006 07:41:54 -0800, "Adam Maass" <adam.nospam.maass@comcast.net> wrote, quoted or indirectly quoted someone who said :
>In unit testing, you want to select several typical >inputs, as well as boundary and out-of-range inputs. a term you will also hear is "corner cases".
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Jacob - 27 Mar 2006 21:09 GMT > In unit testing, you want to select several typical > inputs, as well as boundary and out-of-range inputs. This is sufficient to > obtain a general sense that the code is correct for the general case. It > also requires the test-writer to /think/ about what the boundary conditions > are. There may be several of these, at many points in the domain. You describe an ideal world where the unit test writer thinks of every possible scenario beforehand. In such a regime you don't need unit testing in the first place.
My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
> My objection to random inputs is that unit-tests must be 100% repeatable for > every run of the test suite. I don't ever want to see a failure of a unit > test that doesn't reappear on the next run of the suite unless something > significant -- either the test case or the code under test -- has changed. If I have a flaw in my code I'd be more happy with a test that indicates this *sometime* rather than *never*. Of course *always* is even better, but then we're back to Utopia.
BTW: You can achieve repeatability by specifying the random seed in the test setup. My personal approach is of course to seed with a maximum of randomness (using current time millis :-)
> Note too that unit-testing is not black-box testing. Good unit tests usually > have pretty good knowledge of the underlying algorithm under test. Again you add definition to unit testing without further reference. Unit testing is *in practice* white-box testing since the tests are normally written by the target code developer, but it is actually beneficial to treat it as a black-box test: Look at the class from the public API, consider the requirements, and then try to tear it apart without thinking too much about the code internals. This is at least my personal approach when writing unit tests for my own code.
Noah Roberts - 27 Mar 2006 21:15 GMT > > In unit testing, you want to select several typical > > inputs, as well as boundary and out-of-range inputs. This is sufficient to [quoted text clipped - 5 lines] > of every possible scenario beforehand. In such a regime you don't > need unit testing in the first place. Sure you do. Unit tests can stop a lot of bugs before they happen and before tracking them down gets difficult. The ones that remain mean that you have to track them down as you normally would, write a test for the condition that causes the bug to replicate, and then fix your code until all tests pass.
This means that changes you make to the code later in refactoring or adding features do not reintroduce bugs you have fixed before. Think about how many times you have fixed a bug only for it to turn up later because of changes you or someone else made to the code.
> My experience is that you tend to "forget" certain scenarios > when you write the code, and then "forget" the exact same cases > in the test. It helps to write the test first and to write the test independently of the code in question. For instance, my latest batch of additions to our code base involved adding features that were available in a different code base...one we are deprecating. My tests simply verify that the same inputs produce the same results, since at this time I want the answers to be the same. I chose those values randomly but I put them in as static values in my tests.
Forgetting is also important as I described above in which bugs reappear after being fixed ages ago because you or someone else forgot what caused them and put that problem back when altering the code.
> The result is a test that works fine in normal cases, > but fails to reveal the flaw in the code for the not-so-normal > cases. This is a useless and costly excercise. Random inputs may > cover some of the cases that was forgotten in this process. Random inputs are difficult to regenerate. It might be beneficial to initially create some random inputs but always put those as static values in your test. This may cover some forgotten conditions yet remain predictable and traceable. Remember, unit tests should be completely automatic.
> > Note too that unit-testing is not black-box testing. Good unit tests usually > > have pretty good knowledge of the underlying algorithm under test. [quoted text clipped - 6 lines] > too much about the code internals. This is at least my personal approach > when writing unit tests for my own code. Yes, that is how unit tests should be performed. They don't test the code; they test the interface to make sure the code conforms to that interface and that the interface is what is needed. They also serve to document your code base fairly well.
Patricia Shanahan - 27 Mar 2006 21:46 GMT ...
> Random inputs are difficult to regenerate. Whether or not pseudo-random inputs are difficult to regenerate depends on the design of the test framework.
I suggest the following requirements:
1. Each pseudo-random test must support both an externally supplied seed and a system time based seed.
2. The seed is part of the output on any pseudo-random test failure.
Given those properties, I think one can set up a test regime that gets the benefits of random testing without the costs.
All tests in the regression test suite that is run for each code change must be effectively non-random. That includes random tests bound to a fixed seed. This is important, because any failure in this context should be due to the most recent code change.
Running with system time seeds is an additional test activity. If it finds an error, the first step towards a fix is to add the failing test/seed combination to the regression test suite, so that it fails.
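The two requirements above, plus the regression/exploration split, might be sketched like this. The system property name test.seed, the class name and the square() unit are all assumptions made for the example; no particular test framework is implied.

```java
import java.util.Random;

class SeededRandomSketch {
    // Requirement 1: accept an externally supplied seed, else use system time.
    static long chooseSeed(String supplied) {
        return (supplied != null) ? Long.parseLong(supplied) : System.currentTimeMillis();
    }

    // Hypothetical unit under test.
    static double square(double v) {
        return v * v;
    }

    // Requirement 2: any failure message carries the seed needed to replay the run.
    static void runSquareCheck(long seed, int iterations) {
        Random rng = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            double v = rng.nextDouble() * 2000.0 - 1000.0; // range [-1000, 1000)
            double result = square(v);
            boolean ok = result >= 0.0 && Math.abs(Math.sqrt(result) - Math.abs(v)) <= 1e-9;
            if (!ok) {
                throw new AssertionError(
                    "square(" + v + ") = " + result + " [seed=" + seed + ", iteration=" + i + "]");
            }
        }
    }

    public static void main(String[] args) {
        // Regression suite: a fixed seed, so the run is effectively non-random.
        runSquareCheck(12345L, 10_000);

        // Exploratory run: -Dtest.seed=... replays a failure, otherwise system time.
        long seed = chooseSeed(System.getProperty("test.seed"));
        runSquareCheck(seed, 10_000);
        System.out.println("passed with seed " + seed);
    }
}
```

When the exploratory run fails, the reported seed can be added to the fixed-seed list, which turns the random finding into a permanent regression test.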
Whether the system time seed testing is considered "unit test" is a matter of how "unit test" is defined.
Patricia
Roedy Green - 28 Mar 2006 02:58 GMT >Running with system time seeds is an additional test activity. If it >finds an error, the first step towards a fix is to add the failing >test/seed combination to the regression test suite, so that it fails. Good thinking. It would be so frustrating to discover an error you can't reproduce.
Roedy Green - 27 Mar 2006 21:29 GMT >My experience is that you tend to "forget" certain scenarios >when you write the code, and then "forget" the exact same cases >in the test. The result is a test that works fine in normal cases, >but fails to reveal the flaw in the code for the not-so-normal >cases. This is a useless and costly excercise. Random inputs may >cover some of the cases that was forgotten in this process. The other way to get coverage is to get some tests written by people unfamiliar with the inner workings. They will test things that "don't need" testing.
Andrew McDonagh - 27 Mar 2006 21:44 GMT >> In unit testing, you want to select several typical inputs, as well as >> boundary and out-of-range inputs. This is sufficient to obtain a [quoted text clipped - 12 lines] > cases. This is a useless and costly excercise. Random inputs may > cover some of the cases that was forgotten in this process. This is where TDD comes in.
If we:
1) Write one test at a time.
2) Write Just Enough Code to make the test pass.
3) Refactor to improve the current state of the design.
We are only writing code for tests we already have. The next test is only needed if we need to code something or to strengthen the corner-case tests of the code that we have just made.
This way - there is no forgetting.
To make this achievable, each test case (method) should:
1) Only test one aspect of the code.
2) Have as few asserts as possible (1 being the best).
3) Be small (like any method): ~10 (or whatever your favourite number is) lines of code.
4) Be fast - the faster they run, the more we run them continuously, the sooner we find problems.
5) Not use/touch files, networks or dbs - these are slow compared to in-memory fake data/objects.
>> My objection to random inputs is that unit-tests must be 100% >> repeatable for every run of the test suite. I don't ever want to see a [quoted text clipped - 9 lines] > seed in the test setup. My personal approach is of course to seed > with a maximum of randomness (using current time millis :-) You might want to google 'seeding with time' to see why it's not a great idea.... especially when unit tests are concerned.
>> Note too that unit-testing is not black-box testing. Good unit tests >> usually have pretty good knowledge of the underlying algorithm under [quoted text clipped - 7 lines] > too much about the code internals. This is at least my personal approach > when writing unit tests for my own code. white box /black box.... all the same really from a testing PoV... the only difference is how tolerant the test case is of the code design changing. White box..not terribly tolerant. Black box...tolerant.
With TDD, its better to consider the unit tests to be 'Behavior Specification Tests'. They are validating that the specified Behavior exists within the code under test. But each specification test is specifying a small part of the code under test, as we have multiple small test cases. Not few large testcases.
For example, we have Calculator class that can Add, Subtract, Multiply & Divide Integers.
So we'd have the following tests...
testAddingZeros()
testAddingPositiveNumbers()
testAddingNegativeNumbers()
testAddingNegativeWithPositiveNumbers()
testAddingPositiveWithNegativeNumbers()

testDividingByZero()
testDividingPositiveNumberByNegative()
....
I don't need to have tests for different values within the Integer range within each test case, as I have separate testcases for the different boundaries. One benefit of having separate named testcases, rather than lumping them all in a single testAdd() method, is that I can write Just Enough code to make each test pass. However, the biggest benefit comes later when I or someone else modifies the code and one or two named testcases fail rather than a single test case. Immediately - without having to debug! - I can see what has broken.
"typing.... run all tests ... bang! ... testAddingNegativeWithPositiveNumbers() failed - expected -10, got -30) "
I know I've broken the negative with Positive code somehow, but I also know I Have Not broken any other conditions (testcases).
If all of those asserts were in one testAdd() method, then any asserts after the one testing -10 + 20 would NOT be run, so I would not know whether I've broken anything else.
This might seem like a small thing, but when your application has 1700s unit tests, it's so much easier to see what's happening quickly with this approach.
Now each of these test cases may end up being the same apart from the values passed to the Calc object and the expected output.
In that case I'd do one of two things:
1) Refactor the tests to use a private helper method: private void testWith(Integer num1, Integer num2, Integer expected)
2) Apply the 'ParameterisedTestcase' pattern.
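A sketch of the first option, written in plain Java so that it runs without a test framework: each named case delegates to a shared testWith() helper, and the runner reports every failing case by name instead of stopping at the first. The Calculator class and the chosen values are hypothetical; in a real suite the named methods would be ordinary JUnit test methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CalculatorCasesSketch {
    // Hypothetical class under test.
    static class Calculator {
        int add(int a, int b) {
            return a + b;
        }
    }

    // Shared helper: the cases differ only in inputs and expected output.
    static void testWith(int num1, int num2, int expected) {
        int actual = new Calculator().add(num1, num2);
        if (actual != expected) {
            throw new AssertionError("expected " + expected + ", got " + actual);
        }
    }

    // Runs every named case independently; returns the number of failures.
    static int runAll() {
        Map<String, Runnable> cases = new LinkedHashMap<>();
        cases.put("testAddingZeros", () -> testWith(0, 0, 0));
        cases.put("testAddingPositiveNumbers", () -> testWith(2, 3, 5));
        cases.put("testAddingNegativeNumbers", () -> testWith(-2, -3, -5));
        cases.put("testAddingNegativeWithPositiveNumbers", () -> testWith(-10, 20, 10));
        cases.put("testAddingPositiveWithNegativeNumbers", () -> testWith(10, -20, -10));

        int failures = 0;
        for (Map.Entry<String, Runnable> testCase : cases.entrySet()) {
            try {
                testCase.getValue().run();
            } catch (AssertionError e) {
                failures++;
                System.out.println(testCase.getKey() + " failed - " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " case(s) failed");
    }
}
```

Keeping one name per case preserves the benefit described above: a broken negative-with-positive path shows up under its own name while the other cases still run.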
Andrew
Adam Maass - 28 Mar 2006 05:30 GMT >> In unit testing, you want to select several typical inputs, as well as >> boundary and out-of-range inputs. This is sufficient to obtain a general [quoted text clipped - 5 lines] > of every possible scenario beforehand. In such a regime you don't > need unit testing in the first place. Well, no. You still need the unit tests for regression testing purposes. (Make a change; does the code still obey the contract on it as expressed by its test regime? If a unit test fails, it means that the code no longer meets its contract.)
Unit tests are also a really good /development/ aid, if you write the test cases first. Express your preconditions and postconditions, then write the code to make the pre- and postconditions hold true. The test cases are often easier to write than the code that implements the logic required by them.
> My experience is that you tend to "forget" certain scenarios > when you write the code, and then "forget" the exact same cases > in the test. The result is a test that works fine in normal cases, > but fails to reveal the flaw in the code for the not-so-normal > cases. This is a useless and costly excercise. Random inputs may > cover some of the cases that was forgotten in this process. Which is why no test regime is complete if it relies solely on unit-testing. You want to expend some effort exposing the code to novel inputs -- just to see what happens. My argument is that these novel inputs do not belong in /unit/ testing.
>> My objection to random inputs is that unit-tests must be 100% repeatable >> for every run of the test suite. I don't ever want to see a failure of a [quoted text clipped - 5 lines] > indicates this *sometime* rather than *never*. Of course *always* > is even better, but then we're back to Utopia. See above. No testing regime is complete if it relies solely on unit tests. By all means, run your code through random inputs if you think it will discover failures. But do not make it a main feature of your unit test suite, because a unit test must be 100% repeatable from run to run. (Else how do you know that you've really fixed any failure you've discovered?)
If other kinds of testing show a failure, by all means add that case to your unit test suite [when it makes sense] so that it doesn't happen again.
> BTW: You can achieve repeatability by specifying the random > seed in the test setup. My personal approach is of course to seed > with a maximum of randomness (using current time millis :-) [Unimpressed.] Yes, you *could* do that. But another important feature of a unit-test suite should be that it is easy to run, not requiring any special setup. In short, it shouldn't require any parameters, and yet still be 100% repeatable from run to run. That means hard-coded inputs.
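To make the seeding point concrete, here is a small sketch using `java.util.Random`, which produces the same sequence for the same seed. The helper `generateInputs` and the seed value are made up for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class SeededInputs {

    // Hypothetical helper: produces `count` test inputs in [0, 1000)
    // from the given seed.
    static int[] generateInputs(long seed, int count) {
        Random random = new Random(seed);
        int[] inputs = new int[count];
        for (int i = 0; i < count; i++) {
            inputs[i] = random.nextInt(1000);
        }
        return inputs;
    }

    public static void main(String[] args) {
        // Fixed seed: the same inputs on every run, so a failure can be replayed.
        int[] first = generateInputs(12345L, 5);
        int[] second = generateInputs(12345L, 5);
        System.out.println("fixed seed repeatable: " + Arrays.equals(first, second));

        // Time-based seed: new inputs on (almost) every run. Printing the seed
        // is what allows a failing run to be reproduced later.
        long seed = System.currentTimeMillis();
        System.out.println("seed=" + seed + " inputs="
                + Arrays.toString(generateInputs(seed, 5)));
    }
}
```

A fixed seed gives repeatability without parameters, but the inputs are then hard-coded in effect; a time-based seed is only repeatable if the seed is recorded and fed back in, which is the special setup objected to above.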
>> Note too that unit-testing is not black-box testing. Good unit tests >> usually have pretty good knowledge of the underlying algorithm under [quoted text clipped - 7 lines] > too much about the code internals. This is at least my personal approach > when writing unit tests for my own code. My experience in many different organizations is that the QA teams expect code to be unit-tested by the developers before being turned over to QA. Developers writing unit tests means that the unit tests are white-box, of necessity.
Story time! Consider your reaction to a failing test case.
"Gee, that's odd. The tests passed last time..."
"What's different this time?"
"Well, I just modified the file FooBar.java. The failure must have something to do with the change I just made there."
"But the test case that is failing is called 'testBamBazzAdd1'. How could a change to FooBar.java cause that case to fail?"
[Many hours later...]
"There is no possible way that FooBar.java has anything to do with the failing test case."
"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1. I wonder how that happened?"
"Well, let's fix the code to account for the novel input..."
[Make some changes, but do not add a new test case. The change doesn't actually fix the error.]
"Well, that's a relief... the test suite now runs to completion without error."
These are harried, busy developers working on a codebase that has thousands of classes, and they're under the gun to get code out the door... they cut corners here (bad developers!) but I think we can all relate to them.
Random inputs in a unit-test case can:
1. Mislead developers when a failure suddenly appears on novel inputs. If they aren't working on the piece of code that the random inputs test, they have to switch gears to understand what's going on;
2. Mislead developers into believing the code is actually fixed, when in fact it is not, when the failure disappears on the next run of the test suite;
3. Create an air of suspicion around the unit-test suite. (To make errors go away, just run the suite multiple times until you get a run without errors.)
-- Adam Maass
Jacob - 28 Mar 2006 16:43 GMT > Story time! Consider your reaction to a failing test case. > [quoted text clipped - 23 lines] > "Well, that's a relief... the test suite now runs to completion without > error." Given there is an error in the baseline I'd rather have a team of developers tracing it for hours than having a test suite that tells me that everything is OK.
Adam Maass - 30 Mar 2006 04:12 GMT >> Story time! Consider your reaction to a failing test case. >> [quoted text clipped - 27 lines] > of developers tracing it for hours than having a test suite that > tells me that everything is OK. One has to wonder about the failure in this scenario -- it is a novel input generated by a randomness generator. If the failure were critical to the operation of the system, (one hopes that) it would have been noted, and probably fixed, in other, earlier test cycles. (Perhaps not a unit test... maybe a system test run by QA.) Since this is a new failure that has not been fixed in earlier cycles, the behavior of the system on these novel inputs must not be that critical. If this is the case, I'd rather have my developers finish the work they were doing on FooBar.java than trace the failure in testBamBazzAdd1. (Of course, in a Utopian world, they would have the time to do both.)
Ultimately, I'd like developers to be able to use a heuristic to determine where to look for errors when a unit-test fails. That heuristic is "The error is almost certainly caused by some delta in the code since the last time you ran the test suite." (Note that controlling the size of the deltas is an issue, which is why we get recommendations to make the test suite easy and fast to run -- so that developers aren't afraid to run the suite very frequently.)
If the unit-test suite also contains some randomly generated inputs, then there are two heuristics that the developers must apply to determine where the failure is:
1. "The error could be caused by a delta in the code since the last time you ran the test suite"; or 2. "The error could be caused by an input value the test suite has generated that we've never seen before."
Deciding which of these cases applies complicates the task of the developer when faced with a failure.
-- Adam Maass
Jacob - 30 Mar 2006 08:09 GMT > 1. "The error could be caused by a delta in the code since the last time you > ran the test suite"; or [quoted text clipped - 3 lines] > Deciding which of these cases applies complicates the task of the developer > when faced with a failure. If I add a test to your test suite that is able to reveal a flaw in your code, you still don't want it because when it fails your developers will be confused about what happened?
I am not sure I get it? You should all be happy you identified an error shouldn't you? The unit test failing should be pretty clear on what went wrong anyway.
Adam Maass - 30 Mar 2006 19:32 GMT >> 1. "The error could be caused by a delta in the code since the last time >> you ran the test suite"; or [quoted text clipped - 9 lines] > confused > about what happened? Let me clarify. I don't want it in the /unit/ test suite if it relies on generation of random inputs, due to this confusion issue. If however, the inputs are hard-coded, then the confusion issue does not apply, and I'd be perfectly happy to have it in the unit test suite.
If there's a level of testing during which we generate random inputs to improve the quality of the code, then that is where it belongs. If there isn't this kind of testing already in the project, perhaps we ought to start. It just doesn't belong in the /unit/ test suite.
> I am not sure I get it? You should all be happy you identified an error > shouldn't > you? The unit test failing should be pretty clear on what went wrong > anyway. Finding and fixing failures is, in general, a good thing, however it happens. But a /unit/ test suite should give developers a really good idea of where any failure originates from, and having to decide whether a failure is due to a delta in the code under test or a novel input just overly complicates a /unit/ test suite. The confusion issue is especially of concern if a failure on one run of the suite simply disappears on the next run because it didn't generate a set of inputs that causes the code to fail. [If I saw a unit test suite with this behavior, I wouldn't have much confidence in the value of passing all the tests -- because the next run could just as easily produce a failure as a pass.]
Note too that there are some failures that are acceptable to tolerate, even in shipping product. (Perhaps: It's an obscure corner case that no-one ever actually encounters in production. It's in some subsystem that hardly anyone uses. Or a variety of other justifications...) The critical cases should be covered by hard-coded inputs. That leaves the non-critical cases -- and if something non-critical fails, then it should be fixed but perhaps there are more important things to do before it gets fixed.
-- Adam Maass
Ed Kirwan - 28 Mar 2006 08:36 GMT > My experience is that you tend to "forget" certain scenarios > when you write the code, and then "forget" the exact same cases > in the test. The result is a test that works fine in normal cases, > but fails to reveal the flaw in the code for the not-so-normal > cases. This is a useless and costly exercise. An observation; not written in stone; a subjective view.
Ignoring TDD, no unit test ever has and no unit test ever will verify a requirement or testify to completeness of behaviour. You seem to think that unit testing is to help find all possible inputs for a given behaviour; I don't think this is true.
Unit tests are regression tests.
When you introduce new feature X in iteration 5, you write unit tests to show some confidence that the feature works; you're not guaranteeing it works for any subset, or for the entire range, of input possibilities. You could easily have a flaw in the program that gives the correct output for a given input, but for entirely the wrong reason, as would be apparent if you used input+1; but you didn't. The unit tests you write in iteration 5 are, in fact, a cost without a return*.
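A deliberately contrived sketch of "correct output for the wrong reason": a hypothetical `square` method that doubles its argument instead of squaring it. The single test input happens to hide the flaw; input+1 would have exposed it.

```java
class WrongReason {

    // Hypothetical flawed implementation: doubles instead of squaring.
    static int square(int x) {
        return x + x;
    }

    // The only unit test written in "iteration 5". It passes,
    // because 2 + 2 happens to equal 2 * 2.
    static boolean testSquareOfTwo() {
        return square(2) == 4;
    }

    // The test that was never written: input + 1 reveals the flaw,
    // because 3 + 3 is not 3 * 3.
    static boolean testSquareOfThree() {
        return square(3) == 9;
    }

    public static void main(String[] args) {
        System.out.println("square(2) test passes: " + testSquareOfTwo());
        System.out.println("square(3) test passes: " + testSquareOfThree());
    }
}
```

Re-running `testSquareOfTwo` in later iterations will keep passing, so as a regression test it only shows that this one behaviour did not change.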
It is when you introduce feature Y in iteration 6 that you see the returns for your iteration 5 unit tests. When you run these again and they all pass, you know that whatever you did in iteration 6 didn't break those parts of iteration 5 that were seen to run before. But they still don't guarantee that feature X is fully tested. If you missed a test in iteration 5, then re-running the tests in iteration 6 won't help. And you could still have that bug from iteration 5. Unit testing will never uncover it. All the tests do is show that whatever you did in iteration 6 didn't change much.