Java Forum / General / March 2006

unit testing guidelines

Jacob - 18 Mar 2006 00:00 GMT
I have compiled a set of unit testing
recommendations based on my own experience
with the concept.

Feedback and suggestions for improvements
are appreciated:

  http://geosoft.no/development/unittesting.html

Thanks.
Hendrik Maryns - 18 Mar 2006 18:36 GMT
Jacob wrote the following on 03/18/2006 12:03 AM:
> I have compiled a set of unit testing
> recommendations based on my own experience
[quoted text clipped - 4 lines]
>
>   http://geosoft.no/development/unittesting.html

Nice work.

I don't totally agree with point 16: a throws statement means an
exception *might* be thrown, and the circumstances under which this can
happen should be documented.  It is seldom that an exception must be thrown.

You might want to give some explanation of what your assertX methods do.

H.

Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org

Daniel T. - 18 Mar 2006 20:41 GMT
[quoted text clipped - 15 lines]
> exception *might* be thrown, and the circumstances under which this can
> happen should be documented.  It is seldom that an exception must be thrown.

I agree. Only test what you actually want the client code to rely on.
Now if you want the client code to rely on the method throwing an
exception...

Magic depends on tradition and belief. It does not welcome observation,
nor does it profit by experiment. On the other hand, science is based
on experience; it is open to correction by observation and experiment.

Jacob - 20 Mar 2006 08:10 GMT
> I don't totally agree with point 16: a throws statement means an
> exception *might* be thrown, and the circumstances under which this can
> happen should be documented.  It is seldom that an exception must be thrown.

I assume the conditions under which an exception is thrown
are deterministic and well documented (though this depends to a
large extent on *documentation* rather than language syntax,
which is a problem, as documentation is inherently inaccurate).

The simple example is the Java List.get(int index) method, which
is documented to throw an exception if index < 0. This is the
contract, and this is one of the things I want to test in a
unit test.

Recommendation 16 just indicates how this is done in practice.
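
A minimal sketch of what such a contract test could look like, written in plain Java without a test framework so that it stands alone. The class name `ListGetContractCheck` and the helper `throwsOnIndex` are made up for illustration; only `List.get(int)` and `IndexOutOfBoundsException` come from the Java library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the guidelines under discussion.
final class ListGetContractCheck {

    // Returns true if list.get(index) throws the documented exception.
    static boolean throwsOnIndex(List<String> list, int index) {
        try {
            list.get(index);
            return false; // no exception: the documented contract was not honoured
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        System.out.println("get(-1) throws: " + throwsOnIndex(list, -1));
        System.out.println("get(0) throws:  " + throwsOnIndex(list, 0));
    }
}
```

The check covers both directions: an index outside the range must throw, and a valid index must not.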
Ian Collins - 18 Mar 2006 22:52 GMT
> I have compiled a set of unit testing
> recommendations based on my own experience
[quoted text clipped - 4 lines]
>
>   http://geosoft.no/development/unittesting.html

I'd add point 0 - write the tests first.

8 - names should be more expressive. Rather than testSaveAs(), how about
a series of tests: testSaveAsCreatesANewFile(),
testSaveAsSavesCurrentDataInNewFile(), etc.? Often tests with a broad name
attempt to test too much and don't express their intent.

Point 0 covers point 11.

13 - take care with random numbers, they can lead to failures that are
hard to reproduce.  I'd use a pseudo-random sequence that is repeatable
with a given seed.

0 and 8 cover 14.

0 covers 17.

0 covers 20.

Ian Collins.

Jacob - 20 Mar 2006 08:28 GMT
> I'd add point 0 - write the tests first.

Personally, I find the XP approach to unit testing a bit too restrictive
and therefore left the issue intentionally open. I would really like
more feedback on it though, because although I have practiced unit testing
for years, I never adopted this practice myself.

> 8 - names should be more expressive, rather than testSaveAs(), how about
> a series of tests, testSaveAsCreatesANewFile(),
> testSaveAsSavesCurrentDataInNewFile() etc.  Often tests with a broad name
> attempt to test too much and don't express their intent.

Agreed. I think this is basically what's in #8, without being too
verbose.

> Point 0 covers point 11.

I am not sure it does, and I wanted to define the two concepts
"execution coverage" and "test coverage" anyway. There is a blurred
distinction between the two in the literature as far as I have been
able to dig up.

> 13 - take care with random numbers, they can lead to failures that are
> hard to reproduce.  I'd use a pseudo-random sequence that is repeatable
> with a given seed.
>
> 0 and 8 covers 14.

To some degree, but I'd include them even if #0 was there. I don't
see "testing first" as a silver bullet, but more as a different
process approach.

> 0 covers 17.

Not necessarily. #0 states when to write the tests. #17 states that
the *code* should be written so that the workload of the unit testing
is minimized.

> 0 covers 20.

Yes, assuming everything is tested always. But in that case
it is covered without #0 as well. What I see in the industry today
is a major shift in adding unit testing to legacy code. I added
#20 as a suggestion to start this work at the bottom level.
Ian Collins - 20 Mar 2006 09:58 GMT
>> I'd add point 0 - write the tests first.
>
> Personally, I find the XP approach to unit testing a bit too restrictive
> and therefore left the issue intentionally open. I really like
> more feedback on it though, as though I have practiced unit testing
> for years, I never adopted this practice myself.

TDD is more than an approach to unit testing, it is an approach to the
full design-test-code cycle.

>> 8 - names should be more expressive, rather than testSaveAs(), how
>> about a series of tests, testSaveAsCreatesANewFile(),
[quoted text clipped - 10 lines]
> distinction between the two in the literature as far as I have been
> able to dig up.

TDD done well will give you 100% execution coverage for free.  How good
your test coverage is depends on how good you are at thinking up edge
cases to test.

>> 13 - take care with random numbers, they can lead to failures that are
>> hard to reproduce.  I'd use a pseudo-random sequence that is
[quoted text clipped - 5 lines]
> see "testing first" as a silver bullet, but more as a different
> process approach.

Simple, incremental tests are the essence of good TDD.

>> 0 covers 17.
>
> Not necessarily. #0 states when to write the tests. #17 states that
> the *code* should be written so that the workload of the unit testing
> is minimized.

If you start with the tests, the code will have to be written that way.

>> 0 covers 20.
>
> Yes, assuming everything is tested always. But in that case
> it is covered without #0 as well. What I see in the industry today
> is a major shift in adding unit testing to legacy code. I added
> #20 as a suggestion to start this work at the bottom level.

Very true.

Ian Collins.

Andrew McDonagh - 20 Mar 2006 20:51 GMT
>>> I'd add point 0 - write the tests first.
>>
[quoted text clipped - 5 lines]
> TDD is more than an approach to unit testing, it is an approach to the
> full design-test-code cycle.

More fundamentally, TDD is a design methodology, not a testing methodology.

It just happens to use Unit tests as its means of describing the design,
much like RUP uses UML.

Indeed, some TDD practitioners are starting to call it BDD - as in

http://www.google.co.uk/search?hl=en&q=behaviour+driven+development&btnG=Google+Search&meta=

>>> 8 - names should be more expressive, rather than testSaveAs(), how
>>> about a series of tests, testSaveAsCreatesANewFile(),
[quoted text clipped - 12 lines]
>>
> TDD done well will give you 100% execution coverage for free.  

I'd clarify that with 'TDD done *Correctly will give you 100% execution
coverage'

*Correctly  =  Write 1 failing Testcase,
               Write only enough code to make test Pass,
               Refactor to Remove Duplication,
               Repeat

More commonly referred to as Red, Green, Refactor.

> How good your test coverage is depends on how good you are at thinking up edge
> cases to test.

Always starting with the test first, only allows for 100%.

>>> 13 - take care with random numbers, they can lead to failures that
>>> are hard to reproduce.  I'd use a pseudo-random sequence that is
[quoted text clipped - 7 lines]
>>
> Simple, incremental tests are the essence of good TDD.

These kinds of tests are not unit tests in the TDD sense - they are stress
tests that happen to be written in the same framework as the TDD unit
tests.

However, looping over a random set of numbers isn't the best approach to
this style of testing.  If the OP wants to do this style, then using one
of the various Agitating frameworks/products will give a better result.

These tools tend to use byte code manipulation to randomly change various
values which aren't just numbers, but anything: int, long, float,
Integer, Double, String, Boolean, boolean, introducing Nulls, etc.

See http://www.agitar.com/
Phlip - 20 Mar 2006 21:56 GMT
>>>> I'd add point 0 - write the tests first.
>>>
>>> Personally, I find the XP approach to unit testing a bit too restrictive

I find debugging a bit too restrictive. I can't just use Undo to make the
bug go >poof<.

Imagine if you had such a button on your debugger! You would hit it all the
time!

You have such a button; it's just a little more expensive than raw code. The
cost savings - no more debugging - overwhelmingly offsets that cost.

>> TDD is more than an approach to unit testing, it is an approach to the
>> full design-test-code cycle.
[quoted text clipped - 5 lines]
>
> Indeed, some TDD practitioners are starting to call it BDD - as in

http://www.google.co.uk/search?hl=en&q=behaviour+driven+development&btnG=Google+Search&meta=

And some call it Test First Programming, because TDD is positioned to replace
the hideous name "eXtreme Programming".

And it doesn't create "unit tests", which are a different topic entirely.

The failure of a unit test implicates only one unit - such as the Ariane V
engine controller.

The failure of a _Developer_ Test implicates the developer's last edit. Time
to hit Undo.

>> TDD done well will give you 100% execution coverage for free.

That's not exhaustive.

TDD done well will reduce the _odds_ that you need exhaustive unit testing.

 Phlip
 http://www.greencheese.org/ZeekLand <-- NOT a blog!!!

Ian Collins - 21 Mar 2006 02:50 GMT
>>> I am not sure it does, and I wanted to define the two concepts
>>> "execution coverage" and "test coverage" anyway. There is a blurred
[quoted text clipped - 17 lines]
>
> Always starting with the test first, only allows for 100%.

I was using the OP's definition of "test coverage".  It might just be
me, but I've always had testers or users (normally testers) find some
bizarre use case that wasn't catered for in the original user stories or
unit tests.

Ian Collins.

Phlip - 21 Mar 2006 02:57 GMT
> I was using the OP's definition of "test coverage".  It might just be me,
> but I've always had testers or users (normally testers) find some bizarre
> use case that wasn't catered for in the original user stories or unit
> tests.

That's why, regardless of your unit testing strategy, you work to lower the
cost of acceptance tests, so anyone can write them, and they come up with
all sorts of things.

Hence all of XP is driven by tests.

 Phlip
 http://www.greencheese.org/ZeekLand  <-- NOT a blog!!!

Andrew McDonagh - 21 Mar 2006 21:38 GMT
>> Always starting with the test first, only allows for 100%.
>>
> I was using the OP's definition of "test coverage".  It might just be
> me, but I've always had testers or users (normally testers) find some
> bizarre use case that wasn't catered for in the original user stories or
> unit tests.

But we are talking about unit testing here - developers write and run
unit tests.

Users/testers don't unit test - they Acceptance (integration, System) Test.

Everyone runs the acceptance tests.
Timo Stamm - 18 Mar 2006 23:24 GMT
Jacob wrote:
> I have compiled a set of unit testing
> recommendations based on my own experience
> on the concept.
>
> Feedback and suggestions for improvements
> are appreciated:

| 7. Keep tests close to the class being tested
|
| If the class to test is Foo the test class should be called FooTest
| and kept in the same package (directory) as Foo. The build environment
| must be configured so that the test classes don't make their way into
| production code.

It is necessary to have test classes in the same package as the tested
class in order to test package private methods.

But you don't have to put the classes in the same directory. Most IDEs
support several source folders. You can set up two source folders. For
example: "src" for your application source, "test" for your test source.
If you use the same package structure in the test source folder, you can
test package private methods and it is very easy to deploy only
application code.

Timo
Ian Collins - 19 Mar 2006 00:13 GMT
> Jacob wrote:
>
[quoted text clipped - 14 lines]
> It is necessary to have test classes in the same package as the tested
> class in order to test package private methods.

Another view is that tests that require access to private methods are a
design smell.  Often these can be refactored into objects that can be
tested in isolation.

In C++, it's very tempting to make the test class a friend of the class
under test.  I've found that I end up with a better design by resisting
this temptation.

Ian Collins.

Hendrik Maryns - 19 Mar 2006 00:49 GMT
Ian Collins wrote the following on 03/19/2006 12:13 AM:
>> Jacob wrote:
>>
[quoted text clipped - 18 lines]
> design smell.  Often these can be refactored into objects that can be
> tested in isolation.

I was about to answer the same: shouldn't problems in package private
methods spill through to public methods?  Then why test them separately?
Find an error in a public method and retrace it with your favorite
debugger to the package private method, I'd say (without much
experience, so correct me if I'm wrong).

H.

Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org

Bent C Dalager - 19 Mar 2006 01:00 GMT
>I was about to answer the same: shouldn't problems in package private
>methods spill through to public methods?  Then why test them separately?

It makes it more time-consuming to find out where the error is.

> Find an error in a public method and retrace it with your favorite
>debugger to the package private method, I'd say (without much
>experience, so correct me if I'm wrong).

I prefer my unit tests to have obvious failure modes so that I can
basically tell from which test failed, exactly where in my source the
bug is. This means I don't have to muck around with a debugger, I can
just fix it and get on with things.

For this to be the case, however, the methods that I test need to be
reasonably small and not do a whole lot. These are my private helper
methods that I invoke from my more involved algorithm methods. Many
are one or two liners and they generally don't make sense to have
publicly accessible since they're really just internal building blocks
for constructing other more interesting methods.

Cheers
    Bent D

Bent Dalager - bcd@pvv.org - http://www.pvv.org/~bcd
                                   powered by emacs

Ian Collins - 19 Mar 2006 02:27 GMT
>>>| 7. Keep tests close to the class being tested
>>>|
[quoted text clipped - 15 lines]
> debugger to the package private method, I'd say (without much
> experience, so correct me if I'm wrong).

As Bent said, you are testing too much with your tests.  A golden rule
is not to rely on indirect tests.

I've recently come to the conclusion (while working with PHP which
doesn't have a handy debugger) that resorting to the debugger is a
strong indicator that your tests aren't fine grained enough. Try working
without one for a while and see your tests improve!

Ian Collins.

Timo Stamm - 19 Mar 2006 01:23 GMT
Ian Collins wrote:
>> It is necessary to have test classes in the same package as the tested
>> class in order to test package private methods.
>>
> Another view is that tests that require access to private methods are a
> design smell.  Often these can be refactored into objects that can be
> tested in isolation.

Not "private", but "package private".

Package private classes are only visible within the same package (same
directory). They are useful in large APIs where you have a lot of
functionality, but only want to expose a small interface.

Timo
Ian Collins - 19 Mar 2006 02:22 GMT
> Ian Collins wrote:
>
[quoted text clipped - 10 lines]
> directory). They are useful in large APIs where you have a lot of
> functionality, but only want to expose a small interface.

I see, a concept not shared with C++.

Ian Collins.

Jacob - 20 Mar 2006 08:45 GMT
> Ian Collins wrote:
>
[quoted text clipped - 10 lines]
> directory). They are useful in large APIs where you have a lot of
> functionality, but only want to expose a small interface.

I regard this as "private" in this context. An error in the
inner logic between classes of the same package (or *friends*
in C++ syntax) will eventually reveal itself through the public
API.

I want to keep test classes close to the class being tested for
practical reasons rather than technical reasons.

I understand the objection of "testing too large chunks of code"
(Ian C.), but test code adds complexity and workload to your system
after all, and I really want to keep it to a minimum. That's why I
reduce the public API of classes as much as possible (by heavy
use of package private methods for instance) and insist on testing
public API only.

But I don't claim that this is the only way, and it might well
depend on the nature of the project being tested.
Jacob - 20 Mar 2006 08:36 GMT
> Another view is that tests that require access to private methods are a
> design smell.  Often these can be refactored into objects that can be
[quoted text clipped - 3 lines]
> under test.  I've found that I end up with a better design by resisting
> this temptation.

This is my experience as well, and the reason why I added
recommendation #9 "Test public API".

That something is technically feasible (private method testing through
reflection or by other means) doesn't necessarily mean it is a good idea.

You need to draw the line somewhere, and the public API seems quite
natural in this case. This is also more robust against changes, in that it
will be more stable and require less test maintenance during code
refactoring.
Timo Stamm - 19 Mar 2006 01:59 GMT
Timo Stamm wrote:
> | 7. Keep tests close to the class being tested
> |
[quoted text clipped - 12 lines]
> test package private methods and it is very easy to deploy only
> application code.

Oops, I didn't realize that the guidelines aren't Java-specific and that
this thread is on c.l.java.p as well as c.l.c++.

My objection is specific to Java. I doubt that the same applies to C++.
Adam Maass - 20 Mar 2006 07:42 GMT
>I have compiled a set of unit testing
> recommendations based on my own experience
[quoted text clipped - 6 lines]
>
> Thanks.

I strongly object to number 13. Unit-tests, especially in an automated
framework, should be repeatable. (When a test fails, you need to know on
what inputs it failed. Once you fix the failure, you should hard-code the
inputs it failed on so that subsequent changes do not cause a regression of
the error.)

I don't necessarily object to looping over large numbers of inputs and
testing each one for expected outputs. But a unit test should contain no
randomness at all. (Or at least should have a way of specifying the seed for
the randomness generator(s).)
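
A minimal sketch of the seeded variant, assuming plain `java.util.Random` is an acceptable generator. The class name `SeededInputs` and the seed value are made up for illustration. Constructing `Random` with a fixed seed yields the same sequence on every run, so the inputs of a failing run can be regenerated.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper for illustration only.
final class SeededInputs {

    // Same seed, same sequence: java.util.Random is deterministic for a given seed.
    static int[] generate(long seed, int count) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt();
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = generate(42L, 5);
        int[] second = generate(42L, 5);
        System.out.println("repeatable: " + Arrays.equals(first, second));
    }
}
```

Logging the seed alongside a failure would then be enough to replay the exact inputs.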

-- Adam Maass
Jacob - 20 Mar 2006 08:54 GMT
> I strongly object to number 13. Unit-tests, especially in an automated
> framework, should be repeatable. (When a test fails, you need to know on
> what inputs it failed. Once you fix the failure, you should hard-code the
> inputs it failed on so that subsequent changes do not cause a regression of
> the error.)

I understand your objection, but this is actually one of the
mechanisms that has helped me find some of the hardest to
trace and most subtle errors in the code. It has proven to be
extremely helpful. Also, it gives me lots of confidence
knowing that my test suite of several thousand tests
is executed every hour with different input each time.
It is like adding another dimension to unit testing.

But I agree that tests must be reproducible, so I added #15 to ensure
that when a test fails, the test report will include the exact input
parameters it failed with. Then you can add a test with
this explicit input and debug it from there.
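
A sketch of how a failure report might carry its input, assuming a seeded `java.util.Random` and a property expressed as a `DoublePredicate`. `RandomInputReport` and `firstFailure` are hypothetical names, not taken from the guidelines, and the input range is an arbitrary choice.

```java
import java.util.Random;
import java.util.function.DoublePredicate;

// Hypothetical helper for illustration only.
final class RandomInputReport {

    // Runs the property on `runs` random doubles in [-1000, 1000) and returns
    // a description of the first failing input, or null if every input passes.
    static String firstFailure(long seed, int runs, DoublePredicate property) {
        Random random = new Random(seed);
        for (int i = 0; i < runs; i++) {
            double input = random.nextDouble() * 2000.0 - 1000.0;
            if (!property.test(input)) {
                return "failed for input=" + input + " (seed=" + seed + ", run=" + i + ")";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A deliberately false property, to show what the report looks like.
        System.out.println(firstFailure(1L, 1000, v -> v >= 0.0));
    }
}
```

The reported value can be pasted into a new, fixed-input test, which is the follow-up step described above.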
Adam Maass - 23 Mar 2006 04:29 GMT
>> I strongly object to number 13. Unit-tests, especially in an automated
>> framework, should be repeatable. (When a test fails, you need to know on
[quoted text clipped - 9 lines]
> are executed every hour with different input each time.
> It is like adding another dimension to unit testing.

There is sometimes value in testing on large numbers of random inputs. But
this isn't *unit* testing; it's more akin to a system or stress test. It's
something you hope your QAs will do for you; test on inputs that you weren't
necessarily expecting and see what breaks. Unit testing is about the
correctness of code for known inputs. If you come across a failure for a
novel set of inputs in system or stress testing, by all means, take that
input and add it to your unit test suite.

Note that test frameworks can be used both for unit tests as well as other
kinds of tests. (Simply because it's called 'JUnit', for example, does not
necessarily mean that all the test cases are, in fact, unit tests.)

-- Adam Maass
Jacob - 23 Mar 2006 20:03 GMT
> There is sometimes value in testing on large numbers of random inputs. But
> this isn't *unit* testing; it's more akin to a system or stress test. It's
> something you hope your QAs will do for you; test on inputs that you weren't
> necessarily expecting and see what breaks. Unit testing is about the
> correctness of code for known inputs.

Which definition of unit testing is this? I have searched the
net but haven't been able to find any backing for it.

If I write a method void setLength(double length), who defines
the input "necessarily expected", and why isn't this the entire
double range? I'd claim the latter, and to cover as many inputs
as possible I use the random trick.

I don't have a problem with defining this kind of testing
differently, for instance as "stress testing", but on the other
hand there isn't really any more "stress" in calling
setLength(1.23076e+307) than setLength(2.0) as long as the
method accepts a double as input.

And why do you care about "known" input as long as the
actual (failing) input can be traced afterwards anyway?

You define this as a unit test:

  for (int i = 0; i < 1000; i++)
     testMyIntMethod(i);

while this is not:

  for (int i = 0; i < 1000; i++)
     testMyIntMethod(getRandomInt());

even if an error on input=42 will produce identical error reports
in both cases. Only the latter will (eventually) reveal the
error for input=-100042.

Also, if I have a setLength() method which covers the "typical"
input cases just fine, but is in general crap (a common scenario),
then a testSetLength() method that verifies that setLength() works
fine for "typical" input isn't worth a lot. What you need is a test
method that tests the non-typical inputs. From a black-box perspective
you don't really know what is typical or non-typical, so why not just
throw a random number generator at it?
Ben Pope - 23 Mar 2006 21:29 GMT
>> There is sometimes value in testing on large numbers of random inputs.
>> But this isn't *unit* testing; it's more akin to a system or stress
[quoted text clipped - 9 lines]
> double range? I'd claim the latter and to cover as many inputs
> as possible I use the random trick.

The programmer specifies the preconditions.  If the preconditions are
not met, there is no reason for it to produce valid results.  Random is
not repeatable, and is not predictable.

> I don't have a problem with defining this kind of testing
> differently, for instance "stress testing", but on the other
> hand there isn't really any more "stress" in calling
> setLength(1.23076e+307) than setLength(2.0) as long as the
> method accepts a double as input?

No, but the point is that when you unit test, you need to make informed
choices about the inputs you choose.  It's usually wise to throw in a
couple of "normal", "everyday" values, but also explicitly check
boundary cases and out of range.
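
As a sketch of such informed choices, assume a hypothetical `Segment` class whose `setLength` requires a finite, non-negative value. This precondition is invented for the example; the thread does not say what `setLength` accepts. The helper `rejects` lets a test probe an everyday value, the boundaries and some out-of-range inputs.

```java
// Hypothetical class; the precondition below is an assumption for this sketch.
final class Segment {
    private double length;

    // Assumed precondition: length is finite and >= 0.
    void setLength(double length) {
        if (Double.isNaN(length) || Double.isInfinite(length) || length < 0.0) {
            throw new IllegalArgumentException("invalid length: " + length);
        }
        this.length = length;
    }

    double getLength() {
        return length;
    }

    // True if setLength rejects the value with IllegalArgumentException.
    static boolean rejects(double value) {
        try {
            new Segment().setLength(value);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("everyday 2.0 rejected:  " + rejects(2.0));
        System.out.println("boundary 0.0 rejected:  " + rejects(0.0));
        System.out.println("MAX_VALUE rejected:     " + rejects(Double.MAX_VALUE));
        System.out.println("negative rejected:      " + rejects(-1.0));
        System.out.println("NaN rejected:           " + rejects(Double.NaN));
    }
}
```

Each input is chosen for a reason that can be stated in advance, which is the point being made about designed test cases.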

> And why do you care about "known" input as long as the
> actual (failing) input can be traced afterwards anyway?

Repeatability.  It's no use relying on randomness to thoroughly test.
You have to design your test cases.

> You define this as a unit test:
>
>   for (int i = 0; i < 1000; i++)
>      testMyIntMethod(i);

Not really.  What are you testing?  That it doesn't crash?  Presumably
you need to check the output against an array of 1000 pre-computed
values?  Not much fun.

> while this is not:
>
>   for (int i = 0; i < 1000; i++)
>      testMyIntMethod(getRandomInt())

How can you possibly check the output is correct for a random input?

> even if an error on input=42 will produce identical error reports
> in both cases. Only the latter will (eventually) reveal the
> error for input=-100042.

I don't understand, are you checking for crashing?

> Also, if I have a setLength() method which cover the "typical"
> input cases just fine, but is in general crap (a common scenario),
[quoted text clipped - 3 lines]
> you don't really know what is typical or non-typical, so why not just
> throw a random number generator at it?

You know your preconditions.  You know your postconditions.

If anything can happen on invalid input, then there is no point in testing it.  If
you want a default output, exception or whatever for out-of-range, then
check it with a unit test.  You get to choose your input and you get to
check your output.

Randomness just doesn't cut it, and I don't understand how you can check
the output is correct, without knowing the input.

Ben Pope

I'm not just a number. To many, I'm known as a string...

Jacob - 25 Mar 2006 16:15 GMT
> Randomness just doesn't cut it, and I don't understand how you can check
> the output is correct, without knowing the input.

You *do* know the input!

Consider testing this method:

  double square(double v)
  {
    return v * v;
  }

Below is a typical unit test that verifies that the
method behaves correctly on typical input:

  double v = 2.0;
  double v2 = square(v); // You know the input: It is 2.0!
  assertEquals(4.0, v2, 0.0); // expected value first, then actual, then tolerance

The same test using random input:

  double v = getRandomDouble();
  double v2 = square(v);  // You know the input: It is v!
  assertEquals(v * v, v2, 0.0); // expected value first, then actual, then tolerance

If the test fails, all the details will be in the error
report.

And this method actually *does* fail for a majority of all
possible inputs (abs of v exceeding sqrt(maxDouble)).
This will be revealed instantly using the random approach.

For an experienced programmer the limitation of square()
might be obvious so border cases are probably covered
sufficiently in both the code and the test. But for more
complex logic this might not be this apparent and throwing
in random input (in ADDITION to the typical cases and all
obvious border cases) has proven quite helpful, at least
to me.
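
One caveat with the random version, offered as a sketch rather than as a verdict on the approach: comparing `square(v)` with `v * v` repeats the implementation inside the test, so an overflow to infinity appears on both sides and the comparison still holds. A check that does not reuse the implementation, such as requiring a finite result, is what exposes it. `SquareCheck` and its method names are invented for this example.

```java
// Hypothetical class for illustration only.
final class SquareCheck {

    static double square(double v) {
        return v * v;
    }

    // Oracle that mirrors the implementation: it cannot notice overflow,
    // because both sides become infinite together.
    static boolean matchesSameFormula(double v) {
        return square(v) == v * v;
    }

    // Independent property: the square of a finite number should be finite.
    static boolean staysFinite(double v) {
        return !Double.isInfinite(square(v));
    }

    public static void main(String[] args) {
        double large = 1e200; // well above sqrt(Double.MAX_VALUE)
        System.out.println("same-formula check passes: " + matchesSameFormula(large));
        System.out.println("finiteness check passes:   " + staysFinite(large));
    }
}
```

Random input finds the problem only when it is paired with an expectation that is stated independently of the code under test.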
Tom Leylan - 25 Mar 2006 17:17 GMT
Jacob:

You've chosen a trivial example where your assert can compute the results of
the Square() function you are calling.  That is hardly a typical situation
or there would be no reason for the function to have been created.

double v = getRandomDouble();
double v2 = AccountBalance( v );
assertEquals( v2, ? );

So explain how you get the value to type in ? given you don't know what the
input will be.  Perhaps you would do the computations you read about in the
AccountBalance() method inline to see if those and yours matched?

>> Randomness just doesn't cut it, and I don't understand how you can check
>> the output is correct, without knowing the input.
[quoted text clipped - 35 lines]
> obvious border cases) has proven quite helpful, at least
> to me.
Jacob - 25 Mar 2006 18:55 GMT
> You've chosen a trivial example where your assert can compute the results of
> the Square() function you are calling.  That is hardly a typical situation
[quoted text clipped - 7 lines]
> input will be.  Perhaps you would do the computations you read about in the
> AccountBalance() method inline to see if those and yours matched?

I chose a fairly typical example of a basic unit requiring
unit testing and I proved that by using random input it
easily identified an error that otherwise could slip through.

I never said that using random input was useful in all
cases and perhaps it isn't in your specific example. On the
other hand, how do you know what goes into "?" given you
know the input? There must be some sort of reasoning behind
your result as well.

Below is a different example which might not be as trivial
as my previous. It "proves" that encoding + decoding (according
to some procedure) of any string should give back the original
string:

    String text = getRandomString(0,1000000);  // 0 - 1MB
    String encoded = Encoder.encode(text);
    String decoded = Encoder.decode(encoded);
    assertEquals(text, decoded);
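
A self-contained sketch of the same round-trip check. Since the `Encoder` in the snippet is not specified, `java.util.Base64` stands in for it here; that substitution, the class name `RoundTripCheck`, the fixed seed and the restriction to short printable-ASCII strings are all assumptions made to keep the example small and repeatable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Random;

// Hypothetical class; Base64 is a stand-in for the unspecified Encoder.
final class RoundTripCheck {

    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // Builds a random string of printable ASCII characters (' ' to '~').
    static String randomText(Random random, int maxLength) {
        int length = random.nextInt(maxLength + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (' ' + random.nextInt(95)));
        }
        return sb.toString();
    }

    // Returns the first input that does not survive the round trip, or null.
    static String firstRoundTripFailure(long seed, int runs) {
        Random random = new Random(seed);
        for (int i = 0; i < runs; i++) {
            String text = randomText(random, 1000);
            if (!text.equals(decode(encode(text)))) {
                return text;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("first failing input: " + firstRoundTripFailure(7L, 200));
    }
}
```

Returning the failing string, rather than a bare boolean, keeps the offending input available for a follow-up test with fixed data.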
Andrew McDonagh - 25 Mar 2006 19:27 GMT
> Below is a different example which might not be as trivial
> as my previous. It "proves" that encoding + decoding (according
[quoted text clipped - 5 lines]
>     String decoded = Encoder.decode(encoded);
>     assertEquals(text, decoded);

This only proves that the encoding & decoding scheme is the same.  I can
make this test pass like this...

public String encode(String text) {
  return text;
}

public String decode(String encoded) {
  return encoded;
}

Here, the unit test is testing every line of code, yet it's a worthless
implementation.

The unit test is not testing the encoding mechanism as it should be
implemented, as it's using decode() for the assertion.

In this case, it's prudent/better to have separate unit tests for both
encode and decode, doing the opposite conversion locally within the
unit test itself, as this prevents errors of implementation but, more
importantly, forces the implementation to use the correct algorithms.

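// MD5 stands in for an independent, trusted reference codec here; an
// actual MD5 digest is one-way and could not be decoded.
public void testEncoder() {
  String text = getRandomString(0,1000000);  // 0 - 1MB
  String encoded = Encoder.encode(text);
  String decoded = MD5.decode(encoded);  // reference decoder, not Encoder
  assertEquals(text, decoded);
}

public void testDecoder() {
  String text = getRandomString(0,1000000);  // 0 - 1MB
  String encoded = MD5.encode(text);  // reference encoder, not Encoder
  String decoded = Encoder.decode(encoded);
  assertEquals(text, decoded);
}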

Here we have separated the logic needed for the test from the unit under
test.
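
A self-contained sketch of the same separation, with the assumptions stated plainly: `HexCodec` is a hypothetical unit under test, and `ReferenceHex` is a deliberately simple decoder that plays the role of the trusted reference (the `MD5` class in the snippet). With the expected form of the output pinned down independently, an identity `encode` would no longer pass.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical unit under test: text bytes rendered as lowercase hex.
final class HexCodec {

    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(DIGITS[(b >> 4) & 0x0f]).append(DIGITS[b & 0x0f]);
        }
        return sb.toString();
    }
}

// Independent reference decoder, built only on Integer.parseInt.
final class ReferenceHex {

    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A test can then assert a known vector such as `HexCodec.encode("abc")` giving `616263`, and additionally decode arbitrary encoded input through `ReferenceHex` rather than through the codec being tested.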
Alex Hunsley - 26 Mar 2006 05:26 GMT
>> Below is a different example which might not be as trivial
>> as my previous. It "proves" that encoding + decoding (according
[quoted text clipped - 19 lines]
> Here, the unit test is testing every line of code, yet it's a worthless
> implementation.

No, an encoding that doesn't do any encoding is still an encoding. It
just happens to be the 'identity' encoding (in the same way that 1 is
the identity for multiplication and 0 is the identity for addition). If
an encoding and decoding function are presented as a pair of opposite
functions, then you are perfectly justified in testing that one is the
inverse of the other:

  Decode(Encode(x)) = x

Testing that an encoding is *any good* by whatever means you judge
'good' is entirely a different matter to testing the reversibility of an
encode/decode pair (and harder to do, to boot, although testing that the
encoding did something, anything, to the data is trivial).
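
The two properties can be checked separately. The following is only a sketch under assumptions: the class and method names are hypothetical, the inputs are short lowercase strings, and rot13 stands in for a real encoding. It shows that the identity encoding passes the reversibility check while failing the "did something" check.

```java
import java.util.Random;
import java.util.function.UnaryOperator;

class RoundTripCheck {
    // Builds a non-empty pseudo-random string of lowercase letters.
    static String randomString(Random rnd, int maxLen) {
        int len = 1 + rnd.nextInt(maxLen);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    // Reversibility: decode(encode(x)) must equal x for every generated x.
    static boolean roundTrips(UnaryOperator<String> encode, UnaryOperator<String> decode,
                              long seed, int runs) {
        Random rnd = new Random(seed);
        for (int i = 0; i < runs; i++) {
            String text = randomString(rnd, 50);
            if (!text.equals(decode.apply(encode.apply(text)))) return false;
        }
        return true;
    }

    // The separate, weaker check: the encoding changed at least one input.
    static boolean changesSomething(UnaryOperator<String> encode, long seed, int runs) {
        Random rnd = new Random(seed);
        for (int i = 0; i < runs; i++) {
            String text = randomString(rnd, 50);
            if (!text.equals(encode.apply(text))) return true;
        }
        return false;
    }

    // Illustrative encoding only: rot13 on lowercase letters is its own inverse.
    static String rot13(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append((char) ('a' + (c - 'a' + 13) % 26));
        }
        return sb.toString();
    }
}
```

Neither check says anything about whether the encoding is the one the requirements ask for; that still needs known expected values.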

> The unit test is not testing the encoding mechanism as it should be
> implemented, as it's using decode() for the assertion.
[quoted text clipped - 20 lines]
> Here we have separated the logic needed for the test from the unit under
> test.
Jacob - 26 Mar 2006 08:42 GMT
> Here, the unit test is testing every line of code, yet it's a worthless
> implementation.

The test is still useful. It can't prove that the code is correct,
but if it fails, it can prove that the code is wrong.

And as stated several times already: The test comes in ADDITION to
the test for typical cases and all obvious boundary cases, which in
this particular case would have been written quite differently.
Andrew McDonagh - 26 Mar 2006 08:38 GMT
>> Here, the unit test is testing every line of code, yet it's a worthless
>> implementation.
>
> The test is still useful. It can't prove that the code is correct,
> but if it fails, it can prove that the code is wrong.

But it can't tell you which of the collaborating encoding/decoding methods
is the cause.
Jacob - 26 Mar 2006 09:14 GMT
>>> Here, the unit test is testing every line of code, yet it's a
>>> worthless implementation.
[quoted text clipped - 4 lines]
> But it can't tell you which of the collaborating en/de coding methods is
> the cause.

An explicit input for which the operation fails should
give you enough information to be able to track this down yourself.
Alex Hunsley - 26 Mar 2006 13:24 GMT
>>> Here, the unit test is testing every line of code, yet it's a
>>> worthless implementation.
[quoted text clipped - 4 lines]
> But it can't tell you which of the collaborating en/de coding methods is
> the cause.

But it is telling you that you have a problem, which is a good start.
You can then investigate further, or back up with other complementary tests.
Tom Leylan - 25 Mar 2006 20:57 GMT
Forgive me but you are terming it "fairly typical" and it isn't typical of
anything I have seen.  So rather than you or I decide (since we don't agree)
let others weigh in on how typical it is for a function's result to be
reproducible by an easily contrived formula.  Show me your assertEquals for
IsPrime() for instance.

The discussion hasn't been (as I read it) that random input is of no value
in all cases.  It was illustrated by others that "unit tests" imply one does
know the answer and that random input means you can rarely know the answer.
To answer your question as to how one might know the value of ? it would be
"computed" by whatever method was required.  A test for IsPrime() would be
fed known prime and non-prime values safely knowing which ones should return
true and which should return false.  A test for AccountBalance() would
similarly have inputs and outputs which have been determined to test the
functionality.
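
As a rough sketch of that approach (the trial-division isPrime() below is a hypothetical stand-in for the code under test, and the class name is made up), the test feeds in values whose primality is already known:

```java
class PrimeKnownValues {
    // Hypothetical implementation under test: plain trial division.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Values known in advance to be prime, including the largest int (2^31 - 1).
    static final int[] PRIMES = {2, 3, 5, 7, 11, 13, 97, 7919, Integer.MAX_VALUE};

    // Values known in advance not to be prime: negatives, 0, 1, squares, products.
    static final int[] NON_PRIMES = {-7, 0, 1, 4, 9, 15, 91, 7917, 1_000_000};

    static void testKnownValues() {
        for (int p : PRIMES) {
            if (!isPrime(p)) throw new AssertionError(p + " should be prime");
        }
        for (int c : NON_PRIMES) {
            if (isPrime(c)) throw new AssertionError(c + " should not be prime");
        }
    }
}
```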

I think Andrew's response does a great job of pointing out how relying on the
two functions to test each other is a mistake.

>> You've chosen a trivial example where your assert can compute the results
>> of the Square() function you are calling.  That is hardly a typical
[quoted text clipped - 28 lines]
>     String decoded = Encoder.decode(encoded);
>     assertEquals(text, decoded);
Jacob - 26 Mar 2006 09:03 GMT
> Forgive me but you are terming it "fairly typical" and it isn't typical of
> anything I have seen.  

The most typical methods around are getters and setters which
are even less complex than the square example I used previously:

  String name = getRandomString(0,1000);
  A.setName(name);
  assertEquals(A.getName(), name);

They are not the most interesting ones to test, but they should
still be tested, and using random input increases the test coverage.

> Show me your assertEquals for IsPrime() for instance.

Not the best example I could come up with, but it indicates
the principle:

  for (int i = 0; i < 1000; i++) {
    int v1 = getRandomInt();
    if (isPrime(v1)) {
      for (int j = 0; j < 1000; j++) {
        int v2 = getRandomInt();
        // Skip v2 == v1: a prime divides itself, which would fail spuriously.
        if (v2 != v1 && isPrime(v2)) {
          assertNotEquals(v2 % v1, 0);
          assertNotEquals(v1 % v2, 0);
        }
      }
    }
  }

Again: It doesn't prove that isPrime() is correct, but it may be able
to prove that it is wrong.
Tom Leylan - 26 Mar 2006 17:15 GMT
Of course one can insert getRandomString() into a test when the actual
string value has no known limits.  Put it into your US State or US Zip
Get/Set... Any field which validates its entry should fail upon assignment
and your assertion doesn't get a chance to run does it?  Again nobody
claimed that there is anything wrong with random string testing.  It was
pointed out that it shouldn't form the basis of one's unit tests.

It seems to me the purpose of unit testing is to verify that values known to
be good, pass and that values known to be bad, fail.  If a bad value makes
it through it can be added to the test suite.  That is different than
"generating an additional random value."

So you should continue to include random string tests (to your unit tests)
and I (and a few others) will probably recommend against it.  There is no
problem with differing viewpoints.

>> Forgive me but you are terming it "fairly typical" and it isn't typical
>> of anything I have seen.
[quoted text clipped - 29 lines]
> Again: It doesn't prove that isPrime() is correct, but it may be able
> to prove that it is wrong.
Scott.R.Lemke@gmail.com - 28 Mar 2006 16:19 GMT
> > Forgive me but you are terming it "fairly typical" and it isn't typical of
> > anything I have seen.
[quoted text clipped - 8 lines]
> They are not the most interesting ones to test, but they should
> still be tested, and using random input increases the test coverage.

Unless of course you pass in an invalid string; too long, too short,
not unique, etc, and your setter silently fixes/fails, then because of
that your getter fails, and you get a false failure on your assertion.

>  > Show me your assertEquals for IsPrime() for instance.
>
[quoted text clipped - 16 lines]
> Again: It doesn't prove that isPrime() is correct, but it may be able
> to prove that it is wrong.

It doesn't prove either. You cannot prove that it was wrong based upon
a random input, as the input might be wrong.

I have long stopped using terms like "Unit", "Black box", "System" when
referring to tests, as there are too many definitions out there.
Instead describe tests by purpose and context, and leave names out. So,
for your random test your purpose would be to test a variety of inputs,
and the context would be on a method with unknown results. By doing
that instead of pre-placing a term like "Unit" and all the
prejudice/preconceptions that come with that term, you will better get
your point across as to why you are doing a test.
Hendrik Maryns - 29 Mar 2006 11:17 GMT
Scott.R.Lemke@gmail.com wrote:

>>> Forgive me but you are terming it "fairly typical" and it isn't typical of
>>> anything I have seen.
[quoted text clipped - 11 lines]
> not unique, etc, and your setter silently fixes/fails, then because of
> that your getter fails, and you get a false failure on your assertion.

Then you should have preconditions or postconditions for your setter
method which take care of that, and integrate them in the test.

H.
Scott.R.Lemke@gmail.com - 29 Mar 2006 15:54 GMT
[quoted text clipped - 20 lines]
> Then you should have preconditions or postconditions for your setter
> method which take care of that, and integrate them in the test.

And what if every one of your random choices fails those conditions,
and the test is never run?
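
One way to guard against that, sketched under assumptions (the Person class, its 1 to 20 character rule, and all names here are hypothetical), is to count how many generated inputs actually satisfied the precondition and fail the test if none did:

```java
import java.util.Random;

// Hypothetical class with a validating setter.
class Person {
    private String name = "";

    // Assumed precondition: 1 to 20 characters.
    static boolean isValidName(String s) {
        return s != null && !s.isEmpty() && s.length() <= 20;
    }

    void setName(String s) {
        if (!isValidName(s)) throw new IllegalArgumentException("invalid name");
        name = s;
    }

    String getName() {
        return name;
    }
}

class GuardedRandomSetterTest {
    // Returns how many generated inputs exercised the set/get assertion.
    static int run(long seed, int attempts) {
        Random rnd = new Random(seed);
        int exercised = 0;
        for (int i = 0; i < attempts; i++) {
            String candidate = randomString(rnd, 40);
            Person p = new Person();
            if (Person.isValidName(candidate)) {
                p.setName(candidate);
                if (!candidate.equals(p.getName())) {
                    throw new AssertionError("seed=" + seed + " input=" + candidate);
                }
                exercised++;
            } else {
                boolean rejected = false;
                try {
                    p.setName(candidate);
                } catch (IllegalArgumentException expected) {
                    rejected = true;
                }
                if (!rejected) throw new AssertionError("accepted invalid input, seed=" + seed);
            }
        }
        // Without this guard the test could pass without ever testing anything.
        if (exercised == 0) throw new AssertionError("no valid input generated, seed=" + seed);
        return exercised;
    }

    // Pseudo-random lowercase string of length 0..maxLen.
    static String randomString(Random rnd, int maxLen) {
        int len = rnd.nextInt(maxLen + 1);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }
}
```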

The point I was trying to make is that this type of random testing is
actually a form of another type of test, often referred to as monkey
testing, and by dropping the label of "unit" or "monkey", and instead
stating the purpose and context you eliminate this whole argument.
Ben Pope - 27 Mar 2006 11:31 GMT
>> Randomness just doesn't cut it, and I don't understand how you can
>> check the output is correct, without knowing the input.
[quoted text clipped - 23 lines]
> If the test fails, all the details will be in the error
> report.

And how exactly did you come up with v*v as the value to test against?
Did you copy it from the function you're testing?  Do you expect that to
fail?

Did you get somebody else to write the code?  Do you implement all the
code twice, independently and check them against each other?

> And this method actually *does* fail for a majority of all
> possible inputs (abs of v exceeding sqrt(maxDouble)).
[quoted text clipped - 7 lines]
> obvious border cases) has proven quite helpful, at least
> to me.

I fail to see how you are going to automatically test this complicated
logic.

Ben Pope
Signature

I'm not just a number. To many, I'm known as a string...

Jacob - 27 Mar 2006 21:19 GMT
> And how exactly did you come up with v*v as the value to test against?
> Did you copy it from the function you're testing?  Do you expect that to
> fail?

The unit test reflects the requirements and for a square()
method the requirement is to return the square of its argument: v*v.

That this happens to be identical to the code implementation is
purely coincidental and a result of picking a (too?) simple example.
The square method may well be implemented by establishing a socket
connection to the math query engine at the MIT, a fancy caching
mechanism or some advanced bit operation.
davidrubin@warpmail.net - 30 Mar 2006 05:45 GMT
> > Randomness just doesn't cut it, and I don't understand how you can check
> > the output is correct, without knowing the input.
[quoted text clipped - 14 lines]
>    double v2 = square(v); // You know the input: It is 2.0!
>    assertEquals(v2, 4.0);

This is fine.

> The same test using random input:
>
>    double v = getRandomDouble();
>    double v2 = square(v);  // You know the input: It is v!
>    assertEquals(v2, v*v);

This is completely broken. You can't test an implementation of 'square'
with an identical implementation. You need a separate representation
for your expected result. Otherwise, you are not testing anything.

> If the test fails, all the details will be in the error
> report.
>
> And this method actually *does* fail for a majority of all
> possible inputs (abs of v exceeding sqrt(maxDouble)).
> This will be revealed instantly using the random approach.

This may not ever be revealed using random inputs, but in the case of
'square' this is a moot point. The contract of 'square' must stipulate
that the input (v) is invalid unless
'v * v < "max double"'. Since such inputs are invalid by the contract,
there is no point in testing them.
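
A minimal sketch of what such a contract could look like, and of the handful of boundary tests it implies. The exception type, the MAX_INPUT constant, and the class name are assumptions made for illustration, not anything given in this thread:

```java
class SquareContract {
    // Assumed contract: |v| must not exceed sqrt(max double), so v * v stays finite.
    static final double MAX_INPUT = Math.sqrt(Double.MAX_VALUE);

    static double square(double v) {
        if (Double.isNaN(v) || Math.abs(v) > MAX_INPUT) {
            throw new IllegalArgumentException("input outside contract: " + v);
        }
        return v * v;
    }

    // True if square() refuses the input as the contract says it must.
    static boolean rejects(double v) {
        try {
            square(v);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Typical values plus the boundaries stated by the contract.
    static void testBoundaries() {
        check(square(0.0) == 0.0, "zero");
        check(square(2.0) == 4.0, "typical value");
        check(square(-3.0) == 9.0, "negative value");
        check(!Double.isInfinite(square(MAX_INPUT)), "upper boundary");
        check(!Double.isInfinite(square(-MAX_INPUT)), "lower boundary");
        check(rejects(Math.nextUp(MAX_INPUT)), "just above the boundary");
        check(rejects(Double.NaN), "NaN");
    }

    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError("failed: " + what);
    }
}
```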

> For an experienced programmer the limitation of square()
> might be obvious so border cases are probably covered
[quoted text clipped - 3 lines]
> obvious border cases) has proven quite helpful, at least
> to me.

This is also wrong. The boundaries of the input are stated in the
function's contract. It is not something determined by the user's level
of experience. Your test cases must cover the boundary conditions
stipulated by the function's documented contract *as* *well* *as*
boundary conditions based on white-box knowledge of the function's
implementation. If you cover these cases, plus a small assortment of
well-chosen "sanity" values, you don't need to waste time with large
amounts of random data.

If you can't test your function in this way, it is probably not
factored correctly.
Jacob - 30 Mar 2006 08:02 GMT
> This is completely broken. You can't test an implementation of 'square'
> with an identical implementation. You need a separate representation
> for your expected result. Otherwise, you are not testing anything.

I've already answered this in a different posting: The unit test
reflects the requirements. The requirement for square() is to
return the square of the input: v*v. From a black-box perspective
I don't know the implementation of square(). It can be anything.

> This is also wrong. The boundaries of the input are stated in the
> function's contract. It is not something determined by the user's level
[quoted text clipped - 4 lines]
> well-chosen "sanity" values, you don't need to waste time with large
> amounts of random data.

This is all correct given you are able to identify the boundary
cases up front. In some cases you are, but for more complex ones
you easily forget some in the same way you forget to handle these
cases in the original code (that's why there are bugs after all).

Imagine implementing a tree container. In order to test correct
removal of nodes, some of the boundary cases might be:

  remove root
  remove intermediate node
  remove leaf node
  remove root when this is the only node
  remove root with exactly one leaf
  remove root with exactly one intermediate node
  remove intermediate node with one child
  remove intermediate node with many children
  remove leaf node without siblings
  remove leaf node with siblings
  remove intermediate node with root parent
  remove intermediate node with only leaf nodes
  remove intermediate node with leaf nodes and other intermediate nodes
  remove intermediate node with only other intermediate node children
  remove non-existing node
  remove null
  remove node with unique name
  remove node with non-unique name
  etc.

The above might or might not be boundary cases, that actually depends
on the implementation: A good implementation has few! From experience
you "know" which cases are more likely to contain bugs, even
without knowing the implementation.

I don't say you shouldn't cover the boundary cases explicitly,
of course you should (see #13 in the guidelines).

But when that is in place I would have built a tree at random, containing
a random number of nodes (0 - 1.000.000 perhaps), and then picked nodes at
random and performed a random (add, remove, move, copy, whatever) operation
on those, a random number of times (0 - 10.000 perhaps) and verified that the
operation behaves as expected and that the tree is always in a consistent state
afterwards. This would leave me with the confidence that if there are
cases I've forgotten (or that appear during code refactoring) they might
be trapped by this additional test.
davidrubin@warpmail.net - 30 Mar 2006 16:40 GMT
> > This is completely broken. You can't test an implementation of 'square'
> > with an identical implementation. You need a separate representation
[quoted text clipped - 4 lines]
> return the square of the input: v*v. From a black-box perspective
> I don't know the implementation of square(). It can be anything.

This is why black-box tests are not entirely sufficient. You must
(especially for unit tests) use some white-box knowledge to test the
boundary conditions of both the contract and the implementation.

[snip - tree stuff]
> But when that is in place I would have built a tree at random, containing
> a random number of nodes (0 - 1.000.000 perhaps), and then picked nodes at
[quoted text clipped - 4 lines]
> cases I've forgotten (or that appears during code refactoring) they might
> be trapped by this additional test.

I went to Brian Kernighan's site at Princeton a while back. One of his
assignments was to implement associative arrays similar to those in
awk. Then, he provided a script generator that produces random output
(add, remove, lookup, etc). You are supposed to run this script against
both awk and your own implementation, and compare the results. So, I
think you would probably appreciate this.
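
The same compare-against-a-reference idea can be sketched in Java. Everything here is an assumption for illustration: TinyMap is a made-up implementation under test, and java.util.HashMap plays the role that awk plays in the assignment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Random;

// Hypothetical implementation under test: a simple association list.
class TinyMap {
    private final List<String> keys = new ArrayList<>();
    private final List<Integer> values = new ArrayList<>();

    void put(String key, int value) {
        int i = keys.indexOf(key);
        if (i >= 0) {
            values.set(i, value);
        } else {
            keys.add(key);
            values.add(value);
        }
    }

    Integer get(String key) {
        int i = keys.indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }

    void remove(String key) {
        int i = keys.indexOf(key);
        if (i >= 0) {
            keys.remove(i);    // remove by index
            values.remove(i);  // remove by index
        }
    }

    int size() {
        return keys.size();
    }
}

class DifferentialTest {
    // Applies the same pseudo-random operations to both maps and compares them.
    static void run(long seed, int operations) {
        Random rnd = new Random(seed);
        TinyMap actual = new TinyMap();
        Map<String, Integer> expected = new HashMap<>();
        for (int i = 0; i < operations; i++) {
            String key = "k" + rnd.nextInt(20);
            int op = rnd.nextInt(3);
            if (op == 0) {
                int value = rnd.nextInt();
                actual.put(key, value);
                expected.put(key, value);
            } else if (op == 1) {
                actual.remove(key);
                expected.remove(key);
            } else if (!Objects.equals(actual.get(key), expected.get(key))) {
                throw new AssertionError("lookup mismatch: seed=" + seed + " step=" + i);
            }
            if (actual.size() != expected.size()) {
                throw new AssertionError("size mismatch: seed=" + seed + " step=" + i);
            }
        }
    }
}
```

Because the expected results come from a separate implementation, the test does not need to know the answer for any particular input in advance.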

Also, John Lakos' new book is due to be published later this year. In
it, he promises to address the issue of component-level testing in
great detail, including a section on random testing, which I think you
will find very interesting.
Adam Maass - 27 Mar 2006 16:41 GMT
>> There is sometimes value in testing on large numbers of random inputs.
>> But this isn't *unit* testing; it's more akin to a system or stress test.
[quoted text clipped - 32 lines]
> in both cases. Only the latter will (eventually) reveal the
> error for input=-100042.

If you don't care about the result for input 100042, then the "random"
version is flawed. In unit testing, you want to select several typical
inputs, as well as boundary and out-of-range inputs. This is sufficient to
obtain a general sense that the code is correct for the general case. It
also requires the test-writer to /think/ about what the boundary conditions
are. There may be several of these, at many points in the domain.

> Also, if I have a setLength() method which cover the "typical"
> input cases just fine, but is in general crap (a common scenario),
[quoted text clipped - 3 lines]
> you don't really know what is typical or non-typical, so why not just
> throw a random number generator at it?

My objection to random inputs is that unit-tests must be 100% repeatable for
every run of the test suite. I don't ever want to see a failure of a unit
test that doesn't reappear on the next run of the suite unless something
significant -- either the test case or the code under test -- has changed.
Random inputs are likely to skip those inputs that cause failures, even if
every once in a while they do uncover a failure.

Note too that unit-testing is not black-box testing. Good unit tests usually
have pretty good knowledge of the underlying algorithm under test.

-- Adam Maass
Timbo - 27 Mar 2006 16:51 GMT
> My objection to random inputs is that unit-tests must be 100% repeatable for
> every run of the test suite. I don't ever want to see a failure of a unit
> test that doesn't reappear on the next run of the suite unless something
> significant -- either the test case or the code under test -- has changed.
> Random inputs are likely to skip those inputs that cause failures, even if
> every once in a while they do uncover a failure.

Agreed. A potential problem with randomly generated inputs is that
the person fixing the fault has to write a unit test to reproduce
the bug. Some people are lazy and will just fix the bug, run the
random unit tests, see them pass (because the randomly generated
input is not tested the next time), and recommit the new version.

Also, I've never seen anything to indicate that random tests are
any more likely to uncover a fault than properly selected test cases.
Jacob - 27 Mar 2006 21:29 GMT
> Also, I've never seen anything to indicate that random tests are any
> more likely to uncover a fault than properly selected test cases.

"Properly selected" is fine. If you miss some of those (there may
be MANY remember), the random cases *may* catch them.

That's it. You are not supposed to replace any of the good stuff
you are already doing. It's just a simple tool for making the whole
package even better.
Roedy Green - 27 Mar 2006 18:24 GMT
On Mon, 27 Mar 2006 07:41:54 -0800, "Adam Maass"
<adam.nospam.maass@comcast.net> wrote, quoted or indirectly quoted
someone who said :

>In unit testing, you want to select several typical
>inputs, as well as boundary and out-of-range inputs.

a term you will also hear is "corner cases".
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Jacob - 27 Mar 2006 21:09 GMT
> In unit testing, you want to select several typical
> inputs, as well as boundary and out-of-range inputs. This is sufficient to
> obtain a general sense that the code is correct for the general case. It
> also requires the test-writer to /think/ about what the boundary conditions
> are. There may be several of these, at many points in the domain.

You describe an ideal world where the unit test writer thinks
of every possible scenario beforehand. In such a regime you don't
need unit testing in the first place.

My experience is that you tend to "forget" certain scenarios
when you write the code, and then "forget" the exact same cases
in the test. The result is a test that works fine in normal cases,
but fails to reveal the flaw in the code for the not-so-normal
cases. This is a useless and costly exercise. Random inputs may
cover some of the cases that were forgotten in this process.

> My objection to random inputs is that unit-tests must be 100% repeatable for
> every run of the test suite. I don't ever want to see a failure of a unit
> test that doesn't reappear on the next run of the suite unless something
> significant -- either the test case or the code under test -- has changed.

If I have a flaw in my code I'd be happier with a test that
indicates this *sometime* rather than *never*. Of course *always*
is even better, but then we're back to Utopia.

BTW: You can achieve repeatability by specifying the random
seed in the test setup. My personal approach is of course to seed
with a maximum of randomness (using current time millis :-)

> Note too that unit-testing is not black-box testing. Good unit tests usually
> have pretty good knowledge of the underlying algorithm under test.

Again you add definition to unit testing without further reference. Unit
testing is *in practice* white-box testing since the tests are normally
written by the target code developer, but it is actually beneficial to
treat it as a black-box test: Look at the class from the public API,
consider the requirements, and then try to tear it apart without thinking
too much about the code internals. This is at least my personal approach
when writing unit tests for my own code.
Noah Roberts - 27 Mar 2006 21:15 GMT
> > In unit testing, you want to select several typical
> > inputs, as well as boundary and out-of-range inputs. This is sufficient to
[quoted text clipped - 5 lines]
> of every possible scenario beforehand. In such a regime you don't
> need unit testing in the first place.

Sure you do.  Unit tests can stop a lot of bugs before they happen and
before tracking them down gets difficult.  The ones that remain mean
that you have to track them down as you normally would, write a test
for the condition that causes the bug to replicate, and then fix your
code until all tests pass.

This means that changes you make to the code later in refactoring or
adding features do not reintroduce bugs you have fixed before.  Think
about how many times you have fixed a bug only for it to turn up later
because of changes you or someone else made to the code.

> My experience is that you tend to "forget" certain scenarios
> when you write the code, and then "forget" the exact same cases
> in the test.

It helps to write the test first and to write the test independently of
the code in question.  For instance, my latest batch of additions to
our code base involved adding features that were available in a
different code base...one we are deprecating.  My tests simply verify
that the same inputs produce the same results, since at this time I
want the answers to be the same.  I chose those values randomly but I
put them in as static values in my tests.

Forgetting also matters in the way I described above: bugs reappear
after being fixed ages ago because you or someone else forgot
what caused them and put that problem back when altering the code.

> The result is a test that works fine in normal cases,
> but fails to reveal the flaw in the code for the not-so-normal
> cases. This is a useless and costly exercise. Random inputs may
> cover some of the cases that were forgotten in this process.

Random inputs are difficult to regenerate.  It might be beneficial to
initially create some random inputs but always put those as static
values in your test.  This may cover some forgotten conditions yet
remain predictable and traceable.  Remember, unit tests should be
completely automatic.

> > Note too that unit-testing is not black-box testing. Good unit tests usually
> > have pretty good knowledge of the underlying algorithm under test.
[quoted text clipped - 6 lines]
> too much about the code internals. This is at least my personal approach
> when writing unit tests for my own code.

Yes, that is how unit tests should be performed.  They don't test the
code, they test the interface to make sure the code conforms to that
interface and that the interface is what is needed.  They also serve to
document your code base fairly well.
Patricia Shanahan - 27 Mar 2006 21:46 GMT
...
> Random inputs are difficult to regenerate.

Whether or not pseudo-random inputs are difficult to regenerate depends
on the design of the test framework.

I suggest the following requirements:

1. Each pseudo-random test must support both an externally supplied seed
and a system time based seed.

2. The seed is part of the output on any pseudo-random test failure.

Given those properties, I think one can set up a test regime that gets
the benefits of random testing without the costs.

All tests in the regression test suite that is run for each code change
must be effectively non-random. That includes random tests bound to a
fixed seed. This is important, because any failure in this context
should be due to the most recent code change.

Running with system time seeds is an additional test activity. If it
finds an error, the first step towards a fix is to add the failing
test/seed combination to the regression test suite, so that it fails.
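
A minimal sketch of the two requirements, under assumptions: the system property name "test.seed" and the class and method names are invented here, not part of any framework.

```java
import java.util.Random;
import java.util.function.Consumer;

class SeededRandomTest {
    // Requirement 1: use an externally supplied seed if present
    // (for example -Dtest.seed=12345), otherwise fall back to system time.
    static long chooseSeed() {
        String supplied = System.getProperty("test.seed");
        return supplied != null ? Long.parseLong(supplied) : System.currentTimeMillis();
    }

    // Requirement 2: any failure of the test body reports the seed,
    // so the failing run can be repeated and pinned in the regression suite.
    static void runWithSeed(long seed, Consumer<Random> body) {
        try {
            body.accept(new Random(seed));
        } catch (AssertionError | RuntimeException e) {
            throw new AssertionError("failed with seed " + seed + ": " + e.getMessage(), e);
        }
    }
}
```

A regression-suite entry would then call runWithSeed() with a literal seed, while the exploratory run would pass chooseSeed().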

Whether the system time seed testing is considered "unit test" is a
matter of how "unit test" is defined.

Patricia
Roedy Green - 28 Mar 2006 02:58 GMT
>Running with system time seeds is an additional test activity. If it
>finds an error, the first step towards a fix is to add the failing
>test/seed combination to the regression test suite, so that it fails.

Good thinking. It would be so frustrating to discover an error you
can't reproduce.
Roedy Green - 27 Mar 2006 21:29 GMT
>My experience is that you tend to "forget" certain scenarios
>when you write the code, and then "forget" the exact same cases
>in the test. The result is a test that works fine in normal cases,
>but fails to reveal the flaw in the code for the not-so-normal
>cases. This is a useless and costly exercise. Random inputs may
>cover some of the cases that were forgotten in this process.

The other way to get coverage is to get some tests written by
people unfamiliar with the inner workings. They will test things that
"don't need" testing.

Andrew McDonagh - 27 Mar 2006 21:44 GMT
>> In unit testing, you want to select several typical inputs, as well as
>> boundary and out-of-range inputs. This is sufficient to obtain a
[quoted text clipped - 12 lines]
> cases. This is a useless and costly exercise. Random inputs may
> cover some of the cases that were forgotten in this process.

This is where TDD comes in.

If we write one test at a time:
  Write Just Enough Code to make the test pass.
    Refactor to improve the current state of the design.

then we are only writing code for tests we already have. The next test is
only needed if we need to code something or to strengthen the
corner-case tests of the code that we have just made.

This way - there is no forgetting.

To make this achievable, each test case (method) should:
 1) only test one aspect of the code
 2) have as few asserts as possible (1 being the best)
 3) be small (like any method) ~ 10 (or whatever your favourite number
is) lines of code
 4) be fast - the faster they run, the more we run them continuously,
the sooner we find problems
 5) not use/touch files, networks, dbs - these are slow compared to
in-memory fake data/objects.

>> My objection to random inputs is that unit-tests must be 100%
>> repeatable for every run of the test suite. I don't ever want to see a
[quoted text clipped - 9 lines]
> seed in the test setup. My personal approach is of course to seed
> with a maximum of randomness (using current time millis :-)

You might want to google 'seeding with time' to see why it's not a great
idea....  especially when unit tests are concerned.

>> Note too that unit-testing is not black-box testing. Good unit tests
>> usually have pretty good knowledge of the underlying algorithm under
[quoted text clipped - 7 lines]
> too much about the code internals. This is at least my personal approach
> when writing unit tests for my own code.

white box / black box... all the same really from a testing PoV. The
only difference is how tolerant the test case is of the code design
changing. White box: not terribly tolerant. Black box: tolerant.

With TDD, it's better to consider the unit tests to be 'Behavior
Specification Tests'.  They are validating that the specified Behavior
exists within the code under test. But each specification test is
specifying a small part of the code under test, as we have multiple
small test cases, not a few large test cases.

For example, we have a Calculator class that can Add, Subtract, Multiply &
Divide Integers.

So we'd have the following tests...

testAddingZeros()
testAddingPositiveNumbers()
testAddingNegativeNumbers()
testAddingNegativeWithPositiveNumbers()
testAddingPositiveWithNegativeNumbers();

testDividingByZero()
testDividingPositiveNumberByNegative()
....

I don't need to have tests for different values within the Integer range
within each test case, as I have separate test cases for the different
boundaries.  One benefit of having separate named test cases rather than
lumping them all in a single testAdd() method is that I can write Just
Enough code to make each test pass. However, the biggest benefit comes
later when I or someone else modifies the code and one or two named
test cases fail rather than a single test case.  Immediately - without
having to debug! - I can see what has broken.

"typing....  run all tests ... bang!
 ...
 testAddingNegativeWithPositiveNumbers() failed - expected -10, got -30)
"

I know I've broken the negative with Positive code somehow, but I also
know I Have Not broken any other conditions (testcases).

If all of those asserts were in one testAdd() method, then any asserts
after the one testing -10 + 20 would NOT be run, so I would not know if
I've broken anything else.
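
A framework-free sketch of that difference, with assumptions stated up front: the tiny runner, the expected values, and the way the add operation is passed in are all invented for illustration. Each named case runs on its own, so one failure does not hide the others.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;

class NamedAddTests {
    // Runs every named case independently and returns the failure messages.
    static List<String> runAll(IntBinaryOperator add) {
        Map<String, Runnable> tests = new LinkedHashMap<>();
        tests.put("testAddingZeros",
                () -> expect(0, add.applyAsInt(0, 0)));
        tests.put("testAddingPositiveNumbers",
                () -> expect(5, add.applyAsInt(2, 3)));
        tests.put("testAddingNegativeNumbers",
                () -> expect(-5, add.applyAsInt(-2, -3)));
        tests.put("testAddingNegativeWithPositiveNumbers",
                () -> expect(10, add.applyAsInt(-10, 20)));
        tests.put("testAddingPositiveWithNegativeNumbers",
                () -> expect(-10, add.applyAsInt(20, -30)));

        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Runnable> test : tests.entrySet()) {
            try {
                test.getValue().run();
            } catch (AssertionError e) {
                // Record the failure and keep going with the remaining cases.
                failures.add(test.getKey() + " failed - " + e.getMessage());
            }
        }
        return failures;
    }

    static void expect(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + ", got " + actual);
        }
    }
}
```

Passing a deliberately broken add operation that only mishandles a negative first operand with a positive second one produces exactly one named failure, and the other four cases still report as passing.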

This might seem like a small thing, but when your application has 1700
unit tests, it's so much easier to see what's happening quickly with this
approach.

Now each of these test cases may end up being the same apart from the
values passed to the Calc object and the expected output.

In that case I'd do one of two things:
1) refactor the tests to use a private helper method
  private void testWith(Integer num1, Integer num2, Integer expected)..

2) Apply the 'ParameterisedTestcase' pattern.
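
Both options boil down to one helper plus a table of values. A rough sketch, assuming plain ints instead of Integer and an add operation passed in as a parameter (the class name and the table are made up for illustration):

```java
import java.util.function.IntBinaryOperator;

class ParameterisedAddTest {
    // Each row is {num1, num2, expected}.
    static final int[][] CASES = {
        {0, 0, 0},
        {2, 3, 5},
        {-2, -3, -5},
        {-10, 20, 10},
        {20, -30, -10},
    };

    // The shared helper: one assertion, with the inputs in the failure message.
    static void testWith(IntBinaryOperator add, int num1, int num2, int expected) {
        int actual = add.applyAsInt(num1, num2);
        if (actual != expected) {
            throw new AssertionError(num1 + " + " + num2
                    + ": expected " + expected + ", got " + actual);
        }
    }

    // Runs every row through the helper and returns how many rows were checked.
    static int runAll(IntBinaryOperator add) {
        for (int[] c : CASES) {
            testWith(add, c[0], c[1], c[2]);
        }
        return CASES.length;
    }
}
```

The trade-off is that a plain loop like this stops at the first failing row, which is why test frameworks that support parameterised cases usually report each row as its own test.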

Andrew
Adam Maass - 28 Mar 2006 05:30 GMT
>> In unit testing, you want to select several typical inputs, as well as
>> boundary and out-of-range inputs. This is sufficient to obtain a general
[quoted text clipped - 5 lines]
> of every possible scenario beforehand. In such a regime you don't
> need unit testing in the first place.

Well, no. You still need the unit tests for regression testing purposes.
(Make a change; does the code still obey the contract on it as expressed by
its test regime? If a unit test fails, it means that the code no longer
meets its contract.)

Unit tests are also a really good /development/ aid, if you write the test
cases first. Express your preconditions and postconditions, then write the
code to make the pre- and post- conditions hold true. The test cases are
often easier to write than the code that implements the logic required by
them.

> My experience is that you tend to "forget" certain scenarios
> when you write the code, and then "forget" the exact same cases
> in the test. The result is a test that works fine in normal cases,
> but fails to reveal the flaw in the code for the not-so-normal
> cases. This is a useless and costly exercise. Random inputs may
> cover some of the cases that were forgotten in this process.

Which is why no test regime is complete if it relies solely on unit-testing.
You want to expend some effort exposing the code to novel inputs -- just to
see what happens. My argument is that these novel inputs do not belong in
/unit/ testing.

>> My objection to random inputs is that unit-tests must be 100% repeatable
>> for every run of the test suite. I don't ever want to see a failure of a
[quoted text clipped - 5 lines]
> indicates this *sometime* rather than *never*. Of course *always*
> is even better, but then we're back to Utopia.

See above. No testing regime is complete if it relies solely on unit tests.
By all means, run your code through random inputs if you think it will
discover failures. But do not make it a main feature of your unit test
suite, because a unit test must be 100% repeatable from run to run. (Else
how do you know that you've really fixed any failure you've discovered?)

If other kinds of testing show a failure, by all means add that case to your
unit test suite [when it makes sense] so that it doesn't happen again.
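
A sketch of what adding that case might look like. Midpoint and the specific values are hypothetical: assume some other round of testing found that the first version overflowed for large inputs, so the offending input is now hard-coded in a unit test.

```java
class Midpoint {
    // Hypothetical first version: a + b overflows int when both values are large.
    static int naive(int a, int b) {
        return (a + b) / 2;
    }

    // Repaired version: do the addition in long so it cannot overflow.
    static int of(int a, int b) {
        return (int) (((long) a + b) / 2);
    }
}

class MidpointRegressionTest {
    // The input that exposed the failure, pinned so every run exercises it.
    // With int arithmetic 2_000_000_000 + 2_000_000_000 wraps to a negative value.
    static void testMidpointOfLargeValuesDoesNotOverflow() {
        int actual = Midpoint.of(2_000_000_000, 2_000_000_000);
        if (actual != 2_000_000_000) {
            throw new AssertionError("expected 2000000000, got " + actual);
        }
    }

    public static void main(String[] args) {
        testMidpointOfLargeValuesDoesNotOverflow();
        System.out.println("regression case passes");
    }
}
```

Because the input is a constant, the test fails on every run against the old code and passes on every run against the fix.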

> BTW: You can achieve repeatability by specifying the random
> seed in the test setup. My personal approach is of course to seed
> with a maximum of randomness (using current time millis :-)

[Unimpressed.] Yes, you *could* do that. But another important feature of a
unit-test suite should be that it is easy to run, not requiring any special
setup. In short, it shouldn't require any parameters, and yet still be 100%
repeatable from run to run. That means hard-coded inputs.
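
For concreteness, this is roughly what fixing the seed in the test source (rather than passing it in as a parameter) would look like. SeededInputs, SEED and generate() are made-up names; the only library behaviour relied on is that java.util.Random produces the same sequence for the same seed.

```java
import java.util.Arrays;
import java.util.Random;

class SeededInputs {
    // Arbitrary constant chosen for the sketch; any fixed value behaves the same way.
    static final long SEED = 20060328L;

    // Generates 'count' pseudo-random inputs in the range 0..999 from the given seed.
    static int[] generate(long seed, int count) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(1000);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] firstRun = generate(SEED, 5);
        int[] secondRun = generate(SEED, 5);
        System.out.println("repeatable: " + Arrays.equals(firstRun, secondRun));
    }
}
```

The sequence is repeatable and needs no setup, but the actual input values still do not appear anywhere in the test source, which is the difference from hard-coded inputs.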

>> Note too that unit-testing is not black-box testing. Good unit tests
>> usually have pretty good knowledge of the underlying algorithm under
[quoted text clipped - 7 lines]
> too much about the code internals. This is at least my personal approach
> when writing unit tests for my own code.

My experience in many different organizations is that the QA teams expect
code to be unit-tested by the developers before being turned over to QA.
Developers writing unit tests means that the unit tests are white-box, of
necessity.

Story time! Consider your reaction to a failing test case.

"Gee, that's odd. The tests passed last time..."

"What's different this time?"

"Well, I just modified the file FooBar.java. The failure must have something
to do with the change I just made there."

"But the test case that is failing is called 'testBamBazzAdd1'. How could a
change to FooBar.java cause that case to fail?"

[Many hours later...]

"There is no possible way that FooBar.java has anything to do with the
failing test case."

"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1.
I wonder how that happened?"

"Well, let's fix the code to account for the novel input..."

[Make some changes, but do not add a new test case. The change doesn't
actually fix the error.]

"Well, that's a relief... the test suite now runs to completion without
error."

These are harried, busy developers working on a codebase that has thousands
of classes, and they're under the gun to get code out the door... they cut
corners here (bad developers!) but I think we can all relate to them.

Random inputs in a unit-test case can:

1. Mislead developers when a failure suddenly appears on novel inputs. If
they aren't working on the piece of code that the random inputs test, they
have to switch gears to understand what's going on;

2. Mislead developers into believing the code is actually fixed, when in
fact it is not, when the failure disappears on the next run of the test
suite.

3. Create an air of suspicion around the unit-test suite. (To make
errors go away, just run the suite multiple times until you get a run
without errors.)

-- Adam Maass
Jacob - 28 Mar 2006 16:43 GMT
> Story time! Consider your reaction to a failing test case.
>
[quoted text clipped - 23 lines]
> "Well, that's a relief... the test suite now runs to completion without
> error."

Given there is an error in the baseline I'd rather have a team
of developers tracing it for hours than having a test suite that
tells me that everything is OK.
Adam Maass - 30 Mar 2006 04:12 GMT
>> Story time! Consider your reaction to a failing test case.
>>
[quoted text clipped - 27 lines]
> of developers tracing it for hours than having a test suite that
> tells me that everything is OK.

One has to wonder about the failure in this scenario -- it is a novel input
generated by a randomness generator. If the failure were critical to the
operation of the system, (one hopes that) it would have been noted, and
probably fixed, in other, earlier test cycles. (Perhaps not a unit test...
maybe a system test run by QA.) Since this is a new failure that has not
been fixed in earlier cycles, the behavior of the system on these novel
inputs must not be that critical. If this is the case, I'd rather have my
developers finish the work they were doing on FooBar.java than trace the
failure in testBamBazzAdd1. (Of course, in a Utopian world, they would have
the time to do both.)

Ultimately, I'd like developers to be able to use a heuristic to determine
where to look for errors when a unit-test fails. That heuristic is "The
error is almost certainly caused by some delta in the code since the last
time you ran the test suite." (Note that controlling the size of the deltas
is an issue, which is why we get recommendations to make the test suite easy
and fast to run -- so that developers aren't afraid to run the suite very
frequently.)

If the unit-test suite also contains some randomly generated inputs, then
there are two heuristics that the developers must apply to determine where
the failure is:

1. "The error could be caused by a delta in the code since the last time you
ran the test suite"; or
2. "The error could be caused by an input value the test suite has generated
that we've never seen before."

Deciding which of these cases applies complicates the task of the developer
when faced with a failure.

-- Adam Maass
Jacob - 30 Mar 2006 08:09 GMT
> 1. "The error could be caused by a delta in the code since the last time you
> ran the test suite"; or
[quoted text clipped - 3 lines]
> Deciding which of these cases applies complicates the task of the developer
> when faced with a failure.

If I add a test to your test suite that is able to reveal a flaw in your code,
you still don't want it because when it fails your developers will be confused
about what happened?

I am not sure I get it. You should all be happy you identified an error, shouldn't
you? The unit test failing should be pretty clear on what went wrong anyway.
Adam Maass - 30 Mar 2006 19:32 GMT
>> 1. "The error could be caused by a delta in the code since the last time
>> you ran the test suite"; or
[quoted text clipped - 9 lines]
> confused
> about what happened?

Let me clarify. I don't want it in the /unit/ test suite if it relies on
generation of random inputs, due to this confusion issue. If however, the
inputs are hard-coded, then the confusion issue does not apply, and I'd be
perfectly happy to have it in the unit test suite.

If there's a level of testing during which we generate random inputs to
improve the quality of the code, then that is where it belongs. If there
isn't this kind of testing already in the project, perhaps we ought to
start. It just doesn't belong in the /unit/ test suite.

> I am not sure I get it? You should all be happy you identified an error
> shouldn't you? The unit test failing should be pretty clear on what went
> wrong anyway.

Finding and fixing failures is, in general, a good thing, however it
happens. But a /unit/ test suite should give developers a really good idea
of where any failure originates from, and having to decide whether a failure
is due to a delta in the code under test or a novel input just overly
complicates a /unit/ test suite. The confusion issue is especially of
concern if a failure on one run of the suite simply disappears on the next
run because it didn't generate a set of inputs that causes the code to fail.
[If I saw a unit test suite with this behavior, I wouldn't have much
confidence in the value of passing all the tests -- because the next run
could just as easily produce a failure as a pass.]

Note too that there are some failures that are acceptable to tolerate, even
in shipping product. (Perhaps: It's an obscure corner case that no-one ever
actually encounters in production. It's in some subsystem that hardly anyone
uses. Or a variety of other justifications...) The critical cases should be
covered by hard-coded inputs. That leaves the non-critical cases -- and if
something non-critical fails, then it should be fixed but perhaps there are
more important things to do before it gets fixed.

-- Adam Maass
Ed Kirwan - 28 Mar 2006 08:36 GMT
> My experience is that you tend to "forget" certain scenarios
> when you write the code, and then "forget" the exact same cases
> in the test. The result is a test that works fine in normal cases,
> but fails to reveal the flaw in the code for the not-so-normal
> cases. This is a useless and costly exercise.

An observation; not written in stone; a subjective view.

Ignoring TDD, no unit test ever has and no unit test ever will verify a
requirement or testify to completeness of behaviour. You seem to think
that unit testing is to help find all possible inputs for a given
behaviour; I don't think this is true.

Unit tests are regression tests.

When you introduce new feature X in iteration 5, you write unit tests
to show some confidence that the feature works; you're not guaranteeing
it works for any subset, or for the entire range, of input
possibilities. You could easily have a flaw in the program that gives
the correct output for a given input, but for entirely the wrong reason,
as would be apparent if you used input+1; but you didn't. The unit tests
you write in iteration 5 are, in fact, a cost without a return*.
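
A tiny, made-up example of the "right answer for the wrong reason" case. Squarer is hypothetical and deliberately wrong: it doubles its argument instead of squaring it, yet the one test written for it passes.

```java
// Hypothetical flawed implementation: doubles the argument instead of squaring it.
class Squarer {
    static int square(int n) {
        return n + n;
    }
}

class SquarerTest {
    // The test that was written: passes, because 2 + 2 happens to equal 2 * 2.
    static boolean squareOfTwoIsFour() {
        return Squarer.square(2) == 4;
    }

    // The test that was not written: input + 1 exposes the flaw.
    static boolean squareOfThreeIsNine() {
        return Squarer.square(3) == 9;
    }

    public static void main(String[] args) {
        System.out.println("square(2) == 4: " + squareOfTwoIsFour());
        System.out.println("square(3) == 9: " + squareOfThreeIsNine());
    }
}
```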

When you introduce feature Y in iteration 6 is when you see the returns
for your iteration 5 unit tests. When you run these again, and they
all pass, then you know that whatever you did in iteration 6 didn't
break those parts of iteration 5 that were seen to run before. But they still
don't guarantee that feature X is fully tested. If you missed a test in
iteration 5, then re-running the tests in iteration 6 won't help. And
you could still have that bug from iteration 5. Unit testing will never
uncover it. All the tests do is show that whatever you did in iteration 6
didn't change much.