Testing and Documentation Examples
Summary
A test document demonstrates aspects of a program’s correctness to the marker. When an assignment question requires test documentation, you must:
- devise a small number (4-6) of creative and deep tests, and
- explain why you chose these tests, what aspect of the program is being tested, and how you demonstrated that the program satisfies each test.
In the time allotted for an assignment, you cannot test your program completely. So the goal is to show you thought about a few important aspects of the program and then demonstrated their correctness.
Note that writing a script that tests your program on all possible inputs is not considered test documentation (and is impossible in most cases). Test documentation is supposed to make you think critically about your code and possibly even find mistakes that can be fixed.
Ground rules
- Testing, test documentation, and the designs thereof, are academic-integrity work, just like code and its design.
- You must test your program, not a provided sample solution. Submitting testing for any program not your own is an academic-integrity violation.
- If a test fails, just say so. Testing marks are available for good test cases, even if the program fails to handle them.
- Testing must cover behaviour that exists. Do not invent program output for the sake of having output to analyze.
- Limited code modification is allowed during testing. If you have to work around problems, or cannot produce the necessary control flow, make small code adjustments. Document them as part of the testing explanation and keep them easy to apply, e.g., guard each change with a preprocessor symbol (a sketch appears after this list):
  #ifdef TESTING_WORKAROUND_1
- Do not treat a provided sample solution as an authority on correctness. Base your correctness arguments on the assignment.
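A minimal sketch of such a guard (the symbol name and the forced condition are hypothetical, not taken from any assignment):

    #ifdef TESTING_WORKAROUND_1
    // testing only: pretend the input queue is full so the overflow path runs
    full = true;
    #endif // TESTING_WORKAROUND_1

Compiling with -DTESTING_WORKAROUND_1 enables the adjustment; omitting the flag rebuilds the unmodified program, so the change is easy to apply and remove.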
Two expected layers of testing
Layer 1: User Interface tests
In this course, the user interface is the program's command-line interface (CLI) and its input/output data. For example, the program checks its command-line arguments for validity:
$ ./pizzamachine asdf
Usage: ./pizzamachine [ npies (>= 0) | d [ maxtops (> 0) | d [ seed (> 0) | d [infile] ] ] ]
$ ./pizzamachine 100 5 d doesnt-exist.txt
error: could not read from doesnt-exist.txt
$ ./pizzamachine 100 5 d < one-pie.txt
pep gp mshr olv chv: made in 3 minutes
error: input ended before 100 pies read
A general CLI is provided that can be specialized for each assignment. User Interface testing does not change significantly from one assignment to the next, so much of your test design can be reused/improved.
Without good User Interface quality, you risk that we cannot drive the parts of your program that matter when we test it.
Since you are given a general CLI outline, only one test case is needed for the User Interface, consisting of several small program runs, as above. For example:
- Testing the program’s interface checks shell arguments.
- If data comes from an input file, test cases should cover a non-existent file, a file that cannot be opened for reading, an empty file, and possibly checking for invalid data (check the assignment specifications). Similar tests apply to an output file.
- If a program is supposed to treat standard input and an input file in the same way, show an example of both, and use a shell command like cmp or diff to demonstrate the results are the same (a sketch follows this list).
- Individual questions may exclude certain testing requirements (e.g., you may be told to assume input data is valid); if so, do not test that behaviour. Read the assignment's specification carefully.
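A minimal sketch of such a comparison, assuming a hypothetical program ./prog and input file data.txt:

    $ ./prog data.txt > out-file.txt 2>&1
    $ ./prog < data.txt > out-stdin.txt 2>&1
    $ diff out-file.txt out-stdin.txt
    $

No output from diff (and a zero exit status) demonstrates the two runs produced identical results; cmp can be used the same way.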
Layer 2: Algorithm tests
These tests handle assignment-specific definitions of correctness and are the other 4-5 test cases.
- Algorithm tests cover correctness or error cases: does the program go to the right places, does it compute the right results, does it detect error cases during a computation, does it handle interesting boundary cases.
- For each of these selected tests:
- Run your program in a way that exercises the case. You may subtly change the program to force a particular case to happen for testing, but it must be documented.
- Paste the smallest amount of the input/output into your test document to demonstrate what you are trying to achieve in the test.
- Explain how the input/output shows the program is doing the right thing, usually by analyzing the output (2-3 sentences is often sufficient).
- Your presentation of a case should state, or argue for, these elements:
- A situation that is important. Often requires discussing the program’s control flow.
- The fact that the program encounters this situation. Often requires output analysis.
- The fact that the program handles it as desired. Sometimes requires output analysis.
- Output analysis means either pointing out the existence of specific lines/cells, or describing a pattern that is respected across all lines/cells, within a range that is small enough to verify quickly. Use line numbers to connect quoted output with analyses; see man nl for a way to number the output (a sketch follows this list).
- Usually, include one small standard/complete run of the program to show basic end-to-end correctness, then focus on special cases.
- Your Algorithm testing mark is based on hitting a few, significantly different, issues that matter. It is not for "thinking of every case."
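A minimal sketch of line-numbering a run for analysis, with a hypothetical program and arguments:

    $ ./prog 20 5 7 > run.txt 2>&1
    $ nl -ba run.txt > run-numbered.txt

nl -ba numbers every line, including blank ones, so the line numbers quoted in your analysis match the saved output exactly; quote only the numbered lines your argument needs.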
Test Documentation Examples
Download a zip file with all of the sample documentation listed below.
- Hello World (SwapCase)
- Good, starter example of writing output analysis.
- Nothing to note about document structure or conventions because the case is so small.
- Program (swapcase.cc)
- Test documentation (test.txt)
- Full Sequential (FlexVec)
- Size is comparable to an assignment question.
- Good example of how much testing you need to give.
- Good example of an original presentation, in organizing a test’s formal inputs-outputs to accompany descriptive analysis. However, the User Interface of this question is unlike those you will write this term, being a stateful read-eval-print loop, with no uses of files or command-line arguments. The way the document interleaves inputs and outputs is good for this question, but is not a specific style that you will have reason to emulate.
- Good example of User Interface tests demonstrating robustness.
- Problem description (FlexVec.txt)
- Explanation of acceptable versus excellent solution (README.txt)
- Program
- Data
- Test documentation (acceptable/test.txt, excellent/test.txt)
- Accessible Concurrent (Telephone)
- A good example of an analysis/output cross-referencing style that you should be able to reuse.
- A small example with multiple threads. Should make sense to students with a CS350/co-op understanding of concurrency, given some checking in the μC++ manual.
- Note that A3--A5 make you deal with deeper issues of concurrent correctness than this example addresses, and A6 makes you apply the pattern of this example (plus some related patterns) at a much larger scale.
- Problem description (Telephone.txt)
- Explanation of acceptable versus excellent solution (README.txt)
- Program
- Test documentation (acceptable/test.txt, excellent/test.txt)
Appendix: Tips for going deeper
You would spend too long on testing if you followed an exhaustive interpretation of this section. Consult this section if you are feeling stuck about how to find good cases.
Consider Data vs Control Flow
Incorrect data occurs when the program’s memory has wrong values.
- User Interface testing shows a tolerance for consuming incorrect data. Algorithm testing shows (a step of) your core computation consuming barely correct data and/or not producing incorrect data.
- A datum can be as simple as a scalar variable, or as elusive as a set of coroutines reachable by a linked structure.
- Test a representative set of values for a datum. For example, if the valid input consists of the letters a to z, there is usually no need to test all values in that range. Similarly, there is no point in testing all invalid digits and punctuation characters, unless they have different behaviours.
- Always test border/boundary cases. For example, given a range of data values, test some values below/above, start/end, and within the range. Similarly, given a collection, test with it empty, single-filled, many-filled, full.
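A minimal sketch of choosing representative and boundary values, assuming a hypothetical validator that accepts only the letters a to z:

    #include <cassert>

    // hypothetical check: accept only lowercase letters 'a' through 'z'
    static bool validLetter( char c ) { return c >= 'a' && c <= 'z'; }

    int main() {
        assert( ! validLetter( '`' ) );   // just below the range
        assert( validLetter( 'a' ) );     // start of the range
        assert( validLetter( 'm' ) );     // one representative interior value
        assert( validLetter( 'z' ) );     // end of the range
        assert( ! validLetter( '{' ) );   // just above the range
    }

One interior value stands in for the whole range; the effort goes into the edges. The same idea drives the empty/single-filled/many-filled/full cases for a collection.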
Incorrect control flow occurs when the program transfers to the wrong location.
- User Interface testing is awkward to explain from this perspective. Algorithm testing shows that your program does the control flow that the assignment description spends the most time discussing.
- For example, [setup] if Alice has dropped off a package and Bob must pick it up, then you must consider whether it is possible for Eve to run next while expecting a package, and if so, [control-flow quality] Eve must choose not to pick up the package.
- The logical expressions of a conditional or looping statement must be tested to ensure correct control flow.
- The different forms of call/return also significantly affect control flow and must be tested.
- Communication of information at call/return must be tested.
Try to apply these perspectives together. Identify cases where different call-stack paths interact with a common datum. Maybe “most” of a collection gets consumed one way, during the core of a run, and leftover items need to be dismissed during shutdown. Then the zero-one-more heuristic applies to both sides of the split.
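A minimal sketch of such a split, using a hypothetical order queue; the interesting tests leave zero, one, or several items for the shutdown loop:

    #include <iostream>
    #include <queue>

    int main() {
        std::queue<int> orders;                    // hypothetical shared collection
        for ( int i = 1; i <= 5; i += 1 ) orders.push( i );

        while ( orders.size() > 2 ) {              // core of the run: serve most orders
            std::cout << "served " << orders.front() << std::endl;
            orders.pop();
        }
        while ( ! orders.empty() ) {               // shutdown: dismiss the leftovers
            std::cout << "refunded " << orders.front() << std::endl;
            orders.pop();
        }
    }

Both loops touch the same datum through different control-flow paths, so testing zero, one, and several leftover items covers both sides of the split.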
Consider Black- vs. White-box cases
Black-box coverage is an argument that analyzes a specification; white-box analyzes an implementation. Black-box coverage does not reveal many problems introduced by implementation complexity and white-box coverage does not reveal many unimplemented requirements.
- Plan to appeal to both sources, and do not make a big deal about distinguishing them.
- Try reading your code, picking a construct like an if-statement, and asking what would happen if you commented out lines so that the then-part always ran. Sometimes that program would still do the right thing. When would it do the wrong thing? Which inputs would it get wrong? (A sketch follows this list.)
- You can do the same analysis on the assignment description: pick a sentence and imagine you forgot to implement it. How could you observe that thing as missing?
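A minimal sketch of that thought experiment, using a hypothetical bulk-discount computation:

    #include <cassert>

    // hypothetical pricing rule, in cents: 10% off for orders of 10 or more pies
    static int priceCents( int npies, int baseCents ) {
        if ( npies >= 10 ) {                       // what if this then-part always ran?
            return npies * baseCents * 9 / 10;
        }
        return npies * baseCents;
    }

    int main() {
        assert( priceCents( 9, 400 ) == 3600 );    // just below the threshold: full price
        assert( priceCents( 10, 400 ) == 3600 );   // at the threshold: 10% off
    }

If the then-part always ran, small orders would still "work" but would be undercharged, so a good white-box case checks an order on each side of the threshold.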
Find the concurrent cases that aren't user-driven
Here is a suggestion for handling the apparent nondeterminism of concurrent programs, without having to force behaviours by changing your code. It is not applicable in A1--A2.
We call a concurrent program nondeterministic as a useful simplification of the fact that it takes inputs from the scheduler, whose consequences are impractical to predict. Some testing tools try to put the tester in control of those inputs; note that this starting point can still leave the "impractical to predict" part unsolved. You should not try to take control of those inputs here. Instead, take a guess-and-check approach, combined with output analysis.
- We set you up for success. The assignment questions prescribe a detailed output format that gives visibility into the influence of the thread scheduler. Use it to claim your program enters a test-worthy scenario, by analyzing the program’s output prior to the point in question.
- For cases/sections that need to show correctness under various schedules:
- Give the CLI arguments, including the RNG seed, as "the input" for a test case. Do not try to present the scheduler's decisions as input.
- Because of how the question forces you to use the RNG, your program's output is usually stable enough (from one execution to the next), for long enough (lines of output in a run), that your program's behaviour is practically repeatable given an RNG seed.
- You must describe more about how the output for that seed is interesting than you would if you controlled the input in detail.
- Such a test case should answer:
- What is the important situation being demonstrated? More specifically, now: what step happens, within what context?
- What line of output corresponds with actually doing the step?
- What outputs do you see before this point that show it happened in the interesting context?
- What outcome, after the step being tested, corresponds with, "Nothing bad happened?"
- For example:
Doc Section 3: Tests of application shutdown.
In all cases, the correctness criterion is that the program exits after printing a "Q" message, e.g. it does not deadlock.
Case 3.1: The closing bell rings while exactly one customer is on the waiting bench: Mid-size setup B with Seed = 25, output is excerpt #7. Customer 3 rings bell at line 137 while customer 2 is waiting (since line 122) and all other customers have printed done (lines 114, 125, 131).
Case 3.2: The closing bell rings while zero customers …
Cases …
- A process you could follow to produce scenario descriptions like the one above, having chosen an important situation:
- Decide what observable criteria could be helpful in showing that your program entered the desired context. In the example, "all other customers have printed done" is above-and-beyond the requirement that at most one of them is waiting, but it helps imply the requirement, and it is easy to observe.
- Go fishing with scenario sizes and RNG seeds until you see this evidence; a sketch follows this list. (Ensure your program then does the right thing.)
- Ensure that your program indeed does this scenario repeatably, given the seed.
- Write it up.
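A minimal sketch of the fishing step, assuming a hypothetical program ./prog that takes the RNG seed as its last argument and whose evidence is output lines containing "waiting":

    $ for seed in 1 2 3 4 5 6 7 8 9 10 ; do ./prog 20 5 ${seed} > run-${seed}.txt 2>&1 ; done
    $ grep -c "waiting" run-*.txt

grep -c reports how many matching lines each run produced; pick a seed whose run shows the desired situation, rerun it to confirm the output is repeatable, and quote that run in the write-up.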