Functional Tests As A Tree Of Continuations

By Evan Miller

June 15, 2010

One of the most essential practices for maintaining the long-term quality of computer code is to write automated tests that ensure the program continues to act as expected, even when other people (including your future self) muck with it.

Test code is often longer than the code that is being tested. A former colleauge estimates that the right ratio is around 3 lines of functional test code for every line of "real" code. Writing test code is often mindless and repetitive, because all you do is trace out all the steps a user could take and write down all the things that should be true about each one.

A great deal of test code simply exists to set up the tests. Sometimes you can abstract out the logic into setup() and teardown() functions, and put the fake data needed for tests into things called fixtures. Even then, you'll often have a subset of tests that requires additional setup, and test code often becomes littered with miniature, unofficial setup functions.

The problem

It's a wonder to me that most testing environments are structured as lists of tests. Lists of tests are sensible for unit tests, but they're a disaster for functional tests. The list structure is one of the main reasons that functional test suites are so repetitive.

Let's say you want to test a 5-step process, and test the "Continue" and "Cancel" buttons at each step. You can only get to a step by pressing "Continue" in the previous step. With the traditional "list of tests" paradigm, you have to write something like


test_cancel_at_step_one() {
    Result = do_step_one("Cancel");
    assert_something(Result);
}

test_cancel_at_step_two() {
    Result = do_step_one("Continue");
    assert_something(Result);
    Result = do_step_two("Cancel");
    assert_something(Result);
}

test_cancel_at_step_three() {
    do_step_one("Continue"); # tested above
    Result = do_step_two("Continue");
    assert_something(Result);
    Result = do_step_three("Cancel");
    assert_something(Result);
}

test_cancel_at_step_four() {
    do_step_one("Continue"); # tested above
    do_step_two("Continue"); # tested above
    Result = do_step_three("Continue");
    assert_something(Result);
    Result = do_step_four("Cancel");
    assert_something(Result);
}

test_cancel_at_step_five() {
    do_step_one("Continue"); # tested above
    do_step_two("Continue"); # tested above
    do_step_three("Continue"); # tested above
    Result = do_step_four("Continue");
    assert_something(Result);
    Result = do_step_five("Cancel");
    assert_something(Result);
}

As you can start to see, the length of each test is growing linearly in the number of steps we're testing, so the length of the total test suite ends up being O(N2) in the number of steps.

The solution

A more appropriate data structure for functional tests is a testing tree. The tree essentially maps out the possible actions at each step. At each node, there is a set of assertions, and parent nodes pass a copy of state down to each child (representing a possible user action). Child nodes are free to modify and make assertions on the state received from the parent node, and pass a copy of the modified state down to its children. Nodes should not affect the state of parents or siblings.

Let's take a concrete example. In a 5-step process, the tree would look like:

Here, the first "Cancel" and "Continue to Step 2" are like parallel universes. Rather than repeating Step 1 to test each of these, we want to automatically make a copy of the universe at the end of Step 1, then run child tests on each parallel universe. If we can write our tests as a tree in this way, the length of the total test suite will be O(N) in the number of steps, rather than O(N2).

For modern web applications, all the state is stored in a database. Therefore to make a "copy of the universe," we just need a way to make a copy of the database to pass down to the child tests, while preserving older copies that tests further up the tree can copy and use.

The solution is to implement a stack of databases. As we walk down the testing tree, we push a copy of the current database onto the stack, and the child can play with the database at the top of the stack. After we've finished with a set of child nodes and ascend back up the testing tree, we pop the modified databases off the stack, returning to the previous database revisions.

An example

I won't go through the details of writing a testing framework or the database stack, but here's how you'd test a multi-step process with Chicago Boss's test framework. This is a tree implemented as nested callbacks in Erlang. Each "node" is an HTTP request with a list of callbacks that make assertions on the response, and a list of labeled continuation callbacks — these are the child nodes. Each child node receives a fresh database copy that it can thrash to its heart's content. Managing the stack of databases is all done under the hood.

The resulting test code is surprisingly elegant:


start() ->
  boss_test:get_request("/step1", [],
    [ % Three assertions on the response
      fun boss_assert:http_ok/1, 
      fun(Res) -> boss_assert:link_with_text("Continue", Res) end
      fun(Res) -> boss_assert:link_with_text("Cancel", Res) end
    ],
    [ % A list of two labeled continuations; each takes the previous response as the argument

      "Cancel at Step 1", % First continuation
      fun(Response1) -> 
          boss_test:follow_link("Cancel", Response1,
            [ fun boss_assert:http_ok/1 ], []) % One assertion, no continuations
      end,

      "Continue at Step 1", % Second continuation
      fun(Response1) ->
          boss_test:follow_link("Continue", Response1,
            [ fun boss_assert:http_ok/1 ], [ 

              "Cancel at Step 2", % Two more continuations
              fun(Response2) ->
                  boss_test:follow_link("Cancel", Response2,
                    [ fun boss_assert:http_ok/1 ], [])
              end,

              "Continue at Step 2",
              fun(Response2) ->
                  boss_test:follow_link("Continue", Response2,
                    [ fun boss_assert:http_ok/1 ], [ 

                      "Cancel at Step 3",
                      fun(Response3) ->
                          boss_test:follow_link("Cancel", Response3,
                            [ fun boss_assert:http_ok/1 ], [])
                      end,

                      "Continue at Step 3",
                      fun(Response3) ->
                          boss_test:follow_link("Continue", Response3,
                            [ fun boss_assert:http_ok/1 ], [ 

                              "Cancel at Step 4",
                              fun(Response4) ->
                                  boss_test:follow_link("Cancel", Response4,
                                    [ fun boss_assert:http_ok/1 ], [])
                              end,

                              "Continue at Step 4",
                              fun(Response4) ->
                                  boss_test:follow_link("Continue", Response4,
                                    [ fun boss_assert:http_ok/1 ], [])
                              end ]) end ]) end ]) end ]).

If the indentation ever gets out of hand, we can simply put the list of continuations into a new function.

Conclusion

There are several benefits to structuring functional tests as a tree of continuations:

Why haven't I been able to find anyone else using this approach? My guess is that all of the side effects of OO languages encourage a wrecking-ball mentality when it comes to unit tests — destroy all possible state after each test. But for functional tests with many steps, this approach is grossly inefficient — if you want to test every rung on a ladder, it's pointless to climb all the way down to the ground and trudge back up for each test.

To write your own testing framework based on continuation trees, all you need is a stack of databases (or rather, a database that supports rolling back to an arbitrary revision). I don't know what databases support this kind of revisioning functionality, but adding the feature to Chicago Boss's in-memory database took about 25 lines of Erlang code.

Once you start writing functional tests as a tree of assertions and continuations, you really will wonder how you did it any other way. It's just one of those ideas that seems too obvious in hindsight.

Back to Evan Miller's Home Page