State of SRFI-64

2024-08-08

I recently started a collection of various utilities and procedures named guile-wolfsden, in the spirit of guile-lib. And, being the good little programmer I am, I wanted to write some tests. Guile already bundles an implementation of SRFI-64, great! I might as well not reinvent the wheel and just use it.

At the time I needed test-error, and the test passed. All was well. But the little devil on my shoulder decided that everyone's code (mine included) is garbage and that I should verify the tests actually test the right thing.

Imagine my surprise when I realized that this code passes:

(test-error 'foo (throw 'bar))

Ugh. Despite the specification saying that the meaning of error-type is implementation-defined, I am pretty sure 'foo should not match 'bar. And indeed, in the source code we see:

;; TODO: decide how to specify expected error types for Guile.

Well, that is not a good start. Signaling an error to indicate that anything except #t is not implemented would be a reasonable way to handle this, but just deciding that anything matches everything is bad.
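For comparison, here is a minimal sketch of the stricter behavior in Guile. error-type-matches? is a hypothetical helper, not part of SRFI-64 or of Guile's (srfi srfi-64) module. It treats #t as "any error", compares a symbol against the key passed to throw, and refuses every other error-type instead of quietly matching it.

```scheme
;; Hypothetical helper, not part of SRFI-64 or Guile's (srfi srfi-64).
;; Runs THUNK and reports whether it raised an error matching ERROR-TYPE.
(define (error-type-matches? error-type thunk)
  (cond
   ((not (or (eq? error-type #t) (symbol? error-type)))
    ;; Refuse unsupported error-types instead of treating them as wildcards.
    (error "error-type not supported:" error-type))
   (else
    (catch #t
      (lambda () (thunk) #f)              ; no error raised: no match
      (lambda (key . args)
        ;; #t matches anything; a symbol must equal the throw key.
        (or (eq? error-type #t)
            (eq? error-type key)))))))

(error-type-matches? 'foo (lambda () (throw 'bar)))  ; => #f
(error-type-matches? 'bar (lambda () (throw 'bar)))  ; => #t
(error-type-matches? #t   (lambda () (throw 'bar)))  ; => #t
```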

I originally tried to fix Guile's version. Because it is portable (with cond-expands for many Schemes), the code is pretty hard to follow. Just moving the code from testing.scm to srfi-64.scm and manually expanding all cond-expands to Guile's branch leads to infinite loops due to definition order. It is very hard to work with. So I gave up after a few commits and decided it was time to reinvent this particular wheel.

How hard can it be, right?

Problems with the specification

Turns out it can be quite hard, mainly due to the quality of the specification.

Now, let me say that I appreciate the work everyone puts into the ecosystem. And I am sure I am wrong about at least some of the items below. And I am not saying I would have done a better job. These are meant as constructive criticism based on notes I took while writing the code. While SRFI-64 cannot be changed now, the feedback could be incorporated into a new SRFI, called, for example, A Revised Scheme API for test suites.

What does ... mean?

In multiple places the specification uses ..., however in some places it means 0 or more, and in others it means 1 or more.

For test-group and test-group-with-cleanup it means 1 or more. For test-apply it means 0 or more. For the others, who knows. I decided to follow the syntax-case rules (0 or more) when the specification does not say otherwise, but having one syntax pattern mean two different things in one document sucks.
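In syntax-rules (and syntax-case) patterns, x ... matches zero or more forms; requiring at least one form takes an explicit pattern variable in front of the ellipsis. A small illustration with two hypothetical macros:

```scheme
;; "x ..." in a pattern matches zero or more forms.
(define-syntax count-zero-or-more
  (syntax-rules ()
    ((_ x ...) (length '(x ...)))))

;; Requiring one or more forms needs an explicit first pattern variable.
(define-syntax count-one-or-more
  (syntax-rules ()
    ((_ x0 x ...) (length '(x0 x ...)))))

(count-zero-or-more)          ; => 0
(count-one-or-more a b c)     ; => 3
;; (count-one-or-more)        ; syntax error: no pattern matches
```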

Can test-group be top-level?

Let us assume we have a test file containing just the following code.

(test-group "x" #t)

Is that valid according to the SRFI? Here is what it says:

(test-group suite-name decl-or-expr ...)

Equivalent to:

(if (not (test-to-skip% suite-name))
  (dynamic-wind
    (lambda () (test-begin suite-name))
    (lambda () decl-or-expr ...)
    (lambda () (test-end suite-name))))

This is usually equivalent to executing the decl-or-exprs within the named test group. However, the entire group is skipped if it matched an active test-skip (see later). Also, the test-end is executed in case of an exception.

Of course, test-to-skip% is not defined anywhere else in the specification. I think a top-level test-group is allowed (Guile disagrees): if there is no test runner, there is no active skip list, hence the group cannot be on it and should run.

But an explicit statement would be nice.

test-on-final-simple

This procedure is listed as the default value of test-runner-on-final in test-runner-simple; however, it is missing from this section:

The default test-runner returned by test-runner-simple uses the following call-back functions:

(test-on-test-begin-simple runner)
(test-on-test-end-simple runner)
(test-on-group-begin-simple runner suite-name count)
(test-on-group-end-simple runner)
(test-on-bad-count-simple runner actual-count expected-count)
(test-on-bad-end-name-simple runner begin-name end-name)

You can call those if you want to write your own test-runner.

So the portable way to get it is not

test-on-final-simple

but

(test-runner-on-final (test-runner-simple))

Very succinct. And due to this sentence:

You can call those if you want to write your own test-runner.

I am not sure if I am even allowed to use it.

I think it is just missing from the enumeration, but it cannot really be added now, since compliant implementations are currently not required to provide it.
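A minimal sketch of that portable route, assuming a custom runner built on test-runner-null. make-quiet-runner is a hypothetical name; the final callback is pulled out of a throwaway simple runner because there is no portable name for it.

```scheme
(use-modules (srfi srfi-64))

;; Hypothetical runner: silent for individual tests, but it reuses the
;; simple runner's final summary. The only portable handle on that
;; callback is to read it back out of a freshly created simple runner.
(define (make-quiet-runner)
  (let ((runner (test-runner-null))
        (simple-on-final (test-runner-on-final (test-runner-simple))))
    (test-runner-on-final! runner simple-on-final)
    runner))

;; Usage: (test-runner-factory make-quiet-runner) before the first test-begin.
```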

test-passed?

This procedure checks whether the test was 'pass or 'xpass. But 'xpass is a kind of failure, since the test was expected to fail. I am pretty sure it should check 'pass and 'xfail.
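Under that reading, the check would look roughly like this. test-behaved-as-expected? is a hypothetical name, not a replacement provided by any implementation; it only uses the standard test-result-kind accessor.

```scheme
(use-modules (srfi srfi-64))

;; Hypothetical predicate: the test behaved as expected if it passed
;; ('pass) or failed while it was expected to fail ('xfail).
;; An unexpected pass ('xpass) counts as a problem.
(define (test-behaved-as-expected? runner)
  (and (memq (test-result-kind runner) '(pass xfail)) #t))
```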

test-runner-on-bad-end-name

Spot the problem:

Called from test-end (before the on-group-end-function is called) if a suite-name was specified, and it did not that the name in the matching test-begin.

This happens. This post likely has similar issues.

Can simple test runner signal errors?

The only description of simple test runner is:

Creates a new simple test-runner, that prints errors and a summary on the standard output port.

Is it allowed to signal errors? It does not sound like it to me. However, the reference implementation does signal an error in test-on-bad-end-name-simple. I think that is wrong.
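What I would expect instead is closer to the following sketch: a runner whose on-bad-end-name handler reports the mismatch and carries on. make-lenient-runner and bad-end-names are hypothetical names; the handler takes the runner and the two names, as listed in the specification, and the check only counts mismatches rather than relying on the order of the names.

```scheme
(use-modules (srfi srfi-64))

;; Mismatches seen so far (hypothetical global, for inspection).
(define bad-end-names '())

;; Hypothetical runner: report a mismatched test-end name, do not signal.
(define (make-lenient-runner)
  (let ((runner (test-runner-null)))
    (test-runner-on-bad-end-name! runner
      (lambda (runner begin-name end-name)
        (set! bad-end-names (cons (list begin-name end-name) bad-end-names))
        (display "test-end name mismatch: ")
        (write (list begin-name end-name))
        (newline)))
    runner))

(test-runner-factory make-lenient-runner)
(test-begin "outer")
(test-end "other")   ; does not match "outer"
```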

I think I have illustrated my points (and I am getting bored), so from here on it will be much more bullet-pointed.

When is on-group-end-function called?

This procedure is called by test-end, but is it called before or after the group is popped from the test-runner-group-stack? It is unspecified, so portable test runners cannot assume either way and need to carry their own group stack. That sucks.
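A sketch of such a self-managed stack, using only the standard callbacks: the runner keeps its own list of group names in its aux value (test-runner-aux-value!), so on-group-end always knows which group is ending. make-stack-runner and ended-groups are hypothetical names, and taking over the aux value is a design choice a real runner may not be able to afford.

```scheme
(use-modules (srfi srfi-64))

;; Names of groups in the order they ended (hypothetical, for inspection).
(define ended-groups '())

(define (make-stack-runner)
  (let ((runner (test-runner-null)))
    (test-runner-aux-value! runner '())          ; our own group stack
    (test-runner-on-group-begin! runner
      (lambda (runner suite-name count)
        ;; Push on every test-begin.
        (test-runner-aux-value! runner
          (cons suite-name (test-runner-aux-value runner)))))
    (test-runner-on-group-end! runner
      (lambda (runner)
        ;; The top of our stack is the ending group, whatever the
        ;; implementation did to test-runner-group-stack.
        (let ((stack (test-runner-aux-value runner)))
          (set! ended-groups (cons (car stack) ended-groups))
          (test-runner-aux-value! runner (cdr stack)))))
    runner))

(test-runner-factory make-stack-runner)
(test-begin "a")
(test-begin "b")
(test-end "b")
(test-end "a")
;; ended-groups => ("a" "b")
```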

Can test-runner-group-stack compute the result?

Emphasis mine:

(test-runner-group-stack runner)

A list of names of groups we're nested in, with the outermost group last. (This is more efficient than test-runner-group-path, since it doesn't require any copying.)

This effectively mandates that test-runner-group-stack must not compute its result. That is too limiting for the implementation and requires duplicating the information. In my version I have an internal test-runner-groups containing SRFI-9 records of the groups, and test-runner-group-stack containing just their names.

What I would like to do is this:

(define (test-runner-group-stack runner)
  (map group-name (test-runner-groups runner)))

But that would run afoul of the requirements.
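To stay within the requirement, the names have to be stored next to the records and kept in sync by hand. A sketch of what that duplication looks like, with hypothetical record types and helpers built on SRFI-9 (not the actual internals of any implementation):

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical internal representation: one record per active group.
(define-record-type <group>
  (make-group name expected-count)
  group?
  (name group-name)
  (expected-count group-expected-count))

;; Hypothetical per-runner state: the records, plus a second list with
;; only the names, so a stored list can be returned without computation.
(define-record-type <group-state>
  (make-group-state groups names)
  group-state?
  (groups group-state-groups set-group-state-groups!)
  (names group-state-names set-group-state-names!))

;; Every update has to touch both lists; this is the duplication.
(define (push-group! state name count)
  (set-group-state-groups! state (cons (make-group name count)
                                       (group-state-groups state)))
  (set-group-state-names! state (cons name (group-state-names state))))

(define (pop-group! state)
  (set-group-state-groups! state (cdr (group-state-groups state)))
  (set-group-state-names! state (cdr (group-state-names state))))

(define state (make-group-state '() '()))
(push-group! state "outer" #f)
(push-group! state "inner" 2)
;; (group-state-names state) => ("inner" "outer"), outermost last
```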

Value of test-runner-test-name outside of tests

It is not specified what the value is supposed to be when we are not inside a test, for example inside a test-begin group but before the first test-... form. The ``obvious'' choice is to leave the last value in there, but since it is not mandated, you cannot rely on it.

No way to log skipped test groups

There is not much to say here. The specification allows skipping a whole test-group, but does not provide any way for the runner to log that event.

Should test hooks be called for skipped tests?

The on-test-begin-function is called at the start of an individual testcase, before the test expression (and expected value) are evaluated.

So if the test will be skipped, and therefore the test expression will *not* be evaluated, do I still need to call on-test-begin-function? I decided to call it, but both readings seem plausible.

What is ``a following non-nested test-end''?

The concept of nesting is not described anywhere, yet the specification uses these words:

Any skip specifiers introduced by a test-skip are removed by a following non-nested test-end.

I think it tries to say that any skip specifiers introduced during a test group are removed when it ends, but why not just say that?

(test-begin "x")
(test-skip 1)
(test-begin "y")
(test-end)       ; Is this nested relative to the test-skip?
(test-end)

The test-end is on the same level, so dropping the skip specifier could be considered justified. After all, we have test-group, which can introduce actual nesting.

(test-begin "x")
(test-skip 1)
(test-group "y"
  (test-begin "z")
  (test-end))    ; This is definitely "nested test-end".
(test-end)

Should test-apply always check skip and fail lists?

This is not fully specified. I believe that an implementation checking the skip and fail lists only for tests matching the run list would comply with the wording of the specification. However, because specifiers are stateful, it would lead to pure chaos regarding which tests are skipped. So I always check all three lists, except for test-group, where the run list should not be consulted (I think Guile does not handle this case correctly).

Preliminary result when on both skip and fail lists

If a test is matched by both the skip list and the fail list, should on-test-begin-function see 'xfail or 'skip? The specification is vague at best, but seems to lean towards 'xfail. Guile uses 'skip.

In what order should the fail and skip lists be checked?

The specification explicitly contains the concept of stateful specifiers. However, what happens when you put one specifier into two lists? What will happen in the following code?

(let ((s (test-match-nth 1)))
  (test-expect-fail s)
  (test-skip s)
  (test-assert #t))

The behavior seems to be unspecified, but if that is the case, it would be nice to say so.
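The reason the order matters: a specifier is a procedure called with the runner, and test-match-nth counts its calls. Whichever list consults the shared specifier first consumes the match. A small demonstration, calling the specifier directly the way a list check would; since test-match-nth only counts calls, a null runner is enough here.

```scheme
(use-modules (srfi srfi-64))

;; One stateful specifier: matches on its first call only.
(define s (test-match-nth 1))
(define r (test-runner-null))

;; Two checks for the same test, e.g. first the fail list, then the skip list.
(define first-check  (if (s r) 'match 'no-match))
(define second-check (if (s r) 'match 'no-match))
;; first-check => match, second-check => no-match
```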

Should fail list be cleared on end of the group?

The skip list is cleared at the end of the group that introduced the specifier (ignoring the above-mentioned issue about the meaning of ``non-nested''). The fail list has no such provision. The description of test-expect-fail does contain

[..] where matching is defined as in test-skip [..]

which is confusing at best due to its other implications. I think this is probably just a bug in the specification; however, I believe that, based on the current wording, a compliant implementation can forgo removing expected fails on test-end (and mine does).
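A test that tells the two readings apart could look like this sketch: the expected fail is introduced inside an inner group and never consumed there. make-counting-runner and final-counts are hypothetical names; the runner just records the fail and xfail counts when on-final runs. Either result seems compliant under the current wording.

```scheme
(use-modules (srfi srfi-64))

;; Hypothetical runner recording (fail-count xfail-count) at the very end.
(define final-counts #f)
(define (make-counting-runner)
  (let ((runner (test-runner-null)))
    (test-runner-on-final! runner
      (lambda (runner)
        (set! final-counts
              (list (test-runner-fail-count runner)
                    (test-runner-xfail-count runner)))))
    runner))

(test-runner-factory make-counting-runner)
(test-begin "outer")
(test-begin "inner")
(test-expect-fail 1)            ; introduced in "inner", not consumed there
(test-end "inner")
(test-assert "after-inner" #f)  ; a failing test back in "outer"
(test-end "outer")
;; final-counts is (1 0) if (test-end "inner") dropped the expected fail,
;; and (0 1) if it outlived its group.
```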

Only test-approximate mandates evaluating the arguments just once

test-assert and friends (test-error included) do not mandate that their arguments (expression, expected, test-expr, error-type) be evaluated just once. Since this requirement is explicitly specified for test-name (except in test-error, because of course it is inconsistent), it seems that an implementation evaluating them multiple times would be compliant.

This sucks. It makes writing portable tests hard in some cases.
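The portable workaround is to evaluate side-effecting expressions yourself and hand the test only a plain variable. A sketch, where next! is a hypothetical side-effecting procedure and test-runner-null is used only to keep the output quiet:

```scheme
(use-modules (srfi srfi-64))

;; Hypothetical side-effecting expression: each call returns the next integer.
(define calls 0)
(define (next!)
  (set! calls (+ calls 1))
  calls)

(test-runner-factory test-runner-null)   ; keep the example quiet
(test-begin "evaluate-once")
;; Bind the value first: the test only sees a variable, so the outcome does
;; not depend on how many times test-equal evaluates its arguments.
(let ((value (next!)))
  (test-equal "first call" 1 value))
(test-end "evaluate-once")
;; calls => 1
```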

Examples for test-skip are missing the test bodies

(test-skip "test-b")
(test-assert "test-a")   ;; executed
(test-assert "test-b")   ;; skipped

Yeah, no. With a single argument, "test-a" and "test-b" are the expression from test-assert's definition, not test-names.
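A corrected version passes an explicit test-name and a body. In this sketch the bodies record that they ran (mark! and ran are hypothetical helpers), which shows that only the body of "test-a" is evaluated:

```scheme
(use-modules (srfi srfi-64))

;; Hypothetical helpers: remember which test bodies were actually evaluated.
(define ran '())
(define (mark! name)
  (set! ran (cons name ran))
  #t)

(test-runner-factory test-runner-null)   ; keep the example quiet
(test-begin "skip-example")
(test-skip "test-b")
(test-assert "test-a" (mark! "test-a"))  ;; executed
(test-assert "test-b" (mark! "test-b"))  ;; skipped
(test-end "skip-example")
;; ran => ("test-a")
```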

``executed'' versus ``executed or skipped''

test-begin talks about the count of executed tests, but on-bad-count-function talks about the count of executed or skipped tests. Which is right?

test-runner-xfail-count

Returns the number of tests that failed, and were expected to pass.

This is another point where my implementation is not fully compliant. The specification is very clear regarding what it mandates, but in this case I just could not bring myself to implement it as written.

I am pretty sure the sentence should end with ``and were expected to fail.'' This has to be a copy-and-paste error, right?

test-group-with-cleanup does not mandate a test group

If an implementation decided not to enter a new test group when test-group-with-cleanup is invoked, it would be fully compliant. While it does take a suite-name argument, the specification does not say anywhere how, or whether, it is used. So it does not have to be.

The same goes for the skip list. While test-group can be skipped, for test-group-with-cleanup that is not required.

The natural implementation is to wrap test-group with dynamic-wind, but strictly speaking that is probably not allowed.

Yet again I deviated from the standard here. While the standard is fairly clear, the prescribed behavior is just flat-out confusing.
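For reference, the natural implementation mentioned above could look like this sketch. my-test-group-with-cleanup is a hypothetical macro, not the specification's definition: it enters a real test-group (so suite-name and the skip list are honored) and runs cleanup-form through dynamic-wind, so it also runs if the group exits non-locally.

```scheme
(use-modules (srfi srfi-64))

;; Hypothetical macro: a real test group, wrapped in dynamic-wind.
(define-syntax my-test-group-with-cleanup
  (syntax-rules ()
    ((_ suite-name decl-or-expr ... cleanup-form)
     (dynamic-wind
       (lambda () #f)                                        ; nothing to set up
       (lambda () (test-group suite-name decl-or-expr ...))  ; the group itself
       (lambda () cleanup-form)))))                          ; always runs

(define body-ran? #f)
(define cleaned-up? #f)

(test-runner-factory test-runner-null)   ; keep the example quiet
(test-begin "outer")
(my-test-group-with-cleanup "with-cleanup"
  (test-assert "inside" (begin (set! body-ran? #t) #t))
  (set! cleaned-up? #t))
(test-end "outer")
;; body-ran? => #t, cleaned-up? => #t
```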

Does test-skip take integers?

The section dedicated to specifiers describes two kinds of convenience shorthands (name and count). However, test-skip mentions just the string variant. Since it seems to be special-cased, does that mean an integer is not required to work here? Otherwise, why would the string be singled out like this?
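A portable test can sidestep the question by spelling the count out as a specifier: (test-match-nth 1 count) matches the next count tests, which is what the count shorthand describes. A sketch, with hypothetical mark! and ran helpers to show which body runs:

```scheme
(use-modules (srfi srfi-64))

;; Hypothetical helpers: remember which test bodies were actually evaluated.
(define ran '())
(define (mark! name)
  (set! ran (cons name ran))
  #t)

(test-runner-factory test-runner-null)   ; keep the example quiet
(test-begin "explicit-specifier")
;; Instead of (test-skip 1), pass the equivalent specifier explicitly.
(test-skip (test-match-nth 1 1))
(test-assert "first" (mark! "first"))    ;; skipped
(test-assert "second" (mark! "second"))  ;; executed
(test-end "explicit-specifier")
;; ran => ("second")
```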

Problems with Guile's (== reference) implementation

There are a few things I find scary.

How many low-hanging bugs there are. Most of the bug reports are not the result of careful code review, but of reading the specification and writing tests based on it (I have 334 of those tests, although that count includes tests for non-portable specifics of my implementation).

How long the bugs have been there. It seems that Guile pretty much just took the reference implementation and wrapped it in a module. That would mean the majority of the bugs are nearly 20 years old (SRFI-64 was finalized on 2006-06-18). The old quote comes to mind:

There are only two kinds of languages: the ones people complain about and the ones nobody uses.

Bjarne Stroustrup

I know that people do use Scheme. So why does no one complain?

Here is the list of the bugs I reported. There are more things wrong, but I have a feeling no one will be fixing even these, so I cannot justify the effort to report the rest.

Conclusion

Not sure how to wrap this up. I spent way too much time on this, and I still do not have a fully compliant implementation, because it would just suck.

I wonder how the specification was written at the time. I cannot shake the feeling that the reference implementation came first, and the spec was written only after the fact. That would explain the omissions.

Maybe it should be required to have a different person write an independent implementation during the approval process? I am pretty sure it would catch most of the things I struggled with.