State of SRFI-64
I recently started a collection of various utilities and procedures named guile-wolfsden, in the spirit of guile-lib. And, being
the good little programmer I am, I wanted to write some tests. Guile already
bundles an implementation of SRFI-64, great! I might as well not
reinvent the wheel and just use it.
At that time I needed test-error, and the test passed. All
was well. But the little devil on my shoulder decided that everyone's code (mine included)
is garbage and that I should verify the tests actually test the right thing.
Imagine my surprise when I realized this code passes:
(test-error 'foo (throw 'bar))
Ugh. Despite the specification claiming that error-type is
implementation-defined, I am pretty sure 'foo should not match 'bar. And indeed, in the source code we see:
;; TODO: decide how to specify expected error types for Guile.
Well, that is not a good start. Signaling an error to indicate that
anything except #t is not implemented would be a reasonable way
to handle this, but just deciding that anything matches everything is bad.
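A stricter check is not hard to sketch with Guile's own catch. The helper below is hypothetical (the name throws-key? is made up here and is not part of SRFI-64 or Guile), and it assumes that the error type is simply the key passed to throw:

```scheme
;; Hypothetical helper, not part of SRFI-64: returns #t only when THUNK
;; throws with exactly EXPECTED-KEY, and #f when it throws something
;; else or returns normally.
(define (throws-key? expected-key thunk)
  (catch #t
    (lambda ()
      (thunk)
      #f)                                ; no throw at all
    (lambda (key . args)
      (eq? key expected-key))))          ; compare the thrown key

(throws-key? 'bar (lambda () (throw 'bar)))  ; keys match
(throws-key? 'foo (lambda () (throw 'bar)))  ; keys differ, so no match
```

This only covers plain symbol keys. A real test-error would also have to decide what to do with #t, condition types and other kinds of error-type.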
I originally tried to fix Guile's version. Due to being portable
(with cond-expands for many Schemes), the code is pretty hard to
follow. Just moving the code from testing.scm to srfi-64.scm and manually expanding all cond-expands
using Guile's branch leads to infinite loops due to definition order. It is
very hard to work with. So I gave up after a few commits and decided it was time
to reinvent this particular wheel.
How hard can it be, right?
Problems with the specification
Turns out it can be quite hard. Mainly due to the quality of the specification.
Now, let me say that I appreciate the work everyone puts into the
ecosystem. And I am sure I am wrong in at least some of the items below. And I
am not saying I would have done a better job. These are meant as constructive
criticism based on notes I took while writing the code. While SRFI-64 cannot be
changed now, the feedback could be incorporated into a new SRFI, called for
example A Revised Scheme API for test suites.
What does ... mean?
In multiple places the specification uses ..., however in
some places it means 0 or more, and in some places it means 1 or more.
For test-group and test-group-with-cleanup it
means 1 or more. For test-apply it means 0 or more. For others,
who knows. I have decided to use the syntax-case rules (0 or more)
when the specification does not say otherwise, but having one syntax pattern mean
two different things in one document sucks.
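To make the difference concrete, here is a small sketch using syntax-rules, where the ellipsis follows the same 0-or-more convention. Both macro names are invented for this illustration:

```scheme
;; Hypothetical macros, only to contrast the two readings of "...".

;; "0 or more": the ellipsis may match nothing at all.
(define-syntax count-zero-or-more
  (syntax-rules ()
    ((_ expr ...)
     (length '(expr ...)))))

;; "1 or more": one mandatory element followed by "0 or more".
(define-syntax count-one-or-more
  (syntax-rules ()
    ((_ first rest ...)
     (length '(first rest ...)))))

(count-zero-or-more)         ; accepted, there are no elements
(count-one-or-more a b)      ; accepted, there are two elements
;; (count-one-or-more)       ; rejected at expansion time: no pattern matches
```

A specification that wants "1 or more" can spell it the second way and leave no room for guessing.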
Can test-group be top-level?
Let us assume we have a test file containing just the following code.
(test-group "x" #t)
Is that valid according to the SRFI? Here is what it says:
(test-group suite-name decl-or-expr ...)
Equivalent to:
(if (not (test-to-skip% suite-name))
  (dynamic-wind
    (lambda () (test-begin suite-name))
    (lambda () decl-or-expr ...)
    (lambda () (test-end suite-name))))
This is usually equivalent to executing the decl-or-exprs within the named test group. However, the entire group is skipped if it matched an active test-skip (see later). Also, the test-end is executed in case of an exception.
Of course test-to-skip% is not listed anywhere else. I
think it is allowed (Guile disagrees) due to the fact that if there is no test
runner, there is no active skip list, hence the group cannot be on it and should
run.
But an explicit statement would be nice.
test-on-final-simple
This procedure is listed as the default value of test-runner-on-final in test-runner-simple, however
it is missing from this section:
The default test-runner returned by test-runner-simple uses the following call-back functions:
(test-on-test-begin-simple runner)
(test-on-test-end-simple runner)
(test-on-group-begin-simple runner suite-name count)
(test-on-group-end-simple runner)
(test-on-bad-count-simple runner actual-count expected-count)
(test-on-bad-end-name-simple runner begin-name end-name)
You can call those if you want to write your own test-runner.
So the portable way to get it is not
test-on-final-simple
but
(test-runner-on-final (test-runner-simple))
Very succinct. And due to this sentence:
You can call those if you want to write your own test-runner.
I am not sure if I am even allowed to use it.
I think it is just missing from the enumeration, but it cannot really be added now since compliant implementations do not currently need to provide it.
test-passed?
This procedure checks whether the test was 'pass or 'xpass. But 'xpass is a kind of failure, since the
test was expected to fail. I am pretty sure it should check 'pass and 'xfail.
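The two readings differ only in one symbol, which a short sketch makes easy to see. Both predicates are hypothetical and work on bare result-kind symbols, not on a runner, so neither is the SRFI-64 test-passed? itself:

```scheme
;; Hypothetical predicates over SRFI-64 result kinds.

;; What the specification describes: 'pass or 'xpass.
(define (passed-as-specified? kind)
  (and (memq kind '(pass xpass)) #t))

;; The reading argued for above: the outcome matched the expectation.
(define (matched-expectation? kind)
  (and (memq kind '(pass xfail)) #t))

;; An unexpected pass counts as "passed" only under the first reading.
(list (passed-as-specified? 'xpass)
      (matched-expectation? 'xpass))
```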
test-runner-on-bad-end-name
Spot the problem:
Called from test-end (before the on-group-end-function is called) if a suite-name was specified, and it did not that the name in the matching test-begin.
This happens. This post likely has similar issues.
Can simple test runner signal errors?
The only description of the simple test runner is:
Creates a new simple test-runner, that prints errors and a summary on the standard output port.
Is it allowed to signal errors? It does not sound like it to me.
However, the reference implementation does signal an error in test-on-bad-end-name-simple. I think that is wrong.
I think I illustrated my points (and am getting bored), so from here on it will get much more bullet-pointed.
When is on-group-end-function called?
This procedure is called by test-end, but before or
after the group is popped from the test-runner-group-stack? It is unspecified, so portable test
runners cannot assume either way and need to carry their own group stack. That
sucks.
Can test-runner-group-stack compute the result?
Emphasis mine:
(test-runner-group-stack runner)
A list of names of groups we're nested in, with the outermost group last. (This is more efficient than test-runner-group-path, since it doesn't require any copying.)
This mandates that test-runner-group-stack must not
compute the result. That is too limiting on the implementation and requires
duplicating the information. In my version I have an internal test-runner-groups containing SRFI-9 records of groups, and
test-runner-group-stack containing just their names.
What I would like to do is this:
(define (test-runner-group-stack runner)
  (map group-name (test-runner-groups runner)))
But that would run afoul of the requirements.
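For reference, a self-contained sketch of the derived approach follows. The record types and accessor names (<group>, <sketch-runner>, sketch-runner-groups) are assumptions made up for this illustration and are not the actual guile-wolfsden definitions:

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical records; the field names are assumptions for this sketch.
(define-record-type <group>
  (make-group name)
  group?
  (name group-name))

(define-record-type <sketch-runner>
  (make-sketch-runner groups)
  sketch-runner?
  (groups sketch-runner-groups))

;; Derives the names on every call. MAP allocates a fresh list, which is
;; exactly the copying the quoted sentence rules out.
(define (sketch-runner-group-stack runner)
  (map group-name (sketch-runner-groups runner)))

;; Innermost group first, outermost group last.
(define sketch-runner-example
  (make-sketch-runner (list (make-group "inner") (make-group "outer"))))

(sketch-runner-group-stack sketch-runner-example)
```

The group records stay the single source of truth here, at the price of one short list allocation per call.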
Value of test-runner-test-name outside of tests
It is not specified what the value is supposed to be when not inside
test-begin or test-... . The "obvious" choice is to
leave the last value in there, but since it is not mandated, you cannot rely on
it.
No way to log skipped test groups
There is not much to say here. The specification allows skipping a whole
test-group, but does not provide any way for the runner to log that
event.
Should test hooks be called for skipped tests?
The on-test-begin-function is called at the start of an individual testcase, before the test expression (and expected value) are evaluated.
So if the test will be skipped, and therefore the test expression will
not be evaluated, do I still need to
call on-test-begin-function? I decided to call it, but both
readings seem plausible.
What is "a following non-nested test-end"?
The concept of nesting is not described anywhere, yet the specification uses these words:
Any skip specifiers introduced by a test-skip are removed by a following non-nested test-end.
I think it tries to say that any skip specifiers introduced during a test group are removed when it ends, but why not just say that?
(test-begin "x")
(test-skip 1)
(test-begin "y")
(test-end) ; Is this nested relative to the test-skip?
(test-end)
The test-end is on the same level, so dropping the skip
specifier could be considered justified. After all, we
have test-group, which can introduce actual nesting.
(test-begin "x")
(test-skip 1)
(test-group "y"
  (test-begin)
  (test-end)) ; This is definitely "nested test-end".
(test-end)
Should test-apply always check skip and fail lists?
This is not fully specified. I believe that an implementation checking the skip
and fail lists only for tests matching the run list would comply with the wording
in the specification. However, due to having stateful specifiers, it would lead
to pure chaos regarding which tests are skipped. So I always check all three
lists. Except for test-group, where the run list should not
be consulted (I think Guile does not handle this case correctly).
Preliminary result when on both skip and fail lists
If a test is matched by both the skip list and the fail list, should
on-test-begin-function see 'xfail or 'skip? The specification is vague at best, but seems to lean in
the direction of 'xfail. Guile does 'skip.
In what order should the fail and skip lists be checked?
The specification explicitly contains the concept of stateful specifiers. However, what happens when you put one specifier into two lists? What will happen in the following code?
(let ((s (test-match-nth 1)))
  (test-expect-fail s)
  (test-skip s)
  (test-assert #t))
The behavior seems to be unspecified, but if that is the case, it would be nice to say so.
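Why the order matters can be sketched without SRFI-64 at all. The closure below is a hypothetical, simplified stand-in for a stateful specifier: it takes no runner argument and just counts how many times it has been consulted.

```scheme
;; Hypothetical stand-in for a stateful specifier such as the one
;; returned by test-match-nth: it matches only the N-th time it is asked.
(define (make-nth-specifier n)
  (let ((calls 0))
    (lambda ()
      (set! calls (+ calls 1))
      (= calls n))))

;; One specifier shared by two lists, both consulted for the first test.
(define shared-specifier (make-nth-specifier 1))
(define first-list-check  (shared-specifier))   ; consumes call number 1
(define second-list-check (shared-specifier))   ; already call number 2

;; Only the list that happens to be checked first sees a match.
(list first-list-check second-list-check)
```

Under this assumption, whether the test ends up skipped or expected to fail depends purely on which list the implementation consults first.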
Should fail list be cleared on end of the group?
The skip list is cleared at the end of a group that introduced the
specifier (ignoring the above-mentioned issue about the meaning of "non-nested").
The fail list has no such provision. The description
of test-expect-fail does contain
[..] where matching is defined as in test-skip [..]
Which is confusing at best due to other implications. I think this is
probably just a bug in the specification, however I believe that, based on the current
wording, a compliant implementation can forgo the removal of expected fails
on test-end (and mine does).
Only test-approximate mandates evaluating the arguments just once
test-assert and friends (test-error included) do not
mandate their arguments (EXPRESSION, EXPECTED, TEST-EXPR, ERROR-TYPE) to be evaluated just once.
Since this requirement is explicitly specified
for TEST-NAME (except in test-error, because of course
it is inconsistent), it seems that an implementation evaluating them multiple times
would be compliant.
This sucks. It makes writing portable tests hard in some cases.
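Guaranteeing single evaluation in a macro is cheap, which makes the omission more annoying. A minimal sketch, with an invented macro name that is not the SRFI-64 test-equal and a counter added only to observe evaluations:

```scheme
;; Hypothetical macro: every argument is bound exactly once before use,
;; even though the bound value of EXPR appears twice in the body.
(define-syntax check-equal-once
  (syntax-rules ()
    ((_ name expected expr)
     (let ((n name)
           (e expected)
           (a expr))
       (list n (equal? e a) a)))))

;; Counts how many times an argument expression is evaluated.
(define evaluation-count 0)
(define (counted value)
  (set! evaluation-count (+ evaluation-count 1))
  value)

(check-equal-once (counted "sum") (counted 3) (counted (+ 1 2)))
evaluation-count   ; one evaluation per argument
```

A test with side effects in its arguments, such as reading from a port, behaves predictably only when the macro is written this way.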
Examples for test-skip are missing the test bodies
(test-skip "test-b")
(test-assert "test-a") ;; executed
(test-assert "test-b") ;; skipped
Yeah, no. "test-a" and "test-b" are EXPRESSIONs from test-assert's definition,
not TEST-NAMEs.
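What the example presumably meant to show is sketched below, with an explicit test body after each name. The group name "skip-demo" and the use of test-runner-null (to avoid log output) are choices made for this sketch, and given the implementation bugs listed later, the observed counts may depend on which SRFI-64 implementation runs it:

```scheme
(use-modules (srfi srfi-64))

;; A silent runner, so the sketch does not print or write a log file.
(define skip-demo-runner (test-runner-null))

(test-with-runner skip-demo-runner
  (test-begin "skip-demo")
  (test-skip "test-b")
  (test-assert "test-a" #t)   ; name "test-a", should be executed
  (test-assert "test-b" #t)   ; name "test-b", should be skipped
  (test-end "skip-demo"))

;; Number of passed and skipped tests recorded by the runner.
(list (test-runner-pass-count skip-demo-runner)
      (test-runner-skip-count skip-demo-runner))
```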
"executed" versus "executed or skipped"
test-begin talks about the count of executed
tests, but on-bad-count-function talks about the count of executed or skipped.
Which is right?
test-runner-xfail-count
Returns the number of tests that failed, and were expected to pass.
This is another point where my implementation is not fully compliant. The specification is very clear regarding what it mandates, but in this case I just could not bring myself to implement it as written.
I am pretty sure the sentence should end with "and were expected to fail." This has to be a copy&paste error, right?
test-group-with-cleanup does not mandate a test group
If an implementation decided not to enter a new test group when
test-group-with-cleanup is invoked, it would be fully compliant.
While it does take a SUITE-NAME argument, it does not say anywhere
how or if it is used. So it does not have to be.
The same goes for the skip list. While test-group can be skipped, for
test-group-with-cleanup that is not required.
The natural implementation is to wrap test-group with
dynamic-wind, but strictly speaking that is probably not allowed.
Yet again I deviated from the standard here. While the standard is fairly clear here, the prescribed behavior is just flat out confusing.
Does test-skip take integers?
The section dedicated to specifiers describes two kinds of convenience
short-hands (name and count). However, test-skip speaks just about
the string variant. Since it seems to be special-cased, does that mean that
an integer is not required here? Otherwise why would the string be singled out
like this?
Problems with Guile's (== reference) implementation
There are a few things I find scary.
How many low-hanging bugs there are. Most of the bug reports are not the result of careful code review, but of reading the specification and writing tests based on it (I have got 334 of them, albeit that includes tests for non-portable specifics of my implementation).
How long the bugs have been there. It seems that Guile pretty much just took the reference implementation and wrapped it in a module. That would mean the majority of the bugs are nearly 20 years old (SRFI-64 was finalized 2006-06-18). The old quote comes to mind.
There are only two kinds of languages: the ones people complain about and the ones nobody uses.
Bjarne Stroustrup
I know that people do use Scheme. So why does no one complain?
Here is the list of the bugs I reported. There are more things wrong, but I have a feeling no one will be fixing even these, so I cannot justify the effort to report the rest.
- test-end should not clear fail list
- test-runner-reset clobbers the run list
- test-with-runner requires some decl-or-expr
- test-result-remove fails to remove property
- test-end does not uninstall runner if on-final was modified
- test-approximate does not handle exceptions
- top-level test-group does not work
- test-eq evaluates test-name multiple times
- test-equal evaluates test-name multiple times
- test-eqv evaluates test-name multiple times
- test-apply does not accept convenience specifiers
- test-assert evaluates test-name multiple times
- test-approximate evaluates test-name multiple times
- test marked for skip and as expected failure has wrong result-kind in on-test-begin-function
- test-apply requires at least one specifier
- test-end fails to signal an error with null runner
- test-begin does not set test-runner-test-name
- nested group not counted as one test
- on-bad-end-name-function has swapped arguments
- test-on-bad-end-name-simple is not allowed to raise an exception
- test-error silently treats anything as #t
Conclusion
Not sure how to wrap this up. I spent way too much time on this, and I still do not have a fully compliant implementation, because it would just suck.
I wonder how the specification was written at the time. I cannot shake the feeling that the reference implementation came first, and the spec was written only after the fact. That would explain the omissions.
Maybe it should be required to have a different person write an independent implementation during the approval process? I am pretty sure it would catch most of the things I struggled with.