Using GNU Shepherd timers as crons

2025-01-26

With the 1.0.0 release of GNU Shepherd, a new timer service is now available. Since the intention seems to be to replace mcron with timer services, let us take a look at how I went about it in my configuration.

The NEWS describes the new timer functionality this way:

The timer service provides functionality similar to the venerable at command [..]
[..]
In the coming weeks, we will most likely gradually move service definitions in Guix from mcron to timed services and similarly replace Rottlog and syslogd. This should be an improvement for Guix users and system administrators!
Ludovic Courtès

For a long time I had a note in my configuration file to add a cronjob to scrub my BTRFS root file-system, and this week I finally decided to take a stab at it. I tried using mcron, but the package in GNU Guix is buggy (#75819) and mcron itself seems to have some issues (Crash When Running Simultaneous Jobs, Behaviour on laptops), so in the end I decided to give the Shepherd's timers a try (hence this post).

(╥﹏╥)

Side note: I do not understand why I always keep finding bugs. During this whole endeavor I hit two bugs in mcron (links above) and three in either Shepherd itself or its packaging (#75833, #75836, #75843). I get that software will always have bugs (I write plenty of them myself), but why is it always me finding them? Just once I would like to be able to change my configuration without having to file a bug report.

(╥﹏╥)

I gave myself a couple of goals. It needs to be easy and terse to add new scheduled tasks. And, if it is to replace cronjobs for me, it needs to be able to send emails with the output from the execution. Turns out neither of these is satisfied out of the box.

While it is simple to add a new scheduled task using simple-service, it is fairly verbose (see code below). And there is no built-in support for mailing the output of the job.

(simple-service
 name
 shepherd-root-service-type
 (list (shepherd-service
        (documentation "Do stuff at noon.")
        (provision (list 'nooner))
        (requirement '())
        (start #~(make-timer-constructor
                  (calendar-event #:hours '(12) #:minutes '(0))
                  (command (list "echo" "a" "b" "c"))))
        (stop #~(make-timer-destructor))
        (modules (cons '(shepherd service timer)
                       %default-modules))
        (actions (list (shepherd-action
                        (name 'trigger)
                        (documentation "Immediately do stuff.")
                        (procedure #~trigger-timer)))))))

The rest of this text presents the necessary scaffolding to simplify adding new jobs down to:

(cron-timer 'cron-guix-gc-verify
            "20 0 * * 0"
            (program-file/sh
             "guix-gc-verify"
             guix "/bin/guix gc --verify=contents"))

All the code below uses utility procedures from my channel and my personal library. Both are available under a free license, but they should not be hard to implement yourself if you want to avoid the dependency. invoke* builds on spawn to provide a better interface for running external processes, and program-file/sh is a hybrid between program-file and mixed-text-file. with-extension/guile-wolfsden exists only to work around #74532. So let us get building.
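For readers who want to avoid the dependency, here is a minimal sketch of what a program-file/sh lookalike could be. It is inferred only from how the helper is called in this post and is not the channel's actual implementation. The name sketch/program-file-sh and the use of bash-minimal as the shell are assumptions made for illustration.

```scheme
(use-modules (guix gexp)
             (gnu packages bash))

;; Hypothetical stand-in for program-file/sh; not the code from the channel.
;; PARTS is a mix of strings and file-like objects (packages and so on),
;; concatenated in the same spirit as mixed-text-file.
(define (sketch/program-file-sh name . parts)
  "Return a program file called NAME that runs the concatenation of PARTS
through 'sh -c'."
  (program-file
   name
   ;; File-like objects in PARTS are lowered to their store paths, so
   ;; string-append only ever sees strings when the script runs.
   #~(execl #$(file-append bash-minimal "/bin/sh")
            "sh" "-c" (string-append #$@parts))))
```

Because everything ends up in a single shell command line, pipes and "||:" in the arguments behave as they would in an interactive shell.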

First we need to think about how to send an email with the output. I took a quick look at how mcron does it and, with the newly gained inspiration, wrote the following procedure:

(define %sendmail "/run/setuid-programs/sendmail")

(define* (with-mail-out program mail-to mail-from mail-subject
                        #:optional (write-out #t))
  "Return a script to run the @var{program} and email the output.

The script tries to mimic cron semantics, so the email is sent only when there
is some output or (and this is my extension) the @var{program} exits with non-0
exit code.  The email is sent using @command{/run/setuid-programs/sendmail}, it
has to be present and working.

You can specify the addressee using @var{mail-to} and subject using
@var{mail-subject}.  Sender is configured using @var{mail-from}, you can pass
@code{#f} to fill it based on current user and hostname.

In addition to sending the email, the output from @var{program} is also written
to standard output.  You can suppress this by setting @var{write-out} to
@code{#f}.

The whole output is (currently) buffered in memory."
  (program-file
   "with-mail-out"
   (with-extension/guile-wolfsden
    #~(begin
        (use-modules (srfi srfi-71)
                     (rnrs bytevectors)
                     (rnrs io ports)
                     (ice-9 format)
                     (wolfsden sh))

        (let* ((port get-bv (open-bytevector-output-port))
               (code (invoke* #$program
                              #:> port #:2> port
                              #:raise-exception? #f))
               (data (get-bv)))
          (when (not (and (= 0 code)
                          (= 0 (bytevector-length data))))
            (let* ((io (pipe))
                   (in (car io))
                   (out (cdr io))

                   (mail-from (if #$mail-from
                                  #$mail-from
                                  (format #f "~a@~a"
                                          (or (false-if-exception
                                               (passwd:name (getpwuid (getuid))))
                                              (getuid))
                                          (gethostname))))

                   (pid (spawn #$%sendmail '(#$%sendmail "-t") #:input in)))
              (set-port-encoding! out "UTF-8")
              (format out "To: ~a~%From: ~a~%Subject: ~[~*~:;FAIL(~a): ~]~a~%~%"
                      #$mail-to
                      mail-from
                      code code
                      #$mail-subject)
              (put-bytevector out data)
              (close-port out)
              ;; Let us ignore errors.
              (waitpid pid)

              (when #$write-out
                (put-bytevector (current-output-port) data)))))))))

Fairly simple. We just return a program file which executes the passed-in PROGRAM and, if it produced some output or failed, mails the information to the specified address. I believe the "or failed" part is not normal cron behavior, but I like it. Same with writing the output to its own standard output. It helps with debugging on the node (since the output ends up in /var/log/messages), so why not.
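The densest part of the procedure is probably the format string for the Subject header, so here it is in isolation. This is a small standalone sketch; subject-line is a hypothetical helper name used only for this illustration. The exit code is passed twice: the first copy selects a clause of the ~[ conditional, the second is either skipped by ~* (clause 0, success) or printed inside FAIL(...) by the default clause after ~:;.

```scheme
(use-modules (ice-9 format))

;; Hypothetical helper, used only to demonstrate the directive.
(define (subject-line code subject)
  (format #f "Subject: ~[~*~:;FAIL(~a): ~]~a" code code subject))

(display (subject-line 0 "cron-btrfs-scrub-root"))
(newline)
;; prints: Subject: cron-btrfs-scrub-root

(display (subject-line 2 "cron-btrfs-scrub-root"))
(newline)
;; prints: Subject: FAIL(2): cron-btrfs-scrub-root
```

Note that the (ice-9 format) module has to be loaded; the default format in Guile is simple-format, which does not understand these directives.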

Next we need something to generate the shepherd service for us. The shepherd service can be used in both root and user shepherds, so separating it into a standalone procedure makes sense. The procedure takes a few arguments and returns a new instance of shepherd-service, which uses the with-mail-out procedure introduced above to run the PROGRAM. The schedule can be specified using the Vixie cron syntax (and in a few other ways).

(define* (make-cron-like-service name schedule program
                                 #:key
                                 documentation
                                 (requirement '())
                                 (mail-to "root")
                                 mail-from
                                 (mail-subject (symbol->string name)))
  "Return a shepherd timer service based on the arguments.

@var{name} is the name of the service (used in @code{provision} field).
@var{program} can be either a string (shell snippet) or a file-like object.

@var{schedule} determines the schedule for the timer.  It can be a string, in
which case it shall be a Vixie cron compatible schedule.  It can be a list with
a keyword as its first member, in which case it is passed to the
@code{calendar-event} procedure.  Any other list is passed through as it is.

@var{documentation} and @var{requirement} are just passed to the
@code{shepherd-service}.  For the various @var{mail-*} arguments, see
documentation for @code{with-mail-out}."
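  (let ((event (match schedule
                 ((? string? vixie-spec)
                  #~(cron-string->calendar-event #$vixie-spec))
                 ;; List starting with a keyword: arguments for
                 ;; calendar-event.  KW has to be ungexp'd as well,
                 ;; otherwise it is an unbound variable in the gexp.
                 (((? keyword? kw) . rest)
                  #~(calendar-event #$kw #$@rest))
                 ;; Any other list: use it as the event expression as it is.
                 ((? list? lst)
                  lst)))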
        (program (if (string? program)
                     (program-file/sh (symbol->string name) program)
                     program)))
    (shepherd-service
     (documentation documentation)
     (provision (list name))
     (requirement requirement)
     (start #~(make-timer-constructor
               #$event
               (command (list #$(with-mail-out program
                                               mail-to
                                               mail-from
                                               mail-subject)))))
     (stop #~(make-timer-destructor))
     (modules (cons '(shepherd service timer)
                    %default-modules))
     (actions (list (shepherd-action
                     (name 'trigger)
                     (documentation "Execute right now.")
                     (procedure #~trigger-timer)))))))

The last (and simplest) piece of the puzzle is just a wrapper to make producing simple-service instances easier:

(define (cron-timer name . rest)
  "Return a simple service adding a shepherd timer.

See @code{make-cron-like-service} for the arguments."
  (simple-service
   name
   shepherd-root-service-type
   (list
    (apply make-cron-like-service name rest))))
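Since make-cron-like-service returns a plain shepherd-service, the same wrapper idea should carry over to a user shepherd managed by Guix Home. The following is an untested sketch under that assumption. home-cron-timer is a hypothetical name; home-shepherd-service-type comes from (gnu home services shepherd). It also assumes that the user's Shepherd is recent enough to have timers and that sendmail is available at the path used by with-mail-out.

```scheme
(use-modules (gnu services)
             (gnu home services shepherd))

;; Hypothetical Guix Home counterpart of cron-timer.  It relies on
;; make-cron-like-service as defined earlier in this post.
(define (home-cron-timer name . rest)
  "Return a simple service adding a timer to the user's shepherd.

See @code{make-cron-like-service} for the arguments."
  (simple-service
   name
   home-shepherd-service-type
   (list
    (apply make-cron-like-service name rest))))
```

The default mail-to of "root" is probably not what you want for a user timer, so passing #:mail-to explicitly seems wise there.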

With these helpers, adding new cron jobs is easy and quick. As a bonus, I will throw in the jobs I added to all my machines to check on the health of my BTRFS root partition.

;;; Define these helpers...
(define (script/btrfs-scrub path)
  "Return a script to scrub the @var{path}."
  (program-file/sh
   (string-append "btrfs-scrub-" (string-replace-substring path "/" "-"))
   btrfs-progs "/bin/btrfs scrub start -Bd " path))

(define (script/btrfs-error-counts path)
  "Return a script to check BTRFS error counts at @var{path}."
  (program-file/sh
   (string-append "btrfs-error-counts-" (string-replace-substring path "/" "-"))
   btrfs-progs "/bin/btrfs device stats " path
   " | " sed "/bin/sed '/^$/d'"
   " | " grep "/bin/grep -vE ' 0$'"
   " ||:"))

;;; ... and put these into your services.
(cron-timer 'cron-btrfs-scrub-root
            "0 0 * * 0"
            (script/btrfs-scrub "/"))
(cron-timer 'cron-btrfs-error-counts-root
            ;; minutes: */15 does not work: #75843
            ;; hours: * does not work: #75836
            "0-59/15 0-23 * * *"
            (script/btrfs-error-counts "/"))
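For comparison, the same quarter-hour schedule can presumably be expressed without a cron string at all, by calling calendar-event from (shepherd service timer) directly. This is a sketch based on a reading of the Shepherd manual, where fields left out match any value and seconds default to 0. every-quarter-hour is a hypothetical name.

```scheme
(use-modules (shepherd service timer))

;; Intended to match "0-59/15 0-23 * * *": minutes 0, 15, 30 and 45 of
;; every hour.  Hours, days and months are left out, which (as far as the
;; manual goes) means "any".
(define every-quarter-hour
  (calendar-event #:minutes '(0 15 30 45)))
```

This sidesteps the parsing of the cron string entirely, at the cost of a spelled-out list of minutes.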

One thing I do not like is how the output is presented in the Shepherd's log file. Output is associated with the command name from the store, which, honestly, is pretty useless. It looks like this:

Jan 25 20:19:00 localhost shepherd[1]: Timer 'outerr-1' spawned process 32137.
Jan 25 20:19:00 localhost shepherd[1]: [srhm85lyjpqid63i4xyl7ynmvc4c76xz-with-mail-out] a
Jan 25 20:19:00 localhost shepherd[1]: [srhm85lyjpqid63i4xyl7ynmvc4c76xz-with-mail-out] b
Jan 25 20:19:00 localhost shepherd[1]: Process 32137 of timer 'outerr-1' terminated with status 0 after 0 seconds.

If you have just a single timer, which finishes quickly, it is not that big of a problem, but I think something like the following would be better:

Jan 25 20:19:00 localhost shepherd[1]: Timer 'outerr-1' spawned process 32137.
Jan 25 20:19:00 localhost shepherd[1]: [outerr-1] a
Jan 25 20:19:00 localhost shepherd[1]: [outerr-1] b
Jan 25 20:19:00 localhost shepherd[1]: Process 32137 of timer 'outerr-1' terminated with status 0 after 0 seconds.

Alas, while not great, I can live with this.