Let's write a DSL

In the previous post I described an a small DSL to export graph data to .svg using Datalog queries. Here is a shorter version of the core example; with more self-explanatory names:

(select (g v svg)
  (.&edges ?x ?y (?x :path ?y)
    (.&draw-path :sw 1.75 :so 0.85))
  (.&segments ?x ?y (?x :stipple ?y)
    (prob 0.7 (.&draw-stipple 2.0 1.3)))
  (.&vertices ?x (?x _ _)
    (prob 0.1 (.&draw-circ 8f0 :fill :black))))

The main purpose of the DSL is to easily select either vertices, edges, or connected segments. Then draw each match. All selectors and drawing methods are prefixed with .&. For lack of a better word I have called these prefixed symbols "triggers".

Implementing the full DSL requires several other libraries, which makes it inconvenient as a tutorial. However, you probably recognize that iterating over a dataset and doing something to each match—depending on some condition—is a pretty common programming pattern. So much so that it might seem unnecessary to re-invent it. But I promise there is a pretty good reason, if you just bear with me for a bit.

Generatively distorted mesh rendered in Blender Cycles

What are we Doing?

In this post we will implement a simplified working version of the Common Lisp macro that makes this DSL work. Instead of selecting things from a graph we will select keys in a list of key value pairs based on the prefix of the key.

We already have a template above, but here is the smallest example that outlines the syntax we want to achieve:

(select l (.&itr "DE" (.&print)))

That is: for each item in list, l; where the key starts with "DE", print the matching item.

As we will see, the design of Lisp (in general) and CL (in particular) enables us to do this is in 35 lines of code. You can see the final code in this gist.

Reading Suggestions

This is not anything resembling a comprehensive intro to CL. But I will explain all the steps, so the overall approach should be understandable, even if you don't follow every detail in the code.

To avoid going on multiple tangents I will gloss over some technical details. And instead provide a number of smaller code examples as we go, to illustrate what each part of the code does, and why we need it.

For a much more detailed intro to Common Lisp I recommend Practical Common lisp by Peter Seibel. Large parts of the book are available online. The early early chapters cover syntax, types, semantics, and most importantly macros, in detail.

If you have more than a passing interest in meta programming I can recommend Let over Lambda by Doug Hoyte. Much of which is available online as well.

The code snippets below can be pasted into a REPL, if you want to experiment along the way. In which case I suggest installing SBCL. Which is a pretty solid CL implementation. SBCL can be installed from package mangers on Linux (apt-get install sbcl) and Mac (brew install sbcl). You can also download SBCL here.

3d photogrammetry from IR photos rendered in Blender Cycles, 2023

Symbolic Computing

Let's start with what is an inadequate, but hopefully not entirely incomprehensible intro to the parts of CL we will be using.

Comments in CL are prefixed with ;. In addition I will use ;; for comments that show what some code evaluates to. Eg. if you type it into the REPL.

Assuming you have some programming experience you are already used to using symbols to represent numbers, syntax, functions and variables. So much so that you might not even think about these as symbols at all.

If you type a symbol like xyz into a CL REPL it will be interpreted as a variable. Just as you might expect. And if the variable is not defined, you will get an error:

xyz ; error: the variable XYZ is unbound

If an S-expression starts with a symbol, that symbol is expected to be a function. For example:

(this is an s-expression) ; error: undefined function: THIS

Unlike in many laguages, however, symbols are first class citizens in CL. That is, symbols are distinct from the meaning, or value of the symbol. And as we will see shortly you can manipulate symbols directly. Consider this example:

; it is convention to use uppercase for
; something that has been evaluated
(let ((xyz 73))  xyz) ;; 73
(let ((xyz 73)) 'xyz) ;; XYZ

'xyz is actually a shorter way to write (quote xyz). quote is called a special operator in CL. All you need to know before we move on is that quote will return whatever was passed in without evaluating it.

(quote xyz) ;; XYZ
'xyz        ;; XYZ

This might not immediately seem useful, but we will see that it has important consequences.

Generatively distorted mesh rendered in Blender Cycles, 2023

List Processing

Now we need to look at some lists. This is Lisp after all. You can make lists like this:

(list 1 2 3)          ;; (1 2 3)
(list 1 2 (list 3 4)) ;; (1 2 (3 4))

Using quote we can write the following instead. It is a little more compact, and evaluates to the same lists:

'(1 2 3)     ;; (1 2 3)
'(1 2 (3 4)) ;; (1 2 (3 4))

We also need to access elements or parts of lists. The functions we will be using are car, cdr and subseq. The first two have unusual names. I refer you here for an explanation as to why. But here are examples of what they do:

(car '(1 2 3 4)) ;; 1
(cdr '(1 2 3 4)) ;; (2 3 4)

Notice how car and cdr compliment each other by returning the first element, and the rest of the list, respectively. We will see this pattern in use later on.

In general you can use subseq to get parts of a list:

(subseq '(1 2 3 4) 1)   ;; (2 3 4)
(subseq '(1 2 3 4) 1 3) ;; (2 3)

To get an element at an arbitrary index you can use (nth ind a), but we won't be needing it. We will however need to get substrings from a string. Fortunately subseq has us covered for that as well:

(subseq "abcd" 1 3) ;; "bc"

As mentioned in the introduction we want to iterate lists of key value pairs. For simplicity we will just use lists of lists. And one item is the list with a key and a value:

'((key1 val1)  ;; ((KEY1 VAL1)
  (key2 7))    ;;  (KEY2 7))

For convenience CL also has the backtick syntax: `. The backtick is a more sophisticated version of '. You can consider ` to behave the same as '. Except you can also use , to turn evaluation back on temporarily:

`(a 1 2 ,(+ 3 4) b) ;; (A 1 2 7 B)

Another very convenient feature of is that ,@ inside ` will splice in a list:

`(a 1 2 ,@(list 3 4) b) ;; (A 1 2 3 4 B)

Sometimes it is necessary to nest ', `, , and ,@ in more complicated ways to achieve the result you want. This gets confusing rather quickly. Experimenting in the REPL is an invaluable approach to figure out what is going on in more complex cases. Our use will be pretty straightforward, but we will see that the syntax is rather convenient for writing macros.

3d photogrammetry from IR photos rendered in Blender Cycles, 2023

Evaluating Code

So far we have seen quote used in a few different examples, but what about this:

'(list 1 2 3) ;; (LIST 1 2 3)

What we get back here looks suspiciously like the code we used initially to create our first list. So what if we try to pass this quoted code to CL's eval function?

(eval '(list 1 2 3)) ;; (1 2 3)

The quoted code evaluates to the same list as before!

We won't be using eval directly anywhere. But this ability to make S-expressions that can be interpreted as code is really the core of CL macros. We will see this soon.

Any S-expression can be interpreted as code. But whether the code works in the context where it is evaluated is another matter. Take this for example:

(eval '(fx 1 2 3)) ; error: undefined function FX

As noted before, the first symbol in an S-expression is expected to be a function. We have not defined fx. So we get an error.

Generatively distorted photogrammetry rendered in Blender Cycles, 2022

Some Utilities

CL comes out of the box with everything we need to write a DSL/macro in CL. But first we will define a few extra utility functions to help us out.

Since we will be looking for "triggers" in our code, we need a way to find them. We have already seen that code is represented as lists with symbols in them. If we convert symbols to strings, we can check the prefix of each string for ".&".

Here is a function that turns anything into a string:

; turn anything into a string
(defun mkstr (&rest args)
  (with-output-to-string (s)
    (dolist (a args) (princ a s))))

(mkstr (list 1 2 3))  ;; "(1 2 3)"     
(mkstr '(list 1 2 3)) ;; "(LIST 1 2 3)"
(mkstr 'xyz)          ;; "XYZ"         
(mkstr 'xyz 'abc)     ;; "XYZABC"

And here is a function to check if a string is a prefix of another string:

; returns t if "s" has this "prefix"; otherwise nil
(defun startswith? (s prefix)
  (let ((s (mkstr s)))
    (and (<= (length prefix) (length s))
         (string= prefix s :end2 (length prefix)))))

(startswith? "abc" "ab") ;; T  
(startswith? "abc" "de") ;; NIL

We will also need to convert strings back into symbols:

; turn anything into a symbol
(defun symb (&rest args)
  (values (intern (apply #'mkstr args))))

(symb "XYZ") ;; XYZ
(symb 'xyz ) ;; XYZ

Now we can make the following function that tells us whether a given s-expression is a trigger:

; return t if body is a list, where the
; first symbol starts with "trig"
(defun has-trigger? (body trig)
  (and (listp body)
       (startswith? (car body) trig)))

(has-trigger? '(.&hi 3 2) ".&") ;; T
(has-trigger? '(.&hello)  ".&") ;; T
(has-trigger? '(".&")     ".&") ;; NIL
(has-trigger? 'ohhai      ".&") ;; NIL

If you remember back to the beginning; what we set out to do was really just to iterate a list, then do something with items that match our condition. It seems like we have deviated a little from that? Surely there must already be a way to do this in CL?

Here is one way using the Loop macro that already exists in CL:

; l is a local variable in loop
(loop with l = '(.&abc def ghi)
      for s in l
      if (startswith? (mkstr s) ".&")
      do (print s))

The fact that a better—at least more comprehensive—version of our DSL already exists in CL will not deter us! In fact, this exact loop is the perfect template for the last bit of code we need. In the next section we combine everything.

Generatively distorted mesh rendered in Blender Cycles, 2023

Compiling our DSL

Now that we have all the parts, we need to put it together. The following functions do just that: The functions call each other recursively, so here are both of them:

(defun do-trigger (l body)
  ; body will only ever be on the format:
  ; (.&itr sel ...) or (.&fx ...)
  (let ((fx (subseq (mkstr (car body)) 2)))
    (if (equal "ITR" fx)
      ; if ITR: return the loop we saw earlier
      `(loop for $ in ,l
             if (startswith? (car $) ,(second body))
             do ,@(rec l (subseq body 2)))
      ; else: return (fx $ ...)
      `(,(symb fx) $ ,@(rec l (cdr body))))))

(defun rec (l body)
  (cond
    ; if body is an atom: return body
    ; [symbols, numbers, strings, t, nil are atoms]
    ((atom body) body)
    ; else if body is (.&trigger ...): call do-trigger
    ((has-trigger? body ".&") (do-trigger l body))
    ; else body is a list: recursively call rec
    ;      and splice it back together
    (t `( ,(rec l (car body))
         ,@(rec l (cdr body))))))

I hope the comments make it relatively clear how this works, but it is a good idea to play around with this to really get a feel for what is happening.

All this and we have yet to write our macro. Except, we kind of already have. As you see in the select macro just below; rec performs all the heavy lifting:

(defmacro select (l &body body)
  ; (gensym) creates a new, unique symbol
  (let ((l* (gensym "L")))
    `(let ((,l* ,l))
       ,@(rec l* body))))

And using it:

(select `((urq 77) (abc 3) (abcd 2) (dec 8))
  (.&itr "DE" (print (list "HIT" $)))
  (print '2nd-loop)
  (.&itr "AB" (print (.&list 'hi))
              (print (.&list 'ohhai))))

; which prints:
;   ("HIT" (DEC 8))
;   2ND-LOOP 
;   ((ABC 3) HI)   
;   ((ABC 3) OHHAI)
;   ((ABCD 2) HI)
;   ((ABCD 2) OHHAI)
; as we would hope

If you want to see what a macro expands into, you can check for yourself:

(macroexpand-1 ; expands macros once
  '(select l ((urq 77) (abc 3) (abcd 2) (dec 8))
             (.&itr "DE" (.&print))))

;; (LET ((#:L94 L))
;;   ((URQ 77) (ABC 3) (ABCD 2) (DEC 8))
;;   (LOOP FOR $ IN #:L94
;;         IF (STARTSWITH? (CAR $) "DE")
;;         DO (PRINT $)))

If you are now feeling confused about how this works, you might also be wondering when our macro generates this code? In the next section I will try to give you some examples that can help thinking about this further.

Generatively distorted mesh rendered in Blender Cycles, 2023

Runtime or Compile time?

The defmacro syntax is very similar to defun, which we have used multiple times. This makes it easy to fall for the deception that macros are just weird functions. But they are not functions in the sense many programmers are used to.

Macros behave very different from functions in three important ways. First, the input to a macro (e.g. our select macro) is not evaluated. You might say the arguments are implicitly quoted.

Second, macros are not functions that return results; they are code that generates new code. The generated code is what will be executed in the program's runtime.

And third. We know from testing our select macro in the previous section that the rec function is definitely executed as a part of the macro expansion. That is how our loop code is generated after all.

But the macro expansion does not happen in the same context as a function in the same location in source code is executed. It is sometimes said that the macro is expanded at compile time. Although the distinction between compile time and runtime can get a little blurry.

Consider this macro:

(defmacro dummy (a b)
  : format is a fancy print
  (let ((res `(format t "~&executed: ~a ~a~&" ,a ,b)))
    ; prints something when dummy is expanded:
    (format t "~&expanded: ~a ~a~%  ~a~&" a b res)
    ; dummy returns this code:
    res))

What do you think will be printed if we do this:

(loop for i from 0 below 3 ; counts i=0, 1, 2
      do (format t "~&log: ~a~&" i)
         (dummy 99 i))

This is the printed output (in SBCL):

; expanded: 99 I
;   (FORMAT T ~&executed: ~a ~a~& 99 I)
; log: 0
; executed: 99 0
; log: 1
; executed: 99 1
; log: 2
; executed: 99 2

We can see that even though it appears as if dummy is inside a loop, it is only expanded once. But the code generated by dummy is executed three times.

Moreover, we see that while the line that prints log precedes dummy in the code, the expanded message from the macro expansion appears first.

In addition to all that we also see in the expanded message that when dummy is expanded the counter is just the symbol I.

Debugging macros is notoriously difficult. If you don't understand these examples right away there is no reason to despair. It took me a long time to get a good intuition for this. But I hope these small examples can help a little.

Generatively distorted mesh rendered in Blender Cycles, 2023

Extending a Language

I framed this tutorial around building a DSL. That is the title of this post after all. But a more important motivation for writing this is to show that the way we recursively processed s-expressions as data in rec is no different from how Lisp reads and evaluates any other source code you hand to it.

Saying this absolutely an oversimplification. And I am not skirting the details because I don't think understanding them is important. It is important. But I wanted to give a small taste of what it can be like to work with code that generates code. That way it might seem a little less abstract if you want to learn more in the future.

I hope I managed to make it feel kind of intuitive. Almost as if you are discovering a tool as you are using it. That has been my experience of using CL over several years. And it would be nice to try to replicate some of that feeling in others in a slightly shorter amount of time.

Even if this tutorial seems like an overly complicated way to write what is ostensibly a for loop. Another way to think about it is that we have extended the Lisp language by writing a plugin for Lisp. And the plugin is written in Lisp as well. Once you start thinking about programming and—more importantly—problem solving in this way, it is very hard to go back.

TL;DR, Macros are an extremely powerful way to fundamentally shape the language you are using to solve your problems. Not just the words and tools you already have in it.

EDIT: I expand on this approach in the next post.

3d photogrammetry from IR photos rendered in Blender Cycles, 2023

Learning CL is absolutely worth your time if you are interested in learning a new programming language. However, if you want to spend your time learning something you are more likely to use directly in a professional setting you may want to consider Clojure instead. If I had not already built all my utilities I probably would start using it instead.
I don't know what is the easiest alternative for Windows, sorry.
mkstr and symb are two of the first functions introduced as a series of utilities in On Lisp by Paul Graham. the book and the other utilities can be seen in the link.
In fact loop is a macro (or DSL, if you will) for looping that is already part of CL. Practical Common Lisp has an entire chapter about it.
The reason for using (gensym) is an entire topic in its own right. We have skirted the issue here, but macro hygiene is an important topic for writing good macros. Different Lisps have different approaches to this issue, and it is covered in detail in both Practical Common Lisp, On Lisp and Let over Lambda. I can also mention that the way we unceremoniously use $ as the iterator symbol in do-trigger is vulnerable what's called variable capture; one of the primary issues related to macro hygiene.