In the previous post I introduced my graph data structure (cl-grph), and gave some examples of Datalog queries. Datalog is a Domain Specific Language (DSL) for queries on graphs.
Building a Datalog compiler as an example DSL in a blog post is a little ambitious. Instead we will look at two different approaches for making a DSL for vector mathematics.
It is necessary to explain some technical details specific to Common Lisp (CL) in order to introduce the motivation for making this DSL. I make an effort to give a cursory explanation of these details, so that anyone with some programming experience will be able to follow the core points.
To start us off I want to show you this quote:
[...] a good programmer does not just write programs. A good programmer builds a working vocabulary. A good programmer does language design, though not from scratch, but building on the frame of a base language.
—Guy Steele, Growing a Language, 1998
This is an excellent presentation. And while Steele is talking about language design in more general terms, I still hope you will agree that it is relevant to the topic at hand.
Values and Values
We will make frequent use of the CL function
values
.
values
is a way to group several values together. The most
relevant use in this context is that functions can return a block of
(values ...)
:
;define function named fx
(defun fx (a) (values 1 2 3 a)) (fx 7);; returns: (values 1 2 3 7)
This makes it convenient to use values
to represent point vectors. Which
is part of the motivation behind what follows.
However, a function will only receive one value from each block of values. That is, if we do this:
; define an nary function that returns
; any number of arguments as a list:
(defun my-function (&rest rest) rest) (my-function (values 1 2) (values 3 4));; '(1 3)
my-function
only sees the first value from each value block.
To call my-function
on all
values we have to use
multiple-value-call
:
(multiple-value-call
#'my-function (values 1 2)
(values 3 4)) ;; '(1 2 3 4)
This is awkward. And brings us to the next part of the motivation. Surely we can avoid this.
Macros
The term macro in programming can mean a few different things. Here we are talking about CL macros. A macro in CL looks a lot like defining a function:
(defmacro awesome-defun (name args &body body); recognize this bit just below ⁂ :
`(defun ,(symb "awesome-" name) (,@args) (progn ,@body))); if we call the macro:
(awesome-defun print (p) (print p)); ⁂ it produces this code:
(DEFUN AWESOME-PRINT (P) (PROGN (PRINT P))); by convention, generated code is in UPPERCASE
We are going to ignore the details of the macro
syntax. But notice that the final function definition (⁂
)
is a perfectly valid function definition. That has just been evaluated. In
other words we can now call the function:
(awesome-print :hello-world)
; prints: :hello-world
You undoubtedly would like all your code to be full of awesome functions. Yet it may not be obvious why the ability to extend the language is so meaningful. But if you can write code that can define functions, then you can do just about anything.
The next sections introduce two examples.
Using macros
We start by creating addition (2+
) and multiplication
(2*
) for 2d point vectors. One possible approach is something like
this:
; 2d addition +
(defun internal-2+ (ax ay bx by) (values (+ ax bx) (+ ay by))); a macro that uses multiple-value-call as described above
(defmacro 2+ (&rest rest) `(multiple-value-call #'internal-2+ ,@rest)); 2d multiplication *
(defun internal-2* (ax ay bx by) (values (* ax bx) (* ay by))); multiple-value-call again
(defmacro 2* (&rest rest) `(multiple-value-call #'internal-2* ,@rest))
With these definitions we can do vector mathematics:
; simple
(2+ 1 2 3 4);; (values (+ 1 3) (+ 2 4))
; nested
(2+ (2* 10 20 30 40);; (values (+ (* 10 30) 3)
3 4);; (+ (* 20 40) 4))
Considering how repetitive the above function and macro definitions are, you can probably see what is coming next. Now we create a macro that will define vector operations for a given dimension and function expression.
Here is a naive version of that macro:
(defmacro define-dim-op ((dim name) &body body) (let ((mname (symb dim name));; ex: '2+
(fname (symb :internal- dim name));; ex: 'internal-2+
(args (loop for i from 1 repeat dim;; ex: '(g11 g12 g21 g22):
append (loop for j from 1 repeat dim collect (symb :g i j))))) `(progn (defun ,fname (,@args) ,@body); define function
(defmacro ,mname (&rest rest); define wrapper
`(multiple-value-call ,#',fname ,@rest)))))
It can be used like this:
(define-dim-op (2 +) (values (+ g11 g21) (+ g12 g22)))
This creates the same 2+
operation as before:
(2+ 1 2 3 4) ;; (values (+ 1 3) (+ 2 4))
With some more work this approach can be extended to other types of operations. For example, if you want to sum two arrays of point vectors, or if you want to reduce the dimension of point vectors in an array and so on.
But what if there is a more general approach?
An Alternative Approach
So far we have created code that builds a library of all the relevant functions (and corresponding wrapper macros) that we need. What if we instead identify the pattern of the operations we want. Then design the code that matches that pattern.
With 2+
in mind. We have two vectors of dimension two. We want to
call +
on consecutive pairs of elements in those vectors.
Furthermore we want to express it as an S-expression, where
the first two values belong to the first point vector, and the next two
values belong to the second.
Using 2+
looks like this: (2+ 1 2 3 4)
. If we add a
unique identifier to the function name, it will be possible to recognize it in
code. We will use !@
:
(2!@+ 1 2 3 4); which gives us the pattern:
; ([dim]!@[fx] (values ...) (values ...))
Describing how to do this substitution is outside the scope of this post , but it is not too difficult
to make a macro that traverses arbitrary code, finds function calls
that match [dim]!@[fx]
and generates something like the following
code:
; binds arg/a = 1, arg/b = 2, ...
(MULTIPLE-VALUE-BIND (ARG/A ARG/B ARG/C ARG/D)
(MULTIPLE-VALUE-CALL
#'VALUES 1 2 3 4)
(VALUES (+ ARG/A ARG/C) (+ ARG/B ARG/D))))
Which should look quite familiar by now, even if it is a little more involved than before.
Finally, here is an illustration of how you can extend this directly by using your own function:
; labels defines a local function, inverse-sub
(labels ((inverse-sub (a b) (- b a))) (2!@inverse-sub 1 2 3 4));; (values (- 3 1)
;; (- 4 2))
Macros as Compilers
The trigger symbol approach outlines what is really a kind of compiler for a small DSL. With this framework in place the next step is to extend the DSL (and thus CL itself) with additional syntax and/or new triggers. That way it can handle more types of vector operations.
Hopefully this has brought you a little closer to appreciating that macros are a powerful tool. And that They enable you to alter the language you are using to solve your problem. Not just the vocabulary.
If you want to read more about this in particular you can look at the vv
macro in the cl-veq documentation.
EDIT: In this post I describe in detail how to write a small Common Lisp DSL. And here is an example implementation of the vector DSL.
- If you want to learn Common Lisp I suggest starting with Peter Seibel's excellent book Practical Common Lisp.
- Efficiency is another important consideration. Passing
values
between functions can be more efficient than using heavier data structures. This is the case in SBCL where this approach is significantly faster than all other alternatives I have tried. Hit me up if you have other suggestions. - There are a number of technical improvements that strictly speaking should be made to this macro, but I have omitted then in the interest of simplicity. If you want an exhaustive discussion of CL macros I suggest On Lisp by Paul Graham and Let over Lambda by Dough Hoyte.
- Here is a working example implementation.
(multiple-value-call #'fx (values a b))
is roughly equivalent to(multiple-value-bind (a* b*) (values a b) (fx a* b*))