inconvergent

For a while I have wanted to make my own terminal utility for manipulating text files: some version of Sed or AWK, or maybe even jq. And I finally did. So here are the first 25 Fibonacci numbers calculated, and printed in an unnecessarily complicated way, using my new query language: Lisp Query Notation (LQN):

❭ echo '(0 1)' | lqn -t "
  (?rec (< (cnt) (1- 25))
        (cat* _ (apply* + (tail* _ 2))))
  #( (strcat #1=(fmt  \"~2,'0d ~30,'0d.\" (cnt) _)
             (seq* (reverse #1#) 1)) )"
⇒ 00 000000000000000000000000000000.000000000000000000000000000000 00
  01 000000000000000000000000000001.100000000000000000000000000000 10
  02 000000000000000000000000000001.100000000000000000000000000000 20
  03 000000000000000000000000000002.200000000000000000000000000000 30
  04 000000000000000000000000000003.300000000000000000000000000000 40
  05 000000000000000000000000000005.500000000000000000000000000000 50
  06 000000000000000000000000000008.800000000000000000000000000000 60
  07 000000000000000000000000000013.310000000000000000000000000000 70
  08 000000000000000000000000000021.120000000000000000000000000000 80
  09 000000000000000000000000000034.430000000000000000000000000000 90
  10 000000000000000000000000000055.550000000000000000000000000000 01
  11 000000000000000000000000000089.980000000000000000000000000000 11
  12 000000000000000000000000000144.441000000000000000000000000000 21
  13 000000000000000000000000000233.332000000000000000000000000000 31
  14 000000000000000000000000000377.773000000000000000000000000000 41
  15 000000000000000000000000000610.016000000000000000000000000000 51
  16 000000000000000000000000000987.789000000000000000000000000000 61
  17 000000000000000000000000001597.795100000000000000000000000000 71
  18 000000000000000000000000002584.485200000000000000000000000000 81
  19 000000000000000000000000004181.181400000000000000000000000000 91
  20 000000000000000000000000006765.567600000000000000000000000000 02
  21 000000000000000000000000010946.649010000000000000000000000000 12
  22 000000000000000000000000017711.117710000000000000000000000000 22
  23 000000000000000000000000028657.756820000000000000000000000000 32
  24 000000000000000000000000046368.863640000000000000000000000000 42
  25 000000000000000000000000075025.520570000000000000000000000000 52
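For readers who do not read Lisp, the query above can be paraphrased in Python. This is only a rough sketch of what the query computes, not how LQN evaluates it. It assumes that (cnt) acts as a counter (the iteration count inside ?rec, the item index inside the map), and the name fib_rows is made up for illustration.

```python
def fib_rows(iterations=24):
    """Rough Python paraphrase of the LQN query (assumed semantics)."""
    seq = [0, 1]                           # the '(0 1)' piped in via echo
    for _ in range(iterations):            # ?rec: repeat while (cnt) < (1- 25)
        seq.append(seq[-2] + seq[-1])      # append the sum of the last two items
    rows = []
    for i, value in enumerate(seq):        # the #( .. ) map over every item
        left = f"{i:02d} {value:030d}."    # counterpart of fmt "~2,'0d ~30,'0d."
        rows.append(left + left[::-1][1:]) # add the reversed text, minus its first char
    return rows


for row in fib_rows():
    print(row)
```

The mirrored right half of each row comes from reversing the formatted string and dropping the leading "." so that the dot appears only once.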

What is it?

LQN is a query language, Common Lisp (CL) library and terminal utility. To use the query language in the terminal there are three different commands, tqn, lqn and jqn, for ingesting text (e.g. CSV), Lisp data (e.g. source code), and JSON, respectively. More on those later.

LQN has a lot of similarities to jq, most notably the functional style and the chaining of commands. But I don't know jq well enough to say exactly how similar. Even so, I have noticed that I ended up making several similar behaviours, which makes sense given the common domain and programming style.

On top of that I wanted something that is terse, but flexible enough that I can write arbitrary CL code if I need to. Finally I wanted the compiler to be relatively simple.

Symbols, Strings and Keywords

A symbol in CL is (among other things) used to represent functions and variables. In this context all we need to know is that symbols are used to represent function names and LQN operators. reverse and ?rec in the above code are symbols. reverse is a native CL function that reverses sequences, and ?rec is an LQN operator for recursion.

Unsurprisingly, strings in CL are written "like this". Writing strings in terminal commands can be impractical, so LQN uses :keywords to represent lower case strings where possible. You can have virtually any character in a keyword, so :this/is-@-valid-keyword!

Asemic Writing Experiment

Data Representation

For LQN to work on multiple data formats, all incoming data is loaded into native CL objects, most frequently vectors and hash-tables (known as kvs for short from now on). Text files are read into vectors of strings, whereas JSON is read into vectors and kvs depending on the structure.

Here is an example of a JSON structure, which I assume is familiar.

[{ "_id": "65679d23", "index": 0,
   "things": [{ "id": 0, "name": "Chris" }],
   "msg": "this is a message",
   "fave": "strawberry" }]

And here is the same data written in what I have named Lisp Data Notation (LDN), just so I have one more acronym to shuffle around:

#(( (:_ID . "65679d23") (:INDEX . 0)
    (:THINGS . #(( (:ID . 0) (:NAME . "Chris") )))
    (:MSG . "this is a message")
    (:FAVE . "strawberry") ))

As you see, CL vectors are written as #(..). This is how CL prints vectors in the REPL, and also how you can write vectors in CL source code. kvs are written as lists of pairs, ((:OF . "tuples") (:LIKE . "this")), also known as alists.
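To make the mapping between the two notations concrete, here is a small Python sketch that prints parsed JSON in an LDN-like form. The helper to_ldn is hypothetical and not part of LQN; it only mimics the notation shown above, and real LQN output may differ in details such as how booleans and null are printed.

```python
import json


def to_ldn(obj):
    """Print parsed JSON in an LDN-like notation (illustrative only)."""
    if isinstance(obj, dict):
        # A kv becomes a list of (:KEY . value) pairs, i.e. an alist.
        pairs = " ".join(f"(:{key.upper()} . {to_ldn(val)})" for key, val in obj.items())
        return f"({pairs})"
    if isinstance(obj, list):
        # A JSON array becomes a CL vector, written #(..).
        return "#(" + " ".join(to_ldn(item) for item in obj) + ")"
    # Strings and numbers are printed the way JSON prints them.
    return json.dumps(obj)


print(to_ldn(json.loads('[{"id": 0, "name": "Chris"}]')))
```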

In time I might add support for Extensible Data Notation (edn), which I find more pleasant to look at. But LDN will do for now.

Terminal Velocity

Since the query language is the same for all kinds of input data we'll start by looking at tqn. It's the easiest to use for examples. tqn can read files or data from standard input:

tqn [options] <qry> [files ...]
echo 'some string' | tqn [options] <qry>

All commands can output data in any supported format using the -t, -l or -j options. To print internal CL objects like kvs and vectors in the terminal they have to be serialized one way or another. You can use -tj or -tl to serialize to text where internal objects are printed as JSON or LDN, respectively. Here is a small example where we pipe two lines to tqn, and select all the input using the "current value" symbol _:

❭ echo 'a b c\ndef' | tqn _
⇒ a b c
  def

All lines are read into a vector, then each item is printed on a new line. If we use the -l option we get the input serialized as LDN instead:

❭ echo 'a b c\ndef' | tqn -l _
⇒ #("a b c" "def")
Asemic Writing Experiment

Chaining Operations

A query can consist of multiple operations or "clauses". To explicitly chain two clauses together you can use the pipe operator (|| expr-1 .. expr-n), where the resulting value of each expression is passed on to the next. Naturally _ can be used to refer to the input value. Below is a query that splits the incoming string at every "x" and uppercases each new string. splt will trim off any whitespace by default.

❭ echo 'a b c x def x 27'\
  | tqn '(|| (splt _ :x) (sup _))'
⇒ A B C
  DEF
  27

|| is the default operator for any query, so the following will yield the same result:

❭ echo 'a b c x def x 27'\
  | tqn '(splt _ :x) sup'

Notice that any "bare" function name (sup) inside the pipe operator is called on each individual item in the incoming vector. This is shorthand for the map operator #(..). Here is the same expression written with an explicit map:

❭ echo 'a b c x def x 27'\
  | tqn '(splt _ :x) #(sup)'

As you might expect #(..) will also chain clauses together. The following is a query that splits the substrings at "B", then joins them again with "-":

❭ echo 'abc x def x abcdef'\
  | tqn '(splt _ :x) #(sup (splt _ "B") (join _ :-))'
⇒ A-C
  DEF
  A-CDEF
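As a point of comparison, the same split, map and join pipeline can be written in plain Python. This is only a paraphrase of what the query above does, not of how LQN works internally; split_trim is a made-up helper that mirrors the default trimming of splt.

```python
def split_trim(text, sep):
    """Split text at sep and strip whitespace from each part, like splt."""
    return [part.strip() for part in text.split(sep)]


line = "abc x def x abcdef"

# (splt _ :x), then for every item: sup, (splt _ "B"), (join _ :-)
result = ["-".join(split_trim(item.upper(), "B")) for item in split_trim(line, "x")]

for item in result:
    print(item)
```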

If you want to filter the input, you can use strings or bare keywords directly, or you can use the filter operator to do more complex filtering. For example, find all rows that contain the substring "e!":

❭ echo 'one! x two! x three!'\
  | tqn '(splt _ :x) :e!'
⇒ one!
  three!

Or items that contain "ef", or are (parsable as) integers:

❭ echo 'a b c x def x 27'\
  | tqn '(splt _ :x) [:ef int!?]'
⇒ def
  27

Filters also support inversion. This will drop all integers:

❭ echo 'abc x def x 27'\
  | tqn '(splt _ :x) [-@int!?]'
⇒ abc
  def

Here is a slightly more complex filter:

❭ echo 'abc x abcdef x abcdefghi'\
  | tqn '(splt _ :x) [:+@ab :+@bc :-@gh]'
⇒ abc
  abcdef

The behaviour of the filters is explained in more detail in the documentation. To demonstrate that you can use conventional CL code if you want to, here is a query that returns the same result, written using the conventional CL boolean operators (and, not) and the substring search function sub?:

❭ echo 'abc x abcdef x abcdefghi'\
  | tqn '(splt _ :x) [(and (sub? _ "ab")
                           (sub? _ "bc")
                           (not (sub? _ "gh")))]'
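The filter examples above boil down to ordinary boolean tests on each item. Here is a rough Python equivalent, assuming the semantics described in the text: plain clauses are OR-ed together, +@ clauses must all match, and -@ clauses must not match. The helper is_int is a hypothetical stand-in for int!?.

```python
def is_int(text):
    """Return True if text can be parsed as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


# [:ef int!?] keeps items that contain "ef" OR parse as an integer.
items = ["a b c", "def", "27"]
either = [x for x in items if "ef" in x or is_int(x)]

# [:+@ab :+@bc :-@gh] keeps items with "ab" AND "bc" but NOT "gh".
words = ["abc", "abcdef", "abcdefghi"]
strict = [x for x in words if "ab" in x and "bc" in x and "gh" not in x]

print(either)
print(strict)
```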

Now that we have an idea of the basic functionality, let's try to transform some JSON.

Generative Symbols

Selecting from JSON

So far we have looked at reading text from the terminal. Working with JSON is not so different. But there are some operators in LQN that come in very handy when working with structured data like JSON. You can pipe JSON to jqn as well, but we will use the following JSON file as our example:

❭ cat data.json
⇒ [ { "id": "1",
      "things": [ { "id": 4, "name": "Ball", "info": "round" } ],
      "fave": "strawberry" },
    { "id": "2",
      "things": [ { "id": 9, "name": "Scissor", "info": "sharp" },
                  { "id": 7, "name": "Herring", "info": "frozen" },
                  { "id": 3, "name": "Computer" } ],
      "msg": "Nih!",
      "fave": "strawberry" },
    { "id": "3",
      "things": [ { "id": 2, "name": "Paper" },
                  { "id": 8, "name": "Bottle", "info": "empty" } ],
      "msg": "+++Banana, banana, banana!+++" } ]

Sometimes you want just some keys from a list of JSON objects. Let's select the id and msg fields using the #{..} operator:

❭ jqn '#{:id :msg}' data.json
⇒ [ { "id": "1", "msg": null },
    { "id": "2", "msg": "Nih!" },
    { "id": "3", "msg": "+++Banana, banana, banana!+++" } ]

We see that #{..} selects keys from kvs in a vector into new kvs in a vector. The default behaviour is to include keys even if they have no value. To only include keys that have a value you can use the ?@ modifier, which only selects keys if they are present and not nil:

❭ jqn '#{:id :?@msg}' data.json
⇒ [ { "id": "1" },
    { "id": "2", "msg": "Nih!" },
    { "id": "3", "msg": "+++Banana, banana, banana!+++" } ]

And if you want to transform values you can use expressions like this:

❭ jqn '#{(:id (+ 10 (int!? _)))
         (:?@msg sup)}' data.json
⇒ [ { "id": 11 },
    { "id": 12, "msg": "NIH!" },
    { "id": 13, "msg": "+++BANANA, BANANA, BANANA!+++" } ]
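For comparison, here is roughly what the last selector does, written as a Python function. It is a sketch of the observable behaviour only: the function name select is made up, and the data is a trimmed-down copy of data.json with just the fields that matter here.

```python
data = [
    {"id": "1", "fave": "strawberry"},
    {"id": "2", "msg": "Nih!", "fave": "strawberry"},
    {"id": "3", "msg": "+++Banana, banana, banana!+++"},
]


def select(rows):
    """Mimic #{(:id (+ 10 (int!? _))) (:?@msg sup)} on a list of dicts."""
    out = []
    for row in rows:
        new = {"id": 10 + int(row["id"])}  # (:id (+ 10 (int!? _)))
        msg = row.get("msg")
        if msg is not None:                # ?@ only keeps keys that have a value
            new["msg"] = msg.upper()       # bare sup is applied to the value
        out.append(new)
    return out


print(select(data))
```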

Again we see that bare symbols are interpreted as a function with the current value as the only argument, whereas expressions are evaluated directly.

Before we move on, here is a slightly more involved example that selects the msg field only when the message has a value, and is longer than 10 characters:

❭ jqn '#{ (:%@msg (?? _ (> (size? _) 10)
                        (sup _))) }
       [is?]' data.json
⇒ [ { "msg": "+++BANANA, BANANA, BANANA!+++" } ]

If we didn't have the filter at the end we would also see two null values.

This post is already longer than I can reasonably expect anyone to read about this topic. But here is one more example where we use multiple selectors at the same time, and print the result as newline-separated JSON.

❭ jqn -tjm '#[( :things #[ (:name sdwn)
                           (:?@info sup) ] )]' data.json
⇒  ["ball","ROUND"]
   ["scissor","SHARP","herring","FROZEN","computer"]
   ["paper","bottle","EMPTY"]
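The nested selection above flattens each record's things into one list of values. A rough Python counterpart, given only to show the shape of the transformation, could look like this; names_and_info is a hypothetical helper and the data is a cut-down copy of the example file.

```python
import json

data = [
    {"things": [{"id": 4, "name": "Ball", "info": "round"}]},
    {"things": [{"id": 9, "name": "Scissor", "info": "sharp"},
                {"id": 7, "name": "Herring", "info": "frozen"},
                {"id": 3, "name": "Computer"}]},
    {"things": [{"id": 2, "name": "Paper"},
                {"id": 8, "name": "Bottle", "info": "empty"}]},
]


def names_and_info(record):
    """Collect lowercased names and uppercased info values into one flat list."""
    flat = []
    for thing in record["things"]:
        flat.append(thing["name"].lower())  # (:name sdwn)
        info = thing.get("info")
        if info is not None:                # (:?@info sup) skips missing values
            flat.append(info.upper())
    return flat


# One compact JSON array per line, i.e. newline-separated JSON.
for record in data:
    print(json.dumps(names_and_info(record), separators=(",", ":")))
```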

While this particular data is just nonsense, I enjoy how little code we need in order to do this transformation.

Conclusion

There are quite a few other useful operators and functions in LQN. (?rec ..), as you might have glimpsed in the initial Fibonacci calculation, performs recursion. (?srch ..) can search for things inside a nested structure. Similarly (?txpr ..) can do search and replace in nested data. And there are obviously several operators for making new objects, looking up individual paths, and so on. If any of this piques your interest, you can read more in the readme.

Initially I said I wanted a relatively simple compiler. The core of the LQN compiler is about 300 lines. Obviously there is more code than that, but it doesn't feel unmanageable.

LQN is very much an experiment. Since I haven't had the opportunity to use it much in practice, I don't know if the behaviour of the operators and built-ins is convenient, which will probably depend a little on the typical use cases anyway. However, this is an interesting, and maybe even useful, little language for performing some of the tasks you might encounter in the terminal.

Perhaps more interestingly, I think it can be helpful when writing other DSLs in the future, as a tool to perform some of the transformations you will often need when implementing CL macros.


  1. Writing a compiler to calculate the Fibonacci numbers in my own language is the most complicated way I have ever calculated the Fibonacci numbers.
  2. I haven't really used it for anything in practice. So I mostly know it from looking at the docs and common examples.
  3. The output has been formatted to be a little more compact than the actual output to the terminal.
  4. It is possible to write very efficient code in CL. However, LQN will never be as fast as e.g. grep, Sed and AWK. Among other things, the startup time for running CL from the terminal is higher than for these smaller utilities.