EDIT: There is a more technical paper about LQN here (PDF).
For a while I have wanted to make my own terminal utility for manipulating text files. Some version of Sed, or AWK; or maybe even .jq. And I finally did. So here are the first 25 Fibonacci numbers calculated, and printed in an unnecessarily complicated way, using my new query language: Lisp Query Notation (LQN):
❭ echo '(0 1)' | lqn -t "
    (?rec (< (cnt) (1- 25))
          (cat* _ (apply* + (tail* _ 2))))
    #( (strcat #1=(fmt \"~2,'0d ~30,'0d.\" (cnt) _)
               (seq* (reverse #1#) 1)) )"
⇒ 00 000000000000000000000000000000.000000000000000000000000000000 00
  01 000000000000000000000000000001.100000000000000000000000000000 10
  02 000000000000000000000000000001.100000000000000000000000000000 20
  03 000000000000000000000000000002.200000000000000000000000000000 30
  04 000000000000000000000000000003.300000000000000000000000000000 40
  05 000000000000000000000000000005.500000000000000000000000000000 50
  06 000000000000000000000000000008.800000000000000000000000000000 60
  07 000000000000000000000000000013.310000000000000000000000000000 70
  08 000000000000000000000000000021.120000000000000000000000000000 80
  09 000000000000000000000000000034.430000000000000000000000000000 90
  10 000000000000000000000000000055.550000000000000000000000000000 01
  11 000000000000000000000000000089.980000000000000000000000000000 11
  12 000000000000000000000000000144.441000000000000000000000000000 21
  13 000000000000000000000000000233.332000000000000000000000000000 31
  14 000000000000000000000000000377.773000000000000000000000000000 41
  15 000000000000000000000000000610.016000000000000000000000000000 51
  16 000000000000000000000000000987.789000000000000000000000000000 61
  17 000000000000000000000000001597.795100000000000000000000000000 71
  18 000000000000000000000000002584.485200000000000000000000000000 81
  19 000000000000000000000000004181.181400000000000000000000000000 91
  20 000000000000000000000000006765.567600000000000000000000000000 02
  21 000000000000000000000000010946.649010000000000000000000000000 12
  22 000000000000000000000000017711.117710000000000000000000000000 22
  23 000000000000000000000000028657.756820000000000000000000000000 32
  24 000000000000000000000000046368.863640000000000000000000000000 42
  25 000000000000000000000000075025.520570000000000000000000000000 52
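For readers who do not read Lisp, here is a rough Python sketch of what the query above appears to do, judging only from its output. It is an analogue and not how LQN works internally; the function name `fib_rows` and the reading of `(< (cnt) (1- 25))` as "run 24 iterations" are assumptions.

```python
def fib_rows(iterations=24):
    """Build rows like the output above: index, padded value, mirror image."""
    seq = [0, 1]                        # the '(0 1)' piped in on stdin
    for _ in range(iterations):         # assumed meaning of (< (cnt) (1- 25))
        seq.append(seq[-1] + seq[-2])   # append the sum of the last two items
    rows = []
    for i, n in enumerate(seq):
        left = f"{i:02d} {n:030d}."     # same layout as "~2,'0d ~30,'0d."
        rows.append(left + left[::-1][1:])  # reversed copy minus its leading "."
    return rows


for row in fib_rows():
    print(row)
```

Starting from two items and appending 24 more gives the 26 rows numbered 00 to 25.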
What is it?
LQN is a query language, CL library and terminal utility. To use the query language in the terminal there are three different commands: tqn, lqn and jqn, for ingesting text (e.g. CSV), Lisp data (e.g. source code), and JSON, respectively. More on those later.
LQN has a lot of similarities to .jq. Most notably the functional style, and chaining commands together. But I don't know .jq well enough to say exactly how similar. Even so, I have noticed that I ended up making several similar behaviours. Which makes sense given the common domain and programming style.
On top of that I wanted something that is terse, but flexible enough that I can write arbitrary CL code if I need to. Finally I wanted the compiler to be relatively simple.
Symbols, Strings and Keywords
A symbol in CL is (among other things) used to represent functions and variables. In this context all we need to know is that symbols are used to represent function names and LQN operators. reverse and ?rec in the above code are symbols. reverse is a native CL function that reverses sequences, and ?rec is an LQN operator for recursion.
Unsurprisingly strings in CL are written "like this".
Writing strings in terminal commands can be impractical, so LQN uses :keywords to represent lower case strings where possible. You can have virtually any character in a keyword, so :this/is-@-valid-keyword!
Data Representation
For LQN to work on multiple data formats, all incoming data is loaded into native CL objects. Most frequently vectors and hash-tables (known as kvs for short from now on).
Text files are read into vectors of strings, whereas JSON is read into vectors and kvs depending on the structure.
Here is an example of a JSON structure, which I assume is familiar.
[{ "_id": "65679d23", "index": 0, "things": [{ "id": 0, "name": "Chris" }], "msg": "this is a message", "fave": "strawberry" }]
And here is the same data written in what I have named Lisp Data Notation (LDN) here. Just so I have one more acronym to shuffle around:
#(( (:_ID . "65679d23") (:INDEX . 0) (:THINGS . #(( (:ID . 0) (:NAME . "Chris") ))) (:MSG . "this is a message") (:FAVE . "strawberry") ))
As you see CL vectors are written as #(..). This is how CL prints vectors in the REPL. And also how you can write vectors in CL source code. kvs are written as lists ((:OF . "tuples") (:LIKE . "this")). Also known as alists.
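If the CL terminology is unfamiliar, a loose comparison with Python may help: its json module loads the same document into lists and dicts, much like the vectors and kvs described above. This is only an analogy; the variable names are made up for illustration.

```python
import json

raw = """[{ "_id": "65679d23", "index": 0,
            "things": [{ "id": 0, "name": "Chris" }],
            "msg": "this is a message", "fave": "strawberry" }]"""

data = json.loads(raw)        # JSON array  -> list (compare: vector)
first = data[0]               # JSON object -> dict (compare: kv)
pairs = list(first.items())   # (key, value) tuples, loosely like the LDN alist
thing_name = first["things"][0]["name"]

print(pairs[0], thing_name)
```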
In time I might add support for Extensible Data Notation (edn), which I find more pleasant to look at. But LDN will do for now.
Terminal Velocity
Since the query language is the same for all kinds of input data we'll start by looking at tqn. It's the easiest to use for examples.
tqn can read files or data from standard input:
tqn [options] <qry> [files ...]
echo 'some string' | tqn [options] <qry>
All commands can output data in any (supported) format using the -t, -l, or -j options. To print internal CL objects like kvs and vectors in the terminal they have to be serialized one way or another. You can use -tj or -tl to serialize to text where internal objects are printed as either JSON or LDN respectively. Here is a small example where we pipe two lines to tqn, and select all the input using the "current value" symbol _:
❭ echo 'a b c\ndef' | tqn _
⇒ a b c
  def
All lines are read into a vector, then each item is printed on a new line. If we use the -l option we get the input serialized as LDN instead:
❭ echo 'a b c\ndef' | tqn -l _
⇒ #("a b c" "def")
Chaining Operations
A query can consist of multiple operations or "clauses". To explicitly chain two clauses together you can use the pipe operator (|| expr-1 .. expr-n) where the resulting value of each expression is passed on to the next. Naturally _ can be used to refer to the input value.
Below is a query that splits the incoming string at every "x" and uppercases each new string. splt will trim off any whitespace by default.
❭ echo 'a b c x def x 27'\
  | tqn '(|| (splt _ :x) (sup _))'
⇒ A B C
  DEF
  27
|| is the default operator for any query, so we can write this instead:
❭ echo 'a b c x def x 27'\
  | tqn '(splt _ :x) sup'
⇒ A B C
  DEF
  27
Also notice that any "bare" function name (sup) inside the pipe operator is called on each individual item in the incoming vector. This is shorthand for the map operator #(..). Here is the same expression written with an explicit map:
❭ echo 'a b c x def x 27'\
  | tqn '(splt _ :x) #(sup)'
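The chaining and mapping described above can be mimicked in a few lines of Python. This is a sketch of the behaviour as described in this section, not of LQN's compiler; `pipe`, `split_trim` and `each` are hypothetical helper names.

```python
from functools import reduce


def pipe(value, *clauses):
    """Feed value through each clause in turn, like (|| expr-1 .. expr-n)."""
    return reduce(lambda acc, clause: clause(acc), clauses, value)


def split_trim(text, sep):
    """Split text at sep and strip surrounding whitespace from each piece."""
    return [piece.strip() for piece in text.split(sep)]


def each(fn):
    """Lift a one-item function so it maps over a whole list, like #(..)."""
    return lambda items: [fn(item) for item in items]


result = pipe("a b c x def x 27",
              lambda text: split_trim(text, "x"),
              each(str.upper))
print(result)  # ['A B C', 'DEF', '27']
```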
As you might expect #(..) will also chain clauses together. The following is a query that uppercases each substring, splits it at "B", then joins the pieces again with "-":
❭ echo 'abc x def x abcdef'\
  | tqn '(splt _ :x) #(sup (splt _ "B") (join _ :-))'
⇒ A-C
  DEF
  A-CDEF
If you want to filter the input, you can use strings or bare keywords directly, or you can use the filter operator to do more complex filtering. For example, find all items that contain the substring "e!":
❭ echo 'one! x two! x three!'\
  | tqn '(splt _ :x) :e!'
⇒ one!
  three!
Or items that contain "ef", or are (parsable as) integers:
❭ echo 'a b c x def x 27'\
  | tqn '(splt _ :x) [:ef int!?]'
⇒ def
  27
Filters also support inversion. This will drop all integers:
❭ echo 'abc x def x 27'\
  | tqn '(splt _ :x) [-@int!?]'
⇒ abc
  def
Here is a slightly more complex filter:
❭ echo 'abc x abcdef x abcdefghi'\
  | tqn '(splt _ :x) [:+@ab :+@bc :-@gh]'
⇒ abc
  abcdef
The behaviour of the filters is explained in more detail in the documentation, but here is a query that will return the same result written using conventional CL boolean operators (and, not), and the substring search function sub?. To demonstrate that you can use conventional CL code if you want to:
❭ echo 'abc x abcdef x abcdefghi'\
  | tqn '(splt _ :x) [(and (sub? _ "ab") (sub? _ "bc") (not (sub? _ "gh")))]'
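For comparison, the filter examples in this section read roughly like the following Python list comprehensions. This is only an analogue inferred from the outputs shown above; `is_int` is a hypothetical stand-in for int!?, whose exact parsing rules may differ.

```python
def is_int(text):
    """True when text parses as an integer (rough stand-in for int!?)."""
    try:
        int(text)
        return True
    except ValueError:
        return False


# [:ef int!?] keeps items that contain "ef" OR parse as integers.
mixed = [s for s in ["a b c", "def", "27"] if "ef" in s or is_int(s)]

# [-@int!?] drops the integers.
no_ints = [s for s in ["abc", "def", "27"] if not is_int(s)]

# [:+@ab :+@bc :-@gh] requires "ab" AND "bc", and rejects "gh".
items = ["abc", "abcdef", "abcdefghi"]
kept = [s for s in items if "ab" in s and "bc" in s and "gh" not in s]

print(mixed, no_ints, kept)
```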
Now that we have an idea of the basic functionality, let's try to transform some JSON.
Selecting from JSON
So far we have looked at reading text from the terminal. Working with JSON is not so different. But there are some operators in LQN that come in very handy when working with structured data like JSON. You can pipe JSON to jqn as well, but we will use the following JSON file as our example:
❭ cat data.json
⇒ [
  { "id": "1",
    "things": [ { "id": 4, "name": "Ball", "info": "round" } ],
    "fave": "strawberry" },
  { "id": "2",
    "things": [ { "id": 9, "name": "Scissor", "info": "sharp" },
                { "id": 7, "name": "Herring", "info": "frozen" },
                { "id": 3, "name": "Computer" } ],
    "msg": "Nih!",
    "fave": "strawberry" },
  { "id": "3",
    "things": [ { "id": 2, "name": "Paper" },
                { "id": 8, "name": "Bottle", "info": "empty" } ],
    "msg": "+++Banana, banana, banana!+++" }
]
Sometimes you want just some keys from a list of JSON objects. Let's select the id and msg fields using the #{..} operator:
❭ jqn '#{:id :msg}' data.json
⇒ [ { "id": "1", "msg": null },
    { "id": "2", "msg": "Nih!" },
    { "id": "3", "msg": "+++Banana, banana, banana!+++" } ]
We see that #{..} selects keys from kvs in a vector into new kvs in a vector. The default behaviour is to include keys even if they have no value. To only include keys that have a value you can use the ?@ modifier, which only selects keys if they are present and not nil:
❭ jqn '#{:id :?@msg}' data.json
⇒ [ { "id": "1" },
    { "id": "2", "msg": "Nih!" },
    { "id": "3", "msg": "+++Banana, banana, banana!+++" } ]
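The difference between a plain key and a ?@ key can be sketched in Python as follows. This mirrors the two outputs above and nothing more; `select` and its `optional` parameter are hypothetical names, not part of LQN.

```python
def select(rows, keys, optional=()):
    """Pick keys from each dict; optional keys are kept only when non-null."""
    out = []
    for row in rows:
        picked = {k: row.get(k) for k in keys}  # missing keys become None
        for k in optional:
            if row.get(k) is not None:          # like ?@: present and not nil
                picked[k] = row[k]
        out.append(picked)
    return out


rows = [
    {"id": "1", "fave": "strawberry"},
    {"id": "2", "msg": "Nih!", "fave": "strawberry"},
]

always = select(rows, ["id", "msg"])            # like #{:id :msg}
if_set = select(rows, ["id"], optional=["msg"])  # like #{:id :?@msg}
print(always)
print(if_set)
```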
And if you want to transform values you can use expressions like this:
❭ jqn '#{(:id (+ 10 (int!? _))) (:?@msg sup)}' data.json
⇒ [ { "id": 11 },
    { "id": 12, "msg": "NIH!" },
    { "id": 13, "msg": "+++BANANA, BANANA, BANANA!+++" } ]
Again we see that bare symbols are interpreted as a function with the current value as the only argument, whereas expressions are evaluated directly.
Before we move on, here is a slightly more involved example that selects the msg field only when the message has a value, and is longer than 10 characters:
❭ jqn '#{ (:%@msg (?? _ (> (size? _) 10) (sup _))) } [is?]' data.json
⇒ [ { "msg": "+++BANANA, BANANA, BANANA!+++" } ]
If we didn't have the filter at the end we would also see two null values.
This post is already longer than I can reasonably expect anyone to read about this topic. But here is one more example where we use multiple selectors at the same time, and print the result as newline separated JSON.
❭ jqn -tjm '#[( :things #[ (:name sdwn) (:?@info sup) ] )]' code/sample.json
⇒ ["ball","ROUND"]
  ["scissor","SHARP","herring","FROZEN","computer"]
  ["paper","bottle","EMPTY"]
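Spelled out in Python, the same transformation might look like the sketch below. It is an analogue built from the output above, and it assumes the "info" value of the bottle is the string "empty"; `thing_values` is a hypothetical name.

```python
import json

rows = [
    {"id": "1", "things": [{"id": 4, "name": "Ball", "info": "round"}]},
    {"id": "2", "things": [{"id": 9, "name": "Scissor", "info": "sharp"},
                           {"id": 7, "name": "Herring", "info": "frozen"},
                           {"id": 3, "name": "Computer"}]},
    {"id": "3", "things": [{"id": 2, "name": "Paper"},
                           {"id": 8, "name": "Bottle", "info": "empty"}]},
]


def thing_values(row):
    """Flatten one row's things into lowercased names and uppercased infos."""
    out = []
    for thing in row["things"]:
        out.append(thing["name"].lower())      # like (:name sdwn)
        if thing.get("info") is not None:      # like (:?@info sup)
            out.append(thing["info"].upper())
    return out


# One compact JSON array per line, like the newline separated output above.
lines = [json.dumps(thing_values(row), separators=(",", ":")) for row in rows]
print("\n".join(lines))
```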
While this particular data is just nonsense, I enjoy how little code we need in order to do this transformation.
Conclusion
There are quite a few other useful operators and functions in LQN. (?rec ..), as you might have glimpsed in the initial Fibonacci calculation, performs recursion. (?srch ..) can search for things inside a nested structure. Similarly (?txpr ..) can do search and replace in nested data. And there are obviously several operators for making new objects, looking up individual paths, and so on.
If any of this piques your interest, you can read more in the readme.
Initially I said I wanted a relatively simple compiler. The core of the LQN compiler is about 300 lines. Obviously there is more code than that, but it doesn't feel unmanageable.
LQN is very much an experiment. Since I haven't had the opportunity to use it much in practice, I don't know if the behaviour of the operators and built-ins is convenient. Which will probably depend a little on the typical use cases anyway. However, this is an interesting, and maybe even useful, little language for performing some of the tasks you might encounter in the terminal.
Perhaps more interestingly I think it can be helpful when writing other DSLs in the future. As a tool to perform some of the transformations you will often need when implementing CL macros.
- Writing a compiler to calculate the Fibonacci numbers in my own language is the most complicated way I have ever calculated the Fibonacci numbers.
- I haven't really used .jq for anything in practice. So I mostly know it from looking at the docs and common examples.
- The output has been formatted to be a little more compact than the actual output to the terminal.
- It is possible to write very efficient code in CL. However LQN will never be as fast as e.g. grep, Sed and AWK. Among other things the startup time for running CL from the terminal is higher than for these smaller utilities.