Clojure Cookbook: Custom Data Structures
This recipe from Clojure Cookbook is one of my all-time favorites. Contributor (and all-around awesome dude) Leonardo Borges submitted a two-part series on building custom data structures in Clojure, using core.match no less. I’d never really dug into the Clojure sequence abstraction before, but this recipe prompted me to dig in further than I’d ever gone before.
While Custom Data Structures is a two-part series, what I really want to showcase is Clojure’s sequence abstractions, which are featured in Part II. To catch up on the details of implementing a Red-black tree, visit Part I. Or hey, buy the book.
You’ll probably notice this recipe is credited to both Leonardo and myself. The reason for this is that I substantially changed the implementation and explanation based on feedback from my colleagues at Cognitect.
This isn’t to diminish Leonardo’s accomplishments, we just don’t want to put words in other people’s mouths.
Implementing Custom Data Structures: Red-Black Trees–Part II
by Ryan Neufeld; originally submitted by Leonardo Borges
Problem
You want to use Clojure’s core sequence functions (conj, map,
filter, etc.) with your custom data structure.
Solution
In part one of this recipe (Implementing Custom Data Structures: Red-black Trees Part I), you implemented all the functions necessary for creating an efficient red-black tree.
What’s missing is participation in Clojure’s sequence abstraction.
The most important part of participating in sequence abstraction
is the ability to expose values of a data structure sequentially. The
built-in tree-seq is well suited for this task. One extra step is
needed, however; tree-seq returns a sequence of nodes, not values.
Here’s the final rb-tree->seq function:
(defn- rb-tree->tree-seq "Return a seq of all nodes in an red-black tree." [rb-tree] (tree-seq sequential? (fn [[_ left _ right]] (remove nil? [left right])) rb-tree)) (defn rb-tree->seq "Convert a red-black tree to a seq of its values." [rb-tree] (map (fn [[_ _ val _]] val) (rb-tree->tree-seq rb-tree))) (rb-tree->seq (-> nil (insert-val 5) (insert-val 2))) ;; -> (5 2)
Since RBTs most closely resemble sets, they should adhere well to the
IPersistentSet interface. Extend the IPersistentSet and IFn
protocols to a new RedBlackTree type, implementing all of the
necessary functions. It’s also wise to implement the multimethod
print-method for RedBlackTree, as the default implementation will
fail for RedBlackTree as implemented:
(deftype RedBlackTree [tree] clojure.lang.IPersistentSet (cons [self v] (RedBlackTree. (insert-val tree v))) (empty [self] (RedBlackTree. nil)) (equiv [self o] (if (instance? RedBlackTree o) (= tree (.tree o)) false)) (seq [this] (if tree (rb-tree->seq tree))) (get [this n] (find-val tree n)) (contains [this n] (boolean (get this n))) ;; (disjoin [this n] ...) ;; Omitted due to complexity clojure.lang.IFn (invoke [this n] (get this n)) Object (toString [this] (pr-str this))) (defmethod print-method RedBlackTree [o ^java.io.Writer w] (.write w (str "#rbt " (pr-str (.tree o)))))
Note
disjoin and the corresponding remove-val functions are left as
exercises for the reader.
It is now possible to use a RedBlackTree instance like any other
collection–in particular, instances act like sets:
(into (->RedBlackTree nil) (range 2)) ;; -> #rbt [:black nil 0 [:red nil 1 nil]] (def to-ten (into (->RedBlackTree nil) (range 10))) (seq to-ten) ;; -> (3 1 0 2 5 4 7 6 8 9) (get to-ten 9) ;; -> 9 (contains? to-ten 9) ;; -> true (to-ten 9) ;; -> 9 (map inc to-ten) ;; -> (4 2 1 3 6 5 8 7 9 10)
Discussion
In the end, it doesn’t take a lot to participate in the sequence
abstraction. By implementing a small handful of interface functions, the
red-black tree implementation from Implementing Custom Data Structures: Red-black Trees Part I
can participate in an array of sequence-oriented functions: map, filter,
reduce, you name it.
At its essence, clojure.lang.IPersistentSet is an abstraction of what it
means to represent a mathematical set structure; this matches a tree data
structure well. A set isn’t a list or sequence, though. So how is RedBlackTree
then said to be participating in the sequence abstraction?
In Clojure, types extending the clojure.lang.ISeq interface are true
sequences, represented as a logical list of head and tail. While
IPersistentSet does not inherit from ISeq, it does share a common
ancestry with it. Both interfaces extend
clojure.lang.IPersistentCollection and its parent
clojure.lang.Seqable. As luck would have it,footnote:[Actually, as
design would have it.] sequence functions rely on collections being
Seqable, not ISeq. Since RedBlackTree can be read as a
sequence, it is Seqable and can be operated on by all of the
sequence functions you know and love.
Most of the functions in the IPersistentSet interface are self-explanatory, but some deserve further explanation. The function cons
is a historical name for constructing a new list by appending a value
to an existing list. seq is intended to produce a sequence from a
collection, or nil if empty:
IPersistentSet.java
public interface IPersistentSet extends IPersistentCollection, Counted { public IPersistentSet disjoin(Object key); public boolean contains(Object key); public Object get(Object key); }
IPersistentCollection.java
public interface IPersistentCollection extends Seqable { int count(); IPersistentCollection cons(Object o); IPersistentCollection empty(); boolean equiv(Object o); }
Seqable.java
public interface Seqable { ISeq seq(); }
The most challenging part of any Seqable implementation is actually
making a sequence out of the underlying data structure. This would be
particularly challenging if you needed to write your own lazy
tree-traversal algorithms, but luckily Clojure has a built-in function,
tree-seq, that does precisely this. By leveraging tree-seq to
produce a sequence of nodes, it is trivial to write an rb-tree->seq conversion function that lazily traverses a RedBlackTree, yielding
node values as it goes.
tree-seq accepts three arguments:
branch?: A conditional that returnstrueif a node is a branch (not a leaf node). ForRedBlackTree,sequential?is an adequate check, as every node is a vector.children: A function that returns all of the children for a given node.root: The node to begin traversal on.
Note
tree-seq performs a depth-first traversal of trees. Given how
red-black trees are represented, this will not be an ordered
traversal.
With a sequence conversion function in hand, it is easy enough to
write the seq function. Similarly, cons and empty are a breeze–simply utilize the existing tree functions. Equality testing can be a
bit more difficult, however.
For the sake of simplicity, we chose to implement equality (equiv)
between only RedBlackTree instances. Further, the implementation
compares a sorted sequence of their elements. In this case, equiv is
answering the question, “Do these trees have the same values?” and not
the question, “Are these the same trees?” It’s an important
distinction, one you’ll need to consider carefully when implementing
your own data structures.
As discussed in Determining if a Collection Holds One of Several Values,
one of the big bonuses of sets is their ability to be invoked just like any
other function. It’s easy enough to provide this ability to RedBlackTrees
too. By implementing the single-arity invoke function of the
clojure.lang.IFn interface, RedBlackTrees can be invoked like any other
function (or set, for that matter):
(some (rbt [2 3 5 7]) [6]) ;; -> nil ((rbt (range 10)) 3) ;; -> 3
Even with the full IPersistentSet interface implemented, there are
still a number of conveniences RedBlackTree is lacking. For one, you
need to use the kludgy /->RedBlackTree or RedBlackTree. functions
to create a new RedBlackTree and add values to it manually. By
convention, many built-in collections provide convenience functions
for populating them (aside from literal tags like [] or {}, of
course). It’s easy enough to mirror vec and vector for RedBlackTrees:
(defn rbt "Create a new RedBlackTree with the contents of coll." [coll] (into (->RedBlackTree nil) coll)) (defn red-black-tree "Creates a new RedBlackTree containing the args." [& args] (rbt args)) (rbt (range 3)) ;; -> #rbt [:black [:black nil 0 nil] 1 [:black nil 2 nil]] (red-black-tree 7 42) ;; -> #rbt [:black nil 7 [:red nil 42 nil]]
You may also have noticed printing is not a concern of the sequence
abstraction, although it is certainly an important consideration to
make for developing developer- and machine-friendly data structures.
There are two types of printing in Clojure: toString and pr-based
printing. The toString function is intended for printing
human-readable values at the REPL, while the pr family of functions
are meant (more or less) to be readable by the Clojure reader.
To provide our own readable representation of RBT, we must implement
print-method (the heart of pr) for the RedBlackTree type. By
writing in a “tagged literal” format (e.g., #rbt), it is possible to
configure the reader to ingest and hydrate written values as
first-class objects:
(require '[clojure.edn :as edn]) ;; Recall ... (defmethod print-method RedBlackTree [o ^java.io.Writer w] (.write w (str "#rbt " (pr-str (.tree o))))) (def rbt-string (pr-str (rbt [1 4 2]))) rbt-string ;; -> "#rbt [:black [:black nil 1 nil] 2 [:black nil 4 nil]]" (edn/read-string rbt-string) ;; -> RuntimeException No reader function for tag rbt ... (edn/read-string {:readers {'rbt ->RedBlackTree}} rbt-string) ;; -> #rbt [:black [:black nil 1 nil] 2 [:black nil 4 nil]]
See Also
- The first part of this recipe, Implementing Custom Data Structures: Red-black Trees Part I, where we define the initial red-black tree implementation
- Reading and Writing Clojure Data and Handling Unknown Tagged Literals When Reading Clojure Data for more information on reading Clojure data
Like this post? Subscribe to my newsletter.
Get fresh content on Clojure, Architecture and Software Development, each and every week.