Clojure Cookbook: Custom Data Structures

  • clojure
  • cookbook

This recipe from Clojure Cookbook is one of my all-time favorites. Contributor (and all-around awesome dude) Leonardo Borges submitted a two-part series on building custom data structures in Clojure, using core.match no less. I’d never really dug into the Clojure sequence abstraction before, but this recipe prompted me to dig in further than I’d ever gone before.

While Custom Data Structures is a two-part series, what I really want to showcase is Clojure’s sequence abstractions, which are featured in Part II. To catch up on the details of implementing a Red-black tree, visit Part I. Or hey, buy the book.

You’ll probably notice this recipe is credited to both Leonardo and myself. The reason for this is that I substantially changed the implementation and explanation based on feedback from my colleagues at Cognitect.

This isn’t to diminish Leonardo’s accomplishments, we just don’t want to put words in other people’s mouths.

Implementing Custom Data Structures: Red-Black Trees–Part II

by Ryan Neufeld; originally submitted by Leonardo Borges

Problem

You want to use Clojure’s core sequence functions (conj, map, filter, etc.) with your custom data structure.

Solution

In part one of this recipe (Implementing Custom Data Structures: Red-black Trees Part I), you implemented all the functions necessary for creating an efficient red-black tree.

What’s missing is participation in Clojure’s sequence abstraction.

The most important part of participating in sequence abstraction is the ability to expose values of a data structure sequentially. The built-in tree-seq is well suited for this task. One extra step is needed, however; tree-seq returns a sequence of nodes, not values.

Here’s the final rb-tree->seq function:

(defn- rb-tree->tree-seq
  "Return a seq of all nodes in an red-black tree."
  [rb-tree]
  (tree-seq sequential? (fn [[_ left _ right]]
                          (remove nil? [left right]))
            rb-tree))

(defn rb-tree->seq
  "Convert a red-black tree to a seq of its values."
  [rb-tree]
  (map (fn [[_ _ val _]] val) (rb-tree->tree-seq rb-tree)))

(rb-tree->seq (-> nil
                  (insert-val 5)
                  (insert-val 2)))
;; -> (5 2)

Since RBTs most closely resemble sets, they should adhere well to the IPersistentSet interface. Extend the IPersistentSet and IFn protocols to a new RedBlackTree type, implementing all of the necessary functions. It’s also wise to implement the multimethod print-method for RedBlackTree, as the default implementation will fail for RedBlackTree as implemented:

(deftype RedBlackTree [tree]
  clojure.lang.IPersistentSet
  (cons [self v] (RedBlackTree. (insert-val tree v)))
  (empty [self] (RedBlackTree. nil))
  (equiv [self o] (if (instance? RedBlackTree o)
                    (= tree (.tree o))
                    false))
  (seq [this] (if tree (rb-tree->seq tree)))
  (get [this n] (find-val tree n))
  (contains [this n] (boolean (get this n)))
  ;; (disjoin [this n] ...) ;; Omitted due to complexity
  clojure.lang.IFn
  (invoke [this n] (get this n))
  Object
  (toString [this] (pr-str this)))

(defmethod print-method RedBlackTree [o ^java.io.Writer w]
  (.write w (str "#rbt " (pr-str (.tree o)))))

Note

disjoin and the corresponding remove-val functions are left as exercises for the reader.

It is now possible to use a RedBlackTree instance like any other collection–in particular, instances act like sets:

(into (->RedBlackTree nil) (range 2))
;; -> #rbt [:black nil 0 [:red nil 1 nil]]

(def to-ten (into (->RedBlackTree nil) (range 10)))

(seq to-ten)
;; -> (3 1 0 2 5 4 7 6 8 9)

(get to-ten 9)
;; -> 9

(contains? to-ten 9)
;; -> true

(to-ten 9)
;; -> 9

(map inc to-ten)
;; -> (4 2 1 3 6 5 8 7 9 10)

Discussion

In the end, it doesn’t take a lot to participate in the sequence abstraction. By implementing a small handful of interface functions, the red-black tree implementation from Implementing Custom Data Structures: Red-black Trees Part I can participate in an array of sequence-oriented functions: map, filter, reduce, you name it.

At its essence, clojure.lang.IPersistentSet is an abstraction of what it means to represent a mathematical set structure; this matches a tree data structure well. A set isn’t a list or sequence, though. So how is RedBlackTree then said to be participating in the sequence abstraction?

In Clojure, types extending the clojure.lang.ISeq interface are true sequences, represented as a logical list of head and tail. While IPersistentSet does not inherit from ISeq, it does share a common ancestry with it. Both interfaces extend clojure.lang.IPersistentCollection and its parent clojure.lang.Seqable. As luck would have it,footnote:[Actually, as design would have it.] sequence functions rely on collections being Seqable, not ISeq. Since RedBlackTree can be read as a sequence, it is Seqable and can be operated on by all of the sequence functions you know and love.

Most of the functions in the IPersistentSet interface are self-explanatory, but some deserve further explanation. The function cons is a historical name for constructing a new list by appending a value to an existing list. seq is intended to produce a sequence from a collection, or nil if empty:

IPersistentSet.java

public interface IPersistentSet extends IPersistentCollection, Counted {
  public IPersistentSet disjoin(Object key);
  public boolean contains(Object key);
  public Object get(Object key);
}

IPersistentCollection.java

public interface IPersistentCollection extends Seqable {
  int count();
  IPersistentCollection cons(Object o);
  IPersistentCollection empty();
  boolean equiv(Object o);
}

Seqable.java

public interface Seqable {
  ISeq seq();
}

The most challenging part of any Seqable implementation is actually making a sequence out of the underlying data structure. This would be particularly challenging if you needed to write your own lazy tree-traversal algorithms, but luckily Clojure has a built-in function, tree-seq, that does precisely this. By leveraging tree-seq to produce a sequence of nodes, it is trivial to write an rb-tree->seq conversion function that lazily traverses a RedBlackTree, yielding node values as it goes.

tree-seq accepts three arguments:

  • branch?: A conditional that returns true if a node is a branch (not a leaf node). For RedBlackTree, sequential? is an adequate check, as every node is a vector.

  • children: A function that returns all of the children for a given node.

  • root: The node to begin traversal on.

Note

tree-seq performs a depth-first traversal of trees. Given how red-black trees are represented, this will not be an ordered traversal.

With a sequence conversion function in hand, it is easy enough to write the seq function. Similarly, cons and empty are a breeze–simply utilize the existing tree functions. Equality testing can be a bit more difficult, however.

For the sake of simplicity, we chose to implement equality (equiv) between only RedBlackTree instances. Further, the implementation compares a sorted sequence of their elements. In this case, equiv is answering the question, “Do these trees have the same values?” and not the question, “Are these the same trees?” It’s an important distinction, one you’ll need to consider carefully when implementing your own data structures.

As discussed in Determining if a Collection Holds One of Several Values, one of the big bonuses of sets is their ability to be invoked just like any other function. It’s easy enough to provide this ability to RedBlackTrees too. By implementing the single-arity invoke function of the clojure.lang.IFn interface, RedBlackTrees can be invoked like any other function (or set, for that matter):

(some (rbt [2 3 5 7]) [6])
;; -> nil

((rbt (range 10)) 3)
;; -> 3

Even with the full IPersistentSet interface implemented, there are still a number of conveniences RedBlackTree is lacking. For one, you need to use the kludgy /->RedBlackTree or RedBlackTree. functions to create a new RedBlackTree and add values to it manually. By convention, many built-in collections provide convenience functions for populating them (aside from literal tags like [] or {}, of course). It’s easy enough to mirror vec and vector for RedBlackTrees:

(defn rbt
 "Create a new RedBlackTree with the contents of coll."
 [coll]
 (into (->RedBlackTree nil) coll))

(defn red-black-tree
  "Creates a new RedBlackTree containing the args."
  [& args]
  (rbt args))

(rbt (range 3))
;; -> #rbt [:black [:black nil 0 nil] 1 [:black nil 2 nil]]

(red-black-tree 7 42)
;; -> #rbt [:black nil 7 [:red nil 42 nil]]

You may also have noticed printing is not a concern of the sequence abstraction, although it is certainly an important consideration to make for developing developer- and machine-friendly data structures. There are two types of printing in Clojure: toString and pr-based printing. The toString function is intended for printing human-readable values at the REPL, while the pr family of functions are meant (more or less) to be readable by the Clojure reader.

To provide our own readable representation of RBT, we must implement print-method (the heart of pr) for the RedBlackTree type. By writing in a “tagged literal” format (e.g., #rbt), it is possible to configure the reader to ingest and hydrate written values as first-class objects:

(require '[clojure.edn :as edn])

;; Recall ...
(defmethod print-method RedBlackTree [o ^java.io.Writer w]
  (.write w (str "#rbt " (pr-str (.tree o)))))

(def rbt-string (pr-str (rbt [1 4 2])))
rbt-string
;; -> "#rbt [:black [:black nil 1 nil] 2 [:black nil 4 nil]]"

(edn/read-string rbt-string)
;; -> RuntimeException No reader function for tag rbt ...

(edn/read-string {:readers {'rbt ->RedBlackTree}}
                 rbt-string)
;; -> #rbt [:black [:black nil 1 nil] 2 [:black nil 4 nil]]

See Also

Like this post? Subscribe to my newsletter.

Get fresh content on Clojure, Architecture and Software Development, each and every week.

comments powered by Disqus