chapter thirteen

13 Sets

 

A set is a data structure that contains distinct and unordered items. Sets are widely used in computer science and have a strong connection with mathematical set theory. Clojure offers two types of sets with their corresponding initializer functions:

  • hash-set is the most used, offering fast lookup, specific syntax literal #{} and a transient version. hash-set is implemented on top clojure.lang.PersistentHashSet which in turn is a thin layer on top of clojure.lang.PersistentHashMap, the same of hash-map. Both hash-set and hash-map are instances of Hash Array Mapped Trie [203]. hash-set offers near constant time lookup (at O(log32N)), addition and removal of items.
  • sorted-set also guarantees uniqueness of items but additionally maintains ordering based on a comparator. It is based on Red Black trees (see subseq call-out section) and offers a well balanced logarithmic access (at O(log2N)).

Both set types can be used as functions (especially as predicates) to verify the presence of items in a concise way:

((sorted-set 5 3 1) 1) ; #1
;; 1

(some #{1 2 3} [0 4 6 8 1]) ; #2
;; 1

The clojure.core namespace contains some basic functions to create (hash-set, set, sorted-set) and remove (disj) elements from a set. conj and into also works on sets (along with many other collection types). Functions like union, intersection or difference are available in the clojure.set namespace instead.

13.1  hash-set

13.2  set

13.3  sorted-set and sorted-set-by

13.4  disj

13.5  union, difference and intersection

13.6  subset? and superset?

13.7  select, index, rename, join and project