I write a lot of reductions: loops that combine every element from a collection in some way. For example, summing a vector of integers:

(reduce (fn [sum x] (+ sum x)) 0 [1 2 3])
; => 6

If you’re not familiar with Clojure’s reduce, it takes a reducing function f, an initial accumulator init, and a collection xs. It then invokes (f init x0) where x0 is the first element in xs. f returns a new accumulator value acc1, which is then passed to (f acc1 x1) to produce a new accumulator acc2, and so on until every x in xs is folded into the accumulator. That accumulator is the return value of reduce.
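To see those intermediate accumulator values for yourself, Clojure's `reductions` runs the same fold but returns every step:

```clojure
;; reductions is like reduce, but returns each intermediate accumulator,
;; starting with init and ending with reduce's final answer.
(reductions (fn [sum x] (+ sum x)) 0 [1 2 3])
; => (0 1 3 6)
```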

In writing reductions, there are some problems that I run into over and over. For example, what if you want to find the mean of some numbers in a single pass? You need two accumulator variables–a sum and a count. The usual answer to this is to make the accumulator a vector tuple. Destructuring bind makes this… not totally awful, but a little awkward:

(reduce (fn [[sum count] x]
          [(+ sum x) (inc count)])
        [0 0]
        [1 2 3 4 5 6 7])
; => [28 7]

Ah, right. We need to divide sum by count to get the actual mean. Fine, we’ll wrap that in a let binding:

(let [[sum count] (reduce (fn [[sum count] x]
                            [(+ sum x) (inc count)])
                          [0 0]
                          [1 2 3 4 5 6 7])]
  (/ sum count))
; => 4

This is awkward for a few reasons. One is that we’ve chewed up a fair bit of indentation, and indentation is a precious commodity if you’re an 80-column masochist like me. Another is that we’ve got the accumulator structure specified in four different places: the let binding’s left hand side, the fn arguments, the fn return value(s), and the initial value. When the reducing function is large and complex, these expressions can drift far apart from one another. They may not even fit on a single screen–start juggling a half-dozen variables and it’s easy for init to get out of sync with the fn args. Not the end of the world, but a little frustrating. Then there’s the runtime overhead. Creating and tearing apart all those vectors comes with significant performance cost.

We could write this as a loop:

(loop [sum   0
       count 0 
       xs    [1 2 3 4 5 6 7]]
  (if (seq xs)
    (recur (+ sum (first xs)) (inc count) (next xs))
    (/ sum count)))
; => 4

No let binding, significantly less indentation. Brings the initial values for accumulators directly next to their names, which is nice. No vector destructuring overhead.

On the flip side, we now burn a lot of time in seq machinery: next allocates a seq wrapper for every single step. Clojure’s reduce traverses the internal structure of vectors without these wrappers, and is significantly more efficient. We also have this extra boilerplate for sequence traversal, and the traversal logic is mixed together with the reduction accumulator. When we used reduce, that traversal was implicit.

Enter Loopr

Check this out:

(require '[dom-top.core :refer [loopr]])
(loopr [sum   0
        count 0]
       [x [1 2 3 4 5 6 7]]
       (recur (+ sum x) (inc count))
       (/ sum count))
; => 4

loopr is a hybrid of loop and reduce. Like loop, it starts with a binding vector of accumulator variables and initial values. Then it takes a binding vector of iteration variables: for each x in the vector [1 2 3 4 5 6 7], it evaluates the third form–the body of the loop. Just like Clojure’s loop, that body should recur with new values for each accumulator. The fourth argument to loopr is a final form, and is evaluated with the final values of the accumulators bound. That’s the return value for the loop.

Like loop, it keeps initial values close to their names, and needs no destructuring of accumulators. Like reduce, it leaves iteration implicit–closer to a for loop. It avoids the need for wrapping the reduce return value in another destructuring let, and requires much less indentation.

Did I mention it’s faster than both the reduce and loop shown here?

(def bigvec   (->> (range 10000) vec))
(def bigarray (->> (range 10000) long-array))
(def bigseq   (->> (range 10000) (map identity)))

(loop [sum   0
       count 0
       xs    bigvec]
  (if-not (seq xs)
    [sum count]
    (let [[x & xs] xs]
      (recur (+ sum x) (inc count) xs))))
; Evaluation count : 456 in 6 samples of 76 calls.
;              Execution time mean : 1.366176 ms
;     Execution time std-deviation : 23.450717 µs
;    Execution time lower quantile : 1.334857 ms ( 2.5%)
;    Execution time upper quantile : 1.392398 ms (97.5%)
;                    Overhead used : 20.320257 ns

(reduce (fn [[sum count] x]
          [(+ sum x) (inc count)])
        [0 0]
        bigvec)
; Evaluation count : 588 in 6 samples of 98 calls.
;              Execution time mean : 1.284103 ms
;     Execution time std-deviation : 118.742660 µs
;    Execution time lower quantile : 1.106587 ms ( 2.5%)
;    Execution time upper quantile : 1.354247 ms (97.5%)
;                    Overhead used : 20.320257 ns

(loopr [sum 0, count 0]
       [x bigvec]
       (recur (+ sum x) (inc count)))
; Evaluation count : 792 in 6 samples of 132 calls.
;              Execution time mean : 793.698823 µs
;     Execution time std-deviation : 52.061355 µs
;    Execution time lower quantile : 763.412280 µs ( 2.5%)
;    Execution time upper quantile : 883.322045 µs (97.5%)
;                    Overhead used : 20.320257 ns

How is it so fast? loopr macroexpands into a loop over a mutable iterator, a loop with aget for arrays, or a reduce, depending on some heuristics about which tactic is likely to be fastest for your structure. It speeds up multi-accumulator reduce by squirreling away extra accumulators in stateful volatiles.
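Here's a rough sketch of that volatile trick, to give the flavor--this is an illustration of the idea, not loopr's literal expansion. One accumulator rides in reduce's own accumulator slot; the other lives in a volatile:

```clojure
;; Sketch only: sum is reduce's accumulator, count is stashed in a
;; stateful volatile, avoiding a vector tuple on every step.
(let [count (volatile! 0)
      sum   (reduce (fn [sum x]
                      (vswap! count inc)
                      (+ sum x))
                    0
                    [1 2 3 4 5 6 7])]
  (/ sum @count))
; => 4
```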

Multidimensional Reductions

Another problem I hit all the time: reducing over nested collections. Say we’ve got a bunch of people, each one with some pets:

(def people [{:name "zhao"
              :pets ["miette" "biscuit"]}
             {:name "chloe"
              :pets ["arthur meowington the third" "miette"]}])

And I wanted to, say, find the set of all pet names. With nested collections, we need a new reduce for each level of nesting.

(reduce (fn [pet-names person]
          (reduce (fn [pet-names pet]
                    (conj pet-names pet))
                  pet-names
                  (:pets person)))
        #{}
        people)
; => #{"biscuit" "miette" "arthur meowington the third"}

(I know you could write this with a single level of reduce via into or set/union, but we’re using this to illustrate a pattern that’s necessary for more complex reductions, especially those in 3 or 4 dimensions.)
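For the record, that single-level version with into and a transducer might look like this (redefining people so the snippet stands alone):

```clojure
(def people [{:name "zhao"  :pets ["miette" "biscuit"]}
             {:name "chloe" :pets ["arthur meowington the third" "miette"]}])

;; A mapcat transducer pours every pet name into one set in a single pass.
(into #{} (mapcat :pets) people)
; => #{"miette" "biscuit" "arthur meowington the third"}
```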

Two problems here. One is that those reduces chew up indentation real quick. Another is that we wind up specifying pet-names over and over again–threading it in and out of the inner reduce. Reduce is kind of backwards too–the things it starts with, the initial value and the collection, come last. The whole thing reads a bit inside-out.

What about loop? Any better?

(loop [pet-names    #{}
       people       people]
  (if-not (seq people)
    pet-names
    (let [[person & people] people
          pet-names (loop [pet-names pet-names
                           pets      (:pets person)]
                      (if-not (seq pets)
                        pet-names
                        (let [[pet & pets] pets]
                          (recur (conj pet-names pet)
                                 pets))))]
      (recur pet-names people))))
; => #{"biscuit" "miette" "arthur meowington the third"}

Ooof. Again we’re threading accumulators in and out of nested structures, and the loop bodies are interwoven with iteration machinery. We could fold this all into a single loop, in theory…

(loop [pet-names    #{}
       people       people
       pets         (:pets (first people))]
  (if-not (seq people)
    pet-names ; Done with outer loop
    (if-not (seq pets)
      ; Done with this person, move on to next
      (recur pet-names (next people) (:pets (first (next people))))
      (let [[pet & pets] pets]
        (recur (conj pet-names pet) people pets)))))

It is shorter, but there are so many ways to get this subtly wrong. I made at least four mistakes writing this loop, and it’s not even that complicated! More complex multi-dimensional iteration is (at least for my brain) playing on nightmare mode.

We do have a lovely, simple macro for nested iteration in Clojure: for.

(for [person people
      pet    (:pets person)]
  pet)
; => ("miette" "biscuit" "arthur meowington the third" "miette")

Problem is for returns a sequence of results, one for each iteration–and there’s no ability to carry accumulators. It’s more like map than reduce. That’s why loopr can take multiple iteration bindings. Just like for it traverses the first binding, then the second, then the third, and so on. Each binding pair has access to the currently bound values of the previous iterators.

(loopr [pet-names #{}]
       [person people
        pet    (:pets person)]
       (recur (conj pet-names pet)))

Without an explicit final expression loopr returns its sole accumulator (or a vector of accumulators, if more than one is given). We get a clear separation of accumulators, iterators, and the loop body. No nesting, much less indentation. It performs identically to the nested reduce, because it actually macroexpands to a very similar nested reduce. Both are about 40% faster than the nested loop with seqs.

Early Return

In reduce we use (reduced x) to return a value immediately; in loop you omit recur. The same works in loopr. Here, we find the first odd number in a collection, returning its index in the collection and the odd value.

(loopr [i 0]
       [x [0 3 4 5]]
       (if (odd? x)
         {:index i, :number x}
         (recur (inc i))))
; => {:index 1, :number 3}
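For comparison, here's the same early return written with plain reduce, using reduced to short-circuit:

```clojure
;; reduced wraps a value so reduce stops and returns it immediately.
(reduce (fn [i x]
          (if (odd? x)
            (reduced {:index i, :number x})
            (inc i)))
        0
        [0 3 4 5])
; => {:index 1, :number 3}
```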

With zero accumulators, loopr still iterates. This can be helpful for side effects or search. Here’s how to find a key in a map by that key’s corresponding value. Note that iterator bindings support the usual destructuring syntax–we can iterate over a map as [key value] pairs.

(loopr []
       [[k v] {:x 1, :y 2}]
       (if (= v 2)
         k
         (recur))
       :not-found)
; => :y

If we don’t return early, loopr returns the final form: :not-found.


Arrays

Sometimes you write an algorithm which reduces over vectors or other collections as a prototype, then start using arrays for speed. loopr can compile the same reduction to a loop using integer indices and aget operations. Just tell it you’d like to iterate :via :array.

(def ary (long-array (range 10000)))
(loopr [sum 0]
       [x ary :via :array]
       (recur (+ sum x)))
; => 49995000

This is on par with (reduce + ary) for single-dimensional reductions. For multi-dimensional arrays loopr is ~65% faster on my machine, though you do have to explicitly type-hint. Faster still with multiple accumulators, of course.

(def matrix (to-array-2d (repeat 1000 (range 1000))))
(loopr [sum 0]
       [row                     matrix :via :array
        x   ^"[Ljava.lang.Long;" row    :via :array]
       (recur (+ sum x)))
; => 499500000

You can control the iteration tactic for each collection separately, by the way. Here’s the average of the numbers in a vector of vectors, where we traverse the outermost vector using a reduce, and the inner vectors using a mutable iterator.

(loopr [count 0
        sum   0]
       [row [[1 2 3] [4 5 6] [7 8 9]] :via :reduce
        x   row                       :via :iterator]
       (recur (inc count) (+ sum x))
       (/ sum count))
; => 5

In Summary

I’m always hesitant to introduce nontrivial macros. That said, in the last five months I’ve written a lot of code with and without loopr, and I’ve found it frequently useful. It’s often clearer and more efficient than the code I was already writing, and it makes refactoring complex reductions less difficult. If you’d like to give it a shot, you’ll find it in dom-top, which is a small control flow library. You’ll find tons of examples and Criterium benchmarks in the test suite. I hope loopr helps you too.


Very nice! Thanks for sharing.

For the first reduce example I usually use a map instead of a tuple:

(-> (reduce (fn [acc x]
              {:sum (+ (:sum acc) x) :count (inc (:count acc))})
            {:sum 0 :count 0}
            [1 2 3 4 5 6 7])
    ((fn [r] (/ (:sum r) (:count r)))))

But it ain’t pretty. I deffo prefer loopr!

Noah Bogart on

I like this macro a lot but this line stood out to me:

Reduce is kind of backwards too–the things it starts with, the initial value and the collection, come last. The whole thing reads a bit inside-out.

(You probably already know the below, so my apologies if that’s the case)

The purpose of the collection going in the final position is twofold: consistency of API, and usage in threading macros (which then allows for leaving it off to create a transducer arity). Compare this to other languages where you have to mix functions and chained methods (Python), or the methods mutate (JavaScript), or even other Lisps that require remembering which argument the collection goes in (Common Lisp, Scheme).

By enforcing coll-last function arguments, the thread-last macro becomes the most ergonomic method of nesting reducing functions. This leads nicely into writing transducers too, as leaving out the collection can return the xform and be used directly in comp calls.
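A quick sketch of both payoffs: coll-last arguments thread cleanly with ->>, and leaving the collection off the composed steps yields a transducer for comp:

```clojure
;; coll-last arguments make ->> chains read top-to-bottom.
(->> (range 10)
     (filter odd?)
     (map #(* % %))
     (reduce +))
; => 165

;; The same steps, collection omitted, compose into a transducer.
(transduce (comp (filter odd?) (map #(* % %))) + 0 (range 10))
; => 165
```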

In Clojure, loop and for are designed to be used at the start. It makes sense to me that loopr would have the same argument positions, but I could see use in slightly different semantics that match transducers (writing an anonymous function with multiple arities to denote step action vs final action, etc).

Thanks for this macro and the rest of the library! I’m gonna be using it a lot.

Aphyr on

I do in fact know how ->> works and why reduce is shaped this way. You’ll find both in the macro itself, actually. :-)

To your point that one can use ->> to improve readability of nested reductions: it helps somewhat, but not entirely, and it comes with its own costs. ->> can move the collection to the beginning, but leaves the init at the end. You still have to jump back and forth around the fn body in order to thread the accumulator state into and out of nested reduces. It also chews up another 2-5 spaces of indentation per reduce, depending on style.

(reduce (fn [pet-names person]
          ; Thanks to ->>, the starting collection is now close to its point of
          ; definition
          (->> (:pets person)
               (reduce (fn [pet-names pet]
;         ^^^^^          ; Some
; Extra spaces!          ; long
                         ; complex
                         ; inner
                         ; loop
                         ; body
                         (conj pet-names pet))
                       ; But the accumulator is still far from its definition
                       pet-names)))
        #{}
        people)
Not the end of the world, but I’ve still found this a little annoying, and it’s been the source of several bugs in my own code–I’d update an accumulator definition by, say, reordering vector elements in the fn args and return vals but miss the init, and it’d silently fail/corrupt state in some weird way.

I could see use in slightly different semantics that match transducers (writing an anonymous function with multiple arities to denote step action vs final action, etc).

I considered this, but it doesn’t really make sense with loopr’s multi-dimensional traversal, and I think binding the collection with let then using loopr over it is generally a reasonable alternative to a special-case loopr-with-single-collection-for-threading-macros. Hasn’t come up enough in my own use to bother me.

anonymous on

Why is it called dom top 😭

anonymous on

Unorthodox 😭



I’ve often used transducers for their completing step when calculating things like averages. Maybe something like:

(transduce (map inc)
           (completing (fn [[sum cnt] x] [(+ sum x) (inc cnt)])
                       (fn [[sum cnt]] (/ sum cnt)))
           [0 0]
           (range 7))
Sam Griffith on

I was wondering if you’d seen this library. It has some great features similar to what you are doing.


Nice article. Thanks for sharing what you learned and the macro you created.

Coby on


f returns a new accumulator value x1, which is then passed to (f accumulator x1)

Seems not quite accurate to say it returns x1, since x1 is just the next value, not the accumulator itself. Language is hard. :)

Aphyr on

Seems not quite accurate to say it returns x1, since x1 is just the next value, not the accumulator itself. Language is hard. :)

Sigh, this is what I get for trying to finish a post at 1 AM. Fixed.

Aphyr on

I was wondering if you’d seen this library. It has some great features similar to what you are doing.

Oh no! That’s extremely neat! Thanks for sharing. :-)

Axman6 on

This reminds me a lot of the Haskell foldl library. The core data type is:

data Fold a b = forall s. Fold (a -> s -> s) s (s -> b)

which is three things - a function for combining some input with the current state, the current state, and a function to produce the final result from the state.

For simple folds, the code is trivial:

sum :: Num a => Fold a a
sum = Fold (+) 0 id

genericLength :: Num b => Fold x b
genericLength = Fold (\x n -> n + 1) (0::Int) fromIntegral

It’s pretty obvious that a Fold is a Functor:

instance Functor (Fold x) where
  fmap f (Fold step st done) = (Fold step st (f . done))

What might be a bit surprising is that it is also an Applicative Functor:

instance Applicative (Fold x) where
  pure a = Fold (\_ _ -> ()) () (const a)
  (Fold stepf str donef) <*> (Fold stepa sta donea) =
    Fold (\x (sf,sa) -> (stepf x sf, stepa x sa))
         (str, sta)
         (\(sf,sa) -> donef sf (donea sa))

Because of this, it also means we can give Folds some really useful other instances, like Num and Fractional:

instance Num b => Num (Fold a b) where
  fromInteger a = pure (fromInteger a)
  (+) = liftA2 (+)

instance Fractional b => Fractional (Fold a b) where
  (/) = liftA2 (/)

This means that we can now write equations using folds - the classic example this is building up to is average:

average :: Fractional a => Fold a a
average = sum / genericLength

which operates in constant space and tends to produce quite efficient code, as there’s no recursion so each fold can be inlined into the definition of average.

We can also map over the input to the fold

-- If I have a fold that can accept b's and a function to convert a's into b's
-- I can make a fold which can accept a's.
premap :: (a -> b) -> Fold b r -> Fold a r
premap f (Fold step s done) = Fold (step . f) s done

which allows you to extract values from a larger structure

sumAges :: Fold Person Int
sumAges = premap age sum

averageAgeAndYearsAtCompany :: Fold EmployeeRecord (Double,Double)
averageAgeAndYearsAtCompany = 
    (,) <$> premap (fromIntegral . age . employee) average 
        <*> premap (yearsService . employmentHistory) average

How do we use these? using the fold function:

fold :: Foldable f => Fold a b -> f a -> b

which is agnostic of what sort of container it is operating over, or you can trivially write functions that take a fold and stream data in from any other source; all you need to be able to do is provide elements one at a time.

There are many other useful functions which allow you to deal with more complex inputs, and produce more complex outputs, such as

-- Only process a given input if the predicate holds
prefilter :: (a -> Bool) -> Fold a r -> Fold a r
-- Don't process elements until you find a successful item
predropWhile :: (a -> Bool) -> Fold a r -> Fold a r
map :: Ord a => Fold (a, b) (Map a b)
foldByKeyMap :: forall k a b. Ord k => Fold a b -> Fold (k, a) (Map k b)
-- Perform a Fold while grouping the data according to a specified group projection function. Returns the folded result grouped as a map keyed by the group.
groupBy :: Ord g => (a -> g) -> Fold a r -> Fold a (Map g r)
-- generic mapping of the input to zero or more values to be processed using lens-y things:
handles :: Handler a b -> Fold b r -> Fold a r
handles _1       :: Fold a r -> Fold (a, b) r
handles _Left    :: Fold a r -> Fold (Either a b) r
handles traverse :: Traversable t => Fold a r -> Fold (t a) r
handles folded   :: Foldable    t => Fold a r -> Fold (t a) r
Aphyr on

…which is agnostic of what sort of container it is operating over, or you can trivially write functions that take a fold and stream data in from any other source; all you need to be able to do is provide elements one at a time.

Yeah, that’s very cool! This kind of stream fusion & multi-query optimization is something that the Clojure compiler and JVM JIT can’t (at least to my knowledge) do automatically–over in Clojure-land we have to be more explicit about the composition. I’ve actually built a couple libraries to do just that: tesser and jepsen.history.fold. They’re both fairly heavyweight though, and loopr serves a slightly different niche. :-)

hashim on

Neat! This seems very similar to for/fold and for*/fold in Racket. I always find the amount of things Racket has to be surprising given how few people work on it.
