# Loopr: A Loop/Reduction Macro for Clojure

I write a lot of reductions: loops that combine every element from a collection in some way. For example, summing a vector of integers:

``````(reduce (fn [sum x] (+ sum x)) 0 [1 2 3])
; => 6
``````

If you’re not familiar with Clojure’s `reduce`, it takes a reducing function `f`, an initial accumulator `init`, and a collection `xs`. It then invokes `(f init x0)` where `x0` is the first element in `xs`. `f` returns a new accumulator value `acc1`, which is then passed to `(f acc1 x1)` to produce a new accumulator `acc2`, and so on until every `x` in `xs` is folded into the accumulator. That accumulator is the return value of `reduce`.

In writing reductions, there are some problems that I run into over and over. For example, what if you want to find the mean of some numbers in a single pass? You need two accumulator variables–a sum and a count. The usual answer to this is to make the accumulator a vector tuple. Destructuring bind makes this… not totally awful, but a little awkward:

``````(reduce (fn [[sum count] x]
[(+ sum x) (inc count)])
[0 0]
[1 2 3 4 5 6 7])
; => [28 7]
``````

Ah, right. We need to divide `sum` by `count` to get the actual mean. Fine, we’ll wrap that in a `let` binding:

``````(let [[sum count] (reduce (fn [[sum count] x]
[(+ sum x) (inc count)])
[0 0]
[1 2 3 4 5 6 7])]
(/ sum count))
; => 4
``````

This is awkward for a few reasons. One is that we’ve chewed up a fair bit of indentation, and indentation is a precious commodity if you’re an 80-column masochist like me. Another is that we’ve got the accumulator structure specified in four different places: the `let` binding’s left hand side, the `fn` arguments, the `fn` return value(s), and the initial value. When the reducing function is large and complex, these expressions can drift far apart from one another. They may not even fit on a single screen–start juggling a half-dozen variables and it’s easy for `init` to get out of sync with the `fn` args. Not the end of the world, but a little frustrating. Then there’s the runtime overhead. Creating and tearing apart all those vectors comes with significant performance cost.

We could write this as a `loop`:

``````(loop [sum   0
count 0
xs    [1 2 3 4 5 6 7]]
(if (seq xs)
(recur (+ sum (first xs)) (inc count) (next xs))
(/ sum count)))
; => 4
``````

No let binding, significantly less indentation. Brings the initial values for accumulators directly next to their names, which is nice. No vector destructuring overhead.

On the flip side, we now burn a lot of time in seq machinery: `next` allocates a seq wrapper for every single step. Clojure’s `reduce` traverses the internal structure of vectors without these wrappers, and is significantly more efficient. We also have this extra boilerplate for sequence traversal, and the traversal logic is mixed together with the reduction accumulator. When we used `reduce`, that traversal was implicit.

## Enter Loopr

Check this out:

``````(require '[dom-top.core :refer [loopr]])
(loopr [sum   0
count 0]
[x [1 2 3 4 5 6 7]]
(recur (+ sum x) (inc count))
(/ sum count))
; => 4
``````

`loopr` is a hybrid of loop and reduce. Like loop, it starts with a binding vector of accumulator variables and initial values. Then it takes a binding vector of iteration variables: for each `x` in the vector `[1 2 3 4 5 6 7]`, it evaluates the third form–the body of the loop. Just like Clojure’s `loop`, that body should `recur` with new values for each accumulator. The fourth argument to `loopr` is a final form, and is evaluated with the final values of the accumulators bound. That’s the return value for the loop.

Like `loop`, it keeps initial values close to their names, and needs no destructuring of accumulators. Like `reduce`, it leaves iteration implicit–closer to a `for` loop. It avoids the need for wrapping the `reduce` return value in another destructuring `let`, and requires much less indentation.

Did I mention it’s faster than both the `reduce` and `loop` shown here?

``````(def bigvec   (->> (range 10000) vec))
(def bigarray (->> (range 10000) long-array))
(def bigseq   (->> (range 10000) (map identity)))

(quick-bench
(loop [sum   0
count 0
xs    bigvec]
(if-not (seq xs)
[sum count]
(let [[x & xs] xs]
(recur (+ sum x) (inc count) xs)))))
; Evaluation count : 456 in 6 samples of 76 calls.
;              Execution time mean : 1.366176 ms
;     Execution time std-deviation : 23.450717 µs
;    Execution time lower quantile : 1.334857 ms ( 2.5%)
;    Execution time upper quantile : 1.392398 ms (97.5%)
;                    Overhead used : 20.320257 ns

(quick-bench
(reduce (fn [[sum count] x]
[(+ sum x) (inc count)])
[0 0]
bigvec))
; Evaluation count : 588 in 6 samples of 98 calls.
;              Execution time mean : 1.284103 ms
;     Execution time std-deviation : 118.742660 µs
;    Execution time lower quantile : 1.106587 ms ( 2.5%)
;    Execution time upper quantile : 1.354247 ms (97.5%)
;                    Overhead used : 20.320257 ns

(quick-bench
(loopr [sum 0, count 0]
[x bigvec]
(recur (+ sum x) (inc count))))
; Evaluation count : 792 in 6 samples of 132 calls.
;              Execution time mean : 793.698823 µs
;     Execution time std-deviation : 52.061355 µs
;    Execution time lower quantile : 763.412280 µs ( 2.5%)
;    Execution time upper quantile : 883.322045 µs (97.5%)
;                    Overhead used : 20.320257 ns
``````

How so fast? `loopr` macroexpands into `loop` over a mutable iterator, loop with `aget` for arrays, or `reduce`, depending on some heuristics about which tactic is likely to be fastest for your structure. It speeds up multi-accumulator `reduce` by squirreling away extra accumulators in stateful volatiles.

## Multidimensional Reductions

Another problem I hit all the time: reducing over nested collections. Say we’ve got a bunch of people, each one with some pets:

``````(def people [{:name "zhao"
:pets ["miette" "biscuit"]}
{:name "chloe"
:pets ["arthur meowington the third" "miette"]}])
``````

And I wanted to, say, find the set of all pet names. With nested collections, we need a new reduce for each level of nesting.

``````(reduce (fn [pet-names person]
(reduce (fn [pet-names pet]
(conj pet-names pet))
pet-names
(:pets person)))
#{}
people)
; => #{"biscuit" "miette" "arthur meowington the third"}
``````

(I know you could write this with a single level of `reduce` via `into` or `set/union`, but we’re using this to illustrate a pattern that’s necessary for more complex reductions, especially those in 3 or 4 dimensions.)

Two problems here. One is that those `reduce`s chew up indentation real quick. Another is that we wind up specifying `pet-names` over and over again–threading it in and out of the inner reduce. Reduce is kind of backwards too–the things it starts with, the initial value and the collection, come last. The whole thing reads a bit inside-out.

What about `loop`? Any better?

``````(loop [pet-names    #{}
people       people]
(if-not (seq people)
pet-names
(let [[person & people] people
pet-names (loop [pet-names pet-names
pets      (:pets person)]
(if-not (seq pets)
pet-names
(let [pet (first pets)]
(recur (conj pet-names pet)
(next pets)))))]
(recur pet-names people))))
; => #{"biscuit" "miette" "arthur meowington the third"}
``````

Ooof. Again we’re threading accumulators in and out of nested structures, and the loop bodies are interwoven with iteration machinery We could fold this all into a single loop, in theory…

``````(loop [pet-names    #{}
people       people
pets         (:pets (first people))]
(if-not (seq people)
pet-names ; Done with outer loop
(if-not (seq pets)
; Done with this person, move on to next
(recur pet-names (next people) (:pets (first (next people))))
(let [[pet & pets] pets]
(recur (conj pet-names pet) people pets)))))
``````

It is shorter, but there are so many ways to get this subtly wrong. I made at least four mistakes writing this loop, and it’s not even that complicated! More complex multi-dimensional iteration is (at least for my brain) playing on nightmare mode.

We do have a lovely, simple macro for nested iteration in Clojure: `for`.

``````(for [person people
pet    (:pets person)]
pet)
; => ("miette" "biscuit" "arthur meowington the third" "miette")
``````

Problem is `for` returns a sequence of results, one for each iteration–and there’s no ability to carry accumulators. It’s more like `map` than `reduce`. That’s why `loopr` can take multiple iteration bindings. Just like `for` it traverses the first binding, then the second, then the third, and so on. Each binding pair has access to the currently bound values of the previous iterators.

``````(loopr [pet-names #{}]
[person people
pet    (:pets person)]
(recur (conj pet-names pet)))
``````

Without an explicit final expression `loopr` returns its sole accumulator (or a vector of accumulators, if more than one is given). We get a clear separation of accumulators, iterators, and the loop body. No nesting, much less indentation. It performs identically to the nested `reduce`, because it actually macroexpands to a very similar nested `reduce`. Both are about 40% faster than the nested `loop` with seqs.

## Early Return

In `reduce` we use `(reduced x)` to return a value immediately; in `loop` you omit `recur`. The same works in `loopr`. Here, we find the first odd number in a collection, returning its index in the collection and the odd value.

``````(loopr [i 0]
[x [0 3 4 5]]
(if (odd? x)
{:index i, :number x}
(recur (inc i))))
; => {:index 1, :number 3}
``````

With zero accumulators, loopr still iterates. This can be helpful for side effects or search. Here’s how to find a key in a map by that key’s corresponding value. Note that iterator bindings support the usual destructuring syntax–we can iterate over a map as `[key value]` pairs.

``````(loopr []
[[k v] {:x 1, :y 2}]
(if (= v 2)
k
(recur))
:not-found)
; => :y
``````

If we don’t return early, `loopr` returns the final form: `:not-found`.

## Arrays

Sometimes you write an algorithm which reduces over vectors or other collections as a prototype, then start using arrays for speed. `loopr` can compile the same reduction to a `loop` using integer indices and `aget` operations. Just tell it you’d like to iterate `:via :array`.

``````(def ary (long-array (range 10000)))
(loopr [sum 0]
[x ary :via :array]
(recur (+ sum x)))
; => 49995000
``````

This is on par with `(reduce + ary)` for single-dimensional reductions. For multi-dimensional arrays `loopr` is ~65% faster on my machine, though you do have to explicitly type-hint. Faster still with multiple accumulators, of course.

``````(def matrix (to-array-2d (repeat 1000 (range 1000))))
(loopr [sum 0]
[row                     matrix :via :array
x   ^"[Ljava.lang.Long;" row    :via :array]
(recur (+ sum x)))
; => 499500000
``````

You can control the iteration tactic for each collection separately, by the way. Here’s the average of the numbers in a vector of vectors, where we traverse the outermost vector using a `reduce`, and the inner vectors using a mutable iterator.

``````(loopr [count 0
sum   0]
[row [[1 2 3] [4 5 6] [7 8 9]] :via :reduce
x   row                       :via :iterator]
(recur (inc count) (+ sum x))
(/ sum count))
; => 5
``````

## In Summary

I’m always hesitant to introduce nontrivial macros. That said, in the last five months I’ve written a lot of code with and without `loopr`, and I’ve found it frequently useful. It’s often clearer and more efficient than the code I was already writing, and it makes refactoring complex reductions less difficult. If you’d like to give it a shot, you’ll find it in dom-top, which is a small control flow library. You’ll find tons of examples and Criterium benchmarks in the test suite. I hope `loopr` helps you too. Very nice! Thanks for sharing.

For the first `reduce` example I usually use a map instead of a tuple:

``````(-> (reduce (fn [acc x]
{:sum (+ (:sum acc) x) :count (inc (:count acc))})
{:sum 0 :count 0}
[1 2 3 4 5 6 7])
((fn [r] (/ (:sum r) (:count r)))))
``````

But it aint pretty. I deffo prefer `loopr`! Noah Bogart on

I like this macro a lot but this line stood out to me:

Reduce is kind of backwards too–the things it starts with, the initial value and the collection, come last. The whole thing reads a bit inside-out.

(You probably already know the below, so my apologies if that’s the case)

The purpose of the collection going the final position is twofold: consistency of api and usage in threading macros (which then allows for leaving it off to create a transducer arity). Compare this to other languages where you have to mix functions and chained methods (python) or the methods mutate (javascript) or even other lisps that require remembering which argument the collection goes in (common lisp, scheme).

By enforcing coll-last function arguments, the thread-last macro becomes the most ergonomic method of nesting reducing functions. This leads nicely into writing transducers too, as leaving out the collection can return the xform and be used directly in `comp` calls.

In Clojure, `loop` and `for` are designed to be used at the start. It makes sense to me that `loopr` would have the same argument positions, but I could see use in slightly different semantics that match transducers (writing an anonymous function with multiple arities to denote step action vs final action, etc).

Thanks for this macro and the rest of the library! I’m gonna be using it a lot.

Aphyr on

I do in fact know how `->>` works and why `reduce` is shaped this way. You’ll find both in the macro itself, actually. :-)

To your point that one can use `->>` to improve readability of nested reductions: it helps somewhat, but not entirely, and it comes with its own costs. `->>` can move the collection to the beginning, but leaves the `init` at the end. You still have to jump back and forth around the fn body in order to thread the accumulator state into and out of nested reduces. It also chews up another 2-5 spaces of indentation per reduce, depending on style.

``````(reduce (fn [pet-names person]
; Thanks to ->>, the starting collection is now close to its point of
; definition
(->> (:pets person)
(reduce (fn [pet-names pet]
;         ^^^^^          ; Some
; Extra spaces!          ; long
; complex
; inner
; loop
; body
(conj pet-names pet))
; But the accumulator is still far from its definition
pet-names))
#{}
people))
``````

Not the end of the world, but I’ve still found this a little annoying, and it’s been the source of several bugs in my own code–I’d update an accumulator definition by, say, reordering vector elements in the `fn` args and return vals but miss the `init`, and it’d silently fail/corrupt state in some weird way.

I could see use in slightly different semantics that match transducers (writing an anonymous function with multiple arities to denote step action vs final action, etc).

I considered this, but it doesn’t really make sense with `loopr`’s multi-dimensional traversal, and I think binding the collection with `let` then using `loopr` over it is generally a reasonable alternative to a special-case `loopr-with-single-collection-for-threading-macros`. Hasn’t come up enough in my own use to bother me.  anonymous on

Why is it called dom top 😭 anonymous on

Unorthodox 😭

Kees I’ve often used transducers for their completing step when calculating things like averages. Maybe something like:

``````  (transduce (map inc)
(completing (fn [[sum cnt] x] [(+ sum x) (inc cnt)])
(fn [[sum cnt]] (/ sum cnt)))
[0 0]
(range 7))
`````` Sam Griffith on

I was wondering if you’d seen this library. It has some great features similar to what you are doing.

https://github.com/nathell/clj-iter

Nice article. Thanks for sharing what you learned and the macro you created. Coby on

Nit:

f returns a new accumulator value x1, which is then passed to (f accumulator x1)

Seems not quite accurate to say it returns `x1`, since `x1` is just the next value, not the accumulator itself. Language is hard. :)

Aphyr on

Seems not quite accurate to say it returns x1, since x1 is just the next value, not the accumulator itself. Language is hard. :)

Sigh, this is what I get for trying to finish a post at 1 AM. Fixed. Aphyr on

I was wondering if you’d seen this library. It has some great features similar to what you are doing.

Oh no! That’s extremely neat! Thanks for sharing. :-)  Axman6 on

This reminds me a lot of the Haskell foldl library. The core data type is:

``````data Fold a b = forall s. Fold (a -> s -> s) s (s -> b)
``````

which is three things - a function for combining some input with the current state, the current state, and a function to produce the final result from the state.

For simple folds, the code is trivial:

``````sum :: Num a => Fold a a
sum = Fold (+) 0 id

genericLength :: Num a => Fold x b
genericLength = Fold (\x n -> n + 1) (0::Int) fromIntegral
``````

It’s pretty obvious that a `Fold` is a `Functor`:

``````instance Functor (Fold x) where
fmap f (Fold step st done) = (Fold step st (f . done))
``````

What might be a bit surprising is that it is also an Applicative Functor:

``````instance Applicative (Fold x) where
pure a = Fold (\_ _ -> ()) () (const a)
(Fold stepf str donef) <*> (Fold stepa sta donea) =
Fold
(\x (sf,sa) -> (stepf x sf, stepa x sa))
(stf,sta)
(\(sf,sa) -> donef sf (donea sa))
``````

Because of this, it also means we can give Folds some really useful other instances, like `Num` and `Fractional`:

``````instance Num b => Num (Fold a b) where
fromInteger a = pure (fromInteger a)
(+) = liftA2 (+)
...

instance Fractional b => Fractional (Fold a b) where
(/) = liftA2 (/)
...
``````

This means that we can now write equations using folds - the classic example this is beulding up to is average:

``````average :: Fractional a => Fold a a
average = sum / genericLength
``````

which operates in constant space and tends to produce quite efficient code, as there’s no recursion so each fold can be inlined into the definition of `average`.

We can also map over the input to the fold

``````-- If I have a fold that can accept be's and a function to convert a's into b's
-- I can make a fold which can accept a's.
premap :: (a -> b) -> Fold b r -> Fold a r
premap f (Fold step s done) = Fold (step . f) s done
``````

which allows you to extract values from a larger structure

``````sumAges :: Fold Person Int
sumAges = premap age sum

averageAgeAndYearsAtCompany :: Fold EmployeeRecord (Double,Double)
averageAgeAndYearsAtCompany =
(,) <\$> premap (fromIntegral . age . employee) average
<*> premap (yearsService . employmentHistory) average
``````

How do we use these? using the `fold` function:

``````fold :: Foldable f => Fold a b -> f a -> b
``````

which is ignostic of what sort of container it is operation over, or you can trivially write functions that take a fold and stream data in for any other source; all you need to be able to do is provide elements one at a time.

There are many other useful functions which allow you to deal with more complex inputs, and produce more complex outputs, such as

``````-- Only process a given input if the predicate holds
prefilter :: (a -> Bool) -> Fold a r -> Fold a r
-- Don't process elements until you find a successful item
predropWhile :: (a -> Bool) -> Fold a r -> Fold a r
map :: Ord a => Fold (a, b) (Map a b)
foldByKeyMap :: forall k a b. Ord k => Fold a b -> Fold (k, a) (Map k b)
-- Perform a Fold while grouping the data according to a specified group projection function. Returns the folded result grouped as a map keyed by the group.
groupBy :: Ord g => (a -> g) -> Fold a r -> Fold a (Map g r)
-- generic mapping of the input to zero or more values to be processed using lens-y things:
handles :: Handler a b -> Fold b r -> Fold a r
handles _1       :: Fold a r -> Fold (a, b) r
handles _Left    :: Fold a r -> Fold (Either a b) r
handles traverse :: Traversable t => Fold a r -> Fold (t a) r
handles folded   :: Foldable    t => Fold a r -> Fold (t a) r
``````
Aphyr on

…which is ignostic of what sort of container it is operation over, or you can trivially write functions that take a fold and stream data in for any other source; all you need to be able to do is provide elements one at a time.

Yeah, that’s very cool! This kind of stream fusion & multi-query optimization is something that the Clojure compiler and JVM JIT can’t (at least to my knowledge) do automatically–over in Clojure-land we have to be more explicit about the the composition. I’ve actually built a couple libraries to do just that: tesser and jepsen.history.fold. They’re both fairly heavyweight though, and loopr serves a slightly different niche. :-)  hashim on

Neat! This seems very similar to for/fold and for*/fold in Racket I always find the amount of things Racket has to be surprising given how few people work on it.

Comments are moderated. Links have `nofollow`. Seriously, spammers, give it a rest.
Supports Github-flavored Markdown, including `[links](http://foo.com/)`, `*emphasis*`, `_underline_`, ``code``, and `> blockquotes`. Use ````clj` on its own line to start an (e.g.) Clojure code block, and ````` to end the block.