So there’s a blog post that advises every method should, when possible, return self. I’d like to suggest you do the opposite: wherever possible, return something *other* than `self`.

## Mutation is hard

Mutation makes code harder to reason about. Mutable objects make equality comparisons tricky: if you use a mutable object as the key in a hashmap, for instance, then change one of its fields, what happens? Can you access the value by the new string value? By the old one? What about a set? An array? For a fun time, try these in various languages. Try it with mutable primitives, like Strings, if the language makes a distinction. Enjoy the results.
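To make the hashmap question concrete, here is a minimal Ruby sketch using an Array as a Hash key (Ruby copies and freezes String keys on insertion, so an Array shows the problem more plainly). The variable names are illustrative only.

```ruby
key = [1, 2]
lookup = { key => "value" }

key << 3   # mutate the key after it has been inserted

# The stored key now reads [1, 2, 3], so the old contents no longer match it.
old_hit = lookup[[1, 2]]           # nil

# The entry is still filed under the hash code of [1, 2], so a lookup by the
# new contents will typically miss as well.
new_hit = lookup[[1, 2, 3]]

# Recomputing the hash codes makes the entry reachable again.
lookup.rehash
after_rehash = lookup[[1, 2, 3]]   # "value"
```

Until `rehash` is called, the value is still in the hash but is effectively unreachable by either the old or the new contents.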

If you call a function with a mutable object as an argument, you have *very* few guarantees about the new object’s value. It’s up to you to enforce invariants like “certain fields must be read together”.
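A small Ruby sketch of what that looks like from the caller’s side. Both helpers are hypothetical; the only difference between them is whether they sort the argument in place.

```ruby
# Hypothetical helper that sorts its argument in place before reading it.
def top_score!(scores)
  scores.sort!.last
end

# Non-mutating variant: works on a sorted copy.
def top_score(scores)
  scores.sort.last
end

scores = [3, 9, 1]

best = top_score(scores)
after_pure = scores.dup        # still [3, 9, 1]

best_again = top_score!(scores)
after_bang = scores.dup        # now [1, 3, 9]: the caller's array was reordered
```

Both calls return the same answer, but only one of them leaves the caller’s data alone. Calling `freeze` on the array before passing it in turns the silent reordering into a raised error.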

If you have two threads interacting with mutable objects concurrently, things get weird *fast*.

Now, nobody’s arguing that mutability is always bad. There are really good reasons to mutate: your program ultimately *must* change state; *must* perform IO, to be meaningful. Mutation is usually *faster*, reduces GC pressure, and can be safe! It just comes with costs! The more of your program deals with pure values, the easier it is to reason about. If you compare two objects now, you know they’ll compare the same later. You can pass arguments to functions without ever having to worry that they’ll be changed out from underneath you. It gets easier to reason about thread safety.

Moreover, you don’t need a fancy type system like Haskell to experience these benefits: even in the unityped default-mutable wonderland of Ruby, having a *culture* that makes mutation explicit (for instance, `gsub` vs `gsub!`), a culture where not clobbering state is the default, can make our jobs a little easier. Remember, we don’t have to categorically *prevent* bugs; just make them less likely. Every bit helps.
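For readers who haven’t met the convention, a quick sketch of the `gsub` / `gsub!` split (the strings are made up):

```ruby
title = "magic mike".dup   # dup so we hold a mutable String even if literals are frozen

shouted = title.gsub("mike", "MIKE")     # returns a new String; title is untouched
before_bang = title.dup

title.gsub!("magic", "MAGIC")            # rewrites title in place
no_match = title.gsub!("xxl", "XXL")     # gsub! returns nil when nothing was replaced
```

The bang is only a naming convention, but it puts the mutation in plain sight at every call site.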

## Returning nil, void, or self strongly suggests impurity

Any time you see a method like

```
public void foo(String X) {
  ...
}
```

```
function(a, b) {
  ...
  return undefined;
}
```

```
def foo(args)
  ...
  self
end
```

you should read: “This function probably mutates state!” In an object oriented language, it might mutate the receiver (`self` or `this`). It might mutate any of its arguments. It might mutate variables in lexical scope. It might mutate the computing environment, by setting a global variable, or writing to the filesystem, or sending a network packet.

The hand-wavy argument for this is that there is *exactly* one meaningful pure function for each of these three return types: the constant void function, the constant nil function, and the identity function(s). If you see this signature used over and over, it’s a hint you’re staring at a big ball of mutable state.

## Proof

We aim to show there is only one pure function returning `void`, one pure function returning `nil`, etc. In general, we wish to show for *any* value r you might care to return, there exists exactly one pure function which always returns r.

I’m going to try to write this for folks without a proofs background, but I will use some notation:

- Capital letters, e.g. X, denote sets
- f(x) is function application
- a iff b means “a if, and only if, b”
- | means “such that”
- ∀ x means “for all x”
- ∃ x means “there exists an x”
- x ∈ X means “x is an element of the set X”
- (x, y) is an ordered pair, like a tuple
- X x Y is the Cartesian product: all ordered pairs of (x, y) taken from X and Y respectively.

### Definitions

I’m going to depart slightly from the usual set-theoretic definitions to simplify the proof and reduce confusion with common CS terms. We’re interested in functions which might:

- Take a receiver (e.g. `this`, `self`)
- Take arguments
- Return values
- Throw exceptions
- Depend on an environment
- Mutate their environment

Let’s simplify.

- A receiver is simply the first argument to a function.
- Zero or multiple arguments can be represented as an ordered tuple: (), (arg1), (arg1, arg2, arg3, …).
- Returning multiple return values (as in go) can be modeled by returning tuples.
- Exceptions can be modeled as a special set of return values, e.g. (“exception”, “something bad!”)
- In addition to mapping an argument to a return value, the function will map an initial environment e to a (possibly identical) final environment e'. The environment encapsulates IO, global variables, dynamic scope, mutable state, etc.

Now we adapt the usual set-theoretic graph definition of a function to our model:

**Definition 1.** A *function* f in an environment set E, from an input set X (the “domain”), to a set of return values Y (the “codomain”), written f: E, X -> Y, is the set of ordered tuples (e, e', x, y) where e and e' ∈ E, x ∈ X, and y ∈ Y, with two constraints:

- Completeness. ∀ x ∈ X, e ∈ E: ∃ (e, e', x, y) ∈ f.
- Determinism. ∀ (e1, e1', x1, y1), (e2, e2', x2, y2) ∈ f: e1' = e2' and y1 = y2 if e1 = e2 and x1 = x2

Completeness simply means that the function must return a value for all environments and x’s. Determinism just means that the environment and input x uniquely determine the new environment and return value. Nondeterministic functions are modeled by state in the environment.

We write function application in this model as f(e, x) = (e', y). Read: “Calling f on x in environment e returns y and changes the environment to e'.”

**Definition 2.** A function is *pure* iff ∀ (e, e', x, y) ∈ f, e = e'; i.e., its initial and final environments are identical.
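If it helps to see the model as code, here is a rough Ruby sketch. It assumes environments are plain frozen hashes and that a function is a lambda taking `(env, x)` and returning `[final_env, value]`; those encodings, and the names `double`, `tick`, and `pure?`, are choices made for this sketch rather than part of the definitions.

```ruby
# Small finite stand-ins for the environment set E and the input set X.
envs   = [{ count: 0 }.freeze, { count: 1 }.freeze]
inputs = [1, 2, 3]

# Pure: hands back exactly the environment it was given.
double = ->(env, x) { [env, x * 2] }

# Impure: the final environment differs from the initial one.
tick = ->(env, x) { [env.merge(count: env[:count] + 1), x] }

# Definition 2, checked by brute force over every (e, x) pair.
def pure?(fn, envs, inputs)
  envs.product(inputs).all? do |env, x|
    final_env, _value = fn.call(env, x)
    final_env == env
  end
end

double_is_pure = pure?(double, envs, inputs)   # true
tick_is_pure   = pure?(tick, envs, inputs)     # false
```

Brute force only works because E and X are tiny here; the definitions themselves place no such limit.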

### There can be only one

We wish to show that for any value r, there is only one pure function which always returns r. Assume there exist two distinct pure functions f and g, over the same domain X, returning r. Remember, these functions are pure, so their initial and final environments are the same:

- ∀ e ∈ E, x ∈ X: f(e, x) = (e, r)
- ∀ e ∈ E, x ∈ X: g(e, x) = (e, r)

But by definition 1, f and g are simply:

- f = {(e, e, x, r) | e ∈ E, x ∈ X}
- g = {(e, e, x, r) | e ∈ E, x ∈ X}

… which are identical sets. We obtain a contradiction: f and g *cannot* be distinct; therefore, for any environment set E and over any input set X, there exists only a single pure function returning r. ∎
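The argument can be spot-checked by brute force on small finite sets. The sketch below assumes the same encoding as Definition 1 (a function is its set of (e, e', x, y) tuples) and uses two differently written Ruby lambdas, `f` and `g`, that both leave the environment alone and return `nil`. The sample environments and inputs are arbitrary.

```ruby
require "set"

envs   = [{ user: "channing" }, { user: "tatum" }]
inputs = [1, 2, 3]

# Build the graph from Definition 1: the set of (e, e', x, y) tuples.
def graph(fn, envs, inputs)
  envs.product(inputs).map do |env, x|
    final_env, value = fn.call(env, x)
    [env, final_env, x, value]
  end.to_set
end

# The obvious constant-nil function.
f = ->(env, _x) { [env, nil] }

# Does some work that reads the environment, then throws the result away.
g = lambda do |env, x|
  _unused = "#{env[:user]} #{x * 2}"
  [env, nil]
end

same_function = graph(f, envs, inputs) == graph(g, envs, inputs)   # true
```

The two lambdas have different source text, but as sets of tuples they are the same function.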

You can make the exact same argument for functions that return their first (or nth) argument: they’re just variations on the identity function, one version for each arity:

- (e, e, (x), x)
- (e, e, (x, a), x)
- (e, e, (x, a, b), x)
- (e, e, (x, a, b, …), x)

### Redundancy of functions over different domains

Given two pure single-valued functions over different domains f: E, X1 -> {r} and g: E, X2 -> {r}, let h be the set of all tuples in either f or g: h = f ∪ g.

Since f is pure, ∀ (e, e', x, y) ∈ f, e = e'; and the same for g. Therefore, ∀ (e, e', x, y) ∈ h, e = e' as well: h does not mutate its environment.

Since f has a mapping for all combinations of environments in E and inputs in X1, so does h. And the same goes for g: h has mappings for all combinations of environments in E and inputs in X2. h is therefore *complete* over E and X1 ∪ X2.

Since f and g always return r, ∀ (e, e', x, y) ∈ h, y = r too. Because h can never have multiple values for y (and because it does not mutate its environment), it is *deterministic* per definition 1.

Therefore, h is a pure function in E over X1 ∪ X2, and is therefore a pure function over either X1 or X2 alone. You can safely replace any instance of f or g with h: there isn’t really a point to having more than one pure function returning `void`, `nil`, etc. in your program, unless you’re doing it for static type safety.

Don’t believe me? Here’s a single Clojure function that can replace any pure function returning its first argument. Works on integers, strings, other functions… whatever types you like.

```
user=> (def selfie (fn [self & args] self))
#'user/selfie
user=> (selfie 3)
3
user=> (selfie "channing" "tatum")
"channing"
```
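The union construction h = f ∪ g can be checked the same way on small finite sets. This is a Ruby sketch with symbols standing in for environments and two tiny, disjoint domains; all of the names and sample values are assumptions made for the illustration.

```ruby
require "set"

envs = [:e1, :e2]
x1   = [1, 2]        # domain of f
x2   = ["a", "b"]    # domain of g

# Graph of the pure function that always returns nil over the given inputs.
def constant_nil_graph(envs, inputs)
  envs.product(inputs).map { |env, x| [env, env, x, nil] }.to_set
end

f = constant_nil_graph(envs, x1)
g = constant_nil_graph(envs, x2)
h = f | g   # h = f ∪ g

pure          = h.all? { |e, e_final, _x, _y| e == e_final }
constant      = h.all? { |_e, _e_final, _x, y| y.nil? }
deterministic = h.map { |e, _e_final, x, _y| [e, x] }.uniq.size == h.size
complete      = h.size == envs.size * (x1 + x2).size
```

All four checks come out true, which is the finite version of the claim that h can stand in for both f and g.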

## Returning self suggests impurity

You can write the same function more than one way. Here are two pure functions in Ruby that both return self:

```
def meow
  self
end

def stretch
  nil
  ENV["USER"] + " in spaaace"
  5.3 / 3
  self
end
```

`meow` is just `identity`, but so is `stretch`, and, by our proof above, so is *every other pure function returning self*. The only difference is that `stretch` has useless dead code, which any compiler, linter, or human worth their salt will strip out. Writing code like this is probably silly. You can construct weird cases (interfaces, etc) where you want a whole bunch of identity functions, or `(constantly nil)`, etc, but I think those are pretty rare.

What about calling a function then returning `self`?

```
def foo
  enjoy("http://shirtless-channing-tatum.biz")
  self
end
```

There are only two cases. If `enjoy` is pure, so is `foo`, and we can replace the function by

```
def foo
  self
end
```

If `enjoy` is impure (and let’s face it: shirtless Channing Tatum induces side effects in most callers), then `foo` is *also* impure, and we’re back to square one: mutation.

## Final thoughts

When you see functions that return `void`, `nil`, or `self`, ask “what is this mutating?” If you have a pure function (say, returning the number of explosions in a film) and follow the advice of returning self as much as possible, you are *turning a pure function into an impure one*. You have to *add* state and mutability to the system. You should strive to do the opposite: reduce mutation wherever possible.
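Here is roughly what that conversion looks like, using a hypothetical `Film` class invented for this sketch:

```ruby
# Hypothetical class, for illustration only.
class Film
  attr_reader :explosion_count

  def initialize(scenes)
    @scenes = scenes
  end

  # Pure: computes a value from existing state and returns it.
  def explosions
    @scenes.count(:explosion)
  end

  # "Return self" version: the answer has to live somewhere, so the method
  # now writes to an instance variable that callers must read afterwards.
  def count_explosions
    @explosion_count = @scenes.count(:explosion)
    self
  end
end

film = Film.new([:dialogue, :explosion, :dance, :explosion])

direct  = film.explosions                         # 2
chained = film.count_explosions.explosion_count   # 2, by way of mutated state
```

The second version chains nicely, but `explosion_count` is `nil` until someone remembers to call `count_explosions`, and stale if the scenes change afterwards.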

I assure you, return values are OK.

What do you think of Bostock’s getter-setter-methods? http://bost.ocks.org/mike/chart/

Unclear whether these methods return a new chart with different parameters, or mutate the existing parameters in scope. I generally prefer the first because it reduces mutability, reduces the need for defensive copying, etc.

Grammatical mistake within first 4 words:

Nice article, loved the proof.

Regarding D3js, in general they mutate the underlying objects. That has bothered me a couple of times, but for such a performance sensitive work I can understand it.

A bit late, but do you think that languages (like Rust) whose type systems capture mutability change this much?

Personally, I’d be vastly more comfortable with things like

```
struct Foo { x: u32, y: u32 }

impl Foo {
    fn toggle_parity(self) -> Foo {
        // Takes self by-move, so effectively destroys and constructs a new one
        self.x = x ^ 1
        self
    }

    fn extract_parity(&self, out: &mut bool) -> Foo {
        // Takes self by read-only reference, cannot mutate
        *out = self.x & 1 == 1
    }
}
```

(Although in Rust, that last one would get you some odd looks - tuple-return makes output parameters somewhat unidiomatic)

Also, given self you can take &mut self (mutable reference), and given either you can take &self (immutable reference), so one can go “down” in mutation power, but not “up”.

Er, that `extract_parity` should return `&Foo` and have a semicolon after the assignment and a trailing `self`