Returning self or void suggests mutability

2015-02-12

So there’s a blog post that advises every method should, when possible, return self. I’d like to suggest you do the opposite: wherever possible, return something other than self.

Mutation is hard

Mutation makes code harder to reason about. Mutable objects make equality comparisons tricky: if you use a mutable object as the key in a hashmap, for instance, then change one of its fields, what happens? Can you access the value by the new string value? By the old one? What about a set? An array? For a fun time, try these in various languages. Try it with mutable primitives, like Strings, if the language makes a distinction. Enjoy the results.

If you call a function with a mutable object as an argument, you have very few guarantees about the new object’s value. It’s up to you to enforce invariants like “certain fields must be read together”.

If you have two threads interacting with mutable objects concurrently, things get weird fast.

Now, nobody’s arguing that mutability is always bad. There are really good reasons to mutate: your program ultimately must change state; must perform IO, to be meaningful. Mutation is usually faster, reduces GC pressure, and can be safe! It just comes with costs! The more of your program deals with pure values, the easier it is to reason about. If you compare two objects now, you know they’ll compare the same later. You can pass arguments to functions without ever having to worry that they’ll be changed out from underneath you. It gets easier to reason about thread safety.

Moreover, you don’t need a fancy type system like Haskell to experience these benefits: even in the unityped default-mutable wonderland of Ruby, having a culture that makes mutation explicit (for instance, gsub vs gsub!), a culture where not clobbering state is the default, can make our jobs a little easier. Remember, we don’t have to categorically prevent bugs; just make them less likely. Every bit helps.

Returning nil, void, or self strongly suggests impurity

Any time you see a method like

public void foo(String X) {
  ...
}

function(a, b) {
  ...
  return undefined;
}

def foo(args)
  ...
  self
end

you should read: “This function probably mutates state!” In an object oriented language, it might mutate the receiver (self or this). It might mutate any of its arguments. It might mutate variables in lexical scope. It might mutate the computing environment, by setting a global variable, or writing to the filesystem, or sending a network packet.

The hand-wavy argument for this is that there is exactly one meaningful pure function for each of these three return types: the constant void function, the constant nil function, and the identity function(s). If you see this signature used over and over, it’s a hint you’re staring at a big ball of mutable state.

Proof

We aim to show there is only one pure function returning void, one pure function returning nil, etc. In general, we wish to show for any value r you might care to return, there exists exactly one pure function which always returns r.

I’m going to try to write this for folks without a proofs background, but I will use some notation:

Capital letters, e.g. X, denote sets
f(x) is function application
a iff b means “a if, and only if, b”
| means “such that”
∀ x means “for all x”
∃ x means “there exists an x”
x ∈ X means “x is an element of the set X”
(x, y) is an ordered pair, like a tuple
X x Y is the Cartesian product: all ordered pairs of (x, y) taken from X and Y respectively.

Definitions

I’m going to depart slightly from the usual set-theoretic definitions to simplify the proof and reduce confusion with common CS terms. We’re interested in functions which might:

Take a receiver (e.g. this, self)
Take arguments
Return values
Throw exceptions
Depend on an environment
Mutate their environment

Let’s simplify.

A receiver is simply the first argument to a function.
Zero or multiple arguments can be represented as an ordered tuple: (), (arg1), (arg1, arg2, arg3, …).
Returning multiple return values (as in go) can be modeled by returning tuples.
Exceptions can be modeled as a special set of return values, e.g. (“exception”, “something bad!”)
In addition to mapping an argument to a return value, the function will map an initial environment e to a (possibly identical) final environment e’. The environment encapsulates IO, global variables, dynamic scope, mutable state, etc.

Now we adapt the usual set-theoretic graph definition of a function to our model:

Definition 1. A function f in an environment set E, from an input set X (the “domain”), to a set of return values Y (the “codomain”), written f: E, X -> Y, is the set of ordered tuples (e, e’, x, y) where e and e’ ∈ E, x ∈ X, and y ∈ Y, with two constraints:

Completeness. ∀ x ∈ X, e ∈ E: ∃ (e, e’, x, y) ∈ f.
Determinism. ∀ (e, e’, x, y) ∈ f: e’ = e’ and y = y if e = e and x = x

Completeness simply means that the function must return a value for all environments and x’s. Determinism just means that the environment and input x uniquely determine the new environment and return value. Nondeterministic functions are modeled by state in the environment.

We write function application in this model as f(e, x) = (e’, y). Read: “Calling f on x in environment e returns y and changes the environment to e’.”

Definition 2. A function is pure iff ∀ (e, e’, x, y) ∈ f, e = e’; e.g, its initial and final environments are identical.

There can be only one

We wish to show that for any value r, there is only one pure function which always returns r. Assume there exist two distinct pure functions f and g, over the same domain X, returning r. Remember, these functions are pure, so their initial and final environments are the same:

∀ e ∈ E, x ∈ X: f(e, x) -> (e, r)
∀ e ∈ E, x ∈ X: g(e, x) -> (e, r)

But by definition 1, f and g are simply:

f = {(e, e, x, r) | e ∈ E, x ∈ X}
g = {(e, e, x, r) | e ∈ E, x ∈ X}

… which are identical sets. We obtain a contradiction: f and g cannot be distinct; therefore, in any environment E and over any input set X, there exists only a single function returning r. ∎

You can make the exact same argument for functions that return their first (or nth) argument: they’re just variations on the identity function, one version for each arity:

(e, e, (x), x)
(e, e, (x, a), x)
(e, e, (x, a, b), x)
(e, e, (x, a, b, …), x)

Redundancy of functions over different domains

Given two pure single-valued functions over different domains f: E, X1 -> {r} and g: E, X2 -> {r}, let h be the set of all tuples in either f or g: h = f ∪ g.

Since f is pure, ∀ (e, e’, x, y) ∈ f, e = e’; and the same for g. Therefore, ∀ (e, e’, x, y) ∈ h, e = e’ as well: h does not mutate its environment.

Since f has a mapping for all combinations of environments in E and inputs in X1, so does h. And the same goes for g: h has mappings for all combinations of environments in E and inputs in X2. h is therefore complete over E and X1 ∪ X2.

Since f and g always return r, ∀ (e, e’, x, y) ∈ h, y = r too. Because h can never have multiple values for y (and because it does not mutate its environment), it is deterministic per definition 1.

Therefore, h is a pure function in E over X1 ∪ X2–and is therefore a pure function over either X1 or X2 alone. You can safely replace any instance of f or g with h: there isn’t really a point to having more than one pure function returning void, nil, etc. in your program, unless you’re doing it for static type safety.

Don’t believe me? Here’s a single Clojure function that can replace any pure function returning its first argument. Works on integers, strings, other functions… whatever types you like.

user=> (def selfie (fn [self & args] self)))
#'user/selfie
user=> (selfie 3)
3
user=> (selfie "channing" "tatum")
"channing"

Returning self suggests impurity

You can write the same function more than one way. Here are two pure functions in Ruby that both return self:

def meow
  self
end

def stretch
  nil
  ENV["USER"] + " in spaaace"
  5.3 / 3
  self
end

meow is just identity–but so is stretch, and, by our proof above, so is every other pure function returning self. The only difference is that stretch has useless dead code, which any compiler, linter, or human worth their salt will strip out. Writing code like this is probably silly. You can construct weird cases (interfaces, etc) where you want a whole bunch of identity functions, or (constantly nil), etc, but I think those are pretty rare.

What about calling a function then returning self?

def foo
  enjoy("http://shirtless-channing-tatum.biz")
  self
end

There are only two cases. If enjoy is pure, so is foo, and we can replace the function by

def foo
  self
end

If enjoy is impure (and let’s face it: shirtless Channing Tatum induces side effects in most callers), then foo is also impure, and we’re back to square one: mutation.

Final thoughts

When you see functions that return void, nil, or self, ask “what is this mutating?” If you have a pure function (say, returning the number of explosions in a film) and follow the advice of returning self as much as possible, you are turning a pure function into an impure one. You have to add state and mutability to the system. You should strive to do the opposite: reduce mutation wherever possible.

I assure you, return values are OK.

Post a Comment