This is a response to a Hacker News thread asking about concurrency vs parallelism.
Concurrency is more than decomposition, and more subtle than “different pieces running simultaneously.” It’s actually about causality.
Two operations are concurrent if they have no causal dependency between them.
That’s it, really. f(a) and g(b) are concurrent so long as a does not depend on g and b does not depend on f. If you’ve seen special relativity before, think of “concurrency” as meaning “spacelike”: events which can share no information with each other save a common past.
The concurrency invariant allows a compiler/interpreter/CPU/etc. to make certain transformations of a program. For instance, it can take code like
    x = f(a)
    y = g(b)
and generate
    y = g(b)
    x = f(a)
… perhaps because b becomes available before a does. Both programs will produce identical functional results. Side effects like IO and queue operations could, strictly speaking, be said to violate concurrency, but in practice these kinds of reorderings are generally considered acceptable. Some compilers can use concurrency invariants to parallelize operations on a single chip by taking advantage of, say, SIMD instructions or vector operations:
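To make the reordering argument concrete, here is a minimal Python sketch. The bodies of f and g are hypothetical stand-ins (the original leaves them abstract), and so are the helper names original_order, reordered, and run_with_log. Swapping two causally independent pure calls leaves the values unchanged; once each call also appends to a log, the values still agree but the observable side effect (the log order) does not.

```python
# Hypothetical pure functions standing in for f and g from the text.
def f(a):
    return a * 2


def g(b):
    return b + 1


def original_order(a, b):
    x = f(a)
    y = g(b)
    return x, y


def reordered(a, b):
    # Neither call reads the other's result, so swapping them is safe.
    y = g(b)
    x = f(a)
    return x, y


def run_with_log(order, a, b):
    # Same computation, but each call now has a side effect: appending to a log.
    log = []

    def f_logged(v):
        log.append("f")
        return f(v)

    def g_logged(v):
        log.append("g")
        return g(v)

    if order == "f-first":
        x = f_logged(a)
        y = g_logged(b)
    else:
        y = g_logged(b)
        x = f_logged(a)
    return (x, y), log


if __name__ == "__main__":
    print(original_order(3, 4))           # (6, 5)
    print(reordered(3, 4))                # (6, 5)
    print(run_with_log("f-first", 3, 4))  # ((6, 5), ['f', 'g'])
    print(run_with_log("g-first", 3, 4))  # ((6, 5), ['g', 'f'])
```

The values are identical in every case; only the log reveals which call ran first, which is exactly the kind of "violation" that is usually tolerated in practice.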
    PIPELINE1    PIPELINE2
    x = f(a)     y = g(b)
Or, more often, vectorized variants of pure functions:
[x1, x2, x3, x4] = [f(a1), f(a2), f(a3), f(a4)]
where f could be something like “multiply by 2”.
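Plain Python does not emit SIMD instructions, so the following is only a model of the idea, not real vectorization. It assumes a hypothetical lane width of 4 and shows why the transformation is legal: because each f(ai) is independent of every other, the inputs can be split into fixed-width groups, processed in any order (here deliberately back to front), and stitched together without changing the result.

```python
LANE_WIDTH = 4  # assumption: pretend the hardware handles 4 values per instruction


def f(v):
    # "multiply by 2", as in the text
    return v * 2


def scalar(values):
    # One element at a time, in order.
    return [f(v) for v in values]


def lanewise(values):
    # No f(ai) depends on any other, so fixed-width groups can be
    # computed independently and in any order.
    out = [None] * len(values)
    starts = list(range(0, len(values), LANE_WIDTH))
    for start in reversed(starts):  # deliberately out of order
        chunk = values[start:start + LANE_WIDTH]
        out[start:start + len(chunk)] = [f(v) for v in chunk]
    return out


if __name__ == "__main__":
    data = list(range(10))
    print(scalar(data) == lanewise(data))  # True
```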
Concurrency also allows for multitasking optimizations. Unix processes are typically concurrent with each other, allowing the kernel to schedule them freely on the CPU. It also allows thread, CPU, and machine-level parallelism: executing non-dependent instructions in multiple places at the same wall-clock time.
    CPU1         CPU2
    x = f(a)     y = g(b)
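A minimal thread-level sketch of the two-column picture, using Python's standard concurrent.futures module. The bodies of f and g and the helper names sequential and parallel are hypothetical. Because f(a) and g(b) share no dependency, they can be handed to two worker threads and the combined result matches the sequential one. One caveat: in the standard CPython build, the global interpreter lock prevents pure-Python work like this from truly executing at the same instant; the sketch shows that the program is free to be scheduled in parallel, not that it will be faster.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for f and g.
def f(a):
    return a * 2


def g(b):
    return b + 1


def sequential(a, b):
    return f(a), g(b)


def parallel(a, b):
    # Submit the two independent calls to separate worker threads.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fx = pool.submit(f, a)
        gy = pool.submit(g, b)
        # .result() blocks until each call finishes.
        return fx.result(), gy.result()


if __name__ == "__main__":
    print(sequential(3, 4))  # (6, 5)
    print(parallel(3, 4))    # (6, 5)
```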
Languages provide a range of constructs for implicit and explicit concurrency (with the aim of parallelism): from compiler optimizations that turn for loops into vector instructions or push matrix operations onto the GPU, to things like Thread.new, Erlang processes, coroutines, futures, agents, actors, distributed mapreduce, etc. Often the language and kernel cooperate to give you different kinds of parallelism for the same logical concurrency: say, executing four of 16 threads simultaneously because that’s how many CPUs you have.
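The "four of 16" example can be sketched with a fixed-size thread pool: 16 logically concurrent tasks, but at most 4 workers running them at once. The worker count of 4 is an assumption standing in for "how many CPUs you have", and the function name run and the 10 ms sleep are hypothetical. The sketch records the peak number of tasks in flight to show that the pool caps physical parallelism while the logical concurrency stays at 16.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run(tasks=16, workers=4):
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0}

    def task(i):
        # Record how many tasks are running right now.
        with lock:
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
        time.sleep(0.01)  # simulate some work
        with lock:
            state["in_flight"] -= 1
        return i * i

    # All tasks are independent, but only `workers` threads exist to run them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(task, range(tasks)))  # map preserves input order
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = run()
    print(len(results), peak <= 4)  # 16 True
```

Changing workers changes how much runs simultaneously; the tasks themselves, and their results, do not change.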
What does this mean in practice? It means that the fewer causal dependencies between parts of your program, the more freely you, the library, the language, and the CPU can rearrange instructions to improve throughput, latency, etc. If you build your program out of small components that have well-described inputs and outputs, control the use of mutable shared variables, and use the right synchronization primitives for the job (shared memory, compare-and-set, concurrent collections, message queues, STM, etc.), your code can go faster.
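As one illustration of "use the right synchronization primitive", here is a minimal sketch using message queues from Python's standard queue module instead of a shared mutable collection. The names worker and run, the doubling computation, and the worker count are hypothetical. Each worker touches only its inbox and outbox, so there is no shared variable to guard with locks; results may arrive in any order, so they are keyed by input.

```python
import queue
import threading


def worker(inbox, outbox):
    # Owns no shared mutable state: reads work from inbox, writes results to outbox.
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work
            break
        outbox.put((item, item * 2))


def run(items, n_workers=3):
    inbox = queue.Queue()
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        inbox.put(item)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All producers have finished, so draining the outbox here is safe.
    results = {}
    while not outbox.empty():
        key, value = outbox.get()
        results[key] = value
    return results


if __name__ == "__main__":
    print(run(range(5)))  # {0: 0, 1: 2, 2: 4, 3: 6, 4: 8} (insertion order may vary)
```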