We’ve learned the basics of Clojure’s syntax and evaluation model. Now we’ll take a tour of the basic nouns in the language.
Types
We’ve seen a few different values already: for instance, nil, true, false, 1, 2.34, and "meow". Clearly all these things are different values, but some of them seem more alike than others.
For instance, 1 and 2 are very similar numbers; both can be added, divided, multiplied, and subtracted. 2.34 is also a number, and acts very much like 1 and 2, but it’s not quite the same. It has a decimal point. It’s not an integer. And clearly true is not very much like a number. What is true plus one? Or false divided by 5.3? These questions are poorly defined.
We say that a type is a group of values which work in the same way. It’s a property that some values share, which allows us to organize the world into sets of similar things. 1 + 1 and 1 + 2 use the same addition, which adds together integers. Types also help us verify that a program makes sense: that you can only add together numbers, instead of adding numbers to porcupines.
Types can overlap and intersect each other. Cats are animals, and cats are fuzzy too. You could say that a cat is a member (or sometimes “instance”) of the fuzzy and animal types. But there are fuzzy things like moss which aren’t animals, and animals like alligators that aren’t fuzzy in the slightest.
Other types completely subsume one another. All tabbies are housecats, and all housecats are felidae, and all felidae are animals. Everything which is true of an animal is automatically true of a housecat. Hierarchical types make it easier to write programs which don’t need to know all the specifics of every value; and conversely, to create new types in terms of others. But they can also get in the way of the programmer, because not every useful classification (like “fuzziness”) is purely hierarchical. Expressing overlapping types in a hierarchy can be tricky.
Every language has a type system: a particular way of organizing nouns into types, figuring out which verbs make sense on which types, and relating types to one another. Some languages are strict, and others more relaxed. Some emphasize hierarchy, and others a more ad-hoc view of the world. We call Clojure’s type system strong in that operations on improper types are simply not allowed: the program will explode if asked to subtract a dandelion. We also say that Clojure’s types are dynamic because they are enforced when the program is run, instead of when the program is first read by the computer.
We’ll learn more about the formal relationships between types later, but for now, keep this in the back of your head. It’ll start to hook in to other concepts later.
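To make “strong” and “dynamic” a little more concrete, here is a minimal sketch. The helper name try-add is invented for this illustration, and it leans on defn and try, which we haven’t covered yet. The point is only that the bad addition is accepted when the code is read and rejected when it actually runs.

```clojure
;; try-add is a hypothetical helper, not part of Clojure itself.
;; It attempts an addition and reports a type error instead of exploding.
(defn try-add [a b]
  (try
    (+ a b)
    (catch ClassCastException e
      :type-error)))

(try-add 1 2)            ; => 3
(try-add 1 "dandelion")  ; => :type-error (the string can't be treated as a number)
```

Defining try-add succeeds even though it might later be handed a string; the check happens only when + is called with real values.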
Integers
Let’s find the type of the number 3:
user=> (type 3)
java.lang.Long
So 3 is a java.lang.Long, or a “Long”, for short. Because Clojure is built on top of Java, many of its types are plain old Java types.
Longs, internally, are represented as a group of sixty-four binary digits (ones and zeroes), written down in a particular pattern called signed two’s complement representation. You don’t need to worry about the specifics; there are only two things to remember about longs. First, longs use one bit to store the sign: whether the number is positive or negative. Second, the other 63 bits represent the size of the number. That means the biggest number you can represent with a long is 2^63 - 1 (the minus one is because of the number 0), and the smallest long is -2^63.
How big is 2^63 - 1?
user=> Long/MAX_VALUE
9223372036854775807
That’s a reasonably big number. Most of the time, you won’t need anything bigger, but… what if you did? What happens if you add one to the biggest Long?
user=> (inc Long/MAX_VALUE)
ArithmeticException integer overflow clojure.lang.Numbers.throwIntOverflow (Numbers.java:1388)
An error occurs! This is Clojure telling us that something went wrong. The type of error was an ArithmeticException, and its message was “integer overflow”, meaning “this type of number can’t hold a number that big”. The error came from a specific place in the source code of the program: Numbers.java, on line 1388. That’s a part of the Clojure source code. Later, we’ll learn more about how to unravel error messages and find out what went wrong.
The important thing is that Clojure’s type system protected us from doing something dangerous; instead of returning a corrupt value, it aborted evaluation and returned an error.
If you do need to talk about really big numbers, you can use a BigInt: an arbitrary-precision integer. Let’s convert the biggest Long into a BigInt, then increment it:
user=> (inc (bigint Long/MAX_VALUE))
9223372036854775808N
Notice the N at the end? That’s how Clojure writes arbitrary-precision integers.
user=> (type 5N)
clojure.lang.BigInt
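If you would rather not convert by hand, Clojure also ships variants of the arithmetic verbs that end in an apostrophe, such as inc' and +'. As a small sketch: they behave like inc and +, but switch to a BigInt when the result no longer fits in a Long.

```clojure
;; inc' and +' promote to BigInt instead of throwing on overflow.
(inc' Long/MAX_VALUE)         ; => 9223372036854775808N
(+' Long/MAX_VALUE 1)         ; => 9223372036854775808N
(type (inc' Long/MAX_VALUE))  ; => clojure.lang.BigInt

;; When the result still fits, we get an ordinary Long back.
(type (inc' 1))               ; => java.lang.Long
```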
There are also smaller numbers.
user=> (type (int 0))
java.lang.Integer
user=> (type (short 0))
java.lang.Short
user=> (type (byte 0))
java.lang.Byte
Integers are half the size of Longs; they store values in 32 bits. Shorts are 16 bits, and Bytes are 8. That means their biggest values are 2^31 - 1, 2^15 - 1, and 2^7 - 1, respectively.
user=> Integer/MAX_VALUE
2147483647
user=> Short/MAX_VALUE
32767
user=> Byte/MAX_VALUE
127
Fractional numbers
To represent numbers between integers, we often use floating-point numbers, which can represent small numbers with fine precision, and large numbers with coarse precision. Floats use 32 bits, and Doubles use 64. Doubles are the default in Clojure.
user=> (type 1.23)
java.lang.Double
user=> (type (float 1.23))
java.lang.Float
Floating point math is complicated, and we won’t get bogged down in the details just yet. The important thing to know is floats and doubles are approximations. There are limits to their correctness:
user=> 0.99999999999999999
1.0
To represent fractions exactly, we can use the ratio type:
user=> (type 1/3)
clojure.lang.Ratio
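A short comparison shows why the distinction matters. The sums below are ordinary arithmetic; the only thing that changes is whether the inputs are doubles or ratios.

```clojure
;; Doubles are approximations, so tiny errors creep in.
(+ 0.1 0.2)             ; => 0.30000000000000004
(= (+ 0.1 0.2) 0.3)     ; => false

;; Ratios are exact.
(+ 1/10 2/10)           ; => 3/10
(= (+ 1/10 2/10) 3/10)  ; => true

;; Converting a ratio to a double gives up that exactness.
(double 1/3)            ; => 0.3333333333333333
```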
Mathematical operations
The exact behavior of mathematical operations in Clojure depends on their types. In general, though, Clojure aims to preserve information. Adding two longs returns a long; adding a double and a long returns a double.
user=> (+ 1 2)
3
user=> (+ 1 2.0)
3.0
3 and 3.0 are not the same number; one is a long, and the other a double. But for most purposes, they’re equivalent, and Clojure will tell you so:
user=> (= 3 3.0)
false
user=> (== 3 3.0)
true
= asks whether all the things that follow are equal. Since floats are approximations, = considers them different from integers. == also compares things, but a little more loosely: it considers integers equivalent to their floating-point representations.
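A few more comparisons may help pin down the difference. Roughly speaking, = is happy to compare a long with a BigInt, since both are exact integers, but keeps exact and approximate numbers apart; == looks only at the numeric value. Note that == works only on numbers.

```clojure
(= 3 3N)       ; => true   (both are exact integers)
(= 3 3.0)      ; => false  (an integer versus a double)
(= 1/2 0.5)    ; => false  (a ratio versus a double)

(== 3 3.0 3N)  ; => true
(== 1/2 0.5)   ; => true
```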
We can also subtract with -, multiply with *, and divide with /.
user=> (- 3 1)
2
user=> (* 1.5 3)
4.5
user=> (/ 1 2)
1/2
Putting the verb first in each list allows us to add or multiply more than one number in the same step:
user=> (+ 1 2 3)
6
user=> (* 2 3 1/5)
6/5
Subtraction with more than 2 numbers subtracts all later numbers from the first. Division divides the first number by all the rest.
user=> (- 5 1 1 1)
2
user=> (/ 24 2 3)
4
By extension, we can define useful interpretations for numeric operations with just a single number:
user=> (+ 2)
2
user=> (- 2)
-2
user=> (* 4)
4
user=> (/ 4)
1/4
We can also add or multiply a list of no numbers at all, obtaining the additive and multiplicative identities, respectively. This might seem odd, especially coming from other languages, but we’ll see later that these generalizations make it easier to reason about higher-level numeric operations.
user=> (+)
0
user=> (*)
1
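As a small preview of why this is handy: apply, which we haven’t met yet, calls a verb with the elements of a collection as its arguments. Because + and * know what to do with no numbers at all, summing or multiplying an empty collection needs no special case.

```clojure
(apply + [1 2 3])  ; => 6
(apply + [])       ; => 0, the same as (+)

(apply * [2 3 4])  ; => 24
(apply * [])       ; => 1, the same as (*)
```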
Often, we want to ask which number is bigger, or if one number falls between two others. <= means “less than or equal to”, and asserts that all following values are in order from smallest to biggest.
user=> (<= 1 2 3)
true
user=> (<= 1 3 2)
false
< means “strictly less than”, and works just like <=, except that no two values may be equal.
user=> (<= 1 1 2)
true
user=> (< 1 1 2)
false
Their friends > and >= mean “greater than” and “greater than or equal to”, respectively, and assert that numbers are in descending order.
user=> (> 3 2 1)
true
user=> (> 1 2 3)
false
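Because these verbs take any number of arguments, asking whether one number falls between two others is a single expression: put the number in the middle.

```clojure
(< 1 5 10)    ; => true   (5 is strictly between 1 and 10)
(< 1 15 10)   ; => false  (15 is too big)

;; Choose <= when the endpoints should count as "between".
(<= 1 10 10)  ; => true
(< 1 10 10)   ; => false
```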
Also commonly used are inc and dec, which add one to and subtract one from a number, respectively:
user=> (inc 5)
6
user=> (dec 5)
4
One final note: equality tests can take more than 2 numbers as well.
user=> (= 2 2 2)
true
user=> (= 2 2 3)
false
Strings
We saw that strings are text, surrounded by double quotes, like "foo". Strings in Clojure are, like Longs, Doubles, and company, backed by a Java type:
user=> (type "cat")
java.lang.String
We can make almost anything into a string with str. Strings, symbols, numbers, booleans; every value in Clojure has a string representation. Note that nil’s string representation is "": an empty string.
user=> (str "cat")
"cat"
user=> (str 'cat)
"cat"
user=> (str 1)
"1"
user=> (str true)
"true"
user=> (str '(1 2 3))
"(1 2 3)"
user=> (str nil)
""
str can also combine things together into a single string, which we call “concatenation”.
user=> (str "meow " 3 " times")
"meow 3 times"
To look for patterns in text, we can use a regular expression, which is a tiny language for describing particular arrangements of text. re-find and re-matches look for occurrences of a regular expression in a string. To find a cat:
user=> (re-find #"cat" "mystic cat mouse")
"cat"
user=> (re-find #"cat" "only dogs here")
nil
That #"..." is Clojure’s way of writing a regular expression.
With re-matches, you can extract particular parts of a string which match an expression. Here we find two strings, separated by a :. The parentheses mean that the regular expression should capture that part of the match. We get back a list containing the part of the string that matched the first parentheses, followed by the part that matched the second parentheses.
user=> (rest (re-matches #"(.+):(.+)" "mouse:treat"))
("mouse" "treat")
Regular expressions are a powerful tool for searching and matching text, especially when working with data files. Since regexes work the same in most languages, you can use any guide online to learn more. It’s not something you have to master right away; just learn specific tricks as you find you need them. For a deeper guide, try Fitzgerald’s Introducing Regular Expressions.
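One difference between the two verbs is worth seeing side by side: re-find looks for the pattern anywhere in the string, while re-matches only succeeds if the whole string fits the pattern. A brief sketch, reusing the strings from above:

```clojure
(re-find    #"cat" "mystic cat mouse")  ; => "cat"
(re-matches #"cat" "mystic cat mouse")  ; => nil, the whole string isn't "cat"
(re-matches #"cat" "cat")               ; => "cat"

;; With capturing parentheses, the full match comes first,
;; followed by each captured part. That's why we used rest earlier.
(re-matches #"(.+):(.+)" "mouse:treat")
; => ["mouse:treat" "mouse" "treat"]
```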
Booleans and logic
Everything in Clojure has a sort of charge, a truth value, sometimes called “truthiness”. true is positive and false is negative. nil is negative, too.
user=> (boolean true)
true
user=> (boolean false)
false
user=> (boolean nil)
false
Every other value in Clojure is positive.
user=> (boolean 0)
true
user=> (boolean 1)
true
user=> (boolean "hi there")
true
user=> (boolean str)
true
If you’re coming from a C-inspired language, where 0 is considered false, this might be a bit surprising. Likewise, in much of POSIX, 0 is considered success and nonzero values are failures. Clojure allows no such confusion: the only negative values are false and nil.
We can reason about truth values using and, or, and not. and returns the first negative value, or the last value if all are truthy.
user=> (and true false true)
false
user=> (and true true true)
true
user=> (and 1 2 3)
3
Similarly, or returns the first positive value, or the last value if none are positive.
user=> (or false 2 3)
2
user=> (or false nil)
nil
And not inverts the logical sense of a value:
user=> (not 2)
false
user=> (not nil)
true
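Because and and or hand back the value that decided the answer, they stop looking as soon as the answer is known. In the sketch below, the division by zero would throw an error if it were ever evaluated, but it never is.

```clojure
;; or is a common way to supply a fallback value.
(or nil false "fallback")  ; => "fallback"

;; Evaluation stops at the deciding value, so (/ 1 0) is never run.
(and false (/ 1 0))        ; => false
(or :found (/ 1 0))        ; => :found

(not (and 1 nil))          ; => true
```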
We’ll learn more about Boolean logic when we start talking about control flow: the way we alter evaluation of a program and express ideas like “if I’m a cat, then meow incessantly”.
Symbols
We saw symbols in the previous chapter; they’re bare strings of characters, like foo or +.
user=> (class 'str)
clojure.lang.Symbol
Symbols can have either short or full names. The short name is used to refer to things locally. The fully qualified name is used to refer unambiguously to a symbol from anywhere. If I were a symbol, my name would be “Kyle”, and my full name “Kyle Kingsbury.”
The two parts of a full name are separated with a /. For instance, the symbol str is also present in a family called clojure.core; the corresponding full name is clojure.core/str.
user=> (= str clojure.core/str)
true
user=> (name 'clojure.core/str)
"str"
When we talked about the maximum size of an integer, that was a fully-qualified symbol, too.
user=> (type 'Integer/MAX_VALUE)
clojure.lang.Symbol
The job of symbols is to refer to things, to point to other values. When evaluating a program, symbols are looked up and replaced by their corresponding values. That’s not the only use of symbols, but it’s the most common.
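We can take a quoted symbol apart to see its two halves. A brief sketch: namespace returns the family part and name returns the short part. Note that the quoted symbols themselves are different values, even though both can point to the same thing when evaluated.

```clojure
(namespace 'clojure.core/str)  ; => "clojure.core"
(name 'clojure.core/str)       ; => "str"
(namespace 'str)               ; => nil, a short name has no family part

;; Build a fully qualified symbol from its parts.
(symbol "clojure.core" "str")  ; => clojure.core/str

;; As symbols, the short and full names are not equal.
(= 'str 'clojure.core/str)     ; => false
```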
Keywords
Closely related to symbols and strings are keywords, which begin with a :. Keywords are like strings in that they’re made up of text, but are specifically intended for use as labels or identifiers. These aren’t labels in the sense of symbols: keywords aren’t replaced by any other value. They’re just names, by themselves.
user=> (type :cat)
clojure.lang.Keyword
user=> (str :cat)
":cat"
user=> (name :cat)
"cat"
As labels, keywords are most useful when paired with other values in a collection, like a map. Keywords can also be used as verbs to look up specific values in other data types. We’ll learn more about keywords shortly.
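As a quick taste of both ideas, here is a small sketch: keyword turns a string into a keyword, and a keyword placed in the verb position looks itself up in a map.

```clojure
(keyword "cat")              ; => :cat
(= :cat (keyword "cat"))     ; => true

;; A keyword used as a verb looks itself up in the map.
(:name {:name "mittens"})    ; => "mittens"
(:color {:name "mittens"})   ; => nil, no such label in this map
```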
Lists
A collection is a group of values. It’s a container which provides some structure, some framework, for the things that it holds. We say that a collection contains elements, or members. We saw one kind of collection–a list–in the previous chapter.
user=> '(1 2 3)
(1 2 3)
user=> (type '(1 2 3))
clojure.lang.PersistentList
Remember, we quote lists with a ' to prevent them from being evaluated. You can also construct a list using list:
user=> (list 1 2 3)
(1 2 3)
Lists are comparable just like every other value:
user=> (= (list 1 2) (list 1 2))
true
You can add to a list by conjoining an element onto it with conj, which returns a new list:
user=> (conj '(1 2 3) 4)
(4 1 2 3)
We added 4 to the list–but it appeared at the front. Why? Internally, lists are stored as a chain of values: each link in the chain is a tiny box which holds the value and a connection to the next link. This data structure, called a linked list, offers immediate access to the first element.
user=> (first (list 1 2 3))
1
But getting to the second element requires an extra hop down the chain
user=> (second (list 1 2 3))
2
and the third element a hop after that, and so on.
user=> (nth (list 1 2 3) 2)
3
nth gets the element of an ordered collection at a particular index. The first element is index 0, the second is index 1, and so on.
This means that lists are well-suited for small collections, or collections which are read in linear order, but are slow when you want to get arbitrary elements from later in the list. For fast access to every element, we use a vector.
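One detail worth seeing in action: conj never changes the list it was given. It builds a new list whose first link points at the old one. In this sketch, original and extended are names made up for the example, and def, which gives a value a name, is something we’ll cover properly later.

```clojure
;; original and extended are illustrative names only.
(def original (list 1 2 3))
(def extended (conj original 4 5))

extended          ; => (5 4 1 2 3), each new element lands at the front
original          ; => (1 2 3), the old list is untouched
(first extended)  ; => 5
```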
Vectors
Vectors are surrounded by square brackets, just like lists are surrounded by parentheses. Because vectors aren’t evaluated like lists are, there’s no need to quote them:
user=> [1 2 3]
[1 2 3]
user=> (type [1 2 3])
clojure.lang.PersistentVector
You can also create vectors with vector, or change other structures into vectors with vec:
user=> (vector 1 2 3)
[1 2 3]
user=> (vec (list 1 2 3))
[1 2 3]
conj on a vector adds to the end, not the start:
user=> (conj [1 2 3] 4)
[1 2 3 4]
Our friends first, second, and nth work here too; but unlike lists, nth is fast on vectors. That’s because internally, vectors are represented as a very broad tree of elements, where each part of the tree branches into 32 smaller trees. Even very large vectors are only a few layers deep, which means getting to elements only takes a few hops.
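To get a feel for “only a few layers deep”, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which a tree with k layers holds up to 32^k elements; the real implementation has extra details we’re ignoring, and levels-needed is a name invented for this example.

```clojure
;; levels-needed is a hypothetical helper for this simplified model:
;; how many layers of 32-way branching are needed to hold n elements?
(defn levels-needed [n]
  (loop [levels   1
         capacity 32]
    (if (>= capacity n)
      levels
      (recur (inc levels) (* capacity 32)))))

(levels-needed 32)          ; => 1
(levels-needed 1000)        ; => 2
(levels-needed 1000000)     ; => 4
(levels-needed 1000000000)  ; => 6
```

Under this model, even a vector with a billion elements is about six hops deep.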
In addition to first, you’ll often want to get the remaining elements in a collection. There are two ways to do this:
user=> (rest [1 2 3])
(2 3)
user=> (next [1 2 3])
(2 3)
rest and next both return “everything but the first element”. They differ only by what happens when there are no remaining elements:
user=> (rest [1])
()
user=> (next [1])
nil
When nothing remains, rest returns an empty list, which is logically true, while next returns nil, which is logically false. Each has its uses, but in almost every case they’re equivalent; I interchange them freely.
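We can check those truth values directly with boolean, which we met earlier:

```clojure
(boolean (rest [1]))               ; => true, an empty list is still a value
(boolean (next [1]))               ; => false, nil is negative

;; When elements do remain, the two agree.
(= (rest [1 2 3]) (next [1 2 3]))  ; => true
```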
We can get the final element of any collection with last:
user=> (last [1 2 3])
3
And figure out how big the vector is with count:
user=> (count [1 2 3])
3
Because vectors are intended for looking up elements by index, we can also use them directly as verbs:
user=> ([:a :b :c] 1)
:b
So we took the vector containing three keywords, and asked “What’s the element at index 1?” Lisp, like most (but not all!) modern languages, counts up from zero, not one. Index 0 is the first element, index 1 is the second element, and so on. In this vector, finding the element at index 1 evaluates to :b.
Finally, note that vectors and lists containing the same elements are considered equal in Clojure:
user=> (= '(1 2 3) [1 2 3])
true
In almost all contexts, you can consider vectors, lists, and other sequences as interchangeable. They only differ in their performance characteristics, and in a few data-structure-specific operations.
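A few comparisons, as a sketch of where the similarity holds and where it stops. Equality treats a list and a vector with the same elements alike, but not a set; and conj is one of the data-structure-specific operations, since it adds wherever is cheapest for the structure at hand.

```clojure
(= [1 2 3] '(1 2 3))  ; => true
(= [1 2 3] #{1 2 3})  ; => false, a set is not a sequence

(conj [1 2 3] 4)      ; => [1 2 3 4], vectors grow at the end
(conj '(1 2 3) 4)     ; => (4 1 2 3), lists grow at the front
```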
Sets
Sometimes you want an unordered collection of values; especially when you plan to ask questions like “does the collection have the number 3 in it?” Clojure, like most languages, calls these collections sets.
user=> #{:a :b :c}
#{:a :c :b}
Sets are surrounded by #{...}. Notice that though we gave the elements :a, :b, and :c, they came out in a different order. In general, the order of sets can shift at any time. If you want a particular order, you can ask for it as a list or vector:
user=> (vec #{:a :b :c})
[:a :c :b]
Or ask for the elements in sorted order:
user=> (sort #{:a :b :c})
(:a :b :c)
conj on a set adds an element:
user=> (conj #{:a :b :c} :d)
#{:a :c :b :d}
user=> (conj #{:a :b :c} :a)
#{:a :c :b}
Sets never contain an element more than once, so conjoining an element which is already present does nothing. Conversely, one removes elements with disj:
user=> (disj #{"hornet" "hummingbird"} "hummingbird")
#{"hornet"}
The most common operation with a set is to check whether something is inside it. For this we use contains?.
user=> (contains? #{1 2 3} 3)
true
user=> (contains? #{1 2 3} 5)
false
Like vectors, you can use the set itself as a verb. Unlike contains?, this expression returns the element itself (if it was present), or nil.
user=> (#{1 2 3} 3)
3
user=> (#{1 2 3} 4)
nil
You can make a set out of any other collection with set.
user=> (set [:a :b :c])
#{:a :c :b}
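Since sets hold each element only once, set is also a handy way to discard duplicates. A brief sketch:

```clojure
(set [1 1 2 3 3])          ; => #{1 2 3} (the printed order may differ)
(count (set [1 1 2 3 3]))  ; => 3

;; Order plays no part in set equality.
(= #{1 2 3} #{3 2 1})      ; => true

;; Ask for sorted order when the order matters.
(sort (set [3 1 2 1]))     ; => (1 2 3)
```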
Maps
The last collection on our tour is the map: a data structure which associates keys with values. In a dictionary, the keys are words and the definitions are the values. In a library, keys are call numbers, and the books are values. Maps are indexes for looking things up, and for representing different pieces of named information together. Here’s a cat:
user=> {:name "mittens" :weight 9 :color "black"}
{:weight 9, :name "mittens", :color "black"}
Maps are surrounded by braces {...}, filled by alternating keys and values. In this map, the three keys are :name, :color, and :weight, and their values are "mittens", "black", and 9, respectively. We can look up the corresponding value for a key with get:
user=> (get {"cat" "meow" "dog" "woof"} "cat")
"meow"
user=> (get {:a 1 :b 2} :c)
nil
get can also take a default value to return instead of nil, if the key doesn’t exist in that map.
user=> (get {:glinda :good} :wicked :not-here)
:not-here
Since lookups are so important for maps, we can use a map as a verb directly:
user=> ({"amlodipine" 12 "ibuprofen" 50} "ibuprofen")
50
And conversely, keywords can also be used as verbs, which look themselves up in maps:
user=> (:raccoon {:weasel "queen" :raccoon "king"})
"king"
You can add a value for a given key to a map with assoc.
user=> (assoc {:bolts 1088} :camshafts 3)
{:camshafts 3 :bolts 1088}
user=> (assoc {:camshafts 3} :camshafts 2)
{:camshafts 2}
assoc adds keys if they aren’t present, and replaces values if they’re already there. If you associate a value onto nil, it creates a new map.
user=> (assoc nil 5 2)
{5 2}
You can combine maps together using merge, which yields a map containing all the elements of all given maps, preferring the values from later ones.
user=> (merge {:a 1 :b 2} {:b 3 :c 4})
{:c 4, :a 1, :b 3}
Finally, to remove a value, use dissoc.
user=> (dissoc {:potatoes 5 :mushrooms 2} :mushrooms)
{:potatoes 5}
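One subtlety with lookups: get returns nil both when a key is missing and when the key is present with nil as its value. When that difference matters, contains? or a default value can tell the two apart. A small sketch, with keys made up for the example:

```clojure
(get {:tail nil} :tail)               ; => nil, present, but the value is nil
(get {:tail nil} :whiskers)           ; => nil, not present at all

(contains? {:tail nil} :tail)         ; => true
(contains? {:tail nil} :whiskers)     ; => false

;; The default is used only when the key is missing.
(get {:tail nil} :tail :missing)      ; => nil
(get {:tail nil} :whiskers :missing)  ; => :missing
```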
Putting it all together
All these collections and types can be combined freely. As software engineers, we model the world by creating a particular representation of the problem in the program. Having a rich set of values at our disposal allows us to talk about complex problems. We might describe a person:
{:name "Amelia Earhart"
:birth 1897
:death 1939
:awards {"US" #{"Distinguished Flying Cross" "National Women's Hall of Fame"}
"World" #{"Altitude record for Autogyro" "First to cross Atlantic twice"}}}
Or a recipe:
{:title "Chocolate chip cookies"
:ingredients {"flour" [(+ 2 1/4) :cup]
"baking soda" [1 :teaspoon]
"salt" [1 :teaspoon]
"butter" [1 :cup]
"sugar" [3/4 :cup]
"brown sugar" [3/4 :cup]
"vanilla" [1 :teaspoon]
"eggs" 2
"chocolate chips" [12 :ounce]}}
Or the Gini coefficients of nations, as measured over time:
{"Afghanistan" {2008 27.8}
"Indonesia" {2008 34.1 2010 35.6 2011 38.1}
"Uruguay" {2008 46.3 2009 46.3 2010 45.3}}
In Clojure, we compose data structures to form more complex values; to talk about bigger ideas. We use operations like first, nth, get, and contains? to extract specific information from these structures, and modify them using conj, disj, assoc, dissoc, and so on.
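As a sketch of what that extraction looks like on nested data, we can give the Gini map a name and dig into it. The name gini is chosen just for this example, def is covered in a later chapter, and get-in is a core verb that follows a path of keys through nested maps.

```clojure
;; gini is an illustrative name for the map of coefficients shown above.
(def gini
  {"Afghanistan" {2008 27.8}
   "Indonesia"   {2008 34.1 2010 35.6 2011 38.1}
   "Uruguay"     {2008 46.3 2009 46.3 2010 45.3}})

;; Look up the country, then the year within it.
(get (get gini "Indonesia") 2010)         ; => 35.6

;; get-in does the same by following a path of keys.
(get-in gini ["Uruguay" 2009])            ; => 46.3
(get-in gini ["Uruguay" 2011])            ; => nil, no measurement for that year

(count (get gini "Indonesia"))            ; => 3
(contains? (get gini "Afghanistan") 2010) ; => false
```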
We started this chapter with a discussion of types: groups of similar objects which obey the same rules. We learned that bigints, longs, ints, shorts, and bytes are all integers, that doubles and floats are approximations to decimal numbers, and that ratios represent fractions exactly. We learned the differences between strings for text, symbols as references, and keywords as short labels. Finally, we learned how to compose, alter, and inspect collections of elements. Armed with the basic nouns of Clojure, we’re ready to write a broad array of programs.
I’d like to conclude this tour with one last type of value. We’ve inspected dozens of types so far–but what happens when you turn the camera on itself?
user=> (type type)
clojure.core$type
What is this type thing, exactly? What are these verbs we’ve been learning, and where do they come from? This is the central question of chapter three: functions.
This is a nice gentle introduction. I would like to take issue with one thing, something that annoys me often: -2^63 is not the smallest possible Long, it is the most negative. The comparison operators do not test for size but for order.
It might sound like nit-picking but I encountered a bug quite recently that hinged on this distinction where a function was rounding a number and the comments said that the result would always be greater than or equal to the input when what the code actually did was to ensure that the result was never more negative than the input. This was in VB.Net but the principle is the same for all the languages with which I am familiar. Documentation for all the languages I know use greater and less than in this jargon way which misleads people when they apply code to the real world (I write simulation software).