Learning Clojure
From Wikibooks, the open-content textbooks collection
Some paragraphs in [ ] are author notes. They will be removed as the page matures. You should be able to read the text OK if you ignore these notes.
This introduction surveys most of the features of Clojure using cursory, abstract examples. It assumes some familiarity with Java (or C#, perhaps); some familiarity with a dynamic language like Python, JavaScript, or Ruby will also help. For detailed coverage of Clojure, consult the language and API reference at clojure.org. Also see Clojure Programming.
Lisp is one of the oldest of all programming languages, invented by John McCarthy in 1958. The original language spawned many dialects, the most prominent of which today are Common Lisp and Scheme. Clojure (pronounced "closure") is a new dialect of Lisp created by Rich Hickey. Like Scheme, Clojure is a functional dialect, meaning that it supports and encourages programming in a "functional style".
To give you a taste of the language, here's a small example program in Clojure for correcting spelling (this is a translation into Clojure of Peter Norvig's Python spelling corrector):
(defn words [text] (re-seq #"[a-z]+" (.toLowerCase text)))

(defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 1)))) {} features))

(def *nwords* (train (words (slurp "big.txt"))))

(defn edits1 [word]
  (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
    (distinct (concat
      (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
      (for [i (range (dec n))]
        (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
      (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
      (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))

(defn known [words nwords] (seq (for [w words :when (nwords w)] w)))

(defn known-edits2 [word nwords]
  (seq (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)] e2)))

(defn correct [word nwords]
  (let [candidates (or (known [word] nwords) (known (edits1 word) nwords)
                       (known-edits2 word nwords) [word])]
    (apply max-key #(get nwords % 1) candidates)))
By the end of this document, you should be able to understand this code (with some help from the API reference).
Functional programming
Functional programming is one of those unfortunate terms in programming shrouded in abuse and confusion but which really shouldn't be all that mysterious. Best understood as an alternative paradigm to imperative programming, functional programming is characterized by:
- functions without side-effects
The term "functional" comes from mathematics, where a function strictly just returns a value based upon zero or more input values; it neither produces side-effects nor changes its behavior based upon any values other than those passed to it. Such a "pure" function returns the same thing every time it is called with the same set of values.
This is not the way functions work in imperative (non-functional) programming. In imperative code, functions often:
- produce side-effects (such as writing to a file or some I/O device)
- mutate their inputs and global variables
- change their behavior based upon state outside the function
Pure functions have two big upsides:
- Functions which avoid mutating local data and avoid reading or mutating global data are easier to understand and more amenable to proofs of correctness.
- Functions which avoid reading and writing shared state can be run concurrently without the usual concerns of concurrency.
Now, the obvious objection to writing all our functions to run without side-effects is that we need side-effects for our programs to ultimately do anything useful. As the saying goes, a program without side-effects does nothing more than make your CPU hot. The ideal in functional programming, then, is simply to isolate state changes, not eliminate them entirely. In a pure functional language, like Haskell, functional purity is enforced by the compiler such that all potential side-effects must be explicitly circumscribed in code. An impure functional language, though, like Clojure, does no such enforcement: Clojure helps you structure your code in a functional way, but it is ultimately up to you to keep side-effects out of your purely functional code.
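The distinction between pure and impure functions can be sketched as follows. The names area and area-logged are made up for illustration, and defn is the standard macro for defining a named function (it also appears in the spelling corrector above).

```clojure
;; Pure: the result depends only on the arguments, and nothing else changes.
(defn area [width height]
  (* width height))

;; Impure: printing is a side-effect, so a call does more than return a value.
(defn area-logged [width height]
  (println "computing area of" width "x" height) ; side-effect: I/O
  (area width height))

(area 3 4)        ; always 12, however often it is called
(area-logged 3 4) ; also returns 12, but prints a line on every call
```

Keeping the arithmetic in area and the printing in a thin wrapper is one way to isolate the side-effect rather than eliminate it.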
- immutable data
If functions are to avoid mutating state, and if they are to avoid relying upon mutating state, it makes sense then to simply make as much data as possible immutable: if data can't change, there's no danger of it maybe changing. In Clojure, the standard collection types are immutable, and in fact, even local variables are immutable: once defined, the value bound to a local does not change.
You're likely thinking this makes Clojure sound unusable, but it is surprising how much work can be done without mutable data. It's true that real world programs can't have entirely immutable data, but much more data can be immutable than is commonly believed by programmers accustomed only to imperative programming.
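As a small illustration of working without mutation, the sketch below "adds" to a vector by producing a new one. The names original and extended are hypothetical; conj is the standard function for adding an element to a collection.

```clojure
(def original [1 2 3])
(def extended (conj original 4)) ; conj returns a new vector

original ; still [1 2 3]
extended ; [1 2 3 4]
```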
- first-class functions
In functional programming, functions are "first-class", i.e. functions are themselves values which can be passed as arguments or stored in variables. Many dynamic languages--JavaScript, Ruby, Python, etc.--have first-class functions, but these languages are generally not considered functional.
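A minimal sketch of functions used as values. The names twice and operations are invented for this example; inc, dec, get, and map are standard functions.

```clojure
;; A function that takes another function as an argument.
(defn twice [f x]
  (f (f x)))

(twice inc 5) ; 7

;; Functions stored in a collection and looked up like any other value.
(def operations {"up" inc, "down" dec})

((get operations "down") 10) ; 9

;; Passing a function to a standard library function.
(map inc [1 2 3]) ; (2 3 4)
```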
- function-based control flow
In an imperative language, control flow is achieved via special constructs, e.g. if-else statements. In functional languages, control flow mechanisms are either functions or at least function-like in that they return values. For instance, the if form in Clojure evaluates one of two branches and returns the value of that branch. Such constructs allow for idioms that otherwise would require mutating variables.
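Because if returns a value, it can sit inside another expression where an imperative language would assign to a temporary variable. A short sketch, with describe as a made-up name:

```clojure
(defn describe [n]
  ;; The if form yields one of the two strings, which str then concatenates.
  (str n " is " (if (neg? n) "negative" "non-negative")))

(describe -4) ; "-4 is negative"
(describe 7)  ; "7 is non-negative"
```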
Data types
There are a few notable things to say about all of Clojure's types:
- Clojure is implemented in Java: the compiler is written in Java, and Clojure code itself is run as Java VM code. Consequently, data types in Clojure are Java data types: all values in Clojure are regular Java reference objects, i.e. instances of Java classes.
- Most Clojure types are immutable, i.e. once created, they never change.
- Clojure favors equality comparisons over identity comparisons: instead of, say, comparing two lists to see if they are the very same object in memory, the Clojure way is to compare their actual values, i.e. their content. Most languages (including Java) don't make this the default because inspecting the values of deeply structured objects is costly. Clojure keeps the cost down in two ways: an immutable collection can compute its hash once and cache it, which makes hash-based lookups cheap, and two references to the very same object are known to be equal without any inspection. Collections that are distinct objects are still compared by content. All of this is only sound as long as the compared structures are entirely immutable. (Watch out for cases of mutable Java objects stored in immutable Clojure collection objects. If the mutable object changes, this won't be reflected in the collection's cached hash.)
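The difference between equality and identity can be seen directly with = and identical?, both standard functions. The names a and b are placeholders.

```clojure
;; Two separately constructed vectors with the same content.
(def a (vector 1 2 [3 4]))
(def b (vector 1 2 [3 4]))

(= a b)               ; true: same content
(identical? a b)      ; false: two distinct objects in memory
(identical? a a)      ; true: the very same object
(= (hash a) (hash b)) ; true: equal values have equal hashes
```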
Numbers
Java includes wrapper reference types for its primitive number types, e.g. java.lang.Integer "boxes" (wraps) the primitive int type. Because every Clojure function is a JVM method expecting Object arguments, Java primitives are usually boxed in Clojure functions: when Clojure calls a Java method, a returned primitive is automatically wrapped, and any arguments to a Java method are automatically unwrapped as necessary. (However, type hinting allows non-parameter locals in Clojure functions to be unboxed primitives, which can be useful when you're trying to optimize a loop.)
Java also includes the classes java.math.BigInteger and java.math.BigDecimal for arbitrary precision integer and decimal values, respectively. The Clojure arithmetic operations described here return these kinds of values as necessary to ensure the results are always fully precise, e.g. adding the Integer values 2000000000 and 1000000000 will return the value 3000000000 as a BigInteger because it is too big to be an Integer. (Clojure 1.3 and later behave differently: integer literals are Longs, the ordinary operators throw an exception on overflow rather than promoting, and the auto-promoting variants +', -', and *' give the behavior described here.) (If you use the primitive type-hinting briefly mentioned above, you lose this guarantee of mathematical accuracy.)
Some rational values simply can't be represented in floating-point, so Clojure adds a Ratio type. A Ratio value is a ratio between two integers. Written as a literal, a Ratio is two integers with a slash between them, e.g. 23/55 (twenty-three fifty-fifths).
Clojure arithmetic operations intelligently return integers or ratios as necessary, e.g. 7/3 plus 2/3 returns 3, and 11 divided by 5 returns 11/5. As long as your calculations involve only integers and ratios, the results will be mathematically fully accurate, but as soon as a floating-point or BigDecimal value enters the mix, you'll get floating-point or BigDecimal results, which may lead to results which are not mathematically fully accurate, e.g. 1 divided by 7 returns 1/7, but 1 divided by 7.0 returns 0.14285714285714285.
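The ratio arithmetic described above can be tried at the REPL. The expressions below use only standard functions; the comments state values rather than exact printed output, since the printed form of whole-number results varies between Clojure versions.

```clojure
(/ 11 5)               ; the ratio 11/5
(+ 7/3 2/3)            ; an integer equal to 3
(/ 1 7)                ; the ratio 1/7
(/ 1 7.0)              ; 0.14285714285714285, a floating-point approximation

(ratio? (/ 11 5))      ; true
(integer? (+ 7/3 2/3)) ; true: the result collapses back to an integer
(class 23/55)          ; clojure.lang.Ratio
```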
Strings
A string in Clojure is simply an instance of java.lang.String. As in Java, string literals are written in double quotes, but unlike in Java, string literals may span onto multiple lines.
Characters
A java.lang.Character literal is written as \ followed by the character:
\e \t \tab \newline \space
As you can see, whitespace characters are written as words after the \.
Booleans
The literals true and false represent the values java.lang.Boolean.TRUE and java.lang.Boolean.FALSE, respectively.
Nil
In most Lisp dialects, there is a value semi-equivalent to Java null called nil. In Clojure, nil is simply Java's null value, end of story.
In Java, only true and false are legitimate values for condition expressions, but in Clojure, condition expressions treat nil as having the truth value false. So whereas !null ("not null") is invalid Java, the Clojure equivalent ("not nil") returns true.
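A quick sketch of how condition expressions treat nil and other values. Only nil and false count as false; everything else, including 0 and the empty string, counts as true.

```clojure
(not nil)             ; true
(if nil "yes" "no")   ; "no"
(if false "yes" "no") ; "no"
(if 0 "yes" "no")     ; "yes": zero is not false in Clojure
(if "" "yes" "no")    ; "yes": neither is the empty string
```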
Functions
A function in Clojure is a type of object, so a Clojure function can not only be invoked but can also be passed as an argument. As Clojure is a dynamic language, Clojure function parameters are not typed---arguments of any type can be passed to any Clojure function---but a Clojure function has a set arity, so an exception is thrown if you pass a function the wrong number of arguments. However, the last parameter of a function can be declared to accept any extra arguments as a list (like "variable arguments" in Java) such that the function accepts n or more arguments.
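The arity check can be observed with a small sketch. The names add2 and try-call are hypothetical helpers written for this example; try and catch are Clojure's exception-handling forms, and in current Clojure versions the exception thrown for a wrong argument count is a subclass of IllegalArgumentException.

```clojure
(defn add2 [a b]
  (+ a b))

;; Hypothetical helper: call f with args, reporting an arity error as a string.
(defn try-call [f & args]
  (try
    (apply f args)
    (catch IllegalArgumentException e
      "wrong number of arguments")))

(try-call add2 1 2) ; 3
(try-call add2 1)   ; "wrong number of arguments"
```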
Vars
Var is one of the few mutable types in Clojure. A Var is essentially a single storage cell for holding another object---a collection of one.
A single Var can actually constitute multiple references: a root binding (a binding visible to all threads) and any number of thread-local bindings (bindings each visible to a single thread). When the value of a Var is accessed, the binding accessed may depend upon the thread doing the access: the value of the Var's thread-local binding is returned if the Var has a thread-local binding for that thread; otherwise, the value of the Var's root binding (if any) is returned.
Typically, all global functions and variables in Clojure are each stored in the root binding of a Var. Because a Var is mutable, we can change the Var's value to monkey-patch the system as it runs. For instance, we can substitute a buggy function with a fixed replacement. This works because, in Clojure, a compiled function is bound to the Vars holding the functions it invokes, not the functions themselves, nor the names used to specify the Vars; since a Var is mutable, the function(s) called by a function can change without redefining the function.
Local parameters and variables in Clojure are immutable: they are bound at the start of their lifetime and then never bound again. Sometimes, however, we really do want mutable locals, and Vars with thread-local bindings can serve this purpose.
Thread-local bindings also allow us to monkey-patch just for the span of a local context. Say we have a function cat which calls a function stored in a Var; if a function goat is root-bound to the Var, then cat will normally call goat; however, if we call cat in a scope where we have thread-locally bound a function moose to that Var, then cat will invoke moose instead of goat.
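Both kinds of monkey-patching can be sketched as below. This assumes Clojure 1.3 or later, where a Var must be marked ^:dynamic before it can be given a thread-local binding with binding. The Var name *animal* is made up, and the calling function is named cat-fn rather than cat because recent versions of clojure.core already define a cat. The greet and welcome functions are likewise hypothetical.

```clojure
(defn goat [] "goat")
(defn moose [] "moose")

;; A Var whose root binding is the function goat.
(def ^:dynamic *animal* goat)

;; cat-fn calls whatever function the Var currently holds.
(defn cat-fn [] (*animal*))

(cat-fn)                            ; "goat": the root binding
(binding [*animal* moose] (cat-fn)) ; "moose": thread-local binding in this scope
(cat-fn)                            ; "goat" again once the scope has ended

;; Replacing a root binding: callers see the new function without being redefined.
(defn greet [] "helo")    ; buggy version
(defn welcome [] (greet)) ; refers to the Var named greet, not to the function object
(defn greet [] "hello")   ; fixed replacement
(welcome)                 ; "hello"
```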
Namespaces
You should organize your code into namespaces. A Clojure namespace is an object representing a mapping of symbol values to Var and/or java.lang.Class objects.
- A Var can either be referred or interned in a namespace: the difference is that a Var can only be interned in one namespace but can be referred in any number of namespaces. In other words, the namespace in which a Var is interned is the namespace to which it "really" belongs.
- A Class can only be referred, not interned, in namespaces. When a namespace is created, it automatically includes refers to the classes of java.lang.
In a sense, namespaces themselves live in one global namespace: a namespace name is unique to one single namespace, e.g. you never have more than one namespace named foo.
When Clojure starts, it creates a namespace called clojure.core in which it maps the symbol *ns* to a Var which is used to hold "the current namespace". Then, Clojure runs a script called core.clj, which interns in clojure.core many standard functions, including functions for manipulating the current namespace, such as:
- in-ns sets the current namespace to a particular namespace (manipulating clojure.core/*ns* directly is frowned upon).
- import refers Class objects into the current namespace.
- refer refers the interned Vars of another namespace into the current namespace.
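A minimal sketch of referring a Var and a Class into the current namespace, meant to be evaluated form by form at the top level (for example at the REPL). It assumes the clojure.set namespace, which ships with Clojure; require loads it, and the :only filter on refer limits which Vars are referred.

```clojure
(require 'clojure.set)              ; load the namespace so its Vars exist
(refer 'clojure.set :only '[union]) ; refer one of its interned Vars here
(import 'java.util.ArrayList)       ; refer a Class here

(union #{1 2} #{2 3})               ; #{1 2 3}: union is usable unqualified
(ArrayList.)                        ; a new, empty java.util.ArrayList

;; The Var is still interned in clojure.set, the namespace it "really" belongs to.
(ns-name (:ns (meta (var union))))  ; clojure.set
```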
Symbols
In Lisp, what are normally called identifiers in other languages are called symbols. A symbol, however, is not just a name seen by the compiler but rather a kind of value, a string-like kind of value---i.e. a sequence of characters. As a symbol is a value, a symbol can be stored in a collection, passed as an argument to a function, etc., just like any other object.
A symbol can only contain alphanumeric characters and * + ! / . : - _ ? but must not begin with a numeral or colon:
rubber-baby-buggy-bumper! ; valid
j3_!:7 ; valid
HELICOPTER ; valid
+fiduciary+ ; valid
3moose ; invalid
rubber baby buggy bumper ; invalid
Symbols containing a / are namespace qualified:
foo/bar ; a symbol qualified with the namespace name "foo"
Symbols containing . are treated specially at evaluation time, as we'll see.
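Since a symbol is a value, it can be inspected and compared like one. The sketch below uses the quote special form (covered later in this text) to obtain a symbol without evaluating it; symbol, namespace, and name are standard functions, and s is a placeholder name.

```clojure
(def s (quote foo/bar))    ; the symbol itself, not whatever it might name

(symbol? s)                ; true
(namespace s)              ; "foo"
(name s)                   ; "bar"
(= s (symbol "foo" "bar")) ; true: symbols with the same parts are equal
(namespace (quote bar))    ; nil: not namespace qualified

;; Symbols stored in a collection, like any other object.
[(quote a) (quote b)]
```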
Collections
A key feature of Clojure is that its standard collection types---lists and hashmaps, mainly---are all persistent. A persistent collection is an object which is immutable but from which producing a new collection based on the existing collection is cheap because the existing data needn't be copied. For instance, the operation which adds an element to the front of a persistent list does not actually modify the list but rather returns a new list which is the same as the original but with an extra element; this new list is created cheaply because it mostly requires just creating a new node and linking it to the already existing list nodes, which are now shared between the two lists. Both the original collection and the new collection have the same performance characteristics.
- Lists
The Clojure persistent list type is a singly-linked list and is expressed as a literal in parentheses:
(53 "moo" asdf) ; a list of three elements: a number, a string, and a symbol
- Vectors
Singly-linked lists are often inappropriate, performance-wise, so Clojure includes a type it calls vector. A Clojure vector is an ordered, one-dimensional sequence like a list, but a vector is implemented as a hashmap-like structure such that index look up times are O(log32 n) instead of O(n). A vector is expressed as a literal in square brackets:
[53 "moo" asdf] ; a vector of three elements: a number, a string, and a symbol
- Hashmaps
A hashmap is expressed as a literal in curly braces such that each group of two arguments is a key-value pair:
{35 "moo" "quack" 21} ; a hashmap with the key-value pairs 35 -> "moo" and "quack" -> 21
- Sequence
A sequence is not an actual collection type but an interface to which list, vector, hashmap, and all other Clojure collection types conform. A sequence supports the operations first and rest: first retrieves the first item of the collection while rest retrieves a sequence of all the remaining items. As we'll see, sequences support a large number of operations built upon these two fundamental operations.
(When a sequence is produced from a map, first means retrieving a single pair of the map as a vector; the pair returned is effectively random as far as the programmer is concerned. The rest of a map-based sequence is the sequence of all remaining pairs as vectors.)
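The collection types and the two sequence operations can be exercised as follows. The names base, longer, v, and m are placeholders; list, conj, nth, get, first, and rest are standard functions.

```clojure
;; Persistent list: conj adds to the front and shares the old nodes.
(def base (list 2 3))
(def longer (conj base 1))

longer                          ; (1 2 3)
base                            ; (2 3), unchanged
(identical? base (rest longer)) ; true: the tail of the new list is the old list

;; Vector: indexed access, and conj adds at the end.
(def v [53 "moo" 8])
(nth v 1)                       ; "moo"
(conj v 7)                      ; [53 "moo" 8 7]

;; Hashmap: lookup by key.
(def m {35 "moo", "quack" 21})
(get m "quack")                 ; 21

;; first and rest work on any of them.
(first [1 2 3])                 ; 1
(rest [1 2 3])                  ; (2 3)
(first {"only" "pair"})         ; the pair ["only" "pair"]
```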
Keywords
A keyword is a variant of a symbol, distinguished by being preceded by a colon:
:rubber-baby-buggy-bumper! ; valid
:j3_!:7 ; valid
:HELICOPTER ; valid
:+fiduciary+ ; valid
The character . is never valid in a keyword.
Keywords exist simply because, as you'll see, it's useful to have names in code which are symbol-like but not actually symbols. Keywords are by default not namespace-qualified. However, in some cases it may be useful to generate a keyword that is namespace-qualified so as to avoid name clashes with other code. For that purpose, one can either qualify the namespace explicitly or type a symbol preceded by two colons:
::gina ; equivalent to :adam/gina (assuming this is in the namespace "adam")
;; in the REPL, after (in-ns 'adam) and (clojure.core/refer 'clojure.core)
adam=> (namespace :gina) ; no namespace
nil
adam=> (namespace ::gina)
"adam"
adam=> (namespace :adam/gina)
"adam"
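Note: There is a caveat with programmatically generated keywords regarding namespaces. One can generate a keyword that looks like it is part of a namespace, but (namespace) will return nil. (This reflects the Clojure version this text was written against; more recent versions split the string at the slash, so check with namespace if it matters.)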
; use (namespace) to see what the namespace of the returned keywords is
user=> (keyword "test") ; a keyword with no namespace
:test
user=> (keyword "user" "test") ; a keyword in the user namespace
:user/test
user=> (keyword "user/test") ; a keyword that has no namespace but looks like it does!
:user/test
Metadata
Metadata is data describing other data. A Clojure object can have a single other object (any object implementing IPersistentMap) attached to it as metadata, e.g. a Vector can have a hashmap attached to it as metadata.
Attaching metadata to an object does not modify the object but rather creates a new object---effectively, an object with different metadata is a different object. However, equality comparisons ignore metadata.
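A short sketch of attaching metadata with the standard functions with-meta and meta. The names plain and tagged and the :source key are invented for the example.

```clojure
(def plain [1 2 3])
(def tagged (with-meta plain {:source "survey"}))

(meta tagged)             ; {:source "survey"}
(meta plain)              ; nil: the original object is untouched
(= plain tagged)          ; true: equality ignores metadata
(identical? plain tagged) ; false: with-meta produced a new object
```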
Compiling and interpreting Lisp
Lisp source code is executed in two phases:
- The source is read by the reader, which parses the source into a data structure.
- Then the evaluator traverses this data structure, interpreting the data into action according to a simple set of rules.
While the source of just about all languages is processed by first translating it into a data structure---an Abstract Syntax Tree (AST)---Lisp is unique in that it keeps this data structure and the evaluation rules simple enough for you, the programmer using the language, to understand the details of this process in full. Once you understand how the reader and evaluator "think", you'll understand Lisp.
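The two phases can be run by hand with the standard functions read-string (the reader applied to a string) and eval (the evaluator applied to a data structure). The name form is a placeholder.

```clojure
;; Phase 1: the reader turns text into a data structure.
(def form (read-string "(+ 1 2)"))

(list? form)  ; true: the result is an ordinary list
(first form)  ; the symbol +
(count form)  ; 3

;; Phase 2: the evaluator turns that data structure into action.
(eval form)   ; 3
```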
Like other dynamic languages, Lisp can be used in an interactive command-line mode. Lisp calls this mode the Read Evaluate Print Loop (REPL): each time a command is typed at the REPL, it is read and then evaluated, and then the value returned by evaluation is printed. The REPL is useful as a calculator, for quick experiments, or for inspecting or modifying a running program (such as when a break is triggered when debugging).
Programs, however, are of course written as source code files. You might expect the reader to read a whole source file up front before handing off to the evaluator, but this is not the case: in some Lisp dialects, the behavior of the reader can be modified by evaluated code, so the reader hands off each chunk of usable code to the evaluator as soon as it has read it, thereby allowing code to modify how code below it will be read. In Clojure, the reader's behavior is currently not modifiable, but Clojure follows this traditional read-evaluate pattern anyway. (In the future, Clojure may allow reader modification.)
The Reader
In Clojure, the reader ignores ; and everything after it on the line.
; this is a comment
Whitespace is required to separate "atoms" (symbols, keywords, number literals, and string literals, etc.) from each other but is otherwise ignored. The , character is considered whitespace, which is useful stylistically for creating a clearer visual separation between atoms:
foo, bar ; the comma is treated the same as if it were a space
But in string literals, of course, whitespace, ;, and , are treated as themselves.
"hi, there"
Otherwise, the reader basically sees source as a bunch of literals. For instance:
(def jerry [1 2 3])
The reader parses this into a list containing the symbol def, then the symbol jerry, and last a vector with the Integers 1, 2, and 3. Because this list is at the "top level" of code (it isn't contained in another collection), it is passed off to the evaluator immediately once it is fully read.
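What the reader produces for this line can be checked with the standard function read-string, which reads one form from a string. The commas and the trailing comment in the input are added here to show that the reader discards them; jerry-form is a placeholder name.

```clojure
(def jerry-form
  (read-string "(def jerry [1, 2, 3]) ; a comment the reader ignores"))

(count jerry-form)                    ; 3 elements in the list
(symbol? (first jerry-form))          ; true: the symbol def
(= (quote jerry) (second jerry-form)) ; true: the symbol jerry
(vector? (nth jerry-form 2))          ; true
(= [1 2 3] (nth jerry-form 2))        ; true: commas were treated as whitespace
```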
The behavior of the reader can be modified by special prefixes on literals called reader macros, but these are pretty much just convenience features, so we'll discuss them later.
Indentation style
Because Lisp code is just a bunch of literals, it is hard to read if not indented in a readable style. Well-formatted Clojure looks like this:
(defn my-zipmap [keys vals]
  (loop [map {}
         ks (seq keys)
         vs (seq vals)]
    (if (and ks vs)
        (recur (assoc map (first ks) (first vs))
               (next ks)
               (next vs))
        map)))
(my-zipmap [:a :b :c] [1 2 3])
The general idea here is that, when spread onto multiple lines, the elements of a list or vector should be indented in on the lines below. Notice that the line beginning (loop is indented in by two spaces under (defn because the list starting (loop is a direct element of the list starting (defn. However, below that, the author broke from this rule by choosing to line (recur with (and above instead of two spaces in from (if. Also note that the vector starting [map starts interior to its line, so the two lines below continuing the vector are aligned just right of the opening [. The rules are:
- Indent continuation lines by 2 spaces...
- Unless there is an opportunity to align opening (, [, or { characters...
- Or unless continuing an interior literal.
In essence, always indent in underneath the (, [, or { to which an element belongs. Admittedly, this scheme does take some getting used to, both to read and to write.
It's quite typical in Lisp that you end up with many trailing parentheses:
(defn supply-arg [arg flist]
  (loop [flist flist cnter 0]
    (if (first flist) (do ((first flist) arg)
                          (recur (rest flist) (+ 1 cnter)))))) ; remember, the + in this line is just an ordinary symbol
Text editors like Emacs can help you cope with parentheses matching and indentation style.
Evaluation
Strings, numbers, characters, nil, true, false, and keywords evaluate to themselves. When the evaluator encounters these things, it simply returns them as is.
When the evaluator encounters a vector or hashmap, it traverses and evaluates the contents before returning the vector or hashmap. Effectively, if a vector or hashmap contains just strings, numbers, characters, nil, true, false, keywords, and other vectors or hashmaps, the vector/hashmap will be returned as is:
["eat my shorts" \space 35.6 true {:fred 22 :alison 8}] ; will simply be returned by the evaluator
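When the contents are not self-evaluating, the traversal becomes visible: each element is evaluated before the collection is returned. A brief sketch using the standard functions + and str:

```clojure
[(+ 1 2) (str "a" "b")]  ; evaluates to [3 "ab"]
{"sum" (+ 1 2)}          ; evaluates to {"sum" 3}
```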
Lists and symbols, however, are evaluated specially:
Symbol evaluation
The evaluator resolves symbols depending upon the kind of symbol:
A namespace-qualified symbol resolves to the root binding of the Var mapped to the symbol in the specified namespace:
hedgehog/rabbit ; a symbol resolving to the root binding of the Var mapped to rabbit in the namespace hedgehog
Though a Class may be referred in a namespace, a namespace-qualified symbol will only ever resolve to a Var. If no Var is named by the symbol, an exception is thrown.
A package-qualified symbol (a symbol with . in it) resolves to a Class:
java.util.Arrays ; a symbol resolving to the Class Arrays in the package java.util
If no such Class can be found, an exception is thrown.
Resolution of a non-qualified symbol is more complicated:
- If the symbol is the first item in a list and matches one of the dozen special form names, the list is a special form and evaluated specially (discussed shortly).
- If not, the symbol might map to a Class referred in the current namespace.
- If not, the symbol might map to a local variable (a local variable is created by special forms, as we'll see).
- If not, the symbol might map to a binding of the Var interned or referred in the current namespace. (This may be the root binding or a thread-local binding, as previously discussed.)
- If not, the symbol resolves to nothing, and an exception is thrown.
[why must Class lookup be done before locals? shouldn't local names take precedence over class names?]
(Another way of thinking about symbol resolution is that they resolve into Vars, and the Vars in turn evaluate into their bound values. This is more or less correct, but note that namespace-qualified symbols always resolve to the root bound value regardless of the thread context.)
List evaluation
An empty list is simply evaluated into itself:
() ; the evaluator returns this as is
A non-empty list handed to the evaluator should start either with a symbol or another list. If a list starts with a symbol:
- As stated above, a list beginning with a non-qualified symbol matching a special form name is evaluated specially.
- Otherwise, the symbol may resolve to a Var containing a macro (a special kind of function, as we'll see), in which case the remaining elements of the list are left unevaluated and passed to a call to the macro. The value returned by the macro is then substituted in place of the macro call and then evaluated.
- Otherwise, the symbol should resolve to a Var containing a function, in which case the remaining elements of the list are evaluated (left-to-right) and then passed to a call to the function. Evaluation of the list returns the value returned by the function.
- Otherwise, an exception is thrown.
For example:
(def x 3) ; def is a special form, so this list is evaluated specially
(rooster ox (lion 3))
Assuming rooster resolves to a macro, the macro is called with the arguments of the symbol ox and the list (lion 3). But if rooster resolves to a regular function, the function is called with the arguments of the value resolved from ox and the value returned by evaluation of (lion 3).
Less commonly, a list might start with another list. In this case, the initial list is evaluated and expected to return a macro or function to call (or return a Var bound to a macro or function):
((hamster) "moo") ; if (hamster) returns a function, that function is called with the argument "moo"
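Both cases can be sketched with runnable code. The function hamster below is a made-up definition that returns a function; when is a standard macro, and macroexpand-1 is the standard function that shows what a macro call is replaced with before evaluation. The symbols ready and launch are never evaluated here, so they need not be defined.

```clojure
;; A list starting with another list: (hamster) is evaluated first,
;; and the function it returns is then called with "moo".
(defn hamster []
  (fn [s] (str s "!")))

((hamster) "moo") ; "moo!"

;; A list starting with a macro name: the arguments are passed unevaluated,
;; and the macro returns new code to evaluate in place of the call.
(macroexpand-1 (quote (when ready (launch)))) ; (if ready (do (launch)))
```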
Special Forms
In any Lisp dialect, you need special "primitives", called special forms, which are forms evaluated in a special way. For instance, the rules of Clojure evaluation are missing a way to do conditional evaluation, so we have the special form if.
The way arguments to a special form are evaluated is particular to each special form, and these evaluations may change based on context, e.g. the evaluation of special form A may depend on the fact that it is used inside special form B. The Clojure special forms are:
(Arguments to the forms are denoted in italics. Arguments ending ? are optional. Arguments ending * represent 0 or more arguments. Arguments ending + represent 1 or more arguments.)
- (if test then else?)
If test returns true (any value other than false or nil), then is evaluated and returned. Otherwise, optional else? is evaluated and returned, though if no else? is specified, nil is returned.
(if moose a b) ; if moose is not false or nil, return a; otherwise, return b
(if (frog) (cow)) ; if (frog) returns something other than false or nil, return value of (cow); otherwise, return nil
- (quote form)
form is returned unevaluated:
(quote (foo ack bar)) ; returns the list (foo ack bar)
(quote bat) ; returns the symbol bat itself, not the Var or value resolved from the symbol bat
This is useful when you wish to, say, pass a symbol as an argument to a regular function:
(foo (quote bar)) ; call function foo with argument symbol bar (if foo is a macro, the macro is passed the list (quote bar))
- (var symbol)
Normally, a symbol resolving to a Var further evaluates into the value of the Var. The special form var returns the Var itself from a resolved symbol, not the Var's value:
(var goose) ; return the Var mapped to goose
If the symbol does not resolve to a Var, an exception is thrown.
- (def symbol value)
In the current namespace, symbol is mapped to an interned Var holding value. If a Var mapped to that symbol already exists, def assigns it the new value.
(def george 7) ; create/set a Var mapped to symbol george in the current namespace and give that Var the value 7
(def george -3) ; change the value of that Var to -3
def returns the affected Var.
You can def a namespace-qualified symbol, but only if a Var is already interned by that name in that namespace:
(def nigeria/fred "hello") ; change value of Var mapped to nigeria/fred
; throws an exception if the Var mapped to nigeria/fred does not already exist
Attempting to def to a symbol already mapped to a referred Var throws an exception.
- (fn name? [params*] body*)
Returns a newly defined function object.
name?: the name of the function seen inside the function; useful for recursive calls.
params*: symbols bound to the local parameters.
body*: args to "evaluate" when the function is called; a function call returns the last value returned in its body.
For example:
(fn [] 3) ; returns a function which takes no arguments and returns 3
(fn [a b] (+ a b)) ; returns a function which returns the sum of its two arguments
(fn victor [] (victor)) ; returns a function which does nothing but infinitely recursively call itself
Normally, the function object returned by fn is preserved in some way, either passed to a function or bound to a Var or some such. In principle, though, you can immediately call the function returned (not that this is a sensible thing to do):
((fn [a b] (+ a b)) 3 5) ; calls the function with args 3 and 5, returning 8
The symbols provided for name? and params* are not resolved but instead establish local names for the function body. Clojure is lexically scoped, so bindings to local variables in a function take precedence over bindings external to the function, e.g. the symbol foo in a function body will resolve to foo in the current namespace only if there is no local foo and no enclosing function with a local named foo.
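Lexical scoping can be seen in a small sketch. The names outer and no-local are hypothetical; foo is defined in the current namespace and also used as a parameter name.

```clojure
(def foo "namespace foo")

;; The inner function has no local foo of its own,
;; so foo resolves to the parameter of the enclosing function.
(def outer (fn [foo] (fn [] foo)))

;; No local and no enclosing local named foo,
;; so foo resolves to the Var in the current namespace.
(def no-local (fn [] foo))

((outer "local foo")) ; "local foo"
(no-local)            ; "namespace foo"
```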
When a Clojure function is called, its body is not executed by evaluation as you might think, and in fact, the fn body is partially evaluated before it returns: symbols are resolved and macros are evaluated so that this work is not done each time the function is called; also, special forms in the body are evaluated as best as logic permits, e.g. a fn is evaluated to save from doing the work later, but an if is not evaluated because it represents real work that doesn't make sense to do until the function is actually called. Furthermore, when a function is called, the Clojure evaluator is not really involved because Clojure compiles functions into JVM bytecode. Consider:
(fn [] (frog 5))
When this special form is evaluated, the symbol frog is resolved to a Var in the current namespace. If this Var holds a macro at evaluation time, then the macro call is expanded, but otherwise, the list is compiled into a function call. Assuming frog is not a macro, then when the function we're defining is called and (frog 5) executed, the function held by the Var of frog is called with the argument 5; if the Var does not refer to a function at that time, an exception is thrown.
(Actually, when the evaluator encounters any function call---whether inside a function body or outside---it always compiles it into bytecode; the difference is that calls outside a function definition are immediately executed by the evaluator after compilation.)
Normally, a function has a fixed arity, i.e. it takes a fixed number of arguments. However, the last parameter of a function can be preceded by & to denote that it takes 0 or more extra arguments as a list:
(fn [a b & c] ...) ; takes 2 or more arguments; all arguments beyond 2 are passed as a list to c
(fn [& x] ...) ; takes 0 or more arguments; all arguments passed as a list to x
(Normally, & is just a symbol like any other, but it is treated specially by fn for this purpose, so effectively, you can't have a local named &.)
A single function can also be defined with different bodies for different arities using this form:
(fn name? ([params*] body*)+)
For example:
(fn ([] 1) ; a function which can be called with 0, 1, 2, or 3-or-more arguments
([a] 2) ; returns a different number based on how many args are passed to it
([a b] 3)
([a b c & d] 4))
In such a function, only one body can have variable arity, and that body's arity must be greater than any other body's.
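To see arity dispatch and the rest parameter at work, here is a minimal sketch (arity-demo is a made-up name):

```clojure
(def arity-demo
  (fn ([] :none)
      ([a] :one)
      ([a b] :two)
      ([a b c & more] [:three-or-more (count more)]))) ; more holds the extra arguments

(arity-demo)           ; returns :none
(arity-demo 1)         ; returns :one
(arity-demo 1 2)       ; returns :two
(arity-demo 1 2 3)     ; returns [:three-or-more 0]
(arity-demo 1 2 3 4 5) ; returns [:three-or-more 2]
```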
- (do body*)
body is any number of arguments which are evaluated in order; the value of the last argument is returned. (We typically don't use do very frequently because a function body is effectively an implicit do that usually meets our needs.)
- (let [local*] body*)
where local => name value
Declare a local scope in which one or more locals exist:
; a local scope in which aaron is bound to the value 3
; while bill is bound to the value returned by (moose true)
(let [aaron 3
bill (moose true)]
(print aaron)
(print bill))
The locals defined are immutable and only visible inside the let.
A local name can be used to define another local later in the list:
(let [mike 6
kim mike] ; local kim defined by value of mike
...)
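A complete, runnable version of the same idea might look like this (describe-rect and its locals are hypothetical names chosen for the sketch):

```clojure
(defn describe-rect [width height]
  (let [area  (* width height)          ; first local
        label (str width "x" height)    ; second local
        text  (str label " has area " area)] ; defined from the two earlier locals
    text))

(describe-rect 3 4) ; returns "3x4 has area 12"
```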
- (recur args*)
recur sends execution back to the last "recursion point", which is usually the immediately enclosing function. Unlike a regular recursive call, recur reuses the current stack frame, so it is effectively Clojure's way to do tail-recursion. recur must be used in "tail position" (i.e. as the very last expression possibly evaluated in the function):
(defn factorial [n]
  (letfn [(fac [n acc] ; a local helper; a nested defn would instead define fac at the namespace level
            (if (zero? n)
              acc
              (recur (- n 1) (* acc n))))] ; recursive call to fac, but reuses the stack; n will be (- n 1), and acc will be (* acc n)
    (fac n 1)))
In the future, the JVM may add built-in support for tail-call optimization, at which point recur would become unnecessary.
- (loop [params*] body*)
loop is just like let, except it establishes a "recursion point" for the sake of recur. Effectively, loop is a basic way to do iteration within a function:
(def factorial
(fn [n]
(loop [cnt n acc 1]
(if (zero? cnt)
acc
(recur (dec cnt) (* acc cnt)))))) ; send execution back to the enclosing loop with new bindings
; cnt will be (dec cnt) and acc will be (* acc cnt)
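Because recur reuses the stack frame, a loop can run for many iterations without overflowing the stack. A small sketch (sum-to is a made-up name):

```clojure
(defn sum-to [n]
  (loop [i 1 total 0]
    (if (> i n)
      total                            ; done: return the accumulated sum
      (recur (inc i) (+ total i)))))   ; next iteration with new bindings for i and total

(sum-to 10)     ; returns 55
(sum-to 100000) ; returns 5000050000
```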
- (throw expr)
Equivalent of Java's throw.
(throw (rat)) ; throw the exception returned by (rat)
(throw newt) ; throw the exception named by newt
Just like in Java, a thrown object must be of type java.lang.Throwable or a descendant thereof.
- (try body* (catch class name body*)* (finally body*)?)
Equivalent of Java's try-catch-finally.
(try
(bla)
(bla)
(catch Antelope x
(bla x))
(catch Gorilla y
(bla y))
(finally
(bla)))
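For a concrete case, here is a sketch using a real exception class (safe-div is a made-up name; the keyword it returns on failure is an arbitrary choice):

```clojure
(defn safe-div [a b]
  (try
    (/ a b)
    (catch ArithmeticException e ; integer division by zero throws ArithmeticException
      :divide-by-zero)
    (finally
      (println "done dividing")))) ; runs whether or not an exception was thrown

(safe-div 6 3) ; prints "done dividing", returns 2
(safe-div 1 0) ; prints "done dividing", returns :divide-by-zero
```

Note that the value of the finally clause is discarded; the try form returns the value of its body or of the catch clause that ran.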
- (monitor-enter)
- (monitor-exit)
Hickey says, "these are synchronization primitives that should be avoided in user code". You should instead use the macro clojure.core/locking.
- (set!)
(Two more special forms, . and new are described in the next section.)
Java Interop
- (. instance method args*)
- (. class method args*)
The special form ., also known as host, invokes a public Java method or retrieves the value of a public Java field:
(. foo bar 7 4) ; call the method bar of the instance/class foo with arguments 7 and 4
(. alice bob) ; return the value of public field bob of the instance/class alice
Notice that accessing a field might be mistaken for invoking a parameter-less method: if a class has a parameter-less method and public field of the same name, the ambiguity is resolved by assuming that calling the method is what's intended.
If the first argument to . is a symbol, the symbol is evaluated specially: if the symbol resolves to a Class referred in the current namespace, then this is a call to one of that Class's static methods; otherwise, this is a call to a method of the instance resolved from the symbol. So confusingly, using . with a non-referred symbol resolving to a Class is an invocation of a method of that Class object, not an invocation of a static method of the class represented by that Class object:
(. String valueOf \c) ; invoke the static method String.valueOf(char) with argument 'c'
(def ned String) ; interned Var mapped to ned now holds the Class of String
(. ned valueOf \c) ; exception: attempt to call Class.valueOf, which doesn't exist
While the . operator is the generic operator for accessing Java, there are more readable reader macros that should be preferred instead of using . directly:
- (.field instance args*)
- Class/StaticField or (Class/StaticMethod args*)
Note: Prior to Subversion revision 1158, .field was also usable for static access. However, in recent versions of Clojure, a (.field ClassName) form is treated as if ClassName were the corresponding instance of class java.lang.Class.
Thus, the above examples would be written as:
(String/valueOf \c) ; Static method access!
(def ned String)
(.valueOf ned \c) ; This will fail
(.valueOf String \c) ; And in recent versions, so will this.
If there is need to manipulate the Class object, for example to call the Class.newInstance method, one can do the following:
(. (identity String) newInstance) ; Class.newInstance takes no arguments, so this creates a new, empty String; this will work in all versions of Clojure
(.newInstance String) ; will work only in a recent enough version of Clojure. Expands to the above form.
(.newInstance (identity String)) ; this was required in old versions
- (new class args*)
The special form new instantiates a Java class, calling its constructor with the supplied arguments:
(new Integer 3) ; instantiate Integer, passing 3 as argument to the constructor
Like with calling static methods with ., the class must be specified as a symbol, not as the Class object of the class you wish to instantiate:
(new String "yo") ; new String("yo")
(def ned String) ; interned Var mapped to ned now holds the Class of String
(new ned "yo") ; exception: new Class("yo") is invalid
A reader macro exists for new as well:
- (Classname. args*)
(String. "yo") ; equivalent to (new String "yo"). Notice the dot!
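The forms above can be tried against classes from the Java standard library. A short sketch, using only java.lang classes that Clojure imports by default:

```clojure
(.toUpperCase "yo")       ; instance method call: returns "YO"
(.length "hello")         ; returns 5
(. "hello" substring 1 3) ; the same kind of call written with the . special form: returns "el"
(Math/abs -3)             ; static method call: returns 3
Integer/MAX_VALUE         ; static field access: returns 2147483647
(String. "yo")            ; constructor call: returns "yo"
```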
Reader Macros
A reader macro (not to be confused with a regular macro) is a special character sequence which, when encountered by the reader, modifies the reader behavior. Reader macros exist for syntactical concision and convenience.
'foo ; (quote foo)
#'foo ; (var foo)
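@foo ; (clojure.core/deref foo)
^{:ack bar} foo ; read foo with the metadata map {:ack bar} attached to it
#^{:ack bar} foo ; older, deprecated spelling of ^{:ack bar} foo (in early releases ^foo meant (meta foo); call meta directly instead)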
#"regex pattern" ; create a java.util.regex.Pattern from the string (this is done at read time,
; so the evaluator is handed a Pattern, not a form that evaluates into a Pattern)
#(foo %2 bar %) ; (fn [a b] (foo b bar a))
The #() syntax is intended for very short functions being passed as arguments. It takes parameters named % (or %1), %2, %3, ... %n, and %& for any remaining arguments.
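A few uses of the #() shorthand, as a sketch:

```clojure
(map #(* % %) [1 2 3])        ; % is the single argument: returns (1 4 9)
(reduce #(+ %1 %2) 0 [1 2 3]) ; %1 and %2 are the first and second arguments: returns 6
(#(vector %2 %1) :x :y)       ; returns [:y :x]
(#(count %&) :a :b :c)        ; %& collects all the arguments here: returns 3
```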
The most complicated reader macro is syntax-quote, denoted by ` (back-tick). When used on a symbol, syntax-quote is like quote but the symbol is resolved to its fully-qualified name:
`meow ; (quote cat/meow) ...assuming we are in the namespace cat
Applying syntax-quote to an atomic value expands to that same value. For instance:
`10 ; expands to 10
`1/2 ; expands to 1/2
`"hello" ; expands to "hello"
When used on a list, vector, or map form, syntax-quote quotes the whole form except, a) all symbols are resolved to their fully-qualified names and, b) components preceded by ~ are unquoted:
(defn rabbit [] 3)
`(moose ~(rabbit)) ; (quote (cat/moose 3)) ...assume namespace cat
(def zebra [1 2 3])
`(moose ~zebra) ; (quote (cat/moose [1 2 3]))
Components preceded by ~@ are unquote-spliced:
`(moose ~@zebra) ; (quote (cat/moose 1 2 3))
If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended. e.g. x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.
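The generated-symbol rule can be checked directly at the REPL. In this sketch, pair is a made-up name; the exact generated names vary from run to run, so only equality is compared:

```clojure
(def pair `[x# x#]) ; both x# sit inside one syntax-quoted expression

(= (first pair) (second pair)) ; returns true: the same generated symbol both times
(namespace (first pair))       ; returns nil: generated symbols are not namespace-qualified
(= `x# `x#)                    ; returns false: two separate syntax-quotes, two different symbols
```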
For all forms other than Symbols, Lists, Vectors, Sets and Maps, `x is the same as 'x.
Syntax-quotes can be nested within other syntax-quotes:
`(moose ~(squirrel `(whale ~zebra)))
For Lists syntax-quote establishes a template of the corresponding data structure. Within the template, unqualified forms behave as if recursively syntax-quoted.
`(x1 x2 x3 ... xn)
is interpreted to mean
(clojure.core/seq (clojure.core/concat |x1| |x2| |x3| ... |xn|))
where the | | are used to indicate a transformation of an xj as follows:
- |form| is interpreted as (clojure.core/list `form), which contains a syntax-quoted form that must then be further interpreted.
- |~form| is interpreted as (clojure.core/list form).
- |~@form| is interpreted as form.
If the syntax-quote syntax is nested, the innermost syntax-quoted form is expanded first. This means that if several ~ occur in a row, the leftmost one belongs to the innermost syntax-quote.
An important exception is the empty list:
`()
is interpreted to mean
(clojure.core/list)
Following the rules above, and assuming that the var a contains 5, an expression such as
``(~~a)
would be expanded (behind the curtains) as follows:
(clojure.core/seq (clojure.core/concat (clojure.core/list (quote clojure.core/seq))
(clojure.core/list (clojure.core/seq (clojure.core/concat (clojure.core/list (quote clojure.core/concat))
(clojure.core/list (clojure.core/seq (clojure.core/concat (clojure.core/list (quote clojure.core/list))
(clojure.core/list a)))))))))
and then evaluated, producing:
(clojure.core/seq (clojure.core/concat (clojure.core/list 5)))
Of course the same expression could also be equivalently expanded as
(clojure.core/list `list a)
which is indeed much easier to read. Clojure employs the former algorithm which is more generally applicable in cases where there is also splicing.
The principle is that the result of an expression with syntax-quotes nested to depth k is the same only after k successive evaluations are performed, regardless of the expansion algorithm (Guy Steele).
For Vectors, Maps, and Sets we have the following rules:
`[x1 x2 x3 ... xn] ; is interpreted as (clojure.core/apply clojure.core/vector `(x1 x2 x3 ... xn))
`{x1 x2 x3 ... xn} ; is interpreted as (clojure.core/apply clojure.core/hash-map `(x1 x2 x3 ... xn))
`#{x1 x2 x3 ... xn} ; is interpreted as (clojure.core/apply clojure.core/hash-set `(x1 x2 x3 ... xn))
At this time, Clojure does not allow you to define your own reader macros, but this may change in the future.
Macros
Macros are functions which effectively allow us to create our own syntactical conveniences. For many Lispers, macros are the essential feature which makes Lisp Lisp.
Simple Macro
If you understand the reader and evaluator, there actually isn't all that much more to understand about the operation and creation of macros, for a macro is simply a regular function that is called in a special way. When called as a macro, a function takes its arguments unevaluated and returns an expression to be evaluated in place of the call to the macro. A very simple (and pointless) macro would be one that simply returns its argument:
(def pointless (fn [n] n))
Whatever is passed to this macro---a list, a symbol, whatever---will be returned unmolested and then evaluated after the call. Effectively, calling this macro is pointless:
(pointless (+ 3 5)) ; pointless returns the list (+ 3 5), which is then evaluated in its place
(+ 3 5) ; may as well just write this instead
But as we defined pointless above, it is just a regular function, not a macro. To make it a macro, we need to attach the key-value pair :macro true as metadata to the Var mapped to pointless by the def. There are a number of ways to do this, but most commonly we would simply define the function as a macro with the provided macro clojure.core/defmacro:
(defmacro pointless [n] n) ; define a macro pointless that takes one parameter and simply returns it
A macro that does something
An actually useful macro typically returns a syntax-quoted list expression. Take, for instance, the case of wanting to do certain things when a DEBUG flag is on and not do them at all when the flag is off. Start by defining the flag:
(def DEBUG true)
Now, we want to do certain things when it's true and not when it isn't. If we define a function:
(defn on-debug-fn [& args]
(when DEBUG
(eval `(do ~@args)))) ; Done this way to expand the list of args.
Then (on-debug-fn "Debug") does indeed only return "Debug" when DEBUG is true. However, (on-debug-fn (println "Debug")) always prints "Debug". This is because, for a function, the arguments are always evaluated and it comes with a side effect: printing "Debug".
What we need, instead, is a macro. Then it becomes possible to check the DEBUG flag without evaluating the forms passed to it.
(defmacro on-debug [& body]
`(when DEBUG
(do ~@body)))
So, let's look at what that does:
(macroexpand-1 '(on-debug (println "Debug"))) => (clojure.core/when user/DEBUG (do (println "Debug")))
(macroexpand-1 ...) is a function that expands a macro call once and returns the resulting form without evaluating it. So the expansion checks the value of user/DEBUG and only evaluates the body if DEBUG is neither false nor nil. Looking closer:
(macroexpand '(on-debug (println "Debug"))) => (if user/DEBUG (do (do (println "Debug"))))
(macroexpand ... ) is a function that calls macroexpand-1 repeatedly until the outermost form is no longer a macro call; it does not expand macro calls nested inside the form. What this shows is that (when ... ) is actually a macro that expands into a check using (if ... ) and a (do ... ) block, thus neatly rendering our (do ... ) block unnecessary.
So the final version of the macro is:
(defmacro on-debug [& body]
`(when DEBUG
~@body))
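Now, what if we want multiple debug levels, where 1 is essential information and 3 is painful spam? For this, DEBUG must hold a number (the current debug level), or false or nil to turn debugging off, rather than true: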
(defmacro on-debug-level [level & body]
`(when (and DEBUG
(<= ~level DEBUG))
~@body))
level is prefixed with an unquote symbol so that the form passed as the macro's argument is inserted in its place; without the unquote, syntax-quote would resolve level to a symbol in the current namespace (such as user/level). DEBUG is not prefixed with an unquote symbol because we want to be able to change debug levels without having to re-evaluate every function that contains an (on-debug-level ...) form.
If, instead, we had used ~DEBUG in the macro, then whenever code using (on-debug-level ... ) was expanded, ~DEBUG would be replaced with the value user/DEBUG had at that moment. This means that, if the debug level were later changed, all of the functions using the macro would also have to be re-evaluated in order to put the new value into the conditions.
Testing that in the REPL, (on-debug-level 2 (println "Debug")) indeed prints "Debug" when the debug level is 2 and does nothing when the debug level is 1. That can, of course, be used in a function so that there's a single way to print a debugging string.
(defn debug-println [level st]
(on-debug-level level
(println st)))
Macros as control structures
This macro assumes a function (get-connection) that opens and returns a connection to a database. It provides a typical Lisp with- idiom, where it binds *conn* to the connection, runs the body of the macro, closes the connection and returns the body's return value.
(defonce ^:dynamic *conn* nil)
This defines a var, *conn*, which will be used in the body of the macro to refer to the database connection. (Since Clojure 1.3, a var must be marked ^:dynamic before binding can rebind it.)
(defmacro with-connection [& body]
`(binding [*conn* (get-connection)]
(let [ret# (do ~@body)]
(.close *conn*)
ret#)))
Stepping through this:
(defmacro with-connection [& body] ...)
This uses the variable-argument syntax in order to get the arguments as a list.
`(binding ...
The backtick operator indicates that everything within this form should be quoted and not evaluated (unless preceded by the unquote operator (~)). Thus, binding becomes clojure.core/binding and everything with it is built as a list rather than as a series of function calls.
`(binding [*conn* (get-connection)]
This takes the var *conn* and binds it so that within this form, it has the value returned by (get-connection). In other words, operations within a (with-connection ... ) form can operate on *conn* as a database connection.
Note here that the function (get-connection) is not evaluated until the macro is actually used. This is due to the backtick before (binding ... If, on the other hand, we wanted to evaluate the function when the macro is created, the unquote (~) operator would be used:
`(binding [*conn* ~(get-connection)]
This calls (get-connection) once, when the macro is defined, and all future use of that macro would use the value initially returned by ~(get-connection).
(let [ret# (do ~@body)]
This statement is a bit more complicated. ret# creates a gensymmed name; a unique name, ensuring that, if the symbol ret is used within the body of a (with-connection ... ) form, it will not come into conflict with the symbol used in the macro definition.
The latter half of the statement unpacks the body and passes it to do. Variable arguments passed to a macro (as with [& body] above) are given as a list. The ~@ (unquote-splicing) operator replaces a list with the values contained in it.
Without the ~@ operator, the statement would look something like this:
`(do (list 1 2 3)) => (do (clojure.core/list 1 2 3))
With the ~@ operator, the list is replaced by the values within, giving:
`(do ~@(list 1 2 3)) => (do 1 2 3)
Thus the statement:
(let [ret# (do ~@body)]
Binds the name ret# to the value returned by evaluating the body. The reason for this is to return the value of the body form, rather than the value returned by closing the database.
Finally:
(.close *conn*)
ret#)))
Calls the connection's close method and returns the body's return value. What this means, all told, is that:
(with-connection (str *conn*))
Opens the database connection, binds it to *conn*, captures the value returned by the body (in this case, just the string representation of the connection), closes the database connection and returns the return value of the body.
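One limitation of the macro as written is that the connection stays open if the body throws an exception. A common variation wraps the body in try/finally. The following is only a sketch: get-connection, close-connection, and the closed? atom are made-up stand-ins for a real database API, defined here so the example is self-contained.

```clojure
;; Hypothetical stand-ins, not a real database API.
(def ^:dynamic *conn* nil)
(def closed? (atom false))
(defn get-connection [] (reset! closed? false) :fake-connection)
(defn close-connection [conn] (reset! closed? true))

(defmacro with-connection-safe [& body]
  `(binding [*conn* (get-connection)]
     (try
       ~@body                        ; the value of the body is the value of the try form
       (finally
         (close-connection *conn*))))) ; runs even if the body throws

(with-connection-safe (str *conn*)) ; returns ":fake-connection"; closed? now holds true
```

Since try returns the value of its body, no ret# local is needed in this version.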
More Data Structures
Sets
A set is a collection containing no duplicate items. Clojure has two set types:
- a hash set is implemented as a hashmap and so has (near) constant lookup, insertion, and removal times.
- a sorted set is implemented as a balanced binary tree and so keeps its items in sorted order, with logarithmic lookup, insertion, and removal times.
The reader recognizes a literal syntax for hash sets:
#{67 2 8.8 -78} ; a hash set of four numbers
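A brief sketch of working with both kinds of set (s is a made-up name):

```clojure
(def s #{67 2 8.8 -78})

(contains? s 2)          ; returns true
(count (conj s 2))       ; returns 4: 2 is already present, so nothing is added
(disj s 67)              ; returns a new set without 67; s itself is unchanged
(sorted-set 3 1 2)       ; returns a sorted set containing 1 2 3
(seq (sorted-set 3 1 2)) ; returns (1 2 3): items come out in sorted order
```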
Lazy sequences
Many sequence functions produce lazy sequences, which are sequences not actually backed by their own data: the items of a lazy sequence are produced on request from a function or retrieved from some other source, such as another collection. For example, the sequence representing the first 8 items of a pre-existing vector can be lazy because there's no need to copy the items: if we, say, request the 3rd item of the lazy sequence, we get back the 3rd item of the vector.
A lazy sequence based on a function produces each item by calling the function only when that item is first requested. Because such sequences need not hold all of their items in memory at once, they can be infinite in size. For instance, the function cycle returns a lazy sequence representing the endless repetition of another sequence:
(cycle [1 2 3]) ; returns a lazy sequence of 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3... etc.
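Since an infinite sequence can never be consumed whole, it is normally combined with a function such as take that asks for only a finite prefix. A short sketch:

```clojure
(take 7 (cycle [1 2 3]))        ; returns (1 2 3 1 2 3 1)
(take 5 (iterate inc 0))        ; returns (0 1 2 3 4)
(take 3 (map #(* % %) (range))) ; map over an infinite range is itself lazy: returns (0 1 4)
```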
[hmm, under what scenarios does a lazy seq keep around the items as they're produced? I assume a cycle doesn't end up backed by a huge actual list or vector, but doseq does seem to keep around the produced values. What other cases are like doseq?]
StructMaps
What Clojure calls a structmap (as in "structure-map") is simply a variant of a persistent hashmap, but one in which a pre-defined set of keys have optimized storage and lookup.
A structmap is created by first calling clojure.core/create-struct to define a blueprint called a basis. The function clojure.core/struct-map is then used to create actual structmaps using a basis:
(def george (create-struct :apple :banana :orange))
(struct-map george :apple 3 :banana 9 :orange 12) ; create a structmap with the key-val pairs :apple => 3, :banana => 9, and :orange => 12
(struct-map george :banana 9 :apple 3 :orange 12) ; notice the key order need not be that used in create-struct
(struct-map george :apple 3 :orange 12) ; the key :banana is not specified, so its value defaults to nil
You can create a structmap by specifying only the values for the keys using clojure.core/struct:
(struct george -87 0 9) ; keys implied by their order in create-struct, so this is :apple => -87, :banana => 0, :orange => 9
Multimethods and polymorphism
Most languages treat encapsulation and implementation inheritance as the primary features of object-oriented programming, but Clojure thinks these things are overrated. The real virtue of OOP, Clojure says, is polymorphism---encapsulation and inheritance are simply straitjackets holding back polymorphism's potential.
When doing object-oriented programming in Clojure, we simply use structmaps in place of objects while, in place of traditional encapsulated methods, we use what Clojure calls multimethods, functions which pass on their arguments to other functions based on some criteria of the arguments, such as the number and/or type of the arguments.
A multimethod is made up of three things: a dispatch function, a set of methods, and a set of values associated with those methods. Calling a multimethod calls the dispatch function, and the value returned determines which method to call. A multimethod is created with clojure.core/defmulti and methods added to it by the macro clojure.core/defmethod. Here's a reductively simple multimethod with just two methods:
(defmulti victor (fn [a] a))
(defmethod victor 3 [a] "hello") ; attach a method to call when the dispatch returns 3; the function takes 1 argument, which it ignores, returning "hello"
(defmethod victor 5 [a] "goodbye")
(victor 3) ; returns "hello"
(victor 5) ; returns "goodbye"
(victor 4) ; exception: No method for dispatch value: 4
Note that the methods' arities must match the arity of the dispatch function because the multimethod passes the same arguments first to the dispatch function and then to the chosen method. (Also note the stylistic inconsistency: defmulti expects a function argument whereas defmethod imitates the defn form.)
Multimethods can be used to imitate traditional single- and multiple-inheritance polymorphism as well as more flexible kinds of polymorphism.
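As a sketch of type-like dispatch, the following multimethod dispatches on the value stored under a :shape key in a map. The names area, :shape, :rectangle, and :square are made up for illustration; :default is the dispatch value Clojure uses for a fallback method.

```clojure
(defmulti area :shape) ; the keyword :shape serves as the dispatch function

(defmethod area :rectangle [r] (* (:width r) (:height r)))
(defmethod area :square    [s] (* (:side s) (:side s)))
(defmethod area :default   [_] nil) ; called when no other method matches

(area {:shape :rectangle :width 2 :height 5}) ; returns 10
(area {:shape :square :side 3})               ; returns 9
(area {:shape :blob})                         ; returns nil
```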
Collection functions
The clojure.core namespace contains many functions and macros for handling collections. Rather than repeat reference information from the API documentation, I'll give just a few example operations:
- (count collection)
Returns the number of items in the collection.
(count [:a :b :c]) ; returns 3
(count nil) ; returns 0
- (conj collection item)
conjoin. Returns a new copy of the collection with the item added. How the item is added depends upon the collection type, e.g. an item conjoined to a list is added to the beginning while an item conjoined to a vector is added to the end.
(conj [5 7 3] 9) ; returns [5 7 3 9]
(conj (list 5 7 3) 9) ; returns list (9 5 7 3)
(conj nil item) ; returns list (item)
- (list item*)
- (vector item*)
- (hash-map keyval*) where keyval => key value
Produce a list, vector, or hashmap, respectively. You can generally always just use literal syntax in place of these functions, but having functions gives us something to pass to other functions expecting function arguments.
- (nth collection n)
Return the nth item from collection, where collection is any collection but not a map. (The collection can be a sequence over a map, but not an actual map.)
(nth [31 73 -11] 2) ; returns -11
(nth (list 31 73 -11) 0) ; returns 31
(nth [8 4 4 1] 9) ; exception: out of bounds
- keywords as functions
A keyword can be used as a function to look up a key's value in the various kinds of maps (hashmaps, sets, structmaps, etc.):
(:velcro {:velcro 5 :zipper 7}) ; returns 5, the value mapped to :velcro in the hashmap
- maps as functions
The map types themselves can be used as functions to look up their key's values:
({:velcro 5 :zipper 7} :velcro) ; returns 5, the value mapped to :velcro in the hashmap
({3 "cat" :tim "moose"} :tim) ; returns "moose", the value mapped to :tim in the hashmap
(["hi" "bye"] 1) ; returns "bye", the value of the second index in the vector
Sequence functions
Most of the sequence functions can be passed objects which are not sequences but from which a sequence can be derived, e.g. you can pass a list in place of a sequence.
Where there are redundancies between sequence operations and more operations specialized for specific collection types, the redundancies are mostly for performance considerations. For instance, to produce a reversed-order sequence of a vector, you can use the sequence operation reverse or the vector specific rseq: while rseq is less general, it is more efficient.
- (seq collection)
Return a sequence representing the collection. seq also works on native Java arrays, Strings, and any object that implements Iterable, Iterator, or Enumeration.
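A few calls to seq on different kinds of input, along with the rseq and reverse operations mentioned above, as a sketch:

```clojure
(seq [1 2 3])      ; returns (1 2 3)
(seq "abc")        ; returns (\a \b \c), a sequence of characters
(seq {:a 1})       ; returns ([:a 1]), a sequence of key-value entries
(seq [])           ; returns nil: an empty collection has no sequence
(rseq [1 2 3])     ; returns (3 2 1), vector-specific
(reverse '(1 2 3)) ; returns (3 2 1), works on any sequence
```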
Destructuring
The special forms let, loop, and fn are actually macros: the real special forms are named let*, loop*, and fn*. The macros are most commonly used in place of the actual special forms because the macros add a convenience feature called destructuring.
Often a function expects to receive a collection argument out of which it means to use particular items:
; return the sum of the 1st and 2nd items of the sequence argument
(defn foo [s]
(+ (first s) (second s))) ; sum the first item and the second item
Having to extract out the values we want can get quite verbose, but using destructuring can help in many cases:
; return the sum of the 1st and 2nd items of the sequence argument
(defn foo [s]
(let [[a b] s]
(+ a b)))
Above, where a local name is normally specified in the let, we put a vector of names: the items in the collection s are bound to the names a and b in the vector according to position. (The same vector can be written directly in a function's parameter list in place of a parameter name.) Similarly, we can destructure using a map:
(def m {:apple 3 :banana 5})
(let [{x :apple} m] ; assign the value of :apple in m to x
x) ; return x
[I have to say this feels backwards to me: shouldn't it be (let [{:apple x} m] x)? I like to think of the destructuring expression as mirroring the collection being destructured.]
Destructuring can also be used to pull out the value of collections within collections. Hickey gives more complete coverage in the reference documentation at clojure.org.
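A few further destructuring forms, sketched with made-up function names; each destructures directly in the parameter vector rather than in a let:

```clojure
(defn sum-first-two [[a b]] (+ a b))          ; positional destructuring of the argument
(defn head-and-rest [[x & more]] [x more])    ; & collects the remaining items
(defn sum-nested [[a [b c]]] (+ a b c))       ; a collection within a collection
(defn full-name [{:keys [first-name last-name]}] ; :keys binds locals named after the keywords
  (str first-name " " last-name))

(sum-first-two [3 4 5])  ; returns 7 (extra items are ignored)
(head-and-rest [1 2 3])  ; returns [1 (2 3)]
(sum-nested [1 [2 3]])   ; returns 6
(full-name {:first-name "Rich" :last-name "Hickey"}) ; returns "Rich Hickey"
```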
Ref
A Ref, like a Var, is a storage cell for holding another object, but a Ref is intended to serve a very different purpose, that of storing concurrently shared data.
- A Ref can only be modified inside a block of code called a transaction.
- A Ref can be read at any time, including outside of a transaction.
- In a transaction, changes to the Ref by other threads are not seen. A transaction sees the value a Ref had at the start of the transaction; if the transaction itself changes the Ref's value, that is the value seen in the rest of the transaction (or until it changes the value again).
- Changes to a Ref made in a transaction will not be seen by other threads until the transaction completes and has been committed.
- If an exception causes exit out of a transaction, changes in that transaction will never be committed and are hence lost.
- A transaction commit will fail if, during the transaction's run, a Ref modified by this transaction is successfully committed to by another transaction.
- A transaction is automatically retried until it successfully commits. (Consequently, transactions usually should be side-effect free lest the side effects get performed multiple times.)
Effectively, when two transactions overlapping in time modify the same Ref, the first to finish will succeed but the second will fail and retry. The concurrent modification of shared data is thereby sequenced into discrete chunks of code, each with a consistent view of that shared data.
Sometimes, however, you want Ref-mutating operations to succeed regardless of other transactions. A commute operation applies a mutating operation to a Ref, changing its value for the remainder of the transaction like normal, but the commit will not fail on account of commits to this Ref by other transactions. If no other transaction has committed to the Ref, this transaction's local Ref value is committed; otherwise, the transaction-local value is discarded and instead the mutating operation is applied again, this time to the new current Ref value, and the resulting value is committed. In practice, commute operations should be commutative (hence the name): a set of commutative operations can be applied in any order to get the same result. For instance, incrementing a counter is commutative: it only matters how many times a counter is incremented, not the order in which the increments are performed.
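In code, a transaction is written with dosync, and Refs are changed inside it with alter or commute. The following single-threaded sketch only shows the shape of the API (the names account-a, account-b, transfers, and transfer! are made up); it does not demonstrate retries, which need competing threads.

```clojure
(def account-a (ref 100))
(def account-b (ref 0))
(def transfers (ref 0)) ; a counter, so the order of increments does not matter

(defn transfer! [amount]
  (dosync                           ; everything inside runs as one transaction
    (alter account-a - amount)      ; sets account-a to (- current-value amount)
    (alter account-b + amount)
    (commute transfers inc)))       ; commute: will not force a retry on conflict

(transfer! 30)
@account-a ; returns 70 (reading needs no transaction)
@account-b ; returns 30
@transfers ; returns 1
```

Calling alter outside of a dosync throws an IllegalStateException, which is how the rule that a Ref can only be modified inside a transaction is enforced.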
Agent
Unlike a Var or Ref, an Agent is always just a single reference. An Agent is modified only by requests sent to a queue, where the requests are processed asynchronously in a thread pool but guaranteed to run in the order the requests are received. The state of the agent acted upon in these requests is the state established by the previously processed request (which is not necessarily the state at the time of the request because, when a new request is made, earlier requests may still be pending).
An agent can be read at any time, but the value read is the current one, not the value resulting from the completion of all pending requests. If you want the "latest" value, you can await the agent, meaning you can block the current thread until all pending requests on the agent have gone through (not including new requests made during this blocking time).
When a request is made in a transaction, the request isn't submitted to the queue until the transaction successfully commits.
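A minimal sketch of sending requests to an agent and waiting for them (counter is a made-up name):

```clojure
(def counter (agent 0))

(send counter inc)    ; queue a request: new state will be (inc old-state)
(send counter + 10)   ; queue another: new state will be (+ old-state 10)

(await counter)       ; block until the requests sent so far from this thread are done
@counter              ; returns 11
```

Without the await, reading @counter right after the sends could return 0, 1, or 11, depending on how far the queue has been processed. In a standalone script, call (shutdown-agents) before exiting so that the agent thread pool does not keep the JVM alive.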