Introducing Julia/Dictionaries and sets


Dictionaries

Many of the functions introduced so far have been shown working on arrays (and tuples). But arrays are just one type of collection. Julia has others.

A simple look-up table is a useful way of organizing many types of data: given a single piece of information, such as a number, string, or symbol, called the key, what is the corresponding data value? For this purpose, Julia provides the Dictionary object, called Dict for short. It's an "associative collection" because it associates keys with values.

Creating dictionaries

You can create a simple dictionary using the following syntax:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3)

Dict{String,Int64} with 3 entries:

  "c" => 3
  "b" => 2
  "a" => 1

dict is now a dictionary. The keys are "a", "b", and "c", and the corresponding values are 1, 2, and 3. The => operator builds a Pair: "a" => 1 is another way of writing Pair("a", 1). In a dictionary, keys are always unique; you can't have two keys with the same name.
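
Here's a small sketch of those two points, written for Julia 1.x (the variable names are only for illustration). It shows that => and Pair() produce the same thing, and what happens if you try to supply the same key twice:

```julia
# "a" => 1 is just another way of writing Pair("a", 1)
p = "a" => 1
same = (p == Pair("a", 1))
k, v = first(p), last(p)          # the two halves of the pair

# A repeated key doesn't create a second entry: the later value wins
d = Dict("a" => 1, "b" => 2, "a" => 99)
println(length(d), " entries; a is now ", d["a"])
```

The dictionary ends up with two entries, not three, because the second "a" simply replaces the first.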

If you know the types of the keys and values in advance, you can (and probably should) specify them in curly braces after the Dict type name:

julia> dict = Dict{String,Integer}("a"=>1, "b" => 2)

Dict{String,Integer} with 2 entries:
  "b" => 2
  "a" => 1

You can also create dictionaries using the generator/comprehensions syntax:

julia> Dict(string(i) => sind(i) for i = 0:5:360)
Dict{String,Float64} with 73 entries:
  "320" => -0.642788
  "65"  => 0.906308
  "155" => 0.422618
  ⋮     => ⋮

Use the following syntax to create a typed empty dictionary:

julia> Dict{String,Int64}()
Dict{String,Int64} with 0 entries

or you can omit the types, and get an untyped dictionary:

julia> Dict()
Dict{Any,Any} with 0 entries
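
To see what declaring the types does in practice, here's a minimal sketch for Julia 1.x. The names ages and rejected are made up for the example. Values are converted to the declared type when that can be done exactly, and refused otherwise:

```julia
# Values assigned to a typed dictionary are converted to the declared type
ages = Dict{String,Int64}("ann" => 31)
ages["bob"] = 42.0                 # 42.0 converts exactly, so it is stored as an Int64

# A value that can't be converted exactly is rejected with an InexactError
rejected = try
    ages["cat"] = 2.5
    false
catch err
    err isa InexactError
end

println(ages["bob"], " ", typeof(ages["bob"]), " ", rejected)
```

An untyped Dict() would have accepted 2.5 without complaint, which is convenient but gives up this kind of checking.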

It's sometimes useful to create dictionary entries using a for loop:

files = ["a.txt", "b.txt", "c.txt"]
fvars = Dict()
for (n, f) in enumerate(files)
   fvars["x_$(n)"] = f
end

This is one way you could create a set of 'variables' stored in a dictionary:

julia> fvars
Dict{Any,Any} with 3 entries:
 "x_1" => "a.txt"
 "x_2" => "b.txt"
 "x_3" => "c.txt"

Looking things up

To get a value, given a key:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)
julia> dict["a"]
1

This works when the keys are strings. If the keys are symbols:

julia> symdict = Dict(:x => 1, :y => 3, :z => 6)
Dict{Symbol,Int64} with 3 entries:
  :z => 6
  :x => 1
  :y => 3

julia> symdict[:x]
1

Or if the keys are integers:

julia> intdict = Dict(1 => "one", 2 => "two", 3  => "three")
Dict{Int64,String} with 3 entries:
  2 => "two"
  3 => "three"
  1 => "one"

julia> intdict[2]
"two"

You can instead use the get() function, and provide a fail-safe default value if there's no value for that particular key:

julia> get(dict, "a", 0)
1
julia> get(dict, "1", 0)
0
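
A related function is get!(), with an exclamation mark. The sketch below (Julia 1.x, with illustrative variable names) compares the two: get() only reports the default, whereas get!() also stores it in the dictionary:

```julia
dict = Dict("a" => 1, "b" => 2)

# get() only reports the default; the dictionary is left unchanged
x = get(dict, "z", 0)
unchanged = !haskey(dict, "z")

# get!() returns the default and also stores it under the missing key
y = get!(dict, "z", 0)
stored = haskey(dict, "z")

println((x, unchanged, y, stored))
```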

If you look up a key directly, without get() and a default value, you'll get an error if the key isn't there:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)
Dict{String,Int64} with 5 entries:
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1
  "d" => 4

julia> dict["w"]
ERROR: KeyError: key "w" not found

If you don't want get() to provide a default value, use a try...catch block:

try
    dict["f"]
catch err
    if isa(err, KeyError)
        println("sorry, I couldn't find anything")
    end
end

sorry, I couldn't find anything
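
If you'd rather not catch an error at all, one alternative is to test for the key first. This is only a sketch: describe() is a hypothetical helper written for this example, not a built-in function:

```julia
dict = Dict("a" => 1, "b" => 2)

# A hypothetical helper: return a message rather than throwing a KeyError
function describe(d, key)
    if haskey(d, key)
        return "found $(d[key])"
    else
        return "sorry, I couldn't find anything"
    end
end

println(describe(dict, "a"))
println(describe(dict, "f"))
```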


To change a value assigned to an existing key (or assign a value to a hitherto unseen key):

julia> dict["a"] = 10
10

(Remember that keys must be unique for a dictionary. There's only ever one key called "a" in this dictionary, so when you assign a value to a key that already exists, you're not creating a new one, just modifying an existing one.)

To see if the dictionary contains a key, use haskey():

julia> haskey(dict, "z")
false

To check for the existence of a key/value pair:

julia> in(("b" => 2),dict)
true

To add a new key and value to a dictionary, use this:

julia> dict["d"] = 4
4

You can delete a key from the dictionary, using delete!():

julia> delete!(dict, "d")
Dict{String,Int64} with 4 entries:
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 10

You'll notice that the dictionary doesn't seem to be sorted in any way. The keys are in no particular order. This is due to the way they're stored, and you can't sort them in place. (But see Sorting, below.)
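
As well as delete!(), there's pop!(), which removes a key and gives you back the value it held. A short sketch for Julia 1.x, using made-up variable names:

```julia
dict = Dict("a" => 10, "b" => 2, "c" => 3, "d" => 4)

# pop!() removes the key and hands back the value it held
removed = pop!(dict, "d")

# With a third argument, pop!() returns that default if the key is absent
missing_value = pop!(dict, "zzz", 0)

# delete!() on an absent key quietly does nothing
delete!(dict, "zzz")

println(removed, " ", missing_value, " ", length(dict))
```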

To get all keys, use the keys() function:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5);
julia> keys(dict)
Base.KeyIterator for a Dict{String,Int64} with 5 entries. Keys:
  "c"
  "e"
  "b"
  "a"
  "d"

The result is a key iterator which, as its name suggests, is ideally suited to iterating through a dictionary key by key:

julia> [uppercase(key) for key in keys(dict)]
5-element Array{String,1}:
 "C"
 "E"
 "B"
 "A"
 "D"

This uses the list comprehension form ([ new-element for loop-variable in iterator ]) and each new element is collected into an array. An alternative would be:

julia> map(uppercase, collect(keys(dict)))
5-element Array{String,1}:
 "C"
 "E"
 "B"
 "A"
 "D"

To get all the values, use the values() function:

julia> values(dict)
Base.ValueIterator for a Dict{String,Int64} with 5 entries. Values:
  3
  5
  2
  1
  4

If you want to go through a dictionary and process each key/value, you can make use of the fact that dictionaries themselves are iterable objects:

julia> for kv in dict
         println(kv)
       end
"c"=>3
"e"=>5
"b"=>2
"a"=>1
"d"=>4

where kv is a Pair containing each key and its value in turn.

Or you could do:

julia> for k in keys(dict)
          println(k, " ==> ", dict[k])
       end
c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4

Better, you can use a key/value tuple to simplify the iteration even more:

julia> for (key, value) in dict
         println(key, " ==> ", value)
       end
c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4

Here's another example:

for tuple in Dict("1"=>"Hydrogen", "2"=>"Helium", "3"=>"Lithium")
    println("Element $(tuple[1]) is $(tuple[2])")
end

Element 1 is Hydrogen
Element 2 is Helium
Element 3 is Lithium

(Notice the string interpolation operator, $. This lets you put a variable's name in a string and have the variable's value inserted when the string is built. You can include any Julia expression in a string using $().)
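
Destructuring and interpolation combine nicely. As a sketch (Julia 1.x; the names elements, by_name, and message are just for this example), here's how you might turn a dictionary round so that the values become the keys:

```julia
elements = Dict("1" => "Hydrogen", "2" => "Helium", "3" => "Lithium")

# Destructure each pair into (number, name) and swap them round
by_name = Dict(name => number for (number, name) in elements)

# Interpolation accepts any expression inside $( )
message = "Helium is element $(by_name["Helium"]) of $(length(elements))"
println(message)
```

This only works cleanly when the values are themselves unique; if two keys shared a value, only one of them would survive the swap.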

Sorting a dictionary

Because dictionaries don't store the keys in any particular order, you'll have to output the dictionary to a sorted array if you need to obtain the items in order:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)

Dict{String,Int64} with 6 entries:
  "f" => 6
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1
  "d" => 4

julia> for key in sort(collect(keys(dict)))
           println("$key => $(dict[key])")
       end

a => 1
b => 2
c => 3
d => 4
e => 5
f => 6
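
You can sort by value as well as by key. A minimal sketch, assuming Julia 1.x: collect() turns the dictionary into an array of pairs, and the by keyword chooses which half of each pair to compare:

```julia
dict = Dict("a" => 3, "b" => 1, "c" => 2)

# collect() turns the dictionary into an array of pairs, which can be sorted
by_key   = sort(collect(dict), by = first)   # order by the key
by_value = sort(collect(dict), by = last)    # order by the value

for (k, v) in by_value
    println("$k => $v")
end
```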

If you really need to have a dictionary that remains sorted all the time, you can use the SortedDict data type from the DataStructures.jl package (after having installed it).

julia> import Pkg; Pkg.add("DataStructures")
julia> import DataStructures
julia> dict = DataStructures.SortedDict("b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6
julia> dict["a"] = 100000
100000
julia> dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
  "a" => 100000
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6

Simple example: counting words

A simple application of a dictionary is to count how many times each word appears in a piece of text. Each word is a key, and the value of the key is the number of times that word appears in the text.

Let's count the words in the Sherlock Holmes stories. I've downloaded the text from the excellent Project Gutenberg and stored it in a file "sherlock-holmes-canon.txt". To create a list of words from the loaded text, it suffices to split the text using a regular expression, after converting the text to lower case:

julia> f = open("sherlock-holmes-canon.txt")
julia> wordlist = split(lowercase(read(f, String)), r"\W", keepempty=false)

669336-element Array{SubString{String},1}:
 "the"
 "complete"
 "sherlock"
 "holmes"
 "arthur"
 "conan"
 "doyle"
 "table"
 "of"
 "contents"
 "a"
 "study"
 "in"
 "scarlet"
 "the"
 "sign"
 "of"
 "the"
 "four"
 "the"
 "adventures"
 "of"
 "sherlock"
 "holmes"
 "a"
 "scandal"
 "in"
 "bohemia"
 "the"
 
 "in"
 "our"
 "archives"
 "watson"
 "some"
 "day"
 "the"
 "true"
 "story"
 "may"
 "be"
 "told"

To store the words and the word counts, we'll create a dictionary:

julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64} with 0 entries

To build the dictionary, loop through the list of words, and use get() to look up the current tally, if any. If the word has already been seen, the count can be increased. If the word hasn't been seen before, the fall-back third argument of get() ensures that the absence doesn't cause an error, and 1 is stored instead.

for word in wordlist
    wordcounts[word]=get(wordcounts, word, 0) + 1
end
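
If you don't have the text file to hand, you can try the same loop on a short string. The following is a sketch for Julia 1.x: count_words() is a hypothetical helper that wraps the loop, and keepempty is the Julia 1.x name for the keyword that discards empty strings (older releases called it keep):

```julia
# A hypothetical helper wrapping the counting loop shown above
function count_words(text)
    counts = Dict{String,Int64}()
    for word in split(lowercase(text), r"\W", keepempty=false)
        # get() supplies 0 for a word that hasn't been seen yet
        counts[word] = get(counts, word, 0) + 1
    end
    return counts
end

counts = count_words("The dog that did nothing: the curious dog.")
println(counts["the"], " ", counts["dog"], " ", get(counts, "cat", 0))
```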

Now you can look up words in the wordcounts dictionary and find out how many times they appear:

julia> wordcounts["watson"]
1040

julia> wordcounts["holmes"]
3057

julia> wordcounts["sherlock"]
415

julia> wordcounts["lestrade"]
244

Dictionaries aren't sorted, but you can use the collect() and keys() functions on the dictionary to collect the keys and then sort them. In a loop you can work through the dictionary in alphabetical order:

julia> for i in sort(collect(keys(wordcounts)))
  println("$i, $(wordcounts[i])")
end

000, 5
1, 8
10, 7
100, 4
1000, 9
104, 1
109, 1
10s, 2
10th, 1
11, 9
1100, 1
117, 2
117th, 2
11th, 1
12, 2
120, 2
126b, 3
            
zamba, 2
zeal, 5
zealand, 3
zealous, 3
zenith, 1
zeppelin, 1
zero, 2
zest, 3
zig, 1
zigzag, 3
zigzagged, 1
zinc, 3
zion, 2
zoo, 1
zoology, 2
zu, 1
zum, 2

But how do you find out the most common words? One way is to use collect() to convert the dictionary to an array of key/value pairs, and then to sort the array by the last element of each pair, which is the count:

julia> sort(collect(wordcounts), by = pair -> last(pair), rev=true)
19171-element Array{Pair{String,Int64},1}:
 "the"=>36244
 "and"=>17593
 "i"=>17357
 "of"=>16779
 "to"=>16041
 "a"=>15848
 "that"=>11506
 ⋮
 "enrage"=>1
 "smuggled"=>1
 "lounges"=>1
 "devotes"=>1
 "reverberated"=>1
 "munitions"=>1
 "graybeard"=>1

To see only the top 20 words:

julia> sort(collect(wordcounts), by = pair -> last(pair), rev=true)[1:20]
20-element Array{Pair{String,Int64},1}:
 "the"=>36244
 "and"=>17593
 "i"=>17357
 "of"=>16779
 "to"=>16041
 "a"=>15848
 "that"=>11506
 "it"=>11101
 "in"=>10766
 "he"=>10366
 "was"=>9844
 "you"=>9688
 "his"=>7836
 "is"=>6650
 "had"=>6057
 "have"=>5532
 "my"=>5293
 "with"=>5256
 "as"=>4755
 "for"=>4713
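
If you need this more than once, you could wrap it in a function. This is a sketch only: top_n() is a hypothetical helper and the sample data is invented, but the sort call is the same one used above:

```julia
# A hypothetical helper: the n most frequent entries of a word-count dictionary
function top_n(counts, n)
    ranked = sort(collect(counts), by = last, rev = true)
    return ranked[1:min(n, length(ranked))]   # don't index past the end
end

sample = Dict("the" => 5, "and" => 3, "of" => 2, "zeal" => 1)
for (word, n) in top_n(sample, 2)
    println(word, ": ", n)
end
```

The min() call means that asking for the top 20 of a four-word dictionary returns all four rather than raising a BoundsError.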

In a similar way, you can use the filter() function to find, for example, all words that start with "k" and occur fewer than four times:

julia> filter(pair -> startswith(first(pair), "k") && last(pair) < 4, collect(wordcounts))
73-element Array{Pair{String,Int64},1}:
 "keg"=>1
 "klux"=>2
 "knifing"=>1
 "keening"=>1
 "kansas"=>3
 ⋮
 "kaiser"=>1
 "kidnap"=>2
 "keswick"=>1
 "kings"=>2
 "kratides"=>3
 "ken"=>2
 "kindliness"=>2
 "klan"=>2
 "keepsake"=>1
 "kindled"=>2
 "kit"=>2
 "kicking"=>1
 "kramm"=>2
 "knob"=>1
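
In Julia 1.x you can also hand the dictionary itself to filter(), without collecting it first. The function then receives each key/value pair, and the result is a new dictionary rather than an array. A small sketch with invented sample data:

```julia
sample = Dict("keg" => 1, "kansas" => 3, "king" => 12, "zeal" => 5)

# Each pair is passed to the predicate; first() is the word, last() the count
rare_k = filter(p -> startswith(first(p), "k") && last(p) < 4, sample)

println(sort(collect(keys(rare_k))))
```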

More complex structures

A dictionary can hold many different types of values. Here, for example, is a dictionary where the keys are strings and the values are arrays of arrays of points (assuming a Point type has already been defined elsewhere). This could be used to store graphical shapes describing the letters of the alphabet (some of which have two or more loops):

    julia> p = Dict{String, Array{Array}}()
    Dict{String,Array{Array{T,N},N}}
    
    julia> p["a"] = Array[[Point(0,0), Point(1,1)], [Point(34, 23), Point(5,6)]]
     2-element Array{Array{T,N},1}:
     [Point(0.0,0.0), Point(1.0,1.0)]
     [Point(34.0,23.0), Point(5.0,6.0)]
    
    julia> push!(p["a"], [Point(34.0,23.0), Point(5.0,6.0)])
    3-element Array{Array{T,N},1}:
     [Point(0.0,0.0), Point(1.0,1.0)]
     [Point(34.0,23.0), Point(5.0,6.0)]
     [Point(34.0,23.0), Point(5.0,6.0)]

Or create a dictionary with some already-known values:

    julia> d = Dict("shape1" => Array[[Point(0,0), Point(-20,57)], [Point(34, -23), Point(-10,12)]])
    Dict{String,Array{Array{T,N},1}} with 1 entry:
     "shape1" => Array [ [ Point(0.0,0.0), Point(-20.0,57.0)], [Point(34.0,-23.0), Point(-10.0,12.0) ] ]

Add another array to the first one:

    julia> push!(d["shape1"], [Point(-124.0, 37.0), Point(25.0,32.0)])
    3-element Array{Array{T,N},1}:
     [Point(0.0,0.0), Point(-20.0,57.0)]
     [Point(34.0,-23.0), Point(-10.0,12.0)]
     [Point(-124.0,37.0), Point(25.0,32.0)]
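
The examples above rely on a Point type that isn't defined in this chapter. To try the idea for yourself in Julia 1.x, you could use a stand-in like the one below. The struct here is purely an assumption for illustration, not the definition used by any particular package:

```julia
# Hypothetical stand-in for whatever Point type the examples rely on
struct Point
    x::Float64
    y::Float64
end

# Each letter maps to an array of loops; each loop is an array of points
shapes = Dict{String,Vector{Vector{Point}}}()
shapes["a"] = [[Point(0, 0), Point(1, 1)], [Point(34, 23), Point(5, 6)]]

# Add a third loop to the letter "a"
push!(shapes["a"], [Point(-124, 37), Point(25, 32)])

println(length(shapes["a"]), " loops; last point x = ", shapes["a"][3][2].x)
```

Spelling out the value type as Vector{Vector{Point}} is more restrictive than Array{Array}, but it lets Julia check that everything you store really is a list of lists of points.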

Sets

A set is a collection of elements, just like an array or dictionary, with no duplicated elements.

The two important differences between a set and other types of collection are that a set holds only one of each element, and that the order of elements in a set isn't important (whereas an array can have multiple copies of an element and their order is remembered).

You can create an empty set using the Set constructor function:

julia> colors = Set()
Set{Any}()

As elsewhere in Julia, you can use curly braces to specify the type:

julia> primes = Set{Int64}()
Set{Int64}()

You can create and fill sets in one go:

julia> colors = Set{String}(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])

or you can get Julia to play "guess the type":

julia> colors = Set(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])

Quite a few of the functions that work with arrays also work with sets. Adding elements to sets, for example, is a bit like adding elements to arrays. You can use push!():

julia> push!(colors, "black") 
Set(String["yellow","blue","green","black","red"])

But you can't use pushfirst!() (known as unshift!() before Julia 0.7), because that works only for ordered things like arrays. What happens if you try to add something to the set that's already there? Absolutely nothing. You don't get a copy added, because it's a set, not an array, and sets don't store repeated elements. To see if something is in the set, you can use in():

julia> in("green", colors)
true
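
You can check the "absolutely nothing" claim for yourself. In this sketch (Julia 1.x, illustrative names), the length of the set is the same before and after pushing a duplicate:

```julia
colors = Set(["red", "green", "blue"])

before = length(colors)
push!(colors, "green")            # already present, so nothing changes
after = length(colors)

# Building a set from an array is also a quick way to drop duplicates
distinct = Set(["a", "b", "a", "c", "b"])

println(before == after, " ", length(distinct), " ", "green" in colors)
```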

There are some standard operations you can do with sets, namely finding their union, intersection, and difference, using the functions union(), intersect(), and setdiff(). To try them, make a second set:

julia> rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set(String["indigo","yellow","orange","blue","violet","green","red"])

The union of two sets is the set of everything that is in one set or the other (or both). The result is another set, so you can't have two "yellow"s here, even though we've got a "yellow" in each set:

julia> union(colors, rainbow)
Set(String["indigo","yellow","orange","blue","violet","green","black","red"])

The intersection of two sets is the set that contains every element that belongs to both sets:

julia> intersect(colors, rainbow)
Set(String["yellow","blue","green","red"])

The difference between two sets is the set of elements that are in the first set, but not in the second. This time, the order in which you supply the sets matters. The setdiff() function finds the elements that are in the first set, colors, but not in the second set, rainbow:

julia> setdiff(colors, rainbow)
Set(String["black"])
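
To see why the order matters, try swapping the arguments. The sketch below (Julia 1.x) rebuilds the two sets so that it runs on its own, and compares the results with == because the order in which a set prints its elements isn't something to rely on:

```julia
colors  = Set(["red", "green", "blue", "yellow", "black"])
rainbow = Set(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# The order of the arguments matters for setdiff()
only_colors  = setdiff(colors, rainbow)
only_rainbow = setdiff(rainbow, colors)

# Sets compare equal regardless of the order their elements were added in
println(only_colors == Set(["black"]))
println(only_rainbow == Set(["violet", "indigo", "orange"]))
println(issubset(intersect(colors, rainbow), rainbow))
```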

Other functions

Functions that work on arrays and sets sometimes work on dictionaries too. The official documentation doesn't always spell it out, but you can quickly try things out. For example, some of the set operations can be applied to dictionaries:

 julia> d1 = Dict(1=>"a", 2 => "b")
 Dict{Int64,String} with 2 entries:
   2 => "b"
   1 => "a"
  
 julia> d2 = Dict(2 => "b", 3 =>"c", 4 => "d")
 Dict{Int64,String} with 3 entries:
   4 => "d"
   2 => "b"
   3 => "c"
 
 julia> union(d1, d2)
 4-element Array{Pair{Int64,String},1}:
  2=>"b"
  1=>"a"
  4=>"d"
  3=>"c"
 
 julia> intersect(d1, d2)
 1-element Array{Pair{Int64,String},1}:
  2=>"b"
  
 julia> setdiff(d1, d2)
  1-element Array{Pair{Int64,String},1}:
   1=>"a"

Notice that the results are returned as arrays of Pairs, rather than as Dictionaries.

Functions such as filter(), map(), and collect(), which we've already seen being used with arrays, also work with dictionaries. In Julia 1.x, filter() passes each key/value pair to the function as a single argument:

 julia> filter(p -> first(p) == 1, d1)
 Dict{Int64,String} with 1 entry:
   1 => "a"

There's a merge() function which can merge two dictionaries:

 julia> merge(d1, d2)
 Dict{Int64,String} with 4 entries:
   4 => "d"
   2 => "b"
   3 => "c"
   1 => "a"
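
In that example the only shared key, 2, has the same value in both dictionaries, so it doesn't show what happens when they disagree. The following sketch (Julia 1.x, with made-up data) covers that case, along with the form of merge() that takes a combining function as its first argument:

```julia
d1 = Dict(1 => "a", 2 => "b")
d2 = Dict(2 => "B", 3 => "c")

# When both dictionaries hold the same key, the value from the last one wins
merged = merge(d1, d2)

# Passing a function as the first argument combines clashing values instead
totals = merge(+, Dict("x" => 1, "y" => 2), Dict("y" => 10))

println(merged[2], " ", totals["y"], " ", length(d1))
```

Note that merge() builds a new dictionary and leaves d1 and d2 as they were.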