# Dictionaries and Error Handling

## Motivation

• Problem: you want to count how often different people send messages to a mailing list
• Input: a file with one email address per line
• Each address may appear many times
• Output: a table of addresses, sorted by frequency of appearance
• Solution: keep track of counts in a list of pairs
• Each entry in the list is `[address, countSoFar]`
• Each time an address is read in, search the list, and either:
• Increment the count of an existing entry, or
• Create a new entry with a count of 1
• Once everything has been read, sort the entries
• Note: you can pass your own comparison function to `list.sort`
• ```import sys

for x in counts:
if x[0] == address:
return x
return None

def countFreq(lines):
counts = []
for line in lines:
line = line.strip()
existing = countFind(counts, line)
if existing:
existing[1] += 1
else:
counts.append([line, 1])
return counts

def compareByFrequency(a, b):
if a[1] < b[1]:
return 1
if a[1] == b[1]:
return 0
return -1

if __name__ == '__main__':
instream = open(sys.argv[1], 'r')
instream.close()

result = countFreq(lines)
result.sort(compareByFrequency)

for address, frequency in result:
print frequency, ":", address
```
• This is clumsy and inefficient
• Clumsy: writing a lot of code to do something that ought to be simple
• Inefficient: repeatedly searching the entire list
• There's an easier way…

## String Formatting

• But before we start, let's tidy up a few loose ends
• Python's `%` operator is used to format strings
• Left side is a string specifying how things are to be formatted
• Right side is the values to be formatted
• One value on its own…
• …or a tuple if there are many values
• `'here %s go' % 'we'` creates a new string `"here we go"`
• The `"%s"` in the left string means “insert a string here”
• So the string on the right is inserted
• Since strings are immutable, this creates a new string (does not modify the format string in place)
• `'left %d right %d' % (-1, 1)` creates `"left -1 right 1"`
• `"%d"` stands for “decimal integer”
• Format specifiers are matched with values left to right
• `'do %s %d times' % ('this', 99)` creates `"do this 99 times"`
• `'[%4d]' % 13` creates `"[  13]"`
• The integer between the `"%"` and `"d"` specifies a width
• `'[%-4d]' % 13` creates `"[13  ]"`, because negative widths mean “left justify”
• `'[%6.2f %%]' % 37.2` creates `"[ 37.20 %]"`
• `"%6.2f"` means “a floating point number, six characters wide, with two after the decimal point”
• `"%%"` is translated into a single `"%"` character
• Just as `"\\"` is how you represent a single `"\"` in a string
• `'[%6.2e %%]' % 37.2` creates `"[3.72e+001 %]"`
• `"%e"` is scientific (exponential) notation
• Python will use more characters than it's been told to if it has to

## Dictionaries

• A sequence (such as a string or list) associates one value with each integer from 0 to N-1
• A dictionary associates one value with each of its keys
• Also called maps, hashes, and associative arrays
• Keys don't have to be integers
• In fact, they are usually strings
• Often visualized as a two-column table
• Each key in the left column must be unique
• Values can occur several times
• Example planet names as keys, and the lengths of their years as values
•  Mercury 87.97 Venus 224.70 ⋮ ⋮ Pluto 90550.0

## The Mechanics

• Create a dictionary by putting key/value pairs inside `{}`
• `{'Newton':1642, 'Darwin':1809}`
• Empty dictionary written `{}`
• Get a value from an existing dictionary using `[]` (just like everything else in Python)
• ```birthday = {
'Newton' : 1642,
'Darwin' : 1809
}
print "Darwin's birthday:", birthday['Darwin']
print "Newton's birthday:", birthday['Newton']
```
```Darwin's birthday: 1809
Newton's birthday: 1642
```
• Assigning to a dictionary key:
• Creates a new entry if the key is not already in dictionary
• Overwrites the previous value if the key is already present
• ```birthday = {}
birthday['Darwin'] = 1809
birthday['Newton'] = 1942  # oops
birthday['Newton'] = 1642
print birthday
```
```{'Darwin': 1809, 'Newton': 1642}
```
• Can only access keys that are present
• Just as you can't index elements of a list that aren't there
• ```birthday = {
'Newton' : 1642,
'Darwin' : 1809
}
print birthday['Turing']
```
```Traceback (most recent call last):
File "key_error.py", line 5, in ?
print birthday['Turing']
KeyError: 'Turing'
```
• Remove an entry using `del d[k]`
• Can only remove entries that are actually present
• ```birthday = {
'Newton' : 1642,
'Darwin' : 1809,
'Turing' : 1912
}

print 'Before deleting Turing:', birthday
del birthday['Turing']
print 'After deleting Turing:', birthday
```
```Before deleting Turing: {'Turing': 1912, 'Newton': 1642, 'Darwin': 1809}
After deleting Turing: {'Newton': 1642, 'Darwin': 1809}
```
• `for k in d` loops over the dictionary's keys (rather than its values)
• Note: this is different from lists, where `for` loops over the values, rather than the “keys” (indices)
• ```birthday = {
'Newton' : 1642,
'Darwin' : 1809,
'Turing' : 1912
}

for name in birthday:
print name, birthday[name]
```
```Turing 1912
Newton 1642
Darwin 1809
```
• Test whether a key `k` is in a dictionary `d` using `k in d`
• Once again, inconsistent with behavior of lists, but useful
• ```birthday = {
'Newton' : 1642,
'Darwin' : 1809
}

for name in ['Newton', 'Turing']:
if name in birthday:
print name, birthday[name]
else:
print 'Who is', name, '?'
```
```Newton 1642
Who is Turing ?
```

## Counting Frequency

• Revisit the problem of counting how often things appear in a file
• Use the things (email addresses) as keys in a dictionary
• The value associated with each key is the number of times it has been seen so far
• ```# Data to count.
names = ['Be','Mg','Mg','Ca','Be','Mg', 'Be']

# Build a dictionary of frequencies.
freq = {}
for name in names:

# Already seen, so increment count by one.
if name in freq:
freq[name] = freq[name] + 1

# Never seen before, so add to dictionary.
else:
freq[name] = 1

# Display.
print freq
```
```{'Be': 3, 'Mg': 3, 'Ca': 1}
```
•  Figure 10.1: Counting Frequency
• Note: dictionaries do not store keys in order
• No way to predict where a dictionary will insert a new value
• Paradoxically, this is part of what makes dictionaries so fast
• So, to print frequency in alphabetic order by address:
• Get the list of keys
• Sort that list
• Loop over it
• ```# Build the frequency dictionary as before.
names = ['Be','Mg','Mg','Ca','Be','Mg', 'Be']
freq = {}
for name in names:
if name in freq:
freq[name] = freq[name] + 1
else:
freq[name] = 1

# Print in alphabetical order by key.
keys = freq.keys()
keys.sort()
for k in keys:
print k, freq[k]
```
```Be 3
Ca 1
Mg 3
```
• Can simplify this code using `dict.get`
• Get either the count associated with the key, or 0, then add one to it
• ```# Use 'get' to simplify the counting loop.
names = ['Be','Mg','Mg','Ca','Be','Mg', 'Be']
freq = {}
for name in names:
freq[name] = freq.get(name, 0) + 1

# Print.
keys = freq.keys()
keys.sort()
for k in keys:
print k, freq[k]
```
```Be 3
Ca 1
Mg 3
```
• But how to print in order of frequency?
• Need to invert the dictionary (i.e., swap keys and values)
• Problem: there might be collisions, since values aren't guaranteed to be unique
• What is the inverse of `{'a':1, 'b':1, 'c':1}`?
• Solution: store a list of values instead of just a single value
• ```# Count values as before.
names = ['Be','Mg','Mg','Ca','Be','Mg', 'Be']
freq = {}
for name in names:
freq[name] = freq.get(name, 0) + 1

# Invert.
inverse = {}
for (key, value) in freq.items():
seen = inverse.get(value, [])
seen.append(key)
inverse[value] = seen

# Print.
keys = inverse.keys()
keys.sort()
for k in keys:
print k, inverse[k]
```
```1 ['Ca']
3 ['Be', 'Mg']
```
•  Figure 10.2: Tracing Execution of Dictionary Inversion

## Formatting Strings with Dictionaries

• Complex string formatting can be hard to understand
• Especially if one value needs to be used several times
• E.g., producing an email form letter, and need to put the person's name in three places
• Instead of a tuple, `"%"` can take a dictionary as its right argument
• Use `"%(varname)s"` or `"%(varname)4d"` inside format string to identify what's to be substituted
• `'%(word)s, t%(word)s, and everyw%(word)s' % {'word' : 'here'}` creates `"here, there, and everywhere"`

## Catching Errors

• Like other modern languages, Python uses exceptions to handle errors
• Separating normal (expected) operation from exceptional cases makes code easier to read
• Structured like `if`/`else`
• Introduce a block of statements with `try`
• Then introduce error handling code with `except`
• When something goes wrong in the `try` block, Python raises an exception
• Python finds the matching `except` block, and executes whatever instructions it contains
• ```print 'before try/except'

try:
print '..first line of try block'
print 1/0
print '..last line of try block'
except:
print '..handling error!'

print 'after try/except'
```
```before try/except
..first line of try block
..handling error!
after try/except
```
•  Figure 10.3: Flow of Control in Try/Except
• You can include code to execute when there isn't an error using `else`
• ```for val in (0.0, 1.0):
print "val is", val
try:
x = 1.0/val
except:
print '..handling error!'
else:
print '...in the else block'
```
```val is 0.0
..handling error!
val is 1.0
...in the else block
```

## Exception Objects

• When an error occurs, Python actually creates an object to store information about it
• Typically an error message
• Can choose to handle specific errors by specifying an exception type in the `except` statement
• E.g., handle division by zero, but not out-of-bounds list index
• Syntax is `except ExceptionType, variable`
• `ExceptionType` is what you want to catch
• If something of that type is raised, the exception object is assigned to `variable`
• ```values = [-1.0, 0.0, 1.0]
for i in range(4):    # note: top index will be out of bounds
try:
x = 1.0 / values[i]
except ZeroDivisionError, e:
print 'divide by zero:', e
```
```divide by zero: float division
```
```Traceback (most recent call last):
File "except_with_type.py", line 4, in ?
x = 1.0 / values[i]
IndexError: list index out of range
```
• You can associate any number of `except` blocks with a single `try`
• They're tested in order—whichever matches first, wins
• If a “naked” `except` appears, it must be the last one (since it catches everything)
• Better to use `except Exception, e` so that you have the exception object
• Exceptions are organized in a hierarchy
• E.g., `ZeroDivisionError`, `OverflowError`, and `FloatingPointError` are all specialized versions of `ArithmeticError`
• And `IndexError` (list/string index out of bounds) and `KeyError` (non-existent dictionary key) are specializations of `LookupError`
• ```for i in range(2):
try:
if i == 0:
vals = ['a', 'b']
x = vals[99]
else:
vals = {20 : 'a', 30 : 'b'}
x = vals[40]
except KeyError, e:
print 'loop %d, handling key errors:' % i, e
except LookupError, e:
print 'loop %d, handling generic lookup errors:' % i, e
```
```loop 0, handling generic lookup errors: list index out of range
loop 1, handling key errors: 40
```
• We'll see in a later lecture how this hierarchy is implemented
• Hint: it has something to do with objects…
• Common exception types
• NamePurpose
`Exception`Root of exception hierarchy.
`ArithmeticError`Illegal arithmetic operation
`IndexError`Bad index to sequence (out of bounds or illegal type)
`KeyError`Nonexistent index to dictionary
`TypeError`Illegal type (e.g., trying to add integer and string)
`ValueError`Illegal value (e.g., `math.sqrt(-1)`)
`IOError`Unable to create or open file, read data, etc.

Table 10.2: Common Exception Types

## Functions and Exceptions

• So if a function raises an exception, who gets to handle it?
• Each time Python enters a `try`/`except` block, it pushes the `except` handlers on a stack
• Just like the function call stack
• When an exception is raised, Python look in this stack, and jumps to the first matching handler
• ```def invert(vals, index):
try:
vals[index] = 1/vals[index]
except ArithmeticError, e:
print 'inner exception handler:', e

def each(func, vals, indices):
try:
for i in indices:
func(vals, i)
except IndexError, e:
print 'outer exception handler:', e

# Note: top index will be out of bounds
allValues = [-1, 0, 1]
allIndices = [0, 1, 2, 3]
each(invert, allValues, allIndices)
```
```calling func for index 0
calling func for index 1
inner exception handler: integer division or modulo by zero
calling func for index 2
calling func for index 3
outer exception handler: list index out of range
```
•  Figure 10.4: Stacking Exception Handlers

## Raising Exceptions

• Use `raise` to trigger exception processing
• Obviously, have to specify the type of exception you're raising
• E.g., `raise Exception('this is an error message')`
• ```for i in range(4):
try:
# Raise exception if value is odd.
if (i % 2) == 1:
raise ValueError('index is odd')
else:
print 'not raising exception for %d' % i
except ValueError, e:
print 'caught exception for %d' % i, e
```
```not raising exception for 0
caught exception for 1 index is odd
not raising exception for 2
caught exception for 3 index is odd
```
• Raise exceptions to report errors in functions
• Rather than returning `None` or -1 or something like that
• `list.find` (which returns -1 if something can't be found) is unusual in this respect
• But it's more convenient than `list.index`, which throws an exception
• General rule: throw low, catch high
• I.e., throw lots of very specific exceptions, but only catch them in a few places, high in your code, where you can take corrective action
• Reason #1: every application handles errors differently
• If someone is using your library in a GUI, you don't want to be printing to `stderr`
• Reason #2: error handling code gets in the way of understanding the algorithm, so disentangle the two
• ```import sys

# Look for the first matching line in a file.
def findFirstMatchingLine(filename, word):
infile = open(filename, 'r')
infile.close()
for line in lines:
if line.find(word) >= 0:
return line.rstrip()
errMsg = '%s does not contain %s' % (filename, word)
raise ValueError(errMsg)

if __name__ == '__main__':
try:
word = sys.argv[1]
for filename in sys.argv[2:]:
line = findFirstMatchingLine(filename, word)
print '%s / %s => %s' % (filename, word, line)
except ValueError, e:
print >> sys.stderr, e
except IOError, e:
print >> sys.stderr, e
except Exception, e:
print >> sys.stderr, e
```
```import sys

# Look for the first matching line in a file.
def findFirstMatchingLine(filename, word):
infile = open(filename, 'r')
infile.close()
for line in lines:
if line.find(word) >= 0:
return line.rstrip()
errMsg = '%s does not contain %s' % (filename, word)
raise ValueError(errMsg)

if __name__ == '__main__':
try:
word = sys.argv[1]
for filename in sys.argv[2:]:
line = findFirstMatchingLine(filename, word)
print '%s / %s => %s' % (filename, word, line)
except ValueError, e:
print >> sys.stderr, e
except IOError, e:
print >> sys.stderr, e
except Exception, e:
print >> sys.stderr, e
```

## Assertions

• An assertion is a claim about the state of a program that must be true if the program is correct
• E.g., “this list is not empty”, or “this value is greater than that one”
• Three kinds of assertions:
• A pre-condition is something that must be true in order for something else to work
• E.g., “argument is non-negative” is a pre-condition for `math.sqrt`
• A post-condition is something that is guaranteed to be true after something else happens
• E.g., “result is either -1 or a legal index to the string” is a post-condition of `string.find`
• Anything else is simply called a state assertion
• E.g., “this loop always executes at least once”
• Embed assertions in programs using the `assert` statement
• First argument is a Boolean condition
• Second (optional) is an error message
• If the condition isn't satisfied, Python raises an `AssertionError` exception
• ```def findRange(values):
assert (values != None) and (len(values) > 0), 'Illegal input'
left = min(values)
right = max(values)
assert left < right, 'Empty range'
return (left, right)
```
• Good programmers practice defensive programming by putting lots of assertions in their code
• “Executable documentation”
• Describes how the program is supposed to behave, in a way that the computer can check
• Can't ever be out of step with the actual code, since it is code
• Fail early, fail often
• The less time that elapses between when an error occurs, and when its effects show up, the easier it is to fix
• Bugs grow up to be assertions
• If you made an error once, there's a good chance you'll make it again
• Adding an assertion helps you find and fix it faster the second time
• Assertions should only be used to check the program's correctness, not the correctness of its inputs
• I.e., don't use `assert` to check that your program managed to open a file
• It isn't the program's fault if it doesn't have read permission…
• Complex assertions are more likely to contain bugs than to catch them
• If it won't fit comfortably on one line, it probably shouldn't be an assertion

## Running Other Programs

• One program can run another by spawning a new process
• Make and the shell both do this
• So can your programs
• `os.popen` runs a command, and lets you either:
• Write to its standard input, or
• Read from its standard output
•  Figure 10.5: Running Another Program
• Example: read the output of the `ls` command into your program
• ```import os

instream = os.popen('ls', 'r')
instream.close()
for line in lines[:4]:
line = line.rstrip()
print '%4d: %s' % (len(line), line)
```
```  13: addresses.txt
13: assertions.py
14: create_dict.py
18: create_dict.py.out
```
• Example: send the output of your program to `gzip` for compression:
• ```import os

# Create compressed output and report its size.
outstream = os.popen('gzip -c > temp.txt.gz', 'w')
for i in range(1000):
for word in 'this is a test'.split():
print >> outstream, word
outstream.close()
print 'output is %d bytes' % os.stat('temp.txt.gz').st_size

# See how big it would have been.
instream = os.popen('gunzip -c temp.txt.gz', 'r')
total = 0
for line in instream.readlines():
total += len(line)
instream.close()
print 'output would have been %d bytes' % total
```
```output is 79 bytes
output would have been 15000 bytes
```
• You can use `os.popen2` to connect to the other program's input and output simultaneously
• ```import os

childIn, childOut = os.popen2('sort')
for word in 'this is a test'.split():
print >> childIn, word
childIn.close()
for line in childOut.readlines():
print line.rstrip()
childOut.close()
```
```a
is
test
this
```
• But you must be very careful about deadlock
• If the child is trying to write to the parent while the parent is trying to write to the child, both are blocked
• The operating system can only hold on to a limited amount of input or output data at a time, so increasing the buffer size only delays the problem
•  Figure 10.6: Deadlock
• Things are even trickier if you use `os.popen3`, which gives you the child process's `stdin`, `stdout`, and `stderr`
• If the parent is trying to read from the child's `stdout` while it's trying to write an error message to `stderr`, both programs will be blocked
• Solutions are outside the scope of this course
• Use `os.popen` (with temporary files if necessary) until you're comfortable enough to move on

## Exercises

Exercise 10.1:

Suppose you wanted to sort entries with the same frequency alphabetically. What changes would you have to make to `compareByFrequency`?