Software Carpentry

Welcome

Course Outline

Acknowledgments

Introduction

Motivation

Meeting Standards

The Most Important Idea in This Course

Who You Are

A Quick Self-Test

Learn by Building

Topics

Setting Up

Recommended Reading

Typographic Conventions

Version Control

Problem #1: Synchronizing Files

Problem #2: Undoing Changes

Solution: Version Control

CVS and Subversion

Basic Use

How To Do It

Working Together

What Versions Actually Mean

Warning: Binary Files

Rolling Back Changes

And Finally, Getting Started

Subversion Command Reference

How to Read Subversion Output

Branching and Merging

Exercises

Exercise 3.1:

Follow the instructions given to you by your instructor to check out a copy of the Subversion repository you'll be using in this course. Unless otherwise noted, the exercises below assume that you have done this, and that your working copy is in a directory called course. You will submit all of your exercises in this course by checking files into your repository.

Exercise 3.2:

Create a file course/ex01/bio.txt (where course is the root of your working copy of your Subversion repository), and write a short biography of yourself (100 words or so) of the kind used in academic journals, conference proceedings, etc. Commit this file to your repository. Remember to provide a meaningful comment when committing the file!

Exercise 3.3:

What's the difference between mv and svn mv? Put the answer in a file called course/ex01/mv.txt and commit your changes.

Once you have committed your changes, type svn log in your course directory. If you didn't know what you'd just done, would you be able to figure it out from the log messages? If not, why not?

Exercise 3.4:

In this exercise, you'll simulate the actions of two people editing a single file. To do that, you'll need to check out a second copy of your repository. One way to do this is to use a separate computer (e.g., your laptop, your home computer, or a machine in the lab). Another is to make a temporary directory, and check out a second copy of your repository there. Please make sure that the second copy isn't inside the first, or vice versa—Subversion will become very confused.

Let's call the two working copies Blue and Green. Do the following:

a) Create Blue/ex01/planets.txt, and add the following lines:

Mercury
Venus
Earth
Mars
Jupiter
Saturn

Commit the file.

b) Update the Green working copy. (You should get a copy of planets.txt.)

c) Change Blue/ex01/planets.txt so that it reads:

1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn

Commit the changes.

d) Edit Green/ex01/planets.txt so that its contents are as shown below. Do not do svn update before editing this file, as that will spoil the exercise.

Mercury 0
Venus 0
Earth 1
Mars 2
Jupiter 16 (and counting)
Saturn 14 (and counting)

e) Now, in Green, do svn update. Subversion should tell you that there are conflicts in planets.txt. Resolve the conflicts so that the file contains:

1. Mercury 0
2. Venus 0
3. Earth 1
4. Mars 2
5. Jupiter 16
6. Saturn 14

Commit the changes.

f) Update the Blue working copy, and check that planets.txt now has the same content as it has in the Green working copy.

Exercise 3.5:

Add another line or two to course/ex01/bio.txt and commit those changes. Then, use svn merge to restore the original contents of your biography (course/ex01/bio.txt), and commit the result. When you are done, bio.txt should look the way it did when you first committed it in Exercise 3.2. Note: the purpose of this exercise is to teach you how to go back in time to get old versions of files. While it would be simpler in this case just to edit bio.txt, you can't (reliably) do that when you've made larger changes, to multiple files, over a longer period of time.

Shell Basics

Introduction

The Shell vs. the Operating System

The File System

A Few Simple Commands

Creating Files and Directories

Wildcards

Exercises

Exercise 4.1:

Suppose you are in your home directory, and ls shows you this:

Makefile        biography.txt   data
enrolment.txt   programs        thesis

What argument(s) do you have to give to ls to get it to put a trailing slash after the names of subdirectories, like this:

Makefile        biography.txt   data/
enrolment.txt   programs/       thesis/

If you run ls data, it shows:

earth.txt       jupiter.txt     mars.txt
mercury.txt     saturn.txt      venus.txt

What command should you run to get the following output:

data/earth.txt          data/jupiter.txt        data/mars.txt
data/mercury.txt        data/saturn.txt         data/venus.txt

What if you want this (note that an extra entry is being displayed):

total 7
drwxr-xr-x    7 someone        0 May  6 08:27 .svn
-rw-r--r--    1 someone     2396 May  6 08:38 earth.txt
-rw-r--r--    1 someone     1263 May  6 08:38 jupiter.txt
-rw-r--r--    1 someone     1015 May  6 08:43 mars.txt
-rw-r--r--    1 someone      946 May  6 08:41 mercury.txt
-rw-r--r--    1 someone     1714 May  6 08:40 saturn.txt
-rw-r--r--    1 someone      881 May  6 08:40 venus.txt

Note: the command will display your user ID, rather than someone. On some machines, the command will also display a group ID. Ignore these differences for the purpose of this question.

Exercise 4.2:

According to the listing of the data directory above, who can read the file mercury.txt? Who can write it (i.e., change its contents or delete it)? When was mercury.txt last changed? What command would you run to allow everyone to edit or delete the file?

Exercise 4.3:

Suppose you want to remove all files whose names (not including their extensions) are of length 3, start with the letter a, and have .txt as their extension. What command would you use? For example, if the directory contains three files a.txt, abc.txt, and abcd.txt, the command should remove abc.txt, but not the other two files.

Exercise 4.4:

What does the command cd ~ do? What about cd ~gvwilson?

Exercise 4.5:

What's the difference between the commands cd HOME and cd $HOME?

Exercise 4.6:

Suppose you want to list the names of all the text files in the data directory that contain the word "carpentry". What command or commands could you use?

Exercise 4.7:

Suppose you have written a program called analyze. What command or commands could you use to display the first ten lines of its output? What would you use to display lines 50-100? To send lines 50-100 to a file called tmp.txt?

Exercise 4.8:

The command ls data > tmp.txt writes a listing of the data directory's contents into tmp.txt. Anything that was in the file before the command was run is overwritten. What command could you use to append the listing to tmp.txt instead?

Exercise 4.9:

What command(s) would you use to find out how many subdirectories there are in the lectures directory?

Exercise 4.10:

What does rm *.ch do? What about rm *.[ch]?

Exercise 4.11:

What command(s) could you use to find out how many instances of a program are running on your computer at once? For example, if you are on Windows, what would you do to find out how many instances of svchost.exe are running? On Unix, what would you do to find out how many instances of bash are running?

Exercise 4.12:

What do the commands pushd, popd, and dirs do? Where do their names come from?

Exercise 4.13:

How would you send the file earth.txt to the default printer? How would you check it made it (other than wandering over to the printer and standing there)?

Exercise 4.14:

A colleague asks for your data files. How would you archive them to send as one file? How could you compress them?

Exercise 4.15:

The instructor wants you to use a hitherto unknown command for manipulating files. How would you get help on this command?

Exercise 4.16:

You have changed a text file on your home PC, and mailed it to the university terminal. What steps can you take to see what changes you may have made, compared with a master copy in your home directory?

Exercise 4.17:

How would you change your password?

Exercise 4.18:

grep is one of the more useful tools in the toolbox. It finds lines in files that match a pattern and prints them out. For example, assume I have files earth.txt and venus.txt containing lines like this:

Name: Earth
Period: 365.26 days
Inclination: 0.00
Eccentricity: 0.02

If I type grep Period *.txt in that directory, I get:

earth.txt:Period: 365.26 days
venus.txt:Period: 224.70 days

Search strings can use regular expressions, which will be discussed in a later lecture. grep takes many options as well; for example, grep -c /bin/bash /etc/passwd reports how many lines in /etc/passwd (the Unix password file) contain the string /bin/bash, which in turn tells me how many users are using bash as their shell.
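As a rough illustration of what grep -c reports, the counting step can be sketched in Python. This is only a sketch: the function name count_matching_lines and the sample entries are made up for illustration, and it uses a plain substring test rather than grep's regular expression matching.

```python
def count_matching_lines(lines, text):
    """Return how many of the given lines contain text (plain substring test)."""
    return sum(1 for line in lines if text in line)


# Made-up entries in the style of /etc/passwd; these are not real accounts.
sample = [
    "alice:x:1001:1001::/home/alice:/bin/bash",
    "bob:x:1002:1002::/home/bob:/bin/tcsh",
    "carol:x:1003:1003::/home/carol:/bin/bash",
]

# Two of the three sample lines mention /bin/bash.
print(count_matching_lines(sample, "/bin/bash"))
```

Like grep -c, the sketch counts matching lines rather than individual matches, so a line that mentions the string twice is still counted once.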

Suppose all you wanted was a list of the files that contained lines matching a pattern, rather than the matches themselves—what flag or flags would you give to grep? What if you wanted the line numbers of matching lines?

Exercise 4.19:

diff finds and displays the differences between two files. It works best if both files are plain text (i.e., not images or Excel spreadsheets). By default, it shows the differences in groups, like this:

3c3,4
< Inclination: 0.00
---
> Inclination: 0.00 degrees
> Satellites: 1

(The rather cryptic header "3c3,4" means that line 3 of the first file must be changed to get lines 3-4 of the second.)
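To see where a header like "3c3,4" comes from, the sketch below rebuilds it with Python's standard difflib module. The two lists are assumed file contents based on the sample lines shown earlier, the helper name change_headers is hypothetical, and only changed ("c") regions are handled; diff itself also reports pure additions and deletions.

```python
import difflib


def change_headers(old_lines, new_lines):
    """Return diff-style headers such as '3c3,4' for each replaced region."""
    def span(start, stop):
        # Convert a 0-based half-open range to diff's 1-based inclusive form.
        first, last = start + 1, stop
        return str(first) if first == last else "%d,%d" % (first, last)

    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    headers = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            headers.append(span(i1, i2) + "c" + span(j1, j2))
    return headers


# Assumed contents of the two files being compared.
old = ["Name: Earth", "Period: 365.26 days",
       "Inclination: 0.00", "Eccentricity: 0.02"]
new = ["Name: Earth", "Period: 365.26 days",
       "Inclination: 0.00 degrees", "Satellites: 1", "Eccentricity: 0.02"]

print(change_headers(old, new))
```

The numbers before the "c" refer to lines in the first file, and the numbers after it refer to lines in the second.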

What flag(s) should you give diff to tell it to ignore changes that just insert or delete blank lines? What if you want to ignore changes in case (i.e., treat lowercase and uppercase letters as the same)?

Exercise 4.20:

Suppose you wanted ls to sort its output by filename extension, i.e., to list all .cmd files before all .exe files, and all .exe's before all .txt files. What command or commands would you use?

More Shell

Redirecting Input and Output

Pipes

Environment Variables

How the Shell Finds Programs

Basic Tools

Ownership and Permission: Unix

Ownership and Permission: Windows

More Advanced Tools

Exercises

Exercise 5.1:

You're worried your data files can be read by your nemesis, Dr. Evil. How would you check whether or not he can, and if necessary change permissions so only you can read or write the files?

Basic Scripting

Why Python?

Running Python Interactively

Running Saved Programs

Variables

Printing and Quoting

Numbers and Arithmetic

Booleans

Comparisons

Conditionals

While Loops, Break, and Continue

Strings, Lists, and Files

Where We Just Were

But First, Strings

Slicing, Bounds, and Negative Indices

String Methods

Lists

List Methods

For Loops and Ranges

Membership

Nesting Lists

Tuples

Files

Other Ways to Do It

Exercises

Exercise 7.1:

What does "aaaaa".count("aaa") return? Why?

Exercise 7.2:

What does the built-in function enumerate do? Use it to write a function called findOver that takes a list of numbers called values, and a number called threshold, as arguments, and returns a list of the locations where items in values are greater than threshold. For example, findOver([1.1, 3.8, -1.6, 7.4], 2.0) should return [1, 3], since the values in the input list at locations 1 and 3 are greater than the threshold 2.0.

Exercise 7.3:

What does each of the following five code fragments do? Why?

x = ['a', 'b', 'c', 'd']
x[0:2] = []

x = ['a', 'b', 'c', 'd']
x[0:2] = ['q']

x = ['a', 'b', 'c', 'd']
x[0:2] = 'q'

x = ['a', 'b', 'c', 'd']
x[0:2] = 99

x = ['a', 'b', 'c', 'd']
x[0:2] = [99]

Exercise 7.4:

What does 'a'.join(['b', 'c', 'd']) return? If you have a list of strings, how can you concatenate them in a single statement? Why do you think join is written this way, rather than as ['b', 'c', 'd'].join('a')?

Functions, Libraries, and the File System

Where We Just Were

Defining Functions

Scope

Parameter Passing Rules

Default Parameter Values

Extra Arguments

Functions Are Objects

Creating Modules

The Math Library

The System Library

Times

Working with the File System

Manipulating Pathnames

Knowing Where You Are

Where to Learn More

Exercises

Exercise 8.1:

Write a function that takes two strings called text and fragment as arguments, and returns the number of times fragment appears in the second half of text. Your function must not create a copy of the second half of text. (Hint: read the documentation for string.count.)

Exercise 8.2:

What does the Python keyword global do? What are some reasons not to write code that uses it?

Exercise 8.3:

Consider the following sample of code and its output:

def settings(first, **rest):
    print 'first is', first
    print 'rest is'
    for (name, value) in rest.items():
        print '...', name, value
    print

settings(1)
settings(1, two=2, three="THREE")

first is 1
rest is

first is 1
rest is
... two 2
... three THREE

What does the variable rest do? What does the double asterisk ** in front of its name mean? How does it compare to the example with *extra (with a single asterisk) in the lecture?

Exercise 8.4:

Python allows you to import all the functions and variables in a module at once, making them local names. For example, if the module is called values, and contains a variable called Threshold and a function called limit, then after the statement from values import *, you can refer directly to Threshold and limit, rather than having to use values.Threshold or values.limit. Explain why this is generally considered a bad thing to do, even though it reduces the amount programmers have to type.

Exercise 8.5:

sys.stdin, sys.stdout, and sys.stderr are variables, which means that you can assign to them. For example, if you want to change where print sends its output, you can do this:

import sys

print 'this goes to stdout'
temp = sys.stdout
sys.stdout = open('temporary.txt', 'w')
print 'this goes to temporary.txt'
sys.stdout = temp

Do you think this is a good programming practice? When and why do you think its use might be justified?

Exercise 8.6:

os.stat(path) returns an object whose members describe various properties of the file or directory identified by path. Using this, write a function that will determine whether or not a file is more than one year old.
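As a starting point, the sketch below shows what reading one member of the object returned by os.stat looks like. It creates a throwaway temporary file so that it can run anywhere; the variable names are illustrative only, and the other members are listed in the os.stat documentation.

```python
import os
import tempfile

# Create a temporary file containing exactly five bytes.
handle, path = tempfile.mkstemp(suffix=".txt")
try:
    os.write(handle, b"hello")
finally:
    os.close(handle)

# os.stat returns an object whose members describe the file.
info = os.stat(path)
size_in_bytes = info.st_size
print(size_in_bytes)

# Clean up the temporary file.
os.remove(path)
```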

Exercise 8.7:

Write a Python program that takes as its arguments two years (such as 1997 and 2007), and prints out the number of days between the 15th of each month from January of the first year until December of the last year.

Exercise 8.8:

Write a simple version of which in Python. Your program should check each directory on the caller's path (in order) to find an executable program that has the name given to it on the command line.

Testing Basics

Motivation

Terminology

Example: Rectangle Overlap

General Rules for Unit Tests

A Simple Testing Framework

Choosing Test Cases

Dictionaries and Error Handling

Motivation

String Formatting

Dictionaries

The Mechanics

Dictionary Methods

Counting Frequency

Formatting Strings with Dictionaries

Catching Errors

Exception Objects

Functions and Exceptions

Raising Exceptions

Assertions

Running Other Programs

Exercises

Exercise 10.1:

Suppose you wanted to sort entries with the same frequency alphabetically. What changes would you have to make to compareByFrequency?

Debugging

What It Is

What's Wrong with Print Statements

Symbolic Debuggers

Running in a Debugger

Basic Operations

How Debuggers Work

Advanced Operations

Rule 0: Get It Right the First Time

Rule 1: What Is It Supposed to Do?

Rule 2: Is It Plugged In?

Rule 3: Make It Fail

Rule 4: Divide and Conquer

Rule 5: Change One Thing at a Time, For a Reason

Rule 6: Write It Down

Rule 7: Be Humble

Summary

Object-Oriented Programming

Motivation

A Naked Class

Methods

Defining a Queue

Special Methods

Inheritance

Polymorphism

The Substitution Principle

Class Members

Overloading Operators

Structured Unit Testing

A Unit Testing Framework

Mechanics

Testing a Function

Eliminating Redundancy

Testing for Failure

Testing I/O

Testing With Classes

Test-Driven Development

Exercises

Exercise 13.1:

Python has another unit testing module called doctest. It searches files for sections of text that look like interactive Python sessions, then re-executes those sections and checks the results. A typical use is shown below.

def ave(values):
    '''Calculate an average value, or 0.0 if 'values' is empty.
    >>> ave([])
    0.0
    >>> ave([3])
    3.0
    >>> ave([15, -1.0])
    7.0
    '''

    sum = 0.0
    for v in values:
        sum += v
    return sum / float(max(1, len(values)))

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Convert a handful of the tests you have written for other questions in this lecture to use doctest. Do you prefer it to unittest? Why or why not? Do you think doctest makes it easier to test small problems? Large ones? Would it be possible to write something similar for C, Java, Fortran, or Mathematica?

Automated Builds

How Do You Rebuild A Program?

Automate, Automate, Automate

Our Example

Hello, Make

Multiple Targets

Phony Targets

Automatic Variables

Pattern Rules

Dependencies

Defining Macros

Analysis

Exercises

Exercise 14.1:

How can you stop Make from removing intermediate files automatically when it finishes processing?

Exercise 14.2:

Make gets definitions from environment variables, command-line parameters, and explicit definitions in Makefiles. What order does it check these in?

Coding Style and Reading Code

Introduction

Why Read Code?

Seven Plus or Minus

What Does This Function Do?

Naming

Idioms

Style Tools

What About Documentation?

Traceability

Executable Documentation

Active Reading

Summary

Watching Programs Run

Turing's Great Insight

Faking Objects

How Other Languages Do It

Runtime Tricks

Coverage

Profiling

Summary

Exercises

Exercise 16.1:

What percentage of your code is tests? Is tested?

Exercise 16.2:

Can you honestly say that you write tests before code? Find out how many tests currently pass or fail with a single command? Identify the tests associated with a bug? Tell if your code meets the team's standards?

Exercise 16.3:

Can you find out which functions use the most CPU time? How long threads spend blocked on I/O? Who allocates memory, where, for what? How accurate these numbers are? How today's profile differs from last month's? How the profile differs across machines?

Regular Expressions

Introduction

A Simple Example

Anchoring

Escape Sequences

Extracting Matches

Compiling

Using REs in Other Languages

But Wait, There's More

Exercises

Exercise 17.1:

By default, regular expression matches are greedy: the first term in the RE matches as much as it can, then the second part, and so on. As a result, if you apply the RE ⌈X(.*)X(.*)⌋ to the string "XaX and XbX", the first group will contain "aX and Xb", and the second group will be empty.
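The greedy behavior described above can be checked directly with Python's re module. This is a minimal sketch that only reproduces the default (greedy) match; the variable names are illustrative.

```python
import re

# Apply the RE from the text to the sample string.
match = re.match(r"X(.*)X(.*)", "XaX and XbX")
first, second = match.groups()

# The first group has taken as much as it can, leaving the second group empty.
print(repr(first))
print(repr(second))
```

The first .* initially consumes the rest of the string, then gives back characters one at a time until the second X in the pattern can match, which is why it stops at the last X in the string.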

It's also possible to make REs match reluctantly, i.e., to have the parts match as little as possible, rather than as much. Find out how to do this, and then modify the RE in the previous paragraph so that the first group winds up containing "a", and the second group " and XbX".

Basic XML and XHTML

Overview

History

Formatting Rules

XHTML

Attributes

More XHTML Tags

Connecting to Other Data

Accessibility

The Document Object Model

The Basics

Creating a Tree

Walking a Tree

Modifying the Tree

Summary

A Mini-Project

Eating Your Own Cooking

Checking for Tabs

Running Tools

Checking for Printable Characters

Checking Glossary Entries

Checking Cross-References

Summary

Exercises

Exercise 19.1:

What does getopt do when it encounters an argument it doesn't recognize? Write a short program that demonstrates this behavior, that can be run on its own without the user passing in any command-line arguments.

Binary Data

Isn't It All 'Binary'?

How Numbers Are Stored

Bitwise Operators

Shifting

Floating Point