Software Carpentry
Exercise 2

Due: 5:00 p.m. EST, Friday 07 October 2005.

Weight: 10% of course grade.

Submit your solutions by adding files to the directory ex02 in your Subversion repository.

Background: your lab has just bought a machine that can run assay experiments in batches. Each time the machine runs, it performs 12 tests on 6 different concentrations of reagents at 4 different temperatures, and writes the results to a file. Each file has four sections; each section has 6 rows, and each row has 12 values, which are floating-point numbers separated by commas. An example of such an output file is in ex02/sample.dat.
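
As a rough illustration of the row format, the sketch below converts one comma-separated line into a list of floating-point numbers. The name parseRow is made up for this example and is not part of the assignment, and the sample row is invented rather than taken from ex02/sample.dat.

```python
def parseRow(line):
    # Hypothetical helper: split a comma-separated row into fields
    # and convert each field to a float.
    return [float(field) for field in line.split(',')]

row = parseRow('1.5,2.25,3.0')
```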

Your job is to write a Python program that reads such a file, transposes each section (so that rows become columns, and columns become rows), and writes each transposed section to a separate file. These files will be processed by a Fortran program that your supervisor wrote in 1977 and doesn't want to rewrite. You will do this in stages.

Question 1

Write a function called readLinesFromFile that takes a filename as its argument, and returns a list of the lines that were in the file, with leading and trailing whitespace stripped off. For example, if the file verysmall.dat contains:

1.1,2.2
3.3,4.4

5.5,6.6
7.7,8.8

then readLinesFromFile('verysmall.dat') should return the list:

[ '1.1,2.2', '3.3,4.4', '', '5.5,6.6', '7.7,8.8' ]

Note that the newlines (or carriage returns and newlines, if you're on Windows) have been stripped off. Note also that, as a result, the third entry in the list is the empty string.
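
The stripping described above matches what Python's built-in string method strip does. A small sketch, using made-up strings rather than real file contents:

```python
# strip() removes leading and trailing whitespace, including '\n' and '\r\n'.
unixLine = '1.1,2.2\n'
windowsLine = '3.3,4.4\r\n'
blankLine = '\n'

# A line that held nothing but its line ending becomes the empty string.
stripped = [unixLine.strip(), windowsLine.strip(), blankLine.strip()]
```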

Put your solution in a file called ex02/reader.py. Including the following lines at the bottom of the file will help you test your function:

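if __name__ == '__main__':
    # Runs only when this file is executed directly, not when it is imported.
    import sys
    inputFile = sys.argv[1]  # first command-line argument: the file to read
    lines = readLinesFromFile(inputFile)
    print(lines)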

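These lines run only when reader.py is executed as a program. They are skipped when the file is imported as a module, because __name__ is then the module's name rather than '__main__'.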

Question 2

Write a function called splitIntoSections that takes a list of lines (of the kind returned by readLinesFromFile) and returns a list of lists, with one sublist for each section of data. For example, given the list:

[ '1.1,2.2', '3.3,4.4', '', '5.5,6.6', '7.7,8.8' ]

splitIntoSections should create the following list of lists:

[
  ['1.1,2.2', '3.3,4.4'], 
  ['5.5,6.6', '7.7,8.8']
]

Note that you don't know in advance how many lines there are in each section, but you do know that each section is supposed to have the same number of lines.
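
One way to check the "same number of lines" expectation after splitting is sketched below. The function name sectionsHaveEqualLength is hypothetical, and treating an empty list of sections as valid is an assumption, not a requirement of the exercise.

```python
def sectionsHaveEqualLength(sections):
    # Hypothetical check: True if every sublist has the same number of lines.
    # An empty list of sections is treated as valid (an assumption).
    lengths = set(len(section) for section in sections)
    return len(lengths) <= 1
```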

Put splitIntoSections in a file called ex02/splitter.py. Include the following lines at the bottom of ex02/splitter.py to test your function:

if __name__ == '__main__':

    testCases = [
        [
            "One value per line, one line per section, one section",
            ['1.1'],
            [['1.1']]
        ],
        [
            "One value per line, two lines per section, one section",
            ['1.1','2.2'],
            [['1.1', '2.2']]
        ],
        [
            "One value per line, one line per section, two sections",
            ['1.1', '', '2.2'],
            [['1.1'], ['2.2']]
        ]
    ]

    # Run every test case and report whether the result matches what was expected.
    for (name, fixture, expected) in testCases:
        actual = splitIntoSections(fixture)
        if actual == expected:
            print('pass ' + name)
        else:
            print('fail ' + name)

Add at least ten more tests to the set shown above. One of your tests should use [] as input (i.e., should test what splitIntoSections does for an empty file). It's up to you to decide what the result should be.

Question 3

Write a function called transpose that takes a list of the lines in a single data section (i.e., one of the sublists produced by the splitIntoSections function you wrote above) and transposes the values, so that rows become columns and columns become rows. The function's output should be a list of strings. For example, given this list as input:

['1.1,2.2,3.3', '4.4,5.5,6.6']

transpose should produce the list:

['1.1,4.4', '2.2,5.5', '3.3,6.6']

Note that every string in the input should contain the same number of values (i.e., every row should have the same number of columns).
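
The built-in zip function is one common way to swap rows and columns once each row has been split into separate values. The sketch below works on lists of values rather than on the comma-separated strings that transpose must handle, and the name transposeValues is made up for illustration.

```python
def transposeValues(rows):
    # zip(*rows) groups the first items of every row, then the second items,
    # and so on. Note that zip silently truncates to the shortest row if the
    # rows differ in length.
    return [list(column) for column in zip(*rows)]

columns = transposeValues([['1.1', '2.2', '3.3'], ['4.4', '5.5', '6.6']])
```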

Put your solution in a file called ex02/transpose.py. Use the __name__=='__main__' convention to embed at least ten tests at the bottom of the file.

Question 4

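Write a program that reads an input file of the kind described above, splits it into sections, transposes each section, and writes each transposed section to a separate output file.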

If the input file is named something.dat, then the output files should be named something.1.dat, something.2.dat, and so on. If any of these files already exist, then none of the files should be overwritten; instead, the program should just print an error message to standard error.
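
The naming rule and the no-overwrite check might be sketched as follows. The helper names outputNames and anyExist are hypothetical; os.path.splitext and os.path.exists are standard-library functions. The sketch assumes that sections are numbered from 1.

```python
import os

def outputNames(inputName, sectionCount):
    # Hypothetical helper: 'something.dat' with 2 sections gives
    # 'something.1.dat' and 'something.2.dat'.
    stem, extension = os.path.splitext(inputName)
    return ['%s.%d%s' % (stem, number, extension)
            for number in range(1, sectionCount + 1)]

def anyExist(filenames):
    # True if at least one of the named files is already present.
    return any(os.path.exists(name) for name in filenames)

names = outputNames('something.dat', 2)
```

A program could call anyExist(names) before opening any file for writing, and use sys.stderr.write to report the problem if the result is true.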

Put your solution in ex02/process.py. Do not copy readLinesFromFile, splitIntoSections, or transpose into this file; instead, use the import statement to load them from the files they're already in.

Question 5

Make up an exercise that would be suitable for an introductory assignment on the shell. Put the exercise, your solution, and a brief explanation of what you think students would learn from doing the exercise, in ex02/shell-exercise.txt.

Question 6

Make up an exercise that would be suitable for an introductory assignment on Python (like this one). Put the exercise, your solution, and a brief explanation of what you think students would learn from doing the exercise, in ex02/python-exercise.txt.