Scientific Computing Spring 2006

Assignments

Assignment 0

Form groups of three people, as interdisciplinary as possible. Send an email to Jacob (jpr@cs.indiana.edu) with a CC to me (pgottsch@osl.iu.edu) which includes: It is important to send it as early as possible, because we have to set up version control repositories and mailing lists before you can start working. This is not a real assignment but rather a prerequisite for the real assignments, and there will be no points for it.

Assignment 1

Due date: Wednesday in a week (02/08/06 23:59:59)

Submission

Your group repository contains the three standard subdirectories: trunk, branches, and tags. For the submission we will only use the trunk directory. You can use branches if you work on different versions within your group. The directory trunk/hw0x shall contain only the answers to the questions. Please remove any temporary files before submission (e.g., after the first assignment hw01 must contain 6 files). When you submit the final version of your answers (before the due time), mark this in the commit message, e.g. for assignment 1:

svn commit -m 'homework 1 submission'

Question 1: Algorithmic Complexity

Find a more efficient algorithm to compute Pi than the integration methods presented in lecture 1.

For instance, you can search the web for appropriate power series. Write a short program to observe the convergence. As programming languages you can use C, C++, Perl, or Python; for others you have to reach an agreement with Jacob. Put your result into fast_pi.c, fast_pi.cpp, fast_pi.pl, or fast_pi.py. How many iterations do you need to get 8, 10, 12, and 14 correct digits? (Compare with a known constant, for instance.) Give a guess how the number of correct digits depends on the number of iterations. To be better than the integration, the number of correct digits must grow faster than O(log(N)) with the number of iterations N. Write this as plain text into fast_pi.analysis.

Question 2: Vector Operations

Write a short C or C++ program that computes the vector addition z = x + y in three different ways for different vector sizes.

Put your complete gcc-compilable program into vector_addition.cpp. It is up to you whether you use C arrays or STL vectors of doubles. You are free to use inheritance for a more concise implementation. The three calculation versions are

As vector lengths use 100, 1000, 10000, 100000, and 1000000. Measure the compute time for all three versions on all vector sizes (use compiler optimization). Put the compilation command line and the results into vector_addition.time_log. Repeat the computation often enough to get at least two significant digits (you can use finer-grained timers than gettimeofday, such as MPI_Wtime or the Boost timers).

Compute the required time per operation (count only the addition, not the assignment) for all 15 combinations (3 versions x 5 vector sizes). Give a short explanation of why the timing differs for different sizes and for different implementations of mathematically identical operations. Put your explanation in plain text into vector_addition.analysis.

Question 3: Floating Point Accuracy

Extend the example of precise addition of two floating point numbers from the lecture to the addition of three numbers.

The result will be three floating point numbers, some of which can be zero. Your directory trunk/hw01 already contains a source file precision.cpp with some tests. Complete the code and check that the tests are computed correctly. Be aware that we will add more tests for the grading.

Assignment 2

Due date: Friday in a week (03/03/06 23:59:59)

Question 1: Fastest Matrix Product Wins

Multiply 1000x1000 matrices as fast as possible.

CHANGE: Call the matrices A, B, and C, with C = AB. A is initialized so that a_ij = i + j. B is 2A, so that b_ij = 2(i + j). You can use the initialization in blocked_matrix_product_optimized.cpp.

Test: Obviously, the result C is an NxN matrix with

c_ij = (N/3) (1 - 3i - 3j + 6ij - 3N + 3iN + 3jN + 2N^2).

Verify this for three elements. The relative error

|c_ij,computed - c_ij,expected| / |c_ij,expected|

should be less than 0.0001.

You are free to choose any blocking factor you want. More than 6 levels of nesting are allowed. Run this computation for matrices of doubles and of floats.

Run 10 repetitions of each computation to get sufficiently precise timings (measure the accumulated time over all repetitions, not each repetition separately). Start your timing after the matrices are initialized. Write the timings of 5 different implementations (blocking, nesting, ...) into matrix_multiplication.time_log; with 5 versions applied to double and float you will have ten timings. Call the C or C++ program that contains all 5 variations of the matrix product (e.g., 5 templated functions, or 10 non-templated functions for double and float) matrix_multiplication.cpp. Write your compile command with all options into matrix_multiplication.compilation. You can use one of the programs from the class as a starting point. Create a new directory hw02 in trunk for this assignment. The check-in that you consider your submission shall contain an appropriate comment.

Use molerat for your measurements. Log into burrow.cs.indiana.edu and ssh to molerat, or log into molerat.cs.indiana.edu directly.