LU Factorization Example
The algorithm for LU factorization is given in Figure 1.4 and the graphical representation of the algorithm is given in Figure 1.5, as it would look part way through the computation. The black square represents the current pivot element. The horizontal shaded rectangle is a row from the upper triangle of the matrix. The vertical shaded rectangle is a column from the lower triangle of the matrix. The L and U labeled regions are the portions of the matrix that have already been updated by the algorithm. The algorithm has not yet reached the region labeled A'.

 find maximum element in column(i) (starting below the diagonal) swap the row of maximum element with row(i) scale column(i) by 1/A(i,i) let A' = A(i:N,i:N) A' <- A' + x y^T (rank one update)

We create an upper and lower triangle view of the original matrix A. We use typedefs to create the specific triangular matrix type from the MTL class triangle_view. We then create the U and L matrix objects and pass the matrix A to their constructors. The L and U objects are just handles, so their creation is inexpensive (constant time complexity). Every component and algorithm in MTL is inside the mtl:: namesapce. If you prefer not to have to specify the namespace each time you can do using namespace mtl;.

```  typedef mtl::row_type<Matrix>::type RowMatrix;
typedef mtl::column_type<Matrix>::type ColumnMatrix;
typedef mtl::triangle_view<RowMatrix, unit_upper>::type Unit_Upper;
typedef mtl::triangle_view<ColumnMatrix, unit_lower>::type Unit_Lower;
Unit_Upper U(A);
Unit_Lower L(A);
```
We can access the rows and columns from L and U through their TwoD iterators, which are created with the begin() method. Dereferencing the TwoD iterators gives a row or column.

```  Unit_Upper::iterator rowi = U.begin();
Unit_Lower::iterator columni = L.begin();
```
The first operation in the LU factorization is to find the maximum element in the column. When we search for the maximum element, we want to include the element in the pivot row, so we cannot use the L triangle object defined above since it is unit diagonal. Instead we create another lower triangle D. We can then simply apply the max_index() function to the dcolumn object.

```  typedef mtl::triangle_view<Matrix, lower>::type D(A);
Lower::iterator dcoli = D.begin();

jp = mtl::max_index(*dcoli);
```

The next operation in the LU factorization is to swap the current row with the row that has the maximum element. The following code snippet shows how to call the swap() function to interchange the rows. The pivot variable is the row index for the current pivot element. Note how rows from the matrix A are being used as arguments in the swap() algorithm. In MTL, rows and columns are first class vector objects (with some caveats for sparse rows and columns). Similarly, submatrices in MTL are first class matrix objects, and can be used in the same way as a matrix.

```  if (jp != j)
mtl::swap(rows(A)[j], rows(A)[jp]);
```

The third operation in the LU factorization is to scale the column under the pivot by 1/A(i,i), which can be performed with the scale() MTL algorithm, which we apply to the column dereferenced by *columni.

```  if (j < M - 1)
mtl::scale(*columni, T(1) / A(j,j));
```

The final operation in the LU factorization is to update the trailing submatrix according to A' = A' + x y^T A' = A' + x yT. We first create the submatrix A' from matrix A using the sub_matrix() method. The algorithm rank_one_update() can then be applied to the submatrix. We do not use *rowi and *columni directly because their indexing is in terms of A and not subA. A future release of MTL will provide a better way of adjusting the indices than this method of copying.

```  subA = A.sub_matrix(j+1, M, j+1, N);
copy(*columni, c);  copy(*rowi, r);
mtl::rank_one_update(subA, scaled(c, T(-1)), r);
```

The complete implementation of this example LU factorization can be found in lu.h.

## Blocked LU Factorization Example

The execution time of many linear algebra operations on modern computer architectures can be decreased dramatically through blocking to increase cache utilization [6,7]. In algorithms where repeated matrix-vector operations are done ( rank_one_update() for instance), it is beneficial to convert the algorithm to use matrix-matrix operations to introduces more opportunities for blocking.

The LU factorization algorithm can be reformulated to use more matrix-matrix operations [16]. First we split matrix A into four submatrices, using a blocking factor of r (i.e., A11 is r x r).

```A = [       |      ]
[  A_11 | A_12 ] r
[-------|------]
[  A_21 | A_22 ] n - r
[       |      ]
r     n - r
```
Then we formulate A = LU in terms of the blocks.
```[ A_11 A_12 ] = [ L_11  0   ] [ U_11 U_12 ]
[ A_21 A_22 ]   [ L_21 L_22 ] [  0   U_22 ]
```

From this we can derive the following equations for the submatrices of A. The matrix on the right shows the values that should occupy Aafter one step of the blocked LU factorization.

We find L11, U11, and L21 by applying the pointwise LU factorization to the combined region of A11 and A21. We then calculate U12 with a triangular solve applied to A12. Finally we calculate A*22 with a matrix product of L21 and U12. The algorithm is then applied recursively to A*22 .

In the implementation of block LU factorization MTL can be used to create a partitioning of the matrix into submatrices. The use of the submatrix objects throughout the algorithm removes the redundant indexing that a programmer would typically have to do to specify the regions for each submatrix for each operation. The partition() method of the MTL matrix cuts the original matrix along particular rows and columns. The partitioning results in a new matrix whose entries are the submatrices. In the code below we create the partitioning that corresponds to Figure 1.7 with the matrix object Ap. The partitioning for Figure 1.8 is created with matrix object As.

```  int row_sep1[] = { j, j + jb };
int col_sep1[] = { j };
Matrix Ap = A.partition(array_to_vec(row_sep1),
array_to_vec(col_sep1));

int row_sep2[] = { j, j + jb };
int col_sep2[] = { j, j + jb };
Matrix As = A.partition(array_to_vec(row_sep2),
array_to_vec(col_sep2));

triangle_view<Matrix, unit_lower>::type L_11(As(1,1));
```
Figure 1.7 depicts the block factorization part way through the computation. The matrix is divided up for the pointwise factorization step. The region including A11 and A21 is labeled A1. Since there is pivoting involved, the rows in the regions labeled A0 and A2 must be swapped according to the pivots used in A1.

The implementation of this step in MTL is very concise. The Ap(1,1) submatrix object is passed to the lu_factorize() algorithm. We then use the multi_swap() function on Ap(1,0) and Ap(1,2) to pivot their rows to match Ap(1,1).

```  Pvector pivots(jb);
lu_factorize(Ap(1,1), pivots);
multi_swap(Ap(1,0), pivots);
multi_swap(Ap(1,2), pivots);
```

Once A1 has been factored, the A12 and A22 regions must be updated. The submatrices involved are depicted in Figure 1.8. The A12 region needs to be updated with a triangular solve. This version of triangular solve can be found in the matmat namespace of the MTL.

To apply the tri_solve() algorithm to L11 and A12, we merely call the MTL routine and pass in the L_11 and As(1,2) matrix objects.

```  mtl::tri_solve(L_11, As(1,2), mtl::left_side());
```

The last step is to calculate A*22 with a matrix-matrix multiply according to A*22 <- A22 - L21U12. To implement this operation we use the As(1,2) and As(2,1) matrix objects, which have been overwritten in the previous steps with U12 and L21, and make a call to the mult() function.

```  mtl::mult(scaled(As(2,1), T(-1)), As(1,2), As(2,2));
```

The complete version of the MTL block LU factorization algorithm is given in blocked_lu.h.

If you have comments or suggestions, email mtl-devel@osl.iu.edu
 Author: Andrew Lumsdaine, Jeremy Siek Lie-Quan Lee E-Mail: lums@osl.iu.edu, jsiek@osl.iu.edu, llee@osl.iu.edu Created: July 27, 1998 Modified: Mon 27-Aug-2012 EST Copyright ©1997-2015