
We create an upper and lower triangle view of the original matrix A. We use typedefs to create the specific triangular matrix type from the MTL class ">triangle_view. We then create the U and L matrix objects and pass the matrix A to their constructors. The L and U objects are just handles, so their creation is inexpensive (constant time complexity). Every component and algorithm in MTL is inside the mtl:: namesapce. If you prefer not to have to specify the namespace each time you can do using namespace mtl;.
typedef mtl::row_type<Matrix>::type RowMatrix; typedef mtl::column_type<Matrix>::type ColumnMatrix; typedef mtl::triangle_view<RowMatrix, unit_upper>::type Unit_Upper; typedef mtl::triangle_view<ColumnMatrix, unit_lower>::type Unit_Lower; Unit_Upper U(A); Unit_Lower L(A);We can access the rows and columns from L and U through their TwoD iterators, which are created with the begin() method. Dereferencing the TwoD iterators gives a row or column.
Unit_Upper::iterator rowi = U.begin(); Unit_Lower::iterator columni = L.begin();The first operation in the LU factorization is to find the maximum element in the column. When we search for the maximum element, we want to include the element in the pivot row, so we cannot use the L triangle object defined above since it is unit diagonal. Instead we create another lower triangle D. We can then simply apply the ">max_index() function to the dcolumn object.
typedef mtl::triangle_view<Matrix, lower>::type D(A); Lower::iterator dcoli = D.begin(); jp = mtl::max_index(*dcoli);
The next operation in the LU factorization is to swap the current row with the row that has the maximum element. The following code snippet shows how to call the ">swap() function to interchange the rows. The pivot variable is the row index for the current pivot element. Note how rows from the matrix A are being used as arguments in the swap() algorithm. In MTL, rows and columns are first class vector objects (with some caveats for sparse rows and columns). Similarly, submatrices in MTL are first class matrix objects, and can be used in the same way as a matrix.
if (jp != j) mtl::swap(rows(A)[j], rows(A)[jp]);
The third operation in the LU factorization is to scale the column under the pivot by 1/A(i,i), which can be performed with the ">scale() MTL algorithm, which we apply to the column dereferenced by *columni.
if (j < M  1) mtl::scale(*columni, T(1) / A(j,j));
The final operation in the LU factorization is to update the trailing submatrix according to A' = A' + x y^T A' = A' + x y^{T}. We first create the submatrix A' from matrix A using the sub_matrix() method. The algorithm "> rank_one_update() can then be applied to the submatrix. We do not use *rowi and *columni directly because their indexing is in terms of A and not subA. A future release of MTL will provide a better way of adjusting the indices than this method of copying.
subA = A.sub_matrix(j+1, M, j+1, N); copy(*columni, c); copy(*rowi, r); mtl::rank_one_update(subA, scaled(c, T(1)), r);
The complete implementation of this example LU factorization can be
found in lu.h.
The execution time of many linear algebra operations on modern computer architectures can be decreased dramatically through blocking to increase cache utilization [">6,">7]. In algorithms where repeated matrixvector operations are done ( rank_one_update() for instance), it is beneficial to convert the algorithm to use matrixmatrix operations to introduces more opportunities for blocking.
The LU factorization algorithm can be reformulated to use more matrixmatrix operations [16]. First we split matrix A into four submatrices, using a blocking factor of r (i.e., A_{11} is r x r).
A = [  ] [ A_11  A_12 ] r [] [ A_21  A_22 ] n  r [  ] r n  rThen we formulate A = LU in terms of the blocks.
[ A_11 A_12 ] = [ L_11 0 ] [ U_11 U_12 ] [ A_21 A_22 ] [ L_21 L_22 ] [ 0 U_22 ]
From this we can derive the following equations for the submatrices of A. The matrix on the right shows the values that should occupy Aafter one step of the blocked LU factorization.
We find L_{11}, U_{11}, and L_{21} by applying the pointwise LU factorization to the combined region of A_{11} and A_{21}. We then calculate U_{12} with a triangular solve applied to A_{12}. Finally we calculate A*_{22} with a matrix product of L_{21} and U_{12}. The algorithm is then applied recursively to A*_{22} .
In the implementation of block LU factorization MTL can be used to create a partitioning of the matrix into submatrices. The use of the submatrix objects throughout the algorithm removes the redundant indexing that a programmer would typically have to do to specify the regions for each submatrix for each operation. The ">partition() method of the MTL matrix cuts the original matrix along particular rows and columns. The partitioning results in a new matrix whose entries are the submatrices. In the code below we create the partitioning that corresponds to Figure 1.7 with the matrix object Ap. The partitioning for Figure 1.8 is created with matrix object As.
int row_sep1[] = { j, j + jb }; int col_sep1[] = { j }; Matrix Ap = A.partition(array_to_vec(row_sep1), array_to_vec(col_sep1)); int row_sep2[] = { j, j + jb }; int col_sep2[] = { j, j + jb }; Matrix As = A.partition(array_to_vec(row_sep2), array_to_vec(col_sep2)); triangle_view<Matrix, unit_lower>::type L_11(As(1,1));Figure 1.7 depicts the block factorization part way through the computation. The matrix is divided up for the pointwise factorization step. The region including A_{11} and A_{21} is labeled A_{1}. Since there is pivoting involved, the rows in the regions labeled A_{0} and A_{2} must be swapped according to the pivots used in A_{1}.
The implementation of this step in MTL is very concise. The Ap(1,1) submatrix object is passed to the lu_factorize() algorithm. We then use the multi_swap() function on Ap(1,0) and Ap(1,2) to pivot their rows to match Ap(1,1).
Pvector pivots(jb); lu_factorize(Ap(1,1), pivots); multi_swap(Ap(1,0), pivots); multi_swap(Ap(1,2), pivots);
Once A_{1} has been factored, the A_{12} and A_{22} regions must be updated. The submatrices involved are depicted in Figure 1.8. The A_{12} region needs to be updated with a triangular solve. This version of triangular solve can be found in the matmat namespace of the MTL.
To apply the ">tri_solve() algorithm to L_{11} and A_{12}, we merely call the MTL routine and pass in the L_11 and As(1,2) matrix objects.
mtl::tri_solve(L_11, As(1,2), mtl::left_side());
The last step is to calculate A*_{22} with a matrixmatrix multiply according to A*_{22} < A_{22}  L_{21}U_{12}. To implement this operation we use the As(1,2) and As(2,1) matrix objects, which have been overwritten in the previous steps with U_{12} and L_{21}, and make a call to the mult() function.
mtl::mult(scaled(As(2,1), T(1)), As(1,2), As(2,2));
The complete version of the MTL block LU factorization algorithm is given in blocked_lu.h.