<HTML>
<!--
Copyright (c) Jeremy Siek 2000

Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<Head>
<Title>Sparse Matrix Ordering Example</Title>
<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
ALINK="#ff0000">
<IMG SRC="../../../boost.png"
ALT="C++ Boost" width="277" height="86">
<BR Clear>
<H1><A NAME="sec:sparse-matrix-ordering"></A>
Sparse Matrix Ordering
</H1>
<P>
Graph theory was identified as a powerful tool for sparse matrix
computation when Seymour Parter used undirected graphs to model
symmetric Gaussian elimination more than 30 years ago [<A HREF="bibliography.html#parter61:_gauss">28</A>].
Graphs can be used to model symmetric matrices, factorizations, and
algorithms on non-symmetric matrices, such as fill paths in Gaussian
elimination, strongly connected components in matrix irreducibility,
bipartite matching, and alternating paths in linear dependence and
structural singularity. Not only do graphs make it easier to
understand and analyze sparse matrix algorithms, but they also open the
door to manipulating sparse matrices with existing graph algorithms
and techniques [<A
HREF="bibliography.html#george93:graphtheory">13</A>]. In this section we
illustrate how to use BGL in sparse matrix computations such
as ordering algorithms. Before we go further into the sparse matrix
algorithms, let us take a step back and review a few things.
<P>
<H2>Graphs and Sparse Matrices</H2>
<P>
A graph is fundamentally a way to represent a binary relation between
objects. The nonzero pattern of a sparse matrix also describes a
binary relation between unknowns. Therefore the nonzero pattern of a
sparse matrix of a linear system can be modeled with a graph
<I>G(V,E)</I>, whose <I>n</I> vertices in <I>V</I> represent the
<I>n</I> unknowns, and where there is an edge from vertex <I>i</I> to
vertex <I>j</I> when <i>A<sub>ij</sub></i> is nonzero. Thus, when a
matrix has a symmetric nonzero pattern, the corresponding graph is
undirected.
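<P>
As a concrete illustration, the nonzero pattern of a symmetric matrix can be turned into an undirected adjacency-list graph directly. The sketch below uses only the standard library (it is our own illustration, not BGL code, and the function name is made up):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build an undirected adjacency-list graph from the nonzero pattern of a
// symmetric n-by-n matrix A: vertex i represents unknown i, and an edge
// {i, j} exists whenever A[i][j] is nonzero (i != j).
std::vector<std::vector<std::size_t>>
pattern_graph(const std::vector<std::vector<double>>& A) {
  std::size_t n = A.size();
  std::vector<std::vector<std::size_t>> adj(n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (A[i][j] != 0.0) {   // symmetric pattern: store the edge both ways
        adj[i].push_back(j);
        adj[j].push_back(i);
      }
  return adj;
}
```

For example, a 3x3 "arrow" matrix whose first row and column are full yields a star graph in which vertex 0 is adjacent to every other vertex.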
<P>
<H2>Sparse Matrix Ordering Algorithms</H2>
<P>
The process for solving a sparse symmetric positive definite linear
system, <i>Ax=b</i>, can be divided into four stages as follows:
<DL>
<DT><STRONG>Ordering:</STRONG></DT>
<DD>Find a permutation <i>P</i> of matrix <i>A</i>,
</DD>
<DT><STRONG>Symbolic factorization:</STRONG></DT>
<DD>Set up a data structure for the Cholesky
factor <i>L</i> of <i>PAP<sup>T</sup></i>,
</DD>
<DT><STRONG>Numerical factorization:</STRONG></DT>
<DD>Decompose <i>PAP<sup>T</sup></i> into <i>LL<sup>T</sup></i>,
</DD>
<DT><STRONG>Triangular system solution:</STRONG></DT>
<DD>Solve <i>LL<sup>T</sup>Px=Pb</i> for <i>x</i>.
</DD>
</DL>
Because the choice of permutation <i>P</i> directly determines the
number of fill-in elements (elements present in the non-zero structure
of <i>L</i> that are not present in the non-zero structure of
<i>A</i>), the ordering has a significant impact on the memory and
computational requirements of the later stages. However, finding
the optimal ordering for <i>A</i> (in the sense of minimizing fill-in)
has been proven to be NP-complete [<a
href="bibliography.html#garey79:computers-and-intractability">30</a>],
so heuristics must be used for all but simple (or specially
structured) cases.
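<P>
The effect of the permutation on fill-in can be made concrete with a small standard-library sketch (our own illustration, not BGL code): symbolic elimination of a vertex pairwise-connects its remaining neighbors, and every edge created this way is a fill element.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Count the fill-in produced by symbolic symmetric Gaussian elimination
// under a given ordering: eliminating a vertex connects all pairs of its
// remaining (not yet eliminated) neighbors, and each edge so created is
// one fill element.
std::size_t fill_in(std::vector<std::set<std::size_t>> adj,
                    const std::vector<std::size_t>& order) {
  std::vector<bool> done(adj.size(), false);
  std::size_t fill = 0;
  for (std::size_t v : order) {
    done[v] = true;
    for (std::size_t a : adj[v])
      for (std::size_t b : adj[v])
        if (a < b && !done[a] && !done[b] && adj[a].insert(b).second) {
          adj[b].insert(a);   // new fill edge {a, b}
          ++fill;
        }
  }
  return fill;
}
```

For a star graph, eliminating the center first creates a clique on the leaves (three fill edges for four vertices), while eliminating the leaves first creates none; this is exactly why the ordering stage matters.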
<P>
A widely used but rather simple ordering algorithm is a variant of the
Cuthill-McKee ordering: the reverse Cuthill-McKee ordering algorithm.
This algorithm can also serve as a preordering method to improve ordering
in more sophisticated methods such as minimum degree
algorithms [<a href="bibliography.html#LIU:MMD">21</a>].
<P>
<H3><A NAME="SECTION001032100000000000000">
Reverse Cuthill-McKee Ordering Algorithm</A>
</H3>
The original Cuthill-McKee ordering algorithm is primarily designed to
reduce the profile of a matrix [<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>].
In 1971, George discovered that the reverse ordering often turns out to be
superior to the original ordering. Here we describe the
reverse Cuthill-McKee algorithm in terms of a graph:
<OL>
<LI><I>Finding a starting vertex</I>: Determine a starting vertex
<I>r</I> and assign <i>r</i> to <I>x<sub>1</sub></I>.
</LI>
<LI><I>Main part</I>: For <I>i=1,...,N</I>, find all
the unnumbered neighbors of the vertex <I>x<sub>i</sub></I> and number them
in increasing order of degree.
</LI>
<LI><I>Reversing ordering</I>: The reverse Cuthill-McKee ordering
is given by <I>y<sub>1</sub>,...,y<sub>N</sub></I>, where
<I>y<sub>i</sub></I> is <I>x<sub>N-i+1</sub></I> for <I>i = 1,...,N</I>.
</LI>
</OL>
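<P>
The three steps above can be sketched with the standard library alone. The following is a simplified stand-in for the BGL implementation (it assumes a connected graph and takes the start vertex as an argument rather than computing one):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Reverse Cuthill-McKee for a connected undirected graph given as an
// adjacency list: BFS from the start vertex, numbering each vertex's
// unnumbered neighbors in increasing order of degree, then reverse the
// resulting sequence.
std::vector<std::size_t>
reverse_cuthill_mckee(const std::vector<std::vector<std::size_t>>& adj,
                      std::size_t start) {
  std::vector<bool> numbered(adj.size(), false);
  std::vector<std::size_t> order;
  std::queue<std::size_t> q;
  q.push(start);
  numbered[start] = true;
  while (!q.empty()) {
    std::size_t v = q.front();
    q.pop();
    order.push_back(v);
    std::vector<std::size_t> nbrs;
    for (std::size_t w : adj[v])
      if (!numbered[w]) {
        numbered[w] = true;
        nbrs.push_back(w);
      }
    std::sort(nbrs.begin(), nbrs.end(),
              [&adj](std::size_t a, std::size_t b) {
                return adj[a].size() < adj[b].size();
              });
    for (std::size_t w : nbrs) q.push(w);
  }
  std::reverse(order.begin(), order.end());   // step 3: reverse the ordering
  return order;
}
```

On a path graph 0-1-2-3 started at vertex 0, the Cuthill-McKee sequence is 0, 1, 2, 3, so the reverse ordering is 3, 2, 1, 0.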
In the first step a good starting vertex needs to be determined. The
study by George and Liu [<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>] showed that
a pair of vertices that are at maximum or near-maximum "distance"
apart make good candidates. They also proposed an algorithm to find such a
starting vertex in [<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>], which we
show in <A
HREF="#fig:ggcl_find_starting_vertex">Figure
1</A>, implemented using the BGL interface.
<P>
<P></P>
<DIV ALIGN="CENTER"><A NAME="fig:ggcl_find_starting_vertex"></A>
<table border><tr><td>
<pre>
namespace boost {
  template <class Graph, class Vertex, class Color, class Degree>
  Vertex
  pseudo_peripheral_pair(Graph& G, const Vertex& u, int& ecc,
                         Color color, Degree degree)
  {
    typename property_traits<Color>::value_type c = get(color, u);
    rcm_queue<Vertex, Degree> Q(degree);

    typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end;
    for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
      put(color, *ui, white(c));
    breadth_first_search(G, u, Q, bfs_visitor<>(), color);
    ecc = Q.eccentricity();
    return Q.spouse();
  }

  template <class Graph, class Vertex, class Color, class Degree>
  Vertex find_starting_node(Graph& G, Vertex r, Color c, Degree d)
  {
    int eccen_r, eccen_x;
    Vertex x = pseudo_peripheral_pair(G, r, eccen_r, c, d);
    Vertex y = pseudo_peripheral_pair(G, x, eccen_x, c, d);
    while (eccen_x > eccen_r) {
      r = x;
      eccen_r = eccen_x;
      x = y;
      y = pseudo_peripheral_pair(G, x, eccen_x, c, d);
    }
    return x;
  }
} // namespace boost
</pre>
</td></tr></table>
<CAPTION ALIGN="BOTTOM">
<table><tr><td>
<STRONG>Figure 1:</STRONG>
The BGL implementation of <TT>find_starting_node</TT>. The key
component, <TT>pseudo_peripheral_pair</TT>, is essentially a BFS with a
custom queue type.
</td></tr></table></CAPTION>
</DIV>
<P></P>
The key part of step one is BFS with a custom queue type, as shown in
<A
HREF="#fig:ggcl_find_starting_vertex">Figure
1</A>. The main Cuthill-McKee algorithm, shown in <A
HREF="#fig:cuthill-mckee">Figure 2</A>,
uses a fenced priority queue with a generalized BFS. If
<TT>inverse_permutation</TT> is a normal iterator, the result stored
is the inverse permutation of the original Cuthill-McKee ordering. On the
other hand, if <TT>inverse_permutation</TT> is a reverse iterator, the
result stored is the inverse permutation of the reverse Cuthill-McKee
ordering. Our implementation is quite concise because many components
from BGL can be reused.
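<P>
The role of the output iterator can be demonstrated without BGL. In this sketch (illustrative helper names, not BGL functions), writing the visit sequence through a reverse iterator is all that distinguishes the stored Cuthill-McKee ordering from the reverse one, and a permutation is inverted in a single pass:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// inv[v] = position of vertex v in the sequence perm (perm[i] = i-th vertex).
std::vector<std::size_t> invert(const std::vector<std::size_t>& perm) {
  std::vector<std::size_t> inv(perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i) inv[perm[i]] = i;
  return inv;
}

// Writing the same visit sequence through a reverse iterator stores it
// reversed -- this is all that distinguishes the CM from the RCM output.
std::vector<std::size_t> reversed(const std::vector<std::size_t>& cm) {
  std::vector<std::size_t> rcm(cm.size());
  std::copy(cm.begin(), cm.end(), rcm.rbegin());
  return rcm;
}
```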
<P></P>
<DIV ALIGN="CENTER"><A NAME="fig:cuthill-mckee"></A><A NAME="6261"></A>
<table border><tr><td>
<pre>
template < class Graph, class Vertex, class OutputIterator,
           class Color, class Degree >
inline void
cuthill_mckee_ordering(Graph& G,
                       Vertex s,
                       OutputIterator inverse_permutation,
                       Color color, Degree degree)
{
  typedef typename property_traits<Degree>::value_type DS;
  typename property_traits<Color>::value_type c = get(color, s);
  typedef indirect_cmp<Degree, std::greater<DS> > Compare;
  Compare comp(degree);
  fenced_priority_queue<Vertex, Compare > Q(comp);
  typedef cuthill_mckee_visitor<OutputIterator> CMVisitor;
  CMVisitor cm_visitor(inverse_permutation);

  typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end;
  for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
    put(color, *ui, white(c));
  breadth_first_search(G, s, Q, cm_visitor, color);
}
</pre>
</td></tr></table>
<CAPTION ALIGN="BOTTOM">
<TABLE>
<TR><TD> <STRONG>Figure 2:</STRONG> The BGL implementation of the
Cuthill-McKee algorithm. </TD></TR> </TABLE></CAPTION> </DIV><P></P>
<P>
<P>
<H3>Minimum Degree Ordering Algorithm</H3>
<P>
Another category of high-quality ordering algorithms in wide use is
based on a greedy approach: the ordering is chosen to minimize some
quantity at each step of a simulated <i>n</i>-step symmetric Gaussian
elimination process. The algorithms using such an approach are
typically distinguished by their greedy minimization
criteria [<a href="bibliography.html#ng-raghavan">34</a>].
<P>
In graph terms, the basic ordering process used by most greedy
algorithms is as follows:
<OL>
<LI><EM>Start:</EM> Construct the undirected graph <i>G<sup>0</sup></i>
corresponding to matrix <i>A</i>.
</LI>
<LI><EM>Iterate:</EM> For <i>k = 1, 2, ... </i> until
<i>G<sup>k</sup> = { }</i> do:
<UL>
<LI>Choose a vertex <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i>
according to some criterion.
</LI>
<LI>Eliminate <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i> to form
<i>G<sup>k+1</sup></i>.
</LI>
</UL>
</LI>
</OL>
The resulting ordering is the sequence of vertices <i>{v<sup>1</sup>,
v<sup>2</sup>,...}</i> selected by the algorithm.
<P>
One of the most important examples of such an algorithm is the
<I>Minimum Degree</I> algorithm. At each step the minimum degree
algorithm chooses the vertex with minimum degree in the corresponding
graph as <i>v<sup>k</sup></i>. A number of enhancements to the basic
minimum degree algorithm have been developed, such as the use of a
quotient graph representation, mass elimination, incomplete degree
update, multiple elimination, and external degree. See [<A
href="bibliography.html#George:evolution">35</a>] for a historical
survey of the minimum degree algorithm.
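<P>
The basic algorithm, without any of the enhancements listed above, fits in a few lines of standard C++ (a sketch for exposition, not the BGL implementation): repeatedly pick the remaining vertex of smallest current degree, then eliminate it by pairwise-connecting its remaining neighbors.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Basic minimum-degree ordering on an adjacency-set graph.  Eliminated
// vertices are erased from their neighbors' sets, so set sizes are the
// current degrees in the elimination graph G^k.
std::vector<std::size_t>
minimum_degree_ordering(std::vector<std::set<std::size_t>> adj) {
  std::size_t n = adj.size();
  std::vector<bool> done(n, false);
  std::vector<std::size_t> order;
  for (std::size_t step = 0; step < n; ++step) {
    // choose the remaining vertex of minimum current degree
    std::size_t v = n;
    for (std::size_t u = 0; u < n; ++u)
      if (!done[u] && (v == n || adj[u].size() < adj[v].size())) v = u;
    done[v] = true;
    order.push_back(v);
    // eliminate v: its neighbors become a clique, and v disappears
    for (std::size_t a : adj[v]) {
      adj[a].erase(v);
      for (std::size_t b : adj[v])
        if (a != b) adj[a].insert(b);
    }
    adj[v].clear();
  }
  return order;
}
```

On a star graph the algorithm orders the degree-1 leaves before the high-degree center, which is exactly the choice that avoids fill.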
<P>
The BGL implementation of the Minimum Degree algorithm closely follows
the algorithmic descriptions in [<A
HREF="bibliography.html#LIU:MMD">21</A>]. The implementation presently
includes the enhancements for mass elimination, incomplete degree
update, multiple elimination, and external degree.
<P>
In particular, we create a graph representation to improve the
performance of the algorithm. It is based on a templated "vector of
vectors." The vector container used is an adaptor class built on top of
the STL <TT>std::vector</TT> class. Particular characteristics of this
adaptor class include the following:
<UL>
<LI>Erasing elements does not shrink the associated memory.
Adding new elements after erasing will not require allocating
additional memory.
</LI>
<LI>Additional memory is allocated efficiently on demand when new
elements are added (doubling the capacity every time it is
increased). This property comes from the STL vector.
</LI>
</UL>
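<P>
Both properties can be sketched as follows (a hypothetical illustration, not the actual BGL adaptor class): keep a <TT>std::vector</TT> as backing storage and rely on the fact that clearing it never releases its capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a "no-shrink" vector: clear() destroys the elements but
// leaves the std::vector's capacity untouched, so later push_back calls
// reuse the already-allocated memory.  Growth beyond the capacity is
// handled by std::vector itself with amortized-constant doubling.
template <class T>
class NoShrinkVector {
  std::vector<T> data_;
 public:
  void push_back(const T& x) { data_.push_back(x); }
  void clear() { data_.clear(); }   // capacity() is unchanged by clear()
  std::size_t size() const { return data_.size(); }
  std::size_t capacity() const { return data_.capacity(); }
};
```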
<P>
Note that this representation is similar to that used in Liu's
implementation, with some important differences due to dynamic memory
allocation. With dynamic memory allocation we do not need to
overwrite portions of the graph that have been eliminated,
allowing for a more efficient graph traversal. More importantly,
information about the elimination graph is preserved, allowing for
trivial symbolic factorization. Since symbolic factorization can be
an expensive part of the entire solution process, improving its
performance can result in significant computational savings.
<P>
The overhead of dynamic memory allocation could conceivably compromise
performance in some cases. However, in practice, memory allocation
overhead does not contribute significantly to the run time of our
implementation, because allocation is not done very often and its cost
is amortized.
<P>
<H2>Proper Self-Avoiding Walk</H2>
<P>
The finite element method (FEM) is a flexible and attractive numerical
approach for solving partial differential equations, since it handles
geometrically complicated domains in a straightforward way. However,
the unstructured meshes generated by FEM do not provide an obvious
labeling (numbering) of the unknowns, yet such a labeling is vital for the
matrix-vector notation of the underlying algebraic equations. Special
numbering techniques have been developed to optimize memory usage and
locality of such algorithms. One novel technique is the self-avoiding
walk.
<P>
If we regard a mesh as a graph, a self-avoiding walk (SAW) over an
arbitrary unstructured two-dimensional mesh is an enumeration of all
the triangles of the mesh such that two successive triangles share an
edge or a vertex. A proper SAW (PSAW) is a SAW in which jumping twice over
the same vertex is forbidden for three consecutive triangles in the walk.
It can be used to improve the parallel efficiency of several irregular
algorithms, in particular to address issues related to reducing runtime memory
access (improving locality) and interprocess communication. The
existence of a PSAW for any arbitrary triangular mesh has been proved
by extending an existing PSAW over a small piece of mesh to a larger
part. The proof effectively provides a set of rules for constructing a PSAW.
<P>
The family of algorithms for constructing a PSAW on a mesh starts from
a random triangle of the mesh, chooses new triangles sharing an edge
with the current sub-mesh, and extends the existing partial PSAW over
the new triangles.
<P>
Let us define the dual graph of a mesh: each triangle in the mesh becomes a
vertex, and two triangles sharing an edge in the mesh give rise to an
edge between the two corresponding vertices in the dual graph. Using the
dual graph of a mesh, one way to implement this family of PSAW-construction
algorithms is to reuse BGL algorithms such as BFS and DFS with a customized
visitor to provide operations during
traversal. The customized visitor has a function <TT>tree_edge()</TT>,
which extends the partial PSAW case by case, and a function
<TT>start_vertex()</TT>, which sets up the PSAW for the starting vertex.
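<P>
The dual graph described above can be built with the standard library alone (a sketch; the triangle representation and function name are our own choice): map each undirected mesh edge to the triangles containing it, then connect two triangles whenever they share an interior edge.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Each triangle is three vertex indices.  Two triangles are adjacent in
// the dual graph when they share a mesh edge.
using Triangle = std::array<std::size_t, 3>;

std::vector<std::vector<std::size_t>>
dual_graph(const std::vector<Triangle>& mesh) {
  // Map each undirected edge (smaller index first) to the triangles using it.
  std::map<std::pair<std::size_t, std::size_t>,
           std::vector<std::size_t>> edge_owners;
  for (std::size_t t = 0; t < mesh.size(); ++t)
    for (int i = 0; i < 3; ++i) {
      std::size_t a = mesh[t][i], b = mesh[t][(i + 1) % 3];
      if (a > b) std::swap(a, b);            // undirected edge key
      edge_owners[{a, b}].push_back(t);
    }
  std::vector<std::vector<std::size_t>> adj(mesh.size());
  for (const auto& kv : edge_owners)
    if (kv.second.size() == 2) {             // interior edge: two triangles
      adj[kv.second[0]].push_back(kv.second[1]);
      adj[kv.second[1]].push_back(kv.second[0]);
    }
  return adj;
}
```

A BFS or DFS over this dual graph, with the visitor described above, then visits triangles in an order where each new triangle shares an edge with the sub-mesh traversed so far.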
<br>
<HR>
<TABLE>
<TR valign=top>
<TD nowrap>Copyright &copy; 2000-2001</TD><TD>
<A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
</TD></TR></TABLE>
</BODY>
</HTML>