All publications sorted by Books and proceedings |
Overview of many distributed and parallel depth-first search algorithms |
A quantitative comparison of the BSP and LogP models of parallel computation is developed. We concentrate on a variant of LogP that disallows the so-called stalling behavior, although issues surrounding the stalling phenomenon are also explored. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic design guided by asymptotic analysis. It is also shown that the two models can be implemented with similar performance on most point-to-point networks. In conclusion, within the limits of our analysis that is mainly of an asymptotic nature, BSP and (stall-free) LogP can be viewed as closely related variants within the bandwidth-latency framework for modeling parallel computation. BSP seems somewhat preferable due to its greater simplicity and portability, and slightly greater power. LogP lends itself more naturally to multiuser mode. |
We present distributed algorithms for constructing a depth-first search tree for a communication network which are more efficient than previous methods. Our algorithms require $2|V| âˆ’ 2$ messages and units of time in the worst case, where $|V|$ is the number of sites in the network, and as little as $|V|$ messages and time units in the best case. The actual number of messages and time units depends on the network topology and possibly on the routing chosen. We can interpret this to mean that the number of messages and time units is $|V|(1 + r)$ in the worst case, where $0 \leq r < 1$ and the value of $r$ depends on the topology and the routing. In a best case for the simplest algorithm, for example when the underlying network has a ring topology, $r = 0$ and our basic algorithm requires $|V|$ messages and time units, regardless of routing. We extend the basic algorithm to achieve the same best case bound for other topologies. The worst case bound, which has $r = 1 âˆ’ 2/|V|$, applies if the network topology is a tree. The improvement over the best of previous algorithms is achieved by dynamic backtracking, with a minor increase in message length. |
The sequential depth-first-search algorithm, distributed over processor nodes of a graph, yields a distributed depth-first-search algorithm that uses exactly $2|V|$ messages and $2|V|$ units of time. |
A new distributed depth-first-search algorithm is presented whose communication and time complexities are bounded by $3|E|$ and $2|V|$, respectively. |
This formulation contains an error. See the corrected result in Tsin's {\em Some remarks on distributed depth-first search}. |
This paper presents a parallel algorithm that computes the breadth-first search (BFS) numbering of a directed graph in $O(log^2 n)$ time using $M(n)$ processors on the exclusive-read exclusive-write (EREW) parallel random access machine (PRAM) model, where $M(n)$ denotes the number of processors needed to multiply two $n imes n$ integer matrices over the ring $({\mathbb Z}, +, x)$ in $O(log n)$ time. The best known bound for $M(n)$ is $O(n^2.376)$ (Coppersmith and Winograd, 1987). The algorithm presented in their paper uses fewer processors than the classical algorithm for BFS that employs matrix powering over the semiring (dioid) $({\mathbb N}, min, +)$, using $O(log n)$ time and $O(n^3)$ processors on the concurrent-read concurrent-write (CRCW) model, or using $O(log^2 n)$ time and $n^3log n$ processors on the EREW model. |
In this paper we study the problem of distributed construction of a depth-first-search tree for an asynchronous communication network. First, we point out that any algorithm requires at least $2nâˆ’2$ time units and $2m$ messages in the worst case, where n and m are the number of nodes and the number of edges in the network, respectively. We then provide a modification to a recent algorithm due to Awerbuch (1985), and show that the new algorithm is time-optimal, while requiring less then $4mâˆ’(nâˆ’1)$ messages. |
This paper presents a new distributed Depth-First-Search (DFS) algorithm for an asynchronous communication network, whose communication and time complexities are $O(|E|)$ and $O(|V|)$, respectively. The output of the algorithm is the DFS tree, kept in a distributed fashion. The existing algorithm, due to Cheung (1983), requires $O(|E|)$ both in communication and time complexities. |
This paper concerns the computational complexity of depth-first search. Suppose we are given a rooted graph G with fixed adjacency lists and vertices u, v. We wish to test if u is first visited before v in depth-first search order of G. We show that this problem, for undirected and directed graphs, is complete in deterministic polynomial time with respect to deterministic log-space reductions. This gives strong evidence that depth-first search ordering can be done neither in deterministic space $(log n)^c$ nor in parallel time $(log n)^c$, for any constant $c > 0$. |
This paper presents the Parallel BGL, a generic C++ library for distributed graph computation. Like the sequential Boost Graph Library (BGL) upon which it is based, the Parallel BGL applies the paradigm of generic programming to the domain of graph computations. Emphasizing efficient generic algorithms and the use of concepts to specify the requirements on type parameters, the Parallel BGL also provides flexible supporting data structures such as distributed adjacency lists and external property maps. The generic programming approach simultaneously stresses flexibility and efficiency, resulting in a parallel graph library that can adapt to various data structures and communication models while retaining the efficiency of equivalent hand-coded programs. Performance data for selected algorithms are provided demonstrating the efficiency and scalability of the Parallel BGL. |
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. it is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5. |
Deals with parallel transitive closure computations. The sort-based approaches introduced sorts the tuples of the relation into topological order, and the sorted relation is then horizontally partitioned and distributed across several processing nodes of a message passing multiprocessor system. This data partitioning strategy allows the transitive closure computation of the local data fragments to be computed in parallel with no interprocessor communication. The construction of the final result then requires only a small number of joins. Extensive analytical results are included in the paper as well. They show that the proposed techniques leads to a speedup that is essentially linear with the number of processors. Its performance is significantly better than the recently published hashless parallel algorithm |
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All person copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Les documents contenus dans ces répertoires sont rendus disponibles par les auteurs qui y ont contribué en vue d'assurer la diffusion à temps de travaux savants et techniques sur une base non-commerciale. Les droits de copie et autres droits sont gardés par les auteurs et par les détenteurs du copyright, en dépit du fait qu'ils présentent ici leurs travaux sous forme électronique. Les personnes copiant ces informations doivent adhérer aux termes et contraintes couverts par le copyright de chaque auteur. Ces travaux ne peuvent pas être rendus disponibles ailleurs sans la permission explicite du détenteur du copyright.
This document was translated from BibT_{E}X by bibtex2html