A direct comparison of the fly, human,
salamander, and rat mitochondrial genomes with the fly
mitochondrial genome. Generated using the HPC Dotplot
software.
Cheminformatics Projects
Synthetic Programming for Molecular Fingerprint Comparison
Synthetic programming is a new style of programming that uses
small, runtime generated computational kernels to enable scripting
languages to achieve performance on par with or greater than that
of compiled languages. The first example of synthetic programming
is a molecular fingerprint comparison algorithm that uses the
Tanimoto metric to compare 166-bit fingerprints (bit vectors)
using the AltiVec SIMD processor. Implemented in Python, the
synthetic programming environment specializes the Tanimoto
algorithm for the current data at runtime and is 50% faster than
the (non-AltiVec, but -O3) C++ implementation.
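As a rough illustration of the metric itself (not of the synthetic programming environment or its AltiVec kernels), the Tanimoto coefficient of two bit vectors is the number of bits set in both divided by the number of bits set in either. The sketch below stores each 166-bit fingerprint in a plain Python integer; the function names and the choice to return 0.0 for two empty fingerprints are assumptions made for this example.

```python
FINGERPRINT_BITS = 166  # fingerprint width mentioned in the text


def fingerprint_from_bits(on_bits):
    """Build a fingerprint (stored as an int) from the positions of its set bits."""
    fp = 0
    for i in on_bits:
        if not 0 <= i < FINGERPRINT_BITS:
            raise ValueError(f"bit index {i} is outside the fingerprint")
        fp |= 1 << i
    return fp


def popcount(x):
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto coefficient: |a AND b| / |a OR b|."""
    union = popcount(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as having similarity 0.0.
        return 0.0
    return popcount(a & b) / union


if __name__ == "__main__":
    a = fingerprint_from_bits([0, 1, 2, 3])
    b = fingerprint_from_bits([2, 3, 4, 5])
    print(tanimoto(a, b))  # 2 shared bits out of 6 set in total
```

A SIMD version would evaluate the same AND, OR, and bit-count steps on several fingerprints at once.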
Visualizing Chemical Similarity Matrices
- Synthetic Similarity Matrix Generation
  Chris Mueller. For I590: Programming for Science Informatics, April 28, 2006
In this project, we applied the techniques developed for
visualizing similarity matrices using sparse matrix ordering
algorithms to chemical data.
- Applying Visual Similarity Matrices to Chemical Databases
  Chris Mueller. For I571: Chemical Information Technology, December 1, 2005
Bioinformatics Projects
Protein Clustering with Sparse Matrix Ordering Algorithms
This is an ongoing research project started as a class project
for Sun Kim's Machine Learning for Bioinformatics (I532)
course. The goal is to explore techniques for de-novo
generation of protein families based on comparison graphs. Using
the all-pairs pairwise comparison graph
(the nxn sparse-matrix generated by comparing all pairs of
proteins and thresholding to reduce noise), we generate orderings
of the nodes (proteins) in the graph using existing and modified
sparse matrix ordering algorithms.
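To make the idea concrete, here is a minimal sketch of the two steps described above: thresholding pairwise scores into a sparse graph, then ordering the nodes. The text does not say which ordering algorithms the project uses, so Cuthill-McKee (a standard breadth-first sparse matrix ordering) is used here purely as an illustrative assumption; the score values, function names, and threshold are hypothetical.

```python
from collections import deque


def threshold_graph(scores, n, threshold):
    """Keep only pairs scoring at or above the threshold; return adjacency sets."""
    adj = [set() for _ in range(n)]
    for (i, j), score in scores.items():
        if i != j and score >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def cuthill_mckee(adj):
    """Order nodes breadth-first, visiting low-degree nodes first."""
    visited = [False] * len(adj)
    order = []
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(range(len(adj)), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: (len(adj[u]), u)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order


if __name__ == "__main__":
    # Hypothetical similarity scores for five proteins (two families plus noise).
    scores = {(0, 3): 0.9, (3, 4): 0.8, (0, 4): 0.7, (1, 2): 0.9, (0, 1): 0.1}
    adj = threshold_graph(scores, n=5, threshold=0.5)
    print(cuthill_mckee(adj))
```

In this toy input the weak (0, 1) edge is dropped by the threshold, and the ordering places the members of each remaining connected group next to each other, which is the property that makes candidate families visible as blocks in the reordered matrix.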
HPC Dotplot
The Dotplot is a classic bioinformatics tool for directly
comparing genomic sequences. This work began as a class project for
Mehmet Dalkilic's Introduction to Bioinformatics (L519)
course, in which we planned to explore techniques for visualizing full
genome comparisons. However, the
existing direct comparison tools did not scale to large (>
5M base-pairs) genomes. To generate the data, we developed two
parallel implementations of the dotplot algorithm: a low-level
data-parallel core for individual processors and a coarse-grained
parallel version for multi-processor and cluster environments.
The results were presented at HiCOMB 2005 in Denver, CO and
the 2005 SIAM
Conference on Computational Science and Engineering.
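For readers unfamiliar with the technique, the following is a minimal sketch of a direct dotplot, not the HPC Dotplot implementation. It marks a dot at (i, j) wherever a window of one sequence exactly matches a window of the other, and it splits the rows into independent bands to suggest how a coarse-grained decomposition might look. The function names, the exact-match scoring, and the band size are assumptions for illustration; the bands are processed serially here.

```python
def dotplot_band(seq_a, seq_b, row_start, row_end, window=1):
    """Compare rows [row_start, row_end) of seq_a against every position of seq_b."""
    dots = []
    last_row = len(seq_a) - window  # last valid window start in seq_a
    last_col = len(seq_b) - window  # last valid window start in seq_b
    for i in range(row_start, min(row_end, last_row + 1)):
        word = seq_a[i:i + window]
        for j in range(last_col + 1):
            if seq_b[j:j + window] == word:
                dots.append((i, j))
    return dots


def dotplot(seq_a, seq_b, window=1, band_size=1024):
    """Assemble the full dotplot from independent row bands."""
    dots = []
    # Each band depends only on its own rows, so bands could be
    # assigned to separate processors or cluster nodes.
    for start in range(0, len(seq_a), band_size):
        dots.extend(dotplot_band(seq_a, seq_b, start, start + band_size, window))
    return dots


if __name__ == "__main__":
    for i, j in dotplot("ACGTAC", "ACGT", window=2, band_size=2):
        print(i, j)
```

The cost of this direct approach grows with the product of the two sequence lengths, which is why large genomes call for the data-parallel and coarse-grained parallel versions described above.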
Comparative Genomics -
High-Performance Direct Pairwise Comparison of Large Genomic
Sequences
Chris Mueller, Mehmet Dalkilic and Andrew Lumsdaine. HiCOMB 2005
- HPC Dotplot Presentation (HiCOMB)
- HPC Dotplot Presentation (SIAM)
- HPC Dotplot Presentation (class)
- Bacterial Comparison (Listeria)
- Mitochondrial Comparison rendered with lines (Fly vs (Fly,
Human, Salamander, Rat))
- Mitochondrial Comparison rendered with dots (Fly vs Human)
(Large file: ~20 MB)
In July 2004, I took the Genomics in
Biomedical Research course at Lawrence Berkeley Labs. These
presentations are introductions to comparative genomics based
on the material presented at the workshop.
- Biology Overview (OmniGraffle)
- Comparative Genomics Presentation (OSL Summer 2004)
- Comparative Genomics Presentation (I532 Fall 2004)
- SALL1 Human-Fugu

