Chem/Bioinformatics Projects


A direct comparison of the fly, human, salamander, and rat mitochondrial genomes with the fly mitochondrial genome. Generated using the HPC Dotplot software.

Cheminformatics Projects


Synthetic Programming for Molecular Fingerprint Comparison
Synthetic programming is a new style of programming that uses small, runtime-generated computational kernels to let scripting languages match or exceed the performance of compiled languages. The first example of synthetic programming is a molecular fingerprint comparison algorithm that uses the Tanimoto metric to compare 166-bit fingerprints (bit vectors) on the AltiVec SIMD processor. Implemented in Python, the synthetic programming environment specializes the Tanimoto algorithm for the current data at runtime and is 50% faster than the C++ implementation (compiled with -O3, but without AltiVec).
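For readers unfamiliar with the metric, the Tanimoto coefficient of two bit vectors is the number of bits set in both divided by the number of bits set in either. The sketch below shows that calculation in plain Python, with each fingerprint stored as an integer. It is not the synthetic programming implementation and uses no AltiVec code; the names `popcount`, `tanimoto`, and `most_similar`, and the choice to score two empty fingerprints as 1.0, are assumptions made for illustration.

```python
FP_BITS = 166                 # fingerprint width mentioned in the text
MASK = (1 << FP_BITS) - 1     # keeps only the low 166 bits


def popcount(x: int) -> int:
    """Count the bits set in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: bits set in both / bits set in either."""
    a &= MASK
    b &= MASK
    union = popcount(a | b)
    if union == 0:
        return 1.0            # assumed convention for two empty fingerprints
    return popcount(a & b) / union


def most_similar(query: int, library: list) -> int:
    """Index of the library fingerprint closest to the query (hypothetical helper)."""
    return max(range(len(library)), key=lambda i: tanimoto(query, library[i]))


# Toy fingerprints, far shorter than 166 bits, for illustration only
query = 0b1101
library = [0b1011, 0b1101, 0b0010]
scores = [tanimoto(query, fp) for fp in library]
best = most_similar(query, library)
```

Because the comparison reduces to two bitwise operations and two bit counts, it is the kind of inner loop that benefits from SIMD instructions.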
Visualizing Chemical Similarity Matrices
In this project, we applied the techniques developed for visualizing similarity matrices using sparse matrix ordering algorithms to chemical data.

Bioinformatics Projects


Protein Clustering with Sparse Matrix Ordering Algorithms
This is an ongoing research project that started as a class project for Sun Kim's Machine Learning for Bioinformatics (I532) course. The goal is to explore techniques for de novo generation of protein families based on comparison graphs. Using the all-pairs comparison graph (the n x n sparse matrix generated by comparing all pairs of proteins and thresholding the scores to reduce noise), we generate orderings of the nodes (proteins) in the graph using existing and modified sparse matrix ordering algorithms.
  • Sparse Ordering Presentation PPT
  • Sparse Ordering Paper PDF
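The idea of thresholding a comparison graph and then reordering its nodes can be sketched in a few lines. The example below uses reverse Cuthill-McKee only as a well-known example of a sparse matrix ordering; the description above does not say which algorithms the project used, and the scores, threshold, and function names here are all made up for illustration.

```python
from collections import deque


def build_graph(n, scores, threshold):
    """Adjacency sets for n proteins, keeping only pairs scoring >= threshold."""
    adj = {i: set() for i in range(n)}
    for (i, j), s in scores.items():
        if i != j and s >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def reverse_cuthill_mckee(adj):
    """Breadth-first ordering that visits low-degree nodes first, then reversed."""
    order, seen = [], set()
    # Start each connected component from its lowest-degree unvisited node
    for start in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v] - seen, key=lambda u: (len(adj[u]), u)):
                seen.add(w)
                queue.append(w)
    return order[::-1]


def bandwidth(adj, order):
    """Largest distance between two connected nodes under the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)


# Hypothetical similarity scores for six proteins; (0, 1) is below-threshold noise
scores = {(0, 3): 0.9, (3, 5): 0.8, (0, 5): 0.7,
          (1, 2): 0.9, (2, 4): 0.85, (1, 4): 0.6, (0, 1): 0.1}
adj = build_graph(6, scores, threshold=0.5)
order = reverse_cuthill_mckee(adj)
```

In this toy input the two groups {0, 3, 5} and {1, 2, 4} end up contiguous in `order`, so the nonzero entries of the reordered matrix sit close to the diagonal, where candidate families are easier to see.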
HPC Dotplot
The Dotplot is a classic bioinformatics tool for directly comparing genomic sequences. This work began as a class project for Mehmet Dalkilic's Introduction to Bioinformatics (L519) course, in which we planned to explore techniques for visualizing full genome comparisons. However, the existing direct comparison tools did not scale to large (> 5M base-pairs) genomes. To generate the data, we developed two parallel implementations of the dotplot algorithm: a low-level data-parallel core for individual processors and a coarse-grained parallel version for multi-processor and cluster environments. The results were presented at HiCOMB 2005 in Denver, CO and the 2005 SIAM Conference on Computational Science and Engineering.
  • High-Performance Direct Pairwise Comparison of Large Genomic Sequences
    Chris Mueller, Mehmet Dalkilic and Andrew Lumsdaine. HiCOMB 2005 PDF
  • HPC Dotplot Presentation (HiCOMB) PPT PDF
  • HPC Dotplot Presentation (SIAM) PPT PDF
  • HPC Dotplot Presentation (class) PPT PDF
  • Bacterial Comparison (Listeria) PDF
  • Mitochondrial Comparison rendered with lines (Fly vs (Fly, Human, Salamander, Rat)) PDF
  • Mitochondrial Comparison rendered with dots (Fly vs Human) PDF (Large file: ~20 MB)
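For readers who have not seen a dotplot, the basic idea is to mark every position pair (i, j) where the two sequences match. The sketch below is a naive, single-threaded version that matches exact words of length k; it is not the HPC Dotplot implementation and has none of its data-parallel or cluster code. The function names, the word length, and the sequences are assumptions chosen for illustration.

```python
from collections import defaultdict


def dotplot(seq_a, seq_b, k=3):
    """Return (i, j) pairs where the k-mers at seq_a[i] and seq_b[j] are identical."""
    # Index every k-mer of seq_b by its starting position
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Look up each k-mer of seq_a and record one dot per match
    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            dots.append((i, j))
    return dots


def render(dots, rows, cols):
    """Draw the dots as text: '*' for a match, '.' otherwise."""
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


# Made-up sequences; real genome comparisons involve millions of base pairs
seq_a = "ACGTACGT"
seq_b = "ACGTTACG"
dots = dotplot(seq_a, seq_b, k=3)
picture = render(dots, len(seq_a), len(seq_b))
```

Runs of dots along a diagonal indicate shared regions. A full grid has one cell per pair of positions, so its size grows with the product of the two sequence lengths, which is why a naive approach like this one does not scale to large genomes.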
Comparative Genomics
In July 2004, I took the Genomics in Biomedical Research course at Lawrence Berkeley Labs. These presentations are introductions to comparative genomics based on the material presented at the workshop.
  • Biology Overview (OmniGraffle) PDF
  • Comparative Genomics Presentation (OSL Summer 2004) PPT
  • Comparative Genomics Presentation (I532 Fall 2004) PPT PDF
  • SALL1 Human-Fugu PDF