Oral Qualifying Exam Resources
Posters
September 9, 2005, 1:00 PM, 101 Lindley. Passed.
September 29, 2005, 9:30 AM
Reading List: Data Clustering
This reading list is designed to provide an introduction to how
data clustering algorithms are used and implemented, with an
emphasis on life sciences applications and scalable
implementations. Papers marked with a star
are good representatives
of the main themes. If you want a decent overview of the topics,
these are the ones to read.
Goals:
- Overview of data clustering algorithms
- Applications in Bioinformatics
- High-performance and Scalable Implementation Strategies (with focus on K-Means)
Data Clustering Overview
- Data Clustering: A Review
  A. K. Jain, M. N. Murty, and P. J. Flynn. ACM Computing Surveys, Vol. 31, No. 3, September 1999
- Survey of Clustering Data Mining Techniques
  Pavel Berkhin. Accrue Software, San Jose, CA, 2002
- Clustering in Massive Data Sets
  Fionn Murtagh. Handbook of Massive Data Sets, pp. 501-543, Kluwer Academic Publishers, Norwell, MA, USA
Bioinformatics Applications
- Cluster Analysis for Gene Expression Data: A Survey
  Daxin Jiang, Chun Tang, Aidong Zhang. IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, November 2004
- Cluster Analysis and Display of Genome-wide Expression Patterns
  Eisen MB, Spellman PT, Brown PO, Botstein D. Proc. Natl. Acad. Sci., USA, Vol. 95, pp. 14863-14868, December 1998
- Large-Scale Clustering of cDNA-Fingerprinting Data
  Ralf Herwig, Albert J. Poustka, Christine Muller, Christof Bull, Hans Lehrach, and John O'Brien. Genome Research, Vol. 9, No. 11, pp. 1093-1105, November 1999
- Inference from Clustering with Application to Gene-Expression Microarrays
  Edward R. Dougherty, Junior Barrera, Marcel Brun, Seungchan Kim, Roberto M. Cesar, Yidong Chen, Michael Bittner, and Jeffrey M. Trent. Journal of Computational Biology, Vol. 9, No. 1, pp. 105-126, 2002
- Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering
  Audrey P Gasch* and Michael B Eisen*. Genome Biol., Vol. 3, No. 11 (online October 10, 2002)
- Clustering Files of Chemical Structures Using the Fuzzy k-Means Clustering Method
  Holliday JD, Rodgers SL, Willett P, Chen MY, Mahfouf M, Lawson K, Mullier G. J. Chem. Inf. Comput. Sci., Vol. 44, pp. 894-902, 2004
- Singular value decomposition for genome-wide expression data processing and modeling
  Orly Alter, Patrick O. Brown, and David Botstein. Proc. Natl. Acad. Sci., USA, Vol. 97, No. 18, pp. 10101-10106, August 29, 2000
- Singular value decomposition analysis of protein sequence alignment score data
  F. Fogolari, S. Tessari, H. Molinari. Proteins: Structure, Function, and Genetics, Vol. 46, No. 2, pp. 161-170, December 2001
- Computational Cluster Validation in Post-genomic Data Analysis
  Handl J, Knowles J, Kell DB. Bioinformatics, Vol. 21, No. 15, pp. 3201-3212, 2005
High-performance/Large Data Algorithms
- Fast and Exact Out-of-Core K-Means Clustering
  Anjan Goswami, Ruoming Jin, Gagan Agrawal. Proc. of International Conference on Data Mining (ICDM), pp. 83-90, Nov. 2004
- Distributed Data Clustering can be Efficient and Exact
  George Forman and Bin Zhang. SIGKDD Explorations, Vol. 2, No. 2, pp. 34-38, December 2000
- A Data-Clustering Algorithm on Distributed Memory Multiprocessors
  Inderjit S. Dhillon and Dharmendra S. Modha. Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD, 2000, Springer-Verlag
- Multi-grain Parallel Processing of Data-Clustering on Programmable Graphics Hardware
  Hiroyuki Takizawa and Hiroaki Kobayashi. ISPA 2004, in LNCS 3358, pp. 16-27, 2004
- Scalability for Clustering Algorithms Revisited
  Fredrik Farnstrom, James Lewis, Charles Elkan. SIGKDD Explorations, Vol. 2, No. 1, pp. 51-57
- Scaling Clustering Algorithms to Large Databases
  Paul S. Bradley, Usama M. Fayyad, Cory A. Reina. Microsoft Research Technical Report, 1998 (also in KDD '98)
- Clustering Binary Data Streams with K-Means
  Carlos Ordonez. DMKD '03: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, San Diego, CA, 2003
- Programming the K-means Clustering Algorithm in SQL
  Carlos Ordonez. Conference on Knowledge Discovery in Data, in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, Seattle, WA, USA, 2004
Changes
- Aug 26: Added Alter et al. and Fogolari et al. (SVD papers)
- Sept 1: Removed Clare et al. (Handl is much more complete)

