Data Mining and Graph Algorithms


HTML tag usage from 1996-2003. Each line shows the usage profile for one HTML tag. The height of each point on a line represents the percentage of pages in which the tag appeared in a given year. For instance, the three straw-colored lines at the top are the HTML, HEAD, and BODY tags. The teal lines rising in the middle are TABLE, TR, and TD. Full results from this study will be available in the near future.

Sparse Matrix Ordering Algorithms
This started as a project to generate protein families (see Bioinformatics) and has evolved into a more thorough study of how sparse matrix ordering algorithms can be used to visualize large graphs. In the process, we have introduced a new ordering algorithm that takes into account domain knowledge in the form of dissimilarity scores. This work has not been published, but some results from the first paper have been added to the graph mining package Pajek. The preliminary results from two studies are available here:
  • Sparse Ordering Presentation PPT
  • Sparse Ordering Paper PDF
  • Weighted Connected Components Presentation PDF
  • Vertex Reordering Algorithms for Cluster Identification (weighted connected components paper) PDF
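To give a feel for what a sparse matrix ordering does to a graph, here is a minimal Python sketch of the classic reverse Cuthill-McKee ordering. It is not the dissimilarity-aware algorithm described above, only a well-known baseline chosen for illustration. The function names, the adjacency-dictionary representation, and the small example graph are all assumptions made for this sketch.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return a vertex ordering using the classic reverse Cuthill-McKee heuristic.

    adj: dict mapping each vertex to the set of its neighbours (undirected graph).
    This is an illustrative baseline, not the ordering algorithm from the papers.
    """
    visited = set()
    order = []
    # Handle each connected component, starting from a lowest-degree vertex.
    for start in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting new neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (len(adj[u]), u)):
                visited.add(w)
                queue.append(w)
    order.reverse()
    return order


def bandwidth(adj, order):
    """Largest distance between the positions of two adjacent vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


# Hypothetical example: a 7-vertex path whose labels are scrambled.
edges = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3)]
adj = {v: set() for v in range(7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

natural = list(range(7))
rcm = reverse_cuthill_mckee(adj)
print("natural order bandwidth:", bandwidth(adj, natural))  # 4
print("RCM order:", rcm, "bandwidth:", bandwidth(adj, rcm))  # bandwidth 1
```

After reordering, the nonzeros of the adjacency matrix sit next to the diagonal, so connected groups of vertices show up as visible blocks when the matrix is plotted. That is the general idea behind using orderings for visualization.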
Data Mining Algorithm Presentations and Posters
Presentations, posters, and reports on existing data mining algorithms.
  • Data Clustering Overview Poster PDF
        (formatted for two 11x17 sheets of paper for printing) PDF
  • K-means for Large Data slide PDF
  • Support Vector Machines overview poster PDF
  • OPTICS (density-based clustering algorithm) presentation PDF
  • Fastmap vs. MDS comparison (html)