Got Data? Will Mine ...
At the recently concluded ACM KDD 2006 conference in Philadelphia, PA, Virginia Tech faculty and students presented four papers on data mining research, ranging from theoretical work on partial orders to applications in information retrieval and bioinformatics. KDD is the premier international conference on knowledge discovery and data mining, with an overall conference acceptance rate this year of less than 23% and equally selective workshops focused on topical issues. Attending were Naren Ramakrishnan and T.M. Murali, along with Ph.D. students Deept Kumar and Satish Tadepalli.
VT attendees at KDD2006
The details of the four papers are:
  • Deept Kumar, Naren Ramakrishnan, Richard F. Helm, and Malcolm Potts, "Algorithms for Storytelling" (Main conference)
  • Gregory Grothaus, Adeel Mufti, and T.M. Murali, "Automatic Layout and Visualization of Biclusters" (Workshop on data mining in bioinformatics)
  • Lizhuang Zhao, Mohammed J. Zaki, and Naren Ramakrishnan, "BLOSOM: A Framework for Mining Boolean Expressions" (Main conference)
  • Proceso L. Fernandez, Lenwood S. Heath, Naren Ramakrishnan, and John Paul C. Vergara, "Reconstructing Partial Orders from Linear Extensions" (Workshop on network reconstruction from dynamic data)
The publication by Kumar and Ramakrishnan, in collaboration with VT biochemistry colleagues Rich Helm and Malcolm Potts, formulates a new data mining problem called "storytelling," with applications to computational linguistics, bioinformatics, and information retrieval. Storytelling is not unlike the word game where we are given two words, e.g., PURE and WOOL, and we must morph one into the other, slowly but meaningfully (PURE -> PORE -> POLE -> POLL -> POOL -> WOOL).
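
To make the word game concrete, here is a minimal Python sketch that finds such a chain by breadth-first search. It illustrates only the word game, not the algorithms in the paper; the tiny vocabulary `WORDS` and the function names are hypothetical, chosen for this example.

```python
from collections import deque

# Hypothetical toy vocabulary; a real run would load a full word list.
WORDS = {"PURE", "PORE", "POLE", "POLL", "POOL", "WOOL", "CURE", "CORE"}


def neighbors(word, vocabulary):
    """Yield vocabulary words that differ from `word` in exactly one letter."""
    for other in vocabulary:
        if len(other) == len(word):
            if sum(a != b for a, b in zip(word, other)) == 1:
                yield other


def word_ladder(start, goal, vocabulary):
    """Breadth-first search for a shortest chain of one-letter changes."""
    parents = {start: None}  # maps each reached word to its predecessor
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start word.
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in neighbors(current, vocabulary):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None  # no chain exists within this vocabulary


print(" -> ".join(word_ladder("PURE", "WOOL", WORDS)))
# PURE -> PORE -> POLE -> POLL -> POOL -> WOOL
```

With this vocabulary the chain above is the only shortest one, so the search recovers exactly the sequence quoted in the text.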

Data mining algorithms often produce a large number of patterns, so novel algorithms to visualize and summarize these patterns are required. The publication by Murali and his students Grothaus and Mufti studies algorithms for compactly laying out "biclusters," which are patterns of co-occurrence between two classes of entities, e.g., days and weather conditions, genes and their expression, companies and their stock values. This paper has been invited to appear in a special issue of Algorithms for Molecular Biology devoted to data mining in biology.

The BLOSOM paper is in collaboration with RPI colleagues Mohammed Zaki and Lizhuang Zhao. It generalizes the patterns traditionally studied in the KDD community to cover all possible boolean expressions. One of the applications studied in this paper is the reconstruction of transcriptional networks of gene expression, which can be represented as (complex) boolean expressions.

The "reconstructing partial orders" paper is in collaboration with Professor Lenwood Heath, VT alumnus John Paul C. Vergara (now a professor at Ateneo de Manila University), and Proceso L. Fernandez (Ph.D. student at the same institution). It focuses on reconstructing order (total or partial) information from sequential data, with applications to neuroscience, paleontology, and systems biology.
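
As a rough illustration of the idea, the Python sketch below recovers order information from a handful of observed sequences by keeping only the pairs that appear in the same relative order in every sequence. It is a simplified sketch, not the method of the paper: it assumes every sequence is a complete ordering of the same items, and the function name `common_order` and the sample data are hypothetical.

```python
def common_order(sequences):
    """Return the pairs (a, b) such that a precedes b in every sequence.

    Assumes each sequence is a permutation of the same set of items.
    The result is the full order relation (transitively closed),
    not a minimal diagram of it.
    """
    # For each sequence, record the position of every item.
    positions = [{item: i for i, item in enumerate(seq)} for seq in sequences]
    items = sequences[0]
    return {
        (a, b)
        for a in items
        for b in items
        if a != b and all(pos[a] < pos[b] for pos in positions)
    }


# Hypothetical observations: two orderings of the same four events.
observed = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
]

order = common_order(observed)
print(sorted(order))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'd'), ('c', 'd')]
```

Here `b` and `c` appear in both relative orders, so neither pair is kept and the two events are left unordered, which is what makes the recovered order partial rather than total.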

Virginia Tech hopes to have an even stronger presence next year and showcase more of the exciting data mining research happening on campus!