At the recently concluded ACM KDD 2006 conference in Philadelphia, PA,
Virginia Tech faculty and students presented four papers on data mining
research, from theoretical work on partial orders to applications in
information retrieval and bioinformatics. KDD is the premier international
conference on knowledge discovery and data mining, with an overall
acceptance rate this year of less than 23% and equally selective
workshops on topical issues. Attending were Naren Ramakrishnan and
T.M. Murali, along with Ph.D. students Deept Kumar and Satish Tadepalli.
The details of the four papers are:
- Deept Kumar, Naren Ramakrishnan, Richard F. Helm, and Malcolm Potts,
"Algorithms for Storytelling" (Main conference)
- Gregory Grothaus, Adeel Mufti, and T.M. Murali, "Automatic Layout and
Visualization of Biclusters" (Workshop on data mining in
bioinformatics)
- Lizhuang Zhao, Mohammed J. Zaki, and Naren Ramakrishnan, "BLOSOM: A
Framework for Mining Boolean Expressions" (Main conference)
- Proceso L. Fernandez, Lenwood S. Heath, Naren Ramakrishnan, and John
Paul C. Vergara, "Reconstructing Partial Orders from Linear Extensions"
(Workshop on network reconstruction from dynamic data)
The publication by Kumar and Ramakrishnan, in collaboration with VT
biochemistry colleagues Rich Helm and Malcolm Potts, formulates a new data
mining problem called "storytelling" with applications to computational
linguistics, bioinformatics, and information retrieval. Storytelling is
not unlike the word game where we are given two words, e.g., PURE and
WOOL, and we must morph one into the other, slowly but meaningfully (PURE
-> PORE -> POLE -> POLL -> POOL -> WOOL).
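The word game itself can be sketched as a breadth-first search over a vocabulary, where two words are linked if they differ in exactly one letter. This is only an illustration of the PURE-to-WOOL analogy, not the storytelling algorithm from the paper; the function names and the small vocabulary `VOCAB` are hypothetical.

```python
from collections import deque

# Hypothetical toy vocabulary; a real word game would use a full dictionary.
VOCAB = ["PURE", "PORE", "PORT", "POLE", "POLO", "POLL", "POOL", "WOOL", "WOOD"]


def neighbors(word, vocab):
    """Yield vocabulary words of the same length differing in exactly one position."""
    for other in vocab:
        if len(other) == len(word) and sum(a != b for a, b in zip(word, other)) == 1:
            yield other


def word_ladder(start, end, vocab):
    """Return a shortest chain of one-letter changes from start to end, or None."""
    vocab = set(vocab) | {start, end}
    parent = {start: None}  # remembers how each word was first reached
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == end:
            # Walk the parent links back to start, then reverse.
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in neighbors(word, vocab):
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


print(" -> ".join(word_ladder("PURE", "WOOL", VOCAB)))
```

Because breadth-first search explores words in order of distance from the start, the first chain it finds is a shortest one; with this vocabulary it recovers the PURE -> PORE -> POLE -> POLL -> POOL -> WOOL chain.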
Data mining algorithms often produce a large number of patterns, so
novel algorithms to visualize and summarize these patterns are required.
The publication by Murali and his students Grothaus and Mufti studies
algorithms for compactly laying out "biclusters," which are patterns of
co-occurrence between two classes of entities, e.g., days and weather
conditions, genes and their expression, companies and their stock values.
This paper has been invited to appear in a special issue of Algorithms
for Molecular Biology devoted to data mining in biology.
The BLOSOM paper is in collaboration with RPI colleagues Mohammed Zaki and
Lizhuang Zhao. It generalizes the patterns traditionally studied in the KDD
community to cover all possible Boolean expressions. One of the applications
studied in this paper is reconstructing transcriptional networks of gene
expression, which can be represented as (complex) Boolean expressions.
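To make the generalization concrete: a traditional itemset such as {A, B} corresponds to the conjunction "A AND B", whereas a general Boolean expression may also use OR and NOT. The sketch below only evaluates which rows of a small, hypothetical dataset satisfy a given expression; it does not mine expressions and is not the BLOSOM algorithm. The names `ROWS`, `support`, `itemset_ab`, and `general_expr` are illustrative assumptions.

```python
# Hypothetical toy dataset: each row is the set of items (or active genes) present.
ROWS = [
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
    {"C"},
]


def support(rows, expr):
    """Return the indices of rows for which the predicate expr is true."""
    return [i for i, row in enumerate(rows) if expr(row)]


def itemset_ab(row):
    """Classic itemset {A, B}: a pure conjunction, A AND B."""
    return "A" in row and "B" in row


def general_expr(row):
    """A general Boolean expression: (A AND NOT B) OR (B AND NOT C)."""
    return ("A" in row and "B" not in row) or ("B" in row and "C" not in row)


print(support(ROWS, itemset_ab))    # rows containing both A and B
print(support(ROWS, general_expr))  # rows matching the disjunction with negations
```

Representing each expression as a predicate keeps the evaluation uniform: a conjunctive itemset is just one special case among all Boolean expressions over the same items.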
The "reconstructing partial orders" paper is in collaboration with
Professor Lenwood Heath, VT alumnus John Paul C. Vergara (now a professor
at Ateneo de Manila University), and Proceso L. Fernandez (a Ph.D.
student at the same institution). It focuses on reconstructing order
(total or partial) information from sequential data, with applications to
neuroscience, paleontology, and systems biology.
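A standard fact behind this problem is that a partial order equals the intersection of all of its linear extensions: a precedes b in the partial order exactly when a precedes b in every extension. The sketch below applies that textbook idea to a handful of observed sequences; it is not the algorithm from the paper, and the function names are hypothetical. If only some of the extensions are observed, the intersection may contain extra orderings not present in the true partial order.

```python
from itertools import combinations


def precedence_pairs(sequence):
    """All ordered pairs (a, b) with a appearing before b in the sequence."""
    # combinations() preserves input order, so each pair is (earlier, later).
    return set(combinations(sequence, 2))


def intersect_orders(extensions):
    """Keep only the orderings that hold in every observed linear extension."""
    pairs = precedence_pairs(extensions[0])
    for ext in extensions[1:]:
        pairs &= precedence_pairs(ext)
    return pairs


def cover_relations(pairs):
    """Drop pairs implied by transitivity, leaving the Hasse-diagram edges."""
    elements = {x for pair in pairs for x in pair}
    return {
        (a, c)
        for (a, c) in pairs
        if not any((a, b) in pairs and (b, c) in pairs for b in elements)
    }


# Two linear extensions of the partial order a < b, a < c, b < d, c < d.
observed = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
order = intersect_orders(observed)
print(sorted(cover_relations(order)))
```

Here b and c appear in both relative orders across the two sequences, so the intersection leaves them incomparable, which is exactly the "partial" information recovered from the sequential data.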
Virginia Tech hopes to have an even stronger presence next year and
showcase more of the exciting data mining research happening on campus!