Seminar Series - Mining Useful Patterns

Speaker: Prof. Jilles Vreeken from the Department of Mathematics and Computer Science at the University of Antwerp, Belgium.
Date: Friday, December 2, 2011
Time: 11:15am-12:15pm
Location: 2150 Torgersen

Sponsored by the Discovery Analytics Center (http://dac.cs.vt.edu)

Abstract:
In short, this talk will be about how to find interesting patterns, and how
to put these to good use in a variety of data mining tasks, beating the
competition without having to set parameters - all by employing insights
from information theory.

Pattern mining is a very powerful tool in exploratory data analysis. Given
some dataset, the standard question is 'find me all patterns that are
potentially interesting'. In practice, however, you will not want to ask
that question. Typically, for any non-trivial interestingness threshold,
there will exist far too many such patterns, orders of magnitude more than
the size of the dataset. Moreover, most of these results will be redundant,
being only variations on a theme. As such, finding the true nuggets amongst
these becomes like finding the proverbial needle in a haystack.

Instead, you should ask 'find me the optimal set of patterns', where
'optimal' should favour small pattern sets, low redundancy, and high-quality
patterns. This is where information theory comes in. It gives us a
principled way to formalise 'optimal' for our goal. Namely, we can use it to
identify those patterns that describe the data best, or that do the best
job of predicting the data. I will give a quick overview of the algorithms I
have (co-)developed to identify high-quality pattern sets on
binary data.
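
As a rough illustration of 'patterns that describe the data best', the
sketch below scores a candidate pattern set by a two-part description
length: bits for the pattern table plus bits for the data encoded with it.
It is a toy under stated assumptions, not the speaker's algorithms: the
greedy cover, the cost model, and the names cover and total_length are all
hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def cover(transaction, patterns):
    """Greedily cover a transaction with patterns (in the given order),
    then cover whatever is left with single items."""
    remaining = set(transaction)
    used = []
    for p in patterns:
        if p <= remaining:          # pattern fits in what is still uncovered
            used.append(p)
            remaining -= p
    used.extend(frozenset([item]) for item in remaining)
    return used


def total_length(transactions, patterns):
    """Two-part description length in bits: L(model) + L(data | model)."""
    # Baseline code lengths for single items, from their raw frequencies.
    item_counts = Counter(i for t in transactions for i in t)
    n_items = sum(item_counts.values())
    base = {i: -log2(c / n_items) for i, c in item_counts.items()}

    # How often each table entry is used when covering the data.
    usage = Counter(p for t in transactions for p in cover(t, patterns))
    n_used = sum(usage.values())
    code = {p: -log2(u / n_used) for p, u in usage.items()}

    # Data cost: one code per use. Model cost: for each entry in use,
    # its own code plus its items spelled out in the baseline code.
    data_bits = sum(usage[p] * code[p] for p in usage)
    model_bits = sum(code[p] + sum(base[i] for i in p) for p in usage)
    return model_bits + data_bits


# Toy binary data: items a and b co-occur in most transactions.
data = [frozenset("ab")] * 16 + [frozenset("a")] * 2 + [frozenset("b")] * 2

baseline = total_length(data, [])                  # single items only
with_ab = total_length(data, [frozenset("ab")])    # add the pattern {a, b}

print(f"singletons only: {baseline:.2f} bits")
print(f"with {{a, b}}:     {with_ab:.2f} bits")
```

On this toy data the table containing {a, b} gives the shorter total
description, so it would be preferred. No threshold is set anywhere: the
description length itself decides whether a pattern earns its place.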

I will give a number of examples of how the resulting patterns can be put to
good use in tasks including classification, one-class classification,
anomaly detection, missing value estimation, concept-drift detection, and
clustering - obtaining top-notch, highly interpretable results, without
having to set any parameters.

Bio:
Jilles Vreeken is a post-doctoral researcher at the University of Antwerp
(Belgium), and works in the Advanced Database Research and Modeling (ADReM)
group of Prof. Bart Goethals. His research interests include data mining in
general, and pattern mining specifically: employing insights from
information theory to identify interesting results, and putting these
to good use. He has published over 20 conference and journal papers on data
mining, and has won two best student research paper awards.

In 2009, he defended his PhD thesis 'Making Pattern Mining Useful' at the
University of Utrecht (Netherlands) under the supervision of Prof. Arno Siebes,
and was awarded the 2010 ACM Best Dissertation Runner-Up award. Before that, he
obtained his M.Sc. in Computer Science with honours (cum laude) in 2004 from
the University of Utrecht, focusing on bio-inspired robotics and artificial
intelligence, and collaborating with and at the lab of Prof. Rolf Pfeifer at
the University of Zurich (Switzerland).