Seminar Series - Data Mining Meets HCI: Making Sense of Large Graphs
Speaker: Polo Chau, Machine Learning Department, Carnegie Mellon University
Date: Monday, April 2, 2012
Location: 110 McBryde
We have entered the era of big data. Datasets surpassing terabytes now arise in science, government, and enterprises. Yet making sense of these data remains a fundamental challenge. I work in Data Mining and Human-Computer Interaction (HCI), and I combine the best of both worlds to create tools that help people make sense of graphs with billions of nodes and edges. I present my work on three interrelated topics:
(1) Attention Routing: I introduce this idea, based on anomaly detection, which automatically draws people's attention to interesting parts of the graph. I describe two examples: the Polonium technology unearths malware from 37 billion machine-file relationships; the NetProbe system fingers bad guys who commit auction fraud.
(2) Mixed-Initiative Graph Sensemaking: I describe the Apolo system, which combines machine inference and visualization to guide the user in exploring large graphs. The user gives examples of relevant nodes, and Apolo recommends which areas the user may want to see next. In a user study, Apolo helped participants find significantly more relevant articles than Google Scholar.
(3) Scaling Up: I show how we may enable interactive analytics of large graphs with a hybrid architecture that combines parallel computation, visualization, and interaction.
Duen Horng "Polo" Chau is a Ph.D. candidate in the Machine Learning Department at Carnegie Mellon University. He received a Master's in Human-Computer Interaction (HCI) from Carnegie Mellon. Polo is working to bridge the fields of Data Mining and HCI for big data. He solves large-scale, real-world problems that have an impact on society. His NetProbe auction fraud detection system made national headlines (WSJ, CNN, etc.). His Polonium system (with Symantec) protects 120M people from malware. Polo is the only two-time Symantec fellow. He received a Yahoo! Key Scientific Challenges Award. He contributes to the PEGASUS peta-scale graph mining system, which won an Open Source Software World Challenge Silver Award. Polo is also an award-winning designer. He designed Carnegie Mellon's ID card.