Seminar Series - Out-of-Sample Embedding

Speaker: Dr. Michael Trosset, Indiana University
Date: Friday, November 16, 2012
Time: 11:15am-12:30pm
Location: 2150 Torgersen

Various problems in statistics and machine learning necessitate embedding new objects in an existing Euclidean space without disturbing a previously embedded configuration of points. In machine learning, this activity is widely known as out-of-sample embedding.

If embedding is performed by a method that makes use of Cartesian coordinates, then the out-of-sample problem is easy to formulate and the challenges are purely computational. After reviewing some possibilities, I will turn to the less intuitive case in which embedding is performed by classical multidimensional scaling (CMDS), which recovers principal component representations from Euclidean inner products. This is the case that has been emphasized in the machine learning literature.
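
As a rough illustration of how CMDS works from distances alone, the sketch below (an editorial example, not material from the talk) double centers a matrix A of squared Euclidean distances to obtain the inner-product matrix B = -1/2 J A J, where J = I - (1/n) 11^T. It then takes the leading eigenvector of B, scaled by the square root of its eigenvalue, as the first principal coordinate. The function names are hypothetical. Power iteration replaces a full eigendecomposition only to keep the example free of dependencies.

```python
import math

# Illustrative sketch with hypothetical function names, not code from the talk.


def double_center(A):
    """Return B = -1/2 * J A J for squared distances A, with J = I - (1/n) 11^T."""
    n = len(A)
    row = [sum(r) / n for r in A]                                 # row means
    col = [sum(A[i][j] for i in range(n)) / n for j in range(n)]  # column means
    grand = sum(row) / n                                          # grand mean
    return [[-0.5 * (A[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]


def matvec(B, v):
    return [sum(bij * vj for bij, vj in zip(bi, v)) for bi in B]


def leading_coordinate(B, iters=200):
    """First principal coordinate sqrt(lambda_1) * v_1, found by power iteration.

    Assumes B is positive semidefinite with a strictly largest eigenvalue and
    that the arbitrary start vector is not orthogonal to its eigenvector.
    """
    v = [float(i + 1) for i in range(len(B))]
    for _ in range(iters):
        w = matvec(B, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(vi * wi for vi, wi in zip(v, matvec(B, v)))  # Rayleigh quotient
    return [math.sqrt(lam) * x for x in v]


def sq_dist(p, q):
    return sum((s - t) ** 2 for s, t in zip(p, q))


# Corners of a 4 x 2 rectangle; CMDS sees only their squared distances.
points = [(0, 0), (4, 0), (0, 2), (4, 2)]
A = [[sq_dist(p, q) for q in points] for p in points]
B = double_center(A)
print(leading_coordinate(B))  # approximately [-2, 2, -2, 2], up to sign
```

For a p-dimensional representation one would instead take the top p eigenpairs of B, for example with `numpy.linalg.eigh`. The resulting coordinates are determined only up to an orthogonal transformation, which is one reason the out-of-sample problem for CMDS is less intuitive than the Cartesian case.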

I will describe a principled formulation of the out-of-sample extension of CMDS as an unconstrained nonlinear least squares problem. The objective function is a fourth-order polynomial that is easily minimized by standard gradient-based methods for numerical optimization. More importantly, this formulation provides deeper insight into what earlier proposals accomplish.
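
To see where a fourth-order polynomial can come from, here is a minimal sketch under stated assumptions. It is an editorial illustration, not necessarily the exact formulation presented in the talk. It assumes that the existing configuration X (one row x_i per original object) is centered, that the new object's inner products b_i with each x_i and its squared norm beta are obtained by double centering its squared dissimilarities a_i relative to the original objects, and that the new point y minimizes

    f(y) = 2 * sum_i (b_i - x_i'y)^2 + (beta - y'y)^2,

the squared Frobenius error of the augmented inner-product matrix when the block for the original objects is held fixed. The (beta - y'y)^2 term is what makes f quartic. Gradient descent with a backtracking (Armijo) line search stands in for a standard gradient-based method, and all function names are hypothetical.

```python
# Illustrative sketch only: hypothetical function names, and the objective below
# is an assumed formulation (augmented Frobenius error, original block fixed).


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def sq_dist(p, q):
    return sum((s - t) ** 2 for s, t in zip(p, q))


def new_point_inner_products(A, a):
    """Return b (b_i = <y, x_i>) and beta (= <y, y>), measured from the centroid
    of the original objects. A: n x n squared dissimilarities among the original
    objects; a: squared dissimilarities from the new object to each of them."""
    n = len(a)
    row = [sum(r) / n for r in A]  # row means of A
    grand = sum(row) / n           # grand mean of A
    abar = sum(a) / n              # mean of the new object's squared dissimilarities
    b = [-0.5 * (a[i] - abar - row[i] + grand) for i in range(n)]
    beta = abar - 0.5 * grand
    return b, beta


def objective(y, X, b, beta):
    """Quartic objective: 2 * sum_i (b_i - x_i.y)^2 + (beta - y.y)^2."""
    linear = sum((bi - dot(xi, y)) ** 2 for xi, bi in zip(X, b))
    return 2 * linear + (beta - dot(y, y)) ** 2


def gradient(y, X, b, beta):
    """Gradient: -4 * sum_i (b_i - x_i.y) x_i - 4 * (beta - y.y) y."""
    g = [0.0] * len(y)
    for xi, bi in zip(X, b):
        r = bi - dot(xi, y)
        for k in range(len(y)):
            g[k] -= 4 * r * xi[k]
    s = beta - dot(y, y)
    for k in range(len(y)):
        g[k] -= 4 * s * y[k]
    return g


def embed_new_point(X, A, a, iters=1000, tol=1e-16):
    """Minimize the objective from y = 0 by gradient descent with backtracking."""
    b, beta = new_point_inner_products(A, a)
    y = [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(y, X, b, beta)
        gg = dot(g, g)
        if gg < tol:  # squared gradient norm small enough: stop
            break
        f0 = objective(y, X, b, beta)
        step = 1.0
        while step > 1e-12:  # halve the step until the Armijo condition holds
            cand = [yk - step * gk for yk, gk in zip(y, g)]
            if objective(cand, X, b, beta) <= f0 - 1e-4 * step * gg:
                break
            step *= 0.5
        else:
            break  # no acceptable step found; keep the current y
        y = cand
    return y


# Existing configuration: the centered corners of a 2 x 2 square.
P = [(0, 0), (2, 0), (0, 2), (2, 2)]
c = [sum(p[k] for p in P) / len(P) for k in range(2)]
X = [[p[k] - c[k] for k in range(2)] for p in P]
A = [[sq_dist(p, q) for q in P] for p in P]
a = [sq_dist((3, 2), p) for p in P]  # new object, seen only through its distances
print(embed_new_point(X, A, a))  # approximately [2, 1], i.e. (3, 2) minus the centroid (1, 1)
```

Here the new object lies exactly in the plane of the existing configuration, so the minimum value of f is zero. With non-Euclidean dissimilarities it would be positive. Starting from y = 0 is a simplification; it fails if the initial gradient, which is proportional to sum_i b_i x_i, happens to vanish, so a different starting value would be needed in that case.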

Michael Trosset received his Ph.D. in Statistics from the University of California at Berkeley. His research interests lie in the general areas of computational statistics and statistical machine learning, especially the analysis of proximity data and the representation of high-dimensional non-Euclidean data structures in low-dimensional Euclidean spaces. He is currently Professor and Chair of the Department of Statistics at Indiana University.