September 29, 2010
Title: Fast Superpixels and their Use for Spatio-temporal Segmentation
Abstract: Superpixel segmentation refers to over-segmenting an image into small regions, and can be used to decrease the complexity of many high-level vision tasks. While standard segmentation algorithms can often be tuned to extract superpixels, they either cause severe under-segmentation or are computationally expensive. In the first part of this talk, we address these issues by designing a specialized, geometric-flow-based superpixel segmentation method that is fast and limits the amount of under-segmentation. It produces segments that respect local image boundaries on the one hand, while limiting under-segmentation through a compactness constraint on the other. Its complexity is approximately linear in image size, and it can be applied to megapixel-sized images with high superpixel densities in a matter of minutes. We show qualitative demonstrations of high-quality results on several complex images. The Berkeley database is used to quantitatively compare its performance to a number of over-segmentation algorithms, showing that it yields less under-segmentation than algorithms that lack a compactness constraint, while offering a significant speed-up over N-cuts, which does enforce compactness.
In the second part of the talk, we build on our superpixels to extract coherent spatio-temporal components from a video. Spatio-temporal segmentation is an essential task for video analysis. The strong interconnection between finding an object's spatial support and finding its motion characteristics makes the problem particularly challenging. Motivated by closure detection techniques in 2D images, we introduce the concept of spatio-temporal closure. Treating the spatio-temporal volume as a single entity, we extract contiguous "tubes" whose overall surface is supported by strong appearance and motion discontinuities. Formulating our closure cost over a graph of spatio-temporal superpixels, we show how it can be globally and efficiently minimized using the parametric maxflow framework. The resulting approach automatically recovers coherent spatio-temporal components, corresponding to objects, object parts, and object unions, providing a good set of multiscale spatio-temporal hypotheses for high-level video analysis.
Biography: Alex Levinshtein was born in 1981 in the suburbs of Moscow, Russia. In 2000, he immigrated to Toronto, Ontario, where he completed his BSc (2003), MSc (2005) and PhD (2010) at the University of Toronto in the Department of Computer Science, with the latter two degrees being supervised by Sven Dickinson and Cristian Sminchisescu. His main interests lie in the field of computer vision, with a focus on perceptual grouping and image segmentation. In addition to his academic experience, he previously worked on computer vision and image processing projects at Tangam Gaming, Cognovision, and Google. In September 2010, he became a jointly sponsored post-doctoral fellow at the University of Toronto and Philips Healthcare, working on segmentation and shape analysis with application to radiation therapy.