Grigory Yaroslavtsev (Theory Seminar)

"Advances in Hierarchical Clustering of Vector Data"
Grigory Yaroslavtsev (Indiana University, Bloomington)
Monday, October 14, 2019, 2:00 pm
EBU3B, Room 4258


Abstract: Compared to the highly successful flat clustering (e.g. k-means), despite its important role and applications in data analysis, hierarchical clustering has been lacking in rigorous algorithmic studies until late due to absence of rigorous objectives. Since 2016, a sequence of works has emerged and gave novel algorithms for this problem in the general metric setting. This was enabled by a breakthrough by Dasgupta, who introduced a formal objective into the study of hierarchical clustering.

 

In this talk I will give an overview of our recent progress on models and scalable algorithms for hierarchical clustering applicable specifically to high-dimensional vector data, including embedding vectors arising from deep learning. I will first discuss various linkage-based algorithms (single-linkage, average-linkage) and their formal properties with respect to various objectives. I will then introduce a new projection-based approximation algorithm for vector data. The talk will be self-contained and doesn’t assume prior knowledge of clustering methods.

 

Based on joint works with Vadapalli (ICML’18) and Charikar, Chatziafratis and Niazadeh (AISTATS’19)