"Learning to Hash Robustly, with Guarantees"
Daniel Beaglehole (UCSD)
Monday, October 4th, 2021, 2-3pm
Abstract:
The indexing algorithms for high-dimensional nearest neighbor search (NNS) with the best worst-case guarantees are based on randomized Locality-Sensitive Hashing (LSH) and its derivatives. In practice, many heuristic approaches exist to "learn" the best indexing method in order to speed up NNS, crucially adapting to the structure of the given dataset. These heuristics often outperform LSH-based algorithms on real datasets, but almost always at the cost of losing guarantees of correctness or of robust performance on adversarial queries, or they apply only to datasets with assumed extra structure or a model.
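For reference, here is a minimal sketch of the classical LSH baseline for Hamming space, bit-sampling LSH (Indyk-Motwani style), which the learned heuristics above compete with. The class name, parameter defaults, and helper methods are illustrative assumptions, not taken from the talk:

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Classical LSH for Hamming space: each hash function samples k
    random coordinates of the bit vector, uniformly per table."""

    def __init__(self, dim, k=8, num_tables=10, seed=0):
        rng = random.Random(seed)
        # Each table hashes points by a random subset of k coordinates.
        self.projections = [rng.sample(range(dim), k) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, point, proj):
        return tuple(point[i] for i in proj)

    def insert(self, point):
        for proj, table in zip(self.projections, self.tables):
            table[self._key(point, proj)].append(point)

    def query(self, q, radius):
        # Scan candidates colliding with q in any table; return the first
        # point within the given Hamming radius, if one exists.
        for proj, table in zip(self.projections, self.tables):
            for cand in table.get(self._key(q, proj), []):
                if sum(a != b for a, b in zip(q, cand)) <= radius:
                    return cand
        return None
```

For intuition, a point at Hamming distance r from the query collides with it in one table with probability roughly (1 - r/d)^k, which is exactly the data-oblivious, uniform behavior that dataset-adaptive hashing tries to improve on.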
We design an NNS algorithm for the Hamming space with worst-case guarantees essentially matching those of theoretical algorithms, while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance on the worst-performing query. Key to this algorithm is formulating robust NNS as a two-player game between a hash player designing optimized hash functions and a query player designing queries adversarial to those hashes. We show experimentally that our algorithm performs much better than uniform hashing, and we present a mixture model in which it has much better worst-case guarantees.
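To make the game formulation concrete, below is a schematic sketch of one standard way such a minimax problem can be approximated: multiplicative weights for the hash player against a best-responding query player. The finite candidate pools, the `performance(h, q)` payoff oracle (assumed to return a value in [0, 1], e.g. the chance hash h routes query q to its near neighbor's bucket), and all names here are assumptions for exposition, not the authors' actual algorithm:

```python
import math

def robust_hash_mixture(hashes, queries, performance, rounds=200, eta=0.1):
    """Schematic two-player game (an illustrative sketch, not the talk's
    method): the hash player runs multiplicative weights over a finite
    pool of candidate hash functions; the query player best-responds
    each round with the worst query for the current mixture."""
    weights = [1.0] * len(hashes)
    avg = [0.0] * len(hashes)
    for _ in range(rounds):
        total = sum(weights)
        mix = [w / total for w in weights]
        avg = [a + m / rounds for a, m in zip(avg, mix)]
        # Query player: find the query on which the current mixture does worst.
        worst_q = min(
            queries,
            key=lambda q: sum(m * performance(h, q) for m, h in zip(mix, hashes)),
        )
        # Hash player: upweight hash functions that handle that query well.
        weights = [w * math.exp(eta * performance(h, worst_q))
                   for w, h in zip(weights, hashes)]
    return avg  # time-averaged mixture, approximating a minimax-optimal one
```

By standard no-regret arguments, the time-averaged mixture approaches a distribution over hash functions that maximizes performance on the minimum-performing query, which is the robustness objective described above.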
Joint work with Alex Andoni at Columbia.