Distribution Testing in the Presence of Arbitrarily Dominant Noise with Verification Queries

Chris Ye (UCSD)
Monday, December 1st, 2025, 2-3pm
Abstract:
We study distribution testing without direct access to a source of relevant data, but rather to one where only a tiny fraction is relevant.
To enable this, we introduce the following verification query model.
The goal is to perform a statistical task on distribution p given sample access to a mixture r = λp + (1 - λ)q and the ability to query whether a sample was generated by p or by q. In general, if m samples from p suffice for a task, then O(m/λ) samples and queries always suffice in our model. Are there tasks for which the number of queries can be significantly reduced?
In this talk, I will present a recent work where we study canonical problems in distribution testing (uniformity, identity, and closeness testing) in this model, and obtain matching upper and lower bounds that reveal smooth trade-offs between sample and query complexity, including query-optimal algorithms.
Joint with Hadley Black (UCSD): https://arxiv.org/abs/2509.17269