On the Communication Complexity of Approximate Pattern Matching
Jakob Nogler (ETH Zurich)
Monday, October 7th 2024, 2-3pm
Abstract:
In the decades-old problem of Pattern Matching with Edits, we are given a length-n string T (the text), a length-m string P (the pattern), and a positive integer k (the threshold), and we are asked to list all substrings of T that are at edit distance at most k from P. We define the one-way communication complexity of this problem as the minimum amount of space needed to encode the answer so that it can be retrieved without accessing the input strings P and T.
The closely related Pattern Matching with Mismatches problem (defined in terms of the Hamming distance instead of the edit distance) is already well understood from the communication complexity perspective: Clifford, Kociumaka, and Porat [SODA 2019] proved that O(n/m * k * log(m/k)) bits are sufficient, and provided a matching lower bound.
We extend these insights to the more complex edit distance setting. We establish an upper bound of O(n/m * k * log^2(m)) bits, achieving optimal communication complexity up to logarithmic factors.
We also provide a useful application of this result: we show how to get near-optimal time quantum algorithms for Pattern Matching with mismatches and edits.
An efficient application requires us to solve an additional problem: Given a system of b substring equations of the form S[x,x')=S[y,y'), we are asked to construct a generic solution string whose characters are equal only when dictated by the equations. While this is known to be possible in O(n+b) time [Gawrychowski, Kociumaka, Radoszewski, Rytter, Waleń; TCS'16], we show that O(b^2) time is sufficient to obtain an O(b)-size representation of S that supports random access and other standard queries in O(log n) time.
Assuming m=O(n), we apply this tool to efficiently construct an O(k)-size representation of strings P^# and T^# obtained from P and T by carefully masking out some characters so that the output to the studied approximate pattern matching problems does not change.
Based on joint work with Tomasz Kociumaka and Philip Wellnitz.