"Estimating the number of unseen species: How far can one foresee?"
Ananda Theertha Suresh
(UC San Diego)
Monday, January 25th, 2016, 2:00 pm
EBU3B, Room 4258
Population estimation is an important problem in many scientific endeavors, ranging from linguistics to databases and from ecology to genomics. Its most popular formulation, introduced by Fisher, uses n samples X^n to estimate U(X^n, Y^m), the number of hitherto unseen elements that will be observed among m new samples Y^m.
In seminal works, Good and Toulmin constructed an intriguing estimator that approximates U(X^n,Y^m) for all m \leq n, and Efron and Thisted showed empirically that a variation of this estimator approximates U(X^n,Y^m) even for some m > n; however, no theoretical guarantees have been known.
We show that the Efron-Thisted estimator and a class of linear estimators can accurately predict U(X^n,Y^m) for m up to O(n log n). We also show that no estimator can approximate U(X^n,Y^m) for m beyond O(n log n).
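For readers unfamiliar with these estimators, the following is a minimal sketch, not the speaker's implementation. It assumes the standard form of the Good-Toulmin estimator, -sum_i (-t)^i Phi_i, where t = m/n and Phi_i is the number of distinct elements seen exactly i times in X^n. The smoothed variant damps the i-th term by the tail probability P(L >= i) of a binomial random variable L, in the spirit of the Efron-Thisted estimator. The function names and the parameters k and q are hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter
from math import comb


def prevalences(samples):
    """Map i -> number of distinct elements that appear exactly i times."""
    counts = Counter(samples)
    return Counter(counts.values())


def good_toulmin(samples, t):
    """Good-Toulmin estimate of the number of new elements in m = t * n samples."""
    phi = prevalences(samples)
    return -sum((-t) ** i * phi_i for i, phi_i in phi.items())


def binomial_tail(k, q, i):
    """P(L >= i) for L ~ Binomial(k, q); equals 0 when i > k."""
    return sum(comb(k, j) * q ** j * (1 - q) ** (k - j) for j in range(i, k + 1))


def smoothed_good_toulmin(samples, t, k, q):
    """Good-Toulmin series with the i-th term weighted by P(L >= i).

    k and q are illustrative smoothing parameters (an assumption here);
    terms with i > k receive weight 0, which truncates the series.
    """
    phi = prevalences(samples)
    return -sum(
        (-t) ** i * binomial_tail(k, q, i) * phi_i for i, phi_i in phi.items()
    )


if __name__ == "__main__":
    observed = ["a", "b", "b", "c", "c", "c"]  # toy sample, n = 6
    print(good_toulmin(observed, t=0.5))
    print(smoothed_good_toulmin(observed, t=2, k=2, q=0.5))
```

For t > 1 the unsmoothed series has coefficients growing like t^i, which is why some form of damping or truncation is needed once m exceeds n.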
This is joint work with Alon Orlitsky and Yihong Wu.