By Josh Baxt
With joint appointments in UC San Diego's Computer Science and Engineering and Mathematics departments, Associate Professor Daniel Kane straddles the line between the two disciplines. But for Kane, computer science offers some unique opportunities.
“The main difference I see is that computer science is a much younger field, and there's a lot more low-hanging fruit,” said Kane, who joined UC San Diego in 2014. “In math, many of the good problems have been picked over for a few hundred years. Many computer science questions involve models of computation that people haven’t considered until recently.”
Whether he's focused on a computer science issue or a more classical math problem, Kane is fascinated by how he can apply logic to unravel challenging issues. He draws on a variety of mathematical tools, some centuries old, to advance these efforts.
“There’s this vast web of interconnecting ideas,” said Kane. “And when you're working on a problem, you have to figure out what could be useful. How can I come up with some clever new way to get at this problem that allows me to make progress?”
Diving into Robust Statistics
Over the past few years, much of Kane’s work has been focused on computational statistics, which combines statistical and computational methods to solve problems. Kane describes it as statistics from a computer-sciencey perspective.
Specifically, he's been focusing on robust statistics. For the most part, accurate predictions rely on accurate data: too much noise and too little signal can skew results. But Kane wants to make good predictions even when some of the data points are actually incorrect. Robust statistics is designed to handle a certain amount of misleading data and still produce good results.
“Say we want to come up with a robust estimator,” said Kane. “I want it to be the case that even if 1% of my data doesn't satisfy certain assumptions, or is even totally wrong, that algorithm will still produce accurate results.”
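A toy example illustrates the idea, though it is far simpler than the high-dimensional algorithms Kane works on. The sample mean is not robust: a single wildly wrong data point can drag it arbitrarily far from the truth. The median, by contrast, barely moves even when 1% of the data is corrupted:

```python
import statistics

# Clean sample: 100 points clustered near 10
data = [10.0 + 0.01 * i for i in range(100)]

# Corrupt 1% of the data (one point) with a wildly wrong value
corrupted = data.copy()
corrupted[0] = 1e6

# The mean is dragged thousands of units away by a single outlier...
mean_shift = abs(statistics.mean(corrupted) - statistics.mean(data))

# ...while the median, a robust estimator, shifts by about 0.01
median_shift = abs(statistics.median(corrupted) - statistics.median(data))

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

In one dimension, robust estimators like the median have been known for a long time; the hard problem Kane describes is doing this accurately and quickly when the data has many dimensions.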
Previous algorithms had trouble scaling. While they worked reasonably well on small problems, errors grew as the problems did, ultimately producing unusable results.
“Once the problem size grew beyond 20 dimensions, there was very little hope of solving it accurately and in a reasonable amount of time,” said Kane.
Until recently, researchers had to choose between accurate algorithms that were too slow to be practical and fast algorithms that were more vulnerable to corruptions in small fractions of the data, leading to large errors in the final answers. In 2016, a major breakthrough produced the first algorithms that were both fast and robust.
“A reasonable chunk of my work since then has been taking these basic algorithms and trying to improve them,” said Kane. “I’m trying to generalize them, to make them run with the minimum amount of data in the fastest time, trying to take these algorithms that originally worked for just a small number of problems and trying to understand exactly which problems we can solve robustly.”
Kane sees applications in a variety of fields, including the life sciences, but there’s a catch: mathematicians and biomedical researchers don’t necessarily read the same journals or even speak the same scientific language. He’s working on ways to translate the work so that others can take better advantage of his findings.
“I'm running a project with a group of undergrads through the Early Research Scholars Program (ERSP) to implement some of these computational statistics algorithms,” said Kane, “to demonstrate they work in practice and produce some benchmarks.”