CSE’s Arun Kumar is Helping to Solve ML’s Big Data Problem

Jun 21, 2021
CSE and Halicioğlu Data Science Institute Assistant Professor Arun Kumar is working to make it faster, easier and more economical to handle large datasets.

by Josh Baxt


Big datasets can be incredible assets in business, healthcare, the physical and social sciences and many other disciplines – but the data won’t reveal itself. To isolate useful information and harness its predictive capabilities, researchers and organizations rely on sophisticated data analysis techniques, such as machine learning.

But machine learning and related disciplines face their own headwinds. For example, the model building process can be slow and both labor- and resource-intensive. UC San Diego Computer Science and Engineering and Halicioğlu Data Science Institute Assistant Professor Arun Kumar is working to make it faster, easier and more economical to handle these large datasets.

“I bridge the gap, from an academic standpoint, between computing systems and machine learning,” said Kumar. “I focus on reducing the resource costs of those building processes – what we call model selection – and improving resource efficiency: reducing costs, run times and energy consumption during the model building process.”

Kumar is largely focused on deployment issues, including scalability and usability. Two of his primary projects are Cerebro, which aims to make building and selecting artificial neural networks more resource-efficient, and Sorting Hat, which focuses on reducing data preparation times.

Kumar and colleagues are borrowing approaches from the database community, which has been studying these issues for some time, and applying them to machine learning to find the most efficient ways to analyze data.

These skills are in great demand in both industry and academia. Kumar is currently working with health and social scientists and discussing future collaborations with computational physicists and neuroscientists. It seems everybody has data they need to crunch.

“The domain science folks have these large-scale data analytics problems, but they can’t build the software themselves,” said Kumar. “Off-the-shelf software is not up to par, so we build the tools they need.”

Measuring Movement

One recent paper, published in the Journal for the Measurement of Physical Behaviour, highlights how Kumar’s work intersects with health sciences. Kumar, Loki Natarajan, a UC San Diego professor of Family Medicine and Public Health, and colleagues tested different deep learning algorithms to determine which ones are better at measuring physical activity in patients.

“These were cohorts of cancer survivors and obese people who wore accelerometers to measure their movements,” said Kumar. “The deep learning models we built could more accurately log their movements, as well as analyze their exercise patterns and predict longitudinal health outcomes.”

Using Cerebro, the team compared a type of artificial neural network called a convolutional neural network (CNN) with two other machine learning algorithms, random forest and logistic regression. In the study, 28 women wore two different motion tracking devices, and the team compared the predictions from the CNN, random forest and logistic regression models.

The CNN did a much better job classifying whether the participants were sitting, standing or walking. These findings give health scientists better tools to measure activity out in the wild.


In addition to his academic partners, Kumar has also received great support from industry. While he’s not taking projects from these companies, their interests can align with initiatives he has already launched. The lab has received support from VMware for Cerebro and from Google and Amazon for Sorting Hat. Cerebro is also funded by Kumar’s National Science Foundation CAREER grant.

“The Cerebro project is fundamentally about reducing resource costs and the energy footprint,” said Kumar. “VMware offers cloud solutions for their enterprise customers, so they were interested in reducing run times and resource costs.”

On top of all that, Kumar was recently honored with the 2021 IEEE TCDE Rising Star Award, which is given to junior researchers “for designing and deploying data analytics systems powered by innovative machine learning and artificial intelligence algorithms.”

“It was a great honor,” said Kumar. “A number of people very graciously supported me. I hope I can continue to reward their faith in my work.”