A new National Science Foundation initiative has created a $10 million institute led by computer and data scientists at the University of California San Diego that aims to transform the core fundamentals of the rapidly emerging field of data science.
Called the Institute for Emerging CORE Methods in Data Science (EnCORE), the institute will be housed in the Department of Computer Science and Engineering (CSE), in collaboration with the Halıcıoğlu Data Science Institute (HDSI), and will tackle a set of important problems in the theoretical foundations of data science.
UC San Diego team members will work with researchers from three partnering institutions (the University of Pennsylvania, the University of Texas at Austin and the University of California, Los Angeles) to transform four core aspects of data science: complexity of data, optimization, responsible computing, and education and engagement.
EnCORE will join three other NSF-funded institutes in the country dedicated to the exploration of data science through the NSF’s Transdisciplinary Research in Principles of Data Science Phase II (TRIPODS) program.
“The NSF TRIPODS Institutes will bring advances in data science theory that improve health care, manufacturing, and many other applications and industries that use data for decision-making,” said NSF Division Director for Electrical, Communications and Cyber Systems Shekhar Bhansali.
UC San Diego Chancellor Pradeep K. Khosla said UC San Diego’s highly collaborative, multidisciplinary community is the perfect environment to launch and develop EnCORE. “We have a long history of successful cross-disciplinary collaboration on and off campus, with renowned research institutions across the nation. UC San Diego is also home to the San Diego Supercomputer Center, the HDSI, and leading researchers in artificial intelligence and machine learning,” Khosla said. “We have the capacity to house and analyze a wide variety of massive and complex data sets by some of the most brilliant minds of our time, and then share that knowledge with the world.”
Barna Saha, the EnCORE project lead and an associate professor in UC San Diego’s Department of Computer Science and Engineering and HDSI, said: “We envision EnCORE will become a hub of theoretical research in computing and Data Science in Southern California. This kind of national institute was lacking in this region, which has a lot of talent. This will fill a much-needed gap.”
The other UC San Diego faculty members in the institute include professors Kamalika Chaudhuri and Sanjoy Dasgupta from CSE; Arya Mazumdar, Gal Mishne, and Yusu Wang from HDSI; and Fan Chung Graham from CSE and the Department of Mathematics. Saura Naderi of HDSI will spearhead the outreach activities of the institute.
“Professor Barna Saha has assembled a team of exceptional scholars across UC San Diego and across the nation to explore the underpinnings of data science. This kind of institute, focused on groundbreaking research, innovative education and effective outreach, will be a model of interdisciplinary initiatives for years to come,” said Department of Computer Science and Engineering Chair Sorin Lerner.
CORE Pillars of Data Science
The EnCORE Institute seeks to investigate and transform three research aspects of data science:
- C, for Complexities of Data: the data researchers deal with is complex, massive and noisy. They will investigate what new tools and approaches are needed to address data complexity, including an overhaul of the concepts of algorithms, statistics and machine learning.
- O, for Optimization: a long-established field, optimization now needs to be data-driven, which brings new challenges. Modern data and technology have created a large gulf between the theory and practice of optimization. Adaptive methods and human intervention can lead to major advances in machine learning.
- R, for Responsible Learning: researchers' ethical responsibilities when handling massive data and sensitive information, and when using that data to make decisions, need to be reoriented for an uncertain world.
“EnCORE represents exactly the kind of talent convergence that is necessary to address the emerging societal need for responsible use of data. As a campus hub for data science, HDSI is proud of a compelling talent pool to work together in advancing the field,” said HDSI founding director Rajesh K. Gupta.
Team members expressed excitement about the opportunity for interdisciplinary research that the institute will provide. They will work together to improve privacy-preserving machine learning and robust learning, and to integrate geometric and topological ideas with algorithms and machine learning methodologies to tame the complexity of modern data. They envision a new era in optimization, in which strong statistical and computational components add new challenges.
“One of the exciting research thrusts at EnCORE is data science for accelerating scientific discoveries in domain sciences,” said Gal Mishne, an assistant professor at HDSI. As part of EnCORE, the team will develop fast, robust, low-distortion visualization tools for real-world data in collaboration with domain experts. In addition, the team will develop geometric data analysis tools for neuroscience, a field that is undergoing an explosion of data at multiple scales.
From K-12 and Beyond
A distinctive aspect of EnCORE will be its “E” component: education and engagement.
The institute will engage students at all levels, from K-12 students to postdoctoral researchers and junior faculty, and will conduct extensive outreach activities at all four of its sites.
The institute's geographic span across three regions of the United States will be an asset as it executes its outreach plan, which includes regular workshops, events, and the hiring of students and postdoctoral researchers. Online and joint courses between the partner institutions will also be offered.
Activities to reach high school, middle school and elementary students in Southern California are also part of the institute’s plan, with the first engagement planned for this summer with the Sweetwater Union High School District to teach students the foundations of data science.
There will also be mentorship and training opportunities with researchers affiliated with EnCORE, helping to create a pipeline of data scientists and broadening the reach and impact of the field. Additionally, collaboration with industry is being planned.
Mazumdar, an associate professor in the HDSI and an affiliated faculty member in CSE, said the team has already put much thought and effort into developing data science curricula across all levels. “We aim to create a generation of experts while being mindful of the needs of society and recognizing the demands of industry,” he said.
“We have made connections with numerous industry partners, including prominent data science techs and also with local Southern California industries including start-ups, who will be actively engaged with the institute and keep us informed about their needs,” Mazumdar added.
An Interdisciplinary, Diverse Field and Team
Data science has footprints in computer science, mathematics, statistics and engineering. In that spirit, the researchers from the four participating institutions who make up the core team bring varied backgrounds spanning these four disciplines.
“Data science is a new, and a very interdisciplinary area. To make significant progress in Data Science you need expertise from these diverse disciplines. And it’s very hard to find experts in all these areas under one department,” said Saha. “To make progress in Data Science, you need collaborations from across the disciplines and a range of expertise. I think this institute will provide this opportunity.”
The institute will also further diversity in science, as EnCORE is being spearheaded by women who are leaders in their fields.