On September 1, the Design Lab at UC San Diego will launch a new project to help teach incoming graduate students how to program in the era of big data. The project is funded by the National Science Foundation (NSF) Innovations in Graduate Education (IGE) program, and the Design Lab project is one of 10 new IGE grants awarded a total of $4.8 million to “pilot, test and validate innovative and potentially transformative ways to teach science, technology, engineering and mathematics (STEM).”
The UC San Diego team will receive approximately $500,000 over three years to develop a new data-science teaching approach via “Augmenting, Piloting and Scaling Computational Notebooks to Train New Graduate Researchers in Data-Centric Programming.”
The project's principal investigator and Design Lab co-founder is James Hollan, a distinguished professor of Cognitive Science with an adjunct appointment in Computer Science and Engineering (CSE). He leads a team including co-PIs Scott Klemmer, Philip Guo and Bradley Voytek. Design Lab co-founder Klemmer has a joint faculty appointment in CSE and Cognitive Science, while Guo and Voytek are professors of Cognitive Science (in Voytek’s case, with a joint appointment in Neuroscience).
“Virtually all graduate STEM training programs are currently confronting challenges to ensure their students have the computational skills required to function in increasingly data-intensive research domains,” observed Hollan in the proposal. “One singularly important challenge in the current era of big data is the growing need to train new graduate students in the programming and data analysis skills needed to be able to manage and exploit large-scale data in virtually every domain.”
The Design Lab team proposed to take the popular concept of introductory “bootcamps” for new grad students and scale that approach while exploiting the growing movement toward computational notebooks. Specifically, the researchers proposed to augment the Jupyter Notebook, a widely used open-source web application, with other pedagogical tools to support training in data-centric programming in a wide range of STEM disciplines.
“This has the potential to improve the efficacy of training graduate students in data-centric programming,” said Klemmer. “But the impact could be much greater in the long run because all of the new capabilities can be harnessed for teaching in other domains, and the open-source nature of the notebooks and tools will ensure that the technology will be widely available via the Web.”
In producing an open-source version of the Jupyter Notebook for teaching data-centric programming, the UC San Diego researchers plan to integrate other tools that have been widely used, particularly for massive open online courses (MOOCs). Co-PI Klemmer helped develop Talkabout and PeerStudio. “Both systems have been used by tens of thousands of students in dozens of MOOCs on the Coursera online education platform over the past four years,” said Klemmer. Indeed, students and other learners taking Klemmer’s widely watched “Interactive Design” courses on Coursera already have access to the software tools to provide feedback (PeerStudio) and to enable discussion among widely distributed course participants (Talkabout).
Co-PI Philip Guo developed Python Tutor for tutoring support. It has been available for seven years, and in that time, over 3.5 million people in over 180 countries have used Python Tutor to visualize over 30 million pieces of code, either directly online or via Python Tutor’s integration into MOOCs from edX, Coursera and Udacity.
PI Hollan developed analysis tools – notably Traces and ChronoViz – to support the education of graduate students at scale. Both are software tools widely used in analyzing video of real-world activity.
“Our team,” noted Hollan, “has deep experience in implementing, deploying and maintaining these tools over extended periods of time.”
Co-PI Bradley Voytek, a professor of Computational Cognitive Science and Neuroscience, has been teaching Introduction to Data Science (COGS 9) since 2014, when he joined the UC San Diego faculty.
He notes that COGS 9 class size has ballooned since then, from 24 students that first quarter to 280 students in the latest quarter. In spring 2017, Voytek launched his first course for upper-division students, Data Science in Practice (COGS 108), with approximately 420 students.
Voytek’s classes have used the Jupyter Notebook, and he will employ successive iterations of the augmented notebook to assess its utility in university classrooms, in comparison with its use in MOOCs and other distributed learning environments. The resulting system will be available online through a GitHub repository. “This will enable it to be widely shared, evolved and tailored to specific discipline requirements,” said Hollan.
According to NSF, all 10 new projects evaluate approaches that could be scaled for use at other institutions nationally. Among the new approaches on the drawing board: career peer-mentoring, gender-based case studies, faculty and student learning communities, revamped gateway courses, community and family engagement, and digital platforms for real-time feedback.
In addition to UC San Diego, universities receiving IGE grants in the latest round included University of Arizona, University of Chicago, University of Arkansas, Montana State, Ohio State, College of William and Mary, Stony Brook University, SUNY Buffalo, Cal Poly and Georgia Tech.