Interactive systems for code and data demography

(CSE Colloquium Lecture Series)

Elena Glassman
Current Affiliation: UC Berkeley EECS
Monday April 9, 2018 @ 11am
Room 1242, CSE Building
Faculty host: Nadia Polikarpova

Abstract: Programming—the means by which we tell computers what to do—has changed a lot over time. Programming today means programming alongside hundreds of fellow students, thousands of fellow professional software engineers at a particular company, or millions of fellow developers in the open-source community sharing their code online. In this talk, I will describe several interactive systems I have built that exploit the structure within large volumes of peer-produced code to help communities of programmers learn about, reflect on, and teach how to write more correct, readable code.

These systems are made possible by code demography, which I define as statistics, algorithms, and visualizations that help people comprehend and interact with population-level structure and trends in large code corpora. The key to my approach is designing or inferring abstractions that capture critical features and abstract away variation that is irrelevant to the user. Code demography can reveal strategically diverse sets of aligned code examples which, according to theories of human concept learning, help people learn, i.e., construct mental abstractions that generalize well.

I will focus this talk on systems that use program analysis, program synthesis, and visualization to power active data-driven teaching in large programming classrooms and passive knowledge sharing within developer communities. As a result of integrating my systems into the course infrastructure of UC Berkeley’s largest introductory programming class (>1500 students), the number of teaching staff required to give composition feedback to the entire class has dropped from 35-40 to only 4-5. I will conclude with my vision for how the techniques of code demography can be generalized to more types of large complex data corpora and enable new data-driven programming paradigms.

Bio: Elena Glassman is an EECS postdoctoral researcher at UC Berkeley, in the Berkeley Institute of Design, funded by both the NSF ExCAPE Expeditions in Computer Augmented Program Engineering grant and the Moore/Sloan Data Science Fellowship from the UC Berkeley Institute for Data Science (BIDS). In August 2016, she completed her PhD thesis in EECS at MIT within the CSAIL Usable Programming Group, advised by Rob Miller. For her thesis, she created scalable systems that analyze, visualize, and provide insight into the code of thousands of programming students. She has been a summer research intern at both Google and Microsoft Research, working on systems that help people teach and learn. She recently joined the program committees of ACM CHI, ACM Learning at Scale, and two SPLASH workshops on programming usability. She was awarded the 2003 Intel Foundation Young Scientist Award, both the NSF and NDSEG graduate fellowships, the MIT EECS Oral Master’s Thesis Presentation Award, a Best of CHI Honorable Mention, and the MIT Amar Bose Teaching Fellowship for innovation in teaching methods.

Prior to entering the field of human-computer interaction (HCI), she earned her MEng in the MIT CSAIL Robot Locomotion Group and was a visiting researcher at Stanford in the Stanford Biomimetics and Dextrous Manipulation Lab.