Representation Learning: What Is It and How Do You Teach It?

(CSE Colloquium Lecture Series)

Speaker: Clayton Greenberg, Ph.D. Candidate, Saarland University
Date: Monday, May 8, 2017
Time: 11am
Location: Room 1202, CSE Building
Host: CSE Prof. Mohan Paturi (paturi@eng.ucsd.edu)

Abstract: In this age of Deep Learning, Big Data, and ubiquitous graphics processors, the knowledge frontier is often controlled not by computing power, but by how usefully scientists choose to represent their data. Bigger and faster computation creates opportunities to answer research questions that previously seemed unanswerable, but its results can be rendered meaningless if the structure of the data is not sufficiently understood. Natural language is an extreme case of complex-structured data: even one thousand mathematical dimensions cannot capture all of the kinds of information a word encodes in its context. In the first part of this talk, I will present my completed and ongoing work on how computers can learn useful representations of linguistic units, especially when units at different levels, such as a word and the underlying event it describes, must work together within a speech recognizer, translator, or search engine. Then, I will share the educational objectives for students of data science that my research has inspired, and describe how, through interactive and innovative teaching, I have trained and will continue to train students to succeed in their scientific pursuits.
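
As a toy illustration of what "learning representations of linguistic units" can mean in practice, the sketch below uses a classic LSA-style method on a four-sentence corpus invented here for illustration (it is not the speaker's own method or data): it builds word vectors from co-occurrence counts via truncated SVD, so that words appearing in similar contexts end up with similar vectors.

# Minimal sketch (toy corpus, LSA-style method; illustrative only):
# learn word vectors from co-occurrence counts via truncated SVD,
# then compare words by cosine similarity.
import numpy as np
from itertools import combinations

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the mouse ate the cheese",
    "the dog ate the bone",
]

# Vocabulary and a symmetric word-by-word co-occurrence matrix,
# counting every pair of words that share a sentence.
sentences = [s.split() for s in corpus]
vocab = sorted({w for sent in sentences for w in sent})
index = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for sent in sentences:
    for w1, w2 in combinations(sent, 2):
        counts[index[w1], index[w2]] += 1
        counts[index[w2], index[w1]] += 1

# Truncated SVD: keep the top-k singular directions as word embeddings.
k = 2
U, S, _ = np.linalg.svd(counts)
embeddings = U[:, :k] * S[:k]

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# "cat" and "dog" appear in similar contexts, so their vectors should
# be closer to each other than, say, "cat" and "cheese" are.
print(cosine(embeddings[index["cat"]], embeddings[index["dog"]]))
print(cosine(embeddings[index["cat"]], embeddings[index["cheese"]]))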

Bio: Clayton Greenberg is a Ph.D. Candidate at the Saarland University Graduate School of Computer Science, where he is advised by Dietrich Klakow. As an Adjunct Lecturer (Lehrbeauftragter) in the Computer Science and the Language Science and Technology departments, he teaches courses on Methods of Mathematical Analysis, Probability Theory, Syntactic Theory, and Computational Linguistics. As a Research Staff Member of the Collaborative Research Center on Information Density and Linguistic Encoding, he analyzes cross-level interactions between vector-space representations of linguistic units. His general research interests include data-driven methods for natural language processing, representation learning, information theory, and statistical analysis of experimental data. He received his M.Sc. in Language Science and Technology from Saarland University and his A.B. in Linguistics and Computation from Princeton University.

Recent Research Publications
Improving Unsupervised Vector-Space Thematic Fit Evaluation via Role-Filler Prototype Clustering
Sub-Word Similarity-based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modeling