NSF Center for Science of Information
Date: Friday, February 17
Location: Room 1202, CSE Building
Long-Read Assembly via Overlap Graphs with Maximal Repeat Resolution
(CSE Colloquium Lecture Series)
Abstract: Long-read sequencing technologies have the potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce mis-assemblies, while conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that seeks to achieve maximal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. This is accomplished by adding "hinges" to reads in order to construct an overlap graph in which only unresolvable repeats are merged. As a result, HINGE combines the error resilience of overlap-based assemblers with the repeat-resolution capabilities of de Bruijn graph assemblers. Extensive validation on bacterial datasets demonstrates the advantages of HINGE's new approach and allows us to identify many datasets where unresolvable repeats prevent the reliable construction of a unique finished assembly. In these cases, HINGE outputs a visually interpretable assembly graph that encodes all possible finished assemblies consistent with the reads, while other approaches either fragment the assembly or resolve the ambiguity arbitrarily. (This is joint work with G. Kamath, F. Xia, T. Courtade, and D. Tse.)
Bio: Ilan Shomorony is a postdoctoral scholar at UC Berkeley through the NSF Center for Science of Information (CSoI), working with Thomas Courtade and David Tse. He obtained his PhD in Electrical and Computer Engineering from Cornell University in August 2014, and a B.S. in mathematics from Worcester Polytechnic Institute in 2009. He received the Qualcomm Innovation Fellowship in 2013 and a Simons postdoctoral fellowship in 2014.