On Monday, August 21 at the 2017 IEEE/ACM Hot Chips Symposium on High Performance Chips (HOTCHIPS), researchers from the University of California San Diego, Cornell University, University of Michigan and UCLA jointly unveiled Celerity, the first open-source, RISC-V tiered accelerator fabric system on chip with a neural network accelerator and 511 RISC-V processor cores.
UC San Diego Computer Science and Engineering (CSE) Ph.D. student Scott Davidson, together with Ph.D. students Khalid Al-Hawaj (from Cornell) and Austin Rovinski (U. Michigan), gave a 30-minute talk at the HOTCHIPS conference in Cupertino, CA. Their talk was one of only three academic talks out of a total of 29 talks. HOTCHIPS is the premier conference where industry releases details of its latest chips, and the students shared the stage with developers of top chips being released by Intel, Nvidia, Google, AMD, and Qualcomm.
The Celerity SoC is a 5x5-mm, 360-million-transistor chip in TSMC's advanced 16-nm technology. It includes 5 Linux-capable RISC-V (pronounced risk-five) cores and a NoC-connected manycore of 496 RISC-V cores, plus a binarized neural network accelerator running at 625 MHz.
"The emerging RISC-V open-source software and hardware ecosystem provided a baseline to reduce design, implementation and verification effort," said CSE Professor Michael Taylor (who is joining the University of Washington this fall but will remain an adjunct faculty at UC San Diego). "Then we turned up the awesome factor to create the most powerful RISC-V system in history with 511 cores, a neural network accelerator, and on-chip synthesizable clock and voltage regulation. This is also arguably the most complex chip ever created in academia."
Most of the work of designing the Celerity SoC was done by a team of first- and second-year graduate students, including a large team of 12 from the labs of CSE faculty Taylor and Rajesh Gupta, and Electrical and Computer Engineering (ECE) professors Patrick Mercier and Ian Galton. The team from UC San Diego included Ph.D. student Scott Davidson, Master’s students Anuj Rao and Paul Gao, visiting researcher Shaolin Xie, staff member Luis Vega, postdoctoral researcher Chun Zhao, former visiting graduate student Ningxiao Sun, and remote collaborator Bandhav Veluri from India's IIT Roorkee (all in Taylor’s Bespoke Systems Group); Ph.D. student Atieh Lotfi (from Gupta’s lab); ECE grad student Julian Puscar (M.S. ’17) from Galton’s lab; and ECE Ph.D. students Xiaoyang Wang and Loai Salem from Mercier’s lab. The UC San Diego students worked closely with Ph.D. students advised by Prof. Ronald Dreslinski at U. Michigan and Profs. Chris Batten and Zhiru Zhang at Cornell.
"The RISC-V ecosystem played a critical role in enabling a relatively modest team of junior graduate students to fabricate a complex SoC in just nine months," said CSE Ph.D. student Scott Davidson. "While ultimately a success, we still faced non-trivial challenges that we hope the broader RISC-V community can address in the future."
The student presenters said the Celerity SoC achieves a speedup of 700 to 1,220 times by using its specialty and many-core tiers in combination.
The Celerity project grew out of the CERTUS initiative funded in 2016 by the Defense Advanced Research Projects Agency (DARPA) Circuit Realization At Faster Timescales (CRAFT) program. CERTUS was awarded the first phase of a $5 million, five-year effort to reduce the time it takes to design an SoC by a factor of ten (i.e., to do it in 16 weeks rather than the approximately 160 weeks it currently takes to design a custom ASIC chip for the Department of Defense). The CERTUS project focuses on high-performance SoCs that integrate one or more IP blocks.
In keeping with that mission, the Celerity SoC was specifically designed to contain an array of processing cores based on RISC-V technology to speed up the design process. The team leveraged not only the RISC-V instruction set, but also its software stack, the Linux-capable Rocket processor and memory system generators from Berkeley, and the accompanying verification suite and system-level hardware infrastructure.
The team taped out the Celerity chip in April, barely nine months after starting work on the prototype, at an overall cost of approximately $1.3 million (small by comparison with advanced chips developed by industry). The SoC was designed primarily with autonomous vehicles in mind, where the neural network accelerator can be critical to processing real-time sensor data in order to make split-second decisions to avoid a collision or other safety challenge.
RISC-V is a new instruction-set architecture (ISA) created to support computer architecture research and education. Originally developed by computer scientists at UC Berkeley, RISC-V is fast becoming a standard open architecture for industry implementations, several of which were showcased at HOTCHIPS 2017. The ISA is governed by the RISC-V Foundation, a nonprofit corporation controlled by its members to drive adoption of the ISA.
"This is a team that worked like a charm," said Gupta in a Facebook post about the demo at HOTCHIPS. He went on to urge potential employers to "look for these students when they graduate. Each one is special."
The team expects first silicon of Celerity in September and will present its results in a paper at the first Workshop on Computer Architecture Research with RISC-V (CARRV 2017), set for October 14, 2017 (and co-located with IEEE MICRO this year in Boston, MA).