Ex-CSE Faculty, Alumni Unveil Computer Vision Success Story with UCSD Roots

Jun 4, 2015
Serge Belongie

Former CSE students and faculty are making headlines with a computer vision system that has its roots in a research project at UC San Diego. At the time, the Visipedia project was a joint venture between then-CSE Prof. Serge Belongie's lab and Pietro Perona's Computer Vision Lab at Caltech. When Belongie moved to Cornell Tech in New York City in January 2014, Visipedia became a Caltech-Cornell Tech collaboration. To complicate matters further, two CSE alumni ended up at Caltech working on the same project.

The newly launched Merlin Bird ID App is based on the Visipedia computer vision system developed by CSE alumni Steve Branson (Ph.D. '12) and Grant Van Horn (B.S. '12, M.S. '14), together with Belongie and Perona. Branson is now a postdoctoral researcher in Perona's group at Caltech, and Van Horn is a Ph.D. student in the same group.

On June 8, the Visipedia team will present its Merlin Bird Photo Identifier at the Computer Vision and Pattern Recognition (CVPR) conference in Boston. Van Horn is first author on the paper*, and he and Branson will both present the results at CVPR 2015.

The Merlin software can recognize 400 of the most commonly encountered bird species in North America, returning the right answer among its top three results about 90% of the time. "Computers can process images much more efficiently than humans -- they can organize, index, and match vast constellations of visual information such as the colors of the feathers and shapes of the bill," says Belongie. "The state-of-the-art in computer vision is rapidly approaching that of human perception, and with a little help from the user, we can close the remaining gap and deliver a surprisingly accurate solution."
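The "top three" figure corresponds to what is usually called top-3 accuracy: a result counts as correct if the true species appears anywhere among the first three suggestions. The Python sketch below shows how such a metric is typically computed. The function name `top_k_accuracy` and the four-photo example are hypothetical illustrations, not Merlin's evaluation code or data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of examples whose true label appears among the first k predictions.

    Hypothetical helper for illustration; not taken from the Merlin codebase.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked prediction list per true label")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth in preds[:k]  # a hit if the true species is anywhere in the top k
    )
    return hits / len(true_labels)


# Hypothetical toy data: four photos, each with a ranked list of guesses.
predictions = [
    ["Song Sparrow", "Lincoln's Sparrow", "Savannah Sparrow"],
    ["American Robin", "Varied Thrush", "Hermit Thrush"],
    ["Downy Woodpecker", "Hairy Woodpecker", "Ladder-backed Woodpecker"],
    ["House Finch", "Purple Finch", "Cassin's Finch"],
]
truth = ["Lincoln's Sparrow", "American Robin", "Hairy Woodpecker", "Pine Siskin"]

print(top_k_accuracy(predictions, truth, k=3))  # 0.75: three of four true species are in the top three
print(top_k_accuracy(predictions, truth, k=1))  # 0.25: only the robin is the first guess
```

Top-1 accuracy can never exceed top-3 accuracy, since a correct first guess is also in the top three; the toy data shows how large the gap between the two can be.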

The Merlin Bird ID App is already available for both iOS and Android. Because Merlin is a machine learning-based tool, its ability to recognize birds will improve as more people use it. Since the project began at UCSD and Caltech, a number of papers have shown that keeping humans 'in the loop' is critical. In Merlin's case, the user uploads a bird photo, draws a box around the bird, and clicks on its bill, eye, and tail. The user then indicates where and when the picture was taken. From there, the app draws on data derived from tens of thousands of bird photos (the 'training set') and narrows the search to species typically found at that location and time of year.
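One simple way to picture the final narrowing step is to weight an image classifier's score for each species by how likely that species is at the photo's location and date, then renormalize. The Python sketch below assumes exactly that kind of multiplicative combination; it is an illustration, not the Merlin implementation, and the function `rank_species`, the score values, and the location/date prior are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_species(
    visual_scores: Dict[str, float],
    location_prior: Dict[str, float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical sketch: combine image-based scores with a location/date prior.

    visual_scores: per-species scores from an image classifier (assumed positive).
    location_prior: how common each species is at the photo's place and season;
        species missing from the prior are treated as absent (weight 0).
    Returns the k best species with renormalized scores, highest first.
    """
    combined = {
        species: score * location_prior.get(species, 0.0)
        for species, score in visual_scores.items()
    }
    total = sum(combined.values())
    if total == 0.0:
        # No candidate is expected at this place and season:
        # fall back to the visual evidence alone rather than dividing by zero.
        combined = dict(visual_scores)
        total = sum(combined.values())
    posterior = {species: value / total for species, value in combined.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical example values, not real Merlin outputs.
visual_scores = {
    "Song Sparrow": 0.40,
    "Lincoln's Sparrow": 0.35,
    "Savannah Sparrow": 0.15,
    "Fox Sparrow": 0.10,
}
location_prior = {  # Fox Sparrow deliberately absent: not expected here this season
    "Song Sparrow": 0.5,
    "Lincoln's Sparrow": 0.1,
    "Savannah Sparrow": 0.3,
}

top3 = rank_species(visual_scores, location_prior)
for species, prob in top3:
    print(f"{species}: {prob:.3f}")
```

Species absent from the location prior, like the hypothetical Fox Sparrow here, receive zero weight, which mirrors the article's description of narrowing the search to birds typically found at that place and time of year. The visual ranking (Song, Lincoln's, Savannah) also reorders once the prior is applied.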

Birds are just one challenge. The Visipedia project aims to create a 'visual encyclopedia', an augmented version of Wikipedia in which images would be "first-class citizens alongside text" when searching the Internet or other databases.

*G. Van Horn, S. Branson, R. Farrell, S. Haber, J. Barry, P. Ipeirotis, P. Perona and S. Belongie, Building a Bird Recognition App and Large Scale Dataset with Citizen Scientists: The Fine Print in Fine-Grained Dataset Collection, Computer Vision and Pattern Recognition (CVPR), Boston, 2015.
