Database Lab Faculty and Students at ICDE 2017

May 3, 2017
CSE Ph.D. students Chunbin Lin (left) and Jianguo Wang with GQFast system overview

The 33rd IEEE International Conference on Data Engineering (ICDE) was held this year in San Diego, with UC San Diego organizing the event. The three-day meeting took place in late April at the Hilton San Diego Resort and Spa in Mission Bay. ICDE speakers explored research issues in designing, building, managing, and evaluating advanced data-intensive systems and applications.

Ph.D. students Chunbin Lin (left) and Jianguo Wang work in the Database Lab under their advisor, CSE Prof. Yannis Papakonstantinou.

A leading forum for researchers, practitioners, developers, and users, the conference allowed attendees to explore cutting-edge ideas and exchange techniques, tools, and experiences.

CSE Ph.D. students Chunbin Lin and Jianguo Wang and their advisor, CSE Prof. Yannis Papakonstantinou, presented a demonstration system for "Fast Graph Exploration with Context-Aware Autocompletion." Dubbed GQFast, it is an in-memory system for SQL analytics on graphs that efficiently answers relationship queries, a common type of online analytical processing (OLAP) query in graph databases. A tutorial and a demo of the tool are available online.

System overview of the GQFast tool.

According to Papakonstantinou and his students, GQFast "discovers relevant entities efficiently and uses small space." In addition, a context-aware query completion feature instantly lists suggested queries based on the current context, and a "type-ahead search" feature visualizes search results while the query is still being composed, so that users can interact with them.

The same two CSE students, Lin and Wang, also presented research with faculty from other universities on "Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics." The co-authors included Arizona State professor Yasin Silva, and professors from Renmin University of China (Xiaoyong Du and Wei Lu) and Tianjin Polytechnic (Chuitian Rong), also in China.

Poster presented at ICDE 2017

"Set similarity joins" are key operations in big data analytics, notably for data integration and data cleaning. In the introduction to a research poster presented at ICDE, the team of researchers noted that the process involves finding "similar pairs from two collections of [data] sets." The standard method for performing a similarity join uses MapReduce, but this and other techniques have limitations: they generate many duplicates, suffer from a skewness problem, and can require expensive verification processing.
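To make the operation concrete, here is a minimal single-machine sketch of a set similarity join. It is not the team's method: the choice of Jaccard similarity, the 0.5 threshold, the sample data, and the function names are all assumptions made for illustration. It compares every pair directly, which is the expensive verification step that distributed approaches try to reduce.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets (assumed metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_similarity_join(R, S, threshold):
    """Return (i, j, sim) for every pair (R[i], S[j]) with sim >= threshold.

    Every pair is verified, so the cost grows with len(R) * len(S).
    """
    results = []
    for (i, r), (j, s) in product(enumerate(R), enumerate(S)):
        sim = jaccard(r, s)
        if sim >= threshold:
            results.append((i, j, sim))
    return results


# Hypothetical sample collections, e.g. tokenized record titles.
R = [{"data", "cleaning", "tools"}, {"graph", "query"}]
S = [{"data", "cleaning", "methods"}, {"set", "join"}]

pairs = naive_similarity_join(R, S, 0.5)
print(pairs)
```

Only the first set of each collection shares enough tokens to qualify; all other pairs are compared and discarded, which illustrates why filtering candidates before verification matters at scale.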

To address these shortcomings, "we proposed a vertical-partitioning based algorithm, called FS-Join, to support parallel set similarity joins without generating duplicates," according to the team. "In addition, it guarantees load balancing in both map and reduce phases." Their research also introduced three new segment-based filtering methods to reduce the number of candidates, and proposed an optimization that integrates two types of data partitioning, horizontal and vertical, to achieve higher scalability.
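The general idea of vertical partitioning can be sketched as follows. This is a simplified, single-process illustration and not the FS-Join implementation: the pivot tokens, the Jaccard metric, the function names, and the sample data are assumptions, and the segment-based filters and load-balancing guarantees described by the team are omitted. Each set is split into disjoint segments by token range, each partition counts overlaps only for its own segments, and the partial counts are summed per pair, so a pair is reported at most once.

```python
from bisect import bisect_right
from collections import defaultdict


def split_vertically(tokens, pivots):
    """Split one set into disjoint segments using sorted pivot tokens."""
    segments = defaultdict(set)
    for tok in tokens:
        segments[bisect_right(pivots, tok)].add(tok)
    return segments


def partitioned_join(R, S, pivots, threshold):
    """Join R and S by summing per-partition overlaps (illustrative only)."""
    # "Map" step: route each segment to the partition owning its token range.
    partitions = defaultdict(lambda: ([], []))
    for i, r in enumerate(R):
        for p, seg in split_vertically(r, pivots).items():
            partitions[p][0].append((i, seg))
    for j, s in enumerate(S):
        for p, seg in split_vertically(s, pivots).items():
            partitions[p][1].append((j, seg))

    # "Reduce" step: each partition counts overlaps for its segments only.
    overlap = defaultdict(int)
    for r_segs, s_segs in partitions.values():
        for i, rseg in r_segs:
            for j, sseg in s_segs:
                common = len(rseg & sseg)
                if common:
                    overlap[(i, j)] += common

    # Final step: turn the summed overlap into one Jaccard score per pair.
    results = []
    for (i, j), common in sorted(overlap.items()):
        sim = common / (len(R[i]) + len(S[j]) - common)
        if sim >= threshold:
            results.append((i, j, sim))
    return results


# Hypothetical sample data and pivots (tokens are ordered alphabetically).
R = [{"data", "cleaning", "tools"}, {"graph", "query"}]
S = [{"data", "cleaning", "methods"}, {"set", "join"}]
pivots = ["e", "n"]

joined = partitioned_join(R, S, pivots, 0.5)
print(joined)
```

Because the segments of a set are disjoint, every shared token is counted in exactly one partition, which is what allows the partial overlaps to be added without double counting.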

CSE was also heavily involved in the organization of ICDE this year. Like Prof. Papakonstantinou, fellow CSE Prof. Arun Kumar is a member of CSE's Database Lab as well as the Center for Networked Systems (CNS). Kumar was the only UC San Diego researcher to chair an ICDE session, in his case one focused on Systems for New Analytics. Kumar kicked off the session by announcing the recipient of the Best Paper award. He was also invited to be a panelist at the Ph.D. Symposium. "I spoke about my grad school and job search experience, including my advisor changes, my coming-out process and how these interplayed with my research trajectory," recalled Kumar. "I also talked about how students can cope with mental health-related issues caused by grad school - issues such as stress, anxiety, depression and loneliness."

The main organizers of ICDE 2017 from UC San Diego were drawn from the CSE department's Database Lab and the San Diego Supercomputer Center (SDSC). SDSC research scientist Chaitan Baru was general co-chair of ICDE 2017, while CSE's Papakonstantinou co-chaired the Program Committee. Other CSE personnel involved in the organization of the conference included professor Alin Deutsch, co-chair of the Tutorials committee, and Ilkay Altintas, Chief Data Science Officer at SDSC and a lecturer in CSE, who co-chaired the Workshops committee.
_________________________________
Poster: "Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics," Chuitian Rong, Chunbin Lin, Yasin Silva, Jianguo Wang, Wei Lu and Xiaoyong Du, Proceedings of the International Conference on Data Engineering (ICDE), April 2017, San Diego, CA.
