August 21, 2017
Computer Scientists Develop Automated Tools to Uncover Advertising by Human Traffickers
The 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2017) wrapped up its five-day run on August 17 in Halifax, Nova Scotia, Canada. One of the highest-profile presentations was a paper by co-authors from UC Berkeley, UC San Diego and New York University outlining automated approaches to detecting human traffickers through analysis of their online classified sex advertisements.
At KDD 2017, first author and UC Berkeley Ph.D. student Rebecca S. Portnoff presented the paper, "Backpage and Bitcoin: Uncovering Human Trafficking," which is partly based on her Ph.D. dissertation. Her co-authors include UC San Diego computer-science Ph.D. candidate Danny Yuxing Huang, who is preparing to defend his doctoral dissertation on Bitcoin, "Using Crypto-Currencies to Track Cyber-Attacks, Speculative Investors and Human Traffickers." Other co-authors include NYU professor Damon McCoy (a former postdoctoral researcher in the CSE department at UC San Diego), his Ph.D. student Periwinkle Doerfler, and research scientist Sadia Afroz of the International Computer Science Institute.
The computer scientists argue that the sheer quantity of online classified sex advertising used by human traffickers "makes manual exploration and analysis unscalable," especially with thousands of new ads posted daily. It is also difficult to separate ads placed by independent sex workers from ads placed for victims of sex trafficking.
The paper notes that "almost no work has been done in building tools that can automatically process and classify these ads." So the team focused on developing and demonstrating automatic techniques for clustering sex ads by owner (on the assumption that individual ads for a single sex worker would be less likely to be placed by a trafficker, whose ads more often offer the services of multiple sex workers).
Over a four-week period, the researchers carried out a study using a single classified-ad website, Backpage, to demonstrate a proof of concept for automated approaches and how they can be used to find human traffickers. (After the research was done, Backpage discontinued its adult advertising section, though not the ads themselves, which now appear in multiple sections of the website.)
One technique was a machine-learning classifier that uses stylometry (the analysis of an individual's writing style to identify authorship) to distinguish between ads posted by the same author and ads posted by different authors - with 96 percent accuracy. They also designed a linking technique that uses publicly available information from the Bitcoin mempool and blockchain to match the timestamp at which payment for a sex ad was made to the timestamp at which the ad appeared on Backpage. If multiple ads link back to a single Bitcoin wallet, there is a strong likelihood that they share an owner - and possibly that human trafficking is involved.
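To make the stylometry idea concrete, here is a minimal sketch of how such an authorship-linking classifier can be structured - our illustration with hypothetical toy data, not the authors' actual feature set or model:

```python
# A minimal sketch of stylometric authorship linking (illustrative only):
# character n-gram features plus a simple model that scores whether two
# ads were written by the same person. Toy texts and labels are hypothetical.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

ads = ["new in town!! call now :)",
       "just arrived!! text me now :)",
       "Upscale companion. Serious inquiries only.",
       "Discreet, upscale, serious gentlemen only."]
authors = [0, 0, 1, 1]  # hypothetical ground-truth author IDs

X = TfidfVectorizer(analyzer="char", ngram_range=(2, 4)).fit_transform(ads).toarray()

# Turn ad pairs into training examples: |feature difference|,
# labeled 1 for same author and 0 for different authors.
pairs, same = [], []
for i in range(len(ads)):
    for j in range(i + 1, len(ads)):
        pairs.append(np.abs(X[i] - X[j]))
        same.append(int(authors[i] == authors[j]))

clf = LogisticRegression().fit(np.array(pairs), same)
# clf.predict_proba(...) now scores whether two unseen ads share an author.
```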
Using a sample of 10,000 real adult ads from Backpage over four weeks, the researchers reported an 89 percent "true-positive" rate for grouping ads by author using their automated author-identification techniques. The team also reported a high rate of success in linking ads they placed themselves to the corresponding transactions in the Bitcoin blockchain.
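The Bitcoin-linking step boils down to timestamp matching. A hedged sketch of the idea (the data layout and the 15-minute window below are illustrative assumptions, not details from the paper):

```python
# Illustrative sketch: if an ad appears shortly after a Bitcoin payment of
# the posting fee enters the mempool, the two events are candidate matches.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # assumed tolerance between payment and posting

def candidate_links(ad_times, tx_times):
    """Pair each ad with mempool transactions observed shortly before it."""
    links = []
    for ad_id, ad_t in ad_times.items():
        for tx_id, tx_t in tx_times.items():
            if timedelta(0) <= ad_t - tx_t <= WINDOW:
                links.append((ad_id, tx_id))
    return links

# Hypothetical usage: ads and transactions keyed by ID with observed timestamps.
ads = {"ad1": datetime(2017, 3, 1, 12, 10)}
txs = {"tx9": datetime(2017, 3, 1, 12, 2)}
print(candidate_links(ads, txs))  # -> [('ad1', 'tx9')]
```

Recurring candidate links from many ads back to the same wallet are what make the signal strong.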
Taken together, the automated techniques are believed to be the first to identify adult ads tied to human-trafficking rings by linking the ads to public information from Bitcoin. Said former UC San Diego postdoc Damon McCoy: "There are hundreds of thousands of these ads placed every year, and any technique that can surface commonalities between ads and potentially shed light on the owners is a big boost for those working to curb exploitation."
Ultimately, the study did not prove that the ads believed to be placed by human traffickers were actually tied to trafficking. Only law enforcement can pursue that linkage, but investigators now have new automated tools to point them in specific directions.
Cognex Gift Supports CSE Research at Intersection of Deep Learning and 3-D Image Reconstruction
The University of California San Diego has received a $100,000 gift from Cognex Corporation, a leader in machine vision. The gift will allow teams of professors and graduate students at the Jacobs School of Engineering to explore research at the intersection of deep learning and 3-D image reconstruction. Generally speaking, 3-D image reconstruction describes the process of capturing the shape and appearance of real, three-dimensional objects. Applying deep learning principles to 3-D image reconstruction could lead to advances in robotics, medical imaging, autonomous vehicle navigation, telemedicine, and more.
The gift, which contributes to the Campaign for UC San Diego, will support research in the labs of professors Manmohan Chandraker and Ryan Kastner in the Computer Science and Engineering (CSE) department.
With this support, Cognex becomes a member of the Center for Visual Computing, an industry-focused research center directed by CSE professor Ravi Ramamoorthi. The center draws together world-class faculty, students and industry partners working in computer graphics, augmented and virtual reality, computational imaging and computer vision.
"The generous support from Cognex will be instrumental in helping us realize our goal of using deep learning to overcome some of the toughest challenges in 3-D image reconstruction," said CSE professor Manmohan Chandraker. "Our group is addressing these challenges through deep neural networks that incorporate domain knowledge, such as the physics of image formation and functional object parts, to recover 3-D geometric and semantic properties."
Chandraker also noted that Cognex's high-speed, high-precision 3-D scanning systems are well suited to acquiring the large-scale data needed to train such deep networks.
Over the past five years, Cognex has provided approximately $500,000 to support professor Kastner's computer science research group, a world leader in hardware acceleration and FPGA (field-programmable gate array) design. This support has helped Kastner's group develop hardware design methodologies for mapping high-throughput image processing applications to FPGAs.
"The funding provided by Cognex has been instrumental in our research efforts to develop smarter image sensors," said Kastner. "The collaboration allows us to develop novel hardware platforms around the latest Cognex image sensors. This provides our research group members invaluable real-world experience, and has created a pipeline of talent and research for Cognex." A number of Cognex Advanced Product Group employees, including Senior Engineering Group Manager Ali Irturk and Senior Engineer Janarbek Matai, are Ph.D. graduates of Kastner's research group.
"Working in the Kastner lab helped me build expertise and confidence as a researcher in embedded systems. In addition, my experiences in the Kastner lab helped me develop a wide range of professional skills that shape who I am today," said CSE alumnus Ali Irturk (Ph.D. '09). Irturk went on to join Cognex as a Principal Engineer, and he has played important roles in forming the Advanced Product Group at Cognex in San Diego and in successfully leading research projects within the group.
"In the Kastner lab, I developed into an expert in embedded systems," said CSE alumnus Janarbek Matai (Ph.D. '15), a leading expert in FPGA design with high-level synthesis (HLS). "My experiences prepared me well for cutting-edge academic research and for solving challenging engineering problems. This expertise helps me today to transform research ideas from concepts to working prototypes."
As a Ph.D. student, Matai focused on FPGA implementations of complex algorithms, designing them with high-level synthesis by exploiting computational patterns and templates. He participated in collaborative work between the Kastner group and Cognex that demonstrated the feasibility of designing FPGAs with high-level languages, and he was subsequently hired by Cognex to continue that work.
CSE Faculty Receive Another Test-of-Time Award for 16-Year-Old Cyber Security Paper
Denial-of-service (DoS) attacks have crippled even the likes of Google and Amazon in recent years, peaking at a reported 1.1 terabits per second in 2016. But they were a relatively unexplored phenomenon in 2000, when three computer scientists from UC San Diego set out to measure just how prevalent they were.
Their research and resulting academic paper won the Best Paper award when it was presented at the 10th USENIX Security Symposium in 2001. At the time, the study provided the only publicly available data quantifying DoS activity on the Internet. Now, 16 years later, that same paper - "Inferring Internet Denial-of-Service Activity" - has received the 2017 USENIX Security Test of Time Award.
The award was announced today at the opening session of USENIX Security 2017 in Vancouver, Canada. UC San Diego Computer Science and Engineering (CSE) professor Geoffrey M. Voelker accepted the award on behalf of his co-authors, fellow CSE professor Stefan Savage, and their former Ph.D. student David Moore (C.Phil. '05), who went on to track Internet activity as a project scientist at CAIDA, the Center for Applied Internet Data Analysis. Moore has worked at Google since 2012.
"Test of Time awards are important because they single out research that has a lasting impact despite the rapid change we've witnessed in the computing field," said Dean Tullsen, chair of the CSE department at UC San Diego. "Stefan Savage and Geoff Voelker have done continuously groundbreaking work in cybersecurity for almost two decades, and this award underscores the department's well-deserved reputation for innovation in areas including security as well as systems and networking."
The Test of Time award recognizes outstanding work in security research that has had a lasting impact on the community. To qualify, a paper must have been presented at a USENIX conference at least 10 years earlier.
Denial-of-service attacks disable servers linked to the Internet by overloading them with messages, which usually carry forged ("spoofed") source addresses to conceal the location of the attacker. The UC San Diego researchers used key features of those forged packets to detect and track the attacks. The study found that some attacks flooded their targets with "instantaneous loads" peaking at 600,000 packets per second - enough to cripple the victim's infrastructure.
"Quantifying the problem was always meant to be the first step toward stopping or at least curbing attacks of this kind," recalled UC San Diego's Savage, who co-directs the Center for Networked Systems (CNS) at UC San Diego. "Our 2001 study helped network engineers understand the nature of recent attacks and to study long-term trends and recurring patterns of attacks."
In the 2001 paper, the co-authors also developed a novel technique to cut through the clutter of Internet data. Called "backscatter analysis," the technique exploits the fact that victims of DoS attacks send responses to the forged source addresses; those unsolicited responses - the backscatter - can be observed at otherwise unused parts of the address space. By monitoring backscatter across a statistically significant portion of the IP address space, the technique can quantify the scope of DoS activity.
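The arithmetic behind that quantification is simple. A back-of-the-envelope sketch (our illustration; the paper's methodology also classifies attacks by duration and packet type):

```python
# Backscatter scaling sketch: if attackers forge source addresses uniformly
# at random over IPv4 space, a telescope watching n addresses sees a fraction
# n / 2**32 of a victim's responses, so the observed rate scales up to the
# victim's full load.
ADDRESS_SPACE = 2 ** 32

def estimate_attack_rate(observed_pps, monitored_addresses):
    """Scale backscatter packets/sec seen at a telescope to the victim's load."""
    return observed_pps * ADDRESS_SPACE / monitored_addresses

# A /8 network covers 1/256 of the address space, so ~2,344 observed
# packets/sec implies an attack of roughly 600,000 packets/sec.
print(estimate_attack_rate(2344, 2 ** 24))  # -> 600064.0
```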
For their study, Moore, Voelker and Savage looked at three week-long datasets in February 2001 to assess the number, duration and focus of attacks, and to characterize their behavior. Across those three weeks, they observed more than 12,000 attacks against more than 5,000 distinct targets, ranging from well-known e-commerce companies such as Amazon, to small foreign Internet Service Providers (ISPs) and - remember, this is 16 years ago - dial-up connections. "At the time," said Voelker, "our work was the only publicly available data quantifying denial-of-service activity in the Internet."
Scaled up, the technique yielded an estimate of worldwide malicious DoS activity: approximately 4,000 sites suffered DoS attacks in any given week during the study period.
The 2001 study also was among the first to use the quantitative data to characterize the victims of DoS attacks - a picture that seems almost quaint in retrospect. "Only five percent of attacks targeted infrastructure such as routers and name servers," explained Voelker. "There were a few very large attacks against broadband, and up to 20 percent of attacks were targeted at home machines - evidence that minor DoS attacks were used for personal vendettas."
The CSE department at UC San Diego is no stranger to winning Test of Time awards handed out by USENIX at a few of its major conferences. Already in 2017, CSE professor George Porter shared in the Test of Time award at the USENIX Symposium on Networked Systems Design and Implementation (NSDI) for "X-Trace: A Pervasive Network Tracing Framework", originally published at NSDI 2007. And in 2016, former CSE professor Amin Vahdat and his co-authors received the NSDI Test of Time award for a paper presented at NSDI 2006.
CSE Ph.D. and Faculty Presence at USENIX Security Symposium 2017
The 26th USENIX Security Symposium took place Aug. 16-18 in Vancouver, Canada, and security researchers in the CSE department were well represented on the conference program (in addition to the opening announcement of the USENIX Security Test of Time Award, which went to CSE's Geoffrey Voelker and Stefan Savage, and former CAIDA technical manager David Moore (now at Google); see separate story above). Ph.D. students Craig Disselkoen, David Kohlbrenner, Zhaomo Yang and Brian Johannesmeyer had papers on the program, together with CSE faculty including Leo Porter, Dean Tullsen, Hovav Shacham, Sorin Lerner and research scientist Kirill Levchenko.
The three CSE papers on the program, with their abstracts, are included below:
Prime+Abort: A Timer-Free High-Precision L3 Cache Attack Using Intel TSX, by Craig Disselkoen, David Kohlbrenner, Leo Porter, and Dean Tullsen
Last-Level Cache (LLC) attacks typically exploit timing side channels in hardware, and thus rely heavily on timers for their operation. Many proposed defenses against such side-channel attacks capitalize on this reliance. This paper presents PRIME+ABORT, a new cache attack which bypasses these defenses by not depending on timers for its function. Instead of a timing side channel, PRIME+ABORT leverages the Intel TSX hardware widely available in both server- and consumer-grade processors. This work shows that PRIME+ABORT is not only invulnerable to important classes of defenses, it also outperforms state-of-the-art LLC PRIME+PROBE attacks in both accuracy and efficiency, having a maximum detection speed (in events per second) 3× higher than LLC PRIME+PROBE on Intel's Skylake architecture while producing fewer false positives.
On the Effectiveness of Mitigations against Floating-Point Timing Channels, by David Kohlbrenner and Hovav Shacham
The duration of floating-point instructions is a known timing side channel that has been used to break Same-Origin Policy (SOP) privacy on Mozilla Firefox and the Fuzz differentially private database. Several defenses have been proposed to mitigate these attacks. We present detailed benchmarking of floating-point performance for various operations based on operand values. We identify families of values that induce slow and fast paths beyond the classes (normal, subnormal, etc.) considered in previous work, and note that different processors exhibit different timing behavior. We evaluate the efficacy of the defenses deployed (or not) in Web browsers to floating-point side channel attacks on SVG filters. We find that Google Chrome, Mozilla Firefox, and Apple's Safari have insufficiently addressed the floating-point side channel, and we present attacks for each that extract pixel data cross-origin on most platforms. We evaluate the vector-operation based defensive mechanism proposed at USENIX Security 2016 by Rane, Lin and Tiwari and find that it only reduces, it does not eliminate, the floating-point side channel signal. Together, these measurements and attacks cause us to conclude that floating point is simply too variable to use in a timing security-sensitive context.
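The operand-dependent timing at the heart of the paper can be glimpsed even from a high-level language. A rough sketch - ours, not the paper's benchmark; the magnitude of the effect varies by processor, and flush-to-zero floating-point modes can hide it entirely:

```python
# Illustrative micro-benchmark: on many x86 CPUs, arithmetic on subnormal
# doubles takes a slower hardware path than arithmetic on normal values,
# which is exactly the kind of operand-dependent timing the paper exploits.
import time
import numpy as np

def time_scale(value, n=10**6, reps=10):
    a = np.full(n, value)           # array of identical float64 operands
    t0 = time.perf_counter()
    for _ in range(reps):
        a = a * 0.5                 # stays subnormal (or normal) throughout
    return time.perf_counter() - t0

print("normal operands:    %.3fs" % time_scale(1.0))
print("subnormal operands: %.3fs" % time_scale(1e-310))  # often much slower
```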
Dead Store Elimination (Still) Considered Harmful, by Zhaomo Yang, Brian Johannesmeyer, Sorin Lerner and Kirill Levchenko (and Aalborg University's Anders Trier Olesen)
Dead store elimination is a widely used compiler optimization that reduces code size and improves performance. However, it can also remove seemingly useless memory writes with which the programmer intended to clear sensitive data after its last use. Security-savvy developers have long been aware of this phenomenon and have devised ways to prevent the compiler from eliminating these data-scrubbing operations.
In this paper, we survey the set of techniques found in the wild that are intended to prevent data-scrubbing operations from being removed during dead store elimination. We evaluated the effectiveness and availability of each technique and found that some fail to protect data-scrubbing writes. We also examined eleven open source security projects to determine whether their specific memory scrubbing function was effective and whether it was used consistently. We found four of the eleven projects using flawed scrubbing techniques that may fail to scrub sensitive data and an additional four projects not using their scrubbing function consistently. We address the problem of dead store elimination removing scrubbing operations with a compiler-based approach by adding a new option to an LLVM-based compiler that retains scrubbing operations. We also synthesized existing techniques to develop a best-of-breed scrubbing function and are making it available to developers.
CSE professor Deian Stefan was also at USENIX Security. He chaired a session on "Side-Channel Countermeasures" on the first day of the conference.
CSE Lecturer Honored for Research Leadership in Scientific and Technical Computing
The ACM Special Interest Group on High-Performance Computing (SIGHPC) gives an award every other year to recognize a woman leader in the field. This November at SC17, the ACM SIGHPC Emerging Woman Leader in Technical Computing Award will be presented to UC San Diego computer scientist Ilkay Altintas, a lecturer in CSE and Chief Data Science Officer at the San Diego Supercomputer Center (SDSC).
In honoring Altintas, the four-member SIGHPC selection committee - including former CSE professor and SDSC director Francine Berman (now at Rensselaer Polytechnic Institute) - cited Altintas's "research leadership that makes distributed scientific and technical computing applications more reusable, scalable, and reproducible."
Altintas's research focuses on approaches that make distributed computing and workflow systems more programmable, reusable, scalable and reproducible. Described in more than 100 journal articles and conference papers, her work has been applied to computations in bioinformatics, geoinformatics, high-energy physics, multi-scale biomedical science, computational drug discovery, smart manufacturing, hazard management, and smart cities. She is a co-founding developer of Kepler, a widely used tool that enables research teams to build and run workflows, and to share computational results, across a broad range of scientific and engineering disciplines.
At SDSC, Altintas directs the Workflows for Data Science Center of Excellence and the Cyberinfrastructure Research, Education, and Development division.
In addition to her undergraduate teaching as a lecturer in CSE, Altintas teaches in the Master of Advanced Studies program in Data Science in the Jacobs School of Engineering, and she leads courses in data science and big data on the popular online learning platforms Coursera and edX. She is an active mentor for women pursuing careers in HPC and data science.
According to the selection committee, "there are very few awards recognizing individuals in the middle stage of their careers, and none aimed specifically at women." For the purposes of the HPC award, "technical computing" applies to all the fields that are part of high-performance computing, including analytics, visualization, operations, scientific application software, libraries, etc., as well as professionals working with medium- to large-scale systems among the TOP500 HPC machines in the world.
Altintas earned her Ph.D. in computational science at the University of Amsterdam in the Netherlands.
Expanding the Pipeline Via CSE's Early Research Scholars Program
In the August 2017 issue of Computing Research News, published by the Computing Research Association (CRA), CSE associate teaching professor Christine Alvarado has an excellent article about UC San Diego's Early Research Scholars Program and how it engages undergraduates in research. In the article, titled "Expanding the Pipeline," Alvarado notes that "engaging undergraduates in research can be an effective way to increase their confidence, perception of science and sense of belonging." Yet even when such opportunities exist, she argues, incoming students may not have the background, training or support to take full advantage of them - and those issues "are particularly acute for women and other underrepresented groups in computer science as they tend to have less pre-college computer science experience."
In creating CSE's Early Research Scholars Program (ERSP), the department focused on providing early-research opportunities to women and students from racial and cultural groups that are underrepresented in computer science. "Students apply during the spring of their first year at UC San Diego, and they are selected based on their academic performance in their early classes, their motivation to participate in the program, and their understanding of the issues facing students from minority groups in computer science," writes Alvarado. "They are grouped into teams of four and matched with a research mentor, who is a faculty member or researcher in CSE (and who is usually assisted by one or more graduate students or post-docs)."
During the fall, the students take an introduction-to-CS-research course and observe their mentor's research group meetings. They also work that quarter with their research mentor and the ERSP program director to develop a research proposal. During the winter and spring, they carry out their proposed research as a team under the dual mentorship of the ERSP staff and their research mentor. The students then present their work in a poster session at the end of the spring quarter.
Explains Alvarado: "The team-based, dual mentoring structure is key to ERSP's framework."
Now beginning its fourth year, ERSP has 40 students participating in 10 projects each academic year. Of the nearly 100 students in the program to date, 72 have been female and 19 have come from underrepresented racial or ethnic groups. In the program's third year, CSE collaborated with the CRA's Center for Evaluating the Research Pipeline (CERP) to carry out a survey last spring. The study found that "the program is achieving its goals of exposure to research and building confidence and community for at least some participants," and that it "helps some of the students see graduate school as a path they could pursue," even for those who entered ERSP as first-generation college students.
"Overall, we have found ERSP to be transformative for our department and its students," concludes Alvarado. "It has helped build a culture and practice of early undergraduate research, particularly for students from groups that are underrepresented in computer science."
Read the complete article in the August 2017 issue of Computing Research News.
UPCOMING EVENTS
MONDAY, AUGUST 28, 2017
Hardening Cloud and Datacenter Systems against Misconfigurations: Principles and Tool Support
CSE Ph.D. candidate Tianyin Xu will present the final defense of his doctoral dissertation, "Hardening Cloud and Datacenter Systems against Misconfigurations: Principles and Tool Support," on Aug. 28. Xu's advisor, CSE professor Yuanyuan Zhou, will chair a faculty committee that includes CSE professors Bill Griswold, Scott Klemmer, Stefan Savage and Geoffrey Voelker, as well as ECE professor Pamela Cosman.
Date: Monday, August 28, 2017
Time: 10:00 a.m.
Location: Room 3217, CSE Building
Abstract: Misconfigurations (configuration errors, from a system's standpoint) are among the dominant causes of today's catastrophic system failures that take down cloud-scale services and affect hundreds of millions of end users. Despite their wide adoption, traditional fault-tolerance and failure-recovery techniques are not effective in dealing with configuration errors, especially in large-scale software systems deployed in clouds and datacenters. To make matters worse, the tolerance and recovery mechanisms themselves are often misconfigured in the real world, which impairs the immune system of the entire cloud or datacenter.
This thesis explores two fundamental questions toward solutions for inevitable misconfigurations: how to build reliable cloud and datacenter systems in the face of configuration errors, and how to prevent misconfigurations in the first place through better configuration design. The goal is to enable software systems to proactively anticipate and defend against misconfigurations, rather than reacting to their manifestations and consequences.
This thesis presents three key principles of systems design and implementation for hardening cloud and datacenter systems against misconfigurations: anticipating misconfigurations, detecting configuration errors early, and pursuing simplicity-oriented configuration design. The thesis demonstrates that applying these principles can effectively defend cloud and datacenter systems against misconfigurations. Moreover, the dissertation presents the corresponding techniques and tool support that can automatically and systematically apply these principles to existing systems software.
The main technical insight is that configuration values are ultimately used by the system's code, and configuration errors mostly manifest through the faulty execution that uses those values. Therefore, by analyzing the code that uses configuration values, one can recover system-level information about configurations and build defenses against potential errors. The thesis first presents Spex, which enables systems to anticipate misconfigurations: Spex automatically infers configuration constraints from a system's source code, then leverages those constraints to test the system's resilience to misconfigurations and to detect error-prone configuration design and handling. One step further, the thesis introduces PCheck, which automatically generates checking code that captures configuration errors at the system's initialization time, preventing their late manifestation and the corresponding failure damage.
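As an illustration of the early-detection principle (a hand-written example of ours; PCheck itself generates such checks automatically from the system's own code):

```python
# Early configuration checking sketch: exercise a configuration value at
# startup the same way the system will later use it, so an error surfaces
# immediately rather than during log rotation or failure recovery.
import sys
import tempfile

def check_log_dir(path):
    """Fail fast if the configured log directory cannot actually be written."""
    try:
        with tempfile.TemporaryFile(dir=path):
            pass  # directory exists and is writable
    except OSError as e:
        sys.exit(f"bad configuration log_dir={path!r}: {e}")

check_log_dir("/var/log/myapp")  # hypothetical configuration value
```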
Going beyond error detection, the thesis presents simplicity-oriented configuration design, aimed at more usable and less error-prone software configuration. The key idea is to apply a user-centric philosophy and design configuration as an interface: configurations are essentially the interface for controlling and customizing system behavior, but they have rarely been treated as such. The thesis shows that configurations in today's systems software can be significantly simplified and effectively navigated, given an understanding of how they are actually used in the field.
Take the CSE Alumni Survey. It only takes a few minutes. We'd really like to hear from you!