Organizers of the 23rd annual conference of the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) just wrapped up the five-day event, which ended on August 17 in Halifax, Nova Scotia, Canada. One of the highest-profile presentations was a paper by co-authors from UC Berkeley, UC San Diego and New York University that outlined automated approaches to detecting human traffickers by analyzing their online classified sex advertisements.
At KDD 2017, first author Rebecca S. Portnoff, a UC Berkeley Ph.D. student, presented the paper, "Backpage and Bitcoin: Uncovering Human Trafficking," which is partly based on her Ph.D. dissertation. Co-author Danny Yuxing Huang, a UC San Diego computer-science Ph.D. candidate, is getting ready to defend his doctoral dissertation on Bitcoin, "Using Crypto-Currencies to Track Cyber-Attacks, Speculative Investors and Human Traffickers." The other co-authors include NYU professor Damon McCoy (a former postdoctoral researcher in the CSE department at UC San Diego), his Ph.D. student Periwinkle Doerfler, and Sadia Afroz, a research scientist at the International Computer Science Institute.
The computer scientists argue that the sheer quantity of online classified sex advertising used by human traffickers "makes manual exploration and analysis unscalable," especially with thousands of new ads posted daily. It is also difficult to distinguish ads for independent sex workers from ads featuring victims of sex trafficking.
The paper notes that "almost no work has been done in building tools that can automatically process and classify these ads." So the team focused on developing and demonstrating automatic techniques for clustering sex ads by owner (on the assumption that individual ads for a single sex worker would be less likely to be placed by a trafficker, whose ads more often offer the services of multiple sex workers).
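The article does not say how the paper turns individual "same owner" decisions into owner groups. One common way to do that step, shown here as a minimal sketch under that assumption (not the authors' code), is union-find: treat each decision that two ads share an owner as an edge and collect the connected components. The ad IDs and the `same_owner_pairs` input are hypothetical; in practice the pairs would come from some upstream method such as an authorship classifier.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over ad IDs, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen IDs as their own singleton set.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_ads(ad_ids, same_owner_pairs):
    """Group ads into owner clusters from pairwise 'same owner' links (hypothetical input)."""
    ds = DisjointSet()
    for ad in ad_ids:
        ds.find(ad)
    for a, b in same_owner_pairs:
        ds.union(a, b)
    clusters = defaultdict(list)
    for ad in ad_ids:
        clusters[ds.find(ad)].append(ad)
    # Largest clusters first; ties broken by member IDs for a stable order.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


if __name__ == "__main__":
    ads = ["ad1", "ad2", "ad3", "ad4", "ad5"]
    pairs = [("ad1", "ad2"), ("ad2", "ad4")]
    print(cluster_ads(ads, pairs))  # [['ad1', 'ad2', 'ad4'], ['ad3'], ['ad5']]
```

Because links are transitive here, a chain of pairwise matches (ad1 to ad2, ad2 to ad4) puts all three ads in one cluster, which is the behavior you want when grouping by owner but also means a single false link can merge two owners.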
Over a four-week period, the researchers carried out a proof-of-concept study on a single sex-ad website, Backpage, to demonstrate automated approaches and show how they can be used to find human traffickers. (After the research was done, Backpage discontinued its adult advertising section, but the ads themselves did not disappear; they now appear in several other sections of the website.)
One technique was a machine-learning classifier that uses stylometry (the analysis of an individual's writing style to identify authorship) to distinguish between ads posted by the same author and ads posted by different authors, with 96 percent accuracy. The team also designed a linking technique that uses publicly available information from the Bitcoin mempool and blockchain to match the timestamps of Bitcoin payments for sex ads to the timestamps at which ads appeared on Backpage. If multiple ads are linked to a single Bitcoin wallet, that suggests one owner is paying for several ads, which may indicate human trafficking.
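The timestamp-matching idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: each payment is assumed to carry a `wallet` label and the time it was first seen in the mempool, each ad carries the time it appeared on the site, and an ad is linked to a payment only when exactly one payment was first seen within a hypothetical `window` (600 seconds here) before the ad went up. Ads linked to the same wallet are then grouped. Real wallet identification from blockchain addresses is itself a heuristic, and the field names and window size are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    txid: str
    wallet: str   # hypothetical: label for the paying wallet
    seen_at: int  # Unix seconds when first seen in the mempool


@dataclass(frozen=True)
class Ad:
    ad_id: str
    posted_at: int  # Unix seconds when the ad appeared on the site


def link_ads_to_payments(ads, payments, window=600):
    """Link each ad to the single payment first seen within `window` seconds
    before it was posted. Ads with zero or several candidates stay unlinked."""
    links = {}
    for ad in ads:
        candidates = [p for p in payments if 0 <= ad.posted_at - p.seen_at <= window]
        if len(candidates) == 1:
            links[ad.ad_id] = candidates[0]
    return links


def ads_by_wallet(links):
    """Return wallets that paid for more than one linked ad."""
    groups = defaultdict(list)
    for ad_id, payment in links.items():
        groups[payment.wallet].append(ad_id)
    return {w: sorted(ids) for w, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    payments = [
        Payment("tx1", "walletA", 1000),
        Payment("tx2", "walletA", 5000),
        Payment("tx3", "walletB", 9000),
    ]
    ads = [Ad("ad1", 1120), Ad("ad2", 5300), Ad("ad3", 9050), Ad("ad4", 20000)]
    links = link_ads_to_payments(ads, payments)
    print(ads_by_wallet(links))  # {'walletA': ['ad1', 'ad2']}
```

Discarding ambiguous matches trades recall for precision: on a busy site many payments can land in the same window, so a linker that guessed among candidates would produce false wallet groupings.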
Using a sample of 10,000 real adult ads collected from Backpage over four weeks, the researchers reported an 89 percent "true-positive" rate for grouping ads by author with their automated author-identification techniques. The team also reported a high rate of success in linking ads they placed themselves to the corresponding transactions in the Bitcoin blockchain.
Taken together, the automated techniques are believed to be the first to flag adult ads that may be tied to human-trafficking rings by linking the ads to public information from Bitcoin. Said former UC San Diego postdoc Damon McCoy: "There are hundreds of thousands of these ads placed every year, and any technique that can surface commonalities between ads and potentially shed light on the owners is a big boost for those working to curb exploitation."
Ultimately, the study didn't prove that the ads believed to be placed by human traffickers were actually tied to trafficking. Only law enforcement can pursue that linkage, but now they have some new automated tools to point investigators in specific directions.