In the age of screenshots and data trails, the idea of putting yourself ‘out there’ has taken on new meaning, especially as dating apps are increasingly mined for users’ potentially quite personal information. In the latest perceived privacy breach, one developer managed to scrape tens of thousands of daters’ photos in order to teach artificial intelligence some new tricks, and even shared his method for doing it with the world.
Last week, developer Stuart Colianni uploaded a data set representing tens of thousands of ‘scraped’ Tinder profile photos to the machine learning and data science platform Kaggle, which Google recently snapped up. As TechCrunch reported, the facial data set was made available for download as six public domain zip files, comprising over 40,000 photos of the Bay Area’s Tinder population and two sample sets with approximately 500 images each of users labeled male and female.
Dubbed “People of Tinder,” the data set has since been taken down, but not before it was downloaded hundreds of times, according to TechCrunch. Colianni also posted the code that other developers would need to perform the same photo-grabbing crawl all over again on GitHub, where it remains available as of Tuesday afternoon.
According to its description on GitHub, the scraping program is “a simple script that exploits the Tinder API to allow a person to build a facial data set,” created to give users access to images from “thousands of people within miles of you” for the sake of building “a better, larger facial data set” than the kind Colianni is used to seeing. In his case, the script was specifically used to round up stores of images on which AI might be trained to recognize gender based on a person’s facial features.
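Colianni’s actual script is on GitHub; the sketch below is only an illustration of the general pattern such a scraper follows: request nearby profiles from an API endpoint, then pull the photo URLs out of the JSON response. The host, endpoint path, header name, and payload layout here are all assumptions for illustration, since Tinder’s API is unofficial and undocumented.

```python
# Hypothetical sketch of an API-based photo scraper, in the spirit of the
# script described above. This is NOT Colianni's code: the host, endpoint,
# header, and JSON field names are illustrative placeholders.
from urllib.parse import urljoin

API_BASE = "https://api.example-dating-app.com/"  # placeholder host

def build_recs_request(auth_token):
    """Return (url, headers) for fetching nearby profiles (assumed endpoint)."""
    url = urljoin(API_BASE, "user/recs")
    headers = {"X-Auth-Token": auth_token, "Content-Type": "application/json"}
    return url, headers

def extract_photo_urls(recs_json):
    """Collect every photo URL from an assumed recommendations payload."""
    urls = []
    for profile in recs_json.get("results", []):
        for photo in profile.get("photos", []):
            if "url" in photo:
                urls.append(photo["url"])
    return urls

if __name__ == "__main__":
    # A downloader loop would then fetch each URL and write the image to disk.
    sample = {"results": [{"photos": [{"url": "https://img.example/1.jpg"}]}]}
    print(extract_photo_urls(sample))
```

The point of the sketch is how little machinery is involved: once a valid auth token is in hand, harvesting thousands of strangers’ photos is a short loop over paginated API responses.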
He explained on GitHub, “I plan on using the [data set] with TensorFlow’s Inception to try and create a [convolutional neural network] that is capable of distinguishing between men and women.”
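Inception is a large, pre-built architecture, but the building block underlying any such convolutional network is the convolution itself: sliding a small learned filter over an image to produce a feature map. As a toy illustration only (not Colianni’s code, and far simpler than Inception), here is a minimal ‘valid’ 2-D convolution in NumPy:

```python
# Toy illustration of the convolution operation at the heart of a CNN such
# as Inception. Purely illustrative; real frameworks use optimized kernels.
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (technically cross-correlation, as in most
    deep-learning frameworks) of a grayscale image with a small kernel."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            # Each output pixel is the filter's response at that location.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

if __name__ == "__main__":
    img = np.arange(16, dtype=float).reshape(4, 4)
    edge = np.array([[1.0, -1.0]])  # simple horizontal-edge filter
    print(conv2d_valid(img, edge))
```

A network like Inception stacks hundreds of such filters, learns their weights from labeled images, and ends in a classification layer; that final binary male/female output layer is precisely the design choice criticized later in this piece.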
The project has already drawn criticism for its perceived violation of users’ privacy, as well as for its execution. As The Next Web pointed out, Tinder reportedly contacted Kaggle to request that the data set be taken down, citing a violation of the app’s terms of service. The image-scraper script has also seen pushback over Colianni’s use of the terms “hoe” and “hoes” to refer to the Tinder users it targets.
In response, an update on GitHub noted, “The use of the words ‘hoe’ and ‘hoes’ as data structures within the original script was an oversight … [and part of syntax] borrowed from a Tinder auto-liker, which I used as a reference when learning to interact with the Tinder API programmatically. I regret this oversight, and the code has been corrected.” Colianni also wrote that Tinder’s API documentation has been widely available for years, enabling numerous projects (on GitHub and elsewhere) that make use of its accessible, massive base of user profiles.
For those concerned about privacy on Tinder and other social sites, however, the fundamental problems with the program remain. Oliver Keyes, a longtime data scientist, technology ethics commentator, and incoming PhD student at the University of Washington, was one of the first to call for the script’s removal from GitHub, arguing that its destructive capabilities and implications far exceed the threat of merely having one’s poolside photos show up on the wider web.
Keyes explained by email, “The theft of the photos–and that’s what it was, because it happened in breach of both Tinder’s terms of service and without the consent of the subjects–isn’t as bad as it could have been. Were it paired with names, biographies and metadata it would have been far more damaging.”
“That is not to say, however, that there isn’t a very real potential for harm: many classes of individual, be they vulnerable people in societal environments where this kind of dating is verboten, trans or queer people who aren’t out to their families, friends, and colleagues (but are [out] in the context of relationships), or simply people who would in other contexts be subject to shame or harassment for the behavior they choose to exhibit in photographs in a semi-closed space, are at various levels of risk.”
Keyes noted that this kind of privacy breach “isn’t a new problem,” either. Rather, they said, “the combination of easier automated data access and greater enthusiasm for it is making more and more information that people assume to be semi-private, at risk of being publicized.” At the same time, Keyes said, “There’s a tendency in these situations to blame the subjects–to say, ‘Well, it was on the internet, they should have expected this’–which is, frankly, a facile and offensive dodging of responsibility.”
Rather, Keyes said, it’s up to companies to take responsibility for how they protect and handle user data based on their work in two key areas. Firstly, companies need to take the time to “design for evil” when building their systems, Keyes said, by “deliberately gaming out ways that malicious or uncaring people could harm your users with what you have built.” In the case of a semi-public API like Tinder’s that is “obscure to customers” but easy for developers to tap into, a lack of care and consideration in planning the system is apparent, according to Keyes.
“And this isn’t just a data breach thing,” they added. “This lack of care and consideration is also responsible, for example, for online harassment in spaces like Twitter: a failure to consider what happens when users are there for unintended purposes. It’s an industry-wide problem.”
In addition, Keyes said, companies and the tech community alike have a responsibility to educate software developers and data scientists about “the ethical implications of their work,” which the industry as a whole has been slow to do.
“Many data scientists and developers come to the field through unorthodox academic routes which simply have no use for human subjects training around digital data–say, from bioinformatics and physics,” Keyes said. “Many others start off in software engineering, computer science, or data science, areas where universities or boot camps should care about ethical training but all-too-often don’t.”
Across the technology industry, however, Keyes has yet to “see many signs” that company leaders are taking those responsibilities and considerations seriously–meaning consumers shouldn’t expect improvements anytime soon. “There are many brilliant people writing and thinking and advocating on it, but very few CEOs willing to listen,” Keyes said. “I’m sorry to say that I think the Tinder breach is not merely a sequel to the OKCupid one, but a prequel to many, many more.”
Setting aside issues with privacy, Keyes also noted being “extremely concerned by the lack of thought and caution” behind the greater task for which the Tinder Face-Scraper was created.
“Building a ‘genderizer’–particularly one designed to divide the world into male or female–is not an apolitical act, or an act without moral and ethical implications,” Keyes said. “Notwithstanding the reinforcement of the (false) gender binary, such automated systems have pretty big implications for the well-being of transgender, non-binary and genderqueer people if they’re actually implemented–or even just people who don’t present in a stereotypically masculine or feminine way!”
“It’s a tough enough experience being trans or non-conforming without someone building robots that call you out on it,” they added.
According to Tinder, the app currently generates 1.6 billion swipes and 26 million matches per day, has matched users more than 20 billion times in over 190 countries to date, and leads to an impressive, and apparently quietly monitored, average of around 1.5 million dates each week.
Given those numbers, it doesn’t seem likely that Tinder’s millions of users would want to be sorted into just two categories based on a breakdown of their facial structure, anyway–nor that they’d ultimately trust a robot, or a website, that does so.
Source: SANS ISC SecNewsFeed @ May 2, 2017 at 03:15PM