As precious as its user data is, Facebook has been less than careful in how it uses that data.
The more than 2 billion people who use Facebook every month provide it with information: how old they are, who their friends are, where they live, who they work for, where they went to school, what music they like and more. It’s a prodigious amount of information. And, because of its access to that data, Facebook has made an immense sum of money selling ads to advertisers who want to reach not only 18- to 34-year-olds in the US, but 18- to 34-year-olds in the US who are in a new relationship and also in the military.
Facebook can do this because people tell it things like when they’ve entered a new relationship or if they work for the US Army. But people can lie. More problematically, Facebook can’t tell when people are lying.
Case in point: On Thursday, ProPublica reported that Facebook had approved ads targeted using audience categories that included “Jew hater,” “Hitler did nothing wrong” and “how to burn jews.” The absurdity of Facebook pooling together people associated with such hateful characteristics so that brands could advertise to them, and so that Facebook could profit, is overshadowed only by the fact that Facebook did exactly that. Worse, Facebook wasn’t aware of what it had done.
How this happened
No human being at Facebook created ad-targeting options that let advertisers pay to reach people affiliating themselves with hate, such as individuals who list “NaziParty” as their employer. A computer did. An algorithm developed by Facebook crawled users’ profiles, identified patterns in the data and created new audience segments based on that information.
In this case, enough people had typed “NaziParty” as their employer or “Jew hater” as their educational field of study on their Facebook profiles that the computer decided that these people warranted their own grouping, in the same way that people who work for AT&T or studied supply-chain management might.
The algorithm did not know that the Nazi Party was a German political party that oversaw a mass genocide or that “Jew hater” is an alternative descriptor for anti-Semites and that grouping together people who identify as such constitutes organizing a hate group.
It knew only that the sequences of letters that people entered as the names of their employers, schools, job titles and fields of study appeared on thousands of people’s profiles and that a group of thousands is a group that may be valuable to an advertiser.
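To make that mechanism concrete, here is a minimal sketch of frequency-based grouping. It is not Facebook’s code, which has not been published; the field names, the `build_segments` function and the `min_size` threshold are all hypothetical, chosen only to mirror the description above.

```python
from collections import Counter

# Hypothetical field names, mirroring the free-form profile fields described above.
FREE_TEXT_FIELDS = ("employer", "school", "job_title", "field_of_study")


def build_segments(profiles, min_size):
    """Return {(field, value): count} for each value entered by at least min_size users."""
    counts = Counter()
    for profile in profiles:
        for field in FREE_TEXT_FIELDS:
            value = profile.get(field)
            if value:
                # Only the characters are normalized; nothing here inspects meaning.
                counts[(field, value.strip().lower())] += 1
    # Any string that is common enough becomes a targetable segment.
    return {key: n for key, n in counts.items() if n >= min_size}


# Made-up sample profiles for illustration.
profiles = [
    {"employer": "AT&T", "field_of_study": "Supply-chain management"},
    {"employer": "at&t"},
    {"employer": "AT&T ", "field_of_study": "supply-chain management"},
    {"employer": "Acme"},
]
segments = build_segments(profiles, min_size=2)
```

In this sketch the only test a string has to pass is being typed often enough. An offensive phrase entered by enough people would clear the threshold exactly as “AT&T” does here.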
Facebook’s algorithm may not have known what it was doing, but Facebook should have. Facebook should have known that an algorithm trained to identify the characters in a text, but not to evaluate the meaning of those characters, would be susceptible to being fooled.
It should have known this in the same way that a grocery store manager should know that telling an employee to “write a list of all the different boxed products shelved in the cereal aisle” is a poor way to get a valid inventory of cereal. After all, customers do strange things, and one may have placed a box of rice next to the Rice Krispies. By following instructions to the letter and listing every boxed product shelved in the cereal aisle, that employee may include that box of rice. But Facebook did not employ this kind of logic, and now it must learn the hard way.
Facebook’s response
In the wake of ProPublica’s report, Facebook has removed the ability for advertisers to target based on users’ input for four free-form text fields that were identified as problematic: education, employment, field of study and job title, according to a Facebook spokesperson.
“To help ensure that targeting is not used for discriminatory purposes, we are removing these self-reported targeting fields until we have the right processes in place to help prevent this issue,” the company wrote in a blog post published on Thursday.
The issue of trust
While Facebook is reducing its use of self-reported user data for ad targeting, it is not abandoning that data entirely. Facebook may have stopped listing employer names and job titles as ad-targeting fields, but it still uses the employment information people submit to their profiles when listing industries for advertisers to target. And the company offers other ad-targeting categories that are based on information supplied by Facebook’s users, such as a person’s age and relationship status.
ProPublica’s report is only the latest example this month of the difficulty in relying on user-provided data for ad-targeting purposes. Last week Pivotal Research analyst Brian Wieser identified a discrepancy between the number of 18- to 34-year-olds in the US that Facebook says it can reach and the number of 18- to 34-year-olds who actually live in the US, according to the Census Bureau.
The comparison was not apples to apples, but the dramatic difference underscored the issue with self-reported user data: people can lie. People can tell Facebook that they are 18 years old when in reality they are 11 years old and wouldn’t be allowed on Facebook unless they lied. Yet, for the most part, Facebook relies on the accuracy of self-reported information, and advertisers must trust Facebook in spite of any evidence to the contrary.
Facebook can try to mitigate the issue. It can work with third-party data providers to augment the information it has on users, such as by using people’s email addresses to cross-reference their profile information with similar data gathered by these companies through other means. But then Facebook is trusting those companies’ data to be correct and is asking advertisers to trust that it is so.
The issue of trust in digital advertising may never be erased. But with every event that highlights the discrepancies between the truth and what media companies say, the leap of faith is getting larger.