In April 2018, Mark Zuckerberg, Facebook’s chief executive, told Congress about an ambitious plan to share huge amounts of posts, links and other user data with researchers around the world so that they could study and flag disinformation on the site.
“Our goal is to focus on both providing ideas for preventing interference in 2018 and beyond, and also for holding us accountable,” Mr. Zuckerberg told lawmakers questioning him about Russian interference on the site in the 2016 presidential election. He said he hoped “the first results” would come by the end of that year.
But nearly 18 months later, much of the data remains unavailable to academics because Facebook says it has struggled to share the information while also protecting its users’ privacy. And the information the company eventually releases is expected to be far less comprehensive than originally described.
As a result, researchers say, the public may have little more insight into disinformation campaigns on the social network heading into the 2020 presidential election than it had in 2016. Seven nonprofit groups that have helped finance the research efforts, including the Knight Foundation and the Charles Koch Foundation, have even threatened to end their involvement.
BuzzFeed News earlier reported on researchers’ concerns over delays in Facebook’s data sharing project.
“Silicon Valley has a moral obligation to do all it can to protect the American political process,” said Dipayan Ghosh, a fellow at the Shorenstein Center at Harvard and a former privacy and public policy adviser at Facebook. “We need researchers to have access to study what went wrong.”
Political disinformation campaigns have continued to grow since the 2016 campaign. Last week, Oxford researchers said that the number of countries with disinformation campaigns more than doubled to 70 in the last two years, and that Facebook remained the No. 1 platform for those campaigns.
But while company executives express an eagerness to prevent the spread of knowingly false posts and photos on the social network, by far the world’s largest, they also face numerous questions about their ability to secure people’s private information.
Revelations last year that Cambridge Analytica, a political consulting firm, had harvested the personal data of up to 87 million Facebook users set off an outcry in Washington. In the months after the scandal, Facebook cut off many of the most common avenues for researchers to access information about the more than two billion people on the service. This past July, it also agreed with federal regulators to pay $5 billion for mishandling users’ personal information.
“At one level, it’s difficult as there’s a large amount of data and Facebook has concerns around privacy,” said Tom Glaisyer, chairman of the group of seven nonprofits supporting the research efforts.
“But frankly, our digital public square doesn’t appear to be serving our democracy,” said Mr. Glaisyer, who is also the managing director of the Democracy Fund, a nonpartisan group that promotes election security.
Elliot Schrage, Facebook’s vice president of special projects, who oversees the initiative, defended the company’s efforts.
“The whole reason Mark announced this program in the first place is he believes that the most productive and instructive debates are driven by data and independent analysis,” Mr. Schrage said in an interview. “I know of no private company that has invested more to build tools and technologies to make private data publicly available for public research.”
Three months after Mr. Zuckerberg spoke in Washington last year, Facebook announced plans to provide approved researchers with detailed information about users, like age and location, where a false post appeared in their feeds and even their friends’ ideological affiliation. Dozens of researchers applied to get the information.
The company partnered with an independent research commission, Social Science One, which had been set up for the initiative, to determine what information could be sent to researchers. Facebook and Social Science One also brought in the Social Science Research Council, an independent nonprofit organization that oversees international social science research, to sort through the applications from academics and conduct a peer review and an ethical review of their research proposals.
But privacy experts brought in by Social Science One quickly raised concerns about disclosing too much personal information. In response, Facebook began trying to apply what’s known in statistics and data analytics as “differential privacy,” in which researchers can learn a lot about a group from data, but virtually nothing about a specific individual. It is a method that has been adopted by directors at the Census Bureau and promoted by Apple.
Facebook is still working on that effort. But researchers say that even when Facebook delivers the data, what they can learn about activity on the social network will be much more limited than they planned for.
“We and Facebook have learned how difficult it is to make” a database that was not just privacy-protected but at a “grand scale,” said Nate Persily, a Stanford law professor and co-founder of Social Science One.
Facebook said researchers had access to other data sets, including from its ads archive and CrowdTangle, a news-tracking tool that Facebook owns. Two researchers said they and others visited Facebook’s headquarters in California in June to learn how to study the available data set.
And both Facebook and Social Science One said they would continue to make more data available to researchers in time. In September, the two released 32 million links that included data about whether users labeled millions of posts as fake news, spam or hate speech, or whether fact-check organizations raised doubts about the posts’ accuracy. The release also included how many times stories were shared publicly and the countries where the stories were most shared.
Facebook’s effort is a “tremendous step forward,” said Joshua Tucker, a professor at New York University studying the spread of polarizing content across multiple platforms. “In the long term, if methods for making these data available for outside research are successfully implemented, it will have a very positive impact.”
But other researchers say the existing databases are severely limiting. And some say that Facebook’s concerns about privacy are overblown.
Ariel Sheen, a doctoral student at Universidad Pontificia Bolivariana in Medellín, Colombia, whose research team has been through the Social Science One approval process but has not yet received the data, said his group had, on its own, uncovered hints of a large coordinated campaign in Venezuela.
His group believes it has found more than 3,000 still-active fake Facebook accounts — profiles run by people impersonating others, for example — that are spreading false information. The accounts, Mr. Sheen said, are tied to Telesur, a Latin American television network largely financed by the Venezuelan government.
But because Facebook is not providing the original data described, Mr. Sheen said, his team’s work cannot proceed as planned.
“We believe that it is imperative for our research to continue as was originally agreed to by Facebook,” he said.
Mr. Glaisyer of the Democracy Fund said it is important that researchers “can operate independently” but that Facebook “may consider other ways of granting researchers and analysts access such as on-site — as the Census Bureau does.” Mr. Sheen said that is precisely what his team has proposed.
Facebook said there were other possibilities for sharing data with researchers but that it could not commit to specific methods at this point.
Philip Howard, director of the Oxford Internet Institute, a department at Oxford University studying the use of social media to spread misinformation, said his team deliberately chose not to participate in the Facebook and Social Science One data sharing project.
“It takes so frustratingly long to get data sets that it’s easier for us to build our own tools and push the science forward on our own,” Mr. Howard said.
But Samantha Bradshaw, a researcher who works with Mr. Howard, said that collecting their own data for research is also limiting.
“It’s only a small glimpse into what are very big broad phenomenons,” she said.