Endless AI-generated spam risks clogging up Google’s search results

Over the past year, AI systems have made huge strides in their ability to generate convincing text, churning out everything from song lyrics to short stories. Experts have warned that these tools could be used to spread political disinformation, but there’s another target that’s equally plausible and potentially more lucrative: gaming Google.

Instead of being used to create fake news, AI could churn out infinite blogs, websites, and marketing spam. The content would be cheap to produce and stuffed full of relevant keywords. But like most AI-generated text, it would only have surface meaning, with little correspondence to the real world. It would be the information equivalent of empty calories, but still potentially difficult for a search engine to distinguish from the real thing.

Just take a look at this blog post answering the question: “What Photo Filters are Best for Instagram Marketing?” At first glance it seems legitimate, with a bland introduction followed by quotes from various marketing types. But read a little more closely and you realize it references magazines, people, and — crucially — Instagram filters that don’t exist:

You might not think that a mumford brush would be a good filter for an Insta story. Not so, said Amy Freeborn, the director of communications at National Recording Technician magazine. Freeborn’s picks include Finder (a blue stripe that makes her account look like an older block of pixels), Plus and Cartwheel (which she says makes your picture look like a topographical map of a town.

The rest of the site is full of similar posts, covering topics like “How to Write Clickbait Headlines” and “Why is Content Strategy Important?” But every post is AI-generated, right down to the authors’ profile pictures. It’s all the creation of content marketing agency Fractl, who says it’s a demonstration of the “massive implications” AI text generation has for the business of search engine optimization, or SEO.

“Because [AI systems] enable content creation at essentially unlimited scale, and content that humans and search engines alike will have difficulty discerning […] we feel it is an incredibly important topic with far too little discussion currently,” Fractl partner Kristin Tynski tells The Verge.

To write the blog posts, Fractl used an open source tool named Grover, made by the Allen Institute for Artificial Intelligence. Tynski says the company is not using AI to generate posts for clients, but that this doesn’t mean others won’t. “I think we will see what we have always seen,” she says. “Blackhats will use subversive tactics to gain a competitive advantage.”

The history of SEO certainly supports this prediction. It’s always been a cat and mouse game, with unscrupulous players trying whatever methods they can to attract as many eyeballs as possible while gatekeepers like Google sort the wheat from the chaff.

As Tynski explains in a blog post of her own, past examples of this dynamic include the “article spinning” trend, which started 10 to 15 years ago. Article spinners use automated tools to rewrite existing content; finding and replacing words so that the reconstituted matter looked original. Google and other search engines responded with new filters and metrics to weed out these mad-lib blogs, but it was hardly an overnight fix.

AI text generation will make article spinning “look like child’s play,” writes Tynski, allowing for “a massive tsunami of computer-generated content across every niche imaginable.”

Mike Blumenthal, an SEO consultant and expert, says these tools will certainly attract spammers, especially considering their ability to generate text on a massive scale. “The problem that AI-written content presents, at least for web search, is that it can potentially drive the cost of this content production way down,” Blumenthal tells The Verge.

And if the spammers’ aim is simply to generate traffic, then fake news articles could be perfect for this, too. Although we often worry about the political motivations of fake news merchants, most interviews with the people who create and share this context claim they do it for the ad revenue. That doesn’t stop it being politically damaging.

The key question, then, is: can we reliably detect AI-generated text? Rowan Zellers of the Allen Institute for AI says the answer is a firm “yes,” at least for now. Zellers and his colleagues were responsible for creating Grover, the tool Fractl used for its fake blog posts, and were able to also engineer a system that can spot Grover-generated text with 92 percent accuracy.

“We’re a pretty long way away from AI being able to generate whole news articles that are undetectable,” Zellers tells The Verge. “So right now, in my mind, is the perfect opportunity for researchers to study this problem, because it’s not totally dangerous.”

Spotting fake AI text isn’t too hard, says Zeller, because it has a number of linguistic and grammatical tells. He gives the example of AI’s tendency to re-use certain phrases and nouns. “They repeat things … because it’s safer to do that rather than inventing a new entity,” says Zeller. It’s like a child learning to speak; trotting out the same words and phrases over and over, without considering the diminishing returns.

However, as we’ve seen with visual deepfakes, just because we can build technology that spots this content, that doesn’t mean it’s not a danger. Integrating detectors into the infrastructure of the internet is a huge task, and the scale of the online world means that even detectors with high accuracy levels will make a sizable number of mistakes.

Google did not respond to queries on this topic, including the question of whether or not it’s working on systems that can spot AI-generated text. (It’s a good bet that it is, though, considering Google engineers are at the cutting-edge of this field.) Instead, the company sent a boilerplate reply saying that it’s been fighting spam for decades, and always keeps up with the latest tactics.

SEO expert Blumenthal agrees, and says Google has long proved it can react to “a changing technical landscape.” But, he also says a shift in how we find information online might also make AI spam less of a problem.

More and more web searches are made via proxies like Siri and Alexa, says Blumenthal, meaning gatekeepers like Google only have to generate “one (or two or three) great answers” rather than dozens of relevant links. Of course, this emphasis on the “one true answer” has its own problems, but it certainly minimizes the risk from high-volume spam.

The end-game of all this could be even more interesting though. AI-text generation is advancing in quality extremely quickly, and experts in the field think it could lead to some incredible breakthroughs. After all, if we can create a program that can read and generate text with human-level accuracy, it could gorge itself on the internet and become the ultimate AI assistant.

“It may be the case that in the next few years this tech gets so amazingly good, that AI-generated content actually provides near-human or even human-level value,” says Tynski. In which case, she says, referencing an Xkcd comic, it would be “problem solved.” Because if you’ve created an AI that can generate factually-correct text that’s indistinguishable from content written by humans, why bother with the humans at all?

Source link