On Wednesday, Google previewed what may be one of the biggest changes to the search engine in its history.
Google will use AI models to combine and aggregate information from across the web in response to search queries, a product it’s calling Search Generative Experience.
Instead of “ten blue links,” the phrase that describes Google’s usual search results, Google shows some users paragraphs of AI-generated text and a handful of links at the top of the results page.
The new AI-based search is being tested with a select group of users and is not yet generally available. But website publishers already fear that if it becomes Google’s default way of displaying search results, it could hurt them by sending fewer visitors to their websites and keeping more of them on Google.com.
The controversy highlights a long-standing tension between Google and the websites it indexes, now with an artificial intelligence twist. Publishers have long had concerns about Google reusing their verbatim content in snippets on its own site, but Google is now using advanced machine learning models that scan large swaths of the web to “train” software to spit out human-like text and responses.
Rutledge Daugette, CEO of TechRaptor, a website that focuses on gaming news and reviews, said Google’s move was made with no regard for publishers’ interests and that Google’s AI amounts to reusing publishers’ content.
“Their focus is on zero-click searches, leveraging information from publishers and authors who put the time and effort into creating quality content without providing any benefits other than the potential of a click,” Daugette told CNBC. “Until now, the AI has been quick to reuse the information of others without benefiting them, and in cases like Google’s Bard, it doesn’t even offer an indication of where the information it’s using came from.”
Luther Lowe, a longtime Google critic and head of public policy at Yelp, said Google’s update is part of a decades-long strategy to keep users on its site longer, rather than directing them to the sites that originally hosted the information.
“The exclusive self-preference of Google’s ChatGPT clone in search is the final chapter in the web’s bloodshed,” Lowe told CNBC.
According to Search Engine Land, a news site that closely follows changes to Google’s search engine, tests so far show the AI-generated results above the organic search results. CNBC previously reported on Google’s plans to redesign its results page to promote AI-generated content.
SGE appears in a differently colored box (green in the example) and includes boxed links to three websites on the right side. In Google’s primary example, the headings of all three sites were truncated.
According to Google, the information is not taken from the websites but is corroborated by the links. Search Engine Land said the SGE approach is an improvement and a “healthier” way of linking than Google’s Bard chatbot, which rarely linked to publisher sites.
Some publishers are wondering whether they can stop AI firms like Google from scraping their content to train their models. Companies like the firm behind Stable Diffusion are already facing lawsuits from data owners, but the right to scrape web data for AI remains an unsettled frontier. Other companies, like Reddit, have announced plans to charge for access to their data.
Leading the charge among publishers is Barry Diller, chairman of IAC, which includes sites like All Recipes, People Magazine and The Daily Beast.
“If all the information in the world can be sucked down that maw and then essentially repackaged into meaningful sentences, in what’s called chat, but it’s not chat – as many grafs as you want, 25 on any subject – there will be no publishing, because it will be impossible,” Diller said at a conference last month.
“What you need to do is get the industry to say you can’t scrape our content until you work out schemes where the publisher is given a way to get paid,” Diller continued, adding that Google will face this problem.
Diller believes that publishers can sue AI companies over copyright and that current “fair use” restrictions need to be redefined. The Financial Times reported Wednesday that Diller is leading a group of publishers “that will state that, if necessary, we will change copyright law.” An IAC spokesman declined a request to make Diller available for an interview.
One challenge for publishers is confirming that their content is being used by AI. Google has not disclosed any training sources for PaLM 2, the large language model underlying SGE, and Daugette says that while he has seen examples of competitors’ quotes and review scores repurposed on Bard without attribution, it is difficult to tell when information comes from his own site unless the sources are directly linked.
A Google spokesman said the company has no announcements planned about compensating publishers for training data.
“We’re introducing this new generative AI experience as an experiment in Search Labs to help us iterate and improve while incorporating feedback from users and other stakeholders,” Google said in a statement.
“PaLM 2 is based on a wide range of openly available data on the web and we obviously care about the health of the web ecosystem. And that’s really part of how we think about how we build our products to make sure we have a healthy ecosystem where creators are part of that thriving ecosystem,” Google vice president of research Zoubin Ghahramani said in a media briefing earlier this week.
Daugette says Google’s moves make it difficult to be an independent publisher.
“I think it’s really frustrating for our industry to have to worry about our hard work being lost when so many colleagues are laid off,” Daugette said. “It’s just not okay.”
– CNBC’s Jordan Novet contributed to this report.