A couple of weeks ago at Google I/O, we announced that we’d be bringing AI Overviews to everyone in the U.S.
User feedback shows that with AI Overviews, people have higher satisfaction with their search results, and they’re asking longer, more complex questions that they know Google can now help with. They use AI Overviews as a jumping-off point to visit web content, and we see that the clicks to webpages are higher quality: people are more likely to stay on that page, because we’ve done a better job of finding the right info and helpful webpages for them.
In the last week, people on social media have shared some odd and erroneous overviews (along with a very large number of faked screenshots). We know that people trust Google Search to provide accurate information, and they’ve never been shy about pointing out oddities or errors when they come across them — in our rankings or in other Search features. We hold ourselves to a high standard, as do our users, so we expect and appreciate the feedback, and take it seriously.
Given the attention AI Overviews received, we wanted to explain what happened and the steps we’ve taken.
How AI Overviews work
For many years we’ve built features in Search that make it easier for people to find the information they’re looking for as quickly as possible. AI Overviews are designed to take that a step further, helping with more complex questions that might have previously taken multiple searches or follow-ups, while prominently including links to learn more.
AI Overviews work very differently from chatbots and other LLM products that people may have tried out. They’re not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional “search” tasks, like identifying relevant, high-quality results from our index. That’s why AI Overviews don’t just provide text output, but include relevant links so people can explore further. Because accuracy is paramount in Search, AI Overviews are built to show only information that is backed up by top web results.
This means that AI Overviews generally don’t “hallucinate” or make things up in the ways that other LLM products might. When AI Overviews get it wrong, it’s usually for other reasons: misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available. (These are challenges that occur with other Search features too.)
This approach is highly effective. Overall, our tests show that our accuracy rate for AI Overviews is on par with another popular feature in Search — featured snippets — which also uses AI systems to identify and show key info with links to web content.
About those odd results
In addition to designing AI Overviews to optimize for accuracy, we tested the feature extensively before launch. This included robust red-teaming efforts, evaluations with samples of typical user queries and tests on a proportion of search traffic to see how it performed. But there’s nothing quite like having millions of people using the feature with many novel searches. We’ve also seen nonsensical new searches, seemingly aimed at producing erroneous results.
Separately, there have been a large number of faked screenshots shared widely. Some of these faked results have been obvious and silly. Others have implied that we returned dangerous results for topics like leaving dogs in cars, smoking while pregnant, and depression. Those AI Overviews never appeared. So we’d encourage anyone encountering these screenshots to do a search themselves to check.
But some odd, inaccurate or unhelpful AI Overviews certainly did show up. And while these were generally for queries that people don’t commonly search for, they highlighted some specific areas that we needed to improve.
One area we identified was our ability to interpret nonsensical queries and satirical content. Let’s take a look at an example: “How many rocks should I eat?” Prior to these screenshots going viral, practically no one asked Google that question. You can see that yourself on Google Trends.
There isn’t much web content that seriously contemplates that question, either. This is what is often called a “data void” or “information gap,” where there’s a limited amount of high quality content about a topic. However, in this case, there is satirical content on this topic … that also happened to be republished on a geological software provider’s website. So when someone put that question into Search, an AI Overview appeared that faithfully linked to one of the only websites that tackled the question.
In other examples, we saw AI Overviews that featured sarcastic or troll-y content from discussion forums. Forums are often a great source of authentic, first-hand information, but in some cases can lead to less-than-helpful advice, like using glue to get cheese to stick to pizza.
In a small number of cases, we have seen AI Overviews misinterpret language on webpages and present inaccurate information. We worked quickly to address these issues, either through improvements to our algorithms or through established processes to remove responses that don’t comply with our policies.
Improvements we’ve made
As is always the case when we make improvements to Search, we don’t simply “fix” queries one by one, but we work on updates that can help broad sets of queries, including new ones that we haven’t seen yet.
From looking at examples from the past couple of weeks, we were able to determine patterns where we didn’t get it right, and we made more than a dozen technical improvements to our systems. Here’s a sample of what we’ve done so far:
- We built better detection mechanisms for nonsensical queries that shouldn’t show an AI Overview, and limited the inclusion of satire and humor content.
- We updated our systems to limit the use of user-generated content in responses that could offer misleading advice.
- We added triggering restrictions for queries where AI Overviews were not proving to be as helpful.
- For topics like news and health, we already have strong guardrails in place. For example, we aim not to show AI Overviews for hard news topics, where freshness and factuality are important. In the case of health, we launched additional triggering refinements to enhance our quality protections.
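To make the kinds of rules in the list above more concrete, here is a minimal, purely illustrative sketch of a triggering gate. It is not how Search is implemented: the signal names, topic labels and thresholds (`QuerySignals`, `nonsense_score`, `satire_share`, `ugc_share`, the 0.8 and 0.5 cutoffs) are all hypothetical, and real systems would rely on learned models rather than a handful of fixed rules. The sketch also simplifies by treating source mix as a reason to withhold the overview entirely, rather than limiting which sources are included.

```python
from dataclasses import dataclass

# Hypothetical topic labels; none of these names come from Google's systems.
HARD_NEWS = "hard_news"
HEALTH = "health"


@dataclass
class QuerySignals:
    """Illustrative per-query signals. Every field here is an assumption."""
    nonsense_score: float  # 0.0 (sensible) to 1.0 (nonsensical)
    topic: str             # e.g. "hard_news", "health", "general"
    satire_share: float    # fraction of top sources that are satire or humor
    ugc_share: float       # fraction of top sources that are user-generated
    advice_seeking: bool   # whether the query appears to ask for advice


def should_show_overview(s: QuerySignals) -> bool:
    """Return True if this toy gate would allow an overview for the query."""
    # Nonsensical queries should not trigger an overview.
    if s.nonsense_score >= 0.8:
        return False
    # Hard news: freshness and factuality matter, so do not trigger.
    if s.topic == HARD_NEWS:
        return False
    # Mostly satire or humor among the sources: do not trigger.
    if s.satire_share > 0.5:
        return False
    # Advice queries backed mainly by user-generated content: do not trigger.
    if s.advice_seeking and s.ugc_share > 0.5:
        return False
    # Health gets stricter (hypothetical) thresholds than other topics.
    if s.topic == HEALTH and (s.satire_share > 0.0 or s.ugc_share > 0.2):
        return False
    return True
```

Under these assumed thresholds, a query like “How many rocks should I eat?” with a high `nonsense_score` and satire-only sources would be blocked by the first rule, while an ordinary query with mainstream sources would pass.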
In addition to these improvements, we’ve been vigilant in monitoring feedback and external reports, and taking action on the small number of AI Overviews that violate content policies. This means overviews that contain information that’s potentially harmful, obscene, or otherwise violative. We found a content policy violation on fewer than one in every 7 million unique queries on which AI Overviews appeared.
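For a rough sense of scale, the one-in-7-million figure can be turned into a rate with a few lines of arithmetic. The query volume used in the example below (70 million) is a made-up round number chosen only to make the math easy; it is not a reported figure.

```python
from fractions import Fraction

# Upper bound stated in the text: fewer than 1 violation per 7 million unique queries.
VIOLATION_RATE_BOUND = Fraction(1, 7_000_000)

# The same bound expressed as a percentage of unique queries.
rate_percent = float(VIOLATION_RATE_BOUND) * 100


def max_expected_violations(unique_queries: int) -> float:
    """Upper bound on violations for a given (hypothetical) number of unique queries."""
    return float(unique_queries * VIOLATION_RATE_BOUND)


# Hypothetical volume, for illustration only.
print(f"{rate_percent:.7f}% of unique queries")
print(max_expected_violations(70_000_000))
```

In other words, the bound works out to roughly 0.000014% of the unique queries on which AI Overviews appeared.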
At the scale of the web, with billions of queries coming in every day, there are bound to be some oddities and errors. We’ve learned a lot over the past 25 years about how to build and maintain a high-quality search experience, including how to learn from these errors to make Search better for everyone. We’ll keep improving when and how we show AI Overviews and strengthening our protections, including for edge cases, and we’re very grateful for the ongoing feedback.