Can Search Engines Detect AI-Generated Content?

August 7, 2023

Unmasking the Mystery: Can Search Engines Detect AI-Generated Content?

The past year has witnessed incredible growth of AI tools, which has had a huge impact on digital marketers, particularly those working in the realm of SEO.

With content creation being both time-consuming and costly, marketers have increasingly turned to AI for assistance, although we can safely say the results have been mixed.

Amidst these developments, a burning question has emerged: "Can search engines detect AI-generated content?" This question is extremely important, since its answer could potentially invalidate other queries about whether, and how, AI should be employed in content creation.

A brief history of machine-generated content

While the surge in machine-generated content creation is unprecedented, it's not entirely new, nor is it always detrimental.

News websites, for instance, have long employed data from various sources, such as stock markets and seismometers, to expedite content creation.

For example, it's perfectly acceptable to publish a news article generated by a robot reporting a recent earthquake. Such updates are essential for delivering information promptly to readers.

Conversely, we've also witnessed numerous "black hat" implementations of machine-generated content.

Google has consistently condemned the use of techniques like Markov chains for text generation and low-effort content spinning, categorising them as "automatically generated pages that provide no added value."

The enigma of "No Added Value"

The concept of "no added value" has raised many eyebrows and blurred lines in the realm of content creation. It has become increasingly vital to understand how Large Language Models (LLMs) like GPTx and ChatGPT operate, and what sets them apart:

1. Text is generated based on probability distribution

LLMs generate text based on a probability distribution. When given a prompt, they predict the most likely word to come next based on their training data, akin to advanced predictive text on smartphones.
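To make the idea of a probability distribution concrete, here is a minimal sketch in Python. It is not how an LLM is actually built: real models use neural networks trained on enormous datasets, whereas this toy simply counts which word follows which in a tiny, made-up corpus. The corpus and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy "training data" (hypothetical); real LLMs learn from vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word is followed by each other word.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def next_word_distribution(word):
    """Return {candidate: probability} for the word that follows `word`."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

def most_likely_next(word):
    """Return the single most probable next word, or None if `word` is unseen."""
    dist = next_word_distribution(word)
    return max(dist, key=dist.get) if dist else None

dist = next_word_distribution("the")
print(dist)                     # probabilities for each word seen after "the"
print(most_likely_next("the"))  # the highest-probability candidate
```

In this toy corpus "the" is followed by "cat" twice, "mat" once and "fish" once, so "cat" gets a probability of 0.5 and is the predicted next word. An LLM does something conceptually similar, but over a far richer context than a single preceding word.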

2. Generative AI uncertainty

LLMs are a form of generative artificial intelligence, meaning their output is not fully predictable. There's an element of randomness, and they may produce different responses to the same prompt.
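The randomness can be illustrated with a short Python sketch. The probabilities below are invented for illustration, and the two helper functions are hypothetical names rather than part of any real LLM API. The point is only to contrast always picking the top candidate with sampling from the distribution.

```python
import random

# Hypothetical next-word probabilities for some prompt (illustrative values).
next_word_probs = {"cat": 0.5, "mat": 0.25, "fish": 0.25}

def greedy_pick(probs):
    """Always return the most probable word: same input, same output."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random()  # unseeded, so results differ from run to run
samples = [sample_pick(next_word_probs, rng) for _ in range(10)]

print(greedy_pick(next_word_probs))  # the top candidate every time
print(samples)                       # a mix of candidates that varies per run
```

Because the sampled pick is weighted rather than fixed, running the same "prompt" repeatedly can yield different words, which is the behaviour described above.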

This understanding reveals a fundamental limitation: LLMs lack traditional knowledge and don't ‘know’ things in the same way that humans do.

This limitation leads to errors (also known as ‘hallucinations’), where AI-generated text can yield incorrect results or contradictory responses.

3. The challenge of consistency and accuracy

These ‘hallucinations’ cast significant doubt on whether AI-generated text can consistently ‘add value’, particularly on topics related to Your Money, Your Life (YMYL).

Such topics can have substantial real-world implications, and AI-generated content that's factually incorrect can be extremely detrimental, especially when it comes to people’s finances.

Major publications like Men's Health and CNET have been caught publishing factually incorrect AI-generated content, underscoring the issue. Google, too, has struggled to rein in its Search Generative Experience (SGE) content on YMYL topics, despite promising to exercise caution.

4. Google's stance and the emergence of MUM

Google seems to believe that there's a place for machine-generated content to answer user queries.

This belief is consistent with their Multitask Unified Model (MUM), which was introduced to address the fact that people issue an average of eight queries for complex tasks. MUM aims to generate complete answers based on an initial query and anticipated follow-up questions, relying on Google's vast knowledge index.

However, while this approach might be ideal for the user, it could potentially wipe out "long-tail" or low-volume keyword strategies that SEOs often rely on for SERP visibility.

If Google can identify queries suitable for AI-generated responses, many questions may already be "solved." This poses a dilemma for Google: show users a pre-generated answer or send them to a page that already exists?

5. Detecting AI content: A delicate balance

As the usage of tools like ChatGPT surged, several ‘AI content detectors’ emerged, claiming to assess the AI-generated nature of text content. These detectors provide a percentage score indicating the certainty that the text is AI-generated.

However, a misunderstanding arises from the way these detectors label percentages.

For example, ‘75% AI / 25% Human’ does not mean ‘75% of the text was written by AI and 25% by a human.’ Instead, it signifies ‘I am 75% certain that AI wrote 100% of this text.’ This misconception has led some to offer advice on tweaking text inputs to ‘fool’ AI detectors, clearly further complicating matters.
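As a small illustration of the correct reading, the Python sketch below turns a detector score into the sentence it actually stands for. The function name and the 0-to-1 score format are assumptions made for this example; real detectors each have their own output formats.

```python
def describe_detector_score(ai_score):
    """Explain a detector score, where ai_score is the detector's
    confidence (0.0 to 1.0) that the entire text was written by AI."""
    if not 0.0 <= ai_score <= 1.0:
        raise ValueError("ai_score must be between 0 and 1")
    ai_pct = round(ai_score * 100)
    # The score is a confidence level about the whole text,
    # not the share of the text that AI wrote.
    return f"The detector is {ai_pct}% certain that AI wrote 100% of this text."

print(describe_detector_score(0.75))
```

A score of 0.75 therefore describes how sure the detector is about the whole piece, and says nothing about which sentences or what proportion of the words came from a machine.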

6. Google's policies and actions on AI content

Google's statements about AI content have been incredibly vague, giving them the flexibility they need when it comes to enforcement. However, updated guidance from Google Search Central explicitly emphasises a focus on content quality rather than the method of content production. Google lists examples of how AI can generate useful content, such as sports scores and weather forecasts.

The ultimate goal for Google is to combat SERP manipulation. They've made major strides in this regard over the years, claiming that advances in their systems have rendered 99% of searches "spam-free." Google's ability to detect and remove AI-generated content indicates their commitment to quality.

7. Real-life experiments and the elusive line of quality

Numerous experiments have been conducted to gauge how Google responds to AI-generated content and where they draw the line in terms of quality.

An experiment involving a website with 10,000 pages of content generated primarily by an unsupervised GPT-3 model showed that Google was not classifying such content as ‘quality.’ Moreover, Google could detect and suppress these results, indicating that AI content doesn't always meet their quality standards.

8. Finding the right question

Based on Google's guidelines, insights from search systems, SEO experiments and common sense, the question ‘Can search engines detect AI content?’ may not be the right question to ask. It's a short-term perspective at best.

While AI is making strides in generating answers for content-scarce queries, Google's long-term goals with SGE may shift the focus back to longer-form expert content.

Google's knowledge systems could become the primary source for addressing long-tail queries, potentially diminishing the need to direct users to various small websites.

Get in touch with EWM to discuss your exact digital marketing needs.