
Technology

Cloudflare Sets 2026 Deadline for AI Crawlers to Separate Search from Training

Saran K | July 2, 2026 | 4 min read

    A New Line in the Sand for Web Scraping

    Cloudflare is escalating its effort to reshape the economic relationship between AI companies and the publishers whose content powers their models. The company has announced a firm deadline: by September 15, 2026, Cloudflare will change its default settings to block “mixed-use” crawlers from any pages that host advertisements.

    The move targets a specific, contentious practice in the AI industry—the use of a single bot to handle both traditional search indexing and the ingestion of data for Large Language Model (LLM) training. For years, the web has operated on a “quid pro quo” basis: publishers allow search engines to crawl their sites in exchange for traffic. However, AI companies are increasingly using that same access to train models that can answer user queries directly, potentially removing the need for the user to ever click through to the source website.

    Under the new policy, any bot that blends search, agentic activity, and training will be blocked by default for new Cloudflare customers, new sites, and all existing free-tier users. This forces AI providers to either create dedicated, transparent bots for different purposes or risk losing access to a massive swath of the internet’s ad-supported content.
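To make the scope of the new default concrete, here is a minimal Python sketch of the rule as described. A crawler is blocked by default only when three conditions hold: it is mixed-use, the page carries ads, and the site falls into one of the covered groups. Everything in it is assumed for illustration. The purpose labels, the `Request` and `Site` types, and the helper names are hypothetical. Treating "serves more than one purpose" as mixed-use is a simplifying reading of the policy, not Cloudflare's published logic.

```python
from dataclasses import dataclass

# Hypothetical purpose labels; Cloudflare's real classification may differ.
SEARCH, AGENT, TRAINING = "search", "agent", "training"


@dataclass(frozen=True)
class Request:
    bot_purposes: frozenset  # purposes the crawler serves (assumed known)
    page_has_ads: bool       # whether the requested page shows advertisements


@dataclass(frozen=True)
class Site:
    is_new_customer: bool
    is_new_site: bool
    is_free_tier: bool


def is_mixed_use(purposes: frozenset) -> bool:
    """Assumption: a bot is mixed-use if it serves more than one purpose."""
    return len(purposes & {SEARCH, AGENT, TRAINING}) > 1


def new_default_applies(site: Site) -> bool:
    """The new default covers new customers, new sites, and free-tier users."""
    return site.is_new_customer or site.is_new_site or site.is_free_tier


def blocked_by_default(req: Request, site: Site) -> bool:
    """Block only mixed-use bots, only on ad-supported pages, only where the default applies."""
    return new_default_applies(site) and req.page_has_ads and is_mixed_use(req.bot_purposes)


free_site = Site(is_new_customer=False, is_new_site=False, is_free_tier=True)
print(blocked_by_default(Request(frozenset({SEARCH, TRAINING}), True), free_site))  # True
print(blocked_by_default(Request(frozenset({SEARCH}), True), free_site))            # False
```

A search-only bot on the same free-tier page passes. This is the separation the policy is meant to push AI providers toward.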

    The ‘Google Problem’ and the Search Paradox

    Cloudflare is not mincing words about the current market imbalance. In its announcement, the company pointed specifically to the world’s largest search engine, a clear reference to Google, claiming it has access to roughly twice as much information as its competitors. At the core of the issue is the “search paradox”: Google makes it difficult for publishers to remain discoverable in search results without also consenting to have their content used for AI training.

    Google has introduced “Google-Extended,” a robots.txt token that lets site owners opt out of AI training for Gemini and Vertex AI without affecting their Search rankings, but Cloudflare suggests this isn’t enough. The friction remains in how Googlebot handles AI Overviews and other agentic features that sit directly atop the search experience.
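Python's standard `urllib.robotparser` can show the split that Google-Extended enables. The sketch assumes a hypothetical robots.txt that disallows the Google-Extended token while allowing every other agent. The URL is a placeholder. One caveat applies: Google-Extended is a product token rather than a separate crawler. Google still fetches pages with Googlebot and consults the token to decide whether the content may be used for Gemini and Vertex AI. The parser here simply treats the token as an agent name to show how the two rule groups resolve.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Gemini/Vertex AI use via the
# Google-Extended token while leaving ordinary crawling open to all agents.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def check(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return whether `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


url = "https://example.com/news/article"  # placeholder URL
for agent in ("Googlebot", "Google-Extended"):
    print(agent, check(agent, url))
```

Run as-is, this prints `Googlebot True` and `Google-Extended False`. The site stays open to crawling for Search while signalling an opt-out of Gemini and Vertex AI use. The token does not stop Googlebot from fetching the page. It only governs how Google may use what it fetched.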

    Matthew Prince, co-founder and CEO of Cloudflare, framed the move as a necessity for survival in an era where human traffic is no longer the dominant force online. “Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge,” Prince stated. This shift toward bot-dominated traffic arrived sooner than analysts expected, accelerating the need for a new commercial framework.

    From ‘Pay Per Crawl’ to ‘Pay Per Use’

    Beyond blocking bots, Cloudflare is attempting to build a new monetization layer for the open web. The company is evolving its “Pay Per Crawl” marketplace into a more sophisticated “Pay Per Use” model. Rather than charging a flat fee for a bot to fetch a page, the new system allows publishers to monetize the actual value generated by the AI.

    In practice, a publisher could be paid when their content is used to generate a specific answer in an AI search result, rather than merely for being indexed. To pilot this, Cloudflare is partnering with Ceramic.ai and You.com. Publishers who opt in are paid when their content appears in Ceramic’s AI results or when You.com accesses their premium data.
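The two billing models differ mainly in which event is metered. The minimal Python sketch below compares them on one hypothetical event log. The prices, the event labels, and the `payouts` helper are all invented for illustration, since the article does not describe Cloudflare's actual pricing or reporting.

```python
from collections import Counter

# Hypothetical prices in cents; the article does not disclose any rates.
CENTS_PER_FETCH = 1        # "Pay Per Crawl": billed whenever a bot fetches a page
CENTS_PER_APPEARANCE = 5   # "Pay Per Use": billed when content appears in an AI answer


def payouts(events: list[str]) -> dict[str, int]:
    """Compare what a publisher would earn from one event log under each model.

    Each event is "fetch" or "appearance" (hypothetical labels).
    """
    counts = Counter(events)  # missing labels count as 0
    return {
        "pay_per_crawl": counts["fetch"] * CENTS_PER_FETCH,
        "pay_per_use": counts["appearance"] * CENTS_PER_APPEARANCE,
    }


# One page fetched four times (three re-fetches of unchanged content)
# and surfaced in two AI answers.
log = ["fetch", "fetch", "fetch", "fetch", "appearance", "appearance"]
print(payouts(log))  # {'pay_per_crawl': 4, 'pay_per_use': 10}
```

Under the per-use model the three repeat fetches earn nothing. Only the two appearances in AI answers are billed.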

    There is also a significant technical incentive for this shift. Cloudflare data indicates that over 50% of AI crawler traffic is wasted on re-fetching pages that haven’t changed. By forcing a move toward transparent, intent-based crawling, Cloudflare aims to reduce unnecessary compute load and bandwidth waste for both the publisher and the AI provider.
