

Technology

The Tokenization Trap: Why Google’s AI Still Can’t Count Letters

Saran K | May 28, 2026 | 4 min read



    The Alphabet Struggle

    For a company that organized the world’s information, Google is currently struggling with the basics of the alphabet. In a series of documented failures within its AI Overviews, the search giant’s generative AI has repeatedly failed at tasks a first-grader could handle: counting letters and spelling simple words.

    The errors are glaring. In some instances, the AI claimed there are two ‘Ps’ in the word “Google.” In others, it insisted there is exactly one ‘r’ in “poop,” and it spelled “journalism” as “j-o-u-r-n-a-d-i-s-m.” It even stumbled on a U.S. president’s name, spelling Trump as “t-r-p-u-m.” These may look like harmless glitches, or “hallucinations” good for little more than a viral screenshot, but they expose a fundamental architectural flaw in how Large Language Models (LLMs) process human language.

    In a statement to TechCrunch, Google acknowledged the lapse, noting that “counting within words has been a known challenge for LLMs,” and stated the company is working to resolve the issue.

    The Architecture of a Blind Spot

    To understand why an AI can write a functional Python script but cannot count the letter ‘r’ in “strawberry,” one has to look at tokenization. Humans read text as a sequence of letters that form words. LLMs do not. Before the transformer processes anything, a tokenizer breaks the text down into “tokens”: chunks of characters that can be whole words, syllables, or fragments.

    When you type a prompt into Google’s AI, the system doesn’t see the letters G-O-O-G-L-E. It sees a numerical representation (an encoding) of the token or tokens that make up “Google.” Because the model operates on these encoded chunks, it has no inherent, character-by-character view of the letters inside them.
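    The gap between letters and token IDs can be sketched in a few lines of Python. This is a toy illustration only: the vocabulary, the ID numbers, and the greedy longest-match rule are assumptions made for the example, not the tokenizer Google or any other vendor actually uses (production systems typically rely on learned subword schemes such as byte-pair encoding).

```python
# Toy illustration only: this vocabulary and its IDs are invented for the
# example and do not come from any real model's tokenizer.
TOY_VOCAB = {"straw": 101, "berry": 102, "Goo": 103, "gle": 104}

# Single-character fallbacks so that any lowercase word can still be split.
for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzG"):
    TOY_VOCAB.setdefault(ch, 1 + i)


def tokenize(text):
    """Split text into vocabulary pieces using greedy longest-match."""
    pieces = []
    pos = 0
    while pos < len(text):
        # Try the longest possible piece first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOY_VOCAB:
                pieces.append(piece)
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[pos]!r}")
    return pieces


def encode(text):
    """Map text to the list of integer IDs the model would receive."""
    return [TOY_VOCAB[piece] for piece in tokenize(text)]


print(tokenize("strawberry"))   # ['straw', 'berry']
print(encode("strawberry"))     # [101, 102]
print(encode("Google"))         # [103, 104]

# Counting letters is trivial on the raw string...
print("strawberry".count("r"))  # 3
# ...but the IDs 101 and 102 carry no record of which letters they contain.
```

    In this sketch, everything downstream of `encode` works with the integers alone, which is why a question about individual letters has nothing direct to draw on.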

    Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, explains that the model possesses an encoding of what a word means, but it doesn’t actually “know” the letters that constitute it. The AI is essentially predicting the next most likely token based on statistical patterns, not analyzing the orthography of the word.
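    A drastically simplified stand-in for that prediction step is a bigram counter, which picks whichever token most often followed the previous one in its training data. The tiny corpus and the function names below are hypothetical, and a real LLM uses a neural network rather than raw counts, but the sketch shows the same basic property: the prediction is made over whole tokens, and spelling never enters into it.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each sentence is already split into tokens.
TOY_CORPUS = [
    ["straw", "berry", "jam"],
    ["straw", "berry", "pie"],
    ["straw", "berry", "jam"],
    ["straw", "hat"],
]


def train_bigrams(token_sequences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for seq in token_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of prev, or None if prev is unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = train_bigrams(TOY_CORPUS)
print(predict_next(counts, "straw"))  # berry
print(predict_next(counts, "berry"))  # jam
```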

    The Persistence of the Tokenizer

    This isn’t a bug that can be patched with a simple software update. It is a byproduct of how these models are scaled for efficiency. Breaking language into tokens allows LLMs to process massive amounts of data quickly. If they processed every character individually, each input would become a much longer sequence, so the computational cost would rise sharply and the same context window would hold far less text.
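    A rough back-of-the-envelope calculation shows the scale of that trade-off. The numbers here are assumptions chosen for illustration: an average of four characters per token (a common rule of thumb for English that varies by tokenizer and language), a 4,000-character input, and a hypothetical window of 8,192 positions. The only structural fact relied on is that standard self-attention compares every position with every other, so its cost grows with the square of the sequence length.

```python
def attention_pairs(seq_len):
    """Position-to-position comparisons in full self-attention (quadratic)."""
    return seq_len * seq_len


def compare(text_chars, chars_per_token, window_positions):
    """Contrast character-level and token-level processing of the same text."""
    token_len = text_chars // chars_per_token
    return {
        "char_level_length": text_chars,
        "token_level_length": token_len,
        # How many times more attention work the character-level model does.
        "attention_cost_ratio": attention_pairs(text_chars) / attention_pairs(token_len),
        # How many characters of text fit into the same number of positions.
        "chars_in_window_char_level": window_positions,
        "chars_in_window_token_level": window_positions * chars_per_token,
    }


# All three inputs are illustrative assumptions, not measured values.
result = compare(text_chars=4000, chars_per_token=4, window_positions=8192)
for name, value in result.items():
    print(f"{name}: {value}")
```

    Under these assumptions, the character-level model does 16 times the attention work on the same text and fits a quarter as much of it into the window.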

    Sheridan Feucht, a PhD student studying LLM interpretability at Northeastern University, suggests that a “perfect tokenizer” may be an impossibility. The inherent fuzziness of how language is chunked means that even with expert intervention, models will likely continue to struggle with character-level precision because they are designed for semantic meaning, not literal spelling.

    These failures follow a pattern of erratic behavior since Google integrated AI Overviews into the core search experience. Users previously reported the AI suggesting that people add non-toxic glue to pizza sauce (citing a satirical Reddit thread) or advising the consumption of rocks for health benefits. While Google has since patched several of these high-profile errors, the spelling lapses persist because they stem from tokenization, which is baked into how these models read text.

    The Trust Gap

    The recurring nature of these errors serves as a critical reminder of the gap between fluency and accuracy. Generative AI is designed to be persuasive and confident, often presenting a wrong answer with the same authority as a correct one.

    As Google continues to pivot its 29-year-old search engine toward a generative-first interface, these “kindergarten errors” highlight a lingering risk. If a model cannot reliably count the letters in a word, its ability to perform complex factual synthesis without human oversight remains a point of contention for researchers and a source of frustration for users.
