The Hy3 Mystery: Why a Mediocre Chinese LLM is Suddenly Dominating OpenRouter

An Unexpected Leader in the LLM Charts
In the volatile world of Large Language Model (LLM) adoption, the most capable model rarely holds the crown for usage. While developers and power users often gravitate toward the ‘intelligence’ of Claude or GPT-4, the actual traffic patterns on aggregator platforms tell a different story—one driven by cost, latency, and specific architectural efficiencies.
OpenRouter, which serves as a unified API gateway for dozens of models, has recently released data that reveals an anomaly. A model dubbed Hy3 preview has climbed the rankings, surpassing industry favorites in total token usage by a significant margin. For those not deeply embedded in the Chinese open-source ecosystem, Hy3 is an obscure entity. It was released by the tech conglomerate Tencent, but its presence on Hugging Face is sparse, and its own benchmark results are modest, often trailing other competitive Chinese open-weights models.
The rise of Hy3 is particularly jarring because it doesn’t fit the typical trajectory of a ‘breakout’ model. There was no viral Twitter thread, no sudden leap in the LMSYS Chatbot Arena, and very little discussion on Hacker News or Reddit. Yet, the numbers are undeniable: Hy3 is being hammered by users.
The Economics of the ‘Loss Leader’
To understand Hy3’s ascent, one has to look at the pricing. Currently, Hy3 preview is listed at $0.066 per million input tokens. To put that in perspective, the highly popular DeepSeek V4 Flash, already known for its aggressive pricing, sits at $0.10 per million tokens. In an era where autonomous coding agents can burn through millions of tokens in a single session, a price that is 34% lower is a powerful incentive.
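The 34% figure can be checked with a few lines of arithmetic. The sketch below uses the two list prices quoted above; the 50-million-token session size is a hypothetical workload chosen only for illustration, and the helper names are not from any real API.

```python
# List prices quoted in the article, in USD per million tokens.
HY3_PRICE = 0.066
DEEPSEEK_V4_FLASH_PRICE = 0.10


def relative_saving(cheaper: float, pricier: float) -> float:
    """Fraction by which `cheaper` undercuts `pricier`."""
    return (pricier - cheaper) / pricier


def input_cost(tokens: int, price_per_million: float) -> float:
    """USD cost of sending `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


saving = relative_saving(HY3_PRICE, DEEPSEEK_V4_FLASH_PRICE)

# Hypothetical agent session (assumption, not a figure from the article).
SESSION_TOKENS = 50_000_000
hy3_cost = input_cost(SESSION_TOKENS, HY3_PRICE)
deepseek_cost = input_cost(SESSION_TOKENS, DEEPSEEK_V4_FLASH_PRICE)

print(f"Relative saving: {saving:.0%}")
print(f"Hy3: ${hy3_cost:.2f}  DeepSeek V4 Flash: ${deepseek_cost:.2f}")
```

The absolute amounts are small per session; the gap only matters at the sustained volumes that looping agents generate.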
The data suggests a calculated onboarding strategy. Around May 6, OpenRouter offered Hy3 via a free endpoint. While free tiers usually attract a transient crowd, the usage did not collapse when the model transitioned to a paid SKU on May 8. Instead, the volume remained steady. This suggests that a core group of users—likely those running high-volume, low-complexity agents—found a specific utility in Hy3 that justified the shift to a paid tier.
The Infrastructure Play: SiliconFlow and Prompt Caching
The mystery deepens when one examines who is actually serving the model. While DeepSeek V4 Flash is available through 13 different providers on OpenRouter, Hy3 preview is tied almost exclusively to a single provider: the Singapore-based SiliconFlow. Before the introduction of Hy3, SiliconFlow’s footprint on the platform was relatively minor. The sudden spike in Hy3 usage is, by extension, a spike in SiliconFlow’s traffic.
But the raw cost per token isn’t the only factor. The real driver in modern LLM deployments is prompt caching. Because LLM calls are stateless, every new turn in a conversation resends the entire history, and the model must process all of it again. Prompt caching lets a provider reuse the computation it has already done for a repeated prompt prefix, drastically reducing both latency and cost.
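To make the statelessness point concrete, here is a minimal sketch that counts how many input tokens each turn of a conversation sends, split into tokens that are new and tokens that merely repeat earlier history. It assumes an idealized cache in which every repeated token is a hit, and the turn sizes are made up for illustration.

```python
def tokens_per_turn(turn_sizes):
    """For each turn, return (fresh, repeated) input token counts.

    `fresh` is the text added this turn; `repeated` is the earlier history
    that a stateless API forces the client to send again.
    """
    history = 0
    breakdown = []
    for size in turn_sizes:
        breakdown.append((size, history))
        history += size
    return breakdown


# Hypothetical conversation: five turns, each adding 2,000 tokens.
turns = [2_000] * 5
breakdown = tokens_per_turn(turns)

# Without caching, every token sent is processed at full price.
total_without_cache = sum(fresh + repeated for fresh, repeated in breakdown)

# With an ideal cache, only the fresh tokens need full processing.
total_fresh = sum(fresh for fresh, _ in breakdown)
total_repeated = sum(repeated for _, repeated in breakdown)

print(total_without_cache, total_fresh, total_repeated)
```

In this example the client sends 30,000 input tokens in total, of which only 10,000 are new. The repeated share grows with every turn, which is why long-running agents are so sensitive to how cached tokens are billed.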
Most top-tier providers, including OpenAI, Google, and Anthropic, offer cache reads at roughly 10% of the standard input cost. The implementation varies, however. While Anthropic charges a separate premium for ‘cache writes’, other providers automate the process. If SiliconFlow has implemented a more aggressive or efficient caching mechanism for Hy3, the effective cost for a user running a long-context agent could fall well below that of the competition.
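The effect of a cache discount on the blended price can be sketched as follows. This assumes the roughly 10% cache-read ratio mentioned above and ignores any cache-write surcharge; SiliconFlow's actual cache pricing for Hy3 is not stated in the data, so the 90% hit rate and the function itself are purely illustrative.

```python
def effective_input_price(base_price: float,
                          cache_hit_rate: float,
                          cache_read_ratio: float = 0.10) -> float:
    """Blended USD price per million input tokens.

    base_price       -- list price per million uncached input tokens
    cache_hit_rate   -- fraction of input tokens served from cache (0 to 1)
    cache_read_ratio -- cached-token price as a fraction of base_price
                        (0.10 is the rough figure cited in the text)
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_part = cache_hit_rate * base_price * cache_read_ratio
    uncached_part = (1.0 - cache_hit_rate) * base_price
    return cached_part + uncached_part


# Hy3 preview list price from the article; the hit rate is an assumption.
blended = effective_input_price(0.066, cache_hit_rate=0.90)
no_cache = effective_input_price(0.066, cache_hit_rate=0.0)

print(f"no cache: ${no_cache:.4f}/M  with 90% hits: ${blended:.4f}/M")
```

Under these assumptions the blended price drops to about a fifth of the list price, so differences in cache behavior between providers can outweigh differences in headline pricing.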
Context Over Intelligence
This phenomenon highlights a growing divide in the AI market: the gap between ‘frontier intelligence’ and ‘operational efficiency.’ If a user is running a repetitive task—such as log analysis, basic code refactoring, or data extraction—they don’t need the reasoning capabilities of a Claude 3.5 Sonnet. They need a model that is ‘good enough’ and incredibly cheap to keep running in a loop.
Hy3 preview may not be the most intelligent model on the board, but its combination of Tencent’s backing, SiliconFlow’s infrastructure, and a pricing model that rewards high-volume input makes it a pragmatic choice for the ‘agentic’ web. The rankings suggest that in the race for AI dominance, the lowest friction, and not always the highest IQ, wins the most traffic.