Nobody was watching this model until it crushed OpenRouter

I check OpenRouter rankings most weeks. The usual suspects hold the top slots: Claude, GPT, and DeepSeek. But three weeks ago something odd appeared: a model called Hy3 preview had climbed to #1 by token volume, and nobody I know had heard of it. So I dug in.

What Is Hy3

Hy3 (Hunyuan 3.0) is Tencent's latest open-weight model. It has 295 billion total parameters, of which 21 billion are active per token thanks to a Mixture-of-Experts design. That's the same architectural trick DeepSeek popularized: keep the knowledge of a massive model while only paying for a fraction of it at inference time. The model has a 256K context window, supports reasoning modes, and, according to Tencent, was built on completely rebuilt training infrastructure in under 90 days.
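To put the "fraction" in numbers, here is a minimal sketch using only the two parameter counts quoted above. The function name is illustrative, not anything from Tencent's code.

```python
# Parameter counts quoted in the article for Hy3, in billions.
TOTAL_PARAMS_B = 295
ACTIVE_PARAMS_B = 21


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the weights that take part in one token's forward pass."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%}")  # about 7.1%
```

Roughly 7% of the weights do the compute for any given token. The full 295B still has to sit in memory to serve the model, so the savings are in per-token compute, not in hardware footprint.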

The weights are released under Tencent's community license and available on Hugging Face. On OpenRouter, the model is served exclusively by the Singapore-based provider SiliconFlow. That exclusivity is part of the story, and it tells us a lot about how open-weight model distribution works in 2026.

Benchmarks and the Numbers That Matter

The SWE-bench Verified score is 74.4%, up from Hy2's 53.0%: a 21.4-point gain, or roughly 40% in relative terms, in a single generation. That puts it just behind GLM-5 (77.8%) and Kimi K2.5 (76.8%) and within striking distance of Claude Opus 4.6 at 80.8%. Not bad for a model nobody was talking about.
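Percentage points and relative percent are easy to mix up, so here is the arithmetic behind the generation-over-generation jump as a small sketch. The helper name is hypothetical; the two scores are the ones quoted above.

```python
def score_jump(old: float, new: float) -> tuple[float, float]:
    """Return (gain in percentage points, gain relative to the old score)."""
    points = new - old
    relative = points / old
    return points, relative


# SWE-bench Verified: Hy2 at 53.0%, Hy3 at 74.4%.
points, relative = score_jump(53.0, 74.4)
print(f"{points:.1f} points, {relative:.1%} relative")  # 21.4 points, 40.4% relative
```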

On BrowseComp, a benchmark for web-based agentic search, Hy3 scored 67.1%, beating both Gemini 2.0 Ultra (65.4%) and DeepSeek V4 (62.8%). On Tsinghua University's 2026 spring math PhD qualifying exam, it averaged 88.4, ranking first among Chinese models.

The model also scores strongly on FrontierScience-Olympiad and IMOAnswerBench, which suggests the reasoning improvements generalize beyond coding rather than reflecting overfitting to coding benchmarks.

The Pricing Puzzle

Here is where it gets interesting. Max Woolf from minimaxir.com did the math on why Hy3 is topping the OpenRouter charts. The headline cost is $0.063/M input tokens and $0.21/M output tokens. That is competitive, but not the cheapest. In 2026, though, roughly 98% of LLM API costs are input tokens, driven by long context in agentic workflows. So the real cost depends on prompt caching efficiency.
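To see why input tokens dominate the bill even though output tokens cost more per token, here is a rough sketch using Hy3's list prices. The 50K-in, 500-out request shape is a made-up example of a long-context agent step, not a measured workload, and the calculation ignores caching.

```python
INPUT_PRICE = 0.063   # USD per million input tokens (list price from the article)
OUTPUT_PRICE = 0.21   # USD per million output tokens (list price from the article)


def input_cost_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of one request's cost that comes from input tokens."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE
    return input_cost / (input_cost + output_cost)


# Hypothetical agent step: 50K tokens of context in, 500 tokens out.
share = input_cost_share(50_000, 500)
print(f"Input share of cost: {share:.1%}")  # about 96.8%
```

Even with output priced at more than three times the input rate, a 100:1 input-to-output ratio puts nearly all of the spend on the input side.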

Hy3 (via SiliconFlow) has a 44% cache read cost, meaning users pay nearly half the input price even on cache hits. Compare that to DeepSeek V4 Flash served directly by DeepSeek, which has just a 2% cache read cost thanks to its proprietary KV caching approach. The difference in effective pricing is stark: DeepSeek V4 Flash works out to about $0.018/M input tokens, while Hy3 works out to about $0.034/M despite Hy3 having the lower stated price.
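The effective price is a blend of the cached and uncached rates, weighted by how often the cache is hit. The sketch below shows that blend for Hy3's numbers. The hit rates are assumptions chosen for illustration; the article does not say which hit rate produced the $0.034 figure, and DeepSeek's list price is not given, so only Hy3 is computed.

```python
def effective_input_price(list_price: float, cache_read_ratio: float, hit_rate: float) -> float:
    """Blended USD per million input tokens.

    list_price: uncached price per million input tokens
    cache_read_ratio: cached price as a fraction of list price (0.44 means 44%)
    hit_rate: fraction of input tokens served from cache (assumed, 0.0 to 1.0)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_price = list_price * cache_read_ratio
    return hit_rate * cached_price + (1.0 - hit_rate) * list_price


HY3_LIST_PRICE = 0.063   # from the article
HY3_CACHE_RATIO = 0.44   # from the article

for hit_rate in (0.0, 0.5, 0.8, 1.0):  # illustrative hit rates
    price = effective_input_price(HY3_LIST_PRICE, HY3_CACHE_RATIO, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.4f}/M")
```

At an assumed 80% hit rate this gives about $0.035/M, close to the figure quoted above. Even at a 100% hit rate, Hy3 cannot drop below 44% of its list price ($0.0277/M), whereas a 2% cache read cost lets the floor fall to 2% of list.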

This is the kind of detail that gets buried in stated price comparisons. If you're running an agent workflow with 50K+ token contexts on every loop, the cache economics matter more than the list price.

The Mystery of Usage

The minimaxir analysis shows Hy3's popularity isn't driven by a few large apps. The top 5 apps account for less than 1% of its token consumption, and usage stayed steady even after the model moved from a free SKU to paid. So no listed app is propping it up. The author's best guess is that the volume comes from a single large data-processing pipeline, outside the listed apps, using Hy3 as a backbone.

On Reddit, reactions are mixed but telling. Users in r/openrouter describe it as "actually good and creative" and treat it as a GLM-5 replacement. In r/ClaudeCode, someone noted they cut per-agent run costs from $100 to $20 by switching. The r/LocalLLaMA thread at release showed a mix of excitement about the specs and caution about the restrictive license terms.

Shunyu Yao, the former OpenAI researcher who pioneered Tree of Thoughts and ReAct, now Tencent's Chief AI Scientist, is leading Hy3's development. The model shows his fingerprints. The agentic vision is coherent: WildClaw (Tencent's internal benchmark) evaluates multi-step tasks like travel planning and software debugging, reflecting the ReAct philosophy.

So What

Hy3's OpenRouter dominance is not really about Hy3 being a magic model. On most benchmarks it trails Claude Opus 4.7 and GPT-5.5 by 5-7 points. What it shows is the economics of open-weight models in 2026: a 295B MoE model from a Chinese tech giant, served by a Singapore provider, is beating every frontier model on raw usage. That is not because it's better, but because it's cheap enough and good enough for the use case that matters most: high-volume, non-agentic data processing.

The real winner here might be DeepSeek V4 Flash, which has better cache economics and will likely pull ahead as users become more price-savvy. But for now, the lesson is clear: stated prices alone tell you little in 2026. The effective cost is what matters.

Sources

minimaxir.com/2026/05/openrouter-hy3
huggingface.co/tencent/Hy3-preview
thelec.net/news/articleView.html?idxno=6830
businessanalytics.substack.com/p/tencent-open-sources-hy3-at-744
reddit.com/r/LocalLLaMA/comments/1stk2mz
reddit.com/r/openrouter/comments/1sthwb9
reddit.com/r/ClaudeCode/comments/1thntzy
alexlavaee.me/blog/deepseek-v4-architecture-benchmarks-engineer-verdict