Your AI coding agent is burning money on every cache miss

deepseek-reasonix-coding-agent_01

Most coding agents treat caching like an afterthought. Something the API provider handles, so why bother? The thing is, DeepSeek's prefix cache is the cheapest inference you will ever get. $0.028 per million tokens on a hit versus $0.28 on a miss. That is a 10x difference. And most agents leave that money on the table because they were not designed to keep the byte prefix stable across sessions.

Reasonix is different. It is an open-source terminal agent built specifically around DeepSeek's prefix-cache mechanics. Cache stability is not a config flag you flip. It is the invariant the entire loop is built around. Once you see how the numbers work out, you will start noticing how much your current agent is wasting.

Architecture

Reasonix partitions its context into three zones. The immutable prefix holds system prompts, tool specs, and few-shot examples, hashed and pinned so they never shift. The append-only log tracks the conversation. Every turn adds to the end; nothing rewrites what came before. The volatile scratch holds transient state, things like R1 reasoning traces, plan variables, and in-progress thoughts, and it resets every turn so it never bloats the prefix.

This matters because DeepSeek's cache matches on full byte-prefix alignment. If your system prompt shifts by even a single byte between calls, you lose the cache. Most multi-provider agents like Claude Code and Aider cannot guarantee this, because they swap models and instructions based on context. Reasonix can because it only talks to DeepSeek. That limitation is actually the point.

DeepSeek's own API docs describe the caching mechanism: the system stores overlapping request prefixes on disk, and a cache hit occurs only when a new request fully matches an existing cache prefix unit. The system automatically detects common prefixes across requests and persists them as independent units. This is a best-effort service, no code changes required, and you get transparency through the usage object in every API response showing prompt_cache_hit_tokens versus prompt_cache_miss_tokens.

Real-World Numbers

The repo documents a 99.82% cache hit rate on a real single-day workload. The concrete math: one user processed 435 million input tokens in a single day. Without caching, that would cost roughly $61. With Reasonix's cache stability, the same workload cost $12.

Per-turn costs average under $0.05 with the default v4-flash model. The agent auto-escalates to v4-pro only when it detects three or more struggling events (failed tool calls, repair triggers) in a single turn. You can also manually arm the next turn with /pro for targeted hard problems. There is a real-time cost badge in the TUI that shows green under $0.05, yellow between $0.05 and $0.20, and red above $0.20.

The Tool-Call Repair Pipeline

DeepSeek models have a specific failure mode. They sometimes drop tool arguments on complex schemas. They bury valid tool calls inside reasoning_content blocks. They emit unbalanced JSON when they hit token limits. Reasonix has a four-pass pipeline that handles each one.

First, the Flatten pass converts nested schemas to dot-notation so DeepSeek does not drop arguments. Second, the Scavenge pass recovers tool calls hidden inside reasoning traces. Third, the Truncation Repair pass fixes unbalanced JSON braces from truncated responses. Fourth, the Storm Detection pass suppresses infinite loops of identical tool calls.

This matters because generic agents that work well with GPT or Claude will silently fail with DeepSeek, producing broken edits or hanging loops. Reasonix was tuned on actual DeepSeek failure modes, not synthetic test cases.

Community Reaction

The r/DeepSeek community has already noticed the difference. One user said "Claude Code is not the most efficient tool when it comes to the DeepSeek API's prefix cache. I would recommend using DeepSeek TUI or Reasonix." Another is running 390 million tokens for 64 cents. At that rate, caching makes DeepSeek cheaper than running local models on consumer hardware.

The project hit top-3 on Oosmetrics for LLM velocity within days of launch and has an active Discord community. GitHub Projects picked it up on social media within hours.

Comparison

Reasonix versus the field:

Reasonix: DeepSeek only, engineered cache strategy, 4-pass tool repair, MIT license, embedded dashboard, ~$0.05/turn.
Claude Code: Anthropic only, incidental cache, no tool repair, closed source, no dashboard, $0.15 to $0.60+/turn.
Aider: Any backend, incidental cache, no tool repair, Apache 2.0, no dashboard, varies per turn.

The tradeoff is obvious. You get one provider, but everything about it works perfectly for that provider. Reasonix does not need to abstract over caching behaviors that conflict. It picks one, DeepSeek, and optimizes the hell out of it.

So What

I do not think every coding agent should be single-provider. But I do think the multi-provider abstraction has a real cost, and Reasonix is the clearest example yet of what you gain by giving it up. A 99.82% cache hit rate on a 435M-token day moves this from nice-to-have to "how much is your API bill costing you on cache misses?"

The deeper point is that "works with any model" is not a virtue when models have completely different inference mechanics. Claude Code on DeepSeek is fine. But Reasonix on DeepSeek is a new category of cheap. Pick your poison.

Sources:

GitHub: https://github.com/esengine/DeepSeek-Reasonix
Project page: https://esengine.github.io/DeepSeek-Reasonix
Agent Harness Field Guide: https://agents.buttonscli.com/field-guide/reasonix
DeepSeek KV Cache docs: https://api-docs.deepseek.com/guides/kv_cache
HN discussion: https://news.ycombinator.com/item?id=48256953
Reddit r/DeepSeek: 390M tokens for 64 cents: https://www.reddit.com/r/DeepSeek/comments/1td3r1n/390m_tokens_for_64_cents

Architecture

Real-World Numbers

The Tool-Call Repair Pipeline

Community Reaction

Comparison

So What

RELATED_ENTRIES

That 27B model was too big for a phone. Not anymore.

$4.40 per million tokens just matched the $200 tier

AI coding costs hit $2,000 per engineer and budgets blew up