Xiaomi beats Claude Code with a better harness, not a better model

MiMo Code V0.1.0 claims higher scores on 200-plus-step multi-session tasks by solving the memory problem Claude Code ignores.

Alessandro Benigni

PUBLISHED JUN 13, 2026


Xiaomi’s MiMo team has shipped an open-source terminal-native coding agent that the company claims outperforms Claude Code on long-horizon, multi-session tasks. The winning difference is not a stronger underlying model. It is a four-layer memory architecture that keeps the agent from forgetting what it was doing.

MiMo Code V0.1.0, released on GitHub under an MIT license and built on top of the OpenCode framework, targets the failure mode every coding agent hits at scale: context exhaustion. When an agent runs for 200-plus steps across multiple sessions, early decisions about architecture, constraints, and task scope fall out of the context window. The agent forgets. Work repeats. Errors compound.
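The failure mode is easy to reproduce in miniature. The sketch below is illustrative only and is not taken from MiMo Code or Claude Code: `fit_to_window` is a hypothetical helper, and whitespace word counts stand in for a real tokenizer. It keeps the most recent messages that fit a token budget, which is roughly what a naive agent loop does when history outgrows the window.

```python
def fit_to_window(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the newest messages that fit in max_tokens, dropping the oldest first.

    count_tokens is a stand-in for a real tokenizer (assumption: one
    whitespace-separated word counts as one token).
    """
    kept, used = [], 0
    # Walk backwards from the newest message so recent context wins.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    # Restore chronological order for the surviving messages.
    return list(reversed(kept))


# An early constraint followed by 200 routine steps.
history = ["constraint: never touch the public API"]
history += [f"step {i}: edited file {i}" for i in range(1, 201)]

window = fit_to_window(history, max_tokens=500)
print(history[0] in window)  # the early constraint no longer fits
```

With a 500-token budget, only the latest steps survive and the constraint set at the start is gone, which is the forgetting the article describes.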

VentureBeat reported on June 11 that MiMo Code addresses this directly with a cross-session memory system. A persistent MEMORY.md project file stores high-level decisions. Session checkpoints record intermediate state. Per-task progress logs track what has been completed. A SQLite FTS5 full-text search index runs across all four layers. When the context window approaches its limit, an independent checkpoint-writing subagent extracts the current working state and writes it to disk. On the next session start, that state is restored.
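To make the moving parts concrete, here is a minimal sketch of how a searchable memory store with a checkpoint trigger might look, using Python's built-in `sqlite3` module and an FTS5 virtual table. Everything in it is an assumption for illustration: the table layout, the layer names, the 90 percent threshold, and the function names are hypothetical and do not describe MiMo Code's actual schema or code. It also assumes the local SQLite build includes FTS5, which most standard Python distributions do.

```python
import json
import sqlite3

# Hypothetical layer labels loosely based on the stores named in the article.
LAYERS = ("memory_md", "checkpoint", "progress_log")


def open_index(path=":memory:"):
    """Open (or create) one FTS5 table that holds notes from every layer."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS notes "
        "USING fts5(layer UNINDEXED, session UNINDEXED, body)"
    )
    return conn


def remember(conn, layer, session, body):
    """Store one note; only the body column is full-text indexed."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    conn.execute(
        "INSERT INTO notes (layer, session, body) VALUES (?, ?, ?)",
        (layer, session, body),
    )
    conn.commit()


def recall(conn, query, limit=5):
    """Full-text search across all layers, best matches first."""
    rows = conn.execute(
        "SELECT layer, session, body FROM notes "
        "WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


def maybe_checkpoint(conn, session, used_tokens, max_tokens, state, threshold=0.9):
    """Write the working state to disk once the window is nearly full."""
    if used_tokens < threshold * max_tokens:
        return False
    remember(conn, "checkpoint", session, json.dumps(state, sort_keys=True))
    return True


def restore_latest(conn):
    """Return the most recently written checkpoint, or None if there is none."""
    row = conn.execute(
        "SELECT body FROM notes WHERE layer = 'checkpoint' "
        "ORDER BY rowid DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]) if row else None
```

In this sketch a new session would call `restore_latest` on startup and `recall` whenever it needs an earlier decision, rather than relying on that decision still being in the context window. The article describes the checkpoint writer as an independent subagent; a plain function call stands in for it here.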

The result, per the benchmark claims, is an agent that can sustain coherent progress on tasks most agents abandon after a few hundred steps. The company has not published independent third-party evaluations; the benchmark results cited in the release are Xiaomi’s own.

The architecture also includes a task-tree system for breaking long work into tracked subtasks, and two distillation commands: /dream and /distill, which consolidate accumulated knowledge from past sessions into a compressed, searchable form. The default model channel runs on MiMo-V2.5 with a one-million-token context window and requires no API key to start.
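The release does not spell out how the task tree is represented, so the following is only a sketch of the general idea: a tree whose leaves are units of work, where a parent counts as finished once all of its subtasks are. The `Task` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Task:
    """One node in a hypothetical task tree; leaves are the actual work items."""
    title: str
    done: bool = False
    children: List["Task"] = field(default_factory=list)

    def add(self, title: str) -> "Task":
        """Attach a new subtask and return it."""
        child = Task(title)
        self.children.append(child)
        return child

    def leaves(self):
        """Yield every leaf task under this node, depth-first."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()

    def is_complete(self) -> bool:
        """A node is complete when every leaf beneath it is done."""
        return all(leaf.done for leaf in self.leaves())

    def next_open(self) -> Optional["Task"]:
        """Return the first unfinished leaf, or None when nothing is left."""
        for leaf in self.leaves():
            if not leaf.done:
                return leaf
        return None

    def progress(self) -> Tuple[int, int]:
        """Return (finished leaves, total leaves)."""
        all_leaves = list(self.leaves())
        return sum(leaf.done for leaf in all_leaves), len(all_leaves)


root = Task("migrate codebase")
schema = root.add("update schema")
write_migration = schema.add("write migration")
backfill = schema.add("backfill data")
callers = root.add("update callers")
```

Persisting a structure like this between sessions would let a resumed agent ask for `next_open()` instead of reconstructing the plan from conversation history.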

The strategic signal here is clearer than the benchmarks. If Xiaomi’s numbers hold, it did not beat Claude by training a superior model. It beat Claude Code, the harness Anthropic built around Claude, by engineering a better memory layer. The contest has shifted from what a model knows to how an agent remembers.

This echoes the debate Anthropic has been running publicly about managed agents and dynamic workflows. The argument is becoming a market fact: the coordination and memory layer is where coding agents differentiate, and that layer is not exclusive to frontier labs. A consumer electronics company shipping MIT-licensed infrastructure that out-remembers a product from one of the best-resourced AI labs in the world would confirm the argument, provided the self-reported results survive independent testing.

The parallel to Anthropic’s own June work on dynamic workflows is precise. Anthropic is building memory and coordination into its managed agent infrastructure. Xiaomi built it as an open harness anyone can fork. Both are solving the same problem from opposite directions: top-down from the lab, and bottom-up from the open-source community.

The moat question this raises is concrete. If the harness matters more than the model for long-horizon tasks, and if harnesses are MIT-licensed and forkable, the defensive position of any coding agent product that relies on model quality alone weakens on exactly the tasks that matter most to professional developers. A 200-step refactor or a multi-session codebase migration is not a demo. It is the use case teams pay for.

Teams currently evaluating Claude Code for extended agentic workflows should run MiMo Code against their own task benchmarks before committing. The memory architecture is the specific claim to validate, and the MIT license means testing carries no licensing cost, only engineering time.

Reported by VentureBeat (venturebeat.com), 2026-06-11.
