Gemini 3.5 Flash (Low), the cheapest reasoning-effort variant of Google’s Flash-tier model, generates roughly 45% fewer tokens than the Medium variant and outperforms the High variant on software-engineering tasks, according to developer testing posted to X on May 22 by user mohansolo.
The counterintuitive finding is that, on the coding workloads tested, less reasoning effort produced better outputs. Google’s three-tier system (Low, Medium, High) was designed so that High provides the deepest chain-of-thought for hard problems, but the SWE-bench-style results suggest that for code generation specifically, the longer reasoning chains introduce more errors than they fix. At higher effort, the model spends more tokens, costs more, and writes worse code.
The result is a single developer’s anecdotal benchmark, not a published evaluation across a representative task distribution. Real workloads vary, and the comparison may not hold for debugging, refactoring, or other coding tasks the test did not cover.
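For teams routing requests across Gemini’s tiers, the practical move is to test their own production workload before defaulting to the highest tier. If the pattern holds on those tasks, switching the default tier from High to Low could cut inference cost substantially while improving output quality. The only token figure reported here is Low’s roughly 45% reduction against Medium, so the exact saving against High has to be measured rather than assumed.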
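One way to run that check is to log, for each request replayed at each tier, whether the output passed the team’s own acceptance check and how many tokens it generated, then compare the tiers. The Python below is a minimal sketch under those assumptions: the `Run` record, the tier labels, and the `pick_default_tier` rule (the tier with the fewest mean output tokens among those within a tolerance of the best pass rate) are all hypothetical names introduced for illustration. It does not call any Gemini API and uses output tokens as a stand-in for cost.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One replayed request. All fields are illustrative, not a Gemini API type."""
    tier: str           # e.g. "low", "medium", "high"
    passed: bool        # did the output pass your own check (tests, review)?
    output_tokens: int  # tokens generated for this request


def summarize(runs):
    """Return {tier: (pass_rate, mean_output_tokens)} for the recorded runs."""
    by_tier = {}
    for run in runs:
        by_tier.setdefault(run.tier, []).append(run)
    return {
        tier: (
            mean(1.0 if r.passed else 0.0 for r in tier_runs),
            mean(r.output_tokens for r in tier_runs),
        )
        for tier, tier_runs in by_tier.items()
    }


def pick_default_tier(runs, tolerance=0.0):
    """Pick the tier with the fewest mean output tokens among tiers whose
    pass rate is within `tolerance` of the best observed pass rate."""
    stats = summarize(runs)
    best_rate = max(rate for rate, _ in stats.values())
    eligible = [t for t, (rate, _) in stats.items() if rate >= best_rate - tolerance]
    return min(eligible, key=lambda t: stats[t][1])
```

Calling `pick_default_tier(runs, tolerance=0.02)` on a log of replayed production requests would return Low only if its pass rate is within two percentage points of the best tier, which keeps the decision tied to measured quality rather than to token savings alone. With small samples the pass rates are noisy, so a larger replay set gives a more trustworthy answer.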
Posted on X by mohansolo on 2026-05-22.