Z.ai ships GLM-5.2 with a 1M-token context for full codebases

The Hangzhou lab's third GLM-5 release targets agentic software engineering with a five-fold context expansion and dual reasoning modes.

Alessandro Benigni

PUBLISHED JUN 18, 2026

Z.ai shipped GLM-5.2 on June 13, framing the release explicitly around agentic coding rather than general capability. The announcement positions the model to read, reason about, and act across entire repositories in a single pass, a capability increasingly central to the coding-agent market that GitHub Copilot, Cursor, and Claude Code are all competing to serve.

The headline specification is a 1-million-token context window, available via the glm-5.2[1m] variant. That figure is roughly five times the 200,000-token ceiling of GLM-5.1. Practical output capacity extends to 131,072 tokens per response, which matters for code generation tasks that require producing entire files or multi-file diffs in one shot.
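As a rough illustration of what the larger window means in practice, the sketch below estimates whether a set of source files would fit in the 1M-token context while leaving room for a maximum-length response. It rests on two assumptions that are not from the announcement: a heuristic of about four characters per token (Z.ai's actual tokenizer will differ), and the idea that output tokens count against the same window. The function names are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000  # glm-5.2[1m] window, as reported
MAX_OUTPUT_TOKENS = 131_072        # per-response output cap, as reported
CHARS_PER_TOKEN = 4.0              # ASSUMPTION: rough heuristic, not Z.ai's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (heuristic only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_window(files: dict[str, str],
                   reserved_output: int = MAX_OUTPUT_TOKENS) -> tuple[int, bool]:
    """Return (estimated input tokens, whether input plus reserved output fits).

    ASSUMPTION: output tokens share the same window as the input.
    """
    total = sum(estimate_tokens(body) for body in files.values())
    return total, total + reserved_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical repository: file contents stand in for real source code.
    repo = {"a.py": "x" * 400_000, "b.py": "y" * 2_000_000}
    tokens, ok = fits_in_window(repo)
    print(tokens, ok)
```

A real check would use the provider's tokenizer once documentation is published; the character heuristic can be off by a wide margin for code.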

The model runs on the same 744-billion-parameter Mixture-of-Experts architecture as GLM-5, activating roughly 40 billion parameters per token, or about 5% of the total. Z.ai added a dual thinking-effort system with High and Max levels, letting developers trade latency for deeper reasoning on tasks that span long dependency chains or complex refactor requests. The control is a direct response to patterns seen in OpenAI’s o-series and Anthropic’s extended thinking: users want the ability to dial up compute on hard problems without paying that cost on every call.

At launch, GLM-5.2 was available only to paid GLM Coding Plan subscribers across Lite, Pro, Max, and Team tiers, with subscriptions starting around $18 per month. Z.ai said API access, chatbot support, full technical documentation, and MIT-licensed open weights would follow within the week. Indicative API pricing surfaced at roughly $1.40 per million input tokens and $4.40 per million output tokens, competitive with mid-tier frontier coding models.
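To put the indicative rates in concrete terms, the following sketch estimates the cost of a single call. It assumes flat per-token billing at the reported figures, with no caching discounts or long-context surcharges, neither of which the announcement addresses. The function name is hypothetical.

```python
# Indicative prices reported at launch (USD per million tokens).
INPUT_USD_PER_MILLION = 1.40
OUTPUT_USD_PER_MILLION = 4.40


def estimate_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call, ASSUMING flat per-token billing."""
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


if __name__ == "__main__":
    # A call that fills the 1M-token window and emits the 131,072-token maximum.
    cost = estimate_call_cost(1_000_000, 131_072)
    print(f"${cost:.2f}")  # prints "$1.98"
```

Under those assumptions, a call that uses the full window and the full output allowance would cost just under $2, so an agent making dozens of such calls per task would accumulate cost quickly even at these rates.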

Z.ai published no benchmark results at launch. No SWE-bench numbers, no Terminal-Bench scores, no Code Arena data appeared in the announcement. That is an unusual choice for a frontier coding release. Third-party tracking later placed GLM-5.2 Max near the top of Code Arena’s frontend coding leaderboard, but that data came from the community, not the company. Whether Z.ai withheld benchmarks to avoid scrutiny of specific failure modes or simply to let the product speak first is not explained in the release notes.

The MIT license is the detail that matters most for the developer market. It means teams can self-host GLM-5.2 or fine-tune it without license restrictions, placing it in direct competition with the open-weight coding models from Qwen and DeepSeek that have attracted enterprise adoption on cost and control grounds. A million-token context window combined with an open license at that parameter count is a pairing the open-weight ecosystem has not consistently offered before.

Chinese labs have now shipped capable coding models on aggressive timelines for three consecutive quarters. Each release in that cycle has come in at price points that pressure Western frontier pricing. Teams currently evaluating coding infrastructure for long-horizon agentic tasks should benchmark GLM-5.2 once the API becomes publicly available, particularly on repository-scale context tasks where the 1M window creates a meaningful capability gap versus models capped at 128K or 200K.

Reported by Z.ai (company announcement) on 2026-06-13.
