Poolside's Laguna XS 2.1 lifts SWE-bench score, loosens its license

The 33B-parameter coding model gains 5.4 points on SWE-bench Multilingual and moves to a permissive license as local, open-weight coding models multiply.

Alessandro Benigni

PUBLISHED JUL 3, 2026

Poolside released Laguna XS 2.1 on July 2, an updated version of its XS.2 coding model that raises SWE-bench Multilingual performance by 5.4 points to 63.1%. The new model keeps its predecessor’s architecture: a 33-billion-parameter Mixture-of-Experts design that activates roughly 3 billion parameters per token. It is aimed at agentic coding and long-horizon work, and is meant to run on a developer’s own machine rather than a hosted cluster.

Poolside says the update also improves terminal-style task handling, though the SWE-bench Multilingual jump is the figure the company chose to lead with. The model is built to be used through a coding agent, and Poolside is pushing its own terminal-based agent, called pool, as the intended front end for it.

Deployment support is broad. Poolside lists compatibility with vLLM, SGLang, Nvidia’s TensorRT-LLM, Hugging Face transformers, and Ollama, with llama.cpp support still pending. The company is shipping three quantized checkpoints, in FP8, INT4, and NVFP4 formats, so developers with limited VRAM or compute can run the model locally, and it plans to add quantized GGUF checkpoints later. Alongside the checkpoints, Poolside is open-weighting DFlash speculator models tuned for each one, which it says roughly double the tokens generated per second during local inference.
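To put the quantized formats in perspective, here is a rough back-of-the-envelope sketch of weight storage for a 33-billion-parameter model at each precision. It assumes nominal bit widths (16, 8, and 4 bits per weight) and ignores quantization scales, the KV cache, activations, and runtime overhead, so real footprints will be higher. It also assumes all experts stay resident in memory, which is the usual case for a Mixture-of-Experts model even though only about 3 billion parameters are active per token. The function and variable names are illustrative, not Poolside's.

```python
# Rough weight-only memory estimate; a sketch under stated assumptions,
# not a measurement of the actual Laguna XS 2.1 checkpoints.

TOTAL_PARAMS = 33e9  # total parameter count reported in the article

# Nominal bits per weight (assumption: ignores scale factors and metadata
# that real FP8/INT4/NVFP4 checkpoints carry alongside the weights).
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "INT4": 4, "NVFP4": 4}


def weight_gb(total_params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits / 8 / 1e9


def estimate_all(total_params: float = TOTAL_PARAMS) -> dict:
    """Map each format name to its estimated weight storage in GB."""
    return {fmt: weight_gb(total_params, bits)
            for fmt, bits in BITS_PER_WEIGHT.items()}


if __name__ == "__main__":
    for fmt, gb in estimate_all().items():
        print(f"{fmt}: ~{gb:.1f} GB of weights")
```

Under these assumptions the 4-bit formats need roughly a quarter of the BF16 weight storage, which is why they matter for machines where VRAM is the limit.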

Laguna XS 2.1 ships under OpenMDW-1.1, a permissive licensing framework for AI model weights that Nvidia and the Linux Foundation are building out. Poolside frames the switch as a move toward more open model distribution, describing OpenMDW as a step toward reducing the licensing friction that has slowed adoption of some open-weight releases.

Poolside’s benchmark claims come entirely from its own testing. The company ran evaluations through the Laude Institute’s Harbor Framework using its own agent harness, capped at 500 steps per task inside a sandboxed environment, with results averaged across multiple attempts per task. It compared Laguna XS 2.1 against six models: Qwen3.6, Cohere’s North Mini Code, Microsoft’s MAI-Code-1-Flash, gpt-oss-120b, Claude Haiku 4.5, and GPT-5.4 Nano. Most of those comparison scores, however, came from each vendor’s own release materials rather than an independent leaderboard. Poolside also disclosed running a post-hoc reward-hacking check on its results and reported no significant issues after manual review, a detail worth noting given how often agentic coding benchmarks get gamed by narrow task shortcuts.

The release lands as the open-weight coding tier increasingly competes on efficiency rather than raw size. Laguna XS 2.1 activates only about 3 billion of its 33 billion parameters per token, which Poolside positions against far larger dense systems such as the 137 billion parameter MAI-Code-1-Flash. The bet is that sparse routing plus aggressive quantization wins local deployments, where closed, hosted-only systems like Claude Haiku 4.5 and GPT-5.4 Nano cannot compete on cost or on-device control.

The weights are available on Hugging Face in BF16, FP8, NVFP4, and INT4 formats, and the model is also reachable through OpenRouter and Poolside’s own API at 256K context length, with paid pricing matched to XS.2: $0.10, $0.20, and $0.05 per million tokens for input, output, and cache reads respectively. Poolside is retiring XS.2 from its API within a week of the new release, though Baseten will keep hosting it for dedicated deployments.
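For teams weighing the hosted API against local deployment, the listed prices translate into a simple per-request estimate. The sketch below uses the three rates quoted above. It assumes that tokens served from cache are billed at the cache-read rate instead of the input rate, and that `input_tokens` counts only uncached input. Poolside's actual billing rules may differ, and the function name is hypothetical.

```python
# Cost estimate from the per-million-token prices quoted in the article.
# Assumption: cached tokens are billed at the cache-read rate only.

PRICE_PER_MILLION_USD = {
    "input": 0.10,       # uncached input tokens
    "output": 0.20,      # generated tokens
    "cache_read": 0.05,  # input tokens served from cache
}


def request_cost_usd(input_tokens: int, output_tokens: int,
                     cache_read_tokens: int = 0) -> float:
    """Return the estimated cost in USD for one request or one workload."""
    if min(input_tokens, output_tokens, cache_read_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * PRICE_PER_MILLION_USD["input"]
        + output_tokens * PRICE_PER_MILLION_USD["output"]
        + cache_read_tokens * PRICE_PER_MILLION_USD["cache_read"]
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent session: 2M uncached input tokens, 500K output
    # tokens, and 10M tokens re-read from cache across many steps.
    cost = request_cost_usd(2_000_000, 500_000, 10_000_000)
    print(f"Estimated session cost: ${cost:.2f}")
```

Long agentic sessions tend to re-read the same context many times, so under this assumption the cache-read rate can account for a large share of the total bill.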

Teams running local coding agents should benchmark the INT4 and NVFP4 checkpoints against whatever model they currently deploy before locking in infrastructure for the second half of 2026, especially if VRAM budget, not top-line accuracy, is the binding constraint.

Reported by Poolside on July 2, 2026.
