Three separate product teams shipped persistent memory for AI agents today. Meanwhile, three research pieces revealed how the models underneath those agents actually work, and one lab bought its way deeper into the developer stack.
Agents Go Persistent: Skills, Schedules, and Saved State
Three platforms shipped persistent-memory features on the same day, a signal that stateless chat is no longer a viable product baseline. Cursor, the fourth item here, took a different route and trained its own coding model.
- Cursor Trains a Coding Agent from Scratch to Beat the Inference Layer — Cursor did not fine-tune an existing model. It trained Composer 2.5 from the ground up using reinforcement learning and synthetic data, building an agent that understands multi-file edits natively rather than inferring them from prompts.
- xAI adds persistent Skills to Grok across web and mobile — Grok can now retain user-defined functions across every session, not just within a single conversation. The practical effect is a chatbot that behaves more like a configured automation layer than a stateless Q&A tool.
- Manus Adds Persistent Context to Its Scheduled Task Engine — Manus updated its scheduled task runner so that automated jobs carry project state between runs. Without persistent context, scheduled automation resets to zero each time.
- Lovable adds reusable Skills to cut repetitive prompt setup — Skills are Markdown-based instruction sets that load automatically into any project, so setup instructions no longer have to be retyped at the start of every session.
What the Model Actually Does: Circuits, Mode-Switching, and Bad Benchmarks
Three research pieces this week pull back the curtain on how language models behave internally — and none of the findings are reassuring for teams that rely on current evaluation methods.
- Censorship in Qwen 3.5 Is a Removable Circuit, Not a Knowledge Gap — Interpretability researchers located the specific suppression mechanism inside Qwen 3.5-9B. The underlying knowledge is intact; the model simply has a circuit that blocks it.
- LLMs Switch Modes Mid-Training in Ways Optimizers Cannot Fix — Models oscillate between memorization and adaptive reasoning during pre-training, and standard optimization techniques have no reliable remedy.
- Static Benchmarks Cannot Measure Agents. What Comes Next? — As agents take on high-stakes tasks, evaluation frameworks built for static text completion are simply measuring the wrong thing.
Hardware Moves Up the Stack: CPUs, Robot Video, and Cheap Pretraining
NVIDIA shipped two distinct products at different points in the AI compute chain, while a separate lab collapsed the cost floor for foundation-model training.
- NVIDIA hand-delivers Vera CPUs to Anthropic, OpenAI, and Oracle — NVIDIA’s first custom CPU has landed at the three largest AI infrastructure buyers. Owning the CPU layer means NVIDIA can now control both sides of the data center equation at its top accounts.
- NVIDIA’s Cosmos Predict 2.5 Gains LoRA Fine-Tuning for Robot Video — Labs working on physical AI no longer need a multi-GPU cluster to adapt the base video model to their specific robot and task.
- A 1B model that costs $1,500 to train from scratch — Sapient’s HRM-Text puts foundation-model pretraining inside the budget of a solo researcher or a seed-stage team.
Platform Control: Anthropic Buys the Toolchain
Anthropic’s acquisition of Stainless is the kind of infrastructure deal that looks quiet until you map out who already depended on it.
- Anthropic acquires Stainless to bring SDK generation in-house — Stainless was already used by OpenAI, Google, and Cloudflare to ship their own SDKs. Anthropic now owns that layer, with direct control over the developer tooling that touches every team building on any of those APIs.
Today’s Quick Hits
- Alibaba’s Qwen3.7 enters Arena rankings at positions 13 and 16 — Two Qwen models inside the top 20 on Chatbot Arena, the clearest sign yet that a Chinese open-weight lab can match frontier closed-model performance.