Agents Are In. The Infrastructure Is Not.

Three major players shipped agent deployment frameworks today, while researchers confirmed that the security layer underneath those frameworks does not yet exist. The result is a widening gap between how fast organizations are putting agents into workflows and how far they can trust what those agents do.

Agents in the Workflow: Anthropic, IBM, and NVIDIA Each Claim a Piece of the Stack

Three separate agent deployments landed today, each targeting a different layer of the enterprise workflow. Together they move the contest to own agent infrastructure from demos to production.

Anthropic’s Claude Tag Turns Slack Into a Shared Agent Layer. A new Slack-native feature lets an entire team delegate tasks to a single Claude instance that accumulates channel context over time, a shift from single-user copilot to shared organizational memory.
IBM’s CUGA agent harness shifts governance from a retrofit to a default. IBM Research’s open-source Configurable Generalist Agent bakes policy controls and state management into the runtime before deployment, rather than bolting them on afterward, the stage at which most governance failures happen.
NVIDIA Builds an Agent Software Stack on Top of Its Chips. The Agent Toolkit bundles open models, a tool layer, and a sandboxed runtime into one enterprise package, positioning NVIDIA to capture software margin on top of the hardware it already sells.

The Trust Deficit: Deployment Outpaced Security Before the First Major Breach

The agents now shipping into production face a security layer that researchers say is structurally broken. A warning from a red-teaming startup, new research on prompt injection, and a federal policy standoff make the gap concrete.

Gray Swan Is Building the Security Layer AI Labs Cannot Build Themselves. The Carnegie Mellon spinout red-teams frontier models for Anthropic and other labs, and its founders say the first major AI security breach is a matter of when, not if.
Why prompt injection is really a failure of role perception. New research argues that models cannot reliably distinguish injected commands from their own reasoning, so detection-based defenses patch a symptom rather than the cause.
Meta Is the Last Major US Lab Without a Federal AI Review Deal. The Trump administration is pressing Meta to submit its models for voluntary government evaluation, weeks after ordering Anthropic to pull a release over national security concerns.

The Inference Bill: Memory, Profiling, and Hardware as Competitive Variables

Serving costs are now overtaking training costs as the primary financial constraint for AI teams. Three developments target different parts of that equation.

Engram raises $98M to separate memory from reasoning in AI. The 13-person startup claims its learned memory layer can match frontier performance while using up to 100 times fewer tokens, though no independent benchmarks exist yet.
Graphsignal brings production inference profiling to every GPU in the stack. The open-source profiler gives teams continuous, low-overhead visibility across models, engines, and accelerators without capturing prompt content.
AWS adds RTX PRO 4500 Blackwell GPUs to EC2 G7 instances for inference. NVIDIA and AWS put Blackwell-generation GPUs into a new G7 family, claiming up to 4.6x inference throughput over the prior G6 generation.

The Document Layer: Smarter Extraction and Unified Retrieval

Two infrastructure releases target the ingestion layer upstream of most RAG and agentic pipelines, both aimed at cutting the number of specialized tools a team has to run.

Mistral’s OCR 4 Adds Bounding Boxes and 170-Language Support. The new extraction model ships structured output, confidence scores, and single-container self-hosting in a compact package built to slot into RAG pipelines without a separate document service.
Fluree DB packs graph, vector, text, and geo search into one engine. The open-source database ships all four retrieval modes natively, removing the multi-tool stack most RAG pipelines depend on today.

Today’s Quick Hits

ByteDance’s Seedance 2.5 Generates 30-Second 4K Video From One Prompt. The new model accepts up to 50 reference files and targets China first, extending ByteDance’s lead in a crowded domestic video market.
Krea releases open-weight image models built to escape default aesthetics. Krea 2 trades the single polished output most diffusion models default to for a multi-stage pipeline built for broad stylistic exploration.
Baidu’s Unlimited OCR parses dozens of pages in a single forward pass. Built on DeepSeek OCR, the open-source model uses a constant KV cache to sidestep context-length limits rather than extend them.
Momentic ships autonomous QA platform as AI-generated bugs pile up. The testing startup rebuilds around a knowledge base and self-updating tests, targeting the gap between faster AI code output and slower human verification.