Three major players shipped agent deployment frameworks today, while researchers confirmed that the security layer underneath those frameworks does not yet exist. The result is a widening gap between how quickly organizations are putting agents into workflows and how far they can trust what those agents do.
Agents in the Workflow: Anthropic, IBM, and NVIDIA Each Claim a Piece of the Stack
Three separate agent deployments landed today, each targeting a different layer of the enterprise workflow. Together they move the contest to own agent infrastructure from demos to production.
- Anthropic’s Claude Tag Turns Slack Into a Shared Agent Layer. A new Slack-native feature lets an entire team delegate tasks to a single Claude instance that accumulates channel context over time, a shift from single-user copilot to shared organizational memory.
- IBM’s CUGA agent harness shifts governance from a retrofit to a default. IBM Research’s open-source Configurable Generalist Agent builds policy controls and state management into the runtime before deployment, rather than bolting them on after deployment, the stage where most governance failures happen.
- NVIDIA Builds an Agent Software Stack on Top of Its Chips. The Agent Toolkit bundles open models, a tool layer, and a sandboxed runtime into one enterprise package, positioning NVIDIA to capture software margin on top of the hardware it already sells.
The Trust Deficit: Deployment Outpaced Security Before the First Major Breach
The agents now shipping into production face a security layer that researchers say is structurally broken. A red-teaming startup, new research on prompt injection, and a federal policy standoff make the gap concrete.
- Gray Swan Is Building the Security Layer AI Labs Cannot Build Themselves. The Carnegie Mellon spinout red-teams frontier models for Anthropic and other labs, and its founders say the first major AI security breach is a matter of when, not if.
- Why prompt injection is really a failure of role perception. New research argues that models cannot reliably distinguish injected commands from their own reasoning, so detection-based defenses patch a symptom rather than the cause.
- Meta Is the Last Major US Lab Without a Federal AI Review Deal. The Trump administration is pressing Meta to submit its models for voluntary government evaluation, weeks after ordering Anthropic to pull a release over national security concerns.
The Inference Bill: Memory, Profiling, and Hardware as Competitive Variables
Serving costs are now overtaking training costs as the primary financial constraint for AI teams. Three developments target different parts of that equation.
- Engram raises $98M to separate memory from reasoning in AI. The 13-person startup claims its learned memory layer can match frontier performance while using up to 100 times fewer tokens, though no independent benchmarks exist yet.
- Graphsignal brings production inference profiling to every GPU in the stack. The open-source profiler gives teams continuous, low-overhead visibility across models, engines, and accelerators without capturing prompt content.
- AWS adds RTX PRO 4500 Blackwell GPUs to EC2 G7 instances for inference. NVIDIA and AWS put Blackwell-generation GPUs into a new G7 family, claiming up to 4.6x inference throughput over the prior G6 generation.
The Document Layer: Smarter Extraction and Unified Retrieval
Two infrastructure releases target the ingestion layer upstream of most RAG and agentic pipelines, both aimed at cutting the number of specialized tools a team has to run.
- Mistral’s OCR 4 Adds Bounding Boxes and 170-Language Support. The new extraction model ships structured output, confidence scores, and single-container self-hosting in a compact package built to slot into RAG pipelines without a separate document service.
- Fluree DB packs graph, vector, text, and geo search into one engine. The open-source database ships all four retrieval modes natively, removing the multi-tool stack most RAG pipelines depend on today.
Today’s Quick Hits
- ByteDance’s Seedance 2.5 Generates 30-Second 4K Video From One Prompt. The new model accepts up to 50 reference files and targets China first, extending ByteDance’s lead in a crowded domestic video market.
- Krea releases open-weight image models built to escape default aesthetics. Krea 2 trades the single polished output most diffusion models default to for a multi-stage pipeline built for broad stylistic exploration.
- Baidu’s Unlimited OCR parses dozens of pages in a single forward pass. Built on DeepSeek OCR, the open-source model uses a constant-size KV cache to sidestep context-length limits rather than extend them.
- Momentic ships autonomous QA platform as AI-generated bugs pile up. The testing startup rebuilds around a knowledge base and self-updating tests, targeting the gap between faster AI code output and slower human verification.