NVIDIA open-sources LongLive 2.0 for real-time interactive video

NVlabs ships a full training and inference stack for long-form video generation, reaching 45.7 FPS with NVFP4 quantization and building on LongLive work accepted at ICLR 2026.

Alessandro Benigni

PUBLISHED MAY 20, 2026

NVIDIA’s research division NVlabs released LongLive 2.0 on May 13, making the full training and inference codebase publicly available under an Apache 2.0 license at the NVlabs/LongLive GitHub repository. The release ships model weights, a parallel training framework, and production-grade inference tooling capable of generating long-form video from sequential text prompts in real time.

The release fills a practical gap. Teams building interactive video products today face a binary choice: call a closed API from RunwayML or Google Veo and accept the pricing and rate limits that come with it, or assemble a patchwork of open components that were not designed to work together for streaming, interactive generation. LongLive 2.0 appears to be the first complete open framework (training code, distillation pipeline, and inference runtime included) that explicitly targets the interactive use case rather than single-clip generation.

What “streaming attention and KV-cache optimization” means in practice: standard, non-streaming video diffusion models regenerate the full clip from scratch for every new prompt. LongLive instead keeps the attention keys and values of previously generated frames in a KV cache (a rolling memory of what was already computed), so each new user instruction only processes the new frames rather than the full sequence. Attention sinks anchor that cache to stable reference frames, which helps prevent drift over long sessions. The result is that a user can type sequential prompts and receive new video segments without the multi-second cold start that full regeneration requires.
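The cache policy described above can be illustrated with a small data-structure sketch. It assumes, for illustration only, that the "sink" entries are simply the earliest frames and that the rest of the cache is a fixed-size window of the most recent frames; the class `SinkWindowKVCache`, the function `generate_segment`, and the parameters `num_sink` and `window` are hypothetical names, not LongLive's API. The point it shows is that the context each new frame reads stays bounded no matter how long the session runs.

```python
from collections import deque


class SinkWindowKVCache:
    """Hypothetical cache: keeps the first `num_sink` frames forever,
    plus only the most recent `window` frames."""

    def __init__(self, num_sink, window):
        self.num_sink = num_sink
        self.sinks = []                      # earliest frames, never evicted
        self.recent = deque(maxlen=window)   # deque drops its oldest entry when full

    def append(self, frame_id, kv):
        if len(self.sinks) < self.num_sink:
            self.sinks.append((frame_id, kv))
        else:
            self.recent.append((frame_id, kv))

    def context(self):
        # What a new frame attends to: sink frames followed by the recent window.
        return self.sinks + list(self.recent)

    def frame_ids(self):
        return [frame_id for frame_id, _ in self.context()]


def generate_segment(cache, start_frame, num_frames, prompt):
    """Hypothetical streaming step: each new frame reads only the cached
    context, never the full history, then adds its own entry to the cache."""
    for frame_id in range(start_frame, start_frame + num_frames):
        context = cache.context()  # bounded: at most num_sink + window entries
        # A real model would run attention over `context` here; this placeholder
        # stands in for the frame's key/value tensors.
        kv = {"prompt": prompt, "context_len": len(context)}
        cache.append(frame_id, kv)
    return start_frame + num_frames


cache = SinkWindowKVCache(num_sink=2, window=3)
next_frame = generate_segment(cache, 0, 6, "a cat walks across a kitchen")
next_frame = generate_segment(cache, next_frame, 4, "the cat jumps onto the table")
print(cache.frame_ids())  # [0, 1, 7, 8, 9]
```

After ten frames across two prompts, the cache holds the two sink frames plus the three newest ones, so per-frame attention cost stays constant instead of growing with session length.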

The performance numbers are concrete. The flagship LongLive-2.0-5B model running in BF16 precision hits 24.8 frames per second with a VBench quality score of 85.06. With NVFP4 quantization (a 4-bit floating-point format supported natively on NVIDIA's Blackwell-generation GPUs) and two-step distillation, the same 5-billion-parameter model reaches 45.7 FPS, roughly 1.84 times the BF16 throughput, while its VBench score drops modestly to 83.14. The original LongLive-1.3B from the 2025 release remains available on the v1.0 branch for teams that need a smaller footprint.
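The trade-off between the two reported operating points can be made explicit with a few lines of arithmetic. This uses only the self-reported BF16 and NVFP4 figures for LongLive-2.0-5B; nothing else is assumed.

```python
# Self-reported figures for LongLive-2.0-5B.
BF16_FPS, BF16_VBENCH = 24.8, 85.06
NVFP4_FPS, NVFP4_VBENCH = 45.7, 83.14

speedup = NVFP4_FPS / BF16_FPS                          # throughput ratio
vbench_drop = BF16_VBENCH - NVFP4_VBENCH                # absolute VBench points lost
relative_drop_pct = 100 * vbench_drop / BF16_VBENCH     # same loss as a percentage
ms_per_frame_bf16 = 1000 / BF16_FPS                     # per-frame latency budget
ms_per_frame_nvfp4 = 1000 / NVFP4_FPS

print(f"speedup: {speedup:.2f}x")                                   # speedup: 1.84x
print(f"VBench drop: {vbench_drop:.2f} points "
      f"({relative_drop_pct:.2f}%)")                                # VBench drop: 1.92 points (2.26%)
print(f"latency: {ms_per_frame_bf16:.1f} ms -> "
      f"{ms_per_frame_nvfp4:.1f} ms per frame")                     # latency: 40.3 ms -> 21.9 ms per frame
```

In other words, NVFP4 with two-step distillation buys about 1.84x throughput (per-frame time falls from roughly 40 ms to 22 ms) for a VBench loss of about 2.3 percent.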

The architecture builds on two open foundations: the Self-Forcing autoregressive training formulation and Wan2.2 video diffusion model components. That lineage matters for teams doing due diligence. LongLive is not a greenfield research artifact but an engineering layer on top of reproducible prior work.

Placed against the broader landscape, LongLive 2.0 sits at a different layer from Sora-class closed APIs. OpenAI has not released Sora weights or training code. Google has not released Veo internals. Both offer API access to outputs, not the ability to fine-tune on proprietary data, adjust the generation loop, or run inference on private infrastructure. LongLive 2.0 provides all three. The comparable open frameworks (Open-Sora, CogVideo, and SANA-Video, the last of which NVlabs also extended with LongLive in November 2025) support clip generation but not, out of the box, interactive sequential prompting at production frame rates.

The ICLR 2026 acceptance provides independent peer review of the core LongLive 1.0 methodology. The 2.0 release builds on that foundation rather than superseding it, adding NVFP4 quantization, balanced sequence parallelism for multi-GPU training, multi-shot video support, and async decoding.
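NVFP4, one of the 2.0 additions, is a block-scaled 4-bit format: each value is stored as a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit), and each block of 16 values shares one scale. The sketch below shows that idea in plain Python. It is a simplification, not NVIDIA's or LongLive's implementation: real NVFP4 stores the block scale in FP8 (E4M3) and adds a per-tensor FP32 scale, and hardware rounds ties to even, whereas here the scale stays a full-precision Python float and ties go to the smaller magnitude. The function names are hypothetical.

```python
# Non-negative magnitudes representable by a 4-bit E2M1 float (sign stored separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive values


def quantize_block(block):
    """Return (scale, codes) for one block; each code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    # Map the block's largest magnitude onto the largest E2M1 value, 6.0.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        magnitude = abs(x) / scale
        # Round to the nearest representable value; min() keeps the first
        # (smaller) candidate on ties.
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - magnitude))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def dequantize_block(scale, codes):
    return [c * scale for c in codes]


def quantize_tensor(values):
    """Split a flat list into 16-value blocks and quantize each independently."""
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return [quantize_block(b) for b in blocks]


scale, codes = quantize_block([0.1, -0.7, 3.0, 6.0])
print(scale, codes)  # 1.0 [0.0, -0.5, 3.0, 6.0]
```

The per-block scale is what keeps 4-bit storage usable: a block of small activations and a block of large ones each get their own range, so rounding error stays proportional to each block's largest value.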

The release announcement does not include independent third-party benchmark comparisons against RunwayML Gen-4 or Google Veo 3. The VBench scores are self-reported. Teams evaluating LongLive for production use should run their own quality assessments against domain-specific content before committing to an integration.

Teams currently building game-world simulators, interactive narrative tools, or training-data pipelines for embodied agents should benchmark LongLive 2.0 against their existing stack before their next infrastructure decision. The combination of full code ownership, up to 45.7 FPS throughput with NVFP4, and sequential prompt control is a capability profile that no closed API currently offers, since those services expose outputs rather than code.

Source: the NVlabs/LongLive GitHub repository (github.com/NVlabs/LongLive), with LongLive 2.0 released by NVIDIA’s NVlabs team on May 13, 2026.