Amazon Web Services is now offering EC2 G7 instances equipped with NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs, expanding the inference tier that sits between general-purpose CPU instances and the HPC-grade H100 and H200 clusters. The announcement, published on the NVIDIA blog on June 24, positions G7 as a cost-aware path for teams that need GPU acceleration without committing to the largest, most expensive instance families.

The headline performance figure is a 4.6x improvement in AI inference throughput over G6 instances. That comparison comes from NVIDIA, not from an independent benchmark, and the baseline is the prior-generation G6 lineup, which also runs on NVIDIA GPUs. The specific workloads driving the 4.6x number are not detailed in the announcement; inference performance varies substantially by model architecture, batch size, and precision, so the figure should be treated as a ceiling under favorable conditions rather than a general multiplier.

G7 instances support configurations from one to eight GPUs, with up to 256 GB of aggregate GPU memory, 700 Gbps of EFA-enabled networking, and up to 7.6 TB of local NVMe SSD storage. The instance type is accessible through AWS Deep Learning AMIs, Amazon EKS, ECS, and EMR, with SageMaker AI support listed as coming soon.
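
The memory figures above lend themselves to a quick sizing check. The following is a minimal sketch, not a sizing tool: it assumes the 256 GB aggregate is split evenly across eight GPUs (32 GB each), counts model weights only, and uses a hypothetical `headroom` fraction to leave room for KV cache and activations. The function names and the precision table are illustrative, and actual G7 instance sizes may not offer every GPU count between one and eight.

```python
import math

# Figures taken from the announcement as reported above.
AGGREGATE_GPU_MEMORY_GB = 256
MAX_GPUS = 8
# Assumption: memory is divided evenly across GPUs.
PER_GPU_MEMORY_GB = AGGREGATE_GPU_MEMORY_GB / MAX_GPUS

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weights_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(params_billions: float, precision: str,
                         headroom: float = 0.8):
    """Smallest GPU count whose usable memory holds the weights alone.

    `headroom` is a hypothetical fraction of each GPU reserved for
    weights; the remainder is left for KV cache and activations.
    Returns None if the weights do not fit in MAX_GPUS GPUs.
    """
    usable_per_gpu = PER_GPU_MEMORY_GB * headroom
    gpus = math.ceil(weights_gb(params_billions, precision) / usable_per_gpu)
    return gpus if gpus <= MAX_GPUS else None


if __name__ == "__main__":
    for size, precision in [(8, "fp16"), (70, "fp16"), (70, "fp8")]:
        print(size, precision, min_gpus_for_weights(size, precision))
```

An estimate like this only rules configurations out; whether a model serves well on a given GPU count still depends on batch size, context length, and the serving framework.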

Beyond the compute tier, NVIDIA’s cuVS vector indexing library is now the default acceleration layer in Amazon OpenSearch Serverless. NVIDIA claims GPU-accelerated vector indexing runs up to 10 times faster at roughly one-quarter the cost of CPU-only configurations, again citing its own benchmarks. For teams running retrieval-augmented generation pipelines or semantic search at scale, the practical implication is that GPU acceleration no longer requires explicit configuration inside OpenSearch Serverless; it becomes the default path.

The partnership also includes AWS reaching NVIDIA Exemplar Cloud status for GB300 training workloads, a designation NVIDIA awards when a cloud provider’s deployment meets its reference-architecture performance thresholds on large-scale training runs.

The commercial context matters here. AWS has spent years developing its own AI silicon: Trainium for training and Inferentia for inference. Those chips undercut standard GPU instance pricing for high-volume, steady-state workloads. The G7 launch reinforces NVIDIA’s position on AWS even as AWS pursues silicon independence, suggesting the two companies have settled into a segmented arrangement where NVIDIA captures the flexible, multi-workload inference market while Trainium and Inferentia absorb the highest-volume, cost-optimized inference contracts.

For teams currently choosing between inference instance families, G7 occupies a specific niche: more GPU memory and bandwidth than G6, significantly lower cost than P4 or P5 HPC instances, and more flexibility than Inferentia, which requires model compilation. Teams running vision models, spatial computing workloads, or smaller LLMs in production should evaluate G7 directly against Inferentia2 on their own workloads before committing to either. NVIDIA’s 4.6x figure is a directional signal, not a basis for a procurement decision.
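
One way to make such an evaluation comparable across instance families is to normalize measured throughput by hourly price. The sketch below is illustrative only: the `Candidate` class, its field names, and the numbers in the usage example are hypothetical placeholders, not published prices or benchmark results. Real inputs would come from current AWS pricing and from throughput measured on the team's own model at its production batch size and precision.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    hourly_price_usd: float     # placeholder; look up current pricing
    requests_per_second: float  # measured on your own workload

    def cost_per_million_requests(self) -> float:
        """Dollars spent to serve one million requests at full utilization."""
        requests_per_hour = self.requests_per_second * 3600
        return self.hourly_price_usd / requests_per_hour * 1_000_000


def cheapest(candidates):
    """Return the candidate with the lowest cost per million requests."""
    return min(candidates, key=lambda c: c.cost_per_million_requests())


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    options = [
        Candidate("candidate-a", hourly_price_usd=3.6, requests_per_second=100),
        Candidate("candidate-b", hourly_price_usd=2.4, requests_per_second=50),
    ]
    for option in options:
        print(option.name, round(option.cost_per_million_requests(), 2))
    print("cheapest:", cheapest(options).name)
```

The calculation assumes the instance is kept fully utilized; for bursty traffic, the effective cost per request rises with idle time, which can change which option comes out ahead.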

Source: NVIDIA blog, published June 24, 2026, authored by Josiah Byers; original URL at blogs.nvidia.com.