How much VRAM does the H200 NVL have?

The H200 NVL has 141 GB of HBM3e memory.

When was the H200 NVL released?

NVIDIA released the H200 NVL in 2024. It is based on the Hopper architecture and uses the NVL form factor.

How much does it cost to rent the H200 NVL?

The H200 NVL rents for $3.62/hr at the cheapest marketplace, with a listing-weighted median of $3.81/hr across 3 marketplace partners. Prices are updated daily.

Is the H200 NVL good for AI training or inference?

The H200 NVL delivers 989 FP16 TFLOPS (dense, no sparsity) with 141 GB of VRAM, which suits it to large-model training and high-throughput inference.

NVIDIA · Hopper · 2024

H200 NVL
AIMC Specifications

Complete technical reference: architecture, memory, performance, and live rental pricing.

Memory	141 GB HBM3e
Form Factor	NVL (Datacenter)
FP16 Compute	989 TFLOPS (dense)

Live Rental Pricing

Current market pricing across all authorized partners, updated daily.

Cheapest	$3.62/hr
Typical (median)	$3.81/hr
Marketplaces	3
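
To turn the hourly rates above into a job budget, multiply the rate by the number of cards and the number of hours. The following is a minimal sketch rather than the site's own calculator: the function name `rental_cost` and the example job (2 cards for 24 hours) are assumptions made for illustration, and the rates are the ones quoted on this page.

```python
# Hourly rates quoted on this page (USD per GPU-hour).
CHEAPEST_PER_HR = 3.62
MEDIAN_PER_HR = 3.81


def rental_cost(rate_per_hr: float, gpus: int, hours: float) -> float:
    """Return the total rental cost in USD for `gpus` cards over `hours` hours.

    Hypothetical helper: ignores storage, egress, and minimum-billing rules,
    which vary by marketplace.
    """
    if gpus < 0 or hours < 0:
        raise ValueError("gpus and hours must be non-negative")
    return round(rate_per_hr * gpus * hours, 2)


if __name__ == "__main__":
    # Assumed example job: 2 cards for 24 hours.
    for label, rate in (("cheapest", CHEAPEST_PER_HR), ("median", MEDIAN_PER_HR)):
        print(f"{label}: ${rental_cost(rate, gpus=2, hours=24):.2f}")
```

Because rates are updated daily, any figure computed this way is only a snapshot.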


Full Specifications

Factual specifications from manufacturer datasheets.

Manufacturer	NVIDIA
Architecture	Hopper
Memory Capacity	141 GB
Memory Type	HBM3e
Form Factor	NVL
Release Year	2024
GPU Class	Datacenter
FP16 TFLOPS (dense)	989
VRAM (compute)	141 GB

Architecture & Use Cases

Technical overview of the H200 NVL.

The NVIDIA H200 NVL is an NVLink-optimized variant of the H200, designed for deployment in two- or four-GPU configurations connected via an NVLink bridge. Announced alongside the H200 SXM in November 2023, the NVL variant began shipping in 2024 and provides a path to H200 memory capacity in PCIe-compatible server platforms.

The H200 NVL features the same 141 GB of HBM3e memory and 4.8 TB/s of memory bandwidth as the H200 SXM. The key differentiator is the form factor: although it uses a PCIe-style card design, it includes an NVLink connector that lets bridged H200 NVL cards exchange data at up to 900 GB/s per GPU. Two bridged cards offer a combined 282 GB of HBM3e for large-model inference, although each GPU still holds its own 141 GB and the model must be partitioned across them.
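
The capacity figures above lend themselves to a rough sizing check: multiply the parameter count by the bytes per parameter, add headroom for the KV cache and activations, and divide by 141 GB per card. The sketch below is an estimate under stated assumptions, not a vendor sizing tool. The helper names, the 20% overhead fraction, and the use of decimal gigabytes are all illustrative choices; real headroom depends on batch size, context length, and the serving framework.

```python
import math

CARD_MEMORY_GB = 141  # per-card HBM3e capacity, from this page
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weights_gb(params_billion: float, dtype: str) -> float:
    """Size of the model weights in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


def cards_needed(params_billion: float, dtype: str,
                 overhead_fraction: float = 0.2) -> int:
    """Minimum number of cards whose combined memory holds weights plus overhead.

    `overhead_fraction` is an assumed allowance for KV cache and activations;
    it is a placeholder, not a measured value.
    """
    required_gb = weights_gb(params_billion, dtype) * (1 + overhead_fraction)
    return math.ceil(required_gb / CARD_MEMORY_GB)


if __name__ == "__main__":
    # Assumed example: a 70-billion-parameter model at two precisions.
    for dtype in ("fp16", "fp8"):
        print(dtype, weights_gb(70, dtype), "GB weights ->",
              cards_needed(70, dtype), "card(s)")
```

Under these assumptions a 70-billion-parameter model in FP16 (140 GB of weights) needs a bridged pair, while the same model in FP8 fits on a single card.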

The bridged NVL configuration provides an alternative deployment option for organizations that cannot use SXM-based systems like DGX or HGX. It lets existing PCIe server infrastructure use the H200's larger memory capacity without requiring specialized baseboards. TDP is configurable up to 600 W per card.

The H200 NVL is particularly suited for inference workloads requiring large memory capacity but not the full 8-GPU scaling of SXM systems. Common deployments include inference servers running large language models, recommendation systems with large embedding tables, and scientific computing applications with large memory footprints.
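
One way to see why the card suits memory-heavy inference is a simple roofline estimate using the two headline figures on this page, 989 dense FP16 TFLOPS and 4.8 TB/s of memory bandwidth. The sketch below is a back-of-envelope model, not a benchmark. The function names are hypothetical, and the decode bound assumes that every generated token reads all weights once at batch size 1, ignoring KV-cache traffic, kernel efficiency, and multi-GPU communication.

```python
PEAK_FP16_FLOPS = 989e12   # dense FP16 FLOP/s, as listed on this page
MEM_BANDWIDTH_BPS = 4.8e12  # HBM3e bytes/s, as listed on this page


def ridge_point() -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return PEAK_FP16_FLOPS / MEM_BANDWIDTH_BPS


def attainable_tflops(intensity_flop_per_byte: float) -> float:
    """Roofline bound: the lesser of peak compute and bandwidth * intensity."""
    return min(PEAK_FP16_FLOPS,
               intensity_flop_per_byte * MEM_BANDWIDTH_BPS) / 1e12


def decode_tokens_per_s_bound(weights_gb: float) -> float:
    """Upper bound on batch-1 decode rate if each token streams all weights once."""
    return MEM_BANDWIDTH_BPS / (weights_gb * 1e9)


if __name__ == "__main__":
    print(f"ridge point: {ridge_point():.1f} FLOP/byte")
    print(f"bound at 1 FLOP/byte: {attainable_tflops(1):.1f} TFLOPS")
    # Assumed example: 70 GB of weights (e.g. a 70B-parameter model in FP8).
    print(f"decode bound: {decode_tokens_per_s_bound(70):.1f} tokens/s")
```

Kernels well below the ridge point, such as small-batch LLM decoding, are limited by memory bandwidth rather than by the TFLOPS figure, which is why memory capacity and bandwidth matter more than peak compute for the workloads listed above.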