How much VRAM does the H100 NVL have?

The H100 NVL has 94 GB of HBM3 memory.

When was the H100 NVL released?

NVIDIA released the H100 NVL in 2023. It is based on the Hopper architecture and is built as a PCIe card designed to be paired with a second card over NVLink.

How much does it cost to rent the H100 NVL?

The cheapest H100 NVL listing rents for $1.00/hr. The typical price (listing-weighted median) is $3.11/hr across 5 marketplace partners. Prices are updated daily.
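The page does not define "listing-weighted median". One plausible reading, assumed here, is that every individual listing counts once, so a marketplace with many listings pulls the median toward its prices. A minimal sketch under that assumption; the marketplace names and prices are made up and are not the live figures quoted above.

```python
from statistics import median


def listing_weighted_median(listings_by_market: dict[str, list[float]]) -> float:
    """Median hourly price over all listings, each listing weighted equally.

    Assumption: "listing-weighted" means pooling every listing from every
    marketplace before taking the median, rather than taking the median of
    per-marketplace medians.
    """
    all_prices = [p for prices in listings_by_market.values() for p in prices]
    if not all_prices:
        raise ValueError("no listings")
    return median(all_prices)


# Hypothetical listings in $/hr (NOT the live prices quoted on this page).
example = {
    "market_a": [1.00, 2.50],
    "market_b": [3.00, 3.20, 3.40],
    "market_c": [4.00],
}
cheapest = min(p for prices in example.values() for p in prices)
print(f"cheapest ${cheapest:.2f}/hr, median ${listing_weighted_median(example):.2f}/hr")
```

Because `market_b` contributes three listings, it has three times the influence of `market_c` on the pooled median; a median of per-marketplace medians would weight each marketplace equally instead.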

Is the H100 NVL good for AI training or inference?

The H100 NVL delivers 989 FP16 TFLOPS (dense, no sparsity) with 94 GB of VRAM. It is suited to both large-model training and high-throughput inference, although NVIDIA positions the NVL variant primarily for large language model inference.


NVIDIA · Hopper · 2023

H100 NVL
AIMC Specifications


Complete technical reference: architecture, memory, performance, and live rental pricing.

Memory

94 GB

HBM3

Form Factor

NVL

Datacenter

FP16 Compute

989

TFLOPS (dense)


Live Rental Pricing

Current market pricing across all authorized partners, updated daily.

Cheapest

$1.00/hr

Typical (median)

$3.11/hr

Marketplaces

5


Full Specifications

Factual specifications from manufacturer datasheets.

Manufacturer	NVIDIA
Architecture	Hopper
Memory Capacity	94 GB
Memory Type	HBM3
Form Factor	NVL
Release Year	2023
GPU Class	Datacenter
FP16 TFLOPS (dense)	989
VRAM (compute)	94 GB

Architecture & Use Cases

Technical overview of the H100 NVL.

The NVIDIA H100 NVL is a variant of the H100 designed for dual-GPU inference deployments, announced in early 2023. Each GPU has 94 GB of HBM3 memory, 14 GB more than the 80 GB of the standard H100 SXM. The extra capacity targets large language model inference, where memory capacity directly limits batch size and throughput.
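To make "memory capacity limits batch size" concrete, the sketch below estimates how many extra tokens of KV cache the additional 14 GB could hold. The model shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128, FP16 cache) is an assumed 70B-class configuration chosen for illustration, and the estimate ignores activations, fragmentation, and framework overhead.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per generated token.

    Factor 2 = one key vector plus one value vector, stored for every
    layer and every KV head. bytes_per_elem=2 assumes an FP16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(free_bytes: int, per_token_bytes: int) -> int:
    """How many tokens of KV cache fit in free_bytes (weights excluded)."""
    return free_bytes // per_token_bytes


# Assumed 70B-class model shape (hypothetical, for illustration only).
per_token = kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
extra_bytes = 14 * 10**9  # 94 GB - 80 GB, decimal gigabytes
extra_tokens = max_cached_tokens(extra_bytes, per_token)
extra_seqs_4k = extra_tokens // 4096  # full 4,096-token sequences that fit

print(per_token, extra_tokens, extra_seqs_4k)
```

Under these assumptions each token costs about 0.33 MB of cache, so the extra 14 GB holds roughly ten more concurrent 4,096-token sequences; larger batches generally mean higher aggregate throughput.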

The H100 NVL uses a PCIe form factor with NVLink bridge connectors that link two cards at 600 GB/s of bidirectional bandwidth. The pair provides 188 GB of combined memory, but the two GPUs remain separate devices: inference frameworks split the model across them, typically with tensor parallelism, rather than addressing one unified memory space. The NVLink bridge carries the resulting cross-GPU traffic without the PCIe bottleneck.
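Tensor parallelism across the two cards can be illustrated with a column-split matrix multiply, one common scheme (used, for example, in Megatron-style sharding); the page does not say which scheme a given framework uses. In this pure-Python sketch each "device" is just a list holding one shard of the weights, and the final concatenation stands in for the all-gather that, on real hardware, travels over NVLink.

```python
def matmul(x: list[list[float]], w: list[list[float]]) -> list[list[float]]:
    """Plain matrix multiply: x is m x k, w is k x n, result is m x n."""
    k, n = len(w), len(w[0])
    return [[sum(row[t] * w[t][j] for t in range(k)) for j in range(n)] for row in x]


def split_columns(w: list[list[float]], parts: int) -> list[list[list[float]]]:
    """Split w column-wise into `parts` equal shards, one per GPU."""
    n = len(w[0])
    if n % parts:
        raise ValueError("column count must divide evenly across devices")
    size = n // parts
    return [[row[p * size:(p + 1) * size] for row in w] for p in range(parts)]


def column_parallel_matmul(x, w, parts: int = 2):
    shards = split_columns(w, parts)           # each device stores only its shard
    partials = [matmul(x, s) for s in shards]  # computed independently per device
    # All-gather: concatenate each output row's column blocks from every device.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_matmul(x, w))  # same result as matmul(x, w)
```

Each device stores and multiplies only half of the weight matrix, which is why a model larger than one card's 94 GB can still run on the pair; the cost is the gather step, whose speed depends on the interconnect.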

Each H100 NVL card has a TDP of up to approximately 400 W, or about 800 W for a dual-GPU pair. The cards fit standard PCIe server chassis and do not require HGX or DGX baseboards. Each GPU has 3.9 TB/s of HBM3 memory bandwidth, which largely determines token-generation speed in inference.
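A rough roofline shows why memory bandwidth matters for inference. Assuming each decoding step must stream every weight from HBM once (shared by all sequences in the batch) and each generated token costs about 2 FLOPs per parameter, throughput is capped by whichever limit is hit first. The 3.9 TB/s and 989 TFLOPS figures come from this page; the 70B FP16 model and perfect two-GPU scaling are assumptions, KV-cache reads are ignored, and real throughput will be lower than these ceilings.

```python
def decode_throughput_bound(params: float, bytes_per_param: float, batch: int,
                            num_gpus: int = 1,
                            bw_per_gpu: float = 3.9e12,          # bytes/s (page figure)
                            fp16_flops_per_gpu: float = 989e12,  # dense FP16 (page figure)
                            ) -> float:
    """Upper bound on generated tokens/s across a batch (simple roofline).

    Assumptions: weights are read once per decode step and shared by the
    whole batch; ~2 FLOPs per parameter per token; perfect multi-GPU
    scaling; KV-cache traffic and kernel overheads are ignored.
    """
    weight_bytes = params * bytes_per_param
    steps_per_s = num_gpus * bw_per_gpu / weight_bytes  # memory-bound step rate
    memory_bound = steps_per_s * batch                  # tokens/s if memory-bound
    compute_bound = num_gpus * fp16_flops_per_gpu / (2 * params)
    return min(memory_bound, compute_bound)


# Assumed workload: 70B-parameter model in FP16 on a dual H100 NVL pair.
for batch in (1, 32, 512):
    print(batch, round(decode_throughput_bound(70e9, 2, batch, num_gpus=2)))
```

At batch size 1 the pair is limited to roughly 56 tokens/s by bandwidth alone, while large batches approach the compute ceiling of about 14,000 tokens/s. This is why the extra memory for bigger batches translates into throughput.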

The 94 GB per-GPU capacity is sized with large language models in mind. With 188 GB across a dual-GPU pair, a single server can hold the weights of a model of up to approximately 70 billion parameters in FP16 (about 140 GB) or 140 billion parameters in FP8, with the remaining memory available for the KV cache, so the model does not have to be split across multiple servers.
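The sizing above follows from simple arithmetic: one billion parameters at b bytes each occupy b GB (decimal). A minimal sketch, assuming weights dominate and treating whatever is left of the 188 GB as headroom for KV cache and activations:

```python
PAIR_MEMORY_GB = 188  # 2 x 94 GB, from this page


def weights_gb(params_billion: int, bytes_per_param: int) -> int:
    """Weight footprint in decimal GB: 1e9 params x bytes/param = bytes/param GB."""
    return params_billion * bytes_per_param


def headroom_gb(params_billion: int, bytes_per_param: int,
                total_gb: int = PAIR_MEMORY_GB) -> int:
    """Memory left for KV cache and activations (negative means it does not fit)."""
    return total_gb - weights_gb(params_billion, bytes_per_param)


# FP16 uses 2 bytes per parameter, FP8 uses 1.
for label, params_b, nbytes in (("70B FP16", 70, 2), ("140B FP8", 140, 1)):
    print(label, weights_gb(params_b, nbytes), "GB weights,",
          headroom_gb(params_b, nbytes), "GB headroom")
```

Both configurations need 140 GB for weights and leave about 48 GB for the KV cache, which is why they sit near the practical upper limit for the pair rather than at the raw 188 GB capacity.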