The NVIDIA H100 NVL is a variant of the H100 built for dual-GPU inference deployments, announced in March 2023. Each GPU carries 94GB of HBM3 memory, 14GB more than the 80GB of the standard H100 SXM. The extra capacity targets large language model inference, where available memory limits model size and batch size, and therefore throughput.
The H100 NVL uses a PCIe form factor with an NVLink bridge connector, allowing two cards to be linked at 600 GB/s of bidirectional bandwidth. The pair offers 188GB of memory in aggregate. The two GPUs still enumerate as separate devices, so inference frameworks use the combined capacity by sharding the model across both, typically with tensor parallelism. The NVLink bridge carries the resulting GPU-to-GPU traffic, which would otherwise have to cross the much slower PCIe link.
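The tensor parallelism mentioned above can be illustrated with a toy example. This is a minimal pure-Python sketch, not how any particular inference framework implements it: a weight matrix is split by columns across two hypothetical devices, each computes its share of the output, and the partial results are concatenated. The function names are made up for illustration.

```python
# Toy column-parallel linear layer: y = x @ W, with W split by columns
# across two hypothetical devices. Pure Python, no GPU required.

def matmul(x, w):
    """Multiply a vector x (length k) by a matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, parts):
    """Split matrix w into `parts` equal column shards, one per device."""
    n = len(w[0])
    assert n % parts == 0, "column count must divide evenly"
    step = n // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def tensor_parallel_matmul(x, w, parts=2):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, parts)
    # Each shard holds 1/parts of the weights; on real hardware each
    # product below would run on a different GPU.
    partial_outputs = [matmul(x, shard) for shard in shards]
    # Concatenation stands in for the gather step that crosses the GPU link.
    return [value for part in partial_outputs for value in part]

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

sharded = tensor_parallel_matmul(x, w)
reference = matmul(x, w)
```

Each device stores only half of the weights, and the sharded result matches the single-device result. The concatenation step is the part that depends on a fast link between the two GPUs.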
Each H100 NVL card has a TDP of approximately 400W, for roughly 800W in a dual-GPU configuration. The cards fit standard PCIe server chassis and do not require HGX or DGX baseboards. Memory bandwidth is 3.9 TB/s per GPU from HBM3, which matters because token generation in LLM inference is often limited by how quickly model weights can be read from memory.
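The per-GPU figures above can be combined into pair-level totals, and the bandwidth figure gives a rough ceiling on generation speed. The sketch below assumes a simplified model that is not from the source: every generated token reads every weight exactly once, at batch size 1, with perfect scaling across both GPUs. Real throughput depends on batching, KV cache traffic, and kernel efficiency.

```python
# Spec figures quoted in the text above.
TDP_WATTS_PER_GPU = 400
MEM_BANDWIDTH_TBPS_PER_GPU = 3.9
NUM_GPUS = 2

def aggregate(per_gpu_value, num_gpus=NUM_GPUS):
    """Total across the NVLink-bridged pair."""
    return per_gpu_value * num_gpus

def decode_tokens_per_second_ceiling(params_billion, bytes_per_param, bandwidth_tbps):
    """Bandwidth-bound ceiling for batch-size-1 decoding.

    Assumption (illustrative only): each token requires reading all
    weights once, and nothing else limits throughput.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tbps * 1e12 / weight_bytes

total_watts = aggregate(TDP_WATTS_PER_GPU)
total_bandwidth_tbps = aggregate(MEM_BANDWIDTH_TBPS_PER_GPU)

# Example: a 70-billion-parameter model stored in FP16 (2 bytes per parameter).
ceiling_70b_fp16 = decode_tokens_per_second_ceiling(70, 2, total_bandwidth_tbps)
```

Under these assumptions the pair draws about 800W, offers 7.8 TB/s of aggregate memory bandwidth, and tops out near 56 tokens per second per sequence for a 70B FP16 model. Batching raises total throughput because one pass over the weights serves many sequences.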
The expanded 94GB configuration is sized with large language models in mind. A dual-GPU pair with 188GB in total can hold the weights of a model with roughly 70 billion parameters in FP16, or roughly 140 billion parameters in FP8. Either case takes about 140GB, leaving around 48GB for the KV cache and activations. The model is still split across the two GPUs, but it does not need to be sharded across multiple servers.
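The sizing arithmetic can be checked with a short calculation. This sketch counts weights only, using decimal gigabytes as the text does. The remaining headroom has to cover the KV cache, activations, and runtime overhead, none of which are modeled here. The helper names are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

GB_PER_GPU = 94
POOL_GB = 2 * GB_PER_GPU  # 188GB across the dual-GPU pair

def weight_footprint_gb(params_billion, precision):
    """Weights-only footprint in GB (1 billion params * N bytes = N GB)."""
    return params_billion * BYTES_PER_PARAM[precision]

def headroom_gb(params_billion, precision, pool_gb=POOL_GB):
    """Memory left after loading weights; a negative value means no fit."""
    return pool_gb - weight_footprint_gb(params_billion, precision)

for params, precision in [(70, "fp16"), (140, "fp8"), (140, "fp16")]:
    left = headroom_gb(params, precision)
    status = "fits" if left >= 0 else "does not fit"
    print(f"{params}B {precision}: {weight_footprint_gb(params, precision)}GB weights, "
          f"{left}GB headroom, {status}")
```

Both 70B in FP16 and 140B in FP8 need 140GB for weights and leave 48GB of headroom, while 140B in FP16 would need 280GB and exceeds the pair's capacity.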