The NVIDIA H200 is an extended-memory variant of the Hopper architecture, announced in November 2023 and shipping to customers from Q2 2024. Its main change over the H100 is memory: 141 GB of HBM3e with 4.8 TB/s of bandwidth, a 76% increase in capacity and a 43% increase in bandwidth over the H100 SXM (80 GB, 3.35 TB/s).
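The percentage gains quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the H100 SXM baseline figures (80 GB, 3.35 TB/s) are taken from NVIDIA's published specifications, and the dictionary and function names are made up for this example.

```python
# Memory specs used for the comparison.
# H100 SXM baseline values are an assumption taken from NVIDIA's published specs.
H100_SXM = {"capacity_gb": 80, "bandwidth_tbs": 3.35}
H200_SXM = {"capacity_gb": 141, "bandwidth_tbs": 4.8}


def pct_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


capacity_gain = pct_increase(H200_SXM["capacity_gb"], H100_SXM["capacity_gb"])
bandwidth_gain = pct_increase(H200_SXM["bandwidth_tbs"], H100_SXM["bandwidth_tbs"])

print(f"capacity:  +{capacity_gain:.0f}%")   # +76%
print(f"bandwidth: +{bandwidth_gain:.0f}%")  # +43%
```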
The H200 uses the same GH100 GPU die as the H100, manufactured on TSMC's custom 4N process node and containing 80 billion transistors. It retains all the architectural features of Hopper, including 4th-generation Tensor Cores, the Transformer Engine with FP8 precision support, and 4th-generation NVLink with 900 GB/s of total bidirectional bandwidth per GPU.
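The SXM form factor is designed for NVIDIA's HGX H200 baseboard, which supports 8-GPU configurations in which every GPU reaches every other over NVLink through NVSwitch chips rather than a direct point-to-point mesh. NVIDIA positions the HGX H200 as hardware- and software-compatible with HGX H100 systems, so it can serve as a drop-in replacement for H100 SXM in existing DGX- and HGX-based infrastructure, generally requiring only firmware updates. TDP remains at up to 700 W, matching the H100 SXM thermal envelope.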
The primary use case is large language model inference, where the expanded memory allows larger batch sizes and longer context lengths and reduces the need for tensor parallelism across multiple GPUs. For inference workloads that are memory-bandwidth bound, NVIDIA's published figures put the H200 at nearly double the throughput of the H100 (about 1.9x) on models such as Llama 2 70B.
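A rough back-of-the-envelope model helps show why the extra capacity and bandwidth matter for inference. The sketch below is an estimate under stated assumptions, not a benchmark: it assumes Llama 2 70B's published shape (80 layers, 8 key/value heads under grouped-query attention, head dimension 128), rounds the parameter count to 70 billion, stores weights in FP8 and the KV cache in FP16, and ignores activations, framework overhead, and fragmentation. All names (`ModelShape`, `max_cached_tokens`, `min_decode_step_ms`) are hypothetical helpers for illustration.

```python
from dataclasses import dataclass

GB = 1e9   # decimal gigabytes, matching how GPU memory is quoted
TB = 1e12  # decimal terabytes


@dataclass(frozen=True)
class ModelShape:
    params: float   # total parameter count (approximate)
    layers: int     # number of transformer layers
    kv_heads: int   # key/value heads (grouped-query attention)
    head_dim: int   # dimension per attention head


# Assumed shape for Llama 2 70B; parameter count rounded to 70e9.
LLAMA2_70B = ModelShape(params=70e9, layers=80, kv_heads=8, head_dim=128)


def kv_bytes_per_token(m: ModelShape, bytes_per_elem: int = 2) -> int:
    """KV-cache bytes per token: one K and one V vector per layer."""
    return 2 * m.layers * m.kv_heads * m.head_dim * bytes_per_elem


def max_cached_tokens(m: ModelShape, hbm_gb: float,
                      weight_bytes_per_param: int = 1,
                      kv_bytes_per_elem: int = 2) -> int:
    """Tokens of KV cache that fit after the weights are loaded.

    Ignores activations and runtime overhead, so this is an upper bound.
    """
    free = hbm_gb * GB - m.params * weight_bytes_per_param
    if free <= 0:
        return 0
    return int(free // kv_bytes_per_token(m, kv_bytes_per_elem))


def min_decode_step_ms(m: ModelShape, bandwidth_tbs: float,
                       weight_bytes_per_param: int = 1) -> float:
    """Lower bound on one decode step: every weight is read from HBM once."""
    return m.params * weight_bytes_per_param / (bandwidth_tbs * TB) * 1e3


for name, hbm_gb, bw_tbs in [("H100 SXM", 80, 3.35), ("H200 SXM", 141, 4.8)]:
    tokens = max_cached_tokens(LLAMA2_70B, hbm_gb)
    step = min_decode_step_ms(LLAMA2_70B, bw_tbs)
    print(f"{name}: ~{tokens:,} cacheable tokens, >= {step:.1f} ms per decode step")
```

Under these assumptions the KV cache costs 320 KiB per token, and a single H200 has several times more room for cached tokens than a single H100 once the weights are resident. That headroom is what allows larger batches or longer contexts on one GPU. The decode-step bound scales only with the bandwidth ratio, so any speedup beyond that would have to come from other effects such as larger batch sizes; this model does not attempt to reproduce NVIDIA's measured figures.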