The NVIDIA L40S is a datacenter GPU based on the Ada Lovelace architecture, announced in August 2023 and aimed at AI inference and training. It is NVIDIA's response to demand for a more affordable alternative to the H100 for inference workloads, combining Ada's efficiency improvements with datacenter-grade features.
The L40S uses the AD102 die (the same chip used in the RTX 4090 and RTX 6000 Ada), which is manufactured on TSMC's custom 4N process and contains 76.3 billion transistors. The card carries 48 GB of GDDR6 memory with ECC and 864 GB/s of bandwidth, and exposes 18,176 CUDA cores and 568 fourth-generation Tensor Cores.
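The headline figures above can be cross-checked with simple arithmetic. The sketch below relies on details that are not stated in this section and should be read as assumptions: 142 enabled streaming multiprocessors (SMs), 128 CUDA cores and 4 Tensor Cores per Ada SM, a 384-bit memory bus, and GDDR6 running at 18 Gbps per pin. The function names are illustrative only.

```python
# Cross-check of the L40S figures quoted in the text.
# Assumptions (not stated in the text): SM count, per-SM core counts,
# memory bus width, and per-pin data rate.
ENABLED_SMS = 142
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4
BUS_WIDTH_BITS = 384
DATA_RATE_GBPS = 18  # gigabits per second, per pin


def core_counts(sms: int) -> tuple[int, int]:
    """Return (CUDA cores, Tensor Cores) for a given number of enabled SMs."""
    return sms * CUDA_CORES_PER_SM, sms * TENSOR_CORES_PER_SM


def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bits per transfer times rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


cuda_cores, tensor_cores = core_counts(ENABLED_SMS)
bandwidth = memory_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS)
print(cuda_cores, tensor_cores, bandwidth)  # 18176 568 864.0
```

Under these assumptions the results match the 18,176 CUDA cores, 568 Tensor Cores, and 864 GB/s quoted above.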
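The L40S shares the AD102 silicon and core counts of the L40 (non-S); it differs mainly in its higher power limit (350 W versus 300 W) and its higher rated Tensor Core throughput, which NVIDIA positions for inference and training. The L40S supports an FP8 Transformer Engine similar to Hopper's, which manages precision automatically and enables efficient large language model inference. For models that fit in 48 GB on a single card, this can make the L40S a lower-cost alternative to the H100 for inference.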
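To illustrate why FP8 matters on a 48 GB card, the following sketch estimates the memory taken by model weights alone at different precisions. It is a rough calculation under stated assumptions: 1 GB is taken as 10^9 bytes, the parameter counts are hypothetical, and the 8 GB headroom reserved for the KV cache, activations, and runtime overhead is an arbitrary placeholder rather than a measured value.

```python
# Rough weight-memory estimate per precision (weights only).
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1, "int8": 1}
L40S_MEMORY_GB = 48  # from the text; 1 GB taken as 10**9 bytes


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Memory needed for the weights alone, in GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_l40s(num_params: float, precision: str, headroom_gb: float = 8.0) -> bool:
    """True if weights plus an assumed headroom fit in 48 GB.

    headroom_gb is a hypothetical allowance for KV cache and activations.
    """
    return weight_footprint_gb(num_params, precision) + headroom_gb <= L40S_MEMORY_GB


# Hypothetical model sizes, chosen only for illustration.
for params in (13e9, 34e9, 70e9):
    for precision in ("fp16", "fp8"):
        size = weight_footprint_gb(params, precision)
        print(f"{params / 1e9:.0f}B {precision}: {size:.0f} GB, fits={fits_on_l40s(params, precision)}")
```

In this simplified model, a 34-billion-parameter network exceeds the card's memory at FP16 but fits at FP8, while a 70-billion-parameter network needs more than one card at either precision.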
The L40S is a dual-slot PCIe Gen4 x16 card with a 350 W TDP, half the 700 W of the H100 SXM module (the PCIe variant of the H100 is also rated at 350 W). Its fourth-generation Tensor Cores support FP8, FP16, BF16, TF32, and INT8 operations. Primary use cases include LLM inference that does not need the H100's memory bandwidth, AI training at smaller scale, and combined AI/graphics workloads.
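Whether a workload needs H100-class bandwidth can be gauged with a back-of-the-envelope bound. The sketch below assumes single-sequence (batch 1) decoding in which every weight byte is read from memory once per generated token, so memory bandwidth divided by weight size caps the token rate. The 13 GB model size is hypothetical, and the 3,350 GB/s figure for the H100 SXM is an assumption that does not come from this section.

```python
# Bandwidth-bound ceiling on batch-1 decode speed.
L40S_BANDWIDTH_GB_S = 864        # from the text
H100_SXM_BANDWIDTH_GB_S = 3350   # assumption, not from the text


def decode_tokens_per_s_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s when each token requires reading all weights once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


weights_gb = 13.0  # hypothetical: a 13B-parameter model stored in FP8
l40s_ceiling = decode_tokens_per_s_ceiling(L40S_BANDWIDTH_GB_S, weights_gb)
h100_ceiling = decode_tokens_per_s_ceiling(H100_SXM_BANDWIDTH_GB_S, weights_gb)
print(f"L40S ceiling: {l40s_ceiling:.1f} tokens/s")
print(f"H100 SXM ceiling: {h100_ceiling:.1f} tokens/s")
```

Real throughput falls below these ceilings, and the picture changes with batching: larger batches reuse each weight read across many sequences, so compute rather than bandwidth tends to become the limit. That regime is where a lower-bandwidth card such as the L40S gives up the least.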