How much VRAM does the L40S have?

The L40S has 48 GB of GDDR6 memory.

When was the L40S released?

NVIDIA released the L40S in 2023. It is based on the Ada Lovelace architecture and ships in the PCIe form factor.

How much does it cost to rent the L40S?

The L40S rents for as little as $0.63/hr at the cheapest marketplace, with a listing-weighted median of $1.60/hr across 14 marketplace partners. Prices are updated daily.

Is the L40S good for AI training or inference?

The L40S delivers 366 FP16 TFLOPS (dense, no sparsity) with 48 GB of VRAM. It is suited to high-throughput inference and to AI training at smaller scale.

NVIDIA · Ada Lovelace · 2023

L40S
AIMC Specifications

Complete technical reference: architecture, memory, performance, and live rental pricing.

Memory	48 GB GDDR6
Form Factor	PCIe (Datacenter)
FP16 Compute	366 TFLOPS (dense)

Live Rental Pricing

Current market pricing across all authorized partners, updated daily.

Cheapest	$0.63/hr
Typical (median)	$1.60/hr
Marketplaces	14
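
To put the hourly rates above in context, here is a minimal Python sketch that turns them into a cost for a fixed rental period. The function name `rental_cost`, the 30-day period, and the single-GPU default are assumptions for illustration. The rates are the snapshot figures quoted on this page and change daily.

```python
def rental_cost(rate_per_hour: float, hours: float, gpus: int = 1) -> float:
    """Total rental cost in dollars for `gpus` cards over `hours` hours."""
    if rate_per_hour < 0 or hours < 0 or gpus < 1:
        raise ValueError("rate and hours must be non-negative, gpus must be >= 1")
    return rate_per_hour * hours * gpus


# Snapshot rates from this page (USD per GPU-hour); they are updated daily.
CHEAPEST_RATE = 0.63
MEDIAN_RATE = 1.60

# Hypothetical rental period: 30 days of continuous use.
HOURS = 24 * 30

for label, rate in (("cheapest", CHEAPEST_RATE), ("median", MEDIAN_RATE)):
    print(f"{label}: ${rental_cost(rate, HOURS):,.2f} for {HOURS} hours")
```

Under these assumptions, 720 hours on one card comes to about $453.60 at the cheapest rate and $1,152.00 at the median rate. Actual bills depend on each marketplace's minimum commitments, storage, and egress charges, which this sketch ignores.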

Full Specifications

Factual specifications from manufacturer datasheets.

Manufacturer	NVIDIA
Architecture	Ada Lovelace
Memory Capacity	48 GB
Memory Type	GDDR6
Form Factor	PCIe
Release Year	2023
GPU Class	Datacenter
FP16 TFLOPS (dense)	366
VRAM (compute)	48 GB

Architecture & Use Cases

Technical overview of the L40S.

The NVIDIA L40S is a datacenter GPU built on the Ada Lovelace architecture and optimized for AI inference and training. It was announced in August 2023 as NVIDIA's response to demand for a more affordable alternative to the H100 for inference workloads, combining Ada's efficiency improvements with datacenter-grade features.

The L40S uses the AD102 die (the same chip used in the RTX 4090 and RTX 6000 Ada), manufactured on TSMC's custom 4N process with 76.3 billion transistors. It has 48 GB of GDDR6 memory with ECC and 864 GB/s of memory bandwidth. The chip includes 18,176 CUDA cores and 568 fourth-generation Tensor Cores.
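
The 864 GB/s bandwidth figure gives a rough ceiling on single-stream LLM decoding speed. The sketch below applies a common rule of thumb: at batch size 1, generating each token requires reading every weight once, so tokens per second cannot exceed bandwidth divided by the size of the weights. The function name and the 7-billion-parameter model are hypothetical. The estimate ignores KV-cache traffic, kernel overheads, and batching, so real throughput will differ.

```python
L40S_BANDWIDTH_GB_S = 864.0  # memory bandwidth quoted above


def decode_tokens_per_sec_upper_bound(
    params_billion: float,
    bytes_per_param: float,
    bandwidth_gb_s: float = L40S_BANDWIDTH_GB_S,
) -> float:
    """Bandwidth-bound ceiling on tokens/s for batch-size-1 decoding.

    Assumes every weight is read exactly once per generated token.
    """
    # 1e9 params * bytes per param / 1e9 bytes per GB = size in GB
    weights_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / weights_gb


# Hypothetical 7B-parameter model at 2 bytes (FP16) and 1 byte (FP8) per weight.
for name, nbytes in (("FP16", 2), ("FP8", 1)):
    bound = decode_tokens_per_sec_upper_bound(7, nbytes)
    print(f"7B @ {name}: at most ~{bound:.0f} tokens/s per stream")
```

Halving the bytes per weight doubles this ceiling, which is one reason lower-precision formats matter on a GDDR6 card.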

A key differentiator from the L40 (non-S) is the enhanced Tensor Core configuration optimized for inference. The L40S includes an FP8 Transformer Engine similar to Hopper's, enabling efficient large language model inference with automatic precision management. For inference workloads that are not limited by memory bandwidth, this can make it a lower-cost alternative to the H100.
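
One practical effect of FP8 is on memory footprint: each weight takes one byte instead of two. The sketch below checks whether a model's weights alone fit in the card's 48 GB at a given precision. The helper names, the 80% headroom factor (leaving room for KV cache and activations), and the 34-billion-parameter example are assumptions for illustration, not vendor guidance.

```python
L40S_VRAM_GB = 48.0

# Storage size per weight for common formats (bytes).
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1, "int8": 1}


def weights_gb(params_billion: float, dtype: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_in_vram(
    params_billion: float,
    dtype: str,
    vram_gb: float = L40S_VRAM_GB,
    headroom: float = 0.8,  # assumed fraction of VRAM available for weights
) -> bool:
    """True if the weights fit within the assumed share of VRAM."""
    return weights_gb(params_billion, dtype) <= vram_gb * headroom


# Hypothetical 34B-parameter model on a single card.
for dtype in ("fp16", "fp8"):
    size = weights_gb(34, dtype)
    verdict = "fits" if fits_in_vram(34, dtype) else "does not fit"
    print(f"34B @ {dtype}: {size:.0f} GB of weights, {verdict}")
```

With these assumptions, a 34B model needs 68 GB at FP16 and does not fit, while at FP8 it needs 34 GB and does. Whether FP8 preserves acceptable accuracy for a given model is a separate question this check does not address.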

The card is a dual-slot PCIe Gen4 x16 design with a 350 W TDP, half the 700 W of the H100 SXM. Its fourth-generation Tensor Cores support FP8, FP16, BF16, TF32, and INT8 operations. Primary use cases include LLM inference where the H100's memory bandwidth would not be fully utilized, AI training at smaller scale, and combined AI/graphics workloads.
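
A simple roofline estimate shows when a workload on this card is limited by memory bandwidth rather than compute. The sketch below uses only the two figures quoted on this page, 366 dense FP16 TFLOPS and 864 GB/s. The function names are hypothetical, and the model treats both figures as achievable peaks, which real kernels rarely reach.

```python
PEAK_TFLOPS = 366.0      # dense FP16 figure quoted on this page
BANDWIDTH_GB_S = 864.0   # memory bandwidth quoted on this page


def ridge_point() -> float:
    """Arithmetic intensity (FLOP per byte) where compute and bandwidth balance."""
    # TFLOPS * 1000 = GFLOP/s; dividing by GB/s gives FLOP per byte.
    return PEAK_TFLOPS * 1000.0 / BANDWIDTH_GB_S


def attainable_tflops(flop_per_byte: float) -> float:
    """Roofline ceiling: the lower of the compute roof and the bandwidth roof."""
    bandwidth_roof_tflops = flop_per_byte * BANDWIDTH_GB_S / 1000.0
    return min(PEAK_TFLOPS, bandwidth_roof_tflops)


print(f"ridge point: ~{ridge_point():.0f} FLOP/byte")
for intensity in (1.0, 100.0, 1000.0):
    print(f"{intensity:>6.0f} FLOP/byte -> at most {attainable_tflops(intensity):.1f} TFLOPS")
```

Under this model the ridge point is roughly 424 FLOP/byte. Kernels below that intensity, such as small-batch decoding, are bound by the 864 GB/s of memory bandwidth, and only kernels above it can approach the compute peak.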