Cloud gaming runs full graphics workloads on remote GPUs and streams the rendered output to a thin client. In the reference architecture, a powerful server GPU runs the game at high settings, NVENC (NVIDIA's on-GPU hardware video encoder) compresses each rendered frame at low latency (sub-20 ms target), and the encoded stream travels over the internet to a client device that decodes and displays it.
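To make the streaming leg concrete, the arithmetic below estimates how many bits the encoder can spend per frame at a constant bitrate, and how long one such frame takes to serialize onto the client's link. The 40 Mbps stream, 60 fps, and 100 Mbps link are hypothetical figures, and the model deliberately ignores keyframe spikes, packet overhead, and propagation delay.

```python
def per_frame_bits(bitrate_mbps: float, fps: float) -> float:
    """Average bit budget per encoded frame at a constant bitrate."""
    return bitrate_mbps * 1_000_000 / fps


def serialization_ms(frame_bits: float, link_mbps: float) -> float:
    """Time (ms) to push one encoded frame onto a link of the given capacity.

    Ignores protocol overhead, queuing, and propagation delay.
    """
    return frame_bits / (link_mbps * 1_000_000) * 1000


if __name__ == "__main__":
    # Hypothetical stream: 40 Mbps at 60 fps over a 100 Mbps downlink.
    bits = per_frame_bits(40, 60)
    print(f"{bits / 8 / 1000:.1f} KB per frame")  # prints 83.3 KB per frame
    print(f"{serialization_ms(bits, 100):.2f} ms on the wire")  # prints 6.67 ms on the wire
```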
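Latency, not raw FPS, is the primary metric. The full chain from player input to displayed frame should stay within roughly 50-80 ms for acceptable gameplay. NVENC's "Low Latency High Performance" preset (a legacy preset; newer Video Codec SDK releases express the same intent through the P1-P7 presets combined with a low-latency tuning mode) and Reflex-style frame pacing (keeping the CPU from queuing frames ahead of the GPU) are critical. VRAM matters for the games themselves: AAA titles at 4K commonly need 12-16 GB.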
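Because the input-to-display chain is the budget that matters, it helps to write it out as stages and check the sum against the 50-80 ms target. The stage names and millisecond values below are hypothetical placeholders, not measurements; a real deployment would substitute its own profiled numbers.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    ms: float  # hypothetical per-stage latency in milliseconds


def end_to_end_ms(stages: list[Stage]) -> float:
    """Total input-to-display latency for one input: stages run in series."""
    return sum(s.ms for s in stages)


def headroom_ms(stages: list[Stage], budget_ms: float) -> float:
    """Positive if the chain fits the budget, negative if it overshoots."""
    return budget_ms - end_to_end_ms(stages)


def worst_stage(stages: list[Stage]) -> Stage:
    """The single largest contributor, usually the first thing to optimize."""
    return max(stages, key=lambda s: s.ms)


# Placeholder budget for a 60 fps stream; every number here is an assumption.
PIPELINE = [
    Stage("input uplink", 10.0),
    Stage("game sim + render", 16.7),
    Stage("capture + NVENC encode", 5.0),
    Stage("network downlink", 15.0),
    Stage("client decode", 5.0),
    Stage("display scanout", 8.3),
]

if __name__ == "__main__":
    print(f"total {end_to_end_ms(PIPELINE):.1f} ms")
    for budget in (50.0, 80.0):
        print(f"budget {budget:.0f} ms: headroom {headroom_ms(PIPELINE, budget):+.1f} ms")
    print("largest stage:", worst_stage(PIPELINE).name)
```

Summing treats the stages as strictly sequential, which is the right model for the latency of a single input even though consecutive frames overlap in a pipelined system.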
DLSS, FSR, and similar upscalers sharply reduce render cost at a given output resolution by rendering internally at a lower resolution and reconstructing the full-resolution frame. Cloud gaming services (GeForce Now, Boosteroid, Shadow) typically use RTX-class consumer or professional GPUs configured for low-latency rendering rather than maximum throughput.
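The savings from upscaling follow from pixel counts: with a per-axis scale factor s, the game shades roughly 1/s² of the output pixels. The sketch below assumes factors of 1.5 and 2.0, in line with the "Quality" and "Performance" modes commonly published for DLSS and FSR; exact factors vary by upscaler and version, so treat them as assumptions and check the relevant SDK documentation.

```python
def internal_resolution(out_w: int, out_h: int, scale: float) -> tuple[int, int]:
    """Render resolution for a given output size and per-axis upscale factor."""
    return round(out_w / scale), round(out_h / scale)


def shaded_pixel_fraction(out_w: int, out_h: int, scale: float) -> float:
    """Fraction of output pixels actually rendered before upscaling."""
    in_w, in_h = internal_resolution(out_w, out_h, scale)
    return (in_w * in_h) / (out_w * out_h)


if __name__ == "__main__":
    # Assumed per-axis factors for 4K output; not taken from any specific SDK.
    for mode, scale in (("quality-style", 1.5), ("performance-style", 2.0)):
        w, h = internal_resolution(3840, 2160, scale)
        frac = shaded_pixel_fraction(3840, 2160, scale)
        print(f"{mode}: {w}x{h} internal, {frac:.0%} of 4K pixels shaded")
```

Frame-time savings are smaller than the pixel ratio suggests, since the upscaling pass has its own cost and some per-frame work (simulation, geometry, CPU submission) does not scale with resolution.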